mrrrk/MS40


GM40GPD (a.k.a. MS40)

This is a mainframe program written in C from my days at Experian in the nineties. Most of the stuff we did was written in PL/1 (the kipper tie of programming languages) but when I learned that Experian had invested in a C compiler for the mainframe, I saw it as a step on the way out of what felt like a dead-end world and was keen to use it...

Background

This was the mid to late nineties in the Marketing Services department of Experian (née CCN). We were basically responsible for sending out junk mail. A lot of it. The postman turned up with a 40-tonne articulated lorry. Something to feel really proud of. The mainframe site where we worked is now the site of a big Audi garage near the ring road in Nottingham. It was built on the site of an old landfill - and the roof leaked. Interesting to muse that the stuff we produced came from a landfill and shortly went back into one. Very Lion King.

Marketing Services dealt with big marketing mailing databases based on the electoral roll - so nearly everyone in the UK. When I say 'database', I mean it in the very loosest sense. They were really just big files held on cartridge tape with one record (limited to 32 kB) typically representing one household. The record was keyed on URN (unique record number) and contained entities such as the root household, persons, accounts, etc. in a tree of hierarchically linked 'segments'. If the record spilled over 32 kB, things were split over two records - and life got complicated.
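To make the 'segments in a record' idea concrete, here is a minimal sketch in C. It is not the real file layout, which isn't described here: the segment header fields, the type codes and the names `seg_append` and `seg_count` are all assumptions made up for illustration. It only shows the general shape of a 32 kB buffer holding a household segment followed by its person and account segments, with the hierarchy implied by a level number. The URN key is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORD_BYTES 32768u  /* the 32 kB physical limit mentioned in the text */

/* Hypothetical segment kinds; the real codes are not known. */
enum seg_type { SEG_HOUSEHOLD = 1, SEG_PERSON = 2, SEG_ACCOUNT = 3 };

/* Hypothetical segment header: kind, depth in the tree, payload length. */
struct seg_header {
    uint8_t  type;
    uint8_t  level;   /* 0 = root household, 1 = person, 2 = account */
    uint16_t length;  /* payload bytes that follow this header */
};

/* Append one segment to the record buffer.
   Returns the new used size, or 0 if the segment would overflow 32 kB. */
size_t seg_append(unsigned char *buf, size_t used, uint8_t type, uint8_t level,
                  const void *payload, uint16_t length)
{
    struct seg_header h = { type, level, length };

    assert(buf != NULL);
    if (used + sizeof h + length > MAX_RECORD_BYTES)
        return 0;                        /* this is where life got complicated */
    memcpy(buf + used, &h, sizeof h);
    if (length > 0)
        memcpy(buf + used + sizeof h, payload, length);
    return used + sizeof h + length;
}

/* Count segments of one kind by walking the record front to back. */
size_t seg_count(const unsigned char *buf, size_t used, uint8_t type)
{
    size_t pos = 0, n = 0;

    assert(buf != NULL);
    while (pos + sizeof(struct seg_header) <= used) {
        struct seg_header h;
        memcpy(&h, buf + pos, sizeof h); /* copy out to avoid alignment issues */
        if (h.type == type)
            n++;
        pos += sizeof h + h.length;
    }
    return n;
}
```

A caller would start with `used = 0`, append the household segment, then each person followed by that person's accounts, and treat a return of 0 as the overflow case.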

Updating these 'databases' was not done in real time but typically weekly (e.g. over a weekend). It involved merging some fairly massive files over several separate processes, including sorting, fuzzy name and address matching, etc., with each process kicking out a file or files that fed into the next and so on, culminating in the final grand update - the client-bespoke MS40xxx program. The whole update process, depending on the number of steps and the size of the files, could take up to 24 hours to run. Yes, 24 actual hours. That meant running through the night and being on call in case something crashed. The MS40 stage was infamous because most of the bespoke nature of the particular update process, along with its inherent complexity and fragility, was baked into this program. When it crashed - and it often did - it was invariably near the end when it was dealing with record overflows and other edge cases, after it had been running for several hours. Much frustration. Much wasted (expensive!) mainframe processor time. Much disturbed sleep and ruined weekends...

GM40GPD

This was my attempt at a solution to some of these problems. Rather than a bespoke MS40 based on a skeleton, written for each system, this would be a generic core program with 'plug-ins' for the bespoke stuff. By genericising the main functionality, most of the fragility could be addressed because the core was tried and tested and, for the most part, left alone. The bespoke plug-ins were statically linked libraries called at points during the processing. Another innovation was in the handling of the overflowing 32 kB records. Instead of splitting into two (or more) logically independent, individually valid records for a household that needed to be aggregated and processed as a group, I just built a great big logical record and split it across several physical records. No one physical, overflowing record could be read and understood in isolation - but this was rarely ever really needed anyway - the program would just read however many records were part of the group and assemble them sequentially in memory as one big record. The great big logical record approach was much simpler and cut down a lot on the horrible mash of spaghetti logic typically found at the end of an MS40.
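The 'one big logical record over several physical records' idea can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this repo: the `phys_record` header fields (URN, part number, part count, length), the `PHYS_PAYLOAD_MAX` figure and the function names `split_logical` and `assemble_logical` are all hypothetical. It works on in-memory arrays rather than tape files to keep it self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: payload room left in a 32 kB record after its header. */
#define PHYS_PAYLOAD_MAX 32000u

/* Hypothetical physical record: small header plus a slice of the logical record. */
struct phys_record {
    uint32_t urn;     /* key shared by every part of one household */
    uint16_t part;    /* 1-based position within the group */
    uint16_t parts;   /* total physical records in the group */
    uint16_t length;  /* payload bytes used in this part */
    unsigned char payload[PHYS_PAYLOAD_MAX];
};

/* Split a logical record across out[].
   Returns the number of parts written, or 0 if out[] is too small. */
size_t split_logical(uint32_t urn, const unsigned char *logical, size_t len,
                     struct phys_record *out, size_t max_parts)
{
    size_t parts = (len + PHYS_PAYLOAD_MAX - 1) / PHYS_PAYLOAD_MAX;

    assert(logical != NULL && out != NULL);
    if (parts == 0)
        parts = 1;                       /* an empty record still takes one part */
    if (parts > max_parts || parts > UINT16_MAX)
        return 0;
    for (size_t i = 0; i < parts; i++) {
        size_t off = i * PHYS_PAYLOAD_MAX;
        size_t n = (len - off < PHYS_PAYLOAD_MAX) ? len - off : PHYS_PAYLOAD_MAX;
        out[i].urn = urn;
        out[i].part = (uint16_t)(i + 1);
        out[i].parts = (uint16_t)parts;
        out[i].length = (uint16_t)n;
        memcpy(out[i].payload, logical + off, n);
    }
    return parts;
}

/* Assemble the group that starts at in[0] into one buffer, in order.
   Returns the number of physical records consumed, or 0 if the group is
   incomplete, out of sequence, or too big for the buffer. */
size_t assemble_logical(const struct phys_record *in, size_t avail,
                        unsigned char *logical, size_t cap, size_t *out_len)
{
    assert(in != NULL && logical != NULL && out_len != NULL);
    if (avail == 0 || in[0].part != 1 || in[0].parts == 0 || in[0].parts > avail)
        return 0;

    size_t parts = in[0].parts, len = 0;
    for (size_t i = 0; i < parts; i++) {
        if (in[i].urn != in[0].urn || (size_t)in[i].part != i + 1 ||
            (size_t)in[i].parts != parts)
            return 0;                    /* not a consistent group */
        if (in[i].length > PHYS_PAYLOAD_MAX || in[i].length > cap - len)
            return 0;                    /* corrupt length or buffer too small */
        memcpy(logical + len, in[i].payload, in[i].length);
        len += in[i].length;
    }
    *out_len = len;
    return parts;
}
```

The point of returning the number of records consumed is that a sequential reader can simply advance by that many and carry on with the next household. Everything downstream then sees one contiguous record and needs no special overflow logic.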

Why the name?

Bespoke programs were usually named according to MSnnXXX, where MS was Marketing Services, XXX was a client code (e.g. ACM for Acme) and the number nn was a unique identifier within this scheme. The number would often vaguely indicate the program's function, e.g., 40 was special, sixty something meant a report of some kind, etc. For this one, they wouldn't let me use the MS prefix because it wasn't bespoke, it was generic. The suite of generic tools developed to deal with these files all started with MG - but were written by another department so mine had to be GM, not MG. No idea now what the GPD was for. I didn't really understand any of this then either.
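For illustration only, the naming pattern can be pulled apart with a few lines of C. The struct and the function name `parse_prog_name` are made up for this sketch, and it assumes the prefix and client code are always upper-case letters, which the description above suggests but doesn't guarantee.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical breakdown of a seven-character program name. */
struct prog_name {
    char prefix[3];  /* e.g. "MS" or "GM" */
    int  number;     /* the nn part, 0 to 99 */
    char client[4];  /* e.g. "ACM" */
};

/* Parse a name shaped like two letters, two digits, three letters.
   Returns 1 on success, 0 if the name doesn't fit that shape. */
int parse_prog_name(const char *name, struct prog_name *out)
{
    assert(name != NULL && out != NULL);
    if (strlen(name) != 7)
        return 0;
    if (!isupper((unsigned char)name[0]) || !isupper((unsigned char)name[1]))
        return 0;
    if (!isdigit((unsigned char)name[2]) || !isdigit((unsigned char)name[3]))
        return 0;
    for (int i = 4; i < 7; i++)
        if (!isupper((unsigned char)name[i]))
            return 0;

    memcpy(out->prefix, name, 2);
    out->prefix[2] = '\0';
    out->number = (name[2] - '0') * 10 + (name[3] - '0');
    memcpy(out->client, name + 4, 3);
    out->client[3] = '\0';
    return 1;
}
```

Fed "MS40ACM" this gives prefix "MS", number 40 and client "ACM". GM40GPD fits the same shape, with GPD sitting where a client code would normally go.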

What's the Point of This?

I found this code languishing in a folder somewhere and I just decided to stick it up on GitHub for posterity. I remember feeling pretty damned pleased with myself after writing this code in a way I've seldom been since. I barely understand the code now and haven't written any more stuff in C (apart from dabbling with Arduinos) since leaving Marketing Services, shortly after writing GM40GPD. It's just a bit of nostalgia for me and I very much doubt anyone else will look at it. If someone else is daft enough to Bingle search for MS40 and Experian, maybe they'll find it!
