doc/crack_passwords_on_VMs.txt

Cracking password-protected programs

The stuff explained below is theorically machine independent, and even CPU independent, but it's more adapted to a game running on an emulator or a virtual system (such as KickEmu/WHDLoad/HRTMon) where you have full control on the memory. But you can manage with only a debugger and some memory watch routines. You also need binary diff tools to compare values between 2 or 3 files. Such programs are easy to write in C, that sould make a good exercise.

To sum it up, to perform all operations described in this document, you need to:

- enter the debugger with a keypress/mouseclick/other
  (most debuggers such as Action Replay/HRTMon/ThrillKill allow that)
- put breakpoints when PC reaches given addresses
  (most debuggers allow that)
- dump the whole game memory to disk
  (most debuggers allow that)
- put memory watches when some byte is written at a given location
  (HRTMon & Action Replay do that)
- put memory watches when some byte is read at a given location
  (debuggers usually do not allow that, but you can use some trace routines
  if you don't have an MMU, or if you use WHDLoad/HRTMon with an MMU there is a very convenient command called wpr).
- have a copy of the protection codes, codewheel or whatever, in order to know what the game expects you to enter.

As opposed to disk-protected programs, password protected programs are the ones which rely on some information contained in the manual or on a protection device such as a codewheel or a colored picture. Each password protection type can be tackled from a slightly different point.

The easiest type is usually the one where some word in the manual is asked ("Enter word at page x, line y, word z"), for different reasons. First because it is the cheapest way to create a protection, once the manual has been created, and second because it is an input prompt, requiring special keyboard operation, and the moment where you press "Return" can be easily found and the code is often very straightforward.

The harder type is the one where you are asked to click on some items on screen matching a codewheel combination, or find the color on some black and white picture displayed on the screen (the colored version of the picture is in the manual, remember, that was done when color copies and scanners where expensive and not easily available, furthermore, the combination of colors used gave some trouble to xerox copiers).
It is usually harder for different reasons: first, there is an event loop, so it is harder to determine when the user clicked on something, and even harder if the Amiga OS is used (intuition) because you cannot look for hardware mouse clicks (BTST #6,$BFE001) and put breakpoints here. Second, because most of those codes, especially in adventure games, use the same system as the rest of the game. This is usually pseudo-code, and all the disassembly you can try is the code from the interpreter, the breakpoints you can set are reached anytime, with different values, so a different technique must be used, a technique I developped myself (some may have done better, but they did not talk), and I'll talk about it in the last section of this document.


Simplest case: no pseudo code, word to enter (typically Gremlins/Microprose)

Wait for the code screen to appear and the input code to be asked. Breakpoint here, and try to figure out where the test for <return> key is. If you find code like that: CMP.B #$0D,D0 then it's there. Try to figure out where the check is and cancel it.

More difficult: no pseudo code, but something to click on

Beware of late double checks

In Formula One Grand Prix, there is a word to enter from the manual. After having bypassed the first, obvious check, if you try to play, you find out that the car is uncontrollable. If you read-protect (using some trace system or the WHDLoad protect read feature) the area where the wrong password is entered, you find out that that area is checked one more just before game starts. Remove the check there and the car behaviour is back to normal. The hard part is to find where the check is done for the second time, because there is no user prompt, no disk check, nothing but a discreete memory read. In the case where that could occur more than once during the game, it may be wise to find where the page, line and word information is located, force it to some legal values, and copy the matching word in the proper location. Then, the program may check if the proper word has been entered, it is always ok.


Cracking pseudo-code based programs

Pseudo-code based programs are mainly adventure or action/adventure games, certainly not arcade games, where an interpreter would slow down the game.

How to tell if pseudocode is used:

There are many hints. First, check the executable.
If it is small, even uncompressed (example: 50Ko), or it does not contain the "Enter word at ..." string (or any string encountered in the game), then it's likely that the string data is somewhere else. Again this could be because of multilanguage feature. But it's likely that it's pseudo-code.

Second, the game type. If this is an adventure game, then the protection is easier to integrate to the rest of the game than for a shoot-them-up or car game. Also, since adventure games have been developped for at more than one platform (at least for Amiga and PC, but also for Atari ST and PCs), it's cheaper to port an interpreter than the whole game. That's why companies like Sierra, Adventuresoft, Lucasfilm, or Delphine have developped their scripting system (SCI for Sierra, SCUMM for Lucasfilm).

Example of games where pseudo-code has been used, and where the protection is also written using pseudo-code.

- Another World, Operation Stealth & Future Wars (Delphine). Flashback does not use pseudocode.
- King's Quest IV, V, Leisure Suit Larry 2, 3, 5 & remake and all code-protected Sierra adventure games (not Sierra Soccer for instance).
- Indiana Jones Last Crusade / Fate of Atlantis, Secret of Monkey Island 1 & 2, Loom, Maniac Mansion & Zak Mc Kracken (Lucasfilm). Battle of Britain does not use pseudocode.
- Simon The Sorcerer

Third, the name of the files on disk. If the files are called something.001, .002, etc... it's because there are some rules in the file naming imposed by the interpreter -> there is an interpreter -> this is pseudo-code.

Why cracking pseudo-code is (or looks) more difficult:

This is a kind of password protection which cannot be removed by inserting a BRA or a NOP somewhere, since the assembly code you could change is the interpreter code (it would be like trying to crack a BASIC program by putting BRA or NOP in the basic interpreter, or trying to crack an Amiga game running under UAE by changing 1 instruction in UAE). A very good cracker specialized in disk protections could be stuck there. It does not use encryption, trace mode or machine dependent stuff, just because most of the time, the interpreter is a C program, and seen from the interpreter view, the protection is just a part of the game.

The best way to crack pseudo-code is to understand how the interpreter works and what does the bytecode means. The most known interpreter is Lucasfilm SCUMM. I've seen smart cracks of Monkey Island 1 & 2 and Indiana Jones and Fate of Atlantis where some data file (ending with .001) had to be modified and then entering anything at the protection check is allowed, or even better the protection check does not even appear.
That method is very smart, because it does not change any assembler code (no TSR programs needed), and once the interpreter has been understood, lots of other games made with the same kind of interpreter can be easily processed. Once the location of the bytes to change is found, it is rather easy to look for the same sequence of bytes in another version of the same game (including other platform versions) and the crack also applies.
Example of games cracked using this technique:

- Most of the Lucasfilm adventure games (even if some other cracks use other techniques)
- Another World
- But none of the Sierra games I know was ever cracked using it.

I won't explain how to understand the pseudo-code because unless you are curious, it is not really needed to crack the games, and the "blind" technique I'm currently (and successfully) using is less time consuming and gives almost equivalent results. Myself I did not try to understand any pseudo-code systems. I just don't need that.

Detailing the "blind" and "replay" techniques that I used to crack pseudo-code games:

1. Simple example on how to use the "blind" technique

First, you have to understand that the people coding those protections within the games are the same ones who develop the game itself. The guy who coded protections for the Leisure Suit Larry games is called Al Lowe and probably never touched an amiga in his entire life. He himself says that protections are a pain in the neck, and he gives away all the codes for his Larry games on his home page. Obviously, password protections are fun to noone (but you and me, since I wrote this, and you are reading this :)).
Because of that fact, and also because the programmers know that their program is protected by the pseudo-code layer, they did not hide too well the variables used for the protection. When they do that better, my technique would be still useful, but less efficient.
Let's take the simple example of Leisure Suit Larry Remake version (simple because the protection is right at start, and because I remember I cracked it in less than 1 hour). You are presented a series of multiple choice questions where you have to click on a,b,c, or d.

Step 0:

If the pseudo-code is simple, then there must be somewhere in memory some location where the correct answer is contained. If we could figure out how the correct answer is represented in memory, that would be easier. Let's suppose that. We cannot just blindly scan the memory until we find something, there are too many locations where it could be, and we don't even know how it is represented.

Step 1: create useful memory dumps

Boot the computer, or WHDLoad slave, or whatever with only chip memory, so all the game memory is contained in one single chunk, and it's easier for the following steps. Larry needs something like 1,5 meg of memory. So chipmem size should be set to $180000 in the WHDLoad slave. Click until you are presented the first question, which is not really a protection, but the 6th question is a protection, and it works the same. Interrupt the game there, get the correct answer from the manual (let's say it is 'a'), and save the memory with the following name: "code_1a". Return to the game, answer 'a', you are presented with the second question. Same thing here, (let's say answer is 'c', save dump as "code_2c"). Answer 'c', and repeat the operation for the third question (answer 'b', save as "code_3b"). Then quit the game.

Step 2: analyze the dumps

We first suppose that 'a' is encoded as 1, 'b' is encoded as 2, and so on... With a binary comparison program, compare 2 or 3 of the dumps together (with 3 there's less chance of mistakes), with the following algorithm:

For each offset from start to end
   get value of all bytes in all dump files
   print offset when value 1 is 1 ('a'), value 2 is 3 ('c') and value 3 is 2 ('b').
End for

You won't find something there. That means that data is not encoded like this. Try several things, (ASCII code for a,c,b, upper and lower case, that still does not work). Finally, assuming that 'a' is encoded as 0, 'b' as 1, and so on gives us one single offset. The game is almost won. Of course, with simple values such as 0,1,2 ... there's always a possibility that the location means something else, but with 3 files at the same time it's less likely.

Step 3: verify your theory

Run the game again, and when you're asked the question, enter the debugger, and check that the correct answer matches the value at the location we found at step 2. If that's the case, then try to change the value in memory. Let's put 0 for instance, and return to the game. Now clicking on 'a' makes the game satisfied. When there are more than 1 matching addresses, you have to locate the source value, from which other values derive. Changing a derived value does not alter the game behaviour. Sometimes the derived value matches your theory (it may be used only to display some stuff), whereas the source value is less obvious to find. In that particular case, just put a write memwatchpoint on the derived value to see where it is copied/computed from.

Step 4: modify interpreter behaviour to pass the test everytime

Ok, you figured out where the memory should be changed to pass the test, but now it must be done automatically, and only in the case of password request (I've fixed a crack of Zak Mc Kracken where the protection was disabled, but also a part of the game known as the color puzzle, so you have to be cautious and to be sure to act on the program only at the proper time, or strange behaviour may occur).
Now that we are sure that the memoy location is the one the game looks into to tell the right answer from a wrong one, we just have to put a read memwatchpoint there (if you've got HRTMon + WHDLoad running, use the wpr command). If you don't have an MMU, it's still possible, with some memory access program running in trace mode. I've written such a program a long time ago with a friend, and I used it successfully on F1GP for the hidden password check, and I think Bert has written one too, available in the WHDLoad package).
Once the memwatchpoint has been set, click on any answer (it doesn't matter which answer it is), and you should be thrown into the debugger at once since the location you've selected has been read. Note down the program counter there. Remove the memwatchpoint then, and for that particular case, the instruction which reads the correct answer is   MOVE.W  (A3),D0. Since the absolute address is valid only for debug purposes, and submitted to changes if you change kickstart version and/or memory configuration, note down/save the code around to be able to find it in the interpreter code segments later on.
The problem is the same for the value of A3. You know the absolute value in that case since it is the address of the correct answer.
Try to locate some remarkable data near addresses pointed by the address registers. In this particular case, it would be better to try A3 first, since its neighbourhood has more chances to be related to the protection. Dump -$100,+$100 bytes around the current value of A3. In most cases, you'll find some strings such as "Enter the answer" somewhere near. Note down the offset where some remarkable string appears (difference between A3 and the string address).
Now quit the game and return to your crack program. Once you have located in which segment/offset the protection code is (the MOVE.W (A3),D0 there, in most Sierra games it's the same code), insert a patch here (some JSR or TRAP), possibly copying the instructions around because you won't have enough room for a JSR.
If you have doubts, just insert a patch which does exactly what the game does and test the game like that. If it works, then you can try to make some subtle changes in your patch hook.
In your patch routine, first check the address near A3 for the string you found when the protection variable was read. If it matches, then you can just put 0 in (a3). Test that, and if you're successful, you pass the test everytime with answer 'a'.
That's technically cracked, but that's not very good since other answers still can make you quit the game, and if an user enters the former correct answer he'll be kicked out. So you've got to warn the user to always enter answer 'a' in some cracktro or splash screen.
If you want to make an even better work, just repeat Steps 1 & 2 when your crack patch is called to try to locate where the user answer is stored (it's very likely that the answer 'a' is coded by 0, etc...). Once this is done, try to make a relation between current address registers (including A3) and the user input variable location. Here you'll find that A2 matches exactly the user input variable. It was predictable that some other address register contains something interesting since the interpreter was just working on protection variables. The only thing to do there to allow ever answer to pass the test is to copy (A3) value into (A2) value or reverse.

More complex examples of the "blind" technique

The Colonel's Bequest was my first Sierra pseudo-code crack. I almost gave it up when I had the idea I described above. In this example, there are no letters, only names to click on. There are 12 names which could match the fingerprint, but only 4 are possible (which means that if you click on others, you have no chance to pass the test). Anyhow, it's more difficult to figure out a data representation of the correct answer. So Step 1 and 2 are slightly different.

Step 1

Create several memory dumps with different or same correct answers.

Step 2

Try to find differences/equal zones in the dumps. I compare 3 files together, 2 dumps of the same correct answer, and one dump of another and the algorithm is as follows


For each offset from start to end
   get value of all bytes in all dump files
   print offset when value 1 is equal to value 2 and different from value 3
End for

You can also filter for values above some threshold such as 10 or 50, which can save time. That generates a hell of a report, and you have to filter the suspicious values. For example, try to find isolate values, preferably values below 10 (if you did not filter). Maybe you'll find the values at some very different offsets (ex: around $20000 and around $C0000). Keep the higher address as the other one is probably a copy made by the interpreter for some display.
Once you found some suspicious offsets, go to Steps 3 & 4.

The "blind" technique has been successfuly used by me on the following games:

- Leisure Suit Larry Remake
- The Colonel's Bequest
- Conquests of Camelot (flower test)
- Leisure Suit Larry 3 (ticket number and locker combination)
- King's Quest 5 (the spells, slightly worse, since the correct combination is always 'JOTD' and you've got infinite tries to find it (and it's written just above))

However, I was not able to figure out how to crack Codename: Iceman. Maybe some day...

The "replay" technique

Older to me, this technique is similar to the blind technique, but is useful when you are able to locate:

- where the protection variables are stored (roughly, of course)
- where the program goes when you've entered the protection (and what's in some variables at this moment)
- where you've got several retries (you often have 3 retries)

but 

- you're unable to give a representation of the data in the memory and you cannot find it even with diffs.
- you are able to use write memwatchpoint, but not read memwatchpoint.

I think you should try the "blind" technique first, since that one is more demanding.

Step 1: create dumps of try #1, #2, #3

answer wrong each time

Step 2: find where the number of retries is located

Follow Step 2 of the "blind" technique

Step 3: 

Once you've located the number of retries, set it to 50 and try to give wrong answers a lot of times. If you are not thrown out after the 3rd answer, then it works, and you could even consider that if the answer to give is simple enough (means that you've got 25% of chances to give the correct answer at random), then you could even consider this as a "crack" since tampering with the retry count variable would be enough to play the game after say 5 or 6 retries (statistically speaking).
But most of the time, the answer to give is complex, and if you don't have the documentation, you cannot possibly pass the test.

Step 4:

Put a write memwatchpoint on the retry count variable (variant: if you are asked to enter 4 symbols, the symbol counter should be easy to find too, put a memwatch point there, then and act only when the value is 4) and enter a correct code. In the Lucasfilm games I cracked the memwatchpoint is reached. You see that some register is written in an address relative destination (MOVE.W D7,-10(A4)).
Save memory between say A4-$100 and A4 and restart the game with the memwatch point. Enter a wrong answer, and when the memwatchpoint is reached, reload the A4-$100,A4 memory zone you saved and resume the game. If it passes the test, you're almost there, but to play it safe, reduce the range of the memory you load until the protection does not pass. Use the limit values, since it reduces the risk of a crash, and the patch size too. Refer to "blind" technique step 4 to find when to apply the patch. In those cases, the data register values have to be accounted for.
Explanation: you probably saved the memory of a correct code and the correct answer, you don't know where it is in your dump, but who cares since it works!

The "replay" technique has been successfully used by me on the following games:

- Cruise for a Corpse (there were a "blind" technique crack which did not work and which I adapted)
- Indiana Jones and the Last Crusade
- Indiana Jones and the Fate of Atlantis
- Loom
- Future Wars (although it does not work, fails at the Monastery: the monks kill you)
- King's Quest 4

I did not even use the WHDLoad wpr/wpw features there, as opposed to the "blind" technique.

JOTD, august 2003