
How emulators work, what do you need and your first steps

Camilo Andrés Mella Lagos edited this page Jun 11, 2017 · 21 revisions

Emulators are programs whose function is to recreate, as closely as possible, the target system you are trying to emulate.

What do you need to create an emulator?

  • Documents describing the components of the target machine
  • Documents or spreadsheets with info on the target CPU. Most consoles are based on CPUs that were available on the market at the time of their release. Granted, there will be differences, but that's not gonna stop you, is it?? :P
  • Knowledge of variable types and variable sizes, or the desire to learn as you go
  • An understanding of logic operators, bit shifting and other bitwise shenanigans
  • The Windows programmer calculator (not even kidding on this one, it will be your best friend)
  • An IDE with relaxing colors
  • Knowledge of any programming language (high-level languages recommended)
  • Knowledge of some way of outputting graphics (SDL, OpenGL, std::cout << "*"; (don't do this hahahah))
  • A strong desire to have little social life :D
  • The will to learn and to go on no matter what hardships you encounter (and you will encounter them)
  • A high tolerance for failure, and the understanding that you are not going to code an emulator overnight

Okay, now you know all that you need to work on your emulator. Where should you start, you ask me?

How about implementing a Chip8 interpreter? "BUT I WANNA DO SUPER NINTENDO / GAMEBOY!!! That sounds boring...."

Hold your horses, sir/madam! As tempting as it may sound, implementing a Gameboy emulator as your first project is not advisable. The Gameboy and consoles of similar complexity start dwelling in the realm of interrupts, memory-mapped hardware, multicolor screens and, most important... they have a metric shit ton (yes, this is accurate) of opcodes. Furthermore, the amount of work you need to do before you see any kind of progress or results is often too much to handle for newcomers.

That's why every seasoned n00b like me will direct you to Chip8. Okay, turn that frown upside down. I know you are disappointed, but that's how it is, so let's make the most of it.

Chip8 is a low-complexity THING. Yep, it doesn't even qualify as a console. It only has 35 opcodes, so implementing the CPU should be a cinch, and out of those 35 opcodes you will find at most 5 that are actually hard to implement, but I can help you with those :D The best part is that you will be seeing results in no time (granted, you will be banging your head against the wall after that happens, but hey... you wouldn't be here if you weren't a masochist / programmer ahaha).

Chip8 resources are vast and useful. The very best one I've found is http://www.multigesture.net/articles/how-to-write-an-emulator-chip-8-interpreter/

It contains a master class on bitwise instructions, and most of my guidance will build on it if you follow that guide. I will, though, try to explain things a little more in depth so you can have a firm grasp and lots of AHA! moments.

About that last point... I believe you can copy code and get your emulator running really fast, but sadly you will be tricking yourself: you won't know how to follow up, or you'll find out after your project that you don't have the technical knowledge to go on your own and jump into more complex systems.

As an automation engineer, I believe the best way to learn is by yourself, having AHA! moments. That's why I'm not gonna spoon-feed you code. Rather, I'll teach the fundamentals you will need to keep growing on your own.

Let's go then! First we will be talking about Variable types

I know you can find a lot of information on the internet about variable types, but it will not sink in until you realize you are cramming an elephant into a shoe box or using shipping containers to hold a needle.

Okay, so let's start with the very basics. High-level languages use variables, but when your program is finished and you want to test it or make a release build, the compiler will transform all your beautiful variables into 0s and 1s. The most basic unit of those 0s and 1s is the BIT.

Here we have a container that can hold 8 BITS; each one of those [0] represents a BIT.

[0][0][0][0] [0][0][0][0]

Why did I draw them like that? Well, because 8 BITS grouped together are called a BYTE, and BYTES have a lower region and an upper region. Just like with Arabic numbers, 0000000002 is still 2 no matter how many zeroes you add on the left. Bytes work just like that: they have a MOST SIGNIFICANT BIT [MSB] and a LEAST SIGNIFICANT BIT [LSB]. Furthermore, they are split into NIBBLES. A NIBBLE is a 4-bit structure, so all combined... TADA, behold my ASCII drawing skills:

MSB [0][0][0][0] [0][0][0][0] LSB  <- Read order (the number grows by using more left-hand places)
    UPPER NIBBLE  LOWER NIBBLE

Okay, hopefully you are still awake (it gets better, believe me xD). Now we speak the same language, so we can start laying the groundwork.

Most simple systems don't use much memory, so we will be dealing with a lot of 1-byte (8-bit) and 2-byte (16-bit) variables.

One of the most used types for 1-byte variables is CHAR, and for 2 bytes it is SHORT (in C++, a char is always 1 byte and a short is at least 16 bits, which is 2 bytes on pretty much every common compiler). At this point it will be wise to know that a variable's size determines the range of numbers it can hold, and that variables can be signed (meaning they can be positive or negative) or unsigned (only non-negative numbers allowed). A rule of thumb when starting out in emulation is that negative values are BAD, so just use unsigned ones until you get a better grasp of what you need for certain cases.

OKAY, let's pop open the Windows Calculator and put it in programmer mode (VIEW - PROGRAMMER). Feeling like a pro yet? Well, now we are in business. You can switch the mode between HEX/DEC/OCT/BIN, and at this point I want you to focus on BIN (binary mode).

Let's do a quick experiment and hopefully you will have your first AHA! moment

input 1111 1111

The first thing you may notice is that the number grows from right to left. Now that we have filled all those 1s, we can do several things. For example, change the mode to DEC, and yep, sure enough, you are seeing 255. Now change the mode to HEX. You should see FF, or as we will write it from now on, 0xFF.

Let's break this down... Some of you may have already had the AHA moment I was talking about. Many old arcade games kept values like the score or the level number in 1-byte registers (a register is a container for the CPU's math work), so once the level counter ran past 255 the game could crash and you ended up at what is known as the "Kill screen". But... WHY? Well... you have an 8-bit container, but what happens if we add 1? Go into decimal mode and input 255 + 1. Okay, you and I know that should add up to 256, but take a look at the binary representation below the number:

[0][0][0][1] [0][0][0][0] [0][0][0][0]

Yep you went from this

          [1][1][1][1] [1][1][1][1]   = 255 = 0xFF

to this

 [0][0][0][1] [0][0][0][0] [0][0][0][0]   = 256 = 0x100                 

The explanation is rather simple. When you add [0][9] + [0][1] in plain Arabic numbers, you obtain [1][0]; in other words, you CARRY the 1. With bits the very same thing happens. Now imagine you only have an 8-digit display:

[1][1][1][1] [1][1][1][1] you went from this
[0][0][0][0] [0][0][0][0] to this

That's why old games have kill screens. The value overflows: the carried 1 needs a 9th bit that the 8-bit container doesn't have, so the number the CPU was storing there wraps around to 0 and all those precious counts are suddenly gone.

If you've already gotten used to my style, you may have noticed that I said CARRY like it's important... Well, carries are a really big deal for us. In an 8-bit container, a carry occurs when an addition goes past 0xFF. Another type of carry is the BORROW, and a borrow, just as in math, occurs when a subtraction needs to go below 0 (for example, 0x00 - 1 wraps around to 0xFF). Okay, I'll go deeper when we really need to comprehend all of this.

As my preferred language is C++, I will be focusing on that language, but you should be fine, as most variable types are shared across languages. Java may give you trouble here because its byte is signed (-128 to 127) and it has no unsigned byte or short, but why are you using JAVA! ahha, JK, just find a workaround and come back, or just switch to C++ if you want the full experience :P

So I've already given you all you need to start mapping out your own emulator:

  unsigned char -> an 8bit/1byte container that can hold numbers between 0-255
  unsigned short -> a 16bit/2 byte container that can hold numbers between 0 and 65535

For now, I want you to read about Chip8 and write down what you will need. Don't worry, I'll get you started. For example... DON'T SUE ME WIKIPEDIA!!!!


Memory: CHIP-8 was most commonly implemented on 4K systems, such as the Cosmac VIP and the Telmac 1800. These machines had 4096 (0x1000) memory locations, all of which are 8 bits (a byte) which is where the term CHIP-8 originated. However, the CHIP-8 interpreter itself occupies the first 512 bytes of the memory space on these machines. For this reason, most programs written for the original system begin at memory location 512 (0x200) and do not access any of the memory below the location 512 (0x200). The uppermost 256 bytes (0xF00-0xFFF) are reserved for display refresh, and the 96 bytes below that (0xEA0-0xEFF) were reserved for call stack, internal use, and other variables.

In modern CHIP-8 implementations, where the interpreter is running natively outside the 4K memory space, there is no need for the interpreter to use any of the lower 512 bytes of memory, but it is common to store font data in those lower 512 bytes (0x000-0x1FF).


Okay, no need to understand all that tech babbling. You just need to extract the basics, like writing a grocery list:

 Memory size: **4096** positions of **1 byte** each
 Font memory needed: **512** positions of **1 byte** each (the lower 512 bytes, 0x000-0x1FF, that the original interpreter used to occupy)

Also take notice that it mentions programs usually start at 0x200, but how can we access a given memory address freely? Well, we need some kind of COUNTER for the PROGRAM... some may say... a PROGRAM COUNTER, or PC for short.

So how could we implement that, programming-wise? Well, first, don't forget to include those precompiled headers :P

Let's start by laying the groundwork.

We need a structure that can hold 4096 positions of 1 byte each. For each position we could use an unsigned char, but what about the positions themselves? Well, if you use an array you can lay out as many positions as you want:

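unsigned char memory[4096];     // 4096 positions of 1 byte each (addresses 0x000-0xFFF)
unsigned char fontMemory[512];  // room for font data (many interpreters keep fonts inside memory[0x000-0x1FF] instead)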

What about the PC (program counter, in case you already forgot)? It has to be able to count through all 4096 positions in memory (addresses 0 to 4095, or 0xFFF). So? I take it you are stuck... Let's pop open the Windows Calculator once more.

Go to DEC mode and write 4096, then take a look at the binary representation below the number... Yep:

 0001 0000 0000 0000

Okay, so using an unsigned char is out of the question, as it only holds 0-255 (1 byte in length). But what about a 2-byte alternative? Yep, we should use

unsigned short PC = 0x200;  // 16 bits: enough to point at any of the 4096 positions; programs start at 0x200

And there you go: we now have a memory container, a font memory container and a way of pointing at the part of memory we want to access. I know what you are thinking... Wasn't this supposed to be hard? Haha, it's not if you go step by step and use a logical approach.

Let's see if you can do the registers :D I'll see you in the next part of this beautifully crafted (but poorly written) Chip8 tutorial. Hopefully I sparked your curiosity enough that you can work on your own, deciding which variables to use and the logic behind your choices.

]-[ D x

PS: Almost forgot... HEXADECIMAL digits go from 0 to F, like this: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F wink wink xD (you may need this for the registers)


Registers: CHIP-8 has 16 8-bit data registers named V0 to VF. The VF register doubles as a flag for some instructions, so programs should avoid using it. In addition operations, VF is the carry flag, while in subtraction it is the "not borrow" flag. In the draw instruction, VF is set upon pixel collision.

The address register, which is named I, is 16 bits wide and is used with several opcodes that involve memory operations.

The stack: The stack is only used to store return addresses when subroutines are called. The original 1802 version allocated 48 bytes for up to 12 levels of nesting; modern implementations normally have at least 16 levels.