I plan to follow Maxim's tutorial. It's going to be a messy stream-of-consciousness type of writing, documenting my progress, pain points and victories alike.
I downloaded latest source code for WLA-DX.
After reviewing it, I saw that it uses CMake.
I downloaded current Windows win64-x64 Installer and installed CMake.
In CMake GUI, I pointed to
wla-dx folder for source and
wla-dx\binaries for build files.
I generated Visual Studio 14 (2015) x86 solution, which compiled fine.
I dropped release build of executables in a folder that's listed in PATH variable.
I downloaded last official release: Meka 0.73 from 2010.
Meka Windows build doesn't work, it is missing MSVCR71.dll.
71 is a reference to Visual C++ 7.1, which is part of Visual Studio .NET 2003.
I hope MSVCR71.dll is part of .NET Framework 1.1 redistributable.
Downloaded it, got dotnetfx.exe file (24,265,736 bytes).
I don't think you can install this ancient software on Windows 10 x64.
I ran it with question mark parameter:
This popped a dialog, telling me I can use
/Q for quiet mode,
/T:full-path to use explicit temporary folder,
/C to extract only and
to override default install command.
dotnetfx.exe /T:%USERPROFILE%\Desktop\dotnet11 /C.
This created 5 files in
dotnet11 folder on my desktop:
InstMsi.exe, InstMsiW.exe, install.exe, netfx.msi and netfx1.cab.
Next, I ran admin install on msi file to extract compressed files within:
msiexec /A netfx.msi TARGETDIR=%USERPROFILE%\Desktop\dotnet11-admin /Qb.
I then found MSVCR71.dll in
After copying MSVCR71.dll to folder with mekaw.exe, it launched.
On first launch Meka prompted me for initial setup:
I clicked on checkbox to enable debugger and left other options as-is. Meka launched, and to my mild surprise, it turned out to be a full screen program.
Building Meka from source on Windows
There is a more recent version on Omar Cornut's Meka GitHub project.
For whatever reason, it's enormous: ~70 MB (◔_◔).
Cloning it in GitHub desktop client takes forever.
Ok, cloning is done, and it contains huge history: ~50 MB in
It also contains a bunch of large and totally unnecessary binary files:
|binary file||size in bytes|
This could probably be fixed by carefully editing .git history. Maybe I should try it, just for fun.
I only have access to Visual Studio 2015 at this time.
Visual Studio did some conversion activity and opened solution successfully.
First attempt at compiling failed due to missing Allegro library files:
This fixed first build issue.
Second build issue is missing
I tried to just kludge through by running bundled
This didn't work, throwing tons of
hq2x16.asm:NNN: error: binary output format does not support external references,
where NNN are line numbers.
Looking at makefile rule for asm files shows that they are built with this command:
$(ASM) $(DEF_OS) -f $(OTYPE) $< -o $@, where
ASM = nasm,
Brief web search revealed that
$< stands for source file and
$@ for generated file.
I ran the following command:
meka\meka\libs\nasm\nasmw.exe -DARCH_WIN32 -f win32 meka\meka\srcs\hq2x16.asm -o meka\meka\srcs\hq2x16.obj.
It worked, producing hq2x16.obj. I tried to plant hq2x16.obj in Release folder,
but the Visual Studio solution rebuild still failed the same way.
I think Visual Studio removes all files in build folders as part of build.
I searched in Visual Studio project files for hq2x16 and found a reference in
Here is the custom build rule command-line:
CommandLine=""..\..\..\libs\nasm\nasmw.exe" -O1 -DWIN32 -f win32 "$(InputPath)" -o "$(OutDir)/$(InputName).obj"
I found hq2x16.asm file in Visual Studio Solution Explorer:
I right-clicked and selected Properties, this opened custom build dialog:
I modified custom build rule from:
"..\..\..\libs\nasm\nasmw.exe" -O1 -DWIN32 -f win32 "%(FullPath)" -o "$(OutDir)%(Filename).obj" to
notepad.exe %(FullPath), hoping that this will open notepad with asm file in it.
This didn't happen on solution rebuild, but it does happen if I right-click on hq2x16.asm in Solution Explorer and choose Compile.
I replaced nasm executable with my troubleshooting program printargs
and got this on right-click Compile:
I restored proper nasm executable and ran right-click Compile, it produced
This means the custom build rule is fine as is, but for some reason the solution and project don't attempt to build hq2x16.asm.
Original project file in github master is from Visual Studio 2008. I wonder if conversion to 2015 breaks it somehow. A good test would be to install Visual Studio 2008 on my Windows XP laptop and try to build the solution.
I'll pause the build-from-source effort for now. I may attempt to create a fresh project and solution from scratch later. Even better would be to generate the project and solution using CMake.
Tutorial points to Charles MacDonald's web site that used to be at cgfm2.emuviews.com. At some point in early 2015 it moved to dreamjam.co.uk/emuviews/. I downloaded msvdp.txt, sc3000h.txt, smsmap.txt and smstech.txt. Sadly links to zip files with sample code are broken and I can't find any mirrors.
Ok, now I'm on the next page of Maxim's tutorial. He's talking about his setup with a text editor called ConTEXT. I don't really want to use yet another editor. My personal preference is the Far Manager 3 built-in editor. I did take his Compile.bat file for reference, originally from his WLA-DX binary build.
Maxim provides a "Hello, World!" program source code.
I renamed "Hello World.asm" to ex1.asm.
I then tried to use Compile.bat as-is, but ran into issues.
The biggest is that wlalink.exe v5.8b doesn't understand combined options as in:
wlalink.exe -drvs linkfile.tmp ex1.sms, which is how Maxim's Compile.bat calls it.
It's fine if you specify options separately:
wlalink.exe -d -r -v -s linkfile.tmp ex1.sms
I modified Compile.bat and renamed it to build.bat,
because that is what I normally call simple build scripts on Windows.
I added call to mekaw.exe on successful build.
Example worked fine, thank you Maxim!
Meka is a very unusual Windows application, owing to its use of the Allegro 5.1 game library.
I discovered that ESC switches between full-screen and windowed mode.
Pressing Alt-X quits Meka, just like in 1987-1992 Borland IDEs!
To add a tool window, you have to choose it from top menu Tools.
To hide a tool window, you have to choose it from the menu again!
Clicking on the little star in the top-right corner doesn't remove the window, even though I expected it would.
After some mucking around, I modified mekaw.cfg:
start_in_gui = 1 and
bios_logo = 1 to
start_in_gui = 0 and
bios_logo = 0.
gui_video_mode = 640x480 to
gui_video_mode = 1920x1200 (my display's native resolution).
It made it a bit faster to load, which is great if you constantly launch it as part of a build script.
There is one very annoying feature that I didn't find a way to fix easily: if I start in windowed mode and then press ESC to switch to full-screen mode, Meka plays a several-second-long animation of menu items sliding from right to left. I really wish there was an easy-to-use setting to disable that.
At this point, I assembled and ran an assembly program for Sega Master System! Now, onto the next page of Maxim's tutorial.
WLA-DX takes both traditional assembler
; single-line and C-style
/* */ multi-line comments:
; this is a single-line comment
ld a,0 ; another single-line comment, this time after assembly instruction
/* Hello there!
This is a C-style multi-line comment.
Thank you for your time. Goodbye! */
WLA-DX understands binary numbers. For example, 21 is 16+4+1, so it can be written as %00010101.
ld a,21 and
ld a,%00010101 are equivalent.
I have to check if it supports bitwise representation for 16-bit values.
Can I write
ld bc,%0000000111110100 instead of
WLA-DX will obviously work with hex numbers. Canonical form is with
ld bc,500 is equivalent to
Supposedly WLA-DX will also take hexadecimal numbers in several other syntactic conventions:
$01f4 is the same as
&1f4 (huh?) and
1f4h (tasm/masm style).
Surprisingly, there is no octal support as in C, not that I've seen anyone using it.
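A quick way to sanity-check these literals outside the assembler is a few lines of Python. This is only a sketch: parse_literal is a made-up helper, not anything WLA-DX provides, and it only covers the % binary, $ hex, trailing-h hex and plain decimal notations (the & prefix is left out on purpose).

```python
def parse_literal(text):
    """Convert an assembler-style numeric literal to an int.

    Hypothetical helper for checking values by hand; it only knows
    %binary, $hex, trailing-h hex and plain decimal.
    """
    text = text.strip().lower()
    if text.startswith("%"):
        return int(text[1:], 2)
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


print(parse_literal("%00010101"))          # 21
print(parse_literal("%0000000111110100"))  # 500
print(parse_literal("$01f4"))              # 500
print(parse_literal("1f4h"))               # 500
```

So the 16-bit binary literal, the hex forms and decimal 500 all describe the same value; whether the assembler accepts each spelling still has to be checked in WLA-DX itself.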
Here are some of WLA-DX supported directives:
||same as offset, but at page border?|
||still no idea...|
||something about ram organization and debug symbols|
||unsigned byte, i.e.
||signed byte, i.e.
||unsigned word, i.e.
||signed word, i.e.
||probably takes argument and stores its sin(arg)|
||see above, but cos(arg)|
||define byte with different content on each build?|
||see above, but word|
||include content from binary file as-is|
SMS specific directive
.sdsctag directive generates valid SMS ROM header that will pass SMS BIOS check.
This is also useful for ROM management tools that need identification data.
Conditional assembly directives
.if is apparently similar to C preprocessor
WLA-DX supports macros, more on this later...
Now onto the next page of Maxim's tutorial!
A label is just a pretty name of a specific memory address. Syntax-wise it is a word, followed by colon:
ld a,trigger1 ; load address where we set a byte with value 123
trigger1:
db 123
One WLA-DX specific syntax is:
ld a,:trigger1 ; this will load bank number for the label instead of its address
Another WLA-DX specific label syntax:
-:
ld a,(hl)
; some code here
jp -
-: is an anonymous label for jumping backwards.
There is also
+: for jumping forward.
Or you can use labels starting with two underscores and a letter to jump back and forth:
The purpose of anonymous labels is that the programmer doesn't need to come up with a unique name for each.
Assembler translating a jump to anonymous label will look for the nearest match.
Labels are replaced by specific numeric memory addresses at linking time.
To the next page!
Ok, I'm tired. I'll stop here for today! (9 hours later) I'm back for the second day of this journey!
Read-only memory, usually on game cartridge. This memory is exposed to CPU when it's switched in one of the memory pages, I think. We can use it to read data or execute code, but we can't modify data in it.
Random-access memory. Starts either zeroed or undefined. Our ROM code could fill it with code and/or data and then even jump execution to it. All bytes in RAM are modifiable, both code and data. Access time is typically faster than ROM. I do know that SRAM is not normal RAM. It's battery-backed memory on some cartridges, used to store save data between play sessions. SEGA Master System has 8KB of RAM.
According to Maxim this is when you connect a smaller ROM to a memory slot. If chip size is 2K, at 2K+1 you'll see a first ROM byte again etc.
SMS and GG memory is mirrored. There is 8KB of RAM in a 16KB memory slot.
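Mirroring is easy to model as a modulo operation. The sketch below assumes the RAM slot starts at $c000 and runs to $ffff (that start address is my assumption here, it isn't stated above), and ram_offset is a made-up helper name.

```python
RAM_START = 0xC000   # assumed start of the 16KB RAM slot
RAM_SIZE = 0x2000    # 8KB of physical RAM


def ram_offset(address):
    """Map a CPU address in $c000-$ffff to an offset inside the 8KB RAM chip."""
    if not 0xC000 <= address <= 0xFFFF:
        raise ValueError("address is outside the RAM slot")
    # The chip is smaller than the slot, so addresses wrap around.
    return (address - RAM_START) % RAM_SIZE


print(hex(ram_offset(0xC000)))  # 0x0
print(hex(ram_offset(0xE000)))  # 0x0
print(hex(ram_offset(0xDFF0)))  # 0x1ff0
print(hex(ram_offset(0xFFF0)))  # 0x1ff0
```

Under that assumption $c000 and $e000 land on the same physical byte, which is all that "mirrored" means.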
SMS memory map
Z80 can address 64KB. On SMS, this memory is mapped as follows:
|$0000-$03ff||ROM (system rom?)||1KB|
|$0400-$3fff||ROM slot 1||15KB|
|$4000-$7fff||ROM slot 2||16KB|
|$8000-$bfff||ROM slot 3||16KB|
Larger games could swap content of each slot to a different bank in ROM. Small games will typically load up in slot 1 or slots 1 and 2. To the next page!
Z80 has 8 and 16-bit general purpose and specialized registers.
|a||accumulator, used for arithmetic operations|
|f||flag bits for previous opcode|
|b, c, d, e, h, l||general purpose 8-bit registers|
|bc, de, hl||general purpose 16-bit registers, built as pairs|
|ix, iy||index registers for memory access|
|i||interrupt page address|
|a'-f', h', l'||Z80 has a second set of registers, swap all at once|
Lower and higher parts of ix/iy can be used as individual 8-bit registers. Those are not officially defined, but are often called ixh/ixl and iyh/iyl.
Z80 provides 256 potential ports. SMS hardware supports only 8, but those are mirrored, i.e. the same port can be reached through several different port numbers.
|$3f||i/o port control||i/o port control|
|$dc||controller 1||controller 1|
|$dd||controller 2||controller 2|
Bold means often used. To the next page!
Simple program walkthrough
Our program doesn't use much memory, so we won't need to swap slots at run-time. We do still need to do a basic memory setup.
;==============================================================
; WLA-DX banking setup
;==============================================================
.memorymap
defaultslot 0
slotsize $8000
slot 0 $0000
.endme

.rombankmap
bankstotal 1
banksize $8000
banks 1
.endro
It looks like we tell WLA-DX that we plan to use 32KB in slot 0 ($0000-$7fff).
I'm a bit confused about this. $8000 is 32KB. Does it mean we can use either 3 16KB slots or 1 32KB slot? And how does the setup look in actual Z80 opcodes? Is there a good Z80 disassembler that is SMS-aware?
;==============================================================
; SDSC tag and SMS rom header
;==============================================================
.sdsctag 1.10,"Hello World!","SMS programming tutorial program (bugfixed)","Maxim"
So this is apparently a result of cooperation between Ville Helin, WLA-DX author and Maxim from SMS Power. It does generate a valid ROM header, but also embeds extra bits about author, program name, version and build date that is specific to homebrew. The only program that I am aware of that can read this extended identification is Maxim's Header Reader.
Program memory setup expectations
.bank 0 slot 0
.org $0000
This tells the assembler that the code that follows goes into bank 0 (the only one in our case) and is expected to sit in the first slot (again, the only one in our case). Next page.
;==============================================================
; Boot section
;==============================================================
    di              ; disable interrupts
    im 1            ; Interrupt mode 1
    jp main         ; jump to main program
This is a standard SMS program boot process.
di disables interrupts until we have interrupt handlers setup.
im 1 means interrupt mode 1. Just roll with it, SMS programs never use
jp main jumps to code that follows
Apparently SMS expects a gap between $0000 and normal code.
Probably something to do with interrupt handlers.
Pause button handler
.org $0066
;==============================================================
; Pause button handler
;==============================================================
    ; Do nothing
    retn
Maxim explains how
di can't turn NMI (non-maskable interrupts) off,
and how pressing the pause button is the only NMI on SMS.
This NMI will stop code execution and jump to $0066.
Whatever code is at $0066 will execute, ending with retn,
at which point normal program will resume.
At minimum we should put retn at $0066, so pressing Pause won't do anything.
That is exactly what we did in code above.
;==============================================================
; Main program
;==============================================================
main:
    ld sp, $dff0
Set up stack by pointing
sp register to a location 16 bytes below the end of RAM.
I think the stack grows downward in memory.
Also $dff1-$dfff is probably reserved for something else.
Set up VDP registers
;==============================================================
; Set up VDP registers
;==============================================================
    ld hl,VdpData
    ld b,VdpDataEnd-VdpData
    ld c,$bf
    otir
; VDP initialisation data
VdpData:
.db $04,$80,$00,$81,$ff,$82,$ff,$85,$ff,$86,$ff,$87,$00,$88,$00,$89,$ff,$8a
VdpDataEnd:
Here we use otir opcode to send some data (VdpDataEnd-VdpData bytes) starting at VdpData address to port $bf.
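The VdpData bytes are easier to read once they are split into pairs. The Python sketch below assumes each register write is two bytes, the value first and then $80 ORed with the register number (which matches how register 1 is written when the screen is turned on later). decode_register_writes is a made-up helper, not part of any tool.

```python
VDP_DATA = bytes([0x04, 0x80, 0x00, 0x81, 0xFF, 0x82, 0xFF, 0x85, 0xFF, 0x86,
                  0xFF, 0x87, 0x00, 0x88, 0x00, 0x89, 0xFF, 0x8A])


def decode_register_writes(data):
    """Split the byte stream into (register number, value) pairs.

    Assumes each write is two bytes: the value, then $80 OR register number.
    """
    writes = []
    for value, command in zip(data[0::2], data[1::2]):
        if command & 0xF0 != 0x80:
            raise ValueError("not a register write: %02x" % command)
        writes.append((command & 0x0F, value))
    return writes


for register, value in decode_register_writes(VDP_DATA):
    print("register %2d = $%02x" % (register, value))
```

Read this way, the 18 bytes are 9 register writes covering registers 0-2 and 5-10.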
VRAM is attached to VDP and is accessible via VDP ports. At start it normally contains SEGA logo from system bios. Following code will clear it out:
;==============================================================
; Clear VRAM
;==============================================================
    ; 1. Set VRAM write address to 0 by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
    ; 2. Output 16KB of zeroes
    ld bc, $4000    ; Counter for 16KB of VRAM
    ClearVRAMLoop:
        ld a,$00    ; Value to write
        out ($be),a ; Output to VRAM address, which is auto-incremented after each write
        dec bc
        ld a,b
        or c
        jp nz,ClearVRAMLoop
No idea why Maxim indented the loop body. Assembler is not C or Python. I'd format all opcodes in a single column.
We output $00 and then $40 to VDP control port $bf. The VDP expects the least significant byte of a word first, so to send $4000 we send $00 and then $40. I assume that tells the VDP we're about to send some data to its data port, and that it should write that data to its attached VRAM. I should read the msvdp.txt document to understand how the VDP works.
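The same "OR a command into the address, send low byte then high byte" pattern shows up again for the palette ($c000) and the tilemap ($4000 with address $3800), so it is worth sketching once. control_bytes is a made-up helper name; the command values are the ones used in the tutorial code.

```python
def control_bytes(address, command=0x4000):
    """Return the two bytes to send to the control port, low byte first.

    command is ORed into the address: $4000 for a VRAM write,
    $c000 for a CRAM (palette) write, as in the tutorial code.
    """
    word = command | address
    return word & 0xFF, (word >> 8) & 0xFF


print([hex(b) for b in control_bytes(0x0000)])          # ['0x0', '0x40']
print([hex(b) for b in control_bytes(0x3800)])          # ['0x0', '0x78']
print([hex(b) for b in control_bytes(0x0000, 0xC000)])  # ['0x0', '0xc0']
```

The $78 in the second line is the same value the tilemap code writes as $38|$40.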
We then use a loop to clear out 16KB of VRAM. I assume here that parts that won't show on screen or will contain our own data we'll load up later don't need to be zeroed out.
Another thought: instead of zeroing out all VRAM, we can fill one page with a starter screen and switch to that VRAM page to show it.
Loop is built using conditional jump instruction
jp nz, label.
f for z flag, set by opcode executed just prior:
or c does bitwise
a OR c, where
a is loaded up with current value of
b (ld a,b).
or c will set the z flag only if both
b and c are zero, so only if
bc is zero.
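The reason for the ld a,b / or c dance is that a 16-bit dec on the Z80 does not update the flags, so the zero test has to be done by hand. Here is a small Python simulation of the loop counter, just a sketch; count_iterations is a made-up name.

```python
def count_iterations(bc):
    """Simulate: <loop body> / dec bc / ld a,b / or c / jp nz,loop."""
    iterations = 0
    while True:
        iterations += 1            # the loop body runs once per pass
        bc = (bc - 1) & 0xFFFF     # dec bc wraps around at 16 bits
        b, c = bc >> 8, bc & 0xFF
        a = b | c                  # ld a,b then or c
        zero_flag = (a == 0)       # or sets z when the result is zero
        if zero_flag:              # jp nz falls through once z is set
            return iterations


print(count_iterations(0x4000))  # 16384
print(count_iterations(0x0000))  # 65536
```

Starting from $4000 the body runs exactly 16384 times, one per byte of VRAM. Starting from 0 it would run 65536 times, because the counter is decremented before it is tested.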
Yay! Our first loop! A venerable workhorse of any program!
Maxim also shows all
f flags that can be used in a conditional jump opcode.
Setting up graphics
VDP supports palette (also CMAP for color map), tiles, tilemap and sprite table. Background is made using palette, tiles and tilemap. Sprites are made of palette, tiles and sprite table.
Our program only sets the background. Any changes to VRAM are done via VDP ports.
To set up palette, we have to write to CRAM. To write to byte N of CRAM, we have to OR it with $c000 and then output the value to VDP control port $bf. Low then high byte:
;==============================================================
; Load palette
;==============================================================
    ; 1. Set VRAM write address to CRAM (palette) address 0 (for palette index 0)
    ; by outputting $c000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$c0
    out ($bf),a
PaletteData:
.db $00,$3f ; Black, white
PaletteDataEnd:
We can use
otir again to send bytes to CMAP:
    ; 2. Output colour data
    ld hl,PaletteData
    ld b,(PaletteDataEnd-PaletteData)
    ld c,$be
    otir
This only works because we send less than 256 values.
If it was more, we'd have to use conditional jump loop and
There are 16 colors we can set up at once, we only set up 2 right now.
Maxim goes into more details in his palette sidebar page.
It's some riveting stuff! CRAM is 32 bytes long, 16 for background and 16 for sprites. Each byte holds 2 bits per color channel; bitwise it looks like %xxbbggrr, where the 2 top bits are ignored. Palette index 0 is the transparent color in sprites. In the background it's drawn behind sprites, even if the rest of the background is on top of sprites.
Game Gear has twice the CRAM, and each palette element is 2 bytes with 4 bits per color: %xxxxbbbbggggrrrr, which allows 2^12 = 4096 colors.
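Both bit layouts are simple enough to pack by hand. The Python sketch below builds palette values from separate red, green and blue components; sms_color and gg_color are made-up helper names.

```python
def sms_color(red, green, blue):
    """Pack 2-bit channels (0-3 each) into an SMS palette byte %xxbbggrr."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 3:
            raise ValueError("SMS channels are 2 bits wide (0-3)")
    return (blue << 4) | (green << 2) | red


def gg_color(red, green, blue):
    """Pack 4-bit channels (0-15 each) into a Game Gear word %xxxxbbbbggggrrrr."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 15:
            raise ValueError("Game Gear channels are 4 bits wide (0-15)")
    return (blue << 8) | (green << 4) | red


print(hex(sms_color(0, 0, 0)))    # 0x0
print(hex(sms_color(3, 3, 3)))    # 0x3f
print(hex(gg_color(15, 15, 15)))  # 0xfff
```

The first two results are exactly the $00 (black) and $3f (white) bytes in PaletteData.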
Ok back to main tutorial!
Tiles are loaded to VRAM at $0000. Each tile is 8x8 pixels. Each pixel takes up 4 bits. Value of those 4 bits is an index of color we want in CRAM.
;==============================================================
; Load tiles (font)
;==============================================================
    ; 1. Set VRAM write address to tile index 0
    ; by outputting $4000 ORed with $0000
    ld a,$00
    out ($bf),a
    ld a,$40
    out ($bf),a
Then we get a loop to copy data from ROM to VRAM:
    ; 2. Output tile data
    ld hl,FontData              ; Location of tile data
    ld bc,FontDataEnd-FontData  ; Counter for number of bytes to write
    WriteTilesLoop:
        ld a,(hl)        ; Get data byte
        out ($be),a      ; Output it
        inc hl           ; Add one to hl so it points to the next data byte
        dec bc           ; Decrement the counter and repeat until it's zero
        ld a,b
        or c
        jp nz,WriteTilesLoop
This loop introduces indirect addressing:
ld a,(hl) will load a byte from memory at address stored in
Maxim goes into more details in his tiles sidebar page.
I tried to read it; Maxim calls the format planar. I think it's kind of like the old DOS VGA mode-x with separate bit planes. I can't understand it right now, but I'll experiment with it soon and hopefully get it then. Once I get it, I'll draw my own picture showing which bits go where. Somehow the first 4 bytes of a tile represent the first 8 pixels.
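For what it's worth, here is how the planar layout is usually described, written as a Python sketch. The assumptions are mine and not taken from the text above: each row of 8 pixels takes 4 bytes, one per bitplane, the first byte supplies bit 0 of every pixel's palette index, and the leftmost pixel sits in bit 7. decode_row is a made-up helper name.

```python
def decode_row(row_bytes):
    """Turn 4 bitplane bytes into 8 palette indexes (left to right)."""
    if len(row_bytes) != 4:
        raise ValueError("a tile row is 4 bytes, one per bitplane")
    pixels = []
    for x in range(8):
        shift = 7 - x                      # leftmost pixel lives in bit 7
        index = 0
        for plane, byte in enumerate(row_bytes):
            # Each plane contributes one bit of the 4-bit palette index.
            index |= ((byte >> shift) & 1) << plane
        pixels.append(index)
    return pixels


print(decode_row(bytes([0xFF, 0x00, 0x00, 0x00])))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(decode_row(bytes([0x80, 0x80, 0x00, 0x00])))  # [3, 0, 0, 0, 0, 0, 0, 0]
```

Under those assumptions, 4 bytes cover one 8-pixel row, and 8 rows make the 32 bytes of a tile.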
Maxim recommends to use tools to convert graphics to tile data.
He mentions that up to 14K of the 16K VRAM can be used for tiles. At 32 bytes per tile (8x8 pixels at 4 bits each), that gives us up to 448 tiles. Maxim recommends using compression when storing tile data in ROM. Game Gear tiles work the same way.
Ok back to main tutorial!
Tilemap tells VDP which tile goes where on video page. There are also some extra options for how each tile is displayed. Tilemap lives at $3800 in VRAM.
Here is how we prepare for tilemap setup loop:
;==============================================================
; Write text to name table
;==============================================================
    ; 1. Set VRAM write address to tilemap index 0
    ; by outputting $4000 ORed with $3800+0
    ld a,$00
    out ($bf),a
    ld a,$38|$40
    out ($bf),a
Copy tilemap data to VRAM:
    ; 2. Output tilemap data
    ld hl,Message
    ld bc,MessageEnd-Message  ; Counter for number of bytes to write
    WriteTextLoop:
        ld a,(hl)    ; Get data byte
        out ($be),a
        inc hl       ; Point to next letter
        dec bc
        ld a,b
        or c
        jp nz,WriteTextLoop
And data is defined like this (I assume each 16-bit word corresponds to a letter in "Hello World!"):
Message:
.dw $28,$45,$4c,$4c,$4f,$00,$37,$4f,$52,$4c,$44,$01
MessageEnd:
Maxim goes into more details in his tilemap sidebar page.
Screen is composed from 32x28 tiles (896 in total). For each tile, tilemap uses two bytes: %xxxpcvhnnnnnnnnn:
- x are unused bits;
- p is a priority bit, I assume if it's set, tile is drawn on top of sprites;
- c is for first or second palette;
- v is a vertical flip bit;
- h is a horizontal flip bit;
- nnnnnnnnn is the index of tile: it can have value 0-511, but maximum tiles VRAM can hold is 448.
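The bit layout above translates directly into masks and shifts. A Python sketch that unpacks one 16-bit tilemap entry (decode_tilemap_entry is a made-up helper name, and the dictionary keys are mine):

```python
def decode_tilemap_entry(word):
    """Unpack a 16-bit tilemap entry laid out as %xxxpcvhnnnnnnnnn."""
    return {
        "tile": word & 0x01FF,            # nnnnnnnnn: tile index, 0-511
        "hflip": bool(word & 0x0200),     # h: horizontal flip
        "vflip": bool(word & 0x0400),     # v: vertical flip
        "palette": (word >> 11) & 1,      # c: first (0) or second (1) palette
        "priority": bool(word & 0x1000),  # p: priority bit
    }


print(decode_tilemap_entry(0x0028))
print(decode_tilemap_entry(0x1201))
```

The first call is the first word of Message ($28) and decodes to plain tile 40 with no flags set. The second is an invented value that decodes to tile 1 with the priority and horizontal flip bits set.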
Turn on the screen
Screen was off since program set up VDP registers. We should switch it on. To do this, we should set bit 6 of VDP register 1:
    ; Turn screen on
    ld a,%01000000
;          ||||||`- Zoomed sprites -> 16x16 pixels
;          |||||`-- Doubled sprites -> 2 tiles per sprite, 8x16
;          ||||`--- Mega Drive mode 5 enable
;          |||`---- 30 row/240 line mode
;          ||`----- 28 row/224 line mode
;          |`------ VBlank interrupts
;          `------- Enable display
    out ($bf),a
    ld a,$81
    out ($bf),a
We then halt the program by having it run in infinite loop:
    ; Infinite loop to stop program
Loop:
    jp Loop
Maxim proposes to use .define directive to give mnemonic names to various numbers.
This is a lot like
equ in classic tasm/masm.
He then suggests replacing repeated code with functions, leveraging
Those use stack. Maxim goes on a bit about how stack works, which I think I understand.
Maxim suggests we start using relative jump instruction
It's smaller, but needs address calculation.
Maxim recommends using it for loops.
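The "address calculation" for a relative jump is small enough to sketch in Python. On the Z80 the relative jump (jr) is 2 bytes long, and its second byte is a signed displacement counted from the address of the next instruction. The assembler normally does this for you; jr_offset is a made-up helper that only shows the arithmetic.

```python
def jr_offset(instruction_address, target_address):
    """Return the displacement byte for a 2-byte relative jump."""
    # The displacement is relative to the instruction after the jump.
    displacement = target_address - (instruction_address + 2)
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a relative jump")
    return displacement & 0xFF  # two's complement byte


print(hex(jr_offset(0x0100, 0x0100)))  # 0xfe
print(hex(jr_offset(0x0100, 0x0110)))  # 0xe
```

The range check is the practical catch: a relative jump only reaches about 128 bytes in either direction, which is fine for tight loops but not for jumping across a whole program.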
Taking a break here. Up next "Convert ASCII text to tile numbers".