Skip to content

Commit

Permalink
[Readme] Remove the porting plans
Browse files Browse the repository at this point in the history
Well, that might have been my entire sales pitch for the decompilation,
and now I've ruined everything… Still, I've been getting way too many
confused questions about this by now. Let's focus on reconstructing,
and once that's done, ReC98 is done, and another project can take over.
Maybe done by me and mostly following the original plans, but probably
not.

Thanks to -Tom- for finally pushing me to do this 😛
  • Loading branch information
nmlgc committed Nov 12, 2019
1 parent a2e8866 commit 1cb9731
Showing 1 changed file with 10 additions and 48 deletions.
58 changes: 10 additions & 48 deletions README.md
Expand Up @@ -15,7 +15,7 @@

### Overview

This project aims to rebuild the first five games of the [Touhou Project](http://en.wikipedia.org/wiki/Touhou_Project) series by *ZUN Soft* (now *Team Shanghai Alice*), which were originally released exclusively for the NEC PC-9801 system, into a cross-platform, open and truly moddable form.
This project aims to perfectly reconstruct the source code of the first five [Touhou Project](http://en.wikipedia.org/wiki/Touhou_Project) games by *ZUN Soft* (now *Team Shanghai Alice*), which were originally released exclusively for the NEC PC-9801 system.

The original games in question are:

Expand All @@ -25,23 +25,21 @@ The original games in question are:
* TH04: 東方幻想郷 ~ Lotus Land Story (1998)
* TH05: 東方怪綺談 ~ Mystic Square (1998)

After completely decompiling the games first, we can then build an efficient environment for modifications on top. This project could then easily achieve what [thcrap](http://thcrap.nmlgc.net) has always wanted to do but struggled to realize, due to its nature of only being a patching framework targeted at 32-bit Windows binaries.
Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, and which comments the original code was surrounded with. *Perfect* therefore means that the binaries compiled from the code in the ReC98 repository are [indistinguishable](https://giithub.com/nmlgc/mzdiff) from ZUN's original builds, making it impossible to *disprove* that the original code *couldn't* have looked like this. This property is maintained for every Git commit along the way.

This approach is different from a project like [PyTouhou](http://pytouhou.linkmauve.fr/), a free/libre black-box reimplementation of the engine of TH06, [東方紅魔郷 ~ the Embodiment of Scarlet Devil](http://en.wikipedia.org/wiki/The_Embodiment_of_Scarlet_Devil). There are a number reasons why decompilation seems to be more worthwhile for the PC-98 games:
Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community. This is also why ReC98 values *readable and understandable* code over a pure decompilation.

#### Why?
There are a number reasons why achieving moddability via full decompilation seems to be more worthwhile for the PC-98 games, in contrast to a [PyTouhou](http://pytouhou.linkmauve.fr/)-style black-box reimplementation:

* While stage enemies and their bullet patterns are controlled by bytecode in TH04's and TH05's .STD files that *could* just be interpreted by an alternate VM, midboss and boss battles are entirely hardcoded into the executables.
* Even though complete decompilation will take a long time, partial reverse-engineering results will be very useful to modders who just want to work on the original PC-98 versions of the games.
* PC-98 emulation is messy and overly complicated. It has been getting better as of 2018 thanks to [DOSBox-X](https://github.com/joncampbell123/dosbox-x) adding support for the platform, but even at its best, it will always consume way more system resources than what would be appropriate for those games.
* thcrap-style multilingual translation on PC-98 would be painful for languages with non-ASCII scripts. The obvious method of modifying the font ROM specifically for each language is ugly and won't work on real hardware, so a custom renderer would be needed. That by itself requires a lot of reverse-engineering and, preferably, compilable source code to avoid the limits of hex-editing. Or, even better, the prospect to do this entirely on a more modern system.
* These games stopped being sold in 2002, ZUN has confirmed on multiple occasions to have lost all the data of the "earlier games" <sup>[citation needed]</sup>, and PC-98 hardware is long obsolete. In short, these games are as abandoned as they can possibly be, and are unlikely to ever turn a profit again.

#### So is this about remakes?
Even once we get to porting the games to modern systems, our goal will still be quite different from the typical idea of [a from-scratch *remake* in a modern engine](https://www.youtube.com/watch?v=-W5YIY4UWsY), which typically involves little or no original data. ReC98 by itself will only ever be about providing *exact, credible, source ports* that fully replace the need for the proprietary, PC-98-exclusive original releases and their emulation for even the most conservative fan. Ultimately, this project should merely serve as the *foundation* for future remastered versions of these games, developed by third parties. And after all, preserving a reconstructed, readable version of the games' source code is undoubtedly more valuable than preserving a bunch of binaries.

To achieve this, ReC98 has been kept in an openly available Git repository from day one. Each commit represents an atomic step along the way. This makes it easy to prove the correctness of the reconstruction process, or to track down potential regressions.

#### Is this even viable?
It certainly *seems* to be. During the development of the static English patches for these games, we identified two main libraries used across all 5 games, and even found their source code. These are:
Definitely. During the development of the static English patches for these games, we identified two main libraries used across all 5 games, and even found their source code. These are:

* [master.lib](http://www.koizuka.jp/~koizuka/master.lib/), a 16-bit x86 assembly library providing an abstraction layer for all components of a PC-98 DOS system
* as well as the Borland C/C++ runtime library, version 4.0.
Expand All @@ -50,18 +48,11 @@ It certainly *seems* to be. During the development of the static English patches

master.lib and the C/C++ runtime alone make up a sizable amount of the code in all the executables. In TH05, for example, they amount to 74% of all code in `OP.EXE`, and 40% of all code in `MAIN.EXE`. That's already quite a lot of code we do not have to deal with. Identifying the rest of the code shared across the games will further reduce the workload to a more acceptable amount.

With DOSBox-X and [the Debug edition of Neko Project II](https://github.com/nmlgc/np2debug), we now also have two open-source PC-9821 emulators capable of running the games. This will greatly help in understanding and porting all hardware-specific code.

Still, it will no doubt take a long time until this project will have made any visible and useful progress. Any help will be appreciated! If you are interested, check [`CONTRIBUTING.md`](CONTRIBUTING.md) for the general contribution guidelines.
With DOSBox-X and [the Debug edition of Neko Project II](https://github.com/nmlgc/np2debug), we now also have two open-source PC-9821 emulators capable of running the games. This will greatly help in understanding all hardware-specific code.

## Roadmap

### Reconstruction phase
First, we have to accurately reconstruct understandable C/C++ source code for all the games from the original binaries, while still staying on the PC-98 platform. To ensure the correctness of the process, each commit during this phase should result in a build that ideally is byte-identical to the previous commit.

##### Step 1: Dumping
Based on the auto-analysis done by IDA, create assembly dumps for all necessary executables, and edit them until they successfully recompile back into working binaries equivalent to ZUN's builds. In total, there are 27 unique executables we need to dump:
And while this project has made decent progress so far, completing the decompilation of even just a single game will still take a long time. Any help will be appreciated! If you are interested, check [`CONTRIBUTING.md`](CONTRIBUTING.md) for the general contribution guidelines.

## Dumped executables
* TH01: `zunsoft.com`, `op.exe`, `reiiden.exe`, `fuuin.exe`
* TH02: zun.com (<s>`ongchk.com`</s>, `zuninit.com`, `zun_res.com`, <s>`zunsoft.com`</s>), `op.exe`, `main.exe`, `maine.exe`
* TH03: zun.com (<s>`ongchk.com`</s> [-1], <s>`zuninit.com`</s> [-2], <s>`zunsoft.com`</s> [-3], `zunsp.com` [-4], `res_yume.com` [-5]), `op.exe`, `main.exe`, `mainl.exe`
Expand All @@ -70,35 +61,6 @@ Based on the auto-analysis done by IDA, create assembly dumps for all necessary

Crossed-out files are identical to their version in the previous game. ONGCHK.COM is part of the PMD sound driver by KAJA, and therefore doesn't need to be disassembled either; we only need to keep the binary to allow bit-perfect rebuilds of ZUN.COM.

##### Step 2: Reduction (current)
Identify duplicated functions and either replace them with references to the original libraries or move them to separate include files. In the end, only ZUN's own code and data will remain in the dumps created in Step 1.

##### Step 3: Reverse-engineering
Translate the remaining assembly back into platform-independent C code. Some PC-98-specific in-line assembly inside ZUN's code will remain, which will have to be kept and moved to a separate library for now.

### Modernization phase
With the source code recreated, everything is possible! Let's turn them into the best games they can possibly be.

##### Step 4: Porting
* Refactor Neko Project II's graphics and sound emulation into a library we can hook up to our code
* Port all of master.lib and the shared hardware-specific library built in Step 3
* Rewrite all DOS API calls (`int 21h`) into platform-independent alternatives

##### Step 5: Sanitization
Get rid of any bad coding practices and the remaining hardware-specific formats, while breaking compatibility to both PC-98 hardware and the original data in the process. This includes, but won't be limited to:
* removing unnecessary encryption
* using text and custom fonts instead of graphics where possible
* standardizing graphics formats to PNG, replacing all the original ones (.grf, .cdg, .cd2, .bos, .mrs, .pi, .grp, as well as hardcoded bitmaps)
* merging the three executables per game into one
* changing gaiji characters into a semantically better representation (Unicode characters, alternate fonts, or simply normal graphics) on a case-by-case basis

##### Step 6: Moddability
Do whatever else is necessary to easily modify the game elements people like to modify. This may involve further changes to the formats used by the games, and the addition of one or more scripting languages (and subsequent porting of previously hardcoded functions). Mods will use the thcrap patch format, complete with support for patch stacking and dependencies, and will be easily selectable from a new menu added to every game.

A clear line will be drawn between the original content and fanmade modifications, which will only be available as optional packages. This also means that we won't ship the reconstructed games with any existing English translation patch. For the visible modifications that *will* be necessary (read: the "Mods" screen in the main menu), great care will be taken to keep them in the spirit of the original PC-98 games.

Since this will most likely result in graphics mods that exceed the specifications of PC-98 hardware, there will also be an optional filter to reduce the rendered output to the original resolution of 640x400 and a 16-color palette, for the sake of keeping the original spirit.

## Building
You will need:

Expand Down

0 comments on commit 1cb9731

Please sign in to comment.