i18n builds #626

Open
Rangi42 opened this issue Apr 14, 2019 · 38 comments

Comments

@Rangi42
Member

Rangi42 commented Apr 14, 2019

Add i18n builds, using a system like the one pokeruby has to keep different languages' data separate, so code files don't get cluttered with if/elses.

So far the only separate builds are USA/Europe Revision 1.1 and Australian (which just censors the Game Corner text from 1.1), but others are possible:

889a06fc0bb863666865aa69def0adf97945ac2a *pokecrystal-es.gbc
accb584293ba056152f1fd908439b019017ff2fe *pokecrystal-de.gbc
c055992b16b7399c687647725cdd1f4f13a2f75c *pokecrystal-fr.gbc
6cee05e5b95beeae74b8365ad18ec4a07a8c4af8 *pokecrystal-it.gbc

There's also 95127b901bbce2407daf43cce9f45d4c27ef635d *pokecrystal-jp.gbc, but that probably has enough different code to justify a separate project, like pokegold.
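
For anyone checking a translated build against the hashes above, here is a minimal Python sketch of the comparison. It is not the repo's actual make compare; the helper names sha1_of and rom_matches are hypothetical, and it assumes the built ROM uses one of the file names listed in this post.

```python
import hashlib
from pathlib import Path

# SHA-1 hashes quoted in this issue, keyed by ROM file name.
EXPECTED_SHA1 = {
    "pokecrystal-es.gbc": "889a06fc0bb863666865aa69def0adf97945ac2a",
    "pokecrystal-de.gbc": "accb584293ba056152f1fd908439b019017ff2fe",
    "pokecrystal-fr.gbc": "c055992b16b7399c687647725cdd1f4f13a2f75c",
    "pokecrystal-it.gbc": "6cee05e5b95beeae74b8365ad18ec4a07a8c4af8",
}


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def rom_matches(path):
    """True if the file's SHA-1 equals the expected hash for its file name."""
    expected = EXPECTED_SHA1.get(Path(path).name)
    return expected is not None and sha1_of(path) == expected
```

A file whose name is not in the table is treated as not matching, rather than raising an error.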

@aaaaaa123456789
Contributor

This is probably a hopeless endeavor.

When Ruby and friends were translated, they just had to replace strings. They relied on the compiler to place them in ROM, so only the strings themselves changed.

This is not the case for generations 1 and 2. Data was put into banks manually. Translations often caused banks to get full, leading to text (and maps, and everything) being moved around; that's the reason why text_far exists at all, for instance.

If nothing was moved across banks for the translations, this is possible, but I very much doubt it. If stuff was moved around, this will quickly turn into either an if jungle or a pile of ad hoc pseudo-metadata files just to handle the bank allocation.

@Rangi42
Member Author

Rangi42 commented Apr 14, 2019

Maybe so. I'm leaving it open just like #285 for now, as a maybe-impossible goal. Even just separate repos for each language would be better than nothing.

@mid-kid
Member

mid-kid commented Apr 14, 2019

I'll just leave my notes on this endeavour behind, here:

First of all, pokeruby's way of doing this is terribly wasteful. It uses rsync to copy the data_de directory on top of the regular directory. If we're going to do this, we should do the following:

  • Have the makefile pick up the dependencies for the proper language, for example through the VPATH variable, and make sure all generated files go to a proper subdirectory.
  • Make sure you can build all of the languages with a single make compare.
  • Abuse rgbasm's -i option to "overlay" included files. This should also be able to be used to include generated files for the proper language.

I'm not entirely too sure what I'd make the directory structure look like, but I think top-level i18n/<lang> directories would be the simplest and most effective.

This, however, poses a problem with how map scripts are laid out, which are a huge part of the translation. It'd be rather wasteful to have full copies of the map scripts where only the text changes, since that'd require propagating map script changes for 4 different languages, which is undesirable. However, having separate "_text.asm" files for each map sounds about as undesirable, so I'm not sure how to solve this. (I've been musing about a gettext-like system but it seems terribly impractical.)

As for what @aaaaaa123456789 mentions, this method would make it a non-issue. Text banks would be entirely overlaid, causing the exact position of each text to stop mattering, and so would files that include others (data/maps/scripts.asm and main.asm, for example) as well as linker scripts.
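
To make the "overlay" idea concrete, here is a rough Python sketch of the intended result: a per-language directory shadows files of the same relative path in the regular tree, and nothing is copied. This only models the outcome. It is not how rgbasm's -i option or make's VPATH are implemented, and the function name overlay_tree is hypothetical.

```python
from pathlib import Path


def overlay_tree(base_dir, overlay_dir):
    """Map each relative source path to the file that should be built.

    Files under overlay_dir shadow files with the same relative path
    under base_dir; nothing is copied on disk.
    """
    base_dir, overlay_dir = Path(base_dir), Path(overlay_dir)
    effective = {}
    for root in (base_dir, overlay_dir):  # overlay goes last, so it wins
        if not root.is_dir():
            continue
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            if root is base_dir and overlay_dir in path.parents:
                continue  # don't list the overlay's own files as base files
            effective[path.relative_to(root).as_posix()] = path
    return effective
```

With a base tree containing maps/AzaleaMart.asm and an overlay at i18n/de that also contains maps/AzaleaMart.asm, the returned mapping points that entry at the overlay copy and leaves every other entry pointing at the base tree.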

@iimarckus
Member

It would be easier to reason about what duplication will be necessary or avoidable if we had at least one other language disassembled. Hint, hint…

@iimarckus
Member

Can the current build infrastructure handle multiple include directories? Specifically, will makefile dependencies be generated correctly?

@mid-kid
Member

mid-kid commented Apr 14, 2019

No, it can't. scan_includes will need adaptation.
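
As a rough illustration of what that adaptation involves, here is a Python sketch of a dependency scanner that resolves each include against an ordered list of directories, first match wins. It is not the repo's scan_includes tool, and the regex is a simplification: it does not strip comments or handle every directive form.

```python
import re
from pathlib import Path

# Matches e.g.  INCLUDE "constants.asm"  or  Font: INCBIN "gfx/font.1bpp"
# Simplification: comments are not stripped before matching.
INCLUDE_RE = re.compile(r'\b(?:INCLUDE|INCBIN)\s+"([^"]+)"', re.IGNORECASE)


def find_file(name, include_dirs):
    """Return the first existing candidate for name, or None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def scan_includes(source, include_dirs, seen=None):
    """Recursively collect every file that source depends on."""
    seen = set() if seen is None else seen
    text = Path(source).read_text(encoding="utf-8", errors="replace")
    for name in INCLUDE_RE.findall(text):
        found = find_file(name, include_dirs)
        if found is None or found in seen:
            continue
        seen.add(found)
        if found.suffix in {".asm", ".inc"}:  # only text sources can nest includes
            scan_includes(found, include_dirs, seen)
    return seen
```

Passing the language directory before the regular one makes the generated dependency list point at the translated file whenever one exists.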

@Kroc

Kroc commented Apr 15, 2019

There are steps you could take to make i18n more plausible, without actually going ahead and implementing it. For example, adding run-time word-wrapping to the game, so that text lines do not have to be split into separate directives. Being able to equate one string in the source with one string in the ROM would help with a gettext-like system.

A possible low-tech solution is to replace strings in the source with constants, and use different include files to define the constants according to language. This way, maybe all strings for the game can be kept in a single file for each language.
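
A small Python model of that constants idea, purely for illustration: the key names (MENU_FIGHT and so on) and the function text_for are hypothetical, and the strings are the battle menu entries that appear in an example later in this thread.

```python
# One table per language; every key is a named string constant.
STRINGS = {
    "en": {"MENU_FIGHT": "FIGHT", "MENU_PACK": "PACK", "MENU_RUN": "RUN"},
    "de": {"MENU_FIGHT": "KMPF", "MENU_PACK": "BEUTEL", "MENU_RUN": "FLUCHT"},
}


def text_for(key, lang, fallback="en"):
    """Look up a string constant, falling back to English if untranslated."""
    table = STRINGS.get(lang, {})
    if key in table:
        return table[key]
    return STRINGS[fallback][key]
```

The fallback to English is an assumption of this sketch; a matching build would need every constant defined for every language.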

@aaaaaa123456789
Contributor

@Kroc Additional features don't go well with the idea of making a matching ROM.

@mid-kid
Member

mid-kid commented Apr 15, 2019

While, yes, having strings be macros or constants that get replaced for each different language would probably work, you have to keep in mind that text isn't the only thing that changes in the translated ROMs. Some code and graphics change as well, so we wouldn't be able to do much with just that.

It'd be the way I'd solve it, if I were the only person using this codebase, but the thing is that very few people are actually interested in working with translations, and any i18n changes should be burden-free for people who don't want them. Hence, adding yet another layer of indirection when defining strings sounds like a bad idea.

I'd rather have some kind of gettext-like system, or something that can not only overlay entire files but also just specific labels in a file. However, both of those solutions sound a bit too finicky and hard to get right.

@vulcandth
Collaborator

Unfortunately, these will probably be better off remaining as a feature branch. Although it is very much possible to build a single repo to support all the region releases, it would cause too much clutter in the repo. That said, I don't want mid-kid's effort to be completely wasted, so I'd be willing to help make them fully fleshed-out feature branches based on modern pokecrystal. We can then link them to the wiki.

@mid-kid
Member

mid-kid commented Apr 25, 2022

I disagree! The different localisations are easy enough to keep separate and non-intrusive. The only intrusive part is the build system changes required to make it work, and I don't think that would be a blocker.

@vulcandth
Collaborator

vulcandth commented Apr 25, 2022

4,228 changed files with 290,729 additions and 71,353 deletions... is a lot.

It would be massively disruptive, and very difficult for downstream users to stay up to date with pokecrystal. In the proposed system it looks like you renamed most of the .asm files to .inc and are now including the .inc in the new version changed files. The English files are still in the main data area, whereas the other versions are in a version area. Furthermore, the layout.link file is... something.... of a mess.

Is it feasible? Sure. Unfortunately, I think we would have to hand-hold every downstream repo through the update process if they want to stay updatable. I don't see us pushing this out and not getting many, many confused downstream users. It would be a ton of work for them to get current, for a feature they probably never cared about.

Edit: A base branch or even possibly a patch branch would be much cleaner and better suited for this imo.

@aaaaaa123456789
Contributor

I'd point out that a net line change of +219,376 is a massive increase in repo size.

@mid-kid
Member

mid-kid commented Apr 25, 2022

And I'd like to point out that the i18n repo is currently built upon the -splitting branch, and most of the deletions stem from there. That is the disruptive change, not the i18n. i18n doesn't depend on -splitting; it was just built on it since back then I expected -splitting to get merged eventually.
A net increase of +666,666 doesn't matter when it's all in separate directories and doesn't touch the regular English code all that much.
...and that's without mentioning that the i18n branch is out of date by a couple of years, so the stats are massively skewed by that too if you're comparing current master to the latest i18n commit. You're better off comparing i18n to -splitting.

Furthermore, the layout.link file is... something.... of a mess.

That's part of the -splitting changes, but, how so? It simply lists each file name like main.asm currently does.

In the proposed system it looks like you renamed most of the .asm files to .inc and are now including the .inc in the new version changed files.

That's not really the case. This is, again, a -splitting change. You can read all of its gory details here, but to sum it up, the -splitting build system scans for files ending with .asm and calls rgbasm on those. This was done to give a better overview of which files can be built independently and to increase (mostly incremental) build speeds. It is also tidier coming from a C background: you know a .asm file will provide its whole context, while a .inc file is included from a different file and may inherit definitions and macros from whoever included it.

This doesn't have much to do with i18n, since .asm files in the version/ directory will likewise be used instead of the regular files when a localization is built, and the build system can be adapted to a non-splitting environment.
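
The .asm versus .inc convention described above can be sketched in a few lines of Python. This is only an illustration of the rule "every .asm file is its own translation unit, .inc files are never assembled directly"; it is not the -splitting Makefile, and both function names are hypothetical.

```python
from pathlib import Path


def translation_units(root):
    """Return every .asm file under root as sorted, relative POSIX paths.

    .inc files are skipped on purpose: they are only ever included
    from another file and are not assembled on their own.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.asm") if p.is_file()
    )


def object_name(asm_path):
    """Map a source path to the object file name for that unit."""
    return Path(asm_path).with_suffix(".o").as_posix()
```

Each entry returned by translation_units would correspond to one assembler invocation and one object file.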

@mid-kid
Member

mid-kid commented Apr 25, 2022

Oh, and please note that the current if DEF() for various strings in between the code was still under consideration for improvement, for example with a macro, I just didn't consider it a pressing decision to take at the time.

It was a proof of concept made to bounce ideas and spark discussion, not a final thing.

@vulcandth
Collaborator

Comparing i18n to -splitting is better... 1,612 changed files with 211,336 additions and 231 deletions.

  • I do not see a case where a user would want to build multiple localizations for their ROM hack. (Multiple languages, maybe, but not localizations.)
  • There is a lot of redundant code being added. (Example: each localization has its own map .asm file). This adds to the complexity of making changes to the pret repo.
  • If a user only wants to work with one localization, it would be much cleaner and simpler as a feature branch dedicated to that localization.
  • The if DEF()s would begin to aggravate in repos that already build multiple versions (pokered, pokegold). Yes, macros may make this better.
  • I can't see a reason why it shouldn't be a feature branch. It wouldn't be too difficult to rebase them every once in a while.

We could even make the branches on the pret repo. I'm not 100% positive.. but perhaps we can set up some clever CI to keep them up to date.

@vulcandth
Collaborator

I'm just giving my thoughts on the topic. If we decide to press forward with this, then of course I'll help out in getting it done with whatever we decide to do. I just want to make sure we are taking the right approach to this.

@mid-kid
Member

mid-kid commented Apr 26, 2022

I do see the benefit of having this in the main repo. There are many simple hacks, like difficulty, speedchoice, or gimmick hacks such as enabling mobile adapter features, that don't change much text and would instantly benefit from being buildable in all languages. I've already seen people ask for a Spanish version of the latter, and there are huge Spanish communities that'd rather hack in their own language, without having to resort to outdated forks.

Making it easy for this to happen, while keeping it trivial to remove support for localizations (i.e. delete the version/ directory, or copy the files over for the localization you want to use), is IMO the best solution for this.

There is a lot of redundant code being added. (Example: each localization has its own map .asm file).

That's the only example, and a necessary evil unless you want to decouple map scripts from their text. This solution was chosen to minimize impact on people who only hack the English game.

I don't intend to port this to pokered or pokegold or whatever, but in case it ever happens, the IF DEFs won't overlap anyway so I consider that a moot point.

It shouldn't be a feature branch because it's not a feature branch, it's a base. Despite the relatively unobtrusive changes for English-only hackers, adding all the localized in-code strings is a fair bit of effort, the build system changes are non-trivial, and renaming and moving text labels for e.g. battle features is easier to do up front than having to go back and port the texts later.
Additionally, this is significantly easier to maintain here, it lets us add language-specific tutorials and bugs_and_glitches entries, and it completes the romset we're trying to reproduce.

The main point of contention is the duplication of map scripts imo, but I genuinely don't know of a better solution without pissing off a dozen people, and I think it's worth having despite that.

@mid-kid
Member

mid-kid commented Apr 26, 2022

Also note that I'm not expecting you to do this, I'm fully expecting to do it myself (since I have the most experience with the changeset...), and am trying to find time to do it myself, but it'll take a while, it's not a small thing to port.

@vulcandth
Collaborator

vulcandth commented Apr 26, 2022

I don't intend to port this to pokered or pokegold or whatever, but in case it ever happens, the IF DEFs won't overlap anyway so I consider that a moot point.

All the Gen I/II repos should share the same goals. It doesn't make sense to do something for one repo and not the rest. If we can't support it on the other repos, we shouldn't do it here. Now, if it is just a matter of you not having the time to do pokegold/pokered/pokeyellow, then that's fine, as one of us (me?) would finish the work to port it to them.

I do see the benefit of having this in the main repo. There's many simple hacks like difficulty, speedchoice or gimmick hacks like enabling mobile adapter features that don't change much text and would instantly benefit from being able to be built in all languages;

How many are actually doing this compared to the majority of ROM hackers? I'd imagine it is significantly fewer than those who are building generic ROM hacks based on the English localization. We would be forcing a bunch of localization stuff on the majority to accommodate the minority here.

I've already seen people ask for a spanish version of the latter, and there's huge spanish communities that'd rather hack in their own language, without having to resort to outdated forks.

This, I believe, is the greatest benefit of what you're trying to do. Looking over the changes, though, the majority do seem to be language changes. I still think it could be turned into a patch branch that replaces the English code/text with Spanish code/text. Forget building multiple versions in a single repo, etc. It would be much more solid to work from. Since it is mostly text changes, I still think rebasing would be fairly easy.


Now, with all that being said, I do have some suggestions if we decide to proceed with building multiple localizations in a single repo.

  • Let's leave out -splitting for now; it overcomplicates this issue.
  • Leave out the build system changes; we should be able to accomplish this without modifying the build system itself.
  • We should differentiate between localization changes and changes that support the additional language. This is helpful, as people may want to prune localization changes but keep the languages. This can be done a few ways; one way would be creating two different macro prefixes: localization_de_* and language_de_*.
    (A localization change might be something like: Jynx is colored purple in the German localization.) {This is a made-up example}
    (Language changes might be things like: language text, textbox alignment changes, charmap changes, etc.)
  • Avoid redundant code. In the case of the map files, extract only the non-English text. Changes that come in large chunks should be extracted into the version folders. See the map example below.
AzaleaMartBugCatcherScript:
	jumptextfaceplayer AzaleaMartBugCatcherText

+if !DEF(_CRYSTAL_EU) ; or whatever we decide to filter out.
AzaleaMartCooltrainerMText:
	text "There's no GREAT"
	line "BALL here. #"

	para "BALLS will have"
	line "to do."

	para "I wish KURT would"
	line "make me some of"
	cont "his custom BALLS."
	done

AzaleaMartBugCatcherText:
	text "A GREAT BALL is"
	line "better for catch-"
	cont "ing #MON than a"
	cont "# BALL."

	para "But KURT's might"
	line "be better some-"
	cont "times."
	done
+endc

+      language_de_include maps/AzaleaMart.asm ; Macro conditionally includes version/de/maps/AzaleaMart.asm
+
AzaleaMart_MapEvents:
	db 0, 0 ; filler

	def_warp_events
	warp_event  2,  7, AZALEA_TOWN, 3
	warp_event  3,  7, AZALEA_TOWN, 3

	def_coord_events

	def_bg_events

	def_object_events
	object_event  1,  3, SPRITE_CLERK, SPRITEMOVEDATA_STANDING_RIGHT, 0, 0, -1, -1, 0, OBJECTTYPE_SCRIPT, 0, AzaleaMartClerkScript, -1
	object_event  2,  5, SPRITE_COOLTRAINER_M, SPRITEMOVEDATA_STANDING_UP, 0, 0, -1, -1, 0, OBJECTTYPE_SCRIPT, 0, AzaleaMartCooltrainerMScript, -1
	object_event  7,  2, SPRITE_BUG_CATCHER, SPRITEMOVEDATA_WALK_LEFT_RIGHT, 2, 0, -1, -1, PAL_NPC_RED, OBJECTTYPE_SCRIPT, 0, AzaleaMartBugCatcherScript, -1
  • Sections with small code changes could do something like this:
+if !DEF(_CRYSTAL_EU)
	db "FIGHT@"
	db "<PKMN>@"
	db "PACK@"
	db "RUN@"
+endc
+	language_de_db "KMPF@"  ;Macro will have a built in conditional
+	language_de_db "<PKMN>@"
+	language_de_db "BEUTEL@"
+	language_de_db "FLUCHT@"
+	language_es_db "LUCHA@"
+	language_es_db "<PKMN>@"
+	language_es_db "MOCHILA@"
+	language_es_db "ESC@"
	ld hl, .BuenaComeAgainText
	call PrintText
	call JoyWaitAorB
+	localization_es_call PlayClickSFX ;Macro will have a built in conditional
	ret

This makes it a bit easier for downstream users to remove localization specific code using a search and remove; or even entire languages.
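
As a sketch of what that "search and remove" could look like, here is a short Python filter. It assumes the hypothetical language_<lang>_* and localization_<lang>_* macro prefixes proposed above and that each macro call sits on its own line; the function name strip_language is made up for this example.

```python
import re


def strip_language(lines, lang):
    """Drop lines that invoke language_<lang>_* or localization_<lang>_* macros."""
    pattern = re.compile(
        rf"^\s*(?:language|localization)_{re.escape(lang)}_\w+\b"
    )
    return [line for line in lines if not pattern.match(line)]
```

Calling strip_language(source_lines, "es") would remove the Spanish entries and leave the English and German lines untouched. Multi-line constructs, such as an included version/ file, would still need to be deleted separately.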

@mid-kid
Member

mid-kid commented Apr 26, 2022

You really underestimate how long it took to dump just those two languages... lol. I just don't expect it to be done for the other repos anytime soon. It should be possible to introduce a feature here that won't be backported. But again, moot point since the if defs wouldn't overlap in significant areas.

Yes, most people will be English-only hackers, which is why the i18n repo is built as it is. Most English hackers wouldn't notice the changes all that much, and the languages would be removable with the press of a delete key.

Building multiple languages in one repo is beneficial to enough people, it's just rarely done because of the upfront cost there's been all these years. Having the infrastructure in place would incentivize doing it, especially for smaller scale rom hacks. I don't think it's a feature worth overlooking. I've mentioned this before, but I would be using this, and I don't believe I'd be the only one.

Your desire to make it unintrusive to English-only hackers is unfortunately incompatible with your desire to not change the build system, as not doing so implies having conditional includes everywhere. I think treating the version/ subdirectories as overlays on top of the regular files is both fairly intuitive and keeps the main files cleaner. That said, I don't completely dislike your solution to map script duplication, and it would be fairly compatible with the overlay system, though I believe this is something that can be discussed after everything is in place. Straightforward approach first, cleanups later.

@mid-kid
Member

mid-kid commented Apr 26, 2022

Oh and I should mention there really aren't that many non-text localization changes. It's all related to metric vs imperial, with a couple of text engine changes. I was planning on having a _CRYSTAL_METRIC define.

@vulcandth
Collaborator

Alright. @Rangi42 whenever you have time (I know you are busy), can you please weigh in on this discussion? This is a big enough change that it needs a general consensus.

@Rangi42
Member Author

Rangi42 commented Apr 26, 2022

I agree that pret's pokecrystal should be the main source for reproducing all Crystal ROMs. However, I think all the solutions for making master reproduce every ROM have difficulties:

  • Cluttering the files with if DEF checks and foreign text/data/code
  • Cluttering the file system with extra i18n .asm files
  • Complicating the build system to use different files depending on the language
  • Modifying scan_includes to handle language_*_include macros
  • Slowing down make compare

Personally I like the sound of separate branches for each language: german, crystal-de, pokecrystal-de, whatever you want to call them. Each could initially be done with two commits, one to remove the debug ROMs and one to replace the English ROM and VC builds with the translated text, graphics, hlcoords, etc, like https://github.com/Rangi42/pokecrystal/tree/no-maps. More commits could be added if necessary, changes to master could be rebased or merged (I prefer rebasing since it avoids the clutter of merge commits), and GitHub makes it easy to compare the two branches. (Or locally you can checkout two copies and diff them.)

The main advantage to having all languages in one repo is being able to check matching locally with one make compare command. However, I don't think we really need that: GitHub CI can compare for every language by just checking out and building each branch.

If we did one branch per language, I'd also be in favor of moving crystal-au to such a branch for consistency.

@mid-kid
Copy link
Member

mid-kid commented Apr 27, 2022

Applying every commit fivefold is going to be a massive pain...

Would an acceptable middle ground be to have a branch that builds all the versions, while keeping master "clean"?

I really don't like the idea of multiple official branches, since I think it's a slippery slope and I don't want every mildly controversial feature to end up as one. But one is better than five imo, and it'd still give people the tools to support multiple languages in one repo.

@Asday

Asday commented Apr 27, 2022

How similar would the other branches be? It could be as simple as an action for each of them that rebases them every time master changes, or on PRs, and on the (presumably rare) occasions when the changes to master are big enough to cause a conflict when rebasing, the PR author can go in and fix them.

E: As a separate point, the easier it is for other language communities to work on or with the codebase the better, I think. If that makes it a bit harder for English language communities to work with it because there's dirty foreign clutter, then... English speakers already have everything else pretty easy. I'm sure they can handle this one hardship to be a little more welcoming to the overseas communities.

@Rangi42
Member Author

Rangi42 commented Apr 27, 2022

A separate i18n branch sounds fine to me. It would give us more freedom to experiment with just how the languages are implemented, and we might even decide to merge it into master.

@vulcandth
Collaborator

vulcandth commented Apr 27, 2022

Although I'm still not convinced that a single branch should make all the ROMs, I'm open to exploring what we can do with i18n in its own branch. Perhaps once we all start working on it, we can find creative ways to make it work. It is a lot easier to come up with solutions when you have something viable to work with.

@mid-kid how would you like to proceed? I'm thinking we need to remove the -splitting changes from your i18n branch and bring it up to date with modern pokecrystal first. We should probably do this on one of our forks and get it somewhat viable (buildable) before attempting to bring anything to pret. We can either set something up on my fork, or yours.

Edit: Oh, and I plan on helping, so don't assume you are doing it all yourself. I know you have limited time, so I think your time would be better spent on providing knowledge, review, and direction.

@rawr51919
Contributor

So if the i18n branch is going to exist, what do we do in terms of disassembling the rest of the languages and whatnot? Using directives like in pokeruby etc. isn't quite going to cut it since this is ASM. I'm confused myself.

@aaaaaa123456789
Contributor

It's one of the many reasons why a single internationalization build would be extremely cumbersome... I'd much rather see a branch per language.

@Rangi42
Member Author

Rangi42 commented Jun 18, 2022

No need to rehash the above discussion until/unless anyone has new i18n work already.

@rawr51919
Contributor

rawr51919 commented Jun 18, 2022

With a system like pokeruby has to keep different languages' data separate, so code files don't get cluttered with if/elses.

So far the only separate builds are USA/Europe Revision 1.1 and Australian (which just censors the Game Corner text from 1.1), but others are possible:

889a06fc0bb863666865aa69def0adf97945ac2a *pokecrystal-es.gbc
accb584293ba056152f1fd908439b019017ff2fe *pokecrystal-de.gbc
c055992b16b7399c687647725cdd1f4f13a2f75c *pokecrystal-fr.gbc
6cee05e5b95beeae74b8365ad18ec4a07a8c4af8 *pokecrystal-it.gbc

There's also 95127b901bbce2407daf43cce9f45d4c27ef635d *pokecrystal-jp.gbc, but that probably has enough different code to justify a separate project, like pokegold.

And let's not forget about pokegold-ko as G/S has Korean releases, might be worth opening a companion issue for that one there as well
Edit: pokegold companion issue at pret/pokegold#94

@rawr51919
Contributor

rawr51919 commented Jun 20, 2022

Wouldn't it be better to use a separate layout.link for each language edition so things that are in different banks across versions will be laid out correctly and match in the different languages?

Ergo, layout-es.link and so on.

@mid-kid
Member

mid-kid commented Jun 20, 2022

pokecrystal-i18n does exactly that.

@rawr51919
Contributor

rawr51919 commented Jun 20, 2022

pokecrystal-i18n does exactly that.

Can't seem to find the branch/repo, could be a private one for now though so that might be why

@mid-kid
Member

mid-kid commented Jun 20, 2022

https://github.com/mid-kid/pokecrystal/tree/i18n. The linker script files are under version/crystal-xx/layout.link.
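
For readers picturing how a build might pick between those files, here is a tiny Python sketch. It assumes the version/<name>/layout.link layout mentioned above; the fallback to a top-level layout.link and the function name linker_script are assumptions of this example, not something taken from the branch.

```python
from pathlib import Path


def linker_script(repo_root, version=None):
    """Pick version/<version>/layout.link if it exists, else the top-level one."""
    root = Path(repo_root)
    if version is not None:
        candidate = root / "version" / version / "layout.link"
        if candidate.is_file():
            return candidate
    # Assumed fallback: the regular build's linker script.
    return root / "layout.link"
```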

@rawr51919
Contributor

re: pokecrystal-jp
Would it be best if a pokecrystal-jp repo were created for disassembling the Japanese version, given how many differences there are between it and the rest of the versions? It makes sense because the differences are mainly in things like the mobile code, and it may in turn assist with documenting the rest of the mobile functions here in the main pokecrystal disassembly.

@mid-kid
Member

mid-kid commented Jun 24, 2022

Yes, -jp isn't included in this issue as per the opening message. Feel free to create a separate -jp disassembly any time.


8 participants