Export all possible strings [Localization] #514

Open
AndrewCarvalho opened this issue Feb 18, 2019 · 7 comments

Comments

@AndrewCarvalho

Quick version

I am looking to export every possible string that could be generated from an ink story. From my initial (but short) read through the documentation, this isn't currently supported, so I am looking for either a place to start implementing it or, in case I just happened to miss it, a pointer to where it is supported.

I've posted the context as to why I am doing this below (localization) in case anyone else is interested.

Context

Hello!

I am investigating moving away from a client's proprietary scripting system and replacing it with ink and its Unity runtime. So far I (and my client) have been impressed with the features and ease of use, as well as the fact it works without issue on consoles, all of which were key to adoption.

One final hurdle, though, is localization. I know it has been discussed before in several places, but most discussions end with the solution suggested in #98, where the localization team would learn the ink format. I think that is a little unreasonable for a standard localization team, and it also may not work for all translations in an easy way.

I believe that localization should be kept separate from the ink runtime, if possible. This keeps ink simple and leaves the team to focus on more important tasks, which is of benefit to everyone.

My perfect approach would be to leverage the gettext standard for localization and have ink populate the template file in an intelligent way, taking advantage of pluralization and contextual strings when appropriate. This would provide a minimal template file for translators to work with and minimize the time for translation as well as the cost. It's also something that ink could do (or support a tool to do) since it would likely use or build from the existing resolution code for generation.

That said, supporting more complex situations like pluralization and particular strings would still take a fair bit of work and for a first pass I am okay with simply generating a template with every single possible string that could be produced from an ink story. It's not ideal, no, but it would definitely work and my current use case isn't using many of the generative features in ink so it wouldn't have a large impact.
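A first pass of that kind could be as small as the following Python sketch, which turns a list of final strings into the text of a gettext template. The names `escape_po` and `write_pot` are made up for illustration, and how the strings are gathered from the story is left open here.

```python
def escape_po(text: str) -> str:
    """Escape a string for use inside a double-quoted PO msgid."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def write_pot(strings):
    """Return the text of a minimal .pot template, one entry per unique string."""
    # Header entry: an empty msgid whose msgstr carries the charset.
    lines = ['msgid ""', 'msgstr ""',
             '"Content-Type: text/plain; charset=UTF-8\\n"', ""]
    seen = set()
    for s in strings:
        if not s or s in seen:  # skip empty strings and duplicates
            continue
        seen.add(s)
        lines.append(f'msgid "{escape_po(s)}"')
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)
```

Deduplicating on the exact string keeps the template small, at the cost of losing per-occurrence context; gettext's `msgctxt` would be the usual way to add that back.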

@jacobsky

Unfortunately, ink currently isn't really suitable for the task that you are talking about directly. You will need to do some additional legwork in order to support this kind of functionality.

Thinking about this, I don't think you'll want to pursue compiling all the final strings as it'll be virtually impossible to get them back into the ink story file for use in your game. I have some examples at the bottom.

There are two good places to do the localization. The first is to agree on a tag to put inside the ink file and build a pre-compiler layer that generates the localization file from it; all your writers would need to do is make sure they put the agreed-upon marker string in. The second is to directly manipulate the final JSON, as it already generates unique ids for all the string pieces that might need to be in the file.

You'll probably want to look at the json format in either case so I'll leave the link to the spec here.

I hope this helps.

Note on how ink compiles stories

In ink, each line is a container, but the container becomes much more intricate as you add more directives.

Consider the phrase
Hello Player I am character
Ink will compile this static phrase into
{"inkVersion":19,"root":[["^Hello player I'm character","\n",["done",{"#f":7,"#n":"g-0"}],null],"done",{"#f":3}],"listDefs":{}}

When we simply add two constant name substitutions and change it to

Hello {Player} I am {character}

We get this

{"inkVersion":19,
    "root":[["^Hello ","ev", "str", "^buddy", "/str", "out", "/ev", "^ I'm ", "ev", "str", "^guy", "/str", "out", "/ev", "\n",
        ["done", {"#f":7, "#n":"g-0"}],null], "done", {"#f":3}],
"listDefs":{}}

And when we allow both the player and the character to have a variable name, we get this.

{"inkVersion":19,
    "root":[["^Hello ","ev",{"VAR?":"player"},"out","/ev","^ I'm ","ev",{"VAR?":"character"},"out","/ev","\n",
    ["done",{"#f":7,"#n":"g-0"}],null],"done",{"global decl":["ev","str","^guy","/str",{"VAR=":"character"},"str","^buddy","/str",{"VAR=":"player"},"/ev","end",null],"#f":3}],"listDefs":{}}

This is a relatively trivial example in ink, but this gets even more complicated as you add flow control into the mix. Going this route will lose most of the power that the ink engine leverages.
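To make the fragmentation concrete, here is a small Python sketch that collects every `^`-prefixed string from the last JSON example above. The helper name `text_fragments` is hypothetical, and treating every `^` string as text is a simplification based only on the samples in this thread.

```python
import json


def text_fragments(node):
    """Yield every '^'-prefixed string found anywhere in a compiled ink JSON tree."""
    if isinstance(node, str):
        if node.startswith("^"):
            yield node[1:]
    elif isinstance(node, list):
        for child in node:
            yield from text_fragments(child)
    elif isinstance(node, dict):
        for child in node.values():
            yield from text_fragments(child)


# The compiled story from the variable example above, copied verbatim.
compiled = json.loads(r'''{"inkVersion":19,"root":[["^Hello ","ev",{"VAR?":"player"},"out","/ev","^ I'm ","ev",{"VAR?":"character"},"out","/ev","\n",["done",{"#f":7,"#n":"g-0"}],null],"done",{"global decl":["ev","str","^guy","/str",{"VAR=":"character"},"str","^buddy","/str",{"VAR=":"player"},"/ev","end",null],"#f":3}],"listDefs":{}}''')

fragments = list(text_fragments(compiled))
# fragments == ["Hello ", " I'm ", "guy", "buddy"]
```

The sentence never appears as one unit: a translator would see four pieces, two of which are variable values rather than prose.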

@AndrewCarvalho
Author

I don't think I explained myself properly.

What I'm looking for is the final strings that would be generated by the story and passed to the actual view, not the intermediate JSON. Assuming we are using the Unity integration (which I am), I am looking to get all possible strings that could be generated by

string text = story.Continue ();

The translation layer would be completely separated from ink's compilation and generation. I am looking to add an additional step between the final generated strings and displaying those strings where the translation lookup would happen.

To do this, the translation would be built from every possible final string. Due to the generative nature of ink, strings are not easily parsed out of the markup, as sentences can span several lines and generate variations. This is why I would like to generate all possible final strings post compilation. I realize this may be a task better left to the runtimes (such as an editor script within the Unity integration), but a cross-platform way of generating the strings would be ideal.

I am aware that variables would cause issues with this approach, but I feel that it would be easier to work around those for the time being, as our current project is mostly linear and variables are minimal. At a later point I would like to have the translation step occur after sentence composition but before variable replacement, and have the ink runtime generate the final translated string.

Having written this, I guess it's appropriate to post a similar question on the Unity integration page as well, but hopefully this post clears up my intention.

@y-lohse
Contributor

y-lohse commented Feb 25, 2019

I can confirm this doesn't exist at the moment.

If you want every possible string, you would need to play every possible path in your story. There's a super rough and incomplete js-implementation here if that's any help.
Off the top of my head, here are some things you'll need to account for:

  • variables
  • loops
  • content that is identical the first X times you run it, but changes afterwards
  • content that can only be accessed after game-side changes to variables

If your story's structure is relatively simple, it's a realistic solution though.
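The idea of playing every path can be sketched on a toy model. The Python below walks a hand-written story graph depth-first; the dictionary layout and the name `all_lines` are assumptions for illustration and are not the ink runtime API. A real walker would have to save and restore story state at each choice instead of following node names.

```python
def all_lines(story, start="start"):
    """Depth-first walk over a toy story graph, collecting each node's lines once."""
    seen_nodes, lines = set(), []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen_nodes:  # guards against loops in the graph
            continue
        seen_nodes.add(name)
        node = story[name]
        lines.append(node["text"])
        for choice_text, target in node["choices"]:
            lines.append(choice_text)
            stack.append(target)
    return lines


# Toy story: each node has one line of text and a list of (choice, target) pairs.
toy = {
    "start": {"text": "You wake up.",
              "choices": [("Get up", "hall"), ("Sleep in", "start")]},
    "hall": {"text": "The hall is empty.", "choices": []},
}
```

Visiting each node once handles loops, but it is exactly what misses the other items in the list: content that changes on later visits or depends on variables would need the visited set to be keyed on story state, not on the node name.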


That being said, here's how I would approach this (probably contains some pitfalls):

  1. Parse the original ink file for the main language
  2. Extract all the plain text from it, discarding as much ink structure as possible
  3. For each extracted line, store some reference about where it came from (line number, discarded ink, …)
  4. Transform the extracted text to gettext format and translate as usual
  5. Use the info of 3 to recompose an ink file, using the translations
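Steps 2, 3 and 5 could look roughly like the Python sketch below. It is deliberately naive: it only recognises line-level structure (knot headers, diverts, logic lines, choice and gather markers) and ignores inline logic such as `{...}` or `[...]`. The names `extract` and `recompose` are hypothetical.

```python
import re

# Leading ink structure kept aside: whitespace plus choice (* +) and gather (-)
# markers. A "-" that starts a divert "->" is not treated as a gather.
PREFIX = re.compile(r"^(\s*(?:(?:[*+]|-(?!>))\s*)*)")
# Lines starting with these carry no translatable text in this sketch.
SKIP = ("==", "->", "~", "//", "VAR ", "CONST ", "INCLUDE ")


def extract(ink_source):
    """Return (line_number, prefix, text) for each line that holds plain text."""
    entries = []
    for number, line in enumerate(ink_source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(SKIP):
            continue
        prefix = PREFIX.match(line).group(1)
        entries.append((number, prefix, line[len(prefix):]))
    return entries


def recompose(ink_source, entries, translations):
    """Rebuild the ink file, swapping each extracted text for its translation."""
    lines = ink_source.splitlines()
    for number, prefix, text in entries:
        # Untranslated lines fall back to the original text.
        lines[number - 1] = prefix + translations.get(text, text)
    return "\n".join(lines)


sample = "=== start ===\nHello there.\n* Wave back\n  You wave.\n-> END"
entries = extract(sample)
```

The line number stored with each entry is what a gettext reference comment would point at, and the stored prefix is the "discarded ink" needed to put the translation back in place.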

@AndrewCarvalho
Author

AndrewCarvalho commented Feb 26, 2019

Your 5 steps are basically what I'm looking at, though I wouldn't recompose the translation back into ink. The issues I will face will mostly be regarding variables since I wouldn't want those parsed into their values. That way translations can be done with the variables inline.

What I would ideally do is the following:

  1. Complete Ink script
  2. Extract out the generated strings before variable replacement occurs
  3. Generate the pot file with references to line numbers
  4. Send out the pot for translation and generate mo files from them

And that would be it for the ink file. There would only be the one ink story (and its includes), in the original language. In the playback engine (in my case Unity), the flow of logic would be:

  1. Load the mo file into your gettext implementation of choice (I am using NGettext).
  2. Load the story file.
  3. Get the constructed string to be displayed before variable replacement.
  4. Feed the constructed string through gettext.
  5. Complete variable replacement with the translated string and pass the string back to the user.
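Steps 3 to 5 can be illustrated with a short Python sketch. It assumes the runtime could hand over a template with named placeholders before substitution, which nothing in this thread confirms ink can do today; the `{playerName}` placeholder syntax, the `localize` helper and the dictionary standing in for a loaded gettext catalog are all made up for illustration.

```python
def localize(template, variables, catalog):
    """Translate the pre-substitution template, then fill in the variables."""
    # Fall back to the source-language template when no translation exists.
    translated = catalog.get(template, template)
    return translated.format(**variables)


# A dictionary stands in for the loaded .mo catalog.
catalog = {
    "How are you doing, {playerName}?": "{playerName}, comment ça va ?",
}

line = localize("How are you doing, {playerName}?", {"playerName": "Ana"}, catalog)
```

Because the placeholder travels with the sentence, the translator is free to move it, which is the point of translating before variable replacement.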

There may be some use cases that won't be able to construct the text without doing some variable replacement but I'm not certain of that. I'm thinking variables that are numbers that are then used in logic that also drive the string construction. I would imagine this isn't a problem since the variable expansion in logic doesn't require variable expansion within the constructed string.

I know there have been many questions about localization, and I don't think asking a translation team to learn ink syntax is a realistic solution. Having an end-to-end solution where the ink file stays in the native language, a translation company can be given a standard they are used to (like gettext po files), and the runtime actually handles most of the translation would likely cover most people's use cases.

@KumoKairo
Contributor

You seem to be overcomplicating things. We are just giving the whole Ink text to interpreters and get an edited version back, nothing fancy.

The translation team doesn't need to learn Ink syntax, they just need to translate things that look "human". We have absolutely non-technical people writing Ink and translating it, and so far we haven't seen any major issues (we are maintaining English as main and adding French, German and Russian as secondary).
This approach lets us maintain relative flexibility in writing, sometimes adding more dialogue lines to the story to make it fit the hard-defined borders of the text fields.

One possible additional validation step can be added though just to make sure that the overall story structure is unchanged. But it's just the validation and none of the writers even sees it.
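One way such a validation step might work, sketched in Python: reduce each file to its structural lines and compare the results. This is an illustration, not the check described above; the names `skeleton` and `same_structure` are hypothetical, and only line-level structure is considered.

```python
def skeleton(ink_source):
    """Reduce an ink file to the sequence of its structural lines."""
    marks = []
    for line in ink_source.splitlines():
        stripped = line.strip()
        if stripped.startswith(("==", "->", "~")):
            marks.append(stripped)   # knot names, diverts and logic must match exactly
        elif stripped.startswith(("*", "+")):
            marks.append("choice")   # the choice text itself may be translated
    return marks


def same_structure(original, translated):
    """True when both files share the same knots, diverts, logic and choice order."""
    return skeleton(original) == skeleton(translated)


english = "=== start ===\nHello.\n* Wave\n-> END"
french = "=== start ===\nBonjour.\nComment ça va ?\n* Saluer\n-> END"
broken = "=== début ===\nBonjour.\n* Saluer\n-> END"
```

Plain text lines are ignored on purpose, so a translator can still add dialogue lines without failing the check, while a renamed knot or a dropped choice is caught.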

IMO you shouldn't really try exporting every single line variation and translating it: it's going to backfire somewhere for sure.

@AndrewCarvalho
Author

The concern I have about translating within ink is for languages that don't really fit the grammatical style of English which is what ink is primarily designed for.

Some languages have the subject and predicate flip or mix in certain situations which isn't possible to reflect in ink's syntax at times. This also can happen with numbers, pluralization, and context.

For example, the ink statement

How are you[?] doing, $playerName?

might need to move the subject to the front of the sentence in the case of printing out the longer phrase. This doesn't work with the syntax very nicely since we have an assumption that the subject can always go at the end. While this case can be worked around somewhat awkwardly with

[How are you?]How are you, $playerName?

I feel like the benefits of ink (string concatenations to reduce the amount of text needed to write out) are lost when doing this.

In more complex concatenations, it might not be possible to make the correct changes in the arrangement of the sentence to fit a language. I realize this is a corner case and it may be possible to find a fix either on the translator's end or by changing the ink syntax but if it can be avoided in our workflow I think it would be best.

To me it seems a lot safer to dump all the variations and let the translation happen outside of the limitations of any scripting syntax (as this is not specifically a problem unique to ink) which is what started me on this path originally.

I haven't had a chance to actually dig into the logic to do this yet, as I am currently trying to finish a build of my client's game for PAX before leaving for GDC. When I return I plan on investigating it a bit more and trying out some approaches I have in mind. I'll update this thread as I think is necessary.

@YongHeeK

I think it is too ambitious to preserve all the narrative context for localization...
even commercial tools like lokalise don't provide such a feature.

I've created a simple script to localize the final JSON, but the key values in the JSON data are not really helpful for following the narrative of the story.
It would be nice to preserve the flow of choices in the localization key string, but the JSON structure seems not to be intended for humans to read?

It would be nice if I could put more context in the localization key.
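A modest amount of context can be recovered from the container names. The Python sketch below builds a key for each text string from the names and indices on the way down to it. The key format is invented here and is not the runtime's own path format, and the sample data is hand-written to imitate the container shape shown earlier in the thread, not real compiler output.

```python
def keyed_text(node, path="root"):
    """Yield (key, text) pairs; the key records the names and indices leading to the text."""
    if isinstance(node, str):
        if node.startswith("^"):
            yield path, node[1:]
    elif isinstance(node, list):
        for index, child in enumerate(node):
            # Dicts hold named sub-containers, so their names replace the index.
            child_path = path if isinstance(child, dict) else f"{path}.{index}"
            yield from keyed_text(child, child_path)
    elif isinstance(node, dict):
        for name, child in node.items():
            yield from keyed_text(child, f"{path}.{name}")


# Hand-written sample with one named container, "kitchen".
root = [["^You are home.", "\n", None], "done",
        {"kitchen": ["^The kettle whistles.", "\n", None], "#f": 1}]

keys = list(keyed_text(root))
```

Here `keys` pairs "You are home." with `root.0.0` and "The kettle whistles." with `root.kitchen.0`, so a knot name at least tells the translator which scene a line belongs to. It still says nothing about which choices led there.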
