Convert object to data file #5

Closed
Vladimir37 opened this issue May 7, 2017 · 25 comments · Fixed by #102

Comments

@Vladimir37

Jomini converts data files generated by the Clausewitz engine into an object, but cannot convert a JS object back into a Clausewitz engine data file. Why not add a method for the reverse conversion? This would make it easier to create various editors.

@nickbabcock
Owner

Yes, I absolutely agree that reverse conversion (often called serialization) would be a huge boon. Unfortunately the conversion from the data file to an object is lossy, meaning that given certain objects, it's ambiguous what the correct serialization should be.

Given:

{
  "foo": [1, 2]
}

is the correct serialization:

foo=1
foo=2

or

foo = { 1 2 }

or even

foo = { 1.000 2.0 }

jomini currently doesn't have a rich enough vocabulary to round-trip (deserialize and then serialize) without ambiguity.
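To make the ambiguity concrete, here is a minimal sketch that writes the same `{ "foo": [1, 2] }` value in the three styles listed above (the float style is simplified to a single precision). The helper names are hypothetical and not part of jomini; the point is that once a parser maps all of these to one object, nothing in that object says which style to emit again.

```javascript
// Hypothetical helpers, not jomini APIs: three ways to write the same data.

// Repeat the key once per element: foo=1 / foo=2
function asRepeatedKeys(key, values) {
  return values.map((v) => `${key}=${v}`).join("\n");
}

// One key holding a brace-delimited array: foo={ 1 2 }
function asArray(key, values) {
  return `${key}={ ${values.join(" ")} }`;
}

// Same array with every element printed as a float: foo={ 1.000 2.000 }
function asFloatArray(key, values, digits) {
  const items = values.map((v) => v.toFixed(digits));
  return `${key}={ ${items.join(" ")} }`;
}

const parsed = { foo: [1, 2] };
for (const text of [
  asRepeatedKeys("foo", parsed.foo),
  asArray("foo", parsed.foo),
  asFloatArray("foo", parsed.foo, 3),
]) {
  console.log(text);
}
```

A serializer handed only `parsed` has to pick one of these by convention, which is why extra type information is needed for a faithful round trip.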

@Vladimir37
Author

It seems to me that this problem could be solved if Jomini output the parsed data in a format like this:

{
  "type": <type>,
  "body": <body>
}

In this way,

  {
    "foo": {
      "type": "int_array",
      "body": [1, 2]
    }
  }

will be serialized to

foo = { 1 2 }

If the type field is float_array:

foo = { 1.0 2.0 }

If the type field is chain:

foo=1
foo=2

It seems to me that using the "type" field would help eliminate the ambiguity.
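A minimal sketch of a serializer that consumes the proposed `{ type, body }` wrappers. The type names `int_array`, `float_array`, and `chain` come from the proposal above; the function names and the one-decimal float formatting are assumptions chosen to reproduce the outputs shown.

```javascript
// Hypothetical serializer for the proposed { type, body } wrappers.
// Only the three types from the proposal are handled.
function serializeTagged(key, tagged) {
  const { type, body } = tagged;
  switch (type) {
    case "int_array":
      // A single braced array: foo = { 1 2 }
      return `${key} = { ${body.join(" ")} }`;
    case "float_array":
      // Floats keep a decimal point: foo = { 1.0 2.0 }
      return `${key} = { ${body.map((n) => n.toFixed(1)).join(" ")} }`;
    case "chain":
      // One key=value line per element: foo=1 / foo=2
      return body.map((v) => `${key}=${v}`).join("\n");
    default:
      throw new Error(`unknown type: ${type}`);
  }
}

// Serialize every top-level key of a tagged document, one entry per line.
function serializeDocument(doc) {
  return Object.entries(doc)
    .map(([key, tagged]) => serializeTagged(key, tagged))
    .join("\n");
}
```

For example, `serializeDocument({ foo: { type: "int_array", body: [1, 2] } })` yields `foo = { 1 2 }`, and switching the type to `chain` yields the two-line `foo=1` / `foo=2` form from the same `body`.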

@nickbabcock
Owner

You're absolutely right that there are ways to disambiguate the types (and your idea is a good one). The one downside is that instead of accessing foo.bar, one would have to write foo.body.bar.body.

@Vladimir37
Author

This problem can be solved if Jomini has two methods. For example:

  • jomini.parse - The currently existing method. The data is easy to read, but it cannot be serialized.
  • jomini.deserialization - Deserialization using body/type objects.

Thus, the data for easy viewing and the data for subsequent serialization would be kept separate.
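Under the two-method idea, the easy-to-read view can be derived mechanically from the tagged one. A sketch, assuming hypothetical tagged nodes where an `object` type holds a map of further tagged nodes in `body` and every other type stores its value directly in `body`:

```javascript
// Hypothetical: project the tagged form onto the plain, easy-to-read form.
// The "object" type name and node layout are assumptions for illustration.
function untag(tagged) {
  if (tagged.type === "object") {
    const plain = {};
    for (const [key, child] of Object.entries(tagged.body)) {
      plain[key] = untag(child);
    }
    return plain;
  }
  // Scalars and arrays of scalars carry their value directly in body.
  return tagged.body;
}

const tagged = {
  type: "object",
  body: {
    foo: {
      type: "object",
      body: { bar: { type: "int_array", body: [1, 2] } },
    },
  },
};

// Tagged access:  tagged.body.foo.body.bar.body
// Plain access:   untag(tagged).foo.bar
console.log(untag(tagged).foo.bar);
```

The reverse direction (plain to tagged) would require guessing, so in this design the tagged form is the one to keep for serialization.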

@nickbabcock
Owner

Ideally there'd only be one method to ease differences in parsing. There may be a way to create a class where we keep all properties on the class + a jomini_type() method that is used in a save() method to disambiguate.

But this may be wishful thinking and creating two methods may be more practical in the short term.
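A rough sketch of the class-based idea, assuming properties are stored directly on the instance (so access stays `obj.foo`) and the hints live in a private field consulted by `jomini_type()` and `save()`. The `set` helper, the `chain`/`default` hint names, and the output format are illustrative assumptions.

```javascript
// Hypothetical sketch: plain properties on the instance, serialization
// hints in a private field that Object.entries and JSON.stringify ignore.
class ParsedObject {
  #types = new Map();

  // Record a property together with how it was written in the source file.
  set(key, value, type) {
    this[key] = value;
    this.#types.set(key, type);
    return this;
  }

  // The recorded hint, or "default" for properties added by hand.
  jomini_type(key) {
    return this.#types.get(key) ?? "default";
  }

  // Use the hints to choose among equivalent textual forms.
  save() {
    const lines = [];
    for (const [key, value] of Object.entries(this)) {
      const type = this.jomini_type(key);
      if (Array.isArray(value) && type === "chain") {
        for (const item of value) lines.push(`${key}=${item}`);
      } else if (Array.isArray(value)) {
        lines.push(`${key}={ ${value.join(" ")} }`);
      } else {
        lines.push(`${key}=${value}`);
      }
    }
    return lines.join("\n");
  }
}

const obj = new ParsedObject()
  .set("foo", [1, 2], "chain")
  .set("bar", [3, 4], "array");
console.log(obj.foo); // plain access still works
console.log(obj.save());
```

One pitfall of this layout: a key named `save`, `set`, or `jomini_type` in the data would shadow the method of the same name.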

@C45tr0

C45tr0 commented Dec 2, 2018

You could put all the information needed to disambiguate types into a meta object at the top level. This way, clean access is preserved, while still allowing you to read that metadata or define it when needed.
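A sketch of the top-level meta object idea, with the metadata stored under a `Symbol` key so that ordinary property access, `Object.keys`, and `JSON.stringify` don't see it. The `META` symbol, the `a/b` path format, and the `chain`/`quoted` hint names are hypothetical.

```javascript
// Hypothetical: a top-level meta object mapping "a/b" paths to type hints.
// Symbol keys are skipped by Object.keys, for...in and JSON.stringify.
const META = Symbol("jomini.meta");

function withMeta(doc, meta) {
  // defineProperty defaults to non-enumerable, so spread won't copy it either.
  Object.defineProperty(doc, META, { value: meta });
  return doc;
}

function serialize(doc) {
  const meta = doc[META] ?? {};
  const out = [];
  const walk = (obj, prefix) => {
    for (const [key, value] of Object.entries(obj)) {
      const path = prefix ? `${prefix}/${key}` : key;
      const hint = meta[path];
      if (Array.isArray(value) && hint === "chain") {
        for (const item of value) out.push(`${key}=${item}`);
      } else if (Array.isArray(value)) {
        out.push(`${key}={ ${value.join(" ")} }`);
      } else if (value !== null && typeof value === "object") {
        out.push(`${key}={`);
        walk(value, path);
        out.push("}");
      } else if (typeof value === "string" && hint === "quoted") {
        out.push(`${key}="${value}"`);
      } else {
        out.push(`${key}=${value}`);
      }
    }
  };
  walk(doc, "");
  return out.join("\n");
}

const doc = withMeta(
  { foo: [1, 2], army: { name: "army1", units: [3, 4] } },
  { foo: "chain", "army/name": "quoted" }
);
console.log(doc.army.name); // clean access: army1
console.log(serialize(doc));
```

One trade-off: the metadata is attached to the root object only, so copying the document with spread or `JSON.parse(JSON.stringify(...))` silently drops it.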

@nickbabcock
Owner

Right it should be possible to hide the disambiguation away from the user (but still keep it available for serialization) 🤔

That said, I don't have any plans for continued development, as the current method of parsing (using jison) exhausts heap space, so a rewrite would be necessary to make parsing large files viable.

@C45tr0

C45tr0 commented Dec 2, 2018

Do you have any current thoughts for what you want to rewrite it in/to use?

@nickbabcock
Owner

So this can still be written in js -- it'd just need to be some sort of handwritten recursive descent parser (basically any JSON parser can be used for inspiration). I've written paradox parsers in C#, js, F#, and most recently (but not open sourced) rust. Each language has its own tradeoffs, so I don't think there'd be one solution to rule them all.
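To illustrate the handwritten recursive descent approach, here is a minimal sketch for a small subset of the format; it is not jomini's parser. It covers `key=value` pairs, quoted and unquoted scalars, and `{ }` blocks holding either an object or an array, and it leaves out comments, other operators, escapes, and number parsing. Note where repeated keys are folded into an array: that is exactly the step where the information needed for serialization is lost.

```javascript
// Split text into "{", "}", "=", quoted strings, and bare words.
function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (/\s/.test(ch)) {
      i++;
    } else if (ch === "{" || ch === "}" || ch === "=") {
      tokens.push({ kind: ch });
      i++;
    } else if (ch === '"') {
      const end = text.indexOf('"', i + 1); // no escape handling
      if (end === -1) throw new SyntaxError("unterminated string");
      tokens.push({ kind: "quoted", value: text.slice(i + 1, end) });
      i = end + 1;
    } else {
      let j = i;
      while (j < text.length && !/[\s{}="]/.test(text[j])) j++;
      tokens.push({ kind: "bare", value: text.slice(i, j) });
      i = j;
    }
  }
  return tokens;
}

function parse(text) {
  const tokens = tokenize(text);
  let pos = 0;
  const isScalar = (t) => t && (t.kind === "bare" || t.kind === "quoted");

  // value := scalar | "{" block "}"
  function parseValue() {
    const tok = tokens[pos++];
    if (tok && tok.kind === "{") {
      const block = parseBlock();
      if (!tokens[pos] || tokens[pos].kind !== "}") throw new SyntaxError("missing }");
      pos++; // consume "}"
      return block;
    }
    if (isScalar(tok)) return tok.value;
    throw new SyntaxError(`unexpected ${tok ? tok.kind : "end of input"}`);
  }

  // block := (key "=" value | value)* up to "}" or end of input
  function parseBlock() {
    const obj = {};
    const arr = [];
    const repeated = new Set();
    while (pos < tokens.length && tokens[pos].kind !== "}") {
      const tok = tokens[pos];
      if (isScalar(tok) && tokens[pos + 1] && tokens[pos + 1].kind === "=") {
        pos += 2; // consume key and "="
        const key = tok.value;
        const value = parseValue();
        if (!Object.prototype.hasOwnProperty.call(obj, key)) {
          obj[key] = value;
        } else if (repeated.has(key)) {
          obj[key].push(value);
        } else {
          obj[key] = [obj[key], value]; // repeated key: the lossy step
          repeated.add(key);
        }
      } else {
        arr.push(parseValue());
      }
    }
    const hasKeys = Object.keys(obj).length > 0;
    if (hasKeys && arr.length > 0) throw new SyntaxError("mixed object/array block");
    return hasKeys || arr.length === 0 ? obj : arr;
  }

  const doc = parseBlock();
  if (pos !== tokens.length) throw new SyntaxError("unexpected }");
  return doc;
}
```

For example, `parse('foo=1\nfoo=2\nbar={ 1 2 }')` returns `foo` and `bar` as the same `["1", "2"]` array, even though the source wrote them differently.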

@soryy708
Contributor

Why a recursive descent parser?
I've written a parser in C++ that achieves this with regular expressions, which proves that the language is regular. What files did you look at when deciding it's a context free grammar?

@soryy708
Contributor

I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization.
As a start, I've ported my C++ parser to JS.
https://github.com/soryy708/jomini/tree/parser
There's still some work to be done on the parser and tokenizer, so that the tests will pass.

@nickbabcock
Owner

nickbabcock commented Jan 3, 2020

> What files did you look at when deciding it's a context free grammar?

I'm not too well versed in computer science terminology, but I believe it is not a regular language, as the format allows arbitrary nesting of delimiters (objects can contain arrays of objects repeatedly). It's the same reason why JSON is not regular.

> I've begun a hand-rewrite of the parser, so that the output is optionally instrumented in a way that allows unambiguous serialization.
> As a start, I've ported my C++ parser to JS.

Excellent. I'm more than happy to see what you're thinking.
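On the regular-versus-context-free question above, a minimal sketch of why nesting is the sticking point: checking that braces balance requires a counter that can grow without bound, while a classical regular expression (a finite automaton) only has a fixed number of states. Regular expressions remain a good fit for splitting the input into tokens; it is the nesting that needs a stack or recursion. For brevity this sketch ignores braces inside quoted strings, and `nested` is a hypothetical helper for generating test input.

```javascript
// Return the deepest brace nesting level, rejecting unbalanced input.
// The depth counter is unbounded, which a finite automaton cannot emulate.
function maxBraceDepth(text) {
  let depth = 0;
  let max = 0;
  for (const ch of text) {
    if (ch === "{") {
      depth++;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth--;
      if (depth < 0) throw new SyntaxError("unbalanced }");
    }
  }
  if (depth !== 0) throw new SyntaxError("unbalanced {");
  return max;
}

// Every extra level of nesting is still a valid document: a={ b={ b=1 } }
const nested = (n) => "a=" + "{ b=".repeat(n) + "1" + " }".repeat(n);

console.log(maxBraceDepth(nested(3)));
```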

@soryy708
Contributor

soryy708 commented Jan 4, 2020

Apparently you've also made some hand-rolling progress a while ago:
https://github.com/nickbabcock/jomini/tree/handroll

@soryy708
Contributor

soryy708 commented Jan 9, 2020

Apparently this used to be powered by a hand-rolled parser before. parser.js (4c7ece2)
Why was it migrated to Jison?

@soryy708
Contributor

soryy708 commented Jan 9, 2020

Someone made an F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests

@soryy708
Contributor

soryy708 commented Jan 9, 2020

Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/

@nickbabcock
Owner

> Apparently this used to be powered by a hand-rolled parser before. parser.js (4c7ece2)
> Why was it migrated to Jison?

Haha, who knew!? Forgot that the commit is from 5 years ago. Looks like I may need to write more descriptive commit messages 😆

My assumption, looking at those commits, is that jison provided an easier API for development and users at that time. In hindsight, I wish I had iterated on the handrolled version, as jison seems unmaintained and a bit baroque, but oh well 🤷‍♂

> Someone made an F# implementation here: https://github.com/tboby/cwtools/tree/master/CWToolsTests

> Someone made a Python implementation here: https://github.com/Shadark/ClauseWizard/

Yeah there are a lot of parsers out there. I've written my own fair share (C#, C# (2), F#, this one, and other closed sourced implementations). Writing parsers for games you love is a great excuse to program 😄

@soryy708
Contributor

Are any of these parsers fit for the purpose of unambiguous conversion from JSON back to Clausewitz format? Maybe the cheapest solution is to make a binding between C# and JavaScript (with edge and/or node-gyp)

@nickbabcock
Owner

The latest release uses a parser that is functionally lossless, so it would be possible to write out a structure (but not from a JS object).

It would be something along the lines of:

const out = parser.parseText(data, {}, (q) => {
  // update an EU4 save so that the player is england
  q.at("/player", "ENG");
  return q.writeTo(/* a writable stream? */);
});

While this feature can now be implemented on top of the latest release, I don't have a personal drive to implement it, so for now it will have to come from the community. I'm happy to guide anyone through the process if they decide to take up this mantle, but until there is a volunteer, I'm going to close this issue.

@soryy708
Contributor

soryy708 commented Oct 4, 2020

@nickbabcock sounds good, and I have some interest in implementing that. I don't know how to interface with your WebAssembly implementation, though. Does it have documentation?

@nickbabcock
Owner

Excellent, I'll reopen the issue for further discussion.

The underlying parser has documentation.

One can derive inspiration from the code bases that convert binary data to plain text:

The binary data has a slightly different format so it won't be one to one but both text and binary formats functionally behave the same.

@nickbabcock reopened this on Oct 4, 2020
@CharacterOverflow

I too started to take a peek into this. Unfortunately, I don't have a ton of experience, especially with WebAssembly, and I've been pretty lost trying to make this change.

I noticed @nickbabcock that another library of yours implements this feature: https://github.com/nickbabcock/Pdoxcl2Sharp

I'm considering using C# just for this feature in a tool I'm creating, but figured I'd ask if there's any kind of update coming on this soon or if there's a way I can help.

@nickbabcock
Owner

The issue with converting js objects is that some fields will need to be enriched so that they can be written out properly. For instance, we'd want an object like

{
  army: Inflate([{ name: Quoted("army1") }, { name: Quoted("army2") }]),
  type: "western",
  cores: [Quoted("ENG"), Quoted("FRA")]
}

in order to write out:

army={ name="army1" }
army={ name="army2" }
type=western
cores={ "ENG" "FRA" }

For the sake of ergonomics, the object returned from parsing is currently not enriched. I would need to investigate how one could provide these enriched types without sacrificing ergonomics or performance. Feel free to share ideas or suggestions.
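A minimal sketch of how such wrappers might drive a writer. `Quoted` and `Inflate` are the names used in the comment above; their implementation here, and the `writeDocument`/`writeFields`/`writeValue` helpers, are assumptions. Unwrapped strings are written bare, arrays become brace-delimited lists, and `Inflate` expands to one `key=` entry per element.

```javascript
// Hypothetical enrichment wrappers, named after the comment above.
class QuotedValue {
  constructor(value) {
    this.value = value;
  }
}
class InflatedArray {
  constructor(items) {
    this.items = items;
  }
}
// Factory functions so the wrappers can be called without `new`.
const Quoted = (value) => new QuotedValue(value);
const Inflate = (items) => new InflatedArray(items);

// Write one value. Quotes inside strings are not escaped in this sketch.
function writeValue(value) {
  if (value instanceof InflatedArray) {
    throw new TypeError("Inflate is only meaningful as a field value");
  }
  if (value instanceof QuotedValue) return `"${value.value}"`;
  if (Array.isArray(value)) return `{ ${value.map(writeValue).join(" ")} }`;
  if (value !== null && typeof value === "object") {
    return `{ ${writeFields(value).join(" ")} }`;
  }
  return String(value);
}

// One "key=value" string per field; Inflate repeats the key per element.
function writeFields(obj) {
  const out = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value instanceof InflatedArray) {
      for (const item of value.items) out.push(`${key}=${writeValue(item)}`);
    } else {
      out.push(`${key}=${writeValue(value)}`);
    }
  }
  return out;
}

const writeDocument = (obj) => writeFields(obj).join("\n");
```

Passing the enriched object from the comment above to `writeDocument` produces the four lines shown there, with `army` written once per element.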

@nickbabcock
Owner

I created a PR to allow one to create PDS text documents: #59

Please let me know your feedback and if that PR would close this issue.

@Clashsoft

I have some basic code for writing arbitrary objects, in case anyone finds it useful.
The constants at the start are somewhat game-specific, but can be adapted.
Here I have what works for Stellaris custom empire designs.

const FLAT_ARRAY_KEYS = [
  'ethic',
  'trait',
];
const UNQUOTED_KEYS = [
  'gender',
];

/**
 * @param writer {Writer}
 * @param key {string}
 * @param value {any}
 */
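function writeKeyValue(writer, key, value) {
  // Skip empty values: writing the key and '=' with nothing after them
  // would produce a malformed document.
  if (value === null || value === undefined) {
    return;
  }
  // Simple identifiers are written bare; anything else is quoted.
  if (/^[a-zA-Z_]+$/.test(key)) {
    writer.write_unquoted(key);
  } else {
    writer.write_quoted(key);
  }
  writer.write_operator('=');
  writeAny(writer, value, key);
}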

/**
 * @param writer {Writer}
 * @param obj {object}
 */
function writeObject(writer, obj) {
  writer.write_object_start();
  writeEntries(writer, obj);
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {object}
 */
function writeEntries(writer, obj) {
  for (const [key, value] of Object.entries(obj)) {
    if (FLAT_ARRAY_KEYS.includes(key) && Array.isArray(value)) {
      for (const item of value) {
        writeKeyValue(writer, key, item);
      }
    } else {
      writeKeyValue(writer, key, value);
    }
  }
}

/**
 * @param writer {Writer}
 * @param obj {Array}
 */
function writeArray(writer, obj) {
  writer.write_array_start();
  for (const item of obj) {
    writeAny(writer, item);
  }
  writer.write_end();
}

/**
 * @param writer {Writer}
 * @param obj {any}
 * @param key {string}
 */
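function writeAny(writer, obj, key = undefined) {
  if (Array.isArray(obj)) {
    writeArray(writer, obj);
    return;
  }
  switch (typeof obj) {
    case 'string':
      // Some game-specific keys (see UNQUOTED_KEYS) expect bare identifiers.
      if (UNQUOTED_KEYS.includes(key)) {
        writer.write_unquoted(obj);
      } else {
        writer.write_quoted(obj);
      }
      break;
    case 'number':
      // Integers and floats have separate writer methods.
      if (Number.isInteger(obj)) {
        writer.write_integer(obj);
      } else {
        writer.write_f64(obj);
      }
      break;
    case 'boolean':
      writer.write_bool(obj);
      break;
    case 'object':
      if (obj instanceof Date) {
        writer.write_date(obj);
      } else if (obj) {
        writeObject(writer, obj);
      }
      // null is skipped
      break;
    case 'undefined':
      // Nothing to write (e.g. a hole in a sparse array).
      break;
    default:
      // bigint, symbol, and function values have no mapping here; fail loudly
      // instead of silently dropping them.
      throw new TypeError(`Cannot write value of type ${typeof obj}` + (key ? ` for key "${key}"` : ''));
  }
}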
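The helpers above expect a jomini `Writer` instance from the WebAssembly package. For trying them out without it, a stand-in that records the same method calls as text may help. `TextWriterStub` is hypothetical: it only mimics the method names used above, its formatting choices (`yes`/`no` booleans, `year.month.day` dates, three-decimal floats like the `1.000` example earlier in this thread) are assumptions, and it makes no attempt to validate call order the way a real writer might.

```javascript
// Hypothetical stand-in for jomini's Writer, for exercising the helpers
// above without the Wasm package. Output formatting is an assumption.
class TextWriterStub {
  constructor() {
    this.tokens = [];
  }
  write_unquoted(value) { this.tokens.push(String(value)); }
  write_quoted(value) { this.tokens.push(`"${value}"`); }
  write_operator(op) { this.tokens.push(op); }
  write_object_start() { this.tokens.push("{"); }
  write_array_start() { this.tokens.push("{"); }
  write_end() { this.tokens.push("}"); }
  write_integer(n) { this.tokens.push(String(n)); }
  // Fixed three decimals, e.g. 1.5 -> 1.500.
  write_f64(n) { this.tokens.push(n.toFixed(3)); }
  write_bool(b) { this.tokens.push(b ? "yes" : "no"); }
  // year.month.day, read in UTC to avoid local timezone shifts.
  write_date(d) {
    this.tokens.push(`${d.getUTCFullYear()}.${d.getUTCMonth() + 1}.${d.getUTCDate()}`);
  }
  // Space-separated tokens; no attempt at pretty-printing.
  toString() { return this.tokens.join(" "); }
}
```

With the helpers above loaded, `const w = new TextWriterStub(); writeEntries(w, design); String(w)` gives a quick textual preview of what would be written.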


6 participants