Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion of None to "null" string inconsistency #23

Open
himbeles opened this issue May 25, 2021 · 5 comments
Open

Conversion of None to "null" string inconsistency #23

himbeles opened this issue May 25, 2021 · 5 comments

Comments

@himbeles
Copy link

Currently, rtoml converts fields with value None to a value of string "null" in the dumped string.
Loading the string back in results in a "null" string in python instead of None.

I can only imagine two possible correct behaviors for loading "null" strings instead:

  • "null" strings should be converted to None in python, or
  • fields with value "null" should not be imported at all.

Conversely, for dumping None values:

  • convert to null (without the quotation marks) -- representing a json null value
  • do not dump fields with value None at all

Here are two tests demonstrating the failing scenario with rtoml and the passing example using toml:

import rtoml

def test_none_to_null_string():
    """ fails with 

    ```AssertionError: assert {'optional_field': None} == {'optional_field': 'null'}```

    because None is dumped to "null" string value which is also loaded back in as the string "null".
    """
    d = {"optional_field": None}
    d_goal1 = d
    d_goal2 = {}
    dumped = rtoml.dumps(d)
    loaded = rtoml.loads(dumped)
    assert loaded == d_goal1 or loaded == d_goal2

def test_none_to_null_string_passing_toml_package():
    """ passes with toml packages, 
    because fields with value None are not dumped.
    """
    import toml
    d = {"optional_field": None}
    d_goal1 = d
    d_goal2 = {}

    dumped = toml.dumps(d)
    loaded = toml.loads(dumped)
    assert loaded == d_goal1 or loaded == d_goal2
@ma1ex
Copy link

ma1ex commented Sep 12, 2021

I can only imagine two possible correct behaviors for loading "null" strings instead:

* "null" strings should be converted to `None` in python, or

* fields with value "null" should not be imported at all.

Conversely, for dumping None values:

* convert to null (without the quotation marks) -- representing a json null value

* do not dump fields with value `None` at all

In general, the TOML v1.0.0 specification does not allow serialization/deserialization of NoneType. Therefore, NoneType is converted to a string type with a value of 'null'.

Until the author decides to add such functionality, it is possible to simulate the expected behavior at the Python level, for example like this:

import traceback
from pathlib import Path
from typing import Dict, Any

import rtoml


def load_config(toml_path: str, encoding='utf-8-sig', errors='ignore') -> Dict[str, Any]:
    """
    Custom loader for deserializing a TOML file with recursively
    replaced 'null' strings in Python NoneType

    :param toml_path: string path;
    :param encoding: source TOML encoding (Default opens UTF-8 with/without BOM);
    :param errors: 'strict' or 'ignore' encoding errors;

    :return: Deserialized TOML as Python dict
    """

    def dict_replace_null(_dict: dict) -> dict:
        x = {}
        for k, v in _dict.items():
            if isinstance(v, dict):
                v = dict_replace_null(v)
            elif isinstance(v, list):
                v = list_replace_null(v)
            elif isinstance(v, str) and v == 'null':
                v = None
            x[k] = v
        return x

    def list_replace_null(_list: list) -> list:
        x = []
        for e in _list:
            if isinstance(e, list):
                e = list_replace_null(e)
            elif isinstance(e, dict):
                e = dict_replace_null(e)
            elif isinstance(e, str) and e == 'null':
                e = None
            x.append(e)
        return x

    try:
        data = rtoml.load(Path(toml_path).read_text(
            encoding=encoding,
            errors=errors
        ))

        return dict_replace_null(data)

    except FileNotFoundError:
        print(traceback.format_exc())  # Del 'print()' and wrap in a logging

        return {}  # Return empty dict without stopping the program

@samuelcolvin
Copy link
Owner

samuelcolvin commented Sep 12, 2021

toml-lang/toml#30 is the discussion about this for the TOML spec.

I happen to disagree with them, but that's doesn't really matter much.

If we follow their suggestions we should remove any Nones, but that still won't provide correct round tripping: [1, 2, 3, None, 5] would become [1, 2, 3, 5] after a round trip which is arguable more confusing than [1, 2, 3, 'null', 5], and the same with dicts.

I therefore think we should either:

  1. leave it as is - people can add their own logic as @ma1ex suggests
  2. add a none_string argument to load() and dump() which when set means that string is interpreted as None (and visa versa) - this would allow loss-less round trips

I guess we could also have another argument "omit_none" to dump() to just omit None as suggested in the discussion above, but that's kind of a separate feature.

I'm a bit busy, but happy to review a PR to add none_string and/or omit_none.

@ma1ex
Copy link

ma1ex commented Sep 12, 2021

I happen to disagree with them, but that's doesn't really matter much.

I am too!

If we follow their suggestions we should remove any Nones, but that still won't provide correcting round tripping: [1, 2, 3, None, 5] would become [1, 2, 3, 5] after a round trip which is arguable more confusing than [1, 2, 3, 'null', 5], and the same with dicts.

But then such behavior would break the logic, IMHO. Perhaps not everyone needs this functionality, but having it allows to migrate from JSON painlessly.

I therefore think we should either:

2. add a `none_string` argument to `load()` and `dump()` which when set means that string is interpreted as `None` (and visa versa) - this would allow loss-less round trips

Personally, I prefer this option, but at the level of Rust to preserve the integrity of the code and performance.

I guess we could also have another argument "omit_none" to dump() to just omit None as suggested in the discussion above, but that's kind of a separate feature.

I still think that "omit_none" in the case of Python is not a very useful functionality and will only confuse people and allow mistakes to be made.

I'm a bit busy, but happy to review a PR to add none_string and/or omit_none.

If you don't mind, I can suggest PR but at Python code level - implement my example in rtoml\__init__.py, as my knowledge of Rust, unfortunately, does not yet allow me to fully code in it ))

@karolzlot
Copy link

I happen to disagree with them, but that's doesn't really matter much.

@samuelcolvin I also disagree, not having proper None really hurts usability of toml format.
As we probably can't change official specification, the only solution is to have option to enable/disable various modes of rtoml operation (so people who want to have official behavior can have it).


I think the best solution is to have at least those two modes:

  • Mode "no-null" strictly following the specification
  • Mode "json-null" using null value similar to one in json, so without quotation marks as @ma1ex suggested:

Conversely, for dumping None values:

* convert to null (without the quotation marks) -- representing a json null value
  • Mode "string-null" is also possible, but I think it isn't good replacement for "json-null" mode.

This "json-null" would be easier to use than "string-null" for uses cases similar to json / yaml use cases.
It can always happen that somebody wants to have None and "None" as separate values.


Having all 3 of those modes available of course won't hurt.

@karolzlot
Copy link

karolzlot commented Apr 19, 2022

@ma1ex I would like to give a little feedback to your solution you provided in #23 (comment)

In this case, as you need to modify only the current item, it is safe to modify directly dict/list which you are iterating over without creating second dict/list.

Not sure if it would save any memory though, because I suspect that Python isn't doubling anything in memory anyway. (?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants