Bugfix: Databook Write #18

rbreslavsky · 2023-06-06T00:11:43Z

I noticed that when I have custom columns in my run matrix that use strings with a comma, CAPE cannot write databook files correctly. In cape/cfdx/dataBook.py/write(), the run matrix cells are correctly read and interpreted, but on the write command the comma inside the string is recognized as a delimiter. The result is that the strings are split into multiple columns. This breaks anything that needs to process databooks.

I wonder if there's a way to make sure that strings are recognized and not split up? Another approach could be to use pandas to read/write csv files, but that's probably a longer term solution. I'm sure it would allow you to eliminate some code.

nasa-ddalle · 2023-06-06T23:31:46Z

I remember encountering this problem years ago and deciding that our team would just use a different delimiter for run matrix keys, e.g 1.5,2.0,poweroff,bro|ta using | as a secondary delimiter. The problem seemed intractable b/c the DataBook read used np.loadtxt(), which completely ignored quotation marks. Looking at the current code, I don't think it will be hard to fix this, though.

nasa-ddalle · 2023-06-06T23:33:50Z

Should be fixed with commit e72d019 "Move updated DataBook line splitter to cape.util".

There's an update to test/001_cape/007_databook that makes sure the str with comma within can be read by the databook (see aero_fuselage.csv).

rbreslavsky · 2023-06-07T22:26:39Z

Thanks Derek. Now a list of strings, when read from a CSV, is encoded in CAPE as "['one', 'two']", which writes correctly to CSV format. However, a list of values would be encoded as as '[1, 2]'; this is not recognized as a string in CSV, and is still delimited (split) by the comma. What we'd want is something like "[1, 2]".

With that in mind, line 3481 of cfdx/dataBook.py might benefit from this change:
fmtj = '%r' --> fmtj = '%r' if "'" in vj else '"%r"'

The downside is that this will break if the number of apostrophes isn't 2. So maybe there's a more robust way to code it. What do you think?

nasa-ddalle · 2023-06-07T22:52:36Z

This makes me a bit uncomfortable b/c there's a silent transition from a list to a string. So in your example writing out [1, 2] would then cause it to be read back in as "[1, 2]". I can do it, but there's an argument you should just convert your lists to strings prior to writing... I can put it in though. As for the quotations worry, I'll just double-call repr(). For example repr(repr([1, 'a'])) -> "[1, 'a']", and repr(repr([1, 'a "b"'])) -> '[1, \'a "b"\']'.

nasa-ddalle · 2023-06-07T23:07:47Z

Ok, I pushed a new commit d1a6a313 that's supposed to fix this. I didn't update the test yet, but it should work as you described. I'm not sure what happens when you read something like '[1, \'a "b"\']' back into a new databook instance, though. In particular, if that column is expecting an int or float, I think it will raise an exception. If that column was already set to be a str, the solution should be correct.

rbreslavsky · 2023-06-08T00:24:56Z

Just to be clear, all of the lists in my run matrix are indeed written as strings. I've been using pandas to generate the run matrix in the first place, and it writes to CSV in the correct manner. So all I'm trying to do is make sure that when my matrix columns are read in and written to the databooks, they're still strings (as recognized by CSVs).

nasa-ddalle · 2023-06-08T07:23:26Z

Ok, so I now understand the issue to be that single quotes from Python's repr() are not going over well. So for example writing the string "[1, 2]" to file with the text '[1, 2]' has compatibility issues. With that in mind, bad8e2d uses json.dumps() instead of repr(). That will always produce double quotes for the outer string definition while still being able to escape quotes within, e.g. "[1, \"a\"]".

nasa-ddalle closed this as completed Jun 8, 2023

nasa-ddalle mentioned this issue Jul 6, 2023

DataBook read issue #20

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix: Databook Write #18

Bugfix: Databook Write #18

rbreslavsky commented Jun 6, 2023

nasa-ddalle commented Jun 6, 2023

nasa-ddalle commented Jun 6, 2023

rbreslavsky commented Jun 7, 2023 •

edited

nasa-ddalle commented Jun 7, 2023

nasa-ddalle commented Jun 7, 2023

rbreslavsky commented Jun 8, 2023 •

edited

nasa-ddalle commented Jun 8, 2023

Bugfix: Databook Write #18

Bugfix: Databook Write #18

Comments

rbreslavsky commented Jun 6, 2023

nasa-ddalle commented Jun 6, 2023

nasa-ddalle commented Jun 6, 2023

rbreslavsky commented Jun 7, 2023 • edited

nasa-ddalle commented Jun 7, 2023

nasa-ddalle commented Jun 7, 2023

rbreslavsky commented Jun 8, 2023 • edited

nasa-ddalle commented Jun 8, 2023

rbreslavsky commented Jun 7, 2023 •

edited

rbreslavsky commented Jun 8, 2023 •

edited