Happy to keep this closed if you think it should be. I guess the workaround for our needs would be to parse the data once to grab the keys and then use them with the Table class (via the Dialect).
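The first pass of that workaround can be sketched with the standard library alone. This is only a minimal sketch, not frictionless code: `collect_keys` is a hypothetical helper and the sample rows are invented for illustration. Passing the resulting keys to the Table class through the Dialect is not shown here.

```python
import json
from typing import Iterable, List


def collect_keys(lines: Iterable[str]) -> List[str]:
    """Return the union of keys across all JSON Lines records, in first-seen order."""
    keys = {}  # a dict preserves insertion order, so it doubles as an ordered set
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key in record:
            keys.setdefault(key, None)
    return list(keys)


# Hypothetical rows: only the last record carries the extra key.
sample = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c", "new_col": "x"}',
]
print(collect_keys(sample))
```

For a file on disk, the same helper can be fed an open file handle, for example `with open("test_data1.jsonl") as f: keys = collect_keys(f)`.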
However, you get the same skipping of keys when using describe, which I would expect to catch all columns of your data.
from frictionless import describe_schema
schema = describe_schema("test_data1.jsonl")
schema  # No new_col in the schema
For example, when using pandas you get the extra column. My assumption is that it must parse the data twice; I am not sure whether you would want to do the same for describe, or offer a "greedy" option to scan the data.
Overview
Additional keys (columns) in a JSONL file are dropped if they do not exist in the first JSON object of the file. I have two files:
test_data1.jsonl
test_data2.jsonl (same as test_data1.jsonl but with row 1 and row 3 switched)
When the files are read in via the Table class, it seems to define the columns based on the first row. So table 1 is read incorrectly (the extra column is dropped), but table 2 is read correctly because the key exists in the first row and is null in subsequent rows.
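The effect described above can be reproduced in a few lines of plain Python. This is a sketch of the suspected behaviour under the assumption that the header comes from the first record only; it is not the library's actual implementation, `read_with_first_row_header` is a hypothetical name, and the rows are made-up stand-ins for the two test files.

```python
import json


def read_with_first_row_header(lines):
    """Build a table whose columns are taken from the first record only."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return [], []
    header = list(records[0])  # columns fixed by the first record
    # Keys missing from a record become None; keys outside the header are dropped.
    rows = [[record.get(key) for key in header] for record in records]
    return header, rows


# Hypothetical stand-in for test_data1.jsonl: extra key appears only in row 3.
table1 = [
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c", "new_col": "x"}',
]
# Hypothetical stand-in for test_data2.jsonl: rows 1 and 3 switched.
table2 = [table1[2], table1[1], table1[0]]

header1, rows1 = read_with_first_row_header(table1)
header2, rows2 = read_with_first_row_header(table2)
print(header1)  # new_col is missing
print(header2)  # new_col is present; later rows hold None for it
```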
Outputs:
Please preserve this line to notify @roll (lead of this repository)