CSV inference fixes by mildbyte · Pull Request #562 · splitgraph/sgr

mildbyte · 2021-11-08T16:20:41Z

Make CSV parsing and inference more robust so that they don't crash on the examples from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html. Some of those files are malformed; for them we now output a table with padded or truncated rows instead of failing.

* Add `skipinitialspace=True` to the csv reader. This works around CSV files that use leading spaces in headers/fields and makes inference more tolerant (" 2" gets inferred as a number instead of a string). Example of such a file:

      col1,col2,col3
      1, 2, aa

* Only treat actual JSON objects as the JSON datatype. We had some false positives where a value like `"42"` parsed as valid JSON, even though it shouldn't be typed as JSON.
* Ignore empty rows at inference/query time.
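As a rough illustration of these three changes, here is a minimal sketch using only the Python standard library. It is not the code from this PR: `read_rows` and `looks_like_json_object` are hypothetical helper names, and treating JSON arrays as non-JSON is an assumption based on the wording "actual JSON objects".

```python
import csv
import io
import json

# Sample with leading spaces after the delimiter and one empty row.
SAMPLE = "col1,col2,col3\n1, 2, aa\n\n3, 4, bb\n"


def read_rows(text, skipinitialspace):
    """Parse CSV text and drop empty rows (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace)
    # csv.reader yields an empty list for a blank line; skip those.
    return [row for row in reader if row]


def looks_like_json_object(value):
    """Return True only if the value parses to a JSON object (a dict).

    Assumption: scalars such as 42 or "42" and arrays are not
    treated as the JSON datatype.
    """
    try:
        return isinstance(json.loads(value), dict)
    except (TypeError, ValueError):
        return False


strict_rows = read_rows(SAMPLE, skipinitialspace=False)
lenient_rows = read_rows(SAMPLE, skipinitialspace=True)

print(strict_rows[1])   # fields keep their leading spaces
print(lenient_rows[1])  # leading spaces are stripped by the reader
print(looks_like_json_object('"42"'), looks_like_json_object('{"a": 1}'))
```

With `skipinitialspace=False`, the second field stays `" 2"`, so a naive type check would see a string; with `skipinitialspace=True` it becomes `"2"` and can be inferred as a number.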

This is a tradeoff. We will now try to silently ignore errors in malformed or unusual CSV files, so a file whose data is shifted around can come back as a bunch of varchar columns. This is still better than erroring outright, since it gives the user some feedback and lets them change the parameters or work out how to fix their file.
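One way to picture the padding/truncation behaviour is the sketch below. It is an assumption-laden illustration, not the PR's implementation: `normalize_row` is a hypothetical name, and padding with an empty string (rather than, say, NULL) is a guess.

```python
def normalize_row(row, width, pad=""):
    """Force a parsed CSV row to exactly `width` fields (hypothetical helper).

    Short rows are padded on the right with `pad`; long rows are truncated.
    """
    if len(row) < width:
        return row + [pad] * (width - len(row))
    return row[:width]


header = ["col1", "col2", "col3"]
short_row = normalize_row(["1"], len(header))
long_row = normalize_row(["1", "2", "3", "4"], len(header))
print(short_row, long_row)
```

The cost is the one described above: a row that was malformed still lands in the table, possibly with values in the wrong columns, instead of raising an error.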

* Splitfile speedups (#567)
* Various query speedups (#563, #561)
* More robust CSV querying (#562)

Full set of changes: [`v0.2.17...v0.2.18`](v0.2.17...v0.2.18)

mildbyte added 4 commits November 8, 2021 13:53

Minor refactor + add a test for one of the people.sc.fsu.edu CSV files

f69b64b

Bump PostGIS.

5bb830a

mildbyte merged commit 10966c3 into master Nov 8, 2021

mildbyte deleted the bugfix/csv-inference-fixes branch November 8, 2021 16:24

mildbyte added a commit that referenced this pull request Nov 17, 2021

Bump version: 0.2.17 → 0.2.18

cc98bd8

