
Fixes to CSV encoding/line endings/dialect inference#432

Merged
mildbyte merged 5 commits into master from feature/csv-encoding-inference
Apr 7, 2021

Conversation

Contributor

@mildbyte mildbyte commented Apr 7, 2021

  • Autodetect the encoding using chardet
  • Add more configuration to the CSV plugin for: encoding, dialect (e.g. "excel"), sample size for inference
  • Bump the sample size to 64KB to have a better chance of inferring the dialect for wider tables
  • Autogenerate column names for unnamed columns
  • Handle Mac-style and other newlines (universal newlines mode)
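The inference steps above can be sketched roughly as follows. This is an illustrative helper, not the plugin's actual code: it detects the encoding with chardet (falling back to UTF-8 if chardet is unavailable), sniffs the dialect from a 64KB sample with `csv.Sniffer`, and reads the file with untranslated newlines so the `csv` module can handle `\n`, `\r\n` and Mac-style bare `\r` endings itself. The `infer_csv_settings` name and `SAMPLE_SIZE` constant are assumptions for the sketch.

```python
import csv
import io

try:
    import chardet  # third-party; the PR uses it for encoding detection
except ImportError:  # fall back so the sketch still runs without it
    chardet = None

SAMPLE_SIZE = 64 * 1024  # 64KB inference sample, as in this PR


def infer_csv_settings(raw: bytes, sample_size: int = SAMPLE_SIZE):
    """Guess the encoding and CSV dialect from the first sample_size bytes."""
    sample = raw[:sample_size]
    if chardet is not None:
        encoding = chardet.detect(sample)["encoding"] or "utf-8"
    else:
        encoding = "utf-8"
    dialect = csv.Sniffer().sniff(sample.decode(encoding, errors="replace"))
    return encoding, dialect


# Example: Mac-style (\r) line endings and a ';' delimiter
raw = "name;city\rAlice;Paris\rBob;Lyon\r".encode("utf-8")
encoding, dialect = infer_csv_settings(raw)

# newline="" keeps line endings untranslated so the csv module can
# recognise \n, \r\n and bare \r itself (universal-newline behaviour)
reader = csv.reader(io.StringIO(raw.decode(encoding), newline=""), dialect)
rows = list(reader)
```

A larger sample gives `Sniffer` more rows to vote on, which is why bumping it helps wider tables: a 1KB sample of a 300-column file may not even contain one complete row.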

mildbyte added 5 commits April 7, 2021 12:59
  * …to a separate module. Get the CSV plugin to also infer the file's encoding and handle Windows line endings properly. Also make the sample size for inference customizable.
  * ….g. col_1) since PG doesn't like empty column names. Add an integration test for end-to-end querying + import through FDW with an unnamed column.
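The second commit's column-name generation can be sketched as below. This is a hypothetical helper, not the PR's actual code; it assumes the generated names use the column's 1-based position (so an empty second header cell becomes `col_2`), since PostgreSQL rejects empty column identifiers.

```python
def generate_column_names(header):
    """Replace empty header cells with positional names like col_1."""
    return [
        name if name else f"col_{i}"
        for i, name in enumerate(header, start=1)
    ]


# An unnamed second and fourth column get positional names:
names = generate_column_names(["id", "", "price", ""])
```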
@mildbyte mildbyte merged commit 09e0f56 into master Apr 7, 2021
@mildbyte mildbyte deleted the feature/csv-encoding-inference branch April 7, 2021 15:24
mildbyte added a commit that referenced this pull request Apr 7, 2021
  * Fixes to the Snowflake data source (#421)
  * Add automatic encoding, newline and dialect inference to the CSV data source (#432)