Implement CSV Storage, #1280
Currently it only fails on the schemaless tests.
Now CsvStorage passes all the tests in the test suite.
Add the case with a schema but no types file, and the case with both a schema and a types file.
};

let mut data_rdr = csv::Reader::from_path(data_path).map_storage_err()?;
let mut fetch_data_header_columns = || -> Result<Vec<String>> {
It looks like a plain immutable variable would be enough here, instead of a mutable closure:
let fetch_data_header_columns = data_rdr
.headers()
.map_storage_err()?
.into_iter()
.map(|header| header.to_string())
.collect::<Vec<_>>();
Oh... if it was intended to run in only 2 of the 3 conditions, the mutable closure would be right.
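To illustrate the point about running in only some of the branches, here is a minimal sketch. `FakeReader`, `Branch`, and `columns_for` are hypothetical stand-ins, not names from this PR, and the three branches are placeholders for the actual conditions, which are not shown in this thread. The closure borrows the reader mutably, so it has to be declared `mut`, and the header read happens only when a branch actually calls it.

```rust
/// Hypothetical stand-in for a CSV reader whose header access needs `&mut self`.
struct FakeReader {
    header_reads: u32,
}

impl FakeReader {
    fn headers(&mut self) -> Vec<String> {
        // Count every header read so laziness can be observed.
        self.header_reads += 1;
        vec!["id".to_string(), "name".to_string()]
    }
}

/// Placeholder for the three conditions mentioned in the review.
#[derive(Clone, Copy)]
enum Branch {
    A,
    B,
    C,
}

fn columns_for(branch: Branch, reader: &mut FakeReader) -> Vec<String> {
    // The closure mutably borrows `reader`, so the binding must be `mut`.
    // Nothing is read until the closure is called.
    let mut fetch_header_columns = || reader.headers();

    match branch {
        // Two of the three branches need the header row.
        Branch::A | Branch::B => fetch_header_columns(),
        // The third branch never touches the reader.
        Branch::C => Vec::new(),
    }
}
```

With an eagerly computed variable, the header row would be read in all three branches, including the one that does not need it.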
.chain(rows)
.collect();

self.write(table_name, columns, rows)
How about using mut and extend?
let mut appended_rows = prev_rows
.map(|result| Ok(result?.1))
.collect::<Result<Vec<_>>>()?;
appended_rows.extend(rows);
self.write(table_name, columns, appended_rows)
How about
let rows = prev_rows
.map(|item| item.map(|(_, row)| row))
.chain(rows.into_iter().map(Ok))
.collect::<Result<Vec<_>>>()?;
this? Then we can merge the two sets of rows with a single collect.
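A self-contained sketch of the single-collect idea, under assumptions: `StorageResult`, `Row`, `merge_rows`, and the `usize` key type are hypothetical names chosen for illustration, and the error type is simplified to `String`. Existing rows arrive as fallible `(key, row)` items, new rows are wrapped in `Ok`, and one `collect` into a `Result` stops at the first error.

```rust
// Simplified, hypothetical types; the real storage uses its own Result and row types.
type StorageResult<T> = std::result::Result<T, String>;
type Row = Vec<String>;

fn merge_rows(
    prev_rows: impl Iterator<Item = StorageResult<(usize, Row)>>,
    rows: Vec<Row>,
) -> StorageResult<Vec<Row>> {
    prev_rows
        // Drop the key and keep only the row, preserving any error.
        .map(|item| item.map(|(_, row)| row))
        // New rows cannot fail, so wrap each one in Ok to match the item type.
        .chain(rows.into_iter().map(Ok))
        // Collecting into a Result short-circuits on the first Err.
        .collect()
}
```

Compared with the `mut` plus `extend` variant, this avoids a mutable binding, at the cost of wrapping every new row in `Ok`.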
Rather than collecting all rows into memory, it now writes to tmp files. After all read and write tasks have finished, the tmp_* files are renamed to the data and types files.
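The write-then-rename approach could look roughly like the following. This is a sketch using only the standard library, not the code in this PR: `write_via_tmp` is a hypothetical helper, rows are simplified to pre-formatted lines, and the `tmp_` prefix placement is an assumption based on the `tmp_*` naming above.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Streams `lines` into a sibling `tmp_<name>` file, then renames it over `data_path`.
fn write_via_tmp<I>(data_path: &Path, lines: I) -> io::Result<()>
where
    I: IntoIterator<Item = String>,
{
    let file_name = data_path
        .file_name()
        .and_then(|name| name.to_str())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid data path"))?;
    // Assumed naming: the tmp file lives next to the target, e.g. tmp_Book.csv.
    let tmp_path = data_path.with_file_name(format!("tmp_{file_name}"));

    {
        // Rows are written one at a time, so they never all sit in memory.
        let mut writer = BufWriter::new(File::create(&tmp_path)?);
        for line in lines {
            writeln!(writer, "{line}")?;
        }
        writer.flush()?;
    } // The tmp file is closed here, before the rename.

    // Only after every row has been written does the tmp file replace the target.
    fs::rename(&tmp_path, data_path)
}
```

Because the target file is replaced only at the end, a failure partway through the write leaves the existing data file untouched.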
Add CSVStorage Functionality
Introducing CSVStorage: a utility to process *.csv files, enabling SQL-like query operations such as SELECT, INSERT, and UPDATE.
Key Features:
SQL Queries on CSV: Directly parse and operate on *.csv files using familiar SQL query operations.
Optional Schema Support: An associated schema can be provided for each CSV file. For instance, for a data file named Book.csv, its corresponding schema file should be named Book.sql.
Type Info File for Schemaless Data: An auxiliary types file (*.types.csv) can be used to support data type recognition for schemaless data. For a data file named Book.csv, its corresponding types file will be Book.types.csv.
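The naming convention above can be sketched with the standard library alone. `schema_path` and `types_path` are hypothetical helper names, not functions from this PR; they only show how the sibling file names follow from the data file name.

```rust
use std::path::{Path, PathBuf};

/// Book.csv -> Book.sql (the optional schema file).
fn schema_path(data_path: &Path) -> PathBuf {
    data_path.with_extension("sql")
}

/// Book.csv -> Book.types.csv (the optional types file for schemaless data).
fn types_path(data_path: &Path) -> PathBuf {
    // The replacement extension may itself contain a dot.
    data_path.with_extension("types.csv")
}
```

For example, `schema_path(Path::new("data/Book.csv"))` yields `data/Book.sql`, and `types_path` on the same input yields `data/Book.types.csv`.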