Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CSV Storage, #1280

Merged
merged 28 commits into from Aug 15, 2023
Merged

Implement CSV Storage, #1280

merged 28 commits into from Aug 15, 2023

Conversation

panarch
Copy link
Member

@panarch panarch commented Jul 13, 2023

Add CSVStorage Functionality

Introducing CSVStorage: a utility to process *.csv files, enabling SQL-like query operations such as SELECT, INSERT, and UPDATE.

Key Features:

  1. SQL Queries on CSV: Directly parse and operate on *.csv files using familiar SQL query operations.

  2. Optional Schema Support: An associated schema can be provided for each CSV file. For instance, for a data file named Book.csv, its corresponding schema file should be named Book.sql.

    • If an associated schema file is found, it will be read and applied.
    • In the absence of a schema file, the first row of the data file will be treated as column headers and all column types will default to TEXT.
  3. Type Info File for Schemaless Data: An auxiliary types file (*.types.csv) can be used to support data type recognition for schemaless data.

    • For a CSV data file named Book.csv, its corresponding types file will be Book.types.csv.
    • The types file will have a 1:1 mapping with the CSV data file entries, specifying the data type for each entry in alignment with the GlueSQL conventions.

Currently it only fails on schemaless tests
@panarch panarch added the enhancement New feature or request label Jul 13, 2023
@panarch panarch self-assigned this Jul 13, 2023
@github-actions github-actions bot requested review from devgony and ever0de July 13, 2023 07:37
@coveralls
Copy link

coveralls commented Jul 20, 2023

Pull Request Test Coverage Report for Build 5864011608

  • 797 of 800 (99.63%) changed or added relevant lines in 11 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 99.18%

Changes Missing Coverage Covered Lines Changed/Added Lines %
storages/csv-storage/src/store_mut.rs 155 158 98.1%
Files with Coverage Reduction New Missed Lines %
core/src/executor/validate.rs 1 99.24%
Totals Coverage Status
Change from base Build 5857086559: -0.02%
Covered Lines: 46791
Relevant Lines: 47178

💛 - Coveralls

@panarch panarch marked this pull request as ready for review August 8, 2023 14:24
};

let mut data_rdr = csv::Reader::from_path(data_path).map_storage_err()?;
let mut fetch_data_header_columns = || -> Result<Vec<String>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it is okay to use just immutable and variable

let fetch_data_header_columns = data_rdr
    .headers()
    .map_storage_err()?
    .into_iter()
    .map(|header| header.to_string())
    .collect::<Vec<_>>();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... If it was intended to run only in 2 of 3 conditions, mutable closure would be right.

.chain(rows)
.collect();

self.write(table_name, columns, rows)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using mut and extend?

let mut appended_rows = prev_rows
    .map(|result| Ok(result?.1))
    .collect::<Result<Vec<_>>>()?;

appended_rows.extend(rows);

self.write(table_name, columns, appended_rows)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

let rows = prev_rows
    .map(|item| item.map(|(_, row)| row))
    .chain(rows.into_iter().map(Ok))
    .collect::<Result<Vec<_>>>()?;

this? then we can merge two rows with a single collect.

@panarch panarch requested a review from devgony August 15, 2023 06:09
@panarch panarch merged commit 116e92f into main Aug 15, 2023
9 checks passed
@panarch panarch deleted the csv-storage branch August 15, 2023 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants