
feat(executor): support reading and writing csv files #112

Merged
merged 2 commits into from Nov 8, 2021

Conversation

wangrunji0408
Member

This PR adds two executors, CopyFromFile and CopyToFile, which read chunks from a file and write chunks to a file in CSV format (.csv, .tbl).
Since the csv crate only supports synchronous file IO, we spawn the task onto tokio's blocking IO threads and transfer the chunks through an async channel.
As a next step, we will parse the COPY statement and wire it up to these executors. However, the sqlparser crate does not fully support COPY yet (sqlparser-rs/sqlparser-rs#364), so we may need to patch that crate first.

@wangrunji0408 wangrunji0408 added the enhancement New feature or request label Nov 7, 2021
skyzh (Member) commented Nov 7, 2021

I would suggest supporting SELECT * FROM 'xxx.csv' to read the csv file directly as a table. For the export side, I haven't thought of a design yet.


@skyzh skyzh left a comment


Rest LGTM

/// The file format.
pub format: FileFormat,
/// The column types.
pub column_types: Vec<DataType>,
Member


How can we know the column types when constructing the plan? Do we need to open the file before actually executing?

Member Author


The column types can be inferred from the table catalog or from the query results, so we don't need to open the file.
btw, I have another question: if we are going to export data to an existing file, should we append to it, or truncate it first?

Member


We should only allow exporting to a new file, e.g. OpenOptions::default().create_new(true).

Member


... if exporting to an existing file, we should truncate its content, as it might contain data from other tables. btw, we should warn users before actually overwriting a file.

@wangrunji0408 wangrunji0408 merged commit 7c84a16 into main Nov 8, 2021
@wangrunji0408 wangrunji0408 deleted the wrj/copy-executor branch November 8, 2021 05:33
@wangrunji0408 wangrunji0408 changed the title executor: support reading and writing csv files feat(executor): support reading and writing csv files Nov 8, 2021