
feat(executor): support reading and writing csv files #112

Merged
merged 2 commits into from Nov 8, 2021

Conversation

wangrunji0408
Member

This PR adds two executors, CopyFromFile and CopyToFile, which read chunks from a file and write chunks to a file in CSV format (.csv, .tbl).
Since the csv crate only supports synchronous file IO, we spawn the task onto tokio's blocking IO threads and transfer the chunks through an async channel.
As a next step, we will parse the COPY statement and wire it up to these executors. However, the sqlparser crate does not fully support COPY yet (sqlparser-rs/sqlparser-rs#364), so we may need to patch that crate first.

@wangrunji0408 wangrunji0408 added the enhancement New feature or request label Nov 7, 2021
skyzh (Member) commented Nov 7, 2021

I would suggest supporting SELECT * FROM 'xxx.csv' to read the csv file directly as a table. For the export side, I haven't thought of a design yet.


@skyzh skyzh left a comment


Rest LGTM

/// The file format.
pub format: FileFormat,
/// The column types.
pub column_types: Vec<DataType>,
Member


How can we know the column types when constructing the plan? Do we need to open the file before actually executing?

Member Author


The column types can be inferred from the table catalog or from the query results, so we don't need to open the file.
btw, I have another question: if we are going to export data to an existing file, should we append to it, or truncate it first?

Member


We should only allow exporting to a new file, e.g. OpenOptions::default().create_new(true).

Member


... if exporting to an existing file, we should truncate its content, as it might contain data from other tables. btw, we should warn users before actually overwriting a file.

@wangrunji0408 wangrunji0408 merged commit 7c84a16 into main Nov 8, 2021
@wangrunji0408 wangrunji0408 deleted the wrj/copy-executor branch November 8, 2021 05:33
@wangrunji0408 wangrunji0408 changed the title executor: support reading and writing csv files feat(executor): support reading and writing csv files Nov 8, 2021