feat(executor): support reading and writing csv files #112
Conversation
I would suggest
Rest LGTM
/// The file format.
pub format: FileFormat,
/// The column types.
pub column_types: Vec<DataType>,
How can we know the column types when constructing the plan? Do we need to open the file before actually executing?
The column types can be inferred from the table catalog or the query results, so we don't need to open the file.
btw, I have another question: if we are going to export data to an existing file, should we append to it, or truncate it to empty first?
We should only allow exporting to a new file, e.g. with OpenOptions::default().create_new(true).
... if exporting to an existing file, we should truncate its content, as it might contain data from other tables. btw, we should warn users before actually overwriting a file.
This PR adds 2 executors: CopyFromFile and CopyToFile, which can read chunks from a file and write chunks to a file in CSV format (.csv, .tbl). Given that the csv crate only supports sync file IO, we spawn the task onto blocking IO threads and transfer the chunks through an async channel in tokio.
As the next step, we will parse the COPY statement and link it to these executors. But I found that the sqlparser crate does not fully support COPY yet (sqlparser-rs/sqlparser-rs#364). Maybe we should patch this crate first (