Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using CSV format by default for simulations #304

Merged
merged 20 commits into from
Aug 21, 2023
Merged

Using CSV format by default for simulations #304

merged 20 commits into from
Aug 21, 2023

Conversation

al8n
Copy link
Contributor

@al8n al8n commented Aug 15, 2023

I suppose using CSV format by default should be straightforward, but unfortunately, I forgot that complex structs and vec (nested with struct) are not supported by csv crate. I solve this problem by manually implementing CarnotState::serialize deeper and adding a new Record type.

@al8n al8n added this to the Simulation App milestone Aug 15, 2023
@al8n al8n self-assigned this Aug 15, 2023
Copy link
Collaborator

@danielSanchezQ danielSanchezQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the performance improved?

@al8n
Copy link
Contributor Author

al8n commented Aug 16, 2023

Did the performance improved?

The changes in this PR seem not to improve the performance. However, I found several small places about serialization (mainly about avoiding deep clones and the serialization output format) that can be optimized. I will make more small PRs to optimize them.

Copy link
Member

@bacv bacv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fix needed for csv writer, otherwise looks good.

simulations/src/streaming/naive.rs Outdated Show resolved Hide resolved
simulations/src/node/carnot/mod.rs Outdated Show resolved Hide resolved
@bacv
Copy link
Member

bacv commented Aug 16, 2023

The changes in this PR seem not to improve the performance. However, I found several small places about serialization (mainly about avoiding deep clones and the serialization output format) that can be optimized. I will make more small PRs to optimize them.

When application uses json as output format for step state, every line of generated json file contains the list of node states during that step. When using csv file, node state during the step is saved as an individual line. The end user might need take that into account when processing data.
One idea for a new PR is to add a STEP_ID to the record keys.

@al8n
Copy link
Contributor Author

al8n commented Aug 16, 2023

The changes in this PR seem not to improve the performance. However, I found several small places about serialization (mainly about avoiding deep clones and the serialization output format) that can be optimized. I will make more small PRs to optimize them.

When application uses json as output format for step state, every line of generated json file contains the list of node states during that step. When using csv file, node state during the step is saved as an individual line. The end user might need take that into account when processing data. One idea for a new PR is to add a STEP_ID to the record keys.

I like the STEP_ID idea. I make some new changes to match CSV format, in JSON, the node states also be flattened to match the CSV format now. I will open a new PR to optimize the serialization so that a step_id is assigned to each node state.

@bacv bacv self-requested a review August 18, 2023 08:59
Copy link
Member

@bacv bacv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, all fields serializes as it should in json and csv, amazing work. 🚀

@al8n al8n merged commit 5391382 into master Aug 21, 2023
2 checks passed
@al8n al8n deleted the csv-sinker branch August 21, 2023 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants