Add Sylvan.Data.Csv to the benchmarks #4

Merged
merged 4 commits into from Jan 5, 2021

MarkPflug
Contributor

Hello Joel.

I've also put together some CsvBenchmarks for .NET. I've been claiming that my library, Sylvan.Data.Csv, is the fastest, so I was curious how it stacked up on your dataset. This PR adds my library to your benchmarks. I was aware of NReco and how fast it was, but I hadn't seen this mgholam.FastCsv library. I'll add it to my benchmarks too. I'm also adding FluentCsv to mine (not yet on GitHub).

@joelverhagen
Owner

Woah, nice! Thanks!

Did we really start work on writing a big benchmark suite on the same day?

image

image

How does that happen, LOL!

@MarkPflug
Contributor Author

No, we didn't. I pulled the benchmarking code out of my main Sylvan library repo on that day. If you look in that repository you'll see history of this stuff going back to earlier in 2020. Still, pretty funny that the dates aligned.

@MarkPflug
Contributor Author

FWIW I added both mgholam.fastCSV and FluentCSV to my benchmarks on GitHub. You'll probably see a notice from me trying to figure out a better way to implement the mgholam benchmark. The API was a little unclear to me, so if you know of a better way to use it I'd like to see it.

give Sylvan a bigger buffer (to match competitors)
This might be considered "unfair" as it takes advantage of the construction of the test data. However, real-life datasets have shown significant savings here too.
Owner

@joelverhagen joelverhagen left a comment

Thanks again!

@joelverhagen joelverhagen merged commit 8fcbbde into joelverhagen:main Jan 5, 2021
@MarkPflug MarkPflug deleted the sylvan branch January 5, 2021 17:48
@joelverhagen
Owner

Woah this thing is fast! I've updated my post:
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers

Nice work Mark!

@MarkPflug
Contributor Author

Hah, thanks!
I will point out that it "cheats" a little, in that it employs a mechanism that benefits from highly repetitive data. Your benchmark data set happens to be highly repetitive due to how you construct it. If you look at the memory usage in the benchmark output it is probably even more illuminating than the difference in time.
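
The thread does not name the mechanism. One technique that fits the description is string pooling, where a parser hands back a single shared string for each distinct field value instead of allocating a new one per field. The sketch below is an assumption for illustration only, not Sylvan.Data.Csv's implementation; `StringPool` and `parse_rows` are hypothetical names, the sample rows are made up, and it is written in Python for brevity.

```python
class StringPool:
    """Hypothetical pool: returns one shared str per distinct field value."""

    def __init__(self):
        self._pool = {}   # raw field bytes -> decoded string
        self.hits = 0     # lookups served from the pool (no new string)
        self.misses = 0   # lookups that had to decode and store a new string

    def get(self, raw: bytes) -> str:
        cached = self._pool.get(raw)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        value = raw.decode("utf-8")
        self._pool[raw] = value
        return value


def parse_rows(data: bytes, pool: StringPool):
    """Naive comma split with no quoting support, for illustration only."""
    return [
        [pool.get(field) for field in line.split(b",")]
        for line in data.splitlines()
        if line
    ]


# Made-up, highly repetitive input: only the middle column varies.
pool = StringPool()
rows = parse_rows(b"pkg,1.0.0,MIT\npkg,1.0.1,MIT\npkg,1.0.2,MIT\n", pool)
```

On this toy input, 4 of the 9 field lookups reuse an existing string, and the repeated values in different rows are the same object. The more repetitive the data, the larger that share becomes, which would show up as lower allocation more than as lower time.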

Anyway, thanks for updating the article. I'm trying to get some momentum behind this project, so every little bit of exposure is helpful.
