Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-utf8 data in collapse by doing lossy conversion #196

Merged
merged 4 commits into from Nov 30, 2020

Conversation

austinabell
Copy link
Contributor

A project I'm benchmarking has non-utf8 data in the dtrace, and it's quite annoying to have to use iconv to sanitize the file before collapsing. I did this for my own needs, and figured I would PR as it might be useful to others as well.

If you are open to this, I should also add this to other parts of the code that do similar things, but I won't do that preemptively unless go ahead is given (not sure if there is a good reason this is not done)

@jonhoo
Copy link
Owner

jonhoo commented Nov 17, 2020

This seems like a very reasonable thing to include to me!

@austinabell
Copy link
Contributor Author

austinabell commented Nov 17, 2020

This seems like a very reasonable thing to include to me!

Awesome! I've made the change to the other options. I did remove the handling inside of the is_applicable function because the parameter is already a &str and I'm not confident of the usage and was never able to trigger an issue there. I also didn't want to change the interface in any way. If you'd like I can try to switch it to &[u8] param, but I haven't had an issue with it yet.

There should be no additional allocations, except in the case of non-utf8 data, and although benchmarks on my system were pretty imprecise, none showed regressions. I also updated the test case which specifically fails on invalid utf8, was this intentional before?

@austinabell austinabell changed the title Support non-utf8 data in dtrace by doing lossy conversion Support non-utf8 data in collapse by doing lossy conversion Nov 17, 2020
Copy link
Owner

@jonhoo jonhoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I don't have a good answer for why we explicitly tested against the error for the non-UTF8 case. That just seems wrong!

@jonhoo
Copy link
Owner

jonhoo commented Nov 30, 2020

It looks like your change no longer compiles on Rust 1.40.0, though I'm not sure. And looks like the build logs have since disappeared (sorry it too me so long to get back to this). Could you either try running it locally (cargo +1.40.0 test) or just push an empty commit (git commit --allow-empty) to trigger CI again so we can look at what fails?

@austinabell
Copy link
Contributor Author

It looks like your change no longer compiles on Rust 1.40.0, though I'm not sure. And looks like the build logs have since disappeared (sorry it too me so long to get back to this). Could you either try running it locally (cargo +1.40.0 test) or just push an empty commit (git commit --allow-empty) to trigger CI again so we can look at what fails?

Ah it does fail locally, but it does also on master, I can look into fixing this later, but don't have time in the very near future

@jonhoo
Copy link
Owner

jonhoo commented Nov 30, 2020

Ah, looks like one of our dependencies bumped their MSRV. I'll look into it, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants