Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CString::from_reader #59314

Closed
wants to merge 7 commits into from

Conversation

@DevQps
Copy link
Contributor

commented Mar 20, 2019

Description

This PR implements the method CString::from_reader(impl Read) -> Result<CString, io::Error>.
Issue #59229 explains why this method should be included in the Standard Library, but it boils down to:

  • People are using unsafe code to read a CString without performance overhead.
  • There are currently no methods in the Standard Library that allow us to read a CString safely without having to check for intermediate null characters (something is often already proven by the way the code is written).

I still have one open point. Currently I construct a CString by pushing the null character to the buffer and then using:

CString { inner: buffer.into_boxed_slice() }

I thought that would be better then calling CString::from_vec_unchecked which results in:

v.reserve_exact(1);
v.push(0);
CString { inner: v.into_boxed_slice() }

In all cases where the vector still contains space for another null character not using CString::from_vec_unchecked would be faster because v.reserve_exact(1); does not have to be called. I believe this occurs far more often then a Vec having no space left when reading from an object that implements Read.

I do however wonder if the compiler is able eliminate the check that occurs in v.push when v.reserve_exact is called just prior to it. If it is able to do that I will change to code to use CString::from_vec_unchecked because in case of a memory allocation it will try to provide a more exact block without the performance overhead of having to do an extra check. If anyone is able to verify this (Assembly is not my strongest point), please let me know :)

What do you guys think about this?

closes #59229

EDIT: Oh, and this is my first pull request on Rust! So if I did anything wrong, please let me know! :)

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Mar 20, 2019

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @shepmaster (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Mar 20, 2019

Two more open questions ( see #59229 ):

  1. Should we implement From<Vec<NonZeroU8> for CString as well?
  2. Should we implement CString::from_bufreader as well?
src/libstd/ffi/c_str.rs Outdated Show resolved Hide resolved
@rust-highfive

This comment was marked as outdated.

Copy link
Collaborator

commented Mar 20, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:2c28b2fa:start=1553116747728380361,finish=1553116832684845043,duration=84956464682
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
travis_time:start:tidy
tidy check
[00:03:26] * 570 error codes
[00:03:26] * highest error code: E0725
[00:03:26] tidy error: /checkout/src/libstd/ffi/c_str.rs:374: malformed stability attribute
[00:03:27] some tidy checks failed
[00:03:27] 
[00:03:27] 
[00:03:27] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:27] 
[00:03:27] 
[00:03:27] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:27] Build completed unsuccessfully in 0:00:45
[00:03:27] Build completed unsuccessfully in 0:00:45
[00:03:27] Makefile:67: recipe for target 'tidy' failed
[00:03:27] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:15b0b390
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Mar 20 21:24:09 UTC 2019
---
travis_time:end:07b4f18e:start=1553117050281334060,finish=1553117050286103434,duration=4769374
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:1ae81d28
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:076ca21c
travis_time:start:076ca21c
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:01728810
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive

This comment was marked as outdated.

Copy link
Collaborator

commented Mar 20, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:08fad5c6:start=1553119047671070905,finish=1553119138062251271,duration=90391180366
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
travis_time:start:tidy
tidy check
[00:03:32] * 570 error codes
[00:03:32] * highest error code: E0725
[00:03:32] tidy error: /checkout/src/libstd/ffi/c_str.rs:374: malformed stability attribute
[00:03:33] some tidy checks failed
[00:03:33] 
[00:03:33] 
[00:03:33] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:33] 
[00:03:33] 
[00:03:33] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:33] Build completed unsuccessfully in 0:00:45
[00:03:33] Build completed unsuccessfully in 0:00:45
[00:03:33] make: *** [tidy] Error 1
[00:03:33] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:03b4be20
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Mar 20 22:02:41 UTC 2019
---
travis_time:end:1954c950:start=1553119361984604632,finish=1553119361989740828,duration=5136196
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0536ae50
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:00d3c6ab
travis_time:start:00d3c6ab
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:36cc8b71
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@DevQps

This comment was marked as resolved.

Copy link
Contributor Author

commented Mar 20, 2019

@Centril Can you tell me what I did wrong? I replaced it by my issue number ( #59229 ) on which I explained this PR. But maybe something else is meant by "issue"?

src/libstd/ffi/c_str.rs Outdated Show resolved Hide resolved
Changes stable to unstable and removed the hashtag.

Co-Authored-By: DevQps <46896178+DevQps@users.noreply.github.com>
@rust-highfive

This comment was marked as outdated.

Copy link
Collaborator

commented Mar 21, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:0ee2ca80:start=1553120295439445868,finish=1553120399184774265,duration=103745328397
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
travis_time:start:test_assembly
Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:21:34] 
[01:21:34] running 9 tests
[01:21:34] iiiiiiiii
[01:21:34] 
[01:21:34]  finished in 0.157
[01:21:34] travis_fold:end:test_assembly

---
travis_time:start:test_debuginfo
Check compiletest suite=debuginfo mode=debuginfo-both (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:21:51] 
[01:21:51] running 120 tests
[01:22:18] .iiiii...i.....i..i...i..i.i...i.ii...i.....i..i....i..........iiii..........i...ii...i.......ii.i.i 100/120
[01:22:23] .i......iii.i.....ii
[01:22:23] 
[01:22:23]  finished in 32.050
[01:22:23] travis_fold:end:test_debuginfo

---
[01:41:26] 
[01:41:26] running 997 tests
[01:41:50] i................................................................................................... 100/997
[01:42:04] .................................................................................................... 200/997
[01:42:14] ............iii......i......i...i.F.....i........................................................... 300/997
[01:42:28] ...........................i.i.....................................iiii........ii................... 500/997
[01:42:37] .................................................................................................... 600/997
[01:42:45] .................................................................................................... 700/997
[01:42:55] .................iiii............................................................................... 800/997
[01:42:55] .................iiii............................................................................... 800/997
[01:43:09] .................................................................................................... 900/997
[01:43:17] ...........................................iiii..................................................
[01:43:17] failures:
[01:43:17] 
[01:43:17] ---- ffi/c_str.rs - ffi::c_str::CString::from_reader (line 367) stdout ----
[01:43:17] error[E0658]: use of unstable library feature 'cstring_from_reader' (see issue #59229)
[01:43:17]  --> ffi/c_str.rs:371:14
[01:43:17]   |
[01:43:17] 7 | let string = CString::from_reader(test.as_bytes()).unwrap();
[01:43:17]   |
[01:43:17]   |
[01:43:17]   = help: add #![feature(cstring_from_reader)] to the crate attributes to enable
[01:43:17] error: aborting due to previous error
[01:43:17] 
[01:43:17] For more information about this error, try `rustc --explain E0658`.
[01:43:17] thread 'ffi/c_str.rs - ffi::c_str::CString::from_reader (line 367)' panicked at 'couldn't compile the test', src/librustdoc/test.rs:310:13
---
[01:43:17] 
[01:43:17] error: test failed, to rerun pass '--doc'
[01:43:17] 
[01:43:17] 
[01:43:17] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std" "--" "--quiet"
[01:43:17] 
[01:43:17] 
[01:43:17] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:43:17] Build completed unsuccessfully in 0:33:42
[01:43:17] Build completed unsuccessfully in 0:33:42
[01:43:17] make: *** [check] Error 1
[01:43:17] Makefile:48: recipe for target 'check' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:04a49840
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Mar 21 00:03:26 UTC 2019

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive

This comment was marked as outdated.

Copy link
Collaborator

commented Mar 21, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:1e8cd399:start=1553178190718059071,finish=1553178268014020174,duration=77295961103
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
travis_time:start:test_assembly
Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:30:05] 
[01:30:05] running 9 tests
[01:30:05] iiiiiiiii
[01:30:05] 
[01:30:05]  finished in 0.163
[01:30:05] travis_fold:end:test_assembly

---
travis_time:start:test_debuginfo
Check compiletest suite=debuginfo mode=debuginfo-both (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:30:23] 
[01:30:23] running 120 tests
[01:30:52] .iiiii...i.....i..i...i..i.i...i.ii...i.....i..i....i..........iiii..........i...ii...i.......ii.i.i 100/120
[01:30:57] .i......iii.i.....ii
[01:30:57] 
[01:30:57]  finished in 34.059
[01:30:57] travis_fold:end:test_debuginfo

---
[01:51:07] 
[01:51:07] running 997 tests
[01:51:33] i................................................................................................... 100/997
[01:51:48] .................................................................................................... 200/997
[01:51:58] ............iii......i......i...i.F.....i........................................................... 300/997
[01:52:13] ...........................i.i.....................................iiii........ii................... 500/997
[01:52:22] .................................................................................................... 600/997
[01:52:30] .................................................................................................... 700/997
[01:52:39] .................iiii............................................................................... 800/997
[01:52:39] .................iiii............................................................................... 800/997
[01:52:55] .................................................................................................... 900/997
[01:53:04] ...........................................iiii..................................................
[01:53:04] failures:
[01:53:04] 
[01:53:04] ---- ffi/c_str.rs - ffi::c_str::CString::from_reader (line 367) stdout ----
[01:53:04] error[E0658]: use of unstable library feature 'cstring_from_reader' (see issue #59229)
[01:53:04]  --> ffi/c_str.rs:371:14
[01:53:04]   |
[01:53:04] 7 | let string = CString::from_reader(test.as_bytes()).unwrap();
[01:53:04]   |
[01:53:04]   |
[01:53:04]   = help: add #![feature(cstring_from_reader)] to the crate attributes to enable
[01:53:04] error: aborting due to previous error
[01:53:04] 
[01:53:04] For more information about this error, try `rustc --explain E0658`.
[01:53:04] thread 'ffi/c_str.rs - ffi::c_str::CString::from_reader (line 367)' panicked at 'couldn't compile the test', src/librustdoc/test.rs:310:13
---
[01:53:04] 
[01:53:04] error: test failed, to rerun pass '--doc'
[01:53:04] 
[01:53:04] 
[01:53:04] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std" "--" "--quiet"
[01:53:04] 
[01:53:04] 
[01:53:04] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:53:04] Build completed unsuccessfully in 0:36:28
[01:53:04] Build completed unsuccessfully in 0:36:28
[01:53:04] make: *** [check] Error 1
[01:53:04] Makefile:48: recipe for target 'check' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0491a5d0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Mar 21 16:17:42 UTC 2019

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@DevQps

This comment was marked as resolved.

Copy link
Contributor Author

commented Mar 21, 2019

@Centril Sorry for bothering you again. In my last commit I removed the unit tests I made because they cannot be executed if the feature cstring_from_reader is not enabled. However, I know see that the doc test cannot be executed as well because of the same reason. How do you suggest I approach this? Remove the doc test as well? Or add a feature attribute somehow?

@lzutao

This comment was marked as resolved.

Copy link
Contributor

commented Mar 21, 2019

@DevQps You could add #![feature(cstring_from_reader)] to those examples.

@DevQps

This comment was marked as outdated.

Copy link
Contributor Author

commented Mar 25, 2019

@shepmaster @Centril Just as a bump! If there's something I can change to make it better, please let me know!

@DevQps

This comment was marked as resolved.

Copy link
Contributor Author

commented Mar 30, 2019

@shepmaster @Centril Just as an extra bump in case you've forgotten about this one :) If there is anything I can do to improve it, or if you have better idea's please let me know :) EDIT: Just as an extra question: What do you see as an appropriate time frame to ping reviewers again if no response was given yet?

@Centril

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

Randomly reassigning to t-libs; r? @Amanieu

@rust-highfive rust-highfive assigned Amanieu and unassigned shepmaster Mar 30, 2019
@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2019

@Amanieu Just a bump! I hope you can review this for me!

@sfackler

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Reading a single byte at a time has a massive amount of overhead - this is why read_until is only defined on BufRead. If the objective here is to create a CString with minimal overhead, that's not the right approach.

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2019

Reading a single byte at a time has a massive amount of overhead - this is why read_until is only defined on BufRead. If the objective here is to create a CString with minimal overhead, that's not the right approach.

I thought this at the start as well! But let's take a &[u8] which also implements Read. I guess this call can be easily inlined for great performance. BufRead also implements Read which (I am quite sure of) can also be inlined for good performance, since it contains a buffer right? I agree that some implementations of Read might be very slow, but maybe it's best to leave that up to the programmer, since some objects that implement Read can be fast as well?

Another problem with BufRead::read_until is that it (very efficiently) reads into a Vec<u8>, but if we want to construct a CString from that we'd either have to use CString::new which suffers in performance (because it checks the whole string again for intermediate null-characters) or CString::from_vec_unchecked which is unsafe.

This method was proposed to allow crate authors to safely construct a CString without suffering performance overhead or using unsafe methods.

I hope you follow my rationale! If you have a better idea or just don't agree, please let me know :)

@lzutao

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

I think @dtolnay would have a thought on this PR (based on serde_json::de::from_reader
and his comments on serde-rs/json#160).

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2019

@lzutao Thanks for the interesting links! Let's wait what @dtolnay has to say on this :) I might also be able to create some benchmarks tomorrow maybe.

@sfackler

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

some objects that implement Read can be fast as well?

All of those objects also implement BufRead.

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 5, 2019

@sfackler I just saw that &[u8] also implements BufRead. What do you suggest to do? Replace this with a function 'CString::from_bufread` or do you have something else in mind?

@dtolnay

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

As far as I am able to tell from the PR description and the linked issue, the motivation for this is focused on the intersection of high-performance code and safe code. If we knew downstream code didn't care about performance, we would be fine with them implementing this using CString::new rather than adding a constructor. If we knew downstream code didn't mind writing some unsafe code, such as in libflate, similarly we would be fine with them implementing this using CString::from_vec_unchecked.

In particular, the motivation does not appear to claim that this piece of logic is so commonly seen in Rust code that it makes sense for std to provide a helper for it. The change is justified really only because this is not possible to implement efficiently in safe code. If it were possible to implement this efficiently in safe code, we would not see the need to bring it into std. Let me know if this does not accurately describe the position of the PR author.


I see these options:

  • Add a CString constructor that reads from io::Read, as implemented here.

    This is maximally flexible but you would rarely want to call it on an unbuffered reader because of the 8x performance hit (as observed in serde-rs/json#160 (comment)). Even on readers that happen to be buffered, this implementation would be substantially slower than calling BufRead::read_until which is based on very optimized memchr and extend_from_slice. Finally the implementation would not be able to leverage the better performance of BufRead implementations that provide a custom implementation of read_until.

    I think given the motivation for the change, this option is not justified.

  • Add a constructor that reads from io::BufRead.

    This makes things potentially very complicated if code after reading the CString needs to continue with the same R type as the input unbuffered reader. Minimized example:

    fn f<R: Read>(mut rd: R) -> (CString, R) {
        // ???
    }

    The implementation cannot wrap R into io::BufReader for the call to CString::from_reader because then anything sitting in the buffer after the call would be lost when unwrapping the buffered reader back into R.

    I guess the caller would need to use a "fake" BufRead adapter instead of io::BufReader, where the adapter pretends to be buffered by implementing BufRead but really only reads one byte at a time without a buffer.

  • Add a constructor that reads from io::Read but uses specialization to read more efficiently from io::BufRead.

    I think this is viable but we would need to consult with people who have an opinion on how specialization is used in externally observable ways within the standard library, and we may block stabilization until specialization is better figured out.

  • Don't add such a constructor.

    This would be my current preference. I don't think this use case is common enough to support via a new CString constructor in the standard library. Do you know of any code other than libflate that needs this functionality?

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 6, 2019

@dtolnay Thanks for your detailed response! What you said about my assumptions is indeed right. Currently, it is impossible to safely construct a CString without performance overhead. If it were possible using another elegant way, then this PR would indeed not be necessary!

I must admit that it doesn't feel like the most beautiful solution. It feels like there must be some better way then adding methods to structs such that they can be read. In my opinion, it always felt weird that Rust does not support reading "primitive" types in std. Types such as u32, f32 but also a textual number or a CString. These operations feel so basic to me, that it feels strange to require an external crate such as byteorder to do this.

I have been diving a bit in the usage of unsafe for only a couple of weeks so I don't know many other scenario's. However, I think there are quite some binary file formats that use CStrings right? There are network protocols such as DNS which use this as well. Therefore, I do believe it needs a safe/performant way. I admit, that most locations inside binary formats that use CString are often in headers which are not that performance critical, but for some network protocols this is not the case I guess.

So in short points:

  • I believe reading a CString safely/fast is a basic operation used quite often in binary formats and network protocols, which should be somehow supported by std.
  • I agree that the current solution is not the most elegant solution.
  • I feel like Rust std currently misses the basic capability of reading "primitive types" in general.

I wonder what your thoughts are on this though! I will probably close this PR afterwards.

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 6, 2019

Oh! I completely forgot to ask, but I hope you could explain the following to me!:

Let's say we have a function called fn read_something_from_read<T: Read>(r: &T) -> Something.
Let's take two examples:

  • We have a Read object without buffering and we pass it to our function. This will (I guess?) probably inline the Read::read call for the specific case. So if the struct that implements Read implements it very efficiently it's great, otherwise it's still very slow.

  • We have a BufRead object (by for example wrapping our original Read implementation in a BufReader). And we then pass it to our function. I guess again this function can (hopefully?) be inlined again and probably has great performance due to buffering right?

So I was wondering: Why is the serde's from_reader case linked by @lzutao still so much slower (or the case mentioned here)? I keep failing to understand that somehow :)

EDIT: Or does this have to do with memchr and extend_from_slice? I guess a BufRead also needs to maintain an internal buffer so I guess still do not see why it could be faster.

Hope you don't mind all these questions :)

@dtolnay

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

It feels like there must be some better way then adding methods to structs such that they can be read.

I share this opinion.

In my opinion, it always felt weird that Rust does not support reading "primitive" types in std.

But not this one. I think there is room for a library based on a well-designed trait (as opposed to one-off associated functions) that makes it easy to write the type of code you have in mind involving reading primitives and cstrings.

Regarding from_reader performance: the from_reader codepath inherently involves more work than from_bytes. https://github.com/serde-rs/json/blob/v1.0.39/src/read.rs has the details, but some examples:

  • When processing strings that the caller needs to look at but not store, such as names of struct fields in JSON, from_str can let them look directly at a substring of the input data (performing zero writes to memory) but from_reader must write data from the input stream into a buffer for the caller to look at. Check out the method signatures on io::Read and notice that copying into a buffer is fundamentally the only way that data can be looked at -- I guess except for Read::bytes but that would be even slower for our use case.

  • To come up with line numbers for error reporting, from_str can go back to the beginning and count lines only in the case that an error occurs. From_reader can't "go back" so it must keep track of position in the stream even in the case that no error happens and the position is never used.

@DevQps

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

@dtolnay Thanks for your response! I didn't have time this weekend so sorry for the small delay!

I will close this issue now!

I secretly have a few more questions (hope you don't mind answering them otherwise you can ignore them :))

  1. I think your point of "referencing" the input buffer such that we don't have to copy content into another buffer is great (didn't pop up in my mind)! But I guess there are also cases where this is not very efficient right?. Let's say we have a rather large part of a binary file which contains both string-like data and data that we want to process/consume and represent in a different format. If we would reference the string-like data we would have to maintain the complete buffer (whereas maybe only a small part would actually be referenced to right?) In this case copying would likely be better in order to save memory? (Just checking to be sure)

  2. I still fail to understand why this functionality is not included in the std but I didn't want to bother you with that question (after all I already asked you so much) so I asked it in a Reddit post here might you be interested.

Re-reading your comments from earlier provided me some useful insights! Maybe if I have some spare time, I can look into some interesting solutions that could solve this problem.

Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.