Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocating new Vec for every get() call #26

Closed
Boscop opened this issue Jul 14, 2018 · 17 comments
Closed

Avoid allocating new Vec for every get() call #26

Boscop opened this issue Jul 14, 2018 · 17 comments

Comments

@Boscop
Copy link

Boscop commented Jul 14, 2018

This allocates a new Vec for every get() call:
https://github.com/pyros2097/rust-embed/blob/b509d63985c4079c0cd2aaf2e3c951ee249003d5/src/lib.rs#L62

It would be better to return a &'static [u8] by writing this instead:

    #key => Some(&include_bytes!(#canonical_path_str)[..]),

And then pub fn get(file_path: &str) -> Option<&'static [u8]> {

@pyrossh
Copy link
Owner

pyrossh commented Jul 16, 2018

Also at the time most web frameworks supported the Vec<u8> as a response type.
Will need to verify this if it is possible both in dev mode and release mode. If it is more performant and handles integration with existing frameworks well then it should be a no problem. (albeit this can be solved easily)
Also would like to know if it allocates the whole file data in memory or only references that part of the memory and returns a pointer to it?

@Boscop
Copy link
Author

Boscop commented Jul 16, 2018

to_vec() always allocates a Vec and copies the data over.
If that's necessary in the user's use case, then the client code should call to_vec() (Asset::get(path).to_vec()).
It should be up to the user to decide if the allocation is necessary, so that it won't do the extra allocation+copy when it's not needed.

@pyrossh
Copy link
Owner

pyrossh commented Jul 17, 2018

Asserted. Then this will be a refactor and change in the api. Will need to push a new major for this and correct the examples. Will do this and add the path method/docs for this in the next major release 4.

@Boscop
Copy link
Author

Boscop commented Nov 25, 2018

Now get() returns impl AsRef<[u8]> but this forces me to always alloc a Vec on my side, which means that if the return value was actually a Vec, a new Vec will be allocated by .to_vec():

.resource("/{tail:.*}", move |r| r.f({
    move |r| {
        let mut path: String = r.match_info().query("tail").unwrap();
        if path == "" {
            path = "index.html".into();
        }
        match Assets::get(&path) {
            Some(content) => HttpResponse::Ok()
                                .content_type(mime_guess::guess_mime_type(path).as_ref())
                                .body(content.as_ref().to_vec()),
            None => HttpResponse::NotFound().body("404 Not Found"),
        }
    }
}))

But if get() would return &'static [u8] (which is the case in release) it can be forwarded without allocation:
Because of where T: Into<Binary> (which &'static [u8] impls):
https://docs.rs/actix-web/0.7.14/actix_web/dev/struct.HttpResponseBuilder.html#method.body
https://docs.rs/actix-web/0.7.14/actix_web/enum.Body.html#impl-From%3CT%3E
https://docs.rs/actix-web/0.7.14/actix_web/enum.Binary.html#impl-From%3C%26%27static%20%5Bu8%5D%3E

@pyrossh
Copy link
Owner

pyrossh commented Nov 26, 2018

Nice.. How do we solve this? &'static [u8] for both debug and release? In debug its dynamically loaded right how do we make it &' static.

@Boscop
Copy link
Author

Boscop commented Nov 27, 2018

If we change the function to return impl Into<Vec<u8>> + AsRef<[u8]> instead then one can .into() the result to get a Vec<u8>, and it won't clone if it's already a Vec<u8>. And then .as_ref() still works.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=d1ec9177a09f620813d4d066e55d8275

@pyrossh pyrossh reopened this Nov 27, 2018
@pyrossh
Copy link
Owner

pyrossh commented Nov 27, 2018

@vemoo any thoughts on this.
Shall we go ahead on change the return type as you had done in this PR.
#40

impl Into<Vec<u8>> + AsRef<[u8]> seems like a better approach for integration with other webframeworks.

@Boscop Also why do you need to return a Vector<u8> in actix. I thought it takes a [u8] also according to the example.
https://github.com/pyros2097/rust-embed/blob/master/examples/actix.rs#L18

@Boscop
Copy link
Author

Boscop commented Nov 27, 2018

Ah, I hadn't seen the Body::from_slice method. That works, but other web server frameworks (or other use cases) will still require an owned Vec so I think we should still change the return type to enable this optimization. And I won't break any existing code.

@vemoo
Copy link
Contributor

vemoo commented Nov 27, 2018

If we change the function to return impl Into<Vec<u8>> + AsRef<[u8]> instead then one can .into() the result to get a Vec<u8>, and it won't clone if it's already a Vec<u8>. And then .as_ref() still works.

When in debug a Vec<u8> is returned but when in release a &'static u8. So this will only be an optimization when in debug. In release calling .into() will allocate a new Vec<u8>.

@Boscop
Copy link
Author

Boscop commented Nov 27, 2018

@vemoo No, because .into() (from Into<Vec<T>> is a no-op when called on a Vec<T> because of the reflexive impl of From<T> for all T.
So this will optimize the release case.

@vemoo
Copy link
Contributor

vemoo commented Nov 27, 2018

@Boscop but this macro doesn't create a Vec<u8> on release, it returns the &'static [u8] directly from include_bytes!(...): https://github.com/pyros2097/rust-embed/blob/3421ff019a09f4955c5e05aef75986ad318d7440/impl/src/lib.rs#L59
So .into() will use https://doc.rust-lang.org/src/alloc/vec.rs.html#2236-2245

@Boscop
Copy link
Author

Boscop commented Nov 28, 2018

Ah right (that was already fixed), but it will still prevent a Vec clone in debug mode, so it's still worth doing (no drawbacks).

@vemoo
Copy link
Contributor

vemoo commented Nov 29, 2018

I think I found a better solution. We return std::borrow::Cow<'static, [u8]>, so into() for Vec<u8> will not allocate in debug, and since we also expose that if it's not and owned value, its lifetime is static, this allows other optimizations in release.

See the updated actix example here: vemoo@74c8579

The bytes.into() call in each case will end up calling https://docs.rs/actix-web/0.7.14/actix_web/enum.Binary.html#impl-From%3C%26%27static%20%5Bu8%5D%3E and https://docs.rs/actix-web/0.7.14/actix_web/enum.Binary.html#impl-From%3CVec%3Cu8%3E%3E respectively because of https://docs.rs/actix-web/0.7.14/actix_web/enum.Body.html#impl-From%3CT%3E

@Boscop
Copy link
Author

Boscop commented Nov 29, 2018

I was aware of the Cow solution but I decided not to mention it because it does dynamic dispatch on the enum, whereas we don't have to do any dynamic dispatch, we can do the dispatch statically, because the case is known at compile-time. Returning impl Trait doesn't do any dynamic dispatch.

@vemoo
Copy link
Contributor

vemoo commented Nov 30, 2018

I don't think that's dynamic dispatch. All the posible types are know at compile time. We just go from

match Assets::get(&path) {
    Some(Vec<u8>) => ...,
    None => ...,
}

in debug or

match Assets::get(&path) {
    Some(&'static [u8]) => ...,
    None => ...,
}

in release to

match Assets::get(&path) {
    Some(Cow::Borrowed(&'static [u8])) => ...,
    Some(Cow::Owned(Vec<u8>)) => ...,
    None => ...,
}

both in debug and release, with the added benefit that we have access to the static lifetime, which allows using https://docs.rs/actix-web/0.7.14/actix_web/enum.Binary.html#impl-From%3C%26%27static%20%5Bu8%5D%3E which wasn't possible before.

@Boscop
Copy link
Author

Boscop commented Dec 1, 2018

@vemoo Yea, by dynamic dispatch I meant the matching on the Cow..
But we're matching already, so it's "ok" from that point, if the information that the lifetime is 'static prevents actix from having to copy the slice..

But with the Cow solution, we'd then require client code to have essentially the same code in both cases of the Cow match (it will just choose a different instance of Binary::from at compile-time) :/
But that seems to be unavoidable..

@vemoo
Copy link
Contributor

vemoo commented Dec 2, 2018

I've added impl From<Cow<'static, [u8]>> for Binary to actix, so we don't need to match on the Cow.
For other web frameworks, if they don't have a special case for &'static [u8] calling into_owned() will reuse the vector in debug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants