Skip to content

Conversation

@stuart-byma
Copy link
Contributor

Issue or RFC Endorsed by Maintainers

#286

Description of the Change

This change reduces required copying of immutable data primarily in the HTTP request API. Methods returning strings now return string_view to avoid copying immutable data from the underlying MHD impl. get_*s methods still return a std::map which must be allocated and constructed, but does not copy underlying strings.

Fix a few instances of these method calls in the code base.

Introduce some type aliases for maps from string_view to string_view, string to string to reduce code verbosity.
Overload the comparator operator()s as required.

Provide overloads for dump_*_map functions and a template impl to avoid code duplication.

Bump C++ version to 17 required to use string_view. Remove CI configurations using ancient compilers that don't support string_view or C++17

Alternate Designs

No other alternatives. The change is relatively simple.

Possible Drawbacks

This is a potentially breaking change to the API. Users may have to update their apps after updating.

Verification Process

All tests passing.

Release Notes

Many instances of std::string within the API changed to std::string_view to improve performance by reducing copies and memory allocations.

This change reduces required copying of immutable data primarily in the
HTTP request API. Methods returning strings now return string_view to
avoid copying immutable data from the underlying MHD impl.
get_*s methods still return a std::map which must be allocated
and constructed, but does not copy underlying strings.

Fix a few instances of these method calls in the code base.

Introduce some type aliases for maps from string_view to string_view,
string to string to reduce code verbosity.
Overload the comparator operator()s as required.

Provide overloads for dump_*_map functions and a template impl to avoid
code duplication.

Bump C++ version to 17 required to use string_view. Remove CI
configurations using ancient compilers that don't support string_view
or C++17
Copy link
Owner

@etr etr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments from me - sorry for the delay.

**/
void dump_header_map(std::ostream &os, const std::string &prefix, const std::map<std::string, std::string, header_comparator> &map);
void dump_header_map(std::ostream &os, const std::string &prefix, const http::value_map &map);
void dump_header_map(std::ostream &os, const std::string &prefix, const http::value_map_view &map);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go one way or the other for this dump_* methods and not keep both versions. These are mostly for debug purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, but the problem is that the HttpResponse type owns its header/footer data and therefore has value_maps and not value_map_views. So there are places where dump_header_map is used on both owning and non-owning map types.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother trying to produce a "view" version of this one as it is a method used for debugging only. Internally to the library only http_response and http_request use it (both in their internal operations) and a test for http_utils that builds the map as map of strings. I think just keeping the old ones here is fine - outside of this PR, I'll move this under "details" to make it clear that it is not something we offer as interface, anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually left just the view version. The test case usages are switched to view maps (trivial) which left HttpResponse as the only place data owning versions were being dumped. I added utility to convert the map to a view just for that case. I think this results in the fewest lines of code changed. The utility (to_view_map) can also just go in the details namespace when you make that change.

This change reduces required copying of immutable data primarily in the
HTTP request API. Methods returning strings now return string_view to
avoid copying immutable data from the underlying MHD impl.
get_*s methods still return a std::map which must be allocated
and constructed, but does not copy underlying strings.

Fix a few instances of these method calls in the code base.

Introduce some type aliases for maps from string_view to string_view,
string to string to reduce code verbosity.
Overload the comparator operator()s as required.

Provide overloads for dump_*_map functions and a template impl to avoid
code duplication.

Bump C++ version to 17 required to use string_view. Remove CI
configurations using ancient compilers that don't support string_view
or C++17

const std::map<std::string, std::string, http::arg_comparator> http_request::get_args() const {
std::map<std::string, std::string, http::arg_comparator> arguments;
const http::arg_map http_request::get_args() const {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense for this to return a map of view as for the rest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ran into an issue with this. build_request_args actually makes a temp copy of the underlying data and does some unescape stuff on it. We cannot return a view over that temp data since it goes out of scope.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm...that is a really annoying inconsistency in the interface that will leave folks puzzled. I need to think of whether there is a way out (philosophically, I'd rather prioritize clarity/consistency over performance, unless major)

Copy link
Owner

@etr etr Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, all of this originates from some old "unescaping" logic that made us bypass the mhd unescaper and build a separated unescaping logic on top of it. I think we might just have to unblock that separately (and I think it should be possible, given that nobody has touched that code in the past 10 years and existed to fix some misbehavior that was there in a dependency at that time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that can be handled as a separate issue. Afterward we can make get_args return an arg_view_map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think the MHD unescaper is fine at this point? Maybe worth checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_args firsts takes a copy of the args member var, then goes to MHD to get additional args. Can we use the args member var to just cache the unescaped values from MHD, and then return an arg_view_map constructed from the args member var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the barrage of comments, but I notice get_arg to get a single arg does not do any unescaping, why is that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the barrage of comments, but I notice get_arg to get a single arg does not do any unescaping, why is that?

It sounds very much like a bug. To be fair, I can see there is some old code in here that makes no sense anymore (see it tries to get data from an "args" dataset that never gets filled because the method is const). I think the general management of args and unescaping has issues that need fixing.

README.md Outdated
* _**const std::map<std::string_view, std::string_view, http::header_comparator>** get_headers() **const**:_ Returns a map containing all the headers present in the HTTP request.
* _**const std::map<std::string_view, std::string_view, http::header_comparator>** get_cookies() **const**:_ Returns a map containing all the cookies present in the HTTP request.
* _**const std::map<std::string_view, std::string_view, http::header_comparator>** get_footers() **const**:_ Returns a map containing all the footers present in the HTTP request (only for http 1.1 chunked encodings).
* _**const std::map<std::string_view, std::string_view, http::arg_comparator>** get_args() **const**:_ Returns all the arguments present in the HTTP request. Arguments can be (1) querystring parameters, (2) path argument (in case of parametric endpoint, (3) parameters parsed from the HTTP request body if the body is in `application/x-www-form-urlencoded` or `multipart/form-data` formats and the postprocessor is enabled in the webserver (enabled by default).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't match the implementation below (which still seems not to be using views)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I think I caught all instances, but if you can double check that would be great.

This change reduces required copying of immutable data primarily in the
HTTP request API. Methods returning strings now return string_view to
avoid copying immutable data from the underlying MHD impl.
get_*s methods still return a std::map which must be allocated
and constructed, but does not copy underlying strings.

Fix a few instances of these method calls in the code base.

Introduce some type aliases for maps from string_view to string_view,
string to string to reduce code verbosity.
Overload the comparator operator()s as required.

Bump C++ version to 17 required to use string_view. Remove CI
configurations using ancient compilers that don't support string_view
or C++17
In the file upload test, it was saving the arg_view_map from the
request, which went out of scope, leaving dangling refs.

Fix by making a deep copy.
Because build_request_args makes a temp copy of underlying data and
mutates it, it is not feasible currently to return a view over that data.
* @return the value of the header.
**/
const std::string get_header(const std::string& key) const;
std::string_view get_header(std::string_view key) const;
Copy link
Owner

@etr etr Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we align the rest of the public interface of this class to use string_view? (e.g. get_user, get_pass, get_method, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be best. Design decisions to make though: some of these other methods modify data extracted from MHD (e.g. get_querystring), construct a result and then return it. I see two options here. One is to change the signature and/or return values to match whatever MHD gives, and return views over that data. The other is to construct the result as now only once, cache in the request object, and return a view over that (also if the user calls twice, the cached value is simply returned on the second call). Thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that I am not sure I really like any of the five options I see for this. We basically can:

  1. Leave the logic as is, and have an inconsistent interface. Some methods will return string, others return string_view. The issue with it is that basically we are basically projecting outside our own internal logic and the user pays the cost for this by having to adapt to what each method returns for a reason that is not immediately obvious when in their position.

  2. Revert the methods to pure proxies over what libmicrohttpd returns. This comes at a loss of the extra simplicity that httpserver provides (which is one of the core reasons people use this library).

  3. Pre-fetch all data from libmicrohttpd and store it preventively in the request object. This would allow to use string_view everywhere. This used to be the old implementation, but we moved away from it because it was just way too expensive for no reason (the user pays the cost of the fetch + copy even when they never use the data).

  4. Do a fetch and cache. This is an implementation I had thought of when migrating off of the old prefetch approach. It has the extra advantage of making subsequent calls cheaper. It has a massive issue that it makes all getters non const which is a really annoying property for the interface of what basically is a class meant to store and transport data. I considered and discarded this approach when moving away from the old pre-fetch one.

  5. Just leave the class as is and not use string_view at all.

I am currently pondering 1 an 5 - everything else just doesn't seem worth it. Instinctively, I'd say 5 would be my general inclination because philosophically I'd rather prioritize clarity/consistency over marginal performance, but I can welcome disagreements, if well motivated. I also think it would be worth to simulate some heavy data fetch (e.g. of headers) with both approaches to actually prove that using string_view whenever possible is worth the inconsistency it costs interface-wise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that 3 is untenable performance-wise. Most users are not going to use every field, prefetch has a high init cost. 1 is also undesirable; I agree it is better to be as consistent as possible.

I am not sure I agree with your assessment of point 4. An object internal cache doesn't mean we have to sacrifice const on all the accessors. It just means we can't cache the items directly in the request object. We can use an private cache type, and store it within a unique_ptr. Then we have a mutable cache from within a const accessor. There are only a few values that need to be cached in this way for us to make the whole API return string_view.

I think I prefer this option, it seems to be the best compromise between API consistency, const-ness, and performance. I have already tried this out, I will add it to the PR and look forward to your thoughts.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea - I am looking at the implementation and it seems like it lands in a good place. I've just left a few minor comments. Take a look at those. I'll take another re-read later to give you a more complete feedback.

Add a data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.
Add a data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.
Add a data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.
Add a data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.
@stuart-byma stuart-byma reopened this Nov 14, 2022
Add a http_request_data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.

Remove the existing http_request::args member, use the cache instead.

Make the http_request class move-only.
namespace {
static inline http::header_view_map to_view_map(const http::header_map & hdr_map) {
http::header_view_map view_map;
for (const auto & item : hdr_map) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: keep '&' attached on the left

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure you did this?

Copy link
Owner

@etr etr Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed others below - can you make sure you keep the reference '&' consistently together (left attached) to the type in all of your changes? (same for pointers '*')

I am aware the code is inconsistent (even in some of the rows you are modifying), but let's make sure we fix everything we are touching or creating anew. I need to go and have a pass at cleaning everything up later for the rest..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got them all now. I would suggest configuring something clang-format, though the initial application of it will likely make quite a few changes that are formatting-only.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there is a change pending on that (with a few thousands fixes) that I am going through.

@etr
Copy link
Owner

etr commented Nov 16, 2022

Only minor comments on the change at this point - I think once we close those, we are done.

Add a http_request_data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.

Remove the existing http_request::args member, use the cache instead.

Make the http_request class move-only.
Add a http_request_data_cache type to the http request type, and a unique_ptr
instance member. This allows caching of transformed and/or copied data
from MHD, while preserving const-ness of all accessor methods.

Change all applicable methods to string_view, fix a few call sites.

Remove the existing http_request::args member, use the cache instead.

Make the http_request class move-only.
@etr etr merged commit 1777284 into etr:master Nov 17, 2022
@etr
Copy link
Owner

etr commented Nov 17, 2022

Thanks for going through with this. It is merged now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants