-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options to store files directly to the file system and to omit duplicating files in memory #257
Conversation
…plicating files in memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting so much effort in this topic, for me it looks like you are following a good concept here. I added some comments from my perspective as a previous contributor of minor fixes, you'll see I'm not aware of some design principles of the library. Reviewed nevertheless, because I'm highly interested in solving #229 😃 . Some more things, like those mentioned inline rather nitpicking, than serious flaws:
- Run
cpplint --extensions=cpp,hpp --headers=hpp --recursive .
before pushing. - Would be not my style to put all this in a single commit, because that makes review harder, but @etr squashes them anyways on merging, so this is probably fine. However the commit message should reflect the reasons for the changes in standard Git style: first line a short (!) summary, followed by an empty line, followed by reasoning for why this change.
- There are no unit tests for this at all. Yet? 😉
Again, I think I would want to use your patch eventually, seems reasonable and clear from a library users point of view. 👍🏼
src/webserver.cpp
Outdated
file_info.file_system_file_name = http_utils::generate_random_upload_filename(mr->ws->post_upload_dir); | ||
} | ||
else { | ||
file_info.file_system_file_name = mr->ws->post_upload_dir + "/" + std::string(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path separator is platform dependent, is libhttpserver supposed to work on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does (it auto-compiles there on commit).
We might want to just use a preferred separator. The simplest would be to have a quick inline like this: https://stackoverflow.com/a/12971535/756399
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented, but I went for the solution (which is also part of the provided link) to use a static const char in favor of an inline function.
I hope you agree on that
@@ -193,6 +193,8 @@ const char* http_utils::http_post_encoding_multipart_formdata = MHD_HTTP_POST_EN | |||
const char* http_utils::application_octet_stream = "application/octet-stream"; | |||
const char* http_utils::text_plain = "text/plain"; | |||
|
|||
const char* http_utils::upload_filename_template = "libhttpserver.XXXXXX"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe static const char*
because not used outside of this source file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to
static const char* upload_filename_template = "libhttpserver.XXXXXX";
and removed the declaration from http_utils.hpp
src/http_utils.cpp
Outdated
@@ -216,6 +218,22 @@ std::string http_utils::standardize_url(const std::string& url) { | |||
return result; | |||
} | |||
|
|||
std::string http_utils::generate_random_upload_filename(std::string &directory) { | |||
char *filename = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullptr
… see #247.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with the rework of this function to not use asprintf any more
src/httpserver/http_utils.hpp
Outdated
static std::vector<std::string> tokenize_url(const std::string&, const char separator = '/'); | ||
static std::string standardize_url(const std::string&); | ||
|
||
static std::string generate_random_upload_filename(std::string &directory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpplint warns here:
src/httpserver/http_utils.hpp:253: Is this a non-const reference? If so, make const or use a pointer: std::string &directory [runtime/references] [2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is const now (input and output)
And yes, I will use cpplint before pushing in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, thanks a lot for the change; you did a great amount of work here, and I think we are in a pretty good state.
I left you a few comments and tried to avoid overlapping with @LeSpocky.
I think the only major one (besides the inlines) are unit tests that are current missing and the README documentation that is not updated. Let's try to update both, but we are basically near done here.
Note: While I squash on merge, I'd still recommend splitting the changes when you layer them on each other (as it will speed-up the review process).
src/http_request.cpp
Outdated
@@ -127,6 +127,10 @@ const std::map<std::string, std::string, http::arg_comparator> http_request::get | |||
return arguments; | |||
} | |||
|
|||
file_info_s &http_request::get_or_create_file_info(const char *upload_file_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we generally attach '&' and '*' to the preceding term in declarations of inputs and output (check across the whole CR).
Also, I'd prefer passing around std::string
over const char*
.
Can the return be marked const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attached &
and *
are adjusted. Additionally this function takes const std::string& upload_file_name
as parameter now.
The return value is now const.
Therefor I introduced new setters for the content of the file_info_s as these where previously set in the calling function directly on the returned reference.
src/http_utils.cpp
Outdated
@@ -216,6 +218,22 @@ std::string http_utils::standardize_url(const std::string& url) { | |||
return result; | |||
} | |||
|
|||
std::string http_utils::generate_random_upload_filename(std::string &directory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both input and output can be const.
src/http_utils.cpp
Outdated
if (asprintf(&filename, "%s/%s", directory.c_str(), http_utils::upload_filename_template) == -1) { | ||
return ""; | ||
} | ||
int fd = mkstemp(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how portable mkstemp is (to be fair, the autobuild on windows will say that).
I think you might want to use a combination of tpnam
and fstream
. See: https://stackoverflow.com/a/7778959/756399
src/httpserver/create_webserver.hpp
Outdated
return *this; | ||
} | ||
|
||
create_webserver& file_upload_target(file_upload_target_T file_upload_target) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use a const reference?
src/httpserver/http_request.hpp
Outdated
* @param upload_file_name he file name the user uploaded (this is the identifier for the map entry) | ||
* @result a file info struct file_info_s | ||
**/ | ||
file_info_s &get_or_create_file_info(const char *upload_file_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the output can be marked as const. The input can use a reference to std::string
src/httpserver/http_request.hpp
Outdated
* @param file_system_file_name The path to the file in the file system (the name may not be the original name depending on configuration of the webserver) | ||
* @param size The size in number of char of the file. | ||
**/ | ||
void set_file(const char* upload_file_name, const char* file_system_file_name, size_t file_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use strings references on method signatures as much as we can (I am fine with internal conversions, but I'd preserve the interface, if possible).
src/httpserver/webserver.hpp
Outdated
file_upload_target_T file_upload_target; | ||
std::string post_upload_dir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these two be marked const
?
src/webserver.cpp
Outdated
if (filename == nullptr || mr->ws->file_upload_target != FILE_UPLOAD_DISK_ONLY) { | ||
mr->dhr->set_arg(key, mr->dhr->get_arg(key) + std::string(data, size)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this prevent the library from parsing the other POST parameters whenever the configuation is on?
I guess a unit test would help us see if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it does not (at least if I can trust my tests).
filename == nullptr
is the indicator, that the currently targeted key in the post_iterator is not an uploaded file. So these parameters will always be put to the arguments.
the mr->ws->file_upload_target != FILE_UPLOAD_DISK_ONLY
will trigger, that files (where filename is not a nullptr) should be stored to the arguments too.
I did tests with all of the possible file_upload_target values and all parameters, other content (application/json, ...) was all handled as it should.
Yes, a unit test would be helpful. That's on my agenda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did tests with all of the possible file_upload_target values and all parameters, other content (application/json, ...) was all handled as it should.
Yes, a unit test would be helpful. That's on my agenda.
FWIW, from the HTML point of view, and because I looked at it just recently: There's the additional attribute multiple1. This means you can have:
- multiple files with the same key but different filenames in one request
- multiple files with the same name but different keys in one request, if you add more than one element of type
<input type="file">
to the form, and user puts the same file at it
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I was not aware of that. I will give this a try and if this does not work, adjust the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a few tests on this (with Postman)
The content (to be retrieved with get_content
) is always correct.
Upload the same file with different keys
- Two separate entries in the args map
- Only one file created with doubled content (and only one file in the files map)
Upload different files with the same key
- Only one arg with the doubled content
- Two separate files (and two files in the map)
Upload same file with the same key (this IMHO makes absolutely no sense)
- Only one arg and one file with duplicated content
This behaviour is actually logical, as the args map is set up by the key and the files map by the filename
Question is probably: What is the expected behavior?
Better solution for the files map would be to set up the map with an unique identifier and always take key and filename into account.
But how can this be changed for the args? As changing the map will change the API and therefor break the compatibility.
src/webserver.cpp
Outdated
file_info.file_system_file_name = http_utils::generate_random_upload_filename(mr->ws->post_upload_dir); | ||
} | ||
else { | ||
file_info.file_system_file_name = mr->ws->post_upload_dir + "/" + std::string(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does (it auto-compiles there on commit).
We might want to just use a preferred separator. The simplest would be to have a quick inline like this: https://stackoverflow.com/a/12971535/756399
Thanks for the feedback. I already made some fixed and already commented on some of your comments. First of all I have to admit, that I am not really a C++-Developer, but more into C (I think, that explains some of the stuff I did). Regarding the asprintf, yes, that is not a good way (as the test build showed too). I will try to get rid of this. Regarding documentation of the options: Yes I am aware of this and this will follow shortly. I just waited for a general approval of the supplied patch, so that I do not go into the wrong direction. But as you both agreed, that this is a feasible approach, I will do the docu. Regarding unit-tests: I do not really know, how to setup unit tests on this, as we would need file system operations. How can we mock such things with the used unit test framework? Regarding mkstemp. One problem is, that I do not have a testing (or compiling) environment for windows. |
Sounds good, and thanks again for all the good work.
The test framework doesn't have mocking mechanism. For files that you might need in input, there are "real" files that are copied into the test root by the configure script (see: https://github.com/etr/libhttpserver/blob/master/configure.ac#L275). You can use those. For files that you need to verify in output, you should just be able to write within the testdir or the tmp directory.
The automatic build runs on AppVeyor; while that's not the same as running a local windows instance, that should help. |
On Thu, Feb 10, 2022 at 09:08:08AM -0800, Sebastiano Merlino wrote:
For files that you might need in input, there are "real" files that are copied into the test root by the configure script (see: https://github.com/etr/libhttpserver/blob/master/configure.ac#L275). You can use those.
By chance I wrote some tests with libcurl today for a different
project, which actually do file upload with that format. If you're
interested I can change some example code.
|
As asprintf is a gnu extension it should not be used in a multi plattform code. To generate the needed string, std::string is used As the finally called function mkstemp changes its first argument of type char * the concatenated string is still strdup-ed into a char * before handing it to mkstemp
Mkstemp is only available on windows. As there is no suitable plattform independant function to generate a unique filename within a specified directory, use mkstemp on linux and a combination of _mktemp_s and _sopen_s on windows. Furthermore the path separator differs on linux and windows so use a static const char which provides the appropriate path separator Additionally within this patch the function http_utils::generate_random_upload_filename uses an exception to reports errors rather than a empty string as return code
…to run on windows The current code won't compile on ApVeyor. So this is a an untested patch to include the correct headers and not use functions, which are not supported (or deprecated) on windows.
src/webserver.cpp
Outdated
if (filename) { | ||
std::cout << "got filename '" << filename << "'" << std::endl; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this debug code slipped in by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes. Sorry for that.
I used that to test, what is happening when uploading multiple files with either same key or same filename.
And in parallel I added the includes, as the ApVeyor did not compile.
Obviously messed the commit up by not specifying the file to be commited.
Will be removed.
This is only used inside http_utils.cpp and therefor can be static
The output of get_or_create_file_info should be const. As previously the result was used to direclty alter the content of the struct new setters to set the file_system_file_name and to grow the size are introduced (and used)
file_upload.cpp is a very basic example for a file upload server. It features a simple HTML input form with file upload (which supports multiple files). The uploaded files will be stored to the directory, which has to be given as command line parameter to the example program. All uploaded files will be printed in a table including their original name, the generated random filepath and the size.
examples/file_upload.cpp
Outdated
@@ -61,7 +61,7 @@ class file_upload_resource : public httpserver::http_resource { | |||
</tr>"; | |||
|
|||
std::map<std::string, file_info_s> files = req.get_files(); | |||
for (std::map<std::string, file_info_s>::iterator it = files.begin(); it != files.end(); it++) { | |||
for (std::map<std::string, file_info_s>::iterator it = files.begin(); it != files.end(); ++it) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto it
should do in modern C++
Static code analysis complained about use of unprotected strlen. This changes adds a safety check directly before the us of strlen. Additionally some free()s (before some throws) where missing.
There is the possibility to upload the same file with two different keys or to upload multiple (different) files with the same key within the same request. The current solution can only handle the second use case but not the first. This patch changes the file map to use a nested map of files within a map of keys to support both use cases.
I already added a very basic example for the file upload and storing to the file system last week. I don't know, whether there should only be some selected examples as part of the library. So if this should not be part of the merge, I can remove it again. Additionally I reworked the map of files to look like
|
As Codacy static code analysis warns about the safety issue of strlen if a given string is not terminated, this is removed. The solution uses the size of the original string, which is just duplicated to a char* as the used function _mktemp_s requires a char* which will be changed by the function and the changed result is needed afterwards.
One of the workflows did not run successfully, as it is obviously not allowed to have a std::ofstream in a class if it is not a pointer or reference. This patch uses a pointer to the ofstream in the modded requests and adjusts all usage of the ofstream in webserver.cpp
I tried to resolve all of the workflow and code analysis errors, but one thing remains, which I don't really now, how I should solve this. The Codacy Static Code Analyzer complains, that the file_size (inside the map) is never used. I can't figure out how to tell the compiler to ignore this (I tried some Do you have any advice how to solve this? |
As these two constant values are used in the unit tests they could no longer be static only in http_utils.cpp. So they are members of the class http_utils and therefor accessible in the unit tests
This is from our applications unit tests, redacted so you won't be able to compile it, and it's using a different unit test framework, but you get an idea how to create those multipart form data requests with libcurl: https://gist.github.com/LeSpocky/b35b33b835116bbdb16fad7ba3c0c34e It's basically using those |
Added an integration test for file upload which tests the upload with different settings of the webserver: - Upload to memory and disk - Upload to disk only - Upload to memory, duplicated in get_content and args - Upload to memory, only stored in args
I just added an integration test for the file upload Please review, as I don't really know, how I should (or could) improve this. |
test/integ/file_upload.cpp
Outdated
(*content) << req.get_content(); | ||
(*args) = req.get_args(); | ||
(*files) = req.get_files(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we manage these class attributes as internal values and not use pointers (better with setters/getters, acceptable with a public interface)?
Ideally, from the outside, you can use the getters.
test/integ/file_upload.cpp
Outdated
string expected_filename = upload_directory + httpserver::http::http_utils::path_separator + httpserver::http::http_utils::upload_filename_template; | ||
LT_CHECK_EQ(file->second.get_file_system_file_name().substr(0, file->second.get_file_system_file_name().size() - 6), expected_filename.substr(0, expected_filename.size() - 6)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these lines are a bit long; let's wrap them
test/integ/file_upload.cpp
Outdated
string expected_filename = upload_directory + httpserver::http::http_utils::path_separator + httpserver::http::http_utils::upload_filename_template; | ||
LT_CHECK_EQ(file->second.get_file_system_file_name().substr(0, file->second.get_file_system_file_name().size() - 6), expected_filename.substr(0, expected_filename.size() - 6)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these lines are a bit long; let's wrap them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests.
There are a few tests that I'd additionally do on this.
(1) A test where we provide both a file to upload and other parameters to parse.
(2) A test where we provide multiple files in input.
(3) A test where we upload through PUT (instead of POST)
Avoid using pointers in class print_file_upload_resource in favour of internal values and according getter functions. Additionally simplified the code by wrapping lines and using the auto keyword instead of long map specifiers.
Added the following integration tests - Upload of a file via PUT (then there is no post_processor used and the content of the file will only be available via get_content - Upload of two files - Upload of a file and an additional ordinary POST parameter
As the test_content_2 was missing in the configure script the file was not copy to the build/test directory and therefor the integration tests failed
Is the current error for the AppVeyor my fault? I did not knowingly change anything to the YAML file. The file is (according to the repo) still there and untouched since 7 days (which was a merge of the current master). Or is this only a temporary sporadic issue and the workflow just needs to be restarted (which I cannot do)? |
Not your fault at all, I think AppVeyor itself was having issues with Github permissions yesterday |
test/integ/file_upload.cpp
Outdated
curl_easy_setopt(curl, CURLOPT_URL, "localhost:8080/upload"); | ||
curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L); | ||
curl_easy_setopt(curl, CURLOPT_READDATA, fd); | ||
curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)file_info.st_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spacing between the cast and the the value
shared_ptr<string_response> hresp(new string_response("OK", 201, "text/plain")); | ||
return hresp; | ||
} | ||
const shared_ptr<http_response> render_PUT(const http_request& req) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: empty line before this
test/integ/file_upload.cpp
Outdated
LT_CHECK_EQ(actual_content.find(FILENAME_IN_GET_CONTENT) != string::npos, true); | ||
LT_CHECK_EQ(actual_content.find(TEST_CONTENT) != string::npos, true); | ||
|
||
auto args = resource.get_args(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's avoid auto
in this context - it makes it harder to understand the type used (same for all the others in this code).
Only a couple of nitpicky comments, but I think we are done for the rest here (albeit I think some checks are still failing) |
If multiple files are uploaded, the ofstream to write the data to the file was created again (with new()) for the next file without deleting the previously created one. This resulted in a memory leak.
Approved the PR. Thanks for all the great work and the amount of effort you put into getting this done. This is a great piece of improvement to the library. |
Great news. I am glad, that I could contribute. |
Issue or RFC Endorsed by Maintainers
#229
Description of the Change
As discussed in the issue this is an approach to add file handling options for the user of the library
/tmp
)Additionally this patch adds an option to avoid extensive memory usage by not duplicating processed data (by the post processor of libmicrohttpd) to the content. Especially when uploading bigger files this increased the memory usage massively.
Alternate Designs
The option to create a random filename could be enhanced to offer the possibility to sanitize the provided filename rather than creating a unique one
Possible Drawbacks
Currently if the uploaded file path already exists (if files are stored to disc) the old file gets silently overwritten (will not happen when unique filenames are enabled). This could be enhanced by giving the libhttpserver another option to specify what should happen in this case (like
overwrite_files_if_exist
andno_overwrite_files_if_exist
)Otherwise there should not be drawbacks for current users, as the default values for all introduced options preserve the current behavior.
Verification Process
Release Notes
tbd