Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyse the current http header processing #7229

Closed
amoss opened this issue Oct 30, 2019 · 13 comments · Fixed by #7264

Comments

@amoss
Copy link
Contributor

@amoss amoss commented Oct 30, 2019

Summary

Splitting the picohttpparser issue into two smaller more understandable pieces. This first part is to analyse what happens in the current code when we fuzz the API, and compare it to a streaming configuration to check for any relevant differences.

  • Setup a test configuration with multiple nodes streaming to a common master.
  • Fuzz the API and extract the headers from web_client.c:web_client_process_request().
  • Use the streaming configuration and extract the headers for STREAM requests from web_client.c:web_client_process_request().
  • Connect with different browsers and check for relevant differences in the headers.
  • Check that the set of URLs that we see in the headers for processing is covered by the set we are fuzzing.
  • Write PoC unit-test for the http header request processing.
  • Find out what provokes the other code-path for WEB_CLIENT_MODE_FILECOPY, start by tracing a request for http://localhost:19999/netdata.conf.

Expected result

Unit-tests to cover the different headers / URLs.

@amoss amoss self-assigned this Oct 30, 2019
@amoss amoss added this to the v1.19-Sprint2 milestone Oct 30, 2019
@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Oct 30, 2019

Estimate is from cutting #7164 into two smaller pieces.

@stelfrag

This comment has been minimized.

Copy link
Contributor

@stelfrag stelfrag commented Oct 30, 2019

Estimate is from cutting #7164 into two smaller pieces.

I think its much much better

@thiagoftsm

This comment has been minimized.

Copy link
Contributor

@thiagoftsm thiagoftsm commented Oct 30, 2019

I agreee with @stelfrag, with this steps you have everything to have success. Any doubt let us know!

@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Oct 31, 2019

It's taken an entire dev-day (not calendar day) to setup a working test configuration, so this task is going to take longer than two days.

@thiagoftsm

This comment has been minimized.

Copy link
Contributor

@thiagoftsm thiagoftsm commented Oct 31, 2019

Do not worry, take the necessary time and ask any doubt that you have!

@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Oct 31, 2019

@thiagoftsm and @mfundul Thanks for the meeting this afternoon. There is a lot of complexity in the interactions between the collectors, the database and the web API. It touches a huge amount of code and it was helpful to get a condensed overview of how it works and what the technical constraints we have.

amoss added a commit that referenced this issue Nov 4, 2019
New testing tool for the web API.

We are calling this a "fuzzer" until a better name is suggested. This tool reads the swagger definitions of the API and parses the format of the requests and responses. The tool can generate randomized requests, which are sent to a netdata host, and then validate the json responses against the schema defined in the swagger. 

A traditional fuzzer only produces a single bit of information about each test (did the target system crash). This tool verifies that the call into the API produced a valid response structure, which produces more information about the correct functioning of the host.

This current version performs a small sweep through the API calls as that is sufficient to find some incorrect response codes, and for testing the URL parser in the next issue (#7229) . A future update (in the next sprint) will add options to perform a deeper scan that brute-forces the parameter-space of the API, and combine it with our standard approach to stress-testing.
@amoss amoss added the area/tests label Nov 4, 2019
@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Nov 6, 2019

While I'm mocking the interfaces used inside web/server/web_client.c I found something a bit strange. I'm noting it here so that I remember to restructure it in the next issue:

            // getpwnam() is not thread safe,
            // but we have called this function once
            // while single threaded
            pw = getpwnam(web_owner);

This is a strange piece of code. Neither our effective UID, or the "web owner" setting in the configuration will change at runtime, but we are calling a non-thread-safe function from a thread on the basis that the value it returns will not change. Inside mysendfile we then use this check to guard against sending files that we do not own.

There is no need to call these functions each time, we can store the values while we are single threaded and then shared them among the threads as read-only global values.

@stelfrag

This comment has been minimized.

Copy link
Contributor

@stelfrag stelfrag commented Nov 6, 2019

This is a strange piece of code. Neither our effective UID, or the "web owner" setting in the configuration will change at runtime, but we are calling a non-thread-safe function from a thread on the basis that the value it returns will not change. Inside mysendfile we then use this check to guard against sending files that we do not own.

There is no need to call these functions each time, we can store the values while we are single threaded and then shared them among the threads as read-only global values.

it is called once

upd: let me clarify -- safe due to checks -- but we can save a function call. I agree

@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Nov 6, 2019

I was a little unclear myself - the underlying getpwnam is called twice which is strange as it could be a potential race-hazard, i.e. if multiple threads are spawned that immediately get file requests then they could collide on setting that static inside the function. We are then assuming that the non-thread-safe function probably doesn't do anything wrong if it is called from two threads with the same argument.

But the web_file_uid() is called many times, it could just be an int.

@stelfrag

This comment has been minimized.

Copy link
Contributor

@stelfrag stelfrag commented Nov 6, 2019

Ok so I was also unclear :-) -- that's why web_files_uid() is called from main.c (before thread dispatching) to init the static vars and avoid that.

@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Nov 6, 2019

Ah ha! I missed that call. Then it is definitely less problematic. Thanks for pointing it out.

@thiagoftsm

This comment has been minimized.

Copy link
Contributor

@thiagoftsm thiagoftsm commented Nov 6, 2019

Hi @amoss and @stelfrag ,

I remember I already explained this function months ago for other person, it is common on netdata code to have static variable that determine when we will run an algorithm, we have a lot of functions like this on web server code.
This function would be a problem, if and only if, we did not initialized it from the main.c, but as you could see we call this function while netdata has only one thread, so we won't have any problem with it.

Best regards!

@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Nov 6, 2019

Finishing up, and about to attach a PR to this issue. The scope has changed during the past week - the original plan was to write some unit-tests that simply linked against the target functions and fed some captured raw headers into the processing functions and gave some coarse indicators of how they worked.

A better option appeared while working on this issue as @vlvkobal supplied the latest work on using CMocka against the exporting engine. I've now implemented a CMocka testdriver that works against the web_client.c code directly:

  • Parts of the database are mocked (currently stubs, no functional checks).
  • Our logging is mocked to recover progress.
  • Real web_client and web_buffer structures are instantiated.
  • The web_client_process_request can be test in a real context.
  • Chunks of the live data-capture have been pulled in to drive the tests.
  • Worked out a way to do data-driven testing inside CMocka (although it is designed to use a static list of tests).

Progress was slower than planned because of the change in scope, learning CMocka and unexpected behaviour during the data capture (issue #7253 ) The current status of the testdriver is a "PoC" - it works against the real code and provokes some strange behaviour, but it requires more work to complete fully. I will fix up issues in the backlog to describe current status before the planning session tomorrow but the immediate plan is:

  • Expand the testdriver to replicate all the behaviour from the live capture.
  • Capture better pass/fail indicators using the expect_ functionality from CMocka.
  • Use gcov/lcov to check that everything in the code that will be rewritten has been covered.
@amoss amoss closed this in #7264 Nov 7, 2019
amoss added a commit that referenced this issue Nov 7, 2019
* Start of unit-test for http request processing.

A CMocka test-driver has been added to replay the live-capture headers against the http request processing code. Enough stubs / mocking code have been written to check that we can inject a chosen raw request into the web_client code and check how it was processed. A parameterised buffer building function can setup requests for testing in `web_buffer` structures. Each endpoint in the API will be checked against a variety of http requests. This PoC demonstrates enough functionality to show building the request on demand for each test-case and cleaning up afterwards.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.