Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the URL parsing code. #7164

Open
amoss opened this issue Oct 23, 2019 · 3 comments
Open

Rewrite the URL parsing code. #7164

amoss opened this issue Oct 23, 2019 · 3 comments
Assignees
Milestone

Comments

@amoss
Copy link
Contributor

@amoss amoss commented Oct 23, 2019

Summary

Careful analysis of the code in the picohttpparser has shown that it does not solve the problem that we have, and it solves a different problem that we do not have.

What we need:

  • We need to process a complete http request, detecting partial requests and avoid potential for slow TCP attacks.
  • Check if the request uses a GET or our custom STREAM and dispatch appropriately.
  • Handle any differences in file-transfers vs API calls (e.g. setting socket options).
  • Break the path down into components, and use to handle the dispatch into the API.
  • Break down the rest of the URL, extracting the parameter key/value pairs quickly, and producing a form that we can use in the API.
  • Extract the values of white-listed headers that we care about (avoiding buffer overflow attacks).

What the picohttpparser code does:

  • Builds a dictionary of all the headers in the request for arbitrary processing by the h2o web-server.
  • Treats the URL as an opaque string without any further processing.
  • Extract an arbitrary method but does no processing on it.

Proposed work:

  • Implement the list above ("what we need") directly in the existing code.
  • Use the unit-tests to make sure it is bullet-proof.
  • Use the fuzzer to check that it accesses the API correctly.
  • Put gcov / lcov into the unit-test build to make sure that we've tested it properly.

Notes about complexity:

  • The largest change in the implementation style is that we should not rewrite the URL string in order to call simple_hash as this creates shared state that is pushed down a deep call-chain.
  • Add simple_hash_n to libnetdata: a variation that does not search for a null-terminator but instead hashes n characters.
  • Build a fixed size auxiliary structure that identifies the location of known parameter values in the string in a single-pass.
  • Avoid partitioning the string with null-terminators to separate into tokens.

Expected result

Given the wide estimates for planning, and the team perception of risk the result will be demonstrable code on a fork. Testing and review before it is merged into main will be handled in a separate issue.

@amoss amoss self-assigned this Oct 23, 2019
@cosmix cosmix added this to the v1.19-Sprint2 milestone Oct 23, 2019
amoss added a commit to amoss/netdata that referenced this issue Oct 29, 2019
This find a number of 200 status codes returning from invalid requests, which would be the original problem that the
issue for replacing the url-parser was intended to fix.

I can't do any further validation at this point because of missing json descriptions in the swagger / bugs in the
netdata code.

This matches the intention in the issues, the missing documentation will get pushed into relevant issues (which will
get pulled inside the same epic). The final validation of the working code is scheduled for netdata#7165 after netdata#7164 is
complete.
amoss added a commit to amoss/netdata that referenced this issue Oct 29, 2019
This find a number of 200 status codes returning from invalid requests, which would be the original problem that the
issue for replacing the url-parser was intended to fix.

I can't do any further validation at this point because of missing json descriptions in the swagger / bugs in the
netdata code.

This matches the intention in the issues, the missing documentation will get pushed into relevant issues (which will
get pulled inside the same epic). The final validation of the working code is scheduled for netdata#7165 after netdata#7164 is
complete.
@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Oct 30, 2019

Estimate is changing because some of the work has moved into #7229 .

@amoss amoss mentioned this issue Oct 30, 2019
6 of 7 tasks complete
amoss added a commit to amoss/netdata that referenced this issue Oct 31, 2019
This find a number of 200 status codes returning from invalid requests, which would be the original problem that the
issue for replacing the url-parser was intended to fix.

I can't do any further validation at this point because of missing json descriptions in the swagger / bugs in the
netdata code.

This matches the intention in the issues, the missing documentation will get pushed into relevant issues (which will
get pulled inside the same epic). The final validation of the working code is scheduled for netdata#7165 after netdata#7164 is
complete.
amoss added a commit to amoss/netdata that referenced this issue Nov 1, 2019
This find a number of 200 status codes returning from invalid requests, which would be the original problem that the
issue for replacing the url-parser was intended to fix.

I can't do any further validation at this point because of missing json descriptions in the swagger / bugs in the
netdata code.

This matches the intention in the issues, the missing documentation will get pushed into relevant issues (which will
get pulled inside the same epic). The final validation of the working code is scheduled for netdata#7165 after netdata#7164 is
complete.
@amoss amoss removed this from the v1.19-Sprint2 milestone Nov 6, 2019
@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Nov 6, 2019

Sprint milestone label should have been removed when the other issues in the epic were re-split after the planning meeting - doing it now to clear up before tomorrow.

@amoss amoss changed the title Import the picohttpparser code. Rewrite the URL parsing code. Nov 8, 2019
@rduffield rduffield added this to the v1.19-Sprint4 milestone Nov 21, 2019
@amoss

This comment has been minimized.

Copy link
Contributor Author

@amoss amoss commented Dec 3, 2019

Task will not be finished in this sprint - we had a change of priorities and this is being delayed until after the Winter Release.

@amoss amoss modified the milestones: v1.19-Sprint4, AfterWR-2019 Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.