Permalink
Browse files

Start changing HTTP parser.

  • Loading branch information...
1 parent 19933cb commit 0dde1f589e1e71069734be829dc90582f121cfd6 @nicolasff committed Jan 22, 2011
Showing with 2,232 additions and 5 deletions.
  1. +4 −2 Makefile
  2. +4 −0 http-parser/.gitignore
  3. +4 −0 http-parser/CONTRIBUTIONS
  4. +19 −0 http-parser/LICENSE-MIT
  5. +171 −0 http-parser/README.md
  6. +1,601 −0 http-parser/http_parser.c
  7. +181 −0 http-parser/http_parser.h
  8. +105 −0 http.c
  9. +52 −0 http.h
  10. +87 −2 server.c
  11. +4 −1 server.h
View
@@ -2,9 +2,11 @@ OUT=webdis
HIREDIS_OBJ=hiredis/hiredis.o hiredis/sds.o hiredis/net.o hiredis/async.o hiredis/dict.o
JANSSON_OBJ=jansson/src/dump.o jansson/src/error.o jansson/src/hashtable.o jansson/src/load.o jansson/src/strbuffer.o jansson/src/utf.o jansson/src/value.o jansson/src/variadic.o
FORMAT_OBJS=formats/json.o formats/raw.o formats/common.o formats/custom-type.o
-OBJS=webdis.o conf.o $(FORMAT_OBJS) cmd.o slog.o server.o $(HIREDIS_OBJ) $(JANSSON_OBJ) libb64/cencode.o acl.o md5/md5.o
+HTTP_PARSER_OBJS=http-parser/http_parser.o
+DEPS=$(FORMAT_OBJS) $(HIREDIS_OBJ) $(JANSSON_OBJ) $(HTTP_PARSER_OBJS)
+OBJS=webdis.o conf.o cmd.o slog.o server.o libb64/cencode.o acl.o md5/md5.o http.o $(DEPS)
-CFLAGS=-O3 -Wall -Wextra -I. -Ijansson/src
+CFLAGS=-O3 -Wall -Wextra -I. -Ijansson/src -Ihttp-parser
LDFLAGS=-levent
all: $(OUT) Makefile
View
@@ -0,0 +1,4 @@
+tags
+*.o
+test
+test_g
@@ -0,0 +1,4 @@
+Contributors must agree to the Contributor License Agreement before patches
+can be accepted.
+
+http://spreadsheets2.google.com/viewform?hl=en&formkey=dDJXOGUwbzlYaWM4cHN1MERwQS1CSnc6MQ
View
@@ -0,0 +1,19 @@
+Copyright 2009,2010 Ryan Dahl <ry@tinyclouds.org>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.
View
@@ -0,0 +1,171 @@
+HTTP Parser
+===========
+
+This is a parser for HTTP messages written in C. It parses both requests and
+responses. The parser is designed to be used in performance HTTP
+applications. It does not make any syscalls nor allocations, it does not
+buffer data, it can be interrupted at anytime. Depending on your
+architecture, it only requires about 40 bytes of data per message
+stream (in a web server that is per connection).
+
+Features:
+
+ * No dependencies
+ * Handles persistent streams (keep-alive).
+ * Decodes chunked encoding.
+ * Upgrade support
+ * Defends against buffer overflow attacks.
+
+The parser extracts the following information from HTTP messages:
+
+ * Header fields and values
+ * Content-Length
+ * Request method
+ * Response status code
+ * Transfer-Encoding
+ * HTTP version
+ * Request path, query string, fragment
+ * Message body
+
+
+Usage
+-----
+
+One `http_parser` object is used per TCP connection. Initialize the struct
+using `http_parser_init()` and set the callbacks. That might look something
+like this for a request parser:
+
+ http_parser_settings settings;
+ settings.on_path = my_path_callback;
+ settings.on_header_field = my_header_field_callback;
+ /* ... */
+
+ http_parser *parser = malloc(sizeof(http_parser));
+ http_parser_init(parser, HTTP_REQUEST);
+ parser->data = my_socket;
+
+When data is received on the socket execute the parser and check for errors.
+
+ size_t len = 80*1024, nparsed;
+ char buf[len];
+ ssize_t recved;
+
+ recved = recv(fd, buf, len, 0);
+
+ if (recved < 0) {
+ /* Handle error. */
+ }
+
+ /* Start up / continue the parser.
+ * Note we pass recved==0 to signal that EOF has been recieved.
+ */
+ nparsed = http_parser_execute(parser, &settings, buf, recved);
+
+ if (parser->upgrade) {
+ /* handle new protocol */
+ } else if (nparsed != recved) {
+ /* Handle error. Usually just close the connection. */
+ }
+
+HTTP needs to know where the end of the stream is. For example, sometimes
+servers send responses without Content-Length and expect the client to
+consume input (for the body) until EOF. To tell http_parser about EOF, give
+`0` as the forth parameter to `http_parser_execute()`. Callbacks and errors
+can still be encountered during an EOF, so one must still be prepared
+to receive them.
+
+Scalar valued message information such as `status_code`, `method`, and the
+HTTP version are stored in the parser structure. This data is only
+temporally stored in `http_parser` and gets reset on each new message. If
+this information is needed later, copy it out of the structure during the
+`headers_complete` callback.
+
+The parser decodes the transfer-encoding for both requests and responses
+transparently. That is, a chunked encoding is decoded before being sent to
+the on_body callback.
+
+
+The Special Problem of Upgrade
+------------------------------
+
+HTTP supports upgrading the connection to a different protocol. An
+increasingly common example of this is the Web Socket protocol which sends
+a request like
+
+ GET /demo HTTP/1.1
+ Upgrade: WebSocket
+ Connection: Upgrade
+ Host: example.com
+ Origin: http://example.com
+ WebSocket-Protocol: sample
+
+followed by non-HTTP data.
+
+(See http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75 for more
+information the Web Socket protocol.)
+
+To support this, the parser will treat this as a normal HTTP message without a
+body. Issuing both on_headers_complete and on_message_complete callbacks. However
+http_parser_execute() will stop parsing at the end of the headers and return.
+
+The user is expected to check if `parser->upgrade` has been set to 1 after
+`http_parser_execute()` returns. Non-HTTP data begins at the buffer supplied
+offset by the return value of `http_parser_execute()`.
+
+
+Callbacks
+---------
+
+During the `http_parser_execute()` call, the callbacks set in
+`http_parser_settings` will be executed. The parser maintains state and
+never looks behind, so buffering the data is not necessary. If you need to
+save certain data for later usage, you can do that from the callbacks.
+
+There are two types of callbacks:
+
+* notification `typedef int (*http_cb) (http_parser*);`
+ Callbacks: on_message_begin, on_headers_complete, on_message_complete.
+* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
+ Callbacks: (requests only) on_path, on_query_string, on_uri, on_fragment,
+ (common) on_header_field, on_header_value, on_body;
+
+Callbacks must return 0 on success. Returning a non-zero value indicates
+error to the parser, making it exit immediately.
+
+In case you parse HTTP message in chunks (i.e. `read()` request line
+from socket, parse, read half headers, parse, etc) your data callbacks
+may be called more than once. Http-parser guarantees that data pointer is only
+valid for the lifetime of callback. You can also `read()` into a heap allocated
+buffer to avoid copying memory around if this fits your application.
+
+Reading headers may be a tricky task if you read/parse headers partially.
+Basically, you need to remember whether last header callback was field or value
+and apply following logic:
+
+ (on_header_field and on_header_value shortened to on_h_*)
+ ------------------------ ------------ --------------------------------------------
+ | State (prev. callback) | Callback | Description/action |
+ ------------------------ ------------ --------------------------------------------
+ | nothing (first call) | on_h_field | Allocate new buffer and copy callback data |
+ | | | into it |
+ ------------------------ ------------ --------------------------------------------
+ | value | on_h_field | New header started. |
+ | | | Copy current name,value buffers to headers |
+ | | | list and allocate new buffer for new name |
+ ------------------------ ------------ --------------------------------------------
+ | field | on_h_field | Previous name continues. Reallocate name |
+ | | | buffer and append callback data to it |
+ ------------------------ ------------ --------------------------------------------
+ | field | on_h_value | Value for current header started. Allocate |
+ | | | new buffer and copy callback data to it |
+ ------------------------ ------------ --------------------------------------------
+ | value | on_h_value | Value continues. Reallocate value buffer |
+ | | | and append callback data to it |
+ ------------------------ ------------ --------------------------------------------
+
+
+See examples of reading in headers:
+
+* [partial example](http://gist.github.com/155877) in C
+* [from http-parser tests](http://github.com/ry/http-parser/blob/37a0ff8928fb0d83cec0d0d8909c5a4abcd221af/test.c#L403) in C
+* [from Node library](http://github.com/ry/node/blob/842eaf446d2fdcb33b296c67c911c32a0dabc747/src/http.js#L284) in Javascript
Oops, something went wrong.

0 comments on commit 0dde1f5

Please sign in to comment.