Faster HTTP date parsing #4

Merged
merged 5 commits into from Dec 25, 2011

Time (in us) for `+dateFromHttpDateString:`-
... before this commit: 69.439us.
... after this commit :  2.993us, over 22x faster.

johnezang added some commits Dec 24, 2011

Fix bug in the date strings that the unit test creates.
Per [RFC 2616, section 3.3.1](http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3.1)-

> All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.

Prior to this commit, the dates being created were in the system's local time zone.
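
To illustrate the GMT requirement, here is a minimal C sketch of building an RFC 1123 style date string from a timestamp. It is not the project's Objective-C test code; `format_http_date` is a hypothetical helper name, and the sketch assumes the default "C" locale so that `%a` and `%b` produce English abbreviations.

```c
#include <stddef.h>
#include <time.h>

/* Format `when` as an RFC 1123 HTTP date ("Sun, 06 Nov 1994 08:49:37 GMT").
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the conversion failed or the buffer was too small.
 * Hypothetical helper for illustration only. */
size_t format_http_date(time_t when, char *buf, size_t buflen) {
    /* gmtime() converts to UTC; localtime() here would reproduce the bug
     * described above. Note that gmtime() is not thread-safe. */
    const struct tm *utc = gmtime(&when);
    if (utc == NULL) {
        return 0;
    }
    /* The zone is written literally as "GMT" because the fields are UTC. */
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", utc);
}
```

Calling `format_http_date` with a 64-byte buffer is more than enough, since the formatted date is 29 characters long.
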
Significantly improve the performance of parsing HTTP dates.
Time (in us) for `+dateFromHttpDateString:`-
... before this commit: 69.439us.
... after this commit :  2.993us, over 22x faster.

Awesome, thank you John!

Can you briefly explain how you created those tables? I'll definitely merge this, but I'd love to understand it :)

Owner

johnezang replied Dec 24, 2011

Ragel. Below is the Ragel source I used. I then polished, pretty-printed, and tweaked Ragel's output. Compiled and run with:

shell% ragel -F1 http-date.rl
shell% gcc -o http-date http-date.c
shell% ./http-date 'Sun, 06 Nov 1994 08:49:37 GMT' 'Sunday, 06-Nov-94 08:49:37 GMT' 'Sun Nov  6 08:49:37 1994' 'Sat Dec 24 14:34:26 2037' 'Sunday, 06-Nov-94 08:49:37 GMT' 'Sun, 06 Nov 1994 08:49:37 GMT'
// http-date.rl

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

%%{
    machine httpDate;

    yearDigit   = ( [0-9] @{ gdate.year   = gdate.year   * 10   + (fc - '0'); });
    dayDigit    = ( [0-9] @{ gdate.day    = gdate.day    * 10   + (fc - '0'); });
    hourDigit   = ( [0-9] @{ gdate.hour   = gdate.hour   * 10   + (fc - '0'); });
    minuteDigit = ( [0-9] @{ gdate.minute = gdate.minute * 10   + (fc - '0'); });
    secondDigit = ( [0-9] @{ gdate.second = gdate.second * 10.0 + (fc - '0'); });

    wkday        = ("Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun");
    weekday      = ("Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | "Sunday");
    month        = (("Jan" @{ gdate.month =  1; }) | ("Feb" @{ gdate.month =  2; }) | ("Mar" @{ gdate.month =  3; }) |
                    ("Apr" @{ gdate.month =  4; }) | ("May" @{ gdate.month =  5; }) | ("Jun" @{ gdate.month =  6; }) |
                    ("Jul" @{ gdate.month =  7; }) | ("Aug" @{ gdate.month =  8; }) | ("Sep" @{ gdate.month =  9; }) |
                    ("Oct" @{ gdate.month = 10; }) | ("Nov" @{ gdate.month = 11; }) | ("Dec" @{ gdate.month = 12; }));

    year4Digits   = (yearDigit . yearDigit . yearDigit . yearDigit);
    year2Digits   = ((yearDigit . yearDigit) @{ gdate.year += 1900; });
    dayDigits     = (dayDigit . dayDigit);
    hourDigits    = (hourDigit . hourDigit);
    minuteDigits  = (minuteDigit . minuteDigit);
    secondDigits  = (secondDigit . secondDigit);

    date1        = (dayDigits  . " " . month . " " . year4Digits);
    date2        = (dayDigits  . "-" . month . "-" . year2Digits);
    date3        = (month      . " " . (dayDigits | " " . dayDigit));
    time         = (hourDigits . ":" . minuteDigits . ":" . secondDigits);

    rfc1123date  = (wkday   . "," . " " . date1 . " " . time . " " . "GMT");
    rfc850date   = (weekday . "," . " " . date2 . " " . time . " " . "GMT");
    asctimedate  = (wkday   . " "       . date3 . " " . time . " " . year4Digits);

    HTTPdate    = ((rfc1123date %{ parsed = 1; }) | (rfc850date %{ parsed = 1; }) | (asctimedate %{ parsed = 1; }));

    main := HTTPdate;
}%%

%% write data nofinal;

typedef signed char SInt8;
typedef signed int SInt32;

struct __gdate {
    SInt32 year;
    SInt8 month;
    SInt8 day;
    SInt8 hour;
    SInt8 minute;
    double second;
};
typedef struct __gdate __gdate;

void scanner(char *buf) {
    int cs;

    int parsed = 0;
    __gdate gdate;
    memset(&gdate, 0, sizeof(__gdate));

    %% write init;

    {
        int len = strlen(buf);
        char *p = buf, *pe = p + len, *eof = pe;

        %% write exec;
    }

    printf("parsed: %d, %d/%d/%d %d:%d:%.2f\n", parsed, gdate.month, gdate.day, gdate.year, gdate.hour, gdate.minute, gdate.second);
}


int main(int argc, char *argv[]) {
    int x;
    for (x = 1; x < argc; x++) {
        scanner(argv[x]);
        printf("-----\n");
    }
    return 0;
}
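
The scanner above only fills in calendar fields. As a rough sketch of one way to turn such fields into seconds since the Unix epoch without relying on the non-standard `timegm`, the following uses the well-known "days from civil" calculation for the proleptic Gregorian calendar. This is an assumption for illustration, not necessarily what the merged Objective-C code does; `days_from_civil` and `epoch_from_utc_fields` are hypothetical names, and the inputs are assumed to be already validated UTC values.

```c
/* Days between 1970-01-01 and the given civil date (proleptic Gregorian).
 * month is 1..12, day is 1..31. Hypothetical helper for illustration. */
static long long days_from_civil(int year, int month, int day) {
    /* Treat March as the first month so the leap day falls at year end. */
    long long y   = (long long)year - (month <= 2 ? 1 : 0);
    long long era = (y >= 0 ? y : y - 399) / 400;          /* 400-year cycle */
    long long yoe = y - era * 400;                         /* year of era, 0..399 */
    long long mp  = (month > 2) ? month - 3 : month + 9;   /* Mar=0 .. Feb=11 */
    long long doy = (153 * mp + 2) / 5 + day - 1;          /* day of shifted year */
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* day of era */
    return era * 146097 + doe - 719468;                    /* shift to 1970-01-01 */
}

/* Seconds since 1970-01-01 00:00:00 UTC for already-validated UTC fields. */
long long epoch_from_utc_fields(int year, int month, int day,
                                int hour, int minute, int second) {
    return days_from_civil(year, month, day) * 86400LL
         + hour * 3600LL + minute * 60LL + second;
}
```

For the first sample date in the command line above, `epoch_from_utc_fields(1994, 11, 6, 8, 49, 37)` yields 784111777.
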

Great, thanks! We should probably commit this code too, since the tables generated from the state machines are pretty un-debug-able.
What's a good way to add this? As a comment?
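
To give a feel for why generated tables are hard to read without their source, here is a tiny hand-written table-driven matcher for just the `HH:MM:SS` part. It is only a sketch of the general technique; it is not Ragel's actual output, and `kNext`, `classify`, and `match_time` are hypothetical names.

```c
/* Character classes used as column indexes into the transition table. */
enum { CLS_DIGIT, CLS_COLON, CLS_OTHER, CLS_COUNT };
enum { ST_ERR = -1, ST_ACCEPT = 8 };

/* kNext[state][class] is the next state, or ST_ERR if the input is rejected.
 * States 0..7 count how many characters of "HH:MM:SS" have been consumed. */
static const signed char kNext[8][CLS_COUNT] = {
    /* 0: first hour digit    */ {  1, -1, -1 },
    /* 1: second hour digit   */ {  2, -1, -1 },
    /* 2: ':' after hours     */ { -1,  3, -1 },
    /* 3: first minute digit  */ {  4, -1, -1 },
    /* 4: second minute digit */ {  5, -1, -1 },
    /* 5: ':' after minutes   */ { -1,  6, -1 },
    /* 6: first second digit  */ {  7, -1, -1 },
    /* 7: second second digit */ {  8, -1, -1 },
};

static int classify(char c) {
    if (c >= '0' && c <= '9') return CLS_DIGIT;
    if (c == ':') return CLS_COLON;
    return CLS_OTHER;
}

/* Returns 1 if s is exactly of the form "HH:MM:SS", otherwise 0.
 * Only the shape is checked, not the numeric ranges. */
int match_time(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        if (state == ST_ACCEPT) return 0;  /* trailing input after a full match */
        state = kNext[state][classify(*s)];
        if (state == ST_ERR) return 0;
    }
    return state == ST_ACCEPT;
}
```

Even with the row comments, the table says little about intent; a generated table for the full grammar has far more states and no comments, which is why keeping the grammar next to it helps.
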

Owner

johnezang replied Dec 25, 2011

As a comment probably makes the most sense, all things considered.

steipete added a commit that referenced this pull request Dec 25, 2011

Merge pull request #4 from johnezang/master
Faster HTTP date parsing (22x faster!)

@steipete steipete merged commit f8cf070 into steipete:master Dec 25, 2011