
urlparser

Yet another simple url parser.

Motivation

Parsing a URL is neither a challenging nor a particularly interesting problem, and plenty of implementations already exist.

I started this as a side project anyway, for two reasons: 1) the amount of code is moderate, and it should fit in a few hundred lines; 2) it is reasonably practical, even if it is only a toy project.

After all, as the old saying goes: "learning by doing".

Design

Since URL syntax is relatively straightforward, I currently follow the description on Wiki/URL.^1

The syntax of a URI^1:

URI = scheme:[//authority]path[?query][#fragment]
authority = [userinfo@]host[:port]

  • scheme is mandatory
  • authority is optional, and if authority is present:
    • user info is optional
    • host is mandatory
    • port is optional
  • path is mandatory
  • query is optional
  • fragment is optional

Note that the parser currently recognizes only well-formed URLs, that is, URLs that follow the syntax above.
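To make the authority rule concrete, here is a minimal sketch of splitting an authority string into its parts. It is not taken from urlparse.c: `part_t` and `split_authority` are hypothetical names used only for illustration, and the sketch assumes there is no bracketed IPv6 host literal (which contains colons of its own).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view type for this sketch: a pointer plus a length. */
typedef struct {
  const char *start;
  size_t len;
} part_t;

/* Split authority = [userinfo@]host[:port].
 * Absent optional parts are reported as { NULL, 0 }.
 * Returns 0 on success, -1 if the mandatory host is empty.
 * Assumption: no bracketed IPv6 literal in the host. */
static int
split_authority(const char *auth, size_t len,
                part_t *userinfo, part_t *host, part_t *port)
{
  const char *end, *at, *colon;

  assert(auth != NULL);
  end = auth + len;
  userinfo->start = NULL; userinfo->len = 0;
  port->start = NULL;     port->len = 0;

  /* Everything before the first '@' is the user info. */
  at = memchr(auth, '@', len);
  if (at != NULL) {
    userinfo->start = auth;
    userinfo->len = (size_t)(at - auth);
    auth = at + 1;
  }

  /* The host runs up to the next ':' or to the end of the authority. */
  colon = memchr(auth, ':', (size_t)(end - auth));
  host->start = auth;
  host->len = (size_t)((colon != NULL ? colon : end) - auth);

  /* Whatever follows the ':' is the port. */
  if (colon != NULL) {
    port->start = colon + 1;
    port->len = (size_t)(end - (colon + 1));
  }
  return host->len > 0 ? 0 : -1;
}
```

For the input `john:pw@example.com:8080` this yields the user info `john:pw`, the host `example.com` and the port `8080`. Splitting the user info further into a name and a password would be one more scan for `:`.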

Implementation

The design borrows from http-parser^2: instead of duplicating the url string, each field stores a pointer into the given url string together with a length. As a consequence, the original string must remain valid for as long as the parsing result is in use.

I do not use regular expressions (re) to parse urls. Instead, the parser simply scans the given url from beginning to end and looks for the delimiters of each field.
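The scanning idea can be sketched as follows. This is an illustration rather than the code in urlparse.c: `span_t`, `scan_until` and `split_url` are hypothetical names, and the authority is deliberately left inside the path to keep the example short. No characters are validated; the sketch only locates delimiters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view type for this sketch: a pointer into the url plus a length. */
typedef struct {
  const char *start;
  size_t len;
} span_t;

/* Return the stretch of p that precedes the first delimiter character
 * (or the end of the string if no delimiter occurs). */
static span_t
scan_until(const char *p, const char *delims)
{
  span_t s;

  assert(p != NULL && delims != NULL);
  s.start = p;
  s.len = strcspn(p, delims);
  return s;
}

/* Split scheme:path[?query][#fragment] in a single left-to-right pass.
 * Absent query or fragment is reported as { NULL, 0 }.
 * Returns 0 on success, -1 if no non-empty scheme followed by ':' is found. */
static int
split_url(const char *url, span_t *scheme, span_t *path,
          span_t *query, span_t *frag)
{
  const char *p = url;

  /* The scheme ends at the first ':'; hitting '/', '?' or '#' first
   * means there is no scheme at all. */
  *scheme = scan_until(p, ":/?#");
  p += scheme->len;
  if (scheme->len == 0 || *p != ':')
    return -1;
  p++;

  /* The path (including any //authority here) ends at '?' or '#'. */
  *path = scan_until(p, "?#");
  p += path->len;

  query->start = NULL; query->len = 0;
  if (*p == '?') {
    p++;
    *query = scan_until(p, "#");
    p += query->len;
  }

  frag->start = NULL; frag->len = 0;
  if (*p == '#') {
    p++;
    *frag = scan_until(p, ""); /* empty delimiter set: runs to the end */
  }
  return 0;
}
```

Each character of the input is visited once, and no memory is allocated because every result is just a view into the original string.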

The parsing result returns a struct:

typedef struct {
  field_t *scheme;     // mandatory
  field_t *usernm;     // optional
  field_t *passwd;     // optional
  field_t *host;       // optional
  field_t *port;       // optional
  field_t *path;       // mandatory
  field_t *query;      // optional
  field_t *frag;       // optional
} url_t;

with each field defined as:

typedef struct {
  char *offset;
  unsigned int len;
} field_t;

If a field is not NULL, then

  • [field_t]->offset: points to the first character of the field in the original url
  • [field_t]->len: gives the length of the field in characters
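Because a field is only a view into the original url, it is not NUL-terminated, so it cannot be passed directly to functions that expect a C string. The sketch below shows two ways of working with such a field. `field_dup` and `field_eq` are hypothetical helpers and not part of the API; `field_t` is repeated only so that the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the field_t described above. */
typedef struct {
  char *offset;
  unsigned int len;
} field_t;

/* Hypothetical helper: copy a field into a fresh NUL-terminated string.
 * Returns NULL if the field is absent or allocation fails.
 * The caller owns the result and must free() it. */
static char *
field_dup(const field_t *f)
{
  char *s;

  if (f == NULL)
    return NULL;
  s = malloc((size_t)f->len + 1);
  if (s == NULL)
    return NULL;
  memcpy(s, f->offset, f->len);
  s[f->len] = '\0';
  return s;
}

/* Hypothetical helper: compare a field with a C string without copying.
 * An absent (NULL) field never compares equal. */
static int
field_eq(const field_t *f, const char *s)
{
  assert(s != NULL);
  return f != NULL
      && strlen(s) == f->len
      && memcmp(f->offset, s, f->len) == 0;
}
```

For printing, a copy is not needed: `printf("%.*s\n", (int)f->len, f->offset);` writes exactly `len` characters starting at `offset`.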

API

url_t *
url_parse(char *url); // parse the given url, returns the url_t as result

void
url_print(url_t *url_stru); // print parsing result

void
url_del(url_t *url_stru); // delete parsing result, free memory

Test
