Permalink
Browse files

New branch including upgrade to 00 interface. Now "new" method can pa…

…rse URI and init object. Formalize access to parse tree from grammar. New pluggable grammar design. Add query_form method to provide access to parsed and split up query section of URI for applications needing CGI variables.
  • Loading branch information...
1 parent 87638ae commit 6854c4a371ac23b988a8088f235ec961774fe9f5 @ronaldxs ronaldxs committed Jun 18, 2011
Showing with 214 additions and 64 deletions.
  1. +2 −2 README
  2. +45 −0 lib/IETF/RFC_Grammar.pm
  3. +105 −19 lib/URI.pm
  4. +52 −31 t/01.t
  5. +10 −12 t/rfc-3986-examples.t
View
4 README
@@ -1,3 +1,3 @@
-Perl6 realization of URI Uniform Resource Identifiers handler
+Perl6 realization of URI - Uniform Resource Identifiers handler
-It is can only parse URI for now, and provide some methods to it`s parts.
+A URI implementation using Perl 6 grammars to implement RFC 3986 BNF. Currently only implements parsing.
View
@@ -0,0 +1,45 @@
+class IETF::RFC_Grammar;
+
+# Thanks in part to Aaron Sherman
+# and his article http://essays.ajs.com/2010/05/writing-perl-6-uri-module.html
+# for inspiring this.
+
+# below should be constant when implemented ...
+my %rfc_grammar_build = (
+ 'rfc3896' => 'IETF::RFC_Grammar::URI'
+);
+my %rfc_grammar;
+
+
+has $.rfc;
+has $.grammar;
+has $.parse_result;
+
+method parse($parse_str) {
+ $!grammar.parse($parse_str);
+ unless $/ { die "Parse failed" } # very weak I think
@tadzik

tadzik Jun 18, 2011

Could be just grammar.parse() or die "bla bla"

+ $!parse_result = $/
+}
+
+submethod BUILD($!rfc, $!grammar) {}
+
+method new(Str $rfc, $grammar?) {
+ my $init_grammar = $grammar;
+
+ if (
+ (! $init_grammar.can('parse')) and
+ %rfc_grammar_build.exists($rfc)
+ ) {
+ unless %rfc_grammar.exists($rfc) {
+ require %rfc_grammar_build{$rfc};
+ %rfc_grammar{$rfc} = eval %rfc_grammar_build{$rfc};
+ }
+ $init_grammar = %rfc_grammar{$rfc};
+ }
+ if (! $init_grammar.can('parse')) {
+ die "Need either rfc with known grammar or grammar";
+ }
+
+ return self.bless(*, rfc => $rfc, grammar => $init_grammar);
+}
+
View
@@ -1,32 +1,45 @@
class URI;
-has $.uri;
+use IETF::RFC_Grammar;
+use IETF::RFC_Grammar::URI;
+use URI::Escape;
+
+has $.uri; # use of this now deprecated
@tadzik

tadzik Jun 18, 2011

If it's deprecated, something like this could be useful:
has $!uri; # no public accessor method uri() { warn "this method is now deprecated"; $!uri }

+
+has $.grammar is ro;
has $!path;
has Bool $!is_absolute is ro;
has $!scheme;
has $!authority;
has $!query;
has $!frag;
-has @.chunks;
+has %!query_form;
-method init ($str) {
- use IETF::RFC_Grammar::URI;
+has @.segments;
+
+method parse (Str $str) {
# clear string before parsing
my $c_str = $str;
$c_str .= subst(/^ \s* ['<' | '"'] /, '');
$c_str .= subst(/ ['>' | '"'] \s* $/, '');
- IETF::RFC_Grammar::URI.parse($c_str);
- unless $/ { die "Could not parse URI: $str" }
-
$!uri = $!path = $!is_absolute = $!scheme = $!authority = $!query =
$!frag = Mu;
- @!chunks = Nil;
+ %!query_form = @!segments = Nil;
+
+ try {
+ $!grammar.parse($c_str);
+ }
+ CATCH {
+ die "Could not parse URI: $str";
+ }
- $!uri = $/;
+ # now deprecated
+ $!uri = $!grammar.parse_result;
- my $comp_container = $/<URI_reference><URI> // $/<URI_reference><relative_ref>;
+ my $comp_container = $!grammar.parse_result<URI_reference><URI> //
+ $!grammar.parse_result<URI_reference><relative_ref>;
$!scheme = $comp_container<scheme>;
$!query = $comp_container<query>;
$!frag = $comp_container<fragment>;
@@ -40,14 +53,67 @@ method init ($str) {
$!path //= $comp_container<path_noscheme> //
$comp_container<path_rootless> ;
- @!chunks = $!path<segment>.list() // ('');
+ @!segments = $!path<segment>.list() // ('');
if my $first_chunk = $!path<segment_nz_nc> // $!path<segment_nz> {
- unshift @!chunks, $first_chunk;
+ unshift @!segments, $first_chunk;
}
- if @!chunks.elems == 0 {
- @!chunks = ('');
+ if @!segments.elems == 0 {
+ @!segments = ('');
}
-# @!chunks ||= ('');
+# @!segments ||= ('');
+
+ try {
+ %!query_form = split_query( ~$!query );
+ }
+ CATCH {
+ %!query_form = Nil;
+ }
+}
+
+sub split_query(Str $query) {
+ my %query_form;
+
+ for map { [split(/<[=]>/, $_) ]}, split(/<[&;]>/, $query) -> $qmap {
+ for (0, 1) -> $i { # could go past 1 in theory ...
+ $qmap[ $i ] ~~ s:g/\+/ /;
+ $qmap[ $i ] = uri_unescape($qmap[ $i ]);
+ }
+ if %query_form.exists($qmap[0]) {
+ if %query_form{ $qmap[0] } ~~ Array {
+ %query_form{ $qmap[0] }.push($qmap[1])
+ }
+ else {
+ %query_form{ $qmap[0] } = [
+ %query_form{ $qmap[0] }, $qmap[1]
+ ]
+ }
+ }
+ else {
+ %query_form{ $qmap[0]} = $qmap[1]
+ }
+ }
+
+ return %query_form;
+}
+
+# deprecated old call for parse
+method init ($str) {
+ $.parse($str);
+}
+
+# new can pass alternate grammars some day ...
+submethod BUILD {
+ $!grammar = IETF::RFC_Grammar.new('rfc3896');
+}
+
+method new(Str $str?) {
+ my $obj = self.bless(*);
+
+ if ($str.defined) {
+ $obj.parse($str);
+ }
+
+ return $obj;
}
method scheme {
@@ -94,21 +160,29 @@ method Str() {
$str ~= $.path;
$str ~= '?' ~ $.query if $.query;
$str ~= '#' ~ $.frag if $.frag;
- return $str;
+ return $str;
+}
+
+# chunks now strongly deprecated
+# it's segments in p5 URI and segment is part of rfc so no more chunks soon!
+method chunks {
+ return @!segments;
}
+method query_form {
+ return %!query_form;
+}
=begin pod
=head NAME
-URI — Uniform Resource Identifiers (absolute and relative)
+URI — Uniform Resource Identifiers (absolute and relative)
=head SYNOPSYS
use URI;
- my $u = URI.new;
- $u.init('http://her.com/foo/bar?tag=woow#bla');
+ my $u = URI.new('http://her.com/foo/bar?tag=woow#bla');
my $scheme = $u.scheme;
my $authority = $u.authority;
@@ -117,10 +191,22 @@ URI — Uniform Resource Identifiers (absolute and relative)
my $path = $u.path;
my $query = $u.query;
my $frag = $u.frag; # or $u.fragment;
+ my $tag = $u.query_form<tag>; # should be woow
my $is_absolute = $u.absolute;
my $is_relative = $u.relative;
+ # something p5 URI without grammar could not easily do !
+ my $host_in_grammar =
+ $u.grammar.parse_result<URI_reference><URI><hier_part><authority><host>;
+ if ($host_in_grammar<reg_name>) {
+ say 'Host looks like registered domain name - approved!';
+ }
+ else {
+ say 'Sorry we do not take ip address hosts at this time.';
+ say 'Please use registered domain name!';
+ }
+
=end pod
View
83 t/01.t
@@ -1,62 +1,83 @@
use v6;
use Test;
-plan 28;
+plan 38;
use URI;
ok(1,'We use URI and we are still alive');
-my $u = URI.new;
-$u.init('http://example.com:80/about/us?foo#bar');
+my $u = URI.new('http://example.com:80/about/us?foo#bar');
is($u.scheme, 'http', 'scheme');
-is($u.host, 'example.com', 'host');
-is($u.port, '80', 'port');
-is($u.path, '/about/us', 'path');
-is($u.query, 'foo', 'query');
-is($u.frag, 'bar', 'frag');
-is($u.chunks, 'about us', 'chunks');
-is($u.chunks[0], 'about', 'first chunk');
-is($u.chunks[1], 'us', 'second chunk');
+is($u.host, 'example.com', 'host');
+is($u.port, '80', 'port');
+is($u.path, '/about/us', 'path');
+is($u.query, 'foo', 'query');
+is($u.frag, 'bar', 'frag');
+is($u.segments, 'about us', 'segements');
+is($u.segments[0], 'about', 'first chunk');
+is($u.segments[1], 'us', 'second chunk');
is( ~$u, 'http://example.com:80/about/us?foo#bar',
'Complete path stringification');
-$u.init('https://eXAMplE.COM');
+$u.parse('https://eXAMplE.COM');
-is($u.scheme, 'https', 'scheme');
-is($u.host, 'example.com', 'host');
+is($u.scheme, 'https', 'scheme');
+is($u.host, 'example.com', 'host');
is( "$u", 'https://example.com',
'https://eXAMplE.COM stringifies to https://example.com');
-$u.init('/foo/bar/baz');
+$u.parse('/foo/bar/baz');
-is($u.chunks, 'foo bar baz', 'chunks from absolute path');
-ok($u.absolute, 'absolute path');
-nok($u.relative, 'not relative path');
+is($u.segments, 'foo bar baz', 'setments from absolute path');
+ok($u.absolute, 'absolute path');
+nok($u.relative, 'not relative path');
-$u.init('foo/bar/baz');
+$u.parse('foo/bar/baz');
-is($u.chunks, 'foo bar baz', 'chunks from relative path');
-ok( $u.relative, 'relative path');
-nok($u.absolute, 'not absolute path');
+is($u.segments, 'foo bar baz', 'segements from relative path');
+ok( $u.relative, 'relative path');
+nok($u.absolute, 'not absolute path');
-is($u.chunks[0], 'foo', 'first chunk');
-is($u.chunks[1], 'bar', 'second chunk');
-is($u.chunks[*-1], 'baz', 'last chunk');
+is($u.segments[0], 'foo', 'first segment');
+is($u.segments[1], 'bar', 'second segment');
+is($u.segments[*-1], 'baz', 'last seement');
-$u.init('http://foo.com');
+$u.parse('http://foo.com');
-ok($u.chunks.list.perl eq '[""]', ".chunks return [''] for empty path");
-ok($u.absolute, 'http://foo.com has an absolute path');
-nok($u.relative, 'http://foo.com does not have a relative path');
+ok($u.segments.list.perl eq '[""]', ".segments return [''] for empty path");
+ok($u.absolute, 'http://foo.com has an absolute path');
+nok($u.relative, 'http://foo.com does not have a relative path');
# test URI parsing with <> or "" and spaces
-$u.init("<http://foo.com> ");
+$u.parse("<http://foo.com> ");
is("$u", 'http://foo.com', '<> removed from str');
-$u.init(' "http://foo.com"');
+$u.parse(' "http://foo.com"');
is("$u", 'http://foo.com', '"" removed from str');
+my $host_in_grammar =
+ $u.grammar.parse_result<URI_reference><URI><hier_part><authority><host>;
+is($host_in_grammar<IPv4address>, '', 'grammar detected host not ip'
+);
+is($host_in_grammar<reg_name>, 'foo.com', 'grammar detected registered domain style');
+
+$u.parse('http://10.0.0.1');
+is($u.host, '10.0.0.1', 'numeric host');
+$host_in_grammar =
+ $u.grammar.parse_result<URI_reference><URI><hier_part><authority><host>;
+
+is($host_in_grammar<IPv4address>, '10.0.0.1', 'grammar detected ipv4');
+is($host_in_grammar<reg_name>, '', 'grammar detected no registered domain style');
+
+$u.parse('http://example.com:80/about?foo=cod&bell=bob#bar');
+is($u.query, 'foo=cod&bell=bob', 'query with form params');
+is($u.query_form<foo>, 'cod', 'query param foo');
+is($u.query_form<bell>, 'bob', 'query param bell');
+
+$u.parse('http://example.com:80/about?foo=cod&foo=trout#bar');
+is($u.query_form<foo>[0], 'cod', 'query param foo - el 1');
+is($u.query_form<foo>[1], 'trout', 'query param foo - el 2');
# vim:ft=perl6
Oops, something went wrong.

4 comments on commit 6854c4a

Very nice. The new interface looks much better. Have you considered making the URI object immutable? Also, when deprecating something, it's maybe good to put a warn() somewhere in the method.

Collaborator

ronaldxs replied Jun 19, 2011

Have you considered making the URI object immutable?

The Perl 5 URI module has mutators as well as accessors for many URI components. Why should the Perl 6 URI module be different in this respect?

No specific thing, it just feels natural to me in a semi-functional Perl 6 world. But I don't have to be right :)

Could be just grammar.parse() or die "bla bla"

The grammar has a parse method too but currently the URI parse method calls the grammar parse method and then does a fair amount of its own
processing. That seems fine and I am not sure there is a reason to change it.

I don't think I understand all this, I was refering to unless $/ { die "Parse failed" } # very weak I think specifically

In my email yesterday I asked
about other ways to ask for feedback from Perl6 developers. Any
suggestions on appropriate ways to do that?

perl6 irc channel on freenode is the most usual way to contact Perl 6 people, at least that's what I feel. Feel free to drop in, that's where most of the devs reside.

Collaborator

ronaldxs replied Jun 22, 2011

Please sign in to comment.