diff --git a/README b/README
index 0f97224..f2e67bf 100644
--- a/README
+++ b/README
@@ -1,94 +1,399 @@
+NAME
+    HTML::Form - Class that represents an HTML form element
+
+SYNOPSIS
+     use HTML::Form;
+     $form = HTML::Form->parse($html, $base_uri);
+     $form->value(query => "Perl");
+
+     use LWP::UserAgent;
+     $ua = LWP::UserAgent->new;
+     $response = $ua->request($form->click);
+
+DESCRIPTION
+    Objects of the `HTML::Form' class represent a single HTML `<form> ...
+    </form>' instance. A form consists of a sequence of inputs that usually
+    have names, and which can take on various values. The state of a form
+    can be tweaked and it can then be asked to provide `HTTP::Request'
+    objects that can be passed to the request() method of `LWP::UserAgent'.
+
+    The following methods are available:
+
+    @forms = HTML::Form->parse( $html_document, $base_uri )
+    @forms = HTML::Form->parse( $html_document, base => $base_uri, %opt )
+    @forms = HTML::Form->parse( $response, %opt )
+        The parse() class method will parse an HTML document and build up
+        `HTML::Form' objects for each <form> element found. If called in
+        scalar context it only returns the first <form>. Returns an empty list
+        if there are no forms to be found.
+
+        The required arguments are the HTML document to parse
+        ($html_document) and the URI used to retrieve the document
+        ($base_uri). The base URI is needed to resolve relative action URIs.
+        The provided HTML document should be a Unicode string (or US-ASCII).
+
+        By default HTML::Form assumes that the original document was UTF-8
+        encoded and thus encodes forms that don't specify an explicit
+        *accept-charset* as UTF-8. The charset assumed can be overridden by
+        providing the `charset' option to parse(). It's a good idea to be
+        explicit about this parameter as well, thus the recommended simplest
+        invocation becomes:
+
+            my @forms = HTML::Form->parse(
+                Encode::decode($encoding, $html_document_bytes),
+                base => $base_uri,
+                charset => $encoding,
+            );
+
+        If the document was retrieved with LWP then the response object
+        provides methods to obtain a proper value for `base' and `charset':
+
+            my $ua = LWP::UserAgent->new;
+            my $response = $ua->get("http://www.example.com/form.html");
+            my @forms = HTML::Form->parse($response->decoded_content,
+                base => $response->base,
+                charset => $response->content_charset,
+            );
+
+        In fact, the parse() method can parse from an `HTTP::Response'
+        object directly, so the example above can be more conveniently
+        written as:
+
+            my $ua = LWP::UserAgent->new;
+            my $response = $ua->get("http://www.example.com/form.html");
+            my @forms = HTML::Form->parse($response);
+
+        Note that any object that implements decoded_content(), base() and
+        content_charset() methods with behaviour similar to `HTTP::Response'
+        will do.
+
+        Additional options may be passed in to control how the parse
+        method behaves. The following are all the options currently
+        recognized:
+
+        `base => $uri'
+            This is the URI used to retrieve the original document. This
+            option is not optional ;-)
+
+        `charset => $str'
+            Specify what charset the original document was encoded in. This
+            is used as the default for accept_charset. If not provided this
+            defaults to "UTF-8".
+
+        `verbose => $bool'
+            Warn (print messages to STDERR) about any bad HTML form
+            constructs found. You can trap these with $SIG{__WARN__}.
+
+        `strict => $bool'
+            Initialize any form objects with the given strict attribute.
+
+    $method = $form->method
+    $form->method( $new_method )
+        This method gets/sets the *method* name used for the
+        `HTTP::Request' generated. It is a string like "GET" or "POST".
+
+    $action = $form->action
+    $form->action( $new_action )
+        This method gets/sets the URI which we want to apply the request
+        *method* to.
+
+    $enctype = $form->enctype
+    $form->enctype( $new_enctype )
+        This method gets/sets the encoding type for the form data. It is a
+        string like "application/x-www-form-urlencoded" or
+        "multipart/form-data".
+
+    $accept = $form->accept_charset
+    $form->accept_charset( $new_accept )
+        This method gets/sets the list of charset encodings that the server
+        processing the form accepts. The current implementation supports only
+        one-element lists. The default value is "UNKNOWN", which we interpret
+        as a request to use the document charset as specified by the 'charset'
+        parameter of the parse() method.
+
+    $value = $form->attr( $name )
+    $form->attr( $name, $new_value )
+        This method gives access to the original HTML attributes of the
+        <form> tag. The $name should always be passed in lower case.
+
+        Example:
+
+           @f = HTML::Form->parse( $html, $foo );
+           @f = grep $_->attr("id") eq "foo", @f;
+           die "No form named 'foo' found" unless @f;
+           $foo = shift @f;
+
+    $bool = $form->strict
+    $form->strict( $bool )
+        Gets/sets the strict attribute of a form. If strict is turned on
+        the methods that change values of the form will croak if you try to
+        set illegal values or modify readonly fields. The default is not to
+        be strict.
+
+    @inputs = $form->inputs
+        This method returns the list of inputs in the form. If called in
+        scalar context it returns the number of inputs contained in the
+        form. See INPUTS for what methods are available for the input
+        objects returned.
+
+    $input = $form->find_input( $selector )
+    $input = $form->find_input( $selector, $type )
+    $input = $form->find_input( $selector, $type, $index )
+        This method is used to locate specific inputs within the form. All
+        inputs that match the arguments given are returned. In scalar
+        context only the first is returned, or `undef' if none match.
+
+        If $selector is specified, then the input's name, id, or class
+        attribute must match. A selector prefixed with '#' must match the id
+        attribute of the input. A selector prefixed with '.' matches the
+        class attribute. A selector prefixed with '^' or with no prefix
+        matches the name attribute.
+
+        If $type is specified, then the input must have the specified type.
+        The following type names are used: "text", "password", "hidden",
+        "textarea", "file", "image", "submit", "radio", "checkbox" and
+        "option".
+
+        The $index is the sequence number of the input matched where 1 is
+        the first. If combined with $selector and/or $type then it selects
+        the *n*th input with the given name and/or type.
+
+    $value = $form->value( $selector )
+    $form->value( $selector, $new_value )
+        The value() method can be used to get/set the value of some input.
+        If strict is enabled and no input has the indicated name, then this
+        method will croak.
+
+        If multiple inputs have the same name, only the first one will be
+        affected.
+
+        The call:
+
+            $form->value('foo')
+
+        is basically a short-hand for:
+
+            $form->find_input('foo')->value;
+
+    @names = $form->param
+    @values = $form->param( $name )
+    $form->param( $name, $value, ... )
+    $form->param( $name, \@values )
+        Alternative interface to examining and setting the values of the
+        form.
+
+        If called without arguments then it returns the names of all the
+        inputs in the form. The names will not repeat even if multiple
+        inputs have the same name. In scalar context the number of different
+        names is returned.
+
+        If called with a single argument then it returns the value or values
+        of inputs with the given name. If called in scalar context only the
+        first value is returned. If no input exists with the given name,
+        then `undef' is returned.
+
+        If called with 2 or more arguments then it will set values of the
+        named inputs. This form will croak if no inputs have the given name
+        or if any of the values provided does not fit. Values can also be
+        provided as a reference to an array. This form will allow unsetting
+        all values with the given name as well.
+
+        This interface resembles that of the param() function of the CGI
+        module.
+
+    $form->try_others( \&callback )
+        This method will iterate over all permutations of unvisited
+        enumerated values ( *elements* in the HTML document. An input object
+    basically represents a name/value pair, so when multiple HTML elements
+    contribute to the same name/value pair in the submitted form they are
+    combined.
+
+    The input elements that are mapped one-to-one are "text", "textarea",
+    "password", "hidden", "file", "image", "submit" and "checkbox". For the
+    "radio" and "option" inputs the story is not as simple: All
+    <input type="radio"> elements with the same name will contribute to the same
+    input radio object. The number of radio input objects will be the same
+    as the number of distinct names used for the <input type="radio">
+    elements. For a
+    element there will be one input object for each contained