
Remove outdated docs (not touched since 1996 :-)

1 parent ee9b279 commit 34ae651d8a729d6b41e568d939665c16c4626573 @gisle committed May 2, 2011
Showing with 0 additions and 782 deletions.
  1. +0 −387 doc/examples.html
  2. +0 −45 doc/features.html
  3. +0 −36 doc/install.html
  4. +0 −314 doc/norobots.html
387 doc/examples.html
@@ -1,387 +0,0 @@
-<title>libwww-perl-5</title>
-
-<h1 align=center>LIBWWW-PERL-5</h1>
-
-<h2>Introduction</h2>
-
-<p>The libwww-perl-5 library is a collection of perl modules that
-provide a simple and consistent programming interface to the
-World-Wide Web. The library also contains modules that are of more
-general use.
-
-<p>This article gives an introduction to the library and ...
-
-<p>The main focus of the library is to provide functions that allow
-you to write WWW clients, thus libwww-perl is often presented as a WWW
-client library. The main features of the library are:
-
-<ul>
-
- <li> Contains various reusable components (modules) that can be
- used separately.
-
- <li> Provides an object oriented model of HTTP-style communication.
- Within this framework we currently support access to
-
- http,
- gopher,
- ftp,
- file, and
- mailto
-
- resources.
-
- <li> Supports basic authorization
-
- <li> Transparent redirect handling
-
- <li> Proxy support
-
- <li> URL handling (both absolute &amp; relative)
-
- <li> RobotRules (a parser for robots.txt files)
-
- <li> MailCap handling
-
- <li> HTML parser and formatter (PS and plain text)
-
- <li> The library can be used through the full object oriented interface
- or through a very simple procedural interface.
-
- <li> A simple command line client application that is called
- <em>request</em>.
-
- <li> The library can cooperate with Tk.
- A simple Tk-based GUI browser is distributed with the Tk
- extension for perl.
-</ul>
-
-
-
-
-<h2>HTTP style communication</h2>
-
-The libwww-perl library is based on HTTP style communication. What
-does that mean? This is a quote from the <a
-href="http://www.w3.org/pub/WWW/Protocols/">HTTP specification</a>
-document:
-
-<blockquote>
-<p>The HTTP protocol is based on a request/response paradigm. A client
-establishes a connection with a server and sends a request to the
-server in the form of a request method, URI, and protocol version,
-followed by a MIME-like message containing request modifiers, client
-information, and possible body content. The server responds with a
-status line, including the message's protocol version and a success or
-error code, followed by a MIME-like message containing server
-information, entity metainformation, and possible body content.
-</blockquote>
-
-<p>What this means to libwww-perl is that communication always takes
-place by creating and configuring a <em>request</em> object. This
-object is then passed to a server and we get a <em>response</em>
-object in return that we can examine. The same simple model is used
-for any kind of service we want to access.
-
-<p>If we want to fetch a document from a remote file server we send it
-a request that contains a name for that document and the response
-contains the document itself. If we want to send a mail message to
-somebody then we send the request object which contains our message to
-the mail server and the response object will contain an acknowledgment
-that tells us that the message has been accepted and will be forwarded
-to the recipients.
-
-<p>It is as simple as that!
-
-<h3>Request object</h3>
-
-The request object has the class name <em>HTTP::Request</em> in
-libwww-perl. The fact that the class name uses <em>HTTP::</em> as a
-name prefix only implies that we use this model of communication. It
-does not limit the kind of services we can try to send this
-<em>request</em> to.
-
-The main attributes of <em>HTTP::Request</em> objects are:
-
-<ul>
-
- <li> The <b>method</b> is a short string that tells what kind of
- request this is. The most common methods are <em>GET</em>,
- <em>POST</em> and <em>HEAD</em>.
-
- <li> The <b>url</b> is a string denoting the protocol, server and
- the name of the "document" we want to access. The url might
- also encode various other parameters. This is the name of the
- resource we want to access.
-
- <li> The <b>headers</b> contain additional information about the
- request and can also be used to describe the content. The headers
- are a set of keyword/value pairs.
-
- <li> The <b>content</b> is an arbitrary amount of binary data.
-
-</ul>
-
-
-
-<h3>Response object</h3>
-
-The response object has the class name <em>HTTP::Response</em> in
-libwww-perl. The main attributes of objects of this class are:
-
-<ul>
- <li> The <b>code</b> is a numerical value that encodes the overall
- outcome of the request.
-
- <li> The <b>message</b> is a short (human readable) string that
- corresponds to the <em>code</em>.
-
- <li> The <b>headers</b> contain additional information about the
- response and they describe the content.
-
- <li> The <b>content</b> is an arbitrary amount of binary data.
-
-</ul>
-
-Since we don't want to handle the <em>code</em> directly in our
-programs the libwww-perl response object has methods that can be used
-to query the kind of code present:
-
-<ul>
-
- <li> <b>isSuccess</b>
- <li> <b>isRedirect</b>
- <li> <b>isError</b>
-
-</ul>
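These predicates correspond to the standard HTTP status-code classes (2xx success, 3xx redirection, 4xx/5xx error). A minimal sketch of that mapping, given here in Python for illustration only (the library itself is Perl, and the function names below are mine, not LWP's):

```python
# Illustrative only: how isSuccess/isRedirect/isError style predicates
# map onto HTTP status-code ranges. Names are hypothetical, not LWP's.

def is_success(code: int) -> bool:
    """2xx: the request was received, understood, and accepted."""
    return 200 <= code < 300

def is_redirect(code: int) -> bool:
    """3xx: further action (e.g. following a Location header) is needed."""
    return 300 <= code < 400

def is_error(code: int) -> bool:
    """4xx and 5xx: client-side or server-side failure."""
    return 400 <= code < 600

print(is_success(200), is_redirect(302), is_error(404), is_error(500))
# → True True True True
```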
-
-
-<h3>User Agent</h3>
-
-Ok, I have created this nice <em>request</em> object. What do I do
-with it?
-
-<p>The answer is that you pass it on to the <em>user agent</em> object
-and it will take care of all the things that need to be done
-(low-level communication and error handling) and the user agent will
-give you back a <em>response</em> object. The user agent represents
-your application on the network and it provides you with an interface
-that can accept <em>requests</em> and will return <em>responses</em>.
-
-<p><i>There should be a nice figure here explaining this. It should
-show the UA as an interface layer between the application code and the
-network.</i>
-
-<p>The libwww-perl class name for the user agent is
-<em>LWP::UserAgent</em>. Every libwww-perl application that wants to
-communicate should create at least one object of this kind. The main
-method provided by this object is <em>request()</em>. This method
-takes an <em>HTTP::Request</em> object as argument and will return a
-<em>HTTP::Response</em> object.
-
-<p>The <em>LWP::UserAgent</em> has many other attributes that let you
-configure how it will interact with the network and with your
-application code.
-
-<ul>
-
- <li> The <b>timeout</b> specifies how much time we give remote servers
- to create responses before the library creates an internal
- <em>timeout</em> response.
- <li> The <b>agent</b> specifies the name that your application should
- present itself as on the network.
- <li> The <b>useAlarm</b> specifies if it is ok for the user agent to
- use the alarm(3) system call to implement timeouts.
- <li> The <b>useEval</b> specifies if the agent should raise an
- exception (<em>die</em> in perl) if an error condition occurs.
-
- <li> The <b>proxy</b> and <b>noProxy</b> specify when communication
- should go through a <a
- href="http://www.w3.org/pub/WWW/Proxies/">proxy server</a>.
-
- <li> The <b>credentials</b> provide a way to set up usernames and
- passwords that are needed to access certain services.
-
-</ul>
-
-<p>Many applications want even more control over how they
-interact with the network, and they get this by subclassing
-<em>LWP::UserAgent</em>.
-
-<!-- I don't want to describe these!!!
-<ul>
-
- <li> simpleRequest()
- <li> redirectOK()
- <li> credentials()
- <li> getBasicCredentials()
- <li> mirror
-
-</ul>
--->
-
-<h1>Examples</h1>
-
-Let's turn to a few examples to illustrate the concepts described
-above. You should be able to run these examples directly, provided that
-you have both perl and libwww-perl <a
-href="install.html">installed</a> on your system. If you store the
-examples in files you might want to change the first line (#!....) to
-reflect the location of the perl interpreter on your system.
-
-<a name="ex1"><h3>Example 1</h3></a>
-<hr>
-<pre>
-#!/local/bin/perl -w
-
-require LWP;
-
-$ua = new LWP::UserAgent;
-
-$request = new HTTP::Request 'GET', 'http://www.perl.com/perl/';
-$request->header('Accept', 'text/html');
-
-$response = $ua->request($request);
-
-if ($response->isSuccess) {
- die "This is bad" if $response->header('Content-Type') ne 'text/html';
- print $response->content;
-} else {
- die "Request failed: " . $response->code . " " . $response->message . "\n";
-}
-
-</pre>
-<hr>
-
-This example shows a simple application that fetches an HTML document
-with the name <a
-href="http://www.perl.com/perl/">http://www.perl.com/perl/</a> from
-the network and then prints it out (without reformatting). What is
-going on is the following:
-
-<ul>
-
- <li> First the statement <em>"require LWP;"</em> is needed to make the
- libwww-perl classes available to the application.
-
- <li> The next thing that happens is that we create a user agent
- object and assign the reference to this object to the variable
- <em>$ua</em>.
-
- <li> Then we create a request object and assign it to the
- <em>$request</em> variable. The request object is initialized
- with the method <em>GET</em> and the URL <em>http://www....</em>.
-
- <li> The next thing that happens is that we configure the request by
- adding an <em>Accept</em> header to it. This header informs the
- server serving this request that we want an HTML document back.
-
- <li> Then we hand the request object over to the user agent and we
- receive a response object in return. We assign the response
- object to the $response variable.
-
- <li> Then we check the response to see that it really was a
- successful response and, if it was, we print the content (i.e. the
- document) that comes with the response.
-
- <li> If it was not a successful response we print an error message
- and die. <em>You might want to try to change the URL so that
- you get an unsuccessful response back!</em>
-
-</ul>
-
-Was this complicated for something as simple as retrieving a
-file from a network server? Let's take a look at how we can make the
-same thing much simpler.
-
-<a name="ex2"><h3>Example 2</h3></a>
-
-<hr>
-<pre>
-#!/local/bin/perl -w
-use LWP::Simple;
-getprint 'http://www.perl.com/perl/';
-</pre>
-<hr>
-
-In this example we have used a module called <em>LWP::Simple</em>.
-This two-line program essentially does the same as the code in <a
-href="#ex1">example 1</a>. The <em>LWP::Simple</em> module provides a
-very simplified procedural interface to the libwww-perl library. After
-you have executed the <em>use LWP::Simple;</em> statement you have
-access to the following routines:
-
-<ul>
- <li> <em>get($url)</em><br> Takes a URL as an argument and returns the
- content. Returns <em>undef</em> if an error occurred.
-
- <li> <em>getprint($url)</em>
- <li> <em>head($url)</em> --&gt;
- ($content_type,
- $document_size,
- $modified_time, $expires, $server)
- <li> <em>mirror($url, $file)</em>
-
-
-</ul>
-
-<p>The LWP::Simple module is also suitable for direct invocation from
-the command line. The following command is equivalent to the script
-above:
-
-<pre>
- perl -MLWP::Simple -e "getprint 'http://www.perl.com/perl/'"
-</pre>
-
-<p>Let's write a more "complete" web browser...
-<hr>
-<pre>
-#!/local/bin/perl -w
-use LWP::Simple;
-getprint shift || die "Usage: $0 &lt;url&gt;\n";
-</pre>
-<hr>
-
-
-<a name="ex3"><h3>Example 3</h3></a>
-
-Process data as it arrives from the network. Use a callback routine.
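The original leaves this example as a stub. As a hedged sketch of the callback idea only, in Python rather than Perl (LWP itself accepts a subroutine reference as an extra argument to <em>request()</em> so that content can be handed over piecemeal): read the incoming stream in chunks and invoke a callback on each chunk as it arrives.

```python
# Sketch only: chunk-at-a-time processing driven by a callback,
# simulated here with an in-memory stream instead of a network socket.
import io

def process_stream(stream, callback, chunk_size=4096):
    """Read the stream and invoke callback once per chunk as it arrives."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        callback(chunk)

chunks = []
process_stream(io.BytesIO(b"x" * 10000), chunks.append)
print([len(c) for c in chunks])  # → [4096, 4096, 1808]
```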
-
-<h3>Example 4</h3>
-
-Let's try to reformat the document using the HTML formatter.
-
-<p>Let's write an even more "complete" web browser...
-<hr>
-<pre>
-#!/local/bin/perl -w
-use LWP::Simple;
-use HTML::Parse;
-print parse_html(get shift || die "Usage : $0 &lt;url&gt;\n")->format;
-</pre>
-<hr>
-
-<p>Invoke a viewer for the document (MailCap)
-
-<h3>Example 5</h3>
-
-<ul>
-
- <li> A robot
-
- <li> Postscript output using the font metrics modules
-
- <li> Base64/Quoted printable
-
- <li> URL handling
-
- <li> HTTP headers
-
-</ul>
-
-
-
-
-
-
-
45 doc/features.html
@@ -1,45 +0,0 @@
-<title>libwww-perl-5 features</title>
-
-<h1>Features</h1>
-
-<ul>
-
- <li> Provides an OO model of HTTP style communication. Within this
- framework we support:
-
- <ul>
- <li> http
- <li> gopher
- <li> ftp
- <li> file
- <li> mailto
- </ul>
-
- <li> Contains several reusable components that can be used
- separately.
-
- <li> Supports basic authorization
-
- <li> Transparent redirect handling
-
- <li> Proxy support
-
- <li> URL handling (both absolute &amp; relative)
-
- <li> RobotRules (a parser for robots.txt files)
-
- <li> MailCap handling
-
- <li> HTML parser and formatter (PS and plain text)
-
- <li> The library provides both an OO interface and a simple
- procedural interface.
-
- <li> A simple command line client application that is called
- <em>request</em>.
-
- <li> The library can cooperate with Tk.
- A simple Tk-based GUI browser is distributed with the Tk
- extension for perl.
-</ul>
-
36 doc/install.html
@@ -1,36 +0,0 @@
-<title>Installation of libwww-perl-5</title>
-
-<h1>How do you install libwww-perl-5?</h1>
-
-<ol>
-
- <li> First you have to obtain a copy of the <em>libwww-perl-5</em>
- package. It is distributed as a gzipped tar file which you must
- unpack at some suitable location in your file system.
- The most recent version of the library should be available at &lt;URL:<a
- href="http://www.sn.no/libwww-perl/">http://www.sn.no/libwww-perl/</a>&gt;
- as well as on the <a
- href="http://www.perl.com/perl/CPAN/CPAN.html">CPAN</a>.
-
- <li> Read the <em>README</em> file that comes with the package. It
- should mention which version of perl is required.
-
- <li> Check your perl version. Try to run the command <em>"perl -v"</em>. It should
- print a message like this <em>"This is perl, version
- 5.002"</em>. If it prints <em>"perl: command not found"</em> then you
- need to install perl before you can proceed :-)
-
- <li> Run the following commands (the '$' is the shell prompt and might
- be different on your system and the <em>xx</em> will depend on
- the version of the library that you are installing):
-<pre>
- $ cd libwww-perl-5.xx
- $ perl Makefile.PL
- $ make
- $ make test
- $ make install
-</pre>
-
- <li> If it looks like these steps succeeded then that's it!
-
-</ol>
314 doc/norobots.html
@@ -1,314 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
-<html>
-<head>
-<title>A Standard for Robot Exclusion</title>
-</head>
-<body>
-
-<h1>A Standard for Robot Exclusion</h1>
-
-Table of contents:
-
-<ul>
-<li>
-
-<a href="#status">
-Status of this document
-</a>
-
-<li>
-
-<a href="#introduction">
-Introduction
-</a>
-
-<li>
-
-<a href="#method">
-Method
-</a>
-
-<li>
-
-<a href="#format">
-Format
-</a>
-
-<li>
-
-<a href="#examples">
-Examples
-</a>
-
-<li>
-
-<a href="#code">
-Example Code
-</a>
-
-<li>
-
-<a href="#author">
-Author's Address
-</a>
-
-</ul>
-<hr>
-
-<h2><a name="status">Status of this document</a></h2>
-
-This document represents a consensus on 30 June 1994 on the robots
-mailing list (robots-request@nexor.co.uk), between the majority of
-robot authors and other people with an interest in robots. It has
-also been open for discussion on the Technical World Wide Web
-mailing list (www-talk@info.cern.ch). This document is based on a
-previous working draft under the same title.
-
-<p>
-
-It is not an official standard backed by a standards body,
-or owned by any commercial organisation.
-
-It is not enforced by anybody, and there is no guarantee that
-all current and future robots will use it.
-
-Consider it a common facility the majority of robot authors
-offer the WWW community to protect WWW servers against
-unwanted accesses by their robots.</p>
-
-<p>
-
-The latest version of this document can be found on
-<a href="http://web.nexor.co.uk/mak/doc/robots/norobots.html">
-http://web.nexor.co.uk/mak/doc/robots/norobots.html</a>.</p>
-
-<hr>
-
-<h2><a name="introduction">Introduction</a></h2>
-
-WWW Robots (also called wanderers or spiders) are programs
-that traverse many pages in the World Wide Web by
-recursively retrieving linked pages. For more information
-see <a href="robots.html">the robots page</a>.
-
-<p>
-
-In 1993 and 1994 there were occasions when robots
-visited WWW servers where they weren't welcome for
-various reasons. Sometimes these reasons were robot specific,
-e.g. certain robots swamped servers with rapid-fire
-requests, or retrieved the same files repeatedly.
-In other situations robots traversed parts of WWW servers
-that weren't suitable, e.g. very deep virtual trees,
-duplicated information, temporary information, or
-cgi-scripts with side-effects (such as voting).</p>
-
-<p>
-
-These incidents indicated the need for established
-mechanisms for WWW servers to indicate to robots which parts
-of their server should not be accessed. This standard
-addresses this need with an operational solution.</p>
-
-<hr>
-
-<h2><a name="method">The Method</a></h2>
-
-The method used to exclude robots from a server is to
-create a file on the server which specifies an access
-policy for robots.
-
-This file must be accessible via HTTP on the local URL
-"<code>/robots.txt</code>".
-The contents of this file are specified <a href="#format">below</a>.
-
-<p>
-
-This approach was chosen because it can be easily
-implemented on any existing WWW server, and a robot can find
-the access policy with only a single document retrieval.</p>
-
-<p>
-
-A possible drawback of this single-file approach is that only a
-server administrator can maintain such a list, not the
-individual document maintainers on the server. This can be
-resolved by a local process to construct the single file
-from a number of others, but if, or how, this is done is
-outside of the scope of this document.</p>
-
-<p>
-
-The choice of the URL was motivated by several criteria:</p>
-
-<ul>
-<li>
-
-The filename should fit in file naming restrictions of all
-common operating systems.
-
-<li>
-
-The filename extension should not require extra server
-configuration.
-
-<li>
-
-The filename should indicate the purpose of the file
-and be easy to remember.
-
-<li>
-
-The likelihood of a clash with existing files should
-be minimal.
-
-</ul>
-<hr>
-
-<h2><a name="format">The Format</a></h2>
-
-The format and semantics of the "<code>/robots.txt</code>" file
-are as follows:
-
-<p>
-
-The file consists of one or more records separated by one or
-more blank lines (terminated by CR,CR/NL, or NL). Each
-record contains lines of the form
-"<code>&lt;field&gt;:&lt;optionalspace&gt;&lt;value&gt;&lt;optionalspace&gt;</code>".
-The field name is case insensitive.</p>
-
-<p>
-
-Comments can be included in the file using UNIX Bourne shell
-conventions: the '<code>#</code>' character is used to
-indicate that the preceding space (if any) and the remainder of
-the line up to the line termination is discarded.
-Lines containing only a comment are discarded completely,
-and therefore do not indicate a record boundary.</p>
-
-<p>
-The record starts with one or more <code>User-agent</code>
-lines, followed by one or more <code>Disallow</code> lines,
-as detailed below. Unrecognised headers are ignored.</p>
-
-<dl>
-<dt>User-agent</dt>
-<dd>
-
-The value of this field is the name of the robot the
-record is describing access policy for.
-
-<p>
-If more than one User-agent field is present the record
-describes an identical access policy for more
-than one robot. At least one field needs to be present
-per record.</p>
-
-<p>
-The robot should be liberal in interpreting this field.
-A case insensitive substring match of the name without
-version information is recommended.</p>
-
-<p>
-
-If the value is '<code>*</code>', the record describes
-the default access policy for any robot that has not
-matched any of the other records. It is not allowed to
-have two such records in the "<code>/robots.txt</code>"
-file.</p></dd>
-
-<dt>Disallow</dt>
-<dd>
-
-The value of this field specifies a partial URL that is not
-to be visited. This can be a full path, or a partial
-path; any URL that starts with this value will not be
-retrieved. For example, <code>Disallow: /help</code>
-disallows both <code>/help.html</code> and
-<code>/help/index.html</code>, whereas
-<code>Disallow: /help/</code> would disallow
-<code>/help/index.html</code>
-but allow <code>/help.html</code>.
-
-<p>
-
-An empty value indicates that all URLs can be
-retrieved. At least one Disallow field needs to
-be present in a record.</p></dd>
-
-</dl>
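The parsing rules above (comment stripping, blank-line record boundaries, case-insensitive field names) can be sketched as follows. This is an illustration in Python, not the Perl example code this document refers to, and the function name is mine:

```python
# Illustrative parser for the record format described above.
# Comment-only lines are discarded and do NOT end a record;
# truly blank lines do. Field names are case insensitive.

def parse_robots(text):
    records, agents, disallows = [], [], []

    def flush():
        nonlocal agents, disallows
        if agents or disallows:
            records.append((agents, disallows))
            agents, disallows = [], []

    for raw in text.splitlines():
        line = raw.split('#', 1)[0].rstrip()   # drop comment and trailing space
        if not line.strip():
            if '#' in raw:
                continue                       # comment-only line: no boundary
            flush()                            # blank line ends the record
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            agents.append(value)
        elif field == 'disallow':
            disallows.append(value)            # empty value = everything allowed
        # unrecognised fields are ignored
    flush()
    return records

text = """\
# robots.txt for http://www.site.com/

User-agent: *
Disallow: /cyberworld/map/  # infinite virtual URL space
Disallow: /tmp/
"""
print(parse_robots(text))  # → [(['*'], ['/cyberworld/map/', '/tmp/'])]
```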
-
-The presence of an empty "<code>/robots.txt</code>" file
-has no explicit associated semantics; it will be treated
-as if it were not present, i.e. all robots will consider
-themselves welcome.
-
-<hr>
-
-<h2><a name="examples">Examples</a></h2>
-
-The following example "<code>/robots.txt</code>" file specifies
-that no robots should visit any URL starting with
-"<code>/cyberworld/map/</code>" or
-"<code>/tmp/</code>":
-
-<hr>
-<pre>
-# robots.txt for http://www.site.com/
-
-User-agent: *
-Disallow: /cyberworld/map/ # This is an infinite virtual URL space
-Disallow: /tmp/ # these will soon disappear
-</pre>
-<hr>
-
-This example "<code>/robots.txt</code>" file specifies
-that no robots should visit any URL starting with
-"<code>/cyberworld/map/</code>", except the robot called
-"<code>cybermapper</code>":
-
-<hr>
-<pre>
-# robots.txt for http://www.site.com/
-
-User-agent: *
-Disallow: /cyberworld/map/ # This is an infinite virtual URL space
-
-# Cybermapper knows where to go.
-User-agent: cybermapper
-Disallow:
-</pre>
-<hr>
-
-This example indicates that no robots should visit
-this site further:
-
-<hr>
-<pre>
-# go away
-User-agent: *
-Disallow: /
-</pre>
-<hr>
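The <code>Disallow</code> rule behind these examples is plain prefix matching on the URL path. A minimal illustration in Python (the function name is hypothetical, not from any library), using the document's own <code>/help</code> example:

```python
# Sketch of the prefix-match semantics of Disallow: a path is excluded
# if it starts with any non-empty Disallow value; empty values allow all.

def is_disallowed(path, disallows):
    return any(d and path.startswith(d) for d in disallows)

# From the Format section: "Disallow: /help" blocks /help.html and
# /help/index.html, while "Disallow: /help/" blocks only the latter.
print(is_disallowed("/help.html", ["/help"]))         # → True
print(is_disallowed("/help.html", ["/help/"]))        # → False
print(is_disallowed("/help/index.html", ["/help/"]))  # → True
print(is_disallowed("/anything", [""]))               # → False (empty = allow all)
```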
-
-<h2><a name="code">Example Code</a></h2>
-
-Although it is not part of this specification, some example code
-in Perl is available in <a href="norobots.pl">norobots.pl</a>. It
-is a bit more flexible in its parsing than this document
-specifies, and is provided as-is, without warranty.
-
-<h2><a name="author">Author's Address</a></h2>
-
-<address>
-<a href="/mak/mak.html">Martijn Koster</a>
-&lt;m.koster@webcrawler.com&gt;<br>
-NEXOR<br>
-PO Box 132, <br>
-Nottingham,<br>
-The United Kingdom<br>
-Phone: +44 602 520576
-</address>
-</body>
-</html>
