Commit
[docs] updating citations and more #72
proycon committed Nov 19, 2018
1 parent 90cc814 commit 486104e
Showing 5 changed files with 137 additions and 2,381 deletions.
73 changes: 63 additions & 10 deletions docs/api/source/index.rst
@@ -23,13 +23,13 @@ pre-installed for the webservice. The NLP application is usually expected to pro
subsequently made available through the webservice for viewing and downloading. CLAM can, however, just as well be used
in fields other than NLP.

The CLAM webservice is a RESTful webservice:raw-latex:`\citep{REST}`, meaning it uses the HTTP verbs GET, POST, PUT and
The CLAM webservice is a RESTful webservice [Fielding2000]_, meaning it uses the HTTP verbs GET, POST, PUT and
DELETE to manipulate resources and returns responses using the HTTP response codes and XML. The principal resource in
CLAM is called a *project*. Various users can maintain various projects, each representing one specific run of the
system, with particular input data, output data, and a set of configured parameters. The projects and all data are stored
on the server.
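
To make the verb-to-resource mapping concrete, the following is a minimal Python sketch that builds, but does not send, the kinds of requests a client might issue against a single project. It assumes a service running at a hypothetical ``BASE_URL``, with each project living at its own URL directly under the service root, and it assumes that PUT creates a project, POST starts it, GET retrieves its status and DELETE removes it; consult the REST API documentation for the authoritative list of endpoints. The ``lang`` parameter is invented for illustration.

```python
from urllib.parse import quote, urlencode, urljoin
from urllib.request import Request

# Hypothetical location of a running CLAM service.
BASE_URL = "http://localhost:8080/"


def project_request(project, verb, params=None):
    """Build, but do not send, an HTTP request acting on one project resource."""
    url = urljoin(BASE_URL, quote(project) + "/")
    # Parameters, if any, travel as a form-encoded request body.
    data = urlencode(params).encode("ascii") if params is not None else None
    return Request(url, data=data, method=verb)


create = project_request("myproject", "PUT")
start = project_request("myproject", "POST", {"lang": "nl"})  # hypothetical parameter
status = project_request("myproject", "GET")
delete = project_request("myproject", "DELETE")
```

Each ``Request`` object can be sent with ``urllib.request.urlopen()`` once a service is actually running; authentication is left out of this sketch.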

The webservice responds in the CLAM XML format. An associated XSL stylesheet :raw-latex:`\citep{XSLT}` can directly
The webservice responds in the CLAM XML format. An associated XSL stylesheet [XSLT]_ can directly
transform this to XHTML in the user’s browser, thus providing a standalone web application for human end-users.

The most notable features of CLAM are:
@@ -77,11 +77,14 @@ Gompel (2014). CLAM: Computational Linguistics Application Mediator. Documentati
CLAM is open-source software licensed under the GNU General Public License v3, a copy of which can be found along with the
software.

.. [Fielding2000] R. T. Fielding (2000). Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine. `(HTML) <http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm>`_
.. [XSLT] J. Clark (1999). XSL Transformations (XSLT) Version 1.0. W3C Recommendation. `(HTML) <http://www.w3.org/TR/xslt>`_

Technical details
-----------------

CLAM is written in Python :raw-latex:`\citep{PYTHON}`, and is built on
the Flask framework. [1]_ It can run stand-alone thanks to the built-in
CLAM is written in Python [python]_, and is built on
the Flask framework [flask]_. It can run stand-alone thanks to the built-in
webserver; no additional webserver is needed to test your service. In
production environments, however, it is strongly recommended to
integrate CLAM into a real webserver. Supported are: Apache, nginx or
@@ -91,6 +94,9 @@ The software is designed for Unix-based systems (e.g. Linux or BSD)
only. It has also been verified to run on Mac OS X. Windows is
not supported.

.. [python] Python Software Foundation. Python Language Reference. Available at https://www.python.org
.. [flask] Flask web framework. Available at http://flask.pocoo.org

Intended Audience
-----------------

@@ -112,25 +118,72 @@ clients to communicate with the aforementioned webservice.
This documentation is not intended for end users using only the web
application interface.

Architecture
--------------

CLAM has a layered architecture, with the command-line
application(s) you want to turn into a webservice at its core. The
application itself can remain untouched and unaware of CLAM. The
figure below illustrates the various layers. The workflow interface
layer is neither provided by CLAM nor necessary, but shows a possible
use case.

.. figure:: architecture.png
   :alt: The CLAM Architecture
   :name: fig:arch
   :width: 130mm

   The CLAM Architecture

CLAM presents two different paradigms for wrapping your script or
application. The second was introduced in CLAM 0.9.11. You may
use either or both at the same time.

#. **Project Paradigm** – Users create projects, upload files with
   optional parameters to those projects, and subsequently start the
   project, optionally passing global parameters to the system. The
   system may run for a long time and may do batch-processing on
   multiple input files.

#. `Action Paradigm <#sec:actions>`_ – This is a more limited and simpler
   remote-procedure call mechanism. Users interact in real time with the
   service on specific URLs, passing parameters. Unlike the project
   paradigm, this is not suitable for complex operations on big data.

A CLAM webservice needs the following three components from the service
developer:

#. A `service configuration <#sec:serviceconfiguration>`_;

#. A `wrapper script <#sec:wrapperscript>`_ for your command line application;

#. A command line application (your NLP tool).

The wrapper script is not strictly mandatory if the command line
application can be directly invoked by CLAM. However, for more complex
applications, writing a wrapper script is recommended, as it offers more
flexibility and better integration, and allows you to keep the actual
application unmodified. The wrapper scripts can be seen as the “glue”
between CLAM and your application, taking care of any translation steps.
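
To make the “glue” idea concrete, here is a generic, hypothetical wrapper sketch in plain Python. It does not use CLAM's own wrapper API; it only assumes that the wrapper receives an input directory, an output directory and the command to run, and that the wrapped tool takes a file path as its last argument and writes its result to standard output. The function name and the ``.out`` extension are illustrative choices.

```python
import subprocess
import sys
from pathlib import Path


def run_tool_on_inputs(inputdir, outputdir, command):
    """Run ``command`` once per input file and save its stdout as the output file.

    ``command`` is a list such as ["mytool", "--flag"]; the input path is
    appended as the last argument. Returns the list of output paths written.
    """
    outputdir = Path(outputdir)
    outputdir.mkdir(parents=True, exist_ok=True)
    written = []
    for inputfile in sorted(Path(inputdir).iterdir()):
        if not inputfile.is_file():
            continue
        # Translation step: map the service's file layout onto the tool's own CLI.
        result = subprocess.run(command + [str(inputfile)],
                                capture_output=True, text=True, check=True)
        outputfile = outputdir / (inputfile.stem + ".out")
        outputfile.write_text(result.stdout)
        written.append(outputfile)
    return written


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Hypothetical invocation: wrapper.py INPUTDIR OUTPUTDIR TOOL [TOOLARGS...]
    run_tool_on_inputs(sys.argv[1], sys.argv[2], sys.argv[3:])
```

Because the tool is only ever called through ``subprocess``, it stays unmodified; all knowledge about directories and file naming lives in the wrapper.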

Note that wrapper scripts in the action paradigm are more constrained,
and there may be multiple wrapper scripts for different actions.


Table of Contents
-----------------------

.. toctree::
   :maxdepth: 3
   :glob:

   installation
   gettingstarted
   serviceconfiguration
   wrapperscript
   deployment
   client
   troubleshooting
   restapi
   *






Running a test webservice
8 changes: 3 additions & 5 deletions docs/api/source/installation.rst
@@ -50,13 +50,11 @@ an interactive Python interpreter and writing: ``import clam``\ ”
LaMachine: a meta-distribution with CLAM
---------------------------------------------

We also offer *LaMachine*, an environment with CLAM and various CLAM
We also offer `LaMachine <https://proycon.github.io/LaMachine>`_, an environment with CLAM and various CLAM
webservices pre-installed, along with a lot of other NLP software. It is
available as a virtual machine, a Docker container, or a virtual
environment through a native installation script. It is designed to
facilitate installation of our software. See
https://github.com/proycon/lamachine for details.

facilitate installation of our software.

Installation Details
-------------------------
@@ -116,7 +114,7 @@ Apache and nginx.
Source Code Repository
---------------------------

The CLAM source code is hosted on `github <https://github.com/proycon/clam>`_.
The CLAM source code is hosted on `GitHub <https://github.com/proycon/clam>`_.

If you want to work with the latest development release of CLAM rather than the latest stable version, you can clone this git
repository as follows:
46 changes: 25 additions & 21 deletions docs/api/source/serviceconfiguration.rst
@@ -99,8 +99,8 @@ User Authentication

Being a RESTful webservice, user authentication proceeds over HTTP
itself. CLAM implements HTTP Basic Authentication, HTTP Digest
Authentication :raw-latex:`\cite{HTTPAUTH}` and OAuth2
:raw-latex:`\cite{OAUTH2}`. HTTP Digest Authentication, contrary to HTTP
Authentication [Franks1999]_ and OAuth2
[Hardt2012]_. HTTP Digest Authentication, unlike HTTP
Basic Authentication, computes a hash of the username and password
client-side and transmits that hash, rather than a plaintext password.
User passwords are therefore only available to CLAM in hashed form and
@@ -172,7 +172,11 @@ common use would be to define one user to be the guest user, for instance the us

In production environments, you will also want to set ``SECRET_KEY`` to
a string value that is kept strictly private. It is used for
cryptographically signing session data and preventing CSRF attacks. [3]_
cryptographically signing session data and preventing CSRF attacks (`details <http://flask.pocoo.org/docs/0.10/quickstart/#sessions>`_).
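
One way to follow this advice is sketched below, under the assumption that the service configuration is a Python module, as in the configuration examples in this documentation: generate the key once, then read it from the environment at startup. The variable name ``CLAM_SECRET_KEY`` and both helper functions are hypothetical and not part of CLAM.

```python
import os
import secrets


def generate_secret_key(nbytes=32):
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(nbytes)


def load_secret_key(environ=os.environ, name="CLAM_SECRET_KEY"):
    """Fetch the signing key from the environment, refusing to run without one."""
    key = environ.get(name)
    if not key:
        raise RuntimeError(f"set {name} to a private, random string")
    return key
```

Run ``generate_secret_key()`` once, store the result in the deployment environment, and then write ``SECRET_KEY = load_secret_key()`` in the service configuration. Failing loudly is deliberate: falling back to a fresh random key on every start would invalidate all sessions on restart, and each worker process of a multi-process deployment would sign sessions with a different key.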

.. [Franks1999] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen and L. Stewart (1999). HTTP Authentication: Basic and Digest Access Authentication (RFC 2617). The Internet Engineering Task Force (IETF). `(HTML) <http://tools.ietf.org/html/rfc2617>`_
.. [Hardt2012] D. Hardt (2012). The OAuth 2.0 Authorization Framework (RFC 6749). The Internet Engineering Task Force (IETF). `(Text) <http://www.rfc-editor.org/rfc/rfc6749.txt>`_

MySQL backend
~~~~~~~~~~~~~~~~~~~~~
@@ -260,7 +264,7 @@ this variable name is just an example in a CLARIN-NL context.
OAuth2
~~~~~~~~~

CLAM also implements OAuth2 :raw-latex:`\cite{OAUTH2}`, i.e. it acts as
CLAM also implements OAuth2 [Hardt2012]_, i.e. it acts as
a client in the OAuth2 Authorization framework. An external OAuth2
authorization provider is responsible for authenticating you, using your
user credentials to which CLAM itself will never have access. Many
@@ -282,8 +286,8 @@ used with the built-in webserver but requires being embedded in a
webserver such as Apache2, with SSL support.
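
For readers unfamiliar with OAuth2, the sketch below illustrates the kind of redirect a client issues in the authorization-code flow of RFC 6749 (section 4.1.1). It is a generic illustration, not CLAM's own implementation, and it assumes the authorization-code grant; the provider endpoint, client id and redirect URI used in the example call are hypothetical placeholders.

```python
import secrets
from urllib.parse import urlencode


def authorization_redirect(auth_url, client_id, redirect_uri, scope=None):
    """Build the URL to which an unauthenticated user is redirected.

    Returns the URL and the ``state`` value, which the client must remember
    and compare when the provider sends the user back (CSRF protection).
    """
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # request an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    }
    if scope:
        params["scope"] = scope
    # Append to any query string the provider's endpoint already carries.
    separator = "&" if "?" in auth_url else "?"
    return auth_url + separator + urlencode(params), state
```

For example, ``authorization_redirect("https://provider.example/authorize", "my-client", "https://clam.example/")`` returns a URL beginning with ``https://provider.example/authorize?response_type=code``, together with the generated state value.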

When the user approaches the CLAM webservice, he/she will need to pass a
valid access token. If none is passed, the user is instantly delegated
to the OAuth2 authorization provider [5]_. The authorization provider
valid access token. If none is passed, the user is immediately redirected (HTTP 303)
to the OAuth2 authorization provider. The authorization provider
makes available a URL for authentication and for obtaining the final
access token. These are configured as follows in the CLAM service
configuration file:
@@ -334,7 +338,7 @@ for token encryption has to be configured through
interface will furthermore explicitly ask users to log out. Logging out
is done by revoking the access token with the authorization provider.
For this to work, your authentication provider must offer a revoke URL,
as described in RFC7009 [6]_, which you configure in your service
as described in `RFC7009 <https://tools.ietf.org/html/rfc7009>`_, which you configure in your service
configuration file as follows:

.. code-block:: python
@@ -482,7 +486,7 @@ within the file itself, as is also not the case in the example of plain
text files. CLAM therefore builds external metadata files for each input
and output file. These files contain all metadata of the files they
describe. These are stored in the CLAM Metadata XML format, a very
simple and straightforward format. [8]_ Metadata simply consists of
simple and straightforward format. Metadata simply consists of
metadata fields and associated values.

Metadata in CLAM is tied to a particular file format (such as plain text
@@ -566,7 +570,7 @@ parameters can be subdivided into parameter groups, but these serve only
presentational purposes.

There are seven parameter types available, though custom types can be
easily added. [9]_ Each parameter type is a Python class taking the
easily added. Each parameter type is a Python class taking the
following mandatory arguments:

#. ``id`` – An id for internal use only.
@@ -1068,10 +1072,10 @@ however will be provided out of the box. Note that the actual conversion
will be performed by 3rd party software in most cases.

- ``MSWordConverter`` – Convert MS Word files to plain text. This
converter uses the external tool ``catdoc`` by default. [10]_
converter uses the external tool `catdoc <http://www.wagner.pp.ru/~vitus/software/catdoc/>`_ by default.

- ``PDFConverter`` – Convert PDF to plain text. This converter uses the
external tool ``pdftohtml`` by default. [11]_
external tool `pdftohtml <http://pdftohtml.sourceforge.net/>`_ by default.

- ``CharEncodingConverter`` – Convert between plain text files in
different character encodings.
@@ -1095,7 +1099,7 @@ any metafield actors.
The example below illustrates the use of the viewer
``SimpleTableViewer``, capable of showing CSV files:

::
.. code-block:: python

    OutputTemplate('freqlist',CSVFormat,"Frequency list",
        SimpleTableViewer(),
@@ -1302,15 +1306,15 @@ interface.

Actions will show in the web-application interface on the index page.

In this example, we specify two parameters, they will be passed *in the
order they are defined* to the script. The command to be called is
configured analagous to ``COMMAND``, but only a subset of the variables
are supported. The most prominent is the ``$PARAMETERS`` variable. Note
that you can set ``paramflag`` on the parameters to pass them with an
option flag. String parameters with spaces will work without
problem [12]_. Actions do not have the notion of the CLAM XML datafile
that wrapper scripts in the project paradigm can use, so passing
command-line parameters is the only way here.
In this example, we specify two parameters; they will be passed *in the order
they are defined* to the script. The command to be called is configured
analogously to ``COMMAND``, but only a subset of the variables is supported. The
most prominent is the ``$PARAMETERS`` variable. Note that you can set
``paramflag`` on the parameters to pass them with an option flag. String
parameters with spaces will work without problems (but beware that there is a
maximum length for all parameters combined). Actions do not have the notion of
the CLAM XML datafile that wrapper scripts in the project paradigm can use, so
passing command-line parameters is the only way here.
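
On the receiving end, an action script only has to read its command-line arguments. The sketch below is hypothetical: it assumes an action with one plain parameter (a piece of text) followed by one parameter configured with a ``paramflag`` of ``-l``, both invented for illustration, and it relies on string parameters with spaces arriving as a single argument, as described above.

```python
import argparse
import sys


def parse_action_args(argv):
    """Parse the arguments a hypothetical action passes: text first, then ``-l LANG``."""
    parser = argparse.ArgumentParser(description="Hypothetical action script")
    parser.add_argument("text", help="plain parameter, passed positionally")
    parser.add_argument("-l", dest="lang", default="en",
                        help="parameter passed with an option flag")
    return parser.parse_args(argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    args = parse_action_args(sys.argv[1:])
    # A value containing spaces is already one argument; no re-joining needed.
    print(f"[{args.lang}] {args.text}")
```

Using ``argparse`` rather than indexing ``sys.argv`` directly keeps the script tolerant of flagged parameters being omitted, since they fall back to their defaults.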

It may, however, not even be necessary to invoke an external script.
Actions support calling Python functions directly. Consider the