## Using Pynorama

### Defining a view

To create a new Pynorama view, we first have to derive from the `View` base class.
Let's have a look at an example from the examples folder:

```python
import pandas as pd
from nltk.corpus import reuters
from nltk import sent_tokenize, word_tokenize

from pynorama import View, make_config
from pynorama.table import PandasTable
from pynorama.logging import logger
from pynorama.exceptions import RecordNotFound


class ReutersView(View):
    def __init__(self):
        super(ReutersView, self).__init__(
            name='reuters',
            description="nltk's reuters corpus")

    def load(self):
        logger.info('Starting processing reuters dataset.')
        # Build one table row per document in the corpus.
        self.df = pd.DataFrame([{
            'id': fileid,
            'abspath': str(reuters.abspath(fileid)),
            'categories': [c+' ' for c in reuters.categories(fileid)],
            'headline': reuters.raw(fileid).split('\n', 1)[0],
            'length': len(reuters.raw(fileid))
        } for fileid in reuters.fileids()])
        logger.info('Finishing processing reuters dataset.')

    def get_table(self):
        return PandasTable(self.df)

    def get_pipeline(self):
        # Two stages: the raw text and a tokenized tree derived from it.
        return {
            'raw': {'viewer': 'raw'},
            'doctree': {'parents': ['raw'],
                        'viewer': 'doctree'}
        }

    def get_record(self, key, stage):
        rawdoc = reuters.raw(key)
        if stage == 'raw':
            return rawdoc
        if stage == 'doctree':
            # A list of sentences, each a list of word tokens.
            return [word_tokenize(sent) for sent in sent_tokenize(rawdoc)]
        raise RecordNotFound(key, stage)

    def get_config(self):
        return make_config('id',
            available_transforms=["nans", "search", "quantile_range"],
            initial_visible_columns=["id"])
```


Let's step through each of the methods. [TODO: with screenshots]

* `__init__` is called when you instantiate your view and can be used for one-time initialization and assignments.

* `load` should be used to load resources. It is called once upon registration and every time the *reload* button in the top left corner is clicked.
 
* `get_table` is responsible for the contents of the table. `PandasTable` is a subclass of `pynorama.table.Table` and provides functionality to transform (i.e. filter) the table based on user actions. Pynorama comes with out-of-the-box support for pandas DataFrames and MongoDB collections as tables. A different set of filters is available for each. The table is requested every time a user edits a filter or changes the page in the table.
 
* `get_pipeline` defines the different stages of your pipeline, which are later rendered as a graph. You return a dictionary of stages and their configuration; see the available options [in this section](#Pipeline-definition). This function is called when the HTML is loaded and when the view is reloaded.

* `get_record` returns the content that will be displayed by the chosen viewer for the selected stage. Viewers expect data to be in a certain format. This function is called when a user has selected a document and a stage.

* `get_config` is useful for adapting the user interface in some cases without having to write JavaScript. The function has to return a nested dict, which is then converted to JSON and passed to the user interface. `make_config` is a utility function that builds this unwieldy nested dictionary from a few parameters. See [below](#Configuring-the-user-interface) for more information about the config. This function is called when the HTML of this view is loaded.

### Pipeline definition

`get_pipeline` expects a dictionary defining the different stages of your pipeline, which are later rendered as a graph. For each stage name as key, the value is another dictionary with configuration options for that stage. The following options are available:
* `parents`: an array of the stages that acted as input to the current stage. This will create visual connections in the graph.
* `viewer`: the front-end viewer that should be used to display a record of this stage. See [below](#Viewers) for more information about viewers.
* parameters for the selected viewer, depending on the type of viewer chosen
* `color`: the background color for the node in the pipeline graph
* TODO: more options
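
As an illustration, the following sketch builds a three-stage pipeline dictionary using the `parents`, `viewer` and `color` options listed above. The stage names, the color value (assumed here to be a CSS-style color string) and the `check_parents` helper are hypothetical and not part of Pynorama. The helper only checks that every parent refers to a stage that is actually defined.

```python
def build_pipeline():
    """Return a pipeline definition in the shape returned by get_pipeline."""
    return {
        # Root stage: no parents, displayed with the raw viewer.
        'raw': {'viewer': 'raw'},
        # Derived from 'raw'; the color format is an assumption (CSS-style string).
        'tokens': {'parents': ['raw'], 'viewer': 'doctree', 'color': 'lightblue'},
        # No viewer given, so the json viewer is assumed.
        'summary': {'parents': ['raw', 'tokens']},
    }


def check_parents(pipeline):
    """Hypothetical helper: list (stage, parent) pairs whose parent is undefined."""
    return [
        (stage, parent)
        for stage, options in pipeline.items()
        for parent in options.get('parents', [])
        if parent not in pipeline
    ]


pipeline = build_pipeline()
missing = check_parents(pipeline)  # empty when all parents are defined stages
```

Running such a check before returning the dictionary from `get_pipeline` can catch misspelled stage names early.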
   

### Viewers

Pynorama comes with the following viewers out-of-the-box, each expecting a certain input format and some requiring additional parameters: [TODO: screenshot]

* `json`: A JSON object inspector for the JSON-serialized record returned by `get_record`. If no viewer is given, `json` is assumed.
* `pdf`: TODO
* `doctree`: Renders a nested tree of words.
* `xml`: Renders an interactive tree of an XML document that was returned by `get_record` as a string.
* `raw`: Renders a string representation of the record, while preserving whitespace and line-breaks. 
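
To make the expected input formats more concrete, here is a minimal sketch of how one piece of text could be shaped for three of the viewers. It assumes that `doctree` accepts a list of sentences, each a list of words, as in the Reuters example above. The function `record_for_stage` and the `stats` stage are hypothetical, and the naive string splitting stands in for a real tokenizer.

```python
def record_for_stage(text, stage):
    """Hypothetical stand-in for the body of get_record, standard library only."""
    if stage == 'raw':
        # The raw viewer shows a string and preserves whitespace and line breaks.
        return text
    if stage == 'doctree':
        # Naive sentence and word splitting; the Reuters example uses nltk instead.
        sentences = [s.strip() for s in text.split('.') if s.strip()]
        return [sentence.split() for sentence in sentences]
    if stage == 'stats':
        # A JSON-serializable object, suitable for the json viewer.
        return {'characters': len(text), 'lines': text.count('\n') + 1}
    raise KeyError(stage)


text = 'Oil prices rose.\nTraders were surprised.'
doctree = record_for_stage(text, 'doctree')
stats = record_for_stage(text, 'stats')
```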

### Configuring the user interface

### Defining a session store

TODO: screenshot

Users can save the state of a view's user interface at any point as a session. Storing these sessions is the responsibility of the session store. Pynorama comes with out-of-the-box support for:

* Transient storage (`InMemorySessionStore`)
* JSON files (`JsonFileSessionStore`)
* Mongo DB Collection (`MongoSessionStore`)

By default, Pynorama uses the `InMemorySessionStore`, which requires no configuration. Hence, **sessions are lost** after the server is stopped. **To store sessions permanently**, supply a session store as the first argument to `make_server`.

To define your own session store, inherit from `SessionStore` and override `save_sessions` and `load_sessions`. You don't have to worry about caching sessions in memory, as that is the `SessionStore`'s responsibility. Have a look at the source code of the other stores to see how they work.
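
The following sketch shows what such a store might look like when backed by SQLite. To keep it runnable on its own, it defines a stand-in `SessionStore` base class. The method signatures (`save_sessions(sessions)` and `load_sessions()`), the shape of the sessions dictionary and the `SqliteSessionStore` class are all assumptions made for illustration; check the source code of the bundled stores for the actual interface.

```python
import json
import sqlite3


class SessionStore(object):
    """Stand-in for Pynorama's base class so this sketch runs on its own.

    The method signatures are assumptions, not the library's documented API.
    """

    def save_sessions(self, sessions):
        raise NotImplementedError

    def load_sessions(self):
        raise NotImplementedError


class SqliteSessionStore(SessionStore):
    """Hypothetical store keeping all sessions as one JSON document in SQLite."""

    def __init__(self, path):
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            'CREATE TABLE IF NOT EXISTS sessions (id INTEGER PRIMARY KEY, data TEXT)')

    def save_sessions(self, sessions):
        # Overwrite the single row that holds the serialized sessions.
        with self.connection:
            self.connection.execute(
                'INSERT OR REPLACE INTO sessions (id, data) VALUES (1, ?)',
                (json.dumps(sessions),))

    def load_sessions(self):
        row = self.connection.execute(
            'SELECT data FROM sessions WHERE id = 1').fetchone()
        # An empty store yields an empty dictionary.
        return json.loads(row[0]) if row else {}


store = SqliteSessionStore(':memory:')
store.save_sessions({'reuters': {'my session': {'page': 3}}})
restored = store.load_sessions()
```

In real use, the class would inherit from Pynorama's own `SessionStore` instead of the stand-in, and the store would be given a file path rather than `':memory:'`.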

### Deployment

For development, you can simply execute the Python files or alternatively use the `flask run` command ([see here](http://flask.pocoo.org/docs/latest/quickstart/)). Since `make_server` just returns the Flask application object, you can use all of Flask's deployment options ([see here](http://flask.pocoo.org/docs/latest/deploying/)).

### Examples

The examples folder is a great way to explore the possibilities of pynorama.
Note that the examples have extra dependencies on:

* nltk (in particular the *reuters* corpus has to be installed using `nltk.download()`)

The entry point for the examples application is *server.py*. The following views are available:

* reuters: example demonstrating visualization of a corpus of news data.

## Extending Pynorama

### Setting up the Javascript development environment

The Pynorama front-end is a [React](https://reactjs.org/) project and uses [Webpack](https://webpack.js.org/) to transpile and bundle the JavaScript. When you install Pynorama using pip, only the bundles are included, not the source files. To develop the front-end code, you have to:

* clone the git repo using `git clone https://github.com/manahl/pynorama`.
* go to the pynorama root folder using `cd pynorama`.
* execute `pip install -e .` to install pynorama in developer mode.
* go to the pynorama-js folder using `cd pynorama-js`.
* execute `npm install` ([Node.js](https://nodejs.org/en/) is required) to install dependencies.

Now you can develop the front-end code.

To debug, run `debug.sh [PORT]` in `pynorama-js`, which starts a local webpack-dev-server at the given port, transpiling the webpack bundles as you change the source files. Start your Pynorama server the usual way, but when you open your view in the browser, add the parameter `webpack_dev_port=[PORT]` to the URL (e.g. `http://localhost:5000/view/example/?webpack_dev_port=5001`). To build the bundles, use `build.sh` in `pynorama-js`. For non-Unix systems or special requirements, have a look inside `debug.sh` and `build.sh`.
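
If you switch between views often while debugging, a small helper can append the parameter for you. This is only a convenience sketch using the Python standard library; the function name `with_webpack_dev_port` is hypothetical and not part of Pynorama.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_webpack_dev_port(url, port):
    """Return url with the webpack_dev_port query parameter set to port."""
    parts = urlsplit(url)
    # Keep existing parameters, replacing any previous webpack_dev_port value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != 'webpack_dev_port']
    query.append(('webpack_dev_port', str(port)))
    return urlunsplit(parts._replace(query=urlencode(query)))


debug_url = with_webpack_dev_port('http://localhost:5000/view/example/', 5001)
```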

### Front-end project structure

TODO

### Table

TODO