# Example Notebook for annotator package
The following introduces the basic functionality of the package. We will load short texts in English and German and annotate them using the available features.

In [None]:
# import annotator.base as be
# import annotator.mspacy as msp

import base as be
import mspacy as msp

# The input
The input for the package is normally set up in an input.json file that is passed to the package. There are two example .json files in this directory, which we will use. Let's look at the contents of example_en.json. The first few parameters we encounter are:
```
    "input": "example_en.txt",
    "output": "output_en",
    "tool": "spacy",
```
These tell the program that the data we want to annotate is stored in example_en.txt, that the output should go to a file identified as output_en, and that we want to use the tool spacy to annotate the data. There will be more tools to choose from, but for simplicity their configurations have been stripped for now. The spacy-specific configuration is found in the `"spacy_dict"` section. Here we find the parameters that tell spacy how to annotate the data. The entries usually come with a comment explaining what the parameter does. Let's look through the ones we set up in example_en.json:
```
    "model": false,
```
Here we can optionally specify a model that spacy should use to annotate the text. We leave it set to false for now.
```
    "lang": "en",
```
Here we specify that the language of the data we want to annotate is English. Since we didn't specify a model, this information is needed to choose one for us.
```
    "text_type": "news",
```
We specify what kind of text we want to annotate so that an appropriate model can be chosen for the task. For English, only "news" is currently supported. The setup we chose here leads to the model en_core_web_md being used.
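
The selection described here can be pictured as a lookup keyed on language and text type. The sketch below is only an illustration: `DEFAULT_MODELS` and `choose_model` are hypothetical names that are not part of the package, and the only table entry taken from this notebook is the English "news" case.

```python
# Hypothetical lookup table; only the ("en", "news") entry comes from this notebook.
DEFAULT_MODELS = {
    ("en", "news"): "en_core_web_md",
}


def choose_model(spacy_dict):
    """Return a model name for a spacy_dict-style configuration (illustrative only)."""
    model = spacy_dict.get("model")
    # An explicitly configured model takes precedence over the lookup.
    if model:
        return model
    key = (spacy_dict["lang"], spacy_dict["text_type"])
    if key not in DEFAULT_MODELS:
        raise ValueError(
            f"no default model for lang={key[0]!r}, text_type={key[1]!r}"
        )
    return DEFAULT_MODELS[key]


example = {"model": False, "lang": "en", "text_type": "news"}
print(choose_model(example))
```

With `"model": false` the lookup is used, while any non-empty model name would be returned unchanged.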
```
    "processors": "senter, tagger, parser, attribute_ruler, lemmatizer, ner",
```
Here we specify the processors for the pipeline we will apply to our data. These determine what kind of annotations we get in the end.

The remaining entries are not immediately important for this example and are all set to their default values. In particular, the `"config"` parameter and its contents are usually defined for a given model in its config.cfg file and should not be tampered with.
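
Put together, the fragments discussed in this section can be read with Python's standard `json` module. The following is a minimal sketch under stated assumptions: the nesting of the `"spacy_dict"` section is inferred from the description, the remaining entries are omitted, and `parse_processors` is a hypothetical helper rather than a function of the package.

```python
import json

# Assumed layout, assembled from the fragments shown in this section;
# the remaining entries of the real file are omitted here.
raw = """
{
    "input": "example_en.txt",
    "output": "output_en",
    "tool": "spacy",
    "spacy_dict": {
        "model": false,
        "lang": "en",
        "text_type": "news",
        "processors": "senter, tagger, parser, attribute_ruler, lemmatizer, ner"
    }
}
"""

config = json.loads(raw)


def parse_processors(value):
    """Split the comma-separated processors string into a list of names."""
    return [name.strip() for name in value.split(",") if name.strip()]


processors = parse_processors(config["spacy_dict"]["processors"])
print(config["tool"], processors)
```

Note that the JSON value `false` becomes Python's `False` after loading, so a simple truth test is enough to detect that no model was specified.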

# Running based on the .json
Let's now look at what the program does with the information supplied by the .json input files. First we read in the .json files to make them available as dictionaries.

In [None]:
# read in example_en.json
dict_en = be.prepare_run.load_input_dict("example_en")
# print(dict_en)

# read in example_de.json
dict_de = be.prepare_run.load_input_dict("example_de")
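
Judging from the calls in this cell, the file name is passed without its `.json` extension. What such a loader might do can be sketched with the standard library alone. `load_json_dict` below is a hypothetical stand-in and not the package's `load_input_dict`, whose actual behaviour may differ.

```python
import json
import tempfile
from pathlib import Path


def load_json_dict(name, directory="."):
    """Read <directory>/<name>.json and return its contents as a dictionary."""
    path = Path(directory) / f"{name}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a throwaway file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo.json").write_text(
        '{"input": "demo.txt", "tool": "spacy"}', encoding="utf-8"
    )
    demo = load_json_dict("demo", directory=tmp)

print(demo["input"])
```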

Now that we have access to the information from the .json files, we can load the data from the specified locations.

In [None]:
# read in the English example text from example_en.txt
data_en = be.prepare_run.get_text(dict_en["input"])
# print(data_en)

# read in the German example text from example_de.txt
data_de = be.prepare_run.get_text(dict_de["input"])
# print(data_de)

Next we load the tool specified by the .json. In this case we load the spacy pipeline from the mspacy module. The output tells us which components are loaded and which model is used.

In [None]:
# load the pipeline from the config
pipe_en = msp.spacy_pipe(dict_en)

pipe_de = msp.spacy_pipe(dict_de)

After this, we only have to apply the pipeline to the data we read in earlier.

In [None]:
# apply pipeline to data
annotated_en = pipe_en.apply_to(data_en)

annotated_de = pipe_de.apply_to(data_de)

The text has now been annotated. We can easily write the results to a .vrt file using the output name defined in the .json. This also directly encodes the annotated results for CWB.

In [None]:
# get the annotated .vrt and pass to cwb
annotated_en.pass_results()

annotated_de.pass_results()
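
For orientation, a .vrt ("verticalized text") file as read by CWB holds one token per line with tab-separated attributes, and XML-style tags on their own lines mark structures such as sentences. The sketch below builds such a string by hand. It is not the package's implementation: `to_vrt`, the choice of attributes (word, part of speech, lemma), and the `<text>` and `<s>` tags are assumptions made for illustration.

```python
def to_vrt(sentences, text_id="example"):
    """Render tokenised sentences as a VRT string (illustrative only).

    `sentences` is a list of sentences, each a list of
    (word, pos, lemma) tuples.
    """
    lines = [f'<text id="{text_id}">']
    for sentence in sentences:
        lines.append("<s>")
        for token in sentence:
            # One token per line, attributes separated by tabs.
            lines.append("\t".join(token))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


sentences = [
    [("This", "PRON", "this"), ("works", "VERB", "work"), (".", "PUNCT", ".")],
]
vrt = to_vrt(sentences)
print(vrt)
```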

Loading the pipeline, applying it and passing the results can be done conveniently in one line:

In [None]:
# example for English data
msp.spacy_pipe(dict_en).apply_to(data_en).pass_results()

# is equivalent to the above
# pipe_en = 