# spaCy Demo

This notebook demonstrates how spaCy is used within this project.

It shows how to read your data, convert it into a spaCy-readable format, and finally train and deploy your own pipeline.

In [None]:
# Please ignore this cell: extra install steps that are only executed when running the notebook on Google Colab
# flake8-noqa-cell
import os
if 'google.colab' in str(get_ipython()) and not os.path.isdir('data/Test_Data'):
    # we're running on colab and we haven't already downloaded the test data
    # first install pinned version of setuptools (latest version doesn't seem to work with this package on colab)
    !pip install setuptools==61 -qqq
    # install the moralization package
    !pip install git+https://github.com/ssciwr/moralization.git@publish_spacy_example_notebook -qqq

    # download test data sets
    !wget https://github.com/ssciwr/moralization/archive/refs/heads/test_data.zip -q
    !mkdir -p data && unzip -qq test_data.zip && mv -f moralization-test_data/*_Data ./data/. && rm -rf moralization-test_data test_data.zip
    !spacy download de_core_news_sm

In [None]:
from moralization.spacy_model import SpacySetup, SpacyTraining

First we define all relevant file paths.

`data_dir`: The location of your XMI files. <br>
`config_file`: The config file used for the spaCy training. <br>
`working_dir`: The output directory of the data conversion and the input/output directory for the training. If `working_dir` is `None`, a temporary directory is used, which can reduce clutter when testing.

In [None]:
data_dir = "./data/Test_Data/XMI_11"
config_file = "./data/Test_Data/example_config.cfg"
# working_dir = "./test"
working_dir = None

`SpacySetup` accepts a custom working directory; if none is given, a temporary directory is created. <br>
This working directory is also the default save location for `export_training_testing_data`.
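
The fallback to a temporary directory can be sketched with the standard library. This is only an illustration of the general pattern, not the actual `SpacySetup` implementation; the helper name `resolve_working_dir` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_working_dir(working_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the given directory, or a fresh temp dir if None (hypothetical helper)."""
    if working_dir is None:
        # No directory given: create a new temporary directory.
        return Path(tempfile.mkdtemp())
    path = Path(working_dir)
    # Create the custom directory if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    return path


tmp_dir = resolve_working_dir()  # falls back to a temporary directory
print(tmp_dir.is_dir())
```

A directory created with `tempfile.mkdtemp` is not removed automatically, so clean it up yourself once it is no longer needed.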

In [None]:
example_setup = SpacySetup(data_dir, working_dir=working_dir)

The following cell lists all available span keys for visualisation.

Note: `sc` is the spaCy default span key and covers all categories.

In [None]:
print(example_setup.span_keys.keys())

The next cell prints all available file names.

In [None]:
print(example_setup.doc_dict.keys())

For ease of use you can pass either the file names or just their indices (e.g. `["name", 1]`). <br>
You can view multiple files simultaneously or leave the argument blank to display all files. <br>
Use the `spans_key` filter to show only specific span groups. <br>

Disclaimer: This does not seem to work on Google Colab.
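
As a rough sketch of how a mixed list of names and indices could be resolved, assuming indices refer to the order of `doc_dict.keys()`: the helper `resolve_filenames` and the placeholder dictionary below are hypothetical and not part of the package.

```python
from typing import Dict, List, Optional, Sequence, Union


def resolve_filenames(
    doc_dict: Dict[str, object],
    filenames: Optional[Sequence[Union[str, int]]] = None,
) -> List[str]:
    """Map a mix of file names and integer indices to file names (hypothetical helper)."""
    names = list(doc_dict.keys())
    if not filenames:
        # Nothing selected: fall back to all files.
        return names
    resolved = []
    for item in filenames:
        if isinstance(item, int):
            resolved.append(names[item])  # index into the ordered keys
        elif item in doc_dict:
            resolved.append(item)
        else:
            raise KeyError(f"Unknown file name: {item!r}")
    return resolved


docs = {"file_a": None, "file_b": None, "file_c": None}  # placeholder documents
print(resolve_filenames(docs, ["file_a", 2]))  # ['file_a', 'file_c']
```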

In [None]:
# example_setup.visualize_data(filenames=["test_data-trimmed_version_of-Gerichtsurteile-neg-AW-neu-optimiert-BB",1],
#              spans_key="sc")

The export function writes your spaCy docs to disk and also returns the output path. This is especially useful when a temporary working directory is used.

In [None]:
working_dir = example_setup.export_training_testing_data(output_dir=working_dir)
print(working_dir)

Here you can set the working directory and the location of your config file for the spacy training.

In [None]:
example_training = SpacyTraining(working_dir=working_dir, config_file=config_file)

You can use spaCy's config override syntax (passed here via the `overwrite` argument) to change parameters from the config in code without altering the file every time. <br>
To use a GPU, either install the necessary CUDA drivers locally or change the Google Colab runtime. <br>
To disable the GPU, set `use_gpu=-1`.
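
To illustrate what a dotted key such as `training.max_epochs` refers to, here is a minimal pure-Python sketch that applies dotted overrides to a nested dictionary. It only mimics the idea and is not how spaCy or this package implements it; `apply_overrides` and the config values are made up for the example.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with dotted-key overrides applied (illustrative only)."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *sections, option = dotted_key.split(".")
        node = result
        for section in sections:
            # Walk (or create) the nested section, e.g. "training".
            node = node.setdefault(section, {})
        node[option] = value
    return result


config = {"training": {"max_epochs": 0, "dropout": 0.1}}  # made-up values
updated = apply_overrides(config, {"training.max_epochs": 10})
print(updated["training"])  # {'max_epochs': 10, 'dropout': 0.1}
```

The original dictionary stays untouched, which mirrors the point above: the config file itself is not modified.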

In [None]:
example_training.train(overwrite={"training.max_epochs": 10}, use_gpu=0)

Use the spaCy evaluation method to get an overview of the results.
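
Evaluation output for span tasks is usually expressed as precision, recall and F-score. As a hedged sketch of what these numbers mean, assuming exact-match comparison of `(start, end, label)` tuples: the spans below are made up, and this is not the scorer spaCy uses internally.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token, label)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Compute exact-match precision, recall and F1 for two sets of spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 3, "A"), (5, 8, "B")}        # made-up reference spans
predicted = {(0, 3, "A"), (9, 11, "B")}  # made-up model output
print(precision_recall_f1(gold, predicted))  # (0.5, 0.5, 0.5)
```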

In [None]:
example_training.evaluate()

Try the model on custom strings.

In [None]:
example_training.test_model_with_string("Das hier ist ein toller positiver und wertvoller Satz von Hans Peter!")

Finally, the best model can be saved to a custom directory.

In [None]:
example_training.save_best_model("test_model")