# Dependencies for training with GPU (transformer)

To train the model on a GPU, install the following dependencies:

In [None]:
# Install CuPy (adjust based on your CUDA version)
%pip install cupy-cuda12x

In [None]:
# Install PyTorch, torchvision, and torchaudio (adjust CUDA version if needed)
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
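Once the packages are installed, it can help to confirm that a GPU is actually visible before starting a long training run. The sketch below is one hedged way to do that. The `gpu_report` helper is a hypothetical name introduced here; it relies only on `torch.cuda.is_available()` and `spacy.prefer_gpu()`, and it skips any library that cannot be imported.

```python
import importlib.util


def gpu_report():
    """Return a dict describing which libraries can see a GPU.

    Hypothetical helper for illustration. A value of None means the
    library is not installed or failed to import.
    """
    report = {"torch_cuda": None, "spacy_gpu": None}

    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            report["torch_cuda"] = bool(torch.cuda.is_available())
        except Exception:
            pass  # leave the entry as None if the import or check fails

    if importlib.util.find_spec("spacy") is not None:
        try:
            import spacy
            # prefer_gpu() returns True only if a GPU could be activated.
            report["spacy_gpu"] = bool(spacy.prefer_gpu())
        except Exception:
            pass

    return report


print(gpu_report())
```

If both entries come back as `False` or `None`, the CUDA version in the install commands above probably does not match the local driver, or the packages landed in a different environment than the notebook kernel.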

# Generating the Configuration File

To generate the configuration file, execute the following command:

```bash
!python -m spacy init fill-config /path/to/base_config_tagger.cfg /path/to/config.cfg
```


This command uses spacy init fill-config to create a complete configuration file named config.cfg. It reads the partial base configuration at /path/to/base_config_tagger.cfg, which holds the initial settings for training, and fills in every missing value with spaCy's defaults.

You can download a base_config_tagger.cfg file from the spaCy website: https://spacy.io/usage/training. After downloading the configuration file, pass its path as the first argument of the command; the second argument is the path where the filled config will be written.
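To check what the filled config will train, a quick look at its `[nlp]` section is often enough. The following is a minimal sketch using only the standard library; `read_pipeline` is a hypothetical helper, the sample text is an illustrative fragment rather than a real config, and the code assumes the values in `[nlp]` are JSON-formatted. When spaCy is importable, `spacy.util.load_config` is the more robust way to read the file.

```python
import configparser
import json


def read_pipeline(cfg_text):
    """Return (lang, pipeline) from the [nlp] section of a config string."""
    # Disable interpolation so references like ${paths.train} are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    lang = json.loads(parser["nlp"]["lang"])
    pipeline = json.loads(parser["nlp"]["pipeline"])
    return lang, pipeline


# Illustrative fragment only, not a complete spaCy config.
sample = """
[paths]
train = null
dev = null

[nlp]
lang = "en"
pipeline = ["tok2vec","tagger"]
"""

print(read_pipeline(sample))
```

For a real file, the same function could be called with the text of config.cfg, for example `read_pipeline(open("config.cfg", encoding="utf-8").read())`.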

In [2]:
!python -m spacy init fill-config C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/base_config.cfg C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/config.cfg

✔ Auto-filled config with all values
✔ Saved config
C:\Users\mikek\projects\Text-Normalization\src\text_norm_NER\config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy



# Training the Model

To train the model, execute the following command:

```bash
!python -m spacy train /path/to/config.cfg --output /path/to/model_output_directory --paths.train /path/to/train.spacy --paths.dev /path/to/test.spacy --gpu-id 0
```

This command starts training with spaCy's train command. It reads the training settings from the configuration file at /path/to/config.cfg and saves the trained pipeline to /path/to/model_output_directory, where spaCy writes the best-scoring and the most recent checkpoints to the model-best and model-last subdirectories. The --paths.train and --paths.dev options point to the training and development datasets at /path/to/train.spacy and /path/to/test.spacy, respectively. The development set is used to score the pipeline during training, so it should be held out from the training data. The --gpu-id 0 flag runs training on the GPU with ID 0.
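When the same training run has to be repeated with different paths, it may be convenient to assemble the command in Python instead of editing one long line. The sketch below builds the argument list for the command described above; `build_train_command` is a hypothetical helper, and the paths are the same placeholders used in this section.

```python
import sys


def build_train_command(config, output, train, dev, gpu_id=-1):
    """Build the argument list for `python -m spacy train`.

    gpu_id=-1 means CPU, which is also spaCy's default.
    """
    return [
        sys.executable, "-m", "spacy", "train", str(config),
        "--output", str(output),
        "--paths.train", str(train),
        "--paths.dev", str(dev),
        "--gpu-id", str(gpu_id),
    ]


cmd = build_train_command(
    "/path/to/config.cfg",
    "/path/to/model_output_directory",
    "/path/to/train.spacy",
    "/path/to/test.spacy",
    gpu_id=0,
)
print(" ".join(cmd))
# To actually launch training: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, which is a common issue on Windows.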

In [None]:
!python -m spacy train C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/config.cfg --output C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/model --paths.train C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/train_data.spacy --paths.dev C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/train_data.spacy --gpu-id 0

# Evaluating the Model

To evaluate the model, execute the following command:

```bash
!python -m spacy evaluate /path/to/trained_model_directory/model-best /path/to/eval.spacy --output /path/to/evaluation_output.json --gpu-id 0
```

This command evaluates the trained pipeline with spaCy's evaluate command. The first argument is the directory of the trained pipeline, /path/to/trained_model_directory/model-best, and the second is the evaluation dataset, /path/to/eval.spacy.

The evaluation results are saved as JSON to /path/to/evaluation_output.json. The --gpu-id 0 flag runs the evaluation on the GPU with ID 0.
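The saved JSON file can be read back in Python to compare runs. The sketch below keeps only the top-level numeric scores and makes no assumption about which metrics are present. `numeric_scores` is a hypothetical helper, and the sample values are made up for demonstration; a real file would come from the `--output` option of the evaluate command.

```python
import json
import tempfile
from pathlib import Path


def numeric_scores(path):
    """Load an evaluation JSON file and keep the top-level numeric scores."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Made-up sample standing in for a real evaluation output file.
sample = {
    "ents_p": 0.9,
    "ents_r": 0.8,
    "ents_f": 0.847,
    "ents_per_type": {"PERSON": {"p": 0.9, "r": 0.8, "f": 0.847}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    scores = numeric_scores(path)

for name, value in sorted(scores.items()):
    print(f"{name}: {value:.3f}")
```

Nested entries such as per-label breakdowns are skipped here on purpose; they can be read from the loaded dictionary directly when needed.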

In [None]:
!python -m spacy evaluate C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/model/model-best C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/val_data.spacy --output C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/output.json --gpu-id 0

# Packaging the Model

To package the model, execute the following command:

```bash
!python -m spacy package "/path/to/trained_model_directory/model-best" "/path/to/output_directory"
```

This command packages the trained pipeline with spaCy's package command. The first argument is the directory of the trained pipeline, /path/to/trained_model_directory/model-best, and the second is the directory where the installable package will be written, /path/to/output_directory.

The packaged model can be installed using pip with the following command:

```bash
!pip install /path/to/output_directory/model_name-version
```
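The model_name-version part of the path is derived from the pipeline's meta.json: the package is named after the `lang` and `name` fields, followed by the `version`. The sketch below shows that naming under the assumption that those three fields are present; `package_dir_name` is a hypothetical helper and the metadata values are examples.

```python
def package_dir_name(meta):
    """Build the expected package directory name from meta.json fields."""
    return f"{meta['lang']}_{meta['name']}-{meta['version']}"


# Example metadata; a real pipeline stores these fields in its meta.json.
meta = {"lang": "en", "name": "pipeline", "version": "0.0.0"}
print(package_dir_name(meta))  # en_pipeline-0.0.0
```

The part before the hyphen, here en_pipeline, is also the name to pass to spacy.load after installation.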

In [None]:
!python -m spacy package "C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/model/model-best" "C:/Users/mikek/projects/Text-Normalization/src/text_norm_NER/output"

In [None]:
!pip install "C:\Users\mikek\projects\Text-Normalization\src\text_norm_NER\output\en_pipeline-0.0.0"

# Performing Inference with the Model

To run inference, load the installed package by name and process your text:


In [None]:
import spacy

nlp = spacy.load("en_pipeline")

# Replace this placeholder with the text to be processed
text = "mk,wtf,fgh"

doc = nlp(text)

entities = []
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")
    if ent.label_ == "PERSON":
        entities.append(ent.text)

# Join PERSON entities with /
if entities:
    joined_entities = "/".join(entities)
    print(f"Joined PERSON entities: {joined_entities}")