# NLP Tasks and Applications

In Chapter 2, we provided you a gentle introduction to language models and fine-tuning. Now, let's explore more of what we can actually *use* fine-tuning for. Fine-tuning is good for more than just generating better domain-specific language models, as we alluded to in the previous chapter. Fine-tuning can be used to solve meaningful real world tasks, which serve as the building blocks of complex real world NLP applications.

In this chapter, we will officially introduce several of these "more meaningful" real world tasks and present several popular benchmarks (such as [GLUE](https://gluebenchmark.com/) and [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)) for measuring performance on these tasks. We will also highlight several standard publicly available datasets for you to use when solving these tasks on your own. And, most importantly, we will solve two of these tasks (named entity recognition and text classification) together to show just how all of this works.

We hope this chapter gives you a deeper, more applied and hands-on take on performing NLP and can serve as the launch pad for you to build your own real world NLP applications.

## Pretrained Language Models

As we mentioned in Chapter 1, NLP has come a long way in a short amount of time over the past few years. Instead of training NLP models from scratch, it is now possible (and advisable) to leverage pretrained language models to perform common NLP tasks such as named entity recognition. Only when you have highly custom NLP needs is it advisable to train your NLP model from scratch. But, before we proceed any further, let’s define some of the terms we will use in this chapter. Some of these terms we have already covered in the previous two chapters, but this will be a good refresher nonetheless to tie everything together.

> Note: Models that have trained on natural language data to perform NLP tasks are typically known as NLP models as opposed to other types of models such as computer vision models that have trained on visual data such as images or videos to perform computer vision tasks. Note that language models are a specific type of NLP model that have been trained to perform language modeling, which we introduced in Chapter 2.

Machine learning is an application of artificial intelligence that provides machines the ability to improve their performance on a defined task by learning from data such that the performance on the task improves as the machine learns from more data.

Natural language processing is the branch of machine learning that involves natural (aka “human”) language such as text and speech. Computer vision, which we will not cover in this book, is the branch of machine learning that involves visual data such as images and video.

Machines can learn from labeled data or unlabeled data. The area of machine learning that involves labeled data (e.g., this is an image of a “cat” or a “dog”) is known as supervised learning, and the area that involves unlabeled data (e.g., you have images of cats and dogs but none are labeled as such) is known as unsupervised learning.footnote:[Ankur has an [entire book on hands-on unsupervised learning](www.unsupervisedlearningbook.com) if you're curious.] The third major area of machine learning, known as reinforcement learning, involves software agents learning how to take action in an environment (either physical or digital) in order to maximize the rewards they receive.

In machine learning, the process of machines learning from data (also referred to as “training on data”) to improve their performance on a specific task results in a model. Once the machines have learned / trained to a satisfactory level of performance on the task, the model stores the knowledge acquired from the training process in the form of model parameters (e.g., weights), which are used in the calculus and linear algebra performed in machine learning.

The model uses this stored knowledge (i.e., model parameters) to perform inference (i.e., generate predictions) on new / never-before-seen data. So long as the new data is similar to the data the machines had trained on, the performance on the new data should be similar to the performance the machines had achieved on the original training dataset.
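
To make the training, parameters, and inference vocabulary concrete, here is a minimal sketch in plain Python. The toy data and the choice of a straight-line model are assumptions made purely for illustration; they are not part of the chapter's notebook. Training produces a small dictionary of parameters, and inference reuses those parameters on an input the model has never seen.

```python
# Toy illustration: training learns parameters, inference reuses them.
# The data and the linear form y = w * x + b are assumptions for this sketch.

def train(xs, ys):
    """Fit y = w * x + b by ordinary least squares and return the parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return {"w": w, "b": b}


def predict(params, x):
    """Inference: apply the stored parameters to a new input."""
    return params["w"] * x + params["b"]


# "Training data": these points happen to lie on the line y = 2x + 1
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

params = train(train_x, train_y)   # the "model" is just these two numbers
prediction = predict(params, 10.0)  # an input that was not in the training data

print(params)
print(prediction)
```

A language model works on the same principle, only with vastly more parameters and far richer data.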

Turning back to our original topic, we can use pretrained language models to perform common NLP tasks. When we refer to pretrained models, we refer to models that were previously trained on data. Instead of having machines train on data from scratch to perform NLP tasks, we start with pretrained *language* models that have already been trained on lots and lots of data to perform *language modeling* to good levels of performance. We then *fine-tune* the pretrained language models to perform specific NLP tasks beyond language modeling; this process of fine-tuning a language model to perform another NLP task is known as transfer learning, which we will turn to next. (Don't worry: we will discuss what these *common* NLP tasks are very soon, too.)

## Transfer Learning and Fine-tuning

Using pretrained language models is the fastest way to perform common NLP tasks. In contrast, if you need to perform uncommon NLP tasks, you may need to train the model from scratch, including sourcing and annotating / labeling the data relevant for your task.

There will be times when your task is similar to but not exactly the same as the task the pretrained model was trained to perform. In these cases, it is possible to leverage some of the “prior learning” by the pretrained model instead of training a model entirely from scratch. Leveraging some of this “prior learning” to train your own model is known as transfer learning. You are effectively “transferring” learning from one model to another.

Transfer learning is possible because pretrained language models are neural networks. Neural networks are a class of models in machine learning in which machines learn to represent data in a manner which enables the machines to perform complex tasks such as classifying the data.

Neural networks typically involve learning a series of representations, with each subsequent representation making it easier for the machine to interpret the data from the prior representation. Each representation is learned by a layer in the neural network; the more layers a neural network has, the more representations are learned. Modern neural networks typically have many layers — in other words, the neural networks are very deep. This is where the terms “deep learning” and “deep neural networks” come from (see Figure 3-0).

![Neural Network](images/hulp_0300.png)

In transfer learning, we borrow the first several layers of a pretrained *language model*. These first several layers have already learned some useful representations of the data, making it easier for us to train the subsequent layers of the neural network for our specific task.

For example, these first several layers of the pretrained language model may have already discovered a good way to represent the various tokens in our text; instead of having to learn these representations from scratch, we can just borrow the knowledge the pretrained model has learned and then train the model some more on our specific task.

The additional training of the model that we perform once we have borrowed the first several layers of the pretrained model is known as fine-tuning. In other words, we take some of the pretrained model, transferring some of the knowledge it has acquired, and then we fine-tune the remaining layers of the neural network for our specific task.
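
The idea of borrowing layers and training only what comes after them can be sketched without any deep learning library. In the toy example below, `PRETRAINED_WEIGHTS`, `frozen_features`, and `fine_tune_head` are hypothetical names, and the tiny two-unit "pretrained" layer is an assumption standing in for the early layers of a real language model. Only the new task-specific layer (the "head") is updated during training.

```python
import math

# Hypothetical "pretrained" layer: these weights are borrowed and never updated.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Borrowed layer: map a 2-d input to a 2-d representation (tanh units)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, epochs=200, lr=0.5):
    """Train only the new task-specific layer (a logistic regression head)."""
    head_w, head_b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in examples:
            feats = frozen_features(x)  # the frozen layer is only read, not changed
            pred = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            error = pred - label        # gradient of the log loss w.r.t. the logit
            head_w = [w - lr * error * f for w, f in zip(head_w, feats)]
            head_b -= lr * error
    return head_w, head_b


def classify(x, head_w, head_b):
    feats = frozen_features(x)
    score = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
    return int(score >= 0.5)


# Toy task: label 1 when the first coordinate is larger than the second
examples = [([2.0, 0.0], 1), ([1.0, -1.0], 1), ([0.0, 2.0], 0), ([-1.0, 1.0], 0)]
head_w, head_b = fine_tune_head(examples)

print(classify([3.0, 1.0], head_w, head_b))
print(classify([1.0, 3.0], head_w, head_b))
```

This sketch freezes the borrowed layer entirely. In practice, it is also common to keep updating the borrowed layers, usually with a small learning rate, so treat the strict freeze here as a simplification.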

Transfer learning and fine-tuning are a very common practice in NLP today and have helped accelerate the development of NLP applications in specific domains (e.g., finance, legal, etc.). If we had to train NLP models from scratch every time we switched from one domain to another (e.g., from analyzing finance documents to legal documents), building NLP applications would be a very slow and arduous process.

Instead, we could leverage a generic pretrained language model trained on lots of textual data crawled from the Web and fine-tune it for finance or for legal and quickly build a domain-specific language model, similar to what we did for movie reviews in Chapter 2. Transfer learning is why NLP has blossomed in industry in recent years.

With this context in mind, let's introduce the common NLP tasks.

## NLP Tasks

Hugging Face has an excellent [overview of the common NLP tasks](https://huggingface.co/transformers/task_summary.html), which we will present here now. These tasks include sequence classification, question answering, language modeling, named entity recognition, summarization, and translation. This list is by no means exhaustive, but this list does highlight the most frequent use cases for NLP today in building applications and is a great place for us to start.

- **Sequence Classification:** Sequence classification is as straightforward as it sounds; it involves classifying sequences into a given number of classes. When performed on text, sequence classification is also referred to as text classification, which we will perform together later in this chapter. An example of sequence classification is sentiment analysis, which we performed when we classified IMDb movie reviews as positive, negative, or neutral in Chapter 2. Another example is entailment, which involves labeling the relationship between two statements (known as the text and hypothesis, respectively) into one of three classes: positive entailment (hypothesis states something that is definitely correct about the situation or event in the text), neutral entailment (hypothesis states something that might be correct about the situation or event in the text), or negative entailment (hypothesis states something is definitely incorrect about the situation or event in the text). Within the field of NLP, this is more specifically referred to as a natural language understanding (NLU) task given what the machine has to be able to infer from the data in order to perform this task well. The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is the most popular benchmark to measure progress on sequence classification tasks and natural language understanding, more generally. The authors of the original GLUE paper released an even harder benchmark to measure progress on NLU known as [SuperGLUE](https://super.gluebenchmark.com/), which you should be aware of. You can find many more [text classification datasets](https://paperswithcode.com/task/text-classification) on Papers with Code.

- **Question Answering:** Question answering is the task of providing the correct answer from a sequence of text or audio given a question. Think of this as reading comprehension; the machine has to find the correct segment of text from a reading passage and present this as the answer to a question that is being asked. The most popular benchmark to measure progress on question answering is known as [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/). It is a collection of 100,000 answerable questions from the original SQuAD dataset (known as SQuAD 1.1) plus 50,000 unanswerable questions that look similar to answerable questions. The unanswerable questions were introduced to fool the machine, making the task more difficult. The machine has to decide whether the question is answerable or not, and, if the question is answerable, the machine has to provide the correct answer.

- **Language Modeling:** We have covered language modeling already, but, as a refresher, language modeling is the task of predicting the next sequence of words given a sequence of words. This particular type of language modeling is known as *causal language modeling* and is commonly used for *natural language generation* (NLG) in the field of NLP. Another type of language modeling is *masked language modeling*, in which the machine must predict the masked word or words in a sequence given the context around the masked word or words. Given the nature of this task, there is no industry-standard performance benchmark, but there are plenty of [datasets](https://paperswithcode.com/task/language-modelling) available.


- **Text Generation:** Text generation is similar to language modeling in that the task involves generating a coherent sequence of text that is a continuation of the given text, but text generation is more open-ended compared to language modeling. Think of text generation as longer sequence text prediction versus the shorter sequence text prediction involved in language modeling. Text generation gained mainstream popularity with the release of [OpenAI's GPT-2](https://openai.com/blog/better-language-models/) in 2019. There is no industry performance benchmark for this task, but here are some [datasets](https://paperswithcode.com/task/text-generation). 

- **Named Entity Recognition:** We introduced named entity recognition (NER) in Chapter 1; it is the task of classifying tokens of interest (think words) in a sequence of tokens (think sentence) into specific entity types such as a person, an organization, or a location. The most popular dataset and benchmark for this task is [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/), which is an NER challenge dating back to 2003. Back then, statistical NLP models were used to perform NER, but today the best-performing NER models are transformer-based. For more on NER, including datasets, please visit the [Papers with Code](https://paperswithcode.com/task/named-entity-recognition-ner) website. We will perform NER together later in this chapter.

- **Summarization:** Summarization is the task of summarizing a document into a shorter text. The usefulness of this task should be fully apparent; this is a task all of us perform on a daily basis, synthesizing information from a long article into a shorter block/summary to hold as memory. The industry performance benchmark and dataset for this task is [CNN / Daily Mail](https://paperswithcode.com/sota/document-summarization-on-cnn-daily-mail), and here are some [public datasets](https://paperswithcode.com/task/text-summarization) that are available.

- **Translation:** Translation (or machine translation as it is commonly called) is the task of translating a text from one language to another. Think of Google Translate or the Translate app by Apple. The most popular metric to score the quality of machine translation is known as [BLEU](https://en.wikipedia.org/wiki/BLEU). You can also find [many datasets](https://paperswithcode.com/task/machine-translation) for this task on Papers with Code.

To reiterate: this list is by no means exhaustive. These are just some of the frequent use cases for NLP today; other use cases include voicebots, chatbots, speech recognition, entity linking (which we explored in Chapter 1), and more. Nevertheless, this should give you a flavor of how NLP is being used in applications today.
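
To give a feel for how a metric such as BLEU scores a translation, here is a small sketch of clipped n-gram precision, which is one ingredient of BLEU rather than the full metric. The function name `clipped_ngram_precision` and the example sentences are assumptions for illustration; full BLEU combines these precisions over several n-gram sizes and applies a brevity penalty.

```python
from collections import Counter


def clipped_ngram_precision(candidate, reference, n=1):
    """Share of candidate n-grams found in the reference, with counts clipped
    so that a repeated word cannot be credited more often than it appears
    in the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate.split())
    ref_counts = ngrams(reference.split())
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


reference = "the cat is on the mat"
good = clipped_ngram_precision("the cat sat on the mat", reference)
degenerate = clipped_ngram_precision("the the the the the the the", reference)

print(good)        # 5 of 6 candidate words are matched
print(degenerate)  # only 2 of 7 are credited, because "the" appears twice in the reference
```

The clipping step is what stops a degenerate translation from scoring well simply by repeating a common word.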

## Natural Language Dataset

Now that we've covered the common NLP tasks, let’s perform two of these NLP tasks - named entity recognition and text classification - using pretrained language models. Before we do, we need a natural language dataset to work with.

We will use the AG News Classification Dataset in this chapter. AG is a collection of more than 1 million news articles, gathered from more than 2,000 news sources. This dataset is provided by the academic community and is commonly used for research purposes (e.g., to benchmark performance of various NLP models over the years).footnote:[For more on the dataset, please view [the original source](http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html).]

We will use a specific version of this AG News Classification Dataset that was constructed by Xiang Zhang and is available on Kaggle.footnote:[For more on the dataset, please visit the dataset page on [Kaggle](https://www.kaggle.com/amananandrai/ag-news-classification-dataset).] This version of the dataset has better documentation and is readily available as a comma-separated values (CSV) file whereas the original is not.

This Kaggle version of the dataset, which we will refer to as the AG News Topic Classification Dataset (“AG Dataset”) from now on, is a labeled dataset. Each news article has a title and a description and is classified into one of four classes (1-World, 2-Sports, 3-Business, and 4-Sci/Tech). Each class contains 30,000 training samples and 1,900 testing samples, and the entire dataset has 120,000 training samples and 7,600 testing samples.

### Explore AG Dataset

Let’s explore the training dataset in Google Colab.footnote:[To follow along, please visit the Chapter 3 notebook in our [GitHub repo](https://github.com/nlpbook/nlpbook/).] Since we want to use GPUs to train our models, let’s enable GPUs in Google Colab (or locally, if GPUs are available). In your Google Colab session, go to Edit → Notebook settings, select GPU under Hardware Accelerator, and hit Save. Note that this restarts the runtime, so the state of all of your cells is lost.

Next, we will load the data, convert all column names to lowercase and replace spaces with underscores, and add a new feature called “class_name” that maps the numerical labels to class names.

In [None]:
# Import libraries
import pandas as pd
import os

# Get current working directory
cwd = os.getcwd()

# Import AG Dataset (read_csv already returns a DataFrame)
data = pd.read_csv(os.path.join(cwd, "data", "ag_dataset", "train.csv"))
data.columns = data.columns.str.replace(" ","_")
data.columns = data.columns.str.lower()
data["class_name"] = data["class_index"].map({1:"World", 2:"Sports", 
                                              3:"Business", 4:"Sci_Tech"})

Let's preview the data now.

In [None]:
# View data
data

As shown in the cell output, the training dataset has 120,000 observations and four features, as expected. The 4 features are class_index, title, description, and class_name.

Here are the number of observations per class (30,000 each, as expected).

In [None]:
# Count observations by class
data.class_name.value_counts()


Next, let’s view the titles and descriptions of the first ten news articles to get a better sense of the data. 

In [None]:
# View titles
for i in range(10):
    print("Title of Article",i)
    print(data.loc[i,"title"])
    print("\n")

In [None]:
# View descriptions
for i in range(10):
    print("Description of Article",i)
    print(data.loc[i,"description"])
    print("\n")

Based on these titles and descriptions, you should now have a better feel for the data, including the somewhat noisy text in the descriptions (e.g., the description of article 8).

Let’s pre-process the text some more to remove some of the noise in the data. This will remove or replace tokens that are superfluous (such as double spaces) and that make the text more difficult for humans to read.

In [None]:
# Clean up text
cols = ["title","description"]
data[cols] = data[cols].applymap(lambda x: x.replace("\\"," "))
data[cols] = data[cols].applymap(lambda x: x.replace("#36;","$"))
data[cols] = data[cols].applymap(lambda x: x.replace("  "," "))
data[cols] = data[cols].applymap(lambda x: x.strip())
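
To see what these replacements do to a single string, here is a small stand-alone sketch. The helper name `clean_text` and the sample sentence are assumptions for illustration. The one deliberate difference from the cell above is the use of a regular expression to collapse any run of spaces, since replacing double spaces once leaves two spaces behind wherever there were three.

```python
import re


def clean_text(text):
    """Apply the same clean-up steps as the cell above to one string."""
    text = text.replace("\\", " ")      # stray backslashes become spaces
    text = text.replace("#36;", "$")    # leftover character code for a dollar sign
    text = re.sub(r" {2,}", " ", text)  # collapse any run of spaces, not just pairs
    return text.strip()


# Made-up sample containing each kind of noise
sample = "Oil surged past #36;46 a barrel,\\offsetting   gains.  "
cleaned = clean_text(sample)
print(cleaned)
```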

In [None]:
# Write data to CSV
data.to_csv(cwd+'/data/ag_dataset/prepared/train_prepared.csv', index=False)

Great! This is the dataset we will work with. Now, let’s proceed with our first NLP application, named entity recognition.

## NLP Task #1 - Named Entity Recognition

In Chapter 1, we briefly explored named entity recognition (NER), which parses notable entities in natural language and labels them with their appropriate class label such as “Person” or “Location.” Named entity recognition is a form of text classification performed at the token level rather than on whole documents. NER models use the context of tokens around the token of interest to predict the entity label of the token of interest. Once the entities are labeled correctly, we can use the extracted information to perform information retrieval (search documents based on people or places we care about), create structured data from unstructured documents (e.g., parse key binding legal terms from legal documents at scale), and more. Think of NER as adding rich metadata to every document, which then allows us to perform rich analysis downstream.

### Perform Inference using Original spaCy Model

Let’s first use a pretrained language model from spaCy to perform named entity recognition. spaCy offers four different pretrained models for NER: small, medium, large, and transformer-based. All four are trained on written text in the form of blogs, news, and comments but differ in size. The larger the model and the more data it has trained on, the better the performance, generally speaking. In Chapter 1, we opted for the small model to perform the basic NLP tasks. Now, we will opt for the transformer-based model, spaCy's best model.

> Note: We will install spaCy on GPU by specifying `spacy[cuda110]`. You can specify other CUDA versions, too. For more, please visit the [spaCy documentation](https://spacy.io/usage). If you do not wish to use GPUs, please install spaCy using `pip install -U spacy` (without the cuda reference). If you run into issues, send us an email at authors@appliednlpbook.com.

If you haven't installed spaCy already, these commands will get you everything you need. If you're running them in a notebook, prefix each line with a `!` character, as we've done before.

```bash
pip install -U spacy[cuda110,transformers,lookups]==3.0.3
pip install -U spacy-lookups-data==1.0.0
pip install cupy-cuda110==8.5.0
python -m spacy download en_core_web_trf
```

> Warning: You may need to restart your runtime after installing spaCy and downloading the pretrained language model before you can successfully import the model in the next step.

In [None]:
# Import spacy and load language model
# To use GPU, uncomment the GPU-related line below

import spacy
# print("GPU:", spacy.require_gpu())  # prints "GPU: True", or raises an error if no GPU is found
nlp = spacy.load("en_core_web_trf")

> Note: If spaCy on GPU is successfully installed and activated, you will see "GPU: True". If you do not, please troubleshoot your GPU installation or revert to CPU.

Now that we’ve installed spaCy and loaded the transformer-based model, let’s print the metadata of the model, which highlights the underlying components of the model and the associated accuracy metrics.

In [None]:
# View metadata of the model
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(nlp.meta)

Based on the metadata (which we will not print here, given the volume of text involved), we can see that the model has an NER component, which supports various entity types including the following: CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, and WORK_OF_ART.

Let’s focus on three of the more common entity types: ORG (short for organization), PERSON, and GPE (i.e., geopolitical entities such as countries, cities, and states). Let’s review the accuracy metrics for these three. F is the F1 score, P is the precision, and R is the recall.

As a refresher, precision is the number of true positives divided by the total number of positive predictions (true positives plus false positives). Recall is the number of true positives divided by the total number of actual positives (true positives plus false negatives). F1 is a blended metric and is calculated as 2 x (Precision x Recall) / (Precision + Recall). The higher the F1, precision, and recall, the better.

```
'PERSON': {'f': 0.9546191248, 'p': 0.9481648422000001, 'r': 0.9611618799}

'ORG': {'f': 0.9012772751, 'p': 0.9046474359000001, 'r': 0.8979321315000001}

'GPE': {   'f': 0.9467271182, 'p': 0.9619925137, 'r': 0.9319386332}
```

From these metrics, we can see that the model performs well on all three entity types but is weakest on ORG, for which it has an F1 score of about 0.90.
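
The relationship between precision, recall, and F1 can be checked with a few lines of Python. The helper `precision_recall_f1` and the counts passed to it are hypothetical; the PERSON precision and recall are the values reported in the metadata above, and the last lines recompute the reported F1 from them.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from raw prediction counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 entities tagged correctly, 10 spurious tags, 30 missed
p, r, f1 = precision_recall_f1(90, 10, 30)
print(p, r, f1)

# Recompute the PERSON F1 from the precision and recall reported by the model
person_p, person_r = 0.9481648422, 0.9611618799
person_f1 = 2 * person_p * person_r / (person_p + person_r)
print(round(person_f1, 4))  # 0.9546, matching the reported 'f' value
```

Because F1 is a harmonic mean, it always sits between precision and recall and is pulled toward the lower of the two.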

Now that we have loaded the spaCy model and reviewed some of its metadata, let’s apply the spaCy model to our AG News data and generate the results of named entity recognition.

In [None]:
# Print NER results for descriptions of the first nine articles
for i in range(9):
    print("Article",i)
    print(data.loc[i,"description"])
    print("Text Start End Label")
    doc = nlp(data.loc[i,"description"])
    # doc.ents holds entity spans (which may cover several tokens)
    for ent in doc.ents:
        print(ent.text, ent.start_char,
              ent.end_char, ent.label_)
    print("\n")

Here are the NER labels for the descriptions of the first nine articles, including the start and end positions of every tagged entity. Let's review the performance of the NER model.

- Article 0: Reuters - Short-sellers, Wall Street's dwindling band of ultra-cynics, are seeing green again.
- Text Start End Label
- Reuters 0 7 ORG

Great result.

- Article 1: Reuters - Private investment firm Carlyle Group, which has a reputation for making well-timed and occasionally controversial plays in the defense industry, has quietly placed its bets on another part of the market.
- Text Start End Label
- Reuters 0 7 ORG
- Carlyle Group 34 47 ORG

Great result.

- Article 2: Reuters - Soaring crude prices plus worries about the economy and the outlook for earnings are expected to hang over the stock market next week during the depth of the summer doldrums.
- Text Start End Label
- Reuters 0 7 ORG
- next week 134 143 DATE
- summer 168 174 DATE

Great result. Even the date entities were captured correctly.

- Article 3: Reuters - Authorities have halted oil export flows from the main pipeline in southern Iraq after intelligence showed a rebel militia could strike infrastructure, an oil official said on Saturday.
- Text Start End Label
- Reuters 0 7 ORG
- Iraq 86 90 GPE
- Saturday 186 194 DATE

Great result.

- Article 4: AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.
- Text Start End Label
- AFP 0 3 ORG
- barely three months 103 122 DATE
- US 134 136 GPE

Great result.

- Article 5: Reuters - Stocks ended slightly higher on Friday but stayed near lows for the year as oil prices surged past $46 a barrel, offsetting a positive outlook from computer maker Dell Inc. (DELL.O)
- Text Start End Label
- Reuters 0 7 ORG
- Friday 42 48 DATE
- the year 74 82 DATE
- 46 110 112 MONEY
- Dell Inc. 173 182 ORG

Great result.

- Article 6: AP - Assets of the nation's retail money market mutual funds fell by $1.17 billion in the latest week to $849.98 trillion, the Investment Company Institute said Thursday.
- Text Start End Label
- 1.17 billion 69 82 MONEY
- the latest week 86 101 DATE
- 849.98 trillion 105 121 MONEY
- the Investment Company Institute 123 155 ORG
- Thursday 161 169 DATE

AP should have been recognized as an organization, but otherwise great result.

- Article 7: USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.
- Text Start End Label
- July 50 54 DATE
- last week 97 106 DATE
- Thursday 128 136 DATE
- midsummer 181 190 DATE

USATODAY.com should have been recognized as an organization, but otherwise great result.

- Article 8: Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of $70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.
- Text Start End Label
- Danny Bazil Riley 49 66 PERSON
- annual 145 151 DATE
- 70,000 168 174 MONEY
- 32 315 317 DATE
- Riley 380 385 PERSON

Forbes.com is an organization, and 32 is not a date.

All in all, the NER results from the pretrained spaCy model are excellent. This highlights why you should leverage pretrained models, where possible, for your work.
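
The Start and End values in the listings above are character offsets into the description, so slicing the text with them returns the entity itself. The sketch below checks this for Article 3 using the values from the listing; the helper name `check_offsets` is an assumption for illustration.

```python
# Description and entity tuples copied from the Article 3 listing above
description = ("Reuters - Authorities have halted oil export flows from the "
               "main pipeline in southern Iraq after intelligence showed a "
               "rebel militia could strike infrastructure, an oil official "
               "said on Saturday.")

entities = [
    ("Reuters", 0, 7, "ORG"),
    ("Iraq", 86, 90, "GPE"),
    ("Saturday", 186, 194, "DATE"),
]


def check_offsets(text, ents):
    """Return the entities whose (start, end) offsets do not slice back to their text."""
    return [ent for ent in ents if text[ent[1]:ent[2]] != ent[0]]


mismatches = check_offsets(description, entities)

# Group the recovered spans by label, using only the offsets
spans_by_label = {}
for ent_text, start, end, label in entities:
    spans_by_label.setdefault(label, []).append(description[start:end])

print(mismatches)      # an empty list means every offset lines up
print(spans_by_label)
```

Storing entities as offsets rather than as strings keeps them unambiguous when the same word appears more than once in a text.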

### Custom NER

However, sometimes pretrained models are insufficient for the task at hand. This could be for several reasons. First, the corpus to which we want to apply a pretrained model may be materially different from the corpus on which the model was trained. For example, the transformer-based spaCy model we just used was trained on blogs, news, and comments on the web. If our corpus is materially different (e.g., a very technical corpus such as legal, finance, or health data), we may want to annotate a portion of our corpus and fine-tune the transformer-based spaCy model. By fine-tuning the model, the model will perform better on our specific corpus.

Second, the tasks that the transformer-based spaCy model was trained to perform may differ from the task we wish to perform. For example, the spaCy named entity recognition does not support stock tickers (TICKER) as an entity type. If we wish to add this TICKER entity type, we would have to annotate tickers in our data and fine-tune the transformer-based spaCy model.
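
To get a rough sense of what a TICKER entity looks like in this dataset, the sketch below uses a regular expression to find candidates such as the "(DELL.O)" that appears in Article 5. This is not the approach taken in this chapter, which annotates data and fine-tunes a model. The pattern, the helper name `find_ticker_candidates`, and the assumption that tickers appear in parentheses with a short exchange suffix are all illustrative guesses based on that single example.

```python
import re

# Assumed pattern: 1-5 capital letters, a dot, then a 1-2 letter exchange code,
# all inside parentheses, e.g. "(DELL.O)". Real tickers vary more than this.
TICKER_PATTERN = re.compile(r"\(([A-Z]{1,5}\.[A-Z]{1,2})\)")


def find_ticker_candidates(text):
    """Return (text, start_char, end_char, label) tuples for ticker-like spans."""
    return [(m.group(1), m.start(1), m.end(1), "TICKER")
            for m in TICKER_PATTERN.finditer(text)]


# Tail end of the Article 5 description
sample = "a positive outlook from computer maker Dell Inc. (DELL.O)"
candidates = find_ticker_candidates(sample)
print(candidates)
```

A rule like this misses tickers written in any other style, which is one reason annotated examples and a fine-tuned model are the more robust route.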

To demonstrate how transfer learning and fine-tuning a model work, let’s annotate a small portion of our data for the three core entity types (ORG, PERSON, and GPE) and add a new entity type (TICKER).

We will use an annotation platform called Prodigy to annotate our data. Prodigy, like spaCy, is the product of the software company [Explosion](https://explosion.ai/). Prodigy allows us to load our corpus into a beautiful browser-based UI to label our data however we wish. These labels then become available for us to fine-tune our spaCy model. Unfortunately, Prodigy is not available for free, but we do highly recommend it for purchase.

In the next section, we will install and use Prodigy to annotate a small portion of our AG News dataset. Then, we will use these annotations to fine-tune our spaCy model from earlier. If you do not wish to purchase a Prodigy license, please feel free to skip the next section.

### Annotate via Prodigy - NER

After purchasing a license for Prodigy, you will be able to download a Python .whl file (also known as a wheel). Unfortunately, this wheel cannot be installed on Google Colab, so we will need to install it locally on our own machine.

Before installing Prodigy, we recommend you create and activate a virtual environment on your local machine. If you have the Anaconda distribution of Python installed, you can create and activate a new virtual environment using the following commands on the command line. Even if you have set up your local environment using the README on our GitHub repo, you should create a separate virtual environment solely for Prodigy to avoid any conflicts with our main applied_nlp Conda environment.

> Warning: It is generally preferable to create new virtual environments for every machine learning project you have. Having a separate environment for each project allows you to install the relevant libraries for your current project without having to uninstall libraries that you may need for other projects but that can cause code to fail for your current project. Think of a virtual environment as a blank new canvas (i.e., new set of libraries) for you to do your work without having to worry about how changes to the current canvas conflict with the canvases you have for other projects.

```bash
$ conda create -n prodigy anaconda python=3.8
$ conda activate prodigy
```

Now, navigate to the directory with the Prodigy wheel and install the package. You may need to specify the wheel by its full file name if this doesn’t work.

```bash
$ pip install prodigy*.whl
```

You will also need to install spaCy in this virtual environment and download the en\_core\_web\_lg model if you haven't already.

> Warning: As of March 2021, Prodigy does not support spaCy 3.x (hence no transformer-based pipelines). We expect Prodigy to introduce support for spaCy 3.x in the near future, but, for now, we will have to work with spaCy 2.x and the en_core_web_lg model.

```bash
$ pip install -U spacy[cuda110]==2.3.5
$ pip install -U spacy-lookups-data==1.0.0
$ pip install cupy-cuda110==8.5.0
$ python -m spacy download en_core_web_lg
```

Now, let’s prepare a file to load into Prodigy. For NER, we need a CSV of text snippets with a column name of “text.” We will use the descriptions from the AG News dataset as the text snippets and annotate these in Prodigy.

In [None]:
# Prepare text for annotation in Prodigy
train_prodigy_ner = data.copy()
train_prodigy_ner = train_prodigy_ner.description
train_prodigy_ner.rename("text",inplace=True)
train_prodigy_ner.to_csv(cwd +
                         "/data/ag_dataset/ner/raw/train_prodigy_ner.csv",
                         index=False)
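
Before loading the file into Prodigy, it can help to confirm that the CSV really has a single `text` header. The following is a minimal sketch using only the standard library; the in-memory sample and the `check_text_column` helper are hypothetical stand-ins for `train_prodigy_ner.csv`, not part of the original notebook.

```python
import csv
import io

# Hypothetical two-row sample standing in for train_prodigy_ner.csv
sample_csv = io.StringIO(
    "text\n"
    "\"Apple unveiled a new iPod on Tuesday, the company said.\"\n"
    "Oil prices rose again in New York trading.\n"
)


def check_text_column(handle):
    """Return the number of non-empty rows under a single 'text' header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["text"]:
        raise ValueError(
            f"expected a single 'text' column, got {reader.fieldnames}"
        )
    return sum(1 for row in reader if row["text"].strip())


n_rows = check_text_column(sample_csv)
print(n_rows)
```

To check the real file, pass an open file handle (for example, `open(path, newline="")`) instead of the in-memory sample.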

We can now load the data into Prodigy and begin annotating the data. We will annotate the data for the three main entities we care about (ORG, PERSON, and GPE) and a fourth new entity (TICKER). 

To perform this annotation, we will use the Prodigy recipe called *ner.manual* (Figure 3-2). This recipe allows us to mark entity spans in a text by highlighting them and selecting the respective labels.footnote:[For more on these Prodigy recipes, visit the [Prodigy website](https://prodi.gy/docs/recipes#training).]

![ner.manual Recipe](images/hulp_0302.png)

In the command line, we need to specify the name of the recipe (ner.manual), the name of the dataset to which we want to save the annotations (e.g., ag_data_ner_ticker), a spaCy model (e.g., en_core_web_lg or blank:en if we want to start with a blank model), the text source (in our case, the path to train_prodigy_ner.csv), and the entity labels we want available in the Prodigy UI to annotate the text.

```bash
$ python -m prodigy ner.manual <dataset> <spacy_model> <source> \
 --label ORG,PERSON,GPE,TICKER
```
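
If you prefer to launch the recipe from Python rather than typing it out, the same invocation can be assembled as an argument list. This is only a sketch: `build_ner_manual_command` is a hypothetical helper, and the source path assumes the CSV sits in the working directory.

```python
import shlex


def build_ner_manual_command(dataset, spacy_model, source, labels):
    """Assemble the ner.manual invocation as an argument list."""
    return [
        "python", "-m", "prodigy", "ner.manual",
        dataset, spacy_model, source,
        "--label", ",".join(labels),
    ]


cmd = build_ner_manual_command(
    dataset="ag_data_ner_ticker",
    spacy_model="blank:en",
    source="train_prodigy_ner.csv",  # assumed location of the CSV
    labels=["ORG", "PERSON", "GPE", "TICKER"],
)

# Print the command as it would be typed in a shell
print(shlex.join(cmd))

# To actually start the server (requires a Prodigy install):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces. Running the command from a terminal, as shown above, works just as well.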

If successful, you will see this message in the command line.

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Go ahead and copy the URL into your web browser and you should see an annotation UI such as the one shown in Figure 3-3.

![Prodigy NER Annotation UI](images/hulp_0303.png)

We can now highlight spans and label the data with the correct entities, as seen in Figure 3-4. Click on the big green checkmark box to proceed to the next example (or press the “a” key on your keyboard). If you would like to skip an example because you are not sure of the answer, press “space” on your keyboard.

![Prodigy NER Annotation UI - Annotating First Example](images/hulp_0304.png)
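
Under the hood, each highlighted span is stored as a character range plus a label. The sketch below shows roughly what one annotated example looks like as a Python dictionary; the sentence and the `make_span` helper are made up for illustration, and real Prodigy records carry additional fields.

```python
def make_span(text, phrase, label):
    """Build a span dict for the first occurrence of phrase in text."""
    start = text.index(phrase)
    # "end" is exclusive, so text[start:end] recovers the phrase
    return {"start": start, "end": start + len(phrase), "label": label}


# Hypothetical sentence, not taken from the AG News dataset
text = "Apple shares (AAPL) rose after Tim Cook spoke in Cupertino."

annotation = {
    "text": text,
    "spans": [
        make_span(text, "Apple", "ORG"),
        make_span(text, "AAPL", "TICKER"),
        make_span(text, "Tim Cook", "PERSON"),
        make_span(text, "Cupertino", "GPE"),
    ],
}

for span in annotation["spans"]:
    print(span["label"], text[span["start"]:span["end"]])
```

Because spans are character offsets, any edit to the underlying text after annotation invalidates them, which is one reason to fix the text file before annotating.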

Let’s annotate a few hundred of these and then save them by clicking on the floppy disk sign next to the word “prodigy” on the upper left hand corner of the UI. A few hundred annotations should be good enough for a decent fine-tuned model, although, as always, the more annotations, the better the model’s performance will be.

Once the annotations are ready, we can output the NER annotations in spaCy’s JSON format using the *data-to-spacy* Prodigy recipe (Figure 3-5).

![data-to-spacy Prodigy Recipe](images/hulp_0305.png)

For this recipe, we need to specify the output path (to train the model), an evaluation output path (to evaluate the model), the language (“en” in our case), and the NER dataset using the `--ner` flag.

```bash
$ python -m prodigy data-to-spacy <output> <eval_output> --lang en \
--ner ag_data_ner_ticker
```

This command outputs the annotations in JSON format, but, as of spaCy v3.0 (released in January 2021), spaCy's main data format is a binary format. Before we can train using spaCy, we need to convert the JSON format to a binary format. The spaCy `convert` command handles this, and we will use it now.

> Note: We convert to a binary format because, once we finish exporting from Prodigy, we will use the annotations to fine-tune the transformer-based model in spaCy 3.x.

![Convert spaCy Recipe](images/hulp_0305b.png)

```bash
$ python -m spacy convert <path-to-json> <path-for-binary-output>
```

Great! We are now ready to fine-tune our spaCy model with this annotated data.

### Train Custom NER Model using SpaCy

If you skipped the Prodigy section just now, do not worry. We have generated the train and eval annotations and made them available to you for this next section.

> Tip: At this point, please deactivate the Conda environment called "prodigy" and activate the main Conda environment called "applied_nlp" if you are developing the SpaCy model on your local machine.

We will train two separate NER models. First, we will train an NER model using transfer learning. To perform transfer learning, we will use a transformer model called [RoBERTa](https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/), a large, pretrained language model released by Facebook in 2019. Second, we will train an NER model without a transformer model and GPUs and rely just on a CPU-based training pipeline. This will help us compare the transformer-based GPU-enabled performance vs. the standard CPU-based performance.

Let’s go ahead and train the transformer-based model first. We will use the **train** command in spaCy, as shown in Figure 3-6.footnote:[For more on the train command, please visit the [official spaCy documentation](https://spacy.io/api/cli#train).]

![SpaCy Train Command](images/hulp_0306.png)

For this command, we will need to specify the config path, the output path, and the `--gpu-id` flag to enable training on a GPU. The requirement for a training configuration path is new in spaCy v3.0. The training config is the file that holds all the settings and hyperparameters for model development.

```bash
    $ python -m spacy train <config_path> --output <output_path> \
    --gpu-id 0
```

Let's generate this config file for training first. Surprise! There is a spaCy command for creating config files from scratch. For this command, we need to specify the language (en), the pipeline component we want to train (ner), the optimize setting ("efficiency" for faster inference and a smaller model, or "accuracy" for higher accuracy and a slower, larger model), whether GPUs will be used, and whether the command should overwrite the output file if one exists.

![SpaCy Init Config](images/hulp_0306b.png)

```bash
    $ python -m spacy init config <config_path> --lang <lang> \
    --pipeline <pipeline> --optimize <efficiency|accuracy> --gpu --force
```

Another option is to use the [training configuration UI](https://spacy.io/usage/training#config) on spaCy's official website to generate a best-practices version of the config file for NER. This is the best place to start for most projects because spaCy updates this configuration widget with the best practices it has discovered through its own model experimentation. This is what we will use. We will start with a blank transformer-based template (with GPU enabled).

We need to auto-fill this base NER template from spaCy using another spaCy command called init fill-config, shown in Figure 3-6c.

![SpaCy Init Fill-Config](images/hulp_0306c.png)

This command takes two simple parameters: an input path to the config file (which we downloaded from spaCy's website) and an output path. The command generates the final config file by auto-populating the remaining components of the base template generated from spaCy's widget.

```bash
    $ python -m spacy init fill-config <config_path_original> \
    <config_path_new>
```

Let's run this command now and then proceed to training. We will train for 30 epochs.

In [None]:
# Auto-fill base template
ner_path = "/data/ag_dataset/ner/"

# the downloaded file from spaCy
config_file_path_input = cwd + ner_path + "config_spacy_template_gpu_blank.cfg"

# the output file we will use for training
config_file_path_output = cwd + ner_path + "config_final_gpu_blank.cfg"

```bash
python -m spacy init fill-config "$config_file_path_input" \
"$config_file_path_output"
```

In [None]:
# Train SpaCy model on NER annotations
output_path = cwd + "/models/ag_dataset/ner/ner-gpu-blank"
train_path = cwd + "/data/ag_dataset/ner/annotations/binary/train"
dev_path = cwd + "/data/ag_dataset/ner/annotations/binary/eval"

```bash
python -m spacy train "$config_file_path_output" \
--output "$output_path" --paths.train "$train_path" \
--paths.dev "$dev_path" --training.max_epochs 30 --gpu-id 0 --verbose
```
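
The `--paths.train`, `--paths.dev`, and `--training.max_epochs` arguments in this command override individual settings in the config file using dot notation: the part before the last dot names the config section, and the last part names the key. The sketch below mimics that idea on a plain nested dictionary. It is not spaCy's own implementation, and both `apply_overrides` and the sample values are illustrative assumptions.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        # "training.max_epochs" -> sections ["training"], key "max_epochs"
        *sections, key = dotted_key.split(".")
        node = result
        for section in sections:
            node = node.setdefault(section, {})
        node[key] = value
    return result


# Illustrative stand-in for the relevant parts of the .cfg file
base_config = {
    "paths": {"train": None, "dev": None},
    "training": {"max_epochs": 0},
}

final_config = apply_overrides(
    base_config,
    {
        "paths.train": "annotations/binary/train",
        "paths.dev": "annotations/binary/eval",
        "training.max_epochs": 30,
    },
)
print(final_config)
```

Overriding on the command line keeps one config file reusable across runs that differ only in data paths or training length.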

Figure 3-7 displays the results. As you can see, the model achieves an F1 score above 95 within 30 epochs.

![SpaCy NER - Transformer GPU-Based](images/hulp_0307.png)
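
As a reminder of what the F1 score measures for NER, here is a small worked example. It assumes exact-match scoring, in which a predicted entity counts as correct only if its start, end, and label all match a gold entity; the spans below are made up for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold entities by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (start, end, label) spans for one sentence
gold = {(0, 5, "ORG"), (14, 18, "TICKER"), (31, 39, "PERSON"), (49, 58, "GPE")}
# The model finds every span but labels the ticker as an ORG
predicted = {(0, 5, "ORG"), (14, 18, "ORG"), (31, 39, "PERSON"), (49, 58, "GPE")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Three of the four predictions are correct and three of the four gold entities are found, so precision, recall, and F1 all equal 0.75 here. The scores quoted in this chapter are on a 0 to 100 scale.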

Let's now train the second model, this time without transfer learning from a transformer-based model and no GPUs.

We will generate a config file first, and then we will train the model using this config file for 30 epochs. Notice that the GPU tag is missing, which is what we want. The absence of the GPU tag creates a config file that forgoes the transformer-based model. This second model does not use transfer learning from the RoBERTa model, whereas the first model we developed did.

In [None]:
# Generate config file
# the output file we will use for training
ner_path = "/data/ag_dataset/ner/"
config_file_path_output = cwd + ner_path + "config_final_no_gpu_blank.cfg"

```bash
python -m spacy init config "$config_file_path_output" --lang en \
--pipeline ner --optimize efficiency --force
```

In [None]:
# Train SpaCy model on NER annotations
output_path = cwd + "/models/ag_dataset/ner/ner-no-gpu-blank"
train_path = cwd + "/data/ag_dataset/ner/annotations/binary/train"
dev_path = cwd + "/data/ag_dataset/ner/annotations/binary/eval"

```bash
python -m spacy train "$config_file_path_output" \
--output "$output_path" --paths.train "$train_path" \
--paths.dev "$dev_path" --training.max_epochs 30 --verbose
```

The results of the second model (Figure 3-8) are not bad but clearly not as good as the results from the first model. This second model, which does not use transfer learning from a large, pretrained language model, achieves an F1 score near 90, well shy of the 95 F1 score of the transformer-based model.

![SpaCy NER - No Transformer CPU-Based](images/hulp_0308.png)

Since the transformer-based model performs better than the non-transformer-based model (as expected), let’s compare this fine-tuned transformer-based model (fine-tuned on the AG News Dataset) with the original transformer-based spaCy version (`en_core_web_trf`).

### Custom NER Model vs. Original NER Model

Comparing the fine-tuned transformer-based model with the original `en_core_web_trf` is not an apples-to-apples comparison since the fine-tuned model supports just four entity types (ORG, PERSON, GPE, and TICKER) while the original `en_core_web_trf` supports many more entity types (but does not support TICKER, which is the new entity type we just annotated for the AG News dataset).

Nevertheless, we can compare the two models on a sample of article descriptions in the AG News dataset and see which model performs better. This will help us determine whether fine-tuning the RoBERTa model improved the NER performance on our dataset compared to the original spaCy model.

Before we compare the results of the two models, let’s load our fine-tuned NER model and view the metadata of the model.

In [None]:
# Load custom NER model
# To use GPU, uncomment the GPU-related code below
import pprint

import spacy

# spacy.require_gpu()
custom_ner_model = spacy.load(
    cwd + "/models/ag_dataset/ner/ner-gpu-blank/model-best"
)

# View metadata of the model
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(custom_ner_model.meta)

The fine-tuned NER model supports just four entity types but the F1 scores are pretty good: 97 for GPE, 93 for ORG, 96 for PERSON, and 98 for TICKER. By comparison, the original spaCy model has F1 scores of 95 for GPE, 90 for ORG, and 95 for PERSON (and no F1 for TICKER, which the original spaCy model does not support). Note that the comparison is not apples-to-apples because the F1 scores were measured on different datasets, but this gives you a sense of relative performance.
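
The per-entity scores can be pulled out of the metadata programmatically. The sketch below assumes the metadata stores them under `performance` and `ents_per_type` as fractions between 0 and 1; the dictionary is a hand-typed stand-in built from the F1 scores quoted above, not actual output, so check the keys against what `pp.pprint` shows for your model.

```python
# Illustrative stand-in for custom_ner_model.meta["performance"];
# the layout is an assumption and the values mirror the text above
performance = {
    "ents_per_type": {
        "GPE": {"f": 0.97},
        "ORG": {"f": 0.93},
        "PERSON": {"f": 0.96},
        "TICKER": {"f": 0.98},
    }
}


def f1_by_label(performance):
    """Map each entity label to its F1 score on a 0 to 100 scale."""
    return {
        label: round(100 * scores["f"])
        for label, scores in sorted(performance["ents_per_type"].items())
    }


scores = f1_by_label(performance)
weakest_label = min(scores, key=scores.get)
print(scores)
print("Weakest label:", weakest_label)
```

Sorting out the weakest label this way is a quick guide to where additional annotations are likely to help most.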

Now, let’s use a built-in spaCy visualizer for NER to compare the two models.

> Note: We use original and base interchangeably.

In [None]:
# Compare NER results on Descriptions: Original/Base vs. Custom
# To use GPU, uncomment the GPU-related code below

from spacy import displacy
import random

#spacy.require_gpu()
base_model = spacy.load("en_core_web_trf")

options = {"ents": ["ORG","PERSON","GPE","TICKER"]}

for j in range(3):
    # randrange excludes len(data), so i is always a valid row index
    i = random.randrange(len(data))
    print("Article",i)
    doc_base = base_model(data.loc[i,"description"])
    doc_custom = custom_ner_model(data.loc[i,"description"])
    print("Base Model NER:")
    displacy.render(doc_base, style="ent", options=options, jupyter=True)
    print("Custom Model NER:")
    displacy.render(doc_custom, style="ent", options=options, jupyter=True)
    print("\n")

As shown in Figure 3-10, the two models have similar NER results (which makes sense since the F1 scores are somewhat similar, albeit a bit higher for the fine-tuned model). For example, for article 55405, both the base model and the fine-tuned model capture "Apple" as an ORG. For article 4145, the base model misses "NewsFactor" and "Cingular Wireless" as ORGs, both of which the fine-tuned model captures. The fine-tuned model misses "NYSE" as an ORG but captures "AWS" as a TICKER. For article 106431, the results are identical.

![Article Examples](images/hulp_0310.png)

While the fine-tuned model does seem to perform better than the base model overall, the fine-tuned model will not always outperform the base model for any given example. To see this, test the code snippet above to compare the results of the base model against the results of the fine-tuned model. You will surely find instances where the base model performs better.
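
Eyeballing the rendered output works for a handful of articles, but a set comparison scales better. The sketch below is one way to diff the entities found by two models; `compare_entities` is a hypothetical helper, and the sample lists are hand-typed to mirror the description of article 4145 above rather than produced by running the models.

```python
def compare_entities(base_entities, custom_entities):
    """Split (text, label) pairs into shared and model-specific sets."""
    base, custom = set(base_entities), set(custom_entities)
    return {
        "both": sorted(base & custom),
        "base_only": sorted(base - custom),
        "custom_only": sorted(custom - base),
    }


# With real documents, build these lists as
# [(ent.text, ent.label_) for ent in doc.ents]
base_entities = [("NYSE", "ORG")]
custom_entities = [
    ("NewsFactor", "ORG"),
    ("Cingular Wireless", "ORG"),
    ("AWS", "TICKER"),
]

diff = compare_entities(base_entities, custom_entities)
for bucket, entities in diff.items():
    print(bucket, entities)
```

Counting the `base_only` and `custom_only` buckets over a larger random sample gives a rough picture of where each model has the edge.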

Congratulations! We fine-tuned the RoBERTa model and added a new entity for stock tickers by annotating a small percentage of the AG News Dataset in Prodigy and training it using spaCy. We then compared the fine-tuned model against the original spaCy model and saw (generally) better performance from the fine-tuned model. We can also confirm that the fine-tuned model is now tagging stock tickers, as expected.

This is a big accomplishment for two reasons. First, our work shows that fine-tuning a large, pretrained language model even on a small set of annotations (just a few hundred) improves performance. You don't need to train a model from scratch; you can leverage some prior learning from the pretrained language model and use that as a launch pad to improve performance on your specific task on your specific corpus. Transfer learning is a huge benefit to practitioners, dramatically reducing the time for any new model build. Second, we showed just how easy it is to develop your own NER model for your custom entity types (e.g., stock ticker). New model development is pretty painless; you can get up and running with new custom models fast.

In the next section, let's perform our second NLP task - text classification - applying some of the same techniques we used for our custom NER model. These techniques include annotating data and fine-tuning our large, pretrained language model to achieve good performance on this text classification task.

## NLP Task #2 - Text Classification

Now that we’ve finished performing named entity recognition, let’s turn to a second NLP task - text classification. Text classification is a very common application of NLP; applications include news apps that classify news articles into topic-based categories, the spam / not spam feature of email apps, and the real news / fake news classification model on Facebook and other social platforms.

As a recap, all of the articles in the AG News train dataset are already classified into one of four classes: Business, Sci\_Tech, Sports, and World. We can skip annotations altogether and use these labels to train a text classification model using spaCy. But, in the real world, you will rarely have preannotated datasets like this. Instead, you will typically have to go through the exercise of annotating your data from scratch. 

To show you how easy it is to annotate data and generate a text classification model by fine-tuning a large, pretrained language model, let’s annotate several examples from scratch using Prodigy and then train a text classification model from the labels we generate.

As before, you can skip this next section if you’d rather not purchase a license for Prodigy (or if you would rather annotate the data in another annotation platform such as [Labelbox](https://labelbox.com/)). We will export our annotations from Prodigy and make them available for you to train the text classification model, so if you decide to skip the Prodigy annotations, do not worry.

### Annotate via Prodigy - Text Classification

Like we did for NER, let’s prepare a file to load into Prodigy for text classification. As before, we need a CSV of text snippets with a column name of “text.” We will use the titles and descriptions (not just descriptions as we did for NER) from the AG News Dataset as the text snippets and annotate these in Prodigy.

Let’s split the train dataset into two: one to use in Prodigy and one to use to evaluate the text classification model. We can call these “textcat_train” and “textcat_eval” sets, respectively. To perform this split, we will use the train_test_split function in Scikit-Learn.footnote:[For more on the train_test_split function, visit the [official Scikit-Learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).]

In [None]:
# To train and evaluate text classification models in Prodigy
from sklearn.model_selection import train_test_split
 
# Prepare for Text Classification
textcat = data.copy()
textcat["text"] = textcat["title"] + " " + textcat["description"]
textcat["label"] = textcat["class_name"]
textcat.drop(columns=["class_index","title","description","class_name"],
 inplace=True)
textcat_train, textcat_eval = train_test_split(textcat, test_size=0.2,
 random_state=2020, stratify=textcat.label)
 
textcat_train.to_csv(cwd +
 '/data/ag_dataset/textcat/raw/train_prodigy_textcat_train_with_labels.csv',
 index=False)

textcat_eval.to_csv(cwd +
 '/data/ag_dataset/textcat/raw/train_prodigy_textcat_eval.csv',
 index=False)
 
textcat_train = textcat_train.text
textcat_train.to_csv(cwd +
 '/data/ag_dataset/textcat/raw/train_prodigy_textcat_train_without_labels.csv',
 index=False)
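
The `stratify` argument keeps the class proportions the same in both halves of the split. To make that concrete, here is a toy stratified split written with the standard library only. It is a simplified sketch of the idea, not scikit-learn's algorithm, and the 100 synthetic rows are placeholders for the real articles.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, test_size, seed):
    """Split (text, label) rows so each label keeps its share in both parts."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic, perfectly balanced data: 25 rows for each of the four classes
labels = ["Business", "Sci_Tech", "Sports", "World"]
rows = [(f"article {i}", labels[i % 4]) for i in range(100)]

train_rows, eval_rows = stratified_split(rows, test_size=0.2, seed=2020)
print(len(train_rows), len(eval_rows))
print(Counter(label for _, label in eval_rows))
```

With `test_size=0.2`, each class contributes one fifth of its rows to the eval set, which is why the real split yields 96,000 training and 24,000 evaluation examples with the same class balance.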

We can now load the data into Prodigy and begin annotating the data. We will annotate the data using four mutually exclusive labels: Business, Sci_Tech, Sports, and World. To do this, we will use the Prodigy recipe called *textcat.manual* (Figure 3-13). This recipe allows us to manually annotate categories that apply to the text. We set the labels using the --label flag, and we use the --exclusive flag to designate the labels as mutually exclusive; in other words, an example may have only one correct class rather than multiple classes/labels.footnote:[For more on these Prodigy recipes, visit the [Prodigy website](https://prodi.gy/docs/recipes#training).]

![textcat.manual Prodigy Recipe](images/hulp_0313.png)

In the command line, we need to specify the name of the recipe (textcat.manual), the name of the dataset to which we want to save the annotations (e.g., `ag_data_textcat`), the text source (in our case, the path to `train_prodigy_textcat_train_without_labels.csv`), the labels we want available in the Prodigy UI to annotate the text, and the --exclusive flag.

```bash
    $ python -m prodigy textcat.manual <dataset> <source> \
     --label Business,Sci_Tech,Sports,World --exclusive
```

If successful, you will see this message in the command line.

```bash
✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!
```

Go ahead and copy the URL into your web browser and you should see an annotation UI as shown in Figure 3-14. Make sure you see “choice” next to the “VIEW ID” in the upper left hand corner; if not, you likely forgot to set the --exclusive flag.

![Prodigy Textcat Annotation UI](images/hulp_0314.png)

We can now categorize each text into one of four categories. Click on the big green checkmark box to proceed to the next example (or press the “a” key on your keyboard). If you would like to skip an example because you are not sure of the answer, press “space” on your keyboard.

Let’s annotate a few hundred of these and then save them by clicking on the floppy disk sign next to the word “prodigy” on the upper left hand corner of the UI. A few hundred annotations should be good enough for a decent text classification model, although, as always, the more annotations, the better the model’s performance will be.

Once the annotations are ready, we can output the annotations in spaCy’s JSON format using the **data-to-spacy** Prodigy recipe, which we also used earlier (Figure 3-15).

![data-to-spacy Prodigy Recipe](images/hulp_0315.png)

For this recipe, we need to specify the output path (to train the model), the language (“en” in our case), the textcat dataset using the `--textcat` flag, and the `--textcat-exclusive` flag since we want to treat our classes as mutually exclusive. Note that we do NOT need to set an evaluation output path since we already have a labeled `textcat_eval` set, which we generated in the previous section using `train_test_split`.

```bash
    $ python -m prodigy data-to-spacy <output> \
    --lang en --textcat ag_data_textcat --textcat-exclusive
```
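
For text classification, spaCy represents the labels of a document as a dictionary that maps every category to a score. With mutually exclusive classes, exactly one category is set to 1.0 and the rest to 0.0. The sketch below builds and checks such a dictionary; `exclusive_cats` and `is_exclusive` are hypothetical helpers for illustration, not spaCy or Prodigy functions.

```python
LABELS = ["Business", "Sci_Tech", "Sports", "World"]


def exclusive_cats(label, labels=LABELS):
    """Build a category dict with exactly one positive class."""
    if label not in labels:
        raise ValueError(f"unknown label: {label}")
    return {name: 1.0 if name == label else 0.0 for name in labels}


def is_exclusive(cats):
    """Check that one category is 1.0 and all the others are 0.0."""
    values = list(cats.values())
    return values.count(1.0) == 1 and values.count(0.0) == len(values) - 1


cats = exclusive_cats("Sports")
print(cats)
print(is_exclusive(cats))
```

A quick `is_exclusive` pass over exported annotations can catch examples that were accidentally given zero or multiple labels.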

Next, let's convert this training data from JSON format to a binary format for spaCy training.

```bash
$ python -m spacy convert <path-to-json> <path-for-binary-output>
```

In [None]:
# Convert from JSON to binary format for spaCy training 
# Few Labels from Prodigy Annotations
json_path = "/data/ag_dataset/textcat/annotations/jsons/"
bin_path = "/data/ag_dataset/textcat/annotations/binary/"
input_path = cwd + json_path + "train_few_labels"
output_path = cwd + bin_path + "train_few_labels"
!python -m spacy convert "$input_path" "$output_path"

Finally, let’s convert the `textcat_eval` set from a CSV format to a JSON format for use in spaCy. To do this, we first need to load the CSV into Prodigy using the **db-in** recipe (Figure 3-16). For this recipe, we need to designate the dataset name (e.g., `ag_data_textcat_eval`) and the file path (e.g., path to `train_prodigy_textcat_eval.csv`).

![db-in Prodigy Recipe](images/hulp_0316.png)

```bash
    $ python -m prodigy db-in <dataset> <in_file>
```

If successful, you should see the following message.

```bash
✔ Imported 24000 annotations to 'ag_data_textcat_eval' in database SQLite
Found and keeping existing "answer" in 0 examples
```

Now, we can use the data-to-spacy recipe to export into spaCy’s JSON format.

```bash
    $ python -m prodigy data-to-spacy <output> \
     --lang en --textcat ag_data_textcat_eval --textcat-exclusive
```

This is the same as before except that we need to change the `<output>` path and set the `--textcat` flag to `ag_data_textcat_eval`.

Now, let’s also prepare the `textcat_train_with_labels` set because we will want to train a text classification model on the original labels as well in order to compare how well the Prodigy annotated model does versus one that is trained on many more labels.

We will repeat the same steps that we used to prepare the eval set but with the `textcat_train_with_labels` this time. We will call this dataset `ag_data_textcat_train_with_labels`.

```bash
    $ python -m prodigy db-in <dataset> <in_file>
    $ python -m prodigy data-to-spacy <output> \
     --lang en --textcat ag_data_textcat_train_with_labels --textcat-exclusive
```

Now, let's convert both of these JSONs to a binary format for spaCy training.

In [None]:
# Convert from JSON to binary format for spaCy training - Eval Set
input_path = cwd + "/data/ag_dataset/textcat/annotations/jsons/eval"
output_path = cwd + "/data/ag_dataset/textcat/annotations/binary/eval"
!python -m spacy convert "$input_path" "$output_path"

In [None]:
# Convert from JSON to binary format for spaCy training
# Full Set of Labels
json_path = "/data/ag_dataset/textcat/annotations/jsons/"
bin_path = "/data/ag_dataset/textcat/annotations/binary/"
input_path = cwd + json_path + "train_full_labels"
output_path = cwd + bin_path + "train_full_labels"
!python -m spacy convert "$input_path" "$output_path"

Great! We are now ready to train a text classification model with this annotated data.

### Train Text Classification Models using SpaCy

If you skipped the Prodigy section just now, don't worry. We have generated the train annotations and made them available to you.

We will train two separate models. First, we will train a text classification model using the few hundred annotations we generated using Prodigy and evaluate it against the textcat\_eval set we generated earlier. Then, we will train a second text classification model using the full set of labels in the textcat\_train dataset from earlier and evaluate this, too, against the textcat\_eval set.

To get started, let's first generate the config file for training. We will use a transformer-based model (RoBERTa) and perform transfer learning for our text classification models. We will designate this as a multi-label classification problem because we may have different labels for the same text; in other words, for the same text, different annotators may have labeled the data differently because they disagreed. This is a very common problem when annotating data. You will have internal disagreements / differing judgments among annotators. You could review all the disagreements and resolve them before training your model, or you could set up the model as a multi-label classification model, as we have chosen to do.

> Note: We will refer to the Prodigy annotations version as the "few labels" model since we have only ~800 annotations in total. We will call the model trained on the 96,000 annotations the "full labels" model.

In [None]:
# Generate config file
# the output file we will use for training
config_file_path_output = cwd + "/data/ag_dataset/textcat/config_final.cfg" 

```bash
python -m spacy init config "$config_file_path_output" --lang en \
--pipeline textcat_multilabel --optimize efficiency --gpu --force 
```

If successful, you should see a message similar to Figure 3-16b. We have configured this model as a multi-label text classification model, optimized for efficiency, leveraging GPUs, and using the RoBERTa transformer model as its base model.

![SpaCy Textcat Configuration](images/hulp_0316b.png)
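
The practical difference between an exclusive and a multi-label setup shows up in how raw model scores become class scores. In an exclusive classifier, the class scores compete and sum to 1; in a multi-label classifier, each class is scored independently. The sketch below contrasts the two with softmax and sigmoid functions on made-up raw scores. It illustrates the general idea and is not a description of spaCy's internal architecture.

```python
import math


def softmax(logits):
    """Exclusive setting: scores compete and sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Multi-label setting: each class gets an independent score."""
    return 1.0 / (1.0 + math.exp(-x))


labels = ["Business", "Sci_Tech", "Sports", "World"]
logits = [2.0, 1.5, -1.0, -2.0]  # hypothetical raw scores for one article

exclusive_scores = dict(zip(labels, softmax(logits)))
multilabel_scores = dict(zip(labels, (sigmoid(x) for x in logits)))

print({k: round(v, 3) for k, v in exclusive_scores.items()})
print({k: round(v, 3) for k, v in multilabel_scores.items()})
```

In the multi-label output, both Business and Sci_Tech can score well above 0.5 at the same time, which is how this setup accommodates texts that annotators labeled differently.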

Let’s go ahead and train the text classification model using our Prodigy annotations first. As we did before in the NER training process, we will use the train command in spaCy (Figure 3-17).footnote:[For more on the train command, please visit the [official spaCy documentation](https://spacy.io/api/cli#train).]

![SpaCy Train Command](images/hulp_0317.png)

In [None]:
# Train model on Text Classification annotations 
# Few Labels from Prodigy
import spacy
annots_path = "/data/ag_dataset/textcat/annotations/binary/"
output_path = cwd + "/models/ag_dataset/textcat/few_labels"
train_path = cwd + annots_path + "train_few_labels"
dev_path = cwd + annots_path + "eval"

Then launch the training run via a separate command:

```bash
python -m spacy train "$config_file_path_output" \
--output "$output_path" --paths.train "$train_path" \
--paths.dev "$dev_path" --gpu-id 0 --training.max_epochs 30 --verbose
```

Figure 3-18 displays the results of the training process.

![SpaCy Text Classification Model - Prodigy Annotations](images/hulp_0318.png)

As you can see, the model gets to an F1 score of \~83 after 30 epochs. This is still remarkably good performance given that we trained the model on just a few hundred annotations.

Now, let’s train the text classification model using the 96,000 original labels in the textcat\_train set (remember 24,000 examples were set aside for the textcat\_eval set using train\_test\_split). We should see a much higher F1 score since we will be training on a lot more labels now.

The spaCy train command remains the same as before except for the new output and train path.

In [None]:
# Train model on Text Classification annotations
# Full Set of Labels from AG Dataset
import spacy
annots_path = "/data/ag_dataset/textcat/annotations/binary/"
output_path = cwd + "/models/ag_dataset/textcat/full_labels"
train_path = cwd + annots_path + "train_full_labels"
dev_path = cwd + annots_path + "eval"

```bash
python -m spacy train "$config_file_path_output" \
--output "$output_path" --paths.train "$train_path" \
--paths.dev "$dev_path" --gpu-id 0 --training.max_epochs 1 --verbose
```

Figure 3-19 displays the results of the training process.

![SpaCy Text Classification Model - 96k Original Annotations](images/hulp_0319.png)

As you can see, with the original 96k labels, the model gets an F1 score above 94 after just one epoch. This should be no surprise; with more data, the model performance improves dramatically.

Awesome! We have now just finished training our second NLP model.

## Conclusion

In this chapter, we built models to solve two very popular and core NLP tasks - named entity recognition and text classification - using one of the most widely used and commercially relevant NLP libraries in the market today - spaCy. We also annotated our own data from scratch using Prodigy to develop these two models. You should now have a much better feel for how easy it is to get up and running with NLP models. Both of these models are ready to be used in production, and we will explore how to stand them up in a production pipeline in Chapter 11.

We cannot emphasize this enough: when possible, it is best to start with a large, pretrained language model and then fine-tune the model for your specific task on your specific corpus. By leveraging the prior learning of the pretrained model, you will need far fewer labels and considerably less time to achieve really great performance on tasks such as the ones we solved together in this chapter. In a nutshell, this ability to transfer learning from pretrained models to accelerate new model build is what has made NLP such a hot topic of interest in enterprise today.

Now that you have a better feel for state-of-the-art NLP and how to solve some real world NLP tasks, let's go back to the basics and build some of the foundational knowledge you will need to perform NLP well. We will start with preprocessing and tokenization in the next chapter, followed by word embeddings, RNNs, and Transformers. Later in the book, we will return to these models when we discuss productionization of machine learning models.