# Using a custom text classifier

Now that we've built a custom text classifier, let's try to use it!

```python
%pip install -q transformers sentencepiece huggingface_hub scikit-learn
```

```python
from transformers import pipeline
import pandas as pd
import huggingface_hub
import requests
import os.path

pd.options.display.max_colwidth = 500
```

## Load in our dataset

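```python
if not os.path.exists("wapo-app-reviews-huggingface-full.csv"):
    print("Downloading dataset")
    response = requests.get("https://raw.githubusercontent.com/jsoma/nicar23-huggingface/main/data/wapo-app-reviews-huggingface-full.csv")
    # Stop if the download failed, rather than saving an error page as the CSV
    response.raise_for_status()
    # Write the raw bytes so the file's original encoding is preserved
    with open("wapo-app-reviews-huggingface-full.csv", "wb") as f:
        f.write(response.content)
```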

```python
df = pd.read_csv("wapo-app-reviews-huggingface-full.csv")
df.head()
```

## Use our model

We've set our model to private, so we'll need to log in to Hugging Face to be able to use it.

```python
huggingface_hub.login()
```

Once we're logged in, we can use our model just like the one we used in the sentiment analysis notebook!

**You'll need to change the `model=` line to match your model's name.** Mine is `wendys-llc/autotrain-wapo-v3-38832102021`. I recommend grabbing yours with the copy button at the top of your model's web page.
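The pipeline expects just the `user/model` name, not the full web address of the model page. If you end up with a URL on your clipboard, a small helper like the one below can trim it down. `repo_id_from_url` is a hypothetical name of our own, not part of `transformers` or `huggingface_hub`, and it assumes model pages follow the usual `https://huggingface.co/<user>/<model>` layout.

```python
from urllib.parse import urlparse


def repo_id_from_url(url_or_id: str) -> str:
    """Hypothetical helper: turn a pasted Hugging Face model URL into the
    'user/model' name that pipeline(model=...) expects.
    Anything that isn't a URL is assumed to already be a name and passed through."""
    if "://" not in url_or_id:
        return url_or_id.strip()
    # Assumes paths like /<user>/<model> or /<user>/<model>/tree/main
    path_parts = [part for part in urlparse(url_or_id).path.split("/") if part]
    return "/".join(path_parts[:2])


print(repo_id_from_url("https://huggingface.co/wendys-llc/autotrain-wapo-v3-38832102021/tree/main"))
```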

```python
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    # Change this to your own model's name
    model="wendys-llc/autotrain-wapo-v3-38832102021",
    # Use the token saved by huggingface_hub.login() to reach the private model
    token=True)
```

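```python
# truncation=True cuts off any review longer than the model's maximum input length
results = sentiment_pipeline(df.Review.tolist(), truncation=True)
results = pd.DataFrame(results).add_prefix('prediction_')
scored = df.join(results)
```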
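The pipeline hands back one small dictionary per review, holding a `label` and a `score`. To see how the last two lines of that cell line those up with the original rows, here's a sketch in which two invented reviews and invented predictions (`toy_df`, `toy_results`) stand in for `df` and the real model output. The `toy_` names keep it from overwriting your real variables if you run it in the notebook.

```python
import pandas as pd

# Invented stand-in for df: two made-up reviews
toy_df = pd.DataFrame({"Review": ["first review text", "second review text"]})

# Same shape as the text-classification pipeline's output: one dict per input.
# These labels and scores are made up, not real model output.
toy_results = [{"label": "1", "score": 0.93}, {"label": "0", "score": 0.71}]

# Turn the list of dicts into columns named prediction_label / prediction_score
toy_results = pd.DataFrame(toy_results).add_prefix("prediction_")

# join() matches rows by index, so prediction i lands next to review i
toy_scored = toy_df.join(toy_results)
print(toy_scored)
```

Because `join` matches on the index, this relies on `df` keeping the default 0, 1, 2 and so on index that `read_csv` gives it. If you filter or reorder `df` before running the pipeline, call `reset_index(drop=True)` on it first.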

```python
scored.sort_values(by=['prediction_label', 'prediction_score'], ascending=False).head(20)
```

```python
scored.prediction_label.value_counts()
```

## But how did it really do?

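We could boil the results down to metrics like precision, recall, and accuracy, but laying them out in a confusion matrix, a small grid of boxes counting each combination of true label and predicted label, is often easier to make sense of than those abstract numbers.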
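Those metrics all come from the same four counts the boxes hold: true positives, false positives, false negatives, and true negatives. As a sketch, using a handful of invented labels rather than the real dataset, here's how precision, recall, and accuracy fall out of those counts, cross-checked against scikit-learn's own functions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Invented labels for illustration only: '1' = creepy, '0' = not creepy
toy_true = ['1', '0', '1', '1', '0', '0', '1', '0']
toy_pred = ['1', '0', '0', '1', '1', '1', '1', '0']

# With labels=['0', '1'], ravel() on the 2x2 matrix yields tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=['0', '1']).ravel()

precision = tp / (tp + fp)                  # of the reviews flagged creepy, the share that really were
recall = tp / (tp + fn)                     # of the truly creepy reviews, the share we flagged
accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all reviews labeled correctly

# scikit-learn computes the same numbers; pos_label is needed because the labels are strings
print("precision", precision, precision_score(toy_true, toy_pred, pos_label='1'))
print("recall   ", recall, recall_score(toy_true, toy_pred, pos_label='1'))
print("accuracy ", accuracy, accuracy_score(toy_true, toy_pred))
```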

```python
from sklearn.metrics import confusion_matrix

# The predictions are the strings '0' and '1', so we
# convert the integer 'sexual' column to match
y_true = scored.sexual.replace({0: '0', 1: '1'})
y_pred = scored.prediction_label
# Pin the label order so row/column 0 is '0' (not creepy) and 1 is '1' (creepy)
matrix = confusion_matrix(y_true, y_pred, labels=['0', '1'])

label_names = pd.Series(['not creepy', 'creepy'])
pd.DataFrame(matrix,
             columns='Predicted ' + label_names,
             index='Is ' + label_names)
```
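The boxes tell you how many reviews were misclassified, but reading the misclassified reviews themselves is where you learn the most about what the model is getting wrong. Here's a minimal sketch that pulls out the false positives and false negatives and sorts them so the model's most confident mistakes come first. It uses a tiny invented `toy_scored` table with the same column names as `scored`, and it assumes `sexual` holds the integers 0 and 1.

```python
import pandas as pd

# Invented stand-in for `scored`, with the same column names
toy_scored = pd.DataFrame({
    "Review": ["review text A", "review text B", "review text C", "review text D"],
    "sexual": [0, 1, 0, 1],
    "prediction_label": ["0", "1", "1", "0"],
    "prediction_score": [0.97, 0.88, 0.61, 0.55],
})

# Assumes 'sexual' holds the integers 0/1; convert to strings to match the predictions
is_creepy = toy_scored["sexual"].astype(str) == "1"
predicted_creepy = toy_scored["prediction_label"] == "1"

# False positives: the model flagged the review as creepy, but the human label says no
false_positives = toy_scored[predicted_creepy & ~is_creepy]
# False negatives: the human label says creepy, but the model missed it
false_negatives = toy_scored[~predicted_creepy & is_creepy]

# Most confident mistakes first
print(false_positives.sort_values("prediction_score", ascending=False))
print(false_negatives.sort_values("prediction_score", ascending=False))
```

On the real data, swap `toy_scored` for `scored`.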
