# Text Classification: Topics
This notebook trains a model that classifies user text into one of 20 topics.


Below we do the following:

1. Set up the training environment.
2. Load labeled text training data.
3. Build a topic classification model.
4. Convert the model to CoreML and upload it to Skafos.

The example is based on [Turi Create's Text Classifier](https://github.com/apple/turicreate/tree/master/userguide/text_classifier).

## Environment Setup
All we need to do is install the turicreate and skafos libraries to get started. This example **doesn't** use a GPU for training.

In [0]:
# Install turicreate and skafos
!pip install turicreate==5.4
!pip install skafos

## Data Preparation and Model Training
The training and testing data for this example is fetched through the sklearn package (a popular machine learning library in the Python world). It consists of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. We use this data to train a topic classifier: *given a sample of text, assign the category that best summarizes the content*. The original dataset can be found [here](http://qwone.com/~jason/20Newsgroups/).



In [0]:
# Import libraries
import pandas as pd
import turicreate as tc
from sklearn.datasets import fetch_20newsgroups

In [0]:
# Select training and testing data. This creates newsgroup_train and newsgroup_test as sklearn.utils.Bunch objects
# Headers, footers, and quotes are removed so the model learns from the message body only
newsgroup_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'), shuffle=True)
newsgroup_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'), shuffle=True)

In [0]:
# Convert integer labels to text label names for both training and testing data
train_label_names = dict(enumerate(newsgroup_train['target_names']))
train_labels = [train_label_names.get(x) for x in newsgroup_train['target']]

test_label_names = dict(enumerate(newsgroup_test['target_names']))
test_labels = [test_label_names.get(x) for x in newsgroup_test['target']]
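
To make the label conversion concrete, here is a minimal sketch on toy data. The `target_names` and `targets` lists are made-up stand-ins for `newsgroup_train['target_names']` and `newsgroup_train['target']`, not values from the real dataset.

```python
# Toy stand-ins (assumptions) for the Bunch fields used in the cell above.
target_names = ['alt.atheism', 'comp.graphics', 'sci.space']
targets = [2, 0, 1, 2]

# enumerate() pairs each name with its position: {0: 'alt.atheism', 1: ...}
label_names = dict(enumerate(target_names))

# Look up the text label for every integer target.
labels = [label_names[t] for t in targets]
print(labels)  # ['sci.space', 'alt.atheism', 'comp.graphics', 'sci.space']
```

Because the integer targets are simply positions in the list of names, indexing `target_names` directly would give the same result; the dictionary just makes the mapping explicit.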

In [0]:
# Construct training and testing dataframes
train_data = tc.SFrame({'text': newsgroup_train['data'], 'label': train_labels})
test_data = tc.SFrame({'text': newsgroup_test['data'], 'label': test_labels})

# Strip out new lines and other characters here
# In the future, you can include more text cleaning logic here
# This is useful for normalizing/standardizing your text input in order to build a more accurate classifier
train_data['text'] = train_data['text'].apply(lambda x: x.replace('\n', ' ').replace('/', '').replace('\\', ''))
test_data['text'] = test_data['text'].apply(lambda x: x.replace('\n', ' ').replace('/', '').replace('\\', ''))
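
As one possible direction for the extra cleaning logic mentioned above, the sketch below wraps the same replacements in a reusable function and adds lowercasing and whitespace collapsing. The name `clean_text` and the extra steps are assumptions for illustration; whether they improve accuracy on this dataset would need to be checked.

```python
import re

def clean_text(text):
    """Hypothetical helper: drop slashes, lowercase, and collapse whitespace."""
    # Remove the same characters the notebook strips.
    text = text.replace('/', '').replace('\\', '')
    # Lowercase so 'Space' and 'space' count as the same word.
    text = text.lower()
    # Turn newlines, tabs, and repeated spaces into a single space.
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

print(clean_text("Hello\n\nWorld/ \\ foo\tBar "))  # hello world foo bar
```

Assuming the function is defined in the notebook, it could replace the lambdas above, for example `train_data['text'].apply(clean_text)`, so that training and testing text are guaranteed to go through identical steps.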

In [0]:
# Take a look at our training SFrame
train_data.head()

In [0]:
# Train a topic classification model - this may take a few minutes to train
model = tc.text_classifier.create(
    dataset=train_data,
    target='label',
    features=['text'],
    drop_stop_words=True,
    word_count_threshold=2
)

# Text Classifier Training Docs:
# https://apple.github.io/turicreate/docs/api/generated/turicreate.text_classifier.create.html#turicreate.text_classifier.create
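
To give a rough intuition for the `drop_stop_words` and `word_count_threshold` options, the sketch below builds a filtered vocabulary in plain Python. It is only an illustration of the general idea (ignore very common words and words that appear too rarely in the whole dataset), not Turi Create's actual implementation; the `STOP_WORDS` set and both function names are assumptions.

```python
from collections import Counter

# A tiny, made-up stop-word list; real lists are much longer.
STOP_WORDS = {'the', 'a', 'is', 'to', 'and'}

def build_vocabulary(docs, word_count_threshold=2, drop_stop_words=True):
    """Keep words seen at least `word_count_threshold` times across all docs."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    vocab = {w for w, c in counts.items() if c >= word_count_threshold}
    if drop_stop_words:
        vocab -= STOP_WORDS
    return vocab

def bag_of_words(doc, vocab):
    """Count only the words of `doc` that are in the vocabulary."""
    return Counter(w for w in doc.lower().split() if w in vocab)

docs = ['the rocket is ready', 'the rocket launch', 'a graphics card']
vocab = build_vocabulary(docs)
print(sorted(vocab))                                   # ['rocket']
print(bag_of_words('the rocket and the rocket launch', vocab))
```

In this toy corpus only `rocket` survives: `the` also appears twice but is a stop word, and every other word appears once.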

## Model Evaluation


In [0]:
# Now that the model is trained, we can evaluate against a test set
test_predictions = model.predict(test_data)
accuracy = tc.evaluation.accuracy(test_data['label'], test_predictions)
print(f'Topic classifier model has a testing accuracy of {accuracy * 100:.2f}%', flush=True)
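
Overall accuracy can hide topics that the model handles poorly. The sketch below shows, in plain Python on made-up labels, what an accuracy score computes and how it could be broken down per class. The function names are hypothetical and are not part of Turi Create.

```python
from collections import defaultdict

def accuracy(targets, predictions):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(targets, predictions))
    return correct / len(targets)

def per_class_accuracy(targets, predictions):
    """Accuracy computed separately for each true label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(targets, predictions):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Toy labels (assumptions), not output from the trained model.
targets = ['sci.space', 'sci.space', 'rec.autos', 'rec.autos']
predictions = ['sci.space', 'rec.autos', 'rec.autos', 'rec.autos']
print(accuracy(targets, predictions))            # 0.75
print(per_class_accuracy(targets, predictions))  # {'sci.space': 0.5, 'rec.autos': 1.0}
```

With the real data, the same functions could be called on `list(test_data['label'])` and `list(test_predictions)` to see which of the 20 topics are confused most often.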

In [0]:
# Classify a new example of text - try different text values here
example_text = {"text": ["My computer is broken and I need to take it to the repair shop to have electrical components fixed."]}
example_prediction = model.classify(tc.SFrame(example_text))
print(example_prediction, flush=True)

## Model Export and Skafos Upload
- Convert the model to CoreML format so that it can run on an iOS device. Then deliver the model to your apps with **[Skafos](https://skafos.ai)**.

- If you don't already have an account, sign up for one **[here](https://dashboard.skafos.ai)**.
- Once you've signed up, grab an API token from your account settings.

In [0]:
# Specify the CoreML model name
model_name = 'TextClassifier'
coreml_model_name = model_name + '.mlmodel'

# Export the trained model to CoreML format
model.export_coreml(coreml_model_name)


In [0]:
import skafos
from skafos import models
import os

# Set your API Token first for repeated use
os.environ["SKAFOS_API_TOKEN"] = "<YOUR-SKAFOS-API-TOKEN>"

# You can retrieve this info with skafos.summary()
org_name = "<YOUR-SKAFOS-ORG-NAME>"    # Example: "mike-gmail-com-467h2"
app_name = "<YOUR-SKAFOS-APP-NAME>"    # Example: "Text-App"
model_name = "<YOUR-MODEL-NAME>"       # Example: "TextClassifierModel"

# Upload model version to Skafos
model_upload_result = models.upload_version(
    files=coreml_model_name,  # the .mlmodel file exported in the previous cell
    org_name=org_name,
    app_name=app_name,
    model_name=model_name
)
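
Since the upload cell relies on placeholders being replaced by hand, a small guard can catch a forgotten value before any request is made. This is a minimal sketch; `require_setting` is a hypothetical helper and not part of the skafos library.

```python
import os

def require_setting(name, environ=os.environ):
    """Return the value of `name`, or raise if it is unset or still a <PLACEHOLDER>."""
    value = environ.get(name, '')
    if not value or (value.startswith('<') and value.endswith('>')):
        raise ValueError(f'{name} is not set; replace the placeholder with a real value')
    return value

# Example with an explicit mapping instead of the real environment.
print(require_setting('SKAFOS_API_TOKEN', {'SKAFOS_API_TOKEN': 'abc123'}))  # abc123
```

Calling `require_setting('SKAFOS_API_TOKEN')` just before the upload would fail fast with a clear message if the token line was left unchanged.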