## Using SAS DLPy to Train Deep Learning Models with Non-Default Parameters

You can use SAS DLPy to  easily create and train a variety of basic task-oriented deep learning models. Basic model training parameters can be customized to better tailor the model for your analytic task and data. You can change your model's existing hyperparameter values, as well as add new hyperparameters for advanced training settings. 

The learning objective of this notebook is to understand the mechanics of using SAS DLPy to modify default model hyperparameter values, and how to score benchmark data to qualitatively understand how different hyperparameter changes affect the model's predictive performance. 

Model parameter configuration and optimization depends on the individual analytic task and the data. The different types of hyperparameter setting changes explored in this notebook should not be mistaken for steps to tune or optimize model hyperparameters. Users should be able to use the tools demonstrated in this notebook to explore their own unique analytic task, data, and model performance, with the goal of potentially identifying a good starting candidate for hyperparameter tuning. 

This notebook shows how to use and modify non-default hyperparameters for a basic text classification model designed to perform a text sentiment analysis task. The task is to read short restaurant reviews and classify the sentiment of the review text as either `positive`, `negative`, or `neutral`.

A series of text classification sentiment analysis models are created and trained using provided train and test data. Each successive trained model scores the test data in order to assess its performance. The best performing model might be a starting candidate for hyperparameter tuning.

The example steps include creating train and test data for the analytic task. Like many predicitive text classification models, the models in this notebook require a trained word embedding to perform the analytic task. This notebook contains detailed instructions to create your own reusable word embedding using publicly available resources. 

The example begins by using SAS DLPy to create and train a simple text classification model. The simple model is trained using default hyperparameter values, then scores the test data. A total of five text classification models are created and trained using different types of hyperparameter settings.
 
This example assumes that you have SAS DLPy API installed, have an active SAS CAS server, and have installed the common Python utilities used in the code (numpy, matplotlib, Pandas).

### Table of Contents 
- [Important Note: Client and Server Definitions](#ClientServer)
- [Prepare Resources and Configure Environment for Modeling](#getReady)
    - [Import Required Python and SAS DLPy Utilities](#importPythonDLPy)
    - [Download a Pre-Trained Word Vector File](#downloadEmbeddings)
    - [Format the Word Vector File for Modeling](#addColHeadings)
- [Configure SAS SWAT and Launch SAS CAS](#launchCAS)
    - [Load the Word Embedding File](#loadEmbeddings)
    - [Create and Load the Training Data](#loadTrain)
    - [Create and Load the Test Data](#loadTest)
- [Use SAS DLPy to Create a Simple Text Classification Model](#Model1)
    - [Train with Default Settings](#train1Model1)
    - [Score Test Data with Simple Model](#scoreModel1)
- [Create Simple Text Classification Model 2](#Model2) 
    - [Train with Different Learning Rate, Epoch Count, Log Level](#trainModel2)
    - [Score Test Data with Model 2](#scoreModel2)
- [Create Simple Text Classification Model 3](#Model3)
    - [Train with Different Optimizer Settings](#trainModel3)
    - [Score Test Data with Model 3](#scoreModel3)
- [Create Simple Text Classification Model 4](#Model4)    
    - [Train with Different Optimizer and Momentum Solver Settings](#trainModel4)
    - [Score the Test Data with Model 4](#scoreModel4)
- [Create Simple Text Classification Model 5](#Model5)    
    - [Train with Different Optimizer and Adam Solver Settings](#trainModel5)
    - [Score Test Data with Model 5](#scoreModel5)
- [Summary](#summary)    


<a id = "ClientServer"></a>

### Important Note: Client and Server Definitions
SAS Viya literature and technical documentation often refers to client and server entities. In this scenario, the client is the computer that runs the Jupyter notebook with the example code. The server is the computer that is running the Viya server. These two computers might (or might not) use the same operating system, and might (or might not) have access to a common file system.

This notebook assumes that the client and server do not use the same operating system, but that they do have access to a common file system. If the client and server in your environment do not have access to a common file system, you will need to copy or transfer files between client and server project folders during this example.

For this notebook example, you can point the server-side and client-side path variables to the same folder location. You must use appropriate client-side or server-side path specifications in the path expression.  

In [1]:
# This code defines server-side and client-side path variables 
# used in the code to specify the location for input data files,
# model files, and scored data tables in the example.
# Both paths can point to the same folder in a common file system.

# Server project root location (your path will be different)
server_project_root = r'/your-server-side/path-to/example-files/' 

# Client project root location (your path will be different)
client_project_root = r'\\your-client-side\path-to\example-files'

<a id = "getReady"></a>

### Prepare Resources and Configure Computing Environment for Modeling

Use this section to organize all of the resources you need and configure your local computing environment in advance, so you can follow along with the example notebook modeling operations without interruption. 


<a id="importPythonDLPy"></a>

#### Import Required Python and SAS DLPy Modules

Import the Python utilities and SAS DLPy libraries that will be used for the text classification task. Import utilities for the the pandas data analysis library, the numPy scientific computing library, and the matplotlib plotting utility.

In [2]:
# Import Matplotlib Utilities  
from matplotlib import pylab as plt
from matplotlib import image as mpimg

# Display plot results in notebook cells
%matplotlib inline

# Python utility libraries
import pandas as pd
import numpy as np
import pandas as pd
import csv
import os

# Import SAS DLPy Python libraries
import dlpy
from dlpy import *
from dlpy import Sequential
from dlpy.model import *
from dlpy.model import TextParms
from dlpy.blocks import *
from dlpy.blocks import Bidirectional
from dlpy.applications import *
from dlpy.applications import TextClassification
from dlpy.network import *
from dlpy.utils import *
from dlpy.layers import *

# Filter warning messages
import warnings
warnings.simplefilter(action='ignore', 
                      category=FutureWarning)

<a id="downloadEmbeddings"></a>

#### Download a Pre-Trained Word Embedding

Sentiment analysis is one of many NLP machine learning tasks that use a pre-trained word embedding. A word embedding maps word and phrase data into numeric vector data. A typical word embedding might have 50, 100, 200, or 300 dimensions for each word or phrase. Larger dimensions in a word embedding can increase predictive accuracy, potentially at the cost of an increase in the complexity of the model using the embedding. 

Pre-trained word embeddings are an example of transfer learning. A word embeddings model first performs extensive trainining on a massive input corpus of text. Once the extensive training investment is complete, the output word embedding can be used in numerous other NLP models without the significant initial training cost. 

Word embedding models use a diverse variety of input text sources for model training. For example, the well-known [Word2Vec](https://code.google.com/archive/p/word2vec/) project in the Google Code Archive includes a model that was trained on a Google News data set of about 100 billion words. The output is a large downloadable word embedding archive GoogleNews-vectors-negative300.bin.gz (1.5 GB) that contains 300 dimensions for 3 million English language words and phrases. You can choose a different text corpus to train Word2Vec: the [Google code archive](https://code.google.com/archive/p/word2vec/) includes links to five other online text corpora that range in size from 1 billion to 3 billion words. The diverse corpora sources include aggregated European News Commentary archives, Wikipedia content dumps, Gigaword repositories in multiple languages, and the historical base from the UMBC (University of Maryland, Baltimore Campus) web archives.   

<b>Note:</b> The Google Word2Vec research project is not a Google product. Word2Vec is licensed for use according to the [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) agreement.    

Another well-known word embeddings source is the [GloVe](https://nlp.stanford.edu/pubs/glove.pdf) (Global Vectors for Word Representation) unsupervised learning algorithm developed by Jeffrey Pennington, Richard Socher, and Christopher D. Manning of Stanford University. GloVe models train by determining global word-word co-occurrence statistics from a large text corpus. The Stanford [GloVe site](https://nlp.stanford.edu/pubs/glove.pdf) features a number of word embedding archives that were trained on different sizes and types of text corpora. The GloVe site files can be downloaded for use as specified in the [Public Domain Dedication and License v1.0](http://www.opendatacommons.org/licenses/pddl/1.0/) document. 

For example, the [glove.6B.zip](http://nlp.stanford.edu/data/glove.6B.zip) (822 MB) word vector files were trained on the the Wikipedia 2014 + Gigaword 5 corpus. It features 6 billion tokens and a 400,000 word uncased vocabulary. The archive includes four separate trained word embeddings that have 50, 100, 200, and 300 dimensions, respectively. 

Numerous available word embedding algorithms and software products provide trained word embeddings that are suitable for tutorial and research modeling consumption. This example does not require a specific word embedding from any specific provider: the choice of provider and word embedding is left to the modeler.

The toy input data set used in this example is relatively small, so a trained word embedding with 100 dimensions is more than sufficient. (A 50-dimension word embedding would suffice, but considering the small size of the training data, the increase to 100 dimensions is a trade-off to improve model performance.) The 100-dimension word embedding used in this example was formatted by adding column header information and cleansing the index column. The resulting file is saved as `word_embeddings.txt`. This notebook includes code that you can use to format your choice of trained word embedding.

The structure of the word embedding `word_embeddings.txt` used in this example resembles the 100-dimension trained word embedding `glove.6B.100d.txt` that is included in Stanford's Wikipedia 2014 + Gigaword 5 [glove.6B.zip](http://nlp.stanford.edu/data/glove.6B.zip) archive. However, you should be able to format and use any one of a number of available trained 100-dimension word embeddings to use with this notebook and deliver fundamentally equivalent predictive results. 

Copy the word embedding of your choice to the folder location you specified earlier as `server_text_generation_root`. The following example section provides code that you can use to format the word embedding for modeling.

<a id="addColHeadings"></a>

#### Format the Word Embedding for Modeling

Word embedding files typically contain sequential delimited term and vector data in a flat file. The flat file is formatted for model consumption by creating a word vector (word embeddings) table with column headings, and then the table is cleansed by removing rows that have forbidden index column values (e.g., quotation marks). The table header row consists of a `term` column, followed by sequential integer headings for each dimension column in the table. For example, a formatted 100-dimension word embedding file has the following column headings: " term , _1_ , _2_ , ... , _99_ , _100_". The `term` column contains character and word strings. The numeric columns contain word vector values for the specified dimension. The number of rows in the table is determined by the size of the source corpus. 

If your chosen word embedding file is not formatted and cleansed, you can use the following code to format a "raw" downloaded word embedding file into a table indexed by term, with sequential numbered columns for every dimension of word embedding values. The code also removes table rows for index entries that have reserved or forbidden character string values.

Note: Word embedding files can be large. A typical 100-dimension word embedding file for a corpus of 400,000 terms with 100 vector values per term is a table with over 40 million values. Adjust your expectations for the computation time accordingly. The required time to format a word embedding file for modeling using the code below scales with the number of dimensions in the raw embedding file.

In [3]:
# Most open-source word embedding files do not have formatted column headings.
# (Use a file viewer to check for column headings and table structdure.)
# You can skip this section if your word embedding file already has been cleansed
# and appropriately formatted for modeling.

# Save the word embedding file full path spec to the variable 'raw_embedding_file'.
# This example expects a 100-dimension trained word embedding file in .txt format. 
raw_embedding_file = os.path.join(client_project_root,'glove.6B.100d.txt')

# The 'raw_embedding_dimension' parameter should match the  
# number of dimensions in the preferred word embedding file:

# For a 50-dimension word embedding file
# raw_embedding_dimension = 50

# For a 100-dimension word embedding file
raw_embedding_dimension = 100

# Use variable 'col_names' to accumulate the 
# generated header strings for all table columns.
col_names = ['term'] + ['_'+str(ii)+'_' for ii in range(1,raw_embedding_dimension+1)]

# Pandas reads in the tab-delimited embedding values
# from the word embedding file with no header
df = pd.read_csv(raw_embedding_file, 
                 names=col_names,
                 sep=" ", 
                 index_col=0, 
                 header=None,
                 quoting=csv.QUOTE_NONE)


# Clean up and omit rows in the table that have 
# reserved or forbidden index character strings.
tmp = [str(df.index[ii]) for ii in range(df.shape[0])]
idx = [ii for ii,txt in enumerate(tmp) if ('"' not in txt) and ("'" not in txt)]
df1 = df.iloc[idx]

# Save the updated 100-dimension table with column headings  
# as a new tab-delimited file named 'word_embeddings_100.txt'. 
# Store the full client-side path specification for the word 
# embedding file in the variable 'pretrained_embedding_file'.

pretrained_embedding_file = os.path.join(client_project_root,'word_embeddings_100.txt')
df1.to_csv(pretrained_embedding_file, 
           sep='\t', 
           header=True,
           float_format='%5.6f',
           index=True,
           quoting=csv.QUOTE_NONE)

<a id="launchCAS"></a>

### Configure SAS SWAT and Launch SAS CAS

The following code configures SAS SWAT and launches SAS CAS. 

SWAT is a Python interface to SAS CAS that enables you to load data into memory and apply CAS actions to the data.

<b>Note:</b> For more information about starting a CAS session with the SWAT package, see https://sassoftware.github.io/python-swat/getting-started.html.

In [4]:
# Import SAS SWAT
from swat import *

# SWAT data message handler
import swat.cas.datamsghandlers as dmh

In [5]:
# Configure CAS session for Analytics

s = CAS('your-host-name.unx.company-name.com', 5570)

# Exception handler for reading StringIO text data into memory
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO  


<a id="loadEmbeddings"></a>

### Load the Word Embedding File

Load the 100-dimension word embedding file that you created. Create a Pandas data frame `embeddings` for the `word_embeddings_100.txt` file, then upload the `embeddings` data frame to SAS CAS as table `word_embeddings_100`. Word embedding files tend to be large, so be patient during processing.

In [6]:
# load the pretrained word embedding file
# that you created earlier by creating and uploading 
# a Pandas data frame with the embeddings.
embeddings = pd.read_csv(client_project_root+'\word_embeddings_100.txt',
                         skipinitialspace=True, 
                         index_col=False, 
                         delimiter='\t'
                         )
s.upload_frame(embeddings, 
               casout=dict(name='word_embeddings_100', 
                           replace=True
                          )
              )

NOTE: Cloud Analytic Services made the uploaded file available as table WORD_EMBEDDINGS_100 in caslib CASUSER(UserID).
NOTE: The table WORD_EMBEDDINGS_100 has been created in caslib CASUSER(UserID) from binary data uploaded to Cloud Analytic Services.


CASTable('WORD_EMBEDDINGS_100', caslib='CASUSER(UserID)')

Now verify that the CAS table `word_embeddings_100` containing the word vector file was created.

In [7]:
# Verify that the embedding file was 
# created and loaded
s.table.tableInfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,WORD_EMBEDDINGS_100,398921,101,0,utf-8,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,UTF8,1929200000.0,...,0,0,0,,,0,UserID,,2021-02-17T12:01:34-05:00,1929200000.0


<a id="loadTrain"></a>

### Create and Load the Training Data

After creating the CAS table `WORD_EMBEDDINGS_100`, use the code block below to manually create the toy training data set for the text classification model. The model task is text sentiment analyis. 

The table creation code below contains data for three comma delimited columns: `review` text, `sentiment` classification (`positive`, `neutral`, or `negative`), and a rating of one-to-five `stars`. The table is saved in Python as `sentiment_data` and loaded into CAS as table `sentiment_data`. The `sentiment_data` is used to train all five text classification models in this example.

In [8]:
# Create a toy dataset named sentiment_data 
# to be used for text classification training 
# using restaurant review metadata
# (review,sentiment,stars)

sentiment_data = StringIO('''review,sentiment,stars
 "Average food and average experience.","neutral", 3
 "The desserts are amazing! Love the banana pudding!","positive", 5
 "I love this place. Friendly and welcoming, makes me feel happy!","positive", 5
 "Disappointed. Overpriced cafeteria fare.","negative", 1
 "My favorite place to celebrate birthdays! Yummy cake!","positive", 4
 "Regular chow. Does the job.","neutral", 3
 "Wow! Best burger in years. Extra juicy! Yum!","positive", 5
 "Love a huge side salad! Good value for the money.","positive", 4
 "I don't like cold food. I was disappointed.","negative", 2
 "I don't like fishy smell! Not worth it.","negative", 1
 "Awesome desserts! The best.","positive", 5
 "Predictable comfort food.","neutral", 3
 "Great food and wonderful atmosphere","positive", 5
 "I love this place! Great fun on weekends!","positive", 5 
 "Worst restaurant ever!","negative", 1
 "Such a disaster! Waste of money.","negative", 1
 "Great bartenders. Kim my favorite always makes me happy.", "positive", 5
 "The food is so good! Always pleased. Kudos.", "positive", 4
 "Had a terrible table next to the kitchen. Too loud.", "negative", 2
 "Regular comfort food.","neutral", 3 
 "The whole place smelled bad. Not great.", "negative", 2
 "It was too cold inside. Yuck! I was freezing!", "negative", 2
 "Did not like the sauce. Too spicy.", "negative", 2
 "Most awesome chicken fried steak! Favorite!", "positive", 5
 "The cooks are great! This place rocks!", "positive", 5
 "Chicken was overcooked. So disappointed.", "negative", 2
 "Angry! Was overcharged! Dishonest! Never coming again.", "negative", 1
 "Server forgot our order. What a disaster. Very disappointed.", "negative", 1
 "It was what we expected. Good stuff. Happy.", "positive", 4
 "Slow service, lousy food. Unhappy.", "negative", 2
 "Fast and tasty game day treats! The best!", "positive", 5
 "My favorite place for wings. Yum!", "positive", 4
 "They have problems keeping sour cream. It makes me sad.", "negative", 2
 "Yuck! Fingernail in my food. Gross! Not coming back!", "negative", 1
 "Good place to eat and the hostess is so nice!", "positive", 4
 "Great birthday venue. Happiness all around.", "positive", 5
 "Average cocktails and average beers.", "neutral", 3
 "OK salad bar and OK burgers.", "neutral", 3
 "Very disappointing dining experience." , "negative", 1
 "Sorry waitress disappeared. Aggravating.", "negative", 1
 "Fries were OK.", "neutral", 3
 "Clumsy server spilled tea on me. So disappointed.", "negative", 2
 "Failed to cook my steak properly.", "negative", 1
 "Everything was amazing! Perfect! I'll be back!", "positive", 5
 "Not for me. Shabby. Not like in New York.", "negative", 1 
 "Thrilled to come every time. Fantastic food!", "positive", 5
 "Amazing place. Our favorite for years.", "positive", 5
 "Delightful drinks fabulous food.", "positive", 5
 "Disastrous drinks terrible food.", "negative", 1
 "Fantastic frog legs! Best in the Bayou!", "positive", 5
 "Disgusting frog legs! Worst in the Bayou!", "negative", 1
 "Yummy asparagus best cheesecake in town!", "positive", 5
 "Mushy asparagus worst cheesecake ever.", "negative", 1
 "Best place ever!", "positive", 5
 "Worst place ever.", "negative", 1
 "It was OK.", "neutral", 3
 "Average dining experience.", "neutral", 3
 "Spectacular! Delightful!", "positive", 5
 "Edible and OK.", "neutral", 3
 "Amazing place! My favorite!", "positive", 5
 "Disgusting place! Awful choice.", "negative", 1
 "Gross! My server was sick. Unacceptable!", "negative", 1
 "Unsanitary and disgusting. Got sick at home.", "negative", 1 
 "Super clean! Tasty food! Friendly Staff! Everyone happy!", "positive", 5
 "Mostly average.", "neutral", 3
 "No! A bug in my food! Bad experience.", "negative", 1
 "My server smelled bad. Very unfortunate.", "negative", 2
 "The best chicken cordon bleu ever!", "positive", 5
 "Average drinks and average appetizers.", "neutral", 3
 "Amazing bartender! Best drinks ever!", "positive", 5
 "Fantastic hostess! Great tables!", "positive", 5
 "Average food. OK.", "neutral", 3
 "Forgot my order. Hated the wait ruined the night.", "negative", 1
 "It was food.", "neutral", 3
 "Fast turnover good service good food.", "positive", 4
 "Nightmare experience. Everything went wrong.", "negative", 1
 "Best night ever! So special! Perfect for pre-prom!", "positive", 5
 "Good hot and plenty food.", "positive", 4
 "The food was OK.", "neutral", 3
 "Fast and yummy in my tummy.", "positive", 4
 "Slow boring and below-par experience.", "negative", 2
 "Popular on dates. Everybody leaves happy.", "positive", 4
 "Love this place! Eat here all the time!", "positive", 4
 "Best grits in the South! Love love love it!", "positive", 5
 "Worst grits ever! Disaster using instant grits!", "negative", 1
 "Awful hearing sneezes from the kitchen. Unhealthy and rude!", "negative", 1
 "Yum! The soup of the day is always delicious!", "positive", 4
 "Wonderful for big parties! Great!", "positive", 5
 "Crooks and bums. Bad food, awful service.", "negative", 1
 "Disappointed in my waitress. Needed ketchup.", "negative", 2
 "Server forgot my food. Ate late. Very unhappy.", "negative", 1
 "Most amazing place ever! Delicious!", "positive", 5
 "My favorite! Eat here every day if I can! Yum!", "positive", 5
 "OK experience.", "neutral", 3
 "Everything was OK and average.", "neutral", 3
 "Unhappy. Oysters had gone bad. Got sick. Bleh.", "negative", 1
 "I won't be back. Too much rudeness and ugly behavior.", "negative", 1
 "A reasonable meal. Average.", "neutral", 3
 "Pretty good nachos and great tasty beer.", "positive", 4
 "The best cheesecake EVER!!!", "positive", 5
 "The cheesecake was AWFUL! Ruined!", "negative", 1
 "The cheesecake was from a box. It was OK.", "neutral", 3
 "I did not like the nacho chips. Stale food is bad.", "negative", 2
 "The nachos are wonderful! A taste of home! Love it!", "positive", 5
 "warm entrees OK.", "neutral", 3
 "Amazing appetizers delicious entrees! A winner!", "positive", 5
 "Cold appetizers soggy entrees a sad experience.", "negative", 1
 "They ran out of beer! What a disaster! Awful!", "negative", 1
 "My server stank bad of cigarettes.", "negative", 2
 "They ran out of ice! Bad drinks, bad waits.", "negative", 2
 "The bartender tended the bar.", "neutral", 3
 "My favorite fried cheese ever! Piping hot and yummy!", "positive", 5
 "Worst fried cheese ever! Frozen cold and unappetizing.", "negative", 1
 "The fried cheese was OK.", "neutral", 3
 "The steaks were legendary! Nice size, cooked right.", "positive", 5
 "Best hot dogs ever.", "positive", 5
 "Love the chicken tenders. Happy kid!", "positive", 5
 "Bartender cannot make a good Bongo Smash. Such a shame.", "negative", 2
 "They ran out of beer at 10 pm. Disappointing.", "negative", 2
 "I love the salad bar. Great after-tennis meal.", "positive", 4
 "Amazing place. Our favorite for years.", "positive", 5
 "Delightful drinks and fabulous food.", "positive", 5
 "Disastrous drinks and terrible food.", "negative", 1
 "Fantastic frog legs! Best in the Bayou!", "positive", 5
 "Disgusting frog legs! Worst in the Bayou!", "negative", 1
 "Yummy asparagus, best cheesecake in town!", "positive", 5
 "Mushy asparagus, worst cheesecake ever.", "negative", 1
 "Best place ever!", "positive", 5
 "Worst place ever.", "negative", 1
 "It was OK.", "neutral", 3
 "Amazing.", "positive", 4
 "Acceptable food.", "neutral", 3
 "I had the veal. OK.", "neutral", 3
 "Amazing place! My favorite!", "positive", 5
 "Disgusting place! Terrible choice.", "negative", 1
 "My server was sick. Unacceptable!", "negative", 1
 "Unsanitary and disgusting. Got sick at home.", "negative", 1 
 "The steaks were awful! Too small and overcooked.", "negative", 1
 "OK steaks and OK meal.", "neutral", 3
 "Great steaks.", "positive", 4
 "Steaks were not the best.", "negative", 2
 "Salad was disappointing. Wilty and sad looking.", "negative", 2
 "Delicious crisp salads! Fresh croutons! Great dressings!", "positive", 5
 "Food and salads average.", "neutral", 3
 "Awful salad. Awful food.", "negative", 1
 "Great chicken and good salad.", "positive", 4
 "Salad was bad.", "negative", 2
 "Best breakfast ever!", "positive", 5
 "Worst breakfast ever!", "negative", 1
 "Breakfast was OK.", "neutral", 3
 "Great brunch love the hollandaise and eggs.", "positive", 4
 "Very disappointing brunch. Long wait, bad food.", "negative", 1
 "Amazing brunch. Five stars happy!", "positive", 5
 "It was brunch.", "neutral", 3
 "Tuna salad smelled bad. Unhappy. Would not eat it.", "negative", 2
 "Grossest tuna salad I've ever had. Awful. Won't be back.", "negative", 1
 "Most amazing tuna salad in the world! Winner!", "positive", 5
 "Awful bartender. Couldn't make a Goombay Smash.", "negative", 1
 "Outstanding bartender! Best Goombay Smash ever!", "positive", 5
 "Acceptable Goombay Smash.", "neutral", 3
 "Tastiest frosty beer in the state!  My best choice!", "positive", 5
 "Warm beer stinks! No fun. Fix that broken keg box.", "negative", 2
 "Disaster. No beer. So disappointed.", "negative", 1
 "They had draft beer.", "neutral", 3
 ''')
handler = dmh.CSV(sentiment_data, 
                  skipinitialspace=True
                 )
s.addtable(table='sentiment_data', 
           replace=True, 
           **handler.args.addtable
          )

In [9]:
# Verify that the sentiment_data table 
# was created.
s.table.tableInfo()


Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,WORD_EMBEDDINGS_100,398921,101,0,utf-8,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,UTF8,1929200000.0,...,0,0,0,,,0,UserID,,2021-02-17T12:01:34-05:00,1929200000.0
1,SENTIMENT_DATA,164,3,0,utf-8,2021-02-17T14:04:53-05:00,2021-02-17T14:04:53-05:00,2021-02-17T14:04:53-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,


What does the data in the table `sentiment_data` look like? Find out by using the DLPy `table.fetch()` function to display 5 samnple rows.

In [10]:
# Display five rows from the train table
# sentiment_data.
s.table.fetch(table="sentiment_data", 
              format=True,
              to=5
              )

Unnamed: 0,review,sentiment,stars
0,Average food and average experience.,neutral,3
1,The desserts are amazing! Love the banana pudd...,positive,5
2,"I love this place. Friendly and welcoming, mak...",positive,5
3,Disappointed. Overpriced cafeteria fare.,negative,1
4,My favorite place to celebrate birthdays! Yumm...,positive,4


<a id="loadTest"></a>

### Create and Load the Test Data

After creating the CAS table `SENTIMENT_DATA`, use the code block below to manually create the toy test data set `sentiment_test` that trained text sentiment models can score. 

The test code block is read into a CAS table named `sentiment_test`. The `sentiment_test` table is scored by all of the text classification models in this example.  

In [11]:
# Now create a toy test dataset named sentiment_test
# to be used for text classification modeling. 
# (review,sentiment,stars)

sentiment_test = StringIO('''review,sentiment,stars
 "Disappointed in the expensive food. Not worth it!","negative", 1
 "Wow! I love this place. The best desserts! ","positive", 5
 "Ordinary meal. OK.","neutral", 3
 "Average dining experience.", "neutral", 3
 "Loved it! Fantastic server and food!", "positive", 5
 "I love the ravioli. It is my favorite!", "positive", 4
 "Rude server forgot us. Disaster! Very disappointed.", "negative", 1
 "Good food. Good staff. Happy.", "positive", 4
 "Slow service, crappy food. Unhappy.", "negative", 2
 "Fast and tasty game day treats! The best!", "positive", 5
 "My favorite place for wings. Yum!", "positive", 4
 "Problems keeping sour cream. It makes me sad.", "negative", 2
 "Yuck! Fingernail in my potatoes. Disgusting! Not coming back!", "negative", 1
 "Good place to eat. Nice hostess too!", "positive", 4
 "Great birthday venue. Happiness all around.", "positive", 5
 "Average cocktails. OK.", "neutral", 3
 "My favorite salad bar and delicious burgers.", "positive", 5
 "Upset. Mixed up my order and ruined my night." , "negative", 1
 "Waitress disappeared. So unhappy.", "negative", 1
 "Fries were awesome and wonderful. Yes!", "positive", 5
 "Best hot dogs ever.", "positive", 5
 "Love the chicken tenders. Happy kid!", "positive", 5
 "Untrained bartender makes bad drinks. Such a shame.", "negative", 2
 "Poor planning no beer at 10 pm. Disappointing.", "negative", 2
 "I love the salad bar. Great meal.", "positive", 4
 "Amazing place. Our favorite for years.", "positive", 5
 "Delightful drinks and fabulous food.", "positive", 5
 "Disastrous drinks and terrible food.", "negative", 1
 "Fantastic frog legs! Best in the Bayou!", "positive", 5
 "Disgusting frog legs! Worst ever!", "negative", 1
 "Yummy asparagus, best cheesecake in town!", "positive", 5
 "Mushy asparagus, worst cheesecake ever.", "negative", 1
 "Best place ever!", "positive", 5
 "Worst place ever.", "negative", 1
 "It was OK.", "neutral", 3
 "Average.", "neutral", 3
 "Amazing food. Hits the spot! Awesome", "positive", 5
 "Delightful food. It's the best.", "positive", 5
 "Amazing place! My favorite!", "positive", 5
 "Disgusting place! Terrible choice.", "negative", 1
 "Awful night my server was sick. Unacceptable!", "negative", 1
 "Unsanitary and disgusting. Bad meal.", "negative", 1 
  "Best pizza ever! Awesome!", "positive", 5
 "Worst calzone ever! Total disappointment.", "negative", 1
 "Terrible pizza place.", "negative", 1
 "Superior service and delightful food!", "positive", 5
 "Really good chow.", "positive", 4
 "Great staff and delicious food!", "positive", 5
 "Best cocktails in New Orleans", "positive", 5
 "I love this bakery! Top class!", "positive", 5
 ''')
handler = dmh.CSV(sentiment_test, 
                  skipinitialspace=True
                 )
s.addtable(table='sentiment_test', 
           replace=True, 
           **handler.args.addtable
          )

In [12]:
# Verify that the toy data set sentiment_test
# was created.

s.table.tableInfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,WORD_EMBEDDINGS_100,398921,101,0,utf-8,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,UTF8,1929200000.0,...,0,0,0,,,0,UserID,,2021-02-17T12:01:34-05:00,1929200000.0
1,SENTIMENT_DATA,164,3,0,utf-8,2021-02-17T14:04:53-05:00,2021-02-17T14:04:53-05:00,2021-02-17T14:05:30-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
2,SENTIMENT_TEST,50,3,0,utf-8,2021-02-17T14:12:35-05:00,2021-02-17T14:12:35-05:00,2021-02-17T14:12:35-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,


In [13]:
# Display five rows from the test table
# sentiment_data.
s.table.fetch(table="sentiment_test", 
              format=True,
              to=5
              )

Unnamed: 0,review,sentiment,stars
0,Disappointed in the expensive food. Not worth it!,negative,1
1,Wow! I love this place. The best desserts!,positive,5
2,Ordinary meal. OK.,neutral,3
3,Average dining experience.,neutral,3
4,Loved it! Fantastic server and food!,positive,5


<a id="Model1"></a>

### Use DLPy to Create a Simple Text Classification Model

Now use SAS DLPy `TextClassification()` function to create a simple RNN text classification model with default settings. Name the model `easy_rnn_model` and save it in SAS CAS as `text_classifier_1`.

In [14]:
# Create a simple RNN Text Classification 
# model for classifying reviews and call 
# it easy_rnn_model
easy_rnn_model = TextClassification(s, model_table='text_classifier_1')

NOTE: Output layer added.
NOTE: Model compiled successfully.


In [15]:
# Verify that the CAS table text_classifier_1 
# was created.

s.table.tableInfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,WORD_EMBEDDINGS_100,398921,101,0,utf-8,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,UTF8,1929200000.0,...,0,0,0,,,0,UserID,,2021-02-17T12:01:34-05:00,1929200000.0
1,SENTIMENT_DATA,164,3,0,utf-8,2021-02-17T14:04:53-05:00,2021-02-17T14:04:53-05:00,2021-02-17T14:05:30-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
2,SENTIMENT_TEST,50,3,0,utf-8,2021-02-17T14:12:35-05:00,2021-02-17T14:12:35-05:00,2021-02-17T14:13:32-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
3,TEXT_CLASSIFIER_1,121,5,0,utf-8,2021-02-17T14:14:13-05:00,2021-02-17T14:14:13-05:00,2021-02-17T14:14:13-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,


As an additional exercise, you can uncomment the code block below and use the `plot_network()` function to visualize the DAG for the easy text classification network created with DLPy.

In [None]:
# Uncomment the bottom line to 
# visualize the text classifier model easy_model

#easy_rnn_model.plot_network()

<a id="trainModel1"></a>

#### Train the Text Classification Model using Default Pararmeters

Train the new text classifier model `easy_rnn_model` using default SAS DLPy hyperparameter settings. Use the DLPy `fit()` function, the `sentiment_data` training table, and the `word_embeddings_100` embedding file.

The training uses the text data in the `review` column of the input data to predict the value of the nominal class (`positive`, `neutral`, `negative`) in the `sentiment column`. This is a text classification example, so the numeric values in the `stars` column are not used to make predictions.

Values for `seed` and `n_threads` are specified to support model determinism and to provide repeatable model training results. Deterministic models always produce the same output from a given set of starting parameters. Using single-threaded computations eliminates computational randomness introduced by multiple threading. This is useful when you want to be able to duplicate example notebook model training results. 

<b>Warning:</b> Using `n_threads=1` to force single-threaded computations for large models or large data sets is not recommended unless you need a fully deterministic model. Allowing multiple threads enables significantly faster model training. It is normal to expect slightly different results in multi-threaded trained models.

In [16]:
# Train the text classification model
# easy_rnn_model using sentiment_data  
# and default SAS DLPy parameter values
easy_rnn_model.fit(data='sentiment_data', 
                   seed=8675309,
                   n_threads=1,
                   inputs='review', 
                   texts='review', 
                   target='sentiment', 
                   nominals='sentiment',
                   text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                   )

NOTE: Training from scratch.
NOTE:  Synchronous mode is enabled.
NOTE:  The total number of parameters is 10443.
NOTE:  The approximate memory cost is 1.00 MB.
NOTE:  Loading weights cost       0.00 (s).
NOTE:  Initializing each layer cost       2.39 (s).
NOTE:  The total number of threads on each worker is 1.
NOTE:  The total mini-batch size per thread on each worker is 1.
NOTE:  The maximum mini-batch size across all workers for the synchronous mode is 1.
NOTE:  Target variable: sentiment
NOTE:  Number of levels for the target variable:      3
NOTE:  Levels for the target variable:
NOTE:  Level      0: negative
NOTE:  Level      1: neutral 
NOTE:  Level      2: positive
NOTE:  Number of input variables:     1
NOTE:  Number of text input variables:      1
NOTE:  Batch nUsed Learning Rate        Loss  Fit Error   Time(s) (Training)
NOTE:      0     1     0.01            1.334          1     0.00
NOTE:      1     1     0.01           0.7283          0     0.00
NOTE:      2     1     0.0

Unnamed: 0,Descr,Value
0,Model Name,text_classifier_1
1,Model Type,Recurrent Neural Network
2,Number of Layers,8
3,Number of Input Layers,1
4,Number of Output Layers,1
5,Number of Convolutional Layers,0
6,Number of Pooling Layers,0
7,Number of Fully Connected Layers,0
8,Number of Recurrent Layers,6
9,Number of Weight Parameters,10260

Unnamed: 0,Epoch,LearningRate,Loss,FitError
0,1,0.01,1.08487,0.585366
1,2,0.01,1.044423,0.554878
2,3,0.01,1.042524,0.634146
3,4,0.01,1.029038,0.628049
4,5,0.01,0.986248,0.52439

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),text_classifier_1_weights,10443,3,"CASTable('text_classifier_1_weights', caslib='..."


The model `easy_rnn_model` that was trained using default hyperparameters has 10,443 parameters, a default learning rate of 0.01, and final loss and fit errors of 0.986248 and 0.524390 respectively. 

How does the model perform? To explore its performance, use it to score the `sentiment_test` data. 


<a id="scoreModel1"></a>

#### Score the Test Data with the Simple Text Classification Model

How does the model perform? To benchmark performance, use it to score the sentiment_test data.

In [17]:
easy_rnn_model.evaluate(data='sentiment_test',
                        top_probs=2, 
                        model_task='CLASSIFICATION',
                        text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                        )

Unnamed: 0,Descr,Value
0,Number of Observations Read,50.0
1,Number of Observations Used,50.0
2,Misclassification Error (%),60.0
3,Top 2 Misclassification Error (%),0.0
4,Loss Error,0.914429

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),Valid_Res_Jo3NVY,50,14,"CASTable('Valid_Res_Jo3NVY', caslib='CASUSER(c..."


When trained using default settings, the current text sentiment classification model has a 60% misclassification error and a loss error of 0.914429. 

Let's explore how to specify some non-default hyperparameters that might be used to improve the predictive performance.

<a id="Model2"></a>

### Create Simple Text Classification Model 2

Use the SAS DLPy `TextClassification()` function to create a second simple text classification model. Create a second model named `rnn_model_2` using the DLPy `TextClassification()` function. Name the new SAS CAS model table `text_classifier_2`. 

In [18]:
# Create a second text classification model 
# Name the model rnn_model_2
rnn_model_2 = TextClassification(s, model_table='text_classifier_2')

NOTE: Output layer added.
NOTE: Model compiled successfully.


<a id="Train2"></a>

#### Train Text Classification Model 2 with Different Learning Rate and Epoch Count

Use the `fit()` function with `word_embeddings_100`, and specify new hyperparameter values to override default settings for `learning rate` and `max_epochs`. 

The training uses the text data in the `review` column of the input data to predict the value of the nominal class (`positive`, `neutral`, `negative`) in the `sentiment column`. This is a text classification example, so the numeric values in the `stars` column are not used to make predictions.

The `seed` and `record_seed` parameter values are included to create a deterministic model with repeatable results. Deterministic models always produce the same output from a given set of starting parameters.

The code below also adds the parameter `n_threads=1` to force model determinism. Using single-threaded computations eliminates  randomness introduced by multiple threading. This is useful when you want to be able to duplicate example model training results. 

<b>Warning:</b> Using `n_threads=1` to force single-threaded computations for large models or large data sets is not recommended unless you need a fully deterministic model. Allowing multiple threads enables significantly faster model training. It is normal to expect slightly different results in multi-threaded trained models.

In [23]:
# Train rnn_model_2, but specify non-default
# values for number of epochs, learning rate, and  
# SAS log-level.
rnn_model_2.fit(data='sentiment_data', 
                inputs='review', 
                texts='review',
                target='sentiment', 
                nominals='sentiment',
                text_parms=TextParms(init_input_embeddings='word_embeddings_100'),
                n_threads=1,
                seed=867,
                record_seed=5309,
                max_epochs=45, 
                lr=0.05, 
                log_level=2
               )

NOTE: Training based on existing weights.
NOTE:  Synchronous mode is enabled.
NOTE:  The total number of parameters is 10443.
NOTE:  The approximate memory cost is 6.00 MB.
NOTE:  Loading weights cost       0.00 (s).
NOTE:  Initializing each layer cost       1.36 (s).
NOTE:  The total number of threads on each worker is 56.
NOTE:  The total mini-batch size per thread on each worker is 1.
NOTE:  The maximum mini-batch size across all workers for the synchronous mode is 56.
NOTE:  Target variable: sentiment
NOTE:  Number of levels for the target variable:      3
NOTE:  Levels for the target variable:
NOTE:  Level      0: negative
NOTE:  Level      1: neutral 
NOTE:  Level      2: positive
NOTE:  Number of input variables:     1
NOTE:  Number of text input variables:      1
NOTE:  Epoch Learning Rate        Loss  Fit Error   Time(s)
NOTE:  0          0.05          0.9379     0.6667     0.00
NOTE:  1          0.05          0.8535     0.3333     0.00
NOTE:  2          0.05          0.9044  

Unnamed: 0,Descr,Value
0,Model Name,text_classifier_2
1,Model Type,Recurrent Neural Network
2,Number of Layers,8
3,Number of Input Layers,1
4,Number of Output Layers,1
5,Number of Convolutional Layers,0
6,Number of Pooling Layers,0
7,Number of Fully Connected Layers,0
8,Number of Recurrent Layers,6
9,Number of Weight Parameters,10260

Unnamed: 0,Epoch,LearningRate,Loss,FitError
0,51,0.05,0.937887,0.666667
1,52,0.05,0.853456,0.333333
2,53,0.05,0.904444,0.333333
3,54,0.05,0.756510,0.333333
4,55,0.05,0.629624,0.333333
...,...,....,........,........
40,91,0.05,0.873657,0.333333
41,92,0.05,1.355024,1.000000
42,93,0.05,0.877529,0.333333
43,94,0.05,1.085014,0.666667

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),text_classifier_2_weights,10443,3,"CASTable('text_classifier_2_weights', caslib='..."


The model `rnn_model_2` that was trained overriding default hyperparameter values for learning rate and epoch count has 10,443 parameters, a fit error of 0.675062, and a loss error of 0.3333333. 


<a id="scoreModel2"></a>

#### Score the Test Data with Text Classification Model 2

How does text classification model `rnn_model_2` perform? To explore its performance, use it to score the `sentiment_test` data. 

In [24]:
rnn_model_2.evaluate(data='sentiment_test',
                     top_probs=2, 
                     model_task='CLASSIFICATION',
                     text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                    )

Unnamed: 0,Descr,Value
0,Number of Observations Read,50.0
1,Number of Observations Used,50.0
2,Misclassification Error (%),46.0
3,Top 2 Misclassification Error (%),0.0
4,Loss Error,1.041906

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),Valid_Res_3Ke2ak,50,14,"CASTable('Valid_Res_3Ke2ak', caslib='CASUSER(c..."


When trained using a non-default max number of epochs (45) and a learning rate of 0.05, the current text sentiment classification model has a 46% misclassification error and a loss error of 1.041906. 

Now let's explore how to train text classification models using non-default optimizer settings.

<a id="Model3"></a>

### Create Simple Text Classification Model 3

Create a third text classification model named `rnn_model_3` and save the new CAS table as `text_classifier_3`. 

In [25]:
# Create a third text classification model for classifying reviews
# Name the model rnn_model_3.
rnn_model_3 = TextClassification(s, model_table='text_classifier_3')

NOTE: Output layer added.
NOTE: Model compiled successfully.


<a id="trainModel3"></a>

### Train Text Classification Model 3 with Different Optimizer Settings¶

Use the DLPy `fit()` function with `word_embeddings_100` to train the third text classification model `rnn_model_3`. Override default optimizer hyperparameter settings by specifying `mini_batch_size=10`, `max_epochs=60`, and `reg_l1=0.001`. 

The training uses the text data in the `review` column of the input data to predict the value of the nominal class (`positive`, `neutral`, `negative`) in the `sentiment column`. This is a text classification example, so the numeric values in the `stars` column are not used to make predictions.

Values for `seed` and `record_seed` are specified to support model determinism and repeatable results. Deterministic models always produce the same output from a given set of starting parameters.

The code below also adds the parameter `n_threads=1` to force full model determinism.  Using single-threaded computations eliminates randomness introduced by multiple threading. This is useful when you want to be able to duplicate example model training results. 

<b>Warning:</b> Using `n_threads=1` to force single-threaded computations for large models or large data sets is not recommended unless you need a fully deterministic model. Allowing multiple threads enables significantly faster model training. It is normal to expect slightly different results in successive multi-threaded trained models.

In [27]:
# Train rnn_model_3 and override
# the default values to use non-default  
# training optimizer parameter settings
from dlpy.model import Optimizer
rnn_model_3.fit(data='sentiment_data', 
                inputs='review', 
                texts='review',
                target='sentiment', 
                nominals='sentiment',
                seed=867,
                record_seed=5309,
                n_threads=1,
                text_parms=TextParms(init_input_embeddings='word_embeddings_100'),
                optimizer=Optimizer(mini_batch_size=10,
                                   max_epochs=60,
                                   reg_l1=0.001
                                   )
                  )

NOTE: Training from scratch.


Unnamed: 0,Descr,Value
0,Model Name,text_classifier_3
1,Model Type,Recurrent Neural Network
2,Number of Layers,8
3,Number of Input Layers,1
4,Number of Output Layers,1
5,Number of Convolutional Layers,0
6,Number of Pooling Layers,0
7,Number of Fully Connected Layers,0
8,Number of Recurrent Layers,6
9,Number of Weight Parameters,10260

Unnamed: 0,Epoch,LearningRate,Loss,FitError,L1Norm
0,1,0.001,1.162090,0.7,1.272249
1,2,0.001,1.237088,0.7,1.272239
2,3,0.001,0.986235,0.6,1.272228
3,4,0.001,1.243963,0.9,1.272218
4,5,0.001,1.124457,0.6,1.272208
...,...,.....,.......,...,........
56,57,0.001,1.185624,0.6,1.271672
57,58,0.001,1.169853,0.6,1.271661
58,59,0.001,1.298165,0.9,1.271651
59,60,0.001,1.204853,0.7,1.271640

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),text_classifier_3_weights,10443,3,"CASTable('text_classifier_3_weights', caslib='..."


The model `rnn_model_3` that was trained using non-default optimizer hyperparameters `mini_batch_size=10`, `max_epochs=60`, and  `reg_l1=0.001`. The model has 10,443 parameters, a loss of 1.204853, and a fit error of 0.7.

How does this model perform? To explore its performance, use it to score the `sentiment_test` data. 

<a id="scoreModel3"></a>

#### Score the Test Data with Text Classification Model 3

How does text classification model `rnn_model_3` perform? To benchmark performance, use it to score the `sentiment_test` data.

In [28]:
rnn_model_3.evaluate(data='sentiment_test',
                     top_probs=2, 
                     model_task='CLASSIFICATION',
                     text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                     )

Unnamed: 0,Descr,Value
0,Number of Observations Read,50.0
1,Number of Observations Used,50.0
2,Misclassification Error (%),46.0
3,Top 2 Misclassification Error (%),0.0
4,Loss Error,1.102811

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),Valid_Res_eKEyYM,50,14,"CASTable('Valid_Res_eKEyYM', caslib='CASUSER(c..."


When model `rnn_model_3` trains using the non-default optimizer hyperparameter settings  `mini_batch_size=10`, `max_epochs=60`, and `reg_l1=0.001`, the text sentiment classification model has a 46% misclassification error and a loss error of 1.102811. 

Now, let's explore how to train text classification models using non-default momentum solver settings.

<a id="Model4"></a>

### Create Simple Text Classification Model 4

Create a fourth text classification model named `rnn_model_4` and save the new CAS table as `text_classifier_4`. 

In [30]:
# Create a fourth text classification 
# model for classifying reviews. Name 
# the model rnn_model_4
rnn_model_4 = TextClassification(s, 
                                 model_table='text_classifier_4'
                                )

NOTE: Output layer added.
NOTE: Model compiled successfully.


<a id="trainModel4"></a>

#### Train Text Classification Model 4 with Different Optimizer and Momentum Solver Settings    

Use the DLPy `fit()` function with `word_embeddings_100` to train the model `rnn_model_4`. Override default optimizer hyperparameter settings for `mini_batch_size=10`, `max_epochs=70`, `reg_l1=0.001`. 
Choose the `MomentumSolver` optimization algorithm and set the momentum parameter `momentum=0.089`. 

The training uses the text data in the `review` column of the input data to predict the value of the nominal class (`positive`, `neutral`, `negative`) in the `sentiment column`. This is a text classification example, so the numeric values in the `stars` column are not used to make predictions.

Values for `seed` and `record_seed` are specified to support model determinism and repeatable results. Deterministic models always produce the same output from a given set of starting parameters.

The code below also adds the parameter `n_threads=1` to force full model determinism.  Using single-threaded computations eliminates randomness introduced by multiple threading. This is useful when you want to be able to duplicate example model training results. 

<b>Warning:</b> Using `n_threads=1` to force single-threaded computations for large models or large data sets is not recommended unless you need a fully deterministic model. Allowing multiple threads enables significantly faster model training. It is normal to expect slightly different results in successive multi-threaded trained models.

In [31]:
# Train the model rnn_model_4 and override
# the default parameters to use advanced 
# settings for Optimizer and Momentum Solver
from dlpy.model import Optimizer, MomentumSolver
rnn_model_4.fit(data='sentiment_test', 
                inputs='review', 
                texts='review',
                target='sentiment', 
                nominals='sentiment',
                seed=867,
                record_seed=5309,
                n_threads=1,
                text_parms=TextParms(init_input_embeddings='word_embeddings_100'),
                optimizer=Optimizer(mini_batch_size=10,
                                    max_epochs=70,
                                    reg_l1=0.001,
                                    algorithm=MomentumSolver(momentum=0.089)
                                   )
              )

NOTE: Training from scratch.


Unnamed: 0,Descr,Value
0,Model Name,text_classifier_4
1,Model Type,Recurrent Neural Network
2,Number of Layers,8
3,Number of Input Layers,1
4,Number of Output Layers,1
5,Number of Convolutional Layers,0
6,Number of Pooling Layers,0
7,Number of Fully Connected Layers,0
8,Number of Recurrent Layers,6
9,Number of Weight Parameters,10260

Unnamed: 0,Epoch,LearningRate,Loss,FitError,L1Norm
0,1,0.001,1.031601,0.60,1.266981
1,2,0.001,0.993128,0.52,1.266927
2,3,0.001,1.088150,0.68,1.266874
3,4,0.001,1.062645,0.68,1.266820
4,5,0.001,1.027493,0.54,1.266766
...,...,...,...,...,...
65,66,0.001,0.969179,0.44,1.263450
66,67,0.001,0.995851,0.54,1.263395
67,68,0.001,1.019326,0.54,1.263342
68,69,0.001,0.947062,0.48,1.263286

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),text_classifier_4_weights,10443,3,"CASTable('text_classifier_4_weights', caslib='..."


The model `rnn_model_4` has 10,443 parameters, a loss rate of 0.908797, and a fit error of 0.4. 

<a id="scoreModel4"></a>

#### Score the Test Data with Text Classification Model 4

How does text classification model `rnn_model_4` perform? To explore performance, use it to score the `sentiment_test` data.

In [32]:
# Use rnn_model_4 to score the test data
rnn_model_4.evaluate(data='sentiment_test',
                     top_probs=2, 
                     model_task='CLASSIFICATION',
                     text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                     )

Unnamed: 0,Descr,Value
0,Number of Observations Read,50.0
1,Number of Observations Used,50.0
2,Misclassification Error (%),46.0
3,Top 2 Misclassification Error (%),0.0
4,Loss Error,0.940008

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),Valid_Res_y7n2zg,50,14,"CASTable('Valid_Res_y7n2zg', caslib='CASUSER(c..."


When model `rnn_model_4` trains using the non-default optimizer hyperparameter settings, the text sentiment classification model has a 46% misclassification error and a loss error of 0.940008. 

Now explore how to change the default settings in order to use different optimizer settings and the Adam solver during model training.

<a id="Model5"></a>

### Create Simple Text Classification Model 5

Create a fifth text classification model named `rnn_model_5` and save the new CAS table as `text_classifier_5`. 

In [33]:
# Create a fifth text classification model 
# named rnn_model_5
rnn_model_5 = TextClassification(s, 
                                 model_table="text_classifier_5"
                                )

NOTE: Output layer added.
NOTE: Model compiled successfully.


### Train Text Classification Model 5 with Different Optimizer and Adam Solver Settings¶

Use  the DLPy `fit()` function with `word_embeddings_100` file. Override the default model optimizer settings  and specify custom Adam solver parameter settings.

Use optimizer settings to specify `mini_batch_size=10`, `max_epochs=70`, and `reg_l1=0.001`. Choose the `AdamSolver` optimization algorithm, and set the beta1 parameter `beta1=0.901`, the beta2 parameter `beta2=0.988`, and learning rate `learning_rate=0.03`. 

The training uses the text data in the `review` column of the input data to predict the value of the nominal class (`positive`, `neutral`, `negative`) in the `sentiment column`. This is a text classification example, so the numeric values in the `stars` column are not used to make predictions.

Values for `seed` and `record_seed` are specified to support model determinism and repeatable results. Deterministic models always produce the same output from a given set of starting parameters.

The code below also adds the parameter `n_threads=1` to force full model determinism.  Using single-threaded computations eliminates randomness introduced by multiple threading. This is useful when you want to be able to duplicate example model training results. 

<b>Warning:</b> Using `n_threads=1` to force single-threaded computations for large models or large data sets is not recommended unless you need a fully deterministic model. Allowing multiple threads enables significantly faster model training. It is normal to expect slightly different results in successive multi-threaded trained models. 

In [34]:
# Train the model rnn_model_5 and override
# the default parameters to use advanced 
# settings for Optimizer and Adam Solver
from dlpy.model import Optimizer, AdamSolver
rnn_model_5.fit(data='sentiment_data', 
                inputs='review', 
                texts='review',
                target='sentiment', 
                nominals='sentiment',
                seed=867,
                record_seed=5309,
                n_threads=1,
                text_parms=TextParms(init_input_embeddings='word_embeddings_100'),
                optimizer=Optimizer(mini_batch_size=10,
                                    max_epochs=70,
                                    reg_l1=0.001,
                                    algorithm=AdamSolver(beta1=0.901, 
                                                         beta2=0.988, 
                                                         learning_rate=0.03
                                                         )
                                   )
              )

NOTE: Training from scratch.


Unnamed: 0,Descr,Value
0,Model Name,text_classifier_5
1,Model Type,Recurrent Neural Network
2,Number of Layers,8
3,Number of Input Layers,1
4,Number of Output Layers,1
5,Number of Convolutional Layers,0
6,Number of Pooling Layers,0
7,Number of Fully Connected Layers,0
8,Number of Recurrent Layers,6
9,Number of Weight Parameters,10260

Unnamed: 0,Epoch,LearningRate,Loss,FitError,L1Norm
0,1,0.03,1.084948,0.558824,0.612217
1,2,0.03,1.012678,0.552941,0.273684
2,3,0.03,0.949112,0.588235,0.171789
3,4,0.03,0.990371,0.523529,0.148057
4,5,0.03,0.931523,0.494118,0.133256
...,...,...,...,...,...
65,66,0.03,0.079460,0.035294,0.193687
66,67,0.03,0.058496,0.023529,0.186927
67,68,0.03,0.086638,0.041176,0.180091
68,69,0.03,0.114686,0.029412,0.191764

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),text_classifier_5_weights,10443,3,"CASTable('text_classifier_5_weights', caslib='..."


The model `rnn_model_5` has 10,443 parameters, a loss error of 0.076158, and a fit error of 0.011765. These are very encouraging model training statistics that tend to indicate a trained model that has very low misclassification rates.

Let's find out by using the model `rnn_model_5` to score the test data:


<a id="scoreModel5"></a>

#### Score the Test Data with Text Classification Model 5

How does the model `rnn_model_5` perform? Use `rnn_model_5` to score the `sentiment_test` data. 

In [35]:
# Score the test data using 'rnn_model_5' 
# with non-default settings for optimizer and Adam solver. 
# Inference will predict the sentiment of test table reviews.
rnn_model_5.evaluate(data='sentiment_test',
                     top_probs=2, 
                     model_task='CLASSIFICATION',
                     text_parms=TextParms(init_input_embeddings='word_embeddings_100')
                     )

Unnamed: 0,Descr,Value
0,Number of Observations Read,50.0
1,Number of Observations Used,50.0
2,Misclassification Error (%),0.0
3,Top 2 Misclassification Error (%),0.0
4,Loss Error,0.006622

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(UserID),Valid_Res_aQKgGA,50,14,"CASTable('Valid_Res_aQKgGA', caslib='CASUSER(c..."


When model `rnn_model_5` trains with the hyperparameters above and the Adam solver, the text sentiment classification model has a 0% misclassification error and a loss error of 0.006622. 

The `rnn_model_5` model clearly has the best performance of the three models with a 0% misclassification rate. Great performance, but keep in mind: this is an example model and the toy data sets are small. 

In fact, is it too good? It might be interesting to train `rnn_model_5` again, using a significantly larger and more diverse training data set, and then explore opportunities to improve the scoring performance of that trained model. Unfortunately, that is beyond the scope of this notebook. 

Instead, examine the scored class category probabilities in the scored output data. First, get the name of the scored `rnn_model_5` table in CAS:

In [36]:
# Display CAS tables
# to get the name of the scored 
# rnn_model_5 table.

s.table.tableInfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,...,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,WORD_EMBEDDINGS_100,398921,101,0,utf-8,2021-02-17T12:01:34-05:00,2021-02-17T12:01:34-05:00,2021-02-17T16:03:54-05:00,UTF8,1929200000.0,...,0,0,0,,,0,UserID,,2021-02-17T12:01:34-05:00,1929200000.0
1,SENTIMENT_DATA,164,3,0,utf-8,2021-02-17T14:04:53-05:00,2021-02-17T14:04:53-05:00,2021-02-17T16:00:41-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
2,SENTIMENT_TEST,50,3,0,utf-8,2021-02-17T14:12:35-05:00,2021-02-17T14:12:35-05:00,2021-02-17T16:03:54-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
3,TEXT_CLASSIFIER_1,121,5,0,utf-8,2021-02-17T14:14:13-05:00,2021-02-17T14:14:13-05:00,2021-02-17T14:20:16-05:00,UTF8,1929208000.0,...,0,0,0,,,0,UserID,,,
4,TEXT_CLASSIFIER_1_WEIGHTS,10443,3,0,utf-8,2021-02-17T14:17:12-05:00,2021-02-17T14:17:12-05:00,2021-02-17T14:20:16-05:00,UTF8,1929209000.0,...,0,0,0,,,0,UserID,,,
5,VALID_RES_JO3NVY,50,14,0,utf-8,2021-02-17T14:20:17-05:00,2021-02-17T14:20:17-05:00,2021-02-17T14:20:17-05:00,UTF8,1929209000.0,...,0,0,0,,,0,UserID,,,
6,TEXT_CLASSIFIER_2,121,5,0,utf-8,2021-02-17T14:22:51-05:00,2021-02-17T14:22:51-05:00,2021-02-17T14:41:36-05:00,UTF8,1929209000.0,...,0,0,0,,,0,UserID,,,
7,TEXT_CLASSIFIER_2_WEIGHTS,10443,3,0,utf-8,2021-02-17T14:40:07-05:00,2021-02-17T14:40:07-05:00,2021-02-17T14:41:36-05:00,UTF8,1929210000.0,...,0,0,0,,,0,UserID,,,
8,VALID_RES_3KE2AK,50,14,0,utf-8,2021-02-17T14:41:37-05:00,2021-02-17T14:41:37-05:00,2021-02-17T14:41:37-05:00,UTF8,1929210000.0,...,0,0,0,,,0,UserID,,,
9,TEXT_CLASSIFIER_3,121,5,0,utf-8,2021-02-17T15:33:29-05:00,2021-02-17T15:33:29-05:00,2021-02-17T15:39:04-05:00,UTF8,1929213000.0,...,0,0,0,,,0,UserID,,,


The name of the scored table from `rnn_model_5` is `VALID_RES_AQKGGA`. Your scored table will have a different generated name. 

Use the `table.fetch()` function to display the first 10 observations in the scored validation data table.  The output table shows the predictive probabilities for each sentiment class as well as the imputed (predicted) sentiment value. 

In [40]:
# Show the first 10 rows of the scored data  
# in VALID_RES_AQKGGA for rnn_model_5:
s.table.fetch(table="VALID_RES_AQKGGA", 
              format=True,
              to=10)

Unnamed: 0,review,sentiment,stars,P_sentimentnegative,P_sentimentneutral,P_sentimentpositive,I_sentiment,_DL_PredP_,_DL_PredLevel_,_DL_TOP_P0_,_DL_TOP_PredName0_,_DL_TOP_P1_,_DL_TOP_PredName1_,_DL_TOP_Missit_
0,Disappointed in the expensive food. Not worth it!,negative,1,0.9985839128,0.0008758493,0.0005402573,negative,0.9985839128,0,0.9985839128,negative,0.0008758493,neutral,0
1,Wow! I love this place. The best desserts!,positive,5,0.0282318536,0.0001211427,0.9716470838,positive,0.9716470838,2,0.9716470838,positive,0.0282318536,negative,0
2,Ordinary meal. OK.,neutral,3,0.0084569836,0.991517067,2.60317e-05,neutral,0.991517067,1,0.991517067,neutral,0.0084569836,negative,0
3,Average dining experience.,neutral,3,0.0194560047,0.9804186225,0.0001253785,neutral,0.9804186225,1,0.9804186225,neutral,0.0194560047,negative,0
4,Loved it! Fantastic server and food!,positive,5,0.0003648701,2.4235758e-06,0.9996327162,positive,0.9996327162,2,0.9996327162,positive,0.0003648701,negative,0
5,I love the ravioli. It is my favorite!,positive,4,0.0045795697,2.44237e-05,0.9953959584,positive,0.9953959584,2,0.9953959584,positive,0.0045795697,negative,0
6,Rude server forgot us. Disaster! Very disappoi...,negative,1,0.9986647367,0.0008475339,0.0004877492,negative,0.9986647367,0,0.9986647367,negative,0.0008475339,neutral,0
7,Good food. Good staff. Happy.,positive,4,0.0004021133,2.6598098e-06,0.9995952249,positive,0.9995952249,2,0.9995952249,positive,0.0004021133,negative,0
8,"Slow service, crummy food. Unhappy.",negative,2,0.9983657002,0.0010216051,0.000612738,negative,0.9983657002,0,0.9983657002,negative,0.0010216051,neutral,0
9,Fast and tasty game day treats! The best!,positive,5,0.0003246331,2.2227691e-06,0.9996731281,positive,0.9996731281,2,0.9996731281,positive,0.0003246331,negative,0


<a id="summary"></a>

### Summary

It is relatively easy to create and modify task-centric models using SAS DLPy. This example showed how to create five different text classification models to perform a text sentiment analysis task.

There are a variety of hyperparameters that affect model training.  Experimenting to find the right configuration of hyperparameter settings for a given task and input data is essential for good predictive model performance. Exploring the performance of different model training configurations can help find a good starting candidate for focused hyperparameter tuning.