In [1]:
import asyncio
import json
import logging
import multiprocessing
import typing
import warnings
from multiprocessing import Process
from urllib.parse import unquote

import pandas as pd
import uvicorn
from flama.applications import Flama
from flama.components import Component
from flama.routing import Router
from marshmallow import fields, Schema
from uvicorn import Config, Server

logging.disable(logging.INFO)
warnings.filterwarnings('ignore')

<div align="center"><img src="img/portada.jpg" alt="Test Academy" width="800" height="600" /></div>

<div align="center"><h1>Machine Learning Services Validation</h1></div>
<br>
<div align="center"><em>Do not let your Machine Learning service go into production without being tested</em></div>


<br>

***
<div style="width: 100%; display: table;">
    <div style="display: table-row">
        <div style="display: table-cell; text-align: center; width: 50%;">
            <span><b>José Antonio Perdiguero López</b></span>
            <br><br>
            <span>🌐 http://www.perdy.io</span>
            <br><br>
            <span>👨‍💻 https://github.com/perdy</span>
            <br><br>
            <span>👔 https://www.linkedin.com/in/perdy</span>
            <br><br>
            <span>📧 perdy@perdy.io</span>
        </div>
        <div style="display: table-cell; text-align: center; width: 50%;">
            <span><b>Support open source projects by giving a star ⭐ and spreading the word 📣</b></span>
            <br><br>
            <span>https://flama.perdy.io/</span>
            <br><br>
            <span>https://getgauge.io/</span>
            <br><br>
            <span>https://www.tensorflow.org</span>
            <br><br>
            <span>https://www.python.org/</span>
        </div>
    </div>
</div>

## Index

1. [Introduction](#Introduction)
2. [Machine Learning](#Machine-Learning)
3. [Building A Machine Learning Model](#Building-A-Machine-Learning-Model)
4. [Developing The Service](#Developing-The-Service)
5. [Testing The Service](#Testing-The-Service)

<h1><center>Introduction</center></h1>

## Introduction

Do we really know what **Artificial Intelligence** is?

And, specifically, do we know what **Machine Learning** is?

How can we **verify** and **validate** services whose response depends on a Machine Learning model?

## Goals

1. **Discover new tools** for building REST APIs and designing tests.

2. **Understand** what Artificial Intelligence and Machine Learning are.

3. **Build a Machine Learning model** for solving a complex problem.

4. **Develop a service** that relies on a Machine Learning model.

5. **Generate some tests** that verify and validate the service and the model.

## Tools

<div style="width: 100%; display: table;">
    <div style="display: table-row">
        <div style="display: table-cell; text-align: center; width: 50%;">
            <h3><em>The Glue</em></h3>
            <img src="img/python-logo.png" width="300"/>
        </div>
        <div style="display: table-cell; text-align: center; width: 50%;">
            <h3><em>The Mind</em></h3>
            <img src="img/tensorflow-logo.png" width="300"/>
        </div>
    </div>
    <div style="display: table-row">
        <div style="display: table-cell; text-align: center; width: 50%;">
            <h3><em>The Power</em></h3>
            <img src="img/flama.png" width="300"/>
        </div>
        <div style="display: table-cell; text-align: center; width: 50%;">
            <h3><em>The Shield</em></h3>
            <img src="img/gauge-logo.png" width="300"/>
        </div>
    </div>
</div>

<h1><center>Machine Learning</center></h1>

## What Is Artificial Intelligence?

Artificial intelligence is the simulation of human intelligence processes by computer systems. These processes include **learning** (the acquisition of information and rules for using the information), **reasoning** (using rules to reach approximate or definite conclusions) and **self-correction**.

AI can be categorized as either weak or strong.

### Strong AI
Also known as artificial general intelligence, this is an AI system with generalized human cognitive abilities. When presented with an unfamiliar task, a strong AI system is able to find a solution without human intervention.

### Weak AI
Also known as narrow AI, this is an AI system that is designed and trained for a particular task.

## AI In Perspective

<br>
<div align="center"><img src="img/tree-ai.png" alt="Tree AI" width="1024" height="768" /></div>

## AI In Perspective

<br>
<div align="center"><img src="img/venn-ai.png" alt="Venn AI" width="1024" height="768" /></div>

## AI In Perspective

<br>
<div align="center"><img src="img/venn-ai-simplified.png" alt="Venn AI Simplified" width="1024" height="768" /></div>

## What Is Machine Learning?

The science of getting a computer to act without being explicitly programmed. There are three types of machine learning algorithms:

### Supervised learning

Data sets are labeled so that patterns can be detected and used to label new data sets.

### Unsupervised learning

Data sets aren’t labeled and are sorted according to similarities or differences.

### Reinforcement learning

Data sets aren’t labeled but, after performing an action or several actions, the AI system is given feedback.

## Supervised Learning

<br>
<div align="center"><img src="img/training-process.png" alt="Training Process" width="1024" height="768" /></div>

## The Model

We are going to build a model that performs sentiment analysis on a text and decides whether it is positive or negative.

**Input:** A text (a list of integers representing each word).

**Output:** 0 (negative) or 1 (positive).

<br>
<div align="center"><img src="img/lstm-model.png" alt="LSTM Model" width="800" height="600" /></div>

<h1><center>Building A Machine Learning Model</center></h1>

In [None]:
from keras import Sequential
from keras.datasets import imdb
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.preprocessing import sequence

## Training Dataset

The dataset used to train this model is built from IMDB movie reviews and has the following shape:

<table>
	<thead>
		<tr>
			<th>Text input</th>
			<th>Input</th>
			<th>Output</th>
		</tr>
    </thead>
    <tbody>
		<tr>
			<td>the as you with out themselves...</td>
			<td>[1, 14, 22, 16, 43, 530, ...]</td>
			<td>1</td>
		</tr>
		<tr>
			<td>the thought solid thought sena...</td>
			<td>[1, 194, 1153, 194, 8255, ...]</td>
			<td>0</td>
		</tr>
		<tr>
			<td>the as there in at by br of su...</td>
			<td>[1, 14, 47, 8, 30, 31, 7, ...]</td>
			<td>0</td>
		</tr>
		<tr>
			<td>the of bernadette mon they hal...</td>
			<td>[1, 4, 18609, 16085, 33, ...]</td>
			<td>1</td>
		</tr>
		<tr>
			<td>the sure themes br only acting...</td>
			<td>[1, 249, 1323, 7, 61, 113, ...]</td>
			<td>0</td>
		</tr>
	</tbody>
</table>

## Building The Training Dataset

A first step in building the dataset is to decide how large the vocabulary will be: only the most frequent words are kept, up to this limit.

In [2]:
VOCABULARY_LENGTH = 20000

Keras provides some datasets so it's possible to skip the data gathering process.

In [3]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=VOCABULARY_LENGTH)

Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz


Since each review in the training dataset is a list of integers, each one representing a word, it is useful to have both mappings: **Word -> ID** and **ID -> Word**.

In [4]:
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}

Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.json
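
As a minimal sketch of how the two mappings are used, the snippet below encodes a sentence into ids and decodes it back. The tiny `toy_word2id` dictionary and the `encode` and `decode` helpers are hypothetical, introduced only for illustration; the real mapping comes from `imdb.get_word_index()`.

```python
import typing

# Hypothetical toy vocabulary standing in for imdb.get_word_index()
toy_word2id: typing.Dict[str, int] = {"the": 1, "movie": 2, "was": 3, "great": 4}
toy_id2word = {i: word for word, i in toy_word2id.items()}


def encode(text: str, word2id: typing.Dict[str, int]) -> typing.List[int]:
    """Map each lowercased word to its id, using 0 for unknown words."""
    return [word2id.get(word, 0) for word in text.lower().split()]


def decode(ids: typing.List[int], id2word: typing.Dict[int, str]) -> str:
    """Map ids back to words, using '?' for ids without a known word."""
    return " ".join(id2word.get(i, "?") for i in ids)


encoded = encode("The movie was surprisingly great", toy_word2id)
decoded = decode(encoded, toy_id2word)
print(encoded)
print(decoded)
```

The word missing from the toy vocabulary is encoded as `0`, so it cannot be recovered when decoding.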


All input documents must have the same length so it's necessary to limit the maximum review length by truncating longer reviews and padding shorter reviews with a null value (`0`).

In [5]:
MAX_WORDS = 500
X_train = sequence.pad_sequences(X_train, maxlen=MAX_WORDS)
X_test = sequence.pad_sequences(X_test, maxlen=MAX_WORDS)
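
To make the effect of padding and truncating concrete, here is a pure-Python sketch for a single review. It assumes the Keras defaults, where both padding and truncation happen at the start of the sequence; `pad_sequence` is a hypothetical helper, not part of Keras.

```python
import typing


def pad_sequence(ids: typing.List[int], maxlen: int, value: int = 0) -> typing.List[int]:
    """Return a list of exactly `maxlen` ids.

    Longer inputs keep their last `maxlen` ids; shorter inputs are
    left-padded with `value`.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    truncated = ids[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated


short_review = pad_sequence([5, 6, 7], maxlen=5)
long_review = pad_sequence([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short_review)
print(long_review)
```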

### Design Model

**Input**: A sequence of word ids (integers) of length MAX_WORDS.

**Output**: Binary label (0 means *Negative* and 1 means *Positive*)

In [6]:
model = Sequential()
model.add(Embedding(VOCABULARY_LENGTH, 32, input_length=MAX_WORDS))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 32)           640000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 693,301
Trainable params: 693,301
Non-trainable params: 0
_________________________________________________________________
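
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for each layer type; the constant names `EMBEDDING_DIM` and `LSTM_UNITS` are introduced here for illustration and take the values passed to the layers above.

```python
VOCABULARY_LENGTH = 20000
EMBEDDING_DIM = 32
LSTM_UNITS = 100

# Embedding: one vector of EMBEDDING_DIM weights per word in the vocabulary
embedding_params = VOCABULARY_LENGTH * EMBEDDING_DIM

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * ((EMBEDDING_DIM + LSTM_UNITS) * LSTM_UNITS + LSTM_UNITS)

# Dense: one weight per LSTM unit plus a bias for the single output
dense_params = LSTM_UNITS * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```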


### Compile And Train Our Model

We first need to compile our model by specifying the loss function and optimizer we want to use while training, as well as any evaluation metrics we'd like to measure.

In [7]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
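
To give an intuition for the chosen loss, here is a pure-Python sketch of binary cross-entropy. It is an illustration of the formula, not the Keras implementation; the function name and the clipping constant `eps` are assumptions made for this example.

```python
import math
import typing


def binary_crossentropy(
    y_true: typing.Sequence[int],
    y_pred: typing.Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1 - eps)  # clip to avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    return total / len(y_true)


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confident predictions that turn out to be wrong are penalized far more heavily than confident predictions that are right, which is what pushes the sigmoid output towards the correct label during training.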

Once compiled, we can run the training process.

In [8]:
BATCH_SIZE = 64
NUM_EPOCHS = 5

validation = X_train[:BATCH_SIZE], y_train[:BATCH_SIZE]
training = X_train[BATCH_SIZE:], y_train[BATCH_SIZE:]

model.fit(*training, validation_data=validation, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS);

Train on 24936 samples, validate on 64 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


### Test Model

Once the training process is finished the model should be tested using unseen data to measure its accuracy.

In [9]:
scores = model.evaluate(X_test, y_test, verbose=0)
f"Test accuracy: {scores[1]:.5f}"

'Test accuracy: 0.85628'
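
As a rough sketch of what the accuracy metric measures for a binary classifier, the snippet below thresholds each score at 0.5 and counts the matches with the true labels. The `accuracy` helper and the sample values are made up for illustration.

```python
import typing


def accuracy(
    y_true: typing.Sequence[int],
    scores: typing.Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Fraction of samples whose thresholded score matches the true label."""
    predicted = [1 if score > threshold else 0 for score in scores]
    hits = sum(1 for p, t in zip(predicted, y_true) if p == t)
    return hits / len(y_true)


# Made-up labels and scores: the third sample is misclassified
example_accuracy = accuracy([1, 0, 1, 0], [0.91, 0.20, 0.35, 0.08])
print(f"Example accuracy: {example_accuracy:.2f}")
```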

### Prediction Example

In [10]:
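def predict(model, word2id, x):
    # Map each word to its id; unknown words become 0, the padding value
    x = [word2id.get(word, 0) for word in x.lower().split()]
    # Ids outside the vocabulary are not valid Embedding indices, so zero them too
    x = [i if i < VOCABULARY_LENGTH else 0 for i in x]
    x = sequence.pad_sequences([x], maxlen=MAX_WORDS)
    score = model.predict(x)[0][0]
    # Same 0.5 threshold that predict_classes applies to a sigmoid output
    sentiment = "Positive" if score > 0.5 else "Negative"
    return score, sentiment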

In [12]:
score, sentiment = predict(model, word2id, "A girl is happy while playing with her toys")

f"The result is '{sentiment}' with a score of {score:.2f}"

"The result is 'Positive' with a score of 0.80"

## Save Model And Words Mapping

In order to use the model within a service it's necessary to export it and save the model and the words dictionary in files.

In [13]:
model.save("sentiment_analysis_model.h5")

with open("sentiment_analysis_words.json", "w") as f:
    json.dump(word2id, f, indent=4)

<h1><center>Developing The Service</center></h1>

## How To Expose The ML Model?

Machine Learning models can be used either as an internal piece of a service or as a service itself.

If it is used as an **internal piece** you won’t notice it, such as scoring or recommendation systems within bigger products like Spotify or Netflix.

But you can also find them as a **service that exposes an API** to directly interact with the model. There are many examples of that in AWS, Google Cloud, Azure...

## Wrapping An ML Model

One of the most widely adopted ways of serving an ML model is to wrap it in a **REST API** with specific methods for calling the model.

## The Service

Our service will expose a single endpoint that lets us interact with the model.

### Request

<br>
<div style="display: table; padding-left: 2em;">
    <div style="display: table-row;">
        <div style="display: table-cell; text-align: right; padding-right: 2em;">
            <b>Verb</b>
        </div>
        <div style="display: table-cell; text-align: left;">
            <tt>GET</tt>
        </div>
    </div>
    <div style="display: table-row;">
        <div style="display: table-cell; text-align: right; padding-right: 2em;">
            <b>URL</b>
        </div>
        <div style="display: table-cell; text-align: left;">
            <tt>https://service.url/analyze/</tt>
        </div>
    </div>
    <div style="display: table-row;">
        <div style="display: table-cell; text-align: right; padding-right: 2em;">
            <b>Params</b>
        </div>
        <div style="display: table-cell; text-align: left;">
            <tt>text=The%20girl%20is%20having%20fun%20while%20playing</tt>
        </div>
    </div>
</div>

### Response

```json
{
  "text": "The girl is having fun while playing",
  "sentiment": "Positive",
  "score": 0.6321590542793274
}
```
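
As a small sketch of how a client could build this request, the snippet below encodes the text into the query string using only the standard library. `build_analyze_url` is a hypothetical helper, and `https://service.url` is the placeholder host from the request description, not a real deployment.

```python
from urllib.parse import quote, unquote, urlencode

BASE_URL = "https://service.url/analyze/"  # placeholder host, replace with the real one


def build_analyze_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the analyze endpoint with a percent-encoded text."""
    # quote_via=quote encodes spaces as %20 instead of the default '+'
    return f"{base_url}?{urlencode({'text': text}, quote_via=quote)}"


url = build_analyze_url("The girl is having fun while playing")
print(url)

# The service can recover the original text by unquoting the parameter value
recovered = unquote(url.split("text=", 1)[1])
print(recovered)
```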

## Building A REST API With Flama

To build a REST API we need to define:

1. A **component** that loads our ML model.

2. The **data schema** for our response.

3. The **view** function that will be called by requests to the `/analyze/` endpoint.

4. The whole API **application**.

Everything put together is fewer than 100 lines of Python code.

## ML Component

In [2]:
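class SentimentAnalysisModel:
    def __init__(self, model, words: typing.Dict[str, int], max_words: int=500, vocabulary_length: int=20000):
        self.model = model
        self.words = words
        self.max_words = max_words
        self.vocabulary_length = vocabulary_length
    
    def predict(self, text: str) -> typing.Tuple[float, str]:
        # Imported here, like load_model in the component, because the first cell does not import Keras
        from keras.preprocessing import sequence

        # Unknown words and ids outside the vocabulary (valid ids are 0..vocabulary_length-1) map to 0
        x = [self.words.get(word, 0) for word in text.lower().split()]
        x = [i if i < self.vocabulary_length else 0 for i in x]
        x = sequence.pad_sequences([x], maxlen=self.max_words)
        # Take the single probability and cast it to a plain float so it can be serialized to JSON
        score = float(self.model.predict(x)[0][0])
        # Same 0.5 threshold that predict_classes applies to a sigmoid output
        sentiment = "Positive" if score > 0.5 else "Negative"
        return score, sentiment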

In [3]:
class SentimentAnalysisModelComponent(Component):
    def __init__(self, model_path: str, words_path: str):
        self._model_path = model_path
        with open(words_path) as f:
            self.words = json.load(f)
            
    @property
    def model(self):
        if not hasattr(self, "_model"):
            from keras.models import load_model
            self._model = load_model(self._model_path)
            self._model._make_predict_function()
        return self._model
            
    def resolve(self) -> SentimentAnalysisModel: 
        return SentimentAnalysisModel(model=self.model, words=self.words)
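
The component defers loading the model until it is first needed. One plausible reason is that the component is created in the notebook process while requests are served elsewhere, although the source does not state it. The pattern itself can be sketched with the standard library as follows; `LazyResource` and its `load_count` attribute are hypothetical names used only to make the behavior observable.

```python
import functools
import typing


class LazyResource:
    """Defers an expensive load until the resource is first needed."""

    def __init__(self, loader: typing.Callable[[], typing.Any]):
        self._loader = loader
        self.load_count = 0  # only here to show how many times the loader ran

    @functools.cached_property
    def value(self) -> typing.Any:
        # Runs on first access only; the result is then cached on the instance
        self.load_count += 1
        return self._loader()


# The lambda is a stand-in for an expensive call such as load_model(path)
resource = LazyResource(loader=lambda: {"weights": [0.1, 0.2]})
loads_before_access = resource.load_count
first = resource.value
second = resource.value
print(loads_before_access, resource.load_count, first is second)
```

Creating the object is cheap, and the loader runs exactly once no matter how many times the resource is resolved.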

## Data Schema

In [4]:
class SentimentAnalysis(Schema):
    text = fields.String(
        title="text",
        description="Text to analyze"
    )
    score = fields.Float(
        title="score",
        description="Sentiment score in range [0,1]"
    )
    sentiment = fields.String(
        title="sentiment",
        description="Sentiment class (Positive or Negative)"
    )

## Analysis View

In [5]:
def analyze(text: str, model: SentimentAnalysisModel) -> SentimentAnalysis:
    """
    tags:
        - sentiment-analysis
    summary:
        Sentiment analysis.
    description:
        Performs a sentiment analysis on a given text.
    responses:
        200:
            description: Analysis result.
    """
    text = unquote(text)
    score, sentiment = model.predict(text)
    return {
        "text": text,
        "score": score,
        "sentiment": sentiment,
    }

## API Application

In [6]:
app = Flama(
    components=[SentimentAnalysisModelComponent("sentiment_analysis_model.h5", "sentiment_analysis_words.json")],
    title="Sentiment Analysis",
    version="0.1",
    description="A sentiment analysis API for movie reviews",
    redoc="/redoc/",
)

In [7]:
app.add_route("/analyze/", analyze, methods=["GET"])

## Run The Service

In [8]:
Process(target=uvicorn.run, kwargs={"app": app, "host": "0.0.0.0", "port": 8000}, daemon=True).start()
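
Because the server starts in a background process, anything that calls it right away may run before it is ready to accept connections. A minimal sketch of a readiness check, assuming the host and port used above, could look like this; `wait_for_port` is a hypothetical helper, not part of uvicorn or Flama.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet
            time.sleep(interval)
    return False


# Example usage once the service process has been started:
# ready = wait_for_port("127.0.0.1", 8000)
```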

<h1><center>Testing The Service</center></h1>

## Testing Considerations

Most commonly, the model and the service that exposes it are developed completely separately, often by different teams.

That means we are not in control of the training process, so the model cannot be tested until both pieces are integrated.

## Validation VS Verification

<table>
	<thead>
		<tr>
			<th>Criteria</th>
			<th>Verification</th>
			<th>Validation</th>
		</tr>
    </thead>
    <tbody>
		<tr>
			<td><b>Definition</b></td>
			<td>The process of evaluating products of a development phase to determine whether they meet the specified requirements.</td>
			<td>The process of evaluating software during or at the end of the development process to determine whether it satisfies specified business requirements.</td>
		</tr>
		<tr>
			<td><b>Objective</b></td>
			<td>To ensure that the product is being built according to the requirements and design specifications.</td>
			<td>To demonstrate that the product fulfills its intended use when placed in its intended environment.</td>
		</tr>
		<tr>
			<td><b>Question</b></td>
			<td>Are we building the product right?</td>
			<td>Are we building the right product?</td>
		</tr>
	</tbody>
</table>

## Test Specification: Verification

```markdown
## Endpoint Verification
Tags: functional, verification

Verify if the endpoint that allows interaction with Sentiment Analyzer
is properly defined based on specifications. It must provide a query
parameter **text** that acts as the input of the model and it cannot be
empty. The response must be a JSON containing three attributes:
**text**, **score** and **sentiment**.

* Request sentiment analysis with text "Perdy is testing this" returns "200"
* Response schema contains attributes
    |Attribute|
    |---------|
    |text     |
    |score    |
    |sentiment|
* Request sentiment analysis with text "" returns "400"
```
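
Independently of the test framework, the schema check in this specification boils down to confirming that the JSON body contains the three required attributes. A minimal sketch, with a hypothetical `missing_attributes` helper and made-up response bodies:

```python
import json
import typing

REQUIRED_ATTRIBUTES = ("text", "score", "sentiment")


def missing_attributes(
    body: str,
    required: typing.Sequence[str] = REQUIRED_ATTRIBUTES,
) -> typing.List[str]:
    """Return the required attributes that are absent from a JSON response body."""
    payload = json.loads(body)
    return [name for name in required if name not in payload]


# Made-up response bodies for illustration
complete = '{"text": "Perdy is testing this", "score": 0.63, "sentiment": "Positive"}'
incomplete = '{"text": "Perdy is testing this"}'

print(missing_attributes(complete))
print(missing_attributes(incomplete))
```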

## Test Specification: Validation

```markdown
## Model Validation
Tags: ml, validation

Validate the model predictions against a set of fixed data. This data set
must contain a minimal list of well-known input and output pairs, to
check that the model keeps behaving the same way on these inputs after
being retrained.

* Analyze and validate the following texts <table:data/sentiment_analysis.csv>
```
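
The idea behind this specification can be sketched without the test framework: read the fixed pairs, ask for a prediction for each text and collect the mismatches. The CSV content, its column names (`text`, `sentiment`) and the keyword-based `fake_predict` are assumptions made for this example; in the real test the prediction would come from the `/analyze/` endpoint.

```python
import csv
import io
import typing

# Hypothetical content for data/sentiment_analysis.csv; column names are assumptions
SAMPLE_CSV = """text,sentiment
I loved this movie,Positive
This was a terrible film,Negative
"""


def validate_predictions(
    rows: typing.Iterable[typing.Dict[str, str]],
    predict: typing.Callable[[str], str],
) -> typing.List[typing.Dict[str, str]]:
    """Return the rows whose predicted sentiment differs from the expected one."""
    failures = []
    for row in rows:
        predicted = predict(row["text"])
        if predicted != row["sentiment"]:
            failures.append({**row, "predicted": predicted})
    return failures


def fake_predict(text: str) -> str:
    """Keyword-based stand-in for a call to the /analyze/ endpoint."""
    return "Positive" if "loved" in text.lower() else "Negative"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
failures = validate_predictions(rows, fake_predict)
print(failures)
```

Returning the failing rows rather than stopping at the first mismatch makes it easier to see how a retrained model has drifted across the whole data set.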

## Step Implementation

```python
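from getgauge.python import data_store, step


@step("Response schema contains attributes <table>")
def assert_response_schema(table):
    # The response is expected to have been stored by an earlier request step
    response = data_store.scenario["response"]
    for attribute in table.get_column_values_with_name("Attribute"):
        assert attribute in response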
```

## Run Tests

In [9]:
!gauge run tests/specs

ERROR: tensorboard 2.0.1 has requirement grpcio>=1.24.3, but you'll have grpcio 1.15.0 which is incompatible.
Error ----------------------------------

[Gauge]
Failed to start gauge API: Timed out connecting to 127.0.0.1:33411

Get Support ----------------------------
	Docs:          https://docs.gauge.org
	Bugs:          https://github.com/getgauge/gauge/issues
	Chat:          https://gitter.im/getgauge/chat

Your Environment Information -----------
	linux, 1.0.6, 2bc49db
	html-report (4.0.8), python (0.3.6), screenshot (0.0.1), xml-report (0.2.2)


[HTML Report](tests/reports/html-report/index.html)

<div align="center"><img src="img/contraportada.jpg" alt="Test Academy" width="800" height="600" /></div>
