After finding the best machine learning model, CNN, for emotion classification of the Go-emotions dataset, we were required to built a web service, using one of the choices under (A), to host the model as an endpoint. In this document some functionality is built to perform testing on the deployed endpoint.

## Designing the test functions

Sample function, `retreive_sentiments`, in Python to consume the model endpoint. This function will take an input, convert it to the correct format, send it to the model endpoint using an HTTP request, and then process and return the response. The function uses the `requests` library to send the HTTP request.

Here is the basic skeleton for the function:

In [1]:
import json
import requests
import time

def retreive_sentiments(data):
    # Convert the input data to JSON format.
    payload = json.dumps(data)  # Convert the list of comments to JSON format
    
    # Define the URL of the model endpoint.
    url = "http://127.0.0.1:5050/retreive_sentiments"
    
    # Send the HTTP request to the model endpoint with the input data.
    response = requests.post(url, json=payload)
    
    # Check if the request was successful.
    if response.status_code == 200:
        # If so, return the prediction from the model.
        prediction = response.json()
        return response.text
    else:
        # If not, print the response and raise an exception.
        print(response.text)
        response.raise_for_status()
        

In this function, `data` should be the input to your model. This data needs to be converted to JSON format because that's the format the model endpoint expects.

The `response` object contains the HTTP response from the model endpoint. If the request was successful, it will have a status code of 200, and the prediction from the model can be extracted with `response.json()`. If the request was not successful, an error message is printed and an exception is raised.

## Average response time

Helps assess the efficiency and performance of the deployed model. 

In [2]:
testExamples = [
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
]

expectedLabels = ['gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude', 'gratitude']


In [3]:
all_times=[]

In [4]:
for i in range(len(testExamples)):
    startTime = time.time()
    sample_sentences = testExamples[i]
    print(sample_sentences)
    predictions =  retreive_sentiments([testExamples[i]])
    # Record the end time
    endTime = time.time()
    # Calculate the time taken
    timeTaken = endTime - startTime
    all_times.append(timeTaken)
    print(f'sentence: {sample_sentences}, Predicted label: {predictions},Expected label: {expectedLabels[i]} ')
    print(f'Length of sentence: {sample_sentences},Time taken: {timeTaken} seconds')
    print()

I truly appreciate your kindness and support.
sentence: I truly appreciate your kindness and support., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: I truly appreciate your kindness and support.,Time taken: 50.17180943489075 seconds

I truly appreciate your kindness and support.
sentence: I truly appreciate your kindness and support., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: I truly appreciate your kindness and support.,Time taken: 0.403348445892334 seconds

I truly appreciate your kindness and support.
sentence: I truly appreciate your kindness and support., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: I truly appreciate your kindness and support.,Time taken: 0.4017014503479004 seconds

I truly appreciate your kindness and support.
sentence: I truly appreciate your kindness and support., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: I truly appreciate y

In [5]:
sum(all_times[1:])/len(all_times[1:])

0.3751990530225966

<font color="blue">Provides information about the execution time for each prediction, with the average response time being approximately 0.375 seconds, excluding the model’s initial cold start, which takes around 50.172 seconds. This low average response time allows for a smooth user experience.</font>

The following function named `test_sentiment` takes two parameters: `test_examples` and `expected_labels`. Inside the function, a loop iterates over the test examples. For each example, the code records the start time, retrieves the sentiment prediction using an unknown function called `retreive_sentiments`, and calculates the execution time. The sentence, predicted label, and expected label are then printed, along with the length of the sentence and the time taken for the prediction. This code seems to be a testing function that evaluates sentiment predictions for a list of test examples and provides information about the execution time for each prediction.

In [6]:
def test_sentiment(test_examples,expected_labels):
    for i in range(len(test_examples)):
        start_time = time.time()
        sentence = test_examples[i]
        prediction =  retreive_sentiments([test_examples[i]])
        # Record the end time
        end_time = time.time()
        # Calculate the time taken
        time_taken = end_time - start_time
        print(f'sentence: {sentence}, Predicted label: {prediction},Expected label: {expected_labels[i]} ')
        print(f'Length of sentence: {len(sentence)},Time taken: {time_taken} seconds')
        print()

## Testing bad performing emotions (Approval and Disapproval):

Evaluates how well the model performs on these emotions. That helps in identifying any potential weaknesses or biases in CNN’s predictions.

In [7]:
comments = [
    "This book is amazing! I couldn't put it down.",
    "The customer service at this store is outstanding. They go above and beyond to assist you.",
    "I'm really impressed with the new features in the latest software update.",
    "The concert last night was incredible! The band's performance was top-notch.",
    "I highly recommend this restaurant. The food is delicious and the atmosphere is fantastic.",
    "I'm extremely disappointed with the poor quality of this product.",
    "The service at this hotel was terrible. The staff was rude and unhelpful.",
    "I regret purchasing this item. It doesn't meet my expectations.",
    "The movie I watched yesterday was a complete waste of time. The plot was confusing and the acting was awful.",
    "I had a horrible experience with this airline. They lost my luggage and provided no assistance."
]
expectedLabels = [
    "Approval",
    "Approval",
    "Approval",
    "Approval",
    "Approval",
    "Dissaproval",
    "Dissaproval",
    "Dissaproval",
    "Dissaproval",
    "Dissaproval"
]


In [8]:
test_sentiment(comments,expectedLabels)

sentence: This book is amazing! I couldn't put it down., Predicted label: ["joy"],Expected label: Approval 
Length of sentence: 45,Time taken: 0.37007713317871094 seconds

sentence: The customer service at this store is outstanding. They go above and beyond to assist you., Predicted label: ["approval"],Expected label: Approval 
Length of sentence: 90,Time taken: 0.4719111919403076 seconds

sentence: I'm really impressed with the new features in the latest software update., Predicted label: ["gratitude"],Expected label: Approval 
Length of sentence: 73,Time taken: 0.41318488121032715 seconds

sentence: The concert last night was incredible! The band's performance was top-notch., Predicted label: ["joy"],Expected label: Approval 
Length of sentence: 76,Time taken: 0.4196176528930664 seconds

sentence: I highly recommend this restaurant. The food is delicious and the atmosphere is fantastic., Predicted label: ["approval"],Expected label: Approval 
Length of sentence: 90,Time taken: 0.3493

<font color="blue"> When evaluating the two worst-performing emotions, approval, and disapproval, the model predicted 2 out of 5 correctly for approval, and sadly none for disapproval. Notably, the time taken for these predictions remained consistent, regardless of the accuracy of the outcome. </font>

## Testing all different types of emotions:

Helped determine if the model is biased or inaccurate towards specific emotions.

In [9]:
comments = [
    "This movie is hilarious! I can't stop laughing.",  # amusement
    "I'm so angry at what happened.",  # anger
    "I approve of your decision.",  # approval
    "I'm confused about what to do next.",  # confusion
    "I'm really curious to learn more about this topic.",  # curiosity
    "I desire success and happiness.",  # desire
    "I strongly disapprove of your behavior.",  # disapproval
    "I'm afraid of what might happen.",  # fear
    "I'm grateful for your help.",  # gratitude
    "The news filled me with joy.",  # joy
    "I love spending time with my family.",  # love
    "I feel remorseful for my actions.",  # remorse
    "I'm feeling sad and lonely.",  # sadness
    "I don't have any strong emotions right now."  # neutral
]

expectedLabels = [
    "amusement",
    "anger",
    "approval",
    "confusion",
    "curiosity",
    "desire",
    "disapproval",
    "fear",
    "gratitude",
    "joy",
    "love",
    "remorse",
    "sadness",
    "neutral"
]


In [10]:
test_sentiment(comments,expectedLabels)

sentence: This movie is hilarious! I can't stop laughing., Predicted label: ["amusement"],Expected label: amusement 
Length of sentence: 47,Time taken: 0.37226057052612305 seconds

sentence: I'm so angry at what happened., Predicted label: ["anger"],Expected label: anger 
Length of sentence: 30,Time taken: 0.3471353054046631 seconds

sentence: I approve of your decision., Predicted label: ["approval"],Expected label: approval 
Length of sentence: 27,Time taken: 0.351421594619751 seconds

sentence: I'm confused about what to do next., Predicted label: ["confusion"],Expected label: confusion 
Length of sentence: 35,Time taken: 0.3596150875091553 seconds

sentence: I'm really curious to learn more about this topic., Predicted label: ["curiosity"],Expected label: curiosity 
Length of sentence: 50,Time taken: 0.4208106994628906 seconds

sentence: I desire success and happiness., Predicted label: ["desire"],Expected label: desire 
Length of sentence: 31,Time taken: 0.3453202247619629 seconds

<font color="blue">When testing all types of emotions, the emotions of **remorse** and **neutral** were frequently misclassified. Despite this, the prediction times for most emotions remained consistent.</font>

## Testing different input lengths:

Examined the model's performance in relation to input length. That helped ensure that the model can handle and classify emotions correctly for extended texts without any performance or memory issues. 

In [11]:
comments = [
    "Thanks!",  # Gratitude, short
    "I truly appreciate your kindness and support.",  # Gratitude, a bit longer
    "I cannot express how much your help means to me. You have my eternal gratitude.",  # Gratitude, longer
    "I want to sincerely thank you for your invaluable help. Your support during this challenging period is deeply appreciated.",  # Gratitude, very long
    "I'm sorry.",  # Remorse, short
    "I apologize for my actions and I regret that I hurt you.",  # Remorse, a bit longer
    "I deeply regret my actions and I sincerely apologize for the harm I caused.",  # Remorse, longer
    "I'm truly sorry for my hurtful actions. I deeply regret the pain I've caused and I promise to learn from this mistake to avoid causing such harm in the future.",  # Remorse, very long
]

expectedLabels = ['gratitude', 'gratitude', 'gratitude', 'gratitude', 'remorse', 'remorse', 'remorse', 'remorse']

In [12]:
test_sentiment(comments,expectedLabels)

sentence: Thanks!, Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 7,Time taken: 0.4464993476867676 seconds

sentence: I truly appreciate your kindness and support., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 45,Time taken: 0.3529856204986572 seconds

sentence: I cannot express how much your help means to me. You have my eternal gratitude., Predicted label: ["love"],Expected label: gratitude 
Length of sentence: 79,Time taken: 0.3458139896392822 seconds

sentence: I want to sincerely thank you for your invaluable help. Your support during this challenging period is deeply appreciated., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 122,Time taken: 0.37062978744506836 seconds

sentence: I'm sorry., Predicted label: ["remorse"],Expected label: remorse 
Length of sentence: 10,Time taken: 0.35904502868652344 seconds

sentence: I apologize for my actions and I regret that I hurt you., Predicted l

<font color="blue">In this case, the parameter under examination is the time taken for increasing input lengths. The results above show that an increase in the sentence length does not correspond to an increase in the time taken to predict an emotion.  </font>

In [13]:
comments = ["People often talk about love, but most of us are not fully aware about its true essence. There is lot of misconception about efficacy of love that holds us back to generate loving feelings towards others. Love is not possessiveness. People look love as a possession that has to be acquired and preserved. To expect that others ought to provide it to us so that our life is filled with love is the biggest fallacy, which is cause of much unhappiness. It is not like any other material thing to be demanded from others. Even if other person offers us plenty of love, we may not be able to feel it. It is normal to blame others for not loving us, but much depends on our inner self whether it has capability to feel it from others. Love is a feeling of well being and of good emotions. It is an activity that keeps us in good spirit and is liked to our emotions. Let us engage and create feeling of love by making self capable through appropriate changes in our dealings with others. There is no other way to love and be loved. The physical intimacy devoid of good feelings is not love but lust. People often fail to conceive love as pious in essence. While dealing with others, let us take care that our dealings make them cheerful, by helping them to come out of their problem, appreciate their successes and be grateful for help received from them. All these activities are to express love. The benefit of giving love to others is that it appeals to our heart and makes us connected to others, provides stability and security, removes fear and gives a feeling of being good towards other people. One can get to know love by first generating such feelings of being good to others. How can a person feel love from others if he is filled with ego, anger and selfish tendencies? These negative emotions suppress inner urge to love others. Love is a divine energy. I had a very vague idea about love initially. As I tried to understand more about it, a completely different perspective and thinking develops that explains the true essence of love. I have come to know from spiritual literature that love is God and God is love. It appears too abstract in the first instance, but more we tend to think of God will make us to love God and all other creations of God. It is like energy flowing within us derived from Ultimate that thinks positive and helps in inner purification. Albert Einstein discovered energy mass equation that explains interconnection of material and energy. It revolutionized the thinking of present century by using a small amount of mass to derive a tremendous energy. Hence, along with our material existence, somewhere we are also part of the divine energy lying within us as dormant. Logically, this divine energy which is nothing but love brings us close to Ultimate. I can imagine that every one of us has a great capacity of this divine love within us, but it is hidden, untapped and misdirected. Great saints have worked on human beings from time to time by developing intense feeling of love and concern for others. This has helped them to achieve higher levels of spiritual growth and closeness with the ultimate. The true meaning of love is inner purification of soul. This is the real purpose of love."]
expectedLabels = ["love"]

In [14]:
test_sentiment(comments,expectedLabels)

sentence: People often talk about love, but most of us are not fully aware about its true essence. There is lot of misconception about efficacy of love that holds us back to generate loving feelings towards others. Love is not possessiveness. People look love as a possession that has to be acquired and preserved. To expect that others ought to provide it to us so that our life is filled with love is the biggest fallacy, which is cause of much unhappiness. It is not like any other material thing to be demanded from others. Even if other person offers us plenty of love, we may not be able to feel it. It is normal to blame others for not loving us, but much depends on our inner self whether it has capability to feel it from others. Love is a feeling of well being and of good emotions. It is an activity that keeps us in good spirit and is liked to our emotions. Let us engage and create feeling of love by making self capable through appropriate changes in our dealings with others. There is 

<font color="blue">The result above solidifies the precious conclusion made as the length of the sentence is very large in this case but the time taken to yield a prediction approximately stays constant. Even with a large length of an input sentence the model yields a response at a very short time and very accurately. </font>

## Testing unusual characters:

Tests if CNN can generalize well and make accurate predictions even when faced with unexpected or uncommon characters.

In [15]:
comments = [
    "Thank$$$! @@@very much!!!",  # Gratitude, short
    "You're %%%amazing, I can't thank you enough!",  # Gratitude, a bit longer
    "Your help has been / / / invaluable, I am eternally grateful. ####",  # Gratitude, longer
    "I am deeply **grateful** for your support during this extremely <<difficult>> time.",  # Gratitude, very long
    "I'm s*rry... :(",  # Remorse, short
    "My #apologies for my behavior, I really @regret that.",  # Remorse, a bit longer
    "I ^deeply regret my actions and ~apologize for the harm I caused.",  # Remorse, longer
    "I am truly, deeply sorry for my hurtful actions. I !REGRET! the pain I've caused.",  # Remorse, very long
]

expectedLabels = ['gratitude', 'gratitude', 'gratitude', 'gratitude', 'remorse', 'remorse', 'remorse', 'remorse']

In [16]:
test_sentiment(comments,expectedLabels)

sentence: Thank$$$! @@@very much!!!, Predicted label: ["neutral"],Expected label: gratitude 
Length of sentence: 25,Time taken: 0.3705124855041504 seconds

sentence: You're %%%amazing, I can't thank you enough!, Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 44,Time taken: 0.35370397567749023 seconds

sentence: Your help has been / / / invaluable, I am eternally grateful. ####, Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 66,Time taken: 0.4207751750946045 seconds

sentence: I am deeply **grateful** for your support during this extremely <<difficult>> time., Predicted label: ["gratitude"],Expected label: gratitude 
Length of sentence: 83,Time taken: 0.3645198345184326 seconds

sentence: I'm s*rry... :(, Predicted label: ["neutral"],Expected label: remorse 
Length of sentence: 15,Time taken: 0.3423118591308594 seconds

sentence: My #apologies for my behavior, I really @regret that., Predicted label: ["remorse"],Expected labe

<font color="blue">The above test did not show any significant change in prediction time neither in missclassifications. That signifies that despite using special characters the model is able to yield accurate results, fairly quickly.</font>

## Test different number of list elements:

In [17]:
testExamples = [
    "This movie is hilarious! I can't stop laughing.",  # amusement
    "I'm so angry at what happened.",  # anger
    "I approve of your decision.",  # approval
    "I'm confused about what to do next.",  # confusion
    "I'm really curious to learn more about this topic.",  # curiosity
    "I desire success and happiness.",  # desire
    "I strongly disapprove of your behavior.",  # disapproval
    "I'm afraid of what might happen.",  # fear
    "I'm grateful for your help.",  # gratitude
    "The news filled me with joy.",  # joy
    "I love spending time with my family.",  # love
    "I feel remorseful for my actions.",  # remorse
    "I'm feeling sad and lonely.",  # sadness
    "I don't have any strong emotions right now."  # neutral
]

expectedLabels = [
    "amusement",
    "anger",
    "approval",
    "confusion",
    "curiosity",
    "desire",
    "disapproval",
    "fear",
    "gratitude",
    "joy",
    "love",
    "remorse",
    "sadness",
    "neutral"
]

In [18]:
for i in range(1,len(testExamples)):
    startTime = time.time()
    sample_sentences = testExamples[0:i]
    print(sample_sentences)
    predictions =  retreive_sentiments(testExamples[0:i])
    # Record the end time
    endTime = time.time()
    # Calculate the time taken
    timeTaken = endTime - startTime
    print(f'sentence: {sample_sentences}, Predicted label: {predictions},Expected label: {expectedLabels[i]} ')
    print(f'Length of sentence: {len(sample_sentences)},Time taken: {timeTaken} seconds')
    print()

["This movie is hilarious! I can't stop laughing."]
sentence: ["This movie is hilarious! I can't stop laughing."], Predicted label: ["amusement"],Expected label: anger 
Length of sentence: 1,Time taken: 0.3778963088989258 seconds

["This movie is hilarious! I can't stop laughing.", "I'm so angry at what happened."]
sentence: ["This movie is hilarious! I can't stop laughing.", "I'm so angry at what happened."], Predicted label: ["amusement", "anger"],Expected label: approval 
Length of sentence: 2,Time taken: 0.6845967769622803 seconds

["This movie is hilarious! I can't stop laughing.", "I'm so angry at what happened.", 'I approve of your decision.']
sentence: ["This movie is hilarious! I can't stop laughing.", "I'm so angry at what happened.", 'I approve of your decision.'], Predicted label: ["amusement", "anger", "approval"],Expected label: confusion 
Length of sentence: 3,Time taken: 1.107055902481079 seconds

["This movie is hilarious! I can't stop laughing.", "I'm so angry at what

sentence: ["This movie is hilarious! I can't stop laughing.", "I'm so angry at what happened.", 'I approve of your decision.', "I'm confused about what to do next.", "I'm really curious to learn more about this topic.", 'I desire success and happiness.', 'I strongly disapprove of your behavior.', "I'm afraid of what might happen.", "I'm grateful for your help.", 'The news filled me with joy.', 'I love spending time with my family.', 'I feel remorseful for my actions.', "I'm feeling sad and lonely."], Predicted label: ["amusement", "anger", "approval", "confusion", "curiosity", "desire", "disapproval", "fear", "gratitude", "joy", "love", "neutral", "sadness"],Expected label: neutral 
Length of sentence: 13,Time taken: 4.7518470287323 seconds



<font color="blue">From testing different number of list elements, one can deduce the larger the number of elements of the input list the more the time taken to yield a prediction. This may be due to the increase in required computation time as the algorithm has to make another prediction for each element added to the list.</font>

# More performance testing on the app

- Parallel requests and associated runtimes (to simulate concurrent users)
- Throughput
- Memory management
- Adversarial testing

In [5]:
# import required libraries
import json
import requests
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import pandas as pd
import time

In [6]:
# Function to test flask app by getting an output
def get_sentiment(data):
    url = "http://127.0.0.1:5050/get_sentiment"
    payload = json.dumps(data)  # Convert the list of comments to JSON format
    response = requests.post(url, json=payload)
    return response.text

In [7]:
# Function for sending a list of inputs
def get_sentiments_list(data):
    url = "http://127.0.0.1:5050/retreive_sentiments"
    payload = json.dumps(data)  # Convert the list of comments to JSON format
    response = requests.post(url, json=payload)
    return response.text

In [None]:
# Get log contents
with open('service.log', 'r') as file:
    log_contents = file.read()

print(log_contents)

In [9]:
# Load the sentences from test dataset
df = pd.read_csv('test_data.csv')
texts = df['1'].tolist()

### Trial functions
functions used to send requests to app

In [13]:
get_sentiments_list(texts[:10])

'["gratitude", "approval", "neutral", "neutral", "amusement", "approval", "neutral", "curiosity", "love", "gratitude"]'

### Concurrent Users testing

For this test, we simulated concurrent users accessing the deployed service to evaluate its performance under realistic user scenarios. The response time and throughput were measured to assess user experience. The first test which sent 100 requests in parallel to 2 workers took 41.78 seconds – this is about 0.42 seconds per each request. This is an acceptable amount of time taken to generate an output from the app. Using ‘ThreadPoolExecutor’ and ‘ProcessPoolExecutor’ yielded similar results, and increasing the number of parallel workers up to 20 also did not affect the time taken per each request. The number of parallel workers that will be used by ‘ThreadPoolExecutor’ is limited by the hardware configuration of the machine that is hosting the application – mainly on the number of CPUs available to handle requests. 

 Using ‘ThreadPoolExecutor’ and ‘ProcessPoolExecutor’ yielded similar results, and increasing the number of parallel workers up to 20 also did not affect the time taken per each request. The number of parallel workers that will be used by ‘ThreadPoolExecutor’ is limited by the hardware configuration of the machine that is hosting the application – mainly on the number of CPUs available to handle requests. 

The test results using different numbers of parallel request and total numbers of request are shown below: 

In [14]:
# Function for concurrent users - getting outputs in parallel
def get_sentiment_parallel(data, workers):

    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = []
        for text in data:
            future = executor.submit(get_sentiment, text)
            futures.append(future)
            
        results = []
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            results.append(result)
    return results

In [15]:
def get_sentiment_parallel1(data, workers):

    with ProcessPoolExecutor(max_workers=workers) as executor:
        
        future = executor.map(get_sentiment, data)
        results= list(future)
    return results

In [18]:
starttime = time.time()
numsentences = 100
numworkers = 2
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 100 requests with 2 parallel workers: 
41.78329825401306


In [36]:
starttime = time.time()
numsentences = 100
numworkers = 5
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 100 requests with 5 parallel workers: 
41.27588415145874


0.42 secs per request

In [20]:
starttime = time.time()
numsentences = 100
numworkers = 10
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 100 requests with 10 parallel workers: 
42.67776656150818


In [21]:
starttime = time.time()
numsentences = 100
numworkers = 20
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 100 requests with 20 parallel workers: 
42.478466272354126


In [22]:
starttime = time.time()
numsentences = 100
numworkers = 20
get_sentiment_parallel1(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 100 requests with 20 parallel workers: 
42.20289349555969


In [23]:
starttime = time.time()
numsentences = 1000
numworkers = 5
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 1000 requests with 5 parallel workers: 
419.41624879837036


0.419 secs per request

In [24]:
starttime = time.time()
numsentences = 2000
numworkers = 5
get_sentiment_parallel(texts[:numsentences], numworkers)
runtime = time.time() - starttime

print(f'runtime for {numsentences} requests with {numworkers} parallel workers: ')
print(runtime)

runtime for 2000 requests with 5 parallel workers: 
828.6388969421387


0.414 secs per request

### Throughput

The throughput measures the total number of requests the system can handle per second with the current architecture.

For testing the throughput, we simulated a large number of requests and measured the response time of the app. The throughput can be calculated by dividing the number of successful requests by the total time taken. The results can help identify any performance bottlenecks or scalability issues in the app. Overall, the system can handle about 1.6 requests per second with the current architecture on a moderately powerful machine running Ubuntu. This is expected to change with the machine hosting the application. 

In [25]:
def measure_throughput():
    start_time = time.time()
    for i in range(1000):
        get_sentiment(texts[:1000])
    end_time = time.time()
    throughput = 1000 / (end_time - start_time)
    return throughput

In [26]:
throughput = measure_throughput()
print(f"Throughput: {throughput}")

Throughput: 1.5700119529852363


### Memory management

We evaluated the system's memory management capabilities to detect any memory leaks or inefficient memory usage that may lead to performance degradation or system failures over time. Memory management can be a complex task due to the many approaches being available. There is no one right way of taking care of memory of a system. We first observed the change in memory usage before and after sending the app 100 requests using a memory profiler. Two different functions were created for this which used different tools to measure the memory usage. Both times, the output showed that the system is using 0 MB of memory. This may be because the requests we are sending are short sentences and this does not really use a lot of memory. Heavier requests can be sent in the future, but we have the tools to identify memory leaks using these functions. 

Additionally, a memory profile decorator was added to the ‘/get_sentiment’ route of our app. This prints the memory usage per line of code for the app route to observe the memory allocation for various processes and get a better understanding of how the app behaves. To predict the emotion of an input text using the CNN model, we observed that a shorter sentence takes up slightly less memory than a longer sentence as expected. The application is not misbehaving here. However, it was also observed that upon restarting the application, the first request sent would allocate a significantly larger portion of memory to the same process. This can be kept in mind when considering redeploying the application after any updates. 

In [27]:
import requests
import json
from memory_profiler import memory_usage

# Function to monitor memory usage during request handling
def monitor_memory_usage():
    # Start memory profiler
    mem_usage_start = memory_usage()[0]

    # Send request to Flask app
    for i in range(100):
        get_sentiment(texts[:100])

    # End memory profiler
    mem_usage_end = memory_usage()[0]

    # Calculate memory usage during request handling
    mem_usage_diff = mem_usage_end - mem_usage_start

    # Print memory usage information
    print(f"Memory usage during request handling: {mem_usage_diff} MiB")

monitor_memory_usage()

Memory usage during request handling: 0.0 MiB


In [88]:
import requests
import json
from memory_profiler import memory_usage
import psutil 

# Function to monitor memory usage during request handling
def measure_memory_usage():
    notebook = psutil.Process()
    start_mem = notebook.memory_info().rss
    get_sentiments_list(texts[20:30])
    mem_usage = notebook.memory_info().rss - start_mem
    return mem_usage

mem = measure_memory_usage()
print(f"Memory usage during request handling: {mem} MiB")

Memory usage during request handling: 0 MiB


After using a memory profiler in app.py, observe the memory usage difference (from the terminal) upon sending requests of varying lengths
The shorter input is allocated less memory as expected and the long input is assigned more.

In [35]:
get_sentiment("oh wow no wonder I am so tired after all the assignments are over.")

'sadness'

In [33]:
get_sentiment("Happiness is not something that you can find outside of yourself. It is not something that you can buy, earn, or achieve. Happiness is a state of mind, a way of being, a choice that you make every day. Happiness is not dependent on your circumstances, your achievements, or your relationships. Happiness is something that you create within yourself, by choosing to focus on the positive aspects of your life, by expressing gratitude for what you have, by being kind to yourself and others, and by living in alignment with your values and purpose. Happiness is not a destination, but a journey. Happiness is not a fixed point, but a dynamic process. Happiness is not a one-time event, but a habit that you cultivate and practice. Happiness is not something that happens to you, but something that you make happen. Happiness is not something that you can find outside of yourself. It is not something that you can buy, earn, or achieve. Happiness is a state of mind, a way of being, a choice that you make every day. Happiness is not dependent on your circumstances, your achievements, or your relationships. Happiness is something that you create within yourself, by choosing to focus on the positive aspects of your life, by expressing gratitude for what you have, by being kind to yourself and others, and by living in alignment with your values and purpose. Happiness is not a destination, but a journey. Happiness is not a fixed point, but a dynamic process. Happiness is not a one-time event, but a habit that you cultivate and practice. Happiness is not something that happens to you, but something that you make happen. Happiness is not something that you can find outside of yourself. It is not something that you can buy, earn, or achieve. Happiness is a state of mind, a way of being, a choice that you make every day. Happiness is not dependent on your circumstances, your achievements, or your relationships. Happiness is something that you create within yourself, by choosing to focus on the positive aspects of your life, by expressing gratitude for what you have, by being kind to yourself and others, and by living in alignment with your values and purpose. Happiness is not a destination, but a journey. Happiness is not a fixed point, but a dynamic process. Happiness is not a one-time event, but a habit that you cultivate and practice. Happiness is not something that happens to you, but something that you make happen. Happiness is not something that you can find outside of yourself. It is not something that you can buy, earn, or achieve. Happiness is a state of mind, a way of being, a choice that you make every day. Happiness is not dependent on your circumstances, your achievements, or your relationships. Happiness is something that you create within yourself, by choosing to focus on the positive aspects of your life, by expressing gratitude for what you have, by being kind to yourself and others, and by living in alignment with your values and purpose. Happiness is not a destination, but a journey. Happiness is not a fixed point, but a dynamic process. Happiness is not a one-time event, but a habit that you cultivate and practice. Happiness is not something that happens to you, but something that you make happen.")

'approval'

### Adversarial testing

This is a test on the deployed model rather than on the app. A machine learning model can be deceived by an adversarial example, which is an input that has been intentionally modified to cause a wrong prediction. A special type of adversarial example is an adversarial perturbation, which is a small change applied to a valid input. Adversarial perturbations are different from adversarial examples that are created from scratch to mislead machine learning models. TextAttack is a tool that can generate adversarial perturbations for text inputs. There can be 2 types of adversarial attacks in NLP: 

1. Visually similar adversarial examples 

These are cases where the altered input text is visually almost like the original text with very small differences. This could be a typo too. This kind of adversarial attacks try to get a different output from the model after changing only some characters of the text 

2. Semantically similar adversarial examples 

This kind of adversarial attacks try to get a different output from an NLP model after replacing some words with their synonyms or paraphrasing sentences while preserving the underlying emotion. 

Some test results are shown below: 

Example from TextAttack

In [112]:
# original text
get_sentiment("Connoisseurs of Chinese film will be pleased to discover that Tian's meticulous talent has not withered during his enforced hiatus.")

'joy'

In [133]:
# Visually similar
get_sentiment("Aonnoisseurs of Chinese film will be qleased to discover that Tian's meticulous talent has not withered during his enforced hiatus.")

'approval'

In [134]:
# Semantically similar
get_sentiment("Connoisseurs of Chinese footage will be pleased to find out that Tian's meticulous talent has not withered during his enforced hiatus.")

'approval'

Here, the model provides a different output with both attacks. This displays poor adversarial robustness of the model. Next, we used an example from the test-data and tried to get varying outputs. 

Example from test dataset

In [122]:
# original text
get_sentiment("My grandparents were holocaust survivors. Seeing things like this and trying to comprehend the horrors that they had to endure really helps put my problems into perspective.")

'sadness'

In [132]:
# visually similar
get_sentiment("My grandparents were holocaust curvivors. Seeing things like this and trying to comprehend the horrors that they had to endure really helps put my problems into perspective.")

'anger'

In [139]:
# semantically similar
get_sentiment("My grandparents were genocide survivors. Seeing things like this and trying to understand the horrors that they had to go through makes my issues seem smaller")

'sadness'

Again, the first attack gives a different output which is not favourable. However, for the semantically similar example, the model still predicted the correct emotion. 