# Machine Learning Engineering

## Sentiment Analysis Model Deployment and Cloud Integration

### Description: Building a machine learning model using scikit-learn and making it queryable using cloud functions

I built a machine learning model that performs sentiment analysis on financial text data using a Multinomial Naive Bayes classifier.

In [26]:
pip install google-api-python-client google-cloud-storage

Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
import numpy as np
import sklearn
from google.cloud import storage
import os
from io import StringIO
from io import BytesIO

In [2]:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Gcredentials_Rhea3.json"

In [3]:
client = storage.Client()

### Training the ML model:

In [4]:
# Importing data 

df = pd.read_csv("data.csv")

In [5]:
df.head()

Unnamed: 0,Sentence,Sentiment
0,The GeoSolutions technology will leverage Bene...,positive
1,"$ESI on lows, down $1.50 to $2.50 BK a real po...",negative
2,"For the last quarter of 2010 , Componenta 's n...",positive
3,According to the Finnish-Russian Chamber of Co...,neutral
4,The Swedish buyout firm has sold its remaining...,neutral


In [6]:
# Dropping duplicates

df.drop_duplicates(inplace=True)

In [7]:
# Getting count of values for each class

df['Sentiment'].value_counts()

neutral     3124
positive    1852
negative     860
Name: Sentiment, dtype: int64
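
The counts above show that the classes are imbalanced, so raw accuracy is easier to interpret next to a trivial baseline. The sketch below only reuses the three counts printed above; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above.
class_counts = {"neutral": 3124, "positive": 1852, "negative": 860}

total = sum(class_counts.values())

# Share of each class in the de-duplicated data set.
class_shares = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the most common class.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

for label, share in class_shares.items():
    print(f"{label:>8}: {share:.1%}")
print(f"Majority-class baseline ({majority_label}): {baseline_accuracy:.3f}")
```

Any trained classifier should beat this baseline by a clear margin to be considered useful.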

In [8]:
# Importing necessary libraries for model

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /Users/rheasethi/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/rheasethi/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [9]:
# Defining X and y 

X = df['Sentence']
y = df['Sentiment']

In [10]:
# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
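
Because the classes are imbalanced, one option worth considering is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below uses made-up toy labels (not the project data) to show the effect of the `stratify` argument of `train_test_split`.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy labels (illustrative only): 50% neutral, 30% positive, 20% negative.
toy_labels = ["neutral"] * 25 + ["positive"] * 15 + ["negative"] * 10
toy_sentences = [f"sentence {i}" for i in range(len(toy_labels))]

# stratify=... keeps the class proportions the same in both splits.
_, _, toy_y_train, toy_y_test = train_test_split(
    toy_sentences,
    toy_labels,
    test_size=0.2,
    random_state=42,
    stratify=toy_labels,
)

print("train:", Counter(toy_y_train))
print("test: ", Counter(toy_y_test))
```

With the real data, the equivalent change would be passing `stratify=y` to the existing call.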

In [19]:
# Creating a pipeline for the preprocessing and for fitting the model 

pipeline = Pipeline([
    ('preprocessing', TfidfVectorizer(
        lowercase=True,
        token_pattern=r'\b\w+\b',
        stop_words='english'
    )),
    ('classifier', MultinomialNB())
])

In [12]:
# Train the model
pipeline.fit(X_train, y_train)

# Predict the sentiment for test data
y_pred = pipeline.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.6806506849315068
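
A single accuracy figure can hide weak performance on the smaller classes. A confusion matrix and per-class recall give a fuller picture. The sketch below uses toy labels purely for illustration; with the real model, `y_test` and `y_pred` would take their place.

```python
from sklearn.metrics import confusion_matrix, recall_score

labels = ["negative", "neutral", "positive"]

# Toy labels, for illustration only; with the real model use y_test and y_pred.
toy_true = ["neutral", "neutral", "neutral", "positive", "positive", "negative"]
toy_pred = ["neutral", "neutral", "neutral", "positive", "neutral", "neutral"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
matrix = confusion_matrix(toy_true, toy_pred, labels=labels)

# Recall per class: the share of each true class that was predicted correctly.
per_class_recall = recall_score(
    toy_true, toy_pred, labels=labels, average=None, zero_division=0
)

print(matrix)
for label, recall in zip(labels, per_class_recall):
    print(f"{label:>8} recall: {recall:.2f}")
```

In this toy case four of six predictions are correct, yet the negative class is never predicted, which accuracy alone would not reveal.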


In [20]:
# Locally saving the model

joblib.dump(pipeline, 'sentiment_analysis_pipeline.pkl')

['sentiment_analysis_pipeline.pkl']
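
Before uploading, it can be worth checking that the serialized pipeline survives a round trip. The sketch below does this in memory with a tiny toy pipeline and made-up sentences (both are assumptions for illustration), using the same `BytesIO` plus `joblib.load` pattern that the cloud function relies on later.

```python
from io import BytesIO

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny toy data set, for illustration only.
toy_X = ["profits rose", "profits fell", "sales rose", "sales fell"]
toy_y = ["positive", "negative", "positive", "negative"]

toy_pipeline = Pipeline([
    ('preprocessing', TfidfVectorizer()),
    ('classifier', MultinomialNB()),
]).fit(toy_X, toy_y)

# Serialize into an in-memory buffer instead of a file on disk.
buffer = BytesIO()
joblib.dump(toy_pipeline, buffer)
buffer.seek(0)

# Load it back and confirm that the predictions are unchanged.
restored = joblib.load(buffer)
original_predictions = list(toy_pipeline.predict(toy_X))
restored_predictions = list(restored.predict(toy_X))
print(original_predictions == restored_predictions)
```

Pickled scikit-learn objects are tied to the library version that produced them, so the environment that loads the file should use the same scikit-learn version as the one that saved it.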

### Storing the pipeline on cloud:

In [21]:
# Storing on cloud

bucket = client.get_bucket("econ446hw2")
blob = bucket.blob("financial_sentiment/pipeline.pkl")
blob.upload_from_filename("sentiment_analysis_pipeline.pkl")

### Cloud Functions:

In [15]:
import google
import joblib
import pandas
import requests
import sklearn
from urllib.parse import parse_qs
from google.cloud import storage
import os
from io import StringIO
from joblib import load
from io import BytesIO

In [42]:
# Function to access the model from cloud

def load_sentiment_model(file_name):
    bucket_name = "econ446hw2"
    source_blob = "financial_sentiment/" + file_name
    
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "Gcredentials_Rhea3.json"
    client = storage.Client()
    
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob)
    
    model_data = blob.download_as_bytes()
    
    model = joblib.load(BytesIO(model_data))
    return model


# Function that takes an input from user and returns model's prediction

def sentiment_analysis(request):
    try:
        # Load the sentiment analysis pipeline
        pipeline = load_sentiment_model("pipeline.pkl")

        # Get the input text from the request
        input_text = request.get_data().decode()

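        # Preprocess and make predictions using the pipeline
        prediction = pipeline.predict([input_text])[0]
        # predict_proba orders its columns like pipeline.classes_, so report the
        # highest value, which is the probability of the predicted class
        # (a fixed index such as [0][1] always reports the same class)
        probabilities = pipeline.predict_proba([input_text])[0]
        probability = str(round(float(probabilities.max()) * 100, 2)) + "%"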

        # Return the sentiment prediction and probability as the response
        return {
            "sentiment": prediction,
            "probability": probability
        }

    except Exception as e:
        return {"error": str(e)}
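
As written, `sentiment_analysis` downloads the model from storage on every request. Cloud Functions instances can be reused between invocations, so caching the loaded model in module-level state is a common way to avoid repeated downloads. The sketch below shows the caching pattern only: `fetch_model` is a hypothetical stand-in for `load_sentiment_model` and returns a placeholder instead of contacting cloud storage.

```python
from functools import lru_cache

download_count = 0  # counts simulated downloads, for illustration only


def fetch_model(file_name):
    """Hypothetical stand-in for load_sentiment_model: pretends to download a model."""
    global download_count
    download_count += 1
    return {"name": file_name}


@lru_cache(maxsize=1)
def get_cached_model(file_name):
    """Fetch the model once, then reuse it for later calls with the same name."""
    return fetch_model(file_name)


first = get_cached_model("pipeline.pkl")
second = get_cached_model("pipeline.pkl")

# Both calls return the same object, and only one download happened.
print(first is second, download_count)
```

In the real function, the handler would call the cached loader instead of loading the model directly on each request.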

### Querying the model:

In [25]:
url = "https://us-central1-festive-airway-386516.cloudfunctions.net/nlpfunc"

In [30]:
r = requests.post(url, "Nvidia shares had surged but are now pulling back")

In [31]:
r.text

'{"probability":"42.14%","sentiment":"neutral"}\n'
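
The response pairs a label with a percentage, so the two should refer to the same class. `predict_proba` returns one probability per class in the order of the classifier's `classes_` attribute, which scikit-learn sorts alphabetically for string labels (here: negative, neutral, positive). A fixed column index therefore always refers to the same class, whatever the prediction is. The sketch below, using hypothetical probabilities and a hypothetical helper name, picks the probability that belongs to the predicted label.

```python
def predicted_class_probability(classes, probabilities):
    """Return (label, probability) for the most likely class."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best_index], probabilities[best_index]


# Hypothetical values, in the alphabetical order scikit-learn uses for classes_.
classes = ["negative", "neutral", "positive"]
probabilities = [0.15, 0.35, 0.50]

label, probability = predicted_class_probability(classes, probabilities)
print(label, f"{probability:.2%}")

# Index 1 is always the "neutral" column, even when "positive" is predicted.
print(classes[1], f"{probabilities[1]:.2%}")
```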

## Making a user-friendly page that takes inputs to my ML model and displays the output

In [2]:
import requests
import ipywidgets as widgets
from IPython.display import display

In [3]:
input_text = widgets.Text(
    value = "",
    placeholder = "Let's talk Finance",
    description = "Input Text",
    disabled = False)

button = widgets.Button(description = "Click for Results!")

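def my_function(button):
    # Send the text currently in the input box to the cloud function
    url = "https://us-central1-festive-airway-386516.cloudfunctions.net/nlpfunc"
    r = requests.post(url, input_text.value)

    # Parse the JSON response once and print both fields
    result = r.json()
    print("Prediction of Sentiment: " , result["sentiment"])
    print("Probability of Sentiment: " , result["probability"])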

button.on_click(my_function)

In [4]:
display(input_text)
display(button)

Text(value='', description='Input Text', placeholder="Let's talk Finance")

Button(description='Click for Results!', style=ButtonStyle())

Prediction of Sentiment:  neutral
Probability of Sentiment:  60.92%
Prediction of Sentiment:  neutral
Probability of Sentiment:  50.64%
Prediction of Sentiment:  positive
Probability of Sentiment:  39.53%
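
The cloud function returns an `"error"` key instead of a prediction when something goes wrong, and looking up `"sentiment"` in such a response would raise a `KeyError`. The sketch below shows one way the display logic could handle both cases. `format_result` is a hypothetical helper, and the error message in the second call is made up for illustration.

```python
import json


def format_result(response_text):
    """Turn the cloud function's JSON response into text for display."""
    payload = json.loads(response_text)

    # The cloud function reports failures under an "error" key.
    if "error" in payload:
        return f"Request failed: {payload['error']}"

    return (
        f"Prediction of Sentiment: {payload['sentiment']}\n"
        f"Probability of Sentiment: {payload['probability']}"
    )


# Response text copied from the query shown earlier.
print(format_result('{"probability":"42.14%","sentiment":"neutral"}\n'))

# Made-up error response, for illustration only.
print(format_result('{"error":"model not found"}'))
```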


## Industrial Applications of this Project for Companies and Employees: 

The financial sentiment analysis app developed here could be useful to companies and professionals in finance, mainly because it is easy to access: the model is hosted in the cloud and can be queried through a simple GUI.

Financial institutions could deploy this sentiment analysis model at scale and across teams to analyze financial news, analyst reports, press releases, and similar text. Research and trading teams could use it to identify trends, assess market sentiment, and adjust their trading strategies. Risk management teams could use it to monitor sentiment around specific financial instruments, industries, or market segments, helping them identify potential risks and take appropriate mitigation measures.

Overall, the app lets employees across departments in financial firms extract sentiment from financial text quickly, supporting data-driven decisions.