# BentoML Example: Sentiment Analysis with Scikit-learn

* [Original notebook on Google Colab](https://colab.research.google.com/github/bentoml/gallery/blob/0.13-LTS/scikit-learn/sentiment-analysis/sklearn-sentiment-analysis.ipynb)

* [Dataset](https://docs.google.com/file/d/0B04GJPshIjmPRnZManQwWEdTZjg/edit?resourcekey=0-betyQkEmWZgp8z0DFxWsHw): a copy of the file is already saved in my Google Drive (under the `FileDisk` folder).

* Required packages and their versions:

> python=3.7
>
> bentoml=0.9.0
>
> protobuf=3.20.*
>
> sqlalchemy=1.3.*

## Install Conda

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:15
🔁 Restarting kernel...


## Get trainingandtestdata.zip

In [None]:
import os

from google.colab import drive

drive.mount('/content/drive/')
os.chdir('/content/drive/MyDrive/FileDisk')
# os.listdir()
# !mv trainingandtestdata.zip /content
!cp trainingandtestdata.zip /content/trainingandtestdata.zip
os.chdir('/content')  # switch back to the Colab default working directory

Mounted at /content/drive/


In [None]:
# Apparently unused
# %reload_ext autoreload
# %autoreload 2
# %matplotlib inline

## Unzip trainingandtestdata.zip

In [None]:
%%bash
unzip -n trainingandtestdata.zip

Archive:  trainingandtestdata.zip
  inflating: testdata.manual.2009.06.14.csv  
  inflating: training.1600000.processed.noemoticon.csv  
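
The same extraction can be done from Python when a shell is not available. The following is a minimal sketch using only the standard library; `extract_without_overwrite` is a hypothetical helper name, and it is meant to mimic the `-n` (never overwrite) behaviour of `unzip` rather than reproduce every detail of it.

```python
import zipfile
from pathlib import Path


def extract_without_overwrite(zip_path, dest_dir="."):
    """Extract members of zip_path into dest_dir, skipping files that already exist.

    Returns the list of member names that were actually extracted.
    """
    dest = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if (dest / member).exists():
                continue  # like `unzip -n`: never overwrite an existing file
            archive.extract(member, path=dest)
            extracted.append(member)
    return extracted
```

Calling `extract_without_overwrite("trainingandtestdata.zip", "/content")` a second time should extract nothing, because both CSV files are already present.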


## Create a virtual environment with Conda

In [None]:
!conda create -p env python=3.7 -y -q

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /content/env

  added / updated specs:
    - python=3.7


The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  ca-certificates    conda-forge/linux-64::ca-certificates-2025.1.31-hbcca054_0 
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.43-h712a8e2_4 
  libffi             conda-forge/linux-64::libffi-3.4.6-h2dba641_0 
  libgcc             conda-forge/linux-64::libgcc-14.2.0-h767d61c_2 
  libgcc-ng          conda-forge/linux-64::libgcc-ng-14.2.0-h69a702a_2 
  libgomp            conda-forge/linux-64::libgomp-14.2.0-h767d61c_2 
  liblzma            conda-forge/linux-64::liblzma-5.6.4-hb9d3cd8_0 
  liblzma-devel      conda-forge/linux-64::liblzma-dev

## Install `bentoml` (0.9.0) in env

In [None]:
!source activate ./env; pip install -q 'bentoml==0.9.0'

## Install `scipy` and `matplotlib` in env

In [None]:
!source activate ./env; pip install -U scipy matplotlib --quiet

## Install `scikit-learn`, `pandas`, and `numpy` in env

In [None]:
!source activate ./env; pip install -q 'scikit-learn>=0.23.2' 'pandas>=1.1.1' 'numpy>=1.8.2'

## Install `protobuf` (3.20.x) in env

In [None]:
!source activate ./env; pip install -q 'protobuf==3.20.*'

## Install `sqlalchemy` (1.3.*) in env

In [None]:
!source activate ./env; pip install -q 'sqlalchemy==1.3.*'

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 12.3 MB/s eta 0:00:00

## Train the sentiment_lr model

Use `pickle` to save the sentiment_lr model as `sentiment_lf.pkl`.

In [None]:
%%bash
source activate ./env

python

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.pipeline import Pipeline

import bentoml

log_model = LogisticRegression(solver='lbfgs', max_iter=1000)

columns = ['polarity', 'tweetid', 'date', 'query_name', 'user', 'text']
dftrain = pd.read_csv('training.1600000.processed.noemoticon.csv',
                      header = None,
                      encoding ='ISO-8859-1')
dftest = pd.read_csv('testdata.manual.2009.06.14.csv',
                     header = None,
                     encoding ='ISO-8859-1')
dftrain.columns = columns
dftest.columns = columns

sentiment_lr = Pipeline([
                         ('count_vect', CountVectorizer(min_df = 100,
                                                        ngram_range = (1,2),
                                                        stop_words = 'english')),
                         ('lr', log_model)])
sentiment_lr.fit(dftrain.text, dftrain.polarity)

Xtest, ytest = dftest.text[dftest.polarity!=2], dftest.polarity[dftest.polarity!=2]
print(classification_report(ytest,sentiment_lr.predict(Xtest)))

# Predict the polarity of the first remaining test tweet (positional indexing, independent of index labels)
result = sentiment_lr.predict([Xtest.iloc[0]])
print("result=", result)


import pickle

# save the sentiment analysis pipeline as a pickle file
model_pkl_file = "sentiment_lf.pkl"

with open(model_pkl_file, 'wb') as file:
    pickle.dump(sentiment_lr, file)

              precision    recall  f1-score   support

           0       0.86      0.81      0.83       177
           4       0.82      0.87      0.85       182

    accuracy                           0.84       359
   macro avg       0.84      0.84      0.84       359
weighted avg       0.84      0.84      0.84       359

result= [4]
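
The labels in the report are the raw polarity codes of this dataset: 0 for negative, 2 for neutral (filtered out of the test set above), and 4 for positive. A small sketch of turning predictions into readable names follows; `POLARITY_NAMES` and `describe_polarity` are hypothetical names introduced only for illustration.

```python
# Polarity codes used by the dataset's `polarity` column.
POLARITY_NAMES = {0: "negative", 2: "neutral", 4: "positive"}


def describe_polarity(labels):
    """Map numeric polarity predictions to readable names."""
    return [POLARITY_NAMES.get(int(label), "unknown") for label in labels]


print(describe_polarity([4]))
```

The function accepts any iterable of integer-like values, so it should also work on the NumPy array returned by `sentiment_lr.predict(...)`.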


## Create the BentoService API

In [None]:
%%writefile sentiment_analysis_service.py
import pandas as pd
import bentoml
from bentoml.service.artifacts.common import PickleArtifact
from bentoml.adapters import DataframeInput

@bentoml.artifacts([PickleArtifact('model')])
@bentoml.env(pip_packages=["scikit-learn", "pandas"])
class SKSentimentAnalysis(bentoml.BentoService):

    @bentoml.api(input=DataframeInput(), batch=True)
    def predict(self, df):
        """
        Predict on the first row of the incoming DataFrame, taken as a pandas.Series of texts.
        """
        series = df.iloc[0,:]
        return self.artifacts.model.predict(series)

Writing sentiment_analysis_service.py


## Save the BentoService to disk

The BentoService packs the sentiment_lf model.

In [None]:
%%bash

source activate ./env

python

from sentiment_analysis_service import SKSentimentAnalysis
import pickle

model_pkl_file = "sentiment_lf.pkl"

with open(model_pkl_file, 'rb') as file:
    model = pickle.load(file)

bento_service = SKSentimentAnalysis()
bento_service.pack('model', model)

saved_path = bento_service.save()  # returns the path of the saved bundle

# Store "saved_path" in 'var_obj.pkl' so that a later cell can load the bundle
var_file = "var_obj.pkl"

with open(var_file, 'wb') as file:
    pickle.dump(saved_path, file)

[2025-03-03 10:52:31,308] INFO - BentoService bundle 'SKSentimentAnalysis:20250303105201_8A2A51' saved to: /root/bentoml/repository/SKSentimentAnalysis/20250303105201_8A2A51
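
Each `%%bash` cell starts a fresh Python process, which is why the bundle path has to travel through a file. Assuming `saved_path` is a plain string (as the log line above suggests), a JSON file would work as an alternative to `pickle` and stays human-readable. This is only a sketch; the file name `saved_path.json` and both helper names are hypothetical.

```python
import json
from pathlib import Path


def write_saved_path(saved_path, state_file="saved_path.json"):
    """Persist the bundle path so that a later process can find it."""
    Path(state_file).write_text(json.dumps({"saved_path": saved_path}))


def read_saved_path(state_file="saved_path.json"):
    """Read back the bundle path written by write_saved_path."""
    return json.loads(Path(state_file).read_text())["saved_path"]
```

The saving cell would call `write_saved_path(saved_path)`, and the loading cell would pass `read_saved_path()` to `bentoml.load`.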


In [None]:
!source activate ./env; bentoml list

BENTO_SERVICE                              AGE                           APIS                                   ARTIFACTS              LABELS
SKSentimentAnalysis:20250303072553_3E9744  13 minutes and 28.14 seconds  predict<DataframeInput:DefaultOutput>  model<PickleArtifact>


## Start the REST API model server with the BentoService saved in the previous section

In [None]:
!source activate ./env; bentoml serve SKSentimentAnalysis:latest

[2025-03-03 08:17:13,426] INFO - Getting latest version SKSentimentAnalysis:20250303072553_3E9744
[2025-03-03 08:17:13,427] INFO - Starting BentoML API server in development mode..
 * Serving Flask app 'SKSentimentAnalysis'
 * Debug mode: off
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit

Aborted!
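
While the server is running, the `predict` API can be called over HTTP from another process. The sketch below uses only the standard library and rests on two assumptions: that the endpoint path is `/predict` (derived from the API name), and that it accepts a JSON list of texts like the one used with the CLI. `build_predict_request` and `predict_over_http` are hypothetical helper names, not part of BentoML.

```python
import json
import urllib.request


def build_predict_request(texts, base_url="http://127.0.0.1:5000"):
    """Build a POST request carrying a JSON list of texts (assumed payload shape)."""
    body = json.dumps(texts).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",  # assumed endpoint path, based on the API name
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_over_http(texts, base_url="http://127.0.0.1:5000"):
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(build_predict_request(texts, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and body without a live server.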


### Alternative: serve through ngrok (requires an ngrok account)

In [None]:
!source activate ./env; bentoml serve SKSentimentAnalysis:latest --run-with-ngrok

## Load saved BentoService

In [None]:
%%bash

source activate ./env

python

import bentoml
import pandas as pd

# saved_path = "/root/bentoml/repository/SKSentimentAnalysis/20250303095521_E343F0"  # for testing only

# The "saved_path" produced when the BentoService was saved was stored in 'var_obj.pkl'; read it back here
import pickle
var_file = "var_obj.pkl"

with open(var_file, 'rb') as file:
    saved_path = pickle.load(file)

# Load exported bentoML model archive from path
loaded_bento_service = bentoml.load(saved_path)

# Call predict on the restored service; predict() reads only the first DataFrame row, so only "good" is scored
result = loaded_bento_service.predict(pd.DataFrame(data=["good", "great"]))
print("result=",result)

result= [4]


## Launch inference job from CLI

In [None]:
!source activate ./env; bentoml run SKSentimentAnalysis:latest predict \
--input '["some new text, sweet noodles", "happy time", "sad day"]'

[2025-03-03 08:45:36,070] INFO - Getting latest version SKSentimentAnalysis:20250303072553_3E9744
[2025-03-03 08:45:42,363] INFO - {'service_name': 'SKSentimentAnalysis', 'service_version': '20250303072553_3E9744', 'api': 'predict', 'task': {'data': '["some new text, sweet noodles", "happy time", "sad day"]', 'task_id': 'd2b2302b-1372-4ca2-9b82-d492d3c80413', 'batch': 3, 'cli_args': ('--input', '["some new text, sweet noodles", "happy time", "sad day"]')}, 'result': {'data': '[4, 4, 4]', 'http_status': 200, 'http_headers': (('Content-Type', 'application/json'),)}, 'request_id': 'd2b2302b-1372-4ca2-9b82-d492d3c80413'}
[4, 4, 4]
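
The last line of the CLI output is a JSON list with one label per input text. A minimal sketch for pairing each input with its predicted label is shown below; `pair_predictions` is a hypothetical helper name, and it assumes both the `--input` value and the printed result are valid JSON lists.

```python
import json


def pair_predictions(input_json, output_json):
    """Pair each input text with its predicted label, checking that the lengths agree."""
    texts = json.loads(input_json)
    labels = json.loads(output_json)
    if len(texts) != len(labels):
        raise ValueError("number of predictions does not match number of inputs")
    return list(zip(texts, labels))


pairs = pair_predictions(
    '["some new text, sweet noodles", "happy time", "sad day"]',
    "[4, 4, 4]",
)
for text, label in pairs:
    print(f"{label}\t{text}")
```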
