<a href="https://colab.research.google.com/github/mltrev23/tech-test/blob/main/3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem
3. Implementation of a recommendation system with Scikit-surprise
   - Case study: [Book Recommendation Dataset](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset/data?select=Ratings.csv)


# Code
Setup Environment

In [1]:
pip install scikit-surprise pandas numpy matplotlib fastapi uvicorn pyngrok nest_asyncio

Collecting scikit-surprise
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fastapi
  Downloading fastapi-0.112.2-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn
  Downloading uvicorn-0.30.6-py3-none-any.whl.metadata (6.6 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.0-py3-none-any.whl.metadata (7.4 kB)
Collecting starlette<0.39.0,>=0.37.2 (from fastapi)
  Downloading starlette-0.38.4-py3-none-any.whl.metadata (6.0 kB)
Collecting h11>=0.8 (from uvicorn)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading fastapi-0.112.2-py3-none-any.whl (93 kB)

Data Handling

In [2]:
import subprocess
import pandas as pd

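# Kaggle dataset slug (owner/name) and the file to fetch from it
kaggle_dataset = "arashnic/book-recommendation-dataset"
file_name = "Ratings.csv"

# Download the file with the Kaggle CLI (requires Kaggle API credentials);
# check=True raises an error if the command fails
subprocess.run(['kaggle', 'datasets', 'download', '-d', kaggle_dataset, '-f', file_name], check=True)

# Unzip the downloaded archive, overwriting any earlier copy
subprocess.run(['unzip', '-o', f'{file_name}.zip'], check=True)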

# Load the dataset
ratings = pd.read_csv(file_name)

# Display the first few rows
ratings.head()


Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


Building the Recommendation System

In [4]:
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

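# Load data into Surprise format.
# Note: the preview above contains ratings of 0, which in this dataset denote
# implicit feedback and fall outside the 1-10 scale declared here. Surprise
# clips predictions to the declared scale, so a 0 can never be predicted exactly.
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(ratings[['User-ID', 'ISBN', 'Book-Rating']], reader)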
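Since the preview shows ratings of 0 while the reader declares a 1 to 10 scale, one option is to drop the zero rows before building the dataset. The sketch below assumes that 0 marks implicit feedback, as the dataset description states. The helper name `keep_explicit_ratings` and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd


def keep_explicit_ratings(ratings: pd.DataFrame, column: str = "Book-Rating") -> pd.DataFrame:
    """Return only rows whose rating is an explicit 1-10 score (hypothetical helper)."""
    mask = ratings[column].between(1, 10)  # inclusive on both ends
    return ratings.loc[mask].reset_index(drop=True)


# Toy frame mirroring the first rows of Ratings.csv shown in the preview
sample = pd.DataFrame({
    "User-ID": [276725, 276726, 276727, 276729, 276729],
    "ISBN": ["034545104X", "0155061224", "0446520802", "052165615X", "0521795028"],
    "Book-Rating": [0, 5, 0, 3, 6],
})

explicit = keep_explicit_ratings(sample)
print(len(sample), "->", len(explicit))  # 5 -> 3
```

The filtered frame could then be passed to `Dataset.load_from_df` in place of `ratings`. The metrics reported in this notebook were computed on the unfiltered data, so they would change.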


Train model

In [7]:
from surprise import accuracy
from surprise.model_selection import train_test_split

# Use the SVD (matrix factorization) algorithm with default hyperparameters
model = SVD()

# Hold out 20% of the ratings for testing
trainset, testset = train_test_split(data, test_size=0.2)

# Fit on the training split, then predict the held-out ratings
model.fit(trainset)
predictions = model.test(testset)

# Calculate RMSE on the held-out ratings
accuracy.rmse(predictions)


RMSE: 3.5011


3.501114381182237
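To make the printed number easier to interpret, here is a minimal sketch of how RMSE is computed from pairs of true and estimated ratings, with MAE included for comparison. The function names and the sample pairs are hypothetical and are not taken from the notebook's results.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


# Hypothetical (true, estimate) pairs for illustration only
pairs = [(0.0, 1.0), (5.0, 6.0), (10.0, 7.0), (3.0, 4.0)]

print(round(rmse(pairs), 4))  # 1.7321
print(round(mae(pairs), 4))   # 1.5
```

With Surprise predictions, the same functions could be applied as `rmse([(p.r_ui, p.est) for p in predictions])`, since each prediction carries the true rating `r_ui` and the estimate `est`.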

Evaluation

In [8]:
# Perform 5-fold cross-validation; the model is refit on each fold
cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    3.5020  3.4927  3.4992  3.4998  3.4966  3.4981  0.0032  
MAE (testset)     2.9343  2.9243  2.9284  2.9260  2.9289  2.9284  0.0034  
Fit time          26.56   24.66   24.15   24.29   24.24   24.78   0.91    
Test time         3.78    3.91    1.57    3.85    1.63    2.95    1.10    


{'test_rmse': array([3.50201537, 3.49270644, 3.49917718, 3.49980343, 3.49656876]),
 'test_mae': array([2.93428584, 2.92427814, 2.92838962, 2.92603293, 2.9289058 ]),
 'fit_time': (26.560561418533325,
  24.659388065338135,
  24.14574122428894,
  24.288732051849365,
  24.24217200279236),
 'test_time': (3.775022029876709,
  3.9061596393585205,
  1.5724499225616455,
  3.8524391651153564,
  1.628727674484253)}
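Accuracy metrics describe rating prediction, while a recommender ultimately returns ranked lists. The sketch below groups predictions by user and keeps the highest estimates, following the common top-N pattern for Surprise predictions. It assumes each prediction unpacks as `(uid, iid, r_ui, est, details)`; the function name and the sample estimates are hypothetical.

```python
from collections import defaultdict


def get_top_n(predictions, n=10):
    """Map each user id to their n highest-estimated (item id, estimate) pairs."""
    by_user = defaultdict(list)
    for uid, iid, _true_rating, est, _details in predictions:
        by_user[uid].append((iid, est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }


# Plain tuples standing in for Surprise Prediction objects (estimates are made up)
sample_predictions = [
    (276729, "052165615X", 3.0, 4.2, {}),
    (276729, "0521795028", 6.0, 6.8, {}),
    (276726, "0155061224", 5.0, 5.5, {}),
]

top = get_top_n(sample_predictions, n=1)
print(top[276729])  # [('0521795028', 6.8)]
```

Applied to test-set predictions, this ranks only books the user has already rated. For genuine recommendations, predictions would be needed for unrated user-book pairs, for example via `trainset.build_anti_testset()`, which can be very large for a dataset of this size.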

Save model

In [9]:
import joblib

# Save the model
joblib.dump(model, 'svd_book_recommendation_model.pkl')

['svd_book_recommendation_model.pkl']

Deployment

In [12]:
from fastapi import FastAPI
import joblib
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

# Load the trained model
model = joblib.load('svd_book_recommendation_model.pkl')

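# Define the input data model.
# user_id is an int because pandas parsed the User-ID column as integers, and
# Surprise looks up raw ids exactly as they appeared in the training data;
# ISBNs were loaded as strings.
class RatingRequest(BaseModel):
    user_id: int
    item_id: str

# Define the prediction route
@app.post('/predict')
def predict_rating(request: RatingRequest):
    # Estimate the rating this user would give this book
    prediction = model.predict(request.user_id, request.item_id).est
    return {'predicted_rating': prediction}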




ngrok auth

In [15]:
!ngrok authtoken <YOUR_NGROK_AUTHTOKEN>

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
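An authtoken is a secret, so it is safer to read it from the environment than to paste it into a cell. The following is a minimal sketch under assumptions: the variable name `NGROK_AUTHTOKEN` and both helper names are illustrative choices, and `configure_ngrok` relies on pyngrok's `ngrok.set_auth_token`.

```python
import os


def read_ngrok_token(env=os.environ, var_name="NGROK_AUTHTOKEN"):
    """Return the token stored in the given mapping, or raise if it is missing."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


def configure_ngrok(token):
    """Register the token with pyngrok (imported lazily so the helper above stays stdlib-only)."""
    from pyngrok import ngrok
    ngrok.set_auth_token(token)


# Demonstration with an explicit mapping instead of the real environment
print(read_ngrok_token({"NGROK_AUTHTOKEN": "example-token"}))  # example-token
```

In the notebook, `configure_ngrok(read_ngrok_token())` could stand in for the shell command, provided the variable is set before the cell runs.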


Serving

In [None]:
import uvicorn
import nest_asyncio
from pyngrok import ngrok

nest_asyncio.apply()

public_url = ngrok.connect(9003, "http")
print('Public URL:', public_url)

uvicorn.run(app, host='0.0.0.0', port=9003)

Public URL: NgrokTunnel: "https://5d61-34-106-142-89.ngrok-free.app" -> "http://localhost:9003"


INFO:     Started server process [166]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9003 (Press CTRL+C to quit)


INFO:     104.223.87.12:0 - "POST /predict HTTP/1.1" 200 OK
INFO:     104.223.87.12:0 - "POST /predict HTTP/1.1" 200 OK
INFO:     104.223.87.12:0 - "POST /predict HTTP/1.1" 200 OK
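The `POST /predict` entries in the log could come from a client like the one sketched below, which uses only the standard library. The helper names are hypothetical, the base URL assumes the server from this notebook is reachable locally on port 9003, and the ngrok URL would be used instead for remote access.

```python
import json
import urllib.request


def build_predict_request(base_url, user_id, item_id):
    """Build (but do not send) a JSON POST request for the /predict route."""
    payload = json.dumps({"user_id": user_id, "item_id": item_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(request, timeout=10):
    """Send the request and return the predicted rating from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predicted_rating"]


# The user id is sent as a string, matching the ids seen in the data preview
req = build_predict_request("http://localhost:9003", "276729", "052165615X")
print(req.get_method(), req.full_url)  # POST http://localhost:9003/predict
```

Calling `fetch_prediction(req)` while the server is running would return the value stored under `predicted_rating` in the response.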
