# Deployed with Flask

Using scikit-learn, I trained a classifier on the IMDB dataset to label movie reviews. A positive review should get a score of 1 (*i.e.,* thumbs-up) and a negative review should get a score of 0 (*i.e.,* thumbs-down).

I then used Flask to set up a local server at http://localhost:5000/classify. (The corresponding code is in 'app/api.py'.) Upon receiving a POST request whose payload is a list of movie-review texts, the server applies the trained model and responds with one prediction per review.
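Since 'app/api.py' is not shown here, the following is only a rough sketch of the request/response contract, not the actual server code. The `reviews` and `labels` keys are the ones the client code in this notebook uses; `handle_classify` and `stub_predict` are hypothetical names, and the stub stands in for the trained model.

```python
import json
from typing import Callable


def handle_classify(body: str, predict: Callable[[list[str]], list[int]]) -> str:
    """Turn a JSON request body into a JSON response body."""
    payload = json.loads(body)
    reviews = payload['reviews']  # list of review texts sent by the client
    # Cast to plain ints so the labels are JSON-serializable
    # (a model may return NumPy integers).
    labels = [int(label) for label in predict(reviews)]
    return json.dumps({'labels': labels})


def stub_predict(reviews: list[str]) -> list[int]:
    # Hypothetical stand-in for the trained classifier, for illustration only.
    return [1 if 'great' in review.lower() else 0 for review in reviews]


request_body = json.dumps({'reviews': ['A great film.', 'Dull and overlong.']})
response_body = handle_classify(request_body, stub_predict)
print(response_body)
```

In a Flask app, logic along these lines would presumably sit inside the view function registered for `/classify`, with the real model's prediction function in place of the stub.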

To try out the trained classifier, I use samples from the set of unlabelled (*i.e.,* unsupervised) movie reviews in the IMDB dataset.

## Imports

In [1]:
import numpy as np
import pandas as pd
import requests

from datasets import load_dataset

from app.cleaner.preprocessor import Preprocessor
from app.cleaner.tokenizer import Tokenizer

In [2]:
rng = np.random.default_rng()

## Load Dataset

In [3]:
ds = load_dataset('imdb')
all_reviews = ds['unsupervised']['text']

Here's an example movie review:

In [4]:
rng.choice(all_reviews)

"This game is terrific. Even do this game is made for the game boy advance with all of the console's limits, is also extremely enjoyable. It is a very long and fun game with innovating stuff for the genre. Its story is really good as the characters are. But what is the difference between this RPG and others? It has magic.<br /><br />When you take into account that the game boy advance is not one of the best consoles out there, you would be a bit hesitant about this game. But I can assure you that all the good stuff totally make up for the system's limits. The fight are very well designed, the world is really big and you can find tons of monster to practice the moves. The sound is also enjoyable and the graphics are good for the game boy advance.<br /><br />The story is about a group of four adepts (The person who can use Psynergy) with different powers, who need to stop the bad guys from lighting the four beacons/seals and therefore break the alchemy seal that use to balance to forces 

## Predictions

In [5]:
# url of the movie-review-classifier server
URL = 'http://localhost:5000/classify'

In [6]:
def get_predictions(reviews: list[str]) -> list[int]:
    """Return a list of movie-review-classifier predictions.

    Makes a POST request to the movie-review classifier and returns
    a list of 0's and 1's where a 0 indicates a thumbs-down
    classification of the corresponding movie review and a 1 indicates
    a thumbs-up.

    :param reviews: A list of movie reviews
    :type reviews: list[str]

    :return: A list of predictions from the movie-review-classifier
    :rtype: list[int]
    """
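    payload = {'reviews': reviews}
    response = requests.post(URL, json=payload)
    # Fail loudly on an HTTP error instead of trying to parse an error page
    response.raise_for_status()
    return response.json()['labels']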

In [7]:
reviews = rng.choice(all_reviews, size=4, replace=False).tolist()
reviews

["This is one of the Great ones: In terms of French cinema in the first full decade of Sound it belongs right up there with Marius, Fanny, Cesar, Quai des Brumes, Le Jour se leve, La Femme du boulanger, La Grande Illusion and you can't put anything higher than that though Duvivier's own La Belle equipe and Pepe Le Moko both come within a whisker. It's the kind of film that would be difficult to make today as would, for example, Dial M For Murder. Frederick Knott wrote Dial as a play in the early fifties and the Hitchcock film version was released in 1954 BUT the entire plot (our old friend the 'perfect' murder) hinged on the fact that in those days only the upper and middle classes had telephones at all and those were in fixed locations and in this era of jack points and cell phones the idea of someone obliged to answer a telephone located on a desk in front of heavy drapes behind which a murderer was lurking ready to strike when the phone was answered would be ludicrous. Carnet is sim

In [8]:
predictions = get_predictions(reviews)
predictions

[1, 0, 1, 0]
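
The raw 0/1 list is easier to read next to the text it refers to. Below is a minimal sketch of one way to pair each review with its label; `summarize` is a hypothetical helper, and the sample reviews and predictions are made-up placeholders rather than output from the server.

```python
def summarize(reviews: list[str], predictions: list[int], width: int = 60) -> list[str]:
    """Pair each prediction with the first `width` characters of its review."""
    if len(reviews) != len(predictions):
        raise ValueError('expected one prediction per review')
    names = {0: 'thumbs-down', 1: 'thumbs-up'}
    return [
        f'{names[prediction]:<11} | {review[:width]}'
        for review, prediction in zip(reviews, predictions)
    ]


# Placeholder data for illustration only
sample_reviews = ['Loved every minute of it.', 'A complete waste of two hours.']
sample_predictions = [1, 0]

summary = summarize(sample_reviews, sample_predictions)
for line in summary:
    print(line)
```

In this notebook the same helper could be called as `summarize(reviews, predictions)`.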