<a href="https://colab.research.google.com/github/shaikadish/imdbProject/blob/main/rating_predictor_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to my Review Rating Predictor demo!

This is a demo of an NLP model I trained to predict the rating (on a scale of 1 to 10) you would give a movie, based on a review that you write for it.

To run the demo, hover your mouse over the code block directly below the **Demo** heading and press the play button at its top left. The first time you run the demo, the model and its libraries are downloaded, which can take a minute or two. Once the demo is running, type your film review in the box that appears at the bottom of the page and press Enter to see the rating the model predicts. To run the demo again, just press the play button! Remember, the model performs best on longer reviews, so if it gives nonsensical ratings for reviews like "That movie sucked", try fleshing out the review to get a more accurate rating.

If you are interested in how I made this, please check out the [tutorial](https://github.com/shaikadish/imdbProject) I wrote, which details the entire process!

# Demo:

In [8]:
# I am removing all printed output. I don't want to scare any non-technical people!

# Run setup only on first run
if 'BertTokenizer' not in locals():
  # Download model and library
  print('Downloading important things...')
  !pip install transformers &> /dev/null
  # The file ID is passed directly; the --id flag is deprecated in recent gdown releases
  !gdown 1-00HNjkWgzq6_b-UDJtUfztR5Ga-Ymbi &> /dev/null

  # Disable output warnings
  import warnings
  warnings.filterwarnings("ignore")

  # Import libraries
  import torch
  import torch.nn as nn
  import transformers
  from transformers import BertTokenizer, AutoModel
  from transformers.utils.logging import disable_progress_bar

  # Silence transformers log messages and download progress bars
  transformers.logging.set_verbosity_error()
  disable_progress_bar()

  # Model class
  class RatingModel(nn.Module):
      def __init__(self):
          super().__init__()
          
          # Load pretrained BERT model
          self.base_model = AutoModel.from_pretrained('fabriceyhc/bert-base-uncased-imdb')
          # Include dropout layer in classification head
          self.dropout = nn.Dropout(0.5)
          # Add new output layer: maps the 768-dimensional BERT features to a single score
          self.linear = nn.Linear(768, 1)
          
      def forward(self, tokenized_reviews, attn_mask):

          # Pass the tokenized reviews through the BERT model
          outputs = self.base_model(tokenized_reviews, attention_mask=attn_mask)
          # outputs[0] holds the last hidden states, one feature vector per token
          outputs = self.dropout(outputs[0])

          # The slicing of the BERT model outputs (below) keeps only the features of the
          # special [CLS] token, which is at index 0, for classification
          outputs = self.linear(outputs[:, 0, :])
          
          return outputs

  # Predictor class
  class RatingPredictor:
      def __init__(self, model, tokenizer):

        # Store the tokenizer and put the model in evaluation mode (disables dropout)
        self.model = model.eval()
        self.tokenizer = tokenizer

      def predict(self,review):

        # Tokenize review, truncating or padding it to exactly 512 tokens
        encoded_dict = self.tokenizer.encode_plus(
            review,
            truncation=True,
            add_special_tokens=True,
            max_length=512,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            return_attention_mask=True,
        )
            
        tokenized_review = encoded_dict['input_ids']
        attention_mask = encoded_dict['attention_mask']

        # Get rating prediction (gradients are not needed for inference)
        with torch.no_grad():
          preds = self.model(torch.tensor([tokenized_review]), torch.tensor([attention_mask]))

        # Return rating out of 10
        return round(preds.item()*10)

  # Download tokenizer
  print('Almost done...')
  tokenizer = BertTokenizer.from_pretrained('fabriceyhc/bert-base-uncased-imdb', do_lower_case=True)

  # Create model and load
  model = RatingModel()
  model.load_state_dict(torch.load('imdb_1.pt',map_location=torch.device('cpu')))

  # Create predictor
  predictor = RatingPredictor(model,tokenizer)
  print('Done!\n')

# Run demo
review = input("Type a review: \n\n")
prediction = predictor.predict(review)
print(f'\nPredicted rating: {prediction}/10')

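For the curious: the demo turns the model's raw output into a rating by multiplying it by 10 and rounding. Here is a minimal sketch of that step on its own. The function name `score_to_rating` is hypothetical, the assumption that raw scores fall roughly between 0 and 1 is inferred from the multiplication by 10, and the clamping to the 1-10 range is my addition here rather than something the demo code does.

```python
def score_to_rating(score: float) -> int:
    """Convert a raw model score (assumed to be roughly 0 to 1) to a rating out of 10."""
    # Same conversion as the demo: scale to 0-10 and round to a whole number
    rating = round(score * 10)
    # Assumption, not in the demo: clamp so the result always stays within 1-10
    return max(1, min(10, rating))


# Made-up scores for illustration, not real model outputs
examples = [0.02, 0.73, 1.08]
ratings = [score_to_rating(s) for s in examples]
print(ratings)
```

Without the clamp, a score slightly below 0.05 or above 1.05 would produce a rating of 0 or 11, which falls outside the advertised scale.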