<a href="https://colab.research.google.com/github/shokirovnozir/DL/blob/master/rottenReviews.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setting Up

In [19]:
# Mount Google Drive
from google.colab import drive
ROOT = "/content/drive"
drive.mount(ROOT)

Mounted at /content/drive


In [2]:
#hide
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

ERROR: fastai 2.0.6 has requirement pandas>=1.1.0, but you'll have pandas 1.0.5 which is incompatible.
Mounted at /content/gdrive


In [3]:
#hide
from fastbook import *

# Training the Text Classifier

In [4]:
from fastai.text.all import *
 
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=16)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.615372,0.415917,0.81168,20:38


epoch,train_loss,valid_loss,accuracy,time
0,0.298334,0.242986,0.90024,35:28
1,0.267946,0.226205,0.914,35:46
2,0.197395,0.181424,0.93036,35:24
3,0.146515,0.182509,0.93076,35:22


---

### Web Scraping

**Instead of writing a single mini review or copying and pasting one from the internet, let us scrape some reviews from [Rotten Tomatoes](https://www.rottentomatoes.com/critics/latest_reviews).**

**Credit for the scraping code goes to Dwarkesh Natarajan's [Medium post](https://medium.com/opex-analytics/simple-web-scraping-in-python-90d6fddfaeca).**

---
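The scraping approach used here relies on each review occupying four consecutive `<td>` cells (rating, movie, review, critic). As a minimal offline sketch of that grouping idea, the snippet below uses only the standard library's `html.parser` in place of BeautifulSoup. The `CellCollector` class, the `rows_of` helper, and the sample HTML are hypothetical illustrations; the live page's markup may differ.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts
        self._parts = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            self.cells.append("".join(self._parts).strip())
            self._parts = None


def rows_of(cells, width=4):
    """Split a flat list of cells into rows of `width`, dropping an incomplete tail."""
    usable = len(cells) - len(cells) % width
    return [cells[i:i + width] for i in range(0, usable, width)]


# Hypothetical fragment for illustration only
sample_html = """
<table>
  <tr><td>3/5</td><td><a href="#">Some Movie</a></td><td>"Good fun."</td><td>A. Critic</td></tr>
  <tr><td></td><td><a href="#">Another Movie</a></td><td>"Dull."</td><td>B. Critic</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(sample_html)
rows = rows_of(collector.cells)
print(rows)
```

Grouping a flat cell list by position is fragile: if the page adds or removes a column, every row shifts. Iterating over `<tr>` elements instead would be a sturdier choice when the markup allows it.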



In [5]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.rottentomatoes.com/critics/latest_reviews/"



In [55]:
def rottenReviews(url):
  '''
  Scrape the Rotten Tomatoes latest-reviews page at `url`, predict whether
  each review is positive or negative, and report the probability the model
  assigns to its prediction. The scraped "Rating" column can serve as a
  rough label for checking the predictions.
  '''
  # request the page and parse it (naming the parser avoids a bs4 warning)
  r = requests.get(url)
  soup = BeautifulSoup(r.content, "html.parser")

  # build a DataFrame from the scraped table cells, four cells per row
  a = []
  df = pd.DataFrame(columns = ['Rating', 'Movie', 'Review',
                              'Critics'])
  for link in soup.find_all('td'):
      a.append(link.get_text())
      if len(a) == 4:
          df_length = len(df)
          df.loc[df_length] = a
          a = []

  # do some more cleaning
  df['Review'] = df['Review'].apply(lambda st: st[st.find("\"")+1:st.find(".")])
  # keep only the first line of the stripped cell text as the movie title
  df['Movie'] = df['Movie'].apply(lambda st: st.strip().split("\n")[0])
  df['Critics'] = df['Critics'].str.replace("\n", " ")
  df['Rating'] = df['Rating'].str.replace("\n", "")

  # predict the reviews using our pretrained model
  predicted_review = []
  for rev in df['Review']:
    predicted_review.append(learn.predict(rev))


  # make a dataframe to store the results better
  df1 = pd.DataFrame(columns = ['Predicted Review', 'Neg Prob', 'Pos Prob',
                             'Probability'])
  
  posprob = []
  negprob = []
  prob = []
  revPred = []
  for x in predicted_review:
    if x[0] == 'pos':
      revPred.append("Positive")
      prob.append("{:.2%}".format(x[2].numpy()[1]))
    else:
      revPred.append("Negative")
      prob.append("{:.2%}".format(x[2].numpy()[0]))

    negprob.append("{:.2%}".format(x[2].numpy()[0]))
    posprob.append("{:.2%}".format(x[2].numpy()[1]))

  df1['Predicted Review'] = revPred
  df1['Probability'] = prob
  df1['Neg Prob'] = negprob
  df1['Pos Prob'] = posprob

  # Present the results we want
  dff = pd.DataFrame(df['Rating'])
  dff = dff.join(df1[['Predicted Review', 'Probability']])
  dff = dff.join(df['Review'])

  return dff
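
The review-cleaning step slices between the first double quote and the first period using `str.find`, which returns -1 when the character is absent. In that case the slice can silently drop the last character, and a period that appears before the quote yields an empty string. A slightly more defensive variant could look like the sketch below; `first_sentence` is a hypothetical helper, not part of the original notebook.

```python
def first_sentence(cell: str) -> str:
    """Return the text after the first double quote, up to the next period.

    Falls back to the whole string when a delimiter is missing.
    """
    start = cell.find('"') + 1        # 0 when there is no quote
    end = cell.find(".", start)       # search only after the quote
    if end == -1:
        end = len(cell)
    # Remove surrounding whitespace and any leftover closing quote
    return cell[start:end].strip().strip('"').strip()


print(first_sentence('\n "Good fun. Worth a look."\n Full Review'))
print(first_sentence('Dr. Who says "Fine film."'))
print(first_sentence('"No period here"'))
```

It could be dropped into the cleaning step as `df['Review'].apply(first_sentence)`.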

In [60]:
reviews = rottenReviews(url)
reviews.head(21)

Unnamed: 0,Rating,Predicted Review,Probability,Review
0,2.5/5,Negative,77.01%,"Though high on style and atmosphere, Hall fails to populate its appealing setup with the required suspense"
1,3/5,Positive,90.35%,"This amusing horror from Down Under doesn't reinvent the wheel, but benefits from its punchy social commentary and grisly third act"
2,,Positive,84.27%,"a literal mind-melt, pitting an unbalanced neuroscientist against different isolated sections of his brain"
3,,Positive,86.57%,Armando Fonseca and Kapel Furman bring bloody bedlam and an allegory of national rapine to the streets of contemporary São Paulo
4,,Positive,85.62%,"A drunken Australian step-cousin of 1970s European and American cinema, Parish Malfitano's excellent debut is a rich minestrone stew of cinephilic allusions"
5,5/10,Positive,62.60%,The first problem with the film is the amount of time it takes before the monster appears
6,3/10,Negative,98.78%,It seems pointless to criticize a film for small errors when the low budget and ridiculous nature of the story and characters make the whole ordeal a moviemaking joke
7,6/10,Negative,78.17%,Bare breasts and brutal violence always seem to present themselves just when audiences might start to focus on the more nonsensical bits
8,7/10,Positive,93.44%,A ghastly little sea-faring thriller worthy of a viewing
9,6/10,Positive,74.75%,"The plot isn't terribly creative, but it effectively sets up some delightfully gory ambushes and boo moments for the revealing of dismembered corpses"


**The model's predictions agree with the critics' ratings for most of these reviews, but it fails to capture semantic nuance and sarcasm, especially in very short reviews (see # 11).**
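
One way to put a number on this comparison is to convert each rating string into a positive or negative label and count how often it matches the prediction. The sketch below does this for the ten rows shown above. The 60% cut-off and the `rating_to_label` helper are assumptions made for illustration; unrated reviews are skipped.

```python
from fractions import Fraction


def rating_to_label(rating, threshold=Fraction(3, 5)):
    """Map a 'score/scale' string to 'Positive' or 'Negative'.

    Returns None when the rating is empty or cannot be parsed.
    The 60% threshold is an assumption, not a Rotten Tomatoes rule.
    """
    try:
        score, scale = rating.split("/")
        fraction = Fraction(score.strip()) / Fraction(scale.strip())
    except (ValueError, ZeroDivisionError):
        return None
    return "Positive" if fraction >= threshold else "Negative"


# (Rating, Predicted Review) pairs copied from the output table above
rows = [
    ("2.5/5", "Negative"), ("3/5", "Positive"), ("", "Positive"),
    ("", "Positive"), ("", "Positive"), ("5/10", "Positive"),
    ("3/10", "Negative"), ("6/10", "Negative"), ("7/10", "Positive"),
    ("6/10", "Positive"),
]

# Keep only the rows whose rating could be parsed
rated = [(rating_to_label(r), p) for r, p in rows if rating_to_label(r) is not None]
agree = sum(label == predicted for label, predicted in rated)
print(f"{agree} of {len(rated)} rated reviews agree with the prediction")
```

With this threshold, 5 of the 7 rated rows agree. A different cut-off would shift borderline ratings such as 3/5 and 6/10, so the figure is only a rough indication.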