# **Rotten Tomatoes Analysis**

Importing relevant libraries

In [1]:
import numpy as np
import pandas as pd

from matplotlib import pyplot as plt
%matplotlib inline


### **Working Files**


In [386]:
rt_movie_info = pd.read_csv('data/zippedData/rt.movie_info.tsv.gz',encoding='unicode_escape', sep='\t')
rt_reviews = pd.read_csv('data/zippedData/rt.reviews.tsv.gz',encoding='unicode_escape', sep='\t')

### **High Level Analysis**

Checking whether the two DataFrames (rt_movie_info and rt_reviews) are related.

In [389]:
print(rt_movie_info.columns)
print()
print(rt_reviews.columns)

Index(['id', 'synopsis', 'rating', 'genre', 'director', 'writer',
       'theater_date', 'dvd_date', 'currency', 'box_office', 'runtime',
       'studio'],
      dtype='object')

Index(['id', 'review', 'rating', 'fresh', 'critic', 'top_critic', 'publisher',
       'date'],
      dtype='object')


### Comment:
Both DataFrames have 'id' and 'rating' columns, but it is too early to tell whether those columns actually hold the same kind of data.
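One way to test whether the shared 'id' column really links the two tables is to measure how many review rows point at a known movie. This is only a sketch on made-up toy frames (`movies` and `reviews` are stand-ins, not the real data):

```python
import pandas as pd

# Toy stand-ins for rt_movie_info and rt_reviews (hypothetical values).
movies = pd.DataFrame({"id": [1, 3, 5], "rating": ["R", "R", "NR"]})
reviews = pd.DataFrame({"id": [3, 3, 5, 9], "rating": ["3/5", None, "B+", "7/10"]})

# Share of review rows whose id also appears in the movie table.
share_matched = reviews["id"].isin(movies["id"]).mean()

# Movie ids that have no review at all.
unreviewed = sorted(set(movies["id"]) - set(reviews["id"]))

print(share_matched, unreviewed)
```

A share close to 1.0 on the real data would suggest that 'id' is a usable join key.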

In [394]:
print('*** rt_movie_info DataFrame ***')
rt_movie_info.info()

*** rt_movie_info DataFrame ***
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1560 entries, 0 to 1559
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            1560 non-null   int64 
 1   synopsis      1498 non-null   object
 2   rating        1557 non-null   object
 3   genre         1552 non-null   object
 4   director      1361 non-null   object
 5   writer        1111 non-null   object
 6   theater_date  1201 non-null   object
 7   dvd_date      1201 non-null   object
 8   currency      340 non-null    object
 9   box_office    340 non-null    object
 10  runtime       1530 non-null   object
 11  studio        494 non-null    object
dtypes: int64(1), object(11)
memory usage: 146.4+ KB


### Comment:
A large share of 'currency', 'box_office', and 'studio' is missing: only 340 of 1,560 rows have a currency or box office value, and only 494 have a studio. Several other columns have smaller gaps. Depending on how useful each column turns out to be, I may or may not need to clean it.
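The missing share per column can be read off directly instead of comparing non-null counts by eye. A minimal sketch, using a small made-up frame (`info` is a hypothetical stand-in for rt_movie_info):

```python
import numpy as np
import pandas as pd

# Toy stand-in for rt_movie_info (hypothetical values).
info = pd.DataFrame({
    "id": [1, 3, 5, 6],
    "box_office": [np.nan, "600000", np.nan, np.nan],
    "studio": [np.nan, "Entertainment One", np.nan, np.nan],
    "runtime": ["104 minutes", "108 minutes", "116 minutes", "128 minutes"],
})

# isna() gives True/False per cell; the column mean is the fraction missing.
missing_share = info.isna().mean().sort_values(ascending=False)
print(missing_share)
```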

In [395]:
print('*** rt_reviews DataFrame ***')
rt_reviews.info()

*** rt_reviews DataFrame ***
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          54432 non-null  int64 
 1   review      48869 non-null  object
 2   rating      40915 non-null  object
 3   fresh       54432 non-null  object
 4   critic      51710 non-null  object
 5   top_critic  54432 non-null  int64 
 6   publisher   54123 non-null  object
 7   date        54432 non-null  object
dtypes: int64(2), object(6)
memory usage: 3.3+ MB


### Comment:
There are more missing values here, but the bigger issue is the data types: almost every column is stored as object. I may need to convert some of them to int or float, most likely the 'rating' column.
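As an illustration of the kind of type conversion this implies, here is a sketch for two of the simpler columns. It assumes the 'date' strings follow the "Month day, year" pattern and that 'top_critic' only holds 0 or 1; the toy frame is made up:

```python
import pandas as pd

# Toy stand-in for rt_reviews (hypothetical values).
reviews = pd.DataFrame({
    "date": ["November 10, 2018", "May 23, 2018"],
    "top_critic": [0, 1],
})

# Parse the text dates into real timestamps.
reviews["date"] = pd.to_datetime(reviews["date"], format="%B %d, %Y")

# A 0/1 flag reads more naturally as a boolean.
reviews["top_critic"] = reviews["top_critic"].astype(bool)

print(reviews.dtypes)
```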

In [396]:
rt_movie_info

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,1,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104 minutes,
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000,108 minutes,Entertainment One
2,5,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116 minutes,
3,6,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128 minutes,
4,7,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200 minutes,
...,...,...,...,...,...,...,...,...,...,...,...,...
1555,1996,Forget terrorists or hijackers -- there's a ha...,R,Action and Adventure|Horror|Mystery and Suspense,,,"Aug 18, 2006","Jan 2, 2007",$,33886034,106 minutes,New Line Cinema
1556,1997,The popular Saturday Night Live sketch was exp...,PG,Comedy|Science Fiction and Fantasy,Steve Barron,Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner,"Jul 23, 1993","Apr 17, 2001",,,88 minutes,Paramount Vantage
1557,1998,"Based on a novel by Richard Powell, when the l...",G,Classics|Comedy|Drama|Musical and Performing Arts,Gordon Douglas,,"Jan 1, 1962","May 11, 2004",,,111 minutes,
1558,1999,The Sandlot is a coming-of-age story about a g...,PG,Comedy|Drama|Kids and Family|Sports and Fitness,David Mickey Evans,David Mickey Evans|Robert Gunter,"Apr 1, 1993","Jan 29, 2002",,,101 minutes,


### Comment:
'rating' here looks like the movie's content rating (R, PG, and so on), not a review score. I can categorize movies by genre and build a graph from that.
I don't see any value in the following columns: 'synopsis', 'currency', 'box_office' (missing data, may need to talk with Hatice about this one), and 'studio' (may change my mind later).
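Because each 'genre' cell packs several genres into one pipe-delimited string, counting them means splitting first. A sketch of one way to do that, on a made-up three-row frame (`info` is a hypothetical stand-in):

```python
import pandas as pd

# Toy stand-in for rt_movie_info (genre strings copied from the preview above).
info = pd.DataFrame({
    "id": [1, 3, 7],
    "genre": [
        "Action and Adventure|Classics|Drama",
        "Drama|Science Fiction and Fantasy",
        "Drama|Romance",
    ],
})

# Split on '|' into lists, give each genre its own row, then count.
genre_counts = info["genre"].str.split("|").explode().value_counts()
print(genre_counts)
```

The resulting counts could feed a bar chart directly, for example with `genre_counts.plot(kind="bar")`.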

In [397]:
rt_reviews

Unnamed: 0,id,review,rating,fresh,critic,top_critic,publisher,date
0,3,A distinctly gallows take on contemporary fina...,3/5,fresh,PJ Nabarro,0,Patrick Nabarro,"November 10, 2018"
1,3,It's an allegory in search of a meaning that n...,,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
2,3,... life lived in a bubble in financial dealin...,,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
3,3,Continuing along a line introduced in last yea...,,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
4,3,... a perverse twist on neorealism...,,fresh,,0,Cinema Scope,"October 12, 2017"
...,...,...,...,...,...,...,...,...
54427,2000,The real charm of this trifle is the deadpan c...,,fresh,Laura Sinagra,1,Village Voice,"September 24, 2002"
54428,2000,,1/5,rotten,Michael Szymanski,0,Zap2it.com,"September 21, 2005"
54429,2000,,2/5,rotten,Emanuel Levy,0,EmanuelLevy.Com,"July 17, 2005"
54430,2000,,2.5/5,rotten,Christopher Null,0,Filmcritic.com,"September 7, 2003"


### Comment:
We do have some relatable columns! The 'id' columns will come in handy for linking these datasets. The 'rating' column looks to be a mess (scores like 3/5 and 2.5/5, plus plenty of missing values) and will take considerable time to clean. The next most useful column is 'fresh'. Rotten Tomatoes marks a movie Fresh when at LEAST 60% of its reviews are positive, so I'll treat 60% as the dividing line between a fresh and a rotten rating. I'm not sure I can get any valuable info from the rest of the columns. Bummer.
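To make the 60% idea concrete, here is a sketch that computes the share of fresh reviews per movie and applies the cutoff. The toy frame and the `movie_label` name are made up for illustration:

```python
import pandas as pd

# Toy stand-in for rt_reviews (hypothetical values).
reviews = pd.DataFrame({
    "id": [3, 3, 3, 3, 3, 2000, 2000, 2000, 2000],
    "fresh": ["fresh", "rotten", "fresh", "fresh", "fresh",
              "fresh", "rotten", "rotten", "rotten"],
})

# Fraction of each movie's reviews that are labelled fresh.
fresh_share = reviews["fresh"].eq("fresh").groupby(reviews["id"]).mean()

# Apply the 60% cutoff to get one label per movie.
movie_label = fresh_share.ge(0.60).map({True: "fresh", False: "rotten"})
print(movie_label)
```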

## **Cleaning rt_movie_info**

- Drop irrelevant rows

In [399]:
rt_null_genre = rt_movie_info[rt_movie_info['genre'].isna()]
rt_null_genre

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
10,17,,,,,,,,,,,
131,167,,,,,,,,,,,
222,289,,NR,,,,,,,,95 minutes,
250,327,"When a new robot, Raymond, defeats the three h...",NR,,,,,,,,13 minutes,
658,843,Miners want to drill for billions of dollars w...,NR,,,,,,,,60 minutes,
1082,1393,Steven Seagal plays an expert sniper on a spec...,R,,Fred Olen Ray,Fred Olen Ray,,,,,84 minutes,
1342,1736,,NR,,,,,,,,,
1543,1982,,,,,,,,,,,


### Comment:
These rows do not give us enough information to draw any conclusions, so they provide no value. Let's DROP EM!

In [401]:
# These rows carry no relevant info, so drop them by index label.
dropem = list(rt_null_genre.index)
rt_movie_info = rt_movie_info.drop(dropem)
rt_movie_info

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,1,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104 minutes,
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000,108 minutes,Entertainment One
2,5,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116 minutes,
3,6,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128 minutes,
4,7,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200 minutes,
...,...,...,...,...,...,...,...,...,...,...,...,...
1555,1996,Forget terrorists or hijackers -- there's a ha...,R,Action and Adventure|Horror|Mystery and Suspense,,,"Aug 18, 2006","Jan 2, 2007",$,33886034,106 minutes,New Line Cinema
1556,1997,The popular Saturday Night Live sketch was exp...,PG,Comedy|Science Fiction and Fantasy,Steve Barron,Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner,"Jul 23, 1993","Apr 17, 2001",,,88 minutes,Paramount Vantage
1557,1998,"Based on a novel by Richard Powell, when the l...",G,Classics|Comedy|Drama|Musical and Performing Arts,Gordon Douglas,,"Jan 1, 1962","May 11, 2004",,,111 minutes,
1558,1999,The Sandlot is a coming-of-age story about a g...,PG,Comedy|Drama|Kids and Family|Sports and Fitness,David Mickey Evans,David Mickey Evans|Robert Gunter,"Apr 1, 1993","Jan 29, 2002",,,101 minutes,


### Comment:
Rows have been dropped! But the index now has gaps. Compare the total row count with the last index label: they no longer line up. I'll need to reset it.
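For reference, dropping rows with a missing genre and renumbering the index can also be done in one chain. This is a sketch of an alternative on a made-up frame, not a replacement for the cells in this notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for rt_movie_info (hypothetical values).
info = pd.DataFrame({
    "id": [1, 17, 3, 167, 5],
    "genre": ["Drama", np.nan, "Comedy", np.nan, "Romance"],
})

# Drop rows with no genre, then renumber the index from 0 without keeping the old labels.
cleaned = info.dropna(subset=["genre"]).reset_index(drop=True)
print(cleaned)
```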

In [402]:
rt_movie_info = rt_movie_info.reset_index(drop=True)
rt_movie_info

Unnamed: 0,id,synopsis,rating,genre,director,writer,theater_date,dvd_date,currency,box_office,runtime,studio
0,1,"This gritty, fast-paced, and innovative police...",R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971","Sep 25, 2001",,,104 minutes,
1,3,"New York City, not-too-distant-future: Eric Pa...",R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012","Jan 1, 2013",$,600000,108 minutes,Entertainment One
2,5,Illeana Douglas delivers a superb performance ...,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996","Apr 18, 2000",,,116 minutes,
3,6,Michael Douglas runs afoul of a treacherous su...,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994","Aug 27, 1997",,,128 minutes,
4,7,,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,,,,200 minutes,
...,...,...,...,...,...,...,...,...,...,...,...,...
1547,1996,Forget terrorists or hijackers -- there's a ha...,R,Action and Adventure|Horror|Mystery and Suspense,,,"Aug 18, 2006","Jan 2, 2007",$,33886034,106 minutes,New Line Cinema
1548,1997,The popular Saturday Night Live sketch was exp...,PG,Comedy|Science Fiction and Fantasy,Steve Barron,Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner,"Jul 23, 1993","Apr 17, 2001",,,88 minutes,Paramount Vantage
1549,1998,"Based on a novel by Richard Powell, when the l...",G,Classics|Comedy|Drama|Musical and Performing Arts,Gordon Douglas,,"Jan 1, 1962","May 11, 2004",,,111 minutes,
1550,1999,The Sandlot is a coming-of-age story about a g...,PG,Comedy|Drama|Kids and Family|Sports and Fitness,David Mickey Evans,David Mickey Evans|Robert Gunter,"Apr 1, 1993","Jan 29, 2002",,,101 minutes,


In [385]:
rt_movie_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1552 entries, 0 to 1551
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            1552 non-null   int64 
 1   synopsis      1495 non-null   object
 2   rating        1552 non-null   object
 3   genre         1552 non-null   object
 4   director      1360 non-null   object
 5   writer        1110 non-null   object
 6   theater_date  1201 non-null   object
 7   dvd_date      1201 non-null   object
 8   currency      340 non-null    object
 9   box_office    340 non-null    object
 10  runtime       1526 non-null   object
 11  studio        494 non-null    object
dtypes: int64(1), object(11)
memory usage: 145.6+ KB


### Comment:
Let's take a look at 'rating'. It should show whether certain content ratings dominate the catalog.

In [406]:
rt_movie_info['rating'].value_counts()

R        520
NR       499
PG       240
PG-13    235
G         57
NC17       1
Name: rating, dtype: int64

### Comment:
Whoa! There are about twice as many "R" and "NR" movies as "PG" or "PG-13" movies, and "G" and "NC17" are rare.
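Turning those counts into shares makes the comparison easier to read. The numbers below are copied from the value_counts output above; on the real DataFrame, `value_counts(normalize=True)` should give the same shares directly:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({"R": 520, "NR": 499, "PG": 240, "PG-13": 235, "G": 57, "NC17": 1})

# Each rating's share of all movies, rounded to three decimals.
shares = (counts / counts.sum()).round(3)
print(shares)
```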

## **Cleaning rt_reviews**

In [5]:
rt_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          54432 non-null  int64 
 1   review      48869 non-null  object
 2   rating      40915 non-null  object
 3   fresh       54432 non-null  object
 4   critic      51710 non-null  object
 5   top_critic  54432 non-null  int64 
 6   publisher   54123 non-null  object
 7   date        54432 non-null  object
dtypes: int64(2), object(6)
memory usage: 3.3+ MB


In [303]:
# Missing data: review, rating, critic, and publisher
rt_reviews_null = rt_reviews[rt_reviews['rating'].isna()]
rt_reviews_null['fresh'].value_counts()

fresh     8174
rotten    5343
Name: fresh, dtype: int64

In [304]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning.
rt_reviews_null = rt_reviews_null.copy()
rt_reviews_null.reset_index(inplace=True)

# Every rating in this subset is missing, so impute a placeholder from the
# fresh/rotten label: 6 for fresh, 5 for rotten.
rt_reviews_null['rating'] = np.where(rt_reviews_null['fresh'] == 'fresh', 6.0, 5.0)

rt_reviews_null

Unnamed: 0,index,id,review,rating,fresh,critic,top_critic,publisher,date
0,1,3,It's an allegory in search of a meaning that n...,5.0,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
1,2,3,... life lived in a bubble in financial dealin...,6.0,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
2,3,3,Continuing along a line introduced in last yea...,6.0,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
3,4,3,... a perverse twist on neorealism...,6.0,fresh,,0,Cinema Scope,"October 12, 2017"
4,5,3,... Cronenberg's Cosmopolis expresses somethin...,6.0,fresh,Michelle Orange,0,Capital New York,"September 11, 2017"
...,...,...,...,...,...,...,...,...,...
13512,54409,2000,"A lightweight, uneven action comedy that freel...",5.0,rotten,Daniel Eagan,0,Film Journal International,"October 5, 2002"
13513,54417,2000,"The funny thing is, I didn't mind all this con...",6.0,fresh,Andrew Sarris,1,Observer,"October 2, 2002"
13514,54425,2000,Despite Besson's high-profile name being Wasab...,6.0,fresh,Andy Klein,0,New Times,"September 26, 2002"
13515,54426,2000,The film lapses too often into sugary sentimen...,5.0,rotten,Paul Malcolm,1,L.A. Weekly,"September 26, 2002"


In [305]:
rt_reviews_null['rating'].isna().value_counts()

False    13517
Name: rating, dtype: int64

In [306]:
# .copy() makes this an independent DataFrame, so the column assignment below is safe.
rt_reviews_not_null = rt_reviews[rt_reviews['rating'].notna()].copy()
rt_reviews_not_null.reset_index(inplace=True)
# Strip spaces so the length checks in the next cell see a compact string.
rt_reviews_not_null['rating'] = rt_reviews_not_null['rating'].str.replace(" ", "", regex=False)


In [307]:
count = 0
fresh_tomato = 6.0 # fallback score for a 'fresh' review: 60% on the 0-10 scale
rotten_tomato = 5.0 # fallback score for a 'rotten' review



for rate in rt_reviews_not_null['rating']:

    if len(rate) <= 2: # if the length of the string is 2 or less ( EX: A, B-, C, 6)
        # Letter grades sit on a 12-step scale: A = 11/12, B = 8/12, C = 5/12, D = 2/12.
        # A '+' or '-' moves the grade by one step (1/12).
        plus_minus = .083 
        a_grade = 0.916
        b_grade = 0.666
        c_grade = 0.416
        d_grade = 0.166


        if rate[0] == 'A':
            if rate[-1] == '-':
                rt_reviews_not_null['rating'][count] = (a_grade - plus_minus)*10
                
            elif rate[-1] == '+':
                rt_reviews_not_null['rating'][count] = 10
                
            else:
                rt_reviews_not_null['rating'][count] = a_grade*10
                

        elif rate[0] == 'B': 
            if rate[-1] == '-':
                rt_reviews_not_null['rating'][count] = (b_grade - plus_minus)*10
                
            elif rate[-1] == '+':
                rt_reviews_not_null['rating'][count] = (b_grade + plus_minus)*10
                
            else:
                rt_reviews_not_null['rating'][count] = b_grade*10
                

        elif rate[0] == 'C':
            if rate[-1] == '-':
                rt_reviews_not_null['rating'][count] = (c_grade - plus_minus)*10
                
            elif rate[-1] == '+':
                rt_reviews_not_null['rating'][count] = (c_grade + plus_minus)*10
                
            else:
                rt_reviews_not_null['rating'][count] = c_grade*10
                

        elif rate[0] == 'D':
            if rate[-1] == '-':
                rt_reviews_not_null['rating'][count] = (d_grade - plus_minus)*10
                
            elif rate[-1] == '+':
                rt_reviews_not_null['rating'][count] = (d_grade + plus_minus)*10
                
            else:
                rt_reviews_not_null['rating'][count] = d_grade*10
                
        
        elif 'F' in rate:
            rt_reviews_not_null['rating'][count] = 0.0
            
        
        elif rate.isalpha(): # any other letter code falls back to the fresh/rotten default
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato

        elif float(rate) <= 10:
            if rt_reviews_not_null['fresh'][count] == 'fresh' and float(rate) >= 6:
                rt_reviews_not_null['rating'][count] = float(rate)

            elif rt_reviews_not_null['fresh'][count] == 'fresh' and float(rate) < 6:
                if float(rate) > 3:
                    rt_reviews_not_null['rating'][count] = (float(rate) / 5)*10
                else:
                    rt_reviews_not_null['rating'][count] = fresh_tomato

            elif rt_reviews_not_null['fresh'][count] == 'rotten' and float(rate) >= 6:
                rt_reviews_not_null['rating'][count] = rotten_tomato

            else:
                rt_reviews_not_null['rating'][count] = float(rate)
        

        else:
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato


    elif len(rate) == 3: # If the length of the string is three ( EX: 1/5 )
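        if rate[-1] == '5' and rate[-2] == '/':
            # x/5: double it so it lands on the same 0-10 scale as the letter grades
            rt_reviews_not_null['rating'][count] = float(rate[0]) * 2

        elif rate[-2] == '/':
            # x/n: rescale the fraction to 0-10
            rt_reviews_not_null['rating'][count] = (float(rate[0]) / float(rate[-1]))*10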
            
        
        elif rate[-2] == '.' and float(rate) <= 10:
            if rt_reviews_not_null['fresh'][count] == 'fresh' and float(rate) >= 6:
                rt_reviews_not_null['rating'][count] = float(rate)

            elif rt_reviews_not_null['fresh'][count] == 'fresh' and float(rate) < 6:
                if float(rate) > 3:
                    rt_reviews_not_null['rating'][count] = (float(rate) / 5)*10
                else:
                    rt_reviews_not_null['rating'][count] = fresh_tomato

            elif rt_reviews_not_null['fresh'][count] == 'rotten' and float(rate) >= 6:
                rt_reviews_not_null['rating'][count] = rotten_tomato

            else:
                rt_reviews_not_null['rating'][count] = float(rate)

        else:
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato


    elif len(rate) == 4: # If the length of the string is four ( EX: 0/10 )
        if rate[-3] == '/':
            rt_reviews_not_null['rating'][count] = (float(rate[0]) / float(rate[-2:]))*10
            
        elif rate[-3] == '.' and float(rate) <= 10:
            # A two-decimal number is kept as-is, whatever the fresh/rotten label.
            rt_reviews_not_null['rating'][count] = float(rate)
            
        else:
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato
            


    elif len(rate) == 5: # If the length of the string is five ( EX: 10/10 ; 1.0/5 )
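        if rate[-1] == '5' and rate[-2] == '/':
            # x.y/5: double it so it lands on the 0-10 scale
            rt_reviews_not_null['rating'][count] = float(rate[0:3]) * 2

        elif rate[-2] == '/':
            # x.y/n: rescale the fraction to 0-10
            rt_reviews_not_null['rating'][count] = (float(rate[0:3]) / float(rate[-1]))*10

        elif rate[-3] == '/':
            # EX: 10/10. The numerator is the two characters before the slash.
            rt_reviews_not_null['rating'][count] = (float(rate[0:2]) / float(rate[-2:]))*10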
            
        else:
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato
            
    

    elif len(rate) == 6:
        if rate[-3] == '/': # EX: 7.5/10
            rt_reviews_not_null['rating'][count] = (float(rate[0:3]) / float(rate[-2:]))*10
            

        else:
            if rt_reviews_not_null['fresh'][count] == 'fresh':
                rt_reviews_not_null['rating'][count] = fresh_tomato
            else:
                rt_reviews_not_null['rating'][count] = rotten_tomato
            

    else:
        print(f'{rate} is not corrected. See else')

    count +=1
print(count)


40915
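The length-based branches above could also be expressed as pattern matching on the rating string. The helper below is a hypothetical sketch, not this notebook's method: `to_ten_scale`, `LETTER_STEPS`, and the `default` argument are names made up here, everything is mapped onto a 0-10 scale, and it deliberately skips the fresh/rotten heuristics that the loop applies to bare numbers.

```python
import re

# Letter grades as steps on a 12-step scale (A = 11/12, B = 8/12, C = 5/12, D = 2/12).
LETTER_STEPS = {"A": 11, "B": 8, "C": 5, "D": 2}


def to_ten_scale(raw, default):
    """Return a 0-10 score for strings like '3/5', '7.5/10', 'B+' or '8'.

    Anything that cannot be interpreted returns `default`, which the caller
    would pick from the review's fresh/rotten label.
    """
    text = raw.replace(" ", "")

    # Fractions such as 3/5 or 7.5/10.
    frac = re.fullmatch(r"(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)", text)
    if frac:
        num, den = float(frac.group(1)), float(frac.group(2))
        return 10 * num / den if den > 0 and num <= den else default

    # Letter grades A through D with an optional + or -.
    letter = re.fullmatch(r"([A-D])([+-]?)", text)
    if letter:
        shift = {"+": 1, "-": -1, "": 0}[letter.group(2)]
        return 10 * (LETTER_STEPS[letter.group(1)] + shift) / 12
    if text == "F":
        return 0.0

    # Bare numbers are assumed to already be on the 0-10 scale.
    try:
        value = float(text)
    except ValueError:
        return default
    return value if 0 <= value <= 10 else default


print(to_ten_scale("3/5", 5.0), to_ten_scale("B+", 5.0), to_ten_scale("N", 6.0))
```

With a helper like this, the whole column could be converted in one pass, for example with a list comprehension over the 'rating' and 'fresh' columns, which also avoids the chained assignments.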




In [308]:
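# Every rating is numeric now, so store the column as float.
rt_reviews_not_null['rating'] = rt_reviews_not_null['rating'].astype(float)
rt_reviews_not_null.info()

# Stitch the two halves back together and restore the original row order.
df_all_rows = pd.concat([rt_reviews_not_null, rt_reviews_null], ignore_index=True)
df_all_rows = df_all_rows.sort_values(by='index')

df_all_rows = df_all_rows.reset_index(drop=True)

df_all_rows['review'] = df_all_rows['review'].fillna('None')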

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40915 entries, 0 to 40914
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   index       40915 non-null  int64  
 1   id          40915 non-null  int64  
 2   review      35379 non-null  object 
 3   rating      40915 non-null  float64
 4   fresh       40915 non-null  object 
 5   critic      38935 non-null  object 
 6   top_critic  40915 non-null  int64  
 7   publisher   40688 non-null  object 
 8   date        40915 non-null  object 
dtypes: float64(1), int64(3), object(5)
memory usage: 2.8+ MB




In [358]:
check = rt_reviews['fresh'] == df_all_rows['fresh']

In [361]:
check.value_counts()

True    54432
Name: fresh, dtype: int64