
# Project: TMDb Data Analysis

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

https://stackoverflow.com/questions/41927973/pandas-dataframe-pipe-separated-values-in-a-cell
    

<a id='intro'></a>
## Introduction



This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
* Certain columns, like `cast` and `genres`, contain multiple values separated by pipe (|) characters.
* There are some odd characters in the `cast` column. Don't worry about cleaning them. You can leave them as is.
* The final two columns ending with `_adj` show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.

-------------------

We are planning to invest in the movie business, so our aim is to understand the sector and identify the best investment options available to us.   
In this project, we will analyze data from The Movie Database (TMDb), which contains information about 10,000 movies, including user ratings and revenue. In particular, we are interested in the following questions:
1. Which movie genres are trending, so that we can set the direction of our investment?
2. Among the trending genres, how does each one correlate with revenue?  
3. What is the minimum budget we should allocate to the chosen genre in order to rank among the highest-revenue movies?
4. Which companies perform best in the genre we are targeting?
5. Does having a website help improve revenue for movies in the genre of interest?



In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline


from typing import List, Tuple

In [2]:
def unpipe_features(df:pd.DataFrame, features:List[str], separator:str="|")->pd.DataFrame:
    """
        Remove pipes (or any given separator) from features cells, and create new lines
            based on each item of the cell
    """
    for feature in features:
        print("********************"*5)
        print("Working on feature : {} ".format(feature))
        df = unpipe_feature(df=df, feature=feature, separator=separator)
        print("Work on feature {} ended".format(feature))
    return df

In [3]:
def unpipe_feature(df:pd.DataFrame, feature:str, separator:str="|")->pd.DataFrame:
    """
        Remove pipes (or any given separator) from a feature cell, and create new lines
            based on each item of the cell
    
        References : 
          - https://stackoverflow.com/questions/41927973/pandas-dataframe-pipe-separated-values-in-a-cell
          - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples
          - https://docs.python.org/3/library/itertools.html
    """
    
    print("Shape of dataset before processing : {}".format(df.shape))
    print("........"*8)
    
    all_rows = []
    
    # Work on a copy so the caller's dataframe is not modified, then duplicate
    # the feature into a temporary column whose name ends with an underscore
    df = df.copy()
    target_col = "{}_".format(feature)
    df[target_col] = df[feature]
    
    for index,row in df.iterrows(): 
        print("Processing row index : {} ".format(index))

        # Split the cell value into a list of items
        col_datas:List[str] = str(row[target_col]).split(separator)
            
        # Build one new row per item: every original column plus the single item
        rows = [ list(row[df.columns[:-1]]) + [data] for data in col_datas]
        all_rows += rows
        print("-------"*6)
        
    print("Processed all rows in dataset")    
    df_2 = pd.DataFrame(data=all_rows, columns=df.columns)
    df_2[feature]=df_2[target_col]
    
    df_2.drop([target_col],axis=1,inplace=True)
    print("Generated new dataframe")
    print("Shape of new dataset : {}".format(df_2.shape))
    print("........"*8)
    
    return df_2
    
    

In [4]:
def unpipe_feature_nextgen(df:pd.DataFrame, feature:str, separator:str="|")->pd.DataFrame:
    """
        Remove pipes (or any given separator) from a feature cell, and create new lines
            based on each item of the cell
    
        References : 
          - https://stackoverflow.com/questions/41927973/pandas-dataframe-pipe-separated-values-in-a-cell
          
    """
    
    row_blocks = []  # one 2-D object array per original row
    
    # Work on a copy so the caller's dataframe is not modified, then duplicate
    # the feature into a temporary column whose name ends with an underscore
    df = df.copy()
    target_col = "{}_".format(feature)
    df[target_col] = df[feature]
    
    for index,row in df.iterrows():
        # Split the cell value into a list of items
        col_datas:List[str] = str(row[target_col]).split(separator)
            
        # dtype=object keeps mixed column types from being coerced to strings
        rows = np.array( [ list(row[df.columns[:-1]]) + [data] for data in col_datas], dtype=object )
        row_blocks.append(rows)
        
    # Stack the blocks row-wise; np.append without an axis would flatten them to 1-D
    all_rows = np.concatenate(row_blocks, axis=0)
    df_2 = pd.DataFrame(data=all_rows, columns=df.columns).infer_objects()
    df_2[feature]=df_2[target_col]
    
    df_2.drop([target_col],axis=1,inplace=True)
    
    return df_2
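
To illustrate what "unpiping" a feature means, here is a minimal sketch on a small made-up dataframe. It uses pandas' built-in `Series.str.split` and `DataFrame.explode` rather than the row-by-row loops above, so it is an alternative technique, not the functions defined in this notebook. The helper name `unpipe_with_explode` and the sample values are assumptions made for the illustration.

```python
import pandas as pd

# Made-up sample data; only the column names mirror the TMDb dataset
toy = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "genres": ["Action|Adventure", "Drama"],
    "revenue": [100, 50],
})


def unpipe_with_explode(df: pd.DataFrame, feature: str, separator: str = "|") -> pd.DataFrame:
    """Hypothetical helper: return one output row per item of a separated cell."""
    out = df.copy()
    # Split each cell into a list of items
    # (a single-character separator is treated as a literal string)
    out[feature] = out[feature].astype(str).str.split(separator)
    # explode() repeats the other columns once per list item
    return out.explode(feature).reset_index(drop=True)


toy_long = unpipe_with_explode(toy, "genres")
print(toy_long)
```

"Movie A" ends up on two rows, one per genre, with its title and revenue repeated. Any later sum over `revenue` on such a frame would therefore count that movie twice, which is worth keeping in mind when aggregating.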
    
    

<a id='wrangling'></a>
## Data Wrangling


### General Properties



In [6]:
df = pd.read_csv("tmdb-movies.csv")
df.head(2).T

| | 0 | 1 |
|---|---|---|
| id | 135397 | 76341 |
| imdb_id | tt0369610 | tt1392190 |
| popularity | 32.985763 | 28.419936 |
| budget | 150000000 | 150000000 |
| revenue | 1513528810 | 378436354 |
| original_title | Jurassic World | Mad Max: Fury Road |
| cast | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Tom Hardy\|Charlize Theron\|Hugh Keays-Byrne\|Nic... |
| homepage | http://www.jurassicworld.com/ | http://www.madmaxmovie.com/ |
| director | Colin Trevorrow | George Miller |
| tagline | The park is open. | What a Lovely Day. |


The feature names are self-explanatory, but we describe them here to clear up any possible confusion:    
* id : movie id in the TMDb database
* imdb_id : movie id in the IMDb database
* popularity : movie popularity score in the TMDb database
* budget : budget used to make the movie
* revenue : revenue generated by the movie
* original_title : original title of the movie
* cast : cast of the movie
* homepage : movie website
* director : director of the movie
* tagline : the movie tagline
* keywords : movie keywords/hashtags
* overview : movie overview
* runtime : movie runtime, in minutes
* genres : movie genres
* production_companies : companies that produced the movie
* release_date : movie release date
* vote_count : number of people who rated the movie
* vote_average : average rating of the movie
* release_year : movie release year
* budget_adj : budget of the movie in terms of 2010 dollars, accounting for inflation over time
* revenue_adj : revenue of the movie in terms of 2010 dollars, accounting for inflation over time


### Getting an overview of the data

In [7]:
# Checking the number of samples and features in the dataset
df.shape

(10866, 21)

In [8]:
# Checking the data structure
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10866 entries, 0 to 10865
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    10866 non-null  int64  
 1   imdb_id               10856 non-null  object 
 2   popularity            10866 non-null  float64
 3   budget                10866 non-null  int64  
 4   revenue               10866 non-null  int64  
 5   original_title        10866 non-null  object 
 6   cast                  10790 non-null  object 
 7   homepage              2936 non-null   object 
 8   director              10822 non-null  object 
 9   tagline               8042 non-null   object 
 10  keywords              9373 non-null   object 
 11  overview              10862 non-null  object 
 12  runtime               10866 non-null  int64  
 13  genres                10843 non-null  object 
 14  production_companies  9836 non-null   object 
 15  release_date       

In [9]:
# Number of N/A values per feature
df.isna().sum()

id                         0
imdb_id                   10
popularity                 0
budget                     0
revenue                    0
original_title             0
cast                      76
homepage                7930
director                  44
tagline                 2824
keywords                1493
overview                   4
runtime                    0
genres                    23
production_companies    1030
release_date               0
vote_count                 0
vote_average               0
release_year               0
budget_adj                 0
revenue_adj                0
dtype: int64

In [10]:
# Same check with isnull(), which is an alias of isna() and returns identical counts
df.isnull().sum()

id                         0
imdb_id                   10
popularity                 0
budget                     0
revenue                    0
original_title             0
cast                      76
homepage                7930
director                  44
tagline                 2824
keywords                1493
overview                   4
runtime                    0
genres                    23
production_companies    1030
release_date               0
vote_count                 0
vote_average               0
release_year               0
budget_adj                 0
revenue_adj                0
dtype: int64

**Data Overview conclusion** :      
* Out of a total of 10866 samples, `homepage` has 7930 null values. For this work, we will assume that a null value here means the movie has no website.    
* Rows with null values in any of the other features will be dropped.
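
As a complement to the raw counts, the share of missing values per feature can be easier to compare. The sketch below uses a small made-up dataframe (the values are assumptions, not taken from the dataset) and then applies the same ratio to the `homepage` count reported above.

```python
import numpy as np
import pandas as pd

# Made-up sample data with gaps
toy = pd.DataFrame({
    "homepage": ["http://example.com", np.nan, np.nan, np.nan],
    "director": ["A", "B", np.nan, "D"],
    "budget": [10, 20, 30, 40],
})

# isna() gives a boolean frame; the mean of booleans is the fraction of True values
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Same ratio for the counts reported above: 7930 missing out of 10866 rows
homepage_share = 7930 / 10866
print(round(homepage_share, 2))  # 0.73
```

With roughly 73% of `homepage` values missing, dropping rows on that column alone would discard most of the dataset.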

In [11]:
#df.hist(figsize=(15,15));

In [12]:
#df[ df["homepage"].isnull() ].hist(figsize=(15,15));

### Data Cleaning



Dropping the null values while keeping `homepage` as a feature leaves us with a dataset of shape (1992, 21), which is too small for our analysis.   
So we will first create a new feature, `no_website`, which tells whether a movie lacks a homepage (assuming that a missing homepage in the dataset means the movie has no website, rather than that the website was not found).
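
The effect of this ordering can be sketched on a small made-up dataframe (the values are assumptions for illustration only): recording the missing homepage as a boolean flag and dropping the sparse column first keeps many more rows than calling `dropna()` directly.

```python
import numpy as np
import pandas as pd

# Made-up sample data: homepage is mostly missing, director has one gap
toy = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "homepage": ["http://example.com", np.nan, np.nan, np.nan],
    "director": ["X", "Y", np.nan, "Z"],
})

# Dropping every row with any missing value keeps only fully populated rows
rows_naive = toy.dropna().shape[0]

# Record the information as a flag, drop the sparse column, then drop rows
flagged = toy.assign(no_website=toy["homepage"].isna()).drop(columns=["homepage"])
rows_flagged = flagged.dropna().shape[0]

print(rows_naive, rows_flagged)  # 1 3
```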

In [14]:
# Creating a new feature, no_website, which is True when the movie has no homepage
df["no_website"] = df["homepage"].isna()
df.head(3).T

| | 0 | 1 | 2 |
|---|---|---|---|
| id | 135397 | 76341 | 262500 |
| imdb_id | tt0369610 | tt1392190 | tt2908446 |
| popularity | 32.985763 | 28.419936 | 13.112507 |
| budget | 150000000 | 150000000 | 110000000 |
| revenue | 1513528810 | 378436354 | 295238201 |
| original_title | Jurassic World | Mad Max: Fury Road | Insurgent |
| cast | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Tom Hardy\|Charlize Theron\|Hugh Keays-Byrne\|Nic... | Shailene Woodley\|Theo James\|Kate Winslet\|Ansel... |
| homepage | http://www.jurassicworld.com/ | http://www.madmaxmovie.com/ | http://www.thedivergentseries.movie/#insurgent |
| director | Colin Trevorrow | George Miller | Robert Schwentke |
| tagline | The park is open. | What a Lovely Day. | One Choice Can Destroy You |


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10866 entries, 0 to 10865
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    10866 non-null  int64  
 1   imdb_id               10856 non-null  object 
 2   popularity            10866 non-null  float64
 3   budget                10866 non-null  int64  
 4   revenue               10866 non-null  int64  
 5   original_title        10866 non-null  object 
 6   cast                  10790 non-null  object 
 7   homepage              2936 non-null   object 
 8   director              10822 non-null  object 
 9   tagline               8042 non-null   object 
 10  keywords              9373 non-null   object 
 11  overview              10862 non-null  object 
 12  runtime               10866 non-null  int64  
 13  genres                10843 non-null  object 
 14  production_companies  9836 non-null   object 
 15  release_date       

In [16]:
df["no_website"].unique()

array([False,  True])

In [17]:
# Checking the current shape of the dataframe
df.shape

(10866, 22)

In [18]:
# Dropping the homepage feature
df.drop(["homepage"],axis=1,inplace=True)

# Checking that homepage was removed
df.shape

(10866, 21)

In [19]:
# Drop rows containing null values
df.dropna(inplace=True)

# Looking at the new shape of our dataset
df.shape

(7031, 21)

In [20]:
# Making sure there are no more null or N/A values

In [21]:
df.isna().sum()

id                      0
imdb_id                 0
popularity              0
budget                  0
revenue                 0
original_title          0
cast                    0
director                0
tagline                 0
keywords                0
overview                0
runtime                 0
genres                  0
production_companies    0
release_date            0
vote_count              0
vote_average            0
release_year            0
budget_adj              0
revenue_adj             0
no_website              0
dtype: int64

In [22]:
df.isnull().sum()

id                      0
imdb_id                 0
popularity              0
budget                  0
revenue                 0
original_title          0
cast                    0
director                0
tagline                 0
keywords                0
overview                0
runtime                 0
genres                  0
production_companies    0
release_date            0
vote_count              0
vote_average            0
release_year            0
budget_adj              0
revenue_adj             0
no_website              0
dtype: int64

In [23]:
df.head(3).T

| | 0 | 1 | 2 |
|---|---|---|---|
| id | 135397 | 76341 | 262500 |
| imdb_id | tt0369610 | tt1392190 | tt2908446 |
| popularity | 32.985763 | 28.419936 | 13.112507 |
| budget | 150000000 | 150000000 | 110000000 |
| revenue | 1513528810 | 378436354 | 295238201 |
| original_title | Jurassic World | Mad Max: Fury Road | Insurgent |
| cast | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Tom Hardy\|Charlize Theron\|Hugh Keays-Byrne\|Nic... | Shailene Woodley\|Theo James\|Kate Winslet\|Ansel... |
| director | Colin Trevorrow | George Miller | Robert Schwentke |
| tagline | The park is open. | What a Lovely Day. | One Choice Can Destroy You |
| keywords | monster\|dna\|tyrannosaurus rex\|velociraptor\|island | future\|chase\|post-apocalyptic\|dystopia\|australia | based on novel\|revolution\|dystopia\|sequel\|dyst... |


In our current dataset, we have release_date and release_year. We will focus on the year, and drop the date.

In [24]:
df.drop(["release_date"],axis=1,inplace=True)
df.head(3).T

| | 0 | 1 | 2 |
|---|---|---|---|
| id | 135397 | 76341 | 262500 |
| imdb_id | tt0369610 | tt1392190 | tt2908446 |
| popularity | 32.985763 | 28.419936 | 13.112507 |
| budget | 150000000 | 150000000 | 110000000 |
| revenue | 1513528810 | 378436354 | 295238201 |
| original_title | Jurassic World | Mad Max: Fury Road | Insurgent |
| cast | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Tom Hardy\|Charlize Theron\|Hugh Keays-Byrne\|Nic... | Shailene Woodley\|Theo James\|Kate Winslet\|Ansel... |
| director | Colin Trevorrow | George Miller | Robert Schwentke |
| tagline | The park is open. | What a Lovely Day. | One Choice Can Destroy You |
| keywords | monster\|dna\|tyrannosaurus rex\|velociraptor\|island | future\|chase\|post-apocalyptic\|dystopia\|australia | based on novel\|revolution\|dystopia\|sequel\|dyst... |


In [25]:
# %%timeit
# xx = unpipe_feature(df,"genres")
# xx.head(20).T

In [26]:
# %%timeit
# Removing the pipes in our dataset features
# to_unpipes = ["cast","keywords","genres","production_companies"]
#
# df = unpipe_features(df=df, features=to_unpipes, separator="|")
# df.head(20).T

# This code is commented out because it takes too long to run. The pipes will be removed
#   from the cells when required.
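
One plausible contributor to the long run time, offered here as an assumption rather than a measured result, is that unpiping several features in sequence multiplies the number of rows: a movie with 3 cast members and 2 genres becomes 3 × 2 = 6 rows. The sketch below estimates the resulting row count on made-up data before doing any expansion; `items_per_cell` is a hypothetical helper, not part of this notebook.

```python
import pandas as pd

# Made-up sample data; only the column names mirror the TMDb dataset
toy = pd.DataFrame({
    "cast": ["a|b|c", "d"],
    "genres": ["Action|Drama", "Comedy"],
})


def items_per_cell(column: pd.Series, separator: str = "|") -> pd.Series:
    """Hypothetical helper: number of separated items in each cell."""
    # The item count is the number of separators plus one
    return column.astype(str).map(lambda cell: cell.count(separator) + 1)


# Unpiping several features in sequence multiplies the per-row item counts
rows_per_movie = items_per_cell(toy["cast"]) * items_per_cell(toy["genres"])
expected_rows = int(rows_per_movie.sum())
print(expected_rows)  # 7
```

Running the same estimate on the real dataframe would indicate how large the fully unpiped dataset gets, which supports unpiping only the feature needed for each question.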

The dataset is now ready to be used in the Exploratory Data Analysis.

In [27]:
# Saving the dataset for later use
#df.to_csv("unpiped_tmdb.csv")

<a id='eda'></a>
## Exploratory Data Analysis


 

### Which movie genres are trending?   
We will explore the relationship between movie genres and release years. We will try to understand how movie production evolved over the years, and which genres have been produced most in recent years.    
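
One way to look at this relationship, once `genres` holds a single genre per row, is to count movies per year and genre. The following is a minimal sketch on made-up data (the years and genres are assumptions for illustration), using `groupby` followed by `unstack` to get one row per year and one column per genre.

```python
import pandas as pd

# Made-up sample data, already in the one-genre-per-row format
toy = pd.DataFrame({
    "release_year": [2014, 2014, 2015, 2015, 2015],
    "genres": ["Drama", "Action", "Drama", "Drama", "Comedy"],
})

# Count rows per (year, genre), then pivot genres into columns
counts = (
    toy.groupby(["release_year", "genres"])
       .size()
       .unstack(fill_value=0)
)
print(counts)

# Most produced genre in the latest year of the sample
latest_year = counts.index.max()
top_genre = counts.loc[latest_year].idxmax()
print(latest_year, top_genre)  # 2015 Drama
```

A table shaped like `counts` can be passed directly to `counts.plot()` to draw one line per genre over time.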



In [29]:
# List of movie genres
df["genres"].unique()

array(['Action|Adventure|Science Fiction|Thriller',
       'Adventure|Science Fiction|Thriller',
       'Action|Adventure|Science Fiction|Fantasy', ...,
       'Adventure|Comedy|Fantasy|Science Fiction',
       'Adventure|Drama|Action|Family|Foreign',
       'Comedy|Family|Mystery|Romance'], dtype=object)

In [30]:
# Current shape of dataset
df.shape

(7031, 20)

We will first remove the pipes from the `genres` cells, in order to have the dataset in the right format for our analysis.

In [31]:
to_unpipes =["genres"]  #["cast","keywords","genres","production_companies"]

df = unpipe_features(df=df, features=to_unpipes, separator="|")  # df.iloc[:10,:]
df.head(3).T

Shape of dataset before processing : (7031, 20)
................................................................
Processing row index : 0 
------------------------------------------
Processing row index : 1 
------------------------------------------
Processing row index : 2 
------------------------------------------
Processing row index : 3 
------------------------------------------
Processing row index : 4 
------------------------------------------
Processing row index : 5 
------------------------------------------
Processing row index : 6 
------------------------------------------
Processing row index : 7 
------------------------------------------
Processing row index : 8 
------------------------------------------
Processing row index : 9 
------------------------------------------
Processing row index : 10 
------------------------------------------
Processing row index : 11 
------------------------------------------
Processing row index : 12 
------------------------------

------------------------------------------
Processing row index : 128 
------------------------------------------
Processing row index : 129 
------------------------------------------
Processing row index : 131 
------------------------------------------
Processing row index : 132 
------------------------------------------
Processing row index : 135 
------------------------------------------
Processing row index : 136 
------------------------------------------
Processing row index : 137 
------------------------------------------
Processing row index : 138 
------------------------------------------
Processing row index : 139 
------------------------------------------
Processing row index : 141 
------------------------------------------
Processing row index : 142 
------------------------------------------
Processing row index : 143 
------------------------------------------
Processing row index : 145 
------------------------------------------
Processing row index : 146 
------

Processing row index : 436 
------------------------------------------
Processing row index : 446 
------------------------------------------
Processing row index : 447 
------------------------------------------
Processing row index : 448 
------------------------------------------
Processing row index : 449 
------------------------------------------
Processing row index : 451 
------------------------------------------
Processing row index : 459 
------------------------------------------
Processing row index : 461 
------------------------------------------
Processing row index : 462 
------------------------------------------
Processing row index : 469 
------------------------------------------
Processing row index : 470 
------------------------------------------
Processing row index : 472 
------------------------------------------
Processing row index : 474 
------------------------------------------
Processing row index : 475 
------------------------------------------
Proces

------------------------------------------
Processing row index : 748 
------------------------------------------
Processing row index : 749 
------------------------------------------
Processing row index : 750 
------------------------------------------
Processing row index : 752 
------------------------------------------
Processing row index : 753 
------------------------------------------
Processing row index : 754 
------------------------------------------
Processing row index : 755 
------------------------------------------
Processing row index : 756 
------------------------------------------
Processing row index : 758 
------------------------------------------
Processing row index : 759 
------------------------------------------
Processing row index : 760 
------------------------------------------
Processing row index : 761 
------------------------------------------
Processing row index : 762 
------------------------------------------
Processing row index : 763 
------

Processing row index : 966 
------------------------------------------
Processing row index : 969 
------------------------------------------
Processing row index : 970 
------------------------------------------
Processing row index : 971 
------------------------------------------
Processing row index : 974 
------------------------------------------
Processing row index : 975 
------------------------------------------
Processing row index : 976 
------------------------------------------
Processing row index : 977 
------------------------------------------
Processing row index : 979 
------------------------------------------
Processing row index : 980 
------------------------------------------
Processing row index : 982 
------------------------------------------
Processing row index : 983 
------------------------------------------
Processing row index : 984 
------------------------------------------
Processing row index : 986 
------------------------------------------
Proces

------------------------------------------
Processing row index : 1345 
------------------------------------------
Processing row index : 1347 
------------------------------------------
Processing row index : 1348 
------------------------------------------
Processing row index : 1349 
------------------------------------------
Processing row index : 1350 
------------------------------------------
Processing row index : 1351 
------------------------------------------
Processing row index : 1352 
------------------------------------------
Processing row index : 1353 
------------------------------------------
Processing row index : 1354 
------------------------------------------
Processing row index : 1355 
------------------------------------------
Processing row index : 1356 
------------------------------------------
Processing row index : 1359 
------------------------------------------
Processing row index : 1362 
------------------------------------------
Processing row index 

------------------------------------------
Processing row index : 1484 
------------------------------------------
Processing row index : 1485 
------------------------------------------
Processing row index : 1486 
------------------------------------------
Processing row index : 1487 
------------------------------------------
Processing row index : 1488 
------------------------------------------
Processing row index : 1489 
------------------------------------------
Processing row index : 1490 
------------------------------------------
Processing row index : 1491 
------------------------------------------
Processing row index : 1492 
------------------------------------------
Processing row index : 1493 
------------------------------------------
Processing row index : 1494 
------------------------------------------
Processing row index : 1495 
------------------------------------------
Processing row index : 1496 
------------------------------------------
Processing row index 

------------------------------------------
Processing row index : 1948 
------------------------------------------
Processing row index : 1949 
------------------------------------------
Processing row index : 1950 
------------------------------------------
Processing row index : 1951 
------------------------------------------
Processing row index : 1952 
------------------------------------------
Processing row index : 1953 
------------------------------------------
Processing row index : 1954 
------------------------------------------
Processing row index : 1955 
------------------------------------------
Processing row index : 1956 
------------------------------------------
Processing row index : 1957 
------------------------------------------
Processing row index : 1958 
------------------------------------------
Processing row index : 1959 
------------------------------------------
Processing row index : 1960 
------------------------------------------
Processing row index 

Processing row index : 2111 
------------------------------------------
Processing row index : 2112 
------------------------------------------
Processing row index : 2113 
------------------------------------------
Processing row index : 2114 
------------------------------------------
Processing row index : 2116 
------------------------------------------
Processing row index : 2117 
------------------------------------------
Processing row index : 2118 
------------------------------------------
Processing row index : 2119 
------------------------------------------
Processing row index : 2121 
------------------------------------------
Processing row index : 2125 
------------------------------------------
Processing row index : 2127 
------------------------------------------
Processing row index : 2130 
------------------------------------------
Processing row index : 2131 
------------------------------------------
Processing row index : 2133 
-----------------------------------

Processing row index : 2411 
------------------------------------------
Processing row index : 2412 
------------------------------------------
Processing row index : 2413 
------------------------------------------
Processing row index : 2414 
------------------------------------------
Processing row index : 2415 
------------------------------------------
Processing row index : 2416 
------------------------------------------
Processing row index : 2417 
------------------------------------------
Processing row index : 2418 
------------------------------------------
Processing row index : 2419 
------------------------------------------
Processing row index : 2420 
------------------------------------------
Processing row index : 2421 
------------------------------------------
Processing row index : 2422 
------------------------------------------
Processing row index : 2423 
------------------------------------------
Processing row index : 2424 
-----------------------------------

------------------------------------------
Processing row index : 2637 
------------------------------------------
Processing row index : 2638 
------------------------------------------
Processing row index : 2639 
------------------------------------------
Processing row index : 2640 
------------------------------------------
Processing row index : 2641 
------------------------------------------
Processing row index : 2642 
------------------------------------------
Processing row index : 2643 
------------------------------------------
Processing row index : 2644 
------------------------------------------
Processing row index : 2645 
------------------------------------------
Processing row index : 2646 
------------------------------------------
Processing row index : 2647 
------------------------------------------
Processing row index : 2648 
------------------------------------------
Processing row index : 2649 
------------------------------------------
Processing row index 

------------------------------------------
Processing row index : 2938 
------------------------------------------
Processing row index : 2939 
------------------------------------------
Processing row index : 2941 
------------------------------------------
Processing row index : 2942 
------------------------------------------
Processing row index : 2944 
------------------------------------------
Processing row index : 2945 
------------------------------------------
Processing row index : 2946 
------------------------------------------
Processing row index : 2947 
------------------------------------------
Processing row index : 2949 
------------------------------------------
Processing row index : 2950 
------------------------------------------
Processing row index : 2952 
------------------------------------------
Processing row index : 2953 
------------------------------------------
Processing row index : 2954 
------------------------------------------

------------------------------------------
Processing row index : 3187 
------------------------------------------
Processing row index : 3189 
------------------------------------------
Processing row index : 3190 
------------------------------------------
Processing row index : 3193 
------------------------------------------
Processing row index : 3198 
------------------------------------------
Processing row index : 3200 
------------------------------------------
Processing row index : 3201 
------------------------------------------
Processing row index : 3203 
------------------------------------------
Processing row index : 3205 
------------------------------------------
Processing row index : 3206 
------------------------------------------
Processing row index : 3211 
------------------------------------------
Processing row index : 3215 
------------------------------------------
Processing row index : 3217 
------------------------------------------

------------------------------------------
Processing row index : 3445 
------------------------------------------
Processing row index : 3446 
------------------------------------------
Processing row index : 3447 
------------------------------------------
Processing row index : 3448 
------------------------------------------
Processing row index : 3449 
------------------------------------------
Processing row index : 3450 
------------------------------------------
Processing row index : 3451 
------------------------------------------
Processing row index : 3452 
------------------------------------------
Processing row index : 3453 
------------------------------------------
Processing row index : 3454 
------------------------------------------
Processing row index : 3455 
------------------------------------------
Processing row index : 3456 
------------------------------------------
Processing row index : 3457 
------------------------------------------

------------------------------------------
Processing row index : 3640 
------------------------------------------
Processing row index : 3642 
------------------------------------------
Processing row index : 3643 
------------------------------------------
Processing row index : 3644 
------------------------------------------
Processing row index : 3648 
------------------------------------------
Processing row index : 3650 
------------------------------------------
Processing row index : 3655 
------------------------------------------
Processing row index : 3657 
------------------------------------------
Processing row index : 3659 
------------------------------------------
Processing row index : 3661 
------------------------------------------
Processing row index : 3663 
------------------------------------------
Processing row index : 3664 
------------------------------------------
Processing row index : 3666 
------------------------------------------

------------------------------------------
Processing row index : 3966 
------------------------------------------
Processing row index : 3968 
------------------------------------------
Processing row index : 3969 
------------------------------------------
Processing row index : 3970 
------------------------------------------
Processing row index : 3971 
------------------------------------------
Processing row index : 3972 
------------------------------------------
Processing row index : 3973 
------------------------------------------
Processing row index : 3974 
------------------------------------------
Processing row index : 3975 
------------------------------------------
Processing row index : 3976 
------------------------------------------
Processing row index : 3977 
------------------------------------------
Processing row index : 3978 
------------------------------------------
Processing row index : 3979 
------------------------------------------

------------------------------------------
Processing row index : 4266 
------------------------------------------
Processing row index : 4267 
------------------------------------------
Processing row index : 4268 
------------------------------------------
Processing row index : 4269 
------------------------------------------
Processing row index : 4270 
------------------------------------------
Processing row index : 4272 
------------------------------------------
Processing row index : 4274 
------------------------------------------
Processing row index : 4276 
------------------------------------------
Processing row index : 4277 
------------------------------------------
Processing row index : 4278 
------------------------------------------
Processing row index : 4281 
------------------------------------------
Processing row index : 4283 
------------------------------------------
Processing row index : 4284 
------------------------------------------

Processing row index : 4476 
------------------------------------------
Processing row index : 4478 
------------------------------------------
Processing row index : 4479 
------------------------------------------
Processing row index : 4480 
------------------------------------------
Processing row index : 4481 
------------------------------------------
Processing row index : 4484 
------------------------------------------
Processing row index : 4485 
------------------------------------------
Processing row index : 4487 
------------------------------------------
Processing row index : 4488 
------------------------------------------
Processing row index : 4489 
------------------------------------------
Processing row index : 4490 
------------------------------------------
Processing row index : 4491 
------------------------------------------
Processing row index : 4492 
------------------------------------------
Processing row index : 4494 
------------------------------------------

------------------------------------------
Processing row index : 5027 
------------------------------------------
Processing row index : 5028 
------------------------------------------
Processing row index : 5029 
------------------------------------------
Processing row index : 5031 
------------------------------------------
Processing row index : 5032 
------------------------------------------
Processing row index : 5033 
------------------------------------------
Processing row index : 5035 
------------------------------------------
Processing row index : 5036 
------------------------------------------
Processing row index : 5037 
------------------------------------------
Processing row index : 5038 
------------------------------------------
Processing row index : 5039 
------------------------------------------
Processing row index : 5041 
------------------------------------------
Processing row index : 5043 
------------------------------------------

Processing row index : 5275 
------------------------------------------
Processing row index : 5276 
------------------------------------------
Processing row index : 5277 
------------------------------------------
Processing row index : 5278 
------------------------------------------
Processing row index : 5279 
------------------------------------------
Processing row index : 5281 
------------------------------------------
Processing row index : 5282 
------------------------------------------
Processing row index : 5284 
------------------------------------------
Processing row index : 5285 
------------------------------------------
Processing row index : 5286 
------------------------------------------
Processing row index : 5287 
------------------------------------------
Processing row index : 5288 
------------------------------------------
Processing row index : 5289 
------------------------------------------
Processing row index : 5290 
------------------------------------------

------------------------------------------
Processing row index : 5457 
------------------------------------------
Processing row index : 5458 
------------------------------------------
Processing row index : 5459 
------------------------------------------
Processing row index : 5460 
------------------------------------------
Processing row index : 5461 
------------------------------------------
Processing row index : 5462 
------------------------------------------
Processing row index : 5463 
------------------------------------------
Processing row index : 5464 
------------------------------------------
Processing row index : 5465 
------------------------------------------
Processing row index : 5466 
------------------------------------------
Processing row index : 5467 
------------------------------------------
Processing row index : 5468 
------------------------------------------
Processing row index : 5469 
------------------------------------------

------------------------------------------
Processing row index : 5635 
------------------------------------------
Processing row index : 5638 
------------------------------------------
Processing row index : 5639 
------------------------------------------
Processing row index : 5640 
------------------------------------------
Processing row index : 5642 
------------------------------------------
Processing row index : 5644 
------------------------------------------
Processing row index : 5645 
------------------------------------------
Processing row index : 5647 
------------------------------------------
Processing row index : 5648 
------------------------------------------
Processing row index : 5649 
------------------------------------------
Processing row index : 5650 
------------------------------------------
Processing row index : 5653 
------------------------------------------
Processing row index : 5654 
------------------------------------------

------------------------------------------
Processing row index : 5973 
------------------------------------------
Processing row index : 5976 
------------------------------------------
Processing row index : 5978 
------------------------------------------
Processing row index : 5981 
------------------------------------------
Processing row index : 5982 
------------------------------------------
Processing row index : 5994 
------------------------------------------
Processing row index : 5999 
------------------------------------------
Processing row index : 6000 
------------------------------------------
Processing row index : 6007 
------------------------------------------
Processing row index : 6012 
------------------------------------------
Processing row index : 6016 
------------------------------------------
Processing row index : 6017 
------------------------------------------
Processing row index : 6018 
------------------------------------------

------------------------------------------
Processing row index : 6322 
------------------------------------------
Processing row index : 6323 
------------------------------------------
Processing row index : 6324 
------------------------------------------
Processing row index : 6325 
------------------------------------------
Processing row index : 6326 
------------------------------------------
Processing row index : 6327 
------------------------------------------
Processing row index : 6328 
------------------------------------------
Processing row index : 6329 
------------------------------------------
Processing row index : 6330 
------------------------------------------
Processing row index : 6333 
------------------------------------------
Processing row index : 6335 
------------------------------------------
Processing row index : 6336 
------------------------------------------
Processing row index : 6337 
------------------------------------------

------------------------------------------
Processing row index : 6687 
------------------------------------------
Processing row index : 6688 
------------------------------------------
Processing row index : 6689 
------------------------------------------
Processing row index : 6692 
------------------------------------------
Processing row index : 6694 
------------------------------------------
Processing row index : 6695 
------------------------------------------
Processing row index : 6696 
------------------------------------------
Processing row index : 6697 
------------------------------------------
Processing row index : 6699 
------------------------------------------
Processing row index : 6700 
------------------------------------------
Processing row index : 6703 
------------------------------------------
Processing row index : 6705 
------------------------------------------
Processing row index : 6706 
------------------------------------------

------------------------------------------
Processing row index : 6949 
------------------------------------------
Processing row index : 6955 
------------------------------------------
Processing row index : 6961 
------------------------------------------
Processing row index : 6962 
------------------------------------------
Processing row index : 6963 
------------------------------------------
Processing row index : 6964 
------------------------------------------
Processing row index : 6965 
------------------------------------------
Processing row index : 6966 
------------------------------------------
Processing row index : 6967 
------------------------------------------
Processing row index : 6968 
------------------------------------------
Processing row index : 6969 
------------------------------------------
Processing row index : 6970 
------------------------------------------
Processing row index : 6971 
------------------------------------------

------------------------------------------
Processing row index : 7314 
------------------------------------------
Processing row index : 7315 
------------------------------------------
Processing row index : 7316 
------------------------------------------
Processing row index : 7317 
------------------------------------------
Processing row index : 7318 
------------------------------------------
Processing row index : 7319 
------------------------------------------
Processing row index : 7320 
------------------------------------------
Processing row index : 7321 
------------------------------------------
Processing row index : 7323 
------------------------------------------
Processing row index : 7324 
------------------------------------------
Processing row index : 7326 
------------------------------------------
Processing row index : 7327 
------------------------------------------
Processing row index : 7328 
------------------------------------------

------------------------------------------
Processing row index : 7458 
------------------------------------------
Processing row index : 7460 
------------------------------------------
Processing row index : 7461 
------------------------------------------
Processing row index : 7462 
------------------------------------------
Processing row index : 7463 
------------------------------------------
Processing row index : 7465 
------------------------------------------
Processing row index : 7466 
------------------------------------------
Processing row index : 7467 
------------------------------------------
Processing row index : 7468 
------------------------------------------
Processing row index : 7469 
------------------------------------------
Processing row index : 7470 
------------------------------------------
Processing row index : 7471 
------------------------------------------
Processing row index : 7472 
------------------------------------------

------------------------------------------
Processing row index : 7644 
------------------------------------------
Processing row index : 7647 
------------------------------------------
Processing row index : 7649 
------------------------------------------
Processing row index : 7652 
------------------------------------------
Processing row index : 7653 
------------------------------------------
Processing row index : 7654 
------------------------------------------
Processing row index : 7658 
------------------------------------------
Processing row index : 7662 
------------------------------------------
Processing row index : 7664 
------------------------------------------
Processing row index : 7667 
------------------------------------------
Processing row index : 7671 
------------------------------------------
Processing row index : 7674 
------------------------------------------
Processing row index : 7675 
------------------------------------------

------------------------------------------
Processing row index : 7885 
------------------------------------------
Processing row index : 7886 
------------------------------------------
Processing row index : 7887 
------------------------------------------
Processing row index : 7888 
------------------------------------------
Processing row index : 7889 
------------------------------------------
Processing row index : 7890 
------------------------------------------
Processing row index : 7891 
------------------------------------------
Processing row index : 7892 
------------------------------------------
Processing row index : 7893 
------------------------------------------
Processing row index : 7894 
------------------------------------------
Processing row index : 7895 
------------------------------------------
Processing row index : 7896 
------------------------------------------
Processing row index : 7899 
------------------------------------------

------------------------------------------
Processing row index : 8033 
------------------------------------------
Processing row index : 8034 
------------------------------------------
Processing row index : 8035 
------------------------------------------
Processing row index : 8036 
------------------------------------------
Processing row index : 8037 
------------------------------------------
Processing row index : 8039 
------------------------------------------
Processing row index : 8040 
------------------------------------------
Processing row index : 8041 
------------------------------------------
Processing row index : 8042 
------------------------------------------
Processing row index : 8045 
------------------------------------------
Processing row index : 8046 
------------------------------------------
Processing row index : 8047 
------------------------------------------
Processing row index : 8048 
------------------------------------------

------------------------------------------
Processing row index : 8193 
------------------------------------------
Processing row index : 8194 
------------------------------------------
Processing row index : 8195 
------------------------------------------
Processing row index : 8196 
------------------------------------------
Processing row index : 8197 
------------------------------------------
Processing row index : 8198 
------------------------------------------
Processing row index : 8199 
------------------------------------------
Processing row index : 8200 
------------------------------------------
Processing row index : 8202 
------------------------------------------
Processing row index : 8204 
------------------------------------------
Processing row index : 8205 
------------------------------------------
Processing row index : 8206 
------------------------------------------
Processing row index : 8207 
------------------------------------------

Processing row index : 8476 
------------------------------------------
Processing row index : 8477 
------------------------------------------
Processing row index : 8478 
------------------------------------------
Processing row index : 8479 
------------------------------------------
Processing row index : 8480 
------------------------------------------
Processing row index : 8481 
------------------------------------------
Processing row index : 8482 
------------------------------------------
Processing row index : 8483 
------------------------------------------
Processing row index : 8484 
------------------------------------------
Processing row index : 8485 
------------------------------------------
Processing row index : 8486 
------------------------------------------
Processing row index : 8487 
------------------------------------------
Processing row index : 8488 
------------------------------------------
Processing row index : 8489 
------------------------------------------

------------------------------------------
Processing row index : 8654 
------------------------------------------
Processing row index : 8655 
------------------------------------------
Processing row index : 8657 
------------------------------------------
Processing row index : 8660 
------------------------------------------
Processing row index : 8661 
------------------------------------------
Processing row index : 8662 
------------------------------------------
Processing row index : 8663 
------------------------------------------
Processing row index : 8664 
------------------------------------------
Processing row index : 8665 
------------------------------------------
Processing row index : 8666 
------------------------------------------
Processing row index : 8667 
------------------------------------------
Processing row index : 8668 
------------------------------------------
Processing row index : 8669 
------------------------------------------

------------------------------------------
Processing row index : 8860 
------------------------------------------
Processing row index : 8862 
------------------------------------------
Processing row index : 8863 
------------------------------------------
Processing row index : 8864 
------------------------------------------
Processing row index : 8870 
------------------------------------------
Processing row index : 8871 
------------------------------------------
Processing row index : 8872 
------------------------------------------
Processing row index : 8876 
------------------------------------------
Processing row index : 8881 
------------------------------------------
Processing row index : 8883 
------------------------------------------
Processing row index : 8885 
------------------------------------------
Processing row index : 8888 
------------------------------------------
Processing row index : 8889 
------------------------------------------

------------------------------------------
Processing row index : 9021 
------------------------------------------
Processing row index : 9022 
------------------------------------------
Processing row index : 9023 
------------------------------------------
Processing row index : 9025 
------------------------------------------
Processing row index : 9026 
------------------------------------------
Processing row index : 9027 
------------------------------------------
Processing row index : 9028 
------------------------------------------
Processing row index : 9029 
------------------------------------------
Processing row index : 9030 
------------------------------------------
Processing row index : 9031 
------------------------------------------
Processing row index : 9032 
------------------------------------------
Processing row index : 9033 
------------------------------------------
Processing row index : 9034 
------------------------------------------

------------------------------------------
Processing row index : 9200 
------------------------------------------
Processing row index : 9201 
------------------------------------------
Processing row index : 9202 
------------------------------------------
Processing row index : 9203 
------------------------------------------
Processing row index : 9204 
------------------------------------------
Processing row index : 9205 
------------------------------------------
Processing row index : 9207 
------------------------------------------
Processing row index : 9208 
------------------------------------------
Processing row index : 9209 
------------------------------------------
Processing row index : 9210 
------------------------------------------
Processing row index : 9211 
------------------------------------------
Processing row index : 9212 
------------------------------------------
Processing row index : 9213 
------------------------------------------

Processing row index : 9364 
------------------------------------------
Processing row index : 9365 
------------------------------------------
Processing row index : 9367 
------------------------------------------
Processing row index : 9368 
------------------------------------------
Processing row index : 9369 
------------------------------------------
Processing row index : 9370 
------------------------------------------
Processing row index : 9371 
------------------------------------------
Processing row index : 9373 
------------------------------------------
Processing row index : 9375 
------------------------------------------
Processing row index : 9378 
------------------------------------------
Processing row index : 9379 
------------------------------------------
Processing row index : 9380 
------------------------------------------
Processing row index : 9381 
------------------------------------------
Processing row index : 9383 
------------------------------------------

------------------------------------------
Processing row index : 9572 
------------------------------------------
Processing row index : 9573 
------------------------------------------
Processing row index : 9576 
------------------------------------------
Processing row index : 9577 
------------------------------------------
Processing row index : 9579 
------------------------------------------
Processing row index : 9582 
------------------------------------------
Processing row index : 9585 
------------------------------------------
Processing row index : 9586 
------------------------------------------
Processing row index : 9587 
------------------------------------------
Processing row index : 9590 
------------------------------------------
Processing row index : 9592 
------------------------------------------
Processing row index : 9594 
------------------------------------------
Processing row index : 9595 
------------------------------------------

------------------------------------------
Processing row index : 9765 
------------------------------------------
Processing row index : 9766 
------------------------------------------
Processing row index : 9767 
------------------------------------------
Processing row index : 9768 
------------------------------------------
Processing row index : 9769 
------------------------------------------
Processing row index : 9770 
------------------------------------------
Processing row index : 9771 
------------------------------------------
Processing row index : 9772 
------------------------------------------
Processing row index : 9773 
------------------------------------------
Processing row index : 9774 
------------------------------------------
Processing row index : 9775 
------------------------------------------
Processing row index : 9776 
------------------------------------------
Processing row index : 9777 
------------------------------------------

------------------------------------------
Processing row index : 9944 
------------------------------------------
Processing row index : 9945 
------------------------------------------
Processing row index : 9946 
------------------------------------------
Processing row index : 9947 
------------------------------------------
Processing row index : 9948 
------------------------------------------
Processing row index : 9949 
------------------------------------------
Processing row index : 9950 
------------------------------------------
Processing row index : 9951 
------------------------------------------
Processing row index : 9952 
------------------------------------------
Processing row index : 9953 
------------------------------------------
Processing row index : 9954 
------------------------------------------
Processing row index : 9955 
------------------------------------------
Processing row index : 9956 
------------------------------------------

------------------------------------------
Processing row index : 10094 
------------------------------------------
Processing row index : 10095 
------------------------------------------
Processing row index : 10096 
------------------------------------------
Processing row index : 10098 
------------------------------------------
Processing row index : 10103 
------------------------------------------
Processing row index : 10106 
------------------------------------------
Processing row index : 10107 
------------------------------------------
Processing row index : 10109 
------------------------------------------
Processing row index : 10110 
------------------------------------------
Processing row index : 10111 
------------------------------------------
Processing row index : 10112 
------------------------------------------
Processing row index : 10113 
------------------------------------------
Processing row index : 10114 
------------------------------------------

Processing row index : 10258 
------------------------------------------
Processing row index : 10259 
------------------------------------------
Processing row index : 10260 
------------------------------------------
Processing row index : 10261 
------------------------------------------
Processing row index : 10262 
------------------------------------------
Processing row index : 10263 
------------------------------------------
Processing row index : 10264 
------------------------------------------
Processing row index : 10265 
------------------------------------------
Processing row index : 10266 
------------------------------------------
Processing row index : 10267 
------------------------------------------
Processing row index : 10268 
------------------------------------------
Processing row index : 10269 
------------------------------------------
Processing row index : 10270 
------------------------------------------
Processing row index : 10271 
------------------------------------------

------------------------------------------
Processing row index : 10427 
------------------------------------------
Processing row index : 10429 
------------------------------------------
Processing row index : 10431 
------------------------------------------
Processing row index : 10432 
------------------------------------------
Processing row index : 10433 
------------------------------------------
Processing row index : 10435 
------------------------------------------
Processing row index : 10438 
------------------------------------------
Processing row index : 10439 
------------------------------------------
Processing row index : 10440 
------------------------------------------
Processing row index : 10441 
------------------------------------------
Processing row index : 10443 
------------------------------------------
Processing row index : 10444 
------------------------------------------
Processing row index : 10445 
------------------------------------------
Processi

------------------------------------------
Processing row index : 10565 
------------------------------------------
Processing row index : 10566 
------------------------------------------
Processing row index : 10567 
------------------------------------------
Processing row index : 10569 
------------------------------------------
Processing row index : 10570 
------------------------------------------
Processing row index : 10572 
------------------------------------------
Processing row index : 10573 
------------------------------------------
Processing row index : 10574 
------------------------------------------
Processing row index : 10576 
------------------------------------------
Processing row index : 10577 
------------------------------------------
Processing row index : 10578 
------------------------------------------
Processing row index : 10579 
------------------------------------------
Processing row index : 10580 
------------------------------------------
Processi

------------------------------------------
Processing row index : 10727 
------------------------------------------
Processing row index : 10728 
------------------------------------------
Processing row index : 10729 
------------------------------------------
Processing row index : 10731 
------------------------------------------
Processing row index : 10734 
------------------------------------------
Processing row index : 10736 
------------------------------------------
Processing row index : 10737 
------------------------------------------
Processing row index : 10740 
------------------------------------------
Processing row index : 10742 
------------------------------------------
Processing row index : 10743 
------------------------------------------
Processing row index : 10744 
------------------------------------------
Processing row index : 10746 
------------------------------------------
Processing row index : 10747 
------------------------------------------
Processi

| | 0 | 1 | 2 |
|---|---|---|---|
| id | 135397 | 135397 | 135397 |
| imdb_id | tt0369610 | tt0369610 | tt0369610 |
| popularity | 32.985763 | 32.985763 | 32.985763 |
| budget | 150000000 | 150000000 | 150000000 |
| revenue | 1513528810 | 1513528810 | 1513528810 |
| original_title | Jurassic World | Jurassic World | Jurassic World |
| cast | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... | Chris Pratt\|Bryce Dallas Howard\|Irrfan Khan\|Vi... |
| director | Colin Trevorrow | Colin Trevorrow | Colin Trevorrow |
| tagline | The park is open. | The park is open. | The park is open. |
| keywords | monster\|dna\|tyrannosaurus rex\|velociraptor\|island | monster\|dna\|tyrannosaurus rex\|velociraptor\|island | monster\|dna\|tyrannosaurus rex\|velociraptor\|island |


In [32]:
# Current shape of dataset
df.shape

(18258, 20)

In [33]:
# List of movie genres
df["genres"].unique()

array(['Action', 'Adventure', 'Science Fiction', 'Thriller', 'Fantasy',
       'Crime', 'Western', 'Drama', 'Family', 'Animation', 'Comedy',
       'Mystery', 'Romance', 'War', 'History', 'Music', 'Horror',
       'Documentary', 'TV Movie', 'Foreign'], dtype=object)
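
The single-genre values listed above suggest that the `genres` column has been split into one row per genre. As a minimal sketch of one way to get that layout, assuming the raw column held pipe-separated strings such as `Action|Adventure`, pandas can split and explode the column in a vectorized way. The toy frame, the titles, and the helper name `explode_genres` are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the raw data; titles and values are illustrative only.
raw = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "genres": ["Action|Adventure|Thriller", "Drama"],
    "revenue": [300, 50],
})

def explode_genres(frame: pd.DataFrame, column: str = "genres") -> pd.DataFrame:
    """Return a copy of `frame` with one row per (movie, genre) pair."""
    out = frame.copy()
    # "Action|Adventure" becomes ["Action", "Adventure"]
    out[column] = out[column].str.split("|")
    # Each list element gets its own row; the other columns are repeated.
    return out.explode(column).reset_index(drop=True)

tidy = explode_genres(raw)
print(tidy.shape)
print(tidy["genres"].unique())
```

One consequence of this layout is that each movie's budget and revenue are repeated once per genre, so sums taken across all genres count the same movie several times.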

### Among the trending movie genres, what is the correlation between genre and revenue?
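
Because genre is a categorical variable, it has to be encoded before a correlation with revenue can be computed. One possible approach, sketched here on illustrative toy data, is to build a 0/1 indicator column per genre and correlate each indicator with `revenue`. The helper name `genre_revenue_correlation` is hypothetical, and this is only one of several reasonable ways to approach the question.

```python
import pandas as pd

# Toy data in the one-row-per-genre layout; values are illustrative only.
movies = pd.DataFrame({
    "genres": ["Action", "Action", "Drama", "Drama", "Comedy", "Comedy"],
    "revenue": [900.0, 700.0, 100.0, 200.0, 300.0, 400.0],
})

def genre_revenue_correlation(frame: pd.DataFrame) -> pd.Series:
    """Pearson correlation between each genre indicator and revenue."""
    # One 0/1 column per genre (cast to float so the result is numeric
    # regardless of the pandas version's default dummy dtype).
    indicators = pd.get_dummies(frame["genres"]).astype(float)
    return indicators.corrwith(frame["revenue"]).sort_values(ascending=False)

corr = genre_revenue_correlation(movies)
print(corr)
```

A positive value means movies tagged with that genre tend to earn more than the rest of the sample. It is a descriptive measure only and says nothing about why.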

### What is the minimum budget to invest in a given genre to achieve the highest possible revenue?
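
One way to make this question concrete, offered as a sketch rather than the definitive method, is to look only at the top-earning movies of a genre and report the smallest budget among them. The example assumes that a `budget` of 0 marks a missing value and excludes those rows. The `top_fraction` parameter, the helper name, and the toy numbers are all hypothetical.

```python
import pandas as pd

# Toy data in the one-row-per-genre layout; values are illustrative only.
movies = pd.DataFrame({
    "genres":  ["Action"] * 5 + ["Drama"] * 2,
    "budget":  [0, 50, 120, 200, 80, 10, 20],
    "revenue": [500, 100, 900, 950, 920, 30, 60],
})

def min_budget_among_top_earners(frame, genre, top_fraction=0.25):
    """Smallest budget among the top `top_fraction` earners of `genre`."""
    # Assumption: budget == 0 means "unknown", so those rows are dropped.
    subset = frame[(frame["genres"] == genre) & (frame["budget"] > 0)]
    # Revenue level that separates the top earners from the rest.
    threshold = subset["revenue"].quantile(1 - top_fraction)
    top = subset[subset["revenue"] >= threshold]
    return top["budget"].min()

print(min_budget_among_top_earners(movies, "Action", top_fraction=0.5))
```

The result depends strongly on the chosen `top_fraction`, so it is worth reporting the cutoff alongside the figure.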

### Which production companies perform best in the genre of interest to us?
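
A possible starting point is to rank companies by mean revenue within the chosen genre. The sketch below assumes a `production_companies` column holding pipe-separated names, in the same style as `cast` and `keywords`. The company names, the helper `top_companies`, and the `min_movies` filter are hypothetical.

```python
import pandas as pd

# Toy data in the one-row-per-genre layout; values are illustrative only.
movies = pd.DataFrame({
    "genres": ["Action", "Action", "Action", "Drama"],
    "production_companies": ["Studio A|Studio B", "Studio A", "Studio C", "Studio B"],
    "revenue": [400, 600, 100, 50],
})

def top_companies(frame, genre, min_movies=1):
    """Rank companies by mean revenue for movies of the given genre."""
    subset = frame[frame["genres"] == genre]
    subset = subset.dropna(subset=["production_companies"]).copy()
    # Split "Studio A|Studio B" so each company gets its own row.
    subset["company"] = subset["production_companies"].str.split("|")
    per_company = subset.explode("company")
    stats = per_company.groupby("company")["revenue"].agg(
        movies="count", mean_revenue="mean"
    )
    # Ignore companies with too few movies to be meaningful.
    stats = stats[stats["movies"] >= min_movies]
    return stats.sort_values("mean_revenue", ascending=False)

ranking = top_companies(movies, "Action")
print(ranking)
```

Raising `min_movies` keeps a company with a single hit from topping the ranking. A co-produced movie counts once for each company involved.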


### Does having a website help improve revenue for movies in the genre of interest to us?
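
A simple descriptive comparison can be sketched by grouping the movies of one genre by whether a website is recorded. This assumes a `homepage` column that is missing (`NaN`) when a movie has no website. The URLs, the helper name `revenue_by_website`, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data in the one-row-per-genre layout; values are illustrative only.
movies = pd.DataFrame({
    "genres": ["Action"] * 4,
    "homepage": ["http://example.com/a", np.nan, "http://example.com/b", np.nan],
    "revenue": [800, 200, 600, 300],
})

def revenue_by_website(frame, genre):
    """Movie count and median revenue, split by presence of a homepage."""
    subset = frame[frame["genres"] == genre]
    # True when a homepage URL is recorded, False when it is missing.
    has_site = subset["homepage"].notna().rename("has_website")
    return subset.groupby(has_site)["revenue"].agg(["count", "median"])

summary = revenue_by_website(movies, "Action")
print(summary)
```

A gap between the two medians is only an association. Movies with larger budgets may be more likely to have a website, so this comparison alone cannot show that a website improves revenue.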

<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the analyses that have been performed. Make sure that you are clear about the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work, you should save a copy of the report in HTML or PDF form via the **File** > **Download as** submenu. Before exporting your report, check over it to make sure that the flow of the report is complete. You should probably remove all of the "Tip" quotes like this one so that the presentation is as tidy as possible. Congratulations!