<a href="https://colab.research.google.com/github/oigwe-frx/movie-database-analysis/blob/oi%2Fcontext-statement/Movie_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


-------------------------------------
# **Project 1: Movie Data Analysis**
-------------------------------------

--------------------
## **Context**
--------------------

In an era marked by the convergence of technology and entertainment, the film industry is a pillar of global cultural dissemination and economic vitality. Understanding the intricate factors influencing a movie's commercial success is paramount within this dynamic landscape. Leveraging data from the Internet Movie Database (IMDb), this project aims to delve into the multifaceted realm of box office hits, dissecting the variables that potentially shape their triumph or demise.

The cinematic ecosystem is a tapestry woven with diverse elements, ranging from star power and production budget to genre and critical reception. Against this backdrop, the analysis endeavors to discern patterns and correlations within the data, unraveling the enigmatic interplay between various attributes and a movie's financial performance.

As the digital era reshapes audience preferences and consumption patterns, traditional metrics of success undergo a metamorphosis. Thus, this investigation seeks to illuminate established paradigms and explore emerging trends and disruptions catalyzed by technological advancements and shifting audience dynamics.

By examining the data carefully with established analytical methods, this project aims to offer insights that go beyond conventional wisdom. The goal is to give stakeholders in the film industry actionable information for navigating a changing landscape and improving their chances of making films that resonate with audiences and succeed at the box office.

------------------
## **Objective**
------------------

This data analysis and visualization project investigates the attributes that most influence the commercial success of movies. Using data sourced from IMDb, it looks for patterns and correlations between various factors and a movie's box office performance. The analysis and visualizations are intended to give stakeholders in the film industry actionable insights, so they can make informed decisions that improve the financial viability of their films.

-----------------------------
## **General Key Questions**
-----------------------------

- What is the dataset source for the visualization project, and how reliable and comprehensive is it?

- How can data be cleaned and prepared for visualization to ensure accuracy and consistency?

- What visualization techniques are most suitable for representing different data types and revealing insights?

- How should the visualizations be designed to cater to the target audience and convey information efficiently?

- Can any specific trends or patterns in the data be uncovered through visualization?

- What are the key metrics and performance indicators to track the effectiveness of the visualization?

- Can the visualization techniques be applied to different datasets and domains beyond the current project?

- How can user feedback and iterative design processes be integrated to improve the effectiveness of the visualizations continuously?

-----------------------------
## **Data Specific Key Questions**
-----------------------------
* What are the most popular movies?
    * Determine the top-rated or most popular titles based on IMDb ratings or user reviews.
      
* What are the trends in movie genres over time?
    * Analyze how the popularity of different genres has evolved over the years.
      
* Which actors have appeared in the most movies?
    * Identify prolific actors and actresses in the IMDb database.
      
* What are the highest-grossing movies of all time?
    * Investigate box office revenue data to find the most financially successful films.
      
* Are there any correlations between movie budgets and box office performance?
    * Explore whether higher budgets lead to higher box office earnings.
      
* What is the distribution of movie ratings on IMDb?
    * Analyze the distribution of IMDb ratings to understand audience preferences.
            
* What is the average movie runtime?
    * Calculate the average duration of movies and see if there are any trends over time.
      
* Who are the top-rated directors?
    * Identify directors with the highest-rated movies in the database.     
      
* Are there any geographical trends in movie preferences?
    * Explore whether movie preferences vary by region or country.
      
* How has the film industry evolved over the years?
    * Look at historical data to understand film production, technology, and distribution changes.
      
* What are the most influential factors for IMDb ratings?
    * Analyze which factors (genre, director, or cast) have the greatest impact on ratings.

* Are there any outliers or anomalies in the data?
    * Look for unusual or unexpected patterns in the data that may require further investigation.
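
The budget and revenue question above can be prototyped before touching the full dataset. The following is a minimal sketch, assuming the `Budget` and `Revenue` column names that the notebook introduces when it renames columns; the four rows are invented for illustration and are not taken from the IMDb data. It computes Pearson's correlation on the raw values and again after a log transform, which is often worth checking because monetary figures tend to be strongly right-skewed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (made-up numbers, in millions of dollars)
sample = pd.DataFrame({
    "Budget": [10.0, 50.0, 100.0, 200.0],
    "Revenue": [5.0, 120.0, 260.0, 700.0],
})

# Pearson correlation on the raw values
pearson_raw = sample["Budget"].corr(sample["Revenue"])

# Pearson correlation after log1p, which dampens the effect of very large films
pearson_log = np.log1p(sample["Budget"]).corr(np.log1p(sample["Revenue"]))

print(f"raw: {pearson_raw:.3f}, log-transformed: {pearson_log:.3f}")
```

A correlation close to 1 would not by itself show that higher budgets cause higher earnings; it only indicates that the two move together.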

------------------------------------
## **Dataset Description**
------------------------------------

### Data Description

The dataset contains information about films, as compiled by the Internet Movie Database (IMDb). The detailed data dictionary is given below.

### Data Dictionary

* names: movie name
* date_x: release date
* score: user rating
* genre: genre of the movie
* overview: a summary of the movie's plot
* crew: crew members
* orig_title: the original title of the movie
* status: release status
* orig_lang: originally released in this language
* budget_x: movie budget
* revenue: revenue generated worldwide
* country: release country

## **Importing the necessary libraries and overview of the dataset**

In [2]:
# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Library to extract datetime features
import datetime as dt

### **Loading the dataset**

In [5]:
# Read the CSV into `df`, the name used by every later cell
df = pd.read_csv('/content/imdb_movies.csv')

### **View the first 5 rows of the dataset**

In [7]:
# Looking at head (the first 5 observations)
df.head()

Unnamed: 0,names,date_x,score,genre,overview,crew,orig_title,status,orig_lang,budget_x,revenue,country
0,Creed III,03/02/2023,73.0,"Drama, Action","After dominating the boxing world, Adonis Cree...","Michael B. Jordan, Adonis Creed, Tessa Thompso...",Creed III,Released,English,75000000.0,271616700.0,AU
1,Avatar: The Way of Water,12/15/2022,78.0,"Science Fiction, Adventure, Action",Set more than a decade after the events of the...,"Sam Worthington, Jake Sully, Zoe Saldaña, Neyt...",Avatar: The Way of Water,Released,English,460000000.0,2316795000.0,AU
2,The Super Mario Bros. Movie,04/05/2023,76.0,"Animation, Adventure, Family, Fantasy, Comedy","While working underground to fix a water main,...","Chris Pratt, Mario (voice), Anya Taylor-Joy, P...",The Super Mario Bros. Movie,Released,English,100000000.0,724459000.0,AU
3,Mummies,01/05/2023,70.0,"Animation, Comedy, Family, Adventure, Fantasy","Through a series of unfortunate events, three ...","Óscar Barberán, Thut (voice), Ana Esther Albor...",Momias,Released,"Spanish, Castilian",12300000.0,34200000.0,AU
4,Supercell,03/17/2023,61.0,Action,Good-hearted teenager William always lived in ...,"Skeet Ulrich, Roy Cameron, Anne Heche, Dr Quin...",Supercell,Released,English,77000000.0,340942000.0,US


**Observations:**

* ...

### **View the last 5 rows of the dataset**

In [8]:
# Looking at tail (the last 5 observations)
df.tail()

Unnamed: 0,names,date_x,score,genre,overview,crew,orig_title,status,orig_lang,budget_x,revenue,country
10173,20th Century Women,12/28/2016,73.0,Drama,"In 1979 Santa Barbara, California, Dorothea Fi...","Annette Bening, Dorothea Fields, Lucas Jade Zu...",20th Century Women,Released,English,7000000.0,9353729.0,US
10174,Delta Force 2: The Colombian Connection,08/24/1990,54.0,Action,When DEA agents are taken captive by a ruthles...,"Chuck Norris, Col. Scott McCoy, Billy Drago, R...",Delta Force 2: The Colombian Connection,Released,English,9145817.8,6698361.0,US
10175,The Russia House,12/21/1990,61.0,"Drama, Thriller, Romance","Barley Scott Blair, a Lisbon-based editor of R...","Sean Connery, Bartholomew 'Barley' Scott Blair...",The Russia House,Released,English,21800000.0,22997992.0,US
10176,Darkman II: The Return of Durant,07/11/1995,55.0,"Action, Adventure, Science Fiction, Thriller, ...",Darkman and Durant return and they hate each o...,"Larry Drake, Robert G. Durant, Arnold Vosloo, ...",Darkman II: The Return of Durant,Released,English,116000000.0,475661306.0,US
10177,The Swan Princess: A Royal Wedding,07/20/2020,70.0,"Animation, Family, Fantasy",Princess Odette and Prince Derek are going to ...,"Nina Herzog, Princess Odette (voice), Yuri Low...",The Swan Princess: A Royal Wedding,Released,English,92400000.0,539401838.6,GB


**Observations:**

* ...

### **Checking the shape of the dataset**

In [9]:
df.shape

(10178, 12)

### **Checking the info()**

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10178 entries, 0 to 10177
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   names       10178 non-null  object 
 1   date_x      10178 non-null  object 
 2   score       10178 non-null  float64
 3   genre       10093 non-null  object 
 4   overview    10178 non-null  object 
 5   crew        10122 non-null  object 
 6   orig_title  10178 non-null  object 
 7   status      10178 non-null  object 
 8   orig_lang   10178 non-null  object 
 9   budget_x    10178 non-null  float64
 10  revenue     10178 non-null  float64
 11  country     10178 non-null  object 
dtypes: float64(3), object(9)
memory usage: 954.3+ KB


**Observations:**

* ...

### **Cleaning of the data**

In [11]:
# Convert date_x to datetime, and budget_x and revenue to integers
# (astype(int) truncates any fractional part, e.g. 9145817.8 becomes 9145817)
df['date_x'] = pd.to_datetime(df['date_x'])
df['budget_x'] = df['budget_x'].astype(int)
df['revenue'] = df['revenue'].astype(int)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10178 entries, 0 to 10177
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   names       10178 non-null  object        
 1   date_x      10178 non-null  datetime64[ns]
 2   score       10178 non-null  float64       
 3   genre       10093 non-null  object        
 4   overview    10178 non-null  object        
 5   crew        10122 non-null  object        
 6   orig_title  10178 non-null  object        
 7   status      10178 non-null  object        
 8   orig_lang   10178 non-null  object        
 9   budget_x    10178 non-null  int64         
 10  revenue     10178 non-null  int64         
 11  country     10178 non-null  object        
dtypes: datetime64[ns](1), float64(1), int64(2), object(8)
memory usage: 954.3+ KB
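
Two details of the conversion above are easy to miss: the date format is inferred rather than stated, and `astype(int)` truncates instead of rounding. The sketch below shows a more explicit variant on made-up values. It assumes the dates are month/day/year, as in the sample rows shown earlier (`03/02/2023`), and that stray whitespace or malformed entries might occur; neither has been verified against the full file.

```python
import pandas as pd

# Hypothetical raw values: two valid dates and one malformed entry
raw_dates = pd.Series(["03/02/2023", "12/15/2022 ", "unknown"])

# Strip stray whitespace, parse with an explicit format, and turn failures into NaT
parsed = pd.to_datetime(raw_dates.str.strip(), format="%m/%d/%Y", errors="coerce")
n_unparsed = int(parsed.isna().sum())

# astype truncates toward zero; call round() first if nearest-integer is wanted
budget = pd.Series([9145817.8])
truncated = budget.astype("int64")
rounded = budget.round().astype("int64")

print(parsed.dt.year.tolist(), n_unparsed, truncated[0], rounded[0])
```

With `errors="coerce"`, unparseable dates become `NaT` and can be counted afterwards instead of raising an exception partway through the column.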


In [12]:
# Count the null values in each column
df.isna().sum()

names          0
date_x         0
score          0
genre         85
overview       0
crew          56
orig_title     0
status         0
orig_lang      0
budget_x       0
revenue        0
country        0
dtype: int64

In [13]:
# Display sample rows with null values for genre
df[df['genre'].isnull()].head(10)

Unnamed: 0,names,date_x,score,genre,overview,crew,orig_title,status,orig_lang,budget_x,revenue,country
305,Housewife Sex Slaves: Hatano Yui,2015-01-09,0.0,,We don't have an overview translated in Englis...,"Yui Hatano,",人妻性奴隷 波多野結衣,Released,Japanese,167540000,175269998,JP
1174,Beauty Rope Cosmetology,1983-12-02,10.0,,Miki is the daughter of an affluent family. Sh...,"Miki Takakura, Miki, Maya Ito, Rena, Ren Osugi...",団鬼六　美女縄化粧,Released,Japanese,201940000,38157314,JP
1561,Reclaim,2022-07-29,20.0,,She is a good woman living a fulfilling life. ...,,Reclaim,Released,Chinese,12001040,38139010,US
1762,Ancient Chinese Whorehouse,1994-09-15,50.0,,Madam Five and carpenter Kong work together ma...,"Kent Cheng, Kong, Yvonne Yung Hung, Miss Ng, S...",青樓十二房,Released,Cantonese,163600000,812667214,HK
1776,Porno document: Toruko tokkyû bin,1982-02-26,100.0,,Pinku from 1982.,"Jun Miho, , Rumi Kagawa, , Miyuki Oka, , Kayok...",ポルノドキュメント　トルコ特急便,Released,Japanese,201000000,1569323843,JP
2020,"Moses, Prince of Egypt",2000-01-01,56.0,,"At birth, Moses, a Hebrew baby is abandoned on...",,"Moses, Prince of Egypt",Released,English,80300000,321306715,AU
2144,My Beautiful Man ～eternal～,2023-04-07,0.0,,We don't have an overview translated in Englis...,"Riku Hagiwara, , Yusei Yagi,",美しい彼～eternal～,Released,Japanese,167540000,175269998,JP
2267,Office Lady Rope Slave,1981-01-23,20.0,,Two assistants to an S&M photographer decide t...,"Junko Mabuki, Kimiyo Ezaki, Asami Ogawa, Kikuk...",団鬼六　ＯＬ縄奴隷,Released,Japanese,12001040,38139010,JP
2317,Dream to be a Wife,2015-09-09,0.0,,We don't have an overview translated in Englis...,"Ai Uehara,",欲望人妻：滴垂之蜜,Released,Chinese,167540000,175269998,CN
2431,Barbie,1977-01-01,20.0,,Barbie comes home from shopping. She takes her...,,Barbie,Released,No Language,12001040,38139010,US


In [14]:
# Display sample rows with null values for crew
df[df['crew'].isnull()].head(10)

Unnamed: 0,names,date_x,score,genre,overview,crew,orig_title,status,orig_lang,budget_x,revenue,country
148,Orgasm Inc: The Story of OneTaste,2022-11-05,64.0,Documentary,A sexual wellness company gains fame and follo...,,Orgasm Inc: The Story of OneTaste,Released,English,77400000,431611098,US
206,Legend of the Galactic Heroes: Die Neue These ...,2022-09-30,61.0,Animation,The story focuses on the exploits of rivals Re...,,銀河英雄伝説 Die Neue These 策謀 1,Released,Japanese,138000000,337725907,JP
649,Cuento de Primavera-A Spring Tale,2022-12-20,81.0,"Drama, Fantasy, Mystery",We don't have an overview translated in Englis...,,Cuento de Primavera-A Spring Tale,Released,"Spanish, Castilian",77600000,827017257,US
938,Cat Pack: A PAW Patrol Exclusive Event,2022-06-24,74.0,"Animation, Family",When Mayor Humdinger transforms his robot cat ...,,Cat Pack: A PAW Patrol Exclusive Event,Released,English,92800000,609222681,US
1561,Reclaim,2022-07-29,20.0,,She is a good woman living a fulfilling life. ...,,Reclaim,Released,Chinese,12001040,38139010,US
1752,Avatar: Scene Deconstruction,2009-12-18,71.0,Documentary,The deconstruction of the Avatar scenes and sets,,Avatar: Scene Deconstruction,Released,English,90100000,414551647,US
1828,Cyber Hell: Exposing an Internet Horror,2022-05-18,73.0,Documentary,"Anonymous and exploitative, a network of onlin...",,사이버 지옥: n번방을 무너뜨려라,Released,Korean,53600000,682429959,KR
1881,Guinea Pig: Devil's Experiment,1985-09-04,47.0,Horror,A group of guys capture a young girl with the ...,,ギニーピッグ 悪魔の実験,Released,Japanese,57800000,519287241,US
2020,"Moses, Prince of Egypt",2000-01-01,56.0,,"At birth, Moses, a Hebrew baby is abandoned on...",,"Moses, Prince of Egypt",Released,English,80300000,321306715,AU
2431,Barbie,1977-01-01,20.0,,Barbie comes home from shopping. She takes her...,,Barbie,Released,No Language,12001040,38139010,US


In [15]:
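# Remove every row that contains a null value
df.dropna(inplace=True)
# Note: reset_index() returns a new DataFrame; the result is not assigned here,
# so df keeps its original index labels (visible in the output below)
df.reset_index()
df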

Unnamed: 0,names,date_x,score,genre,overview,crew,orig_title,status,orig_lang,budget_x,revenue,country
0,Creed III,2023-03-02,73.0,"Drama, Action","After dominating the boxing world, Adonis Cree...","Michael B. Jordan, Adonis Creed, Tessa Thompso...",Creed III,Released,English,75000000,271616668,AU
1,Avatar: The Way of Water,2022-12-15,78.0,"Science Fiction, Adventure, Action",Set more than a decade after the events of the...,"Sam Worthington, Jake Sully, Zoe Saldaña, Neyt...",Avatar: The Way of Water,Released,English,460000000,2316794914,AU
2,The Super Mario Bros. Movie,2023-04-05,76.0,"Animation, Adventure, Family, Fantasy, Comedy","While working underground to fix a water main,...","Chris Pratt, Mario (voice), Anya Taylor-Joy, P...",The Super Mario Bros. Movie,Released,English,100000000,724459031,AU
3,Mummies,2023-01-05,70.0,"Animation, Comedy, Family, Adventure, Fantasy","Through a series of unfortunate events, three ...","Óscar Barberán, Thut (voice), Ana Esther Albor...",Momias,Released,"Spanish, Castilian",12300000,34200000,AU
4,Supercell,2023-03-17,61.0,Action,Good-hearted teenager William always lived in ...,"Skeet Ulrich, Roy Cameron, Anne Heche, Dr Quin...",Supercell,Released,English,77000000,340941958,US
...,...,...,...,...,...,...,...,...,...,...,...,...
10173,20th Century Women,2016-12-28,73.0,Drama,"In 1979 Santa Barbara, California, Dorothea Fi...","Annette Bening, Dorothea Fields, Lucas Jade Zu...",20th Century Women,Released,English,7000000,9353729,US
10174,Delta Force 2: The Colombian Connection,1990-08-24,54.0,Action,When DEA agents are taken captive by a ruthles...,"Chuck Norris, Col. Scott McCoy, Billy Drago, R...",Delta Force 2: The Colombian Connection,Released,English,9145817,6698361,US
10175,The Russia House,1990-12-21,61.0,"Drama, Thriller, Romance","Barley Scott Blair, a Lisbon-based editor of R...","Sean Connery, Bartholomew 'Barley' Scott Blair...",The Russia House,Released,English,21800000,22997992,US
10176,Darkman II: The Return of Durant,1995-07-11,55.0,"Action, Adventure, Science Fiction, Thriller, ...",Darkman and Durant return and they hate each o...,"Larry Drake, Robert G. Durant, Arnold Vosloo, ...",Darkman II: The Return of Durant,Released,English,116000000,475661306,US
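
The index labels in the output above still run up to 10176 even though rows were dropped, because `reset_index()` returns a new DataFrame rather than modifying `df` in place. A minimal sketch of one way to obtain a contiguous index, shown on a made-up three-row frame (the titles and values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "names": ["Film A", "Film B", "Film C"],
    "genre": ["Drama", None, "Action"],
})

# Drop rows with any null, then rebuild the index as 0..n-1.
# drop=True discards the old labels instead of keeping them as a new column.
clean = toy.dropna().reset_index(drop=True)

print(clean.index.tolist())
```

Whether to reset the index is a judgment call: keeping the original labels makes it easier to trace rows back to the source file.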


In [16]:
# Rename columns to more readable names
df = df.rename(columns={
    'names':'Title', 
    'date_x':'Date',
    'score':'Rating', 
    'genre':'Genre', 
    'overview':'Overview', 
    'crew':'Actor', 
    'orig_title':'Orginal Title',
    'status':'Status', 
    'orig_lang':'Orginal Language', 
    'budget_x':'Budget', 
    'revenue':'Revenue', 
    'country':'Country'
}
)
df

Unnamed: 0,Title,Date,Rating,Genre,Overview,Actor,Orginal Title,Status,Orginal Language,Budget,Revenue,Country
0,Creed III,2023-03-02,73.0,"Drama, Action","After dominating the boxing world, Adonis Cree...","Michael B. Jordan, Adonis Creed, Tessa Thompso...",Creed III,Released,English,75000000,271616668,AU
1,Avatar: The Way of Water,2022-12-15,78.0,"Science Fiction, Adventure, Action",Set more than a decade after the events of the...,"Sam Worthington, Jake Sully, Zoe Saldaña, Neyt...",Avatar: The Way of Water,Released,English,460000000,2316794914,AU
2,The Super Mario Bros. Movie,2023-04-05,76.0,"Animation, Adventure, Family, Fantasy, Comedy","While working underground to fix a water main,...","Chris Pratt, Mario (voice), Anya Taylor-Joy, P...",The Super Mario Bros. Movie,Released,English,100000000,724459031,AU
3,Mummies,2023-01-05,70.0,"Animation, Comedy, Family, Adventure, Fantasy","Through a series of unfortunate events, three ...","Óscar Barberán, Thut (voice), Ana Esther Albor...",Momias,Released,"Spanish, Castilian",12300000,34200000,AU
4,Supercell,2023-03-17,61.0,Action,Good-hearted teenager William always lived in ...,"Skeet Ulrich, Roy Cameron, Anne Heche, Dr Quin...",Supercell,Released,English,77000000,340941958,US
...,...,...,...,...,...,...,...,...,...,...,...,...
10173,20th Century Women,2016-12-28,73.0,Drama,"In 1979 Santa Barbara, California, Dorothea Fi...","Annette Bening, Dorothea Fields, Lucas Jade Zu...",20th Century Women,Released,English,7000000,9353729,US
10174,Delta Force 2: The Colombian Connection,1990-08-24,54.0,Action,When DEA agents are taken captive by a ruthles...,"Chuck Norris, Col. Scott McCoy, Billy Drago, R...",Delta Force 2: The Colombian Connection,Released,English,9145817,6698361,US
10175,The Russia House,1990-12-21,61.0,"Drama, Thriller, Romance","Barley Scott Blair, a Lisbon-based editor of R...","Sean Connery, Bartholomew 'Barley' Scott Blair...",The Russia House,Released,English,21800000,22997992,US
10176,Darkman II: The Return of Durant,1995-07-11,55.0,"Action, Adventure, Science Fiction, Thriller, ...",Darkman and Durant return and they hate each o...,"Larry Drake, Robert G. Durant, Arnold Vosloo, ...",Darkman II: The Return of Durant,Released,English,116000000,475661306,US


### **Summary of the data**

In [17]:
df.describe().T

Unnamed: 0,count,mean,min,25%,50%,75%,max,std
Date,10052.0,2008-06-07 16:17:25.762037504,1903-05-15 00:00:00,2002-01-08 06:00:00,2013-04-10 00:00:00,2019-09-26 00:00:00,2023-12-31 00:00:00,
Rating,10052.0,63.827,0.0,59.0,65.0,71.0,100.0,12.78271
Budget,10052.0,64125276.572821,1.0,14397627.25,50000000.0,104000000.0,460000000.0,56658516.682372
Revenue,10052.0,251204923.931257,0.0,27687812.0,149328803.5,416157754.5,2923706026.0,276549495.070688


**By default, the describe() function summarizes only numeric columns (recent pandas versions also include datetime columns such as Date). Let's check the summary of the non-numeric variables.**

In [18]:
df.describe(exclude = 'number').T

Unnamed: 0,count,unique,top,freq,mean,min,25%,50%,75%,max
Title,10052,9538.0,Pinocchio,12.0,,,,,,
Date,10052,,,,2008-06-07 16:17:25.762037504,1903-05-15 00:00:00,2002-01-08 06:00:00,2013-04-10 00:00:00,2019-09-26 00:00:00,2023-12-31 00:00:00
Genre,10052,2300.0,Drama,556.0,,,,,,
Overview,10052,9810.0,We don't have an overview translated in Englis...,61.0,,,,,,
Actor,10052,9857.0,"Robert Armstrong, Carl Denham, Fay Wray, Ann D...",3.0,,,,,,
Orginal Title,10052,9614.0,Pinocchio,12.0,,,,,,
Status,10052,3.0,Released,10007.0,,,,,,
Orginal Language,10052,53.0,English,7381.0,,,,,,
Country,10052,57.0,AU,4880.0,,,,,,


**Observations:**

* ...

**Let's check the count of each unique category in each of the categorical variables.**

In [None]:
# Making a list of all categorical variables


# Printing the count of each unique value in each column

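
One possible way to fill in the two steps named in the cell above is sketched here on a hypothetical miniature frame. The selection rule (every column that is neither numeric nor datetime) is an assumption about what counts as categorical in this project, and the names `toy`, `cat_cols`, and `counts` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Status": ["Released", "Released", "Post Production"],
    "Country": ["AU", "US", "AU"],
    "Rating": [73.0, 61.0, 70.0],
})

# Treat every column that is neither numeric nor datetime as categorical
cat_cols = toy.select_dtypes(exclude=["number", "datetime"]).columns.tolist()

# Count each unique value in each categorical column
counts = {col: toy[col].value_counts() for col in cat_cols}
for col, vc in counts.items():
    print(vc)
    print("-" * 40)
```

On the full dataset, free-text columns such as `Overview` would probably be left out of `cat_cols`, since nearly every value is unique.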

### **Missing value treatment**

In [None]:
# Checking missing values
df.isna().sum()

In [None]:
df.isnull().sum()

## **Exploratory Data Analysis: Univariate**

**Let us explore the numerical variables first.**

In [None]:
def histogram_boxplot(feature, figsize=(15, 10), bins="auto"): #Histogram
    """ Boxplot and histogram combined
    feature: 1-d feature array
    figsize: size of fig (default (15, 10))
    bins: number of bins (default "auto")
    """
    f, (ax_box, ax_hist) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid
        sharex=True,  # The X-axis will be shared among all the subplots
        gridspec_kw={"height_ratios": (.25, .75)},
        figsize=figsize
    )

    # Boxplot on top; showmeans=True marks the column mean with a marker
    sns.boxplot(x=feature, ax=ax_box, showmeans=True, color='red')

    # Histogram below, sharing the x-axis with the boxplot
    sns.histplot(x=feature, kde=False, ax=ax_hist, bins=bins)
    ax_hist.axvline(np.mean(feature), color='g', linestyle='--')      # Dashed green line at the mean
    ax_hist.axvline(np.median(feature), color='black', linestyle='-') # Solid black line at the median

    plt.show()
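
The boxplot drawn by `histogram_boxplot` shows points beyond 1.5 times the interquartile range as outliers, which is seaborn's default whisker setting. The same rule can be applied numerically to list those points. This is a minimal sketch on made-up values, not on the movie data:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical data

# Quartiles and interquartile range
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside [lower, upper] are the points a boxplot would draw individually
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())
```

Points flagged this way are candidates for a closer look, not necessarily errors.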

### **Observations on [...]**

In [None]:
histogram_boxplot(df.[])

**Observations:**
* ...

[Rinse and Repeat for different plots...]

**Now, let's explore the categorical variables.**

In [None]:
def bar_perc(data, z): #Bar Plot
    total = len(data[z]) # Length of the column
    plt.figure(figsize = (15, 5))

    # Convert the column to a categorical data type
    data[z] = data[z].astype('category')

    ax = sns.countplot(x=z, data=data, palette='Paired', order=data[z].value_counts().index)

    for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height() / total) # Share of rows in this category
        x = p.get_x() + p.get_width() / 2 - 0.05                    # Horizontal position: near the bar's center
        y = p.get_y() + p.get_height()                              # Vertical position: top of the bar
        ax.annotate(percentage, (x, y), size = 12)                  # Write the percentage at the top of the bar

    plt.show()                                                      # Display the plot
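
The percentages that `bar_perc` writes on each bar can also be obtained as a table, which is handy for checking the plot or quoting exact figures. A minimal sketch using `value_counts(normalize=True)` on a hypothetical column:

```python
import pandas as pd

# Hypothetical categorical values
status = pd.Series(["Released", "Released", "Released", "Post Production"])

# normalize=True returns fractions; multiply by 100 for percentages
pct = (status.value_counts(normalize=True) * 100).round(1)
print(pct)
```

Note that `value_counts` ignores missing values by default, so these percentages are relative to the non-null rows.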

### **Observations on [...]**

In [None]:
bar_perc(df, ...)

**Observations:**
* ...

[Rinse and Repeat for different plots...]

## **Exploratory Data Analysis: Multivariate**

[Rinse and Repeat (same idea as the univariate) for different plots...]

## **Conclusion and Recommendations**

-----------------------------------------------------------------
### **Conclusion**
-----------------------------------------------------------------

We analyzed a dataset of nearly ...
The data spanned ...
The main feature of interest here is the ...
From a business perspective, ...
Thus, we determined the factors that affect ...

We have been able to conclude that:

1. ...

--------------------------------------------------
### **Recommendation to business**
--------------------------------------------------

1. ...

---------------------------------
### **Further Analysis**
---------------------------------
1. Dig deeper to explore the variation of ...