### Overview
In this notebook, we illustrate the share of female-directed movies and shows on Netflix from 2008 to today. With the company's recent $5 million pledge to female filmmakers, this data can show how Netflix is building toward more gender-inclusive media. It can also offer some insight into Netflix's ESG (environmental, social, and governance) ratings.

In [None]:
!pip install gender_guesser

import pandas as pd
import numpy as np
import datetime as dt
import seaborn as sns
import matplotlib.pyplot as plt
import gender_guesser.detector as gender
from collections import Counter
from plotly.offline import init_notebook_mode, iplot

# Needed for plotly's offline mode to render figures inside the notebook
init_notebook_mode(connected=True)

In [None]:
df = pd.read_csv("../input/netflix-shows/netflix_titles.csv")

In [None]:
sns.set_style("darkgrid", {"axes.facecolor": "0.95"})
ax = df['rating'].value_counts().plot.bar(stacked=False, color='#99ccff',figsize=(18,5))
ax.set_title('Film Ratings Breakdown')
sns.despine()
plt.show()

**Rating**

* TV-MA is the most common rating among Netflix titles, followed by TV-14 and TV-PG.
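The ranking can also be expressed as shares of the catalog. A minimal sketch using `value_counts(normalize=True)`; the helper name `rating_shares` and the sample ratings are made up for illustration, and on the real data the input would be `df['rating']`:

```python
import pandas as pd

def rating_shares(ratings: pd.Series) -> pd.Series:
    """Return each rating's share of all titles, most common first (missing ratings dropped)."""
    return ratings.value_counts(normalize=True)

# Illustrative sample only, not the Netflix data
sample = pd.Series(['TV-MA', 'TV-MA', 'TV-14', 'TV-PG', 'TV-MA', 'TV-14', None])
shares = rating_shares(sample)
print(shares.round(3))
```

Missing ratings are excluded before the shares are computed, so the shares always sum to 1.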

In [None]:
ax = df['release_year'].value_counts().sort_index(ascending=False).plot.bar(stacked=False, color='#99ccff', figsize=(18,5))
ax.set_title('Film Release Year Breakdown')
sns.despine()
plt.show()

**Release Year**

* The number of movies and shows on Netflix, grouped by original release year (the year a title first came out, not the year Netflix added it).

* Titles released in 2018 are the most numerous, and the counts decrease steadily for later release years.
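Whether the counts really peak and then fall can be checked directly rather than by eye: find the year with the most titles, then look at the year-over-year differences after it. A sketch on made-up release years (the helper `titles_per_year` is hypothetical; on the real data the input would be `df['release_year']`):

```python
import pandas as pd

def titles_per_year(release_years: pd.Series) -> pd.Series:
    """Count titles per release year, oldest year first."""
    return release_years.value_counts().sort_index()

# Illustrative sample only, not the Netflix data
years = pd.Series([2016, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020])
counts = titles_per_year(years)
peak_year = counts.idxmax()                 # year with the most titles
after_peak = counts.loc[peak_year:].diff()  # change from each year to the next, from the peak on
print(peak_year, after_peak.dropna().tolist())
```

A steady decrease after the peak means every remaining difference is negative.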

In [None]:
country_codes = {'afghanistan': 'AFG',
 'albania': 'ALB',
 'algeria': 'DZA',
 'american samoa': 'ASM',
 'andorra': 'AND',
 'angola': 'AGO',
 'anguilla': 'AIA',
 'antigua and barbuda': 'ATG',
 'argentina': 'ARG',
 'armenia': 'ARM',
 'aruba': 'ABW',
 'australia': 'AUS',
 'austria': 'AUT',
 'azerbaijan': 'AZE',
 'bahamas': 'BHS',
 'bahrain': 'BHR',
 'bangladesh': 'BGD',
 'barbados': 'BRB',
 'belarus': 'BLR',
 'belgium': 'BEL',
 'belize': 'BLZ',
 'benin': 'BEN',
 'bermuda': 'BMU',
 'bhutan': 'BTN',
 'bolivia': 'BOL',
 'bosnia and herzegovina': 'BIH',
 'botswana': 'BWA',
 'brazil': 'BRA',
 'british virgin islands': 'VGB',
 'brunei': 'BRN',
 'bulgaria': 'BGR',
 'burkina faso': 'BFA',
 'burma': 'MMR',
 'burundi': 'BDI',
 'cabo verde': 'CPV',
 'cambodia': 'KHM',
 'cameroon': 'CMR',
 'canada': 'CAN',
 'cayman islands': 'CYM',
 'central african republic': 'CAF',
 'chad': 'TCD',
 'chile': 'CHL',
 'china': 'CHN',
 'colombia': 'COL',
 'comoros': 'COM',
 'congo democratic': 'COD',
 'congo republic': 'COG',
 'cook islands': 'COK',
 'costa rica': 'CRI',
 "cote d'ivoire": 'CIV',
 'croatia': 'HRV',
 'cuba': 'CUB',
 'curacao': 'CUW',
 'cyprus': 'CYP',
 'czech republic': 'CZE',
 'denmark': 'DNK',
 'djibouti': 'DJI',
 'dominica': 'DMA',
 'dominican republic': 'DOM',
 'ecuador': 'ECU',
 'egypt': 'EGY',
 'el salvador': 'SLV',
 'equatorial guinea': 'GNQ',
 'eritrea': 'ERI',
 'estonia': 'EST',
 'ethiopia': 'ETH',
 'falkland islands': 'FLK',
 'faroe islands': 'FRO',
 'fiji': 'FJI',
 'finland': 'FIN',
 'france': 'FRA',
 'french polynesia': 'PYF',
 'gabon': 'GAB',
 'gambia, the': 'GMB',
 'georgia': 'GEO',
 'germany': 'DEU',
 'ghana': 'GHA',
 'gibraltar': 'GIB',
 'greece': 'GRC',
 'greenland': 'GRL',
 'grenada': 'GRD',
 'guam': 'GUM',
 'guatemala': 'GTM',
 'guernsey': 'GGY',
 'guinea-bissau': 'GNB',
 'guinea': 'GIN',
 'guyana': 'GUY',
 'haiti': 'HTI',
 'honduras': 'HND',
 'hong kong': 'HKG',
 'hungary': 'HUN',
 'iceland': 'ISL',
 'india': 'IND',
 'indonesia': 'IDN',
 'iran': 'IRN',
 'iraq': 'IRQ',
 'ireland': 'IRL',
 'isle of man': 'IMN',
 'israel': 'ISR',
 'italy': 'ITA',
 'jamaica': 'JAM',
 'japan': 'JPN',
 'jersey': 'JEY',
 'jordan': 'JOR',
 'kazakhstan': 'KAZ',
 'kenya': 'KEN',
 'kiribati': 'KIR',
 'north korea': 'PRK',
 'south korea': 'KOR',
 'kosovo': 'KSV',
 'kuwait': 'KWT',
 'kyrgyzstan': 'KGZ',
 'laos': 'LAO',
 'latvia': 'LVA',
 'lebanon': 'LBN',
 'lesotho': 'LSO',
 'liberia': 'LBR',
 'libya': 'LBY',
 'liechtenstein': 'LIE',
 'lithuania': 'LTU',
 'luxembourg': 'LUX',
 'macau': 'MAC',
 'macedonia': 'MKD',
 'madagascar': 'MDG',
 'malawi': 'MWI',
 'malaysia': 'MYS',
 'maldives': 'MDV',
 'mali': 'MLI',
 'malta': 'MLT',
 'marshall islands': 'MHL',
 'mauritania': 'MRT',
 'mauritius': 'MUS',
 'mexico': 'MEX',
 'micronesia': 'FSM',
 'moldova': 'MDA',
 'monaco': 'MCO',
 'mongolia': 'MNG',
 'montenegro': 'MNE',
 'morocco': 'MAR',
 'mozambique': 'MOZ',
 'namibia': 'NAM',
 'nepal': 'NPL',
 'netherlands': 'NLD',
 'new caledonia': 'NCL',
 'new zealand': 'NZL',
 'nicaragua': 'NIC',
 'nigeria': 'NGA',
 'niger': 'NER',
 'niue': 'NIU',
 'northern mariana islands': 'MNP',
 'norway': 'NOR',
 'oman': 'OMN',
 'pakistan': 'PAK',
 'palau': 'PLW',
 'panama': 'PAN',
 'papua new guinea': 'PNG',
 'paraguay': 'PRY',
 'peru': 'PER',
 'philippines': 'PHL',
 'poland': 'POL',
 'portugal': 'PRT',
 'puerto rico': 'PRI',
 'qatar': 'QAT',
 'romania': 'ROU',
 'russia': 'RUS',
 'rwanda': 'RWA',
 'saint kitts and nevis': 'KNA',
 'saint lucia': 'LCA',
 'saint martin': 'MAF',
 'saint pierre and miquelon': 'SPM',
 'saint vincent and the grenadines': 'VCT',
 'samoa': 'WSM',
 'san marino': 'SMR',
 'sao tome and principe': 'STP',
 'saudi arabia': 'SAU',
 'senegal': 'SEN',
 'serbia': 'SRB',
 'seychelles': 'SYC',
 'sierra leone': 'SLE',
 'singapore': 'SGP',
 'sint maarten': 'SXM',
 'slovakia': 'SVK',
 'slovenia': 'SVN',
 'solomon islands': 'SLB',
 'somalia': 'SOM',
 'south africa': 'ZAF',
 'south sudan': 'SSD',
 'spain': 'ESP',
 'sri lanka': 'LKA',
 'sudan': 'SDN',
 'suriname': 'SUR',
 'swaziland': 'SWZ',
 'sweden': 'SWE',
 'switzerland': 'CHE',
 'syria': 'SYR',
 'taiwan': 'TWN',
 'tajikistan': 'TJK',
 'tanzania': 'TZA',
 'thailand': 'THA',
 'timor-leste': 'TLS',
 'togo': 'TGO',
 'tonga': 'TON',
 'trinidad and tobago': 'TTO',
 'tunisia': 'TUN',
 'turkey': 'TUR',
 'turkmenistan': 'TKM',
 'tuvalu': 'TUV',
 'uganda': 'UGA',
 'ukraine': 'UKR',
 'united arab emirates': 'ARE',
 'united kingdom': 'GBR',
 'united states': 'USA',
 'uruguay': 'URY',
 'uzbekistan': 'UZB',
 'vanuatu': 'VUT',
 'venezuela': 'VEN',
 'vietnam': 'VNM',
 'virgin islands': 'VGB',
 'west bank': 'WBG',
 'yemen': 'YEM',
 'zambia': 'ZMB',
 'zimbabwe': 'ZWE'} 
    
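def geoplot(ddf):
    """Plot a choropleth of title counts per country and return the per-country counts."""
    country_with_code, country = {}, {}
    # A title can list several countries ("United States, India"): split and strip each name
    shows_countries = [name.strip() for entry in ddf['country'].dropna()
                       for name in entry.split(',') if name.strip()]
    for c, v in Counter(shows_countries).items():
        country[c] = v
        code = country_codes.get(c.lower())
        # Skip names missing from country_codes instead of lumping them under an empty code,
        # and add up counts when several names share one ISO code
        if code:
            country_with_code[code] = country_with_code.get(code, 0) + v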

    data = [dict(
            type = 'choropleth',
            locations = list(country_with_code.keys()),
            z = list(country_with_code.values()),
            colorscale = [[0,"rgb(100, 153, 255)"],[0.65,"rgb(125, 173, 255)"],\
                        [0.80,"rgb(153, 204, 255)"],[1,"rgb(235, 235, 235)"]],
            autocolorscale = False,
            reversescale = True,
            marker = dict(
                line = dict (
                    color = 'white',
                    width = 0.9
                ) ),
            colorbar = dict(
                autotick = False)
          ) ]

    layout = dict(
        title = '',
        geo = dict(
            showframe = False,
            showcoastlines = False,
            projection = dict(
                type = 'mercator'
            )
        )
        
    )

    fig = dict( data=data, layout=layout )
    iplot( fig, validate=False, filename='d3-world-map' )
    return country

country_vals = geoplot(df)
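`geoplot` counts a title that lists several countries once for each of them. The counting step on its own, as a minimal sketch with a hypothetical helper `count_countries` and made-up sample strings (the trailing comma and the `None` show that empty names and missing values are skipped):

```python
from collections import Counter

def count_countries(entries):
    """Count how often each country appears across comma-separated country strings.

    A title co-produced by several countries is counted once for each of them.
    """
    names = (name.strip() for entry in entries if isinstance(entry, str)
             for name in entry.split(','))
    return Counter(name for name in names if name)

# Illustrative sample only, not the Netflix data
sample = ['United States', 'India', 'United States, India', 'United Kingdom,', None]
counts = count_countries(sample)
print(counts.most_common(3))
```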

In [None]:
# Top 15 countries by number of titles, largest first
tabs = Counter(country_vals).most_common(15)
labels = [name for name, _ in tabs]
values = [count for _, count in tabs]
countplt, ax = plt.subplots(figsize=(23, 7))

sns.barplot(x=labels, y=values, color='#99ccff', ax=ax)
ax.set_title('Countries of Origin')
sns.despine()

**Country**

* The United States is the most common country of origin for titles on Netflix, followed by India and the United Kingdom.

In [None]:
# Get U.S.-based Netflix titles
df_usa = df[df['country'].str.contains('United States', na=False)]

# Keep titles with a listed director
df_usa = df_usa[df_usa['director'].notnull()].copy()
df_usa['director'] = df_usa['director'].astype(str)

# Parse date_added ('%B %d, %Y' format, values may carry a leading space)
# and drop rows where it is missing
added = pd.to_datetime(df_usa['date_added'].str.strip(), format='%B %d, %Y', errors='coerce')
df_usa = df_usa[added.notna()].copy()
# Month the title was added, as a '%y-%b' label such as '19-Sep'
df_usa['month_added'] = added[added.notna()].dt.strftime('%y-%b')

d = gender.Detector()

# Guess the gender from the first word of the director field (the first listed
# director's first name); fold 'mostly_male'/'mostly_female' into 'male'/'female'
first_names = df_usa['director'].str.split().str[0]
df_usa['director_gender'] = [d.get_gender(name).replace('mostly_', '') for name in first_names]

# Drop androgynous ('andy') and unknown names
df_usa = df_usa[~df_usa['director_gender'].isin(['andy', 'unknown'])].copy()

countplt, ax = plt.subplots(figsize=(25, 3))
sns.countplot(y=df_usa['director_gender'], order=['male', 'female'],
              palette=['#99ccff', '#ff9999'], ax=ax)
ax.set_ylabel('')
ax.set_xlabel('')
ax.set_title('Gender Breakdown of Filmmakers -- Bar Chart')
sns.despine()
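The cell above reduces the six labels `gender_guesser` can return to two. The same rule as a standalone function, for illustration only (the name `normalize_gender` is hypothetical; the labels are the ones documented for `gender_guesser`):

```python
def normalize_gender(label):
    """Collapse a gender_guesser label into 'male', 'female', or None.

    'mostly_male'/'mostly_female' are folded into 'male'/'female';
    'andy' (androgynous) and 'unknown' return None so the title can be dropped.
    """
    label = label.replace('mostly_', '')
    return label if label in ('male', 'female') else None

# The six labels gender_guesser's Detector.get_gender can return
labels = ['male', 'female', 'mostly_male', 'mostly_female', 'andy', 'unknown']
print([normalize_gender(label) for label in labels])
```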

**Genders of Directors**

* Among U.S. titles whose director's gender could be inferred, about 6.5 times as many are directed by men as by women.
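The ratio is the count of male labels divided by the count of female labels. A minimal sketch with a hypothetical helper `male_to_female_ratio` and a made-up sample of 13 male and 2 female labels; on the real data the input would be `df_usa['director_gender']` after the filtering above:

```python
import pandas as pd

def male_to_female_ratio(genders: pd.Series) -> float:
    """Return how many 'male' labels there are per 'female' label."""
    counts = genders.value_counts()
    n_female = int(counts.get('female', 0))
    if n_female == 0:
        raise ValueError("no female-directed titles in the input")
    return int(counts.get('male', 0)) / n_female

# Illustrative sample only: 13 male and 2 female labels
sample = pd.Series(['male'] * 13 + ['female'] * 2)
ratio = male_to_female_ratio(sample)
print(ratio)  # 6.5
```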

In [None]:
# Encode director gender numerically (female = 0, male = 1) so that the mean of a
# group is the share of male-directed titles in it
df_usa['director_gender'] = df_usa['director_gender'].map({'female': 0, 'male': 1})

data = (df_usa.groupby(['month_added', 'release_year'])['director_gender']
        .mean()
        .reset_index(name='mean_gender'))

# Turn '19-Sep' style labels into chronologically sortable '2019-09' labels
data['month_added'] = pd.to_datetime(data['month_added'], format='%y-%b').dt.strftime('%Y-%m')

pivottable = data.pivot_table(index='release_year', columns='month_added', values='mean_gender', aggfunc='sum')
plt.figure(figsize=(28, 12))
# Resolution used when the figure is saved
plt.rcParams['savefig.dpi'] = 100

# Red = lower share of male directors, blue = higher share
cmap = sns.diverging_palette(15, 245, as_cmap=True)
heatmap = sns.heatmap(pivottable, vmin=0, vmax=1, annot=False, cmap=cmap, linewidths=.4, cbar=False, square=True)
heatmap.invert_yaxis()
heatmap.set_title('Gender Breakdown of Filmmakers -- Heatmap', fontsize=15)
heatmap.set_xlabel('Month Added')
heatmap.set_ylabel('Release Year')
plt.show()
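Because female is encoded as 0 and male as 1, each heatmap cell is the share of male-directed titles for that release year and month added: 0 means all female-directed, 1 all male-directed, and an empty cell means no titles. A small sketch on made-up rows; it uses `pivot` instead of `pivot_table` because each (release year, month) pair appears only once after grouping:

```python
import pandas as pd

# Illustrative sample only: 0 = female director, 1 = male director (the notebook's encoding)
titles = pd.DataFrame({
    'month_added': ['2019-09', '2019-09', '2019-09', '2019-10', '2019-10'],
    'release_year': [2019, 2019, 2005, 2019, 2019],
    'director_gender': [1, 0, 1, 1, 1],
})

# The mean of a 0/1 column is the share of 1s, i.e. the share of male-directed titles
share_male = (titles.groupby(['month_added', 'release_year'])['director_gender']
              .mean()
              .reset_index(name='mean_gender'))
grid = share_male.pivot(index='release_year', columns='month_added', values='mean_gender')
print(grid)
```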

**Genders of Directors**

* Over the years, the titles Netflix adds have become more evenly balanced between male- and female-directed films.

* The heatmap shows no clear correlation between how recently a film was released and the gender of its director.

* Over the years, Netflix has added a wider range of older films (titles with earlier release years).

* Older films are more likely to have been directed by men than by women.
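These observations are read off the heatmap. One way to test whether release year and director gender are related is to correlate release year with a 0/1 male-director flag (the point-biserial correlation) and get a p-value from a permutation test. A minimal NumPy sketch on made-up data; the function name and sample values are assumptions, and on the real data the inputs would be `df_usa['release_year']` and the encoded `director_gender` column:

```python
import numpy as np

def correlation_permutation_test(release_years, is_male, n_permutations=10_000, seed=0):
    """Pearson correlation between release year and a 0/1 male-director flag
    (the point-biserial correlation), with a two-sided permutation-test p-value."""
    years = np.asarray(release_years, dtype=float)
    flags = np.asarray(is_male, dtype=float)
    r_observed = np.corrcoef(years, flags)[0, 1]

    rng = np.random.default_rng(seed)
    # Shuffle the gender flags to simulate "no relationship" and count how often
    # a correlation at least as strong as the observed one appears by chance
    hits = 0
    for _ in range(n_permutations):
        r_null = np.corrcoef(years, rng.permutation(flags))[0, 1]
        if abs(r_null) >= abs(r_observed) - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    p_value = (hits + 1) / (n_permutations + 1)
    return r_observed, p_value

# Illustrative sample only, not the Netflix data
years = [1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2018, 2020]
male = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
r, p = correlation_permutation_test(years, male)
print(f"r = {r:.2f}, p = {p:.3f}")
```

A negative `r` means male-directed titles tend to have earlier release years; a small `p` suggests the relationship is unlikely to be chance alone.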