<h1> Exercise </h1>

The "Netflix Titles" dataset is a comprehensive compilation of the movies and TV shows available on Netflix. For each title it records the type, director, cast, production country, release year, rating, duration, genres (the listed_in column) and a short description. It can be used to analyze trends in Netflix content, compare the popularity of genres, and examine how content is distributed across regions and time periods.

The columns:
- show_id: A unique identifier for each title.
- type: The category of the title, either 'Movie' or 'TV Show'.
- title: The name of the movie or TV show.
- director: The director(s) of the movie or TV show. (Contains null values for some entries, especially TV shows, where this information may not apply.)
- cast: The list of the main actors in the title. (Some entries may lack this information.)
- country: The country or countries where the movie or TV show was produced.
- date_added: The date on which the title was added to Netflix.
- release_year: The original release year of the movie or TV show.
- rating: The age rating of the title.
- duration: The length of the title, in minutes for movies and in seasons for TV shows.
- listed_in: The genres the title belongs to.
- description: A short summary of the title.

<h1> Data import</h1>

In [8]:
import pandas as pd

netflix = pd.read_csv('netflix_titles.csv', encoding='latin1')  # utf-8 decoding failed on this file; note that latin1 can mis-decode accented characters

netflix_wip = netflix.copy()  # work on a copy for the data cleaning
netflix_wip.head()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,...,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,...,,,,,,,,,,
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,...,,,,,,,,,,
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,...,,,,,,,,,,
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,...,,,,,,,,,,
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,...,,,,,,,,,,
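
Decoding with latin1 cannot fail, because every byte maps to some character, but UTF-8 text read this way has each accented letter split into two wrong characters. The following is a minimal sketch of that effect and of one way to reverse it. It uses a made-up string rather than the real file, and whether the CSV is actually UTF-8 is an assumption.

```python
# Hypothetical example string; nothing here is read from netflix_titles.csv.
original = "Cédric"

# UTF-8 stores "é" as two bytes; latin1 decodes each byte as its own character.
garbled = original.encode("utf-8").decode("latin1")
print(garbled)   # CÃ©dric

# Reversing the two steps restores the text, as long as every garbled
# character can be encoded back to latin1.
repaired = garbled.encode("latin1").decode("utf-8")
print(repaired)  # Cédric
```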


<h1> Data cleaning and data modeling</h1>

In [10]:
# the head() above shows empty trailing "Unnamed" columns: keep only the 12 documented columns, then check the result
netflix_wip = netflix_wip[['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added', 'release_year', 'rating', 'duration', 'listed_in', 'description']]
netflix_wip.info()

netflix_wip.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8809 entries, 0 to 8808
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8809 non-null   object
 1   type          8809 non-null   object
 2   title         8809 non-null   object
 3   director      6175 non-null   object
 4   cast          7984 non-null   object
 5   country       7978 non-null   object
 6   date_added    8799 non-null   object
 7   release_year  8809 non-null   int64 
 8   rating        8805 non-null   object
 9   duration      8806 non-null   object
 10  listed_in     8809 non-null   object
 11  description   8809 non-null   object
dtypes: int64(1), object(11)
memory usage: 826.0+ KB


,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
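
An alternative to listing the columns by hand is to drop every column that holds no data at all. Below is a minimal sketch on a toy DataFrame. The column names are illustrative, and it assumes the extra columns are entirely empty, which the notebook does not verify.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw CSV: two real columns and two empty ones.
raw = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "title": ["A", "B"],
    "Unnamed: 12": [np.nan, np.nan],
    "Unnamed: 13": [np.nan, np.nan],
})

# how="all" drops a column only when every value in it is missing.
cleaned = raw.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['show_id', 'title']
```

If even one cell of an "Unnamed" column were filled, that column would survive, so the explicit column list used above is the stricter choice.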


In [11]:
# check that the show_id column contains no duplicates
print(len(netflix_wip['show_id']))
print(netflix_wip['show_id'].nunique())

8809
8809
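
Comparing the length with the number of distinct values works. pandas also offers a direct test, plus a way to list the offending values when the test fails. A minimal sketch on toy data (the ids below are invented to force one duplicate):

```python
import pandas as pd

# Toy ids with one deliberate repeat.
ids = pd.Series(["s1", "s2", "s3", "s2"], name="show_id")

# True only when every value occurs exactly once.
print(ids.is_unique)

# keep=False flags every occurrence of a repeated value, not just the later ones.
duplicates = ids[ids.duplicated(keep=False)]
print(duplicates.tolist())
```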


In [12]:
# There are none, so we use show_id as the DataFrame index (info() above also showed that it has no null values)
netflix_wip.set_index('show_id', inplace = True)
netflix_wip.head()

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [16]:
# convert the date_added column from strings to datetime
netflix_wip['date_added'] = pd.to_datetime(netflix_wip['date_added'].str.strip(), format='%B %d, %Y')
netflix_wip.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8809 entries, s1 to s8809
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   type          8809 non-null   object        
 1   title         8809 non-null   object        
 2   director      6175 non-null   object        
 3   cast          7984 non-null   object        
 4   country       7978 non-null   object        
 5   date_added    8799 non-null   datetime64[ns]
 6   release_year  8809 non-null   int64         
 7   rating        8805 non-null   object        
 8   duration      8806 non-null   object        
 9   listed_in     8809 non-null   object        
 10  description   8809 non-null   object        
dtypes: datetime64[ns](1), int64(1), object(9)
memory usage: 825.8+ KB
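
With an explicit format, pd.to_datetime raises an error as soon as one value does not match. If that happened on another version of the file, one option would be errors="coerce", which turns unparseable values into NaT so they can be inspected afterwards. A minimal sketch on invented values:

```python
import pandas as pd

# Toy values: leading whitespace, a single-digit day, a missing value and a bad string.
dates = pd.Series([" September 25, 2021", "August 4, 2021", None, "not a date"])

parsed = pd.to_datetime(dates.str.strip(), format="%B %d, %Y", errors="coerce")

# Rows that could not be parsed (including the value that was already missing).
print(dates[parsed.isna()])
```

Coercion hides problems silently, so it is worth comparing the number of NaT values with the number of nulls in the original column.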


In [19]:
print(netflix_wip['type'].unique())  # check that the only types are 'Movie' and 'TV Show'
print(netflix_wip['duration'].unique())  # the unit is not uniform: some values are in minutes, others in number of seasons

['Movie' 'TV Show']
['90 min' '2 Seasons' '1 Season' '91 min' '125 min' '9 Seasons' '104 min'
 '127 min' '4 Seasons' '67 min' '94 min' '5 Seasons' '161 min' '61 min'
 '166 min' '147 min' '103 min' '97 min' '106 min' '111 min' '3 Seasons'
 '110 min' '105 min' '96 min' '124 min' '116 min' '98 min' '23 min'
 '115 min' '122 min' '99 min' '88 min' '100 min' '6 Seasons' '102 min'
 '93 min' '95 min' '85 min' '83 min' '113 min' '13 min' '182 min' '48 min'
 '145 min' '87 min' '92 min' '80 min' '117 min' '128 min' '119 min'
 '143 min' '114 min' '118 min' '108 min' '63 min' '121 min' '142 min'
 '154 min' '120 min' '82 min' '109 min' '101 min' '86 min' '229 min'
 '76 min' '89 min' '156 min' '112 min' '107 min' '129 min' '135 min'
 '136 min' '165 min' '150 min' '133 min' '70 min' '84 min' '140 min'
 '78 min' '7 Seasons' '64 min' '59 min' '139 min' '69 min' '148 min'
 '189 min' '141 min' '130 min' '138 min' '81 min' '132 min' '10 Seasons'
 '123 min' '65 min' '68 min' '66 min' '62 min' '74 min' '131 

In [20]:
# check that duration is expressed in minutes for movies and in number of seasons for TV shows
netflix_wip.groupby(['type', 'duration'])['duration'].count()

type     duration 
Movie    10 min         1
         100 min      108
         101 min      116
         102 min      122
         103 min      114
                     ... 
TV Show  5 Seasons     65
         6 Seasons     33
         7 Seasons     23
         8 Seasons     17
         9 Seasons      9
Name: duration, Length: 220, dtype: int64
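
The groupby output is truncated, so a mismatched row could hide in the elided part. A boolean check makes the verification explicit. This is a minimal sketch on toy data with one deliberately wrong row; the `consistent` name is an illustrative choice.

```python
import pandas as pd

# Toy data: the second row is a movie whose duration is given in seasons.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "duration": ["90 min", "2 Seasons", "1 Season", "3 Seasons"],
})

is_minutes = df["duration"].str.endswith("min", na=False)

# A row is consistent when movies are in minutes and TV shows are not.
consistent = is_minutes == (df["type"] == "Movie")
print(df[~consistent])
```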

In [25]:
# create two separate columns for movie durations and TV show season counts

import numpy as np

netflix_wip['duration (movies)'] = np.nan
netflix_wip.loc[netflix_wip['type'] == 'Movie', 'duration (movies)'] = netflix_wip.loc[netflix_wip['type'] == 'Movie', 'duration'].str.replace('min', '').astype(float)

netflix_wip['seasons (TV shows)'] = np.nan
netflix_wip.loc[netflix_wip['type'] == 'TV Show', 'seasons (TV shows)'] = netflix_wip.loc[netflix_wip['type'] == 'TV Show', 'duration'].str.replace('Seasons', '').str.replace('Season', '').astype(float)

netflix_wip.head()

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration (movies),seasons (TV shows)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0
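
The same split can be written without chained replacements by extracting the leading number once and masking it by type. A minimal sketch on toy data, offered as an alternative rather than as the notebook's method:

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "1 Season", None],
})

# Pull out the leading number once; missing durations stay NaN.
number = df["duration"].str.extract(r"(\d+)", expand=False).astype(float)

# where() keeps the value when the condition holds and puts NaN elsewhere.
df["duration (movies)"] = number.where(df["type"] == "Movie")
df["seasons (TV shows)"] = number.where(df["type"] == "TV Show")
print(df)
```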


In [26]:
# then drop the duration column, which is no longer needed

netflix_wip.drop(columns = 'duration', inplace = True)
netflix_wip.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8809 entries, s1 to s8809
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   type                8809 non-null   object        
 1   title               8809 non-null   object        
 2   director            6175 non-null   object        
 3   cast                7984 non-null   object        
 4   country             7978 non-null   object        
 5   date_added          8799 non-null   datetime64[ns]
 6   release_year        8809 non-null   int64         
 7   rating              8805 non-null   object        
 8   listed_in           8809 non-null   object        
 9   description         8809 non-null   object        
 10  duration (movies)   6129 non-null   float64       
 11  seasons (TV shows)  2677 non-null   float64       
dtypes: datetime64[ns](1), float64(2), int64(1), object(8)
memory usage: 1.1+ MB


In [30]:
netflix_wip['country'].unique()  # some entries are actually comma-separated lists of countries

array(['United States', 'South Africa', nan, 'India',
       'United States, Ghana, Burkina Faso, United Kingdom, Germany, Ethiopia',
       'United Kingdom', 'Germany, Czech Republic', 'Mexico', 'Turkey',
       'Australia', 'United States, India, France', 'Finland',
       'China, Canada, United States',
       'South Africa, United States, Japan', 'Nigeria', 'Japan',
       'Spain, United States', 'France', 'Belgium',
       'United Kingdom, United States', 'United States, United Kingdom',
       'France, United States', 'South Korea', 'Spain',
       'United States, Singapore', 'United Kingdom, Australia, France',
       'United Kingdom, Australia, France, United States',
       'United States, Canada', 'Germany, United States',
       'South Africa, United States', 'United States, Mexico',
       'United States, Italy, France, Japan',
       'United States, Italy, Romania, United Kingdom',
       'Australia, United States', 'Argentina, Venezuela',
       'United States, United Kin

In [33]:
# handle multi-valued columns: country
netflix_wip['countries'] = netflix_wip['country'].str.split(', ')

countries_exploded = netflix_wip.explode('countries')
countries_exploded = countries_exploded['countries']
countries_exploded.to_csv('countries_exploded.csv')
countries_exploded

show_id
s1       United States
s2        South Africa
s3                 NaN
s4                 NaN
s5               India
             ...      
s8805    United States
s8806    United States
s8807            India
s8808      South Korea
s8809    United States
Name: countries, Length: 10847, dtype: object
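
The exploded Series is longer than the DataFrame (10847 rows versus 8809) because each title contributes one row per country, with the show_id repeated. A minimal sketch of that behavior on three invented rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["United States, India", "France", None]},
    index=pd.Index(["s1", "s2", "s3"], name="show_id"),
)

# split() turns each string into a list; explode() gives each list item its own row.
exploded = df["country"].str.split(", ").explode()
print(exploded)
print(len(df), "->", len(exploded))  # 3 -> 4
```

Rows with a missing country are kept as a single NaN row, which is why NaN still appears in the output above.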

In [35]:
# handle multi-valued columns: listed_in (categories)
netflix_wip['listed_in_list'] = netflix_wip['listed_in'].str.split(', ')

categories_exploded = netflix_wip.explode('listed_in_list')
categories_exploded = categories_exploded['listed_in_list']
categories_exploded.to_csv('categories_exploded.csv')
categories_exploded.unique()  # each category now appears as a separate value

array(['Documentaries', 'International TV Shows', 'TV Dramas',
       'TV Mysteries', 'Crime TV Shows', 'TV Action & Adventure',
       'Docuseries', 'Reality TV', 'Romantic TV Shows', 'TV Comedies',
       'TV Horror', 'Children & Family Movies', 'Dramas',
       'Independent Movies', 'International Movies', 'British TV Shows',
       'Comedies', 'Spanish-Language TV Shows', 'Thrillers',
       'Romantic Movies', 'Music & Musicals', 'Horror Movies',
       'Sci-Fi & Fantasy', 'TV Thrillers', "Kids' TV",
       'Action & Adventure', 'TV Sci-Fi & Fantasy', 'Classic Movies',
       'Anime Features', 'Sports Movies', 'Anime Series',
       'Korean TV Shows', 'Science & Nature TV', 'Teen TV Shows',
       'Cult Movies', 'TV Shows', 'Faith & Spirituality', 'LGBTQ Movies',
       'Stand-Up Comedy', 'Movies', 'Stand-Up Comedy & Talk Shows',
       'Classic & Cult TV', 'Sci-fi', 'Horror', 'Action', 'Drama',
       'Romance', 'Thriller'], dtype=object)

In [37]:
# handle multi-valued columns: cast
netflix_wip['cast_list'] = netflix_wip['cast'].str.split(', ')
cast_list = netflix_wip.explode('cast_list')
cast_list = cast_list['cast_list']
cast_list.to_csv('cast_list.csv')
cast_list.unique()

array([nan, 'Ama Qamata', 'Khosi Ngema', ..., 'Petr Drozda', 'John Comer',
       'Benedetta Degli Innocenti'], shape=(36467,), dtype=object)
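
The three cells above repeat the same split-then-explode pattern for country, listed_in and cast. It could be factored into a small helper. `explode_column` is a hypothetical name introduced here for illustration, and the toy rows are invented.

```python
import pandas as pd

def explode_column(df: pd.DataFrame, column: str, sep: str = ", ") -> pd.Series:
    """Return one row per separated value of `column`, keeping the original index."""
    # strip() removes stray whitespace left around individual values.
    return df[column].str.split(sep).explode().str.strip()

toy = pd.DataFrame(
    {
        "cast": ["Ama Qamata, Khosi Ngema", None],
        "listed_in": ["TV Dramas, TV Mysteries", "Docuseries"],
    },
    index=pd.Index(["s2", "s4"], name="show_id"),
)

cast_values = explode_column(toy, "cast")
genre_values = explode_column(toy, "listed_in")
print(cast_values.tolist())
print(genre_values.tolist())
```

Unlike the cells above, this version does not add a temporary list column to the DataFrame, so there is nothing to drop afterwards.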

In [38]:
# drop the transformed columns, since their exploded versions have been saved separately
netflix_wip.drop(columns = ['countries', 'listed_in_list', 'cast_list'], inplace = True)
netflix_wip.head()

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),seasons (TV shows)
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0


In [40]:
# create time columns for the year, month and day of the week on which each title was added
netflix_wip['year added'] = netflix_wip['date_added'].dt.year
netflix_wip['month added'] = netflix_wip['date_added'].dt.month
netflix_wip['day of the week added'] = netflix_wip['date_added'].dt.dayofweek
netflix_wip.head()

show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),seasons (TV shows),year added,month added,day of the week added
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,,2021.0,9.0,5.0
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",,2.0,2021.0,9.0,4.0
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0,2021.0,9.0,4.0
s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",,1.0,2021.0,9.0,4.0
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,,2.0,2021.0,9.0,4.0
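
The new columns display as floats (2021.0, 9.0) because date_added contains missing values, and a plain integer column cannot hold them. If integer display is preferred, pandas' nullable "Int64" dtype is one option. A minimal sketch on invented dates:

```python
import pandas as pd

# Toy dates with one missing value, mirroring the nulls in date_added.
dates = pd.Series(pd.to_datetime(["2021-09-25", None, "2019-01-01"]))

# The missing date forces a float result; "Int64" (capital I) keeps integers
# and represents the missing value as <NA>.
year_int = dates.dt.year.astype("Int64")
print(year_int)
```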


<h1> Analyse des données </h1>

In [41]:
# How many "shows" are in this dataset?
len(netflix_wip)

8809

In [44]:
# What is the split between the 'Movie' and 'TV Show' types?
netflix_wip['type'].value_counts()

type
Movie      6132
TV Show    2677
Name: count, dtype: int64
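
To read the split as proportions rather than raw counts, value_counts accepts normalize=True. A minimal sketch on toy data (the four values below are invented and do not reflect the real ratio):

```python
import pandas as pd

types = pd.Series(["Movie", "Movie", "Movie", "TV Show"], name="type")  # toy data

# normalize=True returns fractions that sum to 1 instead of counts.
shares = types.value_counts(normalize=True)
print((shares * 100).round(1))
```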

In [45]:
# How are additions distributed by year?
netflix_wip['year added'].value_counts()

year added
2019.0    2016
2020.0    1879
2018.0    1649
2021.0    1498
2017.0    1188
2016.0     429
2015.0      82
2014.0      24
2011.0      13
2013.0      11
2012.0       3
2009.0       2
2008.0       2
2024.0       2
2010.0       1
Name: count, dtype: int64
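
value_counts sorts by frequency, which makes the trend over time hard to follow. Sorting the result by its index puts the years in chronological order. A minimal sketch on invented years:

```python
import pandas as pd

years = pd.Series([2019, 2021, 2019, 2020, 2019, 2021], name="year added")  # toy data

# sort_index() reorders the counts by year instead of by frequency.
by_year = years.value_counts().sort_index()
print(by_year)
```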

In [49]:
# What are the top 5 most frequently added show categories?
categories_exploded.value_counts().head()

listed_in_list
International Movies      2752
Dramas                    2427
Comedies                  1674
International TV Shows    1351
Documentaries              869
Name: count, dtype: int64

In [62]:
# Which 5 actors are featured most often in titles produced in the United States?
merge_cast = pd.merge(cast_list, netflix_wip, how = 'inner', left_index = True, right_index = True)
merge_cast[merge_cast['country'].str.contains('United States', na = False)]['cast_list'].value_counts().head()

cast_list
Samuel L. Jackson    22
Tara Strong          22
Fred Tatasciore      21
Adam Sandler         20
Nicolas Cage         19
Name: count, dtype: int64

In [63]:
# How are additions distributed by day of the week? (dayofweek: 0 = Monday, 6 = Sunday)
netflix_wip['day of the week added'].value_counts()

day of the week added
4.0    2500
3.0    1396
2.0    1288
1.0    1197
0.0     851
5.0     816
6.0     751
Name: count, dtype: int64
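
pandas numbers weekdays from 0 (Monday) to 6 (Sunday), so the most frequent value above, 4.0, is Friday. For a more readable result, dt.day_name() returns the names directly. A minimal sketch using two dates that appear in the data:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-09-24", "2021-09-25", "2021-09-24"]))

print(dates.dt.dayofweek.tolist())  # [4, 5, 4]

# day_name() gives English weekday names by default.
names = dates.dt.day_name()
print(names.value_counts())
```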

In [69]:
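# Which country produces the most documentaries?
# Note: value_counts() runs on the original 'country' string, so co-productions are counted under
# their combined string (e.g. 'United States, Mexico') rather than credited to each country in 'countries'.
merge_countries = pd.merge(countries_exploded, netflix_wip, how = 'inner', left_index = True, right_index = True)
merge_countries[merge_countries['listed_in'].str.contains('Documentaries', na=False)]['country'].value_counts().head()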

country
United States     411
United Kingdom     84
France             24
Canada             21
India              19
Name: count, dtype: int64
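
The figures above are grouped by the original country string. Counting on the exploded values instead credits each co-producing country once per title. A minimal sketch of the difference on three invented titles; it does not reproduce the real figures.

```python
import pandas as pd

titles = pd.DataFrame(
    {
        "country": ["United States", "United States, France", "France"],
        "listed_in": ["Documentaries", "Documentaries, Dramas", "Documentaries"],
    },
    index=pd.Index(["s1", "s2", "s3"], name="show_id"),
)

docs = titles[titles["listed_in"].str.contains("Documentaries", na=False)]

# Grouping by the raw string treats the co-production as its own "country".
by_string = docs["country"].value_counts()

# Exploding first credits the co-production to both countries.
by_country = docs["country"].str.split(", ").explode().value_counts()

print(by_string)
print(by_country)
```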

In [71]:
# On average, how many seasons do TV shows have?
netflix_wip['seasons (TV shows)'].mean()

np.float64(1.7646619350018677)

In [72]:
# How are movie durations distributed (quartiles)?
netflix_wip['duration (movies)'].describe()

count    6129.000000
mean       99.578887
std        28.288598
min         3.000000
25%        87.000000
50%        98.000000
75%       114.000000
max       312.000000
Name: duration (movies), dtype: float64
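
describe() reports the quartiles along with other statistics. When only the quartiles are needed, for instance to compute an interquartile range, quantile() returns them directly. A minimal sketch on five invented durations:

```python
import pandas as pd

durations = pd.Series([60.0, 80.0, 90.0, 100.0, 120.0], name="duration (movies)")  # toy data

# The result is indexed by the requested quantile levels.
quartiles = durations.quantile([0.25, 0.5, 0.75])
iqr = quartiles[0.75] - quartiles[0.25]

print(quartiles)
print("IQR:", iqr)
```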

In [77]:
# How many shows deal with drugs (the word "drug" appears in the description)?
drug_show = netflix_wip[netflix_wip['description'].str.contains('drug', na=False)]
print(len(drug_show))
drug_show.head(10)

158


show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration (movies),seasons (TV shows),year added,month added,day of the week added
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,,1.0,2021.0,9.0,4.0
s18,TV Show,Falsa identidad,,"Luis Ernesto Franco, Camila Sodi, Sergio Goyri...",Mexico,2021-09-22,2020,TV-MA,"Crime TV Shows, Spanish-Language TV Shows, TV ...",Strangers Diego and Isabel flee their home in ...,,2.0,2021.0,9.0,2.0
s37,Movie,The Stronghold,CÃ©dric Jimenez,"Gilles Lellouche, Karim Leklou, FranÃ§ois Civi...",,2021-09-17,2021,TV-MA,"Action & Adventure, Dramas, International Movies","Tired of the small-time grind, three Marseille...",105.0,,2021.0,9.0,4.0
s135,Movie,Clear and Present Danger,Phillip Noyce,"Harrison Ford, Willem Dafoe, Anne Archer, Joaq...","United States, Mexico",2021-09-01,1994,PG-13,"Action & Adventure, Dramas","When the president's friend is murdered, CIA D...",142.0,,2021.0,9.0,2.0
s151,Movie,In Too Deep,Michael Rymer,"Omar Epps, LL Cool J, Nia Long, Stanley Tucci,...",United States,2021-09-01,1999,R,Thrillers,Rookie cop Jeffrey Cole poses as a drug dealer...,97.0,,2021.0,9.0,2.0
s311,TV Show,Cocaine Cowboys: The Kings of Miami,Billy Corben,,United States,2021-08-04,2021,TV-MA,"Crime TV Shows, Docuseries",Two childhood friends go from high school drop...,,1.0,2021.0,8.0,2.0
s319,Movie,Shiny_Flakes: The Teenage Drug Lord,"Eva MÃ¼ller, Michael Schmitt",Maximilian Schmidt,,2021-08-03,2021,TV-MA,"Documentaries, International Movies",Max S. reveals how he built a drug empire from...,97.0,,2021.0,8.0,1.0
s347,Movie,Pineapple Express,David Gordon Green,"Seth Rogen, James Franco, Danny McBride, Kevin...",United States,2021-08-01,2008,R,"Action & Adventure, Comedies","After witnessing a murder, a perpetually stone...",112.0,,2021.0,8.0,6.0
s468,Movie,Private Network: Who Killed Manuel BuendÃ­a?,Manuel AlcalÃ¡,Daniel GimÃ©nez Cacho,,2021-07-14,2021,TV-MA,"Documentaries, International Movies",A deep dive into the work of renowned Mexican ...,100.0,,2021.0,7.0,2.0
s497,Movie,Brick Mansions,Camille Delamarre,"Paul Walker, David Belle, RZA, Gouchy Boy, Cat...","France, Canada, United States, Spain",2021-07-07,2014,PG-13,"Action & Adventure, International Movies",An undercover police detective partners with a...,90.0,,2021.0,7.0,2.0
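
The search above is a case-sensitive substring match: it misses a capitalized "Drug" and would also match longer words such as "drugstore". A stricter variant uses a case-insensitive regular expression with word boundaries. The sketch below runs on invented descriptions, so the count of 158 above is not recomputed here.

```python
import pandas as pd

descriptions = pd.Series([
    "A powerful drug lord threatens his family.",
    "Drug cartels fight for control.",
    "She works at a drugstore.",
    None,
])

# Plain substring search, as in the cell above.
case_sensitive = descriptions.str.contains("drug", na=False)

# \b marks word boundaries; "s?" also accepts the plural; case=False ignores capitalization.
whole_word = descriptions.str.contains(r"\bdrugs?\b", case=False, na=False)

print(case_sensitive.tolist())
print(whole_word.tolist())
```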
