# Cleaning the data retrieved through the TMDB API

In this part we clean and clarify the data retrieved through the TMDB API. Some of the variables are of no interest to us, and others are formatted in a way that is not practical to use as is.
We proceed in two steps, each fairly quick because the retrieved data is already reasonably well formatted.

1. **Remove the unnecessary variables**
2. **Clarify the variables whose format is too complex**

## Step 1: Removing the variables that are not useful for our project

We start by keeping only the variables that interest us. To do so, we look at the column names to understand what each one represents and decide which ones to keep. This step raises no particular difficulty.

In [2]:
import pandas as pd
import re
import os
import ast

In [3]:
base = pd.read_csv('../API/MoviesPopDir.csv')
print(base.columns)

Index(['Unnamed: 0', 'adult', 'backdrop_path', 'belongs_to_collection',
       'budget', 'genres', 'homepage', 'id', 'imdb_id', 'original_language',
       'original_title', 'overview', 'popularity', 'poster_path',
       'production_companies', 'production_countries', 'release_date',
       'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title',
       'video', 'vote_average', 'vote_count', 'directors'],
      dtype='object')


In [4]:
base = base.drop(['Unnamed: 0','backdrop_path','homepage', 'poster_path', 'video', 'status'], axis=1)
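
As a small illustration of this step, the sketch below applies the same kind of `drop` call to a toy DataFrame. The toy data and the `unwanted` list are assumptions made for the example, not part of the real dataset. Passing `errors="ignore"` skips labels that are absent, which may help if the CSV is regenerated with a slightly different set of columns.

```python
import pandas as pd

# Toy frame standing in for the TMDB export (values are made up for illustration).
toy = pd.DataFrame({
    "title": ["Leo", "Wonka"],
    "homepage": ["", ""],
    "video": [False, False],
    "budget": [0, 125000000],
})

# Columns we do not need; 'poster_path' is deliberately absent from the toy frame.
unwanted = ["homepage", "video", "poster_path"]

# errors="ignore" skips labels that are not present instead of raising KeyError.
trimmed = toy.drop(columns=unwanted, errors="ignore")
print(list(trimmed.columns))
```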

In [5]:
base

Unnamed: 0,adult,belongs_to_collection,budget,genres,id,imdb_id,original_language,original_title,overview,popularity,...,production_countries,release_date,revenue,runtime,spoken_languages,tagline,title,vote_average,vote_count,directors
0,False,,40000000,"[{'id': 28, 'name': 'Action'}, {'id': 35, 'nam...",897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2023-10-05,8000000.0,108.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel']
1,False,,200000000,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2023-10-18,155500000.0,206.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese']
2,False,"{'id': 489724, 'name': 'The Trolls Collection'...",95000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2023-10-12,173800000.0,92.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn']
3,False,,0,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,...,"[{'iso_3166_1': 'AU', 'name': 'Australia'}, {'...",2023-11-17,0.0,102.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ..."
4,False,,125000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,...,"[{'iso_3166_1': 'GB', 'name': 'United Kingdom'...",2023-12-06,43200000.0,117.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King']
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,"{'id': 124901, 'name': 'Hatchet Collection', '...",0,"[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'nam...",472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2017-09-12,0.0,83.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green']
12013,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2020-08-14,0.0,113.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']"
12014,False,"{'id': 8917, 'name': 'Hellraiser Collection', ...",4000000,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,...,"[{'iso_3166_1': 'RO', 'name': 'Romania'}, {'is...",2005-06-07,0.0,88.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota']
12015,False,,45000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,...,"[{'iso_3166_1': 'US', 'name': 'United States o...",2004-12-25,48600000.0,93.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick']


## Step 2: Reformatting some variables

We can now see that some variables (belongs_to_collection, for example) are formatted in a slightly complex way: they are strings that look like dictionaries. For these variables, only part of the string is of interest, generally what follows the 'name' key. Because the values are strings rather than actual dictionaries, dictionary methods cannot be applied to them directly, so we chose to extract that part with regular expressions (the re module).

For each variable we took care to handle the case where the value is NaN, even though 'belongs_to_collection' was the only one that contained any.

In [6]:
type(base['belongs_to_collection'][2])

str

In [6]:
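def extract_name(x):
    # Missing values (NaN) are floats, not strings, so they fall through to None.
    if isinstance(x, str):
        # Raw string so that \s is read as a regex escape, not a string escape.
        match = re.search(r"'name':\s*'([^']+)'", x)
        return match.group(1) if match else None
    return None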

base['collection_name'] = base['belongs_to_collection'].apply(extract_name)
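
As a point of comparison, here is a minimal sketch of another way to read these values, assuming the strings are valid Python literals (they look like the `repr` of a dict). `ast.literal_eval` turns such a string back into a real dictionary, after which ordinary dictionary access works. The helper name `collection_name` and the sample values are hypothetical; the second sample shows a name containing an apostrophe, a case the `[^']+` pattern is not designed for.

```python
import ast
import pandas as pd

def collection_name(cell):
    """Return the 'name' field of a stringified dict, or None for missing values."""
    if not isinstance(cell, str):
        return None  # NaN and other non-string values
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None  # the string is not a valid Python literal
    return parsed.get("name") if isinstance(parsed, dict) else None

# Simplified, illustrative cells (the real ones contain more keys).
samples = pd.Series([
    "{'id': 489724, 'name': 'The Trolls Collection'}",
    str({"id": 1, "name": "Ocean's Collection"}),  # hypothetical name with an apostrophe
    float("nan"),
])
names = samples.apply(collection_name)
print(names)
```

Whether this is preferable to the regular expression depends on the data: it is stricter about the input being well formed, but it does not depend on how the quotes inside the string are laid out.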


In [7]:
base = base.drop(['belongs_to_collection'], axis=1)
base

Unnamed: 0,adult,budget,genres,id,imdb_id,original_language,original_title,overview,popularity,production_companies,...,release_date,revenue,runtime,spoken_languages,tagline,title,vote_average,vote_count,directors,collection_name
0,False,40000000,"[{'id': 28, 'name': 'Action'}, {'id': 35, 'nam...",897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,"[{'id': 89171, 'logo_path': '/c3ttVfx0itQzk2vO...",...,2023-10-05,8000000.0,108.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel'],
1,False,200000000,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,"[{'id': 194232, 'logo_path': '/oE7H93u8sy5vvW5...",...,2023-10-18,155500000.0,206.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese'],
2,False,95000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,"[{'id': 521, 'logo_path': '/kP7t6RwGz2AvvTkvnI...",...,2023-10-12,173800000.0,92.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn'],The Trolls Collection
3,False,0,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,"[{'id': 878, 'logo_path': '/e2AZdsQdkhN0qJhoN4...",...,2023-11-17,0.0,102.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ...",
4,False,125000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,"[{'id': 174, 'logo_path': '/IuAlhI9eVC9Z8UQWOI...",...,2023-12-06,43200000.0,117.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King'],
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,0,"[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'nam...",472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,"[{'id': 3960, 'logo_path': None, 'name': 'Arie...",...,2017-09-12,0.0,83.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green'],Hatchet Collection
12013,False,0,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,"[{'id': 102118, 'logo_path': None, 'name': 'Sc...",...,2020-08-14,0.0,113.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']",
12014,False,4000000,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,"[{'id': 7405, 'logo_path': '/rfnws0uY8rsNAsrLb...",...,2005-06-07,0.0,88.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota'],Hellraiser Collection
12015,False,45000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,"[{'id': 89719, 'logo_path': None, 'name': 'Cul...",...,2004-12-25,48600000.0,93.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick'],


For the other variables, the strings are slightly more complex, since they can contain several pairs of braces in a row. We therefore adapted the function slightly before applying it to the column.

In [8]:
def extract_genres(x):
    # Capture the 'name' value of each {...} entry.
    # A NaN becomes the string 'nan', which has no match and yields an empty list.
    pattern = r"\{.*?'name':\s*'([^']+)'"
    return [match.group(1) for match in re.finditer(pattern, str(x))]

base['genres_list'] = base['genres'].apply(extract_genres)
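
Since the same extraction is needed for several columns with different keys, one possible variation is a single helper that takes the key as a parameter. The sketch below assumes the cells are valid Python literals and uses `ast.literal_eval` instead of a regular expression; the name `extract_field` and the sample strings are hypothetical and simplified.

```python
import ast

def extract_field(cell, key="name"):
    """Collect `key` from every dict in a stringified list of dicts."""
    if not isinstance(cell, str):
        return []  # NaN and other non-string values
    try:
        items = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []  # the string is not a valid Python literal
    if not isinstance(items, list):
        return []
    return [item[key] for item in items if isinstance(item, dict) and key in item]

# Simplified, illustrative cells.
genres = "[{'id': 28, 'name': 'Action'}, {'id': 35, 'name': 'Comedy'}]"
languages = "[{'english_name': 'English', 'iso_639_1': 'en'}]"

print(extract_field(genres))
print(extract_field(languages, key="english_name"))
```

With such a helper, a column could be processed with something like `base['genres'].apply(extract_field)`, and the language column with `key="english_name"` passed through `apply`'s keyword arguments.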

In [9]:
base['countries_prod'] = base['production_countries'].apply(extract_genres)
base

Unnamed: 0,adult,budget,genres,id,imdb_id,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,tagline,title,vote_average,vote_count,directors,collection_name,genres_list,countries_prod
0,False,40000000,"[{'id': 28, 'name': 'Action'}, {'id': 35, 'nam...",897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,"[{'id': 89171, 'logo_path': '/c3ttVfx0itQzk2vO...",...,108.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel'],,"[Action, Comedy]",[United States of America]
1,False,200000000,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,"[{'id': 194232, 'logo_path': '/oE7H93u8sy5vvW5...",...,206.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese'],,"[Crime, Drama, History]",[United States of America]
2,False,95000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,"[{'id': 521, 'logo_path': '/kP7t6RwGz2AvvTkvnI...",...,92.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn'],The Trolls Collection,"[Animation, Family, Music, Fantasy, Comedy]",[United States of America]
3,False,0,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,"[{'id': 878, 'logo_path': '/e2AZdsQdkhN0qJhoN4...",...,102.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ...",,"[Animation, Comedy, Family]","[Australia, United States of America]"
4,False,125000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,"[{'id': 174, 'logo_path': '/IuAlhI9eVC9Z8UQWOI...",...,117.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King'],,"[Comedy, Family, Fantasy]","[United Kingdom, United States of America]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,0,"[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'nam...",472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,"[{'id': 3960, 'logo_path': None, 'name': 'Arie...",...,83.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green'],Hatchet Collection,"[Comedy, Horror]",[United States of America]
12013,False,0,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,"[{'id': 102118, 'logo_path': None, 'name': 'Sc...",...,113.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']",,"[Action, Crime, Science Fiction]",[United States of America]
12014,False,4000000,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,"[{'id': 7405, 'logo_path': '/rfnws0uY8rsNAsrLb...",...,88.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota'],Hellraiser Collection,"[Horror, Mystery, Thriller]","[Romania, United States of America]"
12015,False,45000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,"[{'id': 89719, 'logo_path': None, 'name': 'Cul...",...,93.0,"[{'english_name': 'English', 'iso_639_1': 'en'...",Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick'],,"[Comedy, Family, Fantasy]",[United States of America]


In [10]:
def extract_languages(x):
    # Same idea as extract_genres, but the key of interest is 'english_name'.
    pattern = r"\{.*?'english_name':\s*'([^']+)'"
    return [match.group(1) for match in re.finditer(pattern, str(x))]

base['languages_list'] = base['spoken_languages'].apply(extract_languages)
base

Unnamed: 0,adult,budget,genres,id,imdb_id,original_language,original_title,overview,popularity,production_companies,...,spoken_languages,tagline,title,vote_average,vote_count,directors,collection_name,genres_list,countries_prod,languages_list
0,False,40000000,"[{'id': 28, 'name': 'Action'}, {'id': 35, 'nam...",897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,"[{'id': 89171, 'logo_path': '/c3ttVfx0itQzk2vO...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel'],,"[Action, Comedy]",[United States of America],"[English, Spanish]"
1,False,200000000,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,"[{'id': 194232, 'logo_path': '/oE7H93u8sy5vvW5...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese'],,"[Crime, Drama, History]",[United States of America],"[English, French, Latin]"
2,False,95000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,"[{'id': 521, 'logo_path': '/kP7t6RwGz2AvvTkvnI...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn'],The Trolls Collection,"[Animation, Family, Music, Fantasy, Comedy]",[United States of America],"[English, Lithuanian]"
3,False,0,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,"[{'id': 878, 'logo_path': '/e2AZdsQdkhN0qJhoN4...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ...",,"[Animation, Comedy, Family]","[Australia, United States of America]",[English]
4,False,125000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,"[{'id': 174, 'logo_path': '/IuAlhI9eVC9Z8UQWOI...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King'],,"[Comedy, Family, Fantasy]","[United Kingdom, United States of America]",[English]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,0,"[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'nam...",472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,"[{'id': 3960, 'logo_path': None, 'name': 'Arie...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green'],Hatchet Collection,"[Comedy, Horror]",[United States of America],[English]
12013,False,0,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,"[{'id': 102118, 'logo_path': None, 'name': 'Sc...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']",,"[Action, Crime, Science Fiction]",[United States of America],"[English, Hindi, Portuguese]"
12014,False,4000000,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,"[{'id': 7405, 'logo_path': '/rfnws0uY8rsNAsrLb...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota'],Hellraiser Collection,"[Horror, Mystery, Thriller]","[Romania, United States of America]",[English]
12015,False,45000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,"[{'id': 89719, 'logo_path': None, 'name': 'Cul...",...,"[{'english_name': 'English', 'iso_639_1': 'en'...",Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick'],,"[Comedy, Family, Fantasy]",[United States of America],[English]


In [11]:
def extract_production_companies(x):
    # Each match pairs a company name with the origin country that follows it.
    pattern = (
        r"'name':\s*'(?P<name>[^']+).*?"
        r"'origin_country':\s*'(?P<origin_country>[^']+)'"
    )
    return [
        (match.group('name'), match.group('origin_country'))
        for match in re.finditer(pattern, str(x))
    ]

base['prod_companies'] = base['production_companies'].apply(extract_production_companies)
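
One case that may be worth checking with this pattern is a company whose `origin_country` is an empty string: because the pattern requires at least one character for the country, the lazy `.*?` can run on to the next company's country. The sketch below compares the pattern with a parse based on `ast.literal_eval` on a hypothetical cell; the helper `company_pairs` and the sample data are assumptions made for illustration.

```python
import ast
import re

def company_pairs(cell):
    """Return (name, origin_country) for each company in a stringified list of dicts."""
    if not isinstance(cell, str):
        return []
    try:
        items = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(items, list):
        return []
    return [(c.get("name"), c.get("origin_country")) for c in items if isinstance(c, dict)]

# Hypothetical cell: the first company has an empty origin_country.
cell = ("[{'id': 1, 'name': 'Studio A', 'origin_country': ''}, "
        "{'id': 2, 'name': 'Studio B', 'origin_country': 'US'}]")

# Same pattern as in the notebook cell, for comparison.
pattern = r"'name':\s*'(?P<name>[^']+).*?'origin_country':\s*'(?P<origin_country>[^']+)'"
regex_pairs = [(m.group("name"), m.group("origin_country")) for m in re.finditer(pattern, cell)]
parsed_pairs = company_pairs(cell)

print(regex_pairs)
print(parsed_pairs)
```

If the two printed lists differ on real rows, those rows are the ones to inspect before relying on the `prod_companies` column.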
base

Unnamed: 0,adult,budget,genres,id,imdb_id,original_language,original_title,overview,popularity,production_companies,...,tagline,title,vote_average,vote_count,directors,collection_name,genres_list,countries_prod,languages_list,prod_companies
0,False,40000000,"[{'id': 28, 'name': 'Action'}, {'id': 35, 'nam...",897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,"[{'id': 89171, 'logo_path': '/c3ttVfx0itQzk2vO...",...,Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel'],,"[Action, Comedy]",[United States of America],"[English, Spanish]","[(Endurance Media, US), (AGC Studios, US), (Se..."
1,False,200000000,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,"[{'id': 194232, 'logo_path': '/oE7H93u8sy5vvW5...",...,Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese'],,"[Crime, Drama, History]",[United States of America],"[English, French, Latin]","[(Apple Studios, US), (Imperative Entertainmen..."
2,False,95000000,"[{'id': 16, 'name': 'Animation'}, {'id': 10751...",901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,"[{'id': 521, 'logo_path': '/kP7t6RwGz2AvvTkvnI...",...,There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn'],The Trolls Collection,"[Animation, Family, Music, Fantasy, Comedy]",[United States of America],"[English, Lithuanian]","[(DreamWorks Animation, US)]"
3,False,0,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,"[{'id': 878, 'logo_path': '/e2AZdsQdkhN0qJhoN4...",...,Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ...",,"[Animation, Comedy, Family]","[Australia, United States of America]",[English],"[(Happy Madison Productions, US), (Animal Logi..."
4,False,125000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,"[{'id': 174, 'logo_path': '/IuAlhI9eVC9Z8UQWOI...",...,Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King'],,"[Comedy, Family, Fantasy]","[United Kingdom, United States of America]",[English],"[(Warner Bros. Pictures, US), (Village Roadsho..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,0,"[{'id': 35, 'name': 'Comedy'}, {'id': 27, 'nam...",472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,"[{'id': 3960, 'logo_path': None, 'name': 'Arie...",...,Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green'],Hatchet Collection,"[Comedy, Horror]",[United States of America],[English],"[(ArieScope Pictures, US)]"
12013,False,0,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,"[{'id': 102118, 'logo_path': None, 'name': 'Sc...",...,What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']",,"[Action, Crime, Science Fiction]",[United States of America],"[English, Hindi, Portuguese]","[(Screen Arcade, US), (Supermarché, US)]"
12014,False,4000000,"[{'id': 27, 'name': 'Horror'}, {'id': 9648, 'n...",17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,"[{'id': 7405, 'logo_path': '/rfnws0uY8rsNAsrLb...",...,The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota'],Hellraiser Collection,"[Horror, Mystery, Thriller]","[Romania, United States of America]",[English],"[(Dimension Films, US), (Stan Winston Producti..."
12015,False,45000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10751, '...",15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,"[{'id': 89719, 'logo_path': None, 'name': 'Cul...",...,Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick'],,"[Comedy, Family, Fantasy]",[United States of America],[English],"[(Culver Studios, US), (20th Century Fox, US)]"


In [12]:
base = base.drop(['genres', 'production_countries', 'production_companies', 'spoken_languages'], axis = 1)
base

Unnamed: 0,adult,budget,id,imdb_id,original_language,original_title,overview,popularity,release_date,revenue,...,tagline,title,vote_average,vote_count,directors,collection_name,genres_list,countries_prod,languages_list,prod_companies
0,False,40000000,897087,tt15744298,en,Freelance,An ex-special forces operative takes a job to ...,2367.027,2023-10-05,8000000.0,...,Retirement didn't suit him.,Freelance,6.419,192.0,['Pierre Morel'],,"[Action, Comedy]",[United States of America],"[English, Spanish]","[(Endurance Media, US), (AGC Studios, US), (Se..."
1,False,200000000,466420,tt5537002,en,Killers of the Flower Moon,When oil is discovered in 1920s Oklahoma under...,1806.389,2023-10-18,155500000.0,...,Greed is an animal that hungers for blood.,Killers of the Flower Moon,7.711,1258.0,['Martin Scorsese'],,"[Crime, Drama, History]",[United States of America],"[English, French, Latin]","[(Apple Studios, US), (Imperative Entertainmen..."
2,False,95000000,901362,tt14362112,en,Trolls Band Together,"When Branch's brother, Floyd, is kidnapped for...",1560.713,2023-10-12,173800000.0,...,There are some new trolls on the block.,Trolls Band Together,7.204,324.0,['Walt Dohrn'],The Trolls Collection,"[Animation, Family, Music, Fantasy, Comedy]",[United States of America],"[English, Lithuanian]","[(DreamWorks Animation, US)]"
3,False,0,1075794,tt5755238,en,Leo,Jaded 74-year-old lizard Leo has been stuck in...,1303.89,2023-11-17,0.0,...,Breaking out this November.,Leo,7.560,497.0,"['Robert Smigel', 'Robert Marianetti', 'David ...",,"[Animation, Comedy, Family]","[Australia, United States of America]",[English],"[(Happy Madison Productions, US), (Animal Logi..."
4,False,125000000,787699,tt6166392,en,Wonka,Willy Wonka – chock-full of ideas and determin...,1256.256,2023-12-06,43200000.0,...,Every good thing in this world started with a ...,Wonka,7.000,62.0,['Paul King'],,"[Comedy, Family, Fantasy]","[United Kingdom, United States of America]",[English],"[(Warner Bros. Pictures, US), (Village Roadsho..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,False,0,472338,tt5534434,en,Victor Crowley,"Ten years ago, over forty people were brutally...",30.668,2017-09-12,0.0,...,Return to his Swamp,Victor Crowley,5.700,235.0,['Adam Green'],Hatchet Collection,"[Comedy, Horror]",[United States of America],[English],"[(ArieScope Pictures, US)]"
12013,False,0,605116,tt7550000,en,Project Power,"An ex-soldier, a teen and a cop collide in New...",30.55,2020-08-14,0.0,...,What would you risk for five minutes of pure p...,Project Power,6.474,2666.0,"['Henry Joost', 'Ariel Schulman']",,"[Action, Crime, Science Fiction]",[United States of America],"[English, Hindi, Portuguese]","[(Screen Arcade, US), (Supermarché, US)]"
12014,False,4000000,17455,tt0337636,en,Hellraiser: Deader,"In London, after investigating crack addicted ...",30.664,2005-06-07,0.0,...,The Latest. Most Terrifying Evil.,Hellraiser: Deader,4.600,292.0,['Rick Bota'],Hellraiser Collection,"[Horror, Mystery, Thriller]","[Romania, United States of America]",[English],"[(Dimension Films, US), (Stan Winston Producti..."
12015,False,45000000,15045,tt0396592,en,Fat Albert,Animated character Fat Albert emerges from his...,30.536,2004-12-25,48600000.0,...,Hey! Hey! Hey!,Fat Albert,5.100,242.0,['Joel Zwick'],,"[Comedy, Family, Fantasy]",[United States of America],[English],"[(Culver Studios, US), (20th Century Fox, US)]"


We now have a table containing only the variables of interest, with values stored as strings, numbers, or lists, which makes them easy to work with. We export it to .csv so that it can later be merged with the scraped awards table, using the 'title' variable, which corresponds to the 'English_title' variable in the awards table.
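
The later merge could look roughly like the sketch below. Only the column names `title` and `English_title` come from the description above; the two toy tables, the `award` column, and the choice of a left join are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the cleaned movies table.
movies = pd.DataFrame({"title": ["Wonka", "Leo"], "budget": [125000000, 0]})

# Hypothetical awards table: only the 'English_title' column name is taken from the text.
awards = pd.DataFrame({"English_title": ["Wonka"], "award": ["Example Award"]})

# A left join keeps every movie, with NaN in the award columns when there is no match.
merged = movies.merge(awards, how="left", left_on="title", right_on="English_title")
print(merged)
```

Matching on titles is sensitive to spelling and to films that share a title, so checking the number of unmatched and duplicated rows after the merge is a reasonable precaution.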

In [14]:
current_directory = os.getcwd()

# Append the file name
file_name = 'Movies.csv'
file_path = os.path.join(current_directory, file_name)

base.to_csv(file_path, index = False)
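
One thing to keep in mind when reloading this file: a CSV stores every cell as text, so the list columns come back as strings such as `"['Action', 'Comedy']"`. A minimal sketch of how they could be restored on reading, assuming the stored text is a valid Python literal, is shown below. It uses an in-memory buffer and a toy table instead of the real `Movies.csv`; the helper `parse_list` is hypothetical.

```python
import ast
import io
import pandas as pd

def parse_list(cell):
    # With a converter, empty cells arrive as empty strings.
    return ast.literal_eval(cell) if cell else []

toy = pd.DataFrame({
    "title": ["Leo", "Wonka"],
    "genres_list": [["Animation", "Comedy"], ["Comedy"]],
})

# Write to an in-memory buffer rather than to disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Plain read: the list column is now a column of strings.
buffer.seek(0)
raw = pd.read_csv(buffer)

# Read with a converter: the strings are parsed back into lists.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"genres_list": parse_list})

print(type(raw["genres_list"][0]), type(parsed["genres_list"][0]))
```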