# 🎬 Extracting movies from the TMDB database

## Objective
This notebook **retrieves movie data from The Movie Database (TMDB)** through its API.  
For each year considered, we collect **up to 100** of the top-rated films.  
Some years (for example in the 1950s) have fewer films in TMDB that pass the vote-count filter used in the code, so for those years we take **all available films**.

---

## Structure of the `df_movies` DataFrame

The final DataFrame, named **`df_movies`**, contains **one row per film** and the following columns:

| Column | Description |
|----------|--------------|
| `title` | Film title |
| `year` | Release year |
| `country` | List of production countries |
| `genres` | List of genres associated with the film |
| `director` | Name of the main director |
| `cast` | List of the 5 main actors |
| `overview` | Summary / description of the film |
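
As a rough illustration of this layout, the sketch below builds a one-row DataFrame by hand with the same columns. All values are hypothetical placeholders rather than TMDB data. It is only meant to show which columns hold lists and which hold plain strings. Note that, with the extraction code in this notebook, `year` ends up as a four-character string because it is sliced from the release date.

```python
import pandas as pd

# Hypothetical placeholder record; the values are illustrative, not TMDB data.
example_row = {
    "title": "Example Film",
    "year": "1950",  # a string, because it is sliced from the release date
    "country": ["Country A"],
    "genres": ["Drama", "Mystery"],
    "director": "Director Name",
    "cast": ["Actor 1", "Actor 2", "Actor 3", "Actor 4", "Actor 5"],
    "overview": "Short plot summary.",
}

# One dict per film gives one row per film.
df_example = pd.DataFrame([example_row])
print(df_example.dtypes)
print(len(df_example.loc[0, "cast"]))
```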




In [5]:
!pip install tmdbsimple



In [None]:
import numpy as np
import requests
import pandas as pd
import time 
import tmdbsimple as tmdb

In [10]:
API_KEY = '1d48b5e24b27cd111582c21dcff9b8f5'
BASE_URL = "https://api.themoviedb.org/3"

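# Send a GET request to the TMDB API and return the decoded JSON body
def get_json(url, params=None):
    params = dict(params or {})  # copy so the caller's dict is not mutated
    params['api_key'] = API_KEY
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return response.json()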

# Fetch detailed info (including credits) for one movie ID
def get_movie_details(movie_id):
    details = get_json(f"{BASE_URL}/movie/{movie_id}", {"append_to_response": "credits"})
    movie_credits = details.get("credits") or {}
    crew = movie_credits.get("crew") or []
    cast = movie_credits.get("cast") or []
    return {
        "title": details.get("title"),
        # release_date can be missing or None; fall back to "" before slicing
        "year": (details.get("release_date") or "")[:4],
        "country": [c["name"] for c in details.get("production_countries", [])],
        "genres": [g["name"] for g in details.get("genres", [])],
        # first crew member credited as "Director", or None if there is none
        "director": next((c["name"] for c in crew if c.get("job") == "Director"), None),
        "cast": [c["name"] for c in cast[:5]],  # main 5
        "overview": details.get("overview")
    }

# Collect up to `limit` top-rated movies for a given year
def get_top_movies_by_year(year, limit=100):
    movies = []
    page = 1
    while len(movies) < limit:
        data = get_json(f"{BASE_URL}/discover/movie", {
            "sort_by": "vote_average.desc",
            "vote_count.gte": 1000,  # filter out obscure ones
            "primary_release_year": year,
            "page": page
        })
        for m in data.get("results", []):
            movies.append(get_movie_details(m["id"]))
            if len(movies) >= limit:
                break
            time.sleep(0.2)  # be kind to API
        # stop once the last page has been read (sparse years have few pages)
        if page >= data.get("total_pages", 0):
            break
        page += 1
    return movies

# ---- MAIN ----
# Getting data for top 100 movies per year from 1950 to 2024
beginning_code_time = time.time()

if __name__ == "__main__":
    all_movies = {}
    for year in range(1950, 2025):
        start_time = time.time()
        print(f"Fetching top movies for {year}...")
        all_movies[year] = get_top_movies_by_year(year)
        print(f"Year {year} done in {time.time()-start_time} seconds...")
        time.sleep(1)
    print("Done!")
    print("Total running time:", time.time() - beginning_code_time, "seconds")

Fetching top movies for 1950...
Year 1950 done in 1.6374213695526123 seconds...
Fetching top movies for 1951...
Year 1951 done in 1.6855199337005615 seconds...
Fetching top movies for 1952...
Year 1952 done in 1.6187875270843506 seconds...
Fetching top movies for 1953...
Year 1953 done in 1.9478342533111572 seconds...
Fetching top movies for 1954...
Year 1954 done in 2.7760016918182373 seconds...
Fetching top movies for 1955...
Year 1955 done in 1.9610035419464111 seconds...
Fetching top movies for 1956...
Year 1956 done in 2.3192996978759766 seconds...
Fetching top movies for 1957...
Year 1957 done in 2.4108035564422607 seconds...
Fetching top movies for 1958...
Year 1958 done in 0.875154972076416 seconds...
Fetching top movies for 1959...
Year 1959 done in 2.7306549549102783 seconds...
Fetching top movies for 1960...
Year 1960 done in 2.3437018394470215 seconds...
Fetching top movies for 1961...
Year 1961 done in 1.9600460529327393 seconds...
Fetching top movies for 1962...
Year 1962
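
The loop in `get_top_movies_by_year` stops either when `limit` films have been collected or when the last results page has been read, which is why sparse years return fewer than 100 films. A minimal sketch of that stopping logic follows, run against a fake in-memory page source instead of the real API. The names `collect_up_to` and `make_fake_fetch_page` are hypothetical and exist only for this illustration, and the page size of 20 is an assumption about how results are paged.

```python
def collect_up_to(fetch_page, limit=100):
    """Collect items page by page until `limit` is reached or pages run out."""
    items = []
    page = 1
    while len(items) < limit:
        data = fetch_page(page)
        for item in data["results"]:
            items.append(item)
            if len(items) >= limit:
                break
        if page >= data["total_pages"]:
            break
        page += 1
    return items


def make_fake_fetch_page(total_items, page_size=20):
    """Build a stand-in for the API that serves `total_items` dummy ids."""
    total_pages = max(1, -(-total_items // page_size))  # ceiling division

    def fake_fetch_page(page):
        start = (page - 1) * page_size
        stop = min(start + page_size, total_items)
        return {"results": list(range(start, stop)), "total_pages": total_pages}

    return fake_fetch_page


# A sparse year: only 7 films pass the filter, so all 7 are returned.
sparse = collect_up_to(make_fake_fetch_page(7), limit=100)
# A dense year: 250 films pass the filter, so collection stops at 100.
dense = collect_up_to(make_fake_fetch_page(250), limit=100)
print(len(sparse), len(dense))
```

Passing the page source in as a function keeps the stopping rule testable without network access or an API key.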

In [11]:
# Convert into a DataFrame 
flat_data = []
for year, movies in all_movies.items():
    for movie in movies:
        flat_data.append(movie)

df_movies = pd.DataFrame(flat_data)
df_movies

Unnamed: 0,title,year,country,genres,director,cast,overview
0,Sunset Boulevard,1950,[United States of America],[Drama],Billy Wilder,"[William Holden, Gloria Swanson, Erich von Str...",A hack screenwriter writes a screenplay for a ...
1,All About Eve,1950,[United States of America],[Drama],Joseph L. Mankiewicz,"[Bette Davis, Anne Baxter, George Sanders, Cel...",From the moment she glimpses her idol at the s...
2,Rashomon,1950,[Japan],"[Crime, Drama, Mystery]",Akira Kurosawa,"[Toshirō Mifune, Machiko Kyō, Takashi Shimura,...",Four people recount different versions of the ...
3,Cinderella,1950,[United States of America],"[Family, Fantasy, Animation, Romance]",Clyde Geronimi,"[Ilene Woods, Eleanor Audley, Verna Felton, Cl...",Cinderella has faith her dreams of a better li...
4,Strangers on a Train,1951,[United States of America],"[Crime, Thriller]",Alfred Hitchcock,"[Farley Granger, Ruth Roman, Robert Walker, Le...",A charming psychopath tries to coerce a tennis...
...,...,...,...,...,...,...,...
3510,The Crow,2024,"[France, Germany, United Kingdom, United State...","[Action, Fantasy, Horror]",Rupert Sanders,"[Bill Skarsgård, FKA twigs, Danny Huston, Jose...",Soulmates Eric and Shelly are brutally murdere...
3511,Borderlands,2024,"[United States of America, Luxembourg]","[Action, Science Fiction, Comedy]",Eli Roth,"[Cate Blanchett, Kevin Hart, Edgar Ramírez, Ja...","Returning to her home planet, an infamous boun..."
3512,The Platform 2,2024,[Spain],"[Science Fiction, Horror, Thriller]",Galder Gaztelu-Urrutia,"[Milena Smit, Hovik Keuchkerian, Natalia Tena,...",After a mysterious leader imposes his law in a...
3513,Joker: Folie à Deux,2024,[United States of America],"[Drama, Crime, Thriller]",Todd Phillips,"[Joaquin Phoenix, Lady Gaga, Brendan Gleeson, ...","While struggling with his dual identity, Arthu..."
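
Since some years yield fewer than 100 films, it can be useful to check how many rows each year contributed. The sketch below shows one way to do that with `groupby`, using a hand-made stand-in for `df_movies`. The toy rows and the small `limit` value are placeholders chosen for illustration, not TMDB data.

```python
import pandas as pd

# Toy stand-in for df_movies; titles and years are placeholders.
df_toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "year": ["1950", "1950", "1951", "1951", "1951"],
})

# Number of films collected per year, sorted chronologically.
films_per_year = df_toy.groupby("year").size().sort_index()
print(films_per_year)

# Years that fell short of the requested limit (3 in this toy case, 100 in the notebook).
limit = 3
short_years = films_per_year[films_per_year < limit]
print(short_years.index.tolist())
```

On the real data, the same two lines applied to `df_movies` with `limit = 100` would list the years for which all available films were taken.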


In [17]:
df_movies.loc[df_movies["title"]=='Life of Pi']

Unnamed: 0,title,year,country,genres,director,cast,overview
2243,Life of Pi,2012,"[India, Taiwan, United Kingdom, United States ...","[Adventure, Drama]",Ang Lee,"[Suraj Sharma, Irrfan Khan, Ayush Tandon, Gaut...","The story of an Indian boy named Pi, a zookeep..."


In [18]:
# Save as a csv file
df_movies.to_csv("tmdb_movies_dataset.csv", index=False)
print("CSV saved in project folder!")

CSV saved in project folder!
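
One caveat when reloading this file: CSV has no list type, so the `country`, `genres` and `cast` columns come back as plain strings that merely look like lists. A possible way to restore them, assuming the lists only contain plain strings, is to parse each cell with `ast.literal_eval` through the `converters` argument of `pd.read_csv`. The sketch below uses an in-memory buffer and a placeholder row rather than the real file.

```python
import ast
import io

import pandas as pd

# Placeholder row with two list-valued columns (not TMDB data).
df_small = pd.DataFrame([{
    "title": "Example Film",
    "genres": ["Drama", "Mystery"],
    "cast": ["Actor 1", "Actor 2"],
}])

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df_small.to_csv(buffer, index=False)
buffer.seek(0)

# Parse the list-like strings back into Python lists while reading.
list_columns = ["genres", "cast"]
df_back = pd.read_csv(buffer, converters={col: ast.literal_eval for col in list_columns})
print(type(df_back.loc[0, "genres"]))
```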
