**SAAD, Maissa, 21517105.**

**AKZOUN, Hafsa, 21511721.**

**Which LLM(s) did you use for this work?**

ChatGPT and Claude AI

# PROGRES 2025 - Mini-Projet 2
# API Web

Fabien Mathieu - fabien.mathieu@lip6.fr

Sébastien Tixeuil - Sebastien.Tixeuil@lip6.fr

The purpose of this mini-project is to work with the *Internet Movie DataBase* (IMDB) and a Python Web framework. It will involve:

- Retrieve and manipulate datasets
- Build an API to perform various tasks on the data
- Build a website that will use the API above

# Rules

1. Cite your sources
2. One file to rule them all
3. Explain
4. Execute your code


https://github.com/balouf/progres/blob/main/rules.ipynb

# The IMDB dataset

[IMDB](https://www.imdb.com) allows part of its dataset to be retrieved for any non-commercial purpose. The available data and the formatting conventions are described here: https://developer.imdb.com/non-commercial-datasets/

We are especially interested in the data from the following files:
- https://datasets.imdbws.com/title.principals.tsv.gz
- https://datasets.imdbws.com/name.basics.tsv.gz
- https://datasets.imdbws.com/title.basics.tsv.gz

**Important notes**:
- If you see *Your answer here*, that means something is expected from you.
- To help you, the start and/or the end of a possible solution is sometimes given.
- The content of IMDB is refreshed regularly. That means that some of the results you will compute, like the number of movies, will vary with time. This should not surprise you.

## Exercise 1: Download

Write a `download_imdb` function inspired by the `download` function seen in course, with the following modifications:
- `download_imdb` will have one single argument, the name of the file to retrieve. Server location of the file is assumed to be https://datasets.imdbws.com/
- If the file already exists, print a message telling that it exists and do nothing. You can use the `pathlib` module for that.
 The data files are quite big, so you will specify a directory `data_dir` where the data files will be stored/read.

Prompt:
>I would like a Python function called download_imdb which takes the name of a file that it should download and file from IMDB  dataset server at https://datasets.imdbws.com/. The function should accept an argument (the file name) and save the file into a folder called data_dir and use the pathlib module to check if that file already exists. If the file is present, then the function should print a message and return, otherwise it should download the file with requests. Session. Finally, use this function to download the files title.principals.tsv.gz, name.basics.tsv.gz, and title.basics.tsv.gz.


In [1]:
from pathlib import Path
from requests import Session

# Base URL where IMDB datasets are hosted
base_url = "https://datasets.imdbws.com/"

# Directory where the files will be stored
data_dir = Path.home() / "Downloads"

def download_imdb(file):
    """
    Download a file from the IMDB dataset website if it does not already exist.
    """
    # Ensure the data directory exists
    data_dir.mkdir(parents=True, exist_ok=True)

    # Full path of the file to be downloaded
    file_path = data_dir / file

    # Check if the file already exists
    if file_path.exists():
        print(f"{file} already exists.")
        return

    # Create a session to download the file
    with Session() as session:
        url = base_url + file
        response = session.get(url, stream=True)
        response.raise_for_status()  # Raise error if download fails

        # Write the downloaded content to the file
        with open(file_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

    print(f"{file} downloaded successfully.")

In [2]:
# List of IMDB files to download
files = [
    'title.principals.tsv.gz',
    'name.basics.tsv.gz',
    'title.basics.tsv.gz'
]

# Download each file
for file in files:
    download_imdb(file)

title.principals.tsv.gz already exists.
name.basics.tsv.gz already exists.
title.basics.tsv.gz already exists.


### Explanation:
The `download_imdb` function downloads a dataset file from IMDb's official public repository unless a local copy already exists. It works as follows:
- `pathlib.Path` determines where the local copy of the IMDB data is stored on disk.
- `mkdir()` creates the download folder, if needed, before any file is saved.
- The full path to the file is obtained by combining the directory path with the file name.
- Before downloading, `exists()` checks whether the file is already present, which prevents downloading it again.
- If the file does not exist, we create a `requests.Session` and download it from the IMDB website.
- The response is streamed and written to disk in binary mode, chunk by chunk, so that large files are never held entirely in memory.

Finally, a console message indicates whether the file already existed or was downloaded successfully.
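One limitation of the existence check is that an interrupted download can leave a truncated file, which later calls would treat as complete. A minimal sketch of one possible guard is shown below: write to a temporary `.part` file and rename it only after every chunk has been written. The helper name `save_atomically` and the `.part` suffix are assumptions made for illustration, and the chunks come from an in-memory list rather than a real HTTP response (in the notebook they would come from `response.iter_content(...)`).

```python
from pathlib import Path
import tempfile


def save_atomically(chunks, target: Path) -> bool:
    """Write chunks to target via a temporary '.part' file.

    Returns False (and writes nothing) if target already exists.
    `save_atomically` is a hypothetical helper, not part of the notebook.
    """
    if target.exists():
        print(f"{target.name} already exists.")
        return False

    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with open(partial, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # Rename only after every chunk is on disk, so a crash never leaves
    # a truncated file under the final name.
    partial.replace(target)
    return True


# Demonstration with in-memory chunks instead of a real HTTP response.
with tempfile.TemporaryDirectory() as tmp:
    demo_target = Path(tmp) / "demo.bin"
    first = save_atomically([b"abc", b"def"], demo_target)
    second = save_atomically([b"ignored"], demo_target)
    demo_content = demo_target.read_bytes()
```

With this variant, a leftover `.part` file signals an incomplete download, while the final file name only ever refers to a complete one.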

## Exercise 2: Explore

- What is the size of the different files you retrieved? You can use Python or a file explorer, as you prefer.

Your answer here.

Prompt
> Write a simple Python script that displays the size of several IMDB dataset files stored in a local directory. The script should use the pathlib module to access the files and print their sizes in megabytes

In [3]:
for file in files:
    file_path = data_dir / file
    size_mb = file_path.stat().st_size / (1024 * 1024)
    print(f"{file}: {size_mb:.2f} MB")

title.principals.tsv.gz: 710.77 MB
name.basics.tsv.gz: 282.27 MB
title.basics.tsv.gz: 205.22 MB


### Explanation

- This script uses the pathlib module to access files.
- stat().st_size returns the file size in bytes
- The size is converted from bytes to megabytes
- The script prints the size of each IMDB file
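Dividing by `1024 * 1024` yields mebibytes (MiB), whereas some file explorers report megabytes of 1,000,000 bytes, so the two figures can differ slightly for the same file. The sketch below illustrates the difference on a temporary file of known size; the helper name `size_in_units` is an assumption made for illustration.

```python
from pathlib import Path
import tempfile


def size_in_units(path: Path) -> tuple[float, float]:
    """Return (size in MiB, size in MB) for the file at path."""
    size_bytes = path.stat().st_size
    return size_bytes / (1024 * 1024), size_bytes / 1_000_000


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"\0" * 2_000_000)  # exactly 2,000,000 bytes
    mib, mb = size_in_units(sample)

print(f"{mib:.2f} MiB, {mb:.2f} MB")
```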

As explained in https://developer.imdb.com/non-commercial-datasets/:
- the data is stored as `tsv` (tab-separated values), which means each text line represents a row whose fields are separated by tab characters.
- A [gzip compression](https://docs.python.org/3/library/gzip.html) is used to reduce the size of the data on the hard drive.

Large compressed files should not be uncompressed on your hard drive or fully loaded in memory.

The Python [gzip module](https://docs.python.org/3/library/gzip.html) is designed so you can open a compressed file as if it were already uncompressed. For example, the following code reads 666 lines from `title.basics` and prints the last line read.

In [4]:
import gzip
with gzip.open(data_dir / 'title.basics.tsv.gz', 'rt', encoding='utf8') as f:
    for _ in range(666):
        l = f.readline()
print(l)

tt0000671	short	Desdemona	Desdemona	0	1908	\N	\N	Drama,Short



- Write a function that reads the first 4 lines of a compressed tsv file. Each line read should be converted into a list of elements and printed.

Your answer here.

Prompt

> Write a Python function called explore(name) that reads the first four lines of a compressed TSV file (.tsv.gz).
The function should use the gzip module to read the file without fully uncompressing it.
Each line must be split using tabulation and converted into a list of elements, then printed

In [5]:
def explore(name):
    file_path = data_dir / name

    with gzip.open(file_path, 'rt', encoding='utf8') as f:
        for _ in range(4):
            line = f.readline()
            elements = line.strip().split('\t')
            print(elements)

### Explanation
- We wrote a function called explore that allows us to read the first four lines of a compressed IMDB TSV file. The function uses the gzip module, which makes it possible to read a compressed file as if it were already uncompressed, without loading the entire file into memory. This is important because IMDB datasets are very large. Each line read from the file represents one row of data, and we split the line using tab characters to obtain a list of elements. Finally, each list is printed so we can easily inspect the structure and content of the dataset
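As a complement, here is a minimal sketch of the same idea that returns the rows instead of printing them and uses `itertools.islice`, which stops cleanly if the file has fewer lines than requested. The function name `head_rows` and the sample file content are assumptions made for illustration.

```python
import gzip
import tempfile
from itertools import islice
from pathlib import Path


def head_rows(path: Path, n: int = 4) -> list[list[str]]:
    """Return the first n lines of a gzip-compressed TSV as lists of fields."""
    with gzip.open(path, "rt", encoding="utf8") as f:
        # islice stops after n lines, or earlier if the file is shorter.
        return [line.rstrip("\n").split("\t") for line in islice(f, n)]


# Tiny invented sample: a header and two rows (fewer than 4 lines in total).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.tsv.gz"
    with gzip.open(sample_path, "wt", encoding="utf8") as f:
        f.write("tconst\ttitleType\nid1\tmovie\nid2\tshort\n")
    rows = head_rows(sample_path)

for row in rows:
    print(row)
```

Using `rstrip("\n")` rather than `strip()` removes only the line ending, so an empty last field is not lost.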

In [6]:
for file in files:
    print(f"First lines of {file}:")
    explore(file)

First lines of title.principals.tsv.gz:
['tconst', 'ordering', 'nconst', 'category', 'job', 'characters']
['tt0000001', '1', 'nm1588970', 'self', '\\N', '["Self"]']
['tt0000001', '2', 'nm0005690', 'director', '\\N', '\\N']
['tt0000001', '3', 'nm0005690', 'producer', 'producer', '\\N']
First lines of name.basics.tsv.gz:
['nconst', 'primaryName', 'birthYear', 'deathYear', 'primaryProfession', 'knownForTitles']
['nm0000001', 'Fred Astaire', '1899', '1987', 'actor,miscellaneous,producer', 'tt0072308,tt0050419,tt0027125,tt0025164']
['nm0000002', 'Lauren Bacall', '1924', '2014', 'actress,miscellaneous,soundtrack', 'tt0037382,tt0075213,tt0038355,tt0117057']
['nm0000003', 'Brigitte Bardot', '1934', '\\N', 'actress,music_department,producer', 'tt0057345,tt0049189,tt0056404,tt0054452']
First lines of title.basics.tsv.gz:
['tconst', 'titleType', 'primaryTitle', 'originalTitle', 'isAdult', 'startYear', 'endYear', 'runtimeMinutes', 'genres']
['tt0000001', 'short', 'Carmencita', 'Carmencita', '0', '

- How many movie entries are present in the retrieved database?
- How many people entries?

Your answer here.

Prompt
>Explain how to count the number of movie entries and people entries in the IMDB dataset using Python.
Then write simple Python code that reads compressed TSV files line by line using the gzip module.
The code should count movie entries from title.basics.tsv.gz and people entries from name.basics.tsv.gz

In [7]:
import gzip

# Count movie entries
movie_count = 0

with gzip.open(data_dir / "title.basics.tsv.gz", "rt", encoding="utf8") as f:
    header = f.readline()  # skip header
    for line in f:
        fields = line.strip().split('\t')
        title_type = fields[1]
        if title_type == "movie":
            movie_count += 1

print("Number of movie entries:", movie_count)


# Count people entries
people_count = 0

with gzip.open(data_dir / "name.basics.tsv.gz", "rt", encoding="utf8") as f:
    header = f.readline()  # skip header
    for _ in f:
        people_count += 1

print("Number of people entries:", people_count)


Number of movie entries: 734730
Number of people entries: 14953819


### Explanation
- To answer these questions, we processed the IMDB datasets directly from their compressed TSV files without fully loading them into memory. For movie entries, we read the title.basics.tsv.gz file line by line and skipped the header. Each line represents one title, and we checked the titleType field to count only entries classified as movies. For people entries, we read the name.basics.tsv.gz file and counted the number of lines after the header, since each line corresponds to one person
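A possible variant, sketched below under the assumption that a breakdown by type is useful as a sanity check, counts every `titleType` in a single pass with `collections.Counter`. The movie count is then one entry of the result and the total number of rows is the sum of all entries. The function name `count_title_types` and the sample rows are invented for illustration.

```python
import gzip
import tempfile
from collections import Counter
from pathlib import Path


def count_title_types(path: Path) -> Counter:
    """Count rows per titleType in a gzip-compressed TSV, one line at a time."""
    with gzip.open(path, "rt", encoding="utf8") as f:
        header = f.readline().rstrip("\n").split("\t")
        type_idx = header.index("titleType")
        # Lines are read one by one, so the file is never fully in memory.
        return Counter(line.rstrip("\n").split("\t")[type_idx] for line in f)


# Tiny invented sample standing in for title.basics.tsv.gz.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.tsv.gz"
    with gzip.open(sample_path, "wt", encoding="utf8") as f:
        f.write("tconst\ttitleType\nid1\tmovie\nid2\tshort\nid3\tmovie\n")
    type_counts = count_title_types(sample_path)

print(type_counts["movie"], sum(type_counts.values()))
```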

## Exercise 3: Extract

We want to study the relations between actors and movies. In particular, we focus on:
- Actual movies (e.g. not TV shows or short movies), where the movie year is known and at least one actor/actress is credited.
- Actors that are credited in at least one actual movie.

To start with, build a [Python set](https://docs.python.org/3/tutorial/datastructures.html#sets) that contains all movie ids (`tconst`) such that:
- The type of movie (`titleType`) is `movie`;
- The year (`startYear`) exists, i.e. is an integer.

How many movies have you referenced in the set?

Prompt:
> Using the title.basics.tsv.gz IMDb dataset, build a Python set called true_movies that contains the identifiers (tconst) of all actual movies. A movie should be included only if its titleType is "movie" and its release year (startYear) is known, meaning it is not missing (\N). The file should be read efficiently since it is large. Finally, compute and display the number of movie identifiers stored in the set.

In [8]:
import gzip

# Path to the title.basics IMDb file
file_path = data_dir / "title.basics.tsv.gz"

true_movies = set()

# Open the compressed TSV file
with gzip.open(file_path, 'rt', encoding='utf8', errors='ignore') as f:
    header = f.readline().strip().split('\t')

    # Get indices of useful columns
    tconst_idx = header.index("tconst")
    title_type_idx = header.index("titleType")
    start_year_idx = header.index("startYear")

    # Read file line by line
    for line in f:
        fields = line.strip().split('\t')

        title_type = fields[title_type_idx]
        start_year = fields[start_year_idx]

        # Keep only movies with a known year
        if title_type == "movie" and start_year != "\\N":
            true_movies.add(fields[tconst_idx])

In [9]:
len(true_movies)

626042

### Explanation
- We use the file title.basics.tsv.gz, which contains metadata about all IMDb titles.
- Since the file is compressed, we open it using the gzip module in text mode.
- We create an empty Python set called true_movies to store unique movie identifiers (tconst).
- We read the header to locate the columns tconst, titleType, and startYear.
- For each line in the file, we keep only entries where:
    - titleType is equal to "movie";
    - startYear is not equal to \N, meaning the year is known.
- When both conditions are satisfied, we add the movie ID to the set.
- Finally, we use len(true_movies) to count how many valid movies are included.
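The exercise defines a known year as one that "is an integer", while the code above tests that the field differs from `\N`. A stricter check, sketched here on a few invented rows, verifies that the field consists only of ASCII digits, so that a later `int(...)` conversion cannot fail. The helper name `has_integer_year` and the sample identifiers are assumptions made for illustration.

```python
def has_integer_year(value: str) -> bool:
    """Return True if value consists only of ASCII digits (e.g. '1999')."""
    return value.isascii() and value.isdigit()


sample_rows = [
    # (tconst, titleType, startYear) -- invented values for illustration
    ("id1", "movie", "1999"),
    ("id2", "movie", "\\N"),   # missing year: excluded
    ("id3", "short", "2001"),  # not a movie: excluded
    ("id4", "movie", "2010"),
]

sample_true_movies = {
    tconst
    for tconst, title_type, start_year in sample_rows
    if title_type == "movie" and has_integer_year(start_year)
}

print(sorted(sample_true_movies))
```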

Now we want to build two lists, `movies` and `actors`:

- Each element of `movies` should represent a movie, each element of `actors` an actor or actress;
- A movie is represented by a list of three elements:
  - The original name of the movie (`str`),
  - The principal actors of the movie, stored as a list whose elements are integers that represent the index (position) of the actors in the list `actors`,
  - The movie year, `startYear` (`int`);
- An actor/actress is represented by a list of two elements:
  - The name of the person (`str`),
  - The movies the person acted in, stored as a list whose elements are integers that represent the index (position) of the movies in the list `movies`.
  

Build these two lists.

A possible way to do this (this is a suggestion, not an order):
- Initiate `movies` and `actors` as empty lists;
- Create two auxiliary dictionaries that will associate to each movie id (`tconst`) and person id (`nconst`) their position in the list;
- Read the file `title.principals.tsv.gz` line by line:
  - Ignore any line where the movie is not in the set `true_movies` or the `category` of the relation is not `actor` or `actress`,
  - If the movie id `tconst` is not in the movie auxiliary index, append an empty movie to `movies` (`["", [], 0]`) and update the movie auxiliary index with an entry for `tconst`,
  - If the actor id `nconst` is not in the actor auxiliary index, append an empty actor to `actors` (`["", []]`) and update the actor auxiliary index with an entry for `nconst`,
  - Append the movie index (not `tconst`!) to the movies of the corresponding actor in `actors`,
  - Append the actor index (not `nconst`!) to the actors of the corresponding movie in `movies`;
- There can be a few undesired duplicates, e.g. some actors can have multiple entries for the same movies. For each actor, remove possible duplicates in the list of movies, and for each movie, remove possible duplicates in the list of actors;
- Using `title.basics.tsv.gz` and your movie auxiliary index, populate each movie in `movies` with its correct name (`str`) and year (`int`);
- Using `name.basics.tsv.gz` and your actor auxiliary index, populate each actor in `actors` with their correct name.

### Prompt:
> Using the IMDb datasets, we want to create two Python lists called movies and actors in order to represent the relationships between movies and actors. Only actual movies with a known release year (stored in the true_movies set) and relations where the category is actor or actress should be considered. Each movie must store its title, release year, and the indices of its actors, while each actor must store their name and the indices of the movies they acted in. To efficiently build these structures, auxiliary dictionaries should be used to map IMDb identifiers (tconst and nconst) to their positions in the lists. The information must be extracted from the files title.principals.tsv.gz, title.basics.tsv.gz, and name.basics.tsv.gz.

In [10]:
# Initialize structures
movie_id_to_index = {}
actor_id_to_index = {}
movies = []
actors = []

# STEP 1: Build relations from title.principals.tsv.gz
with gzip.open(data_dir / "title.principals.tsv.gz", "rt", encoding="utf8", errors="ignore") as f:
    header = f.readline().strip().split("\t")

    tconst_idx = header.index("tconst")
    nconst_idx = header.index("nconst")
    category_idx = header.index("category")

    for line in f:
        fields = line.strip().split("\t")

        tconst = fields[tconst_idx]
        nconst = fields[nconst_idx]
        category = fields[category_idx]

        # Keep only valid movies and actors/actresses
        if tconst not in true_movies:
            continue
        if category not in {"actor", "actress"}:
            continue

        # Add movie if new
        if tconst not in movie_id_to_index:
            movie_id_to_index[tconst] = len(movies)
            movies.append(["", [], 0])  # [title, actors, year]

        # Add actor if new
        if nconst not in actor_id_to_index:
            actor_id_to_index[nconst] = len(actors)
            actors.append(["", []])  # [name, movies]

        m_idx = movie_id_to_index[tconst]
        a_idx = actor_id_to_index[nconst]

        movies[m_idx][1].append(a_idx)
        actors[a_idx][1].append(m_idx)

# STEP 2: Remove duplicate references
for movie in movies:
    movie[1] = list(set(movie[1]))

for actor in actors:
    actor[1] = list(set(actor[1]))

# STEP 3: Populate movie titles and years
with gzip.open(data_dir / "title.basics.tsv.gz", "rt", encoding="utf8", errors="ignore") as f:
    header = f.readline().strip().split("\t")

    tconst_idx = header.index("tconst")
    title_idx = header.index("originalTitle")
    year_idx = header.index("startYear")

    for line in f:
        fields = line.strip().split("\t")
        tconst = fields[tconst_idx]

        if tconst in movie_id_to_index:
            idx = movie_id_to_index[tconst]
            movies[idx][0] = fields[title_idx]
            movies[idx][2] = int(fields[year_idx])

# STEP 4: Populate actor names
with gzip.open(data_dir / "name.basics.tsv.gz", "rt", encoding="utf8", errors="ignore") as f:
    header = f.readline().strip().split("\t")

    nconst_idx = header.index("nconst")
    name_idx = header.index("primaryName")

    for line in f:
        fields = line.strip().split("\t")
        nconst = fields[nconst_idx]

        if nconst in actor_id_to_index:
            idx = actor_id_to_index[nconst]
            actors[idx][0] = fields[name_idx]

### Explanation:
1. Initialization of data structures:
We start by creating two empty lists, movies and actors, which will store our final data.
We also create two dictionaries, movie_id_to_index and actor_id_to_index, to map IMDb IDs (tconst, nconst) to their positions in the lists. This allows fast access using indices instead of IDs.

2. Reading title.principals.tsv.gz to build relations:
We read the file line by line using gzip.open.
For each line, we extract the movie ID (tconst), actor ID (nconst), and the category of the person.
We keep only movies that belong to the true_movies set and people whose category is actor or actress.

3. Creating movies and actors only once:
If a movie ID is not already in movie_id_to_index, we add a new empty movie entry ["", [], 0].
If an actor ID is not already in actor_id_to_index, we add a new empty actor entry ["", []].
This avoids creating duplicates.

4. Building the links between movies and actors:
For each valid relation, we add the actor index to the movie's actor list and the movie index to the actor's movie list.
This creates a bidirectional relationship using list indices.

5. Removing duplicate references:
Some actors may appear multiple times for the same movie.
We remove duplicates by converting the lists of indices into sets and back into lists.

6. Adding movie titles and years:
We read title.basics.tsv.gz and, using the movie index dictionary, fill in the original movie title and the release year (startYear).

7. Adding actor names:
Finally, we read name.basics.tsv.gz and use the actor index dictionary to fill in the real names of the actors.
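The index-based linking described in these steps can be sketched on a handful of invented credits. This variant skips repeated credits at insertion time instead of converting to a set afterwards, which keeps the original credit order; the membership test on a list is acceptable here only under the assumption that each movie has few principal actors. All identifiers (`m1`, `p1`, ...) and the `toy_` names are hypothetical.

```python
sample_credits = [
    # (tconst, nconst, category) -- invented rows for illustration
    ("m1", "p1", "actor"),
    ("m1", "p2", "actress"),
    ("m1", "p1", "actor"),     # duplicate credit: linked only once
    ("m2", "p2", "actress"),
    ("m2", "p3", "director"),  # ignored: not an acting category
    ("m3", "p1", "actor"),     # ignored: m3 is not in the allowed set
]
allowed_movies = {"m1", "m2"}  # plays the role of true_movies

movie_index, actor_index = {}, {}
toy_movies, toy_actors = [], []

for tconst, nconst, category in sample_credits:
    if tconst not in allowed_movies or category not in {"actor", "actress"}:
        continue

    # Register the movie and the actor the first time they are seen.
    if tconst not in movie_index:
        movie_index[tconst] = len(toy_movies)
        toy_movies.append(["", [], 0])  # [title, actor indices, year]
    if nconst not in actor_index:
        actor_index[nconst] = len(toy_actors)
        toy_actors.append(["", []])     # [name, movie indices]

    m_idx = movie_index[tconst]
    a_idx = actor_index[nconst]

    # Link both directions, skipping credits that were already recorded.
    if a_idx not in toy_movies[m_idx][1]:
        toy_movies[m_idx][1].append(a_idx)
        toy_actors[a_idx][1].append(m_idx)

print(toy_movies)
print(toy_actors)
```

Titles, years, and names are left empty here, as in the first pass over `title.principals.tsv.gz`; they would be filled in by the later passes.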

Manually check that your files are correct. For example, try to get the name and year of the movies Michel Blanc played in, or the actors of the first Harry Potter movie.

Your answer here (if everything went well, you just need to execute the two cells below).

In [11]:
', '.join([f"{movies[i][0]} ({movies[i][2]})" for i in [a for a in actors if a[0]=='Michel Blanc'][0][1]])

"La fille du RER (2009), Et soudain, tout le monde me manque (2011), The Hundred-Foot Journey (2014), La meilleure façon de marcher (1976), Le routard (2025), Marie-Line et son juge (2023), Les grands ducs (1996), The Favour, the Watch and the Very Big Fish (1991), Je vous trouve très beau (2005), Papy fait de la résistance (1983), Rien ne va plus (1979), Nemo (1984), Cause toujours... tu m'intéresses! (1979), Circulez y a rien à voir! (1983), Uranus (1990), Embrassez qui vous voudrez (2002), Les nouvelles aventures d'Aladin (2015), Musée haut, musée bas (2008), Le cheval d'orgueil (1980), Nos 18 ans (2008), Les témoins (2007), Voyez comme on danse (2018), Ma femme s'appelle reviens (1982), Prospero's Books (1991), Chambre à part (1989), Madame Edouard (2004), Toxic Affair (1993), Le beaujolais nouveau est arrivé (1978), Vous n'aurez pas l'Alsace et la Lorraine (1977), La cache (2025), Viens chez moi, j'habite chez une copine (1981), Les petites victoires (2022), L'exercice de l'État (

In [12]:
', '.join([actors[i][0] for i in [m for m in movies if m[0].startswith('Harry Potter')][0][1]])

'Robbie Coltrane, Richard Harris, Fiona Shaw, Richard Griffiths, Rupert Grint, Emma Watson, Saunders Triplets, Harry Melling, Daniel Radcliffe, Maggie Smith'

When you have successfully reached this point of the project, you can save the two lists `movies` and `actors` as compressed json files using the code below:

In [13]:
import gzip
import json

with gzip.open(data_dir / 'movies.json.gz', 'wt', encoding='utf8') as f:
    json.dump(movies, f)
with gzip.open(data_dir / 'actors.json.gz', 'wt', encoding='utf8') as f:
    json.dump(actors, f)

### Explanation:
This code serializes the in-memory movies and actors lists to compressed JSON files.
The json.dump function is used to serialize the Python lists into JSON format, while gzip.open compresses the files to reduce their size on disk.
By saving the data in this format, we can restore our processed datasets quickly later without having to recalculate the relationships from the original larger versions of the IMDb files.

After your files have been saved, you do not need to re-execute all of the above each time you restart your notebook. Instead, you just need to reload `movies` and `actors` using the code below:

In [14]:
import gzip
import json

with gzip.open(data_dir / 'movies.json.gz', 'rt', encoding='utf8') as f:
    movies = json.load(f)
with gzip.open(data_dir / 'actors.json.gz', 'rt', encoding='utf8') as f:
    actors = json.load(f)    

### Explanation:
This code reloads the previously saved movies and actors lists from compressed JSON files. By doing this, we can continue working with the datasets directly in memory without reprocessing the original IMDb TSV files.
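A quick way to gain confidence in the save and reload steps is a round trip on a small list: what is read back should equal what was written. The sketch below uses a temporary directory and an invented one-movie list, so it does not touch the real data files; the file name `toy_movies.json.gz` is an assumption.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Invented sample in the same [title, actor indices, year] layout.
toy_movies = [["Toy Movie", [0, 1], 2000]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_movies.json.gz"

    # Save: text mode ('wt') lets json.dump write straight into the gzip stream.
    with gzip.open(path, "wt", encoding="utf8") as f:
        json.dump(toy_movies, f)

    # Reload: the nested lists, strings, and integers come back unchanged.
    with gzip.open(path, "rt", encoding="utf8") as f:
        restored = json.load(f)

print(restored == toy_movies)
```

Because the structure contains only lists, strings, and integers, JSON preserves it exactly; tuples or sets would not survive the round trip unchanged.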

**Important remark:** in what follows, you will have to build functions that use the two lists a lot. You should NOT reload the lists each time you call a function. Instead, ensure that the two lists are loaded in memory and use them directly.

## Exercise 4: Explore again (now on the curated dataset)

- How many actors do you have in the new dataset? How many movies?
- On average, in how many movies did an actor play?
- On average, how many actors play in a movie?
- What is the name of the actor that played in the most movies? How many movies did he feature in?
- What is the oldest movie in the DB?

Your answer here.

- How many actors do you have in the new dataset? How many movies?

Prompt
>Using a curated IMDb dataset stored in two Python lists called movies and actors, write Python code to determine how many movies and how many actors are present in the dataset.
The solution should be simple and suitable for a Master 1 networking student.

In [15]:

# Number of movies in the dataset
num_movies = len(movies)

# Number of actors in the dataset
num_actors = len(actors)

print(f"Number of movies: {num_movies}")
print(f"Number of actors: {num_actors}")

Number of movies: 492431
Number of actors: 1263295


### Explanation
- To answer this question, we used the two curated lists movies and actors, which are already loaded in memory. Each element of the movies list represents one movie, and each element of the actors list represents one actor or actress. By applying the len() function to each list, we obtained the total number of movies and the total number of actors in the dataset. This method is straightforward and efficient, as it does not require reprocessing the original IMDb files

- On average, in how many movies did an actor play?

Prompt
>Using a curated IMDb dataset stored in a Python list called actors, write Python code to compute the average number of movies played by an actor.
Each actor contains a list of movie indices.

In [16]:
# Total number of actors
num_actors = len(actors)

# Average number of movies per actor
average_movies = sum(len(actor[1]) for actor in actors) / num_actors

print(f"Average number of movies per actor: {average_movies:.2f}")

Average number of movies per actor: 3.00


### Explanation
- We computed the average number of movies played by an actor by iterating over the actors list. For each actor, the number of movies they participated in is given by the length of their movie index list. We summed these values for all actors and divided the result by the total number of actors. This calculation provides an average that describes how many movies an actor typically appears in within our curated dataset.

- On average, how many actors play in a movie?

Prompt
>Using a curated IMDb dataset stored in two Python lists called movies and actors, write Python code to compute the average number of actors per movie.
Each movie contains a list of actor indices.
The solution should be simple and suitable for a Master 1 networking student.

In [17]:
# Average number of actors per movie
average_actors = sum(len(movie[1]) for movie in movies) / len(movies)

print(f"Average number of actors per movie: {average_actors:.2f}")

Average number of actors per movie: 7.70


### Explanation
- We computed the average number of actors per movie by iterating over the movies list. Each movie contains a list of actor indices corresponding to the actors who played in it. We summed the sizes of these lists for all movies and divided the result by the total number of movies. This gives us an average value that represents how many actors typically play in a movie in our curated dataset.
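The two averages are linked: every credit appears once in a movie's actor list and once in an actor's movie list, so both sums count the same set of credits. With the figures above, 3.00 × 1,263,295 and 7.70 × 492,431 both give roughly 3.79 million credits, which is a useful consistency check. A minimal sketch of this check on invented toy lists:

```python
# Invented toy data in the same layout as movies and actors.
toy_movies = [["A", [0, 1], 2000], ["B", [1, 2], 2001]]
toy_actors = [["X", [0]], ["Y", [0, 1]], ["Z", [1]]]

# Each credit is counted once from the movie side and once from the actor side.
credits_from_movies = sum(len(movie[1]) for movie in toy_movies)
credits_from_actors = sum(len(actor[1]) for actor in toy_actors)

avg_actors_per_movie = credits_from_movies / len(toy_movies)
avg_movies_per_actor = credits_from_actors / len(toy_actors)

print(credits_from_movies, credits_from_actors)
print(f"{avg_actors_per_movie:.2f} actors per movie, "
      f"{avg_movies_per_actor:.2f} movies per actor")
```

If the two credit counts ever differ on the real lists, the links between `movies` and `actors` are not symmetric and the construction step should be re-examined.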

- What is the name of the actor that played in the most movies? How many movies did he feature in?

Prompt
>Using a curated IMDb dataset stored in a list called actors, write Python code to find the actor who played in the highest number of movies.
Each actor contains a list of movie indices.
Display the actor’s name and the number of movies.
The solution should be simple and suitable for a Master 1 networking student


In [18]:
# Find the actor with the most movies
max_actor = max(actors, key=lambda actor: len(actor[1]))

actor_name = max_actor[0]
movie_count = len(max_actor[1])

print(f"Actor with most movies: {actor_name}")
print(f"Number of movies: {movie_count}")

Actor with most movies: Brahmanandam
Number of movies: 1124


### Explanation
- To answer this question, we searched for the actor who has the longest list of associated movies. Each actor in the actors list contains their name and a list of movie indices representing the movies they acted in. By selecting the actor with the maximum list length, we identified the person who played in the highest number of movies. We then displayed the actor’s name and the total number of movies they featured in
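If more than the single top actor is of interest, the same `key` function can be reused with `heapq.nlargest` to obtain a short ranking without sorting the whole list. This is only a sketch on invented toy data, not part of the required answer.

```python
import heapq

# Invented toy data: [name, list of movie indices].
toy_actors = [["X", [0]], ["Y", [0, 1, 2]], ["Z", [1, 2]]]

# The two actors with the longest movie lists, most prolific first.
top_two = heapq.nlargest(2, toy_actors, key=lambda actor: len(actor[1]))
top_names = [actor[0] for actor in top_two]

for actor in top_two:
    print(f"{actor[0]}: {len(actor[1])} movies")
```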

- What is the oldest movie in the DB?

Prompt
>Using a curated IMDb dataset stored in a list called movies, write Python code to find the oldest movie in the database.
Each movie contains its title and release year.
Display the movie name and its year.

In [19]:
# Find the oldest movie
oldest_movie = min(movies, key=lambda movie: movie[2])

movie_title = oldest_movie[0]
movie_year = oldest_movie[2]

print(f"Oldest movie: {movie_title} ({movie_year})")

Oldest movie: Miss Jerry (1894)


### Explanation
- We identified the oldest movie in the database by comparing the release years of all movies stored in the movies list. Each movie contains its release year as an integer, which allows direct comparison. By selecting the movie with the smallest year value, we obtained the oldest movie in our dataset. Finally, we displayed its title and release year

## Exercise 5: Prepare some functions

Write the following functions
- `search_movie(name: str) -> list`: return a list of movies whose name contains `name` (ignoring case). Each movie is described as a dictionary with keys `name`, `year`, and `index` (its position in `movies`)
- `get_movie(i: int) -> dict`: returns a JSON description of the movie at position `i`, with the following keys:
  - `name` (`str`)
  - `year` (`int`)
  - `actors` (list of dictionaries with keys `name` and `index`)
- `search_actor(name: str) -> list`: return a list of actors whose name contains `name` (ignoring case). Each actor is described as a dictionary with keys `name` and `index` (its position in `actors`)
- `get_actor(i: int) -> dict`: returns a JSON description of the actor at position `i`, with the following keys:
  - `name` (`str`)
  - `movies` (list of dictionaries with keys `name`, `year`, and `index`)

Prompt
>Using a curated IMDb dataset stored in a list called movies, write a Python function search_movie(name) that returns all movies whose name contains the given string, ignoring case.
Each result should be a dictionary with keys name, year, and index

In [20]:
def search_movie(name):
    results = []
    name = name.lower()

    for i, movie in enumerate(movies):
        if name in movie[0].lower():
            results.append({
                "name": movie[0],
                "year": movie[2],
                "index": i
            })

    return results

### Explanation
- We implemented the search_movie function to find movies whose title contains a given string, ignoring case differences. We iterate over the movies list using enumerate to keep track of each movie’s index. When the searched name appears in the movie title, we store the result as a dictionary containing the movie name, its release year, and its index in the movies list. The function returns a list of all matching movies

Prompt
>Using a curated IMDb dataset stored in two lists called movies and actors, write a Python function get_movie(i) that returns detailed information about a movie at position i.
The result should be a dictionary containing the movie name, year, and a list of actors with their names and indices

In [21]:
def get_movie(i):
    movie = movies[i]

    return {
        "name": movie[0],
        "year": movie[2],
        "actors": [
            {
                "name": actors[a_idx][0],
                "index": a_idx
            }
            for a_idx in movie[1]
        ]
    }

### Explanation
- The get_movie function returns detailed information about a movie given its index in the movies list. We extract the movie name and release year directly from the stored structure. For the actors, we iterate over the list of actor indices associated with the movie and retrieve each actor’s name from the actors list. The result is returned as a dictionary that can easily be converted to JSON

Prompt
>Using a curated IMDb dataset stored in a list called actors, write a Python function search_actor(name) that returns all actors whose name contains the given string, ignoring case.
Each result should be a dictionary with keys name and index

In [22]:
def search_actor(name):
    results = []
    name = name.lower()

    for i, actor in enumerate(actors):
        if name in actor[0].lower():
            results.append({
                "name": actor[0],
                "index": i
            })

    return results

### Explanation
- We created the search_actor function to find actors whose name contains a given string, without considering case sensitivity. We iterate over the actors list and compare the searched name with each actor’s name. When a match is found, we return the actor’s name and index as a dictionary. This function is useful for locating actors before retrieving detailed information.

Prompt
>Using a curated IMDb dataset stored in two lists called actors and movies, write a Python function get_actor(i) that returns detailed information about an actor at position i.
The result should include the actor name and a list of movies with their names, years, and indices

In [23]:
def get_actor(i):
    actor = actors[i]

    return {
        "name": actor[0],
        "movies": [
            {
                "name": movies[m_idx][0],
                "year": movies[m_idx][2],
                "index": m_idx
            }
            for m_idx in actor[1]
        ]
    }

### Explanation
- The get_actor function returns detailed information about an actor using their index in the actors list. We retrieve the actor’s name and then iterate over the list of movie indices associated with this actor. For each movie, we extract its name, release year, and index from the movies list. The function returns a dictionary that describes the actor and their filmography.

In [24]:
bronzés = search_movie('bronzés')
bronzés

[{'name': 'Les bronzés', 'year': 1978, 'index': 54000},
 {'name': 'Les bronzés font du ski', 'year': 1979, 'index': 55034},
 {'name': 'Les bronzés 3: amis pour la vie', 'year': 2006, 'index': 180714},
 {'name': "Les P'tits Bronzés au Pyrénéen", 'year': 2013, 'index': 467203}]

In [25]:
search_movie('gendarme')

[{'name': 'Le gendarme de Saint-Tropez', 'year': 1964, 'index': 41008},
 {'name': 'Le gendarme à New York', 'year': 1965, 'index': 42676},
 {'name': 'Le gendarme se marie', 'year': 1968, 'index': 44459},
 {'name': 'Le gendarme en balade', 'year': 1970, 'index': 46407},
 {'name': 'Le gendarme et les extra-terrestres', 'year': 1979, 'index': 55248},
 {'name': 'Le gendarme et les gendarmettes', 'year': 1982, 'index': 58345},
 {'name': 'Le gendarme de Champignol', 'year': 1959, 'index': 92832},
 {'name': 'El gendarme desconocido', 'year': 1941, 'index': 94280},
 {'name': 'El gendarme de la esquina', 'year': 1951, 'index': 116189},
 {'name': 'Sacrés gendarmes', 'year': 1980, 'index': 120031},
 {'name': "Hainburg - Je t'aime, gendarme", 'year': 2001, 'index': 145905},
 {'name': 'Le gendarme de Abobo', 'year': 2019, 'index': 320666},
 {'name': 'Le retour du gendarme de Abobo', 'year': 2025, 'index': 399577}]

In [26]:
get_movie(search_movie('Ils sont fous')[0]['index'])

{'name': 'Ils sont fous ces sorciers',
 'year': 1978,
 'actors': [{'name': 'Renée Saint-Cyr', 'index': 23809},
  {'name': 'Catherine Lachens', 'index': 99393},
  {'name': 'Michel Peyrelon', 'index': 85249},
  {'name': 'Jean Lefebvre', 'index': 25385},
  {'name': 'Jean-Jacques Moreau', 'index': 96009},
  {'name': 'Maitena Galli', 'index': 81806},
  {'name': 'Henri Guybet', 'index': 84594},
  {'name': 'Julien Guiomar', 'index': 72018},
  {'name': 'Daniel Ceccaldi', 'index': 49814},
  {'name': 'Dominique Vallée', 'index': 244190}]}

In [27]:
get_movie(bronzés[0]['index'])

{'name': 'Les bronzés',
 'year': 1978,
 'actors': [{'name': 'Michel Creton', 'index': 72836},
  {'name': 'Marie-Anne Chazel', 'index': 103879},
  {'name': 'Bruno Moynot', 'index': 103880},
  {'name': 'Thierry Lhermitte', 'index': 103881},
  {'name': 'Gérard Jugnot', 'index': 98987},
  {'name': 'Michel Blanc', 'index': 99340},
  {'name': 'Josiane Balasko', 'index': 101299},
  {'name': 'Luis Rego', 'index': 84374},
  {'name': 'Martin Lamotte', 'index': 103319},
  {'name': 'Dominique Lavanant', 'index': 103318}]}

In [28]:
harry = search_actor('Daniel Radcliffe')
harry

[{'name': 'Daniel Radcliffe', 'index': 278107}]

In [29]:
get_actor(harry[0]['index'])

{'name': 'Daniel Radcliffe',
 'movies': [{'name': 'Harry Potter and the Deathly Hallows: Part 2',
   'year': 2011,
   'index': 227081},
  {'name': 'Imperium', 'year': 2016, 'index': 427767},
  {'name': 'Merrily We Roll Along', 'year': 2025, 'index': 376853},
  {'name': 'Swiss Army Man', 'year': 2016, 'index': 416535},
  {'name': 'Horns', 'year': 2013, 'index': 268697},
  {'name': 'Playmobil: The Movie', 'year': 2019, 'index': 419105},
  {'name': 'Victor Frankenstein', 'year': 2015, 'index': 299172},
  {'name': 'Beast of Burden', 'year': 2018, 'index': 449081},
  {'name': 'Weird: The Al Yankovic Story', 'year': 2022, 'index': 284476},
  {'name': 'National Theatre Live: Rosencrantz & Guildenstern Are Dead',
   'year': 2017,
   'index': 457663},
  {'name': 'Harry Potter and the Half-Blood Prince',
   'year': 2009,
   'index': 175169},
  {'name': 'The Woman in Black', 'year': 2012, 'index': 276673},
  {'name': 'The F Word', 'year': 2013, 'index': 263620},
  {'name': 'Jungle', 'year': 2017,

Write a function `movie_path(origin: int, destination: int) -> distance: int, path: list` that computes the collaboration distance between two actors. That distance is the length of the shortest path `(origin, act1, act2, ..., actX, destination)`, where `origin` and `act1` played in the same movie, `act1` and `act2` played in the same movie, ... and
`actX` and `destination` played in the same movie.  In addition to the distance, the response should include one shortest path between the two actors, as a list of the form `["origin_name", "movie1_name", "act1_name", "movie2_name", ..., "destination_name"]`, where `movie1` is a movie that featured `origin` and `act1`, and so on...

In particular:
- One actor is by convention at distance 0 from herself. The return path should be `["origin_name"]` then;
- Two distinct actors that play in the same movie are at distance 1;
- If there is no connection between two actors, the function should return `-1, []` by convention.

**Important remarks**: `movie_path` is tricky. You need to try to implement it but you are allowed to fail. If you are stuck for too long, please explain what you did/try and what blocked you in your opinion. Then move on.

Prompt
>Using a curated IMDb dataset stored in two Python lists called actors and movies, write a Python function movie_path(origin, destination) that computes the shortest collaboration path between two actors.
The function should return the collaboration distance and one shortest path, alternating actor names and movie names.
If the origin and destination are the same actor, return distance 0 and the actor name.
If no path exists, return -1 and an empty list.
The solution should be suitable for a Master 1 networking student

In [37]:
from collections import deque

def movie_path(origin, destination):
    # Case 1: same actor
    if origin == destination:
        return 0, [actors[origin][0]]

    visited = set()
    queue = deque()
    
    # parent dictionary:
    # actor_index -> (previous_actor_index, movie_index)
    parent = {}

    # Initialize BFS
    queue.append(origin)
    visited.add(origin)

    while queue:
        current_actor = queue.popleft()

        # Explore movies of the current actor
        for movie_idx in actors[current_actor][1]:
            for next_actor in movies[movie_idx][1]:

                if next_actor in visited:
                    continue

                visited.add(next_actor)
                parent[next_actor] = (current_actor, movie_idx)

                # Destination found
                if next_actor == destination:
                    # Reconstruct path
                    path = [actors[destination][0]]
                    dist = 0
                    cur = destination

                    while cur != origin:
                        prev_actor, movie_used = parent[cur]
                        path.append(movies[movie_used][0])
                        path.append(actors[prev_actor][0])
                        cur = prev_actor
                        dist += 1

                    path.reverse()
                    return dist, path

                queue.append(next_actor)

    # No connection found
    return -1, []

### Explanation:
- We implemented the movie_path function using a breadth-first search (BFS), which finds a shortest path in an unweighted graph. In our case, actors are nodes and two actors are connected when they played in the same movie. We start from the origin actor and explore all reachable actors level by level. To reconstruct the collaboration path, we store for each visited actor the previous actor and the movie that connects them. When the destination actor is found, we rebuild the path by going backwards, alternating actor names and movie names, and then reverse it. If both actors are the same, we return a distance of zero and a path containing only that actor's name. If no connection exists, we return -1 and an empty path.
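As a quick sanity check of the conventions listed in the exercise, the following sketch runs the same breadth-first search on a tiny hand-made dataset. The names `toy_actors`, `toy_movies` and `toy_movie_path` are hypothetical; the data only mimics the layout assumed for `actors` and `movies` (name first, then a list of indices).

```python
from collections import deque

# Hypothetical toy data mimicking the layout used above:
# actor = (name, [movie indices]), movie = (name, [actor indices], year)
toy_actors = [
    ("Alice", [0]),
    ("Bob", [0, 1]),
    ("Carol", [1]),
    ("Dave", []),  # played with nobody, hence unreachable
]
toy_movies = [
    ("First Film", [0, 1], 2001),
    ("Second Film", [1, 2], 2002),
]

def toy_movie_path(origin, destination):
    """BFS over actors; parent maps actor -> (previous actor, shared movie)."""
    if origin == destination:
        return 0, [toy_actors[origin][0]]
    parent = {origin: None}  # also serves as the visited set
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for m in toy_actors[current][1]:
            for nxt in toy_movies[m][1]:
                if nxt in parent:
                    continue
                parent[nxt] = (current, m)
                if nxt == destination:
                    # Walk back from destination to origin, then reverse
                    path, cur = [toy_actors[nxt][0]], nxt
                    while parent[cur] is not None:
                        prev, movie = parent[cur]
                        path += [toy_movies[movie][0], toy_actors[prev][0]]
                        cur = prev
                    path.reverse()
                    return len(path) // 2, path
                queue.append(nxt)
    return -1, []

print(toy_movie_path(0, 0))  # same actor
print(toy_movie_path(0, 1))  # actors sharing a movie
print(toy_movie_path(0, 2))  # one intermediate actor
print(toy_movie_path(0, 3))  # no connection
```

Since the path alternates actor and movie names, its length is always `2 * distance + 1`, which is why the distance can be recovered as `len(path) // 2`.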

In [30]:
jean = search_actor('jean dujardin')
jean_index = jean[0]['index']
jean

[{'name': 'Jean Dujardin', 'index': 330542}]

In [31]:
jack = search_actor('kiefer sutherland')
jack_index = jack[0]['index']
jack

[{'name': 'Kiefer Sutherland', 'index': 123947}]

In [32]:
kevin = search_actor('kevin bacon')
kevin_index = kevin[0]['index']
kevin

[{'name': 'Kevin Bacon', 'index': 105473},
 {'name': 'Kevin Bacon', 'index': 1209088}]

In [33]:
cruchot = search_actor('louis de funès')
cruchot_index = cruchot[0]['index']
cruchot

[{'name': 'Louis de Funès', 'index': 38580}]

In [38]:
movie_path(kevin_index, kevin_index)

(0, ['Kevin Bacon'])

In [40]:
movie_path(kevin_index, jean_index)

(2,
 ['Kevin Bacon',
  'Wild Things',
  'Bill Murray',
  'The Monuments Men',
  'Jean Dujardin'])

In [41]:
movie_path(cruchot_index, jack_index)

(3,
 ['Louis de Funès',
  'Fantômas',
  'Andrée Tainsy',
  'Sous le sable',
  'Charlotte Rampling',
  'Melancholia',
  'Kiefer Sutherland'])

## Exercise 6. Provide a Web API

Using Python and Flask, build a web server that implements the following routes:
- `/movies/{id}` : where `id` is the index of a movie, returns the corresponding movie as a json (cf `get_movie`).
- `/movies` : returns by default the first 100 movies. The value 100 can be modified by sending a URL parameter `limit`.
- `/actors/{id}` : where `id` is the index of an actor, returns the json of the actor (cf `get_actor`).
- `/actors` : returns by default the first 100 actors. The value 100 can be modified by sending a URL parameter `limit`.
- `/actors/{id}/costars` : returns the co-stars of one actor (actors that play in a same movie).
- `/search/actors/{searchString}` : where `searchString` is a string to lookup one actor. This route should return the actors whose name contains `searchString` (for example, `/search/actors/w` returns the actors whose name contains `w` or `W`).
- `/search/movies/{searchString}`: where `searchString` is a string, returns the list of movies whose title contains `searchString`. The route should accept a URL parameter `filter` formatted like `key1:value1,key2:value2,...`  to restrain the search to the publications where key `keyi` contains `valuei`. For example, `/search/movies/gendarme?filter=year:1964`
should return the list of movies where the title contains `gendarme` published in 1964.
- `/actors/{id_origin}/distance/{id_destination}` : where `id_origin`
and `id_destination` are two actor indices, returns the collaboration distance between the two actors. In addition to the distance, the response should include one shortest path between the two actors, e.g. the json you return should be a list of two elements, one integer and one list.

The developed API should have the following characteristics:

- All errors should have the same format.
- In absence of error, the API should always return a `json`.
- Each route must be documented with the return format, possible errors, and an explanation of parameters.
- Each route that returns a list should return a maximum of 100 elements and should accept URL parameters `start` and `limit` to display `limit` elements starting from the `start`-th element. For example: `/actors` should return the first 100 actors, `/actors?start=100` displays the next 100, and `/actors?start=200&limit=2` displays the next 2 elements.
- For each route that returns a list, the returned elements should be sortable based on a given field using a URL parameter `order`. For example: `/movies?order=year` displays the first 100 movies sorted by year.

Prompt
>Design and implement a REST Web API using Python and Flask to expose a curated IMDb dataset stored in two Python lists called movies and actors.
The API must provide routes to retrieve movies and actors by index, search movies and actors by name, list movies and actors with pagination and sorting, retrieve co-stars of an actor, and compute the collaboration distance between two actors.
All responses must be returned as JSON, and all errors must follow a uniform format.
List endpoints must support URL parameters start, limit, and order.
Each route should be clearly documented and implemented in a clean and readable way, suitable for a Master 1 networking student.

In [49]:
!{sys.executable} -m pip install flask

Collecting flask
  Downloading flask-3.1.2-py3-none-any.whl.metadata (3.2 kB)
Collecting blinker>=1.9.0 (from flask)
  Using cached blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting click>=8.1.3 (from flask)
  Downloading click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
Collecting itsdangerous>=2.2.0 (from flask)
  Using cached itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting werkzeug>=3.1.0 (from flask)
  Downloading werkzeug-3.1.4-py3-none-any.whl.metadata (4.0 kB)
Downloading flask-3.1.2-py3-none-any.whl (103 kB)
Using cached blinker-1.9.0-py3-none-any.whl (8.5 kB)
Downloading click-8.3.1-py3-none-any.whl (108 kB)
Using cached itsdangerous-2.2.0-py3-none-any.whl (16 kB)
Downloading werkzeug-3.1.4-py3-none-any.whl (224 kB)
Installing collected packages: werkzeug, itsdangerous, click, blinker, flask
Successfully installed blinker-1.9.0 click-8.3.1 flask-3.1.2 itsdangerous-2.2.

In [59]:
from flask import Flask, jsonify, request

app = Flask(__name__)

# --------------------
# Helper functions
# --------------------

def error_response(message, status=400):
    return jsonify({"error": message}), status


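def get_pagination_params():
    # Fall back to the defaults when a parameter is missing or not an integer
    start = request.args.get("start", default=0, type=int)
    limit = request.args.get("limit", default=100, type=int)
    start = max(start, 0)
    limit = max(0, min(limit, 100))
    return start, limit


def sort_list(data, key):
    # Ignore a missing or unknown sort field
    if key is None or not data or key not in data[0]:
        return data
    # Entries whose value is None are placed last
    return sorted(data, key=lambda x: (x[key] is None, x[key]))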

# --------------------
# Root route
# --------------------
@app.route("/")
def home():
    return """
    <h1>Hello from the Movie API!</h1>
    <p>This work is by:</p>
    <ul>
        <li>SAAD, Maissa</li>
        <li>AKZOUN, Hafsa</li>
    </ul>
    """

# --------------------
# Routes: Movies
# --------------------

@app.route("/movies/<int:id>")
def api_get_movie(id):
    if id < 0 or id >= len(movies):
        return error_response("Movie not found", 404)
    return jsonify(get_movie(id))


@app.route("/movies")
def api_movies():
    start, limit = get_pagination_params()
    order = request.args.get("order")

    data = [
        {"name": m[0], "year": m[2], "index": i}
        for i, m in enumerate(movies)
    ]

    data = sort_list(data, order)
    return jsonify(data[start:start + limit])


# --------------------
# Routes: Actors
# --------------------

@app.route("/actors/<int:id>")
def api_get_actor(id):
    if id < 0 or id >= len(actors):
        return error_response("Actor not found", 404)
    return jsonify(get_actor(id))


@app.route("/actors")
def api_actors():
    start, limit = get_pagination_params()
    order = request.args.get("order")

    data = [
        {"name": a[0], "index": i}
        for i, a in enumerate(actors)
    ]

    data = sort_list(data, order)
    return jsonify(data[start:start + limit])


@app.route("/actors/<int:id>/costars")
def api_costars(id):
    if id < 0 or id >= len(actors):
        return error_response("Actor not found", 404)

    costars = set()
    for movie_idx in actors[id][1]:
        for a in movies[movie_idx][1]:
            if a != id:
                costars.add(a)

    result = [{"name": actors[i][0], "index": i} for i in costars]
    return jsonify(result)


# --------------------
# Search routes
# --------------------

@app.route("/search/actors/<string:query>")
def api_search_actors(query):
    return jsonify(search_actor(query))


@app.route("/search/movies/<string:query>")
def api_search_movies(query):
    results = search_movie(query)

    filters = request.args.get("filter")
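    if filters:
        for f in filters.split(","):
            # Each filter must look like key:value
            key, sep, value = f.partition(":")
            if not sep:
                return error_response("Invalid filter: expected key:value")
            results = [m for m in results if str(m.get(key)) == value]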

    return jsonify(results)


# --------------------
# Distance route
# --------------------

@app.route("/actors/<int:origin>/distance/<int:destination>")
def api_distance(origin, destination):
    if origin < 0 or origin >= len(actors):
        return error_response("Origin actor not found", 404)
    if destination < 0 or destination >= len(actors):
        return error_response("Destination actor not found", 404)

    distance, path = movie_path(origin, destination)
    return jsonify([distance, path])


# --------------------
# Run server
# --------------------

if __name__ == "__main__":
    app.run()


 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [20/Dec/2025 16:38:39] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2025 16:38:41] "GET /actors/0/costars HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2025 16:38:44] "GET / HTTP/1.1" 200 -


To test this API, you can open a browser and try the following routes:
- `/movies` - List of movies  
- `/movies/0` - Get movie with index 0  
- `/actors` - List of actors  
- `/actors/0` - Get actor with index 0  
- `/actors/0/costars` - Get costars of actor 0  
- `/search/actors/<name>` - Search actors by name (replace `<name>` with the actor’s name)  
- `/search/movies/<title>` - Search movies by title (replace `<title>` with the movie title)  
- `/actors/0/distance/1` - Get distance between two actors (replace `0` and `1` with actor indices)  


### Explanation
- We implemented a REST Web API using Flask to expose our curated IMDb dataset through HTTP routes. The API relies entirely on the movies and actors lists already loaded in memory, so no file is read again when a request arrives. Every data route returns JSON, and the errors raised by our own checks share the same structure, {"error": message}, to simplify client-side handling.

- The /movies and /actors routes support pagination through the start and limit parameters, with a maximum of 100 elements returned per request, and sorting through the order parameter (for example by year or name). The co-stars and search routes return their full result list. Search routes allow partial and case-insensitive matching, and the movie search also accepts a filter parameter. Finally, the collaboration distance between two actors is computed using a breadth-first search algorithm and exposed through a dedicated route.
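The pagination and sorting rules can also be sketched independently of Flask. The `paginate` helper below is hypothetical and is not part of the API above: it takes the query parameters as a plain dictionary of strings (an assumption standing in for `request.args`) and raises `ValueError` on malformed input, which a route could then turn into a uniform error response.

```python
MAX_LIMIT = 100  # maximum number of elements per response

def paginate(items, args):
    """Return a sorted slice of `items` (a list of dicts).

    `args` is a plain dict of query parameters given as strings,
    e.g. {"start": "200", "limit": "2", "order": "year"}.
    Raises ValueError if a parameter is malformed.
    """
    start = int(args.get("start", 0))  # int() raises ValueError on "abc"
    limit = int(args.get("limit", MAX_LIMIT))
    if start < 0 or limit < 0:
        raise ValueError("start and limit must be non-negative")
    limit = min(limit, MAX_LIMIT)

    order = args.get("order")
    if order is not None:
        if any(order not in item for item in items):
            raise ValueError(f"cannot sort by unknown field {order!r}")
        # Items whose value is None are placed last instead of raising TypeError
        items = sorted(items, key=lambda x: (x[order] is None, x[order]))
    return items[start:start + limit]

# Hypothetical sample records, for illustration only
demo = [
    {"name": "B", "year": 1979, "index": 0},
    {"name": "A", "year": None, "index": 1},
    {"name": "C", "year": 1964, "index": 2},
]
print(paginate(demo, {"order": "year", "limit": "2"}))
```

Sorting happens before slicing, so `start` and `limit` always refer to positions in the sorted list.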

## Exercise 7. Test a Web API

Using `pytest`, write a program that checks that the API made in the previous exercise works as expected.

Your answer here.

Prompt
>Write a pytest suite to test a Flask API that exposes a curated IMDb dataset.
The API has routes for movies, actors, searching, co-stars, and collaboration distances.
Each test should: check status codes, check JSON responses, and check expected fields in the returned JSON.
Use Flask's test_client() for testing.
Tests should cover: getting single movies/actors, listing movies/actors with pagination and sorting, searching, co-stars, collaboration distance, and proper error handling.

We install the pytest package for the Python environment currently used by this Jupyter notebook

In [None]:
!{sys.executable} -m pip install pytest

Collecting pytest
  Downloading pytest-9.0.2-py3-none-any.whl.metadata (7.6 kB)
Collecting iniconfig>=1.0.1 (from pytest)
  Downloading iniconfig-2.3.0-py3-none-any.whl.metadata (2.5 kB)
Collecting pluggy<2,>=1.5 (from pytest)
  Downloading pluggy-1.6.0-py3-none-any.whl.metadata (4.8 kB)
Collecting tomli>=1 (from pytest)
  Downloading tomli-2.3.0-py3-none-any.whl.metadata (10 kB)
Downloading pytest-9.0.2-py3-none-any.whl (374 kB)
Downloading pluggy-1.6.0-py3-none-any.whl (20 kB)
Downloading iniconfig-2.3.0-py3-none-any.whl (7.5 kB)
Downloading tomli-2.3.0-py3-none-any.whl (14 kB)
Installing collected packages: tomli, pluggy, iniconfig, pytest
Successfully installed iniconfig-2.3.0 pluggy-1.6.0 pytest-9.0.2 tomli-2.3.0

In [64]:
# Run pytest in verbose mode to execute tests and show detailed output
! pytest -v

platform darwin -- Python 3.10.17, pytest-9.0.2, pluggy-1.6.0 -- /opt/homebrew/opt/python@3.10/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/hafsaakzoun/Desktop/Master RES 1/PROGRES/MiniProjet
plugins: anyio-4.6.2.post1
collected 0 items



In [65]:
import pytest
import json

# We assume 'app' is the Flask app object from exercise 6

@pytest.fixture
def client():
    with app.test_client() as client:
        yield client

# ----------------------------
# Test single movie
# ----------------------------
def test_get_movie(client):
    resp = client.get("/movies/0")
    assert resp.status_code == 200
    data = resp.get_json()
    assert "name" in data
    assert "year" in data
    assert "actors" in data
    
def test_get_movie_error(client):
    resp = client.get("/movies/99999999")
    assert resp.status_code == 404
    data = resp.get_json()
    assert "error" in data

# ----------------------------
# Test movies list
# ----------------------------
def test_movies_list(client):
    resp = client.get("/movies?start=0&limit=10&order=year")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    assert len(data) <= 10
    assert "name" in data[0]
    assert "year" in data[0]

# ----------------------------
# Test single actor
# ----------------------------
def test_get_actor(client):
    resp = client.get("/actors/0")
    assert resp.status_code == 200
    data = resp.get_json()
    assert "name" in data
    assert "movies" in data

def test_get_actor_error(client):
    resp = client.get("/actors/9999999")
    assert resp.status_code == 404
    data = resp.get_json()
    assert "error" in data

# ----------------------------
# Test actors list
# ----------------------------
def test_actors_list(client):
    resp = client.get("/actors?start=0&limit=5&order=name")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    assert len(data) <= 5
    assert "name" in data[0]

# ----------------------------
# Test co-stars
# ----------------------------
def test_costars(client):
    resp = client.get("/actors/0/costars")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    if data:
        assert "name" in data[0]
        assert "index" in data[0]

# ----------------------------
# Test search actors
# ----------------------------
def test_search_actors(client):
    resp = client.get("/search/actors/kevin")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    if data:
        assert "name" in data[0]
        assert "index" in data[0]

# ----------------------------
# Test search movies with filter
# ----------------------------
def test_search_movies(client):
    resp = client.get("/search/movies/gendarme?filter=year:1964")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    if data:
        assert "name" in data[0]
        assert "year" in data[0]
        assert int(data[0]["year"]) == 1964

# ----------------------------
# Test collaboration distance
# ----------------------------
def test_distance(client):
    resp = client.get("/actors/0/distance/1")
    assert resp.status_code == 200
    data = resp.get_json()
    assert isinstance(data, list)
    assert len(data) == 2
    assert isinstance(data[0], int)  # distance
    assert isinstance(data[1], list)  # path


In [66]:
# Create a test client for the Flask app
client_instance = app.test_client()

In [67]:
# List of all test functions
tests = [
    test_get_movie, test_get_movie_error, test_movies_list,
    test_get_actor, test_get_actor_error, test_actors_list,
    test_costars, test_search_actors, test_search_movies, test_distance
]

# Run each test
for test in tests:
    try:
        test(client_instance)
        print(f"{test.__name__}: ✅ Passed")
    except AssertionError as e:
        print(f"{test.__name__}: ❌ Failed")
        print(e)

test_get_movie: ✅ Passed
test_get_movie_error: ✅ Passed
test_movies_list: ✅ Passed
test_get_actor: ✅ Passed
test_get_actor_error: ✅ Passed
test_actors_list: ✅ Passed
test_costars: ✅ Passed
test_search_actors: ✅ Passed
test_search_movies: ✅ Passed
test_distance: ✅ Passed


### Explanation
- We wrote a pytest-style test suite to validate the functionality of the Flask API. Using Flask’s test_client, we can simulate HTTP requests without running the server externally. Because the tests are defined inside the notebook rather than in a test file, the `pytest -v` command collected 0 items, so we ran each test function directly with a test client. The tests check that:
  - HTTP status codes are correct (200 for success, 404 for missing resources)
  - The JSON structure contains the expected keys (name, year, actors, movies)
  - List endpoints accept the pagination (start, limit) and sorting (order) parameters and return at most limit elements
  - Search works for both actors and movies, including filters
  - The co-stars route returns a list of actors, each with a name and an index
  - The collaboration distance route returns a distance integer and a path list

Since the tests rely only on assertions and a test client, they can be rerun automatically after each change to the API.
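The list tests in this section check the length and the keys of the response, not the ordering itself. A minimal sketch of how the ordering could be asserted as well, using a hypothetical helper `is_sorted_by` applied to the decoded JSON list (the sample below stands in for `resp.get_json()` and reuses three titles from the `gendarme` search):

```python
def is_sorted_by(items, key):
    """True if the dicts in `items` appear in non-decreasing order of `key`."""
    values = [item[key] for item in items]
    return all(a <= b for a, b in zip(values, values[1:]))

# Stand-in for resp.get_json() on a route such as /movies?order=year
sample = [
    {"name": "Le gendarme de Saint-Tropez", "year": 1964},
    {"name": "Le gendarme à New York", "year": 1965},
    {"name": "Le gendarme se marie", "year": 1968},
]
assert is_sorted_by(sample, "year")
assert not is_sorted_by(sample, "name")
```

This sketch assumes every value of the chosen field is comparable; entries with a missing year would need to be filtered out first.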

## Exercise 8. Make a Website that uses the Web API

Create a Python web server using Flask. Use the Web API you developed to offer the user a graphical Web interface. This interface allows the user to obtain, by entering relevant information into a Web form:

- The complete list of movies and the complete list of costars of an actor, possibly sorted alphabetically. This actor can be searched beforehand using a substring of characters appearing in her name.
- The collaboration distance between two actors. As above, the actors can be searched beforehand using a substring of characters appearing in their names. Try to format a bit (not too much). For example:
  - The collaboration distance between Kevin Bacon and Jean Dujardin is 2.
  - Kevin Bacon played in Wild Things with Bill Murray;
  - Bill Murray played in The Monuments Men with Jean Dujardin.
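One possible way to produce this kind of formatting from the `(distance, path)` pair returned by `movie_path` is sketched below. The `format_path` function is hypothetical; it only assumes that the path alternates actor and movie names as specified in the `movie_path` exercise.

```python
def format_path(distance, path):
    """Turn the (distance, path) pair returned by movie_path into sentences."""
    if distance < 0:
        return ["No collaboration path was found."]
    origin, destination = path[0], path[-1]
    lines = [
        f"The collaboration distance between {origin} and {destination} is {distance}."
    ]
    # path alternates actor, movie, actor, movie, ..., actor
    for k in range(0, len(path) - 2, 2):
        actor, movie, costar = path[k], path[k + 1], path[k + 2]
        lines.append(f"{actor} played in {movie} with {costar}.")
    return lines

# Path taken from the movie_path(kevin_index, jean_index) output shown earlier
example = (2, ["Kevin Bacon", "Wild Things", "Bill Murray",
               "The Monuments Men", "Jean Dujardin"])
for line in format_path(*example):
    print(line)
```

Returning a list of sentences rather than one string leaves the choice of markup (paragraphs, list items) to the web page template.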

Your answer here.