# Kino.wtf Movie Helper

This notebook is used to generate the movie metadata necessary for kino.wtf. 

How to run
```sh
# Create python virtual env
python -m venv env;
# activate venv 
source env/bin/activate; # env/bin/activate.fish for fish
# install requirements
pip install pandas numpy requests beautifulsoup4 python-dotenv
```
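
The first cell reads `TMDB_API_KEY` from a `.env` file via `load_dotenv()`, so that file needs to exist next to the notebook. A minimal sketch of a guard that fails early when the variable is missing; `require_env` is a hypothetical helper, not part of the notebook or of python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return value
```

In the first cell this could stand in for the bare `os.getenv` call, for example `TMDB_API_KEY = require_env('TMDB_API_KEY')` right after `load_dotenv()`. The credits cell sends the value as a Bearer token, so it presumably has to be TMDb's API Read Access Token rather than the short v3 key; treat that as an assumption to check in your TMDb account settings.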

In [1]:
# Load the movies from a CSV file and read the TMDb API key

import csv
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
from dotenv import load_dotenv
import json

# Load in the API key from the .env file
load_dotenv()
TMDB_API_KEY = os.getenv('TMDB_API_KEY')

# Load in the movies from the CSV file (exported from Letterboxd account)
movies = pd.read_csv('movies.csv', sep=',', encoding='latin-1', usecols=['Name', 'Year', 'URL'])

movies.head()

|   | Name | Year | URL |
|---|------|------|-----|
| 1 | The Shawshank Redemption | 1994 | https://boxd.it/2aHi |
| 2 | Reservoir Dogs | 1992 | https://boxd.it/2agc |
| 3 | Fight Club | 1999 | https://boxd.it/2a9q |
| 4 | Get Out | 2017 | https://boxd.it/eOCm |
| 5 | Platoon | 1986 | https://boxd.it/29BS |


Letterboxd's export includes no TMDb or IMDb IDs, but the Letterboxd website does expose them.

Using BeautifulSoup, we can fetch each film's details page on Letterboxd and read the TMDb ID from the `data-tmdb-id` attribute of the `<body>` tag.

I'm using TMDb instead of IMDb because I have access to their API.
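
The extraction step can be sketched as a small function that works on raw HTML, which makes it easy to try without hitting the network. `extract_tmdb_id` and the sample markup are illustrative assumptions; only the `data-tmdb-id` attribute on `<body>` comes from the notebook itself, and Letterboxd may change its markup at any time:

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_tmdb_id(html: str) -> Optional[str]:
    """Return the data-tmdb-id attribute of <body>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    if body is None:
        return None
    # Tag.get returns None when the attribute does not exist
    return body.get("data-tmdb-id")


# Hand-written stand-in for a film details page
sample = '<html><body data-tmdb-id="550" class="film"><h1>Fight Club</h1></body></html>'
print(extract_tmdb_id(sample))
print(extract_tmdb_id("<p>no body tag here</p>"))
```

Returning `None` instead of raising keeps the scraping loop going when a page is missing the attribute; such rows can then be found afterwards with `movies['TMDb ID'].isnull()`.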

In [2]:
# Scrape the Letterboxd website for the movie's TMDB ID
# Use None (object dtype) so the string IDs can be stored without a dtype warning
movies['TMDb ID'] = None

# Loop through the movies and scrape the TMDB ID
for index, row in movies.iterrows():
    # only the first 5 movies for now (the index starts at 0)
    if index >= 5:
        break
    # Skip movies that already have a TMDB ID
    if not pd.isnull(row['TMDb ID']):
        continue
    
    # Get the movie's Letterboxd URL
    url = row['URL'] + '/details'
    # Get the movie's TMDB ID
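    page = requests.get(url, timeout=10)
    soup = BeautifulSoup(page.content, 'html.parser')
    body = soup.find('body')
    # The ID is a data attribute on <body>; stays None if the tag or attribute is missing
    tmdb_id = body.get('data-tmdb-id') if body else None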
    # Add the TMDB ID to the dataframe
    movies.at[index, 'TMDb ID'] = tmdb_id

# assert movies['TMDb ID'].isnull().any() == False
movies.head()



|   | Name | Year | URL | TMDb ID |
|---|------|------|-----|---------|
| 1 | The Shawshank Redemption | 1994 | https://boxd.it/2aHi | 278 |
| 2 | Reservoir Dogs | 1992 | https://boxd.it/2agc | 500 |
| 3 | Fight Club | 1999 | https://boxd.it/2a9q | 550 |
| 4 | Get Out | 2017 | https://boxd.it/eOCm | 419430 |
| 5 | Platoon | 1986 | https://boxd.it/29BS | 792 |


Now that the movies have a TMDb ID, we can call the TMDb API's credits endpoint to get the first 6 cast members of each film and add them as a column to the dataframe.
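
Picking the actors out of a credits response can be sketched separately from the HTTP call. `top_actors` is a hypothetical helper and the payload below is hand-written with placeholder names; it assumes the response has a `cast` list whose entries carry `name` and, usually, an `order` field:

```python
def top_actors(credits: dict, n: int = 6) -> list:
    """Return the names of the first n cast members, lowest 'order' first."""
    cast = credits.get("cast", [])
    # Entries without an 'order' field sort last
    ordered = sorted(cast, key=lambda member: member.get("order", float("inf")))
    return [member["name"] for member in ordered[:n]]


# Hand-written payload shaped like a credits response (placeholder names)
sample_credits = {
    "id": 550,
    "cast": [
        {"name": "Actor B", "order": 1},
        {"name": "Actor A", "order": 0},
        {"name": "Actor C", "order": 2},
    ],
}

print(top_actors(sample_credits, n=2))
```

Sorting by `order` is a defensive choice here. If the API already returns the cast in that order, slicing `cast[:6]` as the notebook does gives the same result.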

In [17]:
# Scrape the TMDB API for the movie's cast
movies['Actors'] = [[] for _ in range(len(movies))]

for index, row in movies.iterrows():
    # just do the first 5 movies for now (the index starts at 0)
    if index >= 5:
        break
    
    # Skip movies that already have actors listed
    if len(row['Actors']) > 0:
        continue

    # Get the movie's TMDB ID
    tmdb_id = row['TMDb ID']
    
    # Get the movie's cast
    url = f"https://api.themoviedb.org/3/movie/{tmdb_id}/credits?language=en-US"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {TMDB_API_KEY}"
    }

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on a bad token or unknown ID
    cast = response.json()['cast']
    # Store the first 6 actors as a list in a single cell. Use .at here:
    # .loc tries to spread the list across the selection and raises
    # "ValueError: Must have equal len keys and value when setting with an iterable"
    movies.at[index, 'Actors'] = [actor['name'] for actor in cast[:6]]

movies.head()
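
Assigning a list to a single cell with `.loc` raises `ValueError: Must have equal len keys and value when setting with an iterable`, because pandas tries to spread the list over the selected cells. A minimal sketch on a toy dataframe (actor names are placeholders), assuming a default integer index and an object-dtype column, where `.at` stores the list as one value:

```python
import pandas as pd

# Toy data; the titles mirror the notebook, the actor names are placeholders
df = pd.DataFrame({"Name": ["Fight Club", "Get Out"]})
df["Actors"] = [[] for _ in range(len(df))]  # object-dtype column of empty lists

# .at addresses exactly one cell, so the whole list is stored as a single value
df.at[0, "Actors"] = ["Actor A", "Actor B"]

print(df)
```

Keeping lists in cells is convenient for exporting to JSON later, but it does make the column awkward to filter; `df.explode("Actors")` is one way to flatten it when needed.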