# Creating the Hulu DataFrames

## Reading the files

In [1]:
import numpy as np
import pandas as pd

### api_hulu
Movie information returned by the API. Its columns provide the following data:
- `id`: Identifier of the movie in the API
- `title`: Name of the movie
- `year`: Release year of the movie
- `imdb_id`: Identifier of the movie on IMDb
- `tmdb_id`: Identifier of the movie on TMDB
- `tmdb_type`: Type of the title on TMDB
- `type`: Type of the title in the API

In [2]:
dfhulu = pd.read_csv('data/api_hulu.csv')
dfhulu.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434
1,1615420,Bullet Train,2022,tt12593682,718930
2,1583908,The Menu,2022,tt9764362,593643
3,1515072,Uncharted,2022,tt1464335,335787
4,1650358,Barbarian,2022,tt15791034,913290


### imdb_basics
Basic information about each title on IMDb. Its columns are:
- `tconst`: ID of the title on IMDb
- `titleType`: Type of the title
- `primaryTitle`: Most commonly used name of the title
- `originalTitle`: Original name of the title
- `isAdult`: Boolean flag (0 or 1) indicating whether the title is for adults
- `startYear`: Release year; for series, the year the series began
- `endYear`: Year the series ended (of little use here, since we only work with movies)
- `runtimeMinutes`: Runtime of the title in minutes
- `genres`: Comma-separated list of the title's genres

In [3]:
imdb_basics = pd.read_csv('data/title.basics.tsv', sep='\t')
imdb_basics.head()

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"
2,tt0000003,short,Poor Pierrot,Pauvre Pierrot,0,1892,\N,5,"Animation,Comedy,Romance"
3,tt0000004,short,Un bon bock,Un bon bock,0,1892,\N,12,"Animation,Short"
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,\N,1,Short
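
The IMDb dumps mark missing values with the literal string `\N`, as the `endYear` column above shows. As a hedged alternative to replacing them later, `pd.read_csv` can treat that marker as missing at load time through `na_values`. The sketch below uses a tiny in-memory sample instead of the real `data/title.basics.tsv` file; the two rows are copied from the output above and only a subset of the columns is included.

```python
import io

import pandas as pd

# Two rows in the same tab-separated layout as title.basics.tsv (subset of columns)
sample_tsv = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\truntimeMinutes\n"
    "tt0000001\tshort\tCarmencita\t1894\t\\N\t1\n"
    "tt0000002\tshort\tLe clown et ses chiens\t1892\t\\N\t5\n"
)

# na_values='\\N' tells pandas to read the IMDb placeholder as a missing value
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

print(sample["endYear"].isna().sum())
print(sample.dtypes)
```

Loading the data this way would change the later outputs in this notebook, which still display `\N`, so treat it as an option rather than a drop-in replacement.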


### imdb_ratings
Rating of each movie on IMDb. Its columns are:
- `tconst`: IMDb ID of the title
- `averageRating`: Average score given by the votes
- `numVotes`: Number of votes

In [4]:
imdb_ratings = pd.read_csv('data/title.ratings.tsv', sep='\t')
imdb_ratings.head()

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.7,2178
1,tt0000002,5.5,299
2,tt0000003,6.4,2243
3,tt0000004,5.2,193
4,tt0000005,6.2,2986


### imdb_principals
People involved in each IMDb title (directors, producers, actors, etc.). Its columns are:
- `tconst`: ID of the title on IMDb
- `ordering`: Number used to enumerate the people within each title
- `nconst`: ID of the person on IMDb
- `category`: Category of the role the person had in the title
- `job`: Specific job the person had in the title
- `characters`: For actors, the names of the characters they play

In [5]:
imdb_principals = pd.read_csv('data/title.principals.tsv', sep='\t')
imdb_principals.head()

Unnamed: 0,tconst,ordering,nconst,category,job,characters
0,tt0000001,1,nm1588970,self,\N,"[""Self""]"
1,tt0000001,2,nm0005690,director,\N,\N
2,tt0000001,3,nm0005690,producer,producer,\N
3,tt0000001,4,nm0374658,cinematographer,director of photography,\N
4,tt0000002,1,nm0721526,director,\N,\N


### imdb_crew
Directors and writers of each title on IMDb. Its columns are:
- `tconst`: ID of the title on IMDb
- `directors`: IMDb person IDs of the directors (comma-separated)
- `writers`: IMDb person IDs of the writers (comma-separated)

In [6]:
imdb_crew = pd.read_csv('data/title.crew.tsv', sep='\t')
imdb_crew.head()

Unnamed: 0,tconst,directors,writers
0,tt0000001,nm0005690,\N
1,tt0000002,nm0721526,\N
2,tt0000003,nm0721526,nm0721526
3,tt0000004,nm0721526,\N
4,tt0000005,nm0005690,\N


### imdb_name
This DataFrame contains information about each person related to titles on IMDb. Its columns are:
- `nconst`: ID of the person on IMDb
- `primaryName`: Name by which the person is best known
- `birthYear`: Year of birth
- `deathYear`: Year of death
- `primaryProfession`: The three roles the person most often fills in titles
- `knownForTitles`: Titles the person is known for

In [7]:
imdb_name = pd.read_csv('data/name.basics.tsv', sep='\t')
imdb_name.head()

Unnamed: 0,nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
0,nm0000001,Fred Astaire,1899,1987,"actor,miscellaneous,producer","tt0050419,tt0072308,tt0027125,tt0025164"
1,nm0000002,Lauren Bacall,1924,2014,"actress,soundtrack,archive_footage","tt0037382,tt0075213,tt0038355,tt0117057"
2,nm0000003,Brigitte Bardot,1934,\N,"actress,music_department,producer","tt0057345,tt0049189,tt0056404,tt0054452"
3,nm0000004,John Belushi,1949,1982,"actor,writer,music_department","tt0072562,tt0077975,tt0080455,tt0078723"
4,nm0000005,Ingmar Bergman,1918,2007,"writer,director,actor","tt0050986,tt0069467,tt0050976,tt0083922"


## Creating the main DataFrame

### Join with `imdb_basics`

In [8]:
df_main1 = dfhulu.merge(imdb_basics, how='left', left_on='imdb_id', right_on='tconst')
df_main1.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation"
1,1615420,Bullet Train,2022,tt12593682,718930,tt12593682,movie,Bullet Train,Bullet Train,0.0,2022,\N,127,"Action,Comedy,Thriller"
2,1583908,The Menu,2022,tt9764362,593643,tt9764362,movie,The Menu,The Menu,0.0,2022,\N,107,"Comedy,Horror,Thriller"
3,1515072,Uncharted,2022,tt1464335,335787,tt1464335,movie,Uncharted,Uncharted,0.0,2022,\N,116,"Action,Adventure"
4,1650358,Barbarian,2022,tt15791034,913290,tt15791034,movie,Barbarian,Barbarian,0.0,2022,\N,102,"Horror,Mystery,Thriller"
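
A left join keeps every Hulu row even when its `imdb_id` has no match in `imdb_basics`. One way to see which rows went unmatched is the `indicator` argument of `merge`, sketched below on toy frames. The second title and its ID (`tt9999999`) are hypothetical, made up for the illustration.

```python
import pandas as pd

hulu = pd.DataFrame({
    "imdb_id": ["tt12593682", "tt9999999"],  # the second ID is hypothetical
    "title": ["Bullet Train", "Unmatched example"],
})
basics = pd.DataFrame({
    "tconst": ["tt12593682"],
    "titleType": ["movie"],
})

merged = hulu.merge(
    basics,
    how="left",
    left_on="imdb_id",
    right_on="tconst",
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises if tconst is duplicated in basics
)

# Rows whose imdb_id was not found in the IMDb table
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["imdb_id", "title"]])
```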


### Join with `imdb_ratings`

In [9]:
df_main2 = df_main1.merge(imdb_ratings, how='left', left_on='imdb_id', right_on='tconst')
df_main2.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst_x,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,tconst_y,averageRating,numVotes
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation",tt36463894,7.5,44742.0
1,1615420,Bullet Train,2022,tt12593682,718930,tt12593682,movie,Bullet Train,Bullet Train,0.0,2022,\N,127,"Action,Comedy,Thriller",tt12593682,7.3,506862.0
2,1583908,The Menu,2022,tt9764362,593643,tt9764362,movie,The Menu,The Menu,0.0,2022,\N,107,"Comedy,Horror,Thriller",tt9764362,7.2,469715.0
3,1515072,Uncharted,2022,tt1464335,335787,tt1464335,movie,Uncharted,Uncharted,0.0,2022,\N,116,"Action,Adventure",tt1464335,6.3,288938.0
4,1650358,Barbarian,2022,tt15791034,913290,tt15791034,movie,Barbarian,Barbarian,0.0,2022,\N,102,"Horror,Mystery,Thriller",tt15791034,7.0,237987.0
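
Because each IMDb table brings its own `tconst` key, repeated merges leave `tconst_x` and `tconst_y` columns behind, as in the output above. A possible way to avoid them, sketched here on toy data, is to rename the key so both sides share one column name and merge with `on=`. The values are copied from the rows shown above.

```python
import pandas as pd

hulu = pd.DataFrame({"imdb_id": ["tt12593682", "tt9764362"],
                     "title": ["Bullet Train", "The Menu"]})
basics = pd.DataFrame({"tconst": ["tt12593682", "tt9764362"],
                       "runtimeMinutes": ["127", "107"]})
ratings = pd.DataFrame({"tconst": ["tt12593682", "tt9764362"],
                        "averageRating": [7.3, 7.2]})

merged = hulu
for table in (basics, ratings):
    # Renaming tconst to imdb_id lets merge use a single shared key column
    merged = merged.merge(table.rename(columns={"tconst": "imdb_id"}),
                          how="left", on="imdb_id")

print(merged.columns.tolist())
```

With this approach there is no duplicated key column left to drop in the cleaning step.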


### Join with `imdb_crew`

In [10]:
df_main3 = df_main2.merge(imdb_crew, how='left', left_on='imdb_id', right_on='tconst')
df_main3.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst_x,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,tconst_y,averageRating,numVotes,tconst,directors,writers
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation",tt36463894,7.5,44742.0,tt36463894,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1615420,Bullet Train,2022,tt12593682,718930,tt12593682,movie,Bullet Train,Bullet Train,0.0,2022,\N,127,"Action,Comedy,Thriller",tt12593682,7.3,506862.0,tt12593682,nm0500610,"nm5599654,nm2157655"
2,1583908,The Menu,2022,tt9764362,593643,tt9764362,movie,The Menu,The Menu,0.0,2022,\N,107,"Comedy,Horror,Thriller",tt9764362,7.2,469715.0,tt9764362,nm0617042,"nm2219721,nm4301557"
3,1515072,Uncharted,2022,tt1464335,335787,tt1464335,movie,Uncharted,Uncharted,0.0,2022,\N,116,"Action,Adventure",tt1464335,6.3,288938.0,tt1464335,nm0281508,"nm1988994,nm1436466,nm0391344,nm2616125,nm4731086"
4,1650358,Barbarian,2022,tt15791034,913290,tt15791034,movie,Barbarian,Barbarian,0.0,2022,\N,102,"Horror,Mystery,Thriller",tt15791034,7.0,237987.0,tt15791034,nm1199107,nm1199107


### Cleaning
#### Dropping columns
We drop the columns that hold repeated or irrelevant information.

In [11]:
df_main4 = df_main3[['id', 'imdb_id', 'title', 'primaryTitle', 'originalTitle', 'titleType', 'year', 'startYear', 
                     'isAdult', 'runtimeMinutes', 'genres', 'averageRating', 'numVotes', 'directors', 'writers']]
df_main4.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0.0,85,"Action,Adventure,Animation",7.5,44742.0,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1615420,tt12593682,Bullet Train,Bullet Train,Bullet Train,movie,2022,2022,0.0,127,"Action,Comedy,Thriller",7.3,506862.0,nm0500610,"nm5599654,nm2157655"
2,1583908,tt9764362,The Menu,The Menu,The Menu,movie,2022,2022,0.0,107,"Comedy,Horror,Thriller",7.2,469715.0,nm0617042,"nm2219721,nm4301557"
3,1515072,tt1464335,Uncharted,Uncharted,Uncharted,movie,2022,2022,0.0,116,"Action,Adventure",6.3,288938.0,nm0281508,"nm1988994,nm1436466,nm0391344,nm2616125,nm4731086"
4,1650358,tt15791034,Barbarian,Barbarian,Barbarian,movie,2022,2022,0.0,102,"Horror,Mystery,Thriller",7.0,237987.0,nm1199107,nm1199107


#### Fixing null values and column types

In [12]:
# IMDb marks missing values with the literal string '\N'; unify it with NaN/None as pd.NA
df_main5 = df_main4.replace(['\\N', np.nan, None], pd.NA)

In [13]:
df_main5.dtypes

id                 int64
imdb_id           object
title             object
primaryTitle      object
originalTitle     object
titleType         object
year               int64
startYear         object
isAdult           object
runtimeMinutes    object
genres            object
averageRating     object
numVotes          object
directors         object
writers           object
dtype: object

In [14]:
# Cast to nullable dtypes so that columns with missing values keep integer/float types
df_main5['titleType'] = df_main5['titleType'].astype('category')
df_main5['startYear'] = df_main5['startYear'].astype('Int64')
df_main5['isAdult'] = df_main5['isAdult'].astype('Int64')
df_main5['runtimeMinutes'] = df_main5['runtimeMinutes'].astype('Int64')
df_main5['averageRating'] = df_main5['averageRating'].astype('Float64')
df_main5['numVotes'] = df_main5['numVotes'].astype('Int64')

In [15]:
df_main5.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"Action,Adventure,Animation",7.5,44742,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1615420,tt12593682,Bullet Train,Bullet Train,Bullet Train,movie,2022,2022,0,127,"Action,Comedy,Thriller",7.3,506862,nm0500610,"nm5599654,nm2157655"
2,1583908,tt9764362,The Menu,The Menu,The Menu,movie,2022,2022,0,107,"Comedy,Horror,Thriller",7.2,469715,nm0617042,"nm2219721,nm4301557"
3,1515072,tt1464335,Uncharted,Uncharted,Uncharted,movie,2022,2022,0,116,"Action,Adventure",6.3,288938,nm0281508,"nm1988994,nm1436466,nm0391344,nm2616125,nm4731086"
4,1650358,tt15791034,Barbarian,Barbarian,Barbarian,movie,2022,2022,0,102,"Horror,Mystery,Thriller",7.0,237987,nm1199107,nm1199107


In [16]:
# Split the comma-separated strings into Python lists; missing values become empty lists
for col in ['genres', 'directors', 'writers']:
    df_main5[col] = df_main5[col].apply(lambda x: str(x).split(',') if pd.notna(x) else [])

df_main5.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1615420,tt12593682,Bullet Train,Bullet Train,Bullet Train,movie,2022,2022,0,127,"[Action, Comedy, Thriller]",7.3,506862,[nm0500610],"[nm5599654, nm2157655]"
2,1583908,tt9764362,The Menu,The Menu,The Menu,movie,2022,2022,0,107,"[Comedy, Horror, Thriller]",7.2,469715,[nm0617042],"[nm2219721, nm4301557]"
3,1515072,tt1464335,Uncharted,Uncharted,Uncharted,movie,2022,2022,0,116,"[Action, Adventure]",6.3,288938,[nm0281508],"[nm1988994, nm1436466, nm0391344, nm2616125, n..."
4,1650358,tt15791034,Barbarian,Barbarian,Barbarian,movie,2022,2022,0,102,"[Horror, Mystery, Thriller]",7.0,237987,[nm1199107],[nm1199107]


#### NA values

In [17]:
# Count the missing values in each column
resultados = {}
for i in df_main5.columns:
    resultados[i] = df_main5[i].isna().sum()
resultados

{'id': np.int64(0),
 'imdb_id': np.int64(0),
 'title': np.int64(0),
 'primaryTitle': np.int64(2),
 'originalTitle': np.int64(2),
 'titleType': np.int64(2),
 'year': np.int64(0),
 'startYear': np.int64(4),
 'isAdult': np.int64(2),
 'runtimeMinutes': np.int64(6),
 'genres': np.int64(0),
 'averageRating': np.int64(6),
 'numVotes': np.int64(6),
 'directors': np.int64(0),
 'writers': np.int64(0)}
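
The same per-column count can be obtained with `df.isna().sum()`. Note that `genres`, `directors` and `writers` report zero missing values only because the previous step turned missing entries into empty lists; counting empty lists gives the real picture. A minimal sketch on toy data, with both rows based on titles that appear in this notebook's outputs:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Barbarian", "The Balloonist"],
    "runtimeMinutes": pd.array([102, pd.NA], dtype="Int64"),
    "genres": [["Horror", "Mystery", "Thriller"], []],
})

# One call replaces the explicit loop over columns
na_counts = toy.isna().sum()

# Empty lists are not NA, so they have to be counted separately
empty_genres = (toy["genres"].map(len) == 0).sum()

print(na_counts)
print(empty_genres)
```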

In [18]:
df_main5[pd.isna(df_main5['runtimeMinutes'])] 

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
349,1623475,tt11422728,"Summer of Soul (...Or, When the Revolution Cou...",,,,2021,,,,[],,,[],[]
445,1968255,tt38346012,The Balloonist,,,,2025,,,,[],,,[],[]
469,11040148,tt38039379,Stay,STAY,STAY,movie,2025,,0.0,,[Horror],,,[nm3520344],[nm3520344]
549,1775541,tt30858568,Mirai,Mirai,Mirai,movie,2025,2025.0,0.0,,"[Action, Sci-Fi, Thriller]",,,"[nm6783925, nm9389259]","[nm6783925, nm12769858]"
1037,1531777,tt0154532,Ghar Sansar,Ghar Sansar,Ghar Sansar,movie,1986,1986.0,0.0,,"[Comedy, Drama]",5.4,113.0,[nm0052630],"[nm1088936, nm0434318, nm0712432]"
1075,538039,tt9726390,Robert Durst: An ID Murder Mystery,Robert Durst: An ID Murder Mystery,Robert Durst: An ID Murder Mystery,tvMiniSeries,2019,2019.0,0.0,,[Crime],6.6,68.0,[nm3042526],[]


In [19]:
df_main5[pd.isna(df_main5['averageRating'])] 

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
249,11040166,tt34601055,Frankie Quiñones: Damn That's Crazy,Untitled Frankie Quiñones Project,Untitled Frankie Quiñones Project,movie,2025,,0.0,60.0,[Comedy],,,[nm4399227],[]
349,1623475,tt11422728,"Summer of Soul (...Or, When the Revolution Cou...",,,,2021,,,,[],,,[],[]
445,1968255,tt38346012,The Balloonist,,,,2025,,,,[],,,[],[]
469,11040148,tt38039379,Stay,STAY,STAY,movie,2025,,0.0,,[Horror],,,[nm3520344],[nm3520344]
549,1775541,tt30858568,Mirai,Mirai,Mirai,movie,2025,2025.0,0.0,,"[Action, Sci-Fi, Thriller]",,,"[nm6783925, nm9389259]","[nm6783925, nm12769858]"
682,1839810,tt32915874,Lilith Fair: Building a Mystery,Lilith Fair: Building a Mystery,Lilith Fair: Building a Mystery,movie,2025,2025.0,0.0,99.0,"[Biography, Documentary, History]",,,[nm5294992],[]


#### Duplicates

In [20]:
df_main5.duplicated('imdb_id').sum()

np.int64(0)

#### Column analysis

In [21]:
df_main5.describe()

Unnamed: 0,id,year,startYear,isAdult,runtimeMinutes,averageRating,numVotes
count,1077.0,1077.0,1073.0,1075.0,1071.0,1071.0,1071.0
mean,1489755.0,2016.824513,2016.742777,0.0,105.485528,6.185621,75777.447246
std,797954.7,9.174974,9.011032,0.0,18.856208,1.067112,145531.19551
min,1867.0,1964.0,1964.0,0.0,9.0,2.0,9.0
25%,1336408.0,2012.0,2012.0,0.0,93.0,5.5,2798.5
50%,1607086.0,2021.0,2021.0,0.0,101.0,6.3,11289.0
75%,1692619.0,2023.0,2023.0,0.0,114.0,7.0,74196.5
max,11040170.0,2025.0,2025.0,0.0,191.0,9.1,1113524.0


In [23]:
dfmain = df_main5

In [24]:
dfmain.to_csv('data/imdb_hulu.csv')
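
Two things are worth keeping in mind when saving this frame. `to_csv` writes the index as an extra unnamed column unless `index=False` is passed, and list-valued columns such as `genres` are written as their string representation, so they come back as plain strings. The sketch below shows one possible way to restore them with `ast.literal_eval`; it uses an in-memory buffer instead of the real `data/imdb_hulu.csv` path.

```python
import ast
import io

import pandas as pd

movies = pd.DataFrame({
    "imdb_id": ["tt1464335"],
    "genres": [["Action", "Adventure"]],
})

buffer = io.StringIO()
movies.to_csv(buffer, index=False)  # index=False avoids an 'Unnamed: 0' column

# Without a converter the list comes back as the string "['Action', 'Adventure']"
buffer.seek(0)
as_text = pd.read_csv(buffer)

# literal_eval parses that string back into a Python list
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"genres": ast.literal_eval})

print(type(as_text.loc[0, "genres"]), type(restored.loc[0, "genres"]))
```

A binary format that preserves nested types could be another option, but that depends on optional dependencies not used in this notebook.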

## Creating the people DataFrame

### Join with `imdb_principals`

In [25]:
df_personas1 = dfmain.merge(imdb_principals, how='left', left_on='imdb_id', right_on='tconst')
df_personas1.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,...,averageRating,numVotes,directors,writers,tconst,ordering,nconst,category,job,characters
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,1.0,nm0000299,actor,\N,"[""Vandy""]"
1,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,2.0,nm0168339,actor,\N,"[""Einar""]"
2,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,3.0,nm0327779,actor,\N,"[""Torres""]"
3,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,4.0,nm2689910,actor,\N,"[""Anders""]"
4,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,5.0,nm7576527,actress,\N,"[""Freya""]"
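
This join is one-to-many: each title is repeated once per credited person, which is why the first five rows all belong to the same movie. The sketch below illustrates that on toy data and, as an optional extra that the notebook does not perform, attaches person names from a table shaped like `imdb_name`. The `nconst` values come from the output above, while the names are placeholders.

```python
import pandas as pd

titles = pd.DataFrame({"imdb_id": ["tt36463894"],
                       "title": ["Predator: Killer of Killers"]})
principals = pd.DataFrame({
    "tconst": ["tt36463894", "tt36463894"],
    "ordering": [1, 2],
    "nconst": ["nm0000299", "nm0168339"],
    "category": ["actor", "actor"],
})
# Placeholder names: the real values would come from name.basics.tsv
names = pd.DataFrame({"nconst": ["nm0000299", "nm0168339"],
                      "primaryName": ["Person A", "Person B"]})

# validate='one_to_many' raises if imdb_id is not unique on the left side
people = titles.merge(principals, how="left", left_on="imdb_id",
                      right_on="tconst", validate="one_to_many")
people = people.merge(names, how="left", on="nconst")

print(len(titles), len(people))
print(people[["title", "ordering", "primaryName"]])
```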


### Dropping columns

In [26]:
df_personas2 = df_personas1[['id', 'imdb_id', 'title', 'nconst', 'ordering', 'category', 'job', 'characters', 'primaryTitle',
                             'originalTitle', 'titleType', 'year', 'startYear', 'isAdult', 'runtimeMinutes', 'genres', 'averageRating',
                             'numVotes', 'directors', 'writers']]
df_personas2.head()

Unnamed: 0,id,imdb_id,title,nconst,ordering,category,job,characters,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,nm0000299,1.0,actor,\N,"[""Vandy""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1874486,tt36463894,Predator: Killer of Killers,nm0168339,2.0,actor,\N,"[""Einar""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
2,1874486,tt36463894,Predator: Killer of Killers,nm0327779,3.0,actor,\N,"[""Torres""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
3,1874486,tt36463894,Predator: Killer of Killers,nm2689910,4.0,actor,\N,"[""Anders""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
4,1874486,tt36463894,Predator: Killer of Killers,nm7576527,5.0,actress,\N,"[""Freya""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"


### Fixing null values and columns

In [27]:
# Replace IMDb's '\N' placeholder (and NaN/None) with pd.NA in the newly joined columns
df_personas3 = df_personas2.replace(['\\N', np.nan, None], pd.NA)
df_personas3.head()

Unnamed: 0,id,imdb_id,title,nconst,ordering,category,job,characters,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,nm0000299,1.0,actor,,"[""Vandy""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1874486,tt36463894,Predator: Killer of Killers,nm0168339,2.0,actor,,"[""Einar""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
2,1874486,tt36463894,Predator: Killer of Killers,nm0327779,3.0,actor,,"[""Torres""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
3,1874486,tt36463894,Predator: Killer of Killers,nm2689910,4.0,actor,,"[""Anders""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
4,1874486,tt36463894,Predator: Killer of Killers,nm7576527,5.0,actress,,"[""Freya""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"


In [28]:
df_personas3.dtypes

id                   int64
imdb_id             object
title               object
nconst              object
ordering            object
category            object
job                 object
characters          object
primaryTitle        object
originalTitle       object
titleType         category
year                 int64
startYear            Int64
isAdult              Int64
runtimeMinutes       Int64
genres              object
averageRating      Float64
numVotes             Int64
directors           object
writers             object
dtype: object

In [29]:
# ordering became a float/object column after the left join; restore a nullable integer
df_personas3['ordering'] = df_personas3['ordering'].astype('Int64')
df_personas3['category'] = df_personas3['category'].astype('category')
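
The `characters` column still holds JSON-encoded strings such as `["Vandy"]`. If the character names are needed as Python lists, one option is `json.loads`, sketched here on a toy Series with values taken from the output above.

```python
import json

import pandas as pd

characters = pd.Series(['["Vandy"]', '["Einar"]', pd.NA], dtype="object")

# Parse each JSON array; missing entries become empty lists
parsed = characters.apply(lambda x: json.loads(x) if pd.notna(x) else [])

print(parsed.tolist())
```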

### Null values

In [30]:
# Count the missing values in each column of the people DataFrame
resultados = {}
for i in df_personas3.columns:
    resultados[i] = df_personas3[i].isna().sum()
resultados

{'id': np.int64(0),
 'imdb_id': np.int64(0),
 'title': np.int64(0),
 'nconst': np.int64(2),
 'ordering': np.int64(2),
 'category': np.int64(2),
 'job': np.int64(16487),
 'characters': np.int64(12365),
 'primaryTitle': np.int64(2),
 'originalTitle': np.int64(2),
 'titleType': np.int64(2),
 'year': np.int64(0),
 'startYear': np.int64(14),
 'isAdult': np.int64(2),
 'runtimeMinutes': np.int64(58),
 'genres': np.int64(0),
 'averageRating': np.int64(52),
 'numVotes': np.int64(52),
 'directors': np.int64(0),
 'writers': np.int64(0)}

In [31]:
dfpersonas = df_personas3

In [32]:
dfpersonas.to_csv('data/personas_hulu.csv')