# Building the Disney+ DataFrames

## Reading the files

In [1]:
import numpy as np
import pandas as pd

### api_disney
Movie information returned by the API.
The columns provide the following data:
- `id`: Identifier of the movie in the API
- `title`: Name of the movie
- `year`: Release year of the movie
- `imdb_id`: Identifier of the movie in IMDB
- `tmdb_id`: Identifier of the movie in TMDB
- `tmdb_type`: Type of the title in TMDB
- `type`: Type of the title in the API

In [2]:
dfdisney = pd.read_csv('data/api_disney.csv')
dfdisney.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434
1,1685606,Thunderbolts*,2025,tt20969586,986056
2,1508465,Snow White,2025,tt6208148,447273
3,1560295,Doctor Strange in the Multiverse of Madness,2022,tt9419884,453395
4,138018,Avatar: The Way of Water,2022,tt1630029,76600


### imdb_basics
Basic information about each title in IMDB. Its columns are:
- `tconst`: Id of the title in IMDB
- `titleType`: Type of the title
- `primaryTitle`: Most commonly used name of the title
- `originalTitle`: Original name of the title
- `isAdult`: Flag (0 or 1) indicating whether the title is for adults
- `startYear`: Release year; for series, the year the series started
- `endYear`: Year the series ended (not very useful here, since we will only work with movies)
- `runtimeMinutes`: Runtime of the title in minutes
- `genres`: Comma-separated list of the title's genres

In [3]:
imdb_basics = pd.read_csv('data/title.basics.tsv', sep='\t')
imdb_basics.head()

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"
2,tt0000003,short,Poor Pierrot,Pauvre Pierrot,0,1892,\N,5,"Animation,Comedy,Romance"
3,tt0000004,short,Un bon bock,Un bon bock,0,1892,\N,12,"Animation,Short"
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,\N,1,Short
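The preview shows that this file marks missing values with the literal string `\N`. As a minimal sketch, assuming the same tab-separated layout, `read_csv` can be told to treat that marker as missing at load time through its `na_values` parameter. The two-row sample below is hypothetical and only mimics the file's structure.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same layout as title.basics.tsv.
sample_tsv = (
    "tconst\ttitleType\tstartYear\tendYear\truntimeMinutes\n"
    "tt0000001\tshort\t1894\t\\N\t1\n"
    "tt0000002\tshort\t1892\t\\N\t\\N\n"
)

# na_values tells read_csv to treat the literal string \N as missing.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Loading the marker as missing up front lets pandas infer numeric dtypes for columns such as `runtimeMinutes` instead of falling back to `object`.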


### imdb_ratings
Rating of each movie in IMDB. Its columns are:
- `tconst`: IMDB Id of the title
- `averageRating`: Average score given by the votes
- `numVotes`: Number of votes

In [4]:
imdb_ratings = pd.read_csv('data/title.ratings.tsv', sep='\t')
imdb_ratings.head()

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.7,2178
1,tt0000002,5.5,299
2,tt0000003,6.4,2243
3,tt0000004,5.2,193
4,tt0000005,6.2,2986


### imdb_principals
People involved in each IMDB title (directors, producers, actors, etc.). Its columns are:
- `tconst`: Id of the title in IMDB
- `ordering`: Number used to enumerate the people within each title
- `nconst`: Id of the person in IMDB
- `category`: Category of the role the person had in the title
- `job`: Specific job the person had in the title
- `characters`: For actors and actresses, the names of the characters they play

In [5]:
imdb_principals = pd.read_csv('data/title.principals.tsv', sep='\t')
imdb_principals.head()

Unnamed: 0,tconst,ordering,nconst,category,job,characters
0,tt0000001,1,nm1588970,self,\N,"[""Self""]"
1,tt0000001,2,nm0005690,director,\N,\N
2,tt0000001,3,nm0005690,producer,producer,\N
3,tt0000001,4,nm0374658,cinematographer,director of photography,\N
4,tt0000002,1,nm0721526,director,\N,\N


### imdb_crew
Directors and writers of each title in IMDB. Its columns are:
- `tconst`: Id of the title in IMDB
- `directors`: Comma-separated IMDB person Ids of the directors
- `writers`: Comma-separated IMDB person Ids of the writers

In [6]:
imdb_crew = pd.read_csv('data/title.crew.tsv', sep='\t')
imdb_crew.head()

Unnamed: 0,tconst,directors,writers
0,tt0000001,nm0005690,\N
1,tt0000002,nm0721526,\N
2,tt0000003,nm0721526,nm0721526
3,tt0000004,nm0721526,\N
4,tt0000005,nm0005690,\N


### imdb_name
This DataFrame contains information about each person associated with titles in IMDB. Its columns are:
- `nconst`: Id of the person in IMDB
- `primaryName`: Name by which the person is best known
- `birthYear`: Year of birth of the person
- `deathYear`: Year of death of the person
- `primaryProfession`: The three roles the person most often has in titles
- `knownForTitles`: Titles the person is known for

In [7]:
imdb_name = pd.read_csv('data/name.basics.tsv', sep='\t')
imdb_name.head()

Unnamed: 0,nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
0,nm0000001,Fred Astaire,1899,1987,"actor,miscellaneous,producer","tt0050419,tt0072308,tt0027125,tt0025164"
1,nm0000002,Lauren Bacall,1924,2014,"actress,soundtrack,archive_footage","tt0037382,tt0075213,tt0038355,tt0117057"
2,nm0000003,Brigitte Bardot,1934,\N,"actress,music_department,producer","tt0057345,tt0049189,tt0056404,tt0054452"
3,nm0000004,John Belushi,1949,1982,"actor,writer,music_department","tt0072562,tt0077975,tt0080455,tt0078723"
4,nm0000005,Ingmar Bergman,1918,2007,"writer,director,actor","tt0050986,tt0069467,tt0050976,tt0083922"


## Building the main DataFrame

### Join with `imdb_basics`

In [8]:
df_main1 = dfdisney.merge(imdb_basics, how='left', left_on='imdb_id', right_on='tconst')
df_main1.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation"
1,1685606,Thunderbolts*,2025,tt20969586,986056,tt20969586,movie,Thunderbolts*,Thunderbolts*,0.0,2025,\N,127,"Action,Adventure,Crime"
2,1508465,Snow White,2025,tt6208148,447273,tt6208148,movie,Snow White,Snow White,0.0,2025,\N,109,"Adventure,Family,Fantasy"
3,1560295,Doctor Strange in the Multiverse of Madness,2022,tt9419884,453395,tt9419884,movie,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,0.0,2022,\N,126,"Action,Adventure,Fantasy"
4,138018,Avatar: The Way of Water,2022,tt1630029,76600,tt1630029,movie,Avatar: The Way of Water,Avatar: The Way of Water,0.0,2022,\N,192,"Action,Adventure,Fantasy"
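A left join keeps every Disney+ row even when IMDB has no matching `tconst`, so unmatched titles only show up later as nulls. One way to spot them right at the join is the `indicator` parameter of `merge`. The sketch below uses hypothetical miniature tables (`titles`, `basics`) rather than the real data.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
titles = pd.DataFrame({"imdb_id": ["tt1", "tt2", "tt3"], "title": ["A", "B", "C"]})
basics = pd.DataFrame({"tconst": ["tt1", "tt3"], "runtimeMinutes": [85, 127]})

merged = titles.merge(
    basics, how="left", left_on="imdb_id", right_on="tconst", indicator=True
)

# Rows tagged "left_only" had no counterpart in the right-hand table.
unmatched = merged.loc[merged["_merge"] == "left_only", "imdb_id"].tolist()
print(unmatched)
```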


### Join with `imdb_ratings`

In [9]:
df_main2 = df_main1.merge(imdb_ratings, how='left', left_on='imdb_id', right_on='tconst')
df_main2.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst_x,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,tconst_y,averageRating,numVotes
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation",tt36463894,7.5,44742.0
1,1685606,Thunderbolts*,2025,tt20969586,986056,tt20969586,movie,Thunderbolts*,Thunderbolts*,0.0,2025,\N,127,"Action,Adventure,Crime",tt20969586,7.2,207464.0
2,1508465,Snow White,2025,tt6208148,447273,tt6208148,movie,Snow White,Snow White,0.0,2025,\N,109,"Adventure,Family,Fantasy",tt6208148,2.1,388047.0
3,1560295,Doctor Strange in the Multiverse of Madness,2022,tt9419884,453395,tt9419884,movie,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,0.0,2022,\N,126,"Action,Adventure,Fantasy",tt9419884,6.9,526242.0
4,138018,Avatar: The Way of Water,2022,tt1630029,76600,tt1630029,movie,Avatar: The Way of Water,Avatar: The Way of Water,0.0,2022,\N,192,"Action,Adventure,Fantasy",tt1630029,7.5,545038.0
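Because the left frame already carries a `tconst` column from the previous join, this merge produces the `tconst_x` / `tconst_y` pair seen in the output. A possible alternative, shown here on hypothetical toy frames, is to drop the earlier key and rename the right-hand key so both sides share a single join column.

```python
import pandas as pd

# Hypothetical toy frames; "main" already holds a tconst column from an earlier join.
main = pd.DataFrame(
    {"imdb_id": ["tt1", "tt2"], "tconst": ["tt1", "tt2"], "title": ["A", "B"]}
)
ratings = pd.DataFrame({"tconst": ["tt1", "tt2"], "averageRating": [7.5, 6.9]})

# Renaming the right-hand key lets both frames share one join column,
# so no tconst_x / tconst_y pair is created.
merged = main.drop(columns="tconst").merge(
    ratings.rename(columns={"tconst": "imdb_id"}), how="left", on="imdb_id"
)
print(list(merged.columns))
```

The notebook instead removes the redundant key columns in the cleaning step, which gives the same final result.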


### Join with `imdb_crew`

In [10]:
df_main3 = df_main2.merge(imdb_crew, how='left', left_on='imdb_id', right_on='tconst')
df_main3.head()

Unnamed: 0,id,title,year,imdb_id,tmdb_id,tconst_x,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,tconst_y,averageRating,numVotes,tconst,directors,writers
0,1874486,Predator: Killer of Killers,2025,tt36463894,1376434,tt36463894,movie,Predator: Killer of Killers,Predator: Killer of Killers,0.0,2025,\N,85,"Action,Adventure,Animation",tt36463894,7.5,44742.0,tt36463894,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1685606,Thunderbolts*,2025,tt20969586,986056,tt20969586,movie,Thunderbolts*,Thunderbolts*,0.0,2025,\N,127,"Action,Adventure,Crime",tt20969586,7.2,207464.0,tt20969586,nm1500577,"nm3069408,nm3355108,nm1598146,nm4987357,nm0498..."
2,1508465,Snow White,2025,tt6208148,447273,tt6208148,movie,Snow White,Snow White,0.0,2025,\N,109,"Adventure,Family,Fantasy",tt6208148,2.1,388047.0,tt6208148,nm1989536,"nm0933379,nm0342278,nm0342303"
3,1560295,Doctor Strange in the Multiverse of Madness,2022,tt9419884,453395,tt9419884,movie,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,0.0,2022,\N,126,"Action,Adventure,Fantasy",tt9419884,6.9,526242.0,tt9419884,nm0000600,"nm5642271,nm0498278,nm0228492"
4,138018,Avatar: The Way of Water,2022,tt1630029,76600,tt1630029,movie,Avatar: The Way of Water,Avatar: The Way of Water,0.0,2022,\N,192,"Action,Adventure,Fantasy",tt1630029,7.5,545038.0,tt1630029,nm0000116,"nm0000116,nm0415425,nm0798646,nm0295264,nm0004307"


### Cleaning
#### Dropping columns
We drop columns that hold repeated or irrelevant information.

In [11]:
df_main4 = df_main3[['id', 'imdb_id', 'title', 'primaryTitle', 'originalTitle', 'titleType', 'year', 'startYear', 
                     'isAdult', 'runtimeMinutes', 'genres', 'averageRating', 'numVotes', 'directors', 'writers']]
df_main4.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0.0,85,"Action,Adventure,Animation",7.5,44742.0,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1685606,tt20969586,Thunderbolts*,Thunderbolts*,Thunderbolts*,movie,2025,2025,0.0,127,"Action,Adventure,Crime",7.2,207464.0,nm1500577,"nm3069408,nm3355108,nm1598146,nm4987357,nm0498..."
2,1508465,tt6208148,Snow White,Snow White,Snow White,movie,2025,2025,0.0,109,"Adventure,Family,Fantasy",2.1,388047.0,nm1989536,"nm0933379,nm0342278,nm0342303"
3,1560295,tt9419884,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,movie,2022,2022,0.0,126,"Action,Adventure,Fantasy",6.9,526242.0,nm0000600,"nm5642271,nm0498278,nm0228492"
4,138018,tt1630029,Avatar: The Way of Water,Avatar: The Way of Water,Avatar: The Way of Water,movie,2022,2022,0.0,192,"Action,Adventure,Fantasy",7.5,545038.0,nm0000116,"nm0000116,nm0415425,nm0798646,nm0295264,nm0004307"


#### Fixing null values and column types

In [12]:
# Normalize IMDB's '\N' marker and the other null representations to pd.NA
df_main5 = df_main4.replace(['\\N', np.nan, None], pd.NA)

In [13]:
df_main5.dtypes

id                 int64
imdb_id           object
title             object
primaryTitle      object
originalTitle     object
titleType         object
year               int64
startYear         object
isAdult           object
runtimeMinutes    object
genres            object
averageRating     object
numVotes          object
directors         object
writers           object
dtype: object

In [14]:
df_main5['titleType'] = df_main5['titleType'].astype('category')
df_main5['startYear'] = df_main5['startYear'].astype('Int64')
df_main5['isAdult'] = df_main5['isAdult'].astype('Int64')
df_main5['runtimeMinutes'] = df_main5['runtimeMinutes'].astype('Int64')
df_main5['averageRating'] = df_main5['averageRating'].astype('Float64')
df_main5['numVotes'] = df_main5['numVotes'].astype('Int64')

In [15]:
df_main5.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"Action,Adventure,Animation",7.5,44742,"nm0870469,nm1945519","nm3026436,nm0870469,nm0859029,nm0859049"
1,1685606,tt20969586,Thunderbolts*,Thunderbolts*,Thunderbolts*,movie,2025,2025,0,127,"Action,Adventure,Crime",7.2,207464,nm1500577,"nm3069408,nm3355108,nm1598146,nm4987357,nm0498..."
2,1508465,tt6208148,Snow White,Snow White,Snow White,movie,2025,2025,0,109,"Adventure,Family,Fantasy",2.1,388047,nm1989536,"nm0933379,nm0342278,nm0342303"
3,1560295,tt9419884,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,movie,2022,2022,0,126,"Action,Adventure,Fantasy",6.9,526242,nm0000600,"nm5642271,nm0498278,nm0228492"
4,138018,tt1630029,Avatar: The Way of Water,Avatar: The Way of Water,Avatar: The Way of Water,movie,2022,2022,0,192,"Action,Adventure,Fantasy",7.5,545038,nm0000116,"nm0000116,nm0415425,nm0798646,nm0295264,nm0004307"


In [16]:
# Split the comma-separated strings into Python lists; missing values become empty lists
for col in ['genres', 'directors', 'writers']:
    df_main5[col] = df_main5[col].apply(lambda x: str(x).split(',') if pd.notna(x) else [])

df_main5.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1685606,tt20969586,Thunderbolts*,Thunderbolts*,Thunderbolts*,movie,2025,2025,0,127,"[Action, Adventure, Crime]",7.2,207464,[nm1500577],"[nm3069408, nm3355108, nm1598146, nm4987357, n..."
2,1508465,tt6208148,Snow White,Snow White,Snow White,movie,2025,2025,0,109,"[Adventure, Family, Fantasy]",2.1,388047,[nm1989536],"[nm0933379, nm0342278, nm0342303]"
3,1560295,tt9419884,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,Doctor Strange in the Multiverse of Madness,movie,2022,2022,0,126,"[Action, Adventure, Fantasy]",6.9,526242,[nm0000600],"[nm5642271, nm0498278, nm0228492]"
4,138018,tt1630029,Avatar: The Way of Water,Avatar: The Way of Water,Avatar: The Way of Water,movie,2022,2022,0,192,"[Action, Adventure, Fantasy]",7.5,545038,[nm0000116],"[nm0000116, nm0415425, nm0798646, nm0295264, n..."


#### NA values

In [17]:
resultados = {}
for i in df_main5.columns:
    resultados[i] = df_main5[i].isna().sum()
resultados

{'id': np.int64(0),
 'imdb_id': np.int64(0),
 'title': np.int64(0),
 'primaryTitle': np.int64(2),
 'originalTitle': np.int64(2),
 'titleType': np.int64(2),
 'year': np.int64(0),
 'startYear': np.int64(2),
 'isAdult': np.int64(2),
 'runtimeMinutes': np.int64(4),
 'genres': np.int64(0),
 'averageRating': np.int64(2),
 'numVotes': np.int64(2),
 'directors': np.int64(0),
 'writers': np.int64(0)}
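The zero counts for `genres`, `directors` and `writers` follow from the earlier conversion: missing values in those columns became empty lists, and an empty list is not NA. A minimal sketch on a hypothetical frame shows a vectorized equivalent of the loop together with a separate count of empty lists.

```python
import pandas as pd

# Hypothetical frame: one nullable integer column and one list-valued column.
frame = pd.DataFrame(
    {
        "runtimeMinutes": pd.array([85, None], dtype="Int64"),
        "genres": [["Action"], []],
    }
)

# Vectorized equivalent of the per-column loop.
na_counts = frame.isna().sum().to_dict()

# Empty lists are not NA, so they have to be counted separately.
empty_genres = int((frame["genres"].map(len) == 0).sum())
print(na_counts, empty_genres)
```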

In [18]:
df_main5[pd.isna(df_main5['runtimeMinutes'])] 

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
328,1623475,tt11422728,"Summer of Soul (...Or, When the Revolution Cou...",,,,2021,,,,[],,,[],[]
393,1968255,tt38346012,The Balloonist,,,,2025,,,,[],,,[],[]
794,538039,tt9726390,Robert Durst: An ID Murder Mystery,Robert Durst: An ID Murder Mystery,Robert Durst: An ID Murder Mystery,tvMiniSeries,2019,2019.0,0.0,,[Crime],6.6,68.0,[nm3042526],[]
796,1942789,tt32059187,Jade Eyed Leopard,Jade Eyed Leopard,Jade Eyed Leopard,movie,2020,2020.0,0.0,,[Documentary],6.8,9.0,"[nm0431044, nm0431049]",[]


In [19]:
df_main5[pd.isna(df_main5['averageRating'])] 

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
328,1623475,tt11422728,"Summer of Soul (...Or, When the Revolution Cou...",,,,2021,,,,[],,,[],[]
393,1968255,tt38346012,The Balloonist,,,,2025,,,,[],,,[],[]


#### Duplicates

In [20]:
df_main5.duplicated('imdb_id').sum()

np.int64(0)
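Here the check finds no repeated `imdb_id`. If it ever did, passing `keep=False` to `duplicated` would list every copy of a repeated key rather than only the later ones, which helps when deciding which row to keep. The frame below is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame with one repeated key.
titles = pd.DataFrame(
    {"imdb_id": ["tt1", "tt2", "tt1"], "title": ["A", "B", "A (re-release)"]}
)

# Default behaviour: only the later repeats are flagged.
n_repeats = int(titles.duplicated("imdb_id").sum())

# keep=False marks every member of a duplicated group.
all_copies = titles[titles.duplicated("imdb_id", keep=False)]
print(n_repeats, all_copies["title"].tolist())
```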

#### Column analysis

In [21]:
df_main5.describe()

Unnamed: 0,id,year,startYear,isAdult,runtimeMinutes,averageRating,numVotes
count,800.0,800.0,798.0,798.0,796.0,798.0,798.0
mean,1364106.0,2005.835,2005.839599,0.0,93.124372,6.557393,150428.874687
std,871394.8,17.800951,17.760058,0.0,29.983364,1.061433,271302.054642
min,1867.0,1938.0,1937.0,0.0,3.0,1.6,9.0
25%,1184745.0,1998.0,1998.0,0.0,81.0,6.0,3285.0
50%,1404296.0,2010.0,2010.0,0.0,96.0,6.7,26974.0
75%,1663140.0,2019.0,2019.0,0.0,108.0,7.3,162386.5
max,11026850.0,2025.0,2025.0,0.0,192.0,9.2,1530708.0


In [22]:
dfmain = df_main5

In [23]:
dfmain.to_csv('data/imdb_disney.csv')
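One caveat when saving to CSV: list-valued columns such as `genres` are written as their text representation, so they come back as strings when the file is read again. The sketch below, on a hypothetical frame and an in-memory buffer, shows one way to restore them with `ast.literal_eval`. It passes `index=False` only to keep the sample small; by default `to_csv` also writes the index as an extra column.

```python
import ast
import io
import pandas as pd

# Hypothetical frame with a list-valued column.
frame = pd.DataFrame({"imdb_id": ["tt1", "tt2"], "genres": [["Action", "Crime"], []]})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# After the round trip the lists come back as plain strings.
reloaded = pd.read_csv(buffer)
as_text = reloaded["genres"].iloc[0]

# ast.literal_eval parses those Python-literal strings back into lists.
reloaded["genres"] = reloaded["genres"].apply(ast.literal_eval)
print(type(as_text).__name__, reloaded["genres"].tolist())
```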

## Building the people DataFrame

### Join with `imdb_principals`

In [24]:
df_personas1 = dfmain.merge(imdb_principals, how='left', left_on='imdb_id', right_on='tconst')
df_personas1.head()

Unnamed: 0,id,imdb_id,title,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,...,averageRating,numVotes,directors,writers,tconst,ordering,nconst,category,job,characters
0,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,1.0,nm0000299,actor,\N,"[""Vandy""]"
1,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,2.0,nm0168339,actor,\N,"[""Einar""]"
2,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,3.0,nm0327779,actor,\N,"[""Torres""]"
3,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,4.0,nm2689910,actor,\N,"[""Anders""]"
4,1874486,tt36463894,Predator: Killer of Killers,Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,...,7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]",tt36463894,5.0,nm7576527,actress,\N,"[""Freya""]"
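Unlike the earlier joins, this one is one-to-many: each movie row is repeated once per credited person, as the five Predator rows in the preview show. A hedged sketch on hypothetical toy frames illustrates the row multiplication and uses the `validate` parameter of `merge` to assert that the key is unique on the left side.

```python
import pandas as pd

# Hypothetical toy frames: two movies, three credits for the first one.
movies = pd.DataFrame({"imdb_id": ["tt1", "tt2"], "title": ["A", "B"]})
principals = pd.DataFrame(
    {"tconst": ["tt1", "tt1", "tt1"], "nconst": ["nm1", "nm2", "nm3"]}
)

# validate raises a MergeError if imdb_id is not unique in the left frame.
people = movies.merge(
    principals, how="left", left_on="imdb_id", right_on="tconst",
    validate="one_to_many",
)

rows_per_title = people.groupby("imdb_id").size().to_dict()
print(rows_per_title)
```

A movie with no credits still contributes one row with a missing `nconst`, so a few nulls in the person columns are expected after a left join.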


### Dropping columns

In [25]:
df_personas2 = df_personas1[['id', 'imdb_id', 'title', 'nconst', 'ordering', 'category', 'job', 'characters', 'primaryTitle',
                             'originalTitle', 'titleType', 'year', 'startYear', 'isAdult', 'runtimeMinutes', 'genres', 'averageRating',
                             'numVotes', 'directors', 'writers']]
df_personas2.head()

Unnamed: 0,id,imdb_id,title,nconst,ordering,category,job,characters,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,nm0000299,1.0,actor,\N,"[""Vandy""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1874486,tt36463894,Predator: Killer of Killers,nm0168339,2.0,actor,\N,"[""Einar""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
2,1874486,tt36463894,Predator: Killer of Killers,nm0327779,3.0,actor,\N,"[""Torres""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
3,1874486,tt36463894,Predator: Killer of Killers,nm2689910,4.0,actor,\N,"[""Anders""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
4,1874486,tt36463894,Predator: Killer of Killers,nm7576527,5.0,actress,\N,"[""Freya""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"


### Fixing null values and columns

In [26]:
df_personas3 = df_personas2.replace(['\\N', np.nan, None], pd.NA)
df_personas3.head()

Unnamed: 0,id,imdb_id,title,nconst,ordering,category,job,characters,primaryTitle,originalTitle,titleType,year,startYear,isAdult,runtimeMinutes,genres,averageRating,numVotes,directors,writers
0,1874486,tt36463894,Predator: Killer of Killers,nm0000299,1.0,actor,,"[""Vandy""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
1,1874486,tt36463894,Predator: Killer of Killers,nm0168339,2.0,actor,,"[""Einar""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
2,1874486,tt36463894,Predator: Killer of Killers,nm0327779,3.0,actor,,"[""Torres""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
3,1874486,tt36463894,Predator: Killer of Killers,nm2689910,4.0,actor,,"[""Anders""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
4,1874486,tt36463894,Predator: Killer of Killers,nm7576527,5.0,actress,,"[""Freya""]",Predator: Killer of Killers,Predator: Killer of Killers,movie,2025,2025,0,85,"[Action, Adventure, Animation]",7.5,44742,"[nm0870469, nm1945519]","[nm3026436, nm0870469, nm0859029, nm0859049]"
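The `characters` values in the preview look like JSON arrays stored as text (for example `["Vandy"]`). Assuming that format holds throughout the column, they could be parsed into Python lists with `json.loads`, in the same spirit as the list conversion applied to `genres`. The series below is hypothetical; its third value is invented for illustration.

```python
import json
import pandas as pd

# Hypothetical sample of the characters column, including a missing value.
characters = pd.Series(['["Vandy"]', pd.NA, '["Self","Narrator"]'], dtype="object")

# json.loads turns each JSON array string into a list; missing values become [].
parsed = characters.apply(
    lambda value: json.loads(value) if pd.notna(value) else []
)
print(parsed.tolist())
```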


In [27]:
df_personas3.dtypes

id                   int64
imdb_id             object
title               object
nconst              object
ordering            object
category            object
job                 object
characters          object
primaryTitle        object
originalTitle       object
titleType         category
year                 int64
startYear            Int64
isAdult              Int64
runtimeMinutes       Int64
genres              object
averageRating      Float64
numVotes             Int64
directors           object
writers             object
dtype: object

In [28]:
df_personas3['ordering'] = df_personas3['ordering'].astype('Int64')
df_personas3['category'] = df_personas3['category'].astype('category')

### Null values

In [29]:
resultados = {}
for i in df_personas3.columns:
    resultados[i] = df_personas3[i].isna().sum()
resultados

{'id': np.int64(0),
 'imdb_id': np.int64(0),
 'title': np.int64(0),
 'nconst': np.int64(3),
 'ordering': np.int64(3),
 'category': np.int64(3),
 'job': np.int64(11951),
 'characters': np.int64(8686),
 'primaryTitle': np.int64(2),
 'originalTitle': np.int64(2),
 'titleType': np.int64(2),
 'year': np.int64(0),
 'startYear': np.int64(2),
 'isAdult': np.int64(2),
 'runtimeMinutes': np.int64(15),
 'genres': np.int64(0),
 'averageRating': np.int64(2),
 'numVotes': np.int64(2),
 'directors': np.int64(0),
 'writers': np.int64(0)}

In [30]:
dfpersonas = df_personas3

In [31]:
dfpersonas.to_csv('data/personas_disney.csv')