<a href="https://colab.research.google.com/github/nmuraro/entregables/blob/main/docs/labs/lab_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# MAT281 - Lab No. 03





**Objective**: Apply advanced pandas data manipulation and analysis techniques to a real dataset of Netflix content, reinforcing good practices and efficient methods without resorting to `groupby`, `merge`, `pivot`, or `join`.



**Dataset**:

We will work with the file `netflix_titles.csv`, which contains information about the titles available on Netflix up to 2021.

| Variable       | Data type | Description                                                                  |
|----------------|-----------|------------------------------------------------------------------------------|
| show_id        | string    | Unique identifier of the title in the Netflix catalog.                       |
| type           | string    | Content type: 'Movie' or 'TV Show'.                                          |
| title          | string    | Title of the content.                                                        |
| director       | string    | Director's name (may be null).                                               |
| cast           | string    | List of main actors (may be null).                                           |
| country        | string    | Country or countries where the content was produced.                         |
| date_added     | date      | Date the title was added to the Netflix catalog.                             |
| release_year   | integer   | Original release year of the title.                                          |
| rating         | string    | Age rating (for example: 'PG-13', 'TV-MA').                                  |
| duration       | string    | Duration (minutes for movies, number of seasons for TV shows).               |
| listed_in      | string    | Categories or genres under which the content is listed.                      |
| description    | string    | Short synopsis of the content.                                               |




In [3]:
import pandas as pd

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/fralfaro/MAT281/main/docs/labs/data/netflix_titles.csv')
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...



### Part 1: Cleaning and preparation

1. Review and describe the dataset:

   * How many rows and columns does it have?
   * What data types are present?
   * How many null values are there per column?

2. Convert the `date_added` column to a date type.

3. Create auxiliary columns with `assign`:

   * Year (`year_added`)
   * Month (`month_added`)



In [4]:
# Question 1
df.shape   # 8807 rows and 12 columns

(8807, 12)

In [5]:
df.dtypes  # data type of each column

Unnamed: 0,0
show_id,object
type,object
title,object
director,object
cast,object
country,object
date_added,object
release_year,int64
rating,object
duration,object


In [6]:
df.isnull().sum()  # number of null values per column

Unnamed: 0,0
show_id,0
type,0
title,0
director,2634
cast,825
country,831
date_added,10
release_year,0
rating,4
duration,3
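
The three questions can also be read off a single summary, and expressing nulls as a percentage of rows makes it easier to judge how much of each column is missing (for example `director` versus `rating`). A minimal sketch on a hypothetical four-row frame, not the Netflix data; the `null_pct` column is an illustrative addition, not something the task asks for:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "director": ["A", np.nan, np.nan, "B"],
    "release_year": [2020, 2021, 2019, 2018],
})

n_rows, n_cols = toy.shape                        # rows and columns
dtypes = toy.dtypes                               # data type of each column
null_counts = toy.isna().sum()                    # nulls per column (isna is an alias of isnull)
null_pct = toy.isna().mean().mul(100).round(1)    # nulls as a percentage of rows

print(n_rows, n_cols)
print(pd.DataFrame({"dtype": dtypes, "nulls": null_counts, "pct": null_pct}))
```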


In [7]:
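# Question 2
# Strip surrounding whitespace before parsing; values that cannot be parsed become NaT (errors="coerce")
df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")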
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [8]:
# Question 3
df = df.assign(
    year_added = df["date_added"].dt.year,
    month_added = df["date_added"].dt.month
)
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021.0,9.0
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021.0,9.0
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021.0,9.0
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021.0,9.0
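
`year_added` and `month_added` print as `2021.0` and `9.0` because `date_added` contains missing values, which become `NaT`, and `.dt.year` / `.dt.month` return floats with `NaN` when `NaT` is present. A minimal sketch on hypothetical toy values, assuming a pandas version with the nullable `Int64` dtype; the explicit `format="%B %d, %Y"` is an assumption about how the dates are written:

```python
import pandas as pd

# Hypothetical toy values shaped like `date_added`: one with a leading space, one missing
toy = pd.DataFrame({"date_added": ["September 25, 2021", " August 4, 2017", None]})

# Strip whitespace, then parse with an explicit format (an assumption about the text layout);
# missing or unparseable values become NaT
toy["date_added"] = pd.to_datetime(
    toy["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# .dt.year / .dt.month return floats when NaT is present; nullable Int64 keeps integers with <NA>
toy = toy.assign(
    year_added=toy["date_added"].dt.year.astype("Int64"),
    month_added=toy["date_added"].dt.month.astype("Int64"),
)
print(toy)
```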


### Part 2: Advanced pandas techniques

4. Use `.loc` to select movies (`type == 'Movie'`) that were added after 2018.

5. Use `str.contains()` and `str.extract()`:

   * Filter titles that contain the word 'love' (case-insensitive).
   * Extract the duration in minutes for movies from the `duration` column.

6. Apply `explode()` to the `listed_in` column to get one row per genre.

7. Get the top 10 most frequent genres using `value_counts()`.

8. Use `where()` and `mask()` to flag movies longer than 120 minutes as long content in a new column.

9. Use `.loc` to filter movies that meet all of the following:

   * More than 100 minutes long.
   * Rating equal to `'R'`.
   * Country equal to `'United States'`.

10. Use `.style` to visually format the top 10 longest movies.

In [9]:
# Use .loc to select movies (type == 'Movie') that were added after 2018.

df_new=df.loc[(df['year_added']>2018) & (df['type']=='Movie')]
df_new.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0
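
Each condition inside `.loc` is a boolean Series. Rows whose `year_added` is missing (titles without `date_added`) compare as `False`, so they are silently excluded rather than raising an error. A small sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; NaN in year_added stands for a title without date_added
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "year_added": [2021.0, 2017.0, 2020.0, np.nan],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than > and ==.
mask = (toy["year_added"] > 2018) & (toy["type"] == "Movie")
recent_movies = toy.loc[mask]

print(mask.tolist())  # NaN > 2018 evaluates to False, so the undated movie is left out
```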


In [10]:
# Use str.contains() and str.extract():

# - Filter titles that contain the word 'love' (case-insensitive).
# - Extract the duration in minutes for movies from the duration column.

# case=False must be the boolean: the string 'False' is truthy and keeps the match case-sensitive
df_dos = df.loc[df['title'].str.contains('love', case=False, na=False)]
# .copy() makes df_tres independent, so adding a column does not raise SettingWithCopyWarning
df_tres = df_dos.loc[df_dos['type'] == 'Movie'].copy()
df_tres['new_duration'] = df_tres["duration"].str.extract(r"(\d+)", expand=False).astype(float)
df_tres.head()  # only movies whose title contains "love"; new_duration holds the length in minutes


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,new_duration
158,s159,Movie,Love Don't Cost a Thing,Troy Byer,"Nick Cannon, Christina Milian, Kenan Thompson,...",United States,2021-09-01,2003,PG-13,101 min,"Comedies, Romantic Movies",A nerdy teen tries to make himself cool by ass...,2021.0,9.0,101.0
159,s160,Movie,Love in a Puff,Pang Ho-cheung,"Miriam Chin Wah Yeung, Shawn Yue, Singh Hartih...",Hong Kong,2021-09-01,2010,TV-MA,103 min,"Comedies, Dramas, International Movies",When the Hong Kong government enacts a ban on ...,2021.0,9.0,103.0
206,s207,Movie,"LSD: Love, Sex Aur Dhokha",Dibakar Banerjee,"Nushrat Bharucha, Anshuman Jha, Neha Chauhan, ...",India,2021-08-27,2010,TV-MA,112 min,"Dramas, Independent Movies, International Movies",This provocative drama examines how the voyeur...,2021.0,8.0,112.0
227,s228,Movie,Really Love,Angel Kristi Williams,"Kofi Siriboe, Yootha Wong-Loi-Sing, Michael Ea...",United States,2021-08-25,2020,TV-MA,95 min,"Dramas, Independent Movies, Romantic Movies",A rising Black painter tries to break into a c...,2021.0,8.0,95.0
246,s247,Movie,Man in Love,Yin Chen-hao,"Roy Chiu, Ann Hsu, Tsai Chen-nan, Chung Hsin-l...",,2021-08-20,2021,TV-MA,115 min,"Dramas, International Movies, Romantic Movies",When he meets a debt-ridden woman who's caring...,2021.0,8.0,115.0
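
The `case` argument of `str.contains` is a boolean flag. A non-empty string such as `'False'` is truthy, so passing it leaves the search case-sensitive. A plain substring search for "love" also matches words such as "Glove". A sketch on hypothetical titles compares the variants; the word-boundary pattern `\blove\b` is one possible reading of "the word 'love'", not something the task prescribes:

```python
import pandas as pd

# Hypothetical toy titles and durations
toy = pd.DataFrame({
    "title": ["Love Story", "Lovesick", "Endless love", "Glove Box", "Alone"],
    "type": ["Movie", "TV Show", "Movie", "Movie", "Movie"],
    "duration": ["135 min", "3 Seasons", "118 min", "90 min", "100 min"],
})

# bool("False") is True, so case="False" behaves like case=True (case-sensitive)
sensitive = toy["title"].str.contains("love", case="False")
# The boolean False turns on case-insensitive matching
insensitive = toy["title"].str.contains("love", case=False)
# \b word boundaries skip substrings such as "Glove" or "Lovesick"
whole_word = toy["title"].str.contains(r"\blove\b", case=False)

# Movies whose title contains "love", with minutes taken from the number before "min"
movies = toy.loc[insensitive & (toy["type"] == "Movie")].copy()
movies["new_duration"] = movies["duration"].str.extract(r"(\d+)\s*min", expand=False).astype(float)
print(movies)
```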


In [11]:
# Apply explode() to the listed_in column to get one row per genre.

# dropna() without subset removes every row with a missing value in ANY column
# (director, cast, country, ...), not only in listed_in; .copy() avoids SettingWithCopyWarning
df_exploded = df.dropna().copy()
# split(",") keeps the space that follows each comma, so every genre after the first starts with " "
df_exploded["listed_in"] = df_exploded["listed_in"].str.split(",")
df_exploded = df_exploded.explode("listed_in")
df_exploded.head()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,Dramas,"On a photo shoot in Ghana, an American model s...",2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,Independent Movies,"On a photo shoot in Ghana, an American model s...",2021.0,9.0
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,International Movies,"On a photo shoot in Ghana, an American model s...",2021.0,9.0
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,2021-09-24,2021,TV-14,9 Seasons,British TV Shows,A talented batch of amateur bakers face off in...,2021.0,9.0
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,2021-09-24,2021,TV-14,9 Seasons,Reality TV,A talented batch of amateur bakers face off in...,2021.0,9.0
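
Two details of this cell shape the result: `dropna()` without `subset` keeps only titles that have no missing value in any column, and `explode()` repeats the original index once per list element. A sketch on a hypothetical three-row frame, restricting `dropna` to `listed_in` and stripping the split values (both are choices, not requirements of the task):

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalogue: `director` is missing in one row, `listed_in` never is
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, "Y"],
    "listed_in": ["Dramas, Comedies", "Documentaries", "Comedies"],
})

print(len(toy.dropna()))                      # any-column dropna also removes row "B"
print(len(toy.dropna(subset=["listed_in"])))  # only rows missing listed_in are removed

# split -> explode -> strip: one row per genre, the original index repeated per genre
exploded = (
    toy.dropna(subset=["listed_in"])
    .assign(listed_in=lambda d: d["listed_in"].str.split(","))
    .explode("listed_in")
    .assign(listed_in=lambda d: d["listed_in"].str.strip())
)
print(exploded[["title", "listed_in"]])
```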


In [12]:
# Get the top 10 most frequent genres using value_counts().

top10_genres = df_exploded["listed_in"].value_counts().head(10)
print(top10_genres)

listed_in
 International Movies       2260
Dramas                      1518
Comedies                    1127
Action & Adventure           806
 Dramas                      775
 Independent Movies          720
 Romantic Movies             576
 Thrillers                   485
Children & Family Movies     469
 Comedies                    426
Name: count, dtype: int64
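
Several genres appear twice in this count (`Dramas` and ` Dramas`, `Comedies` and ` Comedies`) because `str.split(",")` keeps the space after each comma, so the same genre becomes two different strings. A sketch on hypothetical exploded values shows that stripping before `value_counts()` merges them; the challenge cell further down applies the same `str.strip()`:

```python
import pandas as pd

# Hypothetical `listed_in` values, already split on "," and exploded
genres = pd.Series(["Dramas", " Comedies", "Comedies", " Dramas", " Dramas"])

raw_counts = genres.value_counts()                 # " Dramas" and "Dramas" counted separately
clean_counts = genres.str.strip().value_counts()   # whitespace removed before counting

top10 = clean_counts.head(10)
print(top10)
```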


In [13]:
# Use where() to flag movies longer than 120 minutes as long content in a new column.
df_Long= df.copy()
df_Long['new_duration']=df_Long["duration"].str.extract(r"(\d+)").astype(float)
df_Long=df_Long.loc[df_Long['type']=='Movie']
df_Long["content_length_where"] = "Long"
df_Long["content_length_where"] = df_Long["content_length_where"].where(
    df_Long["new_duration"] > 120,
    "Short"
)
df_Long.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,new_duration,content_length_where
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0,90.0,Short
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0,91.0,Short
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0,125.0,Long
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0,104.0,Short
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0,127.0,Long


In [14]:
# Use mask() to flag movies longer than 120 minutes as long content in a new column.

df_short= df.copy()
df_short['new_duration']=df_short["duration"].str.extract(r"(\d+)").astype(float)
df_short=df_short.loc[df_short['type']=='Movie']
df_short["content_length_mask"] = "Short"
df_short["content_length_mask"] = df_short["content_length_mask"].mask(
    df_short["new_duration"] > 120,
    "Long"
)
df_short.head()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,new_duration,content_length_mask
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021.0,9.0,90.0,Short
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021.0,9.0,91.0,Short
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021.0,9.0,125.0,Long
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021.0,9.0,104.0,Short
12,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021.0,9.0,127.0,Long
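
`where` and `mask` are mirror images: `s.where(cond, other)` keeps `s` where `cond` is `True`, while `s.mask(cond, other)` replaces it there. That is why the two cells above start from opposite default labels. A short sketch on hypothetical durations confirms both produce the same labels, and shows that a missing duration compares as `False` and therefore ends up as `"Short"`:

```python
import numpy as np
import pandas as pd

# Hypothetical movie durations in minutes; NaN stands for a missing duration
minutes = pd.Series([90.0, 125.0, 120.0, np.nan])
is_long = minutes > 120  # 120 itself is not "more than 120"; NaN > 120 is False

# where(): keep the value where the condition is True, replace it elsewhere
label_where = pd.Series("Long", index=minutes.index).where(is_long, "Short")
# mask(): replace the value where the condition is True, keep it elsewhere
label_mask = pd.Series("Short", index=minutes.index).mask(is_long, "Long")

print(label_where.tolist())
```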


In [15]:
# Use .loc to filter movies longer than 100 minutes, rated 'R', from 'United States'.
df_Movie = df.copy()
df_Movie['new_duration'] = df_Movie["duration"].str.extract(r"(\d+)").astype(float)
df_Movie = df_Movie.loc[(df_Movie['type'] == 'Movie') & (df_Movie['country'] == 'United States')
                        & (df_Movie['rating'] == 'R') & (df_Movie['new_duration'] > 100)]
df_Movie.head()  # movies over 100 minutes, rated R, produced in the United States


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added,month_added,new_duration
48,s49,Movie,Training Day,Antoine Fuqua,"Denzel Washington, Ethan Hawke, Scott Glenn, T...",United States,2021-09-16,2001,R,122 min,"Dramas, Thrillers",A rookie cop with one day to prove himself to ...,2021.0,9.0,122.0
81,s82,Movie,Kate,Cedric Nicolas-Troyan,"Mary Elizabeth Winstead, Jun Kunimura, Woody H...",United States,2021-09-10,2021,R,106 min,Action & Adventure,"Slipped a fatal poison on her final job, a rut...",2021.0,9.0,106.0
131,s132,Movie,Blade Runner: The Final Cut,Ridley Scott,"Harrison Ford, Rutger Hauer, Sean Young, Edwar...",United States,2021-09-01,1982,R,117 min,"Action & Adventure, Classic Movies, Cult Movies","In a smog-choked dystopian Los Angeles, blade ...",2021.0,9.0,117.0
139,s140,Movie,Do the Right Thing,Spike Lee,"Danny Aiello, Ossie Davis, Ruby Dee, Richard E...",United States,2021-09-01,1989,R,120 min,"Classic Movies, Comedies, Dramas","On a sweltering day in Brooklyn, simmering rac...",2021.0,9.0,120.0
144,s145,Movie,House Party,Reginald Hudlin,"Christopher Reid, Christopher Martin, Robin Ha...",United States,2021-09-01,1990,R,104 min,"Comedies, Cult Movies","Grounded by his strict father, Kid risks life ...",2021.0,9.0,104.0
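
The comparison `country == 'United States'` is exact, so co-productions (such as the `"United States, Ghana, ..."` entry for *Sankofa* shown earlier) are excluded. That matches the task as written; if co-productions should count, a literal substring match is one alternative. A sketch on hypothetical rows contrasting both:

```python
import pandas as pd

# Hypothetical rows; `country` may list several production countries
toy = pd.DataFrame({
    "country": ["United States", "Italy, United States", "India", None],
    "rating": ["R", "R", "R", "R"],
    "new_duration": [122.0, 229.0, 150.0, 110.0],
})
common = (toy["rating"] == "R") & (toy["new_duration"] > 100)

# Exact comparison: only titles whose country is exactly "United States"
exact = toy.loc[common & (toy["country"] == "United States")]
# Substring match (literal, not regex): also keeps co-productions; na=False treats None as no match
includes_us = toy["country"].str.contains("United States", regex=False, na=False)
broad = toy.loc[common & includes_us]

print(exact.index.tolist(), broad.index.tolist())
```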


In [16]:
# Use .style to visually format the top 10 longest movies.

df_Top = df.copy()
df_Top['new_duration'] = df_Top["duration"].str.extract(r"(\d+)").astype(float)
df_Top = df_Top.loc[df_Top['type'] == 'Movie']

top10_longest = (
    df_Top
    .dropna(subset=['new_duration'])
    .nlargest(10, 'new_duration')
    [["title", "new_duration", "rating", "country", "date_added"]]
)

(
    top10_longest.style
    .bar(subset=["new_duration"], color="#FFA07A")  # horizontal bar proportional to the duration
    .highlight_max(subset=["new_duration"], color="lightgreen")  # highlight the longest movie
    .set_caption("🎬 Top 10 longest movies")
    .format({"new_duration": "{:.0f} min"})  # display durations as whole minutes
)



Unnamed: 0,title,new_duration,rating,country,date_added
4253,Black Mirror: Bandersnatch,312 min,TV-MA,United States,2018-12-28 00:00:00
717,Headspace: Unwind Your Mind,273 min,TV-G,,2021-06-15 00:00:00
2491,The School of Mischief,253 min,TV-14,Egypt,2020-05-21 00:00:00
2487,No Longer kids,237 min,TV-14,Egypt,2020-05-21 00:00:00
2484,Lock Your Girls In,233 min,TV-PG,,2020-05-21 00:00:00
2488,Raya and Sakina,230 min,TV-14,,2020-05-21 00:00:00
166,Once Upon a Time in America,229 min,R,"Italy, United States",2021-09-01 00:00:00
7932,Sangam,228 min,TV-14,India,2019-12-31 00:00:00
1019,Lagaan,224 min,PG,"India, United Kingdom",2021-04-17 00:00:00
4573,Jodhaa Akbar,214 min,TV-14,India,2018-10-01 00:00:00
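
`nlargest(n, column)` returns the `n` rows with the highest values of that column in descending order, with missing values placed after real ones, so it is a compact alternative to sorting the whole frame. A sketch on hypothetical durations compares it with `sort_values(...).head(n)`:

```python
import pandas as pd

# Hypothetical durations; None stands for a duration that could not be parsed
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "new_duration": [90.0, 312.0, None, 125.0],
})

# nlargest picks the n rows with the highest values, sorted in descending order
top2 = toy.nlargest(2, "new_duration")
# Same selection by sorting everything (NaN is placed last by default) and taking the head
top2_sorted = toy.sort_values("new_duration", ascending=False).head(2)

print(top2)
```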




### Challenge question

11. What are the most frequent genre and rating combinations in the dataset?
    (Hint: after applying `explode()`, use `value_counts` with `subset=["genre", "rating"]`, where `genre` is the exploded `listed_in` column.)



### Bonus: Duplicate analysis and cleaning

12. Are there movies with the same name (`title`) but a different release year (`release_year`)?
13. How many unique titles are there in total in the `title` column?





In [18]:
# What are the most frequent genre and rating combinations in the dataset? (Hint: use value_counts with subset=["genre", "rating"] after applying explode(); here the genre column is listed_in.)

df_exp = df.copy()
df_exp["listed_in"] = df_exp["listed_in"].astype(str).str.split(",")
df_exp = df_exp.explode("listed_in")
df_exp["listed_in"] = df_exp["listed_in"].str.strip()

combo_counts = (
    df_exp.value_counts(subset=["listed_in", "rating"])
    .reset_index(name="count")
    .sort_values("count", ascending=False)
)
combo_counts.head(10)

Unnamed: 0,listed_in,rating,count
0,International Movies,TV-MA,1130
1,International Movies,TV-14,1065
2,Dramas,TV-MA,830
3,International TV Shows,TV-MA,714
4,Dramas,TV-14,693
5,International TV Shows,TV-14,472
6,Comedies,TV-14,465
7,TV Dramas,TV-MA,434
8,Comedies,TV-MA,431
9,Dramas,R,375
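
`DataFrame.value_counts(subset=...)` counts each distinct combination of the listed columns and already returns the counts sorted in descending order, so the extra `sort_values` in the cell above is not strictly needed. By default (`dropna=True`) rows with a missing value in any of those columns are left out. A sketch on hypothetical exploded rows, assuming pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical exploded rows: one row per (genre, rating) occurrence
toy = pd.DataFrame({
    "listed_in": ["Dramas", "Dramas", "Comedies", "Dramas", "Comedies"],
    "rating": ["TV-MA", "TV-MA", "TV-14", "R", None],
})

# Counts each distinct (listed_in, rating) pair, sorted descending;
# the row with a missing rating is dropped (dropna=True is the default)
pairs = toy.value_counts(subset=["listed_in", "rating"])
combo = pairs.reset_index(name="count")

print(combo)
```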


In [20]:
# Are there titles with the same name (title) but a different release year (release_year)?
# groupby is not allowed in this lab: drop repeated (title, release_year) pairs,
# then count how many distinct release years remain for each title
title_year_counts = df[["title", "release_year"]].drop_duplicates()["title"].value_counts()

titles_diff_years = title_year_counts[title_year_counts > 1]

print("Number of repeated titles with different release years:", len(titles_diff_years))

# How many unique titles are there in total in the title column?
unique_titles = df["title"].nunique()
print("Number of unique titles:", unique_titles)


Number of repeated titles with different release years: 0
Number of unique titles: 8807
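
Since every title in this dataset is unique, the check above returns 0 and never shows the mechanism at work. A sketch on hypothetical rows where one title does repeat, using `drop_duplicates` and `value_counts` (which respects the lab's no-`groupby` rule), plus `duplicated(keep=False)` to list every row involved:

```python
import pandas as pd

# Hypothetical rows: "Alpha" appears with two release years, "Beta" twice with the same year
toy = pd.DataFrame({
    "title": ["Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "release_year": [2003, 2013, 2017, 2017, 2020],
})

# Keep one row per (title, release_year) pair, then count distinct years per title
years_per_title = toy[["title", "release_year"]].drop_duplicates()["title"].value_counts()
titles_diff_years = years_per_title[years_per_title > 1]

# duplicated(keep=False) flags every occurrence of a repeated title, not just the later ones
repeated_rows = toy.loc[toy["title"].duplicated(keep=False)]

print(titles_diff_years)
print(toy["title"].nunique())  # number of unique titles
```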
