**EXPLORATORY DATA ANALYSIS (EDA) AND EXTRACT TRANSFORM LOAD (ETL)**

This Jupyter Notebook shows all the steps and processes used for the EDA and ETL.

Import necessary libraries

In [40]:
# pandas
import pandas as pd
# NumPy
import numpy as np
# os
import os

Next, we define a function that imports the datasets.

In [42]:
def read_file_to_dataframe(file_path):
    """
    Read a file (csv, json, xlsx, txt) and return its contents as a pandas DataFrame.

    Parameters:
    file_path (str): Path of the file to be read, including the file extension.

    Returns:
    pd.DataFrame: DataFrame containing the data from the file.

    Raises:
    ValueError: If the file extension is not supported.
    """

    # Get file extension
    _, file_extension = os.path.splitext(file_path)
    
    # Read file depending on extension
    if file_extension == '.csv':
        return pd.read_csv(file_path)
    elif file_extension == '.json':
        return pd.read_json(file_path)
    elif file_extension == '.xlsx':
        return pd.read_excel(file_path)
    elif file_extension == '.txt':
        return pd.read_csv(file_path, delimiter='\t')
    else:
        raise ValueError(f'Unsupported file extension: {file_extension}')
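
As a possible variation on the function above, the extension check can be written as a lookup table. This is only a sketch: the names `READERS` and `read_any` are hypothetical and not part of the notebook, and lower-casing the extension is an added assumption so that `.CSV` and `.csv` are treated alike.

```python
import os
import tempfile

import pandas as pd

# Hypothetical mapping from file extension to a reader callable.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
    ".txt": lambda path: pd.read_csv(path, delimiter="\t"),
}


def read_any(file_path):
    """Pick a reader by (lower-cased) extension and apply it to the path."""
    _, ext = os.path.splitext(file_path)
    try:
        reader = READERS[ext.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext}") from None
    return reader(file_path)


# Round-trip a tiny made-up CSV through a temporary directory to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.CSV")
    pd.DataFrame({"show_id": ["s1", "s2"], "release_year": [2014, 2018]}).to_csv(sample_path, index=False)
    sample = read_any(sample_path)

print(sample.shape)
```

Adding a new format then only requires a new entry in the table rather than another `elif` branch.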

Create the DataFrames from the datasets provided by soyHenry.

In [43]:
amazon_titles = read_file_to_dataframe(r"Datasets\amazon_prime_titles.csv")

In [44]:
disney_titles = read_file_to_dataframe(r"Datasets\disney_plus_titles.csv")

In [45]:
hulu_titles = read_file_to_dataframe(r'Datasets\hulu_titles.csv')

In [46]:
netflix_titles = read_file_to_dataframe(r"Datasets\netflix_titles.json")

**Exploratory Data Analysis**

In [47]:
# AMAZON
amazon_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ..."


In [48]:
#info
amazon_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9668 entries, 0 to 9667
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       9668 non-null   object
 1   type          9668 non-null   object
 2   title         9668 non-null   object
 3   director      7586 non-null   object
 4   cast          8435 non-null   object
 5   country       672 non-null    object
 6   date_added    155 non-null    object
 7   release_year  9668 non-null   int64 
 8   rating        9331 non-null   object
 9   duration      9668 non-null   object
 10  listed_in     9668 non-null   object
 11  description   9668 non-null   object
dtypes: int64(1), object(11)
memory usage: 906.5+ KB


In [14]:
# Check if the file has nulls
amazon_titles.isnull().sum()

show_id            0
type               0
title              0
director        2082
cast            1233
country         8996
date_added      9513
release_year       0
rating           337
duration           0
listed_in          0
description        0
dtype: int64

In [15]:
#DISNEY
disney_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,23 min,"Animation, Family",Join Mickey and the gang as they duck the halls!
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,91 min,Comedy,Santa Claus passes his magic bag to a new St. ...
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,23 min,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,41 min,Musical,"This is real life, not just fantasy!"


In [17]:
#Info
disney_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1450 entries, 0 to 1449
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       1450 non-null   object
 1   type          1450 non-null   object
 2   title         1450 non-null   object
 3   director      977 non-null    object
 4   cast          1260 non-null   object
 5   country       1231 non-null   object
 6   date_added    1447 non-null   object
 7   release_year  1450 non-null   int64 
 8   rating        1447 non-null   object
 9   duration      1450 non-null   object
 10  listed_in     1450 non-null   object
 11  description   1450 non-null   object
dtypes: int64(1), object(11)
memory usage: 136.1+ KB


In [16]:
# Check if the file has nulls
disney_titles.isnull().sum()

show_id           0
type              0
title             0
director        473
cast            190
country         219
date_added        3
release_year      0
rating            3
duration          0
listed_in         0
description       0
dtype: int64

In [18]:
#HULU
hulu_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3073 entries, 0 to 3072
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       3073 non-null   object 
 1   type          3073 non-null   object 
 2   title         3073 non-null   object 
 3   director      3 non-null      object 
 4   cast          0 non-null      float64
 5   country       1620 non-null   object 
 6   date_added    3045 non-null   object 
 7   release_year  3073 non-null   int64  
 8   rating        2553 non-null   object 
 9   duration      2594 non-null   object 
 10  listed_in     3073 non-null   object 
 11  description   3069 non-null   object 
dtypes: float64(1), int64(1), object(10)
memory usage: 288.2+ KB


In [21]:
# Check if the file has nulls
hulu_titles.isnull().sum()

show_id            0
type               0
title              0
director        3070
cast            3073
country         1453
date_added        28
release_year       0
rating           520
duration         479
listed_in          0
description        4
dtype: int64

In [22]:
#NETFLIX
netflix_titles.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."


In [23]:
#Info
netflix_titles.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 894.5+ KB


In [24]:
#Check if the file has nulls
netflix_titles.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

In conclusion, the Hulu and Netflix data contain some null values in the duration column that we must fix, but in general terms the data is in good condition to proceed with the ETL process.
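
To compare missing data across platforms at a glance, the per-column null counts can be turned into percentages. The sketch below uses two small made-up DataFrames (`toy_frames` is a hypothetical name) rather than the real datasets, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for two platform DataFrames; the values are invented for illustration.
toy_frames = {
    "Hulu": pd.DataFrame({
        "title": ["A", "B", "C", "D"],
        "duration": ["90 min", np.nan, "2 Seasons", np.nan],
    }),
    "Netflix": pd.DataFrame({
        "title": ["E", "F", "G", "H"],
        "duration": ["88 min", "1 Season", np.nan, "111 min"],
    }),
}

# isnull().mean() is the fraction of missing values per column; multiply by 100 for a percentage.
null_share = pd.DataFrame({name: df.isnull().mean() * 100 for name, df in toy_frames.items()})
print(null_share)
```

Each column of `null_share` is one platform and each row is one field, which makes it easier to spot which fields need attention before the ETL step.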

**Extract, Transform and Load** 

As a first step, I will create a copy of each DataFrame so the originals remain unchanged.

In [49]:
#Amazon
amazon_titles_copy = amazon_titles.copy()
#Disney
disney_titles_copy = disney_titles.copy()
#Hulu
hulu_titles_copy = hulu_titles.copy()
#Netflix
netflix_titles_copy = netflix_titles.copy()

In [50]:
#Adding a new column for each dataset in order to get the platform
amazon_titles_copy['Platform'] = 'Amazon'
disney_titles_copy['Platform'] = 'Disney'
hulu_titles_copy['Platform'] = 'Hulu'
netflix_titles_copy['Platform'] = 'Netflix'

In [51]:
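#Replacing null values (assigning the result back avoids chained in-place modification of a column)
hulu_titles_copy['duration'] = hulu_titles_copy['duration'].fillna('No data')
netflix_titles_copy['duration'] = netflix_titles_copy['duration'].fillna('No data')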

The duration column holds both a number and its unit of time (minutes or seasons).
A function will be created to split this information into two columns:

In [52]:
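def separa_columna_duration(DF):
    """
    Split the 'duration' column of a pandas DataFrame into two new columns:
    'seasons' (number of seasons) and 'movie_duration' (length in minutes).

    Parameters:
    DF (pd.DataFrame): The input DataFrame. It is modified in place.

    Returns:
    pd.DataFrame: The same DataFrame with the two new columns added.
    """
    # Only the plural unit 'Seasons' is matched here, so a value such as '1 Season' yields '0'.
    DF['seasons'] = DF['duration'].apply(lambda i: i.split()[0] if i.split()[1] == 'Seasons' else '0')
    # 'mim' is also accepted as a unit, presumably to cover a misspelling of 'min' in the data.
    DF['movie_duration'] = DF['duration'].apply(lambda i: i.split()[0] if i.split(" ", 1)[1] in ('min', 'mim') else '0')
    return DF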
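
The same split can also be sketched with a vectorized regular expression instead of `apply`. This is an alternative under stated assumptions, not the notebook's method: the sample values are made up, and unlike the function above it counts the singular `'1 Season'` as one season.

```python
import pandas as pd

# Made-up sample covering the value shapes seen in the duration column.
toy = pd.DataFrame({"duration": ["113 min", "2 Seasons", "1 Season", "No data"]})

# Capture the leading number and the unit that follows it; non-matching rows become NaN.
parts = toy["duration"].str.extract(r"^(?P<value>\d+)\s+(?P<unit>\w+)$")
value = pd.to_numeric(parts["value"], errors="coerce").fillna(0).astype(int)

# Boolean masks for each kind of unit; missing units count as False.
is_season = parts["unit"].str.startswith("Season", na=False)
is_minutes = parts["unit"].eq("min")

# Keep the number where the unit matches, otherwise use 0.
toy["seasons"] = value.where(is_season, 0)
toy["movie_duration"] = value.where(is_minutes, 0)
print(toy)
```

Because the numbers are converted while splitting, both new columns are already integers and need no later type conversion.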

In [53]:
amazon_titles_copy =separa_columna_duration(amazon_titles_copy)
disney_titles_copy = separa_columna_duration(disney_titles_copy)
hulu_titles_copy = separa_columna_duration(hulu_titles_copy)
netflix_titles_copy = separa_columna_duration(netflix_titles_copy)
 

In [54]:
#Dropping the original duration column
amazon_titles_copy.drop(columns='duration',inplace=True)
disney_titles_copy.drop(columns='duration',inplace= True)
hulu_titles_copy.drop(columns= 'duration', inplace=True)
netflix_titles_copy.drop(columns='duration',inplace=True)

In [55]:
#Check
amazon_titles_copy.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,Platform,seasons,movie_duration
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,"Comedy, Drama",A small fishing village must procure a local d...,Amazon,0,113
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,Amazon,0,110
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,Amazon,0,74
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,Documentary,"Pink breaks the mold once again, bringing her ...",Amazon,0,69
4,s5,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,Amazon,0,45


In [36]:
disney_titles_copy.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,Platform,seasons,movie_duration
0,s1,Movie,Duck the Halls: A Mickey Mouse Christmas Special,"Alonso Ramirez Ramos, Dave Wasson","Chris Diamantopoulos, Tony Anselmo, Tress MacN...",,"November 26, 2021",2016,TV-G,"Animation, Family",Join Mickey and the gang as they duck the halls!,Disney,0,23
1,s2,Movie,Ernest Saves Christmas,John Cherry,"Jim Varney, Noelle Parker, Douglas Seale",,"November 26, 2021",1988,PG,Comedy,Santa Claus passes his magic bag to a new St. ...,Disney,0,91
2,s3,Movie,Ice Age: A Mammoth Christmas,Karen Disher,"Raymond Albert Romano, John Leguizamo, Denis L...",United States,"November 26, 2021",2011,TV-G,"Animation, Comedy, Family",Sid the Sloth is on Santa's naughty list.,Disney,0,23
3,s4,Movie,The Queen Family Singalong,Hamish Hamilton,"Darren Criss, Adam Lambert, Derek Hough, Alexa...",,"November 26, 2021",2021,TV-PG,Musical,"This is real life, not just fantasy!",Disney,0,41
4,s5,TV Show,The Beatles: Get Back,,"John Lennon, Paul McCartney, George Harrison, ...",,"November 25, 2021",2021,,"Docuseries, Historical, Music",A three-part documentary from Peter Jackson ca...,Disney,0,0


In [56]:
#Merge the four DataFrames into a single one for practical purposes
movies_and_series=pd.concat([amazon_titles_copy, disney_titles_copy, hulu_titles_copy, netflix_titles_copy],axis=0)
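
Concatenating along `axis=0` keeps each DataFrame's own row labels, so labels repeat in the combined frame. The sketch below, built on two tiny made-up frames, shows the effect and how `ignore_index=True` could be used if unique labels are wanted; whether that is desirable here is a design choice, not something the notebook requires.

```python
import pandas as pd

# Two tiny made-up frames, each with its own 0..1 index.
a = pd.DataFrame({"title": ["A1", "A2"], "Platform": "Amazon"})
b = pd.DataFrame({"title": ["D1", "D2"], "Platform": "Disney"})

# Default behavior: the original labels are kept, so 0 and 1 each appear twice.
stacked = pd.concat([a, b], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
stacked_clean = pd.concat([a, b], axis=0, ignore_index=True)
print(stacked_clean.index.tolist())
```

Repeated labels are harmless for boolean filtering, but label-based lookups such as `.loc[0]` would return one row per source frame.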

In [57]:
#Check
movies_and_series.head(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,Platform,seasons,movie_duration
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,"Comedy, Drama",A small fishing village must procure a local d...,Amazon,0,113
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,Amazon,0,110
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,Amazon,0,74
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,Documentary,"Pink breaks the mold once again, bringing her ...",Amazon,0,69


In [58]:
movies_and_series.tail(4)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,Platform,seasons,movie_duration
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g...",Netflix,2,0
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...,Netflix,0,88
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",Netflix,0,88
8806,s8807,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,"March 2, 2019",2015,TV-14,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,Netflix,0,111


In [60]:
#Columns to uppercase
movies_and_series.columns = movies_and_series.columns.str.upper()

In [61]:
#Final check
movies_and_series.head(4)

Unnamed: 0,SHOW_ID,TYPE,TITLE,DIRECTOR,CAST,COUNTRY,DATE_ADDED,RELEASE_YEAR,RATING,LISTED_IN,DESCRIPTION,PLATFORM,SEASONS,MOVIE_DURATION
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,"Comedy, Drama",A small fishing village must procure a local d...,Amazon,0,113
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,Amazon,0,110
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,Amazon,0,74
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,Documentary,"Pink breaks the mold once again, bringing her ...",Amazon,0,69


In [62]:
#Information of the new DF
movies_and_series.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22998 entries, 0 to 8806
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   SHOW_ID         22998 non-null  object
 1   TYPE            22998 non-null  object
 2   TITLE           22998 non-null  object
 3   DIRECTOR        14739 non-null  object
 4   CAST            17677 non-null  object
 5   COUNTRY         11499 non-null  object
 6   DATE_ADDED      13444 non-null  object
 7   RELEASE_YEAR    22998 non-null  int64 
 8   RATING          22134 non-null  object
 9   LISTED_IN       22998 non-null  object
 10  DESCRIPTION     22994 non-null  object
 11  PLATFORM        22998 non-null  object
 12  SEASONS         22998 non-null  object
 13  MOVIE_DURATION  22998 non-null  object
dtypes: int64(1), object(13)
memory usage: 2.6+ MB


In [63]:
#Convert the SEASONS column to integer type
movies_and_series.SEASONS=movies_and_series.SEASONS.astype(int)

In [64]:
#Convert the MOVIE_DURATION column to integer type
movies_and_series.MOVIE_DURATION= movies_and_series.MOVIE_DURATION.astype(int)
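
`astype(int)` raises an error if any value is not a valid integer string. If that were a concern, a more tolerant conversion could look like the sketch below, which uses a made-up Series; the `'n/a'` value is an assumption added purely to show the failure case.

```python
import pandas as pd

# Made-up values; 'n/a' stands in for an unexpected non-numeric entry.
raw = pd.Series(["113", "0", "n/a"])

# errors="coerce" turns unparseable values into NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")
n_bad = int(converted.isna().sum())

# Replace the NaN values with 0 and cast to int once the column is clean.
clean = converted.fillna(0).astype(int)
print(n_bad, clean.tolist())
```

Counting the coerced values first makes it possible to check how many entries were silently replaced.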



In [65]:
#Export to a new dataset called movies_and_series.csv
movies_and_series.to_csv(r"Datasets\movies_and_series.csv", index=False, header=True)


Next, we write example versions of the queries that will be exposed through the API.

QUERIES TO PERFORM

-Get the maximum duration according to the type of content (movie/series), by platform and by year. The request should be: get_max_duration(year, platform, [min or season])

-Get the count of movies and series (separately) by platform. The request should be: get_count_plataform(platform)

-Get the number of times a genre appears, together with the platform where it appears most often. The request should be: get_listedin('genre'). As an example, the genre 'comedy' should return a count of 2099 for the Amazon platform.

-Get the most repeated actor by platform and year. The request should be: get_actor(platform, year)

In [129]:
#Get the maximum duration according to the type of content (movie/series), by platform and by year. In this example we will use 2018 and Hulu.

def get_max_duration(year: int, platform: str, min_or_seasson: str):
    # Keep only the titles released in the given year on the given platform
    df = movies_and_series[(movies_and_series['RELEASE_YEAR'] == year) & (movies_and_series['PLATFORM'] == platform)]
    # Pick the column to maximize: minutes for movies, seasons for series
    column = 'MOVIE_DURATION' if min_or_seasson == 'min' else 'SEASONS'
    # Return the first title that reaches the maximum value
    titles = df[df[column] == df[column].max()]['TITLE'].to_list()
    return titles[0]


In [130]:
get_max_duration(2018,'Hulu','min')

'The House That Jack Built'
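
The function above fails with an `IndexError` when no title matches the year and platform. A minimal sketch of a more defensive variant follows; `max_duration_title` and the toy data are hypothetical, and returning `None` for an empty result is an assumption about the desired behavior.

```python
import pandas as pd

# Made-up rows with the same column names as the combined DataFrame.
toy = pd.DataFrame({
    "TITLE": ["Short Film", "Long Film", "Some Series"],
    "RELEASE_YEAR": [2018, 2018, 2018],
    "PLATFORM": ["Hulu", "Hulu", "Hulu"],
    "SEASONS": [0, 0, 3],
    "MOVIE_DURATION": [80, 150, 0],
})


def max_duration_title(data, year, platform, unit):
    """Return the title with the largest duration, or None if nothing matches."""
    column = "MOVIE_DURATION" if unit == "min" else "SEASONS"
    subset = data[(data["RELEASE_YEAR"] == year) & (data["PLATFORM"] == platform)]
    if subset.empty:
        return None
    # A fresh index guarantees that idxmax points at exactly one row.
    subset = subset.reset_index(drop=True)
    return subset.loc[subset[column].idxmax(), "TITLE"]


print(max_duration_title(toy, 2018, "Hulu", "min"))
print(max_duration_title(toy, 2018, "Hulu", "season"))
print(max_duration_title(toy, 1999, "Hulu", "min"))
```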

In [122]:
# Get the count of movies and series (separately) by platform. Example: Netflix
def get_count_plataform(platform: str):
    platform = platform.replace("'", "")
    platform = platform.capitalize()
    Count_platform = movies_and_series[movies_and_series.PLATFORM == platform]  # Apply a mask based on the parameter
    movies = int((Count_platform.TYPE == 'Movie').sum())  # Count the occurrences of each type
    series = int((Count_platform.TYPE == 'TV Show').sum())
    # Return the values as strings to make clear what each count refers to
    return platform, f'Movie: {movies}', f'TV Show: {series}'

In [123]:
get_count_plataform("Netflix")

('Netflix', 'Movie: 6131', 'TV Show: 2676')

In [131]:
#Get the number of times a genre appears on the platform where it is most frequent. Example: 'Comedy'

def get_listedin(genre: str):
    # True for every title whose LISTED_IN text contains the genre
    has_genre = movies_and_series['LISTED_IN'].str.contains(genre)
    # Count the matching titles on each platform
    counts = [(has_genre & (movies_and_series['PLATFORM'] == p)).sum() for p in ('Amazon', 'Disney', 'Hulu', 'Netflix')]
    # Return only the highest count; the platform name is not returned
    return int(max(counts))

In [133]:
get_listedin('Comedy')


2099
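
Since the request asks for the platform as well as the count, one way to return both is sketched below on made-up data. The name `genre_top_platform` is hypothetical, and the case-insensitive literal match is an added assumption so that 'comedy' and 'Comedy' give the same result.

```python
import pandas as pd

# Made-up rows with the same column names as the combined DataFrame.
toy = pd.DataFrame({
    "LISTED_IN": ["Comedy, Drama", "Comedy", "Drama", "Comedy, Family", "Action"],
    "PLATFORM": ["Amazon", "Amazon", "Disney", "Disney", "Hulu"],
})


def genre_top_platform(data, genre):
    """Return (platform, count) for the platform where the genre appears most, or None."""
    # Literal, case-insensitive match; missing values count as no match.
    mask = data["LISTED_IN"].str.contains(genre, case=False, regex=False, na=False)
    counts = data.loc[mask, "PLATFORM"].value_counts()
    if counts.empty:
        return None
    return counts.idxmax(), int(counts.max())


print(genre_top_platform(toy, "comedy"))
```

Using `value_counts` also avoids hard-coding the list of platform names.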

In [136]:

#Actor that is most repeated according to platform and year.
def get_actor(platform: str, year: int):
    platform = platform.replace("'", "")
    platform = platform.capitalize()
    actores, repeticiones = list(), list()  # Two empty lists: one for each actor and one for the number of appearances
    # Apply a mask to get the cast string of every matching title, replacing nulls with ''
    Cast_list = list(movies_and_series[(movies_and_series.PLATFORM == platform) & (movies_and_series.RELEASE_YEAR == year)].CAST.fillna(''))

    for each in Cast_list:  # Iterate over each element, a comma-separated string of actors
        if not (each == '' or each is None):  # Make sure it has data
            list1 = each.split(",")  # Split on commas to get a list whose elements are the actors
            for elem in list1:  # Iterate over this new list of actors
                elem = elem.strip()  # Remove surrounding whitespace
                # If the actor is already in 'actores', add 1 to 'repeticiones' at the same index
                if elem in actores:
                    repeticiones[actores.index(elem)] += 1
                # Otherwise, append the actor to 'actores' and 1 to 'repeticiones'
                else:
                    actores.append(elem)
                    repeticiones.append(1)
    if actores == []:
        return 'No data'  # If both lists are empty, return a message instead of raising an error
    # Return the platform, how many times the most repeated actor appears for that platform and year, and the actor
    return (platform, max(repeticiones), actores[repeticiones.index(max(repeticiones))])

In [137]:
get_actor('Netflix',2018)

('Netflix', 8, 'Andrea Libman')
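
The counting loop in `get_actor` can be written more compactly with `collections.Counter` from the standard library. The sketch below runs on made-up cast strings; `top_actor` is a hypothetical name, and returning `None` when there is no data is an assumption rather than the notebook's behavior.

```python
from collections import Counter

import pandas as pd

# Made-up rows with the same column names as the combined DataFrame.
toy = pd.DataFrame({
    "CAST": ["Ann Lee, Bob Ray", "Bob Ray", None, "Ann Lee, Bob Ray, Cy Dow"],
    "PLATFORM": ["Netflix"] * 4,
    "RELEASE_YEAR": [2018] * 4,
})


def top_actor(data, platform, year):
    """Return (platform, appearances, actor) for the most frequent actor, or None."""
    subset = data[(data["PLATFORM"] == platform) & (data["RELEASE_YEAR"] == year)]
    counter = Counter()
    # dropna() skips titles without cast information.
    for cast in subset["CAST"].dropna():
        counter.update(name.strip() for name in cast.split(",") if name.strip())
    if not counter:
        return None
    actor, count = counter.most_common(1)[0]
    return platform, count, actor


print(top_actor(toy, "Netflix", 2018))
```

A `Counter` looks names up by hash, so it avoids the repeated `list.index` scans that the two parallel lists require.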