# Disney+ Movies and TV Shows #


1. The dataset contains information about TV shows and movies.
2. We will split the dataset into two: TV shows and movies.
3. Within each dataset, how many entries are there for each rating? Which rating has the highest count in each dataset?
4. What is the average duration of movies and TV shows?
5. In which month, across all years, are the most movies released?

## Data Dictionary ##

1. show_id - Unique id
2. type - Movie or TV Show
3. title - Name of the movie/show
4. director - Directors of the movie/show
5. cast - Main cast of the movie/show
6. country - Country of production
7. date_added - Date added on Disney+
8. release_year - Original release year of the movie/show
9. rating - Rating of the movie/show
10. duration - Total duration of the movie/show

## Step 1: Read the file and display first 5 rows ##

In [7]:
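from csv import reader

# Forward slashes also work on Windows and avoid backslash escape problems in the path
with open('./Data/disney_plus_titles.csv', encoding="utf-8", newline='') as opened_file:
    dp = list(reader(opened_file))
dp_header = dp[0]  # first row holds the column names
dp = dp[1:]        # remaining rows are the data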

In [41]:
def explore_data(data_list_l, start_l, end_l, rows_and_columns_l=False):
    data_slice_l = data_list_l[start_l:end_l]
    for row_l in data_slice_l:
        print(row_l)
        print('\n')
        
    if rows_and_columns_l:
        print('no. of rows:', len(data_list_l))
        print('no. of columns:', len(data_list_l[0]))

In [42]:
print(dp_header)

['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added', 'release_year', 'rating', 'duration', 'listed_in', 'description']


In [43]:
explore_data(dp,0,5,True)

['s1', 'Movie', 'Duck the Halls: A Mickey Mouse Christmas Special', 'Alonso Ramirez Ramos, Dave Wasson', 'Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton', '', 'November 26, 2021', '2016', 'TV-G', '23', 'Animation, Family', 'Join Mickey and the gang as they duck the halls!']


['s2', 'Movie', 'Ernest Saves Christmas', 'John Cherry', 'Jim Varney, Noelle Parker, Douglas Seale', '', 'November 26, 2021', '1988', 'PG', '91', 'Comedy', 'Santa Claus passes his magic bag to a new St. Nic.']


['s3', 'Movie', 'Ice Age: A Mammoth Christmas', 'Karen Disher', 'Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah', 'United States', 'November 26, 2021', '2011', 'TV-G', '23', 'Animation, Comedy, Family', "Sid the Sloth is on Santa's naughty list."]


['s4', 'Movie', 'The Queen Family Singalong', 'Hamish Hamilton', 'Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen', '', 'November 26, 2021', '2021', 'TV-PG', '4

## Step 2: Separate Movies and TV Shows ##

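The `type` column is currently clean: every value is either "Movie" or "TV Show". To keep the split working if the column were messier, with values such as "Movie - Fairy Tale" or "Movie - Adventure", we match on the lower-cased prefix instead of the exact value. Anything that matches neither prefix goes into `others`.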

In [44]:
disney_movies = []
disney_shows = []
others = []

for row in dp:
    # Avoid naming the variable `type`, which would shadow the Python built-in
    show_type = row[1].lower()
    if show_type.startswith("movie"):
        disney_movies.append(row)
    elif show_type.startswith("tv"):
        disney_shows.append(row)
    else:
        others.append(row)

In [45]:
print("Disney Movies: ", len(disney_movies))
print("Disney TV Shows: ", len(disney_shows))
print("Disney Others: ", len(others))

Disney Movies:  1052
Disney TV Shows:  398
Disney Others:  0
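
As a small illustration of why the split matches on a prefix, the sketch below classifies a handful of made-up `type` values. The sample values and the `classify` helper are hypothetical and are not part of the dataset or the notebook.

```python
# Hypothetical 'type' values, including messier variants
sample_types = ["Movie", "TV Show", "Movie - Fairy Tale", "movie - adventure", "Short"]

def classify(type_value):
    # Normalise case and surrounding whitespace, then match on the prefix
    value = type_value.strip().lower()
    if value.startswith("movie"):
        return "movie"
    if value.startswith("tv"):
        return "show"
    return "other"

labels = [classify(t) for t in sample_types]
print(labels)
```

Values that match neither prefix end up as "other", which mirrors the `others` list in the cell above.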


In [46]:
explore_data(disney_movies,0,5,False)

['s1', 'Movie', 'Duck the Halls: A Mickey Mouse Christmas Special', 'Alonso Ramirez Ramos, Dave Wasson', 'Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton', '', 'November 26, 2021', '2016', 'TV-G', '23', 'Animation, Family', 'Join Mickey and the gang as they duck the halls!']


['s2', 'Movie', 'Ernest Saves Christmas', 'John Cherry', 'Jim Varney, Noelle Parker, Douglas Seale', '', 'November 26, 2021', '1988', 'PG', '91', 'Comedy', 'Santa Claus passes his magic bag to a new St. Nic.']


['s3', 'Movie', 'Ice Age: A Mammoth Christmas', 'Karen Disher', 'Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah', 'United States', 'November 26, 2021', '2011', 'TV-G', '23', 'Animation, Comedy, Family', "Sid the Sloth is on Santa's naughty list."]


['s4', 'Movie', 'The Queen Family Singalong', 'Hamish Hamilton', 'Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen', '', 'November 26, 2021', '2021', 'TV-PG', '4

In [47]:
explore_data(disney_shows,0,5,False)

['s5', 'TV Show', 'The Beatles: Get Back', '', 'John Lennon, Paul McCartney, George Harrison, Ringo Starr', '', 'November 25, 2021', '2021', '', '1 Season', 'Docuseries, Historical, Music', 'A three-part documentary from Peter Jackson capturing a moment in music history with The Beatles.']


['s7', 'TV Show', 'Hawkeye', '', 'Jeremy Renner, Hailee Steinfeld, Vera Farmiga, Fra Fee, Tony Dalton, Zahn McClarnon', '', 'November 24, 2021', '2021', 'TV-14', '1 Season', 'Action-Adventure, Superhero', 'Clint Barton/Hawkeye must team up with skilled archer Kate Bishop to unravel a criminal conspiracy.']


['s8', 'TV Show', 'Port Protection Alaska', '', 'Gary Muehlberger, Mary Miller, Curly Leach, Sam Carlson, Stuart Andrews, David Squibb', 'United States', 'November 24, 2021', '2015', 'TV-14', '2 Seasons', 'Docuseries, Reality, Survival', 'Residents of Port Protection must combat volatile conditions to survive and thrive in Alaska.']


['s9', 'TV Show', 'Secrets of the Zoo: Tampa', '', 'Dr. Ray 

## Step 3: Get a list of ratings ##

In [48]:
ratings_list = []
for row in dp:
    if row[8] not in ratings_list:
        ratings_list.append(row[8])

In [49]:
print(ratings_list)

['TV-G', 'PG', 'TV-PG', '', 'PG-13', 'TV-14', 'G', 'TV-Y7', 'TV-Y', 'TV-Y7-FV']
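
The empty string in the list marks rows with no rating. As an alternative sketch, `dict.fromkeys` gives the same first-seen ordering without scanning a list on every row. The sample rows here are invented for illustration and only fill the rating column (index 8).

```python
RATING = 8  # index of the rating column in dp_header

# Hypothetical rows: only the rating column is filled in
sample_rows = [[""] * 8 + [r] for r in ["TV-G", "PG", "TV-G", "", "PG"]]

# dict keys are unique and keep insertion order, so this deduplicates in first-seen order
unique_ratings = list(dict.fromkeys(row[RATING] for row in sample_rows))
print(unique_ratings)
```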


## Step 4: Count the movies and shows for each rating and store the counts in a dictionary ##

In [55]:
def rating_count(dataset_l):
    rating_count_l = {}
    for row_l in dataset_l:
        rating_l = row_l[8]
        if rating_l in rating_count_l:
            rating_count_l[rating_l] += 1
        else:
            rating_count_l[rating_l] = 1
    sorted_rating_count = dict(sorted(rating_count_l.items(), key=lambda item: item[1], reverse=True))
    return sorted_rating_count

In [56]:
print("Movie Ratings Count:", rating_count(disney_movies))
print("Shows Ratings Count:", rating_count(disney_shows))

Movie Ratings Count: {'G': 253, 'PG': 235, 'TV-G': 233, 'TV-PG': 181, 'PG-13': 66, 'TV-14': 37, 'TV-Y7': 36, 'TV-Y7-FV': 7, 'TV-Y': 3, '': 1}
Shows Ratings Count: {'TV-PG': 120, 'TV-Y7': 95, 'TV-G': 85, 'TV-Y': 47, 'TV-14': 42, 'TV-Y7-FV': 6, '': 2, 'PG': 1}
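
The same tally can be sketched with `collections.Counter` from the standard library, whose `most_common()` method returns the pairs ordered from highest to lowest count. The function name `rating_count_counter` and the sample rows are assumptions made for this example.

```python
from collections import Counter

def rating_count_counter(dataset, rating_index=8):
    # Count each rating, then order the pairs from most to least frequent
    counts = Counter(row[rating_index] for row in dataset)
    return dict(counts.most_common())

# Hypothetical rows: only the rating column (index 8) is filled in
sample = [[""] * 8 + [r] for r in ["G", "PG", "G", "TV-G", "G", "PG"]]

counts = rating_count_counter(sample)
top_rating = next(iter(counts))  # first key is the most frequent rating
print(counts)
print("Most common rating:", top_rating)
```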


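`rating_count` already returns each dictionary sorted in descending order of count, so the first key is the most common rating: G (253) for movies and TV-PG (120) for TV shows.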

## Step 5: What is the average duration of movies and shows ##

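Movie durations are given in minutes and show durations in seasons, so the two averages must be computed separately. We keep only the leading number of each duration and convert it to an integer, dropping any unit suffix such as "Season" or "Seasons".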

In [57]:
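def duration_conversion(data_l):
    # Keep only the leading number of the duration, e.g. '2 Seasons' -> 2.
    # Rows are modified in place, so the list passed in is changed as well.
    for row_l in data_l:
        # str() keeps the cell safe to re-run once the values are already ints
        row_l[9] = int(str(row_l[9]).split(" ")[0])
    return data_l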

In [None]:
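def average(data_l, loc_l):
    # Mean of the numeric values in column loc_l
    total = 0  # avoid the name `sum`, which would shadow the built-in
    for row in data_l:
        total += row[loc_l]
    return total / len(data_l)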

In [58]:
disney_movies_mod = duration_conversion(disney_movies)

In [59]:
explore_data(disney_movies_mod,0,2,True)

['s1', 'Movie', 'Duck the Halls: A Mickey Mouse Christmas Special', 'Alonso Ramirez Ramos, Dave Wasson', 'Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton', '', 'November 26, 2021', '2016', 'TV-G', 23, 'Animation, Family', 'Join Mickey and the gang as they duck the halls!']


['s2', 'Movie', 'Ernest Saves Christmas', 'John Cherry', 'Jim Varney, Noelle Parker, Douglas Seale', '', 'November 26, 2021', '1988', 'PG', 91, 'Comedy', 'Santa Claus passes his magic bag to a new St. Nic.']


no. of rows: 1052
no. of columns: 12
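
The averaging step itself is not run in the cells above. The sketch below shows one way it could look on a few sample durations taken from the rows printed earlier. The helper names `duration_to_int` and `average_duration` are assumptions, and the printed averages describe only these sample rows, not the full dataset.

```python
DURATION = 9  # index of the duration column in dp_header

def duration_to_int(value):
    # '2 Seasons' -> 2, '91' -> 91; values that are already ints pass through
    return int(str(value).split(" ")[0])

def average_duration(rows):
    # Mean of the duration column; returns None for an empty list
    if not rows:
        return None
    return sum(duration_to_int(row[DURATION]) for row in rows) / len(rows)

# Sample rows with only the duration column filled in
sample_movies = [[""] * 9 + ["23"], [""] * 9 + ["91"]]
sample_shows = [[""] * 9 + ["1 Season"], [""] * 9 + ["1 Season"], [""] * 9 + ["2 Seasons"]]

movie_avg = average_duration(sample_movies)
show_avg = average_duration(sample_shows)
print("Average movie duration (minutes):", movie_avg)
print("Average show duration (seasons):", round(show_avg, 2))
```

On the real data the same call would be made once with `disney_movies` and once with `disney_shows`, since minutes and seasons cannot be averaged together.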


## Step 6: In which month are most movies released? ##
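
The dataset stores only a release year, so a month can be read only from `date_added` (index 6), which has the form "November 26, 2021". The sketch below therefore counts the month a title was added to Disney+, on the assumption that this is the intended measure. The `month_counts` helper and the sample dates are hypothetical.

```python
DATE_ADDED = 6  # index of the date_added column in dp_header

def month_counts(dataset):
    counts = {}
    for row in dataset:
        raw = row[DATE_ADDED].strip()
        if not raw:
            continue  # skip rows with no date
        month = raw.split(" ")[0]  # 'November 26, 2021' -> 'November'
        counts[month] = counts.get(month, 0) + 1
    # Highest count first, so the first key is the busiest month
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

# Hypothetical rows: only the date_added column is filled in
sample_dates = ["November 26, 2021", "October 1, 2021", "November 25, 2021", ""]
sample_rows = [[""] * 6 + [d] for d in sample_dates]

by_month = month_counts(sample_rows)
print(by_month)
```

Calling `month_counts(disney_movies)` would give the counts for movies only.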