---
title: Christmas Movies
author: Luna Oley
date: 2025-04-09
categories: [Programming, Spotify, Python] # tags for a blog post (e.g., python)
image: "Xmas_pic.png"

toc: true
---

First we load the Holiday Movies and Holiday Movie Genres datasets and look at their structure. The Holiday Movies dataset contains eight variables: tconst, title_type, primary_title, simple_title, year, runtime_minutes, average_rating, and num_votes. The Holiday Movie Genres dataset has two variables: tconst, the movie identifier shared with the Holiday Movies dataset, and genres.

In [10]:
import pandas as pd

holiday_movies = pd.read_csv("https://bcdanl.github.io/data/holiday_movies.csv")
holiday_movie_genres = pd.read_csv("https://bcdanl.github.io/data/holiday_movie_genres.csv")
print(holiday_movies.head())

      tconst title_type          primary_title          simple_title  year  \
0  tt0020356      movie       Sailor's Holiday       sailors holiday  1929   
1  tt0020823      movie    The Devil's Holiday    the devils holiday  1930   
2  tt0020985      movie                Holiday               holiday  1930   
3  tt0021268      movie  Holiday of St. Jorgen  holiday of st jorgen  1930   
4  tt0021377      movie    Sin Takes a Holiday   sin takes a holiday  1930   

   runtime_minutes  average_rating  num_votes  
0             58.0             5.4         55  
1             80.0             6.0        242  
2             91.0             6.3        638  
3             83.0             7.4        256  
4             81.0             6.1        740  
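
The cell above prints only the first rows of one table. A quick structural check of both tables could look like the sketch below. It uses a few hand-copied stand-in rows rather than the downloaded CSV files, and the helper name `summarize` is hypothetical.

```python
import pandas as pd

# Stand-in rows copied from the outputs in this post; the real tables
# come from the CSV files loaded above.
movies = pd.DataFrame({
    "tconst": ["tt0020356", "tt0020823"],
    "primary_title": ["Sailor's Holiday", "The Devil's Holiday"],
    "year": [1929, 1930],
    "average_rating": [5.4, 6.0],
    "num_votes": [55, 242],
})
genres = pd.DataFrame({
    "tconst": ["tt0020356", "tt0020823", "tt0020823"],
    "genres": ["Comedy", "Drama", "Romance"],
})

def summarize(df):
    """Hypothetical helper: row count, column names, missing values per column."""
    return len(df), list(df.columns), df.isna().sum().to_dict()

movies_summary = summarize(movies)
genres_summary = summarize(genres)
print(movies_summary)
print(genres_summary)
```

Calling the same helper on `holiday_movies` and `holiday_movie_genres` would show at a glance which columns each table carries and whether any values are missing.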



We start exploring the data by counting the number of movies released each year in the Holiday Movies dataset.

In [4]:
# Count the number of movies by year
movies_by_year = holiday_movies['year'].value_counts().sort_index()

print(movies_by_year)


year
1929      1
1930      5
1931      1
1934      3
1936      2
       ... 
2019    143
2020    172
2021    183
2022    173
2023    107
Name: count, Length: 91, dtype: int64
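
Once the yearly counts exist, two natural follow-ups are finding the busiest year and rolling the counts up by decade. The sketch below illustrates both on a short, made-up list of release years (an assumption for illustration, not the real data).

```python
import pandas as pd

# Hypothetical release years standing in for holiday_movies['year']
years = pd.Series([1929, 1930, 1930, 2020, 2021, 2021, 2021], name="year")

movies_by_year = years.value_counts().sort_index()

# Year with the most releases, and how many releases it had
peak_year = movies_by_year.idxmax()
peak_count = movies_by_year.max()

# Roll yearly counts up into decades: the function is applied to each
# index value (a year), so 1930 maps to 1930 and 2021 maps to 2020
movies_by_decade = movies_by_year.groupby(lambda y: (y // 10) * 10).sum()

print(peak_year, peak_count)
print(movies_by_decade)
```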


Then we rank the movies in the Holiday Movies dataset by their number of IMDb votes in descending order. We do this by sorting on the variable "num_votes" and keeping the top 10.

In [7]:
# Sort movies by number of IMDb votes in descending order and keep the top 10
top_rated_movies = holiday_movies.sort_values(by='num_votes', ascending=False).head(10)

print(top_rated_movies[['primary_title', 'num_votes']])


                              primary_title  num_votes
151          The Nightmare Before Christmas     367288
501                             The Holiday     308807
211          How the Grinch Stole Christmas     276568
135   National Lampoon's Christmas Vacation     209533
107                       A Christmas Story     163273
48                            Roman Holiday     145289
499                      Mr. Bean's Holiday     132186
680                       A Christmas Carol     125562
2151                         Last Christmas      86058
1288                 Office Christmas Party      85255
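
Sorting by num_votes measures popularity rather than rating. To rank by average_rating instead, it usually helps to require a minimum number of votes first, since a film with a handful of votes can have an extreme average. The sketch below uses five rows copied from the outputs in this post; the cutoff of 1000 votes is an arbitrary assumption.

```python
import pandas as pd

# Sample rows taken from the outputs shown in this post
sample = pd.DataFrame({
    "primary_title": [
        "How the Grinch Stole Christmas",
        "'R Xmas",
        "Christmas in the Clouds",
        "The Long Holiday",
        "A Christmas Tree and a Wedding",
    ],
    "average_rating": [6.3, 5.7, 6.4, 7.9, 8.3],
    "num_votes": [276568, 1588, 863, 116, 57],
})

min_votes = 1000  # assumed cutoff, chosen only for illustration

# Popularity ranking: most votes first
by_votes = sample.sort_values("num_votes", ascending=False)

# Rating ranking without a vote floor: a 57-vote film comes out on top
by_rating_raw = sample.sort_values("average_rating", ascending=False)

# Rating ranking restricted to films with enough votes
by_rating = (
    sample[sample["num_votes"] >= min_votes]
    .sort_values("average_rating", ascending=False)
)

print(by_rating[["primary_title", "average_rating", "num_votes"]])
```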


Next we filter for movies released in or after 2000.

In [8]:
# Filter movies released in or after 2000
modern_holiday_movies = holiday_movies[holiday_movies['year'] >= 2000]

print(modern_holiday_movies.head())


        tconst title_type                   primary_title  \
211  tt0170016      movie  How the Grinch Stole Christmas   
250  tt0217978      movie                         'R Xmas   
256  tt0221074      movie         Christmas in the Clouds   
269  tt0233828      movie                The Long Holiday   
273  tt0238121      movie  A Christmas Tree and a Wedding   

                       simple_title  year  runtime_minutes  average_rating  \
211  how the grinch stole christmas  2000            104.0             6.3   
250                          r xmas  2001             85.0             5.7   
256         christmas in the clouds  2001             96.0             6.4   
269                the long holiday  2000            145.0             7.9   
273  a christmas tree and a wedding  2000             90.0             8.3   

     num_votes  
211     276568  
250       1588  
256        863  
269        116  
273         57  
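
The same filter can be written with a named boolean mask or with `DataFrame.query`. The sketch below, on four rows taken from the outputs in this post, checks that both forms select the same rows and that the year 2000 itself is included.

```python
import pandas as pd

# Sample rows taken from the outputs shown in this post
movies = pd.DataFrame({
    "primary_title": [
        "Holiday",
        "Sin Takes a Holiday",
        "How the Grinch Stole Christmas",
        "'R Xmas",
    ],
    "year": [1930, 1930, 2000, 2001],
})

# Boolean mask: True for rows released in or after 2000
is_modern = movies["year"] >= 2000
modern_mask = movies[is_modern]

# Equivalent filter expressed as a query string
modern_query = movies.query("year >= 2000")

print(modern_mask)
```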


Now we merge the Holiday Movies and Holiday Movie Genres datasets on tconst, so that each movie carries its genre alongside all the other variables. With the merged dataset we then find the top movie in each genre, ranked here by num_votes (the number of IMDb votes).
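
The mechanics are easier to see on a few rows first. The sketch below uses three movies and their genres copied from the outputs in this post. An inner join on tconst produces one row per movie and genre pair, and `groupby(...).idxmax()` returns the row label of the maximum within each genre. For contrast with the vote-based ranking, it ranks by average_rating; this is an illustrative variant, not the analysis in the cell that follows.

```python
import pandas as pd

# Rows copied from the outputs shown in this post
movies = pd.DataFrame({
    "tconst": ["tt0020356", "tt0020823", "tt0020985"],
    "primary_title": ["Sailor's Holiday", "The Devil's Holiday", "Holiday"],
    "average_rating": [5.4, 6.0, 6.3],
    "num_votes": [55, 242, 638],
})
genres = pd.DataFrame({
    "tconst": ["tt0020356", "tt0020823", "tt0020823", "tt0020985", "tt0020985"],
    "genres": ["Comedy", "Drama", "Romance", "Comedy", "Drama"],
})

# Inner join: a movie with two genres appears in two rows
merged = pd.merge(movies, genres, how="inner", on="tconst")

# Row label of the highest average_rating within each genre
best_idx = merged.groupby("genres")["average_rating"].idxmax()
best_by_rating = merged.loc[best_idx, ["genres", "primary_title", "average_rating"]]

print(best_by_rating)
```

Swapping "average_rating" for "num_votes" in the `groupby` line gives the vote-based version of the same lookup.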

In [16]:
# Find the top movie for each genre, ranked by number of votes (num_votes)
# Perform an inner join to combine movie and genre information
movies_with_genres = pd.merge(holiday_movies, holiday_movie_genres, how='inner', on='tconst')

print(movies_with_genres.head())

highest_rated_by_genre = movies_with_genres.loc[movies_with_genres.groupby('genres')['num_votes'].idxmax()]

print(highest_rated_by_genre[['genres', 'primary_title', 'num_votes']])


      tconst title_type        primary_title        simple_title  year  \
0  tt0020356      movie     Sailor's Holiday     sailors holiday  1929   
1  tt0020823      movie  The Devil's Holiday  the devils holiday  1930   
2  tt0020823      movie  The Devil's Holiday  the devils holiday  1930   
3  tt0020985      movie              Holiday             holiday  1930   
4  tt0020985      movie              Holiday             holiday  1930   

   runtime_minutes  average_rating  num_votes   genres  
0             58.0             5.4         55   Comedy  
1             80.0             6.0        242    Drama  
2             80.0             6.0        242  Romance  
3             91.0             6.3        638   Comedy  
4             91.0             6.3        638    Drama  
           genres                                      primary_title  \
3217       Action                                            Holiday   
1391    Adventure                                  A Christmas Carol 