# Module 2 Challenge [5 pts]

After completing the multiple-choice assessment, use the OILER framework on your own to correct the bug or bugs in this program.

The cell below defines several functions, followed by cells that invoke them. At least one of those later cells will raise an error or produce a wrong result.
You'll have to understand what the program is trying to do and what the data looks like in order to determine if any of the results are wrong. Welcome to data-oriented programming!

When you get to the E step in the OILER framework and you are ready to fix the code, we recommend that you copy and paste the code for the function(s) into the cell we have provided at the bottom of the page. That way you can switch back and forth between the original version that we provided and your new version. (In Python, you can redefine a function; whichever definition was executed last is the one that will be used.)
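
The redefinition behavior mentioned above can be seen in a minimal sketch. The function name `greet` is hypothetical and only serves as an illustration; it is not part of the assignment code.

```python
def greet():
    return "original"

first = greet()   # calls the first definition

def greet():      # same name: this definition replaces the first one
    return "revised"

second = greet()  # calls whichever definition was executed last
print(first, second)  # original revised
```

In a notebook the same thing happens across cells: re-running the original cell after your edited cell makes the original definitions active again.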

It is also possible to directly edit the function definitions in the cell below. But we don't recommend that, because it will be instructive in the Reflect stage to compare the original code with your new code.

In [25]:
import csv


def calculate_avg(movies:list, score_type:str) ->float:
    """
    Calculates the average score for the given score type and returns a float rounded to
    one decimal place.

    Parameters:
        movies (list): A list of dictionaries containing all the movie data
        score_type (str): The key of the score (e.g., "imdbRating", "Jumpscare_rating") to
        calculate the average for.

    Returns:
        float: The average value, rounded to one decimal place
    """

    sum_score = 0
    for movie in movies:
        score = get_value(movie, score_type)
        if score:
            sum_score = sum_score + float(movie[score_type])
    return round(sum_score / len(movies), 1)


def clean_movie_data(movie:dict) ->dict:
    """
    Clean the movie data by converting the 'Runtime' value to an integer, then passing every
    field through the < convert_movie_fields_to_int() > function. If the 'Runtime' key
    is not present, only the field-by-field conversion is applied.

    Parameters:
        movie (dict): dictionary containing key-value pairs representing data for one movie

    Returns:
        dict: dictionary containing key-value pairs representing data for one movie
    """

    try:
        movie["Runtime"] = convert_to_int(movie["Runtime"].split()[0])
    except KeyError:
        movie = convert_movie_fields_to_int(movie)
    return convert_movie_fields_to_int(movie)


def convert_to_int(value:str | bool) ->int | str:
    """
    Attempts to convert a string, number, or boolean value to an int.
    If a ValueError exception is encountered, the function returns the value unchanged.

    Parameters:
        value (str|bool): a string or boolean value to be converted

    Returns:
        int: if the value is successfully converted else returns value unchanged
    """
    try:
        return int(value)
    except ValueError:
        return value


def convert_movie_fields_to_int(movie:dict) ->dict:
    """
    Loops through a movie dictionary's key-value pairs and converts values
    that can be converted to an integer using the convert_to_int() function.

    Parameters:
        movie (dict): dictionary containing key-value pairs representing data for one movie

    Returns:
        dict: dictionary containing cleaned key-value pairs for the movie
    """
    for k, v in movie.items():
        movie[k] = convert_to_int(v)
    return movie


def count_movie_by_rating(movies:list, rating:str) ->int:
    """
    Loops through movies (list) and check if rating (str) is included in the dictionary key: 'Rated'.
    Increment the count by 1 if the given rating matches the movie's rating.

    Parameters:
        movies (list): List of dictionaries representing all movies
        rating (str): Audience Rating of the movie to check for

    Returns:
        int: Number of movies that have the supplied Rating
    """

    movie_count = 0
    for movie in movies:
        if movie["Rated"].lower() == rating.lower():
            movie_count += 1
    return movie_count


def filter_movie_by_genre(movies:list, genre:str) ->list:
    """
    Loops through movies (list) and check if genre (str) is included in the dictionary key: 'Genre'.
    Append the dictionary and all of its data to an empty list and return the final list.

    Parameters:
        movies (list): List of dictionaries representing all movies
        genre (str): Movie genre to check for

    Returns:
        list: List with movie dictionaries that include the supplied genre
    """

    movie_list = []
    for movie in movies:
        if genre.lower() in movie["Genre"].lower():
            movie_list.append(movie)
    return movie_list


def get_jumpscares(jumpscares:list[dict], movie_title:str) ->tuple:
    """
    Loops through jumpscare data (list) and check to see if the value of the key 'Movie Name'
    matches the supplied movie_title (str) and returns the values of the keys 'Jump Count'
    and 'Jump Scare Rating' as a tuple

    Parameters:
        jumpscares (list): information about the movie's jump scares
        movie_title (str): value of the key 'Title' from a movie dictionary

    Returns:
        tuple: A tuple with the first value representing a jump count and the second value
        representing a jump scare rating. Both values are None if the
        movie name does not exist in the jumpscares list.
    """

    for item in jumpscares:
        if item["Movie Name"].lower().strip() == movie_title.lower().strip():
            return item["Jump Count"], item["Jump Scare Rating"]
    return None, None


def get_value(movie:dict, key_to_check:str) ->str | float | int:
    """
    This function checks whether a value exists for the given key in the dictionary. If it
    does, it returns the value; otherwise it returns None.

    Parameters:
        movie (dict): a dictionary containing all the data about one movie
        key_to_check (string): a key of the dictionary

    Returns:
        value: if the value exists, return the value for that key. Can be a string, float,
        or int depending on which key is passed.
        None: if the value does not exist, return None
    """

    if movie[key_to_check]:
        return movie[key_to_check]
    return None


def read_csv_to_dicts(filepath:str, encoding="utf-8", newline="", delimiter=","):
    """
    Accepts a file path for a .csv file to be read, creates a file object,
    and uses csv.DictReader() to return a list of dictionaries
    that represent the row values from the file.

    Parameters:
        filepath (str): path to csv file
        encoding (str): name of encoding used to decode the file
        newline (str): specifies replacement value for newline '\n'
        or '\r\n' (Windows) character sequences
        delimiter (str): delimiter that separates the row values

    Returns:
        list: nested dictionaries representing the file contents
    """

    with open(filepath, "r", newline=newline, encoding=encoding) as file_obj:
        data = []
        reader = csv.DictReader(file_obj, delimiter=delimiter)
        for line in reader:
            data.append(line)
    return data


def write_dicts_to_csv(filepath:str, data:list, fieldnames, encoding="utf-8", newline=""):
    """
    Uses csv.DictWriter() to write a list of dictionaries to a target CSV file as row data.
    The passed in fieldnames list is used by the DictWriter() to determine the order
    in which each dictionary's key-value pairs are written to the row.

    Parameters:
        filepath (str): path to target file (if file does not exist it will be created)
        data (list): dictionary content to be written to the target file
        fieldnames (seq): sequence specifying order in which key-value pairs are written to each row
        encoding (str): name of encoding used to encode the file
        newline (str): specifies replacement value for newline '\n'
        or '\r\n' (Windows) character sequences.

    Returns:
        None
    """

    with open(filepath, "w", encoding=encoding, newline=newline) as file_obj:
        writer = csv.DictWriter(file_obj, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

### 1. Load in the horror movie and jumpscare datasets. 

In [28]:
# Load in data
filepath_movies = "./data/horror_movies.csv"
filepath_jumpscare = "./data/movie_jumpscares.csv"
horror_movies = read_csv_to_dicts(filepath_movies)
jumpscares = read_csv_to_dicts(filepath_jumpscare)
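
It helps to know what the loaded rows look like before any cleaning. The sketch below is an illustration under assumptions: it reads made-up CSV text from an in-memory buffer instead of the real data files, and shows that `csv.DictReader` hands back every value as a string.

```python
import csv
import io

# Made-up CSV content standing in for a data file
sample_csv = "Title,Year,imdbRating\nExample Movie,2012,5.7\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))
first_row = rows[0]

# Every value is a str, including the numeric-looking ones
for key, value in first_row.items():
    print(key, repr(value), type(value).__name__)
```

This is why the cleaning step that follows has to convert numeric fields explicitly.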

### 2. Run the data cleaning steps and print out sample elements for inspection.

In [29]:
# Clean data
for movie in horror_movies:
    clean_movie_data(movie)

for data in jumpscares:
    clean_movie_data(data)

print(f"First element in horror_movies: {horror_movies[0]}")
print(f"First element in jumpscares: {jumpscares[0]}")

First element in horror_movies: {'Title': 'Would You Rather', 'Year': 2012, 'Rated': 'Not Rated', 'Released': '06 Jun 2019', 'Runtime': 93, 'Genre': 'Horror, Thriller', 'Director': 'David Guy Levy', 'Writer': 'Steffen Schlachtenhaufen', 'Actors': 'Brittany Snow, June Squibb, Jeffrey Combs', 'Plot': 'Desperate to help her ailing brother, a young woman unknowingly agrees to compete in a deadly game of "Would You Rather," hosted by a sadistic aristocrat.', 'Language': 'English', 'Country': 'United States', 'Awards': '1 nomination', 'imdbRating': '5.7'}
First element in jumpscares: {'Movie Name': '[Rec]', 'Director': 'Jaume Balagueró, Paco Plaza', 'Year': 2007, 'Jump Count': 11, 'Jump Scare Rating': '3.5', 'Netflix (US)': 'No'}
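
Notice that the cleaned rows hold mixed types: 'Jump Count' is an int, while 'Jump Scare Rating' is still the string '3.5'. The sketch below reproduces that effect with a stand-in for convert_to_int(). The helper name `to_int_or_same` and the sample values are hypothetical.

```python
def to_int_or_same(value):
    """Stand-in for convert_to_int(): an int if possible, else the value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value

raw_values = ["3.5", "1", "0", "N/A"]  # made-up sample values
cleaned = [to_int_or_same(v) for v in raw_values]

print(cleaned)                              # ['3.5', 1, 0, 'N/A']
print([type(v).__name__ for v in cleaned])  # ['str', 'int', 'int', 'str']
print(bool(cleaned[2]))                     # False: the int 0 is falsy
```

Two things are worth keeping in mind when reading later cells: decimal strings are not converted, and a cleaned value of 0 fails any plain truthiness check such as `if value:`.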


### 3. Filter the horror movies dataset so that we only include movies that have "horror" listed in their genre. This is an extra quality control step.

In [30]:
horror_movies = filter_movie_by_genre(horror_movies, "horror")
print(f"\nLength of Horror movies \n{len(horror_movies)}")


Length of Horror movies 
65
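
The filter above uses a substring test on the whole 'Genre' string. As a point of comparison, here is a sketch of an exact match against the comma-separated genre names. The helper `has_genre` and the sample strings are hypothetical, and this is an alternative technique rather than the assignment's method.

```python
def has_genre(genre_field, genre):
    """Exact, case-insensitive match against comma-separated genre names."""
    names = [name.strip().lower() for name in genre_field.split(",")]
    return genre.lower() in names

# Both approaches agree on an ordinary case
print("horror" in "Horror, Thriller".lower())   # True
print(has_genre("Horror, Thriller", "horror"))  # True

# A substring test can match inside another word ("roMANce")
print("man" in "Drama, Romance".lower())        # True
print(has_genre("Drama, Romance", "man"))       # False
```

For the query "horror" the two approaches are unlikely to differ, but the distinction matters when deciding whether a result count is trustworthy.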


### 4. How many movies are there with each rating?

In [31]:
# Print out how many movies have each rating
unique_ratings = ["PG", "PG-13", "R", "Not Rated", "Passed", "Unrated"]

movie_ratings_count = {}
for rating in unique_ratings:
    movie_ratings_count[rating] = count_movie_by_rating(horror_movies, rating)

print(f"Count of horror movies in each viewer ratings:\n{movie_ratings_count}")

Count of horror movies in each viewer ratings:
{'PG': 3, 'PG-13': 9, 'R': 45, 'Not Rated': 6, 'Passed': 1, 'Unrated': 1}
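
A quick sanity check on these counts: they add up to 65, the length of the filtered list, so the hard-coded `unique_ratings` list appears to cover every movie. If you would rather discover the ratings than hard-code them, `collections.Counter` is one option. The sketch below uses made-up rows, not the real dataset.

```python
from collections import Counter

sample_movies = [  # made-up rows for illustration
    {"Title": "A", "Rated": "R"},
    {"Title": "B", "Rated": "PG-13"},
    {"Title": "C", "Rated": "R"},
    {"Title": "D", "Rated": "N/A"},
]

rating_counts = Counter(movie["Rated"] for movie in sample_movies)
print(dict(rating_counts))  # {'R': 2, 'PG-13': 1, 'N/A': 1}

# Every movie is counted exactly once, so the totals must match
assert sum(rating_counts.values()) == len(sample_movies)
```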


### 5. How many movies contain a jumpscare?

In [32]:
for movie in horror_movies:
    movie["Jumpscare_count"], movie["Jumpscare_rating"] = get_jumpscares(jumpscares, movie["Title"])
print(f"\nLength of horror movies with jump scare data:\n{len(horror_movies)}")


Length of horror movies with jump scare data:
65
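
The number printed here is the length of the whole list, so it stays at 65 whether or not a movie was matched in the jumpscare data. To answer the question in the heading, you would have to count the movies whose jumpscare fields were actually filled in. A sketch with made-up rows, assuming unmatched movies hold None as get_jumpscares() returns:

```python
sample_movies = [  # made-up rows for illustration
    {"Title": "A", "Jumpscare_count": 11, "Jumpscare_rating": "3.5"},
    {"Title": "B", "Jumpscare_count": None, "Jumpscare_rating": None},
    {"Title": "C", "Jumpscare_count": 0, "Jumpscare_rating": 0},
]

# Movies that were found in the jumpscare data at all
with_data = [m for m in sample_movies if m["Jumpscare_rating"] is not None]

# Of those, movies with at least one jump scare
with_scares = [m for m in with_data if m["Jumpscare_count"] > 0]

print(len(sample_movies), len(with_data), len(with_scares))  # 3 2 1
```

Note the explicit `is not None` test: movie "C" has data, but its values are 0 and would be dropped by a truthiness check.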


In [33]:
jumpscares[0]

{'Movie Name': '[Rec]',
 'Director': 'Jaume Balagueró, Paco Plaza',
 'Year': 2007,
 'Jump Count': 11,
 'Jump Scare Rating': '3.5',
 'Netflix (US)': 'No'}

### 6. What is the average IMDb movie rating and jumpscare rating for these movies?

In [34]:
avg_imdb_rating = calculate_avg(horror_movies, "imdbRating")
avg_jumpscare_rating = calculate_avg(horror_movies, "Jumpscare_rating")

print(f"The average imdb Score is: {avg_imdb_rating}")
print(f"The average jumpscare Score is: {avg_jumpscare_rating}")

The average imdb Score is: 6.6
The average jumpscare Score is: 2.3
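
When checking an average, look at both the sum and the number it is divided by. The sketch below uses made-up values to compare dividing by the length of the whole list with dividing by the number of entries that have a value. The helper `mean_of_present` is hypothetical.

```python
def mean_of_present(values):
    """Average the non-None values; return None when there is nothing to average."""
    present = [float(v) for v in values if v is not None]
    if not present:
        return None
    return round(sum(present) / len(present), 1)

ratings = ["3.5", "2.5", None, None]  # made-up values; None means no data

# Sum of the present values divided by the length of the whole list
naive = round(sum(float(v) for v in ratings if v is not None) / len(ratings), 1)

print(naive)                     # 1.5: 6.0 divided by all 4 entries
print(mean_of_present(ratings))  # 3.0: 6.0 divided by the 2 entries with data
```

Which denominator is right depends on what the average is supposed to describe, so compare it with how calculate_avg() handles movies that have no score.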


### 7. Which movies are the highest rated, based on IMDb and jumpscare rating?

In [15]:
high_rated_movies = []

for movie in horror_movies:
    imdb_rating = get_value(movie, "imdbRating")
    jumpscare_rating = get_value(movie, "Jumpscare_rating")

    if imdb_rating is not None and jumpscare_rating is not None:
        if float(imdb_rating) > avg_imdb_rating and float(jumpscare_rating) > avg_jumpscare_rating:
            high_rated_movies.append({
                "Title": get_value(movie, "Title"),
                "IMDb_rating": imdb_rating,
                "Jumpscare_rating": jumpscare_rating,
            })

print("High-rated movies:")
for movie in high_rated_movies:
    print(movie)

High-rated movies:
{'Title': 'Would You Rather', 'IMDb_rating': '5.7', 'Jumpscare_rating': 1}
{'Title': 'Us', 'IMDb_rating': '6.8', 'Jumpscare_rating': '2.5'}
{'Title': 'The Texas Chain Saw Massacre', 'IMDb_rating': '7.4', 'Jumpscare_rating': 1}
{'Title': 'The Skeleton Key', 'IMDb_rating': '6.5', 'Jumpscare_rating': 3}
{'Title': 'The Shining', 'IMDb_rating': '8.4', 'Jumpscare_rating': '0.5'}
{'Title': 'The Ring', 'IMDb_rating': '7.1', 'Jumpscare_rating': '2.5'}
{'Title': 'The Purge: Anarchy', 'IMDb_rating': '6.4', 'Jumpscare_rating': 3}
{'Title': 'The Others', 'IMDb_rating': '7.6', 'Jumpscare_rating': 2}
{'Title': 'The Nun', 'IMDb_rating': '5.3', 'Jumpscare_rating': '4.5'}
{'Title': 'The Loved Ones', 'IMDb_rating': '6.6', 'Jumpscare_rating': 1}
{'Title': 'The Hills Have Eyes', 'IMDb_rating': '6.4', 'Jumpscare_rating': '2.5'}
{'Title': 'The Haunting in Connecticut 2: Ghosts of Georgia', 'IMDb_rating': '5.3', 'Jumpscare_rating': 5}
{'Title': 'The Conjuring 2', 'IMDb_rating': '7.3', 'Jump
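
One way to sanity-check this list is to re-apply the step 7 condition by hand to a couple of the printed rows. The sketch below is an illustration, not part of the assignment code: it hard-codes the two averages printed in step 6 and two rows copied from the output above, and the constant names are hypothetical.

```python
AVG_IMDB = 6.6        # average IMDb rating as printed in step 6
AVG_JUMPSCARE = 2.3   # average jumpscare rating as printed in step 6

rows = [  # two rows copied from the printed list
    {"Title": "Would You Rather", "IMDb_rating": "5.7", "Jumpscare_rating": 1},
    {"Title": "The Ring", "IMDb_rating": "7.1", "Jumpscare_rating": "2.5"},
]

# Same condition as the cell above: both ratings must beat their averages
passing = [
    row["Title"]
    for row in rows
    if float(row["IMDb_rating"]) > AVG_IMDB
    and float(row["Jumpscare_rating"]) > AVG_JUMPSCARE
]
print(passing)  # ['The Ring']
```

With these thresholds only 'The Ring' passes. A row such as 'Would You Rather' in the printed list therefore suggests that the output came from a run with different averages or different code, which fits the out-of-sequence cell number In [15]. Re-running the cells in order is a reasonable first step before drawing conclusions.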

In [36]:
horror_movies[0:3]

[{'Title': 'Would You Rather',
  'Year': 2012,
  'Rated': 'Not Rated',
  'Released': '06 Jun 2019',
  'Runtime': 93,
  'Genre': 'Horror, Thriller',
  'Director': 'David Guy Levy',
  'Writer': 'Steffen Schlachtenhaufen',
  'Actors': 'Brittany Snow, June Squibb, Jeffrey Combs',
  'Plot': 'Desperate to help her ailing brother, a young woman unknowingly agrees to compete in a deadly game of "Would You Rather," hosted by a sadistic aristocrat.',
  'Language': 'English',
  'Country': 'United States',
  'Awards': '1 nomination',
  'imdbRating': '5.7',
  'Jumpscare_count': 3,
  'Jumpscare_rating': 1},
 {'Title': 'Us',
  'Year': 2019,
  'Rated': 'R',
  'Released': '22 Mar 2019',
  'Runtime': 116,
  'Genre': 'Horror, Mystery, Thriller',
  'Director': 'Jordan Peele',
  'Writer': 'Jordan Peele',
  'Actors': "Lupita Nyong'o, Winston Duke, Elisabeth Moss",
  'Plot': "A family's serene beach vacation turns to chaos when their doppelgängers appear and begin to terrorize them.",
  'Language': 'English'

In [37]:
import csv


def calculate_avg(movies:list, score_type:str) ->float:
    """
    Calculates the average score for the given score type and returns a float rounded to
    one decimal place. Movies with no value for score_type are left out of both the sum
    and the count.

    Parameters:
        movies (list): A list of dictionaries containing all the movie data
        score_type (str): The key of the score (e.g., "imdbRating", "Jumpscare_rating") to
        calculate the average for.

    Returns:
        float: The average value, rounded to one decimal place (0.0 if no movie has a score)
    """

    sum_score = 0
    scored_count = 0
    for movie in movies:
        score = get_value(movie, score_type)
        if score is not None:
            sum_score += float(score)
            scored_count += 1
    if scored_count == 0:
        # Assumption: report 0.0 rather than raise when there is nothing to average
        return 0.0
    # Divide by the number of movies that have a score, not by len(movies)
    return round(sum_score / scored_count, 1)


def clean_movie_data(movie:dict) ->dict:
    """
    Clean the movie data by converting the 'Runtime' value to an integer, then passing every
    field through the < convert_movie_fields_to_int() > function. If the 'Runtime' key
    is not present, only the field-by-field conversion is applied.

    Parameters:
        movie (dict): dictionary containing key-value pairs representing data for one movie

    Returns:
        dict: dictionary containing key-value pairs representing data for one movie
    """

    try:
        movie["Runtime"] = convert_to_int(movie["Runtime"].split()[0])
    except KeyError:
        movie = convert_movie_fields_to_int(movie)
    return convert_movie_fields_to_int(movie)


def convert_to_int(value:str | bool) ->int | str:
    """
    Attempts to convert a string, number, or boolean value to an int.
    If a ValueError exception is encountered, the function returns the value unchanged.

    Parameters:
        value (str|bool): a string or boolean value to be converted

    Returns:
        int: if the value is successfully converted else returns value unchanged
    """
    try:
        return int(value)
    except ValueError:
        return value


def convert_movie_fields_to_int(movie:dict) ->dict:
    """
    Loops through a movie dictionary's key-value pairs and converts values
    that can be converted to an integer using the convert_to_int() function.

    Parameters:
        movie (dict): dictionary containing key-value pairs representing data for one movie

    Returns:
        dict: dictionary containing cleaned key-value pairs for the movie
    """
    for k, v in movie.items():
        movie[k] = convert_to_int(v)
    return movie


def count_movie_by_rating(movies:list, rating:str) ->int:
    """
    Loops through movies (list) and check if rating (str) is included in the dictionary key: 'Rated'.
    Increment the count by 1 if the given rating matches the movie's rating.

    Parameters:
        movies (list): List of dictionaries representing all movies
        rating (str): Audience Rating of the movie to check for

    Returns:
        int: Number of movies that have the supplied Rating
    """

    movie_count = 0
    for movie in movies:
        if movie["Rated"].lower() == rating.lower():
            movie_count += 1
    return movie_count


def filter_movie_by_genre(movies:list, genre:str) ->list:
    """
    Loops through movies (list) and check if genre (str) is included in the dictionary key: 'Genre'.
    Append the dictionary and all of its data to an empty list and return the final list.

    Parameters:
        movies (list): List of dictionaries representing all movies
        genre (str): Movie genre to check for

    Returns:
        list: List with movie dictionaries that include the supplied genre
    """

    movie_list = []
    for movie in movies:
        if genre.lower() in movie["Genre"].lower():
            movie_list.append(movie)
    return movie_list


def get_jumpscares(jumpscares:list[dict], movie_title:str) ->tuple:
    """
    Loops through jumpscare data (list) and check to see if the value of the key 'Movie Name'
    matches the supplied movie_title (str) and returns the values of the keys 'Jump Count'
    and 'Jump Scare Rating' as a tuple

    Parameters:
        jumpscares (list): information about the movie's jump scares
        movie_title (str): value of the key 'Title' from a movie dictionary

    Returns:
        tuple: A tuple with the first value representing a jump count and the second value
        representing a jump scare rating. Both values are None if the
        movie name does not exist in the jumpscares list.
    """

    for item in jumpscares:
        if item["Movie Name"].lower().strip() == movie_title.lower().strip():
            return item["Jump Count"], item["Jump Scare Rating"]
    return None, None


def get_value(movie:dict, key_to_check:str) ->str | float | int | None:
    """
    Checks whether the dictionary holds a value for the given key. If it does, the function
    returns the value; otherwise it returns None. A value of 0 counts as present.

    Parameters:
        movie (dict): a dictionary containing all the data about one movie
        key_to_check (str): a key of the dictionary

    Returns:
        value: if the value exists, return it. Can be a string, float, or int depending on
        which key is passed.
        None: if the key is missing or its value is None or an empty string
    """

    # Compare against None and "" explicitly so that a legitimate 0 is not discarded
    value = movie.get(key_to_check)
    if value is None or value == "":
        return None
    return value


def read_csv_to_dicts(filepath:str, encoding="utf-8", newline="", delimiter=","):
    """
    Accepts a file path for a .csv file to be read, creates a file object,
    and uses csv.DictReader() to return a list of dictionaries
    that represent the row values from the file.

    Parameters:
        filepath (str): path to csv file
        encoding (str): name of encoding used to decode the file
        newline (str): specifies replacement value for newline '\n'
        or '\r\n' (Windows) character sequences
        delimiter (str): delimiter that separates the row values

    Returns:
        list: nested dictionaries representing the file contents
    """

    with open(filepath, "r", newline=newline, encoding=encoding) as file_obj:
        data = []
        reader = csv.DictReader(file_obj, delimiter=delimiter)
        for line in reader:
            data.append(line)
    return data


def write_dicts_to_csv(filepath:str, data:list, fieldnames, encoding="utf-8", newline=""):
    """
    Uses csv.DictWriter() to write a list of dictionaries to a target CSV file as row data.
    The passed in fieldnames list is used by the DictWriter() to determine the order
    in which each dictionary's key-value pairs are written to the row.

    Parameters:
        filepath (str): path to target file (if file does not exist it will be created)
        data (list): dictionary content to be written to the target file
        fieldnames (seq): sequence specifying order in which key-value pairs are written to each row
        encoding (str): name of encoding used to encode the file
        newline (str): specifies replacement value for newline '\n'
        or '\r\n' (Windows) character sequences.

    Returns:
        None
    """

    with open(filepath, "w", encoding=encoding, newline=newline) as file_obj:
        writer = csv.DictWriter(file_obj, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

In [38]:
# hidden tests are within this cell