# Data Bootcamp Selection Challenge

## Project Description

In this challenge, I am working with a car-based dataset to calculate various Key Performance Indicators (KPIs). Each question has a single correct answer, which will be evaluated through automated unit testing. The challenge includes questions related to data analysis, aggregation, and calculations.

## Instructions

1. **Dataset**: The dataset has already been provided. Do not perform any data cleaning or modifications to the dataset.

2. **Answer Dictionary**: Use the provided answer dictionary to store your answers. Do not modify the structure of the answer dictionary. For each question, replace the value with your calculated answer.

3. **Coding**: Solve each question by writing the necessary code using the given dataset. Use the examples and code snippets provided in the challenge description as references.

4. **Saving Answers**: After solving all the questions and filling in the answer dictionary, save your answers in pickle format using the provided code.

5. **GitHub Repository**: Upload your Jupyter notebook and the pickle file containing the answer dictionary to a public GitHub repository. Create a new repository if needed.

6. **Submission**: Submit the URLs of both your Jupyter notebook and the pickle file through the provided Google form. Ensure that the URLs are accessible and correct.

## My Task

My task is to complete the coding challenges presented in this notebook. I am to follow the provided examples and guidelines to calculate the required KPIs and ensure that my answers are in the correct format as specified in each question.

---

**Note**: The unit tests provided at the end of this notebook will help us verify the correctness of our answers. However, they are not exhaustive, so make sure to review your answers carefully before submission.


In [31]:
# Use this dictionary to store your answers in the correct format in the cells below; do not modify the keys
answer_dict = {"Q1": None,
               "Q2": None,
               "Q3": None,
               "Q4": None,
               "Q5": None,
               "Q6": None,
               "Q7": None}

## Reading the Dataset

In this challenge, you will work with a car-based dataset to calculate various Key Performance Indicators (KPIs). The dataset contains information about different car models, their attributes, and fuel efficiency metrics. You will use this dataset to answer specific questions related to the provided challenges.

To get started, follow the instructions below to read the dataset into a DataFrame using the pandas library.

### Instructions

1. Import the required library, pandas.
2. Specify the URL of the dataset.
3. Read the dataset into a DataFrame using the `pd.read_csv()` function.
4. Display the first few rows of the DataFrame to ensure successful loading.

Remember, you can use any Python library to solve this challenge, but we recommend using pandas due to its ease of use and compatibility with the given dataset.

Let's begin by reading the dataset and exploring the initial data.


In this step, we will read the car-based dataset into a DataFrame using the pandas library. The dataset contains information about various car models, their attributes, and fuel efficiency metrics.

We will use the following code to accomplish this:

```python
import pandas as pd
import numpy as np

# URL of the dataset
url = "https://storage.googleapis.com/deb-evaluation-materials/vehicles.csv"

# Read the dataset into a DataFrame using pandas
df = pd.read_csv(url)
```


In [32]:
import pandas as pd
import numpy as np
url = "https://storage.googleapis.com/deb-evaluation-materials/vehicles.csv"
df = pd.read_csv(url)

Let's execute the code and load the dataset for further analysis.

## Exploring the Initial Data

After successfully reading the dataset into a DataFrame, let's take a look at the initial data. The `df.head()` function allows us to display the first few rows of the DataFrame to get a glimpse of the data's structure and content.

Let's execute the code below to see the first few rows of the dataset:

```python
df.head()
```


In [33]:
df.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


## Dataset Information

Let's explore further details about the dataset using the `df.info()` function. This function provides information about the columns, non-null counts, and data types present in the DataFrame. The output below summarizes the dataset's structure:



In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35952 entries, 0 to 35951
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Make                     35952 non-null  object 
 1   Model                    35952 non-null  object 
 2   Year                     35952 non-null  int64  
 3   Engine Displacement      35952 non-null  float64
 4   Cylinders                35952 non-null  float64
 5   Transmission             35952 non-null  object 
 6   Drivetrain               35952 non-null  object 
 7   Vehicle Class            35952 non-null  object 
 8   Fuel Type                35952 non-null  object 
 9   Fuel Barrels/Year        35952 non-null  float64
 10  City MPG                 35952 non-null  int64  
 11  Highway MPG              35952 non-null  int64  
 12  Combined MPG             35952 non-null  int64  
 13  CO2 Emission Grams/Mile  35952 non-null  float64
 14  Fuel Cost/Year        

Q1. What is the average CO2 emission in grams/mile of all Volkswagen cars?
Format: A floating number
Example answer:
11.547

In [35]:
# Calculate the average CO2 emission per gram/mile of all Volkswagen cars
average_co2_emission = df[df['Make'] == 'Volkswagen']['CO2 Emission Grams/Mile'].mean()

# Update the answer dictionary with the calculated result
answer_dict["Q1"] = average_co2_emission

# Print the calculated result
print("Average CO2 emission per gram/mile of all Volkswagen cars:", average_co2_emission)


Average CO2 emission per gram/mile of all Volkswagen cars: 392.74172108576107


Q2. Calculate the top 5 brands (Make) with the most unique models. Order your answer in descending order by the number of unique models.
NOTE: Consider only the name of the models and their brand; that is, use only the Make and Model columns.
Format: A 5X2 list with each row being the name of the brand followed by its number of unique models, in descending order.
Hint: You can use the pandas df.values.tolist() function to format your answer.
Example answer:
[["Volkswagen", 1000], ["Toyota", 900], ["Honda", 800], ["Subaru", 700], ["Ford", 600]]

In [36]:
# Calculate the top 5 brands with the most unique models
top_brands_unique_models = df.groupby('Make')['Model'].nunique().sort_values(ascending=False).head(5)

# Convert the result to the required format
top_brands_unique_models_list = top_brands_unique_models.reset_index().values.tolist()

# Update the answer dictionary with the calculated result
answer_dict["Q2"] = top_brands_unique_models_list

# Print the calculated result
print("Top 5 brands with the most unique models:")
for brand, num_models in top_brands_unique_models_list:
    print(brand, num_models)


Top 5 brands with the most unique models:
Mercedes-Benz 333
BMW 284
Chevrolet 253
Ford 185
GMC 163


Q3. What are all the different types of fuels in the dataset sorted alphabetically?
Format: A list of strings sorted alphabetically.
Example Answer:
['Regular', 'Premium']

In [37]:
# Get the different types of fuels and sort them alphabetically
fuel_types = sorted(df['Fuel Type'].unique())

# Update the answer dictionary with the calculated result
answer_dict["Q3"] = fuel_types

# Print the list of fuel types
print("Different types of fuels sorted alphabetically:")
print(fuel_types)


Different types of fuels sorted alphabetically:
['CNG', 'Diesel', 'Gasoline or E85', 'Gasoline or natural gas', 'Gasoline or propane', 'Midgrade', 'Premium', 'Premium Gas or Electricity', 'Premium and Electricity', 'Premium or E85', 'Regular', 'Regular Gas and Electricity', 'Regular Gas or Electricity']


Q4. Show the 9 Toyota cars with the most extreme Fuel Barrels/Year in absolute terms within all Toyota cars. Show the car Model, Year and their Fuel Barrels/Year in standard deviation units (Z-score), sorted in descending order first by Fuel Barrels/Year in absolute terms and then by Year (also descending), BUT without modifying the negative values (see example).
Format: A 9X3 list with each row containing the Model, Year and Fuel Barrels/Year in standard deviation units
Example answer:
[['DJ Po Vehicle 2WD', 2004, -6.407431084026927],
 ['FJ8c Post Office', 2003, -6.407431084026927],
 ['Post Office DJ5 2WD', 2005, -6.391684618442447],
 ['Sierra 2500 Hd 2WD', 2002, -6.391684618442447],
 ['Camry CNG', 2012, 2.677633075759575],
 ['Sierra 1500 4WD', 2005, 2.677633075759575],
 ['Sierra 1500 4WD', 2001, 2.677633075759575],
 ['V15 Suburban 4WD', 1988, 2.677633075759575],
 ['V15 Suburban 4WD', 1987, 2.677633075759575]]
Note that while the list is sorted by the Fuel Barrels/Year in absolute terms and in standard deviation units, the values are not modified. If the values are the same the rows are sorted by the year.
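
The ordering rule described in the note above can be sketched on a toy frame. This is only an illustration under stated assumptions: the rows are made up, and the helper column `abs_z` and the names `toy`, `ranked` and `top_extreme` are hypothetical. The idea is to sort on a helper column holding the absolute Z-score while reporting the signed Z-score unchanged.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the challenge dataset)
toy = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Year": [2001, 2002, 2003, 2004, 2005],
    "Fuel Barrels/Year": [10.0, 12.0, 12.0, 12.0, 14.0],
})

# Z-score within the group: (value - mean) / sample standard deviation
col = toy["Fuel Barrels/Year"]
toy["z"] = (col - col.mean()) / col.std()

# Sort on a helper column holding |z|, then by Year, both descending;
# the signed z values themselves are left untouched
ranked = (
    toy.assign(abs_z=toy["z"].abs())
       .sort_values(["abs_z", "Year"], ascending=[False, False])
       .head(3)
)

# Build rows with built-in Python types
top_extreme = [
    [str(m), int(y), float(z)]
    for m, y, z in zip(ranked["Model"], ranked["Year"], ranked["z"])
]
print(top_extreme)
```

In this toy frame, models E and A tie on the absolute Z-score, so the later year (E, 2005) comes first, and A keeps its negative sign.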

In [38]:
# Calculate Z-scores for Fuel Barrels/Year within Toyota cars
toyota_cars = df[df['Make'] == 'Toyota']
toyota_cars['Fuel Barrels/Year Z-score'] = (toyota_cars['Fuel Barrels/Year'] - toyota_cars['Fuel Barrels/Year'].mean()) / toyota_cars['Fuel Barrels/Year'].std()

# Keep rows with a non-zero Z-score, then sort by the signed Z-score and by Year, both descending
extreme_toyota_cars = toyota_cars[abs(toyota_cars['Fuel Barrels/Year Z-score']) > 0].sort_values(by=['Fuel Barrels/Year Z-score', 'Year'], ascending=[False, False])

# Select the top 9 extreme Toyota cars
top_extreme_toyota_cars = extreme_toyota_cars.head(9)[['Model', 'Year', 'Fuel Barrels/Year Z-score']].values.tolist()

# Update the answer dictionary with the calculated result
answer_dict["Q4"] = top_extreme_toyota_cars

# Print the list of extreme Toyota cars
print("Top 9 Toyota cars with extreme Fuel Barrels/Year:")
for car in top_extreme_toyota_cars:
    print(car)


Top 9 Toyota cars with extreme Fuel Barrels/Year:
['Cab/Chassis 2WD', 1993, 4.112255865424778]
['Cab/Chassis 2WD', 1992, 4.112255865424778]
['Cab/Chassis 2WD', 1991, 4.112255865424778]
['Cab/Chassis 2WD', 1990, 4.112255865424778]
['Cab/Chassis 2WD', 1989, 4.112255865424778]
['Cab/Chassis 2WD', 1993, 3.3791118637260777]
['Cab/Chassis 2WD', 1992, 3.3791118637260777]
['Land Cruiser Wagon 4WD', 1992, 3.3791118637260777]
['Cab/Chassis 2WD', 1991, 3.3791118637260777]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  toyota_cars['Fuel Barrels/Year Z-score'] = (toyota_cars['Fuel Barrels/Year'] - toyota_cars['Fuel Barrels/Year'].mean()) / toyota_cars['Fuel Barrels/Year'].std()


Q5. Calculate the changes in Combined MPG with their previous model of all Golf cars with Manual 5-spd transmission and Regular Fuel Type. Show the Year, the Combined MPG and the calculated difference of MPG in a list sorted by Year in ascending order.
Format: A 19X3 list, with the Year and Combined MPG being of type integer and only the calculated difference is of type float
Note: The value for the first model should be 0. It does not matter that there are gaps in the years; calculate with respect to the previous model.
Example answer:
[[1986, 25, 0.0],
 [1987, 25, 0.0],
 [1988, 25, 0.0],
 [1989, 25, 0.0],
 [1990, 23, -2.0],
 [1991, 23, 0.0],
 [1992, 24, 1.0],
 [1993, 25, 1.0],
 [1994, 25, 0.0],
 [1995, 25, 0.0],
 [1996, 25, 0.0],
 [1997, 25, 0.0],
 [1998, 24, -1.0],
 [1999, 25, 1.0],
 [2000, 24, -1.0],
 [2001, 24, 0.0],
 [2002, 24, 0.0],
 [2004, 24, 0.0],
 [2006, 24, 0.0]]
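
A minimal sketch of the "difference with respect to the previous model" calculation, on made-up rows (the frame `toy` and the column name `Diff` are hypothetical). It assumes the rows should be sorted by Year before calling `diff()`, so that each row is compared with the preceding model, and it builds the output rows explicitly because `.values.tolist()` on a frame that mixes integer and float columns upcasts every value to float.

```python
import pandas as pd

# Toy rows (hypothetical values), deliberately out of order
toy = pd.DataFrame({
    "Year": [1990, 1986, 1988],
    "Combined MPG": [23, 25, 25],
})

# Sort by Year first so that diff() compares each row with the previous model
toy = toy.sort_values("Year").reset_index(drop=True)

# The first row has no predecessor, so its NaN difference is replaced by 0
toy["Diff"] = toy["Combined MPG"].diff().fillna(0)

# Build rows explicitly to keep Year and MPG as int and the difference as float
rows = [
    [int(y), int(mpg), float(d)]
    for y, mpg, d in zip(toy["Year"], toy["Combined MPG"], toy["Diff"])
]
print(rows)  # [[1986, 25, 0.0], [1988, 25, 0.0], [1990, 23, -2.0]]
```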

In [39]:
# Filter the data for Golf cars with Manual 5-spd transmission and Regular Fuel Type
golf_cars = df[(df['Make'] == 'Volkswagen') & (df['Model'].str.contains('Golf')) & (df['Transmission'] == 'Manual 5-spd') & (df['Fuel Type'] == 'Regular')]

# Calculate the change in Combined MPG relative to the previous row, in the DataFrame's existing row order; the first row gets 0
golf_cars['Change in Combined MPG'] = golf_cars['Combined MPG'].diff().fillna(0)

# Convert the relevant columns to a list of rows, then sort the rows (by Year, then by the remaining values)
result_q5 = golf_cars[['Year', 'Combined MPG', 'Change in Combined MPG']].values.tolist()
result_q5.sort()

# Update the answer dictionary with the calculated result
answer_dict["Q5"] = result_q5

# Print the list of changes in Combined MPG for Golf cars
print("Changes in Combined MPG for Golf cars:")
for entry in result_q5:
    print(entry)


Changes in Combined MPG for Golf cars:
[1985.0, 25.0, -3.0]
[1985.0, 26.0, 1.0]
[1986.0, 26.0, 1.0]
[1987.0, 25.0, 0.0]
[1987.0, 26.0, 0.0]
[1988.0, 25.0, -1.0]
[1989.0, 25.0, 0.0]
[1990.0, 25.0, -1.0]
[1991.0, 25.0, 0.0]
[1992.0, 25.0, 0.0]
[1993.0, 24.0, 0.0]
[1994.0, 24.0, -5.0]
[1995.0, 24.0, 0.0]
[1996.0, 23.0, -2.0]
[1997.0, 24.0, 1.0]
[1998.0, 24.0, 0.0]
[1999.0, 24.0, -1.0]
[1999.0, 24.0, 0.0]
[2000.0, 24.0, 0.0]
[2001.0, 24.0, 0.0]
[2002.0, 24.0, 0.0]
[2003.0, 24.0, 0.0]
[2004.0, 24.0, 0.0]
[2005.0, 24.0, 0.0]
[2006.0, 24.0, 0.0]
[2010.0, 25.0, 1.0]
[2011.0, 26.0, 1.0]
[2012.0, 26.0, 0.0]
[2013.0, 26.0, 0.0]
[2015.0, 29.0, 5.0]
[2015.0, 30.0, 4.0]
[2016.0, 29.0, 0.0]
[2016.0, 30.0, 0.0]
[2017.0, 28.0, -1.0]
[2017.0, 29.0, -1.0]


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  golf_cars['Change in Combined MPG'] = golf_cars['Combined MPG'].diff().fillna(0)


Q6. What are the 5 lowest CO2 Emission Grams/Mile values of cars for each of the following brands: Toyota, Ford, Volkswagen, Nissan, Honda?
Format: A 5X6 list with the first element of each row being the Make of the cars and the following five values being floats sorted in ascending order. The Makes should appear in the order listed in the question, starting with Toyota and ending with Honda (see example).
Example answer:
[['Toyota', 100.0, 140.0, 140.0, 150.0, 150.0],
 ['Ford',
  100.025641025641,
  200.677633075759575,
  200.677633075759575,
  200.677633075759575,
  200.677633075759575],
 ['Volkswagen', 139.0, 154.0, 166.5, 166.5, 166.5],
 ['Nissan', 122.0, 122.0, 122.0, 122.0, 160.0],
 ['Honda', 100.0, 100.0, 100.0, 100.0, 123.91684618442447]]
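
The flat row layout in the format above (a brand name followed directly by floats) can be sketched as follows. The data and the names `toy`, `n` and `rows` are hypothetical. The sketch selects the emissions as a Series, since `Series.tolist()` returns a flat list of Python floats, whereas selecting with double brackets gives a DataFrame whose `.values.tolist()` wraps each value in its own one-element list.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the challenge dataset)
toy = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Toyota", "Honda", "Honda", "Honda"],
    "CO2 Emission Grams/Mile": [300.0, 150.0, 200.0, 130.0, 400.0, 167.5],
})
brands = ["Toyota", "Honda"]  # output order is fixed by this list
n = 2                         # number of lowest values to keep per brand

rows = []
for brand in brands:
    # Select a Series (single brackets) so tolist() yields flat floats
    emissions = toy.loc[toy["Make"] == brand, "CO2 Emission Grams/Mile"]
    lowest = emissions.nsmallest(n).tolist()  # ascending, duplicates kept
    rows.append([brand] + lowest)

print(rows)  # [['Toyota', 150.0, 200.0], ['Honda', 130.0, 167.5]]
```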

In [40]:
# List of specified brands
brands = ['Toyota', 'Ford', 'Volkswagen', 'Nissan', 'Honda']

# Initialize an empty list to store the results
result_q6 = []

# Iterate through each brand and calculate the top 5 lowest CO2 Emission Grams/Mile emissions
for brand in brands:
    # Filter the data for the current brand
    brand_data = df[df['Make'] == brand]

    # Sort the data by CO2 Emission Grams/Mile in ascending order
    brand_data_sorted = brand_data.sort_values(by='CO2 Emission Grams/Mile')

    # Select the top 5 lowest emissions and add them to the result list
    lowest_emissions = brand_data_sorted[['CO2 Emission Grams/Mile']].head(5).values.tolist()
    lowest_emissions.insert(0, brand)  # Add the brand name at the beginning
    result_q6.append(lowest_emissions)

# Update the answer dictionary with the calculated result
answer_dict["Q6"] = result_q6

# Print the list of top 5 lowest emissions for each brand
print("Top 5 lowest CO2 Emission Grams/Mile emissions for each brand:")
for entry in result_q6:
    print(entry)


Top 5 lowest CO2 Emission Grams/Mile emissions for each brand:
['Toyota', [133.0], [133.0], [133.0], [133.0], [158.0]]
['Ford', [112.0], [129.0], [129.0], [129.0], [129.0]]
['Volkswagen', [200.0], [200.0], [200.0], [200.0], [261.025641025641]]
['Nissan', [249.0], [254.0], [254.5], [254.5], [254.5]]
['Honda', [130.0], [167.67924528301887], [167.67924528301887], [167.67924528301887], [167.67924528301887]]


Q7. Form 7 groups of 5 years to calculate the median Combined MPG of each group. The first group is from 1984 to 1988, the second from 1989 to 1993 and so on. The last group will have years not appearing in the dataset.
Note: The group ranges are inclusive on both sides; the first group starts with 1984 and cars from 1984 are included in it.
Format: A 7X2 list with the first element of each row being a tuple of two integers, the lower and upper bounds of the year group, and the second element being the median Combined MPG of that group, a float.
Example answer:
[[(1984, 1988), 11.0],
 [(1989, 1993), 10.0],
 [(1994, 1998), 10.0],
 [(1999, 2003), 14.0],
 [(2004, 2008), 13.0],
 [(2009, 2013), 14.0],
 [(2014, 2018), 15.0]]
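
As a rough sketch of the grouping described above: starting at 1984 and stepping by 5, exactly seven start years are produced when the range stops before 2019. The data below is made up, and `toy` and `result` are hypothetical names. Groups without any rows in this toy frame get a NaN median.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the challenge dataset)
toy = pd.DataFrame({
    "Year": [1984, 1988, 1989, 2015],
    "Combined MPG": [17, 19, 20, 30],
})

result = []
# range(1984, 2019, 5) yields 1984, 1989, ..., 2014: seven start years
for start in range(1984, 2019, 5):
    end = start + 4
    # between() is inclusive on both ends by default
    in_group = toy[toy["Year"].between(start, end)]
    result.append([(start, end), float(in_group["Combined MPG"].median())])

print(len(result))  # 7
print(result[0])    # [(1984, 1988), 18.0]
print(result[-1])   # [(2014, 2018), 30.0]
```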

In [41]:
# Calculate the median Combined MPG for each group of 5 years
result_q7 = []
for start_year in range(1984, 2023, 5):
    end_year = start_year + 4
    group_data = df[(df['Year'] >= start_year) & (df['Year'] <= end_year)]
    median_mpg = group_data['Combined MPG'].median()
    result_q7.append([(start_year, end_year), float(median_mpg)])  # Convert median_mpg to float

# Update the answer dictionary with the calculated result
answer_dict["Q7"] = result_q7



Converting data types to match those expected by the test cases

> I have tried all the ways possible to conform to the set data types, but I am still failing 4 test cases
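
One reason explicit conversion matters here is that the tests compare `type(...)` exactly, and pandas/NumPy results are often NumPy scalars rather than built-in Python numbers. The sketch below uses made-up values to show the difference and two ways to get built-in types; it illustrates the general behavior only and does not diagnose any particular failing test.

```python
import numpy as np

value = np.float64(392.74)

# np.float64 subclasses float, so isinstance passes,
# but an exact type comparison does not
print(isinstance(value, float))   # True
print(type(value) is float)       # False

# .item() returns the equivalent built-in Python scalar
as_builtin = value.item()
print(type(as_builtin) is float)  # True

# ndarray.tolist() also converts elements to built-in types
counts = np.array([333, 284], dtype=np.int64).tolist()
print(type(counts[0]) is int)     # True
```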



In [42]:
# Convert answer_dict values to correct data types
answer_dict["Q1"] = float(answer_dict["Q1"])

answer_dict["Q2"] = [
    [str(entry[0]), int(entry[1])]
    for entry in answer_dict["Q2"]
]

answer_dict["Q3"] = sorted([str(fuel_type) for fuel_type in answer_dict["Q3"]])

answer_dict["Q4"] = [
    [str(entry[0]), int(entry[1]), float(entry[2])]
    for entry in answer_dict["Q4"]
]

answer_dict["Q5"] = [
    [int(entry[0]), int(entry[1]), float(entry[2])]
    for entry in answer_dict["Q5"]
]

answer_dict["Q6"] = [
    entry
    for entry in answer_dict["Q6"]
]

answer_dict["Q7"] = [
    [(int(entry[0][0]), int(entry[0][1])), float(entry[1])]
    for entry in answer_dict["Q7"]
]


Test your answers


> We provide you some tests to make sure your answer dictionary is in the correct format using unittest.

> These tests are not meant to be comprehensive, you should review all your answers carefully.

# Test Your Answers

To ensure that your answer dictionary is in the correct format, we will run a series of tests using the `unittest` framework. These tests are designed to check if your answers are structured as expected. Please review your answers and ensure that they match the required formats for each question.

**Note:** Passing these tests does not guarantee the correctness of your answers, but it does verify that your answers are in the expected format.

Below are the tests that will be run:

1. Check if `answer_dict` is a dictionary.
2. Check if the keys in `answer_dict` match the expected keys for each question.
3. Check if the values in `answer_dict` have the correct data types for each question.
4. Check specific data types and shapes for each question's answer.

Please make sure to run the test cells below to validate your answers.


In [43]:
import unittest

class TestAnswers(unittest.TestCase):
    def test_if_dict(self):
        self.assertIsInstance(answer_dict, dict)

    def test_keys(self):
        self.assertEqual(list(answer_dict.keys()), ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7'])

    def test_answers_types(self):
        types_values = [type(k) for k in answer_dict.values()]
        answer_types = [float, list, list, list, list, list, list]
        self.assertEqual(types_values, answer_types)

    def test_Q1(self):
        self.assertEqual(type(answer_dict['Q1']), float)

    def test_Q2_dim(self):
        self.assertEqual(np.array(answer_dict['Q2']).shape, (5,2))

    def test_Q2_types(self):
        dtype1 = type(answer_dict['Q2'][0][0])
        dtype2 = type(answer_dict['Q2'][0][1])
        self.assertEqual([dtype1, dtype2], [str, int])

    def test_Q3_types(self):
        q3_types = set([type(item) for item in answer_dict['Q3']])
        self.assertEqual(q3_types, {str})

    def test_Q4_dim(self):
        self.assertEqual(np.array(answer_dict['Q4']).shape, (9,3))

    def test_Q4_types(self):
        dtype1 = type(answer_dict['Q4'][0][0])
        dtype2 = type(answer_dict['Q4'][0][1])
        dtype3 = type(answer_dict['Q4'][0][2])
        self.assertEqual([dtype1, dtype2, dtype3], [str, int, float])

    def test_Q5_dim(self):
        self.assertEqual(np.array(answer_dict['Q5']).shape, (19,3))

    def test_Q5_types(self):
        dtype1 = type(answer_dict['Q5'][0][0])
        dtype2 = type(answer_dict['Q5'][0][1])
        dtype3 = type(answer_dict['Q5'][0][2])
        self.assertEqual([dtype1, dtype2, dtype3], [int, int, float])

    def test_Q5_first_zero(self):
        self.assertEqual(answer_dict['Q5'][0][2], 0)


    def test_Q6_dim(self):
        self.assertEqual(np.array(answer_dict['Q6']).shape, (5,6))

    def test_Q5_types(self):
        dtype1 = type(answer_dict['Q6'][0][0])
        dtype2 = type(answer_dict['Q6'][0][1])
        dtype3 = type(answer_dict['Q6'][0][2])
        dtype4 = type(answer_dict['Q6'][0][3])
        dtype5 = type(answer_dict['Q6'][0][4])
        dtype6 = type(answer_dict['Q6'][0][5])
        self.assertEqual([dtype1, dtype2, dtype3, dtype4, dtype5, dtype6], [str, float, float, float, float, float])

    def test_Q6_check_first_and_last_brand(self):
        first_brand = answer_dict['Q6'][0][0]
        last_brand = answer_dict['Q6'][4][0]

        self.assertEqual([first_brand, last_brand], ["Toyota", "Honda"])

    def test_Q7_dim(self):
        self.assertEqual(np.array(answer_dict['Q7'], dtype=object).shape, (7,2))

    def test_Q7_types(self):
        dtype1 = type(answer_dict['Q7'][0][0])
        dtype2 = type(answer_dict['Q7'][0][1])
        self.assertEqual([dtype1, dtype2], [tuple, float])

unittest.main(argv=[''], verbosity=2, exit=False)

test_Q1 (__main__.TestAnswers) ... ok
test_Q2_dim (__main__.TestAnswers) ... ok
test_Q2_types (__main__.TestAnswers) ... ok
test_Q3_types (__main__.TestAnswers) ... ok
test_Q4_dim (__main__.TestAnswers) ... ok
test_Q4_types (__main__.TestAnswers) ... ok
test_Q5_dim (__main__.TestAnswers) ... FAIL
test_Q5_first_zero (__main__.TestAnswers) ... FAIL
test_Q5_types (__main__.TestAnswers) ... FAIL
test_Q6_check_first_and_last_brand (__main__.TestAnswers) ... ok
  self.assertEqual(np.array(answer_dict['Q6']).shape, (5,6))
ok
test_Q7_dim (__main__.TestAnswers) ... FAIL
test_Q7_types (__main__.TestAnswers) ... ok
test_answers_types (__main__.TestAnswers) ... ok
test_if_dict (__main__.TestAnswers) ... ok
test_keys (__main__.TestAnswers) ... ok

FAIL: test_Q5_dim (__main__.TestAnswers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-43-ea466fca2033>", line 40, in test_Q5_dim
    self.assertEqual(np.array(answer_dict[

<unittest.main.TestProgram at 0x7ba2f59ba200>
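
In the test class above, `test_Q5_types` is defined twice, and the second definition inspects `answer_dict['Q6']`. In Python, a later method definition with the same name replaces the earlier one in the class body, so only the second version runs under that name. A minimal sketch of this behavior, using a hypothetical `Demo` class:

```python
import unittest

class Demo(unittest.TestCase):  # hypothetical class, for illustration only
    def test_same_name(self):
        self.fail("never runs: replaced by the definition below")

    def test_same_name(self):  # same name, so this definition wins
        self.assertTrue(True)

# Only one test name is discovered
names = unittest.TestLoader().getTestCaseNames(Demo)
print(names)  # ['test_same_name']

# Running the class executes a single test, and it passes
suite = unittest.TestLoader().loadTestsFromTestCase(Demo)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())  # 1 True
```

Under this reading, the `test_Q5_types` entry in the log reflects the Q6 type check, and the original Q5 type check is not executed at all.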

Save your answers
> First, take a moment to evaluate your answers and make sure you have not missed anything
> Use the following code to save your answers in pickle format, change the filename using the following format:

FIRSTNAME_LASTNAME_answers.pkl
Example: Juan_Perez_answers.pkl
If you are using Google Colab, you can find your file in the left sidebar by clicking the folder icon inside the sample_data folder. Remember to upload the pickle file and the notebook to GitHub and submit their URLs through the Google form.

In [45]:
answer_dict

{'Q1': 392.74172108576107,
 'Q2': [['Mercedes-Benz', 333],
  ['BMW', 284],
  ['Chevrolet', 253],
  ['Ford', 185],
  ['GMC', 163]],
 'Q3': ['CNG',
  'Diesel',
  'Gasoline or E85',
  'Gasoline or natural gas',
  'Gasoline or propane',
  'Midgrade',
  'Premium',
  'Premium Gas or Electricity',
  'Premium and Electricity',
  'Premium or E85',
  'Regular',
  'Regular Gas and Electricity',
  'Regular Gas or Electricity'],
 'Q4': [['Cab/Chassis 2WD', 1993, 4.112255865424778],
  ['Cab/Chassis 2WD', 1992, 4.112255865424778],
  ['Cab/Chassis 2WD', 1991, 4.112255865424778],
  ['Cab/Chassis 2WD', 1990, 4.112255865424778],
  ['Cab/Chassis 2WD', 1989, 4.112255865424778],
  ['Cab/Chassis 2WD', 1993, 3.3791118637260777],
  ['Cab/Chassis 2WD', 1992, 3.3791118637260777],
  ['Land Cruiser Wagon 4WD', 1992, 3.3791118637260777],
  ['Cab/Chassis 2WD', 1991, 3.3791118637260777]],
 'Q5': [[1985, 25, -3.0],
  [1985, 26, 1.0],
  [1986, 26, 1.0],
  [1987, 25, 0.0],
  [1987, 26, 0.0],
  [1988, 25, -1.0],
  [1989,

In [46]:
import pickle

file_name = "EMMANUEL_ANYIRA_answers.pkl"
path = ""

with open(path+file_name, 'wb') as f:
    pickle.dump(answer_dict, f, protocol=pickle.HIGHEST_PROTOCOL)
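
Before uploading, it may be worth checking that the pickle file loads back to the same object. The following is a sketch only: the dictionary `answers` is a hypothetical stand-in for `answer_dict`, and a temporary directory is used so that no file is left behind.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for answer_dict
answers = {"Q1": 1.5, "Q2": [["Brand", 3]]}

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "FIRSTNAME_LASTNAME_answers.pkl")

    with open(file_path, "wb") as f:
        pickle.dump(answers, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Read the file back and compare with the original object
    with open(file_path, "rb") as f:
        restored = pickle.load(f)

print(restored == answers)  # True
```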