![ice cream flavors](images/difference-flavors-of-ice-cream.jpeg)

# Let's prepare a dataframe of ice cream reviews in Pandas.



We would like to analyze ice cream flavors so that we can tell retailers which flavors might be best to stock and show ice cream manufacturers how their flavors are being received.

Pandas offers tools to load, view, and process data.  Today we will be practicing:

1. Loading data from disk into working memory in the form of a dataframe
2. Viewing data in the dataframe
3. Exploring the number of rows and columns, column names and data types, and any missing data
4. Removing columns and rows with missing data
5. Joining two dataframes together

For this notebook you will be guiding your instructor through the steps of loading and exploring a dataset in Pandas.  

As necessary, please use Google to find out how to complete each step.

# Load Modules

First, we have to import the Pandas module.  We will give it the alias 'pd'.

In [1]:
import pandas as pd

# Load Data

There are 2 datasets in this repo.  The first is called 'products.csv' and contains descriptions and ratings of several ice cream flavors.  The second is called 'reviews.csv' and contains thousands of reviews of the flavors from 'products.csv'.  Load them into separate dataframes named 'products' and 'reviews'.

These data were downloaded from [Kaggle: Ice Cream Dataset](https://www.kaggle.com/tysonpo/ice-cream-dataset)

In [2]:
products = pd.read_csv("products.csv")

In [3]:
reviews = pd.read_csv("reviews.csv")

# Examine Data

Great!  Now let's take a look at these datasets.  

For **EACH** dataset, display: 

1. the first 5 rows, 
2. the last 5 rows, and 
3. 5 random rows.
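
The three displays above can be sketched as follows. This is a minimal example on a small made-up dataframe (`toy` and its column names are assumptions, not the real data); the same three methods apply to 'products' and 'reviews'.

```python
import pandas as pd

# Toy stand-in for the real data; the column names are hypothetical.
toy = pd.DataFrame({
    "key": list(range(10)),
    "stars": [5, 4, 3, 5, 1, 2, 4, 5, 3, 4],
})

first_rows = toy.head()                       # first 5 rows by default
last_rows = toy.tail()                        # last 5 rows by default
random_rows = toy.sample(5, random_state=0)   # 5 random rows; random_state only makes the draw repeatable

print(first_rows)
print(last_rows)
print(random_rows)
```

In a notebook, leaving the expression as the last line of a cell displays it without `print`.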

Create a new view (a window into a dataframe that does not copy the underlying data) into the products dataframe.  Show only the names of the ice cream flavors and the average rating of each flavor.  Display the first SEVEN rows.
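
One possible way to select just two columns is sketched below on a made-up dataframe. The column names 'name' and 'rating' are assumptions about 'products.csv'; check the real column names first. Whether pandas returns a true view or a copy for this kind of selection depends on the pandas version and settings, so the sketch only reads from the result.

```python
import pandas as pd

# Toy stand-in for 'products'; 'name' and 'rating' are assumed column names.
toy_products = pd.DataFrame({
    "key": [f"k{i}" for i in range(8)],
    "name": [f"flavor {i}" for i in range(8)],
    "rating": [4.5, 3.9, 4.1, 4.8, 3.2, 4.0, 4.7, 3.5],
})

# A list of column labels inside the brackets selects several columns at once.
flavor_ratings = toy_products[["name", "rating"]]

# head() accepts the number of rows to show.
print(flavor_ratings.head(7))
```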

# Explore Data

Now print the shape of each dataframe, that is, the number of rows and columns it contains.
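
As a small illustration on a made-up dataframe (`toy` is hypothetical), the `shape` attribute is a tuple of (rows, columns):

```python
import pandas as pd

# Toy stand-in with 3 rows and 2 columns.
toy = pd.DataFrame({"key": ["a", "b", "c"], "stars": [5, 4, 3]})

# shape is an attribute, not a method, so there are no parentheses.
n_rows, n_cols = toy.shape
print(toy.shape)  # (3, 2)
```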

Does either dataframe have missing values? If so, print the number of missing values in each column. This will require chaining 2 methods.
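
One common way to chain two methods for this is sketched below on a made-up dataframe (the column names are assumptions): `isna()` marks each missing cell as `True`, and `sum()` then counts the `True` values per column.

```python
import numpy as np
import pandas as pd

# Toy stand-in with some deliberately missing values.
toy = pd.DataFrame({
    "key": ["a", "b", "c", "d"],
    "stars": [5, np.nan, 3, 4],
    "text": [None, None, "good", "bad"],
})

# isna() gives a dataframe of booleans; sum() adds them up column by column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```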

Next, use 1 method for each dataframe to examine the names of the columns, data types, and number of non-null values.
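
A single method that reports all three is `DataFrame.info()`. A minimal sketch on a made-up dataframe (hypothetical column names):

```python
import numpy as np
import pandas as pd

# Toy stand-in with one missing value in 'stars'.
toy = pd.DataFrame({
    "key": ["a", "b", "c"],
    "stars": [5.0, np.nan, 3.0],
})

# info() prints the column names, the non-null count, and the dtype of each column.
toy.info()
```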

# Remove Null Values

If a column in either dataset is missing any data, we need to deal with that.  Our machine learning algorithms might throw an error if our datasets are missing data.

For each column with missing data:
1. if the column is missing >= 10% of the data, drop the **COLUMN**.
2. if the column is missing < 10% of the data, drop the **ROWS** that are missing data.

Verify that your dataset no longer contains missing values.
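
The 10% rule and the verification step could be sketched as follows. This is one possible approach on a made-up dataframe (the column names and values are assumptions), not the only way to do it.

```python
import numpy as np
import pandas as pd

# Toy stand-in with 20 rows: 'stars' is 5% missing, 'likes' is 50% missing.
n = 20
toy = pd.DataFrame({
    "key": [f"k{i}" for i in range(n)],
    "stars": [np.nan] + [5.0] * (n - 1),
    "likes": [np.nan] * 10 + ["creamy"] * 10,
})

# The mean of a boolean column is the fraction of True values,
# so this is the fraction of missing data per column.
missing_fraction = toy.isna().mean()

# Columns missing >= 10% of their data are dropped entirely.
cols_to_drop = missing_fraction[missing_fraction >= 0.10].index
cleaned = toy.drop(columns=cols_to_drop)

# The remaining columns are missing < 10%, so drop the affected rows instead.
cleaned = cleaned.dropna()

# Verify: the total count of missing values should now be zero.
print(cleaned.isna().sum())
```

Dropping the columns first matters: calling `dropna()` before removing a mostly empty column would throw away many rows unnecessarily.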

# Final Challenge: Join the Tables!

Both tables have a column labeled 'key'.  This column connects the tables: each flavor in 'products' is assigned a unique key, and each review in 'reviews' carries the key of the flavor it describes.

Our last step will be to join the two tables so that the information for each flavor in the 'products' table is combined with each review of that flavor in the 'reviews' table.

Name the resulting table 'icecream'.

Verify that 'icecream' now contains data from both 'reviews' and 'products' by displaying a random sample of 3 rows.  

Also, verify that 'icecream' has the same number of rows as 'reviews' and a number of columns equal to the sum of the number of columns from 'reviews' and 'products' minus 1.  (Why minus one?)
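
The join and both checks could look roughly like this. The sketch uses two tiny made-up tables (every column except 'key' is an assumption), and a left merge from the reviews side is one reasonable choice rather than the only correct one.

```python
import pandas as pd

# Toy stand-ins for the two tables; only 'key' is known to be shared.
toy_products = pd.DataFrame({
    "key": ["0_a", "1_a"],
    "name": ["Vanilla", "Chocolate"],
    "rating": [4.5, 4.1],
})
toy_reviews = pd.DataFrame({
    "key": ["0_a", "0_a", "1_a"],
    "stars": [5, 4, 3],
    "date": ["2020-01-02", "2019-05-06", "2020-03-04"],
})

# Keep every review and attach the matching product row to it.
# validate raises an error if a key appears more than once in the products table.
icecream = toy_reviews.merge(toy_products, on="key", how="left", validate="many_to_one")

print(icecream.sample(3, random_state=0))

# Same number of rows as the reviews table.
print(icecream.shape[0] == toy_reviews.shape[0])
# The shared 'key' column appears only once in the result.
print(icecream.shape[1] == toy_reviews.shape[1] + toy_products.shape[1] - 1)
```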

How are the data ordered? Let's order them by flavor and then by date.
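
Sorting by two columns can be sketched as below on a made-up table. The column names 'name' and 'date' are assumptions; substitute the real flavor and date columns.

```python
import pandas as pd

# Toy stand-in for the joined table.
toy = pd.DataFrame({
    "name": ["Vanilla", "Chocolate", "Vanilla"],
    "date": ["2020-01-02", "2020-03-04", "2019-05-06"],
})

# Parse the strings into datetimes so they sort chronologically.
toy["date"] = pd.to_datetime(toy["date"])

# Sort by flavor first, then by date within each flavor.
ordered = toy.sort_values(["name", "date"]).reset_index(drop=True)
print(ordered)
```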

# Congratulations!

You have:
1. loaded two tables into dataframes
2. viewed the beginning, end, and random samples of the tables
3. examined the shape, feature names, and data types in the tables
4. detected and removed missing values in two different ways
5. joined the two tables into one using a key

# Please take a moment to complete the survey below

# [Exit Ticket](https://docs.google.com/forms/d/e/1FAIpQLScVX-8y_vNLjaxFry_wWacl2a8NhvznAQvNkmiuXmxQ6b_wKg/viewform?usp=sf_link)