![ice cream flavors](images/difference-flavors-of-ice-cream.jpeg)

# Let's prepare a dataframe of ice cream reviews in Pandas.



We would like to do some analytics on ice cream flavors, both to inform retailers about which flavors might be best to stock and to show ice cream manufacturers how their flavors are being received.

Pandas offers tools to load, view, and process data.  Today we will be practicing:

1. Loading data from disk into working memory in the form of a dataframe
2. Viewing data in the dataframe
3. Exploring the number of rows and columns, column names and data types, and any missing data
4. Removing columns and rows with missing data
5. Joining two dataframes together

For this notebook you will be guiding your instructor through the steps of loading and exploring a dataset in Pandas.  

As necessary, please use Google to find out how to complete each step.

First, we have to import the Pandas module.  We will give it the alias 'pd'.

In [2]:
#__SOLUTION__

import pandas as pd

# Load Data

There are 2 datasets in this repo.  One is called 'products.csv' and contains descriptions and ratings of several ice cream flavors.  The second is called 'reviews.csv' which contains thousands of reviews of the flavors from 'products.csv'.  Load both into separate dataframes named 'products' and 'reviews'.  

These data were downloaded from [Kaggle: Ice Cream Dataset](https://www.kaggle.com/tysonpo/ice-cream-dataset)

In [3]:
#__SOLUTION__

products = pd.read_csv('products.csv')
reviews = pd.read_csv('reviews.csv')
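To experiment with loading before touching the real files, the same call can be pointed at an in-memory CSV. This is a minimal sketch: the two rows and the name `toy_products` are made up for illustration and are not taken from 'products.csv'.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for products.csv (illustrative rows only)
csv_text = "key,name,rating\n0_x,Vanilla,4.5\n1_x,Chocolate,3.9\n"

# read_csv accepts a file path or any file-like object
toy_products = pd.read_csv(io.StringIO(csv_text))

print(toy_products)
print(toy_products.dtypes)  # rating is inferred as a float column
```

Pandas infers each column's data type from the file contents, which is why it is worth inspecting the result after loading.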

# Examine Data

Great!  Now let's take a look at these datasets.  

For each dataset, display: 

1. the first 5 rows, 
2. the last 5 rows, and 
3. 5 random rows.

In [4]:
#__SOLUTION__

products.head(5)

Unnamed: 0,key,name,subhead,description,rating,rating_count,ingredients
0,0_bj,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
1,1_bj,Netflix & Chilll'd™,Peanut Butter Ice Cream with Sweet & Salty Pre...,There’s something for everyone to watch on Net...,4.0,127,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
2,2_bj,Chip Happens,A Cold Mess of Chocolate Ice Cream with Fudge ...,Sometimes “chip” happens and everything’s a me...,4.7,130,"CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,..."
3,3_bj,Cannoli,Mascarpone Ice Cream with Fudge-Covered Pastry...,As a Limited Batch that captured the rapture o...,3.6,70,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
4,4_bj,Gimme S’more!™,Toasted Marshmallow Ice Cream with Chocolate C...,It’s a gimme: there’s always room for s’more. ...,4.5,281,"CREAM, SKIM MILK, WATER, LIQUID SUGAR (SUGAR, ..."


In [5]:
#__SOLUTION__

products.tail(5)

Unnamed: 0,key,name,subhead,description,rating,rating_count,ingredients
52,52_bj,"Wake & "" No Bake "" Cookie Dough Core","Vanilla Ice Cream with Peanut Butter Cookies, ...",Now that we’ve core-fully captured the No Bake...,3.3,68,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
53,53_bj,Brownie Batter Core,Chocolate & Vanilla Ice Creams with Fudge Brow...,Spooning your way to brownie nirvana? Smack da...,3.5,108,"CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,..."
54,54_bj,Cookies & Cream Cheesecake Core,Chocolate & Cheesecake Ice Creams with Chocola...,What's it called when you find yourself spoon-...,4.4,57,"CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,..."
55,55_bj,Karamel Sutra® Core,Chocolate & Caramel Ice Creams with Fudge Chip...,Find your way to the ultimate ice cream experi...,4.2,94,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
56,56_bj,Peanut Butter Fudge Core,Chocolate & Peanut Butter Ice Creams with Mini...,Find your way to the ultimate ice cream experi...,4.7,19,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."


In [6]:
#__SOLUTION__

products.sample(5)

Unnamed: 0,key,name,subhead,description,rating,rating_count,ingredients
48,48_bj,Vanilla Caramel Fudge,Vanilla Ice Cream with Swirls of Caramel & Fudge,If you’ve been looking for reasons to try some...,4.8,29,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
49,49_bj,Boom Chocolatta™ Cookie Core,Mocha & Caramel Ice Creams with Chocolate Cook...,As you slam dunk your spoon through creamy moc...,4.6,97,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
41,41_bj,S'mores,"Chocolate Ice Cream with Fudge Chunks, Toasted...",Remember when cookouts & campfires kindled you...,4.1,151,"CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,..."
33,33_bj,New York Super Fudge Chunk®,Chocolate Ice Cream with White & Dark Fudge Ch...,"In 1985, to make a name for ourselves in New Y...",4.9,63,"CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,..."
22,22_bj,Cinnamon Buns®,Caramel Ice Cream with Cinnamon Bun Dough & a ...,Our cool salute to cinnamon buns is so cinnamo...,4.6,83,"CREAM, SKIM MILK, WATER, LIQUID SUGAR (SUGAR, ..."


In [7]:
#__SOLUTION__

reviews.head(5)

Unnamed: 0,key,author,date,stars,title,helpful_yes,helpful_no,text
0,0_bj,Ilovebennjerry,2017-04-15,3,Not enough brownies!,10,3,"Super good, don't get me wrong. But I came for..."
1,0_bj,Sweettooth909,2020-01-05,5,I’m OBSESSED with this pint!,3,0,I decided to try it out although I’m not a hug...
2,0_bj,LaTanga71,2018-04-26,3,My favorite...More Caramel Please,5,2,My caramel core begins to disappear about half...
3,0_bj,chicago220,2018-01-14,5,Obsessed!!!,24,1,Why are people complaining about the blonde br...
4,0_bj,Kassidyk,2020-07-24,1,Worst Ice Cream Ever!,1,5,This ice cream is worst ice cream I’ve ever ta...


In [8]:
#__SOLUTION__

reviews.tail(5)

Unnamed: 0,key,author,date,stars,title,helpful_yes,helpful_no,text
7938,56_bj,Shellyshellzs,2020-04-30,5,Peanut butter fudge heaven,0,0,Oh man I use to be a whatever was on sale girl...
7939,56_bj,Or1234,2020-02-24,5,The best Chocolate Ice Cream Combo,0,0,This is the first chocolate ice cream I’ve tri...
7940,56_bj,ava21,2020-01-31,5,PERFECT!!,0,0,This is the best pint of ice cream I've ever h...
7941,56_bj,yeee,2019-03-13,5,My favorite!,0,0,This is my favorite ice cream ever! Can't get ...
7942,56_bj,Ellehcar,2020-07-24,5,The Best,0,0,This is my favorite flavor...I can just buy on...


In [9]:
#__SOLUTION__

reviews.sample(5)

Unnamed: 0,key,author,date,stars,title,helpful_yes,helpful_no,text
7598,52_bj,Mariah95,2019-05-08,5,Favorite cookie as an ice cream,4,0,When I saw this I was so excited because I’ve ...
7407,50_bj,icecream456,2019-07-14,2,the core is terrible.,0,0,i tried this thinking that the cookie dough wo...
7750,53_bj,tinaaa,2017-07-14,3,Good but missing most of its core,0,0,Probably one of my favorite flavors but not ev...
7327,50_bj,Mns1275,2019-03-28,1,Save your money for a better pint,5,0,I was soooooo excited to try this!! Then I did...
165,0_bj,Megcrum,2020-05-11,5,Favorite one!!,0,0,"I try new flavors, but always get this one too..."
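Because `sample` draws rows at random, each run of the cell shows different rows. If repeatable output is wanted (for example, so everyone in class sees the same rows), one option is to pass a fixed `random_state`. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data standing in for the reviews table
toy = pd.DataFrame({"stars": [5, 3, 1, 4, 2, 5, 4, 3]})

# A fixed random_state makes the "random" rows repeatable
first_draw = toy.sample(3, random_state=0)
second_draw = toy.sample(3, random_state=0)

print(first_draw.equals(second_draw))  # True
```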


Create a narrower view of the products dataframe that shows only the names of the ice cream flavors and the average rating of each flavor.  Display the first SEVEN rows.

In [10]:
#__SOLUTION__

products[['name', 'rating']].head(7)

Unnamed: 0,name,rating
0,Salted Caramel Core,3.7
1,Netflix & Chilll'd™,4.0
2,Chip Happens,4.7
3,Cannoli,3.6
4,Gimme S’more!™,4.5
5,Peanut Butter Half Baked®,4.9
6,Berry Sweet Mascarpone,4.6
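A note on selecting columns: passing a list of column names returns a new DataFrame, and pandas does not promise that it shares memory with the original. If you intend to modify the selection, calling `.copy()` makes that independence explicit. A minimal sketch on made-up data (the names `toy_products` and `subset` are illustrative):

```python
import pandas as pd

# Hypothetical toy data standing in for the products table
toy_products = pd.DataFrame({
    "name": ["Vanilla", "Chocolate"],
    "rating": [4.5, 3.9],
    "rating_count": [10, 20],
})

# Select two columns; .copy() makes the independence explicit
subset = toy_products[["name", "rating"]].copy()
subset.loc[0, "rating"] = 0.0

# The original dataframe is untouched by the edit to the subset
print(toy_products.loc[0, "rating"])  # 4.5
```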


# Explore Data

Now print the shape of each dataframe: the number of rows and columns it contains.

In [11]:
#__SOLUTION__

print('Shape of Products: ', products.shape)
print('Shape of Reviews: ', reviews.shape)

Shape of Products:  (57, 7)
Shape of Reviews:  (7943, 8)


Next, use a single method on each dataframe to examine the column names, data types, and number of non-null values.

In [12]:
#__SOLUTION__

products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   key           57 non-null     object 
 1   name          57 non-null     object 
 2   subhead       57 non-null     object 
 3   description   57 non-null     object 
 4   rating        57 non-null     float64
 5   rating_count  57 non-null     int64  
 6   ingredients   57 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 3.2+ KB


In [13]:
#__SOLUTION__

reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7943 entries, 0 to 7942
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   key          7943 non-null   object
 1   author       7659 non-null   object
 2   date         7943 non-null   object
 3   stars        7943 non-null   int64 
 4   title        5332 non-null   object
 5   helpful_yes  7943 non-null   int64 
 6   helpful_no   7943 non-null   int64 
 7   text         7943 non-null   object
dtypes: int64(3), object(5)
memory usage: 496.6+ KB


Does either dataframe have missing values?  If so, print the number of missing values in each column.  This will require chaining 2 methods.

In [14]:
#__SOLUTION__

reviews.isna().sum()

key               0
author          284
date              0
stars             0
title          2611
helpful_yes       0
helpful_no        0
text              0
dtype: int64
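The counts above become easier to compare against a cutoff when expressed as fractions. In the output above, 'author' is missing 284 of 7943 values (about 3.6%) and 'title' is missing 2611 of 7943 (about 32.9%). One way to get fractions directly is to chain `isna()` with `mean()` instead of `sum()`; the sketch below uses made-up data:

```python
import pandas as pd

# Hypothetical toy data standing in for the reviews table
toy_reviews = pd.DataFrame({
    "author": ["a", None, "c", "d"],
    "title": [None, None, "Great", None],
    "stars": [5, 4, 3, 2],
})

# isna() gives True/False per cell; the mean of booleans is the fraction of True
missing_fraction = toy_reviews.isna().mean()
print(missing_fraction)
```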

# Data Cleaning

If a column in either dataset is missing any data, we need to deal with that.  Our machine learning algorithms might throw an error if our datasets are missing data.

For each column with missing data:
1. if the column is missing >= 10% of the data, drop the **COLUMN**.
2. if the column is missing < 10% of the data, drop the **ROWS** that are missing data.

In [15]:
#__SOLUTION__

#First drop columns with too much missing data:
#thresh is the minimum number of non-null values (90% of rows) a column needs to be kept
reviews = reviews.dropna(axis=1, thresh=len(reviews)*.9)

#Then drop any rows that are still missing data
reviews = reviews.dropna(axis=0)
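The `thresh` argument can be confusing because it counts values that are present rather than values that are missing. The sketch below walks through the same two-step cleanup on a made-up 20-row dataframe; rounding the threshold up to a whole number with `math.ceil` is a choice made here for clarity, not something the solution above requires.

```python
import math
import pandas as pd

# Hypothetical toy data: 'author' is missing 5% of rows, 'title' 40%
n = 20
toy = pd.DataFrame({
    "author": ["a"] * 19 + [None],
    "title": [None] * 8 + ["t"] * 12,
    "stars": [5] * n,
})

# A column is kept only if it has at least this many non-null values
min_non_null = math.ceil(0.9 * len(toy))  # 18

# Step 1 drops 'title' (12 non-null); step 2 drops the one row missing 'author'
cleaned = toy.dropna(axis=1, thresh=min_non_null).dropna(axis=0)

print(cleaned.shape)  # (19, 2)
```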


Verify that your dataset no longer contains missing values.

In [16]:
#__SOLUTION__

reviews.isna().sum()

key            0
author         0
date           0
stars          0
helpful_yes    0
helpful_no     0
text           0
dtype: int64

The icecream manufacturers would like to see 

# Final Challenge: Join the Tables!

Both tables have a column labeled 'key'.  Each flavor in 'products' has a unique key, and each review in 'reviews' carries the key of the flavor it describes, so this column connects the two tables.

Our last step will be to join the two tables so that the information for each flavor in the 'products' table is attached to each of its reviews in the 'reviews' table.

Name the resulting table 'icecream'.

In [17]:
#__SOLUTION__

icecream = pd.merge(reviews, products, on='key')

Verify that 'icecream' now contains data from both 'reviews' and 'products' by displaying the first 7 rows.  

Also, verify that 'icecream' has the same number of rows as 'reviews' and a number of columns equal to the sum of the number of columns from 'reviews' and 'products' minus 1.  (Why minus one?)

In [23]:
#__SOLUTION__

icecream.head(7)

Unnamed: 0,key,author,date,stars,helpful_yes,helpful_no,text,name,subhead,description,rating,rating_count,ingredients
0,0_bj,Ilovebennjerry,2017-04-15,3,10,3,"Super good, don't get me wrong. But I came for...",Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
1,0_bj,Sweettooth909,2020-01-05,5,3,0,I decided to try it out although I’m not a hug...,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
2,0_bj,LaTanga71,2018-04-26,3,5,2,My caramel core begins to disappear about half...,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
3,0_bj,chicago220,2018-01-14,5,24,1,Why are people complaining about the blonde br...,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
4,0_bj,Kassidyk,2020-07-24,1,1,5,This ice cream is worst ice cream I’ve ever ta...,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
5,0_bj,Nikiera,2020-07-23,2,3,1,I bought this last night to go with Louisiana ...,Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
6,0_bj,Mmelvin,2017-05-28,3,3,3,"This is definitely my favorite flavor, but rec...",Salted Caramel Core,Sweet Cream Ice Cream with Blonde Brownies & a...,Find your way to the ultimate ice cream experi...,3.7,208,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."


In [19]:
#__SOLUTION__

icecream.shape

(7659, 13)
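The shape checks out: 7659 rows is the 7943 original reviews minus the 284 rows dropped for a missing author, and 13 columns is 7 + 7 - 1, because the shared 'key' column appears only once in the result. The same arithmetic can be seen on made-up data. The `validate="many_to_one"` argument is an optional safeguard added in this sketch; it makes pandas raise an error if a key is duplicated in the right-hand table, which would otherwise silently multiply rows.

```python
import pandas as pd

# Hypothetical toy tables standing in for products and reviews
toy_products = pd.DataFrame({
    "key": ["0_x", "1_x"],
    "name": ["Vanilla", "Chocolate"],
    "rating": [4.5, 3.9],
})
toy_reviews = pd.DataFrame({
    "key": ["0_x", "0_x", "1_x"],
    "stars": [5, 4, 2],
    "text": ["Yum", "Nice", "Meh"],
})

# Many reviews map to one product; validate checks that assumption
toy_icecream = pd.merge(toy_reviews, toy_products, on="key", validate="many_to_one")

# The join column is shared, so it is counted once
expected_cols = toy_reviews.shape[1] + toy_products.shape[1] - 1

print(toy_icecream.shape)  # (3, 5)
print(expected_cols)       # 5
```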

How are the data ordered?  Let's order them by flavor and then by date.

In [24]:
icecream = icecream.sort_values(by=['name','date'])
icecream.head()

Unnamed: 0,key,author,date,stars,helpful_yes,helpful_no,text,name,subhead,description,rating,rating_count,ingredients
992,10_bj,Flavor Reviewer,2017-04-12,5,5,3,Excellent! This flavor has all sorts of things...,Americone Dream®,Vanilla Ice Cream with Fudge-Covered Waffle Co...,"Founded in fudge-covered waffle cones, this ca...",4.7,370,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
980,10_bj,Americone Dream Lover,2017-04-27,5,15,3,"I eat a pint when I am sad, I eat a pint when ...",Americone Dream®,Vanilla Ice Cream with Fudge-Covered Waffle Co...,"Founded in fudge-covered waffle cones, this ca...",4.7,370,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
1156,10_bj,KellyCrayon,2017-05-13,5,0,1,My absolute favorite of all of Ben and Jerry's...,Americone Dream®,Vanilla Ice Cream with Fudge-Covered Waffle Co...,"Founded in fudge-covered waffle cones, this ca...",4.7,370,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
1006,10_bj,Ana808,2017-06-13,3,1,1,I decided to go with something out of the blue...,Americone Dream®,Vanilla Ice Cream with Fudge-Covered Waffle Co...,"Founded in fudge-covered waffle cones, this ca...",4.7,370,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
1027,10_bj,Jennibobunni,2017-06-14,5,0,0,My favorite ice cream in this world! For a lon...,Americone Dream®,Vanilla Ice Cream with Fudge-Covered Waffle Co...,"Founded in fudge-covered waffle cones, this ca...",4.7,370,"CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),..."
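Note that `info()` reported the 'date' column as `object`, meaning the dates are stored as strings. Strings in YYYY-MM-DD form happen to sort in chronological order, so the sort above works, but converting with `pd.to_datetime` is a safer habit when date formats may vary. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data standing in for the joined table
toy = pd.DataFrame({
    "name": ["Vanilla", "Chocolate", "Vanilla", "Chocolate"],
    "date": ["2020-01-05", "2019-03-13", "2017-04-15", "2018-04-26"],
})

# Convert strings to real datetimes before sorting
toy["date"] = pd.to_datetime(toy["date"])

# Sort by flavor first, then by date within each flavor
ordered = toy.sort_values(by=["name", "date"]).reset_index(drop=True)
print(ordered)
```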


Lastly, save your new dataframe as a new CSV file called 'icecream.csv' with `index=False`.

In [25]:
#__SOLUTION__

icecream.to_csv('icecream.csv', index=False)
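Why `index=False`? Without it, pandas writes the row index as an extra unnamed column, and reading the file back produces a column called 'Unnamed: 0'. The sketch below shows the difference on made-up data, writing to an in-memory string instead of a file:

```python
import io
import pandas as pd

# Hypothetical toy data standing in for the icecream table
toy = pd.DataFrame({"key": ["0_x", "1_x"], "stars": [5, 2]})

# to_csv with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'key', 'stars']
print(list(without_index.columns))  # ['key', 'stars']
```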

# Congratulations!

You have:
1. loaded two tables into dataframes
2. viewed the beginning, end, and random samples of the tables
3. examined the shape, feature names, and data types in the tables
4. detected and removed missing values in two different ways
5. joined the two tables into one using a key

# [Student Exit Ticket](https://docs.google.com/forms/d/1q5DiWQPlEMlcxSzJeNnvuw0T2dj9FYfLoif6r4Yl3dM/edit?usp=sharing)