![ice cream flavors](images/difference-flavors-of-ice-cream.jpeg)

# Let's prepare a dataframe of ice cream reviews in Pandas.



We would like to analyze ice cream flavors in order to tell retailers which flavors might be best to stock and to show ice cream manufacturers how their flavors are being received.

Pandas offers tools to load, view, and process data.  Today we will be practicing:

1. Loading data from disk into working memory in the form of a dataframe
2. Viewing data in the dataframe
3. Exploring the number of rows and columns, column names and data types, and any missing data
4. Removing columns and rows with missing data
5. Joining two dataframes together

For this notebook you will be guiding your instructor through the steps of loading and exploring a dataset in Pandas.  

As necessary, please use Google to find out how to complete each step.

# Load Modules

First, we have to import the Pandas module.  We will give it the alias 'pd'.

In [188]:
import pandas as pd

# Load Data

There are 2 datasets in this repo.  One is called 'products.csv' and contains descriptions and ratings of several ice cream flavors.  The second is called 'reviews.csv' which contains thousands of reviews of the flavors from 'products.csv'.  Load both into separate dataframes named 'products' and 'reviews'.  

These data were downloaded from [Kaggle: Ice Cream Dataset](https://www.kaggle.com/tysonpo/ice-cream-dataset)

In [189]:
products = pd.read_csv('products.csv')
reviews = pd.read_csv('reviews.csv')

In [190]:
type(products)

pandas.core.frame.DataFrame
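The loading step can also be tried without the files on disk. The sketch below is only an illustration: the CSV text is a hypothetical two-row stand-in, not the real contents of 'products.csv', and `toy_products` is a made-up name. It relies on the fact that `pd.read_csv` accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical stand-in for products.csv (NOT the real file contents)
csv_text = (
    "key,name,rating\n"
    "0_bj,Salted Caramel Core,3.7\n"
    "2_bj,Chip Happens,4.7\n"
)

# read_csv accepts a path or any file-like object, such as StringIO
toy_products = pd.read_csv(io.StringIO(csv_text))

print(type(toy_products))   # the same DataFrame class as above
print(toy_products.dtypes)  # column types are inferred while parsing
```

Pandas infers a type for each column while parsing, which is why 'rating' arrives as a float without any extra work.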

# Examine Data

Great!  Now let's take a look at these datasets.  

For **EACH** dataset, display: 

1. the first 5 rows, 
2. the last 5 rows, and 
3. 5 random rows.

In [191]:
products.head(5)

|  | key | name | subhead | description | rating | rating_count | ingredients |
|---|---|---|---|---|---|---|---|
| 0 | 0_bj | Salted Caramel Core | Sweet Cream Ice Cream with Blonde Brownies & a... | Find your way to the ultimate ice cream experi... | 3.7 | 208 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 1 | 1_bj | Netflix & Chilll'd™ | Peanut Butter Ice Cream with Sweet & Salty Pre... | There’s something for everyone to watch on Net... | 4.0 | 127 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 2 | 2_bj | Chip Happens | A Cold Mess of Chocolate Ice Cream with Fudge ... | Sometimes “chip” happens and everything’s a me... | 4.7 | 130 | CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,... |
| 3 | 3_bj | Cannoli | Mascarpone Ice Cream with Fudge-Covered Pastry... | As a Limited Batch that captured the rapture o... | 3.6 | 70 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 4 | 4_bj | Gimme S’more!™ | Toasted Marshmallow Ice Cream with Chocolate C... | It’s a gimme: there’s always room for s’more. ... | 4.5 | 281 | CREAM, SKIM MILK, WATER, LIQUID SUGAR (SUGAR, ... |


In [192]:
reviews.head()

|  | key | author | date | stars | title | helpful_yes | helpful_no | text |
|---|---|---|---|---|---|---|---|---|
| 0 | 0_bj | Ilovebennjerry | 2017-04-15 | 3 | Not enough brownies! | 10 | 3 | Super good, don't get me wrong. But I came for... |
| 1 | 0_bj | Sweettooth909 | 2020-01-05 | 5 | I’m OBSESSED with this pint! | 3 | 0 | I decided to try it out although I’m not a hug... |
| 2 | 0_bj | LaTanga71 | 2018-04-26 | 3 | My favorite...More Caramel Please | 5 | 2 | My caramel core begins to disappear about half... |
| 3 | 0_bj | chicago220 | 2018-01-14 | 5 | Obsessed!!! | 24 | 1 | Why are people complaining about the blonde br... |
| 4 | 0_bj | Kassidyk | 2020-07-24 | 1 | Worst Ice Cream Ever! | 1 | 5 | This ice cream is worst ice cream I’ve ever ta... |


In [193]:
products.tail(5)

|  | key | name | subhead | description | rating | rating_count | ingredients |
|---|---|---|---|---|---|---|---|
| 52 | 52_bj | Wake & " No Bake " Cookie Dough Core | Vanilla Ice Cream with Peanut Butter Cookies, ... | Now that we’ve core-fully captured the No Bake... | 3.3 | 68 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 53 | 53_bj | Brownie Batter Core | Chocolate & Vanilla Ice Creams with Fudge Brow... | Spooning your way to brownie nirvana? Smack da... | 3.5 | 108 | CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,... |
| 54 | 54_bj | Cookies & Cream Cheesecake Core | Chocolate & Cheesecake Ice Creams with Chocola... | What's it called when you find yourself spoon-... | 4.4 | 57 | CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,... |
| 55 | 55_bj | Karamel Sutra® Core | Chocolate & Caramel Ice Creams with Fudge Chip... | Find your way to the ultimate ice cream experi... | 4.2 | 94 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 56 | 56_bj | Peanut Butter Fudge Core | Chocolate & Peanut Butter Ice Creams with Mini... | Find your way to the ultimate ice cream experi... | 4.7 | 19 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |


In [194]:
reviews.tail()

|  | key | author | date | stars | title | helpful_yes | helpful_no | text |
|---|---|---|---|---|---|---|---|---|
| 7938 | 56_bj | Shellyshellzs | 2020-04-30 | 5 | Peanut butter fudge heaven | 0 | 0 | Oh man I use to be a whatever was on sale girl... |
| 7939 | 56_bj | Or1234 | 2020-02-24 | 5 | The best Chocolate Ice Cream Combo | 0 | 0 | This is the first chocolate ice cream I’ve tri... |
| 7940 | 56_bj | ava21 | 2020-01-31 | 5 | PERFECT!! | 0 | 0 | This is the best pint of ice cream I've ever h... |
| 7941 | 56_bj | yeee | 2019-03-13 | 5 | My favorite! | 0 | 0 | This is my favorite ice cream ever! Can't get ... |
| 7942 | 56_bj | Ellehcar | 2020-07-24 | 5 | The Best | 0 | 0 | This is my favorite flavor...I can just buy on... |


In [195]:
products.sample(5)

|  | key | name | subhead | description | rating | rating_count | ingredients |
|---|---|---|---|---|---|---|---|
| 28 | 28_bj | Half Baked® | Chocolate & Vanilla Ice Creams mixed with Gobs... | Ben & Jerry’s is proud to partner with fellow ... | 4.7 | 887 | CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,... |
| 16 | 16_bj | Chocolate Chip Cookie Dough | Vanilla Ice Cream with Gobs of Chocolate Chip ... | We knew we were onto something big when we mad... | 4.6 | 983 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 35 | 35_bj | Peanut Butter Cup | Peanut Butter Ice Cream with Peanut Butter Cups | We interrupt our regularly scheduled programmi... | 4.4 | 41 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 14 | 14_bj | Cherry Garcia® | Cherry Ice Cream with Cherries & Fudge Flakes | Our euphorically edible tribute to guitarist J... | 4.4 | 153 | CREAM, SKIM MILK, LIQUID SUGAR (SUGAR, WATER),... |
| 32 | 32_bj | Minter Wonderland™ | Dark Chocolate Mint Ice Cream with Marshmallow... | Imagine a wild ride through the wintriest, cho... | 4.7 | 32 | CREAM, LIQUID SUGAR (SUGAR, WATER), SKIM MILK,... |


In [196]:
reviews.sample(5)

|  | key | author | date | stars | title | helpful_yes | helpful_no | text |
|---|---|---|---|---|---|---|---|---|
| 6855 | 44_bj | monetb1 | 2020-07-28 | 5 |  | 0 | 0 | Best cookie dough ever! I cant say more its de... |
| 3561 | 23_bj | jonpeg | 2017-07-17 | 5 | Love this flavor! | 1 | 0 | We love this flavor, it has good, strong coffe... |
| 345 | 2_bj | JayIceCream | 2020-05-13 | 3 | Bring back Late Night Snack! | 5 | 3 | Can I express how excited I was to see this on... |
| 6247 | 43_bj | diamondh2535 | 2020-07-11 | 5 |  | 0 | 0 | My favorite ice cream! No one in my house is a... |
| 5880 | 39_bj | IceCreamy | 2020-02-11 | 5 | New favorite | 0 | 0 | I usually don't like anything g cheesecake fla... |
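Because `sample` draws rows at random, the rows shown above will differ from run to run. A minimal sketch of the three viewing methods, using a small made-up dataframe (`toy` and its values are hypothetical), shows how the `random_state` argument makes a sample repeatable:

```python
import pandas as pd

# Small made-up dataframe standing in for the reviews table
toy = pd.DataFrame({
    "key": [f"{i}_bj" for i in range(10)],
    "stars": [5, 4, 3, 5, 1, 2, 5, 4, 3, 5],
})

first = toy.head(3)  # rows 0, 1, 2
last = toy.tail(3)   # rows 7, 8, 9

# Fixing random_state seeds the draw, so both samples contain the same rows
random_a = toy.sample(3, random_state=0)
random_b = toy.sample(3, random_state=0)

print(random_a.equals(random_b))
```

Leaving `random_state` out is fine for a quick look; setting it helps when someone else needs to see the same rows.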


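Create a narrower selection of the products dataframe that shows only the names of the ice cream flavors and the average rating of each flavor.  (Selecting a list of columns generally returns a new dataframe rather than a true view of the original.)  Display the first seven rows; the cell below uses a positional slice and shows rows 20 through 29 instead.

In [197]:
cols = ['name', 'rating']  # the two columns to keep
products[cols][20:30]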

|  | name | rating |
|---|---|---|
| 20 | Chubby Hubby® | 4.3 |
| 21 | Chunky Monkey® | 4.4 |
| 22 | Cinnamon Buns® | 4.6 |
| 23 | Coffee Coffee BuzzBuzzBuzz!® | 4.9 |
| 24 | Coffee Toffee Bar Crunch | 2.9 |
| 25 | Cold Brew Caramel Latte | 4.6 |
| 26 | Everything But The...® | 3.8 |
| 27 | Glampfire Trail Mix™ | 4.7 |
| 28 | Half Baked® | 4.7 |
| 29 | Ice Cream Sammie | 5.0 |
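Whether a selection shares memory with the original dataframe depends on how it was made and on the pandas version, so it is safest not to rely on it. The sketch below uses made-up data (`toy_products` is hypothetical) and takes an explicit `.copy()`, which guarantees that editing the subset leaves the original untouched:

```python
import pandas as pd

# Made-up stand-in for the products table
toy_products = pd.DataFrame({
    "key": ["0_bj", "1_bj", "2_bj"],
    "name": ["Salted Caramel Core", "Cannoli", "Chip Happens"],
    "rating": [3.7, 3.6, 4.7],
})

cols = ["name", "rating"]

# An explicit copy is independent of the original dataframe
subset = toy_products[cols].copy()
subset.loc[0, "rating"] = 0.0

print(toy_products.loc[0, "rating"])  # the original value is unchanged

# head(n) returns the first n rows of the selection
first_two = toy_products[cols].head(2)
print(first_two)
```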


# Explore Data

Now print the shape of each dataframe, that is, the number of rows and columns it contains.

In [198]:
print(f'shape of products: {products.shape}')
print('shape of reviews: ', reviews.shape)

shape of products: (57, 7)
shape of reviews:  (7943, 8)
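Since `shape` is a plain `(rows, columns)` tuple, it can be unpacked into two names. A small sketch with a made-up dataframe (`toy` is hypothetical):

```python
import pandas as pd

# Made-up dataframe with 4 rows and 2 columns
toy = pd.DataFrame({"key": ["0_bj", "1_bj", "2_bj", "3_bj"], "stars": [5, 3, 4, 1]})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = toy.shape

print(f"rows: {n_rows}, columns: {n_cols}")
print(len(toy) == n_rows)  # len() of a dataframe counts rows
```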


Next, use a single method on each dataframe to examine the names of the columns, their data types, and the number of non-null values.

In [199]:
products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   key           57 non-null     object 
 1   name          57 non-null     object 
 2   subhead       57 non-null     object 
 3   description   57 non-null     object 
 4   rating        57 non-null     float64
 5   rating_count  57 non-null     int64  
 6   ingredients   57 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 3.2+ KB


In [200]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7943 entries, 0 to 7942
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   key          7943 non-null   object
 1   author       7659 non-null   object
 2   date         7943 non-null   object
 3   stars        7943 non-null   int64 
 4   title        5332 non-null   object
 5   helpful_yes  7943 non-null   int64 
 6   helpful_no   7943 non-null   int64 
 7   text         7943 non-null   object
dtypes: int64(3), object(5)
memory usage: 496.6+ KB


Does either dataframe have missing values?  If so, print the number of missing values in each column.  This will require chaining 2 methods.

In [201]:
reviews.isnull()

|  | key | author | date | stars | title | helpful_yes | helpful_no | text |
|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7938 | False | False | False | False | False | False | False | False |
| 7939 | False | False | False | False | False | False | False | False |
| 7940 | False | False | False | False | False | False | False | False |
| 7941 | False | False | False | False | False | False | False | False |


In [202]:
reviews.isnull().sum()

key               0
author          284
date              0
stars             0
title          2611
helpful_yes       0
helpful_no        0
text              0
dtype: int64
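Counts are easier to judge as a share of the table. Because `isnull()` returns booleans, chaining `.mean()` instead of `.sum()` gives the fraction of missing values per column. A minimal sketch on made-up data (`toy_reviews` is hypothetical):

```python
import pandas as pd

# Made-up reviews with some missing authors and titles
toy_reviews = pd.DataFrame({
    "author": ["a", None, "c", "d"],
    "title": [None, None, "Great", None],
    "stars": [5, 4, 3, 1],
})

missing_counts = toy_reviews.isnull().sum()     # number of missing values per column
missing_fraction = toy_reviews.isnull().mean()  # share of missing values per column

print(missing_counts)
print(missing_fraction)
```

In the real reviews table, 284 missing authors out of 7943 rows is under 4%, while 2611 missing titles is about a third of the rows.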

# Remove Null Values

If a column from one of the datasets was missing any data, we need to deal with that.  Our machine learning algorithms might throw an error if our datasets are missing data.

For each column with missing data:
1. if the column is missing >= 10% of the data, drop the **COLUMN**.
2. if the column is missing < 10% of the data, drop the **ROWS** that are missing data.

In [203]:
# for col in reviews.columns:
#     if reviews.isnull().sum()[col] >= len(reviews[col]) * .1:
#         print(col)
#         reviews.drop(col, axis=1, inplace=True)

In [204]:
print(reviews.shape)

# Keep only the columns with at least 90% non-null values
reviews = reviews.dropna(axis=1, thresh=len(reviews) * .9)

print(reviews.shape)

(7943, 8)
(7943, 7)


In [205]:
reviews.isna().sum()

key              0
author         284
date             0
stars            0
helpful_yes      0
helpful_no       0
text             0
dtype: int64

In [206]:
reviews = reviews.dropna()

Verify that your dataset no longer contains missing values.

In [208]:
print(reviews.shape)
reviews.isna().sum()

(7659, 7)


key            0
author         0
date           0
stars          0
helpful_yes    0
helpful_no     0
text           0
dtype: int64
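The two-step rule from this section can also be wrapped in one small function. This is only a sketch under stated assumptions: `drop_missing` and its `max_missing` parameter are hypothetical names, and the data is made up so the result is easy to check by hand.

```python
import pandas as pd


def drop_missing(df, max_missing=0.10):
    """Drop columns missing >= max_missing of their values, then drop rows with any gaps left."""
    missing_fraction = df.isnull().mean()
    sparse_cols = missing_fraction[missing_fraction >= max_missing].index
    return df.drop(columns=sparse_cols).dropna()


# Made-up data: 'author' is 5% missing, 'title' is 50% missing
toy = pd.DataFrame({
    "author": ["a"] * 19 + [None],
    "title": [None] * 10 + ["t"] * 10,
    "stars": list(range(20)),
})

cleaned = drop_missing(toy)
print(cleaned.shape)
```

Here 'title' is removed as a column, and the single row with a missing author is removed as a row, leaving 19 rows and 2 columns.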

# Final Challenge: Join the Tables!

Both tables have a column labeled 'key'.  This column connects the tables: each flavor in 'products' has a unique key, and each review in 'reviews' carries the key of the flavor it describes.

Our last step will be to join the two tables so that the information about each flavor in the 'products' table is combined with each review in the 'reviews' table.

Name the resulting table 'icecream'.

Verify that 'icecream' now contains data from both 'reviews' and 'products' by displaying a random sample of 3 rows.  

Also, verify that 'icecream' has the same number of rows as 'reviews' and a number of columns equal to the sum of the number of columns from 'reviews' and 'products' minus 1.  (Why minus one?)
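One possible approach to the join, sketched on made-up data, is `DataFrame.merge` with `on='key'`. The names `toy_products`, `toy_reviews`, and `toy_icecream` are hypothetical, and this is one reasonable choice rather than the only answer. A left join keeps every review, and `validate='many_to_one'` asks pandas to raise an error if a key appears more than once in the products table.

```python
import pandas as pd

# Made-up stand-ins for the two tables
toy_products = pd.DataFrame({
    "key": ["0_bj", "1_bj"],
    "name": ["Salted Caramel Core", "Chip Happens"],
    "rating": [3.7, 4.7],
})
toy_reviews = pd.DataFrame({
    "key": ["0_bj", "0_bj", "1_bj"],
    "stars": [3, 5, 5],
    "text": ["ok", "great", "love it"],
})

# Left join: one output row per review, with the matching flavor's columns attached
toy_icecream = toy_reviews.merge(
    toy_products, on="key", how="left", validate="many_to_one"
)

print(toy_icecream.shape)
print(toy_icecream.sample(3))
```

The shared 'key' column appears only once in the result, so the column count is the sum of the two tables' column counts minus one.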

# Congratulations!

You have:
1. loaded two tables into dataframes
2. viewed the beginning, end, and random samples of the tables
3. examined the shape, feature names, and data types in the tables
4. detected and removed missing values in two different ways
5. joined the two tables into one using a key

# Please take a moment to complete the survey below

# [Exit Ticket](https://docs.google.com/forms/d/e/1FAIpQLScVX-8y_vNLjaxFry_wWacl2a8NhvznAQvNkmiuXmxQ6b_wKg/viewform?usp=sf_link)