<p style="text-align: center; font-family: TimesNewRoman; font-size: 3.000em; color: Black; font-style: bold">

<br>
Grocery Store Price Comparison - Tableau and Python
</p><br>

<p style="text-align: center; font-family: TimesNewRoman; font-size:2.00em;color:Black; font-style:bold">
Vikrant Patil
</p>

# Table of Contents

1. [Section 1](#section1)
   
   1.1 [Overview](#overview)
   
   1.2 [Data Preprocessing](#dataprep)
   
   1.3 [Question 1](#question1)
   

2. [Section 2](#section2)
   
   2.1 [Question 2](#question2)
   
   2.2 [Graph of Price vs Banner](#graph1)
      
   2.3 [Graph of Price vs Region ](#graph2)
      
   2.4 [Graph of Price vs Banner and Region](#graph3)
   
   2.5 [Median Price per Day for Safeway stores by Region](#graph4)
   
   2.6 [Median Price per Day for Trader Joes stores by Region](#graph5)

   2.7 [Median Price per Day for Walmart stores by Region](#graph6)
      
   2.8 [Median Price per Day for Wegmans stores by Region](#graph7)
      
   2.9 [Median Price per Day for Whole Foods stores by Region](#graph8)
      


3. [Conclusions](#conclusions)

# 1. Section 1

<a id = "section1"> </a>

## 1.1 Overview

<a id = "overview"> </a>

This report explores aggregated price data for products sold by different supermarket chains (referred to here as banners) across several regions of the United States. It has two goals: first, to build a cross-tabulation of regional prices broken down by banner, and second, to identify any anomalies in the data.

The final dataset is obtained by joining three separate files: stores.json, prices.csv and auditors.csv. The dataset contains 6 supermarket chains (Walmart, Wegmans, Whole Foods, Trader Joes, Safeway and Kroger) and 5 regions (New York, Northern California, Kansas, Texas and Hawaii). The data was recorded over a period of 14 days, from 16th October 2017 to 29th October 2017.

## 1.2 Data Preprocessing

<a id = "dataprep"> </a>

### Loading required libraries

In [1]:
import numpy as np
import pandas as pd
import warnings 
warnings.filterwarnings('ignore')


### Loading 'stores' json file

In [2]:
stores = pd.read_json("stores.json")

In [3]:
stores.head()

| | Banner | Region | Store ID |
|---|---|---|---|
| 0 | Walmart | Northern California | 66999 |
| 1 | Trader Joes | Northern California | 4698 |
| 2 | Safeway | Northern California | 39482 |
| 3 | Whole Foods | Northern California | 34957 |
| 4 | Walmart | New York | 12837 |


### Loading the 'prices' and 'auditors' csv files

In [4]:
auditors = pd.read_csv("auditors.csv")
prices = pd.read_csv("prices.csv")

In [5]:
auditors.head()

| | Auditor ID | First | Last | Region |
|---|---|---|---|---|
| 0 | 234 | Sue | Smith | Northern California |
| 1 | 536 | Bob | Smith | Northern California |
| 2 | 98 | Jack | Smith | New York |
| 3 | 203 | Jill | Smith | New York |
| 4 | 304 | Jerry | Johnson | Texas |


In [6]:
prices.head()

| | Auditor ID | Date | Price | Store ID | UPC |
|---|---|---|---|---|---|
| 0 | 234 | 10/18/17 | 24.95 | 66999 | 268588472 |
| 1 | 234 | 10/27/17 | 49.71 | 66999 | 475245085 |
| 2 | 234 | 10/20/17 | 25.75 | 66999 | 126967843 |
| 3 | 234 | 10/23/17 | 18.81 | 66999 | 708930835 |
| 4 | 234 | 10/23/17 | 33.32 | 66999 | 325885139 |


### Creating an inner join on auditors and prices dataframes on the key 'Auditor ID'

In [7]:
auditors_prices_merged = pd.merge(auditors, prices, on = "Auditor ID", how = "inner")

In [8]:
auditors_prices_merged.head()

| | Auditor ID | First | Last | Region | Date | Price | Store ID | UPC |
|---|---|---|---|---|---|---|---|---|
| 0 | 234 | Sue | Smith | Northern California | 10/18/17 | 24.95 | 66999 | 268588472 |
| 1 | 234 | Sue | Smith | Northern California | 10/27/17 | 49.71 | 66999 | 475245085 |
| 2 | 234 | Sue | Smith | Northern California | 10/20/17 | 25.75 | 66999 | 126967843 |
| 3 | 234 | Sue | Smith | Northern California | 10/23/17 | 18.81 | 66999 | 708930835 |
| 4 | 234 | Sue | Smith | Northern California | 10/23/17 | 33.32 | 66999 | 325885139 |


### Creating an inner join on the previously merged dataframe 'auditors_prices_merged' and stores dataframe on the key 'Store ID'

In [9]:
final_df = pd.merge(auditors_prices_merged, stores, on = "Store ID", how = "inner")
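
An inner join silently discards rows whose key has no match on the other side, so it can be worth checking what is lost before relying on the merged result. The sketch below is only an illustration on toy frames: `prices_demo`, `stores_demo` and all values in them are hypothetical, and only the column names mirror this notebook.

```python
import pandas as pd

# Toy frames with hypothetical values; only the column names mirror the notebook.
prices_demo = pd.DataFrame({
    "Store ID": [66999, 66999, 4698, 99999],  # 99999 has no match in stores_demo
    "Price": [24.95, 49.71, 5.19, 3.50],
})
stores_demo = pd.DataFrame({
    "Store ID": [66999, 4698, 12837],         # 12837 has no recorded prices
    "Banner": ["Walmart", "Trader Joes", "Walmart"],
})

# An outer join with indicator=True labels every row by the side it came from.
check = pd.merge(prices_demo, stores_demo, on="Store ID", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# validate="many_to_one" raises a MergeError if a Store ID repeats in stores_demo,
# which would otherwise duplicate price rows in the joined result.
inner = pd.merge(prices_demo, stores_demo, on="Store ID", how="inner",
                 validate="many_to_one")

print(unmatched[["Store ID", "_merge"]])
print(len(inner), "rows survive the inner join")
```

Running the same kind of check on the real files would show whether any banners or regions drop out of the final dataset during the joins.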

### Converting UPC and Auditor ID from float type to int type

In [10]:
final_df['UPC'] = final_df['UPC'].fillna(0.0).astype(int)
final_df['Auditor ID'] = final_df['Auditor ID'].fillna(0.0).astype(int)

After joining all the dataframes, I dropped the 'Region_y' column. 'Region_x' comes from the 'auditors' dataframe, that is, from the auditor who recorded the price, so I treat it as the more reliable description of where a price was collected.

For Question 1, I also created a new dataframe called 'answer_df' and dropped the extra columns 'First' and 'Last' from it. The 'Date' column stays in the dataframe but is not used by the pivot table.

In [11]:
answer_df = final_df.drop(columns=['Region_y', 'First', 'Last'])
final_df = final_df.drop(columns=['Region_y'])
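
Before discarding 'Region_y', it may be useful to see how often the two region columns actually disagree, since every disagreement is a candidate anomaly. For instance, stores.head() lists Store ID 39482 as a Northern California Safeway, whereas the grouped table further down shows 39482 as the mean Store ID for Safeway in New York. The following is a minimal sketch on hypothetical rows (`merged_demo` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical rows: Region_x stands for the auditor's region,
# Region_y for the region recorded for the store.
merged_demo = pd.DataFrame({
    "Store ID": [66999, 4698, 12837],
    "Region_x": ["Northern California", "Northern California", "Kansas"],
    "Region_y": ["Northern California", "Northern California", "New York"],
})

# Rows where the two sources disagree about the region.
mismatch = merged_demo[merged_demo["Region_x"] != merged_demo["Region_y"]]
print(mismatch)
```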

In [12]:
final_df.head()

| | Auditor ID | First | Last | Region_x | Date | Price | Store ID | UPC | Banner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 234 | Sue | Smith | Northern California | 10/18/17 | 24.95 | 66999 | 268588472 | Walmart |
| 1 | 234 | Sue | Smith | Northern California | 10/27/17 | 49.71 | 66999 | 475245085 | Walmart |
| 2 | 234 | Sue | Smith | Northern California | 10/20/17 | 25.75 | 66999 | 126967843 | Walmart |
| 3 | 234 | Sue | Smith | Northern California | 10/23/17 | 18.81 | 66999 | 708930835 | Walmart |
| 4 | 234 | Sue | Smith | Northern California | 10/23/17 | 33.32 | 66999 | 325885139 | Walmart |


### Renaming the 'Region_x' column to 'Region'

In [13]:
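final_df = final_df.rename(index=str, columns={"Region_x": "Region"})
answer_df = answer_df.rename(index=str, columns={"Region_x": "Region"})
# .copy() makes an independent dataframe; plain assignment would only add a second name for the same object
answer_df_copy = answer_df.copy()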

In [14]:
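# numeric_only=True keeps the text columns out of the mean; pandas 2.0 and later raise a TypeError without it
banner_region_groupby = final_df.groupby(['Banner', 'Region']).mean(numeric_only=True)
banner_region_groupby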

| Banner | Region | Auditor ID | Price | Store ID | UPC |
|---|---|---|---|---|---|
| Safeway | Kansas | 1326.0 | 30.694151 | 39485 | 501767100.0 |
| Safeway | New York | 98.0 | 35.292941 | 39482 | 521183300.0 |
| Safeway | Texas | 304.0 | 30.619833 | 29382 | 510634200.0 |
| Trader Joes | Kansas | 1326.0 | 29.201024 | 29384 | 509801000.0 |
| Trader Joes | New York | 203.0 | 30.521937 | 9487 | 495229000.0 |
| Trader Joes | Northern California | 536.0 | 34.376035 | 4698 | 520140800.0 |
| Trader Joes | Texas | 63.0 | 29.321849 | 40586 | 505950500.0 |
| Walmart | Kansas | 1326.0 | 27.646769 | 40593 | 510927700.0 |
| Walmart | New York | 203.0 | 28.427648 | 12837 | 513504500.0 |
| Walmart | Northern California | 234.0 | 32.854515 | 66999 | 493258200.0 |


## 1.3 Question 1 

<a id = "question1"> </a>

1. The file prices.csv describes prices collected for products, represented as UPC, at specific physical store locations, represented as Store ID. The auditors who collected prices at each store are represented as Auditor ID. Store attribute information is described in stores.json, and auditor information is shown in auditors.csv. Can you transform these sources into a cross-tabulation of regional prices alongside each other, broken down by banner, and write this out to a spreadsheet (CSV or XLSX)? Note that a given product is not guaranteed to be found in all markets at a given banner.


### Creating a pivot table to get the cross-tabulation of regional prices

In [15]:
answer_df = pd.pivot_table(answer_df, values='Price', index=['UPC', 'Banner'], columns = ['Region']).reset_index()
answer_df.head()

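| | UPC | Banner | Kansas | New York | Northern California | Texas |
|---|---|---|---|---|---|---|
| 0 | 11873171 | Safeway | NaN | 6.09 | NaN | 5.19 |
| 1 | 11873171 | Trader Joes | NaN | NaN | NaN | 4.99 |
| 2 | 11873171 | Walmart | NaN | NaN | 5.53 | 4.75 |
| 3 | 11873171 | Wegmans | NaN | 5.19 | NaN | 5.09 |
| 4 | 11873171 | Whole Foods | 1.99 | 5.69 | NaN | 5.49 |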
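
One detail of the pivot is easy to miss: when the same UPC, banner and region combination has more than one price, `pd.pivot_table` aggregates the duplicates, and its default `aggfunc` is the mean. Combinations with no price at all become NaN. The sketch below shows both effects on a small hypothetical frame (`obs` and its prices are made up for illustration):

```python
import pandas as pd

# Hypothetical observations: one UPC priced twice at Safeway in Texas,
# once at Safeway in New York, and once at Walmart in Texas.
obs = pd.DataFrame({
    "UPC": [11873171, 11873171, 11873171, 11873171],
    "Banner": ["Safeway", "Safeway", "Safeway", "Walmart"],
    "Region": ["Texas", "Texas", "New York", "Texas"],
    "Price": [5.00, 6.00, 6.09, 4.75],
})

# Duplicate (UPC, Banner, Region) rows are averaged because aggfunc defaults to the mean.
wide = pd.pivot_table(obs, values="Price", index=["UPC", "Banner"],
                      columns="Region").reset_index()
print(wide)
```

If a different summary is preferred for repeated observations, `aggfunc="median"` or `aggfunc="last"` can be passed instead; which one is appropriate depends on what the spreadsheet is meant to show.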


### Rearranging the columns

In [16]:
answer_df = answer_df[['Banner', 'UPC', 'Northern California', 'New York', 'Kansas', 'Texas']]
answer_df.head()

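| | Banner | UPC | Northern California | New York | Kansas | Texas |
|---|---|---|---|---|---|---|
| 0 | Safeway | 11873171 | NaN | 6.09 | NaN | 5.19 |
| 1 | Trader Joes | 11873171 | NaN | NaN | NaN | 4.99 |
| 2 | Walmart | 11873171 | 5.53 | NaN | NaN | 4.75 |
| 3 | Wegmans | 11873171 | NaN | 5.19 | NaN | 5.09 |
| 4 | Whole Foods | 11873171 | NaN | 5.69 | 1.99 | 5.49 |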


### Checking the data types of the columns

In [17]:
answer_df.dtypes

Region
Banner                  object
UPC                      int64
Northern California    float64
New York               float64
Kansas                 float64
Texas                  float64
dtype: object

### Exporting the dataframe to an excel file

In [18]:
answer_df.to_excel("output.xlsx") 
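
By default, both `to_excel` and `to_csv` also write the dataframe index as an extra, unnamed first column. If that column is not wanted in the spreadsheet, `index=False` leaves it out. A minimal sketch, using a hypothetical one-row table and an in-memory buffer instead of a real file:

```python
import io
import pandas as pd

# Hypothetical one-row table shaped like the cross-tabulation.
table = pd.DataFrame({"Banner": ["Safeway"], "UPC": [11873171], "Texas": [5.19]})

# Write CSV text to memory; index=False omits the row-number column.
buffer = io.StringIO()
table.to_csv(buffer, index=False)
print(buffer.getvalue())
```

The same `index=False` argument is accepted by `to_excel`.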

# 2. Section 2

<a id = "section2"> </a>

## 2.1 Question 2

<a id = "question2"> </a>

2. Do you notice anything that seems off with the data we’ve collected? Call out anything you find noteworthy. Again, it is not necessary to use the model to find the anomalies we’re looking for, but you may use it as a tool to assist you if you wish.


In [19]:
final_df.head()

| | Auditor ID | First | Last | Region | Date | Price | Store ID | UPC | Banner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 234 | Sue | Smith | Northern California | 10/18/17 | 24.95 | 66999 | 268588472 | Walmart |
| 1 | 234 | Sue | Smith | Northern California | 10/27/17 | 49.71 | 66999 | 475245085 | Walmart |
| 2 | 234 | Sue | Smith | Northern California | 10/20/17 | 25.75 | 66999 | 126967843 | Walmart |
| 3 | 234 | Sue | Smith | Northern California | 10/23/17 | 18.81 | 66999 | 708930835 | Walmart |
| 4 | 234 | Sue | Smith | Northern California | 10/23/17 | 33.32 | 66999 | 325885139 | Walmart |


### Checking the data types of the columns

In [20]:
final_df.dtypes

Auditor ID      int32
First          object
Last           object
Region         object
Date           object
Price         float64
Store ID        int64
UPC             int32
Banner         object
dtype: object

### Converting the 'Date' column to date datatype

In [21]:
final_df['Date']= pd.to_datetime(final_df['Date']) 
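
Dates such as 10/18/17 are ambiguous without a format, so passing one explicitly removes any guesswork about month and day order. Once parsed, the dates can also be compared against the collection window stated in the overview (16th to 29th October 2017). The sketch below assumes the month/day/two-digit-year layout seen in prices.head(); the three sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same MM/DD/YY layout as the prices file.
dates = pd.Series(["10/18/17", "10/27/17", "10/30/17"])

# An explicit format avoids any guessing about day/month order.
parsed = pd.to_datetime(dates, format="%m/%d/%y")

# Collection window as stated in the overview of this report.
start, end = pd.Timestamp("2017-10-16"), pd.Timestamp("2017-10-29")
outside = parsed[(parsed < start) | (parsed > end)]
print(outside)
```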

In [22]:
final_df.dtypes

Auditor ID             int32
First                 object
Last                  object
Region                object
Date          datetime64[ns]
Price                float64
Store ID               int64
UPC                    int32
Banner                object
dtype: object

### Checking for NA values in the dataset

In [23]:
final_df[final_df.isnull().any(axis=1)]

| | Auditor ID | First | Last | Region | Date | Price | Store ID | UPC | Banner |
|---|---|---|---|---|---|---|---|---|---|
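
An empty result here does not prove that nothing was missing originally: the earlier `fillna(0.0)` step on 'UPC' and 'Auditor ID' replaces any missing value with 0 before this check runs. A small sketch of the effect, on a hypothetical two-row frame (`demo` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing UPC.
demo = pd.DataFrame({"UPC": [268588472.0, np.nan], "Price": [24.95, 18.81]})

missing_before = int(demo["UPC"].isna().sum())

# Same pattern as the conversion cell: fill with 0, then cast to int.
demo["UPC"] = demo["UPC"].fillna(0).astype(int)

# The null check now finds nothing, but the placeholder zeros are still visible.
missing_after = int(demo.isnull().any(axis=1).sum())
placeholder_rows = int((demo["UPC"] == 0).sum())
print(missing_before, missing_after, placeholder_rows)
```

Counting rows where 'UPC' or 'Auditor ID' equals 0 in the real dataframe would show whether any values were filled in this way.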


## 2.2 Graph of Price vs Banner

<a id = "graph1"> </a>

In [24]:
%%HTML 

<div class='tableauPlaceholder' id='viz1571454706158' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;JG&#47;JG354NDYG&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='path' value='shared&#47;JG354NDYG' /> <param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;JG&#47;JG354NDYG&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571454706158');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='1620px';vizElement.style.minHeight='687px';vizElement.style.maxHeight='887px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='1620px';vizElement.style.minHeight='687px';vizElement.style.maxHeight='887px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='727px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) According to the graph above (Price vs Banner), Whole Foods has the lowest median price of all banners, and it also has the most records in the dataset (2802).

2) Safeway and Trader Joes have nearly the same median price ($30.89 and $30.94 respectively), but they have fewer records (1963 and 2062 respectively) than Whole Foods and Walmart (2802 and 2435 respectively).

3) The median price of a product at Whole Foods ($19.09) is far lower than at any other banner, which may point to an error in data collection.
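
The medians and counts quoted above are read from the Tableau view. A rough pandas equivalent might look like the sketch below; it runs on a hypothetical frame (`demo`) rather than the real data, and is meant only to show how a block of very low prices can pull a banner's median down.

```python
import pandas as pd

# Hypothetical prices; two of the four Whole Foods rows carry a suspicious 1.99.
demo = pd.DataFrame({
    "Banner": ["Walmart"] * 3 + ["Whole Foods"] * 4,
    "Price": [27.0, 29.0, 31.0, 1.99, 1.99, 30.0, 40.0],
})

# Median price and number of records per banner, side by side.
summary = demo.groupby("Banner")["Price"].agg(median="median", count="count")
print(summary)
```

On the real data, the same call on `final_df` should give figures comparable to those in the graph.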

## 2.3 Graph of Price vs Region

<a id = "graph2"> </a>

In [25]:
%%HTML

<div class='tableauPlaceholder' id='viz1571443939910' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Median_Price_per_Region_and_Count&#47;MedianPriceperRegionandCount&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Median_Price_per_Region_and_Count&#47;MedianPriceperRegionandCount' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Median_Price_per_Region_and_Count&#47;MedianPriceperRegionandCount&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571443939910');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='1620px';vizElement.style.minHeight='487px';vizElement.style.maxHeight='887px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='1620px';vizElement.style.minHeight='487px';vizElement.style.maxHeight='887px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='727px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 
'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) According to the graph of Price vs Region, Northern California is the most expensive region (median price $34.99), but its record count (1328) is less than half of the count in every other region (3765 in Texas, 3331 in New York and 3087 in Kansas).

2) Northern California's median price ($34.99) might therefore be misleading, given its low record count (1328).

3) The median price of a product in Kansas ($17.69) is far below the medians in New York, Northern California and Texas ($31.09, $34.99 and $29.84 respectively). A gap this large could be due to errors made while recording the prices.

## 2.4 Graph of Price vs Banner and Region

<a id = "graph3"> </a>

In [26]:
%%HTML

<div class='tableauPlaceholder' id='viz1571455911571' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Median_Price_vs_RegionBanner&#47;MedianPricevsRegionBanner&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Median_Price_vs_RegionBanner&#47;MedianPricevsRegionBanner' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Median_Price_vs_RegionBanner&#47;MedianPricevsRegionBanner&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571455911571');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='1620px';vizElement.style.minHeight='887px';vizElement.style.maxHeight='1187px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='1620px';vizElement.style.minHeight='887px';vizElement.style.maxHeight='1187px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='727px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 
'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) At first glance, the median price of a product at Whole Foods in Kansas ($1.99) stands out as an erroneous value.

2) In the earlier graph of Price vs Banner, Whole Foods was the cheapest banner, but in the graph above (Price vs Banner and Region) it is on the expensive end in New York, Northern California and Texas. The data collected for Whole Foods therefore very likely contains errors, and the Kansas records are the most likely reason for its low overall median.

3) The other banners (Safeway, Trader Joes, Walmart and Wegmans) show similar price ranges across New York, Northern California and Texas.
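
The comparison above can also be done programmatically by building the banner-by-region grid of medians and flagging cells that sit far below the rest of their row. The sketch below uses hypothetical prices, and the 25% threshold is an arbitrary assumption chosen for illustration, not a value derived from the data.

```python
import pandas as pd

# Hypothetical prices for two banners in three regions.
demo = pd.DataFrame({
    "Banner": ["Whole Foods"] * 6 + ["Walmart"] * 6,
    "Region": ["Kansas", "Kansas", "New York", "New York", "Texas", "Texas"] * 2,
    "Price": [1.99, 1.99, 33.0, 35.0, 31.0, 33.0,
              27.0, 29.0, 28.0, 30.0, 26.0, 28.0],
})

# Median price for every banner/region pair, regions as columns.
grid = demo.groupby(["Banner", "Region"])["Price"].median().unstack("Region")

# Flag cells below 25% of the banner's typical (median) regional value.
# The 25% cut-off is an arbitrary choice for this sketch.
row_median = grid.median(axis=1)
suspect = grid.lt(0.25 * row_median, axis=0)
flagged = [(b, r) for b in suspect.index for r in suspect.columns if suspect.loc[b, r]]
print(grid)
print(flagged)
```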

## 2.5 Median Price per Day for Safeway stores by Region
<a id = "graph4"> </a>

In [27]:
%%HTML

<div class='tableauPlaceholder' id='viz1571444327183' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Safeway&#47;safeway&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Safeway&#47;safeway' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Safeway&#47;safeway&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571444327183');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) The median price in New York rises sharply from 26th to 28th October (from 30.39 to 43.99) and then falls steeply on 30th October (from 45.19 to 24.99).
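
Day-to-day jumps like these can be quantified by computing the median price per day and region and then differencing it. Looking at the number of records behind each daily median also helps, because a median built from only a few prices is naturally noisy. A minimal sketch on hypothetical values (`demo` is made up; on the real data the frame would first be filtered to one banner):

```python
import pandas as pd

# Hypothetical prices for a single banner in one region over three days.
demo = pd.DataFrame({
    "Date": pd.to_datetime(
        ["10/26/17", "10/26/17", "10/27/17", "10/27/17", "10/28/17", "10/28/17"],
        format="%m/%d/%y",
    ),
    "Region": ["New York"] * 6,
    "Price": [30.0, 31.0, 36.0, 38.0, 43.0, 45.0],
})

grouped = demo.groupby(["Region", "Date"])["Price"]

# One row per day, one column per region.
daily = grouped.median().unstack("Region")
counts = grouped.count().unstack("Region")

# Day-over-day change in the median; the first day has no predecessor (NaN).
change = daily.diff()
print(daily)
print(change)
print(counts)
```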

## 2.6 Median Price per Day for Trader Joes stores by Region

<a id = "graph5"> </a>

In [28]:
%%HTML

<div class='tableauPlaceholder' id='viz1571444394107' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Trader_Joes&#47;traderjoes&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Trader_Joes&#47;traderjoes' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Trader_Joes&#47;traderjoes&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571444394107');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) On 17th October, the median price of products in Northern California rises steeply (from 19.64 to 43.24).

2) The median price in Kansas increases sharply on 21st October (from 25.74 to 37.84).

## 2.7 Median Price per Day for Walmart stores by Region

<a id = "graph6"> </a>

In [29]:
%%HTML

<div class='tableauPlaceholder' id='viz1571444425677' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Walmart&#47;walmart&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Walmart&#47;walmart' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Walmart&#47;walmart&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571444425677');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) In Northern California, the median price increases sharply over 19th, 20th and 21st October (from 27.93 to 43.18), and it rises steeply again on 26th October (from 26.82 to 40.24).

2) The median price in Kansas drops sharply on 24th October (from 33.92 to 18.13), and the median price in New York rises steeply on 27th October (from 21.07 to 34.86).


## 2.8 Median Price per Day for Wegmans stores by Region

<a id = "graph7"> </a>

In [30]:
%%HTML

<div class='tableauPlaceholder' id='viz1571444474668' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Wegmans&#47;wegmans&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Wegmans&#47;wegmans' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Wegmans&#47;wegmans&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571444474668');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) The median prices in Kansas fluctuate a lot.

2) The trends in Kansas and New York are similar, but the trend in Texas is sometimes the complete opposite of the other two.

3) The median price in Texas declines steeply on 29th October (from 39.49 to 26.79), and on the same day the median price in Kansas increases sharply (from 23.89 to 35.89).

## 2.9 Median Price per Day for Whole Foods stores by Region

<a id = "graph8"> </a>

In [31]:
%%HTML

<div class='tableauPlaceholder' id='viz1571444706062' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Whole_Foods&#47;wholefoods&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Engage3_Whole_Foods&#47;wholefoods' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;En&#47;Engage3_Whole_Foods&#47;wholefoods&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1571444706062');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

### Observations and Anomalies:

1) In the graph above (Median Price per Day for Whole Foods stores by Region), the median price of a product at Whole Foods in Kansas is clearly erroneous: it stays at $1.99 throughout the period.

2) Between 18th and 19th October, Northern California's median price drops by approximately $20. Over the same two days, the median prices in New York and Texas increase steeply.

3) On 22nd and 23rd October, the median price in Northern California again increases sharply (from 34.94 to 43.49), and by the end of the period it has climbed to $50.19, well above the other regions.
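
A median that never moves suggests that most or all prices in that group are identical. One way to check this directly is to look at the minimum, maximum and number of distinct prices per banner and region. The sketch below does so on hypothetical values (`demo` is made up for illustration):

```python
import pandas as pd

# Hypothetical prices; every Whole Foods price in Kansas is the same value.
demo = pd.DataFrame({
    "Banner": ["Whole Foods"] * 5 + ["Walmart"] * 2,
    "Region": ["Kansas", "Kansas", "Kansas", "Texas", "Texas", "Kansas", "Kansas"],
    "Price": [1.99, 1.99, 1.99, 31.0, 33.5, 27.0, 29.5],
})

# Spread of prices within each banner/region group.
spread = demo.groupby(["Banner", "Region"])["Price"].agg(["min", "max", "nunique"])

# Groups where every recorded price is identical.
constant = spread[spread["nunique"] == 1]
print(constant)
```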

# 3. Conclusions

<a id = "conclusions"> </a>

1) More information on the store scaling factor and the regional scaling factor would help explain some of the patterns observed in the graphs above. For example, Whole Foods has the lowest overall median price, yet when the data is broken down by region it is among the most expensive banners in New York, Northern California and Texas. The reason for this could become clearer given the scaling factors mentioned above.

2) With more details about each product, we would be able to derive more insights from the data, for example whether a particular product category is more expensive in one state even when other categories in that state are cheaper.
