# Milestone 1 - Data Visualization Complementary Views 

## Author - Matthew Denko



## Instructions
1. Create different complementary views of data by applying multiple chart types and aesthetics.
2. Project multiple dimensions using conditioning or faceting (e.g., small multiples) on both categorical and numeric variables.

In [None]:
# Load necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load data

filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()

# Tables Open vs Head Count

In [None]:
#Scatter plot

ax = plt.figure(figsize=(6, 6)).gca() # define axis
headcount_df.plot.scatter(x = 'HeadCount', y = 'TablesOpen', ax = ax)
ax.set_title('Tables Open vs Head Count') # Give the plot a main title
ax.set_ylabel('Tables Open')# Set text for y axis
ax.set_xlabel('Head Count')

Comments:
    I want to explore the relationship between the number of open tables and head count. I first drew a scatter plot, but there is so much overplotting that it is difficult to draw any insights.
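
One rough way to see how severe the overplotting is, before switching chart types, is to count how many rows share the same (HeadCount, TablesOpen) pair. The sketch below uses a small hypothetical frame (`toy_df`) rather than the real `headcount_df`, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the HeadCount and TablesOpen columns
toy_df = pd.DataFrame({
    'HeadCount':  [0, 0, 0, 1, 1, 5, 5, 5, 5, 9],
    'TablesOpen': [0, 0, 0, 1, 1, 2, 2, 2, 3, 4],
})

# Number of rows that land on each distinct (x, y) position
pair_counts = toy_df.groupby(['HeadCount', 'TablesOpen']).size()

# Share of rows drawn on top of an earlier, identical point
overplotted_share = toy_df.duplicated(subset=['HeadCount', 'TablesOpen']).mean()

print(pair_counts.sort_values(ascending=False).head())
print(overplotted_share)
```

A high share of repeated positions suggests that a density-based view, such as a hexbin plot or transparency, will be more informative than a plain scatter plot.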

In [None]:
#Hexbin Plot

ax = plt.figure(figsize=(6, 6)).gca() # define axis
headcount_df.plot.hexbin(x = 'HeadCount', y = 'TablesOpen', gridsize = 15, ax = ax)
ax.set_title('Tables Open vs Head Count') # Give the plot a main title
ax.set_ylabel('Tables Open')# Set text for y axis
ax.set_xlabel('Head Count')

Comments: 
    The hexbin plot helps with the overplotting issue. As head count increases, the number of open tables tends to increase. The most common pair is both head count and tables open under 5.

In [None]:
# Grouped Box Plot

fig = plt.figure(figsize=(6, 6)) # Define plot area
ax = fig.gca() # Define axis 
headcount_df.loc[:,['TablesOpen', 'HeadCount']].boxplot(by = 'HeadCount', ax = ax)
ax.set_title('Box plot of TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_ylim(0.0, 25.0) # Set the limits of the y axis

Comments:
    This box plot does not tell us much, other than that there are many distinct values in both Tables Open and Head Count. It would make sense to bin one of these variables and re-plot.
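
A minimal sketch of the binning idea, assuming `pd.cut` with hand-picked bin edges. The edges and the `toy_df` frame are hypothetical and not taken from the dataset; sensible edges would depend on the real distribution of HeadCount.

```python
import pandas as pd

# Hypothetical toy data standing in for the HeadCount and TablesOpen columns
toy_df = pd.DataFrame({
    'HeadCount':  [0, 3, 8, 12, 18, 25, 40, 60],
    'TablesOpen': [0, 1, 1, 2, 2, 3, 4, 6],
})

# Assumed cut points; include_lowest keeps rows where HeadCount is exactly 0
bin_edges = [0, 10, 20, 50, 100]
toy_df['HeadCountBin'] = pd.cut(toy_df['HeadCount'], bins=bin_edges, include_lowest=True)

# Summarize TablesOpen within each head count bin
bin_summary = toy_df.groupby('HeadCountBin', observed=True)['TablesOpen'].describe()
print(bin_summary[['count', '50%']])
```

Grouping a box plot by the binned column instead of the raw HeadCount would then draw one box per bin rather than one per distinct value.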

# Head Count vs Tables Occupied

In [None]:
#Scatterplot

ax = plt.figure(figsize=(6, 6)).gca() # define axis
headcount_df.plot.scatter(x = 'TablesOcc', y = 'HeadCount', ax = ax, alpha = 0.3)
ax.set_title('Head Count vs Tables Occupied') # Give the plot a main title
ax.set_ylabel('Head Count')# Set text for y axis
ax.set_xlabel('Tables Occupied')

Comments:
    There is a positive relationship between tables occupied and head count: as tables occupied goes up, head count goes up. There is significant overplotting, so I will create a hexbin plot.


In [None]:
#Hexbin Plot

ax = plt.figure(figsize=(6, 6)).gca() # define axis
headcount_df.plot.hexbin(x = 'TablesOcc', y = 'HeadCount', gridsize = 15, ax = ax)
ax.set_title('Head Count vs Tables Occupied') # Give the plot a main title
ax.set_ylabel('Head Count')# Set text for y axis
ax.set_xlabel('Tables Occupied')

Comments:
    The highest concentration is at 0 tables occupied and 0 people at the bar.

In [None]:
#CrossTab

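# Count observations for each HeadCount / TablesOcc pair
hour_head = pd.crosstab(headcount_df.loc[:, 'HeadCount'], headcount_df.loc[:, 'TablesOcc'])
# Divide each column by its total so every TablesOcc column sums to 1
column_totals = hour_head.sum(axis = 0)
hour_head = hour_head.div(column_totals, axis = 1)
print(hour_head.head())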
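
The cell above divides every column of the cross-tabulation by its column total. As a sketch on hypothetical toy data (`toy_df` is not the real dataset), the same column-wise normalization can be requested directly through the `normalize='columns'` argument of `pd.crosstab`:

```python
import pandas as pd

# Hypothetical toy data standing in for the HeadCount and TablesOcc columns
toy_df = pd.DataFrame({
    'HeadCount': [0, 0, 0, 1, 2, 2, 3],
    'TablesOcc': [0, 0, 0, 1, 1, 1, 2],
})

# Manual approach: raw counts divided by each column's total
counts = pd.crosstab(toy_df['HeadCount'], toy_df['TablesOcc'])
manual = counts.div(counts.sum(axis=0), axis=1)

# Built-in approach: let crosstab normalize within each column
built_in = pd.crosstab(toy_df['HeadCount'], toy_df['TablesOcc'], normalize='columns')

print(built_in)
```

Because each column sums to 1, a heat map of this table shows how head count is distributed within each Tables Occupied value, not the share of all observations.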

In [None]:
#HeatMap

ax = plt.figure(figsize=(6, 6)).gca() # define axis
ax.pcolor(hour_head, cmap = 'Blues')
ax.set_xticks(range(hour_head.shape[1]))
ax.set_xticklabels(hour_head.columns, rotation=90)
ax.set_xlabel('Tables Occupied')
ax.set_ylabel('Head Count')
ax.set_title('Head Count by Tables Occupied')

Comments:
    The densest combination by far is 0 head count and 0 tables occupied. The heat map helps show how minuscule the other concentrations are.

In [None]:
# Grouped Box Plot

fig = plt.figure(figsize=(6, 6)) # Define plot area
ax = fig.gca() # Define axis 
headcount_df.loc[:,['HeadCount', 'TablesOcc']].boxplot(by = 'TablesOcc', ax = ax)
ax.set_title('Box plot of HeadCount') # Give the plot a main title
ax.set_ylabel('HeadCount')# Set text for y axis
ax.set_ylim(0.0, 111.0) # Set the limits of the y axis

Comments:
    There are some interesting outliers around the Tables Occupied range of 4-6. I wonder if there was some sort of event or if that is when the bar is heavily occupied.
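
To look at those outliers row by row, one option is to flag them with the same 1.5 × IQR rule that box plot whiskers use by default. This is a sketch on hypothetical toy data (`toy_df`), not a result from the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the TablesOcc and HeadCount columns
toy_df = pd.DataFrame({
    'TablesOcc': [4, 4, 4, 4, 4, 5, 5, 5, 5, 5],
    'HeadCount': [10, 11, 12, 13, 60, 14, 15, 16, 17, 18],
})

# Quartiles of HeadCount within each TablesOcc group, mapped back onto the rows
grouped = toy_df.groupby('TablesOcc')['HeadCount']
q1 = toy_df['TablesOcc'].map(grouped.quantile(0.25))
q3 = toy_df['TablesOcc'].map(grouped.quantile(0.75))
iqr = q3 - q1

# A row is an outlier if it falls more than 1.5 * IQR outside the quartiles
toy_df['IsOutlier'] = (
    (toy_df['HeadCount'] < q1 - 1.5 * iqr) | (toy_df['HeadCount'] > q3 + 1.5 * iqr)
)

print(toy_df[toy_df['IsOutlier']])
```

Applied to the real data, the flagged rows could then be inspected by day and hour to check whether they cluster around a particular event.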

In [None]:
#Adding an indicator column for Friday

headcount_df.loc[:,'Friday'] = headcount_df.loc[:,"DayOfWeek"] == 6
headcount_df.columns

In [None]:
g = sns.FacetGrid(headcount_df, col="Friday", row='TablesOcc')
g = g.map(plt.hist, "HeadCount")

Comments:
    The head count on Fridays is highest when fewer tables are occupied (peaking at 2). This is interesting and suggests that people are not sitting at tables on Fridays.
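
The faceted histograms show how often each head count value occurs, so a direct comparison of averages can help check this reading. The sketch below uses hypothetical toy data (`toy_df`) and the same assumption as the notebook that `DayOfWeek == 6` marks Friday; the `DayType` column is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for DayOfWeek, TablesOcc and HeadCount
toy_df = pd.DataFrame({
    'DayOfWeek': [6, 6, 6, 6, 2, 2, 2, 2],
    'TablesOcc': [2, 2, 5, 5, 2, 2, 5, 5],
    'HeadCount': [30, 40, 12, 8, 5, 7, 20, 24],
})

# Label rows as Friday or not (assumes 6 encodes Friday, as in the notebook)
toy_df['DayType'] = (toy_df['DayOfWeek'] == 6).map({True: 'Friday', False: 'Other'})

# Mean head count for each TablesOcc value, with one column per day type
mean_head = (
    toy_df.groupby(['TablesOcc', 'DayType'])['HeadCount']
    .mean()
    .unstack('DayType')
)

print(mean_head)
```

If the Friday column of the real table peaks at a low Tables Occupied value while the other column does not, that would support the observation above.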

In [None]:
source_citation = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
print("this is the source used:",source_citation)