<img src="images/tableau_cmyk_2015.png" width=50%>

# Tableau Fundamentals

- 02/04/21
- 081720FT

## Topics

- Tableau vs Tableau Public
- Installing Tableau Public
- Loading Data Files 
- Key Vocabulary
- Making Several Types of Plots
    - Scatter Plots with Trendlines
    - Histograms/Grouped Histogram
    - Map scatter plot 
    - Map Shaded Area Plot
- Customizing Plots
- Your Tableau Profile


## Resources/References

- Udemy Course: **Tableau 2020 A-Z: Hands-On Tableau Training for Data Science**
    - https://www.udemy.com/course/tableau10/
- Official Tableau Video Resources:
    - https://public.tableau.com/en-us/s/resources

# Tableau vs Tableau Public

- Tableau Public is the free version of Tableau.
- They are VERY similar, but there are important distinctions:
    - Data Access
        - Tableau can access SQL servers
        - Tableau Public cannot.
    - Saving Projects:
        - Tableau can save and load projects locally.
        - Tableau Public can only save to the cloud

# Installing Tableau Public

- https://public.tableau.com/en-us/s/

# Loading Data

- Tableau Public can load data from many file types:
    - Excel
    - Text Files (csv,tsv)
    - JSON Files
    - Google Sheets
    - etc.
    
- We will save the two CSVs we will be using into this repo's folder. 

## Saving Dataset for Demo

In [None]:
# !pip install -U fsds
# The cells below rely on this star import for `fs` (fsds) and `sns` (seaborn).
from fsds.imports import *

In [None]:
df1 = fs.datasets.load_mod1_proj(read_csv_kwds={'index_col':0})
display(df1.head(2))
df1.to_csv('regression_data_complete.csv')

In [None]:
df2 = fs.datasets.load_ts_baltimore_crime_counts(read_csv_kwds={'parse_dates':['datetime']})
display(df2.head(2))
df2.to_csv('baltimore_crime_ts.csv')

# Basic Tableau Vocab

- Dimensions: 
    - categorical features/independent variables
    - Show up in blue on the Columns/Rows shelves (blue marks a discrete field)

- Measures: 
    - numeric features/dependent variables
    - Measures get aggregated (SUM, AVG, etc.)
    - Show up in green on the Columns/Rows shelves (green marks a continuous field)

- Attributes
    - ?
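
As a rough pandas analogy (a sketch, not a description of how Tableau works internally): a dimension plays the role of a `groupby` key, and a measure is the numeric column being aggregated. The `homes` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
homes = pd.DataFrame({
    "waterfront": ["no", "no", "yes", "yes"],          # dimension-like: categorical
    "price": [300_000, 500_000, 900_000, 1_100_000],   # measure-like: numeric
})

# Grouping by the dimension and aggregating the measure mirrors
# what a view showing SUM(Price) or AVG(Price) per category displays.
summary = homes.groupby("waterfront")["price"].agg(["sum", "mean"])
print(summary)
```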


# Our Tasks

## Load in `regression_data_complete.csv`

> - Open Tableau Public and load up the first housing regression dataset file (`regression_data_complete.csv`, saved above)
    - CSVs are technically "Text Files"   

> - Notice that Tableau automatically replaced text values in numeric columns ('?' in sqft) with nulls
    - It also cleaned up the column names (`sqft_living` -> "Sqft Living")
  
- Now click "Go to Worksheet" / "Sheet 1" at the bottom of the app.
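
For comparison, the two automatic clean-ups Tableau applied on load can be approximated in pandas. This is only a sketch: the `raw` frame is invented, and `sqft_basement` is an assumed name for the column holding the '?' placeholder.

```python
import pandas as pd

# Hypothetical rows; '?' stands in for a missing square-footage value.
raw = pd.DataFrame({
    "sqft_living": [1180, 2570, 770],
    "sqft_basement": ["0", "?", "400"],  # assumed column name
})

# Coerce non-numeric text to NaN, similar to the nulls Tableau shows.
raw["sqft_basement"] = pd.to_numeric(raw["sqft_basement"], errors="coerce")

# Turn snake_case names into title-cased labels, e.g. "Sqft Living".
raw.columns = raw.columns.str.replace("_", " ").str.title()
print(raw)
```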



## Plots to Make: King County Housing Data

1. [ ] A scatter plot of Sqft Living vs Price + a trendline.
2. [ ] A scatter plot of Sqft Living vs Price grouped by whether the home is a waterfront property (+ trendlines) <br>(first with null values, then without)
3. [ ] A histogram of price in **\$**100K-bins.
4. [ ] A histogram of price in **\$**100K-bins broken out by Waterfront properties.
5. [ ] A map of median prices by zipcode (with a Green color scale broken into 5 shades of green)- see note about maps below.
6. [ ] A map of all homes with color-coded price with the smallest markers possible. 

    
> - **Note: for our maps, we want:**
    - A dark background,
    - Add County names/borders
    - Add major cities
    - Add terrain
    - Add major roadways. 

    
- [ ] **Save the workbook to Tableau Public and make sure it shows all individual sheets.**



###  1. A scatter plot of Sqft Living vs Price + a trendline.

#### Python Answer

In [None]:
sns.regplot(data=df1, x='sqft_living',y='price')

#### Tableau Answer


- Answer: 
    - Columns = Sqft Living (Dimension)
    - Rows = Price (Dimension)
    - Add a trend line: Analysis menu > Trend Lines > Show Trend Lines (or drag Trend Line from the Analytics pane)
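
To see what a straight trend line represents numerically, here is a minimal sketch of an ordinary least-squares fit with `numpy.polyfit`. It assumes a linear model (Tableau also offers other trend line types), and the points are synthetic rather than taken from the housing data.

```python
import numpy as np

# Hypothetical points lying exactly on price = 200 * sqft + 50_000.
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = 200.0 * sqft + 50_000.0

# A degree-1 polyfit returns the slope and intercept of the least-squares line.
slope, intercept = np.polyfit(sqft, price, deg=1)
print(round(slope, 2), round(intercept, 2))
```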


### 2. A scatter plot of Sqft Living vs Price grouped by whether the home is a waterfront property (+ trendlines) 
- (first with null values then without)
    

#### Python Answer

In [None]:
sns.lmplot(data=df1,x='sqft_living',y='price',hue='waterfront')

#### Tableau Answer

- Answer:
    - Duplicate sheet/plot #1
    - Right Click Waterfront -> Convert to Dimension
    - Drag Waterfront -> Color
    
    - To remove Null values:
        - Right click on Null in legend > Exclude
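
The pandas counterpart of excluding Null from the legend is dropping rows where the grouping column is missing. A small sketch with an invented `toy` frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing waterfront value.
toy = pd.DataFrame({
    "sqft_living": [1000, 1500, 2000],
    "waterfront": [0.0, np.nan, 1.0],
})

# Keep only rows where waterfront is known.
no_nulls = toy.dropna(subset=["waterfront"])
print(no_nulls)
```

The resulting frame could then be passed to `sns.lmplot` in place of the full data.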
        

### 3. A histogram of price in $100K-bins.

#### Python Answer

In [None]:
sns.histplot(df1,x='price',binwidth=100_000)

#### Tableau Answer

- Answer 1: 
    - Click Price then click Show Me > select histogram. 
- Answer 2: 
    - Right click on Price > Create > Bins
    - Columns = Price Bins
    - Rows = Price (CNT)    
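
The bin-then-count logic of Answer 2 can be sketched in pandas. This assumes each bin is labeled by its lower edge and uses a bin size of 100,000; the prices are made up.

```python
import pandas as pd

# Hypothetical prices for illustration.
prices = pd.Series([95_000, 120_000, 180_000, 250_000, 299_999], name="price")
bin_width = 100_000

# Label each price by the lower edge of its bin (integer floor division).
price_bin = (prices // bin_width) * bin_width

# Count rows per bin: the same quantity the histogram bars show.
counts = price_bin.value_counts().sort_index()
print(counts)
```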
    

### 4. A histogram of price in $100K-bins - by Waterfront

#### Python Answer

In [None]:
sns.histplot(df1,x='price',hue='waterfront',binwidth=100_000,stat='density')

#### Tableau Answer

- Answer:
    - Duplicate plot #3
    - Drag Waterfront dimension to Color.

### 5. A map of median prices by zipcode (with a Green color scale broken into 5 shades of green)- see note about maps below.

    
> - **Note: for our maps, we want:**
    - A dark background,
    - Add County names/borders
    - Add major cities
    - Add terrain
    - Add major roadways. 

    

#### Python Answer

In [None]:
import plotly.express as px
df1.rename({'long':'lon'},axis=1,inplace=True)
center = dict(df1[['lat','lon']].mean())
center

In [None]:
px.scatter_mapbox(df1,lat='lat',lon='lon',color='price',
              center=center,mapbox_style='carto-darkmatter')

#### Tableau Answer

- Answer:
    - Drag Zipcode onto main pane of plot.
        - Drag Price onto Color
            - Change Price to Median
        - Click on Dropdown Arrow next to Title of Color Scale:
            - Edit Colors
            - Select Green
            - Select Stepped Color
        - Visual Flair: 
            - Right Click on Map > Map Layers
            - Select Dark
            - Add County borders, county labels, terrain, cities.
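
The aggregation behind this map (median price per zipcode, then a 5-step color scale) can be sketched in pandas. The `toy` values are invented, and equal-width steps via `pd.cut` are an assumption about how the stepped palette divides the range.

```python
import pandas as pd

# Hypothetical listings for illustration.
toy = pd.DataFrame({
    "zipcode": [98001, 98001, 98002, 98002, 98003],
    "price": [200_000, 300_000, 400_000, 600_000, 1_000_000],
})

# Median price per zipcode: the value mapped to Color.
median_by_zip = toy.groupby("zipcode")["price"].median()

# Split the range of medians into 5 equal-width steps (0 = lightest shade).
steps = pd.cut(median_by_zip, bins=5, labels=False)
print(pd.DataFrame({"median_price": median_by_zip, "step": steps}))
```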

### 6. A map of all homes with color-coded price with the smallest markers possible. 

#### Python Answer

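- The `px.scatter_mapbox` call shown under plot #5 already draws every home colored by price, which is this plot. 
- The zipcode-level map from plot #5 is the one that is not easily implementable without downloading a US zipcode GeoJSON file. 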

#### Tableau Answer

- Answer:
    - Columns: Long
    - Rows: Lat
    - Color: Price
    - Click on Size > Drag slider to the left.

### **Save the workbook to Tableau Public and make sure it shows all individual sheets.**

## Load in `baltimore_crime_ts.csv`

- After saving the first workbook, close it and load in the other text file. 

### Plots to Make: Baltimore Crime Data

### 1. Create a time series plot of: shootings, Robbery-Street, Robbery-Carjacking, Homicide
    
    
    

#### Python Answer

In [None]:
df2.set_index('datetime',inplace=True)

In [None]:
df2.columns

In [None]:
plot_df = df2[['SHOOTING','ROBBERY - STREET',
               'ROBBERY - CARJACKING','HOMICIDE']].resample('W').sum()
plot_df.plot(figsize=(12,4),alpha=.4)

#### Tableau Answer

- Answer:
    - Double Click one of the crimes, e.g. Shooting
    - Double click Date.
    - Click on Date in the Columns box and click on Month from the second section of options. 
    - To add more crimes:
        - Click and drag their name into the upper left area of the y-axis until the || icon appears. 
    - Reference: https://community.tableau.com/s/question/0D54T00000C5hf4SAB/multiple-series-on-line-graph
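
Choosing Month on the date field groups the rows by calendar month before summing each crime column. A rough pandas sketch of that aggregation, using invented counts and only two of the crime columns:

```python
import pandas as pd

# Hypothetical daily counts for illustration.
idx = pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03"])
toy = pd.DataFrame({"SHOOTING": [1, 2, 3], "HOMICIDE": [0, 1, 1]}, index=idx)

# Group rows by year-month and sum each crime column.
monthly = toy.groupby(toy.index.to_period("M")).sum()
print(monthly)
```

Calling `monthly.plot()` would then draw one line per crime on a shared axis, like the combined Tableau chart.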
    
    

### 2. Create a time series **area** plot of: shootings, Robbery-Street, Robbery-Carjacking, Homicide
    

#### Python Answer

In [None]:
plot_df.plot.area(figsize=(12,4),alpha=.2)

#### Tableau Answer

- Answer:
    - Duplicate the prior sheet
    - To make an area chart: click the drop-down in the Marks card and select Area. 

### Save the workbook to Tableau Public