# Pandas (E) Data Analysis Workshop
(ft. Kyle Sorensen)

Now more than ever, data has a unique ability to describe our physical, social, and technological world. Thanks to relatively recent (and accelerating) advances in computing, anyone with the motivation can use data to effect change in whatever realm they choose, from fields closely tied to computer science, such as machine learning, to fields with less obvious connections, such as biology and economics!

You have likely heard a lot of talk recently about *data science*, but what exactly is data science? A quick search gives this definition: "an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge from data across a broad range of application domains". That is a mouthful, but essentially, data science pulls from the best of computer science, statistics, mathematics, and business to <b>make data understandable and actionable</b>. This is the key: what is data worth if we cannot interpret it and make decisions based on it?

## `pandas` and Data Science in Python
The industry standard for this type of work is a library called `pandas`, along with a dependency that will come in handy called `numpy` and a couple of lovely data viz tools called `matplotlib` and `seaborn`. There are other Python libraries for data science that we may explore later, but these are all you will need for now! :) We will load additional libraries for modeling and interpolation when we reach those tasks.

Below, you will see the usual convention for importing `pandas`, `numpy`, `matplotlib` and `seaborn`:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## `pandas` Basics
Before we get our hands on some real data, let's go over (some of) the basics of `pandas` including...

* The data structures of `pandas`
* Viewing data and metadata
* Importing and exporting data
* Indexing and selecting data
* Merge, join, and concatenate
* Group by and summarization
* Reshaping data

Fortunately for us, the `pandas` documentation is exceptional. Most tasks that you would reasonably want to achieve are documented somewhere in the `pandas` [user guide](https://pandas.pydata.org/docs/user_guide/index.html).

### The Data Structures of `pandas`

In [2]:
# Code
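
As a minimal sketch of the two core data structures, here is a `Series` and a `DataFrame` built from made-up values. The column names and numbers are purely illustrative and are not taken from the workshop dataset.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([250_000, 410_000, 320_000], index=["a", "b", "c"], name="price")

# A DataFrame is a two-dimensional table; each column is a Series,
# and all columns share the same index.
homes = pd.DataFrame(
    {
        "price": [250_000, 410_000, 320_000],
        "bedrooms": [2, 4, 3],
        "waterfront": [0, 1, 0],
    },
    index=["a", "b", "c"],
)

# Selecting a single column returns a Series with the same index labels.
bedrooms = homes["bedrooms"]

print(type(bedrooms).__name__)
print(homes.shape)
print(prices["b"])
```

A useful mental model: a `DataFrame` behaves like a dictionary of `Series` objects that are aligned on one index.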

### Viewing Data and Metadata

In [3]:
# Code
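
A short sketch of the most common ways to inspect a `DataFrame`, again on a small made-up table (the values are illustrative only):

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "price": [250_000, 410_000, 320_000, 515_000],
        "bedrooms": [2, 4, 3, 5],
        "bathrooms": [1.0, 2.5, 2.0, 3.0],
    }
)

print(homes.head(2))   # first two rows
print(homes.tail(1))   # last row
print(homes.shape)     # (number of rows, number of columns)
print(homes.dtypes)    # data type of each column
homes.info()           # index, columns, non-null counts, memory usage

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max.
summary = homes.describe()
print(summary)
```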

### Importing and Exporting

In [4]:
# Code
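
Here is one way reading and writing CSV data might look. To keep the sketch self-contained it uses in-memory text buffers instead of files on disk; with real files you would pass a path to `read_csv` and `to_csv` instead. The sample rows are made up.

```python
import io

import pandas as pd

csv_text = """id,price,bedrooms
1,250000,2
2,410000,4
3,320000,3
"""

# read_csv accepts a file path, a URL, or any file-like object.
homes = pd.read_csv(io.StringIO(csv_text))

# to_csv writes to a path or a buffer; index=False leaves out the row labels.
buffer = io.StringIO()
homes.to_csv(buffer, index=False)

# Reading the exported text back gives an identical table.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(homes))
```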

### Indexing, Selecting and Filtering by Row and Column

In [5]:
# Code
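
The sketch below shows label-based selection with `.loc`, position-based selection with `.iloc`, and row filtering with a boolean mask. The table and its index labels are made up for illustration.

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "price": [250_000, 410_000, 320_000, 515_000],
        "bedrooms": [2, 4, 3, 5],
        "waterfront": [0, 1, 0, 1],
    },
    index=[101, 102, 103, 104],
)

by_label = homes.loc[102, "price"]           # row labeled 102, column "price"
by_position = homes.iloc[0, 1]               # first row, second column
two_columns = homes[["price", "bedrooms"]]   # a list of names returns a DataFrame

# Boolean masks filter rows. Combine conditions with & (and) or | (or),
# wrapping each condition in parentheses.
big_waterfront = homes[(homes["waterfront"] == 1) & (homes["bedrooms"] >= 5)]
print(big_waterfront)
```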

### Merge, Join, and Concatenate

In [6]:
# Code
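
A minimal sketch of the three ways to combine tables. The two small frames (`homes` and `zips`) and their contents are hypothetical.

```python
import pandas as pd

homes = pd.DataFrame({"id": [1, 2, 3], "price": [250_000, 410_000, 320_000]})
zips = pd.DataFrame({"id": [1, 2, 4], "zipcode": ["98101", "98052", "98004"]})

# merge matches rows on a key column, much like a SQL join.
inner = homes.merge(zips, on="id", how="inner")  # only ids present in both
left = homes.merge(zips, on="id", how="left")    # every home, NaN where no match

# join aligns on the index rather than on a column.
joined = homes.set_index("id").join(zips.set_index("id"), how="left")

# concat stacks frames on top of each other; ignore_index=True renumbers rows.
more_homes = pd.DataFrame({"id": [5], "price": [600_000]})
stacked = pd.concat([homes, more_homes], ignore_index=True)

print(inner)
print(left)
print(stacked)
```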

### Groupby and Data Summarization

In [7]:
# Code
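
Group-by follows a split, apply, combine pattern: split the rows into groups, compute something per group, and combine the results into a new table. A small sketch on made-up data:

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "bedrooms": [2, 2, 3, 3, 3],
        "waterfront": [0, 1, 0, 0, 1],
        "price": [200_000, 300_000, 310_000, 330_000, 500_000],
    }
)

# One statistic for one column, per group.
mean_price = homes.groupby("bedrooms")["price"].mean()

# Several named statistics at once: new_name=(column, function).
summary = homes.groupby("bedrooms").agg(
    mean_price=("price", "mean"),
    max_price=("price", "max"),
    n_homes=("price", "count"),
)

print(mean_price)
print(summary)
```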

### Pivot Tables and Data Reshaping

In [8]:
# Code
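
A sketch of reshaping in both directions: `pivot_table` turns long data into a wide grid, and `melt` turns the grid back into long form. The data is made up.

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "bedrooms": [2, 2, 3, 3],
        "waterfront": [0, 1, 0, 1],
        "price": [200_000, 300_000, 320_000, 500_000],
    }
)

# Long to wide: one row per bedroom count, one column per waterfront value.
wide = homes.pivot_table(
    index="bedrooms", columns="waterfront", values="price", aggfunc="mean"
)

# Wide back to long: one row per (bedrooms, waterfront) pair.
long_form = wide.reset_index().melt(
    id_vars="bedrooms", var_name="waterfront", value_name="price"
)

print(wide)
print(long_form)
```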

Now that we have covered (some of) the basics of `pandas`, we are ready to start working with a real data set. Let's begin!

## Our Dataset (source: [rashida048](https://github.com/rashida048/Datasets/blob/master/home_data.csv))
The dataset we will be using for this workshop contains pricing data on 21,613 homes, with variables such as `yr_renovated`, the year of the most recent renovation (0 if unavailable), and `waterfront`, an indicator variable for whether the property is located close to a body of water. The reason for using `pandas` here over something more user-friendly like MS Excel is that our data is 3.5+ MB, making it quite cumbersome to work with in a spreadsheet.

To get started, we will load our data and use the `head(n)` and `info()` methods to display the first 10 rows of data along with a summary of the columns, including data types.

In [9]:
# Code

In [10]:
# Code
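
A hedged sketch of the loading step. To stay runnable without a download, it reads a few made-up rows from an in-memory buffer; with the real data you would pass the path of your local copy of `home_data.csv` to `read_csv` (that path is an assumption). Only `yr_renovated` and `waterfront` are column names confirmed by the description above; the others are assumed for illustration.

```python
import io

import pandas as pd

# Made-up sample rows standing in for home_data.csv.
# Column names other than yr_renovated and waterfront are assumptions.
sample_csv = """id,date,price,bedrooms,bathrooms,waterfront,yr_renovated
1,20141013T000000,221900,3,1.0,0,0
2,20141209T000000,538000,3,2.25,0,1991
3,20150225T000000,180000,2,1.0,0,0
4,20150218T000000,510000,3,2.0,1,0
"""

homes = pd.read_csv(io.StringIO(sample_csv))
# With the real file (hypothetical local path):
# homes = pd.read_csv("home_data.csv")

first_rows = homes.head(10)  # up to the first 10 rows
print(first_rows)
homes.info()                 # column names, non-null counts, dtypes
```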

Notably, there are a few problems with the data that would affect our future analysis as written. Let's get those fixed before we proceed with the tasks!

In [11]:
# Code
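
The specific problems are not spelled out here, so the following is only a guess at two likely candidates: a date column stored as text (the column name `date` and its format are assumptions) and `yr_renovated` using 0 to mean "no renovation recorded", which would distort any average of that column. The rows are made up.

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "date": ["20141013T000000", "20150225T000000"],  # assumed text format
        "price": [221900.0, 180000.0],
        "yr_renovated": [0, 1991],
    }
)

# Parse the text dates into proper datetime values.
homes["date"] = pd.to_datetime(homes["date"], format="%Y%m%dT%H%M%S")

# Treat the placeholder 0 as missing so it does not skew summaries.
homes["yr_renovated"] = homes["yr_renovated"].mask(homes["yr_renovated"] == 0)

print(homes.dtypes)
print(homes)
```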

## "Advanced" Tasks for This Workshop
Using the skills we learned above along with some additional functionality from other libraries, we will complete the following tasks...
* Generate a pivot table displaying average home price w.r.t. the number of bathrooms and number of bedrooms and create a 3D visualization of the summarized data
* Construct a time series model for housing prices with a 6-month forecast
* Construct a heatmap of housing prices using latitude and longitude coordinates

### Task 1: Pivot Table and 3D Plot of Average Home Prices

In [12]:
# Code
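
A minimal sketch of the pivot step on made-up rows, assuming the columns are named `bedrooms`, `bathrooms`, and `price`. It also builds the coordinate grids a 3D surface plot needs; the plotting call itself is left out to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

homes = pd.DataFrame(
    {
        "bedrooms": [2, 2, 3, 3, 3, 4],
        "bathrooms": [1.0, 2.0, 1.0, 2.0, 2.0, 2.0],
        "price": [200_000, 260_000, 300_000, 340_000, 360_000, 450_000],
    }
)

# Average price for each (bedrooms, bathrooms) combination.
# Combinations with no homes show up as NaN.
avg_price = homes.pivot_table(
    index="bedrooms", columns="bathrooms", values="price", aggfunc="mean"
)

# Coordinate grids with the same shape as the table of averages.
X, Y = np.meshgrid(avg_price.columns.to_numpy(), avg_price.index.to_numpy())
Z = avg_price.to_numpy()

print(avg_price)
```

From here, `X`, `Y`, and `Z` can be handed to a 3D plotting routine such as matplotlib's `plot_surface` on an axes created with `projection="3d"`.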

### Task 2: Holt-Winters' Forecast for Average Home Prices

In [13]:
# Code
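
The sketch below is a simplified stand-in, not the full method: it aggregates made-up sales into a monthly average price series with `resample`, then applies Holt's linear trend method, which is Holt-Winters without the seasonal component, written by hand in `numpy`. The helper `holt_forecast`, the smoothing parameters, and the `date` and `price` column names are all assumptions for illustration. For the complete seasonal model, a library implementation such as `ExponentialSmoothing` in `statsmodels.tsa.holtwinters` would be the more usual choice.

```python
import numpy as np
import pandas as pd


def holt_forecast(y, alpha=0.5, beta=0.5, horizon=6):
    """Holt's linear trend method (hypothetical helper, no seasonality)."""
    y = np.asarray(y, dtype=float)
    level, trend = y[0], y[1] - y[0]  # simple initial values
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast h steps ahead: last level plus h times the last trend.
    return level + trend * np.arange(1, horizon + 1)


# Made-up sales: two per month for 12 months, rising 5,000 per month on average.
months = pd.date_range("2014-05-01", periods=12, freq="MS")
sales = pd.DataFrame(
    {
        "date": months.repeat(2) + pd.to_timedelta(np.tile([3, 17], 12), unit="D"),
        "price": np.repeat(400_000 + 5_000 * np.arange(12), 2)
        + np.tile([-10_000, 10_000], 12),
    }
)

# Average sale price per calendar month ("MS" labels each month by its start).
monthly = sales.set_index("date")["price"].resample("MS").mean()

# Six-month forecast, labeled with the six months after the last observation.
future = pd.date_range(monthly.index[-1], periods=7, freq="MS")[1:]
forecast = pd.Series(holt_forecast(monthly.to_numpy(), horizon=6), index=future)
print(forecast)
```

Because the made-up series is perfectly linear, the forecast simply continues the line; real prices will be noisier, and the smoothing parameters would need tuning.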

### Task 3: Interpolated Heatmap of Average Home Prices

In [14]:
# Code
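
As a rough first step toward the heatmap, the sketch below bins made-up coordinates into a coarse grid and averages the price in each cell. This is plain binning, not interpolation, and the column names `lat`, `long`, and `price` are assumptions. Empty cells stay as NaN; filling them is where an interpolation routine (for example `griddata` from `scipy.interpolate`) would come in.

```python
import pandas as pd

homes = pd.DataFrame(
    {
        "lat": [47.51, 47.53, 47.58, 47.61, 47.62],
        "long": [-122.26, -122.34, -122.31, -122.22, -122.18],
        "price": [300_000, 350_000, 420_000, 500_000, 700_000],
    }
)

# Snap each home to a grid cell by rounding to one decimal place.
homes["lat_bin"] = homes["lat"].round(1)
homes["long_bin"] = homes["long"].round(1)

# Rows are latitude cells, columns are longitude cells, values are mean prices.
grid = homes.pivot_table(
    index="lat_bin", columns="long_bin", values="price", aggfunc="mean"
)
values = grid.to_numpy()
print(grid)
```

The resulting `grid` can be passed to `sns.heatmap`; sorting the index in descending order first puts north at the top of the plot.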