# Chapter 03: Data Wrangling with ``pandas``

In this chapter, I will:

* Discuss the process of data manipulation
* Learn how to explore an API to gather information
* Clean and reshape data with ``pandas``

This notebook will use daily temperature data from the [National Centers for Environmental Information (NCEI) API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2). I will use the Global Historical Climatology Network - Daily (GHCND) dataset for the Boonton 1 station (GHCND:USC00280907).

<div class='alert alert-info' role='alert'>
    <span>See the documentation <a href='https://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf'>here</a>.</span>
    <br>
    <span><b>Note</b>: The NCEI is part of the National Oceanic and Atmospheric Administration (NOAA) and, as you can see from the URL for the API, this resource was created when the NCEI was called the NCDC. Should the URL for this resource change in the future, you can search for "NCEI weather API" to find the updated one.</span>
</div>

## Setup

Here, I will import the main modules I'll need (``matplotlib.pyplot`` and ``pandas``) and create two ``DataFrame`` instances:

* A wide data format instance
* A long data format instance

In [4]:
# Necessary imports
from matplotlib import pyplot as plt
import pandas as pd

# File paths
wide_data_file_path = 'data/wide_data.csv'
"""
str: The relative path to the wide data format CSV.
"""
long_data_file_path = 'data/long_data.csv'
"""
str: The relative path to the long data format CSV.
"""

# Variables for building the DataFrame instances
use_columns_long = ['date', 'datatype', 'value']
"""
list of str: The column headings for the columns I want to use for the long data format DataFrame instance.
"""
parse_date_arg = ['date']
"""
list of str: The names of the columns that pandas should parse as dates when reading the CSV files.
"""

# Creating the DataFrame instances: wide & long
wide_df = pd.read_csv(wide_data_file_path, parse_dates = parse_date_arg)
"""
pandas.core.frame.DataFrame: The wide format DataFrame instance.
"""
# Confirm that read_csv returned a DataFrame instance
print(type(wide_df))
long_df = pd.read_csv(long_data_file_path, parse_dates = parse_date_arg, usecols = use_columns_long)
"""
pandas.core.frame.DataFrame: The long format DataFrame instance.
"""

# Reorder the columns of the long DataFrame instance to match use_columns_long
long_df = long_df[use_columns_long]

<class 'pandas.core.frame.DataFrame'>
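
To make the effect of ``usecols`` and ``parse_dates`` easier to see without the CSV files on hand, here is a minimal sketch that reads a small in-memory CSV. The rows, values, and the extra ``station`` column are hypothetical and stand in for the real file; only the ``date``, ``datatype``, and ``value`` column names come from the setup cell above.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV contents (assumption): a stand-in for data/long_data.csv
csv_text = StringIO(
    "station,date,datatype,value\n"
    "GHCND:USC00280907,2018-10-01,TMAX,21.1\n"
    "GHCND:USC00280907,2018-10-01,TMIN,8.9\n"
)

# usecols drops every column not listed; parse_dates converts 'date' to datetime64
example_df = pd.read_csv(
    csv_text,
    usecols=['date', 'datatype', 'value'],
    parse_dates=['date'],
)

# Inspect the resulting column types
print(example_df.dtypes)
```

Note that ``usecols`` only selects columns; it does not control their order, which is why the setup cell reindexes ``long_df`` with ``use_columns_long`` afterwards.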


## Background: Wide Data vs. Long Data

<figure>
    <img src='wide_data_vs_long_data.png' alt='Wide Data Vs. Long Data' width='100%' height='max'>
    <figcaption style='text-align:center;'>Wide Data vs. Long Data</figcaption>
</figure>
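
As a rough illustration of the difference, the sketch below builds a tiny wide-format ``DataFrame`` and reshapes it into long format with ``melt()``. The temperature values are made up for the example, and I am assuming the wide file has one column per measurement (such as ``TMAX`` and ``TMIN``); the long column names mirror ``use_columns_long`` from the setup.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the chapter's CSV files
wide_example = pd.DataFrame({
    'date': pd.to_datetime(['2018-10-01', '2018-10-02']),
    'TMAX': [21.1, 23.9],
    'TMIN': [8.9, 13.9],
})

# Wide -> long: each measurement column becomes rows of (date, datatype, value)
long_example = (
    wide_example
    .melt(id_vars='date', var_name='datatype', value_name='value')
    .sort_values(['date', 'datatype'])
    .reset_index(drop=True)
)

print(wide_example)
print(long_example)
```

In the wide version, each date occupies one row and each measurement has its own column. In the long version, each row holds a single observation, so the same two dates now span four rows.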