# Representation Of Data
So far, we've mostly been manipulating and working with data represented as tables. Microsoft Excel, the pandas library in Python, and the CSV file format for datasets were all developed around this representation. Because a table neatly organizes values into rows and columns, we can easily look up a specific value at the intersection of a row and a column. Unfortunately, it's very difficult to explore a dataset to uncover patterns when it's represented as a table, especially when that dataset contains many values. We need a different representation of data that can help us identify patterns more easily.

In this mission, we'll learn the basics of data visualization, a discipline that focuses on the visual representation of data. As humans, our brains have evolved powerful visual processing capabilities. We can quickly find patterns in the visual information we encounter, which was incredibly important for survival. Unfortunately, when data is represented as tables of values, we can't really take advantage of our visual pattern matching capabilities, because we process symbolic values (like numbers and words) far more slowly than visual information. Data visualization focuses on transforming data from table representations to visual ones.

In this course, named Exploratory Data Visualization, we'll focus on data visualization techniques to explore datasets and help us uncover patterns. In this mission, we'll use a specific type of data visualization to understand U.S. unemployment data.

# Introduction To The Data
The United States Bureau of Labor Statistics (BLS) surveys and calculates the monthly unemployment rate. The unemployment rate is the percentage of individuals in the labor force without a job. While the unemployment rate isn't a perfect measure, it's a commonly used proxy for the health of the economy. You may have heard politicians and reporters cite the unemployment rate when commenting on the economy.
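As a quick illustration of the definition, the rate is the number of unemployed people divided by the size of the labor force, multiplied by 100. The sketch below uses made-up round counts chosen only to make the arithmetic easy to follow (they are not BLS figures), and `unemployment_rate` is a hypothetical helper name.

```python
def unemployment_rate(unemployed, labor_force):
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100 * unemployed / labor_force

# Made-up counts for illustration only (not BLS data).
rate = unemployment_rate(unemployed=40, labor_force=1000)
print(rate)  # 4.0
```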

The BLS releases monthly unemployment data as a downloadable Excel file with the .xlsx file extension. While the pandas library can read in XLSX files, it relies on an external library for actually parsing the format. Let's instead download the same dataset as a CSV file from the website of the Federal Reserve Bank of St. Louis. We've downloaded the monthly unemployment rate from January 1948 to August 2016 as a CSV, saved it as unrate.csv, and made it available in this mission.

To download this dataset on your own, head to the Federal Reserve Bank of St. Louis's website, select Text, Comma Separated as the File Format, and make sure the Date Range field starts at 1948-01-01 and ends at 2016-08-01.

Before we get into visual representations of data, let's first read this CSV file into pandas to explore the table representation of this data. The dataset we'll be working with is a time series dataset, which means the data points (monthly unemployment rates) are ordered by time. 

When we read the dataset into a DataFrame, pandas treats the DATE column as text. Because of how pandas stores strings internally, this column is given a data type of object. In the code below, we convert the column with pandas.to_datetime() so that its entries are handled as dates rather than strings.

In [1]:
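import pandas as pd
# Read the CSV; DATE is loaded as text at this point.
unrate = pd.read_csv('unrate.csv')
# Convert the DATE strings to datetime values.
unrate['DATE'] = pd.to_datetime(unrate['DATE'])
# Show the first 12 rows (all of 1948).
print(unrate.head(12))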

         DATE  VALUE
0  1948-01-01    3.4
1  1948-02-01    3.8
2  1948-03-01    4.0
3  1948-04-01    3.9
4  1948-05-01    3.5
5  1948-06-01    3.6
6  1948-07-01    3.6
7  1948-08-01    3.9
8  1948-09-01    3.8
9  1948-10-01    3.7
10 1948-11-01    3.8
11 1948-12-01    4.0
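To see the effect of the conversion, here is a minimal sketch that inlines the first three rows of the output above (so it runs without unrate.csv) and compares the column's data type before and after `pd.to_datetime()`. The names `csv_text` and `sample` are placeholders for this illustration. The type reported before conversion is object in the pandas behavior described here; newer pandas versions may report a dedicated string type instead.

```python
import io
import pandas as pd

# First three rows of the table above, inlined so the sketch is self-contained.
csv_text = "DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n1948-03-01,4.0\n"

sample = pd.read_csv(io.StringIO(csv_text))
dtype_before = sample['DATE'].dtype  # a text type, as read from the CSV

sample['DATE'] = pd.to_datetime(sample['DATE'])
dtype_after = sample['DATE'].dtype   # a datetime64 type after conversion

print(dtype_before, dtype_after)
# Once the column holds datetimes, date parts are available through .dt
print(sample['DATE'].dt.month.tolist())  # [1, 2, 3]
```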


# Table Representation
The dataset contains 2 columns:

- DATE: date, always the first of the month. Here are some examples:
    - 1948-01-01: January 1, 1948.
    - 1948-02-01: February 1, 1948.
    - 1948-03-01: March 1, 1948.
    - 1948-12-01: December 1, 1948.
- VALUE: the corresponding unemployment rate, in percent.

The first 12 rows reflect the unemployment rate from January 1948 to December 1948:

| DATE | VALUE |
| --- | --- |
| 1948-01-01 | 3.4 |
| 1948-02-01 | 3.8 |
| 1948-03-01 | 4.0 |
| 1948-04-01 | 3.9 |
| 1948-05-01 | 3.5 |
| 1948-06-01 | 3.6 |
| 1948-07-01 | 3.6 |
| 1948-08-01 | 3.9 |
| 1948-09-01 | 3.8 |
| 1948-10-01 | 3.7 |
| 1948-11-01 | 3.8 |
| 1948-12-01 | 4.0 |

Take a minute to visually scan the table and observe how the monthly unemployment rate has changed over time. When you're finished, head to the next step in this mission.

# Observations From The Table Representation

We can make the following observations from the table:

In 1948:

- Monthly unemployment rate ranged between 3.4 and 4.0.
- Highest unemployment rate was reached in both March and December.
- Lowest unemployment rate was reached in January.
- From January to March, unemployment rate trended up.
- From March to May, unemployment rate trended down.
- From May to August, unemployment rate trended up.
- From August to October, unemployment rate trended down.
- From October to December, unemployment rate trended up.
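These observations can also be checked programmatically. The sketch below is one possible way to do it, assuming the twelve 1948 values are typed in directly from the table; `rates_1948` and the other variable names are placeholders for this illustration.

```python
import pandas as pd

# The twelve 1948 values from the table, indexed by the first day of each month.
rates_1948 = pd.Series(
    [3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0],
    index=pd.date_range('1948-01-01', periods=12, freq='MS'),
    name='VALUE',
)

low, high = rates_1948.min(), rates_1948.max()
low_months = rates_1948[rates_1948 == low].index.month_name().tolist()
high_months = rates_1948[rates_1948 == high].index.month_name().tolist()

# Month-over-month change: positive means the rate rose, negative means it fell.
change = rates_1948.diff().round(1)

print(low, low_months)    # 3.4 ['January']
print(high, high_months)  # 4.0 ['March', 'December']
print(change.dropna().tolist())
```

The sign of each entry in `change` corresponds to the up and down stretches listed above.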

Because the table only contained the data from 1948, it didn't take too much time to make these observations. If we scaled the table up to include all 824 rows, it would be very time-consuming and painful to understand. Tables shine at presenting information precisely at the intersection of rows and columns and allow us to perform quick lookups when we know the row and column we're interested in. In addition, problems that involve comparing values between adjacent rows or columns are well suited for tables. Unfortunately, many problems you'll encounter in data science require comparisons that aren't practical with tables alone.
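The figure of 824 rows follows from the date range of the dataset: one row per month from January 1948 through August 2016. A small sketch of that count, assuming one observation on the first day of every month with none missing:

```python
import pandas as pd

# One entry per month start, from January 1948 through August 2016 inclusive.
months = pd.date_range(start='1948-01-01', end='2016-08-01', freq='MS')
print(len(months))  # 824
```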

For example, one thing we learned from looking at the monthly unemployment rates for 1948 is that every few months, the unemployment rate switches between trending up and trending down. It's not switching direction every month, however, and this could mean that there's a seasonal effect. Seasonality is when a pattern is observed on a regular, predictable basis for a specific reason. A simple example of seasonality would be a large increase in textbook purchases every August. Many schools start their terms in August, and this spike in textbook sales is directly linked.
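Checking for a seasonal effect means lining up the same months across different years. One possible way to arrange the data for that, sketched here with pandas, is to reshape it so each row is a year and each column is a month. The helper name `by_year_and_month` is hypothetical, and only the 1948 rows shown earlier are inlined, so the result has a single row; with the full dataset, every additional year would add another row.

```python
import pandas as pd

# Only the 1948 rows from the table are inlined here.
unrate_sample = pd.DataFrame({
    'DATE': pd.date_range('1948-01-01', periods=12, freq='MS'),
    'VALUE': [3.4, 3.8, 4.0, 3.9, 3.5, 3.6, 3.6, 3.9, 3.8, 3.7, 3.8, 4.0],
})

def by_year_and_month(df):
    """Reshape a DATE/VALUE frame so rows are years and columns are months 1-12."""
    keyed = df.assign(year=df['DATE'].dt.year, month=df['DATE'].dt.month)
    return keyed.pivot_table(index='year', columns='month', values='VALUE')

wide = by_year_and_month(unrate_sample)
print(wide.shape)  # (1, 12)
print(wide)
```

Reading down a single column of the reshaped frame would then compare, say, every March against every other March.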

We first need to understand whether there's any seasonality by comparing the unemployment trends across many years, so we can decide if we should investigate it further. The faster we're able to assess our data, the sooner we can move on to high-level analysis. If we rely on just the table to help us figure this out, we won't be able to perform a high-level test quickly. Let's see how a visual representation of the same information can be more helpful than the table representation.

In [6]:
import matplotlib.pyplot as plt
# Calling plot() with no data draws an empty set of axes.
plt.plot()
plt.show()

In [7]:
# Plot the first 12 rows: dates on the x-axis, unemployment rate on the y-axis.
plt.plot(unrate['DATE'][0:12], unrate['VALUE'][0:12])
plt.show()

In [8]:
plt.plot(unrate['DATE'][0:12], unrate['VALUE'][0:12])
# Rotate the x-axis tick labels by 90 degrees so the dates don't overlap.
plt.xticks(rotation=90)
plt.show()

In [None]:
plt.plot(unrate['DATE'][0:12], unrate['VALUE'][0:12])
plt.xticks(rotation=90)
# Label both axes and give the plot a title.
plt.xlabel('Month')
plt.ylabel('Unemployment Rate')
plt.title('Monthly Unemployment Trends, 1948')
plt.show()