# Visualizing Time Series Data in Python

Time series data is everywhere in data science. Whether you are analyzing business trends, forecasting company revenue, or exploring customer behavior, you are likely to encounter time series data at some point in your work. To get you started, this notebook provides practical knowledge on visualizing time series data using Python.

## Table of Contents

- [Introduction](#intro)

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

path = "data/dc28/"

---
<a id='intro'></a>

## Introduction

## Load your time series data

The most common way to import time series data in Python is with the pandas library. You can use the `read_csv()` function from pandas to read the contents of a file into a DataFrame: `df = pd.read_csv("name_of_your_file.csv")`.

Once your data is loaded into Python, you can display the first rows of your DataFrame by calling the `.head(n=5)` method, where `n=5` (the default) indicates that you want the first five rows of your DataFrame.

In this exercise, you will read in a time series dataset that contains the number of "great" inventions and scientific discoveries from 1860 to 1959, and display its first five rows.

In [2]:
# Read the file content into a DataFrame called discoveries
discoveries = pd.read_csv(path+'ch1_discoveries.csv')

# Display the first five rows of the DataFrame
print(discoveries.head())

         date  Y
0  01-01-1860  5
1  01-01-1861  3
2  01-01-1862  0
3  01-01-1863  2
4  01-01-1864  0
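
If you do not have the CSV file at hand, the same `read_csv()` and `.head()` calls can be tried on an in-memory string. The following is a minimal sketch: the `csv_text` and `sample` names are hypothetical, and the rows are simply the five shown in the output above.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file, using the five rows shown above
csv_text = """date,Y
01-01-1860,5
01-01-1861,3
01-01-1862,0
01-01-1863,2
01-01-1864,0
"""

# read_csv() accepts a file path or any file-like object, such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

# head() returns the first n rows; n defaults to 5
print(sample.head(n=3))

# shape gives the (rows, columns) dimensions of the DataFrame
print(sample.shape)
```

Wrapping the text in `io.StringIO` keeps the example self-contained while exercising the same loading code you would use on a real file.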


## Test whether your data is of the correct type

When working with time series data in pandas, any date information should be stored as a `datetime64` type. It is therefore important to check that the columns containing date information are of the correct type. You can check the type of each column in a DataFrame with the `.dtypes` attribute. If your date columns arrive as strings, epochs, or another format, you can use the `to_datetime()` function to convert them to the appropriate `datetime64` type: `df['date_column'] = pd.to_datetime(df['date_column'])`.

In this exercise, you will learn how to check the data type of the columns in your time series data and convert a date column to the appropriate datetime type.

In [3]:
# Print the data type of each column in discoveries
print(discoveries.dtypes)

# Convert the date column to a datetime type
discoveries['date'] = pd.to_datetime(discoveries['date'])

# Print the data type of each column in discoveries, again
print(discoveries.dtypes)

date    object
Y        int64
dtype: object
date    datetime64[ns]
Y                int64
dtype: object
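
Strings such as `01-01-1860` do not reveal whether the day or the month comes first, so it can help to state the format explicitly. The sketch below assumes a day-month-year layout (an assumption, since the dataset does not say) and uses hypothetical variable names. It also shows converting epoch seconds with the `unit` argument.

```python
import pandas as pd

# Hypothetical sample: two valid date strings and one malformed entry
dates = pd.Series(["01-01-1860", "01-01-1861", "not a date"])

# Assumption: strings are day-month-year; errors="coerce" turns bad values into NaT
parsed = pd.to_datetime(dates, format="%d-%m-%Y", errors="coerce")
print(parsed)

# Epoch values (seconds since 1970-01-01) are converted with the unit argument
epochs = pd.Series([0, 86400])
from_epochs = pd.to_datetime(epochs, unit="s")
print(from_epochs)
```

Passing `format` removes the guesswork from parsing, and `errors="coerce"` lets you find malformed entries afterwards with `parsed.isna()` instead of stopping at the first bad value.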


---

<img src="images/ts2_001.png" alt="" style="width: 400px;"/>