This is a data analysis project from freeCodeCamp (https://www.freecodecamp.org/). The goal of this project is to demonstrate good foundational knowledge of data analysis with Python.

### Assignment

For this project you will visualize time series data using a line chart, bar chart, and box plots. You will use Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The data visualizations will help you understand the patterns in visits and identify yearly and monthly growth.

Use the data to complete the following tasks:
* Use Pandas to import the data from "fcc-forum-pageviews.csv". Set the index to the "date" column.
* Clean the data by filtering out days when the page views were in the top 2.5% of the dataset or bottom 2.5% of the dataset.
* Create a `draw_line_plot` function that uses Matplotlib to draw a line chart similar to "examples/Figure_1.png". The title should be "Daily freeCodeCamp Forum Page Views 5/2016-12/2019". The label on the x axis should be "Date" and the label on the y axis should be "Page Views".
* Create a `draw_bar_plot` function that draws a bar chart similar to "examples/Figure_2.png". It should show average daily page views for each month grouped by year. The legend should show month labels and have a title of "Months". On the chart, the label on the x axis should be "Years" and the label on the y axis should be "Average Page Views".
* Create a `draw_box_plot` function that uses Seaborn to draw two adjacent box plots similar to "examples/Figure_3.png". These box plots should show how the values are distributed within a given year or month and how they compare over time. The title of the first chart should be "Year-wise Box Plot (Trend)" and the title of the second chart should be "Month-wise Box Plot (Seasonality)". Make sure the month labels at the bottom start at "Jan" and the x and y axes are labeled correctly.

For each chart, make sure to use a copy of the data frame. Unit tests are written for you in `test_module.py`.

### Development

For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.

### Testing

We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.

### Submitting

Copy your project's URL and submit it to freeCodeCamp.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import datetime
%matplotlib inline

In [None]:
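# Import data (Make sure to parse dates. Consider setting index column to 'date'.)
# parse_dates=True alone only tries to parse the index, so name the column explicitly
# and make it the index in the same call.
df = pd.read_csv("fcc-forum-pageviews.csv", parse_dates=["date"], index_col="date")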
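
As a quick check on the import step, the sketch below reads a small in-memory sample instead of the real file. The three rows are illustrative stand-ins that only mimic the `date,value` layout of "fcc-forum-pageviews.csv", and `sample` is a hypothetical name. It shows that naming the column in `parse_dates` and `index_col` produces a `DatetimeIndex`.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same "date,value" layout as the CSV.
sample_csv = io.StringIO(
    "date,value\n"
    "2016-05-09,1201\n"
    "2016-05-10,2329\n"
    "2016-05-11,1716\n"
)

# Naming the column in parse_dates and index_col parses it and makes it the index.
sample = pd.read_csv(sample_csv, parse_dates=["date"], index_col="date")

print(type(sample.index).__name__)
print(sample["value"].tolist())
```

If the index comes back as plain strings instead, Matplotlib treats each date as a category label rather than a point on a time axis.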

In [None]:
# Clean data
df = df[(df['value'] > df['value'].quantile(0.025)) & (df['value'] < df['value'].quantile(0.975))]
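
To see what the quantile filter does, here is a minimal sketch on toy data. The frame `toy` and its values (the integers 1 to 200) are assumptions chosen so the cut-offs are easy to reason about; they are not taken from the forum dataset.

```python
import pandas as pd

# Hypothetical data: the integers 1..200 stand in for daily page views.
toy = pd.DataFrame({"value": range(1, 201)})

# Cut-offs for the bottom 2.5% and top 2.5% of the values.
low = toy["value"].quantile(0.025)
high = toy["value"].quantile(0.975)

# Keep only rows strictly between the two cut-offs, as in the cell above.
cleaned = toy[(toy["value"] > low) & (toy["value"] < high)]

print(low, high)
print(len(toy), len(cleaned))
```

With 200 evenly spaced values, the filter drops the five smallest and the five largest rows, which is 5% of the data in total.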

In [None]:
# Draw line plot
fig, ax = plt.subplots(figsize=(10,8))
ax.plot(df)

plt.title('Daily freeCodeCamp Forum Page Views 5/2016-12/2019')
plt.xlabel('Date')
plt.ylabel('Page Views')
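
The assignment asks for a `draw_line_plot` function that works on a copy of the data frame. One possible shape for it is sketched below. Passing the frame in as a `data` parameter, returning the figure, and the small `demo` frame are all assumptions made for illustration. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; in a notebook with `%matplotlib inline` it can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # Assumption: headless backend so the sketch runs without a display.
import matplotlib.pyplot as plt
import pandas as pd


def draw_line_plot(data):
    """Draw the daily page-view line chart and return the Figure."""
    df_line = data.copy()  # work on a copy, as the assignment asks

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.plot(df_line.index, df_line["value"])
    ax.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    ax.set_xlabel("Date")
    ax.set_ylabel("Page Views")
    return fig


# Hypothetical stand-in for the cleaned data frame.
demo = pd.DataFrame(
    {"value": [1201, 2329, 1716]},
    index=pd.to_datetime(["2016-05-09", "2016-05-10", "2016-05-11"]),
)
fig = draw_line_plot(demo)
```

Setting the title and labels on `ax` rather than through `plt` keeps the function independent of whichever figure happens to be current.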

In [None]:
# Copy and modify data for monthly bar plot
df_bar = df.copy().reset_index()
df_bar['month'] = pd.DatetimeIndex(df_bar['date']).month_name()
df_bar['year'] = pd.DatetimeIndex(df_bar['date']).year
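
The bar chart shows the mean of `value` for each year and month pair. The same numbers can be computed explicitly with a `groupby`, which is a handy way to check the chart. The sketch below uses a hypothetical four-row frame (`sample`) in place of the cleaned data, and `monthly_mean` is an illustrative name.

```python
import pandas as pd

# Hypothetical four-row sample standing in for the cleaned data.
sample = pd.DataFrame(
    {"value": [100, 300, 200, 400]},
    index=pd.to_datetime(["2016-05-09", "2016-05-10", "2016-06-01", "2017-05-01"]),
)
sample.index.name = "date"

# Same preparation as the cell above, written with the .dt accessor.
df_bar = sample.copy().reset_index()
df_bar["month"] = df_bar["date"].dt.month_name()
df_bar["year"] = df_bar["date"].dt.year

# Mean page views per (year, month), with months spread across columns.
monthly_mean = df_bar.groupby(["year", "month"])["value"].mean().unstack()

print(monthly_mean)
```

Year and month pairs with no data appear as `NaN` in the table, which corresponds to a missing bar in the chart.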

In [None]:
# Draw bar plot
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
fig, ax = plt.subplots(figsize=(8, 6))

# barplot shows the mean of 'value' per year/month; ci=None turns off the error bars
# (seaborn 0.12 and later spell this errorbar=None)
sns.barplot(x='year', y='value', hue='month', hue_order=months, data=df_bar, ci=None, ax=ax)
ax.set_xlabel('Years')
ax.set_ylabel('Average Page Views')
ax.legend(loc='upper left', title='Months')

In [None]:
# Prepare data for box plots (this part is done!)
df_box = df.copy()
df_box.reset_index(inplace=True)
df_box['date'] = pd.to_datetime(df_box.date, format='%Y-%m-%d')

df_box['year'] = [d.year for d in df_box.date]
df_box['month'] = [d.strftime('%b') for d in df_box.date]

In [None]:
# Draw box plots (using Seaborn)
fig, axes = plt.subplots(figsize=(12,6), ncols=2)

ax0 = sns.boxplot(x='year', y='value', data=df_box, ax=axes[0])
ax0.set_xlabel('Year')
ax0.set_ylabel('Page Views')
ax0.set_title('Year-wise Box Plot (Trend)')

months = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

ax1 = sns.boxplot(x='month', y='value', order=months, data=df_box, ax=axes[1])
ax1.set_xlabel('Month')
ax1.set_ylabel('Page Views')
ax1.set_title('Month-wise Box Plot (Seasonality)')