<h> For this project you will visualize time series data using a line chart, bar chart, and box plots. You will use Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The data visualizations will help you understand the patterns in visits and identify yearly and monthly growth. <h>

In [None]:
#importing packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
#reading the csv file
df = pd.read_csv('fcc-forum-pageviews.csv')

In [None]:
#changing the date to a datetime
df['date']=pd.to_datetime(df['date'])

<h>Clean the data by filtering out days when the page views were in the top 2.5% of the dataset or bottom 2.5% of the dataset.<h>

In [None]:
df.head()

In [None]:
#filtering for the dataframe: 
filt_df = df.loc[:, df.columns != 'date']
filt_df.head(5)

In [None]:
#getting the low and high cut-offs at the 2.5th and 97.5th percentiles
#(computed on filt_df so the 'date' column is left out of the calculation)
low = .025
high = .975
quant_df = filt_df.quantile([low,high])
print(quant_df)


In [None]:
filt_df = filt_df.apply(lambda x: x[(x>quant_df.loc[low,x.name]) & 
                                    (x < quant_df.loc[high,x.name])], axis=0)

In [None]:
filt_df.head()

In [None]:
filt_df = filt_df.dropna()
filt_df.head()

In [None]:
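#re-attaching only the dates of the rows that survived the filter (a plain concat would bring the dropped rows back as NaN)
filt_df = pd.concat([df.loc[filt_df.index,'date'], filt_df], axis=1)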

In [None]:
filt_df.head()

In [None]:
print(filt_df.describe())
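
The cleaning cells above can also be written as a single boolean mask. The following is a minimal sketch, not the notebook's own method: it assumes a frame with 'date' and 'value' columns, and demo_df is a synthetic stand-in for the real CSV so the block runs on its own. It keeps the same strict comparisons as the cells above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for fcc-forum-pageviews.csv (assumption: columns 'date' and 'value')
rng = np.random.default_rng(0)
dates = pd.date_range("2016-05-09", "2019-12-03", freq="D")
demo_df = pd.DataFrame({"date": dates,
                        "value": rng.integers(1_000, 200_000, len(dates))})

# Cut-offs at the 2.5th and 97.5th percentiles of the page views
low_cut = demo_df["value"].quantile(0.025)
high_cut = demo_df["value"].quantile(0.975)

# Keep only the rows strictly between the two cut-offs;
# the 'date' column stays attached, so no concat or dropna is needed
mask = (demo_df["value"] > low_cut) & (demo_df["value"] < high_cut)
demo_clean = demo_df[mask]

print(len(demo_df), len(demo_clean))
```

Because the mask is applied to whole rows, the dates and values cannot drift out of alignment.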

<h>Create a draw_line_plot function that uses Matplotlib to draw a line chart similar to "examples/Figure_1.png". The title should be "Daily freeCodeCamp Forum Page Views 5/2016-12/2019". The label on the x axis should be "Date" and the label on the y axis should be "Page Views".<h>

In [None]:
#quick look at the cleaned data as a line chart
filt_df.plot(x='date',y='value')
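
One possible shape for draw_line_plot is sketched below. It assumes the data has 'date' and 'value' columns; demo_df is synthetic, and the figure size and line colour are illustrative choices rather than values taken from the project.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def draw_line_plot(data):
    """Draw daily page views as a line chart and return the figure."""
    fig, ax = plt.subplots(figsize=(15, 5))   # size is an arbitrary choice
    ax.plot(data["date"], data["value"], color="tab:red", linewidth=1)
    ax.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    ax.set_xlabel("Date")
    ax.set_ylabel("Page Views")
    return fig

# Synthetic stand-in for the cleaned data (assumption: columns 'date' and 'value')
rng = np.random.default_rng(0)
dates = pd.date_range("2016-05-09", "2019-12-03", freq="D")
demo_df = pd.DataFrame({"date": dates,
                        "value": rng.integers(1_000, 200_000, len(dates))})

line_fig = draw_line_plot(demo_df)
```

In the notebook the function would be called with the cleaned frame, for example draw_line_plot(filt_df).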

<h1>Create a draw_bar_plot function that draws a bar chart similar to "examples/Figure_2.png". It should show average daily page views for each month grouped by year. The legend should show month labels and have a title of "Months". On the chart, the label on the x axis should be "Years" and the label on the y axis should be "Average Page Views".<h1>

In [None]:
#monthly averages ('MS' groups the rows by calendar month; 'D' would leave the data daily)
filt_df.resample('MS',on = 'date').mean()


In [None]:

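#average page views for each month, grouped by year (rows: years, columns: months)
df2 = filt_df.groupby([filt_df['date'].dt.year.rename('year'),
                       filt_df['date'].dt.month.rename('month')])['value'].mean().unstack()
df2.plot.bar()
plt.xlabel('Years')
plt.ylabel('Average Page Views')
plt.legend(title='Months')
plt.show()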
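
A draw_bar_plot function along these lines could wrap the grouping and labelling in one place. This is a sketch under assumptions: the input has 'date' and 'value' columns, demo_df is synthetic, and the month names come from the standard-library calendar module (English in the default locale).

```python
import calendar

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def draw_bar_plot(data):
    """Plot average daily page views per month, grouped by year."""
    monthly = (
        data.assign(year=data["date"].dt.year, month=data["date"].dt.month)
            .groupby(["year", "month"])["value"]
            .mean()
            .unstack()   # rows: years, columns: month numbers
    )
    # Make sure all twelve months appear in order, then swap numbers for names
    monthly = monthly.reindex(columns=range(1, 13))
    monthly.columns = [calendar.month_name[m] for m in monthly.columns]

    ax = monthly.plot.bar(figsize=(10, 8))   # size is an arbitrary choice
    ax.set_xlabel("Years")
    ax.set_ylabel("Average Page Views")
    ax.legend(title="Months")
    return ax.get_figure()

# Synthetic stand-in for the cleaned data (assumption: columns 'date' and 'value')
rng = np.random.default_rng(0)
dates = pd.date_range("2016-05-09", "2019-12-03", freq="D")
demo_df = pd.DataFrame({"date": dates,
                        "value": rng.integers(1_000, 200_000, len(dates))})

bar_fig = draw_bar_plot(demo_df)
```

Months with no data in a given year (January to April 2016 here) simply have no bar, since their mean is NaN.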

<h1>Create a draw_box_plot function that uses Seaborn to draw two adjacent box plots similar to "examples/Figure_3.png". These box plots should show how the values are distributed within a given year or month and how they compare over time. The title of the first chart should be "Year-wise Box Plot (Trend)" and the title of the second chart should be "Month-wise Box Plot (Seasonality)". Make sure the month labels on the bottom start at "Jan" and the x and y axes are labeled correctly.<h1>

In [None]:
df.head()


In [None]:
df.tail()


In [None]:
plt.figure(figsize=(10, 8))
# make boxplot with Seaborn: one box per calendar month (x="date" would draw one box per day)
sns.boxplot(x=df["date"].dt.month, y=df["value"])
# Set labels and title
plt.ylabel("page views", size=14)
plt.xlabel("month", size=14)
plt.title("box plot trend", size=18)
plt.savefig("boxplot.png")

In [None]:
#random test data: one value per day, so the number of values must match the number of dates
dates = pd.date_range('2016-05-09','2020-01-01',freq='D')
df = pd.DataFrame({'date': dates,
                   'value': np.random.randint(50,1000,len(dates))})
df.head()

In [None]:
def draw_box_plot():
    # Prepare data for box plots (work on a copy so df itself is left untouched)
    df_box = df.copy()
    df_box['Year'] = df_box['date'].dt.year
    # Abbreviated month names ("Jan", "Feb", ...); '%b' follows the current locale
    df_box['Month'] = df_box['date'].dt.strftime('%b')
    # Calendar order, so the month labels start at "Jan"
    month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    # Draw box plots (using Seaborn)
    fig, axes = plt.subplots(figsize=(20, 5), ncols=2, sharex=False)
    sns.despine(left=True)

    box_plot_year = sns.boxplot(x=df_box['Year'], y=df_box['value'], ax=axes[0])
    box_plot_year.set_title("Year-wise Box Plot (Trend)")
    box_plot_year.set_xlabel('Year')
    box_plot_year.set_ylabel('Page Views')

    box_plot_month = sns.boxplot(x=df_box['Month'], y=df_box['value'],
                                 order=month_order, ax=axes[1])
    box_plot_month.set_title("Month-wise Box Plot (Seasonality)")
    box_plot_month.set_xlabel('Month')
    box_plot_month.set_ylabel('Page Views')

    # Save image and return fig (don't change this part)
    fig.savefig('box_plot.png')
    return fig

fig = draw_box_plot()
