Python certification project from freeCodeCamp
-
Using Python to visualize time series data with a line chart, a bar chart, and box plots.
-
The visualizations help us understand patterns in visits and identify the yearly and monthly growth in daily page views on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03.
-
import libraries
library
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
-
Import data (Make sure to parse dates. Consider setting index column to 'date')
import data
df = pd.read_csv("fcc-forum-pageviews.csv", index_col="date", parse_dates=["date"])
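As a quick check of what parse_dates and index_col do, here is a minimal sketch that reads a few rows from an in-memory string instead of the real file. The three rows are made-up sample values, not data from fcc-forum-pageviews.csv; only the column names date and value are taken from the code above.

```python
import io

import pandas as pd

# Made-up sample rows in the same two-column layout (date, value).
sample_csv = io.StringIO(
    "date,value\n"
    "2016-05-09,1201\n"
    "2016-05-10,2329\n"
    "2016-05-11,1716\n"
)

# Same arguments as the real read_csv call: "date" is parsed into
# timestamps and then used as the index.
sample_df = pd.read_csv(sample_csv, index_col="date", parse_dates=["date"])

# A DatetimeIndex exposes .year, .month, etc., which the plots rely on.
print(type(sample_df.index).__name__)
print(sample_df.index.year.tolist())
```

If the index came back as plain strings instead, attributes such as df.index.year used in the later functions would not be available.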
-
Clean the data by filtering out days when the page views were in the top 2.5% or the bottom 2.5% of the dataset.
clean data
df = df[(df["value"] >= df["value"].quantile(0.025)) & (df["value"] <= df["value"].quantile(0.975))]
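To see what the quantile filter keeps, here is a small sketch on made-up numbers (the integers 1 to 100, not the forum data). The names demo, lower, upper and trimmed are only for illustration.

```python
import pandas as pd

# Made-up values 1..100 so the cut-offs are easy to reason about.
demo = pd.DataFrame({"value": range(1, 101)})

# Compute both cut-offs on the full, unfiltered data.
lower = demo["value"].quantile(0.025)
upper = demo["value"].quantile(0.975)

# Keep only rows between the two cut-offs (inclusive).
trimmed = demo[(demo["value"] >= lower) & (demo["value"] <= upper)]

print(lower, upper)
print(len(demo), "rows before,", len(trimmed), "rows after")
```

Both quantiles are computed before any rows are dropped, which matches the one-line filter above, where the two conditions are evaluated against the original df.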
-
create a draw_line_plot function that uses Matplotlib to draw a line chart. The title should be Daily freeCodeCamp Forum Page Views 5/2016-12/2019. The label on the x axis should be Date and the label on the y axis should be Page Views
draw_line_plot
def draw_line_plot():
    # draw line plot
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot(df.index, df["value"], color="red", linewidth=1)
    ax.set_xlabel("Date")
    ax.set_ylabel("Page Views")
    ax.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")

    # save image and return fig
    fig.savefig('line_plot.png')
    return fig
-
create a draw_bar_plot function that draws a bar chart. It should show average daily page views for each month grouped by year. The legend should show month labels and have a title of Months. On the chart, the label on the x axis should be Years and the label on the y axis should be Average Page Views
draw_bar_plot
def draw_bar_plot():
    # copy and modify data for monthly bar plot
    df_bar = df.copy(deep=True)
    # use month names (not numbers) so the values match hue_order below
    df_bar["Months"] = df_bar.index.month_name()
    df_bar["tahun"] = df_bar.index.year         # "tahun" is Indonesian for "year"
    df_bar["bulan_angka"] = df_bar.index.month  # "bulan_angka" means "month number"
    df_bar = pd.DataFrame(df_bar.groupby(["tahun", "Months", "bulan_angka"])["value"].mean())
    df_bar.reset_index(inplace=True)

    # draw bar plot
    fig, ax = plt.subplots(figsize=(14, 10))
    ax = sns.barplot(data=df_bar, x="tahun", y="value", hue="Months",
                     hue_order=['January', 'February', 'March', 'April', 'May', 'June',
                                'July', 'August', 'September', 'October', 'November', 'December'],
                     palette="bright", ax=ax)
    sns.move_legend(ax, "upper left")
    ax.set_xlabel("Years")
    ax.set_ylabel("Average Page Views")

    # save image and return fig
    fig.savefig('bar_plot.png')
    return fig
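The grouping step behind the bar chart can be sketched on its own, without any plotting. The four daily values below are made up, and the column name year is an illustrative stand-in for the tahun column in the function.

```python
import pandas as pd

# Made-up daily values spanning two months, indexed by date like df.
idx = pd.date_range("2017-01-30", periods=4, freq="D")  # Jan 30, Jan 31, Feb 1, Feb 2
demo = pd.DataFrame({"value": [10, 20, 30, 50]}, index=idx)

# Derive the grouping keys from the DatetimeIndex.
demo["year"] = demo.index.year
demo["Months"] = demo.index.month_name()  # "January", "February", ...

# Average daily value for each (year, month) pair.
monthly = demo.groupby(["year", "Months"])["value"].mean().reset_index()
print(monthly)
```

Because groupby sorts string keys alphabetically, February comes before January in the result. That is why the plotting call passes an explicit hue_order instead of relying on row order.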
-
create a draw_box_plot function that uses Seaborn to draw two adjacent box plots. These box plots should show how the values are distributed within a given year or month and how they compare over time. The title of the first chart should be Year-wise Box Plot (Trend) and the title of the second chart should be Month-wise Box Plot (Seasonality). Make sure the month labels on the bottom start at Jan and the x and y axes are labeled correctly
draw_box_plot
def draw_box_plot():
    # prepare data for box plots
    df_box = df.copy(deep=True)
    df_box.reset_index(inplace=True)
    df_box['year'] = [d.year for d in df_box.date]
    df_box['month'] = [d.strftime('%b') for d in df_box.date]
    # zero-padded month number ("01".."12"), used to sort the months starting at Jan
    df_box["month_angka"] = [d.strftime("%m") for d in df_box["date"]]

    # draw box plots (using Seaborn)
    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
    p = sns.boxplot(data=df_box, x="year", y="value", ax=ax[0])
    p.set_title("Year-wise Box Plot (Trend)")
    p.set_xlabel("Year")
    p.set_ylabel("Page Views")

    q = sns.boxplot(data=df_box.sort_values(by="month_angka"), x="month", y="value", ax=ax[1])
    q.set_title("Month-wise Box Plot (Seasonality)")
    q.set_xlabel("Month")
    q.set_ylabel("Page Views")

    # save image and return fig
    fig.savefig('box_plot.png')
    return fig
-
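The box plot function gets its Jan-to-Dec ordering by sorting on the month number. One possible alternative is to make the month column an ordered categorical, sketched below on made-up rows. The names demo and month_order are hypothetical, and the abbreviations assume the default English/C locale for strftime('%b').

```python
import pandas as pd

# Made-up rows in deliberately scrambled date order.
demo = pd.DataFrame({
    "date": pd.to_datetime(["2018-03-05", "2017-01-10", "2019-12-01", "2017-02-14"]),
    "value": [3, 1, 12, 2],
})
demo["month"] = demo["date"].dt.strftime("%b")  # "Mar", "Jan", "Dec", "Feb"

# Explicit calendar order for the abbreviated month names.
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# An ordered categorical sorts by calendar position, not alphabetically.
demo["month"] = pd.Categorical(demo["month"], categories=month_order, ordered=True)
sorted_months = demo.sort_values("month")["month"].tolist()
print(sorted_months)
```

The same month_order list could also be passed to sns.boxplot through its order parameter, which would make the extra month-number column unnecessary.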
Put the functions together in the time_series_visualizer.py file.
-
Call them via main.py.