### Find Unusual Patterns in Hourly Google Search Traffic
The goal here is to find out whether there is a link between MercadoLibre's financial events and its Google search traffic.
We will dive into May 2020, when MercadoLibre released a quarterly financial report.

In [92]:
import pandas as pd
import holoviews as hv
import hvplot.pandas
import seaborn as sns
import numpy as np

In [20]:
# Read in the search trends data
meli_search_df = pd.read_csv('resources/google_hourly_search_trends.csv')

# Convert the Date column to timezone-aware datetimes so that we can slice by month later
meli_search_df['Date'] = pd.to_datetime(meli_search_df['Date'], utc=True)

# Set the index to our Datetime column for simple lookup / plot statements
meli_search_df = meli_search_df.set_index('Date')

In [118]:
meli_search_df.tail()

Date,Search Trends
2020-09-07 20:00:00+00:00,71
2020-09-07 21:00:00+00:00,83
2020-09-07 22:00:00+00:00,96
2020-09-07 23:00:00+00:00,97
2020-09-08 00:00:00+00:00,96


In [121]:
# The dataset ends on Sept. 8, 2020, so September is only a partial month.
# We trim the Sept. 2020 rows so that monthly totals and medians are more representative

meli_search_df = meli_search_df.loc[:'2020-08-31']
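The hard-coded cutoff above could also be derived from the data. The following is a minimal sketch, not part of the original analysis: `drop_partial_last_month` is a hypothetical helper, and `demo_df` holds made-up hourly values that only mimic the shape of the real dataset.

```python
import pandas as pd

def drop_partial_last_month(df):
    """Drop the final calendar month if the data stops before its last day.

    Hypothetical helper for illustration. It only checks the calendar day of
    the last timestamp, not whether every hour of that day is present.
    """
    last = df.index.max()
    if last.is_month_end:
        return df
    # First instant of the incomplete month, in the index's own timezone
    cutoff = last.normalize().replace(day=1)
    return df.loc[df.index < cutoff]

# Made-up hourly data from 2020-08-30 00:00 through 2020-09-08 00:00 (UTC)
demo_index = pd.date_range(
    "2020-08-30", periods=9 * 24 + 1, freq=pd.Timedelta(hours=1), tz="UTC"
)
demo_df = pd.DataFrame({"Search Trends": range(len(demo_index))}, index=demo_index)

trimmed = drop_partial_last_month(demo_df)
print(trimmed.index.max())
```

On the demo frame this keeps only the two full days of August, which matches the effect of slicing with `.loc[:'2020-08-31']`.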


## Zoom in on May 2020, month of Quarterly Financial Report

In [122]:

fig_may = meli_search_df.loc['2020-05'].hvplot(title='May 2020 Search Trend Data')
fig_may

### Interesting Spikes on May 5th 
The series oscillates between roughly 10 and 100 with a reliable daily cycle. However, traffic on **May 5th** looks elevated relative to the rest of the month. Let's see whether the month as a whole was anomalous.
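One way to back up a visual impression like this is to compare daily averages. The sketch below runs on synthetic data, not the real search trends: the daily cycle and the 15-point lift planted on May 5th are assumptions chosen only to show the technique.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series for May 2020 (illustrative values only)
hours = pd.date_range(
    "2020-05-01", periods=31 * 24, freq=pd.Timedelta(hours=1), tz="UTC"
)
hour_of_day = np.asarray(hours.hour)
values = 50 + 40 * np.sin(2 * np.pi * hour_of_day / 24)

# Plant an artificial lift on May 5th to stand in for an event-day spike
is_may_5 = np.asarray(hours.day) == 5
values = values + np.where(is_may_5, 15, 0)

demo = pd.Series(values, index=hours, name="Search Trends")

# Average each calendar day, then compare the busiest day to a typical day
daily_mean = demo.groupby(demo.index.normalize()).mean()
busiest_day = daily_mean.idxmax()
ratio_to_typical = daily_mean.max() / daily_mean.median()

print(busiest_day.date(), round(ratio_to_typical, 2))
```

Averaging by day removes the intraday cycle, so a day that is elevated across most of its hours stands out more clearly than it does in the hourly plot.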

### Compare May 2020 to Monthly Median


In [123]:
# Group the data by year and month to get total search traffic for each calendar month
df_yr_mnth_group = meli_search_df.groupby(by=[meli_search_df.index.year, meli_search_df.index.month])['Search Trends'].sum()
# Median of the monthly totals, and the total for May 2020
my_median = float(df_yr_mnth_group.median())
may_2020_total = float(df_yr_mnth_group.loc[(2020, 5)])
pct_change = round((may_2020_total - my_median) / my_median * 100, 2)
print(f'Median Google Search Traffic: {my_median}')
print(f'Total for May, 2020: {may_2020_total}, an {pct_change}% increase in search traffic')

Median Google Search Traffic: 35201.0
Total for May, 2020: 38181.0, an 8.47% increase in search traffic


## May 2020 search traffic was about 8.5% above the monthly median
We could use this data to help our marketing and investor relations teams coordinate SEO and other marketing efforts that drive search traffic towards our own press releases or paid sponsorship articles.

In [90]:
# Question for Cam: the directions say to
#  calculate the TOTAL search traffic for the month (38181),
#    and then compare the value to the monthly median across all months (44 - 57).
#  Do we need to take the May 2020 total traffic and compare it to the median of the monthly totals (35172.5)?
meli_search_df.groupby(by=[meli_search_df.index.year, meli_search_df.index.month]).sum()

Date (year),Date (month),Search Trends
2016,6,33196
2016,7,33898
2016,8,34459
2016,9,32376
2016,10,32334
2016,11,33793
2016,12,33789
2017,1,32984
2017,2,31901
2017,3,35363


## Mine the Search Traffic Data for Seasonality
Next we want to see if we can *track* and *predict* interest in the company to help our marketing team concentrate their efforts on the best hours and days of the week. To do this we'll plot a simple heat map. Along the x-axis you'll see the hour of the day (24-hour time), and along the y-axis you'll see the day of the week (0 = Monday, 6 = Sunday).

In [124]:
# Plot a heat map with the hour as X and the Day of week as Y
meli_search_df.hvplot.heatmap(
    title = 'Daily HeatMap by Hour',
    x='index.hour',
    y='index.dayofweek',
    C= 'Search Trends',
    cmap= 'coolwarm'
).aggregate(function=np.mean)
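The numbers behind a heat map like this one can also be inspected directly. The sketch below builds the same day-of-week by hour grid with `groupby` and `unstack`; it uses synthetic data in which Tuesdays get an artificial 5-point lift, so the values are assumptions and say nothing about the real search trends.

```python
import numpy as np
import pandas as pd

# Two synthetic weeks of hourly data starting on Monday, 2020-05-04 (UTC)
hours = pd.date_range(
    "2020-05-04", periods=14 * 24, freq=pd.Timedelta(hours=1), tz="UTC"
)
hour_of_day = np.asarray(hours.hour)
day_of_week = np.asarray(hours.dayofweek)  # 0 = Monday, 6 = Sunday

# Illustrative values: traffic rises through the day, Tuesdays are lifted by 5
demo = pd.DataFrame(
    {"Search Trends": hour_of_day + np.where(day_of_week == 1, 5, 0)},
    index=hours,
)

# Mean traffic for every (day of week, hour) slot
mean_by_slot = demo.groupby([demo.index.dayofweek, demo.index.hour])[
    "Search Trends"
].mean()

# Rows are days of the week, columns are hours: the grid the heat map colors
grid = mean_by_slot.unstack()

# The busiest slot as a (day of week, hour) pair
peak_slot = mean_by_slot.idxmax()
print(grid.shape, peak_slot)
```

Reading `peak_slot` or sorting `mean_by_slot` gives the marketing team concrete hours to target, which is harder to read off a color scale.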

## Tuesday-Thursday 

In [149]:
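# Average search traffic by ISO week of the year (isocalendar() replaces the deprecated weekofyear)
big_season_df = meli_search_df.groupby(meli_search_df.index.isocalendar().week).mean()
median = float(big_season_df['Search Trends'].median())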
fig = big_season_df.hvplot(title = 'Average Search Traffic by Week of Year')
hline = hv.HLine(median)
hline.opts(
    color = 'orange',
    line_dash= 'dashed',
    line_width= 2.0
)
fig * hline
