## 2021: Week 36 - What's Trendy?

For this week's challenge, I wanted to use Google Trends to look back over the past five years (September 2016 to August 2021) and see what people were searching for. In particular, are these search terms still as popular now as they were at the peak of lockdown? How does the picture vary around the world? We'll be looking at:

- Pet adoption (who didn't want a furry work-from-home buddy?!)
- Online streamer (can one make money from playing video games?)
- Staycations (everyone's favourite word, right?)

### Input
There are 2 inputs this week:
1. Timeline: a weekly index (0-100) of how popular each search term is, where 100 marks the highest point reached by any of the three terms
![img](https://lh3.googleusercontent.com/-lVnuEqzTNtY/YS-ZW0sI3xI/AAAAAAAAA7Y/30m_U5C0SuYvhfe0Mssodb3bW7jlqQvDgCLcBGAsYHQ/w400-h126/image.png)

2. Country Breakdown: for each country, the percentage split of search interest between the three terms
![img](https://lh3.googleusercontent.com/-bZ4c-74Ebz8/YS-ZfQbLv_I/AAAAAAAAA7c/r-lB8iGWrAoWjQVe_HXx2nJ7bfycgyFzwCLcBGAsYHQ/w400-h85/image.png)

### Requirement
- Input the data
- Calculate the overall average index for each search term
- Work out the earliest peak for each of these search terms
- For each year (1st September - 31st August), calculate the average index
- Classify each search term as either a Lockdown Fad or Still Trendy based on whether the average index has increased or decreased since last year
- Filter the countries so that only those with values for each search term remain
- For each search term, work out which country has the highest percentage
- Bring everything together into one dataset
- Output the data

### Output
![img](https://lh3.googleusercontent.com/-NwnYxJvHKCQ/YS-bPi52xnI/AAAAAAAAA7k/oIYuwkhV79kyMnmelYviOZygovSXnSmagCLcBGAsYHQ/w400-h58/image.png)

- 7 fields
    - Search Term
    - Status
    - 2020/21 avg index
    - Avg index
    - Index Peak
    - First Peak
    - Country with highest percentage
- 3 rows (4 including headers)
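The final "bring everything together" step is essentially an alignment on the search term. Here is a minimal sketch, assuming each earlier step ends up as a pandas Series indexed by search term. The terms, values and countries below are hypothetical placeholders, not results from this dataset; the column names simply follow the output list above.

```python
import pandas as pd

# Hypothetical per-term results, each a Series indexed by search term
terms = ["Term A", "Term B"]
status = pd.Series(["Still Trendy", "Lockdown Fad"], index=terms)
year_avg = pd.Series([40.0, 60.0], index=terms)
overall_avg = pd.Series([30.0, 65.0], index=terms)
peak_value = pd.Series([80, 100], index=terms)
peak_week = pd.Series(pd.to_datetime(["2020-05-03", "2020-04-05"]), index=terms)
top_country = pd.Series(["Country X", "Country Y"], index=terms)

# pd.concat(axis=1) aligns every Series on the shared search-term index;
# `keys` supplies the column names from the output spec
output = (
    pd.concat(
        [status, year_avg, overall_avg, peak_value, peak_week, top_country],
        axis=1,
        keys=["Status", "2020/21 avg index", "Avg index", "Index Peak",
              "First Peak", "Country with highest percentage"],
    )
    .rename_axis("Search Term")
    .reset_index()
)
print(output)
```

Because `pd.concat(axis=1)` aligns on the index, the order in which the pieces were computed doesn't matter, as long as every Series uses the same term labels.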

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### Input the data

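```python
# sheet_name=[0, 1] returns a dict of DataFrames keyed by sheet position
data = pd.read_excel("./data/Trend Input.xlsx", sheet_name=[0, 1])
timeline = data[0].copy()           # first sheet: weekly index per search term
country_breakdown = data[1].copy()  # second sheet: percentage split per country
```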

```python
timeline
```

| | Category: All categories | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
|---|---|---|---|---|
| 0 | | | | |
| 1 | Week | Pet adoption: (Worldwide) | Online streamer: (Worldwide) | Staycation: (Worldwide) |
| 2 | 2016-09-04 00:00:00 | 69 | 11 | 6 |
| 3 | 2016-09-11 00:00:00 | 70 | 10 | 4 |
| 4 | 2016-09-18 00:00:00 | 64 | 17 | 3 |
| ... | ... | ... | ... | ... |
| 258 | 2021-08-01 00:00:00 | 52 | 46 | 42 |
| 259 | 2021-08-08 00:00:00 | 56 | 48 | 42 |
| 260 | 2021-08-15 00:00:00 | 57 | 48 | 42 |
| 261 | 2021-08-22 00:00:00 | 61 | 48 | 37 |


```python
timeline.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 263 entries, 0 to 262
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Category: All categories  262 non-null    object
 1   Unnamed: 1                262 non-null    object
 2   Unnamed: 2                262 non-null    object
 3   Unnamed: 3                262 non-null    object
dtypes: object(4)
memory usage: 8.3+ KB
```
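Because the export puts a category line and a blank row above the real header, all four columns come in as `object`. Here is a minimal sketch of one way to tidy a frame laid out like this: promote row 1 to the header, keep only the rows below it, then convert the types. The small `raw` frame is hypothetical and only mimics the layout shown above (with dates as strings).

```python
import pandas as pd

# Hypothetical raw frame mimicking the export: row 0 is blank, row 1 holds the real header
raw = pd.DataFrame(
    {
        "Category: All categories": [None, "Week", "2016-09-04", "2016-09-11"],
        "Unnamed: 1": [None, "Pet adoption: (Worldwide)", 69, 70],
        "Unnamed: 2": [None, "Online streamer: (Worldwide)", 11, 10],
    }
)

# Promote row 1 to the header and keep only the data rows below it
clean = raw.iloc[2:].copy()
clean.columns = raw.iloc[1]
clean.columns.name = None
clean = clean.reset_index(drop=True)

# Convert the object columns to proper dtypes
clean["Week"] = pd.to_datetime(clean["Week"])
value_cols = [c for c in clean.columns if c != "Week"]
clean[value_cols] = clean[value_cols].apply(pd.to_numeric)

print(clean.dtypes)
```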


```python
country_breakdown
```

| | Category: All categories | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
|---|---|---|---|---|
| 0 | | | | |
| 1 | Country | Pet adoption: (01/09/2016 - 01/09/2021) | Online streamer: (01/09/2016 - 01/09/2021) | Staycation: (01/09/2016 - 01/09/2021) |
| 2 | Hong Kong | 0.03 | 0.05 | 0.92 |
| 3 | South Korea | | | |
| 4 | Guernsey | | | 1 |
| ... | ... | ... | ... | ... |
| 247 | Tuvalu | | | |
| 248 | US Outlying Islands | | | |
| 249 | Vatican City | | | |
| 250 | Wallis & Futuna | | | |


```python
country_breakdown.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252 entries, 0 to 251
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Category: All categories  251 non-null    object
 1   Unnamed: 1                21 non-null     object
 2   Unnamed: 2                22 non-null     object
 3   Unnamed: 3                24 non-null     object
dtypes: object(4)
memory usage: 8.0+ KB
```
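The non-null counts above show that only a couple of dozen countries have a value for any given term. Here is a minimal sketch of the two country steps from the requirements: keep only countries with a value for every term, then find the country with the highest percentage per term. It assumes the header row has already been promoted and the columns renamed to the short term names. The country names and shares are hypothetical.

```python
import pandas as pd

# Hypothetical country breakdown: share of search interest per term (NaN = no data)
countries = pd.DataFrame(
    {
        "Country": ["Country A", "Country B", "Country C", "Country D"],
        "Pet_adoption": [0.03, None, 0.50, 0.20],
        "Online_streamer": [0.05, None, 0.10, 0.60],
        "Staycation": [0.92, 1.00, 0.40, 0.20],
    }
)
terms = ["Pet_adoption", "Online_streamer", "Staycation"]

# Keep only countries that have a value for every search term
complete = countries.dropna(subset=terms)

# For each term (column), idxmax returns the country label with the largest share
top_country = (
    complete.set_index("Country")[terms]
    .idxmax()
    .rename("Country with highest percentage")
)
print(top_country)
```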


### Calculate the overall average index for each search term

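```python
# Give the columns short names, then drop the blank row and the original header row
timeline.columns = ["Week", "Pet_adoption", "Online_streamer", "Staycation"]
timeline = timeline.drop([0, 1], axis=0).reset_index(drop=True)

# The header rows forced every column to object dtype; convert to proper types
timeline["Week"] = pd.to_datetime(timeline["Week"])
value_cols = ["Pet_adoption", "Online_streamer", "Staycation"]
timeline[value_cols] = timeline[value_cols].apply(pd.to_numeric)
```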

```python
# Reshape from wide (one column per term) to long (one row per week and term)
timeline = timeline.melt(id_vars="Week", var_name="Category", value_name="Search")
timeline
```

| | Week | Category | Search |
|---|---|---|---|
| 0 | 2016-09-04 | Pet_adoption | 69 |
| 1 | 2016-09-11 | Pet_adoption | 70 |
| 2 | 2016-09-18 | Pet_adoption | 64 |
| 3 | 2016-09-25 | Pet_adoption | 64 |
| 4 | 2016-10-02 | Pet_adoption | 63 |
| ... | ... | ... | ... |
| 778 | 2021-08-01 | Staycation | 42 |
| 779 | 2021-08-08 | Staycation | 42 |
| 780 | 2021-08-15 | Staycation | 42 |
| 781 | 2021-08-22 | Staycation | 37 |


```python
avg_index = timeline.groupby(["Category"])["Search"].mean()
avg_index
```

```
Category
Online_streamer    29.770115
Pet_adoption       63.536398
Staycation         14.245211
Name: Search, dtype: float64
```
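The melt isn't strictly needed for the averages, because each column of the wide frame already holds a single term. Here is a minimal sketch comparing the two routes on the first three weeks of the timeline shown above.

```python
import pandas as pd

# First three weeks of the timeline shown above, in wide form (one column per term)
wide = pd.DataFrame(
    {
        "Week": pd.to_datetime(["2016-09-04", "2016-09-11", "2016-09-18"]),
        "Pet_adoption": [69, 70, 64],
        "Online_streamer": [11, 10, 17],
        "Staycation": [6, 4, 3],
    }
)
terms = ["Pet_adoption", "Online_streamer", "Staycation"]

# Column-wise mean of the wide frame
wide_avg = wide[terms].mean()

# Groupby mean of the melted (long) frame, as in the notebook
long = wide.melt(id_vars="Week", var_name="Category", value_name="Search")
long_avg = long.groupby("Category")["Search"].mean()

# Both give one average per term (pandas aligns the two Series on the term labels)
comparison = pd.DataFrame({"wide_mean": wide_avg, "long_mean": long_avg})
print(comparison)
```

The long format pays off in later steps (peaks, yearly averages), where a single `groupby("Category")` covers all three terms.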

### Work out the earliest peak for each of these search terms

```python
timeline[timeline["Category"] == "Pet_adoption"].sort_values(by="Search", ascending=False).head(1)
```

| | Week | Category | Search |
|---|---|---|---|
| 188 | 2020-04-12 | Pet_adoption | 100 |


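```python
def earliest_peak(df_, category_name):
    """Return the row with the highest Search value for one category.

    Sorting by Search (descending) and then Week (ascending) means that if the
    peak value is reached in several weeks, the earliest of those weeks is kept.
    """
    df = df_[df_["Category"] == category_name]
    return df.sort_values(by=["Search", "Week"], ascending=[False, True]).head(1)
```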

```python
first_peak = pd.concat([earliest_peak(timeline, "Pet_adoption"),
                        earliest_peak(timeline, "Online_streamer"),
                        earliest_peak(timeline, "Staycation")], axis=0)
first_peak
```

| | Week | Category | Search |
|---|---|---|---|
| 188 | 2020-04-12 | Pet_adoption | 100 |
| 482 | 2020-11-29 | Online_streamer | 84 |
| 782 | 2021-08-29 | Staycation | 44 |
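Google Trends indexes are integers, so a term can hit its maximum in more than one week. Here is a minimal sketch of a tie-safe way to get the earliest peak for every term in one pass. `groupby(...).idxmax()` returns the label of the first maximum in each group, so sorting by week within each category first makes "first" mean "earliest". The sample below is hypothetical and includes a deliberate tie for category `B`.

```python
import pandas as pd

# Hypothetical long-format sample; category "B" peaks at 50 in two different weeks
long = pd.DataFrame(
    {
        "Week": pd.to_datetime(["2020-01-05", "2020-01-12", "2020-01-19"] * 2),
        "Category": ["A"] * 3 + ["B"] * 3,
        "Search": [40, 90, 60, 50, 30, 50],
    }
)

# Order by week within each category so the first maximum is the earliest one
long = long.sort_values(["Category", "Week"])

# idxmax gives the index label of the first maximum per group; loc pulls those rows
first_peak = long.loc[long.groupby("Category")["Search"].idxmax()]
print(first_peak)
```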


### For each year (1st September - 31st August), calculate the average index

```python
timeline = timeline.set_index("Week")
timeline
```

| Week | Category | Search |
|---|---|---|
| 2016-09-04 | Pet_adoption | 69 |
| 2016-09-11 | Pet_adoption | 70 |
| 2016-09-18 | Pet_adoption | 64 |
| 2016-09-25 | Pet_adoption | 64 |
| 2016-10-02 | Pet_adoption | 63 |
| ... | ... | ... |
| 2021-08-01 | Staycation | 42 |
| 2021-08-08 | Staycation | 42 |
| 2021-08-15 | Staycation | 42 |
| 2021-08-22 | Staycation | 37 |


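```python
# After the melt each week appears three times, so the DatetimeIndex is unsorted; slicing it
# with date strings that aren't exact index values is deprecated (a KeyError in pandas 2.0+)
lockdown_yr_avg_index = timeline.sort_index().loc["2020-09-01":"2021-08-31"].groupby("Category")["Search"].mean()
lockdown_yr_avg_index
```

```
Category
Online_streamer    53.096154
Pet_adoption       66.461538
Staycation         34.769231
Name: Search, dtype: float64
```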
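Rather than slicing out one September-to-August window at a time, each week can be tagged with the year it belongs to, which gives the averages for every year in one `groupby`. Here is a minimal sketch, assuming a `Week` datetime column. The `2020/21` label format mirrors the output field name, and the sample rows are hypothetical, chosen to straddle the 31 August / 1 September boundary.

```python
import pandas as pd

# Hypothetical long-format sample spanning the 31 August / 1 September boundary
long = pd.DataFrame(
    {
        "Week": pd.to_datetime(["2020-08-30", "2020-09-06", "2021-08-29", "2021-09-05"]),
        "Category": ["Staycation"] * 4,
        "Search": [20, 30, 40, 50],
    }
)

# A year runs 1 September - 31 August: September-December weeks start a new year,
# while January-August weeks belong to the year that started the previous September
year = long["Week"].dt.year
start_year = year.where(long["Week"].dt.month >= 9, year - 1)
long["Year"] = start_year.astype(str) + "/" + ((start_year + 1) % 100).astype(str).str.zfill(2)

yearly_avg = long.groupby(["Category", "Year"])["Search"].mean()
print(yearly_avg)
```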

### Classify each search term as either a Lockdown Fad or Still Trendy based on whether the average index has increased or decreased since last year

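```python
# Previous September-August year; sort the index first because each week appears once per term
prev_avg_index = timeline.sort_index().loc["2019-09-01":"2020-08-31"].groupby("Category")["Search"].mean()
prev_avg_index
```

```
Category
Online_streamer    39.037736
Pet_adoption       72.000000
Staycation         15.886792
Name: Search, dtype: float64
```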

```python
status = pd.concat([lockdown_yr_avg_index, prev_avg_index], axis=1).reset_index()
status.columns = ["Category", "lockdown_avg_index", "prev_avg_index"]
status
```

| | Category | lockdown_avg_index | prev_avg_index |
|---|---|---|---|
| 0 | Online_streamer | 53.096154 | 39.037736 |
| 1 | Pet_adoption | 66.461538 | 72.0 |
| 2 | Staycation | 34.769231 | 15.886792 |


```python
def trend_cal(lockdown, prev):
    # Higher average in 2020/21 than in 2019/20 -> still trendy; equal or lower -> fad
    if lockdown > prev:
        return "Still Trendy"
    else:
        return "Lockdown Fad"
```

```python
status["Status"] = status.apply(lambda x: trend_cal(x["lockdown_avg_index"], x["prev_avg_index"]), axis=1)
status
```

| | Category | lockdown_avg_index | prev_avg_index | Status |
|---|---|---|---|---|
| 0 | Online_streamer | 53.096154 | 39.037736 | Still Trendy |
| 1 | Pet_adoption | 66.461538 | 72.0 | Lockdown Fad |
| 2 | Staycation | 34.769231 | 15.886792 | Still Trendy |
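The row-wise `apply` works, but the same rule can be vectorised with `numpy.where`, which skips the Python-level function call per row. Here is a minimal sketch using the yearly averages from the outputs above.

```python
import numpy as np
import pandas as pd

# Yearly averages copied from the outputs above
status = pd.DataFrame(
    {
        "Category": ["Online_streamer", "Pet_adoption", "Staycation"],
        "lockdown_avg_index": [53.096154, 66.461538, 34.769231],
        "prev_avg_index": [39.037736, 72.000000, 15.886792],
    }
)

# Same rule as trend_cal: strictly higher than last year -> Still Trendy, otherwise Lockdown Fad
status["Status"] = np.where(
    status["lockdown_avg_index"] > status["prev_avg_index"], "Still Trendy", "Lockdown Fad"
)
print(status)
```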
