# Assignment 1 - Google Trends


### INSTRUCTIONS

1. Find your own question (Q).
2. Predict the answer to Q. It doesn't need to be correct, but explain your rationale.
3. Collect Google Trends data to answer Q.
4. Compare the data with your prediction.

This notebook provides 3 examples of how to approach your question and derive an answer. This is an individual assignment: you may discuss it with your friends, but you must work on and submit your report independently.

#### Submission

Upload your report in PDF format (up to 1 page including figures, so be concise!) to eLearn (Assignment 1) by __January 25 23:59, 2021__.
The late submission penalty is as follows:
  - submit <= 12 hrs late: 10% penalty (i.e., maximum grade is 90%) 
  - submit <= 24 hrs late: 30% penalty (i.e., maximum grade is 70%) 
  - submit > 24 hrs late: 50% penalty (i.e., maximum grade is 50%)

### GRADING 

This assignment constitutes **5% of your final grade**.

### HONOR CODE

You may discuss your ideas at a high level with other students in the course. However, you **should not copy** answers from anyone or directly reference other students' answers and solutions.

```python
# Install the libraries below if you haven't already

# !pip install pytrends
# !pip install matplotlib
# !pip install plotly
```

```python
# Set up the logger

import logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
                    datefmt='%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
```

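```python
# Connect to Google Trends
from pytrends.request import TrendReq

# hl: host language; tz: timezone offset in minutes. pytrends follows Google's
# sign convention (US CST, UTC-6, is 360), so -480 corresponds to UTC+8 (Singapore).
pytrends = TrendReq(hl='en-US', tz=-480)
```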

## Example 1: Curious about the popularity of BTS
### My question
**When did BTS reach half of its current popularity?**<br/>
The question "When did BTS become popular?" is hard to answer directly because "become popular" is vague; half of the current level gives a concrete threshold instead.

### Predict the answer
Because their popularity has risen rapidly over the past two years, the answer might be 2018.

### Data collection

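```python
# Interest in "BTS" in Singapore (geo='SG') across all categories (cat=0)
keywords = ["BTS"]
pytrends.build_payload(keywords, geo='SG', timeframe='2017-01-01 2021-01-03', cat=0)
df = pytrends.interest_over_time()
df.reset_index(inplace=True)  # move the date index into a 'date' column
```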

```python
import plotly.express as px
fig = px.line(df, x="date", y="BTS", title='BTS popularity in Google Search')
fig.show()
```
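The question itself can be answered from `df` rather than by eye. Below is a minimal sketch, assuming pandas (already installed as a pytrends dependency) and the `date` column created by `reset_index()`. The helper name `first_date_reaching_fraction` is hypothetical, and "current popularity" is assumed to mean the latest row that Google has not flagged in the `isPartial` column.

```python
import pandas as pd


def first_date_reaching_fraction(df: pd.DataFrame, keyword: str, fraction: float = 0.5):
    """Return the first date whose interest is at least `fraction` of the current level.

    Assumes `df` comes from interest_over_time() followed by reset_index(),
    so it has a 'date' column. The "current" level is taken to be the last
    row not flagged in the optional 'isPartial' column (an assumption).
    Returns None if no row reaches the threshold.
    """
    complete = df
    if "isPartial" in df.columns:
        # Drop rows Google marks as incomplete (typically the most recent one)
        complete = df[~df["isPartial"].astype(bool)]
    current = complete[keyword].iloc[-1]
    threshold = fraction * current
    reached = complete[complete[keyword] >= threshold]
    if reached.empty:
        return None
    return reached["date"].iloc[0]
```

For example, `first_date_reaching_fraction(df, "BTS")` returns the first date in the collected window whose interest is at least half of the latest complete value. A single spike can cross the threshold early, so smoothing the series first (e.g., with `df["BTS"].rolling(4).mean()`) may give a more robust answer.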

### Comparison

Search intent turns out to be more complicated than the popularity of an item, person, or event: searches can be triggered by factors other than fame or popularity. <br/>
The two peaks coincide exactly with the announcement of the concert at the National Stadium and the start of ticket sales. See https://www.bandwagon.asia/articles/bts-to-perform-in-singapore-in-january
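To list the dates of the peaks instead of reading them off the chart, one option is to find local maxima and keep the highest ones. This is a sketch with a hypothetical helper `top_peaks`, assuming pandas and the `date` column from `reset_index()`. A plain `df.nlargest(2, "BTS")` could return two neighbouring points from the same peak, which is why only local maxima are ranked here.

```python
import pandas as pd


def top_peaks(df: pd.DataFrame, keyword: str, n: int = 2) -> pd.DataFrame:
    """Return the `n` highest local maxima of `keyword`, in date order.

    A row counts as a local maximum if its value is >= the previous row's and
    > the next row's, so a flat top is counted once; the first and last rows
    are compared only with their single neighbour.
    """
    values = df[keyword]
    prev_vals = values.shift(1)
    next_vals = values.shift(-1)
    is_peak = (prev_vals.isna() | (values >= prev_vals)) & (next_vals.isna() | (values > next_vals))
    peaks = df.loc[is_peak, ["date", keyword]]
    return peaks.nlargest(n, keyword).sort_values("date")
```

For example, `top_peaks(df, "BTS")` gives the two highest distinct peaks, whose dates can then be checked against news such as the concert announcement.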

## Example 2: Haze and Google Trends

### My question
**Do 'haze' searches correlate with when haze actually occurred?** <br/>
It may be challenging to find old news about haze (to determine when it occurred), but it's an interesting question.

### Predict the answer
When haze occurs, people want to know when it will end, what items (e.g., masks) they need, and so on. Thus, I expect 'haze' searches to align well with actual haze events.

### Data Collection


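```python
# Interest in "Haze" in Singapore since 2004, the earliest year Google Trends covers
keywords = ["Haze"]
pytrends.build_payload(keywords, geo='SG', timeframe='2004-01-01 2021-01-03', cat=0)
df = pytrends.interest_over_time()
df.reset_index(inplace=True)  # move the date index into a 'date' column
```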

```python
import plotly.express as px
fig = px.line(df, x="date", y="Haze", title='Haze popularity in Google Search')
fig.show()
```
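One way to put a number on "correlate" is the Pearson correlation between the search series and a 0/1 indicator of months with reported haze; with a binary variable this is the point-biserial correlation. A minimal sketch, assuming pandas and a hypothetical, hand-built list `event_months` of "YYYY-MM" strings compiled from news sources (Google Trends does not provide such a list). The helper name `correlation_with_events` is also hypothetical.

```python
from typing import Sequence

import pandas as pd


def correlation_with_events(df: pd.DataFrame, keyword: str, event_months: Sequence[str]) -> float:
    """Pearson correlation between search interest and a 0/1 event indicator.

    `event_months` is a hand-built list of "YYYY-MM" strings for months in
    which haze was reported; every row whose date falls in one of those
    months is marked 1, all other rows 0.
    """
    month = df["date"].dt.strftime("%Y-%m")
    indicator = month.isin(list(event_months)).astype(int)
    return float(df[keyword].corr(indicator))
```

A value close to 1 would mean searches rise mostly in haze months; a value near 0 would mean the two are unrelated. The result depends heavily on how completely the event list is compiled.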

### Comparison

The biggest peak, in 2013, coincides with the worst haze, ["causing record high levels of pollution in Singapore and several parts of Malaysia."](https://en.wikipedia.org/wiki/2013_Southeast_Asian_haze#:~:text=The%202013%20Southeast%20Asian%20haze%20was%20notable%20for%20causing%20record,the%201997%20Southeast%20Asian%20Haze)

Similarly, the 2015 haze was [severe](https://en.wikipedia.org/wiki/2015_Southeast_Asian_haze).

In 2019, ["Indonesia began to experience haze between June and July. Malaysia was affected from August, while Singapore, Brunei, and Vietnam experienced haze in September."](https://en.wikipedia.org/wiki/2019_Southeast_Asian_haze) Thus, it would be interesting to compare when 'haze' searches peak in each of these Southeast Asian countries.
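To compare the timing across countries, one could collect a separate series per country and find each one's peak date. A sketch, assuming pandas and a dict `frames` that maps country codes to DataFrames fetched as in the cells above; `peak_date_by_country` is a hypothetical helper.

```python
from typing import Dict

import pandas as pd


def peak_date_by_country(frames: Dict[str, pd.DataFrame], keyword: str) -> pd.Series:
    """Return each country's date of highest interest, earliest first.

    `frames` maps a country code (e.g. "SG") to a DataFrame with a 'date'
    column and a `keyword` column, as produced by interest_over_time()
    followed by reset_index().
    """
    peaks = {geo: df.loc[df[keyword].idxmax(), "date"] for geo, df in frames.items()}
    return pd.Series(peaks).sort_values()
```

Each frame could be fetched with `pytrends.build_payload(["Haze"], geo=geo, timeframe='2019-01-01 2019-12-31')` followed by `interest_over_time()` and `reset_index()`, for `geo` in `["ID", "MY", "SG", "BN", "VN"]`. Because Google Trends rescales each request to 0-100 on its own, the peak dates are comparable across countries but the peak heights are not.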

## Example 3: Wine and Christmas

### My question
**Is 'wine' more popular on some particular days (e.g., Christmas)?**<br/>
Anecdotally, wine seems to be more popular during certain periods. Does Google Trends capture this?

### Predict the answer
Search peaks might also occur on other days; I am not sure whether Christmas week has the highest peak.

### Data collection

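```python
# Interest in "Wine" in Singapore over roughly the last five years
keywords = ["Wine"]
pytrends.build_payload(keywords, geo='SG', timeframe='2016-01-01 2021-01-03', cat=0)
df = pytrends.interest_over_time()
df.reset_index(inplace=True)  # move the date index into a 'date' column
```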

```python
import plotly.express as px
fig = px.line(df, x="date", y="Wine", title='Wine popularity in Google Search')
fig.show()
```
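To check whether the annual peak lands in the Christmas season every year, one can take the highest point of each calendar year. A sketch, assuming pandas and a hypothetical helper `yearly_peaks`. Google Trends returns daily values only for short timeframes, so over this multi-year window each value covers a week or more and this check resolves periods, not individual days.

```python
import pandas as pd


def yearly_peaks(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return, for each calendar year, the row with the highest interest.

    Adds an 'in_december' flag so the Christmas-season pattern can be checked
    year by year. A year only partly covered by the timeframe (e.g. a few
    days of 2021) still gets a row, so interpret it with care.
    """
    idx = df.groupby(df["date"].dt.year)[keyword].idxmax()
    peaks = df.loc[idx.to_numpy(), ["date", keyword]].reset_index(drop=True)
    peaks["in_december"] = peaks["date"].dt.month == 12
    return peaks
```

For example, `yearly_peaks(df, "Wine")` would show whether every full year's maximum falls in December.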

### Comparison

Surprisingly, there is a strong annual cycle that peaks in the Christmas season!<br/>
It would be interesting to see whether wine is also popular in other countries during the Christmas season.

In particular, the 2020 peak is 30 points higher (on Google Trends' 0-100 scale) than in previous years. <br/>
This might reflect social distancing and limits on social gatherings: people may be holding smaller home parties with wine instead.
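The size of the 2020 jump can be computed rather than read off the chart. A sketch with a hypothetical helper, assuming pandas: it compares one year's peak with the mean of the earlier years' peaks. Values are on Google Trends' relative 0-100 scale, so the gap is in index points, not a percentage change in search volume.

```python
import pandas as pd


def peak_gap_vs_previous_years(df: pd.DataFrame, keyword: str, year: int) -> float:
    """Points by which `year`'s peak exceeds the average peak of earlier years."""
    yearly_max = df.groupby(df["date"].dt.year)[keyword].max()
    earlier = yearly_max[yearly_max.index < year]
    return float(yearly_max[year] - earlier.mean())
```

For example, `peak_gap_vs_previous_years(df, "Wine", 2020)` on the data above. Averaging the earlier peaks is one choice; comparing against the 2019 peak alone would be another.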