# Data Exploration - Making Sense of Google Search Data

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

df_bitcoin = pd.read_csv("data/Bitcoin Search Trend.csv")
df_bitcoin_daily = pd.read_csv("data/Daily Bitcoin Price.csv")
df_tesla = pd.read_csv("data/TESLA Search Trend vs Price.csv")
df_unemployment = pd.read_csv("data/UE Benefits Search vs UE Rate 2004-19.csv")

### What are the shapes of the DataFrames?

### How many rows & columns do they have?

In [2]:
df_tesla.shape

(124, 3)

In [3]:
df_unemployment.shape

(181, 3)

In [4]:
df_bitcoin.shape

(73, 2)

In [5]:
df_bitcoin_daily.shape

(2204, 3)

### What are the column names?

In [6]:
df_tesla.columns

Index(['MONTH', 'TSLA_WEB_SEARCH', 'TSLA_USD_CLOSE'], dtype='object')

In [7]:
df_unemployment.columns

Index(['MONTH', 'UE_BENEFITS_WEB_SEARCH', 'UNRATE'], dtype='object')

In [8]:
df_bitcoin.columns

Index(['MONTH', 'BTC_NEWS_SEARCH'], dtype='object')

In [9]:
df_bitcoin_daily.columns

Index(['DATE', 'CLOSE', 'VOLUME'], dtype='object')

### What is the largest number in the search data column? Try using the `.describe()` method.

In [10]:
df_tesla.describe()

| | TSLA_WEB_SEARCH | TSLA_USD_CLOSE |
|---|---|---|
| count | 124.0 | 124.0 |
| mean | 8.725806 | 50.962145 |
| std | 5.870332 | 65.908389 |
| min | 2.0 | 3.896 |
| 25% | 3.75 | 7.3525 |
| 50% | 8.0 | 44.653 |
| 75% | 12.0 | 58.991999 |
| max | 31.0 | 498.320007 |


In [11]:
df_unemployment.describe()

| | UE_BENEFITS_WEB_SEARCH | UNRATE |
|---|---|---|
| count | 181.0 | 181.0 |
| mean | 35.110497 | 6.21768 |
| std | 20.484925 | 1.891859 |
| min | 14.0 | 3.7 |
| 25% | 21.0 | 4.7 |
| 50% | 26.0 | 5.4 |
| 75% | 45.0 | 7.8 |
| max | 100.0 | 10.0 |


In [12]:
df_bitcoin.describe()

| | BTC_NEWS_SEARCH |
|---|---|
| count | 73.0 |
| mean | 15.013699 |
| std | 15.146959 |
| min | 3.0 |
| 25% | 5.0 |
| 50% | 14.0 |
| 75% | 18.0 |
| max | 100.0 |


In [13]:
df_bitcoin_daily.describe()

| | CLOSE | VOLUME |
|---|---|---|
| count | 2203.0 | 2203.0 |
| mean | 4429.421245 | 8043622000.0 |
| std | 4148.150071 | 11765290000.0 |
| min | 178.102997 | 5914570.0 |
| 25% | 433.629502 | 60299150.0 |
| 50% | 3637.52002 | 2018890000.0 |
| 75% | 7997.372803 | 13224780000.0 |
| max | 19497.400391 | 74156770000.0 |
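
Note that the count for `CLOSE` and `VOLUME` is 2203, one less than the 2204 rows reported by `.shape`. Since `.describe()` counts only non-null entries, this suggests one row has missing values. Below is a minimal sketch of how such a gap could be located. It uses a small made-up DataFrame (`df_demo`, with invented prices and volumes) rather than the real file, so the names and numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_bitcoin_daily; the values are illustrative only.
df_demo = pd.DataFrame({
    "DATE": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "CLOSE": [7200.17, np.nan, 7344.88],
    "VOLUME": [1.85e10, np.nan, 2.81e10],
})

# .shape counts every row, while .count() skips missing (NaN) entries.
n_rows = df_demo.shape[0]
non_null = df_demo["CLOSE"].count()

# Number of missing values in each column.
missing_per_column = df_demo.isna().sum()

# The rows that contain at least one missing value.
rows_with_missing = df_demo[df_demo.isna().any(axis=1)]

print(n_rows, non_null)
print(missing_per_column)
print(rows_with_missing)
```

Running the same three checks on `df_bitcoin_daily` should show which row accounts for the difference between the row count and the `.describe()` count.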


### What do the Search Numbers mean?

The `.describe()` output shows that the search interest values in our DataFrames all fall between 0 and 100. But what does that mean? Google defines the values of search interest as:

> Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.

In other words, the actual search volume of a term is not publicly available. Google only publishes a scaled number: each data point is divided by the total number of searches in the geography and time range it represents, so that values reflect relative popularity rather than raw counts.



For each word in your search, Google finds how much search volume in each region and time period your term had relative to all the searches in that region and time period. It then combines all of these measures into a single measure of popularity, and then it scales the values across your topics, so the largest measure is set to 100. In short: Google Trends doesn’t exactly tell you how many searches occurred for your topic, but it does give you a nice proxy.
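
The scaling described above can be sketched in a few lines of pandas. This is only an illustration of the idea (share of all searches, then rescaled so the peak is 100), not Google's actual procedure. The function name `scale_interest`, the column names, and all the counts are made up for the example.

```python
import pandas as pd

# Hypothetical raw counts; Google does not publish numbers like these.
raw = pd.DataFrame({
    "MONTH": ["2020-01", "2020-02", "2020-03", "2020-04"],
    "term_searches": [200, 500, 300, 50],
    "all_searches": [10_000, 10_000, 12_000, 10_000],
})

def scale_interest(term: pd.Series, total: pd.Series) -> pd.Series:
    """Turn raw counts into a 0-100 relative interest score."""
    # Step 1: the term's share of all searches in each period.
    share = term / total
    # Step 2: rescale so the period with the highest share becomes 100.
    return (share / share.max() * 100).round().astype(int)

raw["interest"] = scale_interest(raw["term_searches"], raw["all_searches"])
print(raw[["MONTH", "interest"]])
```

In this toy data, March has more raw searches than January (300 vs 200) but also more total searches, so its score is driven by its share rather than its raw count. The peak month is always 100 by construction.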

Here are the Google Trends Search Parameters that I used to generate the .csv data:

- "Tesla", Worldwide, Web Search

- "Bitcoin", Worldwide, News Search

- "Unemployment Benefits", United States, Web Search