# What topics drive the Trending page on YouTube?
---

The purpose of this project is to find which topics dominate the [**Trending**](https://www.youtube.com/feed/trending) page of YouTube.

If you want to understand the code I'm using to get the data, visit the [**GitHub repo**](https://github.com/germarr/youtube_trending_videos) that I created for that script. There you will find a tutorial that explains all the code in greater detail.

## Index
---

1. [**Getting The Data**](#Getting-The-Data)
2. [**Adjusting The Data**](#Adjusting-The-Data)
3. [**Data Analysis**](#Data-Analysis)
4. [**Findings**](#Findings)
5. [**Conclusions**](#Conclusions)

## Getting The Data
---
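1. Install the [Google API Python client](https://github.com/googleapis/google-api-python-client) with `pip`, along with pandas and matplotlib, which are imported in the next step. The `datetime` module used below is part of the Python standard library and needs no installation; the [DateTime](https://pypi.org/project/DateTime/) package on PyPI is a separate project and is not required here.

```console
pip install google-api-python-client
pip install pandas matplotlib
```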

2. The libraries that are required for this project are:

In [192]:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime,date

%load_ext autoreload
%autoreload 2
import get_video_list as gvl
import merge_day as mer

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


3. To handle dates, I created three variables: `date_new` holds the current date in ISO format, `title` combines that date with the current hour, and `title_v` holds the same date without the hour. The exported CSV file is named after the value stored in `title`.

In [33]:
now = datetime.now()  # read the clock once so every value refers to the same moment
date_new = now.date().isoformat()  # ISO date string, YYYY-MM-DD
title_v = date_new.replace("-", "_")  # YYYY_MM_DD
title = f"{title_v}_{now.hour}"  # YYYY_MM_DD_H (hour is not zero-padded)

4. I created a Python file called `get_video_list.py`. It contains the functions needed to get all the videos from the Trending page of a country. When the cell runs, it asks for an `API_KEY`. To get this key, I recommend following this [**tutorial**](https://developers.google.com/youtube/registering_an_application). Once the key is entered, the code runs and exports a CSV file with the data to the `trending_videos_data` folder.

* <ins>Notes:</ins> 
    * By default it gets the videos from Mexico. To change this, open the `get_video_list.py` script and, on line 19, set the `regionCode` parameter to the two-letter code of the country you want to follow. [Here's](https://www.iso.org/obp/ui/#search) a list of country codes.
    * The code sometimes shows a "Warning" message. The data is still retrieved when it does.

In [144]:
gvl.get_videos()

API_KEY:  [redacted]
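
The script itself is documented in the GitHub repo linked above. As a rough illustration of the kind of request involved, the sketch below asks the YouTube Data API v3 for the most popular videos in one region and flattens each item into a row. It is not the code in `get_video_list.py`: the names `ask_api_key`, `flatten_item`, and `fetch_trending`, and the choice of columns, are assumptions made for this example. The key is read with `getpass` so that it is not echoed in the notebook output.

```python
from getpass import getpass


def ask_api_key():
    """Prompt for the key without displaying it (hypothetical helper)."""
    return getpass("API_KEY: ")


def flatten_item(item):
    """Turn one item of a videos.list response into a flat dict (hypothetical helper)."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "video_title": snippet.get("title"),
        "channel_title": snippet.get("channelTitle"),
        "category_id": snippet.get("categoryId"),
        "published_date": snippet.get("publishedAt"),
        # Statistics arrive as strings and can be absent, so default to 0.
        "views": int(stats.get("viewCount", 0)),
        "likes": int(stats.get("likeCount", 0)),
        "comments": int(stats.get("commentCount", 0)),
        "video_lang": snippet.get("defaultAudioLanguage"),
    }


def fetch_trending(api_key, region_code="MX", max_results=50):
    """Request the 'mostPopular' chart for one region (illustrative only)."""
    # Imported here so flatten_item can be used without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.videos().list(
        part="snippet,statistics",
        chart="mostPopular",
        regionCode=region_code,
        maxResults=max_results,
    ).execute()
    return [flatten_item(item) for item in response.get("items", [])]


# Example call (needs network access and a valid key):
# rows = fetch_trending(ask_api_key(), region_code="MX")
```

Keeping the flattening step separate from the network call makes it possible to check the row layout against a saved response without spending API quota.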


5. Merge all the videos from the current day.

In [188]:
mer.merge_day()

In [193]:
mer.union()
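
The internals of `merge_day` are not shown in this notebook. One plausible way to combine several hourly exports from the same day into a single frame is sketched below. The function name `merge_csv_files` and the assumption that file names start with the day (for example `2021_01_01_9.csv`) are illustrative and do not describe `merge_day.py`.

```python
from pathlib import Path

import pandas as pd


def merge_csv_files(folder, day_prefix):
    """Concatenate every CSV in `folder` whose name starts with `day_prefix`."""
    paths = sorted(Path(folder).glob(f"{day_prefix}*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files match {day_prefix}* in {folder}")
    frames = [pd.read_csv(path, index_col=0) for path in paths]
    # ignore_index gives the merged frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

A call such as `merge_csv_files("trending_videos_data", title_v)` would then gather all exports for the current day, assuming the files follow that naming pattern.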

## Adjusting The Data
---

### Test Files

In [145]:
df = pd.read_csv("test_file.csv", index_col=0)

In [168]:
categories = pd.read_csv("trending_videos_data/categories_mx.csv", index_col=0)

In [200]:
df_day = pd.read_csv("merged_file.csv", index_col=0).merge(categories, on="category_id", how="inner")

In [201]:
union = pd.read_csv("union_file.csv", index_col=0).merge(categories, on="category_id", how="inner")

In [202]:
union.count()

published_date      1000
trending_date       1000
category_id         1000
channel_title       1000
tags                1000
video_title         1000
views               1000
likes               1000
dislikes            1000
comments            1000
description          980
channel_id          1000
link                1000
thumbnail           1000
hour_trending       1000
video_lang            50
count                 50
category_title_x      50
category_title_y    1000
dtype: int64
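
The `category_title_x` and `category_title_y` columns in the output above are what pandas produces when both frames in a merge already contain a column with the same name: it keeps both and appends the `_x` and `_y` suffixes. The sketch below reproduces this on toy data and shows one way to avoid it, by dropping the partly filled column before merging. The toy frames and their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for union_file.csv and categories_mx.csv (values are illustrative).
videos = pd.DataFrame({
    "video_title": ["a", "b", "c"],
    "category_id": [10, 24, 10],
    "category_title": ["Music", None, None],  # only partly filled
})
categories = pd.DataFrame({
    "category_id": [10, 24],
    "category_title": ["Music", "Entertainment"],
})

# Both frames have category_title, so pandas keeps both with _x / _y suffixes.
suffixed = videos.merge(categories, on="category_id", how="inner")

# Dropping the partly filled column first leaves a single category_title.
clean = videos.drop(columns=["category_title"]).merge(
    categories, on="category_id", how="inner"
)
print(list(clean.columns))
```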

## Data Analysis
---

### What features do I have?

In [157]:
for col in df.columns:
    print(col)

published_date
trending_date
category_id
channel_title
tags
video_title
views
likes
dislikes
comments
description
channel_id
link
thumbnail
hour_trending
video_lang
count
category_title


### How many NAs do we have per feature?

In [147]:
df.isna().sum()

published_date     0
trending_date      0
category_id        0
channel_title      0
tags               8
video_title        0
views              0
likes              0
dislikes           0
comments           0
description        1
channel_id         0
link               0
thumbnail          0
hour_trending      0
video_lang        18
count              0
category_title     0
dtype: int64

### What category appears the most?

In [183]:
df_day.pivot_table(index=['category_title'], aggfunc='size').reset_index().sort_values(by=0,ascending=False)

|   | category_title | 0 |
|---|---|---|
| 6 | Music | 24 |
| 3 | Entertainment | 18 |
| 8 | People & Blogs | 14 |
| 4 | Gaming | 4 |
| 5 | Howto & Style | 4 |
| 1 | Comedy | 2 |
| 7 | News & Politics | 2 |
| 0 | Autos & Vehicles | 1 |
| 2 | Education | 1 |
| 9 | Science & Technology | 1 |
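
The same tally can also be written with `value_counts`, which counts and sorts in one step and lets the result column carry a descriptive name instead of `0`. A minimal sketch on toy data follows; the sample rows and the column name `videos` are made up, and in the notebook the input would be `df_day`.

```python
import pandas as pd

# Toy frame standing in for df_day.
sample = pd.DataFrame({
    "category_title": ["Music", "Music", "Gaming", "Music", "Comedy", "Gaming"],
})

# value_counts sorts in descending order by default.
counts = (
    sample["category_title"]
    .value_counts()
    .rename_axis("category_title")
    .reset_index(name="videos")
)
print(counts)
```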


### What languages appear the most?

In [185]:
df.pivot_table(index=['video_lang'], aggfunc='size').reset_index().sort_values(by=0,ascending=False)

|   | video_lang | 0 |
|---|---|---|
| 3 | es-419 | 11 |
| 2 | es | 6 |
| 5 | es-MX | 6 |
| 8 | zxx | 3 |
| 0 | en | 2 |
| 1 | en-US | 1 |
| 4 | es-ES | 1 |
| 6 | ko | 1 |
| 7 | zh-Hant | 1 |
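
Regional variants such as `es-419`, `es-MX`, and `es-ES` are counted separately in the output above. To compare languages as a whole, one option is to keep only the primary subtag (the part before the first hyphen) before counting. The sketch below applies that idea to the counts shown above; the name `primary_lang` is an assumption introduced for illustration.

```python
import pandas as pd

# Counts copied from the output above.
lang_counts = pd.Series({
    "es-419": 11, "es": 6, "es-MX": 6, "zxx": 3, "en": 2,
    "en-US": 1, "es-ES": 1, "ko": 1, "zh-Hant": 1,
})

# Keep the text before the first hyphen: "es-MX" -> "es", "zh-Hant" -> "zh".
primary = lang_counts.index.str.split("-").str[0]

by_language = (
    lang_counts.groupby(primary).sum()
    .sort_values(ascending=False)
    .rename_axis("primary_lang")
)
print(by_language)
```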


## Findings
---

## Conclusions
---