In [16]:
%load_ext autoreload
%autoreload 2

In [17]:
from data_utils import (generate_half_year_tuples, 
                        query_yt_videos, 
                        extract_pageInfo, 
                        extract_videoInfo_from_all,
                        collect_commentInfo,
                        collect_videoDetails)

Data collection has two main parts: (1) searching YouTube for videos and (2) querying top-level comments using the resulting video IDs. Video statistics are then fetched for the same video IDs.

## Query videos

In [18]:
FITSPO_Q = "fitspo|fitspiration"
BODY_POSI_Q = "bodypositive|bodypositivity"

In [2]:
# Generate half-year (start, end) timestamp tuples covering 2013 through 2023
years_to_query = generate_half_year_tuples(list(range(2013, 2024)))
years_to_query

[('2013-01-01T00:00:00Z', '2013-06-30T23:59:59Z'),
 ('2013-07-01T00:00:00Z', '2013-12-31T23:59:59Z'),
 ('2014-01-01T00:00:00Z', '2014-06-30T23:59:59Z'),
 ('2014-07-01T00:00:00Z', '2014-12-31T23:59:59Z'),
 ('2015-01-01T00:00:00Z', '2015-06-30T23:59:59Z'),
 ('2015-07-01T00:00:00Z', '2015-12-31T23:59:59Z'),
 ('2016-01-01T00:00:00Z', '2016-06-30T23:59:59Z'),
 ('2016-07-01T00:00:00Z', '2016-12-31T23:59:59Z'),
 ('2017-01-01T00:00:00Z', '2017-06-30T23:59:59Z'),
 ('2017-07-01T00:00:00Z', '2017-12-31T23:59:59Z'),
 ('2018-01-01T00:00:00Z', '2018-06-30T23:59:59Z'),
 ('2018-07-01T00:00:00Z', '2018-12-31T23:59:59Z'),
 ('2019-01-01T00:00:00Z', '2019-06-30T23:59:59Z'),
 ('2019-07-01T00:00:00Z', '2019-12-31T23:59:59Z'),
 ('2020-01-01T00:00:00Z', '2020-06-30T23:59:59Z'),
 ('2020-07-01T00:00:00Z', '2020-12-31T23:59:59Z'),
 ('2021-01-01T00:00:00Z', '2021-06-30T23:59:59Z'),
 ('2021-07-01T00:00:00Z', '2021-12-31T23:59:59Z'),
 ('2022-01-01T00:00:00Z', '2022-06-30T23:59:59Z'),
 ('2022-07-01T00:00:00Z', '2022
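The implementation of `generate_half_year_tuples` lives in `data_utils` and is not shown here. A minimal sketch that reproduces the output format above might look like the following; the name `half_year_ranges` is hypothetical, and the fixed date boundaries are an assumption read off the printed tuples.

```python
def half_year_ranges(years):
    """Return (start, end) RFC 3339 timestamp pairs, two per year.

    Hypothetical stand-in for data_utils.generate_half_year_tuples;
    the real function may be written differently.
    """
    ranges = []
    for year in years:
        # First half: 1 January through 30 June
        ranges.append((f"{year}-01-01T00:00:00Z", f"{year}-06-30T23:59:59Z"))
        # Second half: 1 July through 31 December
        ranges.append((f"{year}-07-01T00:00:00Z", f"{year}-12-31T23:59:59Z"))
    return ranges


windows = half_year_ranges(range(2013, 2024))
print(len(windows))  # 22 windows: 11 years x 2 halves
print(windows[0])
```

Splitting each year into two windows keeps each search narrow, which matters because a single search returns only a limited number of results.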

In [21]:
query_yt_videos(years_to_query)

('2013-01-01T00:00:00Z', '2013-06-30T23:59:59Z')
Search results for fitspo|fitspiration from 2013-01-01T00:00:00Z to 2013-06-30T23:59:59Z saved to pickle file
Search results for bodypositive|bodypositivity from 2013-01-01T00:00:00Z to 2013-06-30T23:59:59Z saved to pickle file
('2013-07-01T00:00:00Z', '2013-12-31T23:59:59Z')
Search results for fitspo|fitspiration from 2013-07-01T00:00:00Z to 2013-12-31T23:59:59Z saved to pickle file
Search results for bodypositive|bodypositivity from 2013-07-01T00:00:00Z to 2013-12-31T23:59:59Z saved to pickle file
('2014-01-01T00:00:00Z', '2014-06-30T23:59:59Z')
Search results for fitspo|fitspiration from 2014-01-01T00:00:00Z to 2014-06-30T23:59:59Z saved to pickle file
Search results for bodypositive|bodypositivity from 2014-01-01T00:00:00Z to 2014-06-30T23:59:59Z saved to pickle file
('2014-07-01T00:00:00Z', '2014-12-31T23:59:59Z')
Search results for fitspo|fitspiration from 2014-07-01T00:00:00Z to 2014-12-31T23:59:59Z saved to pickle file
Search res
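The log above suggests that `query_yt_videos` issues one search per keyword and time window. The function body is not shown, so the sketch below is only an assumption about how the request parameters could be assembled for the YouTube Data API `search.list` endpoint. The helper name `build_search_params` is hypothetical; the parameter names (`q`, `publishedAfter`, `publishedBefore`, `maxResults`, `pageToken`) are the ones the API defines, and `|` in `q` acts as a Boolean OR.

```python
def build_search_params(keyword, published_after, published_before, page_token=None):
    """Assemble keyword arguments for a search.list request.

    Hypothetical helper: the real query_yt_videos may use other settings.
    """
    params = {
        "part": "snippet",
        "q": keyword,                      # e.g. "fitspo|fitspiration" (| means OR)
        "type": "video",                   # restrict results to videos
        "publishedAfter": published_after,
        "publishedBefore": published_before,
        "maxResults": 50,                  # the maximum search.list allows per page
    }
    if page_token:
        # Token from a previous response, used to request the next page
        params["pageToken"] = page_token
    return params


params = build_search_params(
    "fitspo|fitspiration", "2013-01-01T00:00:00Z", "2013-06-30T23:59:59Z"
)
# With google-api-python-client, these would typically be passed as:
#   youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)
#   response = youtube.search().list(**params).execute()
print(params)
```

Keeping parameter construction separate from the network call makes it easy to inspect exactly what was requested for each window.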

## Organize Video Search Data

### Extract Page Info

In [26]:
pageInfo_df = extract_pageInfo(years_to_query)

According to the YouTube Data API documentation:
- ```total_results```: The total number of results in the result set. Please note that the value is an approximation and may not represent an exact value. In addition, the maximum value is 1,000,000.
- ```returned_results```: The number of results included in the API response.
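The two fields above correspond to the `pageInfo` object of a `search.list` response. As a rough sketch, one row of `pageInfo_df` could be built as shown below. The helper name `summarize_page_info` and the sample response are hypothetical, and mapping `returned_results` to `resultsPerPage` is an assumption based on the matching description; `extract_pageInfo` itself is not shown and may differ.

```python
def summarize_page_info(response, keyword, published_after, published_before):
    """Flatten the pageInfo part of a search.list response into one row.

    Hypothetical helper; assumes returned_results maps to resultsPerPage.
    """
    page_info = response.get("pageInfo", {})
    return {
        "keyword": keyword,
        "published_after": published_after,
        "published_before": published_before,
        "total_results": page_info.get("totalResults"),      # approximate count
        "returned_results": page_info.get("resultsPerPage"),  # results in this response
    }


# Hand-built response fragment with illustrative values, not real API output
sample_response = {"pageInfo": {"totalResults": 1234, "resultsPerPage": 50}, "items": []}

row = summarize_page_info(
    sample_response, "fitspo|fitspiration",
    "2013-01-01T00:00:00Z", "2013-06-30T23:59:59Z",
)
print(row)
```

Because `totalResults` is only an estimate, it is best read as an order-of-magnitude signal rather than as the number of videos actually retrievable.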

In [28]:
pageInfo_df.head()

Unnamed: 0,keyword,published_after,published_before,total_results,returned_results
0,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z,2322,46
1,bodypositive|bodypositivity,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z,9172,0
2,fitspo|fitspiration,2013-07-01T00:00:00Z,2013-12-31T23:59:59Z,1995,50
3,bodypositive|bodypositivity,2013-07-01T00:00:00Z,2013-12-31T23:59:59Z,8593,1
4,fitspo|fitspiration,2014-01-01T00:00:00Z,2014-06-30T23:59:59Z,4183,46


In [29]:
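# Select the numeric columns explicitly; the first positional argument of
# GroupBy.sum() is numeric_only, not a column name
pageInfo_df.groupby("keyword")[["total_results", "returned_results"]].sum()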

keyword,total_results,returned_results
bodypositive|bodypositivity,7382354,752
fitspo|fitspiration,564977,1045


### Extract Video Info

In [44]:
video_info_df = extract_videoInfo_from_all(year_list=years_to_query)

Extracted video info for fitspo|fitspiration from 2013-01-01T00:00:00Z to 2013-06-30T23:59:59Z
Extracted video info for bodypositive|bodypositivity from 2013-01-01T00:00:00Z to 2013-06-30T23:59:59Z
Extracted video info for fitspo|fitspiration from 2013-07-01T00:00:00Z to 2013-12-31T23:59:59Z
Extracted video info for bodypositive|bodypositivity from 2013-07-01T00:00:00Z to 2013-12-31T23:59:59Z
Extracted video info for fitspo|fitspiration from 2014-01-01T00:00:00Z to 2014-06-30T23:59:59Z
Extracted video info for bodypositive|bodypositivity from 2014-01-01T00:00:00Z to 2014-06-30T23:59:59Z
Extracted video info for fitspo|fitspiration from 2014-07-01T00:00:00Z to 2014-12-31T23:59:59Z
Extracted video info for bodypositive|bodypositivity from 2014-07-01T00:00:00Z to 2014-12-31T23:59:59Z
Extracted video info for fitspo|fitspiration from 2015-01-01T00:00:00Z to 2015-06-30T23:59:59Z
Extracted video info for bodypositive|bodypositivity from 2015-01-01T00:00:00Z to 2015-06-30T23:59:59Z
Extracted 

In [70]:
video_info_df.head()

Unnamed: 0,video_id,title,description,published_at,channel_id,channel_title,thumbnail_url,keyword,published_after,published_before
0,nhRzkGpdQdk,Female Fitness Motivation - Success is a Journey,Follow me on Facebook: http://www.facebook.com...,2013-03-12T17:09:50Z,UC3uDucFPe-E0I6CfxtHWZ6w,ShaQxTV,https://i.ytimg.com/vi/nhRzkGpdQdk/hqdefault.jpg,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z
1,sIu61gS8Qag,Fitspiration Video,This is a video I made for me to stay motivate...,2013-02-15T22:11:00Z,UC87WGiHFdgzRf6XRpbIcg4Q,Samora Lewis,https://i.ytimg.com/vi/sIu61gS8Qag/hqdefault.jpg,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z
2,Y4UyIC36VqQ,"Real Girl Weight loss, Before and After. Fitspo",Find us at http://alluring-beauties.tumblr.com...,2013-02-11T03:44:28Z,UC9aRTMF4m6aBzkyhchwsWHQ,AlluringBeautiesPage,https://i.ytimg.com/vi/Y4UyIC36VqQ/hqdefault.jpg,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z
3,HQtyF3YmYpw,Fitspo for girls with curves 2013 (curvespo),just some weight lose/toning inspiration for y...,2013-05-07T03:33:06Z,UCLIeO7AnzCpod8GiZsxyx5w,LovesIt50,https://i.ytimg.com/vi/HQtyF3YmYpw/hqdefault.jpg,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z
4,XXrvhTVlm0s,FITSPIRATION - Fitnessmodel,hope you like this Fitspo Video :D want to los...,2013-03-16T18:17:29Z,UCgsNAoSMqikDqYOUXyasaLg,fitandthin,https://i.ytimg.com/vi/XXrvhTVlm0s/hqdefault.jpg,fitspo|fitspiration,2013-01-01T00:00:00Z,2013-06-30T23:59:59Z


In [26]:
import pandas as pd
video_info_df = pd.read_parquet('query_results/video_info.parquet')

In [27]:
video_id_list = video_info_df['video_id'].to_list()

## Query Comments

In [57]:
comment_df = collect_commentInfo(video_id_list)
# Every "Failed to collect" message below is due to comments being disabled on that video

Processed 10 video_ids
Processed 20 video_ids
Processed 30 video_ids
Processed 40 video_ids
Failed to collect comments for 1CPH_BY93nU
Processed 50 video_ids
Processed 60 video_ids
Processed 70 video_ids
Failed to collect comments for do1nispsN8M
Processed 80 video_ids
Processed 90 video_ids
Processed 100 video_ids
Failed to collect comments for _C6vl3JNZdE
Processed 110 video_ids
Failed to collect comments for ovrlw1WWfnc
Processed 120 video_ids
Processed 130 video_ids
Failed to collect comments for YYpNeP5GJm0
Processed 140 video_ids
Failed to collect comments for qKkWFCtao_k
Processed 150 video_ids
Processed 160 video_ids
Processed 170 video_ids
Processed 180 video_ids
Processed 190 video_ids
Processed 200 video_ids
Processed 210 video_ids
Processed 220 video_ids
Processed 230 video_ids
Processed 240 video_ids
Failed to collect comments for T4CYw_xNHEM
Processed 250 video_ids
Processed 260 video_ids
Processed 270 video_ids
Processed 280 video_ids
Processed 290 video_ids
Failed to co

In [59]:
comment_df.head()

Unnamed: 0,comment_id,text,comment_published_at,like_count,video_id,num_comments
0,UgyPI5QsDXr7wmUUvKt4AaABAg,"After 5 years, this is still my favorite motiv...",2018-10-01T16:21:08Z,42,nhRzkGpdQdk,100
1,UgjqhVSXlEHs-ngCoAEC,success really is a journey. Been hitting the ...,2014-10-08T10:31:17Z,55,nhRzkGpdQdk,100
2,Ughwdk3IOKhBf3gCoAEC,"I watch this whenever I feel unmotivated, and ...",2013-11-16T20:46:57Z,40,nhRzkGpdQdk,100
3,Ugh0d-7x5OwuZngCoAEC,This is a motivational video! Strong women bei...,2016-01-17T01:54:14Z,30,nhRzkGpdQdk,100
4,UgyPSoWbZoSYeNvs4eh4AaABAg,Love this video I watched this video 6 years a...,2022-05-26T04:14:29Z,4,nhRzkGpdQdk,100
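Most columns above map onto fields of the API's `commentThreads` resource, where each item wraps a top-level comment. The sketch below shows one way such an item could be flattened into a row. It is an assumption about what `collect_commentInfo` does internally, the helper name `flatten_comment_thread` is hypothetical, and the `num_comments` column is left out because its definition is not shown.

```python
def flatten_comment_thread(item):
    """Turn one commentThreads item into a flat row for a DataFrame.

    Hypothetical helper; field paths follow the commentThreads resource.
    """
    top_level = item["snippet"]["topLevelComment"]
    snippet = top_level["snippet"]
    return {
        "comment_id": top_level["id"],
        "text": snippet["textDisplay"],
        "comment_published_at": snippet["publishedAt"],
        "like_count": snippet["likeCount"],
        "video_id": item["snippet"]["videoId"],
    }


# Hand-built item with placeholder values, not real API output
sample_item = {
    "snippet": {
        "videoId": "VIDEO_ID",
        "topLevelComment": {
            "id": "COMMENT_ID",
            "snippet": {
                "textDisplay": "Example comment text",
                "publishedAt": "2020-01-01T00:00:00Z",
                "likeCount": 3,
            },
        },
    }
}

print(flatten_comment_thread(sample_item))
```

A list of such rows can be passed directly to `pd.DataFrame(rows)` to obtain a table with the column names used above.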


In [60]:
comment_df['video_id'].nunique()

1168

## Query Video Stats

In [29]:
video_details_df, not_collected = collect_videoDetails(video_id_list)

Processed 10 video_ids
Processed 20 video_ids
Processed 30 video_ids
Processed 40 video_ids
Processed 50 video_ids
Processed 60 video_ids
Processed 70 video_ids
Processed 80 video_ids
Processed 90 video_ids
Processed 100 video_ids
Processed 110 video_ids
Processed 120 video_ids
Processed 130 video_ids
Processed 140 video_ids
Processed 150 video_ids
Processed 160 video_ids
Processed 170 video_ids
Processed 180 video_ids
Processed 190 video_ids
Processed 200 video_ids
Processed 210 video_ids
Processed 220 video_ids
Processed 230 video_ids
Processed 240 video_ids
Processed 250 video_ids
Processed 260 video_ids
Processed 270 video_ids
Processed 280 video_ids
Processed 290 video_ids
Processed 300 video_ids
Processed 310 video_ids
Processed 320 video_ids
Processed 330 video_ids
Processed 340 video_ids
Processed 350 video_ids
Processed 360 video_ids
Processed 370 video_ids
Processed 380 video_ids
Processed 390 video_ids
Processed 400 video_ids
Processed 410 video_ids
Processed 420 video_ids
P

In [30]:
video_details_df.head()

Unnamed: 0,description,tags,category_id,title,channel_title,view_count,like_count,comment_count,duration,definition,video_id
0,Follow me on Facebook:\nhttp://www.facebook.co...,"[Bodybuilding Motivation, ShaQx, Female Fitnes...",17,Female Fitness Motivation - Success is a Journey,ShaQx,9762915,41921,1920,PT3M12S,hd,nhRzkGpdQdk
1,This is a video I made for me to stay motivate...,"[fit, fitblr, fitspiration video, abs, motivat...",22,Fitspiration Video,Samora Lewis,211373,935,50,PT4M10S,hd,sIu61gS8Qag
2,Find us at http://alluring-beauties.tumblr.com...,"[girls, weightloss, fitspo, thinsp, skinny, sl...",22,"Real Girl Weight loss, Before and After. Fitspo",AlluringBeautiesPage,108440,263,35,PT1M53S,hd,Y4UyIC36VqQ
3,just some weight lose/toning inspiration for y...,"[thinspo, curves, curvy, sexy, curvespo, lose ...",26,Fitspo for girls with curves 2013 (curvespo),LovesIt50,63819,83,3,PT4M13S,sd,HQtyF3YmYpw
4,hope you like this Fitspo Video :D \nwant to l...,"[fitspiration, fitspo, fitness, thin, thinspir...",22,FITSPIRATION - Fitnessmodel,fitandthin,30885,130,6,PT2M9S,sd,XXrvhTVlm0s
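The `duration` column holds ISO 8601 durations such as `PT3M12S`, which are awkward to compare or aggregate directly. A minimal sketch for converting them to seconds is shown below; the function name `duration_to_seconds` is hypothetical, and the parser only handles day, hour, minute, and second components.

```python
import re

# Matches durations such as PT3M12S, PT1H2M, or P1DT30S
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unsupported duration format: {duration!r}")
    # Components that are absent from the string count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(duration_to_seconds("PT3M12S"))  # 192
print(duration_to_seconds("PT2M9S"))   # 129
```

With pandas, this could be applied as `video_details_df["duration"].map(duration_to_seconds)` to add a numeric duration column.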


In [31]:
not_collected  # an empty list means details were collected for every video

[]