# User Guide

This is an experimental user guide to **Youcreep**.

If you have any questions, please contact [steveflyer](mailto:steveflyer@gmail.com).

In [1]:
import logging

import nest_asyncio
import pandas as pd

from youcreep import YoutubeVideoInfoCrawler, YoutubeCommentCrawler
from youcreep.config.filter_enum import FilterSection, FilterLengthOption, FilterOrderByOption

nest_asyncio.apply()

## 1. VideoInfo Crawler

In [2]:
# configure parameters
n_target = 50               # maximum number of videos to crawl
search_term = "Python"      # search query
filter_config = {           # search options: sort by view count, keep medium-length videos
    FilterSection.ORDER_BY: FilterOrderByOption.VIEW_COUNT,
    FilterSection.LENGTH: FilterLengthOption.MEDIUM,
}

# run the crawl; "async with" ensures the crawler is closed after use
async with YoutubeVideoInfoCrawler(headless=False) as video_crawler:
    video_info_list = await video_crawler.crawl(
        search_term=search_term,
        n_target=n_target,
        filter_options=filter_config,
    )
    
video_info_list[:3]

[VideoInfo(video_id=dVRhRzE_AkQ),
 VideoInfo(video_id=fr6WJG3Uw3M),
 VideoInfo(video_id=XcxwOVNyfFY)]
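
The cell above relies on top-level `async with` and `await`, which Jupyter supports. In a plain Python script the same coroutine has to be driven by an event loop, for example with `asyncio.run`. The sketch below shows only the shape of such a script. `StandInCrawler` is a hypothetical placeholder (not part of Youcreep) so that the example runs without a browser; in real use it would be replaced by `YoutubeVideoInfoCrawler(...)` as in the cell above.

```python
import asyncio


class StandInCrawler:
    """Hypothetical stand-in for a crawler; NOT part of Youcreep."""

    async def __aenter__(self):
        # A real crawler would start its browser here.
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # A real crawler would close its browser here.
        return False

    async def crawl(self, search_term: str, n_target: int) -> list:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return [f"{search_term}-{i}" for i in range(n_target)]


async def main() -> list:
    # Same "async with" pattern as in the notebook cell, wrapped in a coroutine.
    async with StandInCrawler() as crawler:
        return await crawler.crawl(search_term="Python", n_target=3)


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```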

The crawl returns a list of `VideoInfo` objects. Each one has a `to_dict()` method, so the list converts easily to a pandas DataFrame.

In [3]:
video_df = pd.DataFrame([video_info.to_dict() for video_info in video_info_list])

video_df

Unnamed: 0,video_id,title,video_url,is_short,view_count,publish_time,channel_name,channel_url,desc_text
0,dVRhRzE_AkQ,Python的实惠美洲鳄鱼,https://www.youtube.com/watch?v=dVRhRzE_AkQ,False,1.5亿次观看,11年前,ojatro,https://www.youtube.com/@ojatro,网址：http://Ojatro.com\nFacebook的：https://www.fa...
1,fr6WJG3Uw3M,OMG! Capuchin Monkey Save Mouse From Banded Kr...,https://www.youtube.com/watch?v=fr6WJG3Uw3M,False,1.5亿次观看,4年前,Life Of Big Cat,https://www.youtube.com/@lifeofbigcat2374,OMG! Capuchin Monkey Save Mouse From Banded Kr...
2,XcxwOVNyfFY,OMG! Giant Python Hunt Leopard Cubs When Mothe...,https://www.youtube.com/watch?v=XcxwOVNyfFY,False,1.4亿次观看,3年前,Life Of Big Cat,https://www.youtube.com/@lifeofbigcat2374,OMG! Giant Python Hunt Leopard Cubs When Mothe...
3,c98BHS8rSTg,Easy Snake Trap - Build Underground Python Tra...,https://www.youtube.com/watch?v=c98BHS8rSTg,False,1.4亿次观看,3年前,da ra,https://www.youtube.com/@dara-ee4uv,Easy Snake Trap - Build Underground Python Tra...
4,adUlm2m7RrU,Stop Motion ASMR - Big Python eats Alligator G...,https://www.youtube.com/watch?v=adUlm2m7RrU,False,1.1亿次观看,2年前,Unusual Cooking,https://www.youtube.com/@UnusualCooking,"Hello my dear viewers, Unbelievable Fishing at..."
5,dw5eILfy-c8,"Python is too aggressive, Lion Cub mistakes wh...",https://www.youtube.com/watch?v=dw5eILfy-c8,False,8086万次观看,4年前,Alan Tours,https://www.youtube.com/@Alantours4u,"NAP , #NAPanimal Python is too aggressive, Lio..."
6,JgKN3BuvC3E,"Python, Honey Badger & Jackal Fight Each Other",https://www.youtube.com/watch?v=JgKN3BuvC3E,False,6479万次观看,3年前,Caters Video,https://www.youtube.com/@CatersVideo,ID: 3331461 MANDATORY ONSCREEN CREDIT - Rosely...
7,rLycYSY2V6Y,10 Biggest Snakes Ever Discovered,https://www.youtube.com/watch?v=rLycYSY2V6Y,False,5184万次观看,2年前,FactFile,https://www.youtube.com/@FactFile,There are over 3000 species of snake on the pl...
8,5ADMawsgV5M,蜘蛛侠和斗牛犬对抗一只巨蟒,https://www.youtube.com/watch?v=5ADMawsgV5M,False,3767万次观看,1年前,King Of Survival,https://www.youtube.com/@KingOfSurvivalVlogs,蜘蛛侠和斗牛犬对抗一只巨蟒
9,MRnORGSYGCQ,Primitive Survival 4K Video - Build PYTHON Hou...,https://www.youtube.com/watch?v=MRnORGSYGCQ,False,3503万次观看,3年前,Survival Skills Asia,https://www.youtube.com/@survivalskillsasia,Primitive Survival 4K Video - Build PYTHON Hou...
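
In the output above, `view_count` holds the display string shown on the page (for example `1.5亿次观看`), not a number. If numeric counts are needed, the strings can be normalized after the crawl. The helper below is a minimal sketch, not part of Youcreep: `parse_view_count` is a hypothetical name, and it only covers the Chinese formats that appear in this guide's sample outputs. Other locales would need their own rules.

```python
import re
from decimal import Decimal
from typing import Optional

# Multipliers for the magnitude characters seen in the sample outputs.
_UNIT_MULTIPLIERS = {
    "": 1,
    "万": 10_000, "萬": 10_000,            # ten thousand (simplified / traditional)
    "亿": 100_000_000, "億": 100_000_000,  # hundred million (simplified / traditional)
}

# A number with an optional decimal part, followed by an optional magnitude character.
_VIEW_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([万萬亿億]?)")


def parse_view_count(text: str) -> Optional[int]:
    """Return the view count as an int, or None if the text contains no number."""
    match = _VIEW_COUNT_RE.search(text.replace(",", ""))
    if match is None:
        return None
    number, unit = match.groups()
    # Decimal keeps values such as "1.4" exact before scaling.
    return int(Decimal(number) * _UNIT_MULTIPLIERS[unit])
```

With a DataFrame like the one above, `video_df["view_count"].map(parse_view_count)` would produce a numeric column.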


## 2. Comment Crawler

In [4]:
from gembox.debug_utils import ConsoleDebugger

# configure parameters
# test_video_url = video_df.video_url[0]
test_video_url = "https://www.youtube.com/watch?v=c98BHS8rSTg"
n_target = 50

# run the crawl in headless mode (no browser window), with INFO-level logging to trace progress
async with YoutubeCommentCrawler(headless=True, debug_tool=ConsoleDebugger(level=logging.INFO)) as comment_crawler:
    comment_list = await comment_crawler.crawl(
        video_url=test_video_url,
        n_target=n_target,
    )
comment_df = pd.DataFrame([comment.to_dict() for comment in comment_list])

comment_df

2023-08-21 16:23:15,408 - gembox.debug_utils_21c7_logger - INFO - Browser: Go to https://www.youtube.com/
2023-08-21 16:23:17,118 - gembox.debug_utils_21c7_logger - INFO - Browser: Go to https://www.youtube.com/watch?v=c98BHS8rSTg
2023-08-21 16:23:19,080 - gembox.debug_utils_21c7_logger - INFO - Scrolling and loading ytd-comment-renderer...
2023-08-21 16:23:19,081 - gembox.debug_utils_21c7_logger - INFO - Inner Call Scrolling and loading...(selector=ytd-comment-renderer, scroll_step=400, load_wait=40, same_th=100, threshold=50)
2023-08-21 16:23:19,181 - gembox.debug_utils_21c7_logger - INFO - Top unchanged, Scroll top: 189, last top: 189, same count: 1, same_th: 100
2023-08-21 16:23:19,303 - gembox.debug_utils_21c7_logger - INFO - Top unchanged, Scroll top: 189, last top: 189, same count: 2, same_th: 100
2023-08-21 16:23:19,349 - gembox.debug_utils_21c7_logger - INFO - Top unchanged, Scroll top: 189, last top: 189, same count: 3, same_th: 100
2023-08-21 16:23:19,397 - gembox.debug_util

## 3. Multiprocessing Crawler

You can crawl multiple search terms or videos in parallel using multiprocessing.

Each `parallel_crawl` method takes parallel lists of parameters, one entry per job, and `n_workers` sets the number of parallel processes.
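
Because the parameters are passed as parallel lists, the entry at position `i` in each list describes job `i`, so the lists need to have the same length. This guide does not say whether `parallel_crawl` checks that itself, so a small pre-flight check can help. The sketch below uses a hypothetical helper, `pair_jobs`, that is not part of Youcreep.

```python
from typing import List, Sequence, Tuple


def pair_jobs(search_terms: Sequence[str], n_targets: Sequence[int]) -> List[Tuple[str, int]]:
    """Pair each search term with its target count, rejecting mismatched lists."""
    if len(search_terms) != len(n_targets):
        raise ValueError(
            f"got {len(search_terms)} search terms but {len(n_targets)} targets"
        )
    return list(zip(search_terms, n_targets))


# Each tuple is one job: (search term, maximum number of videos).
jobs = pair_jobs(["OpenAI", "Deep Learning", "AI"], [20, 20, 50])
for term, n_target in jobs:
    print(f"{term!r}: up to {n_target} videos")
```

Running the check before the crawl surfaces a length mismatch immediately instead of part-way through a long run.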

### Crawl video info from a list of search terms

In [5]:
# configure parameters
search_terms = ["OpenAI", "Deep Learning", "Machine Learning", "Artificial Intelligence", "AI", "Neural Network", "Reinforcement Learning"]
n_targets = [20] * len(search_terms)

# call the crawler to crawl
await YoutubeVideoInfoCrawler.parallel_crawl(
    search_term_list=search_terms,
    n_target_list=n_targets,
    n_workers=4,                        # number of workers, i.e. number of parallel processes
    output_dir="youtube_output/search", # output directory
)

#### Check the output

In [6]:
video_csv = "youtube_output/search/AI_20_video_info.csv"  # replace with a file produced by your own crawl
video_df = pd.read_csv(video_csv, index_col=0)

video_df

Unnamed: 0,video_id,title,video_url,is_short,view_count,publish_time,channel_name,channel_url,desc_text
0,Pj8pKvRdMIw,【財富的第N本筆記】AI股引爆台股！下一個引爆的概念股是「它」@CtiFinance,https://www.youtube.com/watch?v=Pj8pKvRdMIw,False,觀看次數：1068次,21 小時前,中天電視,https://www.youtube.com/@CtiTv,AI股讓台股紅紅火火，雖然是幾家歡樂幾家愁，甚至讓緯創的股價爬上百元俱樂部，相信讓不少投資人...
1,xUurLG5euDY,一鍵變臉! AI軟體扮公安 可邊換臉邊直播｜TVBS新聞 @TVBSNEWS01,https://www.youtube.com/watch?v=xUurLG5euDY,False,觀看次數：983次,2 天前,TVBS NEWS,https://www.youtube.com/@TVBSNEWS01,持續追蹤這起詐騙案，詐騙集團利用AI變臉軟體，鎖定海外人士詐騙，甚至可以一邊變臉一邊直播，但...
2,1B0OPbaSSH8,網瘋傳AI繪「蠟筆小新島」　超級逼真有房有海｜華視新聞 20230820,https://www.youtube.com/watch?v=1B0OPbaSSH8,False,觀看次數：1593次,22 小時前,華視新聞 CH52,https://www.youtube.com/@CtsTw,您有聽過「蠟筆小新島」嗎？最近外觀神似小新和小白的島嶼空拍照，在網路上瘋傳，不少人說實在好想...
3,XQQjMtOfgpo,【 8 月 AI 新聞精選】ChatGPT 5 新一代 AI 已註冊 多款 AI 難敵中國...,https://www.youtube.com/watch?v=XQQjMtOfgpo,False,觀看次數：3.5萬次,2 天前,UNWIRE.HK,https://www.youtube.com/@unwire,OpenAI 被發現為「GPT- 5」商標申請，新一代AI 已經開發中? 想知更多就要即刻睇...
4,X9MM9vctavM,REACTION AI TEAM NISAA PREGNANT???? AFIQ MENAN...,https://www.youtube.com/watch?v=X9MM9vctavM,False,觀看次數：18萬次,17 小時前,AI Boyyraa,https://www.youtube.com/@Boyyyraa,
5,v4_TTKzfMHA,FIRST IMPRESSION AI TEAM TERHADAP SYAHMIE??? S...,https://www.youtube.com/watch?v=v4_TTKzfMHA,False,觀看次數：3.8萬次,20 小時前,AI Syahmie,https://www.youtube.com/@aisyahmie,HI CITY BOY HERE so content kali ni saya nak t...
6,3QjlrITaAMQ,【理財達人秀】AI股大跌能撿？杜金龍獨家算目標價！官股.投信國家隊護盤 跟單誰？恆大破產人行...,https://www.youtube.com/watch?v=3QjlrITaAMQ,False,觀看次數：18萬次,2 天前,理財達人秀 EBCmoneyshow,https://www.youtube.com/@EBCmoneyshow,在小賈身編：鴻海低點到可接？ #AI #儲能#台股#理財達人秀(00:00) 航海王乘風破浪...
7,PCHskDuG2mw,AI孫燕姿歌曲30首精選，每首歌曲都是最高音質！快來聽你最喜歡的歌曲吧！AI Stefani...,https://www.youtube.com/watch?v=PCHskDuG2mw,False,觀看次數：3.7萬次,2 個月前,YUAN-華語AI音樂私藏,https://www.youtube.com/@YUAN52042,00:00:00【AI孫燕姿】《安靜》翻唱周杰倫00:05:16【AI孫燕姿】《被動》 00...
8,L51HuFs-D10,PRANK BALING PHONE KANDA MIMI !!! FARIDAH KENA...,https://www.youtube.com/watch?v=L51HuFs-D10,False,觀看次數：8.6萬次,21 小時前,AI Nazrul,https://www.youtube.com/@AI.Nazrul,Hidup ni kita kena santaiii... Walaupon kita j...
9,uTryDiddEck,SENIOR AI TEAM AJAR MADKHAN CARA UNTUK PIKAT H...,https://www.youtube.com/watch?v=uTryDiddEck,False,觀看次數：5.3萬次,19 小時前,AI Mad Khan,https://www.youtube.com/@aimadkhan,SO KORANG DAH TENGOK KAN MAD DAH MINTAK TIPS U...
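
The cell above reads a single CSV file. To analyze all search terms together, the files in the output directory can be concatenated. This is a minimal sketch with a hypothetical helper, `load_crawl_outputs`, which is not part of Youcreep. It assumes that `output_dir` contains only CSV files written by the crawl and that they all share the same columns.

```python
from pathlib import Path

import pandas as pd


def load_crawl_outputs(output_dir: str) -> pd.DataFrame:
    """Concatenate every CSV in output_dir, tagging rows with their source file."""
    frames = []
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        frame = pd.read_csv(csv_path, index_col=0)
        frame["source_file"] = csv_path.name  # remember which file each row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame()  # no CSV files found
    return pd.concat(frames, ignore_index=True)
```

For example, `load_crawl_outputs("youtube_output/search")` would return one DataFrame covering every search term crawled so far.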


### Crawl comments from a list of video urls

In [7]:
# take test video URLs from the data we just crawled
video_urls = video_df["video_url"].tolist()

# configure parameters
video_urls = video_urls[:5]  # only the first 5 videos, to keep the run short
n_targets = [30] * len(video_urls)  # at most 30 comments per video

# call the crawler to crawl
await YoutubeCommentCrawler.parallel_crawl(
    video_url_list=video_urls,
    n_target_list=n_targets,
    n_workers=4,                        # number of workers, i.e. number of parallel processes
    output_dir="youtube_output/comment", # output directory
)

#### Check the output

In [11]:
comment_csv = "youtube_output/comment/X9MM9vctavM_30_comment.csv"  # replace with a file produced by your own crawl
comment_df = pd.read_csv(comment_csv, index_col=0)

comment_df

Unnamed: 0,comment_id,is_reply,author_name,author_url,publish_time,parent_comment_id,video_id,author_thumbnail,content_text,like_count
0,Ugy-pt3ecAMZdKhnJHd4AaABAg,False,@Zahirah12,/channel/UCpAZNibWG4EPdje0CYczw4A,16 小時前,,X9MM9vctavM,https://yt3.ggpht.com/eUX9dK2-xAF7RZMnCNFwKVu1...,Tak tau nk ckp mcm mana. The best reaction yg ...,347
1,UgzeFm1mx0ET4ZD66rB4AaABAg,False,@kritta_v1025,/channel/UCiehePtwUNvp-X6J4GaQwHg,15 小時前,,X9MM9vctavM,https://yt3.ggpht.com/KxmPP0H70aXTyXETiWBCewp7...,"ikut terharu liatnya, persahabatan mereka kuat...",104
2,UgxP3_hLXxgF5wijC4B4AaABAg,False,@itsyaff7280,/channel/UCkKvXhWqvlyJ37b2L4qu_3A,1 小時前,,X9MM9vctavM,https://yt3.ggpht.com/q1tY6lZaOgTbFqwV3xfSikUW...,suka tengok friendship diorang. dari zaman buj...,9
3,Ugx5R5h4PqvSgdiV6tl4AaABAg,False,@fatinnajiha5788,/channel/UCs5nhKcc4ng9OCxKhH5YM7g,7 小時前 (已編輯),,X9MM9vctavM,https://yt3.ggpht.com/ytc/AOPolaTIrHTpGYdhipCY...,Alhamdulilah semoga boy & nisa dimurahkn rezek...,20
4,UgxFzQ0-x5NjVmZoXR54AaABAg,False,@fansaliefirfanteam,/channel/UCz9EX3Eo4CRFIfk5HqyeXjQ,17 小時前,,X9MM9vctavM,https://yt3.ggpht.com/LDzLxxGe32JZjjo81yQjYaHZ...,Congrats Boy & Nisa Semoga Sentiasa Dipermud...,66
5,UgxUQqICQYyiIzvdil94AaABAg,False,@boseychannel8024,/channel/UCAHM0yG3HQrU2aTd3Yi5_bw,16 小時前 (已編輯),,X9MM9vctavM,https://yt3.ggpht.com/ytc/AOPolaTtXiCdh__RL-xi...,"YES!! konten yg dinantikan, reaction drpd ai t...",105
6,UgwHBx478Btb6BP_uqF4AaABAg,False,@muhdsany5929,/channel/UC6DmM9ovCd0k9yoDg7tY-hg,16 小時前,,X9MM9vctavM,https://yt3.ggpht.com/ytc/AOPolaQX-6rcdUAycs_K...,Alhamdulillah …tahniah boy&nisa…semoga Allah p...,44
7,Ugwfv3zWfN30Xs_gjOF4AaABAg,False,@asmiemie87,/channel/UC-DkbMs-Q2aTC7rn7mSlbLA,17 小時前,,X9MM9vctavM,https://yt3.ggpht.com/pIdq8uRu7vqWHYIr5K2RNTKn...,Tahniah boy dan Nisa..Alhamdulillah doa² kami ...,62
8,Ugz0CpvHwIAEH7IQBsF4AaABAg,False,@nuramirazhar984,/channel/UCz9-KgGjidyuQFpqlkWufzA,5 小時前,,X9MM9vctavM,https://yt3.ggpht.com/ytc/AOPolaSM9spboeovun6h...,"Alhamdulillah , terharu tengok persahabatan di...",11
9,Ugw2bJhtV37zZcofJ-l4AaABAg,False,@Lesslayrafim134,/channel/UCK2WpI5zAPvAUegMKwKG2BQ,17 小時前,,X9MM9vctavM,https://yt3.ggpht.com/amZRuTJ-RAaUEmtojJi9PeQc...,Terharu tgk persahabatan ai team A dari dulu s...,131
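
The comment table includes `is_reply` and `parent_comment_id` columns, which can be used to rebuild comment threads. The sketch below is not part of Youcreep. It uses a hypothetical helper, `group_replies`, and assumes that a reply stores the `comment_id` of its parent in `parent_comment_id`, while top-level comments leave that field empty (it is read back from CSV as NaN). The sample above contains only top-level comments, so this assumption is not confirmed by the data shown here.

```python
from typing import Any, Dict, Iterable, List, Mapping


def group_replies(comments: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each top-level comment_id to the list of its reply ids."""
    threads: Dict[str, List[str]] = {}
    for comment in comments:
        parent = comment.get("parent_comment_id")
        if isinstance(parent, str) and parent:
            # Reply: file it under its parent, creating the thread if needed.
            threads.setdefault(parent, []).append(comment["comment_id"])
        else:
            # Top-level comment (missing, empty, or NaN parent id).
            threads.setdefault(comment["comment_id"], [])
    return threads
```

With the DataFrame above, the input can be produced with `comment_df.to_dict("records")`.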
