# Bowtie YouTube Channel Report

## Data Loading
Load and examine the initial structure of the datasets required for this analysis.

Data extraction: the YouTube Data API v3 is used to gather channel statistics, video metadata (title, description, tags, publish date, duration), view, like, and comment counts, and top-level comments for the selected channels.

# Successful
Channel IDs were obtained from: https://www.tunepocket.com/youtube-channel-id-finder/#channle-id-finder-form

API tutorial followed: https://www.youtube.com/watch?v=D56_Cx36oGY

# Extract
## General Setup

In [75]:
import pandas as pd
from googleapiclient.discovery import build
import os

In [76]:
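# Read the API key from the environment instead of hardcoding it (never commit keys to GitHub).
# The variable name YOUTUBE_API_KEY is an assumption; use whatever name your setup defines.
api_key = os.environ["YOUTUBE_API_KEY"]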

# Initialize YouTube API client
youtube = build("youtube", "v3", developerKey=api_key)

# Channel IDs for data extraction
channel_ids = [
    'UCD5Lx-3KCYZzCzGF2A60STg',  # @BowieYTM
    'UC8OVLoXv7B1BdOVV44Dz3ig',  # @projectumbrellahk
    'UCCjW9xzAsCSKIzDYCx8CuxA',  # @CW.talkinsurfp
    'UCn2mY9NLMvu7oJuLbFC-kIw',  # @adrianaszeyu
    'UCuv-es1NKE9mxFU3PTv3zkg',  # @unclewill894
    'UC7OUGIPx0HIB5HA2OSL-Zhg',  # @MW31
    'UCz8b7EYrOF4iXFIsap30kkw',  # @10LifeHK
    'UCXqmN9Z56cX2VPXZ-ZPnS1A',  # @BlueHKinsurance
    'UCFfbH3zDLa47d4nfotQ349Q',  # @easy_investment
    'UCxQfqaw1i39eBQG1YJDbDkw',  # @utopiahk1406
    'UCPO68WX6rtspcv-kmgK2ufQ',  # @adrianlee-9036
    'UCOr4rh-QXaVY_ZQzTqKesiQ',  # @kirk2677
    'UCLblmEwmgBr-UCxP9ZsbUnA'   # @AIARoundTableFamily
]

## Channel data

In [None]:
def get_channel_stat(youtube, channel_ids):
    """Fetch name, totals, and uploads playlist ID for each channel."""
    all_data = []

    try:
        request = youtube.channels().list(
            part="snippet,contentDetails,statistics",
            id=",".join(channel_ids)
        )
        response = request.execute()

        # Only channels present in "items" produce a row, so fewer rows may come back than requested
        for item in response.get("items", []):
            data = {
                "channelName": item["snippet"]["title"],
                "views": item["statistics"]["viewCount"],
                "totalVideos": item["statistics"]["videoCount"],
                # subscriberCount can be missing when a channel hides it
                "subscribers": item["statistics"].get("subscriberCount"),
                "playlist_id": item["contentDetails"]["relatedPlaylists"]["uploads"]
            }
            all_data.append(data)
    except Exception as e:
        print(f"Error fetching channel stats: {e}")

    return pd.DataFrame(all_data)

# Create data/raw directory and save channel stats
os.makedirs("data/raw", exist_ok=True)
channel_stats = get_channel_stat(youtube, channel_ids)
channel_stats.to_csv("data/raw/channel_df_extracted.csv", index=False)
# print(f"Saved channel stats for {len(channel_stats)} channels")
channel_stats

Unnamed: 0,channelName,views,totalVideos,subscribers,playlist_id
0,MW Insurance Academe 保險為什麼,793670,1836,5370,UU7OUGIPx0HIB5HA2OSL-Zhg
1,Blue Insurance Hong Kong,29364224,62,578,UUXqmN9Z56cX2VPXZ-ZPnS1A
2,Bowtie Insurance 保泰人壽,51204434,330,79800,UUD5Lx-3KCYZzCzGF2A60STg
3,Adrian Lee,142649,69,2200,UUPO68WX6rtspcv-kmgK2ufQ
4,大佬Kirk保險日記,18889,9,360,UUOr4rh-QXaVY_ZQzTqKesiQ
5,王傲山MarcusWong綜合頻道,99708,218,1790,UULblmEwmgBr-UCxP9ZsbUnA
6,UTOPIA HK,619961,314,4170,UUxQfqaw1i39eBQG1YJDbDkw
7,Adriana的保險實戰攻略,251962,76,4750,UUn2mY9NLMvu7oJuLbFC-kIw
8,智偉保險理財Talk,2800757,426,39400,UUCjW9xzAsCSKIzDYCx8CuxA
9,Project Umbrella,1310132,230,20000,UU8OVLoXv7B1BdOVV44Dz3ig
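
The request lists 13 channel IDs, but the table above has only 10 rows, so three channels came back without data and no error was raised. A minimal sketch for spotting the gaps is shown here. It assumes the pattern visible in the table holds (an uploads playlist ID is the channel ID with the `UC` prefix replaced by `UU`); the helper names `playlist_to_channel_id` and `find_missing_channels` are hypothetical.

```python
def playlist_to_channel_id(playlist_id):
    """Map an uploads playlist ID ('UU...') back to a channel ID ('UC...').

    Assumption: the UC -> UU prefix pattern seen in the output table holds.
    """
    if not playlist_id.startswith("UU"):
        raise ValueError(f"Not an uploads playlist ID: {playlist_id}")
    return "UC" + playlist_id[2:]


def find_missing_channels(requested_ids, returned_playlist_ids):
    """Return requested channel IDs with no row in the response, in request order."""
    returned = {playlist_to_channel_id(p) for p in returned_playlist_ids}
    return [cid for cid in requested_ids if cid not in returned]


# Channel IDs as requested in the General Setup cell
requested = [
    'UCD5Lx-3KCYZzCzGF2A60STg', 'UC8OVLoXv7B1BdOVV44Dz3ig',
    'UCCjW9xzAsCSKIzDYCx8CuxA', 'UCn2mY9NLMvu7oJuLbFC-kIw',
    'UCuv-es1NKE9mxFU3PTv3zkg', 'UC7OUGIPx0HIB5HA2OSL-Zhg',
    'UCz8b7EYrOF4iXFIsap30kkw', 'UCXqmN9Z56cX2VPXZ-ZPnS1A',
    'UCFfbH3zDLa47d4nfotQ349Q', 'UCxQfqaw1i39eBQG1YJDbDkw',
    'UCPO68WX6rtspcv-kmgK2ufQ', 'UCOr4rh-QXaVY_ZQzTqKesiQ',
    'UCLblmEwmgBr-UCxP9ZsbUnA',
]

# playlist_id column from the output table
returned_playlists = [
    'UU7OUGIPx0HIB5HA2OSL-Zhg', 'UUXqmN9Z56cX2VPXZ-ZPnS1A',
    'UUD5Lx-3KCYZzCzGF2A60STg', 'UUPO68WX6rtspcv-kmgK2ufQ',
    'UUOr4rh-QXaVY_ZQzTqKesiQ', 'UULblmEwmgBr-UCxP9ZsbUnA',
    'UUxQfqaw1i39eBQG1YJDbDkw', 'UUn2mY9NLMvu7oJuLbFC-kIw',
    'UUCjW9xzAsCSKIzDYCx8CuxA', 'UU8OVLoXv7B1BdOVV44Dz3ig',
]

missing = find_missing_channels(requested, returned_playlists)
print(missing)
```

Under that assumption, the IDs without a row are those commented as @unclewill894, @10LifeHK, and @easy_investment in the setup cell, which may be worth re-checking with the channel ID finder.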


## Video data

In [78]:
def get_video_ids(youtube, playlist_id):
    """Collect every video ID in a playlist, following pagination."""
    video_ids = []
    next_page_token = None

    try:
        while True:
            # Only contentDetails is needed to read the video IDs
            request = youtube.playlistItems().list(
                part="contentDetails",
                playlistId=playlist_id,
                maxResults=50,
                pageToken=next_page_token
            )
            response = request.execute()

            for item in response['items']:
                video_ids.append(item['contentDetails']['videoId'])

            # Stop once the API no longer returns a next page
            next_page_token = response.get('nextPageToken')
            if not next_page_token:
                break
    except Exception as e:
        print(f"Error fetching video IDs: {e}")

    return video_ids

def get_video_details(youtube, video_ids):
    """Fetch selected snippet, statistics, and contentDetails fields per video."""
    # Fields to keep from each part of the video resource
    stats_to_keep = {
        'snippet': ['channelTitle', 'title', 'description', 'tags', 'publishedAt'],
        'statistics': ['viewCount', 'likeCount', 'favoriteCount', 'commentCount'],
        'contentDetails': ['duration', 'definition', 'caption']
    }
    all_video_info = []

    # Request the videos in batches of 50 IDs
    for i in range(0, len(video_ids), 50):
        try:
            request = youtube.videos().list(
                part="snippet,contentDetails,statistics",
                id=','.join(video_ids[i:i+50])
            )
            response = request.execute()

            for video in response['items']:
                video_info = {'video_id': video['id']}
                for part, fields in stats_to_keep.items():
                    for field in fields:
                        # Fields absent from the response (e.g. tags) become None
                        video_info[field] = video.get(part, {}).get(field)
                all_video_info.append(video_info)
        except Exception as e:
            print(f"Error processing video batch {i//50 + 1}: {e}")
            continue

    return pd.DataFrame(all_video_info)

# Get video IDs and details for the first channel in channel_stats only (its uploads playlist)
playlist_id = channel_stats['playlist_id'][0]
video_ids = get_video_ids(youtube, playlist_id)
video_df = get_video_details(youtube, video_ids)
video_df.to_csv("data/raw/youtube_video_data.csv", index=False)
print(f"Saved {len(video_df)} video details")
video_df.head()

Saved 1650 video details


Unnamed: 0,video_id,channelTitle,title,description,tags,publishedAt,viewCount,likeCount,favoriteCount,commentCount,duration,definition,caption
0,8W3N1A3RAVc,MW Insurance Academe 保險為什麼,AI 🤣🥮✌🏻,,,2025-10-07T14:26:43Z,2195,7,0,0,PT6S,hd,False
1,Lkk6hTEcHNw,MW Insurance Academe 保險為什麼,預設醫療指示｜三個情景才能拒絕施救｜Podcast｜誰決定自己生死？｜保險為什麼 - 399...,成為這個頻道的會員並獲得獎勵：\nhttps://www.youtube.com/chann...,,2025-10-07T08:45:37Z,165,14,0,0,PT24M22S,hd,False
2,vG9EG1mHZP4,MW Insurance Academe 保險為什麼,網友最熱門問我的其中一條問題？｜點解市場上有申報「五年病歷」就可以買保單的誤解？｜保險為什麼...,成為這個頻道的會員並獲得獎勵：\nhttps://www.youtube.com/chann...,,2025-09-26T15:01:56Z,349,17,0,0,PT14M7S,hd,False
3,vihfAS92X2w,MW Insurance Academe 保險為什麼,颱風三保｜申請賠償時超前準備｜汽車保｜3｜「樺加沙」在家部住你｜我的名字就是「樺」，「加」上...,供参考之用：\n\n「這次颱風橫切面太大，持續時間較長，務必留在室內不要外出！\n\n建議:...,,2025-09-23T19:00:55Z,72,1,0,0,PT12M16S,hd,False
4,ADiSc6mz7CA,MW Insurance Academe 保險為什麼,颱風三保｜申請賠償時超前準備｜家居保｜2｜「樺加沙」在家部住你｜我的名字就是「樺」，「加」上...,供参考之用：\n\n「這次颱風橫切面太大，持續時間較長，務必留在室內不要外出！\n\n建議:...,,2025-09-23T17:00:13Z,48,5,0,0,PT4M28S,hd,False
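
The `duration` column holds ISO 8601 strings such as `PT24M22S`, which cannot be averaged or compared directly. A minimal sketch for converting them to seconds follows. The function name `duration_to_seconds` is hypothetical, and the pattern only covers the day, hour, minute, and second components that appear in values like the ones above.

```python
import re

# Matches durations such as PT6S, PT24M22S, PT1H2M3S, or P1DT2H
_DURATION_RE = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds.

    Returns None when the string does not match the supported pattern.
    """
    match = _DURATION_RE.match(duration or "")
    if not match:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Durations taken from the sample rows above
for raw in ["PT6S", "PT24M22S", "PT14M7S", "PT12M16S", "PT4M28S"]:
    print(raw, duration_to_seconds(raw))
```

Applied to the DataFrame, this could look like `video_df['duration_sec'] = video_df['duration'].map(duration_to_seconds)`, where `duration_sec` is an illustrative column name.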


## Comment data

In [79]:
def get_video_comments(youtube, video_ids):
    all_comments = []
    
    for video_id in video_ids:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=100
            )
            response = request.execute()
            
            while response:
                for item in response['items']:
                    comment_data = {
                        'comment_id': item['id'],
                        'video_id': video_id,
                        # authorChannelId can be absent; fall back to None instead of dropping the whole video
                        'channel_id': item['snippet']['topLevelComment']['snippet'].get('authorChannelId', {}).get('value'),
                        'publish_at': item['snippet']['topLevelComment']['snippet']['publishedAt'],
                        'comment': item['snippet']['topLevelComment']['snippet']['textDisplay']
                    }
                    all_comments.append(comment_data)
                
                # Check for next page
                if 'nextPageToken' in response:
                    request = youtube.commentThreads().list(
                        part="snippet",
                        videoId=video_id,
                        maxResults=100,
                        pageToken=response['nextPageToken']
                    )
                    response = request.execute()
                else:
                    break
        except Exception as e:
            print(f"Error fetching comments for video {video_id}: {e}")
            continue
    
    return pd.DataFrame(all_comments)

# Get and save comments
comment_df = get_video_comments(youtube, video_ids)
comment_df.to_csv("data/raw/youtube_comments.csv", index=False)
print(f"Saved {len(comment_df)} comments")
comment_df.head()

Saved 368 comments


Unnamed: 0,comment_id,video_id,channel_id,publish_at,comment
0,Ugxn9Fr5BL5WwjWI57N4AaABAg,DVKARhqXiDo,UC7OUGIPx0HIB5HA2OSL-Zhg,2025-06-28T10:42:31Z,感謝大家比❤！<br>可以給我知道你覺得最有價值的重點是什麼嗎？
1,UgygL8Zb6phbxAOIVbB4AaABAg,zwJd8uyp92Y,UC7OUGIPx0HIB5HA2OSL-Zhg,2025-06-03T06:21:17Z,第一個Podcast 🎙️多多指教❤
2,Ugy1B8RHBaeAb4NeDnR4AaABAg,HI0jt6xhtSY,UC7OUGIPx0HIB5HA2OSL-Zhg,2023-01-18T10:41:15Z,今天是1/18/2023，我收到電郵通知，經核實符合入學要求，並將錄取入讀 2023年2月1...
3,UgzK5jTs4xEZNALOIvN4AaABAg,PA4FIxR1MFg,UC7OUGIPx0HIB5HA2OSL-Zhg,2022-12-10T10:17:07Z,喜歡我選的歌，比個♥️我，等我知道您也喜歡聽。🎼🎧🎹
4,UgzWHuKvHl0uKHxghV94AaABAg,1EHT6kjgyB4,UC7OUGIPx0HIB5HA2OSL-Zhg,2022-08-01T12:10:01Z,"每次一分享保險知識，都有 一 個 Fans秒like,真想知是誰那麼支持！😆"
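
The `comment` column comes from `textDisplay`, so it can contain HTML markup such as the `<br>` in the first row above. A minimal sketch for turning such text into plain text before any text analysis is given here. The helper name `clean_comment` is hypothetical, and the regex-based tag stripping is a simplification rather than a full HTML parser.

```python
import html
import re

_BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)   # line breaks
_TAG_RE = re.compile(r"<[^>]+>")                    # any remaining tags


def clean_comment(text):
    """Replace <br> with newlines, drop other tags, and unescape HTML entities."""
    if text is None:
        return None
    text = _BR_RE.sub("\n", text)
    text = _TAG_RE.sub("", text)
    return html.unescape(text).strip()


# First comment from the output above
raw = "感謝大家比❤！<br>可以給我知道你覺得最有價值的重點是什麼嗎？"
print(clean_comment(raw))
```

With the DataFrame above, this could be applied as `comment_df['comment_clean'] = comment_df['comment'].map(clean_comment)`, where `comment_clean` is an illustrative column name.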


In [80]:
len(comment_df)

368

In [71]:
video_df.isnull().sum()

video_id          0
channelTitle      0
title             0
description       0
tags             37
publishedAt       0
viewCount         0
likeCount         0
favoriteCount     0
commentCount      2
duration          0
definition        0
caption           0
dtype: int64

In [72]:
video_df.dtypes

video_id         object
channelTitle     object
title            object
description      object
tags             object
publishedAt      object
viewCount        object
likeCount        object
favoriteCount    object
commentCount     object
duration         object
definition       object
caption          object
dtype: object

In [73]:
numeric_columns = ['viewCount', 'likeCount', 'favoriteCount', 'commentCount']
video_df[numeric_columns] = video_df[numeric_columns].apply(pd.to_numeric, errors='coerce')
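
The numeric conversion leaves `publishedAt` as a string and keeps the missing `commentCount` values as NaN. A minimal sketch of the remaining cleanup on a small sample built from the rows shown earlier follows. Treating a missing `commentCount` as 0 is an assumption, and `likes_per_1k_views` is an illustrative metric name, not part of the original analysis.

```python
import pandas as pd

# Small sample shaped like video_df; the None mimics a missing commentCount
sample = pd.DataFrame({
    "publishedAt": ["2025-10-07T14:26:43Z", "2025-10-07T08:45:37Z"],
    "viewCount": ["2195", "165"],
    "likeCount": ["7", "14"],
    "commentCount": ["0", None],
})

# Same conversion as in the cell above
numeric_columns = ["viewCount", "likeCount", "commentCount"]
sample[numeric_columns] = sample[numeric_columns].apply(pd.to_numeric, errors="coerce")

# Assumption: a missing commentCount is treated as zero comments
sample["commentCount"] = sample["commentCount"].fillna(0).astype(int)

# Parse the ISO timestamps into timezone-aware datetimes (UTC)
sample["publishedAt"] = pd.to_datetime(sample["publishedAt"], utc=True)

# Illustrative engagement metric: likes per 1,000 views
sample["likes_per_1k_views"] = sample["likeCount"] / sample["viewCount"] * 1000

print(sample.dtypes)
```

The same steps should carry over to `video_df` unchanged, since the sample uses the same column names.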

# Small YouTubers

Project Umbrella
@projectumbrellahk
UC8OVLoXv7B1BdOVV44Dz3ig

智偉保險理財Talk
@CW.talkinsurfp
UCCjW9xzAsCSKIzDYCx8CuxA

Adriana的保險實戰攻略
@adrianaszeyu
UCn2mY9NLMvu7oJuLbFC-kIw

UncleWill
@unclewill894
UCuv-es1NKE9mxFU3PTv3zkg

MW Insurance Academe 保險為什麼
@MW31
UC7OUGIPx0HIB5HA2OSL-Zhg

10Life 保險比較平台
@10LifeHK
UCz8b7EYrOF4iXFIsap30kkw

Blue Insurance Hong Kong
@BlueHKinsurance
UCXqmN9Z56cX2VPXZ-ZPnS1A

投資最容易
@easy_investment
UCFfbH3zDLa47d4nfotQ349Q

UTOPIA HK
@utopiahk1406
UCxQfqaw1i39eBQG1YJDbDkw

Adrian Lee
@adrianlee-9036
UCPO68WX6rtspcv-kmgK2ufQ

大佬Kirk保險日記
@kirk2677
UCOr4rh-QXaVY_ZQzTqKesiQ

王傲山MarcusWong綜合頻道
@AIARoundTableFamily
UCLblmEwmgBr-UCxP9ZsbUnA