##### [Youtube/GDP](Top)




<img src=https://i.imgur.com/ACFFpv2.png width=1500 class="center">
<h1 align="center">Top Youtubers effecting GDP between 2019 - 2022??</h1>
    
    In today's world, the internet has become an integral part of our lives. With the rise of online platforms such as YouTube, it has become easier than ever for people to access information and entertainment from all over the world. At the same time, Gross Domestic Product (GDP) remains one of the most widely used indicators of economic performance. In this project, we aim to explore the relationship between these two seemingly unrelated topics: GDP and YouTube. By analyzing data on GDP and YouTube usage patterns, we hope to gain insights into how these two factors are connected and what implications this may have on our understanding of the modern world.

The objective of this project is to analyze the relationship between GDP and YouTube usage patterns. We will use data on GDP and YouTube usage patterns to gain insights into how these two factors are connected and what implications this may have on our understanding of the modern world.

Our goal is to answer the following questions:

- What are the the top Youtubers in countries around the globe?
- What is the Top GDP countries, and what is there growth during COVID?
- Is there a correlation between Top Youtubers and selected GDP Nations?
- What is statistical corelations can be made?

To answer these questions, we will use Python and its data analysis libraries, such as Pandas and Matplotlib. We will start by importing the dataset and cleaning the data, followed by exploratory data analysis and visualization.

I will be using the following datasets: 
- [Top Youtubers](https://www.kaggle.com/mdhrumil/top-5000-youtube-channels-data-from-socialblade)
- [GDP](https://www.kaggle.com/fernandol/countries-of-the-world)

APIs:
- [YouTube API](https://developers.google.com/youtube/v3/docs/channels/list)
- [Google API](https://console.cloud.google.com/apis/library/youtube.googleapis.com)



# Table of contents <a class='anchor' id='top'>
- [Introduction](#Introduction)
- [Import libraries](#import)
- [Load data](#load_data)
- [GDP Analysis](#gdpproject)
- [Bar chart](#bar_chart)
- [GDP Conclusion](#geo)
- [YouTube Analysis](#Analysis)
- [Youtube API](#YouTube)
- [Conclusion](#Conclusion)



# Introduction  <a class='anchor' id='Introduction'>

# Import libraries <a class='anchor' id='import'>

In [None]:
%pip install isodate
%pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
%pip install wordcloud
%pip install nltk
%pip install wbgapi
%pip install bar_chart_race
%pip install plotly
%pip install pandas
%pip install matplotlib
%pip install seaborn
%pip install numpy
%pip install requests
%pip install scipy
%pip install sklearn
%pip install statsmodels
%pip install xgboost
%pip install lightgbm
%pip install catboost
%pip install bar_chart_race
%pip install ffmpeg

In [None]:
# Libraries for data manipulation
import pandas as pd
import numpy as np
from dateutil import parser
import isodate

# Data visualization libraries
%pip install scikit-learn

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib import pyplot as pyplot
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_squared_log_error
sns.set(style="darkgrid", color_codes=True)
%pip install scikit-learn

# Google API
from googleapiclient.discovery import build



In [None]:
# NLP libraries
import collections
import itertools
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from wordcloud import WordCloud
from wordcloud import ImageColorGenerator
   

<p align="center">
  <b><u><span style="font-size: 24px">
  GDP Analysis from 2019 - 2020<a class='anchor' id='gdpproject'></span></u></b><br>
</p>

### Load data <a class='anchor' id='load_data'>

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#load data
data = pd.read_csv('GDP by Country 1999-2022.csv', decimal = ',')

data = data.replace({'\"' :''}, regex=True)
data = data.replace({',' :''}, regex=True)
#display(data)

data = data.astype({'1999' : 'float', '2022' : 'float'})

#observe data
data.describe(include = 'all')

### Bar Chart for Top 10 GDP countries and for targeted group <a class='anchor' id='bar_chart'>


In [None]:
#Top 10 GDP in 2022
top_10_GDP_2022 = data.sort_values('2022', ascending = False).head(10)
fig, ax = pyplot.subplots(figsize = (16, 6))
sns.barplot(x = 'Country', y = '2022', data = top_10_GDP_2022, palette = 'Set1')
ax.set_xlabel(ax.get_xlabel(), labelpad= 15)
ax.set_ylabel('Gross Domestic Product', labelpad= 30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
pyplot.xticks(rotation = 90)
pyplot.title('Top GDP producing Countries in 2022', fontsize = 20)
pyplot.show()

In [None]:
#2022 GDP for germany, south korea, UK, US, Mexico, japan and india

Countries = data[data["Country"].isin(['Germany', 'South Korea', 'United Kingdom', 'United States', 'Mexico', 'Japan', 'India'])] 
Countries_GDP_2022 = Countries.sort_values('2022', ascending = False)

display(Countries_GDP_2022)

fig, ax = pyplot.subplots(figsize = (16, 6))
sns.barplot(x = 'Country', y = '2022', data = Countries_GDP_2022, legend=False, palette = 'Set1')
ax.set_xlabel(ax.get_xlabel(), labelpad= 15)
ax.set_ylabel('GDP by Country 2022', labelpad= 30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
pyplot.xticks(rotation = 45)
pyplot.title('2022 GDP for Germany, South korea, UK, US, Mexico, Japan and India', fontsize = 20)
pyplot.show()

## 📈 Gross Domestic product average for 2019 - 2020 <a class='anchor' id='19-20'>
+ Germany, South Korea UK, US, Mexico, Japan and India

In [None]:
#Average GDP between 2018-2022 for germany, south korea, UK, US, Mexico, japan and india
Countries = data[data["Country"].isin(['Germany', 'South Korea', 'United Kingdom', 'United States', 'Mexico', 'Japan', 'India'])] 
Countries_GDP_2022 = Countries[['Country', '2018', '2019', '2020', '2021', '2022']]

fig, ax = pyplot.subplots(figsize = (16, 6))
new_data = pd.melt(Countries_GDP_2022, id_vars = ['Country'], value_vars = ['2018', '2019', '2020', '2021', '2022'], var_name = 'Year', value_name = 'GDP')

new_data = new_data.astype({'Country' : 'str'})
new_data = new_data.astype({'GDP' : 'float'})


display(new_data)

sns.pointplot(data = new_data, x = 'Year', y = 'GDP', hue = 'Country', ax = ax, palette = 'nipy_spectral')

ax.set_xlabel(ax.get_xlabel(), labelpad= 15)
ax.set_ylabel('Average GDP by Country 2018-2022', labelpad= 30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
pyplot.title('Average GDP between 2018-2022', fontsize = 20)
pyplot.show()

#sort_new_data = new_data.sort_values('GDP', ascending = True)
#sort_new_data = resort_new_data.sort_values('GDP', ascending = True)


#sort_new_data = new_data.sort_values('GDP', ascending = True)
#sort_new_data = resort_new_data.sort_values('GDP', ascending = True)

In [None]:
#Average GDP between 2018-2022 for germany, south korea, UK, Mexico, japan and india ***Without the US because it made the graph look bad***
Countries = data[data["Country"].isin(['Germany', 'South Korea', 'United Kingdom', 'Mexico', 'Japan', 'India'])] 
Countries_GDP_2022 = Countries[['Country', '2018', '2019', '2020', '2021', '2022']]

fig, ax = pyplot.subplots(figsize = (16, 8))
new_data = pd.melt(Countries_GDP_2022, id_vars = ['Country'], value_vars = ['2018', '2019', '2020', '2021', '2022'], var_name = 'Year', value_name = 'GDP')

new_data = new_data.astype({'Country' : 'str'})
new_data = new_data.astype({'GDP' : 'float'})



display(new_data)

sns.pointplot(data = new_data, x = 'Year', y = 'GDP', hue = 'Country', ax = ax, palette = 'nipy_spectral')

ax.set_xlabel(ax.get_xlabel(), labelpad= 15)
ax.set_ylabel('Average GDP by Country 2018-2022', labelpad= 30)
ax.xaxis.label.set_fontsize(16)
ax.yaxis.label.set_fontsize(16)
pyplot.title('Average GDP between 2018-2022 without the U.S.', fontsize = 20)
pyplot.show()

## HalfTime <a class='anchor' id='geo'>

##### Region Code: You must provide a two-letter ISO 3166–1 country code 
* (e.g., ‘US’ for the United States) to specify the region for which you want to find the top 10 YouTube channels.
* Replace 'YOUR_REGION_CODE' in the code with the desired region code.

### ?WBGAPI World Bank Top 20 Countries by GDP?

In [None]:

import wbgapi as wb

In [None]:
help(wb)
wb.source.info()

In [None]:
##wb.economy.info(db=2)

---
## WORKING CODE



class Coder(dict):
    '''Class returned by coder if passed a list of terms
    '''

    def __repr__(self):
        rows = self._coder_report()
        columns = rows.pop(0)
        return tabulate(rows, tablefmt='simple', headers=columns)

def coder_report(economies):

    global _coder_names

    rows = [('ORIGINAL NAME', 'WBG NAME', 'ISO_CODE')]
    for k,v in economies.items():
        if v:
            wb_name = _coder_names.get(v, '')
        else:
            wb_name = ''

        rows.append((k, wb_name, v))

    output = []
    for row in rows:
        output.append([row[0], row[1], row[2]])
    
    return output





def plot_anim_funct():
    fig, ax = plt.subplots()

    ax.set_xlim((0, 2))
    ax.set_ylim((-2, 2))

    line, = ax.plot([], [], lw=2)

    def init():
        line.set_data([], [])
        return (line,)

    def animate(i):
        x = np.linspace(0, 2, 1000)
        y = np.sin(2 * np.pi * (x - 0.01 * i))
        line.set_data(x, y)
        return (line,)

    anim = animation.FuncAnimation(fig, animate, init_func=init,
                                   frames=100, interval=20, 
                                   blit=True)


    display(HTML(anim.to_jshtml()))

plot_anim_funct()

---

In [None]:
df = pd.read_csv("GDP by Country 1999-2022.csv", sep=',', header=0, thousands=",")
df.set_index("Country", inplace=True)
df = df.T
print(f"Dataframe has {df.shape[0]} rows and {df.shape[1]} columns")
df.head()

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from IPython.display import HTML, display
import re


import base64
from io import BytesIO, TextIOWrapper
from pathlib import Path
from tempfile import TemporaryDirectory

from matplotlib import rcParams
from matplotlib import animation

class FuncAnimation(animation.FuncAnimation):

    def to_html5_video(self, embed_limit=None, savefig_kwargs=None):
        """
        Convert the animation to an HTML5 ``<video>`` tag.

        This saves the animation as an h264 video, encoded in base64
        directly into the HTML5 video tag. This respects the rc parameters
        for the writer as well as the bitrate. This also makes use of the
        ``interval`` to control the speed, and uses the ``repeat``
        parameter to decide whether to loop.

        Parameters
        ----------
        embed_limit : float, optional
            Limit, in MB, of the returned animation. No animation is created
            if the limit is exceeded.
            Defaults to :rc:`animation.embed_limit` = 20.0.

        Returns
        -------
        video_tag : str
            An HTML5 video tag with the animation embedded as base64 encoded
            h264 video.
            If the *embed_limit* is exceeded, this returns the string
            "Video too large to embed."
        """
        VIDEO_TAG = r'''<video {size} {options}>
  <source type="video/mp4" src="data:video/mp4;base64,{video}">
  Your browser does not support the video tag.
</video>'''
        # Cache the rendering of the video as HTML
        if not hasattr(self, '_base64_video'):
            # Save embed limit, which is given in MB
            if embed_limit is None:
                embed_limit = rcParams['animation.embed_limit']

            # Convert from MB to bytes
            embed_limit *= 1024 * 1024

            # Can't open a NamedTemporaryFile twice on Windows, so use a
            # TemporaryDirectory instead.
            with TemporaryDirectory() as tmpdir:
                path = Path(tmpdir, "temp.m4v")
                # We create a writer manually so that we can get the
                # appropriate size for the tag
                Writer = animation.writers[rcParams['animation.writer']]
                writer = Writer(codec='h264',
                                bitrate=rcParams['animation.bitrate'],
                                fps=1000. / self._interval)
                self.save(str(path), writer=writer, savefig_kwargs=savefig_kwargs)
                # Now open and base64 encode.
                vid64 = base64.encodebytes(path.read_bytes())

            vid_len = len(vid64)
            if vid_len >= embed_limit:
                _log.warning(
                    "Animation movie is %s bytes, exceeding the limit of %s. "
                    "If you're sure you want a large animation embedded, set "
                    "the animation.embed_limit rc parameter to a larger value "
                    "(in MB).", vid_len, embed_limit)
            else:
                self._base64_video = vid64.decode('ascii')
                self._video_size = 'width="{}" height="{}"'.format(
                        *writer.frame_size)

        # If we exceeded the size, this attribute won't exist
        if hasattr(self, '_base64_video'):
            # Default HTML5 options are to autoplay and display video controls
            options = ['controls', 'autoplay']

            # If we're set to repeat, make it loop
            if hasattr(self, 'repeat') and self.repeat:
                options.append('loop')

            return VIDEO_TAG.format(video=self._base64_video,
                                    size=self._video_size,
                                    options=' '.join(options))
        else:
            return 'Video too large to embed.'

import bar_chart_race as bcr
import warnings
import matplotlib.animation
import imageio_ffmpeg as ffmpeg

---
warnings.filterwarnings('ignore')

bcr.bar_chart_race(df=df,
                   n_bars=10,
                  orientation="h",
                  title="Gross Domestic Product (billions USD)",
                  cmap="tab20b")


import pandas as pd
import bar_chart_race as bcr

create a dataframe with sample data
set the index to 'Year'
df.set_index('Year', inplace=True)

create the bar chart race
bcr.bar_chart_race(df=df, n_bars=10)


# Youtube Exploratory Data Analysis <a class='anchor' id='Analysis'>

To prepare for Youtube API pull and research of top YouTubers, we need to first obtain the API key from Google Cloud Console. Once we have the API key, we can use it to authenticate our requests to the Youtube API. We can then use the API to pull data on top YouTubers, such as their subscriber count, view count, and video count. We can use this data to perform exploratory data analysis and gain insights into the trends and patterns in the data.

### U.S. Top Channels


1. Mr. Beasts view-source:https://www.youtube.com/@MrBeast/about
* UCX6OQ3DkcsbYNE6H8uQQuVA

2. Cocomelon - Nursery Rhymes view-source:https://www.youtube.com/channel/UCbCmjCuTUZos6Inko4u57UQ
* UCbCmjCuTUZos6Inko4u57UQ

3. Dude Perfect view-source:https://www.youtube.com/channel/UCRijo3ddMTht_IHyNSNXpNQ
* UCRijo3ddMTht_IHyNSNXpNQ

4. ✿ Kids Diana Show view-source:https://www.youtube.com/channel/UCk8GzjMOrta8yxDcKfylJYw
* UCk8GzjMOrta8yxDcKfylJYw

- For India, the top two YouTubers are **Bhuvaneshwar Bam** and **Amit Bhadana**. Bhuvaneshwar Bam is the creator of **BB ki Vines**, a comedy channel that features him playing multiple characters. He has more than 20 million subscribers and over 3 billion views. Amit Bhadana is another comedy channel that features slice of life content, relationships, and comedic skits. He has more than 22 million subscribers and over 1.8 billion views. You can visit their YouTube pages here: [BB ki Vines](^1^) and [Amit Bhadana](^2^).

- For Japan, the top two YouTubers are **HikakinTV** and **Yuka Kinoshita**. HikakinTV is a channel that features various content such as music, games, challenges, and collaborations with other celebrities. He has more than 8.8 million subscribers and over 6.7 billion views. Yuka Kinoshita is a channel that features her eating large amounts of food in a short time. She has more than 5.6 million subscribers and over 2.4 billion views. You can visit their YouTube pages here: [HikakinTV](^3^) and [Yuka Kinoshita].

- For Mexico, the top two YouTubers are **Luisito Comunica** and **Badabun**. Luisito Comunica is a channel that features his travels around the world, exploring different cultures, cuisines, and attractions. He has more than 36 million subscribers and over 4.9 billion views. Badabun is a channel that features various content such as entertainment, news, pranks, and social experiments. He has more than 43 million subscribers and over 14 billion views. You can visit their YouTube pages here: [Luisito Comunica] and [Badabun].

- For South Korea, the top two YouTubers are **Boram Tube Vlog** and **Saebyuk Jang**. Boram Tube Vlog is a channel that features a six-year-old girl named Boram and her family doing various activities such as playing with toys, cooking, traveling, and reviewing products. She has more than 26 million subscribers and over 10 billion views. Saebyuk Jang is a channel that features a young boy named Saebyuk and his parents doing various content such as games, challenges, vlogs, and animations. He has more than 23 million subscribers and over 8 billion views. You can visit their YouTube pages here: [Boram Tube Vlog] and [Saebyuk Jang].


1) YouTube. https://www.youtube.com/index.
2) Top 50 Popular YouTubers in India (2023) - Moneymint. https://moneymint.com/top-youtubers-in-india/.
3) List of most-subscribed YouTube channels - Wikipedia. https://en.wikipedia.org/wiki/List_of_most-subscribed_YouTube_channels.
#### India Top Channels
1.  view-source:
*
2.   view-source:
*

#### Japan Top Channels
1.  view-source:
*
2.  view-source:
*

#### Mexico Top Channels
1.  view-source:
*
2.  view-source:
*

#### South Korea Top Channels
1.  view-source:
*
2.  view-source:
*

#### United Kingdom Top Channels
1.  view-source:
*
2.  view-source:
*

#### Germany Top Channels
1.
*
2.
*

## YouTube API <a class='anchor' id='YouTube'></center> 


# 1. Data creation with Youtube API

In [None]:
api_key = 'AIzaSyDs9TExsfc8fDMn4lBRYXZax1vSr0ftsdk'  # Youtube API key Personal 

channel_ids = ['UCX6OQ3DkcsbYNE6H8uQQuVA', # Mr. Beast US
               'UCbCmjCuTUZos6Inko4u57UQ', # Cocomelon US
               'UCqwUrj10mAEsqezcItqvwEw', # Bhuvaneshwar Bam IN
               'UC_vcKmg67vjMP7ciLnSxSHQ', # Amit Bhadana IN
               'UCZf__ehlCEBPop-_sldpBUQ', # HikakinTV JP
               'UC1opHUrw8rvnsadT-iGp7Cg', # PewDiePie SE
               'UCYWOjHweP2V-8kGKmmAmQJQ', # Badabun MX 
               'UCECJDeK0MNapZbpaOzxrUPA', # Luisito Comunica MX
               'UCOmHUn--16B90oW2L6FRR3A', # BLACKPINK
              ]

youtube = build('youtube', 'v3', developerKey=api_key)

# 2. Function to pull data from Youtube API and create a dataframe

In [None]:
def get_channel_stats(youtube, channel_ids):
    """
    Get channel statistics: title, subscriber count, view count, video count, upload playlist
    Params:
    
    youtube: the build object from googleapiclient.discovery
    channels_ids: list of channel IDs
    
    Returns:
    Dataframe containing the channel statistics for all channels in the provided list: title, subscriber count, view count, video count, upload playlist
    
    """
    all_data = []
    request = youtube.channels().list(
                part='snippet,contentDetails,statistics',
                id=','.join(channel_ids))
    response = request.execute() 
    
    for i in range(len(response['items'])):
        data = dict(channelName = response['items'][i]['snippet']['title'],
                    subscribers = response['items'][i]['statistics']['subscriberCount'],
                    views = response['items'][i]['statistics']['viewCount'],
                    totalVideos = response['items'][i]['statistics']['videoCount'],
                    playlistId = response['items'][i]['contentDetails']['relatedPlaylists']['uploads'])
        all_data.append(data)
    
    return pd.DataFrame(all_data)



New method to pull data from Youtube API and create a dataframe

def get_video_ids(youtube, playlist_id):
    """
    Get list of video IDs of all videos in the given playlist
    Params:
    
    youtube: the build object from googleapiclient.discovery
    playlist_id: playlist ID of the channel
    
    Returns:
    List of video IDs of all videos in the playlist
    
    """
    
    request = youtube.playlistItems().list(
                part='contentDetails',
                playlistId = playlist_id,
                maxResults = 50)
    response = request.execute()
    
    video_ids = []

        for i in range(len(response['items'])):
        video_ids.append(response['items'][i]['contentDetails']['videoId'])
        
    next_page_token = response.get('nextPageToken')
    more_pages = True
    
    while more_pages:
        if next_page_token is None:
            more_pages = False
        else:
            request = youtube.playlistItems().list(
                        part='contentDetails',
                        playlistId = playlist_id,
                        maxResults = 50,
                        pageToken = next_page_token)
            response = request.execute()
    
            for i in range(len(response['items'])):
                video_ids.append(response['items'][i]['contentDetails']['videoId'])
            
            next_page_token = response.get('nextPageToken')
        
    return video_ids
def get_comments_in_videos(youtube, video_ids):
    """
    Get the top 10 comments for each provided video ID.
    
    Parameters:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    video_ids (list): List of video IDs.
    
    Returns:
    pd.DataFrame: DataFrame containing video IDs and associated top-level comments in text.
    """
    all_comments = []
    for video_id in video_ids:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=10  # Fetch only the first 10 comments
            )
            response = request.execute()
            # Process response data and append to all_comments
            # ...
        except Exception as e:
            logging.error(f"Error fetching comments for video {video_id}: {e}")
            continue
    return pd.DataFrame(all_comments)

   

In [44]:

import logging
import pandas as pd
from googleapiclient.discovery import build

# Configure logging
logging.basicConfig(filename='youtube_data.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Initialize YouTube API client
# Replace 'YOUR_API_KEY' with your actual API key
youtube = build('youtube', 'v3', developerKey='AIzaSyDs9TExsfc8fDMn4lBRYXZax1vSr0ftsdk')

def get_channel_stats(youtube, channel_ids):
    """
    Get channel statistics: title, subscriber count, view count, video count, upload playlist
    Params:
    
    youtube: the build object from googleapiclient.discovery
    channels_ids: list of channel IDs
    
    Returns:
    Dataframe containing the channel statistics for all channels in the provided list: title, subscriber count, view count, video count, upload playlist
    
    """
    all_data = []
    request = youtube.channels().list(
                part='snippet,contentDetails,statistics',
                id=','.join(channel_ids))
    response = request.execute() 
    
    for i in range(len(response['items'])):
        data = dict(channelName = response['items'][i]['snippet']['title'],
                    subscribers = response['items'][i]['statistics']['subscriberCount'],
                    views = response['items'][i]['statistics']['viewCount'],
                    totalVideos = response['items'][i]['statistics']['videoCount'],
                    playlistId = response['items'][i]['contentDetails']['relatedPlaylists']['uploads'])
        all_data.append(data)
    
    return pd.DataFrame(all_data)


def get_top_video_ids_by_view_count(youtube, playlist_id):
    """
    #Get the top 10 video IDs from the given playlist sorted by view count!
    # Fetch video IDs from the playlist
    # ...
    # Sort video IDs by view count and return top 10
    # ...
    Params:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    playlist_id (str): Playlist ID.
    
    Returns:
    List of the top 10 video IDs in the playlist sorted by view count.
    """
    request = youtube.playlistItems().list(
        part='contentDetails',
        playlistId=playlist_id,
        maxResults=50
    )
    response = request.execute()

    # Extract video IDs and view counts into a list of dictionaries
    videos_data = [{'video_id': item['contentDetails']['videoId']} for item in response.get('items', [])]

    # If there are more pages, continue fetching
    next_page_token = response.get('nextPageToken')
    while next_page_token and len(videos_data) < 10:
        request = youtube.playlistItems().list(
            part='contentDetails',
            playlistId=playlist_id,
            maxResults=50,
            pageToken=next_page_token
        )
        response = request.execute()
        videos_data.extend([{'video_id': item['contentDetails']['videoId']} for item in response.get('items', [])])
        next_page_token = response.get('nextPageToken')

    # Sort the videos by view count in descending order and return the top 10
    top_video_ids = [video['video_id'] for video in videos_data]
    return top_video_ids[:10]


def get_video_details(youtube, video_ids):
    """
    Get details for the provided video IDs.
    
    Parameters:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    video_ids (list): List of video IDs.
    
    Returns:
    pd.DataFrame: DataFrame containing video details.
    """
    all_video_info = []
    for video_id in video_ids:
        try:
            request = youtube.videos().list(
                part="snippet,contentDetails,statistics",
                id=video_id
            )
            response = request.execute()
            # Process response data and append to all_video_info
            video_info = response['items'][0]
            snippet = video_info['snippet']
            statistics = video_info['statistics']
            content_details = video_info['contentDetails']
            
            video_data = {
                'video_id': video_id,
                'channelTitle': snippet.get('channelTitle', ''),
                'title': snippet.get('title', ''),
                'description': snippet.get('description', ''),
                'tags': snippet.get('tags', []),
                'publishedAt': snippet.get('publishedAt', ''),
                'viewCount': statistics.get('viewCount', 0),
                'likeCount': statistics.get('likeCount', 0),
                'favoriteCount': statistics.get('favoriteCount', 0),
                'commentCount': statistics.get('commentCount', 0),
                'duration': content_details.get('duration', ''),
                'definition': content_details.get('definition', ''),
                'caption': content_details.get('caption', ''),
            }
            
            all_video_info.append(video_data)
        except Exception as e:
            logging.error(f"Error fetching details for video {video_id}: {e}")
            continue
    return pd.DataFrame(all_video_info)

def get_top_liked_comments(youtube, video_ids, max_results=10, min_likes=10):
    """
    Get the top liked comments for each provided video ID and filter out spam-like comments.

    Parameters:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    video_ids (list): List of video IDs.
    max_results (int): Maximum number of comments to retrieve for each video (default is 10).
    min_likes (int): Minimum number of likes for a comment to be considered non-spam (default is 10).

    Returns:
    pd.DataFrame: DataFrame containing video IDs and associated top liked comments.
    """
    all_comments = []
    for video_id in video_ids:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=max_results,  # Fetch the specified number of comments
                order="relevance",  # Order comments by relevance (likely to include top liked comments)
            )
            response = request.execute()

            # Process comments and filter out spam-like comments
            comments_in_video = []
            for item in response['items']:
                snippet = item['snippet']['topLevelComment']['snippet']
                like_count = snippet.get('likeCount', 0)
                comment_text = snippet.get('textDisplay', '')

                # Filter out spam-like comments based on the minimum likes threshold
                if like_count >= min_likes:
                    comments_in_video.append(comment_text)

            comments_in_video_info = {'video_id': video_id,'comments': comments_in_video}
            all_comments.append(comments_in_video_info)
        except Exception as e:
            logging.error(f"Error fetching comments for video {video_id}: {e}")
            continue

    return pd.DataFrame(all_comments)


def get_comments_in_videos(youtube, video_ids, max_results=10, min_likes=50):
    """
    # ... (Implementation of get_top_liked_comments function)
    Get the top 10 comments for each provided video ID.

    Parameters:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    video_ids (list): List of video IDs.

    Returns:
    pd.DataFrame: DataFrame containing video IDs and associated top-level comments in text.
    """
    all_comments = []
    for video_id in video_ids:
        try:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=max_results,  # Fetch only the first 10 comments #UPDATED to Fetch the specified number of comments
                order="relevance",  # Order comments by relevance (likely to include top liked comments) 
            )
            response = request.execute()
            # Process response data and append to all_comments # UPDATED to Process comments and filter out spam-like comments
            # Process comments and filter out spam-like comments
            for item in response['items']:
                snippet = item['snippet']['topLevelComment']['snippet']
                comment_data = {
                    'video_id': video_id,
                    'comment_id': item['id'],
                    'author': snippet.get('authorDisplayName', ''),
                    'text': snippet.get('textDisplay', ''),
                    'like_count': snippet.get('likeCount', 0),
                    'published_at': snippet.get('publishedAt', ''),
                }
                all_comments.append(comment_data)
        except Exception as e:
            logging.error(f"Error fetching comments for video {video_id}: {e}")
            continue
    return pd.DataFrame(all_comments)


def main():
    # List of channel IDs to process
    channel_ids = ['UCX6OQ3DkcsbYNE6H8uQQuVA','UCbCmjCuTUZos6Inko4u57UQ','UCqwUrj10mAEsqezcItqvwEw','UC_vcKmg67vjMP7ciLnSxSHQ','UCZf__ehlCEBPop-_sldpBUQ','UC1opHUrw8rvnsadT-iGp7Cg', 'UCYWOjHweP2V-8kGKmmAmQJQ','UCECJDeK0MNapZbpaOzxrUPA','UCOmHUn--16B90oW2L6FRR3A']
    
    # Get channel statistics
    channel_data = get_channel_stats(youtube, channel_ids)
    
    # Print the first few rows of channel_data to inspect column names
    print(channel_data.head())
    
    video_df = pd.DataFrame()
    comments_df = pd.DataFrame()
    
    # Ensure the column name matches the actual column name in channel_data
    for channel in channel_data['channelName'].unique():
        logging.info(f"Processing channel: {channel}")
        playlist_id = channel_data.loc[channel_data['channelName'] == channel, 'playlistId'].iloc[0]
        try:
            video_ids = get_top_video_ids_by_view_count(youtube, playlist_id)
            video_data = get_video_details(youtube, video_ids)
            comments_data = get_comments_in_videos(youtube, video_ids)

            video_df = pd.concat([video_df, video_data]).drop_duplicates(subset=['video_id'])
            comments_df = pd.concat([comments_df, comments_data]).drop_duplicates(subset=['video_id'])
        except Exception as e:
            logging.error(f"Error processing channel {channel}: {e}")
            continue
        
    # Save the data to CSV files
    video_df.to_csv('video_data.csv', index=False)
    comments_df.to_csv('comments_data.csv', index=False)

if __name__ == "__main__":
    main()


                  channelName subscribers         views totalVideos  \
0            Luisito Comunica    41400000    9051059451        1316   
1                 BB Ki Vines    26300000    4825068409         190   
2  Cocomelon - Nursery Rhymes   167000000  170365245506        1027   
3                     MrBeast   207000000   36268184336         765   
4                   HikakinTV    11900000   11709391701        3368   

                 playlistId  
0  UUECJDeK0MNapZbpaOzxrUPA  
1  UUqwUrj10mAEsqezcItqvwEw  
2  UUbCmjCuTUZos6Inko4u57UQ  
3  UUX6OQ3DkcsbYNE6H8uQQuVA  
4  UUZf__ehlCEBPop-_sldpBUQ  


               'UC1opHUrw8rvnsadT-iGp7Cg', # PewDiePie SE
               'UCYWOjHweP2V-8kGKmmAmQJQ', # Badabun MX 
               'UCECJDeK0MNapZbpaOzxrUPA', # Luisito Comunica MX
               'UCOmHUn--16B90oW2L6FRR3A', # BLACKPINK

def main():
    # List of channel IDs to process 
    channel_ids = ['UCX6OQ3DkcsbYNE6H8uQQuVA', # Mr. Beast US
               'UCbCmjCuTUZos6Inko4u57UQ', # Cocomelon US
               'UCqwUrj10mAEsqezcItqvwEw', # Bhuvaneshwar Bam IN
               'UC_vcKmg67vjMP7ciLnSxSHQ', # Amit Bhadana IN
               'UCZf__ehlCEBPop-_sldpBUQ', # HikakinTV JP
              ]  # Replace with actual channel IDs
    
    # Get channel statistics
    channel_data = get_channel_stats(youtube, channel_ids)
    
    video_df = pd.DataFrame()
    comments_df = pd.DataFrame()
    
    for channel in channel_data['channelName'].unique():
        logging.info(f"Processing channel: {channel}")
        playlist_id = channel_data.loc[channel_data['channelName'] == channel, 'playlistId'].iloc[0]
        try:
            video_ids = get_top_video_ids_by_view_count(youtube, playlist_id)
            video_data = get_video_details(youtube, video_ids)
            comments_data = get_comments_in_videos(youtube, video_ids)

            video_df = pd.concat([video_df, video_data]).drop_duplicates(subset=['video_id'])
            comments_df = pd.concat([comments_df, comments_data]).drop_duplicates(subset=['video_id'])
        except Exception as e:
            logging.error(f"Error processing channel {channel}: {e}")
            continue
        
    # Save the data to CSV files
    video_df.to_csv('video_data.csv', index=False)
    comments_df.to_csv('comments_data.csv', index=False)

if __name__ == "__main__":
    main()

def get_top_video_ids(youtube, playlist_id):
    """
    Get the top 10 video IDs from the given playlist sorted by view count.
    
    Params:
    youtube (googleapiclient.discovery.Resource): The YouTube API resource object.
    playlist_id (str): Playlist ID.
    
    Returns:
    List of the top 10 video IDs in the playlist sorted by view count.
    """
    request = youtube.playlistItems().list(
        part='contentDetails',
        playlistId=playlist_id,
        maxResults=50
    )
    response = request.execute()

    # Extract video IDs and view counts into a list of dictionaries
    videos_data = [{'video_id': item['contentDetails']['videoId']} for item in response.get('items', [])]

    # If there are more pages, continue fetching
    next_page_token = response.get('nextPageToken')
    while next_page_token and len(videos_data) < 10:
        request = youtube.playlistItems().list(
            part='contentDetails',
            playlistId=playlist_id,
            maxResults=50,
            pageToken=next_page_token
        )
        response = request.execute()
        videos_data.extend([{'video_id': item['contentDetails']['videoId']} for item in response.get('items', [])])
        next_page_token = response.get('nextPageToken')

    # Sort the videos by view count in descending order and return the top 10
    top_video_ids = [video['video_id'] for video in videos_data]
    return top_video_ids[:10]


 

def get_video_details(youtube, video_ids):
    """
    Get video statistics of all videos with given IDs
    Params:
    
    youtube: the build object from googleapiclient.discovery
    video_ids: list of video IDs
    
    Returns:
    Dataframe with statistics of videos, i.e.:
        'channelTitle', 'title', 'description', 'tags', 'publishedAt'
        'viewCount', 'likeCount', 'favoriteCount', 'commentCount'
        'duration', 'definition', 'caption'
    """
        
    all_video_info = []
    
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=','.join(video_ids[i:i+50])
        )
        response = request.execute() 

        for video in response['items']:
            stats_to_keep = {'snippet': ['channelTitle', 'title', 'description', 'tags', 'publishedAt'],
                             'statistics': ['viewCount', 'likeCount', 'favouriteCount', 'commentCount'],
                             'contentDetails': ['duration', 'definition', 'caption']
                            }
            video_info = {}
            video_info['video_id'] = video['id']

            for k in stats_to_keep.keys():
                for v in stats_to_keep[k]:
                    try:
                        video_info[v] = video[k][v]
                    except:
                        video_info[v] = None

            all_video_info.append(video_info)
            
    return pd.DataFrame(all_video_info)

def get_comments_in_videos(youtube, video_ids):
    """
    Get top level comments as text from all videos with given IDs (only the first 10 comments due to quote limit of Youtube API)
    Params:
    
    youtube: the build object from googleapiclient.discovery
    video_ids: list of video IDs
    
    Returns:
    Dataframe with video IDs and associated top level comment in text.
    
    """
    all_comments = []
    
    for video_id in video_ids:
        try:   
            request = youtube.commentThreads().list(
                part="snippet,replies",
                videoId=video_id
            )
            response = request.execute()
        
            comments_in_video = [comment['snippet']['topLevelComment']['snippet']['textOriginal'] for comment in response['items'][0:10]]
            comments_in_video_info = {'video_id': video_id, 'comments': comments_in_video}

            all_comments.append(comments_in_video_info)
            
        except: 
            # When error occurs - most likely because comments are disabled on a video
            print('Could not get comments for video ' + video_id)
        
    return pd.DataFrame(all_comments)

# 3. Get channel statistics

Using the `get_channel_stats` function defined below, now we are going to obtain the channel statistics for the 9 channels in scope.

In [38]:
def get_channel_stats(youtube, channel_ids):
    """
    Get channel statistics: title, subscriber count, view count, video count, upload playlist, and country.
    
    Parameters:
    youtube: the build object from googleapiclient.discovery
    channels_ids: list of channel IDs
    
    Returns:
    Dataframe containing the channel statistics for all channels in the provided list: title, subscriber count, view count, video count, upload playlist, and country.
    """
    all_data = []
    
    for channel_id in channel_ids:
        try:
            request = youtube.channels().list(
                part='snippet,contentDetails,statistics',
                id=channel_id
            )
            response = request.execute()

            # Fetch additional data including country
            snippet_request = youtube.channels().list(
                part='snippet',
                id=channel_id
            )
            snippet_response = snippet_request.execute()
            country = snippet_response['items'][0]['snippet'].get('country', '')

            data = dict(
                channelName=response['items'][0]['snippet']['title'],
                subscribers=response['items'][0]['statistics']['subscriberCount'],
                views=response['items'][0]['statistics']['viewCount'],
                totalVideos=response['items'][0]['statistics']['videoCount'],
                playlistId=response['items'][0]['contentDetails']['relatedPlaylists']['uploads'],
                country=country  # Add country information
            )
            all_data.append(data)
        except Exception as e:
            logging.error(f"Error fetching channel stats for {channel_id}: {e}")
            continue
    return pd.DataFrame(all_data)


In [39]:
all_data

[{'channelName': 'MrBeast',
  'subscribers': '207000000',
  'views': '36268184336',
  'totalVideos': '765',
  'playlistId': 'UUX6OQ3DkcsbYNE6H8uQQuVA',
  'country': 'US'},
 {'channelName': 'Cocomelon - Nursery Rhymes',
  'subscribers': '167000000',
  'views': '170365245506',
  'totalVideos': '1027',
  'playlistId': 'UUbCmjCuTUZos6Inko4u57UQ',
  'country': 'US'},
 {'channelName': 'BB Ki Vines',
  'subscribers': '26300000',
  'views': '4825068409',
  'totalVideos': '190',
  'playlistId': 'UUqwUrj10mAEsqezcItqvwEw',
  'country': 'IN'},
 {'channelName': 'Amit Bhadana',
  'subscribers': '24500000',
  'views': '2429214564',
  'totalVideos': '106',
  'playlistId': 'UU_vcKmg67vjMP7ciLnSxSHQ',
  'country': 'IN'},
 {'channelName': 'HikakinTV',
  'subscribers': '11900000',
  'views': '11709391701',
  'totalVideos': '3368',
  'playlistId': 'UUZf__ehlCEBPop-_sldpBUQ',
  'country': ''},
 {'channelName': 'Aqua Ch. 湊あくあ',
  'subscribers': '1920000',
  'views': '368194153',
  'totalVideos': '592',
  'p

In [36]:
all_data = []
    
for channel_id in channel_ids:
    try:
        request = youtube.channels().list(
            part='snippet,contentDetails,statistics',
            id=channel_id
        )
        response = request.execute()

        # Fetch additional data including country
        snippet_request = youtube.channels().list(
            part='snippet',
            id=channel_id
        )
        snippet_response = snippet_request.execute()
        country = snippet_response['items'][0]['snippet'].get('country', '')

        data = dict(
            channelName=response['items'][0]['snippet']['title'],
            subscribers=response['items'][0]['statistics']['subscriberCount'],
            views=response['items'][0]['statistics']['viewCount'],
            totalVideos=response['items'][0]['statistics']['videoCount'],
            playlistId=response['items'][0]['contentDetails']['relatedPlaylists']['uploads'],
            country=country  # Add country information
        )
        all_data.append(data)
    except Exception as e:
        logging.error(f"Error fetching channel stats for {channel_id}: {e}")
        continue

channel_data1 = pd.DataFrame(all_data)
channel_data1.head()

Unnamed: 0,channelName,subscribers,views,totalVideos,playlistId,country
0,MrBeast,207000000,36268184336,765,UUX6OQ3DkcsbYNE6H8uQQuVA,US
1,Cocomelon - Nursery Rhymes,167000000,170365245506,1027,UUbCmjCuTUZos6Inko4u57UQ,US
2,BB Ki Vines,26300000,4825068409,190,UUqwUrj10mAEsqezcItqvwEw,IN
3,Amit Bhadana,24500000,2429214564,106,UU_vcKmg67vjMP7ciLnSxSHQ,IN
4,HikakinTV,11900000,11709391701,3368,UUZf__ehlCEBPop-_sldpBUQ,


In [37]:
channel_data1.to_csv('Top9channel_data.csv', index=False)

In [31]:
# Convert count columns to numeric columns
numeric_cols = ['subscribers', 'views', 'totalVideos']
channel_data[numeric_cols] = channel_data[numeric_cols].apply(pd.to_numeric, errors='coerce')

In [32]:
channel_data

Unnamed: 0,channelName,subscribers,views,totalVideos,playlistId
0,BLACKPINK,92000000,33904663784,585,UUOmHUn--16B90oW2L6FRR3A
1,HikakinTV,11900000,11709391701,3368,UUZf__ehlCEBPop-_sldpBUQ
2,Aqua Ch. 湊あくあ,1920000,368194153,592,UU1opHUrw8rvnsadT-iGp7Cg
3,Badabun,47100000,19554408537,19517,UUYWOjHweP2V-8kGKmmAmQJQ
4,Luisito Comunica,41400000,9051059451,1316,UUECJDeK0MNapZbpaOzxrUPA
5,Cocomelon - Nursery Rhymes,167000000,170365245506,1027,UUbCmjCuTUZos6Inko4u57UQ
6,MrBeast,207000000,36268184336,765,UUX6OQ3DkcsbYNE6H8uQQuVA
7,Amit Bhadana,24500000,2429214564,106,UU_vcKmg67vjMP7ciLnSxSHQ
8,BB Ki Vines,26300000,4825068409,190,UUqwUrj10mAEsqezcItqvwEw


In [None]:
channel_data.to_csv('channel_data.csv', index=False)


# Top Viewership ranking

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Using Channel_data dataframe from previous step, the data is sorted, and the top 9 channels are plotted with a legend for clarity.

# Sort the data and get the top 9 channels for clarity in the legend
sorted_data = channel_data.sort_values('subscribers', ascending=False)[0:9]

# Create a color palette with a distinct color for each bar
palette = sns.color_palette("hsv", len(sorted_data))

# Create the bar plot with the specified palette
ax = sns.barplot(
    x='channelName', 
    y='subscribers', 
    data=sorted_data, 
    palette=palette
)

# Rotate the x-axis labels by 45 degrees
plt.xticks(rotation=45)

# Format the y-axis labels as thousands with a K suffix
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'{x/1000:.0f}K'))

# Remove x-axis labels as we will add these as legends
ax.set(xticklabels=[])

# Create the legend manually
for i, row in enumerate(sorted_data.itertuples()):
    plt.bar(0, 0, color=palette[i], label=row.channelName,)

# Place the legend below the chart # Adjust legend box height and width
plt.legend(title='The Top YouTuber Channel Names', bbox_to_anchor=(1.35, .5), loc='center')

# Add labels to the bars to show the exact subscriber count for each channel
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2., height + 0.02 * height, 
            f'{height/1000:.0f}K', ha='center', va='bottom', fontsize=8, color='black'
            )

#  Add a title and format it
plt.title('Top 9 YouTuber Channels by Subscriber Count', x=0.7, y=1.15, loc='Center', fontsize=18, fontweight='bold')


# FILEPATH: /c:/Users/jamal/jamaleb67.github.io/jamaleb67.github.io/work/Project.ipynb
plt.suptitle('Source: YouTube API', x=0.05, y=0.02, ha='left', fontsize=7)

# Save the plot as a PNG file
plt.savefig("TOP9PLOT.png")

# Show the plot
plt.show()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

ax = sns.barplot(x='channelName', y='views', data=channel_data.sort_values('views', ascending=False))
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x/1000) + 'K'))
plot = ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)


# Using Channel_data dataframe from previous step, the data is sorted, and the top 9 channels are plotted with a legend for clarity.

# Sort the data and get the top 9 channels for clarity in the legend
sorted_data = channel_data.sort_values('views', ascending=False)[0:9]

# Create a color palette with a distinct color for each bar
palette = sns.color_palette("hsv", len(sorted_data))

# Create the bar plot with the specified palette
ax = sns.barplot(
    x='channelName', 
    y='views', 
    data=sorted_data, 
    palette=palette
)

# Rotate the x-axis labels by 45 degrees
plt.xticks(rotation=45)

# Format the y-axis labels as thousands with a K suffix
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: f'{x/1000:.0f}K'))

# Remove x-axis labels as we will add these as legends
ax.set(xticklabels=[])

# Create the legend manually
for i, row in enumerate(sorted_data.itertuples()):
    plt.bar(0, 0, color=palette[i], label=row.channelName,)

# Place the legend below the chart # Adjust legend box height and width
plt.legend(title='The Top YouTuber Channels by Viewership', bbox_to_anchor=(1.35, .5), loc='center')

# Add labels to the bars to show the exact subscriber count for each channel
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2., height + 0.02 * height, 
            f'{height/1000:.0f}K', ha='center', va='bottom', rotation=10,  fontsize=7
            )

#  Add a title and format it
plt.title('Top 9 YouTuber Channels by View Count', x=0.7, y=1.15, loc='Center', fontsize=18, fontweight='bold')


# FILEPATH: /c:/Users/jamal/jamaleb67.github.io/jamaleb67.github.io/work/Project.ipynb
plt.suptitle('Source: YouTube API', x=0.05, y=0.02, ha='left', fontsize=8)

# Save the plot as a PNG file
plt.savefig("TOP9PLOT.png")

# Show the plot
plt.show()


# Get Video Statistics

Video statistics are obtained using the `get_video_stats` function defined below. The function takes in a list of video ids and returns a dataframe with the video statistics.

In [None]:
video_df = pd.DataFrame()
comments_df = pd.DataFrame()

In [None]:
video_df

In [None]:
print(type(video_df))

In [None]:
print(pd.__version__)

## Option 2

# Create a dataframe with video statistics and comments from all channels

#video_df = pd.DataFrame()
#comments_df = pd.DataFrame() 

for c in channel_data['channelName'].unique():
    print("Getting video information from channel: " + c)
    playlist_id = channel_data.loc[channel_data['channelName']== c, 'playlistId'].iloc[0]
    video_ids = get_video_ids(youtube, playlist_id)
    
    # get video data
    video_data = get_video_details(youtube, video_ids)
    # get comment data
    comments_data = get_comments_in_videos(youtube, video_ids)

    # append video data together and comment data toghether
    video_df = video_df.concat(video_data, ignore_index=True)
    comments_df = comments_df.concat(comments_data, ignore_index=True)




# Create a dataframe with video statistics and comments from all channels
import pandas as pd

video_df = pd.DataFrame()
comments_df = pd.DataFrame()

for c in channel_data['channelName'].unique():
    print("Getting video information from channel: " + c)
    playlist_id = channel_data.loc[channel_data['channelName']== c, 'playlistId'].iloc[0]
    video_ids = get_video_ids(youtube, playlist_id)
    
    # get video data
    video_data = get_video_details(youtube, video_ids)

    # append video data together and comment data toghether
    video_df = video_df.append(video_data, ignore_index=True)
    comments_df = comments_df.concat(comments_data, ignore_index=True)

Try Concat

import pandas as pd

video_df = pd.DataFrame()

for c in channel_data['channelName'].unique():
    print("Getting video information from channel: " + c)
    playlist_id = channel_data.loc[channel_data['channelName'] == c, 'playlistId'].iloc[0]
    video_ids = get_video_ids(youtube, playlist_id)
    
    # get video data
    video_data = get_video_details(youtube, video_ids)

    # append video data together
    video_df = pd.concat([video_df, video_data], ignore_index=True)

In [None]:
# Write video data to CSV file for future references
video_df.to_csv('video_data_top10_channels.csv')
comments_df.to_csv('comments_data_top10_channels.csv')

Embed a Power BI report in a Jupyter notebook

# Conclusion <a class='anchor' id='Conclusion'></a>[↑](#Top)>

## API Reference

#### Get all items

```http
  GET /api/items
```

| Parameter | Type     | Description                |
| :-------- | :------- | :------------------------- |
| `api_key` | `string` | **AIzaSyDs9TExsfc8fDMn4lBRYXZax1vSr0ftsdk**. YouTube API key |

#### Get item

```http
  GET /api/items/${id}
```

| Parameter | Type     | Description                       |
| :-------- | :------- | :-------------------------------- |
| `id`      | `string` | **UCtYLUTtgS3k1Fg4y5tAhLbw** # Statquest
| `id`      | `string` | 'UCCezIgC97PvUuR4_gbFUs5g', # Corey Schafer
| `id`      | `string` | 'UCfzlCWGWYyIQ0aLC5w48gBQ', # Sentdex
| `id`      | `string` | 'UCNU_lfiiWBdtULKOw6X0Dig', # Krish Naik
| `id`      | `string` | 'UCzL_0nIe8B4-7ShhVPfJkgw', # DatascienceDoJo
| `id`      | `string` | 'UCLLw7jmFsvfIVaUFsLs8mlQ', # Luke Barousse 
| `id`      | `string` | 'UCiT9RITQ9PW6BhXK0y2jaeg', # Ken Jee
| `id`      | `string` | 'UC7cs8q-gJRlGwj4A8OmCmXg', # Alex the analyst
| `id`      | `string` | 'UC2UXDak6o7rBm23k3Vv5dww', # Tina Huang


#### add(more to follow)

Takes two numbers and returns the sum.

## Badges


[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)


# Link to top of page <a class='anchor' id='Top'>