<a href="https://colab.research.google.com/github/monicahatis/Kenya-Podcast-analysis/blob/main/podcasts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## KENYA YOUTUBE PODCAST ANALYSIS


In this project, I used the YouTube Data API to collect data on a selection of Kenyan YouTube podcasts and to analyze engagement and content performance across their channels. By extracting metrics such as views, likes, subscriber counts, and video counts, I aimed to uncover the patterns behind audience engagement in the Kenyan podcasting community. I wanted to learn which types of content resonate most with Kenyan audiences, and which factors drive channel growth and audience loyalty. The goal is to better understand what leads to successful content and sustained viewer interest in Kenyan YouTube podcasts.

### 1. IMPORT REQUIRED LIBRARIES

In [1]:
from googleapiclient.discovery import build
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### 2. DEFINE API_KEY AND CHANNEL_IDS, AND INITIALIZE THE YOUTUBE API CLIENT

In [2]:
import os

# Read the API key from an environment variable instead of hard-coding it
# in the notebook (the variable name YOUTUBE_API_KEY is an arbitrary choice).
API_KEY = os.environ["YOUTUBE_API_KEY"]
CHANNEL_IDS = [
    'UCKg6e0Z4v_u3m8DVvTuSU-g',  # Legally Clueless
    'UC5h4-WH0LAV4CWs380yM33A',  # Iko Nini
    'UCEYqAce8R78wNRAkb8wIuVQ',  # Sandwich Podcast
    'UC6fVFxrbf0HDRW3B2mdWFGA',  # TMI Podcast
    'UCUc4W23onOq0760D6PnmuzQ',  # Be Different Podcast
    'UCE3KVkSH1GwUtAAMcVcJ3QQ',  # Mic Cheque (channel title: UpSyd Digital Networks)
    'UCCjULCQvh2cQQLzYe4DC2Nw',  # The Joy Ride
    'UCk6svFkAPqYCv8izig7HSUw',  # POV Podcast
    'UCx1WDOZzmyIa1MlK1W3RdOg',  # It's Related, I Promise
    'UCmBQsChFjOTcqv7D6sT8dPw',  # ManTalk
]

# YouTube API client initialization
youtube = build("youtube", "v3", developerKey=API_KEY)

### 3. EXTRACT DATA

In [3]:
def get_channel_stats(youtube, channel_ids):
    """Return one dict of channel-level statistics per channel ID."""
    all_data = []
    request = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        id=",".join(channel_ids),
    )
    response = request.execute()

    # Items are not necessarily returned in the order of channel_ids.
    for item in response["items"]:
        data = dict(
            channel_name=item["snippet"]["title"],
            date_created=item["snippet"]["publishedAt"],
            subscribers=item["statistics"]["subscriberCount"],
            Total_videos=item["statistics"]["videoCount"],
            Total_views=item["statistics"]["viewCount"],
        )
        all_data.append(data)

    return all_data
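
The loop only reads `items`, `snippet` and `statistics` from the response, so the parsing step can be checked without network access. The sketch below is a minimal, hedged variant: the helper name `response_to_frame` and the `fake_response` dict are hypothetical, and the dict only mimics the shape of a `channels().list` response. It reads the counts with `.get()` (assuming a statistic could occasionally be absent), converts the string counts to numbers, and sorts by channel name, because the API does not return items in the order of `CHANNEL_IDS`.

```python
import pandas as pd


def response_to_frame(response):
    """Flatten a channels().list-style response into one row per channel.

    Hypothetical helper: same keys as get_channel_stats, but optional
    statistics are read with .get() so a missing field becomes NaN
    instead of raising KeyError.
    """
    rows = []
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        rows.append({
            "channel_name": item["snippet"]["title"],
            "date_created": item["snippet"]["publishedAt"],
            "subscribers": stats.get("subscriberCount"),
            "Total_videos": stats.get("videoCount"),
            "Total_views": stats.get("viewCount"),
        })
    df = pd.DataFrame(rows)
    # Counts arrive as strings; convert them to numbers (None becomes NaN).
    for col in ["subscribers", "Total_videos", "Total_views"]:
        df[col] = pd.to_numeric(df[col])
    # Sort for a stable order, since items do not follow the request order.
    return df.sort_values("channel_name").reset_index(drop=True)


# Hypothetical response shaped like YouTube Data API v3 output.
fake_response = {
    "items": [
        {"snippet": {"title": "Channel B", "publishedAt": "2021-01-01T00:00:00Z"},
         "statistics": {"subscriberCount": "200", "videoCount": "10", "viewCount": "5000"}},
        {"snippet": {"title": "Channel A", "publishedAt": "2020-01-01T00:00:00Z"},
         "statistics": {"videoCount": "3", "viewCount": "900"}},
    ]
}

frame = response_to_frame(fake_response)
print(frame)
```

In the notebook, `response_to_frame(request.execute())` could stand in for the `get_channel_stats` plus `pd.DataFrame` pair if a sorted, numeric frame is preferred.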

### 4. CHANNEL STATISTICS AS A LIST OF DICTIONARIES

In [4]:
channel_statistics = get_channel_stats(youtube, CHANNEL_IDS)
channel_statistics

[{'channel_name': 'The Joy Ride',
  'date_created': '2021-12-06T08:07:37.9638Z',
  'subscribers': '72100',
  'Total_videos': '207',
  'Total_views': '8348722'},
 {'channel_name': 'POVPodcastKenya',
  'date_created': '2023-02-07T11:58:48.002857Z',
  'subscribers': '7900',
  'Total_videos': '55',
  'Total_views': '409154'},
 {'channel_name': 'Sandwich Podcast KE',
  'date_created': '2020-11-16T06:50:11.588499Z',
  'subscribers': '32400',
  'Total_videos': '148',
  'Total_views': '3107858'},
 {'channel_name': "It's Related, I Promise",
  'date_created': '2024-02-20T10:41:42.711259Z',
  'subscribers': '16900',
  'Total_videos': '29',
  'Total_views': '775642'},
 {'channel_name': 'Legally Clueless',
  'date_created': '2021-04-04T14:40:17.265409Z',
  'subscribers': '20900',
  'Total_videos': '425',
  'Total_views': '916207'},
 {'channel_name': 'Be Different Podcast',
  'date_created': '2021-07-19T08:05:04.870323Z',
  'subscribers': '4770',
  'Total_videos': '69',
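  'Total_views': '46243'},
 {'channel_name': 'TMI Podcast KE',
  'date_created': '2021-06-29T05:05:30.649655Z',
  'subscribers': '134000',
  'Total_videos': '365',
  'Total_views': '13625896'},
 {'channel_name': 'ManTalk Ke',
  'date_created': '2019-08-01T03:59:00Z',
  'subscribers': '42600',
  'Total_videos': '263',
  'Total_views': '2466230'},
 {'channel_name': 'Iko Nini',
  'date_created': '2015-03-22T13:49:41Z',
  'subscribers': '142000',
  'Total_videos': '2800',
  'Total_views': '33725540'},
 {'channel_name': 'UpSyd Digital Networks',
  'date_created': '2018-01-15T17:59:10Z',
  'subscribers': '72400',
  'Total_videos': '652',
  'Total_views': '9716419'}]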

#### Converting to DataFrame

In [6]:
channel_df = pd.DataFrame(channel_statistics)
channel_df

|   | channel_name | date_created | subscribers | Total_videos | Total_views |
|---|---|---|---|---|---|
| 0 | The Joy Ride | 2021-12-06T08:07:37.9638Z | 72100 | 207 | 8348722 |
| 1 | POVPodcastKenya | 2023-02-07T11:58:48.002857Z | 7900 | 55 | 409154 |
| 2 | Sandwich Podcast KE | 2020-11-16T06:50:11.588499Z | 32400 | 148 | 3107858 |
| 3 | It's Related, I Promise | 2024-02-20T10:41:42.711259Z | 16900 | 29 | 775642 |
| 4 | Legally Clueless | 2021-04-04T14:40:17.265409Z | 20900 | 425 | 916207 |
| 5 | Be Different Podcast | 2021-07-19T08:05:04.870323Z | 4770 | 69 | 46243 |
| 6 | TMI Podcast KE | 2021-06-29T05:05:30.649655Z | 134000 | 365 | 13625896 |
| 7 | ManTalk Ke | 2019-08-01T03:59:00Z | 42600 | 263 | 2466230 |
| 8 | Iko Nini | 2015-03-22T13:49:41Z | 142000 | 2800 | 33725540 |
| 9 | UpSyd Digital Networks | 2018-01-15T17:59:10Z | 72400 | 652 | 9716419 |


### 5. EXPLORATORY DATA ANALYSIS AND DATA CLEANING


#### Convert date_created to a yyyy-mm-dd date

In [9]:
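# Parse date_created (ISO 8601 timestamps with and without fractional seconds)
# and reformat it as yyyy-mm-dd. format='mixed' requires pandas >= 2.0, and
# strftime returns strings, so the column keeps dtype object.
channel_df['date_created'] = pd.to_datetime(channel_df['date_created'], format='mixed').dt.strftime('%Y-%m-%d')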
channel_df

|   | channel_name | date_created | subscribers | Total_videos | Total_views |
|---|---|---|---|---|---|
| 0 | The Joy Ride | 2021-12-06 | 72100 | 207 | 8348722 |
| 1 | POVPodcastKenya | 2023-02-07 | 7900 | 55 | 409154 |
| 2 | Sandwich Podcast KE | 2020-11-16 | 32400 | 148 | 3107858 |
| 3 | It's Related, I Promise | 2024-02-20 | 16900 | 29 | 775642 |
| 4 | Legally Clueless | 2021-04-04 | 20900 | 425 | 916207 |
| 5 | Be Different Podcast | 2021-07-19 | 4770 | 69 | 46243 |
| 6 | TMI Podcast KE | 2021-06-29 | 134000 | 365 | 13625896 |
| 7 | ManTalk Ke | 2019-08-01 | 42600 | 263 | 2466230 |
| 8 | Iko Nini | 2015-03-22 | 142000 | 2800 | 33725540 |
| 9 | UpSyd Digital Networks | 2018-01-15 | 72400 | 652 | 9716419 |


In [10]:
channel_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   channel_name  10 non-null     object
 1   date_created  10 non-null     object
 2   subscribers   10 non-null     object
 3   Total_videos  10 non-null     object
 4   Total_views   10 non-null     object
dtypes: object(5)
memory usage: 528.0+ bytes
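
As the `info()` output shows, `date_created` is still dtype `object` after the `strftime` call, so it cannot be used directly for date arithmetic such as channel age. A minimal sketch that keeps a real datetime column instead, using three rows copied from the table above; the frame name `sample` and the snapshot date `AS_OF` are assumptions, and `AS_OF` should be replaced with the day the data was actually pulled.

```python
import pandas as pd

# Three rows copied from the channel table above.
sample = pd.DataFrame({
    "channel_name": ["The Joy Ride", "Iko Nini", "ManTalk Ke"],
    "date_created": [
        "2021-12-06T08:07:37.9638Z",
        "2015-03-22T13:49:41Z",
        "2019-08-01T03:59:00Z",
    ],
})

# format="ISO8601" (pandas >= 2.0) accepts both the fractional-second and
# whole-second variants; utc=True interprets the trailing "Z" as UTC.
sample["created"] = pd.to_datetime(sample["date_created"], format="ISO8601", utc=True)

# Assumed snapshot date: replace with the day the API data was collected.
AS_OF = pd.Timestamp("2024-11-01", tz="UTC")

# Channel age in years (365.25-day years), kept as a numeric column.
sample["age_years"] = (AS_OF - sample["created"]).dt.days / 365.25
print(sample[["channel_name", "created", "age_years"]])
```

Keeping the parsed column and formatting only for display avoids a second parse later, when channel age is needed for the growth analyses.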


#### Convert the subscribers, Total_videos and Total_views columns to integers


In [11]:
# Convert specified columns to integer data type in one line
channel_df[['subscribers', 'Total_videos', 'Total_views']] = channel_df[['subscribers', 'Total_videos', 'Total_views']].astype(int)


In [13]:
channel_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   channel_name  10 non-null     object
 1   date_created  10 non-null     object
 2   subscribers   10 non-null     int64 
 3   Total_videos  10 non-null     int64 
 4   Total_views   10 non-null     int64 
dtypes: int64(3), object(2)
memory usage: 528.0+ bytes
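
`astype(int)` works here because every count came back as a string of digits; a single missing or non-numeric value would make it raise an error instead. A more defensive sketch, assuming such gaps should become missing values rather than stop the notebook, combines `pd.to_numeric(..., errors="coerce")` with pandas' nullable `Int64` dtype. The `counts` frame is illustrative, not collected data.

```python
import pandas as pd

# Illustrative frame (not collected data): one count is missing, one is not a number.
counts = pd.DataFrame({
    "subscribers": ["72100", None, "7900"],
    "Total_videos": ["207", "55", "n/a"],
    "Total_views": ["8348722", "409154", "775642"],
})

numeric_cols = ["subscribers", "Total_videos", "Total_views"]

# errors="coerce" turns unparseable values into NaN instead of raising;
# "Int64" (capital I) is pandas' nullable integer dtype, so NaN shows as <NA>.
counts[numeric_cols] = counts[numeric_cols].apply(
    lambda col: pd.to_numeric(col, errors="coerce").astype("Int64")
)

print(counts.dtypes)
```

The gaps then stay visible as `<NA>` and can be found with `counts.isna()` before any ratios are computed.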


### 6. DATA ANALYSIS AND VISUALIZATION

1. Plot subscriber numbers against date_created to identify trends in subscriber growth among these channels.
2. Calculate and plot the average number of videos uploaded per year to understand the rate at which each channel produces content.
3. Calculate the ratio of subscribers to total views to see which channels have higher audience engagement rates per view.
4. Examine the number of videos per channel to see which channels are the most prolific in content production.
5. Create rankings to show which channels are leading in terms of subscribers, views, and average views per video.
6. For each channel, calculate engagement metrics (e.g., subscribers/views) and analyze them over time to see whether engagement increases with newer content.
7. Calculate the age of each channel and check whether older channels tend to have more content, or whether newer channels are catching up quickly.
8. See which channels attract subscribers faster by comparing subscriber counts with date_created.
9.
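
Several of the planned analyses (upload rate, subscriber-to-view ratio, rankings, channel age, and subscriber accrual speed) reduce to a few derived columns. A minimal sketch, assuming the column names of `channel_df` shown above; only four rows from the table are reproduced so the block runs on its own, and `AS_OF` is a hypothetical snapshot date standing in for the collection date.

```python
import pandas as pd

# Four rows copied from channel_df above (after type conversion).
df = pd.DataFrame({
    "channel_name": ["The Joy Ride", "POVPodcastKenya", "TMI Podcast KE", "Iko Nini"],
    "date_created": ["2021-12-06", "2023-02-07", "2021-06-29", "2015-03-22"],
    "subscribers": [72100, 7900, 134000, 142000],
    "Total_videos": [207, 55, 365, 2800],
    "Total_views": [8348722, 409154, 13625896, 33725540],
})

# Assumed snapshot date; replace with the day the data was pulled.
AS_OF = pd.Timestamp("2024-11-01")

created = pd.to_datetime(df["date_created"])
df["age_years"] = (AS_OF - created).dt.days / 365.25

# Upload rate: videos per year of channel life.
df["videos_per_year"] = df["Total_videos"] / df["age_years"]
# Subscribers per 1,000 lifetime views (a rough engagement proxy).
df["subs_per_1k_views"] = 1000 * df["subscribers"] / df["Total_views"]
# Average views per video.
df["views_per_video"] = df["Total_views"] / df["Total_videos"]
# Subscriber accrual speed: subscribers per year since creation.
df["subs_per_year"] = df["subscribers"] / df["age_years"]

# Rankings (1 = highest) for the headline metrics.
for col in ["subscribers", "Total_views", "views_per_video"]:
    df[f"rank_{col}"] = df[col].rank(ascending=False, method="min").astype(int)

print(df.sort_values("rank_subscribers")[["channel_name", "rank_subscribers", "views_per_video"]])
```

Because `subs_per_1k_views` divides lifetime subscribers by lifetime views, it is a coarse, channel-level proxy; it cannot show whether engagement changes over time, which would need per-video data.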