<a href="https://colab.research.google.com/github/rahul-tc/Data-Analysis-Project/blob/main/Rahul_YTchannels_PerformanceMetrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouTube Channels Performance Metrics

You have been provided with a dataset of YouTube channels across different categories. The data dictionary below describes its columns:

| Column Name       | Description                                                     | Example     |
|-------------------|-----------------------------------------------------------------|-------------|
| ChannelName       | The name of the YouTube channel.                                | TechReview  |
| Category          | The category or genre of content the channel produces.          | Technology  |
| Subscribers       | The number of subscribers the channel has.                      | 1200000     |
| TotalViews        | The total number of views the channel has accumulated.          | 50000000    |
| LongVideoCount    | The number of long videos the channel has uploaded.             | 150         |
| ShortVideoCount   | The number of short videos the channel has uploaded.            | 50          |
| JoinedYear        | The year the channel was created.                               | 2015        |
| PlayButton        | The type of YouTube Play Button the channel has received.       | Gold        |

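Run the code below to download the CSV, load it into a pandas DataFrame, and write it to a SQLite database file (`ytChannels.db`) using Python's built-in `sqlite3` module.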


```python
import pandas as pd
import sqlite3
import requests

# URL of the CSV file on GitHub
url = 'https://raw.githubusercontent.com/Invact-Abhay/SQL/main/SQL.csv'

# Download the CSV file, stopping early if the request failed
response = requests.get(url, timeout=30)
response.raise_for_status()
with open('ytChannels.csv', 'wb') as file:
    file.write(response.content)

# Load the CSV file into a pandas DataFrame
data = pd.read_csv('ytChannels.csv')

# Create a SQLite database (or connect to an existing one)
conn = sqlite3.connect('ytChannels.db')

# Write the DataFrame to a table named ytChannels, replacing it if it already exists
data.to_sql('ytChannels', conn, if_exists='replace', index=False)
```


69
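After the load, it can help to confirm what actually landed in the table. The sketch below uses an in-memory stand-in (`demo_conn`, a hypothetical name chosen so the notebook's `conn` is not overwritten), so it runs without the download. It declares the eight columns from the data dictionary and inserts three rows copied from the Task 1 output. The `TEXT`/`INTEGER` types are an assumption about what `to_sql` writes. It then reads the schema with SQLite's `PRAGMA table_info` and the row count with `COUNT(*)`. Both queries can be run on the notebook's `conn` unchanged.

```python
import sqlite3

# In-memory stand-in for ytChannels.db; the column types are assumed, not read from the real file
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""
    CREATE TABLE ytChannels (
        ChannelName TEXT, Category TEXT, Subscribers INTEGER, TotalViews INTEGER,
        LongVideoCount INTEGER, ShortVideoCount INTEGER, JoinedYear INTEGER, PlayButton TEXT
    )
""")
rows = [
    ("TechGuru", "Technology", 1200000, 150000000, 400, 100, 2015, "Gold"),
    ("CookingMaster", "Food & Cooking", 800000, 100000000, 250, 50, 2017, "Silver"),
    ("TravelVibes", "Travel", 1500000, 200000000, 350, 50, 2016, "Gold"),
]
demo_conn.executemany("INSERT INTO ytChannels VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
columns = [(info[1], info[2]) for info in demo_conn.execute("PRAGMA table_info(ytChannels)")]
row_count = demo_conn.execute("SELECT COUNT(*) FROM ytChannels").fetchone()[0]

print(columns)
print(row_count)
```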

# SELECT

**Task 1**


Retrieve all columns of the ytChannels table using an SQL query.

```python
pd.read_sql_query("SELECT * FROM ytChannels", conn)
```

| | ChannelName | Category | Subscribers | TotalViews | LongVideoCount | ShortVideoCount | JoinedYear | PlayButton |
|---|---|---|---|---|---|---|---|---|
| 0 | TechGuru | Technology | 1200000 | 150000000 | 400 | 100 | 2015 | Gold |
| 1 | CookingMaster | Food & Cooking | 800000 | 100000000 | 250 | 50 | 2017 | Silver |
| 2 | TravelVibes | Travel | 1500000 | 200000000 | 350 | 50 | 2016 | Gold |
| 3 | FitnessFreak | Fitness | 1000000 | 80000000 | 150 | 50 | 2018 | Silver |
| 4 | ComedyCentral | Comedy | 2500000 | 300000000 | 500 | 100 | 2014 | Gold |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | FlavorFusion | Food & Cooking | 1000000 | 130000000 | 300 | 100 | 2018 | Silver |
| 65 | AdventureSeekers | Travel | 1800000 | 250000000 | 500 | 150 | 2015 | Gold |
| 66 | BodyBlast | Fitness | 1500000 | 110000000 | 250 | 100 | 2019 | Gold |
| 67 | LaughLounge | Comedy | 3200000 | 400000000 | 700 | 200 | 2014 | Gold |
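`pd.read_sql_query` runs the query through the connection and packs the result into a DataFrame. For comparison, here is a sketch of Task 3's query using `sqlite3` alone, against a small in-memory stand-in (`demo_conn`, hypothetical) seeded with rows from the output above. `cursor.description` supplies the column names and `fetchall()` returns a list of tuples. Without `ORDER BY`, SQL does not guarantee row order.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, Subscribers INTEGER, PlayButton TEXT)")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?, ?)",
    [("TechGuru", 1200000, "Gold"), ("CookingMaster", 800000, "Silver"), ("FitnessFreak", 1000000, "Silver")],
)

cursor = demo_conn.execute("SELECT ChannelName, Subscribers, PlayButton FROM ytChannels")
# description holds one 7-tuple per result column; sqlite3 fills only the first item (the name)
header = [col[0] for col in cursor.description]
records = cursor.fetchall()  # list of tuples, one per row

print(header)
for record in records:
    print(record)
```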


**Task 2**

Get the names of all channels from ytChannels.


```python
pd.read_sql_query("SELECT ChannelName FROM ytChannels", conn)
```

| | ChannelName |
|---|---|
| 0 | TechGuru |
| 1 | CookingMaster |
| 2 | TravelVibes |
| 3 | FitnessFreak |
| 4 | ComedyCentral |
| ... | ... |
| 64 | FlavorFusion |
| 65 | AdventureSeekers |
| 66 | BodyBlast |
| 67 | LaughLounge |


**Task 3**


Get the names of all channels, along with their subscriber counts and play button statuses, from ytChannels.

```python
pd.read_sql_query("SELECT ChannelName, Subscribers, PlayButton FROM ytChannels", conn)
```

| | ChannelName | Subscribers | PlayButton |
|---|---|---|---|
| 0 | TechGuru | 1200000 | Gold |
| 1 | CookingMaster | 800000 | Silver |
| 2 | TravelVibes | 1500000 | Gold |
| 3 | FitnessFreak | 1000000 | Silver |
| 4 | ComedyCentral | 2500000 | Gold |
| ... | ... | ... | ... |
| 64 | FlavorFusion | 1000000 | Silver |
| 65 | AdventureSeekers | 1800000 | Gold |
| 66 | BodyBlast | 1500000 | Gold |
| 67 | LaughLounge | 3200000 | Gold |


# ORDER BY

**Task 4**

Get the names of all channels, along with their subscriber counts and play button statuses, from ytChannels, ordered by subscriber count in descending order.


```python
pd.read_sql_query("SELECT ChannelName, Subscribers, PlayButton FROM ytChannels ORDER BY Subscribers DESC", conn)
```

| | ChannelName | Subscribers | PlayButton |
|---|---|---|---|
| 0 | GamingGenius | 4200000 | Gold |
| 1 | GamingGalaxy | 4000000 | Gold |
| 2 | eSportsElite | 3800000 | Gold |
| 3 | eSportsEmpire | 3500000 | Gold |
| 4 | StandUpLaughs | 3200000 | Gold |
| ... | ... | ... | ... |
| 64 | CookingMaster | 800000 | Silver |
| 65 | ArtisticExpressions | 800000 | Silver |
| 66 | BookLovers | 800000 | Silver |
| 67 | HealthyEats | 700000 | Silver |


**Task 5**

Get the names of all channels, along with their total views and long video count, from ytChannels, ordered by long video count in descending order and then by total views in ascending order.


```python
pd.read_sql_query("SELECT ChannelName, TotalViews, LongVideoCount FROM ytChannels ORDER BY LongVideoCount DESC, TotalViews ASC", conn)
```

| | ChannelName | TotalViews | LongVideoCount |
|---|---|---|---|
| 0 | eSportsElite | 700000000 | 1000 |
| 1 | GamingGalaxy | 800000000 | 1000 |
| 2 | GamingGenius | 900000000 | 1000 |
| 3 | eSportsEmpire | 600000000 | 900 |
| 4 | MelodyMakers | 450000000 | 800 |
| ... | ... | ... | ... |
| 64 | Wanderlust | 100000000 | 200 |
| 65 | ArtisticExpressions | 70000000 | 150 |
| 66 | BookLovers | 70000000 | 150 |
| 67 | FitnessFreak | 80000000 | 150 |
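With two sort keys, SQLite orders by the first key and uses the second only to break ties. The sketch below uses an in-memory stand-in (`demo_conn`, hypothetical) seeded with five rows from the output above, inserted in scrambled order. The three channels tied at 1000 long videos come back in ascending `TotalViews` order, matching rows 0 to 2 of the table.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, TotalViews INTEGER, LongVideoCount INTEGER)")
# Rows copied from the Task 5 output, deliberately inserted out of order
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?, ?)",
    [
        ("GamingGenius", 900000000, 1000),
        ("MelodyMakers", 450000000, 800),
        ("eSportsElite", 700000000, 1000),
        ("eSportsEmpire", 600000000, 900),
        ("GamingGalaxy", 800000000, 1000),
    ],
)

query = """
    SELECT ChannelName, TotalViews, LongVideoCount
    FROM ytChannels
    ORDER BY LongVideoCount DESC,  -- primary key: most long videos first
             TotalViews ASC        -- tie-breaker: fewer views first within a tie
"""
ordered = [name for name, _, _ in demo_conn.execute(query)]
print(ordered)
```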


# LIMIT

**Task 6**

Retrieve the top 5 channel names and their subscriber counts from ytChannels, ranked by subscriber count.

```python
pd.read_sql_query("SELECT ChannelName, Subscribers FROM ytChannels ORDER BY Subscribers DESC LIMIT 5", conn)
```

| | ChannelName | Subscribers |
|---|---|---|
| 0 | GamingGenius | 4200000 |
| 1 | GamingGalaxy | 4000000 |
| 2 | eSportsElite | 3800000 |
| 3 | eSportsEmpire | 3500000 |
| 4 | StandUpLaughs | 3200000 |
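`LIMIT` is often paired with `OFFSET` to page through a ranking. The sketch below runs on an in-memory stand-in (`demo_conn`) seeded with the five rows above. It fetches the ranking two rows at a time. The `page` helper and the page size of 2 are hypothetical choices for illustration. SQLite accepts bound `?` parameters for both `LIMIT` and `OFFSET`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, Subscribers INTEGER)")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [
        ("StandUpLaughs", 3200000),
        ("GamingGenius", 4200000),
        ("eSportsEmpire", 3500000),
        ("GamingGalaxy", 4000000),
        ("eSportsElite", 3800000),
    ],
)

def page(page_number, page_size=2):
    """Hypothetical helper: return one 1-based page of channels ranked by subscribers."""
    offset = (page_number - 1) * page_size  # number of higher-ranked rows to skip
    return demo_conn.execute(
        "SELECT ChannelName, Subscribers FROM ytChannels "
        "ORDER BY Subscribers DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

print(page(1))
print(page(2))
print(page(3))  # last page may be shorter than page_size
```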


**Task 7**

Retrieve the top 5 channel names along with their subscribers, total views, and long video count from ytChannels, ordered by total views in descending order.


```python
pd.read_sql_query("SELECT ChannelName, Subscribers, TotalViews, LongVideoCount FROM ytChannels ORDER BY TotalViews DESC LIMIT 5", conn)
```

| | ChannelName | Subscribers | TotalViews | LongVideoCount |
|---|---|---|---|---|
| 0 | GamingGenius | 4200000 | 900000000 | 1000 |
| 1 | GamingGalaxy | 4000000 | 800000000 | 1000 |
| 2 | eSportsElite | 3800000 | 700000000 | 1000 |
| 3 | eSportsEmpire | 3500000 | 600000000 | 900 |
| 4 | GamingLegends | 3000000 | 500000000 | 700 |


# WHERE

**Task 8**

Get the names of all channels from ytChannels whose category is Travel.

```python
pd.read_sql_query("SELECT ChannelName FROM ytChannels WHERE Category = 'Travel'", conn)
```

| | ChannelName |
|---|---|
| 0 | TravelVibes |
| 1 | Wanderlust |
| 2 | ExploreWorld |
| 3 | GlobalExplorer |
| 4 | WanderLuxe |
| 5 | AdventureSeekers |
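Two details matter when filtering on text. First, SQLite compares text with `=` using the case-sensitive `BINARY` collation by default, so `'travel'` does not match the stored `'Travel'`. Adding `COLLATE NOCASE` relaxes this for ASCII letters. Second, values are safer passed as `?` parameters than pasted into the SQL string; `pd.read_sql_query` accepts them through its `params` argument. A minimal sketch on an in-memory stand-in (`demo_conn`, hypothetical) with three rows from the outputs above:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, Category TEXT)")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [("TravelVibes", "Travel"), ("Wanderlust", "Travel"), ("TechGuru", "Technology")],
)

category = "travel"  # lower-case, as a user might type it

# Default BINARY collation: exact, case-sensitive match, so nothing is returned
exact = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE Category = ? ORDER BY ChannelName", (category,)
).fetchall()

# NOCASE collation folds ASCII case before comparing
relaxed = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE Category = ? COLLATE NOCASE ORDER BY ChannelName",
    (category,),
).fetchall()

print(exact)
print(relaxed)
```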


**Task 9**

Get the names of all channels, along with their joined year, from ytChannels where the joined year is greater than 2019.

```python
pd.read_sql_query("SELECT ChannelName, JoinedYear FROM ytChannels WHERE JoinedYear > 2019", conn)
```

| | ChannelName | JoinedYear |
|---|---|---|
| 0 | Bookworms | 2020 |
| 1 | BookLovers | 2021 |
| 2 | CraftyCreations | 2020 |
| 3 | DesignDuo | 2020 |
| 4 | CraftyCreatures | 2020 |
| 5 | ArtisticAvenue | 2020 |


**Task 10**

Get the names of all channels, along with their short video count and category, from ytChannels where the short video count is greater than or equal to 200.

```python
pd.read_sql_query("SELECT ChannelName, ShortVideoCount, Category FROM ytChannels WHERE ShortVideoCount >= 200", conn)
```

| | ChannelName | ShortVideoCount | Category |
|---|---|---|---|
| 0 | eSportsEmpire | 200 | Gaming |
| 1 | MovieMania | 200 | Movies |
| 2 | LaughterLounge | 200 | Comedy |
| 3 | eSportsElite | 200 | Gaming |
| 4 | HarmonyHits | 200 | Music |
| 5 | StyleStar | 200 | Fashion |
| 6 | ComedyCorner | 200 | Comedy |
| 7 | GamingGalaxy | 300 | Gaming |
| 8 | MelodyMood | 200 | Music |
| 9 | FashionFiesta | 200 | Fashion |


**Task 11**

Get the names of all channels, along with their short video count and joined year, from ytChannels where the joined year is 2019.

```python
pd.read_sql_query("SELECT ChannelName, ShortVideoCount, JoinedYear FROM ytChannels WHERE JoinedYear = 2019", conn)
```

| | ChannelName | ShortVideoCount | JoinedYear |
|---|---|---|---|
| 0 | DIYCrafts | 50 | 2019 |
| 1 | ArtisticExpressions | 50 | 2019 |
| 2 | Wanderlust | 50 | 2019 |
| 3 | HealthyEats | 50 | 2019 |
| 4 | ArtisticAdventures | 50 | 2019 |
| 5 | TasteBuds | 100 | 2019 |
| 6 | FitFusion | 50 | 2019 |
| 7 | BodyBlast | 100 | 2019 |


**Task 12**

Get the names of all channels, along with their short video count and joined year, from ytChannels where the joined year is 2019, 2020, or 2021.

```python
pd.read_sql_query("SELECT ChannelName, ShortVideoCount, JoinedYear FROM ytChannels WHERE JoinedYear IN (2019, 2020, 2021)", conn)
```

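| | ChannelName | ShortVideoCount | JoinedYear |
|---|---|---|---|
| 0 | DIYCrafts | 50 | 2019 |
| 1 | ArtisticExpressions | 50 | 2019 |
| 2 | Bookworms | 50 | 2020 |
| 3 | Wanderlust | 50 | 2019 |
| 4 | HealthyEats | 50 | 2019 |
| 5 | ArtisticAdventures | 50 | 2019 |
| 6 | BookLovers | 50 | 2021 |
| 7 | TasteBuds | 100 | 2019 |
| 8 | CraftyCreations | 100 | 2020 |
| 9 | DesignDuo | 50 | 2020 |

Only the first 10 rows of the result are shown. Other matching channels from Tasks 9 and 11, such as FitFusion and BodyBlast (2019) and CraftyCreatures and ArtisticAvenue (2020), are cut off.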
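`IN (...)` is shorthand for a chain of `=` tests joined by `OR`. The sketch below checks that both forms return the same rows. It uses an in-memory stand-in (`demo_conn`, hypothetical) with three channels from the output above plus TechGuru (2015), which both filters should exclude.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, JoinedYear INTEGER)")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [("DIYCrafts", 2019), ("Bookworms", 2020), ("BookLovers", 2021), ("TechGuru", 2015)],
)

# Membership test against a list of literal years
with_in = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE JoinedYear IN (2019, 2020, 2021) ORDER BY ChannelName"
).fetchall()

# The same condition spelled out as OR-ed equality tests
with_or = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels "
    "WHERE JoinedYear = 2019 OR JoinedYear = 2020 OR JoinedYear = 2021 ORDER BY ChannelName"
).fetchall()

print(with_in == with_or)
print(with_in)
```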


**Task 13**

Get the names of all channels, along with their short video count and play button, from ytChannels where the short video count is not equal to 100 and the play button is not Silver.

```python
pd.read_sql_query("SELECT ChannelName, ShortVideoCount, PlayButton FROM ytChannels WHERE ShortVideoCount != 100 AND PlayButton != 'Silver'", conn)
```

| | ChannelName | ShortVideoCount | PlayButton |
|---|---|---|---|
| 0 | TravelVibes | 50 | Gold |
| 1 | PetParadise | 50 | Gold |
| 2 | HistoryHaven | 50 | Gold |
| 3 | TechTips | 50 | Gold |
| 4 | WorkoutWarriors | 50 | Gold |
| 5 | eSportsEmpire | 200 | Gold |
| 6 | ArtisticAdventures | 50 | Gold |
| 7 | MovieMania | 200 | Gold |
| 8 | TechTalk | 150 | Gold |
| 9 | GlobalExplorer | 150 | Gold |
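`!=` never matches a `NULL`. In SQL, `NULL != 'Silver'` evaluates to `NULL`, which `WHERE` treats as false. The rows shown in this notebook all have Gold or Silver, but if a channel without a play button were stored with a `NULL` `PlayButton`, Task 13's filter would silently drop it. The sketch below demonstrates this with a hypothetical `NewChannel` row on an in-memory stand-in (`demo_conn`). It also shows SQLite's `IS NOT` operator, which treats `NULL` as an ordinary value.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, PlayButton TEXT)")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [
        ("TravelVibes", "Gold"),
        ("CookingMaster", "Silver"),
        ("NewChannel", None),  # hypothetical channel with no play button yet
    ],
)

# NULL != 'Silver' yields NULL, so the NULL row is filtered out
plain = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE PlayButton != 'Silver' ORDER BY ChannelName"
).fetchall()

# IS NOT is null-safe: NULL IS NOT 'Silver' is true, so the NULL row is kept
null_safe = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE PlayButton IS NOT 'Silver' ORDER BY ChannelName"
).fetchall()

print(plain)
print(null_safe)
```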


**Task 14**

Get the names of all channels, along with their short video count, joined year, and play button status, from ytChannels where the channel joined in 2021, 2020, or 2019 and its play button status is not Gold.

```python
pd.read_sql_query("SELECT ChannelName, ShortVideoCount, JoinedYear, PlayButton FROM ytChannels WHERE JoinedYear IN (2021, 2020, 2019) AND PlayButton != 'Gold'", conn)
```

| | ChannelName | ShortVideoCount | JoinedYear | PlayButton |
|---|---|---|---|---|
| 0 | DIYCrafts | 50 | 2019 | Silver |
| 1 | ArtisticExpressions | 50 | 2019 | Silver |
| 2 | Bookworms | 50 | 2020 | Silver |
| 3 | Wanderlust | 50 | 2019 | Silver |
| 4 | HealthyEats | 50 | 2019 | Silver |
| 5 | BookLovers | 50 | 2021 | Silver |
| 6 | DesignDuo | 50 | 2020 | Silver |


**Task 15**

Get the channel name, category, subscriber count, and joined year from ytChannels for channels in the Technology category that joined between 2012 and 2018.

```python
pd.read_sql_query("SELECT ChannelName, Category, Subscribers, JoinedYear FROM ytChannels WHERE Category = 'Technology' AND JoinedYear BETWEEN 2012 AND 2018", conn)
```

| | ChannelName | Category | Subscribers | JoinedYear |
|---|---|---|---|---|
| 0 | TechGuru | Technology | 1200000 | 2015 |
| 1 | TechTalks | Technology | 1600000 | 2016 |
| 2 | TechTips | Technology | 1100000 | 2017 |
| 3 | TechTalk | Technology | 1700000 | 2017 |
| 4 | TechTrends | Technology | 1300000 | 2016 |
| 5 | TechTutorials | Technology | 1400000 | 2016 |
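`BETWEEN low AND high` is inclusive at both ends, so it is equivalent to `x >= low AND x <= high`. The sketch below checks this on an in-memory stand-in (`demo_conn`). The channel names are hypothetical, with years placed on and just outside the 2012 and 2018 boundaries.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, JoinedYear INTEGER)")
# Hypothetical channels placed on and just outside the 2012/2018 boundaries
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [("Edge2011", 2011), ("Edge2012", 2012), ("Mid2015", 2015), ("Edge2018", 2018), ("Edge2019", 2019)],
)

between = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE JoinedYear BETWEEN 2012 AND 2018 ORDER BY JoinedYear"
).fetchall()

# Explicit inclusive range: both endpoints use >= / <=
explicit = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels WHERE JoinedYear >= 2012 AND JoinedYear <= 2018 ORDER BY JoinedYear"
).fetchall()

print(between)
print(between == explicit)
```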


# Comprehensive Queries

**Task 16**

The manager wants a list of the names and total views of channels that joined in 2015 and have received the Gold Play Button, ordered by total views in descending order.

```python
pd.read_sql_query("SELECT ChannelName, TotalViews FROM ytChannels WHERE JoinedYear = 2015 AND PlayButton = 'Gold' ORDER BY TotalViews DESC", conn)
```

| | ChannelName | TotalViews |
|---|---|---|
| 0 | MusicMania | 400000000 |
| 1 | StandUpLaughs | 400000000 |
| 2 | MovieMania | 300000000 |
| 3 | StyleStar | 250000000 |
| 4 | FashionFiesta | 250000000 |
| 5 | AdventureSeekers | 250000000 |
| 6 | WanderLuxe | 220000000 |
| 7 | FashionForward | 200000000 |
| 8 | TechGuru | 150000000 |


**Task 17**

The manager wants the top 5 channels by total views, showing channel name, total views, and joined year, from ytChannels. Include only channels that have received the Gold Play Button and joined between 2019 and 2021.

```python
pd.read_sql_query("SELECT ChannelName, TotalViews, JoinedYear FROM ytChannels WHERE PlayButton = 'Gold' AND JoinedYear BETWEEN 2019 AND 2021 ORDER BY TotalViews DESC LIMIT 5", conn)
```

| | ChannelName | TotalViews | JoinedYear |
|---|---|---|---|
| 0 | TasteBuds | 150000000 | 2019 |
| 1 | BodyBlast | 110000000 | 2019 |
| 2 | ArtisticAdventures | 100000000 | 2019 |
| 3 | CraftyCreations | 100000000 | 2020 |
| 4 | CraftyCreatures | 100000000 | 2020 |
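Three of the rows above share 100,000,000 views. SQL does not define the relative order of rows that tie on every `ORDER BY` key. So if a tie extended past the `LIMIT` cut-off, which tied channel made the list would not be guaranteed. Adding a secondary key such as `ChannelName` makes the cut repeatable. The sketch below uses hypothetical channels on an in-memory stand-in (`demo_conn`) with `LIMIT 2` cutting through a three-way tie.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE ytChannels (ChannelName TEXT, TotalViews INTEGER)")
# Hypothetical rows: three channels tie on TotalViews
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?)",
    [("ChannelC", 100000000), ("TopChannel", 150000000), ("ChannelA", 100000000), ("ChannelB", 100000000)],
)

# Only TotalViews is sorted, so which tied row fills the second slot is unspecified
ambiguous = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels ORDER BY TotalViews DESC LIMIT 2"
).fetchall()

# ChannelName as a tie-breaker makes the result deterministic
deterministic = demo_conn.execute(
    "SELECT ChannelName FROM ytChannels ORDER BY TotalViews DESC, ChannelName ASC LIMIT 2"
).fetchall()

print(ambiguous)
print(deterministic)
```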


**Task 18**

The manager wants the top 3 Technology channels by subscriber count, with their subscriber counts and joined years, from ytChannels. Include only channels that did not join between 2012 and 2018.

```python
pd.read_sql_query("SELECT ChannelName, Subscribers, JoinedYear FROM ytChannels WHERE Category = 'Technology' AND JoinedYear NOT BETWEEN 2012 AND 2018 ORDER BY Subscribers DESC LIMIT 3", conn)
```

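| | ChannelName | Subscribers | JoinedYear |
|---|---|---|---|

The query returns no rows: no Technology channel in the dataset joined outside the 2012 to 2018 range.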
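`NOT BETWEEN 2012 AND 2018` keeps years strictly below 2012 or strictly above 2018. When nothing qualifies, as here, `fetchall()` returns an empty list (and `pd.read_sql_query` returns an empty DataFrame). Checking for that before using the result avoids indexing errors. The sketch below seeds an in-memory stand-in (`demo_conn`) with two Technology rows from the Task 15 output, all inside the range. It then adds a hypothetical `TechNext` channel from 2020 to show the non-empty case.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE ytChannels (ChannelName TEXT, Category TEXT, Subscribers INTEGER, JoinedYear INTEGER)"
)
# Technology rows copied from the Task 15 output: both joined between 2012 and 2018
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?, ?, ?)",
    [("TechGuru", "Technology", 1200000, 2015), ("TechTalk", "Technology", 1700000, 2017)],
)

query = (
    "SELECT ChannelName, Subscribers, JoinedYear FROM ytChannels "
    "WHERE Category = 'Technology' AND JoinedYear NOT BETWEEN 2012 AND 2018 "
    "ORDER BY Subscribers DESC LIMIT 3"
)

before = demo_conn.execute(query).fetchall()
if not before:  # an empty list means no channel qualified
    print("No Technology channels joined outside 2012-2018")

# Hypothetical newer channel, added only to show a non-empty result
demo_conn.execute("INSERT INTO ytChannels VALUES ('TechNext', 'Technology', 900000, 2020)")
after = demo_conn.execute(query).fetchall()
print(after)
```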


**Task 19**

The manager wants the top 5 channels in the Comedy, Fitness, or Travel categories that joined after 2014, with their category and play button status. Rank them by long video count in descending order, and include only channels that have uploaded more long videos than short videos.

```python
pd.read_sql_query("SELECT ChannelName, Category, PlayButton FROM ytChannels WHERE Category IN ('Comedy', 'Fitness', 'Travel') AND JoinedYear > 2014 AND LongVideoCount > ShortVideoCount ORDER BY LongVideoCount DESC LIMIT 5", conn)
```

| | ChannelName | Category | PlayButton |
|---|---|---|---|
| 0 | StandUpLaughs | Comedy | Gold |
| 1 | ExploreWorld | Travel | Gold |
| 2 | GlobalExplorer | Travel | Gold |
| 3 | AdventureSeekers | Travel | Gold |
| 4 | WanderLuxe | Travel | Gold |
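The condition `LongVideoCount > ShortVideoCount` compares two columns of the same row rather than a column with a constant. Returning the sort key and the difference alongside the requested columns makes the filter and ordering easy to verify. The `LongMinusShort` alias is a hypothetical, illustrative column. The sketch below runs on an in-memory stand-in (`demo_conn`) seeded with five rows from the Task 1 output, plus a hypothetical `ShortsFirst` channel that has more shorts than long videos. ComedyCentral is excluded because it joined in 2014, not after.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""
    CREATE TABLE ytChannels (
        ChannelName TEXT, Category TEXT, LongVideoCount INTEGER,
        ShortVideoCount INTEGER, JoinedYear INTEGER, PlayButton TEXT
    )
""")
demo_conn.executemany(
    "INSERT INTO ytChannels VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Rows copied from the Task 1 output
        ("TravelVibes", "Travel", 350, 50, 2016, "Gold"),
        ("FitnessFreak", "Fitness", 150, 50, 2018, "Silver"),
        ("ComedyCentral", "Comedy", 500, 100, 2014, "Gold"),
        ("AdventureSeekers", "Travel", 500, 150, 2015, "Gold"),
        ("BodyBlast", "Fitness", 250, 100, 2019, "Gold"),
        # Hypothetical channel with more shorts than long videos
        ("ShortsFirst", "Comedy", 40, 300, 2020, "Silver"),
    ],
)

query = """
    SELECT ChannelName, Category, PlayButton,
           LongVideoCount, ShortVideoCount,
           LongVideoCount - ShortVideoCount AS LongMinusShort  -- illustrative extra column
    FROM ytChannels
    WHERE Category IN ('Comedy', 'Fitness', 'Travel')
      AND JoinedYear > 2014
      AND LongVideoCount > ShortVideoCount  -- compares two columns of the same row
    ORDER BY LongVideoCount DESC
    LIMIT 5
"""
results = demo_conn.execute(query).fetchall()
for row in results:
    print(row)
```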
