# Index

## I. Clean & Format Data 

### A. Get Basic Info  

### B. Preliminary Clean & Formatting  
   1. Removing duplicates
   2. Convert Published At to datetime
   3. Removing videos from August
   4. Formatting Duration values to hh:mm:ss format
   5. Calculate Video age
   6. Create views/month metric to normalize for different ages of videos
   7. Create likes/month
   8. Add Category
   9. Deal with Nulls

In [2]:
#import libraries
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import isodate
from datetime import datetime, timedelta
import pytz

# I. Clean & Investigate Data

In [3]:
data = pd.read_csv('../data/asmr_videos_2024-09-08_23-21-47.csv')

## A. Get Basic Info

In [4]:
data.head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10T22:00:19Z,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,5668987,84955,0,0,2588,False,False
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13T19:45:00Z,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25T21:00:11Z,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,3337304,111716,0,0,3530,False,False
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10T20:00:05Z,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,133462,3692,0,0,129,False,False
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25T22:00:15Z,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,238233,2936,0,0,124,False,False


In [5]:
data.shape

(4641, 17)

In [6]:
#4641 records, 17 cols

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4641 entries, 0 to 4640
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Video URL          4641 non-null   object
 1   Title              4641 non-null   object
 2   Description        3072 non-null   object
 3   Published At       4641 non-null   object
 4   Channel ID         4641 non-null   object
 5   Channel Title      4641 non-null   object
 6   Tags               2342 non-null   object
 7   Category ID        4641 non-null   int64 
 8   Default Language   4641 non-null   object
 9   Duration           4641 non-null   object
 10  View Count         4641 non-null   int64 
 11  Like Count         4641 non-null   int64 
 12  Dislike Count      4641 non-null   int64 
 13  Favorite Count     4641 non-null   int64 
 14  Comment Count      4641 non-null   int64 
 15  Comments Disabled  4641 non-null   bool  
 16  Ratings Disabled   4641 non-null   bool  


In [8]:
#Description and Tags contain nulls (3072 and 2342 non-null values out of 4641)

In [9]:
data.describe()

Unnamed: 0,Category ID,View Count,Like Count,Dislike Count,Favorite Count,Comment Count
count,4641.0,4641.0,4641.0,4641.0,4641.0,4641.0
mean,22.310709,10811550.0,272556.4,0.0,0.0,1812.372549
std,4.347957,33980190.0,812391.7,0.0,0.0,6758.803013
min,1.0,0.0,0.0,0.0,0.0,0.0
25%,22.0,445243.0,9261.0,0.0,0.0,211.0
50%,24.0,1490365.0,35800.0,0.0,0.0,547.0
75%,24.0,6611329.0,184553.0,0.0,0.0,1449.0
max,30.0,709711600.0,17146520.0,0.0,0.0,235672.0


In [10]:
#can remove Dislike Count and Favorite Count (both are always 0)
#there are videos with 0 views and 0 likes
#need to decide what to do with those

#do I have to revisit this after removing records?

#??

## B. Preliminary Clean & Formatting

This section consists of:
   1. Removing duplicates
   2. Convert Published At to datetime
   3. Removing videos from August
   4. Formatting Duration values to hh:mm:ss format
   5. Calculate Video age
   6. Create views/month metric to normalize for different ages of videos
   7. Create likes/month
   8. Add Category
   9. Deal with Nulls

### 1. Removing Duplicates

In [11]:
#checking if these are true duplicates
data[data.duplicated(keep=False)].sort_values(by='Video URL').head(10)

#614 rows are flagged (keep=False marks every copy), so we want to keep roughly half of these

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled
3349,https://www.youtube.com/watch?v=-Ma53y5MdpE,ASMR - Tingly Tapping & Scratching With Long N...,"Hi guys, welcome back! Today I'm tapping with ...",2024-02-25T18:00:00Z,UCJyZfWrqaGX4nwXGKOEdM6Q,Nanou ASMR,"Nanou Philips, Nanou, ASMR, Dutch, Vlaams, Ned...",24,en,PT29M25S,433410,9820,0,0,378,False,False
3298,https://www.youtube.com/watch?v=-Ma53y5MdpE,ASMR - Tingly Tapping & Scratching With Long N...,"Hi guys, welcome back! Today I'm tapping with ...",2024-02-25T18:00:00Z,UCJyZfWrqaGX4nwXGKOEdM6Q,Nanou ASMR,"Nanou Philips, Nanou, ASMR, Dutch, Vlaams, Ned...",24,en,PT29M25S,433410,9820,0,0,378,False,False
256,https://www.youtube.com/watch?v=-jWwk2dClEo,SCRUB DADDY vs. LIQUID NITROGEN #shorts #asmr ...,Today we are looking at how a scrub daddy reac...,2022-10-10T13:30:34Z,UC6rrBCCDvrDFcOJHifruL5A,Tommy Technetium,,28,en,PT51S,22648474,1030810,0,0,13123,False,False
238,https://www.youtube.com/watch?v=-jWwk2dClEo,SCRUB DADDY vs. LIQUID NITROGEN #shorts #asmr ...,Today we are looking at how a scrub daddy reac...,2022-10-10T13:30:34Z,UC6rrBCCDvrDFcOJHifruL5A,Tommy Technetium,,28,en,PT51S,22648474,1030810,0,0,13123,False,False
4389,https://www.youtube.com/watch?v=-kJkIKQ0qX0,ASMR Teaching You Basic Chinese,Hello my babies! Today I am going to be teachi...,2024-07-21T14:45:00Z,UCcxQnPU48Mi6uHqoPY9MI4g,Lin ASMR,,22,en,PT32M44S,415120,12102,0,0,566,False,False
4393,https://www.youtube.com/watch?v=-kJkIKQ0qX0,ASMR Teaching You Basic Chinese,Hello my babies! Today I am going to be teachi...,2024-07-21T14:45:00Z,UCcxQnPU48Mi6uHqoPY9MI4g,Lin ASMR,,22,en,PT32M44S,415120,12102,0,0,566,False,False
2044,https://www.youtube.com/watch?v=0I3xiJXrVT4,Асмр челедж!Разные звуки!#shorts,#shorts,2023-07-04T11:15:54Z,UCNkKoabH2dd6dbjXkYohe-Q,AMAX,,20,en,PT15S,791,0,0,0,0,False,False
2051,https://www.youtube.com/watch?v=0I3xiJXrVT4,Асмр челедж!Разные звуки!#shorts,#shorts,2023-07-04T11:15:54Z,UCNkKoabH2dd6dbjXkYohe-Q,AMAX,,20,en,PT15S,791,0,0,0,0,False,False
1047,https://www.youtube.com/watch?v=0hlGu7DrP7I,Instant relaxation 💆🏼‍♀️ #asmr,,2023-02-07T19:00:01Z,UCdvYSTbhmzWgWyfGnhet03Q,Diddly ASMR,ASMR,24,en,PT32S,1887512,66022,0,0,608,False,False
1051,https://www.youtube.com/watch?v=0hlGu7DrP7I,Instant relaxation 💆🏼‍♀️ #asmr,,2023-02-07T19:00:01Z,UCdvYSTbhmzWgWyfGnhet03Q,Diddly ASMR,ASMR,24,en,PT32S,1887512,66022,0,0,608,False,False


In [12]:
#drop duplicates
data2 = data.drop_duplicates()

In [13]:
#check
data2.shape

#looks good: 308 duplicate rows removed (4641 -> 4333)

(4333, 17)
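
The relationship between the rows flagged by `duplicated(keep=False)` and the rows removed by `drop_duplicates()` can be checked on a small example. This is a minimal sketch on a made-up frame (`toy` and its values are hypothetical, not taken from the dataset). It shows that when a row appears three times, the flagged count is not exactly twice the dropped count.

```python
import pandas as pd

# Hypothetical toy frame: 'a' appears twice, 'b' three times, 'c' once
toy = pd.DataFrame({'Video URL': ['a', 'a', 'b', 'b', 'b', 'c'],
                    'View Count': [1, 1, 2, 2, 2, 3]})

flagged = int(toy.duplicated(keep=False).sum())  # every copy of a repeated row
dropped = int(toy.duplicated().sum())            # copies after the first one
deduped = toy.drop_duplicates()                  # keeps the first copy of each row

print(flagged, dropped, len(deduped))
```

Comparing `len(toy) - dropped` with `len(deduped)` is a quick way to confirm that only repeated copies were removed.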

### 2. Converting Published At to Datetime

In [14]:
data2['Published At'] = data2['Published At'].apply(pd.to_datetime)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data2['Published At'] = data2['Published At'].apply(pd.to_datetime)
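
The warning above appears because pandas cannot tell whether `data2` is independent of `data`. One common way to avoid it is to take an explicit `.copy()` before assigning new columns, and `pd.to_datetime` can convert the whole column in one call. A minimal sketch on hypothetical toy data (`raw` and `clean` are illustrative names, not the notebook's variables):

```python
import pandas as pd

# Hypothetical toy data standing in for the raw frame
raw = pd.DataFrame({'Published At': ['2022-09-10T22:00:19Z',
                                     '2022-09-10T22:00:19Z',
                                     '2024-07-21T14:45:00Z']})

# Explicit copy: later column assignments clearly target this new frame
clean = raw.drop_duplicates().copy()

# Vectorized conversion of the whole column; utc=True makes the time zone explicit
clean['Published At'] = pd.to_datetime(clean['Published At'], utc=True)

print(clean.dtypes)
```

The same pattern would apply to the later filtered frames that receive new columns.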


In [15]:
#check
data2.info()

#looks good
#the source strings end in 'Z', so they are parsed as UTC (dtype datetime64[ns, UTC])

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4333 entries, 0 to 4640
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype              
---  ------             --------------  -----              
 0   Video URL          4333 non-null   object             
 1   Title              4333 non-null   object             
 2   Description        2847 non-null   object             
 3   Published At       4333 non-null   datetime64[ns, UTC]
 4   Channel ID         4333 non-null   object             
 5   Channel Title      4333 non-null   object             
 6   Tags               2166 non-null   object             
 7   Category ID        4333 non-null   int64              
 8   Default Language   4333 non-null   object             
 9   Duration           4333 non-null   object             
 10  View Count         4333 non-null   int64              
 11  Like Count         4333 non-null   int64              
 12  Dislike Count      4333 non-null   int64        

### 3. Removing August Videos

In [16]:
data3 = data2[data2['Published At'].dt.month != 8]

In [17]:
#check
data3[data3['Published At'].dt.month == 8]

#checks out

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled
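
A month-based filter and its check can be sketched as follows, assuming a hypothetical toy frame. The check has to go through `.dt.month`, because comparing the datetime column itself to the integer 8 does not select August rows. Note that a month filter of this kind drops August of every year, not only the most recent one.

```python
import pandas as pd

# Hypothetical toy frame with one August video in each of two years
toy = pd.DataFrame({'Published At': pd.to_datetime(
    ['2023-08-15T10:00:00Z', '2024-08-01T00:00:00Z', '2024-07-31T23:59:59Z'],
    utc=True)})

is_august = toy['Published At'].dt.month == 8   # True for August of any year
kept = toy[~is_august]

# The check uses the same month accessor as the filter
remaining_august = int((kept['Published At'].dt.month == 8).sum())
print(len(kept), remaining_august)
```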


### 4. Format Duration

In [18]:
#function to convert duration to hh:mm:ss format
def convert_duration_to_hhmmss(duration_str):
    duration = isodate.parse_duration(duration_str)
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"
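
For readers without `isodate` installed, the same conversion can be sketched with the standard library. This is an illustration, not the notebook's method: the helper `iso_duration_to_hhmmss` and its regular expression are hypothetical and only cover the `P#DT#H#M#S` shapes (no weeks, months, years, or fractional seconds).

```python
import re

# Assumption: durations look like PT#H#M#S, optionally with a day part (P#DT...)
_DURATION_RE = re.compile(
    r'^P(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$')

def iso_duration_to_hhmmss(duration_str):
    """Hypothetical stdlib-only helper: 'PT1H12M56S' -> '01:12:56'."""
    match = _DURATION_RE.match(duration_str)
    if match is None:
        raise ValueError(f"unsupported duration: {duration_str!r}")
    # Missing components (for example no hours in 'PT59M') count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    total_seconds = (parts['days'] * 86400 + parts['hours'] * 3600
                     + parts['minutes'] * 60 + parts['seconds'])
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"

for sample in ['PT1H12M56S', 'PT59M', 'PT51S']:
    print(sample, iso_duration_to_hhmmss(sample))
```

Like the function above, this reports durations longer than a day as hours above 23 rather than adding a day field.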

In [19]:
#apply duration converter to Duration column
data3['Duration Time Format'] = data3['Duration'].apply(convert_duration_to_hhmmss)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data3['Duration Time Format'] = data3['Duration'].apply(convert_duration_to_hhmmss)


In [20]:
#check
data3.head()

#looks good

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10 22:00:19+00:00,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,5668987,84955,0,0,2588,False,False,01:12:56
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False,00:26:03
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25 21:00:11+00:00,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,3337304,111716,0,0,3530,False,False,00:59:00
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10 20:00:05+00:00,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,133462,3692,0,0,129,False,False,00:28:25
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25 22:00:15+00:00,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,238233,2936,0,0,124,False,False,01:14:13


### 5. Calculate Video Age

In [21]:
#set query date
#update this to the date of each new data pull
query_date = datetime(2024, 9, 8, tzinfo=pytz.UTC)

In [22]:
data3['Video Age'] = (query_date - data3['Published At']).dt.days

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data3['Video Age'] = (query_date - data3['Published At']).dt.days
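
The age calculation can be traced by hand for the first row of the dataset. This sketch uses only the standard library; `published` is copied from that row and `query_date` matches the cell above. Because `.days` keeps whole days only and the query date is set to midnight, the age may come out one day lower than it would with the exact pull time (the file name suggests the pull happened late on that day).

```python
from datetime import datetime, timezone

query_date = datetime(2024, 9, 8, tzinfo=timezone.utc)   # midnight UTC on the query day

# First row of the dataset: published 2022-09-10T22:00:19Z
published = datetime.fromisoformat('2022-09-10T22:00:19+00:00')

age = query_date - published
print(age.days)   # whole days only; the leftover hours are discarded
```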


In [23]:
data3 = data3.rename(columns={'Video Age': 'Video Age Days'})

In [24]:
data3.head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10 22:00:19+00:00,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,5668987,84955,0,0,2588,False,False,01:12:56,728
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False,00:26:03,725
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25 21:00:11+00:00,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,3337304,111716,0,0,3530,False,False,00:59:00,713
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10 20:00:05+00:00,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,133462,3692,0,0,129,False,False,00:28:25,728
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25 22:00:15+00:00,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,238233,2936,0,0,124,False,False,01:14:13,713


### 6. Create views/month metric

In [25]:
#to calc views per month, take views / (age of video / (365/12))

data3['Views per Month'] = data3['View Count'] / (data3['Video Age Days'] / (365/12))
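
The formula can be checked by hand for the first row. The helper `per_month` below is hypothetical and adds one guard the notebook cell does not have: it returns `None` instead of dividing by zero when a video is zero days old.

```python
DAYS_PER_MONTH = 365 / 12   # average month length used in the notebook (about 30.42 days)

def per_month(count, age_days):
    """Hypothetical helper: average count per month of age, or None when age_days <= 0."""
    if age_days <= 0:
        return None   # avoids dividing by zero for videos published on the query day
    return count / (age_days / DAYS_PER_MONTH)

# First row of the dataset: 5,668,987 views and 84,955 likes at 728 days old
print(round(per_month(5668987, 728), 1))
print(round(per_month(84955, 728), 1))
```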

In [26]:
#check
data3.head(30)

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10 22:00:19+00:00,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,5668987,84955,0,0,2588,False,False,01:12:56,728,236856.7
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False,00:26:03,725,82798.19
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25 21:00:11+00:00,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,3337304,111716,0,0,3530,False,False,00:59:00,713,142369.8
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10 20:00:05+00:00,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,133462,3692,0,0,129,False,False,00:28:25,728,5576.194
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25 22:00:15+00:00,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,238233,2936,0,0,124,False,False,01:14:13,713,10163.05
5,https://www.youtube.com/watch?v=fnknG3VdZRM,Unintelligible Whispers Collection (+ new scen...,Good evening ! Tonight let's relax with a ting...,2022-09-15 18:15:23+00:00,UCftD_LCuDAwlPipnM6Uikqw,Moonlight Cottage ASMR,"asmr, unintelligible whispers, roleplays, fant...",24,en,PT50M24S,1645606,32552,0,0,792,False,False,00:50:24,723,69230.77
6,https://www.youtube.com/watch?v=Ee4laybja4M,ASMR | 👽 Alien Uses You As Classroom Visual Ai...,Always listening to your requests! Hope you en...,2022-09-22 20:30:00+00:00,UCn8vv6lF-BYyTr5m3qa9vDg,The White Rabbit ASMR,"asmr, the white rabbit asmr, thewhiterabbitasm...",24,en,PT35M55S,700737,16171,0,0,650,False,False,00:35:55,716,29768.27
7,https://www.youtube.com/watch?v=s2t16zuy8o0,ASMR Luxury Watch Shop | Personal Attention fo...,I'm so happy to see you. Thank you for coming ...,2022-09-18 22:32:17+00:00,UCrrCVjyxCToOC4C9k52b7MQ,Matty Tingles,"asmr, mattytingles, relaxing, asmr watch shop,...",22,en,PT28M40S,505251,11584,0,0,358,False,False,00:28:40,720,21344.52
8,https://www.youtube.com/watch?v=jalPaPGoZQQ,ASMR Aesthetic Journaling 🧡 Orange Theme #shor...,ASMR Aesthetic Journaling 🧡 Orange Theme #shor...,2022-09-16 04:49:18+00:00,UCWBYtp1F0-6cjMaWykmBvyA,The Crafty Lefty,"asmr, asmr for sleep, asmr journal, asmr sleep...",22,en,PT1M,46356176,1838202,0,0,5946,False,False,00:01:00,722,1952909.0
9,https://www.youtube.com/watch?v=hQ5EjzS34j4,ASMR | 👽 Alien Uses You As Classroom Visual Ai...,You guys asked for it... so here it is! Someth...,2022-09-09 20:30:02+00:00,UCn8vv6lF-BYyTr5m3qa9vDg,The White Rabbit ASMR,"asmr, the white rabbit asmr, thewhiterabbitasm...",24,en,PT22M46S,637911,18504,0,0,527,False,False,00:22:46,729,26616.09


### 7. Create Likes / Month

In [27]:
data3['Likes per Month'] = data3['Like Count'] / (data3['Video Age Days'] / (365/12))

In [28]:
data3.head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,...,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10 22:00:19+00:00,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,...,84955,0,0,2588,False,False,01:12:56,728,236856.714171,3549.516369
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,...,27525,0,0,984,False,False,00:26:03,725,82798.194253,1154.784483
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25 21:00:11+00:00,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,...,111716,0,0,3530,False,False,00:59:00,713,142369.794296,4765.818139
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10 20:00:05+00:00,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,...,3692,0,0,129,False,False,00:28:25,728,5576.19391,154.255952
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25 22:00:15+00:00,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,...,2936,0,0,124,False,False,01:14:13,713,10163.048738,125.250117


### 8. Add in Category

In [29]:
categories = pd.read_csv('../data/youtube_categories.csv')

In [30]:
categories.head()

Unnamed: 0,id,snippet.title
0,1,Film & Animation
1,2,Autos & Vehicles
2,10,Music
3,15,Pets & Animals
4,17,Sports


In [31]:
data4 = pd.merge(data3,categories,how='left',left_on='Category ID', right_on='id').drop(columns=['id'])
data4.head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,...,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,snippet.title
0,https://www.youtube.com/watch?v=WBeMRU1Tbgs,The Perfect ASMR Video,Well... that's up for you to decide!! How did ...,2022-09-10 22:00:19+00:00,UCE6acMV3m35znLcf0JGNn7Q,Gibi ASMR,"gibi, asmr, gibi asmr, perfect, video, for sle...",24,en,PT1H12M56S,...,0,0,2588,False,False,01:12:56,728,236856.714171,3549.516369,Entertainment
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,...,0,0,984,False,False,00:26:03,725,82798.194253,1154.784483,People & Blogs
2,https://www.youtube.com/watch?v=BNxAGgvb60w,The Tingle Writer 🖋️ASMR (Cinematic Roleplay),"The Tingle Writer, an #ASMR #Cinematic #Rolepl...",2022-09-25 21:00:11+00:00,UC4d18IlLmw0utmVxIjSadLQ,Made In France ASMR,"asmr, sleep, binaural, satisfying, tingles, tr...",24,en,PT59M,...,0,0,3530,False,False,00:59:00,713,142369.794296,4765.818139,Entertainment
3,https://www.youtube.com/watch?v=UPn3GAzLwEw,Welcome Back Questionnaire (Dystopian ASMR),This is probably one of the most non-event ASM...,2022-09-10 20:00:05+00:00,UC4eO8gplCQQqD8yvuey1TxQ,Jimち ASMR,"Asmr, asmr for sleep, relaxing sounds, jim chi...",26,en,PT28M25S,...,0,0,129,False,False,00:28:25,728,5576.19391,154.255952,Howto & Style
4,https://www.youtube.com/watch?v=fMIAKg68tMA,1 Hour Of ASMR Tingles For Deep Sleep,My longest video and biggest trigger assortmen...,2022-09-25 22:00:15+00:00,UCM5z4re0CofPJJTp1Uocb9Q,Safe Space ASMR,"1 hour asmr, one hour of asmr, one hour of asm...",22,en,PT1H14M13S,...,0,0,124,False,False,01:14:13,713,10163.048738,125.250117,People & Blogs
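
A left merge keeps every video row, but it can silently add rows if the lookup table repeats an `id`, and it leaves the category empty for IDs missing from the lookup. A minimal sketch with hypothetical toy frames showing how `validate` and a null count could guard against both cases:

```python
import pandas as pd

# Hypothetical toy frames mirroring the column names used above
videos = pd.DataFrame({'Title': ['v1', 'v2', 'v3'], 'Category ID': [24, 22, 99]})
lookup = pd.DataFrame({'id': [22, 24],
                       'snippet.title': ['People & Blogs', 'Entertainment']})

merged = (
    pd.merge(videos, lookup, how='left', left_on='Category ID', right_on='id',
             validate='many_to_one')   # raises if an id repeats in the lookup
    .drop(columns=['id'])
    .rename(columns={'snippet.title': 'Category'})
)

unmatched = int(merged['Category'].isna().sum())   # Category IDs missing from the lookup
print(len(merged), unmatched)
```

In the real data the null count for `Category` is 0, so every Category ID found a match.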


In [32]:
data4 = data4.rename(columns={'snippet.title': 'Category'})

**Note:** Formatting steps 1-8 are complete. The last step, dealing with nulls, follows; after that we can move on to preliminary EDA to see what else to clean and format before model pre-processing.

### 9. Deal with Nulls

In [33]:
data4.isnull().sum()

Video URL                  0
Title                      0
Description             1339
Published At               0
Channel ID                 0
Channel Title              0
Tags                    1966
Category ID                0
Default Language           0
Duration                   0
View Count                 0
Like Count                 0
Dislike Count              0
Favorite Count             0
Comment Count              0
Comments Disabled          0
Ratings Disabled           0
Duration Time Format       0
Video Age Days             0
Views per Month            0
Likes per Month            0
Category                   0
dtype: int64

##### Decision

I do not want to remove nulls in my dataset as I believe the absence of a description and tags is a feature. Instead, I will:  
- Create a flag that indicates whether a video has no Description or no Tags
- Replace nulls with three spaces and verify
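
The two steps above can be sketched together on a hypothetical toy frame, using the three-space placeholder that the later cells apply. The order matters: the flag has to be created while the nulls are still present.

```python
import pandas as pd

# Hypothetical toy frame with one missing description and one missing tag list
toy = pd.DataFrame({'Description': ['hello', None, 'world'],
                    'Tags': [None, 'asmr', 'sleep']})

PLACEHOLDER = '   '   # three spaces

for col in ['Description', 'Tags']:
    # Flag first, while the nulls are still present, then fill
    toy[f'No {col}'] = toy[col].isna().astype(int)
    toy[col] = toy[col].fillna(PLACEHOLDER)

# Verify: every flagged row holds the placeholder, and no nulls remain
assert (toy['Description'] == PLACEHOLDER).sum() == toy['No Description'].sum()
assert toy[['Description', 'Tags']].isna().sum().sum() == 0
print(toy)
```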

#### Create "No Description" Flag

In [34]:
#create a new column "No Description" that is 1 if a video has no description, else 0
data4['No Description'] = data4['Description'].apply(lambda x: 1 if pd.isnull(x) else 0)

In [35]:
#check
data4[data4['Description'].isnull()].head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,...,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description
29,https://www.youtube.com/watch?v=ojRt9WHYRps,Awesome Indoor Cycling Set Up ASMR,,2022-09-17 02:54:34+00:00,UCwXicok1yKCwDWJRkRbJ4KQ,Mackenzie William,,22,en,PT29S,...,0,3539,False,False,00:00:29,721,3375555.0,94595.495839,People & Blogs,1
33,https://www.youtube.com/watch?v=tyrgZ3APBCI,Was This Worth The Price | ASMR,,2022-09-12 15:09:16+00:00,UCgaKAH-PAvm2wlBk3XbsRLQ,Ivan McCombs,,24,en,PT43S,...,0,643,False,False,00:00:43,726,126355.1,6290.258838,Entertainment,1
38,https://www.youtube.com/watch?v=kJNf3qkxllo,Retro Computer ASMR: IMac G3 booting up #apple...,,2022-09-23 13:04:16+00:00,UClt2R3HOLqBP2YleFxcc6Rw,Ashton’s Retro Computer Room,,22,en,PT16S,...,0,527,False,False,00:00:16,715,33546.82,1211.136364,People & Blogs,1
43,https://www.youtube.com/watch?v=bBw1GFwmiIM,♥️💄♥️💄#makeup #makeuptutorial #asmr #satisfying,,2022-09-11 17:00:14+00:00,UCOcBePwvcGYRxSGzfZ8zlVQ,Nursema,,27,en,PT34S,...,0,3512,False,False,00:00:34,727,4508784.0,0.0,Education,1
45,https://www.youtube.com/watch?v=-iB8vD2iyMs,ASMR Паровая терапия 2.0,,2022-09-09 14:40:03+00:00,UCyWamphhnZTfApacvhLXWkg,Barberry ASMR,,24,en,PT12S,...,0,9507,False,False,00:00:12,729,70266.21,1615.129172,Entertainment,1


#### Create "No Tags" Flag

In [36]:
#create a new column "No Tags" that is 1 if a video has no tags, else 0
data4['No Tags'] = data4['Tags'].apply(lambda x: 1 if pd.isnull(x) else 0)

In [37]:
#check
pd.set_option('display.max_columns', None)
data4[data4['Tags'].isnull()].head()

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description,No Tags
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False,00:26:03,725,82798.19,1154.784483,People & Blogs,0,1
14,https://www.youtube.com/watch?v=MqMJdLxw8lA,ASMR hair salon roleplay without any props?! m...,hey guys. u guys wanted mouth sounds and a rol...,2022-09-15 00:01:25+00:00,UCBsVm9XFIPhnyerJiSlCv5g,cait ASMR,,24,en,PT20M17S,164626,4288,0,0,269,False,False,00:20:17,723,6925.829,180.396496,Entertainment,0,1
15,https://www.youtube.com/watch?v=rtwG_4mZ40Q,Orange ASMR Snacks vs Cavities!?,Let's Find Out If Orange ASMR Snacks Will Caus...,2022-09-22 11:15:05+00:00,UC7u9o8BHiJyH2_cef_nC7tQ,Dental Digest,,22,en,PT36S,134730714,3507929,0,0,5734,False,False,00:00:36,716,5723546.0,149021.657938,People & Blogs,0,1
16,https://www.youtube.com/watch?v=hYk4KIRUs9E,ASMR Rambling in the Rain & Bundling You Up,Use code MOON135 to get $135 off across five b...,2022-09-12 19:37:18+00:00,UClMJgjg2z_IrRm6J9KrhcuQ,Goodnight Moon,,22,en,PT40M7S,907091,22826,0,0,984,False,False,00:40:07,726,38003.7,956.323462,People & Blogs,0,1
28,https://www.youtube.com/watch?v=0JiurP4w77U,ASMR 파피 플레이타임2 버려진 마미롱레그 인형 수리하기 | 인형 복구 작업 |...,영상 중간부분에 편집 오류가 있어 수정 후 재업로드 했습니다 😥\n\n안녕하세요 스...,2022-09-06 12:14:51+00:00,UCKo9E6a6-E40CC6OdSbvcdA,스마일밤 Smile Bam,,1,en,PT8M4S,22481415,159761,0,0,1331,False,False,00:08:04,732,934166.3,6638.520606,Film & Animation,0,1


#### Replace nulls for these columns with 3 spaces

In [38]:
#No Description
#First check that no description already consists of 3 spaces - although that would effectively be no description

data4[data4['Description'] == '   ']

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description,No Tags


In [39]:
data4['Description'] = data4['Description'].fillna('   ')

In [40]:
data4[data4['Description'] == '   ']

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description,No Tags
29,https://www.youtube.com/watch?v=ojRt9WHYRps,Awesome Indoor Cycling Set Up ASMR,,2022-09-17 02:54:34+00:00,UCwXicok1yKCwDWJRkRbJ4KQ,Mackenzie William,,22,en,PT29S,80014525,2242302,0,0,3539,False,False,00:00:29,721,3.375555e+06,94595.495839,People & Blogs,1,1
33,https://www.youtube.com/watch?v=tyrgZ3APBCI,Was This Worth The Price | ASMR,,2022-09-12 15:09:16+00:00,UCgaKAH-PAvm2wlBk3XbsRLQ,Ivan McCombs,,24,en,PT43S,3015905,150139,0,0,643,False,False,00:00:43,726,1.263551e+05,6290.258838,Entertainment,1,1
38,https://www.youtube.com/watch?v=kJNf3qkxllo,Retro Computer ASMR: IMac G3 booting up #apple...,,2022-09-23 13:04:16+00:00,UClt2R3HOLqBP2YleFxcc6Rw,Ashton’s Retro Computer Room,,22,en,PT16S,788580,28470,0,0,527,False,False,00:00:16,715,3.354682e+04,1211.136364,People & Blogs,1,1
43,https://www.youtube.com/watch?v=bBw1GFwmiIM,♥️💄♥️💄#makeup #makeuptutorial #asmr #satisfying,,2022-09-11 17:00:14+00:00,UCOcBePwvcGYRxSGzfZ8zlVQ,Nursema,,27,en,PT34S,107766109,0,0,0,3512,False,False,00:00:34,727,4.508784e+06,0.000000,Education,1,1
45,https://www.youtube.com/watch?v=-iB8vD2iyMs,ASMR Паровая терапия 2.0,,2022-09-09 14:40:03+00:00,UCyWamphhnZTfApacvhLXWkg,Barberry ASMR,,24,en,PT12S,1684079,38710,0,0,9507,False,False,00:00:12,729,7.026621e+04,1615.129172,Entertainment,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3946,https://www.youtube.com/watch?v=kvbosLDDolg,Chewing on everything in my brother’s backpack...,,2024-07-25 20:39:05+00:00,UCuM54mvoSpSAs9RB5C4FysQ,Dean ASMR,,1,en,PT1M1S,196605,8146,0,0,554,False,False,00:01:01,44,1.359107e+05,5631.231061,Film & Animation,1,1
3947,https://www.youtube.com/watch?v=XfMhvIRydI4,Rainbow Boba Pudding Emoji Challenge ASMR🧋🥵bub...,,2024-07-01 15:21:38+00:00,UCxCxmmjFbHz7Ief3m87MQOA,TwinKle Couple,,22,en,PT52S,1511918,43162,0,0,510,False,False,00:00:52,68,6.762869e+05,19306.531863,People & Blogs,1,1
3951,https://www.youtube.com/watch?v=xp5GeigYdq0,ASMR SELF CARE NIGHT 🍒🌸 #asmr #shorts #nails,,2024-07-18 21:22:01+00:00,UCjvt48MCuwaO-s_Ur0i4V5A,Katie’s Nails,,22,en,PT59S,341754,16778,0,0,49,False,False,00:00:59,51,2.038239e+05,10006.486928,People & Blogs,1,1
3952,https://www.youtube.com/watch?v=hx7xGsC6eEw,Midnight Asmr Super Chill with Cute Order | Mẫ...,,2024-07-29 15:49:17+00:00,UC2hqpOywcs4s8MjBCPYno8g,Mẫn Mẫn Miladen Official,,22,en,PT1M,179243,9279,0,0,337,False,False,00:01:00,40,1.362994e+05,7055.906250,People & Blogs,1,1


In [41]:
#1339 rows returned - checks out

In [42]:
#do the same for null tags
#first check that no Tags value already consists of 3 spaces

data4[data4['Tags'] == '   ']

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description,No Tags


In [43]:
data4['Tags'] = data4['Tags'].fillna('   ')

In [44]:
data4[data4['Tags'] == '   ']

Unnamed: 0,Video URL,Title,Description,Published At,Channel ID,Channel Title,Tags,Category ID,Default Language,Duration,View Count,Like Count,Dislike Count,Favorite Count,Comment Count,Comments Disabled,Ratings Disabled,Duration Time Format,Video Age Days,Views per Month,Likes per Month,Category,No Description,No Tags
1,https://www.youtube.com/watch?v=vvcUJEQnen4,ASMR Victorian Medical Roleplay 🩺 Medical Exam,Meet with the remarkable Doctor Cosmos and his...,2022-09-13 19:45:00+00:00,UC20BrZXv7OC6JyALCZN-0Ig,Tinglesmith ASMR,,22,en,PT26M3S,1973546,27525,0,0,984,False,False,00:26:03,725,8.279819e+04,1154.784483,People & Blogs,0,1
14,https://www.youtube.com/watch?v=MqMJdLxw8lA,ASMR hair salon roleplay without any props?! m...,hey guys. u guys wanted mouth sounds and a rol...,2022-09-15 00:01:25+00:00,UCBsVm9XFIPhnyerJiSlCv5g,cait ASMR,,24,en,PT20M17S,164626,4288,0,0,269,False,False,00:20:17,723,6.925829e+03,180.396496,Entertainment,0,1
15,https://www.youtube.com/watch?v=rtwG_4mZ40Q,Orange ASMR Snacks vs Cavities!?,Let's Find Out If Orange ASMR Snacks Will Caus...,2022-09-22 11:15:05+00:00,UC7u9o8BHiJyH2_cef_nC7tQ,Dental Digest,,22,en,PT36S,134730714,3507929,0,0,5734,False,False,00:00:36,716,5.723546e+06,149021.657938,People & Blogs,0,1
16,https://www.youtube.com/watch?v=hYk4KIRUs9E,ASMR Rambling in the Rain & Bundling You Up,Use code MOON135 to get $135 off across five b...,2022-09-12 19:37:18+00:00,UClMJgjg2z_IrRm6J9KrhcuQ,Goodnight Moon,,22,en,PT40M7S,907091,22826,0,0,984,False,False,00:40:07,726,3.800370e+04,956.323462,People & Blogs,0,1
28,https://www.youtube.com/watch?v=0JiurP4w77U,ASMR 파피 플레이타임2 버려진 마미롱레그 인형 수리하기 | 인형 복구 작업 |...,영상 중간부분에 편집 오류가 있어 수정 후 재업로드 했습니다 😥\n\n안녕하세요 스...,2022-09-06 12:14:51+00:00,UCKo9E6a6-E40CC6OdSbvcdA,스마일밤 Smile Bam,,1,en,PT8M4S,22481415,159761,0,0,1331,False,False,00:08:04,732,9.341663e+05,6638.520606,Film & Animation,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3948,https://www.youtube.com/watch?v=swMb7x7RgSw,#asmr @AsmrWD 신기한물먹방 #ASMRDRINKING #asmreating...,@AsmrWD \n#asmrdrinking,2024-07-18 08:25:00+00:00,UCbbiEt2oAN3F5tJwAHOIPbA,Asmr 세계 음주,,22,en,PT1M,5883083,149682,0,0,109,False,False,00:01:00,51,3.508701e+06,89271.127451,People & Blogs,0,1
3950,https://www.youtube.com/watch?v=lIJtPZ8if8s,ASMR FAST AND AGGRESIVE FAKE FOOD,holaaaa,2024-07-16 23:45:01+00:00,UCHdDGlROrbk9saotVVgwpOA,FLO ASMR,,20,en,PT24M48S,218465,8240,0,0,412,False,False,00:24:48,53,1.253769e+05,4728.930818,Gaming,0,1
3951,https://www.youtube.com/watch?v=xp5GeigYdq0,ASMR SELF CARE NIGHT 🍒🌸 #asmr #shorts #nails,,2024-07-18 21:22:01+00:00,UCjvt48MCuwaO-s_Ur0i4V5A,Katie’s Nails,,22,en,PT59S,341754,16778,0,0,49,False,False,00:00:59,51,2.038239e+05,10006.486928,People & Blogs,1,1
3952,https://www.youtube.com/watch?v=hx7xGsC6eEw,Midnight Asmr Super Chill with Cute Order | Mẫ...,,2024-07-29 15:49:17+00:00,UC2hqpOywcs4s8MjBCPYno8g,Mẫn Mẫn Miladen Official,,22,en,PT1M,179243,9279,0,0,337,False,False,00:01:00,40,1.362994e+05,7055.906250,People & Blogs,1,1


In [45]:
#1966 returned, checks out

# II. Next Steps

We are done with the preliminary data cleaning and formatting. Next, I will perform preliminary EDA (see the next notebook).

In [46]:
#save file to csv for next step
#comment this line out after the first run to avoid overwriting the file

data4.to_csv('../data/data_clean_pt1.csv')
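
By default `to_csv` writes the row index as the first column, so a later `read_csv` sees it as an extra `Unnamed: 0` column unless the file is read with `index_col=0`. A small sketch with hypothetical toy data showing both behaviours; whether to pass `index=False` here depends on what the next notebook expects.

```python
import io
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({'Title': ['a', 'b']})

# Default: the index is saved as an unnamed first column
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# index=False: only the data columns are saved
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns), list(without_index.columns))
```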

In [47]:
data4.shape
#3956 x 24

(3956, 24)