<h1>A second model</h1>
<p>After building an initial model that set our baseline, we now want to build a second model that beats it.</p>

<h5>Importing the libraries</h5>
<p>As before, we start by importing the libraries.</p>

In [1]:
import pandas as pd
import numpy as np
import re
import time

import bs4 as bs4
import json

import glob
import tqdm

pd.set_option("display.max_columns", 131)

# Reference for date format codes such as %d %b %Y: https://strftime.org/
%matplotlib inline


<p>We read the CSV file and keep only the rows where the "y" column is not null, i.e. the videos that already have a label. "y" is our target variable.</p>

In [2]:
df = pd.read_csv("raw_data_with_labels.csv", index_col=0)
df = df[df['y'].notnull()]
df.shape

(498, 16)

In [3]:
df_limpo = pd.DataFrame(index=df.index)


<h1>Data Cleaning</h1>
<p>Same as before. No changes here.</p>

In [4]:
clean_date = df['watch-time-text'].str.extract(r"(\d+) de ([a-z]+)\. de (\d+)")


In [5]:
clean_date[0] = clean_date[0].map(lambda x: "0"+x[0] if len(x) == 1 else x)


In [6]:
# It worked: single-digit days are now zero-padded
clean_date[0]

0      03
1      16
2      02
3      13
4      30
       ..
496    01
497    31
498    10
499    25
500    21
Name: 0, Length: 498, dtype: object

In [7]:
mapa_meses = {"jan": "Jan",
              "fev": "Feb",
              "mar": "Mar", 
              "abr": "Apr", 
              "mai": "May", 
              "jun": "Jun",
              "jul": "Jul",
              "ago": "Aug", 
              "set": "Sep", 
              "out": "Oct", 
              "nov": "Nov",
              "dez": "Dec"}

clean_date[1] = clean_date[1].map(mapa_meses)

In [8]:
clean_date[1]

0      Sep
1      Nov
2      May
3      Aug
4      Nov
      ... 
496    Mar
497    May
498    Nov
499    Apr
500    Mar
Name: 1, Length: 498, dtype: object

In [9]:
clean_date = clean_date.apply(lambda x: " ".join(x), axis=1)

df_limpo['date'] = pd.to_datetime(clean_date, format="%d %b %Y")
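<p>The date-cleaning steps above could be wrapped in one reusable function, which would also avoid repeating them later for the unlabeled videos. This is a minimal sketch, assuming every "watch-time-text" value follows the "2 de nov. de 2018" pattern seen in this data; the name parse_pt_dates is hypothetical. It uses str.zfill(2) for the zero padding and errors="coerce", so strings the regex does not match become NaT instead of raising an error (the " ".join step above would fail on such rows).</p>

```python
import pandas as pd

# Portuguese month abbreviations (as in 'watch-time-text') -> English, for %b
PT_TO_EN_MONTHS = {
    "jan": "Jan", "fev": "Feb", "mar": "Mar", "abr": "Apr",
    "mai": "May", "jun": "Jun", "jul": "Jul", "ago": "Aug",
    "set": "Sep", "out": "Oct", "nov": "Nov", "dez": "Dec",
}


def parse_pt_dates(texts: pd.Series) -> pd.Series:
    """Parse strings such as 'Publicado em 2 de nov. de 2018' into datetimes."""
    parts = texts.str.extract(r"(\d+) de ([a-z]+)\. de (\d+)")
    day = parts[0].str.zfill(2)             # '2' -> '02'
    month = parts[1].map(PT_TO_EN_MONTHS)   # 'nov' -> 'Nov'
    year = parts[2]
    # Rows the regex did not match are NaN here and become NaT below
    joined = day + " " + month + " " + year
    return pd.to_datetime(joined, format="%d %b %Y", errors="coerce")


texts = pd.Series([
    "Publicado em 2 de nov. de 2018",
    "Transmitido ao vivo em 11 de out. de 2019",
])
print(parse_pt_dates(texts).dt.strftime("%Y-%m-%d").tolist())  # ['2018-11-02', '2019-10-11']
```

<p>With such a helper, both the labeled and the unlabeled frames could be cleaned with a single call each, e.g. parse_pt_dates(df['watch-time-text']).</p>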

<h1>View Cleaning</h1>
<p>Again, we clean the view counts with the same regular expression as before. No changes here either.</p>

In [10]:
views = df['watch-view-count'].str.extract(r"(\d+\.?\d*)", expand=False).str.replace(".", "", regex=False).fillna(0).astype(int)


In [11]:
df_limpo['views'] = views
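<p>One caveat about this regular expression: (\d+\.?\d*) allows at most one "." separator, so a count with two thousands separators, such as the "1.544.819 visualizações" that appears among the unlabeled videos further down, would be read as 1544. The sketch below, written with Python's re module rather than pandas, compares the notebook's pattern with a hypothetical alternative. Whether any labeled video is affected is not visible in this notebook.</p>

```python
import re

# Pattern used in the notebook: digits, at most one '.', then more digits
ORIGINAL_PATTERN = r"(\d+\.?\d*)"
# Hypothetical alternative: digits followed by any number of '.ddd' groups
GROUPED_PATTERN = r"(\d+(?:\.\d+)*)"


def parse_views(text: str, pattern: str) -> int:
    """Extract a view count such as '1.544.819 visualizações' as an int."""
    match = re.search(pattern, text)
    if match is None:
        return 0  # same fallback as .fillna(0) in the notebook
    # '.' is the thousands separator on these Portuguese pages
    return int(match.group(1).replace(".", ""))


for text in ["2.167 visualizações", "504 visualizações", "1.544.819 visualizações"]:
    print(text, parse_views(text, ORIGINAL_PATTERN), parse_views(text, GROUPED_PATTERN))
```

<p>In pandas, the alternative would read df['watch-view-count'].str.extract(r"(\d+(?:\.\d+)*)", expand=False); the (?:...) group is non-capturing, so str.extract still sees a single capture group. Changing the pattern changes the features, so the scores reported below could change as well.</p>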

<h1>Creating Features</h1>
<p>The features are created the same way as before.</p>

In [12]:
features = pd.DataFrame(index=df_limpo.index)
y = df['y'].copy()

<h5>Days since publication</h5>

In [13]:
pd.to_datetime("2021-01-01") - df_limpo["date"]

0      851 days
1      777 days
2      610 days
3      507 days
4      763 days
         ...   
496   1037 days
497    946 days
498    418 days
499    617 days
500    652 days
Name: date, Length: 498, dtype: timedelta64[ns]

In [14]:
features['time_since_pub'] = (pd.to_datetime("2021-01-01") - df_limpo['date']) / np.timedelta64(1, 'D')


In [15]:
features['time_since_pub'].head()

0    851.0
1    777.0
2    610.0
3    507.0
4    763.0
Name: time_since_pub, dtype: float64

<h5>Views per day</h5>

In [16]:
features['views'] = df_limpo['views']

In [17]:
features['views_per_day'] = features['views'] / features['time_since_pub']

In [18]:
features = features.drop(['time_since_pub'], axis=1)

In [19]:
features.head()

Unnamed: 0,views,views_per_day
0,28028,32.93537
1,1131,1.455598
2,1816,2.977049
3,1171,2.309665
4,1228,1.609436


<h1>Splitting the data</h1>
<p>Next, as before, we split the data by publication date: videos published before 2019-04-01 go to the training set and the rest go to the validation set.</p>

In [20]:
mask_train = df_limpo['date'] < "2019-04-01"
mask_val = df_limpo['date'] >= "2019-04-01"

Xtrain, Xval = features[mask_train], features[mask_val]
ytrain, yval = y[mask_train], y[mask_val]
Xtrain.shape, Xval.shape, ytrain.shape, yval.shape

((228, 2), (270, 2), (228,), (270,))
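<p>Splitting by date rather than at random means every validation video was published after every training video, which mimics scoring new videos with a model trained on older ones. A small sketch of the same masks on made-up dates (the toy frame is hypothetical) shows that the boundary day falls into validation and that each dated row lands in exactly one set. The notebook compares against the string "2019-04-01", which pandas parses as a date; the sketch uses an explicit Timestamp. A row with a missing date (NaT) would land in neither set, because comparisons with NaT are always False.</p>

```python
import pandas as pd

# Toy stand-in for df_limpo: only the publication date matters for the split
toy = pd.DataFrame({"date": pd.to_datetime(
    ["2018-10-14", "2019-02-13", "2019-04-01", "2019-11-23"])})

cutoff = pd.Timestamp("2019-04-01")
mask_train = toy["date"] < cutoff    # published strictly before the cutoff
mask_val = toy["date"] >= cutoff     # the cutoff day itself goes to validation

print(toy[mask_train].index.tolist(), toy[mask_val].index.tolist())  # [0, 1] [2, 3]
```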

<h1>Turn strings into numbers</h1>
<p>Our models only understand numbers, not strings, and the "title" column contains only words. To solve this, we build a bag-of-words matrix with one row per video and one column per word in the vocabulary, where each cell holds a weight for that word in that title. We use scikit-learn's "TfidfVectorizer" class for this.</p>

<p>TfidfVectorizer computes TF-IDF weights: a word's frequency in a title (term frequency) multiplied by an inverse document frequency that shrinks as more titles contain the word. Words that are frequent in one title but rare across all videos therefore get more weight. The "min_df" parameter sets the minimum number of titles a word must appear in to be kept in the vocabulary; with min_df=2, words that occur in only one title are dropped. We can tune this value, but changing it changes the vocabulary and can affect the model's performance.</p>

<h5>Sparsity of the matrix</h5>
<p>By default, the vectorizer returns a sparse matrix, which stores only the nonzero values and their positions. Since most words do not appear in any given title, this saves a lot of memory compared to storing every zero.</p>
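<p>To see what "storing only the nonzero values" means, here is a small sketch with scipy's csr_matrix, the Compressed Sparse Row format that the vectorizer output reports. The toy matrix is made up.</p>

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small matrix that is mostly zeros, like a bag-of-words matrix
dense = np.array([
    [0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
])
sparse = csr_matrix(dense)

# CSR keeps only the nonzero values plus two index arrays
print(sparse.data)     # nonzero values, row by row: [0.7 0.5 0.5]
print(sparse.indices)  # column of each stored value: [2 0 3]
print(sparse.indptr)   # row i occupies data[indptr[i]:indptr[i+1]]: [0 1 1 3]

# Fraction of zeros, computed from the stored-element count
sparsity = 1 - sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print(sparse.nnz, sparsity)  # 3 0.75
```

<p>The same formula applied to title_bow_train, whose nnz is 1277, gives the roughly 97% sparsity computed further down.</p>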

In [21]:
from sklearn.feature_extraction.text import TfidfVectorizer

df_limpo['title'] = df['watch-title']

title_train = df_limpo[mask_train]['title']
title_val = df_limpo[mask_val]['title']

# title_bow_train / title_bow_val: bag-of-words (TF-IDF) matrices of the titles
title_vec = TfidfVectorizer(min_df=2)
title_bow_train = title_vec.fit_transform(title_train)
title_bow_val = title_vec.transform(title_val)

In [22]:
title_bow_train

<228x193 sparse matrix of type '<class 'numpy.float64'>'
	with 1277 stored elements in Compressed Sparse Row format>

In [23]:
title_bow_val

<270x193 sparse matrix of type '<class 'numpy.float64'>'
	with 1266 stored elements in Compressed Sparse Row format>

<p>The training matrix has 228 rows (one per video) and 193 columns (one per word kept in the vocabulary), so counting the zeros it has 228*193 = 44,004 cells.</p>

In [24]:
title_bow_train.shape

(228, 193)

In [25]:
title_bow_train

<228x193 sparse matrix of type '<class 'numpy.float64'>'
	with 1277 stored elements in Compressed Sparse Row format>

<p>Only 1,277 of those cells are stored (nonzero), so about 97% of the matrix is zeros. Not storing the zeros is what saves the memory.</p>

In [26]:
1 - title_bow_train.nnz / (title_bow_train.shape[0] * title_bow_train.shape[1])

0.9709799109171894

<h5>Joining some matrices</h5>
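<p>To join matrices, we can use the "hstack" and "vstack" functions from "scipy.sparse". hstack places matrices side by side, concatenating their columns (they must have the same number of rows), while vstack stacks them on top of each other, concatenating their rows (they must have the same number of columns). The difference is sketched below.</p>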

In [27]:
# hstack - [1 2]  [3 4]  -> [1 2 3 4]    (1x4)
#
# vstack - [1 2]  [3 4]  -> [1 2]
#                           [3 4]        (2x2)

In [28]:
# Join the numeric features and the title TF-IDF matrix side by side
from scipy.sparse import hstack, vstack

Xtrain_wtitle = hstack([Xtrain, title_bow_train])
Xval_wtitle = hstack([Xval, title_bow_val])

In [29]:
Xtrain_wtitle.shape, Xval_wtitle.shape

((228, 195), (270, 195))
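<p>The shape rules can be checked with a small sketch on toy data (all values here are made up). It converts the DataFrame to a sparse matrix explicitly with csr_matrix, a choice of this sketch; the notebook passes the DataFrame to hstack directly. hstack may return a COO matrix (the "COOrdinate format" shown further down), so .tocsr() converts the result to CSR, which supports efficient row slicing.</p>

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack, vstack

# Toy stand-ins: 3 videos, 2 numeric features and a 3x4 TF-IDF-like matrix
numeric = pd.DataFrame({"views": [2167, 504, 526],
                        "views_per_day": [5.5, 1.4, 6.4]})
text = csr_matrix(np.eye(3, 4))

# hstack: same number of rows required; columns are concatenated -> (3, 2 + 4)
combined = hstack([csr_matrix(numeric.to_numpy()), text]).tocsr()
print(combined.shape)  # (3, 6)

# vstack: same number of columns required; rows are concatenated -> (3 + 3, 4)
stacked = vstack([text, text]).tocsr()
print(stacked.shape)  # (6, 4)
```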

<h1>Creating a model with Random Forest</h1>
<p>To beat the baseline scores, we now train a Random Forest on the numeric features plus the title TF-IDF columns.</p>

In [30]:
#from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

mdl = RandomForestClassifier(n_estimators=1000, random_state=0, class_weight="balanced", n_jobs=6)
mdl.fit(Xtrain_wtitle, ytrain)

RandomForestClassifier(class_weight='balanced', n_estimators=1000, n_jobs=6,
                       random_state=0)
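<p>class_weight="balanced" reweights the classes inversely to their frequency, which helps when one label is much rarer than the other. scikit-learn documents the weight of each class as n_samples / (n_classes * np.bincount(y)). The sketch below computes it on hypothetical labels, once with compute_class_weight and once by hand.</p>

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 8 videos not liked (0), 2 liked (1)
y_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# "balanced" weights: n_samples / (n_classes * count of each class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_toy)
manual = len(y_toy) / (2 * np.bincount(y_toy))

# The rare class (1) gets the larger weight
print(weights, manual)
```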

<h5>Verifying the results</h5>
<p>After training, we evaluate the model on the validation set with "roc_auc_score" and "average_precision_score".</p>
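<p>Both metrics take predicted probabilities rather than hard 0/1 predictions, which is why the model's predict_proba(...)[:, 1] is used. ROC AUC is the probability that a randomly chosen positive is ranked above a randomly chosen negative; average precision summarizes the precision-recall curve. A toy example with made-up labels and scores makes the numbers concrete:</p>

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical validation labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# 3 of the 4 (positive, negative) pairs are ranked correctly -> AUC = 0.75
auc = roc_auc_score(y_true, scores)
# Precision is 1.0 when recall reaches 0.5 and 2/3 when recall reaches 1.0
# -> AP = 0.5 * 1.0 + 0.5 * 2/3 = 0.8333...
ap = average_precision_score(y_true, scores)

print(round(auc, 4), round(ap, 4))  # 0.75 0.8333
```

<p>A classifier that scores at random gets an AUC around 0.5 but an average precision around the fraction of positives in the data, so an average precision should be judged against the positive rate of yval rather than against 0.5.</p>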

In [31]:
from sklearn.metrics import roc_auc_score, average_precision_score

In [32]:
p = mdl.predict_proba(Xval_wtitle)[:, 1]

In [33]:
average_precision_score(yval, p)

0.18367911080129234

In [34]:
roc_auc_score(yval, p)

0.5761094224924012

<h1>Conclusion</h1>
<p>The average precision is higher than the baseline's, but the ROC AUC is lower, so the result is inconclusive: we want both metrics to beat the baseline before calling this model an improvement.</p>

<h1>Active Learning</h1>
<p>Sometimes we have to do more with less: we don't have, or can't get, enough labeled data. Even so, we need to extract as much information as possible from the few examples we have. Active learning helps here: instead of labeling examples at random, we use the current model to choose which unlabeled examples to label next.</p>
<p>For instance, if we can afford to label a hundred new examples, we could pick 70 on which the model is least reliable and 30 at random.</p>
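<p>The selection used in the cells below can be sketched as a small function: take every unlabeled row whose predicted probability is at or above a threshold, then add a few random rows from the rest so that the new labels are not all drawn from one region. This is a sketch under assumptions: select_for_labeling is a hypothetical helper, and the threshold and counts are parameters (the notebook uses 0.26 and 31 random rows).</p>

```python
import pandas as pd


def select_for_labeling(probs: pd.Series, threshold: float,
                        n_random: int, seed: int = 0) -> pd.Index:
    """Return the index of every row with probs >= threshold,
    followed by n_random other rows drawn at random."""
    selected = probs[probs >= threshold].index
    rest = probs.drop(selected)
    extra = rest.sample(n=min(n_random, len(rest)), random_state=seed).index
    return selected.append(extra)


# Hypothetical predicted probabilities for six unlabeled videos
probs = pd.Series([0.05, 0.43, 0.10, 0.39, 0.01, 0.27])
chosen = select_for_labeling(probs, threshold=0.26, n_random=2)
print(chosen.tolist())
```

<p>A threshold rule like this picks the videos the model considers most likely to be liked. A common alternative, uncertainty sampling, instead picks the rows whose probability is closest to 0.5, for example by sorting on (probs - 0.5).abs().</p>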

<h5>Moving on</h5>
<p>From now on, I will follow the code with only brief comments, mainly because most steps repeat the ones above.</p>

In [35]:
df_unlabeled = pd.read_csv("raw_data_with_labels.csv", index_col=0)
df_unlabeled = df_unlabeled[df_unlabeled['y'].isnull()].dropna(how='all')
df_unlabeled.shape

(674, 16)

In [36]:
df_unlabeled.head(1)

Unnamed: 0,watch-title,y,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0
501,Kaggle Mercari Price Suggestion Challenge (1 p...,,2.167 visualizações,Publicado em 2 de nov. de 2018,Educação,Kaggle Mercari Price Suggestion Challenge (1 p...,ML Trainings\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarreg...,2.167 visualizações\n\n\n\n\n\n\n\n61\n\nGosto...,https://i.ytimg.com/vi/QFR0IHbzA30/maxresdefau...,1280.0,720.0,Pawel Jankiewicz and Konstantin Lopuhin share ...,1280.0,720.0,price suggestion,/channel/UCeq6ZIlvC9SVsfhfKnSvM9w


In [37]:
df_limpo_u = pd.DataFrame(index=df_unlabeled.index)
df_limpo_u['title'] = df_unlabeled['watch-title']

In [38]:
clean_date = df_unlabeled['watch-time-text'].str.extract(r"(\d+) de ([a-z]+)\. de (\d+)")
clean_date[0] = clean_date[0].map(lambda x: "0"+x[0] if len(x) == 1 else x)
#clean_date[1] = clean_date[1].map(lambda x: x[0].upper()+x[1:])

mapa_meses = {"jan": "Jan",
              "fev": "Feb",
              "mar": "Mar", 
              "abr": "Apr", 
              "mai": "May", 
              "jun": "Jun",
              "jul": "Jul",
              "ago": "Aug", 
              "set": "Sep", 
              "out": "Oct", 
              "nov": "Nov",
              "dez": "Dec"}

clean_date[1] = clean_date[1].map(mapa_meses)

clean_date = clean_date.apply(lambda x: " ".join(x), axis=1)
clean_date.head()
df_limpo_u['date'] = pd.to_datetime(clean_date, format="%d %b %Y")

In [39]:
df_limpo_u.head()

Unnamed: 0,title,date
501,Kaggle Mercari Price Suggestion Challenge (1 p...,2018-11-02
502,OpenAI Gym and Python for Q-learning - Reinfor...,2018-10-14
503,"Dashboarding with Notebooks, Day 1: What infor...",2018-12-17
504,How To Get US- American Company H1 Visa To Get...,2019-11-23
505,Platform Overview - Machine Learning,2019-05-21


In [40]:
views = df_unlabeled['watch-view-count'].str.extract(r"(\d+\.?\d*)", expand=False).str.replace(".", "", regex=False).fillna(0).astype(int)
df_limpo_u['views'] = views


In [41]:
features_u = pd.DataFrame(index=df_limpo_u.index)

In [42]:
features_u['time_since_pub'] = (pd.to_datetime("2019-12-03") - df_limpo_u['date']) / np.timedelta64(1, 'D')
features_u['views'] = df_limpo_u['views']
features_u['views_per_day'] = features_u['views'] / features_u['time_since_pub']
features_u = features_u.drop(['time_since_pub'], axis=1)

In [43]:
features_u.head()

Unnamed: 0,views,views_per_day
501,2167,5.472222
502,20378,49.103614
503,10435,29.729345
504,7,0.7
505,4298,21.928571
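<p>One inconsistency is worth noting: the labeled features measure time since publication from 2021-01-01, while the unlabeled features above use 2019-12-03. Because views_per_day divides by that number of days, the same video gets a different value depending on which frame it is in, so the model scores unlabeled data on a different scale than it was trained on. A minimal sketch of one way to avoid this is a shared helper called with a single reference date; make_features is hypothetical, and which date to use is left open here.</p>

```python
import numpy as np
import pandas as pd


def make_features(clean: pd.DataFrame, reference_date: str) -> pd.DataFrame:
    """Build 'views' and 'views_per_day' from a frame with 'date' and 'views' columns."""
    days = (pd.Timestamp(reference_date) - clean["date"]) / np.timedelta64(1, "D")
    features = pd.DataFrame(index=clean.index)
    features["views"] = clean["views"]
    features["views_per_day"] = clean["views"] / days
    return features


# Video 501 from the unlabeled data above: 2167 views, published 2018-11-02
toy = pd.DataFrame({"date": pd.to_datetime(["2018-11-02"]), "views": [2167]}, index=[501])
print(make_features(toy, "2019-12-03"))  # 2167 / 396 days, matching the 5.472222 above
print(make_features(toy, "2021-01-01"))  # same video, smaller views_per_day
```

<p>Calling make_features(df_limpo, date) and make_features(df_limpo_u, date) with the same date would keep the two feature sets comparable.</p>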


In [44]:
from sklearn.feature_extraction.text import TfidfVectorizer

title_u = df_limpo_u['title']
title_bow_u = title_vec.transform(title_u)


In [45]:
title_bow_u

<674x193 sparse matrix of type '<class 'numpy.float64'>'
	with 3079 stored elements in Compressed Sparse Row format>

In [46]:
Xu_wtitle = hstack([features_u, title_bow_u])

In [47]:
Xu_wtitle

<674x195 sparse matrix of type '<class 'numpy.float64'>'
	with 4409 stored elements in COOrdinate format>

In [48]:
pu = mdl.predict_proba(Xu_wtitle)[:, 1]

In [49]:
df_unlabeled['probability_liking_video'] = pu

In [50]:
df_unlabeled.head(1)

Unnamed: 0,watch-title,y,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0,probability_liking_video
501,Kaggle Mercari Price Suggestion Challenge (1 p...,,2.167 visualizações,Publicado em 2 de nov. de 2018,Educação,Kaggle Mercari Price Suggestion Challenge (1 p...,ML Trainings\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarreg...,2.167 visualizações\n\n\n\n\n\n\n\n61\n\nGosto...,https://i.ytimg.com/vi/QFR0IHbzA30/maxresdefau...,1280.0,720.0,Pawel Jankiewicz and Konstantin Lopuhin share ...,1280.0,720.0,price suggestion,/channel/UCeq6ZIlvC9SVsfhfKnSvM9w,0.107


In [51]:
mask_u = (df_unlabeled['probability_liking_video'] >= 0.26) & (df_unlabeled['probability_liking_video'] <= 1.)
mask_u.sum()

72

In [52]:
mask_u

501     False
502     False
503     False
504     False
505      True
        ...  
1179    False
1180    False
1181    False
1182    False
1183    False
Name: probability_liking_video, Length: 674, dtype: bool

In [53]:
df_unlabeled[mask_u]

Unnamed: 0,watch-title,y,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0,probability_liking_video
505,Platform Overview - Machine Learning,,4.298 visualizações,Publicado em 21 de mai. de 2019,Ciência e tecnologia,Platform Overview - Machine Learning,Google Cloud Platform\n\n\n\n\n\n\n\n\n\n\n\n\...,4.298 visualizações\n\n\n\n\n\n\n\n141\n\nGost...,https://i.ytimg.com/vi/QR_LQQ-vvko/maxresdefau...,1280.0,720.0,"In this short GCP Essentials video, see how GC...",1280.0,720.0,Alexis Moussine Pouchkine,/channel/UCJS9pqu9BzkAMNTmzNMNhvg,0.428
507,Kaggle Meetup: Ship Detection Challenge,,504 visualizações,Publicado em 30 de nov. de 2018,Ciência e tecnologia,Kaggle Meetup: Ship Detection Challenge,Learn Data Science\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,504 visualizações\n\n\n\n\n\n\n\n9\n\nGostou d...,https://i.ytimg.com/vi/QXEy4rdLsDw/maxresdefau...,1280.0,720.0,Video from the 2018-11-29 meetup. Kaggle page:...,1280.0,720.0,learn data science,/channel/UCJhW_16uxALr0X4olEW2p5A,0.387
521,Kaggle iMaterialist (Fashion) 2019 at FGVC6 — ...,,526 visualizações,Publicado em 18 de set. de 2019,Educação,Kaggle iMaterialist (Fashion) 2019 at FGVC6 — ...,ML Trainings\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarreg...,526 visualizações\n\n\n\n\n\n\n\n18\n\nGostou ...,https://i.ytimg.com/vi/RUfmEj1MC3k/maxresdefau...,1280.0,720.0,Илья Денисов рассказывает про опыт участия в с...,1280.0,720.0,kaggle,/channel/UCeq6ZIlvC9SVsfhfKnSvM9w,0.262
522,Anomaly detection using machine learning in Az...,,2.570 visualizações,Publicado em 13 de fev. de 2019,Ciência e tecnologia,#azure #azurestreamanalytics #machinelearning\...,Microsoft Developer\n\n\n\n\n\n\n\n\n\n\n\n\n\...,2.570 visualizações\n\n\n\n\n\n\n\n32\n\nGosto...,https://i.ytimg.com/vi/Ra8HhBLdzHE/maxresdefau...,1280.0,720.0,Azure Stream Analytics is a fully managed serv...,1280.0,720.0,Azure Friday,/channel/UCsMica-v34Irf9KVTh6xx-g,0.261
527,Reinforcement Learning with TensorFlow and Uni...,,545 visualizações,Publicado em 22 de nov. de 2019,Ciência e tecnologia,Reinforcement Learning with TensorFlow and Uni...,Google Developers\n\n\n\n\n\n\n\n\n\n\n\n\n\n\...,545 visualizações\n\n\n\n\n\n\n\n13\n\nGostou ...,https://i.ytimg.com/vi/S-MbpQiwfls/maxresdefau...,1280.0,720.0,"Dan Goncharov, Head of 42 Robotics GDG Fremont...",1280.0,720.0,Dan Goncharov,/channel/UC_x5XG1OV2P6uZZ5FSM9Ttw,0.276
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1114,Artificial Intelligence & Machine Learning - ...,,28 visualizações,Publicado em 25 de nov. de 2019,Educação,Artificial Intelligence & Machine Learning - ...,jayanti prasad\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarr...,28 visualizações\n\n\n\n\n\n\n\n1\n\nGostou de...,https://i.ytimg.com/vi/vhnLdxH8eP0/hqdefault.jpg,480.0,360.0,A Review of machine learning on source code.,960.0,720.0,,/channel/UCJesxoOgwZS5JpHrTDOS-8A,0.467
1120,Recognizing Students at Risk of Dropping Out -...,,5 visualizações,Publicado em 24 de nov. de 2019,"SME, WMG (em nome de WM Spain); LatinAutor, CM...",Recognizing Students at Risk of Dropping Out -...,Joe Spaeth\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregan...,5 visualizações\n\n\n\n\n\n\n\n0\n\nGostou des...,https://i.ytimg.com/vi/w104zCQNB38/maxresdefau...,1280.0,720.0,"ML4VA project by Anna Spearman, Noah Collins, ...",1280.0,720.0,,/channel/UCZDNp9Zik3T8DjKzarPlDpQ,0.345
1136,Real Stories from Career Switchers | CareerCon...,,531 visualizações,Publicado em 15 de ago. de 2019,Ciência e tecnologia,Real Stories from Career Switchers | CareerCon...,Kaggle\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando.....,531 visualizações\n\n\n\n\n\n\n\n24\n\nGostou ...,https://i.ytimg.com/vi/wyU9GTKSO0g/maxresdefau...,1280.0,720.0,"""Listen to a panel of speakers who successfull...",1280.0,720.0,careercon,/channel/UCSNeZleDn9c74yQc-EKnVTA,0.459
1138,"NIPS 2017 Test of Time Award ""Machine learning...",,13.474 visualizações,Publicado em 7 de mar. de 2018,Educação,"NIPS 2017 Test of Time Award ""Machine learning...",Preserve Knowledge\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,13.474 visualizações\n\n\n\n\n\n\n\n138\n\nGos...,https://i.ytimg.com/vi/x7psGHgatGM/maxresdefau...,1280.0,720.0,,1280.0,720.0,NIPS 2017,/channel/UC9p_wQs8b8SHvfJSuuxEnvw,0.311


In [54]:
hard_to_predict = df_unlabeled[mask_u]

In [55]:
random = df_unlabeled[~mask_u].sample(31, random_state=0)

In [56]:
pd.concat([hard_to_predict, random]).to_csv("active_label1.csv")

In [57]:
hard_to_predict

Unnamed: 0,watch-title,y,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0,probability_liking_video
505,Platform Overview - Machine Learning,,4.298 visualizações,Publicado em 21 de mai. de 2019,Ciência e tecnologia,Platform Overview - Machine Learning,Google Cloud Platform\n\n\n\n\n\n\n\n\n\n\n\n\...,4.298 visualizações\n\n\n\n\n\n\n\n141\n\nGost...,https://i.ytimg.com/vi/QR_LQQ-vvko/maxresdefau...,1280.0,720.0,"In this short GCP Essentials video, see how GC...",1280.0,720.0,Alexis Moussine Pouchkine,/channel/UCJS9pqu9BzkAMNTmzNMNhvg,0.428
507,Kaggle Meetup: Ship Detection Challenge,,504 visualizações,Publicado em 30 de nov. de 2018,Ciência e tecnologia,Kaggle Meetup: Ship Detection Challenge,Learn Data Science\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,504 visualizações\n\n\n\n\n\n\n\n9\n\nGostou d...,https://i.ytimg.com/vi/QXEy4rdLsDw/maxresdefau...,1280.0,720.0,Video from the 2018-11-29 meetup. Kaggle page:...,1280.0,720.0,learn data science,/channel/UCJhW_16uxALr0X4olEW2p5A,0.387
521,Kaggle iMaterialist (Fashion) 2019 at FGVC6 — ...,,526 visualizações,Publicado em 18 de set. de 2019,Educação,Kaggle iMaterialist (Fashion) 2019 at FGVC6 — ...,ML Trainings\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarreg...,526 visualizações\n\n\n\n\n\n\n\n18\n\nGostou ...,https://i.ytimg.com/vi/RUfmEj1MC3k/maxresdefau...,1280.0,720.0,Илья Денисов рассказывает про опыт участия в с...,1280.0,720.0,kaggle,/channel/UCeq6ZIlvC9SVsfhfKnSvM9w,0.262
522,Anomaly detection using machine learning in Az...,,2.570 visualizações,Publicado em 13 de fev. de 2019,Ciência e tecnologia,#azure #azurestreamanalytics #machinelearning\...,Microsoft Developer\n\n\n\n\n\n\n\n\n\n\n\n\n\...,2.570 visualizações\n\n\n\n\n\n\n\n32\n\nGosto...,https://i.ytimg.com/vi/Ra8HhBLdzHE/maxresdefau...,1280.0,720.0,Azure Stream Analytics is a fully managed serv...,1280.0,720.0,Azure Friday,/channel/UCsMica-v34Irf9KVTh6xx-g,0.261
527,Reinforcement Learning with TensorFlow and Uni...,,545 visualizações,Publicado em 22 de nov. de 2019,Ciência e tecnologia,Reinforcement Learning with TensorFlow and Uni...,Google Developers\n\n\n\n\n\n\n\n\n\n\n\n\n\n\...,545 visualizações\n\n\n\n\n\n\n\n13\n\nGostou ...,https://i.ytimg.com/vi/S-MbpQiwfls/maxresdefau...,1280.0,720.0,"Dan Goncharov, Head of 42 Robotics GDG Fremont...",1280.0,720.0,Dan Goncharov,/channel/UC_x5XG1OV2P6uZZ5FSM9Ttw,0.276
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1114,Artificial Intelligence & Machine Learning - ...,,28 visualizações,Publicado em 25 de nov. de 2019,Educação,Artificial Intelligence & Machine Learning - ...,jayanti prasad\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarr...,28 visualizações\n\n\n\n\n\n\n\n1\n\nGostou de...,https://i.ytimg.com/vi/vhnLdxH8eP0/hqdefault.jpg,480.0,360.0,A Review of machine learning on source code.,960.0,720.0,,/channel/UCJesxoOgwZS5JpHrTDOS-8A,0.467
1120,Recognizing Students at Risk of Dropping Out -...,,5 visualizações,Publicado em 24 de nov. de 2019,"SME, WMG (em nome de WM Spain); LatinAutor, CM...",Recognizing Students at Risk of Dropping Out -...,Joe Spaeth\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregan...,5 visualizações\n\n\n\n\n\n\n\n0\n\nGostou des...,https://i.ytimg.com/vi/w104zCQNB38/maxresdefau...,1280.0,720.0,"ML4VA project by Anna Spearman, Noah Collins, ...",1280.0,720.0,,/channel/UCZDNp9Zik3T8DjKzarPlDpQ,0.345
1136,Real Stories from Career Switchers | CareerCon...,,531 visualizações,Publicado em 15 de ago. de 2019,Ciência e tecnologia,Real Stories from Career Switchers | CareerCon...,Kaggle\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando.....,531 visualizações\n\n\n\n\n\n\n\n24\n\nGostou ...,https://i.ytimg.com/vi/wyU9GTKSO0g/maxresdefau...,1280.0,720.0,"""Listen to a panel of speakers who successfull...",1280.0,720.0,careercon,/channel/UCSNeZleDn9c74yQc-EKnVTA,0.459
1138,"NIPS 2017 Test of Time Award ""Machine learning...",,13.474 visualizações,Publicado em 7 de mar. de 2018,Educação,"NIPS 2017 Test of Time Award ""Machine learning...",Preserve Knowledge\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,13.474 visualizações\n\n\n\n\n\n\n\n138\n\nGos...,https://i.ytimg.com/vi/x7psGHgatGM/maxresdefau...,1280.0,720.0,,1280.0,720.0,NIPS 2017,/channel/UC9p_wQs8b8SHvfJSuuxEnvw,0.311


In [58]:
random

Unnamed: 0,watch-title,y,watch-view-count,watch-time-text,content_watch-info-tag-list,watch7-headline,watch7-user-header,watch8-sentiment-actions,og:image,og:image:width,og:image:height,og:description,og:video:width,og:video:height,og:video:tag,channel_link_0,probability_liking_video
809,Machine Learning for Pricing & Auctions - Tuto...,,3.867 visualizações,Publicado em 25 de jul. de 2018,Ciência e tecnologia,Machine Learning for Pricing & Auctions - Tuto...,The Artificial Intelligence Channel\n\n\n\n\n\...,3.867 visualizações\n\n\n\n\n\n\n\n66\n\nGosto...,https://i.ytimg.com/vi/gsPWQqVhb74/maxresdefau...,1280.0,720.0,"Recorded July 10th, 2018 This tutorial Machine...",1280.0,720.0,International Conference on Machine Learning,/channel/UC5g-f-g4EVRkqL8Xs888BLA,0.219
560,Tableau for Data Science and Data Visualizatio...,,87.049 visualizações,Publicado em 29 de jan. de 2019,Educação,Tableau for Data Science and Data Visualizatio...,freeCodeCamp.org\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n...,87.049 visualizações\n\n\n\n\n\n\n\n1.171\n\nG...,https://i.ytimg.com/vi/TPMlZxRRaBQ/maxresdefau...,1280.0,720.0,"Learn to use Tableau to produce high quality, ...",1280.0,720.0,tableau course,/channel/UC8butISFwT-Wl7EV0hUK0BQ,0.013
1045,How machine learning is being used to help sav...,,1.544.819 visualizações,Publicado em 24 de out. de 2019,Licença de atribuição Creative Commons (reutil...,How machine learning is being used to help sav...,Google\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando...,1.544.819 visualizações\n\n\n\n\n\n\n\n4.615\n...,https://i.ytimg.com/vi/sgCGHBek1To/maxresdefau...,1280.0,720.0,Bee populations around the world are declining...,1280.0,720.0,,/channel/UCK8sQmJBp8GCxrOtXWBpyEA,0.142
726,I Passed the Final Data Science Assignment! | ...,,11.421 visualizações,Publicado em 11 de nov. de 2018,Ciência e tecnologia,#datascience #machinelearning #artificialintel...,Daniel Bourke\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarre...,11.421 visualizações\n\n\n\n\n\n\n\n211\n\nGos...,https://i.ytimg.com/vi/btjp5PE96Sw/maxresdefau...,1280.0,720.0,Woohoo! I passed one of the hardest assignment...,1280.0,720.0,mrdbourke,/channel/UCr8O8l5cCX85Oem1d18EezQ,0.051
891,Learn Data Science Today - Data Science Tutori...,,178.094 visualizações,Publicado em 10 de jan. de 2019,Pessoas e blogs,#DataScienceWithPython #DataScienceWithR #Data...,UpDegree\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando...,178.094 visualizações\n\n\n\n\n\n\n\n6.112\n\n...,https://i.ytimg.com/vi/kE9875zZkLE/maxresdefau...,1280.0,720.0,This Data Science Course will give you a Step ...,1280.0,720.0,data science bootcamps,/channel/UCn4Y0Ej7Vu3rO84e_aD28lg,0.007
889,Kaggle Live Coding: Automating report generati...,,1.725 visualizações,Transmitido ao vivo em 11 de out. de 2019,Ciência e tecnologia,Kaggle Live Coding: Automating report generati...,Kaggle\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregando.....,1.725 visualizações\n\n\n\n\n\n\n\n66\n\nGosto...,https://i.ytimg.com/vi/kDzEZihQFig/hqdefault.jpg,480.0,360.0,This week Rachael will continue to work on her...,1280.0,720.0,reading group,/channel/UCSNeZleDn9c74yQc-EKnVTA,0.056
779,Code with me (live): How to make your first Ka...,,23.854 visualizações,Transmitido ao vivo em 1 de dez. de 2018,Ciência e tecnologia,#datascience #machinelearning #kaggle\n\n\n\n ...,Daniel Bourke\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarre...,23.854 visualizações\n\n\n\n\n\n\n\n576\n\nGos...,https://i.ytimg.com/vi/f1y9wDDxWnA/hqdefault.jpg,480.0,360.0,Let's explore some Kaggle data together! Thank...,640.0,360.0,100 days of ml code,/channel/UCr8O8l5cCX85Oem1d18EezQ,0.141
1037,Data Science Training | Data Science for Begi...,,47.046 visualizações,Publicado em 28 de set. de 2019,Educação,#DataScience #DataScienceTraining #DataScience...,Intellipaat\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarrega...,47.046 visualizações\n\n\n\n\n\n\n\n1.382\n\nG...,https://i.ytimg.com/vi/sEtQUnVA4wQ/maxresdefau...,1280.0,720.0,🔥Intellipaat Data Science training course usin...,1280.0,720.0,data science course,/channel/UCCktnahuRFYIBtNnKT5IYyg,0.027
984,Checking out a Data Science Workstation,,101.921 visualizações,Publicado em 5 de out. de 2019,Educação,Checking out a Data Science Workstation,sentdex\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCarregand...,101.921 visualizações\n\n\n\n\n\n\n\n2.124\n\n...,https://i.ytimg.com/vi/p9bkz3hxrSM/maxresdefau...,1280.0,720.0,Checking out Lenovo's top of the line P920 Dat...,1280.0,720.0,computer,/channel/UCfzlCWGWYyIQ0aLC5w48gBQ,0.005
902,Deep Learning VM Images (AI Adventures),,9.505 visualizações,Publicado em 10 de out. de 2018,Ciência e tecnologia,Deep Learning VM Images (AI Adventures),Google Cloud Platform\n\n\n\n\n\n\n\n\n\n\n\n\...,9.505 visualizações\n\n\n\n\n\n\n\n211\n\nGost...,https://i.ytimg.com/vi/kyNbYCHFCSw/maxresdefau...,1280.0,720.0,Imagine if you could avoid the headache of set...,1280.0,720.0,ai adventures,/channel/UCJS9pqu9BzkAMNTmzNMNhvg,0.114
