# Do positive comments affect video views or it depends on quantity

## Channels used
### Music
[Anne Reburn](https://www.youtube.com/@AnneReburn)  
[Frog Leap Studios](https://www.youtube.com/@leolego)  
### Travel
[Jaychel](https://www.youtube.com/@JaychelAdventure)   
[Lucas T. Jahn](https://www.youtube.com/@LucasTJahn) 

## 1. Import all necessary libraries. 
- Requests for getting data.  
- Pandas to manipulate data.  
- NLTK to analyze text.

In [1]:
import pandas as pd
import requests
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize
import nltk

## 2. Define constants
YouTube API wants you to provide authentication token which you can get [here](https://developers.google.com/youtube/registering_an_application)       
Also you will need channel id to define from which channel you want to get videos

In [15]:
channel_id = "" 
auth_key = ""

## 3. Make request to get videos from channel and transform the data

I will get only first 50 videos

In [3]:
request_string = f"https://www.googleapis.com/youtube/v3/search?key={auth_key}&channelId={channel_id}&part=snippet,id&order=date&maxResults=50&type=video"

channel_videos = requests.get(request_string)

In [4]:
df = pd.DataFrame(channel_videos.json()['items'])
d=df['id']
df['videoId'] = df['id'].apply(lambda x: x['videoId'])
df = df.drop('id', axis = 1)
df = df.drop('etag', axis = 1)
df = df.drop('snippet', axis = 1)
df = df.drop('kind', axis = 1)

In [5]:
## Check data
df.head()

Unnamed: 0,videoId
0,FFyaqbAn-cA
1,iXTNZfuGjTM
2,v4CA65JyaVA
3,9nPFq3IX9I0
4,KVLAYb1L9xU


In [6]:
def get_views(video):
    """Gets dataframe row with video id ['videoId'] and returns 
    dataframe row with statistics. ['view_count'] and ['comment_count']"""
    views_request_string = f"https://www.googleapis.com/youtube/v3/videos?part=statistics&id={video['videoId']}&key={auth_key}"
    views_request = requests.get(views_request_string)
    video['view_count'] = int(views_request.json()['items'][0]['statistics']['viewCount'])
    video['comment_count'] = int(views_request.json()['items'][0]['statistics']['commentCount'])
    return video

In [7]:
df = df.apply(get_views, axis = 1)

In [8]:
## Check 
df.head()

Unnamed: 0,videoId,view_count,comment_count
0,FFyaqbAn-cA,1893380,1220
1,iXTNZfuGjTM,847003,239
2,v4CA65JyaVA,8493175,1859
3,9nPFq3IX9I0,328971,262
4,KVLAYb1L9xU,1053956,510


In [9]:
## Initialize sentiment analyzer that will process comment text
analyzer = SentimentIntensityAnalyzer()
def analyze(snippet):
    text = snippet['topLevelComment']['snippet']['textOriginal']
    return analyzer.polarity_scores(text)['compound']

In [10]:
def get_comments_score(temp_video_id):
    """ Gets video id and returns mean sentiment score for all comments for current video. """
    suma=0
    count=0
    temp_request_string = f"https://www.googleapis.com/youtube/v3/commentThreads?key={auth_key}&videoId={temp_video_id}&maxResults=100&part=snippet"
    temp_request=requests.get(temp_request_string)
    for item in temp_request.json()['items']:
        count += 1
        suma += analyze(item['snippet'])
    while True:
        try:
            next_page_token = temp_request.json()['nextPageToken']
            temp_request_string = f"https://www.googleapis.com/youtube/v3/commentThreads?key={auth_key}&videoId={v_id}&maxResults=100&part=snippet&pageToken={next_page_token}"
            temp_request=requests.get(temp_request_string)
            for item in temp_request.json()['items']:
                count += 1
                suma += analyze(item['snippet'])
        except: 
            break
    return suma/count

In [11]:
df['comment_score'] = df['videoId'].apply(get_comments_score)

In [12]:
df['comment_frequency'] = df['comment_count']/df['view_count']

In [13]:
## Check data
df.head()

Unnamed: 0,videoId,view_count,comment_count,comment_score,comment_frequency
0,FFyaqbAn-cA,1893380,1220,0.361841,0.000644
1,iXTNZfuGjTM,847003,239,0.506196,0.000282
2,v4CA65JyaVA,8493175,1859,0.549687,0.000219
3,9nPFq3IX9I0,328971,262,0.513874,0.000796
4,KVLAYb1L9xU,1053956,510,0.446923,0.000484


In [14]:
df.corr()

Unnamed: 0,view_count,comment_count,comment_score,comment_frequency
view_count,1.0,0.91934,-0.065308,-0.552714
comment_count,0.91934,1.0,-0.1698,-0.492465
comment_score,-0.065308,-0.1698,1.0,0.106701
comment_frequency,-0.552714,-0.492465,0.106701,1.0


| Channel           | Correlation   |                 |  
|-------------------|---------------|-----------------|
|                   | View comment  | View positivity |
| **Music**         |               |                 |
| Anne Reburn       |  0.865910	    |  -0.110349	  |
| Frog Leap Studios |  0.814003	    |  -0.087670	  |
| **Travel**        |               |                 |
| Jaychel           |  0.923368	    |  -0.235714	  |
| Lucas T. Jahn     |  0.919340	    |  -0.065308	  |


## Conclusion. 

As we see, comment positivity is not highly correlated woth number of views on the video.  
Also we can see that view count and comment frequency (probability that user will leave a comment after watching video) are not highly correlated, so we can assume that overall amount of comments does not affect view count.   