# Perseverance Land on Mars YouTube Live Comments
## English Comments Posted During NASA's Live Stream on YouTube

<div>
    <img src="https://storage.googleapis.com/kaggle-datasets-images/1177049/1970679/d7acecdc6bb8c0555f746fa078399180/dataset-cover.jpg?t=2021-02-23-15-41-02">
</div>

<br>

## Content

The dataset contains two basic attributes from which you can extract a range of interesting features, from datetime-based features to text-based features.
- The first is the time in the video at which the comment was posted; note that the live stream started at 2:15 PM EST.
- The second is the comment itself; note that non-English comments were removed.
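The `time` column is an offset into the video, not a clock time, so wall-clock times have to be derived from the stream's start. The sketch below assumes that "2:15" means 2:15 PM EST, the reading under which the notebook's 1:40:43 landing offset corresponds to 3:55:43 PM EST. `video_offset_to_est` is a hypothetical helper, not part of the dataset.

```python
from datetime import timedelta

# Assumption: "2:15" in the dataset description means 2:15 PM EST.
STREAM_START_EST = timedelta(hours=14, minutes=15)


def video_offset_to_est(offset: str) -> str:
    """Convert a video offset ('m:ss', 'mm:ss', 'h:mm:ss') to an EST clock time 'HH:MM:SS'."""
    parts = [int(part) for part in offset.replace(' ', '').split(':')]
    parts = [0] * (3 - len(parts)) + parts  # left-pad missing hours/minutes with zeros
    hours, minutes, seconds = parts
    clock = STREAM_START_EST + timedelta(hours=hours, minutes=minutes, seconds=seconds)
    total = int(clock.total_seconds()) % (24 * 3600)  # wrap past midnight
    return f'{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}'


print(video_offset_to_est('1:40:43'))  # 15:55:43
```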

## Inspiration

I think it would be interesting to better understand how people around the world reacted to the rover landing on Mars and to the content shown in the video. There were also many points where the video lagged or the site crashed.

```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import sys
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
```

# Load the data

```python
data = pd.read_csv('../input/perseverance-land-on-mars-youtube-live-comments/Perseverance_Landing.csv', index_col=0)
```

```python
data.shape
```

```python
data.head()
```
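The feature code later in the notebook calls `len` and `split` on every comment, which would fail on a missing value. A slightly more defensive loader, sketched below, checks that the `time` and `comment` columns exist and drops rows without a comment. Whether the real file contains empty comments is not known; `load_comments` and the toy CSV are hypothetical, for illustration only.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = {'time', 'comment'}  # the two attributes described in the Content section


def load_comments(path_or_buffer) -> pd.DataFrame:
    """Load the comments CSV and keep only rows with a usable comment."""
    df = pd.read_csv(path_or_buffer, index_col=0)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    # Assumption: a row with an empty comment has no text to analyse, so drop it;
    # otherwise len() and str.split() on NaN would fail in the feature step.
    return df.dropna(subset=['comment'])


sample = io.StringIO(',time,comment\n0,1:40,Go NASA!\n1,1:41,\n2,1:42,wow\n')
print(load_comments(sample))
```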

```python
def ytb_duration_timestamp(ytb_duration):
    """Normalise a YouTube duration ('m:ss', 'mm:ss', 'h:mm:ss' or 'hh:mm:ss') to 'HH:MM:SS'."""
    # Clean white spaces and split into numeric fields
    fields = [int(field) for field in ytb_duration.replace(' ', '').split(':')]

    # Left-pad missing hours/minutes with zeros
    hours, minutes, seconds = [0] * (3 - len(fields)) + fields

    # Zero-pad every field to two digits
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'
```
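Parsing the output of `ytb_duration_timestamp` with `strptime` places every comment on 1 January 1900, the default date `strptime` fills in. An alternative, sketched here as an option rather than what the notebook does, is to store each offset as elapsed seconds, which sorts, subtracts and bins as plain integers. `duration_to_seconds` is a hypothetical helper.

```python
import pandas as pd


def duration_to_seconds(durations: pd.Series) -> pd.Series:
    """Convert YouTube durations ('m:ss', 'mm:ss', 'h:mm:ss') to elapsed seconds."""
    def to_seconds(duration: str) -> int:
        total = 0
        for part in duration.replace(' ', '').split(':'):
            total = total * 60 + int(part)  # each field is worth 60x the next one
        return total

    return durations.apply(to_seconds)


offsets = pd.Series(['1:40', '12:34', '1:40:43'])
print(duration_to_seconds(offsets).tolist())  # [100, 754, 6043]
```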

```python
!pip install spacytextblob
!pip install geograpy3
!python3 -m textblob.download_corpora
```

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the 'spacytextblob' component
import geograpy
import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nlp = spacy.load('en_core_web_sm')
# spaCy 3 API: components are added by their registered name
# (spaCy 2 with older spacytextblob releases used nlp.add_pipe(SpacyTextBlob()) instead)
nlp.add_pipe('spacytextblob')
```

```python
import datetime

# Miscellaneous string variables
data['length'] = data['comment'].apply(len)  # Number of characters in the comment
data['n_words'] = data['comment'].apply(lambda x: len(x.split()))  # Number of whitespace-separated words
data['upper'] = data['comment'].apply(lambda x: sum(map(str.isupper, x)))  # Number of uppercase characters
# TextBlob polarity, from -1 (negative) to 1 (positive); spacytextblob 4.x exposes it as doc._.blob.polarity
data['polarity'] = data['comment'].apply(lambda x: nlp(x)._.blob.polarity)

# Time variable
data['timestamp'] = data['time'].apply(lambda x: datetime.datetime.strptime(ytb_duration_timestamp(x), '%H:%M:%S'))
```
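The three string features can also be computed with pandas' vectorised `.str` accessor instead of `apply`. This is a sketch, not the notebook's code: `string_features` is a hypothetical name, and the uppercase count uses an ASCII-only regular expression, so it can differ from `str.isupper` on accented capitals.

```python
import pandas as pd


def string_features(comments: pd.Series) -> pd.DataFrame:
    """Vectorised versions of the length, word-count and uppercase features."""
    return pd.DataFrame({
        'length': comments.str.len(),               # characters per comment
        'n_words': comments.str.split().str.len(),  # whitespace-separated tokens
        # Counts ASCII capitals only, unlike str.isupper, which also
        # counts non-ASCII uppercase letters such as 'É'.
        'upper': comments.str.count(r'[A-Z]'),
    })


print(string_features(pd.Series(['Go NASA', 'wow'])))
```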

# Analysis

```python
data.head()
```

```python
grouped = data.groupby('timestamp')  # one group per second of video
no_comments = grouped['comment'].count()
no_words = grouped['n_words'].sum()
no_uppers = grouped['upper'].sum()
sum_polarity = grouped['polarity'].sum()
mean_polarity = grouped['polarity'].mean()
mean_length = grouped['length'].mean()
```
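Each series above holds one statistic per distinct `timestamp`, i.e. per second of video. The same numbers can be gathered into one table with a single `groupby(...).agg(...)` call using pandas named aggregation. The sketch below runs it on made-up toy rows (the polarity values are invented, not TextBlob output) so the result can be checked by hand; it covers four of the six series, and the others follow the same pattern.

```python
import pandas as pd

# Toy data: three comments posted at two distinct timestamps (made up for illustration).
toy = pd.DataFrame({
    'timestamp': ['00:01:40', '00:01:40', '00:01:41'],
    'comment': ['Go NASA', 'wow', 'LANDED!!!'],
    'n_words': [2, 1, 1],
    'polarity': [0.0, 0.1, 0.5],
})

# One groupby with named aggregations: output column = (input column, aggregation).
per_second = toy.groupby('timestamp').agg(
    no_comments=('comment', 'count'),
    no_words=('n_words', 'sum'),
    sum_polarity=('polarity', 'sum'),
    mean_polarity=('polarity', 'mean'),
)
print(per_second)
```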

```python
milestone1 = {
    'timestamp': datetime.datetime.strptime(ytb_duration_timestamp('1:40'), '%H:%M:%S'),
    'label': 'Live started'
}

milestone2 = {
    'timestamp': datetime.datetime.strptime(ytb_duration_timestamp('1:40:43'), '%H:%M:%S'),
    'label': 'Landing'
}
```

```python
def plot_milestone(milestone, frame, color='r'):
    # Vertical line at the milestone, spanning the range of the plotted series, plus its label
    plt.plot([milestone['timestamp'], milestone['timestamp']], [frame.min(), frame.max()], color=color)
    plt.text(milestone['timestamp'], frame.min(), milestone['label'], color=color)
```

```python
plt.figure(figsize=(20, 6))
sns.lineplot(x=no_words.index, y=no_words.values, color='teal', alpha=.3)
plt.title('No. words')
plot_milestone(milestone1, no_words)
plot_milestone(milestone2, no_words)
plt.show()
```

> We can suppose that people were chatting casually during most of the live stream, but during key events they mostly focused on cheering, without writing many words.

```python
plt.figure(figsize=(20, 6))
sns.lineplot(x=mean_length.index, y=mean_length.values, color='steelblue', alpha=.6)
plt.title('Mean comment length (characters)')
plot_milestone(milestone1, mean_length)
plot_milestone(milestone2, mean_length)
plt.show()
```

```python
plt.figure(figsize=(20, 6))
sns.lineplot(x=no_comments.index, y=no_comments.values, color='navy', alpha=.6)
plt.title('No. comments')
plot_milestone(milestone1, no_comments, color='black')
plot_milestone(milestone2, no_comments, color='black')
plt.show()
```

```python
plt.figure(figsize=(20, 6))
sns.lineplot(x=no_uppers.index, y=no_uppers.values, color='purple', alpha=.4)
plt.title('No. uppercase characters')  # 'upper' counts characters, not words
plot_milestone(milestone1, no_uppers, color='black')
plot_milestone(milestone2, no_uppers, color='black')
plt.show()
```
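The plotting cells above repeat the same figure, line and milestone code. A small helper, sketched below, removes that repetition, assuming each milestone is a dict with `timestamp` and `label` keys as `milestone1` and `milestone2` are. It uses plain matplotlib with `ax.axvline` (a line across the full axes height) instead of seaborn and `plot_milestone`; `plot_with_milestones` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_milestones(series: pd.Series, title: str, milestones: list,
                         color: str = 'teal', milestone_color: str = 'red'):
    """Line plot of a per-second series with a labelled vertical line per milestone."""
    fig, ax = plt.subplots(figsize=(20, 6))
    ax.plot(series.index, series.values, color=color, alpha=0.6)
    for milestone in milestones:
        # axvline spans the full height of the axes, so no min/max bounds are needed
        ax.axvline(milestone['timestamp'], color=milestone_color)
        ax.text(milestone['timestamp'], series.min(), milestone['label'], color=milestone_color)
    ax.set_title(title)
    return ax
```

In the notebook it would be called as `plot_with_milestones(no_words, 'No. words', [milestone1, milestone2])` followed by `plt.show()`.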

```python
plt.figure(figsize=(20, 6))
# Split the summed polarity per second into its positive and negative parts
sns.lineplot(x=sum_polarity.index, y=sum_polarity.clip(lower=0).values, label='Net positive polarity', color='darkblue', alpha=.6)
sns.lineplot(x=sum_polarity.index, y=sum_polarity.clip(upper=0).values, label='Net negative polarity', color='red', alpha=.8)
sns.scatterplot(x=mean_polarity.index, y=mean_polarity.values, label='Mean polarity', color='black', alpha=0.4)
plot_milestone(milestone1, sum_polarity, color='seagreen')
plot_milestone(milestone2, sum_polarity, color='seagreen')
plt.title('Polarity')
plt.show()
```

> We can see that most comments are classified as positive or neutral! So the overall sentiment here is positive, most likely because of the joyful event :)
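The plot shows per-second sums and means, so the claim that most comments are positive or neutral can be checked more directly by counting comments by polarity sign. A minimal sketch, assuming a polarity of exactly 0 counts as neutral; `polarity_breakdown` is a hypothetical helper that would be applied as `polarity_breakdown(data['polarity'])`.

```python
import pandas as pd


def polarity_breakdown(polarity: pd.Series) -> pd.Series:
    """Share of comments with negative, neutral (exactly 0) or positive polarity."""
    def label(p: float) -> str:
        if p > 0:
            return 'positive'
        if p < 0:
            return 'negative'
        return 'neutral'

    shares = polarity.apply(label).value_counts(normalize=True)
    # Keep a fixed order and report 0.0 for classes that never occur.
    return shares.reindex(['negative', 'neutral', 'positive'], fill_value=0.0)


print(polarity_breakdown(pd.Series([0.5, 0.0, 0.0, -0.25])))
```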

# Conclusion

Please don't hesitate to help me improve this notebook :)