# Jupyter notebook

Useful keyboard shortcuts:
- Shift + Enter – run the current cell and move to the next one
- Esc, then b – insert a new cell below
- Esc, then m – change the cell to a Markdown (text) cell
- Ctrl + Shift + Minus – split the cell at the cursor
- Shift + Tab – show the documentation for the function under the cursor

In [None]:
print("Hello world!")

In [None]:
1+1

In [None]:
variable = 135

In [None]:
variable

In [None]:
# Infinite loop in the cell

# Pandas

In [None]:
import pandas as pd

In [None]:
data = {"name": ["Ada", "Ben", "Chao", "Dia"],
        "age": [29, 31, 27, 31],
        "score": [88, 92, 79, 92]}
df = pd.DataFrame(data)

In [None]:
df

In [None]:
# Selecting data
df["score"]   # a Series (one column)

In [None]:
df[["name","score"]]        # a DataFrame (two columns)

In [None]:
# Filtering data
df[df["score"] > 90]

In [None]:
# Save:
df.to_csv("results.csv", index=False)

In [None]:
# Load:
df2 = pd.read_csv("results.csv")

# Downloading YouTube comments

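To download comments you need an API key for the YouTube Data API v3.

Steps:

1. Go to the Google Cloud Console.
2. Create a project.
3. Open APIs & Services.
4. Enable the YouTube Data API v3.
5. Open Credentials.
6. Create an API key.
7. Paste the key into the code below.
8. Keep the key private: do not share it or publish it with your notebook.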
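
One way to keep the key out of the notebook is to read it from an environment variable instead of pasting it into a cell. The sketch below assumes a variable named `YOUTUBE_API_KEY` and a helper called `load_api_key`; both names are hypothetical and not part of the original notebook.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this helper, the assignment in the next cell could become `API_KEY = load_api_key()`, so the key never appears in a shared notebook.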

In [None]:
!pip -q install pandas requests

In [None]:
import re, requests, pandas as pd

API_KEY = ""   # <-- put your YouTube Data API key here

VIDEO_URL = "https://www.youtube.com/watch?v=RiTfe-ckD_g"  # example Ameca video


In [None]:
import urllib.parse as up
video_id = up.parse_qs(up.urlparse(VIDEO_URL).query)["v"][0]
print("Video ID:", video_id)
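
The cell above assumes a URL of the form `watch?v=...` and raises a `KeyError` for anything else. As a hedged sketch, a small helper can also accept short `youtu.be` links and return `None` when no ID is found. The function name `extract_video_id` is hypothetical, and other YouTube URL shapes (embeds, Shorts) are not handled.

```python
import urllib.parse as up

def extract_video_id(url):
    """Return the video ID from a watch URL or a youtu.be short link, else None."""
    parsed = up.urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links carry the ID in the "v" query parameter
        values = up.parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```

For example, `extract_video_id(VIDEO_URL)` gives the same ID as the cell above for the example video.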

In [None]:
import html

def strip_html(s):
    # Remove HTML tags, then decode entities such as &amp; and &#39;
    return html.unescape(re.sub("<.*?>", "", s or "")).strip()

def yt_get(endpoint, params):
    base = f"https://www.googleapis.com/youtube/v3/{endpoint}"
    params = {**params, "key": API_KEY}
    r = requests.get(base, params=params, timeout=20)
    r.raise_for_status()
    return r.json()

def fetch_comments(video_id, max_comments=200):
    out = []
    page = None
    pulled = 0
    while pulled < max_comments:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": min(100, max_comments - pulled),
            "textFormat": "html"
        }
        if page:
            params["pageToken"] = page
        data = yt_get("commentThreads", params)
        for item in data.get("items", []):
            top = item["snippet"]["topLevelComment"]["snippet"]
            out.append({
                "commentId": item["id"],
                "date": top["publishedAt"],
                "author": top.get("authorDisplayName",""),
                "text": strip_html(top.get("textDisplay","")),
                "likeCount": top.get("likeCount", 0),
            })
            pulled += 1
            if pulled >= max_comments:
                break
        page = data.get("nextPageToken")
        if not page:
            break
    return out

In [None]:
comments = fetch_comments(video_id, max_comments=200)
df = pd.DataFrame(comments).drop_duplicates(subset=["commentId"]).reset_index(drop=True)

print("Collected", len(df), "comments")

In [None]:
df

In [None]:
# Save the comments; this file is read back later in the notebook
df.to_csv("Ameca_youtube_comments.csv", index=False)

# Exercise 1

Download the comments for the medium-humanlike robot and the low-humanlike robot, and save each set as a CSV file.

# Sentiment analysis with VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based sentiment analysis tool. It uses a lexicon of words and simple heuristics to score the positive, negative, or neutral tone of a text, and it works especially well on social media language.
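
VADER's `polarity_scores` returns four values: `neg`, `neu`, `pos`, and a `compound` score between -1 and 1. A common convention, suggested in the VADER documentation, is to treat a compound score of at least 0.05 as positive and at most -0.05 as negative. The sketch below applies that convention; the helper name `label_sentiment` is hypothetical, and the threshold is a default you may want to adjust.

```python
def label_sentiment(compound, threshold=0.05):
    """Turn a VADER compound score (-1 to 1) into a coarse label.

    The +/-0.05 cut-off is a convention, not a fixed rule.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Once a column of compound scores exists, the helper can be applied to it with `apply` to get one label per comment.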

In [None]:
!pip -q install vaderSentiment

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

In [None]:
text = "I love this robot, it's amazing!"
scores = analyzer.polarity_scores(text)
print(scores)

In [None]:
df['text']

In [None]:
analyzer.polarity_scores(df['text'][2])

In [None]:
df["vader_score"] = df["text"].apply(lambda t: analyzer.polarity_scores(t)['compound'])

In [None]:
df

In [None]:
import seaborn as sns

In [None]:
df['vader_score'].mean()

In [None]:
sns.histplot(df['vader_score'])

# Exercise 2

Analyze the sentiment of all comments from Exercise 1.

### Comparing Robots

When we have sentiment scores for multiple robots, we can statistically test whether the robots differ.

Kruskal–Wallis is a non-parametric test that compares three or more independent groups without assuming normally distributed data.

We use it to check:

> Do comments about robots with different humanlikeness levels show different sentiment?
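
To show what the test measures, here is a minimal sketch of the H statistic it is built on: all scores are pooled and ranked, and H grows when the rank sums of the groups differ more than chance would suggest. The helper names `average_ranks` and `kruskal_h` are hypothetical and the numbers in the usage note are made up. The sketch omits the tie correction and the p-value that `scipy.stats.kruskal` provides, so use the SciPy function for real analyses.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic without tie correction (illustration only)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

Calling `kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9])` on these made-up, fully separated groups gives a large H, while groups with the same values give an H of 0.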

# Exercise 3

Statistically compare the sentiment scores for the robots using the Kruskal–Wallis test.

In [None]:
from scipy import stats
# Kruskal–Wallis test: pass one sequence of sentiment scores per robot
# stats.kruskal(group1, group2, group3)

In [2]:
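# Post hoc pairwise comparisons (Dunn's test with Bonferroni correction)
# Requires: pip install scikit-posthocs
#import scikit_posthocs as sp
#sp.posthoc_dunn(df, val_col='score', group_col='group', p_adjust='bonferroni')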

In [None]:
df = pd.read_csv('Ameca_youtube_comments.csv')

## Example research questions

Using these tools, we can study questions such as:

- Does the **humanlikeness** of humanoid robots relate to emotional attitudes toward them?  
- Do different **categories of robots** (social robots, pet robots, industrial robots, etc.) evoke different attitudes?  
- Is the **gender or race** of robots related to emotional attitudes expressed in comments?

For example, we can divide robots into categories based on **humanlikeness** (humanlike vs. non-humanlike), search for videos about these robots, collect the comments, and run sentiment analysis.  
By comparing sentiment between these groups, we can answer questions like:

> Do people react differently to **humanlike** robots than to **non-humanlike** robots?

# Lexicon-based specific context analysis

We can also measure how often words related to a phenomenon of interest appear in the comments.

Steps:
1. Create a list of words related to the phenomenon.
2. Split each comment into lowercase words.
3. Count how many phenomenon-related words appear.
4. Compute index = (total number of related words / total words) * 1000

This gives a simple index for each robot or category of robots.

For example: eeriness.

In [None]:
eerie = {'eerie','creepy','haunting','spookish','spooky','uncanny','unearthly','weird'}

# https://www.merriam-webster.com/thesaurus/eerie

In [None]:
import re
import pandas as pd

# Replace missing text with an empty string, lowercase it, and split into words
toks = df['text'].fillna('').str.lower().str.findall(r'[a-z]+')

word_count = toks.str.len()

total_eerie = 0
for ws in toks:           
    e_count = 0
    for w in ws:
        if w in eerie:
            e_count += 1
    total_eerie += e_count

total_words = int(word_count.sum())
eerie_per_1000 = (total_eerie / max(total_words, 1)) * 1000

print("total_words:", total_words)
print("total_eerie_words:", total_eerie)
print("eerie_per_1000:", eerie_per_1000)
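
The same calculation can be wrapped in a reusable function, which is convenient when comparing several robots or several lexicons. This is a sketch in plain Python; the name `lexicon_index` and its parameters are hypothetical, and it tokenizes with the same `[a-z]+` pattern as the cell above.

```python
import re

def lexicon_index(texts, lexicon, per=1000):
    """Return lexicon hits per `per` words across an iterable of comment texts."""
    total_words = 0
    total_hits = 0
    for text in texts:
        # Treat missing values (None, NaN) as empty comments
        text = text if isinstance(text, str) else ""
        words = re.findall(r"[a-z]+", text.lower())
        total_words += len(words)
        total_hits += sum(1 for w in words if w in lexicon)
    # max(..., 1) avoids division by zero when there are no words at all
    return total_hits / max(total_words, 1) * per
```

For instance, `lexicon_index(df['text'], eerie)` computes the eeriness index for the loaded comments, and passing a different word set gives the index for another concept.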

# Exercise 4

Create Your Own Lexicon-Based Index

Choose any psychological or thematic concept and create your own **word list** representing that concept. Then:

1. Build a simple lexicon using a dictionary or thesaurus.
2. Loop through all comments for one robot.
3. Count how many words from your lexicon appear in the comments.
4. Compute an index such as:
   - occurrences per 1000 words

