# Final Project
## Gabrielle Bartomeo

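This project compares the alignment (good, bad, or neutral) that Marvel assigns to characters drawn from the Norse pantheon with the sentiment of the sentences that mention those same figures in the Eddas.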

## Data Setup

In [1]:
import requests

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import re

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

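`requests` downloads the text of the Eddas. `nltk` supplies sentence tokenization and the VADER `SentimentIntensityAnalyzer` used for scoring. `re` handles the pattern matching that filters character names and splits the text into chapters and verses. `pandas` and `numpy` hold and summarize the scores. `networkx` and `matplotlib` are imported here but are not used in the cells of this section.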

### Marvel Wikia names and alignment

The [Marvel Wikia](http://marvel.wikia.com/wiki/Marvel_Database) offers a plethora of information, but our only interest was in characters from the Norse pantheon in the Earth-616 universe. The data was originally scraped from the Wikia by FiveThirtyEight and posted on [their GitHub](https://github.com/fivethirtyeight/data/blob/master/comic-characters/); the cell below reads a copy of that CSV.

In [2]:
wikia_url = "https://raw.githubusercontent.com/gabartomeo/data620-cunysps/master/Final%20Project/Data/marvel-wikia-data.csv"
wikia_df = pd.read_csv(wikia_url)[["name", "ALIGN"]].dropna()
wikia_df = wikia_df[wikia_df["name"].apply(lambda name: True if re.match("^(Thor|Loki|Odin|Sif|Sigyn|Frigga?|Freya|Freyr?|Heimdall|Utgard-Loki|Fenris|Idunn|Tyr|Jormungand|Hela|Hati|Skoll|Balder|Bragi) ", name) else False)].copy()
wikia_df["shortened"] = wikia_df["name"].apply(lambda name: re.sub(" .*", "", name) if " " in name else name)
wikia_df["ALIGN"] = wikia_df["ALIGN"].apply(lambda malign: malign.replace(" Characters", ""))
wikia_df = wikia_df.drop_duplicates(subset="shortened")
wikia_df = wikia_df.sort_values(by="shortened")[["name", "ALIGN"]].reset_index(drop=True).copy()
wikia_df["name"] = wikia_df["name"].apply(lambda name: re.sub(r" \(.*\)", "", name) if "(" in name else name)
wikia_df = wikia_df.rename(index=str, columns={"name": "name, Marvel", "ALIGN": "alignment, Marvel"})
wikia_df

|  | name, Marvel | alignment, Marvel |
|---|---|---|
| 0 | Balder Odinson | Good |
| 1 | Bragi | Good |
| 2 | Fenris Wolf | Bad |
| 3 | Frey | Good |
| 4 | Freya | Good |
| 5 | Frigga | Good |
| 6 | Hati | Bad |
| 7 | Heimdall | Good |
| 8 | Hela | Bad |
| 9 | Idunn | Good |


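Rows with a missing name or alignment are dropped first. The regular expression then keeps only rows whose name begins with one of the listed Norse names followed by a space. Each name is shortened to its first word, and duplicates of that first word are dropped so that one row per character remains. Finally, the " Characters" suffix is removed from the alignment and any parenthetical is stripped from the name.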
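The filtering and shortening steps can be tried on a small hand-made frame. The rows below are invented for illustration and are not records from the FiveThirtyEight file; only the column names `name` and `ALIGN` follow the cell above, and the shortened list of names in `norse` is an assumption made to keep the sketch small.

```python
import re
import pandas as pd

# Hypothetical rows that mimic the shape of the Wikia data (not real records)
toy = pd.DataFrame({
    "name": ["Thor (Earth-616)", "Thor (Earth-1610)", "Thorn (Earth-616)",
             "Loki Laufeyson (Earth-616)", "Odin"],
    "ALIGN": ["Good Characters", "Good Characters", "Bad Characters",
              "Bad Characters", "Good Characters"],
})

norse = r"^(Thor|Loki|Odin) "

# Keep names that start with a listed name followed by a space
kept = toy[toy["name"].apply(lambda name: bool(re.match(norse, name)))].copy()
# The first word of the name identifies the character
kept["shortened"] = kept["name"].str.split(" ").str[0]
# One row per character: the first occurrence wins
kept = kept.drop_duplicates(subset="shortened")
kept["ALIGN"] = kept["ALIGN"].str.replace(" Characters", "", regex=False)
kept["name"] = kept["name"].str.replace(r" \(.*\)", "", regex=True)
toy_result = kept[["name", "ALIGN"]].reset_index(drop=True)
print(toy_result)
```

In this toy frame `Thorn` is rejected because the pattern requires a space directly after `Thor`, and the bare `Odin` is rejected for the same reason.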

### Going through the Eddas

The Eddas (both the Elder Edda and the Younger Edda) were [available online](https://www.gutenberg.org/ebooks/14726) via Project Gutenberg.

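The raw text is downloaded and the chapter titles are read from the table of contents. A `NEW CHAPTER` marker is inserted before each all-caps heading so the text can be split into chapters, and each chapter is then split on its numbered verses. Line breaks, bracketed notes, and `FOOTNOTES:` labels are removed from each verse.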

In [3]:
text_url = "http://www.gutenberg.org/cache/epub/14726/pg14726.txt"
raw_text = requests.get(text_url).content.decode('utf-8')
eddas = {}
# Chapter titles come from the table of contents: indented title, two or more spaces, page number
chapters = [chapter.upper() for chapter in re.findall(r"\n +(.*[^ ]) {2,}\d+", raw_text)][:-2]
# Mark every all-caps heading line (plus one heading matched by name) so the text can be split into chapters
raw_text = re.sub(r"((\n[A-ZÀÂÄÖÔÉÈËÊÏÎŸÇÙÛÜÆŒ][A-ZÀÂÄÖÔÉÈËÊÏÎŸÇÙÛÜÆŒ \.\'\-]+\.)|(ODIN BEGUILES THE DAUGHTER OF BAUGI))", "NEW CHAPTER\n\\1\n", raw_text)
# Move an italicized speaker name from above a numbered verse to just after the verse number
raw_text = re.sub(r"_([a-zA-ZäöüàâäôéèëêïîçùûüÿæœÀÂÄÖÔÉÈËÊÏÎŸÇÙÛÜÆŒ]+)\.?_\.?\r\n(\r\n\d+\. )([\"A-Za-z])", "\\2\\1: \\3", raw_text, flags=re.UNICODE)
chapters_split = raw_text.split("NEW CHAPTER")[16:-32]
for i in range(0, len(chapters)):
    # Split each chapter on its numbered verses and drop the heading before the first verse
    eddas[chapters[i]] = re.split(r"\r\n\d+\.", chapters_split[i])[1:]
    for each_verse in range(0, len(eddas[chapters[i]])):
        eddas[chapters[i]][each_verse] = re.sub("[\r\n]+", " ", eddas[chapters[i]][each_verse]).strip()
        eddas[chapters[i]][each_verse] = re.sub(r"(\[.*\])|( FOOTNOTES:)", "", eddas[chapters[i]][each_verse])
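To see what the table-of-contents pattern and the verse split do, the sketch below runs the same regular expressions on a short invented excerpt. The strings `sample` and `chapter` are made up to imitate the layout the patterns expect; they are not text from the Gutenberg file.

```python
import re

# Invented table-of-contents lines: indented title, a run of spaces, page number
sample = "\r\n   First Poem      1\r\n   Second Poem      9\r\n"
titles = [t.upper() for t in re.findall(r"\n +(.*[^ ]) {2,}\d+", sample)]

# Invented chapter body: heading, then numbered verses with a bracketed note
chapter = "FIRST POEM.\r\n\r\n1. Line one\r\nline two.[1]\r\n\r\n2. Another verse."
verses = re.split(r"\r\n\d+\.", chapter)[1:]              # drop the heading part
verses = [re.sub(r"[\r\n]+", " ", v).strip() for v in verses]
verses = [re.sub(r"(\[.*\])|( FOOTNOTES:)", "", v) for v in verses]

print(titles)
print(verses)
```

Because `\[.*\]` is greedy, a verse containing two bracketed notes loses everything between the first `[` and the last `]`, which is worth keeping in mind when reading the extracted verses.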

In [4]:
sent_tokens = {}

for chapter in chapters:
    sent_tokens[chapter] = [nltk.sent_tokenize(verse) for verse in eddas[chapter]]
    sent_tokens[chapter] = [sentence for sentences in sent_tokens[chapter] for sentence in sentences]

sent_tokens_list = [sentence for current_list in list(sent_tokens.values()) for sentence in current_list]
elder_sent_tokens_list = [sentence for current_list in list(sent_tokens.values())[:37] for sentence in current_list]
younger_sent_tokens_list = [sentence for current_list in list(sent_tokens.values())[37:] for sentence in current_list]

In [5]:
sia = SentimentIntensityAnalyzer()

add_to_lexicon = {
    "devour": "murder", "slay": "kill", "slain": "killed",
    "devourer": "murderer", "slayer": "murderer",
    "devoured": "murdered", "swallowed": "murdered",
    "womanish": "weak", "emasculate": "weaken", "emasculated": "weaken",
    "perish": "die", "lifeless": "dead",
    "monster": "scary", "monsters": "scary", "monstrous": "scary",
    "troll": "scary", "demon": "scary", "demons": "scary",
    "perils": "peril", "burst": "broke", "conceal": "hide",
    "raving": "crazy", "unequally": "unequal", "dastardly": "cruel",
    "scoffing": "mocking", "false": "lying", "falsehoods": "lies",
    "opprobrious": "criticising", "bane": "burden",
    "reproachful": "disappointing", "strife": "conflict", "chafe": "irritate",
    "impure": "imperfect", "pure": "perfect", "holy": "divine",
    "sacred": "divine", "celestial": "divine", "aid": "help",
    "fain": "gladly", "producing": "creating", "mighty": "strong",
    "distinguish": "popular", "distinguished": "popular", "famous": "popular",
    "famed": "popular", "benignent": "kindly", "propitious": "favorable",
    "assented": "agreed", "wished": "wish", "prophecy": "divination"
}

for k,v in add_to_lexicon.items():
    sia.lexicon[k] = sia.lexicon[v]

Several archaic words were added to the VADER lexicon, each with the valence of a modern synonym, because the base lexicon is geared toward modern text and does not cover the older vocabulary of this translation.
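The aliasing loop copies a valence from a modern word to an archaic one, and direct indexing raises `KeyError` if a synonym is absent from the lexicon. The sketch below shows a guarded variant. A plain dict stands in for `sia.lexicon`, the valence numbers are made up for illustration, and the toy dict deliberately lacks `gladly` to exercise the fallback; none of this says anything about which words VADER actually contains.

```python
# Stand-in for sia.lexicon (word -> valence); these numbers are invented
lexicon = {"kill": -3.0, "help": 2.0}

# Archaic word -> modern synonym, in the same shape as add_to_lexicon
aliases = {"slay": "kill", "aid": "help", "fain": "gladly"}

missing = []
for old_word, modern_word in aliases.items():
    if modern_word in lexicon:
        # Give the archaic word the valence of its modern synonym
        lexicon[old_word] = lexicon[modern_word]
    else:
        # Direct indexing would raise KeyError here; record the gap instead
        missing.append(old_word)

print(lexicon)
print(missing)
```

Collecting the unmatched words makes it easy to review which aliases had no effect.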

In [6]:
scores = [sia.polarity_scores(sentence) for sentence in sent_tokens_list]
gods = ["Odin", "Thor", "Loki", "Frigga", "Frey", "Freyja", "Bragi", "Tyr", "Sif", "Baldr", "Heimdall", "Sigyn", "Hel", "Idunn"]
giants = ["Utgard-Loki"]
monsters = ["Fenrir", "Jormungand", "Hati", "Skoll"]

# Names whose spelling varies in the text get their own pattern; all others are matched literally
name_patterns = {
    "Fenrir": r"Fenri",
    "Loki": r"[^-]Loki",
    "Freyja": r"Frey[ij]?a",
    "Tyr": r"\bTyr?\b",
    "Idunn": r"I[td]h?unn?a",
    "Jormungand": r"((?:Jormungandr?)|(?:Midgard [Ss]erpent))",
    "Hel": r"\bHela?\b",
}

def god_scores(godname):
    pattern = name_patterns.get(godname, godname)
    # Pair each matching sentence with the score at the same position
    matches = [(sentence, scores[i]) for i, sentence in enumerate(sent_tokens_list) if re.search(pattern, sentence)]
    return {"sentences": [sentence for sentence, _ in matches], "scores": [score for _, score in matches]}

gods_df = {}
giants_df = {}
monsters_df = {}

for god in gods:
    gods_df[god] = god_scores(god)

for giant in giants:
    giants_df[giant] = god_scores(giant)
    
for monster in monsters:
    monsters_df[monster] = god_scores(monster)
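Two of the name patterns have edge cases worth checking. `[^-]Loki` needs some character before `Loki`, so a sentence that begins with the name is skipped, and a bare `Thor` also matches longer words such as `Thora`. The sketch below compares those patterns with a negative lookbehind and a word-boundary variant. The sentences are invented for the test, and the alternative patterns are suggestions, not the ones used to produce the results in this notebook.

```python
import re

# Invented sentences, used only to exercise the patterns
sentences = [
    "Loki spoke first.",
    "Then Utgard-Loki answered.",
    "Thor raised his hammer.",
    "Thora stayed seven half-years.",
]

def matching(pattern):
    """Return the sentences in which the pattern is found."""
    return [s for s in sentences if re.search(pattern, s)]

plain_loki = matching(r"[^-]Loki")          # requires a character before "Loki"
lookbehind_loki = matching(r"(?<!-)Loki")   # only forbids a preceding hyphen
plain_thor = matching(r"Thor")              # also matches "Thora"
bounded_thor = matching(r"\bThor\b")        # whole word only

print(plain_loki, lookbehind_loki)
print(plain_thor, bounded_thor)
```

Switching patterns would change which sentences are attributed to each character, so the scores reported here would need to be recomputed.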

In [7]:
raw_myth_df = {
    "name": [],
    "status": [],
    "sentence": [],
    "positive": [],
    "neutral": [],
    "negative": [],
    "pos-neg": []
}

for each, each_df, status in zip([gods, giants, monsters], [gods_df, giants_df, monsters_df], ["God", "Jotunn", "Monster"]):
    for being in each:
        raw_myth_df["name"] += [being]*len(each_df[being]["scores"])
        raw_myth_df["status"] += [status]*len(each_df[being]["scores"])
        raw_myth_df["sentence"] += each_df[being]["sentences"]
        for score in each_df[being]["scores"]:
            raw_myth_df["positive"] += [score["pos"]]
            raw_myth_df["neutral"] += [score["neu"]]
            raw_myth_df["negative"] += [score["neg"]]
            raw_myth_df["pos-neg"] += [score["pos"]-score["neg"]]

In [8]:
myth_df = pd.DataFrame(raw_myth_df)

for column in ["positive", "neutral", "negative"]:
    column_z = "z_" + column
    myth_df[column_z] = (myth_df[column] - myth_df[column].mean())/myth_df[column].std(ddof=0)

myth_df.sample(n=10)

|  | name | status | sentence | positive | neutral | negative | pos-neg | z_positive | z_neutral | z_negative |
|---|---|---|---|---|---|---|---|---|---|---|
| 533 | Heimdall | God | For silence I pray all sacred children, great ... | 0.403 | 0.597 | 0.0 | 0.403 | 3.154494 | -1.791363 | -0.674669 |
| 257 | Thor | God | I trust," concluded Thridi, "that thou wilt no... | 0.173 | 0.76 | 0.067 | 0.106 | 0.929105 | -0.621283 | -0.083719 |
| 175 | Thor | God | Thor did not come, being in the East, but his ... | 0.194 | 0.806 | 0.0 | 0.194 | 1.132293 | -0.291076 | -0.674669 |
| 234 | Thor | God | Thor replied, that he would begin a drinking m... | 0.0 | 1.0 | 0.0 | 0.0 | -0.744774 | 1.101534 | -0.674669 |
| 351 | Frigga | God | She is entrusted with the toilette and slipper... | 0.237 | 0.763 | 0.0 | 0.237 | 1.548344 | -0.599748 | -0.674669 |
| 613 | Utgard-Loki | Jotunn | "Utgard-Loki then asked Thor in what feats he ... | 0.147 | 0.853 | 0.0 | 0.147 | 0.67754 | 0.046309 | -0.674669 |
| 188 | Thor | God | Seven half-years I with Thora stayed, Hakon's ... | 0.0 | 1.0 | 0.0 | 0.0 | -0.744774 | 1.101534 | -0.674669 |
| 528 | Baldr | God | He a hand will not wash, nor his head comb, er... | 0.152 | 0.762 | 0.086 | 0.066 | 0.725918 | -0.606926 | 0.083864 |
| 100 | Odin | God | First came Odin, accompanied by Frigga, the Va... | 0.0 | 1.0 | 0.0 | 0.0 | -0.744774 | 1.101534 | -0.674669 |
| 99 | Odin | God | As soon as she alighted, Odin ordered four Ber... | 0.0 | 1.0 | 0.0 | 0.0 | -0.744774 | 1.101534 | -0.674669 |
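The `z_` columns standardize each score as (value minus mean) divided by standard deviation, computed over all sentences pooled together and using the population standard deviation (`ddof=0`). The sketch below applies the same formula to three made-up values and contrasts it with pandas' default `ddof=1`; the numbers are illustrative only.

```python
import pandas as pd

# Made-up scores, chosen so the arithmetic is easy to follow
values = pd.Series([0.0, 0.2, 0.4])

# Population standard deviation, as in the cell above
z_population = (values - values.mean()) / values.std(ddof=0)

# pandas' default is the sample standard deviation (ddof=1)
z_sample = (values - values.mean()) / values.std()

print(z_population.round(4).tolist())
print(z_sample.round(4).tolist())
```

With `ddof=0` the standardized column has mean 0 and population standard deviation 1, so a positive z-score means a sentence scores above the corpus-wide average on that component.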


In [9]:
alignment_stats = pd.DataFrame({"name": gods+giants+monsters, 
                                "status": ["God"]*len(gods) + ["Jotunn"]*len(giants) + ["Monster"]*len(monsters)})
alignment_stats["avg positive"] = [np.mean(myth_df[myth_df["name"]==name]["positive"]) for name in alignment_stats["name"]]
alignment_stats["avg neutral"] = [np.mean(myth_df[myth_df["name"]==name]["neutral"]) for name in alignment_stats["name"]]
alignment_stats["avg negative"] = [np.mean(myth_df[myth_df["name"]==name]["negative"]) for name in alignment_stats["name"]]
alignment_stats["avg pos-neg"] = [np.mean(myth_df[myth_df["name"]==name]["pos-neg"]) for name in alignment_stats["name"]]
alignment_stats["avg z pos"] = [np.mean(myth_df[myth_df["name"]==name]["z_positive"]) for name in alignment_stats["name"]]
alignment_stats["avg z neu"] = [np.mean(myth_df[myth_df["name"]==name]["z_neutral"]) for name in alignment_stats["name"]]
alignment_stats["avg z neg"] = [np.mean(myth_df[myth_df["name"]==name]["z_negative"]) for name in alignment_stats["name"]]
alignment_stats = alignment_stats.sort_values(by="name").reset_index(drop=True)

In [10]:
alignment_stats

|  | name | status | avg positive | avg neutral | avg negative | avg pos-neg | avg z pos | avg z neu | avg z neg |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Baldr | God | 0.105571 | 0.799357 | 0.095071 | 0.0105 | 0.276693 | -0.338761 | 0.163876 |
| 1 | Bragi | God | 0.1326 | 0.82352 | 0.04388 | 0.08872 | 0.538211 | -0.165311 | -0.287641 |
| 2 | Fenrir | Monster | 0.094556 | 0.795944 | 0.109611 | -0.015056 | 0.170108 | -0.363259 | 0.292118 |
| 3 | Frey | God | 0.090294 | 0.82675 | 0.082941 | 0.007353 | 0.128876 | -0.142124 | 0.056885 |
| 4 | Freyja | God | 0.087686 | 0.815 | 0.097286 | -0.0096 | 0.103638 | -0.226471 | 0.183406 |
| 5 | Frigga | God | 0.104765 | 0.858118 | 0.037176 | 0.067588 | 0.268888 | 0.083045 | -0.346767 |
| 6 | Hati | Monster | 0.039333 | 0.813 | 0.147833 | -0.1085 | -0.3642 | -0.240827 | 0.629244 |
| 7 | Heimdall | God | 0.136417 | 0.783833 | 0.07975 | 0.056667 | 0.575139 | -0.450198 | 0.028738 |
| 8 | Hel | God | 0.041431 | 0.876745 | 0.081804 | -0.040373 | -0.343901 | 0.216761 | 0.046854 |
| 9 | Idunn | God | 0.1019 | 0.8361 | 0.0621 | 0.0398 | 0.24117 | -0.075006 | -0.126937 |
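The per-character averages are built one column at a time with a list comprehension. As a cross-check, the same means can be obtained with a single `groupby`. The sketch below compares the two approaches on a toy frame whose values are invented; only the column names follow `myth_df`.

```python
import numpy as np
import pandas as pd

# Invented sentence-level scores for two characters
toy = pd.DataFrame({
    "name": ["Odin", "Odin", "Thor"],
    "positive": [0.1, 0.3, 0.2],
    "negative": [0.0, 0.2, 0.4],
})
names = ["Odin", "Thor"]

# Same approach as the cell above: filter by name, then average
by_loop = [np.mean(toy[toy["name"] == name]["positive"]) for name in names]

# One grouped aggregation covering every score column at once
by_group = toy.groupby("name")[["positive", "negative"]].mean()

print(by_loop)
print(by_group)
```

`groupby` sorts its keys by default, which lines up with the `sort_values(by="name")` step. One difference to keep in mind is that a name with no matching sentences would simply be absent from the grouped result.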


In [11]:
alignments = alignment_stats[["name", "status"]].copy()
alignments["tmp"] = list(range(len(alignments)))
wikia_df["tmp"] = list(range(len(wikia_df)))
alignments = alignments.merge(wikia_df, on="tmp")
wikia_df = wikia_df.drop("tmp", axis=1)
alignments = alignments.drop("tmp", axis=1)
alignments["alignment, z"] = alignment_stats[["avg z pos", "avg z neu", "avg z neg"]].idxmax(axis=1).replace({"avg z pos": "Good", "avg z neu": "Neutral", "avg z neg": "Bad"})
alignments["alignment, avg"] = ["Good" if score > 0 else "Bad" for score in alignment_stats["avg pos-neg"].tolist()]
alignments

|  | name | status | name, Marvel | alignment, Marvel | alignment, z | alignment, avg |
|---|---|---|---|---|---|---|
| 0 | Baldr | God | Balder Odinson | Good | Good | Good |
| 1 | Bragi | God | Bragi | Good | Good | Good |
| 2 | Fenrir | Monster | Fenris Wolf | Bad | Bad | Bad |
| 3 | Frey | God | Frey | Good | Good | Good |
| 4 | Freyja | God | Freya | Good | Bad | Bad |
| 5 | Frigga | God | Frigga | Good | Good | Good |
| 6 | Hati | Monster | Hati | Bad | Bad | Bad |
| 7 | Heimdall | God | Heimdall | Good | Good | Good |
| 8 | Hel | God | Hela | Bad | Neutral | Bad |
| 9 | Idunn | God | Idunn | Good | Good | Good |
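The merge in cell 11 pairs the two tables by row position after each has been sorted alphabetically, which works only while both tables list the same characters in the same order. A more defensive option is to join on an explicit name mapping. The sketch below does that for three rows copied from the `alignment_stats` output above; the dictionary `edda_to_marvel` and the frame names are hypothetical helpers introduced for this illustration.

```python
import pandas as pd

# Three rows copied from the alignment_stats output above
edda_stats = pd.DataFrame({
    "name": ["Baldr", "Hati", "Hel"],
    "avg z pos": [0.276693, -0.364200, -0.343901],
    "avg z neu": [-0.338761, -0.240827, 0.216761],
    "avg z neg": [0.163876, 0.629244, 0.046854],
})
# Marvel rows, deliberately listed in a different order
marvel = pd.DataFrame({
    "name, Marvel": ["Hela", "Balder Odinson", "Hati"],
    "alignment, Marvel": ["Bad", "Good", "Bad"],
})
# Hypothetical lookup from the Edda spelling to the Marvel spelling
edda_to_marvel = {"Baldr": "Balder Odinson", "Hati": "Hati", "Hel": "Hela"}

# Label each character by whichever average z-score is largest
labels = {"avg z pos": "Good", "avg z neu": "Neutral", "avg z neg": "Bad"}
edda_stats["alignment, z"] = edda_stats[list(labels)].idxmax(axis=1).map(labels)

# Join on the mapped name instead of on row position
edda_stats["name, Marvel"] = edda_stats["name"].map(edda_to_marvel)
joined = edda_stats.merge(marvel, on="name, Marvel", how="left", validate="one_to_one")
result = joined[["name", "name, Marvel", "alignment, Marvel", "alignment, z"]]
print(result)
```

With `validate="one_to_one"` pandas raises an error if a name appears more than once on either side, so a mismatch surfaces immediately instead of silently shifting rows.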
