# Moral Foundations Analysis

This case study is an alternative for students who were unable to get approval from Twitter to access the Twitter API.

About 900 tweets are provided for the analysis instead of being retrieved in real time.

## Moral Foundations Theory

<img align="right" style="padding-left:10px; height: 55%; width: 55%" src="figures/political-camps-moral-foundations.png" >

The second part of this case study is Moral Foundations Theory. Google the phrase or visit [MoralFoundations.org](https://moralfoundations.org/) to learn more about the theory (or watch [Jonathan Haidt's TED Talk](https://www.ted.com/talks/jonathan_haidt_the_moral_roots_of_liberals_and_conservatives/)).

For the purpose of this exercise, you needn't learn much about the theory: we will use the Moral Foundations team's list of words that connote different dimensions of morality.

* **Care** connotes safety, peace, compassion, etc.
* **Harm**, Care's opposite, connotes war, fight, hurt, kill, suffer, etc.

In the dictionary, these two opposites are called `HarmVirtue` and `HarmVice`, respectively, and the other moral dimensions follow the same Virtue/Vice naming. The categories and the words that belong to each are available online in a [Moral Foundations Dictionary](https://moralfoundations.org/wp-content/uploads/files/downloads/moral%20foundations%20dictionary.dic). Many dictionary entries end in a wildcard &ast;. For example, `peace*` could match peace, peaceful, peacefully, even peacenik!
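The wildcard rule can be made concrete with a minimal sketch that tests one dictionary entry against one word. The helper name `entry_matches` is hypothetical; it is not part of the dictionary or of the course code. It assumes that a trailing `*` means "any continuation, including none" and that entries without `*` must match exactly.

```python
def entry_matches(entry: str, word: str) -> bool:
    """Return True if `word` is covered by the dictionary `entry`.

    Hypothetical helper: an entry ending in '*' matches any word that starts
    with the text before the '*'; any other entry must equal the word exactly.
    """
    if entry.endswith('*'):
        return word.startswith(entry[:-1])
    return word == entry


print(entry_matches('peace*', 'peacenik'))   # True
print(entry_matches('peace*', 'peace'))      # True
print(entry_matches('care', 'careless'))     # False: no wildcard, so exact match only
```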

To retrieve the Moral Foundations Dictionary we shall use the `requests` library, documentation available [here](https://requests.readthedocs.io/en/master/).

The tweets are available in `tweets.txt`.

In [None]:
%more tweets.txt

In [None]:
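import requests

url = "https://moralfoundations.org/wp-content/uploads/files/downloads/moral%20foundations%20dictionary.dic"
r = requests.get(url)
r.raise_for_status()      # stop here if the download failed
content = r.text
lines = content.split('\n')
groups = {}               # category code -> category name, e.g. '01' -> 'HarmVirtue'
codes = {}                # dictionary entry -> comma-separated category codes
reading_groups = False    # True while inside the '%'-delimited block of categories

for line in lines:
    line_ = ' '.join(line.split())   # normalize whitespace
    if not line_: continue           # skip blank lines
    if line_.startswith('%'):
        reading_groups = not reading_groups   # each '%' line opens or closes the category block
    else:
        line__ = line.split()
        if reading_groups:
            groups[line__[0]] = line__[1]
        else:
            codes[line__[0]] = ','.join(line__[1:])
# print(groups)
# print(codes)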
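The loop above relies on the layout of the `.dic` file, as implied by the parsing code. The file has a block of category lines (code, then name) between two lines that start with `%`. After that comes one line per word with its category codes. The sketch below wraps the same rules in a hypothetical `parse_dic` function and runs it on a tiny hand-written sample. The sample is an assumption that mimics the layout; it is not the real file. This lets you check the parser without a network request. The results go into `sample_groups` and `sample_codes` so that the real `groups` and `codes` are not overwritten.

```python
def parse_dic(text):
    """Parse dictionary text into (groups, codes) using the same rules as the cell above.

    groups: category code -> category name, e.g. '01' -> 'HarmVirtue'
    codes:  dictionary entry -> comma-separated category codes
    """
    groups, codes = {}, {}
    reading_groups = False
    for line in text.split('\n'):
        fields = line.split()
        if not fields:
            continue                                  # skip blank lines
        if fields[0].startswith('%'):
            reading_groups = not reading_groups       # each '%' line toggles the mode
        elif reading_groups:
            groups[fields[0]] = fields[1]
        else:
            codes[fields[0]] = ','.join(fields[1:])
    return groups, codes


# Hypothetical sample in the assumed .dic layout (not the real dictionary file).
sample = "%\n01\tHarmVirtue\n02\tHarmVice\n%\nsafe*\t01\nhurt*\t02\n"
sample_groups, sample_codes = parse_dic(sample)
print(sample_groups)   # {'01': 'HarmVirtue', '02': 'HarmVice'}
print(sample_codes)    # {'safe*': '01', 'hurt*': '02'}
```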

### Code Design

The obvious design is a dictionary that maps each word &rArr; category; we could then look up each word of an incoming tweet. _But the &ast;s throw a wrench into this idea!_ How would we categorize "He came peacefully" when a dictionary key `peace*` doesn't match the word "peacefully"?

And if a dictionary lookup won't solve the problem, won't comparing each tweet word against every dictionary entry be too slow?
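For comparison, the slow alternative that the question hints at can be sketched as a plain linear scan. Every tweet word is checked against every dictionary entry. The function `find_categories_linear` and the sample entries are hypothetical illustrations, not course code. The scan is correct, but for each word its cost grows with the size of the whole dictionary.

```python
def find_categories_linear(word, entries):
    """Scan every (entry, codes) pair and return the codes of the first entry that matches.

    `entries` maps dictionary entries such as 'peace*' to comma-separated codes;
    a trailing '*' is treated as a prefix wildcard.
    """
    for entry, code_str in entries.items():
        if entry.endswith('*'):
            hit = word.startswith(entry[:-1])
        else:
            hit = word == entry
        if hit:
            return code_str.split(',')
    return []


scan_entries = {'safe*': '01', 'peace*': '01', 'hurt*': '02'}
print('peacefully' in scan_entries)                          # False: a plain dict lookup misses it
print(find_categories_linear('peacefully', scan_entries))    # ['01']
```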

Our design combines two strategies: a dictionary keyed on the first three letters of each entry, and a (relatively short) list under each key that can be scanned quickly. For example, one of the longest lists will be "dis": [discriminat&ast;, disproportion&ast;, dishonest, dissociate, disloyal&ast;, dissent&ast;, disrespect&ast;, disobe&ast;, dissident, disgust&ast;, disease&ast;]. A convenient Python class for this is `collections.defaultdict`, shown [here by example](https://docs.python.org/3.8/library/collections.html#defaultdict-examples).
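Here is a minimal sketch of the bucketing idea, using a few entries and codes copied from the example output in TO-DO 1. The function name `build_prefix_index` is hypothetical. It groups entries by their first three characters, so a lookup only has to scan one short bucket. Each bucket is a list of one-entry dicts, the structure that TO-DO 1 asks for.

```python
from collections import defaultdict


def build_prefix_index(entries):
    """Group dictionary entries by their first three characters.

    `entries` maps an entry (e.g. 'communal' or 'compassion*') to its
    comma-separated codes; each bucket is a list of one-entry dicts.
    """
    index = defaultdict(list)          # a missing key starts as an empty list
    for entry, code_str in entries.items():
        index[entry[:3]].append({entry: code_str})
    return index


# Codes copied from the TO-DO 1 example output.
sample_entries = {'safe*': '01', 'compassion*': '01', 'communal': '05', 'comply': '07'}
prefix_index = build_prefix_index(sample_entries)
print(prefix_index['com'])    # [{'compassion*': '01'}, {'communal': '05'}, {'comply': '07'}]
print('dis' in prefix_index)  # False: a membership test does not create a bucket
```

Testing membership with `in` before indexing matters here. Indexing a `defaultdict` with a missing key, as in `prefix_index['xyz']`, silently inserts an empty list.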

### TO-DO 1

Complete in the <span style="color:teal"> _# Fill in_ </span> regions of the code below so as to initialize a `defaultdict` that looks like this:
```
defaultdict(<class 'list'>, {'saf': [{'safe*': '01'}], 'pea': [{'peace*': '01'}], 'com': [{'compassion*': '01'}, {'communal': '05'}, {'commune*': '05'}, {'communit*': '05'}, {'communis*': '05'}, {'comrad*': '05'}, {'complian*': '07'}, {'command': '07'}, {'comply': '07'}, {'commendable': '11'}], 'emp': [{'empath*': '01'}], 'sym': [{'sympath*': '01'}], 'car': [{'care': '01'}, {'caring': '01'}], 'pro': [{'protect*': '01'}, {'protest': '08'}, {'profan*': '10'}, {'promiscu*': '10'}, {'prostitut*': '10'}, {'profligate': '10'}, {'proper': '11'}] ... etc ...
```

In [None]:
from collections import defaultdict
groups = {}
codes = defaultdict(list)
reading_groups = False

for line in lines:
    line_ = ' '.join(line.split())
    if not line_: continue
    if line_.startswith('%'):
        pass  # Fill in
    else:
        line__ = line.split()
        if reading_groups:
            pass  # Fill in
        else:
            # We have to iterate through the words in a tweet.
            # To allow for fast lookups, we make a dictionary with 3-letter keys.
            # Ref: https://docs.python.org/3.3/library/collections.html#defaultdict-examples
            codes[line__[0][0:3]].append({line__[0]: ','.join(line__[1:])})
print(codes)

Verify that the key-value pair "dis": [discriminat&ast;, disproportion&ast;, dishonest, dissociate, disloyal&ast;, dissent&ast;, disrespect&ast;, disobe&ast;, dissident, disgust&ast;, disease&ast;] is part of the `defaultdict`.
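Each bucket holds one-entry dicts, so one way to check it is to pull out the entry names. The sketch below shows the idea on a hypothetical two-item bucket. The function `bucket_entries` is not part of the course code, and the `'??'` codes are placeholders, not real category codes. In the notebook you would apply the same function to `codes['dis']`.

```python
def bucket_entries(bucket):
    """Return the dictionary entries stored in a bucket of one-entry dicts."""
    return [entry for item in bucket for entry in item]   # iterating a dict yields its keys


# Hypothetical bucket; the '??' codes are placeholders.
sample_bucket = [{'discriminat*': '??'}, {'dishonest': '??'}]
print(bucket_entries(sample_bucket))                                   # ['discriminat*', 'dishonest']
print({'discriminat*', 'dishonest'} <= set(bucket_entries(sample_bucket)))   # True
```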

## Write a function for finding categories of a word

We want a function `find_word_categories(word)` such that the **assert** statements at the end of the next cell all pass.

In [None]:
def find_word_categories(word):
    """Return the category codes of the first dictionary entry that matches `word`, else []."""
    # Test membership first: indexing a defaultdict would create an empty bucket
    if word[0:3] in codes:
        for match in codes[word[0:3]]:
            for mk in match:                      # each match is a one-entry dict
                if mk.endswith('*'):
                    prefix = mk[:-1]
                    # Wildcard entry: the word only has to start with the prefix
                    if word.startswith(prefix):
                        return match[mk].split(',')
                elif word == mk:
                    # Exact entry: the whole word must match
                    return match[mk].split(',')
    return []

assert (find_word_categories('hurt') == ['02'])
assert (find_word_categories('tradition') == ['07'])
assert (find_word_categories('disease') == ['10'])
assert (find_word_categories('@nytnational') == [])
assert (find_word_categories('preserve') == ['01', '07', '09'])

### `analyze_tweet` function

A definition of `analyze_tweet` is provided below; you don't need to change it.

In [None]:
import re
import string   # needed for string.punctuation below

twitter_stream_table = str.maketrans(dict.fromkeys((string.punctuation + "…").replace('@','')))

def analyze_tweet(tweet_text):
    text_no_url = re.sub(r'\shttps?:\/\/.*[\r\n]*', '', tweet_text, flags=re.MULTILINE)
    stripped_text = text_no_url.translate(twitter_stream_table)
    tw_words = stripped_text.lower().split(' ')
    categories = []
    for tw_word in tw_words:
        cats = find_word_categories(tw_word)
        for cat in cats:
            categories.append({groups[cat]: tw_word})
    return categories

cats = analyze_tweet("A tradition can be many things — for some, it's food. For others, it's faith. For many, family. ")
assert ({'AuthorityVirtue': 'tradition'} in cats)
assert ({'IngroupVirtue': 'family'} in cats)
assert (2 == len(cats))
cats2 = analyze_tweet("Peloton's CEO wouldn't talk about the company's ad fiasco during a speech on Monday, instead discussing Maslow's hi… https://t.co/8foz5Cmig6")
assert (0 == len(cats2))

### TO-DO 2 

Read the provided tweets in `tweets.txt` and process them through the `analyze_tweet` function, printing only the tweets that mention a moral foundation we recognize. For example, a tweet <span style="color:orange"><em>At least 14 people were shot dead in an attack on a church in eastern Burkina Faso on Sunday morning, the governmen… https://t.co/eKBoPEZZUP</em></span> might be printed along with its analysis: [{'HarmVice': 'attack'}, {'PurityVirtue': 'church'}]. However, the tweet <span style="color:orange"><em>Peloton's CEO wouldn't talk about the company's ad fiasco during a speech on Monday, instead discussing Maslow's hi… https://t.co/8foz5Cmig6</em></span> shouldn't be printed at all. Each printed tweet should appear like so:
```
-------------tweet-------------
At least 14 people were shot dead in an attack on a church in eastern Burkina Faso on Sunday morning, the governmen… https://t.co/eKBoPEZZUP
[{'HarmVice': 'attack'}, {'PurityVirtue': 'church'}]
```

### TO-DO 3

Two of the lines in `analyze_tweet` above are somewhat mysterious. Please explain what they do.
```
    text_no_url = re.sub(r'\shttps?:\/\/.*[\r\n]*', '', tweet_text, flags=re.MULTILINE)
    stripped_text = text_no_url.translate(twitter_stream_table)
```

## Last Question:

In <em>your opinion</em>, how well does the categorization track your sense of the moral values expressed in the tweets? Do you see any blatant biases?

_There is no right or wrong answer to this question. Your response will be judged by how well-reasoned it is._

## Follow Ups?

It would be cool to plot a graph of the (Virtue - Vice) values of the five foundations expressed in these tweets. 
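A possible first step toward that plot is to tally, per foundation, the number of Virtue hits minus the number of Vice hits. The sketch below makes three assumptions. The analyses are lists of one-entry dicts like those returned by `analyze_tweet`. Every category name of interest ends in `Virtue` or `Vice`, and names with neither suffix are skipped. The function `virtue_minus_vice` and the sample data are hypothetical.

```python
from collections import Counter


def virtue_minus_vice(analyses):
    """Return {foundation: virtue_count - vice_count} over a list of tweet analyses.

    Each analysis is a list of one-entry dicts such as {'HarmVice': 'attack'}.
    """
    scores = Counter()
    for analysis in analyses:
        for hit in analysis:
            for category in hit:
                if category.endswith('Virtue'):
                    scores[category[:-len('Virtue')]] += 1
                elif category.endswith('Vice'):
                    scores[category[:-len('Vice')]] -= 1
    return dict(scores)


sample_analyses = [
    [{'HarmVice': 'attack'}, {'PurityVirtue': 'church'}],
    [{'AuthorityVirtue': 'tradition'}, {'IngroupVirtue': 'family'}],
]
print(virtue_minus_vice(sample_analyses))
# {'Harm': -1, 'Purity': 1, 'Authority': 1, 'Ingroup': 1}
```

The resulting dict could then be passed to a bar chart, for example `matplotlib.pyplot.bar(list(scores.keys()), list(scores.values()))`, if matplotlib is available.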

The New York Times is reputed to be a center-left news source. How do the foundations expressed by this source compare with those expressed by other sources?

In case you were wondering, there are [serious objections](https://behavioralscientist.org/whats-wrong-with-moral-foundations-theory-and-how-to-get-moral-psychology-right/) to Moral Foundations Theory, which you may or may not find persuasive.