# Random Acts of Pizza (RAOP) Notes

**Source**: Althoff, T., Danescu-Niculescu-Mizil, C., & Jurafsky, D. (2014). *How to Ask for a Favor: A Case Study on the Success of Altruistic Requests*. Association for the Advancement of Artificial
Intelligence (www.aaai.org).

- "The community only publishes which users have given or received pizzas but not which requests were successful. 
In the case of successful users posting multiple times it is unclear which of the requests was actually successful. 
Therefore, we restrict our analysis to users with a single request for which we can be certain whether or not 
it was successful, leaving us with 5728 pizza requests. We split this dataset into development (70%) and test set (30%) 
such that both sets mirror the average success rate in our dataset of 24.6%. All features are developed on the 
development test only while the test set is used only once to evaluate the prediction accuracy of our proposed model on held-out data. For a small number of requests (379) we further observe the identity of the benefactor through a 
'thank you' post by the beneficiary after the successful request. This enables us to reason about the impact of 
user similarity on giving."


- "It is extremely difficult to disentangle the effects of all these factors in determining what makes people satisfy requests, and what makes them select some requests over others. . . In this paper, we develop a framework for controlling for each of these potential confounds while studying the role of two aspects that characterize compelling requests: **social factors** (who is asking and how the recipient is related to the donor and community) and **linguistic factors** (how they are asking and what linguistic devices accompany successful requests). With the notable exception of Mitra and Gilbert (2014), the effect of language on the success of requests has largely been ignored thus far."


- "[Their] goal is to understand what motivates people to give when they do not receive anything tangible in return. That is, [they] focus on the important special case of altruistic requests in which the giver receives no rewards." **DSC**: But how do you know people don't want something in return, especially if they are more likely to help requesters who have high status or are more similar to them?

-----

Temporal Factors
- Specific months
- Weekdays
- **Days of the month (first half of the month)**
- Hour of the day
- **Community age of the request (earlier the better)**
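
The temporal factors above can all be derived from a single request timestamp. The following is a minimal sketch in Python 3 (the notebook cells further down use Python 2 print statements). It assumes each request carries a Unix timestamp in seconds (UTC). The function name `temporal_features`, the cutoff of day 15 for "first half of the month", and both example dates are assumptions made for illustration, not values taken from the paper.

```python
from datetime import datetime, timezone

def temporal_features(request_ts, community_start_ts):
    """Derive the temporal factors from Unix timestamps given in seconds (UTC)."""
    when = datetime.fromtimestamp(request_ts, tz=timezone.utc)
    return {
        "month": when.month,                    # 1-12
        "weekday": when.weekday(),              # 0 = Monday ... 6 = Sunday
        "first_half_of_month": when.day <= 15,  # assumed cutoff: days 1-15
        "hour": when.hour,                      # 0-23, in UTC
        # Community age: days between the assumed community start and the request.
        "community_age_days": (request_ts - community_start_ts) / 86400.0,
    }

# Hypothetical timestamps, for illustration only.
community_start = datetime(2010, 12, 8, tzinfo=timezone.utc).timestamp()
request = datetime(2011, 6, 3, 18, 30, tzinfo=timezone.utc).timestamp()
features = temporal_features(request, community_start)
print(features)
```

Categorical values such as month or weekday would normally be one-hot encoded before they are passed to a linear model.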

Textual Factors
- Politeness (e.g., **gratitude**)
- **Evidentiality** (2nd largest parameter estimate)
- Reciprocity (respond to a positive action with another positive action, **pay it forward**)
- Sentiment (e.g., **urgency**)
- **Length**
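
Several of the textual factors can be approximated with simple pattern matching. The sketch below (Python 3) flags gratitude, reciprocity, and evidentiality, and counts words for length. The regular expressions are illustrative assumptions, not the lexicons or classifiers used in the paper, and treating an image link as evidence is likewise an assumption of this sketch. Politeness and sentiment need dedicated models and are left out.

```python
import re

# Illustrative patterns only; they are assumptions, not the paper's lexicons.
GRATITUDE = re.compile(r"\b(thank(s| you)?|grateful|appreciate[ds]?)\b", re.I)
RECIPROCITY = re.compile(
    r"\b(pay (it )?forward|return the favou?r|pay (it )?back)\b", re.I
)
IMAGE_LINK = re.compile(r"https?://\S+\.(jpe?g|png|gif)\b|imgur\.com", re.I)

def textual_features(text):
    """Return simple indicator features plus the word count of a request."""
    return {
        "gratitude": bool(GRATITUDE.search(text)),
        "reciprocity": bool(RECIPROCITY.search(text)),
        "evidentiality": bool(IMAGE_LINK.search(text)),  # link to supporting image
        "length": len(text.split()),                     # whitespace-separated words
    }

# Hypothetical request texts.
rich = textual_features(
    "Lost my job last week. Here is my empty fridge: http://imgur.com/abc123 "
    "I will pay it forward once I get paid. Thanks!"
)
plain = textual_features("Hungry, would love a pizza.")
print(rich)
print(plain)
```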

Social Factors
- **Status**
    - Karma points (up-votes minus down-votes) that Reddit counts on link submissions and comments.
    - Whether the user has posted on RAOP before and thus could be considered a member of the sub-community.
    - **User account age, based on the hypothesis that “younger” accounts might be less trusted.**
- Similarity: intersection size between the sets of subreddits of the giver and the receiver, and the Jaccard similarity (intersection over union) of the two sets. NOT included in the logistic regression model.
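
Both similarity measures reduce to set operations. For two sets $A$ and $B$, the intersection size is $|A \cap B|$ and the Jaccard similarity is $|A \cap B| / |A \cup B|$. The sketch below (Python 3) takes the sets to be the subreddits each user has been active in, as in the dataset's `requester_subreddits_at_request` column. The function name and the example subreddit lists are hypothetical.

```python
def similarity(giver_subreddits, receiver_subreddits):
    """Return (intersection size, Jaccard similarity) for two collections."""
    giver, receiver = set(giver_subreddits), set(receiver_subreddits)
    shared = giver & receiver
    union = giver | receiver
    # Define the Jaccard similarity of two empty sets as 0.0 to avoid dividing by zero.
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard

# Hypothetical subreddit lists: 2 shared subreddits out of 5 distinct ones.
size, jaccard = similarity(
    ["pics", "AskReddit", "Random_Acts_Of_Pizza"],
    ["AskReddit", "Random_Acts_Of_Pizza", "gaming", "funny"],
)
print(size, jaccard)
```

The giver is only known for the small subset of requests with a "thank you" post, so these values are undefined for most rows.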

Narratives (identified through topic modeling)
- **Desire**
- **Family**
- **Job**
- **Money**
- Student
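
Once each narrative is represented by a word list (the notebook below refers to files such as `./narratives/desire.txt`), a request can be scored against every narrative. The sketch below (Python 3) uses the fraction of tokens that fall in each list. That scoring rule, the tokenizer, and the tiny lexicons are assumptions for illustration; the real word lists come out of the topic modeling step.

```python
import re

def narrative_scores(text, lexicons):
    """Return, per narrative, the fraction of tokens found in its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in lexicons}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in lexicons.items()
    }

# Hypothetical mini-lexicons standing in for the narrative word lists.
lexicons = {
    "job": {"job", "unemployed", "work", "fired", "paycheck"},
    "family": {"wife", "husband", "kids", "son", "daughter", "family"},
}
scores = narrative_scores("I lost my job and my kids are hungry", lexicons)
print(scores)
```

Normalizing by the token count keeps long requests from scoring high on every narrative simply because they contain more words.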

-----

Conclusion
- Drawing from social psychology literature [they] extract high-level social features from text that operationalize the relation between recipient and donor and demonstrate that these extracted relations are predictive of success. 
- [They] show that [they] can detect key narratives automatically that have significant impact on the success of the request. 
- [They] further demonstrate that linguistic indications of gratitude, evidentiality, and reciprocity, as well as the high status of the asker, all increase the likelihood of success, while neither politeness nor positive sentiment seem to be associated with success in [the] setting.

Limitations
- A shortcoming of any case study is that findings might be specific to the scenario at hand. While [they] have shown that particular linguistic and social factors differentiate between successful and unsuccessful requests [they] cannot claim a causal relationship between the proposed factors and success that would guarantee success. 
- Furthermore, the set of success factors studied in this work is likely to be incomplete as well and excludes,
for instance, group behavior dynamics. 
- Despite these limitations, [they] hope that this work and the data [they] make available will provide a basis for further research on success factors and helping behavior in other online communities.

-----

In [63]:
# This tells matplotlib not to try opening a new window for each plot.
%matplotlib inline

# General libraries.
import codecs
import json
import csv

import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# SK-learn libraries for learning.
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20

# SK-learn libraries for evaluation.
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import classification_report

# SK-learn library for importing the newsgroup data.
from sklearn.datasets import fetch_20newsgroups

# SK-learn libraries for feature extraction from text.
from sklearn.feature_extraction.text import *

In [64]:
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
# Convert a JSON string to pandas object

X = pd.read_json('./pizza_request_dataset.json')
#print X.head()
#print X.describe()
#print

'''
shuffle = np.random.permutation(np.arange(X.shape[0]))

print shuffle.max()

print X['giver_username_if_known'][:2]
X = X.sample(frac=1) #.reset_index(drop=True)
print X['giver_username_if_known'][:2]

#X, Y = X[shuffle], Y[shuffle]
'''

np.random.seed(0)
msk = np.random.rand(len(X)) <= 0.7
# Copy the slices so the column deletions below act on independent frames.
X_train_data = X[msk].copy()
X_dev_data = X[~msk].copy()

Y_train_labels = X_train_data[["requester_received_pizza"]]
#Y.describe()

del X_train_data["requester_received_pizza"]

print list(Y_train_labels)
print Y_train_labels.shape
print
print list(X_train_data)
print X_train_data.shape
print

Y_dev_labels = X_dev_data[["requester_received_pizza"]]
#Y.describe()

del X_dev_data["requester_received_pizza"]

print list(Y_dev_labels)
print Y_dev_labels.shape
print
print list(X_dev_data)
print X_dev_data.shape
print

print np.mean(Y_train_labels)
print
print np.mean(Y_dev_labels)
print

[u'requester_received_pizza']
(3975, 1)

[u'giver_username_if_known', u'in_test_set', u'number_of_downvotes_of_request_at_retrieval', u'number_of_upvotes_of_request_at_retrieval', u'post_was_edited', u'request_id', u'request_number_of_comments_at_retrieval', u'request_text', u'request_text_edit_aware', u'request_title', u'requester_account_age_in_days_at_request', u'requester_account_age_in_days_at_retrieval', u'requester_days_since_first_post_on_raop_at_request', u'requester_days_since_first_post_on_raop_at_retrieval', u'requester_number_of_comments_at_request', u'requester_number_of_comments_at_retrieval', u'requester_number_of_comments_in_raop_at_request', u'requester_number_of_comments_in_raop_at_retrieval', u'requester_number_of_posts_at_request', u'requester_number_of_posts_at_retrieval', u'requester_number_of_posts_on_raop_at_request', u'requester_number_of_posts_on_raop_at_retrieval', u'requester_number_of_subreddits_at_request', u'requester_subreddits_at_request', u'requester_
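
The mask-based split above is random but not stratified, whereas the paper describes a 70/30 split in which both sets mirror the overall success rate. The sketch below (Python 3, standard library only) shows one way to get such a split by sampling within each class. The function name `stratified_split` and the toy labels are hypothetical, and nothing here is claimed to be the procedure the authors used.

```python
import random
from collections import defaultdict

def stratified_split(labels, dev_fraction=0.3, seed=0):
    """Return (train_idx, dev_idx) with the same class balance in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, dev_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the held-out part.
        n_dev = int(round(len(indices) * dev_fraction))
        dev_idx.extend(indices[:n_dev])
        train_idx.extend(indices[n_dev:])
    return sorted(train_idx), sorted(dev_idx)

# Toy labels with a 25% success rate (hypothetical data).
labels = [True] * 40 + [False] * 120
train_idx, dev_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
dev_rate = sum(labels[i] for i in dev_idx) / len(dev_idx)
print(len(train_idx), len(dev_idx), train_rate, dev_rate)
```

With scikit-learn available, `train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)` from `sklearn.model_selection` gives a comparable split in one call. The dataset also has an `in_test_set` column, which presumably records a predefined split and may be the simpler choice.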

In [65]:
'''
def read_dataset(path):
    with codecs.open(path, 'r', 'utf-8') as myFile:
        content = myFile.read()
    dataset = json.loads(content)
    return dataset

if __name__ == '__main__':
    path = './pizza_request_dataset.json'
    dataset = read_dataset(path)
    
    print 'The dataset contains %d samples.' %(len(dataset))
    print
    print 'Available attributes: ', sorted(dataset[0].keys())
    print
    
    for i in range(3):
        print '---------'
        print 'Post:', i
        print '---------'
        print json.dumps(dataset[i], sort_keys=True, indent=2)
        print
    
    successes = [r['requester_received_pizza'] for r in dataset]
    success_rate = 100.0 * sum(successes) / float(len(successes))
    print 'The average success rate is: %.2f%%' %(success_rate)
    print

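def read_dataset_pd(path):
    # Load the JSON records and flatten them into a DataFrame.
    with codecs.open(path, 'r', 'utf-8') as myFile:
        content = myFile.read()

    # content already holds the file's text, so parse it with json.loads.
    json_data = json.loads(content)
    # pandas is imported as pd; pandas >= 1.0 also exposes this as pd.json_normalize.
    df = pd.io.json.json_normalize(json_data)
    print df.head()
    return df

read_dataset_pd(path)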

#read_dataset_pd('./pizza_request_dataset.json')

def read_narrative(path):
    # Read a narrative word list (one term per line) and return it as a list.
    with open(path) as temp:
        vector = [word.rstrip() for word in temp]
    print vector
    print
    return vector

#desire = read_narrative('./narratives/desire.txt')
'''
