# Final Project

## I chose a Python project on GitHub: nprapps/worldvalues

https://github.com/nprapps/worldvalues

## It's a data analysis of the World Values Survey.

# What is the World Values Survey?

The World Values Survey (www.worldvaluessurvey.org) is a global network of social scientists studying changing values and their impact on social and political life, led by an international team of scholars, with the WVS association and secretariat headquartered in Stockholm, Sweden. 

The survey started in 1981. The WVS consists of nationally representative surveys, all using a common questionnaire, conducted in almost 100 countries that together contain almost 90 percent of the world’s population.

The WVS is the largest non-commercial, cross-national, time-series investigation of human beliefs and values ever executed, currently including interviews with almost 400,000 respondents. Moreover, the WVS is the only academic study covering the full range of global variations, from very poor to very rich countries, in all of the world’s major cultural zones.

## What did NPR do with the data?

There are more than 200 questions in the questionnaire.

## And these are the questions NPR picked out for its analysis.

V12-V22. What qualities do you encourage in your children?

V45. When jobs are scarce, men should have more right to a job than women.

V47. If a woman earns more money than her husband, it's almost certain to cause problems.

V48. Having a job is the best way for a woman to be an independent person.

V50. When a mother works for pay, the children suffer.

V51. On the whole, men make better political leaders than women do.

V52. A university education is more important for a boy than for a girl.

V53. On the whole, men make better business executives than women do.

V54. Being a housewife is just as fulfilling as working for pay.

V80. I'm going to read out some problems. Please indicate which of the following problems you consider the most serious one for the world as a whole? (Discrimination against girls and women)

V123. I am going to name a number of organizations. For each one, could you tell me how much confidence you have in them: (Women's organizations)

V139. Please tell me for each of the following things how essential you think it is as a characteristic of democracy. (Women have the same rights as men.)

V168. Companies that employ young people perform better than those that employ people of different ages.

V182. To what degree are you worried about the following situations? (Not being able to give my children a good education)

V203A. Prostitution

V204. Abortion

V205. Divorce

V206. Sex before marriage

V207. Suicide

V208. For a man to beat his wife

V209. Parents beating children

V240. Sex of respondent

V241. Respondent's birth year.

V242. Age

V250. Do you live with your parents?

# What's the story that came out of it? 

NPR used the survey results for these questions to explore the lives of 15-year-old girls around the world.

http://www.npr.org/sections/goatsandsoda/2015/10/20/448407788/where-the-girls-are-and-aren-t-15girls


## Download the folder from NPR's repository
## Download and install everything as README.md says.
## After that, install three modules that are not mentioned in the requirements file:
    pip install dataset
    pip install psycopg2
    pip install csvkit
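
To confirm the three extra modules landed in the same Python environment that will run the scripts, the standard library's `importlib.util.find_spec` can check for them without importing anything. A minimal sketch; `missing_modules` is a hypothetical helper, and it assumes each package's import name matches its pip name, which holds for these three.

```python
import importlib.util

# Import names of the packages installed with pip above.
EXTRA_MODULES = ['dataset', 'psycopg2', 'csvkit']


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(EXTRA_MODULES)
print('Missing modules:', ', '.join(missing) if missing else 'none')
```

An empty result means `import dataset`, `import psycopg2` and `import csvkit` should all succeed.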
    
## Run ./process.sh

```bash
#!/bin/bash

echo "Import data"
./import.sh

echo "Summarize World Values"
./summarize_agreement.py
./summarize_questions.py > output/question_index.txt
```
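
process.sh runs its steps one after another but does not use `set -e`, so if ./import.sh fails, the two summary scripts still run against whatever is (or isn't) in the database. Adding `set -e` near the top of process.sh is the simplest fix. A Python sketch of the same pipeline that stops at the first failing step could look like this; `run_steps` and `PROCESS_STEPS` are hypothetical names, not part of NPR's repository.

```python
import subprocess


def run_steps(steps):
    """Run (command, output_path) steps in order, stopping at the first failure.

    output_path is None to print to the terminal, or a file path that receives
    the command's stdout (like `> output/question_index.txt` in process.sh).
    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit status, so no later step runs after a failure.
    """
    for command, output_path in steps:
        print('Running:', ' '.join(command), flush=True)
        if output_path is None:
            subprocess.run(command, check=True)
        else:
            with open(output_path, 'w') as out:
                subprocess.run(command, check=True, stdout=out)


# Hypothetical step list mirroring process.sh, meant to run from the repository root.
PROCESS_STEPS = [
    (['./import.sh'], None),
    (['./summarize_agreement.py'], None),
    (['./summarize_questions.py'], 'output/question_index.txt'),
]
```

Calling `run_steps(PROCESS_STEPS)` from the repository root should reproduce process.sh, except that a failed import stops the run instead of producing empty summaries.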


## ./summarize_agreement.py

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from collections import OrderedDict

import dataset

from db import query, initialize_counts, get_country_list


ANALYSIS_QUESTIONS = ['v52', 'v45', 'v51']
ANALYSIS_COUNTRIES = ['India', 'Pakistan', 'Nigeria', 'China', 'Brazil', 'United States']


def _get_counts(result, question_id):
    """Tally answers per country as {country: {response: count}}."""
    counts = OrderedDict()
    for row in result:
        if row['country'] not in counts:
            counts[row['country']] = initialize_counts(question_id)
        counts[row['country']][row['response']] += 1
    return counts


def process_mentioned(question, result, countries):
    """Store the share of respondents who answered 'Mentioned'."""
    counts = _get_counts(result, question['question_id'])
    key = '{0} {1} (% mentioned)'.format(question['question_id'], question['label'])

    # Give every country the column, even if it has no answers for this question.
    for country, data in countries.items():
        data[key] = None

    for country, results in counts.items():
        if country not in countries:
            continue

        total = sum(results.values())
        countries[country][key] = float(results['Mentioned']) / float(total)


def process_agree_3way(question, result, countries):
    """Store the share of respondents who answered 'Agree'."""
    counts = _get_counts(result, question['question_id'])
    key = '{0} {1} (% agree)'.format(question['question_id'], question['label'])

    for country, data in countries.items():
        data[key] = None

    for country, results in counts.items():
        if country not in countries:
            continue

        total = sum(results.values())
        countries[country][key] = float(results['Agree']) / float(total)


def process_agree_4way(question, result, countries):
    """Store the share of respondents who answered 'Agree' or 'Agree strongly'."""
    counts = _get_counts(result, question['question_id'])
    key = '{0} {1} (% agree strongly and agree)'.format(question['question_id'], question['label'])

    for country, data in countries.items():
        data[key] = None

    for country, results in counts.items():
        if country not in countries:
            continue

        total = sum(results.values())
        countries[country][key] = (float(results['Agree']) + float(results['Agree strongly'])) / float(total)


def process_likert(question, result, countries):
    """Store the share of favorable answers (labeled #5-#10 in the output)."""
    counts = _get_counts(result, question['question_id'])
    key = '{0} {1} (% favorable [#5-#10])'.format(question['question_id'], question['label'])

    for country, data in countries.items():
        data[key] = None

    for country, results in counts.items():
        if country not in countries:
            continue

        total = sum(results.values())

        # list() is required on Python 3, where dict views can't be sliced.
        # Which answers positions 5-9 hold depends on the key order that
        # initialize_counts() creates.
        favorable = sum(list(results.values())[5:10])

        countries[country][key] = float(favorable) / float(total)


def summarize_agreement():
    """
    Summarize agreement levels for ANALYSIS_QUESTIONS in ANALYSIS_COUNTRIES
    and write them to output/agreement_summary.csv.
    """
    country_list = get_country_list()
    countries = OrderedDict()
    for country in country_list:
        if country in ANALYSIS_COUNTRIES:
            countries[country] = OrderedDict((('country', country),))

    for question_id in ANALYSIS_QUESTIONS:
        question, result = query(question_id)

        # Each question type has its own definition of "agreement".
        if question['question_type'] == 'mentioned':
            process_mentioned(question, result, countries)
        elif question['question_type'] == 'agree_3way':
            process_agree_3way(question, result, countries)
        elif question['question_type'] == 'agree_4way':
            process_agree_4way(question, result, countries)
        elif question['question_type'] == 'likert':
            process_likert(question, result, countries)

    dataset.freeze(countries.values(), format='csv', filename='output/agreement_summary.csv')


if __name__ == '__main__':
    summarize_agreement()
```
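
Every process_* function in summarize_agreement.py boils down to the same arithmetic: add up the counts for the answer(s) of interest and divide by every answer recorded for that country. A standalone sketch of that calculation with made-up numbers, so it runs without the database; the helper `share` and the tally in `counts` are hypothetical, and only 'Agree' and 'Agree strongly' are response labels taken from the script.

```python
from collections import OrderedDict


def share(counts, responses):
    """Fraction of all answers in `counts` that fall into `responses`.

    Same arithmetic as process_agree_4way: (Agree + Agree strongly) / total.
    Returns None when there are no answers at all.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(counts.get(r, 0) for r in responses) / total


# Hypothetical tally for one country on a 4-way agree question.
counts = OrderedDict([
    ('Agree strongly', 30),
    ('Agree', 20),
    ('Disagree', 25),
    ('Strongly disagree', 15),
    ("Don't know", 10),
])
print(share(counts, ['Agree strongly', 'Agree']))  # 0.5
```

Because the denominator includes every recorded answer, any non-substantive categories in the tally pull the percentage down.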


### ./process.sh writes the summary output to the output directory: agreement_summary.csv from summarize_agreement.py and question_index.txt from summarize_questions.py.

## I used the Python library agate to look at the data summaries.

```python
import agate

agreement_summary = agate.Table.from_csv('output/agreement_summary.csv')
```


The loaded table's columns and their data types:

```
|------------------------------------------------------------------------------------------------------+------------|
|  column                                                                                              | data_type  |
|------------------------------------------------------------------------------------------------------+------------|
|  country                                                                                             | Text       |
|  v52 A university education is more important for a boy than for a girl (% agree strongly and agree) | Number     |
|  v45 When jobs are scarce, men should have more right to a job than women (% agree)                  | Number     |
|  v51 On the whole, men make better political leaders than women do (% agree strongly and agree)      | Number     |
|------------------------------------------------------------------------------------------------------+------------|
```
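
Those column names are long, which makes the order_by calls below hard to type. Since every summary column starts with its question ID, one option is to shorten the names to just that ID. A minimal sketch, assuming the '<question_id> <label> (<measure>)' pattern that summarize_agreement.py uses for its keys; `short_name` is a hypothetical helper.

```python
import re

# Summary columns look like 'v45 When jobs are scarce, ... (% agree)'.
QUESTION_ID = re.compile(r'^(v\d+[A-Za-z]?)\s')


def short_name(column):
    """Return the leading question ID (e.g. 'v45'), or the name unchanged."""
    match = QUESTION_ID.match(column)
    return match.group(1) if match else column


columns = [
    'country',
    'v52 A university education is more important for a boy than for a girl (% agree strongly and agree)',
    'v45 When jobs are scarce, men should have more right to a job than women (% agree)',
    'v51 On the whole, men make better political leaders than women do (% agree strongly and agree)',
]
print([short_name(c) for c in columns])  # ['country', 'v52', 'v45', 'v51']
```

In agate, passing the shortened names to `Table.rename`, e.g. `agreement_summary.rename(column_names=[short_name(c) for c in agreement_summary.column_names])`, should give a table that can be sorted with `order_by('v45', reverse=True)`.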

```python
ANALYSIS_COUNTRIES = ['India', 'Pakistan', 'Nigeria', 'China', 'Brazil', 'United States']
```

These are the countries where half the world's teens live.

```python
sorted_by_education = agreement_summary.order_by('v52 A university education is more important for a boy than for a girl (% agree strongly and agree)')
sorted_by_education.print_table(max_columns=2)
```

```
|----------------+----------------------+------|
|  country       | v52 A university ... | ...  |
|----------------+----------------------+------|
|  United States |  0.06720430107526881 | ...  |
|  Brazil        |  0.09286675639300135 | ...  |
|  China         |  0.22260869565217392 | ...  |
|  Nigeria       |  0.42296759522455940 | ...  |
|  Pakistan      |  0.50583333333333340 | ...  |
|  India         |  0.61986084756483240 | ...  |
|----------------+----------------------+------|
```

```python
sorted_by_job = agreement_summary.order_by('v45 When jobs are scarce, men should have more right to a job than women (% agree)', reverse=True)
sorted_by_job.print_table(max_columns=3)
```

```
|----------------+----------------------+----------------------+------|
|  country       | v52 A university ... | v45 When jobs are... | ...  |
|----------------+----------------------+----------------------+------|
|  Pakistan      |  0.50583333333333340 | 0.735833333333333300 | ...  |
|  Nigeria       |  0.42296759522455940 | 0.614553723706651500 | ...  |
|  India         |  0.61986084756483240 | 0.507273877292852700 | ...  |
|  China         |  0.22260869565217392 | 0.373043478260869600 | ...  |
|  Brazil        |  0.09286675639300135 | 0.160834454912516830 | ...  |
|  United States |  0.06720430107526881 | 0.056899641577060935 | ...  |
|----------------+----------------------+----------------------+------|
```

```python
sorted_by_political_leader = agreement_summary.order_by('v51 On the whole, men make better political leaders than women do (% agree strongly and agree)', reverse=True)
sorted_by_political_leader.print_table()
```

```
|----------------+----------------------+----------------------+-----------------------|
|  country       | v52 A university ... | v45 When jobs are... | v51 On the whole,...  |
|----------------+----------------------+----------------------+-----------------------|
|  Nigeria       |  0.42296759522455940 | 0.614553723706651500 |  0.74928936895963620  |
|  Pakistan      |  0.50583333333333340 | 0.735833333333333300 |  0.72250000000000000  |
|  India         |  0.61986084756483240 | 0.507273877292852700 |  0.62618595825426950  |
|  China         |  0.22260869565217392 | 0.373043478260869600 |  0.48565217391304350  |
|  Brazil        |  0.09286675639300135 | 0.160834454912516830 |  0.27052489905787347  |
|  United States |  0.06720430107526881 | 0.056899641577060935 |  0.18548387096774194  |
|----------------+----------------------+----------------------+-----------------------|
```

I couldn't figure out how to visualize the data with agate, but I think these summaries would make good bar-chart material.
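
As a starting point, a plain-text bar chart needs nothing beyond the standard library. This sketch uses the v52 shares from the tables above, rounded to four decimals; `text_bars` is a hypothetical helper. agate also provides `Table.print_bars`, which takes a label column name and a value column name and should print a similar chart straight from the table.

```python
def text_bars(rows, width=40):
    """Return one line per (label, share) pair, largest share first.

    `share` is a fraction between 0 and 1; a share of 1.0 draws `width` '#' characters.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    label_width = max(len(label) for label, _ in ordered)
    lines = []
    for label, value in ordered:
        bar = '#' * round(value * width)
        lines.append(f'{label:<{label_width}} {bar} {value:.0%}')
    return lines


# v52 (% agree strongly and agree), copied from the printed tables and rounded.
v52 = [
    ('United States', 0.0672),
    ('Brazil', 0.0929),
    ('China', 0.2226),
    ('Nigeria', 0.4230),
    ('Pakistan', 0.5058),
    ('India', 0.6199),
]
print('\n'.join(text_bars(v52)))
```

Swapping in the v45 or v51 column values gives the other two charts.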

## Inspiration: 

http://blog.apps.npr.org/2015/10/20/world-values-parser.html