# Lab 7 - Summarizing a Health Survey

## Background

The file `health_survey.csv` contains the responses to a series of health-related questions. Dr. Bergen, Director of the Statistical Consulting Center at WSU, needs you to prepare the attached data for analysis. Please perform the following steps to prepare the required CSV file.

Dr. Bergen had a follow-up meeting with his client, and it was determined that we need to redo the file construction from an earlier assignment. Recall that the file `health_survey.csv` contains the responses to a series of health-related questions. We need to code the responses as 1-5 using the definitions below. Some of the columns need reverse coding (see the *Needs Reverse Coding?* column in `ReverseCodingItems.csv`).

The following table describes the coding that should be used for both types of questions.

|Old Label                     |New Coded Value  |Reverse Coding
|------------------------------|-----------------|----------------
|"Strongly Disagree"           |1                |5
|"Somewhat Disagree"           |2                |4
|"Neither Agree nor Disagree"  |3                |3
|"Somewhat Agree"              |4                |2
|"Strongly Agree"              |5                |1




## Tasks 

#### Task 1  

Look at the questions that need reverse coding and explain why it makes sense to reverse the coding on these items.

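> *It makes sense to reverse code these items because they are negatively worded: agreeing with them signals less of the construct being measured, so reversing the scale makes a high score mean the same thing on every item.*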

#### Task 2 

You will need to redo the file construction, this time taking the reverse coding into account. **For each step, paste a screenshot of the JMP dialog or formula associated with the outcome.**

1.  *Stack* the columns.

![](img/survey/image1.png)

In [29]:
import pandas as pd
from dfply import *
survey = pd.read_csv("./data/health_survey.csv")
survey.head()

Unnamed: 0,person_id,F1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,...,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
0,1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
2,3,Strongly Agree,Neither Agree nor Disagree,Somewhat Agree,Strongly Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Strongly Disagree,Somewhat Agree
3,4,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Agree,Strongly Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Disagree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Disagree,Somewhat Agree
4,5,Strongly Agree,Strongly Disagree,Neither Agree nor Disagree,Strongly Agree,Somewhat Agree,Strongly Disagree,Strongly Agree,Somewhat Agree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree
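The cell above only reads the file; the stacking step itself could look something like the sketch below. It uses plain pandas (`DataFrame.melt`) rather than dfply, and the three-person frame `toy` is a hypothetical extract holding just the `F1` and `F5` answers shown above. The names `Column_Name` and `Response` are chosen to match the pipe in Task 3.

```python
import pandas as pd

# Hypothetical extract: the first three people, two question columns only.
toy = pd.DataFrame({
    "person_id": [1, 2, 3],
    "F1": ["Somewhat Agree", "Somewhat Agree", "Strongly Agree"],
    "F5": ["Somewhat Disagree", "Somewhat Disagree", "Neither Agree nor Disagree"],
})

# Stack every question column into (Column_Name, Response) pairs,
# giving one row per person per question.
stacked = toy.melt(id_vars="person_id",
                   var_name="Column_Name",
                   value_name="Response")
print(stacked)
```

With the full survey, the same call would keep `person_id` as the identifier and stack all remaining columns.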


2.  Read in and join `ReverseCodingItems.csv` to add a new column called `NeedsReverse` to the health survey dataframe. 

![](img/survey/image2.png)

In [30]:
reverse_names = ['Question', 'Construct', 'Question_num',
       'Needs_Reverse_Coding', 'Column_Name']

In [31]:
# header=0 replaces the file's own header row with reverse_names
# instead of reading that row in as data
reverse = pd.read_csv("./data/ReverseCodingItems.csv", names = reverse_names, header = 0)
reverse.head(4)

Unnamed: 0,Question,Construct,Question_num,Needs_Reverse_Coding,Column_Name
0,"In the future, I plan to participate in a comm...",1,1,No,F1
1,Individuals are responsible for their own misf...,5,2,Yes,F5
2,When tryng to understand the position of other...,2,3,No,F2
3,I plan to become involved in my community,1,4,No,F1.1
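One way to sketch the join in plain pandas is a left merge on `Column_Name`, so every stacked response keeps its row and picks up the lookup columns. Both frames below are small hypothetical stand-ins for the stacked survey and for `ReverseCodingItems.csv`, using only the `F1` and `F5` rows shown above.

```python
import pandas as pd

# Hypothetical stacked responses for one person.
stacked = pd.DataFrame({
    "person_id": [1, 1],
    "Column_Name": ["F1", "F5"],
    "Response": ["Somewhat Agree", "Somewhat Disagree"],
})

# Hypothetical slice of the reverse-coding lookup table.
lookup = pd.DataFrame({
    "Column_Name": ["F1", "F5"],
    "Construct": [1, 5],
    "Needs_Reverse_Coding": ["No", "Yes"],
})

# how="left" keeps every response row even if a question
# were missing from the lookup table.
joined = stacked.merge(lookup, on="Column_Name", how="left")
print(joined)
```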


3.  Make a new column called `response_val` by recoding the stacked responses to the values 1-5 (the *New Coded Value* column of the table above).

![](img/survey/image3.png)

In [32]:
# Your code here
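As a rough illustration of recoding labels to numbers, the sketch below uses `Series.map` from plain pandas with the label-to-value coding from the table in the Background section. The three sample responses are hypothetical.

```python
import pandas as pd

# Label-to-value coding taken from the table in the Background section.
response_codes = {
    "Strongly Disagree": 1,
    "Somewhat Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Somewhat Agree": 4,
    "Strongly Agree": 5,
}

# Hypothetical sample of stacked responses.
responses = pd.Series(["Strongly Disagree",
                       "Somewhat Agree",
                       "Neither Agree nor Disagree"])

# map() looks each label up in the dictionary; labels that are
# not in the dictionary would come back as NaN.
coded = responses.map(response_codes)
print(coded.tolist())
```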

4.  Make a new column called `TempCodedValue` by reverse coding the coded values from the previous step (the *Reverse Coding* column of the table above).

![](img/survey/image4.png)

In [33]:
# Your code here
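The reverse coding in the table pairs 1 with 5, 2 with 4, and so on, which is the same as subtracting the coded value from 6. The sketch below, in plain pandas with a hypothetical series of coded values, shows both the dictionary lookup and the arithmetic shortcut and checks that they agree.

```python
import pandas as pd

# Hypothetical coded values, one of each possible score.
coded = pd.Series([1, 2, 3, 4, 5])

# Reverse coding as an explicit lookup, following the table.
reverse_codes = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
via_map = coded.map(reverse_codes)

# On a 1-5 scale the same result is 6 minus the coded value.
via_arithmetic = 6 - coded

print(via_map.tolist())
print(via_map.equals(via_arithmetic))
```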

5.  Make a new column called `RecodedValue` that holds the correct
    value for each question based on the value in `NeedsReverse`.

![](img/survey/image5.png)

In [34]:
# Your code here
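Choosing between the two coded columns is a row-wise conditional. A minimal sketch with `numpy.where` follows; the two-row frame is hypothetical, and the column names mirror those used in the Task 3 pipe.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one regular item and one reverse-coded item.
df = pd.DataFrame({
    "Needs_Reverse_Coding": ["No", "Yes"],
    "response_val": [4, 2],
    "TempCodedValue": [2, 4],
})

# Take the reverse-coded value where the item needs it,
# and the regular coded value everywhere else.
df["RecodedValue"] = np.where(df["Needs_Reverse_Coding"] == "Yes",
                              df["TempCodedValue"],
                              df["response_val"])
print(df)
```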

6.  Make a new column by *Recoding* the Question Types to *F1, F2, ..., F6*.

![](img/survey/image6.png)

In [35]:
# Your code here
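One possible way to get the question types is to strip the numeric suffix from the column names, assuming (as the lookup rows above suggest) that names such as `F1` and `F1.1` belong to the same type. The sample names below come from the survey header; the regular expression is an illustration, not necessarily the intended approach.

```python
import pandas as pd

# Sample column names taken from the survey header.
column_names = pd.Series(["F1", "F1.1", "F2.10", "F6.4"])

# Remove a trailing ".<digits>" so only the question type remains.
question_type = column_names.str.replace(r"\.\d+$", "", regex=True)
print(question_type.tolist())
```

An alternative under the same assumption would be to build the label from the `Construct` column of the lookup table.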

7.  *Aggregate* and *Unstack*.

![](img/survey/image7.png)
![](img/survey/image8.png)

In [36]:
# Your code here
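Aggregating and unstacking can be sketched in plain pandas with `groupby` followed by `unstack`. The long-format frame and its values below are hypothetical, and the sketch totals the scores with `sum`; swapping in `mean` would give an average instead.

```python
import pandas as pd

# Hypothetical long-format data: two people, two question types.
long_form = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2, 2],
    "QuestionType": ["F1", "F1", "F5", "F1", "F1", "F5"],
    "RecodedValue": [4, 4, 4, 4, 5, 3],
})

# Aggregate within each (person, question type) pair, then move
# the question types out of the index and into columns.
summary = (long_form
           .groupby(["person_id", "QuestionType"])["RecodedValue"]
           .sum()
           .unstack("QuestionType")
           .reset_index())
print(summary)
```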

#### Task 3

Repackage all of your code in one pipe, then write the final output to `health_survey_summary.csv`.


In [55]:
response_dict = {'Strongly Disagree': 1,
                 'Somewhat Disagree': 2,
                 'Neither Agree nor Disagree': 3,
                 'Somewhat Agree': 4,
                 'Strongly Agree': 5}

reverse_dict = {1:5,
                2:4,
                3:3,
                4:2,
                5:1}

In [58]:
from more_dfply import recode, ifelse

health_survey_updated = (survey
 # Stack the question columns into (Column_Name, Response) pairs
 >> gather("Column_Name", "Response", columns_from('F1'))
 # Attach Construct and Needs_Reverse_Coding to each response
 >> left_join(reverse, by='Column_Name')
 # Code the labels as 1-5, then build the reverse-coded alternative
 >> mutate(response_val = recode(X.Response, response_dict))
 >> mutate(tempCodedValue = recode(X.response_val, reverse_dict))
 # Keep the reverse-coded value only where the item needs it
 >> mutate(RecodedValue = ifelse(X.Needs_Reverse_Coding == "Yes", X.tempCodedValue, X.response_val))
 # Drop the helper columns before aggregating
 >> drop(X.Question, X.Response, X.Question_num, X.Needs_Reverse_Coding, X.response_val, X.tempCodedValue, X.Column_Name)
 >> group_by(X.Construct, X.person_id)
 # Total score per person within each construct
 >> summarize(total_value = X.RecodedValue.sum())
 >> spread(X.Construct, X.total_value)
)
health_survey_updated.head()


Unnamed: 0,person_id,1,2,3,4,5,6
0,1,31.0,48.0,20.0,17.0,28.0,18.0
1,2,31.0,47.0,19.0,17.0,27.0,20.0
2,3,36.0,46.0,19.0,18.0,32.0,17.0
3,4,32.0,54.0,12.0,15.0,30.0,16.0
4,5,37.0,47.0,22.0,19.0,36.0,19.0


In [50]:
health_survey_updated.to_csv("./data/health_survey_summary.csv", index=False)
