# Merge (JOIN) Exercise Solutions

When preparing to visualize survey data, it's common to split the information into three tables that are easier to edit and update separately. Eventually, though, they need to be joined (merged) back together.

This is a toy example with very small tables, so it's easier to see the results.

In [1]:
import pandas as pd

## Load in three data tables from Excel sheets

The three tables we need are stored as separate tabs/sheets in an Excel workbook. The sheets are named:

- **Demographics**: IDs and information about individual people who took the survey
- **Responses**: Both numerical and text responses to the survey questions in "tall" or "tidy" format. This is tied to the Demographics information by the Respondent_ID key field.
- **Questions**: Information about the questions, including the type of question and the wording. This is tied to the Responses by the Question_ID key field.
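
Before joining, it can help to confirm that the key fields behave the way the list above describes. The following is a minimal sketch, assuming the three tables hold the toy values shown in this notebook; they are rebuilt in memory here so the check runs without the Excel file, and the variable names for the check results are illustrative only.

```python
import pandas as pd

# Toy tables rebuilt in memory (same values as the Excel sheets in this exercise).
demographics = pd.DataFrame({
    "Respondent_ID": [1, 2, 3],
    "State": ["NC", "VA", "MT"],
})
responses = pd.DataFrame({
    "Respondent_ID": [1, 1, 2, 2, 3, 3],
    "Question_ID": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "Response_text": ["Incredible", "No", "So so", "Yes", "Not great", "No"],
    "Response_numeric": [5, 0, 3, 1, 2, 0],
})
questions = pd.DataFrame({
    "Question_ID": ["Q1", "Q2"],
    "Question_wording": ["How are your polo skills?", "Are you scared of dolphins?"],
    "Question_type": ["Likert", "YesNo"],
})

# Each key should appear only once in its lookup table...
respondent_ids_unique = demographics["Respondent_ID"].is_unique
question_ids_unique = questions["Question_ID"].is_unique

# ...and every key used in Responses should exist in the matching lookup table.
respondents_known = responses["Respondent_ID"].isin(demographics["Respondent_ID"]).all()
questions_known = responses["Question_ID"].isin(questions["Question_ID"]).all()

print(respondent_ids_unique, question_ids_unique, respondents_known, questions_known)
```

If any of these checks came back `False`, a join would either duplicate rows or leave gaps, so it is worth looking at the data before merging.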

### Read data from `data/ToySurveyData.xlsx`

The basic form of the command you'll need is

```
dataframe_variable_name = pd.read_excel('file_name', sheet_name='sheet_name')
```

In [2]:
demographics = pd.read_excel('data/ToySurveyData.xlsx', sheet_name='Demographics')
demographics

Unnamed: 0,Respondent_ID,State
0,1,NC
1,2,VA
2,3,MT


In [3]:
responses = pd.read_excel('data/ToySurveyData.xlsx', sheet_name='Responses')
responses

Unnamed: 0,Respondent_ID,Question_ID,Response_text,Response_numeric
0,1,Q1,Incredible,5
1,1,Q2,No,0
2,2,Q1,So so,3
3,2,Q2,Yes,1
4,3,Q1,Not great,2
5,3,Q2,No,0


In [4]:
questions = pd.read_excel('data/ToySurveyData.xlsx', sheet_name='Questions')
questions

Unnamed: 0,Question_ID,Question_wording,Question_type
0,Q1,How are your polo skills?,Likert
1,Q2,Are you scared of dolphins?,YesNo


## Merge (JOIN) the tables together

There are many ways we could go about this, but let's do it in stages. 

#### Demographics - Responses

- **Consider the Demographics table to be our primary table on the Left**
- **Do a LEFT JOIN using `merge()` with the Responses table**
- Explicitly state the key field to merge on with `on=`
- Assign the result to a variable called `demog_resp`

In [5]:
demog_resp = demographics.merge(responses, how='left', on='Respondent_ID')
demog_resp

Unnamed: 0,Respondent_ID,State,Question_ID,Response_text,Response_numeric
0,1,NC,Q1,Incredible,5
1,1,NC,Q2,No,0
2,2,VA,Q1,So so,3
3,2,VA,Q2,Yes,1
4,3,MT,Q1,Not great,2
5,3,MT,Q2,No,0
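
In the toy data every respondent answered every question, so the LEFT JOIN looks the same as an inner join would. The sketch below shows what changes when that is not the case. It assumes one extra, hypothetical respondent (ID 4 from 'SC') who is not in the original data and has no responses, and it uses the `validate` and `indicator` options of `merge()` to check the join. The tables are rebuilt in memory so the example runs on its own.

```python
import pandas as pd

# Demographics from the exercise plus one HYPOTHETICAL respondent (ID 4, 'SC')
# who is not in the original workbook and has no rows in Responses.
demographics = pd.DataFrame({
    "Respondent_ID": [1, 2, 3, 4],
    "State": ["NC", "VA", "MT", "SC"],
})
responses = pd.DataFrame({
    "Respondent_ID": [1, 1, 2, 2, 3, 3],
    "Question_ID": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "Response_text": ["Incredible", "No", "So so", "Yes", "Not great", "No"],
    "Response_numeric": [5, 0, 3, 1, 2, 0],
})

checked = demographics.merge(
    responses,
    how="left",
    on="Respondent_ID",
    validate="one_to_many",  # raises MergeError if Respondent_ID repeats in demographics
    indicator=True,          # adds a _merge column: 'both', 'left_only', or 'right_only'
)

# Rows that exist only in the left table: respondents with no responses.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched)
```

The LEFT JOIN keeps the hypothetical respondent, with missing values in the response columns, whereas `how='inner'` would drop that row.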


#### Demographics - Responses - Questions

- **Now consider the demog_resp table to be our primary table on the Left**
- **Do a LEFT JOIN using `merge()` with the Questions table**
- Explicitly state the key field to merge on with `on=`
- Assign the result to a variable called `demog_resp_ques`

In [6]:
demog_resp_ques = demog_resp.merge(questions, how='left', on='Question_ID')
demog_resp_ques

Unnamed: 0,Respondent_ID,State,Question_ID,Response_text,Response_numeric,Question_wording,Question_type
0,1,NC,Q1,Incredible,5,How are your polo skills?,Likert
1,1,NC,Q2,No,0,Are you scared of dolphins?,YesNo
2,2,VA,Q1,So so,3,How are your polo skills?,Likert
3,2,VA,Q2,Yes,1,Are you scared of dolphins?,YesNo
4,3,MT,Q1,Not great,2,How are your polo skills?,Likert
5,3,MT,Q2,No,0,Are you scared of dolphins?,YesNo
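
The two stages can also be written as one chained expression. This is just one possible alternative, sketched here with the toy tables rebuilt in memory so it runs without the Excel file; the variable name `combined` is illustrative and not part of the exercise.

```python
import pandas as pd

# Toy tables rebuilt in memory (same values as the Excel sheets in this exercise).
demographics = pd.DataFrame({
    "Respondent_ID": [1, 2, 3],
    "State": ["NC", "VA", "MT"],
})
responses = pd.DataFrame({
    "Respondent_ID": [1, 1, 2, 2, 3, 3],
    "Question_ID": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "Response_text": ["Incredible", "No", "So so", "Yes", "Not great", "No"],
    "Response_numeric": [5, 0, 3, 1, 2, 0],
})
questions = pd.DataFrame({
    "Question_ID": ["Q1", "Q2"],
    "Question_wording": ["How are your polo skills?", "Are you scared of dolphins?"],
    "Question_type": ["Likert", "YesNo"],
})

# Chain both LEFT JOINs: Demographics -> Responses -> Questions.
combined = (
    demographics
    .merge(responses, how="left", on="Respondent_ID")
    .merge(questions, how="left", on="Question_ID")
)

# One row per response, with demographic and question details attached.
print(combined.shape)
```

Doing it in stages, as above, makes each intermediate table easy to inspect; the chained form is more compact once you trust the keys.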


## Results

Your resulting table after both `merge()` commands should look like this:

<img src='images/join_exercise_results.png' width=730>