# Immigrant Project 

### Cross-Generational Differences in Hispanic Outcomes

1. Cross-Generational Differences in Hispanic Outcomes, 1994-2016: use main_national again, and plot the values for **White-All; Hispanic-All; Hispanic, 1st; Hispanic, 2nd; and Hispanic, 3rd** for each outcome listed below, excluding gen1-gen3.
2. Cross-Generational Differences in Hispanic Outcomes in Top Immigration States, 1994-2016: similar to (1), but broken down by state; side-by-side plots would be great. Again, exclude gen1-gen3.

  * LTHS: “% less than HS diploma”
  * College: “% with college degree”
  * Hinsured: “% with health insurance”
  * rincp_all: “Average individual real income”
  * employed: “% employed”
  * married2: “% married”
  * children: “% with children”
  * poverty: “% of families under the poverty line”
  * age: “Average age”
  * rinch_all: “Median household income”

## Clean Data

In [1]:
import pandas as pd
import os
import plotly.graph_objs as go
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State

In [2]:
os.chdir(r'H:\CALDER\CALDER Data Visualizations\Data\Immigrant Project')

In [3]:
state = pd.read_csv('main_topstates.csv')

In [4]:
state = state.sort_values(by=['year', 'state', 'wbhao'])
state.head(9)

Unnamed: 0,year,wbhao,state,igen,lths,college,hinsured,employed,married2,children,gen1,gen2,gen3,rincp_all
23,1994,Hispanic,California,1st Generation,0.578281,0.069639,0.559158,0.660304,0.708181,0.782131,,,,16157.772
35,1994,Hispanic,California,2nd Generation,0.229561,0.090034,0.828209,0.766322,0.616886,0.55538,,,,27726.738
47,1994,Hispanic,California,3rd Generation,0.091974,0.077231,0.848449,0.791233,0.636506,0.687663,,,,33365.801
59,1994,Hispanic,California,All,0.423552,0.074302,0.660462,0.703786,0.67938,0.728175,0.639306,0.150298,0.210396,19389.326
22,1994,White,California,1st Generation,0.10951,0.413616,0.819607,0.717939,0.76781,0.579526,,,,32477.123
34,1994,White,California,2nd Generation,0.058566,0.388762,0.862976,0.792943,0.634908,0.521915,,,,41346.125
46,1994,White,California,3rd Generation,0.043372,0.321261,0.842302,0.803886,0.630651,0.49157,,,,41740.375
58,1994,White,California,All,0.050509,0.335425,0.842201,0.795411,0.642978,0.50198,0.087061,0.09071,0.822229,41178.082
19,1994,Hispanic,Florida,1st Generation,0.298869,0.134267,0.574033,0.736406,0.655011,0.568187,,,,19066.172


In [5]:
# Rename Columns
pre = ['lths', 'college', 'hinsured', 'employed', 'married2', 'children', 'gen1', 'gen2', 'gen3','rincp_all']
post = ["% less than High School diploma", "% with College Degree", "% with Health Insurance", "% Employed", "% Married", 
        "% with Children", "Share of 1st Generation", "Share of 2nd Generation", "Share of 3rd Generation", 
        "Median Individual Real Income"]
# Map each short column code to its display label in a single rename call
state = state.rename(columns=dict(zip(pre, post)))

state.columns

Index(['year', 'wbhao', 'state', 'igen', '% less than High School diploma',
       '% with College Degree', '% with Health Insurance', '% Employed',
       '% Married', '% with Children', 'Share of 1st Generation',
       'Share of 2nd Generation', 'Share of 3rd Generation',
       'Median Individual Real Income'],
      dtype='object')

In [6]:
state['wbhao_igen'] = state['wbhao'] + ' ' + state['igen']

In [7]:
state = state.drop(state[(state.wbhao_igen == 'White 1st Generation') | (state.wbhao_igen == 'White 2nd Generation')
                       | (state.wbhao_igen == 'White 3rd Generation')].index)

In [8]:
state = state.drop(['Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation'], axis=1)
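
The same four cleaning steps (rename, build `wbhao_igen`, drop the White generation rows, drop the share columns) are repeated later for the national file. As a minimal sketch, they could be collected into one function. The name `clean_outcomes` and the toy frame are hypothetical and only for illustration; the column codes and labels are copied from the cells above.

```python
import pandas as pd

# Column codes and display labels, copied from the rename cell.
PRE = ['lths', 'college', 'hinsured', 'employed', 'married2', 'children',
       'gen1', 'gen2', 'gen3', 'rincp_all']
POST = ["% less than High School diploma", "% with College Degree",
        "% with Health Insurance", "% Employed", "% Married",
        "% with Children", "Share of 1st Generation",
        "Share of 2nd Generation", "Share of 3rd Generation",
        "Median Individual Real Income"]
SHARE_COLS = ["Share of 1st Generation", "Share of 2nd Generation",
              "Share of 3rd Generation"]
WHITE_BY_GEN = ['White 1st Generation', 'White 2nd Generation',
                'White 3rd Generation']


def clean_outcomes(df):
    """Hypothetical helper that applies the notebook's cleaning steps."""
    out = df.rename(columns=dict(zip(PRE, POST)))
    # Combined race/generation label used to pick the plotted lines.
    out['wbhao_igen'] = out['wbhao'] + ' ' + out['igen']
    # Keep 'White All' but drop the White generation breakdowns.
    out = out[~out['wbhao_igen'].isin(WHITE_BY_GEN)]
    # The generation shares are not plotted.
    return out.drop(columns=SHARE_COLS)


# Toy input with made-up values, only to show the call.
toy = pd.DataFrame({
    'year': [1994, 1994, 1994],
    'wbhao': ['Hispanic', 'White', 'White'],
    'igen': ['1st Generation', '1st Generation', 'All'],
})
for code in PRE:
    toy[code] = [0.1, 0.2, 0.3]

cleaned = clean_outcomes(toy)
```

With a helper like this, the state and national files would go through identical steps, so the two frames cannot drift apart if a label changes.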

## Append "main_national" data set

You will be able to select the national data along with the state data using the "state" column. 
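
As a minimal sketch of that idea, the snippet below tags a national frame with a `state` value of `'National'`, stacks it on a state-level frame, and then filters on the `state` column. The frames and their numbers are made up for illustration, and `pd.concat` is used because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Toy stand-ins for the cleaned frames; the values are invented.
nat_toy = pd.DataFrame({
    'year': [1994, 1995],
    'wbhao_igen': ['Hispanic All', 'Hispanic All'],
    '% Employed': [0.70, 0.71],
})
state_toy = pd.DataFrame({
    'year': [1994, 1994],
    'state': ['California', 'Texas'],
    'wbhao_igen': ['Hispanic All', 'Hispanic All'],
    '% Employed': [0.70, 0.76],
})

# Give the national rows a value in the "state" column, then stack the frames.
nat_toy['state'] = 'National'
combined = pd.concat([nat_toy, state_toy], ignore_index=True)

# National and state-level rows are now selected the same way.
national_rows = combined[combined['state'] == 'National']
texas_rows = combined[combined['state'] == 'Texas']
```

Treating "National" as one more value of `state` means the dropdown in the dashboard needs no special case for the national series.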

### Import and Clean "main_national" data

In [9]:
nat = pd.read_csv('main_national.csv')

In [10]:
nat = nat.sort_values(by=['year', 'wbhao', 'igen'])
nat.head(9)

Unnamed: 0,year,wbhao,igen,lths,college,hinsured,employed,married2,children,gen1,gen2,gen3,rincp_all
3,1994,Hispanic,1st Generation,0.50553,0.098835,0.563049,0.678571,0.689724,0.725421,,,,16157.772
5,1994,Hispanic,2nd Generation,0.271175,0.117982,0.756655,0.671149,0.581021,0.590973,,,,24236.658
7,1994,Hispanic,3rd Generation,0.186292,0.096557,0.780923,0.782869,0.649603,0.658164,,,,27468.213
9,1994,Hispanic,All,0.380639,0.102503,0.655657,0.700982,0.656673,0.680479,0.550558,0.218982,0.23046,19389.326
2,1994,White,1st Generation,0.117878,0.373306,0.819465,0.733179,0.760921,0.530015,,,,30804.793
4,1994,White,2nd Generation,0.055666,0.354645,0.868807,0.807859,0.672146,0.485741,,,,38959.621
6,1994,White,3rd Generation,0.080657,0.276775,0.8636,0.81362,0.7001,0.532252,,,,35547.102
8,1994,White,All,0.080565,0.284902,0.862277,0.810314,0.700705,0.529443,0.036895,0.05863,0.904475,35547.102
11,1995,Hispanic,1st Generation,0.517231,0.088316,0.532448,0.682456,0.705492,0.728825,,,,17410.0


In [11]:
# Rename Columns
pre = ['lths', 'college', 'hinsured', 'employed', 'married2', 'children', 'gen1', 'gen2', 'gen3','rincp_all']
post = ["% less than High School diploma", "% with College Degree", "% with Health Insurance", "% Employed", "% Married", 
        "% with Children", "Share of 1st Generation", "Share of 2nd Generation", "Share of 3rd Generation", 
        "Median Individual Real Income"]
# Map each short column code to its display label in a single rename call
nat = nat.rename(columns=dict(zip(pre, post)))

nat.columns

Index(['year', 'wbhao', 'igen', '% less than High School diploma',
       '% with College Degree', '% with Health Insurance', '% Employed',
       '% Married', '% with Children', 'Share of 1st Generation',
       'Share of 2nd Generation', 'Share of 3rd Generation',
       'Median Individual Real Income'],
      dtype='object')

In [12]:
nat['wbhao_igen'] = nat['wbhao'] + ' ' + nat['igen']

In [13]:
nat = nat.drop(nat[(nat.wbhao_igen == 'White 1st Generation') | (nat.wbhao_igen == 'White 2nd Generation')
                       | (nat.wbhao_igen == 'White 3rd Generation')].index)

In [14]:
nat = nat.drop(['Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation'], axis=1)

In [15]:
nat['state'] = 'National'

### Append

In [16]:
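# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames.
# sort=True keeps the alphabetical column order shown in the output below.
append = pd.concat([nat, state], sort=True)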

In [17]:
append = append.sort_values(by=['year', 'wbhao', 'igen'])

In [18]:
append.head(8)

Unnamed: 0,% Employed,% Married,% less than High School diploma,% with Children,% with College Degree,% with Health Insurance,Median Individual Real Income,igen,state,wbhao,wbhao_igen,year
3,0.678571,0.689724,0.50553,0.725421,0.098835,0.563049,16157.772,1st Generation,National,Hispanic,Hispanic 1st Generation,1994
23,0.660304,0.708181,0.578281,0.782131,0.069639,0.559158,16157.772,1st Generation,California,Hispanic,Hispanic 1st Generation,1994
19,0.736406,0.655011,0.298869,0.568187,0.134267,0.574033,19066.172,1st Generation,Florida,Hispanic,Hispanic 1st Generation,1994
17,0.720504,0.709687,0.586988,0.692047,0.049495,0.543079,16157.772,1st Generation,Illinois,Hispanic,Hispanic 1st Generation,1994
15,0.65962,0.631717,0.346576,0.606105,0.073918,0.575528,20843.527,1st Generation,New Jersey,Hispanic,Hispanic 1st Generation,1994
13,0.589808,0.545578,0.344917,0.655887,0.108126,0.681434,16157.772,1st Generation,New York,Hispanic,Hispanic 1st Generation,1994
21,0.760547,0.701864,0.601678,0.707707,0.143495,0.421037,14541.995,1st Generation,Texas,Hispanic,Hispanic 1st Generation,1994
5,0.671149,0.581021,0.271175,0.590973,0.117982,0.756655,24236.658,2nd Generation,National,Hispanic,Hispanic 2nd Generation,1994


## Graph

In [23]:
app = dash.Dash()

app.css.append_css({"external_url": "https://codepen.io/chriddyp/pen/bWLwgP.css"}) 

df = append

states = list(df['state'].unique())
# Only outcomes that exist as columns in df; a poverty measure is not in these files
outcomes = ["% less than High School diploma", "% with College Degree", "% with Health Insurance", "% Employed", 
            "% Married", "% with Children", "Median Individual Real Income"]

# Organize where items will be on the page
app.layout = html.Div([
        html.H3(
            children='Cross-Generational Differences in Hispanic Outcomes, 1994-2016',
            style={
                'textAlign': 'center', 'fontFamily' : 'Georgia'
            }
        ),
        html.Div([
            html.Div([
                    html.Center([          
                        html.Div([
                            html.Div([html.P('Select State',id='state-title1')],
                                style={'textAlign': 'center', 'fontFamily': 'Georgia'}),
                            dcc.Dropdown(
                                id='state-id1',
                                options=[{'label': i, 'value': i} for i in states],
                                value='California')
                            ],style={'width': '40%','textAlign': 'center', 'fontFamily': 'Georgia', 'display': 'inline-block'}),        
                        html.Div([
                            html.Div([html.P('Select Outcome',id='outcome-title1')],
                                style={'textAlign': 'center', 'fontFamily': 'Georgia'}),
                            dcc.Dropdown(
                                id='outcome-id1',
                                options=[{'label': i, 'value': i} for i in outcomes],
                                value='% less than High School diploma')
                            ],style={'width': '40%','textAlign': 'center', 'fontFamily': 'Georgia', 'display': 'inline-block'}),
                        ]),

                dcc.Graph(id='indicator-graphic1',
                          config={'modeBarButtonsToRemove': ['sendDataToCloud', 'lasso2d', 'zoomIn2d', 'zoomOut2d', 'pan2d', 
                                                             'zoom2d','resetScale2d'], 
                                'displaylogo': False})
                ], style={'width': '50%', 'display': 'inline-block'}),  
            html.Div([
                    html.Center([          
                        html.Div([
                            html.Div([html.P('Select State',id='state-title2')],
                                style={'textAlign': 'center', 'fontFamily': 'Georgia'}),
                            dcc.Dropdown(
                                id='state-id2',
                                options=[{'label': i, 'value': i} for i in states],
                                value='Texas')
                            ],style={'width': '40%','textAlign': 'center', 'fontFamily': 'Georgia', 'display': 'inline-block'}),        
                        html.Div([
                            html.Div([html.P('Select Outcome',id='outcome-title2')],
                                style={'textAlign': 'center', 'fontFamily': 'Georgia'}),
                            dcc.Dropdown(
                                id='outcome-id2',
                                options=[{'label': i, 'value': i} for i in outcomes],
                                value='% less than High School diploma')
                            ],style={'width': '40%','textAlign': 'center', 'fontFamily': 'Georgia', 'display': 'inline-block'}),
                        ]),

                dcc.Graph(id='indicator-graphic2',
                          config={'modeBarButtonsToRemove': ['sendDataToCloud', 'lasso2d', 'zoomIn2d', 'zoomOut2d', 'pan2d', 
                                                             'zoom2d','resetScale2d'], 
                                'displaylogo': False})
                ], style={'width': '50%', 'display': 'inline-block'}),             
        ]),
    ])
@app.callback(

    dash.dependencies.Output('indicator-graphic1', 'figure'),
    [dash.dependencies.Input('outcome-id1', 'value'),
     dash.dependencies.Input('state-id1', 'value'),
     dash.dependencies.Input('outcome-id2', 'value'),
     dash.dependencies.Input('state-id2', 'value')])
def outcome_time_series1(outcome_id, state_id, outcome_id2, state_id2):
    dff = df[['year', 'wbhao_igen', 'state',outcome_id]]
    dff = dff[dff['state'] == state_id]
    
    lines = {}
    data = []
    y_axis = {}
    legends={'orientation': 'h', 'xanchor': 'center', 'x': 0.5, 'y': -0.22}
    
    # When both graphs show the same outcome, give them a shared y-axis range
    # so the two panels are directly comparable.
    if outcome_id==outcome_id2:
        graph2 = df[['year', 'wbhao_igen', 'state', outcome_id2]]
        graph2 = graph2[graph2['state'] == state_id2]

        # Shared range: the lower of the two minimums and the higher of the two maximums
        dff_min = min(dff[outcome_id].min(), graph2[outcome_id2].min())
        dff_max = max(dff[outcome_id].max(), graph2[outcome_id2].max())

        # Anchor the axis at zero when the minimum is already close to it
        if dff_min<.05:
            dff_min = 0

        ranges = [dff_min, dff_max]
    else:
        # Different outcomes: let plotly choose the range automatically
        ranges = None
    
    # Show five lines for each outcome
    generation = ['White All','Hispanic All', 'Hispanic 1st Generation', 'Hispanic 2nd Generation', 
                  'Hispanic 3rd Generation']
    for gen in generation:
        if '1st' in gen:
             lines = dict(
                 color = ("#6b6ecf"),
                 width = 2,
                 dash = 'dash')
        if '2nd' in gen:
             lines = dict(
                 color = ("#80b1d3"),
                 width = 2,
                 dash = 'dash')              
        if '3rd' in gen:
             lines = dict(
                 color = ("#fdb462"),
                 width = 2,
                 dash = 'dash')
        if 'White All' in gen:
              lines = dict(
                 color = ("#333333"),
                 width = 3)
        if 'Hispanic All' in gen:
               lines = dict(
                 color = ("#fb8072"),
                 width = 3)
        trace = go.Scatter(
            x = dff[dff['wbhao_igen']==gen]['year'],
            y = dff[dff['wbhao_igen']==gen][outcome_id],
            mode='lines',
            name = gen,
            line = lines,
            opacity = 0.8
            )
        
        data.append(trace)
    if '%' in outcome_id:
        y_axis = {'title': '{0}'.format(outcome_id), 
                  'hoverformat': ',.2f',
                  'range' : ranges}
    else:
         y_axis = {'title': '{0}'.format(outcome_id), 
                  'hoverformat': ',.2f'}    
    return {
        'data' : data,
        'layout' : go.Layout(
            xaxis={'title': 'Year'},
            yaxis=y_axis,
            legend=legends,
        )
    }

@app.callback(

    dash.dependencies.Output('indicator-graphic2', 'figure'),
    [dash.dependencies.Input('outcome-id2', 'value'),
     dash.dependencies.Input('state-id2', 'value'),
     dash.dependencies.Input('outcome-id1', 'value'),
     dash.dependencies.Input('state-id1', 'value')])
def outcome_time_series2(outcome_id, state_id, outcome_id1, state_id1):
    dff = df[['year', 'wbhao_igen', 'state',outcome_id]]
    dff = dff[dff['state'] == state_id]
    lines = {}
    data = []
    y_axis = {}
    legends={'orientation': 'h', 'xanchor': 'center', 'x': 0.5, 'y': -0.22}

    # When both graphs show the same outcome, give them a shared y-axis range
    # so the two panels are directly comparable.
    if outcome_id==outcome_id1:
        graph1 = df[['year', 'wbhao_igen', 'state', outcome_id1]]
        graph1 = graph1[graph1['state'] == state_id1]

        # Shared range: the lower of the two minimums and the higher of the two maximums
        dff_min = min(dff[outcome_id].min(), graph1[outcome_id1].min())
        dff_max = max(dff[outcome_id].max(), graph1[outcome_id1].max())

        # Anchor the axis at zero when the minimum is already close to it
        if dff_min<.05:
            dff_min = 0

        ranges = [dff_min, dff_max]  
    else:
        # Different outcomes: let plotly choose the range automatically
        ranges = None
        
    y_axis = {'title': '{0}'.format(outcome_id), 
              'hoverformat': ',.2f',
              'range': ranges
            }
    
    # Show five lines for each outcome
    generation = ['White All','Hispanic All', 'Hispanic 1st Generation', 'Hispanic 2nd Generation', 
                  'Hispanic 3rd Generation']
    for gen in generation:
        if '1st' in gen:
             lines = dict(
                 color = ("#6b6ecf"),
                 width = 2,
                 dash = 'dash')
        if '2nd' in gen:
             lines = dict(
                 color = ("#80b1d3"),
                 width = 2,
                 dash = 'dash')              
        if '3rd' in gen:
             lines = dict(
                 color = ("#fdb462"),
                 width = 2,
                 dash = 'dash')
        if 'White All' in gen:
              lines = dict(
                 color = ("#333333"),
                 width = 3)
        if 'Hispanic All' in gen:
               lines = dict(
                 color = ("#fb8072"),
                 width = 3)
        trace = go.Scatter(
            x = dff[dff['wbhao_igen']==gen]['year'],
            y = dff[dff['wbhao_igen']==gen][outcome_id],
            mode='lines',
            name = gen,
            line = lines,
            opacity = 0.8
            )
        
        data.append(trace)
    

    return {
        'data' : data,
        'layout' : go.Layout(
            xaxis={'title': 'Year'},
            yaxis=y_axis,
            legend=legends
        )
    }
    
if __name__ == '__main__':
    app.run_server()

 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [06/Apr/2018 09:53:10] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [06/Apr/2018 09:53:11] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [06/Apr/2018 09:53:11] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [06/Apr/2018 09:53:11] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [06/Apr/2018 09:53:11] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [06/Apr/2018 09:53:11] "GET /favicon.ico HTTP/1.1" 200 -
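
Both callbacks compute a y-axis range that depends on what the other graph is showing. A minimal sketch of that rule as a standalone function is below; the name `shared_y_range` and its `floor_below` parameter are hypothetical, and the example series are made up. It assumes the intent is a range that covers both panels and starts at zero when the lowest value is under 0.05.

```python
import pandas as pd


def shared_y_range(own, other, floor_below=0.05):
    """Hypothetical helper: one [low, high] range covering both series."""
    low = min(own.min(), other.min())
    high = max(own.max(), other.max())
    # Start the axis at zero when the lowest value is already close to it.
    if low < floor_below:
        low = 0
    return [float(low), float(high)]


# Made-up values standing in for one outcome in two different states.
left_panel = pd.Series([0.42, 0.38, 0.30])
right_panel = pd.Series([0.60, 0.55, 0.03])

y_range = shared_y_range(left_panel, right_panel)
```

Calling the same function from both callbacks, with the arguments swapped, would give the two panels an identical range by construction.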
