# Immigrant Project 

### Distribution of Immigrant Generation among Hispanics and Whites

At this point, what I have in mind are two graphs showing the trends in outcomes (like the one you prepared [here](https://caldercenter.org/data-visualizations/aggregated-number-graduates-education-nationally)):

1. Distribution of Immigrant Generation among Hispanics and Whites, 1994-2016: you can create this using main_national, looking at igen="All" and using the variables gen1, gen2, and gen3. I think it would be useful to have two graphs side by side, one showing the distribution among Hispanics and the other showing the distribution among Whites.
2. Distribution of Immigrant Generation among Hispanics and Whites in Top Immigration States, 1994-2016: you can create this using main_topstates, looking at igen="All" and using the variables gen1, gen2, and gen3. Again two graphs side by side, with the option to choose the state.

  * LTHS: "% less than HS diploma"
  * College: "% with college degree"
  * Hinsured: "% with health insurance"
  * rincp_all: "Average individual real income"
  * employed: "% employed"
  * married2: "% married"
  * children: "% with children"
  * poverty: "% of families under the poverty line"
  * age: "Average age"
  * rinch_all: "Median household income"
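
To make the request in items 1 and 2 concrete, here is a minimal sketch of pulling the igen="All" rows and the gen1, gen2, gen3 shares for one group. The two sample rows are the 1994 national values that appear in the output further down; the helper name `generation_shares` is hypothetical, and a real run would read main_national.csv instead of building the frame by hand.

```python
import pandas as pd

# Two igen == "All" rows copied from the 1994 national output in this notebook.
sample = pd.DataFrame({
    "year": [1994, 1994],
    "wbhao": ["Hispanic", "White"],
    "igen": ["All", "All"],
    "gen1": [0.543867, 0.036445],
    "gen2": [0.230767, 0.059018],
    "gen3": [0.225366, 0.904537],
})


def generation_shares(df, group):
    """Hypothetical helper: year plus gen1-gen3 for one group, igen == 'All' only."""
    rows = df[(df["igen"] == "All") & (df["wbhao"] == group)]
    return rows[["year", "gen1", "gen2", "gen3"]].reset_index(drop=True)


hispanic = generation_shares(sample, "Hispanic")
white = generation_shares(sample, "White")

# Sanity check: the three shares should add up to roughly 1 in every row.
share_totals = sample[["gen1", "gen2", "gen3"]].sum(axis=1)
print(share_totals)
```

Each of the two frames would feed one of the side-by-side graphs, with year on the x-axis and one line per generation.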


## Clean data

I want to make sure that the data is in a format that can be transformed into a dash app easily.

In [1]:
import pandas as pd
import os
import plotly.graph_objs as go
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State

In [2]:
os.chdir(r'H:\CALDER\CALDER Data Visualizations\Data\Immigrant Project')

In [12]:
nat = pd.read_csv('main_national.csv')

In [13]:
nat = nat.sort_values(by=['year', 'wbhao', 'igen'])
nat.head(9)

Unnamed: 0,year,wbhao,igen,lths,college,hinsured,employed,married2,children,gen1,gen2,gen3,rincp_all
3,1994,Hispanic,1st Generation,0.529125,0.060532,0.544256,0.670069,0.673386,0.726153,,,,16157.772
5,1994,Hispanic,2nd Generation,0.27347,0.087691,0.752589,0.640134,0.551705,0.584468,,,,21515.689
7,1994,Hispanic,3rd Generation,0.208407,0.068863,0.763343,0.756557,0.587008,0.63558,,,,25852.436
9,1994,Hispanic,All,0.397849,0.068677,0.641707,0.682653,0.62584,0.673044,0.543867,0.230767,0.225366,19389.326
2,1994,White,1st Generation,0.118049,0.222853,0.815063,0.731381,0.746446,0.526193,,,,30699.768
4,1994,White,2nd Generation,0.054359,0.221735,0.867673,0.800871,0.666478,0.479842,,,,38809.355
6,1994,White,3rd Generation,0.080448,0.186397,0.861411,0.806955,0.691202,0.526112,,,,35270.801
8,1994,White,All,0.080278,0.189811,0.860092,0.803842,0.691756,0.523384,0.036445,0.059018,0.904537,35275.648
11,1995,Hispanic,1st Generation,0.534055,0.059078,0.526185,0.690245,0.669891,0.713,,,,16618.637


In [14]:
# Rename columns to the display labels used in the app
pre = ['lths', 'college', 'hinsured', 'employed', 'married2', 'children', 'gen1', 'gen2', 'gen3', 'rincp_all']
post = ["% less than High School diploma", "% with College Degree", "% with Health Insurance", "% Employed", "% Married",
        "% with Children", "Share of 1st Generation", "Share of 2nd Generation", "Share of 3rd Generation",
        "Median Individual Real Income"]
# Map each raw name to its label in a single rename call
nat = nat.rename(columns=dict(zip(pre, post)))

nat.columns

Index(['year', 'wbhao', 'igen', '% less than High School diploma',
       '% with College Degree', '% with Health Insurance', '% Employed',
       '% Married', '% with Children', 'Share of 1st Generation',
       'Share of 2nd Generation', 'Share of 3rd Generation',
       'Median Individual Real Income'],
      dtype='object')

In [15]:
nat['wbhao_igen'] = nat['wbhao'] + ' ' + nat['igen']

### Drop Columns to match Umut's request in #1

In [16]:
nat = nat[nat.igen == 'All']

In [18]:
nat.columns

Index(['year', 'wbhao', 'igen', '% less than High School diploma',
       '% with College Degree', '% with Health Insurance', '% Employed',
       '% Married', '% with Children', 'Share of 1st Generation',
       'Share of 2nd Generation', 'Share of 3rd Generation',
       'Median Individual Real Income', 'wbhao_igen'],
      dtype='object')

In [20]:
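# .copy() makes nat an independent frame, so adding the 'state' column later
# does not trigger pandas' SettingWithCopyWarning
nat = nat[['year', 'wbhao', 'igen', 'Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation',
           'wbhao_igen']].copy()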

In [22]:
nat['state'] = 'National'

## Append "Top_State" data set

You will be able to select the national data along with the state data using the "state" column. 
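
As a rough illustration of that selection, the sketch below filters a miniature combined table on the "state" column. The three rows reuse 1994 Hispanic values from this notebook; the frame name `combined` and the helper `select_region` are hypothetical stand-ins for the appended data set built further down.

```python
import pandas as pd

# Hypothetical miniature of the combined table (1994, Hispanic, igen == "All").
combined = pd.DataFrame({
    "year": [1994, 1994, 1994],
    "state": ["National", "California", "Texas"],
    "wbhao": ["Hispanic", "Hispanic", "Hispanic"],
    "Share of 1st Generation": [0.543867, 0.659941, 0.372203],
})


def select_region(df, region):
    """Return the rows for one value of the 'state' column ('National' or a state)."""
    return df[df["state"] == region]


national = select_region(combined, "National")
california = select_region(combined, "California")

# National and one state together, e.g. for a comparison view.
both = combined[combined["state"].isin(["National", "California"])]
print(both)
```

Treating "National" as just another value of "state" means the dropdown in the app needs no special case for the national series.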

### Import and Clean "top_state" data

In [24]:
state = pd.read_csv('main_topstates.csv')

In [25]:
state = state.sort_values(by=['year', 'state', 'wbhao'])
state.head(9)

Unnamed: 0,year,wbhao,state,igen,lths,college,hinsured,employed,married2,children,gen1,gen2,gen3,rincp_all
23,1994,Hispanic,California,1st Generation,0.612092,0.046513,0.539978,0.644767,0.679131,0.782559,,,,16157.772
35,1994,Hispanic,California,2nd Generation,0.236203,0.074119,0.811544,0.704071,0.60013,0.574653,,,,25254.598
47,1994,Hispanic,California,3rd Generation,0.121056,0.053164,0.831612,0.767576,0.558505,0.703352,,,,29083.99
59,1994,Hispanic,California,All,0.462181,0.051881,0.636176,0.677115,0.644281,0.736546,0.659941,0.14824,0.191819,19324.695
22,1994,White,California,1st Generation,0.111483,0.225295,0.813284,0.713643,0.745518,0.583275,,,,32396.334
34,1994,White,California,2nd Generation,0.056032,0.253155,0.868273,0.784288,0.633254,0.520214,,,,41577.18
46,1994,White,California,3rd Generation,0.042512,0.215277,0.839792,0.794372,0.620122,0.485173,,,,40636.797
58,1994,White,California,All,0.049636,0.219552,0.840092,0.786551,0.632042,0.496733,0.085607,0.090224,0.824168,40408.973
19,1994,Hispanic,Florida,1st Generation,0.293482,0.091414,0.56535,0.710784,0.631509,0.587935,,,,18726.859


In [26]:
# Rename columns to the display labels used in the app
pre = ['lths', 'college', 'hinsured', 'employed', 'married2', 'children', 'gen1', 'gen2', 'gen3', 'rincp_all']
post = ["% less than High School diploma", "% with College Degree", "% with Health Insurance", "% Employed", "% Married",
        "% with Children", "Share of 1st Generation", "Share of 2nd Generation", "Share of 3rd Generation",
        "Median Individual Real Income"]
# Map each raw name to its label in a single rename call
state = state.rename(columns=dict(zip(pre, post)))

state.columns

Index(['year', 'wbhao', 'state', 'igen', '% less than High School diploma',
       '% with College Degree', '% with Health Insurance', '% Employed',
       '% Married', '% with Children', 'Share of 1st Generation',
       'Share of 2nd Generation', 'Share of 3rd Generation',
       'Median Individual Real Income'],
      dtype='object')

In [27]:
state['wbhao_igen'] = state['wbhao'] + ' ' + state['igen']

In [28]:
state = state[state.igen == 'All']

In [29]:
state = state[['year', 'wbhao', 'igen', 'state', 'Share of 1st Generation','Share of 2nd Generation', 
               'Share of 3rd Generation', 'wbhao_igen']]

### Append

In [31]:
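# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
# sort=True keeps the alphabetical column order shown in the output below.
append = pd.concat([nat, state], sort=True)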

In [35]:
append = append.sort_values(by=['year', 'wbhao', 'igen'])

In [38]:
append.head(8)

Unnamed: 0,Share of 1st Generation,Share of 2nd Generation,Share of 3rd Generation,igen,state,wbhao,wbhao_igen,year
9,0.543867,0.230767,0.225366,All,National,Hispanic,Hispanic All,1994
59,0.659941,0.14824,0.191819,All,California,Hispanic,Hispanic All,1994
55,0.699234,0.227542,0.073224,All,Florida,Hispanic,Hispanic All,1994
53,0.630748,0.277068,0.092184,All,Illinois,Hispanic,Hispanic All,1994
51,0.591881,0.375607,0.032512,All,New Jersey,Hispanic,Hispanic All,1994
49,0.516753,0.435614,0.047633,All,New York,Hispanic,Hispanic All,1994
57,0.372203,0.226525,0.401272,All,Texas,Hispanic,Hispanic All,1994
8,0.036445,0.059018,0.904537,All,National,White,White All,1994


## Graph

I'll try to create what Umut asked for in #1 above using the wide data structure built so far. If that proves too messy, I'll restructure the data to long format.
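
If the wide layout does get messy, the long restructuring could look roughly like the sketch below, which uses `DataFrame.melt` to stack the three share columns into one. The sample rows are the 1994 national values from above; the names `wide`, `long_df`, `generation`, and `share` are assumptions chosen for illustration.

```python
import pandas as pd

# 1994 national rows (igen == "All") in the wide layout used so far.
wide = pd.DataFrame({
    "year": [1994, 1994],
    "state": ["National", "National"],
    "wbhao": ["Hispanic", "White"],
    "Share of 1st Generation": [0.543867, 0.036445],
    "Share of 2nd Generation": [0.230767, 0.059018],
    "Share of 3rd Generation": [0.225366, 0.904537],
})

# Stack the three share columns: one row per year, state, group, and generation.
long_df = wide.melt(
    id_vars=["year", "state", "wbhao"],
    var_name="generation",
    value_name="share",
)

# Within each group the shares should still add up to roughly 1.
totals = long_df.groupby("wbhao")["share"].sum()
print(long_df)
```

In long form, a callback could loop over `long_df.groupby("generation")` and add one trace per group instead of naming each share column explicitly.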

In [61]:
app = dash.Dash()

app.css.append_css({"external_url": "https://codepen.io/chriddyp/pen/bWLwgP.css"}) 

df = append

states = list(append['state'].unique())

# Organize where items will be on the page
app.layout = html.Div([
        html.H3(
            children='Distribution of Immigrant Generation among Hispanics and Whites, Nationally \
                        and in Top Immigration States',
            style={
                'textAlign': 'center', 'fontFamily' : 'Georgia'
            }
        ),
        html.Center([          
            html.Div([
                html.Div([html.P('Select State',id='state-title')],
                    style={'textAlign': 'center', 'fontFamily': 'Georgia'}),
                dcc.Dropdown(
                    id='state-id',
                    options=[{'label': i, 'value': i} for i in states],
                    value='California')
                ],style={'width': '50%','textAlign': 'center', 'fontFamily': 'Georgia', 'display': 'inline-block'}),        
            ]),
        html.Div([
            html.Div([
                dcc.Graph(id='indicator-graphic1',
                          config={'modeBarButtonsToRemove': ['sendDataToCloud', 'lasso2d', 'zoomIn2d', 'zoomOut2d', 'pan2d', 
                                                             'zoom2d','resetScale2d'], 
                                'displaylogo': False})
                ], style={'width': '50%', 'display': 'inline-block'}),  
            html.Div([
                dcc.Graph(id='indicator-graphic2',
                          config={'modeBarButtonsToRemove': ['sendDataToCloud', 'lasso2d', 'zoomIn2d', 'zoomOut2d', 'pan2d', 
                                                             'zoom2d','resetScale2d'], 
                                'displaylogo': False})
                ], style={'width': '50%', 'display': 'inline-block'}),             
        ]),
    ])

# Update the Hispanic graph whenever the state dropdown changes
@app.callback(
    Output('indicator-graphic1', 'figure'),
    [Input('state-id', 'value')])
def outcome_hispanic(state_id):
    dff = df[['year', 'wbhao_igen', 'state','Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation']]
    dff = dff[(dff.state == state_id) & (dff.wbhao_igen == 'Hispanic All')]

    lines = {}
    data = []
    y_axis = {'title': '% in {0}'.format(state_id), 
              'hoverformat': ',.2f',
              'range' : [0,1]}          
    # Legend coordinates are numbers, not strings
    legends = {'orientation': 'h', 'xanchor': 'center', 'x': 0.5, 'y': -0.22}

    
    outcomes = ['Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation']
    for gen in outcomes:
        if '1st' in gen:
             lines = dict(
                 color = ("#6b6ecf"),
                 width = 3)
        if '2nd' in gen:
             lines = dict(
                 color = ("#80b1d3"),
                 width = 3)
        if '3rd' in gen:
             lines = dict(
                 color = ("#fdb462"),
                 width = 3)
        trace = go.Scatter(
            x = dff['year'],
            y = dff[gen],
            mode='lines',
            name = gen,
            line = lines,
            opacity = 0.8
            )
        
        data.append(trace)


    return {
        'data' : data,
        'layout' : go.Layout(
            title='Hispanic',
            titlefont=dict(
                        family='Georgia'),
            xaxis={'title': 'Year'},
            yaxis=y_axis,
            legend = legends
        )
    }

# Update the White graph whenever the state dropdown changes
@app.callback(
    Output('indicator-graphic2', 'figure'),
    [Input('state-id', 'value')])
def outcome_white(state_id):
    dff = df[['year', 'wbhao_igen', 'state','Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation']]
    dff = dff[(dff.state == state_id) & (dff.wbhao_igen == 'White All')]

    lines = {}
    data = []
    y_axis = {'title': '% in {0}'.format(state_id), 
              'hoverformat': ',.2f',
              'range' : [0,1]}          
    # Legend coordinates are numbers, not strings
    legends = {'orientation': 'h', 'xanchor': 'center', 'x': 0.5, 'y': -0.22}

    
    outcomes = ['Share of 1st Generation','Share of 2nd Generation', 'Share of 3rd Generation']
    for gen in outcomes:
        if '1st' in gen:
             lines = dict(
                 color = ("#6b6ecf"),
                 width = 3)
        if '2nd' in gen:
             lines = dict(
                 color = ("#80b1d3"),
                 width = 3)
        if '3rd' in gen:
             lines = dict(
                 color = ("#fdb462"),
                 width = 3)
        trace = go.Scatter(
            x = dff['year'],
            y = dff[gen],
            mode='lines',
            name = gen,
            line = lines,
            opacity = 0.8
            )
        
        data.append(trace)


    return {
        'data' : data,
        'layout' : go.Layout(
            title = 'White',
            titlefont=dict(
                        family='Georgia'),
            xaxis={'title': 'Year'},
            yaxis=y_axis,
            legend = legends
        )
    }
    
if __name__ == '__main__':
    app.run_server()

 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [04/Apr/2018 14:40:36] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:37] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:37] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:37] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:37] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:37] "GET /favicon.ico HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:44] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [04/Apr/2018 14:40:44] "POST /_dash-update-component HTTP/1.1" 200 -
