# Merging SOC 2018 and Census 2010 Occupational Codes



In [2]:
import pandas as pd

## Statement of the Problem

To explore the question of flexibility, the inherent properties of occupations should be considered. To do this, I use data from O\*Net to build an index of flexibility for each occupation. However, the O\*Net database uses the Standard Occupational Classification (SOC) code, updated annually, while the CPS and ATUS use the 2010 Census Occupation Classification codes.

Linking the occupation codes requires the Census Crosswalk (https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2018-occupation-code-list-and-crosswalk.xlsx). The Crosswalk between 2018 SOC and 2010 Census codes is not laid out for direct merging: some SOC codes appear only inside the title column, and blank cells are imported as NaN.

In order to use the Crosswalk to link the 2018 SOC with the 2010 Census, we must first extract some codes from the 2018 Census Title column, fill in empty cells (imported as NaN), create a dictionary, and append the 2010 Census codes to some O\*Net data. 

With the final .csv, we can merge the CPS/ATUS data with the O\*Net job characteristics via the 2010 Census occupation code. 

## Extracting 2018 SOC Codes (SOC 2018 -> Census 2010)

Download the crosswalk table, removing the first three rows of title, date updated, and white space. 

Looking at the first ten rows of the data, we see that there are SOC Codes in the Census Title column. We want to extract these codes and put them into the SOC Code column. 

In [3]:
crosswalk = pd.read_excel('https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2018-occupation-code-list-and-crosswalk.xlsx', 
                sheet_name='2010 to 2018 Crosswalk ',
                header=3)

crosswalk.head(10)

Unnamed: 0,2010 SOC code,2010 Census Code,2010 Census Title \n,2018 SOC Code,2018 Census Code,2018 Census Title
0,11-1011,10.0,Chief Executives,11-1011,10.0,Chief Executives
1,11-1021,20.0,General and Operations Managers,11-1021,20.0,General and Operations Managers
2,11-1031,30.0,Legislators,11-1031,30.0,Legislators
3,11-2011,40.0,Advertising and Promotions Managers,11-2011,40.0,Advertising and Promotions Managers
4,11-2020,50.0,Marketing and Sales Managers,,,
5,,,,11-2021,51.0,Marketing Managers
6,,,,11-2022,52.0,Sales Managers
7,11-2031,60.0,Public Relations and Fundraising Managers,11-2030,60.0,Public Relations and Fundraising Managers
8,,,,,,Public Relations Managers (11-2032)
9,,,,,,Fundraising Managers (11-2033)


To extract the codes, we are going to use the split function. First we split along the first parenthesis, then we split along the second parenthesis, and drop the unnecessary columns.

In [4]:
crosswalk[['2018 Census Title short','tmp_Code']] = crosswalk['2018 Census Title '].str.split("(", expand=True)

crosswalk[['Code','paren']] = crosswalk['tmp_Code'].str.split(")", expand=True)

crosswalk.drop(['tmp_Code', 'paren'], axis = 1, inplace = True)

crosswalk.head(10)

Unnamed: 0,2010 SOC code,2010 Census Code,2010 Census Title \n,2018 SOC Code,2018 Census Code,2018 Census Title,2018 Census Title short,Code
0,11-1011,10.0,Chief Executives,11-1011,10.0,Chief Executives,Chief Executives,
1,11-1021,20.0,General and Operations Managers,11-1021,20.0,General and Operations Managers,General and Operations Managers,
2,11-1031,30.0,Legislators,11-1031,30.0,Legislators,Legislators,
3,11-2011,40.0,Advertising and Promotions Managers,11-2011,40.0,Advertising and Promotions Managers,Advertising and Promotions Managers,
4,11-2020,50.0,Marketing and Sales Managers,,,,,
5,,,,11-2021,51.0,Marketing Managers,Marketing Managers,
6,,,,11-2022,52.0,Sales Managers,Sales Managers,
7,11-2031,60.0,Public Relations and Fundraising Managers,11-2030,60.0,Public Relations and Fundraising Managers,Public Relations and Fundraising Managers,
8,,,,,,Public Relations Managers (11-2032),Public Relations Managers,11-2032
9,,,,,,Fundraising Managers (11-2033),Fundraising Managers,11-2033
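
As a possible alternative to the two-step split, a single regular expression can capture the code inside the parentheses. The sketch below runs on a small hand-made frame rather than the real crosswalk; the column names `title`, `code`, and `title_short` and the pattern (two digits, a hyphen, four digits) are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the '2018 Census Title' column (values copied from the rows above).
toy = pd.DataFrame({
    "title": [
        "Chief Executives",
        "Public Relations Managers (11-2032)",
        "Fundraising Managers (11-2033)",
    ]
})

# Capture an SOC-style code (dd-dddd) that appears inside parentheses,
# tolerating stray spaces; rows without such a code come back as NaN.
toy["code"] = toy["title"].str.extract(r"\(\s*(\d{2}-\d{4})\s*\)", expand=False)

# Drop the parenthesised part to get the short title.
toy["title_short"] = toy["title"].str.replace(r"\s*\(.*\)\s*$", "", regex=True)

print(toy)
```

Because the pattern allows whitespace around the code, this variant would also absorb a stray space inside the parentheses, if the real file contains one.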


Now that we have extracted the SOC codes from the Census title, we need to move them into the 2018 SOC Code column.

In [5]:
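# Assign the result back; calling fillna(inplace=True) on a selected column may not update the frame in recent pandas versions.
crosswalk['2018 SOC Code'] = crosswalk['2018 SOC Code'].fillna(crosswalk['Code'])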

crosswalk.head(10)

Unnamed: 0,2010 SOC code,2010 Census Code,2010 Census Title \n,2018 SOC Code,2018 Census Code,2018 Census Title,2018 Census Title short,Code
0,11-1011,10.0,Chief Executives,11-1011,10.0,Chief Executives,Chief Executives,
1,11-1021,20.0,General and Operations Managers,11-1021,20.0,General and Operations Managers,General and Operations Managers,
2,11-1031,30.0,Legislators,11-1031,30.0,Legislators,Legislators,
3,11-2011,40.0,Advertising and Promotions Managers,11-2011,40.0,Advertising and Promotions Managers,Advertising and Promotions Managers,
4,11-2020,50.0,Marketing and Sales Managers,,,,,
5,,,,11-2021,51.0,Marketing Managers,Marketing Managers,
6,,,,11-2022,52.0,Sales Managers,Sales Managers,
7,11-2031,60.0,Public Relations and Fundraising Managers,11-2030,60.0,Public Relations and Fundraising Managers,Public Relations and Fundraising Managers,
8,,,,11-2032,,Public Relations Managers (11-2032),Public Relations Managers,11-2032
9,,,,11-2033,,Fundraising Managers (11-2033),Fundraising Managers,11-2033


Now, we want to forward fill the 2010 Census code to remove the NaN caused by blanks in the Excel spreadsheet.

In [6]:
# ffill() replaces the deprecated fillna(method='ffill'); assign the result back to the column.
crosswalk['2010 Census Code'] = crosswalk['2010 Census Code'].ffill()

crosswalk.head(15)

Unnamed: 0,2010 SOC code,2010 Census Code,2010 Census Title \n,2018 SOC Code,2018 Census Code,2018 Census Title,2018 Census Title short,Code
0,11-1011,10,Chief Executives,11-1011,10.0,Chief Executives,Chief Executives,
1,11-1021,20,General and Operations Managers,11-1021,20.0,General and Operations Managers,General and Operations Managers,
2,11-1031,30,Legislators,11-1031,30.0,Legislators,Legislators,
3,11-2011,40,Advertising and Promotions Managers,11-2011,40.0,Advertising and Promotions Managers,Advertising and Promotions Managers,
4,11-2020,50,Marketing and Sales Managers,,,,,
5,,50,,11-2021,51.0,Marketing Managers,Marketing Managers,
6,,50,,11-2022,52.0,Sales Managers,Sales Managers,
7,11-2031,60,Public Relations and Fundraising Managers,11-2030,60.0,Public Relations and Fundraising Managers,Public Relations and Fundraising Managers,
8,,60,,11-2032,,Public Relations Managers (11-2032),Public Relations Managers,11-2032
9,,60,,11-2033,,Fundraising Managers (11-2033),Fundraising Managers,11-2033
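
To make the forward fill concrete, here is a minimal sketch on a toy frame (the values mimic the first rows above; the frame itself is made up). Each blank inherits the last non-missing code above it, which is what ties the split 2018 rows back to their 2010 Census code.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "2010 Census Code": [50.0, np.nan, np.nan, 60.0, np.nan],
    "2018 SOC Code": [None, "11-2021", "11-2022", "11-2030", "11-2032"],
})

# Propagate the last seen 2010 Census code downward into the blanks.
toy["2010 Census Code"] = toy["2010 Census Code"].ffill()

print(toy)
```

Forward filling only makes sense here because the spreadsheet lists each 2010 code once and leaves the cells below it empty for the related 2018 rows.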


## Dictionary to Match Census and SOC Codes

We are ready to create the dictionary. As many SOC codes can map to one Census code, we group by SOC code and collect the matching Census codes in a list.

In [7]:
crosswalk_dict = dict(crosswalk.groupby('2018 SOC Code')['2010 Census Code'].apply(list))
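
The grouping step can be sketched on toy data as follows. The frame, `toy_dict`, and `ambiguous` are illustrative names, not part of the notebook; the check at the end is an optional extra that flags SOC codes mapping to more than one distinct Census code.

```python
import pandas as pd

toy = pd.DataFrame({
    "2018 SOC Code": ["11-2021", "11-2022", "11-2032", "11-2033"],
    "2010 Census Code": ["0050", "0050", "0060", "0060"],
})

# One list of Census codes per SOC code, in the same shape as crosswalk_dict.
toy_dict = toy.groupby("2018 SOC Code")["2010 Census Code"].apply(list).to_dict()

# SOC codes with more than one distinct Census code would need a manual decision.
ambiguous = {soc: codes for soc, codes in toy_dict.items() if len(set(codes)) > 1}

print(toy_dict)
print(ambiguous)
```

Any entry in `ambiguous` matters later, because taking only the first element of each list silently discards the other codes.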

## Dingel and Neiman 2020

The data generated by Dingel and Neiman (2020) use O\*Net Release 24.2. This release uses the O\*Net-SOC 2010 taxonomy, which must be crosswalked to O\*Net-SOC 2019 and then to SOC 2018 to connect with the crosswalk created above, which ends in 2010 Census codes.

- Dingel and Neiman Occupation Codes: O\*Net SOC 2010
- CPS and ATUS: Census 2010

To get from O\*Net SOC 2010 to Census 2010: 
1) Merge data frames 
    1) O\*Net SOC 2010 -> O\*Net SOC 2019 (https://www.onetcenter.org/taxonomy/2019/walk.html)
    2) O\*Net SOC 2019 -> SOC 2018 (https://www.onetcenter.org/taxonomy/2019/soc.html)
    3) SOC 2018 -> Census 2010 (https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2018-occupation-code-list-and-crosswalk.xlsx)
2) Use crosswalk_dict above from SOC 2018 to Census 2010 


In [39]:
# O*NET-coded scores; the rename below uses this file's column names (onetsoccode, title).
# DN = pd.read_csv('https://raw.githubusercontent.com/jdingel/DingelNeiman-workathome/master/occ_onet_scores/output/occupations_workathome.csv')

# BLS 2010 SOC Code
DN = pd.read_csv('https://raw.githubusercontent.com/jdingel/DingelNeiman-workathome/master/onet_to_BLS_crosswalk/output/onet_teleworkable_blscodes.csv')

# For the BLS file this rename is a no-op: its columns are OCC_CODE and OES_TITLE (see the output below).
DN.rename(columns = {'onetsoccode': 'O*NET-SOC 2010 Code', 'title': 'O*NET-SOC 2010 Title'}, inplace=True)
DN.head(10) # 968 occupations

Unnamed: 0,OCC_CODE,OES_TITLE,teleworkable
0,11-1011,Chief Executives,1.0
1,11-1021,General and Operations Managers,1.0
2,11-2011,Advertising and Promotions Managers,1.0
3,11-2021,Marketing Managers,1.0
4,11-2022,Sales Managers,1.0
5,11-2031,Public Relations and Fundraising Managers,1.0
6,11-3011,Administrative Services Managers,1.0
7,11-3021,Computer and Information Systems Managers,1.0
8,11-3031,Financial Managers,1.0
9,11-3051,Industrial Production Managers,0.0


In [9]:
# crosswalk = pd.read_excel('https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2018-occupation-code-list-and-crosswalk.xlsx', 
#                 sheet_name='2010 to 2018 Crosswalk ',
#                 header=3)

# crosswalk.head(10)

In [10]:
crosswalk_onet_2010_to_2019 = pd.read_csv('https://www.onetcenter.org/taxonomy/2019/walk/2010_to_2019_Crosswalk.csv?fmt=csv')

crosswalk_onet_2010_to_2019.head(10) # 1164 occupations

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title
0,11-1011.00,Chief Executives,11-1011.00,Chief Executives
1,11-1011.03,Chief Sustainability Officers,11-1011.03,Chief Sustainability Officers
2,11-1021.00,General and Operations Managers,11-1021.00,General and Operations Managers
3,11-1031.00,Legislators,11-1031.00,Legislators
4,11-2011.00,Advertising and Promotions Managers,11-2011.00,Advertising and Promotions Managers
5,11-2011.01,Green Marketers,11-2011.00,Advertising and Promotions Managers
6,11-2021.00,Marketing Managers,11-2021.00,Marketing Managers
7,11-2022.00,Sales Managers,11-2022.00,Sales Managers
8,11-2031.00,Public Relations and Fundraising Managers,11-2032.00,Public Relations Managers
9,11-2031.00,Public Relations and Fundraising Managers,11-2033.00,Fundraising Managers


In [11]:
crosswalk_onet_2019_to_2018 = pd.read_csv('https://www.onetcenter.org/taxonomy/2019/soc/2019_to_SOC_Crosswalk.csv?fmt=csv')

crosswalk_onet_2019_to_2018.head(10) # 1016 occupations

Unnamed: 0,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title
0,11-1011.00,Chief Executives,11-1011,Chief Executives
1,11-1011.03,Chief Sustainability Officers,11-1011,Chief Executives
2,11-1021.00,General and Operations Managers,11-1021,General and Operations Managers
3,11-1031.00,Legislators,11-1031,Legislators
4,11-2011.00,Advertising and Promotions Managers,11-2011,Advertising and Promotions Managers
5,11-2021.00,Marketing Managers,11-2021,Marketing Managers
6,11-2022.00,Sales Managers,11-2022,Sales Managers
7,11-2032.00,Public Relations Managers,11-2032,Public Relations Managers
8,11-2033.00,Fundraising Managers,11-2033,Fundraising Managers
9,11-3012.00,Administrative Services Managers,11-3012,Administrative Services Managers


In [24]:
tmp = pd.merge(crosswalk_onet_2010_to_2019, crosswalk_onet_2019_to_2018, how='left', on=['O*NET-SOC 2019 Code', 'O*NET-SOC 2019 Title'])
onetDN = pd.merge(DN, tmp, how = 'left', on=['O*NET-SOC 2010 Code', 'O*NET-SOC 2010 Title'])
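
A left merge can both duplicate rows (when one 2010 code splits into several 2019 codes) and leave rows unmatched. A minimal sketch of how to check for both, using `indicator=True` on made-up frames; the code `99-9999.00` and the column names `code` and `soc2018` are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    "code": ["11-1011.00", "11-2031.00", "99-9999.00"],
    "teleworkable": [1, 1, 0],
})
right = pd.DataFrame({
    "code": ["11-1011.00", "11-2031.00", "11-2031.00"],
    "soc2018": ["11-1011", "11-2032", "11-2033"],
})

merged = pd.merge(left, right, how="left", on="code", indicator=True)

# 'left_only' rows found no partner in the crosswalk.
unmatched = merged[merged["_merge"] == "left_only"]

# More rows after the merge than before means some keys matched several crosswalk rows.
print(len(left), len(merged), len(unmatched))
```

One-to-many matches of this kind would be consistent with `onetDN` having more rows than `DN` in the outputs below.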

In [25]:
tmp

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title
0,11-1011.00,Chief Executives,11-1011.00,Chief Executives,11-1011,Chief Executives
1,11-1011.03,Chief Sustainability Officers,11-1011.03,Chief Sustainability Officers,11-1011,Chief Executives
2,11-1021.00,General and Operations Managers,11-1021.00,General and Operations Managers,11-1021,General and Operations Managers
3,11-1031.00,Legislators,11-1031.00,Legislators,11-1031,Legislators
4,11-2011.00,Advertising and Promotions Managers,11-2011.00,Advertising and Promotions Managers,11-2011,Advertising and Promotions Managers
...,...,...,...,...,...,...
1159,55-3015.00,Command and Control Center Specialists,55-3015.00,Command and Control Center Specialists,55-3015,Command and Control Center Specialists
1160,55-3016.00,Infantry,55-3016.00,Infantry,55-3016,Infantry
1161,55-3017.00,Radar and Sonar Technicians,17-3029.00,"Engineering Technologists and Technicians, Exc...",17-3029,"Engineering Technologists and Technicians, Exc..."
1162,55-3018.00,Special Forces,55-3018.00,Special Forces,55-3018,Special Forces


In [26]:
DN

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable
0,11-1011.00,Chief Executives,1
1,11-1011.03,Chief Sustainability Officers,1
2,11-1021.00,General and Operations Managers,1
3,11-2011.00,Advertising and Promotions Managers,1
4,11-2021.00,Marketing Managers,1
...,...,...,...
963,53-7072.00,"Pump Operators, Except Wellhead Pumpers",0
964,53-7073.00,Wellhead Pumpers,0
965,53-7081.00,Refuse and Recyclable Material Collectors,0
966,53-7111.00,Mine Shuttle Car Operators,0


In [27]:
onetDN

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title
0,11-1011.00,Chief Executives,1,11-1011.00,Chief Executives,11-1011,Chief Executives
1,11-1011.03,Chief Sustainability Officers,1,11-1011.03,Chief Sustainability Officers,11-1011,Chief Executives
2,11-1021.00,General and Operations Managers,1,11-1021.00,General and Operations Managers,11-1021,General and Operations Managers
3,11-2011.00,Advertising and Promotions Managers,1,11-2011.00,Advertising and Promotions Managers,11-2011,Advertising and Promotions Managers
4,11-2021.00,Marketing Managers,1,11-2021.00,Marketing Managers,11-2021,Marketing Managers
...,...,...,...,...,...,...,...
989,53-7072.00,"Pump Operators, Except Wellhead Pumpers",0,53-7072.00,"Pump Operators, Except Wellhead Pumpers",53-7072,"Pump Operators, Except Wellhead Pumpers"
990,53-7073.00,Wellhead Pumpers,0,53-7073.00,Wellhead Pumpers,53-7073,Wellhead Pumpers
991,53-7081.00,Refuse and Recyclable Material Collectors,0,53-7081.00,Refuse and Recyclable Material Collectors,53-7081,Refuse and Recyclable Material Collectors
992,53-7111.00,Mine Shuttle Car Operators,0,47-5044.00,"Loading and Moving Machine Operators, Undergro...",47-5044,"Loading and Moving Machine Operators, Undergro..."


Note: this is the step that is breaking. Either the 2018 SOC code is wrong or the 2018 SOC -> 2010 Census mapping is wrong.

In [30]:
crosswalk_dict

{'11-1011': ['0010'],
 '11-1021': ['0020'],
 '11-1031': ['0030'],
 '11-2011': ['0040'],
 '11-2021': ['0050'],
 '11-2022': ['0050'],
 '11-2030': ['0060'],
 '11-2032': ['0060'],
 '11-2033': ['0060'],
 '11-3012': ['0100'],
 '11-3013': ['0100'],
 '11-3021': ['0110'],
 '11-3031': ['0120'],
 '11-3051': ['0140'],
 '11-3061': ['0150'],
 '11-3071': ['0160'],
 '11-3111': ['0135'],
 '11-3121': ['0136'],
 '11-3131': ['0137'],
 '11-9013': ['0205'],
 '11-9021': ['0220'],
 '11-9030': ['0230'],
 '11-9031': ['0230'],
 '11-9032': ['0230'],
 '11-9033': ['0230'],
 '11-9039': ['0230'],
 '11-9041': ['0300'],
 '11-9051': ['0310'],
 '11-9070': ['0330'],
 '11-9071': ['0330'],
 '11-9072': ['0330'],
 '11-9081': ['0340'],
 '11-9111': ['0350'],
 '11-9121': ['0360'],
 '11-9131': ['0400'],
 '11-9141': ['0410'],
 '11-9151': ['0420'],
 '11-9161': ['0425'],
 '11-9171': ['0325'],
 '11-9179': ['0425'],
 '11-9199': ['0430'],
 '13-1011': ['0500'],
 '13-1021': ['0510'],
 '13-1022': ['0520'],
 '13-1023': ['0530'],
 '13-1030'

In [31]:
onetDN['2010 Census Code'] = onetDN['2018 SOC Code'].map(crosswalk_dict)

In [32]:
onetDN

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title,2010 Census Code
0,11-1011.00,Chief Executives,1,11-1011.00,Chief Executives,11-1011,Chief Executives,[0010]
1,11-1011.03,Chief Sustainability Officers,1,11-1011.03,Chief Sustainability Officers,11-1011,Chief Executives,[0010]
2,11-1021.00,General and Operations Managers,1,11-1021.00,General and Operations Managers,11-1021,General and Operations Managers,[0020]
3,11-2011.00,Advertising and Promotions Managers,1,11-2011.00,Advertising and Promotions Managers,11-2011,Advertising and Promotions Managers,[0040]
4,11-2021.00,Marketing Managers,1,11-2021.00,Marketing Managers,11-2021,Marketing Managers,[0050]
...,...,...,...,...,...,...,...,...
989,53-7072.00,"Pump Operators, Except Wellhead Pumpers",0,53-7072.00,"Pump Operators, Except Wellhead Pumpers",53-7072,"Pump Operators, Except Wellhead Pumpers",[9650]
990,53-7073.00,Wellhead Pumpers,0,53-7073.00,Wellhead Pumpers,53-7073,Wellhead Pumpers,[9650]
991,53-7081.00,Refuse and Recyclable Material Collectors,0,53-7081.00,Refuse and Recyclable Material Collectors,53-7081,Refuse and Recyclable Material Collectors,[9720]
992,53-7111.00,Mine Shuttle Car Operators,0,47-5044.00,"Loading and Moving Machine Operators, Undergro...",47-5044,"Loading and Moving Machine Operators, Undergro...",[6910]


In [38]:
# Compare against the string code; unquoted 51-9199 is evaluated as integer subtraction.
onetDN[onetDN['2018 SOC Code'] == '51-9199']

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title,2010 Census Code


Trying to remove the 2010 Census code from the list in the DataFrame showed that there were missing values in the Census Code column. Investigation found two SOC codes that do not map through the dictionary.

- 2018 SOC Code 53-6031 is listed as 56-6031 in the Crosswalk file, but I am confident that this is a typo as it is listed under 53-6030. The corresponding 2010 Census code is 9360.
- 2018 SOC Code 17-3012 is listed with a space in the Crosswalk file that was not removed by the string split procedure above. The corresponding 2010 Census code is 1540.
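
One alternative to patching the merged frame afterwards is to repair the dictionary keys before mapping. The sketch below uses a tiny hand-written dictionary; the exact form of the malformed keys (a stray space around `17-3012`, and `56-6031` in place of `53-6031`) is an assumption based on the description above, and `clean_keys` is a hypothetical helper.

```python
def clean_keys(mapping, typo_fixes=None):
    """Return a copy of mapping with keys stripped of whitespace and known typos renamed."""
    typo_fixes = typo_fixes or {}
    cleaned = {}
    for key, value in mapping.items():
        key = key.strip()               # remove stray spaces around the code
        key = typo_fixes.get(key, key)  # rename keys known to be typos
        cleaned[key] = value
    return cleaned


# Toy dictionary standing in for crosswalk_dict (assumed shape of the bad keys).
toy_dict = {"17-3012 ": ["1540"], "56-6031": ["9360"], "11-1011": ["0010"]}

fixed = clean_keys(toy_dict, typo_fixes={"56-6031": "53-6031"})
print(fixed)
```

Fixing the keys once means every later `.map(...)` call benefits, instead of repeating the two corrections for each data set.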


In [33]:
# Same null values as before, so we use the same two corrections.

CORRECTION1 = (onetDN['2018 SOC Code'] == '53-6031')
CORRECTION2 = (onetDN['2018 SOC Code'] == '17-3012')

onetDN.loc[CORRECTION1, '2010 Census Code'] = '9360'
onetDN.loc[CORRECTION2, '2010 Census Code'] = '1540'

onetDN.head()

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title,2010 Census Code
0,11-1011.00,Chief Executives,1,11-1011.00,Chief Executives,11-1011,Chief Executives,[0010]
1,11-1011.03,Chief Sustainability Officers,1,11-1011.03,Chief Sustainability Officers,11-1011,Chief Executives,[0010]
2,11-1021.00,General and Operations Managers,1,11-1021.00,General and Operations Managers,11-1021,General and Operations Managers,[0020]
3,11-2011.00,Advertising and Promotions Managers,1,11-2011.00,Advertising and Promotions Managers,11-2011,Advertising and Promotions Managers,[0040]
4,11-2021.00,Marketing Managers,1,11-2021.00,Marketing Managers,11-2021,Marketing Managers,[0050]


In [34]:
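def remove_from_list(x):
    # Manually corrected rows already hold a plain string; return it unchanged.
    if isinstance(x, str):
        return x
    # Mapped rows hold a list of Census codes; keep the first one.
    return x[0]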

In [35]:
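onetDN['2010 Census Code'] = onetDN['2010 Census Code'].apply(remove_from_list)

# Store the integer version so that the integer comparisons below can match.
onetDN['2010 Census Code'] = onetDN['2010 Census Code'].astype(int)
onetDN['2010 Census Code']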

0        10
1        10
2        20
3        40
4        50
       ... 
989    9650
990    9650
991    9720
992    6910
993    9740
Name: 2010 Census Code, Length: 994, dtype: int64
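
Casting to `int` drops the leading zeros that appear in the dictionary values (`'0010'` becomes `10`). If the data being merged stores Census codes as four-character strings, the codes can be padded back. Whether a given CPS/ATUS extract uses integers or padded strings depends on the source, so this is only a sketch under that assumption.

```python
codes = ["0010", "0050", "9650"]

# Integer form, as produced by astype(int).
as_int = [int(code) for code in codes]

# Back to zero-padded four-character strings.
as_padded = [f"{value:04d}" for value in as_int]

print(as_int)     # [10, 50, 9650]
print(as_padded)  # ['0010', '0050', '9650']
```

Whichever form is chosen, both sides of the final merge need the same type, otherwise no rows will match.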

In [37]:
onetDN[onetDN['2010 Census Code']==8965]
# onetDN[onetDN['2010 Census Code']==5940]


Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,teleworkable,O*NET-SOC 2019 Code,O*NET-SOC 2019 Title,2018 SOC Code,2018 SOC Title,2010 Census Code


In [None]:
onetDN.to_csv('DingelNeiman2020.csv', index=False)

## Dingel and Neiman 2020 (SOC 2010 -> Census 2010)

Data from https://github.com/jdingel/DingelNeiman-workathome/tree/master/onet_to_BLS_crosswalk.

- Dingel and Neiman Occupation Codes: SOC 2010
- CPS and ATUS: Census 2010

To get from SOC 2010 to Census 2010: 
1) Merge data frames 
    1) SOC 2010 -> Census 2010 (https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2010-occ-codes-with-crosswalk-from-2002-2011.xls)


In [56]:
# BLS 2010 SOC Code
DN = pd.read_csv('https://raw.githubusercontent.com/jdingel/DingelNeiman-workathome/master/onet_to_BLS_crosswalk/output/onet_teleworkable_blscodes.csv')

DN.rename(columns = {'OCC_CODE': '2010 SOC Code'}, inplace=True)
DN.head(10) # 968 occupations

Unnamed: 0,2010 SOC Code,OES_TITLE,teleworkable
0,11-1011,Chief Executives,1.0
1,11-1021,General and Operations Managers,1.0
2,11-2011,Advertising and Promotions Managers,1.0
3,11-2021,Marketing Managers,1.0
4,11-2022,Sales Managers,1.0
5,11-2031,Public Relations and Fundraising Managers,1.0
6,11-3011,Administrative Services Managers,1.0
7,11-3021,Computer and Information Systems Managers,1.0
8,11-3031,Financial Managers,1.0
9,11-3051,Industrial Production Managers,0.0


In [57]:
crosswalk = pd.read_excel('https://www2.census.gov/programs-surveys/demo/guidance/industry-occupation/2010-occ-codes-with-crosswalk-from-2002-2011.xls',
                         sheet_name='2010OccCodeList',
                         header=4,
                         usecols=['Occupation 2010 Description', '2010 Census Code', '2010 SOC Code'],
                         ).dropna()

crosswalk.head(10)

Unnamed: 0,Occupation 2010 Description,2010 Census Code,2010 SOC Code
5,"Management, Business, and Financial Occupations:",0010-0950,11-0000 - 13-0000
7,Management Occupations:,0010-0430,11-0000
9,Chief executives,0010,11-1011
10,General and operations managers,0020,11-1021
11,Legislators,0030,11-1031
12,Advertising and promotions managers,0040,11-2011
13,Marketing and sales managers,0050,11-2020
14,Public relations and fundraising managers,0060,11-2031
15,Administrative services managers,0100,11-3011
16,Computer and information systems managers,0110,11-3021


In [59]:
censusDN = pd.merge(DN, crosswalk, how = 'left', on=['2010 SOC Code'])

censusDN.head(10)

Unnamed: 0,2010 SOC Code,OES_TITLE,teleworkable,Occupation 2010 Description,2010 Census Code
0,11-1011,Chief Executives,1.0,Chief executives,10.0
1,11-1021,General and Operations Managers,1.0,General and operations managers,20.0
2,11-2011,Advertising and Promotions Managers,1.0,Advertising and promotions managers,40.0
3,11-2021,Marketing Managers,1.0,,
4,11-2022,Sales Managers,1.0,,
5,11-2031,Public Relations and Fundraising Managers,1.0,Public relations and fundraising managers,60.0
6,11-3011,Administrative Services Managers,1.0,Administrative services managers,100.0
7,11-3021,Computer and Information Systems Managers,1.0,Computer and information systems managers,110.0
8,11-3031,Financial Managers,1.0,Financial managers,120.0
9,11-3051,Industrial Production Managers,0.0,Industrial production managers,140.0



# Scratch

## Using Dictionary to Rename Occupation Codes

I use O\*Net to determine the level of flexibility by occupation. Measures include:
- Work Context — Freedom to Make Decisions (https://www.onetonline.org/find/descriptor/result/4.C.3.a.4)
    - No freedom 0 ~ 100 A lot of freedom
- Work Context — Structured versus Unstructured Work (https://www.onetonline.org/find/descriptor/result/4.C.3.b.8)
    - Structured (no freedom) 0 ~ 100 Unstructured (a lot of freedom)
- Work Context — Time Pressure (https://www.onetonline.org/find/descriptor/result/4.C.3.d.1)
    - Never 0 ~ 100 Every day
- Work Context — Regular Work Schedules (https://www.onetonline.org/find/descriptor/result/4.C.3.d.4)
    - Regular/established schedule 0 ~ 100 Seasonal/only during certain times of the year
- Work Styles — Independence (https://www.onetonline.org/find/descriptor/result/1.C.6)
    - No independence 0 ~ 100 A lot of independence
    - "Job requires developing one's own ways of doing things, guiding oneself with little or no supervision, and depending on oneself to get things done."

These measures were selected as relevant to flexibility using the O\*Net browse tool (https://www.onetonline.org/find/descriptor/browse/).

In [None]:
free = pd.read_csv('../Freedom_to_Make_Decisions.csv')
struct = pd.read_csv('../Structured_versus_Unstructured_Work.csv')
time = pd.read_csv('../Time_Pressure.csv')
sched = pd.read_csv('../Work_Schedules.csv')
# File name assumed; the original line re-read Freedom_to_Make_Decisions.csv into both indep and free.
indep = pd.read_csv('../Independence.csv')

In [None]:
free = free.rename(columns = {'Context':'Freedom_to_Make_Decisions'})
struct = struct.rename(columns = {'Context':'Structured_v_Unstructured'})
time = time.rename(columns = {'Context':'Time_Pressure'})
sched = sched.rename(columns = {'Context':'Regular_Schedule'})
indep = indep.rename(columns = {'Context':'Independence'})

In [None]:
tmp = pd.merge(free, struct, how='left', on=['Code','Occupation'])
tmp = pd.merge(tmp, time, how='left', on=['Code','Occupation'])
tmp = pd.merge(tmp, sched, how='left', on=['Code','Occupation'])
tmp = pd.merge(tmp, indep, how='left', on=['Code','Occupation'])

onet = tmp
onet.rename(columns = {'Code': '2018 SOC Code'}, inplace=True) # for data straight from ONET
onet.head()

In [None]:
onet[['2018 SOC Code','drop']] = onet['2018 SOC Code'].str.split(".", expand=True)
onet.drop(columns='drop', inplace=True)

onet.head()

Trying to remove the 2010 Census code from the list in the DataFrame showed that there were missing values in the Census Code column. Investigation found two SOC codes that do not map through the dictionary.

- 2018 SOC Code 53-6031 is listed as 56-6031 in the Crosswalk file, but I am confident that this is a typo as it is listed under 53-6030. The corresponding 2010 Census code is 9360.
- 2018 SOC Code 17-3012 is listed with a space in the Crosswalk file that was not removed by the string split procedure above. The corresponding 2010 Census code is 1540.


In [None]:
onet['2010 Census Code'] = onet['2018 SOC Code'].map(crosswalk_dict)

onet.head()

In [None]:
nulls = onet[onet['2010 Census Code'].isna()]

nulls

In [None]:
CORRECTION1 = (onet['2018 SOC Code'] == '53-6031')
CORRECTION2 = (onet['2018 SOC Code'] == '17-3012')

# Assign scalar strings so the correction also works when several rows share the SOC code.
onet.loc[CORRECTION1, '2010 Census Code'] = '9360'
onet.loc[CORRECTION2, '2010 Census Code'] = '1540'

onet.head()

In [None]:
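def remove_from_list(x):
    # Manually corrected rows already hold a plain string; return it unchanged.
    if isinstance(x, str):
        return x
    # Mapped rows hold a list of Census codes; keep the first one.
    return x[0]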

In [None]:
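onet['2010 Census Code'] = onet['2010 Census Code'].apply(remove_from_list)

# Store the integer version; astype returns a new Series and does not modify the column in place.
onet['2010 Census Code'] = onet['2010 Census Code'].astype(int)
onet['2010 Census Code']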

In [None]:
onet

In [None]:
onet.to_csv('onet_flex.csv', index=False)