# Translating O*NET Occupation Codes to Brazilian Occupations (COD 2010)

To apply the Dingel and Neiman methodology, the O\*NET occupation codes must first be translated into the occupation codes used in Brazil.

**Step-by-step translation of the occupation codes, following the FGV article:**

1. First, we translate the 968 O\*NET occupations into the 2010 Standard Occupational Classification (SOC-2010), used in the U.S. household survey (Current Population Survey, CPS);
2. next, we reclassify the SOC-2010 occupations into ISCO-08; and
3. finally, we translate them into the COD (Classificação de Ocupações para as Pesquisas Domiciliares) used in PNAD Contínua.
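The three steps amount to truncating the O\*NET code and then composing two lookups. A minimal sketch, assuming toy one-to-one dictionaries: `soc_to_isco`, `isco_to_cod` and `onet_to_cod` are hypothetical names, and the code pairs are illustrative rather than taken from the official correspondence tables (which can be one-to-many).

```python
# Hypothetical, illustrative lookup tables (not the official crosswalks)
soc_to_isco = {"15-1132": "2512", "11-1011": "1120"}
isco_to_cod = {"2512": "2512", "1120": "1120"}


def onet_to_cod(onet_code):
    """Translate an O*NET-SOC code (e.g. '15-1132.00') to a COD code, or None."""
    soc = onet_code[:-3]            # step 1: drop the '.XX' O*NET suffix -> SOC-2010
    isco = soc_to_isco.get(soc)     # step 2: SOC-2010 -> ISCO-08
    if isco is None:
        return None
    return isco_to_cod.get(isco)    # step 3: ISCO-08 -> COD


print(onet_to_cod("15-1132.00"))  # 2512
print(onet_to_cod("99-9999.00"))  # None
```

Returning `None` for unknown codes mirrors what a left join does in pandas: the occupation is kept but its translation stays empty.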

The reclassifications follow the diagram below.

#### Diagram for translating the occupation codes (FGV, 2021)
![Diagram](../Anexos/diagrama_traducao.png)

O\*NET occupation database: [O\*NET OnLine](https://www.onetonline.org/find/all)

SOC-2010 occupation database: [U.S. BUREAU OF LABOR STATISTICS](https://www.bls.gov/soc/2010/home.htm)

ISCO-08 occupation database: [International Labour Organization](https://isco-ilo.netlify.app/en/isco-08/)

Repository of the Work at Home study (Dingel and Neiman): [Work at Home](https://github.com/jdingel)

CSV file with the teleworkable occupations: [occupations_workathome.csv](https://raw.githubusercontent.com/jdingel/DingelNeiman-workathome/master/occ_onet_scores/output/occupations_workathome.csv)



In [1]:
# Packages
import pandas as pd
import numpy as np

In [43]:
# Load the occupation tables

onet = pd.read_excel('../Anexos/Occupations_ONET.xlsx')  # O*NET occupations
soc2010 = pd.read_excel('../Anexos/Occupations_SOC-2010.xlsx')  # SOC-2010 occupations
isco08 = pd.read_excel('../Anexos/Occupations_ISCO-08.xlsx')  # ISCO-08 occupations
cod2010 = pd.read_excel('../Anexos/COD 2010.xlsx')  # COD 2010 occupations

# Dingel and Neiman's classification of O*NET occupations as teleworkable (1) or not (0)
onet_teleworkable = pd.read_csv('https://raw.githubusercontent.com/jdingel/DingelNeiman-workathome/master/occ_onet_scores/output/occupations_workathome.csv')

### Translating O\*NET (as filtered by Dingel and Neiman) to SOC-2010

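In this version the match uses the detailed SOC-2010 code (`XX-XXXX`): the `.XX` suffix of the O\*NET code is dropped and the result is compared with the `Detailed Occupation` column of SOC-2010.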
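An O\*NET-SOC code extends the six-digit SOC code with a two-digit suffix, so both the detailed code used here and the major-group code used in the discontinued version below can be recovered by slicing. A small sketch; `soc_parts` is a hypothetical helper, not part of the notebook:

```python
def soc_parts(onet_code):
    """Split an O*NET-SOC code such as '11-1011.03' into SOC levels."""
    detailed = onet_code.split(".")[0]  # '11-1011': SOC detailed occupation
    major = detailed[:2]                # '11': SOC major group
    return {"major": major, "detailed": detailed}


print(soc_parts("11-1011.03"))  # {'major': '11', 'detailed': '11-1011'}
```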

In [52]:
# Keep the detailed SOC code and its title; copy() avoids SettingWithCopyWarning
soc2010filtrado = soc2010[['Detailed Occupation', 'Title']].copy()
soc2010filtrado.columns = ['code_SOC', 'Title_SOC']
# Drop rows where the code or the title is missing
soc2010filtrado = soc2010filtrado.dropna()

# Keep only the major-group code (not used in this version)
#soc2010filtrado['code_SOC'] = soc2010filtrado['code_SOC'].str[:2]


In [46]:
onet_filtrado = onet_teleworkable.copy()
onet_filtrado.columns = ['code_ONET', 'Title_ONET', 'teleworkable']

# Drop the '.XX' O*NET suffix to get the 6-digit SOC code ('11-1011.03' -> '11-1011')
onet_filtrado['code_ONET'] = onet_filtrado['code_ONET'].str[:-3]

In [47]:
onet_filtrado.head()

|   | code_ONET | Title_ONET | teleworkable |
|---|---|---|---|
| 0 | 11-1011 | Chief Executives | 1 |
| 1 | 11-1011 | Chief Sustainability Officers | 1 |
| 2 | 11-1021 | General and Operations Managers | 1 |
| 3 | 11-2011 | Advertising and Promotions Managers | 1 |
| 4 | 11-2021 | Marketing Managers | 1 |


In [56]:
# Left join onet_filtrado and soc2010filtrado on code_ONET = code_SOC

ocupacoes_agregado = pd.merge(onet_filtrado, soc2010filtrado, how='left', left_on='code_ONET', right_on='code_SOC')
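A left join keeps every O\*NET occupation and leaves `code_SOC` empty when no SOC-2010 code matches. A sketch of how pandas' `indicator=True` option can list those rows; the two toy frames reuse the notebook's column names, and their rows are illustrative:

```python
import pandas as pd

# Toy frames with the notebook's column names (rows are illustrative)
onet_filtrado = pd.DataFrame({
    "code_ONET": ["11-1011", "19-1020"],
    "Title_ONET": ["Chief Executives", "Biologists"],
    "teleworkable": [1, 1],
})
soc2010filtrado = pd.DataFrame({
    "code_SOC": ["11-1011"],
    "Title_SOC": ["Chief Executives"],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = pd.merge(onet_filtrado, soc2010filtrado, how="left",
                  left_on="code_ONET", right_on="code_SOC", indicator=True)

# Rows found only on the left side have no SOC-2010 counterpart
unmatched = merged.loc[merged["_merge"] == "left_only", ["code_ONET", "Title_ONET"]]
print(unmatched)
```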

### Translating O\*NET to SOC-2010 (Discontinued)

In this version the translation uses the Major Group code (the first two characters of the code) as the key, with the occupation title carried along for reference.

In [3]:
# Keep the Major Group code and its title; copy() avoids SettingWithCopyWarning
soc2010filtrado = soc2010[['Major Group', 'Title']].copy()
soc2010filtrado.columns = ['code_SOC', 'Title_SOC']
soc2010filtrado = soc2010filtrado.dropna()

# Keep only the two-character major-group code
soc2010filtrado['code_SOC'] = soc2010filtrado['code_SOC'].str[:2]


In [4]:
# copy() avoids SettingWithCopyWarning when the code column is modified below
onet_filtrado = onet[['Code', 'Occupation']].copy()
onet_filtrado.columns = ['code_ONET', 'Title_ONET']

# Keep only the two-character major-group code
onet_filtrado['code_ONET'] = onet_filtrado['code_ONET'].str[:2]


In [5]:
# Left join onet_filtrado and soc2010filtrado on code_ONET = code_SOC

ocupacoes_agregado = pd.merge(onet_filtrado, soc2010filtrado, how='left', left_on='code_ONET', right_on='code_SOC')
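With a two-character key, this join is only safe if the SOC side has one row per major group; otherwise each O\*NET occupation would be repeated once per matching SOC row. pandas' `validate` argument can assert that condition. A toy sketch (rows chosen for illustration):

```python
import pandas as pd

onet_major = pd.DataFrame({
    "code_ONET": ["11", "11", "13"],
    "Title_ONET": ["Chief Executives", "Marketing Managers", "Accountants and Auditors"],
})
# One row per major group: the lookup side is unique, so validation passes
soc_major = pd.DataFrame({
    "code_SOC": ["11", "13"],
    "Title_SOC": ["Management Occupations",
                  "Business and Financial Operations Occupations"],
})
ok = pd.merge(onet_major, soc_major, how="left", left_on="code_ONET",
              right_on="code_SOC", validate="many_to_one")
print(len(ok))  # 3: one output row per O*NET occupation

# A duplicated key on the lookup side would multiply rows; validate= raises instead
soc_dup = pd.concat([soc_major, soc_major.iloc[[0]]], ignore_index=True)
raised = False
try:
    pd.merge(onet_major, soc_dup, how="left", left_on="code_ONET",
             right_on="code_SOC", validate="many_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```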

### Translating SOC-2010 to ISCO-08

In [59]:
isco08

|   | Level | ISCO 08 Code | Title EN | Definition | Tasks include | Included occupations | Excluded occupations | Notes |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Managers | Managers plan, direct, coordinate and evaluate... | Tasks performed by managers usually include: f... | Occupations in this major group are classified... | | In distinguishing between managers classified ... |
| 1 | 2 | 11 | Chief Executives, Senior Officials and Legisla... | Chief executives, senior officials and legisla... | Tasks performed by workers in this sub-major g... | Occupations in this sub-major group are classi... | | |
| 2 | 3 | 111 | Legislators and Senior Officials | Legislators and senior officials determine, fo... | Tasks performed usually include: presiding ove... | Occupations in this minor group are classified... | | |
| 3 | 4 | 1111 | Legislators | Legislators determine, formulate, and direct p... | Tasks include -\n(a) presiding over or partic... | Examples of the occupations classified here:\n... | | |
| 4 | 4 | 1112 | Senior Government Officials | Senior government officials advise governments... | Tasks include -\n(a) advising national, state... | Examples of the occupations classified here:\n... | | Chief executives of Government-owned enterpris... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 614 | 3 | 21 | Non-commissioned Armed Forces Officers | Non-commissioned armed forces officers enforce... | | Occupations in this minor group are classified... | Excluded from this group are: \n - jobs held ... | |
| 615 | 4 | 210 | Non-commissioned Armed Forces Officers | Non-commissioned armed forces officers enforce... | | Examples of the occupations classified here:\n... | Excluded from this group are: \n - jobs held ... | |
| 616 | 2 | 3 | Armed Forces Occupations, Other Ranks | Armed forces occupations, other ranks include ... | | Occupations in this sub-major group are classi... | Excluded from this group are: \n - jobs held ... | |
| 617 | 3 | 31 | Armed Forces Occupations, Other Ranks | Armed forces occupations, other ranks include ... | | Occupations in this minor group are classified... | parné\ndinero, posibles, cuartos, guita, mosca... | |


In [7]:
# copy() avoids SettingWithCopyWarning when the code column is converted below
isco08_filtrado = isco08[['ISCO 08 Code', 'Title EN']].copy()
isco08_filtrado.columns = ['code_ISCO', 'Title_ISCO']

In [8]:
# Convert the codes to strings so they can be compared with code_SOC
isco08_filtrado['code_ISCO'] = isco08_filtrado['code_ISCO'].astype(str)
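The `isco08` output above shows that codes beginning with zero were stored as numbers: the armed-forces minor group appears as `21` and its unit group as `210`, instead of `021` and `0210`. A plain `astype(str)` keeps them that way, so different groups can end up with the same key. A sketch, assuming the `Level` column gives the number of digits of each code (as in ISCO-08, where levels 1 to 4 have 1 to 4 digits), that restores the zeros; the toy rows follow the `isco08` layout:

```python
import pandas as pd

# Toy rows in the isco08 layout; codes read as numbers have lost leading zeros
isco = pd.DataFrame({
    "Level": [1, 2, 3, 4, 4],
    "ISCO 08 Code": [3, 3, 21, 210, 1111],
    "Title EN": ["Technicians and Associate Professionals",
                 "Armed Forces Occupations, Other Ranks",
                 "Non-commissioned Armed Forces Officers",
                 "Non-commissioned Armed Forces Officers",
                 "Legislators"],
})

# Plain astype(str): two different groups share the key "3"
naive = isco["ISCO 08 Code"].astype(str)
print(naive.duplicated().any())  # True

# Pad each code to as many digits as its level (level 4 -> 4 digits)
isco["code_ISCO"] = [str(int(code)).zfill(int(level))
                     for code, level in zip(isco["ISCO 08 Code"], isco["Level"])]
print(isco["code_ISCO"].tolist())  # ['3', '03', '021', '0210', '1111']
print(isco["code_ISCO"].duplicated().any())  # False
```

The same idea would apply to any other spreadsheet column where codes with leading zeros were read as numbers.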


In [9]:
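# Matches code_SOC and code_ISCO as literal strings; SOC and ISCO number their groups independently
ocupacoes_agregado = pd.merge(ocupacoes_agregado, isco08_filtrado, how='left', left_on='code_SOC', right_on='code_ISCO')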
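Matching `code_SOC` to `code_ISCO` by equal strings pairs groups that merely share a number: the title-join output further down pairs SOC 13 (Business and Financial Operations Occupations) with ISCO 13 (Production and Specialized Services Managers). Translating between the two classifications needs a correspondence table, such as the ISCO-08/SOC 2010 crosswalk published by the BLS. A sketch with a hypothetical two-column `crosswalk` frame (illustrative pairs, not read from the official file), which also shows that one SOC code may correspond to several ISCO codes:

```python
import pandas as pd

occupations = pd.DataFrame({
    "code_SOC": ["15-1132", "11-1011"],
    "Title_SOC": ["Software Developers, Applications", "Chief Executives"],
})
# Hypothetical SOC-2010 -> ISCO-08 correspondence table (illustrative pairs only)
crosswalk = pd.DataFrame({
    "code_SOC": ["15-1132", "11-1011", "11-1011"],
    "code_ISCO": ["2512", "1112", "1120"],
})

# Left join on the SOC code; a SOC code listed twice in the crosswalk yields two rows
translated = occupations.merge(crosswalk, on="code_SOC", how="left")
print(translated)
print(translated["code_SOC"].value_counts()["11-1011"])  # 2
```

How to handle one-to-many pairs (keep all, pick one, or split weights) is a methodological choice that this sketch leaves open.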

### Translating ISCO-08 to COD 2010

In [10]:
# Column 0 holds the COD code; copy() avoids SettingWithCopyWarning below
cod2010_filtrado = cod2010[[0, 'Titulação']].copy()
cod2010_filtrado.columns = ['code_COD', 'Title_COD']

In [11]:
# Convert the codes to strings so they can be matched with code_ISCO
cod2010_filtrado['code_COD'] = cod2010_filtrado['code_COD'].astype(str)


In [21]:
ocupacoes_agregado = pd.merge(ocupacoes_agregado, cod2010_filtrado, how='left', left_on='code_ISCO', right_on='code_COD')


In [65]:
ocupacoes_agregado.sort_values(by=['code_SOC'])

|   | code_ONET | Title_ONET | teleworkable | code_SOC | Title_SOC |
|---|---|---|---|---|---|
| 0 | 11-1011 | Chief Executives | 1 | 11-1011 | Chief Executives |
| 1 | 11-1011 | Chief Sustainability Officers | 1 | 11-1011 | Chief Executives |
| 2 | 11-1021 | General and Operations Managers | 1 | 11-1021 | General and Operations Managers |
| 3 | 11-2011 | Advertising and Promotions Managers | 1 | 11-2011 | Advertising and Promotions Managers |
| 4 | 11-2021 | Marketing Managers | 1 | 11-2021 | Marketing Managers |
| ... | ... | ... | ... | ... | ... |
| 967 | 53-7121 | Tank Car, Truck, and Ship Loaders | 0 | 53-7121 | Tank Car, Truck, and Ship Loaders |
| 81 | 13-1161 | Market Research Analysts and Marketing Special... | 1 | | |
| 212 | 19-1020 | Biologists | 1 | | |
| 458 | 29-2035 | Magnetic Resonance Imaging Technologists | 0 | | |


In [22]:
pd.merge(ocupacoes_agregado, onet_teleworkable, how='left', left_on='Title_ONET', right_on='title')

|   | code_ONET | Title_ONET | code_SOC | Title_SOC | code_ISCO | Title_ISCO | code_COD | Title_COD | onetsoccode | title | teleworkable |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | Accountants and Auditors | 13 | Business and Financial Operations Occupations | 13 | Production and Specialized Services Managers | 13 | DIRIGENTES E GERENTES DE PRODUÇÃO E OPERAÇÃO | | | |
| 1 | 27 | Actors | 27 | Arts, Design, Entertainment, Sports, and Media... | | | | | 27-2011.00 | Actors | 0.0 |
| 2 | 15 | Actuaries | 15 | Computer and Mathematical Occupations | | | | | 15-2011.00 | Actuaries | 1.0 |
| 3 | 29 | Acupuncturists | 29 | Healthcare Practitioners and Technical Occupat... | | | | | 29-1199.01 | Acupuncturists | 0.0 |
| 4 | 29 | Acute Care Nurses | 29 | Healthcare Practitioners and Technical Occupat... | | | | | 29-1141.01 | Acute Care Nurses | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1108 | 51 | Woodworkers, All Other | 51 | Production Occupations | 51 | Personal Service Workers | | | | | |
| 1109 | 51 | Woodworking Machine Setters, Operators, and Te... | 51 | Production Occupations | 51 | Personal Service Workers | | | 51-7042.00 | Woodworking Machine Setters, Operators, and Te... | 0.0 |
| 1110 | 43 | Word Processors and Typists | 43 | Office and Administrative Support Occupations | 43 | Numerical and Material Recording Clerks | | | 43-9022.00 | Word Processors and Typists | 1.0 |
| 1111 | 27 | Writers and Authors | 27 | Arts, Design, Entertainment, Sports, and Media... | | | | | | | |
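A join on titles only matches identical strings, so any difference in case or spacing leaves the right-hand columns empty. If a title join is kept, one option is to normalize both sides first. A sketch on toy rows; `normalize_title` is a hypothetical helper:

```python
import pandas as pd


def normalize_title(title):
    """Hypothetical helper: trim, collapse inner whitespace and ignore case."""
    return " ".join(str(title).split()).casefold()


left = pd.DataFrame({"Title_ONET": ["Actors", "Actuaries"]})
right = pd.DataFrame({"title": ["ACTORS", "Actuaries  "],
                      "teleworkable": [0, 1]})

# Build a normalized key on each side and join on it instead of the raw title
left["title_key"] = left["Title_ONET"].map(normalize_title)
right["title_key"] = right["title"].map(normalize_title)

matched = left.merge(right[["title_key", "teleworkable"]], on="title_key", how="left")
print(matched[["Title_ONET", "teleworkable"]])
```

Normalization only absorbs formatting differences; titles that were genuinely renamed between classification versions still need a code-based match.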


In [None]:
# Keep only the teleworkable occupations (flag from Dingel and Neiman, carried in onet_filtrado)
ocupacoes_agregado = ocupacoes_agregado[ocupacoes_agregado['teleworkable'] == 1]

In [15]:
ocupacoes_agregado.dropna()['Title_COD'].value_counts()

DIRETORES EXECUTIVOS, DIRIGENTES DA ADMINISTRAÇÃO PÚBLICA E MEMBROS DO PODER EXECUTIVO E LEGISLATIVO    118
PROFISSIONAIS DE TECNOLOGIAS DA INFORMAÇÃO E COMUNICAÇÕES                                                68
DIRIGENTES E GERENTES DE PRODUÇÃO E OPERAÇÃO                                                             50
PROFISSIONAIS DE NÍVEL MÉDIO DAS CIÊNCIAS E DA ENGENHARIA                                                40
PROFISSIONAIS DAS CIÊNCIAS E DA ENGENHARIA                                                               36
PROFISSIONAIS DO ENSINO                                                                                   8
Name: Title_COD, dtype: int64
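The counts above give the number of fully matched rows per COD group. Because `teleworkable` is a 0/1 flag, its mean within a group is the share of that group's O\*NET occupations classified as teleworkable. A minimal, unweighted sketch on toy rows (the group labels are hypothetical); turning it into a share of jobs would require employment weights, which this notebook does not load:

```python
import pandas as pd

# Toy rows: hypothetical COD groups with Dingel and Neiman's 0/1 teleworkable flag
ocupacoes = pd.DataFrame({
    "Title_COD": ["COD group A", "COD group A", "COD group A", "COD group B"],
    "teleworkable": [1, 1, 0, 0],
})

# Named aggregation: occupation count and mean of the flag per COD group
summary = ocupacoes.groupby("Title_COD")["teleworkable"].agg(
    n_occupations="size", share_teleworkable="mean")
print(summary)
```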