### Estimação de um Probit sobre as características que influenciam no trabalho remoto potencial

Verificar as variáveis que mais afetam o trabalho remoto potencial no Brasil.

A base de dados utilizada é a PNAD Contínua do IBGE, referente ao 4º trimestre de 2019.

In [1]:
import pandas as pd
import numpy as np

import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Probit

In [10]:
pnad = pd.read_parquet('../Dados/pnad/pnad_2019_potencial.parquet')

In [12]:
pnad.shape

(231285, 19)

In [13]:
pnad.dropna(inplace=True)

#### Variável "UF"

In [4]:
estados = pnad[["UF","teletrabalho"]].dropna()

In [5]:
X_uf = pd.get_dummies(estados['UF'])
y_uf = estados['teletrabalho']

In [13]:
X_uf = sm.add_constant(X_uf)
model = Probit(y_uf, X_uf)
probit_model = model.fit()
print(probit_model.summary())

Optimization terminated successfully.
         Current function value: 0.493229
         Iterations 26
                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   230102
Method:                           MLE   Df Model:                           27
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                0.007913
Time:                        10:43:51   Log-Likelihood:            -1.1351e+05
converged:                       True   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -0.8392        nan        nan        nan         nan         nan
A

#### Variável "Sexo"

In [19]:
sexo = pnad[["V2007","teletrabalho"]].dropna()

In [23]:
X_sexo = pd.get_dummies(sexo['V2007'])
y_sexo = sexo['teletrabalho']

In [25]:
X_sexo = sm.add_constant(X_sexo)
model = Probit(y_sexo, X_sexo)
probit_model = model.fit()
print(probit_model.summary())

         Current function value: 0.481989
         Iterations: 35




                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   230127
Method:                           MLE   Df Model:                            2
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                 0.03052
Time:                        10:47:34   Log-Likelihood:            -1.1092e+05
converged:                      False   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.5586        nan        nan        nan         nan         nan
Homem         -0.5314        nan        nan        nan         nan         nan
Mulher        -0.0273        nan        nan        n

#### Variável "Cor ou Raça"

In [27]:
cor_raca = pnad[["V2010","teletrabalho"]].dropna()

In [28]:
X_cor_raca = pd.get_dummies(cor_raca['V2010'])
y_cor_raca = cor_raca['teletrabalho']

In [30]:
X_cor_raca = sm.add_constant(X_cor_raca)
model = Probit(y_cor_raca, X_cor_raca)
probit_model = model.fit()
print(probit_model.summary())

         Current function value: 0.490284
         Iterations: 35




                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   230123
Method:                           MLE   Df Model:                            6
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                 0.01384
Time:                        10:49:47   Log-Likelihood:            -1.1283e+05
converged:                      False   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.6920   5.26e+04  -1.32e-05      1.000   -1.03e+05    1.03e+05
Amarela        0.0758   5.26e+04   1.44e-06      1.000   -1.03e+05    1.03e+05
Branca         0.0282   5.26e+04   5.36e-07      1.0

#### Variável "Educação"

In [32]:
educacao = pnad[["V3009A","teletrabalho"]].dropna()

In [33]:
X_educacao = pd.get_dummies(educacao['V3009A'])
y_educacao = educacao['teletrabalho']

In [35]:
X_educacao = sm.add_constant(X_educacao)
model = Probit(y_educacao, X_educacao)
probit_model = model.fit()
print(probit_model.summary())

Optimization terminated successfully.
         Current function value: 0.352202
         Iterations 7
                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               205058
Model:                         Probit   Df Residuals:                   205043
Method:                           MLE   Df Model:                           14
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                  0.2584
Time:                        10:51:22   Log-Likelihood:                -72222.
converged:                       True   LL-Null:                       -97383.
Covariance Type:            nonrobust   LLR p-value:                     0.000
                                                                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------------------------------
const       

#### Variável "Setor de ocupação"

In [38]:
ocupacao = pnad[["V4012","teletrabalho"]].dropna()

In [39]:
X_ocupacao = pd.get_dummies(ocupacao['V4012'])
y_ocupacao = ocupacao['teletrabalho']

In [40]:
X_educacao = sm.add_constant(X_ocupacao)
model = Probit(y_ocupacao, X_ocupacao)
probit_model = model.fit()
print(probit_model.summary())

         Current function value: 0.439999
         Iterations: 35




                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   230123
Method:                           MLE   Df Model:                            6
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                  0.1150
Time:                        10:54:32   Log-Likelihood:            -1.0126e+05
converged:                      False   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                                                                                                           coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Conta própria                 

### Probit para mais de uma variável

#### UF e Sexo

In [42]:
estados_sexo = pnad[["UF","V2007","teletrabalho"]].dropna()

In [50]:
estados_sexo['UF_sexo'] = estados_sexo['UF'] + "_" + estados_sexo['V2007']

In [52]:
X_uf_sexo = pd.get_dummies(estados_sexo[['UF_sexo']])
y_uf_sexo = estados_sexo['teletrabalho']

In [53]:
X_uf_sexo = sm.add_constant(X_uf_sexo)
model = Probit(y_uf_sexo, X_uf_sexo)
probit_model = model.fit()
print(probit_model.summary())

Optimization terminated successfully.
         Current function value: 0.477783
         Iterations 18
                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   230075
Method:                           MLE   Df Model:                           54
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                 0.03898
Time:                        11:05:14   Log-Likelihood:            -1.0995e+05
converged:                       True   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                                         coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
const                                 -0.8418        nan   

#### UF e Cor ou Raça

In [63]:
estados_cor_raca = pnad[["UF","V2010","teletrabalho"]].dropna()

In [64]:
estados_cor_raca['UF_cor_raca'] = estados_cor_raca['UF'] + "_" + estados_cor_raca['V2010']

In [65]:
X_uf_cor_raca = pd.get_dummies(estados_cor_raca[['UF_cor_raca']])
y_uf_cor_raca = estados_cor_raca['teletrabalho']

In [66]:
X_uf_cor_raca = sm.add_constant(X_uf_cor_raca)
model = Probit(y_uf_cor_raca, X_uf_cor_raca)
probit_model = model.fit()
print(probit_model.summary())

         Current function value: 0.487065
         Iterations: 35




                          Probit Regression Results                           
Dep. Variable:           teletrabalho   No. Observations:               230130
Model:                         Probit   Df Residuals:                   229981
Method:                           MLE   Df Model:                          148
Date:                Fri, 28 Apr 2023   Pseudo R-squ.:                 0.02031
Time:                        11:16:23   Log-Likelihood:            -1.1209e+05
converged:                      False   LL-Null:                   -1.1441e+05
Covariance Type:            nonrobust   LLR p-value:                     0.000
                                               coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------------
const                                       -1.1002        nan        nan        nan         nan         nan
UF_cor_raca_Acre_Amarela                 