<a href="https://colab.research.google.com/github/pvpogorelova/metrics_24_25/blob/main/sem_21.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Seminar 21. Simultaneous equations models (SEM).**

In [6]:
!pip install linearmodels



Consider an example of a system of two simultaneous regression equations.
The MROZ dataset (T.A. Mroz (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions") contains 753 observations on the following 22 variables:

* inlf: =1 if in the labor force, 1975
* hours: hours worked, 1975
* kidslt6: number of children under 6 years old
* kidsge6: number of children aged 6-18
* age: woman's age in years
* educ: years of schooling
* wage: estimated wage from earnings and hours
* repwage: reported wage at interview in 1976
* hushrs: hours worked by husband, 1975
* husage: husband's age
* huseduc: husband's years of schooling
* huswage: husband's hourly wage, 1975
* faminc: family income, 1975
* mtr: federal marginal tax rate facing the woman
* motheduc: mother's years of schooling
* fatheduc: father's years of schooling
* unem: unemployment rate in county of residence
* city: =1 if living in an SMSA
* exper: actual labor market experience
* nwifeinc: (faminc - wage*hours)/1000
* lwage: log(wage)
* expersq: exper^2
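
The last three variables are derived from the others. As a small illustration of those definitions, the sketch below computes them for a single made-up record; the values and the helper name `add_derived` are hypothetical and are not taken from the MROZ data.

```python
import math

# Hypothetical record (values invented for illustration, not from MROZ).
record = {"faminc": 20000.0, "wage": 4.0, "hours": 1500.0, "exper": 10.0}

def add_derived(rec):
    """Return a copy of rec with nwifeinc, lwage and expersq added."""
    out = dict(rec)
    # Non-wife income in thousands: family income minus the wife's earnings.
    out["nwifeinc"] = (rec["faminc"] - rec["wage"] * rec["hours"]) / 1000
    out["lwage"] = math.log(rec["wage"])      # natural logarithm of the wage
    out["expersq"] = rec["exper"] ** 2        # experience squared
    return out

derived = add_derived(record)
print(derived["nwifeinc"], derived["lwage"], derived["expersq"])
```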




In [3]:
from linearmodels.datasets import mroz
from linearmodels import IV2SLS, IV3SLS, SUR

data = mroz.load()

In [4]:
# Keep only the variables of interest
data = data[
    ["hours", "educ", "age", "kidslt6", "nwifeinc", "lwage", "exper", "expersq"]
]
# Drop rows with missing values (lwage is not observed for women who did not work)
data = data.dropna()

The variables $lwage$ (log wage) and $hours$ (hours worked per year) are endogenous. Below we specify one equation for each of them and estimate each equation by 2SLS.
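
Written out, the two structural equations implied by the formulas used in this seminar look roughly as follows. The coefficient symbols $\alpha_i$, $\beta_{ij}$ and the error terms $u_1$, $u_2$ are notation introduced here for illustration. No intercept appears because linearmodels formulas do not add a constant unless `1 +` is written explicitly.

$$
\begin{aligned}
hours &= \alpha_1\, lwage + \beta_{11}\, educ + \beta_{12}\, age + \beta_{13}\, kidslt6 + \beta_{14}\, nwifeinc + u_1, \\
lwage &= \alpha_2\, hours + \beta_{21}\, educ + \beta_{22}\, exper + \beta_{23}\, expersq + u_2.
\end{aligned}
$$

Each equation contains the other equation's dependent variable as a regressor, which is why $lwage$ and $hours$ are correlated with the errors and OLS is not appropriate.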

In [5]:
# Estimate the hours equation, instrumenting lwage with the exogenous variables exper and expersq
hours = "hours ~ educ + age + kidslt6 + nwifeinc + [lwage ~ exper + expersq]"
hours_mod = IV2SLS.from_formula(hours, data)
hours_res = hours_mod.fit(cov_type="unadjusted")
print(hours_res)


                          IV-2SLS Estimation Summary                          
Dep. Variable:                  hours   R-squared:                      0.1903
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1807
No. Observations:                 428   F-statistic:                    399.30
Date:                Fri, Mar 07 2025   P-value (F-stat)                0.0000
Time:                        08:43:07   Distribution:                  chi2(5)
Cov. Estimator:            unadjusted                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
educ          -99.299     48.997    -2.0266     0.0427     -195.33     -3.2666
age            19.429     6.2770     3.0952     0.00
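
To see what a 2SLS estimator does under the hood, the two stages can be written out by hand. The sketch below uses NumPy on synthetic data: the data-generating process, the seed, and the helper name `two_stage_least_squares` are assumptions made for illustration. It does not use the MROZ data, computes point estimates only, and is not the linearmodels implementation.

```python
import numpy as np

def two_stage_least_squares(y, exog, endog, instruments):
    """2SLS point estimates; coefficients are ordered as [exog, endog]."""
    Z = np.column_stack([exog, instruments])   # all exogenous variables
    # Stage 1: project the endogenous regressors on Z.
    endog_hat = Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]
    # Stage 2: regress y on the exogenous regressors and the fitted values.
    X_hat = np.column_stack([exog, endog_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=(n, 1))    # included exogenous regressor
z = rng.normal(size=(n, 2))    # excluded instruments
u = rng.normal(size=n)         # structural error
# The endogenous regressor w depends on u, so OLS is inconsistent.
w = z @ np.array([1.0, -0.5]) + 0.5 * x[:, 0] + 0.8 * u + rng.normal(size=n)
y = 2.0 * x[:, 0] + 1.5 * w + u    # true coefficients: 2.0 on x, 1.5 on w

beta_ols = np.linalg.lstsq(np.column_stack([x, w]), y, rcond=None)[0]
beta_2sls = two_stage_least_squares(y, x, w.reshape(-1, 1), z)
print("OLS :", beta_ols)     # the coefficient on w is biased upward
print("2SLS:", beta_2sls)    # approximately recovers the true coefficients
```

In the hours equation the roles are analogous: `educ`, `age`, `kidslt6` and `nwifeinc` play the part of `x`, `lwage` the part of `w`, and `exper`, `expersq` the part of `z`.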

In [None]:
# Estimate the log-wage equation, instrumenting hours with the exogenous variables age, kidslt6 and nwifeinc
lwage = "lwage ~ educ + exper + expersq + [hours ~ age + kidslt6 + nwifeinc]"
lwage_mod = IV2SLS.from_formula(lwage, data)
lwage_res = lwage_mod.fit(cov_type="unadjusted")
print(lwage_res)

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.7582
Estimator:                    IV-2SLS   Adj. R-squared:                 0.7559
No. Observations:                 428   F-statistic:                    1362.4
Date:                Thu, Mar 06 2025   P-value (F-stat)                0.0000
Time:                        16:58:49   Distribution:                  chi2(4)
Cov. Estimator:            unadjusted                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
educ           0.0875     0.0162     5.3892     0.0000      0.0557      0.1193
exper          0.0524     0.0299     1.7501     0.08
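
Both equations can be checked against the order condition for identification: the number of exogenous variables excluded from an equation must be at least the number of endogenous regressors it contains. The sketch below counts these from the variable lists in the two formulas; the helper `order_condition` and the dictionary layout are illustrative names, not part of linearmodels. The order condition is necessary but not sufficient, and the rank condition is not checked here.

```python
# Variables taken from the two formulas used in this seminar.
equations = {
    "hours": {"exog": ["educ", "age", "kidslt6", "nwifeinc"], "endog": ["lwage"]},
    "lwage": {"exog": ["educ", "exper", "expersq"], "endog": ["hours"]},
}

# All exogenous variables appearing anywhere in the system.
all_exog = sorted({v for eq in equations.values() for v in eq["exog"]})

def order_condition(name):
    """Return (excluded exogenous variables, identification status)."""
    eq = equations[name]
    excluded = [v for v in all_exog if v not in eq["exog"]]
    n_excluded, n_endog = len(excluded), len(eq["endog"])
    if n_excluded < n_endog:
        status = "underidentified"
    elif n_excluded == n_endog:
        status = "exactly identified"
    else:
        status = "overidentified"
    return excluded, status

for name in equations:
    excluded, status = order_condition(name)
    print(f"{name}: excluded exogenous = {excluded} -> {status}")
```

The excluded variables found this way are exactly the instruments listed in square brackets in each formula.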

In [None]:
# The two equations can also be estimated jointly as a system
system = dict(hours=hours, lwage=lwage)
system_2sls = IV3SLS.from_formula(system, data)
# method="ols" is equivalent to estimating each equation by 2SLS (compare with the results above)
system_2sls_res = system_2sls.fit(method="ols", cov_type="unadjusted")
print(system_2sls_res)


                           System OLS Estimation Summary                           
Estimator:                        OLS   Overall R-squared:                   0.1903
No. Equations.:                     2   McElroy's R-squared:                 0.1276
No. Observations:                 428   Judge's (OLS) R-squared:            -2.0961
Date:                Thu, Mar 06 2025   Berndt's R-squared:                 -0.7279
Time:                        16:58:58   Dhrymes's R-squared:                 0.1903
                                        Cov. Estimator:                  unadjusted
                                        Num. Constraints:                      None
                  Equation: hours, Dependent Variable: hours                  
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
educ          -99.299     48.997    -2.0266     0.0427     -195.33     -3.2666
age         
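
One way to see why `method="ols"` reproduces the single-equation estimates: without cross-equation restrictions, the stacked system has a block-diagonal regressor matrix, so least squares on the stack separates into one regression per equation. The following NumPy check on synthetic data is a sketch under that assumption; all names and data are invented for illustration, and ordinary regressors stand in for the first-stage fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=(n, 3))   # regressors of equation 1
X2 = rng.normal(size=(n, 2))   # regressors of equation 2
y1 = X1 @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
y2 = X2 @ np.array([0.3, 4.0]) + rng.normal(size=n)

# Equation-by-equation least squares.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]

# Stacked system: block-diagonal regressors, stacked dependent variables.
X_stacked = np.block([
    [X1, np.zeros((n, 2))],
    [np.zeros((n, 3)), X2],
])
y_stacked = np.concatenate([y1, y2])
b_system = np.linalg.lstsq(X_stacked, y_stacked, rcond=None)[0]

# The system estimate is the two single-equation estimates side by side.
print(np.allclose(b_system, np.concatenate([b1, b2])))
```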

In [None]:
# 3SLS, with GLS in the third step, can make the estimates more efficient
system = dict(hours=hours, lwage=lwage)
system_3sls = IV3SLS.from_formula(system, data)
# estimation method: GLS in the third step
system_3sls_res = system_3sls.fit(method="gls", cov_type="unadjusted")
print(system_3sls_res)

                           System GLS Estimation Summary                           
Estimator:                        GLS   Overall R-squared:                   0.0120
No. Equations.:                     2   McElroy's R-squared:                 0.0873
No. Observations:                 428   Judge's (OLS) R-squared:            -2.7778
Date:                Thu, Mar 06 2025   Berndt's R-squared:                 -0.7279
Time:                        16:59:31   Dhrymes's R-squared:                 0.0120
                                        Cov. Estimator:                  unadjusted
                                        Num. Constraints:                      None
                  Equation: hours, Dependent Variable: hours                  
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
educ          -109.90     48.052    -2.2870     0.0222     -204.08     -15.716
age         
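
The three steps of 3SLS can also be sketched by hand. The example below simulates a two-equation simultaneous system with correlated errors and computes 2SLS and 3SLS point estimates with NumPy. The data-generating process, the coefficient values and the helper `project` are assumptions made for illustration; the code is not the linearmodels implementation and may differ from it in details such as how the error covariance is scaled.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x1, x2, x3 = rng.normal(size=(3, n))      # exogenous variables
# Structural errors, correlated across the two equations.
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

a1, b1 = 0.5, 1.0              # eq. 1: y1 = a1*y2 + b1*x1 + u1
a2, b2, b3 = -0.4, 2.0, 1.0    # eq. 2: y2 = a2*y1 + b2*x2 + b3*x3 + u2
c1 = b1 * x1 + u[:, 0]
c2 = b2 * x2 + b3 * x3 + u[:, 1]
det = 1.0 - a1 * a2
y1 = (c1 + a1 * c2) / det      # reduced form: solve the two equations
y2 = (c2 + a2 * c1) / det

Z = np.column_stack([x1, x2, x3])          # instruments: all exogenous variables

def project(A):
    """Fitted values from regressing the columns of A on Z."""
    return Z @ np.linalg.lstsq(Z, A, rcond=None)[0]

X = [np.column_stack([y2, x1]), np.column_stack([y1, x2, x3])]
y = [y1, y2]

# Step 1: first-stage fitted regressors.
X_hat = [project(Xi) for Xi in X]
# Step 2: 2SLS per equation; residuals use the original regressors.
b_2sls = [np.linalg.lstsq(Xh, yi, rcond=None)[0] for Xh, yi in zip(X_hat, y)]
resid = np.column_stack([yi - Xi @ bi for yi, Xi, bi in zip(y, X, b_2sls)])
sigma = resid.T @ resid / n                # estimated error covariance
# Step 3: GLS on the stacked system, weighted by the inverse of sigma.
s_inv = np.linalg.inv(sigma)
offsets = [0, 2, 5]                        # coefficient positions per equation
lhs = np.zeros((5, 5))
rhs = np.zeros(5)
for i in range(2):
    rows = slice(offsets[i], offsets[i + 1])
    for j in range(2):
        cols = slice(offsets[j], offsets[j + 1])
        lhs[rows, cols] = s_inv[i, j] * (X_hat[i].T @ X_hat[j])
        rhs[rows] += s_inv[i, j] * (X_hat[i].T @ y[j])
b_3sls = np.linalg.solve(lhs, rhs)

print("2SLS:", np.concatenate(b_2sls))
print("3SLS:", b_3sls)   # order: a1, b1, a2, b2, b3
```

The efficiency gain comes from the third step: when the errors of the two equations are correlated and at least one equation is overidentified, weighting by the estimated error covariance uses information that equation-by-equation 2SLS ignores.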