
# Quickstart: Political Democracy
The "Political Democracy" model from Kenneth A. Bollen's book is included in the **semopy** package. The model description and the corresponding data can be retrieved by calling the *get_model* and *get_data* functions from the package's *political_democracy* submodule.

In [2]:
import numpy as np
import semopy
from semopy.political_democracy import get_data, get_model

np.random.seed(2019)
model_desc = get_model()
data = get_data()

In [3]:
print(model_desc)

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
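
As a rough cross-check of the description above, the sketch below counts the free parameters it implies and derives the degrees of freedom. It rests on two assumptions that the description does not state: the first loading of every latent variable is fixed to 1, and one variance is estimated for every variable. `count_free_parameters` is a hypothetical helper written for this illustration, not part of **semopy**.

```python
MODEL_DESC = """
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
"""


def count_free_parameters(desc):
    """Hypothetical helper: count free parameters and degrees of freedom."""
    loadings = regressions = covariances = 0
    latent, observed = set(), set()
    for line in desc.splitlines():
        line = line.split('#')[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Test the two-character operators before the bare '~'.
        op = next((o for o in ('=~', '~~', '~') if o in line), None)
        if op is None:
            continue
        lhs, rhs = (part.strip() for part in line.split(op, 1))
        terms = [t.strip() for t in rhs.split('+')]
        if op == '=~':
            latent.add(lhs)
            observed.update(terms)
            loadings += len(terms) - 1  # assumption: first loading fixed to 1
        elif op == '~~':
            covariances += len(terms)
        else:
            regressions += len(terms)
    variances = len(latent) + len(observed)  # assumption: one per variable
    n_params = loadings + regressions + covariances + variances
    p = len(observed)
    n_moments = p * (p + 1) // 2  # unique entries of the sample covariance
    return n_params, n_moments - n_params


n_params, dof = count_free_parameters(MODEL_DESC)
print(n_params, dof)
```

Under these assumptions the count is 31 free parameters against 66 unique covariance entries, which leaves 35 degrees of freedom.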



In [5]:
data

Unnamed: 0,y1,y2,y3,y4,y5,y6,y7,y8,x1,x2,x3
1,2.50,0.000000,3.333333,0.000000,1.250000,0.000000,3.726360,3.333333,4.442651,3.637586,2.557615
2,1.25,0.000000,3.333333,0.000000,6.250000,1.100000,6.666666,0.736999,5.384495,5.062595,3.568079
3,7.50,8.800000,9.999998,9.199991,8.750000,8.094061,9.999998,8.211809,5.961005,6.255750,5.224433
4,8.90,8.800000,9.999998,9.199991,8.907948,8.127979,9.999998,4.615086,6.285998,7.567863,6.267495
5,10.00,3.333333,9.999998,6.666666,7.500000,3.333333,9.999998,6.666666,5.863631,6.818924,4.573679
...,...,...,...,...,...,...,...,...,...,...,...
71,5.40,9.999998,6.666666,3.333333,3.750000,6.666666,6.666666,1.485166,4.477337,3.091042,1.987909
72,7.50,8.800000,9.999998,6.066666,7.500000,6.666666,9.999998,6.666666,5.337538,5.631212,3.491004
73,7.50,7.000000,9.999998,6.852998,7.500000,6.348340,6.666666,7.508044,6.129050,6.403574,5.001796
74,10.00,6.666666,9.999998,10.000000,10.000000,6.666666,9.999998,10.000000,5.003946,4.962845,3.976994


Now we can create an instance of *Model* from *model_desc* and load the data into it.

In [6]:
from semopy import Model

model = Model(model_desc)
model.load_dataset(data)

Next, we create an *Optimizer* by passing *model* to it. We can create as many *Optimizer* instances as we want; each of them performs an independent optimisation sequence.
Suppose we want to see the estimates obtained by minimising the Wishart maximum likelihood ratio (MLW), ULS and GLS objectives:

In [7]:
from semopy import Optimizer

opt_mlw = Optimizer(model)
opt_uls = Optimizer(model)
opt_gls = Optimizer(model)

# And now, we run the optimisation sequences.
lf_mlw = opt_mlw.optimize(objective='MLW') # Although MLW is default, we still provide it here for clarity.
lf_uls = opt_uls.optimize(objective='ULS')
lf_gls = opt_gls.optimize(objective='GLS')

print("Resultant objective functions' values are:")
print('MLW: {:.3f}, ULS: {:.3f}, GLS: {:.3f}'.format(lf_mlw, lf_uls, lf_gls))

Resultant objective functions' values are:
MLW: 0.508, ULS: 7.097, GLS: 0.972
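
To give a feel for what the three objectives measure, here is a minimal NumPy sketch of their textbook forms, where `S` is the sample covariance matrix and `Sigma` the model-implied one. The function names are hypothetical, and the scaling constants follow the common textbook convention; **semopy** may scale or implement them differently.

```python
import numpy as np


def f_mlw(S, Sigma):
    """Textbook ML discrepancy: tr(S Sigma^-1) + ln|Sigma| - ln|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return np.trace(S @ np.linalg.inv(Sigma)) + logdet_sigma - logdet_s - p


def f_uls(S, Sigma):
    """Unweighted least squares: 1/2 tr((S - Sigma)^2)."""
    D = S - Sigma
    return 0.5 * np.trace(D @ D)


def f_gls(S, Sigma):
    """Generalised least squares: 1/2 tr((I - Sigma S^-1)^2)."""
    D = np.eye(S.shape[0]) - Sigma @ np.linalg.inv(S)
    return 0.5 * np.trace(D @ D)


# Toy 2x2 example (made-up numbers, not the Political Democracy data).
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Sigma = np.diag([2.0, 1.0])  # a model that ignores the covariance

for name, f in (('MLW', f_mlw), ('ULS', f_uls), ('GLS', f_gls)):
    print('{}: {:.3f}'.format(name, f(S, Sigma)))
```

All three functions are zero when `Sigma` equals `S` and positive otherwise, but each measures the misfit on its own scale.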


Let's also try minimising the MLW objective, but instead of the default SLSQP nonlinear solver we will use Adam with chunk_size=25 and num_epochs=1000:

In [8]:
opt_mlw_adam = Optimizer(model)
lf_mlw_adam = opt_mlw_adam.optimize(objective='MLW', method='Adam', chunk_size=25, num_epochs=1000)
print('MLW after Adam: {:.3f}'.format(lf_mlw_adam))

MLW after Adam: 0.513
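
For readers unfamiliar with Adam, the following pure-Python sketch shows the standard update rule applied to chunks of data over a number of epochs. It is a toy illustration, not **semopy**'s implementation: the problem (estimating a mean by least squares), the data, the learning rate and the reading of `chunk_size` as a mini-batch size are all assumptions made for this example.

```python
import math


def adam_minimize(grad, theta, chunks, num_epochs=1000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a scalar parameter with the standard Adam update rule."""
    m = v = 0.0  # running averages of the gradient and its square
    t = 0
    for _ in range(num_epochs):
        for chunk in chunks:
            t += 1
            g = grad(theta, chunk)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)  # bias correction
            v_hat = v / (1 - beta2 ** t)
            theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Made-up deterministic data: 75 values, split into chunks of 25.
data = [(37 * i % 100) / 10 for i in range(75)]
chunk_size = 25
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def grad(theta, chunk):
    """Gradient of the mean squared error between theta and one chunk."""
    return 2 * (theta - sum(chunk) / len(chunk))


estimate = adam_minimize(grad, 0.0, chunks)
print('estimate: {:.2f}, sample mean: {:.2f}'.format(estimate,
                                                     sum(data) / len(data)))
```

Because every step sees only one chunk, the estimate hovers near the optimum rather than landing on it exactly. A deterministic solver started from that point can then polish the result.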


Note that results for the same model cannot be compared on the basis of the values of different loss functions. Fit indices (we will compute them at the end of this notebook) are a valid measure for such a comparison.

We can also run further optimisation sequences on the same *Optimizer*; each one uses the previous parameter estimates as starting values:

In [9]:
lf_mlw_adam_slsqp = opt_mlw_adam.optimize(method='SLSQP')
print('MLW after Adam after SLSQP: {:.3f}'.format(lf_mlw_adam_slsqp))

MLW after Adam after SLSQP: 0.508


The *inspector* module of **semopy** contains the *inspect* function, which retrieves information on parameter estimates in a user-friendly form. It has two display modes: 'list' (the default) and 'mx'. Let's try 'list' first:

In [10]:
from semopy.inspector import inspect

print(inspect(opt_mlw, mode='list'))

     lval  op   rval     Value        SE    Z-score       P-value
5   dem60  =~     y2  1.256759  0.182449   6.888264  5.647705e-12
6   dem60  =~     y3  1.057701  0.151394   6.986417  2.819966e-12
7   dem60  =~     y4  1.264819  0.145013   8.722092  0.000000e+00
8   dem65  =~     y6  1.185704  0.168814   7.023728  2.160272e-12
9   dem65  =~     y7  1.279501  0.159904   8.001694  1.332268e-15
10  dem65  =~     y8  1.265968  0.158112   8.006805  1.110223e-15
3   ind60  =~     x2  2.180344  0.138503  15.742231  0.000000e+00
4   ind60  =~     x3  1.818498  0.151953  11.967497  0.000000e+00
0   dem60   ~  ind60  1.482939  0.399133   3.715402  2.028809e-04
1   dem65   ~  dem60  0.837351  0.098353   8.513720  0.000000e+00
2   dem65   ~  ind60  0.572345  0.221309   2.586173  9.704827e-03
11  dem60  ~~  dem60  3.955856  0.921163   4.294415  1.751547e-05
12  dem65  ~~  dem65  0.172495  0.214804   0.803034  4.219551e-01
13  ind60  ~~  ind60  0.448458  0.086695   5.172836  2.305675e-07
14     x1 
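
The Z-score and P-value columns in the table above are consistent with a two-sided Wald test: the estimate divided by its standard error, compared against a standard normal distribution. The sketch below reproduces one row under that assumption. `wald_test` is a hypothetical helper, and the inputs are the rounded numbers printed above.

```python
import math


def wald_test(value, se):
    """Return the z-score and the two-sided normal p-value of an estimate."""
    z = value / se
    # 2 * (1 - Phi(|z|)) written with erfc, which stays accurate in the tail.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Row "dem60 ~ ind60": Value = 1.482939, SE = 0.399133.
z, p = wald_test(1.482939, 0.399133)
print('z = {:.4f}, p = {:.3e}'.format(z, p))
```

For very large z-scores this formula returns a tiny positive number, whereas the table shows 0.000000e+00, presumably because of floating-point rounding.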

We might also want to take a peek at starting values:

In [11]:
print(inspect(opt_mlw, mode='list', what='start'))

     lval  op   rval     Value         SE   Z-score   P-value
5   dem60  =~     y2  0.908818  17.247656  0.052692  0.957977
6   dem60  =~     y3  0.848845  12.967861  0.065458  0.947810
7   dem60  =~     y4  0.885162  16.123320  0.054899  0.956219
8   dem65  =~     y6  0.729128  14.418897  0.050568  0.959670
9   dem65  =~     y7  0.852861  14.045512  0.060721  0.951581
10  dem65  =~     y8  0.782249  14.947613  0.052333  0.958264
3   ind60  =~     x2  1.843737   2.241495  0.822548  0.410765
4   ind60  =~     x3  1.532953   1.802184  0.850609  0.394987
0   dem60   ~  ind60  0.000000   1.134786  0.000000  1.000000
1   dem65   ~  dem60  0.000000   4.275070  0.000000  1.000000
2   dem65   ~  ind60  0.000000   1.162828  0.000000  1.000000
11  dem60  ~~  dem60  0.050000   0.958579  0.052161  0.958401
12  dem65  ~~  dem65  0.050000   1.004401  0.049781  0.960297
13  ind60  ~~  ind60  0.050000   0.072395  0.690654  0.489783
14     x1  ~~     x1  0.264993   0.079626  3.327970  0.000875
15     x

The other display mode is 'mx'. In this mode the parameter values are printed as matrices, with each value placed at its position:

In [12]:
print(inspect(opt_mlw, mode='mx'))

Beta:
          dem60  dem65     ind60
dem60  0.000000    0.0  1.482939
dem65  0.837351    0.0  0.572345
ind60  0.000000    0.0  0.000000
Lambda:
       dem60     dem65     ind60
x1  0.000000  0.000000  1.000000
x2  0.000000  0.000000  2.180344
x3  0.000000  0.000000  1.818498
y1  1.000000  0.000000  0.000000
y2  1.256759  0.000000  0.000000
y3  1.057701  0.000000  0.000000
y4  1.264819  0.000000  0.000000
y5  0.000000  1.000000  0.000000
y6  0.000000  1.185704  0.000000
y7  0.000000  1.279501  0.000000
y8  0.000000  1.265968  0.000000
Psi:
          dem60     dem65     ind60
dem60  3.955856  0.000000  0.000000
dem65  0.000000  0.172495  0.000000
ind60  0.000000  0.000000  0.448458
Theta:
          x1        x2      x3        y1        y2        y3        y4  \
x1  0.081547  0.000000  0.0000  0.000000  0.000000  0.000000  0.000000   
x2  0.000000  0.119805  0.0000  0.000000  0.000000  0.000000  0.000000   
x3  0.000000  0.000000  0.4667  0.000000  0.000000  0.000000  0.000000   
y1  0.
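
These matrices can be combined into model-implied covariances. The sketch below computes the covariance of the latent variables as (I - Beta)^-1 Psi (I - Beta)^-T, using the Beta and Psi values printed above. It assumes the usual convention that rows of Beta are the dependent variables, which matches the layout of the output. Variable names are chosen for this example only.

```python
import numpy as np

names = ['dem60', 'dem65', 'ind60']

# Values copied from the printed Beta and Psi matrices (rounded).
Beta = np.array([[0.000000, 0.0, 1.482939],
                 [0.837351, 0.0, 0.572345],
                 [0.000000, 0.0, 0.000000]])
Psi = np.diag([3.955856, 0.172495, 0.448458])

# eta = Beta @ eta + zeta  =>  eta = inv(I - Beta) @ zeta
inv = np.linalg.inv(np.eye(3) - Beta)
latent_cov = inv @ Psi @ inv.T

for name, row in zip(names, latent_cov):
    print(name, np.round(row, 4))
```

The implied covariance of the observed indicators would then be Lambda @ latent_cov @ Lambda.T + Theta. It is not computed here because the Theta matrix is cut off in the output above.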

The *stats* module provides various functions for calculating statistics and fit indices. There is also a function, *gather_statistics*, that invokes them all at once:

In [13]:
from semopy.stats import gather_statistics

s = gather_statistics(opt_mlw)
print(s)

SEMStatistics(dof=35.0, ml=-1547.7909442514658, fit_val=0.5083362656743571, chi2=(38.12521992557678, 0.3291802839852196), dof_baseline=55.0, chi2_baseline=730.6540868219107, rmsea=0.03473684815745109, cfi=0.9953745267193204, gfi=0.947820424721898, agfi=0.9180035245629825, nfi=0.947820424721898, tli=0.9927313991303606, aic=3157.5818885029316, bic=3229.424020022557, params=[ParametersStatistics(value=1.4829393106346211, se=0.39913293740914707, zscore=3.715401991769875, pvalue=0.00020288089332520798), ParametersStatistics(value=0.8373509275114405, se=0.09835312202791924, zscore=8.513719851874454, pvalue=0.0), ParametersStatistics(value=0.5723445128998598, se=0.221309464452289, zscore=2.5861727798842624, pvalue=0.009704826892058538), ParametersStatistics(value=2.1803438171350416, se=0.1385028495837257, zscore=15.742230745954942, pvalue=0.0), ParametersStatistics(value=1.8184982814238366, se=0.15195310251261168, zscore=11.967496887804176, pvalue=0.0), ParametersStatistics(value=1.2567593617
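
Several of the printed statistics can be reproduced from a handful of the others with standard textbook formulas, which helps in reading the output. In the sketch below, `n_obs = 75` and `n_params = 31` are assumptions inferred from the printed numbers (chi2 / fit_val and the AIC, respectively), and the formulas are the common definitions. They are not taken from the **semopy** source.

```python
import math

# Values copied from the SEMStatistics output above.
dof, dof_baseline = 35.0, 55.0
fit_val = 0.5083362656743571
chi2_baseline = 730.6540868219107
loglik = -1547.7909442514658  # the 'ml' field

n_obs = 75     # assumption: inferred from chi2 / fit_val
n_params = 31  # assumption: inferred from the AIC and the log-likelihood

chi2 = n_obs * fit_val
excess = max(chi2 - dof, 0.0)  # misfit beyond what chance would explain

rmsea = math.sqrt(excess / (dof * (n_obs - 1)))
cfi = 1.0 - excess / max(chi2_baseline - dof_baseline, 0.0)
tli = ((chi2_baseline / dof_baseline - chi2 / dof)
       / (chi2_baseline / dof_baseline - 1.0))
nfi = 1.0 - chi2 / chi2_baseline
# The printed gfi coincides with nfi here, so the same value is reused.
agfi = 1.0 - (dof_baseline / dof) * (1.0 - nfi)
aic = 2 * n_params - 2 * loglik
bic = n_params * math.log(n_obs) - 2 * loglik

print('chi2={:.3f} rmsea={:.4f} cfi={:.4f} tli={:.4f} nfi={:.4f} '
      'agfi={:.4f} aic={:.2f} bic={:.2f}'.format(
          chi2, rmsea, cfi, tli, nfi, agfi, aic, bic))
```

Under these assumptions the computed values agree with the printed ones to at least six decimal places.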

A particular fit index or statistic can also be computed by calling the corresponding function from the *stats* module directly, which avoids unnecessary computations. For instance, suppose we want to calculate the GFI:

In [11]:
from semopy.stats import calc_gfi

print(calc_gfi(opt_gls))
print('MLW: {:.3f}, ULS: {:.3f}, GLS: {:.3f}, MLW after Adam after SLSQP: {:.3f}'.format(calc_gfi(opt_mlw),
                                                                                         calc_gfi(opt_uls),
                                                                                         calc_gfi(opt_gls),
                                                                                         calc_gfi(opt_mlw_adam)))

0.6844663290821422
MLW: 0.948, ULS: 0.997, GLS: 0.684, MLW after Adam after SLSQP: 0.948
