In [1]:
from IPython.core.display import HTML, Image
css_file = 'custom.css' 
HTML(open(css_file, 'r').read())

# *grmpy*  

*grmpy* can be used to estimate benefits, costs, and surplus using either your own data or simulated data.

    Eisenhauer, Philipp, James Heckman, and Edward Vytlacil (2015): The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs, Journal of Political Economy, 123(2): 413-443.


There are two functions: (1) estimate and (2) simulate.

The *grmpy* package includes a sample dataset, 'test.dat', and a sample initialization file, 'test.ini', to illustrate how the two functions work.

This tutorial walks through both functions using those sample files.

The *grmpy* package is maintained by [Philipp Eisenhauer](https://github.com/peisenha) and [Chase Corbin](https://github.com/cocorbin). We want to thank [Jake Torcasso](http://jaketorcasso.com/) and [Luke Schmerold](https://www.linkedin.com/pub/luke-schmerold/5a/308/a65/) for their help in developing earlier versions of the package.

## Model

## Initialization File

We use an initialization file to specify the model.

```
DATA
	source		../data/test.dat
	agents 		10000

	outcome		0
	treatment	1

BENE
        
	coeff  2  0.00	0.00	false
	coeff  3  0.00	0.00	true

	coeff  4  0.00	0.00	true
	
	int       0.00	0.00
	sd        1.00	1.00

COST

	coeff  4  0.00
	coeff  5  0.00
	
	int       0.00
	sd        !1.00

RHO

	 treated   0.0
	 untreated   0.0

ESTIMATION
	
	algorithm 	bfgs
	maxiter    	15
	start		manual
	gtol       	1e-05

	epsilon    	1.4901161193847656e-08
	differences	one-sided

	asymptotics true
	hessian    	numdiff

	draws    	1000
	alpha		0.05

	version     fast

SIMULATION

	agents		1747
	seed 	  	123
	target  	simulation.dat

```
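To make the layout of the file concrete, here is a minimal sketch of how such a file could be read into Python. It is not the parser that *grmpy* uses; the function name `read_init` is hypothetical, and the sketch only assumes what is visible above: a section starts with a single uppercase word, and each following non-empty line is a whitespace-separated list of tokens. It does not interpret the values (for example the `!` prefix or the `true`/`false` flags).

```python
def read_init(text):
    """Group the lines of an initialization file by section header.

    Returns a dict mapping each section name (e.g. 'DATA') to a list of
    token lists, one per non-empty line. All values are kept as strings.
    """
    sections, current = {}, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Assumption: a header is a single all-uppercase word (DATA, COST, ...).
        if len(tokens) == 1 and tokens[0].isupper():
            current = tokens[0]
            sections[current] = []
        elif current is not None:
            sections[current].append(tokens)
    return sections


# Shortened excerpt of the initialization file shown above.
example = """
DATA
    source      ../data/test.dat
    agents      10000

COST
    coeff  4  0.00
    coeff  5  0.00
    int       0.00
    sd        !1.00

SIMULATION
    agents      1747
    seed        123
    target      simulation.dat
"""

init = read_init(example)

# Every line in SIMULATION is a (key, value) pair, so it converts to a dict.
simulation = dict(init['SIMULATION'])
print(simulation['target'])
```

Keeping every line as a list of tokens avoids guessing at the meaning of repeated keys such as `coeff`, which appear several times within one section.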

## Getting Started

You can use the *grmpy* package's functions in any Python project after importing it, just as you would any other Python library.

In [10]:
# Import grmpy package
import grmpy

# Let us run some basic tests
grmpy.test()

# Import some auxiliary functions
import auxiliary as aux

There is only a limited number of tests distributed as part of the package [(sources)](https://github.com/grmToolbox/package/blob/master/grmpy/tests/test.py). However, we have many more tests set up as part of our development process [(sources)](https://github.com/grmToolbox/package/tree/master/testing).

We briefly reproduce the interface to the *estimate* function.

In [3]:
def estimate(init='init.ini', resume=False, use_simulation=False):
    """
            Parameters
            ----------
            
                init: str, optional
                    Path to the initialization file.
                    
                resume: bool, optional
                    Restart estimation, requires info.grmpy.out.
                    
                use_simulation: bool, optional
                    Use information from SIMULATION section of 
                    the initialization file.
                
            Results
            -------
            
                info.grmpy.out: file
                    Text file with results from estimation run.
                    
                rslt.grmpy.pkl: serialized Python object    
                    Results from estimation run.
    """
    
    pass


For any initialization file, you can simulate a dataset for the specified data generating process. This is useful both for inspecting properties of the simulated population and for testing the reliability of your estimator. If information from a previous estimation run is available, the estimated structural parameters can also be used for the simulation.

Here we reproduce the interface to the *simulate* function.

In [4]:
def simulate(init='init.ini', update=False):
    """
            Parameters
            ----------
            
                init: str, optional
                    Path to the initialization file.
                    
                update: bool, optional
                    Update structural parameters from info.grmpy.out.
                                    
            Results
            -------
            
                simulation.infos.grmpy.out
                    Text file with basic information on 
                    simulated economy.
                    
                *.dat
                    Text file with simulated dataset. The file
                    name is determined by the target flag in 
                    the SIMULATION section of the initialization 
                    file.
                
    """
    
    pass


## Basic Workflow

We now run through the basic workflow. First, we simulate a dataset as specified in the initialization file, and then we run an estimation.

### Simulation

In [3]:
# Simulate dataset
grmpy.simulate('example.grmpy.ini')

# Inspect the results. 
%cat simulation.infos.grmpy.out


 SIMULATED ECONOMY

   Number of Observations: 1747
   Function Value:         2.13257055599

   Choices:  

     Treated            866
     Untreated          881


   Outcomes:  

     Treated           0.05538
     Untreated         0.00380



  TRUE PARAMETERS 

       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       0.000000000000000000
       1.000000000000000000
       1.000000000000000000
       1.000000000000000000


The file provides some basic descriptives of the simulated dataset, such as the distribution of agents across the treatment states and the average observed outcome within each treatment group. It also reports the value of the criterion function at the true parameter values. To make it easy to check the performance of the estimator later, the true parameter values used in the simulation are printed as well.
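As a small illustration of how these descriptives can be processed further, the sketch below reads the counts from the `Choices` block and converts them into shares. The function name `treatment_shares` is hypothetical and not part of *grmpy*; the sketch assumes only the text layout printed above.

```python
def treatment_shares(text):
    """Return the share of agents in each treatment state.

    Reads the 'Treated' and 'Untreated' counts listed between the
    'Choices:' and 'Outcomes:' labels of the simulation summary.
    """
    counts, in_choices = {}, False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == 'Choices:':
            in_choices = True
        elif tokens[0] == 'Outcomes:':
            in_choices = False
        elif in_choices and tokens[0] in ('Treated', 'Untreated'):
            counts[tokens[0]] = int(tokens[1])
    total = float(sum(counts.values()))
    return {label: count / total for label, count in counts.items()}


# Excerpt of simulation.infos.grmpy.out as printed above.
summary = """
   Number of Observations: 1747

   Choices:

     Treated            866
     Untreated          881

   Outcomes:

     Treated           0.05538
     Untreated         0.00380
"""

shares = treatment_shares(summary)
print(shares)
```

The two counts add up to the 1747 observations reported at the top of the file, so the shares sum to one.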

### Estimation

In [5]:
# Estimate model
grmpy.estimate('example.grmpy.ini', use_simulation=True)

# Inspect the results
%cat info.grmpy.out


  START 

       0.015744600470970854
       0.034180760262436674
      -0.004735361740940788
       0.057126268753310797
       0.025106324197970483
      -0.012836080333877150
      -0.031037406079379139
       0.005922290617622228
       0.001013670882554213
       0.010916267528629392
      -0.037284824881465277
       0.000000000000000000
       0.000000000000000000
       1.043263109946092948
       0.998055314361389057
       1.000000000000000000

  STOP 

       0.018318646578632237
      -0.018673330341057552
      -0.006211520009127470
      -0.588556293176437206
       0.025218273790820752
       0.034256956552967260
      -0.031341527832669004
      -0.318378737571016479
       0.026860142888724889
      -0.259962614835427874
      -0.039642887685137296
      -0.678239470904042463
       0.373019951113544579
       1.090900776431787511
       1.185494370189602131
       1.000000000000000000

 OPTIMIZATION REPORT 

      Function:   

At the beginning, the file contains some information about the optimization process, such as the start and stop values and a report from the optimizer. The parametric estimates for the marginal effects of treatment are also written out.

In [23]:
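import numpy as np

# Row layout: four right-aligned columns with two decimals each
str_ = '{0:10.2f}{1:10.2f}{2:10.2f}{3:10.2f}'

# Start values, estimates, and true values of the parameters
START, STOP, TRUE = aux.get_parameters()

print('\n      START     STOP      TRUE      DIFF\n')

for start, stop, true in zip(START, STOP, TRUE):
    print(str_.format(start, stop, true, np.abs(stop - true)))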



      START     STOP      TRUE      DIFF

      0.02      0.02      0.00      0.02
      0.03     -0.02      0.00      0.02
     -0.00     -0.01      0.00      0.01
      0.06     -0.59      0.00      0.59
      0.03      0.03      0.00      0.03
     -0.01      0.03      0.00      0.03
     -0.03     -0.03      0.00      0.03
      0.01     -0.32      0.00      0.32
      0.00      0.03      0.00      0.03
      0.01     -0.26      0.00      0.26
     -0.04     -0.04      0.00      0.04
      0.00     -0.68      0.00      0.68
      0.00      0.37      0.00      0.37
      1.04      1.09      1.00      0.09
      1.00      1.19      1.00      0.19
      1.00      1.00      1.00      0.00
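The helper `aux.get_parameters()` is not shown in this tutorial. As a rough sketch of how such values could be read from the output files, the function below collects the numbers listed under a header line. The name `read_block` is hypothetical and this is not the implementation in the auxiliary module; it only assumes the layout printed in *info.grmpy.out* above, where a label such as `START` or `STOP` is followed by one number per line. The excerpt is shortened to the first three parameters, whose true values are all zero.

```python
import math


def read_block(text, label):
    """Return the floats listed under a header line such as 'START'."""
    values, active = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == label:
            active = True
        elif active and stripped:
            try:
                values.append(float(stripped))
            except ValueError:
                # A non-numeric line marks the next header, so stop here.
                break
    return values


# Shortened excerpt of info.grmpy.out (first three parameters only).
excerpt = """
  START

       0.015744600470970854
       0.034180760262436674
      -0.004735361740940788

  STOP

       0.018318646578632237
      -0.018673330341057552
      -0.006211520009127470

 OPTIMIZATION REPORT
"""

start = read_block(excerpt, 'START')
stop = read_block(excerpt, 'STOP')
true = [0.0, 0.0, 0.0]

# Summarize the estimation error with two simple measures.
deviations = [abs(s - t) for s, t in zip(stop, true)]
max_dev = max(deviations)
rmse = math.sqrt(sum(d ** 2 for d in deviations) / len(deviations))
print(max_dev, rmse)
```

A single summary number such as the root-mean-square error makes it easier to compare several estimation runs than scanning the full table.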


### Visualization

To ease further processing, some information is also stored in the *rslt.grmpy.pkl* object. This can be used to create visual representations of the marginal effects of treatment.
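The layout of the stored object is not described here, so the following is only a minimal sketch of how one might load it and look at its top-level structure before plotting. The helper names `load_results` and `describe` are hypothetical, and the sketch assumes the file was written with Python's standard `pickle` module, as the "serialized Python object" note in the *estimate* docstring suggests.

```python
import pickle


def load_results(path='rslt.grmpy.pkl'):
    """Load the serialized results object written by an estimation run."""
    with open(path, 'rb') as file_:
        return pickle.load(file_)


def describe(obj):
    """Return the type name and, for dictionaries, the sorted top-level keys."""
    keys = sorted(obj.keys()) if isinstance(obj, dict) else None
    return type(obj).__name__, keys
```

After an estimation run, `describe(load_results())` gives a first impression of what is available. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.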