#### To use Stata seamlessly in a Python Jupyter notebook, follow the steps below.

These steps are for Windows only. For Mac, see https://github.com/TiesdeKok/ipystata

Installation (only once)

1. Activate your conda environment, then install ipystata by running "pip install ipystata" (without the double quotes).
2. Register your Stata instance:
    * Open a command window as administrator, go to your Stata installation directory (e.g. C:\Program Files (x86)\Stata15\) and look up the name of your Stata executable (e.g. StataSE-64.exe).
    * Type "StataSE-64.exe /Register" (without the double quotes), replacing StataSE-64.exe with the name of your executable.
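
If you are not sure which executable your installation provides, a small Python sketch can look for it. The candidate names (StataMP-64.exe, StataSE-64.exe, StataIC-64.exe, Stata-64.exe) and the helper find_stata_executable are assumptions for illustration, not part of ipystata; adjust them to your Stata version and edition.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed executable names for 64-bit Windows installs; edit to match your version
CANDIDATES = ("StataMP-64.exe", "StataSE-64.exe", "StataIC-64.exe", "Stata-64.exe")


def find_stata_executable(install_dir: str,
                          candidates: Iterable[str] = CANDIDATES) -> Optional[Path]:
    """Return the first candidate executable found in install_dir, or None.

    Hypothetical helper: it only checks for files on disk; it neither
    registers Stata nor talks to ipystata.
    """
    folder = Path(install_dir)
    for name in candidates:
        exe = folder / name
        if exe.is_file():
            return exe
    return None


if __name__ == "__main__":
    # Raw string so the Windows backslashes are kept literally
    exe = find_stata_executable(r"C:\Program Files (x86)\Stata15")
    print(exe if exe else "No known Stata executable found")
```

The returned path is what you would register with "/Register" and pass to config_stata in the next step.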
    
Set the path to the Stata executable (change it as appropriate). The first time, uncomment the last two lines of the cell below by removing the # symbols and run it. Afterwards, comment them out again with # so the configuration is not repeated.

In [1]:
import ipystata
# First run only: uncomment the two lines below (a raw string keeps the backslashes literal)
#from ipystata.config import config_stata
#config_stata(r'C:\Program Files (x86)\Stata15\StataSE-64.exe')

Then restart the kernel (Kernel > Restart). These installation and configuration steps are needed only once, not every time you use the notebook.

#### Usage

In [2]:
import pandas as pd

wos = pd.read_csv("wos_publications.csv")
wos.head()

| Unnamed: 0 | UID | accession_no | issn | eissn | doi | doc_type | source | itemtitle | pubyear | pubmonth | ... | pubtype | issue | supplement | special_issue | part_no | indicator | is_archive | has_abstract | oases_type_gold | abstract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WOS:000071806400001 | YV270 | 0170-8406 | 1741-3044 |  | Article | ORGANIZATION STUDIES | An organizational assessment of interfirm coor... | 1997 |  | ... | Journal | 6 |  |  |  |  |  | Y |  | `\r\n<p>Inter-firm relationships are coordinate...` |
| 1 | WOS:000072608300005 | ZC705 | 0013-0133 |  |  | Article | ECONOMIC JOURNAL | Pension reform and economic performance under ... | 1998 | MAR | ... | Journal | 447 |  |  |  |  |  | Y |  | `\r\n<p>We consider an overlapping generations ...` |
| 2 | WOS:000073333300005 | ZK531 | 0033-6807 |  |  | Article | R & D MANAGEMENT | Exploiting and creating knowledge through cust... | 1998 | APR | ... | Journal | 2 |  |  |  |  |  | Y |  | `\r\n<p>Through an in-depth analysis of a custo...` |
| 3 | WOS:000073475100004 | ZL821 | 0938-2259 |  |  | Article | ECONOMIC THEORY | The optimality of nominal contracts | 1998 | MAY | ... | Journal | 3 |  |  |  |  |  | Y |  | `\r\n<p>This paper presents a model in which ag...` |
| 4 | WOS:000073918900007 | ZQ946 | 0040-585X |  |  | Article | THEORY OF PROBABILITY AND ITS APPLICATIONS | Well calibrated, coherent forecasting systems | 1998 |  | ... | Journal | 1 |  |  |  |  |  | Y |  | `\r\n<p>This paper introduces a definition of p...` |
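
The Unnamed: 0 column is most likely a saved row index: pandas gives a header-less first column that name, which is what happens when a CSV was written together with its index. If that is the case for wos_publications.csv (an assumption based on the 0, 1, 2, ... values above), passing index_col=0 turns it back into the index. The inline CSV below is made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV mimicking a file written with DataFrame.to_csv() (index included)
csv_text = """,UID,pubyear
0,WOS:000071806400001,1997
1,WOS:000072608300005,1998
"""

# Default: the header-less first column becomes "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())  # ['Unnamed: 0', 'UID', 'pubyear']

# index_col=0 uses that column as the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())  # ['UID', 'pubyear']
```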


In [3]:
%%stata
display "Stata is working!"


Stata is working!



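Before sending a pandas DataFrame to Stata, point Python and Stata at the same working directory: set it in Python with os.chdir, then use the -cwd flag so that Stata switches to it as well.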

In [4]:
import os
# Change to your own project folder
os.chdir('D:/Dropbox/Python/notebooks/crash_course/')

In [5]:
%%stata -cwd
display "`c(pwd)'"

Set the working directory of Stata to: D:\Dropbox\Python\notebooks\crash_course



In [6]:
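import numpy as np

# Work on a copy so the original wos DataFrame is left untouched
wos_a = wos.copy()
# y = 1 if the publication appeared in a special issue ('SI'), else 0
wos_a['y'] = np.where(wos_a['special_issue'] == 'SI', 1, 0)
# Keep only the variables needed for the regression
wos_a = wos_a[['UID', 'pubyear', 'pubtype', 'y']]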
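
To see what the np.where line does, here is the same construction on a tiny made-up DataFrame: rows whose special_issue equals 'SI' get y = 1, and every other value, including missing ones, gets y = 0.

```python
import numpy as np
import pandas as pd

# Made-up rows; NaN stands for a publication without a special-issue flag
toy = pd.DataFrame({
    "UID": ["A", "B", "C"],
    "pubyear": [1998, 2000, 2010],
    "special_issue": ["SI", np.nan, "SI"],
})

# Same rule as in the notebook: 1 for 'SI', 0 otherwise (NaN == 'SI' is False)
toy["y"] = np.where(toy["special_issue"] == "SI", 1, 0)
print(toy["y"].tolist())  # [1, 0, 1]

# The share of special-issue papers is simply the mean of the dummy
print(toy["y"].mean())
```

The mean of the dummy is the same quantity Stata later reports as the Mean of y in the sum output.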

Regress the special-issue indicator y on pubyear.

In [7]:
import statsmodels.formula.api as smf

# OLS of y on pubyear; the formula API adds the intercept automatically
result = smf.ols(formula="y ~ pubyear", data=wos_a).fit()
print(result.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.012
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     37.75
Date:                Fri, 24 May 2019   Prob (F-statistic):           9.06e-10
Time:                        09:01:04   Log-Likelihood:                -158.74
No. Observations:                3100   AIC:                             321.5
Df Residuals:                    3098   BIC:                             333.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -12.0637      1.975     -6.108      0.0
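
With a single regressor, the OLS slope equals cov(x, y) / var(x) and the intercept equals mean(y) - slope * mean(x). The sketch below checks these formulas against np.polyfit on made-up numbers; it illustrates what the coefficients in the summary mean and is not how statsmodels computes them. Because pubyear is measured in calendar years, the intercept is the fitted value of y at year 0, which is why the estimate above (-12.0637) lies far outside the 0 to 1 range of the dummy.

```python
import numpy as np

# Made-up data: x plays the role of pubyear, y of the 0/1 dummy
x = np.array([1998.0, 2000.0, 2005.0, 2010.0, 2015.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

# Closed-form simple OLS (same ddof in numerator and denominator)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# np.polyfit with deg=1 returns [slope, intercept] of a least-squares line
check_slope, check_intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)
print(np.isclose(slope, check_slope), np.isclose(intercept, check_intercept))
```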

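Now pass the DataFrame wos_a to Stata with the -d flag and run Stata commands on it, e.g. sum and reg (lines starting with * are Stata comments). In the sum output, the string variables UID and pubtype show 0 observations because sum only summarizes numeric variables, and the DataFrame index is passed along as the variable index. The regression reproduces the statsmodels result above (F = 37.75, R-squared = 0.0120).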

In [8]:
%%stata -d wos_a
sum
*sort pubtype
*by pubtype: tab pubyear
reg y pubyear


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       index |      3,100      1549.5    895.0372          0       3099
         UID |          0
     pubyear |      3,100    2011.296    4.660814       1984       2018
     pubtype |          0
           y |      3,100    .0706452    .2562725          0          1

      Source |       SS           df       MS      Number of obs   =     3,100
-------------+----------------------------------   F(1, 3098)      =     37.75
       Model |  2.45032103         1  2.45032103   Prob > F        =    0.0000
    Residual |  201.078389     3,098  .064905871   R-squared       =    0.0120
-------------+----------------------------------   Adj R-squared   =    0.0117
       Total |   203.52871     3,099  .065675608   Root MSE        =    .25477

------------------------------------------------------------------------------
           y |      Coef.   Std.
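
The header statistics in Stata's ANOVA table follow from the sums of squares it prints. As a worked check that uses only the numbers shown above: F = (SS_model / df_model) / (SS_residual / df_residual), R-squared = SS_model / SS_total, adjusted R-squared = 1 - (1 - R-squared)(N - 1) / df_residual, and Root MSE = sqrt(SS_residual / df_residual).

```python
import math

# Values copied from the Stata output above
ss_model, df_model = 2.45032103, 1
ss_resid, df_resid = 201.078389, 3098
n_obs = 3100

ss_total = ss_model + ss_resid   # Total SS
ms_model = ss_model / df_model   # Model MS
ms_resid = ss_resid / df_resid   # Residual MS

f_stat = ms_model / ms_resid
r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid
root_mse = math.sqrt(ms_resid)

print(f"Total SS = {ss_total:.5f}")     # Total SS = 203.52871
print(f"F(1, 3098) = {f_stat:.2f}")     # F(1, 3098) = 37.75
print(f"R-squared = {r2:.4f}")          # R-squared = 0.0120
print(f"Adj R-squared = {adj_r2:.4f}")  # Adj R-squared = 0.0117
print(f"Root MSE = {root_mse:.5f}")     # Root MSE = 0.25477
```

The same F-statistic (37.75) and R-squared (0.012) appear in the statsmodels summary, as expected for the same model on the same data.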

Run any other Stata commands you like. To return the data to Python afterwards, save it in Stata and load it back with the -o flag, which exports the dataset in Stata's memory to a pandas DataFrame.

In [9]:
%%stata -d wos_a
keep if pubyear==2000
save wos_a, replace


(3,069 observations deleted)
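
For comparison, Stata's keep if pubyear==2000 corresponds to boolean indexing in pandas. The sketch runs on made-up rows; in the notebook the same filter would apply to wos_a.

```python
import pandas as pd

# Made-up rows standing in for wos_a
wos_toy = pd.DataFrame({
    "UID": ["A", "B", "C", "D"],
    "pubyear": [1999, 2000, 2000, 2001],
    "y": [0, 1, 0, 0],
})

# Stata: keep if pubyear==2000
kept = wos_toy[wos_toy["pubyear"] == 2000].reset_index(drop=True)

# Stata reports how many rows were dropped; the pandas equivalent:
n_deleted = len(wos_toy) - len(kept)
print(f"({n_deleted} observations deleted)")  # (2 observations deleted)
print(kept)
```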



In [10]:
%%stata -o df
use wos_a.dta, clear
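
If Stata is not running, the same .dta hand-off can be done from pandas alone with DataFrame.to_stata and pandas.read_stata. This is a swapped-in alternative to ipystata's -d/-o flags, not a description of how they work; write_index=False stops pandas from writing the DataFrame index as an extra variable. The file name and data below are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data standing in for wos_a
wos_toy = pd.DataFrame({
    "UID": ["WOS:A", "WOS:B"],
    "pubyear": [2000, 2000],
    "y": [0, 1],
})

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "wos_toy.dta"
    # Python -> Stata file, without an index variable
    wos_toy.to_stata(path, write_index=False)
    # Stata file -> Python (the data `use wos_toy.dta, clear` would load in Stata)
    back = pd.read_stata(path)

print(back.columns.tolist())  # ['UID', 'pubyear', 'y']
print(back["y"].tolist())     # [0, 1]
```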



df is a regular pandas DataFrame, so any Python/pandas operation can be applied to it.

In [11]:
print(df)

                    UID  pubyear  pubtype  y
0   WOS:000085660400001     2000  Journal  0
1   WOS:000085708100005     2000  Journal  0
2   WOS:000085975500001     2000  Journal  0
3   WOS:000086221000003     2000  Journal  0
4   WOS:000086331600004     2000  Journal  0
5   WOS:000086836700003     2000  Journal  0
6   WOS:000086985300019     2000  Journal  0
7   WOS:000087005300002     2000  Journal  0
8   WOS:000087384100008     2000  Journal  0
9   WOS:000087534400004     2000  Journal  0
10  WOS:000087539500013     2000  Journal  0
11  WOS:000087721100003     2000  Journal  0
12  WOS:000087922300007     2000  Journal  0
13  WOS:000088319100007     2000  Journal  0
14  WOS:000088373300009     2000  Journal  0
15  WOS:000088739900013     2000  Journal  0
16  WOS:000088814000005     2000  Journal  0
17  WOS:000088907300004     2000  Journal  0
18  WOS:000089012900033     2000  Journal  0
19  WOS:000089439100001     2000  Journal  0
20  WOS:000089791500001     2000  Journal  0
21  WOS:00

In [12]:
%%stata
sessions

The following sessions have been found:
main [active]


In [13]:
%%stata
close

The following sessions have been closed:
main


For more info and examples:
https://github.com/TiesdeKok/ipystata