#### To use Stata seamlessly in a Python Jupyter notebook, follow the steps below.

These instructions are for Windows only. For Mac, see https://github.com/TiesdeKok/ipystata

Installation (only once)

1. Activate your conda environment, then install ipystata by running "pip install ipystata" (without the double quotes).
2. Register your Stata instance
    * Open a command window as administrator, go to your Stata installation directory (e.g. C:\Program Files (x86)\Stata15) and look up the name of your Stata executable (e.g. StataSE-64.exe).
    * Type "StataSE-64.exe /Register" (without the double quotes), replacing StataSE-64.exe with the name of your own executable.
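
The lookup in step 2 can also be done from Python. This is a minimal sketch, not part of ipystata: `find_stata_executables` is a hypothetical helper, and the `Stata*.exe` pattern is an assumption based on the example name StataSE-64.exe above.

```python
from pathlib import Path


def find_stata_executables(install_dir):
    """Return the names of Stata*.exe files found directly in install_dir.

    Hypothetical helper: the naming pattern is an assumption, adjust it
    if your installation uses a different executable name.
    """
    folder = Path(install_dir)
    if not folder.is_dir():
        # Missing or mistyped directory: nothing to report.
        return []
    return sorted(p.name for p in folder.glob("Stata*.exe"))


# Example directory taken from the text above; change it to your own.
print(find_stata_executables(r"C:\Program Files (x86)\Stata15"))
```

An empty list means the directory does not exist or holds no matching executable, so check the path before registering.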
    
Set the path to your Stata executable. Uncomment the two lines below (remove the # symbols), change the path as appropriate, and run the cell. After it has run once, comment the lines out again with # so that later runs skip them.

In [1]:
import ipystata  
#from ipystata.config import config_stata  
#config_stata(r'C:\Program Files (x86)\Stata15\StataSE-64.exe')

Restart your kernel (Kernel > Restart). The steps above need to be done only once, not in every session.
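
Windows paths contain backslashes, which Python treats as escape characters in ordinary string literals. This sketch shows why a raw string (the `r` prefix) is the safer way to write the path, using only the standard library. The path is the example from the text, not necessarily yours.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no escaping needed.
stata_exe = PureWindowsPath(r"C:\Program Files (x86)\Stata15\StataSE-64.exe")

exe_name = stata_exe.name            # file name of the executable
install_dir = str(stata_exe.parent)  # directory that contains it

# The raw string is equivalent to doubling every backslash by hand.
same = r"C:\Program Files (x86)\Stata15" == "C:\\Program Files (x86)\\Stata15"

print(exe_name)
print(install_dir)
print(same)
```

`PureWindowsPath` only parses the string and does not touch the file system, so the snippet runs on any operating system.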

#### Usage

In [2]:
import pandas as pd

wos = pd.read_csv("wos_publications.csv")
#wos.head()

In [3]:
%%stata
display "Stata is working!"


Stata is working!



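Let's send a pandas DataFrame (DF) from Python to Stata. First, set the working directory in Python and pass it on to Stata with the -cwd flag.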

In [4]:
import os
os.chdir('D:/Dropbox/Python/notebooks/crash_course/')
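
The hard-coded directory above exists only on the original author's machine. A minimal sketch of a guarded directory change follows. `change_dir_if_exists` is a hypothetical helper, and the temporary directory stands in for your own project folder.

```python
import os
import tempfile


def change_dir_if_exists(path):
    """Switch to path when it is a directory; report whether the switch happened."""
    if os.path.isdir(path):
        os.chdir(path)
        return True
    return False


# Stand-in target: replace with your own notebook folder.
target = tempfile.gettempdir()
changed = change_dir_if_exists(target)
print(changed, os.getcwd())
```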

In [5]:
%%stata -cwd
display "`c(pwd)'"

Set the working directory of Stata to: D:\Dropbox\Python\notebooks\crash_course



In [6]:
print(wos.columns)
wos_a = wos.copy()
wos_a = wos_a[['UID', 'pubyear', 'pubtype']]

Index(['UID', 'accession_no', 'issn', 'eissn', 'doi', 'doc_type', 'source',
       'itemtitle', 'pubyear', 'pubmonth', 'coverdate', 'sortdate', 'vol',
       'pubtype', 'issue', 'supplement', 'special_issue', 'part_no',
       'indicator', 'is_archive', 'has_abstract', 'oases_type_gold',
       'abstract'],
      dtype='object')


Now import the DF wos_a into Stata with the -d flag and run some Stata commands, e.g. sum (short for summarize).

In [7]:
%%stata -d wos_a
sum
sort pubtype
by pubtype: tab pubyear


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       index |      3,100      1549.5    895.0372          0       3099
         UID |          0
     pubyear |      3,100    2011.296    4.660814       1984       2018
     pubtype |          0

---------------------------------------------------------------------------------------------------------------------------------
-> pubtype = Book in series

    pubyear |      Freq.     Percent        Cum.
------------+-----------------------------------
       1998 |          1        9.09        9.09
       2001 |          1        9.09       18.18
       2004 |          1        9.09       27.27
       2006 |          1        9.09       36.36
       2009 |          1        9.09       45.45
       2010 |          1        9.09       54.55
       2011 |          1        9.09       63.64
       2013 |          2       18.18       81.82
       20
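
In the summary table above, UID and pubtype show 0 observations because Stata's summarize reports statistics only for numeric variables, and these two columns hold strings. A quick way to see which columns are numeric before sending a frame to Stata is sketched below. The rows are made-up stand-ins for the real wos_a data.

```python
import pandas as pd

# Made-up rows with the same three columns as wos_a.
wos_a = pd.DataFrame({
    "UID": ["WOS:A", "WOS:B"],
    "pubyear": [1998, 2013],
    "pubtype": ["Book in series", "Journal"],
})

# Columns with a numeric dtype get summary statistics in Stata.
numeric_cols = wos_a.select_dtypes(include="number").columns.tolist()
# Everything else arrives in Stata as a string variable.
string_cols = [c for c in wos_a.columns if c not in numeric_cols]

print(numeric_cols)
print(string_cols)
```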

Run any other Stata commands you like. To bring the result back into Python, save the data in Stata and then load it into a pandas DF with the -o flag, as shown below.

In [8]:
%%stata -d wos_a
keep if pubyear==2000
save wos_a, replace


(3,069 observations deleted)



In [9]:
%%stata -o df
use wos_a.dta, clear



df is a regular pandas DataFrame on which any Python / pandas operation can be performed.

In [10]:
print(df)

                    UID  pubyear  pubtype
0   WOS:000085660400001     2000  Journal
1   WOS:000085708100005     2000  Journal
2   WOS:000085975500001     2000  Journal
3   WOS:000086221000003     2000  Journal
4   WOS:000086331600004     2000  Journal
5   WOS:000086836700003     2000  Journal
6   WOS:000086985300019     2000  Journal
7   WOS:000087005300002     2000  Journal
8   WOS:000087384100008     2000  Journal
9   WOS:000087534400004     2000  Journal
10  WOS:000087539500013     2000  Journal
11  WOS:000087721100003     2000  Journal
12  WOS:000087922300007     2000  Journal
13  WOS:000088319100007     2000  Journal
14  WOS:000088373300009     2000  Journal
15  WOS:000088739900013     2000  Journal
16  WOS:000088814000005     2000  Journal
17  WOS:000088907300004     2000  Journal
18  WOS:000089012900033     2000  Journal
19  WOS:000089439100001     2000  Journal
20  WOS:000089791500001     2000  Journal
21  WOS:000089917700001     2000  Journal
22  WOS:000089917700009     2000  
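
Once the data is back in Python, it is worth confirming that the Stata filter did what was intended. The sketch below uses a small made-up frame in place of the real df returned by `%%stata -o df`, and the identifiers are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by `%%stata -o df`.
df = pd.DataFrame({
    "UID": ["WOS:A", "WOS:B", "WOS:C"],  # made-up identifiers
    "pubyear": [2000, 2000, 2000],
    "pubtype": ["Journal", "Journal", "Journal"],
})

# Every row should have survived the `keep if pubyear==2000` filter.
all_2000 = bool((df["pubyear"] == 2000).all())

# Rows per publication type, a pandas analogue of Stata's `tab pubtype`.
counts = df["pubtype"].value_counts()

print(all_2000)
print(counts)
```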

In [11]:
%%stata
sessions

The following sessions have been found:
main [active]


In [12]:
%%stata
close

The following sessions have been closed:
main


For more info and examples:
https://github.com/TiesdeKok/ipystata