In [37]:
%pylab inline
import pandas

Populating the interactive namespace from numpy and matplotlib


In [38]:
data = pandas.read_stata("ZA1715_v1-0-1.dta")

    

```
The variables we want to compare are:
v563 Common Defense
v566 Common Foreign Policy 
v565 Single Currency 
v567 European Government

To do this we need the European weighting from Footnote 12:
In calculating the correlations, national weights were applied to all observations so as to provide
a representative sample of the EU population. In addition, an identical analysis was conducted that
excluded all responses of "don't know". The results were very similar to those presented here. In
interpreting the correlations, remember that discrete variables allow only a crude representations of the
actual continuum of responses to each question. This tends to attenuate the magnitude of the
correlations among the variables (Kim and Mueller 1978, 74).

From the codebook:
5  NATION I (UK As one variable.)            
6  NATION WEIGHT I                                         
7  NATION II (NI and GB separated.)                          
8  NATION WEIGHT II                                        
9  EUROPEAN WEIGHT

(We likely only need 9, but it's easier to leave the rest in our set for now.)
```
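
The weighting mentioned in Footnote 12 could be applied with a weighted Pearson correlation. The following is a minimal sketch under assumptions, not the paper's actual procedure: the helper name `weighted_corr` and the toy values are hypothetical and only illustrate how a per-respondent weight enters the calculation.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # Weighted means of both variables.
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    # Weighted covariance and variances around those means.
    cov_xy = np.average((x - mx) * (y - my), weights=w)
    var_x = np.average((x - mx) ** 2, weights=w)
    var_y = np.average((y - my) ** 2, weights=w)
    return cov_xy / np.sqrt(var_x * var_y)

# Toy data, made up for illustration (not taken from the survey).
x = [1.0, 0.0, 1.0, 0.5]
y = [1.0, 0.0, 0.5, 0.5]
w = [2.0, 1.0, 0.5, 1.5]
r = weighted_corr(x, y, w)
```

With equal weights this reduces to the ordinary Pearson correlation, which is a quick way to sanity-check the helper against `numpy.corrcoef`.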

In [39]:
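# Copy the columns of interest into a new data frame and give them saner names.
# .copy() makes Table1_Data independent of `data`, so later assignments to it
# do not trigger pandas' SettingWithCopyWarning.
Table1_Data = data[['v5','v6','v7','v8','v9','v563','v565','v566','v567']].copy()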
Table1_Data.columns = [
    'Nation_1',                                                                 
    'Weight_1',                                                                 
    'Nation_2',                                                         
    'Weight_2',                                                                 
    'European_Weight',                                                          
    'Common_Defense',                                                           
    'Single_Currency',                                                          
    'Common_Foreign_Policy',                                                    
    'European_Government']

Table1_Data

   Nation_1  Weight_1  Nation_2  Weight_2  European_Weight  Common_Defense  Single_Currency  Common_Foreign_Policy  European_Government
0   GERMANY     2.290   GERMANY      2.29            5.206             FOR              FOR                    FOR                  FOR
1   GERMANY     1.810   GERMANY      1.81            4.114         AGAINST              FOR                AGAINST                  NaN
2   GERMANY     0.341   GERMANY      0.34            0.773             NaN          AGAINST                    FOR              AGAINST
3   GERMANY     0.690   GERMANY      0.69            1.569             FOR          AGAINST                AGAINST                  NaN
4   GERMANY     0.640   GERMANY      0.64            1.455             FOR              FOR                    FOR                  FOR
5   GERMANY     1.361   GERMANY      1.36            3.092             FOR              FOR                    FOR                  FOR
6   GERMANY     1.580   GERMANY      1.58            3.592             FOR              NaN                    FOR                  FOR
7   GERMANY     1.330   GERMANY      1.33            3.023             FOR              FOR                    FOR                  FOR
8   GERMANY     1.420   GERMANY      1.42            3.228             FOR              FOR                    FOR                  FOR
9   GERMANY     0.550   GERMANY      0.55            1.250         AGAINST              FOR                AGAINST              AGAINST


```
From Footnote 11, Page 341:
'Each of the four question asked the respondent if she were for or against implementing the
particular proposal between the twelve countries of the EC by 1992. I coded a response of "against" as
(0), "don't know" as (0.5), and "for" as (1).'

Need to recode data as:

"against" (0)
"don't know" (0.5)
"for" (1)

```
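
The recoding from Footnote 11 could be done with a plain dictionary lookup. This is a minimal sketch under assumptions: the `RECODE` mapping and the `recode` helper are hypothetical names, the labels 'FOR' and 'AGAINST' are taken from the table output above, and missing answers are assumed to mean "don't know".

```python
import pandas as pd

# Assumed mapping from the category labels to the Footnote 11 codes.
RECODE = {'AGAINST': 0.0, 'FOR': 1.0}

def recode(series):
    """Map FOR/AGAINST to 1/0 and code missing answers as "don't know" (0.5)."""
    # Cast to object first so map behaves the same for categorical and plain columns.
    # Any label that is not in RECODE also ends up as 0.5.
    return series.astype(object).map(RECODE).fillna(0.5)

# Toy column, made up for illustration.
toy = pd.Series(['FOR', 'AGAINST', None, 'FOR'], dtype='category')
recoded = recode(toy)
```

Applied to the real data, the same helper could be run over each of the four question columns, for example with `Table1_Data[col] = recode(Table1_Data[col])`.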

In [40]:
Table1_Data.Common_Defense.dtypes

category

In [49]:
# Convert the four survey items from categorical to numerical.
# Missing answers are first given an explicit 'DONTKNOW' category.

col_list = ['Common_Defense',
            'Single_Currency',
            'Common_Foreign_Policy',
            'European_Government']

#Table1_Data[col_list].apply(lambda x: x.cat.codes)

# Add the category only if it is missing, so the cell can be re-run. Calling
# add_categories again for an existing category raised:
# ValueError: new categories must not include old categories: set(['DONTKNOW'])
if 'DONTKNOW' not in Table1_Data.Common_Defense.cat.categories:
    Table1_Data['Common_Defense'] = Table1_Data.Common_Defense.cat.add_categories(['DONTKNOW'])
# fillna returns a new Series, so assign the result back to the column.
Table1_Data['Common_Defense'] = Table1_Data.Common_Defense.fillna('DONTKNOW')

# After conversion with cat.codes:
# 0 = For, 1 = against, -1 = no answer/don't know.
# These still need to be switched to the expected values above (a mapping applied per column should do it).

In [50]:
Table1_Data.Common_Defense.fillna('DONTKNOW')


0             FOR
1         AGAINST
2        DONTKNOW
3             FOR
4             FOR
5             FOR
6             FOR
7             FOR
8             FOR
9         AGAINST
10            FOR
11        AGAINST
12            FOR
13            FOR
14            FOR
15            FOR
16            FOR
17            FOR
18            FOR
19        AGAINST
20            FOR
21       DONTKNOW
22        AGAINST
23            FOR
24            FOR
25       DONTKNOW
26            FOR
27       DONTKNOW
28       DONTKNOW
29            FOR
           ...   
11764    DONTKNOW
11765         FOR
11766    DONTKNOW
11767         FOR
11768    DONTKNOW
11769    DONTKNOW
11770    DONTKNOW
11771    DONTKNOW
11772    DONTKNOW
11773    DONTKNOW
11774    DONTKNOW
11775         FOR
11776    DONTKNOW
11777         FOR
11778    DONTKNOW
11779    DONTKNOW
11780         FOR
11781    DONTKNOW
11782         FOR
11783    DONTKNOW
11784    DONTKNOW
11785         FOR
11786    DONTKNOW
11787         FOR
11788     

Look at pandas.crosstab for the Table 1 correlations.
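
A cross-tabulation of two of the questions might look like the sketch below. It uses made-up toy rows rather than the survey data, and summing `European_Weight` per cell is only one assumed way of bringing the weights in. Note that `pandas.crosstab` produces a contingency table, not a correlation coefficient.

```python
import pandas as pd

# Toy rows, made up for illustration (not taken from the survey).
toy = pd.DataFrame({
    'Common_Defense': ['FOR', 'FOR', 'AGAINST', 'AGAINST', 'FOR'],
    'Single_Currency': ['FOR', 'AGAINST', 'AGAINST', 'FOR', 'FOR'],
    'European_Weight': [2.0, 1.0, 0.5, 1.5, 1.0],
})

# Unweighted counts of each answer pair.
counts = pd.crosstab(toy['Common_Defense'], toy['Single_Currency'])

# Weighted totals: sum the weight column within each cell instead of counting rows.
weighted = pd.crosstab(toy['Common_Defense'], toy['Single_Currency'],
                       values=toy['European_Weight'], aggfunc='sum')
```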

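To set new categories (assign the result back; recent pandas versions no longer accept inplace = True or direct assignment to cat.categories):
df['col2'] = df.col2.cat.rename_categories([u'x', u'y', u'z'])

df['col2'] = df.col2.cat.rename_categories(df.col2.cat.categories.reindex([u'a', u'b', u'd'])[0])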
