# Start of Data Transformation

Task: use pandas to transform the source CSV files into DataFrames that match the tables in the target database schema.

Tables:

- POLITY (done)
- STATE_DATES (done)
- TERRITORY_DATES (done)
- TERRITORIALCHANGE (done)

In [1]:
import pandas as pd
import numpy as np

In [2]:
!ls ../SourceData/CorrelatesOfWar/

Codebooks                    MID_Narratives_2002-2010.pdf
CowWarList.csv               NMC_5_0-wsupplementary.csv
CowWarList.pdf               Non-StateWarData_v4.0.csv
Entities.pdf                 Territories.csv
Extra-StateWarData_v4.0.csv  alliance_v4.1_by_member.csv
IGO_stateunit_v2.3.csv       contdir.csv
Inter-StateWarData_v4.0.csv  igounit_v2.3.csv
Intra-StateWarData_v4.1.csv  majors2016.csv
MIDA_4.2.csv                 states2016.csv
MIDB_4.2.csv                 system2016.csv
MIDLOCA_2.0.csv              tc2014.csv
MID_Narratives_1993-2001.pdf


## Create 'POLITY' table

Task: transform the following csv files into a table:

- 'states2016.csv' 
- 'Territories.csv' 
- 'Non-StateWarData_v4.0.csv'
- 'Intra-StateWarData_v4.1.csv'
- 'Extra-StateWarData_v4.0.csv'

with the following attributes:

- PolityID
- PolityName
- PolityType
- StateAbbr

Note: Territories.csv was created by running Entities.pdf through [Tabula](https://tabula.technology/) and hand-correcting minor errors (for instance, some sets of rows were shifted to the left).

There were also some `\r`s introduced into rows where the TerritoryName was too long. I removed these by hand.
The carriage return characters can also be removed with this code:

`df = df.replace({r'\r': ' '}, regex=True)`
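As a minimal sketch of that cleanup, assuming a small made-up DataFrame (the `sample` frame and its values are illustrative, not rows from Territories.csv), the regex replace swaps each carriage return for a space in every string column and leaves numeric columns alone:

```python
import pandas as pd

# Hypothetical rows imitating a Tabula export where a long name wrapped.
sample = pd.DataFrame({
    'Entity Number': [1, 2],
    'Name': ['Long Territory\rName', 'Short Name'],
})

# With regex=True, the dict key is a pattern applied inside each string value.
cleaned = sample.replace({r'\r': ' '}, regex=True)
print(cleaned['Name'].tolist())
```

Running the replace on the whole frame, rather than on one column, catches stray carriage returns wherever Tabula happened to introduce them.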

In [3]:
dfStates = pd.read_csv('../SourceData/CorrelatesOfWar/states2016.csv')
dfTerritories = pd.read_csv('../SourceData/CorrelatesOfWar/Territories.csv')
dfNonStateWarEntities = pd.read_csv('../SourceData/CorrelatesOfWar/Non-StateWarData_v4.0.csv', usecols=['SideA1', 'SideA2', 'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5'])
dfIntraStateWarEntities = pd.read_csv('../SourceData/CorrelatesOfWar/Intra-StateWarData_v4.1.csv', usecols=['CcodeA', 'SideA', 'CcodeB', 'SideB'])
dfExtraStateWarEntities = pd.read_csv('../SourceData/CorrelatesOfWar/Extra-StateWarData_v4.0.csv', usecols=['ccode1', 'SideA', 'ccode2', 'SideB'])

### STATES

In [4]:
# .copy() gives an independent frame, so the in-place edits below do not raise SettingWithCopyWarning
dfStatesPOL = dfStates[['stateabb', 'ccode', 'statenme']].copy()
dfStatesPOL.drop_duplicates(inplace=True)
dfStatesPOL.rename(columns={'stateabb':'StateAbbr', 'ccode':'PolityID', 'statenme':'PolityName'}, inplace=True)
dfStatesPOL['PolityType'] = 'State'
dfStatesPOL = dfStatesPOL[['PolityID', 'PolityName', 'PolityType', 'StateAbbr']]
dfStatesPOL


Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,2,United States of America,State,USA
1,20,Canada,State,CAN
2,31,Bahamas,State,BHM
3,40,Cuba,State,CUB
5,41,Haiti,State,HAI
7,42,Dominican Republic,State,DOM
9,51,Jamaica,State,JAM
10,52,Trinidad and Tobago,State,TRI
11,53,Barbados,State,BAR
12,54,Dominica,State,DMA


### TERRITORIES

Note:
Some TerritoryIDs matched up to multiple TerritoryNames. Those Territory IDs were:

- 374
- 1152
- 3351
- 3377

I suspect this is a coding error, as the names these IDs corresponded to were in different (albeit relatively close) locations. To keep each ID unique, and because only the ID is recorded in the TERRITORIALCHANGE table, I edited these rows by hand so that each ID has a single name, with the second name appended in parentheses.
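One way such conflicts could be found is sketched below. The frame and its names are made up for illustration (only the column names `Entity Number` and `Name` come from Territories.csv), and this is not necessarily how the IDs above were originally identified:

```python
import pandas as pd

# Hypothetical data: ID 374 is attached to two different names.
territories = pd.DataFrame({
    'Entity Number': [3, 3, 374, 374, 4],
    'Name': ['Alaska', 'Alaska', 'Name A', 'Name B', 'Hawaii'],
})

# Count distinct names per ID; more than one signals a conflict.
names_per_id = territories.groupby('Entity Number')['Name'].nunique()
conflicting_ids = names_per_id[names_per_id > 1].index.tolist()
print(conflicting_ids)
```

Repeated rows with the same ID and the same name (as for Alaska here) are not flagged, because `nunique` counts distinct names only.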

In [5]:
# .copy() gives an independent frame, so the in-place edits below do not raise SettingWithCopyWarning
dfTerritoriesPOL = dfTerritories[['Entity Number', 'Name']].copy()
dfTerritoriesPOL.drop_duplicates(inplace=True)
dfTerritoriesPOL.rename(columns={'Entity Number':'PolityID', 'Name':'PolityName'}, inplace=True)
dfTerritoriesPOL['PolityType'] = 'Territory'
dfTerritoriesPOL = dfTerritoriesPOL[['PolityID', 'PolityName', 'PolityType']]
dfTerritoriesPOL


Unnamed: 0,PolityID,PolityName,PolityType
0,3,Alaska,Territory
3,4,Hawaii,Territory
5,5,Virgin Islands,Territory
7,6,Puerto Rico,Territory
10,7,Texas,Territory
14,10,Greenland,Territory
16,11,Faeroe Is.,Territory
18,20,Canada,Territory
20,21,Newfoundland,Territory
23,30,Bermuda,Territory


In [6]:
statesterrs = [dfStatesPOL, dfTerritoriesPOL]
# sort=False keeps the column order of the first frame and avoids the concat FutureWarning
dfPolity = pd.concat(statesterrs, sort=False)
dfPolity


Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,2,United States of America,State,USA
1,20,Canada,State,CAN
2,31,Bahamas,State,BHM
3,40,Cuba,State,CUB
5,41,Haiti,State,HAI
7,42,Dominican Republic,State,DOM
9,51,Jamaica,State,JAM
10,52,Trinidad and Tobago,State,TRI
11,53,Barbados,State,BAR
12,54,Dominica,State,DMA


In [7]:
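# sort_values returns a new frame, so assign it; a stable sort keeps 'State' rows ahead of 'Territory' rows
dfPolity = dfPolity.sort_values(by=['PolityType'], kind='mergesort')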
dfPolity.drop_duplicates(subset='PolityID', keep='first', inplace=True)
dfPolity

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,2,United States of America,State,USA
1,20,Canada,State,CAN
2,31,Bahamas,State,BHM
3,40,Cuba,State,CUB
5,41,Haiti,State,HAI
7,42,Dominican Republic,State,DOM
9,51,Jamaica,State,JAM
10,52,Trinidad and Tobago,State,TRI
11,53,Barbados,State,BAR
12,54,Dominica,State,DMA
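
The cell above keeps one row per PolityID. Because some IDs (for example 20, Canada) appear in both the states and the territories data, the order of rows decides which one survives. A minimal sketch of that precedence on made-up rows, assuming a stable sort so that 'State' sorts ahead of 'Territory':

```python
import pandas as pd

# Hypothetical rows: ID 20 appears as both a Territory and a State.
polities = pd.DataFrame({
    'PolityID': [20, 21, 20],
    'PolityName': ['Canada', 'Newfoundland', 'Canada'],
    'PolityType': ['Territory', 'Territory', 'State'],
})

# A stable sort puts 'State' rows ahead of 'Territory' rows ...
ordered = polities.sort_values(by='PolityType', kind='mergesort')
# ... so keep='first' retains the State row when an ID appears twice.
unique_polities = ordered.drop_duplicates(subset='PolityID', keep='first')
print(unique_polities)
```

The result has one row for ID 20, typed as a State, and the Newfoundland row is untouched.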


In [8]:
IOrows = [(0, 'League of Nations', 'International Organization', ''),
          (1, 'United Nations', 'International Organization', '')]
dfIOrows = pd.DataFrame(IOrows, columns=['PolityID', 'PolityName', 'PolityType', 'StateAbbr'])

addIOrows = [dfIOrows, dfPolity]
dfPolity = pd.concat(addIOrows)
dfPolity

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0,League of Nations,International Organization,
1,1,United Nations,International Organization,
0,2,United States of America,State,USA
1,20,Canada,State,CAN
2,31,Bahamas,State,BHM
3,40,Cuba,State,CUB
5,41,Haiti,State,HAI
7,42,Dominican Republic,State,DOM
9,51,Jamaica,State,JAM
10,52,Trinidad and Tobago,State,TRI


In [9]:
# Stack the seven side columns into a single PolityName column
side_cols = ['SideA1', 'SideA2', 'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5']
NSWEallsides = [dfNonStateWarEntities[[col]].rename(columns={col: 'PolityName'}) for col in side_cols]
dfNSWEallsides = pd.concat(NSWEallsides)

# Treat the '-8' placeholder as missing and drop those rows
dfNSWEallsides = dfNSWEallsides.replace('-8', np.nan)
dfNSWEallsides.dropna(inplace=True)
# Trim stray whitespace so identical names compare equal
dfNSWEallsides = dfNSWEallsides.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
dfNSWEallsides.drop_duplicates(inplace=True)
dfNSWEallsides


Unnamed: 0,PolityName
0,Te Rauparaha's Ngati Toa
1,Shaka Zulu
2,Burma
3,Buenos Aires
4,Hongi Hika's Nga Phuhi
5,Thailand
6,China
7,Mexico
8,Conservative Confederation
9,Viang Chan


In [10]:
dfPolityMergeNSGs = pd.merge(dfPolity, dfNSWEallsides, on=['PolityName'], how='outer')
dfPolityMergeNSGs

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0.0,League of Nations,International Organization,
1,1.0,United Nations,International Organization,
2,2.0,United States of America,State,USA
3,20.0,Canada,State,CAN
4,31.0,Bahamas,State,BHM
5,40.0,Cuba,State,CUB
6,41.0,Haiti,State,HAI
7,42.0,Dominican Republic,State,DOM
8,51.0,Jamaica,State,JAM
9,52.0,Trinidad and Tobago,State,TRI


In [11]:
# rename() without inplace returns a new frame, which avoids SettingWithCopyWarning
dfISWE_A = dfIntraStateWarEntities[['CcodeA', 'SideA']].rename(columns={'CcodeA':'PolityID', 'SideA':'PolityName'})
dfISWE_B = dfIntraStateWarEntities[['CcodeB', 'SideB']].rename(columns={'CcodeB':'PolityID', 'SideB':'PolityName'})

ISWEallsides = [dfISWE_A, dfISWE_B]
dfISWEallsides = pd.concat(ISWEallsides)

# Drop rows whose name is the '-8' placeholder, then blank out numeric -8 codes
dfISWEallsides = dfISWEallsides.replace('-8', np.nan)
dfISWEallsides.dropna(subset=['PolityName'], inplace=True)
dfISWEallsides = dfISWEallsides.replace(-8, np.nan)
dfISWEallsides = dfISWEallsides.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
dfISWEallsides.drop_duplicates(inplace=True)
dfISWEallsides


Unnamed: 0,PolityID,PolityName
0,365.0,Russia
1,,Sidon
2,300.0,Austria
3,329.0,Two Sicilies
4,230.0,Spain
6,325.0,Sardinia
7,640.0,Ottoman Empire
11,,Egypt
14,235.0,Portugal
17,220.0,France


In [12]:
dfISWEallsides.PolityID.notnull()

0       True
1      False
2       True
3       True
4       True
6       True
7       True
11     False
14      True
17      True
18      True
22      True
23     False
27      True
29      True
35     False
37     False
38      True
41      True
45     False
46     False
49      True
56      True
58      True
66      True
71      True
73      True
75     False
76      True
85      True
       ...  
387    False
388    False
390    False
391    False
392     True
396    False
398    False
408    False
409    False
411    False
412    False
413    False
415    False
417    False
418    False
419    False
420    False
421    False
422    False
423    False
425    False
426    False
427    False
431    False
433    False
434    False
435    False
437    False
439     True
440    False
Name: PolityID, Length: 405, dtype: bool

In [13]:
# Rows without a state code are non-state groups; keep only their names
dfISWEallsidesNSG = dfISWEallsides[dfISWEallsides.PolityID.isnull()].drop(columns=['PolityID'])
dfISWEallsidesNSG


Unnamed: 0,PolityName
1,Sidon
11,Egypt
23,Palestinians
35,Egypt & Bashir
37,Lebanese Maronites
45,Maronites
46,Mayans
75,Santee Sioux
155,Ukraine Poles
156,Socialists


In [14]:
dfPolityMergeNSGs = pd.merge(dfPolityMergeNSGs, dfISWEallsidesNSG, on=['PolityName'], how='outer')
dfPolityMergeNSGs

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0.0,League of Nations,International Organization,
1,1.0,United Nations,International Organization,
2,2.0,United States of America,State,USA
3,20.0,Canada,State,CAN
4,31.0,Bahamas,State,BHM
5,40.0,Cuba,State,CUB
6,41.0,Haiti,State,HAI
7,42.0,Dominican Republic,State,DOM
8,51.0,Jamaica,State,JAM
9,52.0,Trinidad and Tobago,State,TRI


In [15]:
dfPolityMergeNSGs['PolityName'].value_counts()

Georgia                 2
Christmas I.            2
Montenegro              2
Vietnam                 2
Samoa                   2
Benin                   2
Tonkin                  1
Audhali                 1
Bengalis                1
Danzig                  1
Bamangwato              1
Namibia                 1
Kazakhstan              1
MNLF Moros              1
Shining Path & TARM     1
Corrientes              1
Lauenburg               1
Kurds, PKK              1
Burma                   1
Vumbu Kuu               1
Turkmenistan            1
Amb                     1
Idar                    1
Mong Mit                1
Zaidi Imam              1
Santee Sioux            1
Ha'apai                 1
Alula (Mijerteyn)       1
Inkatha                 1
Pangtara                1
                       ..
South Hsen Wi           1
Rewa                    1
Diaoyu (Senkaku) Is.    1
Malawi                  1
White Settlers          1
Thailand                1
Hejaz Sultanate         1
Bijapur     

In [16]:
#dfPolityMergeNSGs.loc[dfPolityMergeNSGs['PolityName'] == 'Montenegro']

In [17]:
dfExtraStateWarEntities

Unnamed: 0,ccode1,SideA,ccode2,SideB
0,210,Netherlands,-8,-8
1,200,United Kingdom,-8,Algeria
2,640,Ottoman Empire,-8,Saudi Wahhabis
3,230,Spain,-8,San Martin revolutionaries
4,230,Spain,-8,New Granada
5,230,Spain,-8,Mina Expedition
6,200,United Kingdom,-8,Kandyan rebels
7,200,United Kingdom,-8,Marathas
8,640,Ottoman Empire,-8,Sudan states
9,230,Spain,-8,New Granada


In [18]:
# rename() without inplace returns a new frame, which avoids SettingWithCopyWarning
dfESWE_A = dfExtraStateWarEntities[['ccode1', 'SideA']].rename(columns={'ccode1':'PolityID', 'SideA':'PolityName'})
dfESWE_B = dfExtraStateWarEntities[['ccode2', 'SideB']].rename(columns={'ccode2':'PolityID', 'SideB':'PolityName'})

ESWEallsides = [dfESWE_A, dfESWE_B]
dfESWEallsides = pd.concat(ESWEallsides)

# Drop rows whose name is the '-8' placeholder, then blank out numeric -8 codes
dfESWEallsides = dfESWEallsides.replace('-8', np.nan)
dfESWEallsides.dropna(subset=['PolityName'], inplace=True)
dfESWEallsides = dfESWEallsides.replace(-8, np.nan)
dfESWEallsides = dfESWEallsides.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
dfESWEallsides.drop_duplicates(inplace=True)
dfESWEallsides


Unnamed: 0,PolityID,PolityName
0,210.0,Netherlands
1,200.0,United Kingdom
2,640.0,Ottoman Empire
3,230.0,Spain
16,140.0,Brazil
17,365.0,Russia
19,220.0,France
25,135.0,Peru
27,160.0,Argentina
34,255.0,Prussia


In [19]:
dfESWEallsides.PolityID.notnull()

0       True
1       True
2       True
3       True
16      True
17      True
19      True
25      True
27      True
34      True
35      True
41      True
53      True
56      True
66      True
78      True
92      True
93      True
94      True
98      True
106     True
114     True
119     True
123     True
134     True
144     True
160     True
169     True
170     True
171     True
       ...  
141    False
142    False
143    False
145    False
146    False
148    False
149    False
150    False
151    False
152    False
153    False
154    False
156    False
157    False
158    False
159    False
160    False
163    False
166    False
167    False
168    False
169    False
170    False
171    False
172     True
174    False
176    False
178    False
186    False
189    False
Name: PolityID, Length: 171, dtype: bool

In [20]:
# Rows without a state code are non-state groups; keep only their names
dfESWEallsides = dfESWEallsides[dfESWEallsides.PolityID.isnull()].drop(columns=['PolityID'])
dfESWEallsides


Unnamed: 0,PolityName
1,Algeria
2,Saudi Wahhabis
3,San Martin revolutionaries
4,New Granada
5,Mina Expedition
6,Kandyan rebels
7,Marathas
8,Sudan states
10,Persia
11,Burma


In [21]:
dfPolityMergeNSGs = pd.merge(dfPolityMergeNSGs, dfESWEallsides, on=['PolityName'], how='outer')
dfPolityMergeNSGs

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0.0,League of Nations,International Organization,
1,1.0,United Nations,International Organization,
2,2.0,United States of America,State,USA
3,20.0,Canada,State,CAN
4,31.0,Bahamas,State,BHM
5,40.0,Cuba,State,CUB
6,41.0,Haiti,State,HAI
7,42.0,Dominican Republic,State,DOM
8,51.0,Jamaica,State,JAM
9,52.0,Trinidad and Tobago,State,TRI


In [22]:
# Rows added by the outer merges have no PolityType; label them as non-state groups
dfPolityMergeNSGs['PolityType'] = dfPolityMergeNSGs['PolityType'].fillna('NonState Group')
dfPolityMergeNSGs

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0.0,League of Nations,International Organization,
1,1.0,United Nations,International Organization,
2,2.0,United States of America,State,USA
3,20.0,Canada,State,CAN
4,31.0,Bahamas,State,BHM
5,40.0,Cuba,State,CUB
6,41.0,Haiti,State,HAI
7,42.0,Dominican Republic,State,DOM
8,51.0,Jamaica,State,JAM
9,52.0,Trinidad and Tobago,State,TRI


In [23]:
#dfPolityMergeNSGs[dfPolityMergeNSGs['PolityType'] == 'NonState Group']

In [24]:
non_state_groups = dfPolityMergeNSGs[dfPolityMergeNSGs['PolityType'] == 'NonState Group']

start = 10000
# One new ID per non-state group, starting at 10000
ids = np.arange(start, start+non_state_groups.shape[0])
ids

array([10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008,
       10009, 10010, 10011, 10012, 10013, 10014, 10015, 10016, 10017,
       10018, 10019, 10020, 10021, 10022, 10023, 10024, 10025, 10026,
       10027, 10028, 10029, 10030, 10031, 10032, 10033, 10034, 10035,
       10036, 10037, 10038, 10039, 10040, 10041, 10042, 10043, 10044,
       10045, 10046, 10047, 10048, 10049, 10050, 10051, 10052, 10053,
       10054, 10055, 10056, 10057, 10058, 10059, 10060, 10061, 10062,
       10063, 10064, 10065, 10066, 10067, 10068, 10069, 10070, 10071,
       10072, 10073, 10074, 10075, 10076, 10077, 10078, 10079, 10080,
       10081, 10082, 10083, 10084, 10085, 10086, 10087, 10088, 10089,
       10090, 10091, 10092, 10093, 10094, 10095, 10096, 10097, 10098,
       10099, 10100, 10101, 10102, 10103, 10104, 10105, 10106, 10107,
       10108, 10109, 10110, 10111, 10112, 10113, 10114, 10115, 10116,
       10117, 10118, 10119, 10120, 10121, 10122, 10123, 10124, 10125,
       10126, 10127,

In [25]:
dfPolityMergeNSGs.loc[dfPolityMergeNSGs['PolityType'] == 'NonState Group', 'PolityID'] = ids

In [26]:
dfPolityMergeNSGs

Unnamed: 0,PolityID,PolityName,PolityType,StateAbbr
0,0.0,League of Nations,International Organization,
1,1.0,United Nations,International Organization,
2,2.0,United States of America,State,USA
3,20.0,Canada,State,CAN
4,31.0,Bahamas,State,BHM
5,40.0,Cuba,State,CUB
6,41.0,Haiti,State,HAI
7,42.0,Dominican Republic,State,DOM
8,51.0,Jamaica,State,JAM
9,52.0,Trinidad and Tobago,State,TRI


In [27]:
dfPolityMergeNSGs.to_csv('../FinalData/polity.csv', encoding='utf-8', index=False)

## Create 'STATE_DATES' table

Task: transform 'states2016.csv' (saved as 'dfStates') into a table with attributes:

- StateID
- StartDate
- EndDate
- StartYear
- StartMonth
- StartDay
- EndYear
- EndMonth
- EndDay

in which each combination of StateID and StartDate occurs only once.

Note: StartDate and EndDate must be in the format 'YYYY-MM-DD'
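
The two requirements above (dates formatted as 'YYYY-MM-DD', and each StateID/StartDate pair unique) can be sketched on a small example. The two rows below reuse values shown for StateID 40 later in this notebook; the `parts` helper frame is an illustrative name, not part of the original code:

```python
import pandas as pd

# Two start dates for the same state, using the target column names.
states = pd.DataFrame({
    'StateID': [40, 40],
    'StartYear': [1902, 1909], 'StartMonth': [5, 1], 'StartDay': [20, 23],
})

# to_datetime assembles dates from columns named year/month/day.
parts = states[['StartYear', 'StartMonth', 'StartDay']].rename(
    columns={'StartYear': 'year', 'StartMonth': 'month', 'StartDay': 'day'})
states['StartDate'] = pd.to_datetime(parts).dt.strftime('%Y-%m-%d')

# Each (StateID, StartDate) pair should occur only once.
has_duplicates = states.duplicated(subset=['StateID', 'StartDate']).any()
print(states['StartDate'].tolist(), has_duplicates)
```

A check like `duplicated(subset=...)` is a cheap way to confirm the intended key before loading the table into a database.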

In [28]:
# .copy() gives an independent frame, so adding columns below does not raise SettingWithCopyWarning
dfStateDates = dfStates[['ccode', 'styear', 'stmonth', 'stday', 'endyear', 'endmonth', 'endday']].copy()
dfStateDates.rename(columns={"ccode":"StateID", "styear": "StartYear", "stmonth":"StartMonth", "stday":"StartDay", "endyear": "EndYear", "endmonth":"EndMonth", "endday":"EndDay"}, inplace=True)

# Build each date from its parts and format it as 'YYYY-MM-DD'
dfStateDates['StartDate'] = pd.to_datetime(dict(year=dfStateDates.StartYear, month=dfStateDates.StartMonth, day=dfStateDates.StartDay)).dt.strftime('%Y-%m-%d')
dfStateDates['EndDate'] = pd.to_datetime(dict(year=dfStateDates.EndYear, month=dfStateDates.EndMonth, day=dfStateDates.EndDay)).dt.strftime('%Y-%m-%d')

# 2016-12-31 is the last date covered by states2016.csv, so store it as a blank EndDate
dfStateDates['EndDate'] = dfStateDates['EndDate'].replace('2016-12-31', '')

dfStateDates = dfStateDates[['StateID', 'StartDate', 'EndDate', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfStateDates


Unnamed: 0,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,2,1816-01-01,,1816,1,1,2016,12,31
1,20,1920-01-10,,1920,1,10,2016,12,31
2,31,1973-07-10,,1973,7,10,2016,12,31
3,40,1902-05-20,1906-09-25,1902,5,20,1906,9,25
4,40,1909-01-23,,1909,1,23,2016,12,31
5,41,1859-01-01,1915-07-28,1859,1,1,1915,7,28
6,41,1934-08-15,,1934,8,15,2016,12,31
7,42,1894-01-01,1916-11-29,1894,1,1,1916,11,29
8,42,1924-09-29,,1924,9,29,2016,12,31
9,51,1962-08-06,,1962,8,6,2016,12,31


In [29]:
dfStateDates.to_csv('../FinalData/state_dates.csv', encoding='utf-8', index=False)

## Create 'TERRITORY_DATES' table

Task: transform 'Territories.csv' (saved as 'dfTerritories') into a table with the following attributes:

- TerritoryID
- StartYear
- EndYear
- EndingStatus
- ReferencedPolityID

In [30]:
dfTerritories

Unnamed: 0,Entity Number,Name,Begin Year,End Year,Ending Political Status
0,3,Alaska,1816.0,1867.0,Became colony of 365
1,3,Alaska,1867.0,1959.0,Became colony of 2
2,3,Alaska,1959.0,1993.0,Became part of 2
3,4,Hawaii,1898.0,1960.0,Became colony of 2
4,4,Hawaii,1960.0,1993.0,Became part of 2
5,5,Virgin Islands,1816.0,1917.0,Became colony of 390
6,5,Virgin Islands,1917.0,1993.0,Became colony of 2
7,6,Puerto Rico,1816.0,1821.0,Became part of 1070
8,6,Puerto Rico,1821.0,1898.0,Became colony of 230
9,6,Puerto Rico,1898.0,1952.0,Became colony of 2


In [31]:
# .copy() gives an independent frame, so later edits do not raise SettingWithCopyWarning
dfTerritoryDates = dfTerritories[['Entity Number', 'Begin Year', 'End Year', 'Ending Political Status']].copy()
dfTerritoryDates.rename(columns={'Entity Number':'PolityID', 'Begin Year':'StartYear', 'End Year':'EndYear', 'Ending Political Status':'EndingStatus'}, inplace=True)
dfTerritoryDates


Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus
0,3,1816.0,1867.0,Became colony of 365
1,3,1867.0,1959.0,Became colony of 2
2,3,1959.0,1993.0,Became part of 2
3,4,1898.0,1960.0,Became colony of 2
4,4,1960.0,1993.0,Became part of 2
5,5,1816.0,1917.0,Became colony of 390
6,5,1917.0,1993.0,Became colony of 2
7,6,1816.0,1821.0,Became part of 1070
8,6,1821.0,1898.0,Became colony of 230
9,6,1898.0,1952.0,Became colony of 2


In [32]:
dfTerritoryDates[dfTerritoryDates.duplicated(subset=['PolityID', 'StartYear', 'EndYear'], keep=False)]

Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus
30,42,1816.0,1821.0,Became colony of 230
31,42,1816.0,1821.0,Became part of 1070
36,50,1816.0,1958.0,Became part of 56
37,50,1816.0,1958.0,Became part of 55
38,50,1816.0,1958.0,Became part of 59
39,50,1816.0,1958.0,Became part of 60
40,50,1816.0,1958.0,Became part of 52
41,50,1816.0,1958.0,Became part of 58
42,50,1816.0,1958.0,Became part of 51
43,50,1816.0,1958.0,Became part of 54


Note: this shows that PolityID, StartYear, and EndYear do not identify a row on their own, so all four attributes are needed to form a compound primary key.
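
A minimal sketch of that key check, using the two rows shown above for PolityID 42 (the list names `three_cols` and `four_cols` are illustrative):

```python
import pandas as pd

# Rows modeled on the duplicates listed for PolityID 42.
dates = pd.DataFrame({
    'PolityID': [42, 42],
    'StartYear': [1816, 1816],
    'EndYear': [1821, 1821],
    'EndingStatus': ['Became colony of 230', 'Became part of 1070'],
})

three_cols = ['PolityID', 'StartYear', 'EndYear']
four_cols = three_cols + ['EndingStatus']

# The three-column key collides; adding EndingStatus makes every row unique.
print(dates.duplicated(subset=three_cols).any())
print(dates.duplicated(subset=four_cols).any())
```

The first check reports a collision and the second does not, which is the behavior a compound key on all four columns relies on.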

In [33]:
# Missing years become 0, which the four-digit check in a later cell filters out
dfTerritoryDates = dfTerritoryDates.fillna({'StartYear': 0, 'EndYear': 0})

# Cast float years to int first so they become '1816' rather than '1816.0'
dfTerritoryDates = dfTerritoryDates.astype({'StartYear':int, 'EndYear':int})
dfTerritoryDates = dfTerritoryDates.astype({'StartYear':str, 'EndYear':str})
dfTerritoryDates


Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus
0,3,1816,1867,Became colony of 365
1,3,1867,1959,Became colony of 2
2,3,1959,1993,Became part of 2
3,4,1898,1960,Became colony of 2
4,4,1960,1993,Became part of 2
5,5,1816,1917,Became colony of 390
6,5,1917,1993,Became colony of 2
7,6,1816,1821,Became part of 1070
8,6,1821,1898,Became colony of 230
9,6,1898,1952,Became colony of 2


In [34]:
# See which years are bad
#dfTerritoryDates[~dfTerritoryDates['StartYear'].str.match('[0-9]{4}')]

In [35]:
dfTerritoryDatesClean = dfTerritoryDates[dfTerritoryDates['StartYear'].str.match('[0-9]{4}')]
dfTerritoryDatesClean = dfTerritoryDatesClean[dfTerritoryDatesClean['EndYear'].str.match('[0-9]{4}')]

dfTerritoryDatesClean

Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus
0,3,1816,1867,Became colony of 365
1,3,1867,1959,Became colony of 2
2,3,1959,1993,Became part of 2
3,4,1898,1960,Became colony of 2
4,4,1960,1993,Became part of 2
5,5,1816,1917,Became colony of 390
6,5,1917,1993,Became colony of 2
7,6,1816,1821,Became part of 1070
8,6,1821,1898,Became colony of 230
9,6,1898,1952,Became colony of 2


In [36]:
dfTerritoryDatesClean['ReferencedPolityID'] = dfTerritoryDatesClean['EndingStatus'].str.extract(r'([0-9]{1,4})')
dfTerritoryDatesClean

Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus,ReferencedPolityID
0,3,1816,1867,Became colony of 365,365
1,3,1867,1959,Became colony of 2,2
2,3,1959,1993,Became part of 2,2
3,4,1898,1960,Became colony of 2,2
4,4,1960,1993,Became part of 2,2
5,5,1816,1917,Became colony of 390,390
6,5,1917,1993,Became colony of 2,2
7,6,1816,1821,Became part of 1070,1070
8,6,1821,1898,Became colony of 230,230
9,6,1898,1952,Became colony of 2,2


In [37]:
dfTerritoryDatesClean = dfTerritoryDatesClean.astype({'StartYear':int, 'EndYear':int, 'ReferencedPolityID':int})
dfTerritoryDatesClean = dfTerritoryDatesClean.merge(dfPolity[['PolityID', 'PolityName']], left_on='ReferencedPolityID', right_on='PolityID', how='left')
dfTerritoryDatesClean.drop(columns=['PolityID_y'], inplace=True)
dfTerritoryDatesClean.rename(columns={'PolityID_x':'PolityID'}, inplace=True)
dfTerritoryDatesClean = dfTerritoryDatesClean[~dfTerritoryDatesClean['PolityName'].isnull()]
dfTerritoryDatesClean

Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus,ReferencedPolityID,PolityName
0,3,1816,1867,Became colony of 365,365,Russia
1,3,1867,1959,Became colony of 2,2,United States of America
2,3,1959,1993,Became part of 2,2,United States of America
3,4,1898,1960,Became colony of 2,2,United States of America
4,4,1960,1993,Became part of 2,2,United States of America
5,5,1816,1917,Became colony of 390,390,Denmark
6,5,1917,1993,Became colony of 2,2,United States of America
7,6,1816,1821,Became part of 1070,1070,Viceroyalty of New Spain
8,6,1821,1898,Became colony of 230,230,Spain
9,6,1898,1952,Became colony of 2,2,United States of America


In [38]:
dfTerritoryDatesClean['StatusSentence'] = dfTerritoryDatesClean['EndingStatus'].str.extract(r'([^0-9]+)')
dfTerritoryDatesClean['StatusSentence'] = dfTerritoryDatesClean['StatusSentence'].fillna('')
dfTerritoryDatesClean['EndingStatus'] = dfTerritoryDatesClean['StatusSentence'].map(str) + dfTerritoryDatesClean['PolityName'].map(str)
dfTerritoryDatesClean

Unnamed: 0,PolityID,StartYear,EndYear,EndingStatus,ReferencedPolityID,PolityName,StatusSentence
0,3,1816,1867,Became colony of Russia,365,Russia,Became colony of
1,3,1867,1959,Became colony of United States of America,2,United States of America,Became colony of
2,3,1959,1993,Became part of United States of America,2,United States of America,Became part of
3,4,1898,1960,Became colony of United States of America,2,United States of America,Became colony of
4,4,1960,1993,Became part of United States of America,2,United States of America,Became part of
5,5,1816,1917,Became colony of Denmark,390,Denmark,Became colony of
6,5,1917,1993,Became colony of United States of America,2,United States of America,Became colony of
7,6,1816,1821,Became part of Viceroyalty of New Spain,1070,Viceroyalty of New Spain,Became part of
8,6,1821,1898,Became colony of Spain,230,Spain,Became colony of
9,6,1898,1952,Became colony of United States of America,2,United States of America,Became colony of


In [39]:
dfTerritoryDatesClean.drop(columns=['PolityName', 'StatusSentence'], inplace=True)
dfTerritoryDatesClean = dfTerritoryDatesClean.rename(columns={'PolityID':'TerritoryID'})
dfTerritoryDatesClean = dfTerritoryDatesClean[['TerritoryID', 'StartYear', 'EndYear', 'EndingStatus', 'ReferencedPolityID']]
dfTerritoryDatesClean

Unnamed: 0,TerritoryID,StartYear,EndYear,EndingStatus,ReferencedPolityID
0,3,1816,1867,Became colony of Russia,365
1,3,1867,1959,Became colony of United States of America,2
2,3,1959,1993,Became part of United States of America,2
3,4,1898,1960,Became colony of United States of America,2
4,4,1960,1993,Became part of United States of America,2
5,5,1816,1917,Became colony of Denmark,390
6,5,1917,1993,Became colony of United States of America,2
7,6,1816,1821,Became part of Viceroyalty of New Spain,1070
8,6,1821,1898,Became colony of Spain,230
9,6,1898,1952,Became colony of United States of America,2


In [88]:
dfTerritoryDatesClean['EndingStatus'] = dfTerritoryDatesClean['EndingStatus'].str.replace(r'\r', ' ', regex=True)

In [90]:
dfTerritoryDatesClean.to_csv('../FinalData/territory_dates.csv', encoding='utf-8', index=False)

## Create 'TERRITORIALCHANGE' table

Task: transform tc2014.csv into a table with attributes:

- TerritorialChangeID
- Gainer
- Loser
- TransferDate
- Year
- Month
- Procedure
- TerritoryID
- TerritoryArea
- TerritoryPopulation
- IsWholeTerritory
- IsMilConflict
- IsIndependence
- GainerIsCont
- LoserIsCont
- IsGainerHomeland
- IsLoserHomeland
- IsSystemEntry
- IsSystemExit

In [42]:
dfTerrChange = pd.read_csv('../SourceData/CorrelatesOfWar/tc2014.csv')
dfTerrChange

Unnamed: 0,year,month,gainer,gaintype,procedur,entity,contgain,area,pop,portion,loser,losetype,contlose,entry,exit,number,indep,conflict,version
0,1816,7,160,1,-9,160,-9,2093164.00,1970000,1,230,0,0,1,0,3,1,0,5
1,1816,3,200,0,3,790,0,1.00,.,0,790,1,1,0,0,4,0,1,5
2,1816,.,200,0,3,420,0,179.00,.,0,-9,1,-9,0,0,5,0,0,5
3,1817,.,220,0,3,433,0,7819.00,100000,1,200,0,0,0,0,28,0,0,5
4,1817,.,365,1,1,365,1,650.00,.,0,-9,1,1,0,0,29,0,1,5
5,1818,10,2,1,3,20,1,84240.00,.,0,200,0,0,0,0,30,0,0,5
6,1818,12,155,1,-9,155,-9,464568.00,1656300,1,230,0,0,1,0,31,1,1,5
7,1818,10,200,0,3,2,0,41600.00,.,0,2,1,1,0,0,32,0,0,5
8,1818,6,200,0,1,750,0,421200.00,.,0,-9,1,-9,0,0,33,0,1,5
9,1818,.,200,0,2,438,0,16.00,.,0,-9,1,-9,0,0,34,0,0,5


In [43]:
dfTerrChange.rename(columns={"year":"Year", "month": "Month", "gainer":"Gainer", "gaintype":"IsGainerHomeland", 
                             "procedur":"Procedure", "entity":"TerritoryID", "contgain":"GainerIsCont", 
                             "area":"TerritoryArea", "pop":"TerritoryPopulation", "portion":"IsWholeTerritory", 
                             "loser":"Loser", "losetype":"IsLoserHomeland", "contlose": "LoserIsCont", 
                             "entry":"IsSystemEntry", "exit":"IsSystemExit", "number":"TerritorialChangeID", 
                             "indep":"IsIndependence", "conflict":"IsMilConflict"}, inplace=True)
dfTerrChange.drop(columns=['version'], inplace=True)

# Build a first-of-month TransferDate string; unknown months ('.') default to January
dfTerrChange['MonthClean'] = dfTerrChange['Month']
dfTerrChange.loc[dfTerrChange['MonthClean'] == '.', 'MonthClean'] = 1 # boolean mask through .loc avoids chained assignment
dfTerrChange['TransferDate'] = pd.to_datetime(dict(year=dfTerrChange.Year, month=dfTerrChange.MonthClean, day='01'))
dfTerrChange['TransferDate'] = dfTerrChange['TransferDate'].dt.strftime('%Y-%m-%d')

dfTerrChange = dfTerrChange[['TerritorialChangeID', 'Gainer', 'Loser', 'TransferDate', 'Year', 'Month', 'Procedure', 'TerritoryID', 'TerritoryArea', 'TerritoryPopulation', 'IsWholeTerritory', 'IsMilConflict', 'IsIndependence', 'GainerIsCont', 'LoserIsCont', 'IsGainerHomeland', 'IsLoserHomeland', 'IsSystemEntry', 'IsSystemExit']]
dfTerrChange


Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
0,3,160,230,1816-07-01,1816,7,-9,160,2093164.00,1970000,1,0,1,-9,0,1,0,1,0
1,4,200,790,1816-03-01,1816,3,3,790,1.00,.,0,1,0,0,1,0,1,0,0
2,5,200,-9,1816-01-01,1816,.,3,420,179.00,.,0,0,0,0,-9,0,1,0,0
3,28,220,200,1817-01-01,1817,.,3,433,7819.00,100000,1,0,0,0,0,0,0,0,0
4,29,365,-9,1817-01-01,1817,.,1,365,650.00,.,0,1,0,1,1,1,1,0,0
5,30,2,200,1818-10-01,1818,10,3,20,84240.00,.,0,0,0,1,0,1,0,0,0
6,31,155,230,1818-12-01,1818,12,-9,155,464568.00,1656300,1,1,1,-9,0,1,0,1,0
7,32,200,2,1818-10-01,1818,10,3,2,41600.00,.,0,0,0,0,1,0,1,0,0
8,33,200,-9,1818-06-01,1818,6,1,750,421200.00,.,0,1,0,0,-9,0,1,0,0
9,34,200,-9,1818-01-01,1818,.,2,438,16.00,.,0,0,0,0,-9,0,1,0,0
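
The month fill in the cell above is a boolean-mask assignment. The sketch below uses toy data (the frame `toy` is an assumption for illustration) to show the single-step `.loc[mask, column]` form. It writes to the DataFrame directly rather than to an intermediate Series, which is the situation pandas' view-versus-copy caveat warns about.

```python
import pandas as pd

# Toy stand-in for the Year/Month columns; '.' marks an unknown month
toy = pd.DataFrame({'Year': [1816, 1816, 1817], 'Month': ['7', '.', '.']})

# Work on a copy of the column so the original Month values are preserved
toy['MonthClean'] = toy['Month']
unknown = toy['MonthClean'] == '.'
toy.loc[unknown, 'MonthClean'] = '1'          # one indexing step, no chained assignment
toy['MonthClean'] = toy['MonthClean'].astype(int)

# Assemble first-of-month dates from year/month/day columns, then format as text
parts = pd.DataFrame({'year': toy['Year'], 'month': toy['MonthClean'], 'day': 1})
toy['TransferDate'] = pd.to_datetime(parts).dt.strftime('%Y-%m-%d')
print(toy)
```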


In [44]:
# -9 and '.' are the source file's missing-value codes; blank them out
dfTerrChange = dfTerrChange.replace(-9, '')
dfTerrChange = dfTerrChange.replace('.', '')
dfTerrChange

Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
0,3,160,230,1816-07-01,1816,7,,160,2093164.00,1970000,1,0,1,,0,1,0,1,0
1,4,200,790,1816-03-01,1816,3,3,790,1.00,,0,1,0,0,1,0,1,0,0
2,5,200,,1816-01-01,1816,,3,420,179.00,,0,0,0,0,,0,1,0,0
3,28,220,200,1817-01-01,1817,,3,433,7819.00,100000,1,0,0,0,0,0,0,0,0
4,29,365,,1817-01-01,1817,,1,365,650.00,,0,1,0,1,1,1,1,0,0
5,30,2,200,1818-10-01,1818,10,3,20,84240.00,,0,0,0,1,0,1,0,0,0
6,31,155,230,1818-12-01,1818,12,,155,464568.00,1656300,1,1,1,,0,1,0,1,0
7,32,200,2,1818-10-01,1818,10,3,2,41600.00,,0,0,0,0,1,0,1,0,0
8,33,200,,1818-06-01,1818,6,1,750,421200.00,,0,1,0,0,,0,1,0,0
9,34,200,,1818-01-01,1818,,2,438,16.00,,0,0,0,0,,0,1,0,0
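
Replacing the sentinels with empty strings turns the affected numeric columns into `object` dtype. As a possible alternative (a sketch on toy data, not what the notebook does), the sentinels can be mapped to `NaN`, which `to_csv` writes as an empty field by default.

```python
import io

import numpy as np
import pandas as pd

# Toy columns: an integer ID column using -9 and a text column using '.'
toy = pd.DataFrame({
    'Loser': [230, -9, 200],
    'TerritoryPopulation': ['1970000', '.', '100000'],
})

# Map both missing-value codes to NaN, then make the text column numeric
clean = toy.replace({-9: np.nan, '.': np.nan})
clean['TerritoryPopulation'] = pd.to_numeric(clean['TerritoryPopulation'])

# Write to an in-memory buffer to inspect how missing values are serialised
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

One trade-off: an integer column that contains `NaN` becomes a float column, so 230 is written as 230.0. Whether that suits the target schema is a judgment call.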


Next, test whether any of the IDs in 'Gainer', 'Loser', or 'TerritoryID' are missing from the POLITY dataset.

In [45]:
dfgainers = dfTerrChange['Gainer']
dfpolities = dfPolity['PolityID']
flagg = dfgainers.isin(dfpolities)
flagg.value_counts()

True    837
Name: Gainer, dtype: int64

In [46]:
dflosers = dfTerrChange['Loser']
dfpolities = dfPolity['PolityID']
flagl = dflosers.isin(dfpolities)
flagl.value_counts()

True     699
False    138
Name: Loser, dtype: int64

In [47]:
dfTerrChange.loc[~flagl]

Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
2,5,200,,1816-01-01,1816,,3,420,179.00,,0,0,0,0,,0,1,0,0
4,29,365,,1817-01-01,1817,,1,365,650.00,,0,1,0,1,1,1,1,0,0
8,33,200,,1818-06-01,1818,6,1,750,421200.00,,0,1,0,0,,0,1,0,0
9,34,200,,1818-01-01,1818,,2,438,16.00,,0,0,0,0,,0,1,0,0
10,35,640,,1818-01-01,1818,,3,671,388500.00,,1,1,0,1,,1,1,0,0
20,45,640,,1822-01-01,1822,,1,625,229471.00,100000,1,1,0,1,,1,1,0,0
22,47,200,,1824-03-01,1824,3,3,830,583.00,10000,1,0,0,0,1,0,1,0,0
26,51,200,,1825-12-01,1825,12,2,9993,48.00,0,1,0,0,0,,0,1,0,0
29,54,200,,1826-01-01,1826,,2,561,200000.00,,0,0,0,0,,0,1,0,0
31,56,200,,1826-06-01,1826,6,3,821,1.00,,0,0,0,0,,0,1,0,0


In [48]:
dfTerrChangeLoserFlag = dfTerrChange.loc[~flagl]
dfTerrChangeLoserFlag['Loser'].value_counts()

       137
822      1
Name: Loser, dtype: int64
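
The same `isin` check is repeated for each ID column. Below is a sketch of a small helper that does all three at once; the name `unmatched_ids` and the toy frames are assumptions for illustration. It compares IDs as strings, so integer and text columns are handled alike, and reports only the non-blank values that are missing from the reference column.

```python
import pandas as pd

def unmatched_ids(df, columns, valid_ids):
    """Hypothetical helper: map each column to its non-blank values absent from valid_ids."""
    valid = set(valid_ids.astype(str))
    result = {}
    for col in columns:
        values = df[col].astype(str)
        # Ignore blanks (already-missing values); keep anything not in the reference set
        bad = values[(values != '') & ~values.isin(valid)]
        result[col] = sorted(bad.unique())
    return result

# Toy data: 822 appears as a loser and as a territory but is not a known polity
toy_polity = pd.DataFrame({'PolityID': [2, 200, 230, 790]})
toy_change = pd.DataFrame({
    'Gainer': [200, 2, 200],
    'Loser': [230, '', 822],
    'TerritoryID': [790, 822, 2],
})

report = unmatched_ids(toy_change, ['Gainer', 'Loser', 'TerritoryID'], toy_polity['PolityID'])
print(report)
```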

In [49]:
dfTerrChange.loc[~flagl, 'Loser'] = ''  # blank out losers with no matching PolityID


In [50]:
dfterrs = dfTerrChange['TerritoryID']
dfpolities = dfPolity['PolityID'].astype(str)
flagt = dfterrs.isin(dfpolities)
flagt.value_counts()

True     826
False     11
Name: TerritoryID, dtype: int64

In [51]:
dfTerrChange.loc[~flagt]

Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
75,105,200,,1849-12-01,1849,12.0,1,,151536.0,9153209.0,1,1,0,0,,0,1.0,0,0
384,427,200,800.0,1909-03-01,1909,3.0,3,822.0,38195.0,450000.0,0,0,0,0,1.0,0,1.0,0,0
409,452,200,,1914-05-01,1914,5.0,2,822.0,18985.0,180412.0,0,0,0,0,,0,1.0,0,0
706,756,740,2.0,1968-06-01,1968,6.0,3,,100.0,,0,0,0,0,0.0,1,0.0,0,0
771,825,645,,1981-12-01,1981,12.0,3,,3333.0,0.0,0,0,0,1,,1,,0,0
772,826,670,,1981-12-01,1981,12.0,3,,3333.0,0.0,0,0,0,1,,1,,0,0
775,829,155,,1984-11-01,1984,11.0,3,,1079.0,0.0,0,0,0,1,,1,,0,0
776,830,160,,1984-11-01,1984,11.0,3,,1083.0,0.0,0,0,0,1,,1,,0,0
802,856,92,,1992-09-01,1992,9.0,3,,147.0,36000.0,0,0,0,1,,1,,0,0
803,857,91,,1992-09-01,1992,9.0,3,,293.0,13000.0,0,0,0,1,,1,,0,0


In [52]:
dfTerrChange.loc[~flagt, 'TerritoryID'] = ''  # blank out territory IDs with no matching PolityID


Note: territory code 822 is a problem. It occurs once as a loser and twice as a territory ID. The list of territories has no 822, but it does have several four-digit codes that begin with 822 (i.e. 8221, 8222, 8223...). My guess is that 822 once existed (near Malaysia) and was later split into several component territories, whose codes, one digit longer, replaced it. These three instances, however, were never updated. For now, they are blanked out.
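
The guess above can be checked by listing the polity IDs that extend the missing code by one digit. This is a sketch on toy data: the IDs below are placeholders based on the examples in the note, not the full territory list.

```python
import pandas as pd

# Toy polity list; 820 and 830 are included only to show that they are excluded
toy_polity = pd.DataFrame({'PolityID': [820, 8221, 8222, 8223, 830]})

# Compare as text: keep four-character IDs that start with the missing code
ids = toy_polity['PolityID'].astype(str)
children = toy_polity[ids.str.startswith('822') & (ids.str.len() == 4)]
print(children['PolityID'].tolist())
```

Run against the real `dfPolity['PolityID']`, the same filter would list the candidate replacements for the three orphaned 822 references.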

In [53]:
dfTerrChange.to_csv('../FinalData/territorialchange.csv', encoding='utf-8', index=False)