# WAR Data Transformation

Task: use pandas to transform the source CSV files into DataFrames that match the target tables in the database schema.

Tables:

- WAR
- WAR_PARTICIPANTS
- WAR_LOCATIONS
- WAR_TRANSITIONS

![](../DatabaseDesign/IRDB_WAR_Tables.png)

In [1]:
import pandas as pd
import numpy as np

In [2]:
dfInterStateWar = pd.read_csv("../SourceData/CorrelatesOfWar/Inter-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9])
dfIntraStateWar = pd.read_csv("../SourceData/CorrelatesOfWar/Intra-StateWarData_v4.1.csv", encoding='latin-1', na_values=[-7, -8, -9])
dfExtraStateWar = pd.read_csv("../SourceData/CorrelatesOfWar/Extra-StateWarData_v4.0.csv", encoding='latin-1', na_values=[-7, -8, -9])
dfNonStateWar = pd.read_csv("../SourceData/CorrelatesOfWar/Non-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9])

dfPolities = pd.read_csv("../FinalData/polity.csv", encoding='utf-8')
dfWarNames = pd.read_csv("../SourceData/CorrelatesOfWar/CowWarList.csv", encoding='utf-8')

## Create 'WAR' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')
- CowWarList.csv (note: generated from pdf using Tabula, with `\r`s removed by hand)

with the following attributes:

- WarID
- WarShortName
- WarLongName (from CowWarList.csv)
- WarType
- IsIntervention (only relevant for Extra-State Wars)
- IsInternational (only relevant for Intra-State Wars)

Note: The carriage return characters (in CowWarList.csv) can also be removed with this code:

`df = df.replace({r'\r': ' '}, regex=True)`
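As a quick illustration of that one-liner, the sketch below applies it to a small, made-up frame. The column names mirror CowWarList.csv, but the cell values are hypothetical and only imitate the kind of embedded carriage returns Tabula can leave behind.

```python
import pandas as pd

# Hypothetical rows imitating Tabula output with embedded carriage returns.
raw = pd.DataFrame({
    "War Type & Number": ["Inter-State War #1", "Intra-State\rWar #500"],
    "War Name": ["Franco-Spanish War\rof 1823", "Example War B"],
})

# With regex=True, each dict key is treated as a pattern and substituted
# inside every string cell, so "\r" becomes a single space everywhere.
cleaned = raw.replace({r"\r": " "}, regex=True)

print(cleaned["War Name"].tolist())
```

Replacing with a space rather than an empty string keeps words that were split across lines from running together.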

In [3]:
dfInterWar = dfInterStateWar[['WarNum', 'WarName', 'WarType']]
dfInterWar = dfInterWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'})
dfInterWar = dfInterWar.drop_duplicates()

dfIntraWar = dfIntraStateWar[['WarNum', 'WarName', 'WarType', 'Intnl']]
dfIntraWar = dfIntraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Intnl':'IsInternational'})
dfIntraWar = dfIntraWar.drop_duplicates()

dfNonWar = dfNonStateWar[['WarNum', 'WarName', 'WarType']]
dfNonWar = dfNonWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'})
dfNonWar = dfNonWar.drop_duplicates()

dfExtraWar = dfExtraStateWar[['WarNum', 'WarName', 'WarType', 'Interven']]
dfExtraWar = dfExtraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Interven':'IsIntervention'})
dfExtraWar = dfExtraWar.drop_duplicates()

warDFs = [dfInterWar, dfIntraWar, dfNonWar, dfExtraWar]
dfWar = pd.concat(warDFs, sort=True).sort_values('WarID').reset_index(drop=True)
dfWar = dfWar[['WarID', 'WarShortName', 'WarType', 'IsIntervention', 'IsInternational']]
dfWar = dfWar.astype({'IsIntervention':'Int64', 'IsInternational':'Int64'})

Now to add the long names and the general category war type:
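The split used in the next cell can be tried on a toy frame first. This is a minimal sketch that assumes the 'War Type & Number' values look like `Inter-State War #1`, which is inferred from the output further down rather than from the file itself; the two rows are made up.

```python
import pandas as pd

# Hypothetical values in the assumed "<type name> #<number>" layout.
names = pd.DataFrame({
    "War Type & Number": ["Inter-State War #1", "Non-State War #1500 "],
})

# Split once on "#": column 0 holds the type name, column 1 the war number.
split = names["War Type & Number"].str.split("#", n=1, expand=True)
names["WarTypeName"] = split[0].str.strip()
names["WarID"] = split[1].str.strip().astype(int)

print(names[["WarID", "WarTypeName"]])
```

Stripping whitespace before the integer cast guards against stray spaces left over from the PDF extraction.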

In [4]:
dfWarNamesIDs = dfWarNames['War Type & Number'].str.split("#", n = 1, expand = True)
dfWarNames['WarTypeName'] = dfWarNamesIDs[0]
dfWarNames['WarID'] = dfWarNamesIDs[1]

dfWarNames = dfWarNames[['WarID', 'WarTypeName', 'War Name']]
dfWarNames = dfWarNames.rename(columns={'War Name':'WarLongName'})
dfWarNames = dfWarNames.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
dfWarNames['WarID'] = dfWarNames['WarID'].astype(int)

In [5]:
dfWars = pd.merge(dfWar, dfWarNames, on='WarID')
dfWars = dfWars[['WarID', 'WarShortName', 'WarLongName', 'WarType', 'WarTypeName', 'IsIntervention', 'IsInternational']]
dfWars

Unnamed: 0,WarID,WarShortName,WarLongName,WarType,WarTypeName,IsIntervention,IsInternational
0,1,Franco-Spanish War,Franco-Spanish War of 1823,1,Inter-State War,,
1,4,First Russo-Turkish,First Russo-Turkish War of 1828-1829,1,Inter-State War,,
2,7,Mexican-American,Mexican-American War of 1846-1847,1,Inter-State War,,
3,10,Austro-Sardinian,Austro-Sardinian War of 1848-1849,1,Inter-State War,,
4,13,First Schleswig-Holstein,First Schleswig-Holstein War of 1848-1849,1,Inter-State War,,
...,...,...,...,...,...,...,...
649,1574,Rwandan Social Revolution,Rwandan Social Revolution of 1959-1962,8,Non-State War,,
650,1577,Dhofar Rebellion Phase 1,Dhofar Rebellion Phase 1 of 1968-1971,8,Non-State War,,
651,1581,Angola Guerilla War,Angola Guerilla War of 1974-1975,8,Non-State War,,
652,1582,East Timorese War Phase 1,East Timorese War Phase 1 of 1975,8,Non-State War,,


In [6]:
dfWars.to_csv('../FinalData/war.csv', encoding='utf-8', index=False)

## Create 'WAR_PARTICIPANTS' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- PolityID
- StartDate
- EndDate
- StartYear
- StartMonth
- StartDay
- EndYear
- EndMonth
- EndDay
- Side
- IsInitiator
- Outcome
- Deaths

Note: 'Outcome' is almost entirely determined by 'WarID' and 'Side'. However, one codebook (Inter-State War) has an additional outcome type, 8 = changed sides, and there is exactly one instance of it. 'Outcome' therefore also depends on 'PolityID', which is why it is included in this table and not a separate one.

Similarly, 'Deaths' is almost entirely determined by 'WarID' and 'PolityID'. However, there are a few instances in which it also depends on 'StartDate', which is why it is included in this table and not a separate one.
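One way to test a dependency claim like these is to count distinct values per key group. The sketch below is illustrative only: `violations` is a hypothetical helper, and the toy rows are made up to reproduce the "changed sides" situation, not taken from the source files.

```python
import pandas as pd

# Hypothetical participant rows; the values are invented for illustration.
toy = pd.DataFrame({
    "WarID":    [1, 1, 2, 2, 2],
    "Side":     ["A", "B", "A", "A", "B"],
    "PolityID": [220, 230, 300, 325, 200],
    "Outcome":  [1, 2, 1, 8, 2],
})

def violations(df, key, value):
    """Return the key groups in which `value` takes more than one distinct value."""
    counts = df.groupby(key)[value].nunique(dropna=False)
    return counts[counts > 1]

# (WarID, Side) alone does not determine Outcome here:
# war 2, side A carries both outcome 1 and outcome 8.
print(violations(toy, ["WarID", "Side"], "Outcome"))

# Adding PolityID to the key leaves no violating groups.
print(violations(toy, ["WarID", "Side", "PolityID"], "Outcome"))
```

An empty result means the value column is fully determined by the key, so it could in principle live in a separate table keyed on those columns.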

### Inter-State War

In [7]:
dfInterStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode', 'StateName', 'Side',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought', 'Initiator',
       'Outcome', 'TransTo', 'BatDeath', 'Version'],
      dtype='object')

In [8]:
dfInterWarPar1 = dfInterStateWar[['WarNum', 'ccode', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                  'EndMonth1', 'EndDay1', 'EndYear1', 'Side', 'Initiator', 'Outcome', 
                                  'BatDeath']] \
                                .rename(columns= {'WarNum':'WarID', 'ccode':'PolityID', 'StartMonth1':'StartMonth', 
                                        'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                        'EndDay1':'EndDay', 'EndYear1':'EndYear', 'Initiator':'IsInitiator', 
                                        'BatDeath':'Deaths'})

dfInterWarPar2 = dfInterStateWar[['WarNum', 'ccode', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                  'EndMonth2', 'EndDay2', 'EndYear2', 'Side', 'Initiator', 'Outcome', 
                                  'BatDeath']] \
                                .rename(columns={'WarNum':'WarID', 'ccode':'PolityID', 'StartMonth2':'StartMonth', 
                                        'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                        'EndDay2':'EndDay', 'EndYear2':'EndYear', 'Initiator':'IsInitiator', 
                                        'BatDeath':'Deaths'}) \
                                .dropna(subset=['StartMonth', 'StartDay', 'StartYear'], how='all')

According to the codebook, the 'Initiator' column uses 1 = yes, did initiate, and 2 = no, did not initiate. To standardize, the 2 is changed to 0 (the more widely recognized value for False).

The original table uses 1 and 2 as the possible values in the 'Side' column. To standardize with the other tables, these are converted to A and B.
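The recoding can be checked on a toy frame before it is applied to the real data. This is a sketch with made-up rows; it uses the same nested-dict form of `DataFrame.replace` as the next cell, where the outer keys are column names and the inner dicts map old values to new ones.

```python
import pandas as pd

# Hypothetical rows using the codebook's 1/2 coding for both columns.
toy = pd.DataFrame({"IsInitiator": [1, 2, 2], "Side": [1, 2, 2]})

# Outer keys select the column; inner dicts map old value -> new value.
recoded = toy.replace({"IsInitiator": {2: 0}, "Side": {1: "A", 2: "B"}})

print(recoded)
```

Scoping each mapping to its column matters here, because the value 2 means different things in the two columns.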

In [9]:
dfInterWarPar = pd.concat([dfInterWarPar1, dfInterWarPar2]) \
                .reset_index(drop=True) \
                .replace({'IsInitiator': {2: 0}, 'Side': {1: 'A', 2: 'B'}})

dfInterWarPar = dfInterWarPar[['WarID', 'PolityID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome', 'Deaths']]
dfInterWarPar

Unnamed: 0,WarID,PolityID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome,Deaths
0,1,230,1823.0,4.0,7.0,1823.0,11.0,13.0,B,0,2,600.0
1,1,220,1823.0,4.0,7.0,1823.0,11.0,13.0,A,1,1,400.0
2,4,640,1828.0,4.0,26.0,1829.0,9.0,14.0,B,0,2,80000.0
3,4,365,1828.0,4.0,26.0,1829.0,9.0,14.0,A,1,1,50000.0
4,7,70,1846.0,4.0,25.0,1847.0,9.0,14.0,B,0,2,6000.0
...,...,...,...,...,...,...,...,...,...,...,...,...
351,148,660,1948.0,10.0,15.0,1948.0,10.0,31.0,B,0,2,500.0
352,148,663,1948.0,10.0,15.0,1948.0,10.0,31.0,B,1,2,1000.0
353,148,645,1948.0,10.0,15.0,1948.0,10.0,31.0,B,0,2,500.0
354,184,640,1974.0,8.0,14.0,1974.0,8.0,16.0,A,1,1,1000.0


### Intra-State War

In [10]:
dfIntraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'CcodeA', 'SideA', 'CcodeB', 'SideB',
       'Intnl', 'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1',
       'EndDay1', 'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2',
       'EndMonth2', 'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought',
       'Initiator', 'Outcome', 'TransTo', 'SideADeaths', 'SideBDeaths',
       'Version'],
      dtype='object')

- 1A = Side A, First set of start/end dates
- 2A = Side A, Second set of start/end dates (need to get rid of rows with no date values)
- 1B = Side B, First set of start/end dates
- 2B = Side B, Second set of start/end dates (need to get rid of rows with no date values)
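The four slices listed above follow one pattern, so they can also be produced by a small helper. The sketch below is a simplified, hypothetical version: `side_period` is not part of the notebook, the toy frame keeps only a start year per period, and rows are dropped when that single field is missing (the notebook drops second-period rows only when all three start fields are missing).

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format rows: one row per war, two sides, two date sets.
wide = pd.DataFrame({
    "WarNum": [900, 901],
    "SideA": ["State X", "State Y"],
    "SideB": ["Rebels X", "Rebels Y"],
    "StartYear1": [1900, 1910],
    "StartYear2": [1905, np.nan],
})

def side_period(df, side, period):
    """Select one side and one date set, renamed to the common column names."""
    cols = {
        "WarNum": "WarID",
        f"Side{side}": "PolityName",
        f"StartYear{period}": "StartYear",
    }
    out = df[list(cols)].rename(columns=cols)
    out["Side"] = side
    # A missing start year means this participant has no such period.
    return out.dropna(subset=["StartYear"])

long = pd.concat(
    [side_period(wide, s, p) for s in "AB" for p in (1, 2)],
    ignore_index=True,
)
print(long)
```

War 901 has no second period, so it contributes two rows instead of four.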

In [11]:
dfIntraWarPar1A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                   'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome', 'SideADeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'CcodeA':'PolityID', 'SideA':'PolityName', 
                                                  'StartMonth1':'StartMonth', 'StartDay1':'StartDay', 
                                                  'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                                  'EndDay1':'EndDay', 'EndYear1':'EndYear', 'SideADeaths':'Deaths'})

dfIntraWarPar2A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                   'EndMonth2', 'EndDay2', 'EndYear2', 'Initiator', 'Outcome', 'SideADeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'CcodeA':'PolityID', 'SideA':'PolityName', 
                                                  'StartMonth2':'StartMonth', 'StartDay2':'StartDay', 
                                                  'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                                  'EndDay2':'EndDay', 'EndYear2':'EndYear', 'SideADeaths':'Deaths'}) \
                                 .dropna(subset=['StartMonth', 'StartDay', 'StartYear'], how='all')

dfIntraWarPar1B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                   'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome', 'SideBDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'CcodeB':'PolityID', 'SideB':'PolityName', 
                                                  'StartMonth1':'StartMonth', 'StartDay1':'StartDay', 
                                                  'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                                  'EndDay1':'EndDay', 'EndYear1':'EndYear', 'SideBDeaths':'Deaths'})

dfIntraWarPar2B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                   'EndMonth2', 'EndDay2', 'EndYear2', 'Initiator', 'Outcome', 'SideBDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'CcodeB':'PolityID', 'SideB':'PolityName', 
                                                  'StartMonth2':'StartMonth', 'StartDay2':'StartDay', 
                                                  'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                                  'EndDay2':'EndDay', 'EndYear2':'EndYear', 'SideBDeaths':'Deaths'}) \
                                 .dropna(subset=['StartMonth', 'StartDay', 'StartYear'], how='all')

For side B, the outcomes need to be switched to reflect the schema of 1 = this side won and 2 = this side lost.
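The swap can be illustrated on a toy Series. The sketch below uses a lookup with a fall-through default instead of the two-step placeholder replacement in the next cell; it is an alternative formulation, not the notebook's method, and the outcome codes are made up.

```python
import pandas as pd

# Hypothetical side-B outcomes: 1 and 2 are win/lose codes, 5 and 6 are other codes.
side_b = pd.DataFrame({"Outcome": [1, 2, 5, 6]})

# Swap 1 and 2 in a single pass; any other code falls through unchanged.
swap = {1: 2, 2: 1}
side_b["Outcome"] = side_b["Outcome"].map(lambda v: swap.get(v, v))

print(side_b["Outcome"].tolist())
```

Doing the swap in one pass avoids the need for temporary values such as 10 and 20.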

In [12]:
dfIntraWarParA = pd.concat([dfIntraWarPar1A, dfIntraWarPar2A]) \
                   .reset_index(drop=True) \
                   .dropna(subset=['PolityName'])
dfIntraWarParA['Side'] = 'A'

dfIntraWarParB = pd.concat([dfIntraWarPar1B, dfIntraWarPar2B]) \
                   .reset_index(drop=True) \
                   .dropna(subset=['PolityName'])
dfIntraWarParB['Side'] = 'B'

dfIntraWarParB = dfIntraWarParB.replace({'Outcome': {2: 20, 1: 10}}) \
                               .replace({'Outcome': {20: 1, 10: 2}})

In [13]:
dfIntraWarPar = pd.concat([dfIntraWarParA, dfIntraWarParB]) \
                .sort_values('WarID') \
                .reset_index(drop=True)

dfIntraWarPar['PolityName'] = dfIntraWarPar['PolityName'].str.strip()
dfIntraWarPar['Initiator'] = dfIntraWarPar['Initiator'].str.strip()

Create the 'IsInitiator' column based on the 'Initiator' column.

Intra-State War is the only table that records the initiator as a text value. If the text matches the polity name, the flag can be set directly. If it does not match the polity name, the correct initiator has to be determined manually.
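A minimal sketch of this matching step, on made-up rows: exact matches set the flag, and wars left without any flagged participant are collected for manual review. The names and the `needs_review` variable are hypothetical.

```python
import pandas as pd

# Hypothetical participants; war 2 names a leader rather than a polity.
toy = pd.DataFrame({
    "WarID": [1, 1, 2, 2],
    "PolityName": ["State X ", "Rebels X", "State Y", "Rebels Y"],
    "Initiator": ["State X", "State X", "Y Rebel Leader", "Y Rebel Leader"],
})

# Strip stray whitespace first, otherwise "State X " would not match "State X".
toy["PolityName"] = toy["PolityName"].str.strip()
toy["Initiator"] = toy["Initiator"].str.strip()

# Flag exact matches between the participant and the recorded initiator.
toy["IsInitiator"] = (toy["PolityName"] == toy["Initiator"]).astype(int)

# Wars whose flags sum to 0 have no matched initiator and need a manual look.
per_war = toy.groupby("WarID")["IsInitiator"].sum()
needs_review = per_war[per_war == 0].index.tolist()
print(needs_review)
```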

In [14]:
dfIntraWarPar['IsInitiator'] = 0
dfIntraWarPar.loc[dfIntraWarPar['PolityName'] == dfIntraWarPar['Initiator'], 'IsInitiator'] = 1

In [15]:
checkinit = dfIntraWarPar.groupby('WarID')['IsInitiator'].sum()
checkinit.value_counts()

1    276
0     51
2      7
Name: IsInitiator, dtype: int64

276 wars are ok - there is one initiator (what it should be). 51 wars are missing an initiator - need to manually go through these. 7 wars have two initiators - need to double check what is going on here.

In [16]:
doubleInit = checkinit.loc[checkinit==2].index
dfIntraWarPar[dfIntraWarPar.WarID.isin(doubleInit)]

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Deaths,Side,IsInitiator
86,547,,Liberals,5.0,15.0,1848.0,5.0,15.0,1849.0,Liberals,2,,B,1
87,547,,Liberals,1.0,12.0,1848.0,1.0,27.0,1848.0,Liberals,2,,B,1
88,547,329.0,Two Sicilies,5.0,15.0,1848.0,5.0,15.0,1849.0,Liberals,1,1500.0,A,0
89,547,329.0,Two Sicilies,1.0,12.0,1848.0,1.0,27.0,1848.0,Liberals,1,1500.0,A,0
161,590,,Conservatives,8.0,14.0,1869.0,1.0,7.0,1871.0,Conservatives,1,,B,1
162,590,,Conservatives,1.0,11.0,1868.0,8.0,14.0,1868.0,Conservatives,1,,B,1
163,590,101.0,Venezuela,1.0,11.0,1868.0,8.0,14.0,1868.0,Conservatives,2,,A,0
164,590,101.0,Venezuela,8.0,14.0,1869.0,1.0,7.0,1871.0,Conservatives,2,,A,0
211,623,,Tonghak Society,9.0,14.0,1894.0,11.0,28.0,1894.0,Tonghak Society,2,,B,1
212,623,,Tonghak Society,2.0,29.0,1894.0,5.0,6.0,1894.0,Tonghak Society,2,,B,1


The double-initiator cases appear to be wars in which the initiating party left the conflict and re-entered at a later date, so they are fine.

In [17]:
missingInit = checkinit.loc[checkinit==0].index

with pd.option_context("display.max_rows", 120):
    display(dfIntraWarPar[dfIntraWarPar.WarID.isin(missingInit)])

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Deaths,Side,IsInitiator
0,500,365.0,Russia,6.0,10.0,1818.0,,,1822.0,Chechens,1,5000.0,A,0
1,500,,"Georgians, Dhagestania, Chechens",6.0,10.0,1818.0,,,1822.0,Chechens,2,6000.0,B,0
36,518,640.0,Ottoman Empire,10.0,1.0,1831.0,12.0,27.0,1832.0,Egyptians,2,8000.0,A,0
37,518,,Egyptians & Bashir,10.0,1.0,1831.0,12.0,27.0,1832.0,Egyptians,1,4000.0,B,0
63,533,640.0,Ottoman Empire,6.0,10.0,1839.0,6.0,24.0,1839.0,Mehmet Ali,2,2000.0,A,0
64,533,,Egypt,6.0,10.0,1839.0,6.0,24.0,1839.0,Mehmet Ali,1,1000.0,B,0
78,542,640.0,Ottoman Empire,12.0,19.0,1842.0,1.0,13.0,1843.0,Ottomans,1,1600.0,A,0
79,542,,Karbala,12.0,19.0,1842.0,1.0,13.0,1843.0,Ottomans,2,3000.0,B,0
90,548,,Paez led Conservatives,2.0,4.0,1848.0,8.0,15.0,1849.0,Former Pres. Paez,2,,B,0
91,548,101.0,Venezuela,2.0,4.0,1848.0,8.0,15.0,1849.0,Former Pres. Paez,1,1500.0,A,0


Based on manual examination (and googling where the entity names in 'Initiator' are unclear), I made a list of the DataFrame index values where 'IsInitiator' should be 1.

In [18]:
isInitIndex = [1, 37, 63, 78, 90, 102, 105, 110, 131, 137, 143, 153, 155, 243, 258, 265, 280, 288, 293, 299, 310, 330, 336, 
              359, 394, 401, 461, 497, 526, 529, 530, 544, 552, 572, 578, 583, 598, 608, 615, 619, 621, 633, 636, 692, 
              694, 698, 749, 750, 755, 757, 770, 776, 777]

dfIntraWarPar.loc[isInitIndex, 'IsInitiator'] = 1

checkinit2 = dfIntraWarPar.groupby('WarID')['IsInitiator'].sum()
checkinit2.value_counts()

1    325
2      9
Name: IsInitiator, dtype: int64

The date fields contain errors that need to be corrected. Note: these errors were found when transforming the date fields into a proper datetime object.
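Such errors can also be surfaced before the datetime conversion with a per-row check. The sketch below is one possible approach on made-up rows; `date_problem` is a hypothetical helper that tests each part's range and, when all three parts are present, whether they form a real calendar date.

```python
from datetime import date

import numpy as np
import pandas as pd

# Hypothetical rows: one clean, one with a year in the day field, one Feb 29
# in a non-leap year.
toy = pd.DataFrame({
    "WarID": [1, 2, 3],
    "EndYear": [1900.0, np.nan, 1894.0],
    "EndMonth": [5.0, 10.0, 2.0],
    "EndDay": [12.0, 1919.0, 29.0],
})

def date_problem(year, month, day):
    """Return a short description of the problem, or "" if none was found."""
    if pd.notna(month) and not 1 <= month <= 12:
        return "month out of range"
    if pd.notna(day) and not 1 <= day <= 31:
        return "day out of range"
    if pd.notna(year) and pd.notna(month) and pd.notna(day):
        try:
            date(int(year), int(month), int(day))
        except ValueError:
            return "no such calendar date"
    return ""

problems = toy.apply(
    lambda r: date_problem(r["EndYear"], r["EndMonth"], r["EndDay"]), axis=1
)
print(toy[problems != ""].assign(Problem=problems[problems != ""]))
```

Missing parts are deliberately not reported as problems, since incomplete dates are legitimate in this data.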

In [19]:
dfIntraWarPar[(dfIntraWarPar['EndDay'] < 0) | (dfIntraWarPar['EndDay'] > 31)]

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Deaths,Side,IsInitiator
153,585,,Zhang Jizhong's followers,10.0,,1866.0,10.0,-91866.0,,Zhang Jizhong,2,,B,1
154,585,710.0,China,10.0,,1866.0,10.0,-91866.0,,Zhang Jizhong,1,,A,0
294,682,,German Freikorps,1.0,6.0,1919.0,5.0,1919.0,,Socialists,1,70.0,B,0
295,682,,Socialists,1.0,6.0,1919.0,5.0,1919.0,,Socialists,2,2100.0,A,1


In [20]:
dfIntraWarPar[dfIntraWarPar['WarID']==623]

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Deaths,Side,IsInitiator
211,623,,Tonghak Society,9.0,14.0,1894.0,11.0,28.0,1894.0,Tonghak Society,2,,B,1
212,623,,Tonghak Society,2.0,29.0,1894.0,5.0,6.0,1894.0,Tonghak Society,2,,B,1
213,623,730.0,Korea,9.0,14.0,1894.0,11.0,28.0,1894.0,Tonghak Society,1,,A,0
214,623,730.0,Korea,2.0,29.0,1894.0,5.0,6.0,1894.0,Tonghak Society,1,,A,0
215,623,740.0,Japan,10.0,24.0,1894.0,11.0,28.0,1894.0,Tonghak Society,1,,A,0


In [21]:
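# War 585: the end year was fused into EndDay (-91866); clear EndDay and set EndYear to 1866
dfIntraWarPar[dfIntraWarPar['WarID']==585] = dfIntraWarPar.replace({'EndDay': {-91866: np.nan}, 'EndYear': {np.nan: 1866}})
# War 682: the end year (1919) was recorded in EndDay; move it to EndYear
dfIntraWarPar[dfIntraWarPar['WarID']==682] = dfIntraWarPar.replace({'EndDay': {1919: np.nan}, 'EndYear': {np.nan: 1919}})
# War 623: 1894 was not a leap year, so February 29 is invalid; use February 28
dfIntraWarPar[dfIntraWarPar['WarID']==623] = dfIntraWarPar.replace({'StartDay': {29: 28}})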

In [22]:
dfIntraWarPar[dfIntraWarPar.index.isin([153, 154, 294, 295, 211, 212, 213, 214, 215])]

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Deaths,Side,IsInitiator
153,585,,Zhang Jizhong's followers,10.0,,1866.0,10.0,,1866.0,Zhang Jizhong,2,,B,1
154,585,710.0,China,10.0,,1866.0,10.0,,1866.0,Zhang Jizhong,1,,A,0
211,623,,Tonghak Society,9.0,14.0,1894.0,11.0,28.0,1894.0,Tonghak Society,2,,B,1
212,623,,Tonghak Society,2.0,28.0,1894.0,5.0,6.0,1894.0,Tonghak Society,2,,B,1
213,623,730.0,Korea,9.0,14.0,1894.0,11.0,28.0,1894.0,Tonghak Society,1,,A,0
214,623,730.0,Korea,2.0,28.0,1894.0,5.0,6.0,1894.0,Tonghak Society,1,,A,0
215,623,740.0,Japan,10.0,24.0,1894.0,11.0,28.0,1894.0,Tonghak Society,1,,A,0
294,682,,German Freikorps,1.0,6.0,1919.0,5.0,,1919.0,Socialists,1,70.0,B,0
295,682,,Socialists,1.0,6.0,1919.0,5.0,,1919.0,Socialists,2,2100.0,A,1


In [23]:
dfIntraWarPar['Deaths'] = dfIntraWarPar['Deaths'].str.replace(',', '').astype(float)
dfIntraWarPar = dfIntraWarPar[['WarID', 'PolityID', 'PolityName', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome', 'Deaths']]
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome,Deaths
0,500,365.0,Russia,1818.0,6.0,10.0,1822.0,,,A,0,1,5000.0
1,500,,"Georgians, Dhagestania, Chechens",1818.0,6.0,10.0,1822.0,,,B,1,2,6000.0
2,501,,Sidon,1820.0,6.0,,1821.0,7.0,21.0,A,1,2,
3,501,,Damascus & Aleppo,1820.0,6.0,,1821.0,7.0,21.0,B,0,1,
4,502,,Liberals,1820.0,7.0,2.0,1821.0,3.0,23.0,B,1,2,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
790,938,520.0,Somalia,2006.0,3.0,6.0,2008.0,6.0,1.0,A,0,1,
791,940,,LTTE,2006.0,10.0,11.0,,,,B,0,5,
792,940,780.0,Sri Lanka,2006.0,10.0,11.0,,,,A,1,5,
793,941,,Zaidi Muslims,2007.0,1.0,29.0,2007.0,6.0,16.0,B,1,6,2000.0


### Non-State War

- again, all polities belong in a single column
- all wars have an initiator as A or B, need to change to reflect schema of 1 = is Initiator, 0 = isn't initiator
- Outcome is set to 1 = side A wins, 2 = side B wins; need to change (side B) to reflect schema of 1 = this side wins, 2 = this side loses
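The three points above can be sketched with `melt` on a toy frame. This is an alternative to slicing each `SideXn` column separately, not the notebook's method; the rows are made up, and only two B-side slots are included to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical wide rows: several participant slots per side, one row per war.
wide = pd.DataFrame({
    "WarNum": [1, 2],
    "SideA1": ["Group P", "Group S"],
    "SideA2": [np.nan, "Group T"],
    "SideB1": ["Group Q", "Group U"],
    "SideB2": ["Group R", np.nan],
    "Initiator": ["A", "B"],
    "Outcome": [1, 2],
})

# Match only participant slots such as SideA1 or SideB2 (not SideADeaths).
slots = list(wide.filter(regex=r"^Side[AB]\d$").columns)

long = wide.melt(
    id_vars=["WarNum", "Initiator", "Outcome"],
    value_vars=slots,
    var_name="Slot",
    value_name="PolityName",
).dropna(subset=["PolityName"])

# The fifth character of "SideA1" is the side letter.
long["Side"] = long["Slot"].str[4]
long["IsInitiator"] = (long["Side"] == long["Initiator"]).astype(int)

# Outcome 1 = side A wins, 2 = side B wins; recode per row to
# 1 = this side wins, 2 = this side loses, leaving other codes unchanged.
winner = long["Outcome"].map({1: "A", 2: "B"})
long["Outcome"] = np.where(
    winner.isna(), long["Outcome"], np.where(winner == long["Side"], 1, 2)
)

print(long.sort_values(["WarNum", "Side"]))
```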

In [24]:
dfNonStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'WhereFought', 'SideA1', 'SideA2',
       'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear',
       'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator',
       'TransFrom', 'TransTo', 'Outcome', 'SideADeaths', 'SideBDeaths',
       'TotalCombatDeaths', 'Version'],
      dtype='object')

In [25]:
dfNonWarParA1 = dfNonStateWar[['WarNum', 'SideA1', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideADeaths']] \
                             .rename(columns={'SideA1':'PolityName', 'WarNum':'WarID', 'SideADeaths': 'Deaths'})

dfNonWarParA2 = dfNonStateWar[['WarNum', 'SideA2', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideADeaths']] \
                             .rename(columns={'SideA2':'PolityName', 'WarNum':'WarID', 'SideADeaths': 'Deaths'})

dfNonWarParB1 = dfNonStateWar[['WarNum', 'SideB1', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideBDeaths']] \
                             .rename(columns={'SideB1':'PolityName', 'WarNum':'WarID', 'SideBDeaths': 'Deaths'})

dfNonWarParB2 = dfNonStateWar[['WarNum', 'SideB2', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideBDeaths']] \
                             .rename(columns={'SideB2':'PolityName', 'WarNum':'WarID', 'SideBDeaths': 'Deaths'})

dfNonWarParB3 = dfNonStateWar[['WarNum', 'SideB3', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideBDeaths']] \
                             .rename(columns={'SideB3':'PolityName', 'WarNum':'WarID', 'SideBDeaths': 'Deaths'})

dfNonWarParB4 = dfNonStateWar[['WarNum', 'SideB4', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideBDeaths']] \
                             .rename(columns={'SideB4':'PolityName', 'WarNum':'WarID', 'SideBDeaths': 'Deaths'})

dfNonWarParB5 = dfNonStateWar[['WarNum', 'SideB5', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 
                               'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideBDeaths']] \
                             .rename(columns={'SideB5':'PolityName', 'WarNum':'WarID', 'SideBDeaths': 'Deaths'})

In [26]:
dfNonWarParA = pd.concat([dfNonWarParA1, dfNonWarParA2]) \
               .dropna(subset=['PolityName']).reset_index(drop=True)
dfNonWarParA['Side'] = 'A'
dfNonWarParA['IsInitiator'] = 1
dfNonWarParA.loc[dfNonWarParA['Initiator'] == 'B', 'IsInitiator'] = 0

In [27]:
dfNonWarParB = pd.concat([dfNonWarParB1, dfNonWarParB2, dfNonWarParB3, dfNonWarParB4, dfNonWarParB5]) \
               .dropna(subset=['PolityName']).reset_index(drop=True)
dfNonWarParB['Side'] = 'B'
dfNonWarParB['IsInitiator'] = 1
dfNonWarParB.loc[dfNonWarParB['Initiator'] == 'A', 'IsInitiator'] = 0

dfNonWarParB = dfNonWarParB.replace({'Outcome': {2: 20, 1: 10}}) \
                           .replace({'Outcome': {20: 1, 10: 2}})

In [28]:
dfNonWarPar = pd.concat([dfNonWarParA, dfNonWarParB]).sort_values('WarID').reset_index(drop=True)
dfNonWarPar['PolityID'] = np.nan
dfNonWarPar = dfNonWarPar[['WarID', 'PolityID', 'PolityName', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome', 'Deaths']]
dfNonWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome,Deaths
0,1500,,Te Rauparaha's Ngati Toa,1818,,,1824,,,A,1,1,1500.0
1,1500,,Ngati Ira,1818,,,1824,,,B,0,2,6000.0
2,1500,,Waikato,1818,,,1824,,,B,0,2,6000.0
3,1500,,Ngai Tahu,1818,,,1824,,,B,0,2,6000.0
4,1500,,Taranaki,1818,,,1824,,,B,0,2,6000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
139,1582,,Apodeti,1975,8.0,11.0,1975,10.0,15.0,A,0,4,
140,1582,,Fretilin,1975,8.0,11.0,1975,10.0,15.0,A,0,4,
141,1582,,UDT,1975,8.0,11.0,1975,10.0,15.0,B,1,4,
142,1594,,Lendu,1999,6.0,,2005,3.0,,B,1,6,


### Extra-State War

- again, all polities in a single column, no duplicate date columns
- initiator needs to be fixed for side B (so 1 = this side is initiator)
- outcome needs to be fixed for side B (so 1 = this side wins, 2 = this side loses)

In [29]:
dfExtraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode1', 'SideA', 'ccode2', 'SideB',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2 ', 'EndYear2', 'Initiator', 'Interven', 'TransFrom', 'Outcome',
       'TransTo', 'WhereFought', 'BatDeath', 'NonStateDeaths', 'Version'],
      dtype='object')

In [30]:
dfExtraWarPar1A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                   'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome', 
                                   'BatDeath', 'NonStateDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'ccode1':'PolityID', 
                                        'SideA':'PolityName', 'StartMonth1':'StartMonth', 
                                        'StartDay1':'StartDay', 'StartYear1':'StartYear', 
                                        'EndMonth1':'EndMonth', 'EndDay1':'EndDay', 
                                        'EndYear1':'EndYear', 'Initiator': 'IsInitiator'})

dfExtraWarPar2A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                   'EndMonth2', 'EndDay2 ', 'EndYear2', 'Initiator', 'Outcome', 
                                   'BatDeath', 'NonStateDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'ccode1':'PolityID', 
                                        'SideA':'PolityName', 'StartMonth2':'StartMonth', 
                                        'StartDay2':'StartDay', 'StartYear2':'StartYear', 
                                        'EndMonth2':'EndMonth', 'EndDay2 ':'EndDay', 
                                        'EndYear2':'EndYear', 'Initiator': 'IsInitiator'})

dfExtraWarPar1B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                   'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome', 
                                   'BatDeath', 'NonStateDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'ccode2':'PolityID', 
                                        'SideB':'PolityName', 'StartMonth1':'StartMonth', 
                                        'StartDay1':'StartDay', 'StartYear1':'StartYear', 
                                        'EndMonth1':'EndMonth', 'EndDay1':'EndDay', 
                                        'EndYear1':'EndYear', 'Initiator': 'IsInitiator'})

dfExtraWarPar2B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                   'EndMonth2', 'EndDay2 ', 'EndYear2', 'Initiator', 'Outcome', 
                                   'BatDeath', 'NonStateDeaths']] \
                                 .rename(columns={'WarNum':'WarID', 'ccode2':'PolityID', 
                                        'SideB':'PolityName', 'StartMonth2':'StartMonth', 
                                        'StartDay2':'StartDay', 'StartYear2':'StartYear', 
                                        'EndMonth2':'EndMonth', 'EndDay2 ':'EndDay', 
                                        'EndYear2':'EndYear', 'Initiator': 'IsInitiator'})

In [31]:
dfExtraWarParA = pd.concat([dfExtraWarPar1A, dfExtraWarPar2A]) \
                .dropna(subset=['StartDay', 'StartMonth', 'StartYear'], how='all') \
                .reset_index(drop=True)
dfExtraWarParA['Side'] = 'A'

In [32]:
dfExtraWarParB = pd.concat([dfExtraWarPar1B, dfExtraWarPar2B]) \
                .dropna(subset=['StartDay', 'StartMonth', 'StartYear'], how='all') \
                .reset_index(drop=True) \
                .replace({'IsInitiator': {1: 10, 0: 20}, 'Outcome': {1: 10, 2: 20}}) \
                .replace({'IsInitiator': {10: 0, 20: 1}, 'Outcome': {10: 2, 20: 1}})
dfExtraWarParB['Side'] = 'B'

In [33]:
dfExtraWarPar = pd.concat([dfExtraWarParA, dfExtraWarParB]) \
                .dropna(subset=['PolityName']).reset_index(drop=True)

# Non-state participants (no PolityID) take NonStateDeaths; state participants take BatDeath
dfExtraWarPar['Deaths'] = np.where(dfExtraWarPar['PolityID'].isna(), dfExtraWarPar['NonStateDeaths'], dfExtraWarPar['BatDeath'])

In [34]:
dfExtraWarPar = dfExtraWarPar[['WarID', 'PolityID', 'PolityName', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome', 'Deaths']]
dfExtraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome,Deaths
0,300,210.0,Netherlands,1816.0,8.0,26.0,1816.0,8.0,30.0,A,1,1,13.0
1,300,200.0,United Kingdom,1816.0,8.0,26.0,1816.0,8.0,30.0,A,1,1,129.0
2,301,640.0,Ottoman Empire,1816.0,9.0,,1818.0,9.0,11.0,A,1,1,13500.0
3,302,230.0,Spain,1817.0,1.0,9.0,1818.0,4.0,5.0,A,0,2,1700.0
4,303,230.0,Spain,1817.0,4.0,11.0,1819.0,8.0,10.0,A,1,2,3000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
363,481,,al-Qaeda & Taliban,2001.0,12.0,23.0,,,,B,0,5,12000.0
364,482,,al-Qaeda & Iraqi resistence,2003.0,5.0,3.0,,,,B,0,5,20000.0
365,334,,Bali,1849.0,4.0,2.0,1849.0,6.0,14.0,B,0,6,2000.0
366,379,,Afghanistan,1879.0,9.0,3.0,1880.0,9.0,2.0,B,0,2,11000.0


### Combine all war types

- need to fill in IDs for nonstate actors
- need to create startdate, startdate_prec, enddate, and enddate_prec columns
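Both steps can be tried on a toy frame first. The sketch below is illustrative: the rows and the `polities` lookup are made up, the lookup is assumed to have 'PolityName' and 'PolityID' columns as the merge in the next cell implies, and `np.select` stands in for the sequence of `.loc` assignments used further down.

```python
import numpy as np
import pandas as pd

# Hypothetical participants: one state with an ID, two non-state actors without.
parts = pd.DataFrame({
    "PolityName": ["State X", "Group Q", "Group R"],
    "PolityID": [220.0, np.nan, np.nan],
    "StartYear": [1823, 1818, 1999],
    "StartMonth": [4.0, np.nan, 6.0],
    "StartDay": [7.0, np.nan, np.nan],
})
# Hypothetical lookup table for non-state actors.
polities = pd.DataFrame({
    "PolityName": ["Group Q", "Group R"],
    "PolityID": [10001, 10002],
})

# Fill missing IDs by name; existing IDs take precedence over the lookup.
merged = parts.merge(polities, on="PolityName", how="left", suffixes=("", "_p"))
merged["PolityID"] = merged["PolityID"].fillna(merged["PolityID_p"]).astype(int)

# Record how precise the original date was before padding missing parts.
merged["StartDate_Prec"] = np.select(
    [merged["StartMonth"].isna(), merged["StartDay"].isna()],
    ["Year", "Month"],
    default="Day",
)

# Pad missing month/day with 1 and assemble a datetime column.
merged["StartDate"] = pd.to_datetime(pd.DataFrame({
    "year": merged["StartYear"],
    "month": merged["StartMonth"].fillna(1).astype(int),
    "day": merged["StartDay"].fillna(1).astype(int),
}))

print(merged[["PolityID", "StartDate", "StartDate_Prec"]])
```

Storing the precision alongside the padded date keeps a January 1 that was actually recorded distinguishable from one that was filled in.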

In [35]:
combinedWarPar = [dfInterWarPar, dfIntraWarPar, dfNonWarPar, dfExtraWarPar]
dfWarPar = pd.concat(combinedWarPar, sort=True).sort_values(['WarID', 'Side', 'StartYear']) \
             .drop_duplicates().reset_index(drop=True)

dfWarPar['PolityName'] = dfWarPar['PolityName'].str.strip()

dfWarPar = dfWarPar.merge(dfPolities, on='PolityName', how='left', suffixes=('', '_p'))
dfWarPar['PolityID'] = dfWarPar['PolityID'].fillna(dfWarPar['PolityID_p'])

In [36]:
dfWarPar = dfWarPar[['WarID', 'PolityID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome', 'Deaths']].drop_duplicates()
dfWarPar

Unnamed: 0,WarID,PolityID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome,Deaths
0,1,220.0,1823.0,4.0,7.0,1823.0,11.0,13.0,A,1,1,400.0
1,1,230.0,1823.0,4.0,7.0,1823.0,11.0,13.0,B,0,2,600.0
2,4,365.0,1828.0,4.0,26.0,1829.0,9.0,14.0,A,1,1,50000.0
3,4,640.0,1828.0,4.0,26.0,1829.0,9.0,14.0,B,0,2,80000.0
4,7,2.0,1846.0,4.0,25.0,1847.0,9.0,14.0,A,1,1,13283.0
...,...,...,...,...,...,...,...,...,...,...,...,...
1666,1582,10042.0,1975.0,8.0,11.0,1975.0,10.0,15.0,A,0,4,
1667,1582,10038.0,1975.0,8.0,11.0,1975.0,10.0,15.0,A,0,4,
1668,1582,10081.0,1975.0,8.0,11.0,1975.0,10.0,15.0,B,1,4,
1669,1594,10039.0,1999.0,6.0,,2005.0,3.0,,A,0,6,


In [37]:
# Precision is the finest date component present: Day, then Month, then Year
dfWarPar['StartDate_Prec'] = 'Day'
dfWarPar.loc[dfWarPar['StartDay'].isna(), 'StartDate_Prec'] = 'Month'
dfWarPar.loc[dfWarPar['StartMonth'].isna(), 'StartDate_Prec'] = 'Year'

dfWarPar['EndDate_Prec'] = 'Day'
dfWarPar.loc[dfWarPar['EndDay'].isna(), 'EndDate_Prec'] = 'Month'
dfWarPar.loc[dfWarPar['EndMonth'].isna(), 'EndDate_Prec'] = 'Year'
dfWarPar.loc[dfWarPar['EndYear'].isna(), 'EndDate_Prec'] = 'Ongoing'  # no end year recorded

# Default missing days/months to 1 so a full date can be built (end fields only when EndYear exists)
dfWarPar['StartDay'] = dfWarPar['StartDay'].fillna(1)
dfWarPar['StartMonth'] = dfWarPar['StartMonth'].fillna(1)
dfWarPar['EndDay'] = np.where(dfWarPar['EndYear'].notna(), dfWarPar['EndDay'].fillna(1), dfWarPar['EndDay'])
dfWarPar['EndMonth'] = np.where(dfWarPar['EndYear'].notna(), dfWarPar['EndMonth'].fillna(1), dfWarPar['EndMonth'])
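The precision rule in this cell relies on later assignments overriding earlier ones. An equivalent way to express it, shown here as a sketch on toy data, is `np.select`, which takes the first matching condition, so the coarsest case is listed first. The frame `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy end-date components: complete, day missing, month and day missing, all missing
toy = pd.DataFrame({
    "EndYear":  [1823.0, 2005.0, 1900.0, np.nan],
    "EndMonth": [11.0,   3.0,    np.nan, np.nan],
    "EndDay":   [13.0,   np.nan, np.nan, np.nan],
})

# np.select uses the first condition that is True for each row
conditions = [
    toy["EndYear"].isna(),    # nothing recorded
    toy["EndMonth"].isna(),   # only the year is known
    toy["EndDay"].isna(),     # year and month are known
]
labels = ["Ongoing", "Year", "Month"]
toy["EndDate_Prec"] = np.select(conditions, labels, default="Day")
print(toy)
```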

In [38]:
dfWarPar = dfWarPar.astype({'StartYear': 'int', 'StartMonth': 'Int64', 'StartDay': 'Int64', 'EndYear': 'Int64', 'EndMonth': 'Int64', 'EndDay': 'Int64', 'PolityID': 'int', 'Deaths': 'Int64'})

dfWarPar['StartDate'] = dfWarPar['StartYear'].astype(str) + '/' + dfWarPar['StartMonth'].astype(str) + '/' + dfWarPar['StartDay'].astype(str)
dfWarPar['EndDate'] = dfWarPar['EndYear'].astype(str) + '/' + dfWarPar['EndMonth'].astype(str) + '/' + dfWarPar['EndDay'].astype(str)
dfWarPar['StartDate'] = pd.to_datetime(dfWarPar['StartDate'], format='%Y/%m/%d')
dfWarPar['EndDate'] = pd.to_datetime(dfWarPar['EndDate'], format='%Y/%m/%d', errors='coerce')
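As an alternative to concatenating strings, `pd.to_datetime` can assemble dates from a frame whose columns are named `year`, `month`, and `day`. The sketch below uses toy data and assumes missing months and days have already been defaulted; rows with no end year are excluded from the conversion and come back as `NaT`. The frame `toy` is hypothetical.

```python
import pandas as pd

# Toy end-date components; the last row has no end date
toy = pd.DataFrame({
    "EndYear":  [1823, 2005, None],
    "EndMonth": [11, 3, None],
    "EndDay":   [13, 1, None],
})

cols = ["EndYear", "EndMonth", "EndDay"]
complete = toy[cols].notna().all(axis=1)

# Convert only complete rows, using the column names to_datetime expects
parts = (toy.loc[complete, cols]
            .astype(int)
            .rename(columns={"EndYear": "year", "EndMonth": "month", "EndDay": "day"}))

# reindex restores the rows that were skipped, filling them with NaT
toy["EndDate"] = pd.to_datetime(parts).reindex(toy.index)
print(toy)
```

This works for this data because all years fall within the range that pandas nanosecond timestamps support (roughly 1677 to 2262).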

In [39]:
dfWarPar = dfWarPar[['WarID', 'PolityID', 'StartDate', 'StartDate_Prec', 'EndDate', 'EndDate_Prec', 'Side', 'IsInitiator', 'Outcome', 'Deaths']]
dfWarPar

Unnamed: 0,WarID,PolityID,StartDate,StartDate_Prec,EndDate,EndDate_Prec,Side,IsInitiator,Outcome,Deaths
0,1,220,1823-04-07,Day,1823-11-13,Day,A,1,1,400
1,1,230,1823-04-07,Day,1823-11-13,Day,B,0,2,600
2,4,365,1828-04-26,Day,1829-09-14,Day,A,1,1,50000
3,4,640,1828-04-26,Day,1829-09-14,Day,B,0,2,80000
4,7,2,1846-04-25,Day,1847-09-14,Day,A,1,1,13283
...,...,...,...,...,...,...,...,...,...,...
1666,1582,10042,1975-08-11,Day,1975-10-15,Day,A,0,4,
1667,1582,10038,1975-08-11,Day,1975-10-15,Day,A,0,4,
1668,1582,10081,1975-08-11,Day,1975-10-15,Day,B,1,4,
1669,1594,10039,1999-06-01,Month,2005-03-01,Month,A,0,6,


In [40]:
dfWarPar.to_csv('../FinalData/war_participants.csv', encoding='utf-8', index=False)

## Create 'WAR_LOCATIONS' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- Region

In [41]:
dfNonStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'WhereFought', 'SideA1', 'SideA2',
       'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear',
       'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator',
       'TransFrom', 'TransTo', 'Outcome', 'SideADeaths', 'SideBDeaths',
       'TotalCombatDeaths', 'Version'],
      dtype='object')

In [42]:
dfInterWarLocs = dfInterStateWar[['WarNum', 'WhereFought']]
dfIntraWarLocs = dfIntraStateWar[['WarNum', 'WhereFought']]
dfExtraWarLocs = dfExtraStateWar[['WarNum', 'WhereFought']]
dfNonWarLocs = dfNonStateWar[['WarNum', 'WhereFought']]

AllWarLocs = [dfInterWarLocs, dfIntraWarLocs, dfExtraWarLocs, dfNonWarLocs]
dfWarLocs = pd.concat(AllWarLocs).drop_duplicates().reset_index(drop=True) \
            .rename(columns={'WarNum':'WarID', 'WhereFought':'Region'})
dfWarLocs

Unnamed: 0,WarID,Region
0,1,2
1,4,11
2,7,1
3,10,2
4,13,2
...,...,...
663,1574,4
664,1577,6
665,1581,4
666,1582,7


In [43]:
dfWarLocs['WarID'].value_counts()

139     9
106     6
100     2
440     1
407     1
       ..
671     1
670     1
1552    1
658     1
1       1
Name: WarID, Length: 654, dtype: int64
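The counts above show that a few wars carry more than one region code. One way to list those wars explicitly is to count distinct codes per war, sketched here on toy data. The WarIDs and codes in `locs` are hypothetical.

```python
import pandas as pd

# Toy (WarID, Region) pairs; wars 2 and 3 have more than one region code
locs = pd.DataFrame({
    "WarID":  [1, 2, 2, 3, 3, 3],
    "Region": [2, 2, 6, 2, 4, 6],
})

# Number of distinct region codes recorded for each war
regions_per_war = locs.groupby("WarID")["Region"].nunique()

# Wars that were coded with more than one region
multi_region = regions_per_war[regions_per_war > 1].index.tolist()
print(multi_region)
```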

Note: For all tables except Inter-State War, the WhereFought variable (Region) indicates where combat occurred. For the Inter-State War table, WhereFought indicates where combat *involving the state* occurred. As a result, three inter-state wars have more than one WhereFought value per war: 139 (WWII), 106 (WWI), and 100 (First Balkan War). In this table, Region tracks where combat occurred for the war as a whole, so the per-state detail for these three wars is lost.

In [44]:
region_map_values = {1: 'W. Hemisphere', 2: 'Europe', 4: 'Africa', 6: 'Middle East', 7: 'Asia', 9: 'Oceania', 
                     11: 'Europe, Middle East', 12: 'Europe, Asia', 13: 'W. Hemisphere, Asia', 
                     14: 'Europe, Africa, Middle East', 15: 'Europe, Africa, Middle East, Asia', 
                     16: 'Africa, Middle East, Asia, Oceania', 17: 'Asia, Oceania', 18: 'Africa, Middle East', 
                     19: 'Europe, Africa, Middle East, Asia, Oceania'}

dfWarLocs['Region'] = dfWarLocs['Region'].replace(region_map_values) \
                                         .str.split(', ')
dfWarLocs = dfWarLocs.explode('Region') \
                     .drop_duplicates() \
                     .reset_index(drop=True)
dfWarLocs

Unnamed: 0,WarID,Region
0,1,Europe
1,4,Europe
2,4,Middle East
3,7,W. Hemisphere
4,10,Europe
...,...,...
662,1574,Africa
663,1577,Middle East
664,1581,Africa
665,1582,Asia
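The cell above turns each region code into one row per named region. The same three steps (map codes to names, split multi-region names, explode to rows) can be seen on a toy frame. The mapping entries come from `region_map_values`; the WarIDs in `locs` are hypothetical. This sketch uses `Series.map` rather than `replace`, so any code missing from the mapping becomes `NaN` instead of staying as a number.

```python
import pandas as pd

# Subset of the code-to-name mapping used above
region_names = {2: "Europe", 6: "Middle East", 11: "Europe, Middle East"}

# Toy data: war 5 is coded both as 2 and as 11, which overlap on "Europe"
locs = pd.DataFrame({"WarID": [1, 4, 5, 5], "Region": [2, 11, 2, 11]})

# Code -> name string -> list of names
locs["Region"] = locs["Region"].map(region_names).str.split(", ")

# One row per (WarID, region name); overlapping codes produce duplicates to drop
locs = locs.explode("Region").drop_duplicates().reset_index(drop=True)
print(locs)
```

Dropping duplicates after `explode` matters because overlapping composite codes would otherwise list the same region twice for one war.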


In [45]:
dfWarLocs['WarID'].value_counts().head(10)

139    5
106    4
100    2
79     2
207    2
115    2
61     2
4      2
404    1
412    1
Name: WarID, dtype: int64

In [46]:
dfWarLocs.to_csv('../FinalData/war_locations.csv', encoding='utf-8', index=False)

## Create 'WAR_TRANSITIONS' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- FromWar
- ToWar

In [47]:
dfInterWarTrans = dfInterStateWar[['WarNum', 'TransTo']]
dfIntraWarTrans = dfIntraStateWar[['WarNum', 'TransTo']]
dfNonWarTrans = dfNonStateWar[['WarNum', 'TransTo']]
dfExtraWarTrans = dfExtraStateWar[['WarNum', 'TransTo']]

allWarTrans = [dfInterWarTrans, dfIntraWarTrans, dfNonWarTrans, dfExtraWarTrans]
dfWarTrans = pd.concat(allWarTrans).dropna().drop_duplicates().sort_values('WarNum').reset_index(drop=True)
dfWarTrans = dfWarTrans.rename(columns={'WarNum':'FromWar', 'TransTo':'ToWar'}).astype({'ToWar':int})
dfWarTrans

Unnamed: 0,FromWar,ToWar
0,40,587
1,176,785
2,186,804
3,187,808
4,189,475
5,215,877
6,225,481
7,227,482
8,327,19
9,352,37
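Because FromWar and ToWar both refer to wars, it can be worth confirming that every ID in the transitions table also appears in the WAR table before loading into the database. A minimal sketch of such a check, using hypothetical frames `wars` and `transitions` with made-up IDs:

```python
import pandas as pd

# Toy stand-ins for the WAR and WAR_TRANSITIONS tables (invented IDs)
wars = pd.DataFrame({"WarID": [1, 2, 3]})
transitions = pd.DataFrame({"FromWar": [1, 3], "ToWar": [2, 99]})

known = set(wars["WarID"])

# Rows where either end of the transition points at an unknown war
dangling = transitions[
    ~transitions["FromWar"].isin(known) | ~transitions["ToWar"].isin(known)
]
print(dangling)
```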


In [48]:
dfWarTrans.to_csv('../FinalData/war_transitions.csv', encoding='utf-8', index=False)