# Start of Data Transformation

Task: use Pandas to transform csv files into DataFrames that match desired tables for database schema

Tables:

- WAR (done)
- WAR_LOCATION
- WAR_PARTICIPANTS
- WAR_DEATHS
- WAR_TRANSITIONS

In [1]:
import pandas as pd
import numpy as np

In [2]:
!ls ../SourceData/CorrelatesOfWar/

[34mCodebooks[m[m                    MID_Narratives_2002-2010.pdf
CowWarList.csv               NMC_5_0-wsupplementary.csv
CowWarList.pdf               Non-StateWarData_v4.0.csv
[31mEntities.pdf[m[m                 Territories.csv
Extra-StateWarData_v4.0.csv  alliance_v4.1_by_member.csv
IGO_stateunit_v2.3.csv       contdir.csv
Inter-StateWarData_v4.0.csv  igounit_v2.3.csv
Intra-StateWarData_v4.1.csv  majors2016.csv
[31mMIDA_4.2.csv[m[m                 states2016.csv
[31mMIDB_4.2.csv[m[m                 system2016.csv
[31mMIDLOCA_2.0.csv[m[m              tc2014.csv
MID_Narratives_1993-2001.pdf


In [3]:
dfInterStateWar = pd.read_csv('../SourceData/CorrelatesOfWar/Inter-StateWarData_v4.0.csv')
dfIntraStateWar = pd.read_csv('../SourceData/CorrelatesOfWar/Intra-StateWarData_v4.1.csv')
dfNonStateWar = pd.read_csv('../SourceData/CorrelatesOfWar/Non-StateWarData_v4.0.csv')
dfExtraStateWar = pd.read_csv('../SourceData/CorrelatesOfWar/Extra-StateWarData_v4.0.csv')

## Create 'WAR' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')
- CowWarList.csv (note: generated from pdf using Tabula, with `\r`s removed by hand)

with the following attributes:

- WarID
- WarShortName
- WarLongName (from CowWarList.csv)
- WarType
- IsIntervention (only relevant for Extra-State Wars)
- IsInternational (only relevant for Intra-State Wars)

Note: I re-saved many of the csv files with UTF-8 encoding.
Note: The carriage return characters can also be removed with this code:

`df = df.replace({r'\r': ' '}, regex=True)`

In [4]:
dfInterWar = dfInterStateWar[['WarNum', 'WarName', 'WarType']]
dfInterWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'}, inplace=True)
dfInterWar.drop_duplicates(inplace=True)

dfIntraWar = dfIntraStateWar[['WarNum', 'WarName', 'WarType', 'Intnl']]
dfIntraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Intnl':'IsInternational'}, inplace=True)
dfIntraWar.drop_duplicates(inplace=True)

dfNonWar = dfNonStateWar[['WarNum', 'WarName', 'WarType']]
dfNonWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'}, inplace=True)
dfNonWar.drop_duplicates(inplace=True)

dfExtraWar = dfExtraStateWar[['WarNum', 'WarName', 'WarType', 'Interven']]
dfExtraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Interven':'IsIntervention'}, inplace=True)
dfExtraWar.drop_duplicates(inplace=True)

warDFs = [dfInterWar, dfIntraWar, dfNonWar, dfExtraWar]
dfWar = pd.concat(warDFs)
dfWar = dfWar[['WarID', 'WarShortName', 'WarType', 'IsIntervention', 'IsInternational']]
dfWar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()
A value is trying to be set on a copy of a

Unnamed: 0,WarID,WarShortName,WarType,IsIntervention,IsInternational
0,1,Franco-Spanish War,1,,
2,4,First Russo-Turkish,1,,
4,7,Mexican-American,1,,
6,10,Austro-Sardinian,1,,
10,13,First Schleswig-Holstein,1,,
12,16,Roman Republic,1,,
16,19,La Plata,1,,
18,22,Crimean,1,,
23,25,Anglo-Persian,1,,
25,28,Italian Unification,1,,


Now to add the long names and the general category war type:

In [5]:
dfWarNames = pd.read_csv('../SourceData/CorrelatesOfWar/CowWarList.csv')
dfWarNames

Unnamed: 0,Year,War Name,War Type & Number
0,1816,Allied Bombardment of Algiers of 1816,Extra-State War #300
1,1816,Ottoman-Wahhabi Revolt of 1816-1818,Extra-State War #301
2,1817,Liberation of Chile of 1817-1818,Extra-State War #302
3,1817,First Bolivar Expedition of 1817-1819,Extra-State War #303
4,1817,War of Mexican Independence of 1817-1818,Extra-State War #304
5,1817,British-Kandyan War of 1817-1818,Extra-State War #305
6,1817,British-Maratha of 1817-1818,Extra-State War #306
7,1818,First Maori Tribal War of 1818-1824,Non-State War #1500
8,1818,First Caucasus War of 1818-1822,Intra-State War #500
9,1819,Shaka Zulu-Bantu War of 1819-1828,Non-State War #1501


In [6]:
dfWarNamesIDs = dfWarNames['War Type & Number'].str.split("#", n = 1, expand = True)
dfWarNames['WarTypeName'] = dfWarNamesIDs[0]
dfWarNames['WarID'] = dfWarNamesIDs[1]

dfWarNames = dfWarNames[['WarID', 'WarTypeName', 'War Name']]
dfWarNames.rename(columns={'War Name':'WarLongName'}, inplace=True)
dfWarNames

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,WarID,WarTypeName,WarLongName
0,300,Extra-State War,Allied Bombardment of Algiers of 1816
1,301,Extra-State War,Ottoman-Wahhabi Revolt of 1816-1818
2,302,Extra-State War,Liberation of Chile of 1817-1818
3,303,Extra-State War,First Bolivar Expedition of 1817-1819
4,304,Extra-State War,War of Mexican Independence of 1817-1818
5,305,Extra-State War,British-Kandyan War of 1817-1818
6,306,Extra-State War,British-Maratha of 1817-1818
7,1500,Non-State War,First Maori Tribal War of 1818-1824
8,500,Intra-State War,First Caucasus War of 1818-1822
9,1501,Non-State War,Shaka Zulu-Bantu War of 1819-1828


In [7]:
dfWarNames['WarID'] = dfWarNames['WarID'].astype('int64')
dfWars = pd.merge(dfWar, dfWarNames, on='WarID')
dfWars = dfWars[['WarID', 'WarShortName', 'WarLongName', 'WarType', 'WarTypeName', 'IsIntervention', 'IsInternational']]
dfWars = dfWars.replace(np.nan, '', regex=True)
dfWars

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,WarID,WarShortName,WarLongName,WarType,WarTypeName,IsIntervention,IsInternational
0,1,Franco-Spanish War,Franco-Spanish War of 1823,1,Inter-State War,,
1,4,First Russo-Turkish,First Russo-Turkish War of 1828-1829,1,Inter-State War,,
2,7,Mexican-American,Mexican-American War of 1846-1847,1,Inter-State War,,
3,10,Austro-Sardinian,Austro-Sardinian War of 1848-1849,1,Inter-State War,,
4,13,First Schleswig-Holstein,First Schleswig-Holstein War of 1848-1849,1,Inter-State War,,
5,16,Roman Republic,War of the Roman Republic of 1849,1,Inter-State War,,
6,19,La Plata,La Plata War of 1851-1852,1,Inter-State War,,
7,22,Crimean,Crimean War of 1853-1856,1,Inter-State War,,
8,25,Anglo-Persian,Anglo-Persian War of 1856-1857,1,Inter-State War,,
9,28,Italian Unification,War of Italian Unification of 1859,1,Inter-State War,,


In [8]:
WarShortNameMaxLength = int(dfWars['WarShortName'].str.encode(encoding='utf-8').str.len().max())
print('WarShortNameMaxLength', WarShortNameMaxLength)
WarLongNameMaxLength = int(dfWars['WarLongName'].str.encode(encoding='utf-8').str.len().max())
print('WarLongNameMaxLength', WarLongNameMaxLength)
WarTypeNameMaxLength = int(dfWars['WarTypeName'].str.encode(encoding='utf-8').str.len().max())
print('WarTypeNameMaxLength', WarTypeNameMaxLength)

#dfWar['WarType'].value_counts()
#dfWar['WarID'].value_counts()

WarShortNameMaxLength 50
WarLongNameMaxLength 62
WarTypeNameMaxLength 16


In [9]:
dfWars.WarTypeName.unique()

array(['Inter-State War ', 'Intra-State War ', 'Non-State War ',
       'Extra-State War '], dtype=object)

In [10]:
# The war type names have an extra space at the end due to splitting. 
# Stripping white spaces from everything just to be safe:
dfWars = dfWars.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

In [11]:
dfWars.to_csv('../FinalData/war.csv', encoding='utf-8', index=False)

## Create 'WAR_PARTICIPANTS' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- PolityID
- StartDate
- EndDate
- StartYear
- StartMonth
- StartDay
- EndYear
- EndMonth
- EndDay
- Side
- IsInitiator
- Outcome

Note: There was a data entry error in 'Intra-StateWarData_v4.1.csv' for WarNum 585; EndDay1 was coded '-91866' and EndYear1 was left blank. I corrected this by hand so the Day was '-9' and the Year '1866'.

Note: There was another data entry error in the same file for WarNum 682; EndDay1 was coded '1919' and EndYear1 was left blank. I corrected this by hand so that the Day was '-9' and the Year '1919'.

Note: There was an apparent data entry error in the same file for WarNum 623, the second entry (Korea) - the StartDay1 was coded as '29' when StartMonth1 was 2... which is not a valid date. I corrected this by hand so the StartDay1 became '28'.

In [12]:
dfPolities = pd.read_csv('../FinalData/polity.csv')

### Inter-State War

In [13]:
dfInterStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode', 'StateName', 'Side',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought', 'Initiator',
       'Outcome', 'TransTo', 'BatDeath', 'Version'],
      dtype='object')

In [14]:
dfInterWarPar1 = dfInterStateWar[['WarNum', 'ccode', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1', 'Side', 'Initiator', 'Outcome']]
dfInterWarPar2 = dfInterStateWar[['WarNum', 'ccode', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2', 'Side', 'Initiator', 'Outcome']]

In [15]:
dfInterWarPar1.rename(columns={'WarNum':'WarID', 'ccode':'PolityID', 'StartMonth1':'StartMonth', 
                               'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                               'EndDay1':'EndDay', 'EndYear1':'EndYear', 'Initiator':'IsInitiator'}, inplace=True)
dfInterWarPar2.rename(columns={'WarNum':'WarID', 'ccode':'PolityID', 'StartMonth2':'StartMonth', 
                               'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                               'EndDay2':'EndDay', 'EndYear2':'EndYear', 'Initiator':'IsInitiator'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [16]:
dfInterWarPar2

Unnamed: 0,WarID,PolityID,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Side,IsInitiator,Outcome
0,1,230,-8,-8,-8,-8,-8,-8,2,2,2
1,1,220,-8,-8,-8,-8,-8,-8,1,1,1
2,4,640,-8,-8,-8,-8,-8,-8,2,2,2
3,4,365,-8,-8,-8,-8,-8,-8,1,1,1
4,7,70,-8,-8,-8,-8,-8,-8,2,2,2
5,7,2,-8,-8,-8,-8,-8,-8,1,1,1
6,10,337,-8,-8,-8,-8,-8,-8,2,2,2
7,10,325,3,12,1849,3,30,1849,2,1,2
8,10,300,3,12,1849,3,30,1849,1,2,1
9,10,332,-8,-8,-8,-8,-8,-8,2,2,2


In [17]:
dfInterWarPar2 = dfInterWarPar2.replace(-8, '')
dfInterWarPar2['datesconcat'] = dfInterWarPar2['StartMonth'].map(str) + dfInterWarPar2['StartDay'].map(str) + dfInterWarPar2['StartYear'].map(str) + dfInterWarPar2['EndMonth'].map(str) + dfInterWarPar2['EndDay'].map(str) + dfInterWarPar2['EndYear'].map(str)
missdate = dfInterWarPar2.loc[0, 'datesconcat']
dfInterWarPar2 = dfInterWarPar2[dfInterWarPar2.datesconcat != missdate]
dfInterWarPar2.drop(columns=['datesconcat'], inplace=True)
dfInterWarPar2

Unnamed: 0,WarID,PolityID,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Side,IsInitiator,Outcome
7,10,325,3,12,1849,3,30,1849,2,1,2
8,10,300,3,12,1849,3,30,1849,1,2,1
10,13,255,3,25,1849,7,10,1849,1,1,1
11,13,390,3,25,1849,7,10,1849,2,2,2
38,46,255,6,25,1864,7,20,1864,1,1,1
39,46,390,6,25,1864,7,20,1864,2,2,2
40,46,300,6,25,1864,7,20,1864,1,2,1
104,100,355,2,3,1913,4,19,1913,1,2,1
105,100,345,2,3,1913,4,19,1913,1,1,1
182,139,365,8,8,1945,8,14,1945,1,2,1


In [18]:
combinedInterWarPar = [dfInterWarPar1, dfInterWarPar2]
dfInterWarPar = pd.concat(combinedInterWarPar)
dfInterWarPar

Unnamed: 0,WarID,PolityID,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Side,IsInitiator,Outcome
0,1,230,4,7,1823,11,13,1823,2,2,2
1,1,220,4,7,1823,11,13,1823,1,1,1
2,4,640,4,26,1828,9,14,1829,2,2,2
3,4,365,4,26,1828,9,14,1829,1,1,1
4,7,70,4,25,1846,9,14,1847,2,2,2
5,7,2,4,25,1846,9,14,1847,1,1,1
6,10,337,3,29,1848,8,9,1848,2,2,2
7,10,325,3,24,1848,8,9,1848,2,1,2
8,10,300,3,24,1848,8,9,1848,1,2,1
9,10,332,4,9,1848,8,9,1848,2,2,2


In [19]:
dfInterWarPar['Side'] [dfInterWarPar['Side'] == 1] = 'A'
dfInterWarPar['Side'] [dfInterWarPar['Side'] == 2] = 'B'
dfInterWarPar['IsInitiator'] [dfInterWarPar['IsInitiator'] == 2] = 0
dfInterWarPar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,PolityID,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Side,IsInitiator,Outcome
0,1,230,4,7,1823,11,13,1823,B,0,2
1,1,220,4,7,1823,11,13,1823,A,1,1
2,4,640,4,26,1828,9,14,1829,B,0,2
3,4,365,4,26,1828,9,14,1829,A,1,1
4,7,70,4,25,1846,9,14,1847,B,0,2
5,7,2,4,25,1846,9,14,1847,A,1,1
6,10,337,3,29,1848,8,9,1848,B,0,2
7,10,325,3,24,1848,8,9,1848,B,1,2
8,10,300,3,24,1848,8,9,1848,A,0,1
9,10,332,4,9,1848,8,9,1848,B,0,2


In [20]:
dfInterWarPar = dfInterWarPar[['WarID', 'PolityID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome']]
dfInterWarPar

Unnamed: 0,WarID,PolityID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome
0,1,230,1823,4,7,1823,11,13,B,0,2
1,1,220,1823,4,7,1823,11,13,A,1,1
2,4,640,1828,4,26,1829,9,14,B,0,2
3,4,365,1828,4,26,1829,9,14,A,1,1
4,7,70,1846,4,25,1847,9,14,B,0,2
5,7,2,1846,4,25,1847,9,14,A,1,1
6,10,337,1848,3,29,1848,8,9,B,0,2
7,10,325,1848,3,24,1848,8,9,B,1,2
8,10,300,1848,3,24,1848,8,9,A,0,1
9,10,332,1848,4,9,1848,8,9,B,0,2


### Intra-State War

In [21]:
dfIntraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'CcodeA', 'SideA', 'CcodeB', 'SideB',
       'Intnl', 'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1',
       'EndDay1', 'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2',
       'EndMonth2', 'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought',
       'Initiator', 'Outcome', 'TransTo', 'SideADeaths', 'SideBDeaths',
       'Version'],
      dtype='object')

In [22]:
dfIntraWarPar1A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome']]
dfIntraWarPar2A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2', 'Initiator', 'Outcome']]
dfIntraWarPar1B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome']]
dfIntraWarPar2B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2', 'Initiator', 'Outcome']]

In [23]:
dfIntraWarPar1A.rename(columns={'WarNum':'WarID', 'CcodeA':'PolityID', 'SideA':'PolityName', 'StartMonth1':'StartMonth', 
                                'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfIntraWarPar2A.rename(columns={'WarNum':'WarID', 'CcodeA':'PolityID', 'SideA':'PolityName', 'StartMonth2':'StartMonth', 
                                'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                'EndDay2':'EndDay', 'EndYear2':'EndYear'}, inplace=True)
dfIntraWarPar1B.rename(columns={'WarNum':'WarID', 'CcodeB':'PolityID', 'SideB':'PolityName', 'StartMonth1':'StartMonth', 
                                'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfIntraWarPar2B.rename(columns={'WarNum':'WarID', 'CcodeB':'PolityID', 'SideB':'PolityName', 'StartMonth2':'StartMonth', 
                                'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                'EndDay2':'EndDay', 'EndYear2':'EndYear'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [24]:
dfIntraWarPar1A = dfIntraWarPar1A[dfIntraWarPar1A.PolityName != '-8']
dfIntraWarPar1B = dfIntraWarPar1B[dfIntraWarPar1B.PolityName != '-8']
dfIntraWarPar2A = dfIntraWarPar2A[dfIntraWarPar2A.PolityName != '-8']
dfIntraWarPar2B = dfIntraWarPar2B[dfIntraWarPar2B.PolityName != '-8']

In [25]:
dfIntraWarPar2A = dfIntraWarPar2A.replace(-8, '')
dfIntraWarPar2A['datesconcat'] = dfIntraWarPar2A['StartMonth'].map(str) + dfIntraWarPar2A['StartDay'].map(str) + dfIntraWarPar2A['StartYear'].map(str) + dfIntraWarPar2A['EndMonth'].map(str) + dfIntraWarPar2A['EndDay'].map(str) + dfIntraWarPar2A['EndYear'].map(str)
missdate2A = dfIntraWarPar2A.loc[0, 'datesconcat']
dfIntraWarPar2A = dfIntraWarPar2A[dfIntraWarPar2A.datesconcat != missdate2A]
dfIntraWarPar2A.drop(columns=['datesconcat'], inplace=True)
dfIntraWarPar2A

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome
48,547,329,Two Sicilies,5,15,1848,5,15,1849,Liberals,1
86,590,101,Venezuela,8,14,1869,1,7,1871,Conservatives,2
111,623,730,Korea,9,14,1894,11,28,1894,Tonghak Society,1
193,720,350,Greece,2,12,1946,10,16,1949,Communists,1
310,820,620,Libya,6,-9,1983,9,-9,1984,FAN,2
324,836,625,Sudan,4,15,1992,1,10,2005,SPLA-Garang faction,3
367,877,346,Bosnia,3,20,1995,12,14,1995,Bosnian Serbs,1
391,898,451,Sierra Leone,5,11,2000,11,10,2000,Kabbah faction,2


In [26]:
dfIntraWarPar2B = dfIntraWarPar2B.replace(-8, '')
dfIntraWarPar2B['datesconcat'] = dfIntraWarPar2B['StartMonth'].map(str) + dfIntraWarPar2B['StartDay'].map(str) + dfIntraWarPar2B['StartYear'].map(str) + dfIntraWarPar2B['EndMonth'].map(str) + dfIntraWarPar2B['EndDay'].map(str) + dfIntraWarPar2B['EndYear'].map(str)
missdate2B = dfIntraWarPar2B.loc[0, 'datesconcat']
dfIntraWarPar2B = dfIntraWarPar2B[dfIntraWarPar2B.datesconcat != missdate2B]
dfIntraWarPar2B.drop(columns=['datesconcat'], inplace=True)
dfIntraWarPar2B

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome
48,547,,Liberals,5,15,1848,5,15,1849,Liberals,1
86,590,,Conservatives,8,14,1869,1,7,1871,Conservatives,2
111,623,,Tonghak Society,9,14,1894,11,28,1894,Tonghak Society,1
193,720,,Communists,2,12,1946,10,16,1949,Communists,1
324,836,,SPLA-Garang faction,4,15,1992,1,10,2005,SPLA-Garang faction,3
367,877,,Bosnian Serbs,3,20,1995,12,14,1995,Bosnian Serbs,1
369,877,344.0,Croatia,3,20,1995,12,14,1995,Bosnian Serbs,1
391,898,,Kabbah faction,5,11,2000,11,10,2000,Kabbah faction,2
392,898,452.0,Ghana,5,11,2000,11,10,2000,Kabbah faction,2
393,898,475.0,Nigeria,5,11,2000,11,10,2000,Kabbah faction,2


In [27]:
combinedIntraWarSideA = [dfIntraWarPar1A, dfIntraWarPar2A]
dfIntraWarParA = pd.concat(combinedIntraWarSideA)
dfIntraWarParA['Side'] = 'A'
dfIntraWarParA

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side
0,500,365,Russia,6,10,1818,-9,-9,1822,Chechens,1,A
1,501,-8,Sidon,6,-9,1820,7,21,1821,Sidon,2,A
2,502,300,Austria,3,-9,1821,3,23,1821,Liberals,1,A
3,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A
4,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A
5,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A
6,505,325,Sardinia,3,10,1821,5,8,1821,Carbonari,1,A
7,506,640,Ottoman Empire,3,25,1821,4,25,1828,Greeks,4,A
11,507,-8,Egypt,3,20,1824,4,-9,1824,Mehdi army,1,A
12,508,640,Ottoman Empire,6,14,1826,9,30,1826,Janissaries,1,A


In [28]:
combinedIntraWarSideB = [dfIntraWarPar1B, dfIntraWarPar2B]
dfIntraWarParB = pd.concat(combinedIntraWarSideB)
dfIntraWarParB['Side'] = 'B'
# need to make Outcome consistent between war types... in InterStateWar winner = 1, loser = 2; in IntraStateWar sideA wins = 1, sideB wins = 2
dfIntraWarParB['Outcome'] [dfIntraWarParB['Outcome'] == 2] = 'win'
dfIntraWarParB['Outcome'] [dfIntraWarParB['Outcome'] == 1] = 'lose'
dfIntraWarParB['Outcome'] [dfIntraWarParB['Outcome'] == 'win'] = 1
dfIntraWarParB['Outcome'] [dfIntraWarParB['Outcome'] == 'lose'] = 2
dfIntraWarParB

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side
0,500,-8,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822,Chechens,2,B
1,501,-8,Damascus & Aleppo,6,-9,1820,7,21,1821,Sidon,1,B
3,502,-8,Liberals,7,2,1820,3,23,1821,Liberals,2,B
4,503,-8,Royalists,12,1,1821,4,6,1823,Royalists,4,B
6,505,-8,Carbonari,3,10,1821,5,8,1821,Carbonari,2,B
7,506,-8,Greeks,3,25,1821,4,25,1828,Greeks,4,B
8,506,200,United Kingdom,10,20,1827,10,27,1827,Greeks,4,B
9,506,220,France,10,20,1827,10,27,1827,Greeks,4,B
10,506,365,Russia,10,20,1827,4,25,1828,Greeks,4,B
11,507,-8,Mehdi army,3,20,1824,4,-9,1824,Mehdi army,2,B


In [29]:
combinedIntraWarPar = [dfIntraWarParA, dfIntraWarParB]
dfIntraWarPar = pd.concat(combinedIntraWarPar)
dfIntraWarPar = dfIntraWarPar.sort_values('WarID')
dfIntraWarPar.reset_index(drop=True, inplace=True)
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side
0,500,365,Russia,6,10,1818,-9,-9,1822,Chechens,1,A
1,500,-8,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822,Chechens,2,B
2,501,-8,Sidon,6,-9,1820,7,21,1821,Sidon,2,A
3,501,-8,Damascus & Aleppo,6,-9,1820,7,21,1821,Sidon,1,B
4,502,-8,Liberals,7,2,1820,3,23,1821,Liberals,2,B
5,502,300,Austria,3,-9,1821,3,23,1821,Liberals,1,A
6,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A
7,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A
8,503,-8,Royalists,12,1,1821,4,6,1823,Royalists,4,B
9,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A


In [30]:
dfIntraWarPar = dfIntraWarPar.replace(-9, '')
dfIntraWarPar = dfIntraWarPar.replace(-8, '')
dfIntraWarPar = dfIntraWarPar.replace(-7, '')
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side
0,500,365,Russia,6,10,1818,,,1822,Chechens,1,A
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B
5,502,300,Austria,3,,1821,3,23,1821,Liberals,1,A
6,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A
7,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B
9,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A


In [31]:
dfIntraWarPar['PolityName'] = dfIntraWarPar['PolityName'].str.strip()
dfIntraWarPar['Initiator'] = dfIntraWarPar['Initiator'].str.strip()
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side
0,500,365,Russia,6,10,1818,,,1822,Chechens,1,A
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B
5,502,300,Austria,3,,1821,3,23,1821,Liberals,1,A
6,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A
7,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B
9,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A


create the 'IsInitiator' column based on the 'Initiator' column

In [32]:
dfIntraWarPar['IsInitiator'] = 0
dfIntraWarPar['IsInitiator'] [dfIntraWarPar['PolityName'] == dfIntraWarPar['Initiator']] = 1
dfIntraWarPar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator
0,500,365,Russia,6,10,1818,,,1822,Chechens,1,A,0
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B,0
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A,1
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B,0
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B,1
5,502,300,Austria,3,,1821,3,23,1821,Liberals,1,A,0
6,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A,0
7,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A,0
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B,1
9,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A,0


In [33]:
checkinit = dfIntraWarPar.groupby('WarID')['IsInitiator'].sum()
checkinit.value_counts()

1    276
0     51
2      7
Name: IsInitiator, dtype: int64

In [34]:
missingInit = checkinit.loc[checkinit==0].index

pd.set_option('display.max_rows', 200)
dfIntraWarPar[dfIntraWarPar.WarID.isin(missingInit)]

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator
0,500,365.0,Russia,6.0,10.0,1818,,,1822.0,Chechens,1,A,0
1,500,,"Georgians, Dhagestania, Chechens",6.0,10.0,1818,,,1822.0,Chechens,2,B,0
36,518,640.0,Ottoman Empire,10.0,1.0,1831,12.0,27.0,1832.0,Egyptians,2,A,0
37,518,,Egyptians & Bashir,10.0,1.0,1831,12.0,27.0,1832.0,Egyptians,1,B,0
63,533,640.0,Ottoman Empire,6.0,10.0,1839,6.0,24.0,1839.0,Mehmet Ali,2,A,0
64,533,,Egypt,6.0,10.0,1839,6.0,24.0,1839.0,Mehmet Ali,1,B,0
78,542,640.0,Ottoman Empire,12.0,19.0,1842,1.0,13.0,1843.0,Ottomans,1,A,0
79,542,,Karbala,12.0,19.0,1842,1.0,13.0,1843.0,Ottomans,2,B,0
90,548,,Paez led Conservatives,2.0,4.0,1848,8.0,15.0,1849.0,Former Pres. Paez,2,B,0
91,548,101.0,Venezuela,2.0,4.0,1848,8.0,15.0,1849.0,Former Pres. Paez,1,A,0


In [35]:
# some 'Initiator' values are irregular - due to misspellings, small alterations, alternate names, being part of a list, etc.
# Others are less clear and required some Wikipedia/Google searching on my part.
# I used the above dataframe slice to select which rows should be coded as a 1 in the 'IsInitiator' column
IsInitIndex = [1, 37, 63, 78, 90, 102, 105, 110, 131, 137, 143, 153, 155, 243, 258, 265, 280, 288, 293, 299, 310, 330, 336, 
              359, 394, 401, 461, 497, 526, 529, 530, 544, 552, 572, 578, 583, 598, 608, 615, 619, 621, 633, 636, 692, 
              694, 698, 749, 750, 755, 757, 770, 776, 777]

dfIntraWarPar['IsInitiator'] [dfIntraWarPar.index.isin(IsInitIndex)] = 1
dfIntraWarPar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator
0,500,365,Russia,6,10,1818,,,1822,Chechens,1,A,0
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B,1
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A,1
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B,0
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B,1
5,502,300,Austria,3,,1821,3,23,1821,Liberals,1,A,0
6,502,329,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A,0
7,503,230,Spain,12,1,1821,4,6,1823,Royalists,4,A,0
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B,1
9,505,300,Austria,3,10,1821,5,8,1821,Carbonari,1,A,0


In [36]:
checkinit = dfIntraWarPar.groupby('WarID')['IsInitiator'].sum()
checkinit.value_counts()

1    325
2      9
Name: IsInitiator, dtype: int64

In [37]:
doubleInit = checkinit.loc[checkinit==2].index
dfIntraWarPar[dfIntraWarPar.WarID.isin(doubleInit)]

# just to check, but everything here looks fine.

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator
86,547,,Liberals,5,15,1848,5,15,1849,Liberals,2,B,1
87,547,,Liberals,1,12,1848,1,27,1848,Liberals,2,B,1
88,547,329.0,Two Sicilies,5,15,1848,5,15,1849,Liberals,1,A,0
89,547,329.0,Two Sicilies,1,12,1848,1,27,1848,Liberals,1,A,0
161,590,,Conservatives,8,14,1869,1,7,1871,Conservatives,1,B,1
162,590,,Conservatives,1,11,1868,8,14,1868,Conservatives,1,B,1
163,590,101.0,Venezuela,1,11,1868,8,14,1868,Conservatives,2,A,0
164,590,101.0,Venezuela,8,14,1869,1,7,1871,Conservatives,2,A,0
211,623,,Tonghak Society,9,14,1894,11,28,1894,Tonghak Society,2,B,1
212,623,,Tonghak Society,2,28,1894,5,6,1894,Tonghak Society,2,B,1


fill in PolityIDs where missing (mostly NonState Groups)

In [38]:
dfIntraWarPar['PolityID'].replace('', np.nan, inplace=True)
dfIntraWarPar['PolityID'] = dfIntraWarPar['PolityID'].astype(float)
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator
0,500,365.0,Russia,6,10,1818,,,1822,Chechens,1,A,0
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B,1
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A,1
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B,0
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B,1
5,502,300.0,Austria,3,,1821,3,23,1821,Liberals,1,A,0
6,502,329.0,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A,0
7,503,230.0,Spain,12,1,1821,4,6,1823,Royalists,4,A,0
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B,1
9,505,300.0,Austria,3,10,1821,5,8,1821,Carbonari,1,A,0


In [39]:
dfIntraWarPar = dfIntraWarPar.merge(dfPolities[['PolityID', 'PolityName']], on='PolityName', how='left', suffixes=('', '_m'),)
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator,PolityID_m
0,500,365.0,Russia,6,10,1818,,,1822,Chechens,1,A,0,365.0
1,500,,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B,1,10115.0
2,501,,Sidon,6,,1820,7,21,1821,Sidon,2,A,1,10092.0
3,501,,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B,0,10116.0
4,502,,Liberals,7,2,1820,3,23,1821,Liberals,2,B,1,10048.0
5,502,300.0,Austria,3,,1821,3,23,1821,Liberals,1,A,0,305.0
6,502,329.0,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A,0,329.0
7,503,230.0,Spain,12,1,1821,4,6,1823,Royalists,4,A,0,230.0
8,503,,Royalists,12,1,1821,4,6,1823,Royalists,4,B,1,10117.0
9,505,300.0,Austria,3,10,1821,5,8,1821,Carbonari,1,A,0,305.0


In [40]:
dfIntraWarPar['PolityID'].fillna(dfIntraWarPar['PolityID_m'], inplace=True)
dfIntraWarPar

Unnamed: 0,WarID,PolityID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,Initiator,Outcome,Side,IsInitiator,PolityID_m
0,500,365.0,Russia,6,10,1818,,,1822,Chechens,1,A,0,365.0
1,500,10115.0,"Georgians, Dhagestania, Chechens",6,10,1818,,,1822,Chechens,2,B,1,10115.0
2,501,10092.0,Sidon,6,,1820,7,21,1821,Sidon,2,A,1,10092.0
3,501,10116.0,Damascus & Aleppo,6,,1820,7,21,1821,Sidon,1,B,0,10116.0
4,502,10048.0,Liberals,7,2,1820,3,23,1821,Liberals,2,B,1,10048.0
5,502,300.0,Austria,3,,1821,3,23,1821,Liberals,1,A,0,305.0
6,502,329.0,Two Sicilies,7,2,1820,3,23,1821,Liberals,1,A,0,329.0
7,503,230.0,Spain,12,1,1821,4,6,1823,Royalists,4,A,0,230.0
8,503,10117.0,Royalists,12,1,1821,4,6,1823,Royalists,4,B,1,10117.0
9,505,300.0,Austria,3,10,1821,5,8,1821,Carbonari,1,A,0,305.0


In [41]:
dfIntraWarPar = dfIntraWarPar[['WarID', 'PolityID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome']]
dfIntraWarPar

Unnamed: 0,WarID,PolityID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome
0,500,365.0,1818,6,10,1822,,,A,0,1
1,500,10115.0,1818,6,10,1822,,,B,1,2
2,501,10092.0,1820,6,,1821,7,21,A,1,2
3,501,10116.0,1820,6,,1821,7,21,B,0,1
4,502,10048.0,1820,7,2,1821,3,23,B,1,2
5,502,300.0,1821,3,,1821,3,23,A,0,1
6,502,329.0,1820,7,2,1821,3,23,A,0,1
7,503,230.0,1821,12,1,1823,4,6,A,0,4
8,503,10117.0,1821,12,1,1823,4,6,B,1,4
9,505,300.0,1821,3,10,1821,5,8,A,0,1


### Non-State War

In [43]:
dfNonStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'WhereFought', 'SideA1', 'SideA2',
       'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear',
       'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator',
       'TransFrom', 'TransTo', 'Outcome', 'SideADeaths', 'SideBDeaths',
       'TotalCombatDeaths', 'Version'],
      dtype='object')

In [44]:
dfNonWarParA1 = dfNonStateWar[['WarNum', 'SideA1', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParA2 = dfNonStateWar[['WarNum', 'SideA2', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParB1 = dfNonStateWar[['WarNum', 'SideB1', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParB2 = dfNonStateWar[['WarNum', 'SideB2', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParB3 = dfNonStateWar[['WarNum', 'SideB3', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParB4 = dfNonStateWar[['WarNum', 'SideB4', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]
dfNonWarParB5 = dfNonStateWar[['WarNum', 'SideB5', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome']]

In [46]:
dfNonWarParA2

Unnamed: 0,WarNum,SideA2,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome
0,1500,-8,1818,-9,-9,1824,-9,-9,A,1
1,1501,-8,1819,-9,-9,1828,9,24,A,1
2,1502,-8,1819,-9,-9,1822,-9,-9,A,1
3,1503,-8,1820,1,8,1820,2,23,B,2
4,1505,-8,1821,9,-9,1823,-9,-9,A,1
5,1506,-8,1821,11,-9,1821,12,-9,A,1
6,1508,-8,1825,-9,-9,1828,-9,-9,A,1
7,1509,-8,1825,10,25,1827,4,13,A,3
8,1510,-8,1826,-9,-9,1829,4,12,B,2
9,1511,-8,1826,-9,-9,1827,5,15,A,2


In [47]:
dfNonWarParA1 = dfNonWarParA1[dfNonWarParA1.SideA1 != '-8']
dfNonWarParA2 = dfNonWarParA2[dfNonWarParA2.SideA2 != '-8']
dfNonWarParB1 = dfNonWarParB1[dfNonWarParB1.SideB1 != '-8']
dfNonWarParB2 = dfNonWarParB2[dfNonWarParB2.SideB2 != '-8']
dfNonWarParB3 = dfNonWarParB3[dfNonWarParB3.SideB3 != '-8']
dfNonWarParB4 = dfNonWarParB4[dfNonWarParB4.SideB4 != '-8']
dfNonWarParB5 = dfNonWarParB5[dfNonWarParB5.SideB5 != '-8']

In [52]:
dfNonWarParA2

Unnamed: 0,WarNum,SideA2,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome
18,1523,Argentina,1837,11,-9,1839,1,20,A,1
35,1543,Khoja,1857,-9,-9,1857,9,-9,A,2
40,1550,Nicaragua,1863,1,23,1863,11,15,A,1
56,1573,military,1948,4,3,1949,5,-9,A,2
60,1582,Apodeti,1975,8,11,1975,10,15,B,4


In [63]:
dfNonWarParA1.rename(columns={'SideA1':'PolityName', 'WarNum':'WarID'}, inplace=True)
dfNonWarParA2.rename(columns={'SideA2':'PolityName', 'WarNum':'WarID'}, inplace=True)

combinedNonWarSideA = [dfNonWarParA1, dfNonWarParA2]
dfNonWarParA = pd.concat(combinedNonWarSideA)
dfNonWarParA['Side'] = 'A'
dfNonWarParA['IsInitiator'] = 1
dfNonWarParA['IsInitiator'] [dfNonWarParA['Initiator'] == 'B'] = 0
dfNonWarParA

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome,Side,IsInitiator
0,1500,Te Rauparaha's Ngati Toa,1818,-9,-9,1824,-9,-9,A,1,A,1
1,1501,Shaka Zulu,1819,-9,-9,1828,9,24,A,1,A,1
2,1502,Burma,1819,-9,-9,1822,-9,-9,A,1,A,1
3,1503,Buenos Aires,1820,1,8,1820,2,23,B,2,A,1
4,1505,Hongi Hika's Nga Phuhi,1821,9,-9,1823,-9,-9,A,1,A,1
5,1506,Thailand,1821,11,-9,1821,12,-9,A,1,A,1
6,1508,China,1825,-9,-9,1828,-9,-9,A,1,A,1
7,1509,Mexico,1825,10,25,1827,4,13,A,3,A,1
8,1510,Conservative Confederation,1826,-9,-9,1829,4,12,B,2,A,1
9,1511,Viang Chan,1826,-9,-9,1827,5,15,A,2,A,1


In [67]:
dfNonWarParB1.rename(columns={'SideB1':'PolityName', 'WarNum':'WarID'}, inplace=True)
dfNonWarParB2.rename(columns={'SideB2':'PolityName', 'WarNum':'WarID'}, inplace=True)
dfNonWarParB3.rename(columns={'SideB3':'PolityName', 'WarNum':'WarID'}, inplace=True)
dfNonWarParB4.rename(columns={'SideB4':'PolityName', 'WarNum':'WarID'}, inplace=True)
dfNonWarParB5.rename(columns={'SideB5':'PolityName', 'WarNum':'WarID'}, inplace=True)

combinedNonWarSideB = [dfNonWarParB1, dfNonWarParB2, dfNonWarParB3, dfNonWarParB4, dfNonWarParB5]
dfNonWarParB = pd.concat(combinedNonWarSideB)
dfNonWarParB['Side'] = 'B'
dfNonWarParB['IsInitiator'] = 1
dfNonWarParB['IsInitiator'] [dfNonWarParB['Initiator'] == 'A'] = 0
dfNonWarParB

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()


Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome,Side,IsInitiator
0,1500,Taranaki,1818,-9,-9,1824,-9,-9,A,1,B,0
1,1501,Bantu,1819,-9,-9,1828,9,24,A,1,B,0
2,1502,Assam,1819,-9,-9,1822,-9,-9,A,1,B,0
3,1503,Provinces,1820,1,8,1820,2,23,B,2,B,1
4,1505,Ngati Paoa,1821,9,-9,1823,-9,-9,A,1,B,0
5,1506,Kedah,1821,11,-9,1821,12,-9,A,1,B,0
6,1508,Muslim rebels,1825,-9,-9,1828,-9,-9,A,1,B,0
7,1509,Yaqui Indians,1825,10,25,1827,4,13,A,3,B,0
8,1510,Liberals,1826,-9,-9,1829,4,12,B,2,B,1
9,1511,Siam,1826,-9,-9,1827,5,15,A,2,B,0


In [71]:
# For non-state war, outcome = 1 if sideA wins; 2 if sideB wins. Need to change Outcome so 1 = win, 2 = loss
dfNonWarParB['Outcome'] [dfNonWarParB['Outcome'] == 1] = 9
dfNonWarParB['Outcome'] [dfNonWarParB['Outcome'] == 2] = 1
dfNonWarParB['Outcome'] [dfNonWarParB['Outcome'] == 9] = 2
dfNonWarParB

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.


Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome,Side,IsInitiator
0,1500,Taranaki,1818,-9,-9,1824,-9,-9,A,2,B,0
1,1501,Bantu,1819,-9,-9,1828,9,24,A,2,B,0
2,1502,Assam,1819,-9,-9,1822,-9,-9,A,2,B,0
3,1503,Provinces,1820,1,8,1820,2,23,B,1,B,1
4,1505,Ngati Paoa,1821,9,-9,1823,-9,-9,A,2,B,0
5,1506,Kedah,1821,11,-9,1821,12,-9,A,2,B,0
6,1508,Muslim rebels,1825,-9,-9,1828,-9,-9,A,2,B,0
7,1509,Yaqui Indians,1825,10,25,1827,4,13,A,3,B,0
8,1510,Liberals,1826,-9,-9,1829,4,12,B,1,B,1
9,1511,Siam,1826,-9,-9,1827,5,15,A,1,B,0


In [77]:
combinedNonWarPar = [dfNonWarParA, dfNonWarParB]
dfNonWarPar = pd.concat(combinedNonWarPar)
dfNonWarPar = dfNonWarPar.sort_values('WarID')
dfNonWarPar.reset_index(drop=True, inplace=True)
dfNonWarPar = dfNonWarPar.replace(-9, '')
dfNonWarPar

Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome,Side,IsInitiator
0,1500,Te Rauparaha's Ngati Toa,1818,,,1824,,,A,1,A,1
1,1500,Ngati Ira,1818,,,1824,,,A,2,B,0
2,1500,Waikato,1818,,,1824,,,A,2,B,0
3,1500,Ngai Tahu,1818,,,1824,,,A,2,B,0
4,1500,Taranaki,1818,,,1824,,,A,2,B,0
5,1500,Rangitikei,1818,,,1824,,,A,2,B,0
6,1501,Shaka Zulu,1819,,,1828,9.0,24.0,A,1,A,1
7,1501,Bantu,1819,,,1828,9.0,24.0,A,2,B,0
8,1502,Burma,1819,,,1822,,,A,1,A,1
9,1502,Assam,1819,,,1822,,,A,2,B,0


In [78]:
dfNonWarPar['PolityName'] = dfNonWarPar['PolityName'].str.strip()
dfNonWarPar = dfNonWarPar.merge(dfPolities[['PolityID', 'PolityName']], on='PolityName', how='left', suffixes=('', '_m'),)
dfNonWarPar

Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Initiator,Outcome,Side,IsInitiator,PolityID
0,1500,Te Rauparaha's Ngati Toa,1818,,,1824,,,A,1,A,1,10000.0
1,1500,Ngati Ira,1818,,,1824,,,A,2,B,0,10089.0
2,1500,Waikato,1818,,,1824,,,A,2,B,0,10087.0
3,1500,Ngai Tahu,1818,,,1824,,,A,2,B,0,10082.0
4,1500,Taranaki,1818,,,1824,,,A,2,B,0,10042.0
5,1500,Rangitikei,1818,,,1824,,,A,2,B,0,10091.0
6,1501,Shaka Zulu,1819,,,1828,9.0,24.0,A,1,A,1,10001.0
7,1501,Bantu,1819,,,1828,9.0,24.0,A,2,B,0,10043.0
8,1502,Burma,1819,,,1822,,,A,1,A,1,10002.0
9,1502,Assam,1819,,,1822,,,A,2,B,0,7572.0


In [79]:
dfNonWarPar = dfNonWarPar[['WarID', 'PolityID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Side', 'IsInitiator', 'Outcome']]
dfNonWarPar

Unnamed: 0,WarID,PolityID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,Side,IsInitiator,Outcome
0,1500,10000.0,1818,,,1824,,,A,1,1
1,1500,10089.0,1818,,,1824,,,B,0,2
2,1500,10087.0,1818,,,1824,,,B,0,2
3,1500,10082.0,1818,,,1824,,,B,0,2
4,1500,10042.0,1818,,,1824,,,B,0,2
5,1500,10091.0,1818,,,1824,,,B,0,2
6,1501,10001.0,1819,,,1828,9.0,24.0,A,1,1
7,1501,10043.0,1819,,,1828,9.0,24.0,B,0,2
8,1502,10002.0,1819,,,1822,,,A,1,1
9,1502,7572.0,1819,,,1822,,,B,0,2


# STOP
I have progressed up to this point. Following code still needs to be updated

### Extra-State War

In [50]:
dfExtraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode1', 'SideA', 'ccode2', 'SideB',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2 ', 'EndYear2', 'Initiator', 'Interven', 'TransFrom', 'Outcome',
       'TransTo', 'WhereFought', 'BatDeath', 'NonStateDeaths', 'Version'],
      dtype='object')

In [74]:
dfExtraStateWar

Unnamed: 0,WarNum,WarName,WarType,ccode1,SideA,ccode2,SideB,StartMonth1,StartDay1,StartYear1,...,EndYear2,Initiator,Interven,TransFrom,Outcome,TransTo,WhereFought,BatDeath,NonStateDeaths,Version
0,300,Allied Bombardment of Algiers,3,210,Netherlands,-8,-8,8,26,1816,...,-8,1,1,-8,1,-8,6,13,-8,4
1,300,Allied Bombardment of Algiers,3,200,United Kingdom,-8,Algeria,8,26,1816,...,-8,1,1,-8,1,-8,6,129,6000,4
2,301,Ottoman-Wahhabi,3,640,Ottoman Empire,-8,Saudi Wahhabis,9,-9,1816,...,-8,1,0,-8,1,-8,6,13500,14000,4
3,302,Liberation of Chile,2,230,Spain,-8,San Martin revolutionaries,1,9,1817,...,-8,0,0,-8,2,-8,1,1700,1140,4
4,303,First Bolivar Expedition,2,230,Spain,-8,New Granada,4,11,1817,...,-8,1,0,-8,2,-8,1,3000,2000,4
5,304,Mexican Independence,2,230,Spain,-8,Mina Expedition,8,15,1817,...,-8,0,0,-8,1,-8,1,1000,1000,4
6,305,British-Kandyan,2,200,United Kingdom,-8,Kandyan rebels,10,-9,1817,...,-8,0,0,-8,1,-8,7,1000,10000,4
7,306,British-Maratha,2,200,United Kingdom,-8,Marathas,11,6,1817,...,-8,0,0,-8,1,-8,7,2800,2000,4
8,307,Ottoman Conquest of Sudan,3,640,Ottoman Empire,-8,Sudan states,-9,-9,1820,...,-8,1,0,-8,1,-8,4,4000,2500,4
9,308,Second Bolivar Expedition,2,230,Spain,-8,New Granada,4,28,1821,...,-8,0,0,-8,2,-8,1,1000,500,4


In [51]:
dfExtraStateWar1A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome']]
dfExtraStateWar2A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2 ', 'EndYear2', 'Initiator', 'Outcome']]
dfExtraStateWar1B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome']]
dfExtraStateWar2B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2 ', 'EndYear2', 'Initiator', 'Outcome']]

In [52]:
dfExtraStateWar1A.rename(columns={'WarNum':'WarID', 'ccode1':'StateID', 'SideA':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfExtraStateWar2A.rename(columns={'WarNum':'WarID', 'ccode1':'StateID', 'SideA':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2 ':'EndDay', 'EndYear2':'EndYear'}, inplace=True)
dfExtraStateWar1B.rename(columns={'WarNum':'WarID', 'ccode2':'StateID', 'SideB':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfExtraStateWar2B.rename(columns={'WarNum':'WarID', 'ccode2':'StateID', 'SideB':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2 ':'EndDay', 'EndYear2':'EndYear'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [53]:
dfExtraStateWar1A = dfIntraStateWarDates1A[dfIntraStateWarDates1A.PolityName != '-8']
dfExtraStateWar1B = dfIntraStateWarDates1B[dfIntraStateWarDates1B.PolityName != '-8']

In [54]:
dfExtraStateWar1B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,-8.0,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822
1,501,-8.0,Damascus & Aleppo,6,-9,1820,7,21,1821
3,502,-8.0,Liberals,7,2,1820,3,23,1821
4,503,-8.0,Royalists,12,1,1821,4,6,1823
6,505,-8.0,Carbonari,3,10,1821,5,8,1821
7,506,-8.0,Greeks,3,25,1821,4,25,1828
8,506,200.0,United Kingdom,10,20,1827,10,27,1827
9,506,220.0,France,10,20,1827,10,27,1827
10,506,365.0,Russia,10,20,1827,4,25,1828
11,507,-8.0,Mehdi army,3,20,1824,4,-9,1824


In [55]:
dfExtraStateWar2A = dfExtraStateWar2A.replace(-8, '')
dfExtraStateWar2A['datesconcat'] = dfExtraStateWar2A['StartMonth'].map(str) + dfExtraStateWar2A['StartDay'].map(str) + dfExtraStateWar2A['StartYear'].map(str) + dfExtraStateWar2A['EndMonth'].map(str) + dfExtraStateWar2A['EndDay'].map(str) + dfExtraStateWar2A['EndYear'].map(str)
nmissdate2A = dfExtraStateWar2A.loc[0, 'datesconcat']
dfExtraStateWar2A = dfExtraStateWar2A[dfExtraStateWar2A.datesconcat != nmissdate2A]
dfExtraStateWar2A.drop(columns=['datesconcat'], inplace=True)
dfExtraStateWar2A = dfExtraStateWar2A[dfExtraStateWar2A.PolityName != '-8']
dfExtraStateWar2A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
37,334,210,Netherlands,4,2,1849,6,14,1849
79,379,200,United Kingdom,9,3,1879,9,2,1880
153,454,200,United Kingdom,8,-9,1937,1,-9,1939


In [56]:
dfExtraStateWar2B = dfExtraStateWar2B.replace(-8, '')
dfExtraStateWar2B['datesconcat'] = dfExtraStateWar2B['StartMonth'].map(str) + dfExtraStateWar2B['StartDay'].map(str) + dfExtraStateWar2B['StartYear'].map(str) + dfExtraStateWar2B['EndMonth'].map(str) + dfExtraStateWar2B['EndDay'].map(str) + dfExtraStateWar2B['EndYear'].map(str)
nmissdate2B = dfExtraStateWar2B.loc[0, 'datesconcat']
dfExtraStateWar2B = dfExtraStateWar2B[dfExtraStateWar2B.datesconcat != nmissdate2B]
dfExtraStateWar2B.drop(columns=['datesconcat'], inplace=True)
dfExtraStateWar2B = dfExtraStateWar2B[dfExtraStateWar2B.PolityName != '-8']
dfExtraStateWar2B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
37,334,,Bali,4,2,1849,6,14,1849
79,379,,Afghanistan,9,3,1879,9,2,1880
153,454,,Palestinians,8,-9,1937,1,-9,1939


In [57]:
combinedExtraStateWarDates = [dfExtraStateWar1A, dfExtraStateWar2A, dfExtraStateWar1B, dfExtraStateWar2B]
dfExtraStateWarDates = pd.concat(combinedExtraStateWarDates)
dfExtraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,-9,-9,1822
1,501,-8,Sidon,6,-9,1820,7,21,1821
2,502,300,Austria,3,-9,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,-8,Egypt,3,20,1824,4,-9,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [58]:
dfExtraStateWarDates = dfExtraStateWarDates.replace(-9, '')
dfExtraStateWarDates = dfExtraStateWarDates.replace(-8, '')
dfExtraStateWarDates = dfExtraStateWarDates.replace(-7, '')
dfExtraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,,,1822
1,501,,Sidon,6,,1820,7,21,1821
2,502,300,Austria,3,,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,,Egypt,3,20,1824,4,,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [59]:
dfExtraStateWarDates = dfExtraStateWarDates[['WarID', 'PolityName', 'StateID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfExtraStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,500,Russia,365,1818,6,10,1822,,
1,501,Sidon,,1820,6,,1821,7,21
2,502,Austria,300,1821,3,,1821,3,23
3,502,Two Sicilies,329,1820,7,2,1821,3,23
4,503,Spain,230,1821,12,1,1823,4,6
5,505,Austria,300,1821,3,10,1821,5,8
6,505,Sardinia,325,1821,3,10,1821,5,8
7,506,Ottoman Empire,640,1821,3,25,1828,4,25
11,507,Egypt,,1824,3,20,1824,4,
12,508,Ottoman Empire,640,1826,6,14,1826,9,30


In [60]:
dfNonStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1500,Te Rauparaha's Ngati Toa,,1818,,,1824,,
1,1501,Shaka Zulu,,1819,,,1828,9,24
2,1502,Burma,,1819,,,1822,,
3,1503,Buenos Aires,,1820,1,8,1820,2,23
4,1505,Hongi Hika's Nga Phuhi,,1821,9,,1823,,
5,1506,Thailand,,1821,11,,1821,12,
6,1508,China,,1825,,,1828,,
7,1509,Mexico,,1825,10,25,1827,4,13
8,1510,Conservative Confederation,,1826,,,1829,4,12
9,1511,Viang Chan,,1826,,,1827,5,15


In [61]:
combinedWarDates = [dfInterStateWarDates, dfIntraStateWarDates, dfNonStateWarDates, dfExtraStateWarDates]
dfWarDates = pd.concat(combinedWarDates)
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823,4,7,1823,11,13
1,1,France,220,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14
3,4,Russia,365,1828,4,26,1829,9,14
4,7,Mexico,70,1846,4,25,1847,9,14
5,7,United States of America,2,1846,4,25,1847,9,14
6,10,Tuscany,337,1848,3,29,1848,8,9
7,10,Italy,325,1848,3,24,1848,8,9
8,10,Austria,300,1848,3,24,1848,8,9
9,10,Modena,332,1848,4,9,1848,8,9


In [62]:
dfWarDates = dfWarDates.replace(-9, '')
dfWarDates = dfWarDates.replace(-8, '')
dfWarDates = dfWarDates.replace(-7, '')

In [63]:
dfWarDates['StartMonthClean'] = dfWarDates['StartMonth']
dfWarDates['StartDayClean'] = dfWarDates['StartDay']
dfWarDates['EndMonthClean'] = dfWarDates['EndMonth']
dfWarDates['EndDayClean'] = dfWarDates['EndDay']
dfWarDates['StartMonthClean'] [dfWarDates['StartMonthClean'] == ''] = 1
dfWarDates['StartDayClean'] [dfWarDates['StartDayClean'] == ''] = 1
dfWarDates['EndMonthClean'] [dfWarDates['EndMonthClean'] == ''] = 1
dfWarDates['EndDayClean'] [dfWarDates['EndDayClean'] == ''] = 1
dfWarDates['EndYear'] [dfWarDates['EndYear'] == ''] = 2100 # placeholder for blank endyears, to be made null later
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9


In [64]:
dfWarDates['StartYear'] = dfWarDates['StartYear'].astype(int)
dfWarDates['StartMonthClean'] = dfWarDates['StartMonthClean'].astype(int)
dfWarDates['StartDayClean'] = dfWarDates['StartDayClean'].astype(int)
dfWarDates['EndYear'] = dfWarDates['EndYear'].astype(int)
dfWarDates['EndMonthClean'] = dfWarDates['EndMonthClean'].astype(int)
dfWarDates['EndDayClean'] = dfWarDates['EndDayClean'].astype(int)

In [65]:
dfWarDates['StartDate'] = pd.to_datetime(dict(year=dfWarDates.StartYear, month=dfWarDates.StartMonthClean, day=dfWarDates.StartDayClean))
dfWarDates['EndDate'] = pd.to_datetime(dict(year=dfWarDates.EndYear, month=dfWarDates.EndMonthClean, day=dfWarDates.EndDayClean))
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean,StartDate,EndDate
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9,1848-03-29,1848-08-09
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9,1848-04-09,1848-08-09


In [66]:
dfWarDates['StartDate'] = dfWarDates['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfWarDates['EndDate'] = dfWarDates['EndDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfWarDates['EndDate'] [dfWarDates['EndYear'] == 2100] = ''
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean,StartDate,EndDate
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9,1848-03-29,1848-08-09
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9,1848-04-09,1848-08-09


In [67]:
dfWarDates = dfWarDates[['WarID', 'PolityName', 'StateID', 'StartDate', 'EndDate', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823-04-07,1823-11-13,1823,4,7,1823,11,13
1,1,France,220,1823-04-07,1823-11-13,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828-04-26,1829-09-14,1828,4,26,1829,9,14
3,4,Russia,365,1828-04-26,1829-09-14,1828,4,26,1829,9,14
4,7,Mexico,70,1846-04-25,1847-09-14,1846,4,25,1847,9,14
5,7,United States of America,2,1846-04-25,1847-09-14,1846,4,25,1847,9,14
6,10,Tuscany,337,1848-03-29,1848-08-09,1848,3,29,1848,8,9
7,10,Italy,325,1848-03-24,1848-08-09,1848,3,24,1848,8,9
8,10,Austria,300,1848-03-24,1848-08-09,1848,3,24,1848,8,9
9,10,Modena,332,1848-04-09,1848-08-09,1848,4,9,1848,8,9


In [68]:
dfWarDates.drop_duplicates(inplace=True)
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,WarID,PolityName,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823-04-07,1823-11-13,1823,4,7,1823,11,13
1,1,France,220,1823-04-07,1823-11-13,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828-04-26,1829-09-14,1828,4,26,1829,9,14
3,4,Russia,365,1828-04-26,1829-09-14,1828,4,26,1829,9,14
4,7,Mexico,70,1846-04-25,1847-09-14,1846,4,25,1847,9,14
5,7,United States of America,2,1846-04-25,1847-09-14,1846,4,25,1847,9,14
6,10,Tuscany,337,1848-03-29,1848-08-09,1848,3,29,1848,8,9
7,10,Italy,325,1848-03-24,1848-08-09,1848,3,24,1848,8,9
8,10,Austria,300,1848-03-24,1848-08-09,1848,3,24,1848,8,9
9,10,Modena,332,1848-04-09,1848-08-09,1848,4,9,1848,8,9


In [69]:
dfWarDates.to_csv('../FinalData/war_dates.csv', encoding='utf-8', index=False)

## Create test WAR_PARTICIPANTS table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- PolityName
- StateID
- Side
- Deaths
- IsInitiator
- Outcome

# NOTE: need to re-work participant and dates into one table

In [71]:
dfInterStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode', 'StateName', 'Side',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought', 'Initiator',
       'Outcome', 'TransTo', 'BatDeath', 'Version'],
      dtype='object')

In [73]:
dfInterWarPar = dfInterStateWar [['WarNum', 'ccode', 'StateName', 'Side', 'Initiator', 'Outcome', 'BatDeath']]
dfInterWarPar

Unnamed: 0,WarNum,ccode,StateName,Side,Initiator,Outcome,BatDeath
0,1,230,Spain,2,2,2,600
1,1,220,France,1,1,1,400
2,4,640,Ottoman Empire,2,2,2,80000
3,4,365,Russia,1,1,1,50000
4,7,70,Mexico,2,2,2,6000
5,7,2,United States of America,1,1,1,13283
6,10,337,Tuscany,2,2,2,100
7,10,325,Italy,2,1,2,3400
8,10,300,Austria,1,2,1,3927
9,10,332,Modena,2,2,2,100


In [75]:
dfInterWarPar.rename(columns={'WarNum':'WarID', 'ccode':'StateID', 'StateName':'PolityName', 'Initiator':'IsInitiator', 'BatDeath':'Deaths'}, inplace=True)
dfInterWarPar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,WarID,StateID,PolityName,Side,IsInitiator,Outcome,Deaths
0,1,230,Spain,2,2,2,600
1,1,220,France,1,1,1,400
2,4,640,Ottoman Empire,2,2,2,80000
3,4,365,Russia,1,1,1,50000
4,7,70,Mexico,2,2,2,6000
5,7,2,United States of America,1,1,1,13283
6,10,337,Tuscany,2,2,2,100
7,10,325,Italy,2,1,2,3400
8,10,300,Austria,1,2,1,3927
9,10,332,Modena,2,2,2,100


In [77]:
dfInterWarPar['IsInitiator'] [dfInterWarPar['IsInitiator'] == 2] = 0
dfInterWarPar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)


Unnamed: 0,WarID,StateID,PolityName,Side,IsInitiator,Outcome,Deaths
0,1,230,Spain,2,0,2,600
1,1,220,France,1,1,1,400
2,4,640,Ottoman Empire,2,0,2,80000
3,4,365,Russia,1,1,1,50000
4,7,70,Mexico,2,0,2,6000
5,7,2,United States of America,1,1,1,13283
6,10,337,Tuscany,2,0,2,100
7,10,325,Italy,2,1,2,3400
8,10,300,Austria,1,0,1,3927
9,10,332,Modena,2,0,2,100


In [78]:
dfInterWarPar = dfInterWarPar.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

In [79]:
dfInterWarPar = dfInterWarPar[['WarID', 'PolityName', 'StateID', 'Side', 'Deaths', 'IsInitiator', 'Outcome']]
dfInterWarPar

Unnamed: 0,WarID,PolityName,StateID,Side,Deaths,IsInitiator,Outcome
0,1,Spain,230,2,600,0,2
1,1,France,220,1,400,1,1
2,4,Ottoman Empire,640,2,80000,0,2
3,4,Russia,365,1,50000,1,1
4,7,Mexico,70,2,6000,0,2
5,7,United States of America,2,1,13283,1,1
6,10,Tuscany,337,2,100,0,2
7,10,Italy,325,2,3400,1,2
8,10,Austria,300,1,3927,0,1
9,10,Modena,332,2,100,0,2


In [80]:
dfInterWarPar.to_csv('../FinalData/inter_war_participants.csv', encoding='utf-8', index=False)

## Create test WAR_LOCATION table

Task: Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- Region

In [84]:
dfNonStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'WhereFought', 'SideA1', 'SideA2',
       'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear',
       'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator',
       'TransFrom', 'TransTo', 'Outcome', 'SideADeaths', 'SideBDeaths',
       'TotalCombatDeaths', 'Version'],
      dtype='object')

In [85]:
dfInterWarLocs = dfInterStateWar [['WarNum', 'WhereFought']]
dfIntraWarLocs = dfIntraStateWar [['WarNum', 'WhereFought']]
dfExtraWarLocs = dfExtraStateWar [['WarNum', 'WhereFought']]
dfNonWarLocs = dfNonStateWar [['WarNum', 'WhereFought']]

AllWarLocs = [dfInterWarLocs, dfIntraWarLocs, dfExtraWarLocs, dfNonWarLocs]
dfWarLocs = pd.concat(AllWarLocs)
dfWarLocs

Unnamed: 0,WarNum,WhereFought
0,1,2
1,1,2
2,4,11
3,4,11
4,7,1
5,7,1
6,10,2
7,10,2
8,10,2
9,10,2


In [87]:
dfWarLocs.drop_duplicates(inplace=True)
dfWarLocs

Unnamed: 0,WarNum,WhereFought
0,1,2
2,4,11
4,7,1
6,10,2
10,13,2
12,16,2
16,19,1
18,22,2
23,25,6
25,28,2


In [90]:
dfWarLocs['WhereFought'].value_counts()

7     185
4     132
1     122
6     112
2      97
11      5
9       5
15      2
14      2
19      1
18      1
17      1
16      1
13      1
12      1
Name: WhereFought, dtype: int64

In [91]:
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 1] = 'W. Hemisphere'
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 2] = 'Europe'
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 4] = 'Africa'
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 6] = 'Middle East'
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 7] = 'Asia'
dfWarLocs['WhereFought'] [dfWarLocs['WhereFought'] == 9] = 'Oceania'
dfWarLocs

Unnamed: 0,WarNum,WhereFought
0,1,Europe
2,4,11
4,7,W. Hemisphere
6,10,Europe
10,13,Europe
12,16,Europe
16,19,W. Hemisphere
18,22,Europe
23,25,Middle East
25,28,Europe


In [95]:
dfWarLocs.rename(columns={'WarNum':'WarID', 'WhereFought':'Region'}, inplace=True)

In [96]:
dfWarLocs.to_csv('../FinalData/test_war_locations.csv', encoding='utf-8', index=False)