# Start of Data Transformation

Task: use Pandas to transform csv files into DataFrames that match desired tables for database schema

Tables:

- STATE (done)
- STATE_DATES (done)
- STATE_CONTIGUITY (done)
- TERRITORY (done)
- TERRITORIALCHANGE (done)
- STATE_RESOURCE
- IGO
- IGO_MEMBERSHIP
- STATE_ALLIANCE
- ALLIANCE_MEMBERSHIP
- ALLIANCE_TRAITS
- WAR
- WAR_LOCATION
- WAR_DATES
- WAR_PARTICIPANTS
- WAR_TRANSITIONS

In [1]:
import pandas as pd
import numpy as np

In [2]:
!ls SourceData/CorrelatesOfWar/

[34mCodebooks[m[m                    MID_Narratives_2002-2010.pdf
CowWarList.csv               NMC_5_0-wsupplementary.csv
CowWarList.pdf               Non-StateWarData_v4.0.csv
[31mEntities.pdf[m[m                 Territories.csv
Extra-StateWarData_v4.0.csv  alliance_v4.1_by_member.csv
IGO_stateunit_v2.3.csv       contdir.csv
Inter-StateWarData_v4.0.csv  igounit_v2.3.csv
Intra-StateWarData_v4.1.csv  majors2016.csv
[31mMIDA_4.2.csv[m[m                 states2016.csv
[31mMIDB_4.2.csv[m[m                 system2016.csv
[31mMIDLOCA_2.0.csv[m[m              tc2014.csv
MID_Narratives_1993-2001.pdf


## Create 'STATE' table

Task: transform states2016.csv into a table with attributes:

- StateID
- StateAbbr
- StateName

in which each StateID occurs once (as it is the Primary Key)

In [3]:
dfStates = pd.read_csv('SourceData/CorrelatesOfWar/states2016.csv')
dfStates

Unnamed: 0,stateabb,ccode,statenme,styear,stmonth,stday,endyear,endmonth,endday,version
0,USA,2,United States of America,1816,1,1,2016,12,31,2016
1,CAN,20,Canada,1920,1,10,2016,12,31,2016
2,BHM,31,Bahamas,1973,7,10,2016,12,31,2016
3,CUB,40,Cuba,1902,5,20,1906,9,25,2016
4,CUB,40,Cuba,1909,1,23,2016,12,31,2016
5,HAI,41,Haiti,1859,1,1,1915,7,28,2016
6,HAI,41,Haiti,1934,8,15,2016,12,31,2016
7,DOM,42,Dominican Republic,1894,1,1,1916,11,29,2016
8,DOM,42,Dominican Republic,1924,9,29,2016,12,31,2016
9,JAM,51,Jamaica,1962,8,6,2016,12,31,2016


In [4]:
dfStates.drop(columns=['styear', 'stmonth', 'stday', 'endyear', 'endmonth', 'endday', 'version'], inplace=True)
dfStates.drop_duplicates(inplace=True)
dfStates.rename(columns={"stateabb": "StateAbbr", "ccode":"StateID", "statenme":"StateName"}, inplace=True)
dfStates =  dfStates[['StateID', 'StateAbbr', 'StateName']]
dfStates

Unnamed: 0,StateID,StateAbbr,StateName
0,2,USA,United States of America
1,20,CAN,Canada
2,31,BHM,Bahamas
3,40,CUB,Cuba
5,41,HAI,Haiti
7,42,DOM,Dominican Republic
9,51,JAM,Jamaica
10,52,TRI,Trinidad and Tobago
11,53,BAR,Barbados
12,54,DMA,Dominica


In [5]:
badstatenames = dfStates['StateName'].str.contains('\&', regex=True)
sum(badstatenames)

1

In [6]:
dfStates['StateName'] = dfStates['StateName'].str.replace('\&', 'and')
dfStates

Unnamed: 0,StateID,StateAbbr,StateName
0,2,USA,United States of America
1,20,CAN,Canada
2,31,BHM,Bahamas
3,40,CUB,Cuba
5,41,HAI,Haiti
7,42,DOM,Dominican Republic
9,51,JAM,Jamaica
10,52,TRI,Trinidad and Tobago
11,53,BAR,Barbados
12,54,DMA,Dominica


In [7]:
dfStates.to_csv('FinalData/state.csv', encoding='utf-8', index=False)

In [8]:
StateNameMaxLength = int(dfStates['StateName'].str.encode(encoding='utf-8').str.len().max())
print(StateNameMaxLength)
StateAbbrLength = int(dfStates['StateAbbr'].str.encode(encoding='utf-8').str.len().max())
print(StateAbbrLength)

32
3


## Create 'STATE_DATES' table

Task: transform states2016.csv into a table with attributes:

- StateID
- StartDate
- EndDate
- StartYear
- StartMonth
- StartDay
- EndYear
- EndMonth
- EndDay

in which each combination of StateID and StartDate occurs only once.

Note: StartDate and EndDate must be in the format 'YYYY-MM-DD'

In [9]:
dfStateDates = pd.read_csv('SourceData/CorrelatesOfWar/states2016.csv')
dfStateDates

Unnamed: 0,stateabb,ccode,statenme,styear,stmonth,stday,endyear,endmonth,endday,version
0,USA,2,United States of America,1816,1,1,2016,12,31,2016
1,CAN,20,Canada,1920,1,10,2016,12,31,2016
2,BHM,31,Bahamas,1973,7,10,2016,12,31,2016
3,CUB,40,Cuba,1902,5,20,1906,9,25,2016
4,CUB,40,Cuba,1909,1,23,2016,12,31,2016
5,HAI,41,Haiti,1859,1,1,1915,7,28,2016
6,HAI,41,Haiti,1934,8,15,2016,12,31,2016
7,DOM,42,Dominican Republic,1894,1,1,1916,11,29,2016
8,DOM,42,Dominican Republic,1924,9,29,2016,12,31,2016
9,JAM,51,Jamaica,1962,8,6,2016,12,31,2016


In [10]:
dfStateDates.drop(columns=['stateabb', 'statenme', 'version'], inplace=True)
dfStateDates.rename(columns={"ccode":"StateID", "styear": "StartYear", "stmonth":"StartMonth", "stday":"StartDay", "endyear": "EndYear", "endmonth":"EndMonth", "endday":"EndDay"}, inplace=True)
dfStateDates['StartDate'] = pd.to_datetime(dict(year=dfStateDates.StartYear, month=dfStateDates.StartMonth, day=dfStateDates.StartDay))
dfStateDates['EndDate'] = pd.to_datetime(dict(year=dfStateDates.EndYear, month=dfStateDates.EndMonth, day=dfStateDates.EndDay))
dfStateDates = dfStateDates[['StateID', 'StartDate', 'EndDate', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfStateDates

Unnamed: 0,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,2,1816-01-01,2016-12-31,1816,1,1,2016,12,31
1,20,1920-01-10,2016-12-31,1920,1,10,2016,12,31
2,31,1973-07-10,2016-12-31,1973,7,10,2016,12,31
3,40,1902-05-20,1906-09-25,1902,5,20,1906,9,25
4,40,1909-01-23,2016-12-31,1909,1,23,2016,12,31
5,41,1859-01-01,1915-07-28,1859,1,1,1915,7,28
6,41,1934-08-15,2016-12-31,1934,8,15,2016,12,31
7,42,1894-01-01,1916-11-29,1894,1,1,1916,11,29
8,42,1924-09-29,2016-12-31,1924,9,29,2016,12,31
9,51,1962-08-06,2016-12-31,1962,8,6,2016,12,31


In [11]:
dfStateDates['StartDate'] = dfStateDates['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfStateDates['EndDate'] = dfStateDates['EndDate'].apply(lambda x: x.strftime('%Y-%m-%d'))

presentdate = dfStateDates.loc[0,'EndDate']
dfStateDates['EndDate'] = dfStateDates['EndDate'].replace(presentdate, '')
dfStateDates

Unnamed: 0,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,2,1816-01-01,,1816,1,1,2016,12,31
1,20,1920-01-10,,1920,1,10,2016,12,31
2,31,1973-07-10,,1973,7,10,2016,12,31
3,40,1902-05-20,1906-09-25,1902,5,20,1906,9,25
4,40,1909-01-23,,1909,1,23,2016,12,31
5,41,1859-01-01,1915-07-28,1859,1,1,1915,7,28
6,41,1934-08-15,,1934,8,15,2016,12,31
7,42,1894-01-01,1916-11-29,1894,1,1,1916,11,29
8,42,1924-09-29,,1924,9,29,2016,12,31
9,51,1962-08-06,,1962,8,6,2016,12,31


In [12]:
dfStateDates.dtypes

StateID        int64
StartDate     object
EndDate       object
StartYear      int64
StartMonth     int64
StartDay       int64
EndYear        int64
EndMonth       int64
EndDay         int64
dtype: object

In [13]:
dfStateDates.to_csv('FinalData/state_dates.csv', encoding='utf-8', index=False)

## Create 'STATE_CONTIGUITY' table

Task: transform contdir.csv into a table with attributes:

- StateA
- StateB
- StartDate
- EndDate
- StartYear
- StartMonth
- EndYear
- EndMonth
- Type
- Notes

In [14]:
dfStateCont = pd.read_csv('SourceData/CorrelatesOfWar/contdir.csv')
dfStateCont

Unnamed: 0,dyad,statelno,statelab,statehno,statehab,conttype,begin,end,notes,version
0,2020,2,USA,20,CAN,1,192001,201612,Begins with CAN system entry,3.2
1,2031,2,USA,31,BHM,4,197307,201612,Across Atlantic Ocean (closest=Florida-Bimini)...,3.2
2,2040,2,USA,40,CUB,4,190205,190609,Across Florida Straits (closest=Key West); beg...,3.2
3,2040,2,USA,40,CUB,4,190901,201612,Across Florida Straits (closest=Key West); res...,3.2
4,2070,2,USA,70,MEX,1,183101,201612,Begins with MEX system entry,3.2
5,2365,2,USA,365,RUS,2,195901,201612,Across Bering Strait (closest=Alaska-Siberia);...,3.2
6,31040,31,BHM,40,CUB,4,197307,201612,Across Atlantic Ocean (closest=Great Inagua); ...,3.2
7,31041,31,BHM,41,HAI,4,197307,201612,Across Atlantic Ocean (closest=Great Inagua); ...,3.2
8,31042,31,BHM,42,DOM,4,197307,201612,Across Atlantic Ocean (closest=Great Inagua); ...,3.2
9,31051,31,BHM,51,JAM,5,197307,201612,Across Windward Passage (closest=Great Inagua)...,3.2


In [15]:
dfStateCont.drop(columns=['dyad', 'statelab', 'statehab', 'version'], inplace=True)
dfStateCont.rename(columns={"statelno":"StateA", "statehno": "StateB", "conttype":"Type", "notes":"Notes"}, inplace=True)
dfStateCont['StartYear'] = dfStateCont['begin'].astype(str).str[0:4]
dfStateCont['StartMonth'] = dfStateCont['begin'].astype(str).str[4:6]
dfStateCont['EndYear'] = dfStateCont['end'].astype(str).str[0:4]
dfStateCont['EndMonth'] = dfStateCont['end'].astype(str).str[4:6]
dfStateCont.drop(columns=['begin', 'end'], inplace=True)
dfStateCont['StartDate'] = pd.to_datetime(dict(year=dfStateCont.StartYear, month=dfStateCont.StartMonth, day='01'))
dfStateCont['EndDate'] = pd.to_datetime(dict(year=dfStateCont.EndYear, month=dfStateCont.EndMonth, day='01'))
dfStateCont = dfStateCont[['StateA', 'StateB', 'StartDate', 'EndDate', 'StartYear', 'StartMonth', 'EndYear', 'EndMonth', 'Type', 'Notes']]
dfStateCont

Unnamed: 0,StateA,StateB,StartDate,EndDate,StartYear,StartMonth,EndYear,EndMonth,Type,Notes
0,2,20,1920-01-01,2016-12-01,1920,01,2016,12,1,Begins with CAN system entry
1,2,31,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Florida-Bimini)...
2,2,40,1902-05-01,1906-09-01,1902,05,1906,09,4,Across Florida Straits (closest=Key West); beg...
3,2,40,1909-01-01,2016-12-01,1909,01,2016,12,4,Across Florida Straits (closest=Key West); res...
4,2,70,1831-01-01,2016-12-01,1831,01,2016,12,1,Begins with MEX system entry
5,2,365,1959-01-01,2016-12-01,1959,01,2016,12,2,Across Bering Strait (closest=Alaska-Siberia);...
6,31,40,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
7,31,41,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
8,31,42,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
9,31,51,1973-07-01,2016-12-01,1973,07,2016,12,5,Across Windward Passage (closest=Great Inagua)...


In [16]:
NoteMaxLength = int(dfStateCont['Notes'].str.encode(encoding='utf-8').str.len().max())
print(NoteMaxLength)
dfStateCont.Type.unique()

163


array([1, 4, 2, 5, 3])

In [17]:
dfStateCont['StartDate'] = dfStateCont['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfStateCont['EndDate'] = dfStateCont['EndDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfStateCont.dtypes

StateA         int64
StateB         int64
StartDate     object
EndDate       object
StartYear     object
StartMonth    object
EndYear       object
EndMonth      object
Type           int64
Notes         object
dtype: object

In [18]:
badnotes = dfStateCont['Notes'].str.contains('\&', regex=True)
sum(badnotes)

8

In [19]:
dfStateCont['Notes'] = dfStateCont['Notes'].str.replace('\&', 'and')
dfStateCont

Unnamed: 0,StateA,StateB,StartDate,EndDate,StartYear,StartMonth,EndYear,EndMonth,Type,Notes
0,2,20,1920-01-01,2016-12-01,1920,01,2016,12,1,Begins with CAN system entry
1,2,31,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Florida-Bimini)...
2,2,40,1902-05-01,1906-09-01,1902,05,1906,09,4,Across Florida Straits (closest=Key West); beg...
3,2,40,1909-01-01,2016-12-01,1909,01,2016,12,4,Across Florida Straits (closest=Key West); res...
4,2,70,1831-01-01,2016-12-01,1831,01,2016,12,1,Begins with MEX system entry
5,2,365,1959-01-01,2016-12-01,1959,01,2016,12,2,Across Bering Strait (closest=Alaska-Siberia);...
6,31,40,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
7,31,41,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
8,31,42,1973-07-01,2016-12-01,1973,07,2016,12,4,Across Atlantic Ocean (closest=Great Inagua); ...
9,31,51,1973-07-01,2016-12-01,1973,07,2016,12,5,Across Windward Passage (closest=Great Inagua)...


In [20]:
dfStateCont.to_csv('FinalData/state_contiguity.csv', encoding='utf-8', index=False)

## Create 'TERRITORIALCHANGE' table

Task: transform tc2014.csv into a table with attributes:

- TerritorialChangeID
- Gainer
- Loser
- TransferDate
- Year
- Month
- Procedure
- TerritoryID
- TerritoryArea
- TerritoryPopulation
- IsWholeTerritory
- IsMilConflict
- IsIndependence
- GainerIsCont
- LoserIsCont
- IsGainerHomeland
- IsLoserHomeland
- IsSystemEntry
- IsSystemExit

In [21]:
dfTerrChange = pd.read_csv('SourceData/CorrelatesOfWar/tc2014.csv')
dfTerrChange

Unnamed: 0,year,month,gainer,gaintype,procedur,entity,contgain,area,pop,portion,loser,losetype,contlose,entry,exit,number,indep,conflict,version
0,1816,7,160,1,-9,160,-9,2093164.00,1970000,1,230,0,0,1,0,3,1,0,5
1,1816,3,200,0,3,790,0,1.00,.,0,790,1,1,0,0,4,0,1,5
2,1816,.,200,0,3,420,0,179.00,.,0,-9,1,-9,0,0,5,0,0,5
3,1817,.,220,0,3,433,0,7819.00,100000,1,200,0,0,0,0,28,0,0,5
4,1817,.,365,1,1,365,1,650.00,.,0,-9,1,1,0,0,29,0,1,5
5,1818,10,2,1,3,20,1,84240.00,.,0,200,0,0,0,0,30,0,0,5
6,1818,12,155,1,-9,155,-9,464568.00,1656300,1,230,0,0,1,0,31,1,1,5
7,1818,10,200,0,3,2,0,41600.00,.,0,2,1,1,0,0,32,0,0,5
8,1818,6,200,0,1,750,0,421200.00,.,0,-9,1,-9,0,0,33,0,1,5
9,1818,.,200,0,2,438,0,16.00,.,0,-9,1,-9,0,0,34,0,0,5


In [22]:
dfTerrChange.rename(columns={"year":"Year", "month": "Month", "gainer":"Gainer", "gaintype":"IsGainerHomeland", "procedur":"Procedure", "entity":"TerritoryID", "contgain":"GainerIsCont", "area":"TerritoryArea", "pop":"TerritoryPopulation", "portion":"IsWholeTerritory", "loser":"Loser", "losetype":"IsLoserHomeland", "contlose": "LoserIsCont", "entry":"IsSystemEntry", "exit":"IsSystemExit", "number":"TerritorialChangeID", "indep":"IsIndependence", "conflict":"IsMilConflict"}, inplace=True)
dfTerrChange.drop(columns=['version'], inplace=True)
missingmonth = (dfTerrChange['Month'] == '.')
dfTerrChange['MonthClean'] = dfTerrChange['Month']
dfTerrChange['MonthClean'] [dfTerrChange['MonthClean'] == '.'] = 1 # boolean mask
dfTerrChange['TransferDate'] = pd.to_datetime(dict(year=dfTerrChange.Year, month=dfTerrChange.MonthClean, day='01'))
dfTerrChange = dfTerrChange[['TerritorialChangeID', 'Gainer', 'Loser', 'TransferDate', 'Year', 'Month', 'Procedure', 'TerritoryID', 'TerritoryArea', 'TerritoryPopulation', 'IsWholeTerritory', 'IsMilConflict', 'IsIndependence', 'GainerIsCont', 'LoserIsCont', 'IsGainerHomeland', 'IsLoserHomeland', 'IsSystemEntry', 'IsSystemExit']]
dfTerrChange

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """


Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
0,3,160,230,1816-07-01,1816,7,-9,160,2093164.00,1970000,1,0,1,-9,0,1,0,1,0
1,4,200,790,1816-03-01,1816,3,3,790,1.00,.,0,1,0,0,1,0,1,0,0
2,5,200,-9,1816-01-01,1816,.,3,420,179.00,.,0,0,0,0,-9,0,1,0,0
3,28,220,200,1817-01-01,1817,.,3,433,7819.00,100000,1,0,0,0,0,0,0,0,0
4,29,365,-9,1817-01-01,1817,.,1,365,650.00,.,0,1,0,1,1,1,1,0,0
5,30,2,200,1818-10-01,1818,10,3,20,84240.00,.,0,0,0,1,0,1,0,0,0
6,31,155,230,1818-12-01,1818,12,-9,155,464568.00,1656300,1,1,1,-9,0,1,0,1,0
7,32,200,2,1818-10-01,1818,10,3,2,41600.00,.,0,0,0,0,1,0,1,0,0
8,33,200,-9,1818-06-01,1818,6,1,750,421200.00,.,0,1,0,0,-9,0,1,0,0
9,34,200,-9,1818-01-01,1818,.,2,438,16.00,.,0,0,0,0,-9,0,1,0,0


In [23]:
dfTerrChange = dfTerrChange.replace(-9, '')
dfTerrChange['Loser'] [dfTerrChange['Loser'] == 0 ] = ''
dfTerrChange['Loser'] [dfTerrChange['Loser'] == 7693 ] = ''
dfTerrChange['Loser'] [dfTerrChange['Loser'] == 2292 ] = ''
dfTerrChange['Loser'] [dfTerrChange['Loser'] == 7507 ] = ''
dfTerrChange['Month'] = dfTerrChange['Month'].replace('.', '')
dfTerrChange['TerritoryPopulation'] = dfTerrChange['TerritoryPopulation'].replace('.', '')
dfTerrChange['TerritoryArea'] = dfTerrChange['TerritoryArea'].replace('.', '')
dfTerrChange['TerritoryID'] = dfTerrChange['TerritoryID'].replace('.', '')
dfTerrChange['TransferDate'] = dfTerrChange['TransferDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfTerrChange

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """


Unnamed: 0,TerritorialChangeID,Gainer,Loser,TransferDate,Year,Month,Procedure,TerritoryID,TerritoryArea,TerritoryPopulation,IsWholeTerritory,IsMilConflict,IsIndependence,GainerIsCont,LoserIsCont,IsGainerHomeland,IsLoserHomeland,IsSystemEntry,IsSystemExit
0,3,160,230,1816-07-01,1816,7,,160,2093164.00,1970000,1,0,1,,0,1,0,1,0
1,4,200,790,1816-03-01,1816,3,3,790,1.00,,0,1,0,0,1,0,1,0,0
2,5,200,,1816-01-01,1816,,3,420,179.00,,0,0,0,0,,0,1,0,0
3,28,220,200,1817-01-01,1817,,3,433,7819.00,100000,1,0,0,0,0,0,0,0,0
4,29,365,,1817-01-01,1817,,1,365,650.00,,0,1,0,1,1,1,1,0,0
5,30,2,200,1818-10-01,1818,10,3,20,84240.00,,0,0,0,1,0,1,0,0,0
6,31,155,230,1818-12-01,1818,12,,155,464568.00,1656300,1,1,1,,0,1,0,1,0
7,32,200,2,1818-10-01,1818,10,3,2,41600.00,,0,0,0,0,1,0,1,0,0
8,33,200,,1818-06-01,1818,6,1,750,421200.00,,0,1,0,0,,0,1,0,0
9,34,200,,1818-01-01,1818,,2,438,16.00,,0,0,0,0,,0,1,0,0


In [24]:
dfTerrChange.to_csv('FinalData/territorialchange.csv', encoding='utf-8', index=False)

## Create 'TERRITORY' table

Task: transform Territories.csv into a table with attributes:

- TerritoryID
- TerritoryName

Note: Territory.csv was created by running Entities.pdf through [Tabula](https://tabula.technology/) and hand-correcting minor errors (for instance, some sets of rows were shifted to the left).

There were also some `\r`s introduced into rows where the TerritoryName was too long. I removed these by hand.

Some TerritoryIDs matched up to multiple TerritoryNames. Those Territory IDs were:

- 374
- 1152
- 3351
- 3377

I suspect this is a coding error, as the names these IDs corresponded to were in different (albiet relatively close) locations. For the sake of having a unique ID, and because only the ID is recorded in the TERRITORIALCHANGE table, I modified these by hand to be the same, with the second name in parentheses.

I used this code to investigate these irregularities:

```
weirdnames = dfTerritory['TerritoryName'].str.contains('\\r', regex=True)
sum(weirdnames)
dfTerritory['TerritoryID'].value_counts()
```

In [25]:
dfTerritory = pd.read_csv('SourceData/CorrelatesOfWar/Territories.csv')
dfTerritory

Unnamed: 0,Entity Number,Name,Begin Year,End Year,Ending Political Status
0,3,Alaska,1816.0,1867.0,Became colony of 365
1,3,Alaska,1867.0,1959.0,Became colony of 2
2,3,Alaska,1959.0,1993.0,Became part of 2
3,4,Hawaii,1898.0,1960.0,Became colony of 2
4,4,Hawaii,1960.0,1993.0,Became part of 2
5,5,Virgin Islands,1816.0,1917.0,Became colony of 390
6,5,Virgin Islands,1917.0,1993.0,Became colony of 2
7,6,Puerto Rico,1816.0,1821.0,Became part of 1070
8,6,Puerto Rico,1821.0,1898.0,Became colony of 230
9,6,Puerto Rico,1898.0,1952.0,Became colony of 2


In [26]:
dfTerritory.drop(columns=['Begin Year', 'End Year', 'Ending Political Status'], inplace=True)
dfTerritory.rename(columns={'Entity Number':'TerritoryID', 'Name':'TerritoryName'}, inplace=True)
dfTerritory.drop_duplicates(inplace=True)
dfTerritory

Unnamed: 0,TerritoryID,TerritoryName
0,3,Alaska
3,4,Hawaii
5,5,Virgin Islands
7,6,Puerto Rico
10,7,Texas
14,10,Greenland
16,11,Faeroe Is.
18,20,Canada
20,21,Newfoundland
23,30,Bermuda


In [27]:
TerritoryNameMaxLength = int(dfTerritory['TerritoryName'].str.encode(encoding='utf-8').str.len().max())
print(TerritoryNameMaxLength)

68


In [28]:
dfTerritory['TerritoryName'] = dfTerritory['TerritoryName'].str.replace('\&', 'and')
dfTerritory

Unnamed: 0,TerritoryID,TerritoryName
0,3,Alaska
3,4,Hawaii
5,5,Virgin Islands
7,6,Puerto Rico
10,7,Texas
14,10,Greenland
16,11,Faeroe Is.
18,20,Canada
20,21,Newfoundland
23,30,Bermuda


In [29]:
dfTerritory.to_csv('FinalData/territory.csv', encoding='utf-8', index=False)

## Create 'WAR' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv
- Intra-StateWarData_v4.1.csv (note: had to be resaved in UTF-8 encoding)
- Non-StateWarData_v4.0.csv (note: had to be resaved in UTF-8 encoding)
- Extra-StateWarData_v4.0.csv (note: had to be resaved in UTF-8 encoding)
- CowWarList.csv (note: generated from pdf using Tabula, with `\r`s removed by hand)

with the following attributes:

- WarID
- WarShortName
- WarLongName (from CowWarList.csv)
- WarType
- IsIntervention (only relevant for Extra-State Wars)
- IsInternational (only relevant for Intra-State Wars)

In [30]:
dfInterStateWar = pd.read_csv('SourceData/CorrelatesOfWar/Inter-StateWarData_v4.0.csv')
dfInterStateWar

Unnamed: 0,WarNum,WarName,WarType,ccode,StateName,Side,StartMonth1,StartDay1,StartYear1,EndMonth1,...,EndMonth2,EndDay2,EndYear2,TransFrom,WhereFought,Initiator,Outcome,TransTo,BatDeath,Version
0,1,Franco-Spanish War,1,230,Spain,2,4,7,1823,11,...,-8,-8,-8,503,2,2,2,-8,600,4
1,1,Franco-Spanish War,1,220,France,1,4,7,1823,11,...,-8,-8,-8,503,2,1,1,-8,400,4
2,4,First Russo-Turkish,1,640,Ottoman Empire,2,4,26,1828,9,...,-8,-8,-8,506,11,2,2,-8,80000,4
3,4,First Russo-Turkish,1,365,Russia,1,4,26,1828,9,...,-8,-8,-8,506,11,1,1,-8,50000,4
4,7,Mexican-American,1,70,Mexico,2,4,25,1846,9,...,-8,-8,-8,-8,1,2,2,-8,6000,4
5,7,Mexican-American,1,2,United States of America,1,4,25,1846,9,...,-8,-8,-8,-8,1,1,1,-8,13283,4
6,10,Austro-Sardinian,1,337,Tuscany,2,3,29,1848,8,...,-8,-8,-8,-8,2,2,2,-8,100,4
7,10,Austro-Sardinian,1,325,Italy,2,3,24,1848,8,...,3,30,1849,551,2,1,2,-8,3400,4
8,10,Austro-Sardinian,1,300,Austria,1,3,24,1848,8,...,3,30,1849,551,2,2,1,-8,3927,4
9,10,Austro-Sardinian,1,332,Modena,2,4,9,1848,8,...,-8,-8,-8,-8,2,2,2,-8,100,4


In [31]:
dfInterWar = dfInterStateWar[['WarNum', 'WarName', 'WarType']]
dfInterWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'}, inplace=True)
dfInterWar.drop_duplicates(inplace=True)
dfInterWar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,WarShortName,WarType
0,1,Franco-Spanish War,1
2,4,First Russo-Turkish,1
4,7,Mexican-American,1
6,10,Austro-Sardinian,1
10,13,First Schleswig-Holstein,1
12,16,Roman Republic,1
16,19,La Plata,1
18,22,Crimean,1
23,25,Anglo-Persian,1
25,28,Italian Unification,1


In [32]:
dfIntraStateWar = pd.read_csv('SourceData/CorrelatesOfWar/Intra-StateWarData_v4.1.csv')
dfIntraStateWar

Unnamed: 0,WarNum,WarName,WarType,CcodeA,SideA,CcodeB,SideB,Intnl,StartMonth1,StartDay1,...,EndDay2,EndYear2,TransFrom,WhereFought,Initiator,Outcome,TransTo,SideADeaths,SideBDeaths,Version
0,500,First Caucasus,5,365.0,Russia,-8.0,"Georgians, Dhagestania, Chechens",0,6,10,...,-8,-8,-8.0,2,Chechens,1,-8,5000,6000,4.1
1,501,Sidon-Damascus,6,-8.0,Sidon,-8.0,Damascus & Aleppo,0,6,-9,...,-8,-8,-8.0,6,Sidon,2,-8,-9,-9,4.1
2,502,First Two Sicilies,4,300.0,Austria,-8.0,-8,1,3,-9,...,-8,-8,-8.0,2,Liberals,1,-8,-9,-8,4.1
3,502,First Two Sicilies,4,329.0,Two Sicilies,-8.0,Liberals,1,7,2,...,-8,-8,-8.0,2,Liberals,1,-8,-9,-9,4.1
4,503,Spanish Royalists,4,230.0,Spain,-8.0,Royalists,0,12,1,...,-8,-8,-8.0,2,Royalists,4,1,-9,-9,4.1
5,505,Sardinian Revolt,4,300.0,Austria,-8.0,-8,1,3,10,...,-8,-8,-8.0,2,Carbonari,1,-8,-9,-8,4.1
6,505,Sardinian Revolt,4,325.0,Sardinia,-8.0,Carbonari,1,3,10,...,-8,-8,-8.0,2,Carbonari,1,-8,-9,-9,4.1
7,506,Greek Independence,5,640.0,Ottoman Empire,-8.0,Greeks,1,3,25,...,-8,-8,-8.0,2,Greeks,4,4,6000,3000,4.1
8,506,Greek Independence,5,-8.0,-8,200.0,United Kingdom,1,10,20,...,-8,-8,-8.0,2,Greeks,4,4,-8,80,4.1
9,506,Greek Independence,5,-8.0,-8,220.0,France,1,10,20,...,-8,-8,-8.0,2,Greeks,4,4,-8,40,4.1


In [33]:
dfIntraWar = dfIntraStateWar[['WarNum', 'WarName', 'WarType', 'Intnl']]
dfIntraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Intnl':'IsInternational'}, inplace=True)
dfIntraWar.drop_duplicates(inplace=True)
dfIntraWar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,WarShortName,WarType,IsInternational
0,500,First Caucasus,5,0
1,501,Sidon-Damascus,6,0
2,502,First Two Sicilies,4,1
4,503,Spanish Royalists,4,0
5,505,Sardinian Revolt,4,1
7,506,Greek Independence,5,1
11,507,Egypt-Mehdi,6,0
12,508,Janissari Revolt,4,0
13,510,Miguelite War,4,1
15,511,First Murid War,5,0


In [34]:
dfNonStateWar = pd.read_csv('SourceData/CorrelatesOfWar/Non-StateWarData_v4.0.csv')
dfNonStateWar

Unnamed: 0,WarNum,WarName,WarType,WhereFought,SideA1,SideA2,SideB1,SideB2,SideB3,SideB4,...,EndMonth,EndDay,Initiator,TransFrom,TransTo,Outcome,SideADeaths,SideBDeaths,TotalCombatDeaths,Version
0,1500,First Maori Tribal War,8,9,Te Rauparaha's Ngati Toa,-8,Taranaki,Ngai Tahu,Waikato,Ngati Ira,...,-9,-9,A,-8,-8,1,1500,6000,7500,4
1,1501,Shaka Zulu-Bantu War,8,4,Shaka Zulu,-8,Bantu,-8,-8,-8,...,9,24,A,-8,-8,1,20000,40000,60000,4
2,1502,Burma-Assam War,8,7,Burma,-8,Assam,-8,-8,-8,...,-9,-9,A,-8,-8,1,-9,-9,-9,4
3,1503,Buenos Aires War,8,1,Buenos Aires,-8,Provinces,-8,-8,-8,...,2,23,B,-8,-8,2,-9,-9,-9,4
4,1505,Second Maori Tribal War,8,9,Hongi Hika's Nga Phuhi,-8,Ngati Paoa,Ngati Maru,Waikato River Maori,Te Arawa,...,-9,-9,A,-8,-8,1,500,2000,2500,4
5,1506,Siam-Kedah War,8,7,Thailand,-8,Kedah,-8,-8,-8,...,12,-9,A,-8,-8,1,-9,-9,-9,4
6,1508,China-Kashgaria War,8,7,China,-8,Muslim rebels,-8,-8,-8,...,-9,-9,A,-8,-8,1,-9,-9,-9,4
7,1509,Mexico-Yaqui Indian War,8,1,Mexico,-8,Yaqui Indians,-8,-8,-8,...,4,13,A,-8,-8,3,-9,-9,3000,4
8,1510,Central American Confederation War,8,1,Conservative Confederation,-8,Liberals,-8,-8,-8,...,4,12,B,-8,-8,2,2000,1300,3300,4
9,1511,Viang Chan- Siamese War,8,7,Viang Chan,-8,Siam,-8,-8,-8,...,5,15,A,-8,-8,2,24000,7000,31000,4


In [35]:
dfNonWar = dfNonStateWar[['WarNum', 'WarName', 'WarType']]
dfNonWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName'}, inplace=True)
dfNonWar.drop_duplicates(inplace=True)
dfNonWar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,WarShortName,WarType
0,1500,First Maori Tribal War,8
1,1501,Shaka Zulu-Bantu War,8
2,1502,Burma-Assam War,8
3,1503,Buenos Aires War,8
4,1505,Second Maori Tribal War,8
5,1506,Siam-Kedah War,8
6,1508,China-Kashgaria War,8
7,1509,Mexico-Yaqui Indian War,8
8,1510,Central American Confederation War,8
9,1511,Viang Chan- Siamese War,8


In [36]:
dfExtraStateWar = pd.read_csv('SourceData/CorrelatesOfWar/Extra-StateWarData_v4.0.csv')
dfExtraStateWar

Unnamed: 0,WarNum,WarName,WarType,ccode1,SideA,ccode2,SideB,StartMonth1,StartDay1,StartYear1,...,EndYear2,Initiator,Interven,TransFrom,Outcome,TransTo,WhereFought,BatDeath,NonStateDeaths,Version
0,300,Allied Bombardment of Algiers,3,210,Netherlands,-8,-8,8,26,1816,...,-8,1,1,-8,1,-8,6,13,-8,4
1,300,Allied Bombardment of Algiers,3,200,United Kingdom,-8,Algeria,8,26,1816,...,-8,1,1,-8,1,-8,6,129,6000,4
2,301,Ottoman-Wahhabi,3,640,Ottoman Empire,-8,Saudi Wahhabis,9,-9,1816,...,-8,1,0,-8,1,-8,6,13500,14000,4
3,302,Liberation of Chile,2,230,Spain,-8,San Martin revolutionaries,1,9,1817,...,-8,0,0,-8,2,-8,1,1700,1140,4
4,303,First Bolivar Expedition,2,230,Spain,-8,New Granada,4,11,1817,...,-8,1,0,-8,2,-8,1,3000,2000,4
5,304,Mexican Independence,2,230,Spain,-8,Mina Expedition,8,15,1817,...,-8,0,0,-8,1,-8,1,1000,1000,4
6,305,British-Kandyan,2,200,United Kingdom,-8,Kandyan rebels,10,-9,1817,...,-8,0,0,-8,1,-8,7,1000,10000,4
7,306,British-Maratha,2,200,United Kingdom,-8,Marathas,11,6,1817,...,-8,0,0,-8,1,-8,7,2800,2000,4
8,307,Ottoman Conquest of Sudan,3,640,Ottoman Empire,-8,Sudan states,-9,-9,1820,...,-8,1,0,-8,1,-8,4,4000,2500,4
9,308,Second Bolivar Expedition,2,230,Spain,-8,New Granada,4,28,1821,...,-8,0,0,-8,2,-8,1,1000,500,4


In [37]:
dfExtraWar = dfExtraStateWar[['WarNum', 'WarName', 'WarType', 'Interven']]
dfExtraWar.rename(columns={'WarNum':'WarID', 'WarName':'WarShortName', 'Interven':'IsIntervention'}, inplace=True)
dfExtraWar.drop_duplicates(inplace=True)
dfExtraWar

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,WarShortName,WarType,IsIntervention
0,300,Allied Bombardment of Algiers,3,1
2,301,Ottoman-Wahhabi,3,0
3,302,Liberation of Chile,2,0
4,303,First Bolivar Expedition,2,0
5,304,Mexican Independence,2,0
6,305,British-Kandyan,2,0
7,306,British-Maratha,2,0
8,307,Ottoman Conquest of Sudan,3,0
9,308,Second Bolivar Expedition,2,0
10,309,Turco-Persian,3,0


In [38]:
warDFs = [dfInterWar, dfIntraWar, dfNonWar, dfExtraWar]
dfWar = pd.concat(warDFs)
dfWar = dfWar[['WarID', 'WarShortName', 'WarType', 'IsIntervention', 'IsInternational']]
dfWar

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=True'.


  


Unnamed: 0,WarID,WarShortName,WarType,IsIntervention,IsInternational
0,1,Franco-Spanish War,1,,
2,4,First Russo-Turkish,1,,
4,7,Mexican-American,1,,
6,10,Austro-Sardinian,1,,
10,13,First Schleswig-Holstein,1,,
12,16,Roman Republic,1,,
16,19,La Plata,1,,
18,22,Crimean,1,,
23,25,Anglo-Persian,1,,
25,28,Italian Unification,1,,


In [39]:
dfWarNames = pd.read_csv('SourceData/CorrelatesOfWar/CowWarList.csv')
dfWarNames

Unnamed: 0,Year,War Name,War Type & Number
0,1816,Allied Bombardment of Algiers of 1816,Extra-State War #300
1,1816,Ottoman-Wahhabi Revolt of 1816-1818,Extra-State War #301
2,1817,Liberation of Chile of 1817-1818,Extra-State War #302
3,1817,First Bolivar Expedition of 1817-1819,Extra-State War #303
4,1817,War of Mexican Independence of 1817-1818,Extra-State War #304
5,1817,British-Kandyan War of 1817-1818,Extra-State War #305
6,1817,British-Maratha of 1817-1818,Extra-State War #306
7,1818,First Maori Tribal War of 1818-1824,Non-State War #1500
8,1818,First Caucasus War of 1818-1822,Intra-State War #500
9,1819,Shaka Zulu-Bantu War of 1819-1828,Non-State War #1501


In [40]:
dfWarNamesIDs = dfWarNames['War Type & Number'].str.split("#", n = 1, expand = True)
dfWarNames['WarTypeName'] = dfWarNamesIDs[0]
dfWarNames['WarID'] = dfWarNamesIDs[1]
dfWarNames = dfWarNames[['WarID', 'WarTypeName', 'War Name']]
dfWarNames.rename(columns={'War Name':'WarLongName'}, inplace=True)
dfWarNames

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,WarID,WarTypeName,WarLongName
0,300,Extra-State War,Allied Bombardment of Algiers of 1816
1,301,Extra-State War,Ottoman-Wahhabi Revolt of 1816-1818
2,302,Extra-State War,Liberation of Chile of 1817-1818
3,303,Extra-State War,First Bolivar Expedition of 1817-1819
4,304,Extra-State War,War of Mexican Independence of 1817-1818
5,305,Extra-State War,British-Kandyan War of 1817-1818
6,306,Extra-State War,British-Maratha of 1817-1818
7,1500,Non-State War,First Maori Tribal War of 1818-1824
8,500,Intra-State War,First Caucasus War of 1818-1822
9,1501,Non-State War,Shaka Zulu-Bantu War of 1819-1828


In [41]:
dfWarNames['WarID'] = dfWarNames['WarID'].astype('int64')
dfWars = pd.merge(dfWar, dfWarNames, on='WarID')
dfWars = dfWars[['WarID', 'WarShortName', 'WarLongName', 'WarType', 'WarTypeName', 'IsIntervention', 'IsInternational']]
dfWars = dfWars.replace(np.nan, '', regex=True)
dfWars

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,WarID,WarShortName,WarLongName,WarType,WarTypeName,IsIntervention,IsInternational
0,1,Franco-Spanish War,Franco-Spanish War of 1823,1,Inter-State War,,
1,4,First Russo-Turkish,First Russo-Turkish War of 1828-1829,1,Inter-State War,,
2,7,Mexican-American,Mexican-American War of 1846-1847,1,Inter-State War,,
3,10,Austro-Sardinian,Austro-Sardinian War of 1848-1849,1,Inter-State War,,
4,13,First Schleswig-Holstein,First Schleswig-Holstein War of 1848-1849,1,Inter-State War,,
5,16,Roman Republic,War of the Roman Republic of 1849,1,Inter-State War,,
6,19,La Plata,La Plata War of 1851-1852,1,Inter-State War,,
7,22,Crimean,Crimean War of 1853-1856,1,Inter-State War,,
8,25,Anglo-Persian,Anglo-Persian War of 1856-1857,1,Inter-State War,,
9,28,Italian Unification,War of Italian Unification of 1859,1,Inter-State War,,


In [42]:
badsnamechar = dfWars['WarShortName'].str.contains('\&', regex=True)
print(sum(badsnamechar))
badlnamechar = dfWars['WarLongName'].str.contains('\&', regex=True)
print(sum(badlnamechar))

2
0


In [43]:
dfWars['WarShortName'] = dfWars['WarShortName'].str.replace('\&', 'and')

In [44]:
WarShortNameMaxLength = int(dfWars['WarShortName'].str.encode(encoding='utf-8').str.len().max())
print('WarShortNameMaxLength', WarShortNameMaxLength)
WarLongNameMaxLength = int(dfWars['WarLongName'].str.encode(encoding='utf-8').str.len().max())
print('WarLongNameMaxLength', WarLongNameMaxLength)
WarTypeNameMaxLength = int(dfWars['WarTypeName'].str.encode(encoding='utf-8').str.len().max())
print('WarTypeNameMaxLength', WarTypeNameMaxLength)

#dfWar['WarType'].value_counts()
#dfWar['WarID'].value_counts()

WarShortNameMaxLength 50
WarLongNameMaxLength 62
WarTypeNameMaxLength 16


In [45]:
dfWars.WarTypeName.unique()

array(['Inter-State War ', 'Intra-State War ', 'Non-State War ',
       'Extra-State War '], dtype=object)

In [46]:
dfWars = dfWars.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

In [47]:
dfWars.WarTypeName.unique()

array(['Inter-State War', 'Intra-State War', 'Non-State War',
       'Extra-State War'], dtype=object)

In [48]:
dfWars.to_csv('FinalData/war.csv', encoding='utf-8', index=False)

## Create 'WAR_DATES' table

Task: transform the following csv files into one table:

- Inter-StateWarData_v4.0.csv (note: already saved as 'dfInterStateWar')
- Intra-StateWarData_v4.1.csv (note: already saved as 'dfIntraStateWar')
- Non-StateWarData_v4.0.csv (note: already saved as 'dfNonStateWar')
- Extra-StateWarData_v4.0.csv (note: already saved as 'dfExtraStateWar')

with the following attributes:

- WarID
- PolityName
- StateID
- StartDate
- EndDate
- StartYear
- StartMonth
- StartDay
- EndYear
- EndMonth
- EndDay

Note: There was a data entry error in 'Intra-StateWarData_v4.1.csv' for WarNum 585; EndDay1 was coded '-91866' and EndYear1 was left blank. I corrected this by hand so the Day was '-9' and the Year '1866'.

Note: There was another data entry error in the same file for WarNum 682; EndDay1 was coded '1919' and EndYear1 was left blank. I corrected this by hand so that the Day was '-9' and the Year '1919'.

Note: There was an apparent data entry error in the same file for WarNum 623, the second entry (Korea) - the StartDay1 was coded as '29' when StartMonth1 was 2... which is not a valid date. I corrected this by hand so the StartDay1 became '28'.

In [49]:
dfInterStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode', 'StateName', 'Side',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought', 'Initiator',
       'Outcome', 'TransTo', 'BatDeath', 'Version'],
      dtype='object')

In [50]:
dfInterStateWarDates1 = dfInterStateWar[['WarNum', 'ccode', 'StateName', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1']]
dfInterStateWarDates2 = dfInterStateWar[['WarNum', 'ccode', 'StateName', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2']]

In [51]:
dfInterStateWarDates1.rename(columns={'WarNum':'WarID', 'ccode':'StateID', 'StateName':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfInterStateWarDates2.rename(columns={'WarNum':'WarID', 'ccode':'StateID', 'StateName':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2':'EndDay', 'EndYear2':'EndYear'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [52]:
dfInterStateWarDates2

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,1,230,Spain,-8,-8,-8,-8,-8,-8
1,1,220,France,-8,-8,-8,-8,-8,-8
2,4,640,Ottoman Empire,-8,-8,-8,-8,-8,-8
3,4,365,Russia,-8,-8,-8,-8,-8,-8
4,7,70,Mexico,-8,-8,-8,-8,-8,-8
5,7,2,United States of America,-8,-8,-8,-8,-8,-8
6,10,337,Tuscany,-8,-8,-8,-8,-8,-8
7,10,325,Italy,3,12,1849,3,30,1849
8,10,300,Austria,3,12,1849,3,30,1849
9,10,332,Modena,-8,-8,-8,-8,-8,-8


In [53]:
dfInterStateWarDates2 = dfInterStateWarDates2.replace(-8, '')
dfInterStateWarDates2

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,1,230,Spain,,,,,,
1,1,220,France,,,,,,
2,4,640,Ottoman Empire,,,,,,
3,4,365,Russia,,,,,,
4,7,70,Mexico,,,,,,
5,7,2,United States of America,,,,,,
6,10,337,Tuscany,,,,,,
7,10,325,Italy,3,12,1849,3,30,1849
8,10,300,Austria,3,12,1849,3,30,1849
9,10,332,Modena,,,,,,


In [54]:
dfInterStateWarDates2['datesconcat'] = dfInterStateWarDates2['StartMonth'].map(str) + dfInterStateWarDates2['StartDay'].map(str) + dfInterStateWarDates2['StartYear'].map(str) + dfInterStateWarDates2['EndMonth'].map(str) + dfInterStateWarDates2['EndDay'].map(str) + dfInterStateWarDates2['EndYear'].map(str)
dfInterStateWarDates2

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,datesconcat
0,1,230,Spain,,,,,,,
1,1,220,France,,,,,,,
2,4,640,Ottoman Empire,,,,,,,
3,4,365,Russia,,,,,,,
4,7,70,Mexico,,,,,,,
5,7,2,United States of America,,,,,,,
6,10,337,Tuscany,,,,,,,
7,10,325,Italy,3,12,1849,3,30,1849,31218493301849
8,10,300,Austria,3,12,1849,3,30,1849,31218493301849
9,10,332,Modena,,,,,,,


In [55]:
missdate = dfInterStateWarDates2.loc[0, 'datesconcat']
dfInterStateWarDates2 = dfInterStateWarDates2[dfInterStateWarDates2.datesconcat != missdate]
dfInterStateWarDates2

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear,datesconcat
7,10,325,Italy,3,12,1849,3,30,1849,31218493301849
8,10,300,Austria,3,12,1849,3,30,1849,31218493301849
10,13,255,Prussia,3,25,1849,7,10,1849,32518497101849
11,13,390,Denmark,3,25,1849,7,10,1849,32518497101849
38,46,255,Germany,6,25,1864,7,20,1864,62518647201864
39,46,390,Denmark,6,25,1864,7,20,1864,62518647201864
40,46,300,Austria,6,25,1864,7,20,1864,62518647201864
104,100,355,Bulgaria,2,3,1913,4,19,1913,2319134191913
105,100,345,Yugoslavia,2,3,1913,4,19,1913,2319134191913
182,139,365,USSR,8,8,1945,8,14,1945,8819458141945


In [56]:
dfInterStateWarDates2.drop(columns=['datesconcat'], inplace=True)
dfInterStateWarDates2

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
7,10,325,Italy,3,12,1849,3,30,1849
8,10,300,Austria,3,12,1849,3,30,1849
10,13,255,Prussia,3,25,1849,7,10,1849
11,13,390,Denmark,3,25,1849,7,10,1849
38,46,255,Germany,6,25,1864,7,20,1864
39,46,390,Denmark,6,25,1864,7,20,1864
40,46,300,Austria,6,25,1864,7,20,1864
104,100,355,Bulgaria,2,3,1913,4,19,1913
105,100,345,Yugoslavia,2,3,1913,4,19,1913
182,139,365,USSR,8,8,1945,8,14,1945


In [57]:
combinedInterStateWarDates = [dfInterStateWarDates1, dfInterStateWarDates2]
dfInterStateWarDates = pd.concat(combinedInterStateWarDates)
dfInterStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,1,230,Spain,4,7,1823,11,13,1823
1,1,220,France,4,7,1823,11,13,1823
2,4,640,Ottoman Empire,4,26,1828,9,14,1829
3,4,365,Russia,4,26,1828,9,14,1829
4,7,70,Mexico,4,25,1846,9,14,1847
5,7,2,United States of America,4,25,1846,9,14,1847
6,10,337,Tuscany,3,29,1848,8,9,1848
7,10,325,Italy,3,24,1848,8,9,1848
8,10,300,Austria,3,24,1848,8,9,1848
9,10,332,Modena,4,9,1848,8,9,1848


In [58]:
dfInterStateWarDates = dfInterStateWarDates[['WarID', 'PolityName', 'StateID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfInterStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823,4,7,1823,11,13
1,1,France,220,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14
3,4,Russia,365,1828,4,26,1829,9,14
4,7,Mexico,70,1846,4,25,1847,9,14
5,7,United States of America,2,1846,4,25,1847,9,14
6,10,Tuscany,337,1848,3,29,1848,8,9
7,10,Italy,325,1848,3,24,1848,8,9
8,10,Austria,300,1848,3,24,1848,8,9
9,10,Modena,332,1848,4,9,1848,8,9


In [59]:
dfIntraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'CcodeA', 'SideA', 'CcodeB', 'SideB',
       'Intnl', 'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1',
       'EndDay1', 'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2',
       'EndMonth2', 'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought',
       'Initiator', 'Outcome', 'TransTo', 'SideADeaths', 'SideBDeaths',
       'Version'],
      dtype='object')

In [60]:
dfIntraStateWarDates1A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1']]
dfIntraStateWarDates2A = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2']]
dfIntraStateWarDates1B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1']]
dfIntraStateWarDates2B = dfIntraStateWar[['WarNum', 'CcodeB', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2', 'EndYear2']]

In [61]:
dfIntraStateWarDates1A.rename(columns={'WarNum':'WarID', 'CcodeA':'StateID', 'SideA':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfIntraStateWarDates2A.rename(columns={'WarNum':'WarID', 'CcodeA':'StateID', 'SideA':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2':'EndDay', 'EndYear2':'EndYear'}, inplace=True)
dfIntraStateWarDates1B.rename(columns={'WarNum':'WarID', 'CcodeB':'StateID', 'SideB':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfIntraStateWarDates2B.rename(columns={'WarNum':'WarID', 'CcodeB':'StateID', 'SideB':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2':'EndDay', 'EndYear2':'EndYear'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [62]:
dfIntraStateWarDates1A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365.0,Russia,6,10,1818,-9,-9,1822
1,501,-8.0,Sidon,6,-9,1820,7,21,1821
2,502,300.0,Austria,3,-9,1821,3,23,1821
3,502,329.0,Two Sicilies,7,2,1820,3,23,1821
4,503,230.0,Spain,12,1,1821,4,6,1823
5,505,300.0,Austria,3,10,1821,5,8,1821
6,505,325.0,Sardinia,3,10,1821,5,8,1821
7,506,640.0,Ottoman Empire,3,25,1821,4,25,1828
8,506,-8.0,-8,10,20,1827,10,27,1827
9,506,-8.0,-8,10,20,1827,10,27,1827


In [63]:
dfIntraStateWarDates1A = dfIntraStateWarDates1A[dfIntraStateWarDates1A.PolityName != '-8']
dfIntraStateWarDates1A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365.0,Russia,6,10,1818,-9,-9,1822
1,501,-8.0,Sidon,6,-9,1820,7,21,1821
2,502,300.0,Austria,3,-9,1821,3,23,1821
3,502,329.0,Two Sicilies,7,2,1820,3,23,1821
4,503,230.0,Spain,12,1,1821,4,6,1823
5,505,300.0,Austria,3,10,1821,5,8,1821
6,505,325.0,Sardinia,3,10,1821,5,8,1821
7,506,640.0,Ottoman Empire,3,25,1821,4,25,1828
11,507,-8.0,Egypt,3,20,1824,4,-9,1824
12,508,640.0,Ottoman Empire,6,14,1826,9,30,1826


In [64]:
dfIntraStateWarDates1B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,-8.0,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822
1,501,-8.0,Damascus & Aleppo,6,-9,1820,7,21,1821
2,502,-8.0,-8,3,-9,1821,3,23,1821
3,502,-8.0,Liberals,7,2,1820,3,23,1821
4,503,-8.0,Royalists,12,1,1821,4,6,1823
5,505,-8.0,-8,3,10,1821,5,8,1821
6,505,-8.0,Carbonari,3,10,1821,5,8,1821
7,506,-8.0,Greeks,3,25,1821,4,25,1828
8,506,200.0,United Kingdom,10,20,1827,10,27,1827
9,506,220.0,France,10,20,1827,10,27,1827


In [65]:
dfIntraStateWarDates1B = dfIntraStateWarDates1B[dfIntraStateWarDates1B.PolityName != '-8']
dfIntraStateWarDates1B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,-8.0,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822
1,501,-8.0,Damascus & Aleppo,6,-9,1820,7,21,1821
3,502,-8.0,Liberals,7,2,1820,3,23,1821
4,503,-8.0,Royalists,12,1,1821,4,6,1823
6,505,-8.0,Carbonari,3,10,1821,5,8,1821
7,506,-8.0,Greeks,3,25,1821,4,25,1828
8,506,200.0,United Kingdom,10,20,1827,10,27,1827
9,506,220.0,France,10,20,1827,10,27,1827
10,506,365.0,Russia,10,20,1827,4,25,1828
11,507,-8.0,Mehdi army,3,20,1824,4,-9,1824


In [66]:
dfIntraStateWarDates2A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365.0,Russia,-8,-8,-8,-8,-8,-8
1,501,-8.0,Sidon,-8,-8,-8,-8,-8,-8
2,502,300.0,Austria,-8,-8,-8,-8,-8,-8
3,502,329.0,Two Sicilies,-8,-8,-8,-8,-8,-8
4,503,230.0,Spain,-8,-8,-8,-8,-8,-8
5,505,300.0,Austria,-8,-8,-8,-8,-8,-8
6,505,325.0,Sardinia,-8,-8,-8,-8,-8,-8
7,506,640.0,Ottoman Empire,-8,-8,-8,-8,-8,-8
8,506,-8.0,-8,-8,-8,-8,-8,-8,-8
9,506,-8.0,-8,-8,-8,-8,-8,-8,-8


In [67]:
dfIntraStateWarDates2A = dfIntraStateWarDates2A.replace(-8, '')
dfIntraStateWarDates2A['datesconcat'] = dfIntraStateWarDates2A['StartMonth'].map(str) + dfIntraStateWarDates2A['StartDay'].map(str) + dfIntraStateWarDates2A['StartYear'].map(str) + dfIntraStateWarDates2A['EndMonth'].map(str) + dfIntraStateWarDates2A['EndDay'].map(str) + dfIntraStateWarDates2A['EndYear'].map(str)
missdate2A = dfIntraStateWarDates2A.loc[0, 'datesconcat']
dfIntraStateWarDates2A = dfIntraStateWarDates2A[dfIntraStateWarDates2A.datesconcat != missdate2A]
dfIntraStateWarDates2A.drop(columns=['datesconcat'], inplace=True)
dfIntraStateWarDates2A = dfIntraStateWarDates2A[dfIntraStateWarDates2A.PolityName != '-8']
dfIntraStateWarDates2A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
48,547,329,Two Sicilies,5,15,1848,5,15,1849
86,590,101,Venezuela,8,14,1869,1,7,1871
111,623,730,Korea,9,14,1894,11,28,1894
193,720,350,Greece,2,12,1946,10,16,1949
310,820,620,Libya,6,-9,1983,9,-9,1984
324,836,625,Sudan,4,15,1992,1,10,2005
367,877,346,Bosnia,3,20,1995,12,14,1995
391,898,451,Sierra Leone,5,11,2000,11,10,2000


In [68]:
dfIntraStateWarDates2B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,-8.0,"Georgians, Dhagestania, Chechens",-8,-8,-8,-8,-8,-8
1,501,-8.0,Damascus & Aleppo,-8,-8,-8,-8,-8,-8
2,502,-8.0,-8,-8,-8,-8,-8,-8,-8
3,502,-8.0,Liberals,-8,-8,-8,-8,-8,-8
4,503,-8.0,Royalists,-8,-8,-8,-8,-8,-8
5,505,-8.0,-8,-8,-8,-8,-8,-8,-8
6,505,-8.0,Carbonari,-8,-8,-8,-8,-8,-8
7,506,-8.0,Greeks,-8,-8,-8,-8,-8,-8
8,506,200.0,United Kingdom,-8,-8,-8,-8,-8,-8
9,506,220.0,France,-8,-8,-8,-8,-8,-8


In [69]:
dfIntraStateWarDates2B = dfIntraStateWarDates2B.replace(-8, '')
dfIntraStateWarDates2B['datesconcat'] = dfIntraStateWarDates2B['StartMonth'].map(str) + dfIntraStateWarDates2B['StartDay'].map(str) + dfIntraStateWarDates2B['StartYear'].map(str) + dfIntraStateWarDates2B['EndMonth'].map(str) + dfIntraStateWarDates2B['EndDay'].map(str) + dfIntraStateWarDates2B['EndYear'].map(str)
missdate2B = dfIntraStateWarDates2B.loc[0, 'datesconcat']
dfIntraStateWarDates2B = dfIntraStateWarDates2B[dfIntraStateWarDates2B.datesconcat != missdate2B]
dfIntraStateWarDates2B.drop(columns=['datesconcat'], inplace=True)
dfIntraStateWarDates2B = dfIntraStateWarDates2B[dfIntraStateWarDates2B.PolityName != '-8']
dfIntraStateWarDates2B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
48,547,,Liberals,5,15,1848,5,15,1849
86,590,,Conservatives,8,14,1869,1,7,1871
111,623,,Tonghak Society,9,14,1894,11,28,1894
193,720,,Communists,2,12,1946,10,16,1949
324,836,,SPLA-Garang faction,4,15,1992,1,10,2005
367,877,,Bosnian Serbs,3,20,1995,12,14,1995
369,877,344.0,Croatia,3,20,1995,12,14,1995
391,898,,Kabbah faction,5,11,2000,11,10,2000
392,898,452.0,Ghana,5,11,2000,11,10,2000
393,898,475.0,Nigeria,5,11,2000,11,10,2000


In [70]:
combinedIntraStateWarDates = [dfIntraStateWarDates1A, dfIntraStateWarDates2A, dfIntraStateWarDates1B, dfIntraStateWarDates2B]
dfIntraStateWarDates = pd.concat(combinedIntraStateWarDates)
dfIntraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,-9,-9,1822
1,501,-8,Sidon,6,-9,1820,7,21,1821
2,502,300,Austria,3,-9,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,-8,Egypt,3,20,1824,4,-9,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [71]:
PolityNameMaxLength = int(dfIntraStateWarDates['PolityName'].str.encode(encoding='utf-8').str.len().max())
print(PolityNameMaxLength)

47


In [72]:
dfIntraStateWarDates = dfIntraStateWarDates.replace(-9, '')
dfIntraStateWarDates = dfIntraStateWarDates.replace(-8, '')
dfIntraStateWarDates = dfIntraStateWarDates.replace(-7, '')
dfIntraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,,,1822
1,501,,Sidon,6,,1820,7,21,1821
2,502,300,Austria,3,,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,,Egypt,3,20,1824,4,,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [73]:
dfIntraStateWarDates = dfIntraStateWarDates[['WarID', 'PolityName', 'StateID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfIntraStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,500,Russia,365,1818,6,10,1822,,
1,501,Sidon,,1820,6,,1821,7,21
2,502,Austria,300,1821,3,,1821,3,23
3,502,Two Sicilies,329,1820,7,2,1821,3,23
4,503,Spain,230,1821,12,1,1823,4,6
5,505,Austria,300,1821,3,10,1821,5,8
6,505,Sardinia,325,1821,3,10,1821,5,8
7,506,Ottoman Empire,640,1821,3,25,1828,4,25
11,507,Egypt,,1824,3,20,1824,4,
12,508,Ottoman Empire,640,1826,6,14,1826,9,30


In [74]:
dfNonStateWar

Unnamed: 0,WarNum,WarName,WarType,WhereFought,SideA1,SideA2,SideB1,SideB2,SideB3,SideB4,...,EndMonth,EndDay,Initiator,TransFrom,TransTo,Outcome,SideADeaths,SideBDeaths,TotalCombatDeaths,Version
0,1500,First Maori Tribal War,8,9,Te Rauparaha's Ngati Toa,-8,Taranaki,Ngai Tahu,Waikato,Ngati Ira,...,-9,-9,A,-8,-8,1,1500,6000,7500,4
1,1501,Shaka Zulu-Bantu War,8,4,Shaka Zulu,-8,Bantu,-8,-8,-8,...,9,24,A,-8,-8,1,20000,40000,60000,4
2,1502,Burma-Assam War,8,7,Burma,-8,Assam,-8,-8,-8,...,-9,-9,A,-8,-8,1,-9,-9,-9,4
3,1503,Buenos Aires War,8,1,Buenos Aires,-8,Provinces,-8,-8,-8,...,2,23,B,-8,-8,2,-9,-9,-9,4
4,1505,Second Maori Tribal War,8,9,Hongi Hika's Nga Phuhi,-8,Ngati Paoa,Ngati Maru,Waikato River Maori,Te Arawa,...,-9,-9,A,-8,-8,1,500,2000,2500,4
5,1506,Siam-Kedah War,8,7,Thailand,-8,Kedah,-8,-8,-8,...,12,-9,A,-8,-8,1,-9,-9,-9,4
6,1508,China-Kashgaria War,8,7,China,-8,Muslim rebels,-8,-8,-8,...,-9,-9,A,-8,-8,1,-9,-9,-9,4
7,1509,Mexico-Yaqui Indian War,8,1,Mexico,-8,Yaqui Indians,-8,-8,-8,...,4,13,A,-8,-8,3,-9,-9,3000,4
8,1510,Central American Confederation War,8,1,Conservative Confederation,-8,Liberals,-8,-8,-8,...,4,12,B,-8,-8,2,2000,1300,3300,4
9,1511,Viang Chan- Siamese War,8,7,Viang Chan,-8,Siam,-8,-8,-8,...,5,15,A,-8,-8,2,24000,7000,31000,4


In [75]:
dfNonStateWarDates = dfNonStateWar[['WarNum', 'SideA1', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfNonStateWarDates.rename(columns={'WarNum':'WarID', 'SideA1':'PolityName'}, inplace=True)
dfNonStateWarDates = dfNonStateWarDates.replace(-9, '')
dfNonStateWarDates['StateID'] = ''
dfNonStateWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,WarID,PolityName,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StateID
0,1500,Te Rauparaha's Ngati Toa,1818,,,1824,,,
1,1501,Shaka Zulu,1819,,,1828,9,24,
2,1502,Burma,1819,,,1822,,,
3,1503,Buenos Aires,1820,1,8,1820,2,23,
4,1505,Hongi Hika's Nga Phuhi,1821,9,,1823,,,
5,1506,Thailand,1821,11,,1821,12,,
6,1508,China,1825,,,1828,,,
7,1509,Mexico,1825,10,25,1827,4,13,
8,1510,Conservative Confederation,1826,,,1829,4,12,
9,1511,Viang Chan,1826,,,1827,5,15,


In [76]:
dfNonStateWarDates['StateID'] = ''
dfNonStateWarDates = dfNonStateWarDates[['WarID', 'PolityName', 'StateID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfNonStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1500,Te Rauparaha's Ngati Toa,,1818,,,1824,,
1,1501,Shaka Zulu,,1819,,,1828,9,24
2,1502,Burma,,1819,,,1822,,
3,1503,Buenos Aires,,1820,1,8,1820,2,23
4,1505,Hongi Hika's Nga Phuhi,,1821,9,,1823,,
5,1506,Thailand,,1821,11,,1821,12,
6,1508,China,,1825,,,1828,,
7,1509,Mexico,,1825,10,25,1827,4,13
8,1510,Conservative Confederation,,1826,,,1829,4,12
9,1511,Viang Chan,,1826,,,1827,5,15


In [77]:
dfExtraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode1', 'SideA', 'ccode2', 'SideB',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2 ', 'EndYear2', 'Initiator', 'Interven', 'TransFrom', 'Outcome',
       'TransTo', 'WhereFought', 'BatDeath', 'NonStateDeaths', 'Version'],
      dtype='object')

In [78]:
dfExtraStateWar1A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1']]
dfExtraStateWar2A = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2 ', 'EndYear2']]
dfExtraStateWar1B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 
                                         'EndMonth1', 'EndDay1', 'EndYear1']]
dfExtraStateWar2B = dfExtraStateWar[['WarNum', 'ccode2', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 
                                         'EndMonth2', 'EndDay2 ', 'EndYear2']]

In [79]:
dfExtraStateWar1A.rename(columns={'WarNum':'WarID', 'ccode1':'StateID', 'SideA':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfExtraStateWar2A.rename(columns={'WarNum':'WarID', 'ccode1':'StateID', 'SideA':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2 ':'EndDay', 'EndYear2':'EndYear'}, inplace=True)
dfExtraStateWar1B.rename(columns={'WarNum':'WarID', 'ccode2':'StateID', 'SideB':'PolityName', 'StartMonth1':'StartMonth', 
                                      'StartDay1':'StartDay', 'StartYear1':'StartYear', 'EndMonth1':'EndMonth', 
                                      'EndDay1':'EndDay', 'EndYear1':'EndYear'}, inplace=True)
dfExtraStateWar2B.rename(columns={'WarNum':'WarID', 'ccode2':'StateID', 'SideB':'PolityName', 'StartMonth2':'StartMonth', 
                                      'StartDay2':'StartDay', 'StartYear2':'StartYear', 'EndMonth2':'EndMonth', 
                                      'EndDay2 ':'EndDay', 'EndYear2':'EndYear'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [80]:
dfExtraStateWar1A = dfIntraStateWarDates1A[dfIntraStateWarDates1A.PolityName != '-8']
dfExtraStateWar1B = dfIntraStateWarDates1B[dfIntraStateWarDates1B.PolityName != '-8']

In [81]:
dfExtraStateWar1B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,-8.0,"Georgians, Dhagestania, Chechens",6,10,1818,-9,-9,1822
1,501,-8.0,Damascus & Aleppo,6,-9,1820,7,21,1821
3,502,-8.0,Liberals,7,2,1820,3,23,1821
4,503,-8.0,Royalists,12,1,1821,4,6,1823
6,505,-8.0,Carbonari,3,10,1821,5,8,1821
7,506,-8.0,Greeks,3,25,1821,4,25,1828
8,506,200.0,United Kingdom,10,20,1827,10,27,1827
9,506,220.0,France,10,20,1827,10,27,1827
10,506,365.0,Russia,10,20,1827,4,25,1828
11,507,-8.0,Mehdi army,3,20,1824,4,-9,1824


In [82]:
dfExtraStateWar2A = dfExtraStateWar2A.replace(-8, '')
dfExtraStateWar2A['datesconcat'] = dfExtraStateWar2A['StartMonth'].map(str) + dfExtraStateWar2A['StartDay'].map(str) + dfExtraStateWar2A['StartYear'].map(str) + dfExtraStateWar2A['EndMonth'].map(str) + dfExtraStateWar2A['EndDay'].map(str) + dfExtraStateWar2A['EndYear'].map(str)
nmissdate2A = dfExtraStateWar2A.loc[0, 'datesconcat']
dfExtraStateWar2A = dfExtraStateWar2A[dfExtraStateWar2A.datesconcat != nmissdate2A]
dfExtraStateWar2A.drop(columns=['datesconcat'], inplace=True)
dfExtraStateWar2A = dfExtraStateWar2A[dfExtraStateWar2A.PolityName != '-8']
dfExtraStateWar2A

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
37,334,210,Netherlands,4,2,1849,6,14,1849
79,379,200,United Kingdom,9,3,1879,9,2,1880
153,454,200,United Kingdom,8,-9,1937,1,-9,1939


In [83]:
dfExtraStateWar2B = dfExtraStateWar2B.replace(-8, '')
dfExtraStateWar2B['datesconcat'] = dfExtraStateWar2B['StartMonth'].map(str) + dfExtraStateWar2B['StartDay'].map(str) + dfExtraStateWar2B['StartYear'].map(str) + dfExtraStateWar2B['EndMonth'].map(str) + dfExtraStateWar2B['EndDay'].map(str) + dfExtraStateWar2B['EndYear'].map(str)
nmissdate2B = dfExtraStateWar2B.loc[0, 'datesconcat']
dfExtraStateWar2B = dfExtraStateWar2B[dfExtraStateWar2B.datesconcat != nmissdate2B]
dfExtraStateWar2B.drop(columns=['datesconcat'], inplace=True)
dfExtraStateWar2B = dfExtraStateWar2B[dfExtraStateWar2B.PolityName != '-8']
dfExtraStateWar2B

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
37,334,,Bali,4,2,1849,6,14,1849
79,379,,Afghanistan,9,3,1879,9,2,1880
153,454,,Palestinians,8,-9,1937,1,-9,1939


In [84]:
combinedExtraStateWarDates = [dfExtraStateWar1A, dfExtraStateWar2A, dfExtraStateWar1B, dfExtraStateWar2B]
dfExtraStateWarDates = pd.concat(combinedExtraStateWarDates)
dfExtraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,-9,-9,1822
1,501,-8,Sidon,6,-9,1820,7,21,1821
2,502,300,Austria,3,-9,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,-8,Egypt,3,20,1824,4,-9,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [85]:
dfExtraStateWarDates = dfExtraStateWarDates.replace(-9, '')
dfExtraStateWarDates = dfExtraStateWarDates.replace(-8, '')
dfExtraStateWarDates = dfExtraStateWarDates.replace(-7, '')
dfExtraStateWarDates

Unnamed: 0,WarID,StateID,PolityName,StartMonth,StartDay,StartYear,EndMonth,EndDay,EndYear
0,500,365,Russia,6,10,1818,,,1822
1,501,,Sidon,6,,1820,7,21,1821
2,502,300,Austria,3,,1821,3,23,1821
3,502,329,Two Sicilies,7,2,1820,3,23,1821
4,503,230,Spain,12,1,1821,4,6,1823
5,505,300,Austria,3,10,1821,5,8,1821
6,505,325,Sardinia,3,10,1821,5,8,1821
7,506,640,Ottoman Empire,3,25,1821,4,25,1828
11,507,,Egypt,3,20,1824,4,,1824
12,508,640,Ottoman Empire,6,14,1826,9,30,1826


In [86]:
dfExtraStateWarDates = dfExtraStateWarDates[['WarID', 'PolityName', 'StateID', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfExtraStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,500,Russia,365,1818,6,10,1822,,
1,501,Sidon,,1820,6,,1821,7,21
2,502,Austria,300,1821,3,,1821,3,23
3,502,Two Sicilies,329,1820,7,2,1821,3,23
4,503,Spain,230,1821,12,1,1823,4,6
5,505,Austria,300,1821,3,10,1821,5,8
6,505,Sardinia,325,1821,3,10,1821,5,8
7,506,Ottoman Empire,640,1821,3,25,1828,4,25
11,507,Egypt,,1824,3,20,1824,4,
12,508,Ottoman Empire,640,1826,6,14,1826,9,30


In [87]:
dfNonStateWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1500,Te Rauparaha's Ngati Toa,,1818,,,1824,,
1,1501,Shaka Zulu,,1819,,,1828,9,24
2,1502,Burma,,1819,,,1822,,
3,1503,Buenos Aires,,1820,1,8,1820,2,23
4,1505,Hongi Hika's Nga Phuhi,,1821,9,,1823,,
5,1506,Thailand,,1821,11,,1821,12,
6,1508,China,,1825,,,1828,,
7,1509,Mexico,,1825,10,25,1827,4,13
8,1510,Conservative Confederation,,1826,,,1829,4,12
9,1511,Viang Chan,,1826,,,1827,5,15


In [88]:
combinedWarDates = [dfInterStateWarDates, dfIntraStateWarDates, dfNonStateWarDates, dfExtraStateWarDates]
dfWarDates = pd.concat(combinedWarDates)
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823,4,7,1823,11,13
1,1,France,220,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14
3,4,Russia,365,1828,4,26,1829,9,14
4,7,Mexico,70,1846,4,25,1847,9,14
5,7,United States of America,2,1846,4,25,1847,9,14
6,10,Tuscany,337,1848,3,29,1848,8,9
7,10,Italy,325,1848,3,24,1848,8,9
8,10,Austria,300,1848,3,24,1848,8,9
9,10,Modena,332,1848,4,9,1848,8,9


In [89]:
dfWarDates = dfWarDates.replace(-9, '')
dfWarDates = dfWarDates.replace(-8, '')
dfWarDates = dfWarDates.replace(-7, '')

In [90]:
dfWarDates['StartMonthClean'] = dfWarDates['StartMonth']
dfWarDates['StartDayClean'] = dfWarDates['StartDay']
dfWarDates['EndMonthClean'] = dfWarDates['EndMonth']
dfWarDates['EndDayClean'] = dfWarDates['EndDay']
dfWarDates['StartMonthClean'] [dfWarDates['StartMonthClean'] == ''] = 1
dfWarDates['StartDayClean'] [dfWarDates['StartDayClean'] == ''] = 1
dfWarDates['EndMonthClean'] [dfWarDates['EndMonthClean'] == ''] = 1
dfWarDates['EndDayClean'] [dfWarDates['EndDayClean'] == ''] = 1
dfWarDates['EndYear'] [dfWarDates['EndYear'] == ''] = 2100 # placeholder for blank endyears, to be made null later
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9


In [91]:
dfWarDates['StartYear'] = dfWarDates['StartYear'].astype(int)
dfWarDates['StartMonthClean'] = dfWarDates['StartMonthClean'].astype(int)
dfWarDates['StartDayClean'] = dfWarDates['StartDayClean'].astype(int)
dfWarDates['EndYear'] = dfWarDates['EndYear'].astype(int)
dfWarDates['EndMonthClean'] = dfWarDates['EndMonthClean'].astype(int)
dfWarDates['EndDayClean'] = dfWarDates['EndDayClean'].astype(int)

In [92]:
dfWarDates['StartDate'] = pd.to_datetime(dict(year=dfWarDates.StartYear, month=dfWarDates.StartMonthClean, day=dfWarDates.StartDayClean))
dfWarDates['EndDate'] = pd.to_datetime(dict(year=dfWarDates.EndYear, month=dfWarDates.EndMonthClean, day=dfWarDates.EndDayClean))
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean,StartDate,EndDate
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9,1848-03-29,1848-08-09
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9,1848-04-09,1848-08-09


In [93]:
dfWarDates['StartDate'] = dfWarDates['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfWarDates['EndDate'] = dfWarDates['EndDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
dfWarDates['EndDate'] [dfWarDates['EndYear'] == 2100] = ''
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,WarID,PolityName,StateID,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay,StartMonthClean,StartDayClean,EndMonthClean,EndDayClean,StartDate,EndDate
0,1,Spain,230,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
1,1,France,220,1823,4,7,1823,11,13,4,7,11,13,1823-04-07,1823-11-13
2,4,Ottoman Empire,640,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
3,4,Russia,365,1828,4,26,1829,9,14,4,26,9,14,1828-04-26,1829-09-14
4,7,Mexico,70,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
5,7,United States of America,2,1846,4,25,1847,9,14,4,25,9,14,1846-04-25,1847-09-14
6,10,Tuscany,337,1848,3,29,1848,8,9,3,29,8,9,1848-03-29,1848-08-09
7,10,Italy,325,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
8,10,Austria,300,1848,3,24,1848,8,9,3,24,8,9,1848-03-24,1848-08-09
9,10,Modena,332,1848,4,9,1848,8,9,4,9,8,9,1848-04-09,1848-08-09


In [94]:
dfWarDates = dfWarDates[['WarID', 'PolityName', 'StateID', 'StartDate', 'EndDate', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay']]
dfWarDates

Unnamed: 0,WarID,PolityName,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823-04-07,1823-11-13,1823,4,7,1823,11,13
1,1,France,220,1823-04-07,1823-11-13,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828-04-26,1829-09-14,1828,4,26,1829,9,14
3,4,Russia,365,1828-04-26,1829-09-14,1828,4,26,1829,9,14
4,7,Mexico,70,1846-04-25,1847-09-14,1846,4,25,1847,9,14
5,7,United States of America,2,1846-04-25,1847-09-14,1846,4,25,1847,9,14
6,10,Tuscany,337,1848-03-29,1848-08-09,1848,3,29,1848,8,9
7,10,Italy,325,1848-03-24,1848-08-09,1848,3,24,1848,8,9
8,10,Austria,300,1848-03-24,1848-08-09,1848,3,24,1848,8,9
9,10,Modena,332,1848-04-09,1848-08-09,1848,4,9,1848,8,9


In [95]:
dfWarDates.drop_duplicates(inplace=True)
dfWarDates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,WarID,PolityName,StateID,StartDate,EndDate,StartYear,StartMonth,StartDay,EndYear,EndMonth,EndDay
0,1,Spain,230,1823-04-07,1823-11-13,1823,4,7,1823,11,13
1,1,France,220,1823-04-07,1823-11-13,1823,4,7,1823,11,13
2,4,Ottoman Empire,640,1828-04-26,1829-09-14,1828,4,26,1829,9,14
3,4,Russia,365,1828-04-26,1829-09-14,1828,4,26,1829,9,14
4,7,Mexico,70,1846-04-25,1847-09-14,1846,4,25,1847,9,14
5,7,United States of America,2,1846-04-25,1847-09-14,1846,4,25,1847,9,14
6,10,Tuscany,337,1848-03-29,1848-08-09,1848,3,29,1848,8,9
7,10,Italy,325,1848-03-24,1848-08-09,1848,3,24,1848,8,9
8,10,Austria,300,1848-03-24,1848-08-09,1848,3,24,1848,8,9
9,10,Modena,332,1848-04-09,1848-08-09,1848,4,9,1848,8,9


In [96]:
dfWarDates.to_csv('FinalData/war_dates.csv', encoding='utf-8', index=False)