# Wrangle War Participant Data

### Input Datasets

- `Inter-StateWarData_v4.0.csv`
- `INTRA-STATE_State_participants v5.1 CSV.csv`
- `Extra-StateWarData_v4.0.csv`
- `Non-StateWarData_v4.0.csv`
- `polity.csv`

### Output Datasets

- `war_participants.csv`

In [1]:
import pandas as pd
import numpy as np

In [2]:
raw_data_path = "../data/raw/"
processed_data_path = "../data/processed/"

In [3]:
dfInterStateWar = pd.read_csv(raw_data_path+"Inter-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9], dtype={"WarNum": str, "TransFrom": str, "TransTo": str})
dfIntraStateWar = pd.read_csv(raw_data_path+"INTRA-STATE_State_participants v5.1 CSV.csv", encoding='latin-1', na_values=[-7, -8, -9], dtype={"WarNum": str})
dfExtraStateWar = pd.read_csv(raw_data_path+"Extra-StateWarData_v4.0.csv", encoding='latin-1', na_values=[-7, -8, -9], dtype={"WarNum": str, "Interven": bool, "TransFrom": str, "TransTo": str})
dfNonStateWar = pd.read_csv(raw_data_path+"Non-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9], dtype={"WarNum": str, "TransFrom": str, "TransTo": str})
dfPolities = pd.read_csv(processed_data_path+"polity.csv", encoding='utf-8')

## Create the "war_participants" table

Table creation statement (SQLAlchemy declarative model):

```
class War_Participants(Base):
    __tablename__ = "war_participants"

    war = Column(String(5), primary_key=True)
    polity = Column(Integer, primary_key=True)
    start_date = Column(Date, primary_key=True)
    start_date_prec = Column(Text)
    end_date = Column(Date)
    end_date_prec = Column(Text)
    side = Column(String(1))
    is_initiator = Column(Boolean)
    outcome = Column(Text)
    deaths = Column(Integer)

    __table_args__ = (
        ForeignKeyConstraint(["war"], ["war.id"]),
        ForeignKeyConstraint(["polity"], ["polity.id"]),
    )
```

#### Notes

Units of observation

- Interstate War: WarNum + ccode + Side
- Intrastate War: WarNum + CcodeA + CcodeB
- Extrastate War: WarNum + ccode1 + ccode2
- Nonstate War: WarNum
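
The keys listed above can be checked for uniqueness in the same way for every dataset before any reshaping. The following is a minimal sketch on a toy frame. The helper `count_duplicate_units` and the `UNIT_KEYS` dictionary are hypothetical names, not part of the notebook, and the toy rows only stand in for the interstate data.

```python
import pandas as pd

# Key columns per dataset, copied from the "Units of observation" list.
UNIT_KEYS = {
    "inter": ["WarNum", "ccode", "Side"],
    "intra": ["WarNum", "CcodeA", "CcodeB"],
    "extra": ["WarNum", "ccode1", "ccode2"],
    "non": ["WarNum"],
}


def count_duplicate_units(df, keys):
    """Count rows whose key repeats an earlier row (hypothetical helper)."""
    return int(df.duplicated(subset=keys).sum())


# Toy frame: the third row deliberately repeats the key of the first row.
toy_inter = pd.DataFrame({
    "WarNum": ["1", "1", "1"],
    "ccode": [230, 220, 230],
    "Side": [2, 1, 2],
})

n_dupes = count_duplicate_units(toy_inter, UNIT_KEYS["inter"])
print(n_dupes)
```

Note that `DataFrame.duplicated` treats missing values in a key column as equal to each other, which matters for keys such as `CcodeB` that are often empty.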


Outcome Codes

Integrated:

- 1 = "won"
- 2 = "lost"
- 3 = "compromise"
- 4 = "transitioned"
- 5 = "war ongoing as of (last updated)"
- 6 = "stalemate"
- 7 = "conflict continues at below war level"
- 8 = "changed sides"

Interstate:

- 1 = Winner
- 2 = Loser
- 3 = Compromise/Tied
- 4 = The war was transformed into another type of war
- 5 = The war is ongoing as of 12/31/2007
- 6 = Stalemate
- 7 = Conflict continues at below war level
- 8 = Changed sides

Intrastate:

- 1 = Side A wins
- 2 = Side B wins
- 3 = Compromise
- 4 = The war was transformed into another type of war
- 5 = The war is ongoing as of 12/31/2014
- 6 = Stalemate
- 7 = Conflict continues at below war level

Nonstate:

- 1 = Side A wins
- 2 = Side B wins
- 3 = Compromise
- 4 = The war was transformed into another type of war
- 5 = The war is ongoing as of 12/31/2007
- 6 = Stalemate
- 7 = Conflict continues at below war level

Extrastate:

- 1 = Side A wins
- 2 = Side B wins
- 3 = Compromise
- 4 = The war was transformed into another type of war
- 5 = The war is ongoing as of 12/31/2007
- 6 = Stalemate
- 7 = Conflict continues at below war level
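
The intrastate, nonstate and extrastate codes 1 and 2 are relative to a side ("Side A wins"), while the integrated codes are relative to a participant ("won"/"lost"). One possible way to bridge the two is sketched below. The function `participant_outcome` is a hypothetical helper, the labels are taken from the "Integrated" list, and the toy rows are invented for illustration.

```python
import pandas as pd

# Codes 3-7 read the same for both sides; labels follow the "Integrated" list.
SHARED_OUTCOMES = {
    3: "compromise",
    4: "transitioned",
    5: "war ongoing as of (last updated)",
    6: "stalemate",
    7: "conflict continues at below war level",
}


def participant_outcome(code, side):
    """Translate a side-relative outcome code for a participant on side "A" or "B"."""
    if pd.isna(code):
        return None
    code = int(code)
    if code in (1, 2):
        winner = "A" if code == 1 else "B"
        return "won" if side == winner else "lost"
    return SHARED_OUTCOMES.get(code)


# Invented rows: the same code 1 yields "won" for side A and "lost" for side B.
toy = pd.DataFrame({
    "Outcome": [1.0, 1.0, 2.0, 6.0],
    "side": ["A", "B", "A", "B"],
})
toy["outcome"] = [
    participant_outcome(code, side)
    for code, side in zip(toy["Outcome"], toy["side"])
]
print(toy)
```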

### Common functions

In [36]:
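def process_dates(df, start_year, start_month, start_day, end_year, end_month, end_day):
    """
    For a given dataframe, with the column names provided for the start/end year/month/day,
    create columns for the date precision and the actual date.

    Returns a new dataframe; the input is left unmodified.
    """
    # Work on a copy so that slices of the raw data are not mutated
    # (this also avoids pandas' SettingWithCopyWarning).
    df = df.copy()

    # Precision is "day" unless the day is missing ("month") or the month is missing ("year").
    df["start_date_prec"] = "day"
    df["start_date_prec"] = df["start_date_prec"]\
        .where(df[start_day].notna(), "month")\
        .where(df[start_month].notna(), "year")

    df["end_date_prec"] = "day"
    df["end_date_prec"] = df["end_date_prec"]\
        .where(df[end_day].notna(), "month")\
        .where(df[end_month].notna(), "year")

    # Missing days and months default to 1 (the first day or month of the period).
    # A missing year is also filled with 1; that does not yield a valid timestamp,
    # so errors="coerce" below turns the whole date into NaT.
    df[start_day] = df[start_day].fillna(1).astype(int)
    df[start_month] = df[start_month].fillna(1).astype(int)
    df[start_year] = df[start_year].fillna(1).astype(int)

    df[end_day] = df[end_day].fillna(1).astype(int)
    df[end_month] = df[end_month].fillna(1).astype(int)
    df[end_year] = df[end_year].fillna(1).astype(int)

    # pd.to_datetime assembles dates from columns named "year", "month" and "day".
    df["start_date"] = pd.to_datetime(df[[start_year, start_month, start_day]].rename(columns={start_year:"year", start_month:"month", start_day:"day"}), errors="coerce")
    df["end_date"] = pd.to_datetime(df[[end_year, end_month, end_day]].rename(columns={end_year:"year", end_month:"month", end_day:"day"}), errors="coerce")

    # The component columns are no longer needed once the dates are built.
    df = df.drop(columns = [start_year, start_month, start_day, end_year, end_month, end_day])
    return df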
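
To make the precision logic easier to follow, here is a small self-contained sketch that mirrors the two core steps of `process_dates` on a toy frame: labelling the precision, then assembling the date with missing parts defaulted to 1. The column names `y`, `m` and `d` are placeholders chosen for this example only.

```python
import numpy as np
import pandas as pd

# Three toy rows: a full date, a missing day, and a missing month and day.
toy = pd.DataFrame({
    "y": [1823, 1816, 1818],
    "m": [4, 9, np.nan],
    "d": [7, np.nan, np.nan],
})

# Start from "day" and downgrade wherever a component is missing.
prec = pd.Series("day", index=toy.index)
prec = prec.where(toy["d"].notna(), "month").where(toy["m"].notna(), "year")

# Missing components default to 1, then the columns are assembled into dates.
parts = toy.fillna(1).astype(int).rename(columns={"y": "year", "m": "month", "d": "day"})
dates = pd.to_datetime(parts, errors="coerce")

print(pd.DataFrame({"date": dates, "precision": prec}))
```

The order of the two `where` calls matters: the month check runs last, so a row with no month is labelled "year" even though its day is also missing.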

### Interstate War

In [5]:
dfInterStateWar.head()

Unnamed: 0,WarNum,WarName,WarType,ccode,StateName,Side,StartMonth1,StartDay1,StartYear1,EndMonth1,...,EndMonth2,EndDay2,EndYear2,TransFrom,WhereFought,Initiator,Outcome,TransTo,BatDeath,Version
0,1,Franco-Spanish War,1,230,Spain,2,4,7,1823,11,...,,,,503.0,2,2,2,,600.0,4
1,1,Franco-Spanish War,1,220,France,1,4,7,1823,11,...,,,,503.0,2,1,1,,400.0,4
2,4,First Russo-Turkish,1,640,Ottoman Empire,2,4,26,1828,9,...,,,,506.0,11,2,2,,80000.0,4
3,4,First Russo-Turkish,1,365,Russia,1,4,26,1828,9,...,,,,506.0,11,1,1,,50000.0,4
4,7,Mexican-American,1,70,Mexico,2,4,25,1846,9,...,,,,,1,2,2,,6000.0,4


In [6]:
dfInterStateWar.duplicated(['WarNum','ccode','Side']).sum()

0

In [7]:
dfInterStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode', 'StateName', 'Side',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2', 'EndYear2', 'TransFrom', 'WhereFought', 'Initiator',
       'Outcome', 'TransTo', 'BatDeath', 'Version'],
      dtype='object')

In [8]:
dfInterPar_dates1 = dfInterStateWar[['WarNum', 'ccode', 'Side','StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1', 'EndYear1', 'Initiator', 'Outcome', 'BatDeath']]\
.drop_duplicates().dropna(subset=["StartYear1"])
dfInterPar_dates1_proc = process_dates(dfInterPar_dates1, "StartYear1", "StartMonth1", "StartDay1", "EndYear1", "EndMonth1", "EndDay1")

In [9]:
dfInterPar_dates2 = dfInterStateWar[['WarNum', 'ccode', 'Side', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2', 'EndDay2', 'EndYear2', 'Initiator', 'Outcome', 'BatDeath']]\
.drop_duplicates().dropna(subset=["StartYear2"])
dfInterPar_dates2_proc = process_dates(dfInterPar_dates2, "StartYear2", "StartMonth2", "StartDay2", "EndYear2", "EndMonth2", "EndDay2")

In [14]:
dfInterPar = pd.concat([dfInterPar_dates1_proc, dfInterPar_dates2_proc])
dfInterPar

Unnamed: 0,WarNum,ccode,Side,Initiator,Outcome,BatDeath,start_date_prec,end_date_prec,start_date,end_date
0,1,230,2,2,2,600.0,day,day,1823-04-07,1823-11-13
1,1,220,1,1,1,400.0,day,day,1823-04-07,1823-11-13
2,4,640,2,2,2,80000.0,day,day,1828-04-26,1829-09-14
3,4,365,1,1,1,50000.0,day,day,1828-04-26,1829-09-14
4,7,70,2,2,2,6000.0,day,day,1846-04-25,1847-09-14
...,...,...,...,...,...,...,...,...,...,...
200,148,660,2,2,2,500.0,day,day,1948-10-15,1948-10-31
201,148,663,2,1,2,1000.0,day,day,1948-10-15,1948-10-31
202,148,645,2,2,2,500.0,day,day,1948-10-15,1948-10-31
268,184,640,1,1,1,1000.0,day,day,1974-08-14,1974-08-16


In [15]:
dfInterPar = dfInterPar.rename(columns={"WarNum": "war", "ccode": "polity"})

In [16]:
inter_outcome_map = {1: "won", 
                     2: "lost", 
                     3: "compromised", 
                     4: "war transitioned", 
                     5: "war ongoing as of war_type.last_updated", 
                     6: "stalemate", 
                     7: "conflict continues at below war level", 
                     8: "changed sides",
                    }

In [17]:
dfInterPar["is_initiator"] = dfInterPar["Initiator"].map({1:True, 2:False})
dfInterPar["side"] = dfInterPar["Side"].map({1:"A", 2:"B"})
dfInterPar["deaths"] = dfInterPar["BatDeath"].astype("Int64")
dfInterPar["outcome"] = dfInterPar["Outcome"].map(inter_outcome_map)
dfInterPar

Unnamed: 0,war,polity,Side,Initiator,Outcome,BatDeath,start_date_prec,end_date_prec,start_date,end_date,is_initiator,side,deaths,outcome
0,1,230,2,2,2,600.0,day,day,1823-04-07,1823-11-13,False,B,600,lost
1,1,220,1,1,1,400.0,day,day,1823-04-07,1823-11-13,True,A,400,won
2,4,640,2,2,2,80000.0,day,day,1828-04-26,1829-09-14,False,B,80000,lost
3,4,365,1,1,1,50000.0,day,day,1828-04-26,1829-09-14,True,A,50000,won
4,7,70,2,2,2,6000.0,day,day,1846-04-25,1847-09-14,False,B,6000,lost
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,148,660,2,2,2,500.0,day,day,1948-10-15,1948-10-31,False,B,500,lost
201,148,663,2,1,2,1000.0,day,day,1948-10-15,1948-10-31,True,B,1000,lost
202,148,645,2,2,2,500.0,day,day,1948-10-15,1948-10-31,False,B,500,lost
268,184,640,1,1,1,1000.0,day,day,1974-08-14,1974-08-16,True,A,1000,won


In [18]:
dfInterPar_final = dfInterPar[["war", "polity", "start_date", "start_date_prec", "end_date", "end_date_prec", "side", "is_initiator", "outcome", "deaths"]]
dfInterPar_final

Unnamed: 0,war,polity,start_date,start_date_prec,end_date,end_date_prec,side,is_initiator,outcome,deaths
0,1,230,1823-04-07,day,1823-11-13,day,B,False,lost,600
1,1,220,1823-04-07,day,1823-11-13,day,A,True,won,400
2,4,640,1828-04-26,day,1829-09-14,day,B,False,lost,80000
3,4,365,1828-04-26,day,1829-09-14,day,A,True,won,50000
4,7,70,1846-04-25,day,1847-09-14,day,B,False,lost,6000
...,...,...,...,...,...,...,...,...,...,...
200,148,660,1948-10-15,day,1948-10-31,day,B,False,lost,500
201,148,663,1948-10-15,day,1948-10-31,day,B,True,lost,1000
202,148,645,1948-10-15,day,1948-10-31,day,B,False,lost,500
268,184,640,1974-08-14,day,1974-08-16,day,A,True,won,1000


In [19]:
dfInterPar_final.duplicated(subset=["war", "polity", "start_date"]).sum()

0

### Intrastate War

In [50]:
dfIntraStateWar

Unnamed: 0,WarNum,WarName,V5Region,WarType,CcodeA,SideA,CcodeB,SideB,Intnl,StartMo1,...,Outcome,TransTo,Deaths A,Deaths B,TotalBDeaths,SideAPeakTotForces,SideAPeak TheatForces,SideBPeakTotForces,SideBPeakTheatForces,Version
0,500,First Caucasus War of 1818-1822,3,5.0,365.0,Russia,,Caucasus Rebels,0,6.0,...,1.0,,5000.0,6000.0,11000.0,1001000.0,70300.0,,46000.0,5.1
1,502,First Two Sicilies War of 1820-1821,3,4.0,300.0,Austria,,,1,3.0,...,1.0,,,,,273000.0,60000.0,,,5.1
2,502,First Two Sicilies War of 1820-1821,3,4.0,329.0,Two Sicilies,,Liberals,1,7.0,...,1.0,,,,2000.0,49000.0,10000.0,,50000.0,5.1
3,502.1,Ali Pasha Rebellion of 1820-1822,3,5.0,640.0,Ottoman Empire,,Ali Pasha Loyalists,0,7.0,...,1.0,,,,2000.0,200000.0,20000.0,,13500.0,5.1
4,503,Sardinian Revolt of 1821,3,4.0,300.0,Austria,,,1,4.0,...,1.0,,,,,273000.0,2000.0,,,5.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
588,992.5,Somali-Al-Shabaab war of 2014-present,4,4.0,530.0,Ethiopia,,,1,3.0,...,5.0,,,,,138000.0,4395.0,,,5.1
589,993,Donbas War of 2014-present,3,5.0,,,365.0,Russia,1,8.0,...,5.0,,,220.0,,,,798000.0,12000.0,5.1
590,993,Donbas War of 2014-present,3,5.0,369.0,Ukraine,,Separatists,1,4.0,...,5.0,,4000.0,2500.0,6700.0,204000.0,250000.0,,25000.0,5.1
591,994,Second Libyan Civil War of 2014-present,5,4.0,620.0,Libya,,Libyan Dawn,0,5.0,...,3.0,,,,2000.0,7000.0,30000.0,,50000.0,5.1


In [51]:
dfIntraStateWar.columns

Index(['WarNum', 'WarName', 'V5Region', 'WarType', 'CcodeA', 'SideA', 'CcodeB',
       'SideB', 'Intnl', 'StartMo1', 'StartDy1', 'StartYr1', 'EndMo1',
       'EndDy1', 'EndYr1', 'StartMo2', 'StartDy2', 'StartYr2', 'EndMo2',
       'EndDy2', 'EndYr2', 'StartMo3', 'StartDy3', 'StartYr3', 'EndMo3',
       'EndDy3', 'EndYr3', 'StartMo4', 'StartDy4', 'StartYr4', 'EndMo4',
       'EndDy4', 'EndYr4', 'WDuratDays', 'WDuratMo', 'TransFrom', 'Initiator',
       'Outcome', 'TransTo', 'Deaths A', 'Deaths B', 'TotalBDeaths',
       'SideAPeakTotForces', 'SideAPeak TheatForces', 'SideBPeakTotForces',
       'SideBPeakTheatForces', 'Version'],
      dtype='object')

In [54]:
dfIntraStateWar_dates1 = dfIntraStateWar[['WarNum', 'CcodeA', 'SideA', 'CcodeB', 'SideB', 'StartMo1', 'StartDy1', 'StartYr1', 'EndMo1', 'EndDy1', 'EndYr1', 'WDuratDays', 'WDuratMo', 'Initiator', 'Outcome', 'Deaths A', 'Deaths B', 'TotalBDeaths', 'SideAPeakTotForces', 'SideAPeak TheatForces', 'SideBPeakTotForces', 'SideBPeakTheatForces']]\
.drop_duplicates().dropna(subset=["StartYr1"])
dfIntraStateWar_dates1_proc = process_dates(dfIntraStateWar_dates1, "StartYr1", "StartMo1", "StartDy1", "EndYr1", "EndMo1", "EndDy1")

In [55]:
dfIntraStateWar_dates1_proc

Unnamed: 0,WarNum,CcodeA,SideA,CcodeB,SideB,WDuratDays,WDuratMo,Initiator,Outcome,Deaths A,Deaths B,TotalBDeaths,SideAPeakTotForces,SideAPeak TheatForces,SideBPeakTotForces,SideBPeakTheatForces,start_date_prec,end_date_prec,start_date,end_date
0,500,365.0,Russia,,Caucasus Rebels,1596,53.20,Chechnya,1.0,5000.0,6000.0,11000.0,1001000.0,70300.0,,46000.0,day,month,1818-06-10,1822-11-01
1,502,300.0,Austria,,,9,0.30,,1.0,,,,273000.0,60000.0,,,month,day,1821-03-01,1821-03-23
2,502,329.0,Two Sicilies,,Liberals,262,8.73,Liberals,1.0,,,2000.0,49000.0,10000.0,,50000.0,day,day,1820-07-02,1821-03-23
3,502.1,640.0,Ottoman Empire,,Ali Pasha Loyalists,550,18.33,Ottoman Empire,1.0,,,2000.0,200000.0,20000.0,,13500.0,month,day,1820-07-01,1822-01-24
4,503,300.0,Austria,,,32,1.07,,1.0,,,,273000.0,2000.0,,,day,day,1821-04-07,1821-05-08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
588,992.5,530.0,Ethiopia,,,659,21.97,,5.0,,,,138000.0,4395.0,,,day,year,2014-03-03,NaT
589,993,,,365.0,Russia,370,12.33,,5.0,,220.0,,,,798000.0,12000.0,day,year,2014-08-15,NaT
590,993,369.0,Ukraine,,Separatists,493,16.43,Separatists,5.0,4000.0,2500.0,6700.0,204000.0,250000.0,,25000.0,day,year,2014-04-15,NaT
591,994,620.0,Libya,,Libyan Dawn,572,19.07,Libya,3.0,,,2000.0,7000.0,30000.0,,50000.0,day,year,2014-05-16,NaT
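
The intrastate file carries up to four start/end episodes per row (`StartYr1` to `StartYr4` and the matching end columns), while the cell above handles only the first. One way to stack all episodes into a long format before calling `process_dates` is sketched below. This is an assumption about a possible next step, not the notebook's method; the toy rows are invented and include only two episodes to keep the example short.

```python
import numpy as np
import pandas as pd

# Invented rows: war "T1" has one episode, war "T2" has two.
toy = pd.DataFrame({
    "WarNum": ["T1", "T2"],
    "StartYr1": [1900, 1901], "StartMo1": [6, 7], "StartDy1": [10, 2],
    "EndYr1": [1900, 1901], "EndMo1": [11, 9], "EndDy1": [1, 23],
    "StartYr2": [np.nan, 1902], "StartMo2": [np.nan, 1], "StartDy2": [np.nan, 5],
    "EndYr2": [np.nan, 1902], "EndMo2": [np.nan, 2], "EndDy2": [np.nan, 1],
})

PARTS = ["StartYr", "StartMo", "StartDy", "EndYr", "EndMo", "EndDy"]

episodes = []
for n in (1, 2):  # the real file would use (1, 2, 3, 4)
    cols = [f"{part}{n}" for part in PARTS]
    # Keep only rows that actually have an n-th episode.
    ep = toy[["WarNum"] + cols].dropna(subset=[f"StartYr{n}"])
    # Strip the episode suffix so that all episodes share column names.
    ep = ep.rename(columns={col: col[:-1] for col in cols})
    ep["episode"] = n
    episodes.append(ep)

long_form = pd.concat(episodes, ignore_index=True)
print(long_form)
```

With shared column names, a single call such as `process_dates(long_form, "StartYr", "StartMo", "StartDy", "EndYr", "EndMo", "EndDy")` would then cover every episode.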


### Extrastate War

In [20]:
dfExtraStateWar

Unnamed: 0,WarNum,WarName,WarType,ccode1,SideA,ccode2,SideB,StartMonth1,StartDay1,StartYear1,...,EndYear2,Initiator,Interven,TransFrom,Outcome,TransTo,WhereFought,BatDeath,NonStateDeaths,Version
0,300,Allied Bombardment of Algiers,3,210.0,Netherlands,,,8.0,26.0,1816,...,,1,True,,1,,6,13.0,,4
1,300,Allied Bombardment of Algiers,3,200.0,United Kingdom,,Algeria,8.0,26.0,1816,...,,1,True,,1,,6,129.0,6000.0,4
2,301,Ottoman-Wahhabi,3,640.0,Ottoman Empire,,Saudi Wahhabis,9.0,,1816,...,,1,False,,1,,6,13500.0,14000.0,4
3,302,Liberation of Chile,2,230.0,Spain,,San Martin revolutionaries,1.0,9.0,1817,...,,0,False,,2,,1,1700.0,1140.0,4
4,303,First Bolivar Expedition,2,230.0,Spain,,New Granada,4.0,11.0,1817,...,,1,False,,2,,1,3000.0,2000.0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,482,Iraqi Resistance,3,290.0,Poland,,,3.0,,2004,...,,0,True,227,5,,6,23.0,,4
194,482,Iraqi Resistance,3,325.0,Italy,,,3.0,,2004,...,,0,True,227,5,,6,33.0,,4
195,482,Iraqi Resistance,3,369.0,Ukraine,,,3.0,,2004,...,,0,True,227,5,,6,18.0,,4
196,482,Iraqi Resistance,3,645.0,Iraq,,,6.0,28.0,2004,...,,0,True,227,5,,6,10800.0,,4


In [21]:
dfExtraStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'ccode1', 'SideA', 'ccode2', 'SideB',
       'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1',
       'EndYear1', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2',
       'EndDay2 ', 'EndYear2', 'Initiator', 'Interven', 'TransFrom', 'Outcome',
       'TransTo', 'WhereFought', 'BatDeath', 'NonStateDeaths', 'Version'],
      dtype='object')

In [37]:
dfExtraStateWar_dates1 = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'ccode2', 'SideB', 'StartMonth1', 'StartDay1', 'StartYear1', 'EndMonth1', 'EndDay1','EndYear1', 'Initiator', 'Outcome', 'BatDeath', 'NonStateDeaths']]\
.drop_duplicates().dropna(subset=["StartYear1"])
dfExtraStateWar_dates1_proc = process_dates(dfExtraStateWar_dates1, "StartYear1", "StartMonth1", "StartDay1", "EndYear1", "EndMonth1", "EndDay1")

In [38]:
dfExtraStateWar_dates1_proc

Unnamed: 0,WarNum,ccode1,SideA,ccode2,SideB,Initiator,Outcome,BatDeath,NonStateDeaths,start_date_prec,end_date_prec,start_date,end_date
0,300,210.0,Netherlands,,,1,1,13.0,,day,day,1816-08-26,1816-08-30
1,300,200.0,United Kingdom,,Algeria,1,1,129.0,6000.0,day,day,1816-08-26,1816-08-30
2,301,640.0,Ottoman Empire,,Saudi Wahhabis,1,1,13500.0,14000.0,month,day,1816-09-01,1818-09-11
3,302,230.0,Spain,,San Martin revolutionaries,0,2,1700.0,1140.0,day,day,1817-01-09,1818-04-05
4,303,230.0,Spain,,New Granada,1,2,3000.0,2000.0,day,day,1817-04-11,1819-08-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,482,290.0,Poland,,,0,5,23.0,,month,day,2004-03-01,2006-03-31
194,482,325.0,Italy,,,0,5,33.0,,month,year,2004-03-01,NaT
195,482,369.0,Ukraine,,,0,5,18.0,,month,month,2004-03-01,2005-10-01
196,482,645.0,Iraq,,,0,5,10800.0,,day,year,2004-06-28,NaT


In [41]:
dfExtraStateWar_dates2 = dfExtraStateWar[['WarNum', 'ccode1', 'SideA', 'ccode2', 'SideB', 'StartMonth2', 'StartDay2', 'StartYear2', 'EndMonth2', 'EndDay2 ','EndYear2', 'Initiator', 'Outcome', 'BatDeath', 'NonStateDeaths']]\
.drop_duplicates().dropna(subset=["StartYear2"])
dfExtraStateWar_dates2_proc = process_dates(dfExtraStateWar_dates2, "StartYear2", "StartMonth2", "StartDay2", "EndYear2", "EndMonth2", "EndDay2 ")

In [42]:
dfExtraStateWar_dates2_proc

Unnamed: 0,WarNum,ccode1,SideA,ccode2,SideB,Initiator,Outcome,BatDeath,NonStateDeaths,start_date_prec,end_date_prec,start_date,end_date
37,334,210.0,Netherlands,,Bali,1,6,300.0,2000.0,day,day,1849-04-02,1849-06-14
79,379,200.0,United Kingdom,,Afghanistan,1,1,10000.0,11000.0,day,day,1879-09-03,1880-09-02
153,454,200.0,United Kingdom,,Palestinians,0,1,126.0,2450.0,month,month,1937-08-01,1939-01-01


In [43]:
dfExtraPar = pd.concat([dfExtraStateWar_dates1_proc, dfExtraStateWar_dates2_proc])

In [44]:
dfExtraPar

Unnamed: 0,WarNum,ccode1,SideA,ccode2,SideB,Initiator,Outcome,BatDeath,NonStateDeaths,start_date_prec,end_date_prec,start_date,end_date
0,300,210.0,Netherlands,,,1,1,13.0,,day,day,1816-08-26,1816-08-30
1,300,200.0,United Kingdom,,Algeria,1,1,129.0,6000.0,day,day,1816-08-26,1816-08-30
2,301,640.0,Ottoman Empire,,Saudi Wahhabis,1,1,13500.0,14000.0,month,day,1816-09-01,1818-09-11
3,302,230.0,Spain,,San Martin revolutionaries,0,2,1700.0,1140.0,day,day,1817-01-09,1818-04-05
4,303,230.0,Spain,,New Granada,1,2,3000.0,2000.0,day,day,1817-04-11,1819-08-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,482,645.0,Iraq,,,0,5,10800.0,,day,year,2004-06-28,NaT
197,482,732.0,Republic of Korea,,,0,5,1.0,,day,year,2004-09-25,NaT
37,334,210.0,Netherlands,,Bali,1,6,300.0,2000.0,day,day,1849-04-02,1849-06-14
79,379,200.0,United Kingdom,,Afghanistan,1,1,10000.0,11000.0,day,day,1879-09-03,1880-09-02
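
To reach the `war_participants` layout, each extrastate row still needs a single `polity` and a `side`. A minimal sketch follows, assuming that exactly one of `ccode1` (state on Side A) and `ccode2` (state on Side B) is populated per row. That assumption should be verified against the data; the third toy row (`WarNum` "999") is invented to show the Side B case.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "WarNum": ["300", "300", "999"],
    "ccode1": [210.0, 200.0, np.nan],
    "ccode2": [np.nan, np.nan, 640.0],
})

# Assumption: a populated ccode1 means the state fought on Side A, otherwise Side B.
toy["side"] = toy["ccode1"].notna().map({True: "A", False: "B"})

# Take ccode1 where present, else ccode2; the nullable Int64 dtype keeps missing values.
toy["polity"] = toy["ccode1"].fillna(toy["ccode2"]).astype("Int64")

print(toy[["WarNum", "polity", "side"]])
```

A row with both codes missing would end up with side "B" and a missing `polity`, so such rows are worth filtering or inspecting first.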


### Nonstate War

In [45]:
dfNonStateWar

Unnamed: 0,WarNum,WarName,WarType,WhereFought,SideA1,SideA2,SideB1,SideB2,SideB3,SideB4,...,EndMonth,EndDay,Initiator,TransFrom,TransTo,Outcome,SideADeaths,SideBDeaths,TotalCombatDeaths,Version
0,1500,First Maori Tribal War,8,9,Te Rauparaha's Ngati Toa,,Taranaki,Ngai Tahu,Waikato,Ngati Ira,...,,,A,,,1,1500.0,6000.0,7500,4
1,1501,Shaka Zulu-Bantu War,8,4,Shaka Zulu,,Bantu,,,,...,9.0,24.0,A,,,1,20000.0,40000.0,60000,4
2,1502,Burma-Assam War,8,7,Burma,,Assam,,,,...,,,A,,,1,,,,4
3,1503,Buenos Aires War,8,1,Buenos Aires,,Provinces,,,,...,2.0,23.0,B,,,2,,,,4
4,1505,Second Maori Tribal War,8,9,Hongi Hika's Nga Phuhi,,Ngati Paoa,Ngati Maru,Waikato River Maori,Te Arawa,...,,,A,,,1,500.0,2000.0,2500,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,1574,Rwandan Social Revolution,8,4,Hutu,,Tutsi,,,,...,7.0,1.0,A,,,1,,,,4
58,1577,Dhofar Rebellion Phase 1,8,6,Dhofar,,Oman,,,,...,10.0,6.0,A,,,6,,,5000,4
59,1581,Angola Guerilla War,8,4,MPLA,,FLNA,UNITA,,,...,10.0,22.0,A,,186,4,,,,4
60,1582,East Timorese War Phase 1,8,7,Fretilin,Apodeti,UDT,,,,...,10.0,15.0,B,,472,4,,,3000,4


In [46]:
dfNonStateWar.columns

Index(['WarNum', 'WarName', 'WarType', 'WhereFought', 'SideA1', 'SideA2',
       'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear',
       'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator',
       'TransFrom', 'TransTo', 'Outcome', 'SideADeaths', 'SideBDeaths',
       'TotalCombatDeaths', 'Version'],
      dtype='object')

In [48]:
dfNonStateWar_dates = dfNonStateWar[['WarNum', 'SideA1', 'SideA2', 'SideB1', 'SideB2', 'SideB3', 'SideB4', 'SideB5', 'StartYear', 'StartMonth', 'StartDay', 'EndYear', 'EndMonth', 'EndDay', 'Initiator', 'Outcome', 'SideADeaths', 'SideBDeaths', 'TotalCombatDeaths',]]\
.drop_duplicates()
dfNonStateWar_dates_proc = process_dates(dfNonStateWar_dates, "StartYear", "StartMonth", "StartDay", "EndYear", "EndMonth", "EndDay")

In [49]:
dfNonStateWar_dates_proc

Unnamed: 0,WarNum,SideA1,SideA2,SideB1,SideB2,SideB3,SideB4,SideB5,Initiator,Outcome,SideADeaths,SideBDeaths,TotalCombatDeaths,start_date_prec,end_date_prec,start_date,end_date
0,1500,Te Rauparaha's Ngati Toa,,Taranaki,Ngai Tahu,Waikato,Ngati Ira,Rangitikei,A,1,1500.0,6000.0,7500,year,year,1818-01-01,1824-01-01
1,1501,Shaka Zulu,,Bantu,,,,,A,1,20000.0,40000.0,60000,year,day,1819-01-01,1828-09-24
2,1502,Burma,,Assam,,,,,A,1,,,,year,year,1819-01-01,1822-01-01
3,1503,Buenos Aires,,Provinces,,,,,B,2,,,,day,day,1820-01-08,1820-02-23
4,1505,Hongi Hika's Nga Phuhi,,Ngati Paoa,Ngati Maru,Waikato River Maori,Te Arawa,,A,1,500.0,2000.0,2500,month,year,1821-09-01,1823-01-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,1574,Hutu,,Tutsi,,,,,A,1,,,,day,day,1959-10-19,1962-07-01
58,1577,Dhofar,,Oman,,,,,A,6,,,5000,month,day,1968-09-01,1971-10-06
59,1581,MPLA,,FLNA,UNITA,,,,A,4,,,,day,day,1974-10-15,1975-10-22
60,1582,Fretilin,Apodeti,UDT,,,,,B,4,,,3000,day,day,1975-08-11,1975-10-15
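
Nonstate participants are recorded as names spread across the `SideA1`..`SideB5` columns rather than as state codes, so they do not map directly onto a `polity` key. For inspection, the wide columns can be reshaped into one row per participant. The sketch below is an assumption about how that might be done, using the last row shown above and only four of the side columns.

```python
import pandas as pd

toy = pd.DataFrame({
    "WarNum": ["1582"],
    "SideA1": ["Fretilin"], "SideA2": ["Apodeti"],
    "SideB1": ["UDT"], "SideB2": [None],
})

# One row per (war, slot); empty slots are dropped.
long_form = (
    toy.melt(id_vars="WarNum", var_name="slot", value_name="participant")
    .dropna(subset=["participant"])
    .reset_index(drop=True)
)

# The fifth character of the slot name is the side letter: "SideA1" -> "A".
long_form["side"] = long_form["slot"].str[4]

print(long_form)
```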
