# Wrangle War Data

### Input Datasets

- `Inter-StateWarData_v4.0.csv`
- `INTRA-STATE WARS v5.1 CSV.csv`
- `Extra-StateWarData_v4.0.csv`
- `Non-StateWarData_v4.0.csv`

### Output Datasets

- `war.csv`
- `war_type.csv`
- `war_locations.csv`
- `war_transitions.csv`

In [1]:
import pandas as pd
import numpy as np

In [2]:
raw_data_path = "../data/raw/"
processed_data_path = "../data/processed/"

In [3]:
dfInterStateWar = pd.read_csv(raw_data_path+"Inter-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9], dtype={"WarNum": str, "TransFrom": str, "TransTo": str})
dfIntraStateWar = pd.read_csv(raw_data_path+"INTRA-STATE WARS v5.1 CSV.csv", encoding='latin-1', na_values=[-7, -8, -9], dtype={"WarNum": str, "Intnl": bool, "TransFrom": str, "TransTo": str})
dfExtraStateWar = pd.read_csv(raw_data_path+"Extra-StateWarData_v4.0.csv", encoding='latin-1', na_values=[-7, -8, -9], dtype={"WarNum": str, "Interven": bool, "TransFrom": str, "TransTo": str})
dfNonStateWar = pd.read_csv(raw_data_path+"Non-StateWarData_v4.0.csv", encoding='utf-8', na_values=[-7, -8, -9], dtype={"WarNum": str, "TransFrom": str, "TransTo": str})
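
The `na_values=[-7, -8, -9]` argument in the reads above turns those three codes into missing values, and the `dtype` mapping keeps `WarNum` as text. A minimal sketch of that behaviour on an in-memory sample (the rows and values here are hypothetical, not taken from the raw files):

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real files have many more columns.
sample_csv = "WarNum,EndYear2,BatDeath\n100,-8,30000\n106,1918,-9\n"

sample = pd.read_csv(
    io.StringIO(sample_csv),
    na_values=[-7, -8, -9],   # same codes the notebook treats as missing
    dtype={"WarNum": str},    # keep war numbers as strings
)

# -8 and -9 are now NaN, so the numeric columns become float64.
print(sample)
print(sample.isna().sum())
```

Note that any legitimate data value equal to -7, -8 or -9 would also be read as missing, so this only suits columns where those numbers cannot occur as real values.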

## Create "war" table

Table creation statement

```
class War(Base):
    __tablename__ = "war"

    # war ids are strings of up to 5 characters (some contain decimals)
    id = Column(String(5), primary_key=True)
    name = Column(Text)
    type_code = Column(Integer)
    type_name = Column(Text)
    subtype_name = Column(Text)
    is_intervention = Column(Boolean)
    is_international = Column(Boolean)
```

In [4]:
dfWars = pd.concat([
    dfInterStateWar[['WarNum', 'WarName', 'WarType']],
    dfIntraStateWar[['WarNum', 'WarName', 'WarType', 'Intnl']],
    dfExtraStateWar[['WarNum', 'WarName', 'WarType', 'Interven']],
    dfNonStateWar[['WarNum', 'WarName', 'WarType']]
]).drop_duplicates(ignore_index=True).rename(columns={'WarNum':'id', 
                                                      'WarName': 'name', 
                                                      'WarType': 'type_code', 
                                                      'Intnl':'is_international', 
                                                      'Interven': 'is_intervention'})

In [5]:
dfWars = dfWars[["id", "name", "type_code", "is_intervention", "is_international"]]

In [6]:
dfWars.dtypes

id                  object
name                object
type_code            int64
is_intervention     object
is_international    object
dtype: object
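
The two flag columns come out as `object` because each one exists in only one of the four source frames, so `pd.concat` fills the remaining rows with `NaN`. If a typed column is preferred, one option is the nullable `"boolean"` dtype. This is a sketch on hypothetical flag values, not something the notebook does:

```python
import pandas as pd

# Hypothetical miniature versions of the per-type frames; flag values are made up.
inter = pd.DataFrame({"id": ["1"]})
intra = pd.DataFrame({"id": ["500"], "is_international": [True]})
extra = pd.DataFrame({"id": ["300"], "is_intervention": [False]})

wars = pd.concat([inter, intra, extra], ignore_index=True)
print(wars.dtypes)  # the flag columns are object: bools mixed with NaN

# Convert to the nullable boolean dtype, which stores missing flags as <NA>.
flag_cols = ["is_intervention", "is_international"]
wars[flag_cols] = wars[flag_cols].astype("boolean")
print(wars.dtypes)
```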

In [7]:
dfWars.id.str.len().max()

5

Note: war ids are stored as strings rather than integers because the most recent dataset (the v5.1 intra-state file) adds "in-between" wars to preserve numerical ordering, which produces ids with decimals such as `502.1`.
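
A small illustration of why the `dtype={"WarNum": str}` argument matters for such ids. The war names below are hypothetical; only the id pattern (`502`, `502.1`, `503`) mirrors the outputs in this notebook:

```python
import io

import pandas as pd

# Hypothetical sample with one "in-between" id.
sample_csv = "WarNum,WarName\n502,War A\n502.1,War B\n503,War C\n"

as_float = pd.read_csv(io.StringIO(sample_csv))
as_str = pd.read_csv(io.StringIO(sample_csv), dtype={"WarNum": str})

# Default inference makes the whole column float, so 502 is stored as 502.0.
print(as_float["WarNum"].tolist())

# Reading as str keeps every id exactly as written in the file.
print(as_str["WarNum"].tolist())
print(as_str["WarNum"].str.len().max())
```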

In [8]:
dfWars.to_csv(processed_data_path+"war.csv", index=False)

## Create "war_type" table

Table creation statement

```
class War_Type(Base):
    __tablename__ = "war_type"

    type_code = Column(Integer, primary_key=True)
    war_type = Column(Text)
    war_subtype = Column(Text)
    type_description = Column(Text)
```

In [9]:
war_types_dict = {"type_code": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                  "war_category": ["Inter-State", "Extra-State", "Extra-State", "Intra-State", "Intra-State", "Intra-State", "Intra-State", "Non-State", "Non-State"],
                  "war_subtype": ["Inter-State", 
                                  "Colonial War", 
                                  "Imperial War", 
                                  "Civil war for central control", 
                                  "Civil war over local issues", 
                                  "Regional internal", 
                                  "Intercommunal", 
                                  "occur in non-state territory", 
                                  "occur across state borders"],
                  "type_description": ["""Inter-state wars are wars that take place between or among states (members of the interstate system)""", 
                                       """A colonial extra-state war occurs if the adversary is a “colony, dependency, or protectorate.” In other words, these “colonial wars” tend to occur when a colony rebels and tries to become independent.""", 
                                       """In imperial wars the system member fights an adversary that is, “an independent political entity that did not qualify for system membership because of serious limitations on its independence, a population insufficiency, or a failure of other states to recognize it as a legitimate member.”""", 
                                       """Civil wars involving the government of the state against a non-state entity; for control of the central government""", 
                                       """Civil wars involving the government of the state against a non-state entity; involving disputes over local issues""", 
                                       """Regional internal wars involve the government of a regional subunit against a non-state entity""", 
                                       """Intercommunal wars involve combat between/among two or more non-state entities within the state""", 
                                       """Wars between or among non-state entities that take place in non-state territory""", 
                                       """Wars between non-state armed groups that take place across state borders"""],
                 }
dfWarTypes = pd.DataFrame(war_types_dict)

In [10]:
dfWarTypes

Unnamed: 0,type_code,war_category,war_subtype,type_description
0,1,Inter-State,Inter-State,Inter-state wars are wars that take place betw...
1,2,Extra-State,Colonial War,A colonial extra-state war occurs if the adver...
2,3,Extra-State,Imperial War,In imperial wars the system member fights an a...
3,4,Intra-State,Civil war for central control,Civil wars involving the government of the sta...
4,5,Intra-State,Civil war over local issues,Civil wars involving the government of the sta...
5,6,Intra-State,Regional internal,Regional internal wars involve the government ...
6,7,Intra-State,Intercommunal,Intercommunal wars involve combat between/amon...
7,8,Non-State,occur in non-state territory,Wars between or among non-state entities that ...
8,9,Non-State,occur across state borders,Wars between non-state armed groups that take ...


In [11]:
dfWarTypes.to_csv(processed_data_path+"war_type.csv", index=False)

## Create "war_locations" table

Table creation statement

```
class War_Locations(Base):
    __tablename__ = "war_locations"

    war = Column(String(5), primary_key=True)
    region = Column(Text, primary_key=True)

    __table_args__ = (ForeignKeyConstraint(["war"], ["war.id"]),)
```

### Interstate war locations

In [12]:
dfInterLoc = dfInterStateWar[["WarNum", "WhereFought"]].drop_duplicates()

In [13]:
dfInterLoc[dfInterLoc.duplicated(["WarNum"])]

Unnamed: 0,WarNum,WhereFought
104,100,2
112,106,11
114,106,15
115,106,7
117,106,6
123,106,14
169,139,19
170,139,15
172,139,14
180,139,16


In [14]:
dfInterLoc.dtypes

WarNum         object
WhereFought     int64
dtype: object

In [15]:
dfInterStateWar[dfInterStateWar.WarNum.isin(["100","106","139"])].head()

Unnamed: 0,WarNum,WarName,WarType,ccode,StateName,Side,StartMonth1,StartDay1,StartYear1,EndMonth1,...,EndMonth2,EndDay2,EndYear2,TransFrom,WhereFought,Initiator,Outcome,TransTo,BatDeath,Version
102,100,First Balkan,1,640,Turkey,2,10,17,1912,4,...,,,,650.0,11,2,2,,30000.0,4
103,100,First Balkan,1,350,Greece,1,10,17,1912,4,...,,,,650.0,11,2,1,,5000.0,4
104,100,First Balkan,1,355,Bulgaria,1,10,17,1912,12,...,4.0,19.0,1913.0,650.0,2,2,1,,32000.0,4
105,100,First Balkan,1,345,Yugoslavia,1,10,17,1912,12,...,4.0,19.0,1913.0,650.0,11,1,1,,15000.0,4
111,106,World War I,1,345,Yugoslavia,1,7,29,1914,11,...,,,,,2,2,1,,70000.0,4


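Each of these wars has participant rows with different `WhereFought` codes. To keep one row per war, assign each war the broadest code that appears among its rows:

- War 100: code 11
- War 106: code 15
- War 139: code 19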

In [16]:
dfInterLoc.loc[dfInterLoc['WarNum'] == "100", 'WhereFought'] = 11
dfInterLoc.loc[dfInterLoc['WarNum'] == "106", 'WhereFought'] = 15
dfInterLoc.loc[dfInterLoc['WarNum'] == "139", 'WhereFought'] = 19
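
The manual assignments above can be cross-checked with a small helper that looks for the code whose regions cover those of every other code reported for a war. This is only a sketch: `covering_code` is a hypothetical helper, the region strings are copied from `region_map_values_interstate`, and for war 139 only the codes visible in the duplicate listing are used.

```python
# Subset of the interstate region map used in this notebook.
region_map = {
    2: "Europe",
    6: "Middle East",
    7: "Asia",
    11: "Europe,Middle East",
    14: "Europe,Africa,Middle East",
    15: "Europe,Africa,Middle East,Asia",
    16: "Africa,Middle East,Asia,Oceania",
    19: "Europe,Africa,Middle East,Asia,Oceania",
}


def covering_code(codes, region_map):
    """Return the code whose region set equals the union of all codes' regions, else None."""
    region_sets = {code: set(region_map[code].split(",")) for code in codes}
    all_regions = set().union(*region_sets.values())
    for code, regions in region_sets.items():
        if regions == all_regions:
            return code
    return None


# Codes read off the outputs above (war 139: duplicated codes only).
reported = {
    "100": [11, 2],
    "106": [2, 11, 15, 7, 6, 14],
    "139": [19, 15, 14, 16],
}
chosen = {war: covering_code(codes, region_map) for war, codes in reported.items()}
print(chosen)
```

If no single reported code covers all regions, the helper returns `None`, which would signal that a manual decision is still needed.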

In [17]:
dfInterLoc = dfInterLoc.drop_duplicates().rename(columns={"WarNum":"war","WhereFought":"region"})

In [18]:
dfInterLoc

Unnamed: 0,war,region
0,1,2
2,4,11
4,7,1
6,10,2
10,13,2
...,...,...
315,219,4
317,221,2
325,223,7
327,225,7


In [19]:
region_map_values_interstate = {1: 'W. Hemisphere', 2: 'Europe', 4: 'Africa', 6: 'Middle East', 7: 'Asia', 9: 'Oceania', 11: 'Europe,Middle East', 12: 'Europe,Asia', 13: 'W. Hemisphere,Asia', 14: 'Europe,Africa,Middle East', 15: 'Europe,Africa,Middle East,Asia', 16: 'Africa,Middle East,Asia,Oceania', 17: 'Asia,Oceania', 18: 'Africa,Middle East', 19: 'Europe,Africa,Middle East,Asia,Oceania'}

dfInterLoc['region'] = dfInterLoc['region'].replace(region_map_values_interstate).str.split(',')
dfInterLoc = dfInterLoc.explode('region').drop_duplicates().reset_index(drop=True)
dfInterLoc

Unnamed: 0,war,region
0,1,Europe
1,4,Europe
2,4,Middle East
3,7,W. Hemisphere
4,10,Europe
...,...,...
103,219,Africa
104,221,Europe
105,223,Asia
106,225,Asia


In [20]:
dfInterLoc.region.value_counts()

Europe           30
Asia             29
Middle East      24
W. Hemisphere    16
Africa            8
Oceania           1
Name: region, dtype: int64
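
The map, split and explode step is easier to follow on a toy frame. The sketch below uses two rows whose codes match the first rows of `dfInterLoc`; it uses `Series.map` rather than `replace`, which is a substitution made here for brevity and assumes every code is present in the mapping.

```python
import pandas as pd

# Two codes from the interstate region map: one single-region, one multi-region.
code_to_regions = {2: "Europe", 11: "Europe,Middle East"}

toy = pd.DataFrame({"war": ["1", "4"], "region": [2, 11]})

# Code -> comma-separated names -> list of names.
toy["region"] = toy["region"].map(code_to_regions).str.split(",")

# One output row per (war, region) pair.
toy = toy.explode("region").reset_index(drop=True)
print(toy)
```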

### Intrastate war locations

In [21]:
dfIntraLoc_pre = dfIntraStateWar[dfIntraStateWar.V5RegionNum == 6].iloc[:, 0:7].rename(columns={"WarNum":"war"})

In [22]:
dfIntraLoc_pre["region_asiaoceania"] = "Asia"
dfIntraLoc_pre.loc[dfIntraLoc_pre['CcodeA'] >= 900, 'region_asiaoceania'] = "Oceania"
dfIntraLoc_pre

Unnamed: 0,war,WarName,V5RegionNum,WarType,CcodeA,SideA,SideB,region_asiaoceania
73,567,Taiping Rebellion phase 2 of 1860-1866,6,4,710.0,China,Taipings,Asia
74,568,Second Nien Revolt of 1860-1868,6,5,710.0,China,Nien Society,Asia
75,570,Miao Rebellion phase 2 of 1860-1872,6,5,710.0,China,Miao,Asia
76,571,Panthay Rebellion phase 2 of 1860-1874,6,5,710.0,China,Hui Rebels,Asia
80,576,Tungan Rebellion of 1862-1873,6,5,710.0,China,Shaanxi and Gansu Muslims,Asia
...,...,...,...,...,...,...,...,...
397,936,Second Philippine - NPA War of 2005-2006,6,4,840.0,Philippines,NPA,Asia
399,940,Third Sri Lanka Tamil War of 2006-2009,6,5,780.0,Sri Lanka,LTTE,Asia
402,942,Second Waziristan War of 2007-present,6,5,770.0,Pakistan,Taliban,Asia
408,980,Kachin Rebellion of 2011-2013,6,5,775.0,Myanmar,KIA,Asia


In [23]:
dfIntraLoc_mid = dfIntraStateWar[["WarNum","V5RegionNum"]].rename(columns={"WarNum":"war", "V5RegionNum":"region"})


In [24]:
dfIntraLoc = dfIntraLoc_mid.merge(dfIntraLoc_pre[["war","region_asiaoceania"]], how="left", on="war")

In [25]:
region_map_values_intrastate = {1: 'W. Hemisphere', 2: 'W. Hemisphere', 3: 'Europe', 4: 'Africa', 5: 'Middle East', 6: np.nan}
dfIntraLoc['region'] = dfIntraLoc['region'].replace(region_map_values_intrastate)
dfIntraLoc['region'] = dfIntraLoc['region'].fillna(dfIntraLoc['region_asiaoceania'])
dfIntraLoc

Unnamed: 0,war,region,region_asiaoceania
0,500,Europe,
1,502,Europe,
2,502.1,Europe,
3,503,Europe,
4,504,Europe,
...,...,...,...
415,992,Middle East,
416,992.5,Africa,
417,993,Europe,
418,994,Middle East,


In [26]:
dfIntraLoc.region.value_counts()

Asia             102
W. Hemisphere    100
Middle East       81
Europe            73
Africa            63
Oceania            1
Name: region, dtype: int64
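
The same region assignment can be sketched without the intermediate merge, which avoids multiplying rows when a war appears more than once in either frame. The rows below are hypothetical; the rule (region code 6 becomes Oceania when `CcodeA >= 900`, otherwise Asia) is the one used above.

```python
import pandas as pd

# Codes 1-5 map directly; code 6 (Asia and Oceania) is resolved by country code.
region_map = {1: "W. Hemisphere", 2: "W. Hemisphere", 3: "Europe", 4: "Africa", 5: "Middle East"}

# Hypothetical rows: war ids and country codes are made up for illustration.
toy = pd.DataFrame({
    "war": ["A", "B", "C"],
    "V5RegionNum": [3, 6, 6],
    "CcodeA": [200.0, 710.0, 900.0],
})

# Unmapped code 6 becomes NaN here ...
region = toy["V5RegionNum"].map(region_map)

# ... and is filled from the country-code rule, row by row.
asia_or_oceania = toy["CcodeA"].ge(900).map({True: "Oceania", False: "Asia"})
toy["region"] = region.fillna(asia_or_oceania)
print(toy[["war", "region"]])
```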

### Extrastate war locations

In [27]:
dfExtraLoc = dfExtraStateWar[["WarNum", "WhereFought"]].drop_duplicates().rename(columns={"WarNum":"war", "WhereFought": "region"})

In [28]:
region_map_values_extrastate = {1: 'W. Hemisphere', 2: 'Europe', 4: 'Africa', 6: 'Middle East', 7: 'Asia', 9: 'Oceania'}
dfExtraLoc['region'] = dfExtraLoc['region'].replace(region_map_values_extrastate)

dfExtraLoc

Unnamed: 0,war,region
0,300,Middle East
2,301,Middle East
3,302,W. Hemisphere
4,303,W. Hemisphere
5,304,W. Hemisphere
...,...,...
178,477,Middle East
179,479,Middle East
180,480,Middle East
181,481,Asia


### Nonstate war locations

In [29]:
dfNonLoc = dfNonStateWar[["WarNum", "WhereFought"]].drop_duplicates().rename(columns={"WarNum":"war", "WhereFought": "region"})

In [30]:
region_map_values_nonstate = {1: 'W. Hemisphere', 2: 'Europe', 4: 'Africa', 6: 'Middle East', 7: 'Asia', 9: 'Oceania'}
dfNonLoc['region'] = dfNonLoc['region'].replace(region_map_values_nonstate)

dfNonLoc

Unnamed: 0,war,region
0,1500,Oceania
1,1501,Africa
2,1502,Asia
3,1503,W. Hemisphere
4,1505,Oceania
...,...,...
57,1574,Africa
58,1577,Middle East
59,1581,Africa
60,1582,Asia


### Combine all for final location table

In [31]:
dfWarLoc = pd.concat([dfInterLoc, dfIntraLoc[["war", "region"]], dfExtraLoc, dfNonLoc])
dfWarLoc

Unnamed: 0,war,region
0,1,Europe
1,4,Europe
2,4,Middle East
3,7,W. Hemisphere
4,10,Europe
...,...,...
57,1574,Africa
58,1577,Middle East
59,1581,Africa
60,1582,Asia
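
Because `war_locations` declares `(war, region)` as its primary key, it may be worth checking the combined frame for repeated pairs before writing it out. A sketch on hypothetical rows, where the duplicate is deliberately planted:

```python
import pandas as pd

# Hypothetical per-type location frames; the repeated ("500", "Europe") row is intentional.
inter = pd.DataFrame({"war": ["4", "4"], "region": ["Europe", "Middle East"]})
intra = pd.DataFrame({"war": ["500", "500"], "region": ["Europe", "Europe"]})

combined = pd.concat([inter, intra], ignore_index=True)

# Rows that would violate a (war, region) primary key.
dupes = combined[combined.duplicated(["war", "region"], keep=False)]
print(dupes)

# Keep the first occurrence of each pair.
deduped = combined.drop_duplicates(["war", "region"]).reset_index(drop=True)
print(deduped)
```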


In [32]:
dfWarLoc.to_csv(processed_data_path+"war_locations.csv", index=False)

## Create "war_transitions" table

Table creation statement

```
class War_Transitions(Base):
    __tablename__ = "war_transitions"

    from_war = Column(String(5), primary_key=True)
    to_war = Column(String(5), primary_key=True)

    __table_args__ = (
        ForeignKeyConstraint(["from_war"], ["war.id"]),
        ForeignKeyConstraint(["to_war"], ["war.id"]),
    )
```

In [33]:
dfInterWarTrans1 = dfInterStateWar[["WarNum", "TransFrom"]].rename(columns={"WarNum": "to_war", "TransFrom": "from_war"})
dfInterWarTrans2 = dfInterStateWar[["WarNum", "TransTo"]].rename(columns={"WarNum": "from_war", "TransTo": "to_war"})
dfInterWarTrans = pd.concat([dfInterWarTrans1, dfInterWarTrans2]).dropna()
dfInterWarTrans.head()

Unnamed: 0,to_war,from_war
0,1,503
1,1,503
2,4,506
3,4,506
7,10,551
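
The transition cells all follow one pattern: `TransFrom` gives an edge into the current war, `TransTo` gives an edge out of it, and rows without a value are dropped. A sketch of that pattern as a single function, where `transition_pairs` is a hypothetical helper and the sample rows are made up apart from the 503 to 1 transition shown above:

```python
import pandas as pd


def transition_pairs(df):
    """Build (from_war, to_war) pairs from WarNum, TransFrom and TransTo columns."""
    incoming = df[["TransFrom", "WarNum"]].rename(
        columns={"TransFrom": "from_war", "WarNum": "to_war"}
    )
    outgoing = df[["WarNum", "TransTo"]].rename(
        columns={"WarNum": "from_war", "TransTo": "to_war"}
    )
    # Drop wars with no transition, then collapse repeated participant rows.
    return (
        pd.concat([incoming, outgoing])
        .dropna()
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Hypothetical sample: war 1 has two participant rows, war 2 transitions into war 9.
sample = pd.DataFrame({
    "WarNum": ["1", "1", "2"],
    "TransFrom": ["503", "503", None],
    "TransTo": [None, None, "9"],
})
pairs = transition_pairs(sample)
print(pairs)
```

Selecting the columns in `from_war`, `to_war` order in both halves keeps the output column order the same as the table definition.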


In [34]:
dfIntraWarTrans1 = dfIntraStateWar[["WarNum", "TransFrom"]].rename(columns={"WarNum": "to_war", "TransFrom": "from_war"})
dfIntraWarTrans2 = dfIntraStateWar[["WarNum", "TransTo"]].rename(columns={"WarNum": "from_war", "TransTo": "to_war"})
dfIntraWarTrans = pd.concat([dfIntraWarTrans1, dfIntraWarTrans2]).dropna()
dfIntraWarTrans.head()

Unnamed: 0,to_war,from_war
37,538,1527
52,553,545
73,567,1534
75,570,1538
76,571,1541


In [35]:
dfExtraWarTrans1 = dfExtraStateWar[["WarNum", "TransFrom"]].rename(columns={"WarNum": "to_war", "TransFrom": "from_war"})
dfExtraWarTrans2 = dfExtraStateWar[["WarNum", "TransTo"]].rename(columns={"WarNum": "from_war", "TransTo": "to_war"})
dfExtraWarTrans = pd.concat([dfExtraWarTrans1, dfExtraWarTrans2]).dropna()
dfExtraWarTrans.head()

Unnamed: 0,to_war,from_war
74,373,601
160,461,1571
169,472,1582
174,475,189
175,475,189


In [36]:
dfNonWarTrans1 = dfNonStateWar[["WarNum", "TransFrom"]].rename(columns={"WarNum": "to_war", "TransFrom": "from_war"})
dfNonWarTrans2 = dfNonStateWar[["WarNum", "TransTo"]].rename(columns={"WarNum": "from_war", "TransTo": "to_war"})
dfNonWarTrans = pd.concat([dfNonWarTrans1, dfNonWarTrans2]).dropna()
dfNonWarTrans.head()

Unnamed: 0,to_war,from_war
21,538,1527
26,567,1534
29,570,1537
33,571,1541
54,461,1571


In [37]:
dfWarTrans = pd.concat([dfInterWarTrans, dfIntraWarTrans, dfExtraWarTrans, dfNonWarTrans]).drop_duplicates()

In [38]:
dfWarTrans

Unnamed: 0,to_war,from_war
0,1,503
2,4,506
7,10,551
16,19,327
32,37,352
...,...,...
105,79,404
110,79,410
167,1581,469
174,857,475


In [39]:
dfWarTrans.to_csv(processed_data_path+"war_transitions.csv", index=False)