In [8]:
import pandas as pd
import numpy as np


pd.set_option('display.max_colwidth', None)  # Show full content in each cell
pd.set_option('display.max_rows', None)      # Show all rows
pd.set_option('display.max_columns', None)   # Show all columns


sourceDf = pd.read_csv("./data/Europe-Central-Asia_2018-2024_Sep27.csv")
print(sourceDf.columns)

Index(['event_id_cnty', 'event_date', 'year', 'time_precision',
       'disorder_type', 'event_type', 'sub_event_type', 'actor1',
       'assoc_actor_1', 'inter1', 'actor2', 'assoc_actor_2', 'inter2',
       'interaction', 'civilian_targeting', 'iso', 'region', 'country',
       'admin1', 'admin2', 'admin3', 'location', 'latitude', 'longitude',
       'geo_precision', 'source', 'source_scale', 'notes', 'fatalities',
       'tags', 'timestamp'],
      dtype='object')
1.26.2


**Data Quality Check**
- all columns are populated

In [10]:

import reverse_geocoder as rg
import pycountry
from functools import cache

#trim leading and trailing spaces
sourceDf["notes"] = sourceDf["notes"].str.strip()


# report any column with missing (NaN or empty-string) values
# the comparison is parenthesized because | binds tighter than ==, and .sum() counts the True rows
for column in sourceDf.columns:
    emptyCount = (sourceDf[column].isnull() | (sourceDf[column] == '')).sum()
    if emptyCount > 0:
        print(f"{column} empty count: {emptyCount}")



def check_location_mismatch():

    locations = sourceDf[["country","latitude", "longitude"]]

    @cache
    def get_country_name(country_code):
        country = pycountry.countries.get(alpha_2=country_code)
        return  country.name if country else "Unknown"



    # compare with reverse_geocoder to see any coordinate to country mismatches
    # most mismatch happens on border towns and our dataset is correct (after eyeballing on google map)

    for row in locations.itertuples():
        given_country = row.country
        if given_country not in ["Ukraine", "Russia"]:
            continue
        coordinates = (row.latitude, row.longitude)  
        location = rg.search(coordinates) 
        country_code = location[0]['cc']
        computed_country_name =  get_country_name(country_code)

        # Russia == Russian Federation
        computed_country_name = "Russia" if computed_country_name == "Russian Federation" else computed_country_name
        if given_country != computed_country_name:
            print(f"Record country: {given_country}; Computed country: {computed_country_name}; Coordinate: {coordinates}")


# check_location_mismatch()


In [3]:
unique_countries = sourceDf['country'].unique()
unique_countries = np.sort(unique_countries)
print(unique_countries)

['Albania' 'Andorra' 'Armenia' 'Austria' 'Azerbaijan'
 'Bailiwick of Guernsey' 'Bailiwick of Jersey' 'Belarus' 'Belgium'
 'Bosnia and Herzegovina' 'Bulgaria' 'Croatia' 'Cyprus' 'Czech Republic'
 'Denmark' 'Estonia' 'Faroe Islands' 'Finland' 'France' 'Georgia'
 'Germany' 'Gibraltar' 'Greece' 'Greenland' 'Hungary' 'Iceland' 'Ireland'
 'Isle of Man' 'Italy' 'Kazakhstan' 'Kosovo' 'Kyrgyzstan' 'Latvia'
 'Liechtenstein' 'Lithuania' 'Luxembourg' 'Malta' 'Moldova' 'Monaco'
 'Montenegro' 'Netherlands' 'North Macedonia' 'Norway' 'Poland' 'Portugal'
 'Romania' 'Russia' 'San Marino' 'Serbia' 'Slovakia' 'Slovenia' 'Spain'
 'Sweden' 'Switzerland' 'Tajikistan' 'Turkmenistan' 'Ukraine'
 'United Kingdom' 'Uzbekistan' 'Vatican City']


In [4]:


unique_sub_event = sourceDf['sub_event_type'].unique()
unique_sub_event = np.sort(unique_sub_event)
print(unique_sub_event)

non_war_related_event  = [
    'Agreement',
    'Arrests',
    'Mob violence',
    'Excessive force against protesters',
    'Peaceful protest',
    'Protest with intervention',
    'Sexual violence',
    'Violent demonstration',
    'Looting/property destruction'
]

war_related_event =  [event for event in unique_sub_event if event not in non_war_related_event ]

['Abduction/forced disappearance' 'Agreement' 'Air/drone strike'
 'Armed clash' 'Arrests' 'Attack' 'Change to group/activity'
 'Disrupted weapons use' 'Excessive force against protesters'
 'Government regains territory' 'Grenade'
 'Headquarters or base established' 'Looting/property destruction'
 'Mob violence' 'Non-state actor overtakes territory'
 'Non-violent transfer of territory' 'Other' 'Peaceful protest'
 'Protest with intervention' 'Remote explosive/landmine/IED'
 'Sexual violence' 'Shelling/artillery/missile attack' 'Suicide bomb'
 'Violent demonstration']


**Scope Investigation**

Countries other than Ukraine and Russia are mostly not involved in direct conflict.

Reading through the notes that mention Ukraine/Russia suggests a few ways a third country can be involved in the war:

- Russia moving/deploying/firing weapons in Belarus
- A third country shipping supplies to Russia/Ukraine
- A Russian missile/drone crossing into or falling in Belarus/Moldova/Romania (this might be less significant to the war intensity)


Overall, we may revisit these records later.


In [5]:
# Do records from countries other than Ukraine and Russia contain events directly related to the Ukraine war?

immediately_related_countries = ["Ukraine", "Russia"]
other_country_records = sourceDf[~sourceDf["country"].isin(immediately_related_countries)]

#filter for notes containing related keywords
related_keywords = "|".join(["Ukraine", "Russia", "Ukrainian"])
other_country_related_notes = other_country_records[other_country_records["notes"].str.contains(related_keywords, case=False, na=False)]

#filter out notes containing irrelevant keywords
unrelated_keywords = "|".join([ "Azerbaijan", "Armenia", "Israel", "Kazakh", "Displacement", "Security measures", "Non-violent activity"])
other_country_related_notes = other_country_related_notes[~other_country_related_notes["notes"].str.contains(unrelated_keywords, case=False, na=False)]

other_country_related_notes = other_country_related_notes[other_country_related_notes["sub_event_type"].isin(war_related_event)]
notes = other_country_related_notes["notes"].tolist()
notes.sort()
print("\n".join(notes))

Around 1 February 2024 (week of), police physically assaulted a male citizen of Ukraine at the police detention facility in Talgar township of Almaty region after police arrest for illegal cultivation of marijuana at his residence in the village of Besagash.
Around 1 January 2024 (month of), Belarusian state security officers beat up 12 detainees in Drogichinskiy district (coded to Drahichyn (Drogichinskiy, Belarus)). The detainees were accused of collaborating with Ukrainian security services.
Around 1 May 2023 (beginning of month), Russian journalist and activists felt sick after her stay in a hotel in Prague. Further investigation revealed poisoning by a nerve agent, and linked it with the Russian government.
Around 13 April 2023 (week of), a 40-year-old Uzbekistan-native male citizen of Russia was detained by police and allegedly tortured while in custody at the city police department detention facility near Yashnobod district of Tashkent city in connection with a victim's alleged 

**Dataset Basic Features**

In [44]:
ukraine_russia_events = sourceDf[(sourceDf["country"] == "Ukraine") | (sourceDf["country"] == "Russia")]
ukraine_war_events = ukraine_russia_events[ukraine_russia_events["sub_event_type"].isin(war_related_event)]
print("total relevant event count:", len(ukraine_russia_events))  # len() counts rows; .size would count rows x 31 columns
print("total war related event count:", len(ukraine_war_events))
print("total war fatalities:", ukraine_war_events["fatalities"].sum())
print("min date:", ukraine_russia_events["event_date"].min())
print("max date:", ukraine_russia_events["event_date"].max())

total relevant event count: 209058
total war related event count: 192870
total war fatalities: 122124
min date: 2018-01-01
max date: 2024-09-27


**Remaining Work**
- Associate each event with type and amount of weapon used by extracting keywords from notes
- Visualize event/weapon use frequency over time
- Compute a feature vector for each day, where each element summarizes (e.g., averages) how often a certain event happens within ±k days (e.g., the average number of missiles fired over the 2 days before and after the given day)
- Perform KNN on such vectors and see if we can separate out intensity levels?
- Data enrichment with economic/weather/foreign aid data
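The ±k-day window feature from the list above could be sketched roughly as follows. This is only a minimal illustration on a toy frame, not the real dataset: `daily_window_features` and `k` are hypothetical names, and it assumes `event_date` parses as a date and that a plain mean over the window is an acceptable summary.

```python
import pandas as pd

def daily_window_features(events: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Mean daily count of each sub_event_type over the window [day - k, day + k]."""
    dates = pd.to_datetime(events["event_date"])
    # One row per day, one column per sub-event type, values are event counts.
    daily = pd.crosstab(dates.dt.normalize(), events["sub_event_type"])
    # Add the days without any event so the window always spans calendar days.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range, fill_value=0)
    # Centered window of 2k + 1 days; min_periods=1 keeps the edge days defined.
    return daily.rolling(window=2 * k + 1, center=True, min_periods=1).mean()

# Toy records (made up for illustration only).
toy = pd.DataFrame({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-03", "2024-01-05"],
    "sub_event_type": ["Armed clash", "Air/drone strike", "Armed clash", "Armed clash"],
})
features = daily_window_features(toy, k=1)
```

Each row of `features` is then one day vector; whether a mean, a sum, or something else works best for separating intensity levels is still open.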

In [16]:

count_by_event_type = ukraine_war_events.groupby("sub_event_type").size()
print(count_by_event_type)

sub_event_type
Abduction/forced disappearance            440
Air/drone strike                        25235
Armed clash                             49467
Attack                                   1622
Change to group/activity                  522
Disrupted weapons use                    9187
Government regains territory              350
Grenade                                    91
Headquarters or base established           14
Non-state actor overtakes territory       484
Non-violent transfer of territory         146
Other                                     579
Remote explosive/landmine/IED            1846
Shelling/artillery/missile attack      102879
Suicide bomb                                8
dtype: int64


**Weapon Type Mining**

- Separate missile strikes from artillery
- Distinguish long-range missiles from anti-tank missiles
- Look for missile models (they are usually within 3 words of the keyword "missile")
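A rough sketch of the "within 3 words of the keyword" idea, using only the standard `re` module. The helper names and the "capitalised and contains a digit or hyphen" heuristic are assumptions made for illustration; they have not been validated against the full notes column.

```python
import re

# Up to three whitespace-separated tokens on either side of "missile"/"missiles".
WINDOW = re.compile(r"((?:\S+\s+){0,3})missiles?\b((?:\s+\S+){0,3})", re.IGNORECASE)

def words_near_missile(note: str) -> list[str]:
    """Return the tokens within three words of each 'missile' mention."""
    nearby = []
    for before, after in WINDOW.findall(note):
        nearby.extend(before.split() + after.split())
    # Strip surrounding quotes and punctuation from each token.
    return [w.strip("'\",.()") for w in nearby]

def candidate_models(note: str) -> list[str]:
    """Heuristic (assumption): model names are capitalised and contain a digit or hyphen."""
    return [w for w in words_near_missile(note)
            if w[:1].isupper() and (any(c.isdigit() for c in w) or "-" in w)]

note = ("On 27 September 2024, Russian forces launched ''Iskander-M' "
        "ballistic missiles at Dnipro, Dnipropetrovsk. Casualties unknown.")
models = candidate_models(note)
```

Note that punctuation directly after the keyword (e.g. "missile.") ends the right-hand window in this sketch, so only the preceding words are inspected in that case.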

In [23]:
shelling_missile_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Shelling/artillery/missile attack"]["notes"]

notes_list = shelling_missile_notes.to_list()
notes_list = [x for x in notes_list if "missile" in x][:20]
print("\n\n".join(notes_list))

On 27 September 2024, Russian forces launched ''Iskander-M' ballistic missiles at Dnipro, Dnipropetrovsk. Casualties unknown.

On 27 September 2024, Russian forces launched ''Iskander-M' ballistic missiles at Zaporizhia, Zaporizhia. Casualties unknown.

Around 26 September 2024 (as reported), according to the Russian sources, Ukrainian forces using a anti-tank guided missiles (likely shoulder fired) struck other Ukrainian forces, that were attempting to surrender by flying the white flag on the APC near the village of Veseloe (Glushkovskiy, Kursk). The APC was destroyed. Casualties unknown.

On 26 September 2024, Russian Forces struck Dnipro, Dnipropetrovsk, with a ballistic missile. A fire broke out as a result. Casualties unknown.

On 24 September 2024, Ukrainian military carried out (likely) missile strikes on the city of Belgorod (Belgorod). In addition to the direct impact, Russian military intercepted some of the missiles causing the debris to fall. 6 civilians were injured, 1 ap

In [29]:
air_drone_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Air/drone strike"]["notes"]

notes_list = air_drone_notes.to_list()
notes_list = [x for x in notes_list if "drone" not in x][:20]
print("\n\n".join(notes_list))

On 27 September 2024, Russian military carried out air strikes, shelled with artillery and (likely) mortars concentration of Ukrainian military personnel and equipment in the area around Guevo (Sudzhanskiy, Kursk). According to the Russian MoD, Russian troops defeated Ukrainian troops from mechanized, tank, marine, airborne assault, and territorial defense brigades of the Armed Forces of Ukraine, and that during 27 September Ukrainian military's losses amounted to more than 330 servicemen, as well as armored vehicles and military equipment including in Guevo, Kolmakov, Kazach'ya-Loknya, Malaya Loknya, Martynovka, Mikhaylovka, Novaya Sorochina, Orlovka, Pravda, Tolstyi Lug, Cherkasskoe Porechnoe, Russkoe Porechnoe, Borki [Russian MoD reported 330 fatalities. Coded as 10 fatalities split across 13 events. 1 fatality coded to this event].

On 27 September 2024, Russian military carried out air strikes, shelled with artillery and (likely) mortars concentration of Ukrainian military personn

**Mining for Human/Equipment Losses in Land Attack**
- some reports mention the amount of losses of soldiers and each type of equipment
- some reports mention the type of support (artillery / aviation) involved in the clash
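One possible way to pull equipment counts out of a note is a small keyword vocabulary plus a leading number. The sketch below is an assumption-heavy starting point: the `EQUIPMENT` vocabulary is hypothetical and incomplete, and the function counts every mention regardless of which side lost the equipment or whether it was destroyed, damaged or merely used.

```python
import re
from collections import Counter

# Hypothetical, incomplete vocabulary: keyword pattern -> normalised label.
EQUIPMENT = {
    r"tanks?": "tank",
    r"armou?red vehicles?": "armoured vehicle",
    r"infantry fighting vehicles?": "infantry fighting vehicle",
    r"MLRS": "MLRS",
    r"UAVs?": "UAV",
}

def extract_equipment_counts(note: str) -> Counter:
    """Sum '<number> <equipment>' mentions per label; 'a'/'an' counts as 1."""
    counts = Counter()
    for pattern, label in EQUIPMENT.items():
        regex = rf"\b(\d+|an?)\s+(?:{pattern})\b"
        for match in re.finditer(regex, note, re.IGNORECASE):
            raw = match.group(1).lower()
            counts[label] += 1 if raw in ("a", "an") else int(raw)
    return counts

note = ("Ukrainian Forces destroyed an armoured vehicle, 3 MLRS, 51 UAVs "
        "and 2 vehicle of Russian Forces in the area of Vovchansk.")
losses = extract_equipment_counts(note)
```

Generic mentions such as "2 vehicle" are ignored here because they are not in the vocabulary; extending the vocabulary is the obvious next step.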


In [32]:


armed_clash_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Armed clash"]["notes"]

notes_list = armed_clash_notes.to_list()
notes_list = [x for x in notes_list if "armour" in x][:20]
print("\n\n".join(notes_list))

On 20 September 2024, Russian Forces clashed with Ukrainian Forces near Vovchansk, Kharkiv. 64 Russian soldiers were killed and injured in the area of Vovchansk (coded as 32 fatalities). Ukrainian Forces destroyed an armoured vehicle, 3 MLRS, 51 UAVs and 2 vehicle of Russian Forces in the area of Vovchansk. According to Russian sources, 135 Ukrainian soldiers were killed in the area of Vovchansk, Lyptsi, Vovchanski Khutory and Kharkiv. [Russian MoD reported 135 fatalities. Coded as 10 fatalities split across 4 events. 3 fatalities coded to this event]. Overall fatalities coded as 35.

On 12 September 2024, Russian Forces launched assault actions on Ukrainian positions near Vozdvyzhenka, Donetsk. 377 Russian soldiers were killed and injured in the area of Vozdvyzhenka, Novooleksandrivka, Zelene Pole, Novotroitske, Novohrodivka, Hrodivka, Selydove and Mykhailivka (coded as 188 fatalities split across 8 events. 24 fatalities coded to this event). Ukrainian Forces destroyed 1 tank, 2 armou

In [33]:
#attack events are mostly military brutality against civilians, so let's not consider them

attack_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Attack"]["notes"]

notes_list = attack_notes.to_list()[:100]

print("\n\n".join(notes_list))

Around 27 September 2024 (as reported), according to the Russian sources, Ukrainian soldiers beat up and hanged a civilian in Cherkasskoe Porechnoe (Sudzhanskiy, Kursk). The man died.

Around 24 September 2024 (as reported), a former PMC Wagner mercenary killed another former PMC Wagner mercenary at an unidentified location in the Berezovskiy district (coded as Beryozovo (Berezovskiy, Khanty-Mansi)). According to the attacker, he killed the man with an axe because that person was participating in the 'reset' (extrajudicial executions) of the deserters, drug addicts and delinquent fighters in the ranks of Wagner, during his time serving in the PMC.

On 22 September 2024, a military academy cadet beat up a woman, whom he mistook for a Ukrainian spy, to death in St. Petersburg (Admiralteyskiy, Saint Petersburg). To him the woman introduced herself as a foreigner, and after the man heard the sound of her hearing aid, he decided that she was a 'Ukrainian spy'. The men previously served in t

In [34]:


#mostly interceptions of missiles and drones, very important information

disrupted_weapon_use_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Disrupted weapons use"]["notes"]

notes_list = disrupted_weapon_use_notes.to_list()[:100]

print("\n\n".join(notes_list))

Interception: On 27 September 2024, at night, Russian military shot down 6 Ukrainian drones over the Rostov Region (coded as Rostov-on-Don, (Rostov-na-Donu, Rostov)). No casualties.

Interception: On 27 September 2024, Russian military shot down 4 Ukrainian drones over the Belgorod Region (coded as Belgorod). No casualties.

Interception: On 26 September 2024, Russian military shot down 6 Ukrainian drones over the Belgorod Region (coded as Belgorod). No casualties.

Interception: On 26 September 2024, Russian military shot down 2 Ukrainian drones over the Kursk Region (coded as Kursk). No casualties.

Interception: On 26 September 2024, Russian military shot down 2 Ukrainian drones over the Bryansk Region (coded as Bryansk). No casualties.

Interception: On 26 September 2024, Russian military shot down 1 drone over the Oryol Region (coded as Oryol). No casualties.

Interception: On 26 September 2024, Russian Forces launched 78 Shahed-136/131 drones against targets in Ukraine. Ukrainian

In [36]:

# mostly civilians running into mines, with rare cases where military equipment gets blown up by landmines
# we could harvest equipment losses from this but might not get much

remote_explosive_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Remote explosive/landmine/IED"]["notes"]

notes_list = remote_explosive_notes.to_list()[100:500]

print("\n\n".join(notes_list))

On 10 April 2024, a child was injured as the result of the detonation of an explosive device in Novoraisk, Kherson.

On 9 April 2024, a teenager was injured by an explosion of an unknown explosive device in occupied Bezimenne, Donetsk.

On 5 April 2024, a tractor accidentally ran over a landmine during agricultural works in Rivne region (coded to Rivne, Riven region). The vehicle was damaged, 3 civilians were injured.

On 5 April 2024, a tractor accidentally ran over a landmine during agricultural works in Zelenodolsk territorial community, Dnipropetrovsk. There were no casualties.

On 4 April 2024, a previously unexploded Ukrainian cluster munition detonated in Horlivka, Donetsk and injured a civilian.

On 2 April 2024, a civilian was injured in Kryvorizkyi district (coded to Kryvyi Rih, Dnipropetrovsk region), when he accidentally activated a munition left after Russian shelling in his backyard.

On 2 April 2024, a civilian accidentally activated an explosive near Liubomyrivka, Mykol

In [37]:

#very few interesting equipment loss reports
gov_regain_territory_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Government regains territory"]["notes"]

notes_list = gov_regain_territory_notes.to_list()[100:500]

print("\n\n".join(notes_list))

On 5 October 2022, according to the Ukrainian Ministry of Defence, Russian forces shelled near Zaitseve, Donetsk region. According to the Russian Ministry of Defence, Russian forces gained control over the settlement. Over 120 Ukrainian soldiers were killed and 3 infantry fighting vehicles were destroyed. Casualties and material losses on Russian side unknown. [Russian MoD reported 120 fatalities. Coded as 10 fatalities]

Around 4 October 2022 (as reported), following clashes, Ukrainian forces took control over Borivska Andriivka, Kharkiv. Casualties unknown.

On 4 October 2022, following clashes, Ukrainian forces regained control over Novomykhailivka, Kherson region. Casualties unknown.

On 4 October 2022, following clashes, Ukrainian forces regained control over Bohuslavka, Kharkiv region. Casualties unknown.

Around 4 October 2022 (as reported), following clashes with Russian forces, Ukrainian forces took control over Davydiv Brid, Kherson. Over 120 Ukrainian soldiers were killed in

In [40]:
#very few interesting equipment loss reports

other_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Other"]["notes"]

notes_list = other_notes.to_list()[100:500]

print("\n\n".join(notes_list))

Displacement: On 8 April 2023, 61 persons evacuated from the de-occupied territories of Kherson region (coded to Kherson).

Explosive remnants of war: On 8 April 2023, two teenagers in Mikhaylovka found an unexploded ordnance from World War II, which detonated, injuring the two boys.

Displacement: On 6 April 2023, 18 children and 13 adults (overall 31 civilians) were evacuated from Kupiansk, Kupiansk-Vuzlovyi and Kivsharivka (coded to Kupiansk, Kharkiv).

Displacement: On 5 April 2023, 93 persons were evacuated from de-occupied territories of Kherson region (coded to Kherson, Kherson region).

Other: On 5 April 2023, Russian authorities reported that a Ukrainian military plane crashed near Butovsk. According to Russian officials, the pilot was planning to drop explosives on an oil pipeline in the Bryansk region but crashed due to a technical problem. The pilot was detained. The Ukrainian authorities did not comment on the incident.

Displacement: On 2 April 2023, 94 people evacuated f

In [45]:

#Hand grenades, nothing interesting

grenade_notes = ukraine_war_events[ukraine_war_events["sub_event_type"] == "Grenade"]["notes"]

notes_list = grenade_notes.to_list()

# print("\n\n".join(notes_list))

On 4 September 2024, a local resident threw a grenade of an unknown origin into a policeman's house in Zviahel district (coded to Zviahel, Zhytomyr). The grenade exploded. The incident occurred after the man's car was stopped by the policeman on patrol and an administrative report was issued against the driver. No casualties.

On 31 July 2024, a man threw a grenade in a crowded place in Kherson, Kherson, killing a civilian and injuring five more.

On 15 July 2024, an unidentified armed group threw a grenade at the conscription center of Military Forces of Ukraine in Busk, Lviv, damaging the building. No casualties.

On 30 May 2024, an unidentified armed group threw a grenade in the garden of a serviceman working at the conscription center of Military Forces of Ukraine in Katerynopil, Cherkasy. No casualties.

On 7 March 2024, a man died after a F-1 hand grenade detonated at his home in Dernivka, Kyiv. He had found the item upon deoccupation of Kyiv region.

On 29 December 2023, a teena

**Overall characteristics of the notes field**

- the same report tends to be split across multiple records based on casualties and locations (we might need a group-by to avoid double counting equipment losses)

- it should be enough to focus on these sub-event types (with their record counts):
    - Disrupted weapons use                    9187
    - Air/drone strike                        25235
    - Armed clash                             49467
    - Shelling/artillery/missile attack      102879
    - Government regains territory              350  (try to see if battle front line has changed)
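To find records that belong to the same split report, one option is to parse the "coded as X fatalities split across N events" annotation seen in the notes and group on it. This is only a sketch: `tag_split_reports` is a hypothetical helper, and treating records with the same date and an identical annotation as one report is an assumption that would need spot-checking.

```python
import re
import pandas as pd

# Annotation format taken from the notes, e.g.
# "Coded as 10 fatalities split across 13 events".
SPLIT_PATTERN = (r"coded as (?P<coded_total>\d+) fatalities "
                 r"split across (?P<n_events>\d+) events")

def tag_split_reports(events: pd.DataFrame) -> pd.DataFrame:
    """Add coded_total, n_events and a report_group id to each record."""
    parsed = events["notes"].str.extract(SPLIT_PATTERN, flags=re.IGNORECASE)
    out = events.assign(
        coded_total=pd.to_numeric(parsed["coded_total"]),
        n_events=pd.to_numeric(parsed["n_events"]),
    )
    # ASSUMPTION: same date + identical annotation means the same source report.
    keys = ["event_date", "coded_total", "n_events"]
    out["report_group"] = out.groupby(keys).ngroup()
    return out

# Toy notes, abbreviated and modelled on the output above (not real records).
toy = pd.DataFrame({
    "event_date": ["2024-09-27", "2024-09-27", "2024-09-27"],
    "notes": [
        "Air strikes near Guevo. [Coded as 10 fatalities split across 13 events. 1 fatality coded to this event].",
        "Air strikes near Borki. [Coded as 10 fatalities split across 13 events. 1 fatality coded to this event].",
        "Russian forces shelled a village. Casualties unknown.",
    ],
})
tagged = tag_split_reports(toy)
```

Equipment losses could then be counted once per `report_group` instead of once per record. Records without the annotation get no valid group id (how that is represented depends on the pandas version), so they should be handled separately.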


**Next Steps**
- regroup events that are split based on casualties
- Design features by looking at the record fields + notes under each sub-event type (e.g., missile model, number_of_location_struck_by_air_attack, etc.)
- vectorize each event report and generate a vector for each day
- perform RNN on day vectors to investigate intensity tiers
- label each day with the intensity label we discovered



**Potentially interesting features**
- count of equipment lost / used in the report
- a boolean flag of whether a certain type of attack happened (e.g., whether an air strike took place)
- the number of locations involved
- the number of confirmed fatalities
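Several of the features listed above can be derived per day with a single group-by. The sketch below runs on made-up toy records; `daily_features` and the output column names are hypothetical, and it uses only the `event_date`, `sub_event_type`, `location` and `fatalities` columns of the dataset.

```python
import pandas as pd

def daily_features(events: pd.DataFrame) -> pd.DataFrame:
    """Per-day event count, distinct locations, fatalities and an air-strike flag."""
    df = events.assign(day=pd.to_datetime(events["event_date"]).dt.normalize())
    df["is_air_strike"] = df["sub_event_type"].eq("Air/drone strike")
    return df.groupby("day").agg(
        n_events=("sub_event_type", "count"),
        n_locations=("location", "nunique"),
        fatalities=("fatalities", "sum"),
        air_strike=("is_air_strike", "any"),
    )

# Toy records (made up for illustration only).
toy = pd.DataFrame({
    "event_date": ["2024-09-26", "2024-09-26", "2024-09-27"],
    "sub_event_type": ["Shelling/artillery/missile attack", "Air/drone strike", "Armed clash"],
    "location": ["Dnipro", "Belgorod", "Dnipro"],
    "fatalities": [0, 1, 2],
})
per_day = daily_features(toy)
```

Equipment counts mined from the notes could be added as extra columns in the same aggregation once the extraction step exists.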