# Further exploration

### Is each row in the dataset a distinct damage event?
This notebook explores that question, which was posed in response to our presentation.

In [57]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

In [58]:
ukraine = pd.read_csv('ukraine-damages.csv', sep='|', header=0, index_col=False)
ukraine.head()

Unnamed: 0,damage_id,iso3,country,gid_1,oblast,rayon,type_of_infrastructure,if_other_what,date_of_event,source_name,source_date,source_link,additional_sources,extent_of_damage,_internal_filter_date,_weights,access_subindicator,pcode
0,D0011,UKR,Ukraine,['UKR.15_1'],Luhanska,Siverskodonetskyi,Warehouse,,2022-03-25,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Destroyed,2022-03-25,0.7,['7.2'],UA44
1,D0012,UKR,Ukraine,['UKR.15_1'],Luhanska,Siverskodonetskyi,Warehouse,,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Partially damaged,2022-03-26,0.7,['7.2'],UA44
2,D0015,UKR,Ukraine,['UKR.14_1'],Lvivska,,Warehouse,,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Unknown,2022-03-26,1.0,['7.2'],UA46
3,D0016,UKR,Ukraine,['UKR.14_1'],Lvivska,,Aircraft repair plant,Aircraft repair plan,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Destroyed,2022-03-26,1.0,['7.2'],UA46
4,D0017,UKR,Ukraine,['UKR.12_1'],Kyivska,,Bridge,,2022-03-22,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Destroyed,2022-03-22,1.0,['9.2'],UA32


### How many rows share the same source link?

In [59]:
# how many rows share same source_link
source_links = ukraine.dropna(subset=['source_link'])

total_rows_with_links = len(source_links)
num_unique_links = source_links['source_link'].nunique()

num_duplicate_links = source_links.duplicated(subset=['source_link']).sum()

print(f"Total rows with a 'source_link': {total_rows_with_links}")
print(f"Number of unique 'source_link' values: {num_unique_links}")
print(f"Number of rows sharing a 'source_link' with a previous row: {num_duplicate_links}")
print(f"Calculation check: {total_rows_with_links} (total) - {num_unique_links} (unique) = {total_rows_with_links - num_unique_links} (duplicates)")

Total rows with a 'source_link': 24266
Number of unique 'source_link' values: 7747
Number of rows sharing a 'source_link' with a previous row: 16519
Calculation check: 24266 (total) - 7747 (unique) = 16519 (duplicates)


About 68% of the rows with a source link (16,519 of 24,266) share that link with an earlier row. This indicates that a single article reporting damage to multiple infrastructure items is split into multiple rows in our dataset.
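To see how the shared links are distributed, one option is to count rows per link instead of only counting repeats. The sketch below uses a small hypothetical `toy` frame in place of `ukraine`. The `example.org` links and the `D1`...`D5` ids are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `ukraine` DataFrame.
toy = pd.DataFrame({
    "damage_id": ["D1", "D2", "D3", "D4", "D5"],
    "source_link": [
        "https://example.org/a",
        "https://example.org/a",
        "https://example.org/a",
        "https://example.org/b",
        None,  # a row without a link
    ],
})

# Count rows per link; value_counts() skips missing links by default.
rows_per_link = toy["source_link"].value_counts()

# Keep only the links that are cited by more than one row.
shared_links = rows_per_link[rows_per_link > 1]

print(rows_per_link.describe())
print(f"Links cited by more than one row: {len(shared_links)}")
print(f"Rows attached to a shared link: {int(shared_links.sum())}")
```

On the real data, replacing `toy` with `ukraine` would show whether a few situation reports account for most of the repeated links or whether the repeats are spread evenly.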

Also, the existence of additional sources on some rows indicates that multiple reports of the same damage to an infrastructure item are grouped into one row rather than recorded as distinct damages.
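The share of rows that carry additional sources can be measured directly. This is a minimal sketch on hypothetical toy data. It assumes an empty `additional_sources` cell is read as a missing value or a blank string, which should be confirmed against the real file.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "damage_id": ["D1", "D2", "D3", "D4"],
    "source_link": ["https://example.org/a"] * 4,
    "additional_sources": [None, "https://example.org/c", "  ", None],
})

# Treat missing values and blank strings alike (an assumption about the data).
has_extra = toy["additional_sources"].fillna("").str.strip().ne("")

n_extra = int(has_extra.sum())
share_extra = float(has_extra.mean())

print(f"Rows with additional sources: {n_extra} of {len(toy)} ({share_extra:.0%})")
```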

### How many rows share same event details?

In [None]:
# convert date column to datetime format
ukraine['date_of_event'] = pd.to_datetime(ukraine['date_of_event'], errors='coerce')

# define the columns that together constitute a unique event
event_cols = ['date_of_event', 'oblast', 'type_of_infrastructure', 'rayon']

damage_events = ukraine.dropna(subset=event_cols)

total_event_rows_checked = len(damage_events)

num_event_duplicates = damage_events.duplicated(subset=event_cols).sum()

print(f"Columns checked for event details: {event_cols}")
print(f"Total rows checked (after dropping nulls in those columns): {total_event_rows_checked}")
print(f"Number of rows that seem to be duplicates (same date, oblast, rayon, and infrastructure type): {num_event_duplicates}")

Columns checked for event details: ['_internal_filter_date', 'oblast', 'type_of_infrastructure', 'rayon']
Total rows checked (after dropping nulls in those columns): 15547
Number of rows that seem to be duplicates (same date, oblast, rayon, and infrastructure type): 4503
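The count above says how many rows repeat an earlier combination, but not whether those rows differ elsewhere. A possible follow-up, sketched here on hypothetical toy data, flags every member of a repeated group with `keep=False` and then summarizes each group. Choosing `extent_of_damage` as the column to compare is an assumption made for illustration.

```python
import pandas as pd

event_cols = ["date_of_event", "oblast", "type_of_infrastructure", "rayon"]

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "damage_id": ["D1", "D2", "D3", "D4"],
    "date_of_event": ["2022-03-26", "2022-03-26", "2022-03-26", "2022-03-27"],
    "oblast": ["Luhanska"] * 4,
    "rayon": ["Siverskodonetskyi"] * 4,
    "type_of_infrastructure": ["Warehouse"] * 4,
    "extent_of_damage": [
        "Partially damaged", "Partially damaged", "Destroyed", "Destroyed",
    ],
})

# keep=False marks every row of a repeated group, not only the later copies.
in_group = toy.duplicated(subset=event_cols, keep=False)

# For each repeated combination: how many rows, and how many distinct extents.
summary = (
    toy[in_group]
    .groupby(event_cols)
    .agg(
        n_rows=("damage_id", "count"),
        n_extents=("extent_of_damage", "nunique"),
    )
    .reset_index()
)

print(summary)
```

Groups where `n_extents` is greater than 1 contain rows that differ in recorded damage, which would be consistent with separate items being hit on the same day in the same rayon.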


In [61]:
damage_events

Unnamed: 0,damage_id,iso3,country,gid_1,oblast,rayon,type_of_infrastructure,if_other_what,date_of_event,source_name,source_date,source_link,additional_sources,extent_of_damage,_internal_filter_date,_weights,access_subindicator,pcode
0,D0011,UKR,Ukraine,['UKR.15_1'],Luhanska,Siverskodonetskyi,Warehouse,,2022-03-25,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Destroyed,2022-03-25,0.7,['7.2'],UA44
1,D0012,UKR,Ukraine,['UKR.15_1'],Luhanska,Siverskodonetskyi,Warehouse,,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Partially damaged,2022-03-26,0.7,['7.2'],UA44
6,D0020,UKR,Ukraine,['UKR.25_1'],Volynska,Lutskyi,Oil depot,Oil depot,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,https://www.bbc.com/news/world-europe-60914019...,Destroyed,2022-03-26,0.7,['7.2'],UA07
7,D0024,UKR,Ukraine,['UKR.15_1'],Luhanska,Siverskodonetskyi,Warehouse,,2022-03-26,OCHA,2022-03-28,https://reliefweb.int/report/ukraine/ukraine-h...,,Partially damaged,2022-03-26,0.7,['7.2'],UA44
8,D0025,UKR,Ukraine,['UKR.8_1'],Kharkivska,Kharkivskyi,Nuclear unit,,,IAEA,2022-03-28,https://reliefweb.int/report/ukraine/update-35...,https://www.iaea.org/ar/newscenter/pressreleas...,Partially damaged,2022-03-28,0.7,['7.2'],UA63
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24264,D9995,UKR,Ukraine,['UKR.9_1'],Khersonska,Khersonskyi,"Health facility (hospital, health clinic)",,2023-04-03,MagnoliaTV,2023-04-04,https://magnolia-tv.com/news/88335-khersonska-...,,Partially damaged,2023-04-03,0.7,['7.2'],UA65
24265,D9996,UKR,Ukraine,['UKR.17_1'],Odeska,Odeskyi,Industrial/Business/Enterprise facilities,,2023-04-04,Baltanews,2023-04-04,https://baltanews.city/articles/276787/vorog-a...,,Partially damaged,2023-04-04,0.7,['7.2'],UA51
24266,D9997,UKR,Ukraine,['UKR.6_1'],Donetska,Bakhmutskyi,Industrial/Business/Enterprise facilities,the mine,2023-04-04,Ukrainska Pravda,2023-04-04,https://www.pravda.com.ua/news/2023/04/4/7396355/,,Partially damaged,2023-04-04,0.7,['7.2'],UA14
24267,D9998,UKR,Ukraine,['UKR.6_1'],Donetska,Bakhmutskyi,Government facilities,,2023-04-04,Ukrainska Pravda,2023-04-04,https://www.pravda.com.ua/news/2023/04/4/7396355/,,Partially damaged,2023-04-04,0.7,['7.2'],UA14


In [None]:
mini_damage_events_df = damage_events[['date_of_event', 'oblast', 'type_of_infrastructure', 'rayon','source_link']]
mini_damage_events_df

Unnamed: 0,date_of_event,oblast,type_of_infrastructure,rayon,source_link
0,2022-03-25,Luhanska,Warehouse,Siverskodonetskyi,https://reliefweb.int/report/ukraine/ukraine-h...
1,2022-03-26,Luhanska,Warehouse,Siverskodonetskyi,https://reliefweb.int/report/ukraine/ukraine-h...
6,2022-03-26,Volynska,Oil depot,Lutskyi,https://reliefweb.int/report/ukraine/ukraine-h...
7,2022-03-26,Luhanska,Warehouse,Siverskodonetskyi,https://reliefweb.int/report/ukraine/ukraine-h...
13,2022-03-29,Mykolaivska,Government facilities,Mykolaivskyi,https://www.ansa.it/sito/videogallery/mondo/20...
...,...,...,...,...,...
24264,2023-04-03,Khersonska,"Health facility (hospital, health clinic)",Khersonskyi,https://magnolia-tv.com/news/88335-khersonska-...
24265,2023-04-04,Odeska,Industrial/Business/Enterprise facilities,Odeskyi,https://baltanews.city/articles/276787/vorog-a...
24266,2023-04-04,Donetska,Industrial/Business/Enterprise facilities,Bakhmutskyi,https://www.pravda.com.ua/news/2023/04/4/7396355/
24267,2023-04-04,Donetska,Government facilities,Bakhmutskyi,https://www.pravda.com.ua/news/2023/04/4/7396355/


In [74]:
# compare the full source links of the first two rows
mini_damage_events_df['source_link'].iloc[0], mini_damage_events_df['source_link'].iloc[1]

('https://reliefweb.int/report/ukraine/ukraine-humanitarian-impact-situation-report-1200-pm-eet-28-march-2022',
 'https://reliefweb.int/report/ukraine/ukraine-humanitarian-impact-situation-report-1200-pm-eet-28-march-2022')

In [78]:
ukraine['damage_id'].is_unique

True
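A unique `damage_id` shows that ids are not reused, but it does not compare the content of the rows. In the `damage_events` output above, rows 1 and 7 agree in every displayed column except the index and `damage_id`. A complementary check, sketched here on hypothetical toy data, looks for rows that match on every column other than the identifier. Treating `damage_id` as the only identifier column is an assumption. On the real file the `Unnamed: 0` index column would likely need to be excluded too.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "damage_id": ["D1", "D2", "D3"],
    "oblast": ["Luhanska", "Luhanska", "Kyivska"],
    "rayon": ["Siverskodonetskyi", "Siverskodonetskyi", None],
    "type_of_infrastructure": ["Warehouse", "Warehouse", "Bridge"],
    "date_of_event": ["2022-03-26", "2022-03-26", "2022-03-22"],
    "source_link": ["https://example.org/a"] * 3,
    "extent_of_damage": ["Partially damaged", "Partially damaged", "Destroyed"],
})

# Columns that identify a row rather than describe it (assumed list).
id_cols = ["damage_id"]
content_cols = [c for c in toy.columns if c not in id_cols]

# Rows whose descriptive content matches at least one other row.
identical = toy[toy.duplicated(subset=content_cols, keep=False)]

print(f"Ids unique: {toy['damage_id'].is_unique}")
print(f"Rows identical apart from the id: {len(identical)}")
print(identical)
```

Rows surfaced this way are not necessarily errors. They may record several similar items from one report, so they are candidates for checking against the source article.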

## Conclusion

Based on the OCHA situation report at the link above, multiple food warehouses were hit in Siverskodonetskyi on 25 March 2022. Combined with the fact that the dataset assigns a unique damage ID to every row, we believe that the rows in the dataset are distinct damage events.