In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# The Russian-Ukrainian War

The purpose of this notebook is to gain further insight into the true losses that the Russian war machine has incurred over the last two weeks.  To begin, let's take a look at the first of the two datasets: russia_losses_equipment.

In [None]:
russian_equipment = pd.read_csv("../input/2022-ukraine-russian-war/russia_losses_equipment.csv", index_col="date", parse_dates=True)
russian_equipment

As expected, the dataset is quite small; the full-scale invasion only began on February 24th, 2022.  What are the datatypes of each attribute?

In [None]:
russian_equipment.dtypes

First, let's make a visualization to see the losses the Russian War Machine has incurred over time.  Since we are using time as a variable, it would be a good idea to use a line graph.  Also, to avoid some potential issues, I am going to remove the "day" attribute and work only with the "date" index.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Drop the "day" counter so that only the loss columns are plotted against the date index
russian_equipment_no_day = russian_equipment.drop("day", axis=1)

sns.set_style("darkgrid")
plt.figure(figsize=(20,9))
plt.title("Russian Equipment Losses")
plt.xlabel("Date")
plt.ylabel("Losses")  # the y-axis shows loss counts, not asset names
sns.lineplot(data=russian_equipment_no_day)
plt.show()
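
If the columns in this dataset are running totals rather than per-day counts (an assumption; the line plot alone does not confirm it), differencing would turn them into daily losses. Below is a minimal sketch on a small made-up frame. `toy_losses` and its values are hypothetical and only mimic the layout of `russian_equipment_no_day`.

```python
import pandas as pd

# Hypothetical cumulative totals for illustration; NOT values from the real dataset.
toy_losses = pd.DataFrame(
    {"tank": [10, 25, 30, 30], "military auto": [5, 20, 50, 50]},
    index=pd.date_range("2022-03-01", periods=4, freq="D", name="date"),
)

# diff() subtracts the previous row, so each row becomes that day's change.
# The first row has no predecessor and becomes NaN, so it is dropped.
daily_losses = toy_losses.diff().dropna().astype(int)
print(daily_losses)
```

A flat stretch in the cumulative plot shows up as a row of zeros in `daily_losses`, which makes days with no newly reported losses easy to spot.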

Three columns trend upward: fuel, tanks, and military auto.  It could also be argued that fuel losses are related to tank and military auto losses.  Let's take a closer look at how vehicle losses relate to tank losses.

In [None]:
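x_data = ["military auto", "APC"]

# lmplot is a figure-level function that creates its own figure,
# so its size is set with height/aspect rather than plt.figure(figsize=...)
for vehicle in x_data:
    sns.lmplot(data=russian_equipment_no_day, x=vehicle, y="tank", height=6, aspect=1.5)
plt.show()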

Military auto losses have a strong, positive relationship with tank losses.  At the static rate of 60 fuel tanks lost per day, this behavior is to be expected.  The more fuel that is lost, the more likely it is that vehicle losses will occur due to lack of fuel and/or Ukrainian acquisition.
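
To put a number on how strongly two columns move together, a Pearson correlation is a common complement to the regression plots. Below is a minimal sketch on a made-up frame. `toy_equipment` and its values are hypothetical, not taken from the dataset. Note that totals which only ever grow will tend to correlate strongly by construction, so a high value on its own is weak evidence of a causal link.

```python
import pandas as pd

# Hypothetical values for illustration; NOT values from the real dataset.
toy_equipment = pd.DataFrame({
    "tank": [10, 20, 30, 40, 50],
    "military auto": [15, 35, 55, 75, 95],  # exactly linear in "tank"
    "APC": [5, 9, 20, 22, 40],              # rising, but not exactly linear
})

# Pearson correlation of every other column with "tank"
corr_with_tank = toy_equipment.corr()["tank"].drop("tank")
print(corr_with_tank.round(3))
```

On the real data, the same call would be `russian_equipment_no_day.corr()["tank"]`, assuming all remaining columns are numeric.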

Now, let's take a look at Russian personnel losses over time.

In [None]:
# parse_dates=True only parses the index; name the column so "date" becomes a datetime
russian_personnel = pd.read_csv("../input/2022-ukraine-russian-war/russia_losses_personnel.csv", parse_dates=["date"])
russian_personnel

The "POW" column contains NaN values.  Let's replace the POW NaNs with the mean of the column and drop the "personnel*" column; "personnel*" is ordinal data that I don't see a concrete use for at this time.

In [None]:
# Dropping the "day" column is left disabled for now, although datetime values are preferred.
# russian_personnel = russian_personnel.drop("day", axis=1)
# Drop the "personnel*" column, which is not used in this analysis
russian_personnel = russian_personnel.drop("personnel*", axis=1)
russian_personnel

In [None]:
# Replace NaNs with the mean of the POW column
from sklearn.impute import SimpleImputer

# The mean strategy needs numeric input, so the "date" column is set aside
russian_personnel_copy = russian_personnel.drop("date", axis=1)

my_imputer = SimpleImputer(strategy="mean")  # "mean" is also the default strategy
imputed_data = pd.DataFrame(
    my_imputer.fit_transform(russian_personnel_copy),
    columns=russian_personnel_copy.columns,
)
imputed_data

While this is a good demonstration of how scikit-learn's SimpleImputer can fill NaN values with the column mean, it is probably a better idea to replace the NaNs with zero.  This will skew the data much less, and it makes more sense because POW counts were most likely low at the start of the invasion.

In [None]:
# Replace all NaN POW values with zero
russian_personnel_copy = russian_personnel.fillna(0)
russian_personnel_copy
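
The difference between mean imputation and zero filling can be seen on a tiny made-up column. In this sketch, `toy_pow` and its values are hypothetical. For a single column, `fillna` with the column mean has the same effect as SimpleImputer's mean strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical POW counts with the first two days missing; NOT real data.
toy_pow = pd.Series([np.nan, np.nan, 100.0, 200.0, 300.0], name="POW")

# Series.mean() skips NaN, so the mean here is taken over 100, 200, 300
mean_filled = toy_pow.fillna(toy_pow.mean())
zero_filled = toy_pow.fillna(0)

comparison = pd.DataFrame({"mean_filled": mean_filled, "zero_filled": zero_filled})
print(comparison)
```

In this toy case, mean filling puts the first two rows above the first observed value, which is implausible for a count expected to grow over time. Zero filling avoids that, at the cost of assuming that no prisoners were taken on the missing days.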

Now let's create some visualizations.

In [None]:
plt.figure(figsize=(20,6))
sns.lineplot(data=russian_personnel_copy, x="date", y="personnel")
plt.show()

Personnel losses trend upward, as do equipment losses.  As the dataset is still relatively small, the only concrete insight that can be drawn is that Russian equipment and personnel losses will continue to increase as the invasion continues.  This is because the Ukrainian resistance that Russian forces have faced thus far has been incredibly driven and powerful, and clearly a more intense adversary than the Kremlin could have expected.  Based on this data, the longer the Russian occupation of Ukraine continues, the higher the cost the Kremlin must pay in Russian assets and lives.
