# 01 | Transform the Data
# Introduction
In this notebook, we carry out the data transformation steps required for our analysis. The same process will later be reproduced as .py files.

# Goals:
The objectives targeted in this notebook are checked off as follows:

- [ ] Import the raw data
  - [ ] SIPRI dataset and capitals coordinates
- [ ] Store the raw data
- [x] Prepare the data
  - [x] Clean each individual table
  - [x] Store the transformed dataset
- Combine both datasets
  - Check if the merging column values match
- Store the final output

# Set up our working environment

In [2]:
# Import required libraries
import pandas as pd
import os

In [3]:
# Create directory folders to store our data
dirname = os.getcwd()

raw_data = f"{dirname}/data/raw/"
transformed_data = f"{dirname}/data/transformed/"
refined_data = f"{dirname}/data/refined/"

paths = [raw_data, transformed_data, refined_data]

for path in paths:
    # exist_ok=True makes the call a no-op when the folder already exists
    os.makedirs(path, exist_ok=True)

# Data Transformation | SIPRI dataset
Let's start by preparing our main dataset for data analysis.

## Import raw data
We'll first import the raw data that we stored during the previous step. Because we have three individual tables that must be combined into one, we will work through the following steps:
- Import a single table;
- Analyse which transformation steps are required;
- Apply the transformations;
- Loop through the 3 tables and apply the same steps (if possible).
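
The steps above can be sketched as a small reusable helper. This is only an illustration, not the notebook's final implementation: the function names `find_header_row` and `load_table`, the keys `tb_a`, `tb_b`, `tb_c`, and the in-memory sample tables are all hypothetical stand-ins for the real raw files.

```python
import io

import pandas as pd


def find_header_row(lines, marker="Country"):
    """Return the index of the first line that contains `marker`."""
    for i, line in enumerate(lines):
        if marker in line:
            return i
    raise ValueError(f"No line containing {marker!r} was found")


def load_table(text):
    """Load one raw table, skipping any preamble above the header row."""
    header_row = find_header_row(text.splitlines())
    return pd.read_csv(io.StringIO(text), skiprows=header_row, header=0)


# Hypothetical stand-ins for the three raw files: an optional preamble, then the table
raw_tables = {
    "tb_a": "Some title\nSome note\nCountry,Notes,2022,2023\nAlgeria,,1.0,2.0\n",
    "tb_b": "Another title\nCountry,Notes,2022,2023\nLibya,,3.0,4.0\n",
    "tb_c": "Country,Notes,2022,2023\nSyria,,5.0,6.0\n",
}

# Apply the same loading step to every table
tables = {name: load_table(text) for name, text in raw_tables.items()}
```

Keeping the header search and the loading step in functions means the loop over the three tables only has to pass in a different input each time.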

We can start with table 5 (Constant US$ (2022)).

In [9]:
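# Scan the file line by line to find the header row (the first line containing "Country")
with open(f"{raw_data}sipri_data_raw_tb_5.csv", "r") as file:
    for i, line in enumerate(file):
        if "Country" in line:
            header_row = i
            break
    else:
        # Fail early instead of hitting a NameError in the next cell
        raise ValueError("No header row containing 'Country' was found")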

In [10]:
sipri_raw = pd.read_csv(f"{raw_data}sipri_data_raw_tb_5.csv", skiprows=header_row, header=0)
sipri_raw

Unnamed: 0,Country,Notes,1948,1949,1950,1951,1952,1953,1954,1955,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,,,,,,,,,,,...,,,,,,,,,,
1,Africa,,,,,,,,,,...,,,,,,,,,,
2,North Africa,,,,,,,,,,...,,,,,,,,,,
3,Algeria,§,...,...,...,...,...,...,...,...,...,9724.379971923256,10412.714002896393,10217.081699569308,10073.364021301344,9583.7242883703,10303.60057521065,9708.277440227255,9112.461105348943,9145.810174207281,18263.96796826213
4,Libya,‡§¶,...,...,...,...,...,...,...,...,...,3755.652496350929,...,...,...,...,...,...,...,...,...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
188,Syria,§,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
189,Türkiye,‖,...,197.68186020052622,212.97020550380432,231.81397994737964,257.7686126715495,294.0339899025812,332.0770817037616,382.9197184100121,...,17576.538470505897,15668.749999999998,17827.702150796664,17822.738263164494,19648.69382385138,20436.917121238785,17478.41368526983,15567.410029425082,10779.896284618242,15827.853255045886
190,United Arab Emirates,§,...,...,...,...,...,...,...,...,...,22755.071477195375,...,...,...,...,...,...,...,...,...
191,"Yemen, North",§,...,...,...,...,...,...,...,...,...,xxx,xxx,xxx,xxx,xxx,xxx,xxx,xxx,xxx,xxx
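
The output above shows `...` and `xxx` in place of numbers in several year columns. One possible way to handle them, assuming both placeholders should be treated as missing values, is the `na_values` parameter of `pd.read_csv`. The miniature table below is a hypothetical sample with rounded figures, not the real file.

```python
import io

import pandas as pd

# Hypothetical miniature of the raw table, using the placeholders seen in the output
sample = io.StringIO(
    "Country,Notes,2021,2022,2023\n"
    "Algeria,§,9112.46,9145.81,18263.97\n"
    "Libya,‡§¶,...,...,...\n"
    '"Yemen, North",§,xxx,xxx,xxx\n'
)

# Assumption: "..." and "xxx" both stand for values that should be treated as missing
df = pd.read_csv(sample, na_values=["...", "xxx"])

# Year columns are the ones whose names consist only of digits
year_cols = [c for c in df.columns if c.isdigit()]

print(df[year_cols].dtypes)        # dtype of each year column after parsing
print(df[year_cols].isna().sum())  # number of missing values per year
```

With the placeholders parsed as `NaN`, the year columns load as numeric rather than as text, which makes later cleaning steps simpler.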
