<a href="https://colab.research.google.com/github/safiyenarman/DSA210-Project/blob/main/data_process.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Processing of Sources

This notebook cleans and prepares data from two sources:
- Sustainable Development Goals (SDG) scores (2019–2022)
- World Happiness Rankings (2019–2022)

# SDG Index Scores (2019–2022)
This section filters the Sustainable Development Goal (SDG) scores to the years 2019 through 2022 and previews the result.

In [None]:
import pandas as pd

## Load the dataset

In [None]:
# Load the full SDG dataset
df = pd.read_csv("sdg_scores.csv")

# Filter for years 2019 to 2022
df_filtered = df[(df['year'] >= 2019) & (df['year'] <= 2022)].reset_index(drop=True)

# Preview
df_filtered.head()
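
The year filter above can also be written with `Series.between`, which is inclusive on both ends by default. The sketch below is only an illustration on made-up data: the `year` column name comes from the cell above, while the `country` and `sdg_index_score` columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; only the `year` column name is taken from the notebook.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [2018, 2019, 2020, 2022, 2023],
    "sdg_index_score": [60.1, 61.0, 70.2, 71.5, 55.0],
})

# between() includes both endpoints, matching (year >= 2019) & (year <= 2022).
toy_filtered = toy[toy["year"].between(2019, 2022)].reset_index(drop=True)

# Sanity check: list the years that survived the filter.
years_kept = sorted(toy_filtered["year"].unique().tolist())
print(years_kept)
```

Listing the surviving years is a cheap way to confirm that no out-of-range rows slipped through before the file is exported.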

## Export filtered dataset


In [None]:
df_filtered.to_csv("sdg_2019_2022.csv", index=False)


# World Happiness Rankings (2019–2022)
This part below, cleans happiness scores for countries between 2019 and 2022. It standardizes the format of the happiness ranks according to countries.

In [None]:
import pandas as pd

## Load Datasets

In [None]:
df_2019 = pd.read_csv("2019.csv")
df_2020 = pd.read_csv("2020.csv")
df_2021 = pd.read_csv("2021.csv")
df_2022 = pd.read_csv("2022.csv")
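
The 2022 file stores its happiness scores with a comma as the decimal separator, which this notebook converts after loading. As a possible alternative, `pd.read_csv` accepts a `decimal` argument so that the column is parsed as a float at load time. The sketch below uses a small in-memory sample with hypothetical country names and values; it assumes the scores are quoted in the file, which may not match the real `2022.csv` exactly.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real 2022.csv may contain more columns.
sample_csv = 'Country,Happiness score\nXland,"7,5"\nYland,"6,25"\n'

# decimal="," tells the parser to treat the comma as the decimal mark.
sample = pd.read_csv(io.StringIO(sample_csv), decimal=",")

scores = sample["Happiness score"].tolist()
print(sample.dtypes)
```

If this approach is used, the later string replacement on `Happiness score` would no longer be needed, since the column would already be numeric.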

## Clean and Standardize Columns

In [None]:

# 2019: the score column is already named 'Score'; only the country column needs renaming
df_2019_cleaned = df_2019.rename(columns={
    'Country or region': 'Country'
})[['Country', 'Score']]
df_2019_cleaned['Year'] = 2019

# 2020: rename to the shared 'Country' / 'Score' names
df_2020_cleaned = df_2020.rename(columns={
    'Country name': 'Country',
    'Ladder score': 'Score'
})[['Country', 'Score']]
df_2020_cleaned['Year'] = 2020

# 2021: same column names as 2020
df_2021_cleaned = df_2021.rename(columns={
    'Country name': 'Country',
    'Ladder score': 'Score'
})[['Country', 'Score']]
df_2021_cleaned['Year'] = 2021
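
The same rename, select, and tag-with-year pattern repeats for each file, so it could be wrapped in a small helper. The function `standardize_year` below is hypothetical and not part of the original notebook. The toy frames reuse the column names from the cell above, but their country names and values are made up.

```python
import pandas as pd

def standardize_year(df, country_col, score_col, year):
    """Hypothetical helper: return a copy with columns Country, Score, Year."""
    out = df.rename(columns={country_col: "Country", score_col: "Score"})
    out = out[["Country", "Score"]].copy()  # explicit copy before adding a column
    out["Year"] = year
    return out

# Toy frames that mimic the column names used in the cell above.
toy_2019 = pd.DataFrame({"Country or region": ["Xland"], "Score": [7.1], "Overall rank": [1]})
toy_2020 = pd.DataFrame({"Country name": ["Xland"], "Ladder score": [7.2]})

clean_2019 = standardize_year(toy_2019, "Country or region", "Score", 2019)
clean_2020 = standardize_year(toy_2020, "Country name", "Ladder score", 2020)
print(list(clean_2019.columns), list(clean_2020.columns))
```

Keeping the column mapping as arguments makes the per-year differences visible in one place, at the cost of one extra function to maintain.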


## Clean 2022 Scores and Merge All Years

In [None]:
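# The 2022 file uses ',' as the decimal separator; convert the scores to floats
df_2022['Happiness score'] = df_2022['Happiness score'].str.replace(',', '.', regex=False).astype(float)
# The country column is already named 'Country'; only the score column needs renaming
df_2022_cleaned = df_2022.rename(columns={
    'Happiness score': 'Score'
})[['Country', 'Score']]
df_2022_cleaned['Year'] = 2022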

merged_df = pd.concat([
    df_2019_cleaned,
    df_2020_cleaned,
    df_2021_cleaned,
    df_2022_cleaned
], ignore_index=True)

# Sort by country and year
merged_sorted_by_country = merged_df.sort_values(by=["Country", "Year"]).reset_index(drop=True)
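
Because the yearly files come from different releases, a country may be spelled differently from one year to the next. One rough way to spot this, sketched below on made-up data, is to count how many distinct years each country has after merging. The frame `toy_merged` and its country names are hypothetical stand-ins for `merged_df`. A count below 4 can also simply mean the country was not surveyed every year.

```python
import pandas as pd

# Toy stand-in for merged_df; "Y-land" simulates a spelling variant of "Yland".
toy_merged = pd.DataFrame({
    "Country": ["Xland", "Xland", "Xland", "Xland", "Yland", "Yland", "Y-land", "Yland"],
    "Score": [7.1, 7.2, 7.0, 7.3, 5.0, 5.1, 5.2, 5.3],
    "Year": [2019, 2020, 2021, 2022, 2019, 2020, 2021, 2022],
})

# Count distinct years per country name.
years_per_country = toy_merged.groupby("Country")["Year"].nunique()

# Names with fewer than 4 years are candidates for manual review.
incomplete = years_per_country[years_per_country < 4].sort_index()
print(incomplete.to_dict())
```

Names flagged this way could then be reviewed by hand and, where they refer to the same country, mapped to a single spelling before the data is exported.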


## Export Cleaned Data

In [None]:
merged_sorted_by_country.to_csv("merged_happiness_by_country.csv", index=False)