### The original dataframe has already been split by Indicator Type into 2 smaller dataframes
## Goal: Split these 2 dataframes further by the Year of INFORM Index column (INFORMYear)

### Importing 2 dataframes

In [2]:
import pandas as pd

In [4]:
df1 = pd.read_csv("2014-2024 INORM Index.csv")

In [5]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 326248 entries, 0 to 326247
Data columns (total 7 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Iso3            326248 non-null  object 
 1   IndicatorId     326248 non-null  object 
 2   IndicatorName   325091 non-null  object 
 3   IndicatorScore  326248 non-null  float64
 4   SurveyYear      326248 non-null  int64  
 5   Indicator Type  326248 non-null  object 
 6   INFORMYear      326248 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 17.4+ MB


In [8]:
df1.head()

|   | Iso3 | IndicatorId | IndicatorName | IndicatorScore | SurveyYear | Indicator Type | INFORMYear |
|---|------|-------------|---------------|----------------|------------|----------------|------------|
| 0 | EST | CC.INF.AHC.HEALTH-EXP | Health expenditure per capita | 0.0 | 2021 | INORM Index | 2024 |
| 1 | ETH | CC.INF.AHC.HEALTH-EXP | Health expenditure per capita | 9.9 | 2020 | INORM Index | 2024 |
| 2 | FIN | CC.INF.AHC.HEALTH-EXP | Health expenditure per capita | 0.0 | 2020 | INORM Index | 2024 |
| 3 | FJI | CC.INF.AHC.HEALTH-EXP | Health expenditure per capita | 8.7 | 2020 | INORM Index | 2024 |
| 4 | FRA | CC.INF.AHC.HEALTH-EXP | Health expenditure per capita | 0.0 | 2020 | INORM Index | 2024 |


In [17]:
str(len(df1['INFORMYear'].unique()))

'10'
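
The same count can be read more directly with `Series.nunique()`, and listing the sorted values shows which years are actually present. A minimal sketch on a small made-up frame (the `sample` rows are hypothetical and not taken from the CSV):

```python
import pandas as pd

# Hypothetical stand-in for df1; only the INFORMYear column matters here
sample = pd.DataFrame({
    "Iso3": ["EST", "ETH", "FIN", "FJI"],
    "INFORMYear": [2024, 2024, 2023, 2015],
})

# Number of distinct years, returned as an integer rather than a string
year_count = sample["INFORMYear"].nunique()

# The distinct years themselves, sorted for readability
years = sorted(sample["INFORMYear"].unique().tolist())

print(year_count)
print(years)
```

Applied to `df1` or `df2`, this would replace the `str(len(...unique()))` expression and also confirm which years the 10 categories cover.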

In [6]:
df2 = pd.read_csv("2014-2024 Core Indicators.csv")

In [7]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147420 entries, 0 to 147419
Data columns (total 7 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Iso3            147420 non-null  object 
 1   IndicatorId     147420 non-null  object 
 2   IndicatorName   147420 non-null  object 
 3   IndicatorScore  147420 non-null  float64
 4   SurveyYear      147420 non-null  int64  
 5   Indicator Type  147420 non-null  object 
 6   INFORMYear      147420 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 7.9+ MB


In [9]:
df2.head()

|   | Iso3 | IndicatorId | IndicatorName | IndicatorScore | SurveyYear | Indicator Type | INFORMYear |
|---|------|-------------|---------------|----------------|------------|----------------|------------|
| 0 | AFG | AFF_DR | People affected by drought (absolute) - raw | 886000.0 | 2022 | Core Indicators | 2024 |
| 1 | AGO | AFF_DR | People affected by drought (absolute) - raw | 197920.457143 | 2022 | Core Indicators | 2024 |
| 2 | ALB | AFF_DR | People affected by drought (absolute) - raw | 91428.571429 | 2022 | Core Indicators | 2024 |
| 3 | ARE | AFF_DR | People affected by drought (absolute) - raw | 0.0 | 2022 | Core Indicators | 2024 |
| 4 | ARG | AFF_DR | People affected by drought (absolute) - raw | 1000.914286 | 2022 | Core Indicators | 2024 |


In [18]:
str(len(df2['INFORMYear'].unique()))

'10'

### Split each dataframe on Year of INFORM Index (INFORMYear)  
- 10 year categories x 2 dataframes
- Store the result as 10 x 2 = 20 new dataframes
- Keep the original dataframes
- Export each yearly dataframe as its own CSV file to the project folder
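
The plan above can be sketched with one small helper applied to both dataframes. This is only an illustration: `split_by_column`, `toy_index`, and `toy_core` are hypothetical names, and the tiny frames stand in for `df1` and `df2`.

```python
import pandas as pd

def split_by_column(df, column):
    """Return a dict mapping each distinct value in `column` to a copy of its rows."""
    return {key: group.copy() for key, group in df.groupby(column)}

# Hypothetical miniature frames standing in for df1 and df2
toy_index = pd.DataFrame({"Iso3": ["AFG", "AGO", "AFG"],
                          "INFORMYear": [2015, 2015, 2016]})
toy_core = pd.DataFrame({"Iso3": ["ALB", "ARE"],
                         "INFORMYear": [2015, 2016]})

# One dict of per-year frames for each source dataframe
splits = {
    "INFORM Index": split_by_column(toy_index, "INFORMYear"),
    "Core Indicators": split_by_column(toy_core, "INFORMYear"),
}

# The originals are untouched because each group is copied
for label, by_year in splits.items():
    for year, frame in by_year.items():
        print(label, year, len(frame))
```

Keeping the pieces in dictionaries keyed by year avoids creating 20 separately named variables while leaving the original dataframes intact.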

#### Splitting INFORM Index (df1) by Index Year

In [21]:
# Grouping by the 'INFORMYear' column
grouped_df1 = df1.groupby('INFORMYear')

# Creating a dictionary to store DataFrames
df1s = {}

# Splitting and storing DataFrames based on category
for category, group_df1 in grouped_df1:
    df1s[category] = group_df1.copy()

# Previewing each new DataFrame (nothing is saved in this cell)
for category, new_df in df1s.items():
    print(f"DataFrame of Indices for INFORM Year {category}:")
    print(new_df.head())
    print()  # blank line between outputs
    new_df.info()  # info() prints its summary itself and returns None
    print()  # blank line between dataframes
    print()  # blank line between dataframes

DataFrame of Indices for INFORM Year 2015:
       Iso3 IndicatorId                                 IndicatorName  \
293544  AFG  AFF_DR_REL  People affected by droughts (relative) - raw   
293545  AGO  AFF_DR_REL  People affected by droughts (relative) - raw   
293546  ALB  AFF_DR_REL  People affected by droughts (relative) - raw   
293547  ARE  AFF_DR_REL  People affected by droughts (relative) - raw   
293548  ARG  AFF_DR_REL  People affected by droughts (relative) - raw   

        IndicatorScore  SurveyYear Indicator Type  INFORMYear  
293544        0.568526           0    INORM Index        2015  
293545        0.468029           0    INORM Index        2015  
293546        3.170088           0    INORM Index        2015  
293547        0.000000           0    INORM Index        2015  
293548        0.000000           0    INORM Index        2015  

<class 'pandas.core.frame.DataFrame'>
Index: 32541 entries, 293544 to 326084
Data columns (total 7 columns):
 #   Column          Non

#### New INFORM Index DataFrames created and saved to the project folder, one per Index Year 2015-2024

In [23]:
import os

# Define the folder path where you want to save the DataFrames
folder_path = '~/Desktop/CodeOp/DSF02/Group Project'  # Replace this with your desired folder path

# Accessing the new DataFrames and saving them
for category, new_df in df1s.items():
    # Construct the file path
    file_path = os.path.join(folder_path, f"INFORM Index {category}.csv")

    # Save each DataFrame as a separate CSV file in the specified folder
    new_df.to_csv(file_path, index=False)
    print(f"DataFrame for Indices for INFORM Year {category} saved as {file_path}")

DataFrame for Indices for INFORM Year 2015 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2015.csv
DataFrame for Indices for INFORM Year 2016 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2016.csv
DataFrame for Indices for INFORM Year 2017 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2017.csv
DataFrame for Indices for INFORM Year 2018 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2018.csv
DataFrame for Indices for INFORM Year 2019 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2019.csv
DataFrame for Indices for INFORM Year 2020 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2020.csv
DataFrame for Indices for INFORM Year 2021 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2021.csv
DataFrame for Indices for INFORM Year 2022 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2022.csv
DataFrame for Indices for INFORM Year 2023 saved as ~/Desktop/CodeOp/DSF02/Group Project\INFORM Index 2023.csv
D
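
The mixed separators in the log above (`/` followed by `\`) come from joining a hand-typed path with `os.path.join` on Windows. One possible alternative is to build the paths with `pathlib`, expand `~` explicitly, and create the folder if it is missing. The sketch below assumes a hypothetical helper `save_splits` and writes to a temporary directory rather than the real project folder:

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_splits(frames, folder, prefix):
    """Write each DataFrame in `frames` to '<prefix> <key>.csv' inside `folder`."""
    folder = Path(folder).expanduser()         # resolve a leading '~' explicitly
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    written = []
    for key, frame in frames.items():
        path = folder / f"{prefix} {key}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written

# Hypothetical data and a temporary directory, used only for this demonstration
frames = {
    2015: pd.DataFrame({"Iso3": ["AFG"], "INFORMYear": [2015]}),
    2016: pd.DataFrame({"Iso3": ["AGO"], "INFORMYear": [2016]}),
}
out_dir = tempfile.mkdtemp()
paths = save_splits(frames, out_dir, "INFORM Index")

for p in paths:
    print(p.name)
```

With the notebook's data, the call would presumably look like `save_splits(df1s, folder_path, "INFORM Index")`, and the same helper could be reused for the Core Indicators frames.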

#### Splitting Core Indicators (df2) by Index Year

In [24]:
# Grouping by the 'INFORMYear' column
grouped_df2 = df2.groupby('INFORMYear')

# Creating a dictionary to store DataFrames
df2s = {}

# Splitting and storing DataFrames based on category
for category, group_df2 in grouped_df2:
    df2s[category] = group_df2.copy()

# Previewing each new DataFrame (nothing is saved in this cell)
for category, new_df2 in df2s.items():
    print(f"DataFrame of Core Indicators for INFORM Year {category}:")
    print(new_df2.head())
    print()  # blank line between outputs
    new_df2.info()  # info() prints its summary itself and returns None
    print()  # blank line between dataframes
    print()  # blank line between dataframes

DataFrame of Core Indicators for INFORM Year 2015:
      Iso3 IndicatorId                                IndicatorName  \
73480  AFG      AFF_DR  People affected by drought (absolute) - raw   
73481  AGO      AFF_DR  People affected by drought (absolute) - raw   
73482  ALB      AFF_DR  People affected by drought (absolute) - raw   
73483  ARE      AFF_DR  People affected by drought (absolute) - raw   
73484  ARG      AFF_DR  People affected by drought (absolute) - raw   

       IndicatorScore  SurveyYear   Indicator Type  INFORMYear  
73480   186000.000000        2013  Core Indicators        2015  
73481   126968.571429        2013  Core Indicators        2015  
73482    91428.571429        2013  Core Indicators        2015  
73483        0.000000        2013  Core Indicators        2015  
73484        0.000000        2013  Core Indicators        2015  

<class 'pandas.core.frame.DataFrame'>
Index: 14471 entries, 73480 to 146895
Data columns (total 7 columns):
 #   Column          Non-Null C

#### New Core Indicators DataFrames created and saved to the project folder, one per Index Year 2015-2024

In [25]:
# Define the folder path where you want to save the DataFrames
folder_path = '~/Desktop/CodeOp/DSF02/Group Project'  # Replace this with your desired folder path

# Accessing the new DataFrames and saving them
for category, new_df2 in df2s.items():
    # Construct the file path
    file_path = os.path.join(folder_path, f"Core Indicators {category}.csv")

    # Save each DataFrame as a separate CSV file in the specified folder
    new_df2.to_csv(file_path, index=False)
    print(f"DataFrame for Core Indicators for INFORM Year {category} saved as {file_path}")

DataFrame for Core Indicators for INFORM Year 2015 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2015.csv
DataFrame for Core Indicators for INFORM Year 2016 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2016.csv
DataFrame for Core Indicators for INFORM Year 2017 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2017.csv
DataFrame for Core Indicators for INFORM Year 2018 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2018.csv
DataFrame for Core Indicators for INFORM Year 2019 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2019.csv
DataFrame for Core Indicators for INFORM Year 2020 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2020.csv
DataFrame for Core Indicators for INFORM Year 2021 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2021.csv
DataFrame for Core Indicators for INFORM Year 2022 saved as ~/Desktop/CodeOp/DSF02/Group Project\Core Indicators 2022.csv
DataFrame for Core Indic
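
After a split like this, a quick sanity check is to confirm that the per-year pieces contain exactly one year each and together account for every row of the original. A minimal sketch on made-up data (the `original` frame is a hypothetical stand-in for `df2`):

```python
import pandas as pd

# Hypothetical stand-in for df2
original = pd.DataFrame({
    "Iso3": ["AFG", "AGO", "ALB", "ARE"],
    "IndicatorScore": [1.0, 2.0, 3.0, 4.0],
    "INFORMYear": [2015, 2015, 2016, 2024],
})

# Same splitting pattern as in the cells above
parts = {year: group.copy() for year, group in original.groupby("INFORMYear")}

# Every row should land in exactly one piece
total_rows = sum(len(group) for group in parts.values())
rows_match = total_rows == len(original)

# Each piece should hold a single INFORMYear value
one_year_each = all(group["INFORMYear"].nunique() == 1 for group in parts.values())

print(rows_match)
print(one_year_each)
```

The same two checks could be run against `df1s` and `df2s` before exporting, using `len(df1)` and `len(df2)` as the reference totals.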