### Importing Libraries and Loading Datasets
In this section, we import the pandas library and load the datasets required for building the ML model (five files in total). They cover:
- **CO2 Emissions Data**: Data on per capita carbon emissions for each country.
- **Countries Data**: Country-related details, potentially including socio-economic and geographic information.
- **IEA Cleaned Data**: Energy consumption data from the International Energy Agency (IEA).
- **Compliance Emissions and Revenue Data**: Two files, one on emissions compliance and one on the related revenue.

Loading these datasets is the first step: they are the inputs to the data merging, transformation, and feature engineering that follow. Each dataset contributes features that may help predict per capita carbon emissions more accurately.
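
The loading cell in this section passes `encoding='ISO-8859-1'` to `pd.read_csv`. The sketch below illustrates why that argument can matter, assuming the CSV files were saved as Latin-1 rather than UTF-8. The bytes and the names `raw_bytes`, `latin1_df`, and `utf8_failed` are hypothetical and are not taken from the real files.

```python
import io

import pandas as pd

# Hypothetical two-row CSV, encoded as ISO-8859-1 (Latin-1) for illustration only.
raw_bytes = "Country,Value\nCôte d'Ivoire,1\nCuraçao,2\n".encode("ISO-8859-1")

# Decoding with the matching encoding recovers the accented country names.
latin1_df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
print(latin1_df["Country"].tolist())

# The same bytes are not valid UTF-8, so a UTF-8 read is expected to fail.
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
    utf8_failed = False
except UnicodeError:
    utf8_failed = True
print(utf8_failed)
```

If a file was in fact saved as UTF-8, reading it as ISO-8859-1 would not raise an error but could silently garble accented names, which would later break matches on the country column.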

In [2]:
import pandas as pd

# Load the datasets (replace with your file paths)
co2_data = pd.read_excel('CO2 Per Capita Data.xlsx', sheet_name='Sheet1')
countries_data = pd.read_csv('Countries.csv', encoding='ISO-8859-1')
iea_cleaned_data = pd.read_csv('IEA_cleaned.csv', encoding='ISO-8859-1')
compliance_emissions_data = pd.read_csv('compliance_emissions.csv', encoding='ISO-8859-1')
compliance_revenue_data = pd.read_csv('compliance_revenue.csv', encoding='ISO-8859-1')
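
Before merging, it can be worth confirming that each loaded frame has the column used as the join key. The following is a minimal sketch with toy stand-ins for the real datasets. The helper `missing_key_columns` and all toy values are hypothetical, and only the `Country` and `name` column names come from the merge code in this notebook.

```python
import pandas as pd


def missing_key_columns(frames, key="Country"):
    """Return the names of the frames that lack the join key column."""
    return [name for name, frame in frames.items() if key not in frame.columns]


# Toy stand-ins for the loaded datasets (hypothetical values).
toy_frames = {
    "co2_data": pd.DataFrame({"Country": ["Albania"], "Year": [2024]}),
    "countries_data": pd.DataFrame({"name": ["Albania"], "region": ["Europe"]}),
    "iea_cleaned_data": pd.DataFrame({"Country": ["Albania"], "energy": [1.0]}),
}

# Frames listed here need `left_on`/`right_on` (or a rename) when merged.
print(missing_key_columns(toy_frames))
```

In this toy setup only `countries_data` is reported, mirroring the fact that the countries file is joined on `name` rather than `Country`.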

### Merging Datasets
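In this part, we merge the datasets one at a time into a single table that contains all the relevant information. Every merge is a left join on the country name, so each row of the CO2 data is kept, and a country missing from another dataset gets empty (NaN) values in that dataset's columns. The steps are: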

1. **Step 1**: Merging CO2 Data with Countries Data - We merge the CO2 emissions data with the countries data, matching `Country` in the former to `name` in the latter, to bring in socio-economic and geographic information about each country. This context may help explain variations in emissions.

2. **Step 2**: Merging with IEA Cleaned Data - Adding energy consumption data from the IEA lets us relate energy use to CO2 emissions.

3. **Step 3**: Merging Compliance Emissions Data - Compliance-related emissions data adds a regulatory dimension to the dataset, which may help the model capture how regulation relates to emissions.

4. **Step 4**: Merging Compliance Revenue Data - Compliance revenue data reflects the economic side of emissions regulation. This feature may capture the economic incentive or cost associated with emissions control.

5. **Step 5**: Review of Merged Data - After merging, we print the first rows of the result as a quick check that the expected columns are present and the table is structured correctly.

6. **Step 6**: Saving the Final Merged Dataset - Finally, we save the merged dataset as a CSV file (`final_merged_flat_table.csv`) for use in the ML model. This flat file combines the features gathered for predicting per capita carbon emissions. Careful merging matters here, because mismatched country names or duplicated keys would leave gaps or repeated rows in the training data.
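
A left join can silently leave unmatched countries with empty columns, or duplicate rows when the right-hand table repeats a key. The sketch below shows one way to check for both, using the `validate` and `indicator` parameters of `DataFrame.merge`. The toy frames and their values are hypothetical, and this is an optional check rather than part of the notebook's pipeline.

```python
import pandas as pd

# Toy stand-ins for the CO2 and countries data (hypothetical values).
co2_toy = pd.DataFrame(
    {"Country": ["Albania", "Algeria", "Andorra"], "co2_per_capita": [1.0, 2.0, 3.0]}
)
countries_toy = pd.DataFrame(
    {"name": ["Albania", "Algeria"], "region": ["Europe", "Africa"]}
)

checked = co2_toy.merge(
    countries_toy,
    left_on="Country",
    right_on="name",
    how="left",
    validate="many_to_one",  # raises MergeError if `name` is duplicated on the right
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
)

# Countries present in the CO2 data but absent from the countries data.
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].tolist()
print(unmatched)

# With unique right-hand keys, a left join keeps the row count unchanged.
print(len(checked) == len(co2_toy))
```

Unmatched names found this way often point to spelling differences between sources, which can be fixed before the merge instead of being discovered as missing values during model training.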

In [3]:
# Step 1: Merge CO2 data with Countries Data on 'Country' (CO2) and 'name' (Countries)
merged_data = co2_data.merge(countries_data, left_on='Country', right_on='name', how='left')

# Step 2: Merge with IEA Cleaned data
merged_data = merged_data.merge(iea_cleaned_data, on='Country', how='left')

# Step 3: Merge with Compliance Emissions data
merged_data = merged_data.merge(compliance_emissions_data, on='Country', how='left')

# Step 4: Merge with Compliance Revenue data
merged_data = merged_data.merge(compliance_revenue_data, on='Country', how='left')

# Step 5: Review the merged data to ensure there are no missing columns and that the data is in the right structure
print(merged_data.head())

# Step 6: Save the final merged data as a flat file (CSV) for ML use
merged_data.to_csv('final_merged_flat_table.csv', index=False)

          Country  Year  Total Population  Growth Rate %  Land Area (km²)  \
0     Afghanistan  2024          40121552           2.22         652867.0   
1         Albania  2024           3107100           0.16          28748.0   
2         Algeria  2024          47022473           1.54        2381741.0   
3  American Samoa  2024             43895          -1.54            199.0   
4         Andorra  2024             85370          -0.12            468.0   

   Forest Cover Area (km²)  Per Capita GDP (PPP) (USD)  \
0                   1208.0                        2065   
1                   7890.0                       15000   
2                  19490.0                       13000   
3                     17.0                       11200   
4                     16.0                       49900   

   Per Capita Energy Consumption (kWh)  Per Capita Carbon Emissions (tonnes)  \
0                                  677                                 0.212   
1                           