# Regression Analysis

## Table of Contents

1. [Setup and Installation](#1-setup-and-installation)
   - [Required Packages](#required-packages)

2. [Data Extraction](#2-data-extraction)
   - [2.1 Load CSV Dataset](#21-load-csv-dataset)
   - [2.2 Data Inspection](#22-data-inspection)


5. [Data Cleaning and Transformation](#5-data-cleaning-and-transformation)
   - [5.1 Data Cleaning Process](#51-data-cleaning-process)
   - [5.2 Cleaned Dataset Preview](#52-cleaned-dataset-preview)

6. [Exploratory Data Analysis](#6-exploratory-data-analysis)
   - [6.1 ](#61-seasonal-analysis)
   - [6.2 ](#62-price-trends)


## 1. Setup and Installation

### Required Packages
The following libraries are required for this analysis:
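- **pandas**: Data manipulation and analysis
- **Matplotlib**: Data visualization
- **Seaborn**: Statistical data visualization
- **NumPy**: Numerical manipulation
- **statsmodels**: Fitting statistical models to data
- **scikit-learn**: Preprocessing, train/test splitting, regression models, and metrics
- **ssl**: Python standard-library module, used here to configure the default HTTPS context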

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
import ssl
import warnings
warnings.filterwarnings('ignore')

# Disable HTTPS certificate verification for downloads.
# Caution: this is insecure; prefer a valid CA certificate bundle where possible.
ssl._create_default_https_context = ssl._create_unverified_context

In [13]:
!pip install -r requirements.txt



## 2. Data Extraction

### 2.1 Load CSV Dataset

In [None]:
df = pd.read_csv('co2_emissions_from_agri.csv', index_col=0)
df.head(10)

Area,Year,Savanna fires,Forest fires,Crop Residues,Rice Cultivation,Drained organic soils (CO2),Pesticides Manufacturing,Food Transport,Forestland,Net Forest conversion,...,Manure Management,Fires in organic soils,Fires in humid tropical forests,On-farm energy use,Rural population,Urban population,Total Population - Male,Total Population - Female,total_emission,Average Temperature °C
Afghanistan,1990,14.7237,0.0557,205.6077,686.0,0.0,11.807483,63.1152,-2388.803,0.0,...,319.1763,0.0,0.0,,9655167.0,2593947.0,5348387.0,5346409.0,2198.963539,0.536167
Afghanistan,1991,14.7237,0.0557,209.4971,678.16,0.0,11.712073,61.2125,-2388.803,0.0,...,342.3079,0.0,0.0,,10230490.0,2763167.0,5372959.0,5372208.0,2323.876629,0.020667
Afghanistan,1992,14.7237,0.0557,196.5341,686.0,0.0,11.712073,53.317,-2388.803,0.0,...,349.1224,0.0,0.0,,10995568.0,2985663.0,6028494.0,6028939.0,2356.304229,-0.259583
Afghanistan,1993,14.7237,0.0557,230.8175,686.0,0.0,11.712073,54.3617,-2388.803,0.0,...,352.2947,0.0,0.0,,11858090.0,3237009.0,7003641.0,7000119.0,2368.470529,0.101917
Afghanistan,1994,14.7237,0.0557,242.0494,705.6,0.0,11.712073,53.9874,-2388.803,0.0,...,367.6784,0.0,0.0,,12690115.0,3482604.0,7733458.0,7722096.0,2500.768729,0.37225
Afghanistan,1995,14.7237,0.0557,243.8152,666.4,0.0,11.712073,54.6445,-2388.803,0.0,...,397.5498,0.0,0.0,,13401971.0,3697570.0,8219467.0,8199445.0,2624.612529,0.285583
Afghanistan,1996,38.9302,0.2014,249.0364,686.0,0.0,11.712073,53.1637,-2388.803,0.0,...,465.205,0.0,0.0,,13952791.0,3870093.0,8569175.0,8537421.0,2838.921329,0.036583
Afghanistan,1997,30.9378,0.1193,276.294,705.6,0.0,11.712073,52.039,-2388.803,0.0,...,511.5927,0.0,0.0,,14373573.0,4008032.0,8916862.0,8871958.0,3204.180115,0.415167
Afghanistan,1998,64.1411,0.3263,287.4346,705.6,0.0,11.712073,52.705,-2388.803,0.0,...,541.6598,0.0,0.0,,14733655.0,4130344.0,9275541.0,9217591.0,3560.716661,0.890833
Afghanistan,1999,46.1683,0.0895,247.498,548.8,0.0,11.712073,35.763,-2388.803,0.0,...,611.0611,0.0,0.0,,15137497.0,4266179.0,9667811.0,9595036.0,3694.806533,1.0585
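Because `index_col=0` is passed, the first CSV column (`Area`) becomes the row index, and that index repeats once per year for each country. The sketch below illustrates this on a tiny in-memory CSV; the sample text is a hypothetical stand-in for `co2_emissions_from_agri.csv`, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample mimicking the layout of the real CSV
csv_text = (
    "Area,Year,Savanna fires\n"
    "Afghanistan,1990,14.7237\n"
    "Afghanistan,1991,14.7237\n"
)

# index_col=0 turns the first column ("Area") into the row index
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(demo.index.name)       # name of the index column
print(demo.index.is_unique)  # the index repeats for each year

# reset_index() restores "Area" as an ordinary column when needed
flat = demo.reset_index()
print(list(flat.columns))
```

Keeping `Area` in the index leaves only numeric columns in the frame, which is convenient for the numeric summaries that follow.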


### 2.2 Data Inspection

In [35]:
# Show structure and types
df.info()

# Describe numeric columns
df.describe().T

<class 'pandas.core.frame.DataFrame'>
Index: 6965 entries, Afghanistan to Zimbabwe
Data columns (total 30 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Year                             6965 non-null   int64  
 1   Savanna fires                    6934 non-null   float64
 2   Forest fires                     6872 non-null   float64
 3   Crop Residues                    5576 non-null   float64
 4   Rice Cultivation                 6965 non-null   float64
 5   Drained organic soils (CO2)      6965 non-null   float64
 6   Pesticides Manufacturing         6965 non-null   float64
 7   Food Transport                   6965 non-null   float64
 8   Forestland                       6472 non-null   float64
 9   Net Forest conversion            6472 non-null   float64
 10  Food Household Consumption       6492 non-null   float64
 11  Food Retail                      6965 non-null   float64
 12  On-farm Ele

Column,count,mean,std,min,25%,50%,75%,max
Year,6965.0,2005.125,8.894665,1990.0,1997.0,2005.0,2013.0,2020.0
Savanna fires,6934.0,1188.391,5246.288,0.0,0.0,1.65185,111.0814,114616.4
Forest fires,6872.0,919.3022,3720.079,0.0,0.0,0.5179,64.95077,52227.63
Crop Residues,5576.0,998.7063,3700.345,0.0002,11.006525,103.6982,377.641,33490.07
Rice Cultivation,6965.0,4259.667,17613.83,0.0,181.2608,534.8174,1536.64,164915.3
Drained organic soils (CO2),6965.0,3503.229,15861.45,0.0,0.0,0.0,690.4088,241025.1
Pesticides Manufacturing,6965.0,333.4184,1429.159,0.0,6.0,13.0,116.3255,16459.0
Food Transport,6965.0,1939.582,5616.749,0.0001,27.9586,204.9628,1207.001,67945.76
Forestland,6472.0,-17828.29,81832.21,-797183.079,-2848.35,-62.92,0.0,171121.1
Net Forest conversion,6472.0,17605.64,101157.5,0.0,0.0,44.44,4701.746,1605106.0


In [None]:

df_clean = df.copy()
    
# Handle missing values in numeric columns
num_cols = df_clean.select_dtypes(include=[np.number]).columns
for col in num_cols:
    if df_clean[col].isnull().any():
        median_val = df_clean[col].median()
        df_clean[col] = df_clean[col].fillna(median_val)
        print(f"Filled missing values in '{col}' with median: {median_val:.2f}")

Filled missing values in 'Savanna fires' with median: 1.65
Filled missing values in 'Forest fires' with median: 0.52
Filled missing values in 'Crop Residues' with median: 103.70
Filled missing values in 'Forestland' with median: -62.92
Filled missing values in 'Net Forest conversion' with median: 44.44
Filled missing values in 'Food Household Consumption' with median: 155.47
Filled missing values in 'IPPU' with median: 803.71
Filled missing values in 'Manure applied to Soils' with median: 120.44
Filled missing values in 'Manure Management' with median: 269.86
Filled missing values in 'Fires in humid tropical forests' with median: 0.00
Filled missing values in 'On-farm energy use' with median: 141.10


In [47]:
# Get all column names
cols = df_clean.columns.tolist()

# Check if the column exists
if 'Drained organic soils (CO2)' in cols:
    # Move the column to the first position
    cols.remove('Drained organic soils (CO2)')
    new_cols = ['Drained organic soils (CO2)'] + cols
    df_clean = df_clean[new_cols]
    print("Reordered columns successfully!")

df_clean.head()

Reordered columns successfully!


Area,Drained organic soils (CO2),Year,Savanna fires,Forest fires,Crop Residues,Rice Cultivation,Pesticides Manufacturing,Food Transport,Forestland,Net Forest conversion,...,Manure Management,Fires in organic soils,Fires in humid tropical forests,On-farm energy use,Rural population,Urban population,Total Population - Male,Total Population - Female,total_emission,Average Temperature °C
Afghanistan,0.0,1990,14.7237,0.0557,205.6077,686.0,11.807483,63.1152,-2388.803,0.0,...,319.1763,0.0,0.0,141.0963,9655167.0,2593947.0,5348387.0,5346409.0,2198.963539,0.536167
Afghanistan,0.0,1991,14.7237,0.0557,209.4971,678.16,11.712073,61.2125,-2388.803,0.0,...,342.3079,0.0,0.0,141.0963,10230490.0,2763167.0,5372959.0,5372208.0,2323.876629,0.020667
Afghanistan,0.0,1992,14.7237,0.0557,196.5341,686.0,11.712073,53.317,-2388.803,0.0,...,349.1224,0.0,0.0,141.0963,10995568.0,2985663.0,6028494.0,6028939.0,2356.304229,-0.259583
Afghanistan,0.0,1993,14.7237,0.0557,230.8175,686.0,11.712073,54.3617,-2388.803,0.0,...,352.2947,0.0,0.0,141.0963,11858090.0,3237009.0,7003641.0,7000119.0,2368.470529,0.101917
Afghanistan,0.0,1994,14.7237,0.0557,242.0494,705.6,11.712073,53.9874,-2388.803,0.0,...,367.6784,0.0,0.0,141.0963,12690115.0,3482604.0,7733458.0,7722096.0,2500.768729,0.37225
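Placing the target column first lets later cells treat `columns[0]` as the response and `columns[1:]` as the predictors. The reordering could also be wrapped in a small function, sketched below; `move_to_front` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def move_to_front(frame, column):
    """Return `frame` with `column` moved to the first position."""
    if column not in frame.columns:
        raise KeyError(column)
    rest = [c for c in frame.columns if c != column]
    return frame[[column] + rest]

# Hypothetical toy frame
demo = pd.DataFrame({"a": [1], "b": [2], "target": [3]})
reordered = move_to_front(demo, "target")
print(list(reordered.columns))
```

Raising `KeyError` for a missing column, rather than silently skipping the reorder, makes a misspelled target name visible immediately.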


In [52]:
# Generate the regression formula. Column names contain spaces and symbols,
# which the formula parser rejects unquoted (SyntaxError), so wrap each in Q("...")
quoted = [f'Q("{c}")' for c in df_clean.columns]
formula_str = quoted[0] + ' ~ ' + ' + '.join(quoted[1:])
formula_str

'Q("Drained organic soils (CO2)") ~ Q("Year") + Q("Savanna fires") + Q("Forest fires") + Q("Crop Residues") + Q("Rice Cultivation") + Q("Pesticides Manufacturing") + Q("Food Transport") + Q("Forestland") + Q("Net Forest conversion") + Q("Food Household Consumption") + Q("Food Retail") + Q("On-farm Electricity Use") + Q("Food Packaging") + Q("Agrifood Systems Waste Disposal") + Q("Food Processing") + Q("Fertilizers Manufacturing") + Q("IPPU") + Q("Manure applied to Soils") + Q("Manure left on Pasture") + Q("Manure Management") + Q("Fires in organic soils") + Q("Fires in humid tropical forests") + Q("On-farm energy use") + Q("Rural population") + Q("Urban population") + Q("Total Population - Male") + Q("Total Population - Female") + Q("total_emission") + Q("Average Temperature °C")'

In [53]:
# Construct and fit the model using OLS
model = sm.ols(formula=formula_str, data=df_clean)
fitted = model.fit()

# Print the summary of the model
print(fitted.summary())
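Formula parsing is sensitive to column names that contain spaces, parentheses, or symbols such as `°`. One way to keep formulas readable, sketched here under assumptions, is to rename the columns to valid Python identifiers before building the formula string. `to_identifier` is a hypothetical helper, and the toy frame stands in for `df_clean`.

```python
import re
import pandas as pd

def to_identifier(name):
    """Hypothetical helper: turn a column label into a valid Python identifier."""
    # Replace every run of non-word characters with a single underscore
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    # Identifiers cannot start with a digit, so prefix those
    return f"col_{cleaned}" if cleaned[:1].isdigit() else cleaned

# Hypothetical toy frame using a few of the dataset's column names
demo = pd.DataFrame({
    "Drained organic soils (CO2)": [0.0, 1.0],
    "Savanna fires": [14.7, 30.9],
    "Average Temperature °C": [0.5, 0.4],
})

renamed = demo.rename(columns=to_identifier)
formula = renamed.columns[0] + " ~ " + " + ".join(renamed.columns[1:])
print(formula)
```

Two different labels could collapse to the same identifier under this scheme, so it is worth checking that the renamed columns remain unique before fitting.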