# CO2 Emission Model Deployment and Performance Testing
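This notebook trains a regression model on the cleaned CO2 emissions data, saves it with `pickle`, reloads it, and evaluates the reloaded model on a held-out test set, approximating how the model would be used after deployment.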

In [1]:
# Import necessary libraries
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression  # Example ML model

## Load the Preprocessed Data

In [2]:
# Load the preprocessed data
data_path = '../data/cleaned/co2_emissions_cleaned.csv'
df = pd.read_csv(data_path)

# Select features and target by position.
# Assumption: column 1 holds the target year and columns 2 onward hold the features;
# confirm this against df.columns before training.
X = df.iloc[:, 2:]  # Feature columns
y = df.iloc[:, 1]   # Target column (CO2 emissions for a specific year)

# The train/test split is done in the training cell below, after the inspection step.


## Train and Save the Model

In [10]:
print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(X.head())  # Check the first few rows of X
print(y.head())  # Check the first few rows of y



X shape: (0, 33)
y shape: (0,)
Empty DataFrame
Columns: [country_name, country_code, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Index: []

[0 rows x 33 columns]
Series([], Name: 2020, dtype: float64)
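The output above shows zero rows, so every later step that needs data will fail. A minimal sketch of a check that could run right after loading is given here. `summarize_frame` is a hypothetical helper, and the in-memory CSV text with placeholder countries stands in for the real file, whose contents are not shown in this notebook.

```python
import io

import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Return basic facts worth checking before any modeling (hypothetical helper)."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "numeric_cols": numeric_cols,
        "non_numeric_cols": [c for c in df.columns if c not in numeric_cols],
    }


# Illustrative stand-in for the cleaned CSV: header only, no data rows.
empty_csv = io.StringIO("country_name,country_code,1990,1991\n")
empty_summary = summarize_frame(pd.read_csv(empty_csv))

# Illustrative stand-in with two made-up rows, one containing a missing value.
small_csv = io.StringIO(
    "country_name,country_code,1990,1991\n"
    "Country A,AAA,1.0,1.5\n"
    "Country B,BBB,2.0,\n"
)
small_summary = summarize_frame(pd.read_csv(small_csv))

print(f"Rows in empty file: {empty_summary['n_rows']}")
print(f"Rows in small file: {small_summary['n_rows']}")
print(f"Numeric columns: {small_summary['numeric_cols']}")
print(f"Non-numeric columns: {small_summary['non_numeric_cols']}")
```

A row count of zero points to a problem upstream (the file path or the cleaning step), and a non-empty list of non-numeric columns flags columns that a linear model cannot consume directly.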


In [9]:
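from pathlib import Path

# Keep only numeric columns; LinearRegression cannot fit on text columns
# such as country_name or country_code
X = X.select_dtypes(include='number')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fill NaN values with column means computed on the training set only,
# so that no test-set information leaks into training
train_means = X_train.mean()
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Save the model using pickle, creating the target directory if it does not exist
model_filename = 'models/co2_model.pkl'
Path(model_filename).parent.mkdir(parents=True, exist_ok=True)
with open(model_filename, 'wb') as file:
    pickle.dump(model, file)

print(f'Model saved as {model_filename}')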


ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
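This error follows directly from the zero-row data shown earlier: `train_test_split` cannot split zero samples. One way to surface the problem with a clearer message is to check the row count before splitting. The sketch below assumes a hypothetical helper named `split_with_check` and uses made-up numbers in place of the real emissions data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_with_check(X, y, test_size=0.2, random_state=42):
    """Split X and y, failing early with a clear message when there are no rows.

    Hypothetical helper, not part of the original notebook.
    """
    if len(X) == 0:
        raise ValueError(
            "No rows to split: check the CSV path and the cleaning step "
            "that produced the file."
        )
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Made-up data, for illustration only.
X_demo = pd.DataFrame({"1990": [1.0, 2.0, 3.0, 4.0, 5.0],
                       "1991": [1.1, 2.1, 3.1, 4.1, 5.1]})
y_demo = pd.Series([1.2, 2.2, 3.2, 4.2, 5.2], name="1992")

X_tr, X_te, y_tr, y_te = split_with_check(X_demo, y_demo)
print(f"Train shape: {X_tr.shape}, test shape: {X_te.shape}")

# The same call on an empty frame raises the clearer error.
try:
    split_with_check(X_demo.iloc[0:0], y_demo.iloc[0:0])
except ValueError as err:
    message = str(err)
    print(message)
```

The guard does not fix the underlying issue. It only moves the failure closer to its cause, which here is an empty input file or an over-aggressive cleaning step.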

## Load the Model and Test Performance

In [None]:
# Load the saved model
with open(model_filename, 'rb') as file:
    loaded_model = pickle.load(file)
print('Model loaded successfully')
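A successful `pickle.load` does not by itself show that the restored model behaves like the original. A minimal round-trip check, sketched below on made-up data with a temporary file (the file name `demo_model.pkl` is hypothetical), compares predictions before and after serialization.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data following y = 2*x0 + 3*x1 + 1, for illustration only.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 2 * X_demo[:, 0] + 3 * X_demo[:, 1] + 1

model = LinearRegression().fit(X_demo, y_demo)

# Write the model to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "demo_model.pkl"  # hypothetical file name
    with open(model_path, "wb") as file:
        pickle.dump(model, file)
    with open(model_path, "rb") as file:
        restored = pickle.load(file)

# The restored model should reproduce the original predictions.
round_trip_ok = bool(np.allclose(model.predict(X_demo), restored.predict(X_demo)))
print(f"Round trip preserved predictions: {round_trip_ok}")
```

Two cautions apply to any pickle-based workflow: only unpickle files from a trusted source, and load the model with the same scikit-learn version that saved it where possible, since compatibility across versions is not guaranteed.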

## Evaluate Model Performance on Test Data

In [None]:
# Make predictions on the test set
y_pred = loaded_model.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')
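MSE is expressed in squared units of the target, which makes it hard to read on its own. As a possible complement, the sketch below also reports RMSE and MAE, both in the target's original units. The arrays are made-up values chosen so that the results can be checked by hand, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_hat)    # mean of squared errors
rmse = float(np.sqrt(mse))                 # same units as the target
mae = mean_absolute_error(y_true, y_hat)   # mean of absolute errors
r2 = r2_score(y_true, y_hat)               # share of variance explained

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"MAE:  {mae:.3f}")
print(f"R^2:  {r2:.3f}")
```

For these toy values the errors are 1, 0, -1, and 0, so MSE is 2/4 = 0.5 and MAE is 0.5. The total sum of squares around the mean of 6 is 20, which gives R^2 = 1 - 2/20 = 0.9.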

## Next Steps
- Deploy the model to a cloud environment (e.g., AWS, Google Cloud) if needed.
- Build an API using Flask or FastAPI to allow real-time predictions from the saved model.
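Before choosing between Flask and FastAPI, the request-handling logic can be written and tested as a plain function that either framework could call. The sketch below is one possible shape for it. `predict_from_payload` and `FEATURE_NAMES` are hypothetical names, and the model is fitted on made-up data rather than loaded from `co2_model.pkl`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature order; a real service would store this alongside the model.
FEATURE_NAMES = ["1990", "1991"]


def predict_from_payload(model, payload: dict) -> dict:
    """Validate a JSON-like payload and return a prediction (hypothetical helper)."""
    missing = [name for name in FEATURE_NAMES if name not in payload]
    if missing:
        return {"ok": False, "error": f"missing features: {missing}"}
    try:
        row = [float(payload[name]) for name in FEATURE_NAMES]
    except (TypeError, ValueError):
        return {"ok": False, "error": "feature values must be numeric"}
    prediction = model.predict(np.array([row]))[0]
    return {"ok": True, "prediction": float(prediction)}


# Made-up model trained on y = x0 + x1, for illustration only.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0]])
demo_model = LinearRegression().fit(X_demo, X_demo.sum(axis=1))

good = predict_from_payload(demo_model, {"1990": 2.0, "1991": 3.0})
bad = predict_from_payload(demo_model, {"1990": 2.0})
print(good)
print(bad)
```

Keeping validation and prediction in one framework-independent function means the same code can be unit tested locally and later wrapped in a route handler without changes.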