**WEATHER MODEL**

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

In [None]:
# Load the dataset
url = 'https://raw.githubusercontent.com/noobstang/NNtraining/master/Weather49Sets/weatherstats_ottawa_daily.csv'

weather_data = pd.read_csv(url)
weather_data.head(5)



Unnamed: 0,date,max_temperature,avg_hourly_temperature,avg_temperature,min_temperature,max_humidex,min_windchill,max_relative_humidity,avg_hourly_relative_humidity,avg_relative_humidity,...,avg_cloud_cover_4,min_cloud_cover_4,max_cloud_cover_8,avg_hourly_cloud_cover_8,avg_cloud_cover_8,min_cloud_cover_8,max_cloud_cover_10,avg_hourly_cloud_cover_10,avg_cloud_cover_10,min_cloud_cover_10
0,2024-01-31,2.0,-3.24,-3.0,-8.0,,-13.0,89,82.5,80.5,...,,,8.0,8.0,8.0,8.0,,,,
1,2024-01-30,-4.0,-6.08,-5.85,-7.7,,-15.0,91,73.4,76.5,...,,,8.0,7.7,7.5,7.0,,,,
2,2024-01-29,-0.2,-2.75,-4.05,-7.9,,-10.0,89,77.1,77.5,...,,,8.0,5.2,4.5,1.0,,,,
3,2024-01-28,1.6,0.35,0.5,-0.6,,-4.0,100,91.2,87.0,...,,,8.0,6.6,5.0,2.0,,,,
4,2024-01-27,1.1,0.28,0.45,-0.2,,-3.0,100,98.0,95.0,...,,,8.0,7.3,5.0,2.0,,,,


In [None]:
# Convert 'date' column to datetime type for filtering
weather_data['date'] = pd.to_datetime(weather_data['date'])

# Filter data for dates between May 1st and November 30th for each year
filtered_data = weather_data[(weather_data['date'].dt.month >= 5) & (weather_data['date'].dt.month <= 11)]

# Further filter data for the years 2013 to 2023
filtered_data = filtered_data[(filtered_data['date'].dt.year >= 2013) & (filtered_data['date'].dt.year <= 2023)]

# Select the required columns
selected_columns = ['date', 'avg_temperature', 'precipitation', 'solar_radiation', 'avg_pressure_sea']
final_data = filtered_data[selected_columns]

# Split the dataset into training (2013-2022) and testing (2023)
train_data = final_data[final_data['date'].dt.year < 2023]
test_data = final_data[final_data['date'].dt.year == 2023]


**Model**

In [None]:
# Standardizing the features
scaler = StandardScaler()
X_train = scaler.fit_transform(train_data.drop('date', axis=1))
X_test = scaler.transform(test_data.drop('date', axis=1))

# Imputing missing values
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)

# Target variable
y_train = train_data['avg_temperature'].values
y_test = test_data['avg_temperature'].values




In [None]:
# Constructing and training the MLP model
mlp_model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=500, random_state=42)
mlp_model.fit(X_train_imputed, y_train)

# Predicting on the test data
y_pred = mlp_model.predict(X_test_imputed)

# Evaluating the model's performance
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f'Mean Absolute Error: {mae}')
print(f'Root Mean Square Error: {rmse}')

Mean Absolute Error: 0.07102227543533181
Root Mean Square Error: 0.09701273069296401


In [None]:
# Assuming 'test_data' still includes the 'date' column and corresponds to your test dataset
test_dates = test_data['date'].reset_index(drop=True)

# Creating a DataFrame for comparison
comparison_with_dates = pd.DataFrame({
    'Date': test_dates,
    'Actual Temperature': y_test,
    'Predicted Temperature': y_pred
})

# Printing the first 5 rows of the comparison
print(comparison_with_dates.head(5))


        Date  Actual Temperature  Predicted Temperature
0 2023-11-30                2.15               2.081863
1 2023-11-29               -4.14              -4.121984
2 2023-11-28               -5.75              -5.796403
3 2023-11-27                1.35               1.281115
4 2023-11-26                0.29               0.221973






---


The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) on the 2023 test set are very low: an MAE of approximately 0.071 degrees and an RMSE of approximately 0.097 degrees. These numbers need careful reading, however. The feature matrix is built from every column of `train_data` except `date`, so the standardized `avg_temperature` is itself one of the inputs. The model is therefore largely recovering a value it has already been given, and the scores show how well it inverts the scaling rather than how well it forecasts temperature from other variables.

These metrics are particularly useful for evaluating the performance of regression models:

- MAE provides a straightforward interpretation of the average magnitude of errors between the predicted and actual values, without considering their direction. A lower MAE value indicates better model performance, with 0 being a perfect score.
- RMSE gives a higher weight to larger errors due to the squaring of each term, which means it is particularly sensitive to outliers in the prediction errors. Similar to MAE, a lower RMSE value indicates better model performance, and a score of 0 would mean the model is perfect.

Given the context of weather prediction, where even small changes can be significant, MAE and RMSE values this low would be impressive for a genuine forecast. Before relying on the model for practical applications such as agricultural planning or event planning within the geographic area and time frame the data covers, the target should be removed from the inputs and the model re-evaluated.
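In the cells above, `X_train` is built from every column except `date`, so the target column is also an input. One way to avoid this, sketched below under stated assumptions, is to predict each day's temperature from the previous day's measurements. The frame `demo`, its synthetic values, and the `_prev` suffix are illustrative and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for final_data (made-up values, not the Ottawa dataset)
rng = np.random.default_rng(0)
dates = pd.date_range("2023-05-01", periods=10, freq="D")
demo = pd.DataFrame({
    "date": dates[::-1],  # newest first, like the source file
    "avg_temperature": rng.normal(15, 5, size=10),
    "precipitation": rng.gamma(1.0, 2.0, size=10),
    "solar_radiation": rng.normal(15000, 3000, size=10),
    "avg_pressure_sea": rng.normal(101.3, 0.5, size=10),
})

target = "avg_temperature"
feature_cols = ["avg_temperature", "precipitation",
                "solar_radiation", "avg_pressure_sea"]

# Sort oldest first so that shift(1) means "the previous day"
demo = demo.sort_values("date").reset_index(drop=True)

# Features are yesterday's measurements; the target is today's temperature
lagged = demo[feature_cols].shift(1).add_suffix("_prev")
supervised = pd.concat([demo[["date", target]], lagged], axis=1).dropna()

X = supervised.drop(columns=["date", target])
y = supervised[target]
print(X.columns.tolist())
```

With the May to November filter applied, the first row of each season would take its lagged values from the previous November, so those rows would likely need to be dropped as well.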

**OTHER ATTRIBUTES**

In [None]:
# Assuming 'train_data' and 'test_data' have not been altered and still contain all necessary columns
# Set 'precipitation' as the target variable
y_train_precipitation = train_data['precipitation'].values
y_test_precipitation = test_data['precipitation'].values

# Set 'solar_radiation' as the target variable
y_train_solar = train_data['solar_radiation'].values
y_test_solar = test_data['solar_radiation'].values

# Set 'avg_pressure_sea' as the target variable
y_train_pressure = train_data['avg_pressure_sea'].values
y_test_pressure = test_data['avg_pressure_sea'].values


**Precipitation**

In [None]:
mlp_precipitation = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=500, random_state=42)
mlp_precipitation.fit(X_train_imputed, y_train_precipitation)

# Predicting and evaluating for precipitation
y_pred_precipitation = mlp_precipitation.predict(X_test_imputed)
mae_precipitation = mean_absolute_error(y_test_precipitation, y_pred_precipitation)
rmse_precipitation = np.sqrt(mean_squared_error(y_test_precipitation, y_pred_precipitation))

print(f'Precipitation - Mean Absolute Error: {mae_precipitation}')
print(f'Precipitation - Root Mean Square Error: {rmse_precipitation}')

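# Compare actual vs. predicted precipitation for the first 5 data points
comparison_p = pd.DataFrame({
    'Date': test_dates,
    'Actual Precipitation': y_test_precipitation,
    'Predicted Precipitation': y_pred_precipitation
})
print(comparison_p.head(5))



Precipitation - Mean Absolute Error: 0.038120378520387715
Precipitation - Root Mean Square Error: 0.05479543638407956
        Date  Actual Precipitation   Predicted Precipitation
0 2023-11-30                 4460.0                -0.055228
1 2023-11-29                 4626.0                 0.214526
2 2023-11-28                 6134.0                 0.029430
3 2023-11-27                 3812.0                -0.054412
4 2023-11-26                 5107.0                 0.491985

The 'Actual Precipitation' column in this recorded output holds solar radiation values (compare the solar radiation table in the next section), because the run that produced it passed `y_test_solar` to the DataFrame. The error metrics were computed against `y_test_precipitation` and are unaffected.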


**Solar Radiation**

In [None]:
# Impute missing values in the target variable (solar radiation)
imputer_target = SimpleImputer(strategy='mean')

# Reshape y_train_solar to a single column, impute, then flatten back to 1-D
y_train_solar_reshaped = y_train_solar.reshape(-1, 1)
y_train_solar_imputed = imputer_target.fit_transform(y_train_solar_reshaped).ravel()

# Train the MLP model for solar radiation prediction on the imputed target
# (fitting on y_train_solar directly fails when it contains NaN values)
mlp_model_solar = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=500, random_state=42)
mlp_model_solar.fit(X_train_imputed, y_train_solar_imputed)

# Proceed with prediction and evaluation as before
y_pred_solar = mlp_model_solar.predict(X_test_imputed)

mae_solar = mean_absolute_error(y_test_solar, y_pred_solar)
rmse_solar = np.sqrt(mean_squared_error(y_test_solar, y_pred_solar))

print(f'Mean Absolute Error for Solar Radiation: {mae_solar}')
print(f'Root Mean Square Error for Solar Radiation: {rmse_solar}')

# Assuming 'test_dates' is a Series or list with the dates corresponding to y_test_solar
comparison_solar = pd.DataFrame({
    'Date': test_dates,
    'Actual Solar Radiation': y_test_solar,
    'Predicted Solar Radiation': y_pred_solar
})
print(comparison_solar.head(5))


Mean Absolute Error for Solar Radiation: 80.95408881189307
Root Mean Square Error for Solar Radiation: 160.35346221671827
        Date  Actual Solar Radiation  Predicted Solar Radiation
0 2023-11-30                  4460.0                4541.539236
1 2023-11-29                  4626.0                4672.979376
2 2023-11-28                  6134.0                6214.097146
3 2023-11-27                  3812.0                3942.978314
4 2023-11-26                  5107.0                5107.581067









MAE (80.9541): On average, the model's predictions for solar radiation deviate from the actual measurements by about 80.95 units (the notebook does not state the unit of `solar_radiation`). Against the values in the five rows shown, which range from 3812 to 6134, this is an error of roughly 1 to 2 percent, so the model provides a relatively close approximation of the actual solar radiation values.

RMSE (160.3535): The higher value of RMSE compared to MAE indicates that there are some predictions with significant errors, as RMSE gives more weight to larger errors due to the squaring of the error terms. The difference between MAE and RMSE suggests the presence of outliers or predictions with substantial errors.

The actual vs. predicted solar radiation values for the first five data points show that the model can closely predict solar radiation for certain days, with some predictions being very close to the actual values (e.g., on 2023-11-26, the prediction is almost exactly the actual value). However, the presence of larger errors that contribute to the high RMSE value indicates variability in the model's predictive accuracy across different days.
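To see which days drive the larger errors, one option is to add an absolute-error column to the comparison table and sort by it. The sketch below rebuilds only the five rows printed above; the column name `Absolute Error` and the variable `worst_first` are illustrative additions.

```python
import pandas as pd

# The five rows printed in the solar radiation output above
comparison_solar = pd.DataFrame({
    "Date": pd.to_datetime(["2023-11-30", "2023-11-29", "2023-11-28",
                            "2023-11-27", "2023-11-26"]),
    "Actual Solar Radiation": [4460.0, 4626.0, 6134.0, 3812.0, 5107.0],
    "Predicted Solar Radiation": [4541.539236, 4672.979376, 6214.097146,
                                  3942.978314, 5107.581067],
})

# Per-day absolute error, largest first
comparison_solar["Absolute Error"] = (
    comparison_solar["Actual Solar Radiation"]
    - comparison_solar["Predicted Solar Radiation"]
).abs()
worst_first = comparison_solar.sort_values("Absolute Error", ascending=False)
print(worst_first.head(3))
```

Applied to the full test set rather than five rows, the same sort would list the days that contribute most to the gap between RMSE and MAE.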

**Pressure**

In [None]:
# Train the MLP model for average pressure at sea level prediction
mlp_model_pressure = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=500, random_state=42)
mlp_model_pressure.fit(X_train_imputed, y_train_pressure)

# Predict average pressure at sea level on the test data
y_pred_pressure = mlp_model_pressure.predict(X_test_imputed)

# Evaluate the model's performance
mae_pressure = mean_absolute_error(y_test_pressure, y_pred_pressure)
rmse_pressure = np.sqrt(mean_squared_error(y_test_pressure, y_pred_pressure))

print(f'Mean Absolute Error for Avg Pressure Sea: {mae_pressure}')
print(f'Root Mean Square Error for Avg Pressure Sea: {rmse_pressure}')

# Compare actual vs. predicted average pressure at sea level for the first 5 data points
comparison_pressure = pd.DataFrame({
    'Date': test_dates,
    'Actual Avg Pressure Sea': y_test_pressure,
    'Predicted Avg Pressure Sea': y_pred_pressure
})
print(comparison_pressure.head(5))


Mean Absolute Error for Avg Pressure Sea: 0.2848134405909842
Root Mean Square Error for Avg Pressure Sea: 0.3982512011352726
        Date  Actual Avg Pressure Sea  Predicted Avg Pressure Sea
0 2023-11-30                   100.99                  101.188666
1 2023-11-29                   101.21                  101.430452
2 2023-11-28                   100.92                  101.396786
3 2023-11-27                   100.20                  100.717188
4 2023-11-26                   101.32                  101.284011







In [None]:
# Look up the saved pressure prediction for a given 2023 date
# Assuming 'test_dates' and 'y_pred_pressure' contain dates and predictions
predictions_df = pd.DataFrame({'Date': test_dates, 'Predicted Pressure': y_pred_pressure})
predictions_df.to_csv('predicted_pressures.csv', index=False)

def get_saved_prediction(date_str):
    # Dates are stored in the CSV as 'YYYY-MM-DD' strings, so compare against the string directly
    saved = pd.read_csv('predicted_pressures.csv')
    match = saved.loc[saved['Date'] == date_str, 'Predicted Pressure']
    if match.empty:
        print(f"No saved prediction for {date_str}")
        return
    print(f"Predicted Pressure for {date_str}: {match.iloc[0]}")

    # 'date' was converted to datetime earlier; pandas parses date_str for the comparison
    actual_pressure = weather_data.loc[weather_data['date'] == date_str, 'avg_pressure_sea'].iloc[0]
    print(f"Actual Pressure: {actual_pressure}")


# Given this date, get the prediction
get_saved_prediction('2023-11-29')


Predicted Pressure for 2023-11-29: 101.43045163397184
Actual Pressure: 101.21


The MAE and RMSE for predicting the average sea-level pressure (`avg_pressure_sea`) are 0.2848 and 0.3983, respectively, in the same unit as the target (the values near 101 suggest kilopascals, although the notebook does not state the unit). Against values of about 101, this is a deviation of roughly 0.3 percent. These figures are numerically higher than those for average temperature, but the two targets are in different units and vary over different ranges, so the raw errors alone do not show that pressure is harder for the model to predict.
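Because the targets are measured in different units, one way to put their errors on a common footing is to divide each MAE by the spread of the observed values. The helper `normalized_mae` below is a hypothetical illustration, and the arrays are made-up numbers rather than the notebook's test data.

```python
import numpy as np

def normalized_mae(y_true, y_pred):
    """MAE divided by the standard deviation of the observed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    return mae / np.std(y_true)

# Illustrative values only: both predictions are off by 0.1 everywhere
temp_true = np.array([2.0, -4.0, -6.0, 1.0, 0.0])
temp_pred = temp_true + 0.1
pres_true = np.array([101.0, 101.2, 100.9, 100.2, 101.3])
pres_pred = pres_true + 0.1

print("temperature:", normalized_mae(temp_true, temp_pred))
print("pressure:   ", normalized_mae(pres_true, pres_pred))
```

In this toy case the absolute error is identical, yet the normalized error is larger for pressure because pressure varies over a much narrower range.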

**Evaluation**

**Mean Absolute Error (MAE)** is a metric used to evaluate the accuracy of a regression model. It is calculated as the average of the absolute differences between the predicted values and the actual values in the dataset. MAE provides a straightforward interpretation of the average error magnitude per data point without considering the direction of the errors. A lower MAE value indicates better model performance, with 0 being the ideal score, meaning the model's predictions are exactly equal to the actual values. In the context of MLP (Multilayer Perceptron) models and other regression models, MAE is particularly useful for understanding the average prediction error in the same units as the target variable being predicted.


**Root Mean Square Error (RMSE)** is another metric used to evaluate the performance of regression models. It is calculated as the square root of the average of the squared differences between the predicted values and the actual values. Like MAE, it expresses the typical magnitude of the error in the units of the target. Unlike MAE, it squares each error before averaging, so large errors are penalized more heavily than small ones. This is useful when large errors are especially undesirable. A lower RMSE value indicates better model performance, with 0 being the ideal score.
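As a small worked example of the two definitions, the sketch below computes both metrics by hand with NumPy on the five temperature rows printed earlier. The variable names are illustrative, and the results describe only these five rows, not the full test set.

```python
import numpy as np

# First five actual and predicted temperatures from the comparison table above
actual = np.array([2.15, -4.14, -5.75, 1.35, 0.29])
predicted = np.array([2.081863, -4.121984, -5.796403, 1.281115, 0.221973])

errors = actual - predicted

# MAE: mean of the absolute errors
mae_manual = np.mean(np.abs(errors))

# RMSE: square root of the mean of the squared errors
rmse_manual = np.sqrt(np.mean(errors ** 2))

print(f"MAE over these five rows:  {mae_manual:.4f}")
print(f"RMSE over these five rows: {rmse_manual:.4f}")
```

These are the same quantities that `mean_absolute_error` and `np.sqrt(mean_squared_error(...))` return in the cells above.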


---


### Their Use in MLP Models

Both MAE and RMSE are crucial for evaluating and comparing regression models, including MLPs, as they provide different perspectives on the model's error characteristics. MAE is less sensitive to outliers since it considers the absolute value of errors, making it a robust measure of model performance. In contrast, RMSE is more sensitive to outliers due to the squaring of errors, offering insights into the model's performance when large errors are particularly problematic.

In practice, the choice between MAE and RMSE depends on the specific application and whether large errors are significantly more harmful than smaller ones. In many cases, both metrics are reported to provide a comprehensive view of the model's performance.
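The difference in outlier sensitivity can be seen with two made-up sets of errors that share the same MAE. This is a toy illustration with invented numbers, not output from the weather models.

```python
import numpy as np

def mae(errors):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(errors)))

def rmse(errors):
    """Square root of the mean squared error."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Same total absolute error, distributed differently
even_errors = np.array([2.0, 2.0, 2.0, 2.0])   # every prediction off by 2
spiky_errors = np.array([0.0, 0.0, 0.0, 8.0])  # one prediction off by 8

print(mae(even_errors), rmse(even_errors))    # 2.0 2.0
print(mae(spiky_errors), rmse(spiky_errors))  # 2.0 4.0
```

MAE cannot tell the two cases apart, while RMSE doubles when the error is concentrated in a single prediction.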

### Value Ranges and Interpretation

The range of both MAE and RMSE is from 0 to ∞, where 0 indicates perfect predictions with no error. Both metrics are in the same units as the target variable being predicted. This makes them intuitively easy to understand; for example, an MAE of 5 in a temperature prediction model would mean the model's predictions are off by an average of 5 degrees.

Lower values indicate better model performance, with a value of 0 representing perfect predictions. Higher values indicate worse model performance, suggesting larger discrepancies between the predicted and actual values.

Comparing MAE and RMSE can give insights into the error distribution: if RMSE is significantly higher than MAE, it suggests the presence of outliers or large errors in some predictions.

**Practical consideration:** The choice between MAE and RMSE depends on the specific context of the problem and whether large errors are particularly undesirable. RMSE will "punish" large errors more heavily than MAE, making it a more stringent measure of model performance in cases where such errors are especially problematic.
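One quick way to apply the MAE versus RMSE comparison is to look at the ratio RMSE/MAE for each target, using the metrics reported in the outputs above. The ratio is always at least 1, and larger values point to a few large errors. The dictionary layout and names below are illustrative.

```python
# (MAE, RMSE) pairs copied from the recorded outputs in this notebook
reported = {
    "avg_temperature": (0.07102227543533181, 0.09701273069296401),
    "precipitation": (0.038120378520387715, 0.05479543638407956),
    "solar_radiation": (80.95408881189307, 160.35346221671827),
    "avg_pressure_sea": (0.2848134405909842, 0.3982512011352726),
}

# RMSE/MAE ratio per target, largest first
ratios = {name: rmse / mae for name, (mae, rmse) in reported.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: RMSE/MAE = {ratio:.2f}")
```

Solar radiation has the largest ratio of the four, which agrees with the earlier remark that some of its predictions carry substantial errors.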