### Assignment 1: Model Exploration and Adjustment on ClimateBench


This assignment encourages a thorough exploration of machine learning techniques in the context of climate science, with a focus on hands-on experience and critical analysis. You will work with the [ClimateBench](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021MS002954) dataset, which contains inputs and the associated climate model output from a full-complexity Earth System Model.

#### Objectives:
- Understand the application of different machine learning models to climate-related regression tasks.
- Explore and adjust baseline models (Random Forest, Gaussian Process Regression, CNN-LSTM) to enhance performance.
- Analyze and report on the effectiveness of model adjustments.



#### Assignment Details:
1. **Introduction (10% of grade)**
   - Provide a brief overview of ClimateBench and its importance in environmental science (1-2 sentences).
   - Explain the significance of regressing climate model temperature responses onto aerosol and greenhouse gas emissions (1-2 sentences).

2. **Model Exploration (30% of grade)**
   - Describe each baseline model (Random Forest, Gaussian Process Regression, CNN-LSTM):
     - Model architecture and reasoning behind its suitability for the task.
     - Current performance metrics provided by ClimateBench.
   - Implement each model using a provided dataset from ClimateBench.

3. **Model Adjustment (40% of grade)**
   - Make at least two significant adjustments to *one* model to improve performance. Adjustments might include:
     - Changing model parameters (e.g., number of layers, hidden units, kernel functions).
     - Modifying the input features (e.g., adding new features, transforming existing features).
     - Applying different data pre-processing techniques.
   - Justify each adjustment based on theoretical or empirical evidence.

4. **Results Analysis (15% of grade)**
   - Compare the performance of the original and adjusted models using appropriate statistical metrics (e.g., RMSE, MAE, R-squared).
   - Discuss the effectiveness of each adjustment in improving the model's performance.

5. **Conclusion (5% of grade)**
   - Summarize key findings and lessons learned from the model adjustments.
   - Suggest potential future work in applying machine learning to climate science based on the results obtained.
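
For the Results Analysis step, the metrics named above (RMSE, MAE, R-squared) can be computed directly with NumPy. The following is a minimal sketch, not the ClimateBench evaluation code: the function names are illustrative, and the `truth`, `baseline`, and `adjusted` arrays are synthetic stand-ins for your test targets and model predictions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error over all elements."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error over all elements."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data with shape (time, lat, lon); replace with real
# test targets and predictions from your baseline and adjusted models.
rng = np.random.default_rng(0)
truth = rng.normal(size=(10, 4, 8))
baseline = truth + rng.normal(scale=0.5, size=truth.shape)
adjusted = truth + rng.normal(scale=0.2, size=truth.shape)

for name, pred in {"baseline": baseline, "adjusted": adjusted}.items():
    print(
        f"{name}: RMSE={rmse(truth, pred):.3f} "
        f"MAE={mae(truth, pred):.3f} R2={r_squared(truth, pred):.3f}"
    )
```

These functions weight every grid cell equally. On a latitude-longitude grid, cells shrink toward the poles, so you may also want to consider an area-weighted variant (for example, weighting by the cosine of latitude) and state which choice you made.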



#### Rubric:
- **Introduction**
  - Clarity and completeness of the overview (5%)
  - Understanding of the task's relevance (5%)

- **Model Exploration**
  - Accuracy and detail in the description of each model (10%)
  - Correct implementation and reporting of baseline model performance (20%)

- **Model Adjustment**
  - Creativity and appropriateness of model adjustments (20%)
  - Justification of adjustments (10%)
  - Implementation accuracy (10%)

- **Results Analysis**
  - Clarity and accuracy in comparing model performances (10%)
  - Depth of discussion on model improvements (5%)

- **Conclusion**
  - Effectiveness of summary and reflection on the assignment (5%)


#### Submission Guidelines:
- Format: Jupyter Notebook with inline annotations and text descriptions. Roughly 5-15 pages if printed, including code and figures.
- Deadline: **1 week from the assignment date.**

## Environment and data specifics

To run the baseline models you will need a few specific packages, many of which you should already have installed. If you're using Google Colab (see the getting started notes [here](../../00_setup/python_env_setup.md)) you can install the necessary packages with:

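```python
# The %pip magic installs into the environment of the running notebook kernel.
# The quotes stop the shell from interpreting the square brackets.
%pip install xarray "esem[keras,gpflow]" eofs cartopy
```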
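
After installing, a quick check that the packages can be found may save debugging time later. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the import names in `REQUIRED` are assumed to match the package names installed above.

```python
import importlib.util

# Top-level import names, assumed to match the installed package names.
REQUIRED = ["xarray", "esem", "eofs", "cartopy"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages can be found.")
```

If anything is reported missing, rerun the install cell and restart the runtime before importing.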

You will also need to import the [`utils.py`](utils.py) module included in this repo. Download it and drag it into the Colab file browser so that you can import it in your notebook.

The training and test data are all included on Google Drive in the `ClimateBench` folder linked from Canvas. You can mount the drive in Colab with the following code:

```python
from google.colab import drive
drive.mount('/content/drive')
```

Once you've done this, be sure to point the `data_path` variable in the `utils.py` module to the correct location on your Google Drive. It should be something like:
```python
data_path = "/content/drive/Shareddrives/SOPC-209 Data/ClimateBench/train_val_updated/"
```
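
Before running the baselines, it can help to confirm that `data_path` actually points at a mounted folder. The sketch below uses only the standard library; `list_data_files` is a hypothetical helper, not part of `utils.py`, and the path is the example from above, so adjust it to match your own Drive layout.

```python
from pathlib import Path


def list_data_files(path):
    """Return the sorted file names in `path`; raise if it is not a directory."""
    data_dir = Path(path)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Data directory not found: {data_dir}")
    return sorted(p.name for p in data_dir.iterdir() if p.is_file())


# Example path from above; yours may differ.
data_path = "/content/drive/Shareddrives/SOPC-209 Data/ClimateBench/train_val_updated/"

try:
    files = list_data_files(data_path)
    print(f"Found {len(files)} files, for example: {files[:5]}")
except FileNotFoundError as err:
    # Usually means the drive is not mounted or the path is misspelled.
    print(err)
```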