# Using Generative AI to Help Write Python Code

<div class="alert alert-block alert-success">
<b>Objectives:</b> In recent years, Generative AI (GenAI) tools built on large language models, such as ChatGPT, Bing Chat, Bard, and GitHub Copilot, have become powerful aids for writing and explaining code. In this notebook, we will:

1. Show basic ways to prompt a GenAI system to write or explain Python code.
2. Highlight how prompts can significantly affect the quality of the output.
3. Discuss the importance of being transparent about when and how you used GenAI in your workflow.

<div>
<img src="https://github.com/loftytopping/DEES_programming_course/blob/master/images/copilot.png?raw=true" width="800" alt="GitHub Copilot is one of the many LLM-enabled tools that have emerged"/>
</div>    
    
    
> **Key takeaway**: GenAI can be a helpful assistant, but it’s not a replacement for your own understanding or thorough testing. Always treat GenAI as a coding partner, *not* an infallible authority.

</div>


## 1. Guiding Principles for Using GenAI <a name="Part1"></a>

1. **Prompt Strategy**: The better your prompt, the better the output.
2. **Verify the Code**: Always review and test any code that GenAI provides.
3. **Document Your Usage**: If you use or adapt code generated by an AI model, cite it in your project or notebook.

Below are some guidelines for your prompts:

- Be **specific**.
- Provide **context** (e.g., the type of data you are working with).
- Ask for **clarifications** if needed.
- Request a **step-by-step explanation**.

## 2. Example: Prompting GenAI for Python Code <a name="Part2"></a>

Suppose you are working with a dataset of global temperature anomalies in a CSV file and you want to quickly plot it in Python. Here’s a short example of how you might prompt a GenAI system.

**Prompt**:
```
I have a CSV file containing global monthly temperature anomalies with columns "Year", "Month", and "Temperature_Anomaly".
1. Write a Python function to load this file into a pandas DataFrame.
2. Show how to plot the temperature anomalies over time using matplotlib or seaborn.
3. Explain how the code works line by line.
```

**Potential GenAI Response** (annotated and verified by you afterward):

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

def load_temperature_data(csv_path):
    """
    Loads the temperature anomaly data into a pandas DataFrame.
    Args:
        csv_path (str): Path to the CSV file.
    Returns:
        pd.DataFrame: DataFrame containing the temperature anomalies.
    """
    df = pd.read_csv(csv_path)
    return df

def plot_temperature_anomalies(df):
    """
    Plots temperature anomalies over time.
    Args:
        df (pd.DataFrame): DataFrame with columns 'Year', 'Month', and 'Temperature_Anomaly'.
    """
    # Create a datetime column for plotting
    df['Date'] = pd.to_datetime(df[['Year', 'Month']].assign(DAY=1))
    plt.figure(figsize=(10, 5))
    plt.plot(df['Date'], df['Temperature_Anomaly'], label='Temperature Anomaly')
    plt.xlabel('Date')
    plt.ylabel('Temperature Anomaly (°C)')
    plt.title('Global Monthly Temperature Anomalies')
    plt.legend()
    plt.show()

# Example usage (uncomment if you have the data file):
# df = load_temperature_data("global_temp_anomalies.csv")
# plot_temperature_anomalies(df)

**Explanation**:

1. We import `pandas` and `matplotlib` to handle data and visualization.
2. `load_temperature_data` reads the CSV using `pd.read_csv` and returns a DataFrame.
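3. `plot_temperature_anomalies` creates a `Date` column by combining `Year` and `Month` (with the day fixed to 1), which gives matplotlib a proper time axis. Note that this adds the column to the DataFrame you pass in.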
4. We plot the anomalies over time and add axis labels, a title, and a legend so the figure can be read on its own.
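
The date-construction step in point 3 is the least obvious line. Below is a minimal sketch of what it does on a tiny DataFrame; the values are made up purely for illustration, and only the column names come from the prompt above.

```python
import pandas as pd

# Tiny illustrative DataFrame (made-up values) using the column names from the prompt
df = pd.DataFrame({
    "Year": [2020, 2020, 2021],
    "Month": [11, 12, 1],
    "Temperature_Anomaly": [0.5, 0.6, 0.4],
})

# pd.to_datetime can assemble dates from a DataFrame whose columns are year/month/day.
# Column names are matched case-insensitively, so "Year", "Month" and "DAY" all work.
# Only the date columns are selected, because extra columns would raise an error.
# The DAY=1 column pins each monthly value to the first day of its month.
dates = pd.to_datetime(df[["Year", "Month"]].assign(DAY=1))
print(dates.tolist())
```

Selecting `df[['Year', 'Month']]` before calling `pd.to_datetime` matters: passing the whole DataFrame would include `Temperature_Anomaly`, which pandas cannot interpret as a date component.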

## 3. Beyond the Basics: Prompt Engineering Strategies <a name="Part3"></a>

1. **Use Step-by-Step Queries**:
   - Break down your requests rather than asking for everything at once.
   - Example prompts:
     - “Write a Python function that reads a CSV file.”
     - “Now show how to plot the data using matplotlib.”
     - “Explain how to annotate the plot with a line for the mean anomaly.”

2. **Ask for Explanations or Alternatives**:
   - “Provide an alternative approach using xarray for netCDF files.”
   - “Explain each line of the code so I understand how it works.”

3. **Iterate**:
   - If the first result has errors or lacks clarity, ask the model to refine or correct the code.
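
As an illustration of where the third step-by-step prompt (“annotate the plot with a line for the mean anomaly”) might lead, here is a minimal sketch that marks the mean with matplotlib's `Axes.axhline`. The helper name `plot_with_mean_line` and the synthetic data are hypothetical; the column names are assumed to match the example in Section 2.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_with_mean_line(df):
    """Plot anomalies over time and mark their mean with a dashed line (hypothetical helper)."""
    # Build monthly dates without modifying the caller's DataFrame
    dates = pd.to_datetime(df[["Year", "Month"]].assign(DAY=1))
    mean_anomaly = df["Temperature_Anomaly"].mean()

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(dates, df["Temperature_Anomaly"], label="Temperature Anomaly")
    # axhline draws a horizontal line across the full width of the axes
    ax.axhline(mean_anomaly, color="red", linestyle="--",
               label=f"Mean anomaly ({mean_anomaly:.2f} °C)")
    ax.set_xlabel("Date")
    ax.set_ylabel("Temperature Anomaly (°C)")
    ax.legend()
    return fig, ax, mean_anomaly

# Synthetic data (made-up values, for illustration only)
demo = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020],
    "Month": [1, 2, 3, 4],
    "Temperature_Anomaly": [0.2, 0.4, 0.6, 0.8],
})
fig, ax, mean_anomaly = plot_with_mean_line(demo)
```

Returning the figure, axes, and mean (rather than only calling `plt.show()`) makes the result easy to inspect or test, which ties in with the verification advice in Section 4.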

## 4. Verifying & Documenting AI-Generated Code <a name="Part4"></a>

### 4.1 Testing & Quality Control

- **Test locally**: Run the code in a local environment or a Jupyter notebook cell to ensure it works with your specific dataset.
- **Check logic**: Make sure the code makes sense. AI can produce code that runs and “looks right” but computes the wrong thing, such as an unweighted global mean where an area-weighted one is needed.
- **Use version control**: Tools like Git can help you compare changes and keep track of when and where AI-generated snippets are used.
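
One way to put “test locally” and “check logic” into practice is to write a few quick checks on the data that the generated code expects. The sketch below uses a hypothetical `check_temperature_data` helper on in-memory CSV text with invented values; the specific checks (expected columns, months between 1 and 12, no missing anomalies, no duplicate months) are assumptions to adapt to your own dataset.

```python
import io
import pandas as pd

def check_temperature_data(df):
    """Return a list of problems found in the DataFrame (empty if none). Hypothetical helper."""
    problems = []
    expected = {"Year", "Month", "Temperature_Anomaly"}
    missing = expected - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems  # the remaining checks need these columns
    if not df["Month"].between(1, 12).all():
        problems.append("Month values outside 1-12")
    if df["Temperature_Anomaly"].isna().any():
        problems.append("missing temperature anomalies")
    if df.duplicated(subset=["Year", "Month"]).any():
        problems.append("duplicate Year/Month rows")
    return problems

# In-memory CSV text standing in for real files (values are invented)
good_csv = io.StringIO("Year,Month,Temperature_Anomaly\n2020,1,0.3\n2020,2,0.5\n")
bad_csv = io.StringIO("Year,Month,Temperature_Anomaly\n2020,13,0.3\n2020,13,\n")

print(check_temperature_data(pd.read_csv(good_csv)))
print(check_temperature_data(pd.read_csv(bad_csv)))
```

Returning a list of problems rather than stopping at the first failure shows every issue in one run, which is handy when the data, not the AI-generated code, turns out to be the source of an error.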

### 4.2 Transparency & Academic Integrity

In academic or scientific contexts, be explicit about your sources. For instance, in the comments or documentation of your notebook, note:

> “Code in this section was generated with the assistance of [Model Name / ChatGPT / Bing Chat / etc.] on [Date].”

This transparency allows future readers (and your instructors) to understand how the code was produced, and it adheres to ethical standards of attribution.

### 4.3 Recording Your Prompts
To fully document your use of GenAI, please adopt the following practices:

1. **Maintain a Prompt Log**: Keep a simple text file or Markdown file where you copy/paste the exact prompts you used, along with the date and time. This is helpful for assignments or projects requiring detailed provenance.
2. **Include Prompts in an Appendix**: If you’re submitting a written report or Jupyter notebook, you could add an "Appendix" section listing all the prompts you asked. This is particularly useful if you iterated several times on your queries.
3. **Inline Comments**: In your code, you can add brief comments noting the prompt or question you asked. For instance:
   ```python
   # Prompt: "Explain how to calculate a 5-year rolling average on this dataset."
   # Response was adapted and verified on 2023-08-10.
   ```
4. **University Requirements**: Policies on GenAI use are still evolving; we will share any established university protocol with you as it becomes available.
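
Keeping a prompt log (point 1 above) is easy to automate. Below is a minimal sketch assuming a Markdown log file; the helper name `log_prompt` and the default file name `prompt_log.md` are hypothetical choices, not a required format.

```python
from datetime import datetime
from pathlib import Path

def log_prompt(prompt, model, log_path="prompt_log.md", note=""):
    """Append a timestamped prompt entry to a Markdown log file (hypothetical helper)."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = f"## {timestamp} ({model})\n\n**Prompt:** {prompt}\n"
    if note:
        entry += f"\n**Notes:** {note}\n"
    # Open in append mode so earlier entries are never overwritten
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(entry + "\n")
    return entry

# Example usage (writes to prompt_log.md in the current directory):
# log_prompt("Explain how to calculate a 5-year rolling average on this dataset.",
#            model="ChatGPT", note="Response adapted and verified.")
```

Because the log is plain Markdown, it can be pasted straight into the appendix suggested in point 2.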


## 5. Extended Example: Using GenAI with Earth-Science Libraries <a name="Part5"></a>

GenAI can assist with more specialized libraries commonly used by Earth scientists. For example:

**Prompt**:
```
Write a Python script that opens a netCDF file using xarray, prints its metadata,
calculates the global average of a variable named 'surface_temp', and saves
the result to a new netCDF file. Provide explanations.
```

**Potential GenAI Response**:

In [None]:
import xarray as xr

def process_surface_temperature(input_file, output_file):
    """
    Opens a netCDF file, calculates a global average of 'surface_temp',
    and saves the result to a new netCDF file.

    Args:
        input_file (str): Path to the input netCDF file.
        output_file (str): Path to the output netCDF file.
    """
    # Open the dataset; the context manager closes the file when we are done
    with xr.open_dataset(input_file) as ds:
        # Print metadata
        print("Dataset information:")
        print(ds)

        # Calculate global average of surface_temp (naive, unweighted mean across lat/lon)
        mean_temp = ds.surface_temp.mean(dim=("lat", "lon"))

        # Convert to a dataset and save
        mean_temp_ds = mean_temp.to_dataset(name="global_mean_surface_temp")
        mean_temp_ds.to_netcdf(output_file)

# Example usage (uncomment if you have the data file):
# process_surface_temperature("in_file.nc", "out_file.nc")

**Explanation**:

1. **Library import**: `xarray` is a library specialized for labeled multi-dimensional arrays, often used for netCDF data in Earth science.
2. **Opening dataset**: We use `xr.open_dataset` to open the netCDF file.
3. **Metadata**: Printing the dataset reveals dimension sizes, coordinates, variable names, and attributes.
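4. **Mean temperature**: We take the naive (unweighted) mean across `lat` and `lon`. On a regular latitude-longitude grid this over-weights high latitudes, because grid cells shrink toward the poles. For area-weighted means, you’d use something like `ds.surface_temp.weighted(ds.cell_area).mean(dim=('lat', 'lon'))`.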
5. **Saving results**: We convert the `mean_temp` DataArray into a Dataset and save it to a new netCDF file.

> **Note**: This code snippet might require modifications (e.g., verifying that `cell_area` exists and has correct units). Always adapt and verify AI-generated code for your specific data structure.
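
If your file has no `cell_area` variable but uses a regular latitude-longitude grid, a common approximation is to weight each grid row by the cosine of latitude, since cell area on such a grid is proportional to cos(latitude). The minimal sketch below builds a small synthetic dataset with made-up values; the dimension names `lat`/`lon` and the variable name `surface_temp` follow the example above and are assumptions about your file.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset on a regular 3 x 4 lat-lon grid (values are made up)
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
temp = np.array([
    [260.0] * 4,  # southern high latitude
    [300.0] * 4,  # equator
    [260.0] * 4,  # northern high latitude
])
ds = xr.Dataset(
    {"surface_temp": (("lat", "lon"), temp)},
    coords={"lat": lat, "lon": lon},
)

# Weights proportional to grid-cell area on a regular grid: cos(latitude in radians)
weights = np.cos(np.deg2rad(ds.lat))

naive_mean = ds.surface_temp.mean(dim=("lat", "lon"))
weighted_mean = ds.surface_temp.weighted(weights).mean(dim=("lat", "lon"))

print(float(naive_mean), float(weighted_mean))
```

Here the weighted mean (about 280) sits closer to the equatorial value than the naive mean (about 273.3), because the large equatorial cells now count for more. If your data provide `cell_area`, prefer it: cosine weighting is only an approximation that assumes evenly spaced longitudes.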

## 6. Where Can I Access GenAI Services? <a name="Part6"></a>

There are several platforms and tools students can use to access large language models or AI coding assistants:

1. **[OpenAI ChatGPT](https://chat.openai.com/)**: A free-to-try web interface (with paid tiers) for conversational AI.
2. **[Bing Chat](https://www.bing.com/search?q=Bing+AI)** (since rebranded as Microsoft Copilot): Available through Microsoft's Edge browser or Bing.com search.
3. **[Google Bard](https://bard.google.com/)** (since renamed Gemini): Google's conversational AI, accessible via a web interface (availability may vary by region).
4. **GitHub Copilot**: Paid subscription (free for some student plans) that integrates directly with Visual Studio Code, GitHub Codespaces, and other editors.
5. **Hugging Face Models**: [Hugging Face](https://huggingface.co/) hosts various open-source LLMs that you can try in your browser or run locally.

> Always check for any usage limits and subscription tiers!


## 7. Conclusion

Generative AI can be a powerful ally in writing, refining, and explaining Python code for Earth science applications. However:

1. **Always test and verify** AI-generated code.
2. **Stay transparent** about when and how you use AI.
3. **Practice clear prompt strategies** to get better results.

**Next Steps**:
- Experiment with your own prompts and datasets in any of the notebooks.
- Incorporate GenAI into your workflow while keeping scientific rigor and best practices in mind.
- Document your AI usage for clarity and integrity.
