# Using AI to Enhance Learning Python for the Molecular Sciences
This is a comprehensive tutorial on how to utilize Large Language Models (LLMs) and AI assistants to enhance your programming learning experience in molecular sciences. 
This notebook will guide you through various strategies and methods for effectively using AI as a coding mentor, 
understanding context engineering, leveraging study modes, debugging your code, and learning about specialized chemistry Python packages.

**Key Topics Covered:**
- Modern AI models (GPT, Claude, Gemini) and their capabilities
- Context engineering techniques for better AI interactions
- AI study modes for enhanced learning
- Prompt engineering best practices
- Debugging with AI assistance
- Chemistry-specific Python packages and AI integration

Keep in mind that AI models are continuously updated and examples in this notebook may give different results over time. 
The context of your conversation history can significantly influence AI responses.

This tutorial can be run on Google Colab, https://chemcompute.org, or locally using Jupyter.

**Updated for 2025** - Originally designed by Prof. Scott Reed for the MolSSI AI-assisted coding workshop, held August 17-18, 2024 at University of Colorado Denver. 

## Current AI Models and Their Capabilities

As of 2025, several powerful AI models are available for coding and scientific applications:

**Free Options:**
- **ChatGPT**: Available at https://chatgpt.com/ - 128K context window, excellent for coding tasks
- **Claude (Haiku)**: Available through https://claude.ai/ - Strong reasoning capabilities
- **Google Gemini**: Available at https://gemini.google.com/ - Good for multimodal tasks
- **Perplexity AI**: https://www.perplexity.ai/ - Excellent for research and web-connected queries

**Paid Options (Higher Capabilities):**
- **ChatGPT Plus**: 128K context, advanced reasoning, web browsing, code interpreter
- **Claude Pro (Sonnet/Opus)**: 200K context window, excellent for complex reasoning
- **GitHub Copilot**: Integrated coding assistant with real-time suggestions

**Key Capabilities to Understand:**
- **Context Windows**: Modern models handle 128K-200K tokens (roughly 100K-150K words)
- **Training Data**: Most models trained through 2024, with some having web access for current info
- **Multimodal**: Many can process images, documents, and code simultaneously
- **Function Calling**: Can interact with external tools and APIs

**Limitations to Remember:**
- May hallucinate or provide outdated information
- Context can influence responses within a conversation
- Performance varies by task complexity and domain specificity

## Few-Shot Prompting
Few-shot prompting enables in-context learning: the model infers the pattern you want from examples included in the prompt itself.
In contrast, a zero-shot prompt provides the model with no examples at all.

Few-shot prompting is a technique where you provide a model with a few examples of the type of information or answers you are seeking. 
This sets the context and helps the model understand the format and depth of response required. 
Here’s how you could structure a few-shot prompt for a chemistry problem.

### Few-Shot Prompting Example in Chemistry
Problem: Predict the products of the following chemical reactions and explain the reaction type:

<span style="color:blue"><strong>CHAT PROMPT START:
</span>
Here are two examples of what I am looking to do.
Example 1:
Reaction: CH3CH2OH + PCC
Conditions: In dichloromethane at room temperature.
Products: CH3CHO
Explanation: The reaction is an oxidation. Primary alcohols like ethanol react with pyridinium chlorochromate (PCC) under these conditions to form aldehydes.
Example 2:
Reaction: CH3COOH + CH3OH
Conditions: With sulfuric acid as a catalyst.
Products: CH3COOCH3 + H2O
Explanation: This is an esterification reaction. Acetic acid reacts with methanol in the presence of an acid catalyst to form methyl acetate and water.

Predict the products and explain the reaction type for the following reaction
Reaction: C6H5COOH + SOCl2
<span style="color:blue"><strong>CHAT PROMPT END:
</span>


## Explanation of the Few-Shot Prompt
Providing Examples: The examples act as a teaching guide. Each includes a different type of chemical reaction, explaining not just the products but also the type of reaction and conditions under which it occurs. This helps the model understand what kind of information is expected in the answer.

Structured Format: The consistent format across examples helps GPT-4 understand and maintain the structure in its response, making it more likely to provide both the reaction products and a succinct explanation of the reaction type.
By using this few-shot prompting technique, you help the model better understand the task and provide more accurate and contextually appropriate answers, especially for complex subjects like chemistry.
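
The consistent structure described above can also be assembled in code, which may be convenient when the same examples are reused across many queries. The following is a minimal sketch: the function name `build_few_shot_prompt` and the dictionary keys are illustrative assumptions, not part of any AI platform's API.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt from worked examples plus one unanswered query."""
    lines = ["Here are examples of what I am looking to do."]
    for number, example in enumerate(examples, start=1):
        # Every example uses the same four fields in the same order
        lines.append(f"Example {number}:")
        lines.append(f"Reaction: {example['reaction']}")
        lines.append(f"Conditions: {example['conditions']}")
        lines.append(f"Products: {example['products']}")
        lines.append(f"Explanation: {example['explanation']}")
    lines.append("Predict the products and explain the reaction type for the following reaction.")
    lines.append(f"Reaction: {query}")
    return "\n".join(lines)


# The two worked examples from the chat prompt above
examples = [
    {
        "reaction": "CH3CH2OH + PCC",
        "conditions": "In dichloromethane at room temperature.",
        "products": "CH3CHO",
        "explanation": "The reaction is an oxidation.",
    },
    {
        "reaction": "CH3COOH + CH3OH",
        "conditions": "With sulfuric acid as a catalyst.",
        "products": "CH3COOCH3 + H2O",
        "explanation": "This is an esterification reaction.",
    },
]

prompt = build_few_shot_prompt(examples, "C6H5COOH + SOCl2")
print(prompt)
```

The resulting string can be pasted into any chat interface. Keeping the examples in a list makes it easy to add, remove, or reorder them and see how the response changes.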

## Context Engineering: Advanced AI Interaction Techniques

Context engineering goes beyond simple prompt engineering by designing and managing the entire information environment that surrounds your AI interactions. This approach significantly improves AI performance, reduces hallucinations, and enables more sophisticated problem-solving.

### Key Components of Context Engineering

**1. System Instructions & Role Definition**
- Clearly define the AI's role and expertise level
- Set behavioral constraints and output formatting requirements
- Establish the scope of the AI's responsibilities

**2. Information Architecture**
- **Hierarchical Information**: Organize context from most to least important
- **Structured Formats**: Use tables, lists, and clear sections
- **Context Compression**: Summarize less critical information to fit token limits

**3. Memory Systems**
- **Short-term Memory**: Maintain conversation context
- **Long-term Memory**: Reference previous sessions or established facts
- **Working Memory**: Track current task progress and intermediate results

**4. Dynamic Context Assembly**
- **Retrieval-Augmented Generation (RAG)**: Fetch relevant information from databases
- **Tool Integration**: Connect to external APIs and data sources  
- **Real-time Updates**: Incorporate current information when needed


### Context Engineering Example for Chemistry

Here's how to apply context engineering to a molecular chemistry problem:

**Poor Context:**
```
Calculate the molecular weight of aspirin.
```

**Engineered Context:**
```
ROLE: You are an expert computational chemist with access to precise atomic masses.

TASK: Calculate the molecular weight of aspirin (acetylsalicylic acid)

CONTEXT:
- Molecular formula: C9H8O4
- Use IUPAC standard atomic masses (2021)
- Required precision: 4 decimal places
- Show step-by-step calculation

CONSTRAINTS:
- Must show atomic mass sources
- Include uncertainty if applicable
- Format output as: "Molecular Weight: XXX.XXXX g/mol"

VERIFICATION: Cross-check with known literature value (~180.16 g/mol)
```

This engineered approach provides clear role definition, specific requirements, verification criteria, and structured output formatting.
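
The verification step in the engineered prompt can also be done independently of the AI. The sketch below sums atomic masses for C9H8O4 in a few lines of Python. It assumes abridged (rounded) standard atomic weights, so the result is only meaningful to about two decimal places rather than the four requested in the prompt.

```python
# Abridged standard atomic weights in g/mol (rounded values; an assumption of this sketch)
atomic_masses = {"C": 12.011, "H": 1.008, "O": 15.999}

# Aspirin (acetylsalicylic acid), C9H8O4
composition = {"C": 9, "H": 8, "O": 4}

molecular_weight = 0.0
for element, count in composition.items():
    contribution = atomic_masses[element] * count
    molecular_weight += contribution
    # Show each step so the arithmetic can be checked by hand
    print(f"{element}: {count} x {atomic_masses[element]} = {contribution:.3f}")

print(f"Molecular Weight: {molecular_weight:.2f} g/mol")
```

The total, 180.16 g/mol, agrees with the literature value quoted in the prompt. A quick check like this is a useful habit whenever an AI reports a numerical result.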


## AI Study Mode: Interactive Learning Strategies

AI study mode transforms AI assistants from answer-providers into interactive tutors that promote deeper learning through active engagement. This approach is particularly effective for complex scientific concepts.

### Core Study Mode Techniques

**1. Socratic Questioning**
Instead of giving direct answers, the AI guides you through discovery:

```
Student: "What's the hybridization of carbon in methane?"

Traditional AI: "The carbon in methane is sp³ hybridized."

Study Mode AI: "Let's think about this step by step. First, how many bonds does carbon form in methane? What does this tell us about the electron geometry around carbon?"
```

**2. Scaffolded Learning**
Breaking complex topics into manageable steps with checks for understanding:

```
AI: "Before we discuss molecular orbital theory, let's make sure you understand atomic orbitals. Can you describe what an s orbital looks like and how it differs from a p orbital?"
```

**3. Interactive Problem Solving**
The AI provides hints and guidance rather than complete solutions:

```
Student: "I'm stuck on this organic synthesis problem."

Study Mode AI: "I see you're trying to form a C-C bond. What are some common reactions that can create carbon-carbon bonds? Let's start with reactions you already know."
```


### Activating Study Mode in Different AI Platforms

**ChatGPT Study Mode (2025+)**
ChatGPT now has a built-in study mode feature. To activate:
1. Start a new conversation
2. Type: "I want to study [topic] in study mode"
3. Or use the study mode toggle in the interface

**Manual Study Mode Activation**
For any AI platform, use this prompt template:

```
STUDY MODE INSTRUCTIONS:
Act as my tutor for [subject/topic]. Instead of giving direct answers:
1. Ask guiding questions to help me discover answers
2. Provide hints when I'm stuck, not complete solutions
3. Check my understanding before moving to new concepts
4. Encourage me to explain my reasoning
5. Only give direct answers if I explicitly ask for them

Topic: [Your specific topic]
My current level: [Beginner/Intermediate/Advanced]
Learning goal: [What you want to achieve]
```

**Study Mode for Coding**
```
CODING TUTOR MODE:
Help me learn Python for chemistry applications. When I ask about code:
1. First ask what I think the solution approach should be
2. Guide me to identify the key concepts needed
3. Help me break down the problem into smaller steps
4. Let me write code first, then provide feedback
5. Only show complete solutions if I've attempted the problem

Current project: [Describe your coding goal]
```


## Strategies and Methods for Data Analysis Using AI

Modern AI assistants excel at data analysis tasks. Let's explore several approaches:

### Method 1: Direct Data Analysis
Download EPA water chemistry data from https://catalog.data.gov/dataset/cimek-water-chemistry. Select two columns of data and paste them into an AI prompt:

<span style="color:blue"><strong>CHAT PROMPT:</strong> I have water chemistry data with two variables [paste your data]. Create a Python script that:
1. Creates a professional scatterplot with proper labels and styling
2. Calculates correlation coefficient with statistical significance
3. Adds a trend line with confidence intervals
4. Provides interpretation of the results</span>

### Method 2: Using Context Engineering for Better Results
<span style="color:blue"><strong>ENGINEERED PROMPT:</strong>
ROLE: You are a data scientist specializing in environmental chemistry.

TASK: Analyze the relationship between [Variable 1] and [Variable 2] in water chemistry data.

DATA: [paste your data here] [Optionally include example graphs with desired format]

REQUIREMENTS:
- Use pandas for data handling
- Create publication-quality matplotlib plots
- Include statistical analysis (correlation, p-values)
- Handle any missing data appropriately
- Provide scientific interpretation of results

OUTPUT FORMAT:
- Well-commented Python code
- Brief explanation of findings
- Suggestions for further analysis</span> 

In [None]:
# Paste and run your code examples here


## Strategies and Methods for Using AI as a Coding Mentor
In this module, you'll learn how to use AI assistants for generating, testing, and troubleshooting Python code interactively. We'll also explore methods for integrating AI tools into your coding environment and development workflow.

The data for the water xyz file is also pasted here:

| atom     | x             | y             | z         |
|:---------|:--------------|:--------------|:----------|
| O        | 0.000000      | -0.007156     | 0.965491  |
| H1       | -0.000000     | 0.001486      | -0.003471 |
| H2       | 0.000000      | 0.931026      | 1.207929  |

In the lesson materials, there is a file in the data folder called “water.xyz”. This is a very simple, standard file format that is often used to distribute molecular coordinates. The first line of the file is the number of atoms in the molecule, the second line is a title line (or may be blank), and the coordinates begin on the third line. Each coordinate line has the format

`Atom_Label  XCoor  YCoor  ZCoor`

and the default units (which are used in this example) are angstroms.

Try writing code to read in the information from the xyz file and determine the bond lengths between all the atoms. NumPy provides a function to take the square root, `numpy.sqrt()`. To raise a number to a power, use `**`, as in `3**2 = 9`. When you get stuck, paste your script into GPT and ask for help.
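
To make the file layout concrete, here is a minimal sketch that parses xyz-formatted text held in a string, using the water coordinates from the table above. The function name `parse_xyz` is an illustrative choice and the title line is made up for the example. Reading from `water.xyz` itself and computing the bond lengths are left for the exercise.

```python
def parse_xyz(text):
    """Return (symbols, coordinates) from the contents of an xyz file."""
    lines = text.strip().splitlines()
    num_atoms = int(lines[0])  # line 1: number of atoms
    # line 2 is the title line (possibly blank) and is skipped
    symbols = []
    coordinates = []
    for line in lines[2:2 + num_atoms]:
        label, x, y, z = line.split()
        symbols.append(label)
        coordinates.append((float(x), float(y), float(z)))
    return symbols, coordinates


# Hypothetical file contents built from the coordinates in the table above
water_xyz = """3
Water molecule (made-up title line)
O   0.000000  -0.007156   0.965491
H1 -0.000000   0.001486  -0.003471
H2  0.000000   0.931026   1.207929
"""

symbols, coordinates = parse_xyz(water_xyz)
print(symbols)
print(coordinates)
```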

In [None]:
# paste your final code solution here

Another option is to use GPT from the beginning to create a function that performs the desired action.
For example, try asking GPT to create a program that calculates the distance between each pair of atoms in water, using the coordinates from the table above:

In [None]:
# Here is an example of a GPT 4 generated answer.
import math

def calculate_distance(coord1, coord2):
    """Calculate the Euclidean distance between two 3D points."""
    return math.sqrt((coord2[0] - coord1[0]) ** 2 + (coord2[1] - coord1[1]) ** 2 + (coord2[2] - coord1[2]) ** 2)

def main():
    # Coordinates of atoms in a water molecule: O, H1, H2
    atoms = {
        'O': (0.000000, -0.007156, 0.965491),
        'H1': (-0.000000, 0.001486, -0.003471),
        'H2': (0.000000, 0.931026, 1.207929)
    }

    # Calculate distances between each pair of atoms
    atom_keys = list(atoms.keys())
    for i in range(len(atom_keys)):
        for j in range(i + 1, len(atom_keys)):
            atom1, atom2 = atom_keys[i], atom_keys[j]
            distance = calculate_distance(atoms[atom1], atoms[atom2])
            print(f"Distance between {atom1} and {atom2}: {distance:.6f}")

if __name__ == "__main__":
    main()

# Note: These last two lines are the common method to make a python script execute the main function

Next, try writing a prompt to accomplish the first project extension from Day 1: "Your initial project calculated the distance between every set of atoms. However, some of these atoms aren’t really bonded to each other. H1 and H2 are not bonded for example, and all of the distances between an atom and itself are zero. Use a distance cutoff of 1.5 angstroms to define a bond (that is, if the bond length is greater than 1.5 angstroms, consider the atoms not bonded). Modify your code to only print the atoms that are actually bonded to each other." Extending working code in small steps like this can avoid problems that a single, broad request might cause. Too broad a request carries a risk of unintended changes. Breaking a problem down and adding functionality incrementally helps avoid such errors.

In [None]:
# Paste your code examples here and run the code

Section 6 of Python Scripting for Computational Molecular Science focuses on writing functions. That section suggests:
 "To think about where we should write functions in this code, let’s think about parts we may want to use again or in other places. One of the first places we might think of is in the bond distance calculation. Perhaps we’d want to calculate a bond distance in some other script. We can reduce the likelihood of errors in our code by defining this in a function (so that if we wanted to change our bond calculation, we would only have to do it in one place.)

Let’s change this code so that we write a function to calculate the bond distance. As explained above, to define a function, you start with the word def and then give the name of the function. In parentheses are the inputs of the function, followed by a colon. The statements the function is going to execute are indented on the next lines. For this function, we will return a value. The last line of a function shows the return value for the function, which we can use to store a variable with the output value. Let’s write a function to calculate the distance between atoms."

The first paragraph of this quote could be used as a prompt in combination with the script below. Alternatively, you could try to accomplish the whole thing with a single prompt and not include the example code. Providing the example increases the chance that the AI will perform as intended. The single-prompt approach could be quicker if you have not written an example yet, but it will require a more detailed prompt.

In [None]:
# Starting code to functionalize using ChatGPT
import numpy
import os

file_location = os.path.join('data', 'water.xyz')
xyz_file = numpy.genfromtxt(fname=file_location, skip_header=2, dtype='unicode')
symbols = xyz_file[:, 0]
coordinates = xyz_file[:, 1:]
coordinates = coordinates.astype(float)  # numpy.float was removed in NumPy 1.24; use the built-in float
num_atoms = len(symbols)
for num1 in range(0, num_atoms):
    for num2 in range(0, num_atoms):
        if num1 < num2:
            x_distance = coordinates[num1, 0] - coordinates[num2, 0]
            y_distance = coordinates[num1, 1] - coordinates[num2, 1]
            z_distance = coordinates[num1, 2] - coordinates[num2, 2]
            bond_length_12 = numpy.sqrt(x_distance ** 2 + y_distance ** 2 + z_distance ** 2)
            if bond_length_12 > 0 and bond_length_12 <= 1.5:
                print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
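
One way the loop above could be refactored into functions is sketched below. The names `calculate_distance` and `find_bonds` and the `max_length` argument are illustrative choices, not the lesson's official solution. The sketch uses the standard-library `math` module and writes the water coordinates inline, so the cell runs without NumPy or the data file.

```python
import math


def calculate_distance(atom1_coord, atom2_coord):
    """Return the Euclidean distance between two (x, y, z) points."""
    x_distance = atom1_coord[0] - atom2_coord[0]
    y_distance = atom1_coord[1] - atom2_coord[1]
    z_distance = atom1_coord[2] - atom2_coord[2]
    return math.sqrt(x_distance ** 2 + y_distance ** 2 + z_distance ** 2)


def find_bonds(symbols, coordinates, max_length=1.5):
    """Return (label1, label2, distance) for every atom pair within the cutoff."""
    bonds = []
    num_atoms = len(symbols)
    for num1 in range(num_atoms):
        # Start at num1 + 1 so each pair is visited once and self-pairs are skipped
        for num2 in range(num1 + 1, num_atoms):
            bond_length = calculate_distance(coordinates[num1], coordinates[num2])
            if 0 < bond_length <= max_length:
                bonds.append((symbols[num1], symbols[num2], bond_length))
    return bonds


# Water coordinates in angstroms, copied from the table earlier in this notebook
symbols = ["O", "H1", "H2"]
coordinates = [
    (0.000000, -0.007156, 0.965491),
    (-0.000000, 0.001486, -0.003471),
    (0.000000, 0.931026, 1.207929),
]

for label1, label2, bond_length in find_bonds(symbols, coordinates):
    print(f"{label1} to {label2} : {bond_length:.3f}")
```

With the distance calculation isolated in one function, a change to how distances are computed only has to be made in one place, which is the motivation given in the quoted passage.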

## Avoiding Hallucinations
Try asking GPT-4o mini <span style="color:blue"><strong>CHAT PROMPT:</strong> What is the molecular weight of ibuprofen?</span>

Then try asking <span style="color:blue"><strong>CHAT PROMPT:</strong> What is the molecular weight of CCO[C@H](COC1=CC=C(C=C1)C(F)(F)F)CSC2=CC(=C(C=C2)OCC(=O)O)C </span>

The first prompt should succeed because this information is in the training data. The second prompt failed at the time of writing. Why? What did it do wrong? Do other SMILES strings work?

We will learn about tools that perform such calculations accurately later. For now, assume there is no such tool and
see if GPT-4o mini can create one for you.
Try the prompt: <span style="color:blue"><strong>CHAT PROMPT:</strong> Create a function that takes an input molecular formula and
calculates a molecular weight for the molecule.</span> Paste that function below and run the cell. Then test the function with
the molecular formula for seladelpar, which is C21H23F3O5S.



In [None]:
# Insert your function below

An example function generated by GPT-4o is shown below. It may not match the one you obtain.

In [None]:
def calculate_molecular_weight(formula):
    # Define atomic weights dictionary
    atomic_weights = {
        'H': 1.008, 'C': 12.01, 'N': 14.01, 'O': 16.00, 'P': 30.97, 'S': 32.07,
        'F': 19.00, 'Cl': 35.45, 'Br': 79.90, 'I': 126.90, 'B': 10.81, 'Si': 28.09
    }

    molecular_weight = 0
    i = 0
    n = len(formula)

    # A while loop lets i skip past characters that were already consumed;
    # reassigning i inside "for i in range(n)" would have no effect
    while i < n:
        if formula[i].isalpha():  # Element symbol
            element = formula[i]
            i += 1

            # Two-letter symbols such as Cl or Br
            if i < n and formula[i].islower():
                element += formula[i]
                i += 1

            # Collect the digits of the atom count, if any
            num_atoms_str = ''
            while i < n and formula[i].isdigit():
                num_atoms_str += formula[i]
                i += 1

            num_atoms = int(num_atoms_str) if num_atoms_str else 1
            molecular_weight += atomic_weights[element] * num_atoms
        else:
            i += 1  # Skip characters that are not part of an element symbol

    return molecular_weight

# Example usage:
formula = "C13H18O2"
mw = calculate_molecular_weight(formula)
print(mw)

In [None]:
# test your function

### Best Practices for AI-Enhanced Chemistry Learning (2025)

**1. Verify AI Outputs**
- Always cross-check critical calculations with reliable sources
- Use multiple AI models for important decisions
- Validate molecular structures and properties with experimental data

**2. Maintain Scientific Rigor**
- Request citations and sources when possible
- Ask for uncertainty estimates in calculations
- Understand the limitations of AI predictions

**3. Iterative Learning Approach**
- Start with simple concepts and build complexity
- Use AI to generate practice problems and solutions
- Create concept maps and study guides with AI assistance

**4. Integration with Traditional Resources**
- Combine AI insights with textbooks and peer-reviewed literature
- Use AI to explain complex concepts from your coursework
- Generate supplementary examples and analogies

**5. Collaborative Learning**
- Share AI-generated solutions with peers for discussion
- Use AI to facilitate group problem-solving sessions
- Create study groups that incorporate AI tools effectively

### Future Directions

As AI continues to evolve, expect:
- **Multimodal Capabilities**: AI that can interpret and generate chemical structures, spectra, and 3D molecular models
- **Real-time Laboratory Integration**: AI assistants that can guide experimental procedures and analyze results
- **Personalized Learning Paths**: AI that adapts to your learning style and knowledge gaps
- **Collaborative Research Tools**: AI that can assist in literature review, hypothesis generation, and experimental design

**Remember**: AI is a powerful tool to enhance your learning and research, but it should complement, not replace, fundamental understanding of chemistry principles and critical thinking skills.


## Debugging Your Code with GPTs
Use AI to debug your code, understand error messages, and refine your solutions. One option is to paste your code and ask what errors it contains. Another option is to run your code, copy the error message into your prompt, and ask for suggested fixes. Try both with the error-containing script below.

In [None]:
for number in energy_kcal:
    kJ = number * 4.184
    energy_kJ.append(kJ)
    print(energy_kJ)


## Chemistry Python Package Use through GPTs
Learn about tools like RDKit that GPTs are conversant in and how to expand a GPT's knowledge to meet your needs in molecular sciences.

Try asking GPT to calculate the molecular weight of CCO[C@H](COC1=CC=C(C=C1)C(F)(F)F)CSC2=CC(=C(C=C2)OCC(=O)O)C again, but this time ask it to use the RDKit module, which is part of its Python environment.
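
For comparison, this is roughly what RDKit-based code for this task tends to look like. It is a sketch and not necessarily what your AI assistant will produce. `Chem.MolFromSmiles`, `Descriptors.MolWt`, and `rdMolDescriptors.CalcMolFormula` are RDKit functions. The guarded import is an addition for this sketch so that the cell fails gracefully where RDKit is not installed.

```python
smiles = "CCO[C@H](COC1=CC=C(C=C1)C(F)(F)F)CSC2=CC(=C(C=C2)OCC(=O)O)C"

try:
    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors
except ImportError:
    # RDKit is not in the standard library; in a notebook try: %pip install rdkit
    Chem = None
    print("RDKit is not installed in this environment.")

mol_weight = None
if Chem is not None:
    mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
    if mol is not None:
        formula = rdMolDescriptors.CalcMolFormula(mol)
        mol_weight = Descriptors.MolWt(mol)  # average molecular weight in g/mol
        print(f"{formula}: {mol_weight:.2f} g/mol")
```

Comparing the printed formula with the one given earlier for seladelpar (C21H23F3O5S) is a quick way to confirm that the SMILES string was parsed as intended.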


In [None]:
import pandas as pd

data = pd.read_csv('distance_data_headers.csv')

# Calculate the mean of just the 'THR4_ATP' column using every row
mean_thr4_atp = data['THR4_ATP'].mean()

print("Mean of THR4_ATP column:", mean_thr4_atp)


## Optional Material:
## Prompt Engineering
## Goal 1: Specificity
- **Be clear and specific with your instructions:** When asking for a chemical reaction mechanism, specify the type of reaction (e.g., nucleophilic substitution) and any particular conditions (e.g., solvent, temperature) that should be considered.
- **Clear tasks or questions will usually get better responses:** For example, instead of asking "How does this compound react?", ask "What are the major products when compound X reacts with Y under acidic conditions?"
- **Longer, more detailed prompts often produce better results:** Provide complete information about the reactants, possible catalysts, and the desired type of reaction outcome to help the model generate a more accurate response.
- **Delimiters can help the model separate conceptual sections of your prompt:** For instance, you can differentiate between the reaction setup and the question by using bullet points or separating them with line breaks.
- **Give ample context on what you’re trying to achieve and how:** If you're looking for a synthesis pathway, describe the starting materials and the target molecule, including functional groups and stereochemistry.
- **Role prompting can help you give initial context on how the model should respond to future prompts:** You could set up a scenario where the model acts as a research chemist who is designing a novel synthetic route.
- **Few-shot prompting means that you’re adding examples of your expected output to your prompt:** Provide examples of similar chemical reactions or pathways to guide the model on the type of response you expect.

## Goal 2: Reasoning
- **Instruct the model to build complex answers incrementally instead of pushing for immediate answers:** Request a step-by-step explanation of a reaction mechanism, detailing each intermediate and transition state.
- **Spelling out the necessary steps for completing the task helps the model correctly do tasks that would otherwise produce incorrect results:** For computational chemistry tasks, describe each computational method or software to be used, along with the desired outputs (e.g., energy minimization, charge distribution).
- **Even without spelling out the steps yourself, you can often improve the results by adding a sentence that asks the model to tackle the challenge step by step:** For example, "Please describe each step involved in the electrophilic addition of HBr to 1-butene."
- **When asking the model to assess whether a provided input is correct, ask the model to build its own solution first before deciding:** If questioning the plausibility of a synthetic route, provide the route and ask the model to suggest possible improvements or identify potential issues in the proposed steps.

These adaptations help ensure that your prompts are tailored to generate more precise and relevant answers in a chemistry context, improving the utility of responses for educational, research, or practical applications.

## Strategies and Methods for Using GPTs as a Source for Code 
In this module, you will try writing a prompt that generates a standalone script that has some utility but that you don't plan to expand upon. You will provide context from a paper in the literature that has data you want to visualize in a new or different way than presented in the paper.
You can pick your own paper. Or try the following example. Download the supporting information for the paper: [Emerging Brominated Flame Retardants in the Sediment of the Great Lakes](https://figshare.com/articles/journal_contribution/Emerging_Brominated_Flame_Retardants_in_the_Sediment_of_the_Great_Lakes/2538658?backTo=/collections/Emerging_Brominated_Flame_Retardants_in_the_Sediment_of_the_Great_Lakes/2506861)
Copy and paste data from this document into the context window and add a prompt requesting code to create a graph of two specific columns.

In [None]:
# Paste your code examples here and run them


In [None]:
# This is an example generated by GPT 4 to graph locations of HCDBCO samples using data from
# "Emerging Brominated Flame Retardants in the Sediment of the Great Lakes"
# Try running it and comparing it to the script you generated
# You can also paste the script or a part of it into your context window and ask GPT to explain it to you
# You may need to start by installing geopandas as follows:
# %pip install geopandas
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
# from shapely.geometry import Point

# Data from Table S2
data = {
    'Station': ['SU08', 'SU12', 'SU16', 'SU22', 'LM18', 'LM27', 'LM41', 'LM47', 'HU12', 'HU38', 'HU48', 'ER09', 'ER37', 'ON19', 'ON30', 'ON40'],
    'Latitude': [47.6059, 47.8552, 47.6217, 46.7997, 42.7333, 43.5988, 44.7365, 45.1788, 43.8902, 44.742, 45.2787, 42.5382, 42.1097, 43.3717, 43.5323, 43.5833],
    'Longitude': [-86.8180, -88.0465, -89.4631, -91.7484, -86.9997, -86.9143, -86.7221, -86.376, -82.0558, -82.0583, -82.4515, -79.6162, -81.5755, -79.3523, -76.9002, -78.0004],
    'HCDBCO_conc': [0.5, 0.3, 0.1, 0.4, 1.2, 0.5, 0.7, 0.8, 0.2, 0.4, 0.1, 0.6, 0.8, 0.9, 1.0, 1.2] # Sample concentrations
}

# Create DataFrame
df = pd.DataFrame(data)

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))

# Set the coordinate reference system (CRS) to WGS84 (EPSG:4326)
gdf.set_crs(epsg=4326, inplace=True)

# Plot the map
fig, ax = plt.subplots(figsize=(10, 8))

# Plot points with color based on concentration
gdf.plot(ax=ax, column='HCDBCO_conc', cmap='coolwarm', markersize=50, legend=True, legend_kwds={'label': "HCDBCO Concentration (ng/g dw)", 'orientation': "horizontal"})

# Add labels for stations
for x, y, label in zip(gdf.geometry.x, gdf.geometry.y, gdf['Station']):
    ax.text(x, y, label, fontsize=8, ha='right')

# Set title and labels
ax.set_title('Surface Concentrations of HCDBCO in Sediment Core Sampling Sites')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

plt.show()
