# GenAI Prompt Engineering & Python

This workshop is built on my experience working with people at UVA on research projects. It is one thing to have a GenAI tool write some code for you. It is another to get it to write good code for you. Keep in mind that you (the human) are in control of the situation and need to guide Copilot, ChatGPT, or whatever AI tool you use to the correct results. Leveraging GenAI to help you write code can be a huge time saver if you know how to do it correctly.

## Scenario 1: Temperature Sensor Data

This was a project I worked on with a graduate student in Electrical Engineering. The lab had done a literature review of scholarly articles on different temperature sensor implementations. They compiled a spreadsheet with data from these articles and wanted to do some data visualization with the results. The student was struggling to plot this data in a meaningful way. I have removed identifying information from the dataset.

**Step 1:** Install and import libraries. If you have already installed Anaconda, you most likely have these libraries already.

In [None]:
!conda install --yes numpy pandas matplotlib openpyxl

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

**Step 2:** Load data

In [None]:
# Load the "Overview" sheet, skipping the first 5 rows so the next row supplies the column headers
df = pd.read_excel("TemperatureSensorsData.xlsx", sheet_name="Overview", engine="openpyxl", skiprows=5)

### Take a look at the data

I recommend pausing now to take a look at the data we have loaded with Python's pandas library. Open the file *"TemperatureSensorsData.xlsx"*. This data was compiled by the student from 25 scientific articles. The spreadsheet was populated with data about all the sensors, capturing the sensor performance, size, energy output, and so on. I am not an electrical engineer and you don't have to be either to appreciate the rest of this exercise.
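Besides opening the spreadsheet, it can help to inspect the DataFrame itself. The sketch below uses a few made-up rows rather than the real file, so the values are hypothetical; the column names are the ones that appear later in this notebook.

```python
import pandas as pd

# Hypothetical stand-in rows; the real values live in TemperatureSensorsData.xlsx.
df = pd.DataFrame(
    {
        "Paper Number": [1, 2, 3],
        "Supply Voltage (V)": [1.2, 0.8, None],
        "Conversion Time (ms)": [0.5, 120.0, 8.0],
    }
)

print(df.head())        # first rows, to confirm the headers landed on the right row
print(df.dtypes)        # column types; "object" often signals stray text in a numeric column
print(df.isna().sum())  # number of missing values per column
```

Running the same three calls on the DataFrame loaded in Step 2 should show quickly whether `skiprows=5` picked up the header row you expected.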

**Step 3:** Plot the data

The student wanted to plot the data in different ways, with different variables on the x and y axes. She didn't know how to do that. This is where GenAI comes in.

### Prompting

Now let's ask our GenAI tool (Copilot) for help. 

**Question 1:** How to make a scatter plot with Supply Voltage on the x axis and Conversion Time on the y axis?

*Things to keep in mind*
- Load the dataset into the AI chatbot
- Tell the AI specifically which language and libraries you want to use (if you know), such as Python with pandas, matplotlib, and numpy
- Tell it how you want the data to look. *Ex:* I want Supply Voltage on the x axis and Conversion Time on the y axis
- Specifically ask for the code to do this. Copilot tends to not give the code unless explicitly asked.

*Example Prompt:* Load this dataset as a python pandas dataframe. Use the matplotlib library to plot the dataset with "Supply Voltage" on the x axis and "Conversion Time" on the y axis. Show the code to do this. 
  

In [None]:
''' Enter code from GenAI here'''



### Before we move on

Don't just accept the code blindly. Think about these things before moving to the next step.
- Does the code run without errors?
- Does the output look correct?
- Do I understand the code? Depending on the circumstances, you don't necessarily have to understand every line but do you at least understand the code in chunks?
- Is the code commented and documented?
- Does the code handle exceptions, edge cases, bad data, missing data, null values, etc.?
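The last item on the checklist above is the one AI-generated code skips most often in my experience. Here is a minimal sketch of one way to handle bad and missing values before plotting. The rows are made up for illustration, and the name `df_clean` is simply a convention, not something your AI tool is guaranteed to use.

```python
import pandas as pd

# Hypothetical rows standing in for the spreadsheet, including bad and missing values.
df = pd.DataFrame(
    {
        "Paper Number": [1, 2, 3, 4],
        "Supply Voltage (V)": [1.2, "n/a", 0.8, 1.0],
        "Conversion Time (ms)": [0.5, 120.0, None, 8.0],
    }
)

cols = ["Supply Voltage (V)", "Conversion Time (ms)"]

df_clean = df.copy()
for col in cols:
    # Turn anything that is not a number (such as "n/a") into NaN instead of raising.
    df_clean[col] = pd.to_numeric(df_clean[col], errors="coerce")

# Keep only the rows where both plotted columns hold real numbers.
df_clean = df_clean.dropna(subset=cols)

print(f"Kept {len(df_clean)} of {len(df)} rows")
```

Printing how many rows survive is a cheap check: if most of the data disappears, the cleaning step is probably hiding a problem rather than fixing one.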

**Step 4:** Plot data with labels

Plotting the data is easy enough and the AI should be able to handle that. The next thing the student wanted to do was to add labels to all the dots on the scatter plot. Specifically, she wanted to label each dot with the value from the "Paper Number" column. This would allow a quick reference of the performance of each temperature sensor device.

**Question 2:** How to make the same plot but label the points on the plot with the corresponding value in the "Paper Number" column?

*What would be your prompt now?*

In [None]:
'''Enter code from GenAI here'''



*Follow up thoughts*

The code is definitely more complex than the first, standard scatter plot. One particularly tricky part of it is how the labels are created. For example, the code for the labels could look something like this:

```

# Annotate each point with the corresponding Paper Number
for _, row in df_clean.iterrows():
    plt.annotate(int(row['Paper Number']), 
                 (row['Supply Voltage (V)'], row['Conversion Time (ms)']),
                 textcoords="offset points", xytext=(5,5), ha='left', fontsize=8)

```

Do you understand what is going on here? If you don't, how would you follow up?

*Example Prompt:* Further explain the section in which the labels are created. (You can copy & paste your code for that in with the prompt.) Walk me through the code line by line with comments.
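For comparison, here is roughly what a line-by-line walkthrough of the annotation loop above might look like. This is a sketch with made-up data so that it runs on its own; your AI tool's explanation and variable names will likely differ.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data; in the notebook, df_clean would come from the spreadsheet.
df_clean = pd.DataFrame(
    {
        "Paper Number": [1, 2, 3],
        "Supply Voltage (V)": [1.2, 0.8, 1.0],
        "Conversion Time (ms)": [0.5, 120.0, 8.0],
    }
)

fig, ax = plt.subplots()
ax.scatter(df_clean["Supply Voltage (V)"], df_clean["Conversion Time (ms)"])

# iterrows() yields (index, row) pairs; the index is unused, hence the "_".
for _, row in df_clean.iterrows():
    ax.annotate(
        str(int(row["Paper Number"])),  # label text, shown as a whole number
        (row["Supply Voltage (V)"], row["Conversion Time (ms)"]),  # the point being labeled
        textcoords="offset points",  # read xytext as an offset, in points, from that point
        xytext=(5, 5),               # 5 points to the right and 5 points up
        ha="left",                   # the label's left edge sits at the offset position
        fontsize=8,
    )

ax.set_xlabel("Supply Voltage (V)")
ax.set_ylabel("Conversion Time (ms)")
```

The `int(...)` call is there because `iterrows()` returns each row as a single Series, which converts the whole-number paper numbers to floats when other columns hold decimals.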

**Question 3:** After looking more closely at the values in the 'Conversion Time' column, you'll see that there is a big range of values. In the plot, a lot of these dots look like they have a value of 0. Looking at the data in the spreadsheet, this is not true. The student suggested using a logarithmic scale on the y axis. Honestly, I didn't really know what that meant. Let's walk through the steps we took to figure that out together.

*Learn what a logarithmic scale is.* We learned about this outside the context of this specific dataset first. I asked the AI, "*What is a logarithmic scale? Cite your sources*". The purpose of this is to learn some basics about the topic from the AI and then follow up with a credible source.
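To make the idea concrete, here is a small sketch using made-up numbers rather than the sensor data. On a logarithmic axis, position depends on the base-10 logarithm of the value, so each tenfold increase moves the same distance. On a linear axis, the small values all crowd together near zero.

```python
import math

import matplotlib.pyplot as plt

# Made-up values spanning several orders of magnitude (not from the workshop dataset).
values = [0.01, 0.1, 1, 10, 100, 1000]

# The base-10 logarithms are evenly spaced even though the values are not.
print([round(math.log10(v), 6) for v in values])

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

ax_linear.plot(values, marker="o")
ax_linear.set_title("Linear y axis")

ax_log.plot(values, marker="o")
ax_log.set_yscale("log")  # the only difference between the two panels
ax_log.set_title("Logarithmic y axis")
```

One caveat worth checking against a credible source: a logarithmic axis cannot show zero or negative values, so it only suits data that is strictly positive.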

*How to make the y axis of the plot a logarithmic scale.* Once we had read about it and determined that a logarithmic scale was in fact what the student wanted, we prompted the AI to do that. Give it a shot below.

*Lastly.* Again, make sure the code is doing what you intend and that you understand it.  

In [None]:
'''Enter code from GenAI here'''


## Scenario 2: Soil Sample Data

This is another project I worked on, this time with a graduate student in Environmental Science over the summer of 2025. This is a slimmed down dataset from the original, just focusing on the last part of the process, which was visualizing the data. The original dataset was daily readings from a research site in Wisconsin, taking soil samples and capturing measurements of many different chemical elements in the soil. This is time-series data, meaning there was a daily measurement taken over the course of several months.

**Step 1:** Load the Data

In [None]:
# Load the CSV file into a pandas DataFrame
df = pd.read_csv("soil_elements_timeseries.csv")

We had to think about how we wanted to plot the data. The student didn't know what the data looked like. We started with a simple line plot. 

**Step 2:** Plot the Data

When plotting this data, we want the dates on the X axis and CO2 values on the Y axis. The unit of measurement for the CO2 data is parts per million. We want to load this dataset with the Python pandas library and plot it with the Matplotlib library.

In [None]:
''' Enter code from GenAI here '''



### Before we move on

As we did previously, evaluate both the code and the resulting plot. Do you understand it? Do you like it? Is it showing the correct variables, dates, etc.?

### A few things to consider

I don't know exactly what your output will look like, but in my testing, a few things to consider include:
- There is missing data in some rows, meaning there was no data gathered for that date. Are these represented correctly on the graph?
- Is a line plot the best way to represent this data? Maybe a scatter plot would be better?
- Is the unit of measurement shown on the Y axis? Remember, this is parts per million.
- Are the labels on the X axis represented accurately?
- Do you understand all the lines of the code and what they are doing?
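As a point of comparison, here is a minimal sketch that addresses several of the points above. It reads a few made-up rows from a string instead of the real file, and the column names `Date` and `CO2` are assumptions that may not match `soil_elements_timeseries.csv`.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV text; the column names "Date" and "CO2" are assumptions.
csv_text = """Date,CO2
2025-06-01,412.5
2025-06-02,
2025-06-03,415.1
2025-06-04,413.8
"""

# parse_dates turns the Date column into real dates instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

fig, ax = plt.subplots()
# The line breaks wherever a reading is missing; markers keep isolated readings visible.
ax.plot(df["Date"], df["CO2"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("CO2 (ppm)")  # state the unit on the axis
fig.autofmt_xdate()         # tilt the date labels so they do not overlap

print(df["CO2"].isna().sum(), "day(s) with no reading")
```

Leaving a visible gap for missing days is a deliberate choice here. Dropping those rows first would make matplotlib draw a straight line across the gap, which can suggest measurements that were never taken.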

If these or any other issues are present, try again with a new prompt below.

In [None]:
''' Enter code from GenAI here '''