# Developing AI Applications
* **Created by:** Eric Martinez
* **For:** CSCI 4341 - Special Topics in CS
* **At:** University of Texas Rio Grande Valley

## Before you begin
The OpenAI API provides access to powerful LLMs like GPT-3.5 and GPT-4, enabling developers to leverage these models in their applications. To access the API, sign up for an API key on the OpenAI website and follow the documentation to make API calls.

For enterprise: Azure OpenAI offers a robust and scalable platform for deploying LLMs in enterprise applications. It provides features like security, compliance, and support, making it an ideal choice for businesses looking to leverage LLMs.
    
Options:
* [[Free] Sign up for access to my OpenAI service](https://ericmichael-openai-playground-utrgv.hf.space/) - _requires your UTRGV email and student ID_
* [[Paid] Alternatively, sign up for OpenAI API access](https://platform.openai.com/signup)

## Step 0: Set up your `.env` file locally

Set your `OPENAI_API_BASE` and `OPENAI_API_KEY` values in a file named `.env` in this same folder.

```sh
# example .env contents (copy paste this into a .env file)
OPENAI_API_BASE=yourapibase
OPENAI_API_KEY=yourapikey
```

Install the required dependencies.

In [None]:
%pip install -q -r requirements.txt
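
Before making any API calls, it can help to confirm that both settings are actually visible to Python. The sketch below is a minimal check using only the standard library. The helper name `missing_settings` is hypothetical (it is not part of the course code), and it assumes the variables have already been loaded into the process environment, for example by `load_dotenv()`.

```python
import os

# The two settings this notebook expects to find in the environment.
REQUIRED_SETTINGS = ("OPENAI_API_BASE", "OPENAI_API_KEY")


def missing_settings(env=None):
    """Return the names of required settings that are absent or blank.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real configuration.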

## Rule of Thumb

If you can write down the instructions such that a competent human assistant can do it, then an LLM can probably do it too.

## LLMs are good at

- Writing Tasks
- Editing Tasks
- Programming Tasks
- Information Extraction / Summarization Tasks
- Extrapolation (Opposite of Summarization) Tasks
- Data Processing Tasks
- Learning from examples (in-context learning)

## LLMs are bad at

- Things humans are bad at (trained on human data)
- Reading your mind
- Following poor instructions
- Answering certain types of factual questions without a bank of facts to cite (hallucination)

What does the leading large language model, GPT-4, know?

_tbd: chart_

#### How might GPT-4 have been trained? (According to leaks and rumors)

- Estimated cost to train: ~$63 million
- Trained on data from the raw internet (controversial)
    - Large amounts of scraped data: books, journals, scholarly articles
    - Many code repositories from GitHub
    - Reddit
    - Websites
    - Twitter
    - YouTube transcripts
    - Movies / TV shows
- Refined using datasets produced by outsourced human annotators

## Workflow: Developing AI-Powered Applications

1. Prototyping
2. Prompt Tuning
3. Ideation / Experimentation: After prompt tuning efforts have been exhausted, thinking about novel design patterns, formats, or architectures that could lead to better results.
4. Automating a Minimum Viable Prototype
5. Collaboration
6. Engineering
7. Red-teaming
8. Monitoring

## Workflow: Prototyping

1. Prototyping: 
    - Creating quick prompts in the playground with small examples.

## Workflow: Prompt Tuning

2. Prompt Tuning: 
    - Incrementally improving the prompt, potentially adding things like in-context learning. 
    - Splitting into multiple prompts (chains). 
    - Avoiding coding. 
    - Looking for quick "feedback" in a low-stakes environment.
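
Splitting a task into multiple prompts (a chain) can be sketched as a sequence of calls where each step's output becomes the next step's input. In the sketch below, `run_chain` and `fake_llm` are hypothetical names. `fake_llm` is a stand-in so the example runs offline; in practice you would pass a function that calls a real model.

```python
from typing import Callable, List

# An "LLM" here is any function that takes (system_message, user_message) and returns text.
LLM = Callable[[str, str], str]


def run_chain(llm: LLM, system_messages: List[str], user_input: str) -> List[str]:
    """Run each prompt in order, feeding the previous output into the next step.

    Returns every intermediate output so each step can be inspected on its own.
    """
    outputs = []
    current = user_input
    for system_message in system_messages:
        current = llm(system_message, current)
        outputs.append(current)
    return outputs


def fake_llm(system_message: str, user_message: str) -> str:
    """Offline stand-in for a model call: tags the input with the step's instruction."""
    return f"[{system_message}] {user_message}"


steps = ["Extract the symptoms", "Format as CSV"]
for output in run_chain(fake_llm, steps, "John has a headache."):
    print(output)
```

Keeping the intermediate outputs around is a deliberate choice here: when a chain gives a poor final result, it is usually easier to tune the one step that went wrong than the whole pipeline.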

## Workflow: Ideation / Experimentation

3. Ideation / Experimentation: After prompt tuning efforts have been exhausted, thinking about novel design patterns, formats, or architectures that could lead to better results.
    - Retrieval Augmented Generation
    - Function Calling
    - Chain-of-thought - _Mention errata in previous lecture_
    - Novel execution architectures: tree-of-thoughts, graphs
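
As a rough illustration of Retrieval Augmented Generation, the sketch below picks the most relevant notes for a question and pastes them into the prompt. The word-overlap scoring is a deliberately simple assumption to keep the example self-contained; real systems typically use embeddings and a vector index. The names `tokenize`, `retrieve`, and `build_rag_prompt`, as well as the sample notes, are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Take the k documents with the highest word overlap, dropping any with zero overlap."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def build_rag_prompt(question: str, documents: List[str], k: int = 2) -> str:
    """Build a prompt that asks the model to answer only from the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"## Context\n{context}\n\n## Question\n{question}"
    )


# Made-up notes used purely as sample data.
notes = [
    "Office hours are held on Tuesdays at 3pm.",
    "The final project is due in the last week of class.",
    "Lecture recordings are posted after each session.",
]
print(build_rag_prompt("When is the final project due?", notes, k=1))
```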

## Workflow: Automating the MVP

4. Automating a Minimum Viable Prototype
    - Quickly prototype an MVP that proves the concept / interface.
    - Fail fast: It is ok for results to be wrong / inaccurate.
    - Iterate reasonably: Don't go and design crazy solutions yet. Stick to prompting techniques / chains. 

## Workflow: Collaboration
 
5. Collaboration
    - Involve stakeholders early
    - Collaborate with subject matter experts (SMEs) to review results. Conceptually, this is test-driven development (TDD) with SME feedback acting as the tests.
    - Brainstorm improvement ideas
    - Discuss benchmarking criteria

## Workflow: Engineering

6. Engineering
    - Turn prototype into a software engineering project
    - Version Control
    - Automated testing
    - Validation / Benchmarking
    - CI/CD
    - Security
    - Iterate with advanced techniques
    - Deployment
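
Validation and benchmarking can start small: keep a handful of inputs whose expected outputs a subject matter expert has approved, and score each prompt revision against them. The sketch below is one possible shape for that, assuming CSV text as the output format. `parse_rows` and `row_accuracy` are hypothetical helpers, the sample rows are illustrative, and exact row matching is a simplifying assumption.

```python
import csv
import io
from typing import List


def parse_rows(csv_text: str) -> List[tuple]:
    """Parse CSV text into tuples of whitespace-stripped cells, skipping blank lines."""
    reader = csv.reader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    return [tuple(cell.strip() for cell in row) for row in reader if row]


def row_accuracy(predicted_csv: str, expected_csv: str) -> float:
    """Fraction of expected rows (header included) that appear exactly in the prediction."""
    expected = parse_rows(expected_csv)
    predicted = set(parse_rows(predicted_csv))
    if not expected:
        return 1.0
    return sum(row in predicted for row in expected) / len(expected)


# Illustrative expected output; in practice a reviewer would approve these rows.
expected = """patient_name,age,symptoms,duration
John,42,Headache,Three days
Mary,56,Abdominal pain,Two days
"""
# A model output with one wrong cell (Mary's duration).
predicted = """patient_name, age, symptoms, duration
John,42,Headache,Three days
Mary,56,Abdominal pain,Five days
"""
print(f"row accuracy: {row_accuracy(predicted, expected):.2f}")
```

A score like this is crude, but tracking it across prompt revisions gives the team a shared number to discuss instead of impressions.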

## Workflow: Red-teaming

7. Red-teaming
    - _tbd_

## Workflow: Monitoring

8. Monitoring
    - _tbd_

## Prototyping: How do we get LLMs to actually do things?
#### English as a Programming Language

In [None]:
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

import openai

# Define a function to get the AI's reply using the OpenAI API
def get_ai_reply(message, model="gpt-3.5-turbo", system_message=None, temperature=0):
    # Initialize the messages list
    messages = []
    
    # Add the system message to the messages list
    if system_message is not None:
        messages += [{"role": "system", "content": system_message}]
    
    # Add the user's message to the messages list
    messages += [{"role": "user", "content": message}]
    
    # Make an API call to the OpenAI ChatCompletion endpoint with the model and messages
    # (openai.ChatCompletion is the interface of the pre-1.0 openai SDK)
    completion = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature
    )
    
    # Extract and return the AI's response from the API response
    return completion.choices[0].message.content.strip()

## Prototyping
#### Example Task: Clean up some messy data

**Input:** "Patient John, aged 42 years, reported a stiff neck and a constant, throbbing headache for the past three days. Symptoms are severe in the morning. Patient Mary, 56 years old, complained about abdominal pain and vomiting for the last two days. Pain is severe and the vomiting is recurring after meals. Young patient Timothy, 7 years old, has been experiencing mild fever and a runny nose for five days, symptoms are worse in the evening. Lastly, patient Lisa, 36 years old, reported chronic low back pain for two weeks, the pain is moderate and often increases with physical activity."

**Output:** a nice spreadsheet

**Validation Example:** "Resident Noah, a 65-year-old, has had hypertension and intermittent chest discomfort for the last week. Discomfort is particularly noticeable after physical activity. Emma, a 25-year-old, has reported a rash and joint pain that began three days ago and seems to increase in severity at night. Another patient, Lucas, aged 33, has been experiencing frequent urination and weight loss over the past month, with mild discomfort reported. Finally, Olivia, a 48-year-old, has a recurring dry cough and feeling of breathlessness for the last two weeks, primarily occurring during the daytime."


## Prototyping
Go to the playground and play around first.

**Basic Prompt:** "You will be provided with unstructured data, and your task is to parse it into CSV format."

## Prototyping: A helpful structure that I use

- Supply 'priming' information - you are an (expert, assistant, programmer, ...)
- Supply general task information and instructions
- Supply format constraints
- Supply examples: known as _in-context learning_
- Supply additional context useful for solving the problem
- Supply the current task description

Note: You DO NOT need to use an LLM as a 'chatbot'

Activity: As a class, try to improve the example.
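
The structure above can also be assembled in code so that each part is easy to edit independently. This is a minimal sketch: `build_system_prompt` and its parameter names are hypothetical, and the `##` section headings are just one layout choice among many. The current task description is left out on purpose, since it would be sent as the user message.

```python
from typing import List, Optional, Tuple


def build_system_prompt(
    priming: str,
    task: str,
    format_rules: str,
    examples: Optional[List[Tuple[str, str]]] = None,
    context: Optional[str] = None,
) -> str:
    """Assemble a system prompt from its parts, skipping optional parts that are not supplied."""
    sections = [priming, f"## Task\n\n{task}", f"## Format\n\n{format_rules}"]
    # Each example is a (user_text, assistant_text) pair used for in-context learning.
    for user_text, assistant_text in examples or []:
        sections.append(f'## Example\n\nUser: "{user_text}"\n\nAssistant:\n{assistant_text}')
    if context:
        sections.append(f"## Context\n\n{context}")
    sections.append("## Begin")
    return "\n\n".join(sections)


header = "patient_name,age,symptoms,duration,additional_information"
prompt = build_system_prompt(
    priming="You are an expert in medical transcription.",
    task="You will be provided with unstructured data, and your task is to parse it into CSV format.",
    format_rules=f"The headers for your CSV output are as follows:\n{header}",
    examples=[
        (
            "Patient John, aged 42 years, reported a stiff neck and a constant, throbbing "
            "headache for the past three days. Symptoms are severe in the morning.",
            f'{header}\nJohn,42,"Stiff neck and constant, throbbing headache",Three days,Severe in the morning',
        )
    ],
)
print(prompt)
```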

## Prototyping: Revised Example

```
You are an expert in medical transcription.

## Task

You will be provided with unstructured data, and your task is to parse it into CSV format.

## Format 

The headers for your CSV output are as follows:
patient_name, age, symptoms, duration, additional_information

Always include the header again in your response.

## Example

User: "Patient John, aged 42 years, reported a stiff neck and a constant, throbbing headache for the past three days. Symptoms are severe in the morning. Patient Mary, 56 years old, complained about abdominal pain and vomiting for the last two days. Pain is severe and the vomiting is recurring after meals. Young patient Timothy, 7 years old, has been experiencing mild fever and a runny nose for five days, symptoms are worse in the evening. Lastly, patient Lisa, 36 years old, reported chronic low back pain for two weeks, the pain is moderate and often increases with physical activity."

Assistant:
patient_name, age, symptoms, duration, additional_information
John,42,"Stiff neck and constant, throbbing headache",Three days,Severe in the morning
Mary,56,Abdominal pain and vomiting,Two days,Severe and recurring after meals
Timothy,7,Mild fever and a runny nose,Five days,Worse in the evening
Lisa,36,Chronic low back pain,Two weeks,Moderate and increases with physical activity

## Begin
```

## Fast-Forward: Automating

In [None]:
def text_to_csv_string(text):
    prompt="""
    You are an expert in medical transcription.

    ## Task

    You will be provided with unstructured data, and your task is to parse it into CSV format.

    ## Format 

    The headers for your CSV output are as follows:
    patient_name, age, symptoms, duration, additional_information

    Always include the header again in your response.

    ## Example

    User: "Patient John, aged 42 years, reported a stiff neck and a constant, throbbing headache for the past three days. Symptoms are severe in the morning. Patient Mary, 56 years old, complained about abdominal pain and vomiting for the last two days. Pain is severe and the vomiting is recurring after meals. Young patient Timothy, 7 years old, has been experiencing mild fever and a runny nose for five days, symptoms are worse in the evening. Lastly, patient Lisa, 36 years old, reported chronic low back pain for two weeks, the pain is moderate and often increases with physical activity."

    Assistant:
    patient_name, age, symptoms, duration, additional_information
    John,42,"Stiff neck and constant, throbbing headache",Three days,Severe in the morning
    Mary,56,Abdominal pain and vomiting,Two days,Severe and recurring after meals
    Timothy,7,Mild fever and a runny nose,Five days,Worse in the evening
    Lisa,36,Chronic low back pain,Two weeks,Moderate and increases with physical activity

    ## Begin
    """

    csv_text = get_ai_reply(text, system_message=prompt) # get csv from openai
    return csv_text

text = "Resident Noah, a 65-year-old, has had hypertension and intermittent chest discomfort for the last week. Discomfort is particularly noticeable after physical activity. Emma, a 25-year-old, has reported a rash and joint pain that began three days ago and seems to increase in severity at night. Another patient, Lucas, aged 33, has been experiencing frequent urination and weight loss over the past month, with mild discomfort reported. Finally, Olivia, a 48-year-old, has a recurring dry cough and feeling of breathlessness for the last two weeks, primarily occurring during the daytime."
print(text_to_csv_string(text))
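
Because the model's reply is free text, it is worth checking that it really has the expected shape before relying on it. The sketch below uses only the standard library. `validate_csv_reply` is a hypothetical helper, the expected header is taken from the prompt above, and the sample `bad` reply is made up to show typical failure modes.

```python
import csv
import io
from typing import List

EXPECTED_HEADER = ["patient_name", "age", "symptoms", "duration", "additional_information"]


def validate_csv_reply(csv_text: str) -> List[str]:
    """Return a list of problems found in the reply; an empty list means it looks well-formed."""
    # skipinitialspace tolerates the "a, b, c" style header used in the prompt.
    reader = csv.reader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return ["reply is empty"]
    problems = []
    if rows[0] != EXPECTED_HEADER:
        problems.append(f"unexpected header: {rows[0]}")
    # Row numbers count non-blank rows, with the header as row 1.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {row_number} has {len(row)} fields, expected {len(EXPECTED_HEADER)}")
        elif not row[1].isdigit():
            problems.append(f"row {row_number} has a non-numeric age: {row[1]!r}")
    return problems


good = (
    "patient_name, age, symptoms, duration, additional_information\n"
    'John,42,"Stiff neck and constant, throbbing headache",Three days,Severe in the morning'
)
bad = "Sure! Here is your CSV:\nJohn,forty-two,Stiff neck"
print(validate_csv_reply(good))
print(validate_csv_reply(bad))
```

Returning a list of problems, rather than raising on the first one, makes it easier to see everything that went wrong in a single reply while tuning the prompt.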


## That's nice and all, but it's still just a string... we can't really use that in code very easily

In [None]:
%pip install pandas 

In [None]:
import pandas as pd
import io   

def df_from_text(text):
    prompt="""
    You are an expert in medical transcription.

    ## Task

    You will be provided with unstructured data, and your task is to parse it into CSV format.

    ## Format 

    The headers for your CSV output are as follows:
    patient_name, age, symptoms, duration, additional_information

    Always include the header again in your response.

    ## Example

    User: "Patient John, aged 42 years, reported a stiff neck and a constant, throbbing headache for the past three days. Symptoms are severe in the morning. Patient Mary, 56 years old, complained about abdominal pain and vomiting for the last two days. Pain is severe and the vomiting is recurring after meals. Young patient Timothy, 7 years old, has been experiencing mild fever and a runny nose for five days, symptoms are worse in the evening. Lastly, patient Lisa, 36 years old, reported chronic low back pain for two weeks, the pain is moderate and often increases with physical activity."

    Assistant:
    patient_name, age, symptoms, duration, additional_information
    John,42,"Stiff neck and constant, throbbing headache",Three days,Severe in the morning
    Mary,56,Abdominal pain and vomiting,Two days,Severe and recurring after meals
    Timothy,7,Mild fever and a runny nose,Five days,Worse in the evening
    Lisa,36,Chronic low back pain,Two weeks,Moderate and increases with physical activity

    ## Begin
    """

    csv_text = get_ai_reply(text, system_message=prompt) # get csv from openai
    # convert to pandas dataframe; skipinitialspace drops the spaces after commas in the header
    df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
    return df

In [None]:
text = "Resident Noah, a 65-year-old, has had hypertension and intermittent chest discomfort for the last week. Discomfort is particularly noticeable after physical activity. Emma, a 25-year-old, has reported a rash and joint pain that began three days ago and seems to increase in severity at night. Another patient, Lucas, aged 33, has been experiencing frequent urination and weight loss over the past month, with mild discomfort reported. Finally, Olivia, a 48-year-old, has a recurring dry cough and feeling of breathlessness for the last two weeks, primarily occurring during the daytime."
df = df_from_text(text)
display(df)

## It works!

So far we:
- prototyped a prompt
- iterated on it until it looked 'good enough'
- wrote code to automate the prompt and extract the information into CSV
- wrote code that loads the CSV into a pandas DataFrame, which now allows us to use that data for a variety of purposes
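
Once the reply is structured, ordinary code can work with it. As a dependency-free illustration of the same idea (the notebook itself uses pandas), the sketch below turns CSV text into typed records and filters them. `Patient` and `parse_patients` are hypothetical names, and the code assumes the reply is well-formed with the header used in the prompt.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    patient_name: str
    age: int
    symptoms: str
    duration: str
    additional_information: str


def parse_patients(csv_text: str) -> List[Patient]:
    """Convert well-formed CSV text with the prompt's header into Patient records."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    return [
        Patient(
            patient_name=row["patient_name"].strip(),
            age=int(row["age"]),  # raises ValueError if the age is not a number
            symptoms=row["symptoms"].strip(),
            duration=row["duration"].strip(),
            additional_information=row["additional_information"].strip(),
        )
        for row in reader
    ]


# Rows taken from the example in the prompt.
csv_text = """patient_name, age, symptoms, duration, additional_information
Mary,56,Abdominal pain and vomiting,Two days,Severe and recurring after meals
Timothy,7,Mild fever and a runny nose,Five days,Worse in the evening
"""
adults = [p.patient_name for p in parse_patients(csv_text) if p.age >= 18]
print(adults)
```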

## But... but... our users don't use Python or Jupyter? This can't be it?

## Fine, let's make this into an app then.

In [None]:
import gradio as gr

def parse_text(text):
    df = df_from_text(text)
    csv_location = "output.csv"
    df.to_csv(csv_location, index=False)  # leave the DataFrame index out of the file
    return df, csv_location

with gr.Blocks() as demo:
    with gr.Tab("Unstructured Notes to CSV"):
        with gr.Row():
            with gr.Column():
                notes = gr.Textbox(label="Notes", lines=5)
                btn = gr.Button(value="Submit")
                table = gr.Dataframe(label="Results")
                csv_file = gr.File(label="CSV Output", interactive=False)
            btn.click(parse_text, inputs=[notes], outputs=[table, csv_file])

# Launch once the layout is fully defined
demo.launch(share=True)
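
Model output will not always parse cleanly, so an app handler probably should not crash when it does not. The sketch below wraps any parsing function and returns a status message alongside the result. `safe_parse` and the stand-in parser `always_fails` are hypothetical and are not part of Gradio's API; in the app above, a wrapper like this could sit between the button handler and `df_from_text`.

```python
from typing import Any, Callable, Optional, Tuple


def safe_parse(parser: Callable[[str], Any], text: str) -> Tuple[Optional[Any], str]:
    """Run parser(text) and return (result, status) instead of raising on bad input."""
    if not text.strip():
        return None, "Please enter some notes first."
    try:
        return parser(text), "OK"
    except Exception as error:  # broad on purpose: any parsing failure becomes a message
        return None, f"Could not parse the model output: {error}"


def always_fails(text: str) -> Any:
    """Stand-in parser that simulates malformed model output."""
    raise ValueError("expected 5 fields, saw 3")


print(safe_parse(str.upper, "john, 42"))
print(safe_parse(always_fails, "john, 42"))
print(safe_parse(str.upper, "   "))
```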


## Key Takeaways so far

- LLMs know a lot more than the average person. For things they don't know well, include context, background, instructions, and examples in the prompt.
- If you can formulate a problem and how to solve it, an LLM can probably do it.
- We can use code to automate and orchestrate these interactions.
- We can design user interfaces to make this more useful to humans.
- We can practice Agile methodologies when engaging with customers and stakeholders to incrementally deliver higher-quality results and foster collaboration.

That's it! 🎉 