## GPT VISION API TESTS

After scraping our dog images, it's time to evaluate the performance of OpenAI's Vision API. Our ultimate goal is to develop our own breed classifier using TensorFlow and neural networks, but for these initial tests we rely on OpenAI services. Our first step is to use Vision-GPT to identify potential dog breeds within images. However, Vision-GPT does not reliably provide concise responses. To address this, we post-process its output with GPT-4, since simpler models like GPT-3.5 weren't sufficient during our preliminary trials.

### Libraries

In [90]:
import base64
import requests
import numpy as np
import pandas as pd
import time
import re 
from openai import OpenAI

### Vision Request

It's now time to put the Vision API to the test using the images from our dataset. To achieve this, we'll open our "image_data.csv" file to iterate through each image name. We'll then create two new columns: one to store the response from the OpenAI API and another to save the tokens used. Afterwards, we'll explore various methods and parameters to determine the most cost-effective alternative for our analysis.

In [2]:
# Open csv with image data
df = pd.read_csv('./output/raw/images/image_data.csv')
# Add columns for vision response and tokens
df['Vision Response'] = None
df['Vision Tokens'] = None

In [91]:
# Api key for OpenAI
api_key  = 'sk-'
# Set up OpenAI client
client = OpenAI(api_key = api_key)

With reference to the OpenAI documentation (https://platform.openai.com/docs/guides/vision), we developed a function to iterate through the DataFrame's image names. This function utilizes the OpenAI Vision API to generate responses, which are then saved in a new column along with the tokens used.

In [22]:
def vision_request(df, api_key, type):
  '''
  df: pandas dataframe
  api_key: OpenAI api key
  type: low, auto or high quality

  This function takes a dataframe with a column 'File Name' and sends a request to OpenAI's vision model.
  The response is then added to the dataframe in a new column 'Vision Response'.

  '''

  # Function to encode image to base64
  def encode_image(image_path):
    with open(image_path, "rb") as image_file:
      return base64.b64encode(image_file.read()).decode('utf-8')

  # Loop through the dataframe and send a request to OpenAI for each image
  for i, file_name in enumerate(df['File Name']):
    # Path to image
    image_path = './imgs/' + file_name
    # Getting the base64 string
    base64_image = encode_image(image_path)
    # headers for the request
    headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {api_key}"
    }
    # payload for the request
    payload = {
      "model": "gpt-4-vision-preview",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Give me three possible breeds the dog is its okeay if you dont know the breed."
            },
            {
              "type": "image_url",
              "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": type
              }
            }
          ]
        }
      ],
      "max_tokens": 3000
    }
    # Send the request
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    # Get the response
    data = response.json()
    # Get the tokens used
    tokens = data['usage']
    # Add the response and tokens to the dataframe
    df.at[i, 'Vision Tokens'] = tokens
    content = data['choices'][0]['message']['content']
    df.at[i, 'Vision Response'] = content

  return df
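
The function above indexes `data['usage']` and `data['choices']` directly, so a failed request would raise a `KeyError` partway through a group. A minimal sketch of a more defensive parser follows. `parse_vision_response` is a hypothetical helper, not part of the original notebook, and it assumes that failed requests return a JSON body with a top-level `error` object.

```python
def parse_vision_response(data):
    """Return (content, usage) from a chat completions response dict.

    Hypothetical helper: returns (None, None) instead of raising when the
    response carries no choices, e.g. when the API returned an error body.
    """
    if 'error' in data or not data.get('choices'):
        return None, None
    content = data['choices'][0].get('message', {}).get('content')
    usage = data.get('usage')
    return content, usage


# Example with a successful response and an error response
ok = {
    'choices': [{'message': {'role': 'assistant', 'content': 'Dachshund'}}],
    'usage': {'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163},
}
failed = {'error': {'message': 'Rate limit reached', 'type': 'requests'}}

print(parse_vision_response(ok))
print(parse_vision_response(failed))
```

With a helper like this, a row whose request failed could be left as `None` and retried later instead of aborting the whole loop.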

Considering the large number of images (over 500), we chose to split the CSV into smaller groups, each containing approximately 6 images. By doing so, we mitigate the risk of losing substantial progress and incurring unnecessary costs due to the tokens used in case of any unforeseen issues.

In [3]:
def df_vision(df, type):
    '''
    df: pandas dataframe
    type: low, auto or high quality

    This function splits the dataframe into small groups, sends each group to vision_request
    and saves every processed group as its own csv file.
    '''
    # Aim for roughly 100 groups; max() avoids a zero group size for small dataframes
    group_size = max(1, len(df) // 100)
    # Split dataframe into groups
    groups = [df.iloc[i:i+group_size].reset_index(drop=True) for i in range(0, len(df), group_size)]
    # Loop through groups and send requests
    for i, group in enumerate(groups):
        # Send request, forwarding the detail level to vision_request
        processed_group = vision_request(group, api_key, type)
        # Save group to csv
        processed_group.to_csv(f"./output/raw/groups/{type}/group_{i}.csv", index=False)
        # Sleep for 10 seconds to avoid rate limits
        time.sleep(10)  
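
Saving each group separately also makes it possible to resume an interrupted run without paying again for groups that are already on disk. The sketch below shows one way to find the groups that still need processing. `pending_group_indices` is a hypothetical helper, and it assumes the `group_{i}.csv` naming used above.

```python
from pathlib import Path
import tempfile


def pending_group_indices(num_groups, out_dir):
    """Return indices of groups that have no saved CSV in out_dir yet."""
    out_dir = Path(out_dir)
    return [i for i in range(num_groups)
            if not (out_dir / f"group_{i}.csv").exists()]


# Example: pretend groups 0 and 1 were saved before an interruption
with tempfile.TemporaryDirectory() as tmp:
    for i in (0, 1):
        (Path(tmp) / f"group_{i}.csv").write_text("File Name\n")
    remaining = pending_group_indices(4, tmp)
    print(remaining)  # [2, 3]
```

The processing loop could then skip any index that is not in the returned list.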


The concatenate_groups function merges multiple CSV files of the same type into a single file. It reads the CSV files from the specified directory, concatenates them into a single DataFrame, and then saves the merged DataFrame to a new CSV file.

In [4]:
def concatenate_groups(type, num_groups):
    '''
    type: low, auto or high quality
    num_groups: number of groups to concatenate

    This function concatenates the groups into one dataframe and saves it as a csv file.
    '''
    # Read in the groups and add them to a list
    dfs = [pd.read_csv(f"./output/raw/groups/{type}/group_{i}.csv") for i in range(0,num_groups)]
    # Concatenate the groups
    df = pd.concat(dfs).reset_index(drop=True)
    # Save the dataframe as a csv
    df.to_csv(f'./output/raw/vision/vision_{type}.csv', index=False)
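
The function above needs the number of groups up front. As an alternative sketch, the group files could be discovered on disk instead. The main pitfall is ordering: a plain alphabetical sort places `group_10.csv` before `group_2.csv`, which would shuffle the rows. `ordered_group_files` below is a hypothetical helper that sorts by the numeric index.

```python
import re
import tempfile
from pathlib import Path


def ordered_group_files(group_dir):
    """Return group_<n>.csv paths in group_dir sorted by their numeric index."""
    pattern = re.compile(r"group_(\d+)\.csv$")
    indexed = []
    for path in Path(group_dir).glob("group_*.csv"):
        match = pattern.search(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort on the integer index only, not on the file name
    return [path for _, path in sorted(indexed, key=lambda pair: pair[0])]


# Example with indices that would be misordered by an alphabetical sort
with tempfile.TemporaryDirectory() as tmp:
    for i in (10, 2, 0):
        (Path(tmp) / f"group_{i}.csv").write_text("File Name\n")
    names = [p.name for p in ordered_group_files(tmp)]
    print(names)  # ['group_0.csv', 'group_2.csv', 'group_10.csv']
```

The returned paths could then be read with `pd.read_csv` and passed as a list to `pd.concat`, as in the function above.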

### Vision Request Processing

The initial tests of the Vision API revealed its inability to consistently provide concise answers. Often, it needed to clarify that its responses might not be entirely accurate. Typically, the answers followed this format:

In [14]:
# Open csv with vision response
df = pd.read_csv('./output/raw/vision/vision_low.csv')
# Show a random response
print(df['Vision Response'][np.random.randint(0, len(df))])
print('')
print(df['Vision Response'][np.random.randint(0, len(df))])
print('')
print(df['Vision Response'][np.random.randint(0, len(df))])

Based on the image, the dog appears to be a Dachshund. This breed is characterized by its long body and short legs, along with a prominent snout. It's difficult to provide three possible breeds when the dog in the picture shows clear traits of a Dachshund, but for the sake of exploring possibilities:

1. Dachshund
2. Miniature Dachshund (if the dog is smaller in size)
3. Dachshund mix (if it has mixed breed characteristics not visible in this image)

It's worth noting that there can be some variations within the breed, such as differences in coat type (smooth, long-haired, or wire-haired) and size.

I'm sorry, I can't provide assistance with that request.

Given the physical characteristics visible in the image such as the round face, short snout, prominent eyes, and fluffy coat, the dog could possibly be one of the following three breeds or a mix involving one of these:

1. Shih Tzu
2. Lhasa Apso
3. Pekingese

These breeds share some similar physical attributes and are often mistaken 


Although various methods exist to process this output, for this prototype we'll analyze the text using the GPT-4 model with the following function. We decided to go with GPT-4 because older models proved insufficient in our initial testing.

In [15]:
def get_dog_breeds(client, vision_response):
    '''
    client: OpenAI client
    vision_response: string

    This function takes a vision response and sends a request to OpenAI's chat model to get the dog breeds from the response. 
    '''

    # Prompt for the chat model
    prompt = 'give me the dog breeds in order of importance that appear in this text  separated by commas in case of an error write the word "Error"'
    # Add the vision response to the prompt
    message_content = str(prompt) + '"' + str(vision_response) + '"'
    # Send the request to the chat model
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": message_content},
        ]
    )
    
    # Get the breeds and tokens used
    breeds = response.choices[0].message.content
    total_tokens = response.usage.total_tokens
    prompt_tokens = response.usage.prompt_tokens
    completition_tokens = response.usage.completion_tokens
    
    return breeds, total_tokens, prompt_tokens, completition_tokens
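
One of the cheaper alternatives to a second model call is plain pattern matching, since many responses present the candidates as a numbered list. The sketch below is not what this prototype uses. `breeds_from_numbered_list` is a hypothetical helper that only handles responses shaped like the examples shown earlier and returns the same "Error" sentinel otherwise.

```python
import re


def breeds_from_numbered_list(vision_response):
    """Extract breed names from lines such as '1. Shih Tzu'.

    Hypothetical helper: drops parenthetical notes and returns 'Error'
    (the same sentinel the GPT-4 prompt asks for) when no list is found.
    """
    # Capture the text after 'N.' up to an opening parenthesis or line end
    items = re.findall(r"^[ \t]*\d+\.[ \t]*([^(\n]+)", vision_response, flags=re.MULTILINE)
    breeds = [item.strip() for item in items if item.strip()]
    return ", ".join(breeds) if breeds else "Error"


sample = (
    "the dog could possibly be one of the following three breeds:\n\n"
    "1. Shih Tzu\n"
    "2. Lhasa Apso\n"
    "3. Pekingese\n\n"
    "These breeds share some similar physical attributes"
)
refusal = "I'm sorry, I can't provide assistance with that request."

print(breeds_from_numbered_list(sample))   # Shih Tzu, Lhasa Apso, Pekingese
print(breeds_from_numbered_list(refusal))  # Error
```

A rule like this costs no tokens but misses responses that name breeds only in running prose, which is one reason a language model was preferred here.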


Now, we'll apply the previously defined function to all the responses of GPT Vision in the DataFrame and save the tokens used for further analysis.

In [105]:
def clean_breeds(df):
    '''
    df: pandas dataframe

    This function extracts the breeds from each vision response with get_dog_breeds
    and stores them in the dataframe together with the tokens used.
    '''
    # Iterate through the vision responses and extract the breeds
    for i, group in enumerate(df['Vision Response']):
        # Get the breeds
        breeds, total_tokens, prompt_tokens, completition_tokens = get_dog_breeds(client, group)
        # Add the breeds and tokens to the dataframe
        df.at[i, 'Breeds'] = breeds
        df.at[i, 'T Tokens Prompt'] = prompt_tokens
        df.at[i, 'T Tokens Completition'] = completition_tokens
        df.at[i, 'T Tokens Total'] = total_tokens
        # Sleep for 2 seconds to avoid rate limits
        time.sleep(2)

    # Drop the raw vision response now that the breeds have been extracted
    df = df.drop(columns=['Vision Response'])

    return df

Finally, we need to split the vision tokens into three separate columns in the DataFrame, in the same manner as we saved the text tokens. After being written to CSV and read back, each cell holds the usage dictionary as a string:

In [12]:
print(df['Vision Tokens'][67])

{'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163}


In [106]:
def vision_tokens(df):
    '''
    df: pandas dataframe

    This function extracts the vision tokens from the vision tokens column in the dataframe.
    '''

    # Add columns for vision tokens
    df['V Tokens Prompt'] = None
    df['V Tokens Completition'] = None
    df['V Tokens Total'] = None
    # Iterate through the dataframe and extract the vision tokens
    for i, token_string in enumerate(df['Vision Tokens']):
        # Define regular expression pattern to match token counts
        pattern = r"'(\w+)': (\d+)"
        # Find all matches using the pattern
        matches = re.findall(pattern, token_string)
        # Create a dictionary to store token counts
        token_counts = {key: int(value) for key, value in matches}
        # Add the token counts to the dataframe
        df.at[i, 'V Tokens Prompt'] = token_counts['prompt_tokens']
        df.at[i, 'V Tokens Completition'] = token_counts['completion_tokens']
        df.at[i, 'V Tokens Total'] = token_counts['total_tokens']
    # Drop the Vision Tokens column
    df.drop(columns=['Vision Tokens'], inplace=True)

    return df
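
Because each `Vision Tokens` cell is a Python dict that was written to CSV and read back as a string, it could also be parsed with the standard library's `ast.literal_eval` instead of a regular expression. A minimal sketch, assuming every cell looks like the example printed above; `parse_token_string` is a hypothetical name.

```python
import ast


def parse_token_string(token_string):
    """Turn a stringified usage dict back into a dict of integer counts."""
    usage = ast.literal_eval(token_string)
    # Keep only plain integer counts, in case other fields are present
    return {key: value for key, value in usage.items() if isinstance(value, int)}


counts = parse_token_string(
    "{'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163}"
)
print(counts['prompt_tokens'], counts['completion_tokens'], counts['total_tokens'])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for strings read from a file.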

In [15]:
df.head()

| | File Name | Dimensions | File Size (MB) | Aspect Ratio | Breeds | T Tokens Prompt | T Tokens Completition | T Tokens Total | V Tokens Prompt | V Tokens Completition | V Tokens Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | @Chompersthecorgi1.jpg | (1280, 1280) | 0.10714 | 1.0 | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sh... | 248 | 20 | 268 | 111 | 212 | 323 |
| 1 | @Chompersthecorgi10.jpg | (2500, 1667) | 0.584455 | 1.4997 | Pembroke Welsh Corgi, Cardigan Welsh Corgi | 175 | 14 | 189 | 111 | 139 | 250 |
| 2 | @Chompersthecorgi11.jpg | (259, 194) | 0.004376 | 1.335052 | Error | 57 | 1 | 58 | 111 | 21 | 132 |
| 3 | @Chompersthecorgi12.jpg | (225, 225) | 0.01402 | 1.0 | Pembroke Welsh Corgi | 107 | 7 | 114 | 111 | 71 | 182 |
| 4 | @Chompersthecorgi13.jpg | (1200, 800) | 0.043723 | 1.5 | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sw... | 271 | 19 | 290 | 111 | 235 | 346 |


In [19]:
def export_clean_breeds(df, type):
    df.to_csv(f'./output/clean/breeds/breeds_{type}.csv', index=False)

### Multiple Image Request

Since the Vision GPT API also accepts multiple images in one request, we'll send all the images of each dog together with a single prompt and compare the resulting performance and token cost with the one-image-per-request approach.

In [95]:
# Open image data csv
df = pd.read_csv('./output/raw/images/image_data.csv')
# Get a list of all the images per dog
df['Username'] = df['File Name'].str.extract(r'^([^0-9]+)')
# Group the images by dog
df = df.groupby('Username')['File Name'].agg(list).reset_index()
# Create the new cols
df['Vision Response'] = None
df['Vision Tokens'] = None
# Show the new df
df.head()

In [100]:
def df_vision_multiple_content(df, i):

    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
        
    content = []
    prompt = {
              "type": "text",
              "text": "Give me three possible breeds the dog is its okeay if you dont know the breed."
            }
    
    content.append(prompt)

    for file_name in (df['File Name'][i]):
        image_path = './imgs/' + file_name
        base64_image = encode_image(image_path)
        image_url = {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low"
            }
        }
        content.append(image_url)
    
    return content
    

def df_vision_multiple(df):
  # Loop through the dataframe and send one request per dog with all of its images
  for i, username in enumerate(df['Username']):
    content = df_vision_multiple_content(df, i)
    # headers for the request
    headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {api_key}"
    }
    # payload for the request
    payload = {
      "model": "gpt-4-vision-preview",
      "messages": [
        {
          "role": "user",
          "content": content
        }
      ],
      "max_tokens": 3000
    }
    # Send the request
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    # Get the response
    data = response.json()
    print(data)
    # Get the tokens used
    tokens = data['usage']
    # Add the response and tokens to the dataframe
    df.at[i, 'Vision Tokens'] = tokens
    content = data['choices'][0]['message']['content']
    df.at[i, 'Vision Response'] = content

  return df
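
Both request builders above label every image as `image/jpeg` in the data URL. If the dataset ever contains PNG or WebP files, the MIME type could be guessed from the file name instead. The sketch below uses the standard library's `mimetypes` module. `image_data_url` is a hypothetical helper, and the JPEG fallback is an assumption that mirrors the hardcoded value.

```python
import base64
import mimetypes


def image_data_url(file_name, image_bytes):
    """Build a base64 data URL whose MIME type is guessed from the file name."""
    mime_type, _ = mimetypes.guess_type(file_name)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumption: fall back to JPEG, matching the hardcoded type above
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


print(image_data_url("dog.png", b"abc"))  # data:image/png;base64,YWJj
print(image_data_url("dog.jpg", b"abc"))  # data:image/jpeg;base64,YWJj
```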
    

In [101]:
df = df_vision_multiple(df)

{'id': 'chatcmpl-94LMgog2PWWuwBlErSmVePvGm2Yhu', 'object': 'chat.completion', 'created': 1710822466, 'model': 'gpt-4-1106-vision-preview', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "Based on the images provided, the dog appears to be a Pembroke Welsh Corgi. The characteristics that suggest this are the dog's short stature, long body, upright ears, and distinctive facial markings. Pembroke Welsh Corgis are known for their short legs and long bodies, which are clearly visible in these images. Here are three possible dog breeds similar in appearance to the Pembroke Welsh Corgi:\n\n1. Pembroke Welsh Corgi\n2. Cardigan Welsh Corgi (though they have slightly different features, such as a tail)\n3. Swedish Vallhund (which has similar features but is a distinct breed)\n\nHowever, from the given images, it is quite clear that the dog is a Pembroke Welsh Corgi."}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 1046, 'completion_tokens': 154, 'total_tokens': 1200
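
The prompt token counts are consistent with a flat per-image charge at `low` detail. OpenAI's vision guide documented 85 tokens per low-detail image for this model. Assuming that figure, and inferring the text overhead from the 111 prompt tokens observed in the single-image runs, the 1046 prompt tokens above would correspond to 12 images. A short sketch of the arithmetic, with constant and function names of our own choosing:

```python
LOW_DETAIL_IMAGE_TOKENS = 85      # assumed flat cost per low-detail image
SINGLE_IMAGE_PROMPT_TOKENS = 111  # observed in the single-image runs

# Tokens attributable to the text prompt and message framing
text_overhead = SINGLE_IMAGE_PROMPT_TOKENS - LOW_DETAIL_IMAGE_TOKENS


def estimate_prompt_tokens(num_images):
    """Estimate prompt tokens for one request with num_images low-detail images."""
    return text_overhead + LOW_DETAIL_IMAGE_TOKENS * num_images


print(text_overhead)               # 26
print(estimate_prompt_tokens(1))   # 111
print(estimate_prompt_tokens(12))  # 1046
```

Under this estimate, batching saves only the repeated text overhead on the prompt side. Any larger saving would come from generating one completion per dog instead of one per image.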

In [104]:
df.to_csv('./output/raw/vision/vision_multiple.csv', index=False)

In [109]:
df = pd.read_csv('./output/raw/vision/vision_multiple.csv')

In [110]:
df = clean_breeds(df)

In [111]:
df = vision_tokens(df)
export_clean_breeds(df, 'multiple')