## GPT VISION API TESTS

After scraping our dog images, it's time to evaluate the performance of OpenAI's Vision API. Our ultimate goal is to build our own breed classifier with TensorFlow and neural networks, but for these initial tests we use OpenAI's services. The first step is to ask the GPT-4 Vision model to suggest likely dog breeds for each image. However, its answers tend to be verbose rather than concise. To address this, we pass each response to a text model that extracts a short list of breeds. We use GPT-4 for this step, because simpler models such as GPT-3.5 were not sufficient in our preliminary trials.

### Libraries

In [19]:
import base64
import requests
import pandas as pd
import time
import re 
from openai import OpenAI

It's now time to put the Vision API to the test using the images from our dataset. To do this, we open our "image_data.csv" file and iterate through each image name. We first create two new columns: one to store the response from the OpenAI API and another to save the tokens used. Afterwards, we explore different methods and parameters (such as the image `detail` level) to determine the most cost-effective alternative for our analysis.

In [2]:
# Open csv with image data
df = pd.read_csv('./output/raw/images/image_data.csv')
# Add columns for vision response and tokens
df['Vision Response'] = None
df['Vision Tokens'] = None
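Since one goal is to find the most cost-effective `detail` setting, a rough estimate of the image tokens per request can help before spending anything. The sketch below assumes the image-token rules that OpenAI's vision guide described for `gpt-4-vision-preview` at the time of writing: a flat 85 tokens for `low`; for `high`, fit the image within 2048×2048, scale the shortest side to 768 px, then charge 170 tokens per 512 px tile plus 85. These rules may change, `auto` lets the API choose and so cannot be estimated, and we additionally assume small images are never upscaled. `estimate_image_tokens` is a hypothetical helper that parses the `Dimensions` strings stored in our CSV, such as `"(1280, 1280)"`.

```python
import ast
import math


def estimate_image_tokens(dimensions: str, detail: str) -> int:
    """Rough image-token estimate for one image (hypothetical helper).

    Assumes OpenAI's published rules for gpt-4-vision-preview; check the
    current documentation before relying on it.
    """
    if detail == "low":
        return 85  # assumed flat cost for low detail
    if detail != "high":
        raise ValueError("only 'low' and 'high' can be estimated; 'auto' is chosen by the API")

    width, height = ast.literal_eval(dimensions)
    # Step 1: fit inside a 2048 x 2048 square (assumed: never upscale)
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Step 2: scale so the shortest side is at most 768 px (assumed: never upscale)
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Step 3: 170 tokens per 512 px tile, plus a fixed 85
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85


print(estimate_image_tokens("(1024, 1024)", "high"))
print(estimate_image_tokens("(1280, 1280)", "low"))
```

Applied to the whole dataset, `df['Dimensions'].apply(estimate_image_tokens, detail='high').sum()` would give an approximate image-token budget for high detail before any request is sent.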

In [3]:
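import os
# API key for OpenAI, read from an environment variable instead of being hard-coded in the notebook
api_key = os.environ["OPENAI_API_KEY"]
# Set up OpenAI client
client = OpenAI(api_key=api_key)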

With reference to the OpenAI documentation (https://platform.openai.com/docs/guides/vision), we developed a function to iterate through the DataFrame's image names. This function utilizes the OpenAI Vision API to generate responses, which are then saved in a new column along with the tokens used.

In [22]:
def vision_request(df, api_key, detail):
  '''
  df: pandas DataFrame with a 'File Name' column and a default RangeIndex (0..n-1)
  api_key: OpenAI API key
  detail: image detail level, "low", "auto" or "high"

  Sends each image listed in df['File Name'] to OpenAI's vision model and stores
  the response text in 'Vision Response' and the token usage in 'Vision Tokens'.
  '''

  # Function to encode an image file as a base64 string
  def encode_image(image_path):
    with open(image_path, "rb") as image_file:
      return base64.b64encode(image_file.read()).decode('utf-8')

  # Headers are identical for every request
  headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
  }

  # Loop through the dataframe and send a request to OpenAI for each image
  for i, file_name in enumerate(df['File Name']):
    # Path to the image and its base64 string
    image_path = './imgs/' + file_name
    base64_image = encode_image(image_path)
    # Payload: the text prompt plus the image as a base64 data URL
    payload = {
      "model": "gpt-4-vision-preview",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Give me three possible breeds the dog is its okeay if you dont know the breed."
            },
            {
              "type": "image_url",
              "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": detail
              }
            }
          ]
        }
      ],
      "max_tokens": 3000
    }
    # Send the request; stop on HTTP errors instead of failing later on a missing 'usage' key
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()
    # Add the token usage and the response text to the dataframe
    df.at[i, 'Vision Tokens'] = data['usage']
    df.at[i, 'Vision Response'] = data['choices'][0]['message']['content']

  return df
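`vision_request` calls the REST endpoint with `requests`, even though an OpenAI `client` was created earlier. As a sketch, assuming the official `openai` Python package (v1 or later) and its `client.chat.completions.create` method, the same message list can be built once and passed to the client. `build_vision_messages` is a hypothetical helper; it only builds the messages, so it can be checked without making an API call.

```python
import base64


def build_vision_messages(image_bytes: bytes, prompt: str, detail: str = "low") -> list:
    """Build a chat message list with one text part and one base64 JPEG image part."""
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        # Data URL, as in vision_request (assumes JPEG input)
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": detail,
                    },
                },
            ],
        }
    ]


# Placeholder bytes for illustration. With a real client the call would be:
# response = client.chat.completions.create(
#     model="gpt-4-vision-preview", messages=messages, max_tokens=3000)
# content = response.choices[0].message.content
# usage = response.usage  # prompt_tokens, completion_tokens, total_tokens
messages = build_vision_messages(b"\xff\xd8fake-jpeg", "Which dog breed is this?", detail="low")
print(messages[0]["content"][1]["image_url"]["detail"])
```

One advantage of the client is that it raises an exception on API errors and exposes the usage counts as attributes, so there is no raw JSON to index into.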

Because there are over 500 images, we split the DataFrame into small groups of approximately 6 images each and save each group's results to its own CSV file. If something goes wrong partway through, only the current group is lost, so we neither lose substantial progress nor pay again for tokens already spent on completed groups.

In [21]:
def df_vision(df, detail, start_group=0):
    # Groups of len(df) // 100 rows each (about 100 groups); at least 1 row per group
    group_size = max(1, len(df) // 100)
    groups = [df.iloc[i:i+group_size].reset_index(drop=True) for i in range(0, len(df), group_size)]

    # start_group lets an interrupted run resume (our run resumed from group 39)
    for i, group in enumerate(groups[start_group:], start=start_group):
        processed_group = vision_request(group, api_key, detail)
        processed_group.to_csv(f"./output/raw/groups/{detail}/group_{i}.csv", index=False)
        time.sleep(10)  # Sleep for 10 seconds to avoid rate limits
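Resuming from a fixed group index means editing the code by hand after each interruption. A sketch of an alternative, assuming the same `group_{i}.csv` naming, is to recompute the group boundaries and skip any group whose output file already exists. `pending_groups` is a hypothetical helper that uses only the standard library.

```python
import os
import tempfile


def pending_groups(n_rows: int, group_size: int, out_dir: str) -> list:
    """Return (group_index, start, stop) for groups whose CSV is not yet in out_dir.

    Boundaries match df.iloc[start:stop] slicing with a fixed group_size.
    """
    group_size = max(1, group_size)  # guard against len(df) // 100 == 0
    pending = []
    for i, start in enumerate(range(0, n_rows, group_size)):
        path = os.path.join(out_dir, f"group_{i}.csv")
        if not os.path.exists(path):
            pending.append((i, start, min(start + group_size, n_rows)))
    return pending


# Example: 10 rows in groups of 4, where group 0 has already been saved
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "group_0.csv"), "w").close()
    remaining = pending_groups(10, 4, tmp)
print(remaining)  # [(1, 4, 8), (2, 8, 10)]
```

In `df_vision`, the loop would then iterate over `pending_groups(len(df), len(df) // 100, f"./output/raw/groups/{detail}")` and slice each group with `df.iloc[start:stop].reset_index(drop=True)`.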


In [6]:
def concatenate_groups(detail, num_groups):
    # Read every group CSV for this detail level and stack them into one DataFrame
    dfs = [pd.read_csv(f"./output/raw/groups/{detail}/group_{i}.csv") for i in range(num_groups)]
    df = pd.concat(dfs).reset_index(drop=True)
    df.to_csv(f'./output/raw/vision/vision_{detail}.csv', index=False)



In [8]:
print(df['Vision Response'][474])

Based on the image provided, it appears to be a fluffy dog with a long coat. Without seeing more of the dog's features, such as the face, it's a bit challenging to identify the breed with certainty. However, considering the coat type and assuming the dog is not a mixed breed or an atypical specimen, here are three possible breed guesses:

1. Old English Sheepdog: Known for their shaggy coat, which is evident in the photo.
2. Bearded Collie: They also have a long, flowing coat that could match what's seen in the image.
3. Polish Lowland Sheepdog: Similar to the Old English Sheepdog with a somewhat less dense coat but still long and shaggy.

Please note that these are just guesses, and without more information, it is difficult to determine the exact breed of the dog in the photo.
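Responses like the one above follow a numbered `1. Breed: explanation` pattern. As a cheaper alternative to a second GPT-4 call (not what this notebook does), a regular expression can pull out the names when that pattern holds and fall back to `"Error"` otherwise. The pattern is an assumption about the response format and will miss answers written as free prose; `extract_numbered_breeds` is a hypothetical helper.

```python
import re

# Matches lines such as "1. Old English Sheepdog: Known for..." (assumed format)
NUMBERED_BREED = re.compile(r"^[ \t]*\d+\.[ \t]*([^:\n]+?)[ \t]*:", re.MULTILINE)


def extract_numbered_breeds(vision_response: str) -> str:
    """Return the numbered breed names as a comma-separated string, or "Error"."""
    # strip("*") also removes markdown bold markers such as **Breed**
    breeds = [name.strip().strip("*").strip() for name in NUMBERED_BREED.findall(vision_response)]
    return ", ".join(breeds) if breeds else "Error"


sample = """Based on the image provided, here are three possible breed guesses:

1. Old English Sheepdog: Known for their shaggy coat, which is evident in the photo.
2. Bearded Collie: They also have a long, flowing coat that could match what's seen in the image.
3. Polish Lowland Sheepdog: Similar to the Old English Sheepdog with a somewhat less dense coat."""
print(extract_numbered_breeds(sample))
```

Responses that do not match could still be sent to GPT-4, which would limit the extra calls to the harder cases.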


In [9]:

def get_dog_breeds(client, vision_response):
    # Ask GPT-4 to reduce a verbose vision response to a comma-separated list of breeds
    prompt = 'give me the dog breeds in order of importance that appear in this text  separated by commas in case of an error write the word "Error"'
    # The prompt is followed directly by the vision response in double quotes
    message_content = f'{prompt}"{vision_response}"'

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": message_content},
        ],
    )

    breeds = response.choices[0].message.content
    total_tokens = response.usage.total_tokens
    prompt_tokens = response.usage.prompt_tokens
    completion_tokens = response.usage.completion_tokens

    return breeds, total_tokens, prompt_tokens, completion_tokens


In [None]:
for i, vision_response in enumerate(df['Vision Response']):
    breeds, total_tokens, prompt_tokens, completion_tokens = get_dog_breeds(client, vision_response)
    print(breeds, total_tokens, prompt_tokens, completion_tokens)
    # Store the extracted breeds and the GPT-4 text ("T") token usage
    df.at[i, 'Breeds'] = breeds
    df.at[i, 'T Tokens Prompt'] = prompt_tokens
    df.at[i, 'T Tokens Completition'] = completion_tokens
    df.at[i, 'T Tokens Total'] = total_tokens
    time.sleep(2)  # Short pause between requests to avoid rate limits

df = df.drop(columns=['Vision Response'])

In [11]:
df.head()

| Unnamed: 0 | File Name | Dimensions | File Size (MB) | Aspect Ratio | Vision Tokens | Breeds | T Tokens Prompt | T Tokens Completition | T Tokens Total |
|---|---|---|---|---|---|---|---|---|---|
| 0 | @Chompersthecorgi1.jpg | (1280, 1280) | 0.10714 | 1.0 | {'prompt_tokens': 111, 'completion_tokens': 21... | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sh... | 248 | 20 | 268 |
| 1 | @Chompersthecorgi10.jpg | (2500, 1667) | 0.584455 | 1.4997 | {'prompt_tokens': 111, 'completion_tokens': 13... | Pembroke Welsh Corgi, Cardigan Welsh Corgi | 175 | 14 | 189 |
| 2 | @Chompersthecorgi11.jpg | (259, 194) | 0.004376 | 1.335052 | {'prompt_tokens': 111, 'completion_tokens': 21... | Error | 57 | 1 | 58 |
| 3 | @Chompersthecorgi12.jpg | (225, 225) | 0.01402 | 1.0 | {'prompt_tokens': 111, 'completion_tokens': 71... | Pembroke Welsh Corgi | 107 | 7 | 114 |
| 4 | @Chompersthecorgi13.jpg | (1200, 800) | 0.043723 | 1.5 | {'prompt_tokens': 111, 'completion_tokens': 23... | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sw... | 271 | 19 | 290 |


In [12]:
print(df['Vision Tokens'][67])

{'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163}


In [13]:
df['V Tokens Prompt'] = None
df['V Tokens Completition'] = None
df['V Tokens Total'] = None

In [14]:
for i, token_string in enumerate(df['Vision Tokens']):
    # Define regular expression pattern to match token counts
    pattern = r"'(\w+)': (\d+)"

    # Find all matches using the pattern
    matches = re.findall(pattern, token_string)

    # Create a dictionary to store token counts
    token_counts = {key: int(value) for key, value in matches}

    df.at[i, 'V Tokens Prompt'] = token_counts['prompt_tokens']
    df.at[i, 'V Tokens Completition'] = token_counts['completion_tokens']
    df.at[i, 'V Tokens Total'] = token_counts['total_tokens']
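After the round trip through CSV, the token counts are the string form of a Python dict, for example `{'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163}`. Instead of a regular expression, the standard library's `ast.literal_eval` can safely parse that string back into a dict. A minimal sketch, where `parse_token_counts` is a hypothetical name:

```python
import ast


def parse_token_counts(token_string: str) -> tuple:
    """Turn "{'prompt_tokens': ..., ...}" into (prompt, completion, total) integers."""
    counts = ast.literal_eval(token_string)  # evaluates Python literals only, never arbitrary code
    return counts["prompt_tokens"], counts["completion_tokens"], counts["total_tokens"]


print(parse_token_counts("{'prompt_tokens': 111, 'completion_tokens': 52, 'total_tokens': 163}"))
```

Inside the loop above, `token_counts = ast.literal_eval(token_string)` could replace both the `re.findall` call and the dictionary comprehension.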

In [15]:
df.drop(columns=['Vision Tokens'], inplace=True)
df.head()

| Unnamed: 0 | File Name | Dimensions | File Size (MB) | Aspect Ratio | Breeds | T Tokens Prompt | T Tokens Completition | T Tokens Total | V Tokens Prompt | V Tokens Completition | V Tokens Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | @Chompersthecorgi1.jpg | (1280, 1280) | 0.10714 | 1.0 | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sh... | 248 | 20 | 268 | 111 | 212 | 323 |
| 1 | @Chompersthecorgi10.jpg | (2500, 1667) | 0.584455 | 1.4997 | Pembroke Welsh Corgi, Cardigan Welsh Corgi | 175 | 14 | 189 | 111 | 139 | 250 |
| 2 | @Chompersthecorgi11.jpg | (259, 194) | 0.004376 | 1.335052 | Error | 57 | 1 | 58 | 111 | 21 | 132 |
| 3 | @Chompersthecorgi12.jpg | (225, 225) | 0.01402 | 1.0 | Pembroke Welsh Corgi | 107 | 7 | 114 | 111 | 71 | 182 |
| 4 | @Chompersthecorgi13.jpg | (1200, 800) | 0.043723 | 1.5 | Pembroke Welsh Corgi, Cardigan Welsh Corgi, Sw... | 271 | 19 | 290 | 111 | 235 | 346 |


In [16]:
df.to_csv('./output/clean/breeds/breeds_low.csv', index=False)

In [18]:
df_auto = pd.read_csv("output/clean/breeds/breeds_auto.csv")
df_low = pd.read_csv("output/clean/breeds/breeds_low.csv")

error_count_auto = (df_auto['Breeds'] == 'Error').sum()
error_count_low = (df_low['Breeds'] == 'Error').sum()

print(f"Error count for auto: {error_count_auto}")
print(f"Error count for low: {error_count_low}")

Error count for auto: 154
Error count for low: 192
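Error counts alone do not capture cost. A sketch for comparing the two `detail` settings side by side is to report the error rate together with the average total tokens spent per successful classification. It assumes both CSVs contain the `Breeds`, `V Tokens Total` and `T Tokens Total` columns shown above (confirmed here only for the low-detail file). `summarize_mode` is a hypothetical helper that accepts any iterables, so pandas columns can be passed directly.

```python
def summarize_mode(breeds, vision_tokens, text_tokens) -> dict:
    """Error rate and tokens per successful (non-"Error") classification."""
    breeds = list(breeds)
    total_tokens = sum(vision_tokens) + sum(text_tokens)  # all tokens spent, failed rows included
    errors = sum(1 for b in breeds if b == "Error")
    successes = len(breeds) - errors
    return {
        "rows": len(breeds),
        "errors": errors,
        "error_rate": errors / len(breeds) if breeds else 0.0,
        "tokens_per_success": total_tokens / successes if successes else float("inf"),
    }


# Toy input taken from three of the df.head() rows above (not the full dataset)
print(summarize_mode(
    ["Pembroke Welsh Corgi, Cardigan Welsh Corgi", "Error", "Pembroke Welsh Corgi"],
    [250, 132, 182],
    [189, 58, 114],
))
```

For the real comparison, the calls would be `summarize_mode(df_low['Breeds'], df_low['V Tokens Total'], df_low['T Tokens Total'])` and the same with `df_auto`.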


In [None]:
def vision_request_low(df, api_key):
  '''
  Same as vision_request(), but with the image detail fixed to "low"
  and each response printed as soon as it arrives.
  '''
  # Function to encode an image file as a base64 string
  def encode_image(image_path):
    with open(image_path, "rb") as image_file:
      return base64.b64encode(image_file.read()).decode('utf-8')

  # Headers are identical for every request
  headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
  }

  for i, file_name in enumerate(df['File Name']):
    # Path to the image and its base64 string
    image_path = './imgs/' + file_name
    base64_image = encode_image(image_path)

    payload = {
      "model": "gpt-4-vision-preview",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "Give me three possible breeds the dog is its okeay if you dont know the breed."
            },
            {
              "type": "image_url",
              "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low"
              }
            }
          ]
        }
      ],
      "max_tokens": 3000
    }

    # Send the request and stop on HTTP errors
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    response.raise_for_status()
    data = response.json()
    # Store the token usage and the response text, then print the response
    df.at[i, 'Vision Tokens'] = data['usage']
    content = data['choices'][0]['message']['content']
    df.at[i, 'Vision Response'] = content
    print(content)

  return df