# Using GPT vision to label taxon colors

# README:

This is the *literal* notebook I used to label taxon colors, so it is messy and long! It dates from early in the usability of the OpenAI API, when there were tight rate limits and frequent query timeouts. In this notebook you can see that I submit small batches of taxa at a time. The batches are often cut off by errors, after which I loop back and submit the remaining taxa for each batch. My error handling improves throughout the notebook as I encountered different kinds of issues that would occasionally cut off the batches early.

At the VERY end of this notebook, I go back and read in all of the csv files I've written out from the small labeling batches, and I write out one combined csv of taxon names matched to categorical colors: `../data/FULL_gpt_labeled_taxon.csv`. In the next notebook in this repo, I merge this taxon-specific color information with the iNaturalist exported observations from notebook 1 to create our final dataset.
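The combining step described above could look roughly like the following. This is a minimal sketch, not the code used at the end of this notebook: the helper names `combine_batches` and `write_combined` are hypothetical, the glob pattern is up to the caller, and letting later files override earlier ones for a repeated taxon is an arbitrary choice. It uses only the standard library; with pandas the same idea is a `pd.concat` followed by `drop_duplicates`.

```python
import csv
import glob


def combine_batches(pattern):
    """Merge all batch CSVs matching `pattern` into one list of row dicts.

    Returns (rows, fieldnames). When the same binomial appears in several
    files, the row from the last file read wins (an assumption of this sketch).
    """
    by_taxon = {}    # binomial -> row dict; dicts preserve insertion order
    fieldnames = []  # union of all columns seen, in first-seen order
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                by_taxon[row["binomial"]] = row
    return list(by_taxon.values()), fieldnames


def write_combined(rows, fieldnames, out_path):
    """Write the merged rows; columns missing from a row are left empty."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Taking the union of columns matters here because the batch files in this notebook do not all share the same header: some store the third answer as `conf` and others as `subjectivity`.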

### Imports

In [1]:
import pandas as pd
import numpy as np
from openai import OpenAI
import time

client = OpenAI(
    api_key='your-key-here'
)

In [35]:
inat_data = pd.read_csv('../data/combined_raw_inaturalist_export.csv')


# Filtering the dataset

### Hybrids

In [36]:
# screen out all hybrid names (with the 'x' character)
hybrid_mask = ~np.array(['x' in str(i).split() for i in inat_data.scientific_name])
print(np.sum(~hybrid_mask))
inat_data = inat_data[hybrid_mask]

40


In [37]:
# there is a special character for x that we also have to screen out!
hybrid_mask = ~np.array(['×' in str(i).split() for i in inat_data.scientific_name])
print(np.sum(~hybrid_mask))
inat_data = inat_data[hybrid_mask]

2992
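The two hybrid filters above apply the same rule with two different characters, so they can be folded into one check. A minimal sketch, assuming (as the cells above do) that a hybrid is any name containing a standalone `x` or `×` token; the helper name `is_hybrid` and the example names are illustrative only:

```python
HYBRID_MARKERS = {"x", "×"}  # ASCII letter x and the multiplication sign (U+00D7)


def is_hybrid(scientific_name):
    """Return True if any whitespace-separated token is a hybrid marker."""
    return any(token in HYBRID_MARKERS for token in str(scientific_name).split())


names = ["Trillium erectum", "Mentha × piperita", "Platanus x hispanica", "Oxalis"]
kept = [name for name in names if not is_hybrid(name)]
print(kept)
```

On the DataFrame this would be applied as `inat_data[~inat_data.scientific_name.map(is_hybrid)]`. Like the original cells, it only catches markers that stand alone as a token, so a marker attached directly to a name would slip through.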


### Single words

In [38]:
# screen out all scientific names that are one word
single_names_mask = ~np.array([len(str(i).split())==1 for i in inat_data.scientific_name])
print(np.sum(~single_names_mask))
inat_data = inat_data[single_names_mask]

2805


### Add binomial name column to ignore subspecific ID

In [39]:
inat_data['binomial'] = [' '.join(str(i).split()[:2]) for i in inat_data.scientific_name]

# Get the unique species from the DataFrame based on the `binomial` column

In [43]:
unique_species = np.unique(inat_data.binomial)
len(unique_species)

13378

# Example of getting a photo for a taxon

In [2]:
import pyinaturalist

In [3]:
res = pyinaturalist.get_taxa('Monarda fistulosa')

In [4]:
total_results = res['total_results']
page = res['page']
per_page = res['per_page']
results = res['results']

photo = results[0]['default_photo']['medium_url']
photo

'https://inaturalist-open-data.s3.amazonaws.com/photos/47763/medium.jpg'

# Get the flower colors for the 90 most frequent species

## Get the species list

In [67]:
unique_species, counts = np.unique(inat_data.binomial, return_counts=True)

In [72]:
sorted_unique_species = unique_species[np.argsort(counts)[::-1]]

## Get the URL list using the iNaturalist API

In [73]:
urls = []
for taxon in sorted_unique_species[:90]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)
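The loop above indexes straight into `results[0]['default_photo']['medium_url']`, which raises if a lookup returns no results or a taxon has no default photo. A more defensive lookup might look like this. It is only a sketch: the helper name `extract_photo_url` is hypothetical, it works on the response dictionary alone, and whether `default_photo` can actually be missing or null for some taxa is an assumption rather than something observed in this notebook.

```python
def extract_photo_url(response):
    """Return the first result's medium photo URL, or None if unavailable."""
    results = response.get("results") or []
    if not results:
        return None
    # `or {}` covers both a missing key and an explicit null value.
    photo = results[0].get("default_photo") or {}
    return photo.get("medium_url")


# Hand-written stand-ins for API responses, for illustration only.
with_photo = {"results": [{"default_photo": {"medium_url": "https://example.org/medium.jpg"}}]}
without_photo = {"results": [{"default_photo": None}]}
no_results = {"results": []}

print(extract_photo_url(with_photo))
print(extract_photo_url(without_photo))
print(extract_photo_url(no_results))
```

Inside the loop, `urls.append(extract_photo_url(res))` would then keep `urls` the same length as the species list, with `None` marking taxa that need a second look.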

In [76]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[:90],urls],index=['binomial','photo_url']).T

In [89]:
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Trillium grandiflorum,https://inaturalist-open-data.s3.amazonaws.com...
1,Dipterostemon capitatus,https://inaturalist-open-data.s3.amazonaws.com...
2,Trillium erectum,https://inaturalist-open-data.s3.amazonaws.com...
3,Sanguinaria canadensis,https://inaturalist-open-data.s3.amazonaws.com...
4,Trillium ovatum,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
85,Lysimachia ciliata,https://inaturalist-open-data.s3.amazonaws.com...
86,Adelinia grande,https://inaturalist-open-data.s3.amazonaws.com...
87,Sambucus canadensis,https://inaturalist-open-data.s3.amazonaws.com...
88,Malacothamnus fasciculatus,https://static.inaturalist.org/photos/19155029...


## Draft the GPT query

1) Specific formatting -- including color categories
2) Only if there is an obvious flower
3) Encourage it to be conservative

"Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicates your confindence level in your assessment -- it should either be LOW, MEDIUM, or HIGH. Please be conservative here as well."

## Run it!

In [97]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicates your confindence level in your assessment -- it should either be LOW, MEDIUM, or HIGH. Please be conservative here as well."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%5:
        print(idx)

0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
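The loop above unpacks each reply with a bare `split()`, which raises a `ValueError` whenever the model returns anything other than exactly three words. A stricter parser could validate the reply against the categories named in the prompt. This is a sketch under assumptions: the helper name `parse_reply` is hypothetical, and upper-casing the words and stripping stray punctuation are design choices of this example, not something the notebook does.

```python
ALLOWED_COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE",
                  "RED", "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}
ALLOWED_LEVELS = {"LOW", "MEDIUM", "HIGH"}


def parse_reply(text):
    """Parse a three-word reply into (flower_present, color, level).

    Returns (None, None, None) when the reply does not follow the format.
    """
    words = [word.strip(".,'\"").upper() for word in str(text).split()]
    if len(words) != 3:
        return (None, None, None)
    flower_present, color, level = words
    if (flower_present not in {"YES", "NO"}
            or color not in ALLOWED_COLORS
            or level not in ALLOWED_LEVELS):
        return (None, None, None)
    return (flower_present, color, level)


print(parse_reply("YES\nWHITE\nHIGH"))
print(parse_reply("I'm sorry, I can't help with that."))
```

Returning a tuple of `None` values instead of raising keeps the three result lists the same length as the DataFrame, so malformed replies can be found and resubmitted afterwards.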


In [99]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['conf'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [114]:
inat_taxon_df.to_csv('../data/gpt_labeled_taxon_photos_0_to_90.csv',index=False)

### Looking back... the 90 (+3 tests) requests cost $0.41.

0.41 / 90 = X / 1000  

X = $4.56 per 1000 requests  

$60.94 for the full 13378 species  
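The same proportion can be checked in a few lines. A small sketch using only the figures quoted above ($0.41, 90 requests, 13378 species); the function name `extrapolate_cost` is made up for this example, and it assumes cost scales linearly with the number of requests:

```python
def extrapolate_cost(observed_cost, observed_requests, target_requests):
    """Scale an observed cost linearly to a different number of requests."""
    return observed_cost / observed_requests * target_requests


per_thousand = extrapolate_cost(0.41, 90, 1000)
full_run = extrapolate_cost(0.41, 90, 13378)
print(f"${per_thousand:.2f} per 1000 requests")
print(f"${full_run:.2f} for the full 13378 species")
```

Dividing by 93 instead of 90, to count the three test requests, gives a slightly lower estimate.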

### 90:150

In [116]:
urls = []
# change the range here
startidx = 90
stopidx = 150
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [128]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Cardamine concatenata,https://inaturalist-open-data.s3.amazonaws.com...
1,Maianthemum racemosum,https://static.inaturalist.org/photos/39168234...
2,Monarda fistulosa,https://inaturalist-open-data.s3.amazonaws.com...
3,Claytonia caroliniana,https://static.inaturalist.org/photos/3566409/...
4,Nicotiana glauca,https://inaturalist-open-data.s3.amazonaws.com...
5,Lupinus arizonicus,https://inaturalist-open-data.s3.amazonaws.com...
6,Securigera varia,https://inaturalist-open-data.s3.amazonaws.com...
7,Opuntia basilaris,https://inaturalist-open-data.s3.amazonaws.com...
8,Aquilegia formosa,https://inaturalist-open-data.s3.amazonaws.com...
9,Lobelia siphilitica,https://inaturalist-open-data.s3.amazonaws.com...


In [131]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicates your confindence level in your assessment -- it should either be LOW, MEDIUM, or HIGH. Please be conservative here as well."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%10:
        print(idx)

0
10
20
30
40
50


In [132]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['conf'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [133]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

### Cost=$0.27

After 90+3+60=153 requests, the total cost is $0.68. Therefore each request costs 0.68/153 dollars.

In [136]:
0.68 / 153

0.0044444444444444444

Each request costs about 0.4444 cents ($0.004444).

# 150:250

In [137]:
urls = []
# change the range here
startidx = 150
stopidx = 250
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [138]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Nymphaea odorata,https://inaturalist-open-data.s3.amazonaws.com...
1,Medicago lupulina,https://inaturalist-open-data.s3.amazonaws.com...
2,Mitchella repens,https://inaturalist-open-data.s3.amazonaws.com...
3,Condea emoryi,https://inaturalist-open-data.s3.amazonaws.com...
4,Ferocactus cylindraceus,https://static.inaturalist.org/photos/2934370/...
...,...,...
95,Elaeagnus umbellata,https://static.inaturalist.org/photos/98619666...
96,Silybum marianum,https://static.inaturalist.org/photos/6078409/...
97,Melilotus officinalis,https://inaturalist-open-data.s3.amazonaws.com...
98,Lysimachia latifolia,https://inaturalist-open-data.s3.amazonaws.com...


In [139]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicates your confindence level in your assessment -- it should either be LOW, MEDIUM, or HIGH. Please be conservative here as well."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%10:
        print(idx)

0
10
20
30
40
50
60
70
80
90


In [140]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['conf'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [141]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,conf,gpt_color
0,Nymphaea odorata,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,WHITE
1,Medicago lupulina,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,YELLOW
2,Mitchella repens,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,WHITE
3,Condea emoryi,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,PURPLE
4,Ferocactus cylindraceus,https://static.inaturalist.org/photos/2934370/...,NO,HIGH,NAN
...,...,...,...,...,...
95,Elaeagnus umbellata,https://static.inaturalist.org/photos/98619666...,NO,HIGH,NAN
96,Silybum marianum,https://static.inaturalist.org/photos/6078409/...,YES,HIGH,PURPLE
97,Melilotus officinalis,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,WHITE
98,Lysimachia latifolia,https://inaturalist-open-data.s3.amazonaws.com...,YES,HIGH,WHITE


In [143]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# Testing a "subjectivity" label

In [144]:
urls = []
# change the range here
startidx = 150
stopidx = 200
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [145]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Nymphaea odorata,https://inaturalist-open-data.s3.amazonaws.com...
1,Medicago lupulina,https://inaturalist-open-data.s3.amazonaws.com...
2,Mitchella repens,https://inaturalist-open-data.s3.amazonaws.com...
3,Condea emoryi,https://inaturalist-open-data.s3.amazonaws.com...
4,Ferocactus cylindraceus,https://static.inaturalist.org/photos/2934370/...
5,Rosa multiflora,https://inaturalist-open-data.s3.amazonaws.com...
6,Cornus canadensis,https://inaturalist-open-data.s3.amazonaws.com...
7,Eriogonum fasciculatum,https://inaturalist-open-data.s3.amazonaws.com...
8,Epilobium canum,https://inaturalist-open-data.s3.amazonaws.com...
9,Monotropa hypopitys,https://inaturalist-open-data.s3.amazonaws.com...


In [146]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicate your assessment of the subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that the choice of color assignment seems highly subjective (e.g. if there are multiple colors to choose from, or the color seems intermediate between multiple categories). Please be conservative here as well."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%10:
        print(idx)

0
10
20
30
40


In [152]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['conf'] = conf_list  # NOTE: with the subjectivity prompt above, this column holds subjectivity ratings, not confidence
inat_taxon_df['gpt_color'] = color_list

In [153]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}_subjective.csv'
inat_taxon_df.to_csv(filename,index=False)

# 250:350

In [154]:
urls = []
# change the range here
startidx = 250
stopidx = 350
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [155]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Tellima grandiflora,https://inaturalist-open-data.s3.amazonaws.com...
1,Convolvulus equitans,https://inaturalist-open-data.s3.amazonaws.com...
2,Micranthes virginiensis,https://inaturalist-open-data.s3.amazonaws.com...
3,Stylophorum diphyllum,https://inaturalist-open-data.s3.amazonaws.com...
4,Zephyranthes chlorosolen,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
95,Euphorbia fendleri,https://inaturalist-open-data.s3.amazonaws.com...
96,Tragopogon dubius,https://inaturalist-open-data.s3.amazonaws.com...
97,Geranium carolinianum,https://inaturalist-open-data.s3.amazonaws.com...
98,Eriophyllum wallacei,https://inaturalist-open-data.s3.amazonaws.com...


In [156]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: three words separated onto three lines (one word per line). The first line should indicate 'YES' or 'NO' to answer whether there is a flower present. The second line should be one word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The flowers might not match these categories perfectly. Do the best you can. If in doubt, please be conservative and choose 'unknown'. The third line should indicate your assessment of the subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that the choice of color assignment seems highly subjective (e.g. if there are multiple colors to choose from, or the color seems intermediate between multiple categories). Please be conservative here as well."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%10:
        print(idx)

0
10
20
30
40
50
60
70
80
90


In [157]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['conf'] = conf_list  # NOTE: with the subjectivity prompt above, this column holds subjectivity ratings, not confidence
inat_taxon_df['gpt_color'] = color_list

In [158]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,conf,gpt_color
0,Tellima grandiflora,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Convolvulus equitans,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
2,Micranthes virginiensis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
3,Stylophorum diphyllum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
4,Zephyranthes chlorosolen,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
...,...,...,...,...,...
95,Euphorbia fendleri,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,GREEN
96,Tragopogon dubius,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
97,Geranium carolinianum,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,PURPLE
98,Eriophyllum wallacei,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW


In [159]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# Rate limit increased! Also I refined the question.

# 0:1000

In [160]:
# change the range here
startidx = 0
stopidx = 1000

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [161]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Trillium grandiflorum,https://inaturalist-open-data.s3.amazonaws.com...
1,Dipterostemon capitatus,https://inaturalist-open-data.s3.amazonaws.com...
2,Trillium erectum,https://inaturalist-open-data.s3.amazonaws.com...
3,Sanguinaria canadensis,https://inaturalist-open-data.s3.amazonaws.com...
4,Trillium ovatum,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Sonchus oleraceus,https://static.inaturalist.org/photos/28460419...
996,Ambrosia trifida,https://static.inaturalist.org/photos/51404370...
997,Gaultheria procumbens,https://inaturalist-open-data.s3.amazonaws.com...
998,Lathyrus tuberosus,https://inaturalist-open-data.s3.amazonaws.com...


In [163]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450


RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-vision-preview in organization org-c1K04m6D4JxMhPmM39kU10Wc on tokens per min (TPM): Limit 10000, Used 9476, Requested 532. Please try again in 48ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
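A tokens-per-minute error like the one above asks for only a short wait, so one way to avoid restarting by hand is to retry the failed call after a pause. The following is a minimal sketch, not what this notebook does: `call_with_retries` and its parameters are hypothetical names, and the exception type to retry on is passed in by the caller.

```python
import time


def call_with_retries(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The final failure is re-raised so the caller can see where it stopped.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In this notebook, `fn` would be a small function wrapping the `client.chat.completions.create(...)` call for one photo, and `retry_on` would presumably be `openai.RateLimitError`. Short pauses only help with per-minute limits; a requests-per-day limit still means waiting much longer or stopping.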

In [164]:
idx

475

In [165]:
for idx in range(475,len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

475


RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-vision-preview in organization org-c1K04m6D4JxMhPmM39kU10Wc on requests per day (RPD): Limit 500, Used 500, Requested 1. Please try again in 2m52.8s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}

In [166]:
idx

477

In [169]:
inat_taxon_df = inat_taxon_df.iloc[:477].copy()  # copy so the column assignments below don't trigger SettingWithCopyWarning

In [170]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list


In [171]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Trillium grandiflorum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Dipterostemon capitatus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
2,Trillium erectum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
3,Sanguinaria canadensis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
4,Trillium ovatum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
...,...,...,...,...,...
472,Heterotheca grandiflora,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
473,Houstonia pusilla,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BLUE
474,Hemizonia congesta,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
475,Gelsemium sempervirens,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW


In [174]:
inat_taxon_df.photo_url.iloc[-4]

'https://inaturalist-open-data.s3.amazonaws.com/photos/2964293/medium.jpg'

In [175]:
filename = f'../data/gpt_labeled_taxon_photos_0_to_477.csv'
inat_taxon_df.to_csv(filename,index=False)
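Up to this point the labels live in Python lists and are only written to disk once a batch finishes or is trimmed by hand. An alternative is to append each labeled taxon to a checkpoint file as soon as it arrives, then skip anything already recorded when the loop is restarted. The sketch below illustrates that idea under assumptions: the helper names and the `FIELDS` column order are hypothetical (the columns mirror the DataFrame above), and it uses the standard `csv` module rather than pandas.

```python
import csv
import os

FIELDS = ["binomial", "photo_url", "flower_present", "subjectivity", "gpt_color"]


def rows_already_done(path):
    """Return the set of binomials already written to the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["binomial"] for row in csv.DictReader(handle)}


def append_row(path, row):
    """Append one labeled taxon, writing the header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

A labeling loop would call `rows_already_done(path)` once at the start, skip any taxon in that set, and call `append_row` after every successful reply, so an error costs at most the request in flight.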

# Next time will do more. :(

# Rate limit increased with upgrade to usage tier 2!

# 477 to 1400

In [176]:
# change the range here
startidx = 477
stopidx = 1400

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [177]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Verbascum blattaria,https://static.inaturalist.org/photos/21060467...
1,Eustoma exaltatum,https://inaturalist-open-data.s3.amazonaws.com...
2,Asclepias fascicularis,https://inaturalist-open-data.s3.amazonaws.com...
3,Allium acuminatum,https://inaturalist-open-data.s3.amazonaws.com...
4,Rudbeckia laciniata,https://static.inaturalist.org/photos/2320569/...
...,...,...
918,Laportea canadensis,https://inaturalist-open-data.s3.amazonaws.com...
919,Trillium stamineum,https://inaturalist-open-data.s3.amazonaws.com...
920,Fragaria chiloensis,https://static.inaturalist.org/photos/12686733...
921,Cardamine impatiens,https://inaturalist-open-data.s3.amazonaws.com...


In [178]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
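    # expect exactly three whitespace-separated words: flower present, color, subjectivity
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)  # pause between OpenAI API calls
    # print progress every 25 taxa
    if not idx%25:
        print(idx)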

0
25
50


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}
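
One way to keep a batch running past a bad image is to catch the exception per item, record the index, and move on. The sketch below is a hypothetical helper, not what this notebook ran: `label_all` and `classify` are names introduced for illustration, and `classify` stands in for the `client.chat.completions.create(...)` call plus the parsing step. It catches `Exception` broadly for brevity; in practice you would probably narrow that to the API's error classes, such as the `BadRequestError` shown above.

```python
def label_all(urls, classify, start=0):
    """Call classify(url) for each URL from `start` on, without stopping on errors.

    Returns (labels, failures): labels maps index -> result,
    failures maps index -> error message for items that raised.
    """
    labels = {}
    failures = {}
    for idx in range(start, len(urls)):
        try:
            labels[idx] = classify(urls[idx])
        except Exception as err:  # broad on purpose in this sketch
            failures[idx] = str(err)
    return labels, failures


# toy stand-in for the real API call: fails on one "bad" URL
def fake_classify(url):
    if url.endswith("bad.jpg"):
        raise ValueError("Invalid image.")
    return ("YES", "WHITE", "LOW")


demo_urls = ["a.jpg", "bad.jpg", "c.jpg"]
labels, failures = label_all(demo_urls, fake_classify)
print(sorted(labels), failures)
```

Keying results by index keeps them aligned with the dataframe rows even when some items fail, whereas appending to parallel lists only lines up if every row succeeds.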

In [182]:
idx

[1;36m68[0m

In [183]:
for idx in range(68,len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%25:
        print(idx)

75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [184]:
idx

[1;36m883[0m

In [185]:
inat_taxon_df.photo_url.iloc[idx]

'https://inaturalist-open-data.s3.amazonaws.com/photos/4093070/medium.JPG'

In [186]:
for idx in range(883,len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.5)
    if not idx%25:
        print(idx)

900


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [187]:
idx

[1;36m910[0m

In [188]:
for idx in range(910,len(inat_taxon_df.photo_url)):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

In [189]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [192]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)
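
Every labeling cell in this notebook repeats the same request body, so a small helper could build it once. This is a sketch, not the notebook's code: `build_messages` and `PROMPT` are hypothetical names, and the prompt text here is an abbreviated placeholder rather than the exact wording used in the cells.

```python
# abbreviated placeholder; substitute the full prompt used in the labeling cells
PROMPT = (
    "Please adhere to very specific formatting in your response: "
    "three words separated onto three lines (one word per line). ..."
)


def build_messages(photo_url, prompt=PROMPT):
    """Return a chat `messages` list with one text part and one image-URL part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": photo_url}},
            ],
        }
    ]


messages = build_messages("https://example.org/photo.jpg")
print(messages[0]["content"][1]["image_url"]["url"])
```

The result would be passed as `messages=build_messages(url)` in the existing `client.chat.completions.create(...)` call, which would shrink each retry cell to a few lines.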

# 1400 to 2300

In [217]:
# change the range here
startidx = 1400
stopidx = 2300

# look up each taxon on iNaturalist and keep the medium-size URL of its default photo
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    results = res['results']

    # the first search result is taken as the match for this binomial
    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)  # pause between iNaturalist API calls

In [218]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Allium schoenoprasum,https://inaturalist-open-data.s3.amazonaws.com...
1,Geum rossii,https://inaturalist-open-data.s3.amazonaws.com...
2,Leptosiphon parviflorus,https://static.inaturalist.org/photos/34493368...
3,Asclepias erosa,https://inaturalist-open-data.s3.amazonaws.com...
4,Physocarpus malvaceus,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
895,Erythranthe bicolor,https://inaturalist-open-data.s3.amazonaws.com...
896,Rubus flagellaris,https://inaturalist-open-data.s3.amazonaws.com...
897,Astragalus purshii,https://inaturalist-open-data.s3.amazonaws.com...
898,Pluchea camphorata,https://inaturalist-open-data.s3.amazonaws.com...


In [219]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875


In [220]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [221]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Allium schoenoprasum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
1,Geum rossii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
2,Leptosiphon parviflorus,https://static.inaturalist.org/photos/34493368...,YES,LOW,YELLOW
3,Asclepias erosa,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
4,Physocarpus malvaceus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
...,...,...,...,...,...
895,Erythranthe bicolor,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
896,Rubus flagellaris,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
897,Astragalus purshii,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,GREEN
898,Pluchea camphorata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK


In [222]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 2300 to 3200

In [223]:
# change the range here
startidx = 2300
stopidx = 3200

# look up each taxon on iNaturalist and keep the medium-size URL of its default photo
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    results = res['results']

    # the first search result is taken as the match for this binomial
    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)  # pause between iNaturalist API calls

In [224]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Scandix pecten-veneris,https://inaturalist-open-data.s3.amazonaws.com...
1,Salsola australis,https://inaturalist-open-data.s3.amazonaws.com...
2,Callirhoe alcaeoides,https://inaturalist-open-data.s3.amazonaws.com...
3,Polygonum aviculare,https://inaturalist-open-data.s3.amazonaws.com...
4,Salix discolor,https://static.inaturalist.org/photos/92671962...
...,...,...
895,Euphorbia angusta,https://inaturalist-open-data.s3.amazonaws.com...
896,Hylodesmum pauciflorum,https://inaturalist-open-data.s3.amazonaws.com...
897,Aloysia wrightii,https://inaturalist-open-data.s3.amazonaws.com...
898,Rubus pensilvanicus,https://inaturalist-open-data.s3.amazonaws.com...


In [225]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [226]:
idx

[1;36m679[0m

In [227]:
for idx in range(679,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

700
725
750
775
800
825
850
875


In [228]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [229]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Scandix pecten-veneris,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Salsola australis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
2,Callirhoe alcaeoides,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
3,Polygonum aviculare,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
4,Salix discolor,https://static.inaturalist.org/photos/92671962...,NO,LOW,NAN
...,...,...,...,...,...
895,Euphorbia angusta,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
896,Hylodesmum pauciflorum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
897,Aloysia wrightii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
898,Rubus pensilvanicus,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN


In [230]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 3200 to 4100

In [231]:
# change the range here
startidx = 3200
stopidx = 4100

# look up each taxon on iNaturalist and keep the medium-size URL of its default photo
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    results = res['results']

    # the first search result is taken as the match for this binomial
    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)  # pause between iNaturalist API calls

In [232]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Lathyrus palustris,https://inaturalist-open-data.s3.amazonaws.com...
1,Glandularia gooddingii,https://inaturalist-open-data.s3.amazonaws.com...
2,Gratiola hispida,https://inaturalist-open-data.s3.amazonaws.com...
3,Ribes oxyacanthoides,https://inaturalist-open-data.s3.amazonaws.com...
4,Ribes nevadense,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
895,Acmispon wrangelianus,https://inaturalist-open-data.s3.amazonaws.com...
896,Brunnichia ovata,https://inaturalist-open-data.s3.amazonaws.com...
897,Lobelia appendiculata,https://static.inaturalist.org/photos/9920111/...
898,Hypericum brachyphyllum,https://inaturalist-open-data.s3.amazonaws.com...


In [233]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [234]:
idx

[1;36m36[0m

In [235]:
for idx in range(36,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

50
75
100
125
150
175
200


ValueError: too many values to unpack (expected 3)
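
This `ValueError` comes from unpacking `.split()` into exactly three names, so any reply that is not exactly three words breaks the loop. A more tolerant parser could validate the reply and fall back to placeholder values instead of raising. The sketch below is an assumption about how one might do that: `parse_reply`, the `FALLBACK` tuple, and the normalization to upper case are illustrative choices, not the notebook's behavior.

```python
COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE", "RED",
          "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}
LEVELS = {"LOW", "MEDIUM", "HIGH"}
FALLBACK = ("UNPARSED", "UNKNOWN", "HIGH")  # hypothetical placeholder row


def parse_reply(text):
    """Split a model reply into (flower_present, color, subjectivity).

    Returns FALLBACK when the reply is not exactly three valid words.
    """
    # drop stray punctuation or quotes around each word and normalize case
    words = [w.strip(".,'\"").upper() for w in text.split()]
    if len(words) != 3:
        return FALLBACK
    flower_present, color, subjectivity = words
    if flower_present not in {"YES", "NO"}:
        return FALLBACK
    if color not in COLORS or subjectivity not in LEVELS:
        return FALLBACK
    return flower_present, color, subjectivity


print(parse_reply("YES\nPURPLE\nLOW"))
print(parse_reply("I'm sorry, I can't help with that."))
```

Rows that come back as `FALLBACK` can be filtered out afterwards and reviewed by hand, which avoids restarting the loop from the failing index.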

In [236]:
idx

[1;36m221[0m

In [238]:
inat_taxon_df.photo_url.iloc[221]

'https://inaturalist-open-data.s3.amazonaws.com/photos/6916476/medium.jpg'

In [240]:
for idx in range(221,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875


In [241]:
idx

[1;36m899[0m

In [242]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [243]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Lathyrus palustris,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
1,Glandularia gooddingii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
2,Gratiola hispida,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
3,Ribes oxyacanthoides,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
4,Ribes nevadense,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
...,...,...,...,...,...
895,Acmispon wrangelianus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
896,Brunnichia ovata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
897,Lobelia appendiculata,https://static.inaturalist.org/photos/9920111/...,YES,LOW,BLUE
898,Hypericum brachyphyllum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW


In [244]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 4100 to 5100

In [245]:
# change the range here
startidx = 4100
stopidx = 5100

# look up each taxon on iNaturalist and keep the medium-size URL of its default photo
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    results = res['results']

    # the first search result is taken as the match for this binomial
    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)  # pause between iNaturalist API calls

In [246]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Anemone drummondii,https://inaturalist-open-data.s3.amazonaws.com...
1,Cirsium mohavense,https://inaturalist-open-data.s3.amazonaws.com...
2,Penstemon pachyphyllus,https://static.inaturalist.org/photos/43021796...
3,Hypericum lloydii,https://inaturalist-open-data.s3.amazonaws.com...
4,Talinum paniculatum,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Saltugilia caruifolia,https://inaturalist-open-data.s3.amazonaws.com...
996,Fagopyrum esculentum,https://inaturalist-open-data.s3.amazonaws.com...
997,Heptapleurum actinophyllum,https://inaturalist-open-data.s3.amazonaws.com...
998,Carex vulpinoidea,https://inaturalist-open-data.s3.amazonaws.com...


In [247]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [248]:
idx

[1;36m593[0m

In [249]:
for idx in range(593,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

600
625


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [250]:
idx

629

In [251]:
for idx in range(629,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

650
675
700
725
750
775
800
825
850
875
900
925
950
975


In [252]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list
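
The labeling loop unpacks `response.choices[0].message.content.split()` directly into three names. That raises a `ValueError` whenever the reply is not exactly three whitespace-separated tokens, and it accepts any three tokens without checking them against the categories in the prompt. A stricter parser could look like the sketch below; `parse_label`, `COLORS`, and `SUBJECTIVITY` are hypothetical names, and returning `None` for malformed replies is an assumption about how one might want to flag them.

```python
# Categories copied from the prompt used in this notebook.
COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE", "RED",
          "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}
SUBJECTIVITY = {"LOW", "MEDIUM", "HIGH"}


def parse_label(text):
    """Return (flower_present, color, subjectivity), or None if the reply is malformed."""
    words = text.upper().split()  # upper-case so 'unknown' and 'UNKNOWN' are treated alike
    if len(words) != 3:
        return None
    flower_present, color, subjectivity = words
    if flower_present not in {"YES", "NO"}:
        return None
    if color not in COLORS or subjectivity not in SUBJECTIVITY:
        return None
    return flower_present, color, subjectivity
```

Upper-casing the reply also covers the case where the model answers with the lowercase `'unknown'` that the prompt mentions in its instructions.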

In [253]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Anemone drummondii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Cirsium mohavense,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
2,Penstemon pachyphyllus,https://static.inaturalist.org/photos/43021796...,YES,LOW,PURPLE
3,Hypericum lloydii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
4,Talinum paniculatum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
...,...,...,...,...,...
995,Saltugilia caruifolia,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
996,Fagopyrum esculentum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
997,Heptapleurum actinophyllum,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,PINK
998,Carex vulpinoidea,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN


In [254]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 5100 to 6100

In [255]:
# change the range here
startidx = 5100
stopidx = 6100

# look up each taxon on iNaturalist and keep its default photo URL
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [256]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Quercus alba,https://inaturalist-open-data.s3.amazonaws.com...
1,Carex grayi,https://inaturalist-open-data.s3.amazonaws.com...
2,Eleutherococcus sieboldianus,https://inaturalist-open-data.s3.amazonaws.com...
3,Nymphoides cristata,https://static.inaturalist.org/photos/10470337...
4,Rhododon ciliatus,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Crataegus michauxii,https://inaturalist-open-data.s3.amazonaws.com...
996,Micranthes careyana,https://static.inaturalist.org/photos/12501282...
997,Quercus ilex,https://static.inaturalist.org/photos/43930461...
998,Clarkia dudleyana,https://inaturalist-open-data.s3.amazonaws.com...


In [257]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875
900
925
950
975
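
The request payload in the loop above is identical in every labeling cell of this notebook; only the photo URL changes. One way to avoid repeating it would be to build the `messages` list in a small helper. This is a sketch rather than the code that was run: `PROMPT` and `build_messages` are hypothetical names, and the prompt text is copied from the cell above. Its whitespace differs slightly from the original, because the backslash continuations in the original string keep each continuation line's indentation inside the prompt.

```python
PROMPT = (
    "Please adhere to very specific formatting in your response: "
    "three words separated onto three lines (one word per line). The first line should indicate "
    "'YES' or 'NO' to answer whether there is a flower present. The second line should be one "
    "word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', "
    "'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The "
    "flowers might not match these categories perfectly. Do the best you can. If in doubt, please "
    "be conservative and choose 'unknown'. The third line should indicate your assessment of the "
    "subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that "
    "the choice of color assignment seems highly subjective."
)


def build_messages(photo_url, prompt=PROMPT):
    """Return the single-turn user message that pairs the prompt with one image URL."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": photo_url}},
            ],
        }
    ]
```

With this helper, each loop body would pass `messages=build_messages(inat_taxon_df.photo_url.iloc[idx])` to `client.chat.completions.create`, keeping the `model` and `max_tokens` arguments as they are.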


In [258]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [259]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Quercus alba,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
1,Carex grayi,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,GREEN
2,Eleutherococcus sieboldianus,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
3,Nymphoides cristata,https://static.inaturalist.org/photos/10470337...,YES,LOW,WHITE
4,Rhododon ciliatus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
...,...,...,...,...,...
995,Crataegus michauxii,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
996,Micranthes careyana,https://static.inaturalist.org/photos/12501282...,YES,LOW,WHITE
997,Quercus ilex,https://static.inaturalist.org/photos/43930461...,NO,LOW,NAN
998,Clarkia dudleyana,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK


In [260]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)
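
Each batch in this notebook is set up by editing `startidx` and `stopidx` by hand and then writing a CSV named after that range. The ranges and file names could also be generated, as in the sketch below. `batch_ranges` and `batch_filename` are hypothetical helpers; the batch size of 1000 and the file name pattern are taken from the cells above, while the total used in the usage note is made up for illustration.

```python
def batch_ranges(total, size=1000, start=0):
    """Yield (startidx, stopidx) pairs covering start..total in steps of `size`."""
    for startidx in range(start, total, size):
        # the last batch is clipped so it never runs past `total`
        yield startidx, min(startidx + size, total)


def batch_filename(startidx, stopidx):
    """Build the per-batch CSV path using the same pattern as the cell above."""
    return f"../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv"
```

For example, `list(batch_ranges(7350, start=5100))` gives `[(5100, 6100), (6100, 7100), (7100, 7350)]`, and each pair can be passed to `batch_filename` to get the path for that batch.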

# 6100 to 7100

In [261]:
# change the range here
startidx = 6100
stopidx = 7100

# look up each taxon on iNaturalist and keep its default photo URL
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [262]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Nuphar microphylla,https://static.inaturalist.org/photos/48306597...
1,Crataegus macrosperma,https://inaturalist-open-data.s3.amazonaws.com...
2,Sida cordifolia,https://inaturalist-open-data.s3.amazonaws.com...
3,Asclepias hallii,https://static.inaturalist.org/photos/24485237...
4,Diplacus parryi,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Tinantia macrophylla,https://static.inaturalist.org/photos/15503911...
996,Tithonia rotundifolia,https://inaturalist-open-data.s3.amazonaws.com...
997,Acacia baileyana,https://static.inaturalist.org/photos/30453215...
998,Canna glauca,https://inaturalist-open-data.s3.amazonaws.com...


In [263]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [264]:
idx

113

In [265]:
for idx in range(113,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [266]:
idx

891

In [267]:
for idx in range(891,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

900
925
950
975


In [268]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [269]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Nuphar microphylla,https://static.inaturalist.org/photos/48306597...,YES,LOW,YELLOW
1,Crataegus macrosperma,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
2,Sida cordifolia,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
3,Asclepias hallii,https://static.inaturalist.org/photos/24485237...,YES,LOW,WHITE
4,Diplacus parryi,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
...,...,...,...,...,...
995,Tinantia macrophylla,https://static.inaturalist.org/photos/15503911...,YES,LOW,WHITE
996,Tithonia rotundifolia,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,ORANGE
997,Acacia baileyana,https://static.inaturalist.org/photos/30453215...,YES,LOW,YELLOW
998,Canna glauca,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW


In [270]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 7100 to 8100

In [271]:
# change the range here
startidx = 7100
stopidx = 8100

# look up each taxon on iNaturalist and keep its default photo URL
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    photo = results[0]['default_photo']['medium_url']
    urls.append(photo)
    time.sleep(0.5)

In [272]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Scutellaria nervosa,https://inaturalist-open-data.s3.amazonaws.com...
1,Physostegia parviflora,https://inaturalist-open-data.s3.amazonaws.com...
2,Clerodendrum trichotomum,https://static.inaturalist.org/photos/23808643...
3,Monoptilon bellidiforme,https://inaturalist-open-data.s3.amazonaws.com...
4,Dalea adenopoda,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Sporobolus cynosuroides,https://inaturalist-open-data.s3.amazonaws.com...
996,Erigeron humilis,https://inaturalist-open-data.s3.amazonaws.com...
997,Neillia incisa,https://inaturalist-open-data.s3.amazonaws.com...
998,Erigeron filifolius,https://inaturalist-open-data.s3.amazonaws.com...


In [273]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [274]:
idx

187

In [275]:
for idx in range(187,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

200
225
250
275
300
325
350
375


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [276]:
idx

392

In [277]:
for idx in range(392,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(0.75)
    if not idx%25:
        print(idx)

400
425
450
475


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [278]:
idx

490

In [279]:
for idx in range(490,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1)
    if not idx%25:
        print(idx)

500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875
900
925
950
975


In [280]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [281]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Scutellaria nervosa,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Physostegia parviflora,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
2,Clerodendrum trichotomum,https://static.inaturalist.org/photos/23808643...,YES,LOW,WHITE
3,Monoptilon bellidiforme,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
4,Dalea adenopoda,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
...,...,...,...,...,...
995,Sporobolus cynosuroides,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
996,Erigeron humilis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
997,Neillia incisa,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
998,Erigeron filifolius,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE


In [282]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

# 8100 to 9100

In [291]:
# change the range here
startidx = 8100
stopidx = 9100

# look up each taxon on iNaturalist and keep its default photo URL (NaN if it has none)
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    if results[0]['default_photo']:
        photo = results[0]['default_photo']['medium_url']
    else:
        photo = np.nan
    urls.append(photo)
    time.sleep(0.5)
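
The cell above adds a check for taxa whose `default_photo` is empty, but it still indexes `results[0]` and would fail if a search returned no results at all. A slightly more defensive lookup is sketched below. It assumes the response is a plain dict shaped like the one indexed above (a `'results'` list whose entries may carry a `'default_photo'` dict with a `'medium_url'` key); `pick_photo_url` is a hypothetical helper, and returning `None` in place of `np.nan` is an illustrative choice.

```python
def pick_photo_url(res):
    """Return the first result's medium photo URL, or None if it is unavailable."""
    results = res.get("results") or []
    if not results:
        return None  # the search matched no taxa
    default_photo = results[0].get("default_photo")
    if not default_photo:
        return None  # the taxon exists but has no default photo
    return default_photo.get("medium_url")
```

Rows where this returns `None` can then be dropped with the same `notna()` filter used in this batch, since pandas treats `None` as missing.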

In [292]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Scoparia montevidensis,https://inaturalist-open-data.s3.amazonaws.com...
1,Hypertelis umbellata,https://inaturalist-open-data.s3.amazonaws.com...
2,Campanula trachelium,https://inaturalist-open-data.s3.amazonaws.com...
3,Salvia jaimehintoniana,https://inaturalist-open-data.s3.amazonaws.com...
4,Monarda serotina,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
995,Solidago radula,https://static.inaturalist.org/photos/51048912...
996,Scilla sardensis,https://inaturalist-open-data.s3.amazonaws.com...
997,Navarretia paradoxinota,https://inaturalist-open-data.s3.amazonaws.com...
998,Cirsium coahuilense,https://inaturalist-open-data.s3.amazonaws.com...


In [296]:
# drop the one taxon that has no photo URL
inat_taxon_df = inat_taxon_df[inat_taxon_df['photo_url'].notna()]

In [297]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [298]:
idx

151

In [299]:
for idx in range(151,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1)
    if not idx%25:
        print(idx)

175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875
900
925
950
975
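The line `flower_present, color, conf = response.choices[0].message.content.split()` assumes the model always answers with exactly three words. A reply with any other word count raises a `ValueError` and ends the batch. Below is a sketch of a more forgiving parser. The function name and the `(None, None, None)` fallback are my own choices for illustration, not something the notebook uses.

```python
COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE", "RED",
          "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}
PRESENCE = {"YES", "NO"}
SUBJECTIVITY = {"LOW", "MEDIUM", "HIGH"}


def parse_label(text):
    """Return (flower_present, color, conf), or three Nones if malformed."""
    # Uppercase and strip stray punctuation so 'unknown' or "LOW." still match.
    tokens = [tok.strip(".,'\"") for tok in text.upper().split()]
    if len(tokens) != 3:
        return (None, None, None)
    flower_present, color, conf = tokens
    if (flower_present not in PRESENCE or color not in COLORS
            or conf not in SUBJECTIVITY):
        return (None, None, None)
    return (flower_present, color, conf)
```

Rows that come back as `None` can then be inspected or resubmitted later instead of interrupting the loop.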


In [300]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['flower_present'] = flower_present_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['subjectivity'] = conf_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['gpt_color'] = color_list
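These warnings appear because `inat_taxon_df` was produced by a boolean filter (`inat_taxon_df[inat_taxon_df['photo_url'].notna()]`), so pandas cannot tell whether the new columns are meant for the filtered frame or the original. Taking an explicit copy right after filtering is one common way to avoid this. The sketch below uses made-up placeholder rows, not data from this project.

```python
import pandas as pd

# Placeholder rows for illustration only.
df = pd.DataFrame({
    "binomial": ["Genus alpha", "Genus beta", "Genus gamma"],
    "photo_url": ["https://example.org/a.jpg", None, "https://example.org/c.jpg"],
})

# .copy() makes the filtered frame independent of `df`, so adding
# columns to it is unambiguous.
labeled = df[df["photo_url"].notna()].copy()
labeled["gpt_color"] = ["WHITE", "PINK"]
```

The original `df` is left without the new column, and the assignment to `labeled` should run without the warning.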


In [301]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Scoparia montevidensis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
1,Hypertelis umbellata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BROWN
2,Campanula trachelium,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
3,Salvia jaimehintoniana,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BLUE
4,Monarda serotina,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
...,...,...,...,...,...
995,Solidago radula,https://static.inaturalist.org/photos/51048912...,YES,LOW,YELLOW
996,Scilla sardensis,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BLUE
997,Navarretia paradoxinota,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
998,Cirsium coahuilense,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE


In [302]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

In [305]:
inat_taxon_df.loc[996].photo_url

'https://inaturalist-open-data.s3.amazonaws.com/photos/1134205/medium.jpg'

# 9100 to 10100

In [306]:
# change the range here
startidx = 9100
stopidx = 10100

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    if results[0]['default_photo']:
        photo = results[0]['default_photo']['medium_url']
    else:
        photo = np.nan
    urls.append(photo)
    time.sleep(0.5)
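The lookup above indexes `results[0]` directly, which would raise an `IndexError` for any taxon name that returns no results. A small helper can cover both the empty-results case and the missing-photo case. This is a sketch that assumes the response is a plain dict shaped like the one used above. The helper name is hypothetical, and it returns `None` rather than `np.nan` (pandas `notna()` treats both as missing).

```python
def medium_photo_url(response):
    """Return the first result's medium photo URL, or None if unavailable."""
    results = response.get("results") or []
    if not results:
        return None  # the search matched no taxa
    photo = results[0].get("default_photo")
    if not photo:
        return None  # the taxon exists but has no default photo
    return photo.get("medium_url")
```

With this in place the loop body would reduce to `urls.append(medium_photo_url(res))`.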

In [307]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Nemacladus californicus,https://inaturalist-open-data.s3.amazonaws.com...
1,Desmanthus covillei,https://inaturalist-open-data.s3.amazonaws.com...
2,Oxytheca perfoliata,https://inaturalist-open-data.s3.amazonaws.com...
3,Geranium lentum,https://inaturalist-open-data.s3.amazonaws.com...
4,Nolina atopocarpa,https://static.inaturalist.org/photos/24455592...
...,...,...
995,Stephanomeria runcinata,https://inaturalist-open-data.s3.amazonaws.com...
996,Stephanomeria thurberi,https://inaturalist-open-data.s3.amazonaws.com...
997,Tricyrtis formosana,https://inaturalist-open-data.s3.amazonaws.com...
998,Doellingeria paucicapitata,https://inaturalist-open-data.s3.amazonaws.com...


In [308]:
# drop any taxa that have no photo URL
inat_taxon_df = inat_taxon_df[inat_taxon_df['photo_url'].notna()]
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Nemacladus californicus,https://inaturalist-open-data.s3.amazonaws.com...
1,Desmanthus covillei,https://inaturalist-open-data.s3.amazonaws.com...
2,Oxytheca perfoliata,https://inaturalist-open-data.s3.amazonaws.com...
3,Geranium lentum,https://inaturalist-open-data.s3.amazonaws.com...
4,Nolina atopocarpa,https://static.inaturalist.org/photos/24455592...
...,...,...
995,Stephanomeria runcinata,https://inaturalist-open-data.s3.amazonaws.com...
996,Stephanomeria thurberi,https://inaturalist-open-data.s3.amazonaws.com...
997,Tricyrtis formosana,https://inaturalist-open-data.s3.amazonaws.com...
998,Doellingeria paucicapitata,https://inaturalist-open-data.s3.amazonaws.com...


In [309]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750


BadRequestError: Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

In [310]:
idx

761

In [311]:
for idx in range(761,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1)
    if not idx%25:
        print(idx)

775
800
825
850
875
900
925
950
975
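The request body is pasted in full into every labeling cell. One way to avoid the repetition is to build it in a helper. The sketch below copies the prompt wording from the cells above, with the whitespace left by the backslash continuations collapsed. `PROMPT` and `build_messages` are names I am introducing here for illustration.

```python
PROMPT = (
    "Please adhere to very specific formatting in your response: "
    "three words separated onto three lines (one word per line). "
    "The first line should indicate 'YES' or 'NO' to answer whether there "
    "is a flower present. The second line should be one word from the "
    "following list, to best describe the flower color in the photo: "
    "['BLUE', 'BROWN', 'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', "
    "'MAROON', 'WHITE', 'YELLOW', 'UNKNOWN', 'NAN']. The flowers might not "
    "match these categories perfectly. Do the best you can. If in doubt, "
    "please be conservative and choose 'unknown'. The third line should "
    "indicate your assessment of the subjectivity of the answer -- it "
    "should either be LOW, MEDIUM, or HIGH, where HIGH means that the "
    "choice of color assignment seems highly subjective."
)


def build_messages(photo_url, prompt=PROMPT):
    """Build the `messages` payload: one user turn with text plus image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": photo_url}},
            ],
        }
    ]
```

Each cell could then pass `messages=build_messages(inat_taxon_df.photo_url.iloc[idx])` to `client.chat.completions.create`.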


In [312]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

In [313]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Nemacladus californicus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Desmanthus covillei,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
2,Oxytheca perfoliata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,GREEN
3,Geranium lentum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
4,Nolina atopocarpa,https://static.inaturalist.org/photos/24455592...,YES,MEDIUM,GREEN
...,...,...,...,...,...
995,Stephanomeria runcinata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
996,Stephanomeria thurberi,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK
997,Tricyrtis formosana,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,PINK
998,Doellingeria paucicapitata,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE


In [314]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

In [317]:
inat_taxon_df.loc[997].photo_url

'https://inaturalist-open-data.s3.amazonaws.com/photos/28754525/medium.jpeg'

# 10100 to 11600

In [328]:
len(sorted_unique_species)

13378

In [331]:
# change the range here
startidx = 10100
stopidx = 11600

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    if not len(results):
        photo=np.nan
    else:
        if results[0]['default_photo']:
            photo = results[0]['default_photo']['medium_url']
        else:
            photo = np.nan
    urls.append(photo)
    time.sleep(0.5)

In [332]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Streptanthus morrisonii,https://inaturalist-open-data.s3.amazonaws.com...
1,Tragia smallii,https://inaturalist-open-data.s3.amazonaws.com...
2,Stutzia covillei,https://static.inaturalist.org/photos/20310980...
3,Draba aurea,https://inaturalist-open-data.s3.amazonaws.com...
4,Tradescantia tharpii,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
1495,Salvia aequidistans,https://inaturalist-open-data.s3.amazonaws.com...
1496,Eucnide hypomalaca,https://static.inaturalist.org/photos/35688103...
1497,Eucnide floribunda,https://inaturalist-open-data.s3.amazonaws.com...
1498,Rhamnus pirifolia,https://inaturalist-open-data.s3.amazonaws.com...


In [333]:
# drop any taxa that have no photo URL
inat_taxon_df = inat_taxon_df[inat_taxon_df['photo_url'].notna()]
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Streptanthus morrisonii,https://inaturalist-open-data.s3.amazonaws.com...
1,Tragia smallii,https://inaturalist-open-data.s3.amazonaws.com...
2,Stutzia covillei,https://static.inaturalist.org/photos/20310980...
3,Draba aurea,https://inaturalist-open-data.s3.amazonaws.com...
4,Tradescantia tharpii,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
1495,Salvia aequidistans,https://inaturalist-open-data.s3.amazonaws.com...
1496,Eucnide hypomalaca,https://static.inaturalist.org/photos/35688103...
1497,Eucnide floribunda,https://inaturalist-open-data.s3.amazonaws.com...
1498,Rhamnus pirifolia,https://inaturalist-open-data.s3.amazonaws.com...


In [334]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

0
25
50
75


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}
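In this notebook, resubmitting from the failed index went through each time, which suggests these image errors were transient. A retry wrapper with a growing delay can absorb that kind of failure without a manual restart. The sketch below is generic: `call_with_retry` is a hypothetical helper, and it catches `Exception` broadly where real code would more likely catch specific errors such as the `BadRequestError` shown above.

```python
import time


def call_with_retry(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying on failure with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be swapped out when trying the helper without real delays.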

In [335]:
idx

91

In [336]:
for idx in range(91,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475
500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875
900
925
950
975
1000
1025
1050
1075
1100
1125
1150
1175
1200
1225
1250
1275
1300
1325
1350
1375
1400
1425
1450
1475


In [337]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['flower_present'] = flower_present_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['subjectivity'] = conf_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['gpt_color'] = color_list


In [338]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Streptanthus morrisonii,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,BROWN
1,Tragia smallii,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,GREEN
2,Stutzia covillei,https://static.inaturalist.org/photos/20310980...,YES,MEDIUM,GREEN
3,Draba aurea,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
4,Tradescantia tharpii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
...,...,...,...,...,...
1495,Salvia aequidistans,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
1496,Eucnide hypomalaca,https://static.inaturalist.org/photos/35688103...,YES,LOW,WHITE
1497,Eucnide floribunda,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,YELLOW
1498,Rhamnus pirifolia,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN


In [339]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

In [342]:
inat_taxon_df.loc[1499].photo_url

'https://inaturalist-open-data.s3.amazonaws.com/photos/34124656/medium.jpg'

# 11600 to 12000

In [343]:
# change the range here
startidx = 11600
stopidx = 12000

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    if not len(results):
        photo=np.nan
    else:
        if results[0]['default_photo']:
            photo = results[0]['default_photo']['medium_url']
        else:
            photo = np.nan
    urls.append(photo)
    time.sleep(0.5)

In [344]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Euchiton sphaericus,https://inaturalist-open-data.s3.amazonaws.com...
1,Salsola ryanii,https://inaturalist-open-data.s3.amazonaws.com...
2,Salpiglossis arniatera,https://inaturalist-open-data.s3.amazonaws.com...
3,Salpianthus macrodontus,https://inaturalist-open-data.s3.amazonaws.com...
4,Salix sessilifolia,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
395,Epipremnum pinnatum,https://static.inaturalist.org/photos/18915883...
396,Solanum interius,https://inaturalist-open-data.s3.amazonaws.com...
397,Amorpha nitens,https://static.inaturalist.org/photos/7325500/...
398,Amorphophallus bulbifer,https://inaturalist-open-data.s3.amazonaws.com...


In [345]:
# drop any taxa that have no photo URL
inat_taxon_df = inat_taxon_df[inat_taxon_df['photo_url'].notna()]
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Euchiton sphaericus,https://inaturalist-open-data.s3.amazonaws.com...
1,Salsola ryanii,https://inaturalist-open-data.s3.amazonaws.com...
2,Salpiglossis arniatera,https://inaturalist-open-data.s3.amazonaws.com...
3,Salpianthus macrodontus,https://inaturalist-open-data.s3.amazonaws.com...
4,Salix sessilifolia,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
395,Epipremnum pinnatum,https://static.inaturalist.org/photos/18915883...
396,Solanum interius,https://inaturalist-open-data.s3.amazonaws.com...
397,Amorpha nitens,https://static.inaturalist.org/photos/7325500/...
398,Amorphophallus bulbifer,https://inaturalist-open-data.s3.amazonaws.com...


In [346]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

0
25
50
75


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}

In [347]:
idx

77

In [348]:
for idx in range(77,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

100
125
150
175
200
225
250
275
300
325
350
375


In [349]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['flower_present'] = flower_present_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['subjectivity'] = conf_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['gpt_color'] = color_list


In [350]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Euchiton sphaericus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BROWN
1,Salsola ryanii,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
2,Salpiglossis arniatera,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,BLUE
3,Salpianthus macrodontus,https://inaturalist-open-data.s3.amazonaws.com...,YES,MEDIUM,MAROON
4,Salix sessilifolia,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,GREEN
...,...,...,...,...,...
395,Epipremnum pinnatum,https://static.inaturalist.org/photos/18915883...,NO,LOW,NAN
396,Solanum interius,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
397,Amorpha nitens,https://static.inaturalist.org/photos/7325500/...,YES,MEDIUM,BROWN
398,Amorphophallus bulbifer,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PINK


In [351]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

In [352]:
inat_taxon_df.loc[398].photo_url

'https://inaturalist-open-data.s3.amazonaws.com/photos/180551/medium.jpg'

# 12000 to 13378

In [353]:
# change the range here
startidx = 12000
stopidx = 13378

# get the taxa
urls = []
for taxon in sorted_unique_species[startidx:stopidx]:
    res = pyinaturalist.get_taxa(taxon)
    total_results = res['total_results']
    page = res['page']
    per_page = res['per_page']
    results = res['results']

    if not len(results):
        photo=np.nan
    else:
        if results[0]['default_photo']:
            photo = results[0]['default_photo']['medium_url']
        else:
            photo = np.nan
    urls.append(photo)
    time.sleep(0.5)

In [354]:
inat_taxon_df = pd.DataFrame([sorted_unique_species[startidx:stopidx],urls],index=['binomial','photo_url']).T
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Solanum deflexum,https://inaturalist-open-data.s3.amazonaws.com...
1,Cuscuta glabrior,https://static.inaturalist.org/photos/9515713/...
2,Ambrosia peruviana,https://inaturalist-open-data.s3.amazonaws.com...
3,Solidago nana,https://static.inaturalist.org/photos/21424388...
4,Sparganium natans,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
1373,Boechera lemmonii,https://inaturalist-open-data.s3.amazonaws.com...
1374,Guettarda elliptica,https://inaturalist-open-data.s3.amazonaws.com...
1375,Penstemon cyathophorus,https://inaturalist-open-data.s3.amazonaws.com...
1376,Boechera pinetorum,https://inaturalist-open-data.s3.amazonaws.com...


In [355]:
# drop any taxa that have no photo URL
inat_taxon_df = inat_taxon_df[inat_taxon_df['photo_url'].notna()]
inat_taxon_df

Unnamed: 0,binomial,photo_url
0,Solanum deflexum,https://inaturalist-open-data.s3.amazonaws.com...
1,Cuscuta glabrior,https://static.inaturalist.org/photos/9515713/...
2,Ambrosia peruviana,https://inaturalist-open-data.s3.amazonaws.com...
3,Solidago nana,https://static.inaturalist.org/photos/21424388...
4,Sparganium natans,https://inaturalist-open-data.s3.amazonaws.com...
...,...,...
1373,Boechera lemmonii,https://inaturalist-open-data.s3.amazonaws.com...
1374,Guettarda elliptica,https://inaturalist-open-data.s3.amazonaws.com...
1375,Penstemon cyathophorus,https://inaturalist-open-data.s3.amazonaws.com...
1376,Boechera pinetorum,https://inaturalist-open-data.s3.amazonaws.com...


In [389]:
flower_present_list = []
color_list = []
conf_list = []
for idx in range(len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

0
25
50
75
100
125
150
175
200
225
250
275
300
325
350
375
400
425
450
475


RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4-vision-preview in organization org-c1K04m6D4JxMhPmM39kU10Wc on requests per day (RPD): Limit 1500, Used 1500, Requested 1. Please try again in 57.6s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}
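The error message includes a suggested wait ("Please try again in 57.6s"). That figure equals 86400 / 1500, one request's share of a day under the 1500-requests-per-day limit. My reading is that the quota refills gradually, but that is an inference from the number rather than something the message says. Below is a sketch that pulls the wait out of a message like this one. `retry_after_seconds` is a hypothetical helper, and it only understands the plain-seconds form seen here, falling back to a default otherwise.

```python
import re

# Matches "try again in 57.6s" or "try again in 20s"; other formats
# (for example minutes or milliseconds) fall through to the default.
_RETRY_PATTERN = re.compile(r"try again in (\d+(?:\.\d+)?)s\b")


def retry_after_seconds(message, default=60.0):
    """Return the suggested wait in seconds parsed from an error message."""
    match = _RETRY_PATTERN.search(message)
    return float(match.group(1)) if match else default
```

A loop could sleep for that long before resubmitting the same index, though with a daily cap already exhausted each wait would free up only about one request.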

In [390]:
idx

494

In [391]:
for idx in range(494,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

500
525
550
575
600
625
650
675
700
725
750
775
800
825
850
875


BadRequestError: Error code: 400 - {'error': {'message': "You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp'].", 'type': 'invalid_request_error', 'param': None, 'code': 'sanitizer_server_error'}}
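
Both interruptions above are handled by checking `idx` and restarting the loop from that value. The same idea can be sketched so that the restart point comes from the number of labels already collected, which means the same cell can simply be re-run after an error. This is an illustrative sketch only: `label_remaining` and `label_one` are hypothetical names, and `label_one` stands in for whatever function sends one photo URL to the API and returns its parsed labels.

```python
def label_remaining(urls, results, label_one):
    """Append one label per URL to `results`, resuming where `results` left off.

    Returns the index that failed, or None if every URL was labeled.
    """
    for idx in range(len(results), len(urls)):
        try:
            results.append(label_one(urls[idx]))
        except Exception as err:
            # Stop cleanly; nothing is appended for the failing index,
            # so the next call starts again at exactly this position.
            print(f"stopped at idx={idx}: {err!r}")
            return idx
    return None
```

Keeping a single `results` list of tuples (rather than three parallel lists) is a design choice in this sketch: a failure part-way through one item cannot leave the lists with different lengths.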

In [392]:
idx

887

In [393]:
for idx in range(887,len(inat_taxon_df.photo_url)):
    response=client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please adhere to very specific formatting in your response: \
                    three words separated onto three lines (one word per line). The first line should indicate \
                    'YES' or 'NO' to answer whether there is a flower present. The second line should be one \
                    word from the following list, to best describe the flower color in the photo: ['BLUE', 'BROWN', \
                    'GREEN', 'ORANGE', 'PINK', 'PURPLE', 'RED', 'MAROON', 'WHITE', 'YELLOW','UNKNOWN','NAN']. The \
                    flowers might not match these categories perfectly. Do the best you can. If in doubt, please \
                    be conservative and choose 'unknown'. The third line should indicate your assessment of the \
                    subjectivity of the answer -- it should either be LOW, MEDIUM, or HIGH, where HIGH means that \
                    the choice of color assignment seems highly subjective."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": inat_taxon_df.photo_url.iloc[idx],
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    flower_present, color, conf = response.choices[0].message.content.split()
    flower_present_list.append(flower_present)
    color_list.append(color)
    conf_list.append(conf)
    time.sleep(1.0)
    if not idx%25:
        print(idx)

900
925
950
975
1000
1025
1050
1075
1100
1125
1150
1175
1200
1225
1250
1275
1300
1325
1350
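
The loops above unpack the reply with `.split()` into exactly three values, which raises a `ValueError` whenever the model answers with more or fewer words. A more defensive parse could look like the sketch below. `parse_reply` is a hypothetical helper, not part of the original workflow; it upper-cases the reply (the prompt itself mentions a lowercase 'unknown') and checks each word against the vocabulary given in the prompt.

```python
# Vocabulary copied from the prompt text used in the loops above.
ALLOWED_COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE", "RED",
                  "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}
ALLOWED_PRESENCE = {"YES", "NO"}
ALLOWED_SUBJECTIVITY = {"LOW", "MEDIUM", "HIGH"}


def parse_reply(text):
    """Return (flower_present, color, subjectivity), or None if malformed."""
    words = text.upper().split()
    if len(words) != 3:
        return None
    flower_present, color, subjectivity = words
    if flower_present not in ALLOWED_PRESENCE:
        return None
    if color not in ALLOWED_COLORS:
        return None
    if subjectivity not in ALLOWED_SUBJECTIVITY:
        return None
    return flower_present, color, subjectivity
```

A `None` result could then be logged and re-queried, or stored as a placeholder row, instead of stopping the batch.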


In [394]:
inat_taxon_df['flower_present'] = flower_present_list
inat_taxon_df['subjectivity'] = conf_list
inat_taxon_df['gpt_color'] = color_list

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['flower_present'] = flower_present_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['subjectivity'] = conf_list
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  inat_taxon_df['gpt_color'] = color_list
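
These warnings usually mean the DataFrame being assigned to was created as a slice of a larger one. Assuming that is the case for `inat_taxon_df` (the slicing step is not shown in this part of the notebook), taking an explicit `.copy()` when the batch is cut out avoids the warning. The toy frame and the names `full_df` and `batch_df` below are placeholders for illustration.

```python
import pandas as pd

# Toy stand-in for the full taxon table; the URLs are placeholders.
full_df = pd.DataFrame({
    "binomial": ["Solanum deflexum", "Cuscuta glabrior", "Solidago nana"],
    "photo_url": ["url_0", "url_1", "url_2"],
})

# .copy() makes the batch an independent DataFrame rather than a slice of
# full_df, so adding columns to it is unambiguous and raises no warning.
batch_df = full_df.iloc[0:2].copy()
batch_df["gpt_color"] = ["WHITE", "BROWN"]
```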


In [395]:
inat_taxon_df

Unnamed: 0,binomial,photo_url,flower_present,subjectivity,gpt_color
0,Solanum deflexum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE
1,Cuscuta glabrior,https://static.inaturalist.org/photos/9515713/...,YES,MEDIUM,BROWN
2,Ambrosia peruviana,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,GREEN
3,Solidago nana,https://static.inaturalist.org/photos/21424388...,YES,LOW,YELLOW
4,Sparganium natans,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,GREEN
...,...,...,...,...,...
1373,Boechera lemmonii,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
1374,Guettarda elliptica,https://inaturalist-open-data.s3.amazonaws.com...,NO,LOW,NAN
1375,Penstemon cyathophorus,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,PURPLE
1376,Boechera pinetorum,https://inaturalist-open-data.s3.amazonaws.com...,YES,LOW,WHITE


In [396]:
filename = f'../data/gpt_labeled_taxon_photos_{startidx}_to_{stopidx}.csv'
inat_taxon_df.to_csv(filename,index=False)

In [400]:
inat_taxon_df.loc[1374].photo_url

'https://inaturalist-open-data.s3.amazonaws.com/photos/247021022/medium.jpg'

# Merge em all up!

In [401]:
import os

In [402]:
os.listdir('../data/')


[
    'TRY_cleaned_colordata.csv',
    'gpt_labeled_taxon_photos_6100_to_7100.csv',
    'gpt_labeled_taxon_photos_11600_to_12000.csv',
    'worldclim',
    'gpt_labeled_taxon_photos_150_to_200_subjective.csv',
    'gpt_labeled_taxon_photos_8100_to_9100.csv',
    'gpt_labeled_taxon_photos_1400_to_2300.csv',
    'gpt_labeled_taxon_photos_150_to_250.csv',
    '.DS_Store',
    'gpt_labeled_remaining_flowering_species.csv',
    'top_abundant_manual_labeled.csv',
    'gpt_labeled_taxon_photos_4100_to_5100.csv',
    'gpt_labeled_taxon_photos_90_to_150.csv',
    'to_label_color.csv',
    'GPT_plus_TRY_combined_raw_inaturalist_export_nafiltered.csv',
    'combined_raw_inaturalist_export.csv',
    'maxent',
    'gpt_labeled_taxon_photos_477_to_1400.csv',
    'gpt_labeled_taxon_photos_0_to_90.csv',
    'simplified_

In [403]:
starts_stops = [[0,477],
[477,1400],
[1400,2300],
[2300,3200],
[3200,4100],
[4100,5100],
[5100,6100],
[6100,7100],
[7100,8100],
[8100,9100],
[9100,10100],
[10100,11600],
[11600,12000],
[12000,13378]]
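
Since this list is typed by hand, a quick check that each batch starts where the previous one stopped can catch a skipped or overlapping range before the files are merged. A small sketch follows; `find_gaps` is a hypothetical helper name.

```python
def find_gaps(ranges):
    """Return (previous_stop, next_start) pairs where consecutive ranges do not meet."""
    return [
        (prev[1], nxt[0])
        for prev, nxt in zip(ranges, ranges[1:])
        if prev[1] != nxt[0]
    ]


# Same ranges as in the cell above.
starts_stops = [[0, 477], [477, 1400], [1400, 2300], [2300, 3200],
                [3200, 4100], [4100, 5100], [5100, 6100], [6100, 7100],
                [7100, 8100], [8100, 9100], [9100, 10100], [10100, 11600],
                [11600, 12000], [12000, 13378]]

# An empty list means the batches tile the index range with no gaps or overlaps.
gaps = find_gaps(starts_stops)
```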

In [404]:
# Initialize an empty list to store dataframes
all_dfs = []

for start, stop in starts_stops:
    filename = f'../data/gpt_labeled_taxon_photos_{start}_to_{stop}.csv'
    tempdf = pd.read_csv(filename)
    all_dfs.append(tempdf)

# Concatenate all dataframes in the list
big_dataframe = pd.concat(all_dfs, ignore_index=True)
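
After concatenating, it can be worth checking for taxa that appear in more than one batch and for color labels outside the vocabulary given in the prompt. The sketch below runs on a made-up toy frame (`demo`, with placeholder taxon names), not on the real data; the column names `binomial` and `gpt_color` match the batch files shown earlier.

```python
import pandas as pd

# Vocabulary copied from the prompt text.
ALLOWED_COLORS = {"BLUE", "BROWN", "GREEN", "ORANGE", "PINK", "PURPLE", "RED",
                  "MAROON", "WHITE", "YELLOW", "UNKNOWN", "NAN"}

# Toy data for illustration only.
demo = pd.DataFrame({
    "binomial": ["Genus alpha", "Genus beta", "Genus alpha"],
    "gpt_color": ["WHITE", "Teal", "WHITE"],
})

# Rows whose taxon name occurs more than once (all occurrences are kept).
duplicated_taxa = demo[demo.binomial.duplicated(keep=False)]

# Rows whose label is not one of the allowed color words.
unexpected_labels = demo[~demo.gpt_color.isin(ALLOWED_COLORS)]
```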


In [408]:
np.sum(big_dataframe.gpt_color.eq('WHITE'))

2948

In [409]:
big_dataframe.to_csv('../data/FULL_gpt_labeled_taxon.csv',index=False)

In [453]:
big_dataframe[big_dataframe.gpt_color.eq('RED')].sample().photo_url.iloc[0]

'https://inaturalist-open-data.s3.amazonaws.com/photos/122882462/medium.jpeg'

In [456]:
reds_only = big_dataframe[big_dataframe.gpt_color.eq("RED")]
reds_only.to_csv('../data/FULL_gpt_labeled_REDS_ONLY.csv',index=False)