# Match images to their URLs and store their positions

We load the dataset from the previous step, which holds the CLIP-computed scores, and match each row to its image position.

## Load dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd

cat_df = pd.read_csv('/content/drive/MyDrive/AI/cats/pretty_cats.csv')
cat_df.head(5)

|   | Unnamed: 0 | pretty | ugly |
|---|---|---|---|
| 0 | 0 | 0.821487 | 0.178513 |
| 1 | 1 | 0.877141 | 0.122859 |
| 2 | 2 | 0.843921 | 0.156079 |
| 3 | 3 | 0.288922 | 0.711078 |
| 4 | 4 | 0.927846 | 0.072154 |
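The `Unnamed: 0` column most likely comes from the previous notebook saving its CSV with the default `index=True`. The sketch below uses a tiny in-memory CSV, not the real file, to show that `pandas.read_csv` names an unlabelled first column `Unnamed: 0`, and that `index_col=0` reads the same column back as the row index:

```python
import io

import pandas as pd

# Tiny stand-in for pretty_cats.csv: the first column is an unlabelled
# index written by an earlier DataFrame.to_csv() call.
csv_text = ",pretty,ugly\n0,0.821487,0.178513\n1,0.877141,0.122859\n"

# Default read: the unlabelled first column becomes "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0: the same column is used as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```

The default read suits this notebook, because the `Unnamed: 0` column holds exactly the image position that is renamed to `position` further down.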


## Add image URLs

Each row's position in the dataset corresponds to an image URL in the bucket, because the images were uploaded in the same order.

Rename the columns and drop the `ugly` score. It is redundant because it is the complement of `pretty` (the two scores sum to 1):

In [None]:
cat_df.columns = ['position', 'pretty_score', 'todel']
del cat_df['todel']
cat_df.head(5)

|   | position | pretty_score |
|---|---|---|
| 0 | 0 | 0.821487 |
| 1 | 1 | 0.877141 |
| 2 | 2 | 0.843921 |
| 3 | 3 | 0.288922 |
| 4 | 4 | 0.927846 |
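Before dropping `ugly`, a quick check can confirm that it really is redundant. The sketch below only covers the five rows shown in the first output. Why the scores sum to 1 is an assumption: the previous step probably took a softmax over two text prompts, but that step is not shown here.

```python
import numpy as np
import pandas as pd

# First five pretty/ugly scores, copied from the first output above.
scores = pd.DataFrame({
    "pretty": [0.821487, 0.877141, 0.843921, 0.288922, 0.927846],
    "ugly":   [0.178513, 0.122859, 0.156079, 0.711078, 0.072154],
})

# If every pair sums to 1 (within float tolerance), "ugly" adds no
# information once "pretty" is kept.
complementary = np.isclose(scores["pretty"] + scores["ugly"], 1.0).all()
print(complementary)
```

On the real data, the same check would run on `cat_df` before the rename, while the `pretty` and `ugly` columns still exist.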


Add the URLs. Each image is stored in the bucket under a name built from its position. For example, the first image is at:

https://storage.googleapis.com/pretty_cats/image_0.jpg

In [None]:
def get_url(position):
    # Images are stored as image_<position>.jpg in the pretty_cats bucket.
    return f'https://storage.googleapis.com/pretty_cats/image_{position}.jpg'

cat_df['url'] = cat_df.position.apply(get_url)
cat_df.head(5)

|   | position | pretty_score | url |
|---|---|---|---|
| 0 | 0 | 0.821487 | https://storage.googleapis.com/pretty_cats/ima... |
| 1 | 1 | 0.877141 | https://storage.googleapis.com/pretty_cats/ima... |
| 2 | 2 | 0.843921 | https://storage.googleapis.com/pretty_cats/ima... |
| 3 | 3 | 0.288922 | https://storage.googleapis.com/pretty_cats/ima... |
| 4 | 4 | 0.927846 | https://storage.googleapis.com/pretty_cats/ima... |
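The same column can also be built with vectorized string concatenation, which avoids a Python function call per row. This is a minimal sketch on five positions. `BASE_URL` is a name introduced here for illustration, and the URL pattern is the one used by `get_url` above.

```python
import pandas as pd

# Hypothetical constant: the fixed prefix shared by every image URL.
BASE_URL = "https://storage.googleapis.com/pretty_cats/image_"

def get_url(position):
    return f"https://storage.googleapis.com/pretty_cats/image_{position}.jpg"

positions = pd.DataFrame({"position": [0, 1, 2, 3, 4]})

# Element-wise concatenation over the whole column at once.
positions["url"] = BASE_URL + positions["position"].astype(str) + ".jpg"

# Compare with the row-by-row .apply(get_url) approach.
same_as_apply = positions["url"].tolist() == positions["position"].apply(get_url).tolist()
print(positions["url"].iloc[0])
print(same_as_apply)
```

For a few thousand rows either approach is fast enough, so `.apply` remains a fine choice for readability.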


## Check URL validity

Define a function that checks whether a URL is reachable:

In [None]:
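import requests

def check_url_status(url, timeout=10):
    """Return "OK" on HTTP 200, otherwise the status code or the error message."""
    try:
        # HEAD fetches only the headers, so the image itself is not downloaded;
        # the timeout stops a stalled request from blocking forever.
        response = requests.head(url, timeout=timeout)
        if response.status_code == 200:
            return "OK"
        return response.status_code
    except requests.exceptions.RequestException as e:
        return f"Error: {e}"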

# Example usage
url = "https://www.example.com"
status = check_url_status(url)
print(f"The status of the URL is: {status}")


The status of the URL is: OK
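Each `check_url_status` call waits on a network round trip, so checking every image one after another could be slow. The sketch below runs the checks in a thread pool from the standard library's `concurrent.futures`. `check_all` and `fake_checker` are hypothetical helpers introduced here. The fake checker stands in for real HTTP requests so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

def check_all(urls, checker, max_workers=16):
    """Run `checker` on every URL in parallel threads.

    Returns the results in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whatever order they finish in.
        return list(pool.map(checker, urls))

# Offline stand-in; in the notebook, `checker` would be check_url_status
# and `urls` would be cat_df.url.
def fake_checker(url):
    return "OK" if url.endswith(".jpg") else 404

statuses = check_all(["a/image_0.jpg", "a/image_1.jpg", "a/missing"], fake_checker)
print(statuses)
```

Threads work well here because the work is I/O-bound. A moderate `max_workers` keeps the bucket from being hit with too many simultaneous requests.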


Apply the function to the dataset to check that the images can be found in the bucket. To keep this quick, only a random sample of 100 images is checked:

In [None]:
cat_df_sample = cat_df.sample(n=100)

In [None]:
cat_df_sample['http_status'] = cat_df_sample.url.apply(check_url_status)

In [None]:
cat_df_sample.http_status.value_counts()

http_status
OK    100
Name: count, dtype: int64
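Every sampled URL returned `OK`. If some checks had failed, filtering on `http_status` would show which images to inspect. The sketch below uses hypothetical statuses, not results from this notebook. It relies on `check_url_status` returning `"OK"`, an HTTP status code, or an `"Error: ..."` string.

```python
import pandas as pd

# Hypothetical check results, one per image position.
sample = pd.DataFrame({
    "position": [10, 11, 12],
    "http_status": ["OK", 404, "Error: timed out"],
})

# Keep only the rows whose check did not return "OK".
failed = sample[sample["http_status"] != "OK"]
print(failed["position"].tolist())
```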

## Sort by score

In [None]:
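# Sort from prettiest to least pretty.
cat_df.sort_values('pretty_score', ascending=False, inplace=True)
# Discard the old row labels, then turn the new 0..n-1 labels into an
# 'index' column that holds each image's rank by score.
cat_df.reset_index(inplace=True, drop=True)
cat_df.reset_index(inplace=True)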
cat_df.head(5)

|   | index | position | pretty_score | url |
|---|---|---|---|---|
| 0 | 0 | 85 | 0.98841 | https://storage.googleapis.com/pretty_cats/ima... |
| 1 | 1 | 126 | 0.986533 | https://storage.googleapis.com/pretty_cats/ima... |
| 2 | 2 | 191 | 0.984669 | https://storage.googleapis.com/pretty_cats/ima... |
| 3 | 3 | 142 | 0.984484 | https://storage.googleapis.com/pretty_cats/ima... |
| 4 | 4 | 225 | 0.98447 | https://storage.googleapis.com/pretty_cats/ima... |
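As an alternative, `Series.rank` computes the same rank by score without reordering the rows. The sketch below runs on the five rows shown before sorting. Tied scores might be ordered differently than with `sort_values`, whose default sort is not guaranteed to be stable. `method="first"` gives every row a unique rank by breaking ties in row order.

```python
import pandas as pd

# First five rows of cat_df before sorting (scores from the output above).
df = pd.DataFrame({
    "position": [0, 1, 2, 3, 4],
    "pretty_score": [0.821487, 0.877141, 0.843921, 0.288922, 0.927846],
})

# rank() starts at 1 and returns floats; shift to 0-based ints so that
# rank 0 is the highest score.
df["pretty_position"] = (
    df["pretty_score"].rank(ascending=False, method="first").astype(int) - 1
)
print(df["pretty_position"].tolist())
print(df.sort_values("pretty_position")["position"].tolist())
```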


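Rename the columns to make their meaning explicit. `pretty_position` is the rank by score (0 = prettiest), and `image_number` is the image's original position, which also appears in its URL: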

In [None]:
cat_df.columns = ['pretty_position', 'image_number', 'pretty_score', 'url']
cat_df.head(5)

|   | pretty_position | image_number | pretty_score | url |
|---|---|---|---|---|
| 0 | 0 | 85 | 0.98841 | https://storage.googleapis.com/pretty_cats/ima... |
| 1 | 1 | 126 | 0.986533 | https://storage.googleapis.com/pretty_cats/ima... |
| 2 | 2 | 191 | 0.984669 | https://storage.googleapis.com/pretty_cats/ima... |
| 3 | 3 | 142 | 0.984484 | https://storage.googleapis.com/pretty_cats/ima... |
| 4 | 4 | 225 | 0.98447 | https://storage.googleapis.com/pretty_cats/ima... |
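Assigning a full list to `columns` depends on the current column order. Renaming by name with `DataFrame.rename` avoids that dependency, and any column not listed keeps its name. The sketch below uses two rows from the output above. Their URLs follow the `get_url` pattern.

```python
import pandas as pd

df = pd.DataFrame({
    "index": [0, 1],
    "position": [85, 126],
    "pretty_score": [0.98841, 0.986533],
    "url": [
        "https://storage.googleapis.com/pretty_cats/image_85.jpg",
        "https://storage.googleapis.com/pretty_cats/image_126.jpg",
    ],
})

# Only the listed columns are renamed; the result does not depend on column order.
df = df.rename(columns={"index": "pretty_position", "position": "image_number"})
print(df.columns.tolist())
```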


In [None]:
cat_df.to_csv('/content/drive/MyDrive/AI/cats/cats_ordered_by_prettiest.csv', index=False)
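Saving with `index=False` means the next notebook will not get an extra `Unnamed: 0` column when it reads the file. The sketch below writes a small frame to a temporary directory instead of Google Drive, reads it back, and checks the columns and the descending score order.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "pretty_position": [0, 1, 2],
    "image_number": [85, 126, 191],
    "pretty_score": [0.98841, 0.986533, 0.984669],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cats_ordered_by_prettiest.csv")
    # index=False: the row labels are not written as an extra column.
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
print(reloaded["pretty_score"].is_monotonic_decreasing)
```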