# Semantic search of your personal cloud photos and videos

### An LLM-based generative AI app that uses:

#### **Pinecone** vector database for embedding storage and search

#### **Hugging Face** models and pipelines
#### - **salesforce_blip_image_captioning** model for caption generation
#### - **sentence_transformers (SBERT)** for embedding creation

#### **Google API** to access your personal Google Photos to perform semantic search and find "that one photo" 

#### Tested locally on a **MacBook Pro**, Apple M1 Pro with 16 GPU Cores and 32GB mem

#### References

**polzerdo55862** has a great notebook tutorial on using the Google Photos API via Python. Some of the Google API cells below are copied from that notebook.

https://github.com/polzerdo55862/google-photos-api/blob/main/Google_API.ipynb

**Pinecone** quick tour shows how to initialize, fill, and delete a Pinecone "index" 

https://github.com/pinecone-io/examples/blob/master/docs/quick-tour/hello-pinecone.ipynb

**Pinecone** examples of how to query an index and use SBERT

https://github.com/pinecone-io/examples/blob/master/docs/semantic-search.ipynb

**Salesforce** examples of how to download and use the BLIP image captioning model 

https://github.com/salesforce/BLIP/blob/main/demo.ipynb

## Create virtualenv and install required packages

1. Open the terminal and navigate to your working directory. The folder structure of the repo includes the following directories:

    * **credentials**: folder to store the credentials you need to authenticate your "Python App" to the Google Photos Library
    * **media_items_list**: every time the script runs, I want to save a .csv file with all Google Photos media items and the corresponding metadata uploaded in the defined time period
    * **downloads**: storing downloaded images from Google Photos


2. Create a virtual environment `python3 -m venv venv`, activate it `. ./venv/bin/activate` and install requirements `pip install -r requirements.txt`

3. Install ipykernel which provides the IPython kernel for Jupyter: `pip install ipykernel` and add your virtual environment to Jupyter: `python -m ipykernel install --user --name=venv` 

    You can check the installation by navigating to /Users/<user>/Library/Jupyter/kernels. There should be a new directory called 'venv'. In that folder you can find the file 'kernel.json', which defines the path to the Python installation the kernel uses.

4. Start jupyter notebook or jupyter lab: `jupyter lab .` and select the just created environment "venv" as Kernel

![](read_me_img/select_kernel.png)
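
The kernel check described in step 3 can also be done from Python. The following is a minimal sketch that reads a `kernel.json` file and returns the interpreter path stored as the first entry of its `argv` list. The helper name `kernel_python_path` is hypothetical, and the path in the usage comment assumes the macOS location mentioned above.

```python
import json
from pathlib import Path


def kernel_python_path(kernel_json):
    """Return the interpreter path (first argv entry) from a kernel.json file."""
    spec = json.loads(Path(kernel_json).read_text())
    return spec["argv"][0]


# Usage (assumed path; replace <user> with your account name):
# kernel_python_path("/Users/<user>/Library/Jupyter/kernels/venv/kernel.json")
```

If the returned path does not point into your `venv` folder, the kernel was registered from a different Python installation.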

## Enable Google API

5. Enable Google Photos API Service

   1. Go to the Google API Console [https://console.cloud.google.com/](https://console.cloud.google.com/). 
   2. From the menu bar, select a project or create a new project.
   
      ![](read_me_img/gifs/create_new_project_speed.gif)
      
   3. To open the Google API Library, from the Navigation menu, select APIs & Services > Library. 
   4. Search for "Google Photos Library API". Select the correct result and click "enable". If it's already enabled, click "manage"
   
       ![](read_me_img/gifs/enable_api_speed.gif)
       
   5. Afterwards it will forward you to the "Photos API/Service details" page (https://console.cloud.google.com/apis/credentials)


6. Configure "OAuth consent screen" ([Source](https://stackoverflow.com/questions/65184355/error-403-access-denied-from-google-authentication-web-api-despite-google-acc))

   1. Go back to the Photos API Service details page and click on "[OAuth consent screen](https://console.cloud.google.com/apis/credentials/consent)" on the left side (below "Credentials") 
   2. Add a Test user: Use the email of the account you want to use for testing the API call
   
        ![](read_me_img/add_test_user.png)

7. Create API/OAuth credentials

   1. On the left side of the Google Photos API Service page, click Credentials
   2. Click on "Create Credentials" and create a OAuth client ID
   3. For the application type, choose "Desktop app" and give the client that will call the API a name
   4. Download the JSON file for the created credentials, rename it to "client_secret.json" and save it in the folder "credentials"
   
        ![](read_me_img/gifs/create_credentials_speed.gif)
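
Before running the OAuth flow, it can help to confirm that the downloaded file is where the notebook expects it and is of the expected kind. The sketch below assumes that "Desktop app" credentials are stored under a top-level `installed` key (and web credentials under `web`). The helper name `client_type` is hypothetical.

```python
import json
from pathlib import Path


def client_type(client_secret_file="./credentials/client_secret.json"):
    """Return the top-level section name of an OAuth client secrets file."""
    data = json.loads(Path(client_secret_file).read_text())
    # Assumption: desktop clients use "installed", web clients use "web".
    for key in ("installed", "web"):
        if key in data:
            return key
    raise ValueError("no 'installed' or 'web' section found in the client secrets file")


# Usage: client_type() should return "installed" for a "Desktop app" client.
```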

In [None]:
!which python
!which pip

## Use the Google Photo Library API for the first time:

The following section shows how to use OAuth credentials for authentication with the Google Photos Library API. The code section below covers the following steps:

8. Create credentials for the first time:

    1. Initialize GooglePhotosApi `google_photos_api = GooglePhotosApi()`

    2. Run the OAuth flow using the `client_secret.json` file: `creds = google_photos_api.run_local_server()`
        
        
       <b>Calling the API for the first time:</b>
       1. Google will ask you if you want to grant the app the required permissions you defined with the scope
       2. Since it's just a test app at the moment, Google will warn you about that > Click on "Continue"
       3. Once you have granted the app the required permissions, you will see a "token_......pickle" file created in the folder "credentials". This token file will be used for future calls.

### Class to establish GoogleAPI credentials

In [6]:
import pickle
import os
from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
#from googleapiclient.http import MediaFileUpload
from google.auth.transport.requests import Request
import requests

class GooglePhotosApi:
    def __init__(self,
                 api_name = 'photoslibrary',
                 client_secret_file= r'./credentials/client_secret.json',
                 api_version = 'v1',
                 scopes = ['https://www.googleapis.com/auth/photoslibrary']):
        '''
        Args:
            api_name: string, name of the api e.g."docs","photoslibrary",...
            client_secret_file: string, location where the requested credentials are saved
            api_version: string, the version of the api
            scopes: list of strings, OAuth scopes to request

        The credentials themselves are created later by run_local_server().
        '''

        self.api_name = api_name
        self.client_secret_file = client_secret_file
        self.api_version = api_version
        self.scopes = scopes
        self.cred_pickle_file = f'./credentials/token_{self.api_name}_{self.api_version}.pickle'

        self.cred = None

    def run_local_server(self):
        # check whether a pickle file with stored credentials already exists
        if os.path.exists(self.cred_pickle_file):
            with open(self.cred_pickle_file, 'rb') as token:
                self.cred = pickle.load(token)

        # if there are no valid credentials, refresh them or create new ones using google_auth_oauthlib.flow
        if not self.cred or not self.cred.valid:
            if self.cred and self.cred.expired and self.cred.refresh_token:
                self.cred.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(self.client_secret_file, self.scopes)
                self.cred = flow.run_local_server()

            with open(self.cred_pickle_file, 'wb') as token:
                pickle.dump(self.cred, token)
        
        return self.cred


In [None]:
# initialize photos api and create service
google_photos_api = GooglePhotosApi()
creds = google_photos_api.run_local_server()

### Use python requests module and the token file to retrieve data from Google Photos

The following function sends a POST request to the `mediaItems:search` endpoint to get the media items for one date. The API returns results in pages of at most 100 items (25 by default when `pageSize` is not set), so the search is narrowed down to one day per call. A day with more items than fit on one page will be truncated unless the request is repeated with the `nextPageToken` from the response.

In [8]:
import json
import requests

def get_response_from_medium_api(year, month, day):
    url = 'https://photoslibrary.googleapis.com/v1/mediaItems:search'
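    payload = {
        # request the maximum page size; the API default is smaller
        "pageSize": 100,
        "filters": {
            "dateFilter": {
                "dates": [
                    {
                        "day": day,
                        "month": month,
                        "year": year
                    }
                ]
            }
        }
    }
    headers = {
        'content-type': 'application/json',
        # `creds` is the global created by run_local_server() above
        'Authorization': 'Bearer {}'.format(creds.token)
    }

    try:
        res = requests.post(url, data=json.dumps(payload), headers=headers, timeout=30)
    except requests.RequestException as err:
        # re-raise so the caller does not continue without a response
        print(f'Request error: {err}')
        raise

    return res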
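
For days that hold more items than one page, the search can be repeated with the `nextPageToken` from each response. The sketch below is one way to do that, not part of the original notebook. It assumes the request fields `pageSize` and `pageToken` and the response field `nextPageToken` of `mediaItems:search`. The function name `search_day_all_pages` and its `post` parameter (any callable that behaves like `requests.post`) are hypothetical.

```python
import json

SEARCH_URL = "https://photoslibrary.googleapis.com/v1/mediaItems:search"


def search_day_all_pages(year, month, day, token, post):
    """Collect all media items for one day by following nextPageToken.

    `post` is any callable with the call style of requests.post
    (url, data=..., headers=...) whose result has a .json() method.
    """
    payload = {
        "pageSize": 100,
        "filters": {"dateFilter": {"dates": [{"year": year, "month": month, "day": day}]}},
    }
    headers = {"content-type": "application/json", "Authorization": f"Bearer {token}"}
    items = []
    while True:
        body = post(SEARCH_URL, data=json.dumps(payload), headers=headers).json()
        items.extend(body.get("mediaItems", []))
        next_token = body.get("nextPageToken")
        if not next_token:
            return items
        # ask for the next page on the following iteration
        payload["pageToken"] = next_token


# Usage in this notebook (assumes `creds` from the cell above):
# items = search_day_all_pages(2023, 9, 1, creds.token, requests.post)
```

Passing `post` in as a parameter keeps the paging logic easy to try out with a stand-in function before spending API quota.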

Use the response of the API to write the results and required metadata into a data frame:

In [9]:
def list_of_media_items(year, month, day, media_items_df):
    '''
    Args:
        year, month, day: day for the filter of the API call
        media_items_df: existing data frame with all media items found so far
    Return:
        items_list_df: media items found for the specified date
        media_items_df: media items data frame extended by the items found for the specified date
    '''

    items_list_df = pd.DataFrame()

    # create request for specified date
    response = get_response_from_medium_api(year, month, day)

    try:
        for item in response.json()['mediaItems']:
            # pd.DataFrame(item) yields one row per key of the nested 'mediaMetadata' dict;
            # keep only the 'creationTime' row so each media item becomes a single row
            items_df = pd.DataFrame(item)
            items_df = items_df.rename(columns={"mediaMetadata": "creationTime"})
            items_df = items_df[items_df.index == 'creationTime']

            # append to the per-day and the overall data frames
            items_list_df = pd.concat([items_list_df, items_df])
            media_items_df = pd.concat([media_items_df, items_df])

    except (KeyError, ValueError):
        # no 'mediaItems' key (an empty day returns {}) or an unparsable response
        print(response.text)

    return (items_list_df, media_items_df)
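
The data frame construction above relies on how pandas expands the nested `mediaMetadata` dictionary. As an alternative sketch, each media item can be flattened into a plain dictionary first, which makes the resulting columns explicit. The helper names `media_item_row` and `media_item_rows` are hypothetical; the field names are the ones used elsewhere in this notebook.

```python
def media_item_row(item):
    """Flatten one entry of the 'mediaItems' list into a flat dict."""
    meta = item.get("mediaMetadata", {})
    return {
        "id": item.get("id"),
        "productUrl": item.get("productUrl"),
        "baseUrl": item.get("baseUrl"),
        "mimeType": item.get("mimeType"),
        "creationTime": meta.get("creationTime"),
        "filename": item.get("filename"),
    }


def media_item_rows(response_json):
    """Return one flat dict per media item; an empty response gives an empty list."""
    return [media_item_row(item) for item in response_json.get("mediaItems", [])]


# Usage: pd.DataFrame(media_item_rows(response.json()))
```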

### Execute the API call for all dates to get a list with all media items

Data fields of note:

**id** Immutable
      
**baseUrl** Base URLs within the Google Photos Library API allow you to access the bytes of the media items. They are valid for 60 minutes. 

(https://developers.google.com/photos/library/guides/access-media-items)
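
Because a `baseUrl` expires, it helps to record when it was fetched and to request only the image size you need. The sketch below assumes the `=w{width}-h{height}` suffix described in the guide linked above and the 60-minute lifetime stated there. The helper names `sized_image_url` and `base_url_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Lifetime of a baseUrl as stated in the Google Photos guide linked above.
BASE_URL_LIFETIME = timedelta(minutes=60)


def sized_image_url(base_url, width, height):
    """Append the size parameters so the image is scaled to fit width x height."""
    return f"{base_url}=w{width}-h{height}"


def base_url_expired(fetched_at, now=None):
    """Return True if a baseUrl fetched at `fetched_at` (timezone-aware) is too old to use."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - fetched_at >= BASE_URL_LIFETIME
```

If `base_url_expired` returns `True`, rerun the media item search to obtain fresh URLs; the `id` stays the same, so existing captions and embeddings can be kept.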

In [12]:
import timm
import transformers
import torch

### Connect to MacBook MPS if NVIDIA is not available

In [13]:
# Check that MPS is available
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    mps_device = torch.device("mps")

In [14]:
# prefer CUDA, then Apple MPS, then fall back to the CPU (avoids a NameError when MPS is missing)
device = torch.device('cuda' if torch.cuda.is_available()
                      else 'mps' if torch.backends.mps.is_available() else 'cpu')

In [15]:
device

device(type='mps')

### Load the models and test an image caption

In [16]:
# sentence_transformers model used to create the embeddings
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
model

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)

In [17]:
# blip model used to generate captions
from transformers import pipeline

captioner = pipeline("image-to-text",model="Salesforce/blip-image-captioning-base")

In [20]:
import pandas as pd
from datetime import date, timedelta, datetime
import requests

# create a list with all dates from the start date through yesterday
sdate = date(2023,9,1)   # start date
edate = date.today()
date_list = pd.date_range(sdate,edate-timedelta(days=1),freq='d')

media_items_df = pd.DataFrame()

# use `day` as the loop variable so the imported `date` class is not shadowed
for day in date_list:

    # get a list with all media items for specified date (year, month, day)
    items_df, media_items_df = list_of_media_items(year = day.year, month = day.month, day = day.day, media_items_df = media_items_df)

{}

{}

{}

{}

{}



In [21]:
media_items_df.drop(['productUrl', 'mimeType', 'creationTime', 'filename'], axis=1, inplace=True)

In [22]:
media_items_df.reset_index(drop=True, inplace=True)

In [23]:
media_items_df.head()

| | id | baseUrl |
|---|---|---|
| 0 | ALuQekoEDdHn-4zJECI5IfbAr1UfwOmwa1LjKeLjMRD1h6... | https://lh3.googleusercontent.com/lr/AAJ1LKeq7... |
| 1 | ALuQekrYsr8xfGeVKydlZ5BV-8oqNeg8DIj5ZruswgnG-4... | https://lh3.googleusercontent.com/lr/AAJ1LKf2s... |
| 2 | ALuQekqBrUHCLeojoVet8LNKs6avSCkvMFLJ-NqH6Ybcks... | https://lh3.googleusercontent.com/lr/AAJ1LKdj2... |
| 3 | ALuQekq3jPhquGT-X6FbQ15ulkTMv8MjPeujfg24fgM8jn... | https://lh3.googleusercontent.com/lr/AAJ1LKe27... |
| 4 | ALuQekrTVjjzq-DKEWbjNLafYgHe3kGlUmuj1snCznFOmP... | https://lh3.googleusercontent.com/lr/AAJ1LKeMA... |


In [24]:
len(media_items_df)

172

In [25]:
# test the captioner
captioner(media_items_df['baseUrl'].values[0])



[{'generated_text': 'the screen of an iphone with the notifications on'}]
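
As the output shows, the pipeline returns a list with one dictionary per generated caption rather than a plain string. A small helper can pull out the text before it is embedded or displayed; this is a sketch, and the name `caption_text` is hypothetical.

```python
def caption_text(captioner_output):
    """Return the first caption string from an image-to-text pipeline result."""
    if not captioner_output:
        return ""
    return captioner_output[0].get("generated_text", "").strip()


# Usage: caption_text(captioner(media_items_df['baseUrl'].values[0]))
```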

In [39]:
from IPython.display import Image

In [40]:
Image(url=media_items_df['baseUrl'].values[0])

### Generate captions and embeddings of all your photos and videos

In [41]:
img_embeddings = []
img_captions = []
img_ids = []

for i, burl in enumerate(media_items_df['baseUrl'].values):

    if i % 20 == 0:
        print(f"{i} of {len(media_items_df)}")

    try:
        curr_id = media_items_df['id'].values[i]
        # the captioner returns a list like [{'generated_text': '...'}]
        curr_desc = captioner(burl)
        # embed the caption string itself
        curr_embedding = model.encode(curr_desc[0]['generated_text']).tolist()

        img_ids.append(curr_id)
        img_embeddings.append(curr_embedding)
        img_captions.append(curr_desc[0])
    except Exception as err:
        print(f"error for item {i} - check if baseUrl has expired: {err}")

0 of 172
20 of 172
40 of 172
60 of 172
80 of 172
100 of 172
120 of 172
140 of 172
160 of 172


In [42]:
df_photos = pd.DataFrame({'id': img_ids, 'vector': img_embeddings, 'metadata': img_captions})

In [43]:
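# join on id so the URLs stay aligned even if some items failed in the loop above
df_photos = df_photos.merge(media_items_df[['id', 'baseUrl']].drop_duplicates('id'), on='id', how='left')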

In [44]:
df_photos.head()

| | id | vector | metadata | baseUrl |
|---|---|---|---|---|
| 0 | ALuQekoEDdHn-4zJECI5IfbAr1UfwOmwa1LjKeLjMRD1h6... | [-0.0204818956553936, 0.023548683151602745, -0... | {'generated_text': 'the screen of an iphone wi... | https://lh3.googleusercontent.com/lr/AAJ1LKeq7... |
| 1 | ALuQekrYsr8xfGeVKydlZ5BV-8oqNeg8DIj5ZruswgnG-4... | [-0.054045457392930984, 0.10627054423093796, -... | {'generated_text': 'a family poses for a photo... | https://lh3.googleusercontent.com/lr/AAJ1LKf2s... |
| 2 | ALuQekqBrUHCLeojoVet8LNKs6avSCkvMFLJ-NqH6Ybcks... | [0.0678996816277504, 0.0504564493894577, -0.03... | {'generated_text': 'a white table with a white... | https://lh3.googleusercontent.com/lr/AAJ1LKdj2... |
| 3 | ALuQekq3jPhquGT-X6FbQ15ulkTMv8MjPeujfg24fgM8jn... | [0.08913969993591309, 0.04626549035310745, -0.... | {'generated_text': 'a white chair with a white... | https://lh3.googleusercontent.com/lr/AAJ1LKe27... |
| 4 | ALuQekrTVjjzq-DKEWbjNLafYgHe3kGlUmuj1snCznFOmP... | [0.05451503023505211, 0.07686479389667511, -0.... | {'generated_text': 'a white table with a small... | https://lh3.googleusercontent.com/lr/AAJ1LKeMA... |


### Load and query embeddings with Pinecone

At the time of writing, Pinecone offers a free tier with one index: https://www.pinecone.io/pricing/

In [45]:
from tqdm.autonotebook import tqdm
import getpass

In [49]:
import os
import pinecone

# get api key from app.pinecone.io
print("Enter your Pinecone API key") 
api_key = getpass.getpass()

# find your environment next to the api key in pinecone console
# print("Enter your Pinecone Environment") 
# env = getpass.getpass()
env = 'gcp-starter'

pinecone.init(
    api_key=api_key,
    environment=env
)

Enter your Pinecone API key


 ········


In [50]:
# Giving our index a name
index_name = "photo-captions"

In [51]:
# Delete the index, if an index of the same name already exists
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

Creating a Pinecone Index.

In [52]:
# vector dimensions
vdim = len(df_photos['vector'][0])
vdim

384

In [53]:
import time

dimensions = vdim
pinecone.create_index(name=index_name, dimension=dimensions, metric="cosine")

# wait for index to be ready before connecting
while not pinecone.describe_index(index_name).status['ready']:
    time.sleep(1)

In [54]:
index = pinecone.Index(index_name=index_name)

Load the vectors into the index.

In [55]:
index.upsert(vectors=zip(df_photos.id, df_photos.vector, df_photos.metadata))  # insert vectors

# wait until the index reports ready before querying
while not pinecone.describe_index(index_name).status['ready']:
    time.sleep(1)

{'upserted_count': 172}
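
A single upsert works for the 172 vectors here. For a larger photo library it is usually safer to send the vectors in smaller batches. The sketch below splits any iterable into lists of a fixed size; the helper name `batched` and the batch size of 100 are assumptions, not values taken from the notebook.

```python
from itertools import islice


def batched(records, batch_size=100):
    """Yield successive lists of at most `batch_size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Usage in this notebook (same tuples as in the cell above):
# for batch in batched(zip(df_photos.id, df_photos.vector, df_photos.metadata)):
#     index.upsert(vectors=batch)
```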

In [57]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00172,
 'namespaces': {'': {'vector_count': 172}},
 'total_vector_count': 172}
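
The index was created with `metric="cosine"`, so the scores returned by a query are cosine similarities between the query vector and the stored caption vectors. The sketch below computes the same quantity locally in plain Python, which can be handy for spot-checking a score; the function name `cosine_similarity` is hypothetical. Since the SBERT model printed above ends with a `Normalize()` layer, its vectors should already have unit length, in which case the cosine similarity equals the dot product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Usage: cosine_similarity(model.encode("white chair").tolist(), df_photos['vector'][3])
```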

In [58]:
# input a semantic search phrase and query your photos
query = "white chair"

# create the query vector
xq = model.encode(query).tolist()

# now query
xc = index.query(xq, top_k=5, include_metadata=True)
xc

{'matches': [{'id': 'ALuQekq3jPhquGT-X6FbQ15ulkTMv8MjPeujfg24fgM8jnLD-f_oh5n32DOGLmsMQmE8Y2tK_1GR4st32AT4skgQdzrb77lWPw',
              'metadata': {'generated_text': 'a white chair with a white seat '
                                             'and a white table'},
              'score': 0.823199689,
              'values': []},
             {'id': 'ALuQekoVq5BubPE7CutfpJ1IAXSiTEjNDOVTXWjKasJ2ke_8evI1momxnYYh-qoEOhZwXir3YDcHNVQS7xnvAYNlnvRnqfx8QQ',
              'metadata': {'generated_text': 'a black leather office chair '
                                             'with a black leather seat'},
              'score': 0.643893957,
              'values': []},
             {'id': 'ALuQekqBrUHCLeojoVet8LNKs6avSCkvMFLJ-NqH6YbcksBMG-oGYKpKRL1ieIMJpBFpGVEeLTkI22E2EAXPVIhnEo-LbO_CaQ',
              'metadata': {'generated_text': 'a white table with a white top '
                                             'and a white base'},
              'score': 0.565575302,
              'values': 

In [59]:
img_urls = []

# collect the URL of every returned match (there may be fewer than top_k)
for match in xc['matches']:
    img_id = match['id']
    img_url = df_photos.loc[df_photos['id'] == img_id, 'baseUrl'].iloc[0]
    img_urls.append(img_url)

In [None]:
Image(url=img_urls[0])

In [None]:
Image(url=img_urls[1])

In [None]:
Image(url=img_urls[2])

In [None]:
Image(url=img_urls[3])

In [None]:
Image(url=img_urls[4])
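
Instead of inspecting each image in a separate cell, the matches can be gathered into one list with score, caption and URL side by side. This is a sketch under the assumption that each match supports key access for `id`, `score` and `metadata`, as used in the cells above; the helper name `summarize_matches` is hypothetical.

```python
def summarize_matches(matches, url_by_id):
    """Combine query matches with their image URLs into a list of flat dicts."""
    rows = []
    for match in matches:
        rows.append({
            "id": match["id"],
            "score": match["score"],
            "caption": match["metadata"]["generated_text"],
            # None if the id is unknown locally
            "baseUrl": url_by_id.get(match["id"]),
        })
    return rows


# Usage in this notebook:
# url_by_id = dict(zip(df_photos['id'], df_photos['baseUrl']))
# pd.DataFrame(summarize_matches(xc['matches'], url_by_id))
```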