<a href="https://colab.research.google.com/github/jmartin1976/AKI-hackathon/blob/main/%5BPBP_BFA%5DAI_Image_and_Description_Enhancement.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Powered By People**
# **AI improvement of images from L1 and L2 merchants**
## Requirements ##
1.   This notebook is meant to run in a Google Drive account with access to the folder structure described below.
2.   The images are stored in a subfolder "Nano Projects Participants". This folder needs to be accessible in the Google Drive where the Colab notebook is running.
3.   The structure of the subfolders below "Nano Projects Participants" is the following:
    - Nano Projects Participants/*Name of the merchant*/*Product Images*/*Picks*
    - The code will process the images contained in the *Picks* subfolder
    - A new version of the images will be created inside a new subfolder *AI API* inside the subfolder *Picks*



# **Install the packages and import the libraries needed to run the code**

In [None]:
!pip install httpx        # to send HTTP requests to the claid.ai API
!pip install openai       # to use the OpenAI API



In [None]:
import httpx
import urllib.request

In [None]:
import nest_asyncio
import asyncio

In [None]:
from openai import OpenAI

In [None]:
import requests
import io
import os
from urllib.parse import urlparse, parse_qs

In [None]:
import pandas as pd
import json

In [None]:
# libraries needed to mount google drive, retrieve credentials, authenticate,
# create a file service, upload files and manage httperrors when interacting with google api client
from googleapiclient.http import MediaIoBaseUpload
from google.colab import auth
from oauth2client.client import GoogleCredentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google.colab import drive

# **Required Functions Created to run the code**

## Send request to CLAID
As these requests require:
* Sending a file
* Waiting for the file to be processed

using async requests allows other operations to run concurrently.
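
To illustrate the idea, here is a minimal sketch that runs several simulated requests concurrently. It does not call CLAID: `fake_edit_request`, `run_concurrently` and the `example.invalid` URLs are hypothetical stand-ins, and the semaphore only caps the number of requests in flight, which is not the same as a requests-per-second limit.

```python
import asyncio


async def fake_edit_request(image_id: str, delay: float = 0.1) -> tuple:
    """Stand-in for an HTTP call: waits `delay` seconds, then returns a fake result."""
    await asyncio.sleep(delay)  # simulates waiting for the remote service
    return (image_id, f"https://example.invalid/{image_id}.png")


async def run_concurrently(image_ids, max_in_flight: int = 4):
    """Start all simulated requests, with at most `max_in_flight` active at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(image_id):
        async with semaphore:
            return await fake_edit_request(image_id)

    # gather returns the results in the same order as the inputs
    return await asyncio.gather(*(guarded(i) for i in image_ids))


def demo():
    """Run the sketch from plain Python (outside an already running event loop)."""
    return asyncio.run(run_concurrently(["a", "b", "c"]))
```

Inside a notebook cell, where an event loop is usually already running, `await run_concurrently([...])` can be used directly instead of `asyncio.run`.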


In [None]:
async def send_request(url, headers, json_data):
    try:
        async with httpx.AsyncClient() as client:
            response = await client.post(url, headers=headers, json=json_data, timeout=30.0)
            if response.status_code == 200:
                data = response.json()
                print("Request successful, processing the result...")
                print(data)
                tmp_url = data['data']['output']['tmp_url']
                input_id = json_data['input'].split('id=')[1]
                color_aux = json_data['operations']['background']['color']
                return (f"{input_id}_{color_aux}", tmp_url)  # Return a tuple directly
            else:
                print(f"Request failed with status code: {response.status_code}, response: {response.text}")
    except Exception as e:
        print(f"An error occurred: {e}")

In [None]:
async def process_json_array(json_array, url, headers):
    all_results = []  # Flat list collecting the results of every call: [name, url, name, url, ...]
    for json_data in json_array:
        response_data = await send_request(url, headers, json_data)
        if response_data:
            all_results.extend(response_data)  # extend() flattens the (name, url) tuple into the list
        # 1 request per second to be on the safe side (the limit in the CLAID API is 4 requests per second)
        await asyncio.sleep(1)
    return all_results

In [None]:
# Asynchronously make the POST request and upload the result to Google Drive
# This function is no longer used
async def send_request_upload(url, headers, json_data):
    try:
        async with httpx.AsyncClient() as client:
            # timeout set to 30 seconds
            timer = 30.0
            response = await client.post(url, headers=headers, json=json_data, timeout=timer)

            if response.status_code == 200:
                # The API returns JSON data
                data = response.json()
                print("Request successful, processing the result...")
                print(data)

                # Extract the tmp_url from the response
                tmp_url = data['data']['output']['tmp_url']
                # Name the edited image after the id of the original file
                file_name = json_data['input'].split('id=')[1]

                # upload_file_to_google_drive is synchronous and downloads tmp_url itself,
                # so run it in a worker thread; picks_folder_id is the global defined further below
                await asyncio.to_thread(upload_file_to_google_drive, file_name, tmp_url, picks_folder_id)
            else:
                print(f"Request failed with status code: {response.status_code}, response: {response.text}, input json: {json_data}")

    except httpx.ReadTimeout:
        print(f"The request timed out while waiting for a response from the server. input json: {json_data}")
    except httpx.HTTPStatusError as e:
        print(f"HTTP error occurred: {e.response.status_code}. input json: {json_data}")
    except Exception as e:
        # Handle other possible exceptions
        print(f"An error occurred: {e}")

## Create a json array
1. The ids of the original images are used to name the edited images
2. Two variables are included to create different versions of the json:
    * file -> the id of the original file is used to define the input url
    * color -> the list of background colors is defined in the array colors

<font color="orange">WARNING:</font> The json request format is different in the API and in the Playground of Claid.
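
As a compact illustration of how the two variables combine, the sketch below builds one request body per (file, color) pair using the same fields as the function in the next cell. The helper names `build_edit_request` and `build_edit_requests` are hypothetical and are not part of the notebook.

```python
def build_edit_request(file_id: str, color: str) -> dict:
    """Build one request body for a Drive file id and a background color."""
    return {
        # Public download url of the original image on Google Drive
        "input": f"https://drive.google.com/uc?export=download&id={file_id}",
        "operations": {
            "restorations": {"upscale": "smart_enhance"},
            "resizing": {"width": 1000, "height": 1000, "fit": "canvas"},
            "background": {
                "remove": {
                    "category": {"type": "products", "version": "beta"},
                    "clipping": True,
                },
                "color": color,
            },
            "padding": "10%",
        },
        "output": {"format": {"type": "png"}},
    }


def build_edit_requests(file_ids, colors):
    """One request per (file, color) combination, with files in the outer loop."""
    return [build_edit_request(f, c) for f in file_ids for c in colors]
```

With two files and two colors this yields four request bodies, in the order file 1 with each color, then file 2 with each color.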

In [None]:
def json_from_list_files_in_folder(service, folder_id):
    # List all files in the specified folder and generate JSON data
    json_array = []
    # Possible background colors per file: "#f7e3e3" (pink) and "#ffffff" (white)
    # colors = ["#f7e3e3", "#ffffff"]  # two versions per file
    colors = ["#ffffff"]  # define the list of colors
    try:
        query = f"'{folder_id}' in parents and trashed = false and mimeType != 'application/vnd.google-apps.folder'"
        response = service.files().list(q=query, spaces='drive', fields='files(id, name)').execute()
        print(response)
        for file in response.get('files', []):
          for color in colors:
            json_data = {
                "input": f"https://drive.google.com/uc?export=download&id={file['id']}",
                # Add your operations here as per the initial requirement
                "operations": {
                  # enhance the image adding pixels
                  "restorations": {
                  "upscale": "smart_enhance"
                },
                # defines the output as 1000x1000 pixels
                "resizing": {
                  "width": 1000,
                  "height": 1000,
                  # fit in the canvas defined
                  "fit": "canvas"
                },
                # remove background using the product model version beta
                "background": {
                  "remove": {
                  "category": {
                      "type": "products",
                      "version": "beta"
                   },

                  "clipping": True
                  # Gloria Woodworks "clipping": False
                  },
                  # define background as pink "#f7e3e3" or white "#ffffff"
                  "color": f"{color}"
                },
                # defines 10% padding horizontally and vertically
                "padding": "10%"
                },
                "output": {
                  "format": {
                    "type": "png",
                     }
                }
            }
            print(json_data)
            json_array.append(json_data)
        return json_array
    except HttpError as error:
        print(f'An error occurred: {error}')
        return []

## Create AI name and AI description using only the image

In [None]:
def name_description_from_image(client, url_image, file_id):
    """
    Generates a name and description for an image using an AI model based on the provided URL.

    Parameters:
    - client: An initialized and authenticated API client capable of making requests.
    - url_image: The URL of the image to analyze.
    - file_id: An identifier for the file, included in the return for reference.

    Returns:
    - A tuple containing the file_id, ai_image_name, and ai_image_description.
    """
    prompt=(
        "Generate a name and an objective description of this product. "
        "The name must have at most 2 words. "
        "For the description, focus on describing the design, colors and patterns. "
        "Provide a json with only two fields: ai_image_name and ai_image_description"
    )
    try:
        # API request
        response = client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": url_image
                            },
                        },
                    ],
                }
            ],
            max_tokens=200,
        )

        # Extract the content from the response
        message_content = response.choices[0].message.content

        # The content is expected to be in a Markdown code block format containing JSON.
        # Extracting JSON from the Markdown-like code block
        json_str = message_content.split('```json')[1].split('```')[0].strip()

        # Parse the JSON string
        data = json.loads(json_str)

        ai_image_name = data["ai_image_name"]
        ai_image_description = data["ai_image_description"]  # Note the key based on your format

        return (file_id, ai_image_name, ai_image_description)
    except Exception as e:
        print(f"An error occurred: {e}")
        return (file_id, None, None)
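
The function above expects the model to wrap its JSON in a Markdown code fence tagged json, and falls into the except branch when the fence is missing. A minimal sketch of a more tolerant parser follows; `extract_json` is a hypothetical helper, not part of the notebook, and it assumes the reply contains either one fenced block or bare JSON.

```python
import json
import re

# Three backticks, built programmatically so the fence marker is easy to read here
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(message_content: str) -> dict:
    """Parse JSON from a reply that may or may not be wrapped in a code fence."""
    match = FENCED_BLOCK.search(message_content)
    # Use the fenced body when there is one, otherwise the whole message
    json_str = match.group(1) if match else message_content
    # json.loads raises json.JSONDecodeError if the text is not valid JSON
    return json.loads(json_str.strip())
```

The caller still needs to handle `json.JSONDecodeError`, for example with the existing try/except block.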

## Creates a subfolder to store the edited images

In [None]:
def find_or_create_subfolder(service, parent_folder_id, subfolder_name):
    # Search for the subfolder by name within the specified parent folder
    query = f"name = '{subfolder_name}' and '{parent_folder_id}' in parents and mimeType = 'application/vnd.google-apps.folder' and trashed = false"
    results = service.files().list(q=query, spaces='drive', fields="files(id, name)").execute()
    folders = results.get('files', [])

    # Check if the folder exists
    if folders:
        # Assuming the first matching folder is the one we want
        return folders[0]['id']
    else:
        # Folder doesn't exist, so create it
        folder_metadata = {
            'name': subfolder_name,
            'mimeType': 'application/vnd.google-apps.folder',
            'parents': [parent_folder_id]
        }
        folder = service.files().create(body=folder_metadata, fields='id').execute()
        print(f"Folder '{subfolder_name}' created successfully.")
        return folder.get('id')

In [None]:
async def async_find_or_create_subfolder(service, parent_folder_id, subfolder_name):
    # Use asyncio.to_thread to run the synchronous function in a separate thread
    return await asyncio.to_thread(find_or_create_subfolder, service, parent_folder_id, subfolder_name)

## Download the files from the urls created by CLAID and store them under the id of the original file

In [None]:
def upload_file_to_google_drive(file_name, file_url, folder_id):
    # Download the file from the URL
    response = requests.get(file_url)
    print(file_url)

    if response.status_code != 200:
        print(f"Failed to download the file from {file_url}.")
        return

    file_content = io.BytesIO(response.content)

    # Create the media upload request body
    media = MediaIoBaseUpload(file_content, mimetype='application/octet-stream', resumable=True)

    # Define the metadata for the file to be uploaded
    file_metadata = {
        'name': file_name,
        'parents': [folder_id]
    }

    # Execute the upload
    file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()

    print(f"File ID: {file.get('id')} uploaded successfully to folder ID: {folder_id}")

In [None]:
async def download_and_save_file(url, file_id, picks_id, target_folder_name):
    # Download the file
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        if response.status_code == 200:
            # Find or create the target subfolder (e.g. "AI API") inside the Picks folder
            ai_api_folder_id = await async_find_or_create_subfolder(service, picks_id, target_folder_name)
            if ai_api_folder_id:
                # Save the file to Google Drive in the target subfolder.
                # Note: upload_file_to_google_drive downloads the url again by itself.
                file_name = file_id  # The edited image is named after the id of the original file
                upload_file_to_google_drive(file_name, url, ai_api_folder_id)

In [None]:
async def process_downloads(id_url_pairs, picks_id, target_folder_name):
    for file_id, url in id_url_pairs:
        print(f"url: {url}  file_id: {file_id}")
        await download_and_save_file(url, file_id, picks_id, target_folder_name)

## GenAI Description Functions
Two alternatives are explored:
* Generate a new name and a new description from a previous product name and product description, asking the OpenAI API to create new ones following an example. The prompt and the example are included in the function.
* Generate a new name and a new description from a previous product name and product description, plus a description of the image created with the OpenAI API.

In [None]:
def improve_name_description(file_id,initial_name, initial_description):
    """
    Improves a product name and description using the OpenAI API.

    Parameters:
    - file_id: An identifier for the file, included in the return for reference.
    - initial_name (str): The initial product name.
    - initial_description (str): The initial product description.

    Returns:
    - tuple: A tuple containing the file_id, the improved name and the improved description.
    """
    name_example="Short Cone"
    description_example=(
        "A piece that for its size becomes an excellent complement to the table setting. "
        "Its simple design does not detract prominence for any flower, and that together with more vases "
        "can make compositions to decorate any shelf or corner. "
        "Manufactured on a lathe in high temperature ceramic, natural rustic finish on the outside and transparent glaze on the inside."
        )
    context="You are a marketing expert able to create appealing descriptions for makers selling their products online."
    prompt=(
        f"Create an improved name and description based on this information: Initial name: {initial_name}. "
        f"Initial description: {initial_description}. "
        f"This is an example of a good name: {name_example}. "
        f"This is an example of a good description: {description_example}. "
        f"Make the product name of maximum 4 words. "
        f"Make the description of maximum 80 words. "
        f"The examples provided are from different products. Just take this as an indication of the style, do not use it to infer composition, materials, properties or origin. "
        f"Do not add any element of information that is not included in the initial description. Just improve the writing without compromising accuracy. "
        f"Please provide the output in json format in two variables: ai_name and ai_description"
        # TBD: Include a limit in the description length, and name?
        )
    client = OpenAI()
    response = client.chat.completions.create(
        # gpt-3.5 tried to replicate the example format even when information like composition or origin was not in the initial description, therefore creating hallucinations.
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"{context}"},
             {"role": "user", "content": f"{prompt}"}
            ]
        )
    json_data=response.choices[0].message.content
    # Parse the JSON data
    data = json.loads(json_data)
    # Retrieve values
    ai_name = data['ai_name']
    ai_description = data['ai_description']
    return file_id,ai_name,ai_description

In [None]:
def improve_name_description_image(initial_name, initial_description, image_AI_description):
    """
    Improves a product name and description using the OpenAI API and a description generated from the image analysis.

    Parameters:
    - initial_name (str): The initial product name.
    - initial_description (str): The initial product description.
    - image_AI_description (str): The description of an image, generated using AI to analyze the image.

    Returns:
    - tuple: A tuple containing the name and description.
    """
    name_example="Short Cone"
    description_example=(
        "A piece that for its size becomes an excellent complement to the table setting. "
        "Its simple design does not detract prominence for any flower, and that together with more vases "
        "can make compositions to decorate any shelf or corner. "
        "Manufactured on a lathe in high temperature ceramic, natural rustic finish on the outside and transparent glaze on the inside."
        )
    context="You are a marketing expert able to create appealing descriptions for makers selling their products online."
    prompt=(
        f"Create an improved name and description based on this information: Initial name: {initial_name}. "
        f"Initial description: {initial_description}. "
        f"AI generated image description: {image_AI_description}. "
        f"This is an example of a good name: {name_example}. "
        f"This is an example of a good description: {description_example}. "
        f"Make the product name of maximum 4 words. "
        f"Make the description of maximum 80 words. "
        f"The examples provided are from different products. Just take this as an indication of the style, do not use it to infer composition, materials, properties or origin. "
        f"Do not add any element of information from the example that is not included in the initial description. Just improve the writing without compromising accuracy. "
        f"Use the AI generated image description to include some general information on the appearance of the product, colors and patterns but do not trust any reference to composition or materials. If AI generated image description is #N/A, ignore this information. "
        f"Please provide the output in json format in two variables: ai_name and ai_description"
        )
    client = OpenAI()
    response = client.chat.completions.create(
        # gpt-3.5 tried to replicate the example format even when information like composition or origin was not in the initial description, therefore creating hallucinations.
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"{context}"},
             {"role": "user", "content": f"{prompt}"}
            ]
        )
    json_data=response.choices[0].message.content
    # Parse the JSON data
    data = json.loads(json_data)
    # Retrieve values
    ai_name = data['ai_name']
    ai_description = data['ai_description']
    return ai_name,ai_description
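
Both prompts ask for a name of at most 4 words and a description of at most 80 words, but nothing in the notebook checks that the model complied. A minimal sketch of such a check follows; `within_limits` is a hypothetical helper, and counting words by splitting on whitespace is an assumption about what "word" means here.

```python
def within_limits(ai_name: str, ai_description: str,
                  max_name_words: int = 4,
                  max_description_words: int = 80) -> bool:
    """Return True when both texts respect the word limits stated in the prompt."""
    # str.split() with no argument splits on any run of whitespace
    name_ok = len(ai_name.split()) <= max_name_words
    description_ok = len(ai_description.split()) <= max_description_words
    return name_ok and description_ok
```

A result that fails the check could be logged for manual review or sent to the model again.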

## File management functions

### Delete the subfolders created with the AI images from the given Picks folders

In [None]:
def delete_target_subfolder_in_folders(folder_ids, target_folder_name):
    for folder_id in folder_ids:
        try:
            # Search for the target subfolder by name within the current folder
            query = f"'{folder_id}' in parents and name = '{target_folder_name}' and mimeType = 'application/vnd.google-apps.folder'"
            response = service.files().list(q=query,
                                            spaces='drive',
                                            fields='files(id, name)').execute()
            subfolders = response.get('files', [])

            # Proceed if the target subfolder is found
            for subfolder in subfolders:
                try:
                    # Delete the subfolder
                    service.files().delete(fileId=subfolder['id']).execute()
                    print(f"Deleted folder: {subfolder['name']} ({subfolder['id']})")
                except HttpError as error:
                    print(f'An error occurred while deleting the folder: {error}')
        except HttpError as error:
            print(f'An error occurred: {error}')
        except Exception as error:
            print(f'An unexpected error occurred: {error}')

### Get File Id

In [None]:
def get_file_id(service, file_name, folder_id):
    """
    Retrieves the ID of a Google Sheet with a specified name located within a specified folder on Google Drive.

    Parameters:
    - service: Authenticated Google Drive API service instance.
    - file_name: Name of the Google Sheet to find (the query only matches spreadsheets).
    - folder_id: ID of the folder where the file is located.

    Returns:
    - The ID of the file if found, otherwise None.
    """
    # Build the query to find the file by name within the specified folder
    query = f"'{folder_id}' in parents and name = '{file_name}' and mimeType = 'application/vnd.google-apps.spreadsheet' and trashed = false"

    try:
        # Execute the query
        response = service.files().list(q=query, spaces='drive', fields='files(id)').execute()
        files = response.get('files', [])

        # Check if any files were found
        if files:
            # Assuming there is only one file with this name in the folder, return the ID
            return files[0]['id']
        else:
            print('No files found.')
            return None
    except Exception as e:
        print(f'An error occurred: {e}')
        return None

### Store the content of a dataframe in a google sheet file

In [None]:
def df_to_googlesheet(service, dataframe, folder_id, dest_sheet):
    """
    Uploads a DataFrame to a Google Sheet within the specified folder.

    Args:
    - service: Google Drive service object.
    - dataframe: DataFrame containing the data to upload.
    - folder_id: ID of the parent folder where the sheet will be stored.
    - dest_sheet: Name of the destination Google Sheet.

    Returns:
    - True if successful, False otherwise.
    """
    try:
        # Create the Google Sheet
        body = {
            'name': dest_sheet,
            'mimeType': 'application/vnd.google-apps.spreadsheet',
            'parents': [folder_id]
        }
        response = service.files().create(body=body).execute()
        sheet_id = response['id']

        # Convert DataFrame to CSV string
        csv_data = dataframe.to_csv(index=False)

        # Upload CSV data to the Google Sheet
        media_body = MediaIoBaseUpload(io.BytesIO(csv_data.encode()), mimetype='text/csv', resumable=True)
        service.files().update(fileId=sheet_id, media_body=media_body).execute()

        print(f"Data uploaded successfully to '{dest_sheet}'")
        return True
    except Exception as e:
        print(f"An error occurred: {e}")
        return False

### Load the content of a google sheet in a dataframe

In [None]:
def sheet_to_dataframe(service, file_id):
    """
    Converts the first sheet of a Google Sheet identified by its file_id into a Pandas DataFrame.

    Parameters:
    - service: Authenticated Google Sheets API service instance.
    - file_id: ID of the Google Sheet to convert.

    Returns:
    - A Pandas DataFrame containing the data from the first sheet of the Google Sheet.
    """
    try:
        # Request to get the Google Sheet data
        sheet_metadata = service.spreadsheets().get(spreadsheetId=file_id).execute()
        first_sheet_title = sheet_metadata['sheets'][0]['properties']['title']

        # Get the data from the first sheet
        result = service.spreadsheets().values().get(spreadsheetId=file_id, range=first_sheet_title).execute()
        values = result.get('values', [])

        # Check if there is data
        if not values:
            print(f"No data found in the first sheet: {first_sheet_title}")
            return pd.DataFrame()  # Return an empty DataFrame if no data was found

        # Create a DataFrame from the data
        df = pd.DataFrame(values[1:], columns=values[0])
        return df

    except Exception as e:
        print(f"An error occurred: {e}")
        return pd.DataFrame()  # Return an empty DataFrame in case of error

# **Common Data and Execution**

## Allow asynchronous execution

In [None]:
# Apply the necessary changes to allow asyncio to run in notebook environments
nest_asyncio.apply()

## Data to use CLAID

In [None]:
# Define the URL for the API endpoint
url = "https://api.claid.ai/v1-beta1/image/edit"

# Your actual API key. Replace the placeholder with your own key and do not share a notebook that contains it
api_key = "YOUR_CLAID_API_KEY"

# Headers including the Authorization and Content-Type
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
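
As an alternative to writing the key in the notebook, it can be read from an environment variable. The sketch below is one possible approach, not the notebook's method: the helper names `read_api_key` and `build_headers` and the variable name `CLAID_API_KEY` are assumptions chosen for illustration.

```python
import os


def read_api_key(variable_name: str) -> str:
    """Read a key from the environment and fail loudly when it is missing."""
    value = os.environ.get(variable_name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {variable_name} is not set")
    return value


def build_headers(api_key: str) -> dict:
    """Same header layout as in the cell above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

For example, `headers = build_headers(read_api_key("CLAID_API_KEY"))` once the variable has been set for the session.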

## Data to use the OpenAI API

In [None]:
# Replace the placeholder with your own key and do not share a notebook that contains it
os.environ["OPENAI_API_KEY"]="YOUR_OPENAI_API_KEY"

## Name of the subfolder to store the created images

In [None]:
target_folder_name = "tmp AI API"


## Mount Google Shared Folder
### When mounting the drive, pop-up screens will appear.
1. Select the Google account with access to the shared folder.
2. Allow mounting the drive.
3. Allow access to the credentials.
4. Grant the services the required permissions.

In [None]:
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
# The Drive service built here is required by json_from_list_files_in_folder and get_file_id
auth.authenticate_user()
creds = GoogleCredentials.get_application_default()
service = build('drive', 'v3', credentials=creds)

In [None]:
# TODO: clean up and decide which authentication method to use
def authenticate_google_drive():
    # Get default credentials
    credentials = GoogleCredentials.get_application_default()

    # Specify the desired scope for Google Sheets API
    scope = 'https://www.googleapis.com/auth/spreadsheets'
    credentials = credentials.create_scoped([scope])

    # Build and return the service
    return build('drive', 'v3', credentials=credentials)

In [None]:
nano_folder_id="1QbipFkdcQ7OpXBfsSvSUATPo76vwDgG_"

# Sample execution in one folder

1. Select a subfolder, retrieve its id and store it in the picks_folder_id variable
- 1.1 To retrieve the id of a folder, open the folder and copy the url from your browser, e.g. https://drive.google.com/drive/u/0/folders/1PnBf9D4W3zNnZ08hFIlWmVTgcP1WL4G1
Then copy the last part: 1PnBf9D4W3zNnZ08hFIlWmVTgcP1WL4G1
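
The same step can be done in code. This is a minimal sketch, assuming the url has the /folders/ID shape shown above; `folder_id_from_url` is a hypothetical helper that is not part of the notebook.

```python
from urllib.parse import urlparse


def folder_id_from_url(folder_url: str) -> str:
    """Return the path segment that follows 'folders' in a Drive folder url."""
    # urlparse().path drops the query string, e.g. a trailing ?usp=sharing
    parts = [p for p in urlparse(folder_url).path.split("/") if p]
    if "folders" not in parts or parts.index("folders") + 1 >= len(parts):
        raise ValueError(f"No folder id found in: {folder_url}")
    return parts[parts.index("folders") + 1]
```

For the url above, the function returns the id used in the next cell.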

In [None]:
#Mika Basket/Product Images/Picks folder id
picks_folder_id="1PnBf9D4W3zNnZ08hFIlWmVTgcP1WL4G1"

2. An array with one request per file found in the folder will be created. The variable input contains the url from which Claid downloads the original image
- 2.1 The input images need to be under 10 MB in size and under 24 megapixels (MPS) in resolution
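
A quick way to check an image against these two limits before sending it is sketched below. The helper names are hypothetical, and reading 10 MB as 10,000,000 bytes is an assumption; the width and height would come from whatever tool is used to inspect the image.

```python
MAX_BYTES = 10 * 1000 * 1000   # assumption: 10 MB read as decimal megabytes
MAX_MEGAPIXELS = 24.0          # limit stated above


def megapixels(width: int, height: int) -> float:
    """Resolution in megapixels, i.e. width x height in millions of pixels."""
    return width * height / 1_000_000


def fits_limits(size_bytes: int, width: int, height: int) -> bool:
    """True when the image is under both the size and the resolution limit."""
    return size_bytes < MAX_BYTES and megapixels(width, height) < MAX_MEGAPIXELS
```

A 1200 x 1600 image has 1.92 megapixels, which is well under the resolution limit.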

In [None]:
json_array=[]
json_array=json_from_list_files_in_folder(service, picks_folder_id)

{'files': [{'id': '1LoKHx3bpPPd6M0nh2cjHL0uXKcSRtKIy', 'name': 'Copy of red and green basket-purple handle.jpeg'}, {'id': '13WG7_O_SlcJhIEhK79eEMoRnAviEmzUX', 'name': 'Copy of Red basket with plain handles.jpeg'}, {'id': '1vhwODuTgxyWG2sA_R1FevdHAE69YNVvD', 'name': 'manjano basket.jpeg'}]}
{'input': 'https://drive.google.com/uc?export=download&id=1LoKHx3bpPPd6M0nh2cjHL0uXKcSRtKIy', 'operations': {'restorations': {'upscale': 'smart_enhance'}, 'resizing': {'width': 1000, 'height': 1000, 'fit': 'canvas'}, 'background': {'remove': {'category': {'type': 'products', 'version': 'beta'}, 'clipping': True}, 'color': '#ffffff'}, 'padding': '10%'}, 'output': {'format': {'type': 'png'}}}
{'input': 'https://drive.google.com/uc?export=download&id=13WG7_O_SlcJhIEhK79eEMoRnAviEmzUX', 'operations': {'restorations': {'upscale': 'smart_enhance'}, 'resizing': {'width': 1000, 'height': 1000, 'fit': 'canvas'}, 'background': {'remove': {'category': {'type': 'products', 'version': 'beta'}, 'clipping': True}, 

3. The requests are sent to Claid using the asynchronous function process_json_array, which sends them one at a time.

In [None]:
# Use the current event loop to run your coroutine
loop = asyncio.get_event_loop()
results_array=loop.run_until_complete(process_json_array(json_array,url,headers))

Request successful, processing the result...
{'data': {'input': {'ext': 'jpg', 'mps': 1.92, 'mime': 'image/jpeg', 'format': 'JPEG', 'width': 1200, 'height': 1600}, 'output': {'ext': 'png', 'mps': 1.0, 'mime': 'image/png', 'format': 'PNG', 'width': 1000, 'height': 1000, 'tmp_url': 'https://dl.claid.ai/6c9aa9ad-d07c-4323-8edb-7cc999ef8c6b/uc.png', 'object_key': None, 'object_bucket': None, 'object_uri': None, 'claid_storage_uri': None}}}
Request successful, processing the result...
{'data': {'input': {'ext': 'jpg', 'mps': 1.92, 'mime': 'image/jpeg', 'format': 'JPEG', 'width': 1200, 'height': 1600}, 'output': {'ext': 'png', 'mps': 1.0, 'mime': 'image/png', 'format': 'PNG', 'width': 1000, 'height': 1000, 'tmp_url': 'https://dl.claid.ai/f8a4bfe0-c00e-450d-b16e-e36088c63ab2/uc.png', 'object_key': None, 'object_bucket': None, 'object_uri': None, 'claid_storage_uri': None}}}
Request successful, processing the result...
{'data': {'input': {'ext': 'jpg', 'mps': 1.92, 'mime': 'image/jpeg', 'forma

4. Review the output of the previous command and open each tmp_url to check the resulting image with background removal, padding and crop.

In [None]:
# transform the flat list [name, url, name, url, ...] into a list of (name, url) tuples
id_url_pairs = [(results_array[i], results_array[i + 1]) for i in range(0, len(results_array), 2)]

In [None]:
id_url_pairs

[('1LoKHx3bpPPd6M0nh2cjHL0uXKcSRtKIy_#ffffff',
  'https://dl.claid.ai/6c9aa9ad-d07c-4323-8edb-7cc999ef8c6b/uc.png'),
 ('13WG7_O_SlcJhIEhK79eEMoRnAviEmzUX_#ffffff',
  'https://dl.claid.ai/f8a4bfe0-c00e-450d-b16e-e36088c63ab2/uc.png'),
 ('1vhwODuTgxyWG2sA_R1FevdHAE69YNVvD_#ffffff',
  'https://dl.claid.ai/a0f5d5da-d73e-4483-9403-c2f56e128785/uc.png')]
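
The flat list alternates names and urls, so the pairing can also be written with slices and zip. The sketch below is an equivalent alternative with made-up sample values; `pair_up` is a hypothetical helper, and the dictionary is only a convenience for looking up a url by name.

```python
def pair_up(flat_results):
    """Turn [name, url, name, url, ...] into [(name, url), ...]."""
    if len(flat_results) % 2 != 0:
        raise ValueError("Expected an even number of items (name, url, name, url, ...)")
    # Even positions hold the names, odd positions hold the urls
    return list(zip(flat_results[0::2], flat_results[1::2]))


# Made-up sample values, for illustration only
sample = ["idA_#ffffff", "https://example.invalid/a.png",
          "idB_#ffffff", "https://example.invalid/b.png"]
pairs = pair_up(sample)
url_by_name = dict(pairs)  # lookup table: name -> url
```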

5. The following command will store the images that are now in the CLAID storage in the folder defined in target_folder_name

In [None]:
target_folder_name

'tmp AI API'

In [None]:
# Ensure the event loop is running and call process_downloads
loop.run_until_complete(process_downloads(id_url_pairs, picks_folder_id, target_folder_name))

url: https://dl.claid.ai/6c9aa9ad-d07c-4323-8edb-7cc999ef8c6b/uc.png  file_id: 1LoKHx3bpPPd6M0nh2cjHL0uXKcSRtKIy_#ffffff
Folder 'tmp AI API' created successfully.
https://dl.claid.ai/6c9aa9ad-d07c-4323-8edb-7cc999ef8c6b/uc.png
File ID: 1zwIEdeVuOcd2TRI8CRjrJQSZaCBma-hz uploaded successfully to folder ID: 1Pfogoo0NwIZwxVFfswgAa_OXxk5-q4Cg
url: https://dl.claid.ai/f8a4bfe0-c00e-450d-b16e-e36088c63ab2/uc.png  file_id: 13WG7_O_SlcJhIEhK79eEMoRnAviEmzUX_#ffffff
https://dl.claid.ai/f8a4bfe0-c00e-450d-b16e-e36088c63ab2/uc.png
File ID: 1aWSHGW1VJieZPCh2RyDLVy7WA3-Op5O0 uploaded successfully to folder ID: 1Pfogoo0NwIZwxVFfswgAa_OXxk5-q4Cg
url: https://dl.claid.ai/a0f5d5da-d73e-4483-9403-c2f56e128785/uc.png  file_id: 1vhwODuTgxyWG2sA_R1FevdHAE69YNVvD_#ffffff
https://dl.claid.ai/a0f5d5da-d73e-4483-9403-c2f56e128785/uc.png
File ID: 1Xn3CT9uHbWlkdkBPxwLWBGz-TzP994XD uploaded successfully to folder ID: 1Pfogoo0NwIZwxVFfswgAa_OXxk5-q4Cg


6. Check that the folder was created and that it contains the files. It might take a few seconds to update in your browser.
- 6.1 You can manually delete this folder if you do not need it. Deleting this folder will not affect point 7

7. The following lines show how to generate a description of the image using OpenAI. To do this we rely on the temporary storage provided by CLAID, as OpenAI is not able to access Drive files.

In [None]:
# get the file id of the original image and the url of the image generated by CLAID from results_array. We remove the last 8 characters (the "_#ffffff" suffix) as they are not part of the file_id
file1=results_array[0][:-8]
url1=results_array[1]
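
The fixed slice `[:-8]` works because every key ends in the 8-character suffix `_#ffffff`. A minimal sketch of a suffix-aware alternative, assuming keys always have the form `<file_id>_#<color>`; the helper name `split_result_key` is hypothetical and not part of the notebook:

```python
from typing import Optional, Tuple


def split_result_key(result_key: str) -> Tuple[str, Optional[str]]:
    """Split '<file_id>_#<color>' into (file_id, '#<color>').

    Returns (result_key, None) when no '_#' separator is present.
    """
    # rpartition splits on the LAST '_#', so underscores inside the ID are kept
    file_id, separator, color = result_key.rpartition("_#")
    if not separator:
        return result_key, None
    return file_id, "#" + color


file_id, color = split_result_key("1vhwODuTgxyWG2sA_R1FevdHAE69YNVvD_#ffffff")
print(file_id, color)
```

Unlike a fixed slice, this keeps working if the color suffix ever changes length.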

In [None]:
# creates a name and description from the content of the image using openAI
client = OpenAI()
name_description_from_image(client, url1, file1)

('14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE',
 'Strap Tote',
 'A cylindrical tote bag featuring horizontal stripes in shades of orange, beige, and olive green. It has an attached brown leather button closure and matching leather straps. The tote is made from a woven fabric, giving it a textured appearance.')
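
The call above needs a publicly reachable image URL, which is why the CLAID temporary URL is used. A minimal sketch of the message shape involved, assuming a vision-capable chat model that accepts an image by URL. The helper name `build_image_messages` and the instruction text are hypothetical; the notebook's own `name_description_from_image` (defined earlier) may build its prompt differently:

```python
from typing import Any, Dict, List


def build_image_messages(image_url: str, instruction: str) -> List[Dict[str, Any]]:
    """Build a chat 'messages' list with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                # The image is passed by URL, so it must be publicly reachable
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = build_image_messages(
    "https://dl.claid.ai/6c9aa9ad-d07c-4323-8edb-7cc999ef8c6b/uc.png",
    "Suggest a short product name and a description for this image.",  # hypothetical prompt
)
print(messages[0]["content"][1]["image_url"]["url"])
```

A list of this shape is what the OpenAI chat completions endpoint accepts as its `messages` argument; the model choice is deliberately left out here.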

# Batch Execution
1. This commented-out list contains the IDs of all *Picks* subfolders. Copy the ones you want to work on into the variable below.

In [None]:
'''picks_subfolder_id_list=[
  "1t1uX2r5KWghFRMY4-99H4GupNCjubaZu", # Nyota Basquets/Product Images/Picks
  "1ePpfWVo1G___nFELQuZy10CVrayh-Rw2", # Lucia's Beading Group/Product Images/Picks
  "1PnBf9D4W3zNnZ08hFIlWmVTgcP1WL4G1", # Mika Basket/Product Images/Picks
  "1vX-NMOVJlnP4wReTShet3ESaRnzMyfyQ", # Shop Vata/Product Images/Picks
  "1E8J-uebYnINDHc-6yRqbE0fxTSiLJaZ4", # Kiteghe Weavers/Product Images/Picks
  "16zBkND1BbxO9U9fmxOwKT2me3jNtoXUD", # Kibera Collection/Product Images/Picks
  "1x2pgCFqbkLbB2t1NWPRYRKSZnMcEnyiC", # Kadi Kraft/Product Images/Picks
  "1i0beRzefjpJ26JfbSAU4pAsLJbs5NBrf", # Jireh Handwoven Craft/Product Images/Picks
  "1vowHVGPBY11rEZgmke2qC4d9tzx9n_Df", # Goodies African Interiors/Product Images/Picks
  "163CsREOjmg0cXsd91-eiB5EC30KAvNgp", # Gloria Woodworks/Product Images/Picks
  "17OfqJmAes5jVnYpFpl_GdOeJh8AeEdL_", # Ceramiqa Pottery Studio/Product Images/Picks
  "1tAkHp_DKdiCH0MIS0CGAWNVSOOfCeg11", # Additional Images/Tosheka Textiles/Product Images/Picks
  "1IV6HSAKuBdpse2hMobYu3EzSf4tsm1M8", # Additional Images/Beadworks/Product Images/Picks
  "106uB-hi47ftDfqPM3aUogStQZojOatkl", # Ceekay Kiondo Crafts/Product Images/Picks
  "1AC2fiFkDUf5CVW8yfC2krk05F7qMUNtH", # Additional Images/Soapstone interiors/Product Images/Picks
  "1xt2TdsIZkrmKi6GtZGoR4DoT_rDp8ejH", # Additional Images/Anchor leaders/Product Images/Picks
  "163CsREOjmg0cXsd91-eiB5EC30KAvNgp", # Gloria Woodworks/Product Images/Picks
  "1h9-nOdQ_daYipMTvIsQjr74_TtbiXltP",  # Additional Images/Simaloi Crafts/Product Images/Picks
  "1YxbfsooChxkkXPfmd1Olfx1LrCwD-wfu" # Shatered Glass/product Images/Picks
]'''

In [None]:
picks_subfolder_id_list=[
#  "1tAkHp_DKdiCH0MIS0CGAWNVSOOfCeg11", #Additional Images/Tosheka Textiles/Product Images/Picks
  "106uB-hi47ftDfqPM3aUogStQZojOatkl" # Ceekay Kiondo Crafts/Product Images/Picks
]
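
The reference list above contains the same folder ID twice (Gloria Woodworks), and a repeated ID would be processed and billed twice in the batch loop. A minimal sketch for removing repeats while keeping the original order; `unique_ids` is a hypothetical helper:

```python
from typing import Iterable, List


def unique_ids(folder_ids: Iterable[str]) -> List[str]:
    """Drop repeated IDs, keeping the first occurrence of each."""
    # dict keys preserve insertion order, so the list order is unchanged
    return list(dict.fromkeys(folder_ids))


example = [
    "163CsREOjmg0cXsd91-eiB5EC30KAvNgp",  # Gloria Woodworks
    "106uB-hi47ftDfqPM3aUogStQZojOatkl",  # Ceekay Kiondo Crafts
    "163CsREOjmg0cXsd91-eiB5EC30KAvNgp",  # Gloria Woodworks (repeated)
]
print(unique_ids(example))
```

In the notebook this could be applied as `picks_subfolder_id_list = unique_ids(picks_subfolder_id_list)` before the batch run.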


# <font color="red">DANGER ZONE:</font>
- The following command deletes the subfolder named by target_folder_name from every folder whose ID is listed in picks_subfolder_id_list. Run it only if needed


In [None]:
target_folder_name="AI tmp"
# Before re-running the notebook, delete the subfolders previously created with the AI images
delete_target_subfolder_in_folders(picks_subfolder_id_list, target_folder_name)

Deleted folder: AI tmp (17lYxxcTQhdt8mVs9CJW-_T3XN4WCq3CS)
Deleted folder: AI tmp (1Hs53dh6bkHwvvxgxz7Ve8b9dZsEQuT4_)


# Background removal
- Runs the code on all the folders listed in picks_subfolder_id_list
- Might take a long time, depending on the number of subfolders and the number of files in each.
- Make sure images are under 10 MB and 24 megapixels, and that you have credit in Claid.ai
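
A minimal sketch of a pre-flight check, taking the limits above as 10 MB and 24 megapixels. The exact byte and pixel thresholds are assumptions, and `within_claid_limits` is a hypothetical helper. Width and height are passed in directly; in practice they would come from an image library:

```python
# Assumed thresholds: 10 MB read as 10 * 1024 * 1024 bytes, 24 MP as 24,000,000 pixels
MAX_BYTES = 10 * 1024 * 1024
MAX_PIXELS = 24_000_000


def within_claid_limits(size_bytes: int, width: int, height: int) -> bool:
    """Return True when both the file size and the pixel count are under the limits."""
    return size_bytes < MAX_BYTES and width * height < MAX_PIXELS


# A 2 MB, 4000 x 3000 image (12 megapixels) passes both checks
print(within_claid_limits(2_000_000, 4000, 3000))
```

Filtering files this way before the batch run avoids spending time on requests that are likely to be rejected.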

In [None]:
final_array=[]
for picks_id in picks_subfolder_id_list[:]:
  json_array=json_from_list_files_in_folder(service, picks_id)
  loop = asyncio.get_event_loop()
  results_array=loop.run_until_complete(process_json_array(json_array,url,headers))
  id_url_pairs = [(results_array[i], results_array[i + 1]) for i in range(0, len(results_array), 2)]
  print(f"id_url_pairs: {id_url_pairs}")
  loop.run_until_complete(process_downloads(id_url_pairs, picks_id, target_folder_name))
  final_array.extend(results_array)  # Append results_array at the end of final_array

{'files': [{'id': '14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE', 'name': 'Copy of Multi- stripped coloured kiondo bag with black lining and over the shoulder double straps.jpeg'}, {'id': '1_x-mvRcjN55vieqGIjhsF_Pb2p6_35zt', 'name': 'Copy of WhatsApp Image 2024-03-27 at 1.28.53 PM.jpeg'}, {'id': '1ylthSHgWDq1tZ-zyH4Avh1zBYAhoCqUT', 'name': 'Copy of White brown kiondo with Ankara zipped lining and over the shoulder double straps.jpeg'}]}
{'input': 'https://drive.google.com/uc?export=download&id=14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE', 'operations': {'restorations': {'upscale': 'smart_enhance'}, 'resizing': {'width': 1000, 'height': 1000, 'fit': 'canvas'}, 'background': {'remove': {'category': {'type': 'products', 'version': 'beta'}, 'clipping': True}, 'color': '#ffffff'}, 'padding': '10%'}, 'output': {'format': {'type': 'png'}}}
{'input': 'https://drive.google.com/uc?export=download&id=1_x-mvRcjN55vieqGIjhsF_Pb2p6_35zt', 'operations': {'restorations': {'upscale': 'smart_enhance'}, 'resizing': {'width':
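
The printed payloads above all share one shape and differ only in the Drive file ID. A minimal sketch that rebuilds that dictionary, assuming the settings shown in the output; the helper name `build_claid_payload` is hypothetical and the notebook's own request function (defined earlier) may differ:

```python
from typing import Any, Dict


def build_claid_payload(file_id: str, background_color: str = "#ffffff") -> Dict[str, Any]:
    """Rebuild the request payload printed above for a given Drive file ID."""
    return {
        # Direct-download link for the Drive file
        "input": f"https://drive.google.com/uc?export=download&id={file_id}",
        "operations": {
            "restorations": {"upscale": "smart_enhance"},
            "resizing": {"width": 1000, "height": 1000, "fit": "canvas"},
            "background": {
                "remove": {
                    "category": {"type": "products", "version": "beta"},
                    "clipping": True,
                },
                "color": background_color,
            },
            "padding": "10%",
        },
        "output": {"format": {"type": "png"}},
    }


payload = build_claid_payload("14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE")
print(payload["input"])
```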

# Creates ai_names and ai_descriptions based on the content of the images
## Uses the temporary storage of CLAID to generate the ai_name and ai_description
1. The URLs of these temporary files were stored in final_array during the batch background removal
2. Each image is used to generate a new name and description. The name is not used further in the process

In [None]:
client = OpenAI()

In [None]:
# Initialize a set to track processed files
processed_files = set()
data = []
# Process the array in (file ID, URL) pairs and build a list of result rows
for i in range(0, len(final_array), 2):
    if i + 1 >= len(final_array):
        break  # Ensure there's a pair available
    file1 = final_array[i][:-8]  # File ID at the even index, without the 8-character color suffix
    url1 = final_array[i + 1]  # URL at the following odd index

    # Skip files that have already been processed
    if file1 in processed_files:
        continue

    # name_description_from_image is defined earlier in this notebook;
    # client is the OpenAI client created in the previous cell
    file_id, name, description = name_description_from_image(client, url1, file1)

    # Add file1 to the set of processed files
    processed_files.add(file1)

    # Append the result to the data list as a dict (one row of the dataframe)
    data.append({'file_id': file_id, 'ai name from img': name, 'ai desc from img': description})

# converts the array into a dataframe
df = pd.DataFrame(data)
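
The pairing and de-duplication in the loop above can be isolated into a small function that is easy to check before any OpenAI call is made. A minimal sketch, assuming the flat `[key, url, key, url, ...]` layout of `final_array`; `unique_id_url_pairs` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def unique_id_url_pairs(flat_results: Sequence[str], suffix_len: int = 8) -> List[Tuple[str, str]]:
    """Turn [key, url, key, url, ...] into unique (file_id, url) pairs."""
    seen = set()
    pairs = []
    # zip stops at the shorter slice, so a trailing key without a URL is ignored
    for key, url in zip(flat_results[0::2], flat_results[1::2]):
        file_id = key[:-suffix_len]  # drop the color suffix, e.g. "_#ffffff"
        if file_id in seen:
            continue  # keep only the first URL seen for each file
        seen.add(file_id)
        pairs.append((file_id, url))
    return pairs


flat = ["A_#ffffff", "url-1", "B_#ffffff", "url-2", "A_#ffffff", "url-3"]
print(unique_id_url_pairs(flat))
```

Printing the pairs first gives a quick count of how many images would be sent for description.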

## Persist the results in a Google Sheet "AI name and description from Image" in the Nano Projects Participants folder
- file_id: the ID of the original file
- ai name from img: the name generated by AI from the image
- ai desc from img: the description generated by AI from the image

In [None]:
df.head(8)

Unnamed: 0,file_id,ai name from img,ai desc from img
0,14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE,StripeTote,The bag features a cylindrical design with hor...
1,1_x-mvRcjN55vieqGIjhsF_Pb2p6_35zt,Straw Satchel,A handcrafted shoulder bag featuring a woven s...
2,1ylthSHgWDq1tZ-zyH4Avh1zBYAhoCqUT,Tribal Tote,The product is a woven tote bag featuring a mi...


In [None]:
creds=[]
service=[]
service = authenticate_google_drive()

1. Define the name of the google sheet you want to create under the nano_folder_id.
2. Make sure you do not overwrite valuable information.

In [None]:
file_AI_fromimage="AI name and description from image_v2"
df_to_googlesheet(service, df, nano_folder_id, file_AI_fromimage)

Data uploaded successfully to 'AI name and description from image_v2'


True

3. Remember to persist this file; otherwise you would need to re-run the background removal to create the image descriptions.
4. Alternatively, you can move the images to a server that OpenAI can access (as of now it cannot access Google Drive).

# Improve name and description using only provided names and descriptions
- This code expects to receive the following information in a Google Sheet. The column names must match exactly (case-sensitive):
    * Link -> A link (URL) to the file. It will be used to retrieve the ID of the file. If the format is not correct, the file_id will not be added.
    * Product name -> The original product name
    * Product description -> The original product description
- Make sure the file does not include merged cells
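
Because the column names are case-sensitive, a quick check before calling OpenAI can save a failed run. A minimal sketch that works on any list of header names (for example `list(df_human.columns)`); `missing_columns` is a hypothetical helper:

```python
from typing import Iterable, List, Sequence

REQUIRED_COLUMNS = ("Link", "Product name", "Product description")


def missing_columns(columns: Iterable[str], required: Sequence[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names that are absent (exact, case-sensitive match)."""
    present = set(columns)
    return [name for name in required if name not in present]


headers = ["Maker", "Link", "product name", "Product description"]
print(missing_columns(headers))  # "product name" does not match "Product name"
```

An empty list means the sheet has every column the loop reads.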


In [None]:
# Check whether the Google Sheet named sheet_name can be found in the folder nano_folder_id
sheet_name = "Input 2"
file_id = get_file_id(service, sheet_name, nano_folder_id)

if file_id:
    print(f"File ID: {file_id}")
else:
    print("File not found.")

File ID: 1xeXTn7Rr_uz19ni1GWxQoWrTY4R02fn4t-fet2fw3n8


## Load the content of sheet_name id in a dataframe df_human

In [None]:
df_human=[]

In [None]:
#delete?
auth.authenticate_user()
creds = GoogleCredentials.get_application_default()
service_sheets = build('sheets', 'v4', credentials=creds)

In [None]:
df_human = sheet_to_dataframe(service_sheets, file_id)

In [None]:
df_human.head()

Unnamed: 0,Maker,Link,Product name,Product description,Dimensions in inches,Materials,Next steps
0,Shattered Glass,https://drive.google.com/file/d/10tBeG2jDeMJin...,Glass Sets,"These come as a set of 1 Wine/Water carafe, 4 ...",,glass,
1,Anchor Leather & Craft,https://drive.google.com/file/d/1jwUb4oQL96WEW...,Placemats,Banana fibre placemat woven \nby women from Up...,"W -13.8""",Banana Fibre,
2,Anchor Leather & Craft,https://drive.google.com/file/d/1ZFA6hnf-ULvF7...,Mchele Bowl - Speckled Green,Hand made from Kenyan\nclay. This medium speck...,"L - 5.5""\nH -3""\nW -4""",Clay,
3,Anchor Leather & Craft,https://drive.google.com/file/d/1bu9ds2b0CPpK5...,Mchele Bowl - Speckled White,Hand made from Kenyan clay.\nThis medium speck...,"L - 5.5""\nH -3""\nW -4""",Clay,
4,Anchor Leather & Craft,https://drive.google.com/file/d/1y-lWDLbkOD1e8...,Clear Pitcher,Recycled blown glass jug. \nHand blown thrown ...,"L - 4""\nH - 8""\nW - 4.5""",Recycled Glass,


## Add columns ai_name, ai_description and file_id to the dataframe

In [None]:
if 'ai_name' not in df_human.columns:
    df_human['ai_name'] = None  # or use '' for an empty string if that's more appropriate
if 'ai_description' not in df_human.columns:
    df_human['ai_description'] = None  # or use ''
if 'file_id' not in df_human.columns:
    df_human['file_id'] = None  # or use '' for an empty string if that's more appropriate

## Uses OpenAI to generate a new product name and product description
- For each row in the sheet, adds file_id, ai_name and ai_description to the dataframe
- This is a slow process that uses OpenAI resources. Use a file with a limited number of rows, or change the loop conditions, until you feel confident with the results.
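
The loop in the next cell takes the file ID from the fourth path segment of the link, which matches links of the form `https://drive.google.com/file/d/<id>/...`. A hedged sketch of a slightly more tolerant extractor, which also accepts `?id=<id>` links and empty cells; `drive_file_id` is a hypothetical helper, not the notebook's method:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def drive_file_id(link: object) -> Optional[str]:
    """Extract a Drive file ID from a '/file/d/<id>/...' or '?id=<id>' link."""
    if not isinstance(link, str) or not link.strip():
        return None  # empty or non-text cell
    parsed = urlparse(link.strip())
    parts = [part for part in parsed.path.split("/") if part]
    # Path form: /file/d/<id>/view
    if "d" in parts:
        position = parts.index("d")
        if position + 1 < len(parts):
            return parts[position + 1]
    # Query form: /uc?export=download&id=<id>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None


print(drive_file_id("https://drive.google.com/file/d/14w-9aiMlrrMlrCjqwTzYp-ZoAIBk6tOE/view"))
```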

In [None]:
# Iterate over each row by index
for index, row in df_human.head().iterrows():
    url = row['Link']
    parsed_url = urlparse(url)
    path_parts = parsed_url.path.split('/')
    # Extract the file ID from the path
    file_id = path_parts[3] if len(path_parts) > 3 else None
    initial_name = row['Product name']
    initial_description = row['Product description']

    # Call the function to improve name and description
    file_id, ai_name, ai_description = improve_name_description(file_id, initial_name, initial_description)
    print(f"> initial name: {initial_name}")
    print(f"> ai name: {ai_name}")
    print(f"> initial description: {initial_description}")
    print(f"> ai description: {ai_description}")
    # Update the DataFrame with the new values
    df_human.at[index, 'file_id'] = file_id
    df_human.at[index, 'ai_name'] = ai_name
    df_human.at[index, 'ai_description'] = ai_description

> initial name: Glass Sets
> ai name: Sophisticated Beverage Ensemble
> initial description: These come as a set of 1 Wine/Water carafe, 4 (Whisky/water/beer) glasses, packed in a gift box.
> ai description: This all-inclusive collection features 1 versatile Wine/Water Carafe and 4 multifunctional glasses suitable for whisky, water or beer. Neatly packed in an elegant gift box, it complements any table setting effortlessly. Embrace the simplicity of design that does not overshadow the contents, elevating your drinking experience.
> initial name: Placemats
> ai name: Kenyan Empowerment Placemats
> initial description: Banana fibre placemat woven 
by women from Upper Eastern 
Kenya, using sustainable 
fibre to create value and 
empower these women
> ai description: Sustainably crafted from banana fibre, these unique placemats are handwoven by empowered women of Upper Eastern Kenya. They add an authentic, eco-friendly touch to your table setting. Their rustic simplicity allows your dishes

- Stores the content of the dataframe in an output file in the same folder (a different folder can be defined using its id)

In [None]:
# define the name of the output file
output_file="TMP AI Name Description_v2"
df_to_googlesheet(service, df_human, nano_folder_id, output_file)

Data uploaded successfully to 'TMP AI Name Description_v2'


True

# Improve name and description using AI generated image description
- Also uses the provided product name and product description
- This code relies on an image description previously generated by AI (see above)
- This code expects to receive the following information in a Google Sheet. The column names must match exactly (case-sensitive):
    * Product name -> The original product name
    * Product description -> The original product description
    * ai desc from img -> The description of the image, generated immediately after processing the images (see above)
- Make sure the file does not include merged cells


In [None]:
# define the name of the google sheet document
sheet_name = "[COPY] AI Name Description_v2_ToProcess"
file_id = get_file_id(service, sheet_name, nano_folder_id)

if file_id:
    print(f"File ID: {file_id}")
else:
    print("File not found.")

File ID: 1n8IMS8oew2djJGQKoXxsGdQyWvoLkEZ16WyiO9qlINs


## Load the content of sheet_name id in a dataframe df_AI

In [None]:
df_AI=[]
auth.authenticate_user()
creds = GoogleCredentials.get_application_default()
service_sheets = build('sheets', 'v4', credentials=creds)
df_AI = sheet_to_dataframe(service_sheets, file_id)

## Add columns ai_name_v2, ai_description_v2 to the dataframe

In [None]:
if 'ai_name_v2' not in df_AI.columns:
    df_AI['ai_name_v2'] = None  # or use ''
if 'ai_description_v2' not in df_AI.columns:
    df_AI['ai_description_v2'] = None  # or use ''

## Uses OpenAI to generate a new product name and product description
- For each row in the sheet, adds ai_name_v2 and ai_description_v2 to the dataframe
- This is a slow process that uses OpenAI resources. Use a file with a limited number of rows, or change the loop conditions, until you feel confident with the results.
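
One way to keep a trial run small is to process only rows that have no result yet, capped at a fixed number. A minimal sketch over a list of row dictionaries (for a dataframe, `df_AI.to_dict("records")` gives this shape); `rows_to_process` is a hypothetical helper:

```python
from typing import Any, Dict, List, Optional, Sequence


def rows_to_process(
    records: Sequence[Dict[str, Any]],
    target_column: str = "ai_name_v2",
    limit: Optional[int] = None,
) -> List[int]:
    """Return positions of rows whose target column is missing, None or ''."""
    pending = [
        position
        for position, record in enumerate(records)
        if not record.get(target_column)
    ]
    return pending if limit is None else pending[:limit]


records = [
    {"Product name": "Placemats", "ai_name_v2": "Kenyan Artisan Circle Placemat"},
    {"Product name": "Clear Pitcher", "ai_name_v2": None},
    {"Product name": "Glass Sets", "ai_name_v2": ""},
]
print(rows_to_process(records, limit=1))
```

The returned positions can be used with `df_AI.iloc[position]`, so an interrupted run can resume without paying again for rows that are already filled.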

In [None]:
for index, row in df_AI.iterrows():
    initial_name = row['Product name']
    initial_description = row['Product description']
    image_AI_description = row['ai desc from img']
    print(image_AI_description)

    # Call the function to improve name and description with the AI description of the image
    ai_name, ai_description = improve_name_description_image(initial_name, initial_description, image_AI_description)
    print(f"> initial name: {initial_name}")
    print(f"> ai name: {ai_name}")
    print(f"> initial description: {initial_description}")
    print(f"> ai image description: {image_AI_description}")
    print(f"> final ai description: {ai_description}")

    # Update the DataFrame with the new values
    df_AI.at[index, 'ai_name_v2'] = ai_name
    df_AI.at[index, 'ai_description_v2'] = ai_description

A circular woven mat with a concentric circle pattern. The design features natural tones, ranging from light tan to deep brown, with variations of shading and texture throughout the spiral weave. The rustic aesthetic suggests a durable, natural fiber construction, typically seen in handcrafted items.
> initial name: Placemats
> ai name: Kenyan Artisan Circle Placemat
> initial description: Banana fibre placemat woven 
by women from Upper Eastern 
Kenya, using sustainable 
fibre to create value and 
empower these women
> ai image description: A circular woven mat with a concentric circle pattern. The design features natural tones, ranging from light tan to deep brown, with variations of shading and texture throughout the spiral weave. The rustic aesthetic suggests a durable, natural fiber construction, typically seen in handcrafted items.
> final ai description: Hand-woven by empowered women of Upper Eastern Kenya, our Kenyan Artisan Circle Placemat makes a rustic yet elegant addition t

- Stores the content of the dataframe in an output file in the same folder (a different folder can be defined using its id)

In [None]:
# define the name of the output file
output_file="TMP AI Name Description_FromImage_v2"
df_to_googlesheet(service, df_AI, nano_folder_id, output_file)

Data uploaded successfully to 'TMP AI Name Description_FromImage_v1'


True

## AUX CODE
### Sample code to generate an improved product name and product description using human product name, human product description and genAI image description

In [None]:
initial_name="Emerald Splatter Kenyan Bowl"
initial_description="Handcrafted from authentic Kenyan clay, this stunning Emerald Splatter Bowl exudes a unique charm. Its speckled green pattern draws the eye, making it a standout piece in any setting. Perfect to add a pop of color and character to your table, it brings nature's hue home with a rustic finish outside and a clear, glazed inside."
image_AI_description="This is a circular ceramic bowl with a smooth, slightly curved exterior. The outer surface exhibits a gradient from a speckled teal at the base, transitioning to a sandy beige towards the rim. The inside features a creamy, glossy finish, giving the bowl a serene, understated elegance suitable for various decors."
improve_name_description_image(initial_name, initial_description, image_AI_description)

entering the function


('Emerald Speckle Artisan Bowl',
 "Exclusively handcrafted Kenyan clay bowl with a rustic exterior, boasting a beautiful gradient of speckled teal and sandy beige hues. The glossy inner surface is creamy, providing gentle elegance. Striking and unique, it seamlessly blends nature's palette into your setting, becoming the focal point on any table.")

### Snippet to download a file


In [None]:
# Make sure the file exists at the URL
url = 'https://dl.claid.ai/fc40412f-6936-43f6-95f5-c7c6642d4a6e/uc.png'
filename = "zebra bracelet.png"
# Download the file from `url` and save it locally under `filename`
urllib.request.urlretrieve(url, filename)

('zebra bracelet.png', <http.client.HTTPMessage at 0x7aaeb48c0b80>)
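
The CLAID URLs are temporary, so a download can fail once the file has expired. A minimal sketch that wraps the same `urllib.request.urlretrieve` call and reports failure instead of raising; `download_file` is a hypothetical helper:

```python
import urllib.error
import urllib.request


def download_file(url: str, filename: str) -> bool:
    """Download url to filename; return False instead of raising on failure."""
    try:
        urllib.request.urlretrieve(url, filename)
    except (urllib.error.URLError, OSError) as exc:
        # URLError covers HTTP errors (such as 404) and unreachable hosts;
        # OSError covers problems writing the local file
        print(f"Download failed for {url}: {exc}")
        return False
    return True


# Example call (requires network access and a URL that still exists):
# download_file("https://dl.claid.ai/fc40412f-6936-43f6-95f5-c7c6642d4a6e/uc.png", "zebra bracelet.png")
```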