# EyeLevel - A Comprehensive Guide
*Updated 2024-12-11*

This notebook is designed to help you use and fully understand EyeLevel's tools. For general technical support, feel free to contact support@eyelevel.ai.

---

## What is EyeLevel?

EyeLevel is a set of unified technologies which are designed to allow you to parse and search through documents. This can be used in many applications, including:
- Finding references within a large document set
- Performing retrieval augmented generation
- Reformatting visually dense documents into a useful textual representation

EyeLevel has two core technologies: GroundX and X-Ray.

X-Ray is a modern take on document parsing: it uses a variety of computer vision and natural language processing techniques to turn documents (even those with complex formatting) into a textual representation we call a "semantic object". Semantic objects contain key information about the document, sections of the document, and elements within the document in order to provide highly contextualized and useful representations of the source material.

GroundX uses the semantic objects created by X-Ray to perform search. You can put a natural language question into GroundX and you'll get back a list of semantic objects, generated with X-Ray based on your documents, which are relevant to your question.

## How does X-Ray work?

We created a fine-tuned vision model which is specifically designed to identify key elements within documents. We observe a variety of element types, but predominantly concern ourselves with text, tables, and graphical figures. Once these elements have been identified they are extracted from the document and sent to a pipeline depending on the type of element. Text is simply extracted, while tables and images are grounded within a textual representation via fine-tuned multimodal LLMs.

Once all elements within a document are identified, extracted, and grounded textually, X-Ray constructs summaries at the document and section level based on the extracted textual representations. This allows X-Ray to create a representation of the greater document context, which is used to build semantic objects.

We use extracted text from the elements, as well as summary level information, to identify key ideas within the document. These ideas might encompass one or many extracted elements. We use the extracted data to construct a template of key information which needs to be filled in order to fully describe the identified ideas within the document.

Once the document has been divided into ideas, and templates describing what is needed to capture those ideas have been generated, the templates are filled out via yet another fine-tuned LLM. This, ultimately, is what becomes a semantic object. All models used in this process exist within EyeLevel's cloud, and can be deployed to a VPC as an atomic unit.

## How does GroundX work?

Because X-Ray creates highly queryable semantic objects, GroundX search does not use the traditional cosine-similarity flavor of vector search common in many similar retrieval systems. Rather, GroundX employs a customized textual search strategy built on top of Apache Lucene, a library designed for indexing and searching textual data. Our Lucene configuration is specifically designed to be maximally compatible with the semantic objects output by X-Ray.

The validity of this approach is supported [in the literature](https://arxiv.org/pdf/2308.14963) and also has a variety of practical benefits which allow for the optimization of GroundX on a case-by-case basis with minimal overhead.

There's a lot of nitty gritty engineering that goes into this, which reflects the cumulative experience of EyeLevel working with numerous companies across a diverse spread of documents. What we settled on is a complex multi-field filter and search which prioritizes certain elements in the semantic object, and certain tokens within those elements.
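
As a rough picture of what "multi-field" prioritization means, the toy sketch below scores a semantic object by counting query-term matches per field and weighting each field differently. The weights, the `score_object` helper, and the sample objects are all hypothetical; this is not GroundX's Lucene configuration, only a minimal illustration of the idea.

```python
# Toy illustration only: hypothetical field weights, not GroundX's real configuration.
FIELD_WEIGHTS = {"suggestedText": 3.0, "sectionSummary": 1.5, "text": 1.0}

def score_object(query, semantic_object, weights=FIELD_WEIGHTS):
    """Sum weighted counts of query terms found in each field."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in weights.items():
        words = str(semantic_object.get(field, "")).lower().split()
        score += weight * sum(words.count(term) for term in terms)
    return score

# Made-up objects that only borrow field names from X-Ray's semantic objects.
objects = [
    {"text": "energy use of training", "suggestedText": "AI lifecycle diagram"},
    {"text": "lifecycle", "sectionSummary": "authors and affiliations"},
]
ranked = sorted(objects, key=lambda o: score_object("lifecycle diagram", o), reverse=True)
```

A match in a heavily weighted field outranks the same match in a lightly weighted one, which is the intuition behind prioritizing certain elements of the semantic object.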

## The Workflow

All of EyeLevel's technologies (including X-Ray) can be accessed via the GroundX SDK, which is essentially a collection of language-specific implementations and cURL-accessible API endpoints. The documentation for the API can be found here:

https://documentation.eyelevel.ai/reference/

The most fundamental component of organization within GroundX is the bucket, which is used to store documents. When you upload a document to a bucket it will trigger X-Ray parsing, and the result will be stored in the bucket for later querying. The semantic objects which X-Ray creates are ultimately what is stored in a bucket.

Projects are collections of buckets, allowing you to search across multiple buckets. This lets you organize information across buckets and aggregate that information for specific use cases.

Buckets and Projects can be searched with a natural language query. GroundX will search for the most relevant semantic objects which match your query and return them. GroundX will also construct a recommended text block which aggregates information from the most relevant retrieved semantic objects. This is designed to be injected into a language model, enabling RAG-esque workflows. We'll explore, in depth, the results of search later in the notebook.

## Optimizing for Your Documents

While EyeLevel's products are designed to work out of the box on arbitrary human documents, in reality it's impossible to make a single unified system that is perfect in every use case. One of the core ideas of both X-Ray and GroundX is configurability: we can fine-tune computer vision models on your documents, we can adjust templating to match your needs, and we can modify our search system based on your specific requirements. We also have a depth of experience in analyzing and testing the performance of RAG systems in real-world use cases. The takeaway is that X-Ray + GroundX allows you to achieve state-of-the-art performance out of the box on a wide range of common document types, and can be tailored to perform excellently on your documents if necessary.

## On-Prem
GroundX has both hosted and Kubernetes-deployable versions. If you want to use GroundX in a secure environment, and do RAG without any external communication, check out the [GroundX On-Prem Repo](https://github.com/eyelevelai/groundx-on-prem).

---



# Creating an EyeLevel Account And Registering an API Key

You can create an account here:

https://dashboard.eyelevel.ai/auth/register

Once you have an account set up, you can navigate here to set up an API key:

https://dashboard.eyelevel.ai/apikey


In [67]:
!pip3 install python-dotenv
!pip3 install openai
!pip3 install groundx
#for supplementary demo
!pip3 install transformers
!pip3 install torch torchvision
!pip3 install pandas


In [None]:
"""Setting up API Keys
"""
is_google_secret = False
is_env = True

if is_google_secret:
    #if your API key is stored in the Colab secret manager
    from google.colab import userdata
    import os
    from openai import OpenAI

    api_key = userdata.get('GROUNDX_API_KEY')

    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

elif is_env:
    #using a local .env file
    import os
    from dotenv import load_dotenv

    load_dotenv()

    api_key = os.getenv('GROUNDX_API_KEY')
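
A missing key tends to surface later as a confusing authentication error, so it can help to fail early. The helper below is a minimal sketch; `require_env` is a hypothetical name introduced here, not part of the GroundX SDK or python-dotenv.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

For example, `api_key = require_env('GROUNDX_API_KEY')` could stand in for the `os.getenv` call above once the `.env` file has been loaded.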

# The Python SDK
Interfacing with both X-Ray and GroundX can be done via [The Python GroundX SDK](https://pypi.org/project/groundx-python-sdk/). There's also a [node package](https://www.jsdelivr.com/package/npm/groundx-typescript-sdk) which exposes equivalent JavaScript functionality.

Currently GroundX and X-Ray exist as a series of endpoints which we'll be exploring in this notebook. Documentation around those endpoints can be found here:

https://documentation.eyelevel.ai/reference

In the near future these endpoints will be abstracted into language-specific implementations of core functionality. For now we'll be working directly with the exposed API endpoints.

# Authenticating
To get set up in Python, simply import the module and create an instance of the GroundX client with your API key.

In [17]:
from groundx import GroundX
client = GroundX(
  api_key=api_key,
)

# Creating a Bucket
Buckets can be created either through [the dashboard](https://dashboard.eyelevel.ai/home) by selecting the `+ New Bucket` button, or through the API via the [create_bucket endpoint](https://documentation.eyelevel.ai/reference/Buckets/Bucket_create).

Here we'll create a bucket called demo_bucket.

Buckets are uniquely identified by a bucket_id which is returned upon completion of the endpoint. We'll use this bucket_id to upload documents and search against our bucket.

The body of the response is formatted thus:
```
body={'bucket': {'bucketId': ____, 'name': ____}}
```

In [26]:
response = client.buckets.create(
    name="demo_bucket"
)
bucket_id = response.bucket.bucket_id
print(f'Created bucket {bucket_id}')

Created bucket 13367


# Uploading Documents
Now that we have a bucket we can upload documents to that bucket. Recall that this triggers X-Ray to parse the documents so our bucket will be populated with a bunch of semantic objects.

X-Ray supports the following document types:
```
txt, docx, pptx, xlsx, pdf, png, jpg, csv, tsv, json
```
The primary use case of EyeLevel products is in understanding complex human documents, so we'll use a PDF for this example. Specifically, we'll use [this document](https://arxiv.org/pdf/2110.11822), which is an academic paper with complex textual formatting and graphical figures.
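
When uploading many files, it can be handy to derive the `file_type` from the file name and reject unsupported types before calling the API. The sketch below assumes the extension matches one of the types listed above; `infer_file_type` and `SUPPORTED_TYPES` are hypothetical names introduced here, not SDK features.

```python
from pathlib import Path

# File types listed above as supported by X-Ray.
SUPPORTED_TYPES = {"txt", "docx", "pptx", "xlsx", "pdf", "png", "jpg", "csv", "tsv", "json"}

def infer_file_type(path):
    """Return the lowercase extension of `path` if it is in SUPPORTED_TYPES."""
    file_type = Path(path).suffix.lower().lstrip(".")
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return file_type
```

For the paper used in this notebook, `infer_file_type('2110.11822v2.pdf')` gives `'pdf'`, the same value passed as `file_type` in the upload cell.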

When you upload local documents (done below with `client.ingest`), you'll get a response with the flavor of

```
{'ingest': {'processId': ____, 'status': 'queued'}}
```

Once you upload a set of documents it triggers X-Ray to begin processing the documents. This can be observed by using the `processId`.

In [28]:
from groundx import Document

doc_path = '2110.11822v2.pdf'

#uploading document
response = client.ingest(
    documents=[
      Document(
        bucket_id=bucket_id,
        file_name=doc_path,
        file_path=doc_path,
        file_type='pdf'
      )
    ]
)

In [32]:
process_id = response.ingest.process_id

# Tracking X-Ray Progress

We can poll the processId via the [get_processing_status_by_id](https://documentation.eyelevel.ai/reference/Documents/Document_getProcessingStatusById) endpoint, which will tell us if our documents are in one of four states.
```
cancelled, complete, errors, processing
```

The structure of the response obeys the following schema:
```
{
  "ingest": {
    "processId": "9e0ad09b-5150-48c0-aded-707587048fd9",
    "progress": {
      "cancelled": {
        "documents": [
          {
            <document info>
          }
        ],
        "total": 0
      },
      "complete": {
        "documents": [
          {
            <document info>
          }
        ],
        "total": 0
      },
      "errors": {
        "documents": [
          {
            <document info>
          }
        ],
        "total": 0
      },
      "processing": {
        "documents": [
          {
            <document info>
          }
        ],
        "total": 0
      }
    },
    "status": "queued"
  }
}
```

where `<document info>` might look like the following:
```
{"bucketId": 0,
"documentId": "4704590c-004e-410d-adf7-acb7ca0a7052",
"fileName": "string",
"fileSize": "1.4MB",
"fileType": "txt",
"processId": "9e0ad09b-5150-48c0-aded-707587048fd9",
"searchData": {},
"sourceUrl": "http://example.com",
"status": "queued",
"statusMessage": "string",
"xrayUrl": "http://example.com"}
```
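
To make the schema concrete, here is a minimal sketch that reduces a status response to per-state document counts. It assumes the raw JSON dictionary shape shown above (as returned by the REST endpoint) rather than the SDK's response object; `summarize_progress` and the sample values are made up for illustration.

```python
def summarize_progress(status_response):
    """Map each progress state to its document total."""
    progress = status_response["ingest"].get("progress") or {}
    return {state: (details or {}).get("total", 0) for state, details in progress.items()}

# Made-up response following the schema above.
sample = {
    "ingest": {
        "processId": "9e0ad09b-5150-48c0-aded-707587048fd9",
        "progress": {
            "complete": {"documents": [], "total": 1},
            "errors": {"documents": [], "total": 0},
            "processing": {"documents": [], "total": 2},
        },
        "status": "processing",
    }
}
counts = summarize_progress(sample)
```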

This can be used for a variety of status checks depending on the application. For now, because we're only uploading a single document for testing purposes, I'll just poll this endpoint every 10 seconds and check the status of the overall process to see whether our document is done.

In [39]:
import time
while True:

    response = client.documents.get_processing_status_by_id(
        process_id=process_id
    )

    if response.ingest.status == 'complete':
        print('done!')
        break

    print('still processing...')
    time.sleep(10)

#getting the document id for the next section.
doc_id = response.ingest.progress.complete.documents[0].document_id

done!
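
The loop above runs forever if the process never reaches `complete` (for example, if ingestion fails). A minimal sketch of a bounded alternative follows; `wait_for_status` is a hypothetical helper, and the default interval and timeout values are arbitrary assumptions.

```python
import time

def wait_for_status(get_status, done="complete", interval=10, timeout=600, sleep=time.sleep):
    """Call get_status() until it returns `done`; raise TimeoutError after `timeout` seconds."""
    waited = 0
    while True:
        status = get_status()
        if status == done:
            return status
        if waited >= timeout:
            raise TimeoutError(f"status was still {status!r} after {waited} seconds")
        sleep(interval)
        waited += interval
```

With the client from this notebook, it could be called as `wait_for_status(lambda: client.documents.get_processing_status_by_id(process_id=process_id).ingest.status)`.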


# Viewing X-Ray parse results
In the previous section we got the document_id from the upload process. We can use that to get the URL where the X-Ray data is exposed (as a JSON object) then explore it.

In [42]:
#getting the URL of x-ray parsing
response = client.documents.get(
    document_id=doc_id
)
x_ray_url = response.document.xray_url

#getting x-ray data
import urllib.request, json
with urllib.request.urlopen(x_ray_url) as url:
    x_ray_data = json.loads(url.read().decode())

In [43]:
"""X-Ray summarization of the entire document
"""
x_ray_data['fileSummary']

"This document describes the environmental impacts of artificial intelligence (AI) solutions, particularly focusing on their life cycle assessment and the balance between their positive and negative effects. It involves researchers from various French universities and research institutions, including Univ. Paris-Saclay, Univ. Aix-Marseille, Univ. Bordeaux, and Université Grenoble Alpes. The document, dated April 21, 2022, examines the energy consumption and greenhouse gas emissions associated with AI, especially deep learning, and the broader environmental implications beyond just carbon footprint. It reviews existing methodologies for assessing these impacts and proposes a comprehensive life cycle assessment approach to evaluate AI services' environmental value. The purpose is to highlight the need for a holistic evaluation of AI's environmental costs and benefits, emphasizing the importance of considering both direct and indirect effects.\n\nKeywords: AI environmental impact, life cy

In [44]:
"""X-Ray provides some generally useful data about the document
"""
print('File Type:',x_ray_data['fileType'])
print('Language: ',x_ray_data['language'])
print('Keywords: ',x_ray_data['fileKeywords'])

File Type: pdf
Language:  English
Keywords:  2110.11822v2.pdf,AI environmental impact, artificial intelligence life cycle, AI energy consumption, deep learning emissions, greenhouse gas AI, AI carbon footprint, environmental assessment AI, AI sustainability, Univ. Paris-Saclay AI research, Univ. Aix-Marseille AI study, Univ. Bordeaux AI analysis, Université Grenoble Alpes AI project, April 21 2022 AI document, AI environmental value, AI positive negative effects, AI holistic evaluation, AI direct indirect effects, AI environmental methodologies, AI service assessment, AI research France, 2110.11822v2, AI environmental balance, AI ecological impact, AI life cycle approach.


In [45]:
"""Semantic object Exploration
Semantic objects can exist on one or multiple pages. In this object you can see
the following:
 - A list of bounding boxes from items on the page(s) that contribute to the object
 - The type of content (for example a paragraph, as opposed to a figure or table)
 - the page number(s) the semantic object exists within.
 - sectionSummary: summarizes the greater section the semantic object is within.
 This is designed to provide additional context to the semantic object.
 - suggestedText: is LLM rewritten text which is based on the extracted text and
 other section level and document level information.
 - text: The raw extracted textual data
"""
x_ray_data['chunks'][0]

{'boundingBoxes': [{'bottomRightX': 1091,
   'bottomRightY': 486,
   'pageNumber': 1,
   'topLeftX': 138,
   'topLeftY': 319}],
 'chunk': '24t9xe-0',
 'contentType': ['table'],
 'json': [{'description': 'List of authors and their affiliations',
   'main_headers': ['Name', 'Affiliation', 'Location'],
   'table_number': 'Missing',
   'table_title': 'Missing'},
  {'affiliation': 'Univ. Paris-Saclay, LIMSI, CNRS, ENSIIE',
   'location': 'Orsay, France',
   'name': 'Anne-Laure Ligozat'},
  {'affiliation': 'Univ. Aix-Marseille, CNRS, Centrale Marseille',
   'location': 'Marseille, France',
   'name': 'Julien Lefèvre'},
  {'affiliation': 'Univ. Bordeaux, Bordeaux INP, CNRS, Laboratoire LaBRI',
   'location': 'Talence, France',
   'name': 'Aurélie Bugeau'},
  {'affiliation': 'Universite Grenoble Alpes, VERIMAG',
   'location': 'Grenoble, France',
   'name': 'Jacques Combaz'}],
 'multimodalUrl': 'https://upload.eyelevel.ai/layout/raw/prod/b2e8d0f1-0322-46f5-96c1-33de449a1011/c6e9873a-9b71-428b-

In [46]:
"""Semantic object Exploration
The previous example of a semantic object only contained textual information.
Let's explore a table.

As can be seen the content of this semantic object is similar to the one used
to represent paragraph information, but with some key differences:
- There's a json description of the data
- There's a url to the image used to extract data
- There is a narrative representation of the data

Click the multimodal URL to render the image!

We've found that things like figures and tables often benefit from having
both a JSON description of what content exists, as well as a narrative description
to describe key elements.
"""
x_ray_data['chunks'][6]

{'boundingBoxes': [{'bottomRightX': 1157,
   'bottomRightY': 1253,
   'pageNumber': 4,
   'topLeftX': 665,
   'topLeftY': 882}],
 'chunk': 'cf0xmx-0',
 'contentType': ['table'],
 'json': [{'description': 'The table outlines the life cycle stages and unit processes for AI services, indicating whether each process is mandatory or recommended.',
   'main_headers': ['Life Cycle Stage', 'Requirement'],
   'table_title': 'Application to AI services of ITU recommendation'},
  {'life_cycle_stage': 'A - Raw material acquisition',
   'requirement': 'Mandatory'},
  {'life_cycle_stage': 'B - Production',
   'requirement': 'Devices production and assembly',
   'status': 'Mandatory'},
  {'life_cycle_stage': 'B - Production',
   'requirement': 'Manufacturer support activities',
   'status': 'Recommended'},
  {'life_cycle_stage': 'B - Production',
   'requirement': 'Production of support equipment',
   'status': 'Mandatory'},
  {'life_cycle_stage': 'B - Production',
   'requirement': 'ICT-specific sit

In [47]:
"""Semantic object Exploration
Here's an example of a figure

It has the same general structure as tables, but a different underlying pipeline
in X-Ray was used to create this object. Because figures are visually more complex
than tables, the narrative representation is arguably more impactful
"""
x_ray_data['chunks'][9]

{'boundingBoxes': [{'bottomRightX': 801,
   'bottomRightY': 651,
   'pageNumber': 5,
   'topLeftX': 473,
   'topLeftY': 636}],
 'chunk': '8q2f7z-14',
 'contentType': ['paragraph'],
 'pageNumbers': [5],
 'sectionSummary': "The document explores the environmental impacts of artificial intelligence (AI), particularly focusing on life cycle assessment (LCA) of AI solutions. It involves researchers from French universities, including Univ. Paris-Saclay, Univ. Aix-Marseille, Univ. Bordeaux, and Université Grenoble Alpes. The main theme is the dual nature of AI's environmental impact, where AI is both a tool for addressing environmental issues and a contributor to greenhouse gas emissions due to its energy-intensive processes. The document reviews existing methodologies for assessing AI's environmental impacts and proposes a comprehensive LCA approach to evaluate AI services. It emphasizes the importance of considering both direct and indirect environmental effects, including energy consumpti
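
To get a quick overview of what X-Ray found in a document, you can tally the `contentType` labels across all chunks. This is a minimal sketch that assumes only the `chunks` and `contentType` keys shown in the outputs above; `count_content_types` and the sample data are made up for illustration.

```python
from collections import Counter

def count_content_types(x_ray_data):
    """Count how many chunks carry each contentType label."""
    counts = Counter()
    for chunk in x_ray_data.get("chunks", []):
        counts.update(chunk.get("contentType", []))
    return counts

# Made-up stand-in for the parsed X-Ray JSON, using only keys shown above.
sample = {"chunks": [
    {"contentType": ["table"]},
    {"contentType": ["paragraph"]},
    {"contentType": ["paragraph"]},
]}
type_counts = count_content_types(sample)
```

In the notebook, `count_content_types(x_ray_data)` would give the same kind of tally for the parsed paper.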

# Searching
Ok, we have a bunch of these semantic objects thanks to X-Ray, and they exist within GroundX buckets. Now we can run search via GroundX. Search will get us a list of semantic objects as well as some additional aggregate information.

Within the search object you'll find:
 - count: the number of relevant semantic objects
 - query: the query used in search
 - results: a list of semantic objects. These are just like the semantic objects previously discussed, but they each have an additional "score" attribute which describes how well they align with the user's query.
 - score: how relevant the top scoring semantic object is.
 - text: a formatted block of text which contains information from relevant chunks. This can be used as context in a RAG application.

In [48]:
search_query = 'I need a diagram of the AI lifecycle'

response = client.search.content(
    id=bucket_id,
    query=search_query
)

In [50]:
# Exploring a retrieved chunk
dict(response.search.results[0])

{'bounding_boxes': [BoundingBoxDetail(bottom_right_x=954.7816, bottom_right_y=1070.8883, page_number=5, top_left_x=317.81744, top_left_y=649.6449)],
 'bucket_id': 13367,
 'chunk_id': None,
 'document_id': 'c6e9873a-9b71-428b-afeb-bf0a4c2e3229',
 'file_name': '2110.11822v2.pdf',
 'multimodal_url': 'https://upload.eyelevel.ai/layout/raw/prod/b2e8d0f1-0322-46f5-96c1-33de449a1011/c6e9873a-9b71-428b-afeb-bf0a4c2e3229/figure-5-0.jpg',
 'page_images': ['https://upload.eyelevel.ai/layout/raw/prod/b2e8d0f1-0322-46f5-96c1-33de449a1011/c6e9873a-9b71-428b-afeb-bf0a4c2e3229/5.jpg'],
 'score': 288.84952,
 'search_data': None,
 'source_url': 'https://upload.groundx.ai/prod/file/974d725e-e804-49f4-9792-de162e50a15c/c7dca033-104d-4bee-9a9a-15ed261187b6.pdf',
 'suggested_text': '{"figure_title": "Life Cycle Inventory of an AI Service", "figure_number": 2, "description": "Diagram illustrating the life cycle phases of an AI service and associated emissions.", "components": ["Production of devicei", "Use o

In [51]:
# exploring the formatted context to provide to the language model
response.search.text

'The following text excerpts are from the same section of a document named \'2110.11822v2.pdf\':\n\nText excerpt from page number 5:\n{"figure_title": "Life Cycle Inventory of an AI Service", "figure_number": 2, "description": "Diagram illustrating the life cycle phases of an AI service and associated emissions.", "components": ["Production of devicei", "Use of devicei", "End of life of devicei", "Production of electricity", "Resources (metals, etc.)"], "relationships": [{"source": "Production of devicei", "target": "Use of devicei", "type": "flow"}, {"source": "Use of devicei", "target": "End of life of devicei", "type": "flow"}, {"source": "Production of electricity", "target": "Production of devicei", "type": "support"}, {"source": "Resources (metals, etc.)", "target": "Production of electricity", "type": "resource"}, {"source": "Production of devicei", "target": "Emissions", "type": "environmental impact"}, {"source": "Use of devicei", "target": "Emissions", "type": "environmental 
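
When a search returns many results, a compact summary of score, file, and page numbers is often easier to scan than the full objects. The sketch below relies only on attribute names visible in the retrieved chunk above (`score`, `file_name`, `bounding_boxes`, `page_number`); `summarize_results` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace

def summarize_results(results):
    """Return (score, file_name, pages) tuples, highest score first."""
    rows = []
    for result in results:
        boxes = getattr(result, "bounding_boxes", None) or []
        pages = sorted({box.page_number for box in boxes})
        rows.append((result.score, result.file_name, pages))
    return sorted(rows, key=lambda row: row[0], reverse=True)

# Made-up stand-ins that mimic the attribute names shown in the output above.
fake_results = [
    SimpleNamespace(score=12.5, file_name="a.pdf",
                    bounding_boxes=[SimpleNamespace(page_number=2)]),
    SimpleNamespace(score=288.8, file_name="a.pdf",
                    bounding_boxes=[SimpleNamespace(page_number=5),
                                    SimpleNamespace(page_number=5)]),
]
summary = summarize_results(fake_results)
```

In the notebook this would be called as `summarize_results(response.search.results)`.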

# RAG
Now that we understand EyeLevel's X-Ray and GroundX more thoroughly, we can explore their application. In this example we'll be using [search.content](https://documentation.eyelevel.ai/reference/Search/Search_content) to search for relevant information, and pass GroundX's formatted aggregation to the language model.

GroundX's formatted aggregation is designed to put the most important things at the beginning. To use it for different sized language models, you can simply keep the first `n` characters in the sequence. We've found that `n = 3*token_limit` typically works well, but more sophisticated token counting techniques can also be employed.
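
The character-budget rule can be written as a small helper. This is a minimal sketch of the `n = 3*token_limit` heuristic; `truncate_context` and its `chars_per_token` parameter are hypothetical names, and the ratio of 3 is a rule of thumb rather than an exact token count.

```python
def truncate_context(context, token_limit, chars_per_token=3):
    """Keep the first `chars_per_token * token_limit` characters of the context."""
    max_chars = chars_per_token * token_limit
    return context[:max_chars]
```

Because the aggregation puts the most important material first, cutting from the end drops the least relevant content.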

In [57]:
from openai import OpenAI

"""Defining RAG
using GroundX Search to retrieve information, constructing an
augmented prompt based on GX's recommended textual representation,
and using OpenAI to generate a response.
"""

# ==== Retrieval ====
def gx_search(query):
    response = client.search.content(
        id=bucket_id,
        query=query
    )
    return response.search.text

# ==== Augmentation ====
def gx_retrieve_and_augment(query):

    #getting context
    context = gx_search(query)

    if len(context) > 4000 * 3:
        context = context[:4000*3]

    #defining a high level prompt so the LLM knows what to do
    system_prompt = 'you are a helpful AI agent tasked with helping users extract information from the context below'

    #based on OpenAI's new formatting
    augmented_prompt = [{
        "role": "system",
        "content": system_prompt+'\n\n===\n'+context+'\n==='},
         {
        "role": "user",
        "content": query
         }]

    return augmented_prompt

# ==== Generation ====
def gxrag(query):

    #retrieving and augmenting
    augmented_prompt = gx_retrieve_and_augment(query)

    #Generating (named openai_client to avoid confusion with the GroundX client)
    openai_client = OpenAI()
    return openai_client.chat.completions.create(model="gpt-4",messages=augmented_prompt).choices[0].message.content

res = gxrag('What are the major parts of the AI lifecycle?')
print('response:')
print(res)

response:
The major parts of the AI lifecycle, as mentioned in the document, include:

1. Raw material extraction: This includes all the industrial processes involved in the transformation from ore to metals.
2. Manufacturing: This encompasses the processes that create the equipment from the raw material.
3. Transport: This includes all transport processes involved, including product distribution.
4. Use: This primarily involves the energy consumption of the equipment while it is being used.
5. End of life: This refers to the processes to dismantle, recycle, and/or dispose of the equipment.

These phases are later simplified in the document to a single production phase, which combines raw material extraction, manufacturing, and transport.


# Image Reporting
Sometimes you don't only want a language model to answer the question for you. While RAG is useful, sometimes text simply isn't the appropriate response. In this example we'll use the same search approach as before, but provide a rich report of pages and figures, along with generation, which might answer the question. This will allow a human to quickly evaluate the truthfulness of the generation, and come to their own conclusions as necessary.

Because X-Ray is multimodal by nature, the resulting semantic objects contain a variety of visual information which can be referenced. GroundX is useful in searching, but it's important to note that the GroundX ranking is designed for textual rather than visual search. As a result the most relevant diagram may not be the first search result from GroundX.

This can be easily alleviated by using a CLIP-style model as a re-ranker, allowing for the most visually relevant information to be prioritized. It's unlikely that a small CLIP-style model can understand image content in detail, but it can likely separate broadly correct and incorrect types of images.

In [None]:
# Getting a CLIP style model to use as a reranker from Huggingface
from transformers import pipeline

#https://huggingface.co/openai/clip-vit-large-patch14
classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14")

In [None]:
def GX_get_image_urls(query, rerank_filter):
    response = client.search.content(
        id=bucket_id,
        query=query
    )

    image_urls = set()

    for semantic_object in response.search.results:
        #multimodal_url and page_images can be missing or None, so guard both
        multimodal_url = getattr(semantic_object, 'multimodal_url', None)
        if multimodal_url:
            image_urls.add(multimodal_url)
        image_urls.update(getattr(semantic_object, 'page_images', None) or [])

    image_urls = list(image_urls)

    ranked_results = classifier(
        image_urls,
        candidate_labels=rerank_filter,
    )

    #reformatting rank
    for i in range(len(ranked_results)):
        for j in range(len(rerank_filter)):
            ranked_results[i][j]['url'] = image_urls[i]

    #flattening
    ranked_results = [x for xs in ranked_results for x in xs]

    return ranked_results

query = 'A diagram of the AI lifecycle'
#creating a list of classes to compare to
rerank_filters = [query, 'miscellaneous figure', 'miscellaneous page', 'miscellaneous table']

ranked_results = GX_get_image_urls(query, rerank_filters)

In [None]:
import pandas as pd
df = pd.DataFrame(ranked_results)
df = df[df['label'] == query].sort_values('score',ascending=False)

def make_clickable(val):
    # target _blank to open new window
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

df.style.format({'url': make_clickable})

Unnamed: 0,score,label,url
12,0.999999,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/figure-5-1.jpg
8,0.999997,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/figure-5-0.jpg
36,0.999929,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/figure-4-0.jpg
0,0.999839,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/figure-6-0.jpg
60,0.99933,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/1.jpg
64,0.9917,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/5.jpg
40,0.982985,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/table-4-0.jpg
24,0.974595,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/figure-8-0.jpg
4,0.965495,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/2.jpg
20,0.942037,A diagram of the AI lifecycle,https://upload.eyelevel.ai/layout/raw/prod/d8606fc9-8be7-4bb1-b546-c8f2ba51b5e4/c4106c80-7a9d-4f3c-8b27-0b10aa9dd79c/table-4-1.jpg


As can be seen, the top responses are the figures most relevant to the query.
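
If you only need the best few image URLs rather than a styled table, the same filter-and-sort step can be done without pandas. This is a minimal sketch that assumes the flattened list-of-dicts shape of `ranked_results` (keys `score`, `label`, `url`); `top_images` and the sample entries are made up for illustration.

```python
def top_images(ranked_results, label, k=3):
    """Return the k highest-scoring URLs among entries whose label matches."""
    matches = [r for r in ranked_results if r["label"] == label]
    matches.sort(key=lambda r: r["score"], reverse=True)
    return [r["url"] for r in matches[:k]]

# Made-up entries in the same shape as the flattened classifier output.
sample_ranked = [
    {"score": 0.20, "label": "miscellaneous page", "url": "page-1.jpg"},
    {"score": 0.97, "label": "A diagram of the AI lifecycle", "url": "figure-8-0.jpg"},
    {"score": 0.99, "label": "A diagram of the AI lifecycle", "url": "figure-5-1.jpg"},
]
best = top_images(sample_ranked, "A diagram of the AI lifecycle", k=1)
```

In the notebook, `top_images(ranked_results, query)` would return the URLs to show alongside the generated answer.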