<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_12_23_Analyze_Google_Drive_Feed_Ingestion_Costs_%26_Usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to analyze the credit usage and individual usage logs from a Graphlit preparation workflow applied to a Google Drive feed.

**Requirements**

Before running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.

To access your Google Drive account, assign these properties as Colab secrets: GOOGLE_DRIVE_CLIENT_ID, GOOGLE_DRIVE_CLIENT_SECRET and GOOGLE_DRIVE_REFRESH_TOKEN.

You can optionally assign GOOGLE_DRIVE_FOLDER_ID to the folder to be ingested.
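
A quick pre-flight check can catch a missing secret before any API call is made. The sketch below is a hypothetical helper (`missing_settings` is not part of the Graphlit SDK) that takes any mapping of settings, such as `os.environ`, and reports which of the required names listed above are absent or empty.

```python
from typing import List, Mapping

# Names taken from the requirements above; the optional folder ID is not checked
REQUIRED_SETTINGS = [
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
    "GOOGLE_DRIVE_CLIENT_ID",
    "GOOGLE_DRIVE_CLIENT_SECRET",
    "GOOGLE_DRIVE_REFRESH_TOKEN",
]

def missing_settings(settings: Mapping[str, str]) -> List[str]:
    """Return the required names that are absent or empty in `settings`."""
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]
```

One possible use is to call `missing_settings(os.environ)` once the secrets have been copied into the environment, and stop the notebook if the returned list is not empty.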

---

Install Graphlit Python client SDK

In [None]:
!pip install --upgrade graphlit-client

In [None]:
!pip install --upgrade isodate

In [None]:
!pip install --upgrade tiktoken

Initialize Graphlit

In [None]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Initialize Google Drive credentials

In [None]:
os.environ['GOOGLE_DRIVE_CLIENT_ID'] = userdata.get('GOOGLE_DRIVE_CLIENT_ID')
os.environ['GOOGLE_DRIVE_CLIENT_SECRET'] = userdata.get('GOOGLE_DRIVE_CLIENT_SECRET')
os.environ['GOOGLE_DRIVE_REFRESH_TOKEN'] = userdata.get('GOOGLE_DRIVE_REFRESH_TOKEN')

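# GOOGLE_DRIVE_FOLDER_ID is optional; only set it when the Colab secret exists
try:
    os.environ['GOOGLE_DRIVE_FOLDER_ID'] = userdata.get('GOOGLE_DRIVE_FOLDER_ID')
except userdata.SecretNotFoundError:
    pass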

Define Graphlit helper functions

In [None]:
from typing import List, Optional
from datetime import datetime, timedelta
import isodate

# NOTE: folder id is '1105Gab1PaaD4u9DmPaRAGg_4MLQ3o9CQ' from this Google Drive URI
# https://drive.google.com/drive/folders/1105Gab1PaaD4u9DmPaRAGg_4MLQ3o9CQ

# NOTE: file id is '1TEzotGuRfCkAB6Ff1g-LBZK5h9CeCLG5' from this Google Drive URI
# https://drive.google.com/file/d/1TEzotGuRfCkAB6Ff1g-LBZK5h9CeCLG5

async def create_feed(correlation_id: str):
    if graphlit.client is None:
        return None

    # Named feed_input to avoid shadowing the built-in input()
    feed_input = input_types.FeedInput(
        name="Google Drive",
        type=enums.FeedTypes.SITE,
        site=input_types.SiteFeedPropertiesInput(
            type=enums.FeedServiceTypes.GOOGLE_DRIVE,
            googleDrive=input_types.GoogleDriveFeedPropertiesInput(
                clientId=os.environ['GOOGLE_DRIVE_CLIENT_ID'],
                clientSecret=os.environ['GOOGLE_DRIVE_CLIENT_SECRET'],
                refreshToken=os.environ['GOOGLE_DRIVE_REFRESH_TOKEN'],
                # NOTE: you can filter on a specific folder, or on multiple files;
                # if neither is assigned, the feed recursively ingests from the root of the Google Drive account
                # .get() returns None when the optional folder ID was not provided
                folderId=os.environ.get('GOOGLE_DRIVE_FOLDER_ID'),
                #files=["{file-id}","{file-id}"]
            ),
            readLimit=10 # limiting to 10 files
        )
    )

    try:
        response = await graphlit.client.create_feed(feed_input, correlation_id=correlation_id)

        return response.create_feed.id if response.create_feed is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def is_feed_done(feed_id: str):
    if graphlit.client is None:
        return None

    response = await graphlit.client.is_feed_done(feed_id)

    return response.is_feed_done.result if response.is_feed_done is not None else None

async def query_contents(feed_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.query_contents(
            filter=input_types.ContentFilter(
                feeds=[
                    input_types.EntityReferenceFilter(
                        id=feed_id
                    )
                ]
            )
        )

        return response.contents.results if response.contents is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def lookup_usage(correlation_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.lookup_usage(correlation_id)

        return response.lookup_usage if response.lookup_usage is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def lookup_credits(correlation_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.lookup_credits(correlation_id)

        return response.lookup_credits if response.lookup_credits is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_feeds():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_feeds(is_synchronous=True)

def dump_usage_record(record):
    print(f"{record.date}: {record.name}")

    duration = isodate.parse_duration(record.duration)

    if record.workflow:
        print(f"- Workflow [{record.workflow}] took {duration}, used credits [{record.credits:.8f}]")
    else:
        print(f"- Operation took {duration}, used credits [{record.credits:.8f}]")

    if record.entity_id:
        if record.entity_type:
            if record.entity_type == enums.EntityTypes.CONTENT and record.content_type:
                print(f"- {record.entity_type} [{record.entity_id}]: Content type [{record.content_type}], file type [{record.file_type}]")
            else:
                print(f"- {record.entity_type} [{record.entity_id}]")
        else:
            print(f"- Entity [{record.entity_id}]")

    if record.model_service:
        print(f"- Model service [{record.model_service}], model name [{record.model_name}]")

    if record.processor_name:
        if record.processor_name in ["Deepgram Audio Transcription", "Assembly.AI Audio Transcription"]:
            length = timedelta(milliseconds=record.count or 0)

            if record.model_name:
                print(f"- Processor name [{record.processor_name}], model name [{record.model_name}], length [{length}]")
            else:
                print(f"- Processor name [{record.processor_name}], length [{length}]")
        else:
            if record.count:
                if record.model_name:
                    print(f"- Processor name [{record.processor_name}], model name [{record.model_name}], units [{record.count}]")
                else:
                    print(f"- Processor name [{record.processor_name}], units [{record.count}]")
            else:
                if record.model_name:
                    print(f"- Processor name [{record.processor_name}], model name [{record.model_name}]")
                else:
                    print(f"- Processor name [{record.processor_name}]")

    if record.uri:
        print(f"- URI [{record.uri}]")

    if record.name == "Prompt completion":
        if record.prompt:
            print(f"- Prompt [{record.prompt_tokens} tokens (includes RAG context tokens)]:")
            print(record.prompt)

        if record.completion:
            print(f"- Completion [{record.completion_tokens} tokens (includes JSON guardrails tokens)], throughput: {record.throughput:.3f} tokens/sec:")
            print(record.completion)

    elif record.name == "Text embedding":
        if record.prompt_tokens is not None:
            print(f"- Text embedding [{record.prompt_tokens} tokens], throughput: {record.throughput:.3f} tokens/sec")

    elif record.name == "Document preparation":
        if record.prompt_tokens is not None and record.completion_tokens is not None:
            print(f"- Document preparation [{record.prompt_tokens} input tokens, {record.completion_tokens} output tokens], throughput: {record.throughput:.3f} tokens/sec")

    elif record.name == "Data extraction":
        if record.prompt_tokens is not None and record.completion_tokens is not None:
            print(f"- Data extraction [{record.prompt_tokens} input tokens, {record.completion_tokens} output tokens], throughput: {record.throughput:.3f} tokens/sec")

    elif record.name == "GraphQL":
        if record.request:
            print("- Request:")
            print(record.request)

        if record.variables:
            print("- Variables:")
            print(record.variables)

        if record.response:
            print("- Response:")
            print(record.response)

    if record.name.startswith("Upload"):
        print(f"- File upload [{record.count} bytes], throughput: {record.throughput:.3f} bytes/sec")

    print()
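
Beyond printing each record, it can help to total the credits per operation name. The sketch below is a hypothetical helper (`summarize_credits` is not part of the Graphlit SDK) and assumes only that each record exposes the `name` and `credits` attributes used by `dump_usage_record`. The stand-in records are built with `SimpleNamespace` and reuse three values from the sanitized output at the end of this notebook.

```python
from collections import defaultdict
from types import SimpleNamespace

def summarize_credits(records):
    """Sum credits per record name, skipping records without a credits value."""
    totals = defaultdict(float)
    for record in records:
        if record is not None and record.credits is not None:
            totals[record.name] += record.credits
    # Largest consumers first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Stand-in records, for illustration only
sample = [
    SimpleNamespace(name="Serverless compute", credits=0.00892501),
    SimpleNamespace(name="Image embedding", credits=0.00240000),
    SimpleNamespace(name="Serverless compute", credits=0.00892317),
]

for name, total in summarize_credits(sample):
    print(f"{name}: {total:.8f}")
```

With real data, the list returned by `lookup_usage` could be passed in place of `sample`.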


In [None]:
import tiktoken

model = "gpt-4o"
encoding = tiktoken.encoding_for_model(model)

In [None]:
import json
import requests

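def parse_token_count(uri):
    """Fetch the JSON document at `uri` and return its "tok" value, or None on failure."""
    try:
        response = requests.get(uri, timeout=30)  # avoid hanging on a stalled download
        response.raise_for_status()  # Raise an error for HTTP issues

        data = response.json()

        return data.get("tok")

    # Check for decode errors first: newer versions of requests raise a JSON
    # error that is also a RequestException
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the JSON file: {e}")

    return None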


Execute Graphlit example

In [None]:
from IPython.display import display, Markdown, HTML
import time
from datetime import datetime

# Remove any existing feeds; only needed for notebook example
await delete_all_feeds()

print('Deleted all feeds.')

# NOTE: create a unique cost correlation ID
correlation_id = datetime.now().isoformat()

feed_id = await create_feed(correlation_id=correlation_id)

if feed_id is not None:
    print(f'Created feed [{feed_id}].')

    # Wait for feed to complete, since ingestion happens asynchronously
    done = False
    time.sleep(5)
    while not done:
        done = await is_feed_done(feed_id)

        if not done:
            time.sleep(10)

    print(f'Completed feed [{feed_id}].')

    # Query contents by feed
    contents = await query_contents(feed_id)

    if contents is not None:
        print(f'Found {len(contents)} contents in feed [{feed_id}].')

        total_token_count = 0

        for content in contents:
            if content is not None:

                print(f'Ingested content [{content.id}], state [{content.state}]:')

                if content.text_uri is not None:
                    print(f'Text Mezzanine: {content.text_uri}')

                    token_count = parse_token_count(content.text_uri)

                    if token_count is not None:
                        total_token_count += token_count

        print(f'Token count: {total_token_count}')
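
The wait loop above polls until the feed reports completion and has no upper bound on how long it waits. As a sketch of one alternative, the hypothetical helper below (`wait_until_done` is not part of the Graphlit SDK) polls any async callable with a timeout and uses `asyncio.sleep` so the event loop is not blocked between checks. The default interval and timeout are arbitrary assumptions.

```python
import asyncio
import time

async def wait_until_done(check, interval: float = 10.0, timeout: float = 600.0) -> bool:
    """Poll the async callable `check` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        # asyncio.sleep yields to the event loop instead of blocking it
        await asyncio.sleep(interval)
```

In this notebook it could be called as `done = await wait_until_done(lambda: is_feed_done(feed_id))`. A `False` result then means the feed did not report completion within the timeout, which also covers the case where `is_feed_done` returns `None`.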


In [None]:
from IPython.display import display, HTML, JSON, Markdown

time.sleep(10) # give it some time for billing events to catch up

credits = await lookup_credits(correlation_id)

if credits is not None:
    display(Markdown(f"### Credits used: {credits.credits:.6f}"))
    print(f"- storage [{credits.storage_ratio:.2f}%], compute [{credits.compute_ratio:.2f}%]")
    print(f"- embedding [{credits.embedding_ratio:.2f}%], completion [{credits.completion_ratio:.2f}%]")
    print(f"- ingestion [{credits.ingestion_ratio:.2f}%], indexing [{credits.indexing_ratio:.2f}%], preparation [{credits.preparation_ratio:.2f}%], extraction [{credits.extraction_ratio:.2f}%], enrichment [{credits.enrichment_ratio:.2f}%], publishing [{credits.publishing_ratio:.2f}%]")
    print(f"- search [{credits.search_ratio:.2f}%], conversation [{credits.conversation_ratio:.2f}%]")
    print()

usage = await lookup_usage(correlation_id)

if usage is not None:
    display(Markdown("### Usage records:"))

    for record in usage:
        dump_usage_record(record)
    print()
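
The ratios printed above are shares of the total, so multiplying each by the total credits gives an approximate per-category amount. The sketch below is a hypothetical helper (`credits_by_category` is not part of the Graphlit SDK). It assumes the ratio fields hold percentages between 0 and 100, as the `%` formatting above suggests, and uses a `SimpleNamespace` stand-in filled with the storage, compute and embedding figures from the sanitized output below.

```python
from types import SimpleNamespace

def credits_by_category(credits, categories):
    """Convert percentage ratios into approximate absolute credits per category."""
    result = {}
    for category in categories:
        ratio = getattr(credits, f"{category}_ratio", None)
        if ratio is not None:
            result[category] = credits.credits * ratio / 100.0
    return result

# Stand-in for the lookup_credits result, for illustration only
sample = SimpleNamespace(
    credits=0.129584,
    storage_ratio=8.18,
    compute_ratio=65.92,
    embedding_ratio=25.90,
)

for category, amount in credits_by_category(sample, ["storage", "compute", "embedding"]).items():
    print(f"{category}: {amount:.6f}")
```

These amounts are estimates derived from rounded percentages, so they may not match the sum of the individual usage records exactly.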


## Sanitized output example

Credits used: 0.129584
- storage [8.18%], compute [65.92%]
- embedding [25.90%], completion [0.00%]
- ingestion [0.00%], indexing [0.00%], preparation [0.00%], extraction [0.00%], enrichment [0.00%], publishing [0.00%]
- search [0.00%], conversation [0.00%]

Usage records:
2024-12-24T03:54:16.096Z: Serverless compute
- Workflow [Entity Event] took 0:00:04.957193, used credits [0.00892501]
- CONTENT [a1ffdcb9-6acf-4996-a38b-4b67d4e76c66]

2024-12-24T03:54:15.983Z: Image embedding
- Workflow [Preparation] took 0:00:00.436756, used credits [0.00240000]
- CONTENT [a1ffdcb9-6acf-4996-a38b-4b67d4e76c66]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.868Z: Serverless compute
- Workflow [Entity Event] took 0:00:04.956168, used credits [0.00892317]
- CONTENT [92880d3e-50d2-4511-9b18-aee8575e36b1]

2024-12-24T03:54:15.785Z: Image embedding
- Workflow [Preparation] took 0:00:00.644526, used credits [0.00800000]
- CONTENT [92880d3e-50d2-4511-9b18-aee8575e36b1]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.778Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.489983, used credits [0.00988426]
- CONTENT [dd3b34b4-057f-4707-b7ba-4dd5e45b69eb]

2024-12-24T03:54:15.747Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.789132, used credits [0.01042285]
- CONTENT [fe36412f-05db-4572-8bb7-50ecac0d7d6c]

2024-12-24T03:54:15.746Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.002557, used credits [0.00900669]
- CONTENT [085c2cfb-8190-412e-b63a-ceb642657978]

2024-12-24T03:54:15.746Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.144207, used credits [0.00926172]
- CONTENT [6298bd37-9b79-4f93-87cc-3fafbd52de18]

2024-12-24T03:54:15.746Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.304285, used credits [0.00954992]
- CONTENT [a6fc9d9c-52c2-4744-ab98-47b478996ae1]

2024-12-24T03:54:15.651Z: Image embedding
- Workflow [Preparation] took 0:00:00.608577, used credits [0.00800000]
- CONTENT [085c2cfb-8190-412e-b63a-ceb642657978]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.648Z: Image embedding
- Workflow [Preparation] took 0:00:00.768369, used credits [0.00600000]
- CONTENT [dd3b34b4-057f-4707-b7ba-4dd5e45b69eb]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.641Z: Image embedding
- Workflow [Preparation] took 0:00:00.686420, used credits [0.00240000]
- CONTENT [6298bd37-9b79-4f93-87cc-3fafbd52de18]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.639Z: Image embedding
- Workflow [Preparation] took 0:00:00.897627, used credits [0.00600000]
- CONTENT [fe36412f-05db-4572-8bb7-50ecac0d7d6c]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.636Z: Image embedding
- Workflow [Preparation] took 0:00:00.716262, used credits [0.00240000]
- CONTENT [a6fc9d9c-52c2-4744-ab98-47b478996ae1]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:15.576Z: Serverless compute
- Workflow [Entity Event] took 0:00:05.443816, used credits [0.00980114]
- CONTENT [f35edd1d-d954-4c9c-beb7-fd236cc02a06]

2024-12-24T03:54:15.508Z: Image embedding
- Workflow [Preparation] took 0:00:00.812914, used credits [0.00600000]
- CONTENT [f35edd1d-d954-4c9c-beb7-fd236cc02a06]: Content type [FILE], file type [IMAGE]
- Model service [Jina], model name [CLIP_Image]

2024-12-24T03:54:14.845Z: Upload Image
- Workflow [Preparation] took 0:00:00.047656, used credits [0.00029575]
- CONTENT [a1ffdcb9-6acf-4996-a38b-4b67d4e76c66]: Content type [FILE], file type [IMAGE]
- File upload [89296 bytes], throughput: 1873754.097 bytes/sec

2024-12-24T03:54:14.322Z: Upload Image
- Workflow [Preparation] took 0:00:00.210142, used credits [0.00042451]
- CONTENT [92880d3e-50d2-4511-9b18-aee8575e36b1]: Content type [FILE], file type [IMAGE]
- File upload [128173 bytes], throughput: 609935.187 bytes/sec

2024-12-24T03:54:13.874Z: Upload Image
- Workflow [Preparation] took 0:00:00.042310, used credits [0.00032739]
- CONTENT [085c2cfb-8190-412e-b63a-ceb642657978]: Content type [FILE], file type [IMAGE]
- File upload [98851 bytes], throughput: 2336345.223 bytes/sec

2024-12-24T03:54:13.761Z: Upload Image
- Workflow [Preparation] took 0:00:00.055706, used credits [0.00024539]
- CONTENT [6298bd37-9b79-4f93-87cc-3fafbd52de18]: Content type [FILE], file type [IMAGE]
- File upload [74090 bytes], throughput: 1330020.698 bytes/sec

2024-12-24T03:54:13.731Z: Upload Image
- Workflow [Preparation] took 0:00:00.035058, used credits [0.00024539]
- CONTENT [a6fc9d9c-52c2-4744-ab98-47b478996ae1]: Content type [FILE], file type [IMAGE]
- File upload [74090 bytes], throughput: 2113367.068 bytes/sec

2024-12-24T03:54:13.577Z: Upload Image
- Workflow [Preparation] took 0:00:00.128996, used credits [0.00025456]
- CONTENT [dd3b34b4-057f-4707-b7ba-4dd5e45b69eb]: Content type [FILE], file type [IMAGE]
- File upload [76861 bytes], throughput: 595842.491 bytes/sec

2024-12-24T03:54:13.542Z: Upload Image
- Workflow [Preparation] took 0:00:00.042718, used credits [0.00024324]
- CONTENT [fe36412f-05db-4572-8bb7-50ecac0d7d6c]: Content type [FILE], file type [IMAGE]
- File upload [73441 bytes], throughput: 1719213.068 bytes/sec

2024-12-24T03:54:13.540Z: Upload Image
- Workflow [Preparation] took 0:00:00.088569, used credits [0.00021841]
- CONTENT [f35edd1d-d954-4c9c-beb7-fd236cc02a06]: Content type [FILE], file type [IMAGE]
- File upload [65946 bytes], throughput: 744569.507 bytes/sec

2024-12-24T03:54:12.504Z: Upload Master
- Workflow [Ingestion] took 0:00:00.240087, used credits [0.00127167]
- CONTENT [a1ffdcb9-6acf-4996-a38b-4b67d4e76c66]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [383957 bytes], throughput: 1599243.107 bytes/sec

2024-12-24T03:54:11.888Z: Upload Master
- Workflow [Ingestion] took 0:00:00.046908, used credits [0.00036383]
- CONTENT [92880d3e-50d2-4511-9b18-aee8575e36b1]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [109853 bytes], throughput: 2341867.004 bytes/sec

2024-12-24T03:54:11.887Z: Upload Master
- Workflow [Ingestion] took 0:00:00.047257, used credits [0.00017238]
- CONTENT [6298bd37-9b79-4f93-87cc-3fafbd52de18]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [52046 bytes], throughput: 1101339.484 bytes/sec

2024-12-24T03:54:11.809Z: Upload Master
- Workflow [Ingestion] took 0:00:00.086177, used credits [0.00212647]
- CONTENT [085c2cfb-8190-412e-b63a-ceb642657978]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [642051 bytes], throughput: 7450384.036 bytes/sec

2024-12-24T03:54:11.774Z: Upload Master
- Workflow [Ingestion] took 0:00:00.071127, used credits [0.00080229]
- CONTENT [a6fc9d9c-52c2-4744-ab98-47b478996ae1]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [242236 bytes], throughput: 3405687.581 bytes/sec

2024-12-24T03:54:11.534Z: Upload Master
- Workflow [Ingestion] took 0:00:00.123872, used credits [0.00048866]
- CONTENT [fe36412f-05db-4572-8bb7-50ecac0d7d6c]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [147542 bytes], throughput: 1191086.268 bytes/sec

2024-12-24T03:54:11.457Z: Upload Master
- Workflow [Ingestion] took 0:00:00.184800, used credits [0.00176999]
- CONTENT [dd3b34b4-057f-4707-b7ba-4dd5e45b69eb]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [534418 bytes], throughput: 2891866.035 bytes/sec

2024-12-24T03:54:11.343Z: Upload Master
- Workflow [Ingestion] took 0:00:00.094902, used credits [0.00050401]
- CONTENT [f35edd1d-d954-4c9c-beb7-fd236cc02a06]: Content type [FILE], file type [IMAGE]
- URI [https://www.googleapis.com/drive/v3/files/redacted]
- File upload [152176 bytes], throughput: 1603513.534 bytes/sec

2024-12-24T03:54:09.818Z: Serverless compute
- Workflow [Feed] took 0:00:03.171730, used credits [0.00285522]

2024-12-24T03:53:56.459Z: GraphQL
- Operation took 0:00:00.234135, used credits [0.00000000]
- Request:
mutation CreateFeed($feed: FeedInput!, $correlationId: String) { createFeed(feed: $feed, correlationId: $correlationId) { id name state type } }
- Variables:
{"feed":"{ name: \"Google Drive\", type: SITE, site: { type: GOOGLE_DRIVE, googleDrive: { folderId: \"redacted\", refreshToken: \"redacted\", clientId: \"redacted\", clientSecret: \"redacted\" }, readLimit: 10 } }","correlationId":"\"2024-12-24T03:53:56.197879\""}
- Response:
{"data":{"createFeed":{"id":"90be0893-2384-49f6-8088-8d8fea97a81f","name":"Google Drive","state":"ENABLED","type":"SITE"}}}

