
Conversation

@jhamon (Collaborator) commented May 15, 2025

Problem

We need to expose a new endpoint for discovering the available inference models.

Solution

  • Regenerate code off the latest spec
  • Wire the new method up in the sync and async implementations of Inference
    • pc.inference.get_model
    • pc.inference.list_models
  • Adjust model_utils to be less fragile when unexpected values appear in enum fields
  • Implement new tests for these endpoints.
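The model_utils change itself is not shown in this description. As a hedged illustration of the "less fragile enum" idea, a lookup can fall back to the raw string when the server returns a value this client version doesn't know about (the names below are hypothetical, not the actual model_utils code):

```python
from enum import Enum

class VectorType(Enum):
    DENSE = "dense"
    SPARSE = "sparse"

def parse_enum(enum_cls, raw):
    """Return the matching enum member, or the raw value unchanged
    when the server sends something this client doesn't recognize."""
    try:
        return enum_cls(raw)
    except ValueError:
        return raw

parse_enum(VectorType, "dense")   # -> VectorType.DENSE
parse_enum(VectorType, "hybrid")  # -> "hybrid" (no crash on unknown values)
```

This keeps an older SDK working when the API starts returning enum values that didn't exist when the client was generated.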

Usage

from pinecone import Pinecone

pc = Pinecone()

models = pc.inference.list_models()
models[0]
# {
#     "model": "llama-text-embed-v2",
#     "short_description": "A high performance dense embedding model optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 2048 tokens) and dynamic embedding size (Matryoshka Embeddings).",
#     "type": "embed",
#     "supported_parameters": [
#         {
#             "parameter": "input_type",
#             "type": "one_of",
#             "value_type": "string",
#             "required": true,
#             "allowed_values": [
#                 "query",
#                 "passage"
#             ]
#         },
#         {
#             "parameter": "truncate",
#             "type": "one_of",
#             "value_type": "string",
#             "required": false,
#             "default": "END",
#             "allowed_values": [
#                 "END",
#                 "NONE",
#                 "START"
#             ]
#         },
#         {
#             "parameter": "dimension",
#             "type": "one_of",
#             "value_type": "integer",
#             "required": false,
#             "default": 1024,
#             "allowed_values": [
#                 384,
#                 512,
#                 768,
#                 1024,
#                 2048
#             ]
#         }
#     ],
#     "vector_type": "dense",
#     "default_dimension": 1024,
#     "modality": "text",
#     "max_sequence_length": 2048,
#     "max_batch_size": 96,
#     "provider_name": "NVIDIA",
#     "supported_metrics": [
#         "Cosine",
#         "DotProduct"
#     ],
#     "supported_dimensions": [
#         384,
#         512,
#         768,
#         1024,
#         2048
#     ]
# }
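The supported_parameters metadata above is enough to drive client-side validation before sending a request. A minimal sketch over plain dicts shaped like the output shown (the helper name is made up for illustration):

```python
def check_param(supported_parameters, name, value):
    """Validate a value against a model's supported_parameters entry."""
    for p in supported_parameters:
        if p["parameter"] == name:
            if p["type"] == "one_of" and value not in p["allowed_values"]:
                raise ValueError(f"{name} must be one of {p['allowed_values']}")
            return True
    raise KeyError(f"unknown parameter: {name}")

# Same shape as the "dimension" entry printed above
supported = [
    {"parameter": "dimension", "type": "one_of", "value_type": "integer",
     "required": False, "default": 1024,
     "allowed_values": [384, 512, 768, 1024, 2048]},
]

check_param(supported, "dimension", 512)  # -> True
```

Passing a dimension outside allowed_values raises a ValueError locally instead of a round-trip to the API.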

And the async equivalent:

import asyncio
from pinecone import PineconeAsyncio

async def main():
    async with PineconeAsyncio() as pc:
        models = await pc.inference.list_models()

asyncio.run(main())
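list_models() returns the full catalogue, so narrowing to a model type can happen client-side once the result is fetched. A sketch over plain dicts shaped like the output above (the sample list stands in for a real API response):

```python
def embed_models(models):
    """Keep only embedding models from a list_models-style result."""
    return [m for m in models if m.get("type") == "embed"]

sample = [
    {"model": "llama-text-embed-v2", "type": "embed"},
    {"model": "example-rerank-model", "type": "rerank"},
]

[m["model"] for m in embed_models(sample)]  # -> ["llama-text-embed-v2"]
```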

Type of Change

  • New feature (non-breaking change which adds functionality)

@jhamon changed the base branch from main to release-candidate/2025-04 May 15, 2025 16:37
@jhamon force-pushed the jhamon/list-models branch from 7087d07 to 27c4f08 May 15, 2025 18:15
@jhamon changed the title from Expose new pc.inference.list_models() to Expose new pc.inference.list_models() and pc.inference.get_model() May 15, 2025
@jhamon force-pushed the jhamon/list-models branch from 86c5f66 to c4c78b2 May 16, 2025 14:02
@jhamon marked this pull request as ready for review May 16, 2025 15:24
@jhamon merged commit c1688f6 into release-candidate/2025-04 May 16, 2025
71 of 72 checks passed
@jhamon deleted the jhamon/list-models branch May 16, 2025 15:24