<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2025_01_05_Query_Model_Cards.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to query the available LLMs, embedding models, and reranking models via the Graphlit API.

**Requirements**

Before running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
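
Outside Colab, `google.colab.userdata` is not importable. A minimal sketch of a fallback, assuming the same three names are also set as ordinary environment variables; `read_secret` is a hypothetical helper and not part of the Graphlit SDK:

```python
import os


def read_secret(name: str) -> str:
    """Return a credential from Colab secrets, or from the environment outside Colab."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret not defined or notebook access not granted: fall through.
            value = None

    # Fall back to a plain environment variable (assumption: set by the user).
    if not value:
        value = os.environ.get(name)
    if not value:
        raise KeyError(f"Missing credential: {name}")
    return value


# Usage, mirroring the initialization cell:
# os.environ['GRAPHLIT_JWT_SECRET'] = read_secret('GRAPHLIT_JWT_SECRET')
```

Raising early on a missing value makes a misconfigured secret visible before the client is constructed.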


---

Install the Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client



Initialize Graphlit

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [3]:
from typing import List, Optional
from graphlit_api import QueryModelsModelsResults

async def query_models():
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.query_models()

        return response.models.results if response.models is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None
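
The list returned by `query_models` mixes completion, embedding and reranking cards. A small sketch of grouping model names by type, assuming each card exposes `name` and `type` as they are used in `pretty_print_model`; `group_names_by_type` and the `SimpleNamespace` stand-in cards are illustrative only, not SDK objects:

```python
from collections import defaultdict
from types import SimpleNamespace


def group_names_by_type(cards):
    """Map each model type to the list of model names of that type."""
    groups = defaultdict(list)
    for card in cards or []:
        if card is None:
            continue
        # Enum members expose .value; plain strings are used as-is.
        key = str(getattr(card.type, "value", card.type))
        groups[key].append(card.name)
    return dict(groups)


# Stand-in cards for illustration only; real cards come from query_models().
sample_cards = [
    SimpleNamespace(name="voyage-3", type="TEXT_EMBEDDING"),
    SimpleNamespace(name="mistral-embed", type="TEXT_EMBEDDING"),
    SimpleNamespace(name="command-r", type="COMPLETION"),
    None,
]
grouped = group_names_by_type(sample_cards)
```

With real results, the same call would be `group_names_by_type(await query_models())`.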


In [20]:
def pretty_print_model(card: QueryModelsModelsResults) -> str:
    # Safely unpack features lists (fall back to empty list if None)
    features = card.features
    metadata = card.metadata

    key_features = features.key_features if features is not None and features.key_features is not None else []
    strengths = features.strengths if features is not None and features.strengths is not None else []
    potential_use_cases = features.use_cases if features is not None and features.use_cases is not None else []

    # Determine maximum rows needed for the Features table
    max_feature_rows = max(len(key_features), len(strengths), len(potential_use_cases))

    # Build the Features table row by row
    feature_rows = []
    for i in range(max_feature_rows):
        key_col = f"- {key_features[i]}" if i < len(key_features) else ""
        strengths_col = f"- {strengths[i]}" if i < len(strengths) else ""
        potential_col = f"- {potential_use_cases[i]}" if i < len(potential_use_cases) else ""
        feature_rows.append(f"| {key_col} | {strengths_col} | {potential_col} |")

    if len(feature_rows) > 0:
        # Join all rows into a single string with the header
        features_table = (
            "| **Key Features**          | **Strengths**              | **Potential Use Cases**       |\n"
            "|---------------------------|----------------------------|--------------------------------|\n"
            + "\n".join(feature_rows)
        )
    else:
        features_table = "n/a"

    # Prepare data for the Metadata table
    metadata_table_rows = [
        f"| **Multilingual**            | {'Yes' if metadata and metadata.multilingual else 'No'} |",
        f"| **Multimodal**              | {'Yes' if metadata and metadata.multimodal else 'No'} |",
    ]

    if metadata is not None:
        if metadata.knowledge_cutoff:
            metadata_table_rows.append(f"| **Knowledge Cutoff**        | {metadata.knowledge_cutoff} |")
        if metadata.prompt_cost_per_million:
            metadata_table_rows.append(f"| **Prompt Cost per Million** | `${metadata.prompt_cost_per_million:,.2f}` |")
        if metadata.completion_cost_per_million:
            metadata_table_rows.append(f"| **Completion Cost per Million** | `${metadata.completion_cost_per_million:,.2f}` |")
        if metadata.embeddings_cost_per_million:
            metadata_table_rows.append(f"| **Embedding Cost per Million** | `${metadata.embeddings_cost_per_million:,.2f}` |")
        if metadata.reranking_cost_per_million:
            metadata_table_rows.append(f"| **Reranking Cost per Million** | `${metadata.reranking_cost_per_million:,.2f}` |")
        if metadata.context_window_tokens is not None:
            metadata_table_rows.append(f"| **Context Window Tokens**   | {metadata.context_window_tokens:,} |")
        if metadata.max_output_tokens is not None:
            metadata_table_rows.append(f"| **Max Output Tokens**       | {metadata.max_output_tokens:,} |")

    metadata_table = (
        "| **Property**               | **Value**                    |\n"
        "|----------------------------|------------------------------|\n"
        + "\n".join(metadata_table_rows)
    )

    # Construct the final Markdown
    markdown = f"""# {card.name}

### Model Type: {card.type}
**Model:** {card.model_type}.{card.model}

**Description:** {card.description}

**Model Card URI:** {card.uri}

**Available On:** {', '.join(card.available_on) if card.available_on else "N/A"}

---

## Features
{features_table}

---

## Metadata
{metadata_table}

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.
"""

    return markdown.strip()
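
The index-based padding of the three feature columns can also be written with `itertools.zip_longest`. A sketch of that alternative, not the notebook's own code; `feature_rows_zipped` is a hypothetical name:

```python
from itertools import zip_longest


def feature_rows_zipped(key_features, strengths, use_cases):
    """Build Markdown table rows, padding shorter columns with empty cells."""
    rows = []
    for cells in zip_longest(key_features, strengths, use_cases, fillvalue=None):
        # A missing entry becomes an empty cell; present entries get a bullet.
        formatted = [f"- {cell}" if cell is not None else "" for cell in cells]
        rows.append("| " + " | ".join(formatted) + " |")
    return rows


rows = feature_rows_zipped(["A", "B"], ["S"], [])
```

`zip_longest` stops at the longest input, so no explicit `max(len(...))` bookkeeping is needed.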


Execute Graphlit example

In [21]:
from IPython.display import display, Markdown

models = await query_models()

if models is not None:
    for model in models:
        if model is not None:
            #print(pretty_print_model(model))
            display(Markdown(pretty_print_model(model)))

            print("---")
            print()
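
The per-million prices on each card translate into a request cost by scaling the token counts. A rough sketch, assuming prices are USD per one million tokens as the note on each card states; `estimate_cost_usd` is a hypothetical helper and the token counts are made up:

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      prompt_cost_per_million, completion_cost_per_million):
    """Estimate a request's cost from token counts and per-million prices."""
    # Treat a missing price (None) as zero rather than failing.
    prompt_cost = prompt_tokens / 1_000_000 * (prompt_cost_per_million or 0.0)
    completion_cost = completion_tokens / 1_000_000 * (completion_cost_per_million or 0.0)
    return prompt_cost + completion_cost


# Example using prices of $2.00 (prompt) and $6.00 (completion) per million tokens.
cost = estimate_cost_usd(1_000_000, 500_000, 2.00, 6.00)
```

Actual billing may differ by platform and configuration, so this is only an order-of-magnitude check.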

# jina-clip-v2

### Model Type: MULTIMODAL_EMBEDDING
**Model:** JinaModels.CLIP_Image

**Description:** The jina-clip-v2 model is a state-of-the-art CLIP-style model designed to handle both text and image data, offering multilingual support for 89 languages. It excels in high-resolution image processing with a resolution of 512x512 and employs Matryoshka representation learning for efficient truncated embeddings. This model is ideal for applications requiring robust multimodal capabilities, such as search and retrieval tasks across diverse languages and formats.

**Model Card URI:** https://huggingface.co/jinaai/jina-clip-v2

**Available On:** AWS SageMaker, Microsoft Azure, Google Cloud

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Multilingual support for 89 languages | - Handles both text and image data | - Search and retrieval tasks |
| - High image resolution at 512x512 | - Supports high-resolution image processing | - Multimodal applications |
| - Matryoshka representation learning | - Efficient truncated embeddings | - Cross-lingual tasks |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | Yes |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# jina-embeddings-v3

### Model Type: TEXT_EMBEDDING
**Model:** JinaModels.Embed_3_0

**Description:** The jina-embeddings-v3 model is a cutting-edge multilingual text embedding solution, boasting 570 million parameters and an impressive 8192 token-length capacity. It surpasses leading proprietary models from OpenAI and Cohere in performance on the MTEB benchmark, making it an excellent choice for applications requiring extensive text analysis and high accuracy in multilingual contexts.

**Model Card URI:** https://huggingface.co/jinaai/jina-embeddings-v3

**Available On:** AWS SageMaker, Microsoft Azure, Google Cloud

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - 570M parameters | - High performance in multilingual contexts | - Multilingual text analysis |
| - 8192 token-length capacity | - Extensive text analysis capabilities | - High-accuracy applications |
| - Outperforms OpenAI and Cohere on MTEB |  | - Large-scale text processing |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | No |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-3-large

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_3_0_Large

**Description:** The Voyage-3-Large model is a robust text embedding solution designed to handle large-scale text data efficiently. It offers a generous free tier of 200 million tokens, making it an economical choice for businesses looking to leverage text embeddings without incurring high costs. With a competitive pricing structure of $0.18 per million tokens, it is ideal for applications requiring extensive text processing capabilities.

**Model Card URI:** None

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-3

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_3_0

**Description:** Voyage-3 is a versatile text embedding model that provides efficient processing of text data. It offers a substantial free tier of 200 million tokens, making it accessible for various applications. Priced at $0.06 per million tokens, it is suitable for projects that require cost-effective text embedding solutions.

**Model Card URI:** https://blog.voyageai.com/2024/09/18/voyage-3/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-3-lite

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_Lite_3_0

**Description:** Voyage-3-Lite is an economical text embedding model designed for lightweight applications. It offers a generous free tier of 200 million tokens and is priced at just $0.02 per million tokens, making it an excellent choice for budget-conscious projects that still require reliable text embedding capabilities.

**Model Card URI:** https://blog.voyageai.com/2024/09/18/voyage-3/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-code-3

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_Code_3_0

**Description:** Voyage-Code-3 is a specialized text embedding model tailored for code and programming-related text. It provides a substantial free tier of 200 million tokens and is priced at $0.18 per million tokens, making it a valuable tool for developers and organizations working with large volumes of code data.

**Model Card URI:** https://blog.voyageai.com/2024/12/04/voyage-code-3/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-finance-2

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_Finance_2_0

**Description:** Voyage-Finance-2 is a text embedding model optimized for financial data. It offers a free tier of 50 million tokens and is priced at $0.12 per million tokens, making it suitable for financial institutions and analysts who need to process large amounts of financial text data efficiently.

**Model Card URI:** https://blog.voyageai.com/2024/06/03/domain-specific-embeddings-finance-edition-voyage-finance-2/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-law-2

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_Law_2_0

**Description:** Voyage-Law-2 is a text embedding model designed for legal text processing. It provides a free tier of 50 million tokens and is priced at $0.12 per million tokens, making it an ideal choice for legal professionals and firms that require efficient text embedding solutions for legal documents.

**Model Card URI:** https://blog.voyageai.com/2024/04/15/domain-specific-embeddings-and-retrieval-legal-edition-voyage-law-2/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# voyage-code-2

### Model Type: TEXT_EMBEDDING
**Model:** VoyageModels.Voyage_Code_2_0

**Description:** Voyage-Code-2 is a text embedding model focused on code and programming text. It offers a free tier of 50 million tokens and is priced at $0.12 per million tokens, making it a cost-effective solution for developers and tech companies dealing with code data.

**Model Card URI:** https://blog.voyageai.com/2024/01/23/voyage-code-2-elevate-your-code-retrieval/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 32,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# mistral-large-latest

### Model Type: COMPLETION
**Model:** MistralModels.Mistral_Large

**Description:** Mistral Large is a premier model designed for high-complexity reasoning tasks. Released in November 2024, it represents the pinnacle of Mistral's model offerings, providing exceptional performance for demanding applications. With a maximum token capacity of 128k, it is well-suited for tasks requiring extensive context and detailed analysis.

**Model Card URI:** https://mistral.ai/news/mistral-large/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$2.00` |
| **Completion Cost per Million** | `$6.00` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# pixtral-large-latest

### Model Type: COMPLETION
**Model:** MistralModels.Pixtral_Large

**Description:** Pixtral Large is a cutting-edge multimodal model from Mistral, released in November 2024. It is designed to handle both text and image inputs, making it ideal for applications that require a comprehensive understanding of diverse data types. With a large context window of 128k tokens, it is capable of processing complex multimodal tasks efficiently.

**Model Card URI:** https://mistral.ai/news/pixtral-large/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Prompt Cost per Million** | `$2.00` |
| **Completion Cost per Million** | `$6.00` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# mistral-small-latest

### Model Type: COMPLETION
**Model:** MistralModels.Mistral_Small

**Description:** Mistral Small is an enterprise-grade model designed for tasks that require a compact yet powerful solution. Released in September 2024, it offers a maximum token capacity of 32k, making it suitable for applications that need efficient processing without compromising on performance. This model is ideal for businesses looking for a reliable and scalable solution.

**Model Card URI:** https://mistral.ai/news/september-24-release/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.20` |
| **Completion Cost per Million** | `$0.60` |
| **Context Window Tokens**   | 32,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# mistral-embed

### Model Type: TEXT_EMBEDDING
**Model:** MistralModels.Mistral_Embed

**Description:** Mistral Embed is a leading text embedding model designed to extract semantic representations from text. It is ideal for applications that require high-quality text embeddings, such as search and information retrieval. With a context window of 8k tokens, it provides efficient and accurate text representation capabilities.

**Model Card URI:** None

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# pixtral-12b-2409

### Model Type: COMPLETION
**Model:** MistralModels.Pixtral_12b_2409

**Description:** Pixtral is a versatile 12B model that excels in both text and image understanding. It is designed for applications that require a comprehensive approach to multimodal data processing. With a large context window of 128k tokens, Pixtral is capable of handling complex tasks that involve both textual and visual data.

**Model Card URI:** https://mistral.ai/news/pixtral-12b/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Prompt Cost per Million** | `$0.15` |
| **Completion Cost per Million** | `$0.15` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# open-mistral-nemo

### Model Type: COMPLETION
**Model:** MistralModels.Mistral_Nemo

**Description:** Mistral Nemo is a premier multilingual model released in July 2024, designed to handle a wide range of languages with high proficiency. It is ideal for applications that require robust multilingual capabilities, offering a maximum token capacity of 128k. This model is perfect for global applications that need to process diverse linguistic data efficiently.

**Model Card URI:** https://mistral.ai/news/mistral-nemo/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.15` |
| **Completion Cost per Million** | `$0.15` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# open-mixtral-8x7b

### Model Type: COMPLETION
**Model:** MistralModels.Mixtral_8x7b_Instruct

**Description:** Mixtral 8x7B is an advanced sparse mixture-of-experts model from Mistral, released in December 2023. It is designed to provide high efficiency and performance for complex tasks. With a maximum token capacity of 32k, it is ideal for applications that require a sophisticated approach to data processing.

**Model Card URI:** https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.70` |
| **Completion Cost per Million** | `$0.70` |
| **Context Window Tokens**   | 32,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama3.1-8b

### Model Type: COMPLETION
**Model:** CerebrasModels.Llama_3_1_8b

**Description:** Llama 3.1 8B is a powerful AI model developed by Cerebras, leveraging the capabilities of the Cerebras Wafer-Scale Engines and CS-3 systems. This model is instruction-tuned, making it ideal for conversational applications. With 8 billion parameters and a context length of 8192 tokens, it provides developers with a robust tool for creating responsive and intelligent chatbots. The model's training on over 15 trillion tokens ensures a comprehensive understanding of language, making it suitable for a wide range of conversational tasks.

**Model Card URI:** https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - Conversational applications |
|  |  | - Chatbots |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-03-23 |
| **Prompt Cost per Million** | `$0.10` |
| **Completion Cost per Million** | `$0.10` |
| **Context Window Tokens**   | 8,192 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.3-70b

### Model Type: COMPLETION
**Model:** CerebrasModels.Llama_3_3_70b

**Description:** Llama 3.3 70B is an advanced AI model provided by Cerebras, utilizing the high-speed capabilities of the Cerebras Wafer-Scale Engines and CS-3 systems. This model is instruction-tuned for optimal performance in conversational applications. With a massive 70 billion parameters and a context length of 8192 tokens, it is designed to handle complex conversational tasks with ease. The extensive training on over 15 trillion tokens ensures a deep understanding of language, making it a valuable asset for developers looking to build sophisticated chatbots and conversational agents.

**Model Card URI:** https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - Conversational applications |
|  |  | - Chatbots |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-12-23 |
| **Prompt Cost per Million** | `$0.60` |
| **Completion Cost per Million** | `$0.60` |
| **Context Window Tokens**   | 8,192 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# command-r-plus

### Model Type: COMPLETION
**Model:** CohereModels.Command_R_Plus

**Description:** Command R+ is a state-of-the-art large language model designed to handle complex enterprise use cases with high scalability and performance. It is optimized for real-world applications, making it ideal for businesses looking to integrate advanced AI capabilities into their operations. With its robust architecture, Command R+ can efficiently process large volumes of data, providing accurate and insightful outputs that enhance decision-making and operational efficiency.

**Model Card URI:** https://docs.cohere.com/v2/docs/responsible-use

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - Enterprise applications |
|  |  | - Scalable AI solutions |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$2.50` |
| **Completion Cost per Million** | `$10.00` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# command-r

### Model Type: COMPLETION
**Model:** CohereModels.Command_R

**Description:** Command R is a versatile generative model tailored for tasks requiring long context understanding, such as retrieval-augmented generation and integration with external APIs. It is designed to handle complex queries and provide coherent and contextually relevant responses, making it a valuable tool for developers and businesses seeking to enhance their AI-driven applications.

**Model Card URI:** https://docs.cohere.com/v2/docs/responsible-use

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - Long context tasks |
|  |  | - Retrieval-augmented generation |
|  |  | - API integration |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$1.50` |
| **Completion Cost per Million** | `$6.00` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# command-r7b-12-2024

### Model Type: COMPLETION
**Model:** CohereModels.Command_R7B_202412

**Description:** Command R7B is a compact yet powerful generative model that excels in speed and efficiency, making it ideal for developers looking to build high-performance AI applications. Despite its smaller size, it delivers quality outputs, ensuring that applications run smoothly and effectively without compromising on performance.

**Model Card URI:** https://docs.cohere.com/v2/docs/responsible-use

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - High-speed AI applications |
|  |  | - Efficient AI solutions |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.38` |
| **Completion Cost per Million** | `$1.50` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 4,096 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# embed-english-v3.0

### Model Type: MULTIMODAL_EMBEDDING
**Model:** CohereModels.Embed_English_3_0

**Description:** Embed 3 is a premier multimodal embedding model designed to serve as an intelligent retrieval engine for semantic search and retrieval-augmented generation systems. It supports both text and image embeddings, making it a versatile tool for developers looking to enhance their applications with advanced search and retrieval capabilities. With its ability to handle multiple modalities, Embed 3 provides comprehensive solutions for complex data environments.

**Model Card URI:** https://docs.cohere.com/v2/docs/cohere-embed

**Available On:** N/A

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
|  |  | - Semantic search |
|  |  | - Retrieval-augmented generation |
|  |  | - Multimodal applications |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Context Window Tokens**   | 512 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gemini-2.0-flash-exp

### Model Type: COMPLETION
**Model:** GoogleModels.Gemini_2_0_Flash_Experimental

**Description:** Gemini 2.0 Flash is Google's latest experimental multimodal model, designed to deliver next-generation features and capabilities. It supports a wide range of inputs including audio, images, video, and text, and provides text outputs, with plans to support image and audio outputs in the future. The model is optimized for speed and multimodal generation, making it suitable for a diverse variety of tasks. With a massive 1 million token context window, it is ideal for applications requiring extensive data processing. However, as an experimental model, it is recommended for exploratory testing and not for production use.

**Model Card URI:** https://ai.google.dev/gemini-api/docs/models/gemini-v2

**Available On:** Google AI Studio

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Supports input from audio, images, video, and text | - Superior speed | - Exploratory testing |
| - Outputs text, with image and audio outputs coming soon | - Multimodal generation capabilities | - Prototyping applications requiring multimodal input |
| - 1 million token context window |  |  |
| - Optimized for speed and multimodal generation |  |  |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2025-08-24 |
| **Context Window Tokens**   | 1,048,576 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gemini-1.5-flash

### Model Type: COMPLETION
**Model:** GoogleModels.Gemini_1_5_Flash

**Description:** Gemini 1.5 Flash is a highly versatile multimodal model from Google, optimized for fast performance across a wide range of tasks. It accepts inputs from audio, images, video, and text, and provides text outputs, making it suitable for applications that require diverse data processing capabilities. With a large context window and support for various system instructions and JSON modes, it is ideal for developers looking to balance performance and cost in their applications.

**Model Card URI:** https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf

**Available On:** Google AI Studio

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Supports input from audio, images, video, and text | - Fast performance | - Applications requiring diverse data processing |
| - Outputs text | - Versatile task handling | - Balancing performance and cost |
| - Large context window |  |  |
| - Versatile performance across diverse tasks |  |  |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2025-05-24 |
| **Context Window Tokens**   | 1,048,576 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gemini-1.5-flash-8b

### Model Type: COMPLETION
**Model:** GoogleModels.Gemini_1_5_Flash_8b

**Description:** Gemini 1.5 Flash-8B is a smaller variant of the Gemini 1.5 series, tailored for handling high volume tasks that require less computational intelligence. It supports a wide range of inputs including audio, images, video, and text, and provides text outputs. This model is ideal for applications that need to process large amounts of data quickly and efficiently, without the need for complex reasoning capabilities.

**Model Card URI:** https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf

**Available On:** Google AI Studio

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Supports input from audio, images, video, and text | - Efficient data processing | - Applications with high data volume |
| - Outputs text | - Handles high volume tasks | - Tasks requiring less computational intelligence |
| - Optimized for high volume tasks |  |  |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2025-05-24 |
| **Prompt Cost per Million** | `$0.04` |
| **Completion Cost per Million** | `$0.15` |
| **Context Window Tokens**   | 1,048,576 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gemini-1.5-pro

### Model Type: COMPLETION
**Model:** GoogleModels.Gemini_1_5_Pro

**Description:** Gemini 1.5 Pro is a robust multimodal model from Google, designed to handle complex reasoning tasks across various data types. It accepts inputs from audio, images, video, and text, and provides text outputs, making it suitable for applications that require in-depth data analysis and processing. With a larger context window and support for extensive data inputs, it is ideal for developers looking to implement sophisticated reasoning capabilities in their applications.

**Model Card URI:** https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf

**Available On:** Google AI Studio

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Supports input from audio, images, video, and text | - Handles complex reasoning | - Applications requiring sophisticated reasoning |
| - Outputs text | - Supports extensive data inputs | - In-depth data analysis |
| - Optimized for complex reasoning tasks |  |  |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2025-05-24 |
| **Prompt Cost per Million** | `$1.25` |
| **Completion Cost per Million** | `$5.00` |
| **Context Window Tokens**   | 2,097,152 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---
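The token limits in the metadata table above can be turned into a simple pre-flight check. The following is a minimal sketch, not part of the Graphlit SDK: `fits_context` is a hypothetical helper, and it assumes that prompt and output tokens share the context window, which is a conservative simplification that may not match how every provider counts tokens.

```python
def fits_context(prompt_tokens: int, requested_output_tokens: int,
                 context_window_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request respects both limits from a model card.

    Assumption: prompt and output tokens share the context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The requested output may never exceed the model's output cap.
    if requested_output_tokens > max_output_tokens:
        return False
    return prompt_tokens + requested_output_tokens <= context_window_tokens


# Limits copied from the gemini-1.5-pro metadata table above.
CONTEXT_WINDOW = 2_097_152
MAX_OUTPUT = 8_192

print(fits_context(2_000_000, 8_192, CONTEXT_WINDOW, MAX_OUTPUT))
print(fits_context(2_095_000, 8_192, CONTEXT_WINDOW, MAX_OUTPUT))
print(fits_context(1_000, 10_000, CONTEXT_WINDOW, MAX_OUTPUT))
```

The first call fits, the second overflows the window once the output is added, and the third is rejected because it asks for more output than the card allows.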



# text-embedding-004

### Model Type: TEXT_EMBEDDING
**Model:** GoogleModels.Embedding_004

**Description:** The Text Embedding model from Google is designed to measure the relatedness of text strings, making it an essential tool for applications that require semantic understanding and text similarity analysis. With its optimization for creating embeddings with 768 dimensions, it offers superior retrieval performance, making it ideal for AI applications that rely on text embeddings for enhanced data processing and analysis.

**Model Card URI:** https://ai.google.dev/gemini-api/docs/embeddings

**Available On:** Google AI Studio

---

## Features
| **Key Features**          | **Strengths**              | **Potential Use Cases**       |
|---------------------------|----------------------------|--------------------------------|
| - Measures relatedness of text strings | - Superior retrieval performance | - Semantic understanding applications |
| - Optimized for 768-dimensional embeddings |  | - Text similarity analysis |

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | Yes |
| **Multimodal**              | No |
| **Context Window Tokens**   | 2,048 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---
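The card above describes the model as measuring the relatedness of text strings with 768-dimensional embeddings. One common way to score relatedness between two embedding vectors is cosine similarity; the sketch below illustrates it in plain Python. The vectors are placeholders chosen for illustration, not output from the model, and the card itself does not state which similarity measure should be used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


DIMENSIONS = 768  # embedding size stated in the card above

# Placeholder vectors standing in for real embeddings (not model output).
v1 = [1.0] * DIMENSIONS
v2 = [2.0] * DIMENSIONS               # same direction as v1
v3 = [1.0, -1.0] * (DIMENSIONS // 2)  # orthogonal to v1

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Vectors pointing in the same direction score close to 1, while orthogonal vectors score 0, regardless of their magnitudes.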



# claude-3.5-sonnet

### Model Type: COMPLETION
**Model:** AnthropicModels.Claude_3_5_Sonnet

**Description:** Claude 3.5 Sonnet is Anthropic's most advanced model, designed to handle complex tasks with a large 200K context window. It offers flexible pricing options, including a 50% discount with the Batches API, making it suitable for a wide range of applications. This model is ideal for users who require high intelligence and extensive context handling capabilities.

**Model Card URI:** https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-04-24 |
| **Prompt Cost per Million** | `$3.00` |
| **Completion Cost per Million** | `$15.00` |
| **Context Window Tokens**   | 200,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---
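The per-million-token prices in the metadata table above, together with the 50% Batches API discount mentioned in the description, are enough to estimate the cost of a single request. This is a rough sketch under stated assumptions: `estimate_cost_usd` is a hypothetical helper, the token counts are made up for illustration, and actual billing may differ by platform and configuration.

```python
from decimal import Decimal


def estimate_cost_usd(prompt_tokens, completion_tokens,
                      prompt_cost_per_million, completion_cost_per_million,
                      discount="0"):
    """Estimate the USD cost of one request from per-million-token prices.

    `discount` is a fraction between 0 and 1 (for example "0.5" for 50% off).
    """
    million = Decimal(1_000_000)
    base = (Decimal(prompt_tokens) * Decimal(prompt_cost_per_million)
            + Decimal(completion_tokens) * Decimal(completion_cost_per_million)) / million
    return base * (Decimal(1) - Decimal(discount))


# Prices from the table above: $3.00 per million prompt tokens,
# $15.00 per million completion tokens. Token counts are illustrative.
standard = estimate_cost_usd(10_000, 1_000, "3.00", "15.00")
batched = estimate_cost_usd(10_000, 1_000, "3.00", "15.00", discount="0.5")
print(standard, batched)
```

For 10,000 prompt tokens and 1,000 completion tokens this comes to $0.045 at list price and $0.0225 with the batch discount. `Decimal` is used instead of `float` so that the currency arithmetic stays exact.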



# claude-3.5-haiku

### Model Type: COMPLETION
**Model:** AnthropicModels.Claude_3_5_Haiku

**Description:** Claude 3.5 Haiku is designed for speed and cost-effectiveness, offering a 200K context window and optimized latency for faster inference. It is particularly suitable for applications where quick response times and cost efficiency are critical. The model is available with a 50% discount through the Batches API, making it an attractive option for budget-conscious users.

**Model Card URI:** https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf

**Available On:** Amazon Bedrock

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-07-24 |
| **Prompt Cost per Million** | `$1.00` |
| **Completion Cost per Million** | `$5.00` |
| **Context Window Tokens**   | 200,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# claude-3-opus

### Model Type: COMPLETION
**Model:** AnthropicModels.Claude_3_Opus

**Description:** Claude 3 Opus is a robust model tailored for handling complex tasks, featuring a substantial 200K context window. It is designed for users who need a powerful tool to manage intricate and demanding applications. The model's pricing structure includes options for input, prompt caching, and output, providing flexibility for various use cases.

**Model Card URI:** https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-08-23 |
| **Prompt Cost per Million** | `$15.00` |
| **Completion Cost per Million** | `$75.00` |
| **Context Window Tokens**   | 200,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# claude-3-haiku

### Model Type: COMPLETION
**Model:** AnthropicModels.Claude_3_Haiku

**Description:** Claude 3 Haiku is optimized for speed and cost-effectiveness, featuring a 200K context window. It is ideal for applications that require rapid processing and budget-friendly solutions. The model offers a 50% discount with the Batches API, making it a practical choice for users looking to maximize efficiency and minimize costs.

**Model Card URI:** https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-08-23 |
| **Prompt Cost per Million** | `$0.25` |
| **Completion Cost per Million** | `$1.25` |
| **Context Window Tokens**   | 200,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# claude-3-sonnet

### Model Type: COMPLETION
**Model:** AnthropicModels.Claude_3_Sonnet

**Description:** Claude 3 Sonnet strikes a balance between speed, cost, and performance, offering a 200K context window. It is suitable for users who need a well-rounded model that can handle a variety of tasks efficiently. The model's pricing is structured to provide value while maintaining high performance standards.

**Model Card URI:** https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model-Card.pdf

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-08-23 |
| **Prompt Cost per Million** | `$3.00` |
| **Completion Cost per Million** | `$15.00` |
| **Context Window Tokens**   | 200,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gpt-4o

### Model Type: COMPLETION
**Model:** OpenAIModels.GPT4o_128k

**Description:** GPT-4o is a cutting-edge multimodal model from OpenAI, designed to be faster and more cost-effective than its predecessor, GPT-4 Turbo. It boasts enhanced vision capabilities, making it ideal for applications requiring advanced image and text processing. With a substantial context window of 128K tokens and a knowledge cutoff in October 2023, GPT-4o is well-suited for complex tasks that demand a deep understanding of both visual and textual data. This model is perfect for developers looking to integrate sophisticated AI capabilities into their applications, offering a balance of performance and cost-efficiency.

**Model Card URI:** https://openai.com/index/gpt-4o-system-card/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2023-10 |
| **Prompt Cost per Million** | `$2.50` |
| **Completion Cost per Million** | `$10.00` |
| **Context Window Tokens**   | 128,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# gpt-4o-mini

### Model Type: COMPLETION
**Model:** OpenAIModels.GPT4o_Mini_128k

**Description:** GPT-4o Mini is a compact yet powerful model from OpenAI, designed to offer cost-effective AI solutions with enhanced vision capabilities. It surpasses the performance of GPT-3.5 Turbo while maintaining a lower cost, making it an excellent choice for developers seeking efficient AI models for applications that require both text and image processing. This model is ideal for projects with budget constraints but still demand high-quality AI performance.

**Model Card URI:** https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2025-10-23 |
| **Prompt Cost per Million** | `$0.15` |
| **Completion Cost per Million** | `$0.60` |
| **Context Window Tokens**   | 128,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# o1

### Model Type: COMPLETION
**Model:** OpenAIModels.O1_200k

**Description:** OpenAI o1 is a highly advanced reasoning model, offering robust support for tools, structured outputs, and vision capabilities. With a large context window of 200K tokens and a knowledge cutoff in October 2023, it is designed for complex reasoning tasks that require deep analytical capabilities. This model is perfect for developers looking to implement sophisticated AI solutions that can handle intricate problem-solving and data analysis tasks.

**Model Card URI:** https://openai.com/index/openai-o1-system-card/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | Yes |
| **Knowledge Cutoff**        | 2023-10 |
| **Prompt Cost per Million** | `$15.00` |
| **Completion Cost per Million** | `$60.00` |
| **Context Window Tokens**   | 200,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# o1-mini

### Model Type: COMPLETION
**Model:** OpenAIModels.O1_Mini_128k

**Description:** OpenAI o1 Mini is a streamlined version of the o1 model, optimized for speed and efficiency in coding and mathematical tasks. It offers a faster processing capability while maintaining the robust reasoning features of its larger counterpart. This model is ideal for developers who need a quick and efficient AI solution for technical applications, particularly in the fields of software development and mathematical computations.

**Model Card URI:** https://openai.com/index/openai-o1-mini-advancing-cost-efficient-reasoning/

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Knowledge Cutoff**        | 2025-10-23 |
| **Prompt Cost per Million** | `$3.00` |
| **Completion Cost per Million** | `$12.00` |
| **Context Window Tokens**   | 128,000 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# text-embedding-3-small

### Model Type: TEXT_EMBEDDING
**Model:** OpenAIModels.Embedding_3_Small

**Description:** Text Embedding 3 Small is a specialized model from OpenAI designed to enhance search, clustering, topic modeling, and classification tasks. It provides a cost-effective solution for embedding needs, making it suitable for applications that require efficient text processing and analysis. This model is perfect for developers looking to implement advanced text analytics in their applications without incurring high costs.

**Model Card URI:** https://platform.openai.com/docs/models#embeddings

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# text-embedding-3-large

### Model Type: TEXT_EMBEDDING
**Model:** OpenAIModels.Embedding_3_Large

**Description:** Text Embedding 3 Large is a robust model from OpenAI, tailored for advanced text embedding tasks such as search, clustering, topic modeling, and classification. It offers enhanced capabilities for handling large-scale text data, making it ideal for applications that require comprehensive text analysis and processing. This model is suitable for developers who need powerful embedding solutions for complex text analytics.

**Model Card URI:** https://platform.openai.com/docs/models#embeddings

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# text-embedding-ada-002

### Model Type: TEXT_EMBEDDING
**Model:** OpenAIModels.Ada_002

**Description:** Ada v2 is an advanced text embedding model from OpenAI, designed to facilitate sophisticated search, clustering, topic modeling, and classification tasks. It provides a balanced solution for embedding needs, offering both performance and cost-efficiency. This model is ideal for developers seeking to integrate advanced text analytics into their applications, ensuring high-quality results without excessive costs.

**Model Card URI:** https://platform.openai.com/docs/models#embeddings

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Context Window Tokens**   | 8,191 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# mixtral-8x7b-32768

### Model Type: COMPLETION
**Model:** GroqModels.Mixtral_8x7b_Instruct

**Description:** The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.

**Model Card URI:** https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.24` |
| **Completion Cost per Million** | `$0.24` |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.3-70b-versatile

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_3_70b

**Description:** The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.59` |
| **Completion Cost per Million** | `$0.79` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 32,768 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.2-90b-vision-preview

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_2_90b_Vision_Preview

**Description:** The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://huggingface.co/meta-llama/Llama-3.2-1B

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.90` |
| **Completion Cost per Million** | `$0.90` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.2-11b-vision-preview

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_2_11b_Vision_Preview

**Description:** The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://huggingface.co/meta-llama/Llama-3.2-1B

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.18` |
| **Completion Cost per Million** | `$0.18` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.2-3b-preview

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_2_3b_Preview

**Description:** The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://huggingface.co/meta-llama/Llama-3.2-1B

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.06` |
| **Completion Cost per Million** | `$0.06` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.2-1b-preview

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_2_1b_Preview

**Description:** The Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://huggingface.co/meta-llama/Llama-3.2-1B

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.04` |
| **Completion Cost per Million** | `$0.04` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama-3.1-8b-instant

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_1_8b

**Description:** The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

**Model Card URI:** https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.05` |
| **Completion Cost per Million** | `$0.08` |
| **Context Window Tokens**   | 128,000 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama3-70b-8192

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_70b

**Description:** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. Llama 3 uses a tokenizer with a vocabulary of 128K tokens, and was trained on sequences of 8,192 tokens.

**Model Card URI:** https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.59` |
| **Completion Cost per Million** | `$0.79` |
| **Context Window Tokens**   | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# llama3-8b-8192

### Model Type: COMPLETION
**Model:** GroqModels.Llama_3_8b

**Description:** Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. Llama 3 uses a tokenizer with a vocabulary of 128K tokens, and was trained on sequences of 8,192 tokens.

**Model Card URI:** https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.05` |
| **Completion Cost per Million** | `$0.08` |
| **Context Window Tokens**   | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---



# deepseek-chat

### Model Type: COMPLETION
**Model:** DeepseekModels.Chat

**Description:** We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

**Model Card URI:** https://huggingface.co/deepseek-ai/DeepSeek-V3

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.27` |
| **Completion Cost per Million** | `$1.10` |
| **Context Window Tokens**   | 65,536 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---
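The description above gives 671B total parameters with 37B activated per token. As a small worked example of what that means for a Mixture-of-Experts model, the snippet below computes the share of parameters that is active for any single token. Only the two figures come from the card; the variable names are illustrative.

```python
# Figures quoted in the deepseek-chat description above (in billions).
total_params_b = 671
active_params_b = 37

# Share of the model's parameters used to process each token.
active_fraction = active_params_b / total_params_b
print(f"{active_fraction:.1%}")  # prints 5.5%
```

Roughly one parameter in eighteen takes part in each forward pass, which is how the model aims for efficient inference despite its total size.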



# deepseek-coder

### Model Type: COMPLETION
**Model:** DeepseekModels.Coder

**Description:** We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens.

**Model Card URI:** https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct

**Available On:** N/A

---

## Features
n/a

---

## Metadata
| **Property**               | **Value**                    |
|----------------------------|------------------------------|
| **Multilingual**            | No |
| **Multimodal**              | No |
| **Prompt Cost per Million** | `$0.05` |
| **Completion Cost per Million** | `$0.08` |
| **Context Window Tokens**   | 65,536 |
| **Max Output Tokens**       | 8,192 |

---

> **Note**: All costs are in USD. Token limits and costs may vary depending on platform and configuration.

---
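Once the cards are available, a typical next step is to shortlist models by price and context size. The sketch below does this over a handful of values copied by hand from the metadata tables in this output. The dictionary layout and the `cheapest_first` helper are hypothetical and do not reflect the shape of the Graphlit API response, and summing prompt and completion prices is only one possible ranking.

```python
from decimal import Decimal

# Values copied from the metadata tables in this output. The dict layout is
# a hypothetical stand-in, not the shape of the Graphlit API response.
CARDS = [
    {"name": "gemini-1.5-flash-8b", "prompt": Decimal("0.04"),
     "completion": Decimal("0.15"), "context": 1_048_576},
    {"name": "claude-3-haiku", "prompt": Decimal("0.25"),
     "completion": Decimal("1.25"), "context": 200_000},
    {"name": "gpt-4o-mini", "prompt": Decimal("0.15"),
     "completion": Decimal("0.60"), "context": 128_000},
    {"name": "llama-3.1-8b-instant", "prompt": Decimal("0.05"),
     "completion": Decimal("0.08"), "context": 128_000},
    {"name": "deepseek-chat", "prompt": Decimal("0.27"),
     "completion": Decimal("1.10"), "context": 65_536},
]


def cheapest_first(cards, min_context_tokens=0):
    """Keep cards with a large enough context window, cheapest first.

    Ranking key: prompt plus completion cost per million tokens (USD).
    """
    eligible = [c for c in cards if c["context"] >= min_context_tokens]
    return sorted(eligible, key=lambda c: c["prompt"] + c["completion"])


ranked = cheapest_first(CARDS, min_context_tokens=128_000)
for card in ranked:
    print(card["name"], card["prompt"] + card["completion"])
```

With a 128,000-token minimum, deepseek-chat drops out because its context window is 65,536 tokens, and the remaining models are listed from cheapest to most expensive.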

