# LLM Clients

The following reliable, regularly updated leaderboards track the latest performance metrics and per-token costs of LLM providers:

- [Hugging Face LLM Performance Leaderboard](https://huggingface.co/spaces/ArtificialAnalysis/LLM-Performance-Leaderboard)
- [Artificial Analysis LLM API Providers Leaderboard](https://artificialanalysis.ai/leaderboards/providers)



## Environment Setup

Import required Python packages and get application settings.

In [8]:
# General purpose libraries
import os
from dotenv import load_dotenv

# Load application settings (such as API keys) from a local .env file, if present
load_dotenv()

## Google Gemini

To use the Google Gen AI client, either directly or through LangChain, set the following environment variable:

```
GOOGLE_API_KEY="your-api-key"
```
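
It can help to check that the variable is actually set before creating any client. The following is a minimal sketch, assuming the variable has already been exported or loaded from a `.env` file; `require_env` is a hypothetical helper name used for illustration, not part of the Google Gen AI SDK or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the stripped value of an environment variable, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a readable message instead of a later client error
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` at the top of the notebook stops execution with a clear message when the key is missing.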

### Google Gemini Generative Language API

The Gemini API is consumed through the Google `Gen AI / Gemini API`, which is listed as the `Generative Language` API in GCP.

Relevant Information Sources:

- [Google Gemini API Official Docs - All available model IDs information section](https://ai.google.dev/gemini-api/docs/models)
- [Google Gemini / GenAI Github Repository](https://github.com/googleapis/python-genai?tab=readme-ov-file#google-gen-ai-sdk)
- [API Key Reference and Best Place for Creation and Monitoring](https://aistudio.google.com/usage)
- [Model Comparison and Pricing for 2.0 Generation](https://developers.googleblog.com/en/gemini-2-family-expands/)

For information on all the models available in the Gemini API, see this [reference](https://ai.google.dev/gemini-api/docs/models).

To list all the available models through code, use the following code block, taken from this official [example](https://ai.google.dev/api/models).


In [60]:
from google import genai

# Create Google GenAI client
client = genai.Client()

# Get model information for a specific model
model_info = client.models.get(model="gemini-2.0-flash-lite")
print(model_info)

# # List all LLM-based model IDs available in the GenAI API
# print("List of models that support generateContent:\n")
# for m in client.models.list():
#     for action in m.supported_actions:
#         if action == "generateContent":
#             print(m.name)

# # Embedding model IDs can also be fetched through the client
# print("\nList of models that support embedContent:\n")
# for m in client.models.list():
#     for action in m.supported_actions:
#         if action == "embedContent":
#             print(m.name)

name='models/gemini-2.0-flash-lite' display_name='Gemini 2.0 Flash-Lite' description='Gemini 2.0 Flash-Lite' version='2.0' endpoints=None labels=None tuned_model_info=TunedModelInfo(base_model=None, create_time=None, update_time=None) input_token_limit=1048576 output_token_limit=8192 supported_actions=['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent'] default_checkpoint_id=None checkpoints=None
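
The commented-out loops in the cell above both filter models by one entry in `supported_actions`. As a sketch, that filter can be pulled into a single function. `models_supporting` is a hypothetical helper name, and the fallback for a missing or `None` `supported_actions` is a defensive assumption rather than documented behaviour.

```python
from typing import Iterable, List


def models_supporting(models: Iterable, action: str) -> List[str]:
    """Return the names of models whose supported_actions include `action`."""
    return [
        m.name
        for m in models
        # Defensive assumption: treat a missing or None supported_actions as empty
        if action in (getattr(m, "supported_actions", None) or [])
    ]
```

With the client created above, `models_supporting(client.models.list(), "generateContent")` would return the generation-capable model names, and passing `"embedContent"` would return the embedding models.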


### LangChain Google Generative AI Chat model

The `ChatGoogleGenerativeAI` chat model wraps the Google Gen AI API services so they can be consumed from LangChain.

A small demo follows; full usage details are in the official [`ChatGoogleGenerativeAI` docs](https://python.langchain.com/docs/integrations/chat/google_generative_ai/).

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Create a LangChain Chat Model for Gemini
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-lite",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

Invoke the model with a short translation prompt.

In [None]:
# Test the LLM model inference
messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg

AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--a5dd906a-ada1-4ac6-b308-8a20e8272972-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27, 'input_token_details': {'cache_read': 0}})
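
As the output shows, the returned message carries the reply text in `content` and the token counts in `usage_metadata`. The sketch below gathers those fields into a plain dictionary. `summarize_response` is a hypothetical helper name, and it assumes only the attribute and key names visible in the output above.

```python
from typing import Any, Dict


def summarize_response(msg: Any) -> Dict[str, Any]:
    """Collect the reply text and token counts from a chat response object."""
    # usage_metadata is assumed to be a dict; fall back to empty if it is absent
    usage = getattr(msg, "usage_metadata", None) or {}
    return {
        "text": msg.content,
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Calling `summarize_response(ai_msg)` after the invocation gives a compact record that is easier to log or compare across models than the full message representation.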