Add Azure, Cohere, Anthropic, Palm Support - using liteLLM #235
Conversation
|
Hey @ShreyaR @thekaranacharya @nefertitirogers any updates on this PR? Happy to help if there are any issues with this! cc: @ishaan-jaff |
|
I will add that this is related to the Anthropic coverage feature request, and the PR may need to be refactored to account for the new way llm_providers works. The PR delegates most of the work to another LLM generalization library, which is arguably better separation of concerns than implementing a whole suite of wrappers here. However, the entire llm_providers structure has changed, and this PR needs to be reworked against the new, more generic structure. |
|
Hey there, I'm Pablo from Tryolabs; my colleague Paz Cuturi has been working with some of the teams at Guardrails. I'm interested in implementing a use case combining LiteLLM + Guardrails to be able to switch LLMs while also providing guidance to the model. Is there anything I can do to help with this? What refactor would be needed? |
This is a straightforward change.
Can a repo admin please confirm they're ok with adding a dependency on a new library?
|
@ghbacct you can use our OpenAI proxy server to connect your model to guardrails (https://docs.litellm.ai/docs/simple_proxy). No need to wait for this PR to get merged. |
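A minimal sketch of that proxy route (not from this PR; the URL, port, and model name are assumptions that depend on how you start the proxy):

# Sketch only: point any OpenAI-compatible client at a locally running LiteLLM proxy.
# Assumes the proxy was started with something like `litellm --model gpt-3.5-turbo`
# and is listening on port 8000; adjust base_url to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://0.0.0.0:8000",  # the LiteLLM proxy endpoint (assumption)
    api_key="anything",              # real provider keys are configured on the proxy side
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # routed by the proxy to whichever model it was started with
    messages=[{"role": "user", "content": "Hello through the proxy"}],
)
print(resp.choices[0].message.content)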
|
Hi! I saw the liteLLM interfaces and they look great - we could definitely add LiteLLM support as its own code path (we'd change the switch here). We'd have litellm.completion as the llm_api argument. Adding it as its own code path lets us support all of the APIs that lite supports, while maintaining backwards compatibility and giving the library time to harden with liteLLM usage (find out what works, what doesn't, and provide bake time).
|
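A rough sketch of the call shape described in the comment above, i.e. passing litellm.completion as the llm_api argument (the model name, validator, key, and prompt are illustrative, not from this PR):

import os
import litellm
import guardrails as gd
from guardrails.validators import ToxicLanguage

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder key for the example model

guard = gd.Guard.from_string(
    validators=[ToxicLanguage(validation_method="sentence", on_fail="fix")],
    description="Ensure no toxic language",
)

# litellm.completion is handed to the guard as the LLM callable; any model string
# that litellm understands (OpenAI, Azure, Anthropic, Cohere, ...) can be used.
response = guard(
    litellm.completion,
    model="claude-2",
    prompt="Write a short, friendly greeting.",
    max_tokens=128,
)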
|
Hello @ishaan-jaff @krrishdholakia, give it a spin today with the latest version: |
|
@thekaranacharya It seems it's not in the main branch; we can access the example from here. |
|
@Harryalways317 Hi, the link to the litellm example might have changed; can you provide it again? |
|
Here's a way to do it with instructor; could a similar approach be used? |
|
Yep, similar to instructor. If you need a detailed example, @w8jie you can refer to this gist https://gist.github.com/Harryalways317/3257bea378e960914026e0df2e2c4354 or the code below. I was on mobile so I didn't check the variable definitions; please check things like the base URL and model.
import json
import os
import time
from langfuse.callback import CallbackHandler
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.callbacks.streaming_aiter_final_only import (
AsyncFinalIteratorCallbackHandler,
)
from fastapi import HTTPException
from loguru import logger
import litellm
import httpx
from typing import Any, Dict
import guardrails as gd
from guardrails.validators import ToxicLanguage, IsProfanityFree, PIIFilter, DetectSecrets
from starlette.responses import StreamingResponse
# Send litellm success/failure events to Langfuse (and failures to Sentry).
# litellm.input_callback = ["sentry"]
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse", "sentry"]
litellm.set_verbose = True
os.environ["OPENAI_API_KEY"] = "anything"
langfuse_handler = CallbackHandler(
    public_key=os.environ.get("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.environ.get("LANGFUSE_SECRET_KEY"),
)

# Output guard: fix toxic language, profanity, and selected PII in responses.
# DetectSecrets(validation_method="sentence", on_fail="fix") could be added as well.
guard = gd.Guard.from_string(
    validators=[
        ToxicLanguage(validation_method="sentence", on_fail="fix"),
        IsProfanityFree(validation_method="sentence", on_fail="fix"),
        PIIFilter(
            pii_entities=["PERSON", "PHONE_NUMBER"],
            validation_method="sentence",
            on_fail="fix",
        ),
    ],
    description="Ensure no toxic language",
)
def completion(prompt,
               base_url="http://<your-openai-compatible-endpoint>/v1",
               model="HuggingFaceH4/zephyr-7b-beta",
               max_tokens=12000,
               temperature=0.66):
    # base_url/model/max_tokens/temperature were undefined in the original snippet;
    # the defaults here are placeholders, so set them to match your deployment.
    # messages is built here but the guard call below passes the raw prompt instead.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"{prompt}"},
    ]
    try:
        response = guard(
            litellm.completion,
            api_base=base_url,
            custom_llm_provider="openai",
            model=model,
            metadata={
                "generation_name": "instance1",  # langfuse generation name
            },
            prompt=prompt,
            # prompt_params={'query': messages},
            max_tokens=max_tokens,
            temperature=temperature,
        )
        logger.info(f"Prompt is {prompt} and response is {response}")
        return response.model_dump_json()
    except Exception as e:
        logger.info(f"Prompt is {prompt}")
        logger.error(f"An error occurred while requesting: {e}")
        return {
            "error": "An error occurred while processing your request.",
            "code": 500,
            "response": str(e),
        }
def generate_streaming_completion_llm(prompt: str, model: str = "HuggingFaceH4/zephyr-7b-beta",
                                      max_tokens: int = 12000, temperature: float = 0.66) -> StreamingResponse:
    """
    Generates a streaming completion from the LiteLLM API and returns a FastAPI StreamingResponse.
    Note: this streaming path calls litellm directly and does not run the guard.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"{prompt}"},
    ]
    # Placeholder: replace with the base URL of your self-hosted, OpenAI-compatible endpoint.
    base_url = "http://<self-hosted-ip-or-base-url>/v1"
    logger.info("Streaming prompt")
    print("base_url", base_url)
    os.environ["OPENAI_API_KEY"] = "anything"
    os.environ["OPENAI_API_BASE"] = base_url

    async def stream_generator():
        try:
            response = await litellm.acompletion(
                api_base=base_url,
                # llm_api=litellm.completion,
                custom_llm_provider="openai",
                model=model,
                metadata={
                    "generation_name": "instance1",
                },
                # prompt=prompt,
                messages=messages,
                max_tokens=max_tokens,
                temperature=temperature,
                stream=True,
                # callbacks=(
                #     [AsyncFinalIteratorCallbackHandler(), langfuse_handler]
                # ),
            )
            print(f"response: {response}")
            async for chunk in response:
                data = chunk.model_dump()
                if "choices" in data and data["choices"]:
                    for choice in data["choices"]:
                        if "delta" in choice and choice["delta"]:
                            if "content" in choice["delta"]:
                                yield choice["delta"]["content"]
        except Exception as e:
            yield json.dumps({"error": "An error occurred while processing your request.", "details": str(e)})
            return

    return StreamingResponse(content=stream_generator(), media_type="text/event-stream") |
|
@w8jie @krrishdholakia @Harryalways317 we're putting up a new docs PR with a tighter LiteLLM integration at #635. We'll push up changes that you can test out! |
|
@w8jie @krrishdholakia @Harryalways317 updated docs on using Guardrails with any custom LLM: https://www.guardrailsai.com/docs/how_to_guides/llm_api_wrappers. We've added a tighter integration with LiteLLM; take it for a spin and let us know if there's feedback. Also adding here, in case there's interest: we're heavily consolidating our LLM calling APIs right now. There's an open discussion that talks about this; feel free to leave comments there if you have feedback. |
Hi @ShreyaR, I'm the maintainer of https://github.com/BerriAI/litellm
This PR addresses #217.
Would love to help increase model coverage for guardrails; let me know if you have feedback on this.
Here's a sample of how it's used:
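For reference, a minimal sketch of LiteLLM's unified completion interface that the PR builds on (the model names and keys below are placeholders; each provider's key goes in its usual environment variable):

import os
import litellm

# Placeholder keys; set only the ones for the providers you actually call.
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["COHERE_API_KEY"] = "..."

messages = [{"role": "user", "content": "Say hello in one sentence."}]

# One call signature across providers; only the model string changes.
# Azure (e.g. "azure/<deployment-name>") and PaLM (e.g. "palm/chat-bison") follow the same pattern.
for model in ["gpt-3.5-turbo", "claude-2", "command-nightly"]:
    response = litellm.completion(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)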