True OpenAI drop-in replacement by InferenceClient #2384

Merged: 7 commits into main on Jul 11, 2024

Conversation

@Wauplin (Contributor) commented Jul 10, 2024

Closes #2369 @lappemic

The goal is to be able to use InferenceClient exactly the same way as the OpenAI client. To do so, we need to:

  • rename model to base_url => added an alias for it
  • rename model_id to model
  • rename token to api_key => added an alias for it
  • add an alias for client.chat.completions.create (see the sketch after this list)
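A minimal sketch of what these aliases mean in practice (illustrative only; the equivalences are the ones listed above):

from huggingface_hub import InferenceClient

# base_url is an alias for model, and api_key is an alias for token,
# so these two constructors build equivalent clients.
client = InferenceClient(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    api_key="my-api-key",
)
same_client = InferenceClient(
    model="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    token="my-api-key",
)

# client.chat.completions.create(...) is an alias for the existing
# client.chat_completion(...) method.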

@philschmid could you have a look at it and confirm it meets your expectations? See the tests for real examples (here and here).

Sync + stream=False

from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    api_key="my-api-key",
)
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=False,
    max_tokens=1024,
)
assert output.choices[0].message.content == "1, 2, 3, 4, 5, 6, 7, 8, 9, 10!"

Sync + stream=True

client = InferenceClient()  # the model can also be selected per call instead of at client level
output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
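Each streamed chunk exposes the incremental text in chunk.choices[0].delta.content, matching the OpenAI streaming format.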

Async + stream=False

from huggingface_hub import AsyncInferenceClient

# Note: await requires an async context, e.g. inside an async def run with asyncio.run().
client = AsyncInferenceClient(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    api_key="my-api-key",
)
output = await client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=False,
    max_tokens=1024,
)
assert output.choices[0].message.content == "1, 2, 3, 4, 5, 6, 7, 8, 9, 10!"

Async + stream=True

client = AsyncInferenceClient()
output = await client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

chunked_text = [chunk.choices[0].delta.content async for chunk in output]
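Since the interface now mirrors the OpenAI client, migrating an existing OpenAI-based script should mostly come down to swapping the client construction. A hedged sketch (the commented-out lines stand for the OpenAI original; the URL and key are placeholders):

# Before (OpenAI client):
# from openai import OpenAI
# client = OpenAI(base_url="...", api_key="my-api-key")

# After (huggingface_hub); the chat.completions.create(...) call itself is unchanged:
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    api_key="my-api-key",
)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Count to 10"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)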

TODO:

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LysandreJik (Member) left a comment


Awesome, very clear docs! Thanks @Wauplin

@julien-c (Member) commented Jul 10, 2024

> @philschmid could you have a look at it and confirm it meets your expectations?

from experience, a very tough test to pass usually 🤣

EDIT: PR looks cool!

@Wauplin (Contributor, Author) commented Jul 11, 2024

Thanks for the reviews!

@Wauplin merged commit bcef2ea into main on Jul 11, 2024 (16 checks passed).
@Wauplin deleted the 2369-true-openai-drop-in-replacement branch on Jul 11, 2024 at 10:08.