Use makeRequestOptions to generate inference snippets
#1273
Conversation
reviewed the generated snippets, all good!
This PR does nothing more than update `packages/tasks-gen/scripts/generate-snippets-fixtures.ts`, an internal script used to test the inference snippets. The goal of this PR is to store generated snippets under a new file structure like this:

```
./snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py
```

instead of

```
./snippets-fixtures/automatic-speech-recognition/1.huggingface_hub.hf-inference.py
```

In practice the previous file naming was annoying, as it meant that adding a new snippet for a client type could lead to renaming another file (due to the `0.`, `1.`, ... prefixes).

---

Typically in #1273 it makes the PR much bigger by e.g. deleting [`1.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-4759b74a67cc4caa7b2d273d7c2a8015ba062a19a8fad5cb2e227ca935dcb749) and creating [`2.openai.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-522e7173f8dd851189bb9b7ff311f4ee78ca65a3994caae803ff4fda5fe59733) just because a new [`1.requests.hf-inference.py`](https://github.com/huggingface/huggingface.js/pull/1273/files#diff-c8c5536f5af1631e8f1802155b66b0a23a4316eaaf5fcfce1a036da490acaa22) has been added. Separating files by language + client avoids these unnecessary renames.
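For illustration only, a minimal sketch of how paths could be assembled under the new layout (the helper name and signature are hypothetical, not the actual script code):

```ts
import * as path from "node:path";

// Hypothetical helper: builds a fixture path under the new per-language / per-client layout.
function fixturePath(task: string, language: string, client: string, index: number, provider: string, extension: string): string {
    return path.join("snippets-fixtures", task, language, client, `${index}.${provider}.${extension}`);
}

console.log(fixturePath("automatic-speech-recognition", "python", "huggingface_hub", 1, "hf-inference", "py"));
// -> snippets-fixtures/automatic-speech-recognition/python/huggingface_hub/1.hf-inference.py
```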
Maybe adding unit tests to check that the generated snippets look as expected would be nice.
We still probably need to update the E2E tests.
This is what all these files are doing: https://github.com/huggingface/huggingface.js/pull/1273/files#diff-167c61699d82d2eb043e51674cba555d467fb72f0bf52d7ca30eef6da4f66732. The test is here: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks-gen/scripts/generate-snippets-fixtures.ts
```ts
const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
```
FYI you can use this syntax:
```diff
- const { accessToken, endpointUrl, provider: maybeProvider, ...remainingArgs } = args;
- delete remainingArgs.model; // Remove model from remainingArgs to avoid duplication
+ const { accessToken, endpointUrl, provider: maybeProvider, model, ...remainingArgs } = args;
```
Now I remember why I did that in the first place: if I destructure `model` and don't use it, the linter complains with `'model' is assigned a value but never used`.
@SBrandeis any way to get rid of this? (Otherwise I can go back to using `delete`.)
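A couple of common ways around that lint error, assuming the project uses `@typescript-eslint/no-unused-vars` (a sketch of general options, not necessarily what was done here):

```ts
// Hypothetical `args` object, only to make the sketch self-contained.
const args = { accessToken: "hf_xxx", endpointUrl: undefined, provider: "hf-inference", model: "gpt2", inputs: "hello" };

// Option 1: destructure `model` under a throwaway name. This only silences the linter
// if the rule is configured with `varsIgnorePattern: "^_"`.
const { accessToken, endpointUrl, provider: maybeProvider, model: _model, ...remainingArgs } = args;

// Option 2: enable `ignoreRestSiblings: true` on the same rule, in which case the plain
// `const { model, ...remainingArgs } = args;` destructuring no longer triggers the warning.
```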
This is great!!
I have a few minor comments, but overall it looks good
```ts
if (
    model.pipeline_tag &&
    ["text-generation", "image-text-to-text"].includes(model.pipeline_tag) &&
    model.tags.includes("conversational")
) {
    templateName = opts?.streaming ? "conversationalStream" : "conversational";
    inputPreparationFn = prepareConversationalInput;
}
```
I guess another solution would be to have the caller pass an `isConversational: boolean` flag or similar - wdyt?
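For illustration, a rough sketch of what that alternative could look like (hypothetical names and shape, not the merged API):

```ts
// Hypothetical: the caller decides whether the model is conversational,
// instead of the generator re-deriving it from pipeline_tag + tags.
interface SnippetOpts {
    streaming?: boolean;
    isConversational?: boolean;
}

function pickTemplateName(opts?: SnippetOpts): string | undefined {
    if (opts?.isConversational) {
        return opts.streaming ? "conversationalStream" : "conversational";
    }
    return undefined; // fall back to the task's default template
}
```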
```ts
function formatBody(obj: object, format: "python" | "json" | "js" | "curl"): string {
    if (format === "python") {
        return Object.entries(obj)
            .map(([key, value]) => {
                const formattedValue = JSON.stringify(value, null, 4).replace(/"/g, '"');
                return `${key}=${formattedValue},`;
            })
            .join("\n");
```
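To make the intent concrete, here is roughly what the `python` branch produces for a small payload (a self-contained illustration based only on the excerpt above; the example payload is made up):

```ts
// Mini-version of the `python` branch above (without the quote replacement),
// just to show the output shape: one Python keyword argument per line.
function formatPythonKwargs(obj: object): string {
    return Object.entries(obj)
        .map(([key, value]) => `${key}=${JSON.stringify(value, null, 4)},`)
        .join("\n");
}

console.log(formatPythonKwargs({ model: "gpt2", max_tokens: 500 }));
// model="gpt2",
// max_tokens=500,
```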
tbh i was wondering if we shouldn't use prettier or ruff to format snippets at runtime... not sure.
I thought so as well, but I'm not fully convinced. I feel that for some snippets it would condense the code more, leading to slightly less readable snippets compared to what we have (maybe a negligible side effect?).
Anyway, maybe tooling for another day 😄
PR built on top of #1273. This is supposed to be the last PR refactoring inference snippets :hear_no_evil:

`python.ts`, `curl.ts` and `js.ts` have been merged into a single `getInferenceSnippets.ts`, which handles snippet generation for all languages and all providers at once. Here is how to use it:

```ts
import { snippets } from "@huggingface/inference";

const generatedSnippets = snippets.getInferenceSnippets(model, "api_token", provider, providerModelId, opts);
```

It returns a list `InferenceSnippet[]` defined as:

```ts
export interface InferenceSnippet {
    language: InferenceSnippetLanguage; // e.g. `python`, `curl`, `js`
    client: string; // e.g. `huggingface_hub`, `openai`, `fetch`, etc.
    content: string;
}
```

---

### How to review?

It's hard to track all atomic changes made to the inference snippets, but the best way IMO to review this PR is to check the generated snippets in the tests. Many inconsistencies in the URLs, sent parameters and indentation have been fixed.

---

### What's next?

- [x] get #1273 approved
- [ ] merge this one (#1291) into #1273
- [ ] merge #1273
- [ ] open PR in moon-landing to adapt to new interface (basically use `snippets.getInferenceSnippets` instead of `python.getPythonSnippets`, etc.)
- [ ] open PR to fix hub-docs automatic generation
- [ ] fully ready for Inference Providers documentation!

---------

Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>
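For example, a consumer of the new API could pick one specific snippet out of the returned list like this (a sketch; `model`, `provider`, `providerModelId` and `opts` are the same placeholders as in the usage example above):

```ts
import { snippets } from "@huggingface/inference";

const generated = snippets.getInferenceSnippets(model, "api_token", provider, providerModelId, opts);

// e.g. pick the Python snippet that uses the huggingface_hub client, if one was generated.
const pythonHubSnippet = generated.find((s) => s.language === "python" && s.client === "huggingface_hub");
console.log(pythonHubSnippet?.content);
```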
Merging this PR to move forward with https://github.com/huggingface-internal/moon-landing/pull/13013
The broader goal of this PR is to use `makeRequestOptions` from the JS InferenceClient in order to get all the implementation details (correct URL, correct authorization header, correct payload, etc.). The JS InferenceClient is supposed to be the ground truth in this case.

In practice:
- `makeUrl` when chatCompletion + image-text-to-text (review here + other providers)
- `openai` python snippet (e.g. here, here)
- `requests` snippet (here)

Technically, this PR:
- splits `makeRequestOptions` in two parts: the async part that does the model ID resolution (depending on task + provider) and the sync part which generates the url, headers, body, etc. For snippets we only need the second part, which is a sync call => new (internal) method `makeRequestOptionsFromResolvedModel`
- makes the `snippetGenerator` use the `/chat/completions` endpoint when `chatCompletion` is enabled, not only on `text-generation` => now we will also use /chat/completion on "image-text-to-text" models
- updates `./packages/inference/package.json` to allow dev mode. Now running `pnpm run dev` in `@inference` makes it much easier to work with `@tasks-gen` (no need to rebuild each time I make a change)

EDIT: ~~there is definitely a breaking change in how I handle the `makeRequestOptions` split (hence the broken CI). Will fix this.~~ => fixed.
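To visualize the split, here is a very rough sketch (only the two-part structure and the name `makeRequestOptionsFromResolvedModel` come from the PR description; types, URL construction and resolution logic are simplified placeholders, not the real `@huggingface/inference` implementation):

```ts
// Simplified local types, just for the sketch.
interface SketchArgs { accessToken?: string; provider?: string; model: string; inputs?: unknown; }
interface SketchRequestOptions { url: string; headers: Record<string, string>; body: string; }

// Sync part: builds url/headers/body from an already-resolved model ID.
// This is the part the snippet generator can call directly, without any async work.
function makeRequestOptionsFromResolvedModel(resolvedModel: string, args: SketchArgs): SketchRequestOptions {
    return {
        url: `https://example.invalid/${args.provider ?? "hf-inference"}/models/${resolvedModel}`, // placeholder URL, real routing is provider-specific
        headers: { Authorization: `Bearer ${args.accessToken ?? ""}` },
        body: JSON.stringify({ inputs: args.inputs }),
    };
}

// Async part: resolves the provider-specific model ID, then delegates to the sync part.
async function makeRequestOptions(args: SketchArgs): Promise<SketchRequestOptions> {
    const resolvedModel = await Promise.resolve(args.model); // placeholder for the real async model ID resolution
    return makeRequestOptionsFromResolvedModel(resolvedModel, args);
}
```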