
api: Create generative AI APIs using AI subnet #2246

Merged · 11 commits merged into master on Jul 17, 2024
Conversation

@victorges (Member) commented Jul 8, 2024

What does this pull request do? Explain your changes. (required)

This creates the initial versions of the AI Generate APIs, based on the design doc.

This initial version essentially implements a proxy to an internal AI Gateway service, and thus
exposes the exact same interface as the gateway.

The main complexity was adding support for multipart requests, which the AI Gateway interface
currently uses for these APIs. This required upgrading the ajv library to a new major version,
along with some of its related packages, and adding a bit of code to handle multipart form
validation and error handling.
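The tricky part of multipart bodies is that every form field arrives as a string, so values have to be coerced to the schema's declared types before JSON-schema validation can succeed. A minimal, self-contained sketch of that idea (illustrative TypeScript only, not the PR's actual code, which uses ajv's own coercion):

```typescript
// Sketch of multipart field coercion + validation. All names here are
// hypothetical; the real implementation relies on ajv with coerceTypes.
type FieldType = "string" | "number" | "integer" | "boolean";

interface FieldSchema {
  type: FieldType;
  enum?: string[];
  required?: boolean;
}

// Coerce raw form-field strings according to a flat schema, collecting
// human-readable errors in the same spirit as ajv's error objects.
function coerceAndValidate(
  fields: Record<string, string>,
  schema: Record<string, FieldSchema>,
): { value: Record<string, unknown>; errors: string[] } {
  const value: Record<string, unknown> = {};
  const errors: string[] = [];
  for (const [name, spec] of Object.entries(schema)) {
    const raw = fields[name];
    if (raw === undefined) {
      if (spec.required) errors.push(`${name} is required`);
      continue;
    }
    switch (spec.type) {
      case "number":
      case "integer": {
        const n = Number(raw);
        if (Number.isNaN(n)) errors.push(`${name} must be a ${spec.type}`);
        else value[name] = n;
        break;
      }
      case "boolean":
        value[name] = raw === "true";
        break;
      default:
        if (spec.enum && !spec.enum.includes(raw)) {
          errors.push(`${name} must be one of ${spec.enum.join(", ")}`);
        } else {
          value[name] = raw;
        }
    }
  }
  return { value, errors };
}
```

With coercion in place, the same JSON schemas can validate both JSON bodies and multipart form fields.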

Note: I haven't added the actual API reference on purpose (the paths: section of the API schema), given we're
not yet sure how this new API is going to be advertised. Right now it is gated behind an experiment, so no
end users will be able to use it by themselves.

Specific updates (required)

  • Add schemas for the new APIs' inputs
  • Implement the JSON handler for /generate/text-to-image API
  • Implement the multipart handler for other /generate APIs
  • Create a couple tests for the API proxy behavior
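Since the API is a thin proxy, the JSON handler can be little more than forwarding the validated body to the gateway and relaying the response. A hedged sketch of that shape (the gateway URL, path layout, and function names below are illustrative assumptions, not identifiers from the PR):

```typescript
// Illustrative proxy sketch; names and URLs are assumptions, not the PR's code.
const AI_GATEWAY_URL = "http://ai-gateway.internal"; // assumed internal hostname

// Studio exposes /api/generate/<pipeline>; the gateway exposes /<pipeline>.
function gatewayUrl(pipeline: string): string {
  return `${AI_GATEWAY_URL}/${pipeline}`;
}

async function proxyTextToImage(body: {
  model_id: string;
  prompt: string;
}): Promise<unknown> {
  const res = await fetch(gatewayUrl("text-to-image"), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Relay gateway errors instead of swallowing them.
    throw new Error(`gateway returned ${res.status}`);
  }
  return res.json();
}
```

The multipart handlers follow the same pattern, except the request body is re-encoded as multipart/form-data before forwarding.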

How did you test each of these updates (required)

  • yarn test
  • Put on staging and called the APIs

Does this pull request close any open issues?
Implements ENG-2181

Checklist

  • I have read the CONTRIBUTING document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

vercel bot commented Jul 8, 2024

livepeer-studio: ✅ Ready (preview updated Jul 16, 2024 10:53pm UTC)

const path = `/${name}`;
return app.post(
path,
authorizer({}),

Check failure: Code scanning / CodeQL

Missing rate limiting (High): This route handler performs authorization, but is not rate-limited.

packages/api/src/controllers/generate.ts (dismissed)
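On the rate-limiting alert: a common way to address it is a per-caller limiter applied after the authorizer. A minimal in-memory fixed-window sketch (illustrative only; this is not the middleware the repo uses, and a production setup would likely back the counters with a shared store such as Redis):

```typescript
// Hypothetical fixed-window rate limiter, keyed by e.g. user or API-key id.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit: number, // max requests per window
    private windowMs: number, // window length in milliseconds
  ) {}

  // Returns true if the request is allowed, false if over the limit.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New key or expired window: start a fresh window.
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}
```

In an Express route, `limiter.allow(req.user.id)` would gate each request between the authorizer and the proxy call, returning 429 when it fails.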
Pretend they are JSON for now, adding support for multipart forms next.
That required upgrading ajv itself, which was a bit
of a pain but worked and also found a few issues in
our schema.
@leszko (Contributor) left a comment:

Looks good. Added some minor comments. I have some other questions/comments (about billing and monitoring), but I'll open them in Discord.

packages/api/src/parse-cli.ts (outdated, resolved)
type: string
default: SG161222/RealVisXL_V4.0_Lightning
enum:
- SG161222/RealVisXL_V4.0_Lightning
@leszko (Contributor):

Do I understand correctly that currently we accept only 2 models, and that these 2 specific models are always available on Os (Orchestrators)?

Btw, what happens if the model is not available on the specific O?

@victorges (Member, Author):

@leszko Forgot to reply to this. Yeah, right now we accept only these 2 models (or others in the other APIs), which are the ones recommended by the AI Network team in the docs, e.g. https://docs.livepeer.org/ai/pipelines/text-to-image#models

Other models (called "on-demand") are still supported, but they might have less support from Orchestrators, so I opted to keep them out of this first version.

Currently, an O can keep only 1 model (from any pipeline) warmed up, as it has to keep an ai-worker container running with that model loaded on the GPU. When the requested model is not available (warm) on the O, it will just kill that running container and start another ai-worker configured with the requested model. The model is usually already on disk if the operator previously configured it (see the getting started guide), otherwise the ai-worker will download it dynamically from Hugging Face.

This means that asking for an uncommon model can make the request really slow. Just loading the model onto the GPU already takes a couple dozen seconds, and if the O has to download it as well, it could take minutes. That's why I opted to limit the models that can be requested here (though there's still a risk that no O has a given model warm and the request takes longer than usual).

I believe this whole flow is being improved by the AI network team, e.g. with the "remote worker" architecture (similar to splitting O+T nodes). Maybe @rickstaa could correct me if I said anything wrong or provide more info here 😄

packages/api/src/middleware/validators.ts (resolved)
@gioelecerati (Member) left a comment:

LGTM, a couple tests are failing

@victorges victorges merged commit a9979b7 into master Jul 17, 2024
8 checks passed
@victorges victorges deleted the vg/feat/ai-generate branch July 17, 2024 20:01