Inference API v2 - design docs kick-off #2277

Draft
PawelPeczek-Roboflow wants to merge 1 commit into main from design/first-scratch-of-new-api-v2

Conversation

@PawelPeczek-Roboflow
Collaborator

What does this PR do?

Related Issue(s):

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

Contributor

@dkosowski87 left a comment



* `GET /v2/models` - discover loaded models
* `DELETE /v2/models` - unload all models
* `POST /v2/models/load` - load given model
* `POST /v2/models/unload` - unload given model
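A minimal in-memory sketch of the management semantics these four routes imply. The class and method names are illustrative only, not the actual inference server implementation:

```python
class ModelManager:
    """Hypothetical backing store for the /v2/models management endpoints."""

    def __init__(self):
        self._models = {}  # model_id -> loaded model handle (placeholder here)

    def list_loaded(self):
        # GET /v2/models - discover loaded models
        return sorted(self._models)

    def load(self, model_id):
        # POST /v2/models/load - load given model (idempotent in this sketch)
        self._models.setdefault(model_id, object())

    def unload(self, model_id):
        # POST /v2/models/unload - unload given model (no-op if absent)
        self._models.pop(model_id, None)

    def unload_all(self):
        # DELETE /v2/models - unload all models
        self._models.clear()
```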
Contributor


Suggested change
* `POST /v2/models/unload` - unload given model
* `DELETE /v2/models/unload` - unload given model


* `POST /v2/models/infer` - predict from model
* `GET /v2/models/interface` - discover model interface
* `GET /v2/models/compatibility` - discover models compatible with current server configuration
Contributor


If `GET /v2/models` means "discover loaded models", then `GET /v2/models/compatibility` seems confusing: it doesn't operate on the loaded models but, as I understand it, returns a broader list. Maybe:

  • `GET /v2/models?state=loaded`
  • `GET /v2/models?state=compatible`
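A sketch of how a single endpoint with a `?state=` filter could dispatch, as suggested above. The registry shape and the "compatible" predicate are assumptions for illustration:

```python
def filter_models(registry, state):
    """registry: dict of model_id -> {"loaded": bool, "compatible": bool}.

    Returns a sorted list of model ids matching the requested state,
    mimicking GET /v2/models?state=loaded|compatible.
    """
    if state == "loaded":
        return sorted(m for m, info in registry.items() if info["loaded"])
    if state == "compatible":
        return sorted(m for m, info in registry.items() if info["compatible"])
    raise ValueError(f"unknown state: {state!r}")
```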


## Models endpoints

* `POST /v2/models/infer` - predict from model
Contributor


One broader design question: don't we see any value in separating model management endpoints from prediction endpoints? Similar to TorchServe, where the two live on different ports. This would probably only make sense in a self-hosted environment, where a company admin manages model loading and unloading and we have a flag like SMART_MODEL_MANAGEMENT_ON_PREDICT=false, so that the model manager doesn't decide on loading/unloading models on predict requests.
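A sketch of how such a flag could gate the predict path: with smart management disabled, predict requests never trigger loads, so only the admin-driven management endpoints change the set of resident models. The flag and handler names are hypothetical:

```python
def handle_predict(model_id, loaded, auto_load, smart_management_on_predict):
    """loaded: mutable set of resident model ids; auto_load: callable that loads one model.

    Mimics POST /v2/models/infer under a SMART_MODEL_MANAGEMENT_ON_PREDICT-style toggle.
    """
    if model_id not in loaded:
        if not smart_management_on_predict:
            # admin-managed mode: predict must not mutate the resident set
            raise RuntimeError(f"model {model_id!r} not loaded and auto-loading is disabled")
        auto_load(model_id)
        loaded.add(model_id)
    return f"prediction from {model_id}"
```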

curl -X POST https://serverless.roboflow.com/v2/models/infer \
--url-query 'model_id=whatever/model-id/we?can?figure-out' \
-F "image=@photo.jpg;type=image/jpeg" \
-F 'inputs={"confidence": 0.5};type=application/json'
Contributor


This would probably also map nicely to params that should be strictly assigned to elements of the batch, assuming order needs to be kept:

-F 'inputs=[{"confidence":0.5}, {"confidence":0.4}];type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"

Mixing scalar and batch:

-F 'inputs=[{"confidence":0.5, "fuse_nms": true}, {"confidence":0.4, "fuse_nms" : true}];type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"

Alternatively:

-F 'inputs=[{"confidence":0.5}, {"confidence":0.4}];type=application/json' \
-F 'defaults={"fuse_nms": true};type=application/json' \
-F "image=@photo-1.jpg" \
-F "image=@photo-2.jpg"

That way we don't duplicate values, and we also keep batch inputs separate from scalars.
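Server-side, the merge this implies is small. A sketch, assuming per-element values win over the scalar `defaults` and batch order is preserved:

```python
def merge_inputs(inputs, defaults=None):
    """Merge a scalar `defaults` dict into each per-element `inputs` dict.

    Per-element keys override defaults; the batch order of `inputs` is kept.
    """
    defaults = defaults or {}
    return [{**defaults, **item} for item in inputs]
```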


> [!IMPORTANT]
> Since we have `inference-models` and one model may have multiple model packages, `model_package_id` is a natural candidate for a structured query param - letting clients specify which exact model package they want, overriding the default auto-loading choice. We can also decide that certain auto-loading parameters should be passable (although we need to decide on that relatively fast, due to engineering work in progress and the impact on the model manager).
Contributor


Probably all relevant parameters; I'm not sure choosing only certain ones makes sense. If a client is advanced enough to decide on those, they probably want the option of full control.
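A sketch of the resolution logic under discussion: an explicit `model_package_id` overrides auto-selection, otherwise the server's auto-loading policy picks. The package shape and the stand-in policy (first in preference order) are assumptions:

```python
def resolve_package(packages, model_package_id=None):
    """packages: list of dicts with at least an "id" key, in server preference order."""
    if model_package_id is not None:
        for pkg in packages:
            if pkg["id"] == model_package_id:
                return pkg
        raise KeyError(f"unknown model package: {model_package_id!r}")
    if not packages:
        raise KeyError("no packages available for model")
    return packages[0]  # stand-in for the real auto-loading choice
```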


> [!IMPORTANT]
> There is a security issue **embedded in accepting URLs as inputs - especially on the platform.** We have accepted the risk of being a middleman in a DDoS attack so far, and likely will continue to (for user convenience), but it would be good for all parties involved in the discussion to recognize and acknowledge this risk - to avoid surprises in the future.
Contributor


I see we have the ALLOW_URL_INPUT_WITHOUT_FQDN and ALLOW_NON_HTTPS_URL_INPUT vars. That, plus timeouts and image size checks, is IMO OK.
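For illustration, a validation sketch in the spirit of those two toggles; the function name and exact rules here are assumptions, not the server's actual code:

```python
from urllib.parse import urlparse


def validate_input_url(url, allow_non_https=False, allow_without_fqdn=False):
    """Reject input URLs that the toggles (modeled as keyword args) disallow."""
    parsed = urlparse(url)
    if parsed.scheme == "http" and not allow_non_https:
        raise ValueError("plain-HTTP input URLs are disabled")
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname or ""
    # crude FQDN check for the sketch: hostname must contain a dot
    if "." not in host and not allow_without_fqdn:
        raise ValueError("input URL host must be a fully qualified domain name")
    return parsed
```

A real deployment would pair this with the timeouts and image size limits mentioned above, since URL fetching is where the SSRF/DDoS exposure lives.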
