Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use body for streaming instead of params #6098

Merged
merged 35 commits into from
Oct 27, 2023

Conversation

NarekA
Copy link
Contributor

@NarekA NarekA commented Oct 25, 2023

Making a new PR that follows conventions to replace #6091 and #6093

Http streaming breaks when the input doc schema has fields which are not str, int, or float. This includes dicts, bools, and nested objects (See Issue 6090). In addition it caps the input size to 2000 characters (Much less when you factor in URL encoding)

This PR changes the get endpoints to have the data passed in the request body rather than the params.

Goals:

  • resolves Issue 6090
  • check and update documentation. See guide and ask the team.

Also related: #6097

endpoint_path,
input_doc_model=None,
endpoint_path,
input_doc_model=None,
):
from fastapi import Request

@app.api_route(
path=f'/{endpoint_path.strip("/")}',
methods=['GET'],
Copy link
Contributor Author

@NarekA NarekA Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following code works, but I did not allow get since it doesn't work with gateway deployments.

methods=['GET', 'POST'],

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (f2085a9) 75.98% compared to head (bf3cbe2) 76.89%.
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6098      +/-   ##
==========================================
+ Coverage   75.98%   76.89%   +0.91%     
==========================================
  Files         143      145       +2     
  Lines       13974    14014      +40     
==========================================
+ Hits        10618    10776     +158     
+ Misses       3356     3238     -118     
Flag Coverage Δ
jina 76.89% <76.00%> (+0.91%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
jina/__init__.py 56.00% <100.00%> (ø)
jina/clients/base/helper.py 85.38% <100.00%> (+2.43%) ⬆️
jina/serve/runtimes/gateway/request_handling.py 86.75% <100.00%> (+0.64%) ⬆️
...ve/runtimes/gateway/http_fastapi_app_docarrayv2.py 0.00% <0.00%> (ø)
jina/serve/runtimes/worker/http_fastapi_app.py 80.86% <72.72%> (+11.44%) ⬆️

... and 9 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JoanFM
Copy link
Member

JoanFM commented Oct 25, 2023

I am not sure if I would prefer to keep the 2 endpoints separate and fix the underlying issue. @alaeddine-13 is the integration with SSE as in documentation possible still?

@NarekA
Copy link
Contributor Author

NarekA commented Oct 25, 2023

@JoanFM I'm just taking the existing endpoint and allowing it to accept either parameters or a request body. I'm happy to add a post endpoint as well, but I need your help with the gateway.

@NarekA
Copy link
Contributor Author

NarekA commented Oct 25, 2023

Here is a branch on my fork that will reproduce the gateway error

@NarekA NarekA force-pushed the fix-streaming-2-6091 branch 2 times, most recently from e87a684 to 5cfb423 Compare October 25, 2023 23:22
Comment on lines 146 to 156
stream = client.stream_doc(
start_time = time.time()
async for doc in client.stream_doc(
on='/hello',
inputs=MyDocument(text='hello world', number=i),
return_type=MyDocument,
)
start_time = None
async for doc in stream:
start_time = start_time or time.time()
):
assert doc.text == f'hello world {i}'
i += 1
delay = time.time() - start_time

# 0.5 seconds between each request + 0.5 seconds tolerance interval
assert delay < (0.5 * i), f'Expected delay to be less than {0.5 * i}, got {delay} on iteration {i}'
assert time.time() - start_time < (0.5 * i) + 0.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, we should merge this and fix the gateway.

@alaeddine-13
Copy link
Contributor

@NarekA why do you want to change the GET endpoint method ?
I thought the agreement was that, if you have basic requirements and would like to use client libraries, you use GET endpoint and if you need customized schemas and long input, you use the POST method.
What is wrong with this approach ?

@alaeddine-13
Copy link
Contributor

@NarekA why do you want to change the GET endpoint method ? I thought the agreement was that, if you have basic requirements and would like to use client libraries, you use GET endpoint and if you need customized schemas and long input, you use the POST method. What is wrong with this approach ?

Actually now I realize you want to allow passing either of body or query parameters.
I'm not sure if that's a good practice when building HTTP services but I think it's a nice idea actually. Yes @JoanFM this should keep compatibility with the documentation section where we use a standard js client.
A test would be great to have though or at least we need to try out the documentation on this PR before merging

@NarekA
Copy link
Contributor Author

NarekA commented Oct 26, 2023

@alaeddine-13 The current implementation with the POST method is fine, but it seems to break when using a deployment with a gateway. The stream all comes out at once. I modified this test (and am changing it back) not realizing that I was fixing the very thing it was testing for. The parameterized test passes for all the scenarios except for http+gateway. Using get instead of post seems to fix the gateway and we have only one endpoint so it's less confusing.

Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there or can we have a test where the API is called with the query params?

@NarekA
Copy link
Contributor Author

NarekA commented Oct 26, 2023

If we do allow post, I like the idea of just using the same route with 2 methods: methods=['GET', 'POST'],. That way we have consistency between the endpoints, and don't have to maintain 2 routes.

async def streaming_get(request: Request):
query_params = dict(request.query_params)
async def streaming_get(request: Request, body: input_doc_model = None):
if not body:
Copy link
Contributor Author

@NarekA NarekA Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can and request.method == 'GET' to the condition here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not need POST at all then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the current state of this PR, and it fixes everything. Can't tell if tests need to be re-run or if they're actually failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed that issue, I think the current failures just need re-runs, can you take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, found one that's not flakyness

Comment on lines 174 to 196
async def test_issue_6090_get_params(streaming_deployment):
"""Tests if streaming works with pydantic models with complex fields which are not
str, int, or float.
"""

docs = []
url = f"htto://localhost:{streaming_deployment.port}/stream-simple?text=my_input_text"
async with aiohttp.ClientSession() as session:

async with session.get(url) as resp:
async for doc, _ in resp.content.iter_chunks():
if b"event: end" in doc:
break
parsed = doc.decode().split("data:", 1)[-1].strip()
parsed = SimpleInput.parse_raw(parsed)
docs.append(parsed)

assert [d.text for d in docs] == [
'hello world my_input_text 0',
'hello world my_input_text 1',
'hello world my_input_text 2',
'hello world my_input_text 3',
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this test for passing documents as query params

Comment on lines 188 to 197
async with session.get(target_url) as response:
request_kwargs = {}
if request.body_exists():
payload = await request.json()
if payload:
request_kwargs['json'] = payload

async with session.get(
url=target_url, **request_kwargs
) as response:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found out where the gateway forwarding is happening.

@NarekA
Copy link
Contributor Author

NarekA commented Oct 27, 2023

Only one test job failing, and I can't find a clear stack trace in it so it might be flakiness.

@JoanFM JoanFM changed the title fix: Use body for streaming instead of params fix: use body for streaming instead of params Oct 27, 2023
@JoanFM JoanFM merged commit ab2cc19 into jina-ai:master Oct 27, 2023
128 of 130 checks passed
@JoanFM JoanFM mentioned this pull request Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming breaks for HTTP client when fields are not str, int, or float
3 participants