Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagination bug: retrieving last page with _count>number of items to return results in server timeout error #1300

Closed
imranmoezkhan opened this issue May 5, 2019 · 14 comments · Fixed by #1584

Comments

@imranmoezkhan
Copy link

@imranmoezkhan imranmoezkhan commented May 5, 2019

This looks like a bug to me (let me know if its not and I'm doing something wrong):
I have some data on my server with 1835 observations of a particular loinc code:
/baseDstu3/Observation?code=http://loinc.org%7C2093-3&_summary=count
output bundle has -> "total": 1835

I do a normal search for the code using default "_count" and keep going to the next page until i reach the last page:
/baseDstu3?_getpages=8a158809-fae8-4fb5-a00b-2d2631b0bc96&_getpagesoffset=1820&_count=10&_pretty=true&_bundletype=searchset

On this page, the url for the next page is:
/baseDstu3?_getpages=8a158809-fae8-4fb5-a00b-2d2631b0bc96&_getpagesoffset=1830&_count=10&_pretty=true&_bundletype=searchset

Trying to retrieve that url results in a timeout error:
{
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "error",
"code": "processing",
"diagnostics": "Request timed out after 60385ms"
}
]
}

When I change the url to have "_count=5" or less, I get the remaining observations bundle.

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented May 6, 2019

@imranmoezkhan

This comment has been minimized.

Copy link
Author

@imranmoezkhan imranmoezkhan commented May 6, 2019

Yes, its HAPI FHIR Server - 3.8.0-SNAPSHOT
I used the hapi-fhir-cli project to deploy the server, changed the db to postgres and used the DatabaseBackedPagingProvider class to set the paging provider. I also configured some extra hibernate properties (mostly the c3p0 pooling ones) - not sure if this is causing the issue:

extraProperties.put("hibernate.dialect", org.hibernate.dialect.PostgreSQL94Dialect.class.getName());
extraProperties.put("hibernate.jdbc.batch_size", "20");
extraProperties.put("hibernate.cache.use_query_cache", "false");
extraProperties.put("hibernate.cache.use_second_level_cache", "false");
extraProperties.put("hibernate.cache.use_structured_entries", "false");
extraProperties.put("hibernate.cache.use_minimal_puts", "false");
extraProperties.put("hibernate.search.model_mapping", LuceneSearchMappingFactory.class.getName());
extraProperties.put("hibernate.search.default.directory_provider", "filesystem");
extraProperties.put("hibernate.search.default.indexBase", "target/lucenefiles");
extraProperties.put("hibernate.search.lucene_version", "LUCENE_CURRENT");
extraProperties.put("hibernate.search.default.worker.execution", "async");
extraProperties.put("hibernate.c3p0.min_size", "5");
extraProperties.put("hibernate.c3p0.max_size", "50");
extraProperties.put("hibernate.c3p0.timeout", "120");

@jakeod99

This comment has been minimized.

Copy link

@jakeod99 jakeod99 commented Nov 4, 2019

We are having the same issue on 4.0.3

Even when we use _count=1 and each bundle provides an individual search result, the bundle containing the last search result still provides a link to the next (non-existent) bundle. If that link is followed, the server times out.

Below is a stack trace from following a faulty next link.

fhir_1   | 2019-11-04 16:19:25.863 [qtp2101440631-12] INFO  fhir.access [LoggingInterceptor.java:171] Path[/r4] Source[10.14.231.12] Operation[get-page  ] UA[PostmanRuntime/7.17.1] Params[?_getpages=00e369ab-123a-453a-bf68-88d23fec4a2e&_pretty=true&_elements=id%2Citem.item.item.linkId%2Citem.item.item.text%2Citem.item.linkId%2Citem.item.text%2Citem.linkId%2Citem.text%2Cname%2Cversion&_count=10&_getpagesoffset=40&_bundletype=searchset] ResponseEncoding[]
fhir_1   | 2019-11-04 16:19:35.371 [qtp2101440631-12] INFO  fhir.access [LoggingInterceptor.java:171] Path[/r4] Source[10.14.231.12] Operation[get-page  ] UA[PostmanRuntime/7.17.1] Params[?_getpages=00e369ab-123a-453a-bf68-88d23fec4a2e&_pretty=true&_elements=id%2Citem.item.item.linkId%2Citem.item.item.text%2Citem.item.linkId%2Citem.item.text%2Citem.linkId%2Citem.text%2Cname%2Cversion&_count=10&_getpagesoffset=50&_bundletype=searchset] ResponseEncoding[]
fhir_1   | 2019-11-04 16:20:40.989 [qtp2101440631-12] ERROR c.u.f.j.s.SearchCoordinatorSvcImpl [SearchCoordinatorSvcImpl.java:206] Search 1267 of type SEARCH for Questionnaire?item-text:contains=fall timed out after 60224ms
fhir_1   | 2019-11-04 16:20:40.990 [qtp2101440631-12] ERROR c.u.f.r.s.i.ExceptionHandlingInterceptor [ExceptionHandlingInterceptor.java:136] Failure during REST processing
fhir_1   | ca.uhn.fhir.rest.server.exceptions.InternalErrorException: Request timed out after 60225ms
fhir_1   | 	at ca.uhn.fhir.jpa.search.SearchCoordinatorSvcImpl.getResources(SearchCoordinatorSvcImpl.java:207)
fhir_1   | 	at jdk.internal.reflect.GeneratedMethodAccessor136.invoke(Unknown Source)
fhir_1   | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
fhir_1   | 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
fhir_1   | 	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
fhir_1   | 	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
fhir_1   | 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
fhir_1   | 	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:295)
fhir_1   | 	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
fhir_1   | 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
fhir_1   | 	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
fhir_1   | 	at com.sun.proxy.$Proxy135.getResources(Unknown Source)
fhir_1   | 	at ca.uhn.fhir.jpa.search.PersistedJpaBundleProvider.doSearchOrEverything(PersistedJpaBundleProvider.java:175)
fhir_1   | 	at ca.uhn.fhir.jpa.search.PersistedJpaBundleProvider.getResources(PersistedJpaBundleProvider.java:246)
fhir_1   | 	at ca.uhn.fhir.rest.server.method.BaseResourceReturningMethodBinding.createBundleFromBundleProvider(BaseResourceReturningMethodBinding.java:143)
fhir_1   | 	at ca.uhn.fhir.rest.server.method.PageMethodBinding.handlePagingRequest(PageMethodBinding.java:158)
fhir_1   | 	at ca.uhn.fhir.rest.server.method.PageMethodBinding.doInvokeServer(PageMethodBinding.java:78)
fhir_1   | 	at ca.uhn.fhir.rest.server.method.BaseResourceReturningMethodBinding.invokeServer(BaseResourceReturningMethodBinding.java:380)
fhir_1   | 	at ca.uhn.fhir.rest.server.RestfulServer.handleRequest(RestfulServer.java:998)
fhir_1   | 	at ca.uhn.fhir.rest.server.RestfulServer.doGet(RestfulServer.java:336)
fhir_1   | 	at ca.uhn.fhir.rest.server.RestfulServer.service(RestfulServer.java:1650)
fhir_1   | 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
fhir_1   | 	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
fhir_1   | 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
fhir_1   | 	at org.eclipse.jetty.websocket.server.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:214)
fhir_1   | 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
fhir_1   | 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
fhir_1   | 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
fhir_1   | 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
fhir_1   | 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1700)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
fhir_1   | 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
fhir_1   | 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
fhir_1   | 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1667)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
fhir_1   | 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
fhir_1   | 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
fhir_1   | 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
fhir_1   | 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
fhir_1   | 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
fhir_1   | 	at org.eclipse.jetty.server.Server.handle(Server.java:505)
fhir_1   | 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
fhir_1   | 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
fhir_1   | 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
fhir_1   | 	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
fhir_1   | 	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
fhir_1   | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
fhir_1   | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
fhir_1   | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
fhir_1   | 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
fhir_1   | 	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
fhir_1   | 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
fhir_1   | 	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
fhir_1   | 	at java.base/java.lang.Thread.run(Thread.java:834)
@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 4, 2019

@jakeod99 is this something you can replicate on our public server?

@jakeod99

This comment has been minimized.

Copy link

@jakeod99 jakeod99 commented Nov 4, 2019

Interesting - can't seem to replicate on the public server. Turns out it only times out when we search by our custom SearchParameter. The search seems to work properly for the resources it retrieves, but the bundles don't always contain a total: <number> field and the last next link times out the server. This is our SearchParameter:

"resourceType": "SearchParameter",
"id": "QuestionnaireItemText",
"meta": {
    "versionId": "1",
    "lastUpdated": "2019-10-25T12:08:21.962+00:00"
},
"text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Search for a questionnaire by item.text nested field - up to 3 levels.</div>"
},
"url": <redacted, but I checked and it accurately points to this SearchParameter on our server>
"name": "QuestionnaireItemText",
"status": "active",
"description": "Search for a questionnaire by item.text nested field - up to 3 levels.",
"code": "item-text",
"base": [
    "Questionnaire"
],
"type": "string",
"expression": "Questionnaire.item.text | Questionnaire.item.item.text | Questionnaire.item.item.item.text",
"xpathUsage": "normal",
"modifier": [
    "contains"
]

Is this a problem with HAPI's handling of the SearchParameter resource (specifically paging their responses) or is our SearchParameter wrong and therefore throwing things off?

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 4, 2019

Fascinating!

Does it occur if you upload that search parameter to the public server?

@timster

This comment has been minimized.

Copy link

@timster timster commented Nov 7, 2019

We couldn't duplicate it on the public test server. Is the source of hapi.fhir.org published anywhere? We suspect at this point it's a configuration issue on our server. I'd be curious to compare to the way hapi.fhir.org is configured.

Is it using a Postgres database?

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 7, 2019

@timster

This comment has been minimized.

Copy link

@timster timster commented Nov 7, 2019

I was able to duplicate this on hapi.fhir.org after uploading our questionnaires.

This tells me there should be 242 results for item-text:contains=Section (using the custom SearchParameter):

http://hapi.fhir.org/baseR4/Questionnaire?item-text:contains=Section&_summary=count

So running the search works (limiting elements to speed things up):

http://hapi.fhir.org/baseR4/Questionnaire?item-text:contains=Section&_elements=id,name

Going to the next page gave me the error:

http://hapi.fhir.org/baseR4?_getpages=d637d3fb-896e-4c6d-bec1-a212720d05cf&_getpagesoffset=20&_count=20&_pretty=true&_bundletype=searchset&_elements=id,name

<html>

<head>
	<title>504 Gateway Time-out</title>
</head>

<body bgcolor="white">
	<center>
		<h1>504 Gateway Time-out</h1>
	</center>
	<hr>
	<center>nginx/1.14.0 (Ubuntu)</center>
</body>

</html>

So it seems this might not be related to how many items there are? Maybe our questionnaires are too big for searching?

Let me know if you want me to open this as a separate issue.

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 7, 2019

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 7, 2019

Ok, so... good news: I have a working (failing) unit test that reproduces this problem exactly now :)

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 7, 2019

Looks like a distinct clause got dropped on the main query builder, and that was causing issues when a search parameter matched many multiple distinct values on the same resource... That's a fairly rare occurrence, but it definitely can happen.

Fix is going through CI now

@timster

This comment has been minimized.

Copy link

@timster timster commented Nov 7, 2019

Oh wow. Thanks for the quick debugging and turnaround.

@jamesagnew

This comment has been minimized.

Copy link
Owner

@jamesagnew jamesagnew commented Nov 7, 2019

SearchCoordinator bugs are literally priority 1 as far as I am concerned. Congratulations on finding one :)

jamesagnew added a commit that referenced this issue Nov 7, 2019
* Fix paging issue

* Add changelog with credit

* Add a test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.