Timeout issues with `sync=true` jobs #166

Open
olegshirokikh opened this issue Jun 9, 2015 · 10 comments
@olegshirokikh commented Jun 9, 2015

This is a timeout issue for sync=true jobs that run longer than about 40 seconds. The server doesn't return any SJS ERROR response, but instead a generic spray message:

The server was not able to produce a timely response to your request.

It's probably better to run such long jobs async, but being able to control this timeout and let sync jobs run longer than 40 seconds would be very useful! If that's not possible, getting at least some response back would be helpful for tracking the timeout error. Right now I don't see an immediate way to catch what happened...

In the meantime, is there any relatively quick workaround for this?

I've tried adding this to the deployed SJS .conf file:

spark {
  ...
  jobserver {
    ...
    spray.can.server {
      idle-timeout = 1000s
      request-timeout = 1000s
    }
  }
}

but it didn't seem to have any effect. Is this the correct place, and should such settings be picked up by the server?

Also, according to this, one can set spray.io.ConnectionTimeouts.SetIdleTimeout and spray.http.SetRequestTimeout. Would it be easy to set those in the SJS code? If yes, could you please point me in the right direction?

Another thought: in the event of a timeout (or any other issue with sync=true submissions), maybe it's possible to return a meaningful response, such as a sync-related error message plus the JobID. That way the user at least immediately has the job ID for querying the status/results. Is there a way to return the JobID no matter what, even if everything fails?
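
In the meantime, the async workaround I'm considering looks roughly like this (just a sketch; the jq extraction and the exact layout of the /jobs response with a result.jobId field are my assumptions):

    # sketch: submit the same job without sync=true; the POST returns immediately with a jobId
    JOB_ID=$(curl -s -d '{stress.test.longpijob.duration: 60}' \
      'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob' \
      | jq -r '.result.jobId')

    # poll for status/result until the job finishes
    curl -s "localhost:8090/jobs/$JOB_ID"

That at least gives me a job ID to query, but a longer sync timeout would still be nicer for simple clients.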

@olegshirokikh (Author) commented Jun 9, 2015

To repro, use the standard LongPiJob example. Note the duration threshold of 40 seconds:

oleg@oleg-ubuntu:~/dev/spark-jobserver$ curl -d '{stress.test.longpijob.duration: 40}' 'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=21474835'
{
  "status": "OK",
  "result": 3.141321779318313
}
oleg@oleg-ubuntu:~/dev/spark-jobserver$ curl -d '{stress.test.longpijob.duration: 41}' 'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=21474835'
The server was not able to produce a timely response to your request.
oleg@oleg-ubuntu:~/dev/spark-jobserver$
@addisonj (Contributor) commented Jun 16, 2015

To properly set the timeouts, note that they don't go under the spark.jobserver namespace.

Example config that works for me:

spark {
 ...
}

spray.can.server {
  idle-timeout = 180 s
  request-timeout = 120 s
  parsing.max-content-length = 200m
}
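
These settings go at the top level of the deployed .conf file, and the job server needs a restart to pick them up. A rough sketch of verifying it, assuming the standard server_stop.sh / server_start.sh deployment scripts:

    # sketch: after editing the deployed .conf, restart so spray picks up the new settings
    ./server_stop.sh && ./server_start.sh

    # re-run the repro from above; with request-timeout = 120 s the 41-second job should now return
    curl -d '{stress.test.longpijob.duration: 41}' \
      'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=21474835'
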
@zeitos (Member) commented Sep 25, 2015

@olegshirokikh did @addisonj's answer help? Can I close this one?

@noorul (Contributor) commented Oct 5, 2015

How long do we have to wait for a response? I think we should fix a time frame.

@velvia (Contributor) commented Oct 5, 2015

@noorul @zeitos we could possibly add defaults that work better for everyone. However, 200 MB for a jar? That seems awfully big.

@zeitos added the newbie label Oct 6, 2015
@zeitos (Member) commented Nov 11, 2015

What would be a good default? @addisonj showed a way to override the config; I can add that to the troubleshooting MD.

@SriTDT commented May 16, 2019

Maybe I'm pretty late to this thread, but I'm also facing this error, just like olegshirokikh was/is. I'm hitting the job server from my API service. Initially I thought my API was causing the issue, but after turning on server-level debugging in Spring Tool Suite (where I run my API) I realized it's the server. I have set the idle timeouts and request timeouts, but still no luck!

@zeitos (Member) commented May 16, 2019

@SriTDT did you try @addisonj's solution?

@SriTDT commented May 17, 2019

Yes, I tried, but I got this as a response on the Spring Boot console:

Ask timed out on [Actor[akka://JobServer/user/context-supervisor/c2#2124268092]] after [10000 ms]. Sender[null] sent message of type "spark.jobserver.JobManagerActor$StartJob".",[\n]"

@SriTDT commented May 17, 2019

Which clearly says that the request is timing out after 10 seconds (10000 ms).
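
For reference, the repro earlier in this thread passes a timeout query parameter on the sync POST. Whether that parameter governs this particular ask timeout is only a guess on my part, but it seems cheap to try:

    # sketch: same sync request as the repro above, with an explicit timeout (in seconds) on the query string
    curl -d '{stress.test.longpijob.duration: 60}' \
      'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=300'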
