Timeout issues with `sync=true` jobs #166

Open
olegshirokikh opened this Issue Jun 9, 2015 · 6 comments


This is a timeout issue for sync=true jobs that run longer than about 40 seconds. The server doesn't return any SJS error response, only a generic spray message:

The server was not able to produce a timely response to your request.

It's probably a better idea to run such long jobs async, but being able to control this timeout and let sync jobs run longer than 40 seconds would be very useful! If that's not possible, getting at least some response back would help to track the timeout error. Right now I don't see an immediate way to catch what happened...

Comment:

In the meantime, is there any relatively quick workaround for this?

I've tried adding this config to the deployed SJS .conf file:

spark {
   ...
   jobserver {
    ...
      spray.can.server {
         idle-timeout = 1000s
         request-timeout = 1000s
      }
   }
}

but it didn't seem to have any effect. Is this the correct place, and should the server pick up such settings?

Also, according to this, one can set spray.io.ConnectionTimeouts.SetIdleTimeout and spray.http.SetRequestTimeout. Would it be easy to set those in the SJS code? If yes, could you please point me in the right direction?

Another thought: in the event of a timeout (or any other issue with sync=true submissions), maybe it's possible to return a meaningful response, such as a sync-related error message and the JobID. That way the user at least immediately has the job ID for querying the status/results. Is there a way to return the JobID no matter what, even if everything fails?
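Until something like that exists, one possible client-side workaround is to submit the job without sync=true, keep the job ID from the submit response, and poll for the result with a timeout the client controls. The sketch below is only an illustration: `wait_for_job`, the `fetch_status` callable, and the set of terminal status strings are hypothetical names and assumptions, not part of SJS, and the real response shape should be checked against the job server's documentation.

```python
import time
from typing import Callable, Optional

# ASSUMPTION: these status strings are illustrative only; check the
# job server's documentation for the statuses it really reports.
TERMINAL_STATUSES = {"FINISHED", "ERROR", "KILLED"}


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll fetch_status(job_id) until a terminal status or the deadline.

    Returns the terminal response, or None if timeout_s would be exceeded.
    The caller still holds job_id either way, so the job can be queried later.
    """
    deadline = clock() + timeout_s
    while True:
        response = fetch_status(job_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        # Stop if another sleep would take us past the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

Here `fetch_status` would wrap whatever HTTP call retrieves the job status. Injecting it, along with the clock and sleep functions, keeps the polling logic testable without a running server.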

olegshirokikh Jun 9, 2015

To reproduce with the standard LongPiJob example (note that the duration threshold is 40):

oleg@oleg-ubuntu:~/dev/spark-jobserver$ curl -d '{stress.test.longpijob.duration: 40}' 'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=21474835'
{
  "status": "OK",
  "result": 3.141321779318313
}
oleg@oleg-ubuntu:~/dev/spark-jobserver$ curl -d '{stress.test.longpijob.duration: 41}' 'localhost:8090/jobs?appName=timeoutTest&classPath=spark.jobserver.LongPiJob&sync=true&timeout=21474835'
The server was not able to produce a timely response to your request.
oleg@oleg-ubuntu:~/dev/spark-jobserver$
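Since the timed-out request returns plain text rather than JSON, a client can at least tell the two cases apart by inspecting the body. A minimal sketch, assuming the timeout body is exactly the message shown above; `classify_response` and its return labels are hypothetical names used only for illustration:

```python
import json

# The plain-text body quoted in this issue for a timed-out request.
SPRAY_TIMEOUT_BODY = (
    "The server was not able to produce a timely response to your request."
)


def classify_response(body: str) -> str:
    """Return 'json', 'timeout', or 'unknown' for a raw response body."""
    text = body.strip()
    try:
        json.loads(text)
        return "json"
    except ValueError:
        # Not JSON; fall through to the plain-text checks.
        pass
    if text == SPRAY_TIMEOUT_BODY:
        return "timeout"
    return "unknown"
```

This only detects the timeout after the fact. It does not recover the job ID, which the timed-out response does not contain.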

addisonj Jun 16, 2015

Contributor

To set the timeouts properly, don't put them under the spark.jobserver namespace.

Example config that works for me:

spark {
 ...
}

spray.can.server {
  idle-timeout = 180 s
  request-timeout = 120 s
  parsing.max-content-length = 200m
}
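The difference between the two placements is easier to see once the nested blocks are written out as full dotted paths. The sketch below is a simplified model of path resolution, not how the config library is implemented, and `flatten` is a hypothetical helper. It shows that nesting the block under spark.jobserver produces a different path from the top-level `spray.can.server.request-timeout` used in the working config.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted paths (simplified path model)."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# The placement first tried in this issue: nested under spark.jobserver.
nested_under_jobserver = {
    "spark": {"jobserver": {"spray.can.server": {"request-timeout": "1000s"}}}
}

# The placement from the working config: at the top level.
top_level = {
    "spark": {},
    "spray.can.server": {"request-timeout": "120 s"},
}

for name, cfg in [("nested", nested_under_jobserver), ("top-level", top_level)]:
    print(name, sorted(flatten(cfg)))
```

Only the top-level variant yields the key `spray.can.server.request-timeout`. The nested one yields `spark.jobserver.spray.can.server.request-timeout`, which is consistent with the setting having no effect.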

zeitos Sep 25, 2015

Member

@olegshirokikh did addisonj's answer help? Can I close this one?


noorul Oct 5, 2015

Contributor

How long do we have to wait for a response? I think we should fix a time frame.


velvia Oct 5, 2015

Contributor

@noorul @zeitos we could possibly add defaults that work better for everyone. However, 200MB for a jar? That seems awfully big.

zeitos added the newbie label Oct 6, 2015


zeitos Nov 11, 2015

Member

What would be a good default? @addisonj showed a way to override the config. I can add that to the troubleshooting MD.
