Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drafter doesn't seem to report DB errors when batching triples into drafts #72

Open
RickMoynihan opened this issue Feb 5, 2016 · 1 comment

Comments

@RickMoynihan
Copy link
Member

From the discussion below it sounds like the error occurred and was swallowed somewhere inside sesame/grafter... Probably this line... should take a look:

https://github.com/Swirrl/drafter/blob/master/src/drafter/backend/sesame/common/draft_management.clj#L10

See chat below:

[13:14]
ricroberts Hi @RickMoynihan
[13:14]
I’ve been running an upload of postcodes to sg-sandbox
[13:14]
and something weird happened
[13:14]
it was happily batching writes:
[13:15]
ricroberts added a Plain Text snippet
2016-02-05 12:37:06,443 INFO write-scheduler :: Writer waiting for tasks
2016-02-05 12:37:06,443 INFO write-scheduler :: Executing job #swirrl_server.async.jobs.Job{:id #uuid "a75e8363-9ea5-4dc5-a230-86df28d8ed2f", :priority :batch-write, :time 1454675826443, :function #<core$partial$fn__4234 clojure.core$partial$fn__4234@586168ef>, :value-p #<core$promise$reify__6363@1b96733b: :pending>}
2016-02-05 12:37:06,452 INFO draft-api :: Adding a batch of triples to repoclojure.lang.LazySeq@5bf48648
2016-02-05 12:37:08,545 INFO middleware 1401082658 :: REQUEST /status/finished-jobs/a75e8363-9ea5-4dc5-a230-86df28d8ed2f /; q=0.5, application/xml {}
2016-02-05 12:37:08,545 INFO middleware 1401082658 :: RESPONSE 404 took 0ms
2016-02-05 12:37:08,549 INFO middleware 1177801622 :: REQUEST /status/finished-jobs/a75e8363-9ea5-4dc5-a230-86df28d8ed2f /; q=0.5, application/xml {}
2016-02-05 12:37:08,549 INFO middleware 1177801622 :: RESPONSE 404 took 0ms
2016-02-05 12:37:08,755 INFO write-scheduler :: Queueing job: #swirrl_server.async.jobs.Job{:id #uuid "a75e8363-9ea5-4dc5-a230-86df28d8ed2f", :priority :batch-write, :time 1454675828755, :function #<core$partial$fn__4234 clojure.core$partial$fn__4234@61a84658>, :value-p #<core$promise$reify__6363@1b96733b: :pending>}
2016-02-05 12:37:08,756 INFO write-scheduler :: Writer waiting for tasks
2016-02-05 12:37:08,756 INFO write-scheduler :: Executing job #swirrl_server.async.jobs.Job{:id #uuid "a75e8363-9ea5-4dc5-a230-86df28d8ed2f", :priority :batch-write, :time 1454675828755, :function #<core$partial$fn__4234 clojure.core$partial$fn__4234@61a84658>, :value-p #<core$promise$reify__6363@1b96733b: :pending>}
2016-02-05 12:37:08,767 INFO draft-api :: Adding a batch of triples to repoclojure.lang.LazySeq@7ba04909
2016-02-05 12:37:11,048 INFO write-scheduler :: Queueing job: #swirrl_server.async.jobs.Job{:id #uuid "a75e8363-9ea5-4dc5-a230-86df28d8ed2f", :priority :batch-write, :time 1454675831048, :function #<core$partial$fn__4234 clojure.core$partial$fn__4234@4c0d8a28>, :value-p #<core$promise$reify__6363@1b96733b: :pending>}
2016-02-05 12:37:11,049 INFO write-scheduler :: Writer waiting for tasks
2016-02-05 12:37:11,049 INFO write-scheduler :: Executing job #swirrl_server.async.jobs.Job{:id #uuid "a75e8363-9ea5-4dc5-a230-86df28d8ed2f", :priority :batch-write, :time 1454675831048, :function #<core$partial$fn__4234 clojure.core$partial$fn__4234@4c0d8a28>, :value-p #<core$promise$reify__6363@1b96733b: :pending>}
2016-02-05 12:37:11,057 INFO draft-api :: Adding a batch of triples to repoclojure.lang.LazySeq@bc9f7a79
Add Comment Collapse
[13:15]
ricroberts but since 12:37 it stopped adding batches
[13:15]
and the checks for whether it finished just return 404
[13:15]
ricroberts added a Plain Text snippet
2016-02-05 12:37:13,239 INFO middleware 2037890086 :: REQUEST /status/finished-jobs/a75e8363-9ea5-4dc5-a230-86df28d8ed2f /; q=0.5, application/xml {}
2016-02-05 12:37:13,239 INFO middleware 2037890086 :: RESPONSE 404 took 0ms
Add Comment
[13:15]
ricroberts There’s no completed log entry(edited)
[13:16]
😕
[13:17]
drafter hasn’t restarted
[13:18]
i’ll try restarting it and see what state it got to
[13:18]
(unless you want me to check anything first(?) @RickMoynihan(edited)
[13:19]
there’s virtually no cpu activity
[13:23]
rickmoynihan Sorry rick was just writing this up: https://openrdf.atlassian.net/browse/SES-2369
[13:23]
ricroberts np
[13:23]
rickmoynihan parsing your woes now...
[13:24]
hmmm weird
[13:24]
are there any exceptions or errors anywhere?
[13:25]
ricroberts no
[13:25]
i restarted drafter / killed the job in pmd
[13:25]
not all the data got thru
[13:26]
rickmoynihan did any go in?
[13:26]
ricroberts about 80% of it
[13:26]
by the look of it
[13:26]
50k postcodes out of 65k
[13:27]
rickmoynihan @leekitching: might have some ideas - I think the batching code has been tweaked a few times
[13:28]
ricroberts it’s running build_449
[13:28]
if that makes a diff!
[13:28]
rickmoynihan ta
[13:29]
will figure out what commit its at
[13:29]
ricroberts it was taking ages (longer than i expected) so i looked at drafter
[13:29]
rickmoynihan https://github.com/Swirrl/drafter/tree/f38195a347e490dc31bdfd8219cb0e48a8f01c76(edited)
[13:33]
do we know if stardog barfed somewhere?
[13:35]
ricroberts will look
[13:38]
rickmoynihan one change has been the child-job stuff lee did - not sure how that stuff works
[13:42]
ricroberts after restarting drafter now getting errors on every sparql request
[13:42]
rickmoynihan stardog?
[13:42]
ricroberts yeah
[13:43]
looks like it
[13:43]
ricroberts added a Plain Text snippet
Feb 05, 2016 1:41:08 PM com.complexible.common.protocols.server.rpc.ServerHandler exceptionCaught
SEVERE: exceptionCaughtServerHandler
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
Add Comment Click to expand inline 38 lines
[13:43]
rickmoynihan so do we think stardogs queries started erroring and drafter failed - but failed to fail properly?
[13:43]
ricroberts it was happily loading postcodes then….
[13:44]
ricroberts added a Plain Text snippet
Feb 05, 2016 12:37:20 PM com.complexible.stardog.protocols.server.SPECServerFunction query
SEVERE: There was an error executing query: INSERT DATA
{
...

}
java.lang.RuntimeException: com.complexible.stardog.plan.eval.operator.OperatorException: There was a fatal failure during preparation of 1d33a1e4-0681-44be-881d-9cd9058cb884 org.openrdf.rio.RDFParseException: Unable to allocate 4.1K bytes, direct memory exhausted
at com.complexible.common.iterations.TransformException.wrapException(TransformException.java:66)
at com.complexible.common.iterations.TransformException.hasNext(TransformException.java:90)
at com.complexible.common.iterations.Iterations.each(Iterations.java:339)
at com.complexible.common.iterations.Iterations.consume(Iterations.java:417)
at com.complexible.stardog.plan.eval.QueryEngine.executeUpdate(QueryEngine.java:155)
at com.complexible.stardog.query.DefaultQueryFactory$UpdateQueryImpl.execute(DefaultQueryFactory.java:311)
at com.complexible.stardog.query.DefaultQueryFactory$UpdateQueryImpl.execute(DefaultQueryFactory.java:290)
at com.complexible.stardog.StardogKernel$DelegatingUpdateQuery.execute(StardogKernel.java:3817)
at com.complexible.stardog.StardogKernel$SecuredUpdateQuery.execute(StardogKernel.java:3712)
at com.complexible.stardog.StardogKernel$SecuredUpdateQuery.execute(StardogKernel.java:3699)
at com.complexible.stardog.protocols.server.SPECServerFunction.query(SPECServerFunction.java:511)
at com.complexible.stardog.protocols.server.SPECServerFunction.handleMessage(SPECServerFunction.java:147)
at com.complexible.common.protocols.server.rpc.ServerHandler.handleMessage(ServerHandler.java:247)
at com.complexible.common.protocols.server.rpc.ServerHandler.channelRead(ServerHandler.java:146)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:32)
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:299)
at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.complexible.stardog.plan.eval.operator.OperatorException: There was a fatal failure during preparation of 1d33a1e4-0681-44be-881d-9cd9058cb884 org.openrdf.rio.RDFParseException: Unable to allocate 4.1K bytes, direct memory exhausted
at com.complexible.stardog.plan.eval.operator.impl.UpdateSequenceOperatorImpl.computeNext(UpdateSequenceOperatorImpl.java:88)
at com.complexible.stardog.plan.eval.operator.impl.UpdateSequenceOperatorImpl.computeNext(UpdateSequenceOperatorImpl.java:27)
at com.complexible.common.collect.AbstractSkippingIteration.tryToComputeNext(AbstractSkippingIteration.java:125)
at com.complexible.common.collect.AbstractSkippingIteration.hasNext(AbstractSkippingIteration.java:110)
at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:92)
at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:27)
at com.complexible.common.collect.AbstractSkippingIteration.tryToComputeNext(AbstractSkippingIteration.java:125)
at com.complexible.common.collect.AbstractSkippingIteration.hasNext(AbstractSkippingIteration.java:110)
at com.complexible.common.iterations.TransformException.hasNext(TransformException.java:87)
... 19 more
Caused by: com.complexible.stardog.db.DatabaseException: There was a fatal failure during preparation of 1d33a1e4-0681-44be-881d-9cd9058cb884 org.openrdf.rio.RDFParseException: Unable to allocate 4.1K bytes, direct memory exhausted
at com.complexible.stardog.db.DatabaseConnectionImpl.commit(DatabaseConnectionImpl.java:350)
at com.complexible.stardog.db.DelegatingTransactionalConnectableConnection.commit(DelegatingTransactionalConnectableConnection.java:68)
at com.complexible.stardog.plan.eval.operator.impl.UpdateSequenceOperatorImpl.computeNext(UpdateSequenceOperatorImpl.java:71)
... 27 more
Caused by: com.complexible.tx.api.HeuristicRollbackException: There was a fatal failure during preparation of 1d33a1e4-0681-44be-881d-9cd9058cb884 org.openrdf.rio.RDFParseException: Unable to allocate 4.1K bytes, direct memory exhausted
at com.complexible.tx.api.impl.DefaultTransaction.doRollback(DefaultTransaction.java:625)
at com.complexible.tx.api.impl.DefaultTransaction.runPreparePhase(DefaultTransaction.java:446)
at com.complexible.tx.api.impl.DefaultTransaction.commit(DefaultTransaction.java:336)
at com.complexible.stardog.db.DatabaseConnectionImpl.commit(DatabaseConnectionImpl.java:342)
... 29 more
Add Comment Collapse
[13:45]
ricroberts (added a bit more to the start of that last error)
[13:45]
rickmoynihan ok
[13:46]
maybe stardog failed to fail properly in that case
[13:46]
ricroberts hrm
[13:46]
but weird that drafter thought it was still going
[13:46]
rickmoynihan or maybe it did fail properly and drafter failed to fail gracefully
[13:46]
yeah thats what I mean
[13:47]
ricroberts :simple_smile:
[13:48]
gonna try it again
[13:48]
btw, this was the first load i did after successfully loading 900M triples via data add (stardog command line)
[13:49]
rickmoynihan https://github.com/Swirrl/drafter/blob/better-errors/src/drafter/backend/sesame/common/draft_management.clj#L10
[13:50]
ricroberts did it again and it worked in <1 min
[13:50]
rickmoynihan ↖️ this line calls into grafter -> sparql-repository - the way errors are handled in that stack is a little trixy
[13:50]
so I wouldn't be surprised if there was a problem there

@ricroberts
Copy link
Contributor

I just experienced something similar. After loading 900M triples, I tried to do an intensive set of queries, and it crashed stardog with a similar error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants