neobolt.exceptions.ServiceUnavailable in ecr.load_ecr_repository_images() when connecting to remote Neo4j #522
Comments
Actually going to reopen this issue as #523 does not completely resolve it.
@achantavy I believe I know what the root cause is given my Neo4j experience. See my PR #526. You were on the right track with (edited to explain batching since I hit comment too soon)
Attempts to fix lyft#522 by batching the data to minimize transaction size. Should resolve network errors due to unresponsive Neo4j server when under too much memory pressure. I chose the batch size arbitrarily, but given it's creating nodes as well as relationships to highly dense nodes, it shouldn't be set too high without knowing the characteristics of the database host.
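For readers following along, here is a minimal sketch of what batching the load could look like. It is not the code from PR #526: the helper names `batched` and `load_in_batches`, the `$rows` parameter name, and the default batch size of 1000 are all assumptions made for illustration. The only driver call it relies on is `session.run(query, **parameters)`.

```python
from typing import Any, Dict, Iterator, List


def batched(rows: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def load_in_batches(session: Any, query: str, rows: List[Dict[str, Any]],
                    batch_size: int = 1000) -> int:
    """Run `query` once per batch so that each transaction stays small.

    `session` is assumed to be a Neo4j driver session. `query` is assumed
    to consume the batch through a `$rows` parameter, for example with
    `UNWIND $rows AS row ...` (hypothetical query shape).
    Returns the number of batches sent.
    """
    sent = 0
    for chunk in batched(rows, batch_size):
        # One auto-commit statement per batch instead of one huge statement.
        session.run(query, rows=chunk)
        sent += 1
    return sent
```

As noted above, the right batch size depends on the database host, so the default here should be treated as a placeholder to tune rather than a recommendation.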
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Hey @voutilad, I put together a plan for improving write perf in this project. If you have 10 minutes, I would really appreciate your input: https://docs.google.com/document/d/1IZ12R3oROn11LcYj5XunokyOjJkKu-H2O1TEk065Dsk/edit#
This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new change or reopen this one. |
@achantavy I'm also happy to help if @voutilad is busy. If you want, we can have a look at the code together; just drop me an email to schedule a call.
@voutilad @jexp Thanks for the help so far. I saw this issue happen on a different section of internal code and was able to resolve it by using explicit transactions! Deployment information: we have a k8s cronjob running neo4j python driver 1.7.6, writing data to a Neo4j enterprise 3.5.19 database across an AWS Network Load Balancer. To summarize, the code would
Most times, step (3) would take longer than 380 seconds and the code would work fine, which is not what I would expect because this is longer than the timeout from our AWS NLB and the value of our neo4j driver's To fix, I changed the code to explicitly use To get to this solution, I stumbled upon this section of the current driver doc: https://neo4j.com/docs/api/python-driver/current/api.html#managed-transactions-transaction-functions
Sure enough, this seemed encouraging and it worked! Prior to this I had only been reading the docs for the 1.7 driver but it seems that the docs have become more thorough for the current drivers. I'll push out a similar fix to address this specific issue and other related ones. I guess to summarize Python driver best practices that we've learned in this project,
This has been bugging me for months and I'm glad to finally have forward movement on this problem. :)
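For anyone landing here later, a minimal sketch of the transaction-function approach described in the comment above. It is an illustration, not the fix that was pushed to this project: the names `_write_chunk` and `load_with_transaction_functions` and the `$rows` parameter are hypothetical. It assumes a 1.7 or 4.x Python driver, where sessions expose `write_transaction(unit_of_work, *args, **kwargs)`; newer driver versions rename this method, so check the docs for the version you run.

```python
from typing import Any, Dict, Iterable, List


def _write_chunk(tx: Any, query: str, rows: List[Dict[str, Any]]) -> None:
    """Unit of work: the driver passes in the transaction object `tx`."""
    # The query is assumed to read its input from a `$rows` parameter.
    tx.run(query, rows=rows)


def load_with_transaction_functions(session: Any, query: str,
                                    chunks: Iterable[List[Dict[str, Any]]]) -> int:
    """Write each chunk in its own managed transaction.

    The driver opens the transaction, calls `_write_chunk`, and commits
    when the function returns, so no transaction is left open between
    chunks. Returns the number of chunks written.
    """
    written = 0
    for chunk in chunks:
        session.write_transaction(_write_chunk, query, chunk)
        written += 1
    return written
```

The linked driver docs describe transaction functions as the managed option, with the driver handling commit and rollback. Whether that is the reason the timeouts went away in this deployment is not established in this thread.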
Description:
When loading 50MB worth of data to a remote Neo4j server (i.e. not located on the same machine), ecr.load_ecr_repository_images() crashes with a neobolt.exceptions.ServiceUnavailable error after running for 2 hours.
To Reproduce:
Run ecr.load_ecr_repository_images() with 50MB of data.
POC code:
Logs:
Please complete the following information:
0c9a662
3.7.9
I have observed this in a Docker container based on Debian as well as on my OSX laptop. Neither appears to be resource constrained: CPU usage is around 0%, and memory usage of the python process is about 200-300MB.
Additional context:
This appears related to the following issues:
ConnectionResetError and ServiceUnavailable exceptions thrown from ECR sync #440
Sync connections can become blocked on neo4j >=3.3, which will then cause them to be dropped due to inactivity. #170
All of these issues involve sending fairly large objects over the Bolt connection.
Update:
I've also observed this issue on load_ecr_repositories().