NotALeaderError in write_transaction() #276

Closed
P1zz4br0etch3n opened this issue Jan 22, 2019 · 11 comments

Comments

@P1zz4br0etch3n

I get a NotALeaderError in write_transaction() after running an application with neo4j-driver for about one day on average. Because the app runs 24/7, this happens almost every day. The app runs a transaction every 5 minutes if necessary. After the first error is thrown, no further transactions succeed until the application is restarted.

This might be related: neo4j-contrib/neomodel#335

Neo4j Version: 3.4.9 Enterprise
Neo4j Mode: Causal cluster with 3 cores
Driver version: Python driver 1.7.1
Operating System: Docker base image python:2.7-slim
Packaging Tool: Pipenv

Steps to reproduce

  1. Start Neo4j on self-hosted VM
  2. Start Application on Kubernetes
  3. Let it run for a day

Expected behavior

The application remains able to run write_transaction() against the Neo4j cluster.

Actual behavior

After running for about 1 day the application stops being able to write.
Traceback (modified to hide function/application names):

Traceback (most recent call last):
    results = session.write_transaction(_unit_of_work, time_limit)
  File "/usr/local/lib/python2.7/site-packages/neo4j/__init__.py", line 708, in write_transaction
    return self._run_transaction(WRITE_ACCESS, unit_of_work, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/neo4j/__init__.py", line 683, in _run_transaction
    tx.close()
  File "/usr/local/lib/python2.7/site-packages/neo4j/__init__.py", line 822, in close
    self.sync()
  File "/usr/local/lib/python2.7/site-packages/neo4j/__init__.py", line 787, in sync
    self.session.sync()
  File "/usr/local/lib/python2.7/site-packages/neo4j/__init__.py", line 538, in sync
    detail_count, _ = self._connection.sync()
  File "/usr/local/lib/python2.7/site-packages/neobolt/direct.py", line 506, in sync
    detail_delta, summary_delta = self.fetch()
  File "/usr/local/lib/python2.7/site-packages/neobolt/direct.py", line 413, in fetch
    raise error
NotALeaderError: No write operations are allowed directly on this database. Writes must pass through the leader. The role of this server is: FOLLOWER

@technige
Contributor

I'm not sure how easy this'll be to recreate just from the information we have here. But it sounds to me like the routing table has invalid data or isn't being updated correctly.

Getting extra logs from the client would help. You'll need to hook into the logger called "neobolt" to see the conversation between client and server. There's a built-in helper to dump this to stdout, which you might be able to capture. These two lines just need to go at the top of the application:

from neobolt.diagnostics import watch
watch("neobolt")

After that, we should have a much clearer idea what is (or isn't) happening.
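
For a service that runs for days, it may be easier to send the same logger to a file than to capture stdout. The following is a minimal sketch using only the standard `logging` module. The logger name "neobolt" comes from the comment above; the helper name `capture_neobolt_log` and its `path` argument are hypothetical and not part of the driver.

```python
import logging


def capture_neobolt_log(path, level=logging.DEBUG):
    """Attach a file handler to the "neobolt" logger and return the handler.

    `capture_neobolt_log` and `path` are illustrative names, not driver API.
    """
    logger = logging.getLogger("neobolt")
    handler = logging.FileHandler(path)
    # Timestamps make it easier to line up log entries with the failing write.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.setLevel(level)
    logger.addHandler(handler)
    return handler
```

Calling `capture_neobolt_log("neobolt.log")` once at application startup should then record whatever the driver emits on that logger, assuming the driver logs through the standard `logging` module as the comment above suggests.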

@P1zz4br0etch3n
Author

Thanks for your fast reply. We've been capturing the neobolt logs since yesterday, but the error hasn't occurred yet. I will comment again when it recurs.

@P1zz4br0etch3n
Author

We now have a log file that captured the error, starting from the last successful commit. Do you need the queries we run? If yes, can I send the file to you directly? I don't want to publish them here.

@technige
Contributor

Yes, please drop me an email: nigel at neo4j dot com.

@P1zz4br0etch3n
Author

OK, I've sent you an email with the issue title as the subject.

@technige
Contributor

Thanks, received. I'll have a look over the next couple of days.

@P1zz4br0etch3n
Author

Any progress on this?

@P1zz4br0etch3n
Author

We're still getting that error.

@technige
Contributor

technige commented Feb 27, 2019

@P1zz4br0etch3n Can you confirm that you are using bolt+routing and not just bolt? We can't see any routing table updates in the log file.
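
One quick way to check this from the application side is to inspect the scheme of the connection URI before it is handed to the driver. This is a minimal sketch using only the standard library; the helper name `uses_routing` and the host names in the usage note are hypothetical.

```python
# Python 3 import; on Python 2.7 use `from urlparse import urlparse` instead.
from urllib.parse import urlparse


def uses_routing(uri):
    """Return True if the connection URI uses the bolt+routing scheme.

    `uses_routing` is an illustrative helper, not part of the driver.
    """
    return urlparse(uri).scheme.lower() == "bolt+routing"
```

For example, `uses_routing("bolt+routing://core1.example.com:7687")` is true, while `uses_routing("bolt://core1.example.com:7687")` is false. Logging the result at startup would show which scheme the deployed configuration actually uses.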

@P1zz4br0etch3n
Author

Yes, we are definitely using bolt+routing.

@technige
Contributor

technige commented Mar 4, 2019

Will be fixed in the 1.7.2 patch that contains #283 (as well as equivalent patches for 1.5 and 1.6). Due for release on Thursday 7th March 2019.
