Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3.OperationalError: disk I/O error - coverage/sqldata.py, line 1024 #915

Closed
caronc opened this issue Jan 6, 2020 · 57 comments
Closed

sqlite3.OperationalError: disk I/O error - coverage/sqldata.py, line 1024 #915

caronc opened this issue Jan 6, 2020 · 57 comments
Labels
Milestone

Comments

@caronc
Copy link

@caronc caronc commented Jan 6, 2020

Describe the bug
Since coverage v5.0.2, I throw an exception during Python 3.6 testing (Python 3.4, 3.5, 3.7, and 3.8 work fine - source). Below is a screenshot of the exception thrown (source)
Screenshot from 2020-01-06 16-05-07

This problem does not occur with v5.0.1 (source).

Originally I thought this was a Travis CI issue (source), but it appears specific to the last release of coverage instead.

To Reproduce
How can we reproduce the problem? Please be specific.

  1. Python v3.6.7
  2. Ubuntu 16.04.6 LTS
  3. Kernel: 4.15.0-1028-gcp (xenial container here)
  4. v5.0.2 of coverage (pulled from PyPi/pip)
  5. The following packages are installed:
    apprise==0.8.2
    asgiref==3.2.3
    attrs==19.3.0
    certifi==2019.11.28
    chardet==3.0.4
    Click==7.0
    coverage==5.0.2
    Django==3.0.2
    entrypoints==0.3
    filelock==3.0.12
    flake8==3.7.9
    idna==2.8
    importlib-metadata==1.3.0
    Markdown==3.1.1
    mccabe==0.6.1
    mock==3.0.5
    more-itertools==8.0.2
    oauthlib==3.1.0
    packaging==20.0
    pluggy==0.13.1
    py==1.8.1
    pycodestyle==2.5.0
    pyflakes==2.1.1
    pyparsing==2.4.6
    pytest==5.3.2
    pytest-cov==2.8.1
    pytest-django==3.7.0
    pytz==2019.3
    PyYAML==5.3
    requests==2.22.0
    requests-oauthlib==1.3.0
    six==1.13.0
    sqlparse==0.3.0
    toml==0.10.0
    tox==3.14.3
    urllib3==1.25.7
    virtualenv==16.7.9
    wcwidth==0.1.8
    zipp==0.6.0
    
  6. I'm trying to build my package Apprise API which I hadn't needed to build until of late. Here is the latest build status.
  7. The command ran was:
    coverage run --parallel -m pytest apprise_api

Expected behavior
Not to get an exception (or maybe it just needs to be gracefully caught since it's on an __exit__ call anyway)?

Additional context
If i re-run the Travis-CI runner on a previous Python 3.6 test runner that has passed (in the past), it fails on Python 3.6 (more details and screenshots explained here).

@caronc caronc added the bug label Jan 6, 2020
@geekvikas

This comment has been minimized.

Copy link

@geekvikas geekvikas commented Jan 6, 2020

Happens in Ubuntu docker image with circleci too but slight different error message is thrown..

Couldn't use data file '/home/circleci/project/test_results/.coverage': database disk image is malformed

@joaonc

This comment has been minimized.

Copy link

@joaonc joaonc commented Jan 6, 2020

Broke our CI as well with the same issue. We pinned converage to 5.0.1. Looks like this should be fixed asap.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

This does sound serious. I tried to reproduce it with the Apprise API repo, but running the tests locally doesn't show the problem. Does anyone have specific instructions for reproducing the error?

@nedbat nedbat added this to the 5.0.3 milestone Jan 7, 2020
@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@geekvikas Can you give me details of your failure?

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@joaonc Can you provide details of your failure?

@feu-de-bois

This comment has been minimized.

Copy link

@feu-de-bois feu-de-bois commented Jan 7, 2020

Hope this is not adding noise, but perhaps is related? I see an exception in the same file in python2.7 running coverage with the --with-coverage option from nosetest. This exception does not occur in v5.0.1.

File "/home/user/.env_python2.7/local/lib/python2.7/site-packages/coverage/sqldata.py", line 1048, in execute raise CoverageException("Couldn't use data file {!r}: {}".format(self.filename, msg)) coverage.misc.CoverageException: Couldn't use data file '/home/brendan/dev/mt/mt-models/.coverage': no such table: meta

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@feu-de-bois That sounds like a different error. If you could open a new issue, with specific instructions about how to reproduce it, that would be a big help.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 7, 2020

@nedbat The one thing i did not include in my original description (since you're not able to reproduce right now), is that it seems to happen with:

  • Ubuntu 16.04.6 LTS
  • Kernel: 4.15.0-1028-gcp

It's part of the xenial docker containers (available on Ubuntu's officially hosted docker page here). This may or may not help you. I'll add this information to my original description so it's all in one place.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@caronc Thanks, I appreciate the extra context. I will try it out. I should warn you though: I may need more explicit instructions, like "docker run ..." then "sudo apt-get install ....", etc.

@joaonc

This comment has been minimized.

Copy link

@joaonc joaonc commented Jan 7, 2020

@nedbat here's some more context:
We use pytest and pytest-cov, which runs coverage. Here's the error we get:

Traceback (most recent call last):
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/sqldata.py", line 1033, in execute
  return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: context

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/_pytest/main.py", line 196, in wrap_session
  session.exitstatus = doit(config, session) or 0
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/_pytest/main.py", line 246, in _main
  config.hook.pytest_runtestloop(session=session)
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
  return self._hookexec(self, self.get_hookimpls(), kwargs)
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
  return self._inner_hookexec(hook, methods, kwargs)
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
  firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pluggy/callers.py", line 203, in _multicall
  gen.send(outcome)
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pytest_cov/plugin.py", line 254, in pytest_runtestloop
  self.cov_controller.finish()
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/pytest_cov/engine.py", line 197, in finish
  self.cov.stop()
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/control.py", line 642, in save
  data = self.get_data()
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/control.py", line 696, in get_data
  if self._collector and self._collector.flush_data():
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/collector.py", line 425, in flush_data
  self.covdata.add_lines(self.mapped_file_dict(self.data))
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/sqldata.py", line 440, in add_lines
  self._set_context_id()
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/sqldata.py", line 399, in _set_context_id
  context_id = self._context_id(context)
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/sqldata.py", line 376, in _context_id
  row = con.execute("select id from context where context = ?", (context,)).fetchone()
File "/home/jenkins/agent/workspace/build/lib/python3.7/site-packages/coverage/sqldata.py", line 1048, in execute
  raise CoverageException("Couldn't use data file {!r}: {}".format(self.filename, msg))
coverage.misc.CoverageException: Couldn't use data file '/home/jenkins/agent/workspace/.coverage': no such table: context

Hope that helps.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@joaonc Sorry, I should have been clearer. Can you give me detailed specific instructions of the exact commands I can run that will clone your code and run your test suite to reproduce the failure?

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 7, 2020

@feu-de-bois Your error is described in #916, and I'm working on a fix for it.

@shashanksingh28

This comment has been minimized.

Copy link

@shashanksingh28 shashanksingh28 commented Jan 7, 2020

I am almost certain the error we are observing is the same issue.

With pytest==5.3.2,pytest-cov==2.8.1,coverage==5.0.2

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/_pytest/main.py", line 196, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/_pytest/main.py", line 246, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pluggy/callers.py", line 203, in _multicall
INTERNALERROR>     gen.send(outcome)
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pytest_cov/plugin.py", line 254, in pytest_runtestloop
INTERNALERROR>     self.cov_controller.finish()
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/pytest_cov/engine.py", line 197, in finish
INTERNALERROR>     self.cov.stop()
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/control.py", line 642, in save
INTERNALERROR>     data = self.get_data()
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/control.py", line 696, in get_data
INTERNALERROR>     if self._collector and self._collector.flush_data():
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/collector.py", line 423, in flush_data
INTERNALERROR>     self.covdata.add_arcs(self.mapped_file_dict(self.data))
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/sqldata.py", line 471, in add_arcs
INTERNALERROR>     with self._connect() as con:
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/sqldata.py", line 1016, in __enter__
INTERNALERROR>     self._connect()
INTERNALERROR>   File "<path>.tox/test/lib/python3.7/site-packages/coverage/sqldata.py", line 998, in _connect
INTERNALERROR>     self.con = sqlite3.connect(self.connect_filename, check_same_thread=False)
INTERNALERROR> sqlite3.OperationalError: unable to open database file

Issue does not exist with coverage==5.0.b2

Tried to come up with a minimum reproducible setup but couldn't come up with something I could share here that is simple..

r9y9 added a commit to r9y9/nnmnkwii that referenced this issue Jan 8, 2020
workaround for nedbat/coveragepy#915
@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 8, 2020

@shashanksingh28 You don't have to make a minimal reproducer. I will run your entire test suite on your product if you can share it with me.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 8, 2020

@nedbat Have a look at the Travis CI ticket i had open. One of the community/dev reps there was kind enough to give you exact instructions you hinted towards. I'll paste them here, but I certainly don't want to take credit for his awesome effort:

cid=$(docker run -dti --privileged=true --entrypoint=/sbin/init -v /sys/fs/cgroup:/sys/fs/cgroup:ro travisci/ci-sardonyx:packer-1542104228-d128723)

docker exec -it $cid /bin/bash
su - travis
git clone --depth=50 --branch=python38-ci-testing https://github.com/caronc/apprise-api.git caronc/apprise-api
cd caronc/apprise-api/
export TOXENV=py36
source ~/virtualenv/python3.6/bin/activate
python --version
pip --version
pip install codecov
pip install -r dev-requirements.txt
pip install -r requirements.txt
tox

Kudos to BanzaiMan for this information. 👍

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 9, 2020

@caronc Thanks, this reproduces the error. The bad news is I still don't have an idea why it's happening, but this gives me something to work with.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 9, 2020

Perhaps it's something to do with tox? Maybe it's moving away your temporary storage before the database is actually done with it? It's almost like you're missing a call to flush() or whatever the equivalent to it for SQLite. It's also possible the content was flushed, but the file descriptor simply wasn't closed properly which might not be a big deal? Maybe you just need to gracefully catch the exception? I'm obviously speaking without experience or knowledge of your coverage source code, so I mean no disrespect with these suggestions/thoughts. 🙂 Please let me know if there is anything I can help with.

@vnitinv

This comment has been minimized.

Copy link

@vnitinv vnitinv commented Jan 10, 2020

@caronc No, even with simple unittest run I also get the same error.

nosetests -v --with-coverage --cover-package=jnpr.junos  --cover-inclusive  --where=tests/unit/ -a unit
Traceback (most recent call last):
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/sqldata.py", line 1033, in execute
    return self.con.execute(sql, parameters)
sqlite3.OperationalError: no such table: meta

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/bin/nosetests", line 8, in <module>
    sys.exit(run_exit())
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/opt/anaconda3/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/core.py", line 66, in run
    result.printErrors()
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/result.py", line 110, in printErrors
    self.config.plugins.report(self.stream)
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/nose/plugins/cover.py", line 183, in report
    self.coverInstance.stop()
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/control.py", line 670, in combine
    self.get_data()
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/control.py", line 696, in get_data
    if self._collector and self._collector.flush_data():
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/collector.py", line 425, in flush_data
    self.covdata.add_lines(self.mapped_file_dict(self.data))
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/sqldata.py", line 436, in add_lines
    self._choose_lines_or_arcs(lines=True)
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/sqldata.py", line 496, in _choose_lines_or_arcs
    ('has_arcs', str(int(arcs)))
  File "/Users/nitinkr/Coding/PyEZ_env/iagent_test/lib/python3.7/site-packages/coverage/sqldata.py", line 1048, in execute
    raise CoverageException("Couldn't use data file {!r}: {}".format(self.filename, msg))
coverage.misc.CoverageException: Couldn't use data file '/Users/nitinkr/Coding/PyEZ/py-junos-eznc/.coverage': no such table: meta

with older version things works fine

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 10, 2020

@vnitinv You have a different error message. Your error is covered by #916 (and #918).

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 11, 2020

I have been trying to understand what is going on, and I don't get it. Using @caronc's example. It's not something between 5.0.1 and 5.0.2. It reproduces on every version that had SQLite (5.0a2 and beyond).

If I remove pytest-cov, then the behavior stops. But I cannot see what is happening that makes the error. If others have reproducible scenarios, send them my way.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 11, 2020

@nedbat that is still really good detective work. I've updated my Travis CI ticket to see if it's something they can update on their end. I wish i did this sooner, but I was able to push a build forcing v5.0.1 which proved your observation (as it still failed with the same error). What version of SQLite are you testing with right now and passing with?

At the same time, since it's an Alpha version of SQLite, maybe it would be worth opening a ticket with them before it goes final an a lot more people show up here? Are you saying that 5.0a1 works fine?

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 11, 2020

@caronc I'm not sure what you mean by an alpha version of SQLite? We're all using whatever version is built with Python, no? On the docker image, it's Python 3.6.7 with SQLite 3.11.0. On my Mac, it's Python 3.6.0 with SQLite 3.19.3

And just to be clear, I can still easily reproduce the I/O error using the docker image, I just don't see what's causing it. I'm still working on it....

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 11, 2020

I understand; this line threw me off:

It reproduces on every version that had SQLite (5.0a2 and beyond).

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 11, 2020

I see! I meant versions of coverage.py that used SQLite.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 12, 2020

@ionelmc I think that view is wee bit counter productive. I you were to rewind time for even just 3 weeks from now, it worked in Python 3.6 without any problem at all. The same command works perfectly on all other versions of Python (2.7, 3.4, 3.5, 3.7, and 3.8).

Why would anyone run coverage over pytest-cov

But you do it too with your python-aspectlib project on line 139, 153, 181, and so on and so forth (at least 4 of 5 more times in that file). I presume your use case is the same as mine?

You definitely know coverage to a much greater level than I do. Could you suggest an alternative command I could run to achieve the same task? I just want to run all unit-tests on my project in several versions of Python and capture the coverage. I've achieved 100%, so I like knowing when it hits 99% (or less) so that I can use the report to address this. I'm definitely open to any advice if the solution to this problem if it is simply just just a procedural one and not code based! 👍

@SamLebarbare

This comment has been minimized.

Copy link

@SamLebarbare SamLebarbare commented Jan 12, 2020

I've tried commenting out the two pragmas, and the problem still occurred for me, though others have said it fixes it? Do other people have data points?

Humm... commenting? you must set it to "=on" explicitly, if your db is on disk...

@ionelmc

This comment has been minimized.

Copy link
Contributor

@ionelmc ionelmc commented Jan 12, 2020

Why would anyone run coverage over pytest-cov

But you do it too with your python-aspectlib project on line 139, 153, 181, and so on and so forth (at least 4 of 5 more times in that file). I presume your use case is the same as mine?

In tox ini of https://github.com/nedbat/apprise-api/tree/nedbat/debug-915 there is:

coverage run -m pytest --cov=. {posargs} apprise_api

That will run pytest "inside coverage". Basically coverage setups up various things then invokes pytest as if it would be started with "python -m pytest". Then pytest loads plugins like pytest-cov that look at the cmdline arguments and if "--cov" is present then pytest-cov (the pytest plugin) starts and sets up coverage again. This seems unnecessary to me.

An user should be able to record coverage with any of these:

  • coverage run -m pytest apprise_api
  • pytest --cov=. apprise_api (this is what I do in aspectlib)

I suspect you added --cov simply to display the report but that does much more - it actually starts coverage (again in your situation). Perhaps the plugin should error out or disable itself if coverage already runs.

@dw

This comment has been minimized.

Copy link

@dw dw commented Jan 12, 2020

For anyone else having fun with this, in GDB 'break pysqlite_connection_commit', 'run', and 'cont 451' after the first breakpoint hits seems to bring you to the last instance of SQLite API use before the exception

@nelsonjchen

This comment has been minimized.

Copy link

@nelsonjchen nelsonjchen commented Jan 12, 2020

Would something like this help? If not, could whatever this is using be called from gdb and maybe it might elaborate on the error?

python/cpython#1108

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 12, 2020

An user should be able to record coverage with any of these:

  • coverage run -m pytest apprise_api
  • pytest --cov=. apprise_api (this is what I do in aspectlib)

@ionelmc : This is great; thank you very much for that response! I will create a branch and try this out.

I suspect you added --cov simply to display the report but that does much more - it actually starts coverage (again in your situation). Perhaps the plugin should error out or disable itself if coverage already runs

This is indeed my reasoning. Can I still accomplish the same feat using your suggestion?

I realize I'm hijacking the issue at hand here, but a working alternative is still a great secondary (maybe even a primary) solution.

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 12, 2020

That was the first thing I was going to check. SQLite returns much more detailed error codes than "disk I/O error".

@rohfle

This comment has been minimized.

Copy link

@rohfle rohfle commented Jan 12, 2020

Using the supplied docker image, I notice that the following works

tox -e bad # fails
tox -e bad # still fails
rm /home/travis/apprise-api/.tox/bad/lib/python3.6/site-packages/pytest_cov/__pycache__/engine.cpython-36.*
tox -e bad # succeeds

Could this be a pycache related issue?

@dw

This comment has been minimized.

Copy link

@dw dw commented Jan 12, 2020

(huge pinch of salt)

It appears to be happening within execute(), within the C API function sqlite3_stmt_execute(), processing (I think) a Transaction opcode, due to:

#1  unixLock (id=0x113d338, eFileLock=1) at sqlite3.c:29058
#2  0x00007ffff39b86d4 in sqlite3OsLock (lockType=1, id=<optimized out>) at sqlite3.c:17970
#3  pagerLockDb (pPager=0x113d1c8, eLock=1) at sqlite3.c:45046
#4  0x00007ffff39b875c in pager_wait_on_lock (pPager=0x113d1c8, locktype=1) at sqlite3.c:47742
#5  0x00007ffff3a040d6 in sqlite3PagerSharedLock (pPager=0x113d1c8) at sqlite3.c:48959
#6  0x00007ffff3a099fc in lockBtree (pBt=0x10437b8) at sqlite3.c:58566
#7  sqlite3BtreeBeginTrans (p=p@entry=0x103e308, wrflag=0) at sqlite3.c:58926
#8  0x00007ffff3a36dc5 in sqlite3VdbeExec (p=p@entry=0x799ad8) at sqlite3.c:77502
#9  0x00007ffff3a41970 in sqlite3Step (p=0x799ad8) at sqlite3.c:73031

I can't see any associated lock taken via a system call in strace, but this call stack is returning 3850, which is equivalent to:

#define SQLITE_IOERR_LOCK              (SQLITE_IOERR | (15<<8))

Related strace:

getcwd("/home/travis/apprise-api", 1024) = 25
write(3, "self: <SqliteDb @0x7f1ec4988860 "..., 86) = 86
fcntl(31, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = -1 EBADF (Bad file descriptor)
open("/home/travis/apprise-api/.coverage", O_RDONLY|O_CLOEXEC) = 31
fstat(31, {st_mode=S_IFREG|0644, st_size=192512, ...}) = 0
ioctl(31, TCGETS, 0x7ffc260829c0)       = -1 ENOTTY (Inappropriate ioctl for device)
lseek(31, 0, SEEK_CUR)                  = 0
read(31, "SQLite format 3\0\4\0\1\1\0@  \0\0\0\n\0\0\0\274"..., 4096) = 4096
close(31)                               = 0
write(3, "EXCEPTION from execute: disk I/O"..., 39) = 39

Note the failing F_SETLK it appears to ignore and continue from. It seems strange that it would continue reading the file header after encountering this error, so maybe this isn't the real problem

Random guess: somehow tickling a bug in sqlite3 or the Python wrapper, due to the ordering of opens/closes. It could be that sqlite3 has the old fd 31 cached from a previous open, it definitely seems wrong that it would be trying to lock a FD it didn't know for certain it owned (this is an instant problem in any threaded program)

Possible brute force workaround: discard the connection after each commit()/abort()

You can recreate this stack with 'break sqlite3BtreeBeginTrans', 'cont 3174'

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

The EBADF error from F_SETLK is almost certainly the problem. However, the man page says that can only happen if the file is not open for reading, which is definitely not true as it successfully reads from it a few syscalls later.

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

Wait, it opened the file again before the successful read. Looks like the file was somehow closed even though the sqlite database connection was still open.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 13, 2020

@gfv Thanks. Do you mind sharing the code change you mean?

@gfv

This comment has been minimized.

Copy link

@gfv gfv commented Jan 13, 2020

@nedbat I figured out that the effect was mostly due to my forgotten debug statement within "close". This points strongly at a race condition somewhere close by (pun intended), but I can't yet see anything conclusive.

(I've deleted my earlier comment, sorry)

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

Disabling the Python GC (with gc.disable()) made the problem go away.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 13, 2020

My blog post was posted to Hacker News, and people there have been digging: https://news.ycombinator.com/item?id=22028581

It seems that a NamedTemporaryFile in apprise-api is being closed, and SQLite thinks it's its file, and tries to use the closed file descriptor. Not sure why that is...

@rickyz

This comment has been minimized.

Copy link

@rickyz rickyz commented Jan 13, 2020

The bug is this code (also present in some of the other tests), which patches tempfile._TemporaryFileWrapper to inject OSErrors:

https://github.com/nedbat/apprise-api/blob/master/apprise_api/api/tests/test_add.py#L166

This causes this line to raise an OSError:

https://github.com/python/cpython/blob/3.6/Lib/tempfile.py#L556

triggering the _os.close in the exception handler.

However, the FileIO object created by _io.open continues to reference the file descriptor.

The os reuses the closed fd for opening the sqlite db file. When the FileIO is finalized by the CPython GC some time later, it closes that fd from under sqlite. That leads to the EBADF errors described above.

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

I found that too. Here's my understanding:

  1. apprise tries to create a NamedTemporaryFile. This throws some sort of exception after this call to open. So the file object is created but never returned. The finally closes the file descriptor, but the file object sticks around for some reason, and thinks it still owns the file descriptor.
  2. Much much later, sqlite opens the database and reuses the same file descriptor number.
  3. Even later than that, while the database is open, python runs GC and decides to finally delete that file object, thus closing the FD that is currently being used by sqlite.

So this is a subtle bug in NamedTemporaryFile.

@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 13, 2020

So is the issue related to my unit tests (in apprise-api)? I'm trying to effectively simulate a bad disk write/read in several tests in order to achieve 100% code coverage in addition to just being able to handle these edge cases in general. Should I be doing it another way?

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

Mocking NamedTemporaryFile instead of _TemporaryFileWrapper would work around it. This is probably something that should be fixed in Python.

@rickyz

This comment has been minimized.

Copy link

@rickyz rickyz commented Jan 13, 2020

I'm not especially familiar with Python best practices, but patching tempfile.NamedTemporaryFile instead of the internal implementation class is probably a better idea.

I wouldn't consider this a Python bug, as the code is making what I consider to be the reasonable assumption that making a _TemporaryFileWrapper doesn't throw.

It would probably also be correct for NamedTemporaryFile to call file.close() instead of os.close if the FileIO object was successfully created.

Edit: The Python fix is still probably worth doing, as exceptions like KeyboardInterrupt can come at any time.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 13, 2020

Wow, thanks everyone, here and on Hacker News. I've summed up the answer in Bug #915: solved!

@caronc Thanks for persisting! I'll close this as not a coverage issue. The mock should be on NamedTemporaryFile, not on _TemporaryFileWrapper, since the first is the public interface, and is what you actually call in your product code. But I'm curious: the mock is called "mock_ntf" as if it had once been for NamedTemporaryFile. Why is that?

@nedbat nedbat closed this Jan 13, 2020
@nedbat nedbat added the not-our-bug label Jan 13, 2020
@caronc

This comment has been minimized.

Copy link
Author

@caronc caronc commented Jan 13, 2020

@nedbat: at the time I was having a lot of trouble using mock to have it throw on the NamedTemporaryFile() object (a lack of my own education no doubt). Through trial and error and with the help of pdb I stepped through the internals of it so that I could find a new function/object that mock would actually work on. Strangely the approach you see (and debugged with hackernews) worked... well this just than became my working solution.

What's weird is all of this worked fine before (tests passed a few weeks ago) and even today, they all pass on other versions of Python. Either way, I'll re-investigate this and play with the outer call of NamedTemporaryFile() again/

I'm curious though: If there ever was an actual OSError thrown (for lets say... lack of disk space) in a real world environment, I imagine this exception would fire from the depths i placed my current mocked object? So while my practices were wrong, I presume this truly could be a bug with Python as @tbodt suggested above? No? I mean, this would be hard to reproduce otherwise.

Thanks for persisting!

No, Thank YOU so much and all of the amazing people you conjured from Hackernews! It was exciting to watch all of your brains in motion while mine sputtered and spewed black smoke trying to grasp how you all narrowed it down! 🙂

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 13, 2020

@caronc I do believe this is a genuine bug in Python, and I hope the core devs take bpo 39318 seriously and fix it, ideally with backports.

I understand about the difficulty of getting mocks to work. This article might help: Why your mock doesn't work.

@tbodt

This comment has been minimized.

Copy link

@tbodt tbodt commented Jan 13, 2020

The code was written with the assumption that any failure would happen in _io.open instead of _TemporaryFileWrapper. The _TemporaryFileWrapper constructor doesn't actually do anything other than construct two objects, so it's very rare for that to fail.

As for why this worked on other versions of Python and other environments, that's because the problem only shows up when the GC happens to run after the same FD has been reused. If the GC runs at even a very slightly different time (such as between database opens instead of while the database is open) it won't fail in the same way.

@trinitronx

This comment has been minimized.

Copy link

@trinitronx trinitronx commented Jan 15, 2020

I found this same type of error was happening while using coverage==5.0.1, while using nosetest s --with-coverage. Adding this information for future Googlers of this error to find.

0a7861647e52:/usr/src/app/scout-report-generator$ pip list coverage
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Package             Version
------------------- ----------
awscli              1.11.190
boto3               1.10.43
botocore            1.7.48
certifi             2019.11.28
chardet             3.0.4
colorama            0.3.7
cov-core            1.15.0
coverage            5.0.1
docutils            0.15.2
funcsigs            1.0.2
futures             3.3.0
iampoliciesgonewild 1.0.6.2
idna                2.8
jmespath            0.9.4
mock                3.0.5
netaddr             0.7.19
nose                1.3.7
nose-cov            1.6
opinel              3.3.4
pip                 19.0.1
pyasn1              0.4.8
python-dateutil     2.8.1
PyYAML              3.12
requests            2.22.0
rsa                 3.4.2
s3transfer          0.1.13
setuptools          40.7.1
six                 1.13.0
urllib3             1.25.7
wheel               0.32.3
You are using pip version 19.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

The root cause in my particular case was due to running tests inside a docker container as a user which did not have file ownership of the python app's source code directory. Changing ownership to this user before running the tests worked!

Original Error was:

/scout-report-generator # /bin/su - jenkins -s /bin/sh
0a7861647e52:/usr/src/app/scout-report-generator$ nosetests --with-coverage --with-xunit --xunit-file=/scout-report-generator/generated//test.xml tests/test-rules-processingengine.py | tee /scout-report-generator/generated//test.txt;
Traceback (most recent call last):
  File "/usr/local/bin/nosetests", line 10, in <module>
    sys.exit(run_exit())
  File "/usr/local/lib/python2.7/site-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/usr/local/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/local/lib/python2.7/site-packages/nose/core.py", line 145, in parseArgs
    self.config.configure(argv, doc=self.usage())
  File "/usr/local/lib/python2.7/site-packages/nose/config.py", line 346, in configure
    self.plugins.configure(options, self)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 284, in configure
    cfg(options, config)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/cover.py", line 155, in configure
    self.coverInstance.load()
  File "/usr/local/lib/python2.7/site-packages/coverage/control.py", line 396, in load
    self._data.read()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 741, in read
    with self._connect():       # TODO: doesn't look right
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 298, in _connect
    self._create_db()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 247, in _create_db
    with db:
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 1015, in __enter__
    self._connect()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 997, in _connect
    self.con = sqlite3.connect(filename, check_same_thread=False)
sqlite3.OperationalError: unable to open database file
0a7861647e52:/usr/src/app/scout-report-generator$ touch .coverage
touch: .coverage: Permission denied

After creating a blank .coverage file:

/scout-report-generator # touch .coverage
/scout-report-generator # chown jenkins:ci .coverage
/scout-report-generator # /bin/su - jenkins -s /bin/sh
0a7861647e52:~$ cd /usr/src/app/scout-report-generator && nosetests --with-coverage --with-xunit --xunit-file=/scout-report-generator/generated//test.xml tests/test-rules-processingengine.py | tee /scout-report-generator/generated//test.txt;
Traceback (most recent call last):
  File "/usr/local/bin/nosetests", line 10, in <module>
    sys.exit(run_exit())
  File "/usr/local/lib/python2.7/site-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/usr/local/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/local/lib/python2.7/site-packages/nose/core.py", line 145, in parseArgs
    self.config.configure(argv, doc=self.usage())
  File "/usr/local/lib/python2.7/site-packages/nose/config.py", line 346, in configure
    self.plugins.configure(options, self)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 284, in configure
    cfg(options, config)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/usr/local/lib/python2.7/site-packages/nose/plugins/cover.py", line 155, in configure
    self.coverInstance.load()
  File "/usr/local/lib/python2.7/site-packages/coverage/control.py", line 396, in load
    self._data.read()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 741, in read
    with self._connect():       # TODO: doesn't look right
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 296, in _connect
    self._open_db()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 264, in _open_db
    self._read_db()
  File "/usr/local/lib/python2.7/site-packages/coverage/sqldata.py", line 274, in _read_db
    self._filename, exc
coverage.misc.CoverageException: Data file '/usr/src/app/scout-report-generator/.coverage' doesn't seem to be a coverage data file: Couldn't use data file '/usr/src/app/scout-report-generator/.coverage': no such table: coverage_schema

After changing python source directory ownership & no .coverage file:

/scout-report-generator # chown -R jenkins:ci /usr/src/app/scout-report-generator

0a7861647e52:/usr/src/app/scout-report-generator$
0a7861647e52:/usr/src/app/scout-report-generator$ rm .coverage
0a7861647e52:/usr/src/app/scout-report-generator$ cd /usr/src/app/scout-report-generator && nosetests --with-coverage --with-xunit --xunit-file=/scout-report-generator/generated//test.xml tests/test-rules-processingengine.py | tee /scout-report-generator/generated//test.txt;
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-default-security-group-with-rules.json]: "Non-empty rulesets for default security groups"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-lowercase-required.json]: "Password policy lacks lowercase requirement"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-all-ports-to-all.json]: "All ports open to all"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-whitelists-aws-ip-from-banned-region.json]: "Security group whitelists AWS IPs outside the USA"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-expiration-threshold.json]: "Passwords expire after 90 days"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-port-to-all.json]: "TCP port open to all"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-default-security-group-in-use.json]: "Default security groups in use"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-no-uppercase-required.json]: "Password policy lacsk uppercase requirement"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-reuse-enabled.json]: "Password reuse enabled"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-known-port-to-all.json]: "MySQL port open to all"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-all-ports-to-self.json]: "Unrestricted network traffic within security group"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-no-expiration.json]: "Password expiration disabled"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-whitelists-aws.json]: "Security group whitelists AWS CIDRs"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-no-number-required.json]: "Password policy lacks number requirement"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-minimum-length.json]: "Minimum password length too short"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-port-range.json]: "Use of port ranges"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-no-symbol-required.json]: "Password policy lacks symbol requirement"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing EC2 rule[ec2-security-group-opens-all-ports.json]: "All ports open"
Loading ruleset /usr/src/app/scout-report-generator/tests/data/ruleset-test.json
Processing IAM rule[iam-password-policy-no-lowercase-required.json]: "Password policy lacks lowercase requirement"
Existing  rules: 105
Processed rules: 19
Verified  rules: 19
.
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
AWSScout2/__init__.py                    13      0   100%
AWSScout2/configs/__init__.py             1      0   100%
AWSScout2/configs/browser.py             61     29    52%
AWSScout2/rules/__init__.py               7      0   100%
AWSScout2/rules/processingengine.py      46      8    83%
AWSScout2/rules/rule.py                  87     15    83%
AWSScout2/rules/rule_definition.py       61     27    56%
AWSScout2/rules/ruleset.py              126     64    49%
AWSScout2/rules/utils.py                 90     19    79%
AWSScout2/utils.py                       14      5    64%
---------------------------------------------------------
TOTAL                                   506    167    67%
----------------------------------------------------------------------
Ran 1 test in 2.773s

OK

Conclusion:

This same error / symptom may happen inside a docker container (or any *nix system) where the current working directory is not owned & writable by the test user. This can especially be likely in cases such as CI/CD pipelines, and Jenkins building and testing within Docker containers.

The coverage module seems to want no blank .coverage file pre-existing (or perhaps at least a valid sqlite3 file), where this file and parent directory are writable by the current user.

@nedbat

This comment has been minimized.

Copy link
Owner

@nedbat nedbat commented Jan 15, 2020

@trinitronx The underlying error is different in your case, with a different cause. I don't know if there is something coverage.py can do to make that situation better, but we should discuss it in a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.