
Different HDFStores in multiple threads crashes Python #2397

Closed
pag opened this issue Nov 30, 2012 · 6 comments

@pag

commented Nov 30, 2012

import threading
import pandas as pd
import time

def foo():
    store = pd.HDFStore('my_hdf_file.h5')
    store['foo']
    store.close()


def main():
    threading.Thread(target=foo).start()
    threading.Thread(target=foo).start()
    time.sleep(2)

if __name__ == '__main__':
    main()

This crashes Python for me (Windows 7, using PyTables 2.4.0 and pandas 0.9.1 from http://www.lfd.uci.edu/~gohlke/pythonlibs/). I can't easily get a stack trace, but I can try harder if necessary. Simply using tables.openFile and reading a few values seems to work fine.

@jreback

Contributor

commented Nov 30, 2012

The underlying storage mechanism, PyTables, is inherently not thread-safe for writes. HDFStore opens the store file in mode 'a' (append) by default, so this is trying to open two writers. Try opening in read mode:

store = pd.HDFStore('my_hdf_file.h5', mode = 'r')

I will add a note to the docs, as this is also a problem with multiprocessing (concurrent reads are OK, but writing and reading at the same time is a problem).

http://pl.digipedia.org/usenet/thread/16072/93/

@pag

Author

commented Nov 30, 2012

Sorry, my original example was a reader. The crash still happens with mode='r'.

@jreback

Contributor

commented Nov 30, 2012

I tried your example (after creating the h5 file) with mode='r' and got:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/code/arb/test/pytables-threading.py", line 8, in foo
    store.close()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 263, in close
    self.handle.close()
  File "/usr/local/lib/python2.7/site-packages/tables/file.py", line 2162, in close
    del _open_files[filename]
KeyError: 'my_hdf_file.h5'

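So in the PyTables layer, close() is trying to remove the file from its registry of open files, but the entry is already gone (presumably the other thread removed it first). This is a bug in PyTables; see the issue linked below. I guess it's not thread-safe even for reads.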

PyTables/PyTables#130

Using a with block doesn't help either:

from pandas.io.pytables import get_store
def foo():
    with get_store('my_hdf_file.h5', mode = 'r') as store:
        store['foo']
        store.close()

I would just avoid opening or using the file from multiple threads. I have found no issues using read-only access from multiple processes, however.

@jreback

Contributor

commented Dec 1, 2012

The following example works correctly. I think that if you open and close the store in the main thread, you can read concurrently in other threads without a problem (still avoid reading and writing in more than one thread, however).

import threading
import pandas as pd
import time

class Thread(threading.Thread):

    def __init__(self, store):
        threading.Thread.__init__(self)
        self.store = store

    def run(self):
        print(self.store['foo'])

def main():
    store = pd.HDFStore('my_hdf_file.h5')        
    t1 = Thread(store = store)
    t2 = Thread(store = store)
    t1.start()
    t2.start()
    time.sleep(2)
    t1.join()
    t2.join()
    store.close()

if __name__ == '__main__':
    store = pd.HDFStore('my_hdf_file.h5')
    store['foo'] = pd.Series(range(10))
    store.close()
    main()

@jreback

Contributor

commented Dec 6, 2012

@wesm

Member

commented Dec 6, 2012

We could potentially add locks to HDFStore at some point to prevent multiple threads from accessing the file at once.
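Until something like that exists, a user-side version of the locking idea might look roughly like the sketch below. It is not part of pandas: `locked_read` and `_hdf_lock` are hypothetical names, and `open_store` is assumed to be any callable that returns an object supporting `store[key]` and `store.close()`, for example `lambda p: pd.HDFStore(p, mode='r')`.

```python
import threading

# One process-wide lock: only one thread at a time may open, read, or close.
_hdf_lock = threading.Lock()


def locked_read(open_store, path, key):
    """Open the store, read one key, and close it while holding the lock.

    open_store is a caller-supplied callable (an assumption of this sketch),
    e.g. lambda p: pd.HDFStore(p, mode='r').
    """
    with _hdf_lock:
        store = open_store(path)
        try:
            return store[key]
        finally:
            # Close inside the lock so no other thread overlaps with it.
            store.close()
```

Each thread in the original example would then call something like `locked_read(lambda p: pd.HDFStore(p, mode='r'), 'my_hdf_file.h5', 'foo')` instead of opening the store directly. This serializes all access, so it trades concurrency for safety, and it only protects threads within a single process.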
