Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross platform file caching #44

Closed
sqlalchemy-bot opened this issue Nov 4, 2013 · 21 comments
Closed

Cross platform file caching #44

sqlalchemy-bot opened this issue Nov 4, 2013 · 21 comments
Labels

Comments

@sqlalchemy-bot
Copy link

Migrated issue, originally created by Anonymous

I'm looking for a cross platform file caching backend as the dogpile.cache.backends.file requires fcntl which is Unix.
In my case I'll have a single thread doing rw operations so I don't mind not having a rw lock.

  • It should be mentioned in the docs that the file backend is not compatible with Windows in a more explicit way
  • The file backend should be cross platform
  • The file backend should have an optional rw lock
@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

I'm confused. The file backend already uses an RW lock, note:

def acquire_read_lock(self, wait):
    return self._acquire(wait, os.O_RDONLY, fcntl.LOCK_SH)

def acquire_write_lock(self, wait):
    return self._acquire(wait, os.O_WRONLY, fcntl.LOCK_EX)

as far as how to make it cross platform, I've no idea how a read/write file lock would be implemented on windows (I know how to do it with threading.condition, if you want an option to be present that works in only one process).

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

What about something using pywin32 or portalocker?

I know it has a rw lock but AFAIK it is not optional.

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

right, was going to say earlier, dependencies on win32 type stuff I can't specifically have directly in dogpile.cache, so I'm open to having the FileLock factory be pluggable, e.g. as an argument to DBMBackend. feel free to send a pullreq.

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

Then don't depend on it, just have a Windows specific FileLocker and use it if os.platform == 'nt'

What do you think about that?

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

we already have lock mechanisms as pluggable. I'd rather not endorse a particular win32 system as I don't have the resources to test or support it, portalocker hasn't had a commit in 3 years (does it work at scale? does it work on the latest windows? etc., I have no idea). I commonly create hooks so that features I can't support can be implemented and maintained by others on the outside.

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

Well, I don't have the resources either so I hope a contributor will step up.
For reference, upstream issue is Diaoul/subliminal#255

@sqlalchemy-bot
Copy link
Author

Jeroen () wrote:

I'm running subliminal and made some changes to overcome the problems in windows. I do not know if it solves all the problems or if it even works, but it fixed my errors :D

  1. install https://github.com/WoLpH/portalocker (windows replacement fcntl)
  2. install pywin32 for required libraries.
  3. change dogpile.cache-0.5.1-py2.7.egg\dogpile\cache\backends\file.py to:
#!python

"""
File Backends
------------------

Provides backends that deal with local filesystem access.

"""

from __future__ import with_statement
from dogpile.cache.api import CacheBackend, NO_VALUE
from contextlib import contextmanager
from dogpile.cache import compat
from dogpile.cache import util
import os

import portalocker
"""import fcntl"""

__all__ = 'DBMBackend', 'FileLock'

class DBMBackend(CacheBackend):
    """A file-backend using a dbm file to store keys.

    Basic usage::

        from dogpile.cache import make_region

        region = make_region().configure(
            'dogpile.cache.dbm',
            expiration_time = 3600,
            arguments = {
                "filename":"/path/to/cachefile.dbm"
            }
        )

    DBM access is provided using the Python ``anydbm`` module,
    which selects a platform-specific dbm module to use.
    This may be made to be more configurable in a future
    release.

    Note that different dbm modules have different behaviors.
    Some dbm implementations handle their own locking, while
    others don't.  The :class:`.DBMBackend` uses a read/write
    lockfile by default, which is compatible even with those
    DBM implementations for which this is unnecessary,
    though the behavior can be disabled.

    The DBM backend by default makes use of two lockfiles.
    One is in order to protect the DBM file itself from
    concurrent writes, the other is to coordinate
    value creation (i.e. the dogpile lock).  By default,
    these lockfiles use the ``flock()`` system call
    for locking; this is only available on Unix
    platforms.

    Currently, the dogpile lock is against the entire
    DBM file, not per key.   This means there can
    only be one "creator" job running at a time
    per dbm file.

    A future improvement might be to have the dogpile lock
    using a filename that's based on a modulus of the key.
    Locking on a filename that uniquely corresponds to the
    key is problematic, since it's not generally safe to
    delete lockfiles as the application runs, implying an
    unlimited number of key-based files would need to be
    created and never deleted.

    Parameters to the ``arguments`` dictionary are
    below.

    :param filename: path of the filename in which to
     create the DBM file.  Note that some dbm backends
     will change this name to have additional suffixes.
    :param rw_lockfile: the name of the file to use for
     read/write locking.  If omitted, a default name
     is used by appending the suffix ".rw.lock" to the
     DBM filename.  If False, then no lock is used.
    :param dogpile_lockfile: the name of the file to use
     for value creation, i.e. the dogpile lock.  If
     omitted, a default name is used by appending the
     suffix ".dogpile.lock" to the DBM filename. If
     False, then dogpile.cache uses the default dogpile
     lock, a plain thread-based mutex.


    """
    def __init__(self, arguments):
        self.filename = os.path.abspath(
                            os.path.normpath(arguments['filename'])
                        )
        dir_, filename = os.path.split(self.filename)

        self._rw_lock = self._init_lock(
                                arguments.get('rw_lockfile'),
                                ".rw.lock", dir_, filename)
        self._dogpile_lock = self._init_lock(
                                arguments.get('dogpile_lockfile'),
                                ".dogpile.lock",
                                dir_, filename,
                                util.KeyReentrantMutex.factory)

        # TODO: make this configurable
        if compat.py3k:
            import dbm
        else:
            import anydbm as dbm
        self.dbmmodule = dbm
        self._init_dbm_file()

    def _init_lock(self, argument, suffix, basedir, basefile, wrapper=None):
        if argument is None:
            lock = FileLock(os.path.join(basedir, basefile + suffix))
        elif argument is not False:
            lock = FileLock(
                        os.path.abspath(
                            os.path.normpath(argument)
                        ))
        else:
            return None
        if wrapper:
            lock = wrapper(lock)
        return lock

    def _init_dbm_file(self):
        exists = os.access(self.filename, os.F_OK)
        if not exists:
            for ext in ('db', 'dat', 'pag', 'dir'):
                if os.access(self.filename + os.extsep + ext, os.F_OK):
                    exists = True
                    break
        if not exists:
            fh = self.dbmmodule.open(self.filename, 'c')
            fh.close()

    def get_mutex(self, key):
        # using one dogpile for the whole file.   Other ways
        # to do this might be using a set of files keyed to a
        # hash/modulus of the key.   the issue is it's never
        # really safe to delete a lockfile as this can
        # break other processes trying to get at the file
        # at the same time - so handling unlimited keys
        # can't imply unlimited filenames
        if self._dogpile_lock:
            return self._dogpile_lock(key)
        else:
            return None

    @contextmanager
    def _use_rw_lock(self, write):
        if self._rw_lock is None:
            yield
        elif write:
            with self._rw_lock.write():
                yield
        else:
            with self._rw_lock.read():
                yield

    @contextmanager
    def _dbm_file(self, write):
        with self._use_rw_lock(write):
            dbm = self.dbmmodule.open(self.filename,
                                "w" if write else "r")
            yield dbm
            dbm.close()

    def get(self, key):
        with self._dbm_file(False) as dbm:
            if hasattr(dbm, 'get'):
                value = dbm.get(key, NO_VALUE)
            else:
                # gdbm objects lack a .get method
                try:
                    value = dbm[key]
                except KeyError:
                    value = NO_VALUE
            if value is not NO_VALUE:
                value = compat.pickle.loads(value)
            return value

    def get_multi(self, keys):
        return [self.get(key) for key in keys]

    def set(self, key, value):
        with self._dbm_file(True) as dbm:
            dbm[key] = compat.pickle.dumps(value)

    def set_multi(self, mapping):
        with self._dbm_file(True) as dbm:
            for key,value in mapping.items():
                dbm[key] = compat.pickle.dumps(value)

    def delete(self, key):
        with self._dbm_file(True) as dbm:
            try:
                del dbm[key]
            except KeyError:
                pass

    def delete_multi(self, keys):
        with self._dbm_file(True) as dbm:
            for key in keys:
                try:
                    del dbm[key]
                except KeyError:
                    pass

class FileLock(object):
    """Use lockfiles to coordinate read/write access to a file.

    Only works on Unix systems, using
    `fcntl.flock() <http://docs.python.org/library/fcntl.html>`_.

    """

    def __init__(self, filename):
        self._filedescriptor = compat.threading.local()
        self.filename = filename

    def acquire(self, wait=True):
        return self.acquire_write_lock(wait)

    def release(self):
        self.release_write_lock()

    @property
    def is_open(self):
        return hasattr(self._filedescriptor, 'fileno')

    @contextmanager
    def read(self):
        self.acquire_read_lock(True)
        try:
            yield
        finally:
            self.release_read_lock()

    @contextmanager
    def write(self):
        self.acquire_write_lock(True)
        try:
            yield
        finally:
            self.release_write_lock()

    def acquire_read_lock(self, wait):
        return self._acquire(wait, os.O_RDONLY, portalocker.LOCK_SH)

    def acquire_write_lock(self, wait):
        return self._acquire(wait, os.O_WRONLY, portalocker.LOCK_EX)

    def release_read_lock(self):
        self._release()

    def release_write_lock(self):
        self._release()

    def _acquire(self, wait, wrflag, lockflag):
        wrflag |= os.O_CREAT
        fileno = os.open(self.filename, wrflag)
        try:
            if not wait:
                lockflag |= portalocker.LOCK_NB
            portalocker.lock(fileno, lockflag)
        except IOError:
            os.close(fileno)
            if not wait:
                # this is typically
                # "[Errno 35] Resource temporarily unavailable",
                # because of LOCK_NB
                return False
            else:
                raise
        else:
            self._filedescriptor.fileno = fileno
            return True

    def _release(self):
        try:
            fileno = self._filedescriptor.fileno
        except AttributeError:
            return
        else:
            portalocker.unlock(fileno)
            os.close(fileno)
            del self._filedescriptor.fileno

PS This is my first time installing/editing python, entering a possible bugfix. sorry for the inconvenience of not testing it and not providing a .patch.

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

take a look at 34af3b7 - you can take your FileLock approach and subclass AbstractFileLock with it, and drop the lock in. An example of usage is provided. I didn't actually test dropping in an alternate lock implemenation, so if there's any problems with my checkin, reopen

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (zzzeek):

  • changed status to closed

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

I don't understand why it doesn't bother you to depend on fnctl but it does bother you to depend on some win32 stuff.
I wished you could make the default implementation cross platform.

Anyway, thanks for implementing a clean workaround :) Do you plan on releasing soon?

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

if there were a windows-compatible file locking system in the Python std lib, I'd use that, but there's not, which IMO says a lot about the feasibility of locking files on windows.

portalocker is at version 0.3, says it's "beta", and hasn't had a commit in three years. It apparently has one ticket just opened two days ago that the dev has responded to, so that's good, but it seems to be a very basic issue. So i really have very little indication that portalocker is a widely used and accepted system that I can confidently import in my library knowing that it's the best possible approach. On windows, you probably get much better performance using threading conditions, which is why I also offer that as an option. Windows just doesn't have a single, widely accepted approach for this kind of thing and I have no interest in endorsing one of them.

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

Thanks for the clarification.
Do you intend to make a release soon with this feature?

@sqlalchemy-bot
Copy link
Author

Antoine Bertin (diaoul) wrote:

I can't test but the fact that there's still "import fcntl" will still throw "ImportError: No module named fcntl" under windows.
You should put this under "if os.name == 'posix'"

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

good idea

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

also why can't you test? would be a much better idea if you could before any release happens

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

7c34b2b

@sqlalchemy-bot
Copy link
Author

Michael Bayer (zzzeek) wrote:

all tests have been fixed to work on windows in c1de009, ensured that the condition-based lock example and the DBM backend passes tests on windows in d40c41c, 0.5.2 is released on pypi.

@sqlalchemy-bot
Copy link
Author

morpheus65535 (morpheus65535) wrote:

This error message seems to be back in version 0.6.4. Could it still be fixed? Thanks!

@razsteinmetz
Copy link

Hi,
Any chance this will ever work on windows?

@zzzeek
Copy link
Member

zzzeek commented Aug 17, 2021

Hi,
Any chance this will ever work on windows?

I've just re-read this issue and it seems to indicate that everything works on windows. as described in #44 (comment) and in the docs at https://dogpilecache.sqlalchemy.org/en/latest/api.html#dogpile.cache.backends.file.DBMBackend you would use your win32 locking library of choice and plug it in using the lock_factory parameter.

@razsteinmetz
Copy link

I agree, but I would suggest to mention it in the documentation (or at least in the opening example) - it will save the time to figure it out from the error message. Just my 10cents....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants