Bump to v3? #60

Closed
6 tasks done
grantjenks opened this issue Dec 11, 2017 · 8 comments

Comments

@grantjenks (Owner) commented Dec 11, 2017

Upcoming features:

@elistevens (Contributor)

See also: #61

Thanks!

@grantjenks (Owner, Author)

#61 merged and updated at c8aecac. I changed None to diskcache.UNKNOWN since None is a valid key/value. I also modified the test to use the first 32 characters of the sha256 hash and added tests for the fanout cache.
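
For context, a minimal sketch of the two ideas mentioned above (the names here are illustrative, not diskcache's internals): a distinct sentinel object stands in for "no result" so that None stays usable as an ordinary key or value, and the hash check truncates the sha256 hex digest to its first 32 characters.

    import hashlib

    # Illustrative sentinel: a unique object that cannot be confused with any
    # stored value, unlike None, which is itself a valid key/value in the cache.
    UNKNOWN = object()

    store = {'key': None}  # None stored on purpose

    value = store.get('key', UNKNOWN)
    if value is UNKNOWN:
        print('missing')
    else:
        print('present:', value)  # prints "present: None"

    # First 32 characters of the sha256 hex digest, as in the updated test.
    digest = hashlib.sha256(b'some-key').hexdigest()[:32]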

@grantjenks (Owner, Author)

@elistevens when you do "rsync" backups, take careful note of the switches used. The test_rsync test was failing intermittently when using "rsync" without "--checksum". As I understand it, rsync checks file sizes and modification times to heuristically determine which files have been modified (this is the default behavior). For a small database like SQLite's, which is edited quickly and backed by block-oriented storage, it's possible to modify the database yet retain the same size and modification time.

You can pass "--checksum" to tell rsync not to use its default heuristic and to compare checksums instead. This way, you still benefit from incremental transfers. But if you were exceedingly paranoid, you might not trust the checksums and choose "--ignore-times", which transfers all files and behaves like a plain copy.

The switches I'm using in testing are:

rsync -a --checksum --delete --stats source destination

The purpose of each switch:

  • "-a" -- Copy everything.
  • "--checksum" -- Detect file changes using a checksum.
  • "--delete" -- Delete files in destination that are not in source.
  • "--stats" -- View the total number of bytes transferred; useful to see it's working incrementally.

@elistevens (Contributor)

I'm surprised that the DB could be modified, but the modification time on the file not be updated.

For our use case, 99% of the content won't have been changed on a day-to-day basis, so our nightly backups will probably do something like using time+size for rsync, and then making sure the DB has been copied every time.

Thanks for letting me know there might be issues there.

@grantjenks (Owner, Author)

It's not so much that the modification time is not updated but that the resolution of the modification time is less than necessary. I develop on a MacBook Pro with an OS X Extended filesystem. According to Wikipedia and an Ars Technica article (both cited by a Stack Overflow answer), the resolution of the modification time on HFS+ file systems is 1 second. I think it's easy to imagine in testing that multiple modifications to the database could occur within the same second and the database could remain the same size.
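
A quick way to see the effect, assuming a filesystem with 1-second timestamp resolution such as HFS+ (the file name below is hypothetical, standing in for the SQLite database file): two same-size writes within the same second leave the size/mtime pair that rsync's default heuristic inspects unchanged.

    import os

    path = 'example.db'

    with open(path, 'wb') as f:
        f.write(b'A' * 4096)          # first write
    before = os.stat(path)

    with open(path, 'wb') as f:
        f.write(b'B' * 4096)          # same size, different contents
    after = os.stat(path)

    # On a 1-second-resolution filesystem, both checks can report "no change".
    print(before.st_size == after.st_size)              # True
    print(int(before.st_mtime) == int(after.st_mtime))  # likely True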

Considering that you will likely do nightly backups, I doubt it could ever be an issue. I just want you to be aware that rsync uses heuristics (like file size and modification time) and those may be inaccurate.

You may also want to use the "-z" option to compress the transfer over rsync.

@elistevens (Contributor)

Ahh, that makes much more sense. Yeah, that won't be an issue for our use case. Great!

@grantjenks (Owner, Author)

V3 tagged in git and deployed to PyPI. I'm waiting now to see Travis and AppVeyor come back green.

The new diskcache is faster than the old one. Yaay! Always a good sign.

I added test_core.py:test_custom_eviction_policy for your custom eviction scenario.

I also think the new design (new-style format strings) would allow you to update the expire_time on every get/incr. Something like:

    dc.EVICTION_POLICY['lru-gt-1s'] = {
        'init': None,
        'get': 'expire_time = {now} + 90 * 24 * 60 * 60',
        'cull': None,
    }

And then just use cache.expire() in your nightly job. I haven't tested that yet, but you might want to look into it.

Also note that with your custom eviction policy, you can exceed the cache's size limit. If the culling query returns no rows then the culling stops regardless of the cache's volume.
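
An untested sketch (per the note above) of how that might fit together, assuming the cache accepts the custom policy name registered above via the eviction_policy setting; the directory path is illustrative:

    import diskcache as dc

    # Select the custom policy registered above when creating the cache.
    cache = dc.Cache('/tmp/example-cache', eviction_policy='lru-gt-1s')

    cache.set('key', 'value')
    cache.get('key')  # intended to refresh expire_time to now + 90 days

    # Nightly job: remove items whose refreshed expire_time has lapsed.
    cache.expire()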

@grantjenks (Owner, Author)

All green in Travis and AppVeyor. I think that meets all the v3 milestones.
