Refactor storage operations into separate Backend classes #348
Awesome! Just FYI I have what will probably be a long day of work ahead and
some plans for this evening, so I probably won’t get to review this until
Friday evening, but I’m looking forward to it!
On Thu, Oct 15, 2020 at 1:31 AM PelleK <***@***.***> wrote:
Following the discussion in #253 and #325, I've created a first iteration of what a Backend interface could look like and how the current file storage operations may be refactored into this interface. It is based on the following principles:
- app.py talks only to core.py with regard to package operations
- at configuration time, a Backend implementation is chosen and created for the lifetime of the configured app
- core.py proxies requests for packages to this Backend()
- the Backend interface/API is defined through three things:
  - methods that an implementation must implement
  - methods that an implementation may override if it knows better than the defaults
  - the PkgFile class, which is (or should be) the main carrier of data
- where possible, implementation details must be hidden from concrete Backends to promote extensibility
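A minimal sketch of these principles, assuming a much-reduced `PkgFile`: abstract methods are the ones a backend must implement, concrete methods are defaults a backend may override, and a core object proxies to the backend chosen once at configuration time. Only `get_all_packages` appears in this PR (as `core.get_all_packages()`); `find_project_packages`, `InMemoryBackend`, and `Core` are hypothetical names used for illustration.

```python
import abc
import typing as t


class PkgFile(t.NamedTuple):
    """Hypothetical minimal package record; the real PkgFile has more fields."""

    pkgname: str
    version: str
    fn: str


class Backend(abc.ABC):
    """Abstract methods must be implemented; concrete ones are overridable defaults."""

    @abc.abstractmethod
    def get_all_packages(self) -> t.Iterable[PkgFile]:
        """Must be implemented: yield every package the backend stores."""

    def find_project_packages(self, project: str) -> t.Iterable[PkgFile]:
        # Default built on get_all_packages(); a remote backend could override
        # this with a cheaper server-side lookup.
        return (p for p in self.get_all_packages() if p.pkgname == project)


class InMemoryBackend(Backend):
    """Hypothetical backend used only to exercise the interface."""

    def __init__(self, packages: t.List[PkgFile]) -> None:
        self._packages = packages

    def get_all_packages(self) -> t.Iterable[PkgFile]:
        return iter(self._packages)


class Core:
    """Hypothetical stand-in for core.py: the app talks only to this object,
    which proxies to the backend chosen at configuration time."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def get_all_packages(self) -> t.List[PkgFile]:
        return list(self.backend.get_all_packages())

    def find_project_packages(self, project: str) -> t.List[PkgFile]:
        return list(self.backend.find_project_packages(project))
```

Because `Backend` is an ABC, forgetting to implement `get_all_packages` fails at instantiation time rather than at the first request.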
Other things I've done in this PR:
- I've tried to talk about packages and projects rather than files and prefixes, since these are the domain terms PEP 503 uses, and imho they're also clearer
- Better testability of the CacheManager (no more race conditions when watchdog is installed during testing)
- Cleaned up some more Python 2 code
- Started moving away from os.path and py.path in favour of pathlib
Furthermore, I've created a plugin.py with a sample of what I think a plugin system could look like. This sample assumes we use argparse and allows for the extension of CLI arguments that a plugin may need. I think the actual implementation of such a plugin system is beyond the scope of this PR, but I've used it as a target for the Backend refactoring. If requested, I'll remove it from this PR.
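As an illustration of argparse-based plugin arguments, here is a sketch in which each plugin registers its own CLI options on a shared parser. The `BackendPlugin` and `update_parser` names and the `simple-dir` label are assumptions, not the contents of the PR's `plugin.py`.

```python
import argparse
import typing as t

# Hypothetical default, mirroring the DEFAULT_PACKAGE_DIRECTORIES discussed later.
DEFAULT_PACKAGE_DIRECTORIES = ["~/packages"]


class BackendPlugin:
    """Hypothetical plugin interface: a plugin may add the CLI arguments it needs."""

    name: str = ""

    def update_parser(self, parser: argparse.ArgumentParser) -> None:
        """Add plugin-specific arguments; the default adds none."""


class FileBackendPlugin(BackendPlugin):
    name = "simple-dir"

    def update_parser(self, parser: argparse.ArgumentParser) -> None:
        # Grouping keeps the plugin's options together in --help output.
        group = parser.add_argument_group(f"{self.name} backend")
        group.add_argument(
            "package_directory",
            nargs="*",
            default=DEFAULT_PACKAGE_DIRECTORIES,
            help="directories to serve packages from",
        )


def build_parser(plugins: t.Sequence[BackendPlugin]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pypi-server")
    parser.add_argument("--port", type=int, default=8080)  # a global option
    for plugin in plugins:
        plugin.update_parser(parser)
    return parser
```

With this design the global parser never needs to know which options a backend requires; loading a plugin is enough to extend the CLI.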
The following things still need to be done / discussed. These can be part of this PR or moved into their own, separate PRs:
- Simplify the PkgFile class. It currently consists of a number of attributes that don't necessarily belong with it, and not all attributes are aptly named (imho). I would like to minimize the scope of PkgFile so that its only concern is being a data carrier between the app and the backends, and make its use clearer.
- Add a PkgFile.metadata that backend implementations may use to store custom data for packages. For example, the current PkgFile.root attribute is an implementation detail of the file storage backends, and other Backend implementations should not be bothered by it.
- Use pathlib wherever possible. This may also result in fewer attributes for PkgFile, since some things may be contained in a single Path object instead of multiple strings.
- Improve testing of the CacheManager.
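One possible shape for a slimmed-down `PkgFile`, sketched as a dataclass: a single `Path` replaces the separate filename strings, `relfn_unix` is derived on demand, and a free-form `metadata` dict holds backend-private data such as a storage root. The field names here are assumptions for illustration.

```python
import dataclasses
import typing as t
from pathlib import Path


@dataclasses.dataclass
class PkgFile:
    """Hypothetical minimal data carrier between the app and the backends."""

    pkgname: str
    version: str
    relpath: Path  # one Path instead of separate fn/relfn/relfn_unix strings
    digest: t.Optional[str] = None
    # Backend-private data, e.g. the storage root used by a file backend.
    metadata: t.Dict[str, t.Any] = dataclasses.field(default_factory=dict)

    @property
    def relfn_unix(self) -> str:
        # Derived from the Path on demand rather than stored separately.
        return self.relpath.as_posix()
```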
------------------------------
You can view, comment on, or merge this pull request online at:
#348
Commit Summary
- move some functions around in preparation for backend module
- rename pkg_utils to pkg_helpers to prevent confusion with stdlib
pkgutil
- further implement the current filestorage as simple file backend
- rename prefix to project, since that's more descriptive
- add digester func as attribute to pkgfile
- WIP caching backend
- WIP make cache better testable
- better testability of cache
- WIP file backends as plugin
- Merge branch 'master' of github.com:elfjes/pypiserver into
backend_interface
File Changes
- *M* pypiserver/__init__.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-eb39dcb6c5edb691cf2c59ec607eba8aaca08cc8c4bd5f49f6b4354f31ba5d73>
(3)
- *M* pypiserver/__main__.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-7e55e15c67273f0925ecd703161471f0ebc3562841787e4d4b336cfc950a7e6d>
(4)
- *M* pypiserver/_app.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-6485df1d3c02eb24b709ff35b4bce34060ecc909153df3b51c43c4a592693fed>
(111)
- *A* pypiserver/backend.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-4626005dbcd09690415b2d2f8a1b1535fab767ecb8da813f4402d7f0dce0380e>
(224)
- *M* pypiserver/cache.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-7a67bdfe8ae3dc5180bf7cd3b79be6bcf1801d0377c3c097e52f4550885c12c6>
(42)
- *M* pypiserver/core.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-dc372a5afa8581783c82548716c3539c6bdef35475464a6cd5d11e1037454ebf>
(396)
- *M* pypiserver/manage.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-fbd074c517ad1c416858c39cae72b25c8efbb8c16b2a01f0a9b127cc7a1f8152>
(51)
- *A* pypiserver/pkg_helpers.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-1894d0e278f38176f9be68ed996f81409c813b5fb917c0044d00ebeef9c10698>
(107)
- *A* pypiserver/plugin.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-032e81e4a9a172bc2858d71558fe76081f31c0b422375696a512a11a5ac1970b>
(41)
- *M* tests/test_app.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-a67cb1853203a6f1956991a9d9881d231c4d43557f5baffd45cc672a87e41cc6>
(45)
- *M* tests/test_core.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-938de46e643ab091cff6a7c23e4e752e8907e2266f47889d988923352f7a1058>
(68)
- *M* tests/test_main.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-41d180f6f7233d175c2dfe36c45c4ea80eed4754fa7ed4eed46ccde1b2a778c7>
(8)
- *M* tests/test_manage.py
<https://github.com/pypiserver/pypiserver/pull/348/files#diff-7e8346b0ea680b9921d653a97988fed4c0d2b24fdb5eba7d9132d2a8e32dd3dd>
(38)
Patch Links:
- https://github.com/pypiserver/pypiserver/pull/348.patch
- https://github.com/pypiserver/pypiserver/pull/348.diff
Sure, no worries. Looking forward to your feedback 😄
Got a chance to give this a quick overview this evening, and I really like the direction! I have an in-progress branch to replace all usage of the old config with the stuff from #339, with all tests passing. You can see that here. It seems like that could mesh really well with the approach taken here. I think we would update the

I've got no major concerns with what I've seen so far, but I'll be sure to make time to give this a more thorough review over the course of this week. Regarding the things you mentioned as potentially needing more work:
I agree with this goal! I don't feel strongly either way on whether it should be a part of this PR, but I definitely think it's worth exploring.
Yeah, and having some access to metadata will also give us a route towards e.g. fully supporting PEP 503 and the
I've been working in this direction in my branch as well, so definitely no issues with seeing more of it.
Yeah, the existing watchdog cache had really minimal tests, so anything here is an improvement.

Regarding getting both of our streams of work going harmoniously, we can either get this merged and then I can rebase and update on top of it, or vice versa. I'll probably go ahead and open a PR with that branch tonight or tomorrow, just so that we can better see how they might work together, but there's no rush to get that one in, and I'm happy to do the work of rebasing if we get this one in first.
Found some time this evening for a first pass!
pypiserver/core.py

def with_digester(func: t.Callable[..., t.Iterable[PkgFile]]):
To keep the decorator from overriding the underlying type, you can use a generic like:
DigesterFunc = t.TypeVar("DigesterFunc", bound=t.Callable[..., t.Iterable[PkgFile]])
def with_digester(func: DigesterFunc) -> DigesterFunc:
    ...
You may need a `# type: ignore` where you're returning your wrapper function inside the decorator, but by telling mypy that you're returning the same type that was passed into the function, you allow it to infer that the resulting decorated function takes the same arguments as the original.
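A runnable sketch of that suggestion, using `t.cast` instead of `# type: ignore` to hand the wrapper back as `DigesterFunc`. The `PkgFile` stand-in and the `digest_name` digester (which hashes the file name so the example needs no files on disk) are hypothetical.

```python
import functools
import hashlib
import typing as t


class PkgFile:
    """Hypothetical stand-in for pypiserver's PkgFile (illustration only)."""

    def __init__(self, fn: str) -> None:
        self.fn = fn
        self.digester: t.Optional[t.Callable[["PkgFile"], str]] = None


def digest_name(pkg: PkgFile) -> str:
    # Hypothetical digester: hashes the file *name*, not the file contents.
    return "sha256=" + hashlib.sha256(pkg.fn.encode()).hexdigest()


DigesterFunc = t.TypeVar("DigesterFunc", bound=t.Callable[..., t.Iterable[PkgFile]])


def with_digester(func: DigesterFunc) -> DigesterFunc:
    """Attach a digester to every PkgFile yielded by `func`."""

    @functools.wraps(func)
    def wrapper(*args: t.Any, **kwargs: t.Any) -> t.Iterator[PkgFile]:
        for pkg in func(*args, **kwargs):
            pkg.digester = digest_name
            yield pkg

    # The cast tells mypy the decorated function keeps the original signature.
    return t.cast(DigesterFunc, wrapper)


@with_digester
def list_packages(names: t.List[str]) -> t.Iterable[PkgFile]:
    return (PkgFile(n) for n in names)
```

mypy now sees `list_packages` as taking `names: List[str]`, so a call with the wrong argument type is still reported.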
good idea! :)
tests/test_app.py

# @pytest.mark.xfail(
#     ENABLE_CACHING, reason="race condition when caching is enabled"
# )
dead code?
good catch, will remove
pypiserver/_app.py
@@ -104,7 +91,7 @@ def root():
    fp = request.custom_fullpath

    try:
-        numpkgs = len(list(packages()))
+        numpkgs = len(list(core.get_all_packages()))
It would be nice if we added a `package_count` or similar method to the backend so that, for example, we wouldn't actually have to parse all the files into `PkgFile` instances in order to get their count, and could instead just nab all files whose names match a regex or something. It would also be useful with imagined remote backends, where pulling all the files might be a problem.
A `package_count` method sounds good, but I'm not sure about adding "query" or filter functionality. This would require additional complexity from each backend (including said imagined ones) without adding apparent value. What would be a use case for having such filtering functionality in a `package_count` method?
Sorry, I just meant a count without any custom filtering, but I expressed that poorly. Where I was talking about a regex, I essentially just meant "count every file that looks like a package based on the filename"
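A sketch of such an unfiltered count for a file backend, assuming "looks like a package" means a filename ending in one of a few common archive extensions (the real list of accepted extensions may differ). `FileBackendSketch` and `PACKAGE_RE` are hypothetical names.

```python
import re
import tempfile
import typing as t
from pathlib import Path

# Assumption: a small set of package extensions; the real backend may accept more.
PACKAGE_RE = re.compile(r".+\.(tar\.gz|tar\.bz2|zip|whl|egg)$")


class FileBackendSketch:
    """Hypothetical file backend reduced to a package_count() method."""

    def __init__(self, roots: t.List[Path]) -> None:
        self.roots = roots

    def package_count(self) -> int:
        # Count matching filenames under every root, recursively,
        # without parsing anything into PkgFile objects.
        return sum(
            1
            for root in self.roots
            for path in root.rglob("*")
            if path.is_file() and PACKAGE_RE.match(path.name)
        )


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ["foo-1.0.tar.gz", "sub/bar-2.0-py3-none-any.whl", "README.txt"]:
            (root / name).touch()
        return FileBackendSketch([root]).package_count()
```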
pypiserver/_app.py
    links = [
-        (f.relfn_unix, urljoin(fp, f.fname_and_hash(config.hash_algo)))
-        for f in files
+        (pkg.relfn_unix, urljoin(fp, pkg.fname_and_hash)) for pkg in packages
    ]
Could we try replacing this list comprehension (`[]`) with a generator expression (`()`)? We definitely need to allocate a list for sorting, but here I think we may be able to be lazy and just pass a generator into the template call, which should save some RAM on systems handling a lot of packages.
Yeah sure :)
Good catch on the `fname_and_hash`. Apparently no test caught that.
pypiserver/backend.py
)


def as_file(fh: t.BinaryIO, destination: PathLike):
Bikeshed: `as_file()` or `to_file()` feels to me like the function will be returning something file-like. What about `write_file()`?
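A sketch of what a `write_file()` helper could look like, copying the uploaded stream to disk in chunks with `shutil.copyfileobj`. Creating missing parent directories is an assumption about the desired behaviour, not something the PR specifies.

```python
import io
import os
import shutil
import tempfile
import typing as t
from pathlib import Path

PathLike = t.Union[str, os.PathLike]


def write_file(fh: t.BinaryIO, destination: PathLike) -> None:
    """Copy the binary stream `fh` into the file at `destination`."""
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)  # assumption: create missing dirs
    with open(dest, "wb") as out:
        shutil.copyfileobj(fh, out)  # streams in chunks, not all at once


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "foo" / "foo-1.0.tar.gz"
        write_file(io.BytesIO(b"payload"), target)
        return target.read_bytes()
```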
pypiserver/backend.py
        os.remove(pkg.fn)

    def exists(self, filename):
        # TODO: Also look in subdirectories?
I think currently we're recursing into root directories, so we may want to do so here too
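If `exists()` should recurse like the listing code, one way to do it is sketched below. `FileBackendSketch` is a hypothetical stand-in for `SimpleFileBackend`; comparing `path.name` instead of passing the filename to `rglob()` keeps glob metacharacters in a filename from being interpreted as a pattern.

```python
import tempfile
import typing as t
from pathlib import Path


class FileBackendSketch:
    """Hypothetical stand-in for SimpleFileBackend, reduced to exists()."""

    def __init__(self, roots: t.List[Path]) -> None:
        self.roots = roots

    def exists(self, filename: str) -> bool:
        # Walk each root recursively and compare plain file names.
        return any(
            path.name == filename
            for root in self.roots
            for path in root.rglob("*")
            if path.is_file()
        )


def demo() -> t.Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "foo").mkdir()
        (root / "foo" / "foo-1.0.tar.gz").touch()
        backend = FileBackendSketch([root])
        return backend.exists("foo-1.0.tar.gz"), backend.exists("bar-2.0.whl")
```

This walks every root on each call, which is the kind of cost a caching backend would be meant to avoid.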
pypiserver/backend.py

class CachingFileBackend(SimpleFileBackend):
    def __init__(
        self, config: Configuration, roots: t.List[PathLike], cache_manager
We may want to define either a base class or a protocol for the cache_manager
Alas, no `Protocol`s in Python < 3.8. I see that I put the type hint for `cache_manager` in the `__init__` assignment rather than the function signature, so I'm changing that.
Did you mean to add a sort of CacheManager interface/base implementation that different backends may consume and/or extend? This might be an idea, but I'm afraid that there won't be many common requirements between the different backends, since most of the logic will lie in cache invalidation, which is going to be very backend-specific.
Alas, no Protocols in python<3.8
This is true, but I think that if we want to use it, you can get it for earlier Python versions by installing `typing-extensions`.
Did you mean to add a sort of CacheManager interface/ base implementation that different backends may consume and/or extend
Yep, that was the idea, so that you could potentially have, like, a `WatchdogCachingFileBackend` and a `WhateverOtherCachingFileBackend`, so long as the managers share the same interface. It seems like all we need for that at the moment is `listdir()` and `digest_file()`, so it might not be too bad to specify an abstract interface.
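A sketch of that abstract interface as a `typing.Protocol`, with the `typing-extensions` fallback for Python < 3.8. The two method signatures are assumptions (the real cache manager may take extra arguments), and `DictCacheManager` is a hypothetical implementation that satisfies the protocol structurally without inheriting from it.

```python
import sys
import typing as t

if sys.version_info >= (3, 8):
    from typing import Protocol
else:  # older Pythons: provided by the typing-extensions package
    from typing_extensions import Protocol


class CacheManager(Protocol):
    """Hypothetical interface; the method signatures are assumptions."""

    def listdir(self, root: str) -> t.Iterable[str]:
        ...

    def digest_file(self, fpath: str, hash_algo: str) -> str:
        ...


class DictCacheManager:
    """Hypothetical manager that memoises forever (no invalidation).
    It does not inherit from CacheManager; it matches it structurally."""

    def __init__(
        self,
        listdir_impl: t.Callable[[str], t.List[str]],
        digest_impl: t.Callable[[str, str], str],
    ) -> None:
        self._listdir_impl = listdir_impl
        self._digest_impl = digest_impl
        self._listdirs: t.Dict[str, t.List[str]] = {}
        self._digests: t.Dict[t.Tuple[str, str], str] = {}

    def listdir(self, root: str) -> t.List[str]:
        if root not in self._listdirs:
            self._listdirs[root] = self._listdir_impl(root)
        return self._listdirs[root]

    def digest_file(self, fpath: str, hash_algo: str) -> str:
        key = (fpath, hash_algo)
        if key not in self._digests:
            self._digests[key] = self._digest_impl(fpath, hash_algo)
        return self._digests[key]


def count_files(manager: CacheManager, root: str) -> int:
    # A CachingFileBackend could accept any CacheManager like this one.
    return sum(1 for _ in manager.listdir(root))
```

A watchdog-based manager would only differ in when it drops entries from its dictionaries, which keeps the backend-specific invalidation logic out of the shared interface.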
)


class SimpleFileBackend(Backend):
I think mypy will probably want us to define types for the subclasses. Since this is a new file, we can add it to the typechecks in `tox.ini` to make sure it catches anything we miss.
I'm not too familiar with mypy, and I'm not quite sure what you mean by this. A subclass is a type, right? Or do you mean something else?
I'll add the new file(s) to the `tox.ini` and see what needs to be done :)
Yep, I just meant that we would need to type these methods for mypy to be happy, and that mypy will enforce that they match the base class types to satisfy the Liskov substitution principle.
Thanks for the review, much appreciated :)
Yeah I was also thinking of having a
Maybe it's best to give it its own PR, so that we don't have to make this one larger than it already is :)
Yeah, I realized that there are two types of metadata that we must distinguish. One is the PEP 503 metadata that we need for full support of this PEP. The other is geared towards extra data that a specific backend implementation may need to keep track of its packages, such as a package
Yeah, sure. I'll finish up this PR and then create a new one for the additional work to ease the integration pain a little bit.
Co-authored-by: Matthew Planchard <mplanchard@users.noreply.github.com>
This PR is a pretty substantial refactor of the entrypoints of pypiserver (`__main__` and `__init__`) to use the argparse-based config added in #339.

- Updated `RunConfig` and `UpdateConfig` classes to have exclusive init kwargs, instead of taking a namespace. This turned out to be much easier when working with the library-style app initialization in `__init__`, both for direct instantiation and via paste config
- Added an `iter_packages()` method to the `RunConfig` to iterate over packages specified by the configuration (note @elfjes, I think that replacing this with e.g. a `backend` reference will be a nice way to tie in #348)
- Added a general-purpose method to map legacy keyword arguments to the `app()` and `paste_app_factory()` functions to updated forms
- Refactored the `paste_app_factory()` to not mutate the incoming dictionary
- Removed all argument-parsing and config-related code from `__main__` and `core`
- Moved `_logwrite` from `__init__` to `__main__`, since that was the only place it was being used after the updates to `core`
- Updated `digest_file` to use `hashlib.new(algo)` instead of `getattr(hashlib, algo)`, because the former supports more algorithms
- Updated `setup.py` to evaluate only the line that defines the version, instead of calling `eval()` on the entirety of `__init__`
- Assigned the config to a `._pypiserver_config` attribute on the `Bottle` instance to reduce hacky test workarounds
- Fixed the tox config, which I broke in #339

* Config: add auth & absolute path resolution
* Config: check pkg dirs on config creation
* Instantiate config with kwargs, not namespace
* WIP: still pulling the threads
* Init seems to be working
* tests passing locally, still need to update cache
* Fix tox command
* unused import
* Fix typing
* Be more selective in exec() in setup.py
* Require accurate casing for hash algos
* Remove old comment
* Comments, minor updates and simplifications
* move _logwrite to a more reasonable place
* Update config to work with cache
* Type cachemanager listdir in core
* Update config module docstring, rename method
* Add more comments re: paste config
* Add comments to main, remove unneeded check
* Remove commented code
* Use {posargs} instead of [] for clarity in tox
* Add dupe check for kwarg updater
* Remove unused references on app instance
* Fix typo
* Remove redundancy in log level parsing
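The `digest_file` change can be sketched as follows: `hashlib.new()` looks the algorithm up by name, so any algorithm the underlying OpenSSL build provides is accepted, not only those exposed as `hashlib` attributes. The `<algo>=<hash>` return format follows the `PkgFile.digest` comment later in this thread; the chunk size and the `demo` helper are arbitrary choices for illustration, not the PR's code.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_file(fpath: str, hash_algo: str) -> str:
    """Return the file digest in the form "<algo>=<hash>"."""
    digester = hashlib.new(hash_algo)  # raises ValueError for unknown names
    with open(fpath, "rb") as fh:
        # Read in chunks so large packages are not loaded into memory at once.
        for block in iter(lambda: fh.read(2 ** 16), b""):
            digester.update(block)
    return f"{hash_algo}={digester.hexdigest()}"


def demo(hash_algo: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "foo-1.0.tar.gz"
        path.write_bytes(b"abc")
        return digest_file(str(path), hash_algo)
```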
I've added a

It took me a while to figure out how exactly to add the argument and make it work with the

Speaking of pluggable backends, @mplanchard, when you have time, could you give me your feedback on the proposal I made for plugins in

Also, if you're OK with the current status of this PR, we can merge it, so that I can start working on the other things in a separate PR.
There is definitely still room for improvement in making these obvious/easy to add to.
I think the way we would approach this would be to have a dynamic portion of the config object, e.g.
Yeah, that could work. Or like a
@@ -0,0 +1,2 @@
[run]
omit = pypiserver/bottle.py
good thinking
Observer = None

ENABLE_CACHING = False
Not for this PR, but I wonder if we should (for 2.0) make this a config option instead of implicitly using the cache if the package is available. The implicitness feels potentially surprising to me.
Ah I see you added this already!
pypiserver/config.py

backend = available_backends[arg]

return BackendProxy(backend(self))
Should we put this in the `_ConfigCommon` so that it's available for the update command too?
Yes, that makes sense to me. I think the main reason I didn't do that yet was that it currently requires `hash_algo` to be part of the same config, and that one's on the `RunConfig`. But if you're fine with moving that to common, then I see no reason not to do that.
Yep, now that we need it in both the run and update configs, we might as well move it
class _ConfigCommon:
    hash_algo: t.Optional[str] = None
Should we move the `hash_algo` init argument to this common config?
"digest",  # The file digest in the form of <algo>=<hash>
"relfn_unix",  # The relative file path in unix notation
"parsed_version",  # The package version as a tuple of parts
"digester",  # a function that calculates the digest for the package
thanks for this documentation!
Haha, I mostly put it there for my own sanity until I have the chance to refactor PkgInfo
self.fn = fn
self.root = root
self.relfn = relfn
self.relfn_unix = None if relfn is None else relfn.replace("\\", "/")
self.replaces = replaces
self.digest = None
Should this be `digester` to match the `__slots__`?
Both `digest` and `digester` are part of `__slots__`. I do see that I need to assign `self.digester = None` in `__init__` to be consistent.
from pypiserver.backend import SimpleFileBackend, CachingFileBackend
from pypiserver import get_file_backend

DEFAULT_PACKAGE_DIRECTORIES = ["~/packages"]
Can we pull this from `config.DEFAULTS`?
So in my opinion, if we're doing this as plugins, then the package root is a config setting specific to the file backends. That would also mean that the CLI argument should not be part of the global config, but provided by the plugin. That's why I added that definition there rather than taking it from the global config.
That does make sense. I just want to avoid needing to update this twice if we update it (although the likelihood that we'll ever change this is pretty low tbf)
Hi @mplanchard, it's been a while, how are you? What do you think about the status of this PR? As far as I'm concerned, it's ready to be merged. I've addressed most, if not all, of your feedback so far. Are there still any blocking things?
Sorry for being slow on this. Q4 and the beginning of Q1 have been crazy at my day job. Let's get it in!
pypiserver/config.py
# if arg not in available_backends:
#     raise argparse.ArgumentTypeError(
#         f"Value must be one of {', '.join(available_backends.keys())}"
#     )
Dead code?
Ah thanks, yeah I thought I might need to check that here, but it's actually picked up by argparse
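For reference, one way argparse can enforce the valid backend names is `choices`, which rejects unknown values with exit status 2 before any lookup runs. This is a sketch: the backend names and stand-in classes are hypothetical, and the PR may wire the option differently (e.g. through a `type=` converter).

```python
import argparse
import typing as t


class SimpleFileBackend:  # hypothetical stand-ins for the real backend classes
    pass


class CachingFileBackend(SimpleFileBackend):
    pass


available_backends: t.Dict[str, t.Type[SimpleFileBackend]] = {
    "simple-dir": SimpleFileBackend,  # hypothetical names
    "cached-dir": CachingFileBackend,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pypi-server")
    # `choices` makes argparse reject unknown names itself, so a manual
    # membership check before the dictionary lookup is redundant.
    parser.add_argument(
        "--backend", default="simple-dir", choices=sorted(available_backends)
    )
    return parser


def select_backend(argv: t.List[str]) -> t.Type[SimpleFileBackend]:
    args = build_parser().parse_args(argv)
    return available_backends[args.backend]
```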
Leaving it to you to merge. Let me know if you don't have permissions, and I'll hit the button instead.
@mplanchard @elfjes if you don't have merge rights, I can do it too!
Yeah, I think so! For bigger stuff, we should probably both review, but for small things I'm okay with either of us approving and merging. And feel free to tag me for a review if there's anything at all you're uncertain about or would like me to weigh in on. If there's anything time-sensitive, bug me in Zulip. It's harder for me to ignore real-time chat than GH issues 🤪.
I removed that dead code you mentioned, so I guess it needs re-approval. I don't have a merge button, so if either of you (@mplanchard, @dee-me-tree-or-love) can merge it, I'd be much obliged :)