Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading resources from memory #648

Closed
layday opened this issue Jan 30, 2020 · 12 comments
Closed

Loading resources from memory #648

layday opened this issue Jan 30, 2020 · 12 comments
Labels
on hold use case

Comments

@layday
Copy link
Contributor

@layday layday commented Jan 30, 2020

Alembic uses __file__ to locate both templates and migrations on disk and, although Alembic ostensibly appears to support in-package migration environments [1], it attempts to resolve the script_location on disk. This reliance on __file__ makes embedding Alembic in freezers and ZIP apps difficult. Are there plans to support the new resource API for reading templates and migrations contained in packages going forward, which would allow loading resources from memory?

[1]

For support of applications that package themselves into .egg files, the value can also be specified as a package resource, in which case resource_filename() is used to find the file (new in 0.2.2). Any non-absolute URI which contains colons is interpreted here as a resource name, rather than a straight filename.

https://alembic.sqlalchemy.org/en/latest/tutorial.html?highlight=script_location

@zzzeek zzzeek added the use case label Jan 30, 2020
@zzzeek
Copy link
Member

@zzzeek zzzeek commented Jan 30, 2020

Alembic uses __file__ to locate both templates and migrations on disk and,

this seemed naturally intuitive to me, but then I did a grep on Alembic and in fact I cannot locate the use of __file__ within alembic's source code except for a single location, which is used for the sole purpose of locating Alembic's template files. An alternative way of locating these templates that doesn't use the package_dir method, however nonetheless works on Python 2.7 and Python 3.5 and above, would be welcome. This might be something that could be contributed without much difficulty.

For locating migrations on disk, the situation would appear to be much more complicated. The script_location is resolved to a file location because Alembic does not only read from this location, it also writes to it. I don't see any "write" methods at https://docs.python.org/3/library/importlib.html#module-importlib.resources so it would appear that if this "memory only" mode of operation were supported, the file location would simply be ignored, which means there would need to be a split in the codepaths to support operations in both modes somehow.

The Alembic project would welcome a long-term contributor who wants to work on converting all of the file-oriented methods in https://github.com/sqlalchemy/alembic/blob/master/alembic/script/base.py , https://github.com/sqlalchemy/alembic/blob/master/alembic/util/pyfiles.py etc. to be agnostic of filesystem present or not and to make use of newer Python APis in a conditional way (see pyfiles.py for the general idea) but without actually referring to "files" outside of the pyfiles.py package, that is, some generic API that does "read" / "write". It would appear that breaking out all the file-oriented methods in https://github.com/sqlalchemy/alembic/blob/master/alembic/script/base.py into a new object called a "ScriptStrategy" or something like that, where there is FileScriptStrategy and then ResourceScriptStrategy, would be at least one way to do it.

This is not something the Alembic project has the resources to undertake any time soon, however if you are interested in contributing, I would be willing to provide guidance and code review on an ongoing basis, noting however that a change this big will likely take many months as I don't always have time to review code and it would likely need many revisions.

The contribution would need to include:

  1. continued functionality on Python 2.7, Python 3.5 and above

  2. "write" functionality needs to still work, obviously, however within the scope of this "memory only" mode I guess it wouldn't unless there is some means that this works

  3. new unit tests

  4. new documentation

  5. support of regressions in the new feature for at least one year

The general guidelines for contributing to Alembic are the same as those for SQLAlchemy at https://www.sqlalchemy.org/develop.html. We use a model of pull requests which are then pulled into our gerrit instance at http://gerrit.sqlalchemy.org/ for review.

I'm already overseeing ongoing contributions from a few other contributors so a large part of my day is going through and helping contributors move forward. But things don't get merged until they are as perfect as they can possibly be so there can be a long process before we get there for bigger things.

@layday
Copy link
Contributor Author

@layday layday commented Jan 30, 2020

That does sound like a rather significant undertaking. As a first step, we could generate the template path on demand so that Alembic would run from memory. This would allow me to extract the migrations, from my package, on disk and set the script_location dynamically.

An alternative way of locating these templates that doesn't use the package_dir method, however nonetheless works on Python 2.7 and Python 3.5 and above, would be welcome.

I would be happy to make a PR for this. I see Alembic relies on pkg_resources from setuptools for the script_location but that has been unofficially deprecated. Its replacement, importlib.resources, though I am sure is a lot more robust, is a tiny bit unpleasant to work with in its present form. The first thing to note is that importlib.resources can only interact with explicit packages - packages which contain an __init__.py. We would need to add an __init.py__ not only in templates but also in every individual template folder. At runtime, importlib.resources will import these generating .pyc files in Python 2 and __pycache__ folders in Python 3, which we would need to filter out along with __init__.py, otherwise the migration environment might be picked up by, for example, find_packages in setup.pys downstream. A corollary of the __init__.py limitation is that importlib.resources does not support interacting with resources in sub-folders (see https://gitlab.com/python-devs/importlib_resources/issues/58 for discussion around this issue). I don't know what the best approach to take here is but I've found myself concatenating package names to recurse into sub-packages. I've put together a very quick PoC using importlib.resources here: layday@3d4b606.

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Jan 30, 2020

Does the "unofficial deprecation" of pkg_resources do away with .egg files altogether?

Additionally, I agree that importlib.resources looks very unrefined. Not sure if you've noticed but Python keeps changing things related to importlib constantly , you can see all the battle being waged in compat.py as they continue to make the APIs less clear, more verbose, and continue to deprecate things that were only introduced a few versions earlier.

I find the __init__.py requirement very awkward, what is the rationale for this?

@layday
Copy link
Contributor Author

@layday layday commented Jan 30, 2020

It seems to have something to do with the importlib implementation in CPython relating to PEP 420 namespace packages. I assume that the resource loader/reader/whatnot is not able to distinguish between an implicit package - a package lacking an __init__.py - and a namespace package - multiple synonymous packages on the PYTHONPATH which can only contain other packages (folders). This comment sort of explains it.

I don't know what the plan is for eggs. I've not been using Python for long enough to have experienced the transition from eggs to wheels.

layday added a commit to layday/alembic that referenced this issue Feb 3, 2020
This allows importing Alembic in environments
that do not have `__file__` as discussed in sqlalchemy#648.
layday added a commit to layday/alembic that referenced this issue Feb 3, 2020
This allows importing Alembic in environments
that do not have `__file__` as discussed in sqlalchemy#648.

Fixes: sqlalchemy#648
@zzzeek
Copy link
Member

@zzzeek zzzeek commented Feb 4, 2020

were not done, right, since we want to do a lot more here.

@zzzeek zzzeek reopened this Feb 4, 2020
@layday
Copy link
Contributor Author

@layday layday commented Feb 4, 2020

Thanks for getting #651 in so quickly, and yes, I was following the PR guidelines by adding a "fixes" line which I assume auto-closed this issue; I would still like to work on integrating importlib.resources.

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Feb 12, 2020

note #91 as well

@layday
Copy link
Contributor Author

@layday layday commented Feb 19, 2020

There is apparently a work in progress to replace the importlib.resources API with a newer API so I'm reluctant to work on this until there is some clarity on how we're supposed to retrieve package resources.

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Feb 19, 2020

yes! they keep changing their minds. I'm almost relieved you found that so we don't go down a huge rabbit hole.

@zzzeek
Copy link
Member

@zzzeek zzzeek commented Mar 29, 2020

do we want to leave this open or not? it seems premature to be going down this path right now.

@layday
Copy link
Contributor Author

@layday layday commented Mar 29, 2020

I don't know, but there are more changes yet to come. I assume once this is merged it'll be close to stable but because the API has diverged from the stdlib we'd have to use importlib_resources as a dependency or maintain two separate internal APIs.

@zzzeek zzzeek added the on hold label Apr 27, 2021
@zzzeek
Copy link
Member

@zzzeek zzzeek commented Apr 27, 2021

there's no plans to work on this, reopen if we think we want to continue looking into this.

@zzzeek zzzeek closed this as completed Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on hold use case
Projects
None yet
Development

No branches or pull requests

2 participants