Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Developer Guide: Document practices for data files #37399

Merged
merged 13 commits into from Mar 31, 2024

Conversation

mkoeppe
Copy link
Member

@mkoeppe mkoeppe commented Feb 19, 2024

Topics here:

Follow-ups:

  • use of Features

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@mkoeppe
Copy link
Member Author

mkoeppe commented Feb 19, 2024

@tornaria @gmou3 Let's collaborate here to bring our documentation up to date on these topics.

Comment on lines 168 to 177
- Large data files should not be added to the Sage source tree.
Instead:

- Create a separate git repository for them
- Add metadata in your repository that make it a pip-installable
package (distribution package)
- Upload it to PyPI
- Create metadata in ``SAGE_ROOT/build/pkgs`` that make your new
pip-installable package known to Sage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Some ideas:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, linking to these projects as examples of best practices is a good idea. Will do (unless someone sends me a PR first... hint, hint)

Copy link
Member Author

@mkoeppe mkoeppe Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tornaria are these points taken care of by the current version of the text already?

@tornaria
Copy link
Contributor

tornaria commented Feb 19, 2024

For accessing resources in packages, link to https://docs.python.org/3/library/importlib.resources.html#importlib.resources.files. Note everything else is deprecated.

Basic examples:

  1. open a resource for text reading: fd = files(package).joinpath(resource).open('rt')
  2. open a resource for binary reading: fd = files(package).joinpath(resource).open('rb')
  3. read a resource as text: text = files(package).joinpath(resource).read_text()
  4. read a resource as bytes: bytes = files(package).joinpath(resource).read_bytes()
  5. open a xz resource for text reading: fd = lzma.open(files(package).joinpath(resource), 'rt')
  6. open a xz resource for binary reading: fd = lzma.open(files(package).joinpath(resource), 'rb')

Note: as_files() is only necessary if one needs an actual file by path, e.g. to pass to an external command. AFAICT, it's not needed to use as in the examples above.

Edit: I added explicit 'rt' to text mode since open() and lzma.open() have different meanings for 'r'.

@tornaria
Copy link
Contributor

Notes:

  1. I think in both conway-polynomials and matroid-database, the construct TextIOWrapper(lzma.open(FILE)) is equivalent to lzma.open(FILE, 'r')
  2. Is the as_files(dbpath) really necessary in https://github.com/sagemath/conway-polynomials/blob/master/src/conway_polynomials/__init__.py#L117 ?
  3. Is the .open("rb") really necessary in https://github.com/gmou3/matroid-database/blob/main/src/matroid_database/__init__.py#L89 ?

@gmou3
Copy link
Contributor

gmou3 commented Feb 19, 2024

This code looks nice to me:

import lzma
from importlib.resources import files

path = files(module).joinpath(name).with_suffix(".txt.xz")
try:
    return lzma.open(path, 'rt')
except FileNotFoundError:

It seems like we want the 'rt' option: https://docs.python.org/3/library/lzma.html.

@tornaria
Copy link
Contributor

It seems like we want the 'rt' option: https://docs.python.org/3/library/lzma.html.

You are right; a different default from io.open() and also a different meaning of 'r' 🤷

Comment on lines 176 to 177
This is no longer recommended, and PRs that update such uses
are welcome.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not un-recommend this until sage.interfaces is a package or we remove support for python 3.9. The alternative is ugly, and is adding code that will have to be replaced again soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding code that will have to be replaced again soon

why is this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib_resources is only needed for namespace packages for python 3.9 so your try hack is unecessary with python 3.10.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please. It's not a "hack". These are the best practices.

And by documenting this method to import it, we avoid the "too much information" problem where we tell developers every little detail of what is supported on what version etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please. It's not a "hack". These are the best practices.

According to whom?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, maybe second-best practices

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now (with abf3ae6), at best practices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give an external reference (i.e. not yourself)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to tell what you are looking for. https://importlib-resources.readthedocs.io/en/latest/using.html# leaves no doubt how to import from it. We have decided to conditionalize the dependency on the backport package in #36776 @orlitzky, hence here we conditionalize the import.

In case you are looking for this: You can see how it is used in projects in the wild in this code search:
https://github.com/search?q=%22from+importlib_resources+import%22&type=code&p=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not un-recommend this until ...

@tornaria I've changed it in 4d38922, please take a look.

@mkoeppe
Copy link
Member Author

mkoeppe commented Feb 20, 2024

@orlitzky @soehms You might be interested in this PR as well

@orlitzky
Copy link
Contributor

Notes:

1. I think in both conway-polynomials and matroid-database, the construct `TextIOWrapper(lzma.open(FILE))` is equivalent to `lzma.open(FILE, 'r')`

2. Is the `as_files(dbpath)` really necessary in https://github.com/sagemath/conway-polynomials/blob/master/src/conway_polynomials/__init__.py#L117 ?

They make the type-checker (mypy) happy.

Comment on lines 169 to 175
try:
# Use backport package providing Python 3.11 features
from importlib_resources import files
except ImportError:
from importlib.resources import files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
# Use backport package providing Python 3.11 features
from importlib_resources import files
except ImportError:
from importlib.resources import files
from importlib.resources import files

maybe with a note about not using resources in namespace packages until 3.9 is removed support, i.e. apr 2024?. Really not worth doing this for 2 months and using the version of importlib.resources from python is better to avoid bleeding edge and stick to long term stable API.

Your code allows one, in principle, to use features from new importlib_resources which are not available in the version of python used. But if those features are indeed used, this results in an incompatibility with a system where python is new enough to have importlib.resources but where importlib_resources is not installed (or not up to date). From that point of view is much better to stick to the stable API, meaning features that are common available in all the supported versions of python.

The only newer nicety I see (available on 3.12) is that one can use files() to stand for files(__package__)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case of "importlib_resources not up to date" is a real point. I'll make a change.

I don't agree with the rest what you're saying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In abf3ae6, I've changed it to using an explicit version check (for 3.11!) instead of try... except.

I'm using 3.11, not 3.9.
Note the section on "Language Standard" in the Developer's Guide: The idea is that we develop the Sage library to one language standard, and that's currently 3.9 + the backports for 3.11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

until 3.9 is removed support, i.e. apr 2024?

We don't have a set schedule for this yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only newer nicety I see (available on 3.12) is that one can use files() to stand for files(package)

That's quite nice, so let's please merge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

until 3.9 is removed support, i.e. apr 2024?

We don't have a set schedule for this yet.

42 months as per NEP 29 is apr 2024. They call this best practices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please @tornaria. Our project has not adopted NEP 29. If you want to know about that, read
https://github.com/sagemath/sage/wiki/NEP-29:-Python-version-strategy

Comment on lines 171 to 175
if sys.version_info < (3, 11):
# Use backport package providing Python 3.11 features
from importlib_resources import files
else:
from importlib.resources import files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sys.version_info < (3, 11):
# Use backport package providing Python 3.11 features
from importlib_resources import files
else:
from importlib.resources import files
from importlib.resources import files

I already gave rationale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already know that I disagree, and I have explained it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't agree with the suggestion, feel free to label the PR as disputed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken this part out now to regain velocity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@tornaria tornaria mentioned this pull request Feb 20, 2024
5 tasks
@mkoeppe mkoeppe force-pushed the document_package_data_practices branch from 4acd6ca to 1bbc922 Compare February 20, 2024 18:45
@mkoeppe mkoeppe mentioned this pull request Feb 22, 2024
3 tasks
@mkoeppe mkoeppe force-pushed the document_package_data_practices branch from 1bbc922 to 8dcb34a Compare February 26, 2024 23:16
@mkoeppe mkoeppe marked this pull request as ready for review February 26, 2024 23:18
@mkoeppe mkoeppe force-pushed the document_package_data_practices branch from 8dcb34a to 6984347 Compare March 5, 2024 21:32
@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 5, 2024

Let's get this in please.

@orlitzky
Copy link
Contributor

orlitzky commented Mar 6, 2024

I agree with @tornaria that it would be better to use importlib.resources unconditionally, and simply add an __init__.py to the database packages in the meantime (I already did this with conway-polynomials). We would be better off without the three backport packages.

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 6, 2024

But this document does not describe the use of importlib.resources in the standalone packages.
It describes its use in the Sage library.

@orlitzky
Copy link
Contributor

orlitzky commented Mar 6, 2024

But this document does not describe the use of importlib.resources in the standalone packages. It describes its use in the Sage library.

Ok, but same answer for all the same reasons.

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 6, 2024

Well, no. The modularization of the Sage library and its uses of namespace packages does not disappear just like that.

@orlitzky
Copy link
Contributor

orlitzky commented Mar 6, 2024

The database should only be present in one distribution right? And if not, I think there's a decent consensus that we should chill for a while on the init file removal until namespace packages are well-supported upstream.

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 6, 2024

I think there's a decent consensus that we should chill for a while on the init file removal until namespace packages are well-supported upstream.

No, Michael, there is no such consensus, and certainly not a "decent" one. That the Sage library uses namespace packages for the purpose of the modularization is documented policy in the developer's guide.

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 6, 2024

The database should only be present in one distribution right?

For the separate distributions (conway-polynomials, matroid-database), using an ordinary package (with __init__.py) is of course the right choice, as they control their own namespaces.

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 6, 2024

What we are trying to document here is to cover cases such as SAGE_ROOT/src/sage/interfaces/maxima.lisp.
sage.interfaces is a namespace package, and in fact one that is designed to be split over 6 distribution packages:
https://github.com/sagemath/sage/pull/36676/files#diff-2af172ee945eebb48bbe8a3cdd6b523221c1a995b2e4f8ac23fa7c3e1181e6ef

@orlitzky
Copy link
Contributor

orlitzky commented Mar 6, 2024

No, Michael, there is no such consensus, and certainly not a "decent" one. That the Sage library uses namespace packages for the purpose of the modularization is documented policy in the developer's guide.

It's in the developer's guide because you put it there.

What we are trying to document here is to cover cases such as SAGE_ROOT/src/sage/interfaces/maxima.lisp.
sage.interfaces is a namespace package, and in fact one that is designed to be split over 6 distribution packages:

Handled by doing nothing and waiting for namespace package support upstream. Nobody needs this by tomorrow, and the maintenance to keep it working is a headache.

Comment on lines 40 to 57
(to be used in place of ``importlib.metadata`` when Python older
than 3.11 is in use),
- `importlib_resources <../reference/spkg/importlib_resources>`_:
If necessary, you can use it in place of ``importlib.resources``
when Python older than 3.11 is in use if you need a newer
feature::

import sys

if sys.version_info < (3, 11):
# Use backport package providing Python 3.11 features
from importlib_resources import files
else:
from importlib.resources import files

- `typing_extensions <../reference/spkg/typing_extensions>`_
(to be used in place of ``typing``).
(to be used in place of ``typing`` when any of the features
introduced after Python 3.9.0 are used).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot about this, other that this the PR looks good to me.

I would remove almost all of this section, starting from "Some key language and library functions..." to the end, but I understand if you don't want to do it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't forget about it. I actually changed the language so it does not unconditionally asks for it to be imported like that.

Copy link
Contributor

@tornaria tornaria Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, let's not play games.

Two reviewers showed concern for this. You can have a different opinion, but if you accept to take it out "to regain velocity", then do it without tricks fully. I already stated my positive review conditional on this.

Edit: toned it down. I didn't intend to offend @mkoeppe at all. The underlying message is a constructive review to get the 95% of your PR that we all agree is a very nice improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonzalo, this is uncalled for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your concern about this improvement of the documentation in the "Language Standard" section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with the functionality discussed here. The only thing I can do is moderate the conflict (if my understanding of it is good enough). Could this be a compromise: we leave the example code-block as is and add a note that some distributions no longer ship importlib_resources for good reasons. Ideally we should add some URL pointers that address these good reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with the functionality discussed here. The only thing I can do is moderate the conflict (if my understanding of it is good enough). Could this be a compromise: we leave the example code-block as is and add a note that some distributions no longer ship importlib_resources for good reasons. Ideally we should add some URL pointers that address these good reasons.

Sorry I missed this before.

It seems we don't agree whether to recommend or not using importlib_resources. In this
context, I think merging this nice PR without this small snippet is a good way to go, and I thought that's what @mkoeppe was proposing. That's why I gave it positive review conditional on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll split out the improvement of this section of the manual (orthogonal to the goal of this PR) out to a separate PR and will take @soehms's suggestion into account there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done now in the rebased branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and in #37654.

location as the Python code. This is referred to as "package data".

The preferred way to access the data from Python is using the
`importlib.resources API <https://importlib-resources.readthedocs.io/en/latest/using.html>`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line doesn't look rendered correctly when I follow the documentation link.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6045e5c, I hope

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 23, 2024

@mkoeppe mkoeppe requested a review from kiwifb March 24, 2024 21:14
@mkoeppe mkoeppe force-pushed the document_package_data_practices branch from 6045e5c to d3a4313 Compare March 26, 2024 04:37
@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 26, 2024

Let's get this in please.

Copy link

Documentation preview for this PR (built with commit d3a4313; changes) is ready! 🎉

@mkoeppe
Copy link
Member Author

mkoeppe commented Mar 28, 2024

Thanks.

@vbraun vbraun merged commit e6a377e into sagemath:develop Mar 31, 2024
14 of 15 checks passed
@mkoeppe mkoeppe added this to the sage-10.4 milestone Mar 31, 2024
vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 20, 2024
…kages

    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

- sagemath#35203 added these packages as unconditional dependencies
- prompted by a sporadic use of typing_extensions in the Sage library
(see sagemath#34831)
- motivated further by the demand to immediately drop support for Python
3.8 so that newer typing features can be used in the Sage library
(sagemath#35404)
- sagemath#36776 reduced the packages to
conditional dependencies

Here we improve the documentation in the section "Language standard" of
the developer's guide so that it aligns with how the conditional
dependencies are declared.

Based on changes split out from sagemath#37399.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [ ] The title is concise and informative.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#37654
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
@mkoeppe mkoeppe deleted the document_package_data_practices branch May 5, 2024 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants