Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor environment config variables to path #30855

Closed
tobiasdiez opened this issue Nov 3, 2020 · 45 comments
Closed

Refactor environment config variables to path #30855

tobiasdiez opened this issue Nov 3, 2020 · 45 comments

Comments

@tobiasdiez
Copy link
Contributor

Currently, sage.env contains a huge list of variables that can be set via env vars or sage_conf. These are simple string variables. However, in most cases these variables actually have a different type, e.g. they point to files or are booleans. This ticket proposes to convert these stringl variables into the proper class (e.g. Path or bool).

Since now all paths in sage.env are pathlib.Paths, some other files had to be changed since they still assumed to be str. I tried to make the changes as minimal as possible, leaving a deeper refactoring to follow-up tickets.

CC: @mkoeppe

Component: interfaces: optional

Branch/Commit: public/refactoring/envVars @ 7a32e55

Reviewer: Matthias Koeppe

Issue created by migration from https://trac.sagemath.org/ticket/30855

@tobiasdiez tobiasdiez added this to the sage-9.3 milestone Nov 3, 2020
@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:2

Note that in sage.features we have a rather similar approach to configuration. Can't redo this again in sage.env

@tobiasdiez
Copy link
Contributor Author

comment:3

Thanks for the pointer to sage.features.
It seems that sage.features.databases is the closest to the refactoring done in this ticket. However, even there sage.env.CREMONA_MINI_DATA_DIR is used. I don't see any disadvantages of replacing this string variable with a method returning a Path object. In particular, there is no code duplication (although the code in features.databases can then easily be streamlined using the pathlib library).

Or are you proposing to replace/encapsulate all variables in sage.env in separate features?

@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:4

Replying to @tobiasdiez:

Thanks for the pointer to sage.features.
It seems that sage.features.databases is the closest to the refactoring done in this ticket. However, even there sage.env.CREMONA_MINI_DATA_DIR is used.

Yes, sage.features.databases provides an abstraction on top of sage.env.

I would not consider it an improvement to put another, intermediate abstraction inside sage.env.

@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:5

Replying to @tobiasdiez:

are you proposing to replace/encapsulate all variables in sage.env in separate features?

Well, encapsulating these variables is the direction in which the code added to sage.features has been going.

@tobiasdiez
Copy link
Contributor Author

comment:6

Replying to @mkoeppe:

I would not consider it an improvement to put another, intermediate abstraction inside sage.env.

I think you misunderstood my intention a bit. I didn't add another layer of abstraction, I just changed the interface to the same information. Instead of yielding a string path, it returns now a proper Path object. I can also readd the variable ELLCURVE_DATA_DIR having type Path if you prefer that instead of the getter-method. I just thought that since there is a disk access involved, it might speedup initiation a bit to encapsulate everything in a method. That can easily changed, of course.

Well, encapsulating these variables is the direction in which the code added to sage.features has been going.

ok, but adding such an abstraction for the ellCurve data directory can be done later (and wasn't my intention).

@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:7

The patch is adding file system inspection to sage.env - this makes things less uniform. The other database features are doing this in sage.features via StaticFile.

@tobiasdiez
Copy link
Contributor Author

comment:8

I agree, it's less uniform because of the special handling of paths.

To make things clear, can we agree that the following is the desired behavior for paths like the ellCurve data directory?

  1. If user sets the path via an env variable, and the path exists, take it.
  2. If not, but sage.conf sets a path that exists, take it.
  3. Else, fall back to hard-coded folder relative to SAGE_SHARE.

I see the following two approaches:
Option a) is to have all this logic in sage.env (maybe with a light wrapper in sage.features) and Option b) is to expose the steps 1 and 2 as methods get_env_variable and get_sage_conf_variable in sage.env and put the file check logic in sage.features. I have a light preference for Option a) as this would require the least amount of changes but Option b) also sounds fine with me. Which option do you prefer?

@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:9

Replying to @tobiasdiez:

To make things clear, can we agree that the following is the desired behavior for paths like the ellCurve data directory?

  1. If user sets the path via an env variable, and the path exists, take it.
  2. If not, but sage.conf sets a path that exists, take it.
  3. Else, fall back to hard-coded folder relative to SAGE_SHARE.

That's different and more complex than the current behavior. Do you need this? Then please clarify the ticket description - it's not just refactoring.

@tobiasdiez
Copy link
Contributor Author

comment:10

I don't really need this, but thought it might be helpful / desired. If you don't think it's needed, then I can easily remove this extra logic again.

@tobiasdiez
Copy link
Contributor Author

comment:11

On a second thought, I actually need something like this in #30371. Not so much for paths but for package names. In fact the package name check implemented in #30706 has pretty much the same spirit as this the refactoring done in this ticket.

@mkoeppe
Copy link
Member

mkoeppe commented Nov 3, 2020

comment:12

If you need a new behavior for configuration of "package names" (not sure what you mean by that), please open a ticket for it. In general, I recommend to keep tickets that implement new features or change behavior separate from tickets that refactor existing code.

@tobiasdiez
Copy link
Contributor Author

comment:13

What I meant was that in #30706 the added method get_cblas_pc_module_name has the same structure as get_ellcurve_data_dir: it checks the env variables and the sage.conf information for the first existing option (using pkgconfig).

Anyway, is the filepath handling described above in #30855 comment:8 desired or not? I don't really the need this additional logic, but it was easy to implement so I went for it while refactoring. I can easily revert it, just please let me know.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

5be342eMerge branch 'develop' of git://github.com/sagemath/sage into public/refactoring/envVars

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Changed commit from eef3168 to 5be342e

@tobiasdiez

This comment has been minimized.

@tobiasdiez tobiasdiez modified the milestones: sage-9.3, sage-9.4 Jan 25, 2021
@tobiasdiez tobiasdiez changed the title Refactor environment config variables Refactor environment config variables to path Jan 25, 2021
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Changed commit from 5be342e to 265bdd7

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

265bdd7Use paths everywhere

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

df1cdafFix elliptic curve database path

@tobiasdiez
Copy link
Contributor Author

comment:23

I have a hard time seeing how SAGE_LOCAL is a feature, but I've now removed the existence check. So this is really only a refactoring from string-paths to pathlib-paths.

@tobiasdiez

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Changed commit from e7fa157 to 7a32e55

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jan 25, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

7a32e55Fix doctests as well

@mkoeppe
Copy link
Member

mkoeppe commented Jan 25, 2021

comment:26

SAGE_LOCAL is not a great example - pretty much any use of SAGE_LOCAL in src/sage needs to be removed anyway.

@mkoeppe
Copy link
Member

mkoeppe commented Jan 25, 2021

comment:27

For example this one here:

--- a/src/sage/interfaces/lie.py
+++ b/src/sage/interfaces/lie.py
@@ -331,7 +331,7 @@ class LiE(ExtraTabCompletion, Expect):
                         prompt = '> ',
 
                         # This is the command that starts up your program
-                        command = "bash "+ SAGE_LOCAL + "/bin/lie",
+                        command = f"bash { SAGE_LOCAL / 'bin' / 'lie' }",
 
                         server=server,
                         script_subdirectory = script_subdirectory,

This would better be rewritten using an Executable feature.

@mkoeppe
Copy link
Member

mkoeppe commented Jan 25, 2021

comment:28

Also, as changes like this one indicate:

--- a/src/sage/doctest/sources.py
+++ b/src/sage/doctest/sources.py
@@ -87,8 +87,8 @@ def get_basename(path):
     root = os.path.dirname(path)
     # If the file is in the sage library, we can use our knowledge of
     # the directory structure
-    dev = SAGE_SRC
-    sp = SAGE_LIB
+    dev = str(SAGE_SRC)
+    sp = str(SAGE_LIB)
     if path.startswith(dev):
         # there will be a branch name
         i = path.find(os.path.sep, len(dev))

The change of the type of the variables is not backwards-compatible (although in some contexts you can use path like a string). So there is a danger of breaking user code.

@tobiasdiez
Copy link
Contributor Author

comment:29

I agree, that some of the code can be further refactored and e.g. moved to the feature module. However, this only changes the location of code such as SAGE_LOCAL + "/bin/lie" and thus it is still advantageous to have SAGE_LOCAL as path.

What do you propose to do about the backward-compatibility? What are sage's conventions concerning breaking changes (especially concerning relatively internal things like sage.env)?

@mkoeppe
Copy link
Member

mkoeppe commented Jan 25, 2021

comment:30

It's really very simple:
We have a low-level interface (sage.env), we have a high-level interface (sage.features). It is not a clear improvement to make the existing low-level interface slightly higher-level. Instead, improve code that uses the low-level interface by using the high-level interface instead.

@tobiasdiez
Copy link
Contributor Author

comment:31

There are over 400 usages of these path variables in sage.env (conservative estimate based on searching for join(SAGE_). All of these calls profit from the strengthen of the interface to use pathlib. Even if refactoring to sage.features would remove half of them, that's still a huge number of string -> path conversions that have to be repeatedly done in many places. So why not fix this at the origin?

@mkoeppe
Copy link
Member

mkoeppe commented Jan 26, 2021

comment:32

Replying to @tobiasdiez:

What are sage's conventions concerning breaking changes

We use the deprecation mechanism if we have to make incompatible changes.

Hard to do use deprecation here - which is another reason why this old interface (sage.env) is best left alone.

@mkoeppe
Copy link
Member

mkoeppe commented Jan 26, 2021

comment:33

Replying to @mkoeppe:

SAGE_LOCAL is not a great example - pretty much any use of SAGE_LOCAL in src/sage needs to be removed anyway.

For reference - #30818

@tobiasdiez
Copy link
Contributor Author

comment:34

Why are you so much against the idea of improving sage.env? For me it is sage's interface for configuration (via environment variables or sageconf). So it is natural that many place in sage will access this information. And I don't understand why you don't want to have an universal check for basic consistency (paths specified exist, boolean flags are yes/no etc). Right now these things are checked in various places, leading to code duplication and additional work for maintenance.

For deprecation, that's possible using a wrapper around Path and str that prints warning messages if the string-methods are used (similar to https://stackoverflow.com/questions/922550/how-to-mark-a-global-as-deprecated-in-python)

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

comment:35

Replying to @tobiasdiez:

Why are you so much against the idea of improving sage.env? For me it is sage's interface for configuration (via environment variables or sageconf).

See comment 4.
sage.env is a low-level interface - a thin wrapper around os.environ, combining environment variables and sage_conf and simple fallbacks. It maps string keys to strings.

Adding complexity there is not an improvement.

sage.features was designed to provide a high-level interface to the system environment of sagelib (#20382).

It already does what it seems you are trying to introduce in sage.env: It is typed and it implements filesystem lookups.

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

comment:36

Replying to @tobiasdiez:

There are over 400 usages of these path variables in sage.env (conservative estimate based on searching for join(SAGE_).

git grep 'join(SAGE' src/sage | grep -v 'sage:' (ignoring lines that are in doctests!) gives 97 hits, which will need review. That's really just a small remainder of refactoring work that has already been done in the past few years.

For example, all places using SAGE_EXTCODE would be changed by #31306.

@tobiasdiez
Copy link
Contributor Author

comment:37

And how are the features supposed to locate files in SAGE_LOCAL, SAGE_VENV, SAGE_ETC, SAGE_LIB, SAGE_SHARE etc? And why is it not helpful for this if these variables are Paths?

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

comment:38

Replying to @tobiasdiez:

And how are the features supposed to locate files in SAGE_LOCAL, SAGE_VENV, SAGE_ETC, SAGE_LIB, SAGE_SHARE etc?

sage.features.Executable uses PATH. See #31296 for a proposed improvement.

sage.features.StaticFile uses a customizable search path, which defaults to SAGE_SHARE - this probably needs a better design, as we are moving away from assumptions related to the installation scheme of the Sage distribution.

And why is it not helpful for this if these variables are Paths?

That's not how this works. A proposed change needs to be sufficiently motivated - what problem is the ticket trying to solve? In particular, convincing motivation is necessary when breaking changes to the interface are proposed or the complexity of the code is increased by the change.

pathlib is nice, but I don't think there is any sort of consensus in the Python community that it is "better" than os.path. It's just a matter of preferences.

@tobiasdiez
Copy link
Contributor Author

comment:39

Replying to @mkoeppe:

what problem is the ticket trying to solve?

The idea was to provide basic existence and consistency checks that would apply to all path-based env variables. Since this is apparently not desired, let's close this ticket to not spent more time discussing this.

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

comment:40

In any case, the discussion was fruitful as it led to creating tickets #31291, #31292, #31293, #31296, #31306. Any work on these tickets would be very welcome!

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

Reviewer: Matthias Koeppe

@mkoeppe
Copy link
Member

mkoeppe commented Jan 30, 2021

Changed author from Tobias Diez to none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants