Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing tk.uncached_path #32

Open
JackTemaki opened this issue May 27, 2021 · 5 comments
Open

Removing tk.uncached_path #32

JackTemaki opened this issue May 27, 2021 · 5 comments

Comments

@JackTemaki
Copy link
Contributor

As there were also plans to remove tk.uncached_path completely (well, making it deprecated) from Sispyhus, I would suggest so start removing this from all locations and replacing it by get_path() and get_cached_path(), which is much cleaner. In most parts we do not want to have the dual usage of str/tk.Path, and if you want to pass external wrapping this as a Path before passing this to a Job is much cleaner.

If there are no objections within the next days, I will refactor that myself.

@curufinwe
Copy link
Contributor

Moving to Path only is ok for me. But if we change the caching behavior we should maybe switch to something where the caching is fully integrated also into the python code. I.e. instead of having the path object modify the string it should provide the location of the cached object. Even better: do this through an open wrapper. Something like:

class MyJob(job):
  def __init__(self, path):
    self.path = path

  def run(self):
    with self.path.open("rt"):  # applies the cache function and open the cached file
       pass
   with self.path.open("wt"): # also applies the cache function but also copies the file back to the original filesystem once it is closed
    pass
  some_library.process_read_only(self.path.get_cached_path())  # if some other library processes the file in read-only fashion use the cached path
  some_library.write_to(self.path.get_path())  # otherwise we path the "normal" path

@JackTemaki
Copy link
Contributor Author

JackTemaki commented May 31, 2021

We already have the definable file_caching function as wrapper, so adding some implementation here should be no problem, and does not require to change anything in Sisyphus. So even adding a subprocess call there to call cf might already do the correct thing (which can of course be replaced by a nicer python library that does this)

But if we want to have something like the "copy back", then we need of course some nicer helpers.

@curufinwe
Copy link
Contributor

The function only does some basic manipulation of the string, but ideally we switch this to something better integrated. Actually there are still some places where cf is hardcoded and some places where the cache_manager is called explicitly in python.

@albertz
Copy link
Member

albertz commented Apr 14, 2022

In a lot of jobs, we accept str|Path and then we use tk.uncached_path to handle both cases. See basically all usage of self.returnn_python_exe.

How would we replace this code? Basically just replicate uncached_path?

Note that we must (or should) allow str for multiple reasons:

  • Do not break old setups.
  • gs.RETURNN_PYTHON_EXE can not be a tk.Path because you cannot have from sysiphus import tk inside the settings file due to circular imports (is this expected or another bug?).

@michelwi
Copy link
Contributor

In the spirit of sisyphus, all inputs to a job should be Path objects (or parameters of a basic type) so a path to a tool or script (e.g. returnn) should in fact be a Path object and str should be disallowed except for string-type parameters.
This would also check if the tool exists and only then run the jobs.

My personal ideal world solution for returnn (and RASR, SCTK, etc) would be to use a CloneGitRepositoryJob and set the output Path object as the input of the Job that uses returnn.
A CreatePythonVEnvJob can even create the necessary returnn_python_exe as a path object and install a curated set of required libraries.
This way we would also avoid having people modify their local version of returnn and never push back the changes they made.

Do not break old setups.

Yes, this will be the reason the update will be difficult/impossible

gs.RETURNN_PYTHON_EXE can not be a tk.Path

with my solution we wouldn't need to define it.
Actually RETURNN_PYTHON_EXE is not a setting that is defined in sisyphus.global_settings so we are here in fact abusing the settings mechanics of sisyphus and should not have set this to begin with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants