Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for loading NLD/GSF from disk #139

Draft
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

vetlewi
Copy link
Collaborator

@vetlewi vetlewi commented Jul 22, 2020

I've added a static method to the Extractor class that will allow the user to load an existing ensemble of NLD and gSF from disk. This will eventually close issue #138.

At the moment there are some issues that will need to be resolved before ready for merging:

  • Determine that the state of the object is valid
  • Write unit tests for the load method

@vetlewi vetlewi added the feature_request Requestion this feature label Jul 22, 2020
@vetlewi vetlewi linked an issue Jul 22, 2020 that may be closed by this pull request
@fzeiser
Copy link
Collaborator

fzeiser commented Jul 27, 2020

There seems to be some confusion: There was a load method, but I removed it in 32572d5. The results are anyhow loaded in extract_from, if the regenerate flag is set. However, I guess following line should be updated then:

ompy/ompy/extractor.py

Lines 34 to 35 in 0fba72a

self.nld and self.gsf, as well as saved to disk. The saved results are
used if filenames match, or can be loaded manually with `load()`.

The cleanest alternative would be to refactor the code to have a clear load and save function that is used and not two alternative implementations for the same goal. However, I guess I was just too lazy to do something about it last time [at least it's not a broken functionality].

@fzeiser
Copy link
Collaborator

fzeiser commented Aug 13, 2020

  • I agree that the pull request makes sense. However, in the current scheme there is still a doubling of the load and save methods:

    ompy/ompy/extractor.py

    Lines 131 to 143 in dd10550

    for i in tqdm(range(self.ensemble.size)):
    nld_path = self.path / f'nld_{i}.npy'
    gsf_path = self.path / f'gsf_{i}.npy'
    if nld_path.exists() and gsf_path.exists() and not regenerate:
    LOG.debug(f"loading from {nld_path} and {gsf_path}")
    nlds.append(Vector(path=nld_path))
    gsfs.append(Vector(path=gsf_path))
    else:
    nld, gsf = self.step(i)
    nld.save(nld_path)
    gsf.save(gsf_path)
    nlds.append(nld)
    gsfs.append(gsf)
  • If you want to continue on this issue, I'll also put in a flag that this should be metnioned in the release notes (potentially incompatible loading old data).
  • if one removes the trapezoid (...) from the call syntax one probably has to adopt the "getting stated" notebook

@fzeiser fzeiser marked this pull request as ready for review August 13, 2020 13:04
@fzeiser fzeiser marked this pull request as draft August 13, 2020 13:05
I've added a dedicated save method and refactored the `extract_from()` method
@vetlewi
Copy link
Collaborator Author

vetlewi commented Aug 14, 2020

  • I agree that the pull request makes sense. However, in the current scheme there is still a doubling of the load and save methods:

    ompy/ompy/extractor.py

    Lines 131 to 143 in dd10550

    for i in tqdm(range(self.ensemble.size)):
    nld_path = self.path / f'nld_{i}.npy'
    gsf_path = self.path / f'gsf_{i}.npy'
    if nld_path.exists() and gsf_path.exists() and not regenerate:
    LOG.debug(f"loading from {nld_path} and {gsf_path}")
    nlds.append(Vector(path=nld_path))
    gsfs.append(Vector(path=gsf_path))
    else:
    nld, gsf = self.step(i)
    nld.save(nld_path)
    gsf.save(gsf_path)
    nlds.append(nld)
    gsfs.append(gsf)

Refactored. There should also be written a unit test for the methods. Will do it before pull.

@vetlewi
Copy link
Collaborator Author

vetlewi commented Aug 14, 2020

I think that before merging this PR there should be a super class defining the save/load syntax.

Probably something like

class LoadSaver():
    def __init__(self, path: Union[str, Path]):
        if path is not None:
            self.path = Path(path)
            try:
                self.load(self.path)
            except ValueError:
                self.path.mkdir(exist_ok=True, parents=True)


    def load(self, path: Optional[Union[str, Path]] = None):
        raise NotImplementedError


    def save(self, path: Optional[Union[str, Path]] = None):
        raise NotImplementedError

and require the subclasses to implement load() and save().

@fzeiser fzeiser self-requested a review August 17, 2020 12:17
@fzeiser
Copy link
Collaborator

fzeiser commented Aug 17, 2020

Thanks for implementing the abstract loadsaver. I guess we should then make Extractor (and the normalizers) subclasses of abstract_load_saver

@vetlewi
Copy link
Collaborator Author

vetlewi commented Aug 17, 2020

Thanks for implementing the abstract loadsaver. I guess we should then make Extractor (and the normalizers) subclasses of abstract_load_saver

I've now made Extractor a subclass of the abstract_load_saver:

The ensemble class will require some refactoring, while the normalizers already has something similar built in the abstract_normalizer, which could be a subclass of the abstract_load_saver.

I'm not sure I want to do that much refactoring... Need to do prioritize the actual analysis I'm working on :/

@vetlewi
Copy link
Collaborator Author

vetlewi commented Nov 5, 2020

I don't think I will be working much more on this PR anytime soon. Maybe just merge and then create an issue noting that other classes needs refactoring to use AbstractLoadSaver?
Or maybe not... I can always PR to a branch in the fork on my own GH when I need it... :p

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Load method for the Extractor class is missing
2 participants