Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local file_like compatibility #3

Closed
thclark opened this issue Sep 18, 2020 · 0 comments · Fixed by #25
Closed

Local file_like compatibility #3

thclark opened this issue Sep 18, 2020 · 0 comments · Fixed by #25
Assignees
Labels
decision needed A decision is required (e.g. on UX or company policy) feature A new feature of the app

Comments

@thclark
Copy link
Contributor

thclark commented Sep 18, 2020

Currently, the Datafile resources only record locations on the current filesystem, from which (or to which) files can be read (or written) in the user's analysis code.

Providing a set of methods, or a class inheritance so that the Datafile can be used as a context manager for opening the file itself, could provide a powerful way of easing the creation of results file.

Something like:

df = Datafile(path='my_file.bmp')
with open(df, 'w') as fp:
  fp.write('data')

or (less desirable as it's not a standard pattern but far easier to implement)

df = Datafile(path='my_file.bmp')
with df.open('w') as fp:
  fp.write('data')

Would be more elegant than

df = Datafile(path='my_file.bmp')  # or getting it from the manifest
with open(df.full_name, 'w') as fp:
    fp.write(data)

Even better, being able to use NamedTemporary files and similar could be useful to avoid hassle in garbage collection:

with NamedTemporaryFile(suffix='.csv') as fp:
       df = Datafile(fp=fp)
       self.assertEqual('csv', df.extension)

Here as a function written toward achieving that (feature presently shelved as it's tricky to get right):

def get_local_path_prefix_from_fp(fp, path=None):
    """ Handles extraction of path and local_path_prefix from a file-like object, with checking around a bunch of edge
    cases.

    Useful when you have a file-like object you've created during an analysis, to find the local_path_prefix
    you need to create a datafile:
    my_file = 'a/file/to/put/analysis/results.in'
    with open(my_file) as fp:
        # ...
        # Write stuff to file
        # ...
        # Create datafile
        path, local_path_prefix = get_local_path_prefix_from_fp(fp, path=my_file)
        Datafile(path=path, local_path_prefix=local_path_prefix)
    """

    # TODO Revamp to use path-likes properly instead of managing strings

    # Allow file-likes or class (like the tempfile classes) that wrap file-likes with a .file attribute
    instance_check = isinstance(fp.file, io.IOBase) if hasattr(fp, "file") else isinstance(fp, io.IOBase)
    if (not instance_check) or (not hasattr(fp, "name")):
        raise InvalidFilePointerException("'fp' must be a file-like object with a 'name' attribute")

    # Allow `path` to define what portion of the file path is considered a local prefix and what portion is
    # considered to be this file's path within a dataset
    fp_name = str(fp.name)  # Allows use of temporary files, whose name might be interpreted as an integer (sigh!).

    # If path not given, use the filename only
    if path is not None:
        path = fp_name.split("/\\")[-1]

    # Remove any directory prefix on the path, which should always be relative
    path = path.lstrip("\\/")

    # Check that the path given actually properly matches the end of the real location on disc
    if not fp_name.endswith(path):
        raise InvalidInputException(f"'path' ({path}) must match the end of the file path on disc ({fp_name}).")

    # Check that the path given is a whole portion
    # TODO this could be tidier. Split both paths and iterate back from the filename up the directory tree,
    #  checking at each step that things match
    local_path_prefix = utils.strip_from_end(fp_name, path.strip("\\/"))
    if len(local_path_prefix) > 0 and not local_path_prefix.endswith(("\\", "/")):
        raise InvalidInputException(f"The 'path' provided ({path}) is not a valid portion of the file path ({fp_name})")

    return path, local_path_prefix

Here are some test cases for it:

    def test_with_temporary_file(self):
        """ Ensures that a datafile can be created using an un-named temporary file.
        """
        with TemporaryFile() as fp:
            df = Datafile(fp=fp)
            self.assertEqual('', df.extension)
    
    def test_with_named_temporary_file(self):
        """ Ensures that if a user creates a namedTemporaryFile and shoves data into it, they can create a Datafile from
        it which picks up the name successfully
        """
        with NamedTemporaryFile(suffix='.csv') as fp:
            df = Datafile(fp=fp)
            self.assertEqual('csv', df.extension)
    
    def test_with_fp_and_conflicting_name(self):
        """ Ensures that a conflicting name won't work if instantiating a file pointer
        """
        with NamedTemporaryFile(suffix='/me.csv') as fp:
            # temp_name = fp.name.split('/\\')[-1].split('.')[0]
            with self.assertRaises(exceptions.InvalidInputException):
                Datafile(fp=fp, name=f'some_other_name.and_extension')
    
    def test_with_fp_and_correct_name(self):
        """ Ensures that a matching name will correctly split the file name and local path
        """
        with NamedTemporaryFile(suffix='.csv') as fp:
            temp_name = fp.name.split('/\\')[-1]
            print(temp_name)
            df = Datafile(fp=fp, path=temp_name)
            self.assertEqual(temp_name, df.full_path)
            self.assertEqual(fp.name, df.full_path)
@thclark thclark self-assigned this Sep 18, 2020
@thclark thclark added experience (UX) feature A new feature of the app labels Sep 18, 2020
@thclark thclark added this to To do in Twined Ecosystem Roadmap via automation Sep 19, 2020
@thclark thclark moved this from To do to Triage in Twined Ecosystem Roadmap Sep 19, 2020
@thclark thclark added the decision needed A decision is required (e.g. on UX or company policy) label Sep 19, 2020
@thclark thclark linked a pull request Nov 20, 2020 that will close this issue
13 tasks
@thclark thclark mentioned this issue Nov 20, 2020
13 tasks
Twined Ecosystem Roadmap automation moved this from Triage to Done Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
decision needed A decision is required (e.g. on UX or company policy) feature A new feature of the app
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant