Currently, the Datafile resources only record locations on the current filesystem, from which (or to which) files can be read (or written) in the user's analysis code.
Providing a set of methods, or a class inheritance, so that the Datafile can be used as a context manager for opening the file itself would make it much easier to create results files.
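As a minimal sketch of the class-based idea, the block below uses a hypothetical `ContextDatafile` class (not part of the package) that records a `path` and `local_path_prefix` like a Datafile and opens its own file in `__enter__`. The `mode` argument and the automatic creation of missing directories when writing are assumptions made for illustration.

```python
import os
import tempfile


class ContextDatafile:
    """Hypothetical stand-in for Datafile that opens its own file when used in a with-statement.

    Only `path`, `local_path_prefix` and `full_path` echo the real Datafile; `mode` is an assumption.
    """

    def __init__(self, path, local_path_prefix="", mode="r"):
        self.path = path
        self.local_path_prefix = local_path_prefix
        self.mode = mode
        self._fp = None

    @property
    def full_path(self):
        return os.path.join(self.local_path_prefix, self.path)

    def __enter__(self):
        # When creating a results file, make sure its directory exists first
        directory = os.path.dirname(self.full_path)
        if directory and any(flag in self.mode for flag in "wax"):
            os.makedirs(directory, exist_ok=True)
        self._fp = open(self.full_path, self.mode)
        return self._fp

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the file, even if the body of the with-block raised
        self._fp.close()
        self._fp = None
        return False  # Don't suppress exceptions


# Usage: write a results file, then read it back via a second record pointing at the same location
with tempfile.TemporaryDirectory() as tmp:
    results = ContextDatafile("analysis/results.csv", local_path_prefix=tmp, mode="w")
    with results as fp:
        fp.write("a,b\n1,2\n")
    with ContextDatafile(results.path, local_path_prefix=tmp) as fp:
        contents = fp.read()
    written_path = results.full_path
```

Returning the raw file object from `__enter__` keeps the body of the `with` block identical to ordinary `open()` usage, while the record itself already knows where the file lives.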
Here is a function written toward achieving that (the feature is presently shelved, as it's tricky to get right):
```python
import io
import re

# InvalidFilePointerException and InvalidInputException are the package's own exception classes


def get_local_path_prefix_from_fp(fp, path=None):
    """Handle extraction of path and local_path_prefix from a file-like object, with checking around a bunch of
    edge cases.

    Useful when you have a file-like object you've created during an analysis, to find the local_path_prefix you
    need to create a datafile:

        my_file = 'a/file/to/put/analysis/results.in'
        with open(my_file, 'w') as fp:
            # ...
            # Write stuff to file
            # ...

        # Create datafile
        path, local_path_prefix = get_local_path_prefix_from_fp(fp, path=my_file)
        Datafile(path=path, local_path_prefix=local_path_prefix)
    """
    # TODO Revamp to use path-likes properly instead of managing strings

    # Allow file-likes, or classes (like the tempfile classes) that wrap file-likes with a .file attribute
    instance_check = isinstance(fp.file, io.IOBase) if hasattr(fp, "file") else isinstance(fp, io.IOBase)
    if (not instance_check) or (not hasattr(fp, "name")):
        raise InvalidFilePointerException("'fp' must be a file-like object with a 'name' attribute")

    # Allow `path` to define what portion of the file path is considered a local prefix and what portion is
    # considered to be this file's path within a dataset
    fp_name = str(fp.name)  # Allows use of temporary files, whose name might be an integer (sigh!)

    # If path not given, use the filename only (splitting on either separator)
    if path is None:
        path = re.split(r"[\\/]", fp_name)[-1]

    # Strip any leading separators, as the path should always be relative
    path = path.lstrip("\\/")

    # Check that the path given actually matches the end of the real location on disc
    if not fp_name.endswith(path):
        raise InvalidInputException(f"'path' ({path}) must match the end of the file path on disc ({fp_name}).")

    # Check that the path given is a whole portion of the file path
    # TODO this could be tidier. Split both paths and iterate back from the filename up the directory tree,
    # checking at each step that things match
    local_path_prefix = fp_name[: len(fp_name) - len(path)]  # Strip `path` from the end of the file name
    if len(local_path_prefix) > 0 and not local_path_prefix.endswith(("\\", "/")):
        raise InvalidInputException(f"The 'path' provided ({path}) is not a valid portion of the file path ({fp_name})")

    return path, local_path_prefix
```
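The two TODOs in `get_local_path_prefix_from_fp` suggest using path-likes and comparing components from the filename back up the directory tree. A minimal sketch of that approach, using `pathlib.PurePath`, is below; `split_local_path_prefix` is a hypothetical name, and `ValueError` stands in for the package's `InvalidInputException`.

```python
from pathlib import PurePath


def split_local_path_prefix(file_name, path=None):
    """Hypothetical path-like version of get_local_path_prefix_from_fp's splitting logic.

    Returns (path, local_path_prefix). Components are compared from the filename back up the
    directory tree, so `path` can only match whole file or directory names, never part of one.
    """
    full = PurePath(str(file_name))  # str() tolerates integer names from TemporaryFile
    # If no path is given, use the filename only; otherwise treat it as relative
    rel = PurePath(full.name) if path is None else PurePath(str(path).lstrip("\\/"))

    n = len(rel.parts)
    if n == 0 or full.parts[-n:] != rel.parts:
        # ValueError stands in for the package's InvalidInputException
        raise ValueError(f"'path' ({rel}) must match whole trailing components of {full}")

    # Everything above the matched components is the local prefix ("" if nothing is left)
    prefix_parts = full.parts[:-n]
    local_path_prefix = str(PurePath(*prefix_parts)) if prefix_parts else ""
    return str(rel), local_path_prefix
```

Unlike the string-based function, the prefix returned here has no trailing separator; `os.path.join` or `PurePath` rejoins the two parts either way. `PurePath` uses the current platform's conventions, so backslashes only count as separators on Windows.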
Here are some test cases for it:
```python
import os
from tempfile import NamedTemporaryFile, TemporaryFile

# Methods of a unittest.TestCase subclass; Datafile and exceptions come from the package under test


def test_with_temporary_file(self):
    """Ensure that a datafile can be created using an un-named temporary file."""
    with TemporaryFile() as fp:
        df = Datafile(fp=fp)
        self.assertEqual('', df.extension)


def test_with_named_temporary_file(self):
    """Ensure that if a user creates a NamedTemporaryFile and shoves data into it, they can create a Datafile
    from it which picks up the name successfully."""
    with NamedTemporaryFile(suffix='.csv') as fp:
        df = Datafile(fp=fp)
        self.assertEqual('csv', df.extension)


def test_with_fp_and_conflicting_name(self):
    """Ensure that a conflicting name won't work when instantiating from a file pointer."""
    with NamedTemporaryFile(suffix='.csv') as fp:
        with self.assertRaises(exceptions.InvalidInputException):
            Datafile(fp=fp, name='some_other_name.and_extension')


def test_with_fp_and_correct_name(self):
    """Ensure that a matching name correctly splits the file name and local path."""
    with NamedTemporaryFile(suffix='.csv') as fp:
        temp_name = os.path.basename(fp.name)
        df = Datafile(fp=fp, path=temp_name)
        self.assertEqual(temp_name, df.path)
        self.assertEqual(fp.name, df.full_path)
```
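Using the Datafile directly as a context manager or, less desirably (it's not a standard pattern, but far easier to implement), an alternative form would be more elegant than opening the file separately and then constructing the Datafile from it.

Even better, being able to use `NamedTemporaryFile` objects and similar could be useful to avoid hassle with garbage collection.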
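A minimal sketch of the temporary-file idea, assuming a hypothetical helper `temporary_results_file` (not part of the package): it wraps `tempfile.NamedTemporaryFile`, yields the open file along with a `(path, local_path_prefix)` split of its name, and relies on the temporary file deleting itself when the `with` block exits.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_results_file(suffix=".csv"):
    """Hypothetical helper: yield a named temporary file plus its (path, local_path_prefix) split.

    The file is deleted automatically when the with-block exits, so no manual clean-up is needed.
    """
    with tempfile.NamedTemporaryFile(mode="w", suffix=suffix) as fp:
        local_path_prefix, path = os.path.split(fp.name)
        yield fp, path, local_path_prefix


# Usage: the file exists (and could back a Datafile) only while the block runs
with temporary_results_file() as (fp, path, local_path_prefix):
    fp.write("a,b\n1,2\n")
    fp.flush()
    existed = os.path.exists(os.path.join(local_path_prefix, path))
    full_name = fp.name

deleted = not os.path.exists(full_name)
```

Here `os.path.split` gives a prefix without a trailing separator, so rejoin the parts with `os.path.join` rather than string concatenation.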