Skip to content

Conversation

@DimitriPapadopoulos
Copy link
Contributor

@DimitriPapadopoulos DimitriPapadopoulos marked this pull request as draft December 26, 2021 22:38
@DimitriPapadopoulos DimitriPapadopoulos force-pushed the mktemp branch 2 times, most recently from da5235c to 3060389 Compare December 26, 2021 22:50
@martindurant
Copy link
Member

Please ping me when this is ready

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Dec 29, 2021

The error in tests is currently:


=================================== FAILURES ===================================
__________________________________ test_write __________________________________

    def test_write():
        tmp = str(tempfile.mkdtemp())
        fn = tmp + "afile"
        url = "simplecache::file://" + fn
        with fsspec.open(url, "wb") as f:
            f.write(b"hello")
>           assert fn not in f.name
E           TypeError: argument of type 'int' is not iterable

fsspec/implementations/tests/test_cached.py:127: TypeError

That's because open() is now called on the integer file descriptor returned by mkstemp() instead of the pathname returned by mktemp(). Hence f.name is this integer file descriptor, not a pathname.

@martindurant Is it important that f.name is a pathname? My understanding is that it is not important, as could be implied by the not in assert fn not in f.name.

@DimitriPapadopoulos DimitriPapadopoulos marked this pull request as ready for review December 29, 2021 10:56
@martindurant
Copy link
Member

  • fsspec.open I think we would like to indeed produce an object whose .name is the filename
  • for the specific test, the point is that the open file should not have the name of the final target, something that cannot be known from the numerical file descriptor alone

@DimitriPapadopoulos
Copy link
Contributor Author

The problem is that the proper way to create and use a temporary file is through the file descriptor returned by mkstemp(). I have to call open() with that file descriptor, not with a path-like object anymore. The resulting file object will not contain a path name.

As for the specific test, the point is indeed that the temporary file should not contain the name of the final target. Why? My guess is that it's for security reasons. When using mkstemp(), that is not a concern anymore.

@DimitriPapadopoulos
Copy link
Contributor Author

Perhaps I can use tempfile.NamedTemporaryFile instead. But again I'm wondering why end users would be interested in the name of a temporary file.

@martindurant
Copy link
Member

As for the specific test, the point is indeed that the temporary file should not contain the name of the final target. Why?

There are two uses for a local temporary file

  • when writing to local with autocommit=False, that you should have somewhere to write, but then decide on whether to "commit" the file to its final location or "discard" without affecting the target (don't clobber any file there). Often done in a transaction context
  • the case here, when you are explicitly writing to a local cache, to be flushed to the final location when done. This is normally used where the final destination is remote, so you could argue that a URL like "simplecache::memory://test-file" would be more realistic.

@martindurant
Copy link
Member

I'm wondering why end users

The user isn't interested, but we are testing to make sure that the right thing is happening

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Dec 29, 2021

If the end-user is not interested, I'm still wondering why we should expect the name of the temporary file in the name attribute of the returned file-like object. As for the test, it is obsolete if we use mkstemp() instead of mktemp(), because mkstemp() already does the right thing for us.

I have tried to use the higher-level tempfile.NamedTemporaryFile instead. I believe it should pass all tests, but note that self.fh will now be a bloated tempfile.NamedTemporaryFile instead of a simple file object. I am not sure if it is worth it.

Note that OpenFile does not have a documented name attribute.

@martindurant
Copy link
Member

Actually, OpenFile and AbstractBufferedFile have .path attributes, so maybe we can do both? It's not so important either way, you can decide on which version feels nicest.

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Jan 2, 2022

Will try that. The nicest version is the simplest:

  • acknowledge that the name attribute of a file object created from mkstemp() will not necessarily be a str anymore, by construction,
  • hence LocalTempFile.__getattr__("name") will not return a str anymore,
        def __getattr__(self, item):
            return getattr(self.fh, item)
  • fix by adding a read-only name property to LocalTempFile, and initialize it from the second return value of mkstemp() – the absolute pathname of the temporary file.

@DimitriPapadopoulos DimitriPapadopoulos force-pushed the mktemp branch 2 times, most recently from f676a4b to ace7363 Compare January 2, 2022 20:34
@martindurant
Copy link
Member

OK, I think this works fine. Sorry for the long conversation around this!

@martindurant martindurant merged commit 704ed91 into fsspec:master Jan 4, 2022
@DimitriPapadopoulos DimitriPapadopoulos deleted the mktemp branch January 4, 2022 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants