Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileType, Metadata: File-like object support #1

Closed
lazka opened this issue Jul 4, 2014 · 36 comments
Closed

FileType, Metadata: File-like object support #1

lazka opened this issue Jul 4, 2014 · 36 comments
Labels

Comments

@lazka
Copy link
Member

@lazka lazka commented Jul 4, 2014

Originally reported by: Christoph Reiter (Bitbucket: lazka, GitHub: lazka)


From joe.wreschnig@gmail.com on June 15, 2009 07:49:19

FileType and Metadata subclasses should support loading from file-like
objects as well as filenames. We can probably restrict this to FLOs that
support random access.

Original issue: http://code.google.com/p/mutagen/issues/detail?id=1


@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on June 15, 2009 21:40:22

Labels: -Priority-Medium Priority-Low

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on June 16, 2009 11:14:31

Owner: ---

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on June 16, 2009 11:14:17

Status: New

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on July 19, 2009 19:48:40

Summary: FileType, Metadata: File-like object support
Status: Accepted

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 17, 2011 16:52:26

we could also rename the first argument to be something more generic, like filething, add a filename kwarg in case someone was being extra explicit, and simply copy filename onto filething if filething was None. This requires a few more code changes and is generally ugly, but maintains api compatibility while adding file like object support. 

also if passed a file object I don't think mutagen should take it upon it self to close said object, but I could be wrong, thoughts?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 17, 2011 14:58:02

I am working on this as part of adding a audio field type to django.  The two options I see is to overload the filename variable and allow the passing of a file like object, maintaining api compatibility in the process, OR  adding a fileobj argument to a slew of functions. 

Both require factoring out some of the file handling functions. Do you have a prefrence?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 11:55:19

Sounds good. I already have some code, I will finish putting together the id3 and easy id3 file along the lines of what we discussed and email you the patch, that way we can hash anything unexpected out before we bother with the rest. 

What are your thoughts on subclassing the test case objects and overriding setUp to run each the tests against each method of file handling?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 17, 2011 23:48:57

I think the approach of using kwargs with the first one being some autodetect magic is probably best.

def __init__(self, filething=None, filename=None, fileobj=None): ...

Where exactly one must be set; filename is always assumed to be a string, fileobj is always assumed to be an FLO, and if neither of those is set filething is type-detected by looking for a seek method. With this signature we can also tell people to start calling constructors with the form OggVorbis(filename=...) to future-proof their code.

Mutagen probably shouldn't close the object, but I think it should at least flush it (if it supports flushing).
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 12:07:06

Also It seems like we are going to have the same code in quite a few functions, I am thinking of breaking filething/filename/fileobj disambiguation out to a decorator. The downside I see to that is that is makes the api less visible when looking at a function, but i rather have comments repeated than code. At the worst folks use filething = something and it works for them.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 18, 2011 12:25:22

I don't think we want a decorator.

def getfileobj(filething, filename, fileobj, mode="rb"):
    if != one set: raise ArgumentError
    elif fileobj: return fileobj # could even early-fail on a bad mode
    elif filename: return open(filename, mode)
    else: ... # filething type detection

def load(self, filething=None, filename=None, fileobj=None):
    fileobj = _util.getfileobj(filething, filename, fileobj, "rb")

It's still a single line of code and it's more explicit.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 18, 2011 12:18:21

I'm not sure that's necessary or sensible. There's only going to be a handful of places where we do a filename/FLO conversion. Simple coverage checks can make sure we handle this right.

The hard part is making sure we're prepared for non-disk FLOs once they're in the pipeline. If we're going to support this it should be supported as much as possible within reason, otherwise there's no point.

So my focus would be on what we need to accurately mock strange FLOs like FIFOs and sockets. We probably want:
 * Fully seekable/readable/writable within POSIX restrictions. This is basically disk files, so they should be supported now.
 * Linear read-only, like sockets. We should definitely be able to read tags from these. This might require some changes to the parsers for some of the more structured formats. I know we seek like crazy around MP3s, but it's all for data I'm comfortable not filling in for such streams anyway.
 * For the underlying tag formats, it's probably worthwhile to support linear write-only FLOs. So you could send an APEv2 or ID3v2 tag alone over a stream.

Please attach patches here rather than emailing them to me.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 13:51:57

Well, with all that in mind, one thought that comes to mind is to implement a file object proxy class that takes the filething, filename and fileobj as arguments to its constructor. If the plan is to support flo's that don't support seek natively, I rather deal with the file code as a unit rather than in all the places it is touched in each format.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 13:45:07

Clearly if we intend on supporting flo's without seek then then a lot more would need to be done. Perhaps a file object abstraction class that slurps the whole file into memory if its doesnt support seek? 

here is basically what I have now. 

The idea being to wrap the file.open file.close etc functions to make it do the right thing under the circumstances.

Attachment: patch

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 18, 2011 13:54:28

What's the advantage of supporting FLOs if we only support seekable ones? How many FLOs come from seekable sources that aren't files we could request by filename?

What's the advantage of making a new proxy type? What would it do, throw an (AttributeError, IOError) when trying to seek? But that's exactly what would happen now, so making a new type doesn't gain anything.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 14:01:53

its thrown together and would get pep 8 ified later.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 18, 2011 13:58:13

I think your patch doesn't understand the point. We shouldn't be passing around a filething-or-filename-or-fileobj to anything but __init__, load, and save in public classes. And the first thing those functions should do is turn it into a fileobj.

Also, I don't know if that patch is indicative of the rest of what you worked on stylistically, but there's no way I'd consider merging a patch for something like this unless it follows the majority of PEP-8.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 19, 2011 05:10:38

Storing real audio files in StringIOs is a horrible idea. It'll eat memory very quickly - MP3 and Vorbis files could easily be tens to hundreds of megabytes, and FLACs over a gigabyte. Emulating seek faces the same problem. We shouldn't create situations where we end up allocating hundreds of megabytes of memory just to parse out some tags.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on May 18, 2011 18:03:01

a example of a flo that supports seek is anything based on stringio, like the file objects returned by the django storage module for amazon s3.

A reason for a file like object proxy class would be to emulate seek on flo's that don't support it, so we wouldn't have to rewrite large chunks of the format specific code just to be able to read tags from a flo that doesn't support seek.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reacocard on May 19, 2011 15:33:15

Another example of a seekable FLO that isn't local are files stored remotely and accessed over SMB/SFTP via GVFS/GIO.  While GIO doesn't provide a native FLO, creating a wrapper to make it act like one should be trivial.  This in particular is a case we want to be able to support in Exaile, and is in fact why I opened this request way back on a prior bugtracker.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reacocard on May 22, 2011 10:40:29

FUSE is just a compat layer for applications that are not GIO-aware. The primary method of access is via the native gio.File and associated APIs. http://developer.gnome.org/gio/stable/ch01.html
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 22, 2011 07:39:40

I know that such FLOs used to common, but I was under the impression GVFS/GIO used FUSE almost exclusively these days, and the application-level support was a legacy of GNOME-VFS, which is dead/dying (thankfully).

If that's _not_ the case, that's a considerably more useful thing to support than plans to slurp hundreds of megabytes into StringIOs.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on May 22, 2011 11:57:50

But it's available, right? Is there any actual downside to using it or is it just GNOME developers being stupidly parochial again?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From rb.p...@gmail.com on August 02, 2011 12:07:38

Hi 
I am also interested on using a filelike object. Because for moin-2.0 I have that already while uploading an item. 

I don't nderstand currently why there are 3 params in that comment 12 defined
def getfileobj(filething, filename, fileobj, mode="rb"):

Why isn't isinstance used and only one of them defined?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reacocard on May 23, 2011 08:16:11

It adds an extra level of indirection (read: it's slower) and if FUSE isn't available for whatever reason, then the FUSE compat layer obviously doesn't work at all.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From rb.p...@gmail.com on August 02, 2011 12:16:11

thanks for telling, I'll look at it
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joshua.b...@gmail.com on August 02, 2011 12:13:15

I gave up on trying to make this happen in the main repo, as it sounds like the maintainer doesn't actually want to support file like objects. I wrote support for a few filetypes and tossed it up in github. If you have a better plan for how to do it, I gladly accept patches. If the maintainer wants to merge it I am happy to work with him, but I have better things to do than argue if anyone needs the feature. https://github.com/Jbonnett/Mutagen-flo
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on August 11, 2011 01:58:45

@Joshua: I'm not against it, I just don't want to do a halfassed job of it. I never got anything resembling an acceptable patch back from you.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From andrefcruz on September 24, 2012 04:33:59

I'm also in favor of filelike object support. In my case I have StringIOs of files (they are all <20MB) and I am trying to avoid disk IO if possible. It would be a shame if I had to dump the files to disk only to obtain their artwork. I have the filename and a StringIO of the file content, so support for this would be awesome.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From jay...@interlaced.org on June 22, 2012 05:48:37

Has there been any forward momentum on this in the past 10 or so months?  I'm finally biting the bullet and tacking on some code to hopefully read/write AIFF-embedded ID3 tags, and I'm kind of surprised to see that mutagen doesn't handle file-like objects.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From jschr...@gmail.com on September 03, 2013 20:52:00

For a web project I'd like to extract metadata from audio files stored in MongoDB (or gridfs to be more precise). And while they do have seek(), the objects operate on certainly aren't files and the open semantics work quite different.

So, is there any progress on this? Because writing the files to a temporary named file and extracting metadata and then copying the data to MongoDB is quite a waste.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Oct 23, 2014

Original comment by André Cruz (Bitbucket: EDevil, GitHub: EDevil):


Any progress on this?

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Oct 29, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


No, sorry.

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jun 25, 2015

Original comment by jbonnett_ (Bitbucket: jbonnett_, GitHub: Unknown):


Has the maintainer changed? I don't want to have to argue against "joe.wreschnig@gmail.com" about the validity of this feature again, but I would be happy to work on a patch if given the go ahead from someone who can merge it.

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 16, 2015

Original comment by Geoffrey Lalonde (Bitbucket: lalonde, GitHub: lalonde):


It would be pretty sweet to see this. Some dude made a concerted effort over here: https://bitbucket.org/rhaps0dy/mutagen/commits/all

@lazka lazka removed the trivial label Apr 27, 2016
lazka added a commit that referenced this issue Jun 16, 2016
There are now three ways to pass a file. First pass a file path or
file object as the first arg and mutagen will try to do the right thing.
The other two are passing a file name using the "filename" kwarg
or a file object using the "fileobj" kwarg.

Required methods are read/seek/tell. Only StringIO/BytesIO is tested.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jun 21, 2016

The required interface is now specified in the docs: https://mutagen.readthedocs.io/en/latest/user/filelike.html

The next step I have in mind is trying to write a wrapper for Gio.File streams. For the writing part we will probably need write(), fileno() and truncate()

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jun 25, 2016

All fixed in master now.

@lazka lazka closed this Jun 25, 2016
lazka pushed a commit that referenced this issue Nov 8, 2018
Feature/riff wave merge phw
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.