-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support using same file path for input & output #27
Comments
I see your point, but my personal preference is to avoid having
and if you no longer care about |
The other option is to put all your input files and output files in separate directories to allow them to have the same basename. This is what I usually do, fwiw. |
Ehm, well, I'm of course aware that I can do this, but that's not my use case ;) My use case requires overwriting the same file. Saving the output file into a separate directory is exactly what the snippet I included in my first post does (and then copies it over and overwrites the input file). But I feel like that's besides the point - regardless of whether one prefers to keep their input/output files separate or not, right now the user can provide the same filepath for input/output and pysox will spit out an invalid audio file without any warning, which is undocumented behavior and, I think, undesirable. |
this is an easy enough thing to do with tempfiles: https://github.com/ejhumphrey/claudio/blob/master/claudio/sox.py#L131 just make sure if you use mkstmp, you explicitly close the file ID. this is one of those cases where python3 + context managers is way shinier than py2: https://docs.python.org/3/library/tempfile.html#examples |
forgot to mention ... would be an easy enough PR 😉 |
@justinsalamon totally agree that pysox should throw an error if |
fwiw, I am in favor of a call signature like |
@ejhumphrey I know I can use tempfiles, it's in my first comment on this issue ;)
@rabitt yes, the main issue I was bringing up is that the creation of an empty file is bad. However, my preferred solution would be for pysox to use tempfiles (as per my example) to basically yes support using the same path for input/output. An alternative solution is what you propose (raise an error), which would be better than the current behavior, though I don't see why you wouldn't want to support overwriting the input file - there are plenty of tools out there (e.g. shutil, librosa) that happily overwrite files and it's a fairly common use case. |
@justinsalamon you know i don't know how to read |
agreed
In a library like this the main paradigm is creating outputs by modifying inputs. Other types of libraries might do something like "run process, create a new text file" and allow you to overwrite said text file. In this case, you can still "reproduce" the results because the original information is preserved. However in this kind of library, if you allow the input file to be overwritten, if you run the same process twice you get different results. That's bad for reproducibility, which is why I'm in favor of the "put input and output files in different directories" approach. Because of ^^rant, I prefer to throw an error, rather than use the tempfile approach. |
Well, you could argue that reproducibility means being able to repeat an experiment as performed and obtain the same results. If you start with the same dataset, then as long as the codebase is the same, you will obtain the same results, even if part of the process means overwriting some of the data. So I don't necessarily see this an an issue for reproducibility. The bigger point though, I think, is that you're enforcing a certain behavior on the user when I think that decision should be left to the user. Of course you don't want to support the user doing something that's blatantly wrong, but I'm not sure overwriting files falls into that bucket. As a concrete example, for scaper I'm generating new audio files, but because pysox always involves writing to disk, I end up performing multiple steps of file io all to produce a single final result. There's no real reason for me to save each intermediate step to a separate file nor keep it on the file system once the process is over. So basically if pysox doesn't support using the same file for input/output, I'll just have to wrap it in code that implements this logic. Anyway, it's your call and I won't insist futher, just wanted to put my 2c out there :) |
Sox doesn't support using the same file as both input and output - doing this will result in an empty, invalid audio file. While this is sox behavior and not pysox, it would be nice if pysox took care of this behind the scenes. Right now the user needs to worry about this logic themselves, e.g. like this:
Pysox does issue a warning when a file is about to be overwritten, which is even more confusing under this scenario since the user (who might be unfamiliar with the quirks of sox) has no reason to think that the overwritten file will be invalid.
The text was updated successfully, but these errors were encountered: