Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if we run out of storage space while writing we should throw a useful error #246

Closed
mr-c opened this issue Dec 20, 2013 · 17 comments
Closed

Comments

@mr-c
Copy link
Contributor

mr-c commented Dec 20, 2013

No description provided.

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Should we maybe abstract File IO into a stand-alone module?

@mr-c
Copy link
Contributor Author

mr-c commented Jan 23, 2014

That would be useful, yep!
On Jan 23, 2014 2:19 PM, "Ram RS" notifications@github.com wrote:

Should we maybe abstract File IO into a stand-alone module?


Reply to this email directly or view it on GitHubhttps://github.com//issues/246#issuecomment-33158174
.

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Maybe create a checklist of functionalities to be abstracted?

  • Check file already exists before writing
  • Check file exists before input
  • Check disk space while writing; and
  • Handle other IO exceptions (the 3 above can have tailored actions, others
    can follow the generic error handling implemented right now)

Ram

On Thu, Jan 23, 2014 at 2:37 PM, Michael R. Crusoe <notifications@github.com

wrote:

That would be useful, yep!
On Jan 23, 2014 2:19 PM, "Ram RS" notifications@github.com wrote:

Should we maybe abstract File IO into a stand-alone module?


Reply to this email directly or view it on GitHub<
https://github.com/ged-lab/khmer/issues/246#issuecomment-33158174>
.


Reply to this email directly or view it on GitHubhttps://github.com//issues/246#issuecomment-33159985
.

@mr-c
Copy link
Contributor Author

mr-c commented Jan 23, 2014

There are a couple I/O related issues that should be worked on together.

#247 "don't infinite loop when reading truncated hash files"
#245 "if --savehash is specified then don't continue if there is not enough free disk space"
#147 "Verify input file existence on all Python scripts"
#135 "khmer tools should output intelligent error messages when fed empty files"
#117 " If loadhash is specified in e.g. normalize-by-median, don't complain about default hashsize parameters"
#87 "khmer should be graceful with respect to errors while processing multiple files"

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Thank you. This helps a lot.

I'll prepare signatures for generic read-write functions. These should
ideally perform checks before calling screed methods and the final return
from an API input method should be a screed-ish? reference.

Ram

On Thu, Jan 23, 2014 at 4:34 PM, Michael R. Crusoe <notifications@github.com

wrote:

There are a couple I/O related issues that should be worked on together.

#247 #247 "don't infinite loop
when reading truncated hash files"
#245 #245 "if --savehash is
specified then don't continue if there is not enough free disk space"
#147 #147 "Verify input file
existence on all Python scripts"
#135 #135 "khmer tools should
output intelligent error messages when fed empty files"
#117 #117 " If loadhash is
specified in e.g. normalize-by-median, don't complain about default
hashsize parameters"
#87 #87 "khmer should be
graceful with respect to errors while processing multiple files"


Reply to this email directly or view it on GitHubhttps://github.com//issues/246#issuecomment-33171075
.

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Can psutil (https://code.google.com/p/psutil/) be used to check disk space or do we have other standards that we already use?

@mr-c
Copy link
Contributor Author

mr-c commented Jan 23, 2014

We only support Linux and OS X so the os.statvfs function is good enough.
No need to add a dependency for cross platform compatibility here.

http://docs.python.org/2/library/os.html#os.statvfs
On Jan 23, 2014 6:15 PM, "Ram RS" notifications@github.com wrote:

Can psutil (https://code.google.com/p/psutil/) be used to check disk
space or do we have other standards that we already use?

Reply to this email directly or view it on GitHubhttps://github.com//issues/246#issuecomment-33180412
.

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Cool. Next up - how do we estimate python object size (to compare with free disk space) - can we use this piece of code? http://code.activestate.com/recipes/546530/

@RamRS
Copy link
Contributor

RamRS commented Jan 23, 2014

Also, #247 , #117 and #87 are not really errors with generic handling options - we need to handle these case-by-case, so I'll exclude logic for them, at least initially.

@mr-c
Copy link
Contributor Author

mr-c commented Jan 23, 2014

We don't, not directly. Since the whole point of khmer is to operate on a
streaming basis the entire output file isn't in all memory at the same time.

(Also that code is huge and complex. Using it would be a great burden to
the project as we would have to grok and support it. The benefit would not
outweigh the cost.)

The exception to this are writing out the hashtables to disk. Those we can
compute their size prior to their saving to disk or even their creation.

We could issue a warning based upon the input file size. It seems risky to
run a tool such as any of ours if one doesn't have at least as much free
storage space as the input files.
On Jan 23, 2014 6:42 PM, "Ram RS" notifications@github.com wrote:

Cool. Next up - how do we estimate python object size (to compare with
free disk space) - can we use this piece of code?
http://code.activestate.com/recipes/546530/


Reply to this email directly or view it on GitHubhttps://github.com//issues/246#issuecomment-33182329
.

@RamRS
Copy link
Contributor

RamRS commented Jan 24, 2014

That is a reasonable (not to mention amazing) way to approximate required space, @mr-c ! Will do.

As of now, the methods in the new API are:

FileStatus check_file(fullFilePath) # Checks file existence and emptiness, FileStatus is an enum --> FileStatus.Exists, FileStatus.Empty, FileStatus.NonExistent - to avoid confusion in return values and not use integers

long check-space(inFile) # uses size of input file and os.statvfs to check if we have enough disk space to write output. Else, returns a size that can be used to delete files and make space

@RamRS RamRS mentioned this issue Jan 24, 2014
4 tasks
@RamRS
Copy link
Contributor

RamRS commented Jan 26, 2014

@mr-c How do we determine the size of the hashtable? I'm unable to understand the C++ part of the implementation :(

@ctb
Copy link
Member

ctb commented Jan 26, 2014

Both normalize-by-median and load-into-counting output messages with the
size. It's basically number of hashtables * size of hashtables.

--t

On Sat, Jan 25, 2014 at 04:22:00PM -0800, Ram RS wrote:

@mr-c How do we determine the size of the hashtable? I'm unable to understand the C++ part of the implementation :(


Reply to this email directly or view it on GitHub:

#246 (comment)

C. Titus Brown, ctb@msu.edu

@RamRS
Copy link
Contributor

RamRS commented Jan 26, 2014

Got it. And should I double check to make sure that any script creating a ht afresh does indeed consume CMD line args via build_construct_args() and not directly via add_argument()?

@camillescott
Copy link
Member

@RamRS, that is being addressed in #283

@RamRS
Copy link
Contributor

RamRS commented Jan 30, 2014

Oh, Ok. Thanks!

@mr-c mr-c added this to the 1.0 release milestone Feb 28, 2014
@mr-c mr-c modified the milestones: 1.1+ Release, 1.0 release Apr 2, 2014
@mr-c
Copy link
Contributor Author

mr-c commented May 30, 2014

Fixed by #333, closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants