GitHub - feyeleanor/readorder: Readorder orders a list of files into a more effective read order.

== Readorder

* Homepage[http://copiousfreetime.rubyforge.org/readorder/]
* {Rubyforge Project}[http://rubyforge.org/projects/copiousfreetime/]
* email jeremy at copiousfreetime dot org
* git clone git://github.com/copiousfreetime/readorder.git

== DESCRIPTION
    
Readorder orders a list of files into a more effective read order.

Readorder is useful when you know ahead of time that you have a large
number of files on disc to process.  Give it the list of those files and
it reports back the order in which you should process them to make the
most effective use of your disc I/O.

Given a list of filenames, either on the command line or via stdin,
readorder will output the filenames in an order that should increase 
the I/O throughput when the files corresponding to the filenames are
read off of disc.

The output order of the filenames can either be in inode order or
physical disc block order.  This is dependent upon operating system
support and permission level of the user running readorder.
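
The inode ordering described above can be illustrated with a minimal Ruby
sketch.  This is not readorder's own code: +inode_order+ is a hypothetical
helper name, and only the inode case is shown, since physical block order
needs operating-system-specific calls that are omitted here.

```ruby
# Sketch: order filenames by inode number.
# `inode_order` is a hypothetical helper, not part of readorder's API.
# Returns [ordered, skipped]; anything that is not a regular file is skipped.
def inode_order(paths)
  files, skipped = paths.partition { |path| File.file?(path) }
  ordered = files.sort_by { |path| File.stat(path).ino }
  [ordered, skipped]
end

# Demo: order the regular files in the current directory.
ordered, skipped = inode_order(Dir.glob("*"))
puts ordered
skipped.each { |path| warn "skipped: #{path}" }
```

Entries that are not regular files are returned separately rather than
raising, which loosely mirrors the idea behind the <tt>--error-filelist</tt>
option.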

== COMMANDS

=== Sort

Given a list of filenames, either on the command line or via stdin,
output the filenames in an order that should increase the I/O 
throughput when the contents of the files are read from disc.

==== Synopsis

  readorder sort [filelist*] [options]+

  filelist (-1 ~> filelist=#<IO:0x1277e4>) 
      The files containing filenames 
  --inode 
      Only use inode order do not attempt physical block order 
  --log-level=log-level (0 ~> log-level=info) 
      The verbosity of logging, one of [ debug, info, warn, error, fatal ] 
  --log-file=log-file (0 ~> log-file) 
      Log to this file instead of stderr 
  --output=output (0 ~> output) 
      Where to write the output 
  --error-filelist=error-filelist (0 ~> error-filelist) 
      Write all the files from the filelist that had errors to this file 
  --help, -h 

==== Example Output

=== Analyze

Take the list of filenames and output an analysis of the volume of
data in those files.
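
As a rough illustration of this kind of analysis, the sketch below
summarises file sizes in Ruby.  It assumes the analysis covers figures like
those in the test report further down (count, total bytes, minimum, average,
maximum, standard deviation); +size_summary+ is a hypothetical helper and the
actual report produced by readorder may differ.

```ruby
# Sketch: summarise the sizes of a list of files.
# `size_summary` is a hypothetical helper, not readorder's implementation.
def size_summary(paths)
  sizes = paths.map { |path| File.size(path) }
  return nil if sizes.empty?

  count = sizes.size
  total = sizes.sum
  mean  = total.to_f / count
  # Population variance is an assumption; a sample variance would divide by count - 1.
  variance = sizes.sum { |size| (size - mean)**2 } / count

  {
    files:  count,
    bytes:  total,
    min:    sizes.min,
    mean:   mean,
    max:    sizes.max,
    stddev: Math.sqrt(variance)
  }
end

# Demo: summarise the regular files in the current directory.
p size_summary(Dir.glob("*").select { |path| File.file?(path) })
```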

==== Synopsis

  readorder analyze [filelist*] [options]+

  filelist (-1 ~> filelist=#<IO:0x1277e4>) 
      The files containing filenames 
  --log-level=log-level (0 ~> log-level=info) 
      The verbosity of logging, one of [ debug, info, warn, error, fatal ] 
  --log-file=log-file (0 ~> log-file) 
      Log to this file instead of stderr 
  --output=output (0 ~> output) 
      Where to write the output 
  --error-filelist=error-filelist (0 ~> error-filelist) 
      Write all the files from the filelist that had errors to this file 
  --data-csv=data-csv (0 ~> data-csv) 
      Write the raw data collected to this csv file 
  --help, -h 

==== Example Output

=== Test

Given a list of filenames, either on the command line or via stdin,
take a random subsample of them and read all the contents of those
files in different orders.
  
* in initial given order
* in inode order
* in physical block order
  
Output a report of the various times taken to read the files.
  
This command requires elevated privileges to run.  It will purge your disc
cache multiple times while running, and will spike the I/O of your machine.
Run with care.
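
The measurement idea can be sketched in Ruby as below.  This is a simplified
illustration, not the <tt>readorder test</tt> implementation: +time_reads+
and +compare_orders+ are hypothetical names, physical block order is left
out, and the disc cache is not purged between runs, so the timings it
produces are only indicative.

```ruby
require "benchmark"

# Sketch: read every file in the given order; report elapsed time and read rate.
# Hypothetical helper, not readorder's API.
def time_reads(paths)
  bytes = 0
  seconds = Benchmark.realtime do
    paths.each { |path| bytes += File.binread(path).bytesize }
  end
  rate = seconds.zero? ? 0.0 : bytes / seconds
  { files: paths.size, bytes: bytes, seconds: seconds, rate: rate }
end

# Take a random percentage of the input, keeping the sampled files in their
# original relative order, then time that order against inode order.
# Unlike `readorder test`, this does NOT purge the disc cache between runs.
def compare_orders(paths, percentage: 1)
  count   = [(paths.size * percentage / 100.0).ceil, 1].max
  indexes = paths.each_index.to_a.sample(count).sort
  sample  = indexes.map { |i| paths[i] }

  {
    original_order: time_reads(sample),
    inode_number:   time_reads(sample.sort_by { |path| File.stat(path).ino })
  }
end

# Demo: sample half of the regular files in the current directory.
files = Dir.glob("*").select { |path| File.file?(path) }
compare_orders(files, percentage: 50).each do |order, result|
  puts format("%-16s %10.3f sec %14.3f bytes/sec", order, result[:seconds], result[:rate])
end
```

Because the second pass will usually be served from the cache, a fair
comparison needs the cache dropped before each ordering, which is why the
real command asks for elevated privileges.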

==== Synopsis

  readorder test [filelist*] [options]+

  filelist (-1 ~> filelist=#<IO:0x1277e4>) 
      The files containing filenames 
  --percentage=percentage (0 ~> int(percentage)) 
      What random percentage of input files to select 
  --log-level=log-level (0 ~> log-level=info) 
      The verbosity of logging, one of [ debug, info, warn, error, fatal ] 
  --log-file=log-file (0 ~> log-file) 
      Log to this file instead of stderr 
  --error-filelist=error-filelist (0 ~> error-filelist) 
      Write all the files from the filelist that had errors to this file 
  --help, -h 

==== Example result

  
                              Test Using First Of                           
    ========================================================================

      Total files read :         8052
      Total bytes read :      6575824
      Minimum filesize :          637
      Average filesize :          816.670
      Maximum filesize :         1393
      Stddev of sizes  :           86.936

                      read order   Elapsed time (sec)  Read rate (bytes/sec)
    ------------------------------------------------------------------------
                  original_order              352.403              18659.944
                    inode_number               53.606             122669.175
     first_physical_block_number               47.520             138379.024

This is the output of a <tt>readorder test</tt> command run on a directory on
a ReiserFS filesystem containing 805,038 files, constituting 657,543,700 bytes
of data.  A sample of 1% of the files was used for the test.

Extrapolating the measured read rates to the full data set, processing the
files in their original order would potentially take about 9.8 hours.
Processing them in physical block number order reduces that to about 1.3
hours.
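
The extrapolation divides the total data size by each measured read rate.  A
short Ruby sketch of that arithmetic, using only the figures reported above
(+estimated_hours+ is a name chosen here for illustration):

```ruby
# Figures taken from the example report and the paragraph describing the data set.
TOTAL_BYTES = 657_543_700

RATES = {
  original_order:              18_659.944,
  inode_number:                122_669.175,
  first_physical_block_number: 138_379.024
}.freeze

# Hours needed to read total_bytes at a steady bytes_per_second rate.
def estimated_hours(total_bytes, bytes_per_second)
  total_bytes / bytes_per_second / 3600.0
end

RATES.each do |order, rate|
  puts format("%-28s %.1f hours", order, estimated_hours(TOTAL_BYTES, rate))
end
```

This prints roughly 9.8 hours for the original order, 1.5 hours for inode
order and 1.3 hours for physical block order, assuming the sampled rates
hold across the whole data set.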

== CREDITS

* Linux System Programming by Robert Love
* {readahead project}[https://fedorahosted.org/readahead/]

== ISC LICENSE

Copyright (c) 2009, Jeremy Hinegardner

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.