Skip to content

Sort results that are received one at a time, notify with sorted progress

License

Notifications You must be signed in to change notification settings

jeancroy/StreamSorter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

#StreamSorter

Sort data that we receive item per item. (Online sorting) Ready to use in a event-based environment. (API organised as callback with closure to instance)

Usage

  #Setup your callback
  onSortedResults = (sortedList, lastResults) ->
       # Hi. I'm a callback and I like my result sorted
       # `sortedList` contain all the results so far & sorted

  # Setup sorter object with your callback
  # (here we add an optional key: (o) -> o.path)
  BatchSorter = require("BatchSorter")
  sorter = new BatchSorter(onSort: onSortedResults, key: "path")

  # Register the sorter with the data provider
  searcher.on 'results-found', sorter.insert
  
  searcher.search()

Two main algorithm

###Batch sorter

  • Best for speed / power usage
  • Best if the data producer spit a lot of item very fast, so you can afford to be informed of the process once in a while.
  • Best if the process produce data that are locally sorted in chunk
  • Similar in principles to Timsort
    • (Mix merge sort, binary search and insertion sort)

Items are inserted into a local batch which is then merged into the main sorted list. Similar to building a groceries list, then going to store once for all those items. This is more efficient that going to the store for a single item each time you realise you need something

  #Setup your callback
  onSortedResults = (sortedList, lastResults) ->
  # For BatchSorter `lastResults` is an array that contain all the elements added since last callback

###Insert sorter

  • Best for simplicity
  • Best if you need the sorted list for every single data produced
  • Use binary insertion sort.
  #Setup your callback
  onSortedResults = (sortedList, lastResult) ->
  # For InsertSorter `lastResult` is the last item added

End of batch ( apply to BatchSorter )

Batch end when

  • they reach a certain size or
  • a timeout delay elapsed since the batch start

This mean, in the following example we'll receive onSortedResults once after 100 items. Then the timeout of 50ms expire, and we receive the all 105 items sorted.

  # Default option for batch size
  BatchSorter = require("BatchSorter")
  sorter = new BatchSorter(batchSize:100, timeout: 50, onSort: onSortedResults)
  
  #Process that insert 105 items
  sorter.insert(i) for i in [0..105]

Manual end of batch

sorter.update()

Update now, no need to wait for timeout

  #Process that insert 105 items
  sorter.insert(i) for i in [0..105]
  
  #Don't want to wait, will call onSort
  sorter.update()

sorter.isDirty() can tell you if there's any pending results that would need update()

sorter.finish()

Like update, but will also reset data.

  # Process that insert 105 items
  sorter.insert(i) for i in [0..105]
  
  # Will call onSort then clean internal sortedList
  sorter.finish()
  
  # Process that insert new items
  sorter.insert(i) for i in [106..200]
  
  # Will call onSort but leave array values to add more items
  sorter.update()
  
  # Process that insert new items
  sorter.insert(i) for i in [201..200]
  

get Data

sorter.getData() will return an array with all result sorted. This make the usage of the callback optional (for example the benchmark don't use callbacks). For batch sorter there will be an implicit call to update()

Test

Jasmine is used to test spec of both sorter. Multiple level of test is included.

  • Basic spec test with synthetic data.
  • Overall behavior test with random strings and verify each callback.
  • Real data test (Benchmark will validate sort order against array.sort() )

Benchmark

We provide benchmark that show how the two StreamSorter compare against a single array.sort() at the end. Real life scenario is a bit different because we have to update the callback with sorted progress as we go. So in general we don't expect to beat a single sort after the fact.

But BatchSort is optimised for process that produces almost sorted data, and this is what we see in benchmark. Atom scandal is one of such process that produce item in sorted chunk, but chunk arrive out of order (also we only receive item one-per-one so we don't have the a priori notion of a chunk )

Benchmark.coffee

Main benchmark will test sort order on a recorded atom/scandal output. This is done to ensure repeatability and because file system provide alot of variability (hard to isolate only the effect of sort)

Benchmark-live.coffee

Live benchmark will use Atom Scandal to sort this folder. For this you need to install scandal. You can also install a bunch of other stuff to make more file in the benchmark. The file can be called with an argument to specify a path to scan (Default is project folder).

Benchmark-record.coffee

Record benchmark will produce the data file for main benchmark. The file can be called with an argument to specify a path to scan (Default is project folder).

Stats.coffee

Also in the benchmark folder, an interesting statistic / timing class. It was needed to figure out number of test needed to beat file system variability.

About

Sort results that are received one at a time, notify with sorted progress

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published