
Enh lazy signal #1219

Merged: 155 commits, merged Feb 12, 2017


@to266 (Contributor) commented Jul 29, 2016

[Resubmit of #1102]
Adds a new family of "Lazy" signals that only operate on the data (or even access it) when required or explicitly told to do so.

The intended workflow is as follows:

  1. Load the data lazily.
  2. Perform calculations or other operations. These should be fast, often instantaneous, since nothing is actually computed yet.
  3. Save the resulting signal to a file. Only then does dask perform the operations, writing the results to the file.
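The three steps can be illustrated without dask using a hypothetical toy class, ToyLazyArray (not part of HyperSpy or dask), that only records operations and processes the data chunk by chunk when it is saved. This is a sketch under simplifying assumptions (elementwise, dtype-preserving operations; chunking along the first axis only), not how HyperSpy or dask implement it.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


class ToyLazyArray:
    """Hypothetical, minimal stand-in for a lazy signal's data.

    Operations are only recorded; data are loaded and processed chunk by
    chunk when compute() or save() is called. Assumes every recorded
    operation is elementwise and preserves dtype and shape.
    """

    def __init__(self, loader, shape, dtype, ops=()):
        self.loader = loader  # callable(start, stop) -> rows start:stop
        self.shape = shape
        self.dtype = dtype
        self.ops = tuple(ops)

    def map(self, func):
        # Step 2: record the operation instead of running it (instantaneous).
        return ToyLazyArray(self.loader, self.shape, self.dtype, self.ops + (func,))

    def _block(self, start, stop):
        # Load only the requested rows, then apply every recorded operation.
        block = self.loader(start, stop)
        for func in self.ops:
            block = func(block)
        return block

    def compute(self):
        # Materialise everything at once (only sensible if it fits in memory).
        return self._block(0, self.shape[0])

    def save(self, path, chunk_rows=2):
        # Step 3: load, process and write one chunk of rows at a time,
        # so only chunk_rows rows are ever held in memory.
        out = open_memmap(path, mode="w+", dtype=self.dtype, shape=self.shape)
        for start in range(0, self.shape[0], chunk_rows):
            stop = min(start + chunk_rows, self.shape[0])
            out[start:stop] = self._block(start, stop)
        out.flush()
        del out


calls = []  # records which row ranges were actually read


def loader(start, stop):
    # Step 1: a "file" that is only read when specific rows are requested.
    calls.append((start, stop))
    return np.arange(start * 3, stop * 3, dtype=np.float64).reshape(stop - start, 3)


lazy = ToyLazyArray(loader, shape=(5, 3), dtype=np.float64)
result = lazy.map(lambda b: b * 2).map(lambda b: b + 1)
calls_before_save = list(calls)  # nothing has been read yet

path = os.path.join(tempfile.mkdtemp(), "result.npy")
result.save(path)
saved = np.load(path)
```

The design choice mirrors the workflow above: building the pipeline costs nothing, and the chunked save is what makes data larger than memory feasible.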

One drawback is that dask.array.Array (which this whole PR relies upon) does not support slice assignment: e.g. a[3] += 2 does not work, and one has to write something like a = da.concatenate([a[:3], a[3:4] + 2, a[4:]]) instead. As a result, not everything works as seamlessly as with regular signals.
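The concatenate workaround for a[i] += value can be wrapped in a small helper. The name add_at_index is hypothetical, and the concatenate function is passed in so that the same code could be used with numpy.concatenate or dask.array.concatenate (both take a sequence of arrays); only numpy is exercised in this sketch.

```python
import numpy as np


def add_at_index(concatenate, a, i, value):
    """Out-of-place equivalent of `a[i] += value` along the first axis.

    Hypothetical helper for arrays without slice assignment.
    """
    # a[i:i + 1] keeps the first axis (a[i] would drop it), and a[i + 1:]
    # starts after the modified element, so nothing is duplicated or lost.
    return concatenate([a[:i], a[i:i + 1] + value, a[i + 1:]])


a = np.arange(6.0)
b = add_at_index(np.concatenate, a, 3, 2)

expected = a.copy()
expected[3] += 2  # the in-place version that dask arrays could not do
```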

Also, plotting lazy signals is slow, since the data are computed on request, but data of any size should be feasible to handle.

NEW things

  • Lazy*Signal classes: mirror the normal signal classes, but perform the operations lazily.
  • nan*() methods: equivalents of numpy.nanmax and related functions, which were previously missing.
  • "ONMF" decomposition: an Online NMF implementation following the paper by Zhao et al. (specifically, the OPGD variant described there). Not yet thoroughly tested.
  • as_lazy() method: all signals now have an as_lazy() method, which converts the signal into its lazy counterpart.
  • compute() method: converts a lazy signal into a conventional one. The better (and more realistic) workflow is to save the final lazy signal to a file instead, since it presumably does not fit in memory all at once.
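For reference, this is the numpy behaviour that the nan*() methods mirror: plain reductions propagate NaN, while the nan* variants ignore it. The snippet uses numpy directly; the exact arguments accepted by the signal methods are not shown here.

```python
import numpy as np

data = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0]])

plain_max = data.max()                    # NaN propagates
nan_max = np.nanmax(data)                 # ignores NaN: 6.0
nan_mean_rows = np.nanmean(data, axis=1)  # per row, ignoring NaN: [1.5, 5.0]
nan_sum = np.nansum(data)                 # NaN treated as missing: 13.0
```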

Behaviour Changes

  • CircleROI for LazySignal sets the elements outside the ROI to np.nan instead of using a masked array, since dask does not support masked arrays. This was essentially the motivation for the nan* methods.
  • Lazy loading: deprecated the memmap, mmap, load_to_memory and similar kwargs, leaving only the new lazy kwarg. Users no longer need to pay attention to the file format to load a file lazily, since this is handled by the format reader.
  • Stacking: largely rewritten; stacking can now be performed lazily. It also supports stacking numbers (floats, integers, complex numbers), which are broadcast as appropriate.
  • get_histogram: the lazy implementation does not support the 'knuth' and 'blocks' bins options.
  • ragged argument in map: ragged must be specified for lazy signals, but is optional for normal ones, where it can be determined automatically while running. If ragged is True, the results of map are not assumed to have the same shape, or even to be numpy arrays; they can be any Python object.
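The NaN-instead-of-mask approach for CircleROI can be sketched with numpy. The helper nan_outside_circle is hypothetical (centre and radius in pixel units), and it builds a new array with np.where rather than assigning in place, which also suits arrays without slice assignment (dask.array provides a where function as well). A nan* reduction then only sees the pixels inside the circle.

```python
import numpy as np


def nan_outside_circle(image, cx, cy, r):
    """Return a float copy of a 2D image with pixels outside the circle set to NaN.

    Hypothetical helper; (cx, cy) is the centre (column, row), r the radius.
    """
    ny, nx = image.shape
    y, x = np.ogrid[:ny, :nx]
    inside = (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2
    # np.where builds a new array instead of masking or assigning in place.
    return np.where(inside, image.astype(float), np.nan)


image = np.arange(25.0).reshape(5, 5)
roi = nan_outside_circle(image, cx=2, cy=2, r=1)
n_inside = np.count_nonzero(~np.isnan(roi))  # centre pixel plus its 4 neighbours
roi_mean = np.nanmean(roi)                   # mean over the circle only
```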
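Why ragged must be given up front for lazy signals: a lazy result has to declare how its output is stored before anything is computed, whereas an eager map can inspect the results after running. The function toy_map below is hypothetical and greatly simplified (it is not HyperSpy's map signature); it only shows the difference between stacking same-shape array results and keeping arbitrary results in an object array.

```python
import numpy as np


def toy_map(func, frames, ragged=None):
    """Hypothetical simplified map over navigation positions.

    ragged=False: results are stacked into one numpy array (same shape).
    ragged=True: results are kept in a 1D object array, so they may differ
    in shape or be arbitrary Python objects.
    ragged=None: decide after running, which is only possible when results
    are computed eagerly (as for non-lazy signals).
    """
    results = [func(frame) for frame in frames]
    if ragged is None:
        same_shape_arrays = (
            all(isinstance(r, np.ndarray) for r in results)
            and len({r.shape for r in results}) == 1
        )
        ragged = not same_shape_arrays
    if ragged:
        out = np.empty(len(results), dtype=object)
        for i, r in enumerate(results):
            # Fill element by element so each result stays one object;
            # np.array(results, dtype=object) could build a 2D array instead.
            out[i] = r
        return out
    return np.stack(results)


frames = [np.array([0.1, 0.9, 0.2, 0.8]), np.array([0.7, 0.1, 0.3, 0.2])]
peaks = toy_map(lambda f: np.flatnonzero(f > 0.5), frames)  # different lengths
doubled = toy_map(lambda f: f * 2, frames)                   # same shape
```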

Notable code changes (relevant for developers)

  • Rewrote a number of iterating algorithms to use signal.map, which is generic and works for both conventional and lazy signals.
  • Added misc.signal_tools.broadcast_signals. It is used in the *nary functions (and is therefore covered by their tests), so no new tests were needed.
  • lazifyTestClass decorator for test classes: creates a new test_lazy_* method from every existing test_* method, with every signal in the class cast to lazy. This allows reusing most of the regular tests for lazy signals. It can override other class attributes as well.
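A minimal sketch of how such a class decorator can work, using hypothetical ToySignal/ToyLazySignal stand-ins with an as_lazy() method. It illustrates the technique only and is not HyperSpy's lazifyTestClass (overriding other class attributes is not shown).

```python
import functools


class ToyLazySignal:
    """Hypothetical stand-in for a lazy signal."""

    def __init__(self, data):
        self.data = data
        self._lazy = True


class ToySignal:
    """Hypothetical stand-in for a regular signal."""

    def __init__(self, data):
        self.data = data
        self._lazy = False

    def as_lazy(self):
        return ToyLazySignal(self.data)


def lazify_test_class(cls):
    """For every test_* method, add a test_lazy_* method that first converts
    every ToySignal attribute of the instance to lazy, then runs the test."""

    def make_lazy_test(test):
        @functools.wraps(test)
        def lazy_test(self):
            for attr, value in list(vars(self).items()):
                if isinstance(value, ToySignal):
                    setattr(self, attr, value.as_lazy())
            return test(self)
        return lazy_test

    for name in list(vars(cls)):
        # Skip non-tests and already-lazy tests (avoids test_lazy_lazy_*).
        if not name.startswith("test_") or name.startswith("test_lazy_"):
            continue
        original = getattr(cls, name)
        if not callable(original):
            continue
        lazy_name = "test_lazy_" + name[len("test_"):]
        lazy_test = make_lazy_test(original)
        lazy_test.__name__ = lazy_name
        lazy_test.__qualname__ = f"{cls.__qualname__}.{lazy_name}"
        setattr(cls, lazy_name, lazy_test)
    return cls


@lazify_test_class
class TestSum:
    def setup_method(self):
        self.s = ToySignal([1, 2, 3])

    def test_sum(self):
        assert sum(self.s.data) == 6


# Run the generated test by hand (pytest would call setup_method itself).
t = TestSum()
t.setup_method()
t.test_lazy_sum()
ran_lazy = t.s._lazy  # the signal was converted before the test body ran
```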

TODO:

  • stacking
  • most existing readers can load lazily
  • lazy/iterating decomposition tests (mainly O(R)NMF)
  • bcf lazy loading
  • developer guide update
  • "Big Data" chapter in the user guide
  • Holography signals

Tomas Ostasevicius and others added 30 commits June 17, 2016 21:13
@francisco-dlp (Member)

Some words in the User Guide about what happens when mixing lazy and standard signals (e.g. lazy + standard) would also be useful.

@to266 (Contributor, Author) commented Feb 9, 2017

I forgot that holography signal types were added in the meantime, so now I have to make sure they work with lazy signals (currently they don't, for some reason). WIP

@to266 removed the status: WIP label Feb 10, 2017
@to266 mentioned this pull request Feb 11, 2017
@francisco-dlp merged commit b0dbae6 into hyperspy:RELEASE_next_minor Feb 12, 2017
@thomasaarholt (Contributor)

roaring applause

@sem-geologist (Contributor)

yes, yes, yes :)

congratulations @to266 !

@magnunor (Contributor)

Very nice! :)

@vidartf (Member) commented Feb 12, 2017

👏 🎉

@to266 (Contributor, Author) commented Feb 12, 2017

🥇 :DD

@tjof2 mentioned this pull request Feb 14, 2017
@francisco-dlp modified the milestone: v1.2 Mar 2, 2017
@to266 deleted the ENH_lazy_signal branch March 2, 2017 22:42

8 participants