
NEW: bruker's composite file (bcf) hypermap io (read-only) library #712

Merged · 98 commits · Jul 13, 2016

Conversation

@sem-geologist (Contributor) commented Aug 27, 2015

A library for reading Bruker composite file (BCF) hypermaps and images.

TO DO:

  • header parser, SEM imagery, sum EDS spectra to numpy
  • pure python parser functions of elemental mapping to numpy (fully functioning, but memory inefficient)
  • pure python parser with dynamic loading of file and optional dynamic zlib decompression (required to enable cython based parser)
  • the hyperspy abstract/mapped functions of the library (file_reader)
  • cython based implementation of Delphi/Bruker packed array fast parsing
  • modify the setup.py, to compile cython code
  • some minimal loading tests
  • integration to hyperspy tests

@francisco-dlp francisco-dlp added this to the 0.9 milestone Aug 27, 2015
got rid of while loops; improved the functions to work better with containers holding many files and with older sfs containers with smaller chunk sizes.

It would probably be more practical to use xrange in for loops, but the numbers are not huge and range in Python 2 is efficient enough. A future port to Python 3, however, will need no changes.
fixed one typo, and fixed a missing part for compression, where the problem was reading the file pointer table instead of the beginning of the packed file; some types were also compared incorrectly.
@sem-geologist (Contributor Author):

Why does the code quality check take so long?

@francisco-dlp (Member):

This is a recurrent issue with landscape.io, but it is unfortunately out of our control.


@sem-geologist (Contributor Author):

The Python code will not work on a big-endian machine... exactly between lines 563 and 581, in the function bin_to_numpy. Suggestions on how to fix it for big-endian machines are highly appreciated. The Cython code is not going to have this problem...
update: I think I fixed it; it needs testing on a big-endian machine...

@sem-geologist (Contributor Author):

I remember I asked half a year ago how to return both imagery and hyperspectral data with one file_reader function... Has anything changed in that area? I just want to point out again that bcf contains both SEM imagery and hyperspectral data.

@francisco-dlp (Member):

The emi file reader also returns several signals as a list. See https://github.com/hyperspy/hyperspy/blob/master/hyperspy/io_plugins/fei.py#L231
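
For reference, a minimal sketch of that list-of-signals convention (my own illustration, not the fei.py or bcf code; parse_bcf is a hypothetical stand-in for the real parsing routine):

import numpy as np

def parse_bcf(filename):
    # hypothetical stand-in for the real parser: a 2D SEM image
    # and a 3D (y, x, energy) hypermap
    return np.zeros((32, 32)), np.zeros((32, 32, 1024))

def file_reader(filename, **kwds):
    image, hypermap = parse_bcf(filename)
    # one dictionary per signal; the loader turns each into its own signal
    return [
        {'data': image, 'metadata': {'General': {'title': 'SEM image'}}},
        {'data': hypermap, 'metadata': {'General': {'title': 'EDS hypermap'}}},
    ]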

…g chunks by implementing generators,

fixing the silent "try, except, pass" blocks into explicit exception handling, preventing unintended masking of real errors.
part in bin_to_numpy function to work also on big-endian machines
from string and casting array back to string so as it was read.
The byteswap is needed, and that even simplifies the solution, as both big-endian and little-endian machines have to swap it once.
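
A minimal sketch of one endian-portable way to do such a read (my illustration, not this commit's code): declaring the stored little-endian byte order in the dtype makes numpy perform the swap only on hosts that actually need it.

import numpy as np

raw = b'\x01\x00\x02\x00'                  # two little-endian uint16 values
counts = np.frombuffer(raw, dtype='<u2')   # -> array([1, 2], dtype=uint16)
# native-order copy; the swap happens only on big-endian hosts
native = counts.astype(counts.dtype.newbyteorder('='))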
…f bcf

NEW: added downsample and cutoff abilities at parsing level.
MOVED: removed the old memory-inefficient parsing functions. From now on the parser is a method of the BCF_reader class
@sem-geologist (Contributor Author):

So in the end I have moved everything into classes and saw no slowdown in execution; on the contrary, I got rid of some parts and saw a 1-2% improvement over the previous heavy parsing functions sitting outside the class. I think landscape will complain like hell about that function (now a method of the BCF_parser class) for too many variables and too much complexity. The price of dividing it up would be additional slowness (that was my experience on the first approach: calling a function/method per pixel is expensive, not only the call itself, but also copying data for those functions). In the end I am not creating something new; the method has to parse binary data with so many quirks and whistles that making it fast in pure Python is nearly impossible... Python is far from the best tool for doing that. Doing it with raw pointers would be much easier and faster (which is why my goal is to move the heavy parsing functions to Cython). My profiling showed that the most time is spent in numpy. Numpy is efficient at reading, saving, and doing vectorised math on a whole array, but it is very slow at element-by-element operations, which is exactly the case when parsing Bruker data.
My plan is to create the final numpy array (zeros), then pass it to Cython as a memoryview, where accessing and changing array elements is orders of magnitude faster (which is what my experimental code does). I would really appreciate it if somebody would look through my code and suggest whether it should be changed somehow. I tested it with a single bcf. The largest map I have (~4 GB when saved uncompressed as ripple) is parsed in ~6-7 minutes (on my overheating AMD Turion II 2100 MHz laptop processor). I expect better results on more powerful machines. However, as hyperspy's main aim is FIB-based sliced hypermaps, which tend to be small and have low counts, I specifically thought about that and added downsampling and energy cut-off arguments to the parser. That way, high-resolution imagery (the Bruker beam-controlling board and Esprit software are capable of 4096x4096 pixels) and hypermapping can be done simultaneously: the imagery can be read at full resolution and the hypermap at a sensible downsampled resolution. As I tested, downsampling adds no (<1%) overhead from the slow numpy in-place addition on a selection:

numpy_array[i] += value  # in-place addition is ~3 times slower than assignment
numpy_array[i] = value  # and even plain assignment is slow

So, for example, with a downsample ratio of 2, every 2x2 block of pixels is squeezed into 1. The slow assignment or "in-place" addition to a numpy slice is then done 4 times per pixel, but there are 4 times fewer pixels in the downsampled array, so it cancels out. A downsampled hypermap takes just 2 times less memory instead of 4 times, because a Python integer can't be added in-place to a numpy slice of unsigned type: the final arrays have to be signed (uint8 -> int16, uint16 -> int32, ...), and the wider dtype doubles the bytes per channel, eating half of the 4x pixel saving. A sketch of this bookkeeping follows below.
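
A self-contained sketch of that accumulation scheme (my own illustration with made-up sizes, not the PR code):

import numpy as np

ratio = 2
src = np.random.randint(0, 256, (4, 4, 8), dtype=np.uint8)  # y, x, energy
# signed and wider than the source, for the reasons described above
dst = np.zeros((4 // ratio, 4 // ratio, 8), dtype=np.int16)

for y in range(src.shape[0]):
    for x in range(src.shape[1]):
        # four in-place additions per output pixel, but four times
        # fewer output pixels, so the cost cancels out
        dst[y // ratio, x // ratio] += src[y, x]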
Another optional attribute/feature, the cut-off by energy, I think could also be very valuable. Bruker's Esprit software only allows setting the energy range to 10, 20, 40 or 80 kV. I am not very experienced with TEM (if somebody has a Bruker on a TEM, testing of this library is highly appreciated), but on SEM the 0-15 kV range is most commonly useful, as it lets you resolve some overlaps (Y, Nb, Zr, P). The 15-20 kV region has little value for us, given our SEM acceleration voltage limits and the lack of elements in geological specimens for that region; in hypermaps with a 20 kV range it normally contains nothing but noise for us. Another situation I came up with is FIB slicing, where on some SEMs the imaging is done at low voltages (3-7 kV) (InLens detector and ESB (energy selective BSE) mounted in the beam column). The cut-off value will also be useful for stacking hypermaps, where it can be set to the same value for all imported hypermap files: instead of padding all slices to one of the Esprit-set channel ranges (10, 20, ...), it can be set to some sane value, saving 10-40% of memory.
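
A small illustration of that saving (the numbers are mine, not from the PR): with a 20 kV range spread over 2048 channels, cutting at 15 kV keeps 1536 channels and drops a quarter of every spectrum.

import numpy as np

n_channels, e_max_kV, cutoff_kV = 2048, 20.0, 15.0
keep = int(n_channels * cutoff_kV / e_max_kV)   # 1536 of 2048 channels
# allocating only the kept channels up front is what saves the memory
hypermap = np.zeros((256, 256, keep), dtype=np.int16)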

There is something that worries me. If I test my library in ipython:
1. create an instance of the BCF_parser class and assign it to a variable
2. del the instance, or rebind the variable (which pointed to the instance) to 0

the memory is not released and garbage collected... if I import gc and do gc.collect(), it then gets released. I will probably have to use gc in the file_reader function to cope with this... It probably points to some badly designed references of mine :/ which I can't find, so the default gc behaviour does not clean the memory automatically. Or does this have something to do with ipython, as I found reported in many places...?
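
A self-contained illustration of that behaviour (a deliberately broken toy class, not a claim about the actual cause in BCF_parser): an accidental reference cycle keeps the object alive past del, and only the cycle collector frees it.

import gc

class Parser:
    def __init__(self):
        self.buf = bytearray(10**7)  # stand-in for the parsed data
        self.owner = self            # an accidental reference cycle

p = Parser()
del p         # the refcount never reaches zero because of the cycle
gc.collect()  # the cycle detector finds and frees it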

of bcf insanely packed array depacking to numpy
@sem-geologist sem-geologist changed the title Bcf implementation NEW: bruker's composite file (bcf) hypermap io (read-only) library Feb 28, 2016
cdef struct Val16:
    uint16_t val

cdef struct Val32:
    uint32_t val
Member (review comment):

Can you not use ctypedef uint32_t Val32?

@sem-geologist (Contributor Author, review reply):

I will try and see if that works... I noticed some bugs in some non-essential parts of the python code (night work...) and just updated...
I used structs because I find it much easier and cleaner to cast a pointer to a struct than to ctypedef'ed variables... those would still have to be accessed with the Val32[0] notation instead of Val32.val (probably; I just tried removing those structs and '.val', and it no longer compiles), and the casting gets complicated... I am very inexperienced with C and Cython, and the python syntax is quite easy to mix up... I will see how this goes.

and metadata wrapping to the file_reader function.
@francisco-dlp (Member):

Could you update this with the current master?

@francisco-dlp (Member):

Let us know when this is ready to merge.

@sem-geologist (Contributor Author) commented Jul 12, 2016

@francisco-dlp yesterday I looked through "bye bye record_by". While I renamed record_by where it was used not as intended in the kwargs, it is still used inside the dictionary which is passed to the hyperspy loader. Can you confirm that it can stay in the dict, or should I change it (is there some alternative/new API to tell the loader that this array is an image and that array is an EDS signal raster/whatever)?

which accidentally slipped in with the merge with master
@francisco-dlp (Member):

If your reader doesn't read anything with signal_dimension > 2 or signal_dimension == 0, it can carry on using record_by internally; record_by is removed from the metadata afterwards. This is the code. I took this approach because, although record_by is not general enough for hyperspy, it is general enough for most file formats. In this way we avoid the danger of introducing bugs by touching all the io readers...
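
For what it's worth, a hedged sketch of what carrying on with record_by internally looks like in a reader's output dictionaries ('image' and 'spectrum' follow the historical convention; the shapes and surrounding keys here are illustrative only):

import numpy as np

signal_dicts = [
    {'data': np.zeros((512, 512)), 'record_by': 'image'},         # SEM image
    {'data': np.zeros((64, 64, 1024)), 'record_by': 'spectrum'},  # EDS hypermap
]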

@sem-geologist (Contributor Author):

I don't get which number of dimensions is what: is an image (BSE, SEI, whatever; 1 channel) 2 dimensions (x, y)?
Is an EDS hypermap 3 dimensions (x, y, Energy)? I can also think of 2 dimensions as a cross-section (x, Energy), and one dimension as EDS from a single point (Energy). What is 0 dimensions? If I have this right, then my hypermaps are >2 and I am in trouble... how do I tell the loader what is what?

@francisco-dlp (Member):

You got it right. A signal dimension of 0 would be a scalar, e.g. mapped across a 2D space:

>>> scalar = hs.signals.BaseSignal(np.random.random((32, 32)))
>>> scalar.axes_manager.set_signal_dimension(0)
>>> scalar
<BaseSignal, title: , dimensions: (32, 32|)>

as this gives headaches in tests, and something changed in master
that is hard to grasp with my little brain...
@sem-geologist (Contributor Author) commented Jul 12, 2016

@francisco-dlp I am very confused: the bcf tests can't pass and complain about record_by. I got rid of 'record_by' completely (also in the dictionary which is passed to the loader) and the test is still failing. What changed in the API, and what do I need to change to make it work again?

@francisco-dlp (Member):

Notice that #1127 has not been merged yet, that's probably the issue: you can't get rid of record_by without #1127.

@sem-geologist (Contributor Author) commented Jul 12, 2016

@francisco-dlp I see, it is not merged. However, I tried merging the bcf implementation with #1127 on a separate branch: the previously failing record_by tests are gone... BUT it breaks the scale!

AssertionError: 0.010001 != 4.368392531e-05 within 12 places

which looks to me like the axes got mixed up again (instead of the x or y scale it took the energy scale); I had a problem with this before, when finalizing this io_plugin... :/ (maybe because it does not know what is what, as we don't tell it anymore...)

@francisco-dlp (Member):

That makes sense. I think that for your reader it is ok to keep record_by as it doesn't need the more general approach. I would simply revert the latest commit.

to dictionary,
fix test
@francisco-dlp (Member):

This is still failing. We'll probably release HyperSpy 1.0 today or early tomorrow; any chance of fixing this before the release? If not, no worries: HyperSpy 1.1 will probably be released in a few weeks anyway, given the number of almost-ready PRs in the pipeline.

@sem-geologist (Contributor Author):

Did #1127 get merged? I tested this with it merged on a separate branch (fix_bcf) and it works. If the record_by removal is merged now, then afterwards I can update this to master and it should stop throwing errors without any more tweaking. Basically it is ready for review, as nothing is going to change.

@francisco-dlp (Member):

I just merged #1127.

@francisco-dlp (Member):

Once Travis and Appveyor are satisfied, I'll happily merge this.

@sem-geologist (Contributor Author):

Sorry for that, I somehow managed to miss the decimal places in the number comparison, so wasted testing cycles... especially since osx and appveyor are expensive at the moment. I should start wearing glasses...

@sem-geologist (Contributor Author):

@francisco-dlp, appveyor and travis just passed :). Landscape has probably hung.
Do not forget the documentation in #934, which I updated and merged with master today.

@francisco-dlp (Member):

Reverse engineering this format is an amazing achievement. Thanks a lot for contributing your work to HyperSpy.

@sem-geologist (Contributor Author):

Indeed, it gives a lot of satisfaction. I now look at any binary format with a bit of hope :D that one day... I will rule it too.

@sem-geologist sem-geologist deleted the bcf_implementation branch March 4, 2017 10:25