NEW: bruker's composite file (bcf) hypermap io (read-only) library #712
Conversation
deleted some commented out remnants of the code
got rid of while loops, improved the functions to work better with containers holding many files and with older sfs containers that use a smaller chunk size. It would probably be more practical to use xrange in the for loops, but the numbers are not huge and range in Python 2 will be efficient enough; a future port to Python 3 will then need no changes.
fixed one typo, and fixed a missing part for compression, where the problem was reading the file pointer table instead of the beginning of the packed file; some types were also compared incorrectly.
…nctionality in one library
…/hyperspy into bcf_implementation Conflicts: hyperspy/misc/io/unsfs.py
Why does checking code quality take so long?
This is a recurrent issue with landscape.io, but it is unfortunately out of our hands. On Fri, 2016-02-05 at 06:07 -0800, Petras wrote:
The Python code will not work on a big-endian machine... exactly between lines 563 and 581, in the function bin_to_numpy. Suggestions on how to fix it for big-endian machines are highly appreciated. The Cython code is not going to have this problem...
I remember I asked half a year ago about how to return both imagery and hyperspectral data with one file_reader function... Has anything changed in that field? I just want to remind you that bcf contains both SEM imagery and hyperspectral data.
The emi file reader also returns several signals as a list. See https://github.com/hyperspy/hyperspy/blob/master/hyperspy/io_plugins/fei.py#L231
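To illustrate the pattern referred to above, here is a minimal sketch of a file_reader that returns several signals as a list of dictionaries (the shape HyperSpy io_plugins use). The array shapes, titles, and the absence of axes/original_metadata entries are illustrative assumptions, not the actual bcf reader code.

```python
import numpy as np

def file_reader(filename, **kwds):
    # Hypothetical example: a bcf-like file holds both an SEM image
    # and an EDS hyperspectral cube; each becomes one dictionary.
    image = np.zeros((32, 32), dtype=np.uint8)        # SEM imagery
    cube = np.zeros((32, 32, 1024), dtype=np.uint16)  # EDS hypermap
    return [
        {"data": image, "metadata": {"General": {"title": "SEM image"}}},
        {"data": cube, "metadata": {"General": {"title": "EDS hypermap"}}},
    ]
```

hs.load would then hand back both signals at once, so the caller does not have to choose between imagery and spectra.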
…g chunks by implementing generators, fixing the silent "try, except, pass" into explicit exception handling to prevent unintended masking of real errors.
part in bin_to_numpy function to work also on big-endian machines
from string and casting the array back to string as it was read. The byteswap is needed, and that even simplifies the solution, as both big-endian and little-endian machines have to swap it once.
…f bcf NEW: added downsample and cutoff abilities at parsing level. MOVED: removed old memory-inefficient parsing functions. From now on the parser is a method of the BCF_reader class
So in the end I have moved everything into classes and saw no slowdown of execution; on the contrary, I got rid of some parts and saw a 1-2% improvement over the previous heavy parsing functions sitting outside of a class. I think landscape will complain like hell about that function (now a method of the BCF_parser class) for having too many variables and too much complexity. The price of dividing it up would be additional slowness (that was my experience on the first approach: calling a function/method per pixel is expensive, and not only the calling but also copying data for those functions). In the end I am not creating something new; the method has to parse binary data with so many quirks and whistles that making it fast in pure Python is nearly impossible... Python is far from the best tool for doing that. Doing it with raw pointers would be much easier and faster (which is why my goal is to move the heavy parsing functions to Cython).

My profiling showed that most of the time is spent in numpy. Numpy is efficient at reading, saving, and doing vectorised math on a whole array, but it is very slow at element-by-element operations, which is exactly what parsing Bruker data requires:

numpy_array[i] = value   # this is slow
numpy_array[i] += value  # this is 3 times slower

So, for example, at a downsample ratio of 2, each 2x2 block of pixels is squeezed into 1. The slow assignment or "in-place" addition to a numpy slice is done 4 times per output pixel, but there are 4 times fewer pixels in the downsampled array, so this cancels out. A downsampled hypermap takes just 2 times less memory instead of 4 times less: the problem is that a Python integer can't be added "in place" to a numpy slice with an unsigned type, so the final arrays are signed (uint8 -> int16, uint16 -> int32, ...).

There is some stuff which makes me worried. If I test my library in IPython, the memory is not released and garbage collected... if I import gc and run gc.collect() it then gets released. Probably I will have to use gc in the file_reader function to cope with this... It probably points to some badly designed references by me :/ which I can't find, so the default gc behaviour does not clean the memory automatically. Or does this have something to do with IPython, as I found reported in many places...?
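The 2x2 downsampling and dtype-widening trade-off described above can be sketched as follows. This is a hypothetical illustration, not the PR's parsing code: the function name and shapes are invented, and the widened signed dtype (uint8 -> int16) mirrors the uint8/int16 point made above.

```python
import numpy as np

def downsample2(hypermap):
    # Squeeze each 2x2 block of pixels into 1 by summing the four
    # offset sub-grids. The accumulator is widened to a signed type
    # (uint8 -> int16) so that summing four uint8 pixels cannot
    # overflow, and so Python ints can be added in place.
    h, w, ch = hypermap.shape
    out = np.zeros((h // 2, w // 2, ch), dtype=np.int16)
    for dy in (0, 1):
        for dx in (0, 1):
            # four slow in-place additions per output pixel, but
            # four times fewer output pixels, so the cost cancels out
            out += hypermap[dy::2, dx::2, :]
    return out

cube = np.full((4, 4, 2), 200, dtype=np.uint8)
print(downsample2(cube)[0, 0])  # [800 800]
```

Because int16 is twice the size of uint8, the downsampled array is only 2x smaller in memory rather than 4x, exactly as described above.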
of bcf insanely packed array depacking to numpy
    uint16_t val

cdef struct Val32:
    uint32_t val
Can you not use ctypedef uint32_t Val32?
I will try and see if that will work... I noticed some bugs in some non-essential Python code (night work...) and just updated...
I used structs as I find it much easier and cleaner to cast a pointer to a struct than to ctypedef'ed variables... those would still need to be accessed with the Val32[0] notation instead of Val32.val (probably; I just got rid of those structs and '.val', and it does not compile anymore), and the casting gets complicated... I am very inexperienced with C and Cython, and mixing it with Python syntax is quite confusing... I will see how this goes.
and metadata wrapping to the file_reader function.
Could you update this with the current master?
Let us know when this is ready to merge.
implementation to setup.py
@francisco-dlp yesterday I looked through
which accidentally slipped in with the merge with master
If your reader doesn't read anything signal_dimension > 2 or signal_dimension == 0 it can carry on using record_by internally.
I don't get which number of dimensions means what: is an image (BSE, SEI, whatever; 1 channel) 2 dimensions (x, y)?
You got it right. A signal dimension of 0 would be a scalar, e.g. mapped across a 2D space: >>> scalar = hs.signals.BaseSignal(np.random.random((32, 32)))
>>> scalar.axes_manager.set_signal_dimension(0)
>>> scalar
<BaseSignal, title: , dimensions: (32, 32|)>
as this gives headaches in tests, and something changed in master that is hard to grasp with my little brain...
@francisco-dlp I am very confused: the tests can't pass on bcf and complain about record_by. I got rid of 'record_by' completely (also in the dictionary which is passed to the loader) and the test is still failing. What changed in the API, and what do I need to change to make it work again?
@francisco-dlp I see, it is not merged; however, I tried to merge the bcf implementation with #1127 on a separate branch: the previously failing tests on
which looks to me like the axes got mixed up again (instead of the x or y scale it took the energy scale) (I had a problem with this when finalizing this io_plugin before)... :/ (maybe because it does not know which is which, as we don't tell it anymore?)
That makes sense. I think that for your reader it is ok to keep record_by as it doesn't need the more general approach. I would simply revert the latest commit.
to dictionary, fix test
This is still failing. We'll probably release HyperSpy 1.0 today or early tomorrow; any chance of fixing this before the release? If not, no worries, HyperSpy 1.1 will probably be released in a few weeks anyway, given the number of almost-ready PRs in the pipeline.
Did #1127 get merged? I tested this with it merged on a separate branch (fix_bcf) and it works. if
I just merged #1127.
Once Travis and AppVeyor are satisfied, I'll happily merge this.
Sorry for that, I somehow managed to miss the decimal places in a number comparison. So, wasted testing cycles... especially since OS X and AppVeyor runs are expensive at the moment. I should start wearing glasses...
@francisco-dlp, AppVeyor and Travis just passed :). The
Reverse engineering this format is an amazing achievement. Thanks a lot for contributing your work to HyperSpy.
Indeed, it gives a lot of satisfaction. I now look at any binary format with a bit of hope :D that one day... I will rule it too.
The library with a reader for Bruker composite file (BCF) hypermaps and images.
TO DO: