JEOL eds data plugins #2488
This looks like a good start! |
Here you can find some files generated by the JEOL Analysis software. "rawdata.ASW" is the main file: it contains links to all the project files, as well as other information such as the working area width, which is useful for the image scale.
Example:
import hyperspy.api as hs
s = hs.load("rawdata.ASW")  # to load all the files of the project
s_img = hs.load("Sample/00_View000/View000_0000000.img")  # to load the HAADF image
s_map_Zn = hs.load("Sample/00_View000/View000_0000016.img")  # to load the Zn elemental map
s_eds = hs.load("Sample/00_View000/View000_0000017.pts")  # to load the EDS datacube
Hope it will help |
@sempicor Thanks for the addition; I think it will be great to support another microscopy file format in HyperSpy. By "tests", @ericpre meant actual pytest test methods, like those defined in our test suite. This is an example for the FEI formats: https://github.com/hyperspy/hyperspy/blob/RELEASE_next_minor/hyperspy/tests/io/test_fei.py These tests run automatically whenever new code is included, and ensure that we do not introduce regressions or other bugs. Please see the developer guide for more detail on writing test cases. We will not be able to include any new features until all the code added is covered by test cases to go along with it. |
Codecov Report
@@ Coverage Diff @@
## RELEASE_next_minor #2488 +/- ##
======================================================
+ Coverage 76.49% 76.66% +0.17%
======================================================
Files 202 203 +1
Lines 29808 30125 +317
Branches 6520 6567 +47
======================================================
+ Hits 22801 23095 +294
- Misses 5195 5205 +10
- Partials 1812 1825 +13
|
Thanks for your reply. I started to write some tests for loading a whole JEOL project, an individual image file, or an EDS datacube. |
@sempicor, nice to see that you sorted out the tests! A few comments for now:
- Would it be possible to use a smaller file than 12MB (JEOL_files/Sample/00_View000/View000_0000017.pts) for the test suite? Including "large" files is problematic, and we usually generate a very small dataset just for this purpose.
- It would be good to reformat the code to make it more readable (and easier to review). We follow PEP8 and you could run black on these files.
- Regarding saving memory when loading the EDS datacube:
  - the emd velox and bcf bruker readers have a parameter to rebin (rebin_energy or downsample) when loading
  - similarly, you could set the dtype of the numpy array to save on array size; see, for example, the parameters of the emd velox reader.
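To illustrate why rebinning and a smaller dtype save memory, here is a minimal NumPy-only sketch. It is not the code of any HyperSpy reader; the helper name `rebin_energy` and the toy array are assumptions made for this example.

```python
import numpy as np


def rebin_energy(cube, factor):
    """Sum groups of `factor` adjacent energy channels (last axis).

    Hypothetical helper for illustration only. The sum is kept in the
    input dtype, so it can overflow if that dtype is too small.
    """
    n_channels = cube.shape[-1]
    if n_channels % factor != 0:
        raise ValueError("number of channels must be divisible by factor")
    new_shape = cube.shape[:-1] + (n_channels // factor, factor)
    return cube.reshape(new_shape).sum(axis=-1, dtype=cube.dtype)


# Toy datacube: 8 x 8 pixels, 4096 energy channels, one count in every bin.
cube = np.ones((8, 8, 4096), dtype=np.uint16)
small = rebin_energy(cube, 4)

# Rebinning by 4 divides the memory footprint by 4.
print(cube.nbytes, small.nbytes)
```

Halving the item size (for example uint16 instead of uint32) halves the footprint again, at the cost of a lower maximum count per bin.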
Also, is it worth squashing these commits, since the last 19 have basically identical messages? |
Yes, agreed, and actually I would like to have the 12MB file removed, because it will stay in the history and this makes the git history unnecessarily large... Maybe this can be tidied up after the test files are sorted? |
@tjof2, sorry I am a beginner on git and github and these commits were just superfluous trial/error to solve why checks failed. I will try to be more concise next time. |
@sempicor no problem at all! If you're not sure how to squash the commits here, we can help. |
@tjof2 it would be very nice because I have no idea how to do that. |
@sempicor: I have squashed the commits in a branch in my fork (https://github.com/ericpre/hyperspy/commits/JEOL_io_plugin). We have two ways to proceed:
|
@ericpre or selecting a squash commit option from the dropdown when merging the PR might work? |
bc69eb8 to 1db6344
Looking at the test coverage, read_eds is not tested.
Regarding the current approach to solve the memory error: this approach will work fine with data having very low counts in the spectrum image (fewer than 256 counts, since this is using the uint8 data type); however, it doesn't work if some pixels have higher counts, which is usual. Therefore another approach needs to be used: see one of my comments above or have a look at how other readers deal with this specific issue.
@sempicor: I have rebased and pushed to your branch, which means that your remote branch has now diverged from your local branch. The easiest option for you is to check out this branch and continue from there. |
Thanks @ericpre for your help. Regarding the use of uint8, I am not sure it is a problem, as having more than 256 counts in a pixel implies performing more than 256 sweeps, which is far more than we typically use. |
It is bad practice to hard-code the type of the data. There are people who acquire EDS with high statistics and they should be able to open their data too. If there is no way to predict the required dtype, you could simply add an optional parameter to specify the dtype, along with a warning if the data reach the maximum of the dtype, to advise the user to increase the dtype using the optional parameter. Alternatively, you could reload the data a second time with a larger dtype; this would be slower but more correct, again with a warning advising the user to set the dtype in order to load faster. |
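The optional dtype parameter with a warning could look roughly like the sketch below. The function `accumulate_counts` and its arguments are hypothetical and do not come from the reader under review. For clarity the sketch counts in a wide integer type before casting, which a memory-conscious reader would probably avoid.

```python
import warnings

import numpy as np


def accumulate_counts(pixel_index, n_pixels, dtype=np.uint8):
    """Count events per pixel and warn if `dtype` cannot hold the result.

    Hypothetical helper: `pixel_index` holds one flat pixel index per event.
    """
    # Count in NumPy's default integer type first so the true maximum is known.
    counts = np.bincount(pixel_index, minlength=n_pixels)
    limit = np.iinfo(dtype).max
    if counts.max() > limit:
        warnings.warn(
            f"Maximum count {counts.max()} exceeds the range of "
            f"{np.dtype(dtype).name} (max {limit}); pass a larger dtype.",
            UserWarning,
        )
    return counts.astype(dtype)


# Pixel 0 receives 300 events, which does not fit in uint8 but fits in uint16.
events = np.array([0] * 300 + [1] * 5)
safe = accumulate_counts(events, n_pixels=4, dtype=np.uint16)
print(safe)
```

Calling the same function with the default `dtype=np.uint8` on these events emits the warning instead of silently returning wrapped values.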
Thanks again for your help. |
Great, please don't hesitate to ask if you are not sure about something or if something is unclear! |
Yes, we can see the apb file, do you know what is in there? |
The APB file is created when I use the P.Back menu in the JEOL EDX program. I attach a screenshot of the APB file which is created in the JEOL example file of #2488 (comment) |
I think the live time data might be obtained from the APB file...? haha |
@yy-ssang, can you try #2607 again? I pushed a commit to skip this file. If you run hyperspy from a git repository, you will need to pull the latest changes; otherwise, you need to run again:
I have never used the JEOL program, so I will leave it to @sempicor (or anyone else interested in this) to add support for this file, if this is worth it. PR welcome obviously! |
I opened the ".APB" file and I don't know what it is supposed to be at this point. I just noticed a periodic pattern of 61 cycles, which could correspond to the number of scans. |
@sempicor: sounds like a good plan! Thanks @yy-ssang for the swift bug report; the PR adding reading support for JEOL files was only merged yesterday! |
@ericpre @sempicor Thank you all! The pts file opens fine! I will also report if there is any problem. |
Unaware of this effort, I have implemented my stand-alone reader for JEOL .pts files based on the initial effort of @sempicor (https://github.com/sempicor/jeol eds reader). So the basic algo is identical. I came across a few problems / questions:
The surplus data does not worry me right now, but the difference in the two spectra does. Does anybody have information on these issues? BTW my reader also has the option to split frames / sweeps, i.e. to read an N x X x Y x E data cube. This can be used e.g. to check for sample movement and everything else that is offered by the "play back" functionality of the JEOL software. |
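As an illustration of an N x X x Y x E data cube, the following sketch accumulates already-decoded X-ray events into a frame-resolved array. It assumes the stream has been decoded into per-event frame, x, y and channel indices; decoding the .pts stream itself is not shown, and `build_frame_cube` is a hypothetical name, not a function of either reader.

```python
import numpy as np


def build_frame_cube(frame, x, y, channel, shape, dtype=np.uint16):
    """Accumulate events into an (N, X, Y, E) array of counts.

    Hypothetical helper: each argument holds one integer index per event.
    """
    cube = np.zeros(shape, dtype=dtype)
    index = tuple(np.asarray(a) for a in (frame, x, y, channel))
    # np.add.at adds once per event, even when indices are repeated.
    np.add.at(cube, index, cube.dtype.type(1))
    return cube


# Five toy events spread over 2 frames, 2 x 2 pixels and 8 channels.
frame = [0, 0, 1, 1, 1]
x = [0, 0, 1, 1, 0]
y = [0, 0, 1, 1, 0]
channel = [5, 5, 2, 3, 5]
cube = build_frame_cube(frame, x, y, channel, shape=(2, 2, 2, 8))

# Summing over the frame axis gives the usual X x Y x E datacube.
summed = cube.sum(axis=0)
print(cube.shape, summed.shape)
```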
Hi @ialxn, Concerning your questions:
I hope this can help you. |
Provided that the length of this dimension is known in advance, it should be easy to add to the current implementation of the reader in hyperspy/hyperspy/io_plugins/jeol.py Lines 423 to 435 in 512d129
@ialxn, would you like to make a pull request for this change? It would be good to use the same keyword parameter as for the velox EDS reader ( @ialxn and @sempicor, I had a look at parallelising the for loop in readcube with numba (it has a very easy to use syntax for this), but to read the stream correctly (not missing any X-ray count!), we need to know where to split the stream, typically, where the |
Hi @ericpre, I'd be glad to help but I am not sure I understand what you asked. Thus x[n+1] < x[n] indicates that a new line scan has started, and y[n+1] < y[n] indicates that a new frame has started. Maybe you can use such a test to split the stream for parallelizing, but I don't know how this works. BTW I think the values of 24576 and 28672 indicate time information, as their sums are approximately 100 × realtime and 100 × livetime values given in the header. |
Yes, I figured this out when finishing this PR! @ialxn mentioned that the JEOL software has a "play back" functionality, and this makes me think that there should be a table somewhere providing the beginning of each frame, so that the software can efficiently pick up the right start without reading from the beginning of the vector. |
Ok @ericpre, I am not sure I saw such information, but there are still parts of the binary file I wasn't able to decode. |
First, thanks for all the comments and sorry to be late with my answers. Reading individual sweeps is easy. Once the slow scan axis restarts (new value is smaller than last recorded value) a new sweep has started. I had put together a summary to discuss with colleagues that provides a bit more information and also a few plots to illustrate the different points. Should be available for the next ten days at https://datatrans.psi.ch/?ShareToken=2DC1144743A799EB48EC23247B76CD9E3802FF42 @sempicor I have to go through my data but I will try to make one of the data files (I probably have only the .pts) available. |
@sempicor Here is the .pts used for figure 2 in summary.pdf posted above. This one only has additional data for the two tags that can be used for images such as figures 4 and 5 but nothing in the spectrum-like category. https://datatrans.psi.ch/?ShareToken=35270DE8A5A1A5C61C2273EA7F5F268CD8E3A8F7 |
@ialxn, thanks for the data. Concerning the spectral mismatch, it is strange because I don't see it in my data. Concerning the data in the range from 40960 to 45056, I am not sure they are spectrum data. I rather think they are image pixel values that might be used to check and correct drifts between sweeps. Concerning the data of 24576 and 28672, they allow the realtime and livetime parameters to be calculated. @ericpre, I am not really sure frame positions are hard-coded; however, there is a way to quickly extract them starting from rawdata (which is decoded by readcube): Then fpos will contain all positions where y goes back to its initial position, which means a new frame is starting. |
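The rule described above (a sweep starts whenever the slow-scan value is smaller than the previous one) can be sketched as follows. This is not the code of either reader; `frame_starts` is a hypothetical helper and it assumes the slow-scan positions have already been extracted from the stream in acquisition order.

```python
import numpy as np


def frame_starts(y):
    """Return the indices in `y` at which a new sweep (frame) begins.

    Hypothetical helper: `y` is the sequence of slow-scan positions.
    """
    # Cast to a signed type so that differences of unsigned values can be negative.
    y = np.asarray(y).astype(np.int64)
    # A sweep restarts wherever the slow-scan coordinate decreases.
    restarts = np.flatnonzero(np.diff(y) < 0) + 1
    return np.concatenate(([0], restarts))


# Toy sequence of slow-scan positions covering three sweeps.
y = np.array([0, 0, 1, 2, 2, 0, 1, 1, 2, 0, 2])
starts = frame_starts(y)
frames = np.split(y, starts[1:])
print(starts, [f.tolist() for f in frames])
```

One caveat of this simple test: if a sweep ends on a lower row than the row where the next sweep records its first value, the restart goes undetected, so it relies on position values being present throughout each sweep.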
Parallel processing the data: How about the following rough idea
Trying to give a picture: x |
@sempicor Data in the range from 40960 to 45056: if this is indeed related to the drift correction, it should only be present if the option to correct for sample movement in the JEOL software is active. Could you check (PSI is in partial lockdown and I will only have access to the instrument / software next Thursday at the earliest)? And, as this data is recorded in the data stream, I would assume it needs to be used to obtain the correct images. |
@ialxn, I don't have access to the software either. Moreover, I discovered the existence of values ranging from 40960 to 45056 in data provided by @yy-ssang. I have never seen this kind of value before, which explains the lack of implementation. You'll see 60 images that look like 'View000_0000000.img', and the last one is truncated, probably because the acquisition was aborted without finishing the current frame. |
@sempicor, @yy-ssang yes, I see the same with the 128.pts I provided (50 images, 128 x 128 pixels). I also "think" that the EDX data is corrected, because I usually get drifts between individual sweeps (determined by cross-correlation between sweeps) of around 1 pixel. I'll try to provide a data set (I have access to the instrument next Thursday) where I move the stage while acquiring data in order to have a large sample movement. So I can check if the saved EDX data shows up "blurred" if all sweeps are summed. Ok, this issue seems to be (mostly) resolved. |
@ericpre with regard to the pull request for reading individual frames: probably not worth it, because my code is really ugly and the implementation is simple. Here is the code in question:
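For reference, a drift estimate by cross-correlation between two sweep images can be sketched as below. This is a generic FFT-based circular cross-correlation with integer-pixel precision; it is an assumption that something similar was used here, and `estimate_shift` is a hypothetical name.

```python
import numpy as np


def estimate_shift(reference, image):
    """Estimate the integer (dy, dx) shift of `image` relative to `reference`."""
    # Circular cross-correlation computed through the Fourier transform.
    f_ref = np.fft.fft2(reference)
    f_img = np.fft.fft2(image)
    xcorr = np.fft.ifft2(f_img * np.conj(f_ref)).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak positions beyond half the image size correspond to negative shifts.
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, xcorr.shape)
    )


# Toy test: shift a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((32, 32))
moved = np.roll(reference, shift=(2, -3), axis=(0, 1))
shift = estimate_shift(reference, moved)
print(shift)
```

Real sweep images are noisy and not periodic, so windowing or sub-pixel refinement may be needed in practice.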
|
@sempicor Yes, the spectrum shift seems to be related to the ExCoef parameter. I tried the following:
From this I get
BTW do you have an idea where the scale factor of the spectrum from tag EDXREF ( |
@ialxn, thanks to you I finally understood what the ExCoef parameters mean. You can see that the agreement between the EDXRF spectrum and the interpolated sum spectrum (right) is better. However, as I don't know how JEOL performs its own correction, there is still a significant difference. I am sorry, I still don't understand what the scale factor you mentioned for EDXREF is. EDXRF (not EDXREF) appears twice in the metadata: once in "EDS Data/AnalyzableMap MeasData/Data", where it is the EDXRF spectrum (int32), and once in "PTTD Param/Params/PARAMPAGE1_EDXRF", where you have "CH Res" (channel resolution? of 0.01 eV, float64), "E Noise" (energy noise? of 45?, float64), "Fano F" (?, 0.12?, float64), "NumCH" (number of channels?, 4096, uint32) and the Tpl modes we already discussed. |
Regarding parallelisation of reading the stream, the simple approach I have taken is not great because of the increased memory usage. I will keep what I did in a commit in a branch of my fork, in case it is useful in the future. To work around the memory usage issue, an alternative would be to use a sparse array, as is currently done with the Velox emd reader. This would also help with adding support for lazy signals. |
Hi all, thank you for developing the code. My sample drifted during the experiment, so I want to align the image series in order to align the EDX data. I would like to ask how I could extract the image series acquired from the .pts file. I have tried to modify the jeol.py code as shown in the attached pictures, but the image I got wasn't the right one. |
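To show why a sparse representation helps, here is a NumPy-only sketch that stores only the non-empty bins as coordinates plus counts (a COO-style layout). It is a stand-in for illustration, not the sparse array implementation used by the Velox emd reader; `to_coo` and `to_dense` are hypothetical names.

```python
import numpy as np


def to_coo(y, x, channel):
    """Collapse events into unique (y, x, channel) coordinates and their counts."""
    events = np.stack([y, x, channel], axis=1)
    coords, counts = np.unique(events, axis=0, return_counts=True)
    return coords, counts


def to_dense(coords, counts, shape):
    """Expand the coordinate/count pairs back into a dense array."""
    dense = np.zeros(shape, dtype=counts.dtype)
    # Coordinates are unique, so plain fancy-index assignment is sufficient.
    dense[tuple(coords.T)] = counts
    return dense


# Three toy events, two of which fall into the same bin.
y = np.array([0, 0, 1])
x = np.array([1, 1, 0])
channel = np.array([7, 7, 3])
coords, counts = to_coo(y, x, channel)
dense = to_dense(coords, counts, shape=(2, 2, 8))
print(coords.tolist(), counts.tolist())
```

The storage of the sparse form scales with the number of non-empty bins rather than with X x Y x E, which is what makes it attractive for EDS maps with few counts per pixel.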
@IanNTU, even if this is still work in progress, you may want to test the branch of this PR: #2846. |
Closes #2257.
Description of the change
Future improvements
Apologies
This is my first pull request and I am not familiar with GitHub
Thanks for your attention