
JEOL eds data plugins #2488

Merged - 16 commits merged into RELEASE_next_minor on Dec 27, 2020

Conversation

@arnduveg (Contributor) commented Aug 19, 2020

Closes #2257.

Description of the change

  • upload "jeol.py" in io_plugins to read several files made by JEOL Analysist Station software (".ASW": summary file, ".IMG": image file, ".MAP": first ones is an image file similar to ".IMG" other ones are elemental maps, ".PTS": eds data file, ".EDS": eds spectrum file.
  • add "jeol" reference in init.py

Future improvements

  • Datacube reconstruction is quite slow.
  • A MemoryError is raised when trying to reconstruct the whole datacube (512×512×4096), so only part of the data is read.

Apologies

This is my first pull request and I am not familiar with GitHub.

Thanks for your attention

@ericpre (Member) commented Aug 21, 2020

This looks like a good start!
Can you add some tests? Ideally by adding the smallest possible files generated by the JEOL software.

@arnduveg (Contributor, Author)

Here you can find some files generated by the JEOL Analysis Station software:
jeol_exemple_files.zip

"rawdata.ASW" is the main file. It contains all project file links but also some other information such as the working area width useful for image scale.
all data files are stored in "Sample" folder. In this example you have :

  • "View000_0000000.img" which is the haadf image
  • "View000_0000001.map" which is again the haadf image
  • "View000_0000002.map" to "View000_0000016.map" which are the elemental maps of selected elements in this project
  • "View000_0000017.pts" is the most important file which contain eds data.

Example :

import hyperspy.api as hs

s = hs.load("rawdata.ASW")  # load all the files of the project
s_img = hs.load("Sample/00_View000/View000_0000000.img")  # load the HAADF image
s_map_Zn = hs.load("Sample/00_View000/View000_0000016.map")  # load the Zn elemental map
s_eds = hs.load("Sample/00_View000/View000_0000017.pts")  # load the EDS datacube

Hope it helps.

@jat255 (Member) commented Aug 21, 2020

@sempicor Thanks for the addition; I think it will be great to support another microscopy file format in HyperSpy. By "tests", @ericpre meant actual pytest test methods, like those defined in our tests folder.

This is an example for the FEI formats: https://github.com/hyperspy/hyperspy/blob/RELEASE_next_minor/hyperspy/tests/io/test_fei.py

These tests run automatically whenever new code is included, and they ensure that we do not introduce regressions or other bugs. Please see the developer guide for more detail on writing test cases. We will not be able to include any new features until all the added code is covered by test cases.

@codecov bot commented Aug 24, 2020

Codecov Report

Merging #2488 (9919c32) into RELEASE_next_minor (7bdee7a) will increase coverage by 0.17%.
The diff coverage is 92.30%.


@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2488      +/-   ##
======================================================
+ Coverage               76.49%   76.66%   +0.17%     
======================================================
  Files                     202      203       +1     
  Lines                   29808    30125     +317     
  Branches                 6520     6567      +47     
======================================================
+ Hits                    22801    23095     +294     
- Misses                   5195     5205      +10     
- Partials                 1812     1825      +13     
Impacted Files                      Coverage Δ
hyperspy/io_plugins/__init__.py     73.91% <ø> (ø)
hyperspy/io_plugins/jeol.py         92.30% <92.30%> (ø)
hyperspy/model.py                   80.33% <0.00%> (ø)
hyperspy/axes.py                    85.93% <0.00%> (+0.34%) ⬆️


@arnduveg (Contributor, Author)

Thanks for your reply.

I started to write some tests for loading a whole JEOL project, an individual image file, and the EDS datacube.

@ericpre (Member) left a comment

@sempicor, nice to see that you sorted out the tests! A few comments for now:

  • Would it be possible to use a smaller file than 12 MB (JEOL_files/Sample/00_View000/View000_0000017.pts) for the test suite? Including "large" files is problematic, and we usually generate a very small dataset just for this purpose.
  • It would be good to reformat the code to make it more readable (and easier to review). We follow PEP 8, and you could run black on these files.
  • Regarding saving memory when loading the EDS datacube:
    • the EMD Velox and Bruker BCF readers have a parameter to rebin (rebin_energy or downsample) when loading;
    • similarly, you could set the dtype of the numpy array to save on array size - see, for example, the parameters of the EMD Velox reader and the sketch below.
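For illustration, the loading call could then look something like this (just a sketch; rebin_energy and SI_dtype are hypothetical keyword names borrowed from the other readers, not something implemented in this PR yet):

import hyperspy.api as hs

# Sum every 4 energy channels and store counts as uint16 instead of float64,
# which shrinks the in-memory size of a 512x512x4096 datacube considerably.
s_eds = hs.load("Sample/00_View000/View000_0000017.pts",
                rebin_energy=4, SI_dtype="uint16")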

@tjof2 (Contributor) commented Aug 26, 2020

Also, is it worth squashing these commits, since the last 19 have basically identical messages?

@ericpre (Member) commented Aug 26, 2020

> Also, is it worth squashing these commits, since the last 19 have basically identical messages?

Yes, agreed, and I would also like to have the 12 MB file removed, because it will stay in the history and make the git history unnecessarily large... Maybe this can be tidied up after the test files are sorted?
On a similar topic, I have noticed that I let a similarly "large" file get into the history in the nexus reader PR (#2328), which I would like to try to clean up when we split hyperspy...

@arnduveg closed this Aug 26, 2020
@arnduveg reopened this Aug 26, 2020
@arnduveg (Contributor, Author)

@ericpre

  • I replaced the 13.2 MB test file with a 1.4 MB one. I hope it is better now.
  • I will check how to reformat the code soon.
  • I found a solution to the memory problem: I was trying to create a 512×512×4096 numpy array in float64 (because I hadn't specified a dtype). It works much better with uint8 (see the sketch below).
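For the record, a minimal illustration of the difference (the array shape is the one from this dataset):

import numpy as np

# float64 (the default): 512 * 512 * 4096 * 8 bytes = 8 GiB
# uint8:                 512 * 512 * 4096 * 1 byte  = 1 GiB
hypermap = np.zeros((512, 512, 4096), dtype=np.uint8)
print(hypermap.nbytes / 1024**3)  # 1.0 (GiB)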

@tjof2, sorry, I am a beginner with git and GitHub, and these commits were just superfluous trial and error to work out why the checks failed. I will try to be more concise next time.

@tjof2 (Contributor) commented Aug 26, 2020

> Sorry, I am a beginner with git and GitHub, and these commits were just superfluous trial and error to work out why the checks failed. I will try to be more concise next time.

@sempicor no problem at all! If you're not sure how to squash the commits here, we can help.

@arnduveg (Contributor, Author)

> Sorry, I am a beginner with git and GitHub, and these commits were just superfluous trial and error to work out why the checks failed. I will try to be more concise next time.

> @sempicor no problem at all! If you're not sure how to squash the commits here, we can help.

@tjof2, that would be very nice, because I have no idea how to do that.

@ericpre (Member) commented Aug 26, 2020

@sempicor: I have squashed the commits in a branch in my fork (https://github.com/ericpre/hyperspy/commits/JEOL_io_plugin). We have two ways to proceed:

  • either you pull my branch (mentioned above) and then force-push it to the branch of this PR (https://github.com/sempicor/hyperspy/tree/RELEASE_next_minor),
  • or I push to the branch of this PR for you and then you pull it. Since it will be different from your local branch, the easiest is probably to delete the old one and check out the new one.

@tjof2 (Contributor) commented Aug 27, 2020

@ericpre or selecting the "squash and merge" option from the dropdown when merging the PR might work?

@ericpre (Member) left a comment

Looking at the test coverage, read_eds is not tested.

Regarding the current approach to solving the memory error: it will work fine for data with very low counts in the spectrum image (fewer than 256 counts, since this uses the uint8 data type), but it doesn't work if some pixels have higher counts, which is common. Therefore another approach needs to be used - see one of my comments above, or have a look at how other readers deal with this specific issue.

@ericpre (Member) commented Sep 20, 2020

@sempicor: I have rebased and pushed to your branch, which means that your remote branch has now diverged from your local branch. The easiest for you is to check out this branch and continue from there.

@arnduveg (Contributor, Author)

Thanks @ericpre for your help.
Indeed, I added read_eds at a later stage and forgot to write tests for it.
I'll try to do that soon.

Regarding the use of uint8, I am not sure it is a problem, as having more than 256 counts in a pixel implies performing at least 256 sweeps, which is far more than the way we work.
For example, the largest dataset I have had to read used 124 sweeps and the maximum pixel value was only 21.
However, I agree that it would be nice to have some option to reduce the hypermap (maybe a cutoff), as most of the time only energies lower than 20 keV are useful.

@ericpre (Member) commented Sep 21, 2020

> Regarding the use of uint8, I am not sure it is a problem, as having more than 256 counts in a pixel implies performing at least 256 sweeps, which is far more than the way we work.
> For example, the largest dataset I have had to read used 124 sweeps and the maximum pixel value was only 21.
> However, I agree that it would be nice to have some option to reduce the hypermap (maybe a cutoff), as most of the time only energies lower than 20 keV are useful.

It is bad practice to hard-wire the data type: there are people who acquire EDS with high statistics, and they should be able to open their data too. If there is no way to predict the required dtype, you could simply add an optional parameter to specify it, along with a warning when the data reach the maximum of the dtype, advising the user to increase it via the optional parameter. Alternatively, you could reload the data a second time with a larger dtype; this would be slower but more correct, again with a warning advising the user to set the dtype to load faster.
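A minimal sketch of the warning idea (read_datacube and its dtype parameter are hypothetical here; numpy.iinfo gives the maximum representable value of an integer dtype):

import warnings
import numpy as np

def read_datacube(rawdata, shape, dtype="uint8"):
    # Accumulate the X-ray counts with a user-selectable dtype.
    hypermap = np.zeros(shape, dtype=dtype)
    # ... fill hypermap from the event stream ...
    if hypermap.max() == np.iinfo(hypermap.dtype).max:
        warnings.warn(
            f"The data may be clipped: some values reached the maximum of "
            f"{hypermap.dtype}; consider passing a larger dtype."
        )
    return hypermap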

@arnduveg (Contributor, Author)

Thanks again for your help.
I'll try to manage this as soon as possible.

@ericpre (Member) commented Sep 22, 2020

Great, please don't hesitate to ask if you are not sure about something or if something is unclear!

@ericpre (Member) commented Dec 28, 2020

Yes, we can see the .apb file - do you know what is in there?

@yy-ssang
The APB file is created when I use the P.Back menu in the JEOL EDX program.

I attach a screenshot of the APB file created from the JEOL example files of #2488 (comment).

[Screenshot: contents of the APB file]

@yy-ssang
I think that the live-time data can be obtained from the APB file...? Haha...

@ericpre (Member) commented Dec 28, 2020

@yy-ssang, can you try #2607 again? I pushed a commit to skip this file. If you run hyperspy from a git repository, you will need to pull the latest changes; otherwise, you need to run again:

pip install https://github.com/ericpre/hyperspy/archive/fix_stream_jeol.zip

I have never used the JEOL program, so I will leave it to @sempicor (or anyone else interested in this) to add support for this file, if it is worth it. PRs welcome, obviously!

@arnduveg (Contributor, Author)

I opened the ".APB" file, and I don't know what it is supposed to be at this point. I just noticed a periodic pattern of 61 cycles, which could correspond to the number of scans.
I'll try to work on it when I have time, but for the moment I think it is better to skip it.

@ericpre (Member) commented Dec 28, 2020

@sempicor: sounds like a good plan!

Thanks @yy-ssang for the swift bug report - the PR adding reading support for JEOL files was only merged yesterday!

@yy-ssang
@ericpre @sempicor Thank you all! The .pts file opens well! I will report if there is any problem.

@ialxn (Contributor) commented Jan 7, 2021

Unaware of this effort, I implemented my own stand-alone reader for JEOL .pts files based on the initial effort of @sempicor (https://github.com/sempicor/jeol eds reader), so the basic algorithm is identical.

I came across a few problems / questions:

  • The spectra seem to have an offset of 96 channels (my data has 4096 energy channels). I could not find this offset in the header, so is it hard-coded? What would the offset be with only 2048 channels? Probably 48.

  • If I compare the reference spectrum read from the tag EDXREF to the summed spectrum of the whole data cube, I notice that the scale differs by about a factor of 10e44. No worries there, but if I calculate the difference of the normalized (and shifted) spectra, I find large differences in roughly the first 60 channels. Not only do the intensities differ, but the shape of the spectra is also different!

  • There is additional data in the .pts files with values in the range 40960 ≤ d < 45056 that seems to be "spectrum-like". Additionally, there is data with values 24576 or 28672 that shows a distorted periodic pattern when plotted as an image. What do these data represent?

The surplus data does not worry me right now, but the difference in the two spectra does. Does anybody have information on these issues?

BTW, my reader also has the option to split frames/sweeps, i.e. to read an N × X × Y × E data cube. This can be used, e.g., to check for sample movement and everything offered by the "play back" functionality of the JEOL software.

@arnduveg (Contributor, Author) commented Jan 7, 2021

Hi @ialxn,

Concerning your questions:

  • Yes, the offset is hard-coded. It is stored in the DigZ variable. There are several DigZ variables, depending on the T1 to T4 or Quanti mode, so be sure to take the correct value. You will also have a DigL variable, which indicates the number of values replaced by 0 at the beginning of the spectrum.
  • I didn't notice the problem you mention regarding differences between the summed spectrum and the EDXRF spectrum, except that in my case the EDXRF spectrum displays only 2048 of the 4096 channels and the first 10 values are replaced by 0. You can check this on the example files I put on hyperspy.
  • In the project you forked, I had memory problems, so I deliberately cut the datacube. This problem was fixed in the hyperspy JEOL plugin. To sum up, values ranging from 32768 to 36864 correspond to the y position, from 36864 to 40960 to the x position, and from 45056 to 49152 to the channel position. At this point I still don't know what the values 24576 and 28672 are.

I hope this helps.

@ericpre (Member) commented Jan 7, 2021

> BTW, my reader also has the option to split frames/sweeps, i.e. to read an N × X × Y × E data cube. This can be used, e.g., to check for sample movement and everything offered by the "play back" functionality of the JEOL software.

Provided that the length of this dimension is known in advance, it should be easy to add to the current implementation of the reader:

@numba.njit(cache=True)
def readcube(rawdata, hypermap, rebin_energy, channel_number,
             width_norm, height_norm):  # pragma: no cover
    for value in rawdata:
        if value >= 32768 and value < 36864:
            x = int((value - 32768) / width_norm)
        elif value >= 36864 and value < 40960:
            y = int((value - 36864) / height_norm)
        elif value >= 45056 and value < 49152:
            z = int((value - 45056) / rebin_energy)
            if z < channel_number:
                hypermap[y, x, z] += 1
    return hypermap

@ialxn, would you like to make a pull request for this change? It would be good to use the same keyword parameter as the Velox EDS reader (sum_frames) to be consistent.

@ialxn and @sempicor, I had a look at parallelising the for loop in readcube with numba (it has a very easy-to-use syntax for this), but to read the stream correctly (not missing any X-ray count!), we need to know where to split the stream - typically, where the x or y values start. Considering I don't use this reader, I didn't try to figure it out any further, but if you are aware that this information is available somewhere, it should be easy to implement.

@arnduveg (Contributor, Author) commented Jan 7, 2021

Hi @ericpre, I'd be glad to help, but I am not sure I understand what you are asking.
I found that the JEOL EDS raw data are simply a big vector starting at the 16384th byte. The value of each datum indicates whether it is an x, y or channel position. For example, for the first 30 data points you can have:
b pos: value -> comment
b4000: 36864 -> y1=0
b4002: 32864 -> x1=12
b4004: 45185 -> e1=129 -> +1 count at (0,12,129)
b4006: 33040 -> x2=34
b4008: 45207 -> e2=151 -> +1 count at (0,34,151)
b400A: 33320 -> x3=69
b400C: 45174 -> e3=118 -> +1 count at (0,69,118)
b400E: 24576 -> realtime=1
b4010: 28672 -> livetime=1
b4012: 33800 -> x4=129
b4014: 45181 -> e4=125 -> +1 count at (0,129,125)
b4016: 34000 -> x5=154
b4018: 45308 -> e5=252 -> +1 count at (0,154,252)
b401A: 34024 -> x6=157
b401C: 45178 -> e6=122 -> +1 count at (0,157,122)
b401E: 45176 -> e7=120 -> +1 count at (0,157,120)
b4020: 34064 -> x7=162
b4022: 46047 -> e8=991 -> +1 count at (0,162,991)
b4024: 34168 -> x8=175
b4026: 45794 -> e9=738 -> +1 count at (0,175,738)
b4028: 24576 -> realtime=2
b402A: 28672 -> livetime=2
b402C: 34520 -> x9=219
b402E: 45181 -> e10=125 -> +1 count at (0,219,125)
b4030: 24576 -> realtime=3
b4032: 28672 -> livetime=3
b4034: 35240 -> x10=309
b4036: 45957 -> e11=901 -> +1 count at (0,309,901)
b4038: 35280 -> x11=314
b403A: 45194 -> e12=138 -> +1 count at (0,314,138)
...

Thus x(n+1) < x(n) indicates the start of a new line scan, and y(n+1) < y(n) indicates the start of a new frame. Maybe you can use such a test to split the stream for parallelizing, but I don't know how this works.

BTW, I think the values 24576 and 28672 carry time information, as their totals are approximately 100 × the realtime and livetime values given in the header (see the sketch below).
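A quick way to check this (a sketch; rawdata is the uint16 event vector read as described above):

import numpy as np

# Count the occurrences of the two marker values in the event stream.
n_realtime = np.count_nonzero(rawdata == 24576)
n_livetime = np.count_nonzero(rawdata == 28672)

# If the guess is right, these should roughly match the header values.
print(n_realtime / 100, n_livetime / 100)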

@ericpre (Member) commented Jan 7, 2021

Yes, I figured this out when finishing this PR!
To parallelise, we basically need to know, before starting to read the raw data, how to split it so that chunks can be processed simultaneously - in parallel. All processes are independent and don't share information, and when they have all finished, their outputs are recombined.
The issue is that if we split in the wrong place, for example between an x and a y value, then that process will be missing the x value and will not know where to put the corresponding X-ray count.

@ialxn mentioned that the JEOL software has a "play back" functionality, and this makes me think that there should be a table somewhere giving the beginning of each frame, so that the software can efficiently pick up the right start without reading from the beginning of the vector.

@arnduveg (Contributor, Author) commented Jan 8, 2021

OK @ericpre, I am not sure I have seen such information, but there are still parts of the binary file I wasn't able to decode.
I'll try to work on that. I also noticed new features in the file @yy-ssang provided that I will have to decipher to improve the io_plugin.
@ialxn, would it be possible to have a look at an example of one of your .pts files, to check the spectrum problem you mentioned?

@ialxn (Contributor) commented Jan 8, 2021

First, thanks for all the comments, and sorry to be late with my answers.

Reading individual sweeps is easy: once the slow scan axis restarts (the new value is smaller than the last recorded value), a new sweep has started.

I had put together a summary to discuss with colleagues that provides a bit more information and also a few plots to illustrate the different points. It should be available for the next ten days at:

https://datatrans.psi.ch/?ShareToken=2DC1144743A799EB48EC23247B76CD9E3802FF42

@sempicor I have to go through my data, but I will try to make one of the data files (I probably only have the .pts) available.

@ialxn (Contributor) commented Jan 8, 2021

@sempicor Here is the .pts file used for figure 2 in the summary.pdf posted above. This one only has additional data for the two tags that can be used for images, such as figures 4 and 5, but nothing in the spectrum-like category.

https://datatrans.psi.ch/?ShareToken=35270DE8A5A1A5C61C2273EA7F5F268CD8E3A8F7

@arnduveg (Contributor, Author) commented Jan 8, 2021

@ialxn, thanks for the data.

Concerning the spectral mismatch, it is strange, because I don't see it in my data:
[plot: compare_spectrum]
However, on both your data and @yy-ssang's there are discrepancies between the two spectra before channel 52:
[plots: compare_spectra_ialxn, compare_spectra_yy-ssang]
I think the solution to this problem might reside in the ExCoef parameter ([0.00987743, 0.964696, 0.0158119, 0.525] in your case), as the last value is the energy corresponding to channel 52. At this point I don't know what ExCoef is supposed to be, but it might be scale/shift parameters for the low-energy portion of the spectrum.

Concerning the data in the range 40960 to 45056, I am not sure they are spectrum data. I rather think they are image pixel values that might be used to check and correct drift between sweeps.

Concerning the values 24576 and 28672, they allow calculating the realtime and livetime parameters.

@ericpre, I am not really sure the frame positions are hard-coded; however, there is a way to extract them quickly from rawdata (which is decoded by readcube):

ypos = np.where((rawdata >= 36864) * (rawdata < 40960))[0]
Y = np.array(rawdata[ypos], dtype=float)
fpos = np.where(Y[1:] < Y[:-1])[0]

Then fpos will contain all the positions where y goes back to its initial position, which means a new frame starts.
The same would also work on the x values to determine every time a new line starts, but I am not sure it would be useful.

@ialxn (Contributor) commented Jan 8, 2021

Parallel processing of the data: how about the following rough idea (a sketch follows the picture below):

  • Divide the whole data into a few (equal) chunks.
  • Due to the sequential data format (x and y values are only updated when needed), x and y might not be known at the beginning of a chunk.
  • The first chunk can be processed as we do now.
  • For all other chunks: start at the end of the previous chunk and scan backwards until you find at least one x and one y value. We might, e.g., find several x values before we find the first y value; in this case only retain the first one found (this value, i.e. the last one in the stream, is still valid at the beginning of the chunk).

Trying to give a picture:

x
E
x
E
y
E
x (b)
y
E
y (a)
----> split here. Search backwards in previous chunk to find x (b) and y (a) required
E
----> If the split occurs here, y found at (a) is updated by y (c)
y (c)
E
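A minimal sketch of that backward scan (the value ranges are those discussed above; whether the 32768 range is x or y depends on the indexing convention):

import numpy as np

def chunk_start_state(rawdata, start):
    # Scan backwards from `start` to find the last x and y values that
    # are still valid at the beginning of the chunk.
    x = y = None
    for i in range(start - 1, -1, -1):
        v = rawdata[i]
        if x is None and 32768 <= v < 36864:
            x = int(v - 32768)
        elif y is None and 36864 <= v < 40960:
            y = int(v - 36864)
        if x is not None and y is not None:
            break
    return x, y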

@ialxn (Contributor) commented Jan 8, 2021

@sempicor Data in the range 40960 to 45056: if this is indeed related to the drift correction, it should only be present if the "correct for sample movement" option in the JEOL software is active. Could you check? (PSI is in partial lockdown, and I will only have access to the instrument/software next Thursday at the earliest.) And, as these data are recorded in the data stream, I would assume they need to be used to obtain the correct images.

@arnduveg (Contributor, Author) commented Jan 8, 2021

@ialxn, I don't have access to the software either. Moreover, I discovered the existence of values in the 40960-45056 range in the data provided by @yy-ssang; I had never seen this kind of value before, which explains the lack of implementation.
I noticed that in @yy-ssang's data every (x, y, f) tuple is followed by a 40960 <= d < 45056 value even if no e datum is associated, whereas in my data (x, y) are written only if an e datum is recorded. That is why I guess they might correspond to image pixels rather than EDS (spectrum) data.
If you want to have a look at the reconstruction of the 40960 <= d < 45056 datacube, you can try:

fd = open('YY-SSANG/example/Sample/00_View000/View000_0000008.pts', "br")  # from the 'example.zip' provided earlier by @yy-ssang
fd.seek(8 * 16**3)
rawdata = np.fromfile(fd, dtype="u2")
ipos = np.where((rawdata >= 40960) * (rawdata < 45056))[0]
I = np.array(rawdata[ipos] - 40960, dtype=float)
test = np.nan * np.ones([256 * 256 * 61])  # in this case the data are 61 sweeps of 256x256 images
test[:len(ipos)] = I
test = test.reshape(61, 256, 256)

You'll see 60 images that look like 'View000_0000000.img'; the last one is truncated, probably because the acquisition was aborted without finishing the current frame.
I don't know why these data are recorded in @yy-ssang's case but not in mine, nor what the JEOL software uses them for. If you have both kinds of dataset, maybe you'll find the answer.

@yy-ssang commented Jan 9, 2021

@sempicor I think you are right. The example data I uploaded were acquired with the 'Immediately' end mode, not the 'Frame End' end mode, so the truncated last frame is due to stopping the acquisition immediately. This is the last frame in the JEOL EDX program:

[Screenshot: the last, truncated frame in the JEOL EDX program]

@ialxn (Contributor) commented Jan 10, 2021

@sempicor, @yy-ssang yes, I see the same with the 128.pts I provided (50 images, 128 × 128 pixels). I also "think" that the EDX data are drift-corrected, because I usually get drifts between individual sweeps (determined by cross-correlation between sweeps) of around 1 pixel. I'll try to provide a data set (I have access to the instrument next Thursday) where I move the stage while acquiring data, in order to have a large sample movement. Then I can check whether the saved EDX data show up "blurred" when all sweeps are summed.

OK, this issue seems to be (mostly) resolved.

@ialxn (Contributor) commented Jan 10, 2021

@ericpre, with regard to the pull request for reading individual frames: probably not worth it, because my code is really ugly and the implementation is simple. Here is the code in question:

        if self.split_frames:
            dcube = np.zeros([self.meta.Sweep, self.meta.im_size, self.meta.im_size, N_spec], dtype=dtype)
        else:
            dcube = np.zeros([1, self.meta.im_size, self.meta.im_size, N_spec], dtype=dtype)
        N = 0
        N_err = 0
        unknown = {}
        frame = 0
        x = -1
        # Data is mapped as follows:
        #   32768 <= datum < 36864                  -> y-coordinate
        #   36864 <= datum < 40960                  -> x-coordinate
        #   45056 <= datum < END (=45056 + N_ch)    -> count registered at energy
        END = 45056 + self.meta.N_ch
        scale = 4096 / self.meta.im_size
        # map the size x size image into 4096x4096
        for d in data:
            N += 1
            if 32768 <= d < 36864:
                y = int((d - 32768) / scale)
            elif 36864 <= d < 40960:
                d = int((d - 36864) / scale)
                if self.split_frames and d < x:
                    # A new frame starts once the slow axis (x) restarts. This
                    # does not necessary happen at x=zero, if we have very few
                    # counts and nothing registers on first scan line.
                    frame += 1
                x = d
            elif 45056 <= d < END:
                z = int(d - 45056)
                ##################################################
                #                                                #
                #  tentative OFFSET by 96 channels (see #59_60)  #
                #                                                #
                ##################################################
                z -= 96
                if N_spec > z >= 0:
                    dcube[frame, x, y, z] = dcube[frame, x, y, z] + 1
            else:
                if verbose:
                    # I have no idea what these data mean
                    # collect statistics on these values for debug
                    if str(d) in unknown:
                        unknown[str(d)] += 1
                    else:
                        unknown[str(d)] = 1
                    N_err += 1
        if verbose:
            print('Unidentified data items ({} out of {}, {:.2f}%) found:'.format(N_err, N, 100*N_err/N))
            for key in sorted(unknown):
                print('\t{}: found {} times'.format(key, unknown[key]))
        return dcube

@ialxn (Contributor) commented Jan 10, 2021

@sempicor Yes, the spectrum shift seems to be related to the ExCoef parameter. I tried the following:

E_uncorr = np.arange(53) / 100
E_corr = ExCoef[0]*E_uncorr**2 + ExCoef[1]*E_uncorr + ExCoef[2]

delta_E = E_corr - E_uncorr  # Energy difference
delta_CH = 100*delta_E       # Channel difference

From this I get delta_CH[0] = 1.58 and delta_CH[52] = 0.012, which seems to make sense looking at your plots above. I guess the rebinning (up to an energy of ExCoef[3]) should only be performed in functions that return spectra. As for the rebinning, would the following make sense (channel 0 as an example; see the sketch after this list):

  • delta_CH = 1.58, i.e. the counts of uncorrected channel 0 are divided between (corrected) channels 1 and 2 (the channels that bracket delta_CH).
  • Place 42% of the counts of uncorrected channel 0 into corrected channel 1 and 58% into corrected channel 2, i.e. the counts are on average placed at channel 1.58.
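A sketch of that fractional redistribution (counts is the uncorrected spectrum and delta_CH the per-channel shifts computed above; both names are assumed from the snippets in this thread):

import numpy as np

def shift_channels(counts, delta_CH):
    # Redistribute the counts of channel i between the two corrected
    # channels bracketing i + delta_CH[i], weighted by the fractional part.
    out = np.array(counts, dtype=float)
    n = len(delta_CH)
    out[:n] = 0.0  # the low channels are rebuilt from the redistribution
    for i, d in enumerate(delta_CH):
        pos = i + d
        lo = int(pos)
        frac = pos - lo
        out[lo] += (1 - frac) * counts[i]
        out[lo + 1] += frac * counts[i]  # channel n may receive a small extra fraction
    return out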

BTW, do you have an idea where the scale factor of the spectrum from the tag EDXREF (float32) comes from?

@arnduveg (Contributor, Author)

@ialxn, thanks to you I finally understood what the ExCoef parameters mean.
Indeed, they are used to recalculate the first energies; then, I guess, you can interpolate the EDX counts onto the new energies.

[plot: shift_correction]

You can see that the difference between the EDXRF spectrum and the interpolated sum spectrum (right) is smaller. However, as I don't know how JEOL performs its own correction, there is still a significant difference.
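For reference, the interpolation step could look like this (a sketch only; spectrum is the summed spectrum and ExCoef the values read from the header, as in the snippets above):

import numpy as np

E_uncorr = np.arange(4096) / 100  # 0.01 keV per channel
# Recalculate the low energies with the ExCoef polynomial (up to ExCoef[3]).
low = E_uncorr < ExCoef[3]
E_corr = E_uncorr.copy()
E_corr[low] = ExCoef[0] * E_uncorr[low]**2 + ExCoef[1] * E_uncorr[low] + ExCoef[2]

# Interpolate the corrected spectrum back onto the regular energy axis.
spectrum_corr = np.interp(E_uncorr, E_corr, spectrum)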

I am sorry, I still don't understand what the scale factor you mention for EDXREF is. EDXRF (not EDXREF) appears twice in the metadata: once in "EDS Data/AnalyzableMap MeasData/Data", but that is the EDXRF spectrum (int32), and once in "PTTD Param/Params/PARAMPAGE1_EDXRF", where you have "CH Res" (channel resolution? of 0.01 eV, float64), "E Noise" (energy noise? of 45?, float64), "Fano F" (?, 0.12?, float64), "NumCH" (number of channels?, 4096, uint32) and the Tpl modes we already discussed.

@ericpre mentioned this pull request Jan 12, 2021
@ericpre (Member) commented Jan 13, 2021

Regarding parallelisation of reading the stream, the simple approach I have taken is not great because of the increased memory usage. I will keep what I did in a commit in a branch of my fork, in case it can be useful in the future. To work around the memory usage issue, an alternative would be to use a sparse array, as is currently done with the Velox EMD reader. This would also help with adding support for lazy signals.

@lanNTU commented Dec 2, 2021

Hi all, thank you for developing this code. My sample drifted during the experiment, so I want to align the image series in order to align the EDX data. How could I extract the image series acquired in the .pts file?

I have tried to modify the jeol.py code as shown in the pictures attached, but the image I got wasn't the right one.

[Screenshots: pic1, pic2, pic3]

@ericpre (Member) commented Dec 2, 2021

@IanNTU, even if this is still work in progress, you may want to test the branch of this PR: #2846.

Development

Successfully merging this pull request may close the following issue: Loading .pts EDX files (JEOL format) (#2257)

7 participants