AHI HRIT reader has gotten slower #1384

Closed
pnuu opened this issue Oct 1, 2020 · 44 comments · Fixed by #1986

Comments


pnuu commented Oct 1, 2020

Describe the bug
Compared to older versions of Satpy, the ahi_hrit reader seems to have become much slower.

To Reproduce

  1. Create an environment with Satpy 0.19.1.
  2. Create an environment with Satpy 0.23.0.
  3. Compare the timings when creating the same set of composites, e.g. by running the script and composite definition below. The built-in composite needs to be overridden, as it isn't consistent across these versions.
#!/usr/bin/env python

import glob
import os
os.environ['PYTROLL_CHUNK_SIZE'] = "1024"
os.environ['DASK_NUM_WORKERS'] = "2"
os.environ['OMP_NUM_THREADS'] = "1"
import time

from satpy import Scene


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/geo/himawari-8/IMG*20201001070*")
    comps = ["truecolor",]
    glbl = Scene(reader='ahi_hrit', filenames=fnames)
    glbl.load(comps)
    glbl.save_datasets(base_dir='/tmp/')


if __name__ == "__main__":
    tic = time.time()
    main()
    print("Elapsed time:", time.time() - tic)

in $PPP_CONFIG_DIR/composites/ahi.yaml

  truecolor:
    compositor: !!python/name:satpy.composites.SelfSharpenedRGB
    prerequisites:
    - name: B03
      modifiers: [sunz_corrected, rayleigh_corrected]
    - name: green
    - name: B01
      modifiers: [sunz_corrected, rayleigh_corrected]
    high_resolution_band: red
    standard_name: truecolor

in $PPP_CONFIG_DIR/enhancements/ahi.yaml

  truecolor:
    standard_name: truecolor
    operations:
    - name: cira_stretch
      method: !!python/name:satpy.enhancements.cira_stretch

Expected behavior
Similar timings for the old and new versions. The other chains I've upgraded across the same versions have actually gotten faster. For the full set of composites I got processing times (via Trollflow) of around 10 minutes. With the new version(s) the process crashes after around 28 minutes when it runs out of memory.

Actual results
For Satpy 0.19.1: 49.9 s
For Satpy 0.23.0: 59.2 s

Environment Info:

  • OS: Linux
  • Satpy Versions: 0.19.1 and 0.23.0
  • Pyspectral Versions: 0.9.2 and 0.10.0

pnuu commented Oct 1, 2020

The only differences I see in the hrit_jma.py file are the addition of the start_time and end_time properties and the addition of acq_time to the coordinates. The scanline time parsing itself looks quite fast, so it can't be that. But was there something about setting .coords that triggers computation of the resulting xr.DataArray?
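
A quick self-contained check (generic xarray/dask code, not the reader itself; values made up) suggests that assigning a coordinate to a dask-backed DataArray does not by itself trigger computation:

import dask.array as da
import numpy as np
import xarray as xr

# A lazy dask-backed DataArray, similar in spirit to what the file handler returns
arr = xr.DataArray(da.zeros((3, 4), chunks=2), dims=("y", "x"))
acq_time = np.array(
    ["2020-10-01T07:00:00", "2020-10-01T07:00:01", "2020-10-01T07:00:02"],
    dtype="datetime64[ns]")
# Attach the per-scanline acquisition times as a coordinate
arr = arr.assign_coords(acq_time=("y", acq_time))
print(type(arr.data))  # still a dask array, so nothing was computed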


pnuu commented Oct 1, 2020

No, no extra computations. I used a custom scheduler to check that, and both 0.19.1 and 0.23.0 call da.compute() 4 times.
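
For reference, a minimal sketch of that kind of check (my actual custom scheduler may have differed in details): wrap the threaded scheduler in a callable that counts its invocations.

import dask
import dask.threaded

compute_count = 0

def counting_scheduler(dsk, keys, **kwargs):
    """Count scheduler invocations, then delegate to the default threaded scheduler."""
    global compute_count
    compute_count += 1
    print(f"Compute() called {compute_count} times")
    return dask.threaded.get(dsk, keys, **kwargs)

# Activate for the whole run, e.g. at the top of the benchmark script:
# dask.config.set(scheduler=counting_scheduler)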


djhoese commented Oct 1, 2020

Are we sure the reader is slower? Maybe the compositing? What about dask versions? Same input storage device? Same timings after repeated executions?


pnuu commented Oct 1, 2020

I'm starting to lean towards compositing. With simple composites like natural_color the new version is somewhat faster, also for ahi_hrit, but when creating four composites using DayNightCompositor the computation time increases from 115 s to 160 s.

The weird thing is that the compositing is slow only with the ahi_hrit reader. For example seviri_l1b_hrit, which uses the same base class, has gotten faster between these two versions. The same goes for the other chains I've upgraded, using the viirs_compact, avhrr_l1b_aapp and abi_l1b readers.

With 0.19.1 the support library versions are whatever were current in early November last year.

@simonrp84

Maybe it's specific to the composite being used?
Your AHI composite has rayleigh_corrected as a modifier; does your SEVIRI composite? The default ones on GitHub don't have the Rayleigh correction.
Not sure how much that helps, as the Rayleigh correction has been around for many years, but maybe it'll point things in the right direction.


pnuu commented Oct 1, 2020

I ran the "four Day/Night composites" case again with the custom scheduler. The old version calls compute() four times (total time 136 s):

[INFO: 2020-10-01 15:53:30 : satpy.composites] Getting reflective part of B07
Compute() called 1 times
Compute() called 2 times
Compute() called 3 times

[INFO: 2020-10-01 15:53:50 : satpy.writers] Computing and writing results...
Compute() called 4 times

For the new version compute() is called only once (187 s):

[INFO: 2020-10-01 15:56:24 : satpy.writers] Computing and writing results...
Compute() called 1 times

So I'm now thinking it's something in Pyspectral. Fancy indexing or something like that?


pnuu commented Oct 1, 2020

Maybe it's specific to the composite being used?
Your AHI composite has rayleigh_corrected as a modifier, does your SEVIRI composite? The default ones on github don't have the Rayleigh correction.

The GOES/ABI truecolor has the same modifiers applied without a slowdown; the SEVIRI composites don't have Rayleigh.


pnuu commented Oct 2, 2020

Here are the profiles Dask diagnostics give.

With 0.19.1 (Pyspectral 0.9.2):
[dask profiler graph for Satpy 0.19.1]

With 0.23.0 (Pyspectral 0.10.0):
[dask profiler graph for Satpy 0.23.0]

Two things I notice with 0.23.0 are the roughly 1 GB higher memory usage and the much longer stretch of single-CPU usage after about 80 s has elapsed.


pnuu commented Oct 2, 2020

In the above graphs the colorful bits at the end are calls to store_chunk(). So for some reason the new installation (rasterio 0.1.6, gdal 3.1.2) takes much more time to perform these calls, and they happen in a single thread. The older installation has rasterio 1.1.0 and gdal 3.0.2.

In both cases the input data are read from the same location (NAS via NFS) and the results are also saved to the same location (local /tmp/).


mraspaud commented Oct 2, 2020

You mention gdal and rasterio; could you try taking them out of the loop by e.g. writing a PNG file instead?


mraspaud commented Oct 2, 2020

Also, you show that the reflective part of B07 is used; that's not for true color, right?


pnuu commented Oct 2, 2020

Also, you show that the reflective part of B07 is used; that's not for true color, right?

That is for the later case with the four Day/Night composites, which also exhibit the slowness.

I'm now testing saving to PNG, results in a bit.


pnuu commented Oct 2, 2020

And here are the profile graphs when saving with glbl.save_datasets(base_dir='/tmp/', writer='simple_image', filename='{name}_{start_time:%Y%m%d_%H%M%S}.png').

Satpy 0.19.1 (142 seconds):
[dask profiler graph for Satpy 0.19.1, simple_image writer]
Satpy 0.23.0 (132 seconds):
[dask profiler graph for Satpy 0.23.0, simple_image writer]


pnuu commented Oct 2, 2020

So: rasterio slows down saving with the new Satpy (for AHI HRIT only). The memory usage is the same for both saving methods, with the new Satpy using 1 GB more RAM.


mraspaud commented Oct 2, 2020

Another thing I'm thinking about: the size of the resulting images is the same, right? So it's not compression or something...


pnuu commented Oct 2, 2020

There are some minor differences in the filesizes, but nothing really pronounced:

$ ls -l /tmp/*20201001* | awk '{print $5, $8;}'
57151471 /tmp/cloud_phase_with_dust_20201001_080000.tif
58287673 /tmp/cloud_phase_with_dust_20201001_080020.tif
58448131 /tmp/convective_storms_with_24h_microphysics_20201001_080000.tif
58444157 /tmp/convective_storms_with_24h_microphysics_20201001_080020.tif
56867602 /tmp/day_microphysical_with_night_microphysical_20201001_080000.tif
55772658 /tmp/day_microphysical_with_night_microphysical_20201001_080020.tif
64940820 /tmp/day_solar_with_ash_20201001_080000.tif
64025212 /tmp/day_solar_with_ash_20201001_080020.tif

The first file of each pair is from Satpy 0.19.2, the second (with seconds in the filename) from Satpy 0.23.0.


pnuu commented Oct 2, 2020

Looking at the images, I think the file size differences are due to minute differences in the blending of the Day/Night composites, and to the change in NIR reflectance cutting (pytroll/pyspectral#112).


pnuu commented Oct 3, 2020

On Slack @mraspaud noticed that commenting out the self.start_time and self.end_time properties from hrit_jma.py::HRITJMAFileHandler reduced the processing time, and there were some thoughts that the differing times of the segments might cause problems in the metadata combination code.

I did another dask profiler test by commenting out only the start_time property. The resulting CPU/memory graphs are below.

Satpy 0.23.0:
[dask profiler graph with the start_time property in place]

Satpy 0.23.0 with the start_time property removed:
[dask profiler graph with the start_time property removed]

So indeed, the start_time property using acq_time is causing the increase in processing time and memory.

These graphs were made on my work laptop, which has more memory than the server all the previous graphs were made on, so apart from the peak memory use these results are not comparable with the earlier ones.


djhoese commented Oct 3, 2020

I haven't been completely following this, but are the start time and end time produced by the file handler single datetime objects, or are they a single (scalar) array of type object or type datetime?


pnuu commented Oct 3, 2020

The start_time and end_time properties of the base class return single datetime objects. The versions in hrit_jma.py also return single datetime objects, but those are obtained from an array of np.datetime64 values and converted to datetime.
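
For illustration, that conversion (with a made-up acquisition time value) looks roughly like this:

from datetime import datetime
import numpy as np

# Per-scanline acquisition times as stored by the reader (value made up)
acq_time = np.array(["2016-10-07T03:00:20.131200"], dtype="datetime64[us]")
start_time = acq_time[0].astype(datetime)
print(type(start_time), start_time)  # <class 'datetime.datetime'> 2016-10-07 03:00:20.131200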


pnuu commented Oct 7, 2020

Currently, with acq_time used for start_time, the file times for the IMG*001 segments of the different channels covering the same area are as follows (sorted by time):

2016-10-07 03:00:20.131200
2016-10-07 03:00:20.131200
2016-10-07 03:00:20.131200
2016-10-07 03:00:20.304000
2016-10-07 03:00:20.304000
2016-10-07 03:00:20.304000
2016-10-07 03:00:20.476800
2016-10-07 03:00:20.476800
2016-10-07 03:00:20.476800
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.563200
2016-10-07 03:00:20.822400
2016-10-07 03:00:20.822400
2016-10-07 03:00:20.822400
2016-10-07 03:00:20.908800
2016-10-07 03:00:20.908800
2016-10-07 03:00:20.908800
2016-10-07 03:00:20.908800
2016-10-07 03:00:20.908800
2016-10-07 03:00:20.908800
2016-10-07 03:00:21.168000
2016-10-07 03:00:21.168000
2016-10-07 03:00:21.168000
2016-10-07 03:00:21.254400
2016-10-07 03:00:21.254400
2016-10-07 03:00:21.254400
2016-10-07 03:00:21.427200
2016-10-07 03:00:21.427200
2016-10-07 03:00:21.427200
2016-10-07 03:00:21.513600
2016-10-07 03:00:21.513600
2016-10-07 03:00:21.513600
2016-10-07 03:00:21.600000
2016-10-07 03:00:21.600000
2016-10-07 03:00:21.600000

With the older version where the nominal time is used, all of these had start_time 2016-10-07 03:00:00.


pnuu commented Oct 8, 2020

A couple more tests:

Remove fractional seconds:

start_time = self.acq_time[0].astype(datetime)
return start_time - dt.timedelta(microseconds=start_time.microsecond)

-> still slow and uses more memory

Remove seconds and fractional seconds:

start_time = self.acq_time[0].astype(datetime)
return start_time - dt.timedelta(seconds=start_time.second, microseconds=start_time.microsecond)

-> faster and lower memory usage.

This verifies that accessing the acquisition times isn't the culprit; it's something else.

I'll see what happens in the satpy.yaml_reader.FileYAMLReader.start_time property, which does return min(x[0].start_time for x in self.file_handlers.values()), with different versions.


pnuu commented Oct 8, 2020

Interesting. Doing (removing seconds and their fractions)

start_time = min(x[0].start_time for x in self.file_handlers.values())
start_time = start_time - dt.timedelta(seconds=start_time.second, microseconds=start_time.microsecond)
return start_time

in the satpy.yaml_reader.FileYAMLReader.start_time property doesn't help. So the slow and memory-hungry bit happens somewhere between the file handler (hrit_jma.py) code and this use of the property.


pnuu commented Oct 8, 2020

As far as I can see, all of this happens before the dependency tree is initialized. I'm running out of ideas to test.


pnuu commented Oct 8, 2020

Next test: memory profiling satpy.Scene.__init__(). No change in memory use at this point between self.acq_time[0].astype(datetime) and self._start_time.


pnuu commented Oct 8, 2020

Memory profiling Scene.save_datasets() didn't reveal anything that would hint at the root cause (delayed processing and all that), but shows nicely the increase in memory use:

Using self.acq_time[0].astype(datetime)

1369 5113.441 MiB 4928.590 MiB           return writer.save_datasets(datasets, compute=compute, **save_kwargs)

Using self._start_time

1369 3813.168 MiB 3628.602 MiB           return writer.save_datasets(datasets, compute=compute, **save_kwargs)

The number fields are: code line number / memory usage / increase in memory usage for that line.
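
That format matches the line-by-line report of the memory_profiler package (an assumption about the exact tool used here); a minimal way to produce such a report looks like this:

from memory_profiler import profile  # pip install memory-profiler

from satpy import Scene


@profile  # prints line number / memory usage / increment for each line of the function
def process(fnames):
    scn = Scene(reader="ahi_hrit", filenames=fnames)
    scn.load(["truecolor"])
    scn.save_datasets(base_dir="/tmp/")

# Running the script normally prints the decorated function's report when it returns.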


pnuu commented Oct 8, 2020

Interestingly, using writer='simple_image' reverses the memory consumption between the two cases.


pnuu commented Nov 12, 2020

Some more testing. If I load only plain channels without any modifiers, there is practically no difference in processing time. So now I'm suspecting that the time-dependent modifiers (sunz_corrected, rayleigh_corrected) have something to do with the slowdown.


pnuu commented Dec 13, 2021

Another test. Replacing

    @property
    def start_time(self):
        """Get start time of the scan."""
        return self.acq_time[0].astype(datetime)

with a static time identical to that of the Scene I'm testing:

    @property
    def start_time(self):
        """Get start time of the scan."""
        return datetime(2021, 12, 8, 6, 0, 20)

reduces the memory usage. So it isn't the "extra" seconds, compared to the start_time derived from the filename, that cause the problem.

Doing self._start_time = times64[0].astype(datetime) in _get_acq_time() and using that in the start_time property is again slower and uses more memory. The same slowness happens when using self._start_time = times64[0].copy().astype(datetime) to get rid of potential NumPy references in the Dask graph.


pnuu commented Jan 19, 2022

Ok. Next hypothesis: each channel has a slightly different start_time, sometimes differing only on the order of milliseconds. This might cause the SZA correction to be computed separately for each VIS channel.


pnuu commented Jan 19, 2022

Yes! Removing the composites where sunz_corrected is used also removes the increase in memory usage. In this case it doesn't matter whether the start_time property is commented out or left as it currently is.

satpy.modifiers.SunZenithCorrector calls satpy.modifiers.angles.get_cos_sza(), which uses the start_time of the given dataset. I'll test whether rounding the time to a full second/minute changes things.
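
A sketch of what that rounding could look like (hypothetical helper, not existing Satpy code):

import datetime as dt

def round_to_minute(t):
    """Round a datetime to the closest full minute."""
    discard = dt.timedelta(seconds=t.second, microseconds=t.microsecond)
    rounded = t - discard
    if discard >= dt.timedelta(seconds=30):
        rounded += dt.timedelta(minutes=1)
    return rounded

print(round_to_minute(dt.datetime(2016, 10, 7, 3, 0, 20, 131200)))  # 2016-10-07 03:00:00

# The cache key in SunZenithCorrector could then use the rounded time, e.g.
# key = (round_to_minute(vis.attrs["start_time"]), area_name)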


pnuu commented Jan 19, 2022

Adjusting the cache key creation in SunZenithCorrector from

        key = (vis.attrs["start_time"], area_name)

to

        key = (0, 1)

should always re-use the cached coszen array, but my debug printing shows it is always recomputed. Adjusting the start_time property still has the desired effect even though the prints show repeated computations. I'm again getting lost 😅


djhoese commented Jan 19, 2022

How are you determining that things are being recomputed?


pnuu commented Jan 19, 2022

Just by my debug prints in the if/elif/else blocks. The code should go to the else branch (my addition for debug printing) when the coszen data are available from the cache, but the value is always None and the code goes to the if block where satpy.modifiers.angles.get_cos_sza() is called. So the cache is always returning None, no matter what the cache key is.

I'm getting too tired now for further work, but I'll continue tomorrow if there's nothing more urgent happening.


djhoese commented Jan 19, 2022

I wonder if the WeakValueDictionary used for the cache is no longer doing anything useful after various satpy refactorings and new versions of python.

Either way, I asked about computing because there is a chance that if get_cos_sza() produces the same set of dask tasks then it should be alright. There is a chance that the default names dask generates for the dask arrays are too unique, and later on when it goes to optimize the dask graph it doesn't realize that the tasks are the same. For some of these tasks we may be able to generate the name manually to ensure that this doesn't happen. So what I wanted to make sure of by asking my question was whether the dask array is being generated multiple times (multiple calls to get_cos_sza) or whether it is being computed multiple times (multiple equal arrays being generated during .compute()). I see that you mean the former.
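
A toy illustration of the naming point (plain dask, not Satpy code, and not necessarily how get_cos_sza behaves):

import dask.array as da

a = da.ones((4,), chunks=2)
b = da.ones((4,), chunks=2)
print(a.name == b.name)  # True: the name is a hash of the inputs, so identical calls share tasks

c = da.random.random((4,), chunks=2)
d = da.random.random((4,), chunks=2)
print(c.name == d.name)  # False: each call gets a unique name, so nothing can be reused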


pnuu commented Jan 19, 2022

Printing the size of self.coszen_cache at the beginning of each call to SunZenithCorrector prints 0 and at the end 1. So the cache seems to get cleared between correcting different channels.


djhoese commented Jan 19, 2022

That will likely happen when there are no more handles (reference counts) on the value returned by the dictionary. If we are only using the dask array (which it looks like we are) then maybe when the DataArray version of the cza result is garbage collected it is also removed from the WeakValueDictionary cache.
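
A self-contained demonstration of that behaviour (plain Python; the class below is just a stand-in for the cached cos(SZA) array):

import gc
import weakref


class CosZen:
    """Stand-in for the cached cos(SZA) DataArray (hypothetical)."""


cache = weakref.WeakValueDictionary()
value = CosZen()
cache[("2020-10-01 08:00:20", "some_area")] = value
print(len(cache))  # 1 while a strong reference to `value` exists

del value          # drop the last strong reference
gc.collect()       # immediate in CPython anyway
print(len(cache))  # 0: the entry vanished, i.e. the cache "cleared itself"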


pnuu commented Jan 20, 2022

Yep, the WeakValueDictionary got cleared after use, as all the calls were considered unique (different start_times). Changing the caching to a normal dictionary shows the accumulation of cached cossza arrays, and further rounding the time used in the cache key shows that the cached cossza is re-used. The processing time and memory use are not affected, though 🤦‍♂️.


pnuu commented Jan 20, 2022

A simple workaround would be to add a reader kwarg use_exact_start_time=True to the file handler and either return the time with microsecond resolution (the current default) or self._start_time, which uses the datetime parsed from the filename. But I think the better choice would be to round the cached data time to the closest minute(?), similar to how the angles are rounded to the closest 0.5 degrees (or something) when caching on disk.
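
A minimal sketch of the kwarg idea (attribute and kwarg names taken from this comment; the actual implementation in the PR may differ):

    @property
    def start_time(self):
        """Get start time of the scan."""
        # self.use_exact_start_time would be set from the reader kwarg in __init__ (hypothetical)
        if self.use_exact_start_time:
            # Exact per-channel scanline time with microsecond resolution (current behaviour)
            return self.acq_time[0].astype(datetime)
        # Nominal time parsed from the filename, identical for all channels and segments
        return self._start_time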


pnuu commented Jan 20, 2022

I went the easy way and made a PR using the reader_kwarg approach.

@mraspaud

Rounding to the closest second would work here, right? We could even have a flag for the caching that says how much time difference is acceptable.


pnuu commented Jan 20, 2022

I tried that, but evidently didn't find all the places where data_arr.attrs['start_time'] is used, as adding rounding to those places didn't help. And those places were plentiful, so I couldn't think of a single place, other than the reader, where the rounding could have been done instead of doing it in several places.


djhoese commented Jan 20, 2022

So making sure that the cossza is reused didn't affect performance? And we're still not sure where/how things are going wrong?


pnuu commented Jan 20, 2022

No, it didn't affect performance. Only using the nominal time lowered the memory usage and made the whole processing go faster.
