
Extend use of bz2 compression for input files for seviri_l1b_hrit #1796

Closed
pdebuyl opened this issue Aug 19, 2021 · 9 comments · Fixed by #1798
Labels
enhancement code enhancements, features, improvements

Comments

@pdebuyl (Contributor) commented Aug 19, 2021

Feature Request

Is your feature request related to a problem? Please describe.

I use MSG SEVIRI data in HRIT format. For some reason, our archive stores the epilogue and prologue files compressed with bzip2 on disk. Currently, it is not possible to pass bzip2-compressed files to the seviri_l1b_hrit reader.

Describe the solution you'd like

Add .bz2 to the list of supported extensions for prologue and epilogue files and use the unzip_file helper to decompress them as needed.

Describe any changes to existing user workflow

This does not remove any existing feature and does not change how one would use satpy.

Additional context

In principle, I could provide decompressed files to satpy but it is of course more convenient if the process is automatic.

I fully understand if this extra convenience is not added because it is too specific. Otherwise, I would be willing to draft a PR using another bz2-aware reader as an example.

@mraspaud added the enhancement (code enhancements, features, improvements) label Aug 19, 2021
@mraspaud (Member) commented Aug 19, 2021

@pdebuyl Interesting! Is your archive accessible to all or is it only for internal usage?
Anyway, the impact is small, so I would be fine with this being implemented if it makes your life easier :)

The unzip_file helper would indeed work great, but I was wondering if we could make it even better by making it a context manager? That way, it would be possible to do:

with unzip_file(somefilename) as non_compressed_file:
    with open(non_compressed_file) as fobj:
        ...

where unzip_file would yield the decompressed filename if the file was compressed in the first place, or the original filename otherwise.
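A minimal sketch of how such a context manager could look (the `.bz2` suffix check and the temporary-file cleanup here are assumptions for illustration, not satpy's actual implementation):

```python
import bz2
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def unzip_context(filename):
    """Yield a decompressed copy of *filename* if it ends in .bz2,
    otherwise yield the original filename unchanged.

    Hypothetical context-manager form of the unzip_file helper.
    """
    if filename.endswith(".bz2"):
        fd, tmp_path = tempfile.mkstemp()
        try:
            with bz2.open(filename, "rb") as src, os.fdopen(fd, "wb") as dst:
                shutil.copyfileobj(src, dst)
            yield tmp_path
        finally:
            # The decompressed copy only lives for the duration of the block.
            os.remove(tmp_path)
    else:
        yield filename
```

With this shape, both compressed and plain files go through the same `with` block, and cleanup happens automatically when the block exits.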

What do you think?

@pdebuyl (Contributor, author) commented Aug 20, 2021

Hi @mraspaud thank you for replying!

Our archive is internal. It is a straight dump from EUMETCast though, with the prologue, epilogue, and cloud mask files bzip2-compressed. The HRIT files use the internal compression (extension C_).

Regarding the use of a context manager: does satpy rely on lazy reading in some of the unzip_file-using readers? If so, the context would have to extend over the whole lifetime of the xarray dataset, which is less trivial. In other words: should the uncompressed file survive after xr.open_dataset is called?

@mraspaud (Member)

Some of the readers do indeed read the data lazily. However, this is not the case for the prologue and epilogue files as far as I know, and when we do read lazily, we use np.memmap, which should in principle hold on to the file until closed (so even if the file is removed, the data will still be available):

In [15]: arr = np.ones((10000, 10000))

In [16]: arr.tofile("ones", "")

In [19]: data = np.memmap("ones", dtype=float)

In [20]: data
Out[20]: memmap([1., 1., 1., ..., 1., 1., 1.])

In [21]: os.remove("ones")

In [22]: data
Out[22]: memmap([1., 1., 1., ..., 1., 1., 1.])

In [23]: data * 5
Out[23]: array([5., 5., 5., ..., 5., 5., 5.])

@pdebuyl (Contributor, author) commented Aug 20, 2021

I have started to test an update. I tried first at the level of HRITFileHandler.__init__, but this calls _get_hd to obtain header information, which means that the file is already open at that point.

Current approach: modify self.filename with the temporary unzipped file before reaching _get_hd.

It seems to work but needs some cleaning up. Also, would you like me to replace the current uses of unzip_file with the context manager?

I'll give an update next week.
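The approach described above (swapping self.filename for a temporary decompressed copy before _get_hd runs) could be sketched roughly like this. These are toy stand-ins, not satpy's actual HRITFileHandler or unzip_file; only the ordering of the swap relative to header parsing is the point:

```python
import bz2
import os
import tempfile


def unzip_file(filename):
    """Simplified stand-in for satpy's unzip_file helper: decompress a
    .bz2 file to a temporary file and return its path, or return None
    if the file is not compressed."""
    if not filename.endswith(".bz2"):
        return None
    fd, tmp_path = tempfile.mkstemp()
    with bz2.open(filename, "rb") as src, os.fdopen(fd, "wb") as dst:
        dst.write(src.read())
    return tmp_path


class HRITFileHandler:
    """Toy stand-in for the real handler: _get_hd opens self.filename."""

    def __init__(self, filename):
        # Swap in the decompressed copy *before* _get_hd runs, so the
        # header parsing never sees the bz2 file.
        unzipped = unzip_file(filename)
        self.filename = unzipped if unzipped else filename
        self.header = self._get_hd()

    def _get_hd(self):
        # Stand-in for header parsing: read the first few bytes.
        with open(self.filename, "rb") as fobj:
            return fobj.read(4)
```

Note that with a plain (non-context-manager) unzip_file like this, the temporary file is never cleaned up, which matches the clean-up problem discussed below.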

@mraspaud (Member)

I was thinking about using unzip_file here: https://github.com/pytroll/satpy/blob/main/satpy/readers/seviri_l1b_hrit.py#L229-L240
and here: https://github.com/pytroll/satpy/blob/main/satpy/readers/seviri_l1b_hrit.py#L301-L306

Regarding replacing the current usages, if it's possible, I definitely think we should use the context syntax :)

@pdebuyl (Contributor, author) commented Aug 23, 2021

I found that you need to enter the context before this point: https://github.com/pytroll/satpy/blob/main/satpy/readers/hrit_base.py#L160

Otherwise, the _get_hd routine opens the zipped file.

I have the following as a WIP: https://github.com/pdebuyl/satpy/tree/bzip2_PRO_EPI

I don't remove the temporary files yet, as with this solution they are removed too early.

@mraspaud (Member)

OK, I understand. So indeed, _get_hd needs to use the unzipping context too.

@pdebuyl (Contributor, author) commented Aug 23, 2021

I have not yet looked at the image file handling, but for the prologue and epilogue, the files are opened and closed in _get_hd and re-opened as necessary.

Another solution: open the context before the __init__ of HRITMSGPrologueFileHandler and keep it open until read_prologue (and read_epilogue) is called.

I pushed this other solution to the same branch; if it is accepted, I can make another branch or squash it.
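One way to express that second solution, assuming the context-manager-style unzip helper discussed earlier: enter the context explicitly in __init__ (e.g. via contextlib.ExitStack) and close it once read_prologue has consumed the file. This is a hypothetical sketch with stand-in names, not satpy's actual code:

```python
import bz2
import contextlib
import os
import tempfile


@contextlib.contextmanager
def unzip_context(filename):
    # Hypothetical helper: yield a decompressed path for .bz2 input,
    # otherwise yield the filename untouched; clean up on exit.
    if filename.endswith(".bz2"):
        fd, tmp_path = tempfile.mkstemp()
        try:
            with bz2.open(filename, "rb") as src, os.fdopen(fd, "wb") as dst:
                dst.write(src.read())
            yield tmp_path
        finally:
            os.remove(tmp_path)
    else:
        yield filename


class PrologueHandler:
    """Toy stand-in: keep the unzip context open from __init__ until
    read_prologue has read the data, then release the temp file."""

    def __init__(self, filename):
        self._stack = contextlib.ExitStack()
        self.filename = self._stack.enter_context(unzip_context(filename))

    def read_prologue(self):
        try:
            with open(self.filename, "rb") as fobj:
                return fobj.read()
        finally:
            # Temp file no longer needed once the prologue is in memory.
            self._stack.close()
```

ExitStack lets a with-style context span method boundaries, which is what "keep it until read_prologue is called" requires.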

@mraspaud (Member)

The easiest would be if you made a PR with your current branch, so we can review it properly. If you feel there are too many commits, we can do a squash merge when merging, so don't worry about that :)
