SSL4EO-S12: add new dataset/datamodule #1151

adamjstewart · 2023-02-28T21:34:59Z

Adds the SSL4EO-S12 dataset.

Some things the dataset doesn't currently support because I don't need them, but could potentially be added someday:

Automatic downloading: same as SEN12MS and So2Sat, we haven't yet figured out automatic downloads from the TUM servers
RGB version, 100-patch MSI subset, 50K RGB subset: alternate versions of the dataset on OneDrive and Google Drive
Combining s1, s2c, or s2a: we don't yet have an easy model for supporting this in trainers and transforms
Spectral band subsets: adds complexity to the dataset
GeoDataset: each file has geographic metadata, we could build a GeoDataset

Plotting seems broken at the moment; I'll upload plots once it's working better.

adamjstewart · 2023-02-28T21:35:29Z

tests/data/ssl4eo/data.py

+
+    if path.endswith("B1.tif") or path.endswith("B9.tif"):
+        profile["width"] = profile["height"] = SIZE // 6
+    elif (


There's probably a better way to do this...
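One possible alternative is to look up each band's native ground sample distance (GSD) and derive the raster size from it, instead of chaining `if`/`elif` branches. This sketch assumes each fake file is named after its Sentinel-2 band, as `B1.tif` and `B9.tif` are above. `SIZE`, `GSD` and `band_size` are hypothetical names, and the value of `SIZE` is arbitrary. The resolutions are the standard Sentinel-2 MSI band resolutions.

```python
import os

# Hypothetical test-raster size (pixels) at 10 m resolution.
SIZE = 36

# Native ground sample distance (m) of each Sentinel-2 MSI band.
GSD = {
    "B1": 60, "B2": 10, "B3": 10, "B4": 10, "B5": 20, "B6": 20, "B7": 20,
    "B8": 10, "B8A": 20, "B9": 60, "B10": 60, "B11": 20, "B12": 20,
}


def band_size(path: str) -> int:
    """Return the width/height of a fake band file, scaled by its GSD."""
    band = os.path.splitext(os.path.basename(path))[0]
    # Unknown bands (e.g. Sentinel-1 VV/VH) are assumed to be 10 m, i.e. full size.
    gsd = GSD.get(band, 10)
    # A 60 m band covers the same area with 6x fewer pixels than a 10 m band.
    return SIZE * 10 // gsd
```

In the test script this would reduce the branching to `profile["width"] = profile["height"] = band_size(path)`. For 60 m bands, `SIZE * 10 // 60` equals `SIZE // 6`, so the result matches the existing special case.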


adamjstewart · 2023-02-28T21:39:13Z

torchgeo/datasets/ssl4eo.py

+        directory = os.path.join(self.root, self.split, f"{index // self.times:07}")
+        subdirs = sorted(os.listdir(directory))
+        directory = os.path.join(directory, subdirs[index % self.times])


This is somewhat hard-coded, but is much faster than storing a list of every directory like we do in SeCo.
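The indexing arithmetic can be shown in isolation. The snippet implies that each location directory has a zero-padded seven-digit name and holds `times` timestamp subdirectories. Under that assumption, a flat sample index splits into a location and a timestamp with `divmod`. `locate_sample` is a hypothetical helper. The directory listing is passed in rather than read from disk, so the sketch runs without the dataset.

```python
import os


def locate_sample(root, split, index, times, list_subdirs):
    """Map a flat sample index to its timestamp directory.

    ``list_subdirs`` stands in for ``os.listdir`` so the logic can be
    exercised without the real dataset on disk.
    """
    location, timestamp = divmod(index, times)
    # Location directories are zero-padded to 7 digits, e.g. "0000012".
    directory = os.path.join(root, split, f"{location:07}")
    # Sorting makes the timestamp order deterministic across file systems.
    subdirs = sorted(list_subdirs(directory))
    return os.path.join(directory, subdirs[timestamp])
```

Only the directory being loaded is listed, so no per-sample path list has to be built or kept in memory. That is the trade-off against the SeCo approach described above.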

adamjstewart · 2023-02-28T21:39:45Z

torchgeo/datasets/ssl4eo.py

+        for band in self.metadata[self.split]["bands"]:
+            filename = os.path.join(directory, f"{band}.tif")
+            with rasterio.open(filename) as f:
+                image = f.read(out_shape=(1, self.size, self.size)).astype(np.float32)


Each Sentinel band is stored at its native resolution (10–60 m) and needs to be resampled to the highest (10 m) resolution.
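Passing `out_shape` to rasterio's `read` resamples on the fly, using nearest-neighbour resampling by default. For an integer upsampling factor, nearest neighbour should amount to repeating every source pixel, as this NumPy-only sketch does. `upsample_nearest` is a hypothetical helper for illustration and is not part of the dataset code.

```python
import numpy as np


def upsample_nearest(band: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsample a square 2D band to ``size`` x ``size``.

    Assumes ``size`` is an integer multiple of the band's side length,
    e.g. a 60 m band upsampled by a factor of 6 to the 10 m grid.
    """
    factor = size // band.shape[0]
    if factor * band.shape[0] != size:
        raise ValueError("size must be an integer multiple of the band size")
    # Repeat every pixel `factor` times along rows, then along columns.
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1).astype(np.float32)
```

To get a smoother result for continuous reflectance values, rasterio also accepts a `resampling` argument, e.g. `f.read(out_shape=(1, self.size, self.size), resampling=Resampling.bilinear)` with `from rasterio.enums import Resampling`.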

adamjstewart · 2023-02-28T21:40:29Z

torchgeo/datasets/ssl4eo.py

+            a matplotlib Figure with the rendered sample
+        """
+        if self.split == "s1":
+            # See Sentinel1.plot


Wish we could call the plot methods of Sentinel1 and Sentinel2 directly so we didn't have to copy the same logic into every dataset. I bet we could figure out a way to do this...

adamjstewart · 2023-03-01T05:07:02Z

torchgeo/datasets/ssl4eo.py

+
+            co_polarization = torch.clamp(co_polarization / 0.3, min=0, max=1)
+            cross_polarization = torch.clamp(cross_polarization / 0.05, min=0, max=1)
+            ratio = torch.clamp(ratio / 25, min=0, max=1)


@RitwikGupta this plotting trick isn't working; all I see is a black image with red specks. I know you're not the biggest fan of RGB plotting, but this dataset presumably only has one type of image processing, so it should be possible to plot it reliably. I'm not sure how to tell whether the image is power/decibel/amplitude or gamma/sigma, or what that actually means for plotting. Any advice? You can download the 100-patch subset if you want to test the plotting method.

@adamjstewart The Sentinel-1 imagery they pulled from GEE is in the decibel scale. Normally, Sentinel-1 data from Copernicus is in the power scale. The ratio you want to calculate should be in the power scale. This code will visualize this specific data:

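    import matplotlib.pyplot as plt
    import numpy as np

    # cross_pol and co_pol are (H, W, 1) arrays of dB values
    # (note: dB = 10 * log10(P), so the exact inverse is 10 ** (x / 10))
    cross_pol = np.exp(cross_pol / 10)
    co_pol = np.exp(co_pol / 10)
    ratio = cross_pol / co_pol
    fig, ax = plt.subplots(1, 4, figsize=(20, 10))
    ax[0].imshow(cross_pol, cmap='bone')
    ax[1].imshow(co_pol, cmap='bone')
    ax[2].imshow(ratio, cmap='bone')
    ax[3].imshow(np.concatenate([cross_pol, co_pol, ratio], axis=2))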

Here's the relevant GEE documentation that says they convert to decibel when pulling S1 data in GEE: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD
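Decibels are defined as dB = 10 · log10(P), so the inverse is P = 10 ** (dB / 10). By contrast, `np.exp(dB / 10)` uses the natural base and gives different values. The sketch below converts GEE-style dB backscatter to power and then applies the divisors from the existing plotting code (co-pol / 0.3, cross-pol / 0.05, ratio / 25). S1 backscatter in dB is typically negative, so clamping raw dB values with those constants would send most pixels to 0. That may explain the mostly black image. Two details are assumptions of this sketch: the ratio is taken as co / cross, and `db_to_power`/`s1_rgb` are hypothetical names.

```python
import numpy as np


def db_to_power(x_db: np.ndarray) -> np.ndarray:
    """Invert dB = 10 * log10(P)."""
    return 10 ** (x_db / 10)


def s1_rgb(co_db: np.ndarray, cross_db: np.ndarray) -> np.ndarray:
    """Build an (H, W, 3) RGB composite from dB co-/cross-pol bands.

    The divisors mirror the constants in the plotting code above; the
    ratio being co / cross is an assumption of this sketch.
    """
    co = db_to_power(co_db)
    cross = db_to_power(cross_db)
    ratio = co / cross
    channels = [co / 0.3, cross / 0.05, ratio / 25]
    # Clamp each channel to [0, 1] for display, like torch.clamp above.
    return np.clip(np.stack(channels, axis=-1), 0, 1)
```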

I thought it was supposed to be [co_pol, cross_pol, ratio], not [cross_pol, co_pol, ratio]?

Am I not supposed to do normalization (co_pol / 0.3, cross_pol / 0.05, ratio / 25)?

Even after dividing by 10, with no normalization and co_pol/cross_pol swapped, the images don't look great.

Which image are you using? I can test with the same one.

I think you can keep [co_pol, cross_pol, ratio].

Are you doing np.exp(x / 10) or just x / 10?

You should use the log scale values in that post for viz purposes only. It's not required, but it may help. Do min-max norm to those ranges, not just division.

Oh yes, I was using x / 10 instead of np.exp(x / 10). However, it still looks pretty bad. I also tried 10 ** (x / 10). I'm just plotting the first 20 images of the 100 patch subset if you want to try to tinker with it.

In the post, it's unclear to me how one is supposed to plot dB data. If I immediately convert from dB to power, would I use the linear values or the dB values? Is linear different from power? What does it mean for values to range from -15 to 5 (for red)? Does that mean to convert values from [-15, 5] to [0, 1]? Do I perform np.exp(x / 10) before or after all this normalization? Do I use co / cross or co - cross?
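One reading of "min-max norm to those ranges" is to clip each channel to a fixed [low, high] interval and map that interval linearly onto [0, 1]. The [-15, 5] example quoted for red is a range of negative and small positive numbers, so it can only apply to dB values, not power. `minmax_db` is a hypothetical helper, and the ranges for the other channels would have to come from the post.

```python
import numpy as np


def minmax_db(x_db: np.ndarray, low: float, high: float) -> np.ndarray:
    """Clip dB values to [low, high] and rescale that interval to [0, 1]."""
    return (np.clip(x_db, low, high) - low) / (high - low)
```

For example, `minmax_db(red_db, -15, 5)` maps -15 dB to 0, -5 dB to 0.5 and 5 dB to 1. Values outside the range saturate.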

Why do you want to plot power values? In my paper, I simply plot the dB values (clipped at the 99th percentile, with each channel scaled to [0, 1]).

I wanted to try to make an RGB plot (see #821 (comment) for an example) because it's prettier and in theory provides more useful information. Unfortunately it's also much harder to plot. We could plot 2 grayscale plots in the worst case scenario.

Then could you also plot dB for each channel, e.g., [VV, VH, VV/VH]?

I think that's what I was doing originally and it didn't work either. I just copied the plotting code from our Sentinel1 dataset, which worked fine for data downloaded from ASF DAAC, but apparently not for data downloaded from GEE.
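log(a / b) = log(a) - log(b), so the VV/VH ratio in dB is simply VV_dB - VH_dB. No division of dB values is needed. The sketch below builds the [VV, VH, VV/VH] composite suggested above entirely in dB and rescales each channel by its own percentiles, following the 99th-percentile approach mentioned earlier. The 1st-percentile lower bound is an assumption, and `percentile_scale`/`s1_db_composite` are hypothetical names.

```python
import numpy as np


def percentile_scale(x: np.ndarray, lower: float = 1, upper: float = 99) -> np.ndarray:
    """Clip to the given percentiles and rescale to [0, 1]."""
    lo, hi = np.percentile(x, [lower, upper])
    if hi == lo:
        # Constant channel: avoid dividing by zero.
        return np.zeros_like(x, dtype=np.float64)
    return np.clip((x - lo) / (hi - lo), 0, 1)


def s1_db_composite(vv_db: np.ndarray, vh_db: np.ndarray) -> np.ndarray:
    """Return an (H, W, 3) array [VV, VH, VV/VH] with every channel in dB."""
    ratio_db = vv_db - vh_db  # 10 * log10(VV / VH) == VV_dB - VH_dB
    return np.stack([percentile_scale(c) for c in (vv_db, vh_db, ratio_db)], axis=-1)
```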

adamjstewart · 2023-03-10T02:56:08Z

S1 plotting is definitely still broken, but I would be okay with merging this as is and fixing S1 plotting in a follow-up PR since I need this dataset soon.

I'm planning on adding a datamodule for this, but will wait until #1168 is merged so I can properly test it.

calebrob6 · 2023-03-29T10:00:53Z

I don't have the S1 downloaded so can't check it

adamjstewart · 2023-03-29T15:20:50Z

I don't have the S1 downloaded so can't check it

You can download the 100-patch subset for testing.

adamjstewart · 2023-03-29T15:32:20Z

torchgeo/datasets/ssl4eo.py

@calebrob6 this directly contradicts the change made in #1027. Should we revert that PR too? Given that:

Each of our datasets uses a different scale factor, regardless of which satellite its images come from

All images are normalized before they are returned from the dataset or datamodule, which changes the appropriate scale factor

I'm starting to think we should use one of the default enhancements from QGIS:

Min / max

2% / 98%

± 2 std dev

There could even be an option to select which of these you want to use. We can decide and implement this later, but I want to make sure we're not contradicting ourselves and won't revert #1027 multiple times until we come up with a uniform policy. This also relates to #496.
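The three QGIS-style enhancements listed above could be sketched as one function with a selectable method. This assumes per-channel stretching of a (C, H, W) array. `stretch` and its method names are hypothetical and are not an existing torchgeo API.

```python
import numpy as np


def stretch(image: np.ndarray, method: str = "percentile") -> np.ndarray:
    """Rescale each channel of a (C, H, W) array to [0, 1].

    method:
        "minmax"     - channel min to channel max
        "percentile" - 2nd to 98th percentile
        "std"        - mean +/- 2 standard deviations
    """
    out = np.empty(image.shape, dtype=np.float64)
    for c, band in enumerate(image.astype(np.float64)):
        if method == "minmax":
            lo, hi = band.min(), band.max()
        elif method == "percentile":
            lo, hi = np.percentile(band, [2, 98])
        elif method == "std":
            lo, hi = band.mean() - 2 * band.std(), band.mean() + 2 * band.std()
        else:
            raise ValueError(f"unknown method: {method!r}")
        # Guard against constant bands, which would otherwise divide by zero.
        out[c] = np.clip((band - lo) / (hi - lo), 0, 1) if hi > lo else 0.0
    return out
```

The bounds are computed from each image, so the result does not depend on a per-dataset scale factor such as dividing by 10,000. That per-dataset factor is the inconsistency raised above.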

dividing by 10k obviously didn't work though

Then why did we merge #1027?

calebrob6 · 2023-03-29T16:58:45Z

s1 plotting is broken

calebrob6 · 2023-03-29T17:05:33Z

I'm guessing the images from GEE and other sources are different.


adamjstewart · 2023-03-29T17:10:54Z

Yep. We could just plot 2 grayscale images instead of one RGB image for now and fix this in the future when someone (me) has a better understanding of SAR imagery. I'd prefer not to merge this with broken plotting, but we could open an issue and promise to fix it before 0.5.0 is released.

RitwikGupta · 2023-03-29T17:20:03Z

Sorry all. I was out on vacation and then sick for a bit. I will take a look at this again. GEE imagery is in the decibel scale instead of the power scale most Sentinel-1 imagery is released in. This is standard for analysis and visualization.

adamjstewart · 2023-04-05T21:01:53Z

Any updates @RitwikGupta?

We might want to just merge this PR and fix S1 plotting at a later date.

calebrob6 · 2023-04-09T20:42:31Z

Yep, I would vote to make the plot show something and merge

adamjstewart · 2023-04-09T22:52:44Z

Alright, see how this looks:

* SSL4EO-S12: add new dataset
* Style fixes
* 100% coverage
* fix mypy
* black fixes
* mypy fix
* Convert from db to power
* Don't cast to numpy
* Remove comments referring to SeCo
* SSL4EO: add extraction time
* Add RandomSeasonContrast
* Fix axes indexing
* Add datamodule
* fix tests
* mypy fixes
* fix missing import
* Fix tests
* isort fix
* Typo fix
* s2c: add B10
* Update test channels
* S2 plotting was broken
* Fix plotting
* Black fix
* Rename conf files
* Remove file introduced by bad merge
* Fix pixel size of bands
* black fix
* Better S1 plotting

---------

Co-authored-by: Caleb Robinson <calebrob6@gmail.com>

adamjstewart added this to the 0.5.0 milestone Feb 28, 2023

github-actions bot added datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing labels Feb 28, 2023

adamjstewart commented Feb 28, 2023


adamjstewart commented Mar 1, 2023


adamjstewart marked this pull request as ready for review March 10, 2023 02:54

isaaccorley previously approved these changes Mar 10, 2023


adamjstewart added 13 commits March 17, 2023 14:25

SSL4EO-S12: add new dataset

e7afd1f

Style fixes

44f2953

100% coverage

45fc445

fix mypy

729d377

black fixes

a2f39d5

mypy fix

0f0e02d

Convert from db to power

1ad0fb6

Don't cast to numpy

d2a8049

Remove comments referring to SeCo

3df7ec1

SSL4EO: add extraction time

a424a77

Add RandomSeasonContrast

ec0379a

Fix axes indexing

6172fff

Add datamodule

4b0774c

adamjstewart dismissed isaaccorley’s stale review via 4b0774c March 17, 2023 20:48

adamjstewart force-pushed the datasets/ssl4eo branch from 8bdad76 to 4b0774c March 17, 2023 20:48

github-actions bot added the datamodules PyTorch Lightning datamodules label Mar 17, 2023

adamjstewart added 5 commits March 17, 2023 23:19

fix tests

7b32aa3

mypy fixes

bd5962a

fix missing import

0265ccf

Fix tests

6529150

isort fix

8268f6e

s2c: add B10

1fd7a2b

adamjstewart dismissed nilsleh’s stale review via 1fd7a2b March 27, 2023 15:02

adamjstewart and others added 3 commits March 27, 2023 10:37

Update test channels

80a5c3a

S2 plotting was broken

fe28fe9

Fix plotting

651347b

adamjstewart commented Mar 29, 2023


adamjstewart force-pushed the datasets/ssl4eo branch from c31d3af to 651347b March 29, 2023 15:53

adamjstewart added 2 commits March 29, 2023 10:57

Merge branch 'main' into datasets/ssl4eo

5cae88d

Black fix

b4e4410

adamjstewart added 3 commits March 30, 2023 15:21

Merge branch 'main' into datasets/ssl4eo

c4a5c75

Rename conf files

5dbcab6

Remove file introduced by bad merge

8af0a53

adamjstewart added 3 commits April 9, 2023 17:19

Fix pixel size of bands

87f068e

black fix

d553b57

Better S1 plotting

90714a0

calebrob6 approved these changes Apr 15, 2023


calebrob6 merged commit 8e9a894 into main Apr 15, 2023
21 checks passed

SSL4EO-L automation moved this from In progress to Done Apr 15, 2023

calebrob6 deleted the datasets/ssl4eo branch April 15, 2023 20:51

adamjstewart mentioned this pull request Sep 25, 2023

Add BioMassters Dataset #1560

Merged


adamjstewart commented Feb 28, 2023 (edited)

adamjstewart Feb 28, 2023

adamjstewart Feb 28, 2023

adamjstewart Feb 28, 2023

adamjstewart Feb 28, 2023

adamjstewart Mar 1, 2023

RitwikGupta Mar 1, 2023

RitwikGupta Mar 1, 2023

adamjstewart Mar 1, 2023

RitwikGupta Mar 6, 2023

adamjstewart Mar 6, 2023

wangyi111 Mar 27, 2023

adamjstewart Mar 27, 2023

wangyi111 Mar 27, 2023

adamjstewart Mar 27, 2023

adamjstewart commented Mar 10, 2023

calebrob6 commented Mar 29, 2023

adamjstewart commented Mar 29, 2023

adamjstewart Mar 29, 2023

calebrob6 Mar 29, 2023

adamjstewart Mar 29, 2023

calebrob6 commented Mar 29, 2023

calebrob6 commented Mar 29, 2023 via email

adamjstewart commented Mar 29, 2023

RitwikGupta commented Mar 29, 2023

adamjstewart commented Apr 5, 2023

calebrob6 commented Apr 9, 2023

adamjstewart commented Apr 9, 2023
