Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datasets: consistent capitalization of band names #778

Merged
merged 2 commits into from Dec 19, 2022
Merged

Conversation

adamjstewart
Copy link
Collaborator

There seems to be a 50/50 mix of RGB_BANDS/ALL_BANDS and rgb_bands/all_bands in our datasets. The GeoDataset base class uses lowercase, so this PR changes all other datasets to match. From what I can tell, PEP-8 doesn't seem to distinguish between class attributes and instance attributes, so I'm not sure if these are considered variables or global variables.

@ashnair1 not sure if this affects #687

@adamjstewart adamjstewart added this to the 0.4.0 milestone Sep 14, 2022
@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Sep 14, 2022
Copy link
Member

@calebrob6 calebrob6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect all caps here like I would use in an Enum (e.g. cv2.INTER_LINEAR).

@adamjstewart
Copy link
Collaborator Author

This isn't exactly an enum, it's a class attribute. Even then, we often override many of these class attributes at the instance level (filename_glob is overridden in both Landsat and Sentinel __init__ methods). PEP-8 makes it clear that global variables should be uppercase and instance variables should be lowercase, but makes no mention of class variables. Do you know of any style guides that take a stance on how class attributes should be formatted? If we do decide to capitalize all of them, we'll also have to capitalize the rest of the class attributes in GeoDataset, RasterDataset, and VectorDataset.

@calebrob6
Copy link
Member

I'm saying it behaves like an Enum, so I would expect it to be all caps. Changing these to all caps doesn't mean every class attribute has to be capitalized. If that inconsistency is too much, then we could make it an enum, or just not capitalize it I guess.

@ashnair1
Copy link
Collaborator

For now, I would just go with not capitalizing as in this PR. Since this seems to only occur in NonGeoDatasets, it shouldn't affect #687.

Besides, the capitalization convention is for Enum members not the Enum itself.

Example:

class AllBands(Enum):
    RED: 1
    GREEN: 2
    BLUE: 3 
    NIR: 4

There might be some value in representing bands as Enum though. Enums are generally clearer, you get auto completion and it could help with band indexing for GeoDatasets. The current method involves iterating through the band list and indexing it for every band specified. This is fine for current optical datasets but if we ever get into hyper-spectral images this could potentially slow things down.

List of str (current)

class Landsat8(Landsat):
    """Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)."""

    filename_glob = "LC08_*_{}.*"

    all_bands = [
        "SR_B1",
        "SR_B2",
        "SR_B3",
        "SR_B4",
        "SR_B5",
        "SR_B6",
        "SR_B7",
        "SR_B8",
        "SR_B9",
        "SR_B10",
        "SR_B11",
    ]
    rgb_bands = ["SR_B4", "SR_B3", "SR_B2"]

vs

Using Enum

class Landsat8(Landsat):
    """Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS)."""

    filename_glob = "LC08_*_{}.*"

    class Landsat8Bands(Enum):
        COASTAL_AEROSOL = 1
        BLUE = 2
        GREEN = 3
        RED = 4
        NIR = 5
        SWIR1 = 6
        SWIR2 = 7
        PAN = 8
        CIRRUS = 9
        TIRS1 = 10
        TIRS2 = 11
    
    rgb_bands = [Landsat8Bands.RED, Landsat8Bands.GREEN, Landsat8Bands.BLUE]

@adamjstewart
Copy link
Collaborator Author

adamjstewart commented Sep 27, 2022

I thought @calebrob6 meant something like:

class Landsat8Bands(Enum):
    RGB = ["SR_B4", "SR_B3", "SR_B2"]
    ALL = ["SR_B1", ..., "SR_B11"]

The problem with that is it doesn't allow you to enumerate arbitary subsets/orders/combinations.

@calebrob6
Copy link
Member

Yes, that is what I meant. I don't care too much about capitalization vs. not capitalization -- generally, I want the choices here to be obvious to users. All caps band choices seems like a decent way to separate out the "interesting" class attributes vs. the instance attributes. Enums are probably the right way, but for now let's just go all lowercase.

calebrob6
calebrob6 previously approved these changes Sep 30, 2022
@adamjstewart
Copy link
Collaborator Author

I'd rather not change this back and forth multiple times. There's no rush to merge this, but let's come to a consensus on the best way to do this before changing it.

@calebrob6
Copy link
Member

Kk, I would suggest:

  • Datasets take a list of bands as str where the band names are consistent everywhere.
  • Datasets include an Enum that has combinations of interesting bands as @adamjstewart suggests above. This doesn't preclude users defining their own combinations if they want.
  • Somewhere (maybe in the RasterDatasets like Landsat* and Sentinel2) there are Enums that actually name the bands as @ashnair1 suggests above (e.g. mapping "Red" to "B04"). This way, what the bands actually are is documented, and users can specify that they want [Red, Green, Blue] or ["B04", "B03", "B02"] depending on their familiarity.

@adamjstewart
Copy link
Collaborator Author

Datasets take a list of bands as str

Already the case I hope.

where the band names are consistent everywhere.

This is much harder. B01 in one dataset does not match a different dataset. Some datasets call it B01 and others call it B1. I'm not sure if it makes sense or is even possible to match band names across datasets.

Datasets include an Enum that has combinations of interesting bands

I feel like this is the wrong use case for enums. Typically, enums are used when you want to limit the input to a specific subset of options, and you want to document those options with names instead of numbers. If the methods accept a list or an enum, that feels weird to me.

users can specify that they want [Red, Green, Blue] or ["B04", "B03", "B02"] depending on their familiarity.

Really like this idea, but I'll need to think about how to do this properly. I don't love our blog where we use bands=Landsat8.all_bands[1:-2] without any clear reason why.

@calebrob6
Copy link
Member

Hmm, I feel like this shouldn't be difficult... The requirements here are to expose the interesting subsets of band combinations to the user in a way that is easily discoverable right? My problem with including these as just another lower-case class attribute is that they aren't really easily discoverable.

This is much harder. B01 in one dataset does not match a different dataset. Some datasets call it B01 and others call it B1. I'm not sure if it makes sense or is even possible to match band names across datasets.

I mean across datasets that use the same satellite / platform. E.g. we can easily remap "B1" to "B01" to standardize across datasets that use Sentinel 2 if we want.

I feel like this is the wrong use case for enums.

Out of curiosity, what do style guides say about this? E.g. it doesn't seem wrong at all to me.

@adamjstewart
Copy link
Collaborator Author

Out of curiosity, what do style guides say about this?

Unfortunately, PEP-8 was written before Python supported Enums.

@adamjstewart
Copy link
Collaborator Author

Enums for band numbers would be super useful for our index transforms (e.g., NDVI).

@adamjstewart
Copy link
Collaborator Author

Datasets include an Enum that has combinations of interesting bands as @adamjstewart suggests above. This doesn't preclude users defining their own combinations if they want.

I think this defeats the purpose of an enum. Enums are supposed to be a list of all possible accepted values. If they are only some possible values, then they're basically variables.

@calebrob6
Copy link
Member

Maybe we can think about this in a different way -- currently, how should a user know which class variables in Landsat8 are acceptable to pass as argument to bands (and more generally for all of our datasets)? We could and should document this, but I'm wondering if there is anything else that we can do to make it more casually obvious to a user that is, e.g., inspecting the class in an IDE?

Maybe it is as simple as consistently providing .all_bands and .rgb_bands?

@adamjstewart
Copy link
Collaborator Author

I think consistently providing .all_bands is a good start, and I think documentation would be even better.

I'm thinking we should probably merge this PR as is and consider moving to enums in the future. Let me resolve the merge conflicts...

@calebrob6
Copy link
Member

calebrob6 commented Dec 19, 2022

Sounds good to me!

@adamjstewart, meta question -- do you have a long term set of issues like this that you keep track of?

@calebrob6 calebrob6 merged commit 064f9c3 into main Dec 19, 2022
@calebrob6 calebrob6 deleted the datasets/rgb-all branch December 19, 2022 19:39
@adamjstewart
Copy link
Collaborator Author

Issues like what?

I'm only pushing for this to be merged so I have one less open PR. Other than that, I just check https://github.com/microsoft/torchgeo/issues and https://github.com/microsoft/torchgeo/pulls once in a while. https://github.com/microsoft/torchgeo/milestones is used to track releases, and #931 (comment) is tracking things I'm focusing on for the next release.

@calebrob6
Copy link
Member

I mean high-level API things to think about for future development (e.g. "how should users know what options they have for band sets?")

@adamjstewart
Copy link
Collaborator Author

Ah, not really. I just define the API in the base class and try to stick to it. It should be documented in the base class.

yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* Datasets: consistent capitalization of band names

* Fix tests too
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants