Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File rename: nwpu.py -> vhr10.py #1030

Merged
merged 4 commits into from
Jan 23, 2023
Merged

File rename: nwpu.py -> vhr10.py #1030

merged 4 commits into from
Jan 23, 2023

Conversation

adamjstewart
Copy link
Collaborator

This rename was originally done in #847, but I asked @ashnair1 to leave it for a follow-up PR so that the diff would be easier to review. This is the follow-up PR.

I agree with @ashnair1, it doesn't make sense to name the file nwpu.py. We already have other datasets from NWPU, VHR-10 isn't the only one. And none of these datasets have anything in common, so there's no need to group them.

There are no public API changes since users are supposed to import the dataset from torchgeo.datasets, not torchgeo.datasets.nwpu.

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Jan 23, 2023
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 23, 2023
docs/api/datasets.rst Outdated Show resolved Hide resolved
@adamjstewart
Copy link
Collaborator Author

I would undo the last commit. It's still called the NWPU VHR-10 dataset.

@calebrob6
Copy link
Member

If it is called the NWPU VHR-10 dataset then it should be in nwpu.py? We have RESISC45 that comes from NWPU but refer to it as RESISC45. We shouldn't call it VHR-10 in some places and "NWPU VHR-10" in others.

@adamjstewart
Copy link
Collaborator Author

@ashnair1 do you want to comment?

nwpu.py isn't very specific. It would be like calling it nasa.py and grouping together all datasets produced by NASA. We could call it nwpu_vhr10.py, but NWPUVHR10 is a bit too many initials for my taste.

RESISC45 is always referred to as RESISC45 in the literature, but NWPU VHR-10 is always referred to as NWPU VHR-10 in the literature [1], [2].

@adamjstewart
Copy link
Collaborator Author

I obviously don't feel very strongly about this (I was the one who chose the original name) but I think @ashnair1 wanted the rename: #847 (comment)

@ashnair1
Copy link
Collaborator

ashnair1 commented Jan 23, 2023

I agree with @calebrob6. NWPU is the organisation and VHR-10 is the dataset. So let's call it the VHR-10 dataset.

@adamjstewart
Copy link
Collaborator Author

My only concern is that Very-High-Resolution isn't a very specific dataset name. It's always referred to as NWPU VHR-10 in the literature, so why not call it that?

@ashnair1
Copy link
Collaborator

My only concern is that Very-High-Resolution isn't a very specific dataset name. It's always referred to as NWPU VHR-10 in the literature, so why not call it that?

I would think VHR10 i.e. Very-High-Resolution with 10 classes is specific enough no? My thought is as long as users recognize the dataset and we accurately reference the dataset we're fine.

RESISC45 is always referred to as RESISC45 in the literature, but NWPU VHR-10 is always referred to as NWPU VHR-10 in the literature [1], [2].

I haven't worked with this dataset but I see papers referring to it as NWPU RESISC45

Ultimately it comes down to this

  1. Do we believe the name of the dataset is ORG DATASET_NAME or DATASET_NAME? Whatever we choose it must be consistent across both RESISC45 and VHR10.
  2. Do we want to reorganize the file (put both resisc45 and vhr10 in one file nwpu.py) or just let the ORG DATASET_NAME be a documentation thing i.e. only reflected in the dataset.rst and dataset.csv files.

Personally I prefer VHR10 throughout. But if we want to use NWPU VHR10 and NWPU RESISC45, I suggest using it in only the .rst and .csv file and not put those two datasets into nwpu.py

@calebrob6
Copy link
Member

Tensorflow calls it RESISC45 -- https://www.tensorflow.org/datasets/catalog/resisc45.

We can just make the decision to call it VHR-10 because noone else has built out a dataloaders ;)

@adamjstewart
Copy link
Collaborator Author

To summarize this conversation, I think we all agree on the following:

  1. We should not have a single nwpu.py containing VHR-10 and RESISC45 (the datasets have very little in common and no shared code)
  2. The class names should not be NWPUVHR10 or NWPURESISC45 (so as to avoid super long acronyms)

The only remaining questions to be addressed in this PR are:

  1. What to call the filenames? I think we want vhr10.py and resisc45.py
  2. How to refer to these datasets in the documentation?

I propose we let the authors decide. If we look at the original papers that introduce these datasets, the authors call them NWPU-RESISC45 [1] and NWPU VHR-10 [2], [3].

So my personal preference is to add NWPU to the docs only, and my secondary preference would be to remove NWPU from the section headers but keep it in the dataset docstring. Would the latter be a good compromise?

@ashnair1
Copy link
Collaborator

  1. Yes vhr10.py and resisc45.py are better names
  2. I am fine with both options suggested.

@adamjstewart
Copy link
Collaborator Author

Should be good to go now.

@calebrob6 calebrob6 merged commit e2680b4 into main Jan 23, 2023
@calebrob6 calebrob6 deleted the datasets/vhr10 branch January 23, 2023 19:45
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* File rename: nwpu.py -> vhr10.py

* Update more locations

* Name change in several other places

* Add NWPU only to docstring

Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants