Add dataset table to the documentation #435

calebrob6 · 2022-02-26T03:47:45Z

In the TorchGeo paper we have a table that lists the properties of some of the datasets we've added. This should be reproduced in the docs so that we have an overview of what all is available in the library.

I've just copied (more or less) the tables from the paper and haven't updated them with the datasets that have been implemented since.

Things to look into/questions:

* What other columns can we add for the Geospatial datasets? In the paper we've split it into Benchmark/Generic, however that's not really the organization that we have here. Perhaps we need a single table with a "Type" column?
* Adding hyperlinks to the citations for each dataset
* Updating the CSVs to be current
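
To make the single-table question concrete, here is a minimal sketch of merging two tables and recording each row's origin in a "Type" column. The dataset names and the "Task" column are hypothetical placeholders; the real tables have their own columns.

```python
import csv
import io

# Hypothetical stand-ins for the two tables from the paper; the dataset
# names and the "Task" column are placeholders, not the real contents.
BENCHMARK_CSV = "Dataset,Task\nExampleBenchmark,classification\n"
GENERIC_CSV = "Dataset,Task\nExampleGeneric,\n"


def merge_with_type(tables):
    """Merge {type_name: csv_text} into one list of rows with a Type column."""
    merged = []
    for type_name, text in tables.items():
        for row in csv.DictReader(io.StringIO(text)):
            rest = {key: value for key, value in row.items() if key != "Dataset"}
            merged.append({"Dataset": row["Dataset"], "Type": type_name, **rest})
    return merged


def to_csv(rows):
    """Serialize the merged rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


merged = merge_with_type({"Benchmark": BENCHMARK_CSV, "Generic": GENERIC_CSV})
print(to_csv(merged))
```

Columns that only apply to one kind of dataset would simply stay empty for the other kind, which is the main trade-off of a single table.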

adamjstewart · 2022-02-26T18:29:03Z

> What other columns can we add for the Geospatial datasets? In the paper we've split it into Benchmark/Generic, however that's not really the organization that we have here. Perhaps we need a single table with a "Type" column?

This starts to re-raise the question of how we should categorize datasets. We keep flip-flopping on this and it's leading to a lot of inconsistencies. Functionally, I think the biggest distinction is between geospatial datasets (have geospatial metadata) and non-geospatial datasets. In terms of attributes that we might want to list in this table, the biggest distinction is between benchmark datasets (both input images and target labels) and non-benchmark datasets. The division in the docs doesn't necessarily need to match the base class division.

I'm leaning towards splitting the docs into benchmark vs. non-benchmark and keeping geospatial vs. non-geospatial as a base class distinction only. I've also been thinking about renaming VisionDataset to NonGeoDataset. Thoughts?

> Adding hyperlinks to the citations for each dataset

Torchvision does this for their models and it always confuses me because I expect the hyperlink to take me to the model class definition, not the citation. I would prefer to have hyperlinks to class definitions and then the class definition contains a hyperlink to the citation. Thoughts?

> Updating the CSVs to be current

We'll have to remind people to update this table every time they add a new dataset.
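
A reminder could in principle be backed by an automated check. The following is only a sketch of that idea, not something this PR implements: it assumes a hypothetical "Dataset" column whose cells hold plain class names (cells holding reStructuredText links would need to be parsed first), and the example names are placeholders.

```python
import csv
import io


def missing_from_table(dataset_names, csv_text, column="Dataset"):
    """Return dataset names that have no row in the CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    listed = {row[column].strip() for row in reader}
    return sorted(set(dataset_names) - listed)


# Hypothetical inputs: in practice the names might come from the package's
# __all__ and the text from one of the CSV files under docs/api/.
table = "Dataset,Task\nExampleA,classification\nExampleB,segmentation\n"
exported = ["ExampleA", "ExampleB", "ExampleC"]

print(missing_from_table(exported, table))
```

A non-empty result could fail a docs test, so a new dataset without a table row would be caught in review.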

calebrob6 · 2022-02-26T21:44:31Z

Dataset naming stuff

NonGeoDataset sounds great to me. And I'm fine with having a "benchmark dataset" table that doesn't align with how the classes are organized.

> I would prefer to have hyperlinks to class definitions and then the class definition contains a hyperlink to the citation. Thoughts?

Fine with me -- rows in the table should definitely link somewhere.

> We'll have to remind people to update this table every time they add a new dataset.

Yep! That's fine with me. I can also add instructions to the contributing page in this PR.

ashnair1 · 2022-03-23T17:08:28Z

While we're on the topic, what about datasets like SpaceNet where they're pre-chipped to be like VisionDatasets but do contain geospatial metadata? In that case, it was the organisation of the dataset that informed the decision to make it a VisionDataset (query by integer index and not bounding box) not its lack of geospatial metadata.

Kind of lies in between geo-vs-vision
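
To illustrate the distinction being drawn here, a toy sketch of the two query styles over the same pre-chipped data. The classes below are hypothetical and do not use TorchGeo's actual base classes; they only show integer-index lookup versus bounding-box lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    minx: float
    maxx: float
    miny: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        return (
            self.minx <= other.maxx
            and self.maxx >= other.minx
            and self.miny <= other.maxy
            and self.maxy >= other.miny
        )


class IndexedChips:
    """Pre-chipped style: chip i is retrieved by integer position."""

    def __init__(self, chips):
        self.chips = list(chips)  # list of (Box, payload) pairs

    def __len__(self):
        return len(self.chips)

    def __getitem__(self, index: int):
        return self.chips[index][1]


class SpatialChips:
    """Geospatial style: a bounding-box query returns whatever overlaps it."""

    def __init__(self, chips):
        self.chips = list(chips)

    def __getitem__(self, query: Box):
        return [payload for box, payload in self.chips if box.intersects(query)]


chips = [(Box(0, 10, 0, 10), "chip-a"), (Box(20, 30, 0, 10), "chip-b")]
print(IndexedChips(chips)[1])
print(SpatialChips(chips)[Box(5, 25, 5, 6)])
```

Both wrappers hold the same chips and the same geospatial metadata; only the access pattern differs, which is why such a dataset sits between the two categories.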

adamjstewart · 2022-03-23T18:33:17Z

@ashnair1 my current plan is to someday convert all of those to GeoDatasets (#83). The only thing holding us back at the moment is #409. I'm also planning on adding a new sampler (maybe PreChippedGeoSampler?) that doesn't require the user to specify the epoch length or patch size and instead gathers this directly from the dataset r-tree index. This will make them almost as simple as a VisionDataset but way more powerful since you can combine them with other datasets.
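
As a rough sketch of what such a sampler might look like: the class name and constructor below are hypothetical, and a plain list of bounding boxes stands in for the dataset's r-tree index. The point is only that the epoch length and the patch extents both come from the stored chips rather than from the user.

```python
from typing import Iterator, NamedTuple, Sequence


class Box(NamedTuple):
    minx: float
    maxx: float
    miny: float
    maxy: float


class PreChippedSamplerSketch:
    """Yield one query per stored chip; no epoch length or patch size needed."""

    def __init__(self, chip_bounds: Sequence[Box]) -> None:
        # Stand-in for reading every entry out of the dataset's r-tree index.
        self.chip_bounds = list(chip_bounds)

    def __iter__(self) -> Iterator[Box]:
        return iter(self.chip_bounds)

    def __len__(self) -> int:
        return len(self.chip_bounds)


bounds = [Box(0, 10, 0, 10), Box(20, 30, 0, 10)]
sampler = PreChippedSamplerSketch(bounds)
print(len(sampler), list(sampler))
```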

adamjstewart

In terms of filenames, there isn't a ton of consistency. We list "Geospatial Datasets" in "generic_datasets.csv" and "Non-geospatial Datasets" in "non_geo_datasets.csv". I'm honestly not sure what to call them anymore, and we've gone back and forth for a while. We should figure out how to make these more consistent in our docs/API. This doesn't necessarily need to happen in this PR, just pointing out the inconsistencies here.

adamjstewart · 2022-06-16T00:42:20Z

This looks great! Will take a closer look later. Since we first created the docs, torchvision's docs have completely changed. They now have a short page with just the dataset tables and then separate pages for each dataset. I actually kind of like that format, and it may allow us to skip the step of adding the dataset to datasets.rst. This doesn't change anything in this PR, but it may make this PR more important in the future.

calebrob6 · 2022-06-18T12:32:45Z

Alright, ready for round 2

* Add benchmark dataset table
* Add geospatial datasets
* Work on Data table (microsoft#478)
* added to data table
* add links
* fix docs
* Added section for implementing new datasets to the Contributing page
* Removing extra file
* Add EDDMapS and GBIF rows to generic
* Formatting
* Renaming to make sense
* Short names
* Fixes
* Checking references
* Trying links
* Figured out links
* Removing hyphens for empty cells as these are rendered as bullet points
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* Update docs/user/contributing.rst
  Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
* Update docs/api/geo_datasets.csv
* Update geo_datasets.csv
* Update geo_datasets.csv
* Update contributing.rst
* Formatting
* Fix table links

Co-authored-by: Nils Lehmann <35272119+nilsleh@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

github-actions bot added the documentation label Feb 26, 2022

adamjstewart added this to the 0.3.0 milestone Feb 27, 2022

adamjstewart mentioned this pull request Mar 23, 2022

Work on Data table #478

Merged

calebrob6 closed this Mar 30, 2022

calebrob6 reopened this Mar 30, 2022

calebrob6 force-pushed the docs/dataset_table branch from 65c2b1a to a1b61f5 on April 5, 2022 17:25

adamjstewart reviewed Apr 5, 2022

docs/api/datasets.rst (resolved)

docs/api/vision_datasets.csv (outdated, resolved)

calebrob6 force-pushed the docs/dataset_table branch from a1b61f5 to 5e3dc17 on June 15, 2022 19:39

calebrob6 and others added 7 commits June 15, 2022 14:08

Add benchmark dataset table

b916dca

Add geospatial datasets

c8d2f77

Work on Data table (#478)

48b5005

* added to data table * add links * fix docs

Added section for implementing new datasets to the Contributing page

cbf8716

Removing extra file

8c8946e

Add EDDMapS and GBIF rows to generic

5c2999e

Formatting

0e1b673

calebrob6 force-pushed the docs/dataset_table branch from 0ed71a8 to 0e1b673 on June 15, 2022 21:09

calebrob6 added 7 commits June 15, 2022 14:11

Renaming to make sense

f6d9d8d

Short names

97dd1b4

Fixes

e8e85b4

Checking references

2174907

Trying links

9dead4c

Figured out links

768353a

Removing hyphens for empty cells as these are rendered as bullet points

ed31d3f

adamjstewart reviewed Jun 17, 2022

Update docs/api/non_geo_datasets.csv

1d8f590

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

calebrob6 and others added 4 commits June 18, 2022 04:44

Update docs/api/non_geo_datasets.csv

b25a204

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

Update docs/api/non_geo_datasets.csv

93b95bd

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

Update docs/api/non_geo_datasets.csv

af71d05

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

Update docs/user/contributing.rst

b47c55c

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

calebrob6 commented Jun 18, 2022

docs/api/geo_datasets.csv (resolved)

calebrob6 added 4 commits June 18, 2022 04:48

Update docs/api/geo_datasets.csv

9662b71

Update geo_datasets.csv

93e82be

Update geo_datasets.csv

39eb939

Update contributing.rst

53cb623

adamjstewart reviewed Jun 18, 2022

docs/user/contributing.rst (outdated, resolved)

Formatting

3605d61

adamjstewart reviewed Jun 18, 2022

docs/api/geo_datasets.csv (resolved)

Fix table links

cb4b337

github-actions bot added the datasets label Jun 19, 2022

adamjstewart approved these changes Jun 19, 2022

adamjstewart enabled auto-merge (squash) June 19, 2022 19:29

adamjstewart merged commit 98cc3c9 into main Jun 19, 2022

adamjstewart deleted the docs/dataset_table branch June 19, 2022 19:30

adamjstewart mentioned this pull request Jul 11, 2022

0.3.0 release #664

Merged
