SustainBenchCropYield reads all the data on __getitem__ #1754

Closed
calebrob6 opened this issue Dec 4, 2023 · 1 comment · Fixed by #1756

Comments

@calebrob6
Member

Every call to __getitem__ loads the entire dataset from file; see https://github.com/microsoft/torchgeo/blob/main/torchgeo/datasets/sustainbench_crop_yield.py#L140.

We should check whether the files are mmapped (and release an mmapped version if not), or add caching to make this dataset faster.
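
For illustration, here is a minimal sketch of the caching option, not the torchgeo code. The class name `LazyCachedDataset`, the `reads` counter, and the JSON file are all hypothetical stand-ins; JSON is used only so the example runs with the standard library. The idea is that the file is read on first access and the result is kept on the instance, so later `__getitem__` calls only index into memory.

```python
import functools
import json
import os
import tempfile


class LazyCachedDataset:
    """Hypothetical dataset that reads its backing file at most once."""

    def __init__(self, path):
        self.path = path
        self.reads = 0  # counts real file reads, for illustration only

    @functools.cached_property
    def _records(self):
        # Runs on first access only; the result is then stored on the instance.
        self.reads += 1
        with open(self.path) as f:
            return json.load(f)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # No file I/O here after the first access to self._records.
        return self._records[index]


def demo_lazy_cache():
    """Write a tiny stand-in file, index every sample, report the read count."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        with open(path, "w") as f:
            json.dump([{"image": [i, i + 1], "label": float(i)} for i in range(4)], f)
        ds = LazyCachedDataset(path)
        samples = [ds[i] for i in range(len(ds))]
        return ds.reads, samples
```

Calling `demo_lazy_cache()` indexes all four samples while `reads` stays at 1. One caveat of this approach: with multiple data-loading worker processes, each worker may hold its own cached copy.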

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Dec 4, 2023
@adamjstewart adamjstewart added this to the 0.5.2 milestone Dec 4, 2023
@calebrob6
Member Author

The data is so small that I think we can just preload everything up front. I am testing this currently.
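
A rough sketch of what preloading up front could look like, assuming the whole dataset fits comfortably in memory. This is not the change from the linked fix; `PreloadedDataset`, `build_demo_dataset`, and the JSON files are hypothetical and stand in for the real file format. All reading happens in `__init__`, and `__getitem__` only indexes a list.

```python
import json
import os
import tempfile


class PreloadedDataset:
    """Hypothetical dataset that loads every sample once, in __init__."""

    def __init__(self, paths):
        self.samples = []
        for path in paths:
            with open(path) as f:
                # Each stand-in file holds a list of sample dicts.
                self.samples.extend(json.load(f))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Pure in-memory lookup; the files are never touched again.
        return self.samples[index]


def build_demo_dataset():
    """Create two small stand-in files, preload them, then let them be deleted."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for split, labels in (("train", [0.5, 1.5, 2.5]), ("val", [3.5, 4.5])):
            path = os.path.join(tmp, f"{split}.json")
            with open(path, "w") as f:
                json.dump([{"split": split, "label": y} for y in labels], f)
            paths.append(path)
        ds = PreloadedDataset(paths)
    # The temporary files are gone here, yet indexing still works.
    return ds
```

After `ds = build_demo_dataset()`, indexing `ds[3]` succeeds even though the files no longer exist, which shows that `__getitem__` does no file I/O. The trade-off is a slower constructor and memory use proportional to the dataset size.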
