Add 4k SWO Ecoplot dataset #8
Comments
@aazuspan, As far as an API, I suppose it all depends on what we intend to serve as sample data. For just the two that we have currently, I like either If we went with the latter option, would any valid value Absent that flexibility, I'd go with one of the first two options.
Right, I was thinking that once the user had requested a certain resolution, that image would exist in cache locally for that resolution, so it would be a one-time cost. But I totally understand your hesitation with that approach.
That sounds like a great path forward.
I think your proposed tests were more about performance than adherence to correct values, right? If that's the case, I don't see any issue with using lossy compression to get down to a file size that can be hosted on GitHub. Keeping all files in a zip makes sense to me as well.
Yeah, that was my thought as well - thanks for the sanity check!
@grovduck I just noticed that I've been committing the latitude and longitude rasters (including in the new zipped datasets) even though they're not in the tabular data and wouldn't be used for prediction. Any reason you can think of that I should leave those in? If not, I'll replace those zips when I make the next PR.
@aazuspan, totally my bad. Sorry for having those in there. I think I stripped them out as covariates in the model out of an abundance of caution. Sounds good to strip those out in the next PR.
No problem! Turns out they compressed so well that they barely affected the file size anyways, but still probably worth pulling out to avoid confusion in the future.
The current SWO Ecoplot dataset rasters are 128x128 - good for a quick test but too small to really evaluate Dask performance. @grovduck processed some 4096x4096 versions that would be a good addition. The files are too large to include in the package (~300MB), so we'll need to use an on-the-fly downloader. We could build that from scratch, but I think it makes more sense to use a pre-existing solution like pooch that handles caching and versioning. In the interest of keeping dependencies light, it could be an optional dependency that's checked when using the data loader. Once that's in place, it may be worth throwing the 128x128 version in there as well, just to keep all the data consolidated and the package minimal.
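As a rough sketch of the optional-dependency idea, something like the following could work. The helper names `require_optional` and `fetch_large_rasters`, the cache folder name, and the `url` and `known_hash` arguments are all placeholders for illustration, not anything that exists in the package yet. It assumes the rasters are hosted as a single zip and relies on `pooch.retrieve`, `pooch.os_cache`, and the `pooch.Unzip` processor.

```python
from __future__ import annotations

import importlib


def require_optional(module_name: str, purpose: str):
    """Import an optional dependency, or raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for {purpose}. "
            f"Install it with `pip install {module_name}`."
        ) from err


def fetch_large_rasters(url: str, known_hash: str | None = None) -> list[str]:
    """Download and unzip a raster archive, reusing the local cache afterwards.

    `url` and `known_hash` are supplied by the caller; nothing is hard-coded
    here because the hosting location has not been decided.
    """
    # pooch is only imported when the large dataset is actually requested.
    pooch = require_optional("pooch", "downloading the large dataset")
    return pooch.retrieve(
        url=url,
        known_hash=known_hash,  # None skips the checksum verification
        path=pooch.os_cache("hypothetical-package"),  # placeholder cache folder
        processor=pooch.Unzip(),  # returns the paths of the extracted files
    )
```

With this shape, users who only load the bundled 128x128 data never need pooch installed, and the download happens once per machine because later calls find the file in the cache.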
In terms of API design, I could see this as a separate dataset (e.g. `load_swo_ecoplot_large`) or as a parameter (e.g. `load_swo_ecoplot(large=True)`). If we wanted to include a lot of different resolution options, using a `size` parameter like `load_swo_ecoplot(size=4096)` might make sense, but with just two options that seems like it's more confusing than helpful.

Any preferences on API design, or better naming ideas @grovduck?
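To make the comparison concrete, here is a minimal sketch of the three signatures side by side. The bodies are stand-ins that return the raster edge length instead of real data, and `load_swo_ecoplot_sized` is a hypothetical name used only so the `size` variant can sit next to the `large` variant in one snippet.

```python
from __future__ import annotations

# The two resolutions discussed so far: bundled 128x128 and downloadable 4096x4096.
_AVAILABLE_SIZES = (128, 4096)


def load_swo_ecoplot(large: bool = False) -> int:
    """Boolean-flag option: one loader, two resolutions."""
    return 4096 if large else 128


def load_swo_ecoplot_large() -> int:
    """Separate-function option: a thin wrapper around the flag version."""
    return load_swo_ecoplot(large=True)


def load_swo_ecoplot_sized(size: int = 128) -> int:
    """Size-parameter option: has to reject resolutions that are not hosted."""
    if size not in _AVAILABLE_SIZES:
        raise ValueError(
            f"size must be one of {_AVAILABLE_SIZES}, got {size}"
        )
    return size
```

The `size` version is the only one that needs input validation, which is part of why it feels heavier than necessary while there are just two resolutions.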