Add load functions for example datasets #3
Comments
Andy Hudak has given us permission to use the Moscow Mountain / St. Hudak, A.T. (2010) Field plot measures and predictive maps for "Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data". Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. https://www.fs.usda.gov/rds/archive/Catalog/RDS-2010-0012
Hey @grovduck, I'm pushing ahead with this and had a few questions.

First, am I understanding right that Moscow Mountain / St. Joseph's is a single dataset that we're currently referring to as just

Second, and more complicated, how do we want to refer to attributes in the dataset? It seems to me that we're sort of going to be trading off between compatibility with the

Option 1
Stick with the
The inconsistencies are 1) adding an

Option 2
Sacrifice some compatibility by using more descriptive names for the array attributes. Obviously

Option 3
Ditch compatibility and just use
Of course we could adjust those names or consider splitting

Let me know what you think, or if there are other options I'm not considering! I have a PR that's about ready to go using option 1 as a placeholder, so if it would be easier to discuss this with the code in place I can submit that as a draft.
Correct, it is a single dataset containing species information and environmental/lidar covariates at two locations in Idaho (Moscow Mountain and St. Joe Woodlands - I had mistakenly called it St. Joseph's, but have verified that it should be St. Joes). I just downloaded both datasets and only the first one ("Nearest neighbor imputation of species-level ...") contains the species level basal areas/densities so I think we only need to include the first citation. I've updated my comment above. Yes, I've used
My first inclination is to go with Option 1 to be as consistent with
I think you're actually OK on this. Take a look at load_linnerud. I think this is pretty much exactly what you've proposed for
I think we'll have to make the exception to include
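For reference, here is a short look at what load_linnerud returns, since it is the multi-output example mentioned above. This only calls the existing scikit-learn function and inspects the result. It is not a proposal for how this project's loaders should look.

```python
from sklearn.datasets import load_linnerud

# Default call returns a Bunch (a dict-like object with attribute access).
bunch = load_linnerud()

print(bunch.data.shape)      # (20, 3): 20 samples, 3 exercise features
print(bunch.target.shape)    # (20, 3): 3 physiological targets per sample
print(bunch.feature_names)   # ['Chins', 'Situps', 'Jumps']
print(bunch.target_names)    # ['Weight', 'Waist', 'Pulse']

# return_X_y=True skips the Bunch and returns the two arrays directly.
X, y = load_linnerud(return_X_y=True)
```

The point of interest is that target is two-dimensional and comes with its own target_names, which is the shape a multi-attribute forest dataset would likely need as well.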
Thanks for the clarification on the dataset! I suppose the test files could be renamed, but I'm not too worried about it since it's just an internal detail.
As you guessed, this was my leaning too, and I agree with all the points you brought up. Targeting this towards an
Great point! I was only comparing against

One more question. Any thoughts on how we should handle the
@aazuspan, I'm trying to process some thoughts and get your feedback about
Yeah, I realized I might be jumping the gun trying to tackle this before we got prediction up and running! I'm good to pause this until we have a plan there :)
Resolved by #39!
Mimic the sklearn.datasets module with functions to load the example datasets from yaImpute (pending approval for us to share the datasets). Unlike sklearn, probably package datasets in a purpose-made class rather than a Bunch object, but try to keep a consistent API.

It may make sense for us to define a Dataset class for public examples with a TestingDataset subclass that contains additional neighbors and distances for internal testing, but we can figure that out later.
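One way the Dataset / TestingDataset idea could look is sketched below using only the standard library. Everything apart from the two class names and the "neighbors" and "distances" concepts is an assumption made for illustration: the field names, the load_toy_dataset function, its return_X_y flag (borrowed from sklearn's loaders), and the placeholder values, which are invented and are not taken from any real dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dataset:
    """Hypothetical container for a public example dataset."""

    ids: List[int]                # one identifier per plot (assumed field)
    data: List[List[float]]       # X: one row of covariates per plot
    target: List[List[float]]     # y: one row of attributes per plot
    feature_names: List[str]
    target_names: List[str]

    def __post_init__(self) -> None:
        # Guard against misaligned rows between ids, X, and y.
        if not (len(self.ids) == len(self.data) == len(self.target)):
            raise ValueError("ids, data, and target must have the same number of rows")


@dataclass(frozen=True)
class TestingDataset(Dataset):
    """Adds reference neighbors and distances for internal tests."""

    neighbors: List[List[int]]    # expected neighbor ids per plot
    distances: List[List[float]]  # expected distances to those neighbors


def load_toy_dataset(return_X_y: bool = False):
    """Hypothetical loader; the values are placeholders, not real data."""
    dataset = Dataset(
        ids=[1, 2, 3],
        data=[[0.1, 10.0], [0.4, 12.5], [0.9, 7.2]],
        target=[[5.0], [3.5], [8.1]],
        feature_names=["covariate_a", "covariate_b"],
        target_names=["attribute_a"],
    )
    # Mirror the sklearn convention of optionally returning just (X, y).
    return (dataset.data, dataset.target) if return_X_y else dataset


dataset = load_toy_dataset()
X, y = load_toy_dataset(return_X_y=True)
```

Keeping data, target, feature_names, and target_names as attribute names would leave the class close to a Bunch from the caller's side, while the subclass keeps test-only fields out of the public examples.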