Rather than filling your inboxes with another email, I'm pinging you fine folks here:
@ethanwhite @cboettig @dfalster @dlebauer @ibartomeus @wcornwell
I'd love your feedback on this thing
What it is and why
Trait data seems to be one of the most locked-down parts of the data ecosystem. Datasets are out there, but they often sit in supplemental materials and similar places that are hard to discover.
This service aims to be primarily a REST API for searching datasets by:
- taxonomy
- location
- trait type
- general search across all fields
I got an Amazon research grant for $500 in credits for https://github.com/sckott/gbids but it's not seeing as much use as I thought it would. So I'm trying to pivot to this idea, which I've been thinking about for some time.
There's a description of the API at https://github.com/sckott/traitdb#api
Both /search routes are based on Elasticsearch, and the current plan is to load entire datasets into ES. I doubt this will scale, so I'm evaluating other options that would still allow searching within datasets. These routes return the actual records (aka rows) of data from the datasets, so one can get just the records they want based on a search.
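To make the full-text idea concrete, here's a minimal sketch of the kind of request body a /search route could send to Elasticsearch, plus how the matching rows could be pulled back out. This is not the actual traitdb code: the function names are made up for illustration, and it assumes query_string falls back to all fields when none are named (the default in recent Elasticsearch versions).

```python
import json


def full_text_query(text, size=10):
    """Build an Elasticsearch request body for a search across all fields.

    Assumption: no "fields" or "default_field" is given, so the query
    uses the index default, which covers all eligible fields unless the
    index is configured otherwise.
    """
    return {
        "size": size,  # maximum number of records (rows) to return
        "query": {"query_string": {"query": text}},
    }


def records_from_response(response):
    """Extract the original dataset rows from a search response dict."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    # Print the request body for a hypothetical search term.
    print(json.dumps(full_text_query("leaf area"), indent=2))
```

Each hit's _source is the document as it was indexed, which is why loading whole datasets into ES makes row-level results possible, and also why index size grows with every dataset added.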
There are only 6 datasets in there now, as I evaluate what works and what doesn't, and get feedback from you :)
I haven't yet cleaned up/standardized these, but they will be useful once done:
- taxonomy
- geolocation (lat/long, or similar)
- place names
- other things?
Then we could allow search specifically on those elements instead of just a full-text search across all fields.
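A rough sketch of what an element-specific search could look like as an Elasticsearch query, once those elements are standardized. The field names "taxonomy" and "location" are placeholders rather than anything in the current datasets, and the bounding-box filter assumes "location" would be mapped as a geo_point.

```python
def fielded_query(taxon=None, bbox=None, text=None, size=10):
    """Build a bool query from optional taxon, bounding box, and free text.

    bbox is (min_lon, min_lat, max_lon, max_lat).
    "taxonomy" and "location" are hypothetical field names.
    """
    must = []
    filters = []
    if text:
        # Free text still searches across all fields.
        must.append({"query_string": {"query": text}})
    if taxon:
        # Match only against the standardized taxonomy field.
        must.append({"match": {"taxonomy": taxon}})
    if bbox:
        min_lon, min_lat, max_lon, max_lat = bbox
        filters.append({
            "geo_bounding_box": {
                "location": {
                    "top_left": {"lat": max_lat, "lon": min_lon},
                    "bottom_right": {"lat": min_lat, "lon": max_lon},
                }
            }
        })
    if not must and not filters:
        # Nothing specified: return everything, up to `size` records.
        return {"size": size, "query": {"match_all": {}}}
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}
```

For example, fielded_query(taxon="Quercus", bbox=(-125.0, 24.0, -66.0, 49.0)) would restrict results to records whose taxonomy field matches "Quercus" and whose location falls inside that box.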
There is no website for this yet, but I could make one on top of the API for users who prefer a GUI.
What do you think?
- Do you think people will use this?
- Any issues you see with licensing/etc? All Dryad datasets are CC0, and I've been using those so far. AFAIK the supplementary datasets for ESA journals are CC licensed, so I think those are all fair game as well
- Would it be better to not actually serve datasets, but only be a discovery service?
- If this were to narrow its focus, what would be of the most benefit to the most people?
- other things?
For reference, the API routes so far:
- /:datasetid/fields (e.g. https://traits.party/datasets/z85a07642-9f49-408c-a16f-f71135d9450f/fields/) gives the fields in the dataset
- /fetch (e.g. https://traits.party/datasets/z85a07642-9f49-408c-a16f-f71135d9450f/fetch) retrieves the entire dataset from S3
- /search route (not running yet) searches across all datasets
- /:datasetid/search route (not running yet) searches in a given dataset
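As a small illustration of how a client might address the two routes that are running, here's a sketch that builds the URLs for a dataset ID. The base URL and path layout are copied from the example URLs for the fields and fetch routes; the helper names are made up, and the search routes are left out because they aren't running yet.

```python
from urllib.parse import quote

# Taken from the example URLs for the fields and fetch routes.
BASE_URL = "https://traits.party/datasets"


def fields_url(dataset_id):
    """URL that lists the fields in one dataset."""
    return f"{BASE_URL}/{quote(dataset_id, safe='')}/fields/"


def fetch_url(dataset_id):
    """URL that retrieves the entire dataset."""
    return f"{BASE_URL}/{quote(dataset_id, safe='')}/fetch"
```

Either URL can then be handed to whatever HTTP client you prefer.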