Jen Hammock edited this page Nov 28, 2017 · 16 revisions

Mmmmmm.... Fresh Data...

Why? What? How?

Users subscribe to FreshData monitors at http://gimmefreshdata.github.io. Whenever new occurrence data is available for a monitor, web services are updated and subscribers are notified. At the time of writing, one experimental notification mechanism (Twitter) is supported.

Data providers can register data archives with FreshData by adding a source repository to https://github.com/gimmefreshdata. Some data archives are imported once, while others are updated periodically. Whenever data is added, FreshData monitors are updated and subscribers are notified.

Web services are also available that list, by provider, the data records that have been sent to a data subscriber, along with metadata about the subscriber's query and their research interest, if provided. Data providers are thus invited to monitor usage of their data.

FreshData uses technology including, but not limited to, Apache Spark (number crunching), Apache Mesos (clustering), Jenkins (job management), Cassandra (monitor cache) and Kafka (notifications and progress information). For more information see architecture.

Use Cases

Use Case: Researcher

  • arrive at search page (whenever you want)
  • select geographic filter (on map, polygon)
  • select date filter (calendar select, before date and/or after date)
  • select taxon filter (autocomplete dropdown suggestions or write-in, multiple entry permitted)
  • SUBMIT query! (the system saves your query)
  • see results from GBIF and iDigBio: a tabulated preview of the first n records, with taxon name, lat, long and source url (?)
  • download csv: all records, with all available metadata, in one sprawling flat file.
  • FOLLOW query! (webhook or email address)
  • GO PUBLIC! (Optional) Copy this query url and follow this link to start a new project profile on SciStarter with an explanation and your contact info and everything! (Or follow this extra fancy custom url, which SciStarter will tell us how to construct, to prepopulate their project profile without copying and pasting.) (And/or use this link to do the same on iNat or citsci.org, if they see the SciStarter thing and want one too.)
  • UNFOLLOW! (If you don't click on your query link for a set expiry period, 180 days by default, the system removes your FOLLOW flag.)
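The saved query and the UNFOLLOW expiry rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Query` class, its field names and the `expire_follow` helper are hypothetical and are not part of any FreshData API; only the three filters and the 180-day default come from the use case.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Default expiry period named in the use case; everything else here is hypothetical.
FOLLOW_EXPIRY = timedelta(days=180)


@dataclass
class Query:
    """A saved researcher query: geographic, date and taxon filters plus follow state."""
    polygon: Sequence[Tuple[float, float]]  # (lat, long) vertices drawn on the map
    taxa: Sequence[str]                     # one or more taxon names
    after: Optional[date] = None            # only records on/after this date
    before: Optional[date] = None           # only records on/before this date
    followed: bool = False                  # set by FOLLOW
    last_visited: Optional[date] = None     # last time the query link was clicked


def expire_follow(query: Query, today: date,
                  expiry: timedelta = FOLLOW_EXPIRY) -> Query:
    """Remove the FOLLOW flag if the query link was not visited within the expiry period."""
    if (query.followed and query.last_visited is not None
            and today - query.last_visited > expiry):
        query.followed = False
    return query
```

A query visited two months ago keeps its FOLLOW flag; one left untouched for most of a year loses it the next time `expire_follow` runs.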

Use Case: Service Provider of Researcher (e.g. EOL)

  • access list of submitted queries (whenever you have scheduled this to occur)
  • add filter "followed queries only"
  • pair queries with the user identities of your members, as recorded because they were logged in when they followed the query (???)
  • per user, determine the FOLLOW report interval from user settings, then determine whether a report is due and when the last report was sent (last time).
  • per query with a report due: add filter "first available in FD after date=last time". If the result count is > 0, message the querying member. Include the basic query url, the "fresh filtered" url, cheerful explanatory text, and an UNFOLLOW url.
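The reporting steps above amount to two checks: is a report due for this user, and has anything become available since the last one? A minimal sketch follows, assuming records are plain dictionaries carrying a hypothetical `first_available` date; the function names are invented for illustration and do not describe an existing FreshData service.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def report_due(last_time: Optional[date], interval_days: int, today: date) -> bool:
    """A report is due if none was ever sent, or the user's interval has elapsed."""
    if last_time is None:
        return True
    return today - last_time >= timedelta(days=interval_days)


def fresh_records(records: List[Dict], last_time: Optional[date]) -> List[Dict]:
    """Apply the filter 'first available in FD after date=last time'."""
    if last_time is None:
        return list(records)
    return [r for r in records if r["first_available"] > last_time]


def should_notify(records: List[Dict], last_time: Optional[date],
                  interval_days: int, today: date) -> bool:
    """Message the member only when a report is due and the fresh result is non-empty."""
    return (report_due(last_time, interval_days, today)
            and len(fresh_records(records, last_time)) > 0)
```

Keeping the "due" check separate from the "fresh" filter means a member with a long report interval is not messaged early, and a due report with no new records is skipped rather than sent empty.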

Use Case: SciStarter Anonymous Biodiversity Queries

  • access list of submitted queries (whenever you have scheduled this to occur)
  • update profile of multipolygon citizen science project

Use Case: Service Provider of Observer (e.g. iNaturalist)

  • access list of queried records (whenever you have scheduled this to occur)
  • add filter "provider platform=you"
  • add filter "queried and delivered by FD after date=last time"
  • attach notification/tag/comment to the queried records, including url to the queries that captured them
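From the observer platform's side, the steps above are two filters followed by an annotation. The sketch below assumes each delivered record is a dictionary with hypothetical `provider`, `delivered_on` and `query_urls` keys; none of these names are taken from an actual FreshData response format.

```python
from datetime import date
from typing import Dict, List


def records_to_annotate(delivered: List[Dict], platform: str,
                        last_time: date) -> List[Dict]:
    """Keep records from this platform that FD delivered after the last check."""
    return [
        r for r in delivered
        if r["provider"] == platform and r["delivered_on"] > last_time
    ]


def build_comment(record: Dict) -> str:
    """Compose a comment that links back to the queries that captured the record."""
    urls = ", ".join(record["query_urls"])
    return f"This record was delivered to FreshData queries: {urls}"
```

A scheduled job on the platform could call `records_to_annotate` with the date of its previous run and post the output of `build_comment` on each returned record.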

Mockups

Mockups were created during a work session on 6 November 2015 at the National Museum of Natural History. Participants included Jen Hammock, Yurong He, Michele Weber and Jorrit Poelen.

search

email notification
