What's this all about
Why we do this
Crowdsourcing campaigns and systems include a number of components: a mobile app, usually running on a smartphone; some offline storage capacity for when Internet connectivity is shaky; synchronization mechanisms between the mobile app and the raw data server; quality assurance processes running on all incoming raw data; data processing, aggregation, and improvement processes; data access and metadata enrichment services; etc. There is quite some complexity behind this.
Of course, you can build a simple system in a couple of days. If you don't want to share your data or interface with others, there is no need to comply with any standards. Just send your little JSON object with the location and value parameters from the mobile app and process it on your system. Done. No need to think much about data models, services, software architecture, and so on. But if you want to share your data with others, or at least allow others to discover your data, or if you want to integrate your data with processing capacities provided by others, then you had better take a second look at standards.
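To make the "simple system" concrete, here is a minimal sketch of the kind of ad-hoc payload described above. The field names (`lat`, `lon`, `value`, `timestamp`) are assumptions for illustration, not part of any standard, which is exactly the point: nothing here would let another system discover or interpret this data.

```python
import json

# A hypothetical minimal raw-observation payload, as a mobile app might send it:
# just a location and a value. Field names and units are implicit conventions
# known only to this one system.
observation = {
    "lat": 46.948,                        # latitude (WGS84 assumed, but unstated)
    "lon": 7.447,                         # longitude
    "value": 21.5,                        # the measured quantity, units implicit
    "timestamp": "2018-07-12T14:03:00Z",  # ISO 8601
}

payload = json.dumps(observation)
print(payload)
```

This works fine in isolation; it is only when data should be shared, discovered, or combined with external processing that the missing data model becomes a problem.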
Many roads lead to Rome, and many options exist for modeling observations in the context of citizen science. Here, we provide some details on the chosen approach, along with some arguments against other possible approaches. Our goal is to develop a broadly applicable solution that works in many domains and features a high level of interoperability between components offering both raw and processed data, even across domains.
Every Observation Counts
Initially, we had to answer a rather simple question - one that becomes tougher the more you play around with it.
How important is each observation?
We could collect multiple observations and package them into a single container. That would allow us to provide metadata that applies to all observations in the container only once. On the other hand, all observations would then have to be fairly homogeneous. Does this work? Well, consider the following situation. You participate in a butterfly sampling campaign. On the first day, you capture only very few butterflies. The next day the weather improves, and the area is full of different species. Can we pack all observations into a single container? The weather, an important feature, has changed. There are endless other scenarios like this. In the end, the decision was clear: every observation counts. We want to be able to analyze the raw data even two years later and still make sure that we are not comparing apples with oranges. Of course, there are other approaches that would provide the same outcome. We could link the weather information to each butterfly observation. But is the data from the local weather station really what we observed down in that little valley where we captured the butterflies?
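The "every observation counts" idea can be sketched as follows. All class and field names here are hypothetical illustrations, not part of any published model: each observation carries its own context (such as the weather observed on site) rather than inheriting it from a shared container, so two records from different days remain comparable years later.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class WeatherContext:
    """Weather as observed on site - not pulled from a distant station."""
    temperature_c: Optional[float] = None
    condition: Optional[str] = None   # e.g. "sunny", "overcast"

@dataclass
class Observation:
    """A single self-describing observation; no container-level metadata needed."""
    species: str
    lat: float
    lon: float
    timestamp: str                    # ISO 8601
    weather: WeatherContext = field(default_factory=WeatherContext)

# Two days of the butterfly campaign: few captures in bad weather,
# many in good weather - but each record explains its own conditions.
day1 = Observation("Vanessa atalanta", 46.2, 7.3, "2018-06-01T10:00:00Z",
                   WeatherContext(14.0, "overcast"))
day2 = Observation("Vanessa atalanta", 46.2, 7.3, "2018-06-02T10:00:00Z",
                   WeatherContext(24.0, "sunny"))

# Because each record is self-describing, the raw data stays interpretable
# without any shared container metadata.
print(asdict(day1)["weather"]["condition"])
```

The design choice is simply that context travels with the observation, at the cost of some redundancy between records.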
Does it mean we cannot package observations?
Of course you can. We are talking about the raw observations here. Nothing prevents us from packaging all observations from a single campaign, even from a huge number of users, into a single container. We simply conserve all details about each observation to make sure we can access the raw data at the finest granularity.
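A minimal sketch of this kind of packaging, with assumed field names, might look as follows: the container is purely a grouping for a campaign, and every observation inside it keeps its full per-observation detail.

```python
# A hypothetical campaign container: it groups fully self-described
# observations from many users without stripping any per-observation detail.
campaign = {
    "campaign_id": "butterflies-2018",   # assumed identifier scheme
    "observations": [
        {"user": "alice", "species": "Pieris rapae",
         "lat": 46.2, "lon": 7.3, "timestamp": "2018-06-01T10:00:00Z",
         "weather": {"condition": "overcast"}},
        {"user": "bob", "species": "Pieris rapae",
         "lat": 46.3, "lon": 7.4, "timestamp": "2018-06-02T11:30:00Z",
         "weather": {"condition": "sunny"}},
    ],
}

# The container adds convenience, not meaning: each raw observation
# remains accessible at the finest granularity.
for obs in campaign["observations"]:
    print(obs["user"], obs["species"], obs["weather"]["condition"])
```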
Do we need this?
There are many platforms that allow you to submit, for example, your weather station data. Those platforms claim that they can do all sorts of magic with this data. In principle, you can estimate the error of an individual station by cleverly processing the data from all other stations. But this is complex work, and there is always some error left in the equations. What looks like a spike at the upper or lower end might just be the result of a particular micro-climate. Who knows? Therefore, we invest a bit more in our observation model. And as said, we support collections of observations as well as compact representations, so that data can be exchanged even over thin pipes.