Please register here.
The goal of this Hackathon session is to develop summarization functions that add noise to sensor data to protect privacy. However, once the summarized data are collected, analytics such as the summation and average aggregation functions should still be computed with satisfactory accuracy. A summarization function can therefore be evaluated in light of this trade-off: privacy-preservation vs. accuracy in data analytics.
A summarization function receives as input a vector with the raw sensitive data and provides as output a vector of the same size with the summarized values. The entropy/diversity of the summarized data should be lower than that of the raw data. Performance is measured as follows:
- Privacy-preservation is measured with the relative error between the raw and the summarized data. For several users, the average of these errors measures the collective privacy-preservation.
- Accuracy in analytics is measured with the relative error between the aggregated raw data and the aggregated summarized data.
More information about the performance metrics can be found here and here. Participants do not need to implement these metrics themselves; instead, they can use the Challenge Analyzer to see how their summarization algorithm performs.
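For intuition only, the two metrics described above can be sketched in a few lines of Java. This is one plausible reading of "relative error", not the official definition: the Challenge Analyzer computes the actual scores. The class name `ErrorSketch`, its method names, and the choice to skip entries whose raw value is zero are all assumptions made for this illustration.

```java
// Hypothetical helper for intuition only; not part of challengeLib.jar.
class ErrorSketch {

    // Mean relative error between one user's raw and summarized values.
    // Entries whose raw value is zero are skipped to avoid division by zero (an assumption).
    static double localError(double[] raw, double[] summarized) {
        double total = 0.0;
        int counted = 0;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] != 0.0) {
                total += Math.abs(raw[i] - summarized[i]) / Math.abs(raw[i]);
                counted++;
            }
        }
        return counted == 0 ? 0.0 : total / counted;
    }

    // Average of the local errors over all users (rows are users, columns are time steps).
    static double averageLocalError(double[][] raw, double[][] summarized) {
        double total = 0.0;
        for (int u = 0; u < raw.length; u++) {
            total += localError(raw[u], summarized[u]);
        }
        return raw.length == 0 ? 0.0 : total / raw.length;
    }

    // Relative error of the summation aggregate: for every time step, the values of all
    // users are summed, and the raw and summarized sums are compared. The result is the
    // mean over all time steps with a non-zero raw sum.
    static double globalError(double[][] raw, double[][] summarized) {
        if (raw.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        int counted = 0;
        for (int t = 0; t < raw[0].length; t++) {
            double rawSum = 0.0;
            double summarizedSum = 0.0;
            for (int u = 0; u < raw.length; u++) {
                rawSum += raw[u][t];
                summarizedSum += summarized[u][t];
            }
            if (rawSum != 0.0) {
                total += Math.abs(rawSum - summarizedSum) / Math.abs(rawSum);
                counted++;
            }
        }
        return counted == 0 ? 0.0 : total / counted;
    }

    public static void main(String[] args) {
        // Two users, two time steps; each user reports their own mean instead of the raw values.
        double[][] raw = {{1.0, 3.0}, {3.0, 1.0}};
        double[][] summarized = {{2.0, 2.0}, {2.0, 2.0}};
        System.out.println("average local error: " + averageLocalError(raw, summarized));
        System.out.println("global error: " + globalError(raw, summarized));
    }
}
```

In this toy input every user distorts their own values considerably, yet the per-time-step sums are unchanged, which is the kind of trade-off the challenge rewards.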
The application scenario of this hackathon challenge is the following: you are given the smart meter power consumption readings of 1000 consumers during a winter month and a summer month. For each consumer and day, 48 measurements are recorded. These are the raw data that need to be summarized. Try out one or more summarization functions over the consumers' data so that you maximize the average local error of the summarized data while minimizing the error of the aggregated data. To get access to the dataset, please contact us.
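As a starting point, one very simple candidate is to replace each reading with the mean of the equal-width bin it falls into. The sketch below is an illustration under stated assumptions, not a recommended solution: the class name `BinSummarizer` and the parameter `numBins` are hypothetical and do not come from challengeLib.jar, and the input is assumed to be one consumer's readings as a plain `double[]`.

```java
import java.util.Arrays;

// Hypothetical helper, not part of challengeLib.jar.
class BinSummarizer {

    // Replaces every raw value with the mean of the equal-width bin it falls into.
    // The output has the same length as the input but at most numBins distinct values.
    static double[] summarize(double[] raw, int numBins) {
        if (raw.length == 0 || numBins < 1) {
            return raw.clone();
        }
        double min = Arrays.stream(raw).min().getAsDouble();
        double max = Arrays.stream(raw).max().getAsDouble();
        double width = (max - min) / numBins;

        int[] binOf = new int[raw.length];
        double[] binSum = new double[numBins];
        int[] binCount = new int[numBins];
        for (int i = 0; i < raw.length; i++) {
            // When width == 0 all values are identical, so everything goes to bin 0.
            int b = width == 0 ? 0 : (int) ((raw[i] - min) / width);
            b = Math.min(b, numBins - 1); // the maximum value belongs to the last bin
            binOf[i] = b;
            binSum[b] += raw[i];
            binCount[b]++;
        }

        double[] summarized = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            summarized[i] = binSum[binOf[i]] / binCount[binOf[i]];
        }
        return summarized;
    }

    public static void main(String[] args) {
        double[] raw = {0.2, 0.4, 1.0, 1.2, 3.0, 3.4};
        System.out.println(Arrays.toString(summarize(raw, 2)));
    }
}
```

Because every value is replaced by the mean of its own bin, the sum over the vector is preserved up to floating-point rounding, while fewer bins give a coarser output with a larger local error.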
This repository provides you with all the necessary utilities and APIs to implement the summarization functions. You do not need to worry about how to load the data, how to output them, in what format, etc. All these details are handled by `challengeLib.jar`. More specifically, all required utilities can be found here. We also provide an example of a summarization function that is based on the k-means algorithm.
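To give a rough idea of how a k-means based summarization can work, here is a minimal, self-contained sketch of one-dimensional k-means (Lloyd's algorithm) in which every value is replaced by the mean of its cluster. It is not the example shipped with this repository: the class name `KMeansSummarizer`, the parameters `k` and `maxIterations`, and the deterministic quantile-based initialization are assumptions made for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not the k-means example provided in the repository.
class KMeansSummarizer {

    // Clusters the raw values into k groups and replaces each value with its cluster mean.
    static double[] summarize(double[] raw, int k, int maxIterations) {
        int n = raw.length;
        if (n == 0) {
            return new double[0];
        }
        k = Math.max(1, Math.min(k, n));
        maxIterations = Math.max(1, maxIterations);

        // Deterministic initialization: centroids at evenly spaced positions of the sorted data.
        double[] sorted = raw.clone();
        Arrays.sort(sorted);
        double[] centroids = new double[k];
        for (int c = 0; c < k; c++) {
            centroids[c] = sorted[(int) ((c + 0.5) * n / k)];
        }

        int[] assignment = new int[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every value to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(raw[i] - centroids[c]) < Math.abs(raw[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Update step: move every non-empty centroid to the mean of its members.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                sum[assignment[i]] += raw[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c];
                }
            }
            // Stop once the assignments are stable (never after the very first pass).
            if (!changed && iter > 0) {
                break;
            }
        }

        double[] summarized = new double[n];
        for (int i = 0; i < n; i++) {
            summarized[i] = centroids[assignment[i]];
        }
        return summarized;
    }

    public static void main(String[] args) {
        double[] raw = {0.1, 0.2, 0.3, 2.0, 2.1, 5.0};
        System.out.println(Arrays.toString(summarize(raw, 3, 100)));
    }
}
```

A smaller `k` produces fewer distinct output values and therefore a larger local error; since each value is replaced by the mean of its own cluster, the sum of the vector stays the same up to rounding.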
To participate in the Nervousnet Hackathon Challenge, follow these 6 steps:
1. Create the class
2. Use the method `exportClonedRawValues(...)` of `Loader.java` to retrieve all the required data.
3. Implement your summarization function within `MySummarizationFunction.java` using the returned values of `exportClonedRawValues(...)` in step 2. Here is an example.
4. Use the `Dumper.java` to initialize and prepare the output of the summarization function.
5. Add the summarized data in the output of the
6. Call the method
More information about how to implement summarization functions can be found in this tutorial.
This hackathon is inspired by a work that envisions information sharing as a participatory and democratic supply-demand system self-regulated in a bottom-up fashion by citizens.
>E. Pournaras, J. Nikolic, P. Velasquez, M. Trovati, N. Bessis and D. Helbing, Self-regulatory Information Sharing in Participatory Social Sensing, The European Physical Journal Data Science, 5:14, 2016 © SpringerOpen
Self-regulatory Information Sharing in Participatory Social Sensing
Participation in social sensing applications is challenged by privacy threats. Large-scale access to citizens’ data allow surveillance and discriminatory actions that may result in segregation phenomena in society. On the contrary are the benefits of accurate computing analytics required for more informed decision-making, more effective policies and regulation of techno-socio-economic systems supported by ‘Internet-of Things’ technologies. In contrast to earlier work that either focuses on privacy protection or Big Data analytics, this paper proposes a self-regulatory information sharing system that bridges this gap. This is achieved by modeling information sharing as a supply-demand system run by computational markets. On the supply side lie the citizens that make incentivized but self-determined decisions about the level of information they share. On the demand side stand data aggregators that provide rewards to citizens to receive the required data for accurate analytics. The system is empirically evaluated with two real-world datasets from two application domains: (i) Smart Grids and (ii) mobile phone sensing. Experimental results quantify trade-offs between privacy-preservation, accuracy of analytics and costs from the provided rewards under different experimental settings. Findings show a higher privacy-preservation that depends on the number of participating citizens and the type of data summarized. Moreover, analytics with summarization data tolerate high local errors without a significant influence on the global accuracy. In other words, local errors cancel out. Rewards can be optimized to be fair so that citizens with more significant sharing of information receive higher rewards. All these findings motivate a new paradigm of truly decentralized and ethical data analytics.