Data Responsibility & Sharing Guidelines #6
Many of our AI-focused cohorts are unsure whether, and how, they should release their training datasets alongside their machine learning code.
UN OCHA has published useful guidance on releasing data, assessing its sensitivity, and related topics. The main guidance document can be found here:
I have contacts at HDX (the Humanitarian Data Exchange), which might be the ideal platform on which to release their data.
A useful first step would be an initial document, or perhaps a roadmap template, for assessing and releasing a dataset online.
UNICEF has shown interest in having their mentors give a number of "homework" assignments. Below are a few ideas for assignments that could be given.
Data Ecosystem Map: Documents where the data comes from, what uses it, who the stakeholders are, and so on. The benefit is not only organizing how data collection will work in production but also anticipating partnerships that may need to be formed in the future.
Information Sharing Protocol: This document is an initial assessment of how sensitive the information you are collecting is, whom you want to share it with, and how. The template is fairly well thought out but is worth a review and possibly some edits. If you are in the EU, it may be better to conduct a Data Protection Impact Assessment (DPIA) instead.
Determine Dataset Structure: As you create new datasets, update them, and retire old ones, follow a simple, consistent protocol for tracking each dataset through its updates and eventual deprecation.
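For the Data Ecosystem Map assignment, one option is to keep the map machine-readable as a small graph of flows: each flow records a data source, the thing that consumes it, and the stakeholders involved. This is only a sketch under that assumption; the class names, fields, and example entries below are hypothetical and are not taken from any OCHA or UNICEF template.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFlow:
    """One edge in the map: data moving from a source to a consumer."""
    source: str                       # where the data comes from
    consumer: str                     # what uses it (a dataset, model, dashboard...)
    stakeholders: list[str] = field(default_factory=list)


@dataclass
class EcosystemMap:
    """Hypothetical container for all flows in a project's data ecosystem."""
    flows: list[DataFlow] = field(default_factory=list)

    def add_flow(self, source: str, consumer: str, *stakeholders: str) -> None:
        self.flows.append(DataFlow(source, consumer, list(stakeholders)))

    def consumers_of(self, source: str) -> list[str]:
        """Everything that directly consumes the given source."""
        return [f.consumer for f in self.flows if f.source == source]

    def all_stakeholders(self) -> set[str]:
        """Every party named anywhere in the map: a starting list of
        partnerships that may need to be formalized later."""
        return {s for f in self.flows for s in f.stakeholders}


# Hypothetical example entries.
eco = EcosystemMap()
eco.add_flow("mobile survey", "training dataset", "local NGO", "survey vendor")
eco.add_flow("training dataset", "ML model", "research partner")

print(eco.consumers_of("mobile survey"))    # ['training dataset']
print(sorted(eco.all_stakeholders()))       # ['local NGO', 'research partner', 'survey vendor']
```

Keeping the map as data rather than a diagram makes it easy to ask "who is affected if this source changes?" when the project moves to production.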
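The heart of the Information Sharing Protocol assignment is pairing each type of data with a sensitivity level and the audiences it may go to. The sketch below assumes a four-level scale (low, moderate, high, severe) and a rule that each audience is cleared up to some maximum level; the levels, audience names, and thresholds are illustrative placeholders, so substitute the classification from the actual protocol template.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical scale; replace with the levels your protocol defines.
    LOW = 1
    MODERATE = 2
    HIGH = 3
    SEVERE = 4


# Hypothetical clearances: the most sensitive level each audience may receive.
MAX_LEVEL_FOR_AUDIENCE = {
    "public": Sensitivity.LOW,
    "partners": Sensitivity.MODERATE,
    "core team": Sensitivity.HIGH,
}


def can_share(level: Sensitivity, audience: str) -> bool:
    """True if data at `level` may be shared with `audience`.

    Unknown audiences are denied by default, and SEVERE data is never
    shared in this sketch because no audience is cleared for it.
    """
    ceiling = MAX_LEVEL_FOR_AUDIENCE.get(audience)
    return ceiling is not None and level <= ceiling


# Hypothetical assessment of two data types.
protocol = {
    "aggregated results by district": Sensitivity.LOW,
    "household survey responses": Sensitivity.HIGH,
}

for data_type, level in protocol.items():
    allowed = [a for a in MAX_LEVEL_FOR_AUDIENCE if can_share(level, a)]
    print(f"{data_type}: {level.name} -> {allowed}")
```

Denying unknown audiences by default is a deliberate choice: anyone not explicitly listed in the protocol has to be added after review rather than slipping through.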
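For the Determine Dataset Structure assignment, a tracking protocol can be as simple as a registry that records every version of each dataset along with its status, so nobody trains on a deprecated copy by accident. This is one possible shape, not a prescribed one: the field names, the `active`/`deprecated` statuses, and the rule that publishing a new version deprecates the previous one are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetVersion:
    name: str
    version: int
    created: date
    note: str = ""
    status: str = "active"  # "active" or "deprecated" (hypothetical statuses)


class DatasetRegistry:
    """Hypothetical registry tracking every version of every dataset."""

    def __init__(self) -> None:
        self._versions: dict[str, list[DatasetVersion]] = {}

    def publish(self, name: str, created: date, note: str = "") -> DatasetVersion:
        """Add a new version; all earlier versions become deprecated."""
        history = self._versions.setdefault(name, [])
        for old in history:
            old.status = "deprecated"
        entry = DatasetVersion(name, len(history) + 1, created, note)
        history.append(entry)
        return entry

    def retire(self, name: str) -> None:
        """Deprecate every version of a dataset that is being removed."""
        for v in self._versions.get(name, []):
            v.status = "deprecated"

    def current(self, name: str) -> Optional[DatasetVersion]:
        """The active version, or None if the dataset is retired or unknown."""
        for v in self._versions.get(name, []):
            if v.status == "active":
                return v
        return None

    def history(self, name: str) -> list[DatasetVersion]:
        """All versions, oldest first."""
        return list(self._versions.get(name, []))


# Hypothetical usage with illustrative names and dates.
reg = DatasetRegistry()
reg.publish("survey", date(2020, 1, 15), "initial collection")
reg.publish("survey", date(2020, 3, 2), "duplicate rows removed")
reg.publish("pilot-labels", date(2019, 11, 1))
reg.retire("pilot-labels")

print(reg.current("survey").version)   # 2
print(reg.current("pilot-labels"))     # None
```

Old versions are marked deprecated rather than deleted, which keeps a record of what earlier models were trained on.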