
ethglobal23nyc/blockchain-federated-machine-learning


A blockchain marketplace where people get paid for helping train ML models without needing to share their data with anyone.

Data is the new oil. Powerful machine learning models are trained on people’s data. Today, people usually share their private data with a centralized entity in exchange for some freemium service. They are not paid for the data that powers these applications, and they must rely on the central authority to protect their privacy. Imagine a world where people don’t need to share their local data with a central entity, yet still help train a global machine learning model on that data locally and, even better, get paid for doing so. In this project, we use blockchain to achieve that.

How it's made

In this project, we created a blockchain model marketplace where model sponsors, who want to train ML models on people’s data (for example, health data to predict health conditions), can post jobs. In effect, a posting says: “I need this type of data to train a specific model, and in exchange I will pay X tokens.” In the listing, the model sponsor provides the initial model to be trained, a model trainer executable (which clients run on their local node to update the model with their local data), and the reward for the job.
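A listing can be modeled as a small record that holds what the sponsor provides: the initial model, the trainer executable, and the reward. The sketch below is a hypothetical off-chain Python model, not the project’s smart contract. The class names, field names, the in-memory `Marketplace` registry, and the use of IPFS CID strings as file references are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobListing:
    # Hypothetical shape of a marketplace posting; field names are illustrative only.
    job_id: int
    sponsor: str            # sponsor's wallet address
    data_requirement: str   # description of the data the sponsor needs
    initial_model_cid: str  # IPFS CID of the initial model to be trained
    trainer_cid: str        # IPFS CID of the model trainer executable
    reward: int             # tokens offered for the job (unit assumed)
    participants: set[str] = field(default_factory=set)

    def accept(self, client: str) -> None:
        """Record that a client (node) has accepted this job."""
        if client in self.participants:
            raise ValueError(f"{client} already accepted job {self.job_id}")
        self.participants.add(client)


class Marketplace:
    """In-memory stand-in for the on-chain listing registry (assumed)."""

    def __init__(self) -> None:
        self._jobs: dict[int, JobListing] = {}

    def post(self, listing: JobListing) -> None:
        # Reject duplicate job ids so each posting is unique.
        if listing.job_id in self._jobs:
            raise ValueError(f"job {listing.job_id} already posted")
        self._jobs[listing.job_id] = listing

    def open_jobs(self) -> list[JobListing]:
        return list(self._jobs.values())

    def accept(self, job_id: int, client: str) -> JobListing:
        listing = self._jobs[job_id]
        listing.accept(client)
        return listing


# Example: a sponsor posts a health-data job and a client accepts it.
market = Marketplace()
market.post(JobListing(
    job_id=1,
    sponsor="0xSponsorAddress",
    data_requirement="wearable heart-rate data",
    initial_model_cid="<initial-model-cid>",
    trainer_cid="<trainer-cid>",
    reward=100,
))
job = market.accept(1, "0xClientAddress")
```

Storing only CIDs in the listing keeps the posting small; the model and trainer themselves live on IPFS, which matches how clients fetch them in the workflow described next.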

Clients (nodes) can browse the listings in the marketplace and decide whether to participate. A client who decides to participate accepts the job from the marketplace, then loads the initial model and the trainer executable from IPFS (Filecoin) and runs the trainer executable on their local node with their local data. Once the job completes, the new model is written to IPFS.

We use Cartesi to run a Python service for model training and validation, given {model_cid, data_cid}, where a CID is a unique IPFS file reference string. The Cartesi machine listens to the network and, upon receiving a message (in the form of a blockchain transaction), issues the corresponding reward to the client’s wallet. We also tried Filecoin’s Lilypad for distributed execution of the ML compute and believe Lilypad would also work well for future implementations.

Note that model training happens on the client side, and the client data never leaves the client node. Only the new model weights are delivered to the model sponsor in exchange for the reward. This protects user data privacy while rewarding the client for the effort of training the model on their data.
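The client-side step can be illustrated with a toy local update in which the training data stays inside one function and only the updated weights leave it. This is a minimal sketch under stated assumptions: a linear model `y = w*x + b` trained with plain gradient descent stands in for the sponsor’s trainer executable, `content_id` is a SHA-256 stand-in rather than a real IPFS CID, and `validate_and_reward` uses a hypothetical “lower validation loss earns the reward” rule. The README does not specify the actual model format or how the Cartesi service decides a payout.

```python
import hashlib
import json


def local_train(weights, local_data, lr=0.05, epochs=500):
    """Client-side update: gradient descent on mean squared error for y = w*x + b.

    local_data stays inside this function; only the updated weights are returned.
    """
    w, b = weights
    n = len(local_data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in local_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in local_data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def content_id(weights):
    """Illustrative stand-in for an IPFS CID: SHA-256 of the serialized weights.

    Real CIDs add multihash/multibase encoding; this only shows content addressing.
    """
    payload = json.dumps(list(weights)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mse(weights, data):
    """Mean squared error of the linear model on (x, y) pairs."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def validate_and_reward(initial, updated, validation_data, reward):
    """Hypothetical acceptance rule: pay only if the update lowers validation loss."""
    if mse(updated, validation_data) < mse(initial, validation_data):
        return reward
    return 0


# The client's private data (here, points on y = 2x + 1) never leaves the node.
private_data = [(0, 1), (1, 3), (2, 5), (3, 7)]
initial = (0.0, 0.0)
updated = local_train(initial, private_data)   # close to (2.0, 1.0)
model_cid = content_id(updated)                 # only this model would be published

# Validation on held-out data the sponsor controls (assumed).
payout = validate_and_reward(initial, updated, [(4, 9), (5, 11)], reward=100)
```

Keeping training and validation as pure functions of their inputs mirrors the description of a service that receives references such as `{model_cid, data_cid}` and returns a result; in the real system those references would be resolved from IPFS rather than passed as Python values.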

For wallet connection and authentication, we used WalletConnect. After completing the task, the client can see their reward in their wallet.