Proposal: ZK-friendly ML model explorations #16

Closed
saeyoon17 opened this issue Oct 18, 2023 · 7 comments
Labels
Application Proposal (Proposal submitted by applicants), Completed (Grant has closed and finished)

Comments

@saeyoon17

saeyoon17 commented Oct 18, 2023

General Grant Proposal

Project Overview 📄

Overview

This project explores different ZK-applicable machine learning techniques and compares them.

Project Details

Throughout the project, we explore different ZK-applicable machine learning algorithms that can be applied to the Heart Failure Prediction Dataset.

Specifically, we aim to explore

  • Neural Network
  • Linear Regression
  • Decision Tree
  • k-Nearest Neighbor

I plan to compare the following:

  • Number of model parameters (if any)
  • Model accuracy
  • Train complexity (time/memory)
  • Proof generation complexity (time/memory)
  • Performance degradation when converted to ZK circuit
  • Verification complexity (time/memory)
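The time/memory measurements listed above could be collected with a small helper like the one below. This is a minimal sketch using only the Python standard library; `measure` and `dummy_workload` are hypothetical names, and `dummy_workload` merely stands in for a training, proving, or verification call. Note that `tracemalloc` only sees allocations made by the Python interpreter, so memory used by a prover running as a separate native process would need a different tool.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def dummy_workload(n):
    # Hypothetical stand-in for train / prove / verify.
    # Builds a list so that the peak memory is clearly non-zero.
    return sum([i * i for i in range(n)])


result, seconds, peak = measure(dummy_workload, 100_000)
print(f"result={result} time={seconds:.4f}s peak={peak / 1024:.1f} KiB")
```

Wrapping each stage (training, proof generation, verification) in the same helper would keep the numbers comparable across models.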

Team 👥

Team members

  • Saeyoon Oh
  • email: saeyoon17@gmail.com
  • telegram handle: greenteaboom
  • discord handle: saeyoon_oh

Team Website

Team's experience

Team Code Repos

Development Roadmap 🔩

Overview

  • Total Estimated Duration: 8 weeks (160 hours)
  • Full-time equivalent (FTE): 0.5 FTE

Milestone 1️⃣: Training/Proof generation using Neural Network

  • Estimated Duration: 2 weeks
  • FTE: 0.5

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to train a neural network on the heart failure dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using a neural network. The model architecture is to be determined; I plan to start with a simple MLP and expand from there.
  2. Functionality: Converting neural network model to ZK circuits using Circom or EZKL.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
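One source of the "performance degradation" mentioned in the comparison list is that ZK circuits work over integers in a finite field, so floating-point weights have to be quantized. The sketch below illustrates that effect on a tiny MLP in plain Python. The weights, the input, and the scale factor `SCALE` are all made up for illustration, and this is not how EZKL or Circom quantize internally.

```python
SCALE = 2 ** 8  # hypothetical fixed-point scale: value v is stored as round(v * SCALE)


def quantize(v):
    return round(v * SCALE)


def relu(vec):
    return [max(0, a) for a in vec]


def dense_float(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def dense_fixed(Wq, bq, xq):
    # A product of two scaled values carries SCALE**2, so the bias is lifted
    # to the same scale before the sum is divided back down by SCALE.
    return [
        (sum(w * xi for w, xi in zip(row, xq)) + bi * SCALE) // SCALE
        for row, bi in zip(Wq, bq)
    ]


# Toy two-layer MLP (illustrative weights, not trained on any dataset).
W1, b1 = [[0.5, -0.25], [0.75, 0.1]], [0.1, -0.2]
W2, b2 = [[1.0, -0.5]], [0.05]
x = [0.8, 0.3]

float_out = dense_float(W2, b2, relu(dense_float(W1, b1, x)))[0]

W1q = [[quantize(w) for w in row] for row in W1]
W2q = [[quantize(w) for w in row] for row in W2]
b1q, b2q = [quantize(v) for v in b1], [quantize(v) for v in b2]
xq = [quantize(v) for v in x]

fixed_out = dense_fixed(W2q, b2q, relu(dense_fixed(W1q, b1q, xq)))[0] / SCALE

print(f"float={float_out:.4f} fixed={fixed_out:.4f} error={abs(float_out - fixed_out):.4f}")
```

Comparing accuracy on a held-out test set before and after quantization, rather than on a single input, would give the degradation figure the proposal asks for.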

Milestone 2️⃣: Training/Proof generation using Linear Regression

  • Estimated Duration: 2 weeks
  • FTE: 0.5

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with linear regression on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using linear regression.
  2. Functionality: Converting linear regression model to ZK circuits using Circom or EZKL.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
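As a rough illustration of using linear regression for classification, the sketch below fits a one-feature least-squares line in closed form and thresholds the prediction at 0.5. The data is synthetic, and the function names and the 0.5 threshold are assumptions for this example rather than the proposal's actual design; the real dataset has multiple features.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify(x, slope, intercept, threshold=0.5):
    # Regression output is treated as a score; 0.5 is an assumed cut-off.
    return 1 if slope * x + intercept >= threshold else 0


# Synthetic toy data: one feature, binary label.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

slope, intercept = fit_simple_linear_regression(xs, ys)
predictions = [classify(x, slope, intercept) for x in xs]
accuracy = sum(p == y for p, y in zip(predictions, ys)) / len(ys)
print(f"slope={slope:.4f} intercept={intercept:.4f} accuracy={accuracy:.2f}")
```

Inference is a single dot product followed by a comparison, which is one reason this model is expected to be cheap to express as a circuit.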

Milestone 3️⃣: Training/Proof generation using Decision Tree

  • Estimated Duration: 2 weeks
  • FTE: 0.5

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with a decision tree on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and will allow one to prove that a prediction was made using the correct circuit.

  1. Functionality: Train/Test/Inference pipeline using decision tree.
  2. Functionality: Converting decision tree to ZK circuits using Circom/EZKL/zkML.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
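A circuit has a fixed shape, so decision-tree inference is usually written as a fixed number of steps with arithmetic selection instead of data-dependent branching. The sketch below mirrors that structure in plain Python. The tree layout, thresholds, and feature ordering are hypothetical, and a real Circom or EZKL circuit would replace the array lookups with selection constraints.

```python
# Flat-array encoding of a depth-2 tree. Leaves (nodes 2, 3, 4) point to
# themselves so that extra steps after reaching a leaf change nothing.
FEATURE = [0, 1, 0, 0, 0]       # feature index tested at each node
THRESHOLD = [50, 120, 0, 0, 0]  # go right when x[feature] > threshold
LEFT = [1, 3, 2, 3, 4]
RIGHT = [2, 4, 2, 3, 4]
LABEL = [0, 0, 1, 0, 1]         # only meaningful at leaf nodes
DEPTH = 2


def predict(x):
    node = 0
    for _ in range(DEPTH):  # always exactly DEPTH steps, like a circuit
        go_right = int(x[FEATURE[node]] > THRESHOLD[node])
        # Arithmetic multiplexer: selects RIGHT or LEFT without an if-statement.
        node = go_right * RIGHT[node] + (1 - go_right) * LEFT[node]
    return LABEL[node]


print(predict([40, 100]))  # left at the root, then left again
print(predict([40, 130]))  # left at the root, then right
print(predict([60, 100]))  # right at the root, straight to a leaf
```

The integer thresholds assume features have already been scaled to integers, which is what a circuit would require anyway.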

Milestone 4️⃣: Training/Proof generation using kNN / Final report

  • Estimated Duration: 2 weeks
  • FTE: 0.5

Deliverables and Specifications

0a. Source code / Documentation - We plan to provide the source code and documentation showing how to perform classification with kNN on the given dataset and make heart failure predictions with it. The code will also contain an evaluation pipeline for checking model accuracy, and will allow one to prove that a prediction was made using the correct circuit.

0b. Final report - We plan to write a final report on the observed models, comparing the following:

  • Number of model parameters (if any)
  • Model accuracy
  • Train complexity (time/memory)
  • Proof generation complexity (time/memory)
  • Performance degradation when converted to ZK circuit
  • Verification complexity (time/memory)

  1. Functionality: Train/Test/Inference pipeline using kNN.
  2. Functionality: Converting kNN to ZK circuits using Circom/EZKL/zkML.
  3. Functionality: Proof generation/Verification pipeline with utilities to check the time/memory complexity.
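For kNN, inference can be done with integer arithmetic only by ranking neighbours on squared Euclidean distance, which avoids the square root and gives the same ordering. The following is a minimal sketch on synthetic integer data; the function names and the choice of `k=3` are assumptions for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    # Squared Euclidean distance preserves neighbour ordering without sqrt.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    # Rank training indices by distance to the query and vote among the k closest.
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: squared_distance(train_X[i], query),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Synthetic toy data with integer features and binary labels.
train_X = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (2, 2)))
print(knn_predict(train_X, train_y, (7, 8)))
```

Unlike the other models, kNN has no trained parameters; the whole training set is part of the computation, so the sort over all distances is likely to dominate the circuit size.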

Additional Information ➕

Plans on converting models to ZK circuits

I plan to first construct each model in PyTorch and try converting it with EZKL. If a required operation is not implemented there, I will look for other conversion methods or write the Circom circuit myself.

Relevant works

NOOMA-42 added the Application Proposal label on Oct 18, 2023
@NOOMA-42
Collaborator

@socathie Would you kindly review this proposal?

@socathie

@saeyoon17 Thank you for your proposal. Your previous work on torch2circom shows that you are a good fit for this project. However, I'm worried that the Iris dataset is too low-dimensional (only 4 features) for the comparison/benchmarking to be meaningful. Hence, may I suggest some possible modifications:

  1. Choose a slightly more complicated dataset, one with more features and a bigger sample size; OR
  2. Focus on less "advanced" ML algorithms that are more suitable for this problem, more comparable in terms of complexity and performance, and less explored in previous ZKML implementations, e.g. decision tree (already proposed), kNN, SVD, LR, etc.

In addition, the deliverables need to be better defined and more detailed. Here is an example from when I did the grant on circomlib-ml and ZKaggle:

Milestone 1 Full-feature circomlib-ml
Deliverables:
0a. Documentation - We will provide both inline documentation of the code and a basic tutorial that explains how a user can (for example) spin up the application.
0b. Testing Guide - The code will have proper unit-test coverage (e.g. 90%) to ensure functionality and robustness. In the guide we will describe how to run these tests.

  1. Functionality: Full strides compatibility in current layers - We will rewrite some current templates in circomlib-ml, e.g. adding strides compatibility to Conv2D, so that they will be fully compatible with current tensorflow standards
  2. Functionality: Flatten - We will write a circom template that will flatten a multidimensional input into a one-dimensional vector.
  3. Functionality: Dropout/Normalization - Dropout (and other regularization layers such as batch normalization) is one of the most common layers used in SOTA neural networks. Adding them will make the library more complete
  4. Functionality: Encrypt/decrypt - ECDH encryption and decryption templates will be added to circomlib-ml to enable encryption of model weights in further applications.
  5. BONUS Functionality: Proof aggregation - We will explore the possibility of aggregating multiple evaluation proofs into one using the recent zkPairing development.
  6. Application - All newly added templates will come together to form a more accurate model on the MNIST dataset than the current one hosted on https://zk-ml.netlify.app/

Of course, given the scope of your proposal, your deliverables will be very different. This is just to give an idea of the level of detail we want. Let me know if you have any questions!

@saeyoon17
Author

@socathie Thanks! I will make sure to revise the proposal soon. :)

@saeyoon17
Author

@socathie Hi Cathie! I edited the proposal. Could you kindly take a look at it?
Tell me if anything else is insufficient. Thank you!

@NOOMA-42
Collaborator

> @socathie Hi Cathie! I edited the proposal. Could you kindly take a look at it? Tell me if anything else is insufficient. Thank you!

Looks good content-wise. I'll follow up on FTE/cost internally and will keep you updated.

@NOOMA-42
Collaborator

NOOMA-42 commented Oct 27, 2023

@saeyoon17
I've removed the pricing rate from the proposal. The pricing rate will be processed internally and will not be revealed to the public.

@adrianmcli
Collaborator

This looks good to me!

NOOMA-42 added the Completed label on Apr 18, 2024
4 participants