Replies: 2 comments
Some first experiments on this in #1246. Not a full implementation, but first end-to-end wiring across all layers:
Could be a useful starting point. Happy to answer questions.
TEE
This proposal outlines the integration of Trusted Execution Environments (TEE) into the Gonka network architecture. By leveraging hardware-level isolation (Intel TDX, AMD SEV-SNP, NVIDIA Confidential Compute), we can protect user payloads and inference data from the physical hosts/miners, enabling true "Confidential MLNodes." This document proposes changes to the node registry, validation pipeline, and economic incentives.
Problem
Currently, user request data sent to the Gonka network is hidden from the public but visible to the specific host (miner) that executes and validates the workload. Even with off-chain payloads, a malicious operator with physical access to the machine can read the data by running custom binaries or modifying the execution environment.
For enterprise and privacy-sensitive adoption, trust in the protocol must supersede trust in the hardware operator.
Proposal
We propose a new node class: the Confidential MLNode. This node operates within a TEE, isolating the full Virtual Machine (VM) from the hypervisor.
Supported CPUs:
Both technologies support attaching NVIDIA GPUs with Confidential Compute:
A VM with an attached GPU can keep all in-memory data, data transferred to the GPU, and data in GPU memory encrypted and inaccessible to the hypervisor.
By carefully modifying how MLNode works with data:
The full inference pipeline can be protected from both the host/miner and the hypervisor (server owner in the case of rented servers).
Using a temporary public key generated by the TEE-protected MLNode enables end-to-end data encryption without any decrypted data existing in unprotected memory.
Architecture Upgrade
MLNode
VM with MLNode is launched from an image whose metadata is saved on-chain (model embedded). Full MLNode behavior is defined by its REST API.
VM generates a key pair for data encryption/decryption
VM provides an attestation certificate signed by the CPU's hardware module. The signed data includes:
Note: If the VM is restarted or the image is replaced, the private key will be lost.
Host publishes this certificate on-chain
Open Question 1: Should the certificate be validated on-chain?
All requests to the Confidential MLNode are encrypted using the public key from the certificate.
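The checks a client or Network Node might run before encrypting to a node's key can be sketched as follows. This is a minimal illustration, not the proposal's actual format: the record fields (`image_measurement`, `report_data`, `signature_valid`) and the rule that the report binds a SHA-256 digest of the public key are assumptions, and verification of the hardware signature itself is treated as an external step.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class AttestationRecord:
    """Hypothetical on-chain record published by the host (field names are assumptions)."""
    image_measurement: str  # hex digest of the launched VM image, as reported by the TEE
    report_data: str        # hex digest the TEE bound into its signed report
    public_key: bytes       # ephemeral encryption key generated inside the VM
    signature_valid: bool   # outcome of vendor signature verification, done elsewhere


def key_fingerprint(public_key: bytes) -> str:
    """SHA-256 digest of the public key, hex encoded."""
    return hashlib.sha256(public_key).hexdigest()


def check_attestation(record: AttestationRecord, approved: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the record passed all checks."""
    problems = []
    if not record.signature_valid:
        problems.append("hardware signature not verified")
    if record.image_measurement not in approved:
        problems.append("image measurement not registered on-chain")
    # Constant-time comparison of the bound digest against the advertised key.
    if not hmac.compare_digest(record.report_data, key_fingerprint(record.public_key)):
        problems.append("public key is not bound to the attestation report")
    return problems
```

Only when the list of problems is empty would a client encrypt its request to `public_key`. A host that swaps in its own key fails the binding check, because it cannot change the digest inside the hardware-signed report.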
Relevant materials:
Framework to simplify TEE deployment:
Insightful paper about NVIDIA Confidential Compute:
Network Node
1. New MLNode Type
Confidential MLNode with a separate pipeline for scheduling inferences. These nodes do not require inference validation.
Confidential MLNodes have an associated attestation certificate and public key.
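A registry entry and scheduling split along these lines might look like the sketch below. The type and field names (`NodeType`, `MLNodeEntry`, `eligible_executors`) are hypothetical, and the strict separation (confidential requests go only to confidential nodes, all other requests only to standard nodes) is one possible reading of "separate pipeline", not something the proposal fixes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    STANDARD = "standard"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class MLNodeEntry:
    """Hypothetical registry entry; confidential nodes carry a certificate and key."""
    node_id: str
    node_type: NodeType
    attestation_cert: Optional[bytes] = None
    public_key: Optional[bytes] = None

    def __post_init__(self):
        # A confidential node without attestation material is rejected at registration.
        if self.node_type is NodeType.CONFIDENTIAL and not (
            self.attestation_cert and self.public_key
        ):
            raise ValueError("confidential node requires certificate and public key")


def eligible_executors(nodes: List[MLNodeEntry], confidential: bool) -> List[MLNodeEntry]:
    """Select executors from the pipeline that matches the request type (assumed rule)."""
    wanted = NodeType.CONFIDENTIAL if confidential else NodeType.STANDARD
    return [n for n in nodes if n.node_type is wanted]


def requires_validation(node: MLNodeEntry) -> bool:
    """Per the proposal, Confidential MLNodes skip inference validation."""
    return node.node_type is not NodeType.CONFIDENTIAL
```

Rejecting incomplete confidential entries at construction keeps the scheduler simple: any node it sees in the confidential pipeline is guaranteed to have a certificate and key to hand to the client.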
2. Certificate Validation
Requirements for certificate validation:
Open Question 2: Should full certificate validation be performed on-chain? Probably yes, at the time the certificate is recorded.
3. Inference Pipeline
Current /chat/completions request flow: https://what-is-gonka.hashnode.dev/decentralized-ai-inference-balancing-security-and-performance
In the current system, any request is scheduled to any executor. For Confidential MLNodes, the workflow changes:
This design enables users to send /chat/completions requests directly to the Confidential MLNode without proxying through multiple Network Nodes.
Possible Pipeline & Payment
Proposed workflow:
Note: Metadata can be sent in batches periodically to optimize on-chain transactions. Unclaimed quota is automatically refunded at the next epoch boundary.
Metadata signed by a TEE key is inherently trusted: the execution environment is pre-defined and verified via attestation. The MLNode cannot produce valid signatures unless it is running in the attested TEE, which guarantees that the correct binary ran.
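The note on batching and refunds can be illustrated with a small settlement sketch. Everything here is an assumption made for illustration: the `UsageRecord` shape, integer costs in the smallest coin unit, deduplication by request ID, and the cap at the reserved quota. Signature verification of each batch against the TEE key is assumed to have happened before this function is called.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical metadata entry signed by the TEE for one served request."""
    request_id: str
    cost: int  # smallest coin unit (assumed)


def settle_epoch(
    reserved_quota: int, batches: Iterable[Iterable[UsageRecord]]
) -> Tuple[int, int]:
    """Return (claimed, refunded) for one user at the epoch boundary.

    Batches are assumed to be signature-checked already. A request ID that
    appears in more than one batch is counted once.
    """
    claimed = 0
    seen = set()
    for batch in batches:
        for record in batch:
            if record.request_id in seen:
                continue
            seen.add(record.request_id)
            claimed += record.cost
    # Never pay out more than the user reserved; the remainder is refunded.
    claimed = min(claimed, reserved_quota)
    return claimed, reserved_quota - claimed
```

For example, with a reserved quota of 100 and two batches that together report costs of 30, 20, and 10 (one record repeated across batches), `settle_epoch` returns `(60, 40)`: 60 goes to the node and 40 is refunded.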
Open Question 3: How to provide redundancy when a Confidential MLNode becomes unavailable?
Economic Incentive
=> eliminating the need to share work rewards with validators
=> eliminating the need to generate and store artifacts for validation
Open Question 4: Since Confidential MLNodes have less validation overhead, should they receive higher bitcoin-style rewards to incentivize enabling TEE?