Skip to content

Latest commit

 

History

History
94 lines (54 loc) · 7.94 KB

RNP-009.md

File metadata and controls

94 lines (54 loc) · 7.94 KB

RNP-009: Prime Intellect Compute Client


RNP # Title Category Author Created Status
009 Prime Intellect Compute Client Core Vincent Weisser and Johannes Hagemann 19-03-2024 Initial Proposal Vote

Overview

This proposal advocates for the onboarding of the Prime Intellect Compute Platform as a new compute client within the Render Network.

The goal of Prime Intellect (https://primeintellect.ai) is to commoditize AI compute and intelligence to advance scientific progress and human flourishing. To achieve this, they aggregate all GPU supply from decentralized & centralized clouds and surpass limitations of traditional computing infrastructure using distributed, low-communication training approaches to offer multi-node training on-demand across more cost-effective providers and spot instances.

On the AI model side, they are crafting a novel & sustainable funding model for open-source AI, making models and data licenses investable with compute.

By integrating with the Render Network, Prime Intellect seeks to maximize utilization rates of idling compute resources of Render Nodes and their revenue by plugging them into our marketplace.

Motivation

The current AI compute market is characterized by high costs, market inefficiencies, and challenges in accessing multi-node instances on-demand. Prime Intellect addresses these issues by aggregating global compute, offering the cheapest and best options and enabling decentralized training across instances with optimized orchestration for efficiency.

Prime Intellect's integration with the Render Network would not only complement existing compute clients by improving utilization rates of Render Nodes, but also push the frontier on novel decentralized training research. This partnership is expected to reduce the entry barriers for AI development, democratize access to powerful compute resources, and foster innovation within the AI community.

Prime Intellect Compute Platform

The Prime Intellect Compute Platform (https://app.primeintellect.ai) is working on the following innovations to create a more efficient compute market:

Demo Video

  • Enabling multi-node instances on demand: Most cloud providers don’t offer large multi-node clusters (beyond 8 interconnected chips) on an on-demand basis. Users seeking access to 16-128+ GPU clusters are often bound to sign lengthy, expensive contracts with providers like CoreWeave or AWS. Prime Intellect aims to integrate novel decentralized training research such as SWARM parallelism or DiLoCo by Google DeepMind into their orchestration solution to enable training across different servers to function as a unified cluster, e.g. connecting separated nodes of 1-8 H100s for training on larger configurations (8-128+ H100s). In summary, they are focused on developing decentralized training solutions that exhibit several advantageous properties compared to current offerings. These include:
    • Enabling multi-node training on-demand,
    • Solutions for orchestration, efficiency optimizations, node failure mitigation, infrastructure and monitoring,
    • Integrating fault tolerance to handle node failures and enabling training on much cheaper spot instances, which can result in cost savings of 2-3x,
    • Flexibility to ramp up or down compute resources during training, allowing for dynamic adjustments in the total compute based on competitive pricing and further reducing costs,
    • Scaling to larger model sizes,
    • Requiring low network bandwidth,
    • Being compatible with varying network latencies, or incorporating optimizations to minimize the network latency,
    • Avoiding slower convergence or instabilities during training.

prime-intellect-decentralized-training

Technical Integration

We suggest creating an independent working group (The Prime Intellect-Render Working Group) based on achieving specific milestones of the following projects.

Prime Intellect-Render Connectors

Implement connectors to seamlessly integrate Render Nodes to the Prime Intellect Compute Platform, minimizing GPU idling time and maximizing profits.

provider-integration-part-1.png

provider-integration-part-3.png

provider-integration-part-2.png

provider-integration-part-4.png

Documentation Overview for Prime Intellect Compute Integration

**Price Optimization **

Develop dynamic pricing models for Render Nodes on the Prime Intellect Compute Platform to adjust prices in real-time based on market demand and supply. This aims to optimize both profit and utilization rates of Render Nodes by setting prices just under the lowest competitor rates, while ensuring they remain economically rational without falling below a certain threshold. Prices are not capped at the upper end, allowing for adjustments based on demand constraints.

**Decentralized AI Model Training **

Advancing the frontier of decentralized AI training on the Render Network. An open-source reproduction of the Distributed Low-Communication Training (DiLoCo) work by Google Deepmind on Render Nodes.

DiLoCo is a novel decentralized training approach that enables the training of language models on islands of devices that are poorly connected, making it an ideal fit for the Render Network Nodes. The method allows for training data parallel on different instances, requiring synchronization of gradients only every 500 steps.

DiLoCo

Strengths:

  • Very low communication needed between instances. Ideal for distributed training on nodes with low internet speeds.
  • Robust to ramping up/down individual workers and having an adaptive amount of total compute

Limitations:

  • DiLoCo has only been tested up to a model size of 400M parameters.

  • DiLoCo requires every instance to have enough GPU memory to be able to hold the parameters, gradients and optimizer states of the model (general data parallelism limitation, except for slower off-loading methods), even though these could be shared for the machines of one instance e.g. via ZeRO-3.

  • DiLoCo is limited to a synchronous setting → Fortunately Google Deepmind kindly did some research for this already that they released on January 18, 2024, extending DiLoCo to the asynchronous case, making it possible to run on heterogenous devices.

    Asynchronous Local-SGD Training for Language Modeling

Although the algorithm and promising experiments results for DiLoCo were published by Google, no open-source reproduction exists to date. We believe these advances shouldn’t be reserved for a few players at large research labs but be open-source, available to the GPU-poor.

The Prime Intellect-Render Working Group is eager to pursue follow-up projects after the successful completion of these initial projects to further strengthen the Render ecosystem. Future projects could include the integration of spot instances, as the Prime Intellect decentralized training stack already manages this, or enabling Render Node Providers to contribute their compute power to the most promising AI models in exchange for ownership and governance in the models (AI IP-NFTs).

Prime Intellect will start as soon as possible with the goal towards launching within a quarter.