In [1]:
import numpy as np
import pandas as pd

/kaggle/input/2023-kaggle-ai-report/sample_submission.csv
/kaggle/input/2023-kaggle-ai-report/arxiv_metadata_20230510.json
/kaggle/input/2023-kaggle-ai-report/kaggle_writeups_20230510.csv


# Working Title - "State of AI: Hardware"

# Introduction

Recent years have seen unprecedented acceleration in artificial intelligence (AI) system hardware capabilities, outpacing traditional benchmarks like Moore's Law, which predicts a two-year transistor doubling on a circuit. Epic Games CEO Tim Sweeney recently said "Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential" (Sweeney, 2023).

Despite a slowed transistor doubling rate, hardware accelerators' performance growth significantly outstrips Moore's two-year benchmark (Moore, 2022). This essay will explore key breakthroughs, driving forces, and performance measures of AI-related acceleration hardware in the last two years, along with its implications for the future of AI. While impossible to detail every development, this essay strives to provide a comprehensive picture of the most impactful AI hardware between 2021 and present day.

# Deep Learning Hardware Revolution

Coined in 1956 by scientists including John McCarthy and Marvin Minsky, "artificial intelligence" ignited an investment wave that led to an AI bubble. Central processing unit (CPU) architectures could not keep up with demand from early AI algorithms. When the bubble burst in the early 1980s, AI development regressed into the "AI Winter" (Jotrin Electronics, 2022).

Early computer hardware development focused on Central Processing Units (CPUs) guided by Moore's Law, predicting steady power growth. However, the computational demands of early AI quickly outpaced CPU capabilities. The Graphics Processing Unit (GPU), introduced by Nvidia in 1999, enabled parallel computations on matrices, promoting faster AI model training.

The true potential of GPUs for AI was first recognized in 2012 when a neural network model known as AlexNet, trained using Nvidia's GPUs, won the ImageNet competition. This achievement sparked widespread acceptance of GPUs for AI (Amodei & Hernandez, 2018). Since then, specialized hardware accelerators meant for large amount of mathematical computations such as GPUs, Tensor Processing Units (TPUs), Neural Processing Units (NPUs), and Intelligence Processing Units (IPUs) have become essential for efficient AI model training and deployment. Further innovations in the field, including AI-assisted chip design, started gaining momentum around 2018. These led to the development of frameworks like Google's chip design optimization platform (Goldie & Mirhoseini, 2020) and Synopsys' DSO.ai, an AI application for autonomous chip design (Synopsys, 2023).

The ongoing global chip shortage, triggered by the COVID-19 pandemic, accentuates the need for semiconductor production innovations. The demand surge for consumer electronics has prioritized smarter semiconductor manufacturing, leading to significant AI hardware advancements between 2021 and 2023 (Appenzeller, Bornstein, & Casado, 2023).

# Measuring AI Hardware Progress

Before examining hardware advancements, it is essential to establish some common performance measurement terminology. CPU power is typically assessed by transistor count, size of transistors in nanometers (nm) or clock speed in terahertz (THz). Accelerators like GPUs measure computational capacity, often termed "bulk compute", in floating point operations per second (FLOPS). This unit gauges the calculations a chip can perform in a second, with variants like TeraFLOPS (trillions of FLOPS), PetaFLOPS (a quadrillion FLOPS), and ExaFLOPS (quintillion FLOPS). Also, energy efficiency (FLOPS per watt), total energy used (watts), and data transfer capability (measured in MB, TB, or PB per second) are important accelerator performance indicators.

Using only CPU power or transistor count to compare hardware overlooks recent AI compute progress. For instance, Apple's NPU or "neural engine" has seen over 100% annual improvements in bulk compute, surpassing CPU and GPU growth. Figure 1 shows the historical processing power of Apple's iPhone chips, emphasizing the NPU's distinct growth trend (Vellante & Floyer, 2021).

<div align="center" style="background-color: #333; padding: 20px;">
      <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-1.jpg" width="500">
</div>

<div align="center">

**Figure 1.** Apple iPhone A-series chip growth in operations per second capability. From "A new era of innovation: Moore’s Law is not dead and AI is ready to explode", by J. Vellante & D. Floyer, 2021.

</div>


OpenAI identified three components reflecting AI progress in a 2018 study: algorithmic innovation, data, and training compute, with operations per second as a crucial measure surpassing clock speed's relevance (Amodei & Hernandez, 2018).

Moore's Law, correlating with the transistor count growth trend and ML advancement, shows signs of strain due to tech innovations. As the 2020s began, the tech community questioned Moore's Law's relevance, with MIT Technology Review's "We're not prepared for the end of Moore's Law" (Hoffman, 2020) casting doubts. However, some suggested a broader Moore's Law definition and more comprehensive metrics for computing advancements. Figure 2a shows the distinction of growth prior to and after 2012.

Figure 2b shows a closer look at AI models' computational demand surpassing Moore's Law since 2012. AI training compute soared by over 300,000 times with a 3.4-month doubling time—seven-fold if following Moore's Law strictly (Amodei & Hernandez, 2018). This indicates we are in a new AI compute evolution era, aligning with Sweeney's 'stacked exponential' concept.

<div align="center">
    <div style="display: flex; justify-content: space-around; background-color: #333; padding: 20px;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F86778ff808061547b22637c2437454ef%2Fai-and-compute-all.png?generation=1687738766744537&alt=media" style="width: 50%; margin-right: 10px;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F9574381efab68a160ffbfb6297e69b83%2Fai-and-compute-modern-log.png?generation=1687738839069138&alt=media" style="width: 50%; margin-left: 10px;">
    </div>
</div>


<div align="center">

**Figure 2.** (a) Two distinct eras of compute (b) Closer look at the deep learning revolution since 2012. From "AI and compute," by D. Amodei & D. Hernandez, 2018.


</div>

# AI Hardware Advancements

The following sections summarize key hardware advancements in AI that have taken place during the years 2021, 2022, and 2023. Key improvements are summarized and more technical details are included in summary tables. While difficult to pinpoint exactly when each hardware advancement was made, the primary focus is to highlight progress according to when the majority of sources acknowledge the new technology and have data to understaind it in more detail. Some companies are more transparent about chip capacities than others, often leading to more limited information on the specifics of each piece of hardware. 

## 2021

Investment in AI hardware development significantly increased in 2021, with capital invested globally almost doubling to \$68 billion (Sharma, 2021). Precedence Research valued the 2021 AI hardware market at \$10 billion, projecting growth to \$90 billion by 2030 (Precedence Research, 2022). Processors, rather than storage or network devices, were most in demand. However, network and storage devices increasingly limit performance as processing power exceeds expectations (Vellante & Floyer, 2021).

<div align="center" style="background-color: #333; padding: 20px;">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Size-2021-to-2030.jpg" style="width: 30%; margin-right: 1%;"/>
        <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Share-By-Type-2021.jpg" style="width: 30%; margin-left: 1%; margin-right: 1%;"/>
        <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-3.jpg" style="width: 30%; margin-left: 1%;"/>
    </div>
</div>


<div align="center">

**Figure 3.** AI Hardware Market Behavior. (a) AI Hardware Market Value. Adapted from "Artificial Intelligence (AI) in Hardware Market," by Precedence Research, 2022. (b) Processor AI Hardware Market Share. Adapted from "Artificial Intelligence (AI) in Hardware Market," by Precedence Research, 2022. (c) Growth In Processing Speed vs network and storage devices. From "Google is using machine learning to design its next generation of machine learning chips," by J. Vincent, 2021.


</div>

### OpenAI
OpenAI launched DALL-E in January 2021, a multimodal AI system generating images from text. Although not hardware, its computational implications are vast, combining computer vision and natural language processing—two resource-intensive AI fields. Training and scaling models like DALL-E and GPT-3 require considerable hardware resources, emphasizing processing power, networking capabilities, and high-speed storage. While DALL-E's exact hardware setup remains undisclosed, replicating it on smaller scales reveals the complexity of hardware setup (Cuenca, 2023).

### Graphcore
In early 2021, Graphcore, a U.K. based chip manufacturer announced its second generation of the Colossus intelligence processing unit (IPU), GC200 Colossus MK2. Each GC200 IPU has 59 billion transistors, 1,472 independent programmable cores, and 250 TeraFLOPS. This level of parallel processing capability was designed to effectively handle the sparsity and irregularity of machine learning workloads, providing a different approach to the more common Nvidia's GPUs (Doherty, 2021).

### Cerebras Systems
In April, Cerebras Systems, a renowned AI chip startup, relased the WSE-2 the largest chip ever built. This chip boasted 850,000 cores and 2.6 trillion transistors. This capacity more than doubles that of its predecessor, the WSE-1 (Dilmengani, 2023). Cerebras also announced the "world's first brain-scale" AI solution. This term stems from the estimate that the human brain has an order of 100 trillion synapses and prior to this existing AI clusters could match about 1% of this. The CS-2 accelerator, the size of a smaller refrigerator, to support models of over 120 trillion parameters in size but is comprised of only one WSE-2 chip (Business Wire, 2021).

### Nvidia 
The Nvidia Grace CPU was also announced in April, the company's first data center CPU, was designed to address the computational requirements of advanced applications such as natural language processing, recommender systems, and AI supercomputing that analyze large datasets. Grace combined energy-efficient Arm CPU cores with a unique low-power memory subsystem to deliver high performance with remarkable efficiency. This Arm-based processor aimed to provide a ten-fold performance increase for systems training large AI models, compared to leading servers at the time. Notably, the Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy’s Los Alamos National Laboratory have plans to build Grace-powered supercomputers to support scientific research efforts (Nvidia, 2021).

### Google
In May, Google announced the introduction of their fourth-generation TPUs, for AI and ML workloads. TPUs, designed specifically to optimize AI computation, stood as Google's response to the rising dominance of GPUs. Google further documented the performance gains of the TPU v4 this year, offering a staggering 10x increase in ML system performance compared to its predecessor, TPU v3. With innovative interconnect technologies and domain-specific accelerators, the TPU v4 not only amplifies performance, but it also champions energy efficiency. Notably, the TPU v4 is tailored for LLMs such as LaMDA, MUM, and PaLM, with the PaLM model delivering 57.8% of peak hardware floating-point performance over 50 days of training on the TPU v4 (Jouppi & Patterson, 2022). The following month Google published a paper in Nature detailing their approach to using AI for the floor planning stage of chip design (Mirhoseini et al., 2021). This paper was the formalization of their 2020 blog post about AI powered chip design and made the findings more transparent. They also revealed that their fourth generation TPU, released just one month earlier, was designed using this new deep reinforcement learning technique (Vincent, 2021).

### NERSC, HPE, Nvidia and AMD
Another major announcement came from Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC), the Perlmutter supercomputer, built by HPE in collaboration with Nvidia and AMD features around 6,159 Nvidia A100 GPUs and roughly 1,500 AMD Milan CPUs, collectively providing an impressive 3.8 exaflops of theoretical "AI performance". It has since been instrumental in mapping the visible universe spanning 11 billion light years by processing data from the Dark Energy Spectroscopic Instrument (DESI), with early benchmarking revealing up to 20X performance speedups using the GPUs, thus reducing computational timeframes from weeks or months to merely hours (HPC Wire, 2021).

### Mythic
In June, Mythic announced the M1076 25 trillion operations per second (TOPS) AI processor which is capable of storing up to 80 million weighted parameters which means that it can run complex AI models without the need for external memory (Mitchell, 2021). The M1076 required ten times less power than a conventional system-on-chip or GPU. This introduction marked a shift towards creating more energy-efficient hardware solutions for AI, an important consideration as energy costs and environmental impacts become more of a concern (Sharma, 2021). 

### Cadence
Following the excitement of AI powered chip design in June, July brought the release of the Cerebrus platform from Cadence. The Cerebrus Intelligent Chip Explorer tool leverages ML to enhance the process of chip design, making engineers remarkably more productive. The introduction of ML has added an additional layer of automation to the design process, resulting in up to 10 times improved productivity per engineer and yielding a 20% enhancement in power, performance, and chip area (Takahashi, 2021).

### SambaNova Systems
In August SambaNova Systems, another popular chip startup, announced a unique Dataflow architecture, a high-performance and high-accuracy hardware-software system designed for AI applications​. The Dataflow architecture is powered by its innovative Cardinal SN10 reconfigureable data unit chip boasting an incredible 300 TeraFLOPS and up to 150 TB/s on-chip memory bandwidth. These high speed compute capabilities are particularly relevant in the context of machine learning and AI (Kennedy, 2021).

### Tenstorrent
Canadian based startup Tenstorrent released their flagship Grayskull processor into production in late 2020. It wasn't until Hot Chips 33 in August that it got its true debut. The Grayskull is referred to as an "all-in-one" computer system. In preliminary experiments, the system hit 368 TOPs and had been observed processing up to 23,345 sentences per second using Google’s BERT-Base language model for the SQuAD 1.1 data set, giving it a 26 times performance advantage over existing solutions (Wiggers, 2020).

### Apple
In October, Apple further upgraded the M1 series chips—released just a year prior and hailed as Apple's most powerful chips. The M1 Pro and M1 Max chips featured a standard but enhanced 16-core NPU for better on-device ML, showcasing Apple's focus on integrating ML in their products ("Introducing M1 Pro and M1 Max," 2021). Although not a traditional AI pioneer, Apple is emphasizing Edge AI, deploying AI applications on physical devices rather than relying on centralized cloud servers.

### AWS
Amazon Web Services (AWS) continued hardware announcements during their annual re:Invent conference in November 2021. Among these was the third generation of their Graviton hardware, Graviton3. This latest version is three times faster for ML workloads and up to a 60% energy saving compared to other leading hardware, giving it the "best price-performance ratio in Amazon EC2" (Nikita, 2022).

### Groq
Although not technically released in 2021, the startup called Groq, founded by a former Google engineer in 2016, recieved a lot of attention in late 2021 when it announced that their flagship GroqChip or Groq tensor streaming processor was used for COVID drug discovery at Argonne National Laboratory and showed a 333x speed improvement compared to legacy GPUs at the time (Westfall, 2021). This chip is highly specialized for ML workloads and not much else, capable of 1 PetaOp/s performance on a single chip implementation (Garanhel, 2022).

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Table 1. Hardware Summary 2021</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>DALL-E</td>
                <td>OpenAI</td>
                <td>End-to-end text-to-image generative model at scale, 12B paramaeter version of GPT-3</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>GC200 IPU</td>
                <td>Graphcore</td>
                <td>2nd Gen, 59 Bn transistors, 250 TeraFLOPS (double previous MK1), 62 TB/s, 0.57 TFLOP/W</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Wafer Scale Engine 2 (WSE-2)</td>
                <td>Cerebras Systems</td>
                <td>2.6 trillion 7nm transistors, 850,000 cores, 20 PB/s, 1 PetaFLOP, largest chip ever built</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>CS-2 'brain-scale' Accelerator</td>
                <td>Cerebras Systems</td>
                <td>Supports models up to 120 trillion parameters, contains 1 WSE-2, 13.5 million AI-optimized cores, 500kW</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Grace CPU</td>
                <td>Nvidia</td>
                <td>4nm transistors, 144 cores, up to 1 TB/s, 7.1 TeraFLOP, 10x model training speed </td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Perlmutter Supercomputer</td>
                <td>HPE, Nvidia, AMD</td>
                <td>Supercomputer with 6,159 Nvidia A100 GPUs and 1,500 AMD Milan CPUs, 180 PetaOPs standard, 4 ExaFlops AI</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>TPU v4</td>
                <td>Google</td>
                <td>TPU with 275 TeraFLOPS, 1.1 PB/s, 2 TOPS/W (3x FLOP/W vs v3), designed for Google's data centers, AI aided design</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>M1076</td>
                <td>Mythic</td>
                <td>25 TeraOPS, up to 80 million weighted parameters, 10X less power than conventional GPU</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>AI Powered Chip Design</td>
                <td>Google</td>
                <td>AI for floor planning stage of chip design using reinforcement learning</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Cerebrus Platform</td>
                <td>Cadence</td>
                <td>Platform for AI assisted chip design</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Cardinal SN10</td>
                <td>SambaNova Systems</td>
                <td>300 TeraFLOPS and up to 150 TB/s memory</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Grayskull</td>
                <td>Tenstorrent</td>
                <td>368 TOPS, 23k+ sentences per second with BERT, 75/150/300W versions, 384 GB/s</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>M1 Pro/Max Chip</td>
                <td>Apple</td>
                <td>16-core Neural engine optimized for Edge AI/ML acceleration, 11 TOPS</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Graviton3</td>
                <td>AWS</td>
                <td>60% energy savings, 3x faster for ML workloads</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>GroqChip/Groq TSP</td>
                <td>Groq</td>
                <td>1 core, 1 PetaOp/s, 333x speed improvement over leading GPUs</td>
            </tr>
        </tbody>
    </table>
</div>


<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.cerebras.net/wp-content/uploads/2022/03/Chip-comparison-01.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.eetasia.com/wp-content/uploads/sites/2/2021/08/Cerebras-CS-2.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://cdn.aitimes.com/news/photo/202210/147269_155272_111.png" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.servethehome.com/wp-content/uploads/2021/08/SambaNova-SN10-RDU-Cover.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.servethehome.com/wp-content/uploads/2022/05/AWS-Graviton3-Processor-Cover.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.apple.com/newsroom/images/product/mac/standard/Apple_M1-Pro-M1-Max_M1-Family_10182021_big_carousel.jpg.slideshow-xlarge_2x.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.techspot.com/images2/news/bigimage/2021/05/2021-05-20-image-16.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://qtxasset.com/quartz/qcloud4/media/image/fierceelectronics/1573766532/groq%20tsp%20chip.jpg?VersionId=CP3F6kc9YRrRqAuppFoRisrvVh0KERYb" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.eetasia.com/wp-content/uploads/sites/2/2021/06/Mythic-M.2-AE.jpg" style="width: 10%; margin-left: 1%; margin-right: 1%;"/>        
    </div>
</div>

***Note:** Images of processors used throughout are sourced from their respective company websites and news outlets.*


## 2022

### Nvidia
Nvidia announced the release of their new DGX Station, DGX-1, and DGX-2 built on state-of-the-art Volta GPU architecture (Gupta, 2022). The system includes the DGX A100, which is a single server system featuring multiple A100 GPUs. Since 2020, the A100 has served as the industry standard for GPU hardware which often prompts comparison when new hardware is released. Nvidia also announced the release of the H100 data center GPU, built with the new Hopper architecture with scalable speed for lower numerical accuracy. All of these components are specifically designed for deep learning training, accelerated analytics, and inference (Fu, 2022). Just one year after Google made their research and methods for incorporating AI into chip design, Nvidia announced their own incorporation of AI called 'PrefixRL'. Similar methods of reinforcement learning were incorporated into their new Hopper architecture resulting in circuits 25% smaller than those designed by humans with standard EDA tools (Roy, Raiman, & Godil, 2023). Around the same time, an internal struggle emerged at Google questioning the accuracy of findings in their original paper published in 2021 (Dave, 2022). In late 2022 Nvidia announced the new GeoForce 40 series GPU, Nvidia's more consumer centric hardware of choice.

### Intel
Intel’s Habana Labs released the second generation of their deep learning processors for training and inference — Habana Gaudi2. Initial results show quicker trianing times for BERT in MLPerf benchmarks (Gupta, 2022).

### Apple
In March and June, Apple also made significant strides in their hardware capabilities, unveiling the M1 Ultra and M2 chip, both next-generation enhancements of their breakthrough M1 chip. The M1 Ultra doubled the number of previous of neural engine cores from 16 to 32 ("Apple unveils M1 Ultra," 2022). The new mac standard neural engine in M2 is around 40% faster than the prior year ("Apple unveils M2," 2022). These advancements showed Apples continued innovation surrounding on-device machine learning.

### IBM & Tokyo Electron
In July, IBM and Tokyo Electron made strides in 3D chip stacking by addressing the limitations posed by Moore's law. Silicon carrier wafers, a significant obstacle in 3D chip manufacturing, were at the core of their challenges. The advancements they've introduced are designed to optimize the production process, with the added advantage of potentially alleviating the global chip shortage (Peckham, 2022).

### Cadence
In June, Cadence announced that customer adoption of the Cerebrus platform is growing and shared that a company called MediaTek has integraded Cerebrus into production. MediaTek powers more than two billion connected devices around the world [source]. Cadence also launched JedAI, a platform that integrates all the different AI chip design products [source].

### Untether
In August, Untether AI introduced a device codenamed 'Boqueria', also known as SpeedAI240. This device is designed to enhance energy efficiency and density, allowing scalability for devices of different sizes, a feature that proves useful when working with language models of varying parameter sizes (Burt, 2022).

### Cerebras
Meanwhile, Cerebras Systems launched their AI supercomputer, Andromeda, aiming to accelerate academic and commercial research. Both these major advancements were reported by Gupta (2022).

### Tesla
On AI Day in September, Tesla revealed its powerful Dojo chip, designed for faster training and inference in self-driving cars. The chip is apparently so powerful that it "tripped the power grid in Palo Alto"[source]. Tesla claims that one Dojo chip can replace 6 GPUs. The Dojo is expected to increase Tesla’s ability to train neural nets using video data, which is in high demand in the self-driving car initiative at Tesla [source].

### SambaNova
Additionally, SambaNova Systems announced the shipping of the second generation of the DataScale system—SN30. The system, powered by the Cardinal SN30 chip, is built for large models with more than 100 billion (100B) parameters and capable of handling both 2D and 3D images (Fu, 2022).

### AWS
In October, AWS announced the general availability of Amazon EC2 Trn1 instances, which are powered by AWS Trainium chips, build specifically for training high-performance ML applications in the cloud. Trn1 instances claim up to a 50% cost savings over comparable GPU-based EC2 instances for training large language models (Amazon Web Services, 2022). A month later, at AWS re:Invent 2022, Amazon made the EC2 Inf2 powered by the AWS Inferentia2 generally available. This machine learning accelerator, optimized for inference, offers larger compute density enabling lower cost per query. Inf2 also boasts the ability to deploy a 175B parameter model, such as GPT-3, in a single server. These two chip architectures represent a shift from using general purpose hardware to using hardware custom built to the specific phase of the system in order to lower task-specific compute costs (Liu, 2022).

### IBM
November marks the launch of IBM's AIU chip. This unique chip confines GPU floating point operations to 8-bit, less precise than even a CPU with higher compute density and similar model accuracy. Furthermore, it demands less power and memory compared to other top-tier GPUs with identical measures. The primary advantage lies in its affordability. Priced around \$1,000 per chip, a set of ten AIUs could rival the performance of the Nvidia H100 GPU, which ranges between \$20,000 to \$30,000. IBM believes this platform will make model training 1000 times faster by 2030 (Morgan, 2022).

### OpenAI
OpenAI's November 2022 release of ChatGPT underscored the key role of hardware in AI performance. Training the GPT-3 on 10,000 Nvidia A100 GPUs facilitated managing the 175B parameters and 45-terabyte dataset in a reasonable timeframe. Moreover, real-time inference demands powerful servers for model hosting and efficient hardware for quick request processing. It's estimated OpenAI uses around 3,617 HGX A100 servers, or 28,936 single A100 GPUs, daily to serve ChatGPT. Despite high upfront costs, this large-scale infrastructure enables a low per query cost of 0.36 cents (Patel & Ahmad, 2023). Thus, ChatGPT exemplifies a successful AI system, leveraging specialized hardware at each production stage.

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Table 2. Hardware Summary 2022</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>DGX Station, DGX-A100</td>
                <td>Nvidia</td>
                <td>8 GPUs, Each with 312 TeraFLOPS, 54 Bn transistors, GPU memory of 640 GB.</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>H100 data center GPU</td>
                <td>Nvidia</td>
                <td>80 Bn 4nm transistors, 60 TeraFLOPS (64-bit), 3 TB/s, 75% more power consumption. Designed with PrefixRL.</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>GeoForce 40 Series GPU</td>
                <td>Nvidia</td>
                <td>Largest 76.3 Bn 4nm transistors, 60 TeraFLOPS (64-bit), 1 TB/s, 330.3 Tensor TeraFLOPS, 450W.</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Habana Gaudi2</td>
                <td>Intel</td>
                <td>24 tensor cores, built with 7nm technology, 2.45 TB/s, beats Nvidia A100 in MLPerf, 600W</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>IBM z16</td>
                <td>IBM</td>
                <td>First Telum based system, 19 miles of wire, 22B 7nm transistors, 6 TeraFLOPS, </td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>M1 Ultra, M2</td>
                <td>Apple</td>
                <td>M1 Ultra - 114 Bn transistors, 32-cores, 22 TeraOPS, M2 - 20 Bn 5nm transistors, 40% faster standard NPU, 15.8 TeraOPS</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>3D Breakthroughs</td>
                <td>IBM/T.E.</td>
                <td>3D chip enabled silicon carrier wafers</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>MediaTek Integration, JedAI</td>
                <td>Cadence</td>
                <td>10x increased chip engineering productivity, 60% improvement in timing</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Boqueria, speedAI240</td>
                <td>Untether</td>
                <td>2 PFLOPs (8-bit), 30 TereFLOPS (64-bit), 30 TeraFLOPS/W, 1456 cores, 7nm transistor size</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Dojo Tile, D1 Chip</td>
                <td>Tesla</td>
                <td>25 chips/tile, 8,850 cores, 9 PFLOPs, 36 TB/s, 15 kW</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Andromeda Supercomputer</td>
                <td>Cerebras Systems</td>
                <td>Combines 16 Cerebras CS-2 systems for academic and commercial research, 1 ExaOPs</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Cardinal SN30</td>
                <td>SambaNova Systems</td>
                <td>86 billion transistors, 688 TeraFLOPS (16-bit), 2orks with model up to 100B parameters, 12.8x memory of Nvidia A100</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>EC2 Trn1, EC2 Inf2</td>
                <td>AWS</td>
                <td>Cloud Trn1 clusters - up to 3 PetaOPS, Inferentia clusters - up to 2.3 PetaOps, 10x lower latency</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>AIU</td>
                <td>IBM</td>
                <td>32 cores, 23 Bn 5nm transistors, 205 TeraFLOPS (8-bit operations), 820 TeraFLOPS (4-bit), low cost</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>ChatGPT/GPT-3 System</td>
                <td>OpenAI</td>
                <td>Trained with 10,000 Nvidia GPUs, inferenced with 28,936 Nvidia GPUs</td>
            </tr>
        </tbody>
    </table>
</div>


<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.nextplatform.com/wp-content/uploads/2020/05/nvidia-ampere-dgx-a100-exploded.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.cnet.com/a/img/resize/0dca15c0f6cac83ab4d6ba565259760d2535ff23/hub/2022/05/04/51dfeda0-9050-4f87-8552-b18b1462a501/20220429-nvidia-h100-hopper-ai-gpu-01.jpg?auto=webp&width=1200" style="width: 10%; margin-left: 1%;"/>
        <img src="https://habana.ai/wp-content/uploads/2021/06/habana-card.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.mbsdirect.com/media/k2/items/cache/e6f7eb86455f00eeac074fd25095dcde_XL.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://i.extremetech.com/imagery/content-types/043o5ZJEHHKLUGOD8uFI8oH/hero-image.fill.size_1200x675.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://images.squarespace-cdn.com/content/v1/5f8756dc4930e23c50399ae9/fd53bb14-ec85-4cd8-b2e6-8853fe89a45c/speedAI-7P6A9757-Edit+hires-bkg.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://assets.bizclikmedia.net/1800/abed4faa483941543aa38afe5245086a:bcb3d7f73d6bcbe0dd5efea7c72cddc4/andromeda-doors-closed.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://imageio.forbes.com/specials-images/imageserve/633eb8c2e2a31b5c346461d8/0x0.jpg?format=jpg&width=1200" style="width: 10%; margin-left: 1%;"/>
        <img src="https://external-preview.redd.it/F8Gu4-KjnA8PGvIZZGPrHpucpSMXDW66pX-IiQiyeEw.jpg?width=640&crop=smart&auto=webp&s=829463ecf13a1809dd2a07bd7444da1ef20ba091" style="width: 10%; margin-left: 1%; margin-right: 1%;"/>        
    </div>
</div>

***Note:** Images of processors used throughout are sourced from their respective company websites and news outlets.*

## 2023

### Shift to Software
The 2023 developer conferences from Google, Apple, and Microsoft gave us a preview of the types of advancements we are going to see in the second half of the year and beyond. Google I/O, Microsoft Build, and Apple WWDC each highlighted primarily software advancments that indicate the companies priorites. Microsoft is fully embracing and participating in the AI race with new virtual assistants, product features, Azure Supercomputer, and open LLMs just to name a few. Many were suprised at the lack of a TPUv5 announcement at Google I/O given that it has been acknowledged to exist back in 2021 (). The emphasis across the board seems to be around enhancing existing products with AI without any major hardware leaps disclosed.

### LLMs
The late 2022 release of OpenAI's ChatGPT has sparked a surge in AI advancements over the following six months, primarily fueled by an increased demand for advanced GPU hardware. The competitive landscape, featuring key players like Google with their Bard powered by PaLM 2, Microsoft's Bing AI, and Meta's LLaMA, has been driven by the development of large language models (LLMs). Latest AI hardware's power and efficiency is crucial for training large language models (LLMs), broadening their practical uses. However, hardware is just one facet of product development. Choices about parameter and data size critically shape the design, affecting everything from hardware requirements to training duration and model performance. For instance, the implications of choosing 175B parameters for OpenAI's GPT-3, 1.8 trillion for GPT-4, versus LLaMA's 65B parameters are considerable.

### Apple
Apple, a bit more withdrawn from the hype surrounding AI, did not once mention the term "artificial intelligence" at WWDC (Greenburg, 2023). Instead they unveiled numerous software improvements for ML across the device ecosystem along with the upgraded M2 Ultra's 32-core neural engine touted as 40% faster than the prior year 32-core model with 31.6 TOPs. ("Apple introduces M2 Ultra," 2023).

### Nvidia
Nvidia's Grace CPU Superchips, now integrated into the UK-based Isambard 3 supercomputer, provides superior speed and memory bandwidth with its 55,000+ core count. These benefits stem from its incorporation of Arm Neoverse V2 cores (Kennedy, 2023). Nvidia's DGX GH200 supercomputer is powered by a combination of the new Grace Hopper 200 GPU and the Grace CPU, with features like 528 GPU tensor cores, 4 TB/s memory bandwidth, and 144 TB of shared memory, marking an almost 500-fold increase compared to the previous generation DGX A100. A cluster of 256 such configurations offers a staggering 1 ExaFLOP (Edwards, 2023).

### Intel
Intel, with its Meteor Lake chips, embedded Vision Processing Units (VPUs) across all variants, thereby offloading AI processing tasks from the CPU and GPU to the VPU. This move resulted in increased power efficiency and ability to handle complex AI models, providing benefits for power-hungry applications such as Adobe suite, Microsoft Teams, and Unreal Engine (Roach, 2023). Intel also introduced the 4th generation Xeon processors with 10x speed improvement for Pytorch training/inference. The update Xeon series offers optimized models for high-performance, low-latency networks and edge workloads (Smith, 2023).

### AMD
AMD introduced an AI chip called MI300X, described as "the world's most advanced accelerator for generative AI". This introduction is expected to compete head-on with Nvidia's AI chips and generate interest from major cloud providers. Simultaneously, AMD initiated high-volume shipping of a general-purpose central processor chip named "Bergamo", adopted by Meta Platforms and others for their computing infrastructure (Mohan, 2023).

### Meta
Meta made its foray into AI hardware by unveiling its first custom-designed chips, the Meta Training and Inference Accelerator (MTIA) and the Meta Scalable Video Processor (MSVP). These chips, optimized for deep learning and video processing, underpin Meta's plans for a next-gen data center optimized for AI, illustrating its dedication to crafting a fully integrated AI ecosystem (Khare, 2023).

### Groq
Groq gained recent attention by claiming that it had created a process to move Meta's LLaMA from Nvidia chips over to its own hardware signaling a potential threat to Nvidia's 90% GPU market share. The complexity of the current AI hardware makes it a tedious task to adapt model architectures to run quickly on new setups (Lee & Nellis, 2023). 

### AI Assisted Chip Design
AI assisted chip design has resurfaced as a more popular avenue for manufacturers. Although there haven't been any significant announcements made this year with regard to AI designed chips, there is a shifting sentiment that this movement is going more mainstream due to the increase in customer contracts being reported by Synopsys and Cadence. (Ward-Foxton, 2023).

### TBD
While these advancements in 2023 are indeed significant, it's important to note that the main AI Hardware conferences such as AI Hardware Summit, AWS Re:Invent, Hot Chips, and other popular conferences are yet to occur. At the time of writing we don't have the full picture of all the developments in the field for 2023. As such, the information about current state of AI hardware in 2023 is still very limited.

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Table 3. Hardware Summary 2023</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Interactive LLMs and Assistants</td>
                <td>Google (BARD), Microsoft (Bing AI)</td>
                <td>Large-scale models designed for interactive and responsive tasks, leveraging the power and efficiency of the latest AI acceleration hardware</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>M2 Ultra</td>
                <td>Apple</td>
                <td>32-core neural engine, 31.6 TOPS</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Nvidia Grace CPU Superchips</td>
                <td>Nvidia</td>
                <td>384 Arm-based Nvidia Grace CPU Superchips, >55,000 cores, FP64 performance, <270 kW power consumption, Arm Neoverse V2 cores</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Nvidia DGX GH200</td>
                <td>Nvidia</td>
                <td>384 Arm-based Nvidia Grace CPU Superchips, >55,000 cores, FP64 performance, <270 kW power consumption, Arm Neoverse V2 cores</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Meteor Lake chips with Vision Processing Units (VPUs)</td>
                <td>Intel</td>
                <td>Embedded VPUs in all chips for increased power efficiency and the ability to handle complex AI models</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Xeon v4</td>
                <td>Intel</td>
                <td>10x speed improvement for Pytorch models. Specialized models for networking optimization</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>MI300X AI Chip and Bergamo Processor</td>
                <td>AMD</td>
                <td>Introduced the MI300X, the world's most advanced accelerator for generative AI, and started high-volume shipping of the Bergamo central processor chip</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Meta Training and Inference Accelerator (MTIA) and Meta Scalable Video Processor (MSVP)</td>
                <td>Meta</td>
                <td>Unveiled custom-designed AI chips optimized for deep learning and video processing and discussed plans for a next-gen data center optimized for AI</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>GPU Model Migration</td>
                <td>Groq</td>
                <td>Capable of migrating Meta's LLaMA off of Nvidia GPUs quicker</td>
            </tr>
        </tbody>
    </table>
</div>

<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://cdn.arstechnica.net/wp-content/uploads/2023/06/gh200_grace_hopper_chip-800x450.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/grace-cpu/grace-cpu-superchip-2c50-p@2x.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.notebookcheck.net/fileadmin/Notebooks/News/_nc3/M2_specifications_speculation_M2_Ultra_M2_Max_M2_Pro_drdNBC.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://sg-computers.com/images/stories/virtuemart/product/E5-2600V466.png" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.hardwaretimes.com/wp-content/uploads/2023/01/Intel-13th-Gen-U-series-CloseUp-6-scaled.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://images.anandtech.com/doci/18915/AMD-Instinct-MI300X-Xray_678x452.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.zdnet.com/a/img/resize/2f1b6abf86bbe7c1e5af80cc2c259147eed61779/2023/05/18/5ac07d2d-0d5f-4afd-b10f-fdf0f7e0f6f6/mtia-die-photo-copy.jpg?auto=webp&width=1280" style="width: 10%; margin-left: 1%;"/>
        <img src="https://images.prismic.io/encord/967316f7-8f58-4487-9447-406692c16593_image2.png?auto=compress,format" style="width: 10%; margin-left: 1%;"/>    
    </div>
</div>

***Note:** Images of processors used throughout are sourced from their respective company websites and news outlets.*

## MLPerf: A Closer Look

In response to the need for a relevant standard for measuring hardware performance across the growing landscape of deep learning processors since 2012, researchers from Baidu, Google, Harvard, Stanford, and Berkeley developed Machine Learning Performance (MLPerf) in 2018. MLPerf is a set of industry-standard metrics that evaluates the speed of ML software and hardware. It specifically assesses training and inference performance, scalability, and power performance, tailoring these assessments to the requirements of each specific model or task (Bunting, 2023).

Although MLPerf launched in 2018, it wasn't until the very end of 2020 that it was properly scaled and standardized into the ML Commons consortium. The development of these benchmarks came together for their first official debut as MLPerf v1.0 in 2021. MLPerf consists of eight benchmark tests: image recognition, medical-imaging segmentation, two versions of object detection, speech recognition, natural-language processing, recommendation, and reinforcement learning. Often referred to as "the Olympics of machine learning", MLPerf features computer hardware and software from 21 different companies competing on any or all the tests (Moore, 2022). This incentivizes hardware companies like Nvidia to put their best foot forward. The results of the June 2022 MLPerf v2.0 benchmark tests were compared to Moore's law to show the unexpected rate of progress achieved in training times alone. The 2022 MLPerf results showed 9-10x increase in training time performance vs 2018 as shown in Figure 4 (Moore, 2022).

<div align="center" style="background-color: #333; padding: 20px;">
      <img src="https://spectrum.ieee.org/media-library/a-chart-shows-six-lines-of-various-colors-sweeping-up-and-to-the-right.jpg?id=30049159&width=1580&quality=80" width="450">
</div>

<div align="center">

**Figure 4.** MLPerf Training Time Benchmarks vs Moore's Law. Source: S. Moore, 2022.

</div>

Nvidia's AI hardware in the 2023 MLPerf tests (MLPerf v3.0) has shown a considerable performance increase over its 2022 results. Nvidia is also one of the few companies that has consistently submitted MLPerf results for all eight benchmark tests. Exploring their hardware results shows the evolution of AI compute across the ML landscape.

In 2022, Nvidia's AI platform, powered by the A100 Tensor Core GPU, demonstrated significant versatility and efficiency across MLPerf. It achieved the fastest time to train on four out of eight tests and was found to be the fastest on a per-chip basis on six out of the eight tests. This performance was attributed to full-stack innovations spanning GPUs, software, and at-scale improvements, delivering 23x more performance in 3.5 years since the first MLPerf submission (Salvator, 2022). Figure 5a shows the 2022 MLPerf results for the A100.

Fast forward to 2023, the results are even more impressive. The newly introduced Nvidia H100 Tensor Core GPUs, designed in part by AI, running on DGX H100 systems, not only achieved the highest performance in every test of AI inference but also saw a performance gain of up to 54% since their debut in September 2022 (Salvator, 2023). Figure 5b shows the 2023 MLPerf results for the H100.

<div align="center" style="background-color: #333; padding: 20px;">
    <div style="display: flex; justify-content: center; width: 90%">
        <img src="https://www.hpcwire.com/wp-content/uploads/2021/09/Nvidia_Mlperf_Datacenter.png" style="width: 50%; margin-right: 1%;"/>
        <img src="https://blogs.nvidia.com/wp-content/uploads/2023/04/H100-GPU-inference-performance-MLPerf-1536x857.jpg" style="width: 50%; margin-left: 1%; margin-right: 1%;"/>
    </div>
</div>



<div align="center">

**Figure 5.** (a) MLPerf A100 2022 Results. Adapted from "Nvidia Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests," by D. Salvator, 2022. (b) MLPerf H100 2023 Results. Adapted from "Inference MLPerf AI," by D. Salvator, 2023a.


</div>

Specifically, in the healthcare domain, the H100 GPUs have improved performance by 31% since launch on the 3D-UNet benchmark, used for medical imaging. Additionally, the H100 GPUs powered by the Transformer Engine excelled in the BERT benchmark, a transformer-based large language model, significantly contributing to the rise of generative AI (Salvator, 2023). 

# Future Acceleration

This fast-paced advancement towards advanced AI has stirred up a wide range of perspectives within the AI community. On one hand, figures like Geoffrey Hinton, Elon Musk, and former Google CEO Eric Schmidt urge caution, highlighting ethical and existential risks in the long run (Gairola, 2023). On the other hand, optimists such as Andrew Ng see the potential of superior AI to drive unprecedented advancements and solve global challenges (Cherney, 2023). Just as these modern figures grapple with uncertainty regarding AI development, the pioneers of this field also reflected deeply on the implications of machine advancement.

Reflecting on the pace of technological progress, Samuel Butler (1863) expressed this observation in his essay, "There is no security against the ultimate development of mechanical consciousness, in the fact of machines possessing little consciousness now... Reflect upon the extraordinary advance which machines have made during the last few hundred years, and note how slowly the animal and vegetable kingdoms are advancing." Butler's words highlight the staggering speed of machine evolution, especially when compared to the slow, incremental changes seen in biological organisms.

Years later, Alan Turing echoed these sentiments. Turing (1951) reflected on the unpredictable nature of the technological advancements of his time, asserting, "We can see plenty there that needs to be done, and although we can make some fairly reliable, though quite rough, guesses, based on past experience, as to the order in which these developments will come, there remains the possibility that any one particular piece of research may lead to consequences of a revolutionary character."

Whether or not machines will surpass human intelligence is an age old question. The difference between the hype experienced now vs the 1970's is that we have the compute capability to keep up with the hype, suggesting that we are not in a bubble that would lead to another "AI Winter". As illustrated in Figure 6, a recent report from ARK shows that Neural Networks have the most influential potential as a catalyst for a host of other technologies (ARK Investment Management LLC, 2023).

<div align="center" style="background-color: #333; padding: 20px;">
      <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F59209c8272bb9d69862c7ddecb4a70bc%2Fnn_catalyst.png?generation=1687929381525083&alt=media" width="450">
</div>

<div align="center">

**Figure 6.** Neural Network Importance As A Catalyst Source. From "Big Ideas 2023," by ARK Investment Management LLC, 2023.

</div>

Along with this potential for continued innovation surrounding AI, we also have a dramatically different outlook on AI developments by 2030. AI training costs are currently dropping around 70% per year and projected to continue at that same pace. An important point though is that this is relative to GPT-3 level performance, the underlying factor here is a hardware cost improvement projection and not a model complexity projection. (ARK Investment Management LLC, 2023). 

<div align="center" style="background-color: #333; padding: 20px;">
    <div>
    <div style="width: 70%;">
      <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F735835fa86dba0aa643ee5b8151a91dd%2Fcosttotrain.png?generation=1687929392429753&alt=media">
      </div>
    </div>
</div>


Despite the future outlook on lower costs on training existing models like GPT-3, model complexity has already evolved far beyond this model that initially powered ChatGPT. As stated previously, GPT-3 is comprised of 175B parameters and was trained on 45 terabytes of data. The successor released in March 2023, GPT-4, contains 1.8 trillion parameters and was trained on 1 petabyte of data (E2Analyst, 2023). Any improvements in training cost for GPT-3 are now irrelevant due to the higher cost of data and much larger parameter size. A recent report by OpenAI analyzed training cost of recent AI models and found that the cost of training is increasing exponentially as shown below. Their data shows that the cost of training a model is expected to rise to \$500 million by 2030. The primary factor driving this increase in expense is the need for more data. As model architectures become more complex the more data they will need to train on as evidenced by the recent GPT-4 (Cottier, 2023).

<div align="center" style="background-color: #333; padding: 20px;">
    <div style="display: flex; justify-content: center; width: 40%;">
        <img src="https://mpost.io/wp-content/uploads/image-82-40.jpg" style="margin-right: 10px;">
    </div>
</div>

**Figure 2.** Caption for Figure 2. Source: Author(s) (Year).

While we continue to see hardware shortages and specialized GPUs remaining costly (Vanian, 2023), a recent estimate shows that the cost of AI hardware and software, when measured by relative compute unit (RCU), will continue to decline at a consistent rate. This combination of continued innovations will eventually enable applications like ChatGPT to run inferences at such a low cost that it can be deployable to the level of Google search (ARK Investment Management LLC, 2023). 

<div align="center" style="background-color: #333; padding: 20px;">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2Fcd48cc2bff57085850ae9c8e3fccc874%2Fcostperinf.png?generation=1687972455714214&alt=media" width="400" height="300" style="margin-right: 10px;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2Fc0765a4abc3f6c8ff40c97c2473e2639%2Faihardwarecost.png?generation=1687929416513761&alt=media" width="400" height="300" style="margin-left: 10px;">
    </div>
</div>

# Conclusion

The last two years have clearly been historic in the development of AI specific hardware. Addressing current computational demands extends beyond simply creating more powerful hardware or doubling the number of transistors on a chip. Lower costs and greater access are achieved by optimizing hardware, software, and algorithms for the full range of system requirements. Task-specific benchmarks, such as MLPerf, will play an increasingly influential role in measuring this progress across the growing lineup of products. Recent developments in AI hardware demonstrate an emphasis on specialization, such as dedicating specific chip architectures to model training and inference tasks to lower the computational costs. However, there exists a tension between cost-efficiency and the democratization of AI systems. While hardware costs are decreasing, the growing demand for data threatens to offset these gains. Despite these challenges, the pace of hardware advancements shows no sign of slowing down, indicating an exciting future for AI innovation.

# Sources

Ali, T. (2018, May 31). IT Hardware Benchmarks for Machine Learning and Artificial Intelligence. Medium. https://medium.com/@tauheedul/it-hardware-benchmarks-for-machine-learning-and-artificial-intelligence-6183ceed39b8

Amazon Web Services. (2022, October 13). Introducing Amazon EC2 TRN1 Instances for High-Performance, Cost-Effective Deep Learning Training. https://aws.amazon.com/about-aws/whats-new/2022/10/ec2-trn1-instances-high-performance-cost-effective-deep-learning-training/

Amodei, D., & Hernandez, D. (2018, May 16). AI and Compute. OpenAI. https://openai.com/research/ai-and-compute/

Appenzeller, G., Bornstein, M., & Casado, M. (2023, April 27). Navigating the high cost of AI compute. Andreessen Horowitz. https://a16z.com/2023/04/27/navigating-the-high-cost-of-ai-compute/

Apple. (2022, June). Apple unveils M2, taking the breakthrough performance and capabilities of M1 even further. Apple Newsroom. https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthrough-performance-and-capabilities/

Apple Inc. (2021, October 18). Introducing M1 Pro and M1 Max: the most powerful chips Apple has ever built. Apple Newsroom. https://www.apple.com/newsroom/2021/10/introducing-m1-pro-and-m1-max-the-most-powerful-chips-apple-has-ever-built/

Apple Inc. (2022, March). Apple unveils M1 Ultra, the world's most powerful chip for a personal computer. Apple Newsroom. https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultra-the-worlds-most-powerful-chip-for-a-personal-computer/

Apple Inc. (2023, June). Apple introduces M2 Ultra. Apple Newsroom. https://www.apple.com/newsroom/2023/06/apple-introduces-m2-ultra/

ARK Investment Management LLC. (2023, January 31). Big Ideas 2023. https://research.ark-invest.com/hubfs/1_Download_Files_ARK-Invest/Big_Ideas/ARK%20Invest_013123_Presentation_Big%20Ideas%202023_Final.pdf

Barry, D. J. (2023, April 17). Beyond Moore's Law: New solutions for beating the data growth curve. Microcontroller Tips. https://www.microcontrollertips.com/beyond-moores-law-new-solutions-beating-data-growth-curve/

Bunting, J. (2023, March 14). AI Benchmarks Are Broken. SemiEngineering. https://semiengineering.com/ai-benchmarks-are-broken/

Burt, J. (2022, August 23). Untether AI pulls the curtain rope for its next-gen inferencing system. The Next Platform. https://www.nextplatform.com/2022/08/23/untether-ai-pulls-the-curtain-rope-for-its-next-gen-inferencing-system/

Business Wire. (2021, August 24). Cerebras Systems Announces World’s First Brain-Scale Artificial Intelligence Solution. https://www.businesswire.com/news/home/20210824005644/en/Cerebras-Systems-Announces-World%E2%80%99s-First-Brain-Scale-Artificial-Intelligence-Solution

Butler, S. (1863). Darwin Among the Machines. In The Notebooks of Samuel Butler.

Cherney, M. A. (2023, June 6). Andrew Ng says AI poses no extinction risk. Silicon Valley Business Journal. https://www.bizjournals.com/sanjose/news/2023/06/06/andrew-ng-says-ai-poses-no-extinction-risk.html

Cottier, B. (2023). Trends in the dollar training cost of machine learning systems. Epoch AI. https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems

Cuenca, P. (2023, January 25). The Infrastructure Behind Serving DALL-E Mini. Weights & Biases. https://wandb.ai/dalle-mini/dalle-mini/reports/The-Infrastructure-Behind-Serving-DALL-E-Mini--VmlldzoyMTI4ODAy

Dave, P. (2022, May 3). Google faces internal battle over research on AI to speed up chip design. Reuters. https://www.reuters.com/technology/google-faces-internal-battle-over-research-ai-speed-chip-design-2022-05-03/

Dilmengani, C. (2023, June 17). AI chip makers: Top 10 companies in 2023. https://research.aimultiple.com/ai-chip-makers/

Doherty, S. (2021, August 25). Designing the Colossus MK2 IPU: Simon Knowles at Hot Chips 2021. Graphcore. https://www.graphcore.ai/posts/designing-the-colossus-mk2-ipu-simon-knowles-at-hot-chips-2021

E2Analyst. (2023). GPT-4: Everything you want to know about OpenAI’s new AI model. Medium. https://medium.com/predict/gpt-4-everything-you-want-to-know-about-openais-new-ai-model-a5977b42e495

Edwards, B. (2023, May 24). The lightning onset of AI—what suddenly changed? An Ars Frontiers 2023 recap. Ars Technica. https://arstechnica.com/information-technology/2023/05/the-lightning-onset-of-ai-what-suddenly-changed-an-ars-frontiers-2023-recap/

Edwards, B. (2023, June 8). Nvidia's new AI superchip combines CPU and GPU to train monster AI systems. Ars Technica. https://arstechnica.com/information-technology/2023/06/nvidias-new-ai-superchip-combines-cpu-and-gpu-to-train-monster-ai-systems/

Freund, K. (2021, August 9). Using AI to help design chips has become a thing. Forbes. https://www.forbes.com/sites/karlfreund/2021/08/09/using-ai-to-help-design-chips-has-become-a-thing/?sh=29e752cb5d9d

Fu, J. (2022, September 29). AI frontiers in 2022. Better Programming. https://betterprogramming.pub/ai-frontiers-in-2022-5bd072fd13c

Gairola, A. (2023, May 25). Former Google CEO echoes Musk and Hinton's dire warnings on AI becoming existential risk. Benzinga. https://www.benzinga.com/news/23/05/32566930/former-google-ceo-echoes-musk-and-hintons-dire-warnings-on-ai-becoming-existential-risk

Garanhel, M. (2022, October 14). Top 20 AI Chips of Your Choice in 2022. AI Accelerator Institute. https://www.aiacceleratorinstitute.com/top-20-chips-choice-2022/

Goldie, A., & Mirhoseini, A. (2020, April 3). Chip Design with Deep Reinforcement Learning. Google AI Blog. https://ai.googleblog.com/2020/04/chip-design-with-deep-reinforcement.html

Greenberg, M. (2023, June 6). The best AI features Apple announced at WWDC 2023. VentureBeat. https://venturebeat.com/ai/the-best-ai-features-apple-announced-at-wwdc-2023/

Gupta, A. (2022, March 22). Nvidia’s Grace CPU: The ins and outs of an AI-focused processor. Ars Technica. https://arstechnica.com/gadgets/2022/03/nvidias-grace-cpu-the-ins-and-outs-of-an-ai-focused-processor/

Hamblen, M. (2023, February 16). ChatGPT runs 10K Nvidia training GPUs with potential for thousands more. Fierce Electronics. Retrieved from https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more

Hennessy, J. L., & Patterson, D. A. (2018). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.

Higginbotham, S. (2022, February 14). Google is using AI to design chips for its AI hardware. Protocol. https://www.protocol.com/google-is-using-ai-to-design-chips

Hoffman, K. (2020, February 24). We're not prepared for the end of Moore's law. MIT Technology Review. Retrieved from https://www.technologyreview.com/2020/02/24/905789/were-not-prepared-for-the-end-of-moores-law/

HPC Wire. (2021, May 27). NERSC debuts Perlmutter, world's fastest AI supercomputer. https://www.hpcwire.com/2021/05/27/nersc-debuts-perlmutter-worlds-fastest-ai-supercomputer/

Hruska, J. (2021, June 8). Intel’s 2021-2022 roadmap: Alder Lake, Meteor Lake, and a big bet on EUV. ExtremeTech. https://www.extremetech.com/computing/323126-intels-2021-2022-roadmap-alder-lake-meteor-lake-and-a-big-bet-on-euv

Intelligent Computing Lab, Peking University. (2022). Scalable Architecture for Neural Networks. http://nicsefc.ee.tsinghua.edu.cn/projects/neuralscale/

Jotrin Electronics. (2022, January 4). A brief history of the development of AI chips. Retrieved from https://www.jotrin.com/technology/details/a-brief-history-of-the-development-of-ai-chips

Jouppi, N., & Patterson, D. (2022, June 29). TPU v4 enables performance, energy, and CO2e efficiency gains. Google Cloud Blog. Retrieved from https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains

Kandel, A. (2023, April 7). Secrets of ChatGPT's AI Training: A Look at the High-Tech Hardware Behind It. Retrieved from https://www.linkedin.com/pulse/secrets-chatgpts-ai-training-look-high-tech-hardware-behind-kandel/

Kaur, D. (2021, November 3). Here's what the 2021 global chip shortage is all about. Tech Wire Asia. https://techwireasia.com/2021/11/heres-what-the-2021-global-chip-shortage-is-all-about/

Kennedy, P. (2021, August 24). SambaNova SN10 RDU at Hot Chips 33. ServeTheHome. https://www.servethehome.com/sambanova-sn10-rdu-at-hot-chips-33/

Kennedy, P. (2023, June 17). Nvidia Notches a Modest Grace Superchip Win at ISC 2023. ServeTheHome. Retrieved from https://www.servethehome.com/nvidia-notches-a-modest-grace-superchip-win-at-isc-2023-arm-hpe/

Khare, Y. (2023, June 16). Meta Reveals AI Chips to Revolutionize Computing. Analytics Vidhya. Retrieved from https://finance.yahoo.com/news/1-amd-says-meta-using-174023713.html
https://www.analyticsvidhya.com/blog/2023/05/meta-reveals-ai-chips-to-revolutionize-computing/

Lee, J., & Nellis, S. (2023, March 9). Groq adapts Meta's chatbot to its own chips in race against Nvidia. Reuters. https://www.reuters.com/technology/groq-adapts-metas-chatbot-its-own-chips-race-against-nvidia-2023-03-09/

Liu, M. (2022). Get the Latest from re:Invent 2022. AWS re:Post. https://repost.aws/articles/ARWg0vtgR7RriapTABCkBnng/get-the-latest-from-re-invent-2022

McKenzie, J. (2023, June 20). Moore’s law: further progress will push hard on the boundaries of physics and economics. Physics World. https://physicsworld.com/a/moores-law-further-progress-will-push-hard-on-the-boundaries-of-physics-and-economics/

Mitchell, R. (2021, June 19). Mythic announces latest AI chip M1076. Electropages. https://www.electropages.com/blog/2021/06/mythic-announces-latest-ai-chip-m1076

Mirhoseini, A., Goldie, A., Yazgan, M. et al. (2021). Chip placement with deep reinforcement learning. Nature 595, 230–236. https://www.nature.com/articles/s41586-021-03544-w

MLCommons. (2023, March 8). History. MLCommons. Retrieved from https://mlcommons.org/en/history/

Mohan, R. (2023, June 17). AI chip race heats up as AMD introduces rival to Nvidia technology. Tech Xplore. Retrieved from https://techxplore.com/news/2023-06-ai-chip-amd-rival-nvidia.html

Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114-117.

Moore, S. (2022). MLPerf Rankings 2022. IEEE Spectrum. https://spectrum.ieee.org/mlperf-rankings-2022

Morgan, T. P. (2022, October 20). IBM’s AI Accelerator: This Had Better Not Be Just A Science Project. The Next Platform. https://www.nextplatform.com/2022/10/20/ibms-ai-accelerator-this-had-better-not-be-just-a-science-project/

Naik, A. R. (2021, August 4). Explained: Nvidia's record-setting performance on MLPerf v1.0 training benchmarks. Analytics India Magazine. https://analyticsindiamag.com/explained-nvidias-record-setting-performance-on-mlperf-v1-0-training-benchmarks/

Nikita, S. (2022, May 27). AWS announces general availability of Graviton 3 processors. MGT Commerce. https://www.mgt-commerce.com/blog/aws-announces-general-availability-of-graviton-3-processors/

Narasimhan, S. (2022, June 29). Nvidia partners sweep all categories in MLPerf AI benchmarks. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/06/29/nvidia-partners-ai-mlperf/

Narendran, S. (2023, May 11). Every major AI feature announced at Google I/O 2023. ZDNet. Retrieved from https://www.zdnet.com/article/every-major-ai-feature-announced-at-google-io-2023/

Naval Group. (2023, March 2). AI-powered chip design: A revolution in the semiconductor industry. Naval Group Press Room. https://www.naval-group.com/en/news/ai-powered-chip-design-a-revolution-in-the-semiconductor-industry/

Nosta, J. (2023, March 10). Stacked exponential growth: AI is outpacing Moore's law and evolutionary biology. Medium. https://johnnosta.medium.com/stacked-exponential-growth-ai-is-outpacing-moores-law-and-evolutionary-biology-12882c38b68d

Nvidia. (2021, April 12). Nvidia Announces CPU for Giant AI and High Performance Computing Workloads. Nvidia Newsroom. https://nvidianews.nvidia.com/news/nvidia-announces-cpu-for-giant-ai-and-high-performance-computing-workloads

Nvidia. (2023, May 2). Introducing Nvidia Grace: A CPU specifically designed for giant-scale AI and HPC. Nvidia Newsroom. https://nvidianews.nvidia.com/news/introducing-nvidia-grace-a-cpu-specifically-designed-for-giant-scale-ai-and-hpc

Patel, D., & Ahmad, A. (2023, February 9). The inference cost of search disruption. SemiAnalysis. https://www.semianalysis.com/p/the-inference-cost-of-search-disruption

Peckham, O. (2022, July 7). IBM, Tokyo Electron Announce 3D Chip Stacking Breakthrough. HPCwire. https://www.hpcwire.com/2022/07/07/ibm-tokyo-electron-announce-3d-chip-stacking-breakthrough/

PR Newswire. (2018). Synopsys Unveils Fusion Compiler Enabling 20 Percent Higher Quality-of-Results and 2x Faster Time-to-Results. https://www.prnewswire.com/news-releases/synopsys-unveils-fusion-compiler-enabling-20-percent-higher-quality-of-results-and-2x-faster-time-to-results-300744510.html

Precedence Research. (2022). Artificial Intelligence (AI) in Hardware Market. https://www.precedenceresearch.com/artificial-intelligence-in-hardware-market

Roach, J. (2023, June 17). Intel thinks your next CPU needs an AI processor — here’s why. Digital Trends. https://www.digitaltrends.com/computing/intel-meteor-lake-vpu-computex-2023/

Roy, R., Raiman, J., & Godil, S. (2023, April 5). Designing arithmetic circuits with deep reinforcement learning. Nvidia Developer Blog. Retrieved from https://developer.nvidia.com/blog/designing-arithmetic-circuits-with-deep-reinforcement-learning/

Salvator, D. (2022). Nvidia Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/04/06/mlperf-edge-ai-inference-orin/

Salvator, D. (2023a). Inference MLPerf AI. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2023/04/05/inference-mlperf-ai/

Sharma, S. (2021, December 20). 2021 Was a Breakthrough Year for AI. VentureBeat. https://venturebeat.com/ai/2021-was-a-breakthrough-year-for-ai/

Smith, L. (2023, January 10). 4th Gen Intel Xeon Scalable Processors Launched. StorageReview. https://www.storagereview.com/news/4th-gen-intel-xeon-scalable-processors-launched

Sweeney, T. [@TimSweeneyEpic]. (2023, April 13). Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential. Twitter. https://twitter.com/TimSweeneyEpic/status/1646645582583267328

Synopsys. (2023). DSO.ai. Retrieved June 2023, from https://www.synopsys.com/ai/chip-design/dso-ai.html

Takahashi, D. (2021, July 22). AI’s got talent: Meet the new rising star in media and entertainment. VentureBeat. https://venturebeat.com/ais-got-talent-meet-the-new-rising-star-in-media-and-entertainment/

Tardi, C. (2023, June 17). Moore's Law. Investopedia. https://www.investopedia.com/terms/m/mooreslaw.asp

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460. doi:10.1093/mind/LIX.236.433

Vanian, J. (2023, March 13). ChatGPT and generative AI are booming, but at a very expensive price. CNBC. Updated 2023, April 17. https://www.cnbc.com/2023/03/13/chatgpt-and-generative-ai-are-booming-but-at-a-very-expensive-price.html

Vellante, D., & Floyer, D. (2021, April 10). New era of innovation: Moore's law is not dead and AI is ready to explode. SiliconANGLE. https://siliconangle.com/2021/04/10/new-era-innovation-moores-law-not-dead-ai-ready-explode/

Vincent, J. (2021, June 10). Google is using machine learning to design its next generation of machine learning chips. The Verge. https://www.theverge.com/2021/6/10/22527476/google-machine-learning-chip-design-tpu-floorplanning

Ward-Foxton, S. (2023, February 10). AI-Powered Chip Design Goes Mainstream. EE Times. https://www.eetimes.com/ai-powered-chip-design-goes-mainstream/

Westfall, R. (2021, November 18). Groq Turbocharges COVID Drug Discovery at Argonne National Laboratory. Futurum Research. https://futurumresearch.com/research-notes/groq-turbocharges-covid-drug-discovery-at-argonne-national-laboratory/

Wiggers, K. (2020, April 7). Tenstorrent reveals Grayskull, an all-in-one system that accelerates AI model training. VentureBeat. https://venturebeat.com/ai/tenstorrent-reveals-grayskull-an-all-in-one-system-that-accelerates-ai-model-training/

ChatGPT (OpenAI) and Bard (Google) were used as tools while writing this essay. Their contributions were the following:
- Narrative: In the early stages of writing many ideas for the structure of the essay were generated, in the latter parts generating ideas for what could be removed.
- Research: ChatGPT's Bing integration was used throughout the information gathering process to check for additional sources outside of the ones found using traditional search methods.
- Validation: In order to ensure that my summaries of AI hardware advancements were accurate, they were checked against the contents of original sources to ensure summaries reflected the particular advancement or technology.

Ultimately the output of these generative tools was always rewritten.

In [7]:
submission_df = pd.read_csv("/kaggle/input/2023-kaggle-ai-report/sample_submission.csv")
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'copy/paste the exact category that you are su...
1,essay_url,'http://www.kaggle.com/your_username/your_note...
2,feedback1_url,'http://www.kaggle.com/.../your_1st_peer_feedb...
3,feedback2_url,'http://www.kaggle.com/.../your_2nd_peer_feedb...
4,feedback3_url,'http://www.kaggle.com/.../your_3rd_peer_feedb...


In [8]:
val = ["'Other'", "http://www.kaggle.com/your_username/your_public_notebook",
      "http://www.kaggle.com/.../your_1st_peer_feedback",
      "http://www.kaggle.com/.../your_2nd_peer_feedback",
      "http://www.kaggle.com/.../your_3rd_peer_feedback"]
submission_df.value = val
submission_df.to_csv('submission.csv', index=False)

In [9]:
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'Other'
1,essay_url,http://www.kaggle.com/your_username/your_publi...
2,feedback1_url,http://www.kaggle.com/.../your_1st_peer_feedback
3,feedback2_url,http://www.kaggle.com/.../your_2nd_peer_feedback
4,feedback3_url,http://www.kaggle.com/.../your_3rd_peer_feedback
