```python
import numpy as np
import pandas as pd
```

```
/kaggle/input/2023-kaggle-ai-report/sample_submission.csv
/kaggle/input/2023-kaggle-ai-report/arxiv_metadata_20230510.json
/kaggle/input/2023-kaggle-ai-report/kaggle_writeups_20230510.csv
```


# Working Title - "State of AI: Hardware"

# Introduction

The past two years have seen an unprecedented acceleration in the capability of the hardware behind Artificial Intelligence (AI) systems. Tim Sweeney, CEO of Epic Games, described this transition in a recent tweet: "Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential." (Sweeney, 2023).

Traditionally, Moore's Law, which predicts that the number of transistors on an integrated circuit doubles roughly every two years, has served as the benchmark for anticipated growth in computing power. Yet in the past two years, acceleration hardware has surpassed this prediction in several ways (Moore, 2022). What advancements in AI-related acceleration hardware have been made in the last two years? What factors are driving the pace of hardware innovation? How can we measure the performance achieved? What does this acceleration mean for the future of AI? While it is impossible to cover every development, this essay explores these questions to capture the state of AI acceleration hardware in 2023.

# Historical Context

The term "artificial intelligence" was first used by scientists John McCarthy, Claude Shannon, and Marvin Minsky at the Dartmouth Conference in 1956. Optimistic about the field's potential, Minsky boldly proclaimed in 1970 that machines with average human intelligence would exist in the near future. The hype surrounding this forecast ignited an investment wave that lasted over a decade and culminated in an AI bubble. When the bubble burst in the early 1980s, AI development retreated to research labs, and the field entered what is now referred to as the "AI Winter" (Jotrin Electronics, 2022).

In the early years of computer hardware development, Central Processing Units (CPUs) were the primary source of innovation around computing capabilities. Moore's Law had been the guiding principle for the advancement of these CPUs, predicting a steady pace of growth. However, the demands of AI computations quickly exceeded the capabilities of these CPU architectures, resulting in much slower progress than originally anticipated by Minsky. 

The introduction of the Graphics Processing Unit (GPU) by Nvidia in 1999 marked a significant change in the computing landscape. GPUs were initially used to render 3D graphics in video games, and their potential for AI model training was not recognized until a decade later. Using GPUs for model training accelerated AI hardware progress far beyond the pace predicted by Moore's Law.

This acceleration reached a tipping point in 2012 with AlexNet, a groundbreaking neural network trained on Nvidia GPUs, which won the ImageNet competition with record-setting image recognition accuracy (Amodei & Hernandez, 2018).

Hardware has since become a key factor in the speed and efficiency of AI model training and deployment. In response to the limitations of GPUs, AI-assisted chip design emerged around 2018, leading to innovations such as Google's approach to chip design, which frames component placement as a game solved with reinforcement learning (Goldie & Mirhoseini, 2020), and Synopsys' DSO.ai, the "world's first autonomous AI application for chip design" (Synopsys, 2023).

The ongoing global chip shortage, which began with the COVID-19 pandemic in 2020, further underscores the need for innovations in semiconductor production. This shortage and the accompanying surge in demand for consumer electronics have elevated the priority of smarter semiconductor manufacturing, setting the stage for significant advancements in AI hardware between 2021 and 2023 (Appenzeller, Bornstein, & Casado, 2023).

# Measuring AI Hardware Progress

Before delving into recent hardware advances, it is important to establish some key terminology for comparing performance. Traditional CPU performance is typically described by transistor count or by clock speed, measured in gigahertz (GHz). More relevant to the material here are accelerators such as graphics processing units (GPUs), neural processing units (NPUs), and tensor processing units (TPUs), whose overall computational capability, known as "bulk compute," is usually expressed in variations of floating point operations per second (FLOPS): the number of floating point calculations a chip can execute in one second. Common variants include teraFLOPS (TFLOPS), or trillions of FLOPS, and tera operations per second (TOPS), a similar measure that counts operations of any type and is most often quoted for low-precision integer arithmetic such as 8-bit. PetaOp/s, denoting a quadrillion operations per second, is yet another variant. Because accelerators complete far more low-precision than high-precision operations per second, these figures are only comparable when quoted at the same numerical precision (for example, 64-bit, 32-bit, 16-bit, or 8-bit).

Other important indicators of hardware performance include energy efficiency and memory bandwidth, the rate at which data can be moved between the chip and its memory, commonly measured in gigabytes (GB) or terabytes (TB) per second. Together, these measures serve as the key indicators of hardware performance in artificial intelligence.
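The headline FLOPS figures quoted throughout this essay are theoretical peaks. As a rough sketch, a chip's peak can be estimated as compute units × clock rate × floating point operations each unit completes per cycle. The chip below is hypothetical: its core count, clock, and per-cycle throughput are illustrative assumptions, not the specifications of any product discussed here, and the helper names are our own.

```python
# Unit prefixes used when reporting throughput (FLOPS or OPS), largest first
PREFIXES = [(1e18, "E"), (1e15, "P"), (1e12, "T"), (1e9, "G")]


def peak_flops(units: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = compute units x clock rate x FLOPs each unit completes per cycle."""
    return units * clock_hz * flops_per_cycle


def format_rate(ops_per_second: float, unit: str = "FLOPS") -> str:
    """Express a raw operations-per-second value with the largest fitting SI prefix."""
    for scale, prefix in PREFIXES:
        if ops_per_second >= scale:
            return f"{ops_per_second / scale:.2f} {prefix}{unit}"
    return f"{ops_per_second:.0f} {unit}"


# Hypothetical accelerator: 1,000 cores at 1.5 GHz, each completing one
# fused multiply-add (counted as 2 FLOPs) per cycle.
chip = peak_flops(units=1_000, clock_hz=1.5e9, flops_per_cycle=2)
print(format_rate(chip))          # 3.00 TFLOPS
print(format_rate(25e12, "OPS"))  # 25.00 TOPS
```

Because the FLOPs completed per cycle depend heavily on precision, the same chip can report very different peaks for 64-bit and 8-bit arithmetic, which is why figures should be compared at matching precision.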


Focusing solely on CPU power or transistor count to compare hardware fails to capture the true extent of recent progress in AI compute. Take, for example, Apple's NPU, also known as the "neural engine." The performance of these chips has seen year-on-year improvements of over 100% in bulk compute, outpacing the progression in their CPU and GPU counterparts. The chart below overlays the historical processing power of Apple's iPhone chips, highlighting the distinct growth trend of the NPU (Vellante & Floyer, 2021).

<div align="center">
      <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-1.jpg" width="450">
</div>
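Annual growth rates and doubling times are two views of the same exponential curve. A two-year doubling (Moore's Law) corresponds to roughly 41% growth per year, while the Neural Engine's more than 100% year-on-year growth implies doubling in under a year. The helper names in this sketch are our own, chosen for illustration.

```python
import math


def annual_growth_from_doubling(doubling_years: float) -> float:
    """Annual growth rate implied by a fixed doubling period T: 2**(1/T) - 1."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time_from_growth(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate r: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + annual_rate)


moore_rate = annual_growth_from_doubling(2.0)
print(f"Moore's Law (2-year doubling): {moore_rate:.1%} per year")
print(f"100% per year doubles every {doubling_time_from_growth(1.0):.2f} years")
```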

In a 2018 publication, OpenAI identified three factors that drive AI progress over time: algorithmic innovation, data, and the amount of compute available for training. Of these, the authors argued, compute is the most readily quantifiable, and they measured it by the number of operations performed during training, a far more relevant measure for model training than clock speed (Amodei & Hernandez, 2018).
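Amodei and Hernandez (2018) express training compute in petaflop/s-days: one petaflop/s-day is 10^15 operations per second sustained for a full day, about 8.64 × 10^19 operations in total. The conversion below is a sketch; the example training run (its chip count, sustained per-chip throughput, and duration) is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PFS_DAY = 1e15 * SECONDS_PER_DAY  # operations in one petaflop/s-day


def training_ops(chips: int, sustained_flops_per_chip: float, days: float) -> float:
    """Total operations performed by a training run at a sustained (not peak) rate."""
    return chips * sustained_flops_per_chip * days * SECONDS_PER_DAY


def to_pfs_days(total_ops: float) -> float:
    """Convert a raw operation count into petaflop/s-days."""
    return total_ops / PFS_DAY


# Hypothetical run: 64 accelerators sustaining 100 TFLOPS each for 10 days
ops = training_ops(chips=64, sustained_flops_per_chip=100e12, days=10)
print(f"{ops:.3e} operations = {to_pfs_days(ops):.1f} petaflop/s-days")
```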

For decades, Moore's Law tracked not only the growth in transistor count but also the advancement of ML as measured by compute. Yet recent innovations have strained this relationship. As we moved into the 2020s, the technology community began questioning the continued relevance of Moore's Law. Influential pieces such as "We're not prepared for the end of Moore's Law" (Hoffman, 2020), published in MIT Technology Review, cast doubt on how long this once-dependable guideline's prediction of transistor counts would hold. Others argued that Moore's Law was not dead but that its definition needed broadening. Either way, it was clear that more comprehensive metrics were needed to gauge computing advancements and predict future trajectories.

The charts below illustrate how the growth in computational demand for different AI models has pulled ahead of Moore's Law since 2012. The compute used in the largest AI training runs has grown by more than 300,000 times, with a doubling time of only 3.4 months. Had progress adhered strictly to Moore's Law, the increase would have been limited to a factor of about seven (Amodei & Hernandez, 2018). These charts suggest we have entered a new era in the growth of AI compute, one more closely aligned with the "stacked exponential" described by Sweeney.

<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F86778ff808061547b22637c2437454ef%2Fai-and-compute-all.png?generation=1687738766744537&alt=media" width="400" style="margin-right: 10px;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F9574381efab68a160ffbfb6297e69b83%2Fai-and-compute-modern-log.png?generation=1687738839069138&alt=media" width="400" style="margin-left: 10px;">
    </div>
</div>
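Both growth factors follow from their doubling times: a quantity that doubles every T months grows by 2^(t/T) after t months. The sketch below infers how long a 3.4-month doubling takes to reach 300,000x, then applies a 24-month (Moore's Law) doubling over the same span. It lands in the same single-digit range as the factor of seven cited above; the exact figure depends on the start and end dates chosen, which this sketch does not try to reproduce.

```python
import math


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` for a quantity doubling every `doubling_months`."""
    return 2 ** (months / doubling_months)


def months_to_reach(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor` at a fixed doubling time."""
    return math.log2(factor) * doubling_months


span = months_to_reach(300_000, doubling_months=3.4)  # roughly 62 months
moore = growth_factor(span, doubling_months=24)        # Moore's Law over the same span
print(f"Span: {span:.0f} months ({span / 12:.1f} years)")
print(f"AI training compute: {growth_factor(span, 3.4):,.0f}x")
print(f"Moore's Law over the same span: {moore:.1f}x")
```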

# AI Hardware Advancements

## 2021

AI hardware development received considerable investment in 2021: the capital invested in AI hardware companies globally almost doubled, from 36 billion USD in 2020 to 68 billion USD in 2021 (Sharma, 2021). According to Precedence Research (2022), the AI hardware market was valued at 10 billion USD in 2021 and was projected to grow to almost 90 billion USD by 2030. Processors accounted for the largest share of AI hardware demand in 2021 (65%), ahead of storage and network devices. The catch is that while processing power is clearly outperforming expectations, network and storage devices are increasingly the bottleneck to even greater performance (Vellante & Floyer, 2021).

<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Size-2021-to-2030.jpg" style="width: 30%; margin-right: 1%;"/>
        <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Share-By-Type-2021.jpg" style="width: 30%; margin-left: 1%; margin-right: 1%;"/>
        <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-3.jpg" style="width: 30%; margin-left: 1%;"/>
    </div>
</div>
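The market projection implies a compound annual growth rate (CAGR), the constant yearly rate that turns the starting value into the ending value. This minimal sketch uses the rounded figures quoted above (about 10 billion USD in 2021 and almost 90 billion USD in 2030); because the inputs are rounded, the result is approximate and may differ from the rate stated in the original report.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures from the text, in billions of USD
market_2021, market_2030 = 10.0, 90.0
rate = cagr(market_2021, market_2030, years=2030 - 2021)
print(f"Implied CAGR 2021-2030: {rate:.1%}")

# Investment: 36 -> 68 billion USD is a single year of growth
print(f"Investment growth 2020-2021: {cagr(36.0, 68.0, years=1):.0%}")
```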

2021 began with OpenAI's release of DALL-E in January. DALL-E, a multimodal AI system, distinguishes itself by generating images from text descriptions. Although not a hardware advancement per se, DALL-E is significant computationally: it merges two of the most computationally intensive fields in AI, computer vision and natural language processing. Training models like DALL-E, which builds on GPT-3 and on deep learning models for images, requires substantial hardware resources, and operating such a model at scale depends on a combination of efficient processing power, robust networking, and high-speed storage. While the exact hardware setup behind DALL-E has not been disclosed, recent attempts to replicate DALL-E on a much smaller scale have shown how complex the end-to-end hardware setup is (Cuenca, 2023).

In early 2021, Graphcore, a U.K.-based chip manufacturer, announced the second generation of its Colossus intelligence processing unit (IPU), the GC200 Colossus MK2. Each GC200 IPU has 59 billion transistors and 1,472 independent programmable cores, and delivers 250 TFLOPS. This level of parallelism was designed to handle the sparsity and irregularity of machine learning workloads, offering an alternative to Nvidia's more common GPUs (Doherty, 2021).

In April, Cerebras Systems, a well-known AI chip startup, released the WSE-2, the largest chip ever built, with 850,000 cores and 2.6 trillion transistors, more than double the capacity of its predecessor, the WSE-1 (Dilmengani, 2023). Cerebras also announced the "world's first brain-scale" AI solution. The term refers to the estimate that the human brain has on the order of 100 trillion synapses, whereas existing AI clusters could previously match only about 1% of that scale. The CS-2 accelerator, roughly the size of a small refrigerator, was designed to support models of over 120 trillion parameters (Business Wire, 2021).

Also announced in April was the Nvidia Grace CPU, the company's first data center CPU, designed to address the computational requirements of advanced applications such as natural language processing, recommender systems, and AI supercomputing on large datasets. Grace combines energy-efficient Arm CPU cores with a low-power memory subsystem to deliver high performance efficiently. Nvidia aimed for the Arm-based processor to provide a ten-fold performance increase for systems training large AI models compared with the leading servers of the time. Notably, the Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy’s Los Alamos National Laboratory announced plans to build Grace-powered supercomputers to support scientific research (Nvidia, 2021).

In May, Google announced its fourth-generation TPUs for AI and ML workloads. TPUs, designed specifically to optimize AI computation, are Google's response to the rising dominance of GPUs. Google later documented the performance gains of the TPU v4, reporting a 10x increase in ML system performance over its predecessor, the TPU v3. The TPU v4's interconnect technology and domain-specific accelerators improve energy efficiency as well as performance. TPU v4 systems have been used to train large language models (LLMs) such as LaMDA, MUM, and PaLM; PaLM's training sustained 57.8% of peak hardware floating point performance over 50 days on the TPU v4 (Jouppi & Patterson, 2022).
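The 57.8% figure is a utilization ratio: the floating point throughput actually sustained during training divided by the hardware's theoretical peak. A minimal sketch, assuming the 275 TFLOPS per-chip peak listed for TPU v4 in the 2021 summary table below; the `achieved` value is derived from those two numbers, not measured.

```python
def utilization(achieved_flops: float, peak_flops: float) -> float:
    """Fraction of theoretical peak throughput actually sustained."""
    return achieved_flops / peak_flops


TPU_V4_PEAK = 275e12  # per-chip peak, as listed in the 2021 summary table
PALM_HFU = 0.578      # share of peak sustained while training PaLM

achieved = TPU_V4_PEAK * PALM_HFU
print(f"Sustained per chip: {achieved / 1e12:.0f} of {TPU_V4_PEAK / 1e12:.0f} peak TFLOPS")
print(f"Utilization: {utilization(achieved, TPU_V4_PEAK):.1%}")
```

Utilization stays well below 100% because some time goes to communication between chips and to memory access, which is one reason peak FLOPS alone overstates delivered performance.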

Another major announcement came from Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC): the Perlmutter supercomputer, built by HPE in collaboration with Nvidia and AMD, which features around 6,159 Nvidia A100 GPUs and roughly 1,500 AMD Milan CPUs and provides a theoretical 3.8 exaflops of "AI performance." Perlmutter has since been instrumental in mapping the visible universe across 11 billion light years by processing data from the Dark Energy Spectroscopic Instrument (DESI); early benchmarking showed speedups of up to 20x using the GPUs, reducing computation times from weeks or months to hours (HPC Wire, 2021).
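System-level "AI performance" figures are usually the per-chip peak multiplied by the number of chips. As a rough check, assuming the 3.8 exaflops figure counts only the GPUs and uses the A100's 16-bit tensor core peak with structured sparsity (624 TFLOPS per GPU; this is our assumption about how the number was derived), the arithmetic lands close to the quoted total:

```python
def aggregate_peak(num_chips: int, peak_per_chip: float) -> float:
    """Theoretical system peak: every chip running at its own peak simultaneously."""
    return num_chips * peak_per_chip


A100_FP16_SPARSE = 624e12  # assumption: 16-bit tensor core peak with sparsity
perlmutter_gpus = 6_159

total = aggregate_peak(perlmutter_gpus, A100_FP16_SPARSE)
print(f"{total / 1e18:.2f} exaflops")
```

This is a theoretical ceiling; real scientific workloads sustain only a fraction of it.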

In June, Mythic announced the M1076, a 25 trillion operations per second (TOPS) AI processor that can store up to 80 million weight parameters on-chip, which means it can run complex AI models without external memory (Mitchell, 2021). The M1076 requires ten times less power than a conventional system-on-chip or GPU. Its introduction marked a shift toward more energy-efficient hardware for AI, an important consideration as energy costs and environmental impacts become a growing concern (Sharma, 2021).

In June, Google published a paper in Nature detailing its approach to using AI for the floor planning stage of chip design (Mirhoseini et al., 2021). The paper formalized the work described in Google's 2020 blog post on AI-powered chip design and made its findings more transparent. Google also revealed that its fourth-generation TPU, announced just one month earlier, was designed using this deep reinforcement learning technique (Vincent, 2021).
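Placement methods like the one in Mirhoseini et al. (2021) score each candidate floor plan with a cheap proxy cost; in that paper the proxy combines wirelength with congestion and density. The sketch below computes half-perimeter wirelength (HPWL), a standard wirelength estimate in placement, for a toy layout. The block names, coordinates, and nets are hypothetical, and this is not Google's reward function, only one ingredient of the kind of cost such an agent tries to minimize.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(placement: Dict[str, Point], nets: List[List[str]]) -> float:
    """Half-perimeter wirelength: for each net, the width plus height of the
    bounding box around the blocks it connects, summed over all nets."""
    total = 0.0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical blocks (x, y centres on a grid) and the nets wiring them together
nets = [["cpu", "sram"], ["cpu", "io", "dma"], ["sram", "dma"]]
spread_out = {"cpu": (0, 0), "sram": (8, 8), "io": (0, 8), "dma": (8, 0)}
clustered = {"cpu": (1, 1), "sram": (2, 2), "io": (0, 1), "dma": (2, 1)}

for name, layout in [("spread out", spread_out), ("clustered", clustered)]:
    print(f"{name}: HPWL = {hpwl(layout, nets)}")
```

An RL agent, or any other optimizer, would move blocks to drive this cost down while respecting overlap and congestion constraints.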

Following the excitement over AI-powered chip design in June, July brought the release of Cadence's Cerebrus platform. The Cerebrus Intelligent Chip Explorer uses ML to automate parts of the chip design process. This additional layer of automation yields up to 10 times better productivity per engineer and a 20% improvement in power, performance, and chip area (Takahashi, 2021).

In August, SambaNova Systems, another prominent chip startup, announced its Dataflow architecture, a high-performance, high-accuracy hardware-software system designed for AI applications. The architecture is powered by the Cardinal SN10 Reconfigurable Dataflow Unit (RDU) chip, which delivers 300 TFLOPS and up to 150 terabytes per second of on-chip memory bandwidth. This combination of compute and bandwidth is particularly relevant to machine learning workloads, which must keep large volumes of data flowing to the compute units (Kennedy, 2021).
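A common way to reason about how compute and memory bandwidth interact is the roofline model: attainable throughput is the lesser of peak compute and memory bandwidth × arithmetic intensity (FLOPs performed per byte moved). Using the SN10 figures quoted above (300 TFLOPS, 150 TB/s), a kernel needs at least 2 FLOPs per byte to become compute-bound. This is a textbook simplification applied to the published peaks, not a SambaNova performance model, and the example intensities are hypothetical.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes: float, intensity: float) -> float:
    """Roofline model: min(peak compute, bandwidth x FLOPs-per-byte)."""
    return min(peak_flops, bandwidth_bytes * intensity)


PEAK = 300e12       # FLOP/s, from the text
BANDWIDTH = 150e12  # bytes/s of on-chip memory bandwidth, from the text
ridge = PEAK / BANDWIDTH  # intensity at which a kernel stops being memory-bound

print(f"Ridge point: {ridge:.1f} FLOPs/byte")
for intensity in (0.5, 2.0, 10.0):  # hypothetical kernels
    tflops = attainable_flops(PEAK, BANDWIDTH, intensity) / 1e12
    print(f"intensity {intensity:>4}: {tflops:.0f} TFLOPS attainable")
```

Kernels to the left of the ridge point are limited by bandwidth, not compute, which is why memory and interconnect are increasingly the bottleneck noted earlier.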

Canada-based startup Tenstorrent put its flagship Grayskull processor into production in late 2020, but the chip did not get its true debut until Hot Chips 33 in August 2021. The Grayskull is referred to as an "all-in-one" computer system. In preliminary experiments, it reached 368 TOPS and was observed processing up to 23,345 sentences per second with Google’s BERT-Base language model on the SQuAD 1.1 dataset, a claimed 26x performance advantage over existing solutions (Wiggers, 2020).

In October, Apple extended its M1 series, released only a year earlier and already touted as the most powerful chips Apple had ever built. Most notably, both the M1 Pro and M1 Max came equipped with the 16-core Neural Engine for accelerating on-device ML, an indication of Apple's investment in ML through its existing products ("Introducing M1 Pro and M1 Max," 2021). While Apple is not traditionally viewed as a trailblazer in AI, the company has concentrated on enhancing on-device inference, an approach known as Edge AI, which deploys AI applications on devices in the physical world rather than relying on a centralized cloud server.

Amazon Web Services (AWS) continued its hardware announcements at its annual re:Invent conference in November 2021. Among them was Graviton3, the third generation of its Graviton processors, which AWS claims is up to three times faster for ML workloads and uses up to 60% less energy than comparable hardware, giving it the "best price-performance ratio in Amazon EC2" (Nikita, 2022).

Although it was not technically released in 2021, the GroqChip from Groq, a startup founded by a former Google engineer in 2016, received a lot of attention in late 2021 when the company announced that the chip, also called the Groq Tensor Streaming Processor (TSP), had been used for COVID-19 drug discovery at Argonne National Laboratory, showing a 333x speed improvement over the legacy GPUs of the time (Westfall, 2021). The chip is highly specialized for ML workloads and little else, and is capable of 1 PetaOp/s on a single chip (Garanhel, 2022).

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Hardware Summary 2021</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>DALL-E</td>
                <td>OpenAI</td>
                <td>End-to-end text-to-image generative model at scale</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>GC200 IPU</td>
                <td>Graphcore</td>
                <td>2nd Gen Colossus, 59 billion transistors, 250 TFLOPS (double previous MK1)</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Wafer Scale Engine 2 (WSE-2)</td>
                <td>Cerebras Systems</td>
                <td>AI processor with 2.6 trillion transistors, 850,000 cores, largest chip ever built</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>CS-2 'brain-scale' Accelerator</td>
                <td>Cerebras Systems</td>
                <td>Supports models up to 120 trillion parameters</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Grace CPU</td>
                <td>Nvidia</td>
                <td>Company's first data center CPU for AI, 10x model training performance boost</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Perlmutter Supercomputer</td>
                <td>HPE, Nvidia, AMD</td>
                <td>Supercomputer with 6,159 Nvidia A100 GPUs and 1,500 AMD Milan CPUs</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>TPU v4</td>
                <td>Google</td>
                <td>TPU with 275 TFLOPS designed for Google's data centers, AI aided design</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>M1076</td>
                <td>Mythic</td>
                <td>25 TOPS, up to 80 million weighted parameters, 10X less power than conventional GPU</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>AI Powered Chip Design</td>
                <td>Google</td>
                <td>AI for floor planning stage of chip design using reinforcement learning</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Cerebrus Platform</td>
                <td>Cadence</td>
                <td>Platform for AI assisted chip design</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Cardinal SN10</td>
                <td>SambaNova Systems</td>
                <td>300 TFLOPS and up to 150 TB/s memory</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Grayskull</td>
                <td>Tenstorrent</td>
                <td>368 TOPS, 23k+ sentences per second with BERT</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>M1 Pro/Max Chip</td>
                <td>Apple</td>
                <td>16-core Neural engine optimized for Edge AI/ML acceleration, 11 TOPS</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Graviton3</td>
                <td>AWS</td>
                <td>60% energy savings, 3x faster for ML workloads</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>GroqChip/Groq TSP</td>
                <td>Groq</td>
                <td>1 PetaOp/s, 333x speed improvement over legacy GPUs</td>
            </tr>
        </tbody>
    </table>
</div>


<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.cerebras.net/wp-content/uploads/2022/03/Chip-comparison-01.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.eetasia.com/wp-content/uploads/sites/2/2021/08/Cerebras-CS-2.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://cdn.aitimes.com/news/photo/202210/147269_155272_111.png" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.servethehome.com/wp-content/uploads/2021/08/SambaNova-SN10-RDU-Cover.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.servethehome.com/wp-content/uploads/2022/05/AWS-Graviton3-Processor-Cover.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.apple.com/newsroom/images/product/mac/standard/Apple_M1-Pro-M1-Max_M1-Family_10182021_big_carousel.jpg.slideshow-xlarge_2x.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.techspot.com/images2/news/bigimage/2021/05/2021-05-20-image-16.jpg" style="width: 10%; margin-left: 1%;"/>
        <img src="https://qtxasset.com/quartz/qcloud4/media/image/fierceelectronics/1573766532/groq%20tsp%20chip.jpg?VersionId=CP3F6kc9YRrRqAuppFoRisrvVh0KERYb" style="width: 10%; margin-left: 1%;"/>
        <img src="https://www.eetasia.com/wp-content/uploads/sites/2/2021/06/Mythic-M.2-AE.jpg" style="width: 10%; margin-left: 1%; margin-right: 1%;"/>        
    </div>
</div>

## 2022

Throughout 2022, the AI hardware landscape saw an array of impressive launches from leading tech companies and startups alike. Nvidia highlighted its DGX systems, including the DGX Station, DGX-1, and DGX-2 built on its Volta GPU architecture (Gupta, 2022), as well as the DGX A100, a single server that integrates eight A100 GPUs, each rated at 312 TFLOPS at 16-bit precision, with 640 GB of total GPU memory. Nvidia also announced the H100 data center GPU, built on the new Hopper architecture, which delivers 60 TFLOPS at 64-bit precision and considerably higher throughput at lower numerical precision. All of these systems are designed for deep learning training, accelerated analytics, and inference (Fu, 2022).
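Lower numerical precision raises throughput partly because each number takes fewer bytes, so more of a model fits in memory and less data has to move. The sketch below estimates how many parameters' weights fit in the DGX A100's 640 GB of GPU memory at several precisions, treating GB as 10^9 bytes. It counts weights only; training also needs memory for gradients, optimizer state, and activations, so practical limits are considerably lower.

```python
BYTES_PER_PARAM = {"FP64": 8, "FP32": 4, "FP16/BF16": 2, "INT8/FP8": 1}


def max_params(memory_bytes: float, bytes_per_param: int) -> float:
    """Upper bound on parameters whose weights alone fit in the given memory."""
    return memory_bytes / bytes_per_param


DGX_A100_MEMORY = 640e9  # 640 GB total GPU memory across eight A100s (decimal GB)

for precision, size in BYTES_PER_PARAM.items():
    billions = max_params(DGX_A100_MEMORY, size) / 1e9
    print(f"{precision:>10}: ~{billions:,.0f}B parameters")
```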

Just one year after Google published its research and methods for incorporating AI into chip design, Nvidia announced its own approach, called PrefixRL. Nvidia applied similar reinforcement learning methods to circuits in its new Hopper architecture, producing circuits 25% smaller than those designed by humans with standard electronic design automation (EDA) tools (Roy, Raiman, & Godil, 2023). Around the same time, an internal dispute emerged at Google over the accuracy of the findings in its original 2021 paper (Dave, 2022).

Intel’s Habana Labs released the second generation of its deep learning processor for training and inference, Habana Gaudi2 (Gupta, 2022). IBM launched its first Telum processor-based system, the IBM z16, which is aimed at improving performance and efficiency for large datasets and features on-chip acceleration for AI inference (Fu, 2022).

In March and June, Apple also made significant strides in its hardware, unveiling the M1 Ultra and the M2, both next-generation successors to its breakthrough M1 chip. The M1 Ultra doubled the number of Neural Engine cores from 16 to 32 ("Apple unveils M1 Ultra," 2022). The M2's standard Neural Engine can process up to 15.8 TOPS, 40% more than the M1 ("Apple unveils M2," 2022).

In July, IBM and Tokyo Electron announced advances in 3D chip stacking, an approach to working around the limits of Moore's Law. Their work centered on silicon carrier wafers, which have been a significant obstacle in 3D chip manufacturing. The advances are intended to streamline the production process and could help alleviate the global chip shortage (Peckham, 2022).

In August, Untether AI introduced a device codenamed "Boqueria," also known as speedAI240. This 2 PFLOPS device is designed for energy efficiency and density and can scale across devices of different sizes, a feature that is useful when working with language models of varying parameter counts (Burt, 2022).

On AI Day in September, Tesla revealed its powerful Dojo chip, designed for faster training and inference in self-driving cars. Meanwhile, Cerebras Systems launched their AI supercomputer, Andromeda, aiming to accelerate academic and commercial research. Both these major advancements were reported by Gupta (2022).

AMD, though not traditionally focused on AI, released Zen 4, a new version of its Zen microarchitecture built on a 5 nm process, and introduced a new line of PC processors with ML capabilities. Additionally, SambaNova Systems announced that it was shipping the second generation of its DataScale system, the SN30. Powered by the Cardinal SN30 chip, the system is built for large models with more than 100 billion (100B) parameters and can handle both 2D and 3D images. These developments were detailed by Fu (2022).

The 2022 AI Hardware Summit in September showcased the emerging trend of Edge AI, pointing to it as a major avenue for growth and performance improvement. Edge AI has advanced remarkably thanks to the maturation of deep learning and greater computing power, and it offers advantages such as reduced latency, better privacy, and lower energy consumption. The Summit also identified a noticeable shift toward TPUs in Edge AI, with more vendors beginning to adopt them as AI accelerators, and highlighted AI chips now capable of detecting human emotions, a sign of the strides being made in computer vision tasks such as object detection (Fu, 2022).

In October, AWS announced the general availability of Amazon EC2 Trn1 instances, which are powered by AWS Trainium chips built specifically for training high-performance ML models in the cloud. AWS claims Trn1 instances offer up to 50% cost savings over comparable GPU-based EC2 instances for training large language models (Amazon Web Services, 2022). A month later, at AWS re:Invent 2022, Amazon announced EC2 Inf2 instances, powered by AWS Inferentia2, in preview. Inferentia2, a machine learning accelerator optimized for inference, offers greater compute density, enabling a lower cost per query, and Inf2 can deploy a 175B-parameter model such as GPT-3 on a single server. These two chips represent a shift from general-purpose hardware toward hardware custom-built for a specific phase of the ML workflow (training or inference) in order to lower task-specific compute costs (Liu, 2022).

In November, IBM launched its AIU chip. Rather than performing floating point operations at the higher precision typical of GPUs, the chip restricts them to 8-bit (and lower) precision, which permits a large increase in calculations per second while maintaining similar training accuracy. It also requires less power and memory than top-tier GPUs. The chip delivers approximately 205 TFLOPS for 8-bit operations and 820 TFLOPS for 4-bit operations. Its primary advantage is affordability: at around $1,000 per chip, a set of ten AIUs could rival the performance of an Nvidia H100 GPU, which costs between $20,000 and $30,000. IBM believes this platform could make model training 1,000 times faster by 2030 (Morgan, 2022).
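Trading precision for throughput can be illustrated with simple symmetric 8-bit quantization: each floating point weight is scaled into the integer range [-127, 127], stored in one byte, and scaled back when needed. This NumPy sketch uses random weights for a hypothetical layer and a single per-tensor scale. It demonstrates a generic technique and is not IBM's AIU number format, which the text describes only as 8-bit and 4-bit.

```python
from typing import Tuple

import numpy as np


def quantize_int8(weights: np.ndarray) -> Tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)  # hypothetical layer

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"Memory: {w.nbytes:,} bytes (float32) -> {q.nbytes:,} bytes (int8)")
print(f"Max absolute error: {np.max(np.abs(w - w_hat)):.2e} (scale = {scale:.2e})")
```

Storage drops fourfold while each weight's rounding error stays within half a quantization step; whether accuracy holds in practice depends on the model and on refinements such as per-channel scales.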

The November 2022 release of ChatGPT by OpenAI highlighted the essential role of hardware in AI performance. ChatGPT's capabilities demand substantial memory, storage, and interconnect technology. For instance, GPT-3 was reportedly trained on a cluster of about 10,000 Nvidia GPUs. Hardware at this scale, built from GPUs such as the Nvidia A100, a roughly $12,000 tensor core GPU (Kandel, 2023), is what allows a 175B-parameter model to be trained on a 45-terabyte dataset in a reasonable amount of time. Once the model is trained, real-time inference requires powerful servers to host the model and efficient hardware that can process requests quickly. A recent article estimated that OpenAI needs around 3,617 HGX A100 servers, equivalent to 28,936 A100 GPUs, to serve ChatGPT each day. While the upfront cost of this hardware is very large, the infrastructure allows ChatGPT to operate at an estimated cost of 0.36 cents per query (Patel & Ahmad, 2023). ChatGPT is a successful AI system that relies on highly specialized hardware at every stage of model production.
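The serving figures can be sanity-checked with simple arithmetic. Assuming eight A100 GPUs per HGX server (the ratio implied by the figures above), 3,617 servers correspond to 28,936 GPUs. The second calculation is an illustrative assumption, not a claim about OpenAI's deployment: storing 175B parameters at 2 bytes each (16-bit) takes about 350 GB, more than a single 80 GB A100 can hold, which is one reason large models are split across several GPUs.

```python
import math

GPUS_PER_HGX_SERVER = 8  # assumption consistent with 28,936 / 3,617
servers = 3_617
print(f"GPUs: {servers * GPUS_PER_HGX_SERVER:,}")

params = 175e9
bytes_per_param = 2   # assumption: 16-bit weights
a100_memory = 80e9    # assumption: 80 GB A100 variant (decimal GB)

weight_bytes = params * bytes_per_param
min_gpus = math.ceil(weight_bytes / a100_memory)
print(f"Weights: {weight_bytes / 1e9:.0f} GB -> at least {min_gpus} A100s just to hold them")
```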

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Hardware Summary 2022</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>DGX Station, DGX-1, DGX-2, DGX-A100</td>
                <td>Nvidia</td>
                <td>AI supercomputer systems built on Volta and Ampere GPU architectures for deep learning, analytics, and inference</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>H100 data center GPU</td>
                <td>Nvidia</td>
                <td>Flagship product built on the new Hopper architecture, ideal for large-scale ML and deep learning workloads, designed with PrefixRL.</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Habana Gaudi2</td>
                <td>Intel</td>
                <td>Deep learning processor for training and inference, built with 7nm technology</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>IBM z16</td>
                <td>IBM</td>
                <td>First Telum Processor-based system, for improving performance and efficiency for large datasets, features on-chip acceleration for AI inference</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>M1 Ultra, M2</td>
                <td>Apple</td>
                <td>M1 Ultra: 32-core Neural Engine; M2: Neural Engine 40% faster than the previous generation, 15.8 TOPS</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>3D chip stacking breakthrough</td>
                <td>IBM / Tokyo Electron</td>
                <td>3D chip stacking process using silicon carrier wafers</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Dojo Supercomputer</td>
                <td>Tesla</td>
                <td>Revealed for faster training and inference in self-driving cars, claims to outperform multiple GPUs</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Zen 4</td>
                <td>AMD</td>
                <td>CPU microarchitecture built on a 5 nm process, with added ML capabilities</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Andromeda Supercomputer</td>
                <td>Cerebras Systems</td>
                <td>Combines 16 Cerebras CS-2 systems for academic and commercial research, performs one quintillion operations per second</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>SN30 Datascale</td>
                <td>SambaNova Systems</td>
                <td>Second generation of the DataScale system, powered by Cardinal SN30 chip, built for large models with more than 100B parameters</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>EC2 Trn1, EC2 Inf2</td>
                <td>AWS</td>
                <td>Cloud instances powered by the latest Trainium (Trn1, training) and Inferentia2 (Inf2, inference) chips</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>AIU</td>
                <td>IBM</td>
                <td>Low-cost accelerator, about 205 TFLOPS for 8-bit operations</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>ChatGPT/GPT-3 System</td>
                <td>OpenAI</td>
                <td>Trained with 10,000 Nvidia GPUs; served with an estimated 28,936 Nvidia GPUs</td>
            </tr>
        </tbody>
    </table>
</div>



## 2023

The late 2022 release of OpenAI's ChatGPT sparked a surge in AI advancements over the following six months, accompanied by a sharp increase in demand for advanced GPU hardware. The competitive landscape has been driven by the development of large language models (LLMs), with key players including Google's Bard (powered by PaLM 2), Microsoft's Bing AI, and Meta's LLaMA. The power and efficiency of the latest AI hardware are crucial for training LLMs and broadening their practical uses. However, hardware is just one facet of product development. Choices about parameter count and dataset size critically shape a model's design, affecting everything from hardware requirements to training duration and model performance. For instance, the implications of choosing 175B parameters for OpenAI's GPT-3, a reported 1.8 trillion for GPT-4, or 65B for LLaMA are considerable.
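
The sketch below makes the effect of parameter count concrete by estimating the memory needed just to store each model's weights at several numeric precisions. It assumes 80 GB of memory per GPU, which matches the larger A100 and H100 variants. It ignores activations, gradients, and optimizer state, which make training several times more memory-hungry. The GPT-4 figure is the reported estimate quoted above, not an official number.

```python
import math

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts mentioned above; the GPT-4 figure is a reported, unconfirmed estimate.
MODELS = {"LLaMA-65B": 65e9, "GPT-3": 175e9, "GPT-4 (reported)": 1.8e12}


def weight_memory_gb(params: float, precision: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def gpus_to_hold_weights(params: float, precision: str, gpu_memory_gb: float = 80.0) -> int:
    """Minimum GPU count whose combined memory fits the weights (weights only)."""
    return math.ceil(weight_memory_gb(params, precision) / gpu_memory_gb)


for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, "fp16")
    n_gpus = gpus_to_hold_weights(params, "fp16")
    print(f"{name:>17}: {fp16:,.0f} GB in FP16, at least {n_gpus} x 80 GB GPUs")
```

Halving the bytes per parameter halves the footprint. This is one reason reduced-precision designs such as IBM's 8-bit AIU are attractive for large models.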

The 2023 developer conferences from Google, Apple, and Microsoft previewed the kinds of advancements we can expect in the second half of the year and beyond. Google I/O, Microsoft Build, and Apple WWDC each highlighted advancements in AI hardware and software. Google and Microsoft are fully embracing the AI race with new virtual assistants, product features, and open LLMs, to name a few. Apple, more withdrawn from the hype surrounding AI, did not once mention the term "artificial intelligence" at WWDC (Greenberg, 2023). Instead, it unveiled numerous machine learning software improvements across its device ecosystem. It also introduced the M2 Ultra, whose 32-core Neural Engine delivers 31.6 TOPS and is touted as 40% faster than the previous year's 32-core model ("Apple introduces M2 Ultra," 2023).

Nvidia notched a win for its Grace CPU Superchips, which found a place in the UK-based Isambard 3 supercomputer. This system features 384 Arm-based Nvidia Grace CPU Superchips, with a total core count exceeding 55,000. Its Arm Neoverse V2 cores provide a high-performance edge, and the Grace chips are projected to offer higher speed and memory bandwidth than competing CPUs (Kennedy, 2023).
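
The "exceeding 55,000" figure follows from simple multiplication. The sketch below assumes that each Grace CPU Superchip carries 144 Arm Neoverse V2 cores (two 72-core Grace CPUs). That per-chip figure is an assumption added here, not a number stated above.

```python
# Assumption: one Grace CPU Superchip = two 72-core Grace CPUs = 144 Arm Neoverse V2 cores.
CORES_PER_SUPERCHIP = 144
SUPERCHIPS = 384  # Isambard 3 configuration quoted above

total_cores = SUPERCHIPS * CORES_PER_SUPERCHIP
print(f"{total_cores:,} cores")
```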

Intel is embedding Vision Processing Units (VPUs) in all variants of its Meteor Lake chips, offloading AI processing tasks from the CPU and GPU to the VPU. The move is intended to improve power efficiency and the ability to handle complex AI models. This should benefit power-hungry applications such as the Adobe suite, Microsoft Teams, and Unreal Engine (Roach, 2023). Intel also introduced its 4th generation Xeon processors, which offer up to a 10x speed improvement for PyTorch training and inference. The updated Xeon series also includes models optimized for high-performance, low-latency networking and edge workloads (Smith, 2023).

AMD introduced an AI chip called the MI300X, which it described as "the world's most advanced accelerator for generative AI." The chip is expected to compete head-on with Nvidia's AI chips and to attract interest from major cloud providers. At the same time, AMD began high-volume shipments of "Bergamo," a general-purpose central processor chip adopted by Meta Platforms and others for their computing infrastructure (Mohan, 2023).

Meta made its foray into AI hardware by unveiling its first custom-designed chips, the Meta Training and Inference Accelerator (MTIA) and the Meta Scalable Video Processor (MSVP). These chips, optimized for deep learning and video processing, underpin Meta's plans for a next-gen data center optimized for AI, illustrating its dedication to crafting a fully integrated AI ecosystem (Khare, 2023).

Groq recently gained attention by claiming that it had developed a process to move Meta's LLaMA from Nvidia chips to its own hardware, signaling a potential threat to Nvidia's 90% GPU market share. The claim stands out because the complexity of current AI hardware makes adapting a model architecture to run quickly on a new platform a tedious task (Lee & Nellis, 2023).

AI-assisted chip design is again gaining popularity among manufacturers. There have been no significant announcements of AI-designed chips this year. Even so, sentiment is shifting toward the approach going mainstream, as reflected in the increase in customer contracts reported by Synopsys and Cadence (Ward-Foxton, 2023).

The 2023 advancements so far are significant, but major AI hardware conferences such as the AI Hardware Summit, AWS re:Invent, and Hot Chips have yet to take place. At the time of writing, we do not have the full picture of the field's developments for 2023. The account of AI hardware in 2023 given here is therefore necessarily incomplete.

<div style="display:flex;justify-content:center;align-items:center;flex-direction:column;">
    <h3 style="text-align:center;">Hardware Summary 2023</h3>
    <table style="border:1px solid black; border-collapse: collapse; text-align:left;">
        <thead style="background-color: #76B900; color: #333;">
            <tr>
                <th style="font-style: italic;">Hardware</th>
                <th style="font-style: italic;">Company</th>
                <th style="font-style: italic;">Key Features</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Interactive LLMs and Assistants</td>
                <td>Google (Bard), Microsoft (Bing AI)</td>
                <td>Large-scale models designed for interactive and responsive tasks, leveraging the power and efficiency of the latest AI acceleration hardware</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>M2 Ultra</td>
                <td>Apple</td>
                <td>32-core neural engine, 31.6 TOPS</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Nvidia Grace CPU Superchips in Isambard 3</td>
                <td>Nvidia</td>
                <td>384 Arm-based Nvidia Grace CPU Superchips, >55,000 cores, FP64 performance, <270 kW power consumption, Arm Neoverse V2 cores</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>Meteor Lake chips with Vision Processing Units (VPUs)</td>
                <td>Intel</td>
                <td>Embedded VPUs in all chips for increased power efficiency and the ability to handle complex AI models</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>4th Gen Xeon</td>
                <td>Intel</td>
                <td>Up to 10x speed improvement for PyTorch training and inference; models optimized for networking and edge workloads</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>MI300X AI Chip and Bergamo Processor</td>
                <td>AMD</td>
                <td>Introduced the MI300X, billed by AMD as the world's most advanced accelerator for generative AI, and began high-volume shipping of the Bergamo central processor chip</td>
            </tr>
            <tr style="background-color: #E0E0E0; color: #333;">
                <td>Meta Training and Inference Accelerator (MTIA) and Meta Scalable Video Processor (MSVP)</td>
                <td>Meta</td>
                <td>Unveiled custom-designed AI chips optimized for deep learning and video processing and discussed plans for a next-gen data center optimized for AI</td>
            </tr>
            <tr style="background-color: #F8F8F8; color: #333;">
                <td>GPU Model Migration</td>
                <td>Groq</td>
                <td>Claimed a process for quickly moving Meta's LLaMA from Nvidia GPUs to Groq hardware</td>
            </tr>
        </tbody>
    </table>
</div>


## MLPerf: A Closer Look

The landscape of deep learning processors had grown rapidly since 2012, creating a need for a clear standard for measuring hardware performance. In response, researchers from Baidu, Google, Harvard, Stanford, and Berkeley developed Machine Learning Performance (MLPerf) in 2018. MLPerf is a set of industry-standard benchmarks that evaluates the speed of ML software and hardware. It assesses training and inference performance, scalability, and power efficiency, tailoring each assessment to the requirements of a specific model or task (Bunting, 2023).
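
MLPerf Training's headline metric is time to train: the wall-clock time a system needs to bring a model to a fixed quality target. The sketch below illustrates that idea in plain Python. `time_to_train`, `fake_epoch`, and the accuracy values are hypothetical stand-ins, not MLPerf's actual harness, which also fixes the datasets, reference models, and detailed timing rules.

```python
import time
from typing import Callable


def time_to_train(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    target_quality: float,
    max_epochs: int = 100,
) -> tuple[float, int]:
    """Run epochs until evaluate() reaches target_quality.

    Returns (wall-clock seconds, epochs run). The clock measures time to a
    fixed quality target rather than time for a fixed number of steps.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= target_quality:
            return time.perf_counter() - start, epoch
    raise RuntimeError(f"target {target_quality} not reached in {max_epochs} epochs")


# Hypothetical stand-in for a real model: "accuracy" rises by 0.1 per epoch.
state = {"accuracy": 0.5}


def fake_epoch() -> None:
    state["accuracy"] += 0.1


seconds, epochs = time_to_train(fake_epoch, lambda: state["accuracy"], target_quality=0.75)
print(f"Reached target after {epochs} epochs in {seconds:.6f} s")
```

Measuring time to a quality target rewards the whole stack, not just raw chip speed. Faster hardware, better software, and training recipes that converge in fewer epochs all shorten the result.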

Although MLPerf launched in 2018, it was not until the end of 2020 that it was scaled and standardized under the MLCommons consortium. The benchmarks made their first official debut as MLPerf v1.0 in 2021. MLPerf consists of eight benchmark tests: image recognition, medical-imaging segmentation, two versions of object detection, speech recognition, natural-language processing, recommendation, and reinforcement learning in the form of gameplay. Often called "the Olympics of machine learning," MLPerf features hardware and software from 21 different companies competing on any or all of the tests (Moore, 2022). This incentivizes hardware companies like Nvidia to put their best foot forward. The results of the June 2022 MLPerf v2.0 benchmark tests were compared to Moore's law to show the unexpected rate of progress in training times alone. They showed a 9 to 10x improvement in training performance compared with 2018 (Moore, 2022).
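
The comparison with Moore's law reduces to compound-growth arithmetic. The sketch below computes the gain Moore's law would predict and the doubling time implied by the observed 9 to 10x improvement. It assumes a strict two-year doubling period for Moore's law and treats 2018 to 2022 as exactly four years.

```python
import math


def moores_law_factor(years: float, doubling_years: float = 2.0) -> float:
    """Improvement expected if performance doubled every `doubling_years`."""
    return 2 ** (years / doubling_years)


def implied_doubling_time(gain: float, years: float) -> float:
    """Doubling time (in years) implied by an observed `gain` over `years`."""
    return years * math.log(2) / math.log(gain)


years = 2022 - 2018
print(f"Moore's law over {years} years: {moores_law_factor(years):.1f}x")
for gain in (9, 10):
    months = implied_doubling_time(gain, years) * 12
    print(f"{gain}x observed -> doubling every {months:.0f} months")
```

Over four years, Moore's law predicts only a 4x gain, while a 9 to 10x improvement implies performance doubling roughly every 14 to 15 months. Applying `implied_doubling_time(23, 3.5)` to the 23x gain in 3.5 years that Nvidia reports for its own platform (Salvator, 2022) gives a doubling roughly every nine months.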

<div align="center">
      <img src="https://spectrum.ieee.org/media-library/a-chart-shows-six-lines-of-various-colors-sweeping-up-and-to-the-right.jpg?id=30049159&width=1580&quality=80" width="450">
</div>

Nvidia's AI hardware showed a considerable performance increase in the 2023 MLPerf tests (MLPerf v3.0) over its 2022 results. Nvidia is also one of the few companies that have consistently submitted MLPerf results for all eight benchmark tests, so its results trace the evolution of AI compute across the ML landscape.

In 2022, Nvidia's AI platform, powered by the A100 Tensor Core GPU, demonstrated significant versatility and efficiency across MLPerf. It achieved the fastest time to train on four out of eight tests and was found to be the fastest on a per-chip basis on six out of the eight tests. This performance was attributed to full-stack innovations spanning GPUs, software, and at-scale improvements, delivering 23x more performance in 3.5 years since the first MLPerf submission (Salvator, 2022). A visual representation of these results can be seen in the left image below.

By 2023, the results were even more impressive. Nvidia's newly introduced H100 Tensor Core GPUs, designed in part by AI and running on DGX H100 systems, achieved the highest performance in every AI inference test. They also gained up to 54% in performance since their debut in September 2022 (Salvator, 2023). The 2023 results can be observed in the right image below.

<div align="center">
    <div style="display: flex; justify-content: center; width: 90%">
        <img src="https://www.hpcwire.com/wp-content/uploads/2021/09/Nvidia_Mlperf_Datacenter.png" style="width: 50%; margin-right: 1%;"/>
        <img src="https://blogs.nvidia.com/wp-content/uploads/2023/04/H100-GPU-inference-performance-MLPerf-1536x857.jpg" style="width: 50%; margin-left: 1%; margin-right: 1%;"/>
    </div>
</div>



In the healthcare domain specifically, the H100 GPUs have improved performance by 31% since launch on the 3D-UNet benchmark, which is used for medical imaging. Additionally, H100 GPUs with the Transformer Engine excelled on the BERT benchmark, which is built on a transformer-based language model. Transformer performance of this kind has contributed significantly to the rise of generative AI (Salvator, 2023).

# Future Acceleration

This rapid progress toward more capable AI has stirred up a wide range of perspectives within the AI community. On one hand, figures like Geoffrey Hinton, Elon Musk, and former Google CEO Eric Schmidt urge caution, highlighting long-term ethical and existential risks (Gairola, 2023). On the other hand, optimists such as Andrew Ng see the potential of superior AI to drive unprecedented advancements and solve global challenges (Cherney, 2023). Just as these modern figures grapple with uncertainty about AI development, the pioneers of this field also reflected deeply on the implications of machine advancement.

Reflecting on the pace of technological progress, Samuel Butler (1863) expressed this observation in his essay, "There is no security against the ultimate development of mechanical consciousness, in the fact of machines possessing little consciousness now... Reflect upon the extraordinary advance which machines have made during the last few hundred years, and note how slowly the animal and vegetable kingdoms are advancing." Butler's words highlight the staggering speed of machine evolution, especially when compared to the slow, incremental changes seen in biological organisms.

Decades later, Alan Turing echoed these sentiments. Turing (1951) reflected on the unpredictable nature of the technological advancements of his time, asserting, "We can see plenty there that needs to be done, and although we can make some fairly reliable, though quite rough, guesses, based on past experience, as to the order in which these developments will come, there remains the possibility that any one particular piece of research may lead to consequences of a revolutionary character."

Whether machines will surpass human intelligence is an age-old question. The difference between today's hype and that of the 1970s is that we now have the compute capability to keep up with it. This suggests that we are not in a bubble that would lead to another "AI Winter". A recent report from ARK identifies neural networks as the technology with the greatest potential to act as a catalyst for a host of other technologies (ARK Investment Management LLC, 2023).

<div align="center">
      <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F59209c8272bb9d69862c7ddecb4a70bc%2Fnn_catalyst.png?generation=1687929381525083&alt=media" width="450">
</div>

Along with this potential for continued innovation, today's outlook on AI developments through 2030 is dramatically different from that of earlier eras. AI training costs are currently dropping around 70% per year and are projected to continue at that pace. Importantly, this projection is measured relative to GPT-3-level performance. It reflects improving hardware costs, not growing model complexity (ARK Investment Management LLC, 2023).
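
A constant 70% annual decline compounds quickly. The sketch below assumes the rate holds exactly every year from a 2023 baseline. Like the ARK projection, it refers to the cost of reaching GPT-3-level performance, not the cost of training ever-larger models.

```python
def relative_cost(years: float, annual_decline: float = 0.70) -> float:
    """Cost relative to the baseline if it falls by `annual_decline` (a fraction) each year."""
    return (1 - annual_decline) ** years


for year in (2024, 2026, 2030):
    rc = relative_cost(year - 2023)
    print(f"{year}: {rc:.4%} of 2023 cost (about {1 / rc:,.0f}x cheaper)")
```

At that rate, GPT-3-level training in 2030 would cost about 0.02% of its 2023 level, or over 4,500 times less.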

<div align="center">
    <div>
    <div style="width: 70%;">
      <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2F735835fa86dba0aa643ee5b8151a91dd%2Fcosttotrain.png?generation=1687929392429753&alt=media">
      </div>
    </div>
</div>


Despite the outlook for lower costs of training existing models like GPT-3, model complexity has already evolved far beyond the model that initially powered ChatGPT. As stated previously, GPT-3 comprises 175B parameters and was trained on 45 terabytes of data. Its successor, GPT-4, released in March 2023, reportedly contains 1.8 trillion parameters and was trained on 1 petabyte of data (E2Analyst, 2023). Given the higher cost of data and much larger parameter counts, improvements in the cost of training GPT-3 now matter far less. A recent analysis by Epoch AI of the training costs of recent AI models found that the cost of training is increasing exponentially, as shown below. Its data suggest that the cost of training a model will rise to $500 million USD by 2030. The primary factor driving this increase in expense is the need for more data. As model architectures become more complex, they need more data to train on, as evidenced by GPT-4 (Cottier, 2023).
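
A rough calculation makes concrete the tension between falling hardware costs and growing models. The sketch below rests on several assumptions. Training compute is taken to scale with parameter count times training data, as in the common approximation C ≈ 6·N·D, where D is measured in tokens. Raw dataset size stands in as a crude proxy for tokens. The 70% annual cost decline is the figure quoted earlier. All GPT-4 figures are the reported estimates above, so the output is illustrative only.

```python
import math

# Figures quoted above; the GPT-4 numbers are reported estimates, not official disclosures.
GPT3 = {"params": 175e9, "data_tb": 45}
GPT4 = {"params": 1.8e12, "data_tb": 1_000}  # 1 PB = 1,000 TB

# Assumption: compute ~ parameters x data, so the ratio of products approximates
# the ratio of training compute (raw data size used as a crude proxy for tokens).
param_ratio = GPT4["params"] / GPT3["params"]
data_ratio = GPT4["data_tb"] / GPT3["data_tb"]
compute_ratio = param_ratio * data_ratio

# Years of a steady 70%-per-year cost decline needed to offset that growth at a fixed budget.
ANNUAL_DECLINE = 0.70
years_to_offset = math.log(compute_ratio) / math.log(1 / (1 - ANNUAL_DECLINE))

print(f"params x{param_ratio:.1f}, data x{data_ratio:.1f}, compute ~x{compute_ratio:.0f}")
print(f"~{years_to_offset:.1f} years of 70%/yr cost declines to absorb one such jump")
```

Under these assumptions, the jump from GPT-3 to GPT-4 needs roughly 230 times more compute. Absorbing it at a fixed budget would take about four and a half years of 70% annual cost declines, which is why falling hardware costs alone do not make frontier training cheaper.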

<div align="center">
    <div style="display: flex; justify-content: center; width: 40%;">
        <img src="https://mpost.io/wp-content/uploads/image-82-40.jpg" style="margin-right: 10px;">
    </div>
</div>

Hardware shortages persist and specialized GPUs remain costly (Vanian, 2023). Even so, a recent estimate projects that the cost of AI hardware and software, measured in relative compute units (RCUs), will continue to decline at a consistent rate. Together, these continued innovations will eventually let applications like ChatGPT run inference cheaply enough to be deployed at the scale of Google Search (ARK Investment Management LLC, 2023).

<div align="center">
    <div style="display: flex; justify-content: center;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2Fcd48cc2bff57085850ae9c8e3fccc874%2Fcostperinf.png?generation=1687972455714214&alt=media" width="400" height="300" style="margin-right: 10px;">
        <img src="https://www.googleapis.com/download/storage/v1/b/kaggle-forum-message-attachments/o/inbox%2F1506047%2Fc0765a4abc3f6c8ff40c97c2473e2639%2Faihardwarecost.png?generation=1687929416513761&alt=media" width="400" height="300" style="margin-left: 10px;">
    </div>
</div>

# Conclusion

The last two years have clearly been historic in the development of AI specific hardware. Addressing current computational demands extends beyond simply creating more powerful hardware or doubling the number of transistors on a chip. Lower costs and greater access are achieved by optimizing hardware, software, and algorithms for the full range of system requirements. Task-specific benchmarks, such as MLPerf, will play an increasingly influential role in measuring this progress across the growing lineup of products. Recent developments in AI hardware demonstrate an emphasis on specialization, such as dedicating specific chip architectures to model training and inference tasks to lower the computational costs. However, there exists a tension between cost-efficiency and the democratization of AI systems. While hardware costs are decreasing, the growing demand for data threatens to offset these gains. Despite these challenges, the pace of hardware advancements shows no sign of slowing down, indicating an exciting future for AI innovation.

# Sources

Ali, T. (2018, May 31). IT Hardware Benchmarks for Machine Learning and Artificial Intelligence. Medium. https://medium.com/@tauheedul/it-hardware-benchmarks-for-machine-learning-and-artificial-intelligence-6183ceed39b8

Amazon Web Services. (2022, October 13). Introducing Amazon EC2 TRN1 Instances for High-Performance, Cost-Effective Deep Learning Training. https://aws.amazon.com/about-aws/whats-new/2022/10/ec2-trn1-instances-high-performance-cost-effective-deep-learning-training/

Amodei, D., & Hernandez, D. (2018, May 16). AI and Compute. OpenAI. https://openai.com/research/ai-and-compute/

Appenzeller, G., Bornstein, M., & Casado, M. (2023, April 27). Navigating the high cost of AI compute. Andreessen Horowitz. https://a16z.com/2023/04/27/navigating-the-high-cost-of-ai-compute/

Apple. (2022, June). Apple unveils M2, taking the breakthrough performance and capabilities of M1 even further. Apple Newsroom. https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthrough-performance-and-capabilities/

Apple Inc. (2021, October 18). Introducing M1 Pro and M1 Max: the most powerful chips Apple has ever built. Apple Newsroom. https://www.apple.com/newsroom/2021/10/introducing-m1-pro-and-m1-max-the-most-powerful-chips-apple-has-ever-built/

Apple Inc. (2022, March). Apple unveils M1 Ultra, the world's most powerful chip for a personal computer. Apple Newsroom. https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultra-the-worlds-most-powerful-chip-for-a-personal-computer/

Apple Inc. (2023, June). Apple introduces M2 Ultra. Apple Newsroom. https://www.apple.com/newsroom/2023/06/apple-introduces-m2-ultra/

ARK Investment Management LLC. (2023, January 31). Big Ideas 2023. https://research.ark-invest.com/hubfs/1_Download_Files_ARK-Invest/Big_Ideas/ARK%20Invest_013123_Presentation_Big%20Ideas%202023_Final.pdf

Barry, D. J. (2023, April 17). Beyond Moore's Law: New solutions for beating the data growth curve. Microcontroller Tips. https://www.microcontrollertips.com/beyond-moores-law-new-solutions-beating-data-growth-curve/

Bunting, J. (2023, March 14). AI Benchmarks Are Broken. SemiEngineering. https://semiengineering.com/ai-benchmarks-are-broken/

Burt, J. (2022, August 23). Untether AI pulls the curtain rope for its next-gen inferencing system. The Next Platform. https://www.nextplatform.com/2022/08/23/untether-ai-pulls-the-curtain-rope-for-its-next-gen-inferencing-system/

Business Wire. (2021, August 24). Cerebras Systems Announces World’s First Brain-Scale Artificial Intelligence Solution. https://www.businesswire.com/news/home/20210824005644/en/Cerebras-Systems-Announces-World%E2%80%99s-First-Brain-Scale-Artificial-Intelligence-Solution

Butler, S. (1863). Darwin Among the Machines. In The Notebooks of Samuel Butler.

Cherney, M. A. (2023, June 6). Andrew Ng says AI poses no extinction risk. Silicon Valley Business Journal. https://www.bizjournals.com/sanjose/news/2023/06/06/andrew-ng-says-ai-poses-no-extinction-risk.html

Cottier, B. (2023). Trends in the dollar training cost of machine learning systems. Epoch AI. https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems

Cuenca, P. (2023, January 25). The Infrastructure Behind Serving DALL-E Mini. Weights & Biases. https://wandb.ai/dalle-mini/dalle-mini/reports/The-Infrastructure-Behind-Serving-DALL-E-Mini--VmlldzoyMTI4ODAy

Dave, P. (2022, May 3). Google faces internal battle over research on AI to speed up chip design. Reuters. https://www.reuters.com/technology/google-faces-internal-battle-over-research-ai-speed-chip-design-2022-05-03/

Dilmengani, C. (2023, June 17). AI chip makers: Top 10 companies in 2023. https://research.aimultiple.com/ai-chip-makers/

Doherty, S. (2021, August 25). Designing the Colossus MK2 IPU: Simon Knowles at Hot Chips 2021. Graphcore. https://www.graphcore.ai/posts/designing-the-colossus-mk2-ipu-simon-knowles-at-hot-chips-2021

E2Analyst. (2023). GPT-4: Everything you want to know about OpenAI’s new AI model. Medium. https://medium.com/predict/gpt-4-everything-you-want-to-know-about-openais-new-ai-model-a5977b42e495

Edwards, B. (2023, May 24). The lightning onset of AI—what suddenly changed? An Ars Frontiers 2023 recap. Ars Technica. https://arstechnica.com/information-technology/2023/05/the-lightning-onset-of-ai-what-suddenly-changed-an-ars-frontiers-2023-recap/

Freund, K. (2021, August 9). Using AI to help design chips has become a thing. Forbes. https://www.forbes.com/sites/karlfreund/2021/08/09/using-ai-to-help-design-chips-has-become-a-thing/?sh=29e752cb5d9d

Fu, J. (2022, September 29). AI frontiers in 2022. Better Programming. https://betterprogramming.pub/ai-frontiers-in-2022-5bd072fd13c

Gairola, A. (2023, May 25). Former Google CEO echoes Musk and Hinton's dire warnings on AI becoming existential risk. Benzinga. https://www.benzinga.com/news/23/05/32566930/former-google-ceo-echoes-musk-and-hintons-dire-warnings-on-ai-becoming-existential-risk

Garanhel, M. (2022, October 14). Top 20 AI Chips of Your Choice in 2022. AI Accelerator Institute. https://www.aiacceleratorinstitute.com/top-20-chips-choice-2022/

Goldie, A., & Mirhoseini, A. (2020, April 3). Chip Design with Deep Reinforcement Learning. Google AI Blog. https://ai.googleblog.com/2020/04/chip-design-with-deep-reinforcement.html

Greenberg, M. (2023, June 6). The best AI features Apple announced at WWDC 2023. VentureBeat. https://venturebeat.com/ai/the-best-ai-features-apple-announced-at-wwdc-2023/

Gupta, A. (2022, March 22). Nvidia’s Grace CPU: The ins and outs of an AI-focused processor. Ars Technica. https://arstechnica.com/gadgets/2022/03/nvidias-grace-cpu-the-ins-and-outs-of-an-ai-focused-processor/

Hamblen, M. (2023, February 16). ChatGPT runs 10K Nvidia training GPUs with potential for thousands more. Fierce Electronics. https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more

Hennessy, J. L., & Patterson, D. A. (2018). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.

Higginbotham, S. (2022, February 14). Google is using AI to design chips for its AI hardware. Protocol. https://www.protocol.com/google-is-using-ai-to-design-chips

Hoffman, K. (2020, February 24). We're not prepared for the end of Moore's law. MIT Technology Review. https://www.technologyreview.com/2020/02/24/905789/were-not-prepared-for-the-end-of-moores-law/

HPC Wire. (2021, May 27). NERSC debuts Perlmutter, world's fastest AI supercomputer. https://www.hpcwire.com/2021/05/27/nersc-debuts-perlmutter-worlds-fastest-ai-supercomputer/

Hruska, J. (2021, June 8). Intel’s 2021-2022 roadmap: Alder Lake, Meteor Lake, and a big bet on EUV. ExtremeTech. https://www.extremetech.com/computing/323126-intels-2021-2022-roadmap-alder-lake-meteor-lake-and-a-big-bet-on-euv

Intelligent Computing Lab, Tsinghua University. (2022). Scalable Architecture for Neural Networks. http://nicsefc.ee.tsinghua.edu.cn/projects/neuralscale/

Jotrin Electronics. (2022, January 4). A brief history of the development of AI chips. https://www.jotrin.com/technology/details/a-brief-history-of-the-development-of-ai-chips

Jouppi, N., & Patterson, D. (2022, June 29). TPU v4 enables performance, energy, and CO2e efficiency gains. Google Cloud Blog. https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains

Kandel, A. (2023, April 7). Secrets of ChatGPT's AI Training: A Look at the High-Tech Hardware Behind It. LinkedIn. https://www.linkedin.com/pulse/secrets-chatgpts-ai-training-look-high-tech-hardware-behind-kandel/

Kaur, D. (2021, November 3). Here's what the 2021 global chip shortage is all about. Tech Wire Asia. https://techwireasia.com/2021/11/heres-what-the-2021-global-chip-shortage-is-all-about/

Kennedy, P. (2021, August 24). SambaNova SN10 RDU at Hot Chips 33. ServeTheHome. https://www.servethehome.com/sambanova-sn10-rdu-at-hot-chips-33/

Kennedy, P. (2023, June 17). Nvidia Notches a Modest Grace Superchip Win at ISC 2023. ServeTheHome. https://www.servethehome.com/nvidia-notches-a-modest-grace-superchip-win-at-isc-2023-arm-hpe/

Khare, Y. (2023, June 16). Meta Reveals AI Chips to Revolutionize Computing. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2023/05/meta-reveals-ai-chips-to-revolutionize-computing/

Lee, J., & Nellis, S. (2023, March 9). Groq adapts Meta's chatbot to its own chips in race against Nvidia. Reuters. https://www.reuters.com/technology/groq-adapts-metas-chatbot-its-own-chips-race-against-nvidia-2023-03-09/

Liu, M. (2022). Get the Latest from re:Invent 2022. AWS re:Post. https://repost.aws/articles/ARWg0vtgR7RriapTABCkBnng/get-the-latest-from-re-invent-2022

McKenzie, J. (2023, June 20). Moore’s law: further progress will push hard on the boundaries of physics and economics. Physics World. https://physicsworld.com/a/moores-law-further-progress-will-push-hard-on-the-boundaries-of-physics-and-economics/

Mitchell, R. (2021, June 19). Mythic announces latest AI chip M1076. Electropages. https://www.electropages.com/blog/2021/06/mythic-announces-latest-ai-chip-m1076

Mirhoseini, A., Goldie, A., Yazgan, M. et al. (2021). Chip placement with deep reinforcement learning. Nature 595, 230–236. https://www.nature.com/articles/s41586-021-03544-w

MLCommons. (2023, March 8). History. MLCommons. https://mlcommons.org/en/history/

Mohan, R. (2023, June 17). AI chip race heats up as AMD introduces rival to Nvidia technology. Tech Xplore. https://techxplore.com/news/2023-06-ai-chip-amd-rival-nvidia.html

Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114-117.

Moore, S. (2022). MLPerf Rankings 2022. IEEE Spectrum. https://spectrum.ieee.org/mlperf-rankings-2022

Morgan, T. P. (2022, October 20). IBM’s AI Accelerator: This Had Better Not Be Just A Science Project. The Next Platform. https://www.nextplatform.com/2022/10/20/ibms-ai-accelerator-this-had-better-not-be-just-a-science-project/

Naik, A. R. (2021, August 4). Explained: Nvidia's record-setting performance on MLPerf v1.0 training benchmarks. Analytics India Magazine. https://analyticsindiamag.com/explained-nvidias-record-setting-performance-on-mlperf-v1-0-training-benchmarks/

Narasimhan, S. (2022, June 29). Nvidia partners sweep all categories in MLPerf AI benchmarks. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/06/29/nvidia-partners-ai-mlperf/

Narendran, S. (2023, May 11). Every major AI feature announced at Google I/O 2023. ZDNet. https://www.zdnet.com/article/every-major-ai-feature-announced-at-google-io-2023/

Naval Group. (2023, March 2). AI-powered chip design: A revolution in the semiconductor industry. Naval Group Press Room. https://www.naval-group.com/en/news/ai-powered-chip-design-a-revolution-in-the-semiconductor-industry/

Nikita, S. (2022, May 27). AWS announces general availability of Graviton 3 processors. MGT Commerce. https://www.mgt-commerce.com/blog/aws-announces-general-availability-of-graviton-3-processors/

Nosta, J. (2023, March 10). Stacked exponential growth: AI is outpacing Moore's law and evolutionary biology. Medium. https://johnnosta.medium.com/stacked-exponential-growth-ai-is-outpacing-moores-law-and-evolutionary-biology-12882c38b68d

Nvidia. (2021, April 12). Nvidia Announces CPU for Giant AI and High Performance Computing Workloads. Nvidia Newsroom. https://nvidianews.nvidia.com/news/nvidia-announces-cpu-for-giant-ai-and-high-performance-computing-workloads

Nvidia. (2023, May 2). Introducing Nvidia Grace: A CPU specifically designed for giant-scale AI and HPC. Nvidia Newsroom. https://nvidianews.nvidia.com/news/introducing-nvidia-grace-a-cpu-specifically-designed-for-giant-scale-ai-and-hpc

Patel, D., & Ahmad, A. (2023, February 9). The inference cost of search disruption. SemiAnalysis. https://www.semianalysis.com/p/the-inference-cost-of-search-disruption

Peckham, O. (2022, July 7). IBM, Tokyo Electron Announce 3D Chip Stacking Breakthrough. HPCwire. https://www.hpcwire.com/2022/07/07/ibm-tokyo-electron-announce-3d-chip-stacking-breakthrough/

PR Newswire. (2018). Synopsys Unveils Fusion Compiler Enabling 20 Percent Higher Quality-of-Results and 2x Faster Time-to-Results. https://www.prnewswire.com/news-releases/synopsys-unveils-fusion-compiler-enabling-20-percent-higher-quality-of-results-and-2x-faster-time-to-results-300744510.html

Precedence Research. (2022). Artificial Intelligence (AI) in Hardware Market. https://www.precedenceresearch.com/artificial-intelligence-in-hardware-market

Roach, J. (2023, June 17). Intel thinks your next CPU needs an AI processor — here’s why. Digital Trends. https://www.digitaltrends.com/computing/intel-meteor-lake-vpu-computex-2023/

Roy, R., Raiman, J., & Godil, S. (2023, April 5). Designing arithmetic circuits with deep reinforcement learning. Nvidia Developer Blog. https://developer.nvidia.com/blog/designing-arithmetic-circuits-with-deep-reinforcement-learning/

Salvator, D. (2022). Nvidia Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/04/06/mlperf-edge-ai-inference-orin/

Salvator, D. (2023a). Inference MLPerf AI. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2023/04/05/inference-mlperf-ai/

Sharma, S. (2021, December 20). 2021 Was a Breakthrough Year for AI. VentureBeat. https://venturebeat.com/ai/2021-was-a-breakthrough-year-for-ai/

Smith, L. (2023, January 10). 4th Gen Intel Xeon Scalable Processors Launched. StorageReview. https://www.storagereview.com/news/4th-gen-intel-xeon-scalable-processors-launched

Sweeney, T. [@TimSweeneyEpic]. (2023, April 13). Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential. Twitter. https://twitter.com/TimSweeneyEpic/status/1646645582583267328

Synopsys. (2023). DSO.ai. Retrieved June 2023, from https://www.synopsys.com/ai/chip-design/dso-ai.html

Takahashi, D. (2021, July 22). AI’s got talent: Meet the new rising star in media and entertainment. VentureBeat. https://venturebeat.com/ais-got-talent-meet-the-new-rising-star-in-media-and-entertainment/

Tardi, C. (2023, June 17). Moore's Law. Investopedia. https://www.investopedia.com/terms/m/mooreslaw.asp

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460. https://doi.org/10.1093/mind/LIX.236.433

Vanian, J. (2023, March 13). ChatGPT and generative AI are booming, but at a very expensive price. CNBC (updated April 17, 2023). https://www.cnbc.com/2023/03/13/chatgpt-and-generative-ai-are-booming-but-at-a-very-expensive-price.html

Vellante, D., & Floyer, D. (2021, April 10). New era of innovation: Moore's law is not dead and AI is ready to explode. SiliconANGLE. https://siliconangle.com/2021/04/10/new-era-innovation-moores-law-not-dead-ai-ready-explode/

Vincent, J. (2021, June 10). Google is using machine learning to design its next generation of machine learning chips. The Verge. https://www.theverge.com/2021/6/10/22527476/google-machine-learning-chip-design-tpu-floorplanning

Ward-Foxton, S. (2023, February 10). AI-Powered Chip Design Goes Mainstream. EE Times. https://www.eetimes.com/ai-powered-chip-design-goes-mainstream/

Westfall, R. (2021, November 18). Groq Turbocharges COVID Drug Discovery at Argonne National Laboratory. Futurum Research. https://futurumresearch.com/research-notes/groq-turbocharges-covid-drug-discovery-at-argonne-national-laboratory/

Wiggers, K. (2020, April 7). Tenstorrent reveals Grayskull, an all-in-one system that accelerates AI model training. VentureBeat. https://venturebeat.com/ai/tenstorrent-reveals-grayskull-an-all-in-one-system-that-accelerates-ai-model-training/

ChatGPT (OpenAI) and Bard (Google) were used as tools while writing this essay. Their contributions were the following:
- Narrative: In the early stages of writing, the tools generated ideas for the essay's structure; in the later stages, they suggested what could be removed.
- Research: ChatGPT's Bing integration was used throughout the information-gathering process to find additional sources beyond those located through traditional search methods.
- Validation: To confirm that my summaries of AI hardware advancements were accurate, they were checked against the original sources so that each summary faithfully reflected the advancement or technology in question.

Ultimately, all output from these generative tools was rewritten.

In [7]:
submission_df = pd.read_csv("/kaggle/input/2023-kaggle-ai-report/sample_submission.csv")
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'copy/paste the exact category that you are su...
1,essay_url,'http://www.kaggle.com/your_username/your_note...
2,feedback1_url,'http://www.kaggle.com/.../your_1st_peer_feedb...
3,feedback2_url,'http://www.kaggle.com/.../your_2nd_peer_feedb...
4,feedback3_url,'http://www.kaggle.com/.../your_3rd_peer_feedb...


In [8]:
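# Values in row order: essay_category, essay_url, feedback1_url .. feedback3_url
val = ["'Other'",
       "http://www.kaggle.com/your_username/your_public_notebook",
       "http://www.kaggle.com/.../your_1st_peer_feedback",
       "http://www.kaggle.com/.../your_2nd_peer_feedback",
       "http://www.kaggle.com/.../your_3rd_peer_feedback"]
# Overwrite the placeholder column; bracket indexing is the idiomatic way to assign a column
submission_df["value"] = val
# index=False keeps pandas from writing its own index as an extra leading column
submission_df.to_csv("submission.csv", index=False)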

In [9]:
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'Other'
1,essay_url,http://www.kaggle.com/your_username/your_publi...
2,feedback1_url,http://www.kaggle.com/.../your_1st_peer_feedback
3,feedback2_url,http://www.kaggle.com/.../your_2nd_peer_feedback
4,feedback3_url,http://www.kaggle.com/.../your_3rd_peer_feedback
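Before submitting, it may be worth confirming that the written CSV kept the sample file's layout. The following is a minimal sketch, not part of the competition tooling: it assumes the column layout shown in the `head()` output above (`Unnamed: 0`, `type`, `value`) and the five `type` rows listed there. `check_submission`, `EXPECTED_COLUMNS`, and `EXPECTED_TYPES` are hypothetical names, and the demo writes a stand-in file with made-up values to a temporary directory so it runs outside Kaggle.

```python
import os
import tempfile

import pandas as pd

# Assumed layout, read off the sample_submission.csv head() output above.
EXPECTED_COLUMNS = ["Unnamed: 0", "type", "value"]
EXPECTED_TYPES = ["essay_category", "essay_url", "feedback1_url",
                  "feedback2_url", "feedback3_url"]


def check_submission(path: str) -> pd.DataFrame:
    """Hypothetical helper: reload a submission CSV and check its layout."""
    df = pd.read_csv(path)
    # Same columns, in the same order, as the sample file.
    assert list(df.columns) == EXPECTED_COLUMNS, f"columns: {list(df.columns)}"
    # One row per required field, in the original order.
    assert df["type"].tolist() == EXPECTED_TYPES, f"types: {df['type'].tolist()}"
    # read_csv turns empty cells into NaN, so this also catches blank values.
    assert df["value"].notna().all(), "every field needs a value"
    return df


# Demo on a stand-in file with made-up values, written to a temporary
# directory so the sketch runs outside the Kaggle environment.
stand_in = pd.DataFrame({
    "Unnamed: 0": range(len(EXPECTED_TYPES)),
    "type": EXPECTED_TYPES,
    "value": ["'Other'"] + [f"http://example.com/{t}" for t in EXPECTED_TYPES[1:]],
})
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "submission.csv")
    stand_in.to_csv(out_path, index=False)
    print(check_submission(out_path))
```

In the notebook itself, the same check could be pointed at the real file with `check_submission("submission.csv")` once the cell that writes it has run. Plain `assert` statements are used so that a mismatch stops the notebook with a message naming the offending columns or rows.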
