In [1]:
import numpy as np
import pandas as pd

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/2023-kaggle-ai-report/sample_submission.csv
/kaggle/input/2023-kaggle-ai-report/arxiv_metadata_20230510.json
/kaggle/input/2023-kaggle-ai-report/kaggle_writeups_20230510.csv


# Working Title - "State of AI: Hardware Advances 2021-2023"

# Introduction

Over the past two years, Artificial Intelligence (AI) has experienced a period of unprecedented growth and advancement, triggering profound changes in our interaction with machines and their societal impact. Tim Sweeney, CEO of Epic Games and a significant figure in the tech industry, encapsulated this transition in a recent tweet. He stated, "Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential" (Sweeney, 2023).

In the tweet he quoted, OpenAI announced the release of an implementation of Consistency Models, a new type of generative model that achieves high sample quality without the need for adversarial training (Song, Dhariwal, Chen, & Sutskever, 2023). This innovation is a significant breakthrough because adversarial training, a key component of current generative AI methods, can be very computationally demanding and difficult to optimize. The capacity to generate high-quality samples without adversarial training means much more efficient learning models and widens the potential for AI usage across diverse fields. This represents another large step forward in AI capabilities and aligns with Sweeney's point about the rapid rate of AI advancement. AI's unique ability to contribute to the optimization of its own development, from refining model architectures to enhancing hardware design, sets it apart from traditional computer hardware development. It is this self-augmenting capacity of AI that likely leads to what Sweeney refers to as "stacked exponential" growth.

The expected growth of computing power has traditionally been benchmarked against Moore's law. However, in the last two years we have seen a surge in acceleration hardware that dramatically outpaces this prediction (Moore, 2022). What are the factors driving this rapid pace of innovation? What significant advancements in hardware have been made in the last two years? How can we measure the acceleration and what does the acceleration mean for the future of AI development? While it is impossible to cover every development, this essay will explore key advancements in AI hardware over the past two years, delve into the driving forces behind AI's swift evolution, and ponder the potential implications of this accelerated growth.

# Theoretical Foundations

Moore's Law, named after Intel co-founder Gordon E. Moore, predicts that the number of transistors on integrated circuits doubles approximately every two years, which works out to roughly a 40% annual increase. This 1975 prediction was actually a revision of one made a decade earlier, which held that densities double every year. What Moore himself later described as just a "wild extrapolation" was never really a scientific law but rather an observation about progress in electronics that was expected to continue. The physics behind it says simply that as transistors get smaller, they run faster and consume less power. The economics then says that as you place more transistors in a smaller and smaller area, each transistor becomes cheaper to make (McKenzie, 2023).
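The relationship between a doubling period and the equivalent yearly growth rate can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name `annual_growth_rate` is a hypothetical helper introduced here, not something from the cited sources.

```python
def annual_growth_rate(doubling_period_years: float) -> float:
    """Fractional yearly growth implied by doubling every `doubling_period_years`."""
    # If a quantity doubles every T years, it grows by a factor of 2**(1/T) each year.
    return 2 ** (1 / doubling_period_years) - 1


# The two doubling periods mentioned in the paragraph above.
for label, period in [
    ("Original estimate (doubling every year)", 1.0),
    ("1975 revision (doubling every two years)", 2.0),
]:
    print(f"{label}: {annual_growth_rate(period):.1%} per year")
```

A two-year doubling period corresponds to about 41% growth per year, which is where the "roughly 40%" figure comes from.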

Historically, this has been a rough indicator of computational power growth, but it doesn't directly correlate with complex tasks like machine learning model training or inference, where factors like algorithms, data I/O speeds, memory design, and power efficiency also play major roles.

As we exited the 2010s, much of the conversation within the computer hardware community shifted towards questioning the sustainability of Moore's Law into the next decade. In 2019 and 2020, several articles, including an influential piece titled 'We’re not prepared for the end of Moore’s Law' (Hoffman, 2020), published in the MIT Technology Review, raised alarm bells about the supposed death of the once-reliable prediction. Another perspective argued that Moore's Law is still alive and well, just not under its strictest definition.

CPU performance is traditionally measured by clock speed in gigahertz (GHz), while GPU performance is measured in trillions of floating point operations per second (TFLOPS). Neural processing units (NPUs), designed specifically for machine learning applications, measure performance in trillions of operations per second (TOPS). If we focus only on measuring CPU power, we miss the full picture. The following chart superimposes the historical processing power of Apple iPhone chips, specifically the NPU (referred to as the "Neural Engine" by Apple), with generic GPU and CPU performance. With the NPU specifically, we are seeing over 100% year-over-year (YoY) processing power improvements. This is just the Neural Engine isolated from acceleration and signal processors (Vellante & Floyer, 2021).

<div align="center">
      <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-1.jpg" width="450">
</div>

The challenges to Moore's Law are not isolated to the particular measurement one chooses to look at. The chart on the left, taken from the sixth edition of "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson (2018), suggests a slowing pace in the doubling of transistors. This refined depiction of the growth trajectory integrates other factors and laws into its prediction, thus offering a more nuanced understanding of technology progression. The chart aligns with recent investigations indicating that the rate of transistor doubling has extended to approximately every 3.5 years, a significant departure from Moore's Law's original two-year prediction (Barry, 2023). This finding underscores the evolving nature of technological advancement, presenting a challenge to the future relevance of Moore's Law in its traditional form.

<div align="center">
<table>
  <tr>
    <td align="center">
      <img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0780a30a-e1dc-4622-90aa-b5ec3f7c07c6_925x600.png" width="450">
    </td>
    <td align="center">
      <img src="https://i.redd.it/gtvlzsimcf981.png" width="400">
    </td>
  </tr>
</table>
</div>

The landscape of computing has evolved significantly since the inception of Moore's Law, introducing new paradigms that influence computational performance beyond mere transistor counts. For instance, the concept of Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC) plays a vital role in processor architecture design. CISC, characterized by a large set of instructions, was prominent during the early days of computing but has since given way to RISC architectures, which prioritize a smaller set of instructions executed more quickly, leading to better power efficiency and performance (Panigrahi, 2023). Most modern CPUs, including those by Intel and AMD, use a hybrid approach, while GPUs and many AI accelerators lean towards RISC.

Further, other physical laws and principles also affect computing performance. Dennard scaling, which predicts that power density (power per unit area) would remain constant as transistors shrink, reached its limit around 2005-2006 (Platt, 2018). As transistors have become smaller, power leakage has increased, leading to higher power density, increased heat, and reduced performance per watt. This breakdown, often called the "end of Dennard scaling," shifted the focus towards multi-core and parallel processing designs to continue the trajectory of performance improvements.

However, multi-core processing designs hit their own roadblock, referred to as Amdahl's law. It states that the maximum improvement in performance due to parallelization is limited by the portion of the program that cannot be parallelized. In other words, even with an infinite number of processors, there's a limit to how much speedup you can achieve if any part of your computation must be performed sequentially (Brans, n.d.).
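Amdahl's law has a simple closed form: with a parallelizable fraction `p` of the work spread over `n` processors, the speedup is `1 / ((1 - p) + p / n)`, which approaches `1 / (1 - p)` as `n` grows. The sketch below illustrates this; the 95% parallel fraction is an assumed value chosen for illustration, not a figure from the cited source.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: float) -> float:
    """Speedup predicted by Amdahl's law for a given parallelizable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of processors goes to infinity."""
    return 1.0 / (1.0 - parallel_fraction)


# Assumed example: 95% of the program can be parallelized.
p = 0.95
for n in (2, 8, 64, 1024):
    print(f"{n:>5} processors: {amdahl_speedup(p, n):5.2f}x speedup")
print(f"Limit with unlimited processors: {amdahl_limit(p):.1f}x")
```

Even with only 5% of the work being sequential, the speedup can never exceed 20x, however many cores are added.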

Despite the historical accuracy of Moore's Law, the recognition that it was slowing led many to expect future speedups to come from acceleration hardware rather than from increased transistor counts.

# Historical Context

The term "artificial intelligence" was first introduced by scientists John McCarthy, Claude Shannon, and Marvin Minsky at the Dartmouth Conference in 1956. With optimistic expectations for the field's potential, Marvin Minsky boldly proclaimed in a 1970 Life magazine article that within the next three to eight years, machines with average human intelligence would exist. The hype surrounding this forecast ignited an investment wave in the 1970s, which culminated in an AI bubble. However, when this bubble burst in the early 1980s, AI development regressed to the confines of research labs, and the field entered a long-lasting "AI Winter" (Jotrin Electronics, 2022).

This winter was primarily a result of inadequate computational power and data availability necessary for complex AI model training. During this period, Central Processing Units (CPUs) performed the bulk of computations. CPUs, although efficient at handling a wide array of tasks, were not suited for large-scale AI operations. Furthermore, the algorithms and techniques utilized during this time were still in their infancy, lacking the sophistication and effectiveness of those developed in later years.

Moore's Law had been the guiding principle for the advancement of these CPUs. However, the demands of AI computations quickly exceeded the capabilities of these CPU architectures, which led to much slower progress than Minsky had originally anticipated.

The birth of the Graphics Processing Unit (GPU) by Nvidia in 1999 marked a significant shift in this trajectory. Initially used to accelerate 3D graphics for PC video games, GPUs offloaded computational work from the CPU, enhancing processing speeds. However, their application in AI model training wasn't recognized until a decade later.

In 2009, this understanding began to change, marking a turning point in the AI hardware landscape. Geoffrey Hinton, a pioneer in the field of artificial intelligence, recommended the use of GPUs for model training. Simultaneously, Stanford University researchers Rajat Raina, Anand Madhavan, and Andrew Ng illustrated the superior computational power of modern GPUs compared to multi-core CPUs in deep learning applications (Raina et al., 2009). With the foresight of GPUs' potential impact, Hinton boldly labeled them as "the future of machine learning." This marked the beginning of an acceleration in the AI hardware landscape, far surpassing the incremental advancements predicted by Moore's Law.

The tipping point arrived in 2012 with the advent of AlexNet, a groundbreaking neural network model developed by Hinton and his student Alex Krizhevsky. Powered by Nvidia's GPU hardware, AlexNet won the ImageNet competition by delivering a record-setting image recognition accuracy (Jotrin Electronics, 2022). This marked a paradigm shift and solidified GPUs as the gold standard in the AI landscape.

Yet, the extent of AI's growing computational needs remained unpredictable. According to OpenAI's 2018 analysis, before 2012, using GPUs for machine learning was uncommon, which made the realization of such breakthroughs rare. Between 2012 and 2014, infrastructure to train on multiple GPUs was scarce. Thus, most results involved only 1-8 GPUs, rated at 1-2 TFLOPS. The period from 2014 to 2016 saw the use of 10-100 GPUs, rated at 5-10 TFLOPS, resulting in 0.1-10 petaflop/s-days. However, larger training runs had limited value due to diminishing returns on data parallelism (OpenAI, 2018).
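The petaflop/s-day unit used above is one petaflop per second (10^15 operations per second) sustained for one day. The sketch below shows how GPU count, per-GPU throughput, and run length combine into that unit. The run lengths and the default 100% utilization are assumptions made for illustration; real training runs sustain only a fraction of peak throughput.

```python
SECONDS_PER_DAY = 86_400


def petaflop_s_days(n_gpus: int, tflops_per_gpu: float, days: float,
                    utilization: float = 1.0) -> float:
    """Compute used by a training run, in petaflop/s-days."""
    # 1 petaflop/s = 1,000 teraflop/s.
    sustained_petaflops = n_gpus * tflops_per_gpu * utilization / 1_000
    return sustained_petaflops * days


def total_operations(pfs_days: float) -> float:
    """Convert petaflop/s-days into a raw operation count."""
    return pfs_days * 1e15 * SECONDS_PER_DAY


# Hypothetical runs at the edges of the ranges quoted above.
small_run = petaflop_s_days(n_gpus=8, tflops_per_gpu=2, days=5)
large_run = petaflop_s_days(n_gpus=100, tflops_per_gpu=10, days=10)
print(f"8 GPUs x 2 TFLOPS x 5 days: {small_run:.3f} pfs-days")
print(f"100 GPUs x 10 TFLOPS x 10 days: {large_run:.1f} pfs-days")
print(f"Operations in the larger run: {total_operations(large_run):.2e}")
```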

The period from 2016 to 2017, however, witnessed a leap in computational efficiency. Methods allowing for greater algorithmic parallelism, such as huge batch sizes, architecture search, and expert iteration, were developed. Coupled with the advent of specialized hardware like Tensor Processing Units (TPUs), a specialized version of the NPU, and faster interconnects, these developments greatly increased the computational limits, at least for some applications (Amodei & Hernandez, 2018).

In the meantime, hardware has become the silent workhorse in the AI world, defining the speed and efficiency of both model training and deployment. The choice of hardware can significantly affect an AI model's learning curve, influencing how quickly it can digest and learn from data. Furthermore, the hardware used in deployment impacts the speed at which these AI models can produce predictions and respond to inputs, a critical factor in many real-time applications.

Despite the increased optimization potential of GPUs, engineers are challenged with fitting more capabilities into ever-shrinking spaces. Around 2018, AI-assisted chip design emerged as a solution to this challenge, with companies like Synopsys incorporating task-specific machine learning models into tools such as the Fusion Compiler. These companies built on Design Space Exploration (DSE) and Electronic Design Automation (EDA) technologies that have been in use for decades. Research in this area advanced steadily until both Google and Synopsys created a buzz around AI-powered chip design, shedding light on the potential of AI in hardware optimization and design.

On April 3rd, 2020, Google published a blog post titled 'Chip Design with Deep Reinforcement Learning'. In this short post, they outlined a framework to teach an algorithm how to optimize chip design by treating it as a game. They claimed that "this method is the first chip placement approach that has the ability to generalize" (Goldie & Mirhoseini, 2020). The AI-generated designs are "comparable or superior" to those of humans but can be generated much more rapidly (Vincent, 2021). Synopsys later announced DSO.ai, the "world's first autonomous AI application for chip design" ("Synopsys," n.d.). AI-powered chip design has the potential to drastically lower the cost of chip manufacturing and further democratize the technology. A 2021 Forbes article estimated that the chip design process could shrink from 2-3 months to 2-3 weeks (Freund, 2021).

To further set the stage for what has been developed over the last two years, the COVID-19 pandemic in 2020 introduced a surge in demand for consumer electronics following several years of decline (Kaur, 2021). This demand, coupled with constraints on supply due to lockdowns and other bottlenecks, caused a chip shortage that still continues at the time of writing. Innovations in semiconductor production have been elevated to a much higher priority. The recent surge in AI-related hardware has occurred despite these many production constraints, but the constraints have led to higher costs and more barriers to entry.

The November 2022 release of ChatGPT by OpenAI has highlighted the essential role of hardware in driving AI performance. The high-level functionality of ChatGPT demands significant memory and storage capacity. For instance, this system was trained on an extensive network of 10,000 Nvidia A100 HPC (high-performance computing) accelerators, each of which is a $12,500 tensor core Graphics Processing Unit (GPU) (Kandel, 2023). A case in point is GPT-3, the model family from which ChatGPT is derived, which features an astounding 175 billion parameters and calls for a data capacity of 45 terabytes during its training stage. This exceeds the memory capabilities of even the most powerful GPUs typically used in system training, necessitating the concurrent operation of multiple processors. While the hardware used in deployment can vary significantly based on specific application requirements, the selection of hardware is indisputably a key consideration in the realm of AI.
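A rough back-of-the-envelope calculation shows why a 175-billion-parameter model cannot fit on a single accelerator. The sketch below is an estimate under stated assumptions: weights stored at 2 bytes (16-bit) or 4 bytes (32-bit) per parameter, and an A100 with 80 GB of memory (a 40 GB variant also exists). It counts only the weights; optimizer state, gradients, and activations during training would add substantially more.

```python
import math


def weights_memory_gb(n_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed just to hold the model weights, in gigabytes (10^9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


def min_gpus_for_weights(n_parameters: float, bytes_per_parameter: int,
                         gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights."""
    needed = weights_memory_gb(n_parameters, bytes_per_parameter)
    return math.ceil(needed / gpu_memory_gb)


N_PARAMETERS = 175e9      # parameter count quoted in the text
A100_MEMORY_GB = 80       # assumption: 80 GB A100 variant

for label, n_bytes in [("16-bit", 2), ("32-bit", 4)]:
    size = weights_memory_gb(N_PARAMETERS, n_bytes)
    gpus = min_gpus_for_weights(N_PARAMETERS, n_bytes, A100_MEMORY_GB)
    print(f"{label} weights: {size:.0f} GB, at least {gpus} GPUs just to store them")
```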

# The Boom of AI Hardware (2021-2023)

## 2021

As we entered the new decade, the stage was set for AI hardware to take a giant leap forward. Not only was its true power about to be put to the test, with the first version of the Machine Learning Performance benchmark tests (MLPerf v1.0) on the horizon, but significant capital was also being allocated to AI hardware development. Although there was technically a slight decrease in the number of equity funding deals (2,384 deals in 2021 versus 2,450 in 2020), the amount of capital invested in AI hardware companies globally almost doubled from 36 billion in 2020 to 68 billion in 2021. Market research reports later found that in 2021 the highest demand in AI hardware was for processors (65%) rather than storage or network devices (Precedence Research, 2022). The catch is that while processing power is clearly outperforming expectations, network and storage devices are increasingly the bottleneck to even greater performance (Vellante & Floyer, 2021).

<div align="center">
<table>
  <tr>
    <td align="center">
      <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Share-By-Type-2021.jpg" width="400">
    </td>
    <td align="center">
      <img src="https://d2axcg2cspgbkk.cloudfront.net/wp-content/uploads/Breaking-Analysis_-Moores-Law-is-Accelerating-and-AI-is-Ready-to-Explode-3.jpg" width="400">
    </td>
  </tr>
</table>
</div>


2021 started off with the release of DALL-E by OpenAI in January. DALL-E, a multimodal AI system, distinguishes itself by generating images from text descriptions. Although not a hardware advancement per se, DALL-E's significance in the computational domain is undeniable. It merges two of the most computationally intensive fields in AI: computer vision and natural language processing. Training models like DALL-E and its underlying model, GPT-3, as well as deep learning models for images, requires substantial hardware resources. Moreover, to operate at scale, the model depends on a potent combination of efficient processing power, robust networking capabilities, and high-speed storage hardware. While the exact hardware setup of DALL-E hasn't been disclosed, recent attempts to replicate DALL-E on a much smaller scale have shown the complexity of the end-to-end hardware setup (Cuenca, 2023).

In February, Google released TensorFlow 3D, designed to help businesses develop and train models capable of comprehending 3D scenes. This offering signified an expansion in the AI model development ecosystem, with TensorFlow employing the raw power of GPUs for model training.

March saw a landmark collaboration between Nvidia and Harvard, with the development of an AI toolkit called AtacWorks. This toolkit was a testament to Nvidia's determination to tailor AI hardware to handle complex tasks such as genome analysis, thus significantly reducing associated costs and time.

In May, Google announced their fourth-generation TPUs for AI and machine learning workloads. TPUs, designed specifically to optimize AI computation, stood as Google's response to the rising dominance of GPUs. Another major announcement came from Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC): the Perlmutter supercomputer. Built by HPE in collaboration with Nvidia and AMD, it features around 6,159 Nvidia A100 GPUs and roughly 1,500 AMD Milan CPUs, collectively providing an impressive 3.8 exaflops of theoretical "AI performance". It has since been instrumental in mapping the visible universe spanning 11 billion light years by processing data from the Dark Energy Spectroscopic Instrument (DESI), with early benchmarking revealing up to 20X performance speedups using the GPUs, thus reducing computational timeframes from weeks or months to merely hours (HPC Wire, 2021).

June brought another pivotal development when Mythic launched an AI processor that required ten times less power than a conventional system-on-chip or GPU. This introduction marked a shift towards creating more energy-efficient hardware solutions for AI, an important consideration as energy costs and environmental impacts become more of a concern (Sharma, 2021). Also in June, Google published a paper in Nature detailing their approach to using AI for the floorplanning stage of chip design (Mirhoseini et al., 2021). This paper formalized their 2020 blog post about AI-powered chip design and made the findings more transparent. They also revealed that their fourth-generation TPU, released just one month earlier, was designed using this new deep reinforcement learning technique (Vincent, 2021).

Following the excitement of AI powered chip design in June, July brought the release of the Cerebrus platform from Cadence. The Cerebrus Intelligent Chip Explorer tool leverages machine learning to enhance the process of chip design, making engineers remarkably more productive. The introduction of machine learning has added an additional layer of automation to the design process, resulting in up to 10 times improved productivity per engineer and yielding a 20% enhancement in power, performance, and chip area (Takahashi, 2021).

In October, Apple continued to upgrade the M1 series chips, released only a year earlier and already touted as the most powerful chips Apple had ever built. Most notably, both the M1 Pro and M1 Max chips came equipped with the standard 16-core Neural Engine, further enhanced for accelerating on-device machine learning, indicative of Apple's investment in advancing machine learning technology through their existing products ("Introducing M1 Pro and M1 Max," 2021).

November showcased advancements from both Nvidia and Amazon. Nvidia announced Omniverse Avatar, a platform harnessing AI hardware capabilities to create real-time interactive avatars, signifying an innovative use of AI hardware. Simultaneously, Amazon unveiled its Graviton3 processors for AI inferencing, illustrating an industry trend towards using AI-specific processors for distinct tasks such as inference (Sharma, 2021).

The year 2021 was also an exciting year for AI chip manufacturing startups. In April, Cerebras Systems unveiled WSE-2, an AI supercomputing processor containing an unprecedented 2.6 trillion transistors. This powerful computational device underscores the intensifying demand for advanced AI hardware to keep up with increasingly intricate tasks (source). In June, Mythic announced the M1076, a 25 TOPS AI processor capable of storing up to 80 million weighted parameters, which means that it can run complex AI models without the need for external memory (Mitchell, 2021).

### Hardware Summary 2021

| Hardware | Company | Key Features |
|---|---|---|
| DALL-E | OpenAI | End-to-end text-to-image generative model at scale |
| Perlmutter supercomputer | NERSC | Advanced supercomputer for scientific research, with around 6,159 Nvidia A100 GPUs |
| Grace CPU | Nvidia | Arm-based CPU designed for AI and high-performance computing |
| DGX SuperPOD | Nvidia | AI supercomputer for enterprise-level AI training and inference |
| Google TPU v4 | Google | Tensor Processing Unit designed for Google's data centers, AI aided design |
| Habana Gaudi AI Training Processor | Intel | High-performance AI processor focused on training tasks |
| Wafer Scale Engine 2 (WSE-2) | Cerebras Systems | Extremely large chip (size of a dinner plate) designed for AI tasks, contains 2.6 trillion transistors |
| M1076 | Mythic | 25 TOPS, up to 80 million weighted parameters | 
| Snapdragon 888 5G | Qualcomm | Mobile platform with integrated 5G and an improved AI Engine |
| M1 Pro/Max Chip | Apple | 16-core Neural engine optimized for ML acceleration |


## 2022


Throughout 2022, the AI hardware landscape saw an array of impressive launches from leading tech companies and startups alike. Nvidia announced the release of their new DGX Station, DGX-1, and DGX-2 built on state-of-the-art Volta GPU architecture (Gupta, 2022). The lineup includes the DGX A100, Nvidia's flagship system designed for data centers, which integrates eight GPUs and has 640 GB of GPU memory. Nvidia also announced the release of the H100 data center GPU, the flagship product for the new Hopper architecture. All of these components are specifically designed for deep learning training, accelerated analytics, and inference (Fu, 2022).

Just one year after Google made public their research and methods for incorporating AI into chip design, Nvidia announced their own approach, called 'PrefixRL'. Similar reinforcement learning methods were incorporated into their new Hopper architecture, resulting in circuits 25% smaller than those designed by humans with standard EDA tools (Roy, Raiman, & Godil, 2023). Around the same time, an internal dispute emerged at Google over the accuracy of the findings in their original 2021 paper (Dave, 2022).

Intel’s Habana Labs released the second generation of their deep learning processors for training and inference — Habana Gaudi2 (Gupta, 2022). IBM launched their first Telum Processor-based system, IBM z16, aimed at improving performance and efficiency for large datasets and featuring on-chip acceleration for AI inference (Fu, 2022).

In March and June, Apple also made significant strides in their hardware capabilities, unveiling the M1 Ultra and M2 chips, both next-generation enhancements of their breakthrough M1 chip. The M1 Ultra doubled the number of Neural Engine cores from 16 to 32 ("Apple unveils M1 Ultra," 2022). The standard Neural Engine in the M2 can process up to 15.8 trillion operations per second, 40% faster than the prior year's ("Apple unveils M2," 2022).

In July, IBM and Tokyo Electron made strides in 3D chip stacking, innovatively addressing the limitations posed by Moore's law. Silicon carrier wafers, a significant obstacle in 3D chip manufacturing, were at the core of their challenges. The advancements they've introduced are designed to optimize the production process, with the added advantage of potentially alleviating the global chip shortage (Peckham, 2022).

On AI Day in September, Tesla revealed its powerful Dojo chip, designed for faster training and inference in self-driving cars (Gupta, 2022). AMD, though not traditionally focused on AI, released Zen 4, a new version of their Zen microarchitecture built on a 5 nm process, and introduced a new line of PC processors with machine learning capabilities (Fu, 2022). Meanwhile, Cerebras Systems launched their AI supercomputer, Andromeda, aiming to accelerate academic and commercial research (Gupta, 2022).

In the same vein, SambaNova Systems announced the shipping of the second generation of the DataScale system—SN30. The system, powered by the Cardinal SN30 chip, is built for large models with more than 100 billion parameters and capable of handling both 2D and 3D images (Fu, 2022).

By mid-2022 we had a pretty good understanding of the state of the market for the prior year and where things were headed. The AI hardware market was valued at 10 billion in 2021 and was projected to grow to almost 90 billion by 2030 (Precedence Research, 2022).
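The implied growth rate behind this projection can be estimated as a compound annual growth rate (CAGR). The sketch below uses only the rounded figures quoted above (10 billion in 2021, roughly 90 billion in 2030), so the result is an approximation and may differ from the rate published in the report itself.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures from the text: 10 billion (2021) to about 90 billion (2030).
implied_rate = cagr(start_value=10, end_value=90, years=2030 - 2021)
print(f"Implied CAGR 2021-2030: {implied_rate:.1%}")
```

A ninefold increase over nine years corresponds to growth of a little under 28% per year.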


<div align="center">
      <img src="https://www.precedenceresearch.com/insightimg/Artificial-Intelligence-in-Hardware-Market-Size-2021-to-2030.jpg" width="400"/>
</div>



The AI Hardware Summit held in September 2022 showcased the emergent trend of Edge AI, pointing out its potential as a major avenue for growth and performance improvement. Edge AI, which refers to deploying AI applications on devices throughout the physical world rather than a centralized cloud server, has seen remarkable advancement due to the maturation of deep learning and enhanced computing power. Edge AI has enormous advantages such as reduced latency, better privacy, and reduced energy consumption. The Summit also highlighted how AI chips have now advanced to the level of detecting human emotions, emphasizing the impressive strides being made in edge computing and object detection. Furthermore, a noticeable shift was identified toward TPUs in Edge AI, with more vendors beginning to adopt TPUs as AI accelerators.

One of the key themes of the Summit was the rise of foundation models in AI, signaling a new era in AI development. These models, trained on massive amounts of data and adapted for multiple applications, have started to replace the task-specific models that previously dominated the AI landscape. Although still relatively nascent and not entirely understood, foundation models have shown tremendous potential and are being deployed at scale.

Another pivotal discussion point was the evolving large-scale AI infrastructure. The focus was on developing high-performance computers with AI-optimized accelerators, efficient software for AI development, robust data center environments, and even innovative cooling solutions for high-density computing equipment (Fu, 2022).

In 2022, one of the noteworthy advancements in AI wasn't a physical piece of hardware, but a sophisticated language model known as ChatGPT, trained by OpenAI. Despite its primary role as an interactive web application, its existence and performance have significant implications for the hardware domain. ChatGPT, which requires substantial computational power for both training and inference as previously mentioned, is a testament to the increasing demand for advanced hardware capable of supporting such large models. Training these large models often requires specialized hardware and would not be possible without prior advancements in GPUs or TPUs that can handle a large amount of data and perform parallel computations. Moreover, the inference stage often requires powerful servers for hosting the models, as well as efficient hardware capable of quickly processing requests in real-time. The success of ChatGPT underscores the intertwined relationship between AI software and hardware advancements, where each drives progress in the other.

### Hardware Summary 2022

| Hardware | Company | Key Features |
|---|---|---|
| DGX Station, DGX-1, DGX-2, and DGX A100 | Nvidia | AI supercomputers built on Volta GPU architecture for deep learning, analytics, and inference |
| H100 data center GPU | Nvidia | Flagship product built on the new Hopper architecture, ideal for large-scale machine learning and deep learning workloads, designed with PrefixRL. |
| Habana Gaudi2 | Intel | Deep learning processor for training and inference, built with 7nm technology |
| IBM z16 | IBM | First Telum Processor-based system, for improving performance and efficiency for large datasets, features on-chip acceleration for AI inference |
| 3D chip stacking | IBM / Tokyo Electron | Production process advances addressing silicon carrier wafers in 3D chip manufacturing |
| Zen 4 | AMD | Microarchitecture built on a 5 nm process, introduced with machine learning capabilities |
| Dojo Supercomputer | Tesla | Revealed for faster training and inference in self-driving cars, claims to outperform multiple GPUs |
| Andromeda Supercomputer | Cerebras Systems | Combines 16 Cerebras CS-2 systems for academic and commercial research, performs one quintillion operations per second |
| SN30 Datascale | SambaNova Systems | Second generation of the DataScale system, powered by Cardinal SN30 chip, built for large models with more than 100 billion parameters |
| M1 Ultra, M2 | Apple | 32-core Neural Engine (M1 Ultra); standard Neural Engine 40% faster than the previous year at 15.8 trillion operations per second (M2) |

## 2023

The boom in AI hardware in 2023 is characterized by a proliferation of new platforms engineered for high performance, extreme scalability, energy efficiency, and sophisticated deep learning techniques. These advancements have unlocked new frontiers in the AI and machine learning landscape, with significant contributions coming from industry powerhouses such as Google, Nvidia, Intel, AMD, Apple, and Meta.

The late-2022 release of OpenAI's ChatGPT sparked a surge in AI advancements over the following six months, fueled in large part by increased demand for advanced GPU hardware. The competitive landscape, featuring key players such as Google with Bard (powered by PaLM 2), Microsoft with Bing AI, and Meta with LLaMA, has been driven by the development of large language models (LLMs). The power and efficiency of the latest AI hardware are crucial for training LLMs and broadening their practical uses. However, hardware is just one facet of product development. Choices about parameter count and dataset size critically shape the design, affecting everything from hardware requirements to training duration and model performance. For instance, the implications of choosing 175 billion parameters for OpenAI's GPT-3 versus 65 billion for the largest LLaMA model are considerable. OpenAI has not disclosed a parameter count for GPT-4, and the widely circulated figure of 170 trillion is unverified.
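
To make the link between parameter count and hardware requirements concrete, here is a minimal sketch that estimates the memory needed just to hold a model's weights. The 2 bytes per parameter (16-bit weights) and the 80 GB of memory per accelerator are illustrative assumptions, not figures from the sources above, and the estimate ignores activations, optimizer state, and caches.

```python
import math

BYTES_PER_PARAM = 2          # assumption: weights stored in 16-bit precision
ACCELERATOR_MEMORY_GB = 80   # assumption: memory available on one accelerator


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_accelerators(num_params: float, memory_gb: float = ACCELERATOR_MEMORY_GB) -> int:
    """Smallest number of accelerators whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(num_params) / memory_gb)


# Parameter counts quoted in the text above.
models = {"GPT-3": 175e9, "LLaMA (largest)": 65e9}

for name, params in models.items():
    print(f"{name}: {weight_memory_gb(params):.0f} GB of weights, "
          f"at least {min_accelerators(params)} accelerator(s)")
```

Under these assumptions, the weights of a 175-billion-parameter model do not fit on a single accelerator, which is one reason parameter count drives hardware planning so directly.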

In addition to the most recent advancements in LLMs, we can look to this year's developer conferences from Google, Apple, and Microsoft for an idea of the advancements we are likely to see in the latter half of 2023. Google I/O, Microsoft Build, and Apple WWDC are the yearly flagship conferences where developers get a deeper dive into the software and hardware that will launch later in the year. Google and Microsoft are fully embracing the AI race with new virtual assistants, product features, and open LLMs, to name a few. Apple, more subtle in its approach, did not once mention the term "artificial intelligence" at this year's WWDC (Greenberg, 2023). Instead, it unveiled numerous software improvements for machine learning across its device ecosystem, along with the upgraded M2 Ultra, whose 32-core Neural Engine is touted as 40% faster than the prior year's 32-core model ("Apple introduces M2 Ultra," 2023).

Google made a giant leap in its Cloud TPU v4, offering a staggering 10x increase in machine learning system performance compared to its predecessor, TPU v3. With innovative interconnect technologies and domain-specific accelerators, the TPU v4 not only amplifies performance, but it also champions energy efficiency, leading to a reduction in CO2 emissions. Notably, the TPU v4 is tailored for LLMs such as LaMDA, MUM, and PaLM, with the PaLM model delivering 57.8% of peak hardware floating-point performance over 50 days of training on the TPU v4 (Jouppi & Patterson, 2022).
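
As a rough illustration of what "57.8% of peak hardware floating-point performance" means in absolute terms, the sketch below converts a utilization figure into sustained throughput per chip. The 275 TFLOPS peak is an assumption taken from Google's published TPU v4 specification rather than from the cited blog post, so treat the result as indicative only.

```python
# Sketch: converting a hardware-utilization percentage into sustained throughput.
PEAK_TFLOPS_PER_CHIP = 275.0   # assumption: published peak for one TPU v4 chip
UTILIZATION = 0.578            # from the text: share of peak sustained while training PaLM


def achieved_tflops(peak_tflops: float, utilization: float) -> float:
    """Sustained throughput implied by a peak rating and a utilization fraction."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    return peak_tflops * utilization


per_chip = achieved_tflops(PEAK_TFLOPS_PER_CHIP, UTILIZATION)
print(f"Sustained throughput per chip: about {per_chip:.0f} TFLOPS")
```

The gap between peak and sustained throughput is why utilization, and not just the headline peak figure, matters when comparing accelerators.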

Nvidia marked a substantial milestone with its Grace CPU Superchips, finding a place in the UK-based Isambard 3 supercomputer. This setup, featuring 384 Arm-based Nvidia Grace CPU Superchips, commands a total core count exceeding 55,000. It delivers FP64 performance within a remarkable power envelope of under 270kW. The incorporation of Arm Neoverse V2 cores offers a high-performance edge, as the Grace chips are projected to have superior speed and memory bandwidth compared to their counterparts (Kennedy, 2023).
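
The Isambard 3 figures can be sanity-checked with simple arithmetic. This sketch assumes 144 cores per Grace CPU Superchip, a number taken from Nvidia's published specification rather than from the cited article, and treats the 270 kW figure as an upper bound for the whole system.

```python
SUPERCHIPS = 384             # from the text
CORES_PER_SUPERCHIP = 144    # assumption: Nvidia's published Grace CPU Superchip core count
POWER_BUDGET_W = 270_000     # from the text: system power stays under 270 kW

total_cores = SUPERCHIPS * CORES_PER_SUPERCHIP
# Upper bounds, because the power budget covers the entire system.
watts_per_superchip = POWER_BUDGET_W / SUPERCHIPS
watts_per_core = POWER_BUDGET_W / total_cores

print(f"Total cores: {total_cores:,}")
print(f"Power per superchip: under {watts_per_superchip:.0f} W")
print(f"Power per core: under {watts_per_core:.1f} W")
```

With that assumption, the total core count comes out just above the 55,000 quoted above.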

Intel, with its Meteor Lake chips, embedded Vision Processing Units (VPUs) across all variants, thereby offloading AI processing tasks from the CPU and GPU to the VPU. This move resulted in increased power efficiency and ability to handle complex AI models, providing benefits for power-hungry applications such as Adobe suite, Microsoft Teams, and Unreal Engine (Roach, 2023).

AMD introduced an AI chip called MI300X, described as "the world's most advanced accelerator for generative AI". This introduction is expected to compete head-on with Nvidia's AI chips and generate interest from major cloud providers. Simultaneously, AMD initiated high-volume shipping of a general-purpose central processor chip named "Bergamo", adopted by Meta Platforms and others for their computing infrastructure (Mohan, 2023).

Meta made its foray into AI hardware by unveiling its first custom-designed chips, the Meta Training and Inference Accelerator (MTIA) and the Meta Scalable Video Processor (MSVP). These chips, optimized for deep learning and video processing, underpin Meta's plans for a next-gen data center optimized for AI, illustrating its dedication to crafting a fully integrated AI ecosystem (Khare, 2023).

Although Nvidia currently holds a 90% share of the AI computing market, companies like Cerebras, AMD, Intel, IBM, and a startup called Groq are determined to chip away at that lead. Groq was founded in 2016 by a former Google engineer and gained recent attention by claiming that it had created a process to move Meta's LLaMA from Nvidia chips over to its own hardware. The claim is notable because the complexity of current AI hardware makes it tedious to adapt model architectures to run quickly on new setups (Lee & Nellis, 2023).

While these advancements in 2023 are indeed significant, it's important to note that the AI Hardware Summit for the year is yet to occur, indicating that we don't have the full picture of all the developments in the field for this year. As such, the current state of AI hardware in 2023 should be viewed as a work in progress, awaiting further updates and advancements.

With regard to AI chip design, the four major players continue to be Synopsys, Cadence, Google, and Nvidia. Although there have not been any significant announcements this year about new AI-designed chips, sentiment is shifting toward the view that the approach is going mainstream, given the increase in customer contracts reported by Synopsys and Cadence (Ward-Foxton, 2023).

### Hardware Summary 2023

| Hardware | Company | Key Features |
| -------- | ------- | ------------ |
| Meteor Lake chips with Vision Processing Units (VPUs) | Intel | Embedded VPUs in all chips for increased power efficiency and the ability to handle complex AI models |
| MI300X AI Chip and Bergamo Processor | AMD | Introduced the MI300X, the world's most advanced accelerator for generative AI, and started high-volume shipping of the Bergamo central processor chip |
| Meta Training and Inference Accelerator (MTIA) and Meta Scalable Video Processor (MSVP) | Meta | Unveiled custom-designed AI chips optimized for deep learning and video processing and discussed plans for a next-gen data center optimized for AI |
| M2 Ultra | Apple | 32-core neural engine, 31.6 trillion operations per second |
| Google Cloud TPU v4 | Google | Exascale ML performance, 4096 chips, dynamic OCS reconfigurability, hardware support for embeddings, 3D torus interconnect |
| Nvidia Grace CPU Superchips in Isambard 3 | Nvidia | 384 Arm-based Nvidia Grace CPU Superchips, >55,000 cores, FP64 performance, <270 kW power consumption, Arm Neoverse V2 cores |
| Interactive LLMs | Google (Bard), Microsoft (Bing AI) | Large-scale models designed for interactive and responsive tasks, leveraging the power and efficiency of the latest AI hardware |


# Other AI Hardware Performance Benchmarks

Moore's Law has been a useful guideline for hardware development, but predicting the pace of improvement in machine learning performance is much more complex because many additional factors are involved, including software, algorithms, and system scale. This is one reason why task-level benchmarks like MLPerf are so valuable: they offer a more holistic view of system performance on representative workloads.

There are several benchmarks that are commonly used today to evaluate the performance of ML/AI hardware specifically. MLPerf is one of the most popular of the last two years (MLCommons, 2023). Developed by a consortium of tech companies in 2018, MLPerf benchmarks measure the speed of machine learning software and hardware.

Another benchmark that has gained attention recently is the AI Benchmark, which is designed specifically for AI tasks on mobile devices. This benchmark measures the speed, accuracy, and power efficiency of AI algorithms on various hardware platforms, including CPUs, GPUs, and dedicated AI accelerators.

The Compute Architecture Benchmark Review (CARB) is another initiative that aims to provide clear, consistent performance benchmarks for various computational tasks, including machine learning.

However, these benchmarks focus mainly on the operational aspect of machine learning - i.e., how quickly a given piece of hardware can perform a specific task. They do not necessarily reflect the research or development aspect of machine learning - i.e., how quickly a new model can be developed, trained, and optimized.

Metrics like time-to-solution or time-to-accuracy are more indicative of the research productivity, which often involves multiple iterations of model development, training, and optimization.
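
A time-to-accuracy metric can be sketched in a few lines: given a log of evaluation results, it reports how long a run took to first reach a target quality. The function name, the log format, and the numbers below are hypothetical and are not taken from any benchmark suite.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(log: Iterable[Tuple[float, float]], target: float) -> Optional[float]:
    """Return the elapsed seconds at the first evaluation meeting `target`, or None.

    `log` holds (elapsed_seconds, accuracy) pairs in chronological order.
    """
    for elapsed_s, accuracy in log:
        if accuracy >= target:
            return elapsed_s
    return None


# Hypothetical evaluation logs for two training setups.
run_a = [(60, 0.52), (120, 0.68), (180, 0.74), (240, 0.76)]
run_b = [(45, 0.40), (90, 0.61), (135, 0.75), (180, 0.77)]

print(time_to_accuracy(run_a, 0.75))  # 240
print(time_to_accuracy(run_b, 0.75))  # 135
print(time_to_accuracy(run_a, 0.90))  # None: the target was never reached
```

In this made-up example, run B starts from a lower accuracy but reaches the target sooner, which raw throughput numbers alone would not reveal.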

Benchmarks such as MLPerf HPC provide insights into the performance of hardware on High Performance Computing (HPC) workloads, which include tasks like weather forecasting, quantum mechanics, and molecular dynamics, along with machine learning tasks.

# Comparing the Pace: Moore's Law vs. AI Hardware Growth

Although MLPerf launched in 2018, it wasn't until the very end of 2020 that it was properly scaled and standardized under the MLCommons consortium. This is why the MLPerf tests of 2021 are referred to as MLPerf v1.0. MLPerf consists of eight benchmark tests: image recognition, medical-imaging segmentation, two versions of object detection, speech recognition, natural-language processing, recommendation, and reinforcement learning in the form of gameplay. MLPerf is often referred to as "the Olympics of machine learning" because computers and software from 21 different companies compete on any or all of the tests (Moore, 2022). This incentivizes hardware companies like Nvidia to put their best foot forward.

In 2022, following the June MLPerf v2.0 results, an IEEE Spectrum article described how gains in AI hardware performance and training times were rapidly outpacing Moore's Law (Moore, 2022).

<div align="center">
      <img src="https://spectrum.ieee.org/media-library/a-chart-shows-six-lines-of-various-colors-sweeping-up-and-to-the-right.jpg?id=30049159&width=1580&quality=80" width="450">
</div>

Based on the recently released 2023 MLPerf results, the pace of AI innovation is not only continuing but accelerating. Nvidia's AI platform in 2023 has shown a considerable performance increase over its 2022 results, consistent with Tim Sweeney's statement that AI is "doubling at a rate much faster than Moore’s Law’s 2 years."

In 2022, Nvidia's AI platform, powered by the A100 Tensor Core GPU, demonstrated significant versatility and efficiency across all eight MLPerf benchmarks. It achieved the fastest time to train on four out of eight tests and was found to be the fastest on a per-chip basis on six out of the eight tests. This performance was attributed to full-stack innovations spanning GPUs, software, and at-scale improvements, delivering 23x more performance in 3.5 years since the first MLPerf submission.

<div align="center">
      <img src="https://blogs.nvidia.com/wp-content/uploads/2022/04/MLPerf-inference-April-22-FINAL2-1-1536x663.jpg.webp" width="450">
</div>
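
To put the 23x gain over 3.5 years next to Moore's Law, the sketch below converts a growth factor into an implied doubling time and computes what a two-year doubling period would have produced over the same span. The comparison is loose by design: the MLPerf figure reflects hardware, software, and scale together, while Moore's Law describes transistor counts.

```python
import math


def doubling_time_years(gain: float, years: float) -> float:
    """Doubling period implied by growing `gain`-fold over `years` (assumes steady exponential growth)."""
    return years * math.log(2) / math.log(gain)


def moores_law_gain(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor expected if performance doubled every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Figures from the text: 23x more performance in 3.5 years.
nvidia_doubling = doubling_time_years(23, 3.5)

print(f"Implied doubling time: {nvidia_doubling * 12:.1f} months")
print(f"Two-year doubling over 3.5 years: {moores_law_gain(3.5):.2f}x")
```

Under the steady-growth assumption, the reported gain corresponds to a doubling roughly every nine months, against a little over 3x for a two-year doubling period over the same 3.5 years.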

Fast forward to 2023, and the results are even more impressive. The newly introduced Nvidia H100 Tensor Core GPUs, designed in part by AI and running in DGX H100 systems, not only achieved the highest performance in every test of AI inference but also gained up to 54% in performance since their debut in September 2022. Furthermore, the Nvidia L4 Tensor Core GPUs, which debuted in these MLPerf tests, ran at over 3x the speed of prior-generation T4 GPUs, another significant leap in AI performance for 2023 (Salvator, 2023). This progress was due in part to Nvidia's Transformer Engine, a testament to the company's commitment to optimizing software and hardware together to push the boundaries of AI performance.

<div align="center">
      <img src="https://blogs.nvidia.com/wp-content/uploads/2023/04/H100-GPU-inference-performance-MLPerf-1536x857.jpg" width="450">
</div>

Specifically, in the healthcare domain, H100 GPUs have improved performance by 31% since launch on the 3D-UNet benchmark, which is used for medical imaging. Additionally, H100 GPUs powered by the Transformer Engine excelled on the BERT benchmark. BERT is a transformer-based large language model of the kind that paved the way for generative AI.

This rapid advancement showcases how technology evolves by "bootstrapping" on previous laws, as Sweeney noted. As these technologies accelerate their development, they also enable new levels of efficiency and capabilities that would have been inconceivable in previous years. In essence, these advancements show that we are not only building upon the foundations laid by Moore's Law but are also accelerating beyond it, providing the fertile ground necessary for the exponential growth of AI technologies.

# Implications and Consequences of this Accelerated Pace

The accelerated development of AI in the last two years, as captured by Sweeney's notion of a "stacked exponential," presents both exceptional opportunities and significant challenges. Such growth not only outpaces the two-year doubling described by Moore's Law and the gradual pace of biological evolution, but also raises the possibility that AI could reach, or even surpass, human-level intelligence.

This notion sparks a dichotomy of perspectives within the AI community. On one side, individuals such as Geoffrey Hinton and Elon Musk urge caution, highlighting ethical and existential risks. Conversely, optimists like Andrew Ng emphasize the potential of superior AI to drive unprecedented advancements and solve global challenges [source]. However, it is clear that the trajectory of AI's future remains as relevant and uncertain as when pioneers like Samuel Butler and Alan Turing first contemplated it.

As Samuel Butler posited in his 1863 essay, *"It is said that the power of a machine to imitate human skill in any real sense of the words depends on the degree of skill possessed by the designer or constructor, and that unless he knew a thing himself he could not possibly teach it to his machine"* (Butler, 1863). Alan Turing, in his seminal work on machine intelligence, echoed this sentiment in his 1950 paper: *"'It is said that these machines can only do what we know how to order them to perform', a rather begging of the question. It might also be said that we can only do what machines will allow us to do, for this is equally true"* (Turing, 1950).

An important consideration in this discourse is the accessibility and cost of AI technologies. The accelerated pace of AI development has not directly led to reduced costs, although it has increased accessibility. While not every individual or organization needs to train its own AI models, many can leverage pretrained models to fulfill various use cases.

Furthermore, the demand for a robust framework of AI benchmarks continues to grow, prompting the need for innovative ways to measure and predict AI progress. This need is underscored by recent global challenges such as increased demand for GPUs, chip shortages, and fluctuating stock prices of chip manufacturers.

# Looking Ahead


Comparing modern hardware benchmarks to Moore's Law, it's clear that while Moore's Law provides valuable historical context for the evolution of computing power, it doesn't fully capture the multifaceted nature of the recent explosion in hardware acceleration. Current benchmarks like MLPerf provide a much more nuanced view of this phenomenon, taking into account the complexity of the tasks being performed, the efficiency of the algorithms used, and the intricacies of the hardware and software designs. These task-level measures suggest that Moore's Law remains a relevant baseline, but that in many cases it is being far outpaced each year.

Going forward, the question is not just how quickly we can double the number of transistors on a chip, but how we can best optimize the entire system (hardware, software, and algorithms) to deliver the most effective performance for the full range of tasks, especially those that revolve around model training. This reflects an overall shift from a focus on hardware alone (as epitomized by Moore's Law) to a more holistic view of computing performance. The future of computing is bright, and the current incentives around AI hardware in particular will continue to drive performance innovations at speeds we have not yet experienced in the age of computing.

# Sources

Amodei, D., & Hernandez, D. (2018, May 16). AI and Compute. OpenAI. https://openai.com/research/ai-and-compute/

Apple. (2022, June). Apple unveils M2, taking the breakthrough performance and capabilities of M1 even further. Apple Newsroom. https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthrough-performance-and-capabilities/

Apple Inc. (2021, October 18). Introducing M1 Pro and M1 Max: the most powerful chips Apple has ever built. Apple Newsroom. https://www.apple.com/newsroom/2021/10/introducing-m1-pro-and-m1-max-the-most-powerful-chips-apple-has-ever-built/

Apple Inc. (2022, March). Apple unveils M1 Ultra, the world's most powerful chip for a personal computer. Apple Newsroom. https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultra-the-worlds-most-powerful-chip-for-a-personal-computer/

Apple Inc. (2023, June). Apple introduces M2 Ultra. Apple Newsroom. https://www.apple.com/newsroom/2023/06/apple-introduces-m2-ultra/

Barry, D. J. (2023, April 17). Beyond Moore's Law: New solutions for beating the data growth curve. Microcontroller Tips. https://www.microcontrollertips.com/beyond-moores-law-new-solutions-beating-data-growth-curve/

Brans, P. (n.d.). Amdahl's law. TechTarget. https://www.techtarget.com/whatis/definition/Amdahls-law

Butler, S. (1863). Darwin Among the Machines. In The Notebooks of Samuel Butler.

Cuenca, P. (2023, January 25). The Infrastructure Behind Serving DALL-E Mini. Weights & Biases. https://wandb.ai/dalle-mini/dalle-mini/reports/The-Infrastructure-Behind-Serving-DALL-E-Mini--VmlldzoyMTI4ODAy

Dave, P. (2022, May 3). Google faces internal battle over research on AI to speed up chip design. Reuters. https://www.reuters.com/technology/google-faces-internal-battle-over-research-ai-speed-chip-design-2022-05-03/

Dilmegani, C. (2023, June 17). AI chip makers: Top 10 companies in 2023. https://research.aimultiple.com/ai-chip-makers/

Edwards, B. (2023, May 24). The lightning onset of AI—what suddenly changed? An Ars Frontiers 2023 recap. Ars Technica. https://arstechnica.com/information-technology/2023/05/the-lightning-onset-of-ai-what-suddenly-changed-an-ars-frontiers-2023-recap/

Freund, K. (2021, August 9). Using AI to help design chips has become a thing. Forbes. https://www.forbes.com/sites/karlfreund/2021/08/09/using-ai-to-help-design-chips-has-become-a-thing/?sh=29e752cb5d9d

Fu, J. (2022, September 29). AI frontiers in 2022. Better Programming. https://betterprogramming.pub/ai-frontiers-in-2022-5bd072fd13c

Goldie, A., & Mirhoseini, A. (2020, April 3). Chip Design with Deep Reinforcement Learning. Google AI Blog. https://ai.googleblog.com/2020/04/chip-design-with-deep-reinforcement.html

Greenberg, M. (2023, June 6). The best AI features Apple announced at WWDC 2023. VentureBeat. https://venturebeat.com/ai/the-best-ai-features-apple-announced-at-wwdc-2023/

Gupta, A. (2022, March 22). Nvidia’s Grace CPU: The ins and outs of an AI-focused processor. Ars Technica. https://arstechnica.com/gadgets/2022/03/nvidias-grace-cpu-the-ins-and-outs-of-an-ai-focused-processor/

Hamblen, M. (2023, February 16). ChatGPT runs 10K Nvidia training GPUs with potential for thousands more. Fierce Electronics. https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more

Hennessy, J. L., & Patterson, D. A. (2018). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.

Higginbotham, S. (2022, February 14). Google is using AI to design chips for its AI hardware. Protocol. https://www.protocol.com/google-is-using-ai-to-design-chips

Hoffman, K. (2020, February 24). We're not prepared for the end of Moore's law. MIT Technology Review. https://www.technologyreview.com/2020/02/24/905789/were-not-prepared-for-the-end-of-moores-law/

HPC Wire. (2021, May 27). NERSC debuts Perlmutter, world's fastest AI supercomputer. https://www.hpcwire.com/2021/05/27/nersc-debuts-perlmutter-worlds-fastest-ai-supercomputer/

Hruska, J. (2021, June 8). Intel’s 2021-2022 roadmap: Alder Lake, Meteor Lake, and a big bet on EUV. ExtremeTech. https://www.extremetech.com/computing/323126-intels-2021-2022-roadmap-alder-lake-meteor-lake-and-a-big-bet-on-euv

Intelligent Computing Lab, Peking University. (2022). Scalable Architecture for Neural Networks. http://nicsefc.ee.tsinghua.edu.cn/projects/neuralscale/

Jotrin Electronics. (2022, January 4). A brief history of the development of AI chips. https://www.jotrin.com/technology/details/a-brief-history-of-the-development-of-ai-chips

Jouppi, N., & Patterson, D. (2022, June 29). TPU v4 enables performance, energy, and CO2e efficiency gains. Google Cloud Blog. https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains

Kandel, A. (2023, April 7). Secrets of ChatGPT's AI Training: A Look at the High-Tech Hardware Behind It. https://www.linkedin.com/pulse/secrets-chatgpts-ai-training-look-high-tech-hardware-behind-kandel/

Kaur, D. (2021, November 3). Here's what the 2021 global chip shortage is all about. Tech Wire Asia. https://techwireasia.com/2021/11/heres-what-the-2021-global-chip-shortage-is-all-about/

Kennedy, P. (2023, June 17). Nvidia Notches a Modest Grace Superchip Win at ISC 2023. ServeTheHome. https://www.servethehome.com/nvidia-notches-a-modest-grace-superchip-win-at-isc-2023-arm-hpe/

Khare, Y. (2023, June 16). Meta Reveals AI Chips to Revolutionize Computing. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2023/05/meta-reveals-ai-chips-to-revolutionize-computing/

Kharpal, A. (2023, May 28). Europe's bold $145 billion plan to rival U.S., Asian chip giants could 'fail,' experts warn. CNBC. https://www.cnbc.com/2023/05/28/europe-chip-strategy-could-fail-experts-warn.html

Lee, J., & Nellis, S. (2023, March 9). Groq adapts Meta's chatbot to its own chips in race against Nvidia. Reuters. https://www.reuters.com/technology/groq-adapts-metas-chatbot-its-own-chips-race-against-nvidia-2023-03-09/

Mack, C. A. (2011). Fifty Years of Moore's Law. IEEE Transactions on Semiconductor Manufacturing, 24(2), 202-207.

Martin, C. (2023, April 12). China’s chip ambitions hit by US sanctions, but Beijing remains determined to catch up. South China Morning Post. https://www.scmp.com/news/china/diplomacy/article/3130528/chinas-chip-ambitions-hit-us-sanctions-beijing-remains

McKenzie, J. (2023, June 20). Moore’s law: further progress will push hard on the boundaries of physics and economics. Physics World. https://physicsworld.com/a/moores-law-further-progress-will-push-hard-on-the-boundaries-of-physics-and-economics/

Mitchell, R. (2021, June 19). Mythic announces latest AI chip M1076. Electropages. https://www.electropages.com/blog/2021/06/mythic-announces-latest-ai-chip-m1076

Mirhoseini, A., Goldie, A., Yazgan, M. et al. (2021). Chip placement with deep reinforcement learning. Nature 595, 230–236. https://www.nature.com/articles/s41586-021-03544-w

MLCommons. (2023, March 8). History. MLCommons. https://mlcommons.org/en/history/

Mohan, R. (2023, June 17). AI chip race heats up as AMD introduces rival to Nvidia technology. Tech Xplore. https://techxplore.com/news/2023-06-ai-chip-amd-rival-nvidia.html

Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114-117.

Moore, S. (2022). MLPerf Rankings 2022. IEEE Spectrum. https://spectrum.ieee.org/mlperf-rankings-2022

Naik, A. R. (2021, August 4). Explained: NVIDIA's record-setting performance on MLPerf v1.0 training benchmarks. Analytics India Magazine. https://analyticsindiamag.com/explained-nvidias-record-setting-performance-on-mlperf-v1-0-training-benchmarks/

Narasimhan, S. (2022, June 29). NVIDIA partners sweep all categories in MLPerf AI benchmarks. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/06/29/nvidia-partners-ai-mlperf/

Narendran, S. (2023, May 11). Every major AI feature announced at Google I/O 2023. ZDNet. https://www.zdnet.com/article/every-major-ai-feature-announced-at-google-io-2023/

Naval Group. (2023, March 2). AI-powered chip design: A revolution in the semiconductor industry. Naval Group Press Room. https://www.naval-group.com/en/news/ai-powered-chip-design-a-revolution-in-the-semiconductor-industry/

Nosta, J. (2023, March 10). Stacked exponential growth: AI is outpacing Moore's law and evolutionary biology. Medium. https://johnnosta.medium.com/stacked-exponential-growth-ai-is-outpacing-moores-law-and-evolutionary-biology-12882c38b68d

Nvidia. (2023, May 2). Introducing NVIDIA Grace: A CPU specifically designed for giant-scale AI and HPC. Nvidia Newsroom. https://nvidianews.nvidia.com/news/introducing-nvidia-grace-a-cpu-specifically-designed-for-giant-scale-ai-and-hpc

Panigrahi, K. K. (2023, January 11). Difference between RISC and CISC. https://www.tutorialspoint.com/difference-between-risc-and-cisc

Peckham, O. (2022, July 7). IBM, Tokyo Electron Announce 3D Chip Stacking Breakthrough. HPCwire. https://www.hpcwire.com/2022/07/07/ibm-tokyo-electron-announce-3d-chip-stacking-breakthrough/

Platt, S. (2018, October 16). Metamorphosis of an industry, part two: Moore's Law and Dennard Scaling. https://www.micron.com/about/blog/2018/october/metamorphosis-of-an-industry-part-two-moores-law

PR Newswire. (2018). Synopsys Unveils Fusion Compiler Enabling 20 Percent Higher Quality-of-Results and 2x Faster Time-to-Results. https://www.prnewswire.com/news-releases/synopsys-unveils-fusion-compiler-enabling-20-percent-higher-quality-of-results-and-2x-faster-time-to-results-300744510.html

Precedence Research. (2022). Artificial Intelligence (AI) in Hardware Market. https://www.precedenceresearch.com/artificial-intelligence-in-hardware-market

Roach, J. (2023, June 17). Intel thinks your next CPU needs an AI processor — here’s why. Digital Trends. https://www.digitaltrends.com/computing/intel-meteor-lake-vpu-computex-2023/

Roy, R., Raiman, J., & Godil, S. (2023, April 5). Designing arithmetic circuits with deep reinforcement learning. Nvidia Developer Blog. https://developer.nvidia.com/blog/designing-arithmetic-circuits-with-deep-reinforcement-learning/

Salvator, D. (2022). Nvidia Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2022/04/06/mlperf-edge-ai-inference-orin/

Salvator, D. (2023). Inference MLPerf AI. The Official Nvidia Blog. https://blogs.nvidia.com/blog/2023/04/05/inference-mlperf-ai/

Sharma, S. (2021, December 20). 2021 Was a Breakthrough Year for AI. VentureBeat. https://venturebeat.com/ai/2021-was-a-breakthrough-year-for-ai/

Song, Y., Dhariwal, P., Chen, M., & Sutskever, I. (2023). Consistency models. arXiv preprint arXiv:2303.01469. https://arxiv.org/abs/2303.01469

Sparks, E. (2023, January 6). The state of quantum computing in 2023. The Verge. https://www.theverge.com/2023/1/6/22216734/quantum-computing-2023-update

Sweeney, T. [@TimSweeneyEpic]. (2023, April 13). Artificial intelligence is doubling at a rate much faster than Moore’s Law’s 2 years, or evolutionary biology’s 2M years. Why? Because we’re bootstrapping it on the back of both laws. And if it can feed back into its own acceleration, that’s a stacked exponential. Twitter. https://twitter.com/TimSweeneyEpic/status/1646645582583267328

Synopsys. (n.d.). DSO.ai. https://www.synopsys.com/ai/chip-design/dso-ai.html

Takahashi, D. (2021, July 22). AI’s got talent: Meet the new rising star in media and entertainment. VentureBeat. https://venturebeat.com/ais-got-talent-meet-the-new-rising-star-in-media-and-entertainment/

Tardi, C. (2023, June 17). Moore's Law. Investopedia. https://www.investopedia.com/terms/m/mooreslaw.asp

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460. doi:10.1093/mind/LIX.236.433

Varghese, G. (2022, November 28). European Chip Alliance: Uniting for a common cause. EETimes. https://www.eetimes.com/european-chip-alliance-uniting-for-a-common-cause/

Vellante, D., & Floyer, D. (2021, April 10). New era of innovation: Moore's law is not dead and AI is ready to explode. SiliconANGLE. https://siliconangle.com/2021/04/10/new-era-innovation-moores-law-not-dead-ai-ready-explode/

Vincent, J. (2021, June 10). Google is using machine learning to design its next generation of machine learning chips. The Verge. https://www.theverge.com/2021/6/10/22527476/google-machine-learning-chip-design-tpu-floorplanning

Ward-Foxton, S. (2023, February 10). AI-Powered Chip Design Goes Mainstream. EE Times. https://www.eetimes.com/ai-powered-chip-design-goes-mainstream/

*ChatGPT was used to help me outline the essay in a way that made sense, refine ideas, summarize articles to help me better understand hardware performance improvements, and format the sources into APA format. Google's Bard was used as a validator to make sure my summaries were true and accurate.*

In [7]:
submission_df = pd.read_csv("/kaggle/input/2023-kaggle-ai-report/sample_submission.csv")
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'copy/paste the exact category that you are su...
1,essay_url,'http://www.kaggle.com/your_username/your_note...
2,feedback1_url,'http://www.kaggle.com/.../your_1st_peer_feedb...
3,feedback2_url,'http://www.kaggle.com/.../your_2nd_peer_feedb...
4,feedback3_url,'http://www.kaggle.com/.../your_3rd_peer_feedb...


In [8]:
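# Placeholder values from the sample submission; replace each one with the
# real essay category and URLs before submitting.
val = ["'Other'", "http://www.kaggle.com/your_username/your_public_notebook",
       "http://www.kaggle.com/.../your_1st_peer_feedback",
       "http://www.kaggle.com/.../your_2nd_peer_feedback",
       "http://www.kaggle.com/.../your_3rd_peer_feedback"]
# Bracket notation makes it explicit that an existing column is being updated.
submission_df["value"] = val
submission_df.to_csv("submission.csv", index=False)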

In [9]:
submission_df.head()

Unnamed: 0,type,value
0,essay_category,'Other'
1,essay_url,http://www.kaggle.com/your_username/your_publi...
2,feedback1_url,http://www.kaggle.com/.../your_1st_peer_feedback
3,feedback2_url,http://www.kaggle.com/.../your_2nd_peer_feedback
4,feedback3_url,http://www.kaggle.com/.../your_3rd_peer_feedback
