# In-memory computing with resistive switching devices

Daniele Ielmini<sup>1</sup> and H.-S. Philip Wong<sup>2</sup>\*

Modern computers are based on the von Neumann architecture in which computation and storage are physically separated: data are fetched from the memory unit, shuttled to the processing unit (where computation takes place) and then shuttled back to the memory unit to be stored. The rate at which data can be transferred between the processing unit and the memory unit represents a fundamental limitation of modern computers, known as the memory wall. In-memory computing is an approach that attempts to address this issue by designing systems that compute within the memory, thus eliminating the energy-intensive and time-consuming data movement that plagues current designs. Here we review the development of in-memory computing using resistive switching devices, where the two-terminal structure of the devices, their resistive switching properties, and direct data processing in the memory can enable area- and energy-efficient computation. We examine the different digital, analogue, and stochastic computing schemes that have been proposed, and explore the microscopic physical mechanisms involved. Finally, we discuss the challenges in-memory computing faces, including the required scaling characteristics, in delivering next-generation computing.

Over the past 50 years, progress in computing and information technology was based on the downscaling of the metal-oxide-semiconductor field-effect transistor (MOSFET), which served as the workhorse of the semiconductor industry for analogue and digital circuits. This downscaling enabled digital complementary metal-oxide-semiconductor (CMOS) systems to sustain an exponential increase of the operating frequency and the number of devices per area at each technology generation<sup>1</sup>. Today, however, the operation frequency and device density have reached a plateau, which stems from at least two barriers: the dissipated power is so large that the temperature increase on the chip cannot be sustained without significant performance degradation<sup>2</sup>; and there exists an increasing performance gap between the central processing unit (where the data is processed) and the computer memory (where data is stored), which is known as the memory wall<sup>3</sup>. It has been estimated that, for many computing tasks, most of the energy and time are consumed in data movement, rather than computation<sup>4</sup>. These problems are expected to be exacerbated as applications become more data-centric, where computing tasks consist of machine-learning operations such as object, image, and speech recognition.

Modern technologies are tackling these barriers from many angles, from the component level to the systems architecture design. Measures include the extensive use of parallelism, such as the graphics processing unit (GPU), which enhances parallelism by using many cores (even more than 100), each with a dedicated or shared high-throughput connection with the memory. Also, application-specific processors known as accelerators are designed to match the exact computing algorithms and data flow<sup>5</sup>. For instance, the tensor processing unit (TPU) has been recently developed for accelerating the multiply-accumulate (MAC) operation, which constitutes the major workload in the inference phase of neural networks in data centres for image and speech recognition<sup>6</sup>. Another solution is the introduction of memory chips with enhanced bandwidth, such as the hybrid memory cube (HMC)<sup>7</sup>, and high bandwidth memory (HBM)<sup>8</sup>, where high data-transfer rate and high memory density are achieved by stacking multiple memory chips connected by through-silicon via (TSV) interconnects.

New and emerging non-volatile memory concepts have also been introduced into the traditional memory hierarchy to reduce the 'distance' between computing and the data<sup>9</sup>. These new memories, which are grouped under the name resistive switching devices in this work, have unique storage principles that are not based on charge, as in conventional flash memory and random access memory (RAM), such as static RAM (SRAM) and dynamic RAM (DRAM). The storage concept relies instead on the physics of the active materials and the device where they are integrated. These memories include resistance switching RAM (RRAM)<sup>10</sup>, phase change memory (PCM)<sup>11</sup>, magnetoresistive RAM (MRAM)<sup>12</sup>, and ferroelectric RAM (FeRAM)<sup>13</sup>. Although some of these memories have led to commercial technologies that are available on the market<sup>14</sup>, they are still too slow, have limited data bandwidth, or are too expensive to contribute significantly to solving the memory bottleneck.

Instead of re-engineering conventional systems by individual improvements in parallelism, memory bandwidth, or memory concept, in-memory computing aims to radically subvert the von Neumann architecture by carrying out calculations in situ, exactly where the data are located<sup>15</sup>. This approach is similar to the computing scheme in the human brain, where information is processed in sparse networks of neurons and synapses, without any physical separation between computation and memory<sup>16</sup>. In-memory computing offers a clear advantage by totally removing the latency and energy burdens of the memory wall. However, this new architecture requires computational memory devices that can both store data and compute at the same time, usually by device physics or other physical laws, such as Ohm's law and Kirchhoff's law in electrical circuits.

In this Review Article we examine the in-memory computing schemes that have been proposed in both digital and analogue spaces, covering the device physics, the processing algorithms, and the circuit architectures that perform computing tasks within memory.

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IU.NET, Milan, Italy. Department of Electrical Engineering and Stanford SystemX Alliance, Stanford University, Stanford, CA, USA. \*e-mail: daniele.ielmini@polimi.it; hspwong@stanford.edu

## Computational memory technologies

In-memory computing generally requires fast, high-density, low-power, scalable memory devices, such as RRAM, PCM, MRAM, and FeRAM, which are sketched in Fig. 1. All of these devices are two-terminal elements, where the application of a voltage results in a change of the material's property. For instance, RRAM (Fig. 1a) consists of a metal-insulator-metal (MIM) stack, where a filamentary path is initially created by soft electrical breakdown, or forming, induced by the application of a voltage. The defects present at large concentration, for example, oxygen vacancies in metal oxides<sup>17</sup> or metallic ions injected from the electrodes<sup>18</sup>, are then driven by field-induced migration and diffusion in this conductive filament (CF)<sup>19</sup>. Application of a positive voltage to the top electrode, where the defects are concentrated with higher density, induces defect migration toward the bottom electrode, thus causing the transition to the low-resistance state (LRS), due to enhanced conduction at defect sites. Application of a negative voltage induces defect migration back to the top electrode, thus causing the transition to the high-resistance state (HRS) due to the disconnection of the CF. These transitions can be seen in the idealized current-voltage (*I-V*) characteristic in Fig. 1b, where the set transition to the LRS and the reset transition to the HRS occur at opposite voltages. Similar to the bipolar RRAM concept shown in Fig. 1b, unipolar RRAMs have also been presented, where the set and reset processes both occur under the same voltage polarity because of the dominant role of Joule heating in creating and dissolving the CF<sup>20,21</sup>. Also, nonfilamentary switching has been demonstrated in RRAM by interface switching, where the voltage-induced defect migration results in a uniform change of a Schottky or tunnelling barrier across the whole device area<sup>22</sup>. All of these devices rely on the diffusion and migration of defects and will be referred to as RRAM throughout this Review Article.

In PCM (Fig. 1c), the active material is a chalcogenide phase-change material, such as Ge<sub>2</sub>Sb<sub>2</sub>Te<sub>5</sub> (ref. <sup>23</sup>), which can remain in either crystalline or amorphous states for long periods of time, for example, ten years at moderately high temperature. Starting from the amorphous state, the application of voltage pulses with relatively low amplitude causes crystallization induced by Joule heating (Fig. 1d), whereas the application of pulses at higher amplitudes can lead to local melting and consequent amorphization. The crystalline phase has a low resistance because of the large concentration of carriers, while the amorphous phase has a high resistance due to its intrinsic semiconductor nature originating from Fermi level pinning at mid gap<sup>24</sup>. A typical PCM cell has a mushroom shape, as shown in Fig. 1c, where the pillar-like bottom electrode confines heat and current, thus resulting in a hemispherical shape of the molten material. Pore-like PCM cells have been shown to be more energy efficient and more scalable than mushroom cells, thanks to a better confinement of heat and current<sup>25</sup>.

RRAM and PCM share many similar features, which make them quite promising technologies for in-memory computing. First, they are both non-volatile, and the resistance ratio is generally greater than ten, which allows clear discrimination between digital '0' and '1', as well as making multilevel operations possible. Both devices operate at moderately high switching speed (typically below 100 ns and even in the sub-ns regime<sup>26,27</sup>). Finally, they both display a better endurance compared to conventional flash storage devices<sup>28</sup>. On the other hand, the morphology of the material change is different in the two cases, namely filamentary switching in the case of RRAM versus thermally induced volume change in PCM. Also, the switching phenomena are different in that RRAM states are chemically distinct because of redox reactions and migration, whereas the PCM phases are only physically different (that is, there is no change in material composition).



**Fig. 1 | Computational memory devices. a,b**, Resistive switching random access memory structure (**a**) and current-voltage characteristic (**b**) of a bipolar switching device. The set transition from the high resistance state (HRS) to the low resistance state (LRS) occurs at positive voltage due to the formation of a filament shunting the top and bottom electrodes, while the reset transition from the LRS to the HRS under negative voltage indicates the disconnection of the voltage-induced filament. **c,d**, Phase change memory structure (**c**) and resistance change characteristic (**d**), showing the resistance measured after a voltage pulse is applied to a phase change memory device in the amorphous state. The decrease of resistance indicates increasing crystallized volume in the active material, while the increase of resistance above the melting-point voltage ($V_m$) indicates increasing amorphous volume. **e,f**, Magnetic tunnel junction structure (**e**) and resistance-voltage characteristic (**f**) of a spin transfer torque magnetic random access memory device. The parallel (P) and antiparallel (AP) states have low and high resistance, respectively, which can be attained at positive and negative voltage, respectively. **g,h**, Ferroelectric random access memory structure (**g**) and polarization-voltage hysteretic characteristic (**h**). The orientation of electrical dipoles causes permanent polarization of the ferroelectric layer. Application of a voltage above the coercive voltage $V_c$ leads to dipole reorientation with positive remnant polarization $+P_r$.

Figure 1e shows a magnetic tunnel junction (MTJ), which is the building block for most MRAM devices. The MTJ consists of an MIM structure with two ferromagnetic metal layers, for example, the CoFeB alloy, and a thin tunnel oxide, for example, MgO. The two ferromagnetic layers are referred to as either the pinned layer (where the magnetic polarization is structurally fixed to act as a reference) or the free layer (where the magnetic polarization is free to change upon programming). The two ferromagnetic polarizations can thus be either parallel (same direction) or antiparallel (opposite direction), which results in a low or a high resistance of the MTJ, respectively, due to the tunnel magnetoresistive effect<sup>29</sup>. To flip the state of the MTJ, the spin transfer torque (STT) has emerged in recent years as a scalable, low-energy mechanism<sup>30</sup>. In STT-MRAM, the transition to the parallel state takes place directly by conduction electrons, which are first spin-polarized by the pinned layer, then rotate the magnetic polarization of the free layer by angular momentum conservation<sup>31</sup>. To rotate the free layer magnetization into the antiparallel state, an opposite voltage, hence current direction, is needed, as shown in Fig. 1f. The relative change of resistance, also called the magnetoresistance ratio, is typically around 200%, or a factor of three<sup>32</sup>. STT-MRAM is characterized by fast switching, with switching times that can be below 1 ns, and by a high endurance above 10<sup>14</sup> cycles (ref. <sup>33</sup>).

RRAM, PCM and MRAM are all resistive switching memories, as the physical switching is reflected in a change of resistance. On the other hand, FeRAM (shown in Fig. 1g) relies on the polarization switching in a ferroelectric material, such as a perovskite material<sup>13</sup>, or doped-HfO<sub>2</sub> (ref. <sup>34</sup>). Here, the individual ferroelectric dipoles change their orientation in response to the voltage applied to the MIM stack (Fig. 1h). Instead of impacting the MIM stack resistance, ferroelectric switching changes the charge induced on the metallic electrodes of the MIM capacitance, which can be sensed by integrating the current over a voltage sweep. To gain a resistance change by ferroelectric switching, one should adopt a ferroelectric field-effect transistor (FeFET) structure with three terminals, where the change in dielectric polarization causes a variation in the resistance of the FeFET channel<sup>35</sup>. The resistance change in the FeFET thus enables in-memory computing elements such as neuromorphic synapses<sup>36</sup>.

## Digital computing by binary resistive switching

In the last 20 years, in-memory digital computing has focused on identifying novel logic gate concepts with lower energy and area consumption. Digital computing with nanomagnets<sup>37</sup>, quantum dots<sup>38</sup>, and even single atoms<sup>39</sup> has been demonstrated experimentally, although the control of the individual cells via voltage and current signals appears challenging. Resistive switching devices, such as RRAM, offer several advantages in digital computing, such as direct access by interconnect lines, the capability to electrically reconfigure the device, and nanoscale miniaturization<sup>40</sup>. Figure 2 shows various options to carry out digital Boolean operations with RRAM, differing by the type of input, the type of output, and the physical operation used to implement the logic function. In the logic gate of Fig. 2a, the two input states $X_1$ and $X_2$ are represented by the voltage values applied to the top and bottom electrodes, respectively, while the output of the logic operation is stored as the resistance of the physical element, therefore the scheme will be referred to as the V-R logic gate<sup>41</sup>. The computing element is a bipolar-switching RRAM device, where the application of a positive voltage to the top electrode leads to a transition to the HRS, and the application of a negative voltage to the top electrode leads to a transition to the LRS. (To represent the switching polarity, the RRAM device is drawn as an arrow pointing to the electrode being biased to negative during the set transition to the LRS.) The output of the computation is the resistive state, namely a logic value 0 for the HRS, and 1 for the LRS, where the RRAM device is initially prepared in state 1. The logic gate behaves as follows: if the input logic voltages are equal, namely $X_1=X_2=0$, or $X_1=X_2=1$, then the overall voltage drop across the RRAM device is zero, thus the RRAM state remains unchanged (Y=1; see the truth table in Fig. 2b). On the other hand, the configuration $X_1=1$ and $X_2=0$ causes a transition to the HRS, hence Y=0. Finally, the configuration $X_1=0$ and $X_2=1$ is ineffective as the device is already in state 1. The resulting logic operation is the material implication (IMP), where the output is always 1, except for the condition $X_1=1$ and $X_2=0$, where the logic implication is not satisfied. Since IMP is functionally complete, all 16 Boolean functions can be realized by suitable combinations of more IMP operations<sup>41</sup>. The V-R gate can also be generalized as a majority gate if the initial state of the RRAM device is also considered as a possible input value<sup>42</sup>. The V-R concept can also be generalized to serial/parallel arrangements of more resistive switches to perform conventional logic operations (for example, AND) in just one step<sup>43</sup>.
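The behaviour of the V-R gate described above can be sketched as a small behavioural model. This is an idealized illustration, not a device simulation: the function name `vr_gate` and the use of unit logic voltages are assumptions made here, and the switching polarity follows the convention stated in the text (positive top-electrode bias resets to the HRS, negative bias sets to the LRS).

```python
from itertools import product

def vr_gate(x1: int, x2: int, state: int = 1) -> int:
    """Idealized V-R gate: x1 drives the top electrode, x2 the bottom one.

    The device starts in `state` (1 = LRS, 0 = HRS) and the return value
    is the resistive state after the voltages are applied.
    """
    drop = x1 - x2          # net voltage across the device, in logic units
    if drop > 0:
        return 0            # positive top-electrode bias: reset to HRS
    if drop < 0:
        return 1            # negative top-electrode bias: set to LRS
    return state            # zero voltage drop: state is unchanged

# Truth table with the device initially prepared in state 1.
truth_table = {(x1, x2): vr_gate(x1, x2) for x1, x2 in product((0, 1), repeat=2)}
for inputs, y in truth_table.items():
    print(inputs, "->", y)
```

With the device initialized to 1, the table matches material implication, that is, Y = (NOT $X_1$) OR $X_2$.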

The V-R logic gate in Fig. 2a is a non-volatile concept, in that the output state remains stored as the resistive state without any voltage bias, thus allowing a considerable saving of static power. On the other hand, the sequential cascade of two operations, where the first gate's output is directly used as the second gate's input, is impossible, as input and output signals are physically different<sup>44</sup>. Converting the output resistance into an input voltage can be achieved by additional circuits, typically located out of the memory area, which however increase the size, complexity and power consumption of the computing system. As a result, the V-R gate cannot be ascribed to fully in-memory logic schemes, as the computing flow must 'exit' the memory circuit to convert resistive states into input voltage values.

Figure 2c shows the V-V logic gate, also referred to as a threshold logic unit. In V-V logic gates, both input and output values are described by digital voltages, being either low or high to represent a 0 or a 1, respectively<sup>45</sup>. The V-V logic gate can be viewed as a one-layer neural network, where any input voltage $V_i$ stimulates a current $I_i$ given by Ohm's law $I_i = G_i(V_i - V_{\text{com}})$, where $G_i$ is the conductance of the $i$th resistive switch and $V_{\text{com}}$ is the potential of the common node in Fig. 2c. Currents are then summed by Kirchhoff's law at the common node, thus leading to $V_{\text{com}} = R_{\text{L}} \sum_i I_i = R_{\text{L}} \sum_i G_i (V_i - V_{\text{com}})$, where $R_{\text{L}}$ is the load resistance connecting the common node to ground. The common voltage is thus given by $V_{\text{com}} = \sum_i G_i V_i / (1/R_{\text{L}} + \sum_i G_i)$, which describes a weighted sum of input voltages. The common node is usually connected to a rectifying stage, such as a comparator, which restores a digital value for the output $V_{\text{out}}$, given by $V_{\text{out}} = f(V_{\text{com}} - V_{\text{T}})$, where $f$ is a highly nonlinear function, and $V_{\text{T}}$ is an internal threshold voltage, hence the name threshold logic. The Boolean function is thus described by the input/output characteristic in Fig. 2d, where all input values are linearly separated between configurations yielding an output of 0 or 1. The position of the separating line is dictated by the weights $G_i$ and by $V_{\text{T}}$, which are carefully tuned to obtain any generic linearly separable Boolean function (AND in Fig. 2d)<sup>46</sup>. Similar to the V-R logic scheme, the comparator is a relatively large circuit that must be located out of the memory area. Also, each input voltage must be obtained by a conversion from a stored value (typically a resistance state *R*), while the resistive switch simply stores the weight for the logic operation, that is, part of the information required to execute a program code, rather than the input/output values themselves. On the other hand, cascading is possible in V-V logic, as input and output voltages share the same physical nature and amplitude range.
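As a numerical illustration of the weighted-sum expression for $V_{\text{com}}$, the following sketch evaluates a two-input threshold gate. The component values (100 μS conductances, a 10 kΩ load, a 1 V logic-high level) and the step-function comparator are assumptions chosen for illustration only; they are not taken from the cited works.

```python
from itertools import product

def v_com(voltages, conductances, r_load):
    """Common-node voltage: sum(G_i * V_i) / (1/R_L + sum(G_i))."""
    numerator = sum(g * v for g, v in zip(conductances, voltages))
    denominator = 1.0 / r_load + sum(conductances)
    return numerator / denominator

def threshold_gate(inputs, conductances, r_load, v_t, v_high=1.0):
    """Map logic inputs to voltages, then compare V_com with V_T."""
    voltages = [v_high * x for x in inputs]
    # The comparator is modelled as an ideal step function (assumption).
    return 1 if v_com(voltages, conductances, r_load) > v_t else 0

G = [1e-4, 1e-4]   # siemens, hypothetical equal weights
R_L = 1e4          # ohms, hypothetical load resistance

# With these values V_com is 0, 1/3 or 2/3 V for zero, one or two high inputs.
and_table = {x: threshold_gate(x, G, R_L, v_t=0.5) for x in product((0, 1), repeat=2)}
or_table = {x: threshold_gate(x, G, R_L, v_t=0.15) for x in product((0, 1), repeat=2)}
print("AND:", and_table)
print("OR: ", or_table)
```

Moving only the threshold $V_{\text{T}}$ turns the same gate from AND into OR, which illustrates how the separating line is set by the weights and the threshold.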

Figure 2e shows R–R logic, where both the input and output values are the resistive states of the memory elements, and the logic operation is carried out within the memory<sup>47</sup>. This is a true, cascadable in-memory operation, which is also referred to as stateful logic<sup>48,49</sup>, as it relies on the non-volatile states of the resistive elements. Similar to V–V logic, logic computation is carried out based on physical laws, such as Ohm's law and Kirchhoff's law, except that the comparator function is taken by one of the resistive units in the logic gate, thus enabling true in-memory computation.



**Fig. 2 | RRAM-based digital logic gates. a,b**, V-R logic gate (**a**) and corresponding truth table for material implication (IMP) operation (**b**). The V-R logic gate consists of a single resistive switch, where the input signals are the applied voltages at the two ends of the device ($X_1$ and $X_2$) and the output signal is the switch conductance state (Y). **c**,**d**, V-V logic, also known as the threshold logic gate (**c**), and the corresponding input/output characteristic for AND operation (**d**). Input and output signals are the applied voltages at the input nodes ($X_1$ and $X_2$) and the output of the comparator stage (Y), respectively. The four configurations of input values can be linearly separated (see dashed line in **d**) according to the weights $G_i$ and the comparator threshold $V_T$, thus yielding a reconfigurable Boolean function. The input/output characteristic indicates an AND function, where low and high values of Y are indicated as filled and open symbols, respectively. **e**,**f**, Parallel R-R stateful logic gate for IMP operation (**e**) and corresponding truth table (**f**). The input variables $X_1$ and $X_2$ are the initial resistance states of the RRAM devices, while the output variable Y is the final resistance state of the RRAM that changes state. Unconditional set transition occurs for $X_1 = 0$, while no switching takes place for $X_1 = 1$, thus resulting in an IMP operation. **g**,**h**, Serial R-R stateful logic gate for OR operation (**g**) and corresponding truth table (**h**). The input variables $X_1$ and $X_2$ are the initial states of the RRAM devices, while the output variable is the final resistance of either RRAM. Voltages $V_A$ and $V_A$ are applied to the top and bottom lines, whereas the intermediate line remains floating. A conditional set transition from 0 to 1 takes place for odd input states, thus resulting in an OR operation. R-R logic is the only true in-memory option, as it is fully resident in the memory circuit.

For instance, in the IMP gate of Fig. 2e, two resistive switches in parallel configuration with input states $X_1$ and $X_2$ are biased with voltages $V_{\rm set} - \Delta$ and $V_{\rm set} + \Delta$, respectively, where $V_{\rm set}$ is the nominal voltage to induce a set transition, and $\Delta$ is a relatively small fraction, for example, 10% of $V_{\rm set}$. If $X_1$ is in the HRS (input logic value 0), the voltage $V_{\rm set} + \Delta$ will drop entirely across $X_2$, thus leading to an unconditional set operation. As a result, the output state, which is $X_2$ at the end of the computation, or Y, is unconditionally equal to 1 (see the truth table in Fig. 2f). On the other hand, if $X_1$ is in the LRS (input logic value 1), the voltage across $X_1$ and $X_2$ will be only $2\Delta$, thus insufficient for the switching of either $X_1$ or $X_2$, that is, the input states will remain unchanged, thus resulting in the IMP operation. More Boolean functions can be obtained by sequentially repeating IMP on more devices<sup>47</sup>.
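The stateful IMP operation can be checked with a simple nodal calculation. The sketch below assumes a circuit detail that is not spelled out in the text: both switches share a common node that is tied to ground through a load resistor `R_LOAD`, with $R_{\rm LRS} \ll R_{\rm LOAD} \ll R_{\rm HRS}$. All resistance and voltage values are hypothetical, and only the set transition of $X_2$ is modelled.

```python
from itertools import product

# Hypothetical device and circuit parameters (assumptions for illustration).
R_LRS, R_HRS, R_LOAD = 1e3, 1e6, 30e3   # ohms
V_SET, DELTA = 1.0, 0.1                 # volts; DELTA is 10% of V_SET

def conductance(state: int) -> float:
    """Conductance of a switch in the LRS (state 1) or HRS (state 0)."""
    return 1.0 / (R_LRS if state else R_HRS)

def imp_gate(x1: int, x2: int) -> int:
    """Return the final state of X2 after biasing X1 and X2."""
    g1, g2, g_load = conductance(x1), conductance(x2), 1.0 / R_LOAD
    v1, v2 = V_SET - DELTA, V_SET + DELTA
    # Kirchhoff's current law at the shared node (load resistor to ground).
    v_node = (g1 * v1 + g2 * v2) / (g1 + g2 + g_load)
    drop_x2 = v2 - v_node
    # X2 is set if the voltage across it reaches V_SET; otherwise it keeps its state.
    return 1 if (x2 == 1 or drop_x2 >= V_SET) else 0

imp_table = {(x1, x2): imp_gate(x1, x2) for x1, x2 in product((0, 1), repeat=2)}
print(imp_table)
```

When $X_1$ is in the LRS, the shared node is pulled close to $V_{\rm set} - \Delta$, so the drop across $X_2$ is roughly $2\Delta$ and no switching occurs, in line with the description above.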

Similar stateful logic gates were proposed by changing the circuit architecture, for example, adopting a serial switch arrangement of more resistive switches in parallel<sup>51,52</sup>. For instance, Fig. 2g shows an OR logic gate consisting of two serially connected resistive switches, where the intermediate node is left floating, that is, free to change its potential according to the voltage divider made by the two switches. If the two input states are equal, for example, $X_1 = X_2 = 0$, then the voltage divides equally across the two devices, thus remaining below the threshold $V_{\rm set}$ for set transition. On the other hand, if only one of the two input devices is high, the other input with low conductance will have a large voltage drop across it, thus inducing a set transition. This operation yields an OR function with either switch serving as the output. Other functions, such as IMP and logic inversion (NOT), can be realized with the same architecture but different applied voltages<sup>50</sup>. The *R-R* logic concept has been extended to other memory devices, such as PCM<sup>53</sup> and STT-MRAM<sup>54</sup>, thus confirming the universal application of digital in-memory computing.
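The serial OR gate can be illustrated with a voltage-divider model. The sketch assumes that the total applied voltage lies between $V_{\rm set}$ and $2V_{\rm set}$, so that an equal split stays below threshold; the resistance values and the 1.5 V bias are hypothetical, and the devices are treated as ideal threshold switches.

```python
from itertools import product

# Hypothetical parameters (assumptions for illustration).
R_LRS, R_HRS = 1e3, 1e6        # ohms
V_SET, V_APPLIED = 1.0, 1.5    # volts, with V_SET < V_APPLIED < 2 * V_SET

def resistance(state: int) -> float:
    return R_LRS if state else R_HRS

def serial_or(x1: int, x2: int):
    """Return the final states of both switches after the bias is applied."""
    r1, r2 = resistance(x1), resistance(x2)
    # The floating intermediate node follows the resistive voltage divider.
    v1 = V_APPLIED * r1 / (r1 + r2)
    v2 = V_APPLIED - v1
    y1 = 1 if (x1 == 1 or v1 >= V_SET) else 0
    y2 = 1 if (x2 == 1 or v2 >= V_SET) else 0
    return y1, y2

or_states = {(x1, x2): serial_or(x1, x2) for x1, x2 in product((0, 1), repeat=2)}
print(or_states)
```

For unequal inputs nearly the whole bias drops across the HRS device, which is then set, so both switches end in the state $X_1$ OR $X_2$ and either can be read as the output.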

Stateful R-R logic gates have several advantages over V-V and V-R schemes, including the possibility of sequentially cascading multiple operations, the reconfiguration of the Boolean function by the applied voltage pulses, and true in-memory processing capability. Both the data and the code, containing the type of operation and the data address, can be stored in the same memory circuit, for example, a crosspoint RRAM array. The code can be read and executed on data within the memory, thus overcoming the typical memory bottleneck of today's computing architecture<sup>55</sup>. A key limitation of in-memory digital computing is the time and energy burden due to the physical switching process within the device. Resistive switching in RRAMs today requires a voltage of at least about 1 V, with a current consumption in the range of 10 μA and a time of about 10 ns, which yields an energy of approximately 0.1 pJ per operation. For comparison, this is the same energy that is consumed in the 45 nm CMOS technology for a 32-bit integer addition, which consists of many individual Boolean logic steps<sup>4</sup>. Even lower energy consumption is estimated for advanced technology nodes, for example, only a few fJ for an 8-bit addition<sup>56</sup> in a 7 nm CMOS generation. Such a large gap stems from the physics of the device itself, which involves electron drift and capacitive charging in CMOS logic gates, while RRAM relies on the motion of ionic species, which requires a relatively large local electric field and temperature, caused by Joule heating, for their hopping migration<sup>19</sup>. The high electric field and local temperature in RRAM switching also result in significant degradation, eventually inducing an irreversible breakdown of the MIM interfaces<sup>57</sup>. This is a major limitation compared to charge-based CMOS logic circuits, where device degradation is almost negligible within the expected lifetime, for example, 10<sup>16</sup> cycles in the case of a SRAM embedded in the same chip as the CPU. The relatively large operating current can cause unwanted ohmic voltage drops along the signal line, which can be avoided by increasing the width of the metallic interconnect, at the cost of losing some of the high-density advantages of the crosspoint memory architecture. An additional area overhead is taken by the periphery control logic, including latches for synchronous propagation of the signal and row/column multiplexers. Neither issue has been adequately addressed in the literature so far.
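The switching-energy figure quoted above follows from the product of voltage, current and time. The short check below reproduces that arithmetic, assuming a rectangular pulse with constant voltage and current, which is a simplification of a real switching transient.

```python
def switching_energy(voltage_v: float, current_a: float, time_s: float) -> float:
    """Energy in joules of a rectangular pulse: E = V * I * t."""
    return voltage_v * current_a * time_s

# Values quoted in the text: about 1 V, 10 uA and 10 ns.
energy_j = switching_energy(1.0, 10e-6, 10e-9)
energy_pj = energy_j * 1e12
print(f"{energy_pj:.2f} pJ per switching operation")
```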

## Computing by cumulative resistive changes

Digital computing with RRAM generally takes advantage of a binary transition, for example, a set transition from the HRS to the LRS. More degrees of freedom can be gained by extending to the multilevel domain, where the application of repeated pulses induces a controllable, fractional variation of the device resistance. This is shown in Fig. 3 for a PCM device, where the set transition consists of a gradual crystallization of an amorphous region, and an increasing crystallization fraction results in a decreasing resistance. Figure 3a shows the evolution of a PCM device from an amorphous phase to a crystalline phase as the number of pulses is increased. A simulated temperature profile during programming and the distribution of amorphous/crystalline phases within a PCM device for an increasing crystallization time have been reported in ref. <sup>53</sup>. As the time increases, the thickness of the amorphous material decreases because of crystallization, thus causing a decrease of the threshold voltage $V_{\rm T}$ and the device resistance.

Gradual resistive switching is a key concept for analogue computing. It can, for example, enable arithmetic summation<sup>57</sup> (Fig. 3b). Here, the addition '3 + 4' is carried out in a single PCM element, used as a nanoscale abacus<sup>58</sup>. First, the two limits of the conductance scale are defined: the HRS corresponding to the fully amorphized PCM after the reset operation, and the LRS corresponding to the partial crystalline state obtained after the repetition of N pulses, for example, N=8 in Fig. 3b. The device is thus initialized in the HRS with resistance  $R_0$ , and a number of pulses corresponding to the first addend and the second addend are applied, that is, three pulses in the first stage, and four pulses in the second stage. Finally, the number of pulses to reach the LRS resistance  $R_8$  is evaluated by a program/verify loop, which yields the N-complement of the correct solution, for example, 1=8-(4+3) in Fig. 3b (ref. <sup>57</sup>).
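The abacus-style addition can be sketched with an idealized counter model. The class `PCMAbacus` and its methods are hypothetical names introduced here; the sketch assumes that every pulse advances the cell by exactly one of N equally spaced levels, which ignores the stochastic nature of crystallization, and that the sum does not exceed N.

```python
class PCMAbacus:
    """Idealized PCM cell with N discrete levels between HRS (R_0) and LRS (R_N)."""

    def __init__(self, n_levels: int = 8):
        self.n_levels = n_levels
        self.level = 0                      # 0 = fully amorphous HRS

    def pulse(self, count: int = 1) -> None:
        """Apply `count` crystallizing pulses; the state saturates at the LRS."""
        self.level = min(self.level + count, self.n_levels)

    def is_lrs(self) -> bool:
        return self.level >= self.n_levels

def add(a: int, b: int, n_levels: int = 8) -> int:
    """Add a and b (with a + b <= n_levels) by pulse accumulation."""
    cell = PCMAbacus(n_levels)
    cell.pulse(a)                           # first addend
    cell.pulse(b)                           # second addend
    # Program/verify loop: count the pulses still needed to reach the LRS.
    remaining = 0
    while not cell.is_lrs():
        cell.pulse()
        remaining += 1
    # `remaining` is the N-complement of the sum.
    return n_levels - remaining

result = add(3, 4)
print(result)
```

For the example in the text, one extra pulse is needed to reach $R_8$, so the readout gives 8 − 1 = 7.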

The concept of the accumulating PCM counter can be extended to several applications in the realm of analogue computing, such as the decomposition in prime factors<sup>59</sup>, the gradual potentiation of a PCM artificial synapse<sup>60,61</sup>, and the logic summation, namely digital OR within a single PCM<sup>53</sup>. In the latter case, the digital operation is executed by sequential pulses, rather than a single step as in Fig. 2, where each pulse is counted by the PCM for digital addition in a cascadable R-R logic gate. Summation can also be extended to the concept of integration of analogue pulses, or spikes, which is an essential feature of integrate-and-fire neurons in spiking neural networks<sup>57,62</sup>. This is illustrated by Fig. 3c, showing the concept of an artificial neuron receiving weighted spiking signals from several pre-synaptic sources. The spikes are summed, for example, by Kirchhoff's law of summation of currents at the input of the neuron circuit, and integrated within a PCM, as shown in Fig. 3d (ref. <sup>62</sup>). The PCM accumulates the incoming spikes, eventually hitting the LRS threshold, which triggers the generation of a fire event, namely an output spiking signal. A key advantage of the PCM neuron is its nanoscale miniaturization, as opposed to the conventional capacitor-based charge integration, where the capacitor typically occupies a relatively large area of the circuit, for example, about 60 μm<sup>2</sup> pF<sup>−1</sup> in a 28 nm CMOS technology<sup>63</sup>, thus limiting the maximum number of neurons in a neuromorphic chip. Similar concepts of integrating neurons were shown by adopting threshold switching in a Mott insulator<sup>64</sup> or volatile RRAM with unstable Ag filaments<sup>65</sup>. In these works, the spike-accumulating device spontaneously returns to the off-state after firing, in contrast to the PCM neuron, which must instead be reset to return to the off-state<sup>62</sup>. Pulse accumulation in nanoscale elements has been demonstrated in RRAM devices, where repeated pulses cause incremental reset transitions because of the gradual increase of the depleted gap disconnecting the conductive path<sup>66</sup>. Cumulative change of resistance has been adopted for analogue synapses in artificial neural networks using both filamentary<sup>67-69</sup> and interface-switching RRAM devices<sup>70</sup>, as well as ferroelectric devices<sup>71</sup> and domain-wall STT-MRAM devices<sup>72</sup>. Weight update in floating gate synapses has also been demonstrated by charge integration<sup>73</sup>, although this implementation suffers from typically large operation voltages and an expensive double-poly integration process.
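The integrate-and-fire behaviour of the PCM neuron can be sketched as an accumulator with an explicit reset. The class name `PCMNeuron`, the normalized state variable and the weight values are assumptions for illustration; the model treats crystallization as a deterministic, linear accumulation, which a real device only approximates.

```python
class PCMNeuron:
    """Idealized integrate-and-fire neuron with a PCM-like accumulating state."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold   # normalized state at which the LRS is reached
        self.state = 0.0             # 0.0 = reset (amorphous, HRS)

    def step(self, spikes, weights) -> int:
        """Integrate one time step of weighted input spikes; return 1 on fire."""
        # Weighted spikes are summed as currents at the neuron input.
        total = sum(w * s for w, s in zip(weights, spikes))
        self.state += max(total, 0.0)
        if self.state >= self.threshold:
            # Unlike a volatile device, the PCM must be reset explicitly.
            self.state = 0.0
            return 1
        return 0

neuron = PCMNeuron(threshold=1.0)
weights = [0.25, 0.25]               # hypothetical synaptic weights
spike_train = [(1, 1)] * 5           # both inputs spike at every time step
output = [neuron.step(spikes, weights) for spikes in spike_train]
print(output)
```

Each time step adds 0.5 to the state, so the neuron fires on every second step and is then reset.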

# Stochastic random bit generation

A significant drawback of resistive memory devices for both memory and computing is their stochastic variation, which originates from microscopic switching mechanisms. For instance, the set transition in a RRAM consists of the field- and temperature-activated migration of defects, each moving along a different path with a correspondingly different energy barrier. As a result, the set voltage changes from cycle to cycle even for the same device, due to the multiple variables (migration barriers, local configurations, concentration gradient and so on) affecting the set process. In addition, the conductive filament formed by the set transition also changes from cycle to cycle, thus causing a variable resistance of the LRS<sup>74</sup>. Similarly, the reset voltage and the HRS resistance stochastically vary, causing errors in the digital operations shown in Figs. 2 and 3. For instance, the IMP logic gate in Fig. 2c relies on the repeatability of $V_{\text{set}}$, in terms of both the cycle-to-cycle variability and a suitable degree of matching of the device characteristics between the two cells of the logic gate. Similarly, the accuracy of the PCM arithmetic adder in Fig. 3b depends on the repeatability of pulse-induced crystallization, which is inherently stochastic due to the random atomic configurations within the amorphous phases<sup>75</sup>.

Variability can be turned into a resource in a random number generator (RNG). Although random number generation is not strictly an in-memory computing tool, it has an important function in cryptography and data security. The generation of random keys is also instrumental in the physical unclonable function (PUF), a one-way function for the authentication of hardware chips<sup>76</sup>. The PUF provides a response to an external challenge and the function generating the response cannot be captured or cloned, thus preventing chip counterfeiting and hacking<sup>77</sup>. Random number generation is also an enabling tool in probabilistic spiking neural networks, where noise is used as a resource to mimic the stochastic release of synaptic neurotransmitters, or the stochastic opening and closing of membrane channels<sup>78</sup>. The conventional schemes for generating random numbers usually rely on software and hardware techniques, which create a seed-dependent stream of deterministic pseudo-random numbers<sup>79</sup>. To develop a true RNG (TRNG), a physical entropy source is needed, for example, the noise or variability of switching phenomena in memory devices.

**Fig. 3** | **Analogue computing in a PCM device. a**, Evolution of the PCM device after an increasing number of pulses. More applied pulses lead to an increasing crystalline phase, causing a decrease of threshold voltage and resistance. PSP, postsynaptic potential. **b**, Arithmetic summation of addends 4 and 3 by pulse accumulation in a PCM. **c**, Integrate-and-fire neuron, where integration is carried out by accumulating incoming spikes in a PCM element. BE, bottom electrode. $W_N$ are the synaptic weights of the artificial synapses. **d**, Synaptic potentiation by cumulative crystallization in a PCM synapse. Credit: panels **a**, **c** and **d** adapted from ref. <sup>62</sup>, Macmillan Publishers Ltd.

**Fig. 4 | Stochastic computing with resistive switching devices. a,b**, Random telegraph noise current fluctuations (**a**) and corresponding probabilistic distribution function (PDF) (**b**), attributing random bit values 0 and 1 to current sub-distributions $I_0$ and $I_1$, respectively. **c,d**, Applied voltage pulse, its current response evidencing the random delay time $\Delta t$ (**c**), and PDF of $\Delta t$ (**d**) with an equally spaced time window to uniformly attribute bit values 0 and 1. **e,f**, Measured I-V curves evidencing cycle-to-cycle variation of $V_{\text{set}}$ (**e**), and PDF of the resistance measured after a stochastic set (**f**), where sub-distributions of the high resistance state and the low resistance state are attributed to bits 0 and 1, respectively. **g**, Differential pair for generating uniform sequences of random bits without probability tracking. Credit: panels **a** and **e** adapted from refs <sup>81</sup> and <sup>83</sup>, respectively, IEEE.

Figure 4 shows examples of stochastic phenomena in RRAM and their exploitation for RNGs. Random telegraph noise (RTN) in Fig. 4a is a typical phenomenon taking place in either the HRS or the LRS; RTN results from metastable defect fluctuations near the conductive path<sup>80</sup>. RTN appears as a random change of the current from a low value $I_0$ to a high value $I_1$, so sampling the current at random times yields a bimodal probability distribution, as shown in Fig. 4b. The random bit can thus be generated by reading the current value and attributing $I_0$ to bit 0, and $I_1$ to bit 1 (ref. 81). RTN is, however, generally difficult to control in terms of both amplitude and uniformity: the sub-distributions in Fig. 4b should be equal to ensure a 50% probability of generating either 0 or 1. RTN is also affected by temperature and applied bias, which leads to drift and instability of the RTN entropy source.
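The RTN readout described above amounts to thresholding a two-level current sampled at random times. The following Python sketch illustrates this under simplifying assumptions: the function name `rtn_bits`, the current levels and the parameter `p_high` (the fraction of time the current spends at $I_1$) are hypothetical, and the device is replaced by a pseudo-random generator purely for illustration.

```python
import random


def rtn_bits(n_samples, p_high=0.5, i_low=1.0, i_high=2.0, seed=None):
    """Sample a two-level current at random times and threshold it into bits.

    p_high is the assumed fraction of time spent at the high level I_1.
    """
    rng = random.Random(seed)
    threshold = 0.5 * (i_low + i_high)  # midpoint between the two sub-distributions
    bits = []
    for _ in range(n_samples):
        current = i_high if rng.random() < p_high else i_low
        bits.append(1 if current > threshold else 0)
    return bits


balanced = rtn_bits(10_000, p_high=0.5, seed=1)
skewed = rtn_bits(10_000, p_high=0.7, seed=1)
# The fraction of 1s follows p_high, so unequal sub-distributions bias the bits.
print(sum(balanced) / len(balanced), sum(skewed) / len(skewed))
```

In this toy model the fraction of 1s simply tracks `p_high`, which is why unequal sub-distributions translate directly into biased bits.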

The quality of the generated random numbers can be improved by exploiting switching variability, such as the stochastic delay time in Fig. 4c. When a constant voltage close to $V_{\text{set}}$ is applied to a RRAM in the HRS, the set transition takes place after a delay time $t_D$, which varies from cycle to cycle due to statistical changes in the electrical and ionic conductive paths within the device<sup>74</sup>. A random number can be generated by dividing the time into equally spaced intervals $\Delta t$, as in Fig. 4d, and attributing bit 0 or 1 to the switching event taking place in even or odd windows, respectively. This scheme improves bit randomness, as the probability of generating 0 or 1 is close to 50% provided that $t_D$ is sufficiently larger than $\Delta t$ (ref. 82). Instead of measuring the delay time for switching, the state of the device can be conveniently measured after a fixed amount of time, as shown in Fig. 4e. Here, a voltage equal to the median value of the stochastic $V_{\text{set}}$ is applied to a RRAM device in the HRS, thus statistically resulting in a set transition 50% of the time. The resistance distribution of the final states after stochastic switching thus shows a bimodal distribution of the HRS and LRS (Fig. 4f), which can be attributed to bit 0 and 1, respectively<sup>83</sup>. This technique was also applied to stochastic computing, where a single device can represent an analogue value corresponding, for example, to the HRS fraction in Fig. 4f (ref. 84). For the generated random numbers to be uniform, the exact value of $\langle V_{\text{set}} \rangle$ should be known, which requires a preliminary probability tracking procedure to initialize the RNG<sup>85</sup>. Similar voltage-based random number generation schemes were developed by adopting STT-MRAM devices, which benefit from a higher cycling lifetime and higher switching speed<sup>85,86</sup>. The need for probability tracking can be overcome by differential TRNGs (Fig. 4g), where the competition between two switching devices randomly yields HRS-LRS or LRS-HRS pairs, which can be attributed to bit 0 and 1, respectively, with 50% probability<sup>87</sup>.
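Two of the schemes above, the even/odd delay-time windows and the stochastic set at the median $V_{\text{set}}$, can be illustrated with a short Python sketch. The statistical models are assumptions made only for this example: the delay time is taken as exponentially distributed and $V_{\text{set}}$ as Gaussian, neither of which is claimed by the cited works. The function names, and the choice of numbering windows from 0, are likewise hypothetical.

```python
import math


def delay_time_bit(t_delay, window):
    """Bit 0 if switching falls in an even-numbered window, bit 1 if odd."""
    return int(t_delay // window) % 2


def p_bit0_exponential(mean_delay, window):
    """P(bit = 0) under an assumed exponential distribution of t_D."""
    return 1.0 / (1.0 + math.exp(-window / mean_delay))


def set_probability(v_applied, v_set_median, v_set_sigma):
    """P(set) under an assumed Gaussian distribution of V_set."""
    z = (v_applied - v_set_median) / (v_set_sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


print(delay_time_bit(2.7, 1.0))           # window index 2 is even, so bit 0
print(p_bit0_exponential(100.0, 1.0))     # close to 0.5 when t_D >> window
print(p_bit0_exponential(1.0, 1.0))       # clearly biased when t_D ~ window
print(set_probability(1.00, 1.00, 0.10))  # voltage exactly at the median
print(set_probability(1.05, 1.00, 0.10))  # half a sigma above the median
```

Under these assumed distributions, the window scheme approaches 50% only when the mean delay is much longer than the window. A half-sigma error in the applied voltage shifts the set probability to roughly 69%, which illustrates why probability tracking is needed.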

Overall, in-memory TRNGs provide physical random numbers with high randomness quality, as assessed by standard tests<sup>82,85,87</sup>, and a simple circuit layout, consisting of just a few switching devices and some external control for stochastic programming and readout. Due to their higher stability and better endurance, STT-MRAM devices appear to be the best device option to implement TRNGs. On the other hand, PUF circuits, which require only random initialization and an optional runtime reconfiguration, may be developed with PCM and RRAM circuits, thus benefitting from a lower cost and easier integration in the CMOS process flow.

# Analogue computing with crosspoint arrays

In-memory computing can adopt not only microscopic physical phenomena, but also universal circuit laws such as Kirchhoff's law and Ohm's law in resistive memory arrays. A typical example is the crosspoint array, consisting of multiple intersections between orthogonal row and column electrodes, each intersection containing a resistive memory element, such as a RRAM<sup>88</sup> or a PCM<sup>89</sup>. Crosspoint memories are extremely attractive for reducing the bit cell size, as the individual device area is just $4F^2$, where *F* is the lithographic feature size in the process technology. From the viewpoint of in-memory computing, the crosspoint array naturally provides a hardware accelerator for analogue, approximated matrix-vector multiplication (MVM). Figure 5a illustrates the concept of MVM in a crosspoint array, where a voltage $V_j$ is applied to the *j*th column, with *j* = 1, 2, ..., *N*, where *N* is the number of rows and columns. The voltage-induced currents of each resistive element are collected at the grounded rows, yielding a total current

$$I_i = \sum_j G_{ij} V_j \tag{1}$$

at the *i*th row, where $G_{ij}$ is the conductance of the resistive memory at row *i* and column *j*. Equation (1) is the analogue product of the conductance matrix $G_{ij}$ and the voltage vector $V_j$, which implements a hardware-based MVM via Ohm's law and Kirchhoff's law. The analogue MVM in the crosspoint can be carried out in just one step, as opposed to the digital MAC operation, which is a time- and energy-consuming step in classical computers. Note that a significant amount of energy for crosspoint-based MVM is spent in operating the digital-to-analogue converters that transform the digital input vector into analogue voltages $V_j$, and the analogue-to-digital converters that digitize the output currents, in cases where the input of the calculations does not come directly from analogue sensors, or where further digital processing of the output is needed<sup>91</sup>. A fair comparison of energy and area efficiency should therefore consider both direct (crosspoint) and indirect (periphery) contributions<sup>92</sup>.

**Fig. 5 | Analogue computing in crosspoint arrays. a**, Matrix-vector multiplication (MVM) within an artificial neural network, where input voltages $V_j$ serve as pre-synaptic (input) neuron signals, and the array conductance $G_{ij}$ describes the synaptic weight. The output row current $I_i$ provides the sum of weighted signals feeding the *i*th post-synaptic neuron. **b**, Content-addressable memory concept adopting MVM of input data $V_j$ and stored data $G_{ij}$. The MVM provides the best match to data, where the maximum response current yields the address of the input data. For instance, submitting a bit string of '011' to the array columns results in the largest current in $I_2$, corresponding to the matching data '011'. **c**, PUF for generating a response $I_{out}$ to a challenge, namely, the configuration of biased columns. The PUF relies on multiple sneak paths to yield a random unclonable function.
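Equation (1) can be written out as a minimal Python sketch. The function name `crosspoint_mvm` and the numerical values are hypothetical. Conductances are given in microsiemens and voltages in volts, so the currents come out in microamperes. The sketch is idealized: it ignores line resistance, device nonlinearity and converter overheads.

```python
def crosspoint_mvm(G, V):
    """Row currents I_i = sum_j G[i][j] * V[j].

    Ohm's law gives the current of each cell and Kirchhoff's current law
    sums the cell currents along each grounded row.
    """
    if any(len(row) != len(V) for row in G):
        raise ValueError("each row of G needs one conductance per column voltage")
    return [sum(g * v for g, v in zip(row, V)) for row in G]


# Hypothetical 3x3 conductance matrix in microsiemens.
G = [
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 5.0],
    [0.0, 50.0, 100.0],
]
V = [0.5, 0.25, 0.0]  # column voltages in volts

I = crosspoint_mvm(G, V)
print(I)  # prints [10.0, 3.75, 12.5] (microamperes)
```

In software the sum is an explicit loop over columns, whereas in the array all products and sums occur simultaneously in the physics of the circuit.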

Crosspoint MVM can be adopted for a broad range of problems, including image compression<sup>91</sup>, sparse coding<sup>93</sup>, and implementation of artificial neural networks (ANNs), where $G_{ij}$ has the meaning of a synaptic weight, $V_j$ is a pre-synaptic spike amplitude, and $I_i$ is the input signal to the *i*th neuron<sup>69,70</sup>. For instance, Fig. 5a represents a $3 \times 3$ ANN with three input neurons and three output neurons, where synaptic weights can be trained directly in hardware using a gradient-descent algorithm and backpropagation, taking advantage of pulse accumulation in PCM and RRAM for updating the weights. The MVM scheme can also be used to implement a content addressable memory (CAM), an associative memory that provides the location of the memory where the digital content is the best match to an input set of digital data. Figure 5b shows the CAM concept, namely a crosspoint array with stored digital data $G_{ij}$, and a set of input data $V_j$ (ref. 52). The row current in equation (1) provides an analogue match function that is maximized for the input data closest to the stored data $G_{ij}$.
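The CAM lookup described above can be sketched in Python by treating stored bits as conductances and query bits as column voltages. The function name `cam_lookup` and the stored words are hypothetical. The sketch assumes a plain binary encoding, in which the row current only counts positions where both the stored bit and the query bit are 1. A stored row of all 1s would therefore tie with an exact match, and this toy example does not address that limitation.

```python
def cam_lookup(stored_rows, query):
    """Return (index of the best-matching row, list of row 'currents').

    Each current is the dot product of a stored bit string with the query,
    i.e. equation (1) with binary conductances and binary voltages.
    """
    currents = [
        sum(int(g) * int(v) for g, v in zip(row, query))
        for row in stored_rows
    ]
    best = max(range(len(currents)), key=currents.__getitem__)
    return best, currents


stored = ["110", "011", "100"]  # hypothetical stored words, one per row
best, currents = cam_lookup(stored, "011")
print(best, currents)  # prints 1 [1, 2, 0]
```

The returned index is zero-based, so index 1 corresponds to the second row, matching the largest current $I_2$ in the Fig. 5b example.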

While MVM relies on the precise voltage control of columns and rows, alternative biasing techniques can be used, for example, to generate stochastic challenge-response PUFs. Figure 5c shows a crosspoint PUF, where random conductance values $G_{ij}$ are stored, and all lines are left floating except for column $j^*$ and row $i^*$, which are biased to *V* and 0, respectively. The current sensed at row $i^*$ includes not only the current of element $(i^*, j^*)$, but also a number of sneak-path currents, flowing across the indirectly biased resistive elements at floating rows/columns in the array<sup>94</sup>. Sneak-path currents are generally unwanted in memory arrays where the individual memory cell must be sensed to read the stored data. As a result of sneak-path currents in the crosspoint PUF, the output current is a complicated (hence hardly clonable) function of $i^*$, $j^*$, *V* and $G_{ij}$. The same concept can be generalized by biasing an arbitrary number of columns, serving as the input challenge<sup>94</sup>. Thanks to the good scalability and stochastic variation of resistance, the crosspoint PUF appears to be a promising solution for hardware security in the Internet of Things.
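The effect of sneak paths on the sensed current can be illustrated with a small Python sketch of an idealized resistor network. The function name `puf_response` and the conductance values are hypothetical. The floating line voltages are found by a simple relaxation, in which each floating node repeatedly takes the conductance-weighted average of its neighbours. This solver is an illustrative choice and not the method of the cited work. Line resistance and device nonlinearity are ignored, and all conductances are assumed to be positive.

```python
def puf_response(G, i_sel, j_sel, v_bias=1.0, n_iter=500):
    """Current into grounded row i_sel when column j_sel is biased to v_bias.

    All other rows and columns float; their voltages follow from Kirchhoff's
    current law, solved here by repeated weighted averaging (relaxation).
    """
    n_rows, n_cols = len(G), len(G[0])
    v_row = [0.0] * n_rows          # the selected row stays at 0 V
    v_col = [0.0] * n_cols
    v_col[j_sel] = v_bias           # the selected column stays at v_bias
    for _ in range(n_iter):
        for i in range(n_rows):
            if i != i_sel:
                v_row[i] = sum(G[i][j] * v_col[j] for j in range(n_cols)) / sum(G[i])
        for j in range(n_cols):
            if j != j_sel:
                g_total = sum(G[i][j] for i in range(n_rows))
                v_col[j] = sum(G[i][j] * v_row[i] for i in range(n_rows)) / g_total
    # Direct current of cell (i_sel, j_sel) plus all sneak-path contributions.
    return sum(G[i_sel][j] * v_col[j] for j in range(n_cols))


# 2x2 check: three unit conductances in series form the only sneak path,
# so the total is 1 + 1/3 times the bias.
print(puf_response([[1.0, 1.0], [1.0, 1.0]], 0, 0))

# Hypothetical 3x3 array with random-looking conductances (arbitrary units).
G3 = [[1.0, 2.0, 0.5], [0.8, 1.5, 1.2], [2.0, 0.7, 1.1]]
print(puf_response(G3, 0, 0))
```

For the uniform 2x2 array the result can be checked by hand: the selected cell carries $V$ times its conductance, and the single sneak path adds one third of that. In larger arrays the response depends on every stored conductance, which is the property the PUF exploits.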

# Outlook

In-memory computing provides an alternative means to overcome the limitations of existing von Neumann-based computing approaches. However, there are many technical issues that must be addressed for in-memory computing to become a viable solution in information technology. It has been mentioned that switching variability is a major concern for deterministic computing (for example, the IMP logic gate of Fig. 2c can result in errors due to the cell-to-cell and cycle-to-cycle variations of $V_{\text{set}}$ in RRAM). On the other hand, inherently stochastic functions, such as stochastic integration in artificial neurons and random number generators, benefit from switching variations in memory elements. It should be noted that statistical variability also affects memory operation for storage<sup>74</sup>, although in such a case it can be managed at the system level by algorithms that verify and correct the memory state soon after programming and correct for errors after a read operation. Similar verify techniques are difficult to implement in computing and may reduce the benefits of a pure in-memory computing system.

**Fig. 6 | Crosspoint memory architecture and scaling. a**, RRAM crosspoint structure with a single memory layer. To increase the device density, the RRAM diameter and the distance between memory elements should be reduced. **b**, A horizontal stacked 3D array, where device stacking enables density multiplication, roughly by the number of layers. **c**, Vertical 3D array, combining high density and cost-effective processing technology.

Besides variations, memory instability can also limit the accuracy of in-memory computing, particularly for analogue operations in crosspoint computing. For instance, even assuming a precise tuning of array elements $G_{ij}$ by verify techniques in Fig. 5a, the conductance $G_{ij}$ might be affected by spontaneous fluctuations, thus causing MVM inaccuracies. Instability particularly affects RRAM devices, as the localized conduction in the LRS and HRS makes the resistance extremely sensitive to individual atomic transitions close to the conductive path<sup>80</sup>. PCM devices are less affected by instability due to the bulk-type conduction mechanism present in these devices. However, the resistance can drift in time as a result of the metastable nature of the amorphous state<sup>95</sup>. Resistance drift originates from structural relaxation of the amorphous phase after quenching from the liquid phase and consists of a decrease of the defect concentration and an increase in bandgap and resistivity. Drift can be alleviated by increasing the read current<sup>96</sup>, and by adopting a core-shell structure of the memory cell, where the current bypasses the amorphous core via a metallic shell layer<sup>97</sup>. In general, the effects of drift and noise are increased at high memory resistance, which also suffers from a higher nonlinearity of conductance, due to field- and temperature-induced enhancement of transport<sup>91</sup>. On the other hand, programming states at low resistance requires relatively high currents, thus impacting the energy efficiency of the computing system. Higher operating currents also raise the parasitic voltage drop across the metallic lines constituting the rows and columns of the memory array. As a result of resistance variation and nonlinearity, crosspoint arrays only compute approximate results and should be restricted to a limited set of error-tolerant tasks, for example, pattern recognition, page ranking, and data inference. Given this intimate device–system interaction, the design and optimization of the memory device should rely on a detailed consideration of the system-level performance metrics, including accuracy, energy efficiency, and switching speed. The technology of computational memory devices, as a result, will most likely be different from that targeted for digital data storage.
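To give a feel for how resistance drift could disturb an analogue weight, the Python sketch below uses the power-law time dependence that is commonly used to describe PCM drift empirically. The text above does not give a formula, so the functional form, the exponent `nu = 0.1`, the reference time and the function name `drifted_resistance` are all assumptions made for illustration only.

```python
def drifted_resistance(r0, t, t0=1.0, nu=0.1):
    """Assumed power-law drift: R(t) = R0 * (t / t0) ** nu.

    r0 is the resistance read at the reference time t0 (same units as t);
    nu is an illustrative drift exponent, not a value from the source.
    """
    return r0 * (t / t0) ** nu


r0 = 1.0e6  # hypothetical resistance of 1 megaohm read at t0 = 1 s
for t in (1.0, 1.0e3, 1.0e6):
    r = drifted_resistance(r0, t)
    # The conductance G = 1/R, and hence the MVM weight, shrinks as R drifts up.
    print(t, r / r0, r0 / r)
```

With this assumed exponent the resistance roughly doubles over three decades of time, so a weight stored as a conductance would be roughly halved unless the drift is compensated.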

A key requirement for in-memory computing to become a mainstream technology is scaling. The growth of Internet data and the need to sustain more storage capacity in mobile computers and data centres have been driving the scaling of flash memory density, which is currently increasing by 40% per year<sup>98</sup>. To enable a similar growth rate for in-memory computing, crosspoint memory arrays should scale down. The density increase can be obtained by a decrease of the individual cell size (for example, by reducing the diameter of the RRAM device in Fig. 6a). Reducing the size of the computing element, however, raises concerns about the control of switching parameters and increased cell-to-cell variability. With the device downscaling, the interconnect line and the periphery circuit area should correspondingly decrease. However, interconnect downscaling causes an increase of series resistance, due to both the geometry scaling and the enhanced surface scattering<sup>99</sup>. The increased line resistance complicates the operation of the crosspoint circuit due to parasitic voltage drops, especially at high operating currents. Methods to reduce the interconnect resistivity include the use of novel materials, such as carbon nanotubes and graphene<sup>100</sup>, and alternative scaling paths.
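As a rough worked figure for the growth rate quoted at the start of this paragraph, and assuming the 40% annual increase stays constant (an idealization), the density $D$ after $t$ years and the corresponding doubling time $t_2$ follow from compound growth. The symbols $D_0$ (initial density) and $t_2$ are introduced here only for illustration.

$$D(t) = D_0 \times 1.4^{\,t}, \qquad t_2 = \frac{\ln 2}{\ln 1.4} \approx 2.06\ \text{years}$$

Matching this pace by in-plane scaling alone would require the cell area to halve about every two years, which is where the interconnect and variability concerns above become limiting.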

To overcome the difficulties of in-plane scaling, novel 3D array architectures have been proposed, such as the horizontal stacked 3D structure in Fig. 6b (ref. 89), and the vertical 3D structure in Fig. 6c (ref. 101). The vertical 3D structure offers better processing yield and cost efficiency with respect to horizontal 3D structures, as the critical lithography steps are limited to the creation of the pillar across the multiple electrode/spacing layers. In fact, in a vertical 3D array, memory cells are formed at the crossing between a horizontal plane and a vertical pillar, consisting of a core-shell structure with a metallic core and an insulating shell serving as the switching layer. Within a 3D array, high density can be achieved by increasing the number of stacked layers instead of reducing the cell size and line width. The horizontal cell-cell pitch can be reduced by decreasing the thickness of the switching layer, which strongly favours RRAM and FeRAM memories thanks to the ultrathin switching layer, as opposed to relatively thick PCM and STT-MRAM elements. Vertical 3D arrays have been recently demonstrated for in-memory computing applications, where 3D RRAM devices were used as multiplication–addition–permutation (MAP) kernels to classify and associate data for an integrated hyperdimensional computing system<sup>102</sup>. Addressing the multiple technological challenges of 3D co-integration of memory devices, CMOS periphery and low-resistivity interconnects can serve as a future path towards high-density, energy-efficient in-memory computing.
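The density gain from stacking can be estimated with a short Python sketch. It assumes that every layer keeps the $4F^2$ cell footprint mentioned earlier and that density scales exactly with the number of layers, whereas the text only says roughly. The function name `cells_per_mm2` and the chosen feature size are hypothetical, and periphery area is ignored.

```python
def cells_per_mm2(feature_nm, layers=1):
    """Ideal crosspoint cell density for a 4F^2 cell stacked over `layers` layers."""
    cell_area_nm2 = 4 * feature_nm ** 2   # 4F^2 footprint per cell
    nm2_per_mm2 = 1e12                    # 1 mm^2 = 10^12 nm^2
    return layers * nm2_per_mm2 / cell_area_nm2


print(cells_per_mm2(20))            # single layer, F = 20 nm: 6.25e8 cells per mm^2
print(cells_per_mm2(20, layers=8))  # eight stacked layers: 5e9 cells per mm^2
```

In this idealized estimate, eight layers at a fixed feature size give the same density gain as shrinking $F$ by a factor of about 2.8, without the variability and line-resistance penalties of in-plane scaling.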

To assess the full potential of in-memory computing, one should consider the individual computing blocks, such as the logic gates for Boolean operations or the crosspoint array, which were discussed earlier in this Review Article, and also system-level aspects such as the periphery circuit, the area efficiency and the time and energy efficiency of the system. For instance, operating the logic gates in Fig. 3 requires a control logic in the periphery, biasing the lines of selected data while optimizing the crosspoint area used for maximum parallelism, and minimizing the interaction between independent operations and the disturb to unselected bits. Other important considerations include power and clock delivery, especially for circuits in which the signal lines also need to supply the power, and for circuits that need multi-phase clocks or precisely timed clocks. Without a critical assessment of the control system and algorithms, a comparison with conventional von Neumann computers is not possible. Similarly, comparing digital MAC and crosspoint-based MVM requires a detailed evaluation of the system complexity, periphery circuit area, error tolerance, memory array utilization, and overall energy efficiency. In this scenario, choosing a suitable application plays a significant role, as in-memory computing might be better suited to data-intensive, error-tolerant tasks. The development of improved devices with higher endurance, lower cycle-to-cycle variation, lower energy consumption and lower instability might considerably advance in-memory computing concepts and accelerate their adoption for information and communication technology.

In-memory computing can subvert the conventional architecture of the computer and eliminate the memory wall of today's computing systems. Various schemes have been proposed to compute within resistive switching devices by exploiting the physics of the devices to perform digital, analogue and stochastic computation. Although highly promising, significant efforts are still needed to address the interdisciplinary challenges of device optimization, circuit design, and system management. The development of resistive switching devices for storage is likely to strongly accelerate in-memory computing as a feasible alternative technology in the post-Moore microelectronics industry.

Received: 9 March 2018; Accepted: 21 May 2018; Published online: 13 June 2018

### References

- Moore, G. E. Cramming more components onto integrated circuits. *Electronics* 38, 114–117 (1965).
- Waldrop, M. M. The chips are down for Moore's law. *Nature* 530, 144–147 (2016).
- Wulf, W. A. & McKee, S. A. Hitting the memory wall: implications of the obvious. ACM SIGARCH Computer Architecture News 23, 20–24 (1995).
- Horowitz, M. Computing's energy problem (and what we can do about it). 2014 IEEE Int. Solid-State Circuits Conf. Digest Tech. Papers (ISSCC) https://doi.org/10.1109/ISSCC.2014.6757323 (2014).
  - This work reviews the power limitation of modern computers, highlighting the importance of application-optimized computing to improve the energy efficiency.
- Chen, Y.-H., Krishna, T., Emer, J. S. & Sze, V. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. *IEEE J. Solid-State Circuits* 52, 127–138 (2017).
- Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. Proc. 44th Int. Symp. Comp. Architecture (ISCA) https://doi.org/10.1145/3079856.3080246 (2017).
- Pawlowski, J. T. Hybrid memory cube (HMC). 2011 IEEE Hot Chips 23 Symp. (HCS) https://doi.org/10.1109/HOTCHIPS.2011.7477494 (2011).
- Lee, D. U. et al. A 1.2 V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29 nm process and TSV. 2014 IEEE Int. Solid-State Circuits Conf. Digest Tech. Papers (ISSCC) https://doi.org/10.1109/ISSCC.2014.6757501 (2014).
- Wong, H.-S. P. & Salahuddin, S. Memory leads the way to better computing. *Nat. Nanotech.* 10, 191–194 (2015).
- Waser, R. & Aono, M. Nanoionics-based resistive switching memories. Nat. Mater. 6, 833–840 (2007).
  - This is the first review on resistive switching memory describing the physical mechanisms and the experimental techniques to investigate them.
- Raoux, S., Welnic, W. & Ielmini, D. Phase change materials and their application to non-volatile memories. *Chem. Rev.* 110, 240–267 (2010).
- Kent, A. D. & Worledge, D. C. A new spin on magnetic memories. Nat. Nanotech. 10, 187–191 (2015).
- Mikolajick, T. et al. FeRAM technology for high density applications. Microelectron. Reliab. 41, 947–950 (2001).
- https://www.intel.com/content/www/us/en/architecture-and-technology/intel-optane-technology.html
- Di Ventra, M. & Pershin, Y. V. The parallel approach. Nat. Phys. 9, 200–202 (2013).
- Indiveri, G. & Liu, S.-C. Memory and information processing in neuromorphic systems. *Proc. IEEE* 103, 1379–1397 (2015).
- Beck, A., Bednorz, J. G., Gerber, Ch., Rossel, C. & Widmer, D. Reproducible switching effect in thin oxide films for memory applications. *Appl. Phys. Lett.* 77, 139 (2000).
  - This is the first work demonstrating reproducible resistance switching in an oxide layer, thus paving the way for memory applications.
- Liu, Q. et al. Real-time observation on dynamic growth/dissolution of conductive filaments in oxide-electrolyte-based ReRAM. Adv. Mater. 24, 1844–1849 (2012).
- Ielmini, D. Modeling the universal set/reset characteristics of bipolar RRAM by field- and temperature-driven filament growth. *IEEE Trans. Electron Devices* 58, 4309–4317 (2011).
- Yang, J. J., Strukov, D. B. & Stewart, D. R. Memristive devices for computing. Nat. Nanotech. 8, 13–24 (2013).
- Kim, K. M., Jeong, D. S. & Hwang, C. S. Nanofilamentary resistive switching in binary oxide system; a review on the present status and outlook. *Nanotechnology* 22, 254002 (2011).
- Sawa, A. Resistive switching in transition metal oxides. *Mater. Today* 11, 28–36 (2008).
- Yamada, N., Ohno, E., Nishiuchi, K. & Akahira, N. Rapid-phase transitions of GeTe-Sb<sub>2</sub>Te<sub>3</sub> pseudobinary amorphous thin films for an optical disk memory. J. Appl. Phys. 69, 2849 (1991).
- Ielmini, D. & Zhang, Y. Analytical model for subthreshold conduction and threshold switching in chalcogenide-based memory devices. *J. Appl. Phys.* 102, 054517 (2007).
- Boniardi, M. et al. Optimization metrics for phase change memory (PCM) cell architectures. 2014 IEEE Int. Electron Devices Meet. (IEDM) https://doi.org/10.1109/IEDM.2014.7047131 (2014).
- Choi, B. J. et al. High-speed and low-energy nitride memristors. Adv. Funct. Mater. 26, 5290–5296 (2016).
- 27. Loke, D. et al. Breaking the speed limits of phase-change memory. *Science* 336, 1566–1569 (2012).
- Lee, M.-J. et al. A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta<sub>2</sub>O<sub>5-x</sub>/TaO<sub>2-x</sub> bilayer structures. *Nat. Mater.* 10, 625–630 (2011).
- Chappert, C., Fert, A. & Nguyen Van Dau, F. The emergence of spin electronics in data storage. *Nat. Mater.* 6, 813–823 (2007).

- Locatelli, N., Cros, V. & Grollier, J. Spin-torque building blocks. Nat. Mater. 13, 11–20 (2014).
- Slonczewski, J. Current-driven excitation of magnetic multilayers. *J. Magn. Magn. Mater.* 159, L1–L7 (1996).
  - This work theoretically predicted spin transfer torque, where the magnetic polarization in a ferromagnetic layer can be switched by a current of spin-polarized electrons.
- Yuasa, S., Nagahama, T., Fukushima, A., Suzuki, Y. & Ando, K. Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions. *Nat. Mater.* 3, 868–871 (2004).
- Carboni, R. et al. Understanding cycling endurance in perpendicular spin-transfer torque (p-STT) magnetic memory. 2016 IEEE Int. Electron Devices Meet. (IEDM) https://doi.org/10.1109/IEDM.2016.7838468 (2016).
- Boescke, T. S., Mueller, J., Brauhaus, D., Schroeder, U. & Boettger, U. Ferroelectricity in hafnium oxide thin films. *Appl. Phys. Lett.* 99, 102903 (2011).
- Trentzsch, M. et al. A 28nm HKMG super low power embedded NVM technology based on ferroelectric FETs. 2016 IEEE Int. Electron Devices Meet. (IEDM) https://doi.org/10.1109/IEDM.2016.7838397 (2016).
- Oh, S. et al. HfZrO<sub>x</sub>-based ferroelectric synapse device with 32 levels of conductance states for neuromorphic applications. *IEEE Electron Device Lett.* 38, 732–735 (2017).
- Niemier, M. T. et al. Nanomagnet logic: progress toward system-level integration. J. Phys. Condens. Matter 23, 493202 (2011).
- Amlani, I. et al. Digital logic gate using quantum-dot cellular automata. Science 284, 289–291 (1999).
- Khajetoorians, A. A., Wiebe, J., Chilian, B. & Wiesendanger, R. Realizing all-spin-based logic operations atom by atom. *Science* 332, 1062–1064 (2011).
- Govoreanu, B. et al. 10 x 10 nm² Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation. 2011 Int. Electron Devices Meet. (IEDM) https://doi.org/10.1109/IEDM.2011.6131652 (2011).
  - This is the first work demonstrating the scalability of RRAM in the lateral size range of 10 nm.
- Linn, E., Rosezin, R., Tappertzhofen, S., Böttger, U. & Waser, R. Beyond von Neumann—logic operations in passive crossbar arrays alongside memory operations. *Nanotechnology* 23, 305205 (2012).
- Gaillardon, P.-E. et al. The programmable logic-in-memory (PLiM) computer. IEEE Design, Automation & Test in Europe Conference (DATE) 427–432 (2016); https://infoscience.epfl.ch/record/213465/files/PEG\_DATE16.pdf
- Papandroulidakis, G., Vourkas, I., Vasileiadis, N. & Sirakoulis, G. Ch. Boolean logic operations and computing circuits based on memristors. IEEE Trans. Circuits Syst. II: Express Briefs 61, 972–976 (2014).
- Nikonov, D. E. & Young, I. A. Overview of beyond-CMOS devices and a uniform methodology for their benchmarking. *Proc. IEEE* 101, 2498–2533 (2013).
- Gao, L., Alibart, F. & Strukov, D. B. Programmable CMOS/memristor threshold logic. IEEE Trans. Nanotechnology 12, 115–119 (2013).
- James, A. P., Francis, L. R. V. J. & Kumar, D. S. Resistive threshold logic. IEEE Trans. Very Large Scale Integr. (VLSI) 22, 190–195 (2014).
- Borghetti, J. et al. 'Memristive' switches enable 'stateful' logic operations via material implication. *Nature* 464, 873–876 (2010).
  - This is the first work proposing the idea of stateful Boolean operation with RRAM, where memory devices are used for digital computation.
- 48. Reuben, J. et al. Memristive logic: A framework for evaluation and comparison. 27th Int. Symp. Power and Timing Modeling, Optimization and Simulation (PATMOS) https://doi.org/10.1109/PATMOS.2017.8106959 (2017).
- Jeong, D. S., Kim, K. M., Kim, S., Choi, B. J. & Hwang, C. S. Memristors for energy-efficient new computing paradigms. Adv. Electron. Mater. 2, 1600090 (2016).
- Balatti, S., Ambrogio, S. & Ielmini, D. Normally-off logic based on resistive switches—Part I: Logic gates. *IEEE Trans. Electron Devices* 62, 1831–1838 (2015).
- Huang, P. et al. Reconfigurable non-volatile logic operations in resistance switching crossbar array for large-scale circuits. Adv. Mater. 28, 9758–9764 (2016).
- Chen, B. et al. Efficient in-memory computing architecture based on crossbar arrays. 2015 IEEE Int. Electron Devices Meet. (IEDM) https://doi.org/10.1109/IEDM.2015.7409720 (2015).
- Cassinerio, M., Ciocchini, N. & Ielmini, D. Logic computation in phase change materials by threshold and memory switching. *Adv. Mater.* 25, 5975–5980 (2013).
- Mahmoudi, H., Windbacher, T., Sverdlov, V. & Selberherr, S. Implication logic gates using spin-transfer-torque-operated magnetic tunnel junctions for intrinsic logic-in-memory. Solid-State Electron. 84, 191–197 (2013).
- Balatti, S. et al. Voltage-controlled cycling endurance of HfO<sub>x</sub>-based resistive-switching memory (RRAM). *IEEE Trans. Electron Devices* 62, 3365–3372 (2015).

- Clark, L. T. et al. ASAP7: A 7-nm FinFET predictive process design kit. *Microelectron. J.* 53, 105–115 (2016).
- Wright, C. D., Hosseini, P. & Vazquez Diosdado, J. A. Beyond von-Neumann computing with nanoscale phase-change memory devices. *Adv. Funct. Mater.* 23, 2248–2254 (2013).
   This is the first work proposing the use of cumulative crystallization in PCM as a means for analogue computation and neuron-like integration.
- 58. Feldmann, J. et al. Calculating with light using a chip-scale all-optical abacus. *Nat. Commun.* 8, 1256 (2017).
- Hosseini, P., Sebastian, A., Papandreou, N., Wright, C. D. & Bhaskaran, H. Accumulation-based computing using phase-change memories with FET access devices. *IEEE Electron Device Lett.* 36, 975–977 (2015).
- Bichler, O. et al. Visual pattern extraction using energy-efficient 2-PCM synapse neuromorphic architecture. *IEEE Trans. Electron Devices* 59, 2206–2214 (2012).
- Burr, G. W. et al. Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element. *IEEE Trans. Electron Devices* 62, 3498–3507 (2015).
- Tuma, T., Pantazi, A., Le Gallo, M., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. *Nat. Nanotech.* 11, 693–699 (2016).
- Qiao, N. & Indiveri, G. Scaling mixed-signal neuromorphic processors to 28 nm FD-SOI technologies. *IEEE Biomedical Circuits & Systems Conference* (BioCAS) https://doi.org/10.1109/BioCAS.2016.7833854 (2016).
- Stoliar, P. et al. A leaky-integrate-and-fire neuron analog realized with a Mott insulator. *Adv. Funct. Mater.* 27, 1604740 (2017).
- 65. Wang, Z. et al. Fully memristive neural networks for pattern classification with unsupervised learning. *Nat. Electron.* 1, 137–145 (2018).
- Larentis, S., Nardi, F., Balatti, S., Gilmer, D. C. & Ielmini, D. Resistive switching by voltage-driven ion migration in bipolar RRAM—Part II: Modeling. *IEEE Trans. Electron Devices* 59, 2468–2475 (2012).
- Yu, S., Wu, Y., Jeyasingh, R., Kuzum, D. & Wong, H.-S. P. An electronic synapse device based on metal oxide resistive switching memory for neuromorphic computation. *IEEE Trans. Electron Devices* 58, 2729–2737 (2011).
- Yu, S. et al. A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation. *Adv. Mater.* 25, 1774–1779 (2013).
- Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. *Nature* 521, 61–64 (2015).
- Jang, J.-W., Park, S., Burr, G. W., Hwang, H. & Jeong, Y.-H. Optimization of conductance change in Pr<sub>1-x</sub>Ca<sub>x</sub>MnO<sub>3</sub>-based synaptic devices for neuromorphic systems. *IEEE Electron Device Lett.* 36, 457–459 (2015).
- Chanthbouala, A. et al. A ferroelectric memristor. *Nat. Mater.* 11, 860–864 (2012).
- Lequeux, S. et al. A magnetic synapse: multilevel spin-torque memristor with perpendicular anisotropy. *Sci. Rep.* 6, 31510 (2016).
- Diorio, C., Hasler, P., Minch, B. A. & Mead, C. A single-transistor silicon synapse. *IEEE Trans. Electron Devices* 43, 1972–1980 (1996).
   This is the first work proposing the adoption of a flash memory as a synapse capable of plasticity by weight update.
- Ambrogio, S. et al. Statistical fluctuations in HfO<sub>x</sub> resistive-switching memory (RRAM): Part I—Set/reset variability. *IEEE Trans. Electron Devices* 61, 2912–2919 (2014).
- Rizzi, M. et al. Cell-to-cell and cycle-to-cycle retention statistics in phase-change memory arrays. *IEEE Trans. Electron Devices* 62, 2205–2211 (2015).
- Chen, A. Utilizing the variability of resistive random access memory to implement reconfigurable physical unclonable functions. *IEEE Electron Device Lett.* 36, 138–140 (2015).
- Herder, C., Yu, M.-D., Koushanfar, F. & Devadas, S. Physical unclonable functions and applications: A tutorial. *Proc. IEEE* 102, 1126–1141 (2014).
- Maass, W. Noise as a resource for computation and learning in networks of spiking neurons. *Proc. IEEE* 102, 860–880 (2014).
- Jun, B. & Kocher, P. The Intel Random Number Generator (Rambus, 1999); https://www.rambus.com/intel-random-number-generator/
- Ambrogio, S. et al. Statistical fluctuations in HfO<sub>x</sub> resistive-switching memory (RRAM): Part II—Random telegraph noise. *IEEE Trans. Electron Devices* 61, 2920–2927 (2014).
- Huang, C.-Y., Shen, W. C., Tseng, Y.-H., King, Y.-C. & Lin, C.-J. A contact-resistive random-access-memory-based true random number generator. *IEEE Electron Device Lett.* 33, 1108–1110 (2012).
- Jiang, H. et al. A novel true random number generator based on a stochastic diffusive memristor. *Nat. Commun.* 8, 882 (2017).
- Balatti, S., Ambrogio, S., Wang, Z.-Q. & Ielmini, D. True random number generation by variability of resistive switching in oxide-based devices. *IEEE J. Emerging Topics in Circuits and Systems (JETCAS)* 5, 214–221 (2015).
- Gaba, S., Sheridan, P., Zhou, J., Choi, S. & Lu, W. Stochastic memristive devices for computing and neuromorphic applications. *Nanoscale* 5, 5872 (2013).

- Choi, W. H. et al. A magnetic tunnel junction based true random number generator with conditional perturb and real-time output probability tracking. *2014 IEEE Int. Electron Devices Meet. (IEDM)* https://doi.org/10.1109/IEDM.2014.7047039 (2014).
- Fukushima, A. et al. Spin dice: A scalable truly random number generator based on spintronics. *Appl. Phys. Express* 7, 083001 (2014).
- Balatti, S. et al. Physical unbiased generation of random numbers with coupled resistive switching devices. *IEEE Trans. Electron Devices* 63, 2029–2035 (2016).
- Jo, S. H., Kim, K.-H. & Lu, W. High-density crossbar arrays based on a Si memristive system. *Nano Lett.* 9, 870–874 (2009).
- Kau, D. et al. A stackable cross point phase change memory. *2009 IEEE Int. Electron Devices Meet. (IEDM)* https://doi.org/10.1109/IEDM.2009.5424263 (2009).
- Truong, S. N. & Min, K.-S. New memristor-based crossbar array architecture with 50-% area reduction and 48-% power saving for matrix-vector multiplication of analog neuromorphic computing. *J. Semicond. Technol. Sci.* 14, 356–363 (2014).
- Li, C. et al. Analogue signal and image processing with large memristor crossbars. *Nat. Electron.* 1, 52–59 (2018).
- Eryilmaz, S. B. *Brain-inspired and Non-conventional Computing with Emerging Memory Devices*. PhD thesis, Stanford University (2017); https://searchworks.stanford.edu/view/12137356
- Sheridan, P. M. et al. Sparse coding with memristor networks. *Nat. Nanotech.* 12, 784–789 (2017).
- Gao, L., Chen, P.-Y., Liu, R. & Yu, S. Physical unclonable function exploiting sneak paths in resistive cross-point array. *IEEE Trans. Electron Devices* 63, 3109–3115 (2016).
- Ielmini, D., Lacaita, A. L. & Mantegazza, D. Recovery and drift dynamics of resistance and threshold voltages in phase change memories. *IEEE Trans. Electron Devices* 54, 308–315 (2007).
- Ielmini, D., Sharma, D., Lavizzari, S. & Lacaita, A. L. Reliability impact of chalcogenide-structure relaxation in phase change memory (PCM) cells—Part I: Experimental study. *IEEE Trans. Electron Devices* 56, 1070–1077 (2009).
- Kim, S. et al. A phase change memory cell with metallic surfactant layer as a resistance drift stabilizer. *2013 IEEE Int. Electron Devices Meet. (IEDM)* https://doi.org/10.1109/IEDM.2013.6724727 (2013).
- Daly, D. C., Fujino, L. C. & Smith, K. C. Through the looking glass - The 2017 edition: Trends in solid-state circuits from ISSCC. *IEEE Solid-State Circuits Mag.* 9, 12–22 (2017).
- Kapur, P., McVittie, J. P. & Saraswat, K. C. Technology and reliability constrained future copper interconnects—Part I: Resistance modeling. *IEEE Trans. Electron Devices* 49, 590–597 (2002).
- Geim, A. K. & Novoselov, K. S. The rise of graphene. *Nat. Mater.* 6, 183–191 (2007).
- 101. Yu, S., Chen, H.-Y., Gao, B., Kang, J. & Wong, H.-S. P. HfO<sub>x</sub> based vertical resistive switching random access memory suitable for bit-cost-effective three-dimensional cross-point architecture. *ACS Nano* 7, 2320 (2013).
- 102. Li, H., Wu, T. F., Mitra, S. & Wong, H.-S. P. Resistive RAM-centric computing: Design and modeling methodology. *IEEE Trans. Circuits and Systems I: Regular Papers* 64, 2263–2273 (2017).

### Acknowledgements

D.I. acknowledges funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 648635). H.-S.P.W. is supported in part by DARPA and the National Science Foundation (E2CDA, Expeditions in Computing), as well as by member companies of the Stanford Non-Volatile Memory Technology Research Initiative (NMTRI) and the Stanford SystemX Alliance.

### Author contributions

D.I. and H.-S.P.W. conceived the project, carried out the discussions and wrote the manuscript.

### Competing interests

The authors declare no competing interests.

### Additional information

Reprints and permissions information is available at www.nature.com/reprints.

Correspondence should be addressed to D.I. or H.-S.P.W.

**Publisher's note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.