# Design Trade-Offs for High Density Cross-Point Resistive Memory

Abstract—With conventional memory technologies approaching their scaling limits, emerging non-volatile memory technologies have attracted increasing attention because of their non-volatility, high access speed, low power consumption, and good scalability. Resistive RAM (ReRAM), with its simple structure, small cell size ($4F^2$), and support for 3D stacking, is a promising candidate among emerging memory technologies. A key advantage of ReRAM is its non-linear current-voltage characteristic, which enables cross-point ReRAM arrays without a dedicated access transistor for each cell. While the cross-point design is effective in improving memory density, it has inherent disadvantages that introduce extra design challenges. Based on the device characteristics, we perform a comprehensive analysis of the reliability, energy consumption, area overhead, and performance of the cross-point array structure. In addition to the cell-level analysis, we also discuss different programming schemes. Based on this analysis, we evaluate the area, energy, and bandwidth of a 256 Mbit ReRAM macro in detail. The simulation results enable designers to identify, early in the design stage, the most performance-, energy-, or area-efficient ReRAM organization and cell parameters that meet specific design goals.

## I. INTRODUCTION

The scaling of traditional memory technologies, such as DRAM and Flash, is approaching its physical limits. In the past few years, emerging non-volatile memory (NVM) technologies, such as Phase Change RAM (PCRAM), Spin-Transfer-Torque RAM (STT-RAM), and Resistive RAM (ReRAM), have been widely studied as candidates for next-generation memory that meets the requirements of higher density, faster access time, and lower power consumption. Among these emerging technologies, ReRAM has several unique characteristics, including a simple structure, nonlinearity, and a high resistance ratio, which make it one of the most promising. Researchers have shown that state-of-the-art single-level-cell ReRAM can achieve a 7.2 ns random access time for both read and write operations with a resistance ratio larger than 100 [1]. In addition, HP Labs and Hynix have announced plans to commercialize memristor-based ReRAM and predicted that ReRAM could eventually replace traditional memory technologies [2].

Unlike other non-volatile memory technologies, ReRAM can be implemented in a cross-point structure without any access device [3], [4]. Specifically, in a nano cross-point array, each bistable ReRAM cell is sandwiched between two orthogonal nanowires, so each cell occupies an area of only $4F^2$. However, the simplicity of the access-device-free cross-point structure introduces challenges for the peripheral circuit and memory organization design. While there have been prior studies on cross-point ReRAM arrays [5]-[9], they do not consider the effect of voltage drivers and programming methods on the array, and they lack a detailed area, energy, and performance analysis. In this work, we address the design challenges of cross-point ReRAM. We use a mathematical model to evaluate memory reliability, energy consumption, and area overhead for different designs and cell parameters. The benefits of increasing the nonlinearity $K_r$ and scaling down the write current $I_w$ are discussed in detail. In addition, we present simulation results for the trade-offs among area, energy, and write throughput. Our study allows designers to explore the most energy- and area-efficient ReRAM design under different design constraints and cell parameters at the very beginning of the design stage. Moreover, system designers can leverage the proposed model to provide valuable feedback to device researchers, who can in turn adjust the ReRAM cell design. We believe that this kind of collaboration will help shorten the time to market of ReRAM memory.

## II. PRELIMINARIES

This section provides background on ReRAM technology and the cross-point architecture, and describes the model we use for a cross-point ReRAM array.

### A. Background of ReRAM Technology

As its name implies, a ReRAM cell uses its resistance to represent stored information. A ReRAM cell is built on a Metal-Insulator-Metal (MIM) structure and can be switched between a high resistance state (HRS) and a low resistance state (LRS) by applying an external voltage across the cell. Resistive switching has been observed in many MIM nanodevices with different metal oxide materials. For example, a particular $TiO_2$-based MIM ReRAM, named the 'memristor', was developed by HP Labs in 2008 [10]. This memristor-based ReRAM is considered the first experimental realization, along with a theoretical model, of the fourth fundamental circuit element predicted by Chua [11] about 40 years earlier. It has been reported that memristor-based ReRAM has a very small cell size and an access time of less than 50 ns [12]. An $HfO_2$-based bipolar ReRAM prototype fabricated by ITRI achieves an access time as low as 7.2 ns [1].

Although there are several variants of ReRAM cells, all of them fall into two broad categories: unipolar ReRAM and bipolar ReRAM. In a unipolar cell, the switching behavior does not depend on the polarity of the voltage applied across the cell, only on its magnitude and duration. In a bipolar cell, by contrast, the voltage polarity for ON-to-OFF switching (RESET operation) is opposite to that for OFF-to-ON switching (SET operation). Because unipolar ReRAM needs different pulse widths for SET and RESET, its write latency is determined by the longest pulse. Moreover, controlling SET, RESET, and read operations without any disturbance is another crucial design challenge, especially in high-speed ReRAM design. For these reasons, most high-performance ReRAM studies focus on bipolar ReRAM [1], [4], [13], [14]. In this study, we perform a detailed analysis of the design challenges of bipolar ReRAM cross-point arrays.

### B. Cross-Point Architecture

There are two possible memory structures for a bipolar ReRAM array: the traditional MOSFET-accessed structure and the cross-point structure. In a MOSFET-accessed memory array, each memory cell has a MOSFET as its access device. Since a MOSFET access device is typically much larger than a ReRAM cell, the total array area is dominated by the MOSFETs rather than by the ReRAM cells. Moreover, to provide enough drive current for write operations, larger-than-minimum-sized transistors must be used. Hence, ReRAM's area advantage is lost because of the access devices. Fortunately, the access device can be eliminated thanks to the large current-voltage (I-V) nonlinearity of some ReRAM devices [12], [15]. The I-V characteristics of these fabricated devices show that the resistance of ReRAM increases significantly as the applied voltage decreases. This property effectively cuts off the leakage current through unselected cells on sneak paths.



Figure 1. A schematic view of a typical cross-point array. (a) The perspective of the cross-point array. (b) The top view of the array, from which we can clearly see that the size of each cell is  $4F^2$ .

Therefore, the area-efficient cross-point ReRAM memory array is enabled by an intrinsic property of the device [16]. A schematic view of a typical cross-point memory array is shown in Figure 1(a): ReRAM cells are sandwiched between wordlines and bitlines (top and bottom electrodes). Figure 1(b) shows a top view of the array, in which each ReRAM cell occupies an area of $4F^2$, the theoretical lower limit for a single-layer, single-level memory cell. This memory density can be further improved by using multi-layer, multi-level cross-point ReRAM arrays [3], [17].

There are several write and read schemes for cross-point ReRAM arrays. For example, a write operation can program either a single bit per access or several bits on the same wordline at the same time. Although the second scheme has higher bandwidth, it requires a two-step write operation to prevent unintentional writes [16], which significantly increases the write latency. Furthermore, while writing to a cross-point array, the unselected wordlines and bitlines can be either left floating or half-biased.

In contrast, to read a cell, the selected wordline is biased at a read voltage and all other wordlines and bitlines in the array are grounded. The current in each bitline is then sensed and compared with a reference current to determine the cell content. However, because of sneak currents in the cross-point array, the bitline current also depends on the data pattern stored in the unselected cells. This read disturbance restricts the size of a cross-point array: sneak current grows with the number of cells attached to each wordline and bitline, which makes it difficult to distinguish the current of a selected HRS cell from that of a selected LRS cell. In addition, the voltage drop along the nanowires limits the length of wordlines and bitlines. A cross-point array must therefore be sized carefully to meet read and write reliability requirements. Beyond the choice of write/read scheme, cell parameters also affect the reliability, energy consumption, bandwidth, and area efficiency of the array. Consequently, it is not straightforward for a designer to find a workable memory array with minimum energy consumption and area overhead. The following sections therefore present a worst-case-oriented methodology to help designers make these decisions early in the design flow.

### C. Modeling of the Cross-Point Memory

The basic circuit model of an $M \times N$ cross-point ReRAM array is shown in Figure 2. The model is built on Kirchhoff's Current Law (KCL), and its validity follows directly from basic circuit theory. The horizontal lines are wordlines and the vertical lines are bitlines. A ReRAM cell is located at each wordline-bitline cross-point; a detailed cross-point structure is shown in Figure 2(b). The resistance of the ReRAM cell at the cross-point of the $i^{th}$ wordline and the $j^{th}$ bitline is denoted $R_{i,j}$. We assume the resistance of the wire connecting two adjacent cross-points to be $R_{line}$. The input resistance of each wordline or bitline driver is $R_v$, and the input resistance of a sense amplifier is $R_s$. To set up the KCL equations, the voltage at each cross-point is denoted $V_{i,j}$ for the wordline layer and $V_{i,j}'$ for the bitline layer. In addition, the input voltage of the $i^{th}$ wordline is $V_{Wi}$ and that of the $j^{th}$ bitline is $V_{Bj}$. When a wordline is driven from both sides, the voltage at the other end of the $i^{th}$ wordline is denoted $V_{Wi}'$.

Figure 2. The circuit model of the cross-point array.

Figure 3. Validation of the analytical model against SPICE simulation. The two panels show the voltage drops obtained from our model and from SPICE (a) with a nonlinearity factor of 5 and (b) without nonlinearity.

Based on this model, the current equation at each cross-point can be written down. Every cross-point node has a similar structure with no more than three current branches, so the KCL equation for each node is straightforward to set up. Because the cross-points at the edges of the array see different write/read conditions, their KCL equations must be adjusted according to each write/read scheme. Together, the KCL equations form a system of linear equations of the form

$$A \cdot V = C,\tag{1}$$

where $V$ is the $2MN \times 1$ vector of unknown node voltages ($V_{i,j}$ and $V_{i,j}'$), $A$ is the $2MN \times 2MN$ coefficient matrix, and $C$ is the $2MN \times 1$ vector of constant terms. Given the resistance of the ReRAM cells, the resistance of the interconnect wires, the programming voltages, and the write/read scheme, the voltage at every cross-point can be obtained by solving this system. With these voltages we can analyze the array at a fine granularity; they are also essential for evaluating the reliability, energy consumption, drive current, and area overhead of a cross-point array.
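The system in Eq. (1) can be made concrete with a short sketch. The Python code below assembles $A$ and $C$ for a small array in nodal form and solves it by Gaussian elimination; the function names `solve`, `build_kcl`, and `cell_voltages` are our own hypothetical helpers. It assumes voltage-independent cell resistances, one driver per line (at the $j = 0$ end of each wordline and the $i = 0$ end of each bitline), a hypothetical 100 Ω driver resistance, and no sense amplifier. A nonlinear cell would require re-solving with resistances updated from the computed cell voltages, which this sketch does not do, and for the array sizes studied here ($2MN$ up to about two million unknowns) a sparse solver would replace this dense elimination.

```python
# Sketch: assemble and solve A . V = C (Eq. 1) for a small M x N cross-point array.
# Assumptions (not from the paper): constant cell resistances; wordline i is driven
# at its j = 0 end and bitline j at its i = 0 end through r_drv; None = floating line.


def solve(a, c):
    """Gaussian elimination with partial pivoting; returns x such that a . x = c."""
    n = len(a)
    aug = [row[:] + [c[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            if f != 0.0:
                for k in range(col, n + 1):
                    aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def build_kcl(r_cell, r_line, r_drv, v_wl, v_bl):
    """Return (A, C); unknowns are all V_{i,j} (wordline layer), then all V'_{i,j}."""
    m, n = len(r_cell), len(r_cell[0])
    size = 2 * m * n
    a = [[0.0] * size for _ in range(size)]
    c = [0.0] * size

    def wl(i, j):
        return i * n + j

    def bl(i, j):
        return m * n + i * n + j

    def link(p, q, g):
        # Conductance g between two unknown nodes p and q.
        a[p][p] += g
        a[q][q] += g
        a[p][q] -= g
        a[q][p] -= g

    def drive(p, g, v):
        # Node p tied to a fixed source voltage v through conductance g.
        a[p][p] += g
        c[p] += g * v

    for i in range(m):
        for j in range(n):
            link(wl(i, j), bl(i, j), 1.0 / r_cell[i][j])    # ReRAM cell R_{i,j}
            if j + 1 < n:
                link(wl(i, j), wl(i, j + 1), 1.0 / r_line)  # wordline wire segment
            if i + 1 < m:
                link(bl(i, j), bl(i + 1, j), 1.0 / r_line)  # bitline wire segment
    for i, v in enumerate(v_wl):
        if v is not None:
            drive(wl(i, 0), 1.0 / r_drv, v)
    for j, v in enumerate(v_bl):
        if v is not None:
            drive(bl(0, j), 1.0 / r_drv, v)
    return a, c


def cell_voltages(r_cell, r_line, r_drv, v_wl, v_bl):
    """Voltage across every cell, V_{i,j} - V'_{i,j}."""
    m, n = len(r_cell), len(r_cell[0])
    x = solve(*build_kcl(r_cell, r_line, r_drv, v_wl, v_bl))
    return [[x[i * n + j] - x[m * n + i * n + j] for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    # HWHB write of the far-corner cell of a 4 x 4 array with every cell in LRS.
    r_lrs = 2.0 / 40e-6  # 50 kOhm from 2 V / 40 uA (Table I), no nonlinearity
    cells = [[r_lrs] * 4 for _ in range(4)]
    v = cell_voltages(cells, 0.65, 100.0, [1.0, 1.0, 1.0, 2.0], [1.0, 1.0, 1.0, 0.0])
    print(f"selected (3,3): {v[3][3]:.4f} V, half-selected (3,0): {v[3][0]:.4f} V")
```

In this small, linear example the selected cell sees slightly less than the 2 V applied because the drivers and wires drop part of the voltage, while the half-selected cells on the selected lines see roughly 1 V.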

To validate the analytical model, we compare its results with HSPICE [18] simulations of cross-point arrays built from a resistor model. Figure 3 shows the results for eight cross-point arrays with different sizes and specific data patterns: the voltage drop on the selected cell derived from our analytical model is consistent with the HSPICE simulation results.

TABLE I
PARAMETERS OF THE BASELINE CROSS-POINT ARRAY

| Parameter   | Description                    | Typical Value (Range)              |
|-------------|--------------------------------|------------------------------------|
| $A_{cell}$  | Cell size                      | $4F^2$                             |
| $R_l$       | Interconnect resistance        | $0.65\,\Omega$                     |
| $V_{RESET}$ | Threshold voltage for RESET    | 2.0 V                              |
| $V_{SET}$   | Threshold voltage for SET      | -2.0 V                             |
| $V_{READ}$  | Read voltage of cell           | 0.5 V                              |
| $I_{on}$    | Write current for LRS cell     | $40\,\mu A$ ($40 \sim 200\,\mu A$) |
| $V_W(R)$    | Wordline voltage during read   | 0.5 V                              |
| $V_W(W)$    | Wordline voltage during write  | $\pm 2$ V                          |
| $V_W(H)$    | Half-selected wordline voltage | 1 V                                |
| $V_B(R)$    | Bitline voltage during read    | 0 V                                |
| $V_B(W)$    | Bitline voltage during write   | 0 V                                |
| $V_B(H)$    | Half-selected bitline voltage  | 1 V                                |
| $K_r$       | Nonlinearity of ReRAM cell     | 20 ($2 \sim 40$)                   |
| $M$, $N$    | Number of wordlines/bitlines   | 512 ($8 \sim 1024$)                |

## III. ANALYSIS OF DESIGN CONSTRAINTS - A CASE STUDY

In this section, we study in detail the effect of various write/read schemes on cross-point ReRAM arrays. Specifically, we evaluate the design constraints on array size, energy consumption, and area overhead in worst-case scenarios. The results of this study provide guidance for designing a cross-point array.

### A. Overview

To write or read a cross-point array, appropriate voltages must be applied across the selected ReRAM cell. Although read and write operations have different goals, both are realized by fully biasing the selected wordlines/bitlines and floating (or half-biasing) the unselected wordlines/bitlines. Thus, the coefficient matrix $A$ and the constant vector $C$ are very similar for both operations, and their energy consumption and area overhead follow similar trends. Therefore, in this section we first study the write operation comprehensively; for the read operation, we focus mainly on the read margin analysis, which is unique to reads.

Table I shows the circuit parameters of our baseline 50 nm design. These parameters are derived from recently published ReRAM studies [16], [19], [20]. The nonlinearity coefficient is defined as

$$K_r(p, V) = p \times R(V/p)/R(V), \tag{2}$$

where $R(V/p)$ and $R(V)$ are the equivalent resistances of the cell biased at $V/p$ and $V$, respectively [16]. The resistance of a nonlinear ReRAM cell is therefore not constant but varies with the applied voltage. For example, with $p = 2$ (a half-biased cell), a nonlinearity of $K_r = 20$ gives $R(V/2) = (K_r/2)\,R(V) = 10\,R(V)$: the resistance of a half-biased cell is 10 times that of a fully biased cell. Using these parameters, we study the reliability, energy consumption, and area overhead of four different write schemes, and discuss the sensitivity of these schemes to the data pattern of HRS and LRS cells and to cell nonlinearity. In this section, the baseline design uses a cell with a write current of 40 µA and nonlinearity $K_r = 20$. A sensitivity study varying the nonlinearity coefficient and the write current is presented in Section IV.
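Eq. (2) can be checked numerically. The sketch below (helper names are our own) computes the resistance ratio $R(V/p)/R(V) = K_r/p$ implied by Eq. (2), and adds a hypothetical power-law I-V model, an assumption rather than the device model of [16], for which $K_r(2, V)$ takes the same value at every bias. With $K_r = 2$ the model reduces to an ordinary linear resistor, which corresponds to the lower end of the $K_r$ range in Table I.

```python
import math


def bias_ratio(k_r, p=2.0):
    """R(V/p) / R(V) implied by Eq. (2): K_r = p * R(V/p) / R(V)."""
    return k_r / p


def r_power_law(v, r_full, v_full, k_r):
    """Hypothetical I-V model (an assumption, not the paper's device model).

    R(v) = r_full * (v_full / |v|) ** log2(K_r / 2), so R(v / 2) = (K_r / 2) * R(v)
    at every bias; K_r = 2 gives exponent 0, i.e. a linear resistor.
    """
    exponent = math.log2(k_r / 2.0)
    return r_full * (v_full / abs(v)) ** exponent


print(bias_ratio(20))  # half-biased cell: 10x the fully biased resistance
print(r_power_law(1.0, 50e3, 2.0, 20) / 50e3)  # same ratio from the model
```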

### B. Write Operation

To write a ReRAM cell, an external voltage is applied across the cell for a certain duration. Intuitively, there are four possible write schemes. All of them fully bias the selected wordline and the selected bitline; they differ in how the unselected lines are treated (F = floating, H = half-biased, W = wordlines, B = bitlines):

- FWFB: all unselected wordlines and bitlines are left floating.
- FWHB: unselected wordlines are left floating and unselected bitlines are half-biased.
- HWFB: unselected wordlines are half-biased and unselected bitlines are left floating.
- HWHB: all unselected wordlines and bitlines are half-biased.

However, the FWFB scheme has an inherent problem that may result in severe write disturbance [8]. Therefore, in the following discussion, we compare only the FWHB, HWFB, and HWHB schemes. For each of these three schemes, we can either write several cells on one wordline at the same time (multi-bit write) or write only one bit per access (single-bit write) and distribute the write operation across several arrays.

**Reliable Write Operations.** Write reliability is a serious concern in cross-point arrays. Under ideal conditions, wire resistance and the sneak currents through unselected cells are negligible, and all of the write schemes above ensure that the full write voltage $V_W(W) - V_B(W)$ is applied across the selected cell. In reality, however, both wire resistance and sneak current are non-trivial, so the voltage across a cross-point depends on the location of the cell as well as on the data pattern stored in the whole array. A write is considered reliable if it changes the selected cells to the new value without disturbing any unselected cell. Correspondingly, a write can suffer from two problems: write failure, an unsuccessful write to the selected cells, and write disturbance, an undesired write to unselected cells. A write scheme must guarantee reliable operation even in the worst case, with respect to both the location of the cells to be written and the data pattern stored in the cross-point array.

Write failure typically results from the voltage drop across the interconnect wires along the wordline and bitline. It has been shown that, for a single-bit write, the worst-case voltage drop occurs when writing the cell at the cross-point of the $M^{th}$ wordline and the $N^{th}$ bitline while all other cells in the array are in LRS [7]. To avoid write failure and successfully program the selected cell, the driven voltage must be boosted so that the voltage across the cell exceeds the threshold voltage even in this worst case. Figure 4 shows the lower bound of the driven voltage for different cross-point array sizes. The minimum wordline/bitline voltage increases from 2.01 V for a $32 \times 32$ array to nearly 7 V for a $1024 \times 1024$ array. However, boosting the driven voltage also increases the voltage applied to unselected cells, and a write disturbance occurs when that voltage exceeds the threshold voltage for a SET or RESET operation. According to our analysis, the maximum voltage applied to an unselected cell equals half of the driven voltage. Since this half voltage must remain below the 2 V threshold, only arrays whose required driven voltage is less than 4 V are allowable; any other array is unreliable because it cannot avoid write failure and write disturbance at the same time. The unreliable array sizes are denoted by red bars in Figure 4. This array size limit is a hard constraint, and all of the following energy and area trade-offs are bounded by it.
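The feasibility rule above reduces to a two-sided bound on the driven voltage. The minimal sketch below (the helper `write_window` is hypothetical) encodes it, taking as input the minimum driven voltage that avoids write failure, which would be read off Figure 4, and assuming, as stated above, that the worst unselected cell sees half the driven voltage.

```python
def write_window(v_min_required, v_threshold=2.0):
    """Single-bit write feasibility under the worst-case conditions described above.

    v_min_required: smallest driven voltage that avoids write failure (e.g. from Figure 4).
    The worst unselected cell sees V_drive / 2, so avoiding disturbance needs
    V_drive < 2 * v_threshold. Returns (reliable, upper bound on V_drive).
    """
    v_max = 2.0 * v_threshold
    return v_min_required < v_max, v_max


# The two endpoints quoted in the text: 2.01 V (32 x 32) and nearly 7 V (1024 x 1024).
for label, v_min in [("32 x 32", 2.01), ("1024 x 1024", 7.0)]:
    ok, v_max = write_window(v_min)
    print(f"{label}: needs >= {v_min} V, allowed < {v_max} V -> {'reliable' if ok else 'unreliable'}")
```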

Additionally, a cross-point array can be organized with different numbers of wordlines and bitlines. For example, a 256 Kbit cross-point array can be implemented either as a $512 \times 512$ array or as a $64 \times 4096$ array. In the latter case, the voltage drop along the wordline is much worse than along the bitline. Figure 5 examines the voltage requirements of different array organizations under different write schemes. The results show that, from a reliability point of view, a cross-point array with equal numbers of wordlines and bitlines is the best choice, so in the following discussion we assume square arrays. Furthermore, when the array is square, the FWHB, HWFB, and HWHB schemes have the same minimum driven voltage.

**Energy Consumption of Write Operations.** The energy consumption of a write operation includes the energy consumed to change the state of the selected cell, the energy wasted in the half-selected and unselected cells, and the energy consumed by the interconnect lines. Because sneak paths have a larger impact in the floating schemes (FWHB and HWFB), one would intuitively expect these schemes to waste more energy in unselected cells than the HWHB scheme. However, our simulation results show that the energy consumed by unselected cells is negligible compared with the total energy consumption, so the total energy consumption of the FWHB and HWFB schemes is almost the same as that of the HWHB scheme. Since, as mentioned, the voltage drop results of these schemes are also similar, the following discussion shows only the simulation results of the HWHB scheme; the results of the other two schemes follow the same trends.

Figure 4. Required write voltages for different cross-point arrays (threshold voltage = 2 V).

Figure 5. Required write voltages with different memory shapes (array capacity = 256 Kbits, threshold voltage = 2 V).

Figure 6(a) shows the decomposed energy consumption of a single-bit write operation. The energy wasted in half-selected cells accounts for a large fraction of the total energy consumption, and as the array size increases, the energy dissipated in the interconnect lines also becomes significant. The wasted energy therefore makes up a larger share of the total for larger arrays: for example, the undesired energy consumption for writing a $512 \times 512$ array is more than 15 times that of a $32 \times 32$ array. For multi-bit writes, we evaluate write operations that program an entire wordline at once. To compare fairly, we use the energy per bit instead of the total energy; for example, when writing a 512-bit wordline, the energy per bit is $E_{ave} = E_{total}/512$. Figure 6(b) shows the energy per bit of the multi-bit write operation. Compared with single-bit writes, multi-bit writes are much more energy efficient for large cross-point arrays, because the energy wasted in unselected and half-selected cells is amortized over multiple bits, reducing the average energy per bit.
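The amortization argument can be illustrated with a deliberately simplified model. In the sketch below (the helper and all numbers are hypothetical, in arbitrary units), the energy wasted in half-selected cells and wires is held fixed regardless of how many bits share an activation, which the paper does not claim; in a real array that overhead also grows with the number of bits written. Dividing the total by the number of bits, as in $E_{ave} = E_{total}/512$, shows the per-bit cost approaching the switching energy alone.

```python
def energy_per_bit(e_wasted, e_switch, bits):
    """E_ave = E_total / bits for one write activation.

    Simplifying assumption (not the paper's model): e_wasted, the energy lost in
    half-selected cells and interconnect, does not change with `bits`;
    e_switch is the energy needed to switch one selected cell.
    """
    e_total = e_wasted + bits * e_switch
    return e_total / bits


# Hypothetical energies in arbitrary units: wasted energy 10x the switching energy.
for bits in (1, 8, 64, 512):
    print(bits, round(energy_per_bit(10.0, 1.0, bits), 3))
```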

**Write Current and Area Overhead of Write Operations.** The write operation for an $M \times N$ array requires $M$ wordline voltage drivers and $N$ bitline multiplexors. The drivers and multiplexors must be sized to supply the worst-case wordline and bitline currents. We size the wordline/bitline circuitry using HSPICE simulations and then calculate the area overhead of the drivers and multiplexors using the CACTI area model. Figure 7(a) shows the maximum write current for different ReRAM array sizes; not surprisingly, the current requirement increases with array size. Figure 8(a) illustrates the area overhead of the wordline and bitline circuitry, which shows that the drivers and multiplexors occupy a smaller area than the cross-point array. Only in this case can the voltage drivers and multiplexors be implemented beneath the array, resulting in an ideal cell size of $4F^2$.

Figure 6. The normalized energy consumption with different array sizes. (a) Single-bit writing. (b) Multi-bit writing.

Although the multi-bit write operation consumes less energy, it also increases the maximum current requirement of each wordline. As Figure 7(b) demonstrates, the maximum drive current of each bitline is almost the same as for a single-bit write, but the drive current required by each wordline in a multi-bit write scheme is more than 10 times that of a single-bit write scheme. Since the area of a voltage driver increases proportionally with its drive current, the area overhead of multi-bit writing is much larger than that of single-bit writing. As shown in Figure 8(b), the peripheral circuitry area is then much larger than that of the array, so the total area of the memory array is dominated by the peripheral circuitry rather than by the cells. Besides the extra area overhead, writing multiple bits at once also worsens the voltage drop along the wordline. As shown in Figure 9, writing an entire wordline at once reduces the maximum reliable array size from $800 \times 800$ to $352 \times 352$. This is because the current passing through the interconnect wires in the multi-bit write scheme is much larger than in the single-bit write scheme, causing more severe voltage drops across the wire resistance.

Therefore, we conclude that although the multi-bit write operation is more energy efficient, the single-bit write operation is preferred from the standpoint of reliability and area overhead.

### C. Read Operation

We apply a sensing scheme similar to those in [6] and [7]. To read cell $R_{i,j}$, the $i^{th}$ wordline is biased at $V_{READ}$ and all other wordlines and bitlines are grounded. The state of the selected cell is then read out by measuring the voltage across $R_s$. The energy consumption of a read operation can be analyzed in the same way as that of a write operation. Since the read voltage is much smaller than the write voltage, the read energy is expected to be at least one order of magnitude smaller than the write energy. A considerable sensing margin is achieved by using a current-to-voltage converter and sensing the resulting voltage signal with traditional or more recent sense amplifier designs. The input resistance of the current-to-voltage converter is extracted from HSPICE simulation results. The read sensing margin is defined as $\Delta V = \Delta I \times R_{converter}$, where $\Delta I$ is the difference between the bitline currents when reading an LRS cell and an HRS cell, and $R_{converter}$ is the input resistance of the converter. The read reliability is thus determined by the voltage swing between reading HRS and LRS cells. Detailed results are shown in Section IV.

Figure 7. The requirements for wordline and bitline drive currents. (a) One bit per write. (b) One wordline per write.

Figure 8. Area overhead comparison. (a) One bit per write. (b) One wordline per write.

## IV. NONLINEARITY AND WRITE CURRENT SCALING

One of the most distinctive features of ReRAM is its nonlinearity. Typically, the $K_r$ value of memristor-based ReRAM is larger than 20, meaning that the resistance of a half-biased cell is at least 10 times that of a fully biased cell. ReRAM cells with larger nonlinearity coefficients make better memory cells, since the sneak current through half-selected cells is significantly reduced. In addition, the increased resistance of half-selected and unselected cells mitigates the voltage drop along the activated wordline and bitline. We also find that cross-point array design benefits from scaling down the write current. Figure 10 shows the influence of different nonlinearity coefficients and write currents on the maximum array size for a single-bit HWHB write scheme: the array size limit is relaxed as the nonlinearity increases or the write current scales down. As the figure shows, the maximum array size exceeds $1024 \times 1024$ with a nonlinearity of 30 and a write current of $40\mu A$.

Figure 9. Worst-case select voltage and write voltage requirements for multi-bit writing (one wordline per write).

Figure 10. The maximum array size with different nonlinearity coefficients.

Figure 11. Energy and area overhead comparison. (a) Energy consumption (normalized to baseline). (b) Area overhead of voltage driver (normalized to the area of cross-point array).

Moreover, increasing the nonlinearity or scaling down the write current also reduces the energy consumption and area overhead of the cross-point array. As shown in Figure 11(a), for a $512 \times 512$ array, the write energy decreases dramatically as the nonlinearity coefficient $K_r$ increases. For example, for a ReRAM cell with a write current of 50 µA, the write energy is reduced by 98.3% when $K_r$ increases from 2 to 40. The area overhead of the voltage drivers is illustrated in Figure 11(b). For the baseline design ($K_r = 20$ and $I_w = 40\mu A$), the driver area overhead is about 35% of the area of the memory array cells. To design a memory array with an effective cell size close to $4F^2$, the nonlinearity and write current must satisfy certain conditions so that the driver overhead is less than 100% and the wordline drivers can be almost "hidden" underneath the ReRAM cells. As nonlinearity and write current continue to scale, the area overhead can be as low as 10%. In that case, 3D stacking of multi-layer cross-point arrays can further reduce the effective cell size to $4F^2/N_l$, where $N_l$ is the number of layers.
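A worked example makes the "hidden driver" condition concrete. The sketch below (the helper `effective_cell_area` is hypothetical) assumes, beyond what the text states, that the drivers of all $N_l$ stacked layers sit beneath the array, so the silicon footprint is the larger of the cell-array area and the combined driver area. Under that assumption, a single layer stays at $4F^2$ whenever the overhead is below 100%, and stacking reaches $4F^2/N_l$ only while $N_l$ times the overhead stays below 100%. The 50 nm feature size is the baseline of Table I.

```python
def effective_cell_area(f_nm, driver_overhead, layers=1):
    """Effective silicon area per bit, in nm^2.

    driver_overhead: driver area as a fraction of one layer's cell-array area.
    Assumption (illustrative, not from the paper): drivers of all stacked layers sit
    beneath the array, so footprint = max(array area, combined driver area).
    """
    cell_area = 4.0 * f_nm ** 2                     # 4F^2 per cell in one layer
    footprint_scale = max(1.0, layers * driver_overhead)
    return cell_area * footprint_scale / layers


for overhead, layers in [(0.35, 1), (0.10, 1), (0.10, 4), (0.35, 4)]:
    area = effective_cell_area(50.0, overhead, layers)
    print(f"overhead={overhead:.0%}, layers={layers}: {area:.0f} nm^2 per bit")
```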

Unlike the write operation, the read operation suffers, rather than benefits, from scaling of nonlinearity or write current. This is because the scaling of nonlinearity and write current will reduce read current, degrading the read signal ratio. Figure 12(a) shows the read noise margin with different array sizes for the baseline design in Section III. As can be seen, the read noise margin is reduced for large array sizes. The impact of nonlinearity and write current on read noise margin is illustrated in Figure 12(b). A large  $K_r$  value and small write current are harmful to the read noise margin. For example, given a  $512 \times 512$  array, the read noise margin is less than 10mV for  $K_r = 40$  and  $I_w = 40\mu A$ , which makes it very difficult to sense the state of the selected memory cell using traditional sense amplifiers.
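The direction of this trend can be seen in an idealized calculation of $\Delta V = \Delta I \times R_{converter}$. The sketch below (the helper `read_margin` is hypothetical) ignores wire resistance and sneak current entirely, so it cannot reproduce Figure 12; it only shows why a larger $K_r$ or a smaller write current shrinks the read current. It assumes that the same $K_r$ value applies at $p = V_{write}/V_{READ}$ in Eq. (2), an HRS/LRS ratio of 100, and a hypothetical 10 kΩ converter input resistance.

```python
def read_margin(k_r, i_on, v_write=2.0, v_read=0.5, hrs_ratio=100.0, r_conv=10e3):
    """Idealized read margin Delta V = Delta I * R_converter, in volts.

    Assumptions (illustrative only): zero wire resistance and all unselected lines
    grounded, so the sensed bitline carries only the selected cell's current;
    the same K_r applies at p = v_write / v_read in Eq. (2); HRS = hrs_ratio * LRS;
    r_conv is a hypothetical converter input resistance.
    """
    p = v_write / v_read                    # 4 with the Table I voltages
    r_lrs_write = v_write / i_on            # LRS resistance at the write bias
    r_lrs_read = k_r * r_lrs_write / p      # Eq. (2): R(V/p) = K_r * R(V) / p
    r_hrs_read = hrs_ratio * r_lrs_read
    i_lrs = v_read / (r_lrs_read + r_conv)  # selected cell in series with the converter
    i_hrs = v_read / (r_hrs_read + r_conv)
    return (i_lrs - i_hrs) * r_conv


for k_r, i_on in [(20, 40e-6), (40, 40e-6), (20, 200e-6)]:
    print(f"K_r={k_r}, I_on={i_on * 1e6:.0f} uA: dV = {read_margin(k_r, i_on) * 1e3:.1f} mV")
```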

Therefore, given the array size and read noise margin constraints, an "optimal cell" with nonlinearity $K_{r\_opt}$ and write current $I_{on\_opt}$ can be determined. For example, when the array size is fixed at $512 \times 512$ and the minimum noise margin is 50 mV, a cross-point array built from ReRAM cells with $K_{r\_opt} = 9$ and $I_{on\_opt} = 40\mu A$ is the most energy- and area-efficient design.



Figure 12. Read noise margin with (a) different array size and (b) scaling of nonlinearity and write current.

## V. CROSS-POINT RERAM MACRO DESIGN

Since the size of a cross-point ReRAM array is strictly limited by reliability requirements, the design of a ReRAM macro differs greatly from traditional DRAM design. A cross-point ReRAM macro is built from a large number of small cross-point arrays with appropriate peripheral circuitry and organization. In this section, we evaluate the area, energy consumption, and bandwidth of a 256 Mbit ReRAM macro, using a memory organization similar to Kawahara's work [4]. The 256 Mbit ReRAM macro consists of eight planes of 32 Mbit each. Each plane has its own wordline decoder, bitline selectors, sense amplifiers, and write circuitry. Due to space limitations, we only present results for ReRAM macros implemented with four typical sets of cell parameters: ($K_r = 20$, $I_w = 40\mu A$), ($K_r = 20$, $I_w = 200\mu A$), ($K_r = 40$, $I_w = 40\mu A$), and ($K_r = 40$, $I_w = 200\mu A$). For each of them, we vary the number of bits per write to investigate the relationship among the area, energy consumption, and bandwidth of the ReRAM macro.
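The organization above implies a simple count of cross-point arrays. The sketch below (the helper `macro_breakdown` is hypothetical) takes the 256 Mbit capacity and the eight planes from the text and assumes, only for illustration, that every array has the $512 \times 512$ baseline size of Table I; binary prefixes are used, consistent with a 256 Kbit array being $512 \times 512$.

```python
def macro_breakdown(macro_mbit=256, planes=8, rows=512, cols=512):
    """Count cross-point arrays in a ReRAM macro.

    From the text: 256 Mbit macro split into 8 planes.
    Assumption: every array is rows x cols (512 x 512 baseline from Table I).
    """
    macro_bits = macro_mbit * 2 ** 20
    plane_bits = macro_bits // planes
    array_bits = rows * cols
    arrays_per_plane = plane_bits // array_bits
    return {
        "plane_mbit": plane_bits // 2 ** 20,
        "arrays_per_plane": arrays_per_plane,
        "arrays_total": arrays_per_plane * planes,
    }


print(macro_breakdown())
```

Smaller arrays, as forced by tighter reliability limits, multiply the number of arrays (and thus of peripheral circuits) accordingly.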

Figure 13 shows the total area, energy consumption, and bandwidth of the 256 Mbits ReRAM macro. Consistent with our previous discussion, the total area and energy consumption of the macro generally decrease as the nonlinearity increases and as the write current is scaled down, and they increase with the number of array-level bits per write. The bandwidth also increases with the number of bits per write. This observation implies that we have to either sacrifice area efficiency or increase the energy budget to improve the bandwidth of the ReRAM macro.
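The trade-off can be read directly off the baseline row ($K_r = 20$, $I_w = 40\mu A$) of the Figure 13 table. The sketch below, with values copied from that row, checks that all three metrics grow with the number of bits per write and compares their growth from 1 to 128 bits; the `growth` ratio is an illustrative summary chosen here, not a metric used in the paper.

```python
BITS = [1, 2, 4, 8, 16, 32, 64, 128]
# Baseline row (K_r = 20, I_w = 40 uA), copied from the Figure 13 table.
AREA_MM2 = [3.89, 3.94, 4.06, 4.29, 4.75, 5.69, 7.54, 11.75]
ENERGY_NJ = [4.38, 12.38, 19.86, 33.66, 59.43, 111.40, 232.44, 576.50]
BANDWIDTH_MBPS = [66.69, 72.62, 144.75, 287.53, 567.29, 1103.99, 2089.85, 3649.58]

# One value per bits-per-write setting.
assert len(BITS) == len(AREA_MM2) == len(ENERGY_NJ) == len(BANDWIDTH_MBPS)


def strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def growth(values):
    """Ratio of the last entry (128 bits per write) to the first (1 bit)."""
    return values[-1] / values[0]


for name, series in [("area", AREA_MM2), ("energy", ENERGY_NJ), ("bandwidth", BANDWIDTH_MBPS)]:
    print(f"{name}: increasing={strictly_increasing(series)}, x{growth(series):.1f}")
```

For the baseline cell, moving from 1 to 128 bits per write buys roughly 55x the bandwidth at the cost of about 3x the area and about 132x the tabulated energy, which is the area/energy-versus-bandwidth tension described above.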

Bandwidth-per-Joule is another important metric of the energy efficiency of the memory macro. Figure 14 shows the bandwidth-per-Joule of the ReRAM macro. For a given ReRAM cell, single-bit writes always achieve the best bandwidth-per-Joule. This is because multi-bit writes require a two-step write method, which nearly doubles the write latency and therefore lowers the bandwidth-per-Joule. Moreover, among multi-bit write schemes, the bandwidth-per-Joule does not increase monotonically with the number of bits per write. According to Figure 14, an optimal multi-bit write scheme exists for each set of cell parameters. For example, for our baseline design with $K_r = 20$ and $I_w = 40\mu A$, writing 32 bits per access gives the best bandwidth-per-Joule among the multi-bit schemes. Therefore, to design a ReRAM macro that is efficient in bandwidth-per-Joule, the write scheme should be chosen carefully.
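The ranking of write schemes can be approximated from the Figure 13 table. This sketch assumes that bandwidth-per-Joule is proportional to the tabulated bandwidth divided by the tabulated energy (the exact normalization behind Figure 14 is not restated here), so only the ranking of schemes, not the absolute values, is meaningful. `TABLE`, `ratios`, and `best_bits_per_write` are hypothetical names.

```python
BITS = (1, 2, 4, 8, 16, 32, 64, 128)

# (K_r, I_w in uA) -> (energy in nJ, bandwidth in MB/s), copied from Figure 13.
# The I_w = 200 uA rows have no 128-bit entry.
TABLE = {
    (20, 40): ([4.38, 12.38, 19.86, 33.66, 59.43, 111.40, 232.44, 576.50],
               [66.69, 72.62, 144.75, 287.53, 567.29, 1103.99, 2089.85, 3649.58]),
    (20, 200): ([24.58, 67.13, 106.85, 182.41, 338.78, 716.14, 1845.36],
                [90.04, 113.03, 217.35, 401.20, 685.22, 1018.79, 1213.96]),
    (40, 40): ([2.06, 5.55, 8.71, 14.49, 25.60, 49.32, 107.52, 280.12],
               [69.59, 74.31, 148.11, 294.20, 580.36, 1129.13, 2136.16, 3777.48]),
    (40, 200): ([11.64, 29.74, 46.94, 80.81, 155.03, 343.66, 933.23],
                [115.78, 131.83, 253.69, 469.24, 800.28, 1174.24, 1362.14]),
}


def ratios(energy_nj, bandwidth_mbps):
    """Bandwidth / energy for each bits-per-write setting (relative units)."""
    # zip stops at the shortest sequence, so rows without a 128-bit entry work.
    return {bits: bw / e for bits, e, bw in zip(BITS, energy_nj, bandwidth_mbps)}


def best_bits_per_write(cell, multi_bit_only=False):
    """Bits per write with the highest bandwidth/energy ratio for the given cell."""
    r = ratios(*TABLE[cell])
    if multi_bit_only:
        r = {bits: v for bits, v in r.items() if bits > 1}
    return max(r, key=r.get)


for cell in TABLE:
    print(cell, best_bits_per_write(cell), best_bits_per_write(cell, multi_bit_only=True))
```

Under this assumption the ratio reproduces both observations in the text: single-bit writes rank first for every cell, and 32 bits per write is the best multi-bit scheme for the baseline cell. For the two $I_w = 200\mu A$ cells the same ratio peaks at 8 bits per write, which illustrates why the optimal scheme has to be chosen per cell.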

| $K_r$ | $I_w$ (μA) | Metric | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|----|----|----|----|----|----|----|----|----|----|----|
| 20 | 40 | Area (mm²) | 3.89 | 3.94 | 4.06 | 4.29 | 4.75 | 5.69 | 7.54 | 11.75 |
| | | Energy (nJ) | 4.38 | 12.38 | 19.86 | 33.66 | 59.43 | 111.40 | 232.44 | 576.50 |
| | | Bandwidth (MB/s) | 66.69 | 72.62 | 144.75 | 287.53 | 567.29 | 1103.99 | 2089.85 | 3649.58 |
| 20 | 200 | Area (mm²) | 6.51 | 6.82 | 7.42 | 8.64 | 11.10 | 16.14 | 27.29 | – |
| | | Energy (nJ) | 24.58 | 67.13 | 106.85 | 182.41 | 338.78 | 716.14 | 1845.36 | – |
| | | Bandwidth (MB/s) | 90.04 | 113.03 | 217.35 | 401.20 | 685.22 | 1018.79 | 1213.96 | – |
| 40 | 40 | Area (mm²) | 3.62 | 3.67 | 3.78 | 3.99 | 4.75 | 5.69 | 7.54 | 11.43 |
| | | Energy (nJ) | 2.06 | 5.55 | 8.71 | 14.49 | 25.60 | 49.32 | 107.52 | 280.12 |
| | | Bandwidth (MB/s) | 69.59 | 74.31 | 148.11 | 294.20 | 580.36 | 1129.13 | 2136.16 | 3777.48 |
| 40 | 200 | Area (mm²) | 3.98 | 4.29 | 4.89 | 6.09 | 8.46 | 13.33 | 24.09 | – |
| | | Energy (nJ) | 11.64 | 29.74 | 46.94 | 80.81 | 155.03 | 343.66 | 933.23 | – |
| | | Bandwidth (MB/s) | 115.78 | 131.83 | 253.69 | 469.24 | 800.28 | 1174.24 | 1362.14 | – |

Figure 13. Area, energy, and bandwidth results of the 256 Mbits ReRAM macro. Columns give the number of bits per write at the array level.

Figure 14. Bandwidth-per-Joule of the 256 Mbits ReRAM macro.

#### VI. CONCLUSION

ReRAM is a promising candidate for next-generation non-volatile memory technology. The area-efficient cross-point structure is the most attractive memory organization for ReRAM memories. However, problems inherent in the cross-point structure, such as sneak currents and voltage drops along the wires, introduce challenges to the design of reliable ReRAM cross-point memory arrays. In this paper, we use a mathematical model to study in detail how reliability affects the array organization, size, energy consumption, and area overhead of cross-point arrays. The simulation results show that multi-bit write operations are more energy efficient than single-bit write operations and are therefore more suitable for energy-constrained designs, whereas single-bit write operations are better for area-constrained designs. In addition, we show that both increasing the nonlinearity and scaling down the write current of the ReRAM cell can significantly reduce the energy consumption and area overhead, which favors large, energy-efficient ReRAM designs. Our macro-level analysis shows that we have to either sacrifice area efficiency or increase the energy budget to improve the bandwidth of the ReRAM macro.

#### REFERENCES

- [1] S. S. Sheu et al., "A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability," in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), 2011, pp. 200–202.
- [2] "…com/news/2010/jul-sep/memristorhynix.html."
- [3] C. Chevallier et al., "A 0.13 μm 64Mb multi-layered conductive metal-oxide memory," in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb 2010, pp. 260–261.
- [4] A. Kawahara et al., "An 8Mb Multi-Layered Cross-Point ReRAM Macro with 443MB/s Write Throughput," in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb 2012, pp. 432–433.
- [5] M. Ziegler and M. Stan, "Design and analysis of crossbar circuits for molecular nanoelectronics," in *Proc. of the 2nd IEEE Conference on Nanotechnology*, 2002, pp. 323–327.
- [6] A. Flocke et al., "A fundamental analysis of nano-crossbars with non-linear switching materials and its impact on TiO2 as a resistive layer," in *IEEE Conf. on Nanotechnology*, 2008, pp. 319–322.
- [7] J. Liang and H.-S. Wong, "Cross-point memory array without cell selectors: device characteristics and data storage pattern dependencies," *IEEE Transactions on Electron Devices*, vol. 57, no. 10, pp. 2531–2538, Oct 2010.
- [8] M. Ziegler and M. Stan, "CMOS/nano co-design for crossbar-based molecular electronic systems," *IEEE Transactions on Nanotechnology*, vol. 2, no. 4, pp. 217–230, Dec 2003.
- [9] O. Kavehei et al., "An analytical approach for memristive nanoarchitectures," *IEEE Transactions on Nanotechnology*, vol. 11, no. 2, pp. 374–385, Mar 2012.
- [10] D. B. Strukov et al., "The missing memristor found," Nature, 2008.
- [11] L. Chua, "Memristor-the missing circuit element," *IEEE Transactions on Circuit Theory*, no. 5, Sep 1971.
- [12] J. J. Yang et al., "Memristive switching mechanism for metal/oxide/metal nanodevices," in *Nature Nanotechnology*, vol. 3, Jun 2008, pp. 429–433.
- [13] M. Kim *et al.*, "Low power operating bipolar TMO ReRAM for sub 10 nm era," in *Proc. of IEEE Int. Electron Devices Meeting (IEDM)*, Dec 2010, pp. 19.3.1–19.3.4.
- [14] W. Otsuka *et al.*, "A 4Mb conductive-bridge resistive memory with 2.3GB/s read-throughput and 216MB/s program-throughput," in *Proc. of International Solid-State Circuits Conference (ISSCC)*, 2011.
- [15] R. Meyer et al., "Oxide dual-layer memory element for scalable non-volatile cross-point memory technology," in Proc. of Non-Volatile Memory Technology Symposium (NVMTS), Nov. 2008.
- [16] C. Xu et al., "Design implications of memristor-based RRAM cross-point structures," in Proc. of Design, Automation and Test in Europe Conference (DATE), 2011.
- [17] M.-J. Lee *et al.*, "Stack friendly all-oxide 3D RRAM using GaInZnO peripheral TFT realized over glass substrates," in *Proc. of IEEE Int. Electron Devices Meeting (IEDM)*, Dec 2008, pp. 1–4.
- [18] "http://www.synopsys.com/tools/verification/amsverification/circuitsimulation/hspice/pages/default.aspx."
- [19] H. Akinaga and H. Shima, "Resistive random access memory (ReRAM) based on metal oxides," *Proceedings of the IEEE*, vol. 98, no. 12, pp. 2237–2251, Dec 2010.
- [20] M. Terai, Y. Sakotsubo, Y. Saito, S. Kotsuji, and H. Hada, "Memory-state dependence of random telegraph noise of Ta2O5/TiO2 stack ReRAM," *IEEE Electron Device Letters*, vol. 31, no. 11, pp. 1302–1304, Nov 2010.