## **Supplementary Information**

### **Supplementary Figures**



**Supplementary Figure 1. Forming procedure.** (a) Flow diagram of the automatic current-controlled memristor forming procedure. (In the voltage-controlled forming algorithm, currents should be replaced with the corresponding voltages.) So far, the value of $I_{\text{stop}}$ was adjusted manually after a failure to form a device automatically (in ~10% of all cases). (b) All forming *I-V* curves for one of the crossbars used in the experimental demonstration (with  $I_{\text{start}} = 180 \, \mu\text{A}$ ,  $I_{\text{stop}} = 540 \, \mu\text{A}$ ,  $I_{\text{step}} = 20 \, \mu\text{A}$ ,  $V_{\text{reset}} = -1.3 \, \text{V}$ ,  $A_{\text{min}} = 5$ ).



**Supplementary Figure 2. Experimental setup and board details.** (a) Circuit diagram of the implemented neurons. Note that the output scaling stage is not implemented in the output neurons; (b) Photos of the two printed circuit boards with one hosting wire-bonded memristive crossbar chips and the switching matrix and the other one implementing discrete CMOS neurons; (c) Block diagram of the experimental setup controlled by a personal computer.




**Supplementary Figure 3. Results for smiley face tuning experiment.** (a) Absolute device resistances and (b) absolute tuning error.



**Supplementary Figure 4. Pattern classification test set**. (a-d) A complete set of 640 test patterns for four letters used in the pattern classification experiment.



**Supplementary Figure 5. Perceptron software simulation results**. (a) Comparison of the best fidelity obtained for a single-layer perceptron and MLPs with different numbers of hidden-layer neurons (shown in parentheses in the legend). (b, c) The results for a perceptron with 10 hidden-layer neurons, similar to the one used in the experiment, for classification of (b) training and (c) test patterns. The normalized weight import error (Error) was modeled by using a random variate generated from the uniform distribution $[W_{\text{ideal}} - W_{\text{ideal}} \cdot \text{Error}/100,\ W_{\text{ideal}} + W_{\text{ideal}} \cdot \text{Error}/100]$, where $W_{\text{ideal}}$ is the desired weight value. This import error model approximates well the resulting conductance distribution for relatively crude tuning accuracy, e.g. the 30% that was used in our experiment. The red, blue (rectangles), and black (segment) markers denote, respectively, the median, the 25th-75th percentile range, and the minimum and maximum values for 100 simulation runs.
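The weight import error model in this caption can be sketched in a few lines. This is a minimal illustration only: the function name `import_weights`, the sample weights, and the seed are hypothetical, and the code is not the simulation code used for the figure.

```python
import random

def import_weights(w_ideal, error_pct, rng=None):
    """Perturb each ideal weight with a uniform relative import error.

    Each weight w is replaced by a draw from the uniform interval
    [w - |w|*error_pct/100, w + |w|*error_pct/100]. Using |w| keeps the
    interval endpoints ordered for negative weights; the interval itself
    is the same as the one written in the caption.
    """
    rng = rng or random.Random()
    imported = []
    for w in w_ideal:
        half_width = abs(w) * error_pct / 100.0
        imported.append(rng.uniform(w - half_width, w + half_width))
    return imported

# Hypothetical example: four ideal weights imported with a 30% error.
rng = random.Random(0)
ideal = [-2.0, 0.0, 1.0, 4.0]
noisy = import_weights(ideal, 30.0, rng)
```

Repeating such draws many times (100 runs in the caption) and re-evaluating the classifier for each draw would give the spread of fidelity for a given Error value.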



**Supplementary Figure 6. Tuning results for classifier experiment.** (a, b) Tuning accuracy and (c, d) weight errors for each of the two layers of the implemented MLP network. The data for tuning accuracy are replotted from Fig. 5a,b of the main text. The tuning accuracy is defined here again as the normalized difference between the desired and actual conductances. The shown weight error is calculated as a (not normalized) difference between the desired value and the actual one implemented with the pair of memristors. Note that the weights and conductances in the second layer are always close to their maximum or minimum values, because of the clipping enforced during software ex-situ training.



**Supplementary Figure 7. In-situ training for 3-pattern classification ('A', 'V', and 'T').** (a) Experimentally measured and simulated error decay dynamics for the training set patterns. In the experiment, the conductances of all memristors were updated, one row of the crossbar at a time, at the end of each epoch. The weight update in each row was done in parallel in two steps by applying 500- $\mu$ s fixed-amplitude ( $\pm$  1.3 V) voltage pulses using the V/2 biasing technique. (b) Example of the devices' switching kinetics and its variations obtained using the simple device model from Ref. [1]. This model was used for the in-situ training simulations shown in panel a; see the supplementary MATLAB code for more details.



**Supplementary Figure 8. Voltage drop in resistor ladder.** (a) The considered circuit and (b) the relative worst-case voltage drop for several representative parameters specific to the implemented crossbar circuits. AR stands for the electrode height-to-width aspect ratio.



**Supplementary Figure 9. Temperature sensitivity**. (a) The *I-V* curves of a single memristor for several temperatures and (b) the extracted temperature dependence of its conductance.



**Supplementary Figure 10. Target conductances for additional tuning experiment.** The sequence of target conductance values, exponentially spaced between 10.5  $\mu$ S and 100  $\mu$ S, that were used in the additional tuning experiment.



**Supplementary Figure 11. Experimental results for repeated tuning.** The data are shown for 5 crossbar integrated memristors. (a) Each dot shows final measured value for the tuned conductance. One cycle corresponds to tuning of all 5 memristors to 13 specific conductance values, as shown in Supplementary Figure 10. (b) Corresponding tuning error histogram, shown separately for each device. The tuning error is defined as a normalized difference between the desired and actual conductance. Bins are 2.67 % wide.



**Supplementary Figure 12. Experimental results for high-precision tuning.** (a-e) The data show the results of tuning the conductances of 5 crossbar-integrated memristors to 32 exponentially spaced levels within the 7.5-75 k $\Omega$  range (at 0.2 V) with 2.5% tuning accuracy. Each panel shows histograms for tuning the same memristor 20 times to each level. The dashed lines are normal fits to the experimental data.

#### **Supplementary Note 1: Crossbar Circuit Scaling**

An important direction for future work, in addition to the monolithic integration with the CMOS subsystem discussed in the main text, is increasing the dimensions of the crossbar circuits, which would allow higher connectivity among neurons and improve integration density (i.e. by lowering the relative peripheral overhead). Let us first stress again that in our implementation, crossbar lines are never floated, so that sneak path currents do not directly affect the currents measured at the outputs. Scaling up crossbar dimensions, however, increases the currents flowing in the crossbar lines. Because of the resulting voltage drops along the crossbar lines, the voltages applied to the crosspoint memristors could differ from the ones applied at the periphery.

For example, Supplementary Figure 8 shows the worst-case voltage drop as a function of the length of a finite resistor ladder, which is useful for analyzing crossbar circuit operation. In this figure, one set of lines shows the voltage drop assuming an electrode resistance per wire segment  $(R_w)$  comparable to the one in our experiment, while the other is for more aggressive (though quite realistic) parameters, which are representative of high-aspect-ratio copper wires. For simplicity, the memristor conductances G(V) can be estimated using the corresponding average value measured at bias V, specific to the type of operation considered. It should be noted that in a properly trained network, the weights are typically normally distributed, so that the representative average value is rather close to the minimum of the used conductance range.
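The following is a minimal sketch of such a resistor-ladder estimate, under assumptions that are not spelled out in the text: the line is driven from one end, every wire segment has the same resistance $R_w$, every memristor is replaced by the same average conductance with its other terminal at ground, and the worst case is taken at the node farthest from the driver. The function names and the numerical values in the example are hypothetical and are not the parameters behind Supplementary Figure 8.

```python
def ladder_voltages(n, r_wire, g_device, v_in=1.0):
    """Node voltages of an n-stage resistor ladder driven from one end.

    Node i (i = 1..n) is connected to node i-1 through a wire segment of
    resistance r_wire (node 0 is the driver at v_in) and to ground through
    a device of conductance g_device. The tridiagonal nodal equations are
    solved with the Thomas algorithm.
    """
    g_w = 1.0 / r_wire
    diag = [2.0 * g_w + g_device] * n
    diag[-1] = g_w + g_device          # the last node has one wire segment only
    rhs = [0.0] * n
    rhs[0] = g_w * v_in                # the driver enters through the first segment
    for i in range(1, n):              # forward elimination
        m = -g_w / diag[i - 1]
        diag[i] += m * g_w
        rhs[i] -= m * rhs[i - 1]
    v = [0.0] * n
    v[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        v[i] = (rhs[i] + g_w * v[i + 1]) / diag[i]
    return v

def worst_case_drop_pct(n, r_wire, g_device, v_in=1.0):
    """Relative voltage drop (in percent) at the node farthest from the driver."""
    v_far = ladder_voltages(n, r_wire, g_device, v_in)[-1]
    return 100.0 * (v_in - v_far) / v_in

# Hypothetical parameters, for illustration only.
drop_70 = worst_case_drop_pct(n=70, r_wire=1.0, g_device=30e-6)
drop_400 = worst_case_drop_pct(n=400, r_wire=1.0, g_device=30e-6)
```

In this simplified model, the drop grows roughly quadratically with the ladder length while $N^2 R_w G$ stays small, which is why the electrode resistance sets the practical array size.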

Let us now consider in detail three operations which might be impacted by voltage drop on the crossbar lines, namely classifier inference, and read and write phases of the tuning algorithm:

#### Write operation

Naturally, the voltage drops are the most significant for the write operation because of the larger voltages applied and the higher currents passed. For the conductance tuning, however, we do not rely on precise conductance updates with write pulses but rather adjust the applied write voltages gradually based on precise read measurements. Therefore, any potential voltage drop will be compensated dynamically during tuning by applying larger voltage pulses, with the largest applied voltage (and hence the crossbar dimensions) limited by the condition of not disturbing half-selected devices.

Specifically, let us assume the V/3 biasing scheme, i.e. with  $\pm V_W/2$  applied to the selected lines and  $\pm V_W/6$  to the remaining lines. From Fig. 1c and 2, up to  $(V_{\rm TH}^{\rm SET})_{\rm max} \approx +1.3 \text{ V}$  set and  $(V_{\rm TH}^{\rm RESET})_{\rm max} \approx -1.9 \text{ V}$  reset voltages must be applied to switch the devices with the largest switching thresholds. (Here, we neglect the tails of the distributions in Fig. 2, which are typically contributed by the devices at the edges of the array. This is similar to the dummy line technique commonly used in conventional memories.) The corresponding average memristor conductances at one third of such biases can be roughly estimated to be  $\langle G((V_{\rm TH}^{\rm SET})_{\rm max}/3) \rangle \approx 30 \,\mu\text{S}$  for set and  $\langle G((V_{\rm TH}^{\rm RESET})_{\rm max}/3) \rangle \approx 50 \,\mu\text{S}$  for reset transitions. On the other hand, the largest voltages that can be safely applied to the half-selected devices without disturbing the memristors with the smallest switching thresholds are  $(V_{\rm TH}^{\rm SET})_{\rm min} \approx +0.7 \text{ V}$  for set and  $(V_{\rm TH}^{\rm RESET})_{\rm min} \approx -1 \text{ V}$  for reset transitions. The maximum crossbar dimensions, specific to the wire resistance, the memristor *I-V* and its variations (i.e. parameters  $R_{\rm w}$ ,  $G((V_{\rm TH})_{\rm max}/3)$ ,  $(V_{\rm TH})_{\rm max/min}$ ), can be crudely estimated assuming  $100 \times \left[ 3 (V_{\rm TH})_{\rm min} - (V_{\rm TH})_{\rm max} \right] / (V_{\rm TH})_{\rm max} / 2$  as the largest allowable relative voltage drop (in percent) in Supplementary Figure 8b. (The additional factor of 2 in the denominator accounts for the drop on both selected lines.) For the considered parameters, this drop is equal to 30% and 25% for set and reset switching, respectively, indicating the possibility of implementing 70×70 crossbar arrays with the demonstrated device technology and up to 400×400 crossbar arrays with improved electrode resistance.
(Note that in our work, we have used the somewhat simpler V/2 biasing scheme, for which the largest allowable voltage drop is ~ 7% and the corresponding maximum crossbar dimensions are around 40×40 and 200×200 for the two considered electrode resistances.)
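As a worked illustration of the V/3 criterion above, the expression for the largest allowable relative voltage drop can be evaluated directly. The function name is hypothetical, and the inputs are the rounded thresholds quoted in the text, so the results are close to, but need not coincide exactly with, the rounded percentages quoted above.

```python
def allowable_drop_pct(v_th_min, v_th_max):
    """Largest allowable relative voltage drop (percent) for V/3 biasing.

    Implements 100 * (3*V_TH_min - V_TH_max) / V_TH_max / 2, where the
    final division by 2 accounts for the drop on both selected lines.
    The sign of the thresholds cancels in the ratio, so reset (negative)
    thresholds can be passed as they are.
    """
    return 100.0 * (3.0 * v_th_min - v_th_max) / v_th_max / 2.0

# Rounded thresholds quoted in the text (volts).
set_drop = allowable_drop_pct(0.7, 1.3)       # about 30.8
reset_drop = allowable_drop_pct(-1.0, -1.9)   # about 28.9
```

The allowable drop shrinks quickly as the threshold spread grows: it reaches zero when the largest threshold equals three times the smallest one.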

#### Read operation

Let us assume that during the read operation, one of the selected lines is biased at  $+V_R$ , while the other selected line and all of the remaining ones are grounded. (This is exactly the scheme that we used for conductance tuning in this work.) In this case, the current running through the grounded selected crossbar line is small (contributed only by the one selected memristor) and does not depend on the crossbar dimensions. Therefore, substantial voltage drops may occur only on the biased selected crossbar line. Such a voltage drop would naturally be much smaller than that of the write operation and, moreover, it can be easily taken into account when reading the state of the devices. For example, it is straightforward to compute the actual voltage applied across a specific memristor knowing the conductive states of all other half-selected devices on the biased selected crossbar electrode.
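One possible way to carry out such a correction is sketched below. It assumes that the biased line is driven from one end, that all wire segments have the same resistance, that the voltage drop on the grounded lines is negligible, and that the conductances of the other devices on the line are already known. The function names, the fixed-point iteration, and the numbers in the example are illustrative choices and not the procedure used in this work.

```python
def line_voltages(conductances, r_wire, v_read):
    """Node voltages along the biased line (far side of every device grounded)."""
    n = len(conductances)
    g_w = 1.0 / r_wire
    diag = [2.0 * g_w + g for g in conductances]
    diag[-1] -= g_w                     # the last node has one wire segment only
    rhs = [0.0] * n
    rhs[0] = g_w * v_read
    for i in range(1, n):               # Thomas algorithm: forward elimination
        m = -g_w / diag[i - 1]
        diag[i] += m * g_w
        rhs[i] -= m * rhs[i - 1]
    v = [0.0] * n
    v[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        v[i] = (rhs[i] + g_w * v[i + 1]) / diag[i]
    return v

def corrected_conductance(i_meas, index, known, r_wire, v_read, iterations=20):
    """Estimate the conductance of device `index` from its measured current.

    `known` holds the conductances of all devices on the line; the entry at
    `index` is ignored and refined by a simple fixed-point iteration.
    """
    g = list(known)
    g[index] = i_meas / v_read          # naive first guess ignoring the drop
    for _ in range(iterations):
        v = line_voltages(g, r_wire, v_read)
        g[index] = i_meas / v[index]
    return g[index]

# Hypothetical line of 10 devices (siemens), 10-ohm segments, 0.2 V read.
true_g = [20e-6, 50e-6, 80e-6, 30e-6, 60e-6, 90e-6, 15e-6, 45e-6, 70e-6, 100e-6]
j = 9
i_meas = true_g[j] * line_voltages(true_g, 10.0, 0.2)[j]
known = list(true_g)
known[j] = 0.0
estimate = corrected_conductance(i_meas, j, known, 10.0, 0.2)
naive = i_meas / 0.2
```

The naive estimate `i_meas / v_read` always underestimates the conductance, because the device sees less than the full read voltage; the iteration removes this bias when the other conductances are known.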

#### Inference operation

As discussed in the main text, during inference, one set of lines (vertical in Figure 3a) receives voltages  $V \le V_R$ , while all orthogonal lines are virtually grounded. Because of the smaller applied voltages, the crossbar line currents, and hence the corresponding voltage drops, are the smallest for inference operations. However, the inference operation (just like read) is more sensitive to voltage variations than the write operation, and even small voltage drops may lower the effective precision of the vector-by-matrix computation. For example, assuming a representative 10  $\mu$ S average device conductance, and the  $70\times70$  and  $400\times400$  crossbar arrays discussed for the write operation above, the worst-case voltage drop on one line is around 7% (Supplementary Figure 8b).

In our examples, inference operations would likely be the limiting factor for scaling, though there is room for improvement in several respects. For example, the conductance of each memristor can be individually increased to compensate for the potential voltage drops during inference. (Unlike for the read operation, such an adjustment cannot be exact because of the input-dependent voltage drop on the virtually grounded lines.) The loss of precision for the worst-case largest currents might also be acceptable, e.g. if it leads to the saturation of the neuron. It is also important to note that precision loss at inference due to voltage drops is a common problem for devices with or without selectors. In fact, the problem is likely more severe for 1T1R structures, because of their larger device area and potentially larger  $R_w$ .

The crude estimate above shows that the developed device technology, with some further optimization of the electrodes, should be suitable for implementing much larger, up to  $400\times400$ , crossbar circuits. The discussed analysis is also applicable to 10 nm memristors, if we assume that both the resistance of the crossbar line segment and the memristor operating (average) currents would scale down at the same rate. (For that, memristor currents should decrease at a slightly faster rate than the linear device dimensions to compensate for the additional increase in metal resistivity due to scattering effects.) That is certainly a plausible scenario for smaller currents at voltages below  $V_R$  (e.g., relevant to the inference operation and the read phase of the tuning algorithm), considering that the off-state conductance is typically limited by device leakage, which is proportional to the device electrode area. Ensuring the same scaling in the context of the write phase of the tuning algorithm would require enhancing the *I-V* nonlinearity and/or decreasing write currents, which we believe is also plausible given the observed write current dependence on the electrode area in our devices and further optimization of the tunneling barrier layer.

#### **Supplementary Note 2: Device Programmability and Uniformity**

We have performed a number of additional experiments to characterize device-to-device variations in tunability. In the first experiment, we repeatedly tuned 5 crossbar-integrated devices to the same set of conductance levels. Specifically, in one cycle each device was sequentially tuned to 13 exponentially spaced values within the 10.5-100  $\mu$ S range of conductances (measured at 0.2 V), which is a typical operating range utilized during inference computation. The first target conductance value was 10.5  $\mu$ S. It was then increased to 100  $\mu$ S in 6 steps before being decreased back to 10.5  $\mu$ S, also in 6 steps (Supplementary Figure 10). This tuning cycle was repeated about 550 times in the same order for every device. The write pulse polarity and magnitudes were selected according to the tuning algorithm described in Ref. 2. We used 0.2 V 100  $\mu$ s read pulses with 25  $\mu$ s rise and fall times. Each measured current value during the read operation was an average of 10,000 samples (taken every 5 ns) within a 50  $\mu$ s window of the read pulse.
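The target sequence of one tuning cycle can be reconstructed from this description. The sketch below assumes that "exponentially spaced" means a geometric progression with a constant ratio between neighboring targets; the function name is hypothetical, and the exact values used in the experiment are those plotted in Supplementary Figure 10.

```python
def tuning_targets(g_min=10.5, g_max=100.0, steps=6):
    """Target conductances (in microsiemens) for one tuning cycle.

    The sequence starts at g_min, rises to g_max in `steps` geometric
    steps, and returns to g_min in the same number of steps.
    """
    ratio = (g_max / g_min) ** (1.0 / steps)
    rising = [g_min * ratio ** k for k in range(steps + 1)]
    return rising + rising[-2::-1]      # append the falling branch without repeating the peak

targets = tuning_targets()
```

With the defaults this gives 13 targets per cycle, with each target differing from its neighbor by a factor of about 1.46.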

The sequences of tuned conductances are presented in Supplementary Figure 11a, while the corresponding histograms for the aggregate tuning error for all devices are shown in Supplementary Figure 11b. To speed up measurements, the tuning precision was always set to 7.5%, while the maximum number of write/read pulses was set to 300. As Supplementary Figure 11 shows, in some cases the tuning accuracy was worse than the desired one because the maximum number of tuning iterations was reached. Tuning accuracy was also somewhat worse for lower values of the desired conductances, likely due to larger temporal fluctuations of the read currents. The data do not show noticeable degradation in tuning accuracy over time. Note that Supplementary Figure 11a shows the final values of the measured tuned conductances. Tuning to each state involved 45 write/read pulses on average, so, altogether, each device was stressed with write pulses almost 300,000 times in this experiment.

In the next experiment (Supplementary Figure 12), we tuned 5 devices with a much higher, 2.5% tuning accuracy to 32 conductance levels, which were exponentially spaced within a similar 7.5-75 k $\Omega$  range (at 0.2 V). Each device was tuned 20 times to each level. The data show that most of the devices, most of the time, can be set close to the desired states with significant margins between adjacent levels. Some of the devices at some states, however, cannot be tuned accurately. We expect that tuning accuracy would significantly improve with better control over the shape and duration of the write pulses, which would be possible in tightly integrated CMOS/memristor circuits. Also, the infrequent nonideal behavior can be handled with various circuit and algorithmic techniques, e.g. by dynamically adjusting the conductances in differential pairs.

#### **Supplementary References**

- 1. Prezioso, M. *et al.* Modeling and implementation of firing-rate neuromorphic-network classifiers with bilayer Pt/Al<sub>2</sub>O<sub>3</sub>/TiO<sub>2-x</sub>/Pt memristors. In *Proc. IEEE International Electron Devices Meeting* 455-458 (2015).
- 2. Alibart, F. *et al.* High-precision tuning of state for memristive devices by adaptable variation-tolerant algorithm. *Nanotechnology* **23**, 075201 (2012).