# NumPy Arrays

In [2]:
# one dimensional example
from numpy import array
# list of data
data = [11, 22, 33, 44, 55]
# array of data
data = array(data)
print(data)
print(type(data))

[11 22 33 44 55]
<class 'numpy.ndarray'>


In [3]:
# list of data
data = [[11, 22],
		[33, 44],
		[55, 66]]
# array of data
data = array(data)
print(data)
print(type(data))

[[11 22]
 [33 44]
 [55 66]]
<class 'numpy.ndarray'>


### One-dimensional indexing

In [5]:
# define array
data = array([11, 22, 33, 44, 55])
# index data
print(data[0])
print(data[4])

11
55


#### Array Bounds

In [6]:
# define array
data = array([11, 22, 33, 44, 55])
# index data
print(data[5])

IndexError: index 5 is out of bounds for axis 0 with size 5
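
If you would rather handle the out-of-bounds case than let the script stop, the error can be caught. The following is a minimal sketch; the names `index`, `value` and `message` are illustrative and not part of the original example.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
index = 5  # one past the last valid index (valid indexes are 0..4)

try:
    value = data[index]
except IndexError as err:
    # NumPy raises IndexError for an out-of-range integer index
    value = None
    message = str(err)
    print(message)
```

Checking `index < len(data)` before indexing is an alternative when an exception is not wanted.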

Unlike array indexing in many other programming languages, NumPy arrays (like Python lists) accept negative indexes, which retrieve values offset from the end of the array.

For example, the index -1 refers to the last item in the array and -2 to the second-to-last item, continuing back to -5 for the first item in the current example.

In [7]:
# define array
data = array([11, 22, 33, 44, 55])
# index data
print(data[-1])
print(data[-5])

55
11
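
One way to read a negative index is as an offset from the length of the array. The sketch below checks that relationship for every position in the example array; the variable names are illustrative.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
n = len(data)

# data[-k] refers to the same element as data[n - k]
for k in range(1, n + 1):
    assert data[-k] == data[n - k]

last = data[-1]    # same element as data[4]
first = data[-n]   # same element as data[0]
print(last, first)
```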


### Two-dimensional indexing

In [12]:
# define array
data = array([[11, 22], [33, 44], [55, 66]])
# index data
print(data[0,0])
print(data[0,1])
print(data[2,1])

11
22
66


If we are interested in all items in a row, we can leave the second dimension index empty or omit it entirely, for example:

In [16]:
data = array([[11, 22], [33, 44], [55, 66]])
# index data
print(data[0,])
print(data[1,])
print(data[2])

[11 22]
[33 44]
[55 66]
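
The empty second index is shorthand for selecting everything along that dimension. As a small sketch, the forms below all select the first row, and the same idea with the colon in the first position selects a column instead.

```python
from numpy import array

data = array([[11, 22], [33, 44], [55, 66]])

row_a = data[0,]     # trailing comma, second index left empty
row_b = data[0, :]   # explicit slice over all columns
row_c = data[0]      # second index omitted entirely
col = data[:, 0]     # all rows, first column

print(row_a, row_b, row_c)
print(col)
```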


## Array Slicing

So far, so good; creating and indexing arrays looks familiar.
Now we come to array slicing, a feature that often causes problems for beginners to Python and NumPy arrays.
Structures like lists and NumPy arrays can be sliced. This means that a subsequence of the structure can be indexed and retrieved.
This is most useful in machine learning when specifying input variables and output variables, or splitting training rows from testing rows.
Slicing is specified using the colon operator ‘:’ with a ‘from’ and ‘to’ index before and after the colon respectively. The slice extends from the ‘from’ index and ends one item before the ‘to’ index.

data[from:to]
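
Because the ‘to’ index is excluded, the length of a slice is simply ‘to’ minus ‘from’. A minimal sketch, with `start` and `stop` as illustrative values:

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
start, stop = 1, 4  # illustrative values

part = data[start:stop]
print(part)        # elements at indexes 1, 2 and 3
print(len(part))   # equals stop - start

# An omitted 'from' defaults to the start; an omitted 'to' defaults to the end
head = data[:stop]
tail = data[start:]
```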

### One-dimensional slicing

All items in an array dimension can be accessed by specifying the slice ‘:’ with no indexes.

In [18]:
data = array([11, 22, 33, 44, 55])
print(data[:])

[11 22 33 44 55]


The first item of the array can be sliced by specifying a slice that starts at index 0 and ends at index 1 (one item before the ‘to’ index).

In [20]:
# define array
data = array([11, 22, 33, 44, 55])
print(data[0:1])
print(data[2:4])

[11]
[33 44]


We can also use negative indexes in slices. For example, we can slice the **last two** items in the array by starting the slice at -2 (the second-to-last item) and not specifying a ‘to’ index; that takes the slice to the end of the dimension.

In [21]:
# define array
data = array([11, 22, 33, 44, 55])
print(data[-2:])

[44 55]


### Two-dimensional Slicing

### Split Input and Output Features

It is common to split your loaded data into input variables (X) and the output variable (y).

We can do this by slicing all rows and all columns up to, but not including, the last column, then separately indexing the last column.

For the input features, we can select all rows and all columns except the last one by specifying ‘:’ in the rows index and :-1 in the columns index. For the output column, we can select all rows again using ‘:’ and index just the last column by specifying -1 in the columns index.

In [22]:
# define array
data = array([[11, 22, 33],
		[44, 55, 66],
		[77, 88, 99]])
# separate data
X, y = data[:, :-1], data[:, -1]
print(X)
print(y)

[[11 22]
 [44 55]
 [77 88]]
[33 66 99]
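
Note that X and y come back with different numbers of dimensions: a slice keeps the dimension it is applied to, while a single integer index drops it. A short sketch of the resulting shapes:

```python
from numpy import array

data = array([[11, 22, 33],
              [44, 55, 66],
              [77, 88, 99]])

X = data[:, :-1]   # slice: the column dimension is kept
y = data[:, -1]    # integer index: the column dimension is dropped

print(X.shape)
print(y.shape)
```

This is why y often needs reshaping before it is passed to a library that expects a two-dimensional array.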


In [30]:
data.shape

(3, 3)

In [23]:
data[:-1, :-1]

array([[11, 22],
       [44, 55]])

In [24]:
data[:, :-1]

array([[11, 22],
       [44, 55],
       [77, 88]])

In [25]:
data[0:1, :]

array([[11, 22, 33]])

In [26]:
data[1:2, :]

array([[44, 55, 66]])

In [27]:
data[2:3, :]

array([[77, 88, 99]])

In [29]:
data[2:3, :-1].shape

(1, 2)

### Split Train and Test Rows

It is common to split a loaded dataset into separate train and test sets.
This is a splitting of rows where some portion will be used to train the model and the remaining portion will be used to estimate the skill of the trained model.
This would involve slicing all columns by specifying ‘:’ in the second dimension index. The training dataset would be all rows from the beginning to the split point, and the test dataset would be all rows from the split point to the end of the dimension.

In [33]:
split = 2
# split rows into train and test sets
train = data[:split, :]
test = data[split:, :]

In [34]:
train

array([[11, 22, 33],
       [44, 55, 66]])

In [35]:
test

array([[77, 88, 99]])

In [36]:
# define array
data = array([[11, 22, 33],
		[44, 55, 66],
		[77, 88, 99]])
# separate data
split = 2
train,test = data[:split,:],data[split:,:]
print(train)
print(test)

[[11 22 33]
 [44 55 66]]
[[77 88 99]]
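
In practice the split point is often derived from a proportion rather than hard-coded. The sketch below assumes a hypothetical `train_fraction` of 0.67, which is not from the original example, and makes no attempt to shuffle the rows first.

```python
from numpy import array

data = array([[11, 22, 33],
              [44, 55, 66],
              [77, 88, 99]])

train_fraction = 0.67  # hypothetical proportion of rows used for training
split = int(len(data) * train_fraction)  # truncates towards zero

train, test = data[:split, :], data[split:, :]
print(train.shape, test.shape)
```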


## Array Reshaping

After slicing your data, you may need to reshape it.

For example, some libraries, such as scikit-learn, may require that a one-dimensional array of output variables (y) be shaped as a two-dimensional array with one column and one outcome per row.

Some algorithms, like the Long Short-Term Memory recurrent neural network in Keras, require input to be specified as a three-dimensional array comprised of samples, timesteps, and features.

It is important to know how to reshape your NumPy arrays so that your data meets the expectation of specific Python libraries. We will look at these two examples.

### Data Shape

NumPy arrays have a shape attribute that returns a tuple of the length of each dimension of the array.

In [38]:
# define array
data = array([11, 22, 33, 44, 55])
print(data.shape)

(5,)


Running the example prints a tuple for the one dimension.

A tuple with two lengths is returned for a two-dimensional array.

In [40]:
# list of data
data = [[11, 22],
		[33, 44],
		[55, 66]]
# array of data
data = array(data)
print(data.shape)

(3, 2)


Running the example returns a tuple with the number of rows and columns.

You can use the sizes of the array dimensions stored in the shape attribute elsewhere in your code, such as when specifying parameters.

The elements of the tuple can be accessed by index, with index 0 for the number of rows and index 1 for the number of columns. For example:

In [41]:
# list of data
data = [[11, 22],
		[33, 44],
		[55, 66]]
# array of data
data = array(data)
print('Rows: %d' % data.shape[0])
print('Cols: %d' % data.shape[1])

Rows: 3
Cols: 2
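
Since shape is an ordinary tuple, it can also be unpacked directly. The sketch below shows this alongside two related attributes, `ndim` and `size`.

```python
from numpy import array

data = array([[11, 22],
              [33, 44],
              [55, 66]])

rows, cols = data.shape  # unpack the tuple directly
print(rows, cols)
print(data.ndim)   # number of dimensions, equal to len(data.shape)
print(data.size)   # total number of elements, rows * cols here
```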


### Reshape 1D to 2D Array

It is common to need to reshape a one-dimensional array into a two-dimensional array with one column and multiple rows.

NumPy provides the reshape() method on the NumPy array object that can be used to reshape the data.

The reshape() method can be called with a single tuple argument that specifies the new shape of the array. In the case of reshaping a one-dimensional array into a two-dimensional array with one column, the tuple would be the length of the array as the first dimension (data.shape[0]) and 1 for the second dimension.

data = data.reshape((data.shape[0], 1))

In [46]:
# reshape 1D array
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
print(data.shape)

(5,)


In [47]:
data

array([11, 22, 33, 44, 55])

In [48]:
data = data.reshape((data.shape[0], 1))
print(data.shape)

(5, 1)


Running the example prints the shape of the one-dimensional array, reshapes the array to have 5 rows with 1 column, then prints this new shape.

In [49]:
data

array([[11],
       [22],
       [33],
       [44],
       [55]])
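
As an alternative to spelling out data.shape[0], NumPy accepts -1 for one dimension and infers its size from the length of the array. A minimal sketch comparing the two forms:

```python
from numpy import array

data = array([11, 22, 33, 44, 55])

explicit = data.reshape((data.shape[0], 1))
inferred = data.reshape((-1, 1))  # -1 asks NumPy to work out this dimension

print(explicit.shape, inferred.shape)
```

Both calls produce the same column-shaped array; the -1 form is convenient when the number of rows is not known in advance.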

### Reshape 2D to 3D Array

It is common to need to reshape two-dimensional data where each row represents a sequence into a three-dimensional array for algorithms that expect multiple samples of one or more time steps and one or more features.

A good example is the LSTM recurrent neural network model in the Keras deep learning library.

The reshape function can be used directly, specifying the new dimensionality. This is clear with an example where each sequence has multiple time steps with one observation (feature) at each time step.

We can use the sizes in the shape attribute on the array to specify the number of samples (rows) and columns (time steps) and fix the number of features at 1.

In [65]:
data = [[11, 22],
		[33, 44],
		[55, 66]]
print(type(data))

<class 'list'>


Note that data is still a plain Python list at this point; it is converted to a NumPy array in the next cell.

In [66]:
# array of data
data = array(data)
print(type(data))
print(data.shape)
# reshape
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)

<class 'numpy.ndarray'>
(3, 2)
(3, 2, 1)
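
Naming the dimensions can make the intent of the reshape easier to follow. In the sketch below, `samples`, `timesteps`, `features` and `data_3d` are illustrative names, and the last lines check where an individual value ends up.

```python
from numpy import array

data = array([[11, 22],
              [33, 44],
              [55, 66]])

samples, timesteps = data.shape
features = 1  # one observation per time step, as in the text

data_3d = data.reshape((samples, timesteps, features))
print(data_3d.shape)

# data_3d[i, t, 0] holds the value that was at data[i, t]
print(data_3d[2, 1, 0])
```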


### Reshape 3D array to 1D (flatten)

The flatten() method returns a one-dimensional copy of the array. Applied to the (3, 2, 1) array from the previous example, it returns the six values in row-major order.

In [70]:
data = data.flatten()

In [71]:
data

array([11, 22, 33, 44, 55, 66])

In [69]:
data.shape

(6,)
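
Because flatten() returns a copy, changes to the flattened array do not affect the original. The sketch below rebuilds a (3, 2, 1) array by hand so that it is self-contained; `flat` is an illustrative name.

```python
from numpy import array

data = array([[[11], [22]],
              [[33], [44]],
              [[55], [66]]])   # shape (3, 2, 1)

flat = data.flatten()
print(flat.shape)

# flatten() returns a copy, so changing it leaves the original untouched
flat[0] = 0
print(data[0, 0, 0])
```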