

## Numpy Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

NumPy’s array class is called **ndarray**. It is also known by the alias **array**. There are several ways to create arrays.

**ndarray**: a multidimensional, homogeneous array of fixed-size items. In NumPy, dimensions are called axes, and the number of axes is called the rank.


The most important attributes of an ndarray object are:

* **ndarray.ndim** - the number of axes (dimensions) of the array.
* **ndarray.shape** - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m).
* **ndarray.size** - the total number of elements of the array.
* **ndarray.dtype** - an object describing the type of the elements in the array. numpy.int32, numpy.int16, and numpy.float64 are some examples.
* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8).
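
As a quick illustration of these attributes, the sketch below builds a small 2x3 array and inspects each one. The name `example` is only illustrative, and the dtype is set explicitly so that the reported item size does not depend on the platform.

```python
import numpy as np

# A small 2x3 array; dtype is set explicitly so itemsize is predictable
example = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]], dtype=np.float64)

print(example.ndim)      # 2 axes
print(example.shape)     # (2, 3): 2 rows, 3 columns
print(example.size)      # 6 elements in total
print(example.dtype)     # float64
print(example.itemsize)  # 8 bytes per element (64 bits / 8)
```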

To use numpy, we first need to import the numpy module as follows.

In [1]:
import numpy as np # naming import convention

## Creating numpy arrays

There are a number of ways to initialize new numpy arrays, for example from

* a Python list or tuples or
* using functions that are dedicated to generating numpy arrays, such as arange, linspace, etc.

### From lists
For example, to create new vector and matrix arrays from Python lists we can use the numpy.array function.

In [2]:
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [3]:
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

The v and M objects are both of the type ndarray that the numpy module provides. The only difference between the v and M arrays is their shape.

In [4]:
print('Shape of v: ', np.shape(v))
print('Shape of M: ', np.shape(M))

Shape of v:  (4,)
Shape of M:  (2, 2)


Alternatively, we can get the shape of an array from the ndarray.shape attribute:

In [5]:
M.shape

(2, 2)

Similarly, the ndarray.size attribute gives the size of each of the two ndarrays, namely the total number of elements in the array.

In [6]:
M.size

4

In [7]:
v.size

4

Similarly, we can use a Python list to create a numpy matrix:

In [8]:
c = np.matrix([[0,2,4],
               [1,5,-2],
               [1,0,1]])
c

matrix([[ 0,  2,  4],
        [ 1,  5, -2],
        [ 1,  0,  1]])

## Note: 
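Numpy matrices are strictly 2-dimensional, while numpy arrays (ndarrays) are N-dimensional.
The main advantage of numpy matrices is that they provide a convenient notation for matrix multiplication: if a and b are matrices, then a*b is their matrix product. For ndarrays, a*b is an element-wise product and the matrix product is written a @ b. The NumPy documentation no longer recommends np.matrix for new code, so ndarrays are usually the better choice.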
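
For comparison, here is a minimal sketch of the same distinction using plain ndarrays. With ndarrays, `*` multiplies element by element, and the matrix product is written with the `@` operator (or `np.dot`). The array names are illustrative only.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B       # element-by-element product
matrix_product = A @ B    # matrix product, same as np.dot(A, B)

# With np.matrix, * means the matrix product instead
matrix_star = np.matrix(A) * np.matrix(B)

print(elementwise)        # [[0 2], [3 0]]
print(matrix_product)     # [[2 1], [4 3]]
print(matrix_star)        # same values as matrix_product
```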

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually using explicit Python lists.
Instead we can use one of the many functions in numpy that generates arrays of different forms.

Some of the more common are:

* np.arange;
* np.linspace;
* np.logspace;
* np.random.rand;
* np.diag;
* np.zeros;
* np.ones;
* np.empty;


#### np.arange

In [9]:
#Evenly spaced array (arange)
a = np.arange(10) # 0 .. n-1  (!)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
b = np.arange(0, 2, 0.5) # start, end (exclusive), step
b

array([ 0. ,  0.5,  1. ,  1.5])

#### np.linspace and np.logspace

In [11]:
# using linspace, both end points **ARE included**
c = np.linspace(0, 1, 10)
c

array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])

In [12]:
# for equally spaced values on a logarithmic scale, use logspace
d = np.logspace(0,1,5) 
d

array([  1.        ,   1.77827941,   3.16227766,   5.62341325,  10.        ])

In [13]:
np.logspace(0, 4, 5, base=2) # specify the logarithmic base (the default is base 10)

array([  1.,   2.,   4.,   8.,  16.])
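
One way to read `logspace` is as the base raised to the values that `linspace` would produce. The short check below illustrates that relationship, and also shows `linspace` with `endpoint=False`, which drops the final value. The variable names are illustrative.

```python
import numpy as np

exponents = np.linspace(0, 4, 5)         # 0, 1, 2, 3, 4
powers = np.logspace(0, 4, 5, base=2)    # 2**0 ... 2**4

# logspace(start, stop, num, base=b) matches b ** linspace(start, stop, num)
print(np.allclose(powers, 2 ** exponents))   # True

# endpoint=False excludes the stop value, as arange does
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)                             # 0, 0.2, 0.4, 0.6, 0.8
```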

### Commonly used arrays

In [14]:
# Create a 4x4 array of zeros (floating point by default)
a = np.zeros((4, 4))
print(a)

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]


In [15]:
# Create a 4x4 array of ones (floating point by default)
b = np.ones((4, 4))
print(b)

[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]


In [16]:
# Create a diagonal matrix
c = np.diag(np.array([1, 2, 3, 4]))
print(c)

[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]


In [17]:
# Create Identity matrix of size 4
d = np.eye(4)
print(d)

[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]


### np.random.rand & np.random.randn

In [18]:
# uniform random numbers in [0, 1)
np.random.rand(4)

array([ 0.30430794,  0.04611675,  0.30377896,  0.04146715])

In [19]:
# standard normal distributed random numbers of shape (2,2)
np.random.randn(2,2)

array([[ 0.26977971, -1.19143646],
       [ 0.36018203,  0.63815272]])

### np.diag

In [20]:
# a diagonal matrix
np.diag([1,2,3,4])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

### np.eye

In [21]:
# a diagonal matrix with ones on the main diagonal
np.eye(3)  # 3 is the number of rows (and columns)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

### np.zeros and np.ones

In [22]:
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [23]:
np.ones((3, 2))

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

# Numpy vs Python List

So far the numpy.ndarray looks a lot like a Python list (or nested list).
Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general.
* They can contain any kind of object.
* They are dynamically typed.
* They do not support mathematical functions such as matrix and dot multiplications, etc.
* Implementing such functions for Python lists would not be very efficient because of the dynamic typing.

Numpy arrays, on the other hand, are statically typed and homogeneous. The type of the elements is determined when the array is created. Numpy arrays are also memory efficient.

Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of numpy arrays can be written in a compiled language (C and Fortran are used).
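
A small sketch of these differences in practice, using illustrative names. It shows only behavior, not timing: the same `*` operator repeats a list but multiplies an array, and an array converts mixed input to a single common type.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# For a list, * repeats the sequence; for an array it multiplies each element
print(values * 2)   # [1, 2, 3, 1, 2, 3]
print(arr * 2)      # [2 4 6]

# A list can mix types; an array converts everything to one common dtype
mixed_list = [1, 2.5, 3]
mixed_arr = np.array(mixed_list)
print(mixed_arr.dtype)   # float64

# Element-wise addition needs a loop or comprehension with lists
list_sum = [x + y for x, y in zip(values, values)]
arr_sum = arr + arr
print(list_sum, arr_sum)
```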

## Exercises

**Simple arrays**

* Create simple one and two dimensional arrays. First, redo the examples from above. And then create your own.
* Use the functions len, shape and ndim on some of those arrays and observe their output.

**Creating arrays using functions**

* Experiment with arange, linspace, ones, zeros, eye and diag.
* Create different kinds of arrays with random numbers.
* Try setting the seed before creating an array with random values
 hint: use np.random.seed
* Look at the function np.empty. What does it do? When might this be useful?


## Basic Data Type
You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2). This is due to a difference in the data type used. Functions such as np.zeros and np.ones produce floating point arrays by default, while np.array infers the data type from its input:


In [24]:
a = np.array([1, 2, 3])
a.dtype

dtype('int64')

In [25]:
b = np.array([1., 2., 3.])
b.dtype

dtype('float64')

### Note
Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.

You can explicitly specify which data-type you want:

In [26]:
c = np.array([1, 2, 3], dtype=float)
c.dtype

dtype('float64')
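
To make the point about compact storage concrete, the sketch below converts an existing array to other data types with `astype` and compares memory use. The array `readings` and its values are made up for illustration.

```python
import numpy as np

readings = np.array([1.7, 2.2, 3.9])   # float64 by default

# astype returns a new array; converting float to int truncates toward zero
as_int = readings.astype(np.int32)
print(as_int)            # [1 2 3]
print(as_int.itemsize)   # 4 bytes per element

# A smaller float type halves the memory per element
compact = readings.astype(np.float32)
print(readings.nbytes, compact.nbytes)   # 24 12
```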

# Indexing and slicing 


We can index elements in an array using square brackets and indices. The items of an array can be accessed and assigned to in the same way as other Python sequences (e.g. lists):

In [27]:
# Create an array of 10 temperature sensor readings
tempData = np.array([28, 32, 26.5, 31.6, 27.8, 29, 30, 27, 28, 28])

In [28]:
# print the first sensor reading
print(tempData[0])

28.0


In [29]:
# print the sensor readings from index 3 up to, but not including, index 7
print(tempData[3:7])

[ 31.6  27.8  29.   30. ]


### Note that the last index is not included!

In [30]:
# print the last three readings

print(tempData[7:])

[ 27.  28.  28.]


In [31]:
# The first two sensor readings (index 2 is not included)
print(tempData[:2])

[ 28.  32.]
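
The slicing rule can be summarised with a small stand-in array (illustrative, not the sensor data): `a[start:stop]` includes `start`, excludes `stop`, and therefore returns `stop - start` items.

```python
import numpy as np

samples = np.arange(10)      # 0, 1, ..., 9

# a[start:stop] includes start and excludes stop
window = samples[3:7]
print(window)                # [3 4 5 6]
print(len(window))           # 4, i.e. 7 - 3

# A third value sets the step; negative indices count from the end
print(samples[::2])          # [0 2 4 6 8]
print(samples[-3:])          # [7 8 9]
```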


## Multidimensional array
A multidimensional array behaves like a dataframe or matrix (i.e. it has rows and columns).

In [32]:
# Create an array of temp sensor data for three days
tempData2 = np.random.randint(27, 32, size=10)
tempData3 = np.random.randint(26, 33, size=10)  # randint expects integer bounds; the upper bound is exclusive
data = np.array([tempData, tempData2, tempData3])

In [33]:
data

array([[ 28. ,  32. ,  26.5,  31.6,  27.8,  29. ,  30. ,  27. ,  28. ,  28. ],
       [ 28. ,  27. ,  30. ,  30. ,  27. ,  30. ,  29. ,  30. ,  31. ,  28. ],
       [ 30. ,  26. ,  29. ,  29. ,  27. ,  32. ,  32. ,  32. ,  27. ,  29. ]])

In [34]:
# View the first column of the matrix
data[:,0]

array([ 28.,  28.,  30.])

In [35]:
# View the second row of the matrix
data[1,]

array([ 28.,  27.,  30.,  30.,  27.,  30.,  29.,  30.,  31.,  28.])

In [36]:
# View the first two rows of the matrix
data[:2,]

array([[ 28. ,  32. ,  26.5,  31.6,  27.8,  29. ,  30. ,  27. ,  28. ,  28. ],
       [ 28. ,  27. ,  30. ,  30. ,  27. ,  30. ,  29. ,  30. ,  31. ,  28. ]])

In [37]:
# View the first element of the data
data[0,0]

28.0

We can assign new values to elements in an array using indexing:

In [44]:
data[1:4, 1:4]

array([[ 28.,  28.,  30.],
       [ 30.,  31.,  29.]])
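
A minimal sketch of assignment through indexing, using a small stand-in array named `grid` (hypothetical values, not the sensor data above). The same index expressions used for reading can appear on the left-hand side of an assignment.

```python
import numpy as np

grid = np.zeros((3, 4))

grid[0, 0] = 5.0             # a single element
grid[1, :] = [1, 2, 3, 4]    # a whole row
grid[:, 3] = -1.0            # a whole column (the scalar is repeated)
grid[1:3, 1:3] = 9.0         # a rectangular block

print(grid)
# rows are now: [5, 0, 0, -1], [1, 9, 9, -1], [0, 9, 9, -1]
```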

## Basic operations and Universal Numpy Functions

All numpy arithmetic operates element-wise and is much faster than the equivalent pure Python code.

### Scalar-array operations


In [39]:
a = np.array([1, 2, 3, 4])

In [40]:
a + 2.5

array([ 3.5,  4.5,  5.5,  6.5])

In [41]:
b = np.ones(4) + 1

In [42]:
b

array([ 2.,  2.,  2.,  2.])

In [43]:
a - b

array([-1.,  0.,  1.,  2.])
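
The same element-wise rule applies to other operators between two arrays of the same shape. The sketch below uses two illustrative arrays, `left` and `right`, and includes the pure Python loop that one such operation replaces.

```python
import numpy as np

left = np.array([1, 2, 3, 4])
right = np.array([10, 20, 30, 40])

product = left * right     # element-wise product: 10, 40, 90, 160
ratio = right / left       # element-wise division: 10.0 for every element
squares = left ** 2        # each element squared: 1, 4, 9, 16
mask = left > 2            # comparisons are element-wise too

# The pure Python equivalent of left * right needs an explicit loop
looped = [x * y for x, y in zip(left.tolist(), right.tolist())]

print(product, ratio, squares, mask, looped)
```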

Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum. For more details on numpy's mathematical functions, see [the reference documentation](https://docs.scipy.org/doc/numpy/reference/routines.math.html).

In [44]:
# Computing sums
x = np.array([[1,2],[3,4]])
x

array([[1, 2],
       [3, 4]])

In [45]:
# Compute sum of all elements; prints "10"
np.sum(x)  

10

In [46]:
# Compute sum of each column; prints "[4 6]"
np.sum(x, axis=0)  

array([4, 6])

In [47]:
# Compute sum of each row; prints "[3 7]"
np.sum(x, axis=1) 

array([3, 7])
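
One way to remember the axis argument: the axis you name is the one that gets collapsed. The sketch below uses a non-square array (an illustrative 3x4 `table`) so that the two results have different shapes.

```python
import numpy as np

table = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

col_sums = table.sum(axis=0)   # collapses the rows: one value per column
row_sums = table.sum(axis=1)   # collapses the columns: one value per row

print(col_sums.shape)   # (4,)
print(row_sums.shape)   # (3,)
print(col_sums)         # 12, 15, 18, 21
print(row_sums)         # 6, 22, 38
```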

In [48]:
# Computing mean
x.mean()

2.5

In [49]:
# square root
np.sqrt(x)

array([[ 1.        ,  1.41421356],
       [ 1.73205081,  2.        ]])

In [54]:
x = np.array([[1, 0, 0, -1],
              [0, 1, 1, -1],
              [-1, 0, 1, 0]])
x.shape

(3, 4)

In [55]:
y = x.T  # .T returns the transpose, so the shape becomes (4, 3)
y.shape

(4, 3)
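
The `.T` attribute used above returns the transpose, which swaps rows and columns. A small sketch with an illustrative 2x3 array shows how individual elements move:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
t = m.T

# Element (i, j) of the original is element (j, i) of the transpose
print(m[0, 2], t[2, 0])   # 3 3
print(t.shape)            # (3, 2)
```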

### Numpy Exercise

1. Generate a matrix with 10 rows and 50 columns, with elements drawn from the normal distribution $\mathcal{N}(1, 10)$. Set the random seed to make the result reproducible.
2. Normalize the matrix: subtract from each column its mean and divide by its standard deviation. Hint: use np.mean and np.std with the axis parameter.