# Creating NumPy Arrays

### *Copyright 2021-2022 Dr. George Papagiannakis,  papagian@csd.uoc.gr*
*All Rights Reserved*
### *University of Crete & Foundation for Research & Technology - Hellas (FORTH)*

This notebook is also based on parts of [Lectures on scientific computing with Python](http://github.com/jrjohansson/scientific-python-lectures) by [J.R. Johansson](http://jrjohansson.github.io). 

---

## Introduction

The `numpy` package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use `numpy` you need to import the module, using for example:

In [1]:
from numpy import *

In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. 
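
As a small illustration of what "vectorized" means here, the sketch below applies one arithmetic expression to a whole array and compares it with an explicit Python loop. It uses the `import numpy as np` convention instead of the star import above; that choice, and the sample data, are assumptions of this example rather than the notebook's own style. One reason to prefer the namespace form is that a star import can shadow Python built-ins such as `sum`, `any` and `all`.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = [0.5, 1.0, 1.5, 2.0]

# Explicit Python loop: one multiplication and one addition per iteration.
looped = [2.0 * x + 1.0 for x in values]

# Vectorized form: the same expression applied to the whole array at once.
arr = np.array(values)
vectorized = 2.0 * arr + 1.0

print(looped)
print(vectorized)
```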



## Creating `numpy` arrays

There are **a number of ways to initialize new numpy arrays**, for example:

* from Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* by reading data from files

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [2]:
# a vector: the argument to the array function is a Python list
v = array([1,2,3,4])
y = array([0,0,0,1])
v,y
v.shape , y.shape

((4,), (4,))

In [3]:
help(array)

Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
          like=None)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array

In [4]:
# a matrix: the argument to the array function is a nested Python list
M = array([[1, 2], [3, 4]])

M

array([[1, 2],
       [3, 4]])
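
The `help(array)` text above lists a few optional parameters. A minimal sketch of how `dtype` and `ndmin` affect the result, and of passing a tuple instead of a list, could look as follows (the variable names are hypothetical, and `np` is the conventional alias rather than the star import used in this notebook):

```python
import numpy as np

# A tuple works just like a list as the input sequence.
from_tuple = np.array((1, 2, 3, 4))

# dtype forces the element type instead of letting NumPy infer it.
as_float = np.array([1, 2, 3, 4], dtype=float)

# ndmin prepends axes until the result has at least that many dimensions.
as_row = np.array([1, 2, 3, 4], ndmin=2)

print(from_tuple.shape, as_float.dtype, as_row.shape)
```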

In [5]:
position = array(
    (
        (0, .5, 0), (.5, -.5, 0), (-.5, -.5, 0)
    ), float32)

M2 = array([
    [1,0,0,0],
    [0,1,0,0],
    [0,0,1,0],
    [0,0,0,1]
])

M3 = array([
    [1,0,0,1],
    [0,1,0,1],
    [0,0,1,1],
    [0,0,0,1]
])

M4 = array([
    [1,0,0,2],
    [0,1,0,2],
    [0,0,1,2],
    [0,0,0,1]
])

T = array([
    [1],[2],[3],[1]
])
F = M2 * T   # element-wise product: T is broadcast across the columns of M2
F2 = T * M2  # the element-wise product is commutative, so F2 equals F
F3 = M2.dot(T) # this is the linear algebra matrix product (same as M2 @ T)
#print(M2), print(T)
#print(F), print(F2)
#print(F3)
print("M2:\n", M2, " with shape:", M2.shape )
print("T:\n", T, " with shape:", T.shape )
print("F:\n", F, " with shape:", F.shape )
print("F2:\n", F2, " with shape:", F2.shape )
print("F3: M2 @ T is:\n", M2 @ T)
print("M2 @ M3 is:\n", M2 @ M3)
print("M3 @ M4 is (correct matrix mult):\n", M3 @ M4)
print("M3 dot M4 is:\n", M3.dot(M4))
print("position:\n", position, " with shape:", position.shape )


M2:
 [[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]  with shape: (4, 4)
T:
 [[1]
 [2]
 [3]
 [1]]  with shape: (4, 1)
F:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 1]]  with shape: (4, 4)
F2:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 1]]  with shape: (4, 4)
F3: M2 @ T is:
 [[1]
 [2]
 [3]
 [1]]
M2 @ M3 is:
 [[1 0 0 1]
 [0 1 0 1]
 [0 0 1 1]
 [0 0 0 1]]
M3 @ M4 is (correct matrix mult):
 [[1 0 0 3]
 [0 1 0 3]
 [0 0 1 3]
 [0 0 0 1]]
M3 dot M4 is:
 [[1 0 0 3]
 [0 1 0 3]
 [0 0 1 3]
 [0 0 0 1]]
position:
 [[ 0.   0.5  0. ]
 [ 0.5 -0.5  0. ]
 [-0.5 -0.5  0. ]]  with shape: (3, 3)
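
The cell above mixes two different products. A minimal sketch of the difference, using small hypothetical arrays `A` and `t` rather than the notebook's variables: `*` multiplies element by element (broadcasting the column across the columns of the matrix), while `@` and `.dot` compute the linear-algebra matrix product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
t = np.array([[10],
              [20]])          # column vector, shape (2, 1)

elementwise = A * t           # t is broadcast across both columns of A
matrix_product = A @ t        # each row of A dotted with the column t

print(elementwise)            # shape (2, 2)
print(matrix_product)         # shape (2, 1)
```

When a transformation matrix is applied to a vector, the matrix product is almost always the intended operation.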


The next cell tests vector-vector and matrix-vector multiplication. Note that **numpy** does not distinguish between `row` and `column` vectors for one-dimensional arrays: an array of shape `(4,)` has no orientation, whereas two-dimensional arrays of shape `(1, 4)` and `(4, 1)` do. For more on that, please refer to:
- https://stackoverflow.com/questions/17428621/python-differentiating-between-row-and-column-vectors
- https://hadrienj.github.io/posts/Deep-Learning-Book-Series-2.2-Multiplying-Matrices-and-Vectors/
- for vectors and matrices as used in computer graphics (CG), see:
    - http://morpheo.inrialpes.fr/~franco/3dgraphics/practical1.html#projection-transform
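
To make the row/column point concrete, here is a short sketch (with hypothetical names `v`, `row` and `col`) showing that a one-dimensional array has no orientation until it is reshaped into a two-dimensional one:

```python
import numpy as np

v = np.array([1, 2, 3, 1])     # 1-D array, shape (4,): no orientation
row = v.reshape(1, 4)          # explicit row vector, shape (1, 4)
col = v.reshape(4, 1)          # explicit column vector, shape (4, 1)

print(v.T.shape)               # transposing a 1-D array changes nothing
print((row @ col).shape)       # inner product returned as a 1x1 matrix
print(v @ v)                   # inner product of two 1-D arrays is a scalar
```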

In [6]:
M1 = identity(4)
M2 = array([
    [1,0,0,1],
    [0,1,0,2],
    [0,0,1,3],
    [0,0,0,1]
])
T = array([
    [1],[2],[3],[1]
])

Row = array(
    [1, 2, 3, 1]
)
#Row = Row.reshape(4,1)
#Row = Row.transpose()
F3 = M2 @ T
print("F3=M2@T: \n",F3)
print(f"F3 shape: {F3.shape}")
print("M1: \n",M1)
print("M2: \n",M2)
print("T: \n",T)
print(f"M1 @ F3: \n {M1 @ F3}")
print(f"T shape: {T.shape}")
print(f"M2 shape: {M2.shape}")
print(f"M2 @ T: \n {M2 @ T}")
print(f"Row shape: {Row.shape}")
print("Row: \n", Row)
print(f"Row @ T: \n {Row @ T}")
#print(f"T @ Row: \n {T @ Row}") # cannot perform this matrix multiplication


F3=M2@T: 
 [[2]
 [4]
 [6]
 [1]]
F3 shape: (4, 1)
M1: 
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
M2: 
 [[1 0 0 1]
 [0 1 0 2]
 [0 0 1 3]
 [0 0 0 1]]
T: 
 [[1]
 [2]
 [3]
 [1]]
M1 @ F3: 
 [[2.]
 [4.]
 [6.]
 [1.]]
T shape: (4, 1)
M2 shape: (4, 4)
M2 @ T: 
 [[2]
 [4]
 [6]
 [1]]
Row shape: (4,)
Row: 
 [1 2 3 1]
Row @ T: 
 [15]
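
The commented-out `T @ Row` in the cell above fails because of the shapes involved. The following sketch reproduces the failure and shows one possible fix, which is to give `Row` an explicit two-dimensional shape; the `failed` flag and the `outer` name are hypothetical helpers for this example only.

```python
import numpy as np

T = np.array([[1], [2], [3], [1]])   # shape (4, 1)
Row = np.array([1, 2, 3, 1])         # shape (4,)

try:
    T @ Row                          # inner dimensions are 1 and 4: mismatch
    failed = False
except ValueError as err:
    failed = True
    print("T @ Row failed:", err)

# Giving Row an explicit (1, 4) shape makes the product well defined:
# a (4, 1) column times a (1, 4) row is the 4x4 outer product.
outer = T @ Row.reshape(1, 4)
print(outer.shape)
```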


The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

In [7]:
type(v), type(M)

(numpy.ndarray, numpy.ndarray)

The difference between the `v` and `M` arrays is only their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.

In [8]:
v.shape

(4,)

In [9]:
M.shape

(2, 2)

The number of elements in the array is available through the `ndarray.size` property:

In [10]:
M.size

4

Equivalently, we could use the functions `numpy.shape` and `numpy.size`:

In [11]:
shape(M)

(2, 2)

In [12]:
size(M)

4
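
A few related attributes are often useful alongside `shape` and `size`. The sketch below (using the `np` alias, which is an assumption of this example) shows `ndim`, the behaviour of the built-in `len`, and that the function forms also accept plain nested lists:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

print(M.ndim)                 # number of axes
print(len(M))                 # length of the first axis only
print(M.size == M.shape[0] * M.shape[1])

# The function forms also work on nested lists, not only on arrays.
nested = [[1, 2, 3], [4, 5, 6]]
print(np.shape(nested), np.size(nested))
```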

So far the `numpy.ndarray` looks awfully much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type? 

There are several reasons:

* Python lists are very general. They can contain any kind of object. They are dynamically typed. They do not support mathematical functions such as matrix and dot multiplications, etc. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
* Numpy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
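
Some of the points above can be checked directly. This is only a rough sketch with hypothetical variable names: it shows the fixed per-element storage of an `int64` array and the different meaning of `*` for arrays and lists. The reported list size depends on the Python build, so no particular value should be expected.

```python
import sys
import numpy as np

n = 1000
as_list = list(range(n))
as_array = np.arange(n, dtype=np.int64)

# An int64 array stores exactly 8 bytes per element in one contiguous buffer.
print(as_array.itemsize, as_array.nbytes)

# The list itself stores only references; each int object needs extra memory.
print(sys.getsizeof(as_list))

# Arithmetic is defined element by element for arrays, unlike for lists.
doubled_array = as_array * 2          # element-wise multiplication
doubled_list = as_list * 2            # list repetition: length becomes 2 * n
print(len(doubled_array), len(doubled_list))
```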

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:

In [13]:
M.dtype

dtype('int64')

We get an error if we try to assign a value of the wrong type to an element in a numpy array:

In [14]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'
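
Not every mismatched assignment raises an error. As a hedged illustration (the `rejected` flag is a hypothetical helper), the sketch below catches the `ValueError` from the string assignment and then shows that a float assigned into an integer array is silently truncated rather than rejected:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

try:
    M[0, 0] = "hello"          # cannot be converted to an integer
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

M[0, 0] = 3.7                  # accepted, but the fractional part is dropped
print(M[0, 0], M.dtype)
```

The array keeps its integer `dtype` in both cases, which is why the stored value loses its fractional part.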

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [None]:
M = array([[1, 2], [3, 4]], dtype=complex)

M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128`. Note that `float128` is only available on some platforms.

Finally, we can also cast an array to a different type with the `astype` method, which returns a new array:

In [None]:
M = M.astype(str)

M

array([['(1+0j)', '(2+0j)'],
       ['(3+0j)', '(4+0j)']], dtype='<U64')
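
The effect of choosing a bit size, and of converting between numeric types, can be sketched as follows. The variable names are hypothetical; `np.iinfo` is NumPy's helper for querying the limits of integer types.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
print(small.itemsize)                    # bytes per element

# np.iinfo reports the representable range of an integer type.
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)

# astype returns a new array; float -> int conversion truncates toward zero.
floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)
print(ints)
print(floats.dtype)                      # the original array is unchanged
```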

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

#### arange

In [None]:
# create a range

x = arange(0, 10, 1) # arguments: start, stop, step

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
x = arange(-1, 1, 0.1)

x

array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])
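
The `-2.22044605e-16` in the output above, where an exact zero might be expected, is a floating-point rounding effect of the fractional step. The NumPy documentation suggests `linspace` for non-integer steps; a minimal sketch of that alternative, with hypothetical variable names, is:

```python
import numpy as np

# arange with a fractional step is subject to floating-point rounding.
by_arange = np.arange(-1, 1, 0.1)

# linspace takes the number of points instead of the step; endpoint=False
# mimics arange's half-open interval [start, stop).
by_linspace = np.linspace(-1, 1, 20, endpoint=False)

print(len(by_arange), len(by_linspace))
print(np.allclose(by_arange, by_linspace))
```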

#### linspace and logspace

In [None]:
# using linspace, both end points ARE included
linspace(0, 10, 25)

array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])

In [None]:
logspace(0, 10, 10, base=e)

array([1.00000000e+00, 3.03773178e+00, 9.22781435e+00, 2.80316249e+01,
       8.51525577e+01, 2.58670631e+02, 7.85771994e+02, 2.38696456e+03,
       7.25095809e+03, 2.20264658e+04])
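
The relationship between the two functions can be checked directly. In this sketch (hypothetical names, `np` alias assumed), `linspace` includes both end points, and `logspace` is reproduced by raising the base to the powers that `linspace` generates:

```python
import numpy as np

# Both end points are included by default, so 5 points give a spacing of 2.5.
points = np.linspace(0, 10, 5)
print(points)

# logspace(a, b, n, base=B) is B raised to the powers linspace(a, b, n).
direct = np.logspace(0, 10, 10, base=np.e)
by_hand = np.e ** np.linspace(0, 10, 10)
print(np.allclose(direct, by_hand))
```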

#### mgrid

In [None]:
x, y = mgrid[0:5, 0:5] # similar to meshgrid in MATLAB

In [None]:
x

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [None]:
y

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
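
The two index arrays are typically used together to evaluate a function on every grid point. The sketch below also compares `mgrid` with `numpy.meshgrid`, which gives the same arrays when called with `indexing="ij"`; the names `xm`, `ym` and `r2` are hypothetical.

```python
import numpy as np

x, y = np.mgrid[0:5, 0:5]

# meshgrid with indexing="ij" produces the same pair of index arrays.
xm, ym = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
print(np.array_equal(x, xm), np.array_equal(y, ym))

# Evaluate a function on every grid point at once.
r2 = x**2 + y**2
print(r2[3, 4])   # 3**2 + 4**2
```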

#### random data

In [None]:
from numpy import random

In [None]:
# uniform random numbers in [0, 1) (1 is excluded)
random.rand(5,5)

array([[0.35232846, 0.89510433, 0.80279604, 0.62906389, 0.8453854 ],
       [0.54926037, 0.33054527, 0.99409128, 0.19773736, 0.53417421],
       [0.25275286, 0.99090521, 0.70003463, 0.96757214, 0.234704  ],
       [0.01634668, 0.78580514, 0.9905677 , 0.40367673, 0.81877347],
       [0.96797387, 0.05857155, 0.42413395, 0.87260417, 0.19299059]])

In [None]:
# standard normal distributed random numbers
random.randn(5,5)

array([[ 1.13233337,  1.17613526, -0.81931247, -0.01941493,  0.85946902],
       [ 0.13627108, -0.00363321,  1.2590855 ,  1.25533632,  0.88234596],
       [ 1.3067313 , -0.0181152 ,  1.70995109,  0.99048507, -1.37063684],
       [ 1.22006993, -1.25380112, -1.17774766, -0.48618116, -0.00587499],
       [ 2.2051755 , -0.32652741,  0.32702792,  0.68804175,  0.21949573]])
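
The random output above changes on every run. As an alternative sketch, not the approach used in this notebook, NumPy (1.17 and later) also provides the `Generator` interface through `numpy.random.default_rng`, which makes seeding for reproducible results explicit. The seed value `42` and the variable names are arbitrary assumptions.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same numbers.
rng = np.random.default_rng(seed=42)

uniform = rng.random((5, 5))             # uniform on the half-open interval [0, 1)
normal = rng.standard_normal((5, 5))     # standard normal distribution

# A second generator with the same seed yields the same first draw.
repeat = np.random.default_rng(seed=42).random((5, 5))
print(np.array_equal(uniform, repeat))
```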

#### diag

In [None]:
# a diagonal matrix
diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [None]:
# diagonal with offset from the main diagonal
diag([1,2,3], k=1) 

array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])
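
`diag` also works in the opposite direction: given a two-dimensional array it extracts a diagonal instead of building a matrix. A short sketch with hypothetical names:

```python
import numpy as np

D = np.diag([1, 2, 3])

# Given a 2-D array, diag extracts a diagonal instead of building a matrix.
main = np.diag(D)
upper = np.diag(np.diag([1, 2, 3], k=1), k=1)   # first super-diagonal
lower = np.diag([1, 2, 3], k=-1)                # offset below the main diagonal

print(main, upper)
print(lower.shape)
```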

#### zeros and ones

In [None]:
zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
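
Both functions return floating-point arrays by default, as the trailing dots in the output show. A few closely related constructors are sketched below; the variable names are hypothetical, and `full`, `eye` and `ones_like` are standard NumPy functions that the cells above do not otherwise cover.

```python
import numpy as np

int_zeros = np.zeros((3, 3), dtype=int)   # override the default float64
sevens = np.full((2, 3), 7)               # constant fill value
eye3 = np.eye(3)                          # identity matrix, like identity(3)

template = np.array([[1, 2], [3, 4]])
like = np.ones_like(template)             # same shape and dtype as template

print(int_zeros.dtype, sevens.shape, like.dtype)
```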

## Further reading

* [General questions about NumPy](https://www.scipy.org/scipylib/faq.html#id1)
* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

## Versions

In [None]:
%reload_ext version_information

%version_information numpy

| Software | Version |
| --- | --- |
| Python | 3.7.6 64bit [Clang 4.0.1 (tags/RELEASE_401/final)] |
| IPython | 7.13.0 |
| OS | Darwin 19.4.0 x86_64 i386 64bit |
| numpy | 1.18.2 |

Tue Apr 07 18:52:39 2020 EEST
