# 1. Creating Numpy arrays

Python has many different types of data "containers": lists, dictionaries, tuples etc. However, none of them allows for efficient numerical calculation, in particular not in multi-dimensional cases (think e.g. of operations on images). Numpy has been developed exactly to fill this gap. It provides a new data structure, the **numpy array**, and a large library of operations that allow one to:
- generate such arrays
- combine arrays in different ways (concatenation, stacking etc.)
- modify such arrays (projection, extraction of sub-arrays etc.)
- apply mathematical operations on them

Numpy is the base of almost the entire Python scientific programming stack. Many libraries build on top of Numpy, either by providing specialized functions to operate on arrays (e.g. scikit-image for image processing) or by creating more complex data containers on top of them. The data science library Pandas that will also be presented in this course is a good example of this with its dataframe structures.


In [20]:
import numpy as np
from svg import numpy_to_svg

ImportError: cannot import name 'numpy_to_svg' from 'svg' (/opt/homebrew/lib/python3.11/site-packages/svg/__init__.py)

## 1.1 What is an array?

Let us create the simplest example of an array by transforming a regular Python list into an array (we will see more advanced ways of creating arrays in the next chapters):

In [3]:
mylist = [2,5,3,9,5,2]

In [4]:
mylist

[2, 5, 3, 9, 5, 2]

In [5]:
myarray = np.array(mylist)

In [6]:
myarray

array([2, 5, 3, 9, 5, 2])

In [7]:
type(myarray)

numpy.ndarray

We see that ```myarray``` is a Numpy array thanks to the ```array``` specification in the output. The type also says that we have a numpy ndarray (n-dimensional). At this point we don't see a big difference with regular lists, but we'll see in the following sections all the operations we can do with these objects.

We can already see a difference with two basic attributes of arrays: their type and shape.

### 1.1.1 Array Type

Just like when we create regular variables in Python, arrays receive a type when created. Unlike regular lists, **all** elements of an array always have the same type. The type of an array can be recovered through the ```.dtype``` attribute:

In [8]:
myarray.dtype

dtype('int64')

Depending on the content of the list, the array will have different types. The rule is one of "maximal complexity": Numpy picks the most general type needed to represent every element. For example if we mix integers and floats, we get a float array:

In [9]:
myarray2 = np.array([1.2, 6, 7.6, 5])
myarray2

array([1.2, 6. , 7.6, 5. ])

In [10]:
myarray2.dtype

dtype('float64')
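The "maximal complexity" rule can be checked on a few small inputs. The following is a minimal sketch; the variable names are illustrative, and ```np.result_type``` is used here only to ask Numpy which common type it would choose for two given types.

```python
import numpy as np

# Integers only: an integer dtype is chosen (its width depends on the platform).
ints = np.array([1, 2, 3])

# A single float is enough to turn the whole array into floats.
mixed_float = np.array([1, 2.5, 3])

# A single complex number promotes every element to complex.
mixed_complex = np.array([1, 2.5, 3 + 1j])

# np.result_type reports the common type that two dtypes are promoted to.
common = np.result_type(np.int64, np.float64)

print(ints.dtype, mixed_float.dtype, mixed_complex.dtype, common)
```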

In general, we have the possibility to assign a type to an array. This is true here, as well as later when we'll create more complex arrays, and is done via the ```dtype``` option: 

In [11]:
myarray2 = np.array([1.2, 6, 7.6, 500], dtype=np.uint8)
myarray2

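Note that 500 does not fit into an unsigned 8-bit integer (range 0 to 255). The Numpy version used here emits a deprecation warning for this call and wraps the value around to 244 (500 - 256). The warning suggests using `np.array(value).astype(dtype)` to obtain this overflow behavior explicitly.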


array([  1,   6,   7, 244], dtype=uint8)

The type of the array can also be changed after creation using the ```.astype()``` method:

In [13]:
myfloat_array = np.array([1.2, 6, 7.6, 500])
myfloat_array.dtype

dtype('float64')

In [15]:
myint_array = myfloat_array.astype(np.int8)
myint_array.dtype
myint_array

array([  1,   6,   7, -12], dtype=int8)
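Both outputs above show the value 500 wrapping around because it does not fit into an 8-bit integer. If wrap-around is not what you want, two common alternatives are to clip the values first or to choose a wider type. This is only a sketch of those two options, not the course's method; ```np.iinfo``` and ```np.clip``` are standard Numpy functions that are not used elsewhere in this chapter.

```python
import numpy as np

values = np.array([1.2, 6, 7.6, 500])

# np.iinfo reports the range an integer type can represent (0 to 255 for uint8).
limits = np.iinfo(np.uint8)

# Option 1: clip to the valid range before casting, so 500 becomes 255.
clipped = np.clip(values, limits.min, limits.max).astype(np.uint8)

# Option 2: use a type wide enough to hold every value.
wide = values.astype(np.int16)

print(clipped, wide)
```

In both cases the fractional part is dropped by the cast, exactly as in the examples above.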

### 1.1.2 Array shape

A very important property of an array is its **shape**, in other words the length of each of its axes. It can be accessed via the ```.shape``` attribute:

In [16]:
myarray

array([2, 5, 3, 9, 5, 2])

In [17]:
myarray.shape

(6,)

We see that our simple array has only one dimension of length 6. Now of course we can create more complex arrays. Let's create for example a *list of two lists*:

In [18]:
my2d_list = [[1,2,3], [4,5,6]]

my2d_array = np.array(my2d_list)
my2d_array

array([[1, 2, 3],
       [4, 5, 6]])

In [19]:
my2d_array.shape

(2, 3)

We see now that the shape of this array is *two-dimensional*. We also see that we have 2 lists of 3 elements. In fact at this point we should forget that we have a list of lists and simply consider this object as a *matrix* with *two rows and three columns*. We'll use the following graphical representation to clarify some concepts:

In [None]:
numpy_to_svg(my2d_array)
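Two attributes closely related to ```.shape``` are often useful: ```.ndim``` (the number of axes) and ```.size``` (the total number of elements). They are not used in the text above; the short sketch below only shows how they relate to the shape of the same 2D array.

```python
import numpy as np

my2d_array = np.array([[1, 2, 3], [4, 5, 6]])

# The shape is a tuple with one entry per axis: (rows, columns).
n_rows, n_cols = my2d_array.shape

# ndim is the number of axes, i.e. the length of the shape tuple.
n_axes = my2d_array.ndim

# size is the total number of elements, i.e. the product of the shape entries.
n_elements = my2d_array.size

print(n_rows, n_cols, n_axes, n_elements)
```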

## 1.2 Creating arrays

We have seen that we can turn regular lists into arrays. However this quickly becomes impractical for larger arrays. Numpy offers several functions to create particular arrays.

### 1.2.1 Common simple arrays
For example an array full of zeros or ones:

In [22]:
one_array = np.ones((2,3))
one_array

array([[1., 1., 1.],
       [1., 1., 1.]])

In [23]:
zero_array = np.zeros((2,3))
zero_array

array([[0., 0., 0.],
       [0., 0., 0.]])

One can also create an identity matrix (ones on the diagonal, zeros elsewhere):

In [24]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

By default Numpy creates float arrays:

In [25]:
one_array.dtype

dtype('float64')

However as mentioned before, one can impose a type using the ```dtype``` option:

In [26]:
one_array_int = np.ones((2,3), dtype=np.int8)
one_array_int

array([[1, 1, 1],
       [1, 1, 1]], dtype=int8)

In [27]:
one_array_int.dtype

dtype('int8')

### 1.2.2 Copying the shape
Often one needs to create an array with the same shape as an existing one. This can be done with "like-functions":

In [28]:
same_shape_array = np.zeros_like(one_array)
same_shape_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [29]:
one_array.shape

(2, 3)

In [30]:
same_shape_array.shape

(2, 3)

In [31]:
np.ones_like(one_array)

array([[1., 1., 1.],
       [1., 1., 1.]])
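The "like-functions" copy not only the shape but also the type of the template array, unless a ```dtype``` is given explicitly. A minimal sketch (the name ```template``` is just for illustration):

```python
import numpy as np

template = np.ones((2, 3), dtype=np.int8)

# Shape and dtype are both taken from the template.
zeros_same = np.zeros_like(template)

# The dtype can be overridden while the shape is still copied.
zeros_float = np.zeros_like(template, dtype=np.float64)

print(zeros_same.dtype, zeros_float.dtype, zeros_float.shape)
```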

### 1.2.3 Complex arrays

We are not limited to creating arrays containing ones or zeros. Very common operations involve e.g. the creation of arrays containing regularly spaced numbers. For example a "from-to-by-step" array:

In [32]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

Or a given number of equidistant values between two boundaries (both included):

In [33]:
np.linspace(0,1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
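The two functions differ in how they treat the upper boundary and in what is fixed by the caller: the step for ```np.arange```, the number of points for ```np.linspace```. The sketch below makes this explicit; the ```endpoint``` option is a standard ```np.linspace``` parameter that is not used in the cells above.

```python
import numpy as np

# arange: fixed step, the stop value itself is excluded.
by_step = np.arange(0, 10, 2)

# linspace: fixed number of points, both boundaries included by default.
by_count = np.linspace(0, 1, 10)

# With both boundaries included, the spacing is (stop - start) / (num - 1).
spacing = by_count[1] - by_count[0]

# endpoint=False leaves out the upper boundary, so the spacing becomes 1/10.
open_interval = np.linspace(0, 1, 10, endpoint=False)

print(by_step, spacing, open_interval[-1])
```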

Numpy offers in particular a ```random``` submodule that allows one to create arrays containing values drawn from a wide range of distributions. For example, normally distributed or Poisson distributed values:

In [34]:
normal_array = np.random.normal(loc=10, scale=2, size=(3,4))
normal_array

array([[13.23391828, 10.71446303,  6.98749332, 11.67887065],
       [ 9.51455648, 10.13049154, 10.16660984,  8.46105405],
       [ 9.13855979, 15.42681298, 12.77524334, 12.20456374]])

In [35]:
np.random.poisson(lam=5, size=(3,4))

array([[10,  7,  9,  4],
       [ 5,  8,  6,  3],
       [ 9,  4,  7,  3]])
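Random arrays differ on every run, which makes results hard to reproduce. One way to get repeatable numbers is to create a seeded generator with ```np.random.default_rng```. This is an alternative to the ```np.random.normal``` and ```np.random.poisson``` calls used above, shown here only as a sketch; the seed value 42 is arbitrary.

```python
import numpy as np

# A generator created with a fixed seed always produces the same sequence.
rng = np.random.default_rng(seed=42)
normal_array = rng.normal(loc=10, scale=2, size=(3, 4))
poisson_array = rng.poisson(lam=5, size=(3, 4))

# A second generator with the same seed reproduces the first draw exactly.
rng_again = np.random.default_rng(seed=42)
same_normal = rng_again.normal(loc=10, scale=2, size=(3, 4))

print(np.array_equal(normal_array, same_normal))
```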

### 1.2.4 Higher dimensions

Until now we have almost only dealt with 1D or 2D arrays that look like a simple grid:

In [37]:
myarray = np.ones((5,10))
myarray

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

We are not limited to creating 1- or 2-dimensional arrays: we can create arrays with any number of dimensions. For example in microscopy, images can be volumetric and are then represented as 3D arrays in Numpy. If we acquired 5 planes of a 10px by 10px image, we would have something like:

In [38]:
array3D = np.ones((10,10,5))

In [39]:
array3D

array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        ...

All the functions and properties that we have seen until now are N-dimensional, i.e. they work in the same way irrespective of the number of dimensions of the array.
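As a quick check that the attributes and functions from the previous sections carry over unchanged to three dimensions, here is a small sketch using the same shape as above (```.ndim``` and ```.size``` are standard attributes giving the number of axes and the number of elements):

```python
import numpy as np

array3D = np.ones((10, 10, 5))

# Same attributes as for 1D and 2D arrays.
shape = array3D.shape      # one entry per axis
n_axes = array3D.ndim      # number of axes
n_elements = array3D.size  # 10 * 10 * 5 elements

# Same creation and conversion functions.
zeros3D = np.zeros_like(array3D)
int3D = array3D.astype(np.uint8)

print(shape, n_axes, n_elements, zeros3D.shape, int3D.dtype)
```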

## 1.3 Importing arrays

We have seen until now multiple ways to create arrays. However, most of the time, you will *import* data from some source, either directly as arrays or as lists, and use these data in your analysis.

### 1.3.1 Loading and saving arrays

Numpy can efficiently save and load arrays in its own format ```.npy```. Let's create an array and save it:

In [40]:
array_to_save = np.random.normal(10, 2, (4,5))
array_to_save

array([[10.04055276,  9.60392876, 11.7494708 , 12.62943368,  8.67643699],
       [ 9.55748422,  8.88472059,  9.93733192,  9.49676878, 10.08593667],
       [ 5.39206709,  8.27655003,  8.66910128,  8.56429791, 13.68488498],
       [12.75572895,  9.6778243 , 10.59453347,  8.93760608, 13.10571655]])

In [41]:
np.save('my_saved_array.npy', array_to_save)

In [42]:
ls

01-DA_Numpy_arrays_creation.ipynb   10-DA_Pandas_combine.ipynb
02-DA_Numpy_array_maths.ipynb       11-DA_Pandas_splitting.ipynb
03-DA_Numpy_matplotlib.ipynb        12-DA_Pandas_realworld.ipynb
04-DA_Numpy_indexing.ipynb          98-DA_Numpy_Exercises.ipynb
05-DA_Numpy_combining_arrays.ipynb  98-DA_Numpy_Solutions.ipynb
06-DA_Pandas_introduction.ipynb     99-DA_Pandas_Exercises.ipynb
07-DA_Pandas_structures.ipynb       99-DA_Pandas_Solutions.ipynb
08-DA_Pandas_import_plotting.ipynb  Data/
09-DA_Pandas_operations.ipynb       my_saved_array.npy


Now that this array is saved on disk, we can load it again using ```np.load```:

In [43]:
new_array = np.load('my_saved_array.npy')
new_array

array([[10.04055276,  9.60392876, 11.7494708 , 12.62943368,  8.67643699],
       [ 9.55748422,  8.88472059,  9.93733192,  9.49676878, 10.08593667],
       [ 5.39206709,  8.27655003,  8.66910128,  8.56429791, 13.68488498],
       [12.75572895,  9.6778243 , 10.59453347,  8.93760608, 13.10571655]])
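To convince ourselves that nothing is lost in the round trip, we can compare the saved and re-loaded arrays. The sketch below writes to a temporary directory so that it leaves no file behind; the use of ```tempfile``` and ```np.array_equal``` is an addition for this check and not part of the cells above.

```python
import os
import tempfile

import numpy as np

original = np.random.default_rng(0).normal(10, 2, (4, 5))

# Save to a temporary location and load the file again.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_saved_array.npy")
    np.save(path, original)
    restored = np.load(path)

# Values, shape and dtype are all preserved by the .npy format.
identical = np.array_equal(original, restored)
print(identical, restored.shape, restored.dtype)
```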

If you have several arrays that belong together, you can also save them in a single file using ```np.savez``` in ```npz``` format. Let's create a second array:

In [None]:
array_to_save2 = np.random.normal(10, 2, (1,2))
array_to_save2

In [None]:
np.savez('multiple_arrays.npz', array_to_save=array_to_save, array_to_save2=array_to_save2)

In [None]:
ls

And when we load it again:

In [None]:
load_multiple = np.load('multiple_arrays.npz')
type(load_multiple)

We get here an ```NpzFile``` *object* from which we can read our data. Note that an ```npz``` file is loaded *lazily*: opening it only reads the list of stored arrays, and the data of each array are read from disk when that array is accessed. This is very useful if you need to store large amounts of data but don't always need to re-load all of them. The ```.files``` attribute lists the stored names, and ```.get()``` returns the corresponding array:

In [None]:
load_multiple.files

In [None]:
load_multiple.get('array_to_save2')
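An ```NpzFile``` can also be indexed with the name of an array, like a dictionary, and used in a ```with``` block so that the file is closed once we are done. The following is a self-contained sketch of the whole round trip; the array names ```first``` and ```second``` and the temporary directory are chosen for illustration only.

```python
import os
import tempfile

import numpy as np

first = np.ones((4, 5))
second = np.zeros((1, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "multiple_arrays.npz")

    # The keyword names become the names of the arrays inside the archive.
    np.savez(path, first=first, second=second)

    # The with-block closes the underlying file when it ends.
    with np.load(path) as archive:
        names = sorted(archive.files)
        loaded_second = archive["second"]  # this array is read from disk here

print(names, loaded_second.shape)
```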

### 1.3.2 Importing data as arrays

Images are a typical example of data that are array-like (matrix of pixels) and that can be imported directly as arrays. Of course, each domain will have its own *importing libraries*. For example in the area of imaging, the scikit-image package is one of the main libraries, and it offers an importer of images as arrays which works both with local files and web addresses:

In [45]:
import skimage.io

image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/f/fd/%27%C3%9Cbermut_Exub%C3%A9rance%27_by_Paul_Klee%2C_1939.jpg')

We can briefly explore that image:

In [46]:
type(image)

numpy.ndarray

In [47]:
image.dtype

dtype('uint8')

In [50]:
image.shape
image

array([[[124,  85,  56],
        [147, 108,  79],
        [150, 113,  84],
        ...,
        [152, 118,  91],
        [153, 119,  92],
        [157, 123,  96]],

       [[114,  75,  46],
        [147, 108,  79],
        [169, 132, 103],
        ...,
        [186, 152, 125],
        [185, 151, 124],
        [172, 138, 111]],

       [[ 98,  59,  30],
        [136,  97,  68],
        [176, 137, 108],
        ...,
        [177, 143, 116],
        [186, 152, 125],
        [185, 151, 124]],

       ...,

       [[139, 104,  82],
        [159, 124, 102],
        [171, 136, 114],
        ...,
        [177, 130, 104],
        [176, 129, 103],
        [149, 102,  76]],

       [[142, 106,  84],
        [163, 127, 105],
        [171, 135, 113],
        ...,
        [180, 134, 108],
        [179, 133, 107],
        [150, 105,  76]],

       [[155, 119,  97],
        [152, 116,  94],
        [155, 119,  97],
        ...,
        [178, 132, 106],
        [172, 127,  98],
        [139,  94,  65]]

We see that we have an array of integers with 3 dimensions. Since we imported a jpg image, we know that the third dimension corresponds to the three color channels Red, Green, Blue (RGB).
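The same layout can be reproduced without downloading anything. The sketch below builds a small random array as a hypothetical stand-in for an imported RGB image (it is not the picture loaded above) and reads off its dimensions. The indexing used to select one channel is covered in detail in a later chapter.

```python
import numpy as np

# Hypothetical stand-in for an image: 4 rows, 6 columns, 3 color channels,
# with 8-bit values between 0 and 255 like the imported jpg.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# The shape reads as (height, width, channels).
height, width, n_channels = fake_image.shape

# Selecting index 0 along the last axis gives the red channel as a 2D array.
red_channel = fake_image[:, :, 0]

print(height, width, n_channels, red_channel.shape)
```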

You can also read regular CSV files directly as Numpy arrays. This is more commonly done using Pandas, so we don't spend much time on this, but here is an example of importing data from the web:

In [None]:
oilprice = np.loadtxt('https://raw.githubusercontent.com/guiwitz/Rdatasets/master/csv/quantreg/gasprice.csv',
          delimiter=',', usecols=range(2,3), skiprows=1)

In [None]:
oilprice
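To see what the ```delimiter```, ```usecols``` and ```skiprows``` options do without relying on a network connection, the same call can be tried on a small in-memory table. The CSV content below is made up for illustration and only mimics a file with one header line and three columns.

```python
import io

import numpy as np

# Hypothetical CSV content: a header line, then index, time and value columns.
csv_text = """index,time,value
1,1,0.70
2,2,0.71
3,3,0.69
"""

# skiprows=1 skips the header, usecols=(2,) keeps only the third column.
values = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=(2,), skiprows=1)

print(values, values.shape, values.dtype)
```

Keeping a single column gives a 1D float array with one entry per data row.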