# Introduction to NumPy fundamentals

## Python is slow

Scientific and numerical computing often requires processing massive datasets with complex algorithms. If you are a scientist or data professional, you want a programming language that can process data FAST. **The closer a programming language is to machine instructions (binary), the faster it runs**. That's why for decades, languages like C, C++, and Fortran were the go-to option for data-intensive applications in science and technology.

However, writing code in compiled languages like C++ and Fortran can be slow and, frankly, terribly annoying (but that's just me!). Scientists are not coders (although many end up mutating into scientist-coder hybrids over time, but that's a whole other subject). From a scientist's perspective, **the easier and faster it is to write code, the better**. High-level programming languages, i.e., languages that are closer to human language than to machine language, do meet such requirements. The likes of Python, PHP, JavaScript, and Ruby fit the bill: **easy to write, easy to learn, easy to run**. But instructions written in high-level programming languages are slow for computers to run, for complicated reasons I do not explore here. The fact is that they are slower to run than C++ or Fortran.

Scientists face a conundrum: they need an **easy to write** AND **fast to run** programming language. They need the best of both worlds. For a long time such a language simply did not exist. Then Python came along.

Oh wait, Python is, indeed, easy to learn and write, *but slow to run compared to compiled languages*. Like really, really slow. For instance, computing the [spectral-norm of a matrix](https://en.wikipedia.org/wiki/Matrix_norm), which is a common task in data applications, [has been benchmarked](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/spectralnorm.html) at ~1.9 seconds in C, whereas Python takes a whopping ~170.1 seconds, meaning that Python is ~90 times *slower*. Considering this, the fact that Python became the dominant language in machine learning and data science is a bit of a puzzle.

There are at least two reasons why Python succeeded anyway.

The first is that as computing capacity became cheaper, **processing time has become less important than coding time**. Basically, the time you save by writing code in high-level yet slow-to-run programming languages compensates for their lack of performance at run time. And Python is exceptionally clean, intuitive, and easy to learn compared to C or Fortran.

The second reason, which is probably the major one, is **libraries**. Particularly libraries written in low-level, high-performance languages. It turns out that Python's extensibility allows programmers to write the "engine" for numerical computation in languages like C and Fortran, and then Python can invoke such "engines" in the background, meaning that although you write code in Python, the heavy computation is executed by compiled C or Fortran code instead of Python itself. And that is how you obtain the best of both worlds: **the easy and fast development time of Python, plus the runtime performance of C or Fortran**. True, there is some small overhead from going back and forth between languages, but its impact is minimal.
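To make the "engine" idea concrete, here is a minimal sketch that adds up one million integers twice: once with a plain Python loop and once with `NumPy`'s compiled `sum`. The helper name `python_sum` and the array size are my own choices for illustration, and the timings will vary from machine to machine, so treat them as indicative only.

```python
import timeit

import numpy as np

values = list(range(1_000_000))
array = np.arange(1_000_000, dtype=np.int64)


def python_sum(data):
    """Add the numbers one by one inside the Python interpreter."""
    total = 0
    for x in data:
        total += x
    return total


# Both approaches compute the same result
loop_result = python_sum(values)
numpy_result = int(array.sum())

# Time five repetitions of each approach
loop_time = timeit.timeit(lambda: python_sum(values), number=5)
numpy_time = timeit.timeit(lambda: array.sum(), number=5)

print(f'Same result: {loop_result == numpy_result}')
print(f'Pure Python loop: {loop_time:.4f} s')
print(f'NumPy sum:        {numpy_time:.4f} s')
```

On most machines the `NumPy` version should be considerably faster, because the loop over the elements runs in compiled code rather than in the interpreter.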

It is important to mention that nowadays new programming languages have been created exactly with these two necessities (fast development + fast performance) in mind, like **[Julia](https://julialang.org/)**. In the spectral-norm test, Julia was benchmarked at ~2.79 seconds, almost as fast as C. Additionally, Julia is a dynamic language, easy to learn and write in, like Python. Why not Julia then? Probably because of the relative immaturity of its package ecosystem compared to Python. Julia first appeared in 2012, whereas Python was first released in 1991. The availability of well-tested libraries for pretty much anything you can imagine in Python is unparalleled. A second reason is probably that Python does not cost you performance anyway (with its libraries), so why not?

## What is NumPy

It is no exaggeration to say that **NumPy is at the core of the entire scientific computing Python ecosystem**, both as a standalone package for numerical computation and as the engine behind most data science packages.

NumPy is a package for high-performance computation on array-like or matrix-like data. Its "engine" is written in C, meaning that NumPy uses pre-compiled C code "in the background" to perform computations.

## Installing NumPy

If you are running this Notebook in `MyBinder`, or locally after running `pip install -r requirements.txt`, you have `NumPy` installed already. Otherwise, you will need to install `NumPy` with one of these options:

1. Scientific Python Distributions
2. pip
3. System-wide installation via package manager (apt, brew, etc.)
4. From NumPy source code

Option 4 is for developers who need to alter source code. Option 3 is not recommended as system-wide installation of packages may generate dependency conflicts. 

Option 1 is probably the simplest and it's widely used by practitioners. Within this category we have:

- [Anaconda](https://www.anaconda.com/products/individual)
- [Mini-conda](https://docs.conda.io/en/latest/miniconda.html)
- [WinPython](https://winpython.github.io/)
- [Pyzo](https://pyzo.org/)

WinPython and Pyzo are less used and I do not have any experience with them. Feel free to experiment with them at your own risk. Anaconda and mini-conda are the most popular options. Anaconda basically is a large bundle of packages for Python and R, and a package manager. Mini-conda is a lightweight version of Anaconda.  Once you install Anaconda or mini-conda, `NumPy` will be available within the conda installation. 

I do not like and do not use Anaconda or mini-conda. I just see no reason to install hundreds of libraries I will most likely never use. I also see no reason to duplicate functionality which is already provided in the standard Python installation. It just occupies disk space, and you also need to learn how to use `conda`, which sometimes introduces hard-to-understand issues with multiple Python and package installations. Yet, many people recommend and use this method. If you decide to use it, go to the [Anaconda](https://www.anaconda.com/products/individual) or [Mini-conda](https://docs.conda.io/en/latest/miniconda.html) site and follow the instructions for your system.

My preferred method is `pip`, which is available out-of-the-box with your Python installation. Installing `NumPy` is as simple as running:

```bash
pip3 install numpy
```

Now, I highly recommend creating a virtual environment, activating the environment, and then installing `NumPy` within that. It boils down to copy-pasting the following in your terminal:

```bash
# create the virtual environment
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate

# upgrade pip package manager
pip3 install --upgrade pip

# install numpy
pip3 install numpy
```

The virtual environment will isolate your `NumPy` installation from your system-wide Python installation and other projects you may have on your computer. So, it's safer. This method will save you gigabytes of disk space, time, confusion, and effort. But that's just me!
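Whichever installation method you choose, a quick way to confirm that it worked is to import the package from Python and print its version and location. This is just a sanity-check sketch; the version and path you see will depend on your system.

```python
import numpy as np

# If the import succeeds, NumPy is installed in the active environment
print(f'NumPy version: {np.__version__}')
print(f'Installed at: {np.__file__}')
```

If you created the virtual environment described above, the printed path should point inside the `venv` directory.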

## Arrays

`NumPy`'s fundamental object is the **[ndarray](https://numpy.org/doc/1.18/reference/generated/numpy.ndarray.html)**. Arrays are simply ordered collections of elements, like singletons, [lists](https://docs.python.org/3/tutorial/datastructures.html), [sets](https://realpython.com/python-sets/), [vectors](https://pabloinsente.github.io/intro-linear-algebra#vectors), [matrices](https://pabloinsente.github.io/intro-linear-algebra#matrices), or [tensors](https://en.wikipedia.org/wiki/Tensor). Additionally, all elements in an array have **the same type**. For instance, an array can't have integers and text at the same time.

The "n" in "ndarray" refers to the arbitrary number of dimensions it can take. An array with one element and one dimension is a "singleton" or just a number. An array with four elements and two dimensions can be a 2x2 matrix. Put simply, an array is like an Excel sheet with the caveat that instead of being restricted to two dimensions, it can be extended to 3, 4, or higher dimensions, and that you can't combine data types in a "sheet".

In `NumPy`, dimensions are called `axes`, so I will use that term interchangeably with dimensions from now on.

<<<array-picture.svg>>>

Let's see a few examples. 

We first need to import `NumPy` by running:

In [1]:
import numpy as np

Then we can use the `array` function to build an array as:

In [2]:
# 1 axis/dimension array
one_dim = np.array([1, 2, 3])

# 2 axes/dimensions array
two_dim = np.array([[1, 2, 3],
                    [4, 5, 6]])

# 3 axes/dimensions array
three_dim = np.array([[[1, 2, 3],
                       [4, 5, 6]],
                      [[1, 2, 3],
                       [4, 5, 6]]])

This is how arrays look when printed:

In [3]:
print(f'One-dimensional array:\n{one_dim}\n')
print(f'Two-dimensional array:\n{two_dim}\n')
print(f'Three-dimensional array:\n{three_dim}')

One-dimensional array:
[1 2 3]

Two-dimensional array:
[[1 2 3]
 [4 5 6]]

Three-dimensional array:
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


We can inspect and confirm **dimensionality** as:

In [4]:
print(f'Number of dimensions array one:{one_dim.ndim}')
print(f'Number of dimensions array two:{two_dim.ndim}')
print(f'Number of dimensions array three:{three_dim.ndim}')

Number of dimensions array one:1
Number of dimensions array two:2
Number of dimensions array three:3


The **shape** of an array is sometimes confused with its dimensionality. The dimensionality is a single number, whereas the shape is a tuple with one entry per dimension:

In [5]:
print(f'Shape array one:{one_dim.shape}')
print(f'Shape array two:{two_dim.shape}')
print(f'Shape array three:{three_dim.shape}')

Shape array one:(3,)
Shape array two:(2, 3)
Shape array three:(2, 2, 3)


The first number in the parentheses represents the number of elements along the first axis/dimension; the second number, the number of elements along the second axis/dimension; the third number, the number of elements along the third axis/dimension; and so on.

For instance, the (2, 2, 3) indicates 2 elements along the first axis, 2 elements along the second axis, and 3 elements along the third axis.

To count the **number of elements** within an array, type:

In [6]:
print(f'Number of elements array one:{one_dim.size}')
print(f'Number of elements array two:{two_dim.size}')
print(f'Number of elements array three:{three_dim.size}')

Number of elements array one:3
Number of elements array two:6
Number of elements array three:12
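The three attributes seen so far are tied together: the number of dimensions is the length of the shape tuple, and the number of elements is the product of the entries of the shape. A small sketch to check this, reusing the same three-dimensional array as above (`math.prod` requires Python 3.8 or later):

```python
import math

import numpy as np

three_dim = np.array([[[1, 2, 3],
                       [4, 5, 6]],
                      [[1, 2, 3],
                       [4, 5, 6]]])

# ndim is the length of the shape tuple
print(three_dim.ndim == len(three_dim.shape))

# size is the product of the entries of the shape: 2 * 2 * 3 = 12
print(three_dim.size == math.prod(three_dim.shape))
```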


`NumPy` utilizes different **data types** (more on this later) to represent data, which can be inspected as:

In [7]:
print(f'Data type array one:{one_dim.dtype}')
print(f'Data type array two:{two_dim.dtype}')
print(f'Data type array three:{three_dim.dtype}')

Data type array one:int64
Data type array two:int64
Data type array three:int64


## Array creation

`NumPy` offers several alternatives to create arrays.

I will review three cases:

1. Conversion from other Python structures 
2. Intrinsic `NumPy` array creation objects 
3. Use of special library functions

### Conversion from other Python structures 

In the previous section I used the `array` function to create an array from a Python list. This is an example of array creation by conversion of an array-like Python object.

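Lists and tuples are array-like Python objects that serve as options for this method. Sets are not array-like: `NumPy` does not unpack a set into its elements but wraps the whole set in a zero-dimensional array of `object` type, which is why the set is printed with curly braces below.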

In [8]:
array_list = np.array([1, 2, 3])
array_tuple = np.array(((1, 2, 3), (4, 5, 6)))
array_set = np.array({"pikachu", "snorlax", "charizard"})

In [9]:
print(f'Array from list:\n{array_list}\n')
print(f'Array from tuple:\n{array_tuple}\n')
print(f'Array from set:\n{array_set}')

Array from list:
[1 2 3]

Array from tuple:
[[1 2 3]
 [4 5 6]]

Array from set:
{'charizard', 'pikachu', 'snorlax'}
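The curly braces in the last output are a hint that the set was not unpacked into its elements. The sketch below compares passing the set directly with converting it to a sorted list first (sorting is my own choice here, since sets have no defined order):

```python
import numpy as np

pokemon = {"pikachu", "snorlax", "charizard"}

# Passing the set directly wraps the whole set in a 0-dimensional object array
wrapped = np.array(pokemon)

# Converting to a (sorted) list first yields a regular 1-dimensional array
converted = np.array(sorted(pokemon))

print(f'Wrapped set: shape {wrapped.shape}, dtype {wrapped.dtype}')
print(f'Converted set: shape {converted.shape}, dtype {converted.dtype}')
print(converted)
```

The first array has an empty shape and holds a single Python object, whereas the second one has three string elements that can be indexed and manipulated like any other array.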


### Intrinsic NumPy array creation objects 

Manual input of data into arrays can be cumbersome, so `NumPy` offers a series of convenience functions to create arrays for special cases, like zeros, ones, and others. Below are some common examples.

In [10]:
# zeros
zeros = np.zeros(5)

# ones
ones = np.ones((3, 3))

# arange
arange = np.arange(1, 10, 2)

# empty
empty =  np.empty([2, 2])

# linspace
linespace = np.linspace(-1.0, 1.0, num=10)

# full
full = np.full((3,3), -2)

# indices
indices =  np.indices((3,3))

In [11]:
print(f'Array of zeros:\n{zeros}\n')
print(f'Array of ones:\n{ones}\n')
print(f'Array of empty entries:\n{empty}\n')
print(f'Evenly spaced array in a range:\n{linespace}\n')
print(f'Array with same number on each entry:\n{full}\n')
print(f'Array from indices:\n{indices}\n')

Array of zeros:
[0. 0. 0. 0. 0.]

Array of ones:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Array of empty entries:
[[4.66120156e-310 6.90454990e-310]
 [0.00000000e+000 0.00000000e+000]]

Evenly spaced array in a range:
[-1.         -0.77777778 -0.55555556 -0.33333333 -0.11111111  0.11111111
  0.33333333  0.55555556  0.77777778  1.        ]

Array with same number on each entry:
[[-2 -2 -2]
 [-2 -2 -2]
 [-2 -2 -2]]

Array from indices:
[[[0 0 0]
  [1 1 1]
  [2 2 2]]

 [[0 1 2]
  [0 1 2]
  [0 1 2]]]



- The `zeros` function generates an array of zeros with the shape given by an integer or a tuple passed to the function
- The `ones` function generates an array of ones with the shape given by an integer or a tuple passed to the function
- The `arange` function generates evenly spaced values within a range, given a start, a stop (excluded), and a step size
- The `empty` function generates an array without initializing its entries, so it contains whatever values were already in memory (which is why arbitrary, often very small, numbers are printed)
- The `linspace` function generates an array of evenly spaced entries given a range and the number of entries
- The `full` function returns an array with the shape given by a tuple passed to the function, filled with the same value (the second argument, outside the tuple)
- The `indices` function generates an array representing the indices of a grid of the given shape
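Two points from the list above are easy to mix up: `arange` takes a step size and excludes the stop value, whereas `linspace` takes the number of entries and includes the stop value by default. A short sketch with values I picked for illustration:

```python
import numpy as np

# arange: start, stop (excluded), step size
by_step = np.arange(0.0, 1.0, 0.25)

# linspace: start, stop (included by default), number of entries
by_count = np.linspace(0.0, 1.0, num=5)

print(f'arange with step 0.25:\n{by_step}\n')
print(f'linspace with 5 entries:\n{by_count}\n')

# empty only reserves memory, so assign values before reading from the array
buffer = np.empty((2, 2))
buffer[:] = 7.0
print(f'Array from empty after filling it:\n{buffer}')
```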

### Use of special library functions

`NumPy` has a large list of special-case functions that generate arrays, which are too many and too seemingly disconnected to enumerate. Here are a few examples:

In [12]:
# diagonal array
diagonal = np.diag([1, 2, 3], k=0)

# identity 
identity = np.identity(3)

# eye
eye = np.eye(4, k=1)

# rand
rand = np.random.rand(3,2)

In [13]:
print(f'Diagonal matrix from array-like structure:\n{diagonal}\n')
print(f'Identity matrix:\n{identity}\n')
print(f'Diagonal matrix with ones and zeros elsewhere:\n{eye}\n')
print(f'Array of random numbers sampled from a uniform distribution:\n{rand}')

Diagonal matrix from array-like structure:
[[1 0 0]
 [0 2 0]
 [0 0 3]]

Identity matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Diagonal matrix with ones and zeros elsewhere:
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]]

Array of random numbers sampled from a uniform distribution:
[[0.08941575 0.91306609]
 [0.35893813 0.84554003]
 [0.94169506 0.05398521]]


- The `diag` function returns an array with the given numbers on the diagonal and zeros elsewhere
- The `identity` function returns an identity matrix
- The `eye` function returns an array with ones on a diagonal and zeros elsewhere; here `k=1` shifts that diagonal one position above the main one
- The `random.rand` function returns an array of random numbers sampled from a uniform distribution over [0, 1)
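Random arrays change on every run, which makes results hard to reproduce. One common way around this is a seeded generator, sketched below with `np.random.default_rng` (available in `NumPy` 1.17 and later). The seed value 42 is an arbitrary choice of mine.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng = np.random.default_rng(seed=42)
sample_a = rng.random((3, 2))

rng_again = np.random.default_rng(seed=42)
sample_b = rng_again.random((3, 2))

print(f'Seeded random array:\n{sample_a}\n')
print(f'Same values with the same seed: {np.array_equal(sample_a, sample_b)}')
```

Like `random.rand`, the `random` method of the generator samples from a uniform distribution over [0, 1).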

## Array data type and conversions

I mentioned that `NumPy` arrays can contain only a single data type. This constraint makes storing and manipulating data much more efficient than working with mixed-type collections (like Python lists), and efficiency is a priority for `NumPy`.
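A small sketch of what the single-type constraint means in practice: when the input mixes types, `NumPy` does not keep them side by side as a list does, but converts every element to one common type. The variable names are mine.

```python
import numpy as np

# A Python list keeps each element with its own type
python_list = [1, "two", 3.0]
print([type(item).__name__ for item in python_list])

# Mixing integers and floats: everything becomes a float
mixed_numbers = np.array([1, 2.5, 3])
print(f'{mixed_numbers}, data type: {mixed_numbers.dtype}')

# Mixing numbers and text: everything becomes text
mixed_text = np.array([1, "two", 3.0])
print(f'{mixed_text}, data type kind: {mixed_text.dtype.kind}')
```

The `kind` attribute is a one-letter code for the family of the data type, where `'U'` stands for Unicode text.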

Data types in `NumPy` are a relatively complicated subject, particularly if you are not familiar with C or memory allocation. For our purposes, some basic data types are worth knowing:

1. `np.bool_`: used to represent "Booleans" (True or False)
2. `np.int_`: used to represent integer numbers
3. `np.uint`: used to represent non-negative or "unsigned" integers
4. `np.float64`: used to represent real numbers (decimals, fractions, etc.) or "floating point" numbers
5. `np.complex128`: used to represent complex numbers

In my experience, booleans, integers, and floating point data types are the ones that you end up using the most. At least explicitly. Other data types are used all the time, but you do not have to worry about them because `NumPy` takes care of that for you. As Python is a "dynamically typed" language, which simply means that you do not have to tell the computer what data type you will use (Python does this for you), most of the time you have no need to indicate which data type you will use.

For instance, if you simply define an array of values, `NumPy` will decide what data type to allocate for each:

In [14]:
bools = np.array([True, False])
ints = np.full((2,2), 1)
floats = np.ones(5)
unsigned = np.arange(3, dtype=np.uint8)
complexs = np.array([+1j, +2j, +3j])
unicode = np.array(["Catbug", "Chris", "Danny", "Wallow", "Beth"])

In [15]:
print(f'Booleans :{bools.dtype}')
print(f'Integers :{ints.dtype}')
print(f'Floats :{floats.dtype}')
print(f'Unsigned :{unsigned.dtype}')
print(f'Complexs :{complexs.dtype}')
print(f'Unicode :{unicode.dtype}')

Booleans :bool
Integers :int64
Floats :float64
Unsigned :uint8
Complexs :complex128
Unicode :<U6


In some instances you may want to explicitly indicate the data type. Here are some examples of how you can do that:

In [16]:
int_16 = np.array([1, 2, 3, 4], dtype='int16')
float_32 = np.array([1, 2, 3, 4], dtype='float32')
unsigned_int_8 = np.arange(3, dtype='uint8')

In [17]:
print(f'Integer 16 bits data type: {int_16.dtype}')
print(f'Float 32 bits data type: {float_32.dtype}')
print(f'Unsigned integer 8 bits data type: {unsigned_int_8.dtype}')

Integer 16 bits data type: int16
Float 32 bits data type: float32
Unsigned integer 8 bits data type: uint8


Note that the numbers after the data type, like 8, 16, 32, and 64, indicate the number of bits (not bytes) allocated to represent each element of the array; 8 bits make up one byte. The higher the number, the more memory.

There are several reasons why you may want to explicitly indicate the data type. One of the most common reasons is **memory efficiency**. If you know the values in a variable are small non-negative integers, using 8-bit unsigned integers will use significantly less memory than 32-bit integers. Now, this will only matter with massive datasets, meaning datasets large enough that your computer has difficulty processing them, since what counts as "massive" depends on context.
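To decide whether a smaller integer type is enough for your data, you can ask `NumPy` for the range of values each type can hold with `np.iinfo`, and for the memory each element takes with the `itemsize` attribute of a data type. A minimal sketch over a handful of types I picked for illustration:

```python
import numpy as np

for name in ('uint8', 'int16', 'int32', 'int64'):
    info = np.iinfo(name)                    # limits of the integer type
    item_bytes = np.dtype(name).itemsize     # bytes used by each element
    print(f'{name}: {info.bits} bits = {item_bytes} bytes per element, '
          f'range [{info.min}, {info.max}]')
```

For floating point types, the analogous function is `np.finfo`.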

A final operation you may want to do is **to change the data type of an array**. For instance, let's say you want to stack two arrays, but one has a floating point data type and the other an integer data type. Recall that `NumPy` arrays can have only a single data type. Let's see an example.

In [18]:
int_array = np.arange(10, dtype='int')
float_array = np.arange(10, dtype='float')

In [19]:
print(f'Array 1: {int_array}, data type:{int_array.dtype}')
print(f'Array 2: {float_array}, data type:{float_array.dtype}')

Array 1: [0 1 2 3 4 5 6 7 8 9], data type:int64
Array 2: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.], data type:float64


Let's first stack the arrays as they are

In [20]:
stacked_arrays = np.stack((int_array, float_array))

In [21]:
print(f'Stacked arrays as they are:\n{stacked_arrays},\ndata type:{stacked_arrays.dtype}')

Stacked arrays as they are:
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
 [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]],
data type:float64


As you can see, `NumPy` "upcasted" the `int64` array to `float64`. This is simply because `int64` can't represent floating point numbers, only integers, whereas `float64` can represent integers (exactly, up to 2^53 in absolute value). So it is the smart choice to make things work.
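If you want to know in advance which data type `NumPy` will pick when combining arrays, `np.result_type` reports it without building anything. The sketch below checks a few combinations and also shows the one caveat of upcasting integers to floats: very large integers may lose precision.

```python
import numpy as np

# Data type NumPy would choose when combining arrays of these types
print(np.result_type(np.int64, np.float64))   # float64
print(np.result_type(np.int8, np.uint8))      # int16
print(np.result_type(np.float32, np.int16))   # float32

# float64 represents integers exactly only up to 2**53
big = np.array([2**53 + 1], dtype=np.int64)
as_float = big.astype(np.float64)
print(f'Original: {big[0]}, after upcasting: {int(as_float[0])}')
```

For the small values used in this tutorial the conversion is exact, so this only matters for integers with more than about 15 digits.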

Now, you may want to keep everything as integers, for whatever reason. If so, this is what you need to do. First, convert the float64 array to an int64 array as:

In [22]:
int_array_2 = float_array.astype('int64')

And now stack things together. 

In [23]:
stacked_arrays_2 = np.stack((int_array, int_array_2))

In [24]:
print(f'Stacked arrays after conversion:\n{stacked_arrays_2},\ndata type:{stacked_arrays_2.dtype}')

Stacked arrays after conversion:
[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]],
data type:int64


There you go, the array is composed of 64-bit integers now.

Let's say you want to save memory by converting the new array to a **lower bit representation**, like int8. This is known as "downcasting", i.e., the opposite of "upcasting". For this you simply need to:

In [25]:
stacked_arrays_3 = stacked_arrays_2.astype('int8')

In [26]:
print(f'Stacked arrays after downcasting:\n{stacked_arrays_3},\ndata type:{stacked_arrays_3.dtype}')

Stacked arrays after downcasting:
[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]],
data type:int8


You can check and compare the memory "footprint" of each array as:

In [27]:
print(f'Memory size Int64 array:{stacked_arrays_2.nbytes}')
print(f'Memory size Int8 array:{stacked_arrays_3.nbytes}')

Memory size Int64 array:160
Memory size Int8 array:20


The memory footprint has been reduced by a factor of 8. This may not sound like a lot, but if you are working with a dataset of, let's say, 8 gigabytes, you can reduce its memory usage to only 1 gigabyte, which can also make data manipulation faster. Keep in mind that this only works when every value fits in the smaller type: an `int8` can only hold values from -128 to 127.
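Downcasting is only safe when the values fit in the smaller type, because `astype` does not complain when they don't: out-of-range integers silently wrap around. Here is a sketch of a check you could run first. The helper `fits_in` is hypothetical (it is not part of `NumPy`) and assumes a non-empty integer array.

```python
import numpy as np


def fits_in(array, dtype):
    """Hypothetical helper: True if every value of a non-empty integer
    array lies within the range of the target integer dtype."""
    info = np.iinfo(dtype)
    return bool(array.min() >= info.min and array.max() <= info.max)


small = np.arange(10)
large = np.array([100, 200, 300])

print(f'small fits in int8: {fits_in(small, np.int8)}')
print(f'large fits in int8: {fits_in(large, np.int8)}')

# Without the check, values outside [-128, 127] wrap around silently
wrapped = large.astype(np.int8)
print(f'large after downcasting to int8: {wrapped}')
```

In the last line 200 becomes -56 and 300 becomes 44, which is rarely what you want, so it pays to check the range before trading precision for memory.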

## Array manipulation

The shape of an array is given by the number of elements along each axis. Now, if you think of an array as composed of little boxes or LEGO pieces, you can start to think of how those pieces can be rearranged in different shapes. For example, a 2 x 2 array could be "flattened" to be a 1 x 4 array, or maybe you could "swap" the rows and columns of the array by moving the little boxes around, or even take away a couple of pieces, reducing the array to a 1 x 2 shape.

These kinds of operations are extremely common in any kind of data manipulation, and mastering them is one of the most important skills to acquire. Some people prefer to convert `NumPy` arrays to `Pandas` DataFrames, as `Pandas` provides several easy-to-use functions to manipulate arrays. Nonetheless, manipulating arrays in `NumPy` is not that much harder, it can save you time and effort by sparing you from going back and forth with `Pandas`, and well, this is a `NumPy` tutorial, so we are here to learn the `NumPy` way.

### Shape manipulation

Arrays can be changed with or without changing their data. This is equivalent to the difference between rearranging LEGO blocks with or without adding/removing pieces.

The `reshape` method changes the shape of an array *without* changing its data. 

In [28]:
array_1 = np.array([[1, 2, 3],
                    [4, 5, 6]])

print(f'Array-1 shape: {array_1.shape}')

Array-1 shape: (2, 3)


Array-1 has shape (2, 3), meaning it has 2 rows and 3 columns. 

In [29]:
# reshape is available both as an array method and as a NumPy function
array_2 = array_1.reshape(3,2) 
array_3 = np.reshape(array_1, (1,6))

In [30]:
print(f'Array-2:\n{array_2},\nshape: {array_2.shape}\n')
print(f'Array-3:\n{array_3},\nshape: {array_3.shape}')

Array-2:
[[1 2]
 [3 4]
 [5 6]],
shape: (3, 2)

Array-3:
[[1 2 3 4 5 6]],
shape: (1, 6)


Array-2 and Array-3 preserve the number of elements of Array-1.
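Because `reshape` must preserve the number of elements, two things follow: you can let `NumPy` work out one of the dimensions for you by passing `-1`, and asking for a shape with a different number of elements raises an error. A minimal sketch using the same Array-1:

```python
import numpy as np

array_1 = np.array([[1, 2, 3],
                    [4, 5, 6]])

# -1 asks NumPy to infer that dimension: 6 elements / 2 columns = 3 rows
inferred = array_1.reshape(-1, 2)
print(f'Inferred shape: {inferred.shape}')

# A shape with 8 slots can't hold 6 elements
try:
    array_1.reshape(4, 2)
except ValueError as error:
    print(f'Reshape failed: {error}')
```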

**Flattening** an array, that is, collapsing all values into a single axis or dimension, can be done in two ways.

In [31]:
array_1_ravel = array_1.ravel()
array_1_flatt = array_1.flatten()

In [32]:
print(f'Array-1 ravel:\n{array_1_ravel},\nshape: {array_1_ravel.shape}\n')
print(f'Array-1 flattened:\n{array_1_flatt},\nshape: {array_1_flatt.shape}')

Array-1 ravel:
[1 2 3 4 5 6],
shape: (6,)

Array-1 flattened:
[1 2 3 4 5 6],
shape: (6,)


Why on earth are there two methods to do exactly the same thing? The answer is that they are not doing the same thing: `ravel()` returns a '**view**' of the original array whenever possible, whereas `flatten()` always returns an independent '**copy**' of it. Views are just "pointers" to the original array in memory, whereas copies have their own space in memory. I'll cover this in depth later.
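The practical consequence of the view/copy difference is that writing into the result of `ravel()` can change the original array, while writing into the result of `flatten()` never does. A short sketch, with `np.shares_memory` used to confirm what is going on:

```python
import numpy as np

array_1 = np.array([[1, 2, 3],
                    [4, 5, 6]])

raveled = array_1.ravel()      # a view of array_1 (for a contiguous array)
flattened = array_1.flatten()  # an independent copy

raveled[0] = 99      # also changes array_1
flattened[1] = -1    # leaves array_1 untouched

print(f'Original array after both assignments:\n{array_1}\n')
print(f'ravel shares memory with the original: {np.shares_memory(array_1, raveled)}')
print(f'flatten shares memory with the original: {np.shares_memory(array_1, flattened)}')
```

After running this, the first element of `array_1` is 99, but its second element is still 2.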

Another thing you might have noticed is that Array-3 has the same elements as Array-1-ravel and Array-1-flattened, but it has an extra pair of `[]` and shape `(1,6)` instead of `(6,)`. What is going on here?

Put simply, internally, `NumPy` arrays have two parts: the information itself and information about how to interpret/read the array. In the case of the shape information, this indicates how many "indices" are associated with an array. The `(1,6)` is saying that there are **two indices** identifying each element of the array: the number 1 for all the elements, and the numbers from 1 to 6 for each individual element (I count from 1 here for readability; `NumPy` itself counts from 0). This makes sense if you think of arrays as matrices or Excel sheets: the first element is in the first row and first column (1,1), the second in the first row and the second column (1, 2), and so on.

However, if you think about it, you don't need two indices to identify the elements of a one-dimensional array. After all, when we count things we do not count "1 and 1, 1 and 2, 1 and 3, 1 and 4..." and so on. The `(6,)` is just saying that there is a **single index** identifying each of the 6 elements of the array, which makes perfect sense. The first element of the array is in position one (1,), the second in position two (2,), and so on.

Now you may be wondering: why add a 1 as an index if it's unnecessary? Since `NumPy` supports multi-dimensional arrays, the `(1,6)` is technically indicating that the array has TWO dimensions or axes instead of one. "BUT, the array has just one dimension, right?" Yes and no. The thing is that the same data can be represented either as a collection of elements along one dimension or as a collection of elements along two dimensions, with the caveat that one dimension holds all the data and the other has length 1, so it is basically "flat", but still there. This is the same idea as "the first element is in the first row and first column (1,1)".

If you are familiar with linear algebra or geometry, you should know that a square is an object with two dimensions, but that it can 'live' in three, four, five, a million, or any number of dimensions. Essentially, higher-dimensional spaces can contain objects with fewer dimensions, but not the other way around. You can't fit a sphere in a plane. The misunderstanding, in my view, comes from the tendency to think of data as two-dimensional grid-like objects, when in practice it does not *necessarily* need to be like that. People like to think of `NumPy` arrays as matrices, vectors, tensors, etc., but they aren't; they are arrays with one or more dimensions. Period.

This whole discussion may sound like I am beating around the bush, but I am not. Dimensionality mismatch is one of the most important sources of errors, misunderstandings, and frustrations when working with `NumPy` arrays. If you ever do anything related to linear algebra, like pretty much all of machine learning and statistics, you need to have a firm understanding of how dimensions work in `NumPy`.
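Here is a small sketch of how the same six numbers behave under three different shapes, and of the kind of surprise a dimensionality mismatch can produce. Note that `NumPy` indices start at 0, so the third element sits at index 2. The variable names are mine.

```python
import numpy as np

flat = np.arange(6)             # shape (6,): one index per element
row = flat.reshape(1, 6)        # shape (1, 6): two indices per element
column = flat[:, np.newaxis]    # shape (6, 1): two indices per element

# The same value needs a different number of indices in each case
print(flat[2], row[0, 2], column[2, 0])

# Adding arrays of mismatched dimensions does not fail, it "stretches" them
print(f'(6,) + (1, 6) gives shape {(flat + row).shape}')
print(f'(6,) + (6, 1) gives shape {(flat + column).shape}')
```

The last line returns a 6 x 6 array instead of 6 numbers, with no error or warning, which is exactly the kind of silent mistake that a firm grasp of shapes helps you catch.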

### Transpose-like operations

Transposing means to "swap" or interchange the position and elements between two or more axes. 

The most common operation is the plain 'Transpose' operation, where the axes get permuted.  

In [33]:
array_1 = np.arange(4).reshape((2,2)) # two dimensional array
array_2 = np.arange(12).reshape((3,2,2)) # three dimensional array

In [34]:
print(f'Array-1:\n{array_1},\nshape:{array_1.shape}\n')
print(f'Array-2:\n{array_2},\nshape:{array_2.shape}')

Array-1:
[[0 1]
 [2 3]],
shape:(2, 2)

Array-2:
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]],
shape:(3, 2, 2)


Now let's transpose both.

In [35]:
array_1_T = array_1.T
array_2_T = array_2.T

In [36]:
print(f'Array-1 transposed:\n{array_1_T},\nshape:{array_1_T.shape}\n')
print(f'Array-2 transposed:\n{array_2_T},\nshape:{array_2_T.shape}')

Array-1 transposed:
[[0 2]
 [1 3]],
shape:(2, 2)

Array-2 transposed:
[[[ 0  4  8]
  [ 2  6 10]]

 [[ 1  5  9]
  [ 3  7 11]]],
shape:(2, 2, 3)


Array-1 has swapped its rows and columns. Array-2 has gone from three 2x2 arrays to two 2x3 arrays. This is because `.T` reverses the order of the axes: the third axis moves to the first place, the first axis moves to the third place, and the second stays where it is, so the shape (3, 2, 2) becomes (2, 2, 3).
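To check how the axes move under transposition, you can compare `.T` against `np.transpose` with the axis order written out explicitly. The sketch below also follows a single element to its new position, and shows that `np.transpose` accepts any other ordering of the axes, such as swapping only the first two.

```python
import numpy as np

array_2 = np.arange(12).reshape((3, 2, 2))

# .T is the same as reversing the order of the axes
reversed_axes = np.transpose(array_2, axes=(2, 1, 0))
print(f'.T equals transpose with axes (2, 1, 0): {np.array_equal(array_2.T, reversed_axes)}')

# The element at index [i, j, k] ends up at index [k, j, i]
print(f'Same element: {array_2[2, 0, 1] == array_2.T[1, 0, 2]}')

# Any other permutation is possible, like swapping only the first two axes
custom = np.transpose(array_2, axes=(1, 0, 2))
print(f'Shape after swapping the first two axes: {custom.shape}')
```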

To continue with:
- moveaxis
- swapaxes

### Dimension manipulation

### Joining arrays

### Splitting arrays

### Tiling arrays

### Adding and removing elements

### Rearranging elements

https://numpy.org/doc/stable/user/quickstart.html#shape-manipulation  
https://numpy.org/doc/stable/reference/routines.array-manipulation.html  
https://numpy.org/doc/stable/reference/routines.sort.html  
https://numpy.org/doc/stable/reference/generated/numpy.pad.html#numpy.pad  

## Array arithmetic

https://numpy.org/doc/stable/user/quickstart.html#basic-operations

## NumPy Universal Functions

https://numpy.org/doc/stable/user/quickstart.html#universal-functions  
https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html  
https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs

## Array copy and views

https://numpy.org/doc/stable/user/quickstart.html#copies-and-views

## Array logic and evaluation

https://numpy.org/doc/stable/reference/routines.logic.html

## Basic statistics operations

https://numpy.org/doc/stable/reference/routines.statistics.html

## Basic linear algebra operations

https://numpy.org/doc/stable/reference/routines.linalg.html  
https://numpy.org/doc/stable/user/quickstart.html#vector-stacking


## Broadcasting

https://numpy.org/doc/stable/user/basics.broadcasting.html  
https://numpy.org/doc/stable/reference/ufuncs.html#broadcasting

## Array indexing

https://numpy.org/doc/stable/user/basics.indexing.html  
https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing   
https://numpy.org/doc/stable/reference/arrays.indexing.html

## String operations

https://numpy.org/doc/stable/reference/routines.char.html

## Datetime functions

https://numpy.org/doc/stable/reference/routines.datetime.html

## Structured arrays

https://numpy.org/doc/stable/user/basics.rec.html

## Input-output (I/O) processing (load and read data)

https://numpy.org/doc/stable/user/basics.io.html  
https://numpy.org/doc/stable/reference/routines.io.html

## Random sampling

https://numpy.org/doc/stable/reference/random/index.html

## Array masking

https://numpy.org/doc/stable/reference/routines.ma.html

## NumPy constants and scalars

https://numpy.org/doc/stable/reference/constants.html  
https://numpy.org/doc/stable/reference/arrays.scalars.html

## Array Iteration

https://numpy.org/doc/stable/user/c-info.beyond-basics.html