# Introduction<a id="0"></a>
<hr>
**NumPy** is the fundamental package for scientific computing in **Python**. It includes:

* A powerful N-dimensional array object
* Sophisticated (broadcasting) functions
* Tools for integrating C/C++ and Fortran code
* Linear algebra, Fourier transform and random number features

In addition to being used for scientific computing, NumPy can also be used as an efficient multi-dimensional container for generic data. Because arbitrary data types can be defined, NumPy integrates seamlessly and efficiently with many kinds of databases.

* If you like it, thank you for your **upvotes**.
* If you have any **questions**, I will be happy to hear them.

<hr>

1. [ndarray](#1)
2. [Create a specific array](#2)
3. [Shape and operation](#3)
4. [Index](#4)
5. [Mathematics](#5)
6. [Matrix](#6)
7. [Random Number](#7)
8. [Conclusion](#8)
9. [Reference](#9)

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

Basic properties and array creation

Here is an array with rank 1, and the length of the axis is 3:

In [None]:
[1,2,3]

Below is an array with rank 2 (two axes): the first axis has length 2 and the second axis has length 3:

In [None]:
[[ 1, 2, 3],[ 4, 5, 6]]

We can create a NumPy array with the array function, for example:

In [None]:
a = np.array([1, 2, 3])
b = np.array([(1,2,3), (4,5,6)])

print("a: ",a)
print("b: ",b)

Note that the square brackets are required here. The following call is wrong:

In [None]:
# a = np.array(1,2,3,4) # WRONG!!!
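
As a small illustration of why the brackets matter: `np.array` expects a single sequence as its first argument, so extra positional numbers are taken as other parameters. The exact exception type and message depend on the NumPy version, so this sketch assumes it is either `TypeError` or `ValueError` and catches both.

```python
import numpy as np

# Correct: one sequence argument
ok = np.array([1, 2, 3, 4])
print("ok =", ok)

# Wrong: four separate positional arguments
try:
    np.array(1, 2, 3, 4)
    failed = False
except (TypeError, ValueError) as exc:
    # The exception type varies between NumPy versions (assumption: one of these two)
    failed = True
    print("np.array(1, 2, 3, 4) raised {}".format(type(exc).__name__))
```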

NumPy's array class is **ndarray**. Arrays are usually created with the **numpy.array** function. Note that ndarray is different from **array.array** in the Python standard library, which only handles one-dimensional arrays. The main attributes of **ndarray** are as follows:
<a id="1"></a>
<mark>[Return Contents](#0)
<hr>

* **ndarray.ndim:** the number of axes (dimensions) of the array. This number is sometimes called the rank.
* **ndarray.shape:** the dimensions of the array. It's a tuple of integers whose length equals the number of dimensions (ndim) of the array. For example, the shape of a one-dimensional array with length n is (n,), and the shape of an array with n rows and m columns is (n, m).
* **ndarray.size:** the total number of elements in the array.
* **ndarray.dtype:** the type of the elements in the array, such as numpy.int32, numpy.int16, or numpy.float64.
* **ndarray.itemsize:** the size of each element in the array, in bytes.
* **ndarray.data:** the buffer that stores the array elements. Usually we only need to access the elements by subscripts, and don't need to access the buffer directly.

Let's take a look at the code example:

In [None]:
a = np.array([1, 2, 3])
b = np.array([(1,2,3), (4,5,6)])

print('a=')
print(a)
print("a's ndim {}".format(a.ndim))
print("a's shape {}".format(a.shape))
print("a's size {}".format(a.size))
print("a's dtype {}".format(a.dtype))
print("a's itemsize {}".format(a.itemsize))

print('')

print('b=')
print(b)
print("b's ndim {}".format(b.ndim))
print("b's shape {}".format(b.shape))
print("b's size {}".format(b.size))
print("b's dtype {}".format(b.dtype))
print("b's itemsize {}".format(b.itemsize))

We can also specify the type of the element when creating the array, for example:

In [None]:
c = np.array( [ [1,2], [3,4] ], dtype=complex )
c
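
To make the effect of `dtype` more concrete, here is a minimal sketch that builds the same values with three element types and compares `itemsize`. The variable names `f` and `i` are only for illustration.

```python
import numpy as np

values = [[1, 2], [3, 4]]

c = np.array(values, dtype=complex)      # complex128: two 64-bit floats per element
f = np.array(values, dtype=np.float32)   # 32-bit floats
i = np.array(values, dtype=np.int16)     # 16-bit integers

for name, arr in (("c", c), ("f", f), ("i", i)):
    print("{}: dtype={}, itemsize={} bytes, total={} bytes".format(
        name, arr.dtype, arr.itemsize, arr.size * arr.itemsize))
```

The total memory used by the elements is simply `size * itemsize`, so choosing a smaller dtype reduces the footprint of large arrays.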

## Create a specific array<a id="2"></a>
<mark>[Return Contents](#0)
<hr>

In real projects, we often need arrays with specific content, and NumPy provides some helper functions for this:

* **zeros:** used to create an array whose elements are all 0
* **ones:** used to create an array whose elements are all 1
* **empty:** used to create an uninitialized array, so its content is undefined
* **arange:** used to create an array by specifying the range and the step
* **linspace:** used to create an array by specifying the range and the number of elements
* **random:** used to generate random numbers

In [None]:
a = np.zeros((2,3))
print('np.zeros((2,3))= \n{}\n'.format(a))

b = np.ones((2,3))
print('np.ones((2,3))= \n{}\n'.format(b))

c = np.empty((2,3))
print('np.empty((2,3))= \n{}\n'.format(c))

d = np.arange(1, 2, 0.3)
print('np.arange(1, 2, 0.3)= \n{}\n'.format(d))

e = np.linspace(1, 2, 7)
print('np.linspace(1, 2, 7)= \n{}\n'.format(e))

f = np.random.random((2,3))
print('np.random.random((2,3))= \n{}\n'.format(f))
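
The difference between `arange` and `linspace` is easy to miss, so here is a small sketch on the interval from 0 to 1: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 1, 0.25)

# linspace: start, stop (included by default), number of points
by_count = np.linspace(0, 1, 5)

print("np.arange(0, 1, 0.25) = {}".format(by_step))
print("np.linspace(0, 1, 5)  = {}".format(by_count))
```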

## Shape and operation<a id="3"></a>
<mark>[Return Contents](#0)
<hr>

Besides creating arrays from scratch, we often need to build new data structures from arrays we already have. In this case, we can use the following functions:

* **reshape:** used to generate a new array based on the existing array and the specified shape
* **vstack:** used to stack multiple arrays in vertical direction (the dimensions of the array must be matched)
* **hstack:** used to stack multiple arrays in horizontal direction (the dimensions of the array must be matched)
* **hsplit:** used to split the array horizontally
* **vsplit:** used to split the array vertically

We'll use some examples to illustrate.

To make it easier to test, let's create some data:

* **zero_line:** an array with one row containing three zeros
* **one_column:** an array with one column containing three ones
* **a:** a matrix with 2 rows and 3 columns
* **b:** an integer array in the interval [11,20)

In [None]:
zero_line = np.zeros((1,3))
one_column = np.ones((3,1))
print("zero_line = \n{}\n".format(zero_line))
print("one_column = \n{}\n".format(one_column))

a = np.array([(1,2,3), (4,5,6)])
b = np.arange(11, 20)
print("a = \n{}\n".format(a))
print("b = \n{}\n".format(b))

The array b is originally one-dimensional; we reshape it into a matrix with 3 rows and 3 columns using the reshape method:

In [None]:
b = b.reshape(3, -1)
print("b.reshape(3, -1) = \n{}\n".format(b))

The second parameter here is set to -1, which means that NumPy infers that dimension automatically from the size of the array. Since the array has 9 elements, the reshaped matrix is 3x3. The code output is as follows:

In [None]:
b.reshape(3, -1)
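
A minimal sketch of how `-1` is resolved, using a hypothetical 12-element array `x`: the inferred dimension is the total size divided by the dimensions that are given, and the reshape fails when that division is not exact.

```python
import numpy as np

x = np.arange(12)

wide = x.reshape(2, -1)    # 12 / 2 = 6 columns
tall = x.reshape(-1, 4)    # 12 / 4 = 3 rows
print("x.reshape(2, -1).shape = {}".format(wide.shape))
print("x.reshape(-1, 4).shape = {}".format(tall.shape))

try:
    x.reshape(5, -1)       # 12 is not divisible by 5
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError: {}".format(exc))
```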

Next, we'll stack the three arrays vertically through the vstack function:

In [None]:
c = np.vstack((a, b, zero_line))
print("c = np.vstack((a,b, zero_line)) = \n{}\n".format(c))

Similarly, we can use hstack for horizontal stacking. This time we first need to reshape the array a so that its number of rows matches the other arrays:

In [None]:
a = a.reshape(3, 2)
print("a.reshape(3, 2) = \n{}\n".format(a))

d = np.hstack((a, b, one_column))
print("d = np.hstack((a,b, one_column)) = \n{}\n".format(d))
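
The matching rule for stacking can be sketched with a few small arrays (the names `top`, `bottom`, `left` and `right` are only for illustration): `vstack` needs the same number of columns, `hstack` needs the same number of rows, and a mismatch raises `ValueError`.

```python
import numpy as np

top = np.zeros((2, 3))
bottom = np.ones((4, 3))
v = np.vstack((top, bottom))     # column counts match (3), rows add up: 2 + 4

left = np.zeros((2, 3))
right = np.ones((2, 5))
h = np.hstack((left, right))     # row counts match (2), columns add up: 3 + 5

print("v.shape = {}".format(v.shape))
print("h.shape = {}".format(h.shape))

try:
    np.vstack((top, right))      # 3 columns vs 5 columns
    mismatch_failed = False
except ValueError as exc:
    mismatch_failed = True
    print("ValueError: {}".format(exc))
```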

Next, let's take a look at the split. First, we split the array d into three arrays in horizontal direction. Then we print out the middle one (the subscript is 1):

In [None]:
e = np.hsplit(d, 3) # Split d into 3 equal parts along the columns
print("e = np.hsplit(d, 3) = \n{}\n".format(e))
print("e[1] = \n{}\n".format(e[1]))

In addition to specifying a number to split the array evenly, we can also specify the column indices at which to split. The following splits the array d after the first column and after the third column:

In [None]:
f = np.hsplit(d, (1, 3)) # Split d after the 1st and the 3rd column
print("f = np.hsplit(d, (1, 3)) = \n{}\n".format(f))
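
To see what index-based splitting returns, this sketch uses a hypothetical 3x6 array `x` and checks the shape of each piece. Passing `(1, 3)` is equivalent to taking the column slices `[:1]`, `[1:3]` and `[3:]`.

```python
import numpy as np

x = np.arange(18).reshape(3, 6)

parts = np.hsplit(x, (1, 3))       # pieces: x[:, :1], x[:, 1:3], x[:, 3:]
shapes = [p.shape for p in parts]
print("shapes = {}".format(shapes))
```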

Finally, we split the array d in the vertical direction. As with hsplit, if the specified number does not divide the array evenly, the call fails:

In [None]:
g = np.vsplit(d, 3)
print("np.vsplit(d, 3) = \n{}\n".format(g))

# np.vsplit(d, 2) # ValueError: array split does not result in an equal division
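
When an uneven split is acceptable, `np.array_split` is one possible alternative: it does not require an equal division. This is a sketch on a hypothetical 3x6 array `x`, not part of the original example.

```python
import numpy as np

x = np.arange(18).reshape(3, 6)

# 3 rows cannot be divided evenly by 2; array_split makes the first piece larger
pieces = np.array_split(x, 2, axis=0)
row_counts = [p.shape[0] for p in pieces]
print("row counts = {}".format(row_counts))
```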

## Index<a id="4"></a>
<mark>[Return Contents](#0)
<hr>

Next we look at how to access the data in the NumPy array.

Again, for testing convenience, let's first create a one-dimensional array containing the integers in the interval [100,200).

In the simplest case, we access an element of the array by its subscript with array[index].

In [None]:
base_data = np.arange(100, 200)
print("base_data\n={}\n".format(base_data))

print("base_data[10] = {}\n".format(base_data[10]))

In NumPy, we can also pass an array of subscripts to get several elements of the target array at once. For example:

In [None]:
every_five = np.arange(0, 100, 5)
print("base_data[every_five] = \n{}\n".format(
    base_data[every_five]))

The subscript array can be one-dimensional or multi-dimensional. Suppose that we want a 2x2 matrix whose content comes from the subscripts 1, 2, 10, and 20 in the target array. The code can be written as:

In [None]:
a = np.array([(1,2), (10,20)])
print("a = \n{}\n".format(a))
print("base_data[a] = \n{}\n".format(base_data[a]))
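
The key point is that the result takes the shape of the subscript array, not of the target array. A minimal sketch with the same numbers (the names `base`, `idx` and `picked` are only for illustration):

```python
import numpy as np

base = np.arange(100, 200)
idx = np.array([(1, 2), (10, 20)])

picked = base[idx]    # each subscript is replaced by the element at that position
print("picked = \n{}".format(picked))
print("picked.shape = {}".format(picked.shape))
```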

So far the target array has been one-dimensional. Now let's reshape base_data into a 10x10 two-dimensional array.

In [None]:
base_data2 = base_data.reshape(10, -1)
print("base_data2 = np.reshape(base_data, (10, -1)) = \n{}\n".format(base_data2))

For a two-dimensional array,

* if we specify only one subscript, the result is still an array (one row).
* if we specify two subscripts, the result is a single element.
* we can also refer to the last position along an axis with "-1".

In [None]:
print("base_data2[2] = \n{}\n".format(base_data2[2]))
print("base_data2[2, 3] = \n{}\n".format(base_data2[2, 3]))
print("base_data2[-1, -1] = \n{}\n".format(base_data2[-1, -1]))

In addition, we can specify a range with ":", such as 2:5. Writing ":" alone selects the full range.

The code below will:

* get all the elements of the row whose subscript is 2
* get all the elements of the column whose subscript is 3
* get all the elements of the rows whose subscripts are in [2,5) and the columns whose subscripts are in [2,4)

Please observe the output carefully:

In [None]:
print("base_data2[2, :] = \n{}\n".format(base_data2[2, :]))
print("base_data2[:, 3] = \n{}\n".format(base_data2[:, 3]))
print("base_data2[2:5, 2:4] = \n{}\n".format(base_data2[2:5, 2:4]))
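
Because the 10x10 array holds the values 100 to 199 row by row, the element at row r and column c is 100 + 10*r + c. The sketch below uses that to check what each slice returns (`grid` is a hypothetical name for the same array).

```python
import numpy as np

grid = np.arange(100, 200).reshape(10, -1)

row = grid[2, :]        # row 2: 120, 121, ..., 129
col = grid[:, 3]        # column 3: 103, 113, ..., 193
block = grid[2:5, 2:4]  # rows 2-4 and columns 2-3

print("row = {}".format(row))
print("col = {}".format(col))
print("block = \n{}".format(block))
```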

## Mathematics<a id="5"></a>
<mark>[Return Contents](#0)
<hr>

There are also a lot of mathematical functions in NumPy. Here are some examples.

In [None]:
base_data = (np.random.random((5, 5)) - 0.5) * 100
print("base_data = \n{}\n".format(base_data))

print("np.amin(base_data) = {}".format(np.amin(base_data)))
print("np.amax(base_data) = {}".format(np.amax(base_data)))
print("np.average(base_data) = {}".format(np.average(base_data)))
print("np.sum(base_data) = {}".format(np.sum(base_data)))
print("np.sin(base_data) = \n{}".format(np.sin(base_data)))
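
Since the data above is random, the results change on every run. As a quick sanity check, here is the same set of reductions on a small fixed array (a hypothetical `data`), where the values are easy to verify by hand.

```python
import numpy as np

data = np.array([[1.0, -2.0],
                 [3.0, 4.0]])

smallest = np.amin(data)    # -2.0
largest = np.amax(data)     # 4.0
mean = np.average(data)     # (1 - 2 + 3 + 4) / 4 = 1.5
total = np.sum(data)        # 6.0

print("min={}, max={}, average={}, sum={}".format(smallest, largest, mean, total))
```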

In [None]:
arr = np.arange(1,20)
arr = arr * arr              # Multiplies each element by itself
print("Multiply: ", arr)
arr = arr - arr              # Subtracts each element from itself
print("Subtract: ", arr)
arr = np.arange(1,20)
arr = arr + arr              # Adds each element to itself
print("Add: ", arr)
arr = arr / arr              # Divides each element by itself
print("Divide: ", arr)
arr = np.arange(1,20)
arr = arr + 50
print("Add +50: ",arr)
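
All of these operators work element by element. A short sketch on a four-element array makes this visible, and also shows that `/` returns floats even for integer inputs.

```python
import numpy as np

arr = np.arange(1, 5)    # [1 2 3 4]

squared = arr * arr      # element-wise product: [ 1  4  9 16]
ratio = arr / arr        # true division gives floats: [1. 1. 1. 1.]
shifted = arr + 50       # the scalar is applied to every element: [51 52 53 54]

print("squared = {}".format(squared))
print("ratio   = {} (dtype {})".format(ratio, ratio.dtype))
print("shifted = {}".format(shifted))
```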

In [None]:
print("Sqrt: ",np.sqrt(arr))   # Returns the square root of each element
print("Exp: ",np.exp(arr))     # Returns the exponential of each element
print("Sin: ",np.sin(arr))     # Returns the sine of each element
print("Cos: ",np.cos(arr))     # Returns the cosine of each element
print("Log: ",np.log(arr))     # Returns the natural logarithm of each element
print("Sum: ",np.sum(arr))     # Returns the sum of all elements in the array
print("Std: ",np.std(arr))     # Returns the standard deviation of the elements in the array

## Matrix<a id="6"></a>
<mark>[Return Contents](#0)
<hr>

Now, let's take a look at how to use NumPy arrays as matrices.

First, let's create a 5x5 matrix of random integer values. There are two ways to get the transpose of a matrix: the **.T** attribute or the **transpose** function. In addition, matrices can be multiplied with the **dot** function. The sample code is as follows:

In [None]:
base_data = np.floor((np.random.random((5, 5)) - 0.5) * 100)
print("base_data = \n{}\n".format(base_data))

print("base_data.T = \n{}\n".format(base_data.T))
print("base_data.transpose() = \n{}\n".format(base_data.transpose()))

matrix_one = np.ones((5, 5))
print("matrix_one = \n{}\n".format(matrix_one))

minus_one = np.dot(matrix_one, -1)
print("minus_one = \n{}\n".format(minus_one))

print("np.dot(base_data, minus_one) = \n{}\n".format(
    np.dot(base_data, minus_one)))
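
It is worth separating three things that look similar: the matrix product (`np.dot`), the element-wise product (`*`), and `np.dot` with a scalar, which simply scales every element. A small sketch on hypothetical 2x2 matrices `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

product = np.dot(A, B)     # matrix product (rows of A times columns of B)
elementwise = A * B        # element-wise product
scaled = np.dot(A, -1)     # with a scalar, dot behaves like A * -1

print("np.dot(A, B) = \n{}".format(product))
print("A * B = \n{}".format(elementwise))
print("np.dot(A, -1) = \n{}".format(scaled))
```

Multiplying by `B` on the right swaps the two columns of `A`, which makes the matrix product easy to tell apart from the element-wise one.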

## Random Number<a id="7"></a>
<mark>[Return Contents](#0)
<hr>

Finally, let's take a look at the use of random numbers.

Random numbers are used very often in programming, for example to generate demo data, or to shuffle an existing data sequence so that it can be split into modeling data and verification data.

The numpy.random package contains a number of algorithms for random numbers. Here are four of the most common usages:

* generating 20 random numbers, each in **[0.0, 1.0)**
* generating an array of random numbers with the specified **shape**
* generating a specified number (such as 20) of random integers within the specified range (such as **[0, 100)**)
* shuffling the sequence of existing data (**[0, 1, 2, ..., 19]**) randomly

The output is as follows:

In [None]:
print("random: {}\n".format(np.random.random(20)))

print("rand: {}\n".format(np.random.rand(3, 4)))

print("randint: {}\n".format(np.random.randint(0, 100, 20)))

print("permutation: {}\n".format(np.random.permutation(np.arange(20))))
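
The shuffle-and-split use case mentioned above can be sketched as follows. The 80/20 ratio, the seed value and the names `train` and `valid` are assumptions made for illustration; fixing the seed only makes the shuffle repeatable between runs.

```python
import numpy as np

np.random.seed(0)                            # fix the seed so the shuffle is repeatable

data = np.arange(20)
order = np.random.permutation(len(data))     # shuffled subscripts 0..19

split = 16                                   # hypothetical 80/20 split point
train = data[order[:split]]                  # modeling data
valid = data[order[split:]]                  # verification data

print("train = {}".format(train))
print("valid = {}".format(valid))
```

Every element ends up in exactly one of the two parts, because `order` contains each subscript exactly once.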

## Conclusion <a id="8"></a>
<mark>[Return Contents](#0)
<hr>
* If you like it, thank you for your upvotes.
* If you have any questions, I will be happy to hear them.

## Reference<a id="9"></a>
<mark>[Return Contents](#0)
<hr>

* https://www.tutorialdocs.com/article/python-numpy-tutorial.html
* https://towardsdatascience.com/lets-talk-about-numpy-for-datascience-beginners-b8088722309f
* http://www.numpy.org