# Table of Contents
[NumPy](#NumPy)
    
* [Introduction](#Introduction)
* [Data Types in Python](#DataTypesInPython)
* [NumPy Standard Data Types](#NumPyStandardDataTypes)
* [NumPy Arrays](#NumPyArrays)
	* [NumPy Array Attributes](#NumPyArrayAttributes)
	* [Indexing](#Indexing) 
	* [Slicing](#Slicing)
* [Array Reshape](#ArrayReshape)
* [Array Concatenation And Splitting](#ArrayConcatenationAndSplitting)
* [Array Math](#ArrayMath)
* [Broadcasting](#Broadcasting)
* [NumPy Documentation](#NumpyDocumentation)


# <a id="NumPy"></a>NumPy

## <a id="Introduction"></a>Introduction

Datasets can come from a wide range of sources and formats, such as 
collections of documents, images, sound clips, and numerical measurements. 
Despite this apparent heterogeneity, it helps to think of all data fundamentally as arrays of numbers.

Digital images can be seen as two-dimensional arrays of numbers. 
Sound clips can be seen as one-dimensional arrays of intensity versus time. 
Text can be converted in various ways into numerical representations, such as frequencies of words or of pairs of words. 
No matter what the data are, the first step in making them analyzable is to transform them into arrays of numbers (this step is part of what is called feature engineering). 

Therefore, efficient storage and manipulation of numerical arrays is fundamental to the process of doing data science.

In Python, the specialized tool for handling numerical arrays is the NumPy package.

NumPy is the fundamental package for scientific computing with Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.
It contains among other things:

 * a powerful N-dimensional array object
 * sophisticated (broadcasting) functions
 * tools for integrating C/C++ and Fortran code
 * useful linear algebra, Fourier transform, and random number capabilities

More details about NumPy can be found at http://www.numpy.org/

## <a id="DataTypesInPython"></a>Data Types In Python

Python is a dynamically typed language. 
Statically typed languages like C or Java require the type of each variable to be explicitly declared, 
while dynamically typed languages like Python skip this specification. 

For example, in C you might specify a particular operation as follows:

```
int counter = 0;
for(int i=0; i<100; i++){
    counter += i;
}
```

In Python, the same operation can be written as follows:

```
counter = 0
for i in range(100):
    counter += i
```

So in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. 
This means, for example, that in Python we can assign any kind of data to any variable:

```
a = 1
a = "one"
```

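If we try something similar in C, the compiler will complain, since a string cannot be stored in an `int` variable (recent compilers treat this as an error, older ones may only issue a warning):

```
int a = 1;
a = "one"; // error: a string literal is not an int
```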

This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use. 
But in order to have this type flexibility, Python objects do not hold only their values; they also hold extra information about the type of the value. 

The standard Python implementation (CPython) is written in C. 
This means that every Python object is a C structure, which contains its value and other necessary information. 

For example, when we define an integer in Python, 
such as **a = 1**,  **a** is actually a pointer to a C structure. 
Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like:


```
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:

 * **ob_refcnt**, a reference count that helps Python handle memory allocation and deallocation
 * **ob_type**, which encodes the type of the object
 * **ob_size**, which specifies the size of the following data members
 * **ob_digit**, which contains the actual integer value

This means that there is some overhead in storing an integer in Python compared to an integer in a compiled language like C.

A C integer is essentially a label for a position in memory whose bytes encode an integer value. 
A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. 
This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically. 
But flexibility comes at a cost, which becomes especially apparent in structures that combine many of these objects.

So, in short, Python data types carry overhead compared to C data types.
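One way to see this overhead is to compare the size of a Python `int` object, as reported by `sys.getsizeof`, with the size of one element of a fixed-type `int64` NumPy array. This is a rough sketch: the exact size of the Python object depends on the Python version and platform, but on a 64-bit CPython build it is several times larger than the 8 bytes of raw data.

```python
import sys

import numpy as np

# Size of one Python int object: object header (reference count,
# type pointer, size field) plus the storage for its digits.
py_int_size = sys.getsizeof(1)

# Size of the raw value of one element in a fixed-type int64 array.
np_int_size = np.dtype(np.int64).itemsize

print("Python int object:  ", py_int_size, "bytes")
print("int64 array element:", np_int_size, "bytes")
```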

Python containers such as lists add even more overhead compared to C.

A Python list can be created as follows:

```
pythonList = list(range(5))
```

A list can hold items of different data types:

```
pythonList = [True, "2", 3.0, 4]
```


Each item in the list must contain its own type info, reference count, and other information. 
In the special case when all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array.
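The redundancy can be estimated with a small sketch: a list stores one pointer per item, and every item is a separate `int` object, while a NumPy array stores the raw values in one contiguous buffer. The estimate below is rough (it ignores allocator overhead and the array's own object header), and the choice of 1000 values starting at 1000 is arbitrary; values above 256 are used so that CPython creates a separate object for each item instead of reusing cached small integers.

```python
import sys

import numpy as np

n = 1000
values = list(range(1000, 1000 + n))
arr = np.array(values, dtype=np.int64)

# List: the pointer array itself plus one int object per item.
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# NumPy array: n raw 8-byte values in a single buffer.
array_data = arr.nbytes

print("list (pointers + int objects):", list_total, "bytes")
print("int64 array data buffer:      ", array_data, "bytes")
```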

NumPy is designed to reduce this overhead in order to provide fast computation.
NumPy's core is implemented in C, and for linear algebra it can use optimized BLAS/LAPACK libraries such as ATLAS (http://math-atlas.sourceforge.net/), OpenBLAS, or Intel MKL.
The library's name is short for **Numeric Python** or **Numerical Python**.

NumPy is built around fixed-type arrays, which are much more efficient for storing and manipulating data.
Python itself also offers the built-in **array** module for creating dense arrays of a uniform type:


```
import array

L = list(range(10))
A = array.array('i', L)  # 'i' is the type code for a signed C int
```

The **array** module provides efficient storage of array-based data, 
while the NumPy library adds efficient operations on that data. 
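The difference shows up as soon as you try arithmetic. In this sketch, `*` on an `array.array` follows sequence semantics (like a list, it repeats the data), whereas on a NumPy array it multiplies every element:

```python
import array

import numpy as np

A = array.array('i', [1, 2, 3])
N = np.array([1, 2, 3], dtype=np.int32)

# array.array: * repeats the sequence, just like a list.
repeated = A * 2
print(repeated)  # array('i', [1, 2, 3, 1, 2, 3])

# NumPy: * is elementwise multiplication.
scaled = N * 2
print(scaled)    # [2 4 6]
```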

In short, NumPy is the core library for scientific computing in Python. 
It contains a collection of tools and techniques for solving mathematical models of problems in science and engineering on a computer. 
One of these tools is a high-performance multidimensional **array** object, a powerful data structure for efficient computation with arrays and matrices. 
To work with these arrays, NumPy provides a large collection of high-level mathematical functions that operate on them.

However, on a structural level, a NumPy **array** is basically nothing but pointers. It is a combination of a memory address, a data type, a shape, and strides:

 * the **data** pointer indicates the memory address of the first byte in the array
 * the **data type** or **dtype** pointer describes the kind of elements that are contained within the array
 * the **shape** indicates the shape of the array
 * the **strides** are the number of bytes that must be skipped in memory to go to the next element along each dimension
 

In other words, a NumPy **array** contains information about the raw data, how to locate an element, and how to interpret an element.

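Looking through the NumPy source code, we can see how the NumPy **array** structure is implemented (this is the classic, simplified definition; newer NumPy versions organize it slightly differently):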

```
typedef struct PyArrayObject {
        PyObject_HEAD

        /* Block of memory */
        char *data;

        /* Data type descriptor */
        PyArray_Descr *descr;

        /* Indexing scheme */
        int nd;
        npy_intp *dimensions;
        npy_intp *strides;

        /* Other stuff */
        PyObject *base;
        int flags;
        PyObject *weakreflist;
} PyArrayObject;
```

Now let's build our first NumPy array:



```python
import numpy as np

# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```

## <a id="NumPyStandardDataTypes"></a>NumPy Standard Data Types


NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. 
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the table below. Note that when constructing an **array**, the data type can be specified using a string:


```python
np.zeros(10, dtype='int16')
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
```

| Data type        | Description                                                                                                  |
|------------------|--------------------------------------------------------------------------------------------------------------|
| **bool_**        | Boolean (True or False) stored as a byte                                                                     |
| **int_**         | Default integer type (same as C long in NumPy 1.x and as intp in NumPy 2.x; normally either int64 or int32)  |
| **intp**         | Integer used for indexing (same as C ssize_t; normally either int32 or int64)                                |
| **int8**         | Byte (-128 to 127)                                                                                           |
| **int16**        | Integer (-32768 to 32767)                                                                                    |
| **int32**        | Integer (-2147483648 to 2147483647)                                                                          |
| **int64**        | Integer (-9223372036854775808 to 9223372036854775807)                                                        |
| **uint8**        | Unsigned integer (0 to 255)                                                                                  |
| **uint16**       | Unsigned integer (0 to 65535)                                                                                |
| **uint32**       | Unsigned integer (0 to 4294967295)                                                                           |
| **uint64**       | Unsigned integer (0 to 18446744073709551615)                                                                 |
| **float_**       | Shorthand for float64 (removed in NumPy 2.0; use float64)                                                    |
| **float16**      | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa                                            |
| **float32**      | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa                                          |
| **float64**      | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa                                         |
| **complex_**     | Shorthand for complex128 (removed in NumPy 2.0; use complex128)                                              |
| **complex64**    | Complex number, represented by two 32-bit floats                                                             |
| **complex128**   | Complex number, represented by two 64-bit floats                                                             |

More advanced type specifications are possible, such as specifying big- or little-endian numbers; for more information, refer to the NumPy documentation. 
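The limits in the table can be queried at runtime with `np.iinfo` and `np.finfo`, and byte order can be spelled out in a type string with `>` (big-endian) or `<` (little-endian). The sketch below also shows a limitation worth knowing: fixed-size integers in array arithmetic wrap around silently on overflow.

```python
import numpy as np

# Integer limits from the table, queried at runtime.
info16 = np.iinfo(np.int16)
print(info16.min, info16.max)          # -32768 32767

# Floating-point properties come from np.finfo.
bits32 = np.finfo(np.float32).bits
print(bits32)                          # 32

# Array arithmetic on fixed-size integers wraps around on overflow.
a = np.array([32767], dtype=np.int16)
wrapped = a + 1
print(wrapped)                         # [-32768]

# Explicit byte order in the type string.
big = np.dtype('>i4')
little = np.dtype('<i4')
print(big.str, little.str)             # >i4 <i4
print(big == little)                   # False: same size, different byte order
```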


## <a id="NumPyArrays"></a>NumPy Arrays

Let's start with NumPy's random number generator, which we will seed with a set value in order to ensure that the same random 
arrays are generated each time this code is executed:

```python
np.random.seed(7)
```

NumPy arrays can be created with many functions. Let's start with a function that fills an array with random integers (here from 0 up to, but not including, 10):

```python
a1 = np.random.randint(10, size=6)  # One-dimensional array
a2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
a3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print(a1)
print(a2)
print(a3)
```

```
[4 9 6 3 3 7]
[[7 9 7 8]
 [9 8 7 6]
 [4 0 7 0]]
[[[7 6 3 5 8]
  [8 7 5 0 0]
  [2 8 9 6 4]
  [9 7 3 3 8]]

 [[3 0 1 0 0]
  [6 7 7 9 3]
  [0 7 7 7 0]
  [5 4 3 1 3]]

 [[1 3 4 3 1]
  [9 5 9 1 2]
  [3 2 2 5 7]
  [3 0 9 9 3]]]
```
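`np.random.seed` and `np.random.randint` belong to NumPy's legacy random API. Since NumPy 1.17 the recommended approach is to create a `Generator` with `np.random.default_rng` and call its `integers` method, as sketched below. It uses a different algorithm, so the same seed does not reproduce the numbers printed above; it is only reproducible with respect to itself.

```python
import numpy as np

# A seeded Generator gives reproducible results.
rng = np.random.default_rng(7)
b2 = rng.integers(10, size=(3, 4))  # integers from 0 up to (excluding) 10

# Re-creating the generator with the same seed reproduces the same array.
again = np.random.default_rng(7).integers(10, size=(3, 4))
print(b2)
print(np.array_equal(b2, again))    # True
```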


Arrays can also be initialized from (nested) Python lists:

```python
a4 = np.array([1, 2, 3])  # One-dimensional array
a5 = np.array([[1, 2, 3], [4, 5, 6]])  # Two-dimensional array
print(a4)
print(a5)
```

```
[1 2 3]
[[1 2 3]
 [4 5 6]]
```


Other possibilities are:

```python
a6 = np.zeros((2, 2))         # Create a 2x2 array of all zeros
a7 = np.ones((1, 2))          # Create a 1x2 array of all ones
a8 = np.full((2, 2), 7)       # Create a constant 2x2 array
a9 = np.eye(2)                # Create a 2x2 identity matrix
a10 = np.linspace(0, 100, 6)  # Create an array of 6 evenly spaced values from 0 to 100 (inclusive)
a11 = np.arange(0, 10, 3)     # Create an array of values from 0 up to (excluding) 10 with step 3, i.e. [0, 3, 6, 9]
a12 = np.full((2, 3), 8)      # Create a 2x3 array with all values 8
```

```python
print(a6)
print(a7)
print(a8)
print(a9)
print(a10)
print(a11)
print(a12)
```

```
[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[  0.  20.  40.  60.  80. 100.]
[0 3 6 9]
[[8 8 8]
 [8 8 8]]
```
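A common point of confusion is the difference between `np.linspace` and `np.arange`: with `linspace` you choose the number of points and the stop value is included by default (unless `endpoint=False`), while with `arange` you choose the step and the stop value is excluded. For non-integer steps, the NumPy documentation recommends `linspace`, because floating-point rounding can make the number of values produced by `arange` hard to predict.

```python
import numpy as np

# linspace: number of points given, stop included by default.
lin = np.linspace(0, 1, 5)                       # 0, 0.25, 0.5, 0.75, 1.0
lin_open = np.linspace(0, 1, 4, endpoint=False)  # 0, 0.25, 0.5, 0.75

# arange: step given, stop excluded.
ar = np.arange(0, 1, 0.25)                       # 0, 0.25, 0.5, 0.75

print(lin)
print(lin_open)
print(ar)
```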


### <a id="NumPyArrayAttributes"></a>NumPy Array Attributes

Each NumPy array has the following attributes:

 * **ndim**, the number of dimensions
 * **shape**, the size of each dimension
 * **size**, the total number of elements in the array
 * **dtype**, the data type of the array
 * **itemsize**, the size in bytes of each array element
 * **nbytes**, the total size in bytes of the array


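```python
print("a3 ndim: ", a3.ndim)
print("a3 shape:", a3.shape)
print("a3 size: ", a3.size)
print("a3 dtype:", a3.dtype)
print("a3 itemsize:", a3.itemsize, "bytes")
print("a3 nbytes:", a3.nbytes, "bytes")
```

```
a3 ndim:  3
a3 shape: (3, 4, 5)
a3 size:  60
a3 dtype: int32
a3 itemsize: 4 bytes
a3 nbytes: 240 bytes
```

The dtype and byte counts depend on the platform's default integer type. This output comes from a system where it is 32 bits (for example Windows with NumPy 1.x); on most 64-bit Linux and macOS systems you will see `int64`, 8 bytes, and 480 bytes instead.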
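The attributes are related to each other: `size` is the product of the entries of `shape`, `ndim` is the length of `shape`, and `nbytes` is `size * itemsize`. A quick check, using an explicit `int64` dtype so the byte counts do not depend on the platform:

```python
import math

import numpy as np

a = np.zeros((3, 4, 5), dtype=np.int64)

print(a.size == math.prod(a.shape))     # True: 3 * 4 * 5 = 60 elements
print(a.itemsize)                       # 8 bytes per int64
print(a.nbytes == a.size * a.itemsize)  # True: 60 * 8 = 480 bytes
print(a.ndim == len(a.shape))           # True
```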


### <a id="Indexing"></a>Indexing

Indexing works like standard Python list indexing.
In a one-dimensional array, the element at position i (counting from zero) can be accessed with square brackets:


```python
print(a1)
print(a1[0])
```

```
[4 9 6 3 3 7]
4
```

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

```python
print(a2)
print(a2[0, 0])
```

```
[[7 9 7 8]
 [9 8 7 6]
 [4 0 7 0]]
7
```

Values can also be modified using any of the above index notations:

```python
a2[0, 0] = 1111
print(a2)
```

```
[[1111    9    7    8]
 [   9    8    7    6]
 [   4    0    7    0]]
```
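As with Python lists, negative indices count from the end, and giving fewer indices than the array has dimensions selects a whole sub-array (for a 2-D array, a row). The values below are hard-coded copies of the arrays printed earlier, so the snippet runs on its own:

```python
import numpy as np

v = np.array([4, 9, 6, 3, 3, 7])
m = np.array([[7, 9, 7, 8],
              [9, 8, 7, 6],
              [4, 0, 7, 0]])

print(v[-1])      # 7: the last element
print(m[-1, -2])  # 7: last row, second-to-last column
print(m[1])       # [9 8 7 6]: the second row
```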



### <a id="Slicing"></a>Slicing

Square brackets can also be used to access subarrays with the slice notation, marked by the colon (:) character. 
The NumPy slicing syntax follows that of standard Python lists; to access a slice of an array x, use:


```
x[start:stop:step]
```

If any of these are unspecified, default values are **start=0**, **stop=size of dimension**, **step=1**. 

One-dimensional arrays:

```python
x = np.arange(10)

print(x)
print(x[:2])   # first two elements
print(x[2:])   # elements from index 2 onward
print(x[4:7])  # middle subarray
print(x[::2])  # every second element
```

```
[0 1 2 3 4 5 6 7 8 9]
[0 1]
[2 3 4 5 6 7 8 9]
[4 5 6]
[0 2 4 6 8]
```
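The step can also be negative. In that case the defaults for start and stop are effectively swapped, which gives a convenient way to reverse an array:

```python
import numpy as np

x = np.arange(10)

rev = x[::-1]      # all elements, reversed
back2 = x[5::-2]   # every second element, going backwards from index 5

print(rev)         # [9 8 7 6 5 4 3 2 1 0]
print(back2)       # [5 3 1]
```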


Multi-dimensional arrays:

```python
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array

print(x)

print(x[:2, :3])  # two rows, three columns
print(x[:, 0])    # first column of array x
print(x[0, :])    # first row of array x
```

```
[[4 5 3 0]
 [4 8 6 7]
 [2 7 3 8]]
[[4 5 3]
 [4 8 6]]
[4 4 2]
[4 5 3 0]
```


Slicing a NumPy array does not return a copy of the data: it returns a *view* on the original array, so modifying the subarray also modifies the original.
Slicing a Python list, in contrast, returns a copy of the list.
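The difference is easy to demonstrate: writing into a NumPy slice changes the original array, while writing into a list slice leaves the list untouched. `np.shares_memory` confirms that the slice and the original use the same buffer.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
sub = x[:2, :2]        # a view, not a copy
sub[0, 0] = 99         # writes through to x
print(x[0, 0])         # 99
print(np.shares_memory(x, sub))  # True

lst = [0, 1, 2, 3]
lst_slice = lst[:2]    # a new list
lst_slice[0] = 99
print(lst[0])          # 0: the original list is unchanged
```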

Sometimes we need an explicit copy of the data within an array or a subarray. This is easily done with the ***copy()*** method:

```python
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array

print(x)
x_copy = x[:2, :2].copy()  # copy the subarray
print(x_copy)

# If we now modify this copy, the original array is not touched:
x_copy[0, 0] = 45

print(x_copy)
print(x)
```

```
[[6 6 5 6]
 [5 7 1 5]
 [4 4 9 9]]
[[6 6]
 [5 7]]
[[45  6]
 [ 5  7]]
[[6 6 5 6]
 [5 7 1 5]
 [4 4 9 9]]
```


## <a id="ArrayReshape"></a>Array Reshape

Arrays can be reshaped with the **reshape** method:

```python
x = np.array([1, 2, 3])

print(x)

x = x.reshape(3, 1)  # reshape to 3 rows and 1 column without changing the data

print(x)
```

```
[1 2 3]
[[1]
 [2]
 [3]]
```
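Two details make `reshape` more convenient and explain its most common error: one dimension may be given as `-1` and NumPy infers it from the total size, and the total number of elements must stay the same. Indexing with `np.newaxis` is an alternative way to turn a one-dimensional array into a column.

```python
import numpy as np

x = np.arange(12)

print(x.reshape(3, -1).shape)  # (3, 4): the -1 is inferred as 12 / 3
print(x[:, np.newaxis].shape)  # (12, 1): a column, like reshape(12, 1)

# 12 elements cannot be arranged as 5 x 3 = 15.
try:
    x.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```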


Another useful operation is transposing a matrix: its rows become columns and vice versa.

To transpose an array, use the **T** attribute of the array object:


```python
x = np.array([[1, 2], [3, 4]])

print(x)
print(x.T)
```

```
[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
```
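Like slicing, `.T` returns a view rather than a copy. A frequent surprise is that transposing a one-dimensional array does nothing, because it has only one axis; to get a column you first have to add a second axis:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.T.shape)                 # (3, 2)
print(np.shares_memory(m, m.T))  # True: the transpose is a view

v = np.array([1, 2, 3])
print(v.T.shape)                 # (3,): unchanged
print(v[:, np.newaxis].shape)    # (3, 1): a column vector
```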


## <a id="ArrayConcatenationAndSplitting"></a>Array Concatenation And Splitting


It is also possible to combine multiple arrays into one and, conversely, to split a single array into multiple arrays:


```python
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
```

```
array([1, 2, 3, 3, 2, 1])
```

Two-dimensional arrays can be combined as well:

```python
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

np.concatenate([grid, grid])
```

```
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
```
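By default `np.concatenate` joins along the first axis (axis 0), which stacks the rows. Passing `axis=1` joins the arrays side by side instead:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

wide = np.concatenate([grid, grid], axis=1)
print(wide)
# [[1 2 3 1 2 3]
#  [4 5 6 4 5 6]]
print(wide.shape)  # (2, 6)
```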

When working with arrays of mixed dimensions, it can be clearer to use the **vstack** (vertical stack) and **hstack** (horizontal stack) functions:


```python
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])
```

```
array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])
```

```python
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
```

```
array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
```

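Similarly, **dstack** stacks arrays along the third axis.

Splitting is implemented by the functions **split**, **hsplit**, and **vsplit**. For **split**, we pass a list of indices giving the split points; N split points produce N + 1 subarrays: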

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
```

```
[1 2 3] [4 5] [6 7 8]
```
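`vsplit` and `hsplit` work the same way on two-dimensional arrays, splitting along rows and columns respectively. Passing an integer instead of a list of indices splits an array into that many equal parts (the array length must be divisible by it). A short sketch on an arbitrary 4x4 grid:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

upper, lower = np.vsplit(grid, [2])  # split the rows before row 2
left, right = np.hsplit(grid, [2])   # split the columns before column 2
print(upper.shape, lower.shape)      # (2, 4) (2, 4)
print(left.shape, right.shape)       # (4, 2) (4, 2)

# An integer means "this many equal parts".
parts = np.split(np.arange(9), 3)    # [0 1 2], [3 4 5], [6 7 8]
print(parts)
```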


## <a id="ArrayMath"></a>Array Math

Basic mathematical functions operate elementwise on arrays, and they are available both as overloaded operators 
and as functions in the NumPy module:


```python
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
```

Elementwise addition:

```python
print(x + y)
print(np.add(x, y))
```

```
[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
```


Elementwise difference:

```python
print(x - y)
print(np.subtract(x, y))
```

```
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
```

Elementwise product:

```python
print(x * y)
print(np.multiply(x, y))
```

```
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
```

Elementwise division:

```python
print(x / y)
print(np.divide(x, y))
```

```
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
```

Elementwise square root:

```python
print(np.sqrt(x))
```

```
[[1.         1.41421356]
 [1.73205081 2.        ]]
```


For matrix multiplication, use **dot**:

```python
print(x.dot(y))
print(np.dot(x, y))
```

```
[[19. 22.]
 [43. 50.]]
[[19. 22.]
 [43. 50.]]
```
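Since Python 3.5, the `@` operator expresses the same matrix product as `np.dot` for two-dimensional arrays. It is easy to confuse with `*`, which is elementwise, so the sketch below puts the two side by side:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

matmul = x @ y        # matrix product, same as x.dot(y)
elementwise = x * y   # elementwise product

print(np.array_equal(matmul, x.dot(y)))  # True
print(elementwise)
# [[ 5. 12.]
#  [21. 32.]]
```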


NumPy provides many useful functions for performing computations on arrays; one of the most useful is **sum**:

```python
print(np.sum(x))          # Compute the sum of all elements
print(np.sum(x, axis=0))  # Compute the sum of each column
print(np.sum(x, axis=1))  # Compute the sum of each row
```

```
10.0
[4. 6.]
[3. 7.]
```
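With a square matrix it is easy to mix up the axes. A non-square example makes the rule visible: `axis=0` collapses the rows, giving one result per column, and `axis=1` collapses the columns, giving one result per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

col_sums = np.sum(m, axis=0)  # shape (3,): one sum per column
row_sums = np.sum(m, axis=1)  # shape (2,): one sum per row
total = m.sum()               # the method form works too

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
print(total)     # 21
```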


We can easily find the minimum and maximum of a NumPy array:

```python
print(np.min(x))
print(np.max(x))
```

```
1.0
4.0
```
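`min` and `max` accept the same `axis` argument as `sum`. When you need the position of an extremum rather than its value, `np.argmax` (or `np.argmin`) returns an index into the flattened array, which `np.unravel_index` converts back to a row and column. The small matrix here is an arbitrary example:

```python
import numpy as np

m = np.array([[3, 7, 1],
              [8, 2, 5]])

col_max = m.max(axis=0)  # maximum of each column
row_min = m.min(axis=1)  # minimum of each row
print(col_max)           # [8 7 5]
print(row_min)           # [1 2]

flat_pos = np.argmax(m)  # index into the flattened array [3 7 1 8 2 5]
row, col = np.unravel_index(flat_pos, m.shape)
print(flat_pos)          # 3
print(int(row), int(col))  # 1 0
```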


## <a id="Broadcasting"></a>Broadcasting


Strictly speaking, elementwise arithmetic is only defined for arrays that have the same number of dimensions and the same size along each dimension.
This would mean, for example, that a one-dimensional array of length 10 could only be combined with another one-dimensional array of length 10.

This restriction would be quite limiting. 
Thankfully, NumPy provides a built-in mechanism that allows arithmetic between arrays of different sizes.
It is called broadcasting.

Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size.

Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:

```python
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b
```

```
array([5, 6, 7])
```

Broadcasting allows these types of binary operations to be performed on arrays of different sizes; for example, we can add a scalar to an array:

```python
a = np.array([0, 1, 2])
a + 5
```

```
array([5, 6, 7])
```

Let's add a one-dimensional array to a two-dimensional array:

```python
a = np.array([0, 1, 2])  # one dimension, shape (3,)
b = np.ones((3, 3))      # two dimensions, shape (3, 3)
b + a                    # a is added to every row of b; the result has shape (3, 3)
```

```
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
```

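Now let's try an even more complicated case, in which both arrays are broadcast:

```python
a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # column vector, shape (3, 1)
print(a)
print(b)
```

```
[0 1 2]
[[0]
 [1]
 [2]]
```

```python
c = a + b
print(c)
```

```
[[0 1 2]
 [1 2 3]
 [2 3 4]]
```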


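So NumPy conceptually stretches the arrays until their shapes match: here `a` (shape `(3,)`) is stretched down the rows and `b` (shape `(3, 1)`) across the columns, giving a result of shape `(3, 3)`. The stretching is only conceptual; no data is actually copied.
Broadcasting follows a small set of rules:

 * If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its left side.
 * If the shapes do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other.
 * If in some dimension the sizes disagree and neither is 1, an error is raised.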
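The rules can be written as a short function that predicts the result shape of a broadcast operation. `broadcast_shape` below is a hypothetical helper written for illustration (it is not part of NumPy); the sketch cross-checks it against the shape NumPy actually produces and shows the error raised for incompatible shapes.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on the left.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: a dimension of size 1 is stretched to match the other.
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            # Rule 3: any other mismatch is an error.
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)


print(broadcast_shape((3,), (3, 1)))  # (3, 3)
print(broadcast_shape((3, 3), (3,)))  # (3, 3)

# Cross-check against what NumPy actually does.
print((np.arange(3) + np.arange(3)[:, np.newaxis]).shape)  # (3, 3)

# (3, 2) and (3,) are incompatible: the trailing sizes 2 and 3 differ.
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError as err:
    print("NumPy refuses:", err)
```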

## <a id="NumpyDocumentation"></a>NumPy Documentation


This brief overview has touched on many of the important things you need to know about NumPy, but it is far from complete.   
See the NumPy [reference documentation](https://docs.scipy.org/doc/numpy/reference/) for all the details.