# NumPy: Numeric Computing Library

NumPy is a core package for numerical computing in Python. Many other libraries, such as pandas, Matplotlib, and statsmodels, rely on NumPy.

NumPy's main contributions are:

> Efficient numeric computation with C primitives

> Efficient collections with vectorized operations

> An integrated and natural Linear Algebra API

> A C API for connecting NumPy with libraries written in C, C++, or Fortran

In Python, everything is an object, so even a simple integer is an object. Because each value is wrapped ("boxed") in the machinery that makes objects work, these are sometimes called "boxed ints". NumPy instead stores values as fixed-size primitive data types, which makes them more efficient.

In [6]:
import numpy as np
import sys

In [7]:
np.int8

numpy.int8

You can use NumPy to specify integer types and their sizes, which can save a lot of space compared to native Python integers.

The same goes for arrays. A Python list stores a reference to a separate object for each item, whereas a NumPy array stores the raw values in one contiguous block. Converting a large list to a NumPy array therefore usually saves a lot of memory.
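The space savings can be checked directly. The following is a minimal sketch, assuming CPython; the variable names (boxed_int_bytes, list_bytes, and so on) are made up for illustration, and the exact byte counts for Python objects vary by version and platform, so no specific output is shown.

```python
import sys
import numpy as np

# A plain Python int is a full object, so it carries per-object overhead.
boxed_int_bytes = sys.getsizeof(1)

# One int8 element inside a NumPy array occupies exactly one byte.
int8_bytes = np.dtype(np.int8).itemsize

# Compare a list of 1,000 ints with the equivalent int64 array.
values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# getsizeof(list) counts only the list's table of references,
# so the int objects it points to are added separately.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the raw data buffer: 1000 elements * 8 bytes each.
array_bytes = arr.nbytes

print("one Python int:", boxed_int_bytes, "bytes; one int8:", int8_bytes, "byte")
print("list total:", list_bytes, "bytes; array data:", array_bytes, "bytes")
```

Note that nbytes reports only the data buffer; the array object itself adds a small fixed overhead on top.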

## NumPy Arrays

Creating a NumPy array is as simple as making a Python list:

In [14]:
array = np.array([ 0, 1, 2, 3, 4])
array

array([0, 1, 2, 3, 4])

**NumPy arrays differ from Python lists in many ways, but an important one is:**

> Every element of an array has the same data type, and NumPy optimizes the array based on that type.

Avoid mixing data types in one NumPy array: NumPy converts the values to a common type rather than keeping each one as it is.

In [15]:
a = np.array([0,1,2,3,4])
a.dtype

dtype('int32')

In [16]:
b = np.array([0.0,1.1,2.2,3.3,4.4])
b.dtype

dtype('float64')

If you wish to specify the data type, you can pass that as an argument when creating the array:

In [23]:
c = np.array([0,1,2,3], dtype=float)
c.dtype

dtype('float64')

In [24]:
d = np.array(['a','b','c'])
d.dtype

dtype('<U1')

Most of the time, NumPy will not be used to store string or character data.
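To see why mixing types is discouraged, it can help to look at what NumPy does with mixed input. This is a small sketch with hypothetical variable names; it relies only on np.array choosing a single common dtype for all elements.

```python
import numpy as np

# Ints mixed with a float: every element is promoted to float64.
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)   # float64

# Numbers mixed with a string: every element becomes a Unicode string,
# so numeric operations on this array no longer make sense.
mixed_text = np.array([1, 2.5, "three"])
print(mixed_text.dtype.kind)  # U
print(mixed_text.tolist())    # ['1', '2.5', 'three']
```

In both cases the original values are silently converted, which is easy to miss until a later calculation behaves unexpectedly.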

## Dimensions and Shapes

In [25]:
e = np.array([
#    0  1  2
    [1, 2, 3],  # 0
    [4, 5, 6]   # 1
])

In [26]:
e.shape

(2, 3)

> The shape above means the array has 2 rows of 3 columns each

> We can also get the number of dimensions of the array, as well as the total number of entries in it

In [27]:
e.ndim

2

In [28]:
e.size

6

When indexing a two-dimensional array, the first index selects the row and the second selects the column. Used together, they target a specific element:

In [29]:
e[1][2]

6
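NumPy also accepts both indexes inside a single pair of brackets, separated by a comma. The sketch below rebuilds the same array and compares the two spellings; the names chained and direct are only for illustration.

```python
import numpy as np

e = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

# Chained indexing: e[1] first produces row 1, then [2] indexes into that row.
chained = e[1][2]

# Tuple indexing: row 1, column 2 in a single indexing step.
direct = e[1, 2]

print(chained, direct)  # 6 6
```

Both forms return the same element here; the comma form is the one that extends naturally to slicing rows and columns together.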

You can also select multiple elements at once using slicing:

In [30]:
e[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [31]:
e[:, :2]

array([[1, 2],
       [4, 5]])

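In a slice start:stop, the start index is included and the stop index is excluded. Leaving out the start begins at the first element, leaving out the stop runs to the end, and a bare : selects everything along that axis.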
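A few more slices on the same array show how the bounds behave. This is a minimal sketch; the variable names are hypothetical and the array is rebuilt so the block runs on its own.

```python
import numpy as np

e = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

# Rows 0 up to, but not including, 1: a 2-D result holding only the first row.
first_row_only = e[0:1]
print(first_row_only)       # [[1 2 3]]

# All rows, columns from index 1 through the end.
last_two_cols = e[:, 1:]
print(last_two_cols)        # [[2 3]
                            #  [5 6]]

# All rows, every second column (a step of 2).
every_other_col = e[:, ::2]
print(every_other_col)      # [[1 3]
                            #  [4 6]]
```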

## Using Statistics with NumPy

NumPy comes with many useful built-in statistics tools to help summarize and verify your data, such as:

In [33]:
e.mean()

3.5

In [36]:
e.std()

1.707825127659933

In [37]:
e.sum()

21

You can pass an axis argument to compute along one axis only. With axis=1, you get one mean per row:


In [38]:
e.mean(axis=1)

array([2., 5.])
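The axis argument is easier to follow with both directions side by side. In this sketch (variable names are illustrative), axis=0 collapses the rows and yields one value per column, while axis=1 collapses the columns and yields one value per row.

```python
import numpy as np

e = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

col_means = e.mean(axis=0)  # one mean per column
row_means = e.mean(axis=1)  # one mean per row
row_sums = e.sum(axis=1)    # the same axis rule applies to sum, std, and others

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
print(row_sums)   # [ 6 15]
```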

## Broadcasting and Vectorized Operations

In [42]:
f = np.array([
#    0  1  2
    [1, 2, 3],  # 0
    [4, 5, 6],  # 1
    [7, 8, 9]   # 2
])

Arithmetic operations apply to **every element of the array** at once:

In [43]:
f + 10

array([[11, 12, 13],
       [14, 15, 16],
       [17, 18, 19]])

However, the original array remains unchanged, so assign the result to a new variable if you want to keep it.

In [45]:
f

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [46]:
g = f + 100
g

array([[101, 102, 103],
       [104, 105, 106],
       [107, 108, 109]])

In [47]:
f

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
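For contrast, augmented assignment such as += does change an array in place. The sketch below works on a copy (the names shifted and f_copy are made up for illustration) so that f itself stays as shown above.

```python
import numpy as np

f = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# f + 10 builds a new array; f is not modified.
shifted = f + 10

# += writes the result back into the array it is applied to.
f_copy = f.copy()
f_copy += 10

print(f[0])        # [1 2 3]
print(shifted[0])  # [11 12 13]
print(f_copy[0])   # [11 12 13]
```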

You can also combine two arrays element by element:

In [50]:
h = f + g
h

array([[102, 104, 106],
       [108, 110, 112],
       [114, 116, 118]])
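Broadcasting also covers arrays of different but compatible shapes. As a rough sketch (row and col are hypothetical names), a one-dimensional array of length 3 is applied to every row of a 3x3 array, and a 3x1 column is applied to every column.

```python
import numpy as np

f = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200], [300]])  # shape (3, 1)

# The row is stretched down the rows: each row of f gets 10, 20, 30 added.
plus_row = f + row
print(plus_row)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

# The column is stretched across the columns: row i of f gets col[i] added.
plus_col = f + col
print(plus_col)
# [[101 102 103]
#  [204 205 206]
#  [307 308 309]]
```

Adding a scalar, as in f + 10, is the simplest case of the same rule: the single value is stretched to the shape of the whole array.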

## Boolean Arrays

*Also called masks*

Boolean arrays can be used to select elements instead of targeting their specific indexes.

In [53]:
boo = np.arange(4)
boo

array([0, 1, 2, 3])

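We can select the first and last elements by index:

In [60]:
boo[0], boo[-1]  # two separate lookups (only the cell's last expression is displayed)
boo[[0,-1]]      # one lookup using a list of indexes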

array([0, 3])

**OR**

In [59]:
boo[[True, False, False, True]]

array([0, 3])

We can also return an array of boolean values based on an expression:

In [61]:
boo >= 2

array([False, False,  True,  True])

Combining these techniques, we can select only the values that satisfy a condition:

In [62]:
boo[boo >= 2]

array([2, 3])

If we combine masks with the statistics methods, we can select values that match criteria we care about, such as everything above the mean:

In [63]:
boo[boo > boo.mean()]

array([2, 3])
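Several conditions can be combined into one mask with the element-wise operators & (and), | (or), and ~ (not). This is a minimal sketch with illustrative variable names; each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

boo = np.arange(4)  # array([0, 1, 2, 3])

# Both conditions must hold.
middle = boo[(boo >= 1) & (boo <= 2)]
print(middle)   # [1 2]

# Either condition may hold.
ends = boo[(boo == 0) | (boo == 3)]
print(ends)     # [0 3]

# Invert a mask.
not_big = boo[~(boo >= 2)]
print(not_big)  # [0 1]
```

Python's own and, or, and not keywords do not work element by element on arrays, which is why the symbolic operators are used here.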