## NumPy Introduction

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Intro to NumPy

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
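As a rough illustration of the second and third points, the sketch below applies one vectorized operation to a whole array and then solves a small linear system. The names `prices`, `M`, `y` and `x` are made up for this example.

```python
import numpy as np

# Vectorized operation: one expression updates every element, no Python loop
prices = np.array([10.0, 20.0, 30.0])
discounted = prices * 0.5

# Linear algebra: solve M @ x = y for x
M = np.array([[2.0, 0.0],
              [0.0, 4.0]])
y = np.array([2.0, 8.0])
x = np.linalg.solve(M, y)

print(discounted)
print(x)
print(M @ x)  # multiplying back should reproduce y
```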

Let's look more closely at efficiency. In Python, **everything is an object**, so even a simple int is a full object carrying the bookkeeping every Python object needs (a reference count, a type pointer, and the value itself). These are often called "boxed ints". In contrast, NumPy stores primitive numeric types (floats, ints) directly in contiguous memory, which makes both storage and computation efficient.

![Python vs Numpy](https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396)

In [2]:
x = 4

In [6]:
import sys
import numpy as np

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Basic NumPy Arrays

In [8]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [9]:
a = np.array([1, 2, 3, 4])

In [24]:
b = np.array([10, 11, 12.5, 13])

In [11]:
a[0], a[1]

(1, 2)

In [13]:
a[1:]

array([2, 3, 4])

In [14]:
a[2:4]

array([3, 4])

In [15]:
a[1:-1]

array([2, 3])

In [16]:
a[::2]  # every second element (step of 2)

array([1, 3])

Multi-indexing: passing a list of indexes selects those elements and returns them as a new array.

In [18]:
c = b[[0, 2, -1]]

In [19]:
c

array([10. , 12.5, 13. ])
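One detail worth knowing here: a slice gives a view onto the original data, while indexing with a list of positions gives an independent copy. The sketch below shows the difference. The names `view` and `picked` are illustrative only.

```python
import numpy as np

b = np.array([10, 11, 12.5, 13])

# A slice is a view: it shares memory with the original array
view = b[1:3]
# Indexing with a list of positions returns a new, independent array
picked = b[[0, 2, -1]]

picked[0] = -1.0   # changes only the copy
view[0] = 99.0     # writes through to b[1]

print(b)        # b[0] is unchanged, b[1] has been overwritten
print(picked)
```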

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## NumPy Array Types

In [21]:
a

array([1, 2, 3, 4])

In [22]:
a.dtype

dtype('int32')

In [25]:
b.dtype

dtype('float64')

In [27]:
d = np.array([1, 2, 3], dtype=np.float64)  # the np.float alias was removed in NumPy 1.24

In [28]:
d.dtype

dtype('float64')
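The cells above suggest how NumPy picks a dtype: all-integer input gives an integer dtype (its width depends on platform and NumPy version), and a single float upcasts the whole array. A small sketch of that behaviour, plus an explicit conversion with `astype`, follows. The variable names are placeholders for this example.

```python
import numpy as np

ints = np.array([1, 2, 3])                         # integer dtype, width is platform dependent
mixed = np.array([1, 2.5, 3])                      # one float upcasts everything to float64
as_float = np.array([1, 2, 3], dtype=np.float64)   # dtype chosen explicitly at creation
truncated = mixed.astype(np.int64)                 # converted copy; fractional parts are dropped

print(ints.dtype.kind)   # 'i' means signed integer
print(mixed.dtype)
print(as_float.dtype)
print(truncated)
```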

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [29]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]    
])

In [30]:
A.shape

(2, 3)

In [31]:
A.size

6
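Alongside `shape` and `size`, the `ndim` attribute and the `reshape` method are often useful. The sketch below reuses the same 2x3 matrix. The name `B` is introduced here only for illustration.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(A.ndim)    # number of dimensions
print(A.shape)   # (rows, columns)
print(A.size)    # total number of elements

# reshape keeps the same 6 elements but arranges them as 3 rows x 2 columns
B = A.reshape(3, 2)
print(B)
```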

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and slicing of Matrices

In [32]:
# Square Matrix

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [33]:
A[1]

array([4, 5, 6])

In [34]:
A[1][0]

4

In [35]:
A[1][1]

5

In [36]:
A[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [43]:
A[:2,:2]

array([[1, 2],
       [4, 5]])

In [44]:
A[:2,2:]

array([[3],
       [6]])

In [47]:
A[:,2]

array([3, 6, 9])
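Two related points that the cells above do not show: `A[1][0]` and `A[1, 0]` select the same element (the comma form does it in one indexing step), and the same index expressions can be used on the left side of an assignment. A minimal sketch:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Chained indexing and comma indexing reach the same element
same = A[1][0] == A[1, 0]

A[1] = [10, 10, 10]   # replace a whole row
A[2] = 99             # a scalar is repeated across the row
A[0, -1] = 0          # set a single element (row 0, last column)

print(same)
print(A)
```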

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary Statistics

In [48]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [49]:
A.sum()

45

In [50]:
A.mean()

5.0

In [51]:
A.sum(axis=0)  # axis=0 sums down the rows, giving one total per column

array([12, 15, 18])

In [52]:
A.sum(axis=1)  # axis=1 sums across the columns, giving one total per row

array([ 6, 15, 24])
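The `axis` argument works the same way for the other reductions. The following sketch applies `mean`, `max` and `std` to the same matrix. The result names are chosen just for this example.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_means = A.mean(axis=0)   # one mean per column
row_means = A.mean(axis=1)   # one mean per row
row_max = A.max(axis=1)      # largest value in each row
overall_std = A.std()        # population standard deviation of all 9 values

print(col_means)
print(row_means)
print(row_max)
print(overall_std)
```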

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized Operations

In [54]:
a = np.arange(4)

In [55]:
a

array([0, 1, 2, 3])

In [56]:
a + 10

array([10, 11, 12, 13])

> Internally, the operation is applied to every element of the array. Combining an array with a scalar works the same way as combining two arrays of the same shape.

In [57]:
a += 100

In [58]:
a

array([100, 101, 102, 103])

In [59]:
l = [0, 1, 2, 3]

In [60]:
[i + 10 for i in l]

[10, 11, 12, 13]

In [61]:
l

[0, 1, 2, 3]

In [66]:
b = np.array([10]*4)

In [67]:
b

array([10, 10, 10, 10])

In [68]:
c = a + b

In [69]:
c

array([110, 111, 112, 113])
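Broadcasting also covers arrays whose shapes differ but are compatible. As a small sketch, adding a column of shape `(3, 1)` to a row of shape `(4,)` stretches both to a `(3, 4)` result. The names `row`, `col` and `grid` are illustrative.

```python
import numpy as np

row = np.arange(4)                    # shape (4,)
col = np.array([[0], [10], [20]])     # shape (3, 1)

# Each value in col is added to every value in row
grid = col + row

print(grid.shape)
print(grid)
```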

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
*(Also called masks)*

In [70]:
a

array([100, 101, 102, 103])

In [71]:
a[[True,True,False,False]]

array([100, 101])

In [73]:
a >= 102

array([False, False,  True,  True])

In [75]:
a[a >= 102]  # a boolean mask keeps only the elements where the condition is True

array([102, 103])

In [78]:
a[~(a > a.mean())]

array([100, 101])

In [81]:
a[(a == 100) | (a < 102)] #OR Condition

array([100, 101])

In [82]:
a[(a == 100) & (a < 102)] #AND Condition

array([100])
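A mask can do more than select. Because `True` counts as 1, summing a mask counts the matches, and a mask on the left side of an assignment updates only the matching elements. A short sketch, with `mask`, `count` and `cleared` as example names:

```python
import numpy as np

a = np.array([100, 101, 102, 103])

mask = a >= 102
count = int(mask.sum())   # True counts as 1, so this counts the matches

cleared = a.copy()        # work on a copy so a itself is left alone
cleared[mask] = 0         # overwrite only the elements selected by the mask

print(count)
print(cleared)
```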

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

#### Int, Floats

In [83]:
# An integer in Python is >24 bytes
sys.getsizeof(1)

28

In [85]:
# Large integers take even more space
sys.getsizeof(100**1000)

912

In [88]:
# NumPy element sizes are much smaller
np.dtype(np.int8).itemsize

1

In [87]:
np.dtype(float).itemsize

8

#### Lists are even larger

In [106]:
# Traditional Python List
List_Traditional = list(range(100000))
#List_Traditional

In [110]:
# NumPy array (not a Python list)
List_Numpy = np.arange(100000)
#List_Numpy
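To put rough numbers on the size difference, the sketch below estimates the memory used by each container. The list figure is only an approximation: it adds the list's pointer storage to the size of each int object, and CPython shares some small ints between objects. The array dtype is fixed to `int64` here as an assumption, so the result does not depend on the platform.

```python
import sys
import numpy as np

n = 100000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list holds pointers; each int is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)
# The array holds raw 8-byte values in one contiguous buffer
array_bytes = np_array.nbytes

print(list_bytes)
print(array_bytes)
```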

Let's run the same operation on both the list and the array:

In [112]:
%time sum(x ** 2 for x in List_Traditional)

Wall time: 90 ms


333328333350000

In [111]:
%time np.sum(List_Numpy ** 2)

Wall time: 1 ms


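216474736

> Note: this total does not match the list result (333328333350000). The array here holds 32-bit integers (see the `int32` dtype earlier), so squaring the larger values overflows and wraps around. Creating the array with `np.arange(100000, dtype=np.int64)` avoids the overflow.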
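The following sketch reproduces both totals by setting the dtypes explicitly, so it should behave the same on any platform. The names `values32`, `values64` and `expected` are introduced for this example only.

```python
import numpy as np

values32 = np.arange(100000, dtype=np.int32)
values64 = np.arange(100000, dtype=np.int64)

# Python ints have arbitrary precision, so this is the reference value
expected = sum(x ** 2 for x in range(100000))

# 64-bit integers are wide enough for every square and for the total
total64 = int(np.sum(values64 ** 2))

# 32-bit squares wrap around silently; accumulating in int32 wraps again
total32 = int(np.sum(values32 ** 2, dtype=np.int32))

print(expected)
print(total64)
print(total32)
```

When the inputs may be large, choosing a wide enough dtype up front is usually cheaper than tracking down a silently wrong result later.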