# Getting started with NumPy

In [80]:
import numpy as np

NumPy stores and processes numerical data much more efficiently than built-in Python lists. It offers faster element access and computation, along with many quality-of-life features.
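As a rough, machine-dependent illustration of the speed difference, the sketch below times summing the same 100,000 integers stored as a Python list and as a NumPy array. The names `values`, `arr`, `list_time`, and `array_time` are illustrative, and the timings will differ from one machine to the next.

```python
import timeit

import numpy as np

# The same 100,000 integers stored two ways.
values = list(range(100_000))
arr = np.arange(100_000, dtype=np.int64)

# Time 20 repetitions of each sum; absolute numbers depend on the machine.
list_time = timeit.timeit(lambda: sum(values), number=20)
array_time = timeit.timeit(lambda: arr.sum(), number=20)

print(f"list sum:  {list_time:.4f} s")
print(f"array sum: {array_time:.4f} s")

# Both approaches compute the same total.
list_total = sum(values)
array_total = int(arr.sum())
```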

In [81]:
data = np.array([1,2,3.5,'h',])

In [82]:
print(data)

['1' '2' '3.5' 'h']


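NumPy accepts a list of mixed types, but an array holds a single element type: here every item was converted to a string, which is why the output shows quotes.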
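A minimal sketch of how NumPy handles mixed inputs: the elements are converted to one common type unless `dtype=object` is requested. The variable names are illustrative.

```python
import numpy as np

# Mixed numbers and strings are all converted to strings.
mixed = np.array([1, 2, 3.5, 'h'])
print(mixed.dtype.kind)   # 'U' means a Unicode string dtype

# Mixed ints and floats are converted to floats.
numbers = np.array([1, 2, 3.5])
print(numbers.dtype)      # float64

# dtype=object keeps the original Python objects, at the cost of speed.
objects = np.array([1, 2, 3.5, 'h'], dtype=object)
print(type(objects[0]), type(objects[3]))
```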

In [83]:
print(data[1])

2


In [84]:
print(data[[1, 2]])

['2' '3.5']


In [88]:
data.dtype

dtype('<U32')

dtype gives us the type of the array's elements. Here <U32 means Unicode strings of up to 32 characters.
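The element type can also be chosen explicitly or converted afterwards. The following is a small sketch using the `dtype` argument and the `astype` method; the array names are made up for illustration.

```python
import numpy as np

# Request a specific element type when creating the array.
floats = np.array([1, 2, 3], dtype=np.float64)
print(floats.dtype)        # float64

# astype returns a converted copy; the original array is left unchanged.
ints = floats.astype(np.int64)
print(ints.dtype)          # int64

# Numeric strings can be converted back to numbers the same way.
text = np.array(['1', '2', '3.5'])
parsed = text.astype(np.float64)
print(parsed)
```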

In [89]:
A = np.array([
    # 0 1 2 3 4
    [1,2,3,4,5],  #0
    [5,6,7,8,5],  #1
    [1,4,2,2,5]]) #2

In [87]:
A.size

15

2D arrays are intuitive. The .size attribute gives the total number of items, and .shape gives the number of rows and columns.

In [90]:
A.shape

(3, 5)

In [91]:
A[1,0]

5

We can use coordinate notation, [row, column], to access specific items in 2D arrays.
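The same notation extends to whole rows, columns, and sub-blocks through slicing. A short sketch on the array defined earlier (the result names are illustrative):

```python
import numpy as np

A = np.array([
    [1, 2, 3, 4, 5],
    [5, 6, 7, 8, 5],
    [1, 4, 2, 2, 5]])

element = A[1, 0]      # row 1, column 0
row = A[1]             # the whole of row 1
column = A[:, 0]       # column 0 taken from every row
block = A[0:2, 1:3]    # rows 0 and 1, columns 1 and 2

print(element)
print(row)
print(column)
print(block)
```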

## Summary statistics

In [92]:
A.sum()

60

In [93]:
A.mean()

4.0

In [94]:
A.std()

2.065591117977289

These are standard, self-explanatory operations for datasets.

In [95]:
A.sum(axis=0)

array([ 7, 12, 12, 14, 15])

In [96]:
A.sum(axis=1)

array([15, 31, 14])

The axis argument picks the direction of the sum: axis=0 adds down the rows to give one total per column, and axis=1 adds across the columns to give one total per row.
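One way to keep the two axes straight is to look at the shape of the result: the axis that is summed over disappears. A minimal sketch on the same 3x5 array, with illustrative names:

```python
import numpy as np

A = np.array([
    [1, 2, 3, 4, 5],
    [5, 6, 7, 8, 5],
    [1, 4, 2, 2, 5]])

col_sums = A.sum(axis=0)   # collapses the 3 rows: one total per column
row_sums = A.sum(axis=1)   # collapses the 5 columns: one total per row

print(col_sums, col_sums.shape)   # [ 7 12 12 14 15] (5,)
print(row_sums, row_sums.shape)   # [15 31 14] (3,)

# keepdims=True keeps the summed axis with length 1.
row_sums_2d = A.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)          # (3, 1)
```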

# Vectorized operations

In [98]:
s = np.arange(4)

In [99]:
s

array([0, 1, 2, 3])

In [100]:
s + 10

array([10, 11, 12, 13])

Adds 10 to each item

In [101]:
s * 10

array([ 0, 10, 20, 30])

Multiplies each item by 10

These operations create a new array with the modifications; they do not modify the original array. See below.

In [103]:
s

array([0, 1, 2, 3])

In [107]:
s += 10

In [108]:
s

array([30, 31, 32, 33])

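The += operator changes the original array in place. The values are 30 to 33 rather than 10 to 13, which suggests this cell was run three times before the output was captured.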
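The difference between creating a new array and updating in place matters most when two names point to the same array. A small sketch, where `alias`, `result`, and `independent` are illustrative names:

```python
import numpy as np

s = np.arange(4)
alias = s                # a second name for the same array, not a copy

result = s + 10          # builds a new array; s is untouched
print(s)                 # [0 1 2 3]

s += 10                  # updates the existing array in place
print(alias)             # [10 11 12 13], the alias sees the change

independent = s.copy()   # an explicit copy is unaffected by later changes
s += 1
print(independent)       # [10 11 12 13]
```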

In [109]:
t = np.arange(4)

In [110]:
t += 50

In [111]:
t

array([50, 51, 52, 53])

In [112]:
s + t

array([80, 82, 84, 86])

In [113]:
s * t

array([1500, 1581, 1664, 1749])

Adding and multiplying two arrays also works, element by element.
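Arithmetic between two arrays pairs up the items by position, so the shapes have to be compatible. A minimal sketch starting from fresh arrays (so the numbers differ from the outputs above); `mismatch_raised` is an illustrative flag:

```python
import numpy as np

s = np.arange(4)         # [0 1 2 3]
t = np.arange(4) + 50    # [50 51 52 53]

total = s + t            # [50 52 54 56]
product = s * t          # [0 51 104 159]
print(total)
print(product)

# Arrays of length 4 and length 3 cannot be paired up item by item.
try:
    s + np.arange(3)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```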

In [114]:
t

array([50, 51, 52, 53])

In [115]:
s

array([30, 31, 32, 33])

# Boolean arrays

In [118]:
a = np.arange(5)

In [119]:
a

array([0, 1, 2, 3, 4])

In [121]:
a[[True, False, True, False, True]]

array([0, 2, 4])

In [122]:
a >= 5

array([False, False, False, False, False])

In [123]:
a >= 1

array([False,  True,  True,  True,  True])

Comparison operators return arrays of True/False values!

In [124]:
a[a>=1]

array([1, 2, 3, 4])

Indexing with a boolean array keeps only the items where it is True, which gives many filtering opportunities.

In [130]:
a[a>a.mean()]

array([3, 4])

In [131]:
a[~(a>a.mean())]

array([0, 1, 2])

The tilde (~) is the NOT operator: it inverts each True/False value.

In [132]:
a[(a == 5) | (a == 2)]

array([2])

In [137]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])
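Conditions can be combined with & (and), | (or), and ~ (not). The sketch below repeats the idea with illustrative names and also stores a mask in a variable so it can be reused.

```python
import numpy as np

a = np.arange(5)

# Parentheses are needed because & and | bind more tightly than comparisons.
small_and_even = a[(a <= 2) & (a % 2 == 0)]   # [0 2]
two_or_four = a[(a == 2) | (a == 4)]          # [2 4]
not_above_mean = a[~(a > a.mean())]           # [0 1 2]

# A mask can be stored, reused, and counted (True counts as 1).
mask = a >= 1
count = int(mask.sum())                       # 4

print(small_and_even, two_or_four, not_above_mean, count)
```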

In [138]:
A = np.random.randint(100, size=(3,3))

In [139]:
A

array([[60, 19, 26],
       [13, 11, 35],
       [65, 75, 44]])

In [140]:
A > 30

array([[ True, False, False],
       [False, False,  True],
       [ True,  True,  True]])

In [141]:
A[A>30]

array([60, 35, 65, 75, 44])
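Note that filtering a 2D array with a mask returns a flat 1D array of the matching items. A mask can also be used on the left side of an assignment. The sketch below uses fixed values copied from the random matrix above so that it is reproducible; `selected` and `capped` are illustrative names.

```python
import numpy as np

# Fixed values instead of random ones, so the results are repeatable.
A = np.array([[60, 19, 26],
              [13, 11, 35],
              [65, 75, 44]])

selected = A[A > 30]        # 1D result: [60 35 65 75 44]
print(selected.shape)       # (5,)

# Assigning through a mask changes only the matching items.
capped = A.copy()
capped[capped > 30] = 30
print(capped)
```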

# Linear algebra

In [142]:
A = np.random.randint(100, size=(3,3))

In [143]:
B = np.random.randint(100, size=(3,3))

In [144]:
A.dot(B)

array([[ 2974,  4465,  6106],
       [10697, 15373, 20004],
       [ 5900,  8428, 10836]])

In [145]:
A @ B

array([[ 2974,  4465,  6106],
       [10697, 15373, 20004],
       [ 5900,  8428, 10836]])

In [146]:
B.T

array([[17, 73, 29],
       [47, 84, 40],
       [77, 86, 63]])

In [147]:
A

array([[31, 22, 29],
       [95, 97, 69],
       [52, 56, 32]])

In [148]:
B.T @ A

array([[ 8970,  9079,  6458],
       [11517, 11422,  8439],
       [13833, 13564, 10183]])
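To make the difference between the operators explicit, the sketch below rebuilds A and B from the values printed above (B is read off from the B.T output). It checks that .dot and @ agree, that * is element-wise rather than a matrix product, and that transposing a product reverses the order of the factors.

```python
import numpy as np

A = np.array([[31, 22, 29],
              [95, 97, 69],
              [52, 56, 32]])
B = np.array([[17, 47, 77],
              [73, 84, 86],
              [29, 40, 63]])

matrix_product = A @ B      # same result as A.dot(B)
elementwise = A * B         # multiplies matching positions only

print(matrix_product[0, 0])   # 2974 = 31*17 + 22*73 + 29*29
print(elementwise[0, 0])      # 527 = 31*17

# Transposing a product reverses the order: (A @ B).T equals B.T @ A.T.
same_transpose = np.array_equal((A @ B).T, B.T @ A.T)
print(same_transpose)         # True
```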

## Note: Python uses far more memory to store simple numbers, because each one is a full object. NumPy stores fixed-size values in a single block, which is much more memory efficient.
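A rough way to see the memory difference is to compare byte counts for the same 1,000 integers. This is only a sketch: sys.getsizeof reports CPython-specific sizes, and the names `list_bytes` and `array_bytes` are illustrative.

```python
import sys

import numpy as np

n = 1000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores references to separate int objects, so count both parts.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# nbytes counts the raw element buffer: 1000 elements * 8 bytes each.
array_bytes = arr.nbytes

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```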