![data-x](http://oi64.tinypic.com/o858n4.jpg)

---
# Numpy Data X - BKHW 

**Author:** Kunal Desai and Ikhlaq Sidhu 1/22/2017, modified June 2017

**License Agreement:** Feel free to do whatever you want with this code

___

# Introduction to NumPy

# What is NumPy:  

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities


# NumPy contains an array object that is "fast"


<img src="https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/threefundamental.png">


It stores:
* location of a memory block (allocated all at one time)
* a shape (3 x 3 or 1 x 9, etc)
* data type / size of each element

The core feature that NumPy supports is its multi-dimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank.
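As a small illustration of the three pieces of information listed above (memory block, shape, element type), the sketch below inspects them on one array. The name `grid` and the explicit `int64` dtype are assumptions made only for this example.

```python
import numpy as np

# A hypothetical 3 x 3 array used only for illustration
grid = np.arange(9, dtype=np.int64).reshape(3, 3)

print(grid.ndim)      # number of axes (the rank)
print(grid.shape)     # length of each axis
print(grid.dtype)     # element type
print(grid.itemsize)  # bytes per element
print(grid.nbytes)    # total bytes = number of elements * itemsize
```

Because every element has the same size, NumPy can locate any element from the shape and the item size alone, which is part of why array operations are fast.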

In [1]:
# written for Python 3.6
import numpy as np


## Creating a NumPy Array:
### 1. Simplest possible: We use a list as an argument input in making a NumPy Array


In [18]:

list1 = [1, 2, 3, 4]
data = np.array(list1)
data

array([1, 2, 3, 4])

In [19]:
# it could be much longer
list2 = range(10000)
data = np.array(list2)
data

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [20]:
# data = np.array(1,2,3,4, 5,6,7,8,9) # wrong
data = np.array([1,2,3,4,5,6,7,8,9]) # right
data

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
#accessing elements - similar to slicing Python lists:
print(data[:])
print (data[0:3])
print (data[3:])
print (data[::-2])

[1 2 3 4 5 6 7 8 9]
[1 2 3]
[4 5 6 7 8 9]
[9 7 5 3 1]


## Arrays are like lists, but different

In [27]:
# Arrays are faster and more efficient

x = list(range(10000))
%timeit y = [i**2 for i in x]
y = [i**2 for i in x]
print (y[0:5])


100 loops, best of 3: 3.11 ms per loop
[0, 1, 4, 9, 16]


In [28]:
z = np.array(x)
%timeit y = z**2
y = z**2
print (y[0:5])

The slowest run took 8.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.25 µs per loop
[ 0  1  4  9 16]
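The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be made in a plain Python script with the standard `timeit` module. The variable names here are illustrative, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

values = list(range(10000))      # plain Python list
values_np = np.array(values)     # same data as a NumPy array

# Time 100 repetitions of each approach (numbers vary by machine)
t_list = timeit.timeit(lambda: [i**2 for i in values], number=100)
t_numpy = timeit.timeit(lambda: values_np**2, number=100)
print("list comprehension:", t_list)
print("numpy:", t_numpy)

# Both approaches compute the same squares
squares_list = [i**2 for i in values]
squares_np = values_np**2
print(squares_list[:5], squares_np[:5])
```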


In [8]:
# Arrays are different than lists in another way:
# x and y are lists
x = list(range(5))
y = list(range(5,10))
print ("x = ", x)
print ("y = ", y)
print ("x+y = ", x+y)

x =  [0, 1, 2, 3, 4]
y =  [5, 6, 7, 8, 9]
x+y =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [37]:
# now let's try with NumPy arrays:
xn = np.array(x)
yn = np.array(y)
print (xn)
print (yn)
print ("xn + yn = ", xn + yn)

[0 1 2 3 4]
[5 6 7 8 9]
xn + yn =  [ 5  7  9 11 13]


In [39]:
# if you need to join two NumPy arrays, try hstack, vstack, column_stack, or concatenate
print (np.hstack((xn,yn)))
print (np.concatenate((xn,yn)))
print (np.vstack((xn,yn)))
print (np.column_stack((xn,yn)))

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0 5]
 [1 6]
 [2 7]
 [3 8]
 [4 9]]


In [11]:
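# An array is a sequence that can be manipulated easily
# An arithmetic operation is applied to each element individually
# When two arrays are added, they must have compatible shapes (here, the same size);
# corresponding elements are added in the result
# Multiplying a list by 3 repeats it, while multiplying an array by 3 scales each element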

print (3* x)
print (3 * xn)


[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
[ 0  3  6  9 12]
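To make the element-by-element rule more concrete, here is a minimal sketch of what happens with matching and mismatching sizes. The names `row`, `col`, and `table` are made up for illustration. The last step uses broadcasting, which the introduction lists as a NumPy feature.

```python
import numpy as np

row = np.arange(3)          # shape (3,)
col = np.arange(4)          # shape (4,)

print(row * 10)             # the scalar is applied to every element
print(row + row)            # same shape: element-by-element addition

# Shapes (3,) and (4,) are incompatible, so NumPy raises ValueError
try:
    row + col
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(mismatch_error)

# A (4, 1) column and a (3,) row broadcast to a (4, 3) grid
table = col.reshape(4, 1) + row
print(table.shape)
```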


In [40]:
# all elements must be the same type for performing mathematical operations
data = np.array([1,2,'cat', 4])
print (data+1)  # results in error
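One way to see why the cell above fails is to inspect what NumPy stored. In this sketch (variable names are illustrative), the mixed list is converted to an array of strings, so the numbers are no longer numeric.

```python
import numpy as np

mixed = np.array([1, 2, 'cat', 4])
print(mixed.dtype.kind)   # 'U' means a Unicode string dtype
print(mixed[0] == '1')    # the integer 1 was stored as the string '1'

numbers = np.array([1, 2, 3, 4])
print(numbers + 1)        # arithmetic works once every element is numeric
```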

### Creating arrays with 2 axes:


In [41]:
# This list has two dimensions
list3 = [[1, 2, 3],
 [4, 5, 6]]

In [42]:
# data = np.array([[1, 2, 3], [4, 5, 6]])
data = np.array(list3)
print (data)

[[1 2 3]
 [4 5 6]]


In [73]:
# If you want to know the shape, use 'shape'
print (data.shape)
print (len(data.shape))
print (data.shape[1])

(2, 3)
2
3


In [43]:
# You can also transpose an array (matrix)
print ('Transpose: \n', data.T, '\n')
print ('Transpose: \n', np.transpose(data))

# print (list3.T) # note, this would not work

Transpose: 
 [[1 4]
 [2 5]
 [3 6]] 

Transpose: 
 [[1 4]
 [2 5]
 [3 6]]


### Remember that np.array takes its data as a single sequence argument, such as a Python list. To create arrays of consecutive numbers without building a list first, use np.arange.

In [16]:
# np.arange(end) creates an array of consecutive integers from 0 up to, but not including, end
# Note that you don't have to make a list first

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
#Array increasing from start to end: np.arange(start, end)
np.arange(10, 20)

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [18]:
#Array increasing from start to end by step: np.arange(start, end, step)
# The range always includes start but excludes end
np.arange(1, 10, 2)

array([1, 3, 5, 7, 9])

In [56]:
#Returns a new array of specified size, filled with zeros.
array=np.zeros((2,5))
array

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [57]:
#Returns a new array of specified size, filled with ones.
array=np.ones((2,5))
array

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

## Some useful NumPy array methods:

### Indexing: There are two types of advanced indexing − Integer and Boolean.

In [78]:


x = np.array([[1, 2], [3, 4], [5, 6]]) 
print(x)
print ()
# the first index is the row, the second index is the column
print(x[1,0])




[[1 2]
 [3 4]
 [5 6]]

3


In [79]:
# the first list contains row indices, the second list contains column indices
idx = x[[0,1,2], [0,1,1]] 
print (idx)


[1 4 6]


In [80]:

print('Boolean indexing')
print(x[x>0])

Boolean indexing
[1 2 3 4 5 6]


In [52]:
# Reshape is used to change the shape
a = np.arange(0, 15)
a = a.reshape(3, 5)
# a = np.arange(0, 15).reshape(3, 5)  # same thing
print (a)


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [21]:
# ndim tells us the number of dimensions of the array
a.ndim

2

In [22]:
# dtype.name tells us the type of each element in the array
print (a.dtype.name)

int64


In [23]:
# And for total size:
a.size

15

In [24]:
# Setting the data type
# By default the dtype is inferred from the data; pass dtype to override it
d1 = np.array([1,2,3,4,5,6,7,8])
print (d1.dtype, d1)

d2 = np.array([1,2.0,3,4,5,6,7,8])
print (d2.dtype, d2)

d3 = np.array([1,2.0,3,4,5,6,7,8], dtype = np.uint)
print (d3.dtype, d3)

# dtype can be complex, float, int (int64 on most 64-bit platforms), uint, etc.

int64 [1 2 3 4 5 6 7 8]
float64 [ 1.  2.  3.  4.  5.  6.  7.  8.]
uint64 [1 2 3 4 5 6 7 8]
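Besides setting the dtype at creation time, an existing array can be converted with `astype`. The following is a small sketch with made-up variable names. Note that converting floats to integers truncates rather than rounds.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)      # new array with a different dtype
print(floats.dtype, floats)

# Converting floats to integers truncates toward zero (no rounding)
truncated = np.array([1.7, 2.2, -0.9]).astype(np.int64)
print(truncated)

# dtype can also be given as a string
small = np.array([1, 2, 3], dtype='int8')
print(small.dtype, small.itemsize)
```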


In [25]:
# sum, min, max, .. are easy
print (a)
print (a.sum())
print ((0+14)*15/2)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
105
105.0


In [26]:
print (a.sum(axis=0))
print (a.sum(axis=1))

[15 18 21 24 27]
[10 35 60]
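A way to read the `axis` argument is that it names the axis that gets collapsed. The sketch below rebuilds the same 3 x 5 array and checks the resulting shapes. The `keepdims` option and the normalization step are additions for illustration.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

col_sums = a.sum(axis=0)                    # collapses the 3 rows -> shape (5,)
row_sums = a.sum(axis=1)                    # collapses the 5 columns -> shape (3,)
row_sums_2d = a.sum(axis=1, keepdims=True)  # keeps the axis -> shape (3, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
print(a / row_sums_2d)   # each row divided by its own total
```

Keeping the summed axis with `keepdims=True` makes the result line up with the original array, so each row can be divided by its own sum directly.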


## Array Axes
<img src= "https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/anatomyarray.png">



To get the cumulative product:

In [27]:
print (np.arange(1, 10))
print (np.cumprod(np.arange(1, 10)))

[1 2 3 4 5 6 7 8 9]
[     1      2      6     24    120    720   5040  40320 362880]


To get the cumulative sum:

In [28]:
print (np.arange(1, 10))
np.cumsum((np.arange(1, 10)))

[1 2 3 4 5 6 7 8 9]


array([ 1,  3,  6, 10, 15, 21, 28, 36, 45])

In [29]:
print (a[1,:])
print (np.cumsum(a[1,:]))

[5 6 7 8 9]
[ 5 11 18 26 35]


You can also compare arrays

In [30]:
#mask
# Which elements of this array are greater than 3?
data1 = np.array(range(10))
print (data1)
mask1 = (data1 > 3)
print (mask1)

[0 1 2 3 4 5 6 7 8 9]
[False False False False  True  True  True  True  True  True]


In [31]:
# use the mask to get elements:
print (data1[mask1])

[4 5 6 7 8 9]


In [32]:
# again:
mask2 = data1 == 0
print (mask2)
print (data1[mask2])

[ True False False False False False False False False False]
[0]


In [33]:
# or directly in one step:
print (np.array(range(10))> 5)
print (np.array(range(10))[np.array(range(10)) > 5])

[False False False False False False  True  True  True  True]
[6 7 8 9]
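Masks can also be combined and used for assignment. This is a minimal sketch, with the names `middle`, `edges`, and `clipped` chosen only for illustration.

```python
import numpy as np

data1 = np.arange(10)

# Combine conditions with & (and) or | (or); the parentheses are required
middle = data1[(data1 > 2) & (data1 < 7)]
edges = data1[(data1 < 2) | (data1 > 7)]
print(middle)
print(edges)

# A mask can also select the elements to overwrite
clipped = data1.copy()
clipped[clipped > 5] = 5
print(clipped)
```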


In [34]:
# Which elements are equal to 1, and are all elements equal to 1?
print (np.array([1, 1, 0, 1]) == 1)
print (np.all(np.array([1, 1, 1, 1]) == 1))   

[ True  True False  True]
True


Creating a 3D array:

In [35]:
a = np.arange(0, 96).reshape(2, 6, 8)
print(a)

[[[ 0  1  2  3  4  5  6  7]
  [ 8  9 10 11 12 13 14 15]
  [16 17 18 19 20 21 22 23]
  [24 25 26 27 28 29 30 31]
  [32 33 34 35 36 37 38 39]
  [40 41 42 43 44 45 46 47]]

 [[48 49 50 51 52 53 54 55]
  [56 57 58 59 60 61 62 63]
  [64 65 66 67 68 69 70 71]
  [72 73 74 75 76 77 78 79]
  [80 81 82 83 84 85 86 87]
  [88 89 90 91 92 93 94 95]]]


In [36]:
# The same methods typically apply in multiple dimensions
print (a.sum(axis = 0))
print ('---')
print (a.sum(axis = 1))

[[ 48  50  52  54  56  58  60  62]
 [ 64  66  68  70  72  74  76  78]
 [ 80  82  84  86  88  90  92  94]
 [ 96  98 100 102 104 106 108 110]
 [112 114 116 118 120 122 124 126]
 [128 130 132 134 136 138 140 142]]
---
[[120 126 132 138 144 150 156 162]
 [408 414 420 426 432 438 444 450]]


# Basic Operations

One of the coolest parts of NumPy is the ability for you to run operations on top of arrays. Here are some basic operations:

In [37]:
a = np.arange(11, 21)
b = np.arange(0, 10)
print ("a = ",a)
print ("b = ",b)
print (a + b)

a =  [11 12 13 14 15 16 17 18 19 20]
b =  [0 1 2 3 4 5 6 7 8 9]
[11 13 15 17 19 21 23 25 27 29]


In [38]:
a * b

array([  0,  12,  26,  42,  60,  80, 102, 126, 152, 180])

In [39]:
a ** 2

array([121, 144, 169, 196, 225, 256, 289, 324, 361, 400])

You can even do things like matrix operations

In [40]:
a.dot(b)

780

In [41]:
# Matrix multiplication
c = np.arange(1,5).reshape(2,2)
print ("c = ", c)
d = np.arange(5,9).reshape(2,2)
print ("d = ", d)

c =  [[1 2]
 [3 4]]
d =  [[5 6]
 [7 8]]


In [42]:
print (d.dot(c))

[[23 34]
 [31 46]]
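In Python 3.5 and later, the `@` operator is an alternative spelling of the matrix product. The short sketch below, using the same `c` and `d`, also contrasts it with `*`, which multiplies element by element.

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
d = np.arange(5, 9).reshape(2, 2)

print(d @ c)   # matrix product, same result as d.dot(c)
print(d * c)   # element-by-element product, a different operation
```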


In [43]:
# Other ways to create an array:
print (np.zeros(5))
print (np.ones(8).reshape(2,4))

[ 0.  0.  0.  0.  0.]
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]


In [44]:
# Random numbers
rng = np.random.RandomState(0)  # the seed is zero
print(rng.uniform(1,5,10))   # 10 random uniform numbers from 1 to 5
print (rng.exponential(1,5)) # 5 random exp numbers with rate 1

[ 3.19525402  3.86075747  3.4110535   3.17953273  2.6946192   3.58357645
  2.75034885  4.567092    4.85465104  2.53376608]
[ 1.56889614  0.75267411  0.83943285  2.59825415  0.07368535]
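The point of fixing the seed is reproducibility: two generators created with the same seed produce the same numbers. A small sketch, assuming the same `RandomState` interface used above (the names `rng_a` and `rng_b` are illustrative):

```python
import numpy as np

rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)

draws_a = rng_a.uniform(1, 5, 10)
draws_b = rng_b.uniform(1, 5, 10)

print(np.array_equal(draws_a, draws_b))       # same seed, same numbers
print(draws_a.min() >= 1, draws_a.max() < 5)  # values fall in [1, 5)
```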


In [45]:
print (np.random.random(8).reshape(2,4)) #8 random 0-1 in a 2 x 4 array
# https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.random.html

[[ 0.79367369  0.11808361  0.08561092  0.16551397]
 [ 0.95734802  0.01997461  0.19161898  0.46989086]]


In [46]:
# linspace: this is how you fill an array with n equally spaced
# numbers from a to b (both endpoints included)

data = np.linspace(0,5,10)
print (data)


[ 0.          0.55555556  1.11111111  1.66666667  2.22222222  2.77777778
  3.33333333  3.88888889  4.44444444  5.        ]
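Because both endpoints are included, the spacing between neighbors is (b - a) / (n - 1), not (b - a) / n. The sketch below checks this and shows the `endpoint=False` option. The names `spacing` and `half_open` are illustrative.

```python
import numpy as np

data = np.linspace(0, 5, 10)
spacing = data[1] - data[0]
print(spacing)                 # (5 - 0) / (10 - 1)

# endpoint=False leaves out the stop value, so the spacing becomes (5 - 0) / 10
half_open = np.linspace(0, 5, 10, endpoint=False)
print(half_open)
```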


In [47]:
from numpy import pi
x = np.linspace(0,2*pi, 10)
print ("x = ",x)
print ("sin(x) = ", np.sin(x))

x =  [ 0.          0.6981317   1.3962634   2.0943951   2.7925268   3.4906585
  4.1887902   4.88692191  5.58505361  6.28318531]
sin(x) =  [  0.00000000e+00   6.42787610e-01   9.84807753e-01   8.66025404e-01
   3.42020143e-01  -3.42020143e-01  -8.66025404e-01  -9.84807753e-01
  -6.42787610e-01  -2.44929360e-16]


In [48]:
# more slicing
x = np.array(range(25))
print (x)
print (x[5:15:2])
print (x[15:5:-1])


[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 5  7  9 11 13]
[15 14 13 12 11 10  9  8  7  6]


In [49]:
# take a slice from 10 to 19 and call it x1
x1 = x[10:20]
print (x1)

# x1 is a view, not a copy: it's just a window into the original x
x1[:] = 0
print (x1)


[10 11 12 13 14 15 16 17 18 19]
[0 0 0 0 0 0 0 0 0 0]


In [50]:
# what happens to x
print (x)

[ 0  1  2  3  4  5  6  7  8  9  0  0  0  0  0  0  0  0  0  0 20 21 22 23 24]
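If it is unclear whether two arrays refer to the same data, `np.shares_memory` can check. A minimal sketch, with `window` and `snapshot` as made-up names:

```python
import numpy as np

x = np.arange(25)
window = x[10:20]            # basic slice: a view into x
snapshot = x[10:20].copy()   # independent copy of the same values

print(np.shares_memory(x, window))    # True
print(np.shares_memory(x, snapshot))  # False

window[:] = 0                # changes x, but not the snapshot
print(x[10:13], snapshot[:3])
```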


In [51]:
# if you actually need to delete a row or column, look up numpy.delete
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(x)
print ("---")
x = np.delete(x,0,axis=0)
print (x)


[[1 2 3]
 [4 5 6]
 [7 8 9]]
---
[[4 5 6]
 [7 8 9]]


In [52]:
# same thing with assignment: it's not a copy, it's the same data
x = np.array(range(25))
print (x)
y = x
y[:] = 0
print (x)
x is y

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


True

In [53]:
# If you want an actual copy: use a deep copy
x = np.array(range(25))
print (x)
y = x.copy()
y[:] = 0
print (x)
x is y

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]


False

In [54]:
# flatten using ravel()
x = np.array(range(24))
x = x.reshape(4,6)
print(x)

x = x.ravel() # make it flat
print (x)

x = x.reshape(6,4)
print (x)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
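Two related details are worth a quick sketch: `reshape` accepts `-1` for one axis and infers its length, and `ravel` differs from `flatten` in whether the result shares memory with the original. The variable names are illustrative.

```python
import numpy as np

x = np.arange(24).reshape(4, 6)

auto = x.reshape(-1, 4)      # -1 asks NumPy to infer that axis: 24 / 4 = 6
print(auto.shape)

flat_view = x.ravel()        # returns a view when the memory layout allows it
flat_copy = x.flatten()      # always returns a copy
print(np.shares_memory(x, flat_view), np.shares_memory(x, flat_copy))
```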
