

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/440px-NumPy_logo.svg.png"/>


# What is NumPy?
"__Numpy__ is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.


The Python programming language was not initially designed for numerical computing, but attracted the attention of the scientific and engineering community early on, so that a special interest group called matrix-sig was founded in 1995 with the aim of defining an array computing package. Among its members was Python designer and maintainer Guido van Rossum, who implemented extensions to Python's syntax (in particular the indexing syntax) to make array computing easier." 

_~ Wikipedia_




### *NumPy*, which stands for *Numerical Python*, is a library consisting of:
- Multidimensional array objects 
- A collection of routines for processing those arrays. 
- Mathematical and logical operations.




## NDArray


The main object in NumPy is an **N-dimensional array** type called **ndarray**. 

- An ndarray is a collection of items of the same type. Items in the collection can be accessed using a zero-based index.
- The type of the items is described by a data-type object (the array's `dtype`).

### Quick Example

In [55]:
import numpy as np 
a = np.array([1,2,3]) 
print(a)

[1 2 3]


In [56]:
# array can have more than one dimension 
a = np.array([ [1,2,3],
               [4,5,6], 
               [7,8,9] ])
print(a)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


### Arguments

The `np.array(...)` function accepts the following arguments: 

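- `object` - The array of values, or any array-like object such as a (nested) list or tuple.
- `dtype`  - The data type of the values (optional; inferred from the data if omitted).
- `copy`   - Whether to copy the input data (default=True).
- `order`  - Memory layout of the result: row-major (C), column-major (F), or chosen based on the input (A or K) (default=K).
- `subok`  - If True, subclasses of ndarray are passed through; otherwise the result is forced to be a base-class ndarray (default=False).
- `ndmin`  - Minimum number of dimensions of the result array.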
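
The examples in this section cover `ndmin` and `dtype`. As a complement, here is a minimal sketch of `copy` and `order`. The variable names (`base`, `copied`, `f_ordered`, `row`) are illustrative only.

```python
import numpy as np

base = np.array([1, 2, 3])

# By default np.array copies its input, so the new array has its own memory.
copied = np.array(base)
copied[0] = 99
print(base)                               # base is not affected by the change
print(np.shares_memory(base, copied))

# order='F' requests a column-major (Fortran-style) memory layout.
f_ordered = np.array([[1, 2], [3, 4]], order='F')
print(f_ordered.flags['F_CONTIGUOUS'])

# dtype and ndmin can be combined in a single call.
row = np.array([1, 2, 3], dtype=np.float32, ndmin=2)
print(row.shape, row.dtype)
```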




### Args Examples

In [57]:
a = np.array([1, 2, 3, 4, 5], ndmin = 2) 
print(a)

[[1 2 3 4 5]]


In [58]:
# dtype parameter 
import numpy as np 
a = np.array([1, 2, 3], dtype = complex) 
print(a)

[1.+0.j 2.+0.j 3.+0.j]


### Data Types

**String:**

`S` - Byte string, followed by its length (e.g. `S5`).

`U` - Unicode string, followed by its length (e.g. `U5`).

**Boolean:**

`bool`  - True or False, stored as a byte.

**Integers:**

- `int` - Default integer type (32 or 64 bits, depending on the platform)
- `int8` - Byte (-128 to 127)
- `int16` - Integer (-32768 to 32767)
- `int32` - Integer (-2147483648 to 2147483647)
- `int64` - Integer (-9223372036854775808 to 9223372036854775807)

**Unsigned Integers:**

- `uint8` (0 to 255)
- `uint16` (0 to 65535) 
- `uint32` (0 to 4294967295)
- `uint64` (0 to 18446744073709551615)

**Float:**

- `float16` - Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
- `float32` - Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
- `float64` - Double precision float: sign bit, 11 bits exponent, 52 bits mantissa

**Complex number:**

- `complex64` - Complex number, represented by two 32-bit floats (real and imaginary components)
- `complex128` - Complex number, represented by two 64-bit floats (real and imaginary components)
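
The ranges and sizes listed above can be checked from Python. The following is a small sketch using `np.iinfo` and the `itemsize` attribute of a dtype; the name `fixed` is just an illustrative variable.

```python
import numpy as np

# Integer limits for a few of the types listed above.
for int_type in (np.int8, np.int16, np.uint8):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Storage size, in bytes, of the float and complex types.
for name in ('float16', 'float32', 'float64', 'complex64', 'complex128'):
    print(name, np.dtype(name).itemsize, 'bytes')

# A fixed-length byte string type truncates values that are too long.
fixed = np.array(['NumPy', 'abcdefgh'], dtype='S5')
print(fixed)
```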



### Using Data Type Objects (dtype)
A data type object (`dtype`) describes the data type of the items in the **ndarray**. 

Data type objects let us manage and access our data in a more human-friendly manner. When working with two-dimensional datasets, we can name the fields and access 'columns' directly.

**Examples:**



In [96]:
dt = np.dtype(np.int32) 
print(dt)

int32


In [97]:
# Use np.dtype to define the data type of the object
import numpy as np 
dt = np.dtype([('score',np.int8)]) 

# now apply it to ndarray object 
a = np.array([(10,),(20,),(30,)], dtype = dt) 
print(a)

[(10,) (20,) (30,)]


In [61]:
# use the field name to access content of score column 
print(a['score'])

[10 20 30]


In [62]:
# A more useful example
 
student = np.dtype([('name','U20'), ('age', 'int8'), ('grade', 'float16')]) # note the string data type: 'U20' is a Unicode string of up to 20 characters
result = np.array([('Alice', 21, 50.3),
                   ('Bob', 28, 75.0),
                   ('Charlie', 22, 95.1),
                   ('Dave', 23, 79.0),
                   ('Eve', 19, 82.9),
                  ], dtype = student) 
print(result)

[('Alice', 21, 50.3) ('Bob', 28, 75. ) ('Charlie', 22, 95.1)
 ('Dave', 23, 79. ) ('Eve', 19, 82.9)]


In [63]:
# Get the list of grades
print(result['grade'])

[50.3 75.  95.1 79.  82.9]
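
Named fields can also be combined with comparisons and sorting. The sketch below rebuilds the student array so that it is self-contained; it assumes a Unicode `U20` field for the names, and the names `high` and `by_grade` are illustrative.

```python
import numpy as np

student = np.dtype([('name', 'U20'), ('age', 'int8'), ('grade', 'float16')])
result = np.array([('Alice', 21, 50.3),
                   ('Bob', 28, 75.0),
                   ('Charlie', 22, 95.1),
                   ('Dave', 23, 79.0),
                   ('Eve', 19, 82.9)], dtype=student)

# Keep only the records whose grade is above 80, then read their names.
high = result[result['grade'] > 80]
print(high['name'])

# A field behaves like a regular array, so the usual functions apply.
print(result['age'].mean())

# Sort the records by one of the fields.
by_grade = np.sort(result, order='grade')
print(by_grade['name'])
```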


### Ndarray Attributes

**Shape:**

NumPy arrays come in all shapes and sizes. The `shape` attribute is a tuple holding the size of the array along each dimension.



In [64]:
a = np.array([[1,2,3],[4,5,6]]) 
print(a.shape)

(2, 3)


**A few more examples:**


In [65]:
# let's use the arange function to create an array of the integers 0 to 23
a = np.arange(24) 
print(a)


[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]


In [66]:
# The ndim attribute is the number of dimensions of the ndarray. 
# The arange function returns a sequence of evenly spaced numbers as a one-dimensional array
print(a.ndim)

1


In [68]:
# now let's use the reshape function to change the array's dimensions (the total number of items must stay the same)
b = a.reshape(2,4,3) 
print(b)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]


In [70]:
print("ndarray 'b' has: {0} dimensions".format(b.ndim))

ndarray 'b' has: 3 dimensions
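
A few more properties of `reshape` are worth seeing in code. This is a small sketch rather than a complete reference: it shows the `-1` placeholder, the fact that a reshaped array usually shares memory with the original, and the error raised when the item counts do not match.

```python
import numpy as np

a = np.arange(24)
b = a.reshape(2, 4, 3)
print(b.shape, b.ndim, b.size)

# -1 asks NumPy to work out that dimension from the number of items.
c = a.reshape(4, -1)
print(c.shape)

# For a contiguous array, reshape returns a view on the same data.
print(np.shares_memory(a, b))

# The new shape must hold exactly the same number of items.
try:
    a.reshape(5, 5)
except ValueError as err:
    print('reshape failed:', err)
```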


### Creating Arrays

There are a few built-in functions to create arrays, pre-filled with values. 

We've seen the `np.arange` function. Another common one is the `np.zeros(shape)` function, which returns an array of the given shape, pre-filled with zeros. 

Another common way to create arrays is the `np.asarray(my_list)` function, which converts an existing Python sequence into an ndarray. 

**Example:**

In [76]:
x = [1,2,3] 
a = np.asarray(x) 
print(type(a))

<class 'numpy.ndarray'>


In [77]:
# we can also use tuples (tuples of different lengths need dtype=object)
x = [(1,2,3),(4,5)] 
a = np.asarray(x, dtype=object) 
print(a)

[(1, 2, 3) (4, 5)]


In [88]:
# or we can use the 'frombuffer' function to get items from a buffer, such as a bytes string
s = b'Hello World' 
a = np.frombuffer(s, dtype = 'S1') 
print(a)

[b'H' b'e' b'l' b'l' b'o' b' ' b'W' b'o' b'r' b'l' b'd']
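
Here is a short sketch of the pre-filled constructors, together with one practical difference between `np.asarray` and `np.array`. The variable names are illustrative; `np.ones` and `np.full` are close relatives of `np.zeros`.

```python
import numpy as np

zeros = np.zeros((2, 3))             # 2x3 array of 0.0
ones = np.ones(4, dtype=np.int32)    # four integer ones
sevens = np.full((2, 2), 7)          # 2x2 array filled with 7
print(zeros.shape, ones, sevens.tolist())

# asarray does not copy when its input is already a suitable ndarray,
# while array copies by default.
existing = np.array([1, 2, 3])
same = np.asarray(existing)
fresh = np.array(existing)
print(same is existing)
print(fresh is existing)
```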


**The arange function can be used to create more sophisticated arrays using the function's parameters:**

`start` - The starting value of an interval.

`stop`  - The ending value of an interval (exclusive)

`step`  - The gap between items. (default=1)

`dtype` - Data type of resulting ndarray.

In [90]:
# All multiples of 5, from 100 up to (but not including) 200. 
a = np.arange(100, 200, 5)
print(a)

[100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185
 190 195]


In [93]:
# A similar and also very useful function is linspace, which generates N evenly spaced numbers within a range. 
# So let's say we want to make a protest and we need to position 100 people on the road between TLV and JLM; 
# we will calculate their marks along the road like this: 

road_length = 67.6 # 67.6 km, according to google
num_of_people = 100
x = np.linspace(0,road_length, num_of_people, endpoint = False) 
print(x)

[ 0.     0.676  1.352  2.028  2.704  3.38   4.056  4.732  5.408  6.084
  6.76   7.436  8.112  8.788  9.464 10.14  10.816 11.492 12.168 12.844
 13.52  14.196 14.872 15.548 16.224 16.9   17.576 18.252 18.928 19.604
 20.28  20.956 21.632 22.308 22.984 23.66  24.336 25.012 25.688 26.364
 27.04  27.716 28.392 29.068 29.744 30.42  31.096 31.772 32.448 33.124
 33.8   34.476 35.152 35.828 36.504 37.18  37.856 38.532 39.208 39.884
 40.56  41.236 41.912 42.588 43.264 43.94  44.616 45.292 45.968 46.644
 47.32  47.996 48.672 49.348 50.024 50.7   51.376 52.052 52.728 53.404
 54.08  54.756 55.432 56.108 56.784 57.46  58.136 58.812 59.488 60.164
 60.84  61.516 62.192 62.868 63.544 64.22  64.896 65.572 66.248 66.924]


In [98]:
# or use the 'logspace' function to generate a geometric series (here, powers of 2)
a = np.logspace(1,10, num = 10, base = 2) 
print(a)

[   2.    4.    8.   16.   32.   64.  128.  256.  512. 1024.]
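
The `endpoint` and `retstep` options of `linspace` are easy to mix up, so here is a minimal sketch on a small range. The variable names are illustrative.

```python
import numpy as np

# By default the stop value is included as the last sample.
closed = np.linspace(0, 1, 5)
print(closed)

# endpoint=False leaves the stop value out, like arange does.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)

# retstep=True also returns the spacing between samples.
points, step = np.linspace(0, 1, 5, retstep=True)
print(step)

# logspace uses base 10 unless told otherwise.
powers = np.logspace(0, 3, num=4)
print(powers)
```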


###  Slicing arrays is fun!

There are two ways to index an ndarray beyond basic slicing:

__Integer indexing__ - specifying the 'rows and columns', using lists of indexes of the values we wish to select. 

__Boolean indexing__ - using a boolean condition to 'filter out' values. 


Integer indexing Examples: 


In [2]:
# using integers to state the rows and columns we wish to slice

x = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9]]) 

row_indexes = [0,1,2] 
col_indexes = [2,1,0]

# When combining the row and column indexes, we get a tuple for each element we wish to select.
# In the example above, we get (0,2), (1,1), (2,0), meaning we wish to extract the element at row 0, column 2,
# the element at row 1, column 1, etc.

y = x[row_indexes,   
      col_indexes] 

print(y)


[3 5 7]


In [10]:
# you can also use Python's regular slice syntax (e.g. my_list[0:5]) on each axis

a = np.arange(100)
a = a.reshape(10,10)
print(a)
print(""" 
---------------------------------------
And now just the top-right quarter
""")
print(a[0:5, 5:10])



[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
 
---------------------------------------
And now just the top-right quarter

[[ 5  6  7  8  9]
 [15 16 17 18 19]
 [25 26 27 28 29]
 [35 36 37 38 39]
 [45 46 47 48 49]]
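
One practical difference between the two styles shown so far: a slice is a view on the original data, while indexing with lists of integers produces a copy. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# A slice is a view: writing through it changes the original array.
corner = a[0:2, 0:2]
corner[0, 0] = -1
print(a[0, 0])

# Integer (list) indexing returns a copy: the original stays untouched.
picked = a[[0, 1], [1, 1]]   # elements at (0, 1) and (1, 1)
picked[0] = 500
print(a[0, 1])

# Slices also accept a step: every second row, every fifth column.
every_other = a[::2, ::5]
print(every_other.shape)
```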



__Boolean indexing__ allows us to extract values based on logic rather than just index, and is usually more helpful. 

Some examples: 

In [21]:
a = np.array([[ 0,  1,  2],
              [ 3,  4,  5],
              [ 6,  7,  8],
              [ 9, 10, 11]]) 

print(a)
print('\n')

# Build a boolean mask that is True where the item is even
print("is even number?")
print('\n')
b = a % 2 == 0
print(b)

print("\n")

# a boolean mask is not so useful for calculations, so let's turn it into integers
print("as binary matrix\n")
print(b.astype(int))

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


is even number?


[[ True False  True]
 [False  True False]
 [ True False  True]
 [False  True False]]


as binary matrix

[[1 0 1]
 [0 1 0]
 [1 0 1]
 [0 1 0]]
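
A boolean mask becomes a filter when it is used as an index. The sketch below uses the same 4x3 array to select, combine conditions, and assign through a mask; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Using the mask as an index returns the matching items as a 1-D array.
evens = a[a % 2 == 0]
print(evens)

# Conditions are combined with & (and) and | (or); the parentheses are required.
big_evens = a[(a > 5) & (a % 2 == 0)]
print(big_evens)

# A mask can also be used to assign: cap every value above 8.
capped = a.copy()
capped[capped > 8] = 8
print(capped.max())

# Counting the True values in a mask.
print(np.count_nonzero(a % 2 == 0))
```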


###  Array Manipulation

NumPy has a variety of functions to manipulate your data: reshaping, transposing, merging and many more. 

Let's see some examples.

__broadcasting__


In [22]:
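# Arithmetic operators work element-wise on arrays.
# Broadcasting is the NumPy name for the rules that let such arithmetic work on arrays of different shapes.
a = np.array([1,2,3,4]) 
b = np.array([10,20,30,40]) 
c = a * b 
print(c)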


[ 10  40  90 160]


In [23]:
c = a + b 
print(c)

[11 22 33 44]


In [25]:
c = b - a
print(c)

[ 9 18 27 36]


In [28]:
# Arrays don't have to have the same number of dimensions to use broadcasting, 
# but their shapes must be compatible, so be careful with how you plan your calculation

a = np.array([[0, 0, 0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]]) 

b = np.array([1, 2, 3])  
print(a + b)

# 'b' is added to each row in 'a'

[[ 1  2  3]
 [11 12 13]
 [21 22 23]
 [31 32 33]]
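
Broadcasting compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. A minimal sketch of that rule, with illustrative variable names:

```python
import numpy as np

col = np.array([[0], [10], [20], [30]])   # shape (4, 1)
row = np.array([1, 2, 3])                 # shape (3,)

# (4, 1) and (3,) broadcast to (4, 3): the column is stretched across
# the columns and the row is stretched across the rows.
grid = col + row
print(grid.shape)
print(grid)

# A scalar broadcasts against any shape.
print(row * 2)

# Shapes (4, 3) and (4,) are not compatible: the last dimensions are 3 and 4.
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    print('broadcast failed:', err)
```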


__numpy.concatenate__ is used to 'merge' two or more arrays together. You can choose the axis along which to join the arrays. 


In [31]:
a = np.array([[1,2],
              [3,4]]) 

b = np.array([[5,6],
              [7,8]]) 

# both arrays have the same dimensions 

print("Merging by the '0' axis: (default)")
print(np.concatenate((a,b))) 
print('\n')  

print('Merging by the 1 axis:') 
print(np.concatenate((a,b),axis = 1))

Merging by the '0' axis: (default)
[[1 2]
 [3 4]
 [5 6]
 [7 8]]


Merging by the 1 axis:
[[1 2 5 6]
 [3 4 7 8]]
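
For 2-D arrays, NumPy also provides `np.vstack` and `np.hstack` as shorthands for the two calls above, and `np.stack` for joining along a new axis. A short sketch, with `stacked` as an illustrative name:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# vstack joins along axis 0, hstack along axis 1 (for 2-D inputs).
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))

# stack adds a new axis instead of extending an existing one.
stacked = np.stack((a, b))
print(stacked.shape)
```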


Using the __split__ function, we can get the opposite effect. 




In [42]:
a = np.arange(9) 
# 'split' returns a Python list of NumPy arrays.

print('Split the array in 3 equal-sized subarrays:') 
b = np.split(a,3) 
for row in b:
    print(row)
print('\n')  

print('Split the array at positions indicated in 1-D array:') 
b = np.split(a,[2,6])
for row in b:
    print(row)



Split the array in 3 equal-sized subarrays:
[0 1 2]
[3 4 5]
[6 7 8]


Split the array at positions indicated in 1-D array:
[0 1]
[2 3 4 5]
[6 7 8]
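
`np.split` insists on equal-sized parts when given a number of sections. When the array does not divide evenly, `np.array_split` can be used instead. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(10)

# 10 items cannot be split into 3 equal parts, so split raises an error.
try:
    np.split(a, 3)
except ValueError as err:
    print('split failed:', err)

# array_split allows parts of unequal size.
parts = np.array_split(a, 3)
for part in parts:
    print(part)

# Concatenating the parts gives back the original array.
print(np.array_equal(np.concatenate(parts), a))
```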


### Adding/Removing elements

Adding/removing elements from the array is possible using a few useful functions. Each of them returns a new array rather than modifying the original:

__append__ - Appends the values to the end of an array

__insert__ - Inserts the values along the given axis before the given indices

__delete__ - Returns a new array with sub-arrays along an axis deleted

__unique__ - Finds the unique elements of an array


Some examples: 

In [45]:
a = np.array([1,2,3])
np.append(a, 4)

array([1, 2, 3, 4])

In [52]:
# use the 'insert(arr, obj, values, axis)' function to insert values before index 'obj' along a given axis 
a = np.array([[1,2, 3],
              [4, 5, 6],
              [7, 8, 9]]) 

to_insert = 0

print('Insert along axis 0:') 
print(np.insert(a,1, to_insert , axis = 0)) 
print('\n')  

print('Insert along axis 1:') 
print(np.insert(a, 1, to_insert, axis = 1))

Insert along axis 0:
[[1 2 3]
 [0 0 0]
 [4 5 6]
 [7 8 9]]


Insert along axis 1:
[[1 0 2 3]
 [4 0 5 6]
 [7 0 8 9]]


In [57]:
# The delete function removes rows/columns
a = np.array([[1,2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print('Delete the second column (index 1):')  
print(np.delete(a, 1, axis = 1)) 

Delete the second column (index 1):
[[ 1  3]
 [ 4  6]
 [ 7  9]
 [10 12]]
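
The `unique` function from the list above has not been shown yet, and the role of `axis` in `append` is easy to miss. A minimal sketch covering both, with illustrative variable names:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1, 3])

# unique returns the sorted distinct values; return_counts adds how often each occurs.
values, counts = np.unique(a, return_counts=True)
print(values, counts)

# append returns a new array; the original keeps its size.
appended = np.append(a, 4)
print(a.size, appended.size)

# Without an axis, append flattens its inputs first.
m = np.array([[1, 2],
              [3, 4]])
flat = np.append(m, [5, 6])
print(flat.shape)

# With an axis, the appended values must match the array's other dimensions.
rows = np.append(m, [[5, 6]], axis=0)
print(rows.shape)
```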


## Statistical Functions

NumPy supports various statistical functions to explore your data set. 

__Examples:__


In [65]:
a = np.array([[3,0,5],
              [8,4,1],
              [5,4,9]]) 

print('amin() with axis=0 returns the minimum value of each column:') 
print(np.amin(a, axis=0)) 
print('\n')  

print('amax() without an axis returns the maximum of the whole array:')
print(np.amax(a)) 
print('\n')  

print('amax() with axis=1 returns the maximum value of each row:') 
print(np.amax(a, axis = 1))

amin() with axis=0 returns the minimum value of each column:
[3 0 1]


amax() without an axis returns the maximum of the whole array:
9


amax() with axis=1 returns the maximum value of each row:
[5 8 9]


In [74]:
"""Use numpy.percentile(a, q, axis) to find the value below which 
a given percentage (q) of the observations falls."""

import numpy as np 
a = np.array([[1,22, 9],
             [2, 15, 11],
             [13, 28, 76]]) 

print('using percentile() without specifying an axis flattens the array:') 
print(np.percentile(a,50)) 
print('\n')  

print('Find the 50% element along the axis 1:') 
print(np.percentile(a, 50, axis = 1)) 
print('\n')  

print('Find the 50% element along the axis 0:') 
print(np.percentile(a, 50, axis = 0))



using percentile() without specifying an axis flattens the array:
13.0


Find the 50% element along the axis 1:
[ 9. 11. 28.]


Find the 50% element along the axis 0:
[ 2. 22. 11.]


In [81]:
# Coming up are your favorite stat functions!
a = np.array([1,3,6,3.4,5,9,23,45,6,43,34,15,43])
print("Mean:", np.mean(a))
print("Median:", np.median(a))
print("Standard Deviation:", np.std(a))
print("Variance:", np.var(a))


Mean: 18.184615384615384
Median: 9.0
Standard Deviation: 16.48709733998756
Variance: 271.82437869822485
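
A few relationships between these functions are worth checking in code. The sketch below uses the same data; note that `np.std` and `np.var` compute the population statistics by default, and `ddof=1` switches to the sample versions. The name `sample_std` is illustrative.

```python
import numpy as np

a = np.array([1, 3, 6, 3.4, 5, 9, 23, 45, 6, 43, 34, 15, 43])

# The variance is the square of the standard deviation.
print(np.isclose(np.var(a), np.std(a) ** 2))

# The median is the 50th percentile.
print(np.median(a) == np.percentile(a, 50))

# ddof=1 divides by (n - 1) instead of n, giving the sample standard deviation.
sample_std = np.std(a, ddof=1)
print(sample_std > np.std(a))
```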


## Exercise Time!

NumPy has __a lot__ more functions in each of the above categories, but that's enough for now.

Let's start playing around.