# Python Numpy Intro
An introduction to the [Python Numpy](http://www.numpy.org/) numerical python library.  
The core data structure behind Numpy is the n-dimensional [Numpy Array](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). For numeric work it is often 3x to 10x faster than a Python list (the exact gain depends on the operation and array size), and it is more memory efficient. There are two reasons. Like a Java array, it stores its elements in one contiguous block of memory. And all elements share the same data type, so there is no per-element type checking at runtime. The Numpy library also includes many built-in mathematical functions that operate on an entire array, or on any slice of an array, in a single line of code (i.e., no explicit for loops).  
Numpy n-dimensional arrays are also sometimes referred to as nd-arrays.

**Install Numpy** using pip: `pip install numpy`  
The convention for importing numpy is `import numpy as np`.

In [None]:
import numpy as np

### Creating a Numpy Array
There are MANY ways to instantiate a numpy array. I cover the most common ones below. [Docs here cover more constructors](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html).
- Pass a list (or nested lists) to the `array()` constructor.
- Use `arange`, which works like Python's `range` but returns a Numpy array. Its arguments are (start, stop, step). Stop is exclusive, so `np.arange(1, 13, 2)` ends at 11.
- Use `linspace` to create an array of n evenly spaced values. Its arguments are (start, stop, number of items), and stop is included by default.
- Create an array that is empty (uninitialized memory), filled with ones or zeros, or filled with any value (`empty`, `ones`, `zeros`, `full`). These take the shape as a tuple.  

You can pass `dtype` as an optional argument to any of these. This is especially useful for limiting memory use in a very large array of small integers. The int8 and int16 types use much less space than the default integer type, which is int64 on most platforms and was int32 on Windows before NumPy 2.0.
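
The endpoint rules for `arange` and `linspace` are a common source of off-by-one mistakes. Here is a minimal sketch that puts them side by side. The example arrays are illustrative and not part of the original notebook. `endpoint=False` is a standard `linspace` parameter.

```python
import numpy as np

# arange: stop is exclusive, like Python's range()
evens = np.arange(0, 10, 2)
print(evens)                            # [0 2 4 6 8]

# linspace: stop is included by default; endpoint=False leaves it out
quarters = np.linspace(0, 1, 5)         # 0, 0.25, 0.5, 0.75, 1.0
fifths = np.linspace(0, 1, 5, endpoint=False)   # 0, 0.2, 0.4, 0.6, 0.8
print(quarters)
print(fifths)

# dtype works with any constructor and sets the bytes used per element
small = np.zeros((3, 4), dtype=np.int8)
print(small.dtype, small.nbytes)        # int8 12
```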

In [1]:
a = np.array([1, 3, 5, 7, 9, 11])
print(a)

[ 1  3  5  7  9 11]


In [12]:
?np.array

In [6]:
a = np.arange(1, 13, 2)  # (start, stop (exclusive), step)
print(a)

[ 1  3  5  7  9 11]

In [20]:
a = np.linspace(5, 8, 4)  # (start, stop (inclusive), number of items)
print(a)

[5. 6. 7. 8.]


In [21]:
a = np.zeros((4, 2))
print(a)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


In [50]:
a = np.ones((2, 3), dtype=np.int64)
print(a)

[[1 1 1]
 [1 1 1]]


In [25]:
a = np.full((6,), 88)
print(a)

[88 88 88 88 88 88]


In [35]:
a = np.fromstring('25 30 35 40', dtype=int, sep=' ')  # parse whitespace-separated text; np.int was removed in NumPy 1.24
print(a)

[25 30 35 40]


In [59]:
a = np.array([[1,3,5],[7,9,11]])
print(a)

[[ 1  3  5]
 [ 7  9 11]]


In [44]:
b = np.zeros_like(a)    # _like gives you a new array with the same shape and dtype as the argument.
print(b)

[[0 0 0]
 [0 0 0]]


### Numpy Array Attributes
Get size (number of items), shape (dimensions), ndim (number of dimensions), itemsize (bytes of memory per item), and dtype (Numpy data type).  
nbytes gives the memory used by the whole array's data, which is the product of size and itemsize. See the [complete list of attributes and methods](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html).
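
To show how these attributes relate, here is a short sketch that stores the same 2x3 values with three different dtypes. Only itemsize and nbytes change. The dtypes were chosen for illustration.

```python
import numpy as np

values = [[1, 3, 5], [7, 9, 11]]
nbytes_by_dtype = {}

for dt in (np.int8, np.int32, np.float64):
    arr = np.array(values, dtype=dt)
    # size = element count, itemsize = bytes per element, nbytes = size * itemsize
    print(arr.dtype, arr.shape, arr.ndim, arr.size, arr.itemsize, arr.nbytes)
    nbytes_by_dtype[str(arr.dtype)] = arr.nbytes
```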

In [60]:
a

array([[ 1,  3,  5],
       [ 7,  9, 11]])

In [None]:
print(a.size)

In [None]:
print(a.shape)

In [None]:
print(a.ndim)

In [51]:
a.dtype

dtype('int64')

In [52]:
print(a.itemsize)

8


In [54]:
print(a.dtype)

int64


In [55]:
print(a.nbytes)  # same as a.size * a.itemsize

48


### Indexing and Slicing
Use square brackets to get any item of an array by index. In a multi-dimensional array you can either chain brackets (`a[0][2]`) or separate the indices with commas (`a[0, 2]`).

There are three arguments for slicing arrays, and all are optional: [start:stop:step].  
With a positive step, a blank start defaults to 0 and a blank stop defaults to the end of the array. Step defaults to 1. The stop index is exclusive.
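
The sketch below covers a few indexing forms that the cells that follow don't show: comma-separated indices, negative indices, a negative step, and the shape difference between indexing a column with an integer and with a slice. It reuses the same 2x3 values as `a` above.

```python
import numpy as np

a = np.array([[1, 3, 5], [7, 9, 11]])

item = a[0, 2]              # comma syntax, same as a[0][2] -> 5
last = a[-1, -1]            # negative indices count from the end -> 11
row_reversed = a[0, ::-1]   # a negative step walks backwards -> [5 3 1]
column = a[:, 1]            # an integer index drops that axis -> shape (2,)
column_2d = a[:, 1:2]       # a slice keeps the axis -> shape (2, 1)

print(item, last, row_reversed, column.shape, column_2d.shape)
```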

In [61]:
print(a)

[[ 1  3  5]
 [ 7  9 11]]


In [62]:
print(a[1])

[ 7  9 11]


In [63]:
print(a[0][2])

5


In [64]:
b

array([[0, 0, 0],
       [0, 0, 0]])

In [65]:
print(b[2:4])

[]


In [66]:
print(a[:1])

[[1 3 5]]


In [67]:
print(a[1:3:2])

[[ 7  9 11]]


In [74]:
print(a[:, 1:2])  # all rows on dimension 0, only column 1 on dimension 1 (a slice keeps 2 dimensions)

[[3]
 [9]]


### Reshape, Swap Axes, Flatten
See full list of [array manipulation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html).

In [79]:
c = np.arange(-9, -3).reshape(2, 3)
print(c)

[[-9 -8 -7]
 [-6 -5 -4]]


In [80]:
c = c.swapaxes(0,1)
print(c)

[[-9 -6]
 [-8 -5]
 [-7 -4]]


In [81]:
c.T

array([[-9, -8, -7],
       [-6, -5, -4]])

In [82]:
c = c.flatten()
print(c, type(c))

[-9 -6 -8 -5 -7 -4] <class 'numpy.ndarray'>


In [83]:
c.ravel()

array([-9, -6, -8, -5, -7, -4])

In [92]:
?np.ravel

In [91]:
?c.flatten
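
Both help pages describe the key difference between the two methods. `flatten` always returns a copy. `ravel` returns a view of the original data whenever the memory layout allows it. Here is a minimal sketch that checks this with `np.shares_memory`, using a freshly reshaped array as an example.

```python
import numpy as np

c = np.arange(-9, -3).reshape(2, 3)

flat_copy = c.flatten()   # always a new, independent copy
flat_view = c.ravel()     # a view here, because c is contiguous in memory

print(np.shares_memory(c, flat_copy))   # False
print(np.shares_memory(c, flat_view))   # True

flat_view[0] = 100        # writing through the view changes c too
print(c[0, 0])            # 100

# reshape(-1) asks NumPy to infer the length of that dimension
print(c.reshape(-1).shape)   # (6,)
```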

### Use dtype to Save Space
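The default data types (int64 on most platforms, int32 on Windows before NumPy 2.0, and float64) can use far more memory than your data needs. If you don't need the higher precision, switching to smaller data types can save a lot of memory and can often speed up operations. For large data sets this makes a big difference. The output below comes from a platform where the default integer type was int32.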

In [96]:
d = np.arange(0,100)
d

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [97]:
print(d.dtype, type(d[1]))
print(d.nbytes)

int32 <class 'numpy.int32'>
400


In [98]:
d = np.arange(0,100, dtype='int8')
d

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
      dtype=int8)

In [99]:
print(d.dtype, type(d[1]))
print(d.nbytes)

int8 <class 'numpy.int8'>
100
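
Smaller types come with a trade-off: values outside their range wrap around silently in array arithmetic. Here is a short sketch, with illustrative values, that checks the range using `np.iinfo` and shows how to upcast with `astype` when results might overflow.

```python
import numpy as np

# int8 holds -128..127, so check the range before choosing a small dtype
info = np.iinfo(np.int8)
print(info.min, info.max)            # -128 127

small = np.array([120, 127], dtype=np.int8)
wrapped = small + small              # stays int8, and 240 and 254 wrap around
print(wrapped, wrapped.dtype)

# Upcast first if results may exceed the range
safe = small.astype(np.int16) + small
print(safe)                          # [240 254]
```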


### UpCasting, Rounding, Print Formatting
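When an array is created from mixed types, every item is upcast to a single type that can represent all of the elements. In the cell below, one float turns the whole array into float64.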

In [93]:
e = np.array([(1.566666,2,3), (4,5,6)])
print(e.dtype)

float64
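
If you want to check which common type NumPy will pick, `np.result_type` reports it without creating an array. The sketch below uses illustrative type pairs. The last line shows that mixing numbers and strings produces a string array, not a numeric one.

```python
import numpy as np

# One float forces every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                              # float64

# The common type must hold both ranges: int8 (-128..127) and uint8 (0..255)
print(np.result_type(np.int8, np.uint8))        # int16
print(np.result_type(np.int8, np.float32))      # float32

# Mixing numbers and strings gives a Unicode string array
labels = np.array([1, 'two'])
print(labels.dtype.kind)                        # U
```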


In [94]:
e = e.round(4)
print(e)

[[1.5667 2.     3.    ]
 [4.     5.     6.    ]]


In [101]:
np.set_printoptions(precision=2, suppress=True)    # show 2 decimal places, suppress scientific notation
print(e)

[[1.57 2.   3.  ]
 [4.   5.   6.  ]]


### Numpy Data Types Available
uint is unsigned int, for non-negative numbers. It can hold roughly twice the maximum value of the signed type of the same size.

In [None]:
import pprint as pp  # pretty print
pp.pprint(np.sctypes)  # NumPy 1.x only: np.sctypes was removed in NumPy 2.0
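
To see the actual value ranges instead of just the type names, you can use `np.iinfo` for integer types and `np.finfo` for floating-point types. This approach also works in NumPy 2.x. The types listed here are a sample, not an exhaustive list.

```python
import numpy as np

# Integer ranges: unsigned types start at 0 and reach about twice as high
for t in (np.int8, np.uint8, np.int16, np.uint16, np.int32):
    info = np.iinfo(t)
    print(f"{info.dtype}: {info.min} to {info.max}")

# Float types: approximate decimal digits of precision and largest value
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(f"{info.dtype}: ~{info.precision} digits, max {info.max}")
```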

### Reading and Writing to Files
Use [loadtxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt) or [genfromtxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) to load an entire file into an array at once. genfromtxt is more fault tolerant, for example it can fill in missing values.  
Use [savetxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html#numpy.savetxt) to write an array to file.

In [None]:
f = np.loadtxt('data.txt', skiprows=1, delimiter=',', dtype=np.int32)
print(f)

In [None]:
print(f.dtype)

In [None]:
np.savetxt('data2.txt', f, delimiter=';', fmt='%d', header='a;b;c;d;e;f;g;h;i;j', comments='')
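
The contents of `data.txt` aren't shown here, so the sketch below round-trips a small made-up array through an in-memory `io.StringIO` buffer instead of a real file. Both `savetxt` and `loadtxt` accept file-like objects. The header and delimiter are illustrative.

```python
import io
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Write to an in-memory text buffer; comments='' keeps the header free of '# '
buf = io.StringIO()
np.savetxt(buf, original, fmt='%d', delimiter=',', header='a,b,c', comments='')
text = buf.getvalue()
print(text)

# Read it back, skipping the header row
restored = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1, dtype=np.int32)
print(np.array_equal(original, restored))   # True
```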

### `genfromtxt()`

In [None]:
g = np.genfromtxt('data.txt', skip_header=1, delimiter=',', dtype=np.int32)
print(g)
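
The fault tolerance mentioned above mostly comes down to handling missing fields. Here is a minimal sketch using a made-up CSV snippet with one empty field. With float output the gap becomes `nan`. With `filling_values` you choose the replacement yourself.

```python
import io
import numpy as np

# Hypothetical data: the second row is missing its middle value
raw = "a,b,c\n1,2,3\n4,,6\n"

# With the default float dtype, the missing field becomes nan
g_nan = np.genfromtxt(io.StringIO(raw), delimiter=',', skip_header=1)
print(g_nan)

# filling_values substitutes a default, which allows an integer dtype
g_filled = np.genfromtxt(io.StringIO(raw), delimiter=',', skip_header=1,
                         dtype=np.int32, filling_values=0)
print(g_filled)
```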

### Mathematical Functions
Numpy has an extensive list of [math and scientific functions](https://docs.scipy.org/doc/numpy/reference/routines.html).  
The best part is that you don't have to iterate. You can apply an operation to the entire array or a slice of an array at once.

In [None]:
print(g > 4)

In [None]:
print(g ** 2 - 1)
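
A comparison like `g > 4` returns a boolean array, and you can use that array directly to select or replace elements. The sketch below uses a small stand-in array because the contents of `data.txt` aren't shown here.

```python
import numpy as np

# Stand-in for the loaded data
arr = np.arange(1, 7).reshape(2, 3)

mask = arr > 4                  # element-wise comparison -> boolean array
print(mask)
print(arr[mask])                # boolean indexing keeps matching items: [5 6]
print(np.where(mask, arr, 0))   # keep items where mask is True, else use 0
print(arr ** 2 - 1)             # applied to every element, no loop needed
```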

### Statistics Summary

In [None]:
g

In [None]:
print(g.min())

In [None]:
print(g.max())

In [None]:
print(g.sum())

In [None]:
print(g.mean())

In [None]:
print(g.var())         # variance

In [None]:
print(g.std())         # standard deviation

In [None]:
g

In [None]:
print(g.sum(axis=1)) # collapse axis 1 (across the columns): one sum per row

In [None]:
print(g.min(axis=0)) # collapse axis 0 (down the rows): one min per column

In [None]:
print(g.argmin())      # index of min element in the flattened array

In [None]:
print(g.argmax())      # index of max element in the flattened array

In [None]:
print(g.argsort())     # indices that would sort each row (argsort works along the last axis by default)
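
On a 2-D array, axis arguments and flattened indices can be confusing. Here is a sketch on a small example array with distinct values. It uses `np.unravel_index` to convert the flat index from `argmax` back into a (row, column) pair, and `axis=None` to sort the flattened array.

```python
import numpy as np

arr = np.array([[3, 7, 1],
                [9, 2, 8]])

print(arr.sum(axis=1))    # one sum per row: [11 19]
print(arr.min(axis=0))    # one min per column: [3 2 1]

flat_idx = arr.argmax()   # index into the flattened array: 3
row, col = np.unravel_index(flat_idx, arr.shape)   # row 1, column 0
print(flat_idx, row, col)

print(arr.argsort())            # sorts each row along the last axis
print(arr.argsort(axis=None))   # sort order of the flattened array: [2 4 0 1 5 3]
```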

### Column Operations
Apply functions to specific columns by slicing, or create a new array from the columns you want and work on that.  
But beware: a basic slice is a view that shares data with the original array, not a copy. Modifying the slice also modifies the original, which can corrupt your data if you're not careful.

In [None]:
g

In [None]:
print(g[:, 2:3])

In [None]:
print(g[:, 2:3].max())

In [None]:
col3 = g[:, 3:4]      # a view of column index 3 of g, not a copy
print(col3.std())

In [None]:
col3 *= 100           # Beware: this is applied to g data
print(g)
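
If you need an independent column, call `.copy()` on the slice. The sketch below uses a small stand-in array. It shows that scaling a view changes the original while scaling a copy does not, and it confirms the difference with `np.shares_memory`.

```python
import numpy as np

arr = np.arange(1, 7).reshape(2, 3)

view = arr[:, 1:2]            # basic slicing returns a view of arr's data
view *= 100                   # this also modifies arr
print(arr)                    # column 1 is now 200 and 500

safe = arr[:, 2:3].copy()     # .copy() gives independent data
safe *= 100                   # arr is unchanged
print(arr[:, 2])              # [3 6]

print(np.shares_memory(arr, view), np.shares_memory(arr, safe))   # True False
```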

### Numpy Random Functions

In [None]:
np.set_printoptions(precision=5, suppress=True)    # show 5 decimal places, suppress scientific notation
h = np.random.random(6)
print(h)

In [None]:
h = np.random.randint(10, 99, 8)    # (low, high exclusive, size): values from 10 to 98
print(h)

In [None]:
np.random.shuffle(h)        # in-place shuffle
print(h)

In [None]:
print(np.random.choice(h))

In [None]:
h.sort()                    # in-place sort
print(h)
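
The cells above use the legacy `np.random` functions. NumPy 1.17 and later also provide a seeded `Generator` object, created with `np.random.default_rng`, which the NumPy docs recommend for new code. Here is a minimal sketch of the equivalent calls. The seed of 42 is arbitrary.

```python
import numpy as np

# A seeded Generator makes results reproducible
rng = np.random.default_rng(seed=42)

floats = rng.random(6)                                   # uniform in [0, 1)
ints = rng.integers(10, 99, size=8)                      # high is exclusive: 10..98
ints_incl = rng.integers(10, 99, size=8, endpoint=True)  # endpoint=True: 10..99
pick = rng.choice(ints)                                  # one random element
shuffled = rng.permutation(ints)                         # shuffled copy; ints unchanged

print(floats, ints, ints_incl, pick, shuffled, sep='\n')
```

Re-creating the generator with the same seed reproduces the same sequence, which is useful for repeatable experiments.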