## Basic NumPy Functionality

### NumPy n-dim array
* Part of a data "ecosystem" that includes data science libraries, machine learning, and image processing
* NumPy is huge and very powerful --> we will just hit the highlights
* optimized array type, faster and more memory-efficient than lists for numeric work
* can be "any" number of dimensions, typically 1 or 2 dims though
* restricted to a single data type (dtype)
* you must "dimension" your arrays (give them a shape) when you initialize them
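
A minimal sketch of the single-data-type and dimension points above. The variable names (`mixed`, `as_text`, `grid`) are made up for illustration.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts everything to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)            # float64

# Mixing numbers and a string: every element is stored as a string.
as_text = np.array([1, "two", 3])
print(as_text.dtype.kind)     # U (unicode string)

# The shape is given up front, here 2 rows by 3 columns.
grid = np.zeros((2, 3))
print(grid.ndim, grid.shape)  # 2 (2, 3)
```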

In [105]:
import numpy as np
import numpy.random as npr

In [24]:
l = [1, 2, 3] 
a = np.array([1,2,3])

In [26]:
l

[1, 2, 3]

In [28]:
a


array([1, 2, 3])

In [30]:
[x * 5 for x in l]

[5, 10, 15]

In [32]:
l + l


[1, 2, 3, 1, 2, 3]

In [35]:
'1' + '1'

'11'

In [37]:
a + a

array([2, 4, 6])
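
The cells above contrast list concatenation with elementwise array arithmetic. A short sketch of the same idea, including the array equivalent of the list comprehension:

```python
import numpy as np

l = [1, 2, 3]
a = np.array(l)

print(a * 5)   # elementwise multiply: [ 5 10 15]
print(l * 2)   # list repetition: [1, 2, 3, 1, 2, 3]
print(a + 10)  # the scalar is applied to every element: [11 12 13]
```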

### Performance

In [45]:
big_arr = np.arange(4.5e6)
big_list = big_arr.tolist()


In [54]:
%timeit -n5 square = [x ** 2 for x in big_list]


580 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [51]:
%timeit -n5 square = big_arr ** 2

12.7 ms ± 2.04 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
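
`%timeit` only works inside IPython/Jupyter. Outside a notebook, roughly the same comparison can be sketched with the standard `timeit` module. The array here is smaller than the one above to keep the run short, and the timings will vary by machine.

```python
import timeit
import numpy as np

big_arr = np.arange(1e5)
big_list = big_arr.tolist()

# Time 5 runs of each approach, like -n5 above.
t_list = timeit.timeit(lambda: [x ** 2 for x in big_list], number=5)
t_arr = timeit.timeit(lambda: big_arr ** 2, number=5)

print(f"list comprehension: {t_list:.4f} s   array: {t_arr:.4f} s")
```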


In [43]:
4_500_000

4500000

### Determining dimensions
* size --> total number of elements ("rows" * "columns" for a 2D array)
* itemsize --> number of bytes used by a single value
* itemsize * size = bytes of memory the array uses
* shape --> tuple holding the length of each dimension
* ndim --> number of dimensions, as an int
* reshape --> returns the same data with a new shape; the original array is unchanged
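
A small sketch tying these attributes together. The dtype is set explicitly so that `itemsize` does not depend on the platform's default integer type; `m` is a made-up name.

```python
import numpy as np

m = np.arange(12, dtype=np.int64).reshape(3, 4)

print(m.size)               # 12 elements (3 * 4)
print(m.itemsize)           # 8 bytes per int64 value
print(m.size * m.itemsize)  # 96 bytes in total
print(m.nbytes)             # 96, the same number computed by NumPy
print(m.shape, m.ndim)      # (3, 4) 2
```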

In [56]:
r = np.arange(15)
r

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [58]:
r.size

15

* an 8-bit value ranges from 0 to 255
* an RGB colour image is a 3D array: three 8-bit channels = 24 bits per pixel
* a black-and-white (grayscale) image is stored in a 2D array
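
A sketch of how those image notes map onto arrays. The tiny `height` and `width` are made-up values, and the arrays are all zeros rather than real image data.

```python
import numpy as np

height, width = 4, 6  # hypothetical tiny image

gray = np.zeros((height, width), dtype=np.uint8)    # one 8-bit value per pixel
rgb = np.zeros((height, width, 3), dtype=np.uint8)  # three 8-bit channels per pixel

print(gray.ndim, rgb.ndim)                # 2 3
bits_per_pixel = rgb.itemsize * 8 * rgb.shape[2]
print(bits_per_pixel)                     # 24
info = np.iinfo(np.uint8)
print(info.min, info.max)                 # 0 255
```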

In [67]:
r.itemsize # bytes per value; depends on the dtype (4 means 32-bit integers here)

4

In [69]:
r.shape

(15,)

In [83]:
r.reshape(5,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [75]:
r.reshape(5,3).shape

(5, 3)

In [73]:
r


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [77]:
r.reshape(-1) # or r.reshape(15); returns a 1D array (r.reshape(5,3) above returned a 2D one)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [91]:
a = r.reshape(5, 3)

### "Slicing"

In [93]:
a[1,:]

array([3, 4, 5])
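
A few more slicing patterns on the same 5 x 3 array, as a sketch. The last lines show that a basic slice is a view onto the original data, not a copy.

```python
import numpy as np

a = np.arange(15).reshape(5, 3)

print(a[:, 0])     # first column: [ 0  3  6  9 12]
print(a[1:3, :2])  # rows 1-2, columns 0-1
print(a[-1])       # last row: [12 13 14]

row = a[1, :]
row[0] = 99        # writing through the slice also changes a
print(a[1, 0])     # 99
```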

### Special arrays
* useful for dimensioning an array before you know what values it will hold

In [95]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [97]:
np.ones((5,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [99]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [101]:
np.eye(5,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [103]:
np.empty(5) # uninitialized: values are arbitrary leftover memory, not guaranteed to be 1s

array([1., 1., 1., 1., 1.])
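
Because `np.empty` does not initialize its contents, a safe pattern is to assign every element before reading any. A sketch follows, with `np.full` as an alternative when the fill value is known up front; `buf` and `filled` are made-up names.

```python
import numpy as np

buf = np.empty(5)         # allocate only; contents are arbitrary
buf[:] = 0.0              # overwrite every element before using it
print(buf)

filled = np.full(5, 7.0)  # allocate and fill in one step
print(filled)
```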

### Working with random numbers
* many different distribution options
* normal and uniform most common
* normal, standard_normal
* randn, randint

### Standard normal
* mean 0
* std 1
* takes a single optional argument --> size (an int or a tuple of ints)
* `np.random.standard_normal(size=None)`

In [109]:
np.random.standard_normal(10)

array([-0.47831425,  2.24225659, -2.08745086, -0.21028412,  1.47997374,
        1.59099623, -0.11443272,  0.47651478,  1.49210799,  0.82908787])
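
A quick check of the mean 0 / std 1 claim on a larger sample, as a sketch. The seed and the sample size are arbitrary choices.

```python
import numpy as np

np.random.seed(0)                        # arbitrary seed, for repeatability
z = np.random.standard_normal(100_000)
print(z.mean(), z.std())                 # close to 0 and 1

m = np.random.standard_normal((2, 3))    # size can also be a tuple
print(m.shape)                           # (2, 3)
```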


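### Uniform Distributions
* `np.random.randn(d0, d1, ..., dn)` --> draws from the standard normal, not a uniform distribution; for continuous uniform values use `np.random.uniform(low=0.0, high=1.0, size=None)`
* `np.random.randint(low, high=None, size=None, dtype=int)`
    * draws integers uniformly from low up to, but not including, high --> called a half-open interval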

In [115]:
# notice we never get to 6...
npr.randint(1,6)

1
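
A sketch that makes the half-open interval visible by drawing many values at once. To simulate a six-sided die, `high` has to be 7. The seed is arbitrary.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for repeatability

draws = np.random.randint(1, 6, size=10_000)
print(draws.min(), draws.max())  # the maximum never reaches 6

die = np.random.randint(1, 7, size=10_000)
print(die.min(), die.max())      # 6 is now possible
```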

### Normal
* loc = mean
* scale = std
* if not specified mean = 0, std = 1

In [124]:
x = npr.normal(loc=100, scale=10, size=100)  # loc = mean, scale = std

In [126]:
print(x.mean())
print(x.std())
print(x.max())
print(x.min())
# arrays have no median method; use np.median(x) instead

100.02127088779686
9.771923955982194
122.19500107418409
81.22084905369115
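
Since arrays have no `median` method, the function form is used instead. A sketch, with an arbitrary seed, which also shows that the median equals the 50th percentile:

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, for repeatability
x = np.random.normal(loc=100, scale=10, size=100)

med = np.median(x)
p50 = np.percentile(x, 50)
print(med, p50)  # the two values agree
```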


In [134]:
percentile = [5, 10, 25, 50, 75, 90, 95]
np.percentile(x, percentile)

array([-1.13962172, -1.09584505, -0.70898091, -0.12959075,  0.67813038,
        1.22656313,  1.38008202])

### Simple Linear Modeling

In [59]:
slope, intercept = np.polyfit(x,y, deg=1)
slope

0.9617250638432819

### More syntactically intense method to do the same thing
* Requires a "dummy" variable (a column of ones) next to the actual x variable
* The rcond argument sets the cutoff below which extremely small singular values are treated as zero
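
The notes do not name the function, so the sketch below assumes they mean `np.linalg.lstsq`, which fits the "dummy" variable and `rcond` description. The data is generated the same way as in the cells of this section, with an arbitrary seed.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, for repeatability
x = np.random.normal(size=50)
y = x + np.random.normal(loc=10, size=50)

# Design matrix: x next to a column of ones (the "dummy" variable),
# so the fit can estimate an intercept as well as a slope.
A = np.vstack([x, np.ones(len(x))]).T

# rcond=None uses NumPy's default cutoff for tiny singular values.
(slope, intercept), residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)
print(np.polyfit(x, y, deg=1))  # should match the two numbers above
```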

In [128]:
x = npr.normal(size=50)
y = x + npr.normal(loc=10, size=50)

In [130]:
np.corrcoef(x,y)

array([[1.        , 0.73780949],
       [0.73780949, 1.        ]])

In [132]:
np.polyfit(x,y, deg=1)

array([ 1.11793172, 10.03655916])

### Fill array with uniformly spaced data points between a minimum and maximum
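
A sketch of what this heading describes, assuming it refers to `np.linspace`. The endpoints and the number of points are arbitrary.

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both endpoints included
pts = np.linspace(0, 1, 5)
print(pts)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False leaves out the maximum
open_pts = np.linspace(0, 1, 5, endpoint=False)
print(open_pts)
```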