## Basic NumPy Functionality

### NumPy n-dim array
* Part of a Data "ecosystem" which includes data science libaries, machine learning and image processing
* NumPy is huge and very powerful --> we will just hit the highlights
* optimized array, superior to lists
* can be "any" number of dimensions, typically 1 or 2 dims though
* restricted to a single data type
* required to "dimension" your arrays when you initialize them 

In [52]:
import numpy as np
import numpy.random as npr

In [11]:
l = [1, 2, 3] 
a = np.array(l)
l

[1, 2, 3]

In [12]:
l

[1, 2, 3]

In [13]:
a

array([1, 2, 3])

In [14]:
[x*5 for x in l]

[5, 10, 15]

In [15]:
a*5

array([ 5, 10, 15])

In [16]:
a+a

array([2, 4, 6])

### Performance

In [17]:
l+l

[1, 2, 3, 1, 2, 3]

In [22]:
big_arr= np.arange(4.5e6)
big_list= big_arr.tolist() 

In [18]:
4_500_000

4500000

In [23]:
%timeit -n5 square = [x**2 for x in big_list]

295 ms ± 3.99 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [24]:
%timeit -n5 square = big_arr**2

4.81 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


### Determining dimensions
* size = "rows" * "columns"
* itemsize --> number of bytes used by a single value
* itemsize * size = bytes of memory the array uses
* shape --> comma separated tuple of length of each dimension
* ndim --> int value of dimensions present
* reshape --> dimension manipulation

In [26]:
dda=np.arange(15)
dda

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [28]:
dda.size

15

In [29]:
dda.itemsize 
#space the item is using 

8

In [30]:
dda.shape

(15,)

In [31]:
dda.reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [34]:
dda.reshape(15,)
#another way dda.reshape(-1)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

### "Slicing"

In [37]:
dda[0:]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

### Special arrays
* useful for dimensioning an array before you aknow what values it will hold

In [38]:
np.ones((6,6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

In [39]:
np.zeros(6)

array([0., 0., 0., 0., 0., 0.])

In [41]:
np.eye(3,3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [40]:
np.empty(3)
#gives smth close to zero

array([0., 0., 0.])

### Working with random numbers
* many different distribution options
* normal and uniform most common
* normal, standard_normal
* randn, randint

### Standard normal
* mean 0
* std 1
* takes a single optional argument --> n
* `np.random.standard_normal([size])`

In [47]:
np.random.standard_normal([10])

array([-1.71124677, -0.68394341,  0.59989698,  0.481366  ,  0.73809773,
        0.80670263, -0.66002667,  0.32462231, -1.07456564, -0.29046536])

In [53]:
npr.standard_normal([10])

array([ 1.93584096, -0.41748412, -1.81873151, -0.08627307, -2.1939024 ,
        1.13802304,  0.37160721,  1.28505547, -0.74884888,  0.65522741])

### Uniform Distributions
* `np.random.randn(size=None)`
* `np.random.randint(low, high=None, size=None, dtype=int)`
    * excludes high --> called a half-open interval

In [54]:
npr.randint(1,5)

4

In [4]:
# notice we never get to 6...


### Normal
* loc = mean
* scale = std
* if not specified mean = 0, std = 1

In [56]:
x=npr.normal(loc=100, scale=10, size=1000)

In [57]:
print(x.mean())
print(x.std())
print(x.max())
print(x.min())
# there is no median method

99.81975614754877
9.992668703415113
134.04861223406266
65.7234766895583


In [58]:
percentiles = [5,10,25,50,75,90,95]
np.percentile(x,percentiles)

array([ 83.35241308,  87.14688103,  93.06735838,  99.90247731,
       106.60322166, 112.38836756, 116.07688009])

### Simple Linear Modeling

In [59]:
slope, intercept = np.polyfit(x,y, deg=1)
slope

0.9617250638432819

### More syntactically intense method to do the same thing
* Requires a "dummy" variable next to actual x variable
* The rcond allows for special handling of extremely small values 

In [59]:
x=npr.normal(size=50)
y=x+npr.normal(loc=10,size=50)

In [60]:
np.corrcoef(x,y)

array([[1.        , 0.71867852],
       [0.71867852, 1.        ]])

In [61]:
np.polyfit(x,y, deg=1)

array([ 1.0508583 , 10.04222492])

### Fill array with uniformly spaced data points between a minimum and maximum