## Basic NumPy Functionality

### NumPy n-dim array
* Part of a data "ecosystem" that includes data science libraries, machine learning, and image processing
* NumPy is huge and very powerful --> we will just hit the highlights
* optimized array type (`ndarray`), faster and more memory-efficient than lists for numeric work
* can have any number of dimensions, though 1 or 2 is typical
* restricted to a single data type (dtype) per array
* size is fixed at creation, so you "dimension" an array when you initialize it
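
The single-data-type rule is easiest to see by mixing types. A minimal sketch (the variable names `mixed` and `ints` are only illustrative):

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts every element to one float dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Writing a float into an integer array truncates it to fit the dtype.
ints = np.array([1, 2, 3])
ints[0] = 9.9
print(ints[0])  # 9
```

A list would happily hold `[1, 2, 3.5]` as two ints and a float; the array converts everything to one type up front, which is part of what makes it fast.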

In [78]:
import numpy as np

In [79]:
l = [1, 2, 3] 
a = np.array([1,2,3])

In [80]:
l + l

[1, 2, 3, 1, 2, 3]

In [7]:
a + a

array([2, 4, 6])

In [8]:
a + 1

array([2, 3, 4])

In [9]:
a

array([1, 2, 3])

In [81]:
a = a + 1
a

array([2, 3, 4])

### Performance

In [112]:
big_arr = np.arange(4.5e6)
big_list = big_arr.tolist()

In [114]:
%timeit -n10  square = [x ** 2 for x in big_list]

1.14 s ± 33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [115]:
%timeit -n10 square = big_arr ** 2

15.5 ms ± 866 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


### Determining dimensions
* size --> total number of elements ("rows" * "columns" for a 2-D array)
* itemsize --> number of bytes used by a single value
* itemsize * size = bytes of memory the array data uses
* shape --> tuple giving the length of each dimension
* ndim --> number of dimensions (int)
* reshape --> returns the same data with a new shape; it does not change the original array's shape

In [84]:
a = np.arange(15)

In [85]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [86]:
a.size

15

In [87]:
a.itemsize

4

In [44]:
a = np.arange(15)

In [45]:
print(f"{a.itemsize * a.size:,d} bytes")

60 bytes


In [46]:
a.shape

(15,)

In [47]:
a.reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [48]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [89]:
a = a.reshape(3,5)

In [91]:
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [90]:
a.shape

(3, 5)

In [92]:
a.ndim

2

In [93]:
a.reshape(-1)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
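
Two details of the cells above are worth a quick check: `-1` asks `reshape` to infer a dimension, and the `nbytes` attribute reports the same figure as `itemsize * size`. A small sketch, assuming a freshly created array so that `reshape` can return a view:

```python
import numpy as np

a = np.arange(15)

# nbytes is the same number as itemsize * size.
print(a.nbytes == a.itemsize * a.size)  # True

# -1 means "work this dimension out from the total size": 15 / 3 = 5.
b = a.reshape(3, -1)
print(b.shape)  # (3, 5)

# For a contiguous array like this one, reshape returns a view,
# so writing through b also changes a.
b[0, 0] = 99
print(a[0])  # 99
```

Use `a.reshape(3, 5).copy()` when an independent array is needed.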

### "Slicing"

In [94]:
a[0]

array([0, 1, 2, 3, 4])

In [95]:
a[:,0]

array([ 0,  5, 10])

In [96]:
a[0,0]

0
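
The cells above pick out a single row, column, or value. The same `[rows, columns]` syntax also accepts ranges, as in this short sketch (names such as `block` and `last_col` are illustrative only):

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# Rows 0-1 and columns 1-3 (the stop index is excluded, as with lists).
block = a[0:2, 1:4]
print(block)
# [[1 2 3]
#  [6 7 8]]

# Negative indices count from the end: this is the last column.
last_col = a[:, -1]
print(last_col)  # [ 4  9 14]
```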

### Special arrays
* useful for dimensioning an array before you know what values it will hold

In [24]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [25]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [26]:
np.eye(5,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [27]:
np.empty([2,2])

array([[2.12199579e-314, 1.10131582e-311],
       [5.55329786e-321, 6.95314444e-310]])
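
Note that `np.empty` only allocates memory, so the values shown are whatever happened to be there. A minimal sketch of the "dimension first, fill later" pattern, using a shape tuple and an explicit `dtype` (the names `grid` and `filled` are illustrative):

```python
import numpy as np

# Allocate a 2x3 integer array of zeros, then fill in one row later.
grid = np.zeros((2, 3), dtype=int)
grid[1, :] = [7, 8, 9]
print(grid)
# [[0 0 0]
#  [7 8 9]]

# np.full creates an array pre-filled with any constant.
filled = np.full((2, 2), 3.14)
print(filled.shape)  # (2, 2)
```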

### Working with random numbers
* many different distribution options
* normal and uniform most common
* normal, standard_normal
* randn, randint

### Standard normal
* mean 0
* std 1
* takes a single optional argument --> size (number of samples, or a shape tuple)
* `np.random.standard_normal(size=None)`

In [76]:
np.random.standard_normal()

-0.38420084699993906

In [77]:
np.random.standard_normal(5)

array([ 0.15851874, -0.2058755 ,  0.19471779,  0.57710061,  0.19860551])

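### randn and randint
* `np.random.randn(d0, d1, ..., dn)` --> samples from the standard normal distribution (not uniform); dimensions are passed as positional arguments
* `np.random.randint(low, high=None, size=None, dtype=int)` --> uniformly distributed integers
    * excludes high --> called a half-open interval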

In [97]:
np.random.randn()

0.17112045560624875

In [98]:
np.random.randn(5)

array([-0.83703911, -1.47547863,  0.02687145, -0.26140746,  1.94346588])

In [101]:
np.random.randint(10)

2

In [104]:
# notice we never get to 6...
np.random.randint(1, 6, 10)

array([5, 3, 4, 5, 1, 5, 3, 4, 2, 4])
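
The functions above belong to NumPy's legacy random interface. NumPy's documentation recommends the `Generator` interface, created with `np.random.default_rng`, for new code. A sketch of equivalent draws, where the seed `42` is an arbitrary choice made only so the run is repeatable:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

normals = rng.standard_normal(5)                   # mean 0, std 1
rolls = rng.integers(1, 6, size=10)                # half-open: 1..5, never 6
dice = rng.integers(1, 6, size=10, endpoint=True)  # closed: 1..6

print(rolls.min() >= 1, rolls.max() <= 5)  # True True
```

Passing `endpoint=True` is the `Generator` way to include the upper bound, which `np.random.randint` cannot do.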

### Normal
* loc = mean
* scale = std
* if not specified mean = 0, std = 1

In [28]:
np.random.normal(size=50)

array([ 0.61499075, -0.79031338,  2.20985409, -0.82819349,  0.44362538,
       -0.47206279,  1.05962155, -1.37971882, -0.11556924,  0.9768736 ,
       -1.54724814, -0.01042032, -0.26883774, -0.58252071,  0.03313654,
       -1.00009154, -1.03485552, -0.63342535, -1.25841989, -0.86888451,
        0.68204029, -1.30427489,  0.08323465,  0.8848538 ,  2.50112189,
       -1.02638804,  1.17810789, -1.76675899, -2.58325409,  0.3511461 ,
        0.51929291,  1.31668393,  1.80715959, -1.03982153, -0.66266492,
        1.65595351,  0.66003959, -1.00065746, -1.30856493, -1.08279799,
       -1.50234048,  1.71406053, -1.03610557, -0.13577955, -0.22113857,
       -0.52451153, -1.58625989,  0.62636419,  1.0445933 ,  1.86347383])

In [109]:
# draw 100 samples with mean 2 and std 5
x = np.random.normal(loc=2, scale=5, size=100)

In [110]:
print(x.mean())
print(x.std())
print(x.max())
print(x.min())
# arrays have no .median() method; use np.median(x) instead

1.690824604612041
5.055112588427487
16.823582999433327
-13.756313531762975
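
Since arrays lack a `median` method, the function form is the way to get it. A small sketch on a hand-made array (values chosen only so the result is easy to verify), which also shows that `std` defaults to the population formula:

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 10.0])

# Median is a module-level function, not an array method.
print(np.median(x))  # 2.5

# std() divides by n by default; ddof=1 divides by n - 1 (sample std).
population_std = x.std()
sample_std = x.std(ddof=1)
print(sample_std > population_std)  # True
```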


In [111]:
l = np.random.normal(loc=10, size=50)
print(f"{l.mean():.3f}")
print(f"{l.std():.3f}")      

10.298
1.053


### Simple Linear Modeling

In [33]:
x = np.random.normal(size=50)
y = x + np.random.normal(loc=10, size=50)

In [35]:
np.corrcoef(x,y)

array([[1.        , 0.58903706],
       [0.58903706, 1.        ]])

In [36]:
np.polyfit(x,y, deg=1)

array([1.05761848, 9.92043801])

In [None]:
slope, intercept = np.polyfit(x,y, deg=1)

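### More syntactically intense method to do the same thing
* Requires a "dummy" column of ones next to the actual x values so the fit includes an intercept
* `rcond` sets the cutoff below which small singular values are treated as zero; `rcond=None` uses a default based on machine precision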

In [37]:
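# design matrix: x in the first column, ones in the second (for the intercept)
x = np.vstack([x, np.ones(len(x))]).T

# lstsq returns (solution, residuals, rank, singular values); [0] is the solution
m, b = np.linalg.lstsq(x,y, rcond=None)[0]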

In [38]:
m

1.0576184800045452

In [39]:
b

9.920438006191198
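
Both approaches solve the same least-squares problem, so they should agree to within floating-point error. A self-contained sketch of that check, assuming a seeded generator (seed `0` is arbitrary) and a separately named design matrix `A` so that `x` is not overwritten:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
x = rng.normal(size=50)
y = x + rng.normal(loc=10, size=50)

# Method 1: degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Method 2: least squares on a design matrix [x, 1].
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.lstsq(A, y, rcond=None)[0]

print(np.allclose([slope, intercept], [m, b]))  # True
```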

### Fill array with uniformly spaced data points between a minimum and maximum

In [64]:
np.linspace(-4,4,21)

array([-4. , -3.6, -3.2, -2.8, -2.4, -2. , -1.6, -1.2, -0.8, -0.4,  0. ,
        0.4,  0.8,  1.2,  1.6,  2. ,  2.4,  2.8,  3.2,  3.6,  4. ])
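
Unlike `np.arange`, `np.linspace` takes the number of points rather than the step and includes both endpoints by default. A short sketch using the documented `retstep` and `endpoint` parameters:

```python
import numpy as np

# retstep=True also returns the spacing between points: 8 / 20 = 0.4.
pts, step = np.linspace(-4, 4, 21, retstep=True)
print(len(pts), pts[0], pts[-1])  # 21 -4.0 4.0

# endpoint=False drops the stop value, like a half-open interval.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```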