# numpy/pandas

## numpy
* fundamental package for scientific computing (i.e., numerics and mathematics) with Python
* vector-oriented computing
* efficiently implemented multi-dimensional arrays
* how are numpy arrays different from Python containers?
 * Python variables are references: each value is a separate object with its own space in memory, and the variable points (refers) to it
   * inefficient for large collections of values that all have the same type
 * numpy arrays reserve a single block of memory and store all of the values contiguously
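
The difference in memory layout can be checked directly. The following is a minimal sketch, assuming a 1000-element example (the names `values`, `arr`, and `list_bytes` are illustrative); it compares the approximate footprint of a list of Python integers with the single buffer of an `int64` array.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list stores references; every int is a separate Python object,
# so the total is the list itself plus each object it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An ndarray stores the raw values back to back in one buffer.
print(arr.itemsize)               # bytes per element (8 for int64)
print(arr.nbytes)                 # itemsize * number of elements
print(arr.flags['C_CONTIGUOUS'])  # True for a freshly created array
print(list_bytes > arr.nbytes)    # the list needs more memory
```

The exact list size varies by Python version and platform, so only the comparison is printed, not the byte count.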

![alt-text](array_vs_list.png 'array vs. list')

![alt-text](numpy-array.jpg 'numpy-array')

## numpy datatypes
* __`numpy`__ is very precise about identifying datatypes
* several types of integers: __`numpy.int8`__, __`numpy.int16`__, __`numpy.int32`__, __`numpy.int64`__ (also unsigned)
* __`numpy.float32`__, __`numpy.float64`__, __`numpy.float128`__ (availability of `float128` depends on the platform; also complex types)
* boolean
* string, Unicode string (like Python strings, but with a fixed maximum length that is set when the array is created)

Read more: https://docs.scipy.org/doc/numpy/user/basics.types.html
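
As a rough illustration of what "precise about datatypes" means in practice, the sketch below inspects the range of two integer types and shows the fixed length of a Unicode string array. The array `names` and its contents are made up for the example.

```python
import numpy as np

# Integer types have fixed ranges that depend on their width.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)   # -128 127
uint8_max = np.iinfo(np.uint8).max    # 255 for the unsigned variant

# float32 has a larger machine epsilon (less precision) than float64.
print(np.finfo(np.float32).eps > np.finfo(np.float64).eps)

# Fixed-length Unicode strings: 'U3' holds at most 3 characters,
# so longer values are silently truncated on assignment.
names = np.array(['abc', 'de'], dtype='U3')
names[1] = 'numpy'
print(names[1])                       # only the first 3 characters are kept
```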

## creating numpy arrays

In [1]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
a # repr() is being called

array([1, 2, 3, 4, 5])

In [2]:
type(a), a.dtype

(numpy.ndarray, dtype('int64'))

In [3]:
# types matter for ndarrays!
a[0] = 34.7 # OK: converted to int, so the fractional part is dropped (stored as 34)
a[0] = 'x'  # raises ValueError: 'x' cannot be converted to int
a

ValueError: invalid literal for int() with base 10: 'x'

In [4]:
a

array([34,  2,  3,  4,  5])
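
The same behavior can be reproduced without stopping at the exception. This is a small sketch; the use of `astype` to get a float copy is an addition that does not appear in the cells above.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])   # integer dtype inferred from the values
a[0] = 34.7                     # stored as 34: the fraction is dropped
print(a[0])

try:
    a[0] = 'x'                  # cannot be converted to an integer
except ValueError as err:
    print('ValueError:', err)

# astype() returns a new array with the requested dtype; `a` is unchanged.
f = a.astype(np.float64)
f[0] = 34.7
print(f[0], a[0])
```

Because the array's dtype is fixed at creation, converting with `astype` (or passing `dtype=` up front) is the usual way to store values the original type cannot represent.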

In [5]:
# If need be, you can specify type
a = np.array([1, 2, 3, 4, 5], dtype=np.float64)
a

array([1., 2., 3., 4., 5.])

In [6]:
a.ndim, a.shape, a.size

(1, (5,), 5)

In [7]:
# unlike Python lists, NumPy arrays can be
# multi-dimensional
b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
             dtype=np.float64)
b

array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.]])

In [None]:
# ...or initialize using a list comprehension
np.array([range(i, i + 3) for i in [3, 5, 7]])

In [None]:
b, b.ndim, b.shape, b.size

## Creating arrays from scratch
* especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy

In [8]:
np.zeros((4, 6), dtype=int)

array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

In [None]:
np.empty((4, 4), dtype='float64')

In [9]:
np.full((3, 9), 3.14159)

array([[3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159,
        3.14159, 3.14159],
       [3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159,
        3.14159, 3.14159],
       [3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159, 3.14159,
        3.14159, 3.14159]])

In [None]:
# linear sequence, similar to range()
np.arange(0, 10, 2)

In [None]:
# five evenly spaced values between 0 and 10 (both endpoints included)
np.linspace(0, 10, 5)

In [10]:
# 3x3 array of uniformly distributed random values in the interval [0, 1)
np.random.random((3, 3))

array([[0.31665596, 0.22980838, 0.59220783],
       [0.52468797, 0.83517942, 0.5913187 ],
       [0.34821106, 0.62099008, 0.95631668]])

In [None]:
np.random.standard_normal((2, 4))

In [None]:
# 3x3 array of normally distributed random values with mean 0 and standard deviation 2
np.random.normal(0, 2, (3, 3))

In [None]:
# 4x4 array of random integers in interval [0, 100)
np.random.randint(0, 100, (4, 4))

In [None]:
# identity matrix
np.eye(8, dtype='float32')
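
Two details of these routines are easy to miss, so here is a short sketch of them. It assumes the newer `np.random.default_rng` generator, which is not used in the cells above, as one way to make random output repeatable; the seed value is arbitrary.

```python
import numpy as np

# arange excludes the stop value, like range()
steps = np.arange(0, 10, 2)
print(steps)                    # [0 2 4 6 8]

# linspace includes both endpoints by default
points = np.linspace(0, 10, 5)
print(points)                   # values 0, 2.5, 5, 7.5, 10

# A seeded generator gives the same "random" numbers on every run.
rng = np.random.default_rng(42)       # assumption: 42 is an arbitrary seed
uniform = rng.random((3, 3))          # floats in [0, 1)
ints = rng.integers(0, 100, (4, 4))   # integers in [0, 100)
print(uniform.shape, ints.shape)
```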

## indexing/slicing

In [None]:
a = np.linspace(0, 10, 5)
a

In [None]:
a[3]

In [None]:
aa = np.random.random((5, 4))
aa

In [None]:
aa[1, 2]

In [None]:
aa[:, 2:4] # extract columns 2 and 3 (all rows)

In [None]:
aa[2:5, 1] # extract rows 2-4 of column 1

In [None]:
aa[::-1]

In [None]:
aa[::-1, ::-1]
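
Random values make it hard to see which elements a slice picks out. The sketch below repeats the same indexing expressions on an array with predictable contents, built with `np.arange(...).reshape(...)`, which is an assumption made for this example and not part of the cells above.

```python
import numpy as np

aa = np.arange(20).reshape(5, 4)   # rows 0..4, columns 0..3
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]]

print(aa[1, 2])        # row 1, column 2 -> 6
print(aa[:, 2:4])      # all rows, columns 2 and 3
print(aa[2:5, 1])      # rows 2-4 of column 1 -> [ 9 13 17]
print(aa[::-1])        # rows in reverse order
print(aa[::-1, ::-1])  # rows and columns both reversed
```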

## Manipulating numpy arrays

In [11]:
a = np.random.standard_normal((2, 4))
b = np.random.standard_normal((2, 4))
a, b

(array([[ 1.70529928,  0.01306013,  0.61475412,  0.08243156],
        [-0.68335118,  1.38442753,  0.34659512, -0.91340872]]),
 array([[ 0.42010643, -0.16590015, -1.23023684,  0.29593086],
        [-1.19920541,  0.35262114, -0.23541494, -0.13774255]]))

In [12]:
np.vstack([a, b])

array([[ 1.70529928,  0.01306013,  0.61475412,  0.08243156],
       [-0.68335118,  1.38442753,  0.34659512, -0.91340872],
       [ 0.42010643, -0.16590015, -1.23023684,  0.29593086],
       [-1.19920541,  0.35262114, -0.23541494, -0.13774255]])

In [13]:
np.hstack([a, b])

array([[ 1.70529928,  0.01306013,  0.61475412,  0.08243156,  0.42010643,
        -0.16590015, -1.23023684,  0.29593086],
       [-0.68335118,  1.38442753,  0.34659512, -0.91340872, -1.19920541,
         0.35262114, -0.23541494, -0.13774255]])

In [None]:
a.transpose()
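
The effect of these operations is easiest to follow through the resulting shapes. This is a minimal sketch using small integer arrays in place of the random ones above.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)       # [[0 1 2 3], [4 5 6 7]]
b = np.arange(8, 16).reshape(2, 4)   # [[8 9 10 11], [12 13 14 15]]

v = np.vstack([a, b])   # stack vertically: rows of b go below rows of a
h = np.hstack([a, b])   # stack horizontally: columns of b go to the right of a
t = a.transpose()       # swap rows and columns (same as a.T)

print(v.shape, h.shape, t.shape)   # (4, 4) (2, 8) (4, 2)
```

`vstack` needs matching column counts and `hstack` needs matching row counts; otherwise NumPy raises a `ValueError`.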

## Saving/Loading a numpy array

In [None]:
np.save('/tmp/a.npy', a)
a1 = np.load('/tmp/a.npy')
a1
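
A round trip through `.npy` should preserve the values, dtype, and shape. The sketch below checks this using a temporary directory in place of the hard-coded `/tmp` path, which is an assumption made so that the example also runs on systems without `/tmp`.

```python
import os
import tempfile
import numpy as np

a = np.arange(8, dtype=np.float64).reshape(2, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'a.npy')
    np.save(path, a)        # binary .npy format, stores dtype and shape
    a1 = np.load(path)      # fully read into memory before cleanup

print(np.array_equal(a, a1), a1.dtype, a1.shape)
```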

## Performing math on numpy arrays

In [14]:
x = np.linspace(0, 10, 1000)
x

array([ 0.        ,  0.01001001,  0.02002002,  0.03003003,  0.04004004,
        0.05005005,  0.06006006,  0.07007007,  0.08008008,  0.09009009,
        0.1001001 ,  0.11011011,  0.12012012,  0.13013013,  0.14014014,
        0.15015015,  0.16016016,  0.17017017,  0.18018018,  0.19019019,
        0.2002002 ,  0.21021021,  0.22022022,  0.23023023,  0.24024024,
        0.25025025,  0.26026026,  0.27027027,  0.28028028,  0.29029029,
        0.3003003 ,  0.31031031,  0.32032032,  0.33033033,  0.34034034,
        0.35035035,  0.36036036,  0.37037037,  0.38038038,  0.39039039,
        0.4004004 ,  0.41041041,  0.42042042,  0.43043043,  0.44044044,
        0.45045045,  0.46046046,  0.47047047,  0.48048048,  0.49049049,
        0.5005005 ,  0.51051051,  0.52052052,  0.53053053,  0.54054054,
        0.55055055,  0.56056056,  0.57057057,  0.58058058,  0.59059059,
        0.6006006 ,  0.61061061,  0.62062062,  0.63063063,  0.64064064,
        0.65065065,  0.66066066,  0.67067067,  0.68068068,  0.69

In [16]:
%time sinx = np.sin(x)
# "universal" function which operates on entire array!
sinx

CPU times: user 212 µs, sys: 112 µs, total: 324 µs
Wall time: 335 µs


array([ 0.        ,  0.01000984,  0.02001868,  0.03002552,  0.04002934,
        0.05002916,  0.06002396,  0.07001275,  0.07999452,  0.08996827,
        0.09993302,  0.10988774,  0.11983146,  0.12976317,  0.13968188,
        0.14958659,  0.15947632,  0.16935006,  0.17920684,  0.18904566,
        0.19886554,  0.20866549,  0.21844453,  0.22820168,  0.23793597,
        0.24764642,  0.25733206,  0.26699191,  0.276625  ,  0.28623038,
        0.29580708,  0.30535414,  0.3148706 ,  0.32435552,  0.33380793,
        0.3432269 ,  0.35261147,  0.36196071,  0.37127369,  0.38054946,
        0.3897871 ,  0.39898569,  0.4081443 ,  0.41726201,  0.42633791,
        0.4353711 ,  0.44436066,  0.45330569,  0.46220531,  0.47105861,
        0.47986471,  0.48862273,  0.49733179,  0.50599102,  0.51459954,
        0.52315651,  0.53166105,  0.54011232,  0.54850948,  0.55685167,
        0.56513807,  0.57336784,  0.58154016,  0.58965421,  0.59770917,
        0.60570425,  0.61363863,  0.62151153,  0.62932216,  0.63

In [17]:
%%timeit
for i in range(0, 1000):
    sinx[i] = np.sin(x[i])

959 µs ± 3.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
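
Outside a notebook, where the `%time` and `%%timeit` magics are unavailable, a similar comparison can be sketched with the standard `timeit` module. The helper `sin_loop` and the repetition count are illustrative choices; the timings depend on the machine, so none are shown here.

```python
import timeit
import numpy as np

x = np.linspace(0, 10, 1000)

def sin_loop(values):
    """Apply np.sin one element at a time (the slow way)."""
    out = np.empty_like(values)
    for i in range(len(values)):
        out[i] = np.sin(values[i])
    return out

vectorized = np.sin(x)      # one call operates on the entire array
looped = sin_loop(x)
print(np.allclose(vectorized, looped))   # both approaches agree

t_vec = timeit.timeit(lambda: np.sin(x), number=200)
t_loop = timeit.timeit(lambda: sin_loop(x), number=200)
print(f'vectorized: {t_vec:.4f} s, loop: {t_loop:.4f} s')
```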


## __`numpy`__ Datetime Object

In [None]:
np.datetime64('2016')

In [None]:
np.datetime64('2016-03')

In [None]:
np.datetime64('2016-03-31 08:30:00')

In [None]:
np.datetime64('2016-03-07') < np.datetime64('2016-03-09')

In [None]:
np.datetime64('2016-03-09') - np.datetime64('2016-03-07')

In [None]:
np.datetime64('2016-01-01') + np.timedelta64(59, 'D')

In [None]:
np.arange(np.datetime64('2016-02-01'),
          np.datetime64('2016-03-01'))
#np.timedelta64(67,'D') / np.timedelta64(1, 'W')
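
For reference, the sketch below collects the date arithmetic from the cells above in one place, including the commented-out division of two time spans. The variable names are illustrative.

```python
import numpy as np

# Adding 59 days to January 1 lands on February 29, since 2016 is a leap year.
d = np.datetime64('2016-01-01') + np.timedelta64(59, 'D')
print(d)       # 2016-02-29

# Subtracting two dates gives a timedelta64.
delta = np.datetime64('2016-03-09') - np.datetime64('2016-03-07')
print(delta)   # 2 days

# arange over dates steps one day at a time and excludes the end date.
feb = np.arange(np.datetime64('2016-02-01'), np.datetime64('2016-03-01'))
print(len(feb), feb[0], feb[-1])

# Dividing two time spans gives a plain float: 67 days expressed in weeks.
weeks = np.timedelta64(67, 'D') / np.timedelta64(1, 'W')
print(weeks)
```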