# Numpy
NumPy (Numerical Python) is a Python package that provides, among other things, an alternative to Python lists: the NumPy array, which lets you apply mathematical calculations over entire arrays at once.

NumPy is great for doing vector arithmetic.

Installation: pip3 install numpy

**import numpy as np**


In [13]:
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79] # list
weight = [ 65.4, 59.2, 63.6, 88.4, 68.7] # list

np_height = np.array(height) # converts list to a NumPy array
np_weight = np.array(weight) # converts list to a NumPy array

In [12]:
bmi = weight / height ** 2 # simple BMI calculation using Lists (throws an error)
bmi

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [11]:
bmi = np_weight / np_height ** 2 # simple BMI calculation using NumPy arrays -- element-wise calculations
bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
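To make the contrast concrete, here is a minimal sketch of the same BMI calculation written once with plain lists (which need an explicit loop) and once with NumPy arrays. The names `bmi_list` and `bmi_array` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain lists: pair the elements up and compute each BMI in a loop
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# NumPy arrays: the same arithmetic is applied element-wise, no loop needed
bmi_array = np.array(weight) / np.array(height) ** 2

# Both approaches should agree up to floating-point rounding
print(np.allclose(bmi_list, bmi_array))
```

The list version works, but the arithmetic has to be spelled out per element; the array version expresses the whole calculation in one expression.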

### <span style="color:red">Numpy Characteristics</span>

#### NumPy Arrays: contain only one type
 - If you mix strings with other data types in a NumPy array, all elements are converted to strings
 - If you mix floats and integers, the integers are converted to floats
 - If you mix Booleans with integers or floats, True becomes 1 and False becomes 0, and the result takes the numeric type

In [7]:
np.array([1.0, "is", True]) # mixed data types - converts to string

array(['1.0', 'is', 'True'], dtype='|S32')

In [8]:
np.array([1.0, 5, 6])  # float and integer types - converts to float

array([1., 5., 6.])

In [9]:
np.array([True, 1, 2]) + np.array([3, 4, False]) # Boolean and integer types -- Boolean to 1 or 0

array([4, 5, 2])

In [10]:
np.array([True, 1, 2.0]) + np.array([3, 4, False]) # Boolean and float types -- Boolean to 1.0 or 0.0

array([4., 5., 2.])
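The type NumPy settled on can be checked with the `dtype` attribute, and it can be chosen up front with the `dtype` argument of `np.array`. The following is a small sketch; the variable names are illustrative, and the exact integer width depends on the platform.

```python
import numpy as np

mixed = np.array([1.0, "is", True])      # strings win: everything becomes a string
floats = np.array([1.0, 5, 6])           # float + int: everything becomes float
bools_ints = np.array([True, 1, 2])      # bool + int: everything becomes int

print(mixed.dtype)        # a string type (unicode on Python 3)
print(floats.dtype)       # float64
print(bools_ints.dtype)   # an integer type; the width is platform dependent

# Request a type explicitly instead of relying on automatic conversion
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)
```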

### <span style="color:red">NumPy Array Operations</span>

A NumPy array behaves differently from a list: arithmetic operators perform element-wise calculations rather than concatenation or repetition.

In [14]:
python_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

python_list + python_list # lists are concatenated

[1, 2, 3, 1, 2, 3]

In [15]:
np_array + np_array # arrays are added

array([2, 4, 6])

In [16]:
python_list * 3 # lists are repeated

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [17]:
np_array * 3 # array is multiplied

array([3, 6, 9])

In [18]:
python_list * python_list # list multiplication not allowed

TypeError: can't multiply sequence by non-int of type 'list'

In [19]:
np_array * np_array # array is multiplied with itself

array([1, 4, 9])

In [20]:
bmi[0] = 20.0 # you can assign a new value to an existing element
print(bmi)

[20.         20.97505669 21.75028214 24.7473475  21.44127836]


### <span style="color:red">NumPy Array Subsetting</span>

**NumPy array subsetting works the same way as list subsetting, with square brackets [ ].**

In [21]:
print(bmi)
bmi[1]

[20.         20.97505669 21.75028214 24.7473475  21.44127836]


20.97505668934241

**NumPy arrays have an extra capability: you can filter elements by putting a Boolean condition inside the brackets.**

In [22]:
bmi > 23

array([False, False, False,  True, False])

In [23]:
bmi[bmi < 24] # returns only the elements that satisfy the condition inside the brackets

array([20.        , 20.97505669, 21.75028214, 21.44127836])
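Several conditions can be combined in one filter. The sketch below assumes the `bmi` values shown above and uses the element-wise operators `&` (and) and `|` (or); each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`.

```python
import numpy as np

bmi = np.array([20.0, 20.97505669, 21.75028214, 24.7473475, 21.44127836])

# Element-wise AND: keep values strictly between 21 and 24
in_range = bmi[(bmi > 21) & (bmi < 24)]

# Element-wise OR: keep values at or below 21, or at or above 24
outside = bmi[(bmi <= 21) | (bmi >= 24)]

print(in_range)
print(outside)
```

Using the Python keywords `and` / `or` here would raise an error, because they try to reduce a whole array to a single True or False.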

### <span style="color:red">NumPy - 2d Arrays</span>

You can build multi-dimensional arrays with NumPy.

A 2d NumPy array can be built by combining two arrays of equal length into one.

In [24]:
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

type(np_height) # ndarray - n dimensional array

numpy.ndarray

In [25]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]]) 

print(np_2d)
type(np_2d)

[[ 1.73  1.68  1.71  1.89  1.79]
 [65.4  59.2  63.6  88.4  68.7 ]]


numpy.ndarray

**numpy.ndarray.shape**

The shape attribute returns the current shape of an array as a tuple of array dimensions.

In [None]:
np_2d.shape # 2 rows, 5 columns

In [26]:
x = np.array([1, 2, 3, 4])
x.shape # 1d array with 4 elements

(4L,)

In [27]:
np_baseball = np.array([[180, 78.4],
                    [215, 102.7],
                    [210, 98.5],
                    [188, 75.2]])
np_baseball.shape # 4 rows, 2 columns

(4L, 2L)
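Alongside `shape`, the attributes `ndim` and `size` describe an array, and `reshape` or `.T` produce the same data in a different shape. This is a brief sketch reusing the `np_baseball` values from above; `flat` and `transposed` are illustrative names.

```python
import numpy as np

np_baseball = np.array([[180, 78.4],
                        [215, 102.7],
                        [210, 98.5],
                        [188, 75.2]])

rows, cols = np_baseball.shape      # unpack the shape tuple
print(np_baseball.ndim)             # number of dimensions
print(np_baseball.size)             # total number of elements (rows * cols)

flat = np_baseball.reshape(-1)      # 1d array; -1 lets NumPy work out the length
transposed = np_baseball.T          # rows and columns swapped

print(flat.shape, transposed.shape)
```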

In [28]:
np.array([[1.73, 1.68, 1.71, 1.89, 1.79],   # added a string to the array - all elements will be converted to strings
         [65.4, 59.2, 63.6, 88.4, "68.7"]]) # you can only have one data type in the array

array([['1.73', '1.68', '1.71', '1.89', '1.79'],
       ['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='|S32')

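**Indexing 2d Arrays**

Elements of a 2d array are selected with square brackets, giving a row index and a column index. This is different from the list method index(), which returns the lowest index at which a value appears in a list.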

In [29]:
np_2d[1]  # Second row (index 1) of the 2d array

array([65.4, 59.2, 63.6, 88.4, 68.7])

In [30]:
np_2d[0][1]  # First row (index 0) and second column (index 1) element of the 2d array

1.68

In [31]:
np_2d[0,1] # different representation of the same subsetting above 

1.68

In [32]:
np_2d[:,1:3] # : selects all rows and 1:3 selects the second and third columns

array([[ 1.68,  1.71],
       [59.2 , 63.6 ]])

In [33]:
np_2d[1,:] # selects entire second row with all columns

array([65.4, 59.2, 63.6, 88.4, 68.7])
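One detail worth knowing when subsetting with slices: a slice of a NumPy array is a view onto the original data, not a copy, so assigning through it changes the original array. The sketch below illustrates this with the `np_2d` values from above; `row_view` and `row_copy` are illustrative names.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

row_view = np_2d[1, :]          # slice: shares memory with np_2d
row_copy = np_2d[1, :].copy()   # explicit copy: independent of np_2d

row_view[0] = 70.0              # also changes np_2d[1, 0]
row_copy[1] = 0.0               # leaves np_2d untouched

print(np_2d[1, 0], np_2d[1, 1])
```

Call `.copy()` whenever the subset should be modified without affecting the source array.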

In [34]:
# regular list of lists
x = [["a", "b"], ["c", "d"]]
[x[0][0], x[1][0]]

['a', 'c']

In [35]:
# numpy
np_x = np.array(x)
np_x[:,0] # |S1 means a string of length 1.

array(['a', 'c'], dtype='|S1')

**Math Operations**

In [36]:
np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
np_mat * 2

array([[ 2,  4],
       [ 6,  8],
       [10, 12]])

In [37]:
np_mat + np.array([10, 12])

array([[11, 14],
       [13, 16],
       [15, 18]])

In [38]:
np_mat + np_mat

array([[ 2,  4],
       [ 6,  8],
       [10, 12]])
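Adding a 1d array of length 2 to a 3x2 array, as in `np_mat + np.array([10, 12])`, works through broadcasting: NumPy stretches the smaller array across the larger one when the trailing dimensions match or are 1. A short sketch follows, with `row` and `col` as illustrative names.

```python
import numpy as np

np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])                  # shape (3, 2)

row = np.array([10, 12])                     # shape (2,): added to every row
col = np.array([[100], [200], [300]])        # shape (3, 1): added to every column

by_row = np_mat + row
by_col = np_mat + col
print(by_row)
print(by_col)

# Shapes that cannot be matched up raise a ValueError
try:
    np_mat + np.array([1, 2, 3])             # length 3 does not match the 2 columns
except ValueError as exc:
    print("broadcast failed:", exc)
```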

### <span style="color:red">Numpy - Generate Data by Sampling Random Distributions</span>

**numpy.random.normal (loc=0.0, scale=1.0, size=None)**

Draw random samples from a normal (Gaussian) distribution. 

*Parameters:*	
loc : float or array_like of floats - Mean (“centre”) of the distribution.

scale : float or array_like of floats - Standard deviation (spread or “width”) of the distribution.

size : int or tuple of ints, optional - Output shape (number of samples). If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.
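A small sketch of how the three parameters interact is shown below. The seed value 42 is an arbitrary choice made only so that repeated runs draw the same numbers; it is not part of the original notebook.

```python
import numpy as np

np.random.seed(42)   # arbitrary seed, used here only for repeatable draws

samples = np.random.normal(loc=1.75, scale=0.20, size=5000)   # 1d array of 5000 draws
grid = np.random.normal(loc=0.0, scale=1.0, size=(2, 3))      # 2 * 3 = 6 draws
single = np.random.normal()                                   # size=None: one value

print(samples.shape, grid.shape)
print(samples.mean(), samples.std())   # close to loc and scale, but not exactly equal
```

The sample mean and standard deviation only approximate `loc` and `scale`; the approximation generally improves as `size` grows.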

**numpy.column_stack method**

Stack 1-D arrays as columns into a 2-D array.

In [39]:
height_random = np.round(np.random.normal(1.75, 0.20, 5000), 2) # draws 5000 random samples from a normal
                                                                # distribution with mean 1.75 and SD 0.20

weight_random = np.round(np.random.normal(60.32, 15, 5000), 2) # draws 5000 random samples from a normal
                                                               # distribution with mean 60.32 and SD 15

np_city = np.column_stack((height_random, weight_random))      # stacks two 1d arrays into a 2d array
print(np_city)
print(np_city.shape)

[[ 1.69 69.69]
 [ 1.3  66.43]
 [ 1.44 35.3 ]
 ...
 [ 1.81 62.99]
 [ 1.57 60.85]
 [ 1.51 61.29]]
(5000L, 2L)


In [40]:
np.column_stack(([3, 1, 1, 2, 3, 3, 2, 3], 
                         [5, 7, 5, 4, 6, 7, 5, 4]))

array([[3, 5],
       [1, 7],
       [1, 5],
       [2, 4],
       [3, 6],
       [3, 7],
       [2, 5],
       [3, 4]])

**numpy.arange ([start, ]stop, [step, ]dtype=None)**

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.

*Note: When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.*

*Parameters:*	
start : number, optional - Start of interval. The interval includes this value. The default start value is 0.

stop : number - End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional - Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a position argument, start must also be given.

dtype : dtype - The type of the output array. If dtype is not given, infer the data type from the other input arguments.

In [3]:
np.arange(3)

array([0, 1, 2])

In [4]:
np.arange(3.0)

array([0., 1., 2.])

In [5]:
np.arange(3,7)

array([3, 4, 5, 6])

In [6]:
np.arange(3,7,2)

array([3, 5])
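As the note above suggests, `np.linspace` is usually the safer choice for non-integer spacing, because you state the number of points instead of the step. A minimal comparison, with illustrative variable names:

```python
import numpy as np

# arange: the step is given; the number of elements follows from floating-point arithmetic
stepped = np.arange(0, 1, 0.1)

# linspace: the number of points is given; both endpoints are included by default
spaced = np.linspace(0, 1, 11)

# endpoint=False excludes stop, mirroring the half-open interval of arange
half_open = np.linspace(0, 1, 10, endpoint=False)

print(len(stepped), len(spaced), len(half_open))
print(spaced[0], spaced[-1])
```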

### <span style="color:red">Numpy Statistics</span>

In [41]:
print(np_mat)
print(np.mean(np_mat[:,0])) # average of first column
print(np.mean(np_mat[:,:])) # average of all elements
print(np.median(np_mat[:,:])) # median of all elements
print(np.median(np_mat[:,0])) # median of first column

[[1 2]
 [3 4]
 [5 6]]
3.0
3.5
3.5
3.0
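Instead of slicing out one column at a time, the `axis` argument computes a statistic for every column or every row in one call. A short sketch using the same `np_mat`:

```python
import numpy as np

np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

col_means = np.mean(np_mat, axis=0)   # collapse the rows: one mean per column
row_means = np.mean(np_mat, axis=1)   # collapse the columns: one mean per row
col_sums = np.sum(np_mat, axis=0)     # one sum per column

print(col_means)
print(row_means)
print(col_sums)
```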


In [42]:
np.corrcoef(np_mat[:,0], np_mat[:,1]) # correlation coefficient of first column and second column to see if they are correlated

array([[1., 1.],
       [1., 1.]])

In [43]:
np.std(np_mat[:,0]) # standard deviation of first column

1.632993161855452
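Note that `np.std` computes the population standard deviation by default (dividing by N). For the sample standard deviation (dividing by N - 1), pass `ddof=1`. A brief sketch with the first-column values 1, 3, 5:

```python
import numpy as np

first_col = np.array([1, 3, 5])

population_sd = np.std(first_col)         # ddof=0 (default): divides by N
sample_sd = np.std(first_col, ddof=1)     # ddof=1: divides by N - 1

print(population_sd)
print(sample_sd)
```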

In [44]:
np.sum(np_mat[:,0]) # sum of the first column

9

In [45]:
np.sort(np_mat[:,0]) # sorts the values of the first column

array([1, 3, 5])
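Sorting the values of a column and reordering whole rows by a column are two different operations. The sketch below shows both; `scores` is a hypothetical unsorted array introduced only for this illustration.

```python
import numpy as np

# Hypothetical unsorted data, used only for this example
scores = np.array([[5, 6],
                   [1, 2],
                   [3, 4]])

first_col_sorted = np.sort(scores[:, 0])   # just the first-column values, ascending
order = np.argsort(scores[:, 0])           # row indices that would sort the first column
rows_by_first_col = scores[order]          # whole rows reordered by the first column

print(first_col_sorted)
print(order)
print(rows_by_first_col)
```

`np.sort` returns a sorted copy and leaves `scores` unchanged; `np.argsort` is the tool to reach for when rows need to stay together.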