This notebook is inspired by:
[Jake VanderPlas - Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html)

# Numpy

In [6]:
## importing numpy
import numpy as np

np.__version__

'1.16.5'

## Creating Numpy Arrays

__From lists__

In [22]:
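# the inner lists have different lengths (3 and 2), so NumPy 1.16 builds an object array of lists; recent NumPy versions raise a ValueError here
np.array([[[4, 1, 7],[ 13, 2.76]]]).shape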

(1, 2)

In [10]:
# np.array returns an ndarray; if we want, we can also specify the data type of its elements (see dtype below)
type(my_array1)


numpy.ndarray

In [17]:
my_array1=np.array([4, 1, 7, 13, 2.76])
print(my_array1.shape)
# A one-dimensional array

# If a two-dimensional array is wanted, nest the row inside another list:
my_array1=np.array([[4, 1, 7, 13, 2.76]])
print(my_array1.shape)

(5,)
(1, 5)
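
Nesting the list is one way to get a two-dimensional array. As a rough sketch of an alternative, an existing one-dimensional array can be reshaped instead. The names `flat`, `row` and `column` are illustrative and do not come from the notebook.

```python
import numpy as np

flat = np.array([4, 1, 7, 13, 2.76])   # shape (5,)

# reshape(1, -1): one row, let NumPy infer the number of columns
row = flat.reshape(1, -1)

# np.newaxis inserts a new axis of length 1; here it turns the data into a column
column = flat[:, np.newaxis]

print(flat.shape, row.shape, column.shape)
```

Both forms leave `flat` itself one-dimensional, which is convenient when the 2-D shape is only needed for a single operation.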


In [25]:
my_1d_array=np.array([12,27,33.3,1e56],dtype='int32')

OverflowError: Python int too large to convert to C long

Marat mentioned that the naming above is dangerous. Is that because the name ends in "array", or something else?
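
The OverflowError above comes from asking an `int32` array to hold `1e56`. A minimal sketch of how the limits of an integer dtype can be checked with `np.iinfo`, and how a float dtype holds the same values, might look like this. The names `limits` and `values` are illustrative.

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype
limits = np.iinfo('int32')
print(limits.min, limits.max)

# 1e56 is far outside that range; a float dtype can hold it
values = np.array([12, 27, 33.3, 1e56], dtype='float64')
print(values.dtype)
```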

In [34]:
# but be aware that each dtype has a limited range and other limitations

array1=np.array([1,2,3,4],dtype='int32')

array1[2]=4.5

print(array1)

# Notice that 4.5 becomes 4 because the dtype is integer: the fractional part is dropped without any warning. Easy mistake to make.

[1 2 4 4]
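
If the fractional part matters, one option is to convert the array to a float dtype before assigning. The following is a small sketch under that assumption, with illustrative names `int_array` and `float_array`.

```python
import numpy as np

int_array = np.array([1, 2, 3, 4], dtype='int32')
int_array[2] = 4.5                 # stored as 4: the fractional part is dropped

# astype returns a converted copy; the original array is unchanged
float_array = int_array.astype('float64')
float_array[2] = 4.5               # now the fractional part is kept

print(int_array)
print(float_array)
```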


Unlike lists, arrays can be multidimensional

In [50]:
multidim = np.array([[1,2,3,12],
          [4,5,6,11], 
          [7,8,9,10]])

In [51]:
multidim.shape

(3, 4)
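
Besides `shape`, a few other attributes and indexing forms are handy for multidimensional arrays. The sketch below reuses the same values as `multidim`.

```python
import numpy as np

multidim = np.array([[1, 2, 3, 12],
                     [4, 5, 6, 11],
                     [7, 8, 9, 10]])

print(multidim.ndim)     # number of dimensions
print(multidim.size)     # total number of elements
print(multidim[1, 3])    # element in row 1, column 3 (zero-based)
print(multidim[:, 0])    # first column
```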

__From Scratch__

In [55]:
# we can create a numpy array with zeros of any shape
np.zeros((2,6),dtype='int')

array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

In [56]:
## Again we can pass the dtype

np.zeros((2,6), dtype = float)

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [54]:
# we can create an array of any shape filled with any number:

np.full((3, 7), .23)

array([[0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23],
       [0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23],
       [0.23, 0.23, 0.23, 0.23, 0.23, 0.23, 0.23]])

In [None]:
## np.ones?
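
The cell above was left unexecuted. As a brief sketch, `np.ones` works like `np.zeros` but fills the array with ones. The names `ones_float` and `ones_int` are illustrative.

```python
import numpy as np

ones_float = np.ones((2, 3))              # default dtype is float64
ones_int = np.ones((2, 3), dtype='int')   # integer ones

print(ones_float)
print(ones_int)
```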

Other useful methods for creating arrays:

- `np.arange`

- `np.linspace`

- `np.random.random`

In [72]:
np.arange(5,25,3)
 
np.linspace(0,1,5)

np.random.random(size=(10)) # 10 random numbers, uniform distribution

np.random.randn(10) #normal distribution

np.random.normal(loc=58, scale=2, size=100)

array([61.46123348, 59.46941892, 55.7618402 , 56.37112647, 55.81517929,
       54.60465191, 59.1076216 , 59.78989069, 63.67027464, 58.13993626,
       57.86798824, 58.36404236, 61.91210936, 56.72190072, 54.62334182,
       56.97104371, 57.84324903, 58.42956391, 58.89233663, 57.4280111 ,
       58.80343375, 58.15107797, 54.71583329, 56.99999088, 57.08767455,
       59.53863949, 56.39476232, 58.48987511, 60.37222878, 58.18919414,
       56.57130215, 60.88122428, 57.9385345 , 57.92748443, 55.98434768,
       56.18409166, 56.82718215, 58.49789171, 59.55515551, 57.87865809,
       58.7584975 , 57.97579045, 55.68352961, 54.19206212, 57.81393173,
       56.82881993, 57.40822022, 57.53983797, 60.13893494, 58.55511773,
       59.90332113, 55.67056129, 59.08422343, 55.20523654, 56.00361986,
       61.54307697, 60.49903676, 59.68503359, 57.43337322, 56.54733036,
       59.69477963, 59.33785759, 56.91577988, 61.21905782, 60.78113175,
       55.2434523 , 58.20036807, 58.17455494, 55.79278279, 59.43
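
A notebook cell displays only the value of its last expression, so the results of `np.arange` and `np.linspace` above are not shown. The sketch below prints them explicitly and also shows one way to make random draws repeatable with `np.random.seed`. The seed value 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

steps = np.arange(5, 25, 3)      # start at 5, stop before 25, step 3
print(steps)                     # [ 5  8 11 14 17 20 23]

grid = np.linspace(0, 1, 5)      # 5 evenly spaced points, both ends included
print(grid)                      # [0.   0.25 0.5  0.75 1.  ]

# Seeding the generator makes the random draws repeatable from run to run
np.random.seed(0)
first = np.random.normal(loc=58, scale=2, size=3)
np.random.seed(0)
second = np.random.normal(loc=58, scale=2, size=3)
print(np.array_equal(first, second))   # True
```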

## Descriptive Statistics with Numpy

In [None]:
## let's create a sample of size 10 from a normally distributed population

In [75]:
sample1=np.random.normal(loc=10,scale=1,size=10)


In [None]:
## what is the mean of sample1?

In [76]:
sample1.mean()

9.719454428831654

In [None]:
## what is the median of sample1?

In [79]:
np.median(sample1)

9.793823288130408
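
Because `sample1` is random, its mean and median change on every run. A small sketch on a fixed, made-up array makes the numbers checkable by hand and also shows the standard deviation. `ddof=1` selects the sample (n - 1) form rather than the population form.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # made-up values for illustration

print(data.mean())          # arithmetic mean
print(np.median(data))      # middle value (average of the two middle values here)
print(data.std())           # population standard deviation (divides by n)
print(data.std(ddof=1))     # sample standard deviation (divides by n - 1)
```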

In [85]:
## sorting sample1

np.sort(sample1)

array([ 7.1487867 ,  9.23830071,  9.4193642 ,  9.42162959,  9.65865098,
        9.92899559, 10.2893768 , 10.57174231, 10.70056625, 10.81713116])
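
Note that `np.sort` returns a sorted copy and leaves the original array alone, while the `sort` method of an array sorts in place. A short sketch on made-up values:

```python
import numpy as np

data = np.array([3.2, 1.5, 2.7])   # made-up values for illustration

sorted_copy = np.sort(data)   # returns a new sorted array
print(data)                   # original order is unchanged

order = np.argsort(data)      # indices that would sort the array
print(order)

data.sort()                   # sorts the array in place
print(data)
```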

In [None]:
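## what is the 0.1th percentile of sample1? (q is on a 0-100 scale, so q=10 would be the 10th percentile)

In [87]:
np.percentile(sample1,q=.1,interpolation='lower')  # interpolation='lower' returns an actual element of the sample rather than an interpolated value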

7.148786695487686
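
The scale of `q` is easy to get wrong: `np.percentile` expects values from 0 to 100, while `np.quantile` expects values from 0 to 1. The sketch below uses a fixed, made-up array and the default linear interpolation to show the difference.

```python
import numpy as np

data = np.arange(1, 11)                 # 1, 2, ..., 10

p_tenth = np.percentile(data, 0.1)      # the 0.1th percentile
p_ten = np.percentile(data, 10)         # the 10th percentile
q_ten = np.quantile(data, 0.1)          # same as the 10th percentile

print(p_tenth, p_ten, q_ten)
```

With only ten points, the 0.1th percentile sits almost on top of the minimum, which is why the cell above returns the smallest element of `sample1`.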

In [None]:
## Where are the max and min in sample1?

In [None]:
## We can use different formatting styles when we print values
print('Maximum of sample1 is %.2f'%sample1.max())
print('The index of the max in sample1 is {}'.format(sample1.argmax()))

[Comparison between % and format](https://stackoverflow.com/questions/5082452/string-formatting-vs-format)

[Descriptive Statistics](https://www.hackerearth.com/blog/developers/descriptive-statistics-python-numpy/)

In [94]:
sample1.argmax()

9
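
The index returned by `argmax` can be used to look the value up again, and `argmin` does the same for the minimum. The sketch below uses made-up values and f-strings (Python 3.6+) as a third formatting style alongside `%` and `format`.

```python
import numpy as np

data = np.array([9.4, 7.1, 10.8, 9.9])   # made-up values for illustration

i_max = data.argmax()   # position of the largest value
i_min = data.argmin()   # position of the smallest value

print(f'Maximum is {data[i_max]:.2f} at index {i_max}')
print(f'Minimum is {data[i_min]:.2f} at index {i_min}')
```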