# Numpy

##### NumPy is one of the most important foundational packages in Python.
##### Many packages that provide scientific functionality use NumPy's array objects.

1. ndarray -- an efficient multidimensional array providing fast array operations and flexible broadcasting capabilities.
2. Mathematical functions for performing fast operations on entire arrays of data without writing loops.
3. Tools for reading/writing array data to disk and working with memory-mapped files.
4. Linear algebra, random number generation, and Fourier transform capabilities.
5. A C API to connect with libraries written in C, C++, and Fortran.

##### While NumPy by itself does not provide modeling or scientific functionality, understanding NumPy arrays helps you use tools with array-computing semantics, like pandas, more effectively.

#### For most data analysis applications, the main areas of functionality to focus on are:
1. Fast array-based operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
2. Common array algorithms like sorting, unique, and set operations
3. Efficient descriptive statistics and aggregating/summarizing data
4. Data alignment and relational data manipulations for merging and joining together heterogeneous datasets
5. Expressing conditional logic as array expressions instead of loops with if-elif-else branches
6. Group-wise data manipulations (aggregation, transformation, function application)
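
Item 5 in the list above can be sketched as follows. This is a minimal illustration, assuming `np.where` as the array-expression tool; the sample values are hypothetical and chosen only to exercise each branch.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = np.array([-1.5, 0.0, 2.0, -0.3, 4.2])

# Loop version with explicit if-elif-else branching.
labels_loop = []
for v in values:
    if v > 0:
        labels_loop.append(1)
    elif v < 0:
        labels_loop.append(-1)
    else:
        labels_loop.append(0)

# Array-expression version: nested np.where plays the role of if-elif-else.
labels_vec = np.where(values > 0, 1, np.where(values < 0, -1, 0))

print(labels_loop)
print(labels_vec)
```

Both versions compute the same labels; the second one expresses the branching as a single expression over the whole array.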

#### Why is NumPy efficient? Fast numerical computation is one of its most important uses.

1. NumPy stores data internally in a contiguous block of memory, independent of other built-in Python objects.
2. Its algorithms are written in C and operate directly on this memory (the most important advantage).
3. It performs complex computations on entire arrays without the need for Python loops, which can be slow for large sequences.

In [1]:
import numpy as np

my_array = np.arange(1000000)
my_list = list(range(1000000))

In [2]:
%time
for _ in range(10):
    my_arr2 = my_array * 2

CPU times: total: 0 ns
Wall time: 0 ns


In [3]:
%time
for _ in range(10):
    my_list2 = [x*2 for x in my_list]
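# Note: the time shows 0 ns in both timing cells because `%time` on a line by itself
# times an empty statement, not the loop below it. To time a whole cell, put the
# `%%time` cell magic on the first line of the cell instead.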

CPU times: total: 0 ns
Wall time: 0 ns
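
The comparison the two cells above are aiming at can also be measured outside of IPython magics. The sketch below assumes the standard-library `timeit` module as a stand-in for `%%time`. The numbers depend on the machine, so no expected output is shown.

```python
import timeit

import numpy as np

my_array = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each operation, mirroring the loops above.
array_seconds = timeit.timeit(lambda: my_array * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"ndarray: {array_seconds:.4f} s for 10 runs")
print(f"list:    {list_seconds:.4f} s for 10 runs")
```

On typical hardware the ndarray version is expected to be considerably faster, but the exact ratio varies.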


##### ndarray is a fast, flexible container for large datasets in Python.
##### It lets you perform mathematical operations on whole arrays using the same syntax as for scalar elements.

In [4]:
data = np.random.randn(2,3)
data

array([[ 0.67496582, -0.29501942,  0.46609846],
       [ 1.10929145, -1.55018771,  1.44558131]])

In [5]:
data*100

array([[  67.49658234,  -29.50194195,   46.60984567],
       [ 110.92914528, -155.01877076,  144.55813144]])

In [6]:
data + data

array([[ 1.34993165, -0.59003884,  0.93219691],
       [ 2.21858291, -3.10037542,  2.89116263]])

##### An ndarray contains only homogeneous data: all of its elements must be of the same type.

In [7]:
data.shape

(2, 3)

In [8]:
data.dtype

dtype('float64')
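
One consequence of homogeneity is that mixed inputs get converted to a single common dtype. A small sketch of this, with hypothetical input values:

```python
import numpy as np

# Integers mixed with a float are all stored as floats.
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)        # float64

# Numbers mixed with a string are all stored as strings.
with_string = np.array([1, 2, "three"])
print(with_string.dtype.kind)     # 'U' (unicode string)
print(with_string[0])             # the integer 1 has become the string '1'
```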

In [9]:
# creating numpy arrays

data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [10]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2
# since data2 is a list of equal-length lists, arr2 has 2 dimensions, with the shape inferred from the data

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [11]:
arr2.ndim

2

In [12]:
arr2.shape

(2, 4)

In [13]:
arr1.dtype

dtype('float64')

In [14]:
arr2.dtype

dtype('int32')

In [15]:
# array of zero
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [16]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [17]:
# empty creates an array without initializing its values; the contents may be zeros or arbitrary garbage
np.empty((2,3,2))

array([[[7.70164874e-312, 2.47032823e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.18831764e-312, 6.82116729e-043]],

       [[5.27486918e-091, 3.99761406e+175],
        [6.55036289e-043, 3.64069989e+175],
        [3.99910963e+252, 6.57679456e-038]]])

In [18]:
# arange is an array-valued version of the built-in Python range function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
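
Besides `zeros`, `empty`, and `arange`, a few related creation functions follow the same pattern. The sketch below is only a sampler; the shapes and fill values are arbitrary.

```python
import numpy as np

ones = np.ones((2, 3))                  # all elements 1.0
sevens = np.full((2, 2), 7)             # all elements set to the given fill value
template = np.array([[1, 2, 3], [4, 5, 6]])
zeros_like = np.zeros_like(template)    # zeros with the shape and dtype of template
identity = np.eye(3)                    # 3 x 3 identity matrix

print(ones.shape, sevens.tolist(), zeros_like.dtype == template.dtype)
print(identity)
```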

### Data Types for ndarrays

In [19]:
arr1 = np.array([1,2,3],dtype = np.float64)

In [20]:
arr2 = np.array([1,2,3],dtype = np.int32)

In [21]:
arr1.dtype

dtype('float64')

In [22]:
arr2.dtype

# dtypes are a source of numpy flexibility for interacting with data coming from other systems. 

dtype('int32')

In [23]:
# you can explicitly convert or cast an array from one dtype to another using astype method

arr = np.array([1,2,3,4,5])
arr.dtype

dtype('int32')

In [24]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [25]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [26]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

In [27]:
# np.string_ was removed in NumPy 2.0; np.bytes_ is the equivalent fixed-length byte-string type
numeric_strings = np.array(['1.24','56','65'], dtype = np.bytes_)
numeric_strings.astype(np.float64)

array([ 1.24, 56.  , 65.  ])

In [28]:
# you can also use another array dtype

int_array = np.arange(10)
calibers = np.array([.22,.27,8.9,.357],dtype = np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [29]:
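# there are shorthand type code strings you can also use to refer to a dtype
# ('u4' is an unsigned 32-bit integer, i.e. uint32)
empty_uint32 = np.empty(8,dtype = 'u4')
empty_uint32

# calling astype always creates a new array (a copy of the data), even when the new dtype is the same as the old one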

array([4128860, 6029375, 3801155, 5570652, 6619251, 7536754, 4718684,
            80], dtype=uint32)
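
The note that `astype` creates a new array can be checked directly. The sketch below also shows what happens when a string cannot be converted; the input values are hypothetical.

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5])
floats = ints.astype(np.float64)

# Modifying the converted array leaves the original untouched.
floats[0] = 99.0
print(ints)                              # [1 2 3 4 5]
print(np.shares_memory(ints, floats))    # False

# Even casting to the same dtype returns a new array by default.
same = ints.astype(ints.dtype)
print(same is ints)                      # False

# A string that is not a valid number makes the cast fail with ValueError.
try:
    np.array(["1.5", "abc"]).astype(np.float64)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("conversion failed:", exc)
```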

### Arithmetic with Numpy array

In [30]:
arr = np.array([[1,2,3],[4,5,6]],dtype = np.float64)
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [31]:
arr*arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [32]:
arr-arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [33]:
# arithmetic ops with scalars propagate the scalar argument to each element in the array
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [34]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [35]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [36]:
# Comparisons between arrays of the same size yield boolean arrays
# (compare with arr, which has the same (2, 3) shape; arr1 is 1-D with shape (3,)):
arr2>arr

array([[False,  True, False],
       [ True, False,  True]])
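
Arithmetic and comparisons also work between arrays whose shapes differ but are compatible, through broadcasting. A minimal sketch, with variable names (`grid`, `row`, `col`) introduced only for this example:

```python
import numpy as np

grid = np.array([[0., 4., 1.], [7., 2., 12.]])   # shape (2, 3)
row = np.array([1., 2., 3.])                      # shape (3,)
col = np.array([[10.], [20.]])                    # shape (2, 1)

# The 1-D row is compared against every row of grid.
print(grid > row)

# The (2, 1) column is added to every column of grid.
print(grid + col)
```

Here `row` is stretched across both rows of `grid`, and `col` across all three columns, without copying the data in Python code.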

### Basic indexing and slicing 

In [37]:
arr = np.arange(10)

In [38]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [39]:
arr[5]

5

In [40]:
arr[5:8]

array([5, 6, 7])

In [41]:
arr[5:8] = 12 # example of broadcasting

In [42]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [43]:
# the first important distinction from Python lists is that array slices are views on the original array:
# the data is not copied, and any modification to the view is reflected in the source array

In [44]:
arr_slice = arr[5:8]

In [45]:
arr_slice

array([12, 12, 12])

In [46]:
arr_slice[1] = 12345

In [47]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [48]:
arr_slice[:] = 64 # the bare slice [:] assigns to every element of arr_slice (and therefore to arr[5:8])

In [49]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
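
When an independent copy is wanted instead of a view, the slice can be copied explicitly. The sketch below contrasts the two and compares with a Python list; `np.shares_memory` is used here only as a convenient check.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]            # a view: shares memory with arr
copy = arr[5:8].copy()     # an independent copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

# Writing to the copy does not touch the original array.
copy[:] = -1
print(arr)                 # [0 1 2 3 4 5 6 7 8 9]

# Python list slices, by contrast, are always copies.
py_list = list(range(10))
list_slice = py_list[5:8]
list_slice[0] = 999
print(py_list)             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```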

In [50]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [51]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [52]:
arr2d[2]

array([7, 8, 9])

In [53]:
arr2d[0][2]

3

In [54]:
arr2d[0,2]

3

In [55]:
arr2d[0]

array([1, 2, 3])

In [56]:
arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])

In [57]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [58]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [59]:
old_values = arr3d[0].copy()

In [60]:
arr3d[0] = 42

In [61]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [62]:
arr3d[0] = old_values

In [63]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [64]:
arr3d[1,0]

array([7, 8, 9])

In [65]:
# the above expression is same as though we had indexed in two steps
x = arr3d[1]

In [66]:
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [67]:
x[0]

array([7, 8, 9])

### Indexing with Slices

In [68]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [69]:
arr[1:6] # a NumPy array can be sliced like a Python list, but the result is a view, not a copy

array([ 1,  2,  3,  4, 64])

In [70]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [71]:
arr2d[:2] # it can also be read as "select first 2 rows of the array"

array([[1, 2, 3],
       [4, 5, 6]])

In [72]:
arr2d[len(arr2d)-1:]

array([[7, 8, 9]])

In [73]:
arr2d[:2,1:]

array([[2, 3],
       [5, 6]])

In [74]:
arr2d[1,:2] # second row, first two columns

array([4, 5])

In [75]:
arr2d[:2,2] # third column, first two rows

array([3, 6])

In [76]:
arr2d[:,:1] # first column only, kept as a 2-D array with one column

array([[1],
       [4],
       [7]])

In [77]:
arr2d[:2,1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

In [78]:
arr2d[:2,1:]

array([[0, 0],
       [0, 0]])

In [79]:
arr2d[2]

array([7, 8, 9])

In [80]:
arr2d[2,:]

array([7, 8, 9])

In [81]:
arr2d[2:,:]

array([[7, 8, 9]])

In [82]:
arr2d[:,:2]

array([[1, 0],
       [4, 0],
       [7, 8]])

In [83]:
arr2d[1,:2]

array([4, 0])

In [84]:
arr2d[1:2,:2]

array([[4, 0]])
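
The last few cells differ only in whether an axis is indexed with an integer or with a slice. A short sketch of the rule, using a fresh copy of the original `arr2d` values: an integer index drops that axis, while a slice keeps it.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_1d = arr2d[1, :2]      # integer index on axis 0: result is 1-D
row_2d = arr2d[1:2, :2]    # slice on axis 0: result stays 2-D
col_1d = arr2d[:, 0]       # integer index on axis 1: result is 1-D
col_2d = arr2d[:, :1]      # slice on axis 1: result stays 2-D

print(row_1d.shape, row_2d.shape)   # (2,) (1, 2)
print(col_1d.shape, col_2d.shape)   # (3,) (3, 1)
```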

### Boolean Indexing

In [85]:
names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [86]:
data = np.random.randn(7,4)

In [87]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [88]:
data

array([[-0.32552658,  0.24888733,  1.10556593,  0.29280865],
       [-0.53753807, -0.00562826,  1.53239548,  0.48266783],
       [ 1.58605126,  2.43583879,  0.14683721,  1.30923711],
       [ 1.0211308 ,  1.5250246 ,  0.05990687,  0.28975946],
       [ 1.03222844, -0.47865327,  1.12974063,  0.54612441],
       [-0.30617505, -0.16975189,  1.13619437, -1.34742365],
       [-1.2489589 ,  1.77981686, -0.15987246, -0.68906118]])

In [89]:
names=='Bob'

array([ True, False, False,  True, False, False, False])

In [90]:
data[names=='Bob']
# the boolean array must be the same length as the array axis it is indexing

array([[-0.32552658,  0.24888733,  1.10556593,  0.29280865],
       [ 1.0211308 ,  1.5250246 ,  0.05990687,  0.28975946]])

In [91]:
data[names=='Bob',:2]

array([[-0.32552658,  0.24888733],
       [ 1.0211308 ,  1.5250246 ]])

In [92]:
data[names=='Bob',3]

array([0.29280865, 0.28975946])

In [93]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [94]:
~(names=='Bob')

array([False,  True,  True, False,  True,  True,  True])

In [95]:
data[~(names=='Bob')]

array([[-0.53753807, -0.00562826,  1.53239548,  0.48266783],
       [ 1.58605126,  2.43583879,  0.14683721,  1.30923711],
       [ 1.03222844, -0.47865327,  1.12974063,  0.54612441],
       [-0.30617505, -0.16975189,  1.13619437, -1.34742365],
       [-1.2489589 ,  1.77981686, -0.15987246, -0.68906118]])

In [96]:
cond = names=='Bob'
cond

array([ True, False, False,  True, False, False, False])

In [97]:
data[~cond]

array([[-0.53753807, -0.00562826,  1.53239548,  0.48266783],
       [ 1.58605126,  2.43583879,  0.14683721,  1.30923711],
       [ 1.03222844, -0.47865327,  1.12974063,  0.54612441],
       [-0.30617505, -0.16975189,  1.13619437, -1.34742365],
       [-1.2489589 ,  1.77981686, -0.15987246, -0.68906118]])

In [98]:
mask = (names=='Bob')|(names=='Will')

In [99]:
mask

array([ True, False,  True,  True,  True, False, False])

In [100]:
data[mask]

array([[-0.32552658,  0.24888733,  1.10556593,  0.29280865],
       [ 1.58605126,  2.43583879,  0.14683721,  1.30923711],
       [ 1.0211308 ,  1.5250246 ,  0.05990687,  0.28975946],
       [ 1.03222844, -0.47865327,  1.12974063,  0.54612441]])

In [101]:
# Note:
# the Python keywords 'and' and 'or' do not work with boolean arrays; use & (and) and | (or) instead
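
A short sketch of what goes wrong with `or` and how the element-wise operators behave. `np.isin` is added here as an optional alternative for membership tests; it is not used elsewhere in these notes.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# 'or' asks for the truth value of a whole array, which is ambiguous.
try:
    bad = (names == 'Bob') or (names == 'Will')
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# | and & work element by element. The parentheses are needed because
# | and & bind more tightly than the comparison operators.
mask = (names == 'Bob') | (names == 'Will')
neither = (names != 'Bob') & (names != 'Joe')

# Alternative for "is one of these values".
mask_isin = np.isin(names, ['Bob', 'Will'])

print(mask)
print(neither)
```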

In [102]:
data[data<0] = 0
data

array([[0.        , 0.24888733, 1.10556593, 0.29280865],
       [0.        , 0.        , 1.53239548, 0.48266783],
       [1.58605126, 2.43583879, 0.14683721, 1.30923711],
       [1.0211308 , 1.5250246 , 0.05990687, 0.28975946],
       [1.03222844, 0.        , 1.12974063, 0.54612441],
       [0.        , 0.        , 1.13619437, 0.        ],
       [0.        , 1.77981686, 0.        , 0.        ]])

In [103]:
data[names != 'Joe']=7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 1.53239548, 0.48266783],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 1.13619437, 0.        ],
       [0.        , 1.77981686, 0.        , 0.        ]])

### Fancy Indexing

In [104]:
arr = np.empty((8,4))

In [105]:
arr

array([[7.70165168e-312, 1.06224114e-321, 0.00000000e+000,
        0.00000000e+000],
       [1.90059520e+007, 5.02034658e+175, 6.81694768e-038,
        4.25212874e+174],
       [1.74081126e+184, 1.60128004e+160, 1.50026836e-076,
        5.19566651e-144],
       [3.59751658e+252, 1.46901661e+179, 8.37404147e+242,
        2.59027926e-144],
       [3.80985069e+180, 1.14428494e+243, 2.59027907e-144,
        7.79952704e-143],
       [1.39448262e+165, 1.10528274e-046, 2.82793109e-056,
        1.20881492e+161],
       [2.59027862e-144, 2.59903818e-144, 7.11456194e-091,
        6.76093145e-067],
       [1.25736066e-071, 1.11475752e+261, 1.16318408e-028,
        2.97707521e+296]])

In [106]:
for i in range(8):
    arr[i]=i 
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [107]:
arr[[4,3,0,6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [108]:
arr[[-3,-5,-7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [109]:
arr = np.arange(32).reshape((8,4))

In [110]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [111]:
arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])
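
Passing two index lists selects one element per (row, column) pair, as in the cell above. To get the full rectangular block of those rows and columns instead, two approaches are sketched below; `np.ix_` is one option, chained indexing is another. The last part illustrates that fancy indexing returns a copy, unlike slicing.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Pairwise selection: elements (1,0), (5,3), (7,1), (2,2).
pairs = arr[[1, 5, 7, 2], [0, 3, 1, 2]]

# Rectangular selection: pick the rows first, then reorder the columns.
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

# The same block using np.ix_ to build an open mesh of indices.
block_ix = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

print(pairs)
print(block)

# Fancy indexing copies the data, so writing to the result leaves arr unchanged.
picked = arr[[0, 1]]
picked[:] = -1
print(arr[:2])
```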