#### NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Most computational packages providing scientific functionality use NumPy’s array objects as the lingua franca for data exchange.

 - ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.

 - Mathematical functions for fast operations on entire arrays of data without having to write loops.

 - Tools for reading/writing array data to disk and working with memory-mapped files.

 - Linear algebra, random number generation, and Fourier transform capabilities.

 - A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
 - NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

 - NumPy operations perform complex computations on entire arrays without the need for Python for loops.

##  performance difference

In [1]:
import numpy as np
my_arr = np.arange(1000)
my_list = list(range(1000000))

In [2]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 1 ms


In [3]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 2.31 s


`NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.`

## The NumPy ndarray: A Multidimensional Array Object

In [4]:
data = np.random.rand(2,3)
data

array([[0.42725716, 0.15576252, 0.65694374],
       [0.16498318, 0.83175184, 0.02255984]])

In [5]:
data*10

array([[4.27257155, 1.55762518, 6.56943738],
       [1.64983185, 8.31751839, 0.22559837]])

`An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array`

In [6]:
data.shape

(2, 3)

In [7]:
data.dtype

dtype('float64')

## Creating ndarrays

In [8]:
data1 = [1,6.5,100,20]
arr1 = np.array(data1)
arr1

array([  1. ,   6.5, 100. ,  20. ])

In [12]:
arr1.dtype

dtype('float64')

In [10]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2  = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [14]:
arr2.dtype

dtype('int32')

`zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:`

In [15]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [17]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [18]:
np.empty((3,3,3))

array([[[1.09379020e-311, 1.09378876e-311, 1.09378876e-311],
        [1.09379059e-311, 1.09378873e-311, 1.09378873e-311],
        [1.09379020e-311, 1.09379020e-311, 1.09379025e-311]],

       [[1.09379018e-311, 1.09379018e-311, 1.09379014e-311],
        [1.09379014e-311, 1.09379014e-311, 1.09379014e-311],
        [1.09379014e-311, 1.09379014e-311, 1.09379014e-311]],

       [[1.09379014e-311, 1.09379014e-311, 1.09379014e-311],
        [1.09379014e-311, 1.09379014e-311, 1.09379014e-311],
        [1.09379014e-311, 1.09379014e-311, 1.09379014e-311]]])

<p style="color:red">It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.</p>

## Data Types for ndarrays

<strong>The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data</strong>

In [28]:
type(np.float)

type

In [30]:
arr1 = np.array([1,2,3.1],dtype=np.float)

In [31]:
arr1

array([1. , 2. , 3.1])

In [32]:
arr2 = np.array([1,2,3.1],dtype=np.int)
arr2

array([1, 2, 3])

## explicitly convert or cast

In [33]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int32')

In [34]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [35]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)
numeric_strings

array([b'1.25', b'-9.6', b'42'], dtype='|S4')

<i>Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.</i>

## Arithmetic with NumPy Arrays

Any arithmetic operations between equal-size arrays applies the operation element-wise:

In [36]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

Arithmetic operations with scalars propagate the scalar argument to each element in the array

In [37]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [38]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays

In [40]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

## Basic Indexing and Slicing

In [41]:
arr = np.arange(10)
arr[1:3]

array([1, 2])

In [44]:
arr[1:3] = 10e3

In [45]:
arr

array([    0, 10000, 10000,     3,     4,     5,     6,     7,     8,
           9])

<p style="color:red">An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.</p>

In [46]:
arr_slice = arr[5:8]
arr_slice

array([5, 6, 7])

In [47]:
arr_slice[1] = 12345
arr

array([    0, 10000, 10000,     3,     4,     5, 12345,     7,     8,
           9])

<i>If you are new to NumPy, you might be surprised by this, especially if you have used other array programming languages that copy data more eagerly. As NumPy has been designed to be able to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.</i>

In [49]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [50]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [51]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [52]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

In [53]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)

In [54]:
data

array([[ 0.21823693, -1.13216746, -0.00678146,  0.96287594],
       [-0.79330357,  0.56359267,  0.43761873,  1.35410669],
       [ 0.668544  , -0.19974725,  0.73651563, -1.25026972],
       [-0.49306159,  0.54780423,  0.55646318, -0.40270012],
       [ 0.57320653,  1.42611396,  1.33929036,  0.01491998],
       [ 1.5102293 , -0.47467497, -0.20961643, -0.95994729],
       [-1.80050732, -0.26393218,  1.0512865 , -0.69929569]])

In [55]:
data.shape

(7, 4)

In [56]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [58]:
data[names == 'Bob']

(2, 4)

In [59]:
data[names == 'Bob', 2:]

array([[-0.00678146,  0.96287594],
       [ 0.55646318, -0.40270012]])

In [60]:
data

array([[ 0.21823693, -1.13216746, -0.00678146,  0.96287594],
       [-0.79330357,  0.56359267,  0.43761873,  1.35410669],
       [ 0.668544  , -0.19974725,  0.73651563, -1.25026972],
       [-0.49306159,  0.54780423,  0.55646318, -0.40270012],
       [ 0.57320653,  1.42611396,  1.33929036,  0.01491998],
       [ 1.5102293 , -0.47467497, -0.20961643, -0.95994729],
       [-1.80050732, -0.26393218,  1.0512865 , -0.69929569]])

In [61]:
data<0

array([[False,  True,  True, False],
       [ True, False, False, False],
       [False,  True, False,  True],
       [ True, False, False,  True],
       [False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True, False,  True]])

In [63]:
data[data<0] = 0

In [64]:
data

array([[0.21823693, 0.        , 0.        , 0.96287594],
       [0.        , 0.56359267, 0.43761873, 1.35410669],
       [0.668544  , 0.        , 0.73651563, 0.        ],
       [0.        , 0.54780423, 0.55646318, 0.        ],
       [0.57320653, 1.42611396, 1.33929036, 0.01491998],
       [1.5102293 , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 1.0512865 , 0.        ]])

In [70]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i

In [71]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [72]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([1., 5., 7., 2.])

## Transposing Arrays and Swapping Axes

In [73]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [74]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

## Universal Functions: Fast Element-Wise Array Functions

In [76]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [77]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [78]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [80]:
x = np.random.randn(8)
y = np.random.randn(8)

np.maximum(x,y)

array([ 1.03434871, -0.33230951, -0.61892018,  0.68438449,  1.60053665,
        0.29300406,  0.10074693,  0.91829173])

## Array-Oriented Programming with Arrays

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations. 

In [85]:
points = np.arange(-5, 5, 0.01)
points.shape

(1000,)

In [83]:
xs, ys = np.meshgrid(points, points)

In [86]:
xs.shape

(1000, 1000)

In [87]:
ys.shape

(1000, 1000)

In [88]:
z = np.sqrt(xs ** 2 + ys ** 2)
z.shape

(1000, 1000)

## Expressing Conditional Logic as Array Operations

In [89]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

zip(xarr,yarr,cond)

<zip at 0x20376fb1888>

In [90]:
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

<p style="color:red">This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With np.where you can write this very concisely</p>

In [93]:
result = np.where(cond, xarr, yarr)

## Mathematical and Statistical Methods

In [95]:
arr = np.random.randn(5, 4)
arr

array([[-1.52743449,  0.71534456, -0.62903073, -0.20599973],
       [ 0.38466508,  0.57458224, -0.04406182, -1.11726682],
       [-0.08732652, -0.14591825, -1.10008422,  1.10022569],
       [-0.46208468,  0.32520585, -1.87578141,  1.60481266],
       [ 1.16110244, -0.08574342, -0.27428234,  0.39922427]])

In [96]:
arr.mean()

-0.06449258097722585

In [97]:
np.mean(arr)

-0.06449258097722585

In [98]:
arr.sum()

-1.2898516195445169

In [99]:
arr.mean(axis=1)

array([-0.4117801 , -0.05052033, -0.05827582, -0.10196189,  0.30007524])

In [100]:
arr.sum(axis=0)

array([-0.53107817,  1.38347098, -3.92324051,  1.78099608])

<mark>Here, arr.mean(1) means “compute mean across the columns” where arr.sum(0) means “compute sum down the rows.”</mark>

In [101]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
np.cumsum(arr)

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

In [102]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [103]:
arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

## Methods for Boolean Arrays

In [104]:
arr = np.random.randn(100)
(arr>0).sum()

46

## Sorting

In [108]:
arr.sort()
arr

array([-2.25903121, -2.19048112, -2.1209715 , -2.08423633, -1.96648922,
       -1.69818336, -1.67203234, -1.61224136, -1.54839536, -1.46948841,
       -1.38714708, -1.31689549, -1.27869315, -1.22294772, -1.22001905,
       -1.1702748 , -1.03055146, -0.98375136, -0.8361093 , -0.82282847,
       -0.79427403, -0.78103551, -0.70656092, -0.6180259 , -0.57619066,
       -0.56234166, -0.55644127, -0.54921635, -0.48735921, -0.4662985 ,
       -0.45399928, -0.45211795, -0.44035841, -0.4355905 , -0.41946555,
       -0.3451298 , -0.34409959, -0.33906175, -0.31762689, -0.28960389,
       -0.27780489, -0.25347865, -0.24841948, -0.2270002 , -0.2223323 ,
       -0.18744057, -0.09429902, -0.08491604, -0.08447607, -0.07769994,
       -0.04005261, -0.03665245, -0.03501153, -0.0236565 ,  0.02874323,
        0.07742189,  0.12794696,  0.1608684 ,  0.1725148 ,  0.22114032,
        0.22715914,  0.23670379,  0.26348077,  0.28377979,  0.29238589,
        0.32536117,  0.32632729,  0.33329869,  0.33811245,  0.34

In [109]:
arr = np.random.randn(5, 3)
arr.sort(1)
arr

array([[-1.61997677,  0.01626832,  1.080817  ],
       [-1.32971304,  0.1466147 ,  1.85034015],
       [-1.53891362, -0.83388282,  0.0731081 ],
       [-2.59005287, -0.81623975,  0.01275844],
       [-1.14576815, -0.3215562 ,  1.17909587]])

## Unique and Other Set Logic

In [110]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

## File Input and Output with Arrays

In [112]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [113]:
np.save('some_array', arr)

In [114]:
np.load('some_array.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [115]:
np.savez('array_archive.npz', a=arr, b=arr)

In [116]:
arch = np.load('array_archive.npz')
arch['b']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

***