---
# <font color= blue > **NumPy** </font>
* NumPy is the fundamental package for scientific computing with Python. 
* NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.

## <font color= blue > ndarray </font>
* NumPy's array class is called ndarray (n dimensional array).
* numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality.

In [30]:
import numpy as np    # np is an alias for numpy
np.__version__

'1.15.4'

In [1]:
import numpy as np
#to display numpy's built in documentation
np?


## Array Creation

In [67]:
import numpy as np
# Creating nd array from list
b=np.array([6,7,8])
b

array([6, 7, 8])

In [13]:
type(b)

numpy.ndarray

In [33]:
# if the list mixes types, NumPy upcasts where possible (here the integers are upcast to floating point)
import numpy as np
np.array([3.85,4,7,9,28.3])

array([ 3.85,  4.  ,  7.  ,  9.  , 28.3 ])

In [2]:
# if the list mixes strings, integers and floats, every element is upcast to a Unicode string
# the dtype is '<U9' (nine characters) because the longest string, "betagamma", has nine characters
np.array(["A",34,78.9,"betagamma"])

array(['A', '34', '78.9', 'betagamma'], dtype='<U9')

In [46]:
# you can explicitly set the type of elements of resulting array
b=np.array([12,34,65,52], dtype='float32')
b


array([12., 34., 65., 52.], dtype=float32)

In [47]:
type(b)

numpy.ndarray

In [44]:
type(b[0])

numpy.float32

In [64]:
#Creating one dimensional ndarray 
np.array(range(8))

array([0, 1, 2, 3, 4, 5, 6, 7])

np.array transforms a sequence of sequences into a two-dimensional array:

In [6]:
# Unlike a Python list, an ndarray can be multidimensional
np.array([range(4,7),range(1,4)])

array([[4, 5, 6],
       [1, 2, 3]])

In [73]:
import numpy as np
# Creating a two-dimensional ndarray from a list comprehension
np.array([range(i, i+3) for i in [2,4,6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

## An example to show the important attributes of an ndarray object

In [27]:
# to show type of the array
a=np.array([[2,3,4],[5,6,7],[8,9,10]])
type(a)

numpy.ndarray

In [28]:
# to show the dimensionality of the array
a.ndim

2

In [29]:
#to show the shape of the array
a.shape

(3, 3)

In [30]:
# to get data type of array elements
a.dtype

dtype('int32')

In [58]:
# to get data type of array elements
a.dtype.name

'int32'

In [31]:
#to get number of elements in the array
a.size

9

In [32]:
#to get size of each element in bytes
a.itemsize


4

In [33]:
#to get total number of bytes taken up by array
a.nbytes

36
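
The attributes above are related: `nbytes` is simply `size * itemsize`. The following is a minimal sketch of that relationship; the dtype is fixed to `int32` here (an assumption made so that the numbers do not depend on the platform's default integer type).

```python
import numpy as np

# dtype is fixed explicitly so the result does not depend on the platform default
a = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]], dtype="int32")

# total bytes = number of elements * bytes per element
total_bytes = a.size * a.itemsize
print(a.shape, a.size, a.itemsize, total_bytes)  # (3, 3) 9 4 36
print(total_bytes == a.nbytes)                   # True
```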

## To create sequences of numbers, NumPy provides the arange function, analogous to range, which returns arrays instead of lists.

In [15]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [17]:
#to create an array of elements from 10 up to 30 (not including 30) in steps of 5
np.arange( 10, 30, 5 )

array([10, 15, 20, 25])

In [4]:
#to arrange these 15 elements in a two-dimensional ndarray of shape (3,5), the reshape() method is used
np.arange(15).reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Note: **ndarray.reshape** returns a new array object with the requested shape (a view of the same data where possible); the original array keeps its own shape.
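
As a small illustration of the note on reshape, the sketch below checks that the original array keeps its shape and that, for a contiguous array such as the result of `np.arange`, the reshaped result shares the same data. The variable names `flat` and `grid` are chosen only for this example.

```python
import numpy as np

flat = np.arange(15)
grid = flat.reshape(3, 5)   # new array object with shape (3, 5)

print(flat.shape)   # (15,)  the original keeps its shape
print(grid.shape)   # (3, 5)

# here reshape returns a view, so both names refer to the same data
grid[0, 0] = 99
print(flat[0])      # 99
```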

## An array of Complex type can also be created as:

In [12]:
c = np.array( [ [1+2j,2], [3,4] ], dtype=complex )
c

array([[1.+2.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

In [17]:
import sys
L=list(range(1000))
L

[0,
 1,
 2,
 3,
 4,
 ...]


In [18]:
# rough size of the list's contents: bytes per Python int object times the number of elements
print(sys.getsizeof(5)*len(L))

28000


In [19]:
import numpy as np
x=np.arange(1000)
x.nbytes

4000

## Creating Arrays From Scratch Using NumPy Built-in-routines: 

   Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content.

In [11]:
#creating one dimensional array of 10 elements all 0.0
a=np.zeros(10)
a


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [14]:
a.dtype.name

'float64'

In [8]:
#Creating a one dimensional array of 10 integers initialized to 0
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [12]:
#Creating a 2x3x4 floating point array of 1's
np.ones((2,3,4))


array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In [16]:
#Creating a 3x5 array filled with the integer 5
np.full((3,5),5)

array([[5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5]])

In [21]:
#create a 3x3 identity matrix of integers
np.eye(3, dtype=int)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

In [61]:
#creating a 3x4 array of random integers from 1 to 9 (the upper bound 10 is exclusive)
np.random.randint(1,10,(3,4))

array([[3, 6, 4, 2],
       [7, 8, 1, 4],
       [3, 8, 8, 7]])

In [13]:
#Creating a 3x4 array, elements will be whatever happens to already exist at that memory location
np.empty((3,4))

array([[1.12874800e-311, 2.86558075e-322, 0.00000000e+000,
        0.00000000e+000],
       [1.11260619e-306, 3.76231868e+174, 4.22007877e-090,
        1.39380320e+165],
       [1.55054622e+184, 4.39582454e+175, 6.48224659e+170,
        5.82471487e+257]])

## Printing arrays

## One-dimensional arrays are printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

In [34]:
a = np.arange(6)                         # 1d array
print(a)

[0 1 2 3 4 5]


In [35]:
b = np.arange(12).reshape(4,3)           # 2d array
print(b)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


In [36]:
c = np.arange(24).reshape(2,3,4)         # 3d array
print(c)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


## Vectorized Operations in Numpy

In [30]:
import numpy as np
np.random.seed(0)
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i]=1.0/values[i]
    return output
values=np.random.randint(1,10,size=5) # create an ndarray of five random integers from 1 to 9 (upper bound exclusive)
print(values)
compute_reciprocals(values)


[6 1 4 4 8]


array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [27]:
# %timeit is an ipython magic function, which can be used to time a particular piece of code 
#(A single execution statement, or a single method)
big_array=np.random.randint(1,100, size=1000000)
%timeit compute_reciprocals(big_array)

1.6 s ± 40.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [22]:
%timeit (1.0/big_array)

4.04 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Compare the time taken in the two cases: the explicit loop needs about 1.6 s, while `1.0/big_array` needs about 4 ms.

The second method is faster by a factor of roughly 400. Operations of this kind are known as vectorized operations. Vectorized operations in NumPy are implemented via ufuncs (universal functions), whose main purpose is to quickly execute repeated operations on the values in NumPy arrays.
This is the main reason for ndarray's importance in data science.
There are unary ufuncs, which take one input, and binary ufuncs, which take two.
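
To make the unary/binary distinction concrete, here is a short sketch using four ufuncs that ship with NumPy (`np.sqrt`, `np.negative`, `np.add`, `np.maximum`). The input values are chosen only for illustration.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

# unary ufuncs take one input array
roots = np.sqrt(values)          # array([1., 2., 3., 4.])
negated = np.negative(values)    # array([ -1,  -4,  -9, -16])

# binary ufuncs take two inputs (arrays or scalars)
sums = np.add(values, 10)        # array([11, 14, 19, 26])
larger = np.maximum(values, 5)   # array([ 5,  5,  9, 16])

print(roots, negated, sums, larger)
```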

## Basic operations

In [None]:
Arithmetic Unary and Binary operators on arrays apply elementwise. A new array is created and filled with the result.

In [33]:
import numpy as np
a=np.array(range(6))
a

array([0, 1, 2, 3, 4, 5])

In [58]:
-a   #negating each element

array([ 0, -1, -2, -3, -4, -5])

In [59]:
a+5

array([ 5,  6,  7,  8,  9, 10])

In [43]:
a-2

array([-2, -1,  0,  1,  2,  3])

In [46]:
a/2

array([0. , 0.5, 1. , 1.5, 2. , 2.5])

In [56]:
a//2   #integer division

array([0, 0, 1, 1, 2, 2], dtype=int32)

In [47]:
a*4

array([ 0,  4,  8, 12, 16, 20])

In [51]:
a**2

array([ 0,  1,  4,  9, 16, 25], dtype=int32)

In [34]:
2 **a

array([ 1,  2,  4,  8, 16, 32], dtype=int32)

In [68]:
#array creation and operator application can be done together
cubes = np.array(range(6))**3   # stored under a new name so that a keeps the values 0..5
cubes

array([  0,   1,   8,  27,  64, 125], dtype=int32)

In [57]:
a%2    # modulus operation

array([0, 1, 0, 1, 0, 1], dtype=int32)

In [None]:
All these operators are wrappers around specific universal functions built into NumPy: np.negative, np.add, np.subtract,
np.multiply, np.divide, np.floor_divide, np.power, np.mod, etc.

In [62]:
np.add(a,5)    #same as a+5

array([ 5,  6,  7,  8,  9, 10])

Just as NumPy understands Python's built-in operators, it also understands Python built-in functions such as abs.

In [63]:
b=-a
b

array([ 0, -1, -2, -3, -4, -5])

In [64]:
abs(b)    # the corresponding ufunc is np.absolute(b)

array([0, 1, 2, 3, 4, 5])

Arithmetic operators can also be applied between two arrays of the same shape, of any dimensionality; the operation is performed element by element.

In [73]:
a = np.array( [20,30,40,50] )
b = np.arange( 1,5 )
c = a-b 
c

array([19, 28, 37, 46])

In [74]:
c=a+b
c

array([21, 32, 43, 54])

In [75]:
c=a/b
c

array([20.        , 15.        , 13.33333333, 12.5       ])

Unlike in many matrix languages, the product operator * operates elementwise on NumPy arrays. The matrix product can be computed using the @ operator (in Python >= 3.5) or the dot function or method:

In [76]:
c=a*b
c

array([ 20,  60, 120, 200])

In [77]:
A = np.array( [[1,1],
             [0,1]] )
B = np.array( [[2,0],
             [3,4]] )
A * B  #element wise product

array([[2, 0],
       [0, 4]])

In [78]:
A @ B   #matrix product

array([[5, 4],
       [3, 4]])
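
The text mentions the dot function and the dot method as alternatives to `@`. The sketch below, which reuses the same two matrices, suggests that all three spellings give the same matrix product for two-dimensional arrays.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

product_operator = A @ B         # matrix product via the @ operator
product_method = A.dot(B)        # same product via the dot method
product_function = np.dot(A, B)  # same product via the dot function

print(product_method)
# [[5 4]
#  [3 4]]
print(np.array_equal(product_operator, product_function))  # True
```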

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

In [80]:

A *= 3
A

array([[3, 3],
       [0, 3]])

In [81]:
B += A
B

array([[5, 3],
       [3, 7]])

## Aggregate Functions

Python built-in functions like sum, min and max can be applied to NumPy's ndarray.

In [None]:
The linspace function in NumPy returns evenly spaced data points over an interval; np.linspace(0,9,4) returns 4 points from 0 to 9, with both endpoints included.

In [4]:
import numpy as np
a= np.linspace(0,9,4)
a

array([0., 3., 6., 9.])

In [8]:
sum(a)


18.0

In [9]:
max(a)

9.0

In [10]:
min(a)

0.0

In [None]:
NumPy has its own built-in aggregation functions such as sum, prod, median, min and max for working on arrays,
which are much faster. The cells below compare the time taken by the Python built-in function sum() and by NumPy's np.sum().

In [None]:
Functions in np.random:
random([size])                          Return random floats in the half-open interval [0.0, 1.0).
rand(d0, d1, …, dn)                     Create an array of the given shape filled with random samples from a
                                        uniform distribution over [0, 1). [random and rand draw from the same
                                        distribution; random takes the shape as a tuple, rand as separate arguments]
randint(low[, high, size, dtype])       Return random integers from low (inclusive) to high (exclusive).
random_integers(low[, high, size])      Random integers between low and high, inclusive (deprecated in favour of randint).

In [11]:
big_array=np.random.rand(1000000)  
%timeit sum(big_array)

71.8 ms ± 384 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [12]:
%timeit np.sum(big_array)

585 µs ± 23.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [13]:
%timeit big_array.sum()

612 µs ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [None]:
numpy aggregate functions are much faster.

In [None]:
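Python built-in function    NumPy function            Method shorthand    Description
min(a)                      numpy.min(a)              a.min()             minimum element
max(a)                      numpy.max(a)              a.max()             maximum element
sum(a)                      numpy.sum(a)              a.sum()             sum of elements
                            numpy.prod(a)             a.prod()            product of elements
                            numpy.mean(a)             a.mean()            mean of elements
                            numpy.std(a)              a.std()             standard deviation
                            numpy.var(a)              a.var()             variance
                            numpy.argmin(a)           a.argmin()          index of the minimum element
                            numpy.argmax(a)           a.argmax()          index of the maximum element
                            numpy.median(a)                               median of elements
                            numpy.percentile(a, q)                        rank-based statistics of elements (q-th percentile)
                            numpy.any(a)              a.any()             evaluates whether any element is true
                            numpy.all(a)              a.all()             evaluates whether all elements are true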

### By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:

In [17]:
b = np.arange(12).reshape(3,4)
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [18]:
b.sum()

66

In [19]:
b.sum(axis=1)    #sum of each row; the axis keyword names the axis that is collapsed (0 is the row axis, 1 the column axis)

array([ 6, 22, 38])

In [20]:
b.sum(axis=0) #sum of each column (the row axis is collapsed)

array([12, 15, 18, 21])

In [23]:
#cumulative sum down each column (along axis 0)
b.cumsum(axis=0)  

array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21]], dtype=int32)

### How to apply arithmetic operations on two arrays of different sizes?
NumPy provides broadcasting, a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) to
arrays of different shapes.

In [24]:
import numpy as np     
a=np.ones((3,3))              #dim 3x3
a

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [25]:
b=np.arange(3)                #shape (3,), treated as a 1x3 row when broadcast
b

array([0, 1, 2])

In [26]:
a+b

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
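
Broadcasting can also stretch both operands at once. The sketch below is an illustration under simple assumptions (a 3x1 column and a length-3 row, names chosen for this example): the two are combined into a 3x3 result, while shapes that cannot be matched raise a `ValueError`.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(3)                    # shape (3,)

# both operands are stretched to the common shape (3, 3)
table = column + row
print(table)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# shapes that cannot be matched raise a ValueError
try:
    np.ones((3, 3)) + np.arange(4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```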

### Trigonometric functions, inverse trigonometric functions, hyperbolic functions, exponents, logarithms and many more are available in NumPy.
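
A few of these ufuncs in action, as a minimal sketch; the input values are picked only for illustration, and results are floating point, so values such as sin(pi) come out very close to zero rather than exactly zero.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

sines = np.sin(angles)      # approximately [0, 1, 0]
cosines = np.cos(angles)    # approximately [1, 0, -1]

powers = np.exp(np.array([0.0, 1.0]))          # e**0 and e**1
logs = np.log10(np.array([1.0, 10.0, 100.0]))  # approximately [0, 1, 2]

print(sines, cosines, powers, logs)
```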

## Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [38]:
a=np.array([12,56,34,21,8,19])   #indexing
a[2]

34

In [39]:
a[2:5]   #slicing with start:stop(:step) to access a sub-array of the array

array([34, 21,  8])

In [40]:
a[:6:2] = -1000            #modify alternate element starting from 0 to 5
a

array([-1000,    56, -1000,    21, -1000,    19])

In [41]:
a[-1]           #negative indexing -1 refers to last element

19

In [42]:
a[ : :-1]                #display elements in reverse order; when the step is negative, the defaults for start and stop are swapped

array([   19, -1000,    21, -1000,    56, -1000])

In [48]:
a[2: :-1]

array([-1000,    56, -1000])

In [36]:
import numpy
a=numpy.arange(15).reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [37]:
a[2,-2]   #mixing positive and negative indexing: second-last element of the third row

13

In [51]:
a[:2,::3]           #slicing in multidimensional array first two rows and first and fourth column

array([[0, 3],
       [5, 8]])

In [45]:
a[0:2,::-1]


array([[4, 3, 2, 1, 0],
       [9, 8, 7, 6, 5]])

### numpy.fromfunction constructs an array by executing a function over each coordinate.

In [3]:
import numpy as np
def f(x,y):
    return 3*x+y
b = np.fromfunction(f,(5,4), dtype=int)
b

array([[ 0,  1,  2,  3],
       [ 3,  4,  5,  6],
       [ 6,  7,  8,  9],
       [ 9, 10, 11, 12],
       [12, 13, 14, 15]])

In [9]:
b[0:5, 1]                       # each row in the second column of b


array([ 1,  4,  7, 10, 13])

In [None]:
b[ : ,1]                        # equivalent to the previous example


In [None]:
b[1:3, : ]                      # each column in the second and third row of b

In [None]:
b[-1]                                  # the last row. Equivalent to b[-1,:]


Difference between slicing a Python list and slicing a NumPy array: array slicing returns views rather than copies of the array data, unlike Python lists.
Any change made to a slice of an array changes the original array, whereas changing a slice of a list does not change the original list.

In [63]:
L=[12,45,67,41,21,32]
b=L[2:5]
b

[67, 41, 21]

In [64]:
b[1]=80
b

[67, 80, 21]

In [66]:
L

[12, 45, 67, 41, 21, 32]

In [67]:
import numpy
a=numpy.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [68]:
c=a[2:5]
c

array([2, 3, 4])

In [69]:
c[1]=12
c


array([ 2, 12,  4])

In [70]:
a

array([ 0,  1,  2, 12,  4,  5,  6,  7,  8,  9])

So what if one needs an independent copy of a sub-array? Use the copy() method:

In [71]:
c=a[2:5].copy()
c

array([ 2, 12,  4])

In [72]:
c[2]=5
c

array([ 2, 12,  5])

In [74]:
a

array([ 0,  1,  2, 12,  4,  5,  6,  7,  8,  9])
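
One way to check whether two arrays refer to the same data is `np.shares_memory`. The sketch below is only an illustration of the view/copy distinction described above; the names `view` and `copy` are chosen for this example.

```python
import numpy as np

a = np.arange(10)
view = a[2:5]           # slice: a view onto a's data
copy = a[2:5].copy()    # explicit copy: independent data

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

copy[0] = -1            # does not touch a
view[0] = 100           # writes through to a
print(a[2])             # 100
```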

In [6]:
import numpy as np
x=np.arange(5)
y=np.empty(5)
np.multiply(x,10,out=y)   # write the result directly into the existing array y
print (y)

[ 0. 10. 20. 30. 40.]


In [8]:
y=np.zeros(10)    #create an array of ten elements initialized to zero
# raise 2 to the power of every element of x and write the results to alternate positions of y
np.power(2,x,out=y[::2])
print(y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


## Two special operations: reduce and accumulate
* reduce repeatedly applies a given operation to the elements of an array until only a single result remains.
* accumulate stores all the intermediate results as well.

In [10]:
x=np.arange(1,6)
x

array([1, 2, 3, 4, 5])

In [13]:
np.multiply.reduce(x)


120

In [14]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120], dtype=int32)
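
The same two methods are available on other binary ufuncs. As a hedged sketch, applying them to `np.add` should reproduce what `np.sum` and `np.cumsum` return for a one-dimensional array.

```python
import numpy as np

x = np.arange(1, 6)               # array([1, 2, 3, 4, 5])

total = np.add.reduce(x)          # 1+2+3+4+5 = 15
running = np.add.accumulate(x)    # array([ 1,  3,  6, 10, 15])

print(total == np.sum(x))                      # True
print(np.array_equal(running, np.cumsum(x)))   # True
```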

## A very simple application, useful for Data Analysis.
Let us consider the heights of all US presidents. This data is available in the file president_heights.csv, which is a simple comma-separated list of labels and values.

In [None]:
The file comprises 42 records. The first 10 records are shown below:
order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189
4,James Madison,163
5,James Monroe,183
6,John Quincy Adams,171
7,Andrew Jackson,185
8,Martin Van Buren,168
9,William Henry Harrison,173
10,John Tyler,183   

One convenient way to read this file is the pandas library.

In [None]:
import numpy as np
import pandas as pd
data= pd.read_csv('data/president_heights.csv')    #data is an instance of the pandas DataFrame class
heights=np.array(data['height(cm)'])               #column name as it appears in the file header
print(heights)

In [1]:
import numpy
heights=numpy.array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 183, 168, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 182, 185])
print(heights.size)

42


In [2]:
import numpy as np
print("Mean height:             ", heights.mean())  
print("Standard deviation:       ",heights.std() )
print("Minimum Height:           ", heights.min())
print("Maximum height:           ", heights.max())
print("25th percentile :          ",np.percentile(heights,25))
print("Median:                   ", np.median(heights))
print("75th percentile:          ",np.percentile(heights,75))

Mean height:              179.61904761904762
Standard deviation:        6.831134539223372
Minimum Height:            163
Maximum height:            193
25th percentile :           174.25
Median:                    182.0
75th percentile:           183.0


## Comparisons, Masks and Boolean Logic

Boolean masks are used to examine and manipulate values within NumPy arrays. For example, one may want to count all values greater than a certain value, or to remove all outliers that are above some threshold value.

Example: counting rainy days. Suppose we have a series of data representing the amount of precipitation on each day of a year in a given city. There will be 365 values giving the daily rainfall.

In [None]:
import numpy as np
import pandas as pd
data=pd.read_csv('data/Seattle2014.csv')
rainfall=np.array(data['PRCP'])
inches=rainfall/254   #convert to inches, assuming PRCP is recorded in tenths of a millimetre (254 tenths of a mm = 1 inch)
inches.shape    # gives (365,) ie one dimensional array of 365 values


NumPy universal functions will now be used to answer questions like:
(i) How many rainy days were there in a year?
(ii) What is the average precipitation on those rainy days?
(iii) How many days were there with more than half an inch of rain?


In [None]:
Just like arithmetic operators, NumPy also provides comparison (relational) operators and their equivalent universal functions, which work on ndarrays.
Operator           Equivalent ufunc
 ==                  np.equal
 !=                  np.not_equal
 <                   np.less
 <=                  np.less_equal
 >                   np.greater
 >=                  np.greater_equal

In [2]:
x=np.array([1,2,3,4,5,6])
x

array([1, 2, 3, 4, 5, 6])

In [4]:
x<3

array([ True,  True, False, False, False, False])

In [5]:
x !=3

array([ True,  True, False,  True,  True,  True])

In [6]:
(x * 2) == (x ** 2)

array([False,  True, False, False, False, False])

In the background, when you write x < 3, NumPy uses np.less(x,3).
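
A quick check of this equivalence, as a minimal sketch: the operator form and the ufunc form should produce identical Boolean arrays.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

mask_operator = x < 3          # operator form
mask_ufunc = np.less(x, 3)     # equivalent ufunc call

print(np.array_equal(mask_operator, mask_ufunc))  # True
print(mask_ufunc.dtype)                           # bool
```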

Comparison operators work on ndarrays of any dimension.

In [7]:
y=np.arange(15).reshape(3,5)
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [8]:
y >= 3

array([[False, False, False,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

### Working with Boolean Arrays

To count the number of True values in a Boolean array, np.count_nonzero is used.

In [10]:
np.count_nonzero(x<4)

3

In [11]:
np.sum(x<4)          #this does the same: False is treated as 0 and True as 1, so the sum gives the number of True values

3

In [13]:
np.sum(y<10,axis=1)   #count true values along each row


array([5, 5, 0])

In [14]:
np.any(x>4)    #are there any value in x which is greater than 4

True

In [15]:
np.all(x>3)  #are all values of x greater than 3

False

In [17]:
np.all(y < 8, axis=1)   #are all values along each rows less than 8

array([ True, False, False])

In [None]:
We can use these comparison operators to find the number of days with no rain or with less than 4 inches of rain.
np.count_nonzero(inches == 0) or np.sum(inches == 0)
np.count_nonzero(inches < 4)

To count the number of days with rainfall greater than 0.5 inches but less than 1 inch, bitwise logic operators are used, which work elementwise on Boolean arrays.

In [None]:
operator &  (np.bitwise_and)
        |   (np.bitwise_or)
        ^   (np.bitwise_xor)
        ~    (np.bitwise_not)
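
A small sketch of these operators on a Boolean array follows. The array `x` is only an illustration; note the parentheses around each comparison, which are needed because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# parentheses are required around each comparison
between = (x > 2) & (x < 6)      # True for 3, 4, 5
outside = (x <= 2) | (x >= 6)    # True for 1, 2, 6
not_between = ~between           # elementwise negation of between

print(np.count_nonzero(between))              # 3
print(np.array_equal(outside, not_between))   # True
```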

In [None]:
np.count_nonzero((inches > 0.5) & (inches < 1))   # yields 29

In [None]:
np.sum(inches == 0)        # number of days without rain               215
np.sum(inches !=0)         # number of days with rain                   150   
np.sum(inches > 0.5)       # number of days with more than 0.5 inches   37
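
Since the Seattle data file may not be at hand, the sketch below runs the same counts on synthetic data. The `inches` array here is a hypothetical stand-in generated from a seeded random generator, so the resulting counts will differ from the real figures quoted above.

```python
import numpy as np

# synthetic stand-in for the real data: 365 daily values in inches (hypothetical)
rng = np.random.RandomState(0)
is_dry = rng.rand(365) < 0.6
inches = np.where(is_dry, 0.0, rng.exponential(0.3, size=365))

dry_days = np.sum(inches == 0)        # days without rain
rainy_days = np.sum(inches != 0)      # days with rain
heavy_days = np.sum(inches > 0.5)     # days with more than 0.5 inches
moderate_days = np.count_nonzero((inches > 0.5) & (inches < 1))

# dry days contribute 0 to the total, so this is the mean over rainy days only
mean_on_rainy_days = inches.sum() / rainy_days

print(dry_days + rainy_days)   # 365, every day is either dry or rainy
```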



Boolean arrays can be used as masks to select particular subsets of the data themselves. For example, to get all values in an array that are less than 5, first build the Boolean array:



In [3]:
L=[[5,0,3,3],[7,9,3,5],[2,4,7,6]]
L

[[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6]]

In [4]:
import numpy as np
x=np.array(L)
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [5]:
x < 5


array([[False,  True,  True,  True],
       [False, False,  True, False],
       [ True,  True, False, False]])

To select these values from the array, index it with this Boolean array; this is known as a masking operation.

In [6]:
x[x<5]   # all the values in position at which the mask array is True.

array([0, 3, 3, 3, 2, 4])

We have selected the values which are less than 5.

In [None]:
#construct a mask on all rainy days
rainy = (inches > 0)
#construct the mask for all summer days (June 21st is 172nd day) [three months ie 90 days starting from 172nd day]
summer= (np.arange(365)-172 < 90) & (np.arange(365)-172 >0)
print("Median precip on rainy days in 2014:  ", np.median(inches[rainy]))
print("Median precip on summer days in 2014:  ", np.median(inches[summer]))
print("Maximum precip on summer days in 2014: ", np.max(inches[summer]))
print("Median precip on non-summer rainy days: ", np.median(inches[rainy & ~summer]))

Median precip on rainy days in 2014:  0.194881889764
Median precip on summer days in 2014: 0.0
Maximum precip on summer days in 2014:  0.850393700787
Median precip on non-summer rainy days: 0.200787401575
    
By combining Boolean operations, masking operations and aggregates such questions can be quickly answered.    

In [None]:
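FAST SORTING IN NUMPY: NumPy's np.sort and np.argsort functions are very efficient; by default they use a quicksort algorithm with O(N log N) average complexity, and mergesort and heapsort are also available through the kind parameter.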

In [13]:
x=np.array([2,6,9,3,1,8,9])
x

array([2, 6, 9, 3, 1, 8, 9])

In [11]:
#returns a new sorted array, x remains same
np.sort(x)

array([1, 2, 3, 6, 8, 9, 9])

In [12]:
#x remains the same
x

array([2, 6, 9, 3, 1, 8, 9])

In [9]:
#to sort the array in place, use the sort method of arrays
x.sort()
x

array([1, 2, 3, 6, 8, 9, 9])

In [14]:
x=np.array([2,6,9,3,1,8,9])
x

array([2, 6, 9, 3, 1, 8, 9])

In [15]:
i=np.argsort(x)   #returns an array of the indices of the sorted elements
i

array([4, 0, 3, 1, 5, 2, 6], dtype=int64)

In [16]:
type(i)

numpy.ndarray

In [17]:
#that is, indexing x with i gives the sorted array
x[i]


array([1, 2, 3, 6, 8, 9, 9])
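
A common use of argsort is to reorder one array according to the values in another. The sketch below takes four name/height pairs from the presidents table shown earlier; the variable names are chosen for this example.

```python
import numpy as np

# four (name, height in cm) pairs taken from the presidents table above
names = np.array(["Washington", "Adams", "Madison", "Monroe"])
heights = np.array([189, 170, 163, 183])

order = np.argsort(heights)      # indices that sort heights in ascending order
print(order)                     # [2 1 3 0]
print(names[order])              # ['Madison' 'Adams' 'Monroe' 'Washington']
print(heights[order][::-1])      # descending order: [189 183 170 163]
```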

Sorting can be done along rows and columns of multidimensional arrays.

In [18]:
y=np.array([[12,5,8,2],[67,34,21,3]])
y

array([[12,  5,  8,  2],
       [67, 34, 21,  3]])

In [21]:
#sort each row of y (along axis 1)
np.sort(y, axis=1)

array([[ 2,  5,  8, 12],
       [ 3, 21, 34, 67]])

This is by no means an exhaustive coverage of NumPy.