# Data wrangling & manipulation using NumPy and Pandas in Python - Part 1 (NumPy)

NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. With NumPy, mathematical and logical operations can be performed on whole arrays at once. This tutorial explains the basics of NumPy: creating arrays, inspecting their attributes, and indexing, reshaping, combining and splitting them.

### Operations using NumPy

 * Mathematical and logical operations on arrays.
 * Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.
 
The ndarray object consists of a contiguous one-dimensional segment of computer memory, combined with an indexing scheme that maps each item to a location in that memory block.
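You can inspect the memory layout described above through an array's strides, which give the number of bytes NumPy skips in the memory block to move one step along each axis. This is a minimal sketch that assumes a row-major (C-ordered) array of 64-bit integers. The names m, i and j are only illustrative.

```python
import numpy as np

# A 3x4 array of 64-bit integers stored in one contiguous block of memory
m = np.arange(12, dtype=np.int64).reshape(3, 4)

print(m.flags['C_CONTIGUOUS'])  # True: rows are laid out one after another
print(m.itemsize)                # 8 bytes per element
print(m.strides)                 # (32, 8): bytes to the next row / next column

# The indexing scheme: element [i, j] starts at byte offset i*strides[0] + j*strides[1]
i, j = 2, 1
offset = i * m.strides[0] + j * m.strides[1]
print(offset)                              # 72
print(m.ravel()[offset // m.itemsize])     # 9, the same value as m[2, 1]
```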

In [1]:
# in order to use numpy we should import it first
import numpy as np
# Import the numpy package under the short alias "np"; NumPy functions are then accessed as np.array, np.arange, etc.

a = np.array([1,2,3]) # We are creating a 1-d array 
print (a) # we are printing the array


[1 2 3]


In [2]:
a = np.array([[1, 2], [3, 4]]) # We are creating a 2d array
print (a)

[[1 2]
 [3 4]]


### 1) numpy.arange
**Syntax:** numpy.arange([start, ]stop[, step])

* **start** : number, optional

Start of interval. The interval includes this value. The default start value is 0.

* **stop** : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating-point round-off affects the length of the output.

* **step** : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a positional argument, start must also be given.
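This quick sketch shows how the three parameters combine. The variable names are illustrative. A negative step counts down, and a float step produces a float array.

```python
import numpy as np

only_stop = np.arange(5)             # start defaults to 0, step to 1 -> 0, 1, 2, 3, 4
start_stop = np.arange(2, 10)        # 2, 3, ..., 9 (10 is excluded)
with_step = np.arange(2, 10, 3)      # step given positionally, so start is required -> 2, 5, 8
countdown = np.arange(10, 0, -3)     # a negative step counts down -> 10, 7, 4, 1
float_step = np.arange(0, 1, 0.25)   # a float step gives floats -> 0.0, 0.25, 0.5, 0.75

for arr in (only_stop, start_stop, with_step, countdown, float_step):
    print(arr)
```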

In [3]:
# Example 1
import numpy as np
np.arange( 10, 30, 5 )

array([10, 15, 20, 25])

In [4]:
a = np.arange(15) # Creates an array of the integers 0 to 14, similar to Python's built-in range()
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

### 2) numpy.reshape
**Syntax:** numpy.reshape(a, newshape)

_Gives a new shape to an array without changing its data._


* **a** : Array to be reshaped.

* **newshape** : int or tuple of ints
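The new shape must be compatible with the original shape, which means it must hold the same total number of elements. One dimension can be given as -1, and its length is then inferred from the array size.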
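This minimal sketch shows the compatibility rule in practice. It uses -1 for an inferred dimension and shows the error raised for a shape with the wrong number of elements. The names x, y and z are illustrative.

```python
import numpy as np

x = np.arange(12)            # 12 elements

y = x.reshape(3, 4)          # 3 * 4 == 12, so this works
z = x.reshape(2, -1)         # -1 lets NumPy infer the second dimension: (2, 6)
print(y.shape, z.shape)      # (3, 4) (2, 6)

try:
    x.reshape(5, 3)          # 5 * 3 == 15 != 12, so the shapes are incompatible
except ValueError as err:
    print("ValueError:", err)
```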

In [5]:
a = np.arange(15).reshape(3, 5) # Reshape the 1-d array into a 3x5 matrix
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

### 3) ndarray.shape

The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

In [6]:
a.shape

(3, 5)

### 4) ndarray.ndim
The number of axes (dimensions) of the array.

In [7]:
a.ndim

2

### 5.1) ndarray.dtype
An object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. NumPy also provides types of its own, such as numpy.int32, numpy.int16 and numpy.float64. The default integer dtype depends on the platform and NumPy version, so the output below may read int64 instead of int32.

In [8]:
a.dtype

dtype('int32')
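You can also request a dtype when you create an array, and you can convert between dtypes with astype. Here is a minimal sketch. The array names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int16)   # explicitly request 16-bit integers
floats = np.array([1.0, 2.5, 3.0])           # Python floats map to float64 by default
converted = ints.astype(np.float64)          # astype returns a new array with the new dtype

print(ints.dtype, ints.itemsize)       # int16 2
print(floats.dtype, floats.itemsize)   # float64 8
print(converted.dtype)                 # float64
```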

### 5.2) type()

Returns the type of an object. For any NumPy array this is numpy.ndarray.

In [9]:
type(a)

numpy.ndarray

### 6) ndarray.itemsize
The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

In [10]:
a.itemsize

4

### 7) ndarray.size
The total number of elements of the array. This is equal to the product of the elements of shape.

In [11]:
a.size

15
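The attributes above are tied together. ndim is the length of the shape tuple, and size is the product of its entries. This short sketch uses a 3-D array named t (an illustrative name) to show both relationships.

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)   # a 3-D array: 2 blocks of 3 rows x 4 columns

print(t.shape)                        # (2, 3, 4)
print(t.ndim == len(t.shape))         # True: ndim is the length of the shape tuple
print(t.size == np.prod(t.shape))     # True: size is the product of the shape entries
print(t.size)                         # 24
```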

**Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the need to grow arrays, which is an expensive operation.
The function zeros creates an array full of zeros, and the function ones creates an array full of ones. The function empty creates an uninitialized array, so its initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is float64.**
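The usual pattern is to allocate the array once, at its final size, and then fill it in place instead of appending element by element. This is a minimal sketch of that pattern. The names n and squares are illustrative.

```python
import numpy as np

n = 5
squares = np.zeros(n, dtype=np.int64)   # allocate once, with the known size
for i in range(n):
    squares[i] = i * i                   # fill in place instead of growing the array

print(squares)                           # [ 0  1  4  9 16]
print(np.zeros(3).dtype)                 # float64: the default dtype
```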

### 9) numpy.zeros, numpy.ones, numpy.empty, numpy.eye, numpy.full and numpy.random.random

In [12]:
np.zeros((3,4)) # Creates a 3x4 array filled with zeros

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [13]:
np.ones((2,3),dtype=int) # Creates an array filled with ones; the dtype can also be specified

array([[1, 1, 1],
       [1, 1, 1]])

In [14]:
np.empty((2,3))  # Creates an uninitialized array: the values are whatever was already in that memory (here they happen to be zeros)

array([[0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.eye(3) # Creates identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [16]:
np.full((2,2),7) # Creates a 2x2 matrix filled with the specified value

array([[7, 7],
       [7, 7]])

In [17]:
a = np.random.random((2,3)) # Creates a 2x3 matrix of random floats in [0.0, 1.0)
a

array([[0.196953  , 0.75509068, 0.98040344],
       [0.96853969, 0.08056366, 0.39969312]])
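The random numbers above change on every run, so your output will differ. If you need reproducible values, you can seed a random number generator. This minimal sketch uses NumPy's Generator interface (np.random.default_rng) instead of the np.random.random call used above. The seed 42 is arbitrary.

```python
import numpy as np

rng1 = np.random.default_rng(seed=42)   # two generators seeded with the same value
rng2 = np.random.default_rng(seed=42)

first = rng1.random((2, 3))    # floats drawn from [0.0, 1.0)
second = rng2.random((2, 3))

print(np.array_equal(first, second))   # True: same seed, same numbers
```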

### 10) Basic Operations
### 10.1) Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [18]:
# Let's create two arrays
import numpy as np
a = np.array( [20,30,40,50] )
b = np.arange( 4 )
print("Elements in a are",a)
print("Elements in b are",b)

Elements in a are [20 30 40 50]
Elements in b are [0 1 2 3]


In [19]:
c = a-b # Elementwise subtraction
print(c)

[20 29 38 47]


In [20]:
print(b**4) # power

[ 0  1 16 81]


In [21]:
print(10*np.sin(a)) # Trigonometric functions such as np.sin are available in NumPy

[ 9.12945251 -9.88031624  7.4511316  -2.62374854]


In [22]:
print(a<35) # Returns a boolean array: True where the condition holds, False elsewhere

[ True  True False False]


In [23]:
a[0] +=10 # This is equivalent to a[0] = a[0] + 10
print(a)

[30 30 40 50]
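A binary operator such as - puts its result in a new array and leaves its operands untouched. Augmented assignments such as += update the existing array in place. This sketch checks both behaviours with np.shares_memory and an identity test. The names x, y, z and alias are illustrative.

```python
import numpy as np

x = np.array([20, 30, 40, 50])
y = np.arange(4)

z = x - y                         # the result is stored in a brand-new array
print(z)                          # [20 29 38 47]
print(x)                          # [20 30 40 50]: the operands are unchanged
print(np.shares_memory(x, z))     # False: z owns separate memory

alias = x                         # a second name for the same array object
x += 1                            # augmented assignment updates x in place
print(alias is x)                 # True: no new array was created
print(alias)                      # [21 31 41 51]
```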


### 10.2) Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the @ operator (in Python >= 3.5) or the dot function or method:

In [26]:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )

print("Elements in A are \n",A)
print("Elements in B are \n",B) 

Elements in A are 
 [[1 1]
 [0 1]]
Elements in B are 
 [[2 0]
 [3 4]]


In [27]:
print(A * B)  # elementwise product

[[2 0]
 [0 4]]


In [None]:
print(A @ B)  # matrix product

In [None]:
print(A.dot(B)) # another matrix product

In [None]:
print(np.dot(A,B)) # another matrix product
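This self-contained check confirms that @, the dot method and np.dot give the same result. The comments work the product out by hand, and the last line contrasts it with the elementwise *.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

P = A @ B        # [[1*2 + 1*3, 1*0 + 1*4], [0*2 + 1*3, 0*0 + 1*4]]
print(P)         # [[5 4]
                 #  [3 4]]
print(np.array_equal(P, A.dot(B)), np.array_equal(P, np.dot(A, B)))  # True True
print(np.array_equal(P, A * B))   # False: * is the elementwise product
```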

### 10.3) Math functions on arrays: use of axis

In [28]:
b=np.arange(12).reshape(3,4)
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [29]:
print(b.sum()) # Addition of all elements in b

66


In [30]:
print(b.sum(axis=0)) # axis=0 sums down the rows, giving one total per column

[12 15 18 21]


In [31]:
print(b.sum(axis=1)) # axis=1 sums across the columns, giving one total per row

[ 6 22 38]


In [32]:
print(b.min(axis=0)) # Minimum element of each column

[0 1 2 3]


In [33]:
print(b.cumsum(axis=1)) # Cumulative sum along each row

[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]
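The rule behind these results is that the axis you pass is the one that gets collapsed. The result therefore has one entry for each position along the remaining axis. This short sketch applies two more reductions to the same 3x4 array. The names are illustrative.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

col_max = b.max(axis=0)     # collapse axis 0 (rows): one value per column, length 4
row_mean = b.mean(axis=1)   # collapse axis 1 (columns): one value per row, length 3

print(col_max)    # [ 8  9 10 11]
print(row_mean)   # [1.5 5.5 9.5]
```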


### 10.4) Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [34]:
# Create 2 arrays
a=np.arange(5)
print(a)

b=np.array([22,11,33,44,21])
print(b)

[0 1 2 3 4]
[22 11 33 44 21]


In [35]:
print(np.add(a,b))# Elementwise sum; both produce the same array
print(a + b)

[22 12 35 47 25]
[22 12 35 47 25]


In [36]:
print(a - b) # Elementwise difference; both produce the same array
print(np.subtract(a, b))

[-22 -10 -31 -41 -17]
[-22 -10 -31 -41 -17]


In [37]:
print(a * b)# Elementwise product; both produce the same array
print(np.multiply(a, b))

[  0  11  66 132  84]
[  0  11  66 132  84]


In [38]:
print(a / b)# Elementwise division; both produce the same array
print(np.divide(a, b))

[0.         0.09090909 0.06060606 0.06818182 0.19047619]
[0.         0.09090909 0.06060606 0.06818182 0.19047619]


### 11) Slicing and Iterating

### 11.1) Slicing: In addition to accessing array elements one at a time, NumPy (like Python lists) provides concise syntax to access subarrays; this is known as slicing:

#### Slicing a 1-D array

In [39]:
import numpy as np
# Consider an array
a=np.arange(10)**3
print(a)

[  0   1   8  27  64 125 216 343 512 729]


In [2]:
print(a[2]) # Prints the element at index 2, i.e. the 3rd element of the array a

8


In [3]:
print(a[2:4]) # Prints the elements at indices 2 and 3: a[2:4] runs from index 2 up to, but not including, index 4

[ 8 27]


In [4]:
print(a[2:3]) # Note that this returns an array with 1 element, whereas a[2] returns a scalar

[8]
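A slice can also take a third number, the step, as in start:stop:step. This minimal sketch uses the same cubes array, renamed cubes here for illustration.

```python
import numpy as np

cubes = np.arange(10)**3      # [  0   1   8  27  64 125 216 343 512 729]

every_other = cubes[:6:2]     # indices 0, 2, 4 -> [ 0  8 64]
backwards = cubes[::-1]       # a negative step walks backwards: the reversed array

print(every_other)
print(backwards)
```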


#### Slicing a 2-D array

In [7]:
a=np.arange(12).reshape(4,3)**2 # Create a 4x3 array "a" holding the squares of the numbers 0 to 11.
print(a)

[[  0   1   4]
 [  9  16  25]
 [ 36  49  64]
 [ 81 100 121]]


In [8]:
print(a[1:3,1:2]) # from second and third rows, print second column

[[16]
 [49]]


In [9]:
print(a[1,1]) # Element in the 2nd row and 2nd column

16


In [10]:
print(a[:2,]) # print 1st and 2nd row

[[ 0  1  4]
 [ 9 16 25]]


In [11]:
print(a[:3,])# print 1st, 2nd and 3rd row

[[ 0  1  4]
 [ 9 16 25]
 [36 49 64]]


In [12]:
print(a[:3,:2])# from 1st , 2nd and 3rd row, print 1st and 2nd column 

[[ 0  1]
 [ 9 16]
 [36 49]]


In [13]:
print(a[:,2]) # from all rows, print 3rd column

[  4  25  64 121]


In [14]:
print(a[2,:])# print 3rd row with all columns

[36 49 64]


### Using negative indexing for slicing the array

#### Slicing 1-D array using -ve indexing

In [15]:
a=np.arange(10)**3
print(a)

[  0   1   8  27  64 125 216 343 512 729]


In [16]:
print(a[-1]) # prints last element

729


In [17]:
print(a[-2]) # prints the second-to-last element

512


In [18]:
print(a[-5:-2]) # prints the elements at indices -5, -4 and -3 (i.e. indices 5, 6 and 7); index -2 is excluded

[125 216 343]


In [19]:
# Slicing a 2-D array using negative indexing

a=np.arange(12).reshape(4,3)**2 
print(a)

[[  0   1   4]
 [  9  16  25]
 [ 36  49  64]
 [ 81 100 121]]


In [20]:
print(a[-1,]) # Prints last row

[ 81 100 121]


In [21]:
print(a[-1,-1]) # print element at last row and last column

121


In [22]:
print(a[-3:-1,-2:]) # -3:-1 selects rows from the 3rd-last up to (but excluding) the last row;
                    # -2: selects the last two columns.

[[16 25]
 [49 64]]


In [25]:
print(a[-3:-2,-3:]) # only the 3rd-last row, all three columns

[[ 9 16 25]]


In [26]:
print(a[-2:,-2:])

[[ 49  64]
 [100 121]]


### A slice of an array is a view into the same data, so modifying it will modify the original array.

In [32]:
a=np.array([[1,2,3],[4,5,6],[7,8,9]]) # Create an array "a"
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [33]:
b=a[:1,:1] # Use slicing to pull out the 1x1 subarray holding the first element of a
print(b)

[[1]]


In [34]:
# Now let's change the array "b"

b[0]=12
print(a) # Note that the original array a has changed too


[[12  2  3]
 [ 4  5  6]
 [ 7  8  9]]
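To get an independent subarray instead of a view, call .copy() on the slice. In this minimal sketch, np.shares_memory reports whether two arrays overlap in memory. The names base, view and sub_copy are illustrative.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = base[:1, :1]              # a view: shares memory with base
sub_copy = base[:1, :1].copy()   # an independent copy of the same element

sub_copy[0] = 100                # changing the copy leaves base untouched
print(base[0, 0])                # 1
print(np.shares_memory(base, view), np.shares_memory(base, sub_copy))  # True False
```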


### One useful trick with integer array indexing is selecting or mutating (changing) one element from each row of a matrix.

In [None]:
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
a

In [None]:
b=np.array([0,1,2])
b

In [None]:
# Select one element from each row of "a" using the indices in b.
a[np.arange(3),b]


In [None]:
a[np.arange(3),b]+=10 # Add 10 to one element in each row of a, chosen by the indices in b
a
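Here is a self-contained version of the same steps, with the expected results written as comments. The names m, cols and rows are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cols = np.array([0, 1, 2])     # column index to pick in each row
rows = np.arange(3)            # row indices 0, 1, 2

picked = m[rows, cols]         # m[0, 0], m[1, 1], m[2, 2]
print(picked)                  # [1 5 9]

m[rows, cols] += 10            # add 10 to exactly those elements
print(m)
# [[11  2  3]
#  [ 4 15  6]
#  [ 7  8 19]]
```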

### 11.2) Iterating

**Iterating over multidimensional arrays is done with respect to the first axis.**

In [None]:
a=np.arange(12).reshape(4,3)**2 # Create an array "a" with square of numbers from 0 to 11.
print(a)

for i in a:
    print(i+10)

**However, if one wants to perform an operation on each element in the array, one can use the "flat" attribute which is an iterator over all the elements of the array.**

In [None]:
for i in a.flat:
    print(i+12)
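To visit every element and also know where it sits, np.ndenumerate yields (index, value) pairs. This small sketch contrasts it with plain iteration, which walks the first axis. The array name grid is illustrative.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                    #  [3 4 5]]

rows = [row for row in grid]        # plain iteration yields the 2 rows
print(len(rows))                    # 2

for index, value in np.ndenumerate(grid):   # every element with its (row, column) index
    print(index, value)             # (0, 0) 0, then (0, 1) 1, ..., (1, 2) 5
```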

### 12) Matrix functions for matrix shape manipulation

### 12.1) Difference between resize and reshape


In [40]:
a=np.floor(10*np.random.random((3,4))) # Create a random 3x4 matrix of whole numbers 0-9 (stored as floats).
a

array([[6., 3., 6., 9.],
       [5., 4., 9., 0.],
       [5., 4., 8., 7.]])

In [41]:
a.reshape(2,6) 

array([[6., 3., 6., 9., 5., 4.],
       [9., 0., 5., 4., 8., 7.]])

In [42]:
a # Note that the array a itself is not changed

array([[6., 3., 6., 9.],
       [5., 4., 9., 0.],
       [5., 4., 8., 7.]])

In [43]:
a.resize(2,6)

In [44]:
a # The reshape function returns its argument with a modified shape, 
  #whereas the ndarray.resize method modifies the array itself:

array([[6., 3., 6., 9., 5., 4.],
       [9., 0., 5., 4., 8., 7.]])
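NumPy also has a function form, np.resize, which is not the same as the ndarray.resize method. It returns a new array, and if the requested shape holds more elements than the original, it fills the extra slots by repeating the data. reshape, by contrast, refuses any shape with a different number of elements. This is a minimal sketch. The names x and r are illustrative.

```python
import numpy as np

x = np.arange(4)                 # [0 1 2 3]

r = np.resize(x, (2, 3))         # new array; x is repeated to fill the 6 slots
print(r)
# [[0 1 2]
#  [3 0 1]]
print(x)                         # [0 1 2 3]: x itself is unchanged

try:
    x.reshape(2, 3)              # reshape never changes the number of elements
except ValueError as err:
    print("ValueError:", err)
```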

### 12.2) Transpose of a matrix

In [45]:
a.T

array([[6., 9.],
       [3., 0.],
       [6., 5.],
       [9., 4.],
       [5., 8.],
       [4., 7.]])

### 13) Stacking together different arrays

In [47]:
# Create 2 matrices

a=np.floor(10*np.random.random((2,2)))
print(a)

b=np.floor(10*np.random.random((2,2)))
print(b)

[[9. 3.]
 [6. 7.]]
[[6. 9.]
 [6. 9.]]


In [48]:
np.vstack((a,b)) # Stacking one array above another.

array([[9., 3.],
       [6., 7.],
       [6., 9.],
       [6., 9.]])

In [49]:
np.hstack((a,b)) # Stacking one array beside another

array([[9., 3., 6., 9.],
       [6., 7., 6., 9.]])
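For 2-D arrays, vstack and hstack are shorthands for joining along axis 0 and axis 1. This sketch compares them with np.concatenate. It uses the fixed values from the output above instead of random ones.

```python
import numpy as np

a = np.array([[9., 3.], [6., 7.]])
b = np.array([[6., 9.], [6., 9.]])

# For 2-D arrays, vstack joins along axis 0 and hstack along axis 1
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(same_v, same_h)   # True True
```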

### 14) Splitting one array into several smaller ones

In [None]:
a=np.array([[11,12,13,14,15,16],[17,19,23,24,25,34]])
print(a)

np.hsplit(a,3) # Split "a" column-wise into 3 arrays of equal width

In [None]:
np.hsplit(a,(3,4)) # Split "a" after the third and the fourth column

In [None]:
np.vsplit(a,2) # Split "a" row-wise (along the first axis) into 2 arrays
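The splitting cells above show no results. This self-contained sketch prints the shape of each piece instead, which makes the split positions easy to see.

```python
import numpy as np

a = np.array([[11, 12, 13, 14, 15, 16], [17, 19, 23, 24, 25, 34]])   # shape (2, 6)

equal = [p.shape for p in np.hsplit(a, 3)]       # 3 equal column blocks
cuts = [p.shape for p in np.hsplit(a, (3, 4))]   # cut before column indices 3 and 4
halves = [p.shape for p in np.vsplit(a, 2)]      # one piece per row

print(equal)    # [(2, 2), (2, 2), (2, 2)]
print(cuts)     # [(2, 3), (2, 1), (2, 2)]
print(halves)   # [(1, 6), (1, 6)]
```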

### 15) Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [54]:
a=np.array([[11,22,33],[44,55,66],[12,14,34]])
a

b=np.array([1,2,3])
b

array([1, 2, 3])

In [55]:
c=a+b
c

array([[12, 24, 36],
       [45, 57, 69],
       [13, 16, 37]])
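Broadcasting compares shapes starting from the trailing dimension. Two dimensions are compatible when they are equal or when one of them is 1. This minimal sketch shows a row vector, a column vector, and a shape that cannot be broadcast. The names row, col, by_row and by_col are illustrative.

```python
import numpy as np

a = np.array([[11, 22, 33], [44, 55, 66], [12, 14, 34]])   # shape (3, 3)

row = np.array([1, 2, 3])      # shape (3,): added to every row
col = row.reshape(3, 1)        # shape (3, 1): added to every column

by_row = a + row               # first row becomes [12 24 36]
by_col = a + col               # first row becomes [12 23 34]
print(by_row)
print(by_col)

try:
    a + np.array([1, 2])       # shape (2,) cannot be stretched to 3 columns
except ValueError as err:
    print("ValueError:", err)
```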