# NumPy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

**NumPy stands for Numerical Python.**

Unlike Python lists, NumPy arrays are stored in one contiguous block of memory, so processes can access and manipulate them very efficiently.  
Keeping data that is used together next to each other in memory exploits what computer science calls **locality of reference**.  
This is the main reason why NumPy is faster than lists.  
NumPy is also optimized to make use of modern CPU architectures.  

```python
import numpy as np
```

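The contiguous layout described above can be inspected through an array's metadata. This minimal sketch uses the standard `flags`, `itemsize`, `strides` and `nbytes` attributes; the byte counts assume a 64-bit integer dtype, which is set explicitly so they do not depend on the platform default.

```python
import numpy as np

# Explicit 64-bit integers so the byte counts below are platform independent
arr = np.arange(5, dtype=np.int64)

print(arr.flags['C_CONTIGUOUS'])  # True: all elements sit in one block
print(arr.itemsize)               # 8 bytes per element
print(arr.strides)                # (8,): jump 8 bytes to reach the next element
print(arr.nbytes)                 # 40 = 5 elements * 8 bytes

# A stepped slice is a view into the same buffer, so it is no longer contiguous
view = arr[::2]
print(view.flags['C_CONTIGUOUS'])  # False
print(view.strides)                # (16,)
```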
### Defining an array

- An array whose elements are 0-D arrays (plain numbers) is called a one-dimensional or 1-D array.
- An array whose elements are 1-D arrays is called a 2-D array.

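The `ndim` and `shape` attributes make these definitions concrete. In this sketch the variable names `scalar`, `vector` and `matrix` are made up for illustration.

```python
import numpy as np

scalar = np.array(7)                  # 0-D: a single number
vector = np.array([1, 2, 3])          # 1-D: its elements are 0-D values
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # 2-D: its elements are 1-D arrays

for name, a in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim counts the dimensions; shape gives the length along each one
    print(name, a.ndim, a.shape)
```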

```python
arr1D = np.array([1, 2, 3, 4, 5])
print(arr1D)
```

```
[1 2 3 4 5]
```


```python
arr2D = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2D)
```

```
[[1 2 3]
 [4 5 6]]
```


### Accessing elements

```python
print(arr1D[3])
```

```
4
```


```python
# Row 1, then element 2 of that row
print(arr2D[1][2])
```

```
6
```

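The same square-bracket syntax also supports negative indices and slices. This sketch redefines `arr1D` and `arr2D` with the values used above so that it runs on its own.

```python
import numpy as np

arr1D = np.array([1, 2, 3, 4, 5])
arr2D = np.array([[1, 2, 3], [4, 5, 6]])

print(arr1D[-1])     # 5: negative indices count from the end
print(arr1D[1:4])    # [2 3 4]: from index 1 up to, but not including, 4
print(arr2D[1, 2])   # 6: row 1, column 2 in a single indexing step
print(arr2D[:, 1])   # [2 5]: column 1 of every row
```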

### Inserting in an array

NumPy arrays have a fixed size, so elements cannot be appended to or popped from an array in place.  
Functions such as `np.insert` and `np.delete` instead return a new array, which you have to store in a variable.

```python
# Unlike list.insert, np.insert does not modify arr1D; it returns a new array (discarded here)
np.insert(arr1D, 2, 3)
print(arr1D)
```

```
[1 2 3 4 5]
```


```python
# Insert the value 3 at index 2 and store the returned array
arr1D = np.insert(arr1D, 2, 3)
print(arr1D)
```

```
[1 2 3 3 4 5]
```

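Deleting and appending follow the same pattern: each call returns a new array and leaves the original untouched. A minimal sketch using `np.delete` and `np.append`, starting from the array produced above:

```python
import numpy as np

arr1D = np.array([1, 2, 3, 3, 4, 5])

# np.delete returns a new array without the element at index 2
smaller = np.delete(arr1D, 2)
print(smaller)   # [1 2 3 4 5]

# np.append also returns a new array
bigger = np.append(arr1D, 6)
print(bigger)    # [1 2 3 3 4 5 6]
print(arr1D)     # [1 2 3 3 4 5]: unchanged
```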

### Joining Arrays

```python
arr1 = np.array([1, 2, 3])
arr2 = np.array([11, 22, 33])

arr3 = np.concatenate([arr1, arr2])
print(f"Using concatenate {arr3}\n")

arr3 = np.hstack([arr1, arr2])
print(f"Using hstack {arr3}\n")

arr3 = np.vstack([arr2, arr1])
print(f"Using vstack\n{arr3}")
```

```
Using concatenate [ 1  2  3 11 22 33]

Using hstack [ 1  2  3 11 22 33]

Using vstack
[[11 22 33]
 [ 1  2  3]]
```

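For 2-D arrays, `np.concatenate` takes an `axis` argument that decides the direction of joining. This sketch, using two small made-up matrices `a` and `b`, shows that `axis=0` matches `np.vstack` and `axis=1` matches `np.hstack`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 stacks the rows of b below those of a
rows = np.concatenate([a, b], axis=0)
print(rows)   # [[1 2] [3 4] [5 6] [7 8]]

# axis=1 places the arrays side by side
cols = np.concatenate([a, b], axis=1)
print(cols)   # [[1 2 5 6] [3 4 7 8]]
```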

### Searching in Arrays

`np.where(condition)` returns the indices of all elements where the condition is true, as a tuple with one array of indices per dimension.  
To only check whether a value is present, the membership operator `in` can be used as well.

```python
x = np.where(arr1 == 3)
print(x)

x = (4 in arr1)
print(x)
```

```
(array([2]),)
False
```

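When a value occurs several times, every matching index is returned, and for a 2-D array the tuple holds one array of row indices and one of column indices. A short sketch with made-up data:

```python
import numpy as np

arr = np.array([1, 3, 6, 3, 7])
# Take element [0] of the tuple to get the index array for 1-D input
print(np.where(arr == 3)[0])   # [1 3]

grid = np.array([[1, 3],
                 [3, 4]])
rows, cols = np.where(grid == 3)
print(rows, cols)              # [0 1] [1 0]: positions (0, 1) and (1, 0)
```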

### Sorting

```python
arr = np.array([1, 3, 6, 3, 7, 0, 3, 5])
arr = np.sort(arr)
print(arr)
```

```
[0 1 3 3 3 5 6 7]
```

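`np.sort` returns a sorted copy in ascending order. A few related variations, sketched with small made-up arrays: reversing for descending order, `np.argsort` for the sorting indices, the `axis` argument for 2-D input, and the in-place `ndarray.sort` method.

```python
import numpy as np

arr = np.array([3, 1, 2])

print(np.sort(arr)[::-1])   # [3 2 1]: sort ascending, then reverse
print(np.argsort(arr))      # [1 2 0]: indices that would sort arr

matrix = np.array([[3, 1],
                   [2, 0]])
print(np.sort(matrix, axis=1))  # [[1 3] [0 2]]: each row sorted on its own
print(np.sort(matrix, axis=0))  # [[2 0] [3 1]]: each column sorted on its own

arr.sort()                  # the method sorts in place and returns None
print(arr)                  # [1 2 3]
```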

### Multiplication of arrays
The `*` operator multiplies two arrays element-wise. The other kind of product is matrix multiplication (`np.matmul`), illustrated below:

<img src='matmul.png' width = 400>

```python
arr1 = np.array([[2, 3], [2, 1]])
arr2 = np.array([[1, 5], [3, 7]])

print(f"Element-wise multiplication\n{arr1 * arr2}\n")
print(f"Matrix multiplication\n{np.matmul(arr1, arr2)}")
```

```
Element-wise multiplication
[[ 2 15]
 [ 6  7]]

Matrix multiplication
[[11 31]
 [ 5 17]]
```

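Each entry of a matrix product is the dot product of a row of the first matrix with a column of the second, i.e. `C[i, j] = sum over k of A[i, k] * B[k, j]`. The hypothetical helper `matmul_loops` below spells this out with plain loops for the same `arr1` and `arr2` and checks it against `np.matmul` and the equivalent `@` operator. It is for illustration only; the built-in version is far faster.

```python
import numpy as np

arr1 = np.array([[2, 3], [2, 1]])
arr2 = np.array([[1, 5], [3, 7]])

def matmul_loops(a, b):
    """Naive matrix product: C[i, j] = sum over k of a[i, k] * b[k, j]."""
    n, m = a.shape
    m2, p = b.shape
    if m != m2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((n, p), dtype=np.result_type(a, b))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i, j] += a[i, k] * b[k, j]
    return c

print(matmul_loops(arr1, arr2))  # [[11 31] [ 5 17]]
print(arr1 @ arr2)               # same result with the @ operator
```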

Besides arrays, the library provides many mathematical functions, for example arithmetic, trigonometric and logarithmic ones.  
Because most of them are implemented in C/C++, they run much faster than equivalent pure-Python code. More functions can be found [here](https://numpy.org/doc/1.18/reference/routines.math.html)

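Most of these functions are "universal functions" (ufuncs): they apply to every element of an array in a single call, with no explicit Python loop. A brief sketch with made-up input values:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

print(np.sqrt(x))                        # [1. 2. 3.]
print(np.log(np.array([1.0, np.e])))     # approximately [0. 1.]
print(np.sin(np.array([0.0, np.pi / 2])))  # approximately [0. 1.]
print(x + 1, x * 2)                      # arithmetic operators also act element-wise
```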
# Pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

### Data Frame
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, an SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. When a DataFrame is built from a dictionary, the keys become the column names and each list of values fills one column; unless an index is given, the rows are labeled 0, 1, 2 and so on.

```python
import pandas as pd
```

```python
pd.DataFrame({
    'a': [1, 2],
    'b': [3, 4]
})
```

|   | a | b |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 4 |


### Data Frame Details
`DataFrame.columns` gives the column names, while `DataFrame.index` gives the row labels (the index values).  
**Note** that the index can be set from any column (a key of the dictionary); it works best when that column's values are unique and not null.  
If no index is specified, pandas generates a default integer index starting at 0.

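A minimal sketch of replacing the default index with a column, using `DataFrame.set_index`; the `people` data here is made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Anuj'],
    'Age': [27, 24, 32],
})

print(people.index.tolist())   # [0, 1, 2]: default index generated by pandas

# Use the 'Name' column (unique, non-null) as the index instead
by_name = people.set_index('Name')
print(by_name.index.tolist())        # ['Jai', 'Princi', 'Anuj']
print(by_name.loc['Princi', 'Age'])  # 24: look up a row by its label
print(by_name.index.is_unique)       # True
```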
```python
# The data is given as a dictionary: keys become column names, lists become column values
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
})
```

Each column of a DataFrame is a Series and can be accessed like a dictionary entry, for example `df['a']`.

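A small sketch of column access on a made-up `frame`, showing that the result is a `Series` that carries the column name:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [100, 90, 80]})

col = frame['b']              # dictionary-style access returns a Series
print(type(col).__name__)     # Series
print(col.name)               # b
print(col.tolist())           # [100, 90, 80]
print(frame.b.equals(col))    # True: attribute access also works for simple names
```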
```python
df
```

|   | a  | b   |
|---|----|-----|
| 0 | 1  | 100 |
| 1 | 2  | 90  |
| 2 | 3  | 80  |
| 3 | 4  | 70  |
| 4 | 5  | 60  |
| 5 | 6  | 50  |
| 6 | 7  | 40  |
| 7 | 8  | 30  |
| 8 | 9  | 20  |
| 9 | 10 | 10  |


```python
df.columns
```

```
Index(['a', 'b'], dtype='object')
```

```python
df.index
```

```
RangeIndex(start=0, stop=10, step=1)
```

```python
df.describe()
```

|       | a       | b         |
|-------|---------|-----------|
| count | 10.0    | 10.0      |
| mean  | 5.5     | 55.0      |
| std   | 3.02765 | 30.276504 |
| min   | 1.0     | 10.0      |
| 25%   | 3.25    | 32.5      |
| 50%   | 5.5     | 55.0      |
| 75%   | 7.75    | 77.5      |
| max   | 10.0    | 100.0     |


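```python
# .values exposes the column as a NumPy array; df['b'].to_numpy() is the newer equivalent
df['b'].values
```

```
array([100,  90,  80,  70,  60,  50,  40,  30,  20,  10])
```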

### DataFrame Operations
There are many operations we can apply to a DataFrame.  
`unique()` returns the distinct values of a column, in order of appearance.

```python
df['a'].unique()
```

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
```

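Two close relatives of `unique()` are `nunique()`, which counts the distinct values, and `value_counts()`, which counts how often each value occurs. A sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 3])

print(s.unique())                  # [1 2 3]: distinct values in order of appearance
print(s.nunique())                 # 3: number of distinct values
print(s.value_counts().to_dict())  # {3: 3, 2: 2, 1: 1}: occurrences of each value
```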
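`groupby` splits the rows into groups by the unique values of a column, then applies the aggregation function that follows (here `sum`) to each group.  
In the example below we group by column `b`, so column `a` is summed separately for each value of `b`:

- `b = 1`: 1 + 3 + 5 + 7 + 9 = 25
- `b = 0`: 2 + 4 + 6 + 8 + 10 + 100 = 130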

```python
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100],
    'b': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
})
```

```python
df.groupby('b').sum()
```

| b | a   |
|---|-----|
| 0 | 130 |
| 1 | 25  |

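`groupby` is not limited to one aggregation. This sketch, on the same data, selects column `a` and passes a list of aggregation names to `agg`, producing one column per statistic.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100],
    'b': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
})

# Sum, mean and count of column 'a' for each value of 'b'
summary = df.groupby('b')['a'].agg(['sum', 'mean', 'count'])
print(summary)
```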

A new column can be added just like adding a new key-value pair to a dictionary, or with the `assign` method.

```python
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, None, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

df = pd.DataFrame(data)

address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Use 'Address' as the column name and assign the list to it
df['Address'] = address

df
```

|   | Name   | Height | Qualification | Address   |
|---|--------|--------|---------------|-----------|
| 0 | Jai    | 5.1    | Msc           | Delhi     |
| 1 | Princi | 6.2    | MA            | Bangalore |
| 2 | Gaurav | NaN    | Msc           | Chennai   |
| 3 | Anuj   | 5.2    | Msc           | Patna     |

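The `assign` route mentioned above returns a new DataFrame instead of modifying the existing one. A minimal sketch on made-up `people` data; the centimetre conversion assumes the heights are in feet, which the original data does not state.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                       'Height': [5.1, 6.2, None, 5.2]})

# assign returns a new DataFrame with the extra column; people itself is unchanged
with_address = people.assign(Address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
print(with_address.columns.tolist())  # ['Name', 'Height', 'Address']
print('Address' in people.columns)    # False

# A column can also be computed from existing ones by passing a function of the DataFrame
# (assumption: Height is in feet, and 1 ft = 30.48 cm)
with_cm = people.assign(HeightCm=lambda d: d['Height'] * 30.48)
print(with_cm['HeightCm'].round(2).tolist())
```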

A column can be deleted with `DataFrame.drop`, for example `df.drop(columns=['Address'])`.  
`dropna()` drops every row that contains a null value. Both methods return a new DataFrame by default.

```python
df.dropna()
```

|   | Name   | Height | Qualification | Address   |
|---|--------|--------|---------------|-----------|
| 0 | Jai    | 5.1    | Msc           | Delhi     |
| 1 | Princi | 6.2    | MA            | Bangalore |
| 3 | Anuj   | 5.2    | Msc           | Patna     |

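A sketch of both operations on made-up `people` data, including the `subset` argument of `dropna`, which restricts the null check to selected columns:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav'],
                       'Height': [5.1, None, 5.7],
                       'Address': ['Delhi', 'Bangalore', None]})

# Remove a column; drop returns a new DataFrame unless inplace=True is passed
no_address = people.drop(columns=['Address'])
print(no_address.columns.tolist())   # ['Name', 'Height']

# dropna() removes rows with any null; subset= only checks the listed columns
print(people.dropna().index.tolist())                   # [0]
print(people.dropna(subset=['Height']).index.tolist())  # [0, 2]
```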

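Two DataFrames with the same columns can be stacked vertically with `pd.concat` using `axis=0` (the default). Here each DataFrame is also created with its own `index`, so the row labels of the result do not repeat.  
To join DataFrames side by side, specify `axis=1` instead.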

```python
data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

data2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
         'Age': [17, 14, 12, 52],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1, index=[0, 1, 2, 3])
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])

# axis=0 stacks the rows of df1 below those of df
df_final = pd.concat([df, df1], axis=0)
df_final
```

|   | Name   | Age | Address   | Qualification |
|---|--------|-----|-----------|---------------|
| 0 | Jai    | 27  | Nagpur    | Msc           |
| 1 | Princi | 24  | Kanpur    | MA            |
| 2 | Gaurav | 22  | Allahabad | MCA           |
| 3 | Anuj   | 32  | Kannuaj   | Phd           |
| 4 | Abhi   | 17  | Nagpur    | Btech         |
| 5 | Ayushi | 14  | Kanpur    | B.A           |
| 6 | Dhiraj | 12  | Allahabad | Bcom          |
| 7 | Hitesh | 52  | Kannuaj   | B.hons        |

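A minimal sketch of the side-by-side case, plus `ignore_index=True`, which renumbers the rows when two DataFrames with default indexes are stacked. The small `names`, `ages` and `more` frames are made up for illustration.

```python
import pandas as pd

names = pd.DataFrame({'Name': ['Jai', 'Princi']})
ages = pd.DataFrame({'Age': [27, 24]})

# axis=1 aligns rows by index label and places the columns side by side
side_by_side = pd.concat([names, ages], axis=1)
print(side_by_side.columns.tolist())  # ['Name', 'Age']

# Without ignore_index the labels would repeat (0, 1, 0, 1)
more = pd.DataFrame({'Name': ['Abhi', 'Ayushi']})
stacked = pd.concat([names, more], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3]
```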

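### Selection
Rows can be filtered with a boolean condition on a column; the example below keeps only the rows where `Age` is greater than 20.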

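```python
df_final[df_final['Age'] > 20]
```

|   | Name   | Age | Address   | Qualification |
|---|--------|-----|-----------|---------------|
| 0 | Jai    | 27  | Nagpur    | Msc           |
| 1 | Princi | 24  | Kanpur    | MA            |
| 2 | Gaurav | 22  | Allahabad | MCA           |
| 3 | Anuj   | 32  | Kannuaj   | Phd           |
| 7 | Hitesh | 52  | Kannuaj   | B.hons        |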

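Conditions can be combined with `&` (and) and `|` (or), and `.loc` can apply the same boolean mask while also picking columns by label. A sketch on made-up `people` data:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jai', 'Abhi', 'Hitesh'],
                       'Age': [27, 17, 52],
                       'Address': ['Nagpur', 'Nagpur', 'Kannuaj']})

# Each condition needs its own parentheses because & binds tighter than >
adults_in_nagpur = people[(people['Age'] > 20) & (people['Address'] == 'Nagpur')]
print(adults_in_nagpur['Name'].tolist())  # ['Jai']

# .loc[row_mask, column_label] filters rows and selects a column in one step
print(people.loc[people['Age'] > 20, 'Name'].tolist())  # ['Jai', 'Hitesh']
```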

Assigning a single value to a column sets every element of that column to that value.

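```python
df_final['Age'] = 50
df_final
```

|   | Name   | Age | Address   | Qualification |
|---|--------|-----|-----------|---------------|
| 0 | Jai    | 50  | Nagpur    | Msc           |
| 1 | Princi | 50  | Kanpur    | MA            |
| 2 | Gaurav | 50  | Allahabad | MCA           |
| 3 | Anuj   | 50  | Kannuaj   | Phd           |
| 4 | Abhi   | 50  | Nagpur    | Btech         |
| 5 | Ayushi | 50  | Kanpur    | B.A           |
| 6 | Dhiraj | 50  | Allahabad | Bcom          |
| 7 | Hitesh | 50  | Kannuaj   | B.hons        |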

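Assigning a scalar is only one kind of column update. This sketch, on made-up `people` data, applies arithmetic to a whole column and then changes only the rows selected by a boolean mask with `.loc`.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jai', 'Princi', 'Abhi'], 'Age': [27, 24, 17]})

# Arithmetic on a column is applied to every element at once
people['Age'] = people['Age'] + 1
print(people['Age'].tolist())   # [28, 25, 18]

# To change only some elements, select the rows with a mask and the column by label
people.loc[people['Age'] < 20, 'Age'] = 20
print(people['Age'].tolist())   # [28, 25, 20]
```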

### Dummy Variables
A categorical variable (one with a fixed number of unique values) can be converted into one indicator column per category using `pd.get_dummies`.  
This lets us apply mathematical functions to it, since arithmetic operators can only be applied to numbers.


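```python
# dtype=int gives 0/1 columns; pandas 2.0 and later return True/False by default
pd.get_dummies(df['Address'], dtype=int)
```

|   | Allahabad | Kannuaj | Kanpur | Nagpur |
|---|-----------|---------|--------|--------|
| 0 | 0         | 0       | 0      | 1      |
| 1 | 0         | 0       | 1      | 0      |
| 2 | 1         | 0       | 0      | 0      |
| 3 | 0         | 1       | 0      | 0      |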

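`pd.get_dummies` also works on a whole DataFrame: the `columns` argument selects which columns to encode and keeps the rest unchanged, with the new columns prefixed by the original name. This sketch uses made-up `people` data and also shows `drop_first=True`, which drops one redundant indicator per variable.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav'],
                       'Qualification': ['Msc', 'MA', 'Msc']})

# Encode only 'Qualification'; 'Name' is kept as it is
encoded = pd.get_dummies(people, columns=['Qualification'], dtype=int)
print(encoded.columns.tolist())  # ['Name', 'Qualification_MA', 'Qualification_Msc']

# Because the indicators are numbers, ordinary arithmetic works on them
print(encoded['Qualification_Msc'].sum())  # 2: number of Msc holders

# drop_first=True removes the first category's column
reduced = pd.get_dummies(people['Qualification'], drop_first=True, dtype=int)
print(reduced.columns.tolist())  # ['Msc']
```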