
# **Pandas Library**

### **Introduction**

##### **What is Pandas?**

- Pandas is a Python library for fast, easy data cleaning, preparation, and analysis. It integrates well with data visualization and machine learning tools.

- Pandas is built on top of NumPy and wraps NumPy arrays in two labeled data structures: the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

- A DataFrame object:
  - acts like a spreadsheet of rows and columns in Excel
  - is made up of Series objects, one per column
  - is indexable by row and column labels

- A Pandas DataFrame can be created directly from a NumPy array, and a DataFrame can be converted back to a NumPy array (for example with `DataFrame.to_numpy()`).
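Here is a minimal sketch of the points above. The array shape and the column names `a`, `b`, `c` are arbitrary choices for illustration. The code builds a DataFrame from a NumPy array, checks that a single column is a Series, and converts the DataFrame back to an array:

```python
import numpy as np
import pandas as pd

# A 2x3 NumPy array holding the integers 0..5
arr = np.arange(6).reshape((2, 3))

# Wrap the array in a DataFrame with illustrative column labels
df = pd.DataFrame(arr, columns=['a', 'b', 'c'])

# Selecting a single column returns a Series
col_b = df['b']
print(type(col_b))

# Convert the DataFrame back to a plain NumPy array
back = df.to_numpy()
print(back.shape)
```

The round trip keeps the values and the shape. The row and column labels live only on the DataFrame, not on the NumPy array.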

##### **How to create a Pandas Series?**


```python
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
```

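```python
# A Series of the integers 0..7, labeled 'row 1' .. 'row 8'
mySeries = Series(np.arange(8), index=['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6', 'row 7', 'row 8'])
print(mySeries)
print(mySeries['row 7'])      # label-based access
print(mySeries.iloc[[0, 7]])  # positional access: first and last elements
```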

```
row 1    0
row 2    1
row 3    2
row 4    3
row 5    4
row 6    5
row 7    6
row 8    7
dtype: int64
6
row 1    0
row 8    7
dtype: int64
```
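A Series can also be built from a dictionary. In that case the keys become the index labels. The sketch below uses arbitrary keys `'a'`, `'b'`, `'c'`. It contrasts label-based access through `.loc` with positional access through `.iloc`. This keeps the intent explicit when the index is not the default 0..n-1 range:

```python
import pandas as pd

# Dictionary keys become the index labels, values become the data
s = pd.Series({'a': 10, 'b': 20, 'c': 30})

print(s.loc['b'])      # label-based: the value stored under label 'b'
print(s.iloc[0])       # positional: the first element
print(s.loc['a':'b'])  # label slices include both endpoints
```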


##### **How to create a DataFrame?**


```python
np.random.seed(25)
# 4x4 DataFrame of random floats in [0, 1)
myDF = DataFrame(np.random.rand(16).reshape((4, 4)),
                 index=['row 1', 'row 2', 'row 3', 'row 4'],
                 columns=['column 1', 'column 2', 'column 3', 'column 4'])
print(myDF)
# Select rows 2 and 4, then columns 3 and 2 (in that order), by label
print(myDF.loc[['row 2', 'row 4'], ['column 3', 'column 2']])
```

```
       column 1  column 2  column 3  column 4
row 1  0.870124  0.582277  0.278839  0.185911
row 2  0.411100  0.117376  0.684969  0.437611
row 3  0.556229  0.367080  0.402366  0.113041
row 4  0.447031  0.585445  0.161985  0.520719
       column 3  column 2
row 2  0.684969  0.117376
row 4  0.161985  0.585445
```


Note: older pandas code often uses the `.ix` indexer. It was deprecated and then removed in pandas 1.0. Use `.loc` for label-based indexing or `.iloc` for positional indexing. See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
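For comparison, the same selection can be written with positional indices through `.iloc`. This sketch rebuilds the random DataFrame with the same seed. In it, 'row 2' and 'row 4' sit at positions 1 and 3, and 'column 3' and 'column 2' sit at positions 2 and 1:

```python
import numpy as np
import pandas as pd

np.random.seed(25)
myDF = pd.DataFrame(np.random.rand(16).reshape((4, 4)),
                    index=['row 1', 'row 2', 'row 3', 'row 4'],
                    columns=['column 1', 'column 2', 'column 3', 'column 4'])

# Label-based selection
by_label = myDF.loc[['row 2', 'row 4'], ['column 3', 'column 2']]
# Equivalent positional selection (0-based row and column positions)
by_position = myDF.iloc[[1, 3], [2, 1]]

print(by_label.equals(by_position))
```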
  


### **Inspecting DataFrames**

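```python
myDF = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['row 1', 'row 2', 'row 3', 'row 4'],
                 columns=['column 1', 'column 2', 'column 3', 'column 4'])

myDF.head()          # first 5 rows; a notebook only displays a cell's last expression
myDF.tail()          # last 5 rows; same display caveat, so wrap in print() to see it
print(myDF.shape)    # (number of rows, number of columns)
print(myDF.size)     # total number of cells: rows * columns
print(len(myDF))     # number of rows
print(myDF.columns)  # column labels
```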

```
(4, 4)
16
4
Index(['column 1', 'column 2', 'column 3', 'column 4'], dtype='object')
```
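`head()` and `tail()` also accept a row count `n`, which defaults to 5. This minimal sketch uses the same 4x4 DataFrame. It prints the first two rows, the last row, the row index, and the per-column data types:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['row 1', 'row 2', 'row 3', 'row 4'],
                    columns=['column 1', 'column 2', 'column 3', 'column 4'])

# head(n) / tail(n) return the first / last n rows as a new DataFrame
print(myDF.head(2))
print(myDF.tail(1))

# Row labels and the data type of each column
print(myDF.index)
print(myDF.dtypes)
```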


### **DataFrame Manipulation**

##### **How to extract data from a DataFrame?**

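```python
myDF = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['row 1', 'row 2', 'row 3', 'row 4'],
                 columns=['column 1', 'column 2', 'column 3', 'column 4'])
myDF
myDF['column 3']                         # one column, returned as a Series
myDF['column 3'].iloc[:2]                # first two values of that column
myDF['column 3'].iloc[2]                 # single value by (0-based) position

myDF[['column 3', 'column 4']]           # several columns, returned as a DataFrame
myDF[['column 3', 'column 4']].iloc[:2]  # first two rows of those columns
```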

|  | column 3 | column 4 |
|---|---|---|
| row 1 | 2 | 3 |
| row 2 | 6 | 7 |
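The same two-by-two block can also be selected by label with `.loc`. Unlike positional slices, label slices include both endpoints, as this minimal sketch shows:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['row 1', 'row 2', 'row 3', 'row 4'],
                    columns=['column 1', 'column 2', 'column 3', 'column 4'])

# Label slices: 'row 1' through 'row 2' and 'column 3' through 'column 4', inclusive
block = myDF.loc['row 1':'row 2', 'column 3':'column 4']
print(block)

# A single cell addressed by row label and column label
print(myDF.loc['row 3', 'column 3'])
```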


##### **How to sort a DataFrame by a specific column?**

```python
df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})
df
# Sort rows by 'col1' in ascending order; missing values (NaN) are placed last by default
df.sort_values(by=['col1'])
```

|  | col1 | col2 | col3 |
|---|---|---|---|
| 0 | A | 2 | 0 |
| 1 | A | 1 | 1 |
| 2 | B | 9 | 9 |
| 5 | C | 4 | 3 |
| 4 | D | 7 | 2 |
| 3 | NaN | 8 | 4 |
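`sort_values` also accepts several columns, one sort direction per column, and a placement for missing values. This sketch uses the same data. It sorts by `col1` ascending, breaks ties with `col2` descending, and moves the row with a missing `col1` to the top. The result is a new DataFrame, and `df` itself is left unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

# col1 ascending, ties broken by col2 descending, NaN rows first instead of last
result = df.sort_values(by=['col1', 'col2'],
                        ascending=[True, False],
                        na_position='first')
print(result)
print(result.index.tolist())
```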


##### **How to compare DF values to scalars?**

```python
myDF = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['row 1', 'row 2', 'row 3', 'row 4'],
                 columns=['column 1', 'column 2', 'column 3', 'column 4'])

myDF
# Element-wise comparison: returns a boolean DataFrame of the same shape
myDF < 3
```

|  | column 1 | column 2 | column 3 | column 4 |
|---|---|---|---|---|
| row 1 | True | True | True | False |
| row 2 | False | False | False | False |
| row 3 | False | False | False | False |
| row 4 | False | False | False | False |
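Comparisons can be combined element-wise with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `<` and `>`. Here is a minimal sketch on the same 4x4 DataFrame:

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['row 1', 'row 2', 'row 3', 'row 4'],
                    columns=['column 1', 'column 2', 'column 3', 'column 4'])

# True where the value lies strictly between 2 and 6
mask = (myDF > 2) & (myDF < 6)
print(mask)

# Count the True cells, and check per column whether any value matches
print(mask.values.sum())
print(mask.any())
```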


##### **How to filter DF values with scalars?**

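```python
myDF = DataFrame(np.arange(16).reshape((4, 4)),
                 index=['row 1', 'row 2', 'row 3', 'row 4'],
                 columns=['column 1', 'column 2', 'column 3', 'column 4'])

# Keep values greater than 3; all other cells become NaN,
# which is why the integer columns are displayed as floats
myDF[myDF > 3]
```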

|  | column 1 | column 2 | column 3 | column 4 |
|---|---|---|---|---|
| row 1 | NaN | NaN | NaN | NaN |
| row 2 | 4.0 | 5.0 | 6.0 | 7.0 |
| row 3 | 8.0 | 9.0 | 10.0 | 11.0 |
| row 4 | 12.0 | 13.0 | 14.0 | 15.0 |
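Two related patterns are sketched below on the same data:

- `DataFrame.where` keeps values where the condition holds and substitutes a replacement elsewhere. Here the replacement is 0, an arbitrary choice.
- Indexing with a boolean Series built from one column keeps whole rows instead of masking individual cells.

```python
import numpy as np
import pandas as pd

myDF = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['row 1', 'row 2', 'row 3', 'row 4'],
                    columns=['column 1', 'column 2', 'column 3', 'column 4'])

# Replace values that fail the condition with 0 instead of NaN
replaced = myDF.where(myDF > 3, 0)
print(replaced)

# Keep only the rows whose 'column 1' value is greater than 3
rows = myDF[myDF['column 1'] > 3]
print(rows)
```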
