# Pandas
- Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

## Agenda
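#### What is a DataFrame?
- It is a two-dimensional, labeled collection of rows and columns, and it is the format in which CSV data is loaded by pandas functions such as `read_csv`.
- It usually holds more than one row and/or more than one column, although a DataFrame with a single row or a single column is still valid.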

#### What is a Series?
- It is a one-dimensional labeled array: a single row or a single column of a DataFrame.
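
As a minimal sketch of the difference between the two types, the snippet below selects the same column twice from a small frame. The data and the names `frame`, `single_column`, and `one_column_frame` are hypothetical and exist only for illustration; the point is that the distinction lies in dimensionality rather than in how many rows or columns are present.

```python
import pandas as pd

# Hypothetical example data, not part of the notebook's own dataframe
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

single_column = frame["a"]       # one column selected by name -> Series (1-D)
one_column_frame = frame[["a"]]  # list holding one name -> DataFrame (2-D)

print(type(single_column).__name__)     # Series
print(type(one_column_frame).__name__)  # DataFrame
print(single_column.ndim, one_column_frame.ndim)  # 1 2
print(one_column_frame.shape)           # (3, 1)
```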

#### Different operations in Pandas

In [1]:
# import pandas
import pandas as pd
import numpy as np

In [2]:
# create a DataFrame
df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['Column1', 'Column2', 'Column3', 'Column4'],
    dtype=int,
)

In [3]:
df.head()

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [4]:
# save this dataframe to a CSV file
df.to_csv('Test1.csv')
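
Reading the data back is the natural counterpart to `to_csv`. The sketch below is one possible round trip, with two assumptions: an in-memory `io.StringIO` buffer stands in for `Test1.csv` so that no file is left behind, and the names `buffer` and `restored` are illustrative.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# In-memory buffer used in place of 'Test1.csv' (assumption for this sketch)
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the row labels; without it they are loaded
# as an ordinary column named 'Unnamed: 0'
restored = pd.read_csv(buffer, index_col=0)

print(restored.index.tolist())
print(restored.columns.tolist())
```

Passing a file path instead of the buffer to both `to_csv` and `read_csv` should behave the same way.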

### Accessing the elements
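- Using `.loc` -> Label-based selection. A single row label returns a Series and a list of row labels returns a DataFrame; column labels can be given as a second argument.
- Using `.iloc` -> Integer-position-based selection for both rows and columns. Works like NumPy array indexing.
- By directly addressing the column name (one, or more than one in list form). A single name returns a Series; a list of names returns a DataFrame.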
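
The following sketch contrasts the two indexers on the same dataframe that this notebook builds. The variable names `by_label` and `by_position` are illustrative. One detail to watch: label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# .loc selects by label on both axes; the end label of a slice is included
by_label = df.loc["Row1":"Row2", "Column1":"Column2"]

# .iloc selects by integer position; the end position is excluded, as in NumPy
by_position = df.iloc[0:2, 0:2]

print(by_label.equals(by_position))  # True

# Scalar access to the same cell through each indexer
print(df.loc["Row3", "Column2"])     # 9
print(df.iloc[2, 1])                 # 9
```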

In [5]:
df.loc['Row1']

Column1    0
Column2    1
Column3    2
Column4    3
Name: Row1, dtype: int64

In [6]:
type(df.loc['Row1'])

pandas.core.series.Series

In [32]:
df.loc[['Row1','Row2']]

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7


In [33]:
type(df.loc[['Row1','Row2']])

pandas.core.frame.DataFrame

In [7]:
df.iloc[:,:]

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [8]:
df.iloc[:2,:]

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7


In [9]:
df.iloc[:2,:2]

      Column1  Column2
Row1        0        1
Row2        4        5


In [10]:
type(df.iloc[:2,:2])

pandas.core.frame.DataFrame

In [13]:
print(df.iloc[:2,0])
type(df.iloc[:2,0])

Row1    0
Row2    4
Name: Column1, dtype: int64


pandas.core.series.Series

In [14]:
print(df.iloc[:2,0:1])
type(df.iloc[:2,0:1])

      Column1
Row1        0
Row2        4


pandas.core.frame.DataFrame

In [29]:
df['Column3']

Row1     2
Row2     6
Row3    10
Row4    14
Row5    18
Name: Column3, dtype: int64

In [31]:
df[['Column3','Column4']]

      Column3  Column4
Row1        2        3
Row2        6        7
Row3       10       11
Row4       14       15
Row5       18       19


### Convert DataFrames into arrays

In [15]:
df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [19]:
df.iloc[:,1:].values.shape

(5, 3)

In [16]:
df.iloc[:,:].values

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [20]:
df.iloc[:,:].values.shape

(5, 4)
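
The pandas documentation recommends `DataFrame.to_numpy()` over the `.values` attribute for this conversion. A minimal sketch on the same dataframe, with `arr` as an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# Drop the first column by position, then convert to a NumPy array
arr = df.iloc[:, 1:].to_numpy()

print(type(arr).__name__)  # ndarray
print(arr.shape)           # (5, 3)
print(arr[0].tolist())     # [1, 2, 3]
```

The row and column labels are not carried over: the array holds only the values.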

### How to check for null values

In [23]:
# Note: without the call parentheses this returns the bound method itself, not the counts
df.isnull().sum

<bound method NDFrame._add_numeric_operations.<locals>.sum of       Column1  Column2  Column3  Column4
Row1    False    False    False    False
Row2    False    False    False    False
Row3    False    False    False    False
Row4    False    False    False    False
Row5    False    False    False    False>

In [24]:
df.isnull().sum()

Column1    0
Column2    0
Column3    0
Column4    0
dtype: int64
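
Every count above is zero because the dataframe has no missing values. To see nonzero counts, the sketch below works on a copy with a few gaps inserted. The gaps and the names `with_gaps` and `null_counts` are hypothetical, added purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Column1", "Column2", "Column3", "Column4"],
)

# Hypothetical missing values; the cast to float lets the columns hold NaN
with_gaps = df.astype(float)
with_gaps.loc["Row2", "Column1"] = np.nan
with_gaps.loc["Row4", "Column1"] = np.nan
with_gaps.loc["Row5", "Column3"] = np.nan

# Per-column count of missing values
null_counts = with_gaps.isnull().sum()
print(null_counts.tolist())   # [2, 0, 1, 0]

# Total number of missing values in the whole frame
print(int(null_counts.sum())) # 3
```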

In [25]:
df

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [26]:
# Count how many times each unique value occurs
df['Column1'].value_counts()

0     1
4     1
8     1
12    1
16    1
Name: Column1, dtype: int64

In [27]:
df['Column1'].unique()

array([ 0,  4,  8, 12, 16])
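
Since every value in `Column1` occurs exactly once, the counts above are all 1. The sketch below uses a hypothetical column with repeated values (the Series `colours` is invented for illustration) to show how `value_counts()`, `unique()`, and `nunique()` differ.

```python
import pandas as pd

# Hypothetical categorical column with repeats
colours = pd.Series(["red", "blue", "red", "green", "red", "blue"], name="colour")

# Occurrences of each distinct value, most frequent first
counts = colours.value_counts()
print(counts.to_dict())        # {'red': 3, 'blue': 2, 'green': 1}

# Distinct values in order of first appearance
print(list(colours.unique()))  # ['red', 'blue', 'green']

# Number of distinct values
print(colours.nunique())       # 3
```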