# Pandas DataFrame
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Data is aligned in a tabular fashion, and a DataFrame consists of three principal components: the data, the rows, and the columns.

![Alt text](image.png)

#### Creating a DataFrame
In practice, a pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, dictionaries, a list of dictionaries, and other in-memory objects.

#### DataFrame Constructor
| Type | Notes |
| --- | --- |
| 2D ndarray | A matrix of data, passing optional row and column labels |
| dict of arrays, lists, or tuples | Each sequence becomes a column in the DataFrame. All sequences must be the same length. |
| NumPy structured array | Treated as the 'dict of arrays' case |
| dict of Series | Each value becomes a column. Indexes from each Series are unioned together to form the result's row index if no explicit index is passed. |
| list of dicts or Series | Each item becomes a row in the DataFrame. The union of dict keys or Series indexes becomes the DataFrame's column labels. |
| list of lists or tuples | Treated as the '2D ndarray' case |
| Another DataFrame | The DataFrame's indexes are used unless different ones are passed |
| NumPy MaskedArray | Like the '2D ndarray' case except masked values become NA in the DataFrame result |
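
A few of the input types in the table can be sketched as follows. This is a minimal illustration, not an exhaustive tour; the variable names and sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# 2D ndarray with explicit row and column labels
from_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                          index=['r0', 'r1'],
                          columns=['a', 'b', 'c'])

# list of dicts: each dict is a row; columns are the union of the keys
from_records = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'c': 4}])

# dict of Series: the row index is the union of the Series indexes
from_series = pd.DataFrame({'x': pd.Series([1, 2], index=['p', 'q']),
                            'y': pd.Series([3, 4], index=['q', 'r'])})

print(from_array)
print(from_records)
print(from_series)
```

Keys or labels that are missing from one of the inputs show up as NaN in the result.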

```python
import pandas as pd

# Dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data)

print(df)
```

```
     name  degree  score
0  aparna     MBA     90
1  pankaj     BCA     40
2  sudhir  M.Tech     80
3   Geeku     MBA     98
```


#### Dealing with Rows and Columns in Pandas DataFrame
We can perform basic operations on the rows and columns of a DataFrame, such as selecting, deleting, adding, and renaming.

```python
# Column selection
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Select two columns
print(df[['Name', 'Qualification']])
```

```
     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
```
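
The bracket style used for column selection affects the type of the result. A small sketch, using a cut-down frame invented for the example, shows the difference between a single label and a list of labels:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]})

single = df['Name']      # one label -> Series
double = df[['Name']]    # list of labels -> DataFrame with one column

print(type(single).__name__)
print(type(double).__name__)
```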


#### Column Addition:
To add a column to a pandas DataFrame, we can create a list and assign it to a new column name on an existing DataFrame. The list must contain one value per row.

```python
import pandas as pd

# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Use 'Address' as the column name and assign the list to it
df['Address'] = address

# Observe the result
print(df)
```

```
     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
```
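
Direct assignment always appends the new column at the end and changes the DataFrame in place. Two alternatives, sketched here on a smaller frame made up for the example, are assign(), which returns a new DataFrame, and insert(), which places the column at a chosen position:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# assign() returns a new DataFrame and leaves df unchanged
with_city = df.assign(Address=['Delhi', 'Bangalore'])

# insert() modifies df in place and puts the column at position 1
df.insert(1, 'Qualification', ['Msc', 'MA'])

print(with_city.columns.tolist())
print(df.columns.tolist())
```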


#### Column Deletion:
To delete a column from a pandas DataFrame, we can use the drop() method and pass it the names of the columns to remove.

```python
import pandas as pd

# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Drop the listed columns (axis=1 means columns) and modify df in place
df.drop(['Qualification'], axis=1, inplace=True)

# Display the result
print(df)
```

```
     Name  Height
0     Jai     5.1
1  Princi     6.2
2  Gaurav     5.1
3    Anuj     5.2
```
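
Renaming is listed among the basic column operations but is not demonstrated. A minimal sketch, assuming a mapping passed to rename() and new names chosen arbitrarily for the example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# rename() returns a new DataFrame; the original column labels are untouched
renamed = df.rename(columns={'Name': 'name', 'Height': 'height'})

print(renamed.columns.tolist())
print(df.columns.tolist())
```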


### Dealing with Rows:
We can perform basic operations on rows, such as selecting, deleting, adding, and renaming.

#### Row Selection:
pandas provides two indexers for retrieving rows from a DataFrame. DataFrame.loc[] selects rows by label, and DataFrame.iloc[] selects rows by integer position.

#### Index Objects in pandas
| Class | Description |
| --- | --- |
| Index | The most general index object, representing axis labels in a NumPy array of Python Objects. |
| Int64Index | Specialized Index for integer values (removed in pandas 2.0, where a plain Index with an integer dtype is used instead) |
| MultiIndex | 'Hierarchical' index object representing multiple levels of indexing on a single axis. Can be thought of as similar to an array of tuples |
| DatetimeIndex | Stores nanosecond timestamps (NumPy's datetime64 dtype). |
| PeriodIndex | Specialized Index for Period data (timespans) |

#### Index methods and properties
| Method | Description |
| --- | --- |
| append | Concatenate with additional Index object, producing a new Index |
| difference | Compute set difference as an Index |
| intersection | Compute set intersection |
| union | Compute set union |
| isin | Compute boolean array indicating whether each value is contained in the passed collection |
| delete | Compute new Index with element at index i deleted |
| drop | Compute new Index by deleting passed values |
| insert | Compute new Index by inserting element at index i |
| is_monotonic_increasing | Return True if each element is greater than or equal to the previous element. |
| is_unique | Return True if the Index has no duplicate values |
| unique | Compute the array of unique values in the Index |
| reindex | Reindex series by expanding or truncating the index |

The following DataFrame methods and indexers are also useful for selecting data:

| Method | Description |
| --- | --- |
| head() | Return top n rows of a data frame. |
| tail() | Return bottom n rows of a data frame. |
| at[] | Access a single value for a row/column label pair. |
| iat[] | Access a single value for a row/column pair by integer position. |
| iloc[] | Purely integer-location based indexing for selection by position. |
| lookup() | Label-based “fancy indexing” function for DataFrame (deprecated in pandas 1.2 and removed in 2.0). |
| pop() | Return item and drop from frame. |
| xs() | Returns a cross-section (row(s) or column(s)) from the DataFrame. |
| get() | Get item from object for given key (for example, a DataFrame column). |
| isin() | Return boolean DataFrame showing whether each element in the DataFrame is contained in values. |
| where() | Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other. |
| mask() | Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other. |
| query() | Query the columns of a frame with a boolean expression. |
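
Several of the Index methods listed above can be tried on two small Index objects. The labels below are invented for illustration; each call returns a new object and leaves the original Index unchanged.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

print(left.union(right))         # labels present in either Index
print(left.intersection(right))  # labels present in both
print(left.difference(right))    # labels only in left
print(left.isin(['a', 'c']))     # boolean array, one entry per label
print(left.insert(1, 'z'))       # new Index with 'z' at position 1
print(left.is_unique, left.is_monotonic_increasing)
```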

```python
import pandas as pd

# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Address': ['Delhi', 'Bangalore', 'Chennai', 'Patna']}

# Convert the dictionary into a DataFrame and use the names as row labels
df = pd.DataFrame(data)
df.index = data['Name']

# First row by integer position
first = df.iloc[0]
# Two rows by label, restricted to two columns, in a single loc[] call
team = df.loc[['Jai', 'Princi'], ['Height', 'Address']]

# Observe the result
print('first =')
print(first)

print('\n')
print('team =')
print(team)
```

```
first =
Name               Jai
Height             5.1
Qualification      Msc
Address          Delhi
Name: Jai, dtype: object


team =
        Height    Address
Jai        5.1      Delhi
Princi     6.2  Bangalore
```

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
                  columns=['one', 'two', 'three', 'four'])
print("df =", df, sep="\n")

# One row by label
print("df.loc['Colorado'] =", df.loc['Colorado'], sep="\n")
# Two rows and two columns by label in a single loc[] call
print("df.loc[['Ohio', 'Colorado'], ['one', 'three']] =",
      df.loc[['Ohio', 'Colorado'], ['one', 'three']], sep="\n")
# One column by label
print("df['three'] =", df['three'], sep="\n")
```

```
df =
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
df.loc['Colorado'] =
one      4
two      5
three    6
four     7
Name: Colorado, dtype: int32
df.loc[['Ohio', 'Colorado'], ['one', 'three']] =
          one  three
Ohio        0      2
Colorado    4      6
df['three'] =
Ohio         2
Colorado     6
Utah        10
New York    14
Name: three, dtype: int32
```


#### Observation 
1. To select several rows, pass a list of row labels to loc[]. The result is a DataFrame.
2. Selecting a single row returns a Series whose index holds the column labels, so it behaves much like a dictionary of column : value pairs.
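
The difference between label-based and position-based selection is easiest to see when the row labels are integers that do not start at 0. The following sketch uses an index of 10, 20, 30 chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Height': [5.1, 6.2, 5.2]}, index=[10, 20, 30])

by_label = df.loc[10]      # the row whose label is 10
by_position = df.iloc[0]   # the first row, whatever its label is
rows = df.loc[[10, 30]]    # list of labels -> DataFrame

print(by_label.equals(by_position))
print(type(by_label).__name__, type(rows).__name__)
print(by_label.to_dict())
```

Here df.iloc[10] would fail, because the frame has only three positions.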


#### Row Addition:
To add a row to a pandas DataFrame, we can concatenate the existing DataFrame with a new DataFrame that holds the row.

```python
import pandas as pd

# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Address': ['Delhi', 'Bangalore', 'Chennai', 'Patna']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Build a one-row DataFrame, then concatenate; ignore_index renumbers the rows
new_row = pd.DataFrame({'Name': 'Curry', 'Height': 6.5, 'Qualification': 'MA', 'Address': 'Seattle'},
                       index=[0])
df = pd.concat([df, new_row], ignore_index=True)

# Observe the result
print(df)
```

```
     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
4   Curry     6.5            MA    Seattle
```


#### Row Deletion:
To delete a row from a pandas DataFrame, we can use the drop() method and pass it the index labels of the rows to remove.

```python
import pandas as pd

# Define a dictionary containing student data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Address': ['Delhi', 'Bangalore', 'Chennai', 'Patna']}

# Convert the dictionary into a DataFrame and use the names as row labels
df = pd.DataFrame(data)
df.index = data['Name']

# Drop the row labeled 'Anuj' in place
df.drop(['Anuj'], inplace=True)

# Observe the result
print(df)
```

```
          Name  Height Qualification    Address
Jai        Jai     5.1           Msc      Delhi
Princi  Princi     6.2            MA  Bangalore
Gaurav  Gaurav     5.1           Msc    Chennai
```
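
Since drop() works on labels, removing a row by position means looking up its label first. The sketch below shows that, together with row renaming, which is mentioned among the row operations but not demonstrated. The frame and the replacement label are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Height': [5.1, 6.2, 5.2]},
                  index=['Jai', 'Princi', 'Anuj'])

# Look up the label at position 0, then drop it; df itself is unchanged
without_first = df.drop(df.index[0])

# rename() with index= relabels rows and returns a new DataFrame
relabeled = df.rename(index={'Jai': 'JAI'})

print(without_first.index.tolist())
print(relabeled.index.tolist())
```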


#### Boolean Indexing in Pandas
In boolean indexing, we select subsets of data based on the actual values in the DataFrame rather than on row/column labels or integer locations. A boolean vector is used to filter the data.

We can filter data in four ways:
- Accessing a DataFrame with a boolean index
- Applying a boolean mask to a dataframe
- Masking data based on column value
- Masking data based on an index value

```python
import pandas as pd

# Dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# Use boolean values as the row labels
df = pd.DataFrame(data, index=[True, False, True, False])

print(df)
```

```
         name  degree  score
True   aparna     MBA     90
False  pankaj     BCA     40
True   sudhir  M.Tech     80
False   Geeku     MBA     98
```
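
The four ways of filtering listed earlier can be sketched in one place. This is an illustrative outline under assumed thresholds and positions (a score above 85, rows 0 and 3), not the only way to write each filter.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'score': [90, 40, 80, 98]}

# 1. Access rows through a boolean index with loc[]
flagged = pd.DataFrame(data, index=[True, False, True, False])
true_rows = flagged.loc[True]

# 2. Apply a boolean mask (one flag per row) to a default-indexed frame
df = pd.DataFrame(data)
masked = df[[True, False, True, False]]

# 3. Mask based on a column value
high = df[df['score'] > 85]

# 4. Mask based on index values
picked = df[df.index.isin([0, 3])]

print(true_rows)
print(masked)
print(high)
print(picked)
```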


## Select Data 
In a DataFrame, we can filter data based on a column value by applying conditions with comparison operators such as ==, >, <, <=, and >=. Applying one of these operators to a column produces a Series of True and False values, which can then be used to select the matching rows.

```python
import pandas as pd

# Dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}

# Create a DataFrame
df = pd.DataFrame(data)

# Use a comparison operator to build a boolean Series and filter with it
result = df[df['degree'] == 'BCA']
print(result)
```

```
     name degree  score
0  aparna    BCA     90
1  pankaj    BCA     40
3   Geeku    BCA     98
```
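
Conditions can also be combined. The sketch below, with thresholds picked only for illustration, uses & (and), | (or), and ~ (not) on boolean Series, and shows the equivalent query() call for the first filter:

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["BCA", "BCA", "M.Tech", "BCA"],
                   'score': [90, 40, 80, 98]})

# Parentheses are required: & and | bind tighter than == and >
bca_high = df[(df['degree'] == 'BCA') & (df['score'] > 50)]
either = df[(df['degree'] == 'M.Tech') | (df['score'] < 50)]
not_bca = df[~(df['degree'] == 'BCA')]

# query() expresses the first filter as a string expression
same = df.query("degree == 'BCA' and score > 50")

print(bca_high)
print(either)
print(not_bca)
```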
