# DataFrame


## Creating a Pandas DataFrame


In [1]:
import pandas as pd
import numpy as np
# https://www.geeksforgeeks.org/python-pandas-dataframe/

### Create Dataframe From List

In [2]:
lst = ['one', 'Two', np.nan, 'Four', 5, 'six']

df = pd.DataFrame(lst)
df

      0
0   one
1   Two
2   NaN
3  Four
4     5
5   six


### Create Dataframe From Dict

In [3]:
# dictionary of lists 
data = {
        'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']
       }
  
df = pd.DataFrame(data, index=data['Name']) # Use the Name values as row labels
print(df)
df = pd.DataFrame(data) # Default integer index (0, 1, 2, ...)
df

          Name  Age    Address Qualification
Jai        Jai   27      Delhi           Msc
Princi  Princi   24     Kanpur            MA
Gaurav  Gaurav   22  Allahabad           MCA
Anuj      Anuj   32    Kannauj           Phd


     Name  Age    Address Qualification
0     Jai   27      Delhi           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannauj           Phd


## Dealing With Rows and Columns
A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns.
We can perform basic operations on rows and columns, such as selecting, deleting, adding, and renaming.
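
The four operations named above can be sketched on a small frame. This is a minimal illustration, not part of the original notebook: the `people` frame and its values are made up for the example.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
people = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav'],
    'Age': [27, 24, 22],
})

# Select: a single column comes back as a Series.
ages = people['Age']

# Add: assigning to a new column name creates the column.
people['City'] = ['Delhi', 'Kanpur', 'Allahabad']

# Rename: rename() returns a new DataFrame, so reassign the result.
people = people.rename(columns={'City': 'Address'})

# Delete: drop() also returns a new DataFrame; the original is unchanged.
without_age = people.drop(columns=['Age'])   # remove a column by name
without_first_row = people.drop(index=0)     # remove a row by index label

print(people)
print(without_age.columns.tolist())
print(without_first_row.index.tolist())
```

Both `rename` and `drop` leave the original frame untouched unless the result is assigned back, which is why `people` still has its `Age` column after the two `drop` calls.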

### Column Selection
To select a column in a pandas DataFrame, we access it by its column name. Passing a list of names selects several columns at once.

In [4]:
print(df[['Name','Age']]) # A list of names selects multiple columns (returns a DataFrame)
print('\n\n')
print(df['Age']) # A single name selects one column (returns a Series)

     Name  Age
0     Jai   27
1  Princi   24
2  Gaurav   22
3    Anuj   32



0    27
1    24
2    22
3    32
Name: Age, dtype: int64


## Dealing With Files

In [5]:
data_frame = pd.read_csv('dataset/circle_employee.csv') # Read the CSV with the default integer index
print(data_frame.head())
data_frame = pd.read_csv('dataset/circle_employee.csv', index_col='name') # Read the CSV using the 'name' column as the index
data_frame.head()


   id           name   age blood_group gender  experience  \
0   1         Sharif   NaN          B+   male         1.5   
1   2   Kanan Mahmud  28.0         NaN   Male         7.5   
2   3     Md. Shakil  27.0          B-   Male         3.5   
3   4   Imran Sheikh  25.0          B-   Male         1.8   
4   5  Farsan Rashid  27.0          O+   Male         4.2   

            designation  salary  
0  Jr Software Engineer   30000  
1  Sr Software Engineer   80000  
2     Software Engineer   45000  
3  Jr Software Engineer   30000  
4     Software Engineer   55000  


               id   age blood_group gender  experience           designation  salary
name
Sharif          1   NaN          B+   male         1.5  Jr Software Engineer   30000
Kanan Mahmud    2  28.0         NaN   Male         7.5  Sr Software Engineer   80000
Md. Shakil      3  27.0          B-   Male         3.5     Software Engineer   45000
Imran Sheikh    4  25.0          B-   Male         1.8  Jr Software Engineer   30000
Farsan Rashid   5  27.0          O+   Male         4.2     Software Engineer   55000


In [6]:
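# 'name' was made the index in In [5], so it is no longer a column and
# data_frame['name'] raises KeyError: 'name'. Read the labels from the index instead.
print(data_frame.index[:4].tolist()) # first four names

['Sharif', 'Kanan Mahmud', 'Md. Shakil', 'Imran Sheikh']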
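
A column passed as `index_col` becomes the row index and no longer appears in `columns`. The sketch below shows that behaviour on a small in-memory CSV. The rows in `csv_text` are hypothetical stand-ins, not the contents of `circle_employee.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only so the example is self-contained.
csv_text = """id,name,age
1,Asha,30
2,Bilal,41
"""

# Default: pandas builds an integer index and keeps 'name' as a column.
default_index = pd.read_csv(io.StringIO(csv_text))

# index_col='name': the column becomes the index and leaves the columns.
named_index = pd.read_csv(io.StringIO(csv_text), index_col='name')

print('name' in default_index.columns)
print('name' in named_index.columns)
print(named_index.index.tolist())

# reset_index() turns the index back into an ordinary column.
restored = named_index.reset_index()
print(restored.columns.tolist())
```

If the values are needed both as labels and as a regular column, `reset_index()` is one way to get the column back.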

### Row Selection
pandas provides two indexers for retrieving rows from a DataFrame: `DataFrame.loc[]` selects rows by label, and `DataFrame.iloc[]` selects them by integer position.

**.loc[ ] :** This indexer selects data by row and column labels. Unlike the plain indexing operator (`df[...]`), **DataFrame.loc** can select a subset of rows, a subset of columns, or both at once.

**.iloc[ ] :** This indexer retrieves rows and columns by integer position, so we specify the positions of the rows we want and, optionally, the positions of the columns. It works much like **DataFrame.loc** but accepts only integer locations.

In [None]:
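try:
    # 'name' is the index (see In [5]), so rows are looked up by employee name
    first = data_frame.loc['Sharif']
    print(first)
except KeyError as e:
    # Raised when the label is not present in the index
    print(e)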
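
To compare the two indexers side by side, here is a minimal sketch on a made-up frame. The `staff` frame, its labels, and its values are assumptions for illustration only.

```python
import pandas as pd

# Illustrative frame with string labels as the index.
staff = pd.DataFrame(
    {'age': [31, 28, 27], 'salary': [30000, 80000, 45000]},
    index=['Asha', 'Bilal', 'Chen'],
)

# One row: by label with .loc, by position with .iloc.
by_label = staff.loc['Bilal']
by_position = staff.iloc[1]

# Rows and columns together.
# .loc slices by label and includes the end label.
label_slice = staff.loc['Asha':'Bilal', ['salary']]
# .iloc slices by position and excludes the end position.
position_slice = staff.iloc[0:2, [1]]

print(by_label)
print(label_slice)
```

The two slices return the same rows here. The difference worth remembering is that a label slice with `.loc` includes its end point, while a positional slice with `.iloc` does not.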

## Working With Missing Values
Missing data occurs when no information is provided for one or more items, or for a whole unit. It is a common problem in real-world datasets. In pandas, missing data is also referred to as NA (Not Available) values.
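
As a minimal sketch of how NA values are usually inspected and handled, the example below uses a made-up `records` frame. The fill values chosen (the column mean and the string `'unknown'`) are illustrative assumptions, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing value in each column.
records = pd.DataFrame({
    'age': [np.nan, 28.0, 27.0],
    'blood_group': ['B+', np.nan, 'B-'],
})

# Detect: isna() marks each missing cell; sum() counts them per column.
missing_per_column = records.isna().sum()

# Drop: keep only the rows that have no missing values.
complete_rows = records.dropna()

# Fill: replace missing values column by column.
filled = records.fillna({
    'age': records['age'].mean(),   # mean() skips NaN by default
    'blood_group': 'unknown',
})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Whether to drop or fill depends on the data: dropping loses rows, while filling introduces values that were never observed.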