# Pandas

Pandas is an open-source data analysis library for Python.

It builds on the speed of NumPy to make data manipulation and analysis easy for data scientists.

Its key data structure is the DataFrame.

Pandas provides two main data structures:
1) Series
2) DataFrame

In [307]:
# you can install pandas using pip
# pip install pandas

In [308]:
# verify installation
import pandas as pd
print(pd.__version__)

2.1.4


### Pandas Data Structures
1) **Series ----> 1D array**
   1) Definition: A pandas Series is a one-dimensional, labeled, array-like object. It is similar to a single column of a DataFrame.
   2) Homogeneity: A Series has a single data type (dtype). If the values are of mixed types, they are stored with the generic object dtype.
2) **DataFrame ----> 2D array**
   1) Definition: A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
   2) Rows and Columns: Consists of rows and columns, each column being a Series. Columns can be of different data types.
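
The dtype behaviour described above can be checked with a small sketch. The variable names (ints, mixed, frame) and the sample values are made up for illustration.

```python
import pandas as pd

# A Series has exactly one dtype for all of its elements.
ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, "two", 3.0])  # mixed values fall back to generic Python objects

print(ints.dtype)   # an integer dtype
print(mixed.dtype)  # object

# Each column of a DataFrame is itself a Series, and columns may differ in dtype.
frame = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})
print(type(frame["score"]))
print(frame.dtypes)
```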

## Pandas Series

#### Creating a Series object from Python data structures
**List, Tuple, Dictionary, NumPy array**

In [309]:
# from List
import pandas as pd
print(pd.Series([4,6,8,10,2])) #List

0     4
1     6
2     8
3    10
4     2
dtype: int64


In [310]:
# String
obj= pd.Series(["Python","SQL","Machine Learning"])
obj

0              Python
1                 SQL
2    Machine Learning
dtype: object

In [311]:
print(pd.Series((4,6,8,10,2))) #tuple

0     4
1     6
2     8
3    10
4     2
dtype: int64


In [312]:
print(pd.Series({"name":"Darshan","age":30,"city":"indore"})) # dict

name    Darshan
age          30
city     indore
dtype: object


In [313]:
# creating series from a numpy array
import numpy as np
arr = np.array([3,4,5,6,7])
print(pd.Series(arr))

0    3
1    4
2    5
3    6
4    7
dtype: int32


##### Changing Index labels

In [314]:
obj = pd.Series([4,6,8,10,2],index=[2,3,4,5,1]) 
print(obj)

2     4
3     6
4     8
5    10
1     2
dtype: int64


In [315]:
# sort by index labels (ascending=True for increasing order, False for decreasing)
obj = pd.Series([4,6,8,10,2],index=[2,3,4,5,1])
print("sorted\n",obj.sort_index(ascending=True))

sorted
 1     2
2     4
3     6
4     8
5    10
dtype: int64


In [316]:
# sort by values in decreasing order
obj = pd.Series([4,6,8,10,2],index=[2,3,4,5,1])
print("sorted\n",obj.sort_values(ascending=False))

sorted
 5    10
4     8
3     6
2     4
1     2
dtype: int64


In [317]:
# we can also use custom labels (here strings) as the index
obj = pd.Series([4,6,8,10,2],index=['first','second','third','fourth','fifth']) 
print(obj)

first      4
second     6
third      8
fourth    10
fifth      2
dtype: int64


In [318]:
print("accessing using index :",obj['third'])

accessing using index : 8


In [319]:
obj.index # the index attribute returns the labels of a Series

Index(['first', 'second', 'third', 'fourth', 'fifth'], dtype='object')

In [320]:
obj.values

array([ 4,  6,  8, 10,  2], dtype=int64)

In [321]:
for labels in obj.index:
    print(f"value at index {labels}:",obj[labels])

value at index first: 4
value at index second: 6
value at index third: 8
value at index fourth: 10
value at index fifth: 2
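
As a possible alternative to looking up each label inside the loop, Series.items() yields label and value together. This is only a sketch; the three-element Series is a shortened stand-in for the one above.

```python
import pandas as pd

obj = pd.Series([4, 6, 8], index=["first", "second", "third"])

# items() yields (label, value) pairs, so no second lookup per label is needed
for label, value in obj.items():
    print(f"value at index {label}: {value}")
```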


#### Merging 2 different series (Concat)

In [322]:
obj1 = pd.Series([2,3,55,2,6,44])
obj2 = pd.Series([421,325,3426,2,1,4,42])
obj3 = pd.concat([obj1,obj2])
obj3

0       2
1       3
2      55
3       2
4       6
5      44
0     421
1     325
2    3426
3       2
4       1
5       4
6      42
dtype: int64

In [323]:
obj3.reset_index(drop=True)

0        2
1        3
2       55
3        2
4        6
5       44
6      421
7      325
8     3426
9        2
10       1
11       4
12      42
dtype: int64
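
The two steps above (concat, then reset_index) can likely be combined by passing ignore_index=True to pd.concat. A minimal sketch with shortened, made-up inputs named first and second:

```python
import pandas as pd

first = pd.Series([2, 3, 55])
second = pd.Series([421, 325])

# ignore_index=True discards the original labels and numbers the result 0..n-1
combined = pd.concat([first, second], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4]
```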

#### Indexing and Slicing in Pandas Series

##### eg.1

In [324]:
obj = pd.Series([2,3,55,2,6,44])
obj

0     2
1     3
2    55
3     2
4     6
5    44
dtype: int64

In [325]:
obj[1] # indexing

3

In [326]:
obj[[2,5]] # multiple indexing

2    55
5    44
dtype: int64

In [327]:
obj[0:2] # slicing

0    2
1    3
dtype: int64

In [328]:
obj[3]= 500 #updating values
obj

0      2
1      3
2     55
3    500
4      6
5     44
dtype: int64

##### eg.2
1) Label-based indexing with .loc[].
2) Positional indexing with .iloc[].

In [329]:
data = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
data

a    10
b    20
c    30
d    40
dtype: int64

In [330]:
print(data['a']) # Label based indexing
print(data.loc['b']) # label based indexing
print(data.iloc[2]) # position based indexing

10
20
30


In [331]:
print(data.loc['a':'c']) # Label-based slicing

a    10
b    20
c    30
dtype: int64


In [332]:
print(data.iloc[1:3]) # Positional slicing (end position excluded)

b    20
c    30
dtype: int64
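
The difference between the two indexers is easiest to see when the labels are integers that do not match the positions. The sketch below uses a made-up Series s with labels 101 to 104 for that purpose.

```python
import pandas as pd

# Integer labels that differ from the positions 0..3
s = pd.Series([10, 20, 30, 40], index=[101, 102, 103, 104])

print(s.loc[102])   # label 102   -> 20
print(s.iloc[2])    # position 2  -> 30

# .loc slices include the end label; .iloc slices exclude the end position.
print(s.loc[101:103].tolist())  # [10, 20, 30]
print(s.iloc[0:2].tolist())     # [10, 20]
```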


#### Methods in series

In [333]:
# aggregation methods
import pandas as pd
data = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(data)
print("sum :",data.sum())
print("mean :",data.mean())
print("median :",data.median())
print("max :",data.max())
print("min :",data.min())

a    10
b    20
c    30
d    40
dtype: int64
sum : 100
mean : 25.0
median : 25.0
max : 40
min : 10


In [334]:
# Element-wise Operations:
# map(func): Maps a function to each element.
print(data.map(lambda x: x*2))

a    20
b    40
c    60
d    80
dtype: int64


In [335]:
# Filtering Methods:
# where(cond): Returns a Series with elements that meet the condition, others set to NaN.
# dropna(): Removes missing values.
# fillna(value): Fills missing values with a specified value.
filtered_data = data.where(data>20)
filtered_data

a     NaN
b     NaN
c    30.0
d    40.0
dtype: float64

In [336]:
print(filtered_data.dropna())

c    30.0
d    40.0
dtype: float64


In [337]:
print(filtered_data.fillna(0))

a     0.0
b     0.0
c    30.0
d    40.0
dtype: float64
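
When the non-matching elements are not needed at all, a boolean mask is a common alternative to where() followed by dropna(). The sketch below reuses the same data values; the names kept and replaced are illustrative.

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# A boolean mask keeps only the matching elements, so no NaN is introduced
# and the integer dtype is preserved.
kept = data[data > 20]
print(kept)

# where() also accepts a replacement value through its `other` argument.
replaced = data.where(data > 20, other=0)
print(replaced.tolist())  # [0, 0, 30, 40]
```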


In [338]:
# Descriptive Statistics Methods:
# describe(): Generates descriptive statistics.
# value_counts(): Returns the counts of unique values.


In [339]:
print(data.describe())

count     4.000000
mean     25.000000
std      12.909944
min      10.000000
25%      17.500000
50%      25.000000
75%      32.500000
max      40.000000
dtype: float64


In [340]:
print(data.value_counts())

10    1
20    1
30    1
40    1
Name: count, dtype: int64


In [341]:
# Index and Label Manipulation Methods:
# rename(index): Renames the Series index.
# reset_index(): Resets the index of the Series
print(data.rename({'a': 'alpha', 'b': 'beta'}))

alpha    10
beta     20
c        30
d        40
dtype: int64


In [342]:
print(data.reset_index(drop=True))

0    10
1    20
2    30
3    40
dtype: int64


## Pandas DataFrame

#### DataFrame creation

In [343]:
import pandas as pd
df = pd.DataFrame({
    'Name':["Rohan","Aman","Krishna","Shivam"],
    'Age' :[26,26,25,45]
})

df # the dict keys become column names; each list holds that column's values

Unnamed: 0,Name,Age
0,Rohan,26
1,Aman,26
2,Krishna,25
3,Shivam,45


In [344]:
# DataFrames can also be built from NumPy arrays; first create the array
import numpy as np
np.random.seed(3)
arr = np.random.randint(1,20,(5,3))
arr

array([[11,  4,  9],
       [ 1, 11, 12],
       [10, 11,  7],
       [ 1, 13,  8],
       [15, 18,  3]])

### Column operations on a DataFrame

In [345]:
import numpy as np
np.random.seed(3)
arr = np.random.randint(1,20,(5,3))
df = pd.DataFrame(arr,columns=['var1','var','var3'])
df

Unnamed: 0,var1,var,var3
0,11,4,9
1,1,11,12
2,10,11,7
3,1,13,8
4,15,18,3


In [346]:
print("shape :",df.shape)
print("Dimension :",df.ndim)
print("size :",df.size)

shape : (5, 3)
Dimension : 2
size : 15


In [347]:
df.columns

Index(['var1', 'var', 'var3'], dtype='object')

In [348]:
df.values

array([[11,  4,  9],
       [ 1, 11, 12],
       [10, 11,  7],
       [ 1, 13,  8],
       [15, 18,  3]])

In [349]:
df.columns = ('Age',"Weight","Height") # renaming all columns at once

In [350]:
df

Unnamed: 0,Age,Weight,Height
0,11,4,9
1,1,11,12
2,10,11,7
3,1,13,8
4,15,18,3


In [351]:
df = df.rename(columns={'Age':'age'}) # renaming a single column
df

Unnamed: 0,age,Weight,Height
0,11,4,9
1,1,11,12
2,10,11,7
3,1,13,8
4,15,18,3


In [352]:
# adding single column
df['Department'] = ['IT',"Sales","Finance","Banking","Health"]
df

Unnamed: 0,age,Weight,Height,Department
0,11,4,9,IT
1,1,11,12,Sales
2,10,11,7,Finance
3,1,13,8,Banking
4,15,18,3,Health


In [353]:
# adding multiple columns
df = df.assign(city=['bhopal','indore','delhi','mumbai','banglore'],
              Name=['salman','arbaz','sohail','Neil','nitin'],
              country=['India','Australia','Japan','USA','China'])
df

Unnamed: 0,age,Weight,Height,Department,city,Name,country
0,11,4,9,IT,bhopal,salman,India
1,1,11,12,Sales,indore,arbaz,Australia
2,10,11,7,Finance,delhi,sohail,Japan
3,1,13,8,Banking,mumbai,Neil,USA
4,15,18,3,Health,banglore,nitin,China


In [354]:
# reordering columns
df = df.reindex(columns=['Name','age', 'Weight', 'Height', 'Department', 'city','country'])
df

Unnamed: 0,Name,age,Weight,Height,Department,city,country
0,salman,11,4,9,IT,bhopal,India
1,arbaz,1,11,12,Sales,indore,Australia
2,sohail,10,11,7,Finance,delhi,Japan
3,Neil,1,13,8,Banking,mumbai,USA
4,nitin,15,18,3,Health,banglore,China


In [355]:
# assigning custom index
df = df.set_index("Name")
df

Name,age,Weight,Department,city,country
salman,11,4,9,IT,bhopal,India
arbaz,1,11,12,Sales,indore,Australia
sohail,10,11,7,Finance,delhi,Japan
Neil,1,13,8,Banking,mumbai,USA
nitin,15,18,3,Health,banglore,China


In [356]:
df = df.reset_index(drop=False)
df

Unnamed: 0,Name,age,Weight,Height,Department,city,country
0,salman,11,4,9,IT,bhopal,India
1,arbaz,1,11,12,Sales,indore,Australia
2,sohail,10,11,7,Finance,delhi,Japan
3,Neil,1,13,8,Banking,mumbai,USA
4,nitin,15,18,3,Health,banglore,China


In [357]:
# dropping a column; pass a list to drop several, e.g. ['Height','age']
df = df.drop(columns=['Height'])
df

Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,11,4,IT,bhopal,India
1,arbaz,1,11,Sales,indore,Australia
2,sohail,10,11,Finance,delhi,Japan
3,Neil,1,13,Banking,mumbai,USA
4,nitin,15,18,Health,banglore,China


In [358]:
df.age # accessing column

0    11
1     1
2    10
3     1
4    15
Name: age, dtype: int32

In [359]:
df[["age","city"]] # accessing multiple columns

Unnamed: 0,age,city
0,11,bhopal
1,1,indore
2,10,delhi
3,1,mumbai
4,15,banglore


### Row operations on a DataFrame

In [360]:
df1 = df # df1 is a second name for the same DataFrame, not a copy
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,11,4,IT,bhopal,India
1,arbaz,1,11,Sales,indore,Australia
2,sohail,10,11,Finance,delhi,Japan
3,Neil,1,13,Banking,mumbai,USA
4,nitin,15,18,Health,banglore,China
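
Plain assignment such as df1 = df only binds a second name to the same object. When an independent frame is wanted, copy() is the usual tool. The following sketch uses a small made-up frame named original to show the difference.

```python
import pandas as pd

original = pd.DataFrame({"Name": ["a", "b"], "age": [1, 2]})

alias = original               # same object, no data is copied
independent = original.copy()  # separate object with its own data

alias.loc[2] = ["c", 3]        # adds a row through the alias

print(alias is original)   # True
print(len(original))       # 3: the new row is visible through both names
print(len(independent))    # 2: the copy is unaffected
```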


In [361]:
# checking top n rows 
df1.head(2)

Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,11,4,IT,bhopal,India
1,arbaz,1,11,Sales,indore,Australia


In [362]:
# checking bottom n rows 
df1.tail(2)

Unnamed: 0,Name,age,Weight,Department,city,country
3,Neil,1,13,Banking,mumbai,USA
4,nitin,15,18,Health,banglore,China


In [363]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        5 non-null      object
 1   age         5 non-null      int32 
 2   Weight      5 non-null      int32 
 3   Department  5 non-null      object
 4   city        5 non-null      object
 5   country     5 non-null      object
dtypes: int32(2), object(4)
memory usage: 332.0+ bytes


In [364]:
# adding a single row by assigning a list of values to a new index label (the float 86.6 turns the Weight column into floats)
df1.loc[5] = ["Mukesh",26,86.6,'Aviation','Noida','India']
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,11,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,nitin,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India


####  adding multiple rows 

In [365]:
new_rows = [
    ['pravesh', 20, 75, 'Engineering', 'Patna', 'England'],
    ['dinesh', 22, 80, 'Design', 'Banglore', 'India'],
    ['vijay', 24, 85, 'HR', 'Colambo', 'shrilanka']
]

# Convert new rows data to DataFrame
new_df = pd.DataFrame(new_rows, columns=df1.columns)
new_df

Unnamed: 0,Name,age,Weight,Department,city,country
0,pravesh,20,75,Engineering,Patna,England
1,dinesh,22,80,Design,Banglore,India
2,vijay,24,85,HR,Colambo,shrilanka


#### Concatenating DataFrames

In [366]:
# now adding above multiple rows to the dataframe
df1 = pd.concat([df1,new_df],ignore_index=True)
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,11,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,nitin,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,pravesh,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


#### Updating Values

In [367]:
df1["age"] # selecting a single column returns a Series

0    11
1     1
2    10
3     1
4    15
5    26
6    20
7    22
8    24
Name: age, dtype: int64

In [368]:
df1["age"][0]

11

In [369]:
df1.loc[0, "age"] = 1100 # updating a single value in one .loc step (chained indexing like df1["age"][0] = ... raises SettingWithCopyWarning)
df1


Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,1100,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,nitin,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,pravesh,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


In [370]:
df1["Name"][6] # reading a single value

'pravesh'

In [371]:
df1.loc[6, "Name"] = 'Prakash' # updating a single value in one .loc step
df1


Unnamed: 0,Name,age,Weight,Department,city,country
0,salman,1100,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,nitin,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka
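
Chained indexing of the form frame["column"][row] = value may act on a temporary copy, which is what the SettingWithCopyWarning is about. A single indexing step with .loc or .at avoids the ambiguity. A minimal sketch on a made-up two-row frame named people:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["pravesh", "dinesh"], "age": [20, 22]})

# One indexing step: row label first, column label second.
people.loc[0, "age"] = 21

# .at is the scalar-only counterpart of .loc
people.at[1, "Name"] = "Dinesh"

print(people)
```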


In [372]:
# updating multiple values
df1["Name"] = df1["Name"].replace({
    'salman':'Salman Khan',
    'nitin':'Nitin mishra',
    'sohail':'sohail khan'
})

In [373]:
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Khan,1100,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


#### Filtering

##### Boolean indexing

In [374]:
df1[df1['age']>20] # boolean indexing

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Khan,1100,4.0,IT,bhopal,India
5,Mukesh,26,86.6,Aviation,Noida,India
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


In [375]:
df1[df1['country']=='India']

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Khan,1100,4.0,IT,bhopal,India
5,Mukesh,26,86.6,Aviation,Noida,India
7,dinesh,22,80.0,Design,Banglore,India


In [376]:
# Logical conditions: AND (&), OR (|); each condition must be wrapped in parentheses
df1[(df1["age"]>25) & (df1["country"]=='India')] # And &

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Khan,1100,4.0,IT,bhopal,India
5,Mukesh,26,86.6,Aviation,Noida,India


In [377]:
df1[(df1["Department"]=="Engineering") | (df1["city"]=='Lucknow')] # OR |

Unnamed: 0,Name,age,Weight,Department,city,country
6,Prakash,20,75.0,Engineering,Patna,England
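
Besides & and |, a few other tools are commonly used for row filtering: isin() for membership tests, ~ for negation, and query() for conditions written as a string. The frame staff below is a made-up subset used only for illustration.

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Mukesh", "Prakash", "dinesh", "vijay"],
    "age": [26, 20, 22, 24],
    "country": ["India", "England", "India", "shrilanka"],
})

# isin() tests membership in a list of values
in_list = staff[staff["country"].isin(["India", "England"])]

# ~ negates a boolean mask (NOT)
not_india = staff[~(staff["country"] == "India")]

# query() expresses the same kind of condition as a string
older_indians = staff.query("age > 25 and country == 'India'")

print(in_list["Name"].tolist())
print(not_india["Name"].tolist())
print(older_indians["Name"].tolist())
```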


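##### loc indexer

To select rows by their index labels, we can use the loc indexer. When the index is the default 0, 1, 2, ..., the labels coincide with the integer positions, so loc and iloc return the same row for a single label. Slices differ: loc includes the end label, while iloc excludes the end position.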

In [378]:
df1.loc[0] # indexing

Name          Salman Khan
age                  1100
Weight                4.0
Department             IT
city               bhopal
country             India
Name: 0, dtype: object

In [379]:
# We can select specific columns by column name
df1.loc[[1,2],"Name"]

1          arbaz
2    sohail khan
Name: Name, dtype: object

In [380]:
df1.loc[0:3] # slicing

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Khan,1100,4.0,IT,bhopal,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA


In [381]:
df1.loc[0:3,["Name","age"]]

Unnamed: 0,Name,age
0,Salman Khan,1100
1,arbaz,1
2,sohail khan,10
3,Neil,1


In [382]:
df1.loc[0:3,"Name":"Weight"] # we can perform slicing on both rows and columns

Unnamed: 0,Name,age,Weight
0,Salman Khan,1100,4.0
1,arbaz,1,11.0
2,sohail khan,10,11.0
3,Neil,1,13.0
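
The row selector of loc can also be a boolean mask, which combines filtering and column selection in one step. This is a sketch on a made-up frame named staff, not the df1 used above.

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Neil", "Mukesh", "vijay"],
    "age": [1, 26, 24],
    "city": ["mumbai", "Noida", "Colambo"],
})

# Row selector is a boolean mask, column selector is a list of labels.
adults = staff.loc[staff["age"] > 20, ["Name", "city"]]
print(adults)

# The same form works on the left-hand side of an assignment.
staff.loc[staff["age"] > 20, "city"] = "unknown"
print(staff["city"].tolist())  # ['mumbai', 'unknown', 'unknown']
```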


##### iloc indexer

To select rows by their integer position, we can use the iloc indexer.

In [383]:
df1.iloc[0]

Name          Salman Khan
age                  1100
Weight                4.0
Department             IT
city               bhopal
country             India
Name: 0, dtype: object

In [384]:
# Fancy indexing
df1.iloc[[1,2]]

Unnamed: 0,Name,age,Weight,Department,city,country
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan


In [385]:
# we can also select specific rows of a column
df1.iloc[[1,2],0] # here 0 is the position of the column

1          arbaz
2    sohail khan
Name: Name, dtype: object

In [386]:
df1.iloc[[1,2],[0,1]]

Unnamed: 0,Name,age
1,arbaz,1
2,sohail khan,10


In [387]:
df1.iloc[1:3]

Unnamed: 0,Name,age,Weight,Department,city,country
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan


In [388]:
# assigning new values to the row
df1.iloc[0] = ["Salman Bhai",61,78,'Film_industry','Mumbai','India']
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


##### Filtering columns and rows with regular expressions

In [389]:
# filter() selects columns (or index labels) by name; it does not filter rows by the values in a column.
df1.filter(regex='a') # columns whose name contains the letter 'a'

Unnamed: 0,Name,age,Department
0,Salman Bhai,61,Film_industry
1,arbaz,1,Sales
2,sohail khan,10,Finance
3,Neil,1,Banking
4,Nitin mishra,15,Health
5,Mukesh,26,Aviation
6,Prakash,20,Engineering
7,dinesh,22,Design
8,vijay,24,HR


In [390]:
df1.filter(regex='^D') # columns starting with letter "D"

Unnamed: 0,Department
0,Film_industry
1,Sales
2,Finance
3,Banking
4,Health
5,Aviation
6,Engineering
7,Design
8,HR


In [391]:
df1.filter(regex='y$') # filtering the column ending with y

Unnamed: 0,city,country
0,Mumbai,India
1,indore,Australia
2,delhi,Japan
3,mumbai,USA
4,banglore,China
5,Noida,India
6,Patna,England
7,Banglore,India
8,Colambo,shrilanka


In [392]:
df1.filter(regex='.+e.+') # columns with an 'e' that has at least one character before it and one after it

Unnamed: 0,Weight,Department
0,78.0,Film_industry
1,11.0,Sales
2,11.0,Finance
3,13.0,Banking
4,18.0,Health
5,86.6,Aviation
6,75.0,Engineering
7,80.0,Design
8,85.0,HR


In [393]:
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


In [394]:
# filtering rows with regex
df1[df1['Name'].str.contains('^S',case=False, regex=True)] # starts with S or s (case=False ignores case)

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
2,sohail khan,10,11.0,Finance,delhi,Japan


In [395]:
df1[df1['Name'].str.contains('h$',case=False, regex=True)] # ends with h

Unnamed: 0,Name,age,Weight,Department,city,country
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India


In [396]:
df1[df1['Name'].str.contains('.+b.+',case=False, regex=True)] # contains b with at least one character before and after it

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
1,arbaz,1,11.0,Sales,indore,Australia
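
For simple cases, filter() also accepts items (exact names) and like (substring) instead of a regular expression, and axis=0 applies the match to row labels. A small sketch on a made-up frame named table with string row labels:

```python
import pandas as pd

table = pd.DataFrame(
    {"Name": ["arbaz", "Neil"], "age": [1, 1], "city": ["indore", "mumbai"]},
    index=["row_a", "row_b"],
)

print(table.filter(items=["Name", "city"]).columns.tolist())  # exact column names
print(table.filter(like="it").columns.tolist())               # substring match: ['city']
print(table.filter(regex="_b$", axis=0).index.tolist())       # match on row labels: ['row_b']
```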


#### Sorting

In [397]:
df1

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
1,arbaz,1,11.0,Sales,indore,Australia
2,sohail khan,10,11.0,Finance,delhi,Japan
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
5,Mukesh,26,86.6,Aviation,Noida,India
6,Prakash,20,75.0,Engineering,Patna,England
7,dinesh,22,80.0,Design,Banglore,India
8,vijay,24,85.0,HR,Colambo,shrilanka


In [398]:
df1.sort_values('age',ascending=False) # sorting numerical values

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
5,Mukesh,26,86.6,Aviation,Noida,India
8,vijay,24,85.0,HR,Colambo,shrilanka
7,dinesh,22,80.0,Design,Banglore,India
6,Prakash,20,75.0,Engineering,Patna,England
4,Nitin mishra,15,18.0,Health,banglore,China
2,sohail khan,10,11.0,Finance,delhi,Japan
1,arbaz,1,11.0,Sales,indore,Australia
3,Neil,1,13.0,Banking,mumbai,USA


In [399]:
df1.sort_values('Name', key=lambda x: x.str.lower(), ascending=True) # case-insensitive sort on a text column

Unnamed: 0,Name,age,Weight,Department,city,country
1,arbaz,1,11.0,Sales,indore,Australia
7,dinesh,22,80.0,Design,Banglore,India
5,Mukesh,26,86.6,Aviation,Noida,India
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
6,Prakash,20,75.0,Engineering,Patna,England
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
2,sohail khan,10,11.0,Finance,delhi,Japan
8,vijay,24,85.0,HR,Colambo,shrilanka


In [400]:
df1.sort_values(["Name","age"],ascending=[True,False])
# Sort by Name in ascending order, then by age in descending order among equal names (uppercase letters sort before lowercase)

Unnamed: 0,Name,age,Weight,Department,city,country
5,Mukesh,26,86.6,Aviation,Noida,India
3,Neil,1,13.0,Banking,mumbai,USA
4,Nitin mishra,15,18.0,Health,banglore,China
6,Prakash,20,75.0,Engineering,Patna,England
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India
1,arbaz,1,11.0,Sales,indore,Australia
7,dinesh,22,80.0,Design,Banglore,India
2,sohail khan,10,11.0,Finance,delhi,Japan
8,vijay,24,85.0,HR,Colambo,shrilanka


In [401]:
# In order to get largest elements in a column, we can use nlargest() function.
df1["age"].nlargest(2) # top 2

0    61
5    26
Name: age, dtype: int64

In [402]:
# In order to get the smallest elements in a column, we can use the nsmallest() function.
df1["age"].nsmallest(2) # smallest 2

1    1
3    1
Name: age, dtype: int64
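
nlargest() and nsmallest() are also available on a DataFrame, where they return whole rows ordered by a chosen column. The sketch below uses a made-up four-row frame named staff.

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["arbaz", "Mukesh", "vijay", "Neil"],
    "age": [1, 26, 24, 1],
})

# DataFrame.nlargest returns whole rows ordered by the given column
oldest = staff.nlargest(2, "age")
print(oldest)

# keep="all" retains every row tied for the last place
youngest = staff.nsmallest(1, "age", keep="all")
print(youngest)
```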

### Aggregation Functions
count(), value_counts(), mean(), median(), sum(), min(), max(), std(), var(), describe()

In [403]:
df1.count()

Name          9
age           9
Weight        9
Department    9
city          9
country       9
dtype: int64

In [404]:
df1['country'].value_counts()

country
India        3
Australia    1
Japan        1
USA          1
China        1
England      1
shrilanka    1
Name: count, dtype: int64

In [405]:
df1.max() # gives the maximum value of each column (text columns are compared alphabetically)

Name              vijay
age                  61
Weight             86.6
Department        Sales
city             mumbai
country       shrilanka
dtype: object

In [406]:
print("Mean age :",df1['age'].mean())
print("Max age :",df1["age"].max())
print("min age :",df1["age"].min())
print("var age :",df1["age"].var())
print("std age :",df1["age"].std())

Mean age : 20.0
Max age : 61
min age : 1
var age : 323.0
std age : 17.97220075561143


In [407]:
df1.describe()

Unnamed: 0,age,Weight
count,9.0,9.0
mean,20.0,50.844444
std,17.972201,35.885481
min,1.0,11.0
25%,10.0,13.0
50%,20.0,75.0
75%,24.0,80.0
max,61.0,86.6


In [408]:
df1.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
age,9.0,20.0,17.972201,1.0,10.0,20.0,24.0,61.0
Weight,9.0,50.844444,35.885481,11.0,13.0,75.0,80.0,86.6


In [409]:
df1['Salary'] = [1000,2000,4000,5000,1000,3000,9000,10000,6000] # adding one more column

### Grouping

In [410]:
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India,1000
1,arbaz,1,11.0,Sales,indore,Australia,2000
2,sohail khan,10,11.0,Finance,delhi,Japan,4000
3,Neil,1,13.0,Banking,mumbai,USA,5000
4,Nitin mishra,15,18.0,Health,banglore,China,1000
5,Mukesh,26,86.6,Aviation,Noida,India,3000
6,Prakash,20,75.0,Engineering,Patna,England,9000
7,dinesh,22,80.0,Design,Banglore,India,10000
8,vijay,24,85.0,HR,Colambo,shrilanka,6000


In [411]:
# group the data by country and sum the salaries
df1.groupby("country")['Salary'].sum()

country
Australia     2000
China         1000
England       9000
India        14000
Japan         4000
USA           5000
shrilanka     6000
Name: Salary, dtype: int64

In [412]:
df1.groupby("country")[['Salary','Weight']].sum() # countrywise salary and weight sum

country,Salary,Weight
Australia,2000,11.0
China,1000,18.0
England,9000,75.0
India,14000,244.6
Japan,4000,11.0
USA,5000,13.0
shrilanka,6000,85.0


In [413]:
df1.groupby("country")['Name'].count() # country wise employee count

country
Australia    1
China        1
England      1
India        3
Japan        1
USA          1
shrilanka    1
Name: Name, dtype: int64

In [414]:
df1.groupby(["country",'city'])['Name'].count() # country/city wise employee count

country    city    
Australia  indore      1
China      banglore    1
England    Patna       1
India      Banglore    1
           Mumbai      1
           Noida       1
Japan      delhi       1
USA        mumbai      1
shrilanka  Colambo     1
Name: Name, dtype: int64

In [415]:
df1.groupby("Department")['age'].describe()

Department,count,mean,std,min,25%,50%,75%,max
Aviation,1.0,26.0,,26.0,26.0,26.0,26.0,26.0
Banking,1.0,1.0,,1.0,1.0,1.0,1.0,1.0
Design,1.0,22.0,,22.0,22.0,22.0,22.0,22.0
Engineering,1.0,20.0,,20.0,20.0,20.0,20.0,20.0
Film_industry,1.0,61.0,,61.0,61.0,61.0,61.0,61.0
Finance,1.0,10.0,,10.0,10.0,10.0,10.0,10.0
HR,1.0,24.0,,24.0,24.0,24.0,24.0,24.0
Health,1.0,15.0,,15.0,15.0,15.0,15.0,15.0
Sales,1.0,1.0,,1.0,1.0,1.0,1.0,1.0


In [416]:
# group by country and calculate several aggregate functions
df1.groupby("country")['Salary'].agg(['max','min','sum'])

country,max,min,sum
Australia,2000,2000,2000
China,1000,1000,1000
England,9000,9000,9000
India,10000,1000,14000
Japan,4000,4000,4000
USA,5000,5000,5000
shrilanka,6000,6000,6000


In [417]:
df1.groupby("country").aggregate({"Salary":"max","age":'min'})

country,Salary,age
Australia,2000,1
China,1000,15
England,9000,20
India,10000,22
Japan,4000,10
USA,5000,1
shrilanka,6000,24
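
When the result columns should carry descriptive names, named aggregation is one option: each keyword names an output column and its value is an (input column, function) pair. The frame staff and the output names max_salary, youngest and headcount are made up for this sketch.

```python
import pandas as pd

staff = pd.DataFrame({
    "country": ["India", "India", "Japan"],
    "Salary": [1000, 3000, 4000],
    "age": [61, 26, 10],
})

# Each keyword names an output column; the value is (input column, function).
summary = staff.groupby("country").agg(
    max_salary=("Salary", "max"),
    youngest=("age", "min"),
    headcount=("Salary", "count"),
)
print(summary)
```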


### Pivot table

In [418]:
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61,78.0,Film_industry,Mumbai,India,1000
1,arbaz,1,11.0,Sales,indore,Australia,2000
2,sohail khan,10,11.0,Finance,delhi,Japan,4000
3,Neil,1,13.0,Banking,mumbai,USA,5000
4,Nitin mishra,15,18.0,Health,banglore,China,1000
5,Mukesh,26,86.6,Aviation,Noida,India,3000
6,Prakash,20,75.0,Engineering,Patna,England,9000
7,dinesh,22,80.0,Design,Banglore,India,10000
8,vijay,24,85.0,HR,Colambo,shrilanka,6000


In [419]:
df1.pivot_table(values='Weight',index="country",columns='Department',aggfunc='mean')

country,Aviation,Banking,Design,Engineering,Film_industry,Finance,HR,Health,Sales
Australia,,,,,,,,,11.0
China,,,,,,,,18.0,
England,,,,75.0,,,,,
India,86.6,,80.0,,78.0,,,,
Japan,,,,,,11.0,,,
USA,,13.0,,,,,,,
shrilanka,,,,,,,85.0,,
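
The empty cells above are NaN values for country and department combinations that have no rows. If a placeholder is preferred, pivot_table accepts a fill_value argument. A minimal sketch on a made-up three-row frame named staff:

```python
import pandas as pd

staff = pd.DataFrame({
    "country": ["India", "India", "Japan"],
    "Department": ["IT", "HR", "IT"],
    "Weight": [78.0, 80.0, 11.0],
})

# fill_value replaces the NaN written for combinations that have no rows
table = staff.pivot_table(values="Weight", index="country",
                          columns="Department", aggfunc="mean", fill_value=0)
print(table)
```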


### Data cleaning and Handling missing values

##### Handling missing values

In [420]:
import pandas as pd

data = pd.DataFrame({
    'Name': ['Salman Bhai', 'arbaz', 'sohail khan', 'Neil', 'Nitin mishra', 'Mukesh', 'Prakash', 'dinesh', 'vijay'],
    'age': [61, 1, 10, 1, 15, 26, None, 22, 24],
    'Weight': [78.0, 11.0, None, 13.0, 18.0, 86.6, 75.0, 80.0, None],
    'Department': ['Film_industry', 'Sales', 'Finance', 'Banking', 'Health', 'Aviation', 'Engineering', 'Design', 'HR'],
    'city': ['Mumbai', 'indore', 'delhi', 'mumbai', 'banglore', 'Noida', 'Patna', 'Banglore', 'Colambo'],
    'country': ['India', 'Australia', 'Japan', 'USA', 'China', 'India', 'England', 'India', 'shrilanka']
})
data

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61.0,78.0,Film_industry,Mumbai,India
1,arbaz,1.0,11.0,Sales,indore,Australia
2,sohail khan,10.0,,Finance,delhi,Japan
3,Neil,1.0,13.0,Banking,mumbai,USA
4,Nitin mishra,15.0,18.0,Health,banglore,China
5,Mukesh,26.0,86.6,Aviation,Noida,India
6,Prakash,,75.0,Engineering,Patna,England
7,dinesh,22.0,80.0,Design,Banglore,India
8,vijay,24.0,,HR,Colambo,shrilanka


In [421]:
# Identify missing values
data.isnull().sum() # isna().sum() is equivalent

Name          0
age           1
Weight        2
Department    0
city          0
country       0
dtype: int64

In [422]:
# Finding the location of null values in specific columns/rows
data[data['Weight'].isna()]

Unnamed: 0,Name,age,Weight,Department,city,country
2,sohail khan,10.0,,Finance,delhi,Japan
8,vijay,24.0,,HR,Colambo,shrilanka


In [423]:
data.dropna() # dropping the rows that contain null values

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61.0,78.0,Film_industry,Mumbai,India
1,arbaz,1.0,11.0,Sales,indore,Australia
3,Neil,1.0,13.0,Banking,mumbai,USA
4,Nitin mishra,15.0,18.0,Health,banglore,China
5,Mukesh,26.0,86.6,Aviation,Noida,India
7,dinesh,22.0,80.0,Design,Banglore,India


In [424]:
data.dropna(axis=1) # dropping the columns that contain null values

Unnamed: 0,Name,Department,city,country
0,Salman Bhai,Film_industry,Mumbai,India
1,arbaz,Sales,indore,Australia
2,sohail khan,Finance,delhi,Japan
3,Neil,Banking,mumbai,USA
4,Nitin mishra,Health,banglore,China
5,Mukesh,Aviation,Noida,India
6,Prakash,Engineering,Patna,England
7,dinesh,Design,Banglore,India
8,vijay,HR,Colambo,shrilanka


In [425]:
# filling the null values with the specified values

data.fillna(0)

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61.0,78.0,Film_industry,Mumbai,India
1,arbaz,1.0,11.0,Sales,indore,Australia
2,sohail khan,10.0,0.0,Finance,delhi,Japan
3,Neil,1.0,13.0,Banking,mumbai,USA
4,Nitin mishra,15.0,18.0,Health,banglore,China
5,Mukesh,26.0,86.6,Aviation,Noida,India
6,Prakash,0.0,75.0,Engineering,Patna,England
7,dinesh,22.0,80.0,Design,Banglore,India
8,vijay,24.0,0.0,HR,Colambo,shrilanka


In [426]:
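# note: a scalar fills the NaNs of every column, so Weight is also filled with the mean of age (20.0)
data.fillna(data["age"].mean())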

Unnamed: 0,Name,age,Weight,Department,city,country
0,Salman Bhai,61.0,78.0,Film_industry,Mumbai,India
1,arbaz,1.0,11.0,Sales,indore,Australia
2,sohail khan,10.0,20.0,Finance,delhi,Japan
3,Neil,1.0,13.0,Banking,mumbai,USA
4,Nitin mishra,15.0,18.0,Health,banglore,China
5,Mukesh,26.0,86.6,Aviation,Noida,India
6,Prakash,20.0,75.0,Engineering,Patna,England
7,dinesh,22.0,80.0,Design,Banglore,India
8,vijay,24.0,20.0,HR,Colambo,shrilanka
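
Passing a single number to fillna() applies it to every column. To give each column its own fill value, a dict keyed by column name can be passed instead. The sketch below uses a made-up two-column frame; choosing the mean for age and the median for Weight is only an example.

```python
import pandas as pd

data = pd.DataFrame({
    "age": [61, None, 10],
    "Weight": [78.0, 11.0, None],
})

# A dict maps each column to its own fill value
filled = data.fillna({"age": data["age"].mean(), "Weight": data["Weight"].median()})
print(filled)
```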


##### Data Type conversion

In [427]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        9 non-null      object 
 1   age         9 non-null      int64  
 2   Weight      9 non-null      float64
 3   Department  9 non-null      object 
 4   city        9 non-null      object 
 5   country     9 non-null      object 
 6   Salary      9 non-null      int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 636.0+ bytes


In [437]:
df1['age'] = df1['age'].astype(float)
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61.0,78.0,Film_industry,Mumbai,India,1000
1,arbaz,1.0,11.0,Sales,indore,Australia,2000
2,sohail khan,10.0,11.0,Finance,delhi,Japan,4000
3,Neil,1.0,13.0,Banking,mumbai,USA,5000
4,Nitin mishra,15.0,18.0,Health,banglore,China,1000
5,Mukesh,26.0,86.6,Aviation,Noida,India,3000
6,Prakash,20.0,75.0,Engineering,Patna,England,9000
7,dinesh,22.0,80.0,Design,Banglore,India,10000
8,vijay,24.0,85.0,HR,Colambo,shrilanka,6000


In [440]:
df1['Weight'] = df1['Weight'].astype(int)
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61.0,78,Film_industry,Mumbai,India,1000
1,arbaz,1.0,11,Sales,indore,Australia,2000
2,sohail khan,10.0,11,Finance,delhi,Japan,4000
3,Neil,1.0,13,Banking,mumbai,USA,5000
4,Nitin mishra,15.0,18,Health,banglore,China,1000
5,Mukesh,26.0,86,Aviation,Noida,India,3000
6,Prakash,20.0,75,Engineering,Patna,England,9000
7,dinesh,22.0,80,Design,Banglore,India,10000
8,vijay,24.0,85,HR,Colambo,shrilanka,6000
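
Note that `astype(int)` truncates the fractional part rather than rounding: Mukesh's weight of 86.6 became 86 above. The sketch below, on a hypothetical Series named `weights`, contrasts truncation with rounding first, and shows the nullable `"Int64"` dtype as one option when the column still contains missing values (plain `astype(int)` raises an error on NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical weights with fractional parts.
weights = pd.Series([78.0, 86.6, 11.9])

# astype(int) drops the fractional part.
truncated = weights.astype(int)

# Rounding first gives the nearest whole number.
rounded = weights.round().astype(int)

# The nullable Int64 dtype can hold missing values alongside integers.
nullable = pd.Series([1.0, np.nan]).astype("Int64")

print(truncated.tolist())
print(rounded.tolist())
print(nullable)
```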


In [442]:
df1['Weight'] = df1['Weight'].astype(str)
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61.0,78,Film_industry,Mumbai,India,1000
1,arbaz,1.0,11,Sales,indore,Australia,2000
2,sohail khan,10.0,11,Finance,delhi,Japan,4000
3,Neil,1.0,13,Banking,mumbai,USA,5000
4,Nitin mishra,15.0,18,Health,banglore,China,1000
5,Mukesh,26.0,86,Aviation,Noida,India,3000
6,Prakash,20.0,75,Engineering,Patna,England,9000
7,dinesh,22.0,80,Design,Banglore,India,10000
8,vijay,24.0,85,HR,Colambo,shrilanka,6000


In [443]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        9 non-null      object 
 1   age         9 non-null      float64
 2   Weight      9 non-null      object 
 3   Department  9 non-null      object 
 4   city        9 non-null      object 
 5   country     9 non-null      object 
 6   Salary      9 non-null      int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 636.0+ bytes


In [449]:
df1 = df1.convert_dtypes()  # convert every DataFrame column to the best possible dtype that supports pd.NA
df1

Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,Salman Bhai,61,78,Film_industry,Mumbai,India,1000
1,arbaz,1,11,Sales,indore,Australia,2000
2,sohail khan,10,11,Finance,delhi,Japan,4000
3,Neil,1,13,Banking,mumbai,USA,5000
4,Nitin mishra,15,18,Health,banglore,China,1000
5,Mukesh,26,86,Aviation,Noida,India,3000
6,Prakash,20,75,Engineering,Patna,England,9000
7,dinesh,22,80,Design,Banglore,India,10000
8,vijay,24,85,HR,Colambo,shrilanka,6000


In [450]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        9 non-null      string
 1   age         9 non-null      Int64 
 2   Weight      9 non-null      string
 3   Department  9 non-null      string
 4   city        9 non-null      string
 5   country     9 non-null      string
 6   Salary      9 non-null      Int64 
dtypes: Int64(2), string(5)
memory usage: 654.0 bytes
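
In the output above, `Weight` stays a `string` column: `convert_dtypes()` does not parse text into numbers. One way to turn numeric text back into numbers is `pd.to_numeric`. The sketch below uses a hypothetical Series named `weights_as_text`; with `errors="coerce"`, values that cannot be parsed become NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical text column, including one value that is not a number.
weights_as_text = pd.Series(["78", "11", "unknown"])

# Parse the text; unparseable entries are replaced with NaN.
weights_numeric = pd.to_numeric(weights_as_text, errors="coerce")

print(weights_numeric)
print(weights_numeric.dtype)
```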


### DataFrame Merging

#### Merge

In [10]:
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value1': [1, 2, 3, 4]
})
df1

Unnamed: 0,key,value1
0,A,1
1,B,2
2,C,3
3,D,4


In [11]:
df2 = pd.DataFrame({
    'key': ['B', 'D', 'E', 'F'],
    'value2': [5, 6, 7, 8]
})
df2

Unnamed: 0,key,value2
0,B,5
1,D,6
2,E,7
3,F,8


In [12]:
# Inner Join
inner_merged = pd.merge(df1, df2, on='key', how='inner')
inner_merged

Unnamed: 0,key,value1,value2
0,B,2,5
1,D,4,6


In [13]:
# Left Join
left_merged = pd.merge(df1, df2, on='key', how='left')
left_merged

Unnamed: 0,key,value1,value2
0,A,1,
1,B,2,5.0
2,C,3,
3,D,4,6.0


In [14]:
# Right Join
right_merged = pd.merge(df1, df2, on='key', how='right')
right_merged

Unnamed: 0,key,value1,value2
0,B,2.0,5
1,D,4.0,6
2,E,,7
3,F,,8


In [15]:
# Outer Join
outer_merged = pd.merge(df1, df2, on='key', how='outer')
outer_merged

Unnamed: 0,key,value1,value2
0,A,1.0,
1,B,2.0,5.0
2,C,3.0,
3,D,4.0,6.0
4,E,,7.0
5,F,,8.0
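
To see which side each row of a merge came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The sketch below rebuilds the two sample frames under the hypothetical names `left` and `right` so that it runs on its own.

```python
import pandas as pd

# Same data as the sample frames above, under hypothetical names.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value1": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value2": [5, 6, 7, 8]})

# Outer join with an extra '_merge' column describing each row's origin.
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged)

# Keys that exist only in the left frame.
left_only_keys = merged.loc[merged["_merge"] == "left_only", "key"].tolist()
print(left_only_keys)
```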


### Data Loading and Saving

https://www.kaggle.com/datasets/shivamb/netflix-shows

https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/crimeinenglandandwalesannualtrendanddemographictables

In [454]:
df1.to_csv('MyDataFrame.csv')  # save the current DataFrame as a CSV file on the local system

In [455]:
df1.to_excel('MyDataFrame.xlsx')  # save the current DataFrame as an Excel file on the local system

In [456]:
df1.to_json('MyDataFrame.json')  # save the current DataFrame as a JSON file on the local system

In [460]:
# reading csv file
df = pd.read_csv("D:\\M.TECH DATA SCIENCE\\1st semester\\PYTHON\\Udemy Python\\10 Libraries\\2 Pandas\\Dataset .csv")
df.head(4)

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365


In [469]:
df1 = pd.read_excel("D:\\M.TECH DATA SCIENCE\\1st semester\\PYTHON\\Udemy Python\\10 Libraries\\2 Pandas\\MyDataFrame.xlsx")
df1

Unnamed: 0.1,Unnamed: 0,Name,age,Weight,Department,city,country,Salary
0,0,Salman Bhai,61,78,Film_industry,Mumbai,India,1000
1,1,arbaz,1,11,Sales,indore,Australia,2000
2,2,sohail khan,10,11,Finance,delhi,Japan,4000
3,3,Neil,1,13,Banking,mumbai,USA,5000
4,4,Nitin mishra,15,18,Health,banglore,China,1000
5,5,Mukesh,26,86,Aviation,Noida,India,3000
6,6,Prakash,20,75,Engineering,Patna,England,9000
7,7,dinesh,22,80,Design,Banglore,India,10000
8,8,vijay,24,85,HR,Colambo,shrilanka,6000
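
The extra `Unnamed: 0` column above appears because the index was written to the file as an unnamed column and then read back as ordinary data. The sketch below illustrates two ways to avoid this, using CSV and an in-memory buffer (`io.StringIO`) so that no files are created; the frame named `frame` is hypothetical. `to_excel` and `read_excel` accept the same `index` and `index_col` parameters.

```python
import io
import pandas as pd

# Hypothetical frame used only for this round trip.
frame = pd.DataFrame({"Name": ["Neil", "Mukesh"], "Salary": [5000, 3000]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)
print(list(reloaded_default.columns))

# Option 1: do not write the index at all.
no_index = io.StringIO()
frame.to_csv(no_index, index=False)
no_index.seek(0)
reloaded_clean = pd.read_csv(no_index)
print(list(reloaded_clean.columns))

# Option 2: keep the index in the file and read the first column back as the index.
with_index.seek(0)
reloaded_indexed = pd.read_csv(with_index, index_col=0)
print(list(reloaded_indexed.columns))
```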
