<h2>Pandas Library :-</h2>

The Pandas library in Python is widely used for data manipulation and analysis. It provides data structures and functions needed to manipulate structured data seamlessly and efficiently. Here are some key reasons why pandas is popular and commonly used:

1. Data Structures:
    
    Series: A one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index.
    
    DataFrame: A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

2. Data Manipulation:

    Data Cleaning: Handling missing data, duplicates, and data transformation.

    Data Wrangling: Merging, joining, reshaping, and pivoting datasets.

3. Data Import and Export:

    Reading from and writing to various file formats like CSV, Excel, SQL databases, and more.

4. Integration with Other Libraries:

    Works well with other libraries like NumPy for numerical operations and Matplotlib for plotting.

5. Performance:

    Built on top of NumPy, so operations on whole columns are vectorized and run in compiled code instead of Python loops.
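The two core structures listed above can be built directly from Python lists and dictionaries. This is a minimal sketch, and the marks values are made up for illustration:

```python
import pandas as pd

# A Series: a sequence of values plus an index of labels (custom string labels here)
marks = pd.Series([78, 92, 85], index=["Ritu", "Taniya", "Abhishek"])
print(marks["Taniya"])  # label-based lookup -> 92

# A DataFrame: a dict of equal-length columns; each column can have its own dtype
students = pd.DataFrame({
    "Name": ["Ritu", "Taniya", "Abhishek"],
    "Marks": [78, 92, 85],
    "Passed": [True, True, True],
})
print(students.dtypes)

# Each column of a DataFrame is itself a Series that shares the DataFrame's index
print(type(students["Marks"]).__name__)  # Series
```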

In [118]:
# To install pandas, run: pip install pandas (in a terminal or command prompt)
# Import pandas under the conventional alias pd, and NumPy as np (np.nan is used later)
import pandas as pd
import numpy as np


dt = {"Name":['Ritu','Taniya','abhishek','tanmay','mayank'],
     "Branch":['it','cse','it','ece','cse'],
     "Class":[2,3,4,1,2], 
     "City":['jaipur','agra','vridavan','mathura','gokul']}

# Creating a Dataframe
df = pd.DataFrame(dt)

# Viewing the DataFrame
df

,Name,Branch,Class,City
0,Ritu,it,2,jaipur
1,Taniya,cse,3,agra
2,abhishek,it,4,vridavan
3,tanmay,ece,1,mathura
4,mayank,cse,2,gokul


In [119]:
# In the dictionary: key --> column name, value --> list of column entries
# 0,1,2,3,4 is the default integer index that pandas assigns to the rows

type(df) # type() returns the type of an object

pandas.core.frame.DataFrame

In [120]:
# Selecting a single column returns a Series (a one-dimensional labeled array)
s = df['Name']

In [121]:
type(s)

pandas.core.series.Series

In [122]:
# dtypes shows the data type of each column in a DataFrame; all values in one column share that type (text columns appear as object here)
df.dtypes

Name      object
Branch    object
Class      int64
City      object
dtype: object

In [123]:
# The shape attribute is used to get the dimensions of a DataFrame or Series
df.shape

(5, 4)

In [124]:
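# Attribute access to the 'Name' column; equivalent to df['Name'], but it only works when the column name
# is a valid Python identifier that does not clash with a DataFrame method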
df.Name

0        Ritu
1      Taniya
2    abhishek
3      tanmay
4      mayank
Name: Name, dtype: object

In [125]:
# Passing a list of column names, df[['Name', 'Branch']], selects multiple columns and returns a DataFrame
df[['Name','Branch']]

,Name,Branch
0,Ritu,it
1,Taniya,cse
2,abhishek,it
3,tanmay,ece
4,mayank,cse


<h4>Indexing :-</h4>

In [126]:
df[1:2] # Slicing rows by position: returns rows from index 1 up to, but not including, index 2

,Name,Branch,Class,City
1,Taniya,cse,3,agra


In [127]:
# Two steps: first slice rows 1 and 2 (position 3 is excluded),
# then select the 'Name' and 'Branch' columns from that slice.
df[1:3][['Name', 'Branch']]

,Name,Branch
1,Taniya,cse
2,abhishek,it


In [128]:
# The loc indexer is used for label-based indexing and selection:
# rows and columns are selected by their labels (names) rather than their integer positions.
# Unlike positional slicing, a loc slice includes the stop label, so 3:4 returns rows 3 and 4.
df.loc[3:4,['Branch','Class','City']]

,Branch,Class,City
3,ece,1,mathura
4,cse,2,gokul


In [129]:
# The iloc indexer is used for integer-based (positional) indexing and selection.
# 3: selects rows from position 3 to the end, and 1: selects columns from position 1 to the end.
df.iloc[3:,1:]

,Branch,Class,City
3,ece,1,mathura
4,cse,2,gokul
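In the student DataFrame above the index is the default 0..4, so labels and positions coincide and loc and iloc look alike. A small sketch with a custom index (the labels 10..50 are arbitrary, chosen only for illustration) shows the difference that most often causes confusion: loc slices include the stop label, while iloc slices exclude the stop position.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["Ritu", "Taniya", "Abhishek", "Tanmay", "Mayank"],
     "Class": [2, 3, 4, 1, 2]},
    index=[10, 20, 30, 40, 50],  # non-default labels so labels and positions differ
)

# loc: label-based, the stop label 40 IS included
by_label = df_demo.loc[20:40]

# iloc: position-based, the stop position 3 is NOT included (like Python slicing)
by_position = df_demo.iloc[1:3]

print(by_label)
print(by_position)
```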


In [130]:
# Reading CSV files in pandas is a common operation for loading data into a DataFrame. 
# The pd.read_csv() function is used for this purpose.
df = pd.read_csv('Used_Bikes.csv')
df
# If the file is not in the notebook's working directory, pass its full or relative path instead.

,bike_name,price,city,kms_driven,owner,age,power,brand
0,TVS Star City Plus Dual Tone 110cc,35000.0,Ahmedabad,17654.0,First Owner,3.0,110.0,TVS
1,Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,First Owner,4.0,350.0,Royal Enfield
2,Triumph Daytona 675R,600000.0,Delhi,110.0,First Owner,8.0,675.0,Triumph
3,TVS Apache RTR 180cc,65000.0,Bangalore,16329.0,First Owner,4.0,180.0,TVS
4,Yamaha FZ S V 2.0 150cc-Ltd. Edition,80000.0,Bangalore,10000.0,First Owner,3.0,150.0,Yamaha
...,...,...,...,...,...,...,...,...
32643,Hero Passion Pro 100cc,39000.0,Delhi,22000.0,First Owner,4.0,100.0,Hero
32644,TVS Apache RTR 180cc,30000.0,Karnal,6639.0,First Owner,9.0,180.0,TVS
32645,Bajaj Avenger Street 220,60000.0,Delhi,20373.0,First Owner,6.0,220.0,Bajaj
32646,Hero Super Splendor 125cc,15600.0,Jaipur,84186.0,First Owner,16.0,125.0,Hero
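If Used_Bikes.csv is not at hand, pd.read_csv() accepts any file-like object as well as a path or URL. This is a minimal sketch that reads three rows copied from the output above through io.StringIO:

```python
import io
import pandas as pd

# Three rows in the same layout as Used_Bikes.csv (copied from the output above)
csv_text = """bike_name,price,city,kms_driven,owner,age,power,brand
TVS Star City Plus Dual Tone 110cc,35000.0,Ahmedabad,17654.0,First Owner,3.0,110.0,TVS
Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,First Owner,4.0,350.0,Royal Enfield
Triumph Daytona 675R,600000.0,Delhi,110.0,First Owner,8.0,675.0,Triumph
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO
bikes = pd.read_csv(io.StringIO(csv_text))
print(bikes.shape)           # (3, 8)
print(bikes["price"].dtype)  # float64
```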


In [131]:
df.head() # top 5 records are shown

,bike_name,price,city,kms_driven,owner,age,power,brand
0,TVS Star City Plus Dual Tone 110cc,35000.0,Ahmedabad,17654.0,First Owner,3.0,110.0,TVS
1,Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,First Owner,4.0,350.0,Royal Enfield
2,Triumph Daytona 675R,600000.0,Delhi,110.0,First Owner,8.0,675.0,Triumph
3,TVS Apache RTR 180cc,65000.0,Bangalore,16329.0,First Owner,4.0,180.0,TVS
4,Yamaha FZ S V 2.0 150cc-Ltd. Edition,80000.0,Bangalore,10000.0,First Owner,3.0,150.0,Yamaha


In [132]:
df.head(10) # pass a number to head() to show that many rows from the top

,bike_name,price,city,kms_driven,owner,age,power,brand
0,TVS Star City Plus Dual Tone 110cc,35000.0,Ahmedabad,17654.0,First Owner,3.0,110.0,TVS
1,Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,First Owner,4.0,350.0,Royal Enfield
2,Triumph Daytona 675R,600000.0,Delhi,110.0,First Owner,8.0,675.0,Triumph
3,TVS Apache RTR 180cc,65000.0,Bangalore,16329.0,First Owner,4.0,180.0,TVS
4,Yamaha FZ S V 2.0 150cc-Ltd. Edition,80000.0,Bangalore,10000.0,First Owner,3.0,150.0,Yamaha
5,Yamaha FZs 150cc,53499.0,Delhi,25000.0,First Owner,6.0,150.0,Yamaha
6,Honda CB Hornet 160R ABS DLX,85000.0,Delhi,8200.0,First Owner,3.0,160.0,Honda
7,Hero Splendor Plus Self Alloy 100cc,45000.0,Delhi,12645.0,First Owner,3.0,100.0,Hero
8,Royal Enfield Thunderbird X 350cc,145000.0,Bangalore,9190.0,First Owner,3.0,350.0,Royal Enfield
9,Royal Enfield Classic Desert Storm 500cc,88000.0,Delhi,19000.0,Second Owner,7.0,500.0,Royal Enfield


In [133]:
df.tail() # bottom 5 records are shown

,bike_name,price,city,kms_driven,owner,age,power,brand
32643,Hero Passion Pro 100cc,39000.0,Delhi,22000.0,First Owner,4.0,100.0,Hero
32644,TVS Apache RTR 180cc,30000.0,Karnal,6639.0,First Owner,9.0,180.0,TVS
32645,Bajaj Avenger Street 220,60000.0,Delhi,20373.0,First Owner,6.0,220.0,Bajaj
32646,Hero Super Splendor 125cc,15600.0,Jaipur,84186.0,First Owner,16.0,125.0,Hero
32647,Bajaj Pulsar 150cc,22000.0,Pune,60857.0,First Owner,13.0,150.0,Bajaj


In [134]:
df.tail(10) # pass a number to tail() to show that many rows from the bottom

,bike_name,price,city,kms_driven,owner,age,power,brand
32638,Yamaha Fazer 25 250cc,123000.0,Kadapa,14500.0,First Owner,4.0,250.0,Yamaha
32639,Royal Enfield Classic 350cc,95500.0,Delhi,18000.0,First Owner,8.0,350.0,Royal Enfield
32640,Hero Passion Pro 100cc,32000.0,Delhi,12000.0,First Owner,6.0,100.0,Hero
32641,Bajaj Avenger 220cc,41000.0,Delhi,20245.0,Second Owner,11.0,220.0,Bajaj
32642,Hero Passion 100cc,15000.0,Perumbavoor,35000.0,Second Owner,19.0,100.0,Hero
32643,Hero Passion Pro 100cc,39000.0,Delhi,22000.0,First Owner,4.0,100.0,Hero
32644,TVS Apache RTR 180cc,30000.0,Karnal,6639.0,First Owner,9.0,180.0,TVS
32645,Bajaj Avenger Street 220,60000.0,Delhi,20373.0,First Owner,6.0,220.0,Bajaj
32646,Hero Super Splendor 125cc,15600.0,Jaipur,84186.0,First Owner,16.0,125.0,Hero
32647,Bajaj Pulsar 150cc,22000.0,Pune,60857.0,First Owner,13.0,150.0,Bajaj


<h3>DATA FILTERING :-</h3>

In [135]:
df['brand'].nunique() # Returns the number of unique values in the 'brand' column

23

In [136]:
df['brand'].unique() # Returns a NumPy array of the unique values in the 'brand' column, in order of first appearance

array(['TVS', 'Royal Enfield', 'Triumph', 'Yamaha', 'Honda', 'Hero',
       'Bajaj', 'Suzuki', 'Benelli', 'KTM', 'Mahindra', 'Kawasaki',
       'Ducati', 'Hyosung', 'Harley-Davidson', 'Jawa', 'BMW', 'Indian',
       'Rajdoot', 'LML', 'Yezdi', 'MV', 'Ideal'], dtype=object)

In [137]:
# The value_counts() method counts the occurrences of each unique value in a Series, most frequent first.
df['brand'].value_counts()

brand
Bajaj              11213
Hero                6368
Royal Enfield       4178
Yamaha              3916
Honda               2108
Suzuki              1464
TVS                 1247
KTM                 1077
Harley-Davidson      737
Kawasaki              79
Hyosung               64
Benelli               56
Mahindra              55
Triumph               26
Ducati                22
BMW                   16
Jawa                  10
MV                     4
Indian                 3
Ideal                  2
Rajdoot                1
Yezdi                  1
LML                    1
Name: count, dtype: int64
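value_counts() can also report proportions instead of raw counts through normalize=True. A sketch on a small made-up Series of brand names:

```python
import pandas as pd

brands = pd.Series(["Bajaj", "Hero", "Bajaj", "TVS", "Bajaj", "Hero"])

# Absolute counts, most frequent value first
counts = brands.value_counts()

# Fraction of rows per value; the fractions add up to 1
shares = brands.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```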

In [138]:
# Filtering the pandas DataFrame to include only rows where the 'brand' column matches the value 'Royal Enfield'. 
bullet = df[df['brand'] == 'Royal Enfield']
bullet

,bike_name,price,city,kms_driven,owner,age,power,brand
1,Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,First Owner,4.0,350.0,Royal Enfield
8,Royal Enfield Thunderbird X 350cc,145000.0,Bangalore,9190.0,First Owner,3.0,350.0,Royal Enfield
9,Royal Enfield Classic Desert Storm 500cc,88000.0,Delhi,19000.0,Second Owner,7.0,500.0,Royal Enfield
23,Royal Enfield Classic Chrome 500cc,121700.0,Kalyan,24520.0,First Owner,5.0,500.0,Royal Enfield
36,Royal Enfield Classic 350cc,98800.0,Kochi,39000.0,First Owner,5.0,350.0,Royal Enfield
...,...,...,...,...,...,...,...,...
32601,Royal Enfield Classic 350cc,95500.0,Delhi,18000.0,First Owner,8.0,350.0,Royal Enfield
32614,Royal Enfield Bullet Electra 350cc,105000.0,Delhi,20000.0,First Owner,4.0,350.0,Royal Enfield
32633,Royal Enfield Classic 350cc,87000.0,Gautam Buddha Nagar,16336.0,First Owner,7.0,350.0,Royal Enfield
32634,Royal Enfield Thunderbird 350cc,70000.0,Mumbai,13858.0,Second Owner,11.0,350.0,Royal Enfield


In [139]:
# Filtering df on multiple conditions: combine them with & (and) or | (or), and wrap each condition in parentheses.
bullet = df[(df['brand']=="Royal Enfield") & (df['age']<=2) & (df['owner']=="First Owner")]
bullet

,bike_name,price,city,kms_driven,owner,age,power,brand
38,Royal Enfield Thunderbird X 500cc,190500.0,Samastipur,4550.0,First Owner,2.0,500.0,Royal Enfield
81,Royal Enfield Interceptor 650cc,260000.0,Navi Mumbai,3800.0,First Owner,2.0,650.0,Royal Enfield
139,Royal Enfield Himalayan 410cc Fi ABS,173300.0,Vadodara,14000.0,First Owner,2.0,410.0,Royal Enfield
157,Royal Enfield Himalayan 410cc Fi ABS,173300.0,Vadodara,14000.0,First Owner,2.0,410.0,Royal Enfield
194,Royal Enfield Electra 350cc,145000.0,Bangalore,4000.0,First Owner,2.0,350.0,Royal Enfield
...,...,...,...,...,...,...,...,...
7694,Royal Enfield Classic Chrome 500cc ABS,215000.0,Delhi,417.0,First Owner,2.0,500.0,Royal Enfield
7706,Royal Enfield Classic Chrome 500cc ABS,215000.0,Delhi,417.0,First Owner,2.0,500.0,Royal Enfield
8139,Royal Enfield Thunderbird X 350cc ABS,169000.0,Bangalore,4411.0,First Owner,2.0,350.0,Royal Enfield
8192,Royal Enfield Thunderbird 350cc ABS,145000.0,Ghaziabad,12400.0,First Owner,2.0,350.0,Royal Enfield
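The parentheses around each condition are required: & and | bind more tightly than comparison operators such as == and <=, so leaving them out usually raises an error. A sketch on a small made-up DataFrame that also shows two alternatives, DataFrame.query() and Series.isin():

```python
import pandas as pd

bikes = pd.DataFrame({
    "brand": ["Royal Enfield", "Royal Enfield", "Bajaj", "Royal Enfield", "KTM"],
    "age":   [2.0, 5.0, 1.0, 1.0, 2.0],
    "owner": ["First Owner", "First Owner", "First Owner", "Second Owner", "First Owner"],
})

# Boolean mask: each condition in its own parentheses, combined with &
mask = (bikes["brand"] == "Royal Enfield") & (bikes["age"] <= 2) & (bikes["owner"] == "First Owner")
with_mask = bikes[mask]

# The same filter written as a query string
with_query = bikes.query("brand == 'Royal Enfield' and age <= 2 and owner == 'First Owner'")

# isin() keeps rows whose value matches any item in the list (an OR over one column)
two_brands = bikes[bikes["brand"].isin(["Bajaj", "KTM"])]
```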


In [140]:
# The info() method in pandas is used to display a concise summary of a DataFrame.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32648 entries, 0 to 32647
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   bike_name   32648 non-null  object 
 1   price       32648 non-null  float64
 2   city        32648 non-null  object 
 3   kms_driven  32648 non-null  float64
 4   owner       32648 non-null  object 
 5   age         32648 non-null  float64
 6   power       32648 non-null  float64
 7   brand       32648 non-null  object 
dtypes: float64(4), object(4)
memory usage: 2.0+ MB


In [141]:
# The method df.duplicated().sum() in pandas is used to identify and count the number of duplicate rows in a DataFrame.
df.duplicated().sum()

np.int64(25324)

In [142]:
# The method df.drop_duplicates(inplace=True) is used to remove duplicate rows from a pandas DataFrame. 
# The inplace=True parameter ensures that the changes are made directly to the original DataFrame rather 
# than returning a new DataFrame with duplicates removed.
df.drop_duplicates(inplace=True)
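duplicated() and drop_duplicates() compare entire rows by default; subset= restricts the comparison to chosen columns, and keep= decides which copy survives. drop_duplicates() also keeps the original index labels, which is why later outputs show gaps in the index (the last label is 9371 even though fewer rows remain); reset_index(drop=True) renumbers the rows. A sketch on made-up rows:

```python
import pandas as pd

rows = pd.DataFrame({
    "bike_name": ["Hero Passion Pro 100cc", "Hero Passion Pro 100cc",
                  "Bajaj Pulsar 150cc", "Hero Passion Pro 100cc"],
    "price": [39000.0, 39000.0, 22000.0, 32000.0],
})

# Whole-row comparison: only row 1 repeats row 0 exactly
full_dups = rows.duplicated().sum()

# Compare on bike_name only: rows 1 and 3 repeat an earlier name
name_dups = rows.duplicated(subset=["bike_name"]).sum()

# keep='first' (default) keeps the first copy; reset_index renumbers 0..n-1
deduped = rows.drop_duplicates().reset_index(drop=True)

# keep='last' keeps the last copy of each bike_name instead
last_per_name = rows.drop_duplicates(subset=["bike_name"], keep="last")
```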

In [143]:
bullet=df[df['brand']=='Royal Enfield']
# sort_values() method is used to sort a DataFrame or Series by one or more columns.
# It allows you to order the data in ascending or descending order based on the values in specified columns.
bullet.sort_values(by='price').head(10)

,bike_name,price,city,kms_driven,owner,age,power,brand
5811,Royal Enfield Thunderbird 350cc,33500.0,Delhi,49463.0,First Owner,16.0,350.0,Royal Enfield
7038,Royal Enfield Bullet Electra 350cc,35000.0,Delhi,60000.0,Fourth Owner Or More,18.0,350.0,Royal Enfield
9183,Royal Enfield Thunderbird 350cc,35800.0,Bangalore,90408.0,Third Owner,18.0,350.0,Royal Enfield
4664,Royal Enfield Bullet Electra 350cc,41000.0,Noida,120000.0,First Owner,17.0,350.0,Royal Enfield
2371,Royal Enfield Bullet 350 cc,45000.0,Gurgaon,40000.0,Second Owner,20.0,350.0,Royal Enfield
5918,Royal Enfield Thunderbird 350cc,45000.0,Bangalore,93108.0,Third Owner,18.0,350.0,Royal Enfield
8475,Royal Enfield Thunderbird 350cc,45000.0,Delhi,45710.0,First Owner,16.0,350.0,Royal Enfield
6489,Royal Enfield Thunderbird 350cc,45918.0,Bangalore,51396.0,Second Owner,12.0,350.0,Royal Enfield
385,Royal Enfield Thunderbird 350cc,46000.0,Chennai,35000.0,First Owner,16.0,350.0,Royal Enfield
1227,Royal Enfield Thunderbird 350cc,47000.0,Faridabad,21000.0,First Owner,9.0,350.0,Royal Enfield


In [144]:
bullet.sort_values(by='price',ascending=False).head(10) # Descending order

,bike_name,price,city,kms_driven,owner,age,power,brand
4912,Royal Enfield Continental GT 650cc,285000.0,Hyderabad,4500.0,First Owner,2.0,650.0,Royal Enfield
277,Royal Enfield Interceptor 650cc,280000.0,Bangalore,1500.0,First Owner,2.0,650.0,Royal Enfield
2228,Royal Enfield Interceptor 650cc,280000.0,Mumbai,5000.0,First Owner,2.0,650.0,Royal Enfield
1931,Royal Enfield Interceptor 650cc,270000.0,Ahmedabad,6500.0,First Owner,2.0,650.0,Royal Enfield
5912,Royal Enfield Interceptor 650cc,265500.0,Nellore,12000.0,First Owner,2.0,650.0,Royal Enfield
1396,Royal Enfield Interceptor 650cc,265000.0,Delhi,11000.0,First Owner,2.0,650.0,Royal Enfield
428,Royal Enfield Interceptor 650cc,265000.0,Bangalore,12900.0,First Owner,2.0,650.0,Royal Enfield
2332,Royal Enfield Interceptor 650cc,265000.0,Delhi,8500.0,First Owner,2.0,650.0,Royal Enfield
81,Royal Enfield Interceptor 650cc,260000.0,Navi Mumbai,3800.0,First Owner,2.0,650.0,Royal Enfield
1251,Royal Enfield Interceptor 650cc,250000.0,Silchar,920.0,First Owner,2.0,650.0,Royal Enfield
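sort_values() also accepts a list of columns, with one ascending flag per column, which is useful for breaking ties such as the three Interceptors listed at 265000.0 above. A sketch using four of those rows (only city, price and kms_driven are kept):

```python
import pandas as pd

bullet = pd.DataFrame({
    "city":       ["Delhi", "Bangalore", "Delhi", "Hyderabad"],
    "price":      [265000.0, 265000.0, 265000.0, 285000.0],
    "kms_driven": [11000.0, 12900.0, 8500.0, 4500.0],
})

# Highest price first; among equal prices, the lowest kms_driven first
ranked = bullet.sort_values(by=["price", "kms_driven"], ascending=[False, True])

# nlargest() is a shortcut for "top n rows by one column" (ties: first occurrence wins)
top2 = bullet.nlargest(2, "price")
```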



In [146]:
# The to_csv() method in pandas is used to export a DataFrame or Series to a CSV file. 
bullet.to_csv("Bullet.csv",index=False)

In [147]:
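# The drop() method removes rows or columns from a DataFrame or Series; axis=1 tells it that 'owner' is a column.
bullet.drop('owner',axis=1)

# drop() returns a new DataFrame and leaves bullet unchanged. To keep the change, reassign the
# result (bullet = bullet.drop('owner', axis=1)) or pass inplace=True.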

,bike_name,price,city,kms_driven,age,power,brand
1,Royal Enfield Classic 350cc,119900.0,Delhi,11000.0,4.0,350.0,Royal Enfield
8,Royal Enfield Thunderbird X 350cc,145000.0,Bangalore,9190.0,3.0,350.0,Royal Enfield
9,Royal Enfield Classic Desert Storm 500cc,88000.0,Delhi,19000.0,7.0,500.0,Royal Enfield
23,Royal Enfield Classic Chrome 500cc,121700.0,Kalyan,24520.0,5.0,500.0,Royal Enfield
36,Royal Enfield Classic 350cc,98800.0,Kochi,39000.0,5.0,350.0,Royal Enfield
...,...,...,...,...,...,...,...
9261,Royal Enfield Classic 500cc,146006.0,Guwahati,8575.0,4.0,500.0,Royal Enfield
9319,Royal Enfield Classic 350cc,100000.0,Chennai,25000.0,10.0,350.0,Royal Enfield
9337,Royal Enfield Himalayan 410cc,120000.0,Gurgaon,8492.0,5.0,410.0,Royal Enfield
9338,Royal Enfield Himalayan 410cc,138000.0,Delhi,5000.0,5.0,410.0,Royal Enfield
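drop() also takes columns= and index= keywords, which read more clearly than axis=1. Because bullet was created by filtering df, calling .copy() on the filtered result before modifying it avoids pandas' SettingWithCopyWarning. A sketch on a small made-up DataFrame:

```python
import pandas as pd

bikes = pd.DataFrame({
    "brand": ["Royal Enfield", "Bajaj", "Royal Enfield"],
    "owner": ["First Owner", "Second Owner", "First Owner"],
    "price": [119900.0, 22000.0, 88000.0],
})

# .copy() makes the filtered subset an independent DataFrame
bullet = bikes[bikes["brand"] == "Royal Enfield"].copy()

# columns= is equivalent to axis=1; reassigning keeps the result
bullet = bullet.drop(columns=["owner"])

# Rows are dropped by index label (index= is equivalent to axis=0)
bullet = bullet.drop(index=2)
```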


<h3>Missing Value Handling :-</h3>

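1. Remove the records: reasonable when only a small share of the data (for example 3-4%) contains missing values.

2. Fill the records: replace the missing values, for example with a constant or with the column's mean or median.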
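The choice between removing and filling can be driven by the data itself, by measuring how many rows have a missing value. This is a minimal sketch: the 5% threshold is an assumed rule of thumb rather than a fixed rule, and the data reuses columns A and B of df2 from the next cell.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [2, 5, np.nan, 8, np.nan, 9],
                    'B': [np.nan, 45, np.nan, 89, 63, np.nan]})

# Fraction of rows that contain at least one missing value
missing_share = df2.isnull().any(axis=1).mean()

# Assumed rule of thumb: drop rows when few are affected,
# otherwise fill each column with its own median
THRESHOLD = 0.05
if missing_share <= THRESHOLD:
    cleaned = df2.dropna()
else:
    cleaned = df2.fillna(df2.median())

print(f"{missing_share:.0%} of rows have missing values")
```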

In [148]:
data = {'A':[2,5,np.nan,8,np.nan,9],
        'B':[np.nan,45,np.nan,89,63,np.nan],
        'C':[np.nan,74,np.nan,np.nan,85,4],
        'D':[10,20,30,40,50,60]}

# Creating a DataFrame with null values
df2 = pd.DataFrame(data)
df2

# NaN --> Missing Values

,A,B,C,D
0,2.0,,,10
1,5.0,45.0,74.0,20
2,,,,30
3,8.0,89.0,,40
4,,63.0,85.0,50
5,9.0,,4.0,60


In [149]:
df2.dropna() # By default, drops every row that contains at least one missing value

,A,B,C,D
1,5.0,45.0,74.0,20


In [150]:
df2.dropna(axis=1) # axis=1 drops every column that contains at least one missing value

,D
0,10
1,20
2,30
3,40
4,50
5,60
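dropna() has finer controls than dropping every row or column with a gap: thresh= keeps rows with at least that many non-missing values, subset= only checks the listed columns, and how='all' drops a row only when every value is missing. A sketch on the same df2 data:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [2, 5, np.nan, 8, np.nan, 9],
                    'B': [np.nan, 45, np.nan, 89, 63, np.nan],
                    'C': [np.nan, 74, np.nan, np.nan, 85, 4],
                    'D': [10, 20, 30, 40, 50, 60]})

# Keep rows that have at least 3 non-missing values
at_least_3 = df2.dropna(thresh=3)

# Only column 'A' decides which rows are dropped
a_present = df2.dropna(subset=['A'])

# Drop a row only if ALL of its values are missing (none here, since D is always filled)
all_missing = df2.dropna(how='all')
```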


In [151]:
df2.isnull().sum() # Counts the missing values in each column

A    2
B    3
C    3
D    0
dtype: int64

In [152]:
df2.isnull().sum().sum() # used to count the total number of missing (NaN) values in a DataFrame

np.int64(8)

<h3>Filling the Records :-</h3>

1. Row-wise

2. Column-wise
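One possible reading of "row-wise" and "column-wise" filling is the direction in which an existing value is propagated into the gaps. A sketch of that interpretation using forward fill on a small made-up frame: ffill() copies the last valid value down each column, and ffill(axis=1) copies it across each row.

```python
import numpy as np
import pandas as pd

grid = pd.DataFrame({'A': [2.0, np.nan, 8.0],
                     'B': [np.nan, 45.0, np.nan],
                     'C': [1.0, 74.0, 3.0]})

# Column-wise: within each column, copy the last valid value downwards (axis=0, the default)
down = grid.ffill()

# Row-wise: within each row, copy the last valid value to the right (axis=1)
across = grid.ffill(axis=1)

# A leading NaN has nothing before it, so it stays missing in either direction
print(down)
print(across)
```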

In [153]:
# The fillna() method in pandas is used to fill missing (NaN) values in a DataFrame or Series with specified values.
df2.fillna(500) 

,A,B,C,D
0,2.0,500.0,500.0,10
1,5.0,45.0,74.0,20
2,500.0,500.0,500.0,30
3,8.0,89.0,500.0,40
4,500.0,63.0,85.0,50
5,9.0,500.0,4.0,60


In [154]:
df2['B'].fillna(600) # Fills the missing values in column 'B' of df2 with 600; returns a new Series and leaves df2 unchanged

0    600.0
1     45.0
2    600.0
3     89.0
4     63.0
5    600.0
Name: B, dtype: float64

In [155]:
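# Filling with the mean or the median of the column.
# Only the last expression in a cell is displayed, so the output below is the median version.
df2['A'].fillna(df2['A'].mean())    # mean of 2, 5, 8, 9 is 6.0
df2['A'].fillna(df2['A'].median())  # median of 2, 5, 8, 9 is 6.5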

0    2.0
1    5.0
2    6.5
3    8.0
4    6.5
5    9.0
Name: A, dtype: float64
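fillna() also accepts a dictionary, so each column can get its own strategy in one call. Like the calls above, it returns a new object, so the result must be assigned back to keep it. A sketch on df2's columns A to C (the constant 0 for C is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [2, 5, np.nan, 8, np.nan, 9],
                    'B': [np.nan, 45, np.nan, 89, 63, np.nan],
                    'C': [np.nan, 74, np.nan, np.nan, 85, 4]})

# One fill value per column: mean for A, median for B, an arbitrary constant 0 for C
filled = df2.fillna({'A': df2['A'].mean(), 'B': df2['B'].median(), 'C': 0})

# fillna() returns a new object; assign the result back to keep it in df2
df2['A'] = df2['A'].fillna(df2['A'].mean())
```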

<h3>Grouping :-</h3>

Grouping in pandas is a powerful technique used for data aggregation and transformation. It follows a split-apply-combine pattern: split the data into groups based on certain criteria, apply an operation to each group, and combine the results back into a DataFrame. This process is essential for summarizing and analyzing data at different levels.
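The split-apply-combine steps can be written out by hand and compared with a single groupby() call. A sketch on a few made-up price rows:

```python
import pandas as pd

bikes = pd.DataFrame({
    "brand": ["Bajaj", "Hero", "Bajaj", "Hero", "KTM"],
    "price": [60000.0, 39000.0, 22000.0, 15600.0, 150000.0],
})

# By hand: split rows by brand, apply mean() to each part, combine into a dict
manual = {}
for brand in bikes["brand"].unique():
    part = bikes[bikes["brand"] == brand]   # split
    manual[brand] = part["price"].mean()    # apply, then combine into the dict

# The same computation with groupby: a Series indexed by brand (keys sorted)
auto = bikes.groupby("brand")["price"].mean()
```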

In [156]:
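# groupby() splits the data into groups based on one or more columns; it returns a GroupBy object,
# and nothing is computed until an aggregation such as min() or max() is applied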
brand_group = df.groupby('brand')

In [157]:
# Find the minimum value of the 'price' column for each group defined by the brand_group object.
brand_group[['price']].min() 

brand,price
BMW,255000.0
Bajaj,6400.0
Benelli,110700.0
Ducati,380000.0
Harley-Davidson,250000.0
Hero,5000.0
Honda,10000.0
Hyosung,120000.0
Ideal,100000.0
Indian,700000.0


In [158]:
# Find the maximum value of the 'price' column for each group defined by the brand_group object.
brand_group[['price']].max()

brand,price
BMW,1800000.0
Bajaj,195000.0
Benelli,785000.0
Ducati,1500000.0
Harley-Davidson,1100000.0
Hero,104000.0
Honda,800000.0
Hyosung,493500.0
Ideal,100000.0
Indian,1900000.0


In [159]:
# The agg() method applies one or more aggregations after grouping; each keyword names an output column (named aggregation).
brand_group['price'].agg(min_price='min',max_price='max',avg_price='mean')

brand,min_price,max_price,avg_price
BMW,255000.0,1800000.0,673500.0
Bajaj,6400.0,195000.0,49031.28
Benelli,110700.0,785000.0,298376.1
Ducati,380000.0,1500000.0,900575.0
Harley-Davidson,250000.0,1100000.0,473429.1
Hero,5000.0,104000.0,30683.07
Honda,10000.0,800000.0,51470.68
Hyosung,120000.0,493500.0,243208.4
Ideal,100000.0,100000.0,100000.0
Indian,700000.0,1900000.0,1100000.0
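groupby() also accepts a list of columns, and besides agg() there are size(), which counts the rows in each group, and transform(), which returns one value per original row. A sketch on made-up rows:

```python
import pandas as pd

bikes = pd.DataFrame({
    "brand": ["Bajaj", "Bajaj", "Hero", "Hero", "Hero"],
    "owner": ["First Owner", "Second Owner", "First Owner", "First Owner", "Second Owner"],
    "price": [60000.0, 22000.0, 39000.0, 45000.0, 15000.0],
})

# Group by two columns: the result is indexed by a (brand, owner) MultiIndex
by_brand_owner = bikes.groupby(["brand", "owner"])["price"].mean()

# size() counts the rows in each group
counts = bikes.groupby("brand").size()

# transform() returns one value per original row, aligned to bikes' index,
# so each bike can be compared with its brand average
bikes["brand_avg"] = bikes.groupby("brand")["price"].transform("mean")
```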


In [160]:
# select_dtypes(include='O') selects the columns whose dtype is object ('O'),
# which pandas typically uses for text or categorical data.
cat_col = df.select_dtypes(include='O')
cat_col.head()

,bike_name,city,owner,brand
0,TVS Star City Plus Dual Tone 110cc,Ahmedabad,First Owner,TVS
1,Royal Enfield Classic 350cc,Delhi,First Owner,Royal Enfield
2,Triumph Daytona 675R,Delhi,First Owner,Triumph
3,TVS Apache RTR 180cc,Bangalore,First Owner,TVS
4,Yamaha FZ S V 2.0 150cc-Ltd. Edition,Bangalore,First Owner,Yamaha


In [161]:
# Used to select columns from a DataFrame that do not have a data type of object. 
# This typically includes numerical columns such as integers or floats.
num_col = df.select_dtypes(exclude='O')
num_col.head()

,price,kms_driven,age,power
0,35000.0,17654.0,3.0,110.0
1,119900.0,11000.0,4.0,350.0
2,600000.0,110.0,8.0,675.0
3,65000.0,16329.0,4.0,180.0
4,80000.0,10000.0,3.0,150.0


In [162]:
# pd.concat() concatenates two or more DataFrames or Series along an axis.
# axis='columns' (same as axis=1) places them side by side; axis=0 (the default) stacks them as rows.
# Here it recombines the object and non-object columns into a single DataFrame.
pd.concat([cat_col,num_col],axis='columns')

,bike_name,city,owner,brand,price,kms_driven,age,power
0,TVS Star City Plus Dual Tone 110cc,Ahmedabad,First Owner,TVS,35000.0,17654.0,3.0,110.0
1,Royal Enfield Classic 350cc,Delhi,First Owner,Royal Enfield,119900.0,11000.0,4.0,350.0
2,Triumph Daytona 675R,Delhi,First Owner,Triumph,600000.0,110.0,8.0,675.0
3,TVS Apache RTR 180cc,Bangalore,First Owner,TVS,65000.0,16329.0,4.0,180.0
4,Yamaha FZ S V 2.0 150cc-Ltd. Edition,Bangalore,First Owner,Yamaha,80000.0,10000.0,3.0,150.0
...,...,...,...,...,...,...,...,...
9362,Hero Hunk Rear Disc 150cc,Delhi,First Owner,Hero,25000.0,48587.0,8.0,150.0
9369,Bajaj Avenger 220cc,Bangalore,First Owner,Bajaj,35000.0,60000.0,9.0,220.0
9370,Harley-Davidson Street 750 ABS,Jodhpur,First Owner,Harley-Davidson,450000.0,3430.0,4.0,750.0
9371,Bajaj Dominar 400 ABS,Hyderabad,First Owner,Bajaj,139000.0,21300.0,4.0,400.0
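With axis=0, pd.concat() stacks DataFrames vertically, which is the usual way to combine several files that share the same columns. A sketch with two made-up frames:

```python
import pandas as pd

part1 = pd.DataFrame({"bike_name": ["Bajaj Pulsar 150cc"], "price": [22000.0]})
part2 = pd.DataFrame({"bike_name": ["Hero Passion Pro 100cc", "TVS Apache RTR 180cc"],
                      "price": [39000.0, 30000.0]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers the result 0..n-1
stacked = pd.concat([part1, part2], ignore_index=True)

# Without ignore_index the original labels are kept, so label 0 appears twice
kept = pd.concat([part1, part2])
```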


The merge() function in pandas is used to combine two DataFrames based on common columns or indices, similar to SQL joins. It allows you to perform database-like join operations (inner, left, right, outer) on DataFrames.
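A minimal merge() sketch. The origin lookup table is a made-up example; on='brand' names the shared key column, how= picks the join type, and indicator=True adds a _merge column showing whether each row matched:

```python
import pandas as pd

bikes = pd.DataFrame({
    "bike_name": ["Bajaj Pulsar 150cc", "Hero Passion Pro 100cc", "Triumph Daytona 675R"],
    "brand": ["Bajaj", "Hero", "Triumph"],
})

# Hypothetical lookup table: brand -> country of origin (no row for Triumph)
origin = pd.DataFrame({
    "brand": ["Bajaj", "Hero", "KTM"],
    "country": ["India", "India", "Austria"],
})

# Inner join: only brands present in both tables
inner = pd.merge(bikes, origin, on="brand", how="inner")

# Left join: every bike is kept; country is NaN where the brand has no match
left = pd.merge(bikes, origin, on="brand", how="left", indicator=True)
print(left)
```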

