
# Introduction to Pandas in Python

Pandas is one of the most widely used libraries in Python for data manipulation and analysis. It provides data structures like DataFrames and Series that make handling structured data easy and efficient.

This notebook provides an overview of common pandas functions and operations, organized into the following sections:
1. Loading Data
2. Data Inspection
3. Data Selection and Filtering
4. Handling Missing Data
5. Data Aggregation and Grouping
6. Data Merging and Joining
7. Data Transformation

### Importing Pandas
To get started, install pandas if it is not already available in your environment, then import the library.


In [None]:
!pip install pandas

In [1]:

# Import the pandas library
import pandas as pd

# Check the version of pandas
pd.__version__


'2.2.3'


## 1. Loading Data

Pandas allows you to load data from various sources like CSV, Excel, and SQL databases. The most common function is `pd.read_csv()` for loading data from a CSV file.


In [6]:

# Load a dataset (example CSV file)
url = 'https://people.sc.fsu.edu/~jburkardt/data/csv/hw_200.csv'
data = pd.read_csv(url)

# Display the first few rows of the dataset
data.head()


Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)"""
0,1,65.78,112.99
1,2,71.52,136.49
2,3,69.4,153.03
3,4,68.22,142.34
4,5,67.79,144.3


In [7]:
data.tail(1)

Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)"""
199,200,71.39,127.88


In [8]:
data.columns

Index(['Index', ' Height(Inches)"', ' "Weight(Pounds)"'], dtype='object')
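
The column names printed above carry a leading space and stray double quotes, which appears to come from how the header row of the source CSV is written. A minimal sketch of one way to normalize such names is shown here on a small hand-built frame that reuses the first two rows of the dataset. The names `messy` and `clean` are illustrative only, and the rest of this notebook keeps the original column names.

```python
import pandas as pd

# Small stand-in frame that mimics the messy headers shown above
messy = pd.DataFrame({
    'Index': [1, 2],
    ' Height(Inches)"': [65.78, 71.52],
    ' "Weight(Pounds)"': [112.99, 136.49],
})

# Remove stray double quotes, then trim surrounding whitespace
clean = messy.copy()
clean.columns = (
    clean.columns
    .str.replace('"', '', regex=False)
    .str.strip()
)

print(list(clean.columns))
```

Cleaning the names once, right after loading, avoids having to type the exact spaces and quotes in every later selection.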


## 2. Data Inspection

After loading the data, you can inspect it using various functions:
- `.head()`: Display the first few rows of the dataset.
- `.info()`: Get a summary of the dataset.
- `.describe()`: Get a statistical summary of the numerical columns.
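
Two attributes that often accompany the methods above are `.shape` and `.dtypes`. The following is a small sketch on a hand-built frame; the frame `df` and its simplified column names (`Height`, `Weight`) are assumptions made for illustration, with values taken from the first three rows shown earlier.

```python
import pandas as pd

# Tiny illustrative frame with simplified column names
df = pd.DataFrame({
    'Index': [1, 2, 3],
    'Height': [65.78, 71.52, 69.40],
    'Weight': [112.99, 136.49, 153.03],
})

print(df.shape)    # tuple of (number of rows, number of columns)
print(df.dtypes)   # data type of each column

# describe() returns a DataFrame, so individual statistics can be picked with .loc
print(df.describe().loc[['min', 'max']])
```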


In [9]:
# Get information about the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Index              200 non-null    int64  
 1    Height(Inches)"   200 non-null    float64
 2    "Weight(Pounds)"  200 non-null    float64
dtypes: float64(2), int64(1)
memory usage: 4.8 KB


In [11]:

# Get summary statistics
data.describe()


Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)"""
count,200.0,200.0,200.0
mean,100.5,67.9498,127.22195
std,57.879185,1.940363,11.960959
min,1.0,63.43,97.9
25%,50.75,66.5225,119.895
50%,100.5,67.935,127.875
75%,150.25,69.2025,136.0975
max,200.0,73.9,158.96



## 3. Data Selection and Filtering

You can select columns and filter rows in Pandas using:
- Column selection: `data['column_name']`
- Row filtering: `data[data['column_name'] > value]`
- Conditional selection: Combine several conditions with `&` (and) or `|` (or), wrapping each condition in parentheses.
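
The selection patterns above can be sketched on a small frame. The example also shows `.loc` and `.iloc`, which are not used elsewhere in this notebook; the frame `df` and its simplified column names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Height': [65.78, 71.52, 69.40, 68.22],
    'Weight': [112.99, 136.49, 153.03, 142.34],
})

# Each condition needs its own parentheses; use & for "and", | for "or"
tall_or_heavy = df[(df['Height'] > 70) | (df['Weight'] > 150)]

# .loc filters rows and picks a column by label in one step
weights_of_tall = df.loc[df['Height'] > 70, 'Weight']

# .iloc selects by integer position: first two rows, first column
first_heights = df.iloc[:2, 0]

print(tall_or_heavy)
```

Using Python's `and` / `or` keywords instead of `&` / `|` raises an error here, because a Series of booleans has no single truth value.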


In [10]:
data.columns

Index(['Index', ' Height(Inches)"', ' "Weight(Pounds)"'], dtype='object')

In [16]:
columns_desired = ['Index', ' Height(Inches)"']
data[columns_desired]

Unnamed: 0,Index,"Height(Inches)"""
0,1,65.78
1,2,71.52
2,3,69.40
3,4,68.22
4,5,67.79
...,...,...
195,196,65.80
196,197,66.11
197,198,68.24
198,199,68.02


In [17]:
type(data)

pandas.core.frame.DataFrame

In [18]:
# Select a single column
heights = data[' Height(Inches)"']
heights.head()

0    65.78
1    71.52
2    69.40
3    68.22
4    67.79
Name:  Height(Inches)", dtype: float64

In [19]:
type(heights)

pandas.core.series.Series

In [14]:
data[' Height(Inches)"'].head()

0    65.78
1    71.52
2    69.40
3    68.22
4    67.79
Name:  Height(Inches)", dtype: float64

In [20]:

# Filter rows where height is greater than 70 inches and weight is greater than 140 pounds
tall_people = data[(data[' Height(Inches)"'] > 70) & (data[' "Weight(Pounds)"'] > 140)]
tall_people.head()


Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)"""
26,27,70.84,142.42
34,35,71.8,140.1
55,56,70.18,147.89
56,57,70.41,155.9
82,83,70.05,155.38



## 4. Handling Missing Data

You can handle missing data in Pandas using:
- `.isnull()`: Check for missing values.
- `.dropna()`: Drop rows with missing values.
- `.fillna()`: Fill missing values with a specific value or strategy.
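
Because the height and weight dataset used in this notebook has no missing values, the effect of these methods is easier to see on a frame that does. A minimal sketch follows, with a hand-built frame `df` whose values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing value in each column
df = pd.DataFrame({
    'Height': [65.0, np.nan, 69.0],
    'Weight': [110.0, 130.0, np.nan],
})

missing_per_column = df.isnull().sum()   # count missing values in each column
dropped = df.dropna()                    # keep only rows with no missing values
filled = df.fillna(df.mean())            # replace missing values with the column mean

print(missing_per_column)
print(filled)
```

Dropping rows discards data, while filling with the mean keeps every row at the cost of reducing variance, so the right choice depends on the analysis.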


In [22]:
# Check for missing values
data.isnull().sum()

Index                0
 Height(Inches)"     0
 "Weight(Pounds)"    0
dtype: int64

In [23]:

# Fill missing values with the mean of each column
# (this dataset has no missing values, so the result is unchanged)
data_filled = data.fillna(data.mean())
data_filled.head()

Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)"""
0,1,65.78,112.99
1,2,71.52,136.49
2,3,69.4,153.03
3,4,68.22,142.34
4,5,67.79,144.3



## 5. Data Aggregation and Grouping

Pandas provides powerful methods for grouping and aggregating data:
- `.groupby()`: Group data based on a column.
- `.agg()`: Apply aggregation functions like mean, sum, min, max, etc.
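
Grouping is most informative when the grouping column has repeated values. A continuous measurement such as height is nearly unique per row, so one common approach is to bin it first. The sketch below assumes `pd.cut` with bin edges and labels chosen purely for illustration; the frame `df` and the column `Height_Band` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'Height': [65.78, 71.52, 69.40, 68.22, 67.79],
    'Weight': [112.99, 136.49, 153.03, 142.34, 144.30],
})

# Assign each height to a labelled bin (edges chosen for illustration)
df['Height_Band'] = pd.cut(
    df['Height'],
    bins=[60, 68, 70, 75],
    labels=['short', 'medium', 'tall'],
)

# Aggregate weight within each band; agg() accepts a list of function names
summary = df.groupby('Height_Band', observed=True)['Weight'].agg(['mean', 'count'])
print(summary)
```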


In [24]:
data.columns

Index(['Index', ' Height(Inches)"', ' "Weight(Pounds)"'], dtype='object')

In [25]:

# Group data by Height and calculate the average Weight
grouped_data = data.groupby(' Height(Inches)"').agg({' "Weight(Pounds)"': 'mean'})
grouped_data.head()


Unnamed: 0_level_0,"""Weight(Pounds)"""
"Height(Inches)""",Unnamed: 1_level_1
63.43,123.1
63.48,97.9
63.84,127.19
64.05,106.71
64.13,106.11



## 6. Data Merging and Joining

You can merge or join two DataFrames using:
- `pd.merge()`: Merge two DataFrames based on a key.
- `.join()`: Join two DataFrames on their indices.
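
The `how` argument of `pd.merge()` controls which keys survive the merge. The following sketch compares an inner merge, a left merge, and an index-based `.join()`; the frame names `names` and `ages` are illustrative.

```python
import pandas as pd

names = pd.DataFrame({'ID': [1, 2, 3, 5], 'Name': ['Alice', 'Bob', 'Charlie', 'Tej']})
ages = pd.DataFrame({'ID': [1, 2, 3], 'Age': [25, 30, 35]})

# inner: only IDs present in both frames
inner = pd.merge(names, ages, on='ID', how='inner')

# left: every row of `names`; Age is NaN where no match exists
left = pd.merge(names, ages, on='ID', how='left')

# join() aligns on the index, so set 'ID' as the index first
joined = names.set_index('ID').join(ages.set_index('ID'))

print(left)
```

Note that the unmatched row forces the `Age` column in `left` and `joined` to a floating-point type, since `NaN` cannot be stored in a plain integer column.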


In [27]:

# Example DataFrames for merging
data1 = pd.DataFrame({
    'ID': [1, 2, 3, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'Tej']
})

data2 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Age': [25, 30, 35]
})




In [30]:
# Right-merge the two DataFrames on 'ID': every ID in data2 is kept,
# so ID 5 (present only in data1) does not appear in the result
merged_data = pd.merge(data1, data2, on='ID', how='right')
merged_data

Unnamed: 0,ID,Name,Age
0,1,Alice,25
1,2,Bob,30
2,3,Charlie,35



## 7. Data Transformation

Pandas allows you to transform your data using:
- `.apply()`: Apply a function along an axis of the DataFrame.
- `.map()`: Map values in a Series using a dictionary or a function.
- `.applymap()`: Apply a function element-wise on the entire DataFrame. It is deprecated since pandas 2.1 in favor of `DataFrame.map()`.
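
As a rough illustration of how these methods differ, the sketch below uses `Series.map` with a dictionary and with a function, and `DataFrame.apply` with `axis=1` to work row by row. The frame `df`, its columns, and the group labels are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Group': ['a', 'b', 'a'],
    'Age': [25, 30, 35],
})

# Series.map with a dictionary: look up each value
df['Group_Name'] = df['Group'].map({'a': 'Control', 'b': 'Treatment'})

# Series.map with a function: transform each value
df['Name_Upper'] = df['Name'].map(str.upper)

# DataFrame.apply with axis=1: the function receives one row at a time
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

print(df)
```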


In [31]:
data.columns

Index(['Index', ' Height(Inches)"', ' "Weight(Pounds)"'], dtype='object')

In [32]:
# Use apply() to convert height from inches to centimeters (1 in = 2.54 cm)
data['Height_Cm'] = data[' Height(Inches)"'].apply(lambda x: x * 2.54)
data.head()


Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)""",Height_Cm
0,1,65.78,112.99,167.0812
1,2,71.52,136.49,181.6608
2,3,69.4,153.03,176.276
3,4,68.22,142.34,173.2788
4,5,67.79,144.3,172.1866


In [35]:
data.columns

Index(['Index', ' Height(Inches)"', ' "Weight(Pounds)"', 'Height_Cm'], dtype='object')

In [36]:
# Use apply() to convert weight from pounds to kilograms
# (0.453 is an approximate factor; 1 lb is about 0.4536 kg)
data['Weight_kg'] = data[' "Weight(Pounds)"'].apply(lambda x: x * 0.453)
data.head()


Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)""",Height_Cm,Weight_kg
0,1,65.78,112.99,167.0812,51.18447
1,2,71.52,136.49,181.6608,61.82997
2,3,69.4,153.03,176.276,69.32259
3,4,68.22,142.34,173.2788,64.48002
4,5,67.79,144.3,172.1866,65.3679


In [38]:
def convert_height(x):
    """Convert a height in inches to centimeters."""
    return x * 2.54

In [39]:
# apply() also accepts a named function in place of a lambda
data['Height_Cm_2'] = data[' Height(Inches)"'].apply(convert_height)
data.head()

Unnamed: 0,Index,"Height(Inches)""","""Weight(Pounds)""",Height_Cm,Weight_kg,Height_Cm_2
0,1,65.78,112.99,167.0812,51.18447,167.0812
1,2,71.52,136.49,181.6608,61.82997,181.6608
2,3,69.4,153.03,176.276,69.32259,176.276
3,4,68.22,142.34,173.2788,64.48002,173.2788
4,5,67.79,144.3,172.1866,65.3679,172.1866
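
For simple arithmetic like the conversions above, `apply()` is not strictly needed: arithmetic operators work on a whole column at once and are usually faster on large data. A minimal sketch comparing the two approaches follows; the frame `df` and the column name `Height_In` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Height_In': [65.78, 71.52, 69.40]})

# Element-by-element conversion with apply()
via_apply = df['Height_In'].apply(lambda x: x * 2.54)

# Vectorized conversion: the multiplication is applied to the whole column at once
via_vectorized = df['Height_In'] * 2.54

# Largest absolute difference between the two results
print((via_apply - via_vectorized).abs().max())
```

`apply()` remains useful when the transformation involves logic that cannot be written as column arithmetic.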
