# What is Pandas?
---

Pandas is a powerful and popular open-source Python library used for data analysis and data manipulation. It is widely used in Data Science, Machine Learning, AI, Data Analytics, and Finance because it provides fast, flexible, and easy-to-use data structures.

---

**Key Points about Pandas**

---
| Feature                  | Description                                                  |
| ------------------------ | ------------------------------------------------------------ |
| **Data Structures**      | Provides **Series** (1D) and **DataFrame** (2D tabular data) |
| **Handles Missing Data** | Easily manage NA/Null values                                 |
| **Data Cleaning**        | Replace, filter, drop, modify data                           |
| **Data Analysis**        | Perform group, sort, filter, merge, and aggregate            |
| **File Operations**      | Read/Write CSV, Excel, JSON, SQL, etc.                       |
| **Fast Performance**     | Built on top of **NumPy**, optimized for performance         |



---

**📦 How to Install Pandas**
---
        pip install pandas

**How to import**
---
        import pandas as pd
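
To confirm that the installation and import worked, one quick check is to print the installed version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string
version = pd.__version__
print(version)
```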

---
# Features of Pandas

- Import Data Sets (CSV, SQL, Excel, etc.)
- Data Cleaning
- Size Mutability (Add / Delete Rows & Columns)
- Reshaping & Pivot Table
- Efficient Manipulation & Extraction
- Statistical Analysis
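
Several of the features listed above can be sketched in a few lines. The `sales` table, its column names, and its values are hypothetical, invented only to illustrate size mutability, grouping, and pivoting.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})

# Size mutability: add a derived column, then drop it again
sales["revenue_k"] = sales["revenue"] / 1000
sales = sales.drop(columns="revenue_k")

# Statistical analysis: total and mean revenue per region
per_region = sales.groupby("region")["revenue"].agg(["sum", "mean"])

# Reshaping: one row per region, one column per quarter
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")

print(per_region)
print(pivot)
```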
  
---

# 🏁 Conclusion
---
Pandas is a powerful data manipulation and analysis library that allows:
- ✔ Easy data import
- ✔ Data cleaning and transformation
- ✔ Analytical & statistical operations
- ✔ Handling large datasets efficiently

In [9]:
import pandas as pd
import numpy as np

In [10]:
s = pd.Series([22, 33, 44], name="data")
df = s.to_frame()
df

   data
0    22
1    33
2    44


In [11]:
arr = np.arange(1, 16)   # integers 1 through 15
df1 = pd.Series(arr)     # a Series (1-D), despite the df1 name
print(df1)

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
dtype: int64


In [12]:
arr[2]

np.int64(3)

In [13]:
df1.iloc[11]

np.int64(12)

In [14]:
df1

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
dtype: int64

In [15]:
df1.iloc[1:12]

1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
dtype: int64
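
The slice in cell 15 stops before position 12 because `iloc` excludes the stop value, whereas label-based slicing with `loc` includes it. A minimal sketch of the difference, using a fresh Series (`values` is a name chosen here) with the same contents as `df1`:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1, 16))   # 1..15 with the default index 0..14

by_position = values.iloc[1:12]   # positions 1..11, stop is excluded
by_label = values.loc[1:12]       # labels 1..12, stop is included

print(len(by_position), len(by_label))
```

With the default integer index the two look alike, so the off-by-one difference is easy to miss.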

# 🧾 Main Data Structures in Pandas
1. Series (1-D labeled array)
---

In [16]:
import pandas as pd
s = pd.Series([10, 20, 30, 40])
print(s)


0    10
1    20
2    30
3    40
dtype: int64
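
"Labeled" means the index does not have to be 0, 1, 2, and so on. A small sketch with string labels; the labels `a` to `d` and the name `s_labeled` are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical labels 'a'..'d' replace the default 0..3 index
s_labeled = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s_labeled["b"]            # lookup by label
by_position = s_labeled.iloc[1]      # lookup by position, same element
kept = s_labeled[s_labeled > 15].index.tolist()   # labels survive filtering

print(by_label, by_position, kept)
```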


2. DataFrame (2-D table like Excel sheet)
---

In [17]:
data = {
    "Name": ["Aman", "Suyash", "Rahul"],
    "Age": [20, 21, 22]
}

df = pd.DataFrame(data)
print(df)


     Name  Age
0    Aman   20
1  Suyash   21
2   Rahul   22
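
A few attributes describe the shape and contents of a DataFrame without printing all of it. This sketch rebuilds the same table under the name `df_demo` (a name chosen here) so that it runs on its own.

```python
import pandas as pd

data = {"Name": ["Aman", "Suyash", "Rahul"], "Age": [20, 21, 22]}
df_demo = pd.DataFrame(data)

print(df_demo.shape)             # (rows, columns)
print(list(df_demo.columns))     # column labels
print(df_demo.dtypes)            # one dtype per column

age_column = df_demo["Age"]      # selecting one column gives a Series
print(type(age_column).__name__)
```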


# 🔧 Important Functions in Pandas
- Creating and Viewing Data
---

In [18]:
df.head()       # first 5 rows (default n=5)
df.tail()       # last 5 rows (default n=5)
df.info()       # prints column names, dtypes, and non-null counts
df.describe()   # summary statistics for numeric columns


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes


        Age
count   3.0
mean   21.0
std     1.0
min    20.0
25%    20.5
50%    21.0
75%    21.5
max    22.0
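
The `std` row reported by `describe()` is the sample standard deviation (dividing by n - 1), which is why the ages 20, 21, 22 give exactly 1.0. A minimal check against NumPy, using a standalone `ages` Series (a name chosen here):

```python
import numpy as np
import pandas as pd

ages = pd.Series([20, 21, 22], name="Age")

summary = ages.describe()
sample_std = np.std(ages.to_numpy(), ddof=1)       # divides by n - 1
population_std = np.std(ages.to_numpy(), ddof=0)   # divides by n

print(summary["std"], sample_std, population_std)
```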


In [19]:
# Selecting Columns & Rows
df['Name']            # select a column (returns a Series)
df.iloc[0]            # select a row by integer position
df.loc[1, 'Age']      # select a single value by row label and column label

np.int64(21)

In [20]:
#Filtering
df[df['Age'] > 20]

     Name  Age
1  Suyash   21
2   Rahul   22
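
Filters can be combined. The sketch below assumes the same three-row table, rebuilt as `people` (a name chosen here), and shows two common patterns: combining conditions with `&`, and testing membership with `isin()`.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Aman", "Suyash", "Rahul"], "Age": [20, 21, 22]})

# Each condition needs its own parentheses; use & / | rather than and / or
in_range = people[(people["Age"] > 20) & (people["Age"] < 22)]

# isin() tests membership in a list of values
named = people[people["Name"].isin(["Aman", "Rahul"])]

print(in_range["Name"].tolist())
print(named["Name"].tolist())
```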


In [21]:
#Sorting
df.sort_values('Age')

     Name  Age
0    Aman   20
1  Suyash   21
2   Rahul   22
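
Because the table is already in ascending age order, the output above looks unchanged. A sketch of two variations, on a standalone copy called `people` (a name chosen here): descending order, and sorting by a text column with the row numbers reset.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Aman", "Suyash", "Rahul"], "Age": [20, 21, 22]})

oldest_first = people.sort_values("Age", ascending=False)
by_name = people.sort_values("Name").reset_index(drop=True)  # renumber rows 0..n-1

print(oldest_first["Name"].tolist())
print(by_name["Name"].tolist())
```

`sort_values` returns a new DataFrame, so `people` keeps its original order.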


In [22]:
#Adding and Removing Columns
df['City'] = ['Delhi','Mumbai','Patna']   # add
df.drop('Age', axis=1, inplace=True)      # delete
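
The cell above modifies `df` in place. An alternative sketch that leaves the original untouched uses `assign()` and `drop(columns=...)`, both of which return new DataFrames; `people`, `with_city`, and `without_age` are names chosen here.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Aman", "Suyash", "Rahul"], "Age": [20, 21, 22]})

# assign() and drop() both return new DataFrames; `people` itself is unchanged
with_city = people.assign(City=["Delhi", "Mumbai", "Patna"])
without_age = with_city.drop(columns="Age")

print(list(people.columns))
print(list(without_age.columns))
```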

In [23]:
# Handling Missing Values
df.fillna(0)            # returns a copy with missing values replaced by 0
df.dropna()             # returns a copy without rows that contain missing values

     Name    City
0    Aman   Delhi
1  Suyash  Mumbai
2   Rahul   Patna
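
The table above has no missing values, so `fillna` and `dropna` leave it unchanged. The sketch below uses a hypothetical `scores` table with two gaps, invented for illustration, to show what the two methods actually do.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two gaps, invented for illustration
scores = pd.DataFrame({
    "Name": ["Aman", "Suyash", "Rahul"],
    "Score": [88.0, np.nan, 75.0],
    "City": ["Delhi", "Mumbai", None],
})

missing_per_column = scores.isna().sum()   # missing count per column
filled = scores.fillna({"Score": 0})       # fill only the Score column
complete = scores.dropna()                 # keep rows with no missing values

print(missing_per_column)
print(filled["Score"].tolist())
print(complete["Name"].tolist())
```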


In [24]:
#📂 File Read/Write
df.to_csv('output.csv', index=False)      # write first so the file exists
df_csv = pd.read_csv('output.csv')        # then read it back

# Excel I/O needs an engine such as openpyxl (pip install openpyxl)
df.to_excel('output.xlsx', index=False)
df_xlsx = pd.read_excel('output.xlsx')
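
Reading only works when the file is actually there. To try the CSV round trip without touching the disk, one option is an in-memory text buffer. This is a sketch under that assumption; `people`, `buffer`, and `restored` are names chosen here.

```python
import io
import pandas as pd

people = pd.DataFrame({"Name": ["Aman", "Suyash", "Rahul"],
                       "City": ["Delhi", "Mumbai", "Patna"]})

buffer = io.StringIO()
people.to_csv(buffer, index=False)   # index=False leaves the row labels out
buffer.seek(0)                       # rewind before reading
restored = pd.read_csv(buffer)

print(restored)
```

Replacing `buffer` with a path such as `'output.csv'` writes to and reads from a real file instead.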

In [25]:
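#🤝 Merging & Joining
left = pd.DataFrame({'id': [1, 2, 3], 'Name': ['Aman', 'Suyash', 'Rahul']})       # illustrative ids
right = pd.DataFrame({'id': [1, 2, 3], 'City': ['Delhi', 'Mumbai', 'Patna']})
pd.merge(left, right, on='id')   # inner join on the shared 'id' column

   id    Name    City
0   1    Aman   Delhi
1   2  Suyash  Mumbai
2   3   Rahul   Patna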
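
`pd.merge` needs two DataFrames that share a key column. The `how` parameter controls which rows survive when the keys do not match on both sides. A minimal sketch with hypothetical `students` and `cities` tables, invented for illustration:

```python
import pandas as pd

# Hypothetical tables sharing an 'id' key, invented for illustration
students = pd.DataFrame({"id": [1, 2, 3], "Name": ["Aman", "Suyash", "Rahul"]})
cities = pd.DataFrame({"id": [1, 2, 4], "City": ["Delhi", "Mumbai", "Patna"]})

inner_join = pd.merge(students, cities, on="id")              # ids in both tables
left_join = pd.merge(students, cities, on="id", how="left")   # every student row

print(inner_join)
print(left_join)
```

In the left join, the student with `id` 3 has no match in `cities`, so the `City` value for that row is missing.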

# 🏁 Summary
Pandas is used for:

- Importing and exporting datasets
- Cleaning and transforming data
- Analysing large datasets easily
- Preparing data for machine learning