# 1. Why Learn Pandas?
## What is it?

Pandas is a Python library for data manipulation and analysis. Its two core data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

## Why do we need it?

1. Simplifies data cleaning, exploration, and manipulation.

2. Handles large in-memory datasets efficiently.

3. Built on top of NumPy for fast computations.
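
To make the NumPy point concrete, here is a minimal sketch of a vectorized operation. The `prices` values are made up for illustration and are not part of the tutorial data.

```python
import numpy as np
import pandas as pd

# Hypothetical example values, not taken from the tutorial's dataset
prices = pd.Series([10.0, 20.0, 30.0])

# Arithmetic is applied to every element at once, with no explicit Python loop
scaled = prices * 1.5

# The underlying values can be retrieved as a NumPy array
values = prices.to_numpy()

print(scaled.tolist())
print(type(values).__name__)
```

The multiplication runs over the whole Series in one call, which is typically much faster than looping over the elements in Python.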

## Where is it used?
1. Data Analysis & Reporting

2. Machine Learning (preprocessing datasets)

3. Finance, Healthcare, Marketing data analysis

4. CSV/Excel/SQL data handling

## 2. Importing Pandas

Pandas is conventionally imported under the alias `pd`:


In [1]:
import pandas as pd

## 3. Series & DataFrames

In [3]:
# Series from a list
s = pd.Series([10, 20, 30, 40]) 
print("Series:\n", s) 
# DataFrame from dict 
data = {'Name': ['Alice','Bob','Charlie'], 'Age':[25,30,35]} 
df = pd.DataFrame(data) 
print("\nDataFrame:\n", df)

Series:
 0    10
1    20
2    30
3    40
dtype: int64

DataFrame:
       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
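
The two structures are closely related: each column of a DataFrame is itself a Series. The sketch below illustrates this. The names `df_people` and `s_labeled` are introduced here purely for illustration.

```python
import pandas as pd

df_people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Selecting one column returns a Series that shares the DataFrame's index
ages = df_people['Age']
print(type(ages).__name__)
print(ages.index.tolist())

# Each column has its own dtype
print(df_people.dtypes)

# A Series can also carry custom labels instead of the default 0, 1, 2, ...
s_labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s_labeled['b'])
```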


## 4. Reading & Exploring Data

In [8]:
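# Read CSV (replace with your file path)
df = pd.read_csv(r"C:\Users\sumit\Downloads\archive\titanic.csv")

# Explore
print(df.head())      # first 5 rows
print(df.tail())      # last 5 rows
df.info()             # prints dtypes and non-null counts itself; it returns None, so no print()
print(df.describe())  # summary statistics for numeric columns
print(df.shape)       # (rows, columns)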

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  
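
If the Titanic file is not available, the same exploration calls can be tried on a small in-memory CSV. This is a sketch under that assumption: the three-row `csv_text` sample is made up, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for a CSV file on disk
csv_text = """Name,Sex,Age
Alice,female,25
Bob,male,
Charlie,male,35
"""

# read_csv accepts a file-like object as well as a path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                 # (rows, columns)
print(sample.columns.tolist())      # column names taken from the header line
print(sample['Age'].isna().sum())   # the empty Age field is parsed as NaN
sample.info()                       # prints its summary directly
```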
  

## 5. Selecting Columns & Rows

In [9]:
# Select column
print(df['Name'])
 
# Select multiple columns
print(df[['Name','Age']])
 
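# Select a single row
print(df.iloc[0])       # first row by integer position
print(df.loc[0])        # row with index label 0 (the same row here, given the default index)
 
# Select a subset of rows and columns (loc slices include the end label, so this returns rows 0 and 1)
print(df.loc[0:1, ['Name','Sex']])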

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
                                                  Name   Age
0                              Braund, Mr. Owen Harris  22.0
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0
2                               Heikkinen, Miss. Laina  26.0
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0
4                            
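
With the default index, `iloc[0]` and `loc[0]` return the same row, which can hide the difference between them. The following sketch uses a made-up DataFrame (`df_demo`, with hypothetical index labels 10, 20, 30) to show position-based and label-based selection diverging.

```python
import pandas as pd

# Hypothetical data with non-default integer labels
df_demo = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],
)

first_by_position = df_demo.iloc[0]   # position 0 is the row labeled 10
row_by_label = df_demo.loc[20]        # label 20 is Bob's row

# loc slices include both end labels; iloc slices exclude the stop position
loc_slice = df_demo.loc[10:20]
iloc_slice = df_demo.iloc[0:1]

print(first_by_position['Name'], row_by_label['Name'])
print(len(loc_slice), len(iloc_slice))
```

Here `df_demo.loc[0]` would raise a `KeyError`, because no row carries the label 0.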

## 6. Filtering & Sorting

In [12]:
# Filter rows where Age is greater than 28
print(df[df['Age'] > 28])
 
# Sort by Age, oldest first (rows with a missing Age are placed last)
print(df.sort_values('Age', ascending=False))

     PassengerId  Survived  Pclass  \
1              2         1       1   
3              4         1       1   
4              5         0       3   
6              7         0       1   
11            12         1       1   
..           ...       ...     ...   
873          874         0       3   
879          880         1       1   
881          882         0       3   
885          886         0       3   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
6                              McCarthy, Mr. Timothy J    male  54.0      0   
11                            Bonnell, Miss. Elizabeth  female  58.0      0   
..                                                 ...     ...   ... 
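
Filters can be combined, and sorting can use more than one column. The sketch below assumes a small made-up DataFrame (`df_demo`) rather than the Titanic data.

```python
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Sex': ['female', 'male', 'male', 'female'],
    'Age': [25, 30, 35, 40],
})

# Wrap each condition in parentheses; use & for "and", | for "or"
older_women = df_demo[(df_demo['Age'] > 28) & (df_demo['Sex'] == 'female')]

# Sort by several columns, with one direction per column
ordered = df_demo.sort_values(['Sex', 'Age'], ascending=[True, False])

print(older_women['Name'].tolist())
print(ordered['Name'].tolist())
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the bitwise operators are used here.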

## 7. Adding & Removing Columns

In [13]:
# Add new column
df['Age_in_5yrs'] = df['Age'] + 5
print(df)
 
# Drop column
df = df.drop('Age_in_5yrs', axis=1)
print(df)
 

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                               Heikkinen, Miss. Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ... 
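
An alternative way to add and remove columns, sketched on a made-up two-row DataFrame (`df_demo`), is to use `assign` and `drop(columns=...)`. Both return new DataFrames instead of modifying the original.

```python
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# assign returns a new DataFrame and leaves df_demo unchanged
with_future_age = df_demo.assign(Age_in_5yrs=df_demo['Age'] + 5)

# columns=... is equivalent to passing the labels with axis=1
trimmed = with_future_age.drop(columns=['Age_in_5yrs'])

print(with_future_age.columns.tolist())
print(trimmed.columns.tolist())
```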

## 8. GroupBy & Aggregation

In [14]:

data = {'Name':['Alice','Bob','Charlie','Alice','Bob'],
        'Score':[85,90,95,80,70]}
df = pd.DataFrame(data)
 
# Group by Name and calculate the mean score (selecting 'Score' explicitly keeps non-numeric columns out of the mean)
grouped = df.groupby('Name')[['Score']].mean()
print(grouped)

         Score
Name          
Alice     82.5
Bob       80.0
Charlie   95.0
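
Several aggregations can be computed in one pass with `agg`. This sketch reuses the same names and scores as the cell above; the variable names `df_scores` and `summary` are introduced here for illustration.

```python
import pandas as pd

df_scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
    'Score': [85, 90, 95, 80, 70],
})

# One row per Name, one column per aggregation
summary = df_scores.groupby('Name')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```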


## 9. Handling Missing Data

In [15]:
data = {'Name':['Alice','Bob','Charlie','David'],
        'Age':[25, None, 35, 40]}
df = pd.DataFrame(data)
 
# Fill the missing Age with the mean of the known ages
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
 
# Alternative: drop rows with missing values instead of filling them
# df = df.dropna()

      Name        Age
0    Alice  25.000000
1      Bob  33.333333
2  Charlie  35.000000
3    David  40.000000
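
Before filling or dropping, it often helps to count how many values are missing. The sketch below rebuilds the same four-row data under a new name (`df_ages`, chosen here for illustration) and shows the count, `dropna`, and a median fill as one possible alternative to the mean.

```python
import pandas as pd

df_ages = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
})

# Count missing values in each column
missing_per_column = df_ages.isna().sum()

# Keep only rows with no missing values
complete_rows = df_ages.dropna()

# Fill with the median instead of the mean (less sensitive to outliers)
median_filled = df_ages['Age'].fillna(df_ages['Age'].median())

print(missing_per_column)
print(complete_rows)
print(median_filled.tolist())
```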
