# Day 3: Pandas Basics for Data Science

Welcome to Day 3 of my Data Science learning journey!  
Today, I focused on Pandas, one of the most powerful libraries in Python for data manipulation and analysis.  

## Topics Covered
1. Creating DataFrames
2. Reading & Writing CSV files
3. Selecting Rows & Columns
4. Filtering Data
5. Adding New Columns
6. Handling Missing Values
7. Sorting Data
8. Grouping Data

## Key Learning
- Pandas is built on top of **NumPy** and is mainly used for working with **tabular data**.
- A DataFrame is the Pandas table structure: labeled rows and columns.
- Its column-wise (vectorized) operations make **data cleaning, transformation, and analysis** concise and fast.
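
To make the NumPy connection concrete, here is a minimal sketch. The small `df_demo` frame is made up for illustration and is not part of the notebook cells that follow. It shows that a numeric column can be pulled out as a NumPy array and that arithmetic applies to the whole column at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df_demo = pd.DataFrame({"Age": [21, 22, 23, 24], "Marks": [85, 90, 78, 88]})

# A numeric column can be extracted as a NumPy array
ages = df_demo["Age"].to_numpy()
print(type(ages))

# Vectorized arithmetic: the division is applied to every row, no loop needed
marks_fraction = df_demo["Marks"] / 100
print(marks_fraction.tolist())

# NumPy functions accept Pandas columns directly
mean_age = np.mean(df_demo["Age"])
print(mean_age)
```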

## Practical Use Cases
- Cleaning messy datasets before applying ML models.
- Reading and analyzing CSV/Excel files in real projects.
- Summarizing large datasets with groupby and aggregation.
- Preparing structured data for visualization and reporting.





In [1]:
import pandas as pd
## 1. Creating DataFrame
data = {
    "Name": ["Sanjana", "Riya", "Aman", "Ravi"],
    "Age": [21, 22, 23, 24],
    "City": ["Delhi", "Mumbai", "Kolkata", "Chennai"]
}

df = pd.DataFrame(data)
print(df)

      Name  Age     City
0  Sanjana   21    Delhi
1     Riya   22   Mumbai
2     Aman   23  Kolkata
3     Ravi   24  Chennai


In [2]:
## 2. Reading & Writing CSV
# Write CSV
df.to_csv("data.csv", index=False)

# Read CSV
df_read = pd.read_csv("data.csv")
print(df_read)


      Name  Age     City
0  Sanjana   21    Delhi
1     Riya   22   Mumbai
2     Aman   23  Kolkata
3     Ravi   24  Chennai
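
The `index=False` argument in the cell above is easy to overlook, so here is a small sketch of what it changes. It uses an in-memory string instead of a file (via `io.StringIO`), and `df_demo` is a made-up two-row frame used only for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data, used only for illustration
df_demo = pd.DataFrame({"Name": ["Sanjana", "Riya"], "Age": [21, 22]})

# With no path, to_csv() returns the CSV text; by default it also writes
# the row index as an unnamed first column
csv_with_index = df_demo.to_csv()
csv_without_index = df_demo.to_csv(index=False)

# Reading the first version back produces an extra "Unnamed: 0" column
df_extra = pd.read_csv(io.StringIO(csv_with_index))
print(df_extra.columns.tolist())

# index_col=0 tells read_csv to treat that first column as the index
df_fixed = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
print(df_fixed.columns.tolist())

# Writing with index=False avoids the extra column in the first place
df_clean = pd.read_csv(io.StringIO(csv_without_index))
print(df_clean.columns.tolist())
```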


In [3]:
df

      Name  Age     City
0  Sanjana   21    Delhi
1     Riya   22   Mumbai
2     Aman   23  Kolkata
3     Ravi   24  Chennai


In [4]:
## 3. Selecting Rows & Columns

print("Names:\n", df["Name"])
print("\nName and Age:\n", df[["Name", "Age"]])
print("\nFirst Row:\n", df.iloc[0])



Names:
 0    Sanjana
1       Riya
2       Aman
3       Ravi
Name: Name, dtype: object

Name and Age:
       Name  Age
0  Sanjana   21
1     Riya   22
2     Aman   23
3     Ravi   24

First Row:
 Name    Sanjana
Age          21
City      Delhi
Name: 0, dtype: object
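
Besides `[]` and `iloc`, rows and columns can also be selected by label with `loc`. The sketch below rebuilds the same four-row data under the name `df_demo` (a name chosen here so the block runs on its own) and contrasts position-based and label-based slicing.

```python
import pandas as pd

# Same data as df above, rebuilt so this block is self-contained
df_demo = pd.DataFrame({
    "Name": ["Sanjana", "Riya", "Aman", "Ravi"],
    "Age": [21, 22, 23, 24],
    "City": ["Delhi", "Mumbai", "Kolkata", "Chennai"],
})

# iloc selects by integer position; the end of the slice is excluded
first_two = df_demo.iloc[0:2]

# loc selects by label; with the default index the labels are 0..3,
# and the end of a label slice is included
up_to_label_2 = df_demo.loc[0:2, ["Name", "City"]]

# A single cell: row label 1, column "City"
city = df_demo.loc[1, "City"]

print(len(first_two))      # 2 rows
print(len(up_to_label_2))  # 3 rows
print(city)                # Mumbai
```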


In [5]:
## 4. Filtering Data
data2 = {
    "Name": ["Sanjana", "Riya", "Aman", "Ravi", "Kiran"],
    "Age": [21, 22, 23, 24, 25],
    "City": ["Delhi", "Mumbai", "Kolkata", "Chennai", "Bangalore"],
    "Marks": [85, 90, 78, 88, 92]
}
df2 = pd.DataFrame(data2)

print(df2[df2["Marks"] > 85])
print(df2[df2["City"] == "Delhi"])


    Name  Age       City  Marks
1   Riya   22     Mumbai     90
3   Ravi   24    Chennai     88
4  Kiran   25  Bangalore     92
      Name  Age   City  Marks
0  Sanjana   21  Delhi     85
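
Topic 5 in the list, adding new columns, has no cell of its own in this notebook, so here is a minimal sketch of how it might look. The frame `df_demo` reuses the names and marks from `df2`; the column names `Percentage`, `Passed`, and `MarksWithBonus`, the cutoff of 80, and the bonus of 5 are all made up for illustration.

```python
import pandas as pd

# Names and marks copied from df2 so this block is self-contained
df_demo = pd.DataFrame({
    "Name": ["Sanjana", "Riya", "Aman", "Ravi", "Kiran"],
    "Marks": [85, 90, 78, 88, 92],
})

# New column computed from an existing one (vectorized arithmetic)
df_demo["Percentage"] = df_demo["Marks"] / 100

# New boolean column from a condition; the cutoff of 80 is arbitrary
df_demo["Passed"] = df_demo["Marks"] >= 80

# assign() returns a new DataFrame and leaves the original unchanged
df_bonus = df_demo.assign(MarksWithBonus=df_demo["Marks"] + 5)

print(df_demo)
print(df_bonus["MarksWithBonus"].tolist())
```

Direct assignment changes the DataFrame in place, while `assign()` is handy when the original should stay untouched.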


In [7]:
## 6. Handling Missing Values
data3 = {
    "Name": ["A", "B", "C", "D"],
    "Age": [20, None, 22, None],
    "City": ["Delhi", "Mumbai", None, "Kolkata"]
}
df3 = pd.DataFrame(data3)
print("Original:\n", df3)

print("\nFilled Missing:\n", df3.fillna({"Age": 0, "City": "Unknown"}))

print("\nDropped Missing:\n", df3.dropna())

Original:
   Name   Age     City
0    A  20.0    Delhi
1    B   NaN   Mumbai
2    C  22.0     None
3    D   NaN  Kolkata

Filled Missing:
   Name   Age     City
0    A  20.0    Delhi
1    B   0.0   Mumbai
2    C  22.0  Unknown
3    D   0.0  Kolkata

Dropped Missing:
   Name   Age   City
0    A  20.0  Delhi
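
Filling every gap with a constant or dropping every incomplete row are the two extremes. The sketch below shows a few in-between options on the same data, rebuilt as `df_demo` so the block runs on its own. Filling with the column mean is just one possible strategy, not a recommendation for every dataset.

```python
import pandas as pd

# Same data as df3 above, rebuilt so this block is self-contained
df_demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [20, None, 22, None],
    "City": ["Delhi", "Mumbai", None, "Kolkata"],
})

# Count missing values per column before deciding what to do
missing_counts = df_demo.isna().sum()
print(missing_counts)

# Fill missing ages with the column mean (mean() skips NaN by default)
age_filled = df_demo["Age"].fillna(df_demo["Age"].mean())
print(age_filled.tolist())

# Drop rows only where "City" is missing; rows with a missing "Age" stay
has_city = df_demo.dropna(subset=["City"])
print(has_city["Name"].tolist())
```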


In [8]:
## 8. Grouping Data
print(df2.groupby("City")["Marks"].mean())

City
Bangalore    92.0
Chennai      88.0
Delhi        85.0
Kolkata      78.0
Mumbai       90.0
Name: Marks, dtype: float64
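
In `df2` every city appears only once, so each group mean is just that student's mark. The sketch below uses made-up data with repeated cities (`df_demo` is hypothetical) to show what grouping does when a group has several rows, and how `agg` can compute more than one statistic at a time.

```python
import pandas as pd

# Hypothetical data with repeated cities, used only for illustration
df_demo = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi"],
    "Marks": [85, 90, 75, 80, 95],
})

# One row per city, one column per aggregation
summary = df_demo.groupby("City")["Marks"].agg(["mean", "max", "count"])
print(summary)
```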


In [9]:

## 7. Sorting Data
print(df2.sort_values(by="Marks", ascending=False))


      Name  Age       City  Marks
4    Kiran   25  Bangalore     92
1     Riya   22     Mumbai     90
3     Ravi   24    Chennai     88
0  Sanjana   21      Delhi     85
2     Aman   23    Kolkata     78
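
Sorting keeps the original row labels, which is why the index above reads 4, 1, 3, 0, 2. The sketch below sorts by two columns and then renumbers the rows. The data in `df_demo` is made up for illustration; the cities are changed from `df2` so that the second sort key has an effect.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df_demo = pd.DataFrame({
    "Name": ["Sanjana", "Riya", "Aman", "Ravi"],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Marks": [85, 90, 78, 88],
})

# Sort by City (A to Z), then by Marks (highest first) within each city
ranked = df_demo.sort_values(by=["City", "Marks"], ascending=[True, False])

# reset_index(drop=True) renumbers rows 0..n-1 and discards the old labels
ranked = ranked.reset_index(drop=True)
print(ranked)
```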


In [1]:
import os
# Print the current working directory, where relative paths such as "data.csv" are resolved
print(os.getcwd())

C:\Users\user
