<a href="https://colab.research.google.com/github/sh1vam31/DataScience_Sheryians/blob/main/Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Class 1: Introduction & Setup
Welcome to your first class on **Pandas**. In this session, we'll learn what Pandas is, why we use it, and how to get started with Series and DataFrames.

## 1 What is Pandas and Why Use It?
Pandas is a powerful **Python library** for data analysis and manipulation.

Key points:
- Provides two main data structures: **Series** (1D) and **DataFrame** (2D).
- Makes working with tabular data easy and fast.
- Ideal for cleaning, transforming, and analyzing datasets.
- Works well with other libraries like NumPy, Matplotlib, and Scikit-Learn.

## 2 Installing & Importing Pandas
To install pandas, open your terminal or command prompt and run:
```bash
pip install pandas
```
Then, you can import it in Python as follows:

In [None]:
import numpy as np
import pandas as pd
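
A quick way to confirm that the installation worked is to print the installed version. This is only a small sketch; the version string you see depends on your environment, and the variable name `version` is just for illustration.

```python
import pandas as pd

# Read the installed pandas version (the exact value depends on your setup)
version = pd.__version__
print("pandas version:", version)
```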

## 3 Understanding Series & DataFrame
- **Series**: One-dimensional labeled array, like a column in Excel.
- **DataFrame**: Two-dimensional table, similar to an Excel sheet or SQL table.
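
The relationship between the two structures can be sketched as follows: selecting a single column of a DataFrame gives back a Series. The sample data and the names `table` and `column` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
table = pd.DataFrame({"Name": ["Alice", "Bob"], "Marks": [90, 85]})

# Selecting one column returns a Series
column = table["Marks"]

print(type(table).__name__)     # DataFrame
print(type(column).__name__)    # Series
print(table.ndim, column.ndim)  # 2 1
```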

### Creating a Series from Python Lists or Dictionaries

In [None]:
import pandas as pd

# Creating a Series from a Python list
a = [90, 85, 78, 92]
b = pd.Series(a, name="Marks")
print("Series from list:")
print(b)

# Creating a Series from a Python dictionary
c = {"Alice": 90, "Bob": 85, "Charlie": 78, "David": 92}
d = pd.Series(c, name="Marks")
print("\nSeries from dictionary:")
print(d)

Series from list:
0    90
1    85
2    78
3    92
Name: Marks, dtype: int64

Series from dictionary:
Alice      90
Bob        85
Charlie    78
David      92
Name: Marks, dtype: int64
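
As a small follow-up sketch, a list can also be given custom labels through the `index` argument, and values can then be read by label or by position. The name `marks` is illustrative only.

```python
import pandas as pd

# Build a Series from a list and attach custom labels with index=
marks = pd.Series(
    [90, 85, 78, 92],
    index=["Alice", "Bob", "Charlie", "David"],
    name="Marks",
)

# Access by label
print(marks["Bob"])   # 85

# Access by integer position
print(marks.iloc[0])  # 90
```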


### Creating a DataFrame

In [None]:
import pandas as pd

# Create a dictionary with some data
a = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Marks": [90, 85, 78, 92],
    "Subject": ["Math", "Science", "History", "English"]
}

# Create a DataFrame from the dictionary
b = pd.DataFrame(a)

print(b)


      Name  Marks  Subject
0    Alice     90     Math
1      Bob     85  Science
2  Charlie     78  History
3    David     92  English


### Creating DataFrame from NumPy Arrays

In [None]:
import pandas as pd
import numpy as np

# Create a 2D NumPy array
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Create a DataFrame from the array
b = pd.DataFrame(a, columns=["A", "B", "C"])  # columns -> lets you choose your own column names

print(b)


   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


### Reading CSV / Excel Files (Sneak Peek)
You can read external data using:
```python
pd.read_csv('file.csv')
```
```python
pd.read_excel('file.xlsx')
```
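
To try `read_csv` without a file on disk, here is a minimal sketch that reads CSV text from memory. The CSV content and the names `csv_text` and `df_csv` are made up for illustration; with a real file you would pass its path instead. Note that `read_excel` typically needs an extra Excel engine installed (for example `openpyxl` for `.xlsx` files).

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the contents of 'file.csv'
csv_text = "Name,Marks\nAlice,90\nBob,85\n"

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```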


# Class 2: DataFrame Basics
Welcome to **Class 2**. Today we’ll learn how to **inspect**, **select**, and **organize** data inside a DataFrame.

## 1 Inspecting Data
Quick ways to understand your dataset:
- `head(n)` → first *n* rows (default 5)
- `tail(n)` → last *n* rows
- `info()` → columns, non-null counts, and dtypes
- `shape` → (rows, columns)
- `dtypes` → data type of each column

In [2]:
import pandas as pd

# Create a small DataFrame
a = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "Score": [88, 92, 79, 85, 95]
}

b = pd.DataFrame(a)

# 1. See the first rows
print("First 3 rows:")
print(b.head(3))

# 2. See the last rows
print("\nLast 2 rows:")
print(b.tail(2))

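# 3. Get a concise summary
# Note: info() prints its report itself and returns None,
# so wrapping it in print() also prints "None" at the end
print("\nInfo about DataFrame:")
print(b.info())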

# 4. Shape of the DataFrame (rows, columns)
print("\nShape:")
print(b.shape)

# 5. Data types of each column
print("\nData types:")
print(b.dtypes)



First 3 rows:
      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     79

Last 2 rows:
    Name  Age  Score
3  David   40     85
4   Emma   22     95

Info about DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   Score   5 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 252.0+ bytes
None

Shape:
(5, 3)

Data types:
Name     object
Age       int64
Score     int64
dtype: object


## 2 Selecting Columns & Rows
Ways to access parts of a DataFrame:
- Single column → `df['col']`
- Multiple columns → `df[['col1','col2']]`
- Label-based selection → `df.loc[row_label, col_label]`
- Integer-position selection → `df.iloc[row_idx, col_idx]`

Tip: `:` means "all rows/columns" in that dimension.

In [3]:

# 1. Single column
print("Single column (Name):")
print(b["Name"])

# 2. Multiple columns
print("\nMultiple columns (Name & Score):")
print(b[["Name", "Score"]])

# 3. Label-based selection with .loc (slices include the end label)
print("\nLabel-based selection (row index 2, column 'Score'):")
print(b.loc[2, "Score"])

print("\nRows 1 to 3 with columns Name & Age:")
print(b.loc[1:3, ["Name", "Age"]])

# 4. Integer-index selection with .iloc
print("\nRow at index 3 (all columns):")
print(b.iloc[3])

print("\nRows 0 to 2, columns 0 to 1:")
print(b.iloc[0:3, 0:2])

Single column (Name):
0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object

Multiple columns (Name & Score):
      Name  Score
0    Alice     88
1      Bob     92
2  Charlie     79
3    David     85
4     Emma     95

Label-based selection (row index 2, column 'Score'):
79

Rows 1 to 3 with columns Name & Age:
      Name  Age
1      Bob   30
2  Charlie   35
3    David   40

Row at index 3 (all columns):
Name     David
Age         40
Score       85
Name: 3, dtype: object

Rows 0 to 2, columns 0 to 1:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
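
The difference between `.loc` and `.iloc` is easier to see when the index is not the default 0, 1, 2, ... numbering. The sketch below uses a hypothetical DataFrame (`scores`) with names as the index: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position.

```python
import pandas as pd

# Hypothetical data with names as row labels
scores = pd.DataFrame(
    {"Age": [25, 30, 35, 40], "Score": [88, 92, 79, 85]},
    index=["Alice", "Bob", "Charlie", "David"],
)

# .loc slices by label and includes the end label ("David")
by_label = scores.loc["Bob":"David"]

# .iloc slices by position and excludes the end position (3)
by_position = scores.iloc[1:3]

print(len(by_label))     # 3
print(len(by_position))  # 2
```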


## 3 Index & Columns Overview
- `df.index` → row labels (names or numbers)
- `df.columns` → column names
- `df.values` → the data as a NumPy array (treat it as read-only; prefer DataFrame operations)

In [4]:

# Row labels (index)
print("Index:")
print(b.index)

# Column names
print("\nColumns:")
print(b.columns)

# Underlying NumPy array
print("\nValues:")
print(b.values)

# Optional: shape of the underlying array
print("\nShape of values:")
print(b.values.shape)


Index:
RangeIndex(start=0, stop=5, step=1)

Columns:
Index(['Name', 'Age', 'Score'], dtype='object')

Values:
[['Alice' 25 88]
 ['Bob' 30 92]
 ['Charlie' 35 79]
 ['David' 40 85]
 ['Emma' 22 95]]

Shape of values:
(5, 3)
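
One detail worth noticing in the output above: because the columns mix text and numbers, the resulting NumPy array has to use the generic `object` dtype. The sketch below illustrates this with `to_numpy()`, which returns the same kind of array as `.values`; the names `mixed` and `numeric` are illustrative.

```python
import pandas as pd

# Hypothetical DataFrame with one text column and one numeric column
mixed = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Keep only the numeric column (double brackets keep it a DataFrame)
numeric = mixed[["Age"]]

# Mixed columns fall back to the generic object dtype
print(mixed.to_numpy().dtype)

# A purely numeric DataFrame keeps a numeric dtype
print(numeric.to_numpy().dtype)
```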


## 4 Renaming Columns & Setting an Index
- Rename columns → `df.rename(columns={'old':'new'}, inplace=False)`. With `inplace=True`, the change is made to the original DataFrame.
- Set a column as index → `df.set_index('column_name', inplace=False)`. With `inplace=False` (the default), the change is made only in the returned copy.
- Reset index → `df.reset_index(inplace=False)`. This moves the index back into a regular column and restores the default 0, 1, 2, ... numbering, which is useful if you set an index by mistake.


In [None]:

# 1. Rename columns
c = b.rename(columns={"Score": "Marks"})
print("After renaming 'Score' to 'Marks':")
print(c)

# 2. Set a column as index
d = c.set_index("Name")
print("\nAfter setting 'Name' as index:")
print(d)

# 3. Reset the index back to default
e = d.reset_index()
print("\nAfter resetting the index:")
print(e)
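
The effect of `inplace` can be sketched as below. The DataFrame and the names `original`, `renamed`, `working`, and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical starting data
original = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [88, 92]})

# Default (inplace=False): a new DataFrame is returned, the original is untouched
renamed = original.rename(columns={"Score": "Marks"})
print(list(original.columns))  # ['Name', 'Score']
print(list(renamed.columns))   # ['Name', 'Marks']

# inplace=True: the object itself is modified and the call returns None
working = original.copy()
result = working.rename(columns={"Score": "Marks"}, inplace=True)
print(result)                  # None
print(list(working.columns))   # ['Name', 'Marks']
```

Assigning the returned DataFrame to a new variable, as the class example does, keeps the original data available if something goes wrong.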

## 5 Basic Attributes Recap
- `df.columns` → names of all columns
- `df.index` → row labels
- `df.values` → NumPy array (data only)

These are handy for quick checks and loops (though vectorized ops are preferred).
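
As a rough sketch of that last point: looping over `df.columns` is fine for quick checks, while calculations on the data itself are usually written as a single vectorized expression. The DataFrame `people` and the `Boosted` column are hypothetical.

```python
import pandas as pd

# Hypothetical numeric data
people = pd.DataFrame({"Age": [25, 30, 35], "Score": [88, 92, 79]})

# Quick check: loop over the column names and print each column's maximum
for col in people.columns:
    print(col, people[col].max())

# Vectorized operation: add 5 to every score in one expression, no loop needed
people["Boosted"] = people["Score"] + 5
print(people["Boosted"].tolist())  # [93, 97, 84]
```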

## Mini Practice
1. Create a DataFrame with at least 5 rows and 4 columns (mix of numeric + text).
2. Show the first 3 rows, last 2 rows, and the info.
3. Select a sub-DataFrame with only 2 columns.
4. Rename one column and set any column as index, then reset it back.
5. Print `df.shape`, `df.dtypes`, and `df.columns`.

In [None]:
import pandas as pd

# Task 1: Create the DataFrame (5 rows, 4 columns, numeric + text)
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 22],
    "City": ["Delhi", "Mumbai", "Pune", "Bangalore", "Chennai"],
    "Score": [85, 91, 78, 88, 95]
}

df = pd.DataFrame(data)

# Task 2: Show the first 3 rows, last 2 rows, and info
print("First 3 rows:")
print(df.head(3))

print("\nLast 2 rows:")
print(df.tail(2))

print("\nInfo about DataFrame:")
df.info()  # info() prints its report directly and returns None

# Task 3: Select a sub-DataFrame with only 2 columns (Name and Score)
sub_df = df[["Name", "Score"]]
print("\nSub-DataFrame with Name and Score:")
print(sub_df)

# Task 4: Rename one column and set any column as index, then reset it back
df_renamed = df.rename(columns={"Score": "Marks"})
print("\nAfter renaming Score to Marks:")
print(df_renamed)

df_indexed = df_renamed.set_index("Name")
print("\nAfter setting Name as index:")
print(df_indexed)

df_reset = df_indexed.reset_index()
print("\nAfter resetting the index:")
print(df_reset)

# Task 5: Print df.shape, df.dtypes, and df.columns
print("\nShape of original DataFrame:")
print(df.shape)

print("\nData types of each column:")
print(df.dtypes)

print("\nColumn names:")
print(df.columns)
