 # Pandas Tutorial

 This is a beginner-friendly tutorial on **pandas**, a popular Python library for data manipulation and analysis.

 We will cover:

 - Creating and loading data into a pandas `DataFrame`.

 - Basic indexing, merging, grouping, and computing statistics.

 - Modifying data with `.loc` and `.iloc`, and summarizing it with functions like `value_counts()`.



 ## 1. Installation and Import



 Install pandas (if not already installed):


In [None]:
!pip install pandas


 Import pandas in Python:

In [None]:
import pandas as pd


 ## 2. Creating DataFrames



 A **DataFrame** is the core data structure in pandas. Think of it as a table with rows and columns. You can create one from dictionaries, lists, or files such as CSV and Excel.



 ### 2.1. From a Dictionary of Lists

In [None]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
print(df)


 ### 2.2. From a List of Dictionaries

In [None]:
data_list = [
    {"Name": "Alice",   "Age": 25, "City": "New York"},
    {"Name": "Bob",     "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}
]
df2 = pd.DataFrame(data_list)
print(df2)
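
 A list of dictionaries does not need identical keys in every record. The sketch below uses made-up records (the names `records` and `df_missing` are illustrative only) to show that a missing key becomes a missing value in the resulting column:

```python
import pandas as pd

# Hypothetical records; the second one has no "City" key.
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30},
]
df_missing = pd.DataFrame(records)

print(df_missing)
print(df_missing["City"].isna())  # True for the record without a City
```

 The columns are the union of all keys, so it is worth checking for missing values after building a DataFrame this way.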


 ### 2.3. From CSV or Excel



 Pandas makes it easy to read data from common file types:


In [None]:
df_csv = pd.read_csv("my_data.csv")       # from CSV
df_excel = pd.read_excel("my_data.xlsx")  # from Excel (.xlsx files need the openpyxl package)

# Replace "my_data.csv" with your actual file path or URL.
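
 If you have no CSV file at hand, you can still try `read_csv` on text held in memory. This is a minimal sketch; the CSV content and the names `csv_text` and `df_demo` are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so no file on disk is needed.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file-like object as well as a path or URL.
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)   # (3, 3)
print(df_demo.dtypes)  # Age is parsed as an integer column
```

 The first line of the text is used as the header row, which is the default behavior of `read_csv`.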

 ## 3. Basic Data Inspection



 After creating or loading a DataFrame, you’ll often want to inspect it:

In [None]:
print(df.head())       # First 5 rows (use df.head(10) for first 10)
print(df.tail())       # Last 5 rows
print(df.shape)        # (rows, columns)
print(df.columns)      # Column labels (an Index object)
df.info()              # Prints a summary itself (dtypes, non-null counts) and returns None
print(df.describe())   # Basic statistics for numeric columns


 ## 4. Selecting and Indexing Data



 Pandas offers multiple ways to select or filter data within a DataFrame.



 ### 4.1. Dot Notation / Bracket Notation

In [None]:
# Dot notation (for simple column names without spaces/special chars)
print(df.Age)

# Bracket notation
print(df["Age"])


 ### 4.2. Row Selection with `.loc` and `.iloc`



 - **`.loc`** selects rows and columns by **label**.

 - **`.iloc`** selects rows and columns by **integer position**.

In [None]:
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 28],
    "City": ["NY", "LA", "Chicago", "Seattle"]
}, index=["row1", "row2", "row3", "row4"])  # custom index labels

# Using .loc (label-based)
print(df.loc["row2"])            # Entire row labeled 'row2'
print(df.loc["row2", "Age"])     # Specific cell (row2, Age)
print(df.loc["row1":"row3"])     # Slice rows by label (end label is included)
print(df.loc[:, ["Name", "City"]]) # All rows, only these columns

# Using .iloc (integer-based)
print(df.iloc[1])                # 2nd row (positions start at 0)
print(df.iloc[1, 1])             # Cell at row position 1, column position 1
print(df.iloc[0:2])              # Rows at positions 0 and 1 (end position is excluded)
print(df.iloc[:, [0, 2]])        # All rows, columns 0 and 2
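
 One difference between the two indexers is easy to miss: a `.loc` slice includes its end label, while an `.iloc` slice stops before its end position, like a regular Python slice. A small sketch, using a separate illustrative DataFrame named `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "Dave"], "Age": [25, 30, 35, 28]},
    index=["row1", "row2", "row3", "row4"],
)

by_label = df_demo.loc["row1":"row3"]  # row1, row2, row3 (end label included)
by_position = df_demo.iloc[0:3]        # positions 0, 1, 2 (stops before 3)
shorter = df_demo.iloc[0:2]            # positions 0, 1 only

print(len(by_label))     # 3
print(len(by_position))  # 3
print(len(shorter))      # 2
```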


 ## 5. Filtering Rows



 ### Boolean Masking

 You can create a **boolean condition** that returns `True/False` for each row, then use that mask to filter the DataFrame.

In [None]:
# Show only rows where Age > 28
mask = df["Age"] > 28
older_than_28 = df[mask]
print(older_than_28)


 ### Multiple Conditions

 Combine conditions with the element-wise operators `&` (AND), `|` (OR), and `~` (NOT), wrapping each condition in parentheses:

In [None]:
# People aged 25 or older AND living in NY
df_filtered = df[(df["Age"] >= 25) & (df["City"] == "NY")]
print(df_filtered)
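
 The other two operators work the same way. The sketch below rebuilds a small DataFrame (named `df_demo` here for illustration) and shows `|`, `~`, and `isin()`. The parentheses around each comparison matter, because `&` and `|` bind more tightly than `>` or `==` in Python:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 28],
    "City": ["NY", "LA", "Chicago", "Seattle"],
})

# OR: younger than 28, or living in Chicago
young_or_chicago = df_demo[(df_demo["Age"] < 28) | (df_demo["City"] == "Chicago")]

# NOT: everyone who does not live in LA
not_la = df_demo[~(df_demo["City"] == "LA")]

# isin() is shorthand for several == comparisons joined with |
ny_or_la = df_demo[df_demo["City"].isin(["NY", "LA"])]

print(young_or_chicago)
print(not_la)
print(ny_or_la)
```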


 ## 6. Changing Values



 ### 6.1. Assigning with `.loc`

In [None]:
df.loc["row1", "Age"] = 26
print(df)


 ### 6.2. Assigning with `.iloc`

In [None]:
df.iloc[0, 1] = 27  # row position 0, column position 1 (the "Age" column)
print(df)


 ### 6.3. Vectorized Assignments

In [None]:
# Increase everyone's Age by 1
df["Age"] = df["Age"] + 1
print(df)
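
 Assignment with `.loc` also accepts a boolean mask, which lets you change only the rows that meet a condition. This is a sketch on a separate DataFrame (`df_demo`); the value `"Remote"` and the column `AgeInMonths` are made up for illustration:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 28],
    "City": ["NY", "LA", "Chicago", "Seattle"],
})

# Update one column only in the rows where the mask is True.
over_28 = df_demo["Age"] > 28
df_demo.loc[over_28, "City"] = "Remote"  # placeholder value for the example

# Derive a new column from an existing one in a single vectorized step.
df_demo["AgeInMonths"] = df_demo["Age"] * 12

print(df_demo)
```

 Passing the mask and the column name to `.loc` together changes the original DataFrame in a single step.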


 ## 7. Calculating Simple Statistics and Value Counts



 ### 7.1. Simple Statistics

In [None]:
print(df["Age"].mean())  # Average age
print(df["Age"].max())   # Max age
print(df["Age"].min())   # Min age


 ### 7.2. `value_counts()`

In [None]:
city_counts = df["City"].value_counts()
print(city_counts)
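
 `value_counts()` is more informative when values repeat. The sketch below uses a made-up Series of cities and also shows the `normalize=True` option, which returns proportions instead of counts:

```python
import pandas as pd

cities = pd.Series(["NY", "LA", "NY", "Chicago", "NY", "LA"])

counts = cities.value_counts()                # sorted by count, largest first
shares = cities.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```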


 ## 8. Grouping and Aggregation



 `.groupby()` allows you to split data into groups based on some criteria, apply functions to each group, and combine results.

In [None]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 28],
    "City": ["NY", "LA", "NY", "LA"],
    "Salary": [70000, 80000, 120000, 95000]
}
df = pd.DataFrame(data)

# Group by 'City' and calculate mean Salary
grouped = df.groupby("City")["Salary"].mean()
print(grouped)
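
 To compute several statistics per group at once, one option is `.agg()` with named aggregations. This is a sketch on the same kind of data; the output column names `mean_salary`, `max_salary`, and `headcount` are chosen here for illustration:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "City": ["NY", "LA", "NY", "LA"],
    "Salary": [70000, 80000, 120000, 95000],
})

# Named aggregation: each keyword becomes a column in the result,
# given as (source column, aggregation function).
summary = df_demo.groupby("City").agg(
    mean_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
    headcount=("Name", "count"),
)

print(summary)
```

 The result is a DataFrame indexed by `City`, with one row per group.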


 ## 9. Merging / Joining DataFrames



 ### 9.1. The `merge()` Method

In [None]:
df_left = pd.DataFrame({
    "PersonID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"]
})

df_right = pd.DataFrame({
    "PersonID": [1, 2, 4],
    "City": ["NY", "LA", "Houston"]
})

merged_df = pd.merge(df_left, df_right, on="PersonID", how="inner")
print(merged_df)
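
 The inner join above keeps only the `PersonID` values present in both DataFrames (1 and 2). The `how` parameter controls this. The sketch below repeats the two DataFrames so it runs on its own and shows a left join and an outer join; `left_joined` and `outer_joined` are illustrative names:

```python
import pandas as pd

df_left = pd.DataFrame({"PersonID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df_right = pd.DataFrame({"PersonID": [1, 2, 4], "City": ["NY", "LA", "Houston"]})

# Left join: keep every row of df_left; rows without a match get a missing City.
left_joined = pd.merge(df_left, df_right, on="PersonID", how="left")

# Outer join: keep rows from both sides. indicator=True adds a "_merge"
# column recording whether each row came from the left, the right, or both.
outer_joined = pd.merge(df_left, df_right, on="PersonID", how="outer", indicator=True)

print(left_joined)
print(outer_joined)
```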


 ### 9.2. Joins on Different Column Names

In [None]:
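# Key columns with different names (df_right's key is renamed to "ID" for illustration):
df_right_ids = df_right.rename(columns={"PersonID": "ID"})
merged_ids = pd.merge(df_left, df_right_ids, left_on="PersonID", right_on="ID")
print(merged_ids)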


 ## 10. Exercises



 1. Create a DataFrame from a dictionary of lists with at least three columns.

 2. Load a CSV file into a DataFrame and inspect its first few rows.

 3. Filter rows where a numeric column exceeds a certain threshold.

 4. Perform a group-by operation and calculate the mean of another column.

 5. Merge two DataFrames on a common key.

In [None]:
# 1. Create a DataFrame from a dictionary of lists.


In [None]:
# 2. Load a CSV file and inspect its first few rows.


In [None]:
# 3. Filter rows where a numeric column exceeds a threshold.


In [None]:
# 4. Perform a group-by operation and calculate the mean of another column.


In [None]:
# 5. Merge two DataFrames on a common key.
