# 📊 Pandas DataFrames Tutorial
Welcome! This notebook provides a beginner‑friendly introduction to **Pandas DataFrames**. You'll learn how to create, inspect, manipulate, and analyze tabular data using Pandas.

**Learning objectives**
1. Understand what a `DataFrame` is and how to create one.
2. Load data from common formats (CSV, Excel).
3. Inspect and summarize data with built‑in methods.
4. Select, filter, and slice data using `.loc[]` and `.iloc[]`.
5. Perform common transformations: grouping, merging, concatenation.
6. Visualize data quickly with Pandas‑built‑in plotting.

Let's get started!


> **Tip**: Execute a code cell by pressing **Shift+Enter**. Feel free to experiment: modify the code and re‑run it to see what changes!

Import **Pandas** for data manipulation, along with **NumPy**, the array library that Pandas builds on.

In [1]:
import numpy as np
import pandas as pd

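**Creating a DataFrame**

A `DataFrame` is a two-dimensional, labeled table. One common way to build it is from a dictionary that maps column names to lists of values.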

In [2]:
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000]
}
df = pd.DataFrame(data)

In [3]:
df

|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 0 | John | 28 | New York | 65000 |
| 1 | Anna | 34 | Paris | 70000 |
| 2 | Peter | 29 | Berlin | 62000 |
| 3 | Linda | 42 | London | 85000 |
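
Once a DataFrame exists, a few built-in attributes and methods give a quick overview of it. The following is a minimal sketch that rebuilds the same `df` so it can run on its own; the variable names (`n_rows`, `summary`, and so on) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same data as the `df` created above.
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000],
})

n_rows, n_cols = df.shape        # (number of rows, number of columns)
column_names = list(df.columns)  # column labels, in order
first_two = df.head(2)           # the first two rows, as a DataFrame
mean_age = df['Age'].mean()      # average of one numeric column
summary = df.describe()          # summary statistics for the numeric columns

print(n_rows, n_cols)   # 4 4
print(column_names)     # ['Name', 'Age', 'City', 'Salary']
print(mean_age)         # 33.25
```

`describe()` only summarizes numeric columns by default, so `summary` here covers `Age` and `Salary` but not `Name` or `City`.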


In [4]:
data_list = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]
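# A list of rows also works. No column names are given here,
# so Pandas labels the columns 0, 1, 2, 3.
df2 = pd.DataFrame(data_list)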
df2

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | John | 28 | New York | 65000 |
| 1 | Anna | 34 | Paris | 70000 |
| 2 | Peter | 29 | Berlin | 62000 |
| 3 | Linda | 42 | London | 85000 |


In [5]:
columns = ["Name", "Age", "City", "Salary"]
df2 = pd.DataFrame(data_list, columns=columns)
df2

|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 0 | John | 28 | New York | 65000 |
| 1 | Anna | 34 | Paris | 70000 |
| 2 | Peter | 29 | Berlin | 62000 |
| 3 | Linda | 42 | London | 85000 |


**Selection and Indexing of Columns**

In [6]:
df2

|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 0 | John | 28 | New York | 65000 |
| 1 | Anna | 34 | Paris | 70000 |
| 2 | Peter | 29 | Berlin | 62000 |
| 3 | Linda | 42 | London | 85000 |


In [7]:
df2[["Name","City"]]

|   | Name | City |
|---|------|------|
| 0 | John | New York |
| 1 | Anna | Paris |
| 2 | Peter | Berlin |
| 3 | Linda | London |
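
The double brackets above pass a list of column names and return a DataFrame. A single column name in single brackets returns a `Series` instead. The sketch below illustrates the difference; it rebuilds `df2` so it can run on its own, and the names `ages` and `ages_frame` are illustrative.

```python
import pandas as pd

data_list = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000],
]
df2 = pd.DataFrame(data_list, columns=["Name", "Age", "City", "Salary"])

ages = df2["Age"]          # one label: a one-dimensional Series
ages_frame = df2[["Age"]]  # a list with one label: a one-column DataFrame

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame
print(ages_frame.shape)           # (4, 1)
```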


**Creating a new column**

In [8]:
df2["Designation"] = ["Doctor","Eng.","Doctor","Eng."]

In [9]:
df2

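|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 0 | John | 28 | New York | 65000 | Doctor |
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 2 | Peter | 29 | Berlin | 62000 | Doctor |
| 3 | Linda | 42 | London | 85000 | Eng. |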
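
A new column can also be computed from existing columns instead of typed in by hand. The following is a small sketch under that assumption; the column name `SalaryK` (salary in whole thousands) is hypothetical and does not appear in the original notebook.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 42],
    "City": ["New York", "Paris", "Berlin", "London"],
    "Salary": [65000, 70000, 62000, 85000],
})

# assign() returns a new DataFrame with the extra column and
# leaves df2 itself untouched.
with_salary_k = df2.assign(SalaryK=df2["Salary"] // 1000)

print(list(with_salary_k["SalaryK"]))  # [65, 70, 62, 85]
print("SalaryK" in df2.columns)        # False
```

Writing `df2["SalaryK"] = df2["Salary"] // 1000` would add the same column directly to `df2`, in the same way as the `Designation` assignment above.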


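**Removing Rows**

`drop()` with `axis=0` removes rows by index label. It returns a new DataFrame and leaves `df2` unchanged, which is why the following cell still shows all four rows.

In [10]:
df2.drop(0, axis=0)  # drop the row labeled 0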

|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 2 | Peter | 29 | Berlin | 62000 | Doctor |
| 3 | Linda | 42 | London | 85000 | Eng. |


In [11]:
df2

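|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 0 | John | 28 | New York | 65000 | Doctor |
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 2 | Peter | 29 | Berlin | 62000 | Doctor |
| 3 | Linda | 42 | London | 85000 | Eng. |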
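
The same `drop()` method removes a column when given a column label. The sketch below rebuilds a small `df2` so it can run on its own; the name `without_designation` is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 42],
    "Designation": ["Doctor", "Eng.", "Doctor", "Eng."],
})

# Drop a column by label. Equivalent to df2.drop("Designation", axis=1).
without_designation = df2.drop(columns=["Designation"])

print(list(without_designation.columns))  # ['Name', 'Age']
print(list(df2.columns))                  # ['Name', 'Age', 'Designation']
```

Because `drop()` returns a new DataFrame, the change only persists if the result is assigned back, for example `df2 = df2.drop(columns=["Designation"])`.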


**Selecting Rows**

In [12]:
df2

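|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 0 | John | 28 | New York | 65000 | Doctor |
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 2 | Peter | 29 | Berlin | 62000 | Doctor |
| 3 | Linda | 42 | London | 85000 | Eng. |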


Select data by label with `.loc[]`.

In [13]:
df2.loc[[0,1]]

|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 0 | John | 28 | New York | 65000 | Doctor |
| 1 | Anna | 34 | Paris | 70000 | Eng. |


Select data by integer location with `.iloc[]`.

In [14]:
df.iloc[3]

Name       Linda
Age           42
City      London
Salary     85000
Name: 3, dtype: object
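
With the default index (0, 1, 2, ...), labels and integer positions coincide, so `.loc[]` and `.iloc[]` look interchangeable. They diverge as soon as the row order changes. The following sketch shows this by sorting the same data by age; the name `by_age` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000],
})

# Sorting reorders the rows but keeps each row's original index label.
by_age = df.sort_values("Age")
print(list(by_age.index))        # [0, 2, 1, 3]

print(by_age.loc[1, "Name"])     # Anna  (the row whose label is 1)
print(by_age.iloc[1]["Name"])    # Peter (the second row in the current order)
```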

**Selecting Subsets of Rows and Columns**

Select data by label with `.loc[]`.

In [15]:
df.loc[[0, 1], ["City", "Salary"]]  # row labels first, then column labels

|   | City | Salary |
|---|------|--------|
| 0 | New York | 65000 |
| 1 | Paris | 70000 |


Select data by integer location with `.iloc[]`.

In [16]:
df.iloc[[0, 1], [0, 2]]  # rows 0-1; columns at positions 0 and 2 are Name and City

|   | Name | City |
|---|------|------|
| 0 | John | New York |
| 1 | Anna | Paris |
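
Both indexers also accept slices, with one difference that is easy to miss: a `.loc[]` slice includes its end label, while an `.iloc[]` slice excludes its end position, like ordinary Python slicing. A minimal sketch on the same data (the names `by_label` and `by_position` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000],
})

# Label slice: rows 0, 1 AND 2; columns Name through City, inclusive.
by_label = df.loc[0:2, "Name":"City"]

# Position slice: rows 0 and 1 only; columns at positions 0, 1, 2.
by_position = df.iloc[0:2, 0:3]

print(by_label.shape)     # (3, 3)
print(by_position.shape)  # (2, 3)
```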


**Conditional Selection**

In [17]:
df2

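|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 0 | John | 28 | New York | 65000 | Doctor |
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 2 | Peter | 29 | Berlin | 62000 | Doctor |
| 3 | Linda | 42 | London | 85000 | Eng. |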


In [18]:
# Select only the people whose age is above 30

In [19]:
df2[df2["Age"] > 30]

|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 1 | Anna | 34 | Paris | 70000 | Eng. |
| 3 | Linda | 42 | London | 85000 | Eng. |


In [20]:
# Select only the people whose age is above 30 and whose city is Paris

In [21]:
df2[(df2["Age"] > 30) & (df2["City"] == 'Paris')]

|   | Name | Age | City | Salary | Designation |
|---|------|-----|------|--------|-------------|
| 1 | Anna | 34 | Paris | 70000 | Eng. |
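
Conditions are combined element-wise with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. The sketch below shows a few further patterns on the same data; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 42],
    "City": ["New York", "Paris", "Berlin", "London"],
    "Salary": [65000, 70000, 62000, 85000],
    "Designation": ["Doctor", "Eng.", "Doctor", "Eng."],
})

# OR: doctors, or anyone earning more than 80000.
doctors_or_high = df2[(df2["Designation"] == "Doctor") | (df2["Salary"] > 80000)]

# NOT: everyone who is not an engineer.
not_eng = df2[~(df2["Designation"] == "Eng.")]

# Membership test against several allowed values.
in_cities = df2[df2["City"].isin(["Paris", "Berlin"])]

# A boolean mask inside .loc[] can be combined with a column selection.
names_over_30 = df2.loc[df2["Age"] > 30, ["Name", "City"]]

print(list(doctors_or_high["Name"]))  # ['John', 'Peter', 'Linda']
print(list(not_eng["Name"]))          # ['John', 'Peter']
print(list(in_cities["Name"]))        # ['Anna', 'Peter']
print(list(names_over_30["Name"]))    # ['Anna', 'Linda']
```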


## ✏️ Practice Exercises
Try these exercises on your own to reinforce what you've learned:
1. **Create** a DataFrame from a Python dictionary with at least three columns and five rows.
2. **Load** a CSV file of your choice and display its first 10 rows.
3. Use **`.loc[]`** to select all rows where a numeric column is greater than its median.
4. **Group** the data by a categorical column and compute the mean of a numeric column.
5. **Merge** two small DataFrames on a common column and display the result.
6. **Plot** a histogram of a numeric column using Pandas plotting.

---

## 🚀 Next Steps
- Continue with the next notebook, **3_Missing_Data.ipynb**.