# Pandas DataFrames

**What is a DataFrame?**

A Pandas DataFrame is a two-dimensional, labeled data structure, similar to a two-dimensional array or a table with rows and columns.

In [2]:
import numpy as np
import pandas as pd

# **Creating a dataframe**

Use the pd.DataFrame() constructor.

In [10]:
# creating a DataFrame from a dictionary
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000]
}
# load the data into a DataFrame object
df = pd.DataFrame(data)

In [11]:
df

Unnamed: 0,Name,Age,City,Salary
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000
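
To connect the result above with the "table with rows and columns" description, the following minimal sketch rebuilds the same dictionary and inspects the shape, column labels, and row labels. It is illustrative only and uses nothing beyond the data shown above.

```python
import pandas as pd

# Same dictionary as above: keys become column labels, lists become column values
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000]
}
df = pd.DataFrame(data)

print(df.shape)          # (4, 4): 4 rows, 4 columns
print(list(df.columns))  # ['Name', 'Age', 'City', 'Salary']
print(list(df.index))    # [0, 1, 2, 3]: default row labels
```

The leftmost 0 to 3 in the displayed table is this default row index, not a column of the data.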


In [17]:
# Creating a DataFrame from a list of lists (each inner list is one row)
data_list = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]
df2 = pd.DataFrame(data_list)

In [18]:
df2

Unnamed: 0,0,1,2,3
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000


✅ Manual Column Setting (After Creation)

In [19]:
# Step 1: List of lists (without columns)
data_list2 = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]

# Step 2: Create DataFrame without column names
df3 = pd.DataFrame(data_list2)

# Step 3: Manually set column names
columns = ['Name', 'Age', 'City', 'Salary']
df3.columns = columns

In [20]:
df3

Unnamed: 0,Name,Age,City,Salary
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000


## **Selections And Indexing Of Columns**

✅ 1. Select a Single Column

Returns a Series.

You can also use dot notation: df3.Name. This is not recommended, and it does not work when the column name contains spaces or special characters, or when it clashes with a DataFrame method name.

In [21]:
df3['Name']

0     John
1     Anna
2    Peter
3    Linda
Name: Name, dtype: object
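
The note on dot notation can be made concrete with a small sketch. The frame `people` and its 'Home City' column are hypothetical names chosen only to show a column label that contains a space.

```python
import pandas as pd

# Hypothetical frame with one column name that contains a space
people = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Home City': ['New York', 'Paris'],
})

by_bracket = people['Name']  # bracket notation
by_dot = people.Name         # dot notation, returns the same Series

# Only bracket notation works here: people.Home City would be a syntax error
home_city = people['Home City']

print(type(by_bracket).__name__)  # Series
print(by_bracket.equals(by_dot))  # True
```

Bracket notation works for every column label, which is why it is the safer default.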

✅ 2. Select Multiple Columns

Note: Use double brackets [[...]] for multiple columns.

Returns a DataFrame.

In [22]:
df3[['Name','City']]

Unnamed: 0,Name,City
0,John,New York
1,Anna,Paris
2,Peter,Berlin
3,Linda,London
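
As a rough illustration of the single versus double bracket distinction, the sketch below selects the same column both ways. The frame `people` is a hypothetical stand-in for df3.

```python
import pandas as pd

# Hypothetical two-row frame standing in for df3
people = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'City': ['New York', 'Paris'],
})

as_series = people['Name']   # single brackets: one-dimensional Series
as_frame = people[['Name']]  # double brackets: DataFrame, even for one column

print(as_series.shape)  # (2,)
print(as_frame.shape)   # (2, 1)
```

The inner brackets are an ordinary Python list of column names, so the columns come back in the order listed.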


# **Creating new Columns**

In [25]:
df3['Gender'] = ['Male', 'Female', 'Male', 'Female']  # add a new column named Gender

In [26]:
df3

Unnamed: 0,Name,Age,City,Salary,Gender
0,John,28,New York,65000,Male
1,Anna,34,Paris,70000,Female
2,Peter,29,Berlin,62000,Male
3,Linda,42,London,85000,Female
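
A new column does not have to come from a hand-written list. The sketch below, which assumes a hypothetical frame `staff` and hypothetical column names 'Salary_K' and 'Country', derives one column from an existing column and fills another with a single value.

```python
import pandas as pd

# Hypothetical frame standing in for df3
staff = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Salary': [65000, 70000],
})

# Column computed element-wise from an existing column
staff['Salary_K'] = staff['Salary'] / 1000

# A single value is repeated for every row
staff['Country'] = 'Unknown'

print(list(staff['Salary_K']))  # [65.0, 70.0]
print(list(staff['Country']))   # ['Unknown', 'Unknown']
```

When a list is assigned instead, its length has to match the number of rows.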


# **Removing Columns**

✅ 1. Remove a Single Column

📌 axis=1 means column (not row)
📌 inplace=True modifies df3 directly instead of returning a new DataFrame

In [28]:
df3.drop('Gender',axis=1, inplace=True)

In [29]:
df3

Unnamed: 0,Name,Age,City,Salary
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000
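
For comparison, here is a minimal sketch of dropping columns without inplace=True. It uses the `columns=` keyword of `drop`, which is an alternative to axis=1, on a hypothetical frame `staff`.

```python
import pandas as pd

# Hypothetical frame standing in for df3
staff = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Age': [28, 34],
    'Gender': ['Male', 'Female'],
})

# Without inplace=True, drop returns a new DataFrame and leaves staff unchanged
trimmed = staff.drop(columns=['Age', 'Gender'])

print(list(trimmed.columns))  # ['Name']
print(list(staff.columns))    # ['Name', 'Age', 'Gender']
```

Passing a list removes several columns in one call.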


# Selecting A Row

In [41]:
df2

Unnamed: 0,0,1,2,3
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000


Select Row by Label using .loc[] (label-based)

In [42]:
df2.loc[0]

0        John
1          28
2    New York
3       65000
Name: 0, dtype: object

In [43]:
df2.loc[[0,1]]

Unnamed: 0,0,1,2,3
0,John,28,New York,65000
1,Anna,34,Paris,70000


Select Row by Position using .iloc[] (position-based)

In [44]:
df2.iloc[0]

0        John
1          28
2    New York
3       65000
Name: 0, dtype: object

In [45]:
df2.iloc[[0,3]]

Unnamed: 0,0,1,2,3
0,John,28,New York,65000
3,Linda,42,London,85000
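
In df2 the row labels happen to equal the row positions, so .loc[0] and .iloc[0] return the same row. The sketch below uses a hypothetical frame `staff` with labels 10, 20, 30 to show where the two differ.

```python
import pandas as pd

# Hypothetical frame whose row labels differ from the row positions
staff = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 34, 29]},
    index=[10, 20, 30],
)

first_by_label = staff.loc[10]     # row whose label is 10
first_by_position = staff.iloc[0]  # row at position 0

print(first_by_label.equals(first_by_position))  # True

# Label 0 does not exist in this index, so .loc raises KeyError
try:
    staff.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(label_zero_found)  # False
```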


# Selecting a Subset of Rows & Columns

In [46]:
df2

Unnamed: 0,0,1,2,3
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000


✅ Format:

df.loc[rows, columns]



df.iloc[rows, columns]


In [51]:
df2.loc[[0,1],[2,3]]

Unnamed: 0,2,3
0,New York,65000
1,Paris,70000


In [52]:
df2.loc[[2,3],[0,1]]

Unnamed: 0,0,1
2,Peter,29
3,Linda,42
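
The examples above use .loc with lists of labels. As a sketch of the .iloc form and of slicing, the code below rebuilds df2 and selects the same block two ways. Note the difference in slice endpoints.

```python
import pandas as pd

data_list = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]
df2 = pd.DataFrame(data_list)

# .loc slices are label-based and include both endpoints
by_label = df2.loc[0:1, 2:3]

# .iloc slices are position-based and exclude the stop, like Python lists
by_position = df2.iloc[0:2, 2:4]

print(by_label.equals(by_position))  # True
print(by_label.shape)                # (2, 2)
```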


# Conditional Selections

✅ Basic Format:

df[ df['ColumnName'] condition ]


Multiple Conditions (AND / OR)

✅ AND (&)

df[(df['Age'] > 30) & (df['Salary'] > 65000)]


✅ OR (|)


df[(df['City'] == 'London') | (df['City'] == 'Berlin')]


📌 Don’t forget to wrap each condition in **()s** and use `&`, `|` (not `and`, `or`)

✅ NOT (~)

df[~(df['City'] == 'Paris')]
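
The OR and NOT patterns above, and the reason for using `|` rather than `or`, can be sketched on the same sample data. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Salary': [65000, 70000, 62000, 85000]
})

# Each comparison produces a boolean Series; | and ~ combine them row by row
london_or_berlin = df[(df['City'] == 'London') | (df['City'] == 'Berlin')]
not_paris = df[~(df['City'] == 'Paris')]

print(list(london_or_berlin['Name']))  # ['Peter', 'Linda']
print(list(not_paris['Name']))         # ['John', 'Peter', 'Linda']

# Python's `or` asks for one True/False value for a whole Series,
# which pandas refuses with a ValueError
try:
    df[(df['City'] == 'London') or (df['City'] == 'Berlin')]
    or_keyword_worked = True
except ValueError:
    or_keyword_worked = False

print(or_keyword_worked)  # False
```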


In [55]:
data_list3 = [
    ['John', 28, 'New York', 65000],
    ['Anna', 34, 'Paris', 70000],
    ['Peter', 29, 'Berlin', 62000],
    ['Linda', 42, 'London', 85000]
]

# Build the DataFrame from the list of rows and name the columns
columns = ['Name', 'Age', 'City', 'Salary']
df4 = pd.DataFrame(data_list3, columns=columns)

In [56]:
df4

Unnamed: 0,Name,Age,City,Salary
0,John,28,New York,65000
1,Anna,34,Paris,70000
2,Peter,29,Berlin,62000
3,Linda,42,London,85000


In [None]:
# people whose age is above 30

In [57]:
df4[df4['Age']>30]

Unnamed: 0,Name,Age,City,Salary
1,Anna,34,Paris,70000
3,Linda,42,London,85000


In [None]:
# people whose age is above 30 and whose city is Paris

In [62]:
df4[(df4['Age']>30) & (df4['City'] == 'Paris')]

Unnamed: 0,Name,Age,City,Salary
1,Anna,34,Paris,70000
