# Pandas DataFrame Selection

This notebook demonstrates common ways to select data from a Pandas DataFrame: column selection, row slicing, label-based indexing with `loc`, position-based indexing with `iloc`, boolean indexing, and filtering with `isin()`.

## Importing Libraries

First, we import pandas under its conventional alias `pd`.


In [7]:
import pandas as pd

## Creating a DataFrame

We create a list of lists representing student data and then convert it into a Pandas DataFrame.

In [11]:
# Create a list of lists representing student data
students = [
    ['Eric', 40, 'Machine Learning'],
    ['Ivy', 37, 'Project Management'],
    ['Jude', 10, 'Programmer'],
    ['Alice', 22, 'Data Science'],
    ['Bob', 25, 'Web Development'],
    ['Charlie', 23, 'Cyber Security'],
    ['Diana', 28, 'AI and Robotics'],
    ['Edward', 21, 'Cloud Computing'],
    ['Fiona', 24, 'Software Engineering']
]

# Define the column names for the DataFrame
columns = ['Name', 'Age', 'Course']

# Create a 1-based integer index (labels 1 through 9) for the DataFrame
index = list(range(1, len(students) + 1))

# Create a Pandas DataFrame from the students list with the specified columns and index
students_df = pd.DataFrame(students, columns=columns, index=index)
students_df

,Name,Age,Course
1,Eric,40,Machine Learning
2,Ivy,37,Project Management
3,Jude,10,Programmer
4,Alice,22,Data Science
5,Bob,25,Web Development
6,Charlie,23,Cyber Security
7,Diana,28,AI and Robotics
8,Edward,21,Cloud Computing
9,Fiona,24,Software Engineering


## Selecting Columns

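We can select a single column with attribute access, which returns a Series. Attribute access works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; `students_df['Name']` is the general form.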

In [12]:
# Select the Name column using attribute access
select_column_students_df = students_df.Name
select_column_students_df

1       Eric
2        Ivy
3       Jude
4      Alice
5        Bob
6    Charlie
7      Diana
8     Edward
9      Fiona
Name: Name, dtype: object
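
As a minimal sketch of the alternative to attribute access, the cell below uses bracket notation on a shortened copy of the student data (three rows, rebuilt here only so the example runs on its own). It shows that a single column name yields a Series, while a list of column names yields a DataFrame.

```python
import pandas as pd

# Shortened copy of the student data so this example is self-contained
students_df = pd.DataFrame(
    [
        ['Eric', 40, 'Machine Learning'],
        ['Ivy', 37, 'Project Management'],
        ['Jude', 10, 'Programmer'],
    ],
    columns=['Name', 'Age', 'Course'],
    index=[1, 2, 3],
)

# Bracket notation with one column name returns a Series
name_series = students_df['Name']

# Bracket notation with a list of column names returns a DataFrame
name_age_df = students_df[['Name', 'Age']]

print(type(name_series).__name__)
print(type(name_age_df).__name__)
```

Bracket notation also works for column names that contain spaces or that match a DataFrame method name, where attribute access would not.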

## Slicing Data

We can slice the DataFrame with `[]` to select a range of rows by position. Here, `[:3]` returns the first three rows.

In [13]:
# Select the first three rows with a slice
select_slice_students_df = students_df[:3]
select_slice_students_df

,Name,Age,Course
1,Eric,40,Machine Learning
2,Ivy,37,Project Management
3,Jude,10,Programmer
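
Because the index labels here are integers, it is easy to mistake this slice for a label lookup. The sketch below, on a shortened copy of the data with only `Name` and `Age`, suggests that an integer slice inside `[]` counts positions rather than labels: `[1:3]` returns the rows at positions 1 and 2, which carry the labels 2 and 3.

```python
import pandas as pd

# Shortened copy of the student data with the same 1-based index
students_df = pd.DataFrame(
    {
        'Name': ['Eric', 'Ivy', 'Jude', 'Alice'],
        'Age': [40, 37, 10, 22],
    },
    index=[1, 2, 3, 4],
)

# Integer slices inside [] select rows by position; the end is excluded
middle_rows = students_df[1:3]

# The same selection written explicitly with iloc
middle_rows_iloc = students_df.iloc[1:3]

print(middle_rows.index.tolist())
```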


## Label-based Indexing with `loc`

We can use the `loc` indexer to select data by label. Passing a single row label returns that row as a Series.

In [14]:
# Select a row by label using loc[]; index[0] is the label 1
students_df_loc_label = students_df.loc[index[0]]
students_df_loc_label

Name                  Eric
Age                     40
Course    Machine Learning
Name: 1, dtype: object
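
A single label returns a Series, as above. As a small sketch on a shortened copy of the data, passing a list of labels to `loc` instead returns a DataFrame, even when the list holds only one label.

```python
import pandas as pd

# Shortened copy of the student data so this example is self-contained
students_df = pd.DataFrame(
    [
        ['Eric', 40, 'Machine Learning'],
        ['Ivy', 37, 'Project Management'],
        ['Jude', 10, 'Programmer'],
    ],
    columns=['Name', 'Age', 'Course'],
    index=[1, 2, 3],
)

# A single label returns one row as a Series
one_row = students_df.loc[1]

# A list of labels returns a DataFrame with those rows, in the order given
two_rows = students_df.loc[[1, 3]]

# A one-element list still returns a DataFrame
one_row_df = students_df.loc[[1]]

print(two_rows['Name'].tolist())
```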

## Selecting all rows (:) with specific column labels

In [16]:
students_df_loc_rows = students_df.loc[:, ['Name', 'Age']]
students_df_loc_rows

,Name,Age
1,Eric,40
2,Ivy,37
3,Jude,10
4,Alice,22
5,Bob,25
6,Charlie,23
7,Diana,28
8,Edward,21
9,Fiona,24


## For label slicing, both endpoints are included

In [17]:
students_df_loc_rows_columns = students_df.loc[1:4, ['Name', 'Age']]
students_df_loc_rows_columns

,Name,Age
1,Eric,40
2,Ivy,37
3,Jude,10
4,Alice,22
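
To make the endpoint rule concrete, the sketch below applies the same `1:4` slice through `loc` and through `iloc` on a shortened copy of the data. With `loc` the slice is read as labels and includes both ends; with `iloc` it is read as positions and excludes the end.

```python
import pandas as pd

# Shortened copy of the student data with the same 1-based index
students_df = pd.DataFrame(
    {
        'Name': ['Eric', 'Ivy', 'Jude', 'Alice', 'Bob'],
        'Age': [40, 37, 10, 22, 25],
    },
    index=[1, 2, 3, 4, 5],
)

# Label slice: labels 1 through 4, both endpoints included
by_label = students_df.loc[1:4]

# Position slice: positions 1, 2, and 3; position 4 is excluded
by_position = students_df.iloc[1:4]

print(by_label.index.tolist())
print(by_position.index.tolist())
```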


## Selecting a single row and column label returns a scalar

In [18]:
students_df_loc_scalar = students_df.loc[1, 'Age']
students_df_loc_scalar

40

## Fast scalar access with `at`

In [19]:
students_df_at_scalar = students_df.at[2, 'Course']
students_df_at_scalar

'Project Management'
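
`at` looks up a scalar by label. Its position-based counterpart is `iat`. As a minimal sketch on a shortened copy of the data, both calls below read the same cell: row label 2 sits at position 1, and `Course` is the column at position 2.

```python
import pandas as pd

# Shortened copy of the student data so this example is self-contained
students_df = pd.DataFrame(
    [
        ['Eric', 40, 'Machine Learning'],
        ['Ivy', 37, 'Project Management'],
        ['Jude', 10, 'Programmer'],
    ],
    columns=['Name', 'Age', 'Course'],
    index=[1, 2, 3],
)

# Label-based scalar access: row label 2, column label 'Course'
course_by_label = students_df.at[2, 'Course']

# Position-based scalar access: row position 1, column position 2
course_by_position = students_df.iat[1, 2]

print(course_by_label, '|', course_by_position)
```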

## Position-based Indexing with `iloc`

We can use the `iloc` indexer to select data by integer position, counting from 0.

In [20]:
# Select the second row (position 1)
students_df_iloc_position = students_df.iloc[1]
students_df_iloc_position

Name                     Ivy
Age                       37
Course    Project Management
Name: 2, dtype: object

## Select via integer slices

In [21]:
students_df_iloc_slices = students_df.iloc[1:3, 0:2]
students_df_iloc_slices

,Name,Age
2,Ivy,37
3,Jude,10


## Select via lists of integer position locations

In [22]:
students_df_iloc_positions = students_df.iloc[[3, 1], [2, 1]]
students_df_iloc_positions

,Course,Age
4,Data Science,22
2,Project Management,37


## Select via slicing rows explicitly

In [23]:
students_df_iloc_rows = students_df.iloc[0:2, :]
students_df_iloc_rows

,Name,Age,Course
1,Eric,40,Machine Learning
2,Ivy,37,Project Management


## Select via slicing columns explicitly

In [24]:
students_df_iloc_columns = students_df.iloc[:, 0:2]
students_df_iloc_columns

,Name,Age
1,Eric,40
2,Ivy,37
3,Jude,10
4,Alice,22
5,Bob,25
6,Charlie,23
7,Diana,28
8,Edward,21
9,Fiona,24


## Boolean Indexing

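Boolean indexing filters rows with a boolean Series: the comparison `students_df['Age'] > 25` produces one `True` or `False` per row, and only the rows marked `True` are kept.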

In [25]:
# Boolean indexing: Select rows where Age is greater than 25
students_df_boolean_indexing = students_df[students_df['Age'] > 25]
students_df_boolean_indexing

,Name,Age,Course
1,Eric,40,Machine Learning
2,Ivy,37,Project Management
7,Diana,28,AI and Robotics
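
Conditions can also be combined. The sketch below, which rebuilds only the `Name` and `Age` columns of the student data, uses `&` (and) and `|` (or) with each condition wrapped in parentheses. The parentheses are needed because `&` and `|` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Name and Age columns of the student data, with the same 1-based index
students_df = pd.DataFrame(
    {
        'Name': ['Eric', 'Ivy', 'Jude', 'Alice', 'Bob',
                 'Charlie', 'Diana', 'Edward', 'Fiona'],
        'Age': [40, 37, 10, 22, 25, 23, 28, 21, 24],
    },
    index=range(1, 10),
)

# Rows where Age is strictly between 21 and 28
in_range = students_df[(students_df['Age'] > 21) & (students_df['Age'] < 28)]

# Rows where Age is below 20 or above 35
outside = students_df[(students_df['Age'] < 20) | (students_df['Age'] > 35)]

print(in_range['Name'].tolist())
print(outside['Name'].tolist())
```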


## Using `isin()` Method for Filtering

The `isin()` method returns a boolean mask marking the rows whose value appears in a given list. Indexing the DataFrame with that mask keeps only those rows.

In [26]:
# Add a GPA column to a copy, then keep the rows whose GPA is 3 or 8
students_df2 = students_df.copy()
students_df2['GPA'] = [5, 4, 3, 2, 3, 4, 6, 8, 4]
students_df2_is_in = students_df2[students_df2['GPA'].isin([3, 8])]
students_df2_is_in

,Name,Age,Course,GPA
3,Jude,10,Programmer,3
5,Bob,25,Web Development,3
8,Edward,21,Cloud Computing,8
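
The mask from `isin()` can be inverted with `~` to keep the rows whose value is not in the list. The sketch below rebuilds only the `Name` and `GPA` columns, using the same GPA values as the cell above, so it runs on its own.

```python
import pandas as pd

# Name and GPA columns only, with the same values and 1-based index
students_df2 = pd.DataFrame(
    {
        'Name': ['Eric', 'Ivy', 'Jude', 'Alice', 'Bob',
                 'Charlie', 'Diana', 'Edward', 'Fiona'],
        'GPA': [5, 4, 3, 2, 3, 4, 6, 8, 4],
    },
    index=range(1, 10),
)

# Boolean mask: True where GPA is 3 or 8
gpa_mask = students_df2['GPA'].isin([3, 8])

# ~ inverts the mask, keeping rows whose GPA is neither 3 nor 8
students_df2_not_in = students_df2[~gpa_mask]

print(students_df2_not_in.index.tolist())
```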
