# **Guided Lab 343.3.11 - Slicing Pandas Dataframe’s Data**

## **Lab Introduction:**
Pandas is a powerful Python library for data analysis and manipulation. One of its key features is the ability to slice data from DataFrames, allowing you to extract specific subsets of data for further analysis. This lab will guide you through the essential techniques for slicing Pandas DataFrames using the iloc and loc indexers.

**Why is slicing important?**

- **Data Exploration:** Slicing lets you quickly examine specific parts of your data, such as individual rows, columns, or ranges of values.
- **Data Cleaning:** You can use slicing to remove unwanted data or select only the data you need for your analysis.
- **Data Transformation:** Slicing can be used to create new DataFrames with specific columns or rows, enabling you to reshape your data for different purposes.
- **Data Analysis:** By isolating specific subsets of data, slicing allows you to perform targeted analyses on smaller portions of your dataset.

## **Learning Objective:**
By the end of this lab, learners will be able to:
- Slice data from Pandas DataFrames.
- Use the iloc indexer to slice data by row and column positions.
- Use the loc indexer to slice data by row and column labels.
- Select specific rows and columns from a DataFrame using both iloc and loc methods.
- Identify the first non-empty row in a Pandas Series or column using `first_valid_index()`.
- Effectively manipulate and extract desired data subsets from Pandas DataFrames for analysis and further processing.


## **Instructions:**



## **Method #1: Slicing Dataframe using DataFrame.iloc[]**

**Example 1.1: Slicing by rows**

In the example below, we will slice:
- Only the first row of the DataFrame.
- The first four rows (positions 0 to 3) of the DataFrame.


In [1]:
# importing pandas library
import pandas as pd

# Initializing the nested list with Data set
employee_list = [['James', 36, 75, 5428000],
               ['Villers', 38, 74, 3428000],
               ['VKole', 31, 70, 8428000],
               ['Smith', 34, 80, 4428000],
               ['Gayle', 40, 100, 4528000],
               ['Rooter', 33, 72, 7028000],
               ['Peterson', 42, 85, 2528000],
               ['John', 41, 85, 1528000],

]

# creating a pandas dataframe
df = pd.DataFrame(employee_list, columns=['Name', 'Age', 'Weight', 'Salary'])

print(' ------data frame before slicing-----')
print(df)

print(' ------ Select First Row by Index-----')
# Select First Row by Index
print(df.iloc[:1])

print(' ------ Select First 4 Row by Index-----')
# Slicing first 4 rows from dataframe
df1 = df.iloc[0:4]
# The line above uses the iloc indexer to select the first 4 rows of the original
# DataFrame (df) and assigns the result to a new DataFrame named df1.

print(' ------data frame after slicing----')
print(df1)


 ------data frame before slicing-----
       Name  Age  Weight   Salary
0     James   36      75  5428000
1   Villers   38      74  3428000
2     VKole   31      70  8428000
3     Smith   34      80  4428000
4     Gayle   40     100  4528000
5    Rooter   33      72  7028000
6  Peterson   42      85  2528000
7      John   41      85  1528000
 ------ Select First Row by Index-----
    Name  Age  Weight   Salary
0  James   36      75  5428000
 ------ Select First 4 Row by Index-----
 ------data frame after slicing----
      Name  Age  Weight   Salary
0    James   36      75  5428000
1  Villers   38      74  3428000
2    VKole   31      70  8428000
3    Smith   34      80  4428000
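
When selecting a single row, the return type depends on what you pass to `iloc`. The following is a minimal sketch on a two-row subset of the employee data (the variable names are illustrative, not part of the lab): a single integer returns a Series, while a slice or a one-element list returns a one-row DataFrame.

```python
import pandas as pd

# Two rows borrowed from the lab's employee data, for illustration only
df = pd.DataFrame(
    [['James', 36, 75, 5428000],
     ['Villers', 38, 74, 3428000]],
    columns=['Name', 'Age', 'Weight', 'Salary'],
)

row_as_series = df.iloc[0]     # single integer -> Series indexed by column name
row_as_frame = df.iloc[[0]]    # list with one integer -> one-row DataFrame
row_from_slice = df.iloc[:1]   # slice -> one-row DataFrame

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(type(row_from_slice).__name__)
```

Using a slice, as in the example above, is convenient when later steps expect a DataFrame rather than a Series.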


**Example 1.2 - Slicing by column position**

In the example below, we will slice the first two columns of the DataFrame by position, and then slice the first four rows together with the first two columns.




In [3]:
# Initializing the nested list with Data set
employee_list = [['James', 36, 75, 5428000],
               ['Villers', 38, 74, 3428000],
               ['VKole', 31, 70, 8428000],
               ['Smith', 34, 80, 4428000],
               ['Gayle', 40, 100, 4528000],
               ['Rooter', 33, 72, 7028000],
               ['Peterson', 42, 85, 2528000],
               ['John', 41, 85, 1528000]]

# creating a pandas dataframe
df = pd.DataFrame(employee_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# data frame before slicing
print(df)
print( '====Slicing columns in dataframe======')

emp_df = df.iloc[:, 0:2]
print(emp_df)

print( '====Slicing rows & columns in dataframe======')
emp_df2 = df.iloc[:4, 0:2]
print(emp_df2)


       Name  Age  Weight   Salary
0     James   36      75  5428000
1   Villers   38      74  3428000
2     VKole   31      70  8428000
3     Smith   34      80  4428000
4     Gayle   40     100  4528000
5    Rooter   33      72  7028000
6  Peterson   42      85  2528000
7      John   41      85  1528000
       Name  Age
0     James   36
1   Villers   38
2     VKole   31
3     Smith   34
4     Gayle   40
5    Rooter   33
6  Peterson   42
7      John   41
      Name  Age
0    James   36
1  Villers   38
2    VKole   31
3    Smith   34
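
Besides contiguous ranges, `iloc` also accepts lists of positions, negative positions, and a step. The sketch below uses a four-row subset of the employee data; the variable names are illustrative assumptions.

```python
import pandas as pd

# Four rows borrowed from the lab's employee data, for illustration only
df = pd.DataFrame(
    [['James', 36, 75, 5428000],
     ['Villers', 38, 74, 3428000],
     ['VKole', 31, 70, 8428000],
     ['Smith', 34, 80, 4428000]],
    columns=['Name', 'Age', 'Weight', 'Salary'],
)

name_salary = df.iloc[:, [0, 3]]   # columns at positions 0 and 3 only
last_two = df.iloc[-2:]            # last two rows, counted from the end
every_other = df.iloc[::2, :2]     # rows at positions 0 and 2, first two columns

print(name_salary)
print(last_two)
print(every_other)
```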


**Example 1.3**

Select the rows at positions 0 to 2 (exclusive) and the columns at positions 0 to 2 (exclusive) from `df1`, the four-row DataFrame created in Example 1.1.






In [4]:
print(df1.iloc[0:2, 0:2])

      Name  Age
0    James   36
1  Villers   38
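
Because `iloc` counts positions rather than labels, it behaves the same way on a DataFrame whose index labels no longer start at 0. The sketch below, using an illustrative subset of the employee data, shows that a sliced frame keeps its original labels while `iloc` still starts counting from 0.

```python
import pandas as pd

# Four rows borrowed from the lab's employee data, for illustration only
df = pd.DataFrame(
    [['James', 36, 75, 5428000],
     ['Villers', 38, 74, 3428000],
     ['VKole', 31, 70, 8428000],
     ['Smith', 34, 80, 4428000]],
    columns=['Name', 'Age', 'Weight', 'Salary'],
)

tail = df.iloc[2:4]                    # keeps the original index labels 2 and 3
first_of_tail = tail.iloc[0]           # position 0 of tail is the row labeled 2
renumbered = tail.reset_index(drop=True)  # optional: relabel rows as 0, 1

print(list(tail.index))
print(first_of_tail['Name'])
print(list(renumbered.index))
```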


## **Method #2 - Slicing Dataframe using DataFrame.loc[]**

Create a demo DataFrame to use in the examples in this section:



In [6]:
#create DataFrame with six columns
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12],
                   'steals': [4, 3, 3, 2, 5, 4, 3, 8],
                   'blocks': [1, 0, 0, 3, 2, 2, 1, 5]})

#view DataFrame
df


  team  points  assists  rebounds  steals  blocks
0    A      18        5        11       4       1
1    B      22        7         8       3       0
2    C      19        7        10       3       0
3    D      14        9         6       2       3
4    E      14       12         6       5       2
5    F      11        9         5       4       2
6    G      20        9         9       3       1
7    H      28        4        12       8       5


**Example 2.1- Slice by Specific Column Names**

We can use the following syntax to create a new DataFrame that only contains the columns team and rebounds:



In [9]:
#slice columns team and rebounds
df_new = df.loc[:, ['team', 'rebounds']]

#view new DataFrame
df_new


  team  rebounds
0    A        11
1    B         8
2    C        10
3    D         6
4    E         6
5    F         5
6    G         9
7    H        12
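
When `loc` is given a list of column names, the result follows the order of that list, so the same syntax can also reorder columns. A minimal sketch on a three-row subset of the demo data (the variable name `reordered` is illustrative):

```python
import pandas as pd

# Three rows borrowed from the lab's demo data, for illustration only
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [18, 22, 19],
                   'rebounds': [11, 8, 10]})

# Columns come back in the order listed, not the order stored in df
reordered = df.loc[:, ['rebounds', 'team']]

print(reordered)
```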


**Example 2.2 - Slice by Column Names in Range**

We can use the following syntax to create a new DataFrame that contains every column from team through rebounds. Unlike iloc, a loc label slice includes both endpoints, so the rebounds column is part of the result:




In [11]:

#slice columns between team and rebounds
df_new = df.loc[:, 'team':'rebounds']

#view new DataFrame
df_new


  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12
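
The difference between the two indexers at the end of a slice is easy to miss. This short sketch, on an illustrative subset of the demo data, puts a `loc` label slice next to an `iloc` position slice: the label slice keeps its stop label, while the position slice stops one before its stop position.

```python
import pandas as pd

# Three rows borrowed from the lab's demo data, for illustration only
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [18, 22, 19],
                   'assists': [5, 7, 7],
                   'rebounds': [11, 8, 10]})

by_label = df.loc[:, 'team':'assists']   # stop label 'assists' is included
by_position = df.iloc[:, 0:2]            # stop position 2 is excluded

print(list(by_label.columns))     # ['team', 'points', 'assists']
print(list(by_position.columns))  # ['team', 'points']
```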


**Example 2.3 - Select rows with index labels 3 to 6 and the 'team' column**

Select the rows whose index labels run from 3 to 6 (both inclusive, because loc slices by label) and the 'team' column:




In [12]:
print(df.loc[3:6, ['team']])


  team
3    D
4    E
5    F
6    G
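
Label-based slicing is easier to see when the index is not a run of integers. As a sketch, assume the team column is promoted to the index with `set_index` (a step the lab itself does not take); `loc` can then slice rows by team name, again including both endpoints.

```python
import pandas as pd

# Four rows borrowed from the lab's demo data, for illustration only
df = pd.DataFrame({'team': ['D', 'E', 'F', 'G'],
                   'points': [14, 14, 11, 20],
                   'assists': [9, 12, 9, 9]})

# Assumption for this sketch: use the team name as the row label
indexed = df.set_index('team')

# Rows labeled 'D' through 'F' (inclusive), two columns selected by name
subset = indexed.loc['D':'F', ['points', 'assists']]

print(subset)
```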


## **Example 3: Identify the first non-empty row in a Pandas Series or column**

To identify the first non-empty row in a Pandas Series or column, you can use the `first_valid_index()` method. This method returns the index label of the first non-null (non-empty) value in the Series, or `None` if every value is missing. Here's how you can use it:

In [13]:
# Example Pandas Series
data = pd.Series([None, None, 5, 10, None, 20])

# Find the index label of the first non-empty row
first_non_empty_index = data.first_valid_index()

print("Index of the first non-empty row:", first_non_empty_index)
print("Value of the first non-empty row:", data[first_non_empty_index])


Index of the first non-empty row: 2
Value of the first non-empty row: 5.0
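
The same method works on a single DataFrame column, and `last_valid_index()` is its counterpart for the last non-null value. The sketch below uses a hypothetical two-column DataFrame (the column names `reading` and `empty` are made up for illustration) and guards against the `None` returned when a column has no valid values.

```python
import pandas as pd

# Hypothetical data: one partly filled column and one entirely empty column
df = pd.DataFrame({'reading': [None, 3.5, None, 7.0],
                   'empty': [None, None, None, None]})

first_label = df['reading'].first_valid_index()   # label of first non-null value
last_label = df['reading'].last_valid_index()     # label of last non-null value
no_label = df['empty'].first_valid_index()        # None: nothing valid to find

# Check for None before using the label to look up a value
if first_label is not None:
    print("First valid reading:", df.loc[first_label, 'reading'])
print("Empty column result:", no_label)
```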
