# **Guided Lab - 343.3.12 - How to Select Pandas Rows using query()**

# **Lab Objective:**

In this lab, we will demonstrate various techniques for querying Pandas DataFrame rows. We will cover:

- Selecting rows based on single or multiple conditions.
- Querying with a list of values (checking if a column value exists within a list of string values).
- Utilizing the `query()` function to efficiently access specific records from a Pandas DataFrame.

# **Learning Objective:**

By the end of this lab, learners will be able to use the **query()** function to retrieve the desired data from a Pandas DataFrame.

## **Introduction**

The `query()` function is a powerful tool for selecting rows from a DataFrame based on specific criteria. It takes the filter condition as a string expression, which keeps row selection concise and readable. Throughout this lab, you will gain hands-on experience with `query()` and understand its syntax and applications in data analysis.




**The syntax of the query() method is as follows:**

```
DataFrame.query(expr, inplace=False)
```

- `expr`: the query expression, a string containing the condition(s) used to select rows.
- `inplace`: defaults to `False`. When set to `True`, the DataFrame that `query()` is called on is updated in place and the method returns `None`.
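As a quick illustration of the signature above, the following minimal sketch filters a small, hypothetical DataFrame (`demo` and its values are made up for this example and are not part of the lab data). It shows that with the default `inplace=False` the original DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
demo = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Age": [25, 41, 33]})

# expr is a string; with the default inplace=False a new DataFrame is returned
over_30 = demo.query("Age > 30")

print(over_30)    # rows for Bob and Cy
print(len(demo))  # demo still has all 3 rows
```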



### **Let's create a pandas DataFrame from a nested list.**

Remember that when you query DataFrame rows using the `query()` function, it returns a new DataFrame with the selected rows by default. To update the existing DataFrame instead, you have to use **`inplace=True`**.

In [1]:
# importing pandas library
import pandas as pd

In [13]:
# Initializing the nested list with Data set
employee_list = [['James', 36, 75, 5428000],
                 ['Villers', 38, 74, 3428000],
                 ['VKole', 31, 70, 8428000],
                 ['Smith', 34, 80, 4428000],
                 ['Gayle', 40, 100, 4528000],
                 ['Rooter', 33, 72, 7028000],
                 ['Peterson', 42, 85, 2528000],
                 ['John', 41, 85, 1528000],
                 ['Rome', 45, 85, 152890],
                 ['Dave', 55, 85, 152890],
                 ['Smith', 44, 60, 1428000]]

# creating a pandas dataframe
idx = pd.RangeIndex(1,12)
df = pd.DataFrame(employee_list, columns=['Name', 'Age', 'Weight', 'Salary'], index = idx)

print(' ------data frame before slicing-----')
df

 ------data frame before slicing-----


        Name  Age  Weight   Salary
1      James   36      75  5428000
2    Villers   38      74  3428000
3      VKole   31      70  8428000
4      Smith   34      80  4428000
5      Gayle   40     100  4528000
6     Rooter   33      72  7028000
7   Peterson   42      85  2528000
8       John   41      85  1528000
9       Rome   45      85   152890
10      Dave   55      85   152890
11     Smith   44      60  1428000


### **DataFrame.query() takes condition in expression to select rows from a DataFrame. This expression can have one or multiple conditions.**

- In the following example, a categorical column (`Name`) is matched against a single value.

In [4]:
df2=df.query("Name == 'Gayle'")
df2

    Name  Age  Weight   Salary
5  Gayle   40     100  4528000


### **To use a variable in the expression, prefix its name with @, as shown in the code example below.**

In [10]:
# Using variable
value='Gayle'
df2 = df.query("Name == @value")
df2

    Name  Age  Weight   Salary
5  Gayle   40     100  4528000
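The `@` prefix is not limited to strings. The sketch below, which uses a reduced copy of the lab data and hypothetical variable names (`min_salary`, `wanted`), shows a numeric threshold and a Python list being referenced from inside the expression.

```python
import pandas as pd

# Reduced copy of the lab data, for illustration only
df_demo = pd.DataFrame(
    {"Name": ["James", "Gayle", "Rooter"],
     "Salary": [5428000, 4528000, 7028000]},
    index=pd.RangeIndex(1, 4),
)

# Hypothetical local variables referenced with @ inside the expression
min_salary = 5000000
wanted = ["Gayle", "Rooter"]

high_paid = df_demo.query("Salary >= @min_salary")  # numeric variable
by_name = df_demo.query("Name in @wanted")          # list variable

print(high_paid)
print(by_name)
```

Keeping the values in variables means the same expression can be reused with different thresholds or name lists without rebuilding the query string.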


### **Note that the examples above return a new DataFrame after filtering the rows. To update the existing DataFrame instead, use inplace=True.**

In [11]:
# Inplace
print(df.query("Name == 'Gayle'", inplace=True))
print(df)

None
    Name  Age  Weight   Salary
5  Gayle   40     100  4528000
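Because the `inplace=True` call above leaves `df` with a single row, the DataFrame creation cell has to be re-run before continuing with the next examples. One way to avoid losing the original data, sketched here with hypothetical names (`original`, `working`), is to apply the in-place query to a copy.

```python
import pandas as pd

# Small illustrative DataFrame (not the full lab data)
original = pd.DataFrame({"Name": ["James", "Gayle"], "Age": [36, 40]}, index=[1, 2])

# Work on a copy so the in-place filter cannot affect the original
working = original.copy()
result = working.query("Name == 'Gayle'", inplace=True)

print(result)         # None, because the filter was applied in place
print(len(working))   # 1 row left in the copy
print(len(original))  # 2 rows, the original is untouched
```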


### **Filter records: we can use comparison operators and logical operators**

In [14]:
print(df.query("Name != 'VKole'"))


        Name  Age  Weight   Salary
1      James   36      75  5428000
2    Villers   38      74  3428000
4      Smith   34      80  4428000
5      Gayle   40     100  4528000
6     Rooter   33      72  7028000
7   Peterson   42      85  2528000
8       John   41      85  1528000
9       Rome   45      85   152890
10      Dave   55      85   152890
11     Smith   44      60  1428000


In [15]:
df.query("Salary >= 4528000")


     Name  Age  Weight   Salary
1   James   36      75  5428000
3   VKole   31      70  8428000
5   Gayle   40     100  4528000
6  Rooter   33      72  7028000


In [16]:
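# Both conditions must hold; here the second one is already implied by the first,
# so the result is the same as for "Salary >= 6528000"
df.query("Salary >= 6528000 and Salary >= 2528000")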

     Name  Age  Weight   Salary
3   VKole   31      70  8428000
6  Rooter   33      72  7028000


In [17]:
df.query("Salary <= 6528000 and Salary >= 2528000")

       Name  Age  Weight   Salary
1     James   36      75  5428000
2   Villers   38      74  3428000
4     Smith   34      80  4428000
5     Gayle   40     100  4528000
7  Peterson   42      85  2528000
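To relate these expressions to ordinary boolean indexing, the following sketch runs on a reduced copy of the lab data and compares a range query with the equivalent mask. It also shows a chained comparison and the `or` operator. The variable names are chosen only for this illustration.

```python
import pandas as pd

# Reduced copy of the lab data, for illustration only
df_demo = pd.DataFrame(
    {"Name": ["James", "VKole", "Peterson", "John"],
     "Salary": [5428000, 8428000, 2528000, 1528000]},
    index=pd.RangeIndex(1, 5),
)

# Range filter written as a query expression
in_range_query = df_demo.query("Salary <= 6528000 and Salary >= 2528000")

# The same filter written with a boolean mask
mask = (df_demo["Salary"] <= 6528000) & (df_demo["Salary"] >= 2528000)
in_range_mask = df_demo[mask]

# Chained comparison, as in plain Python
in_range_chained = df_demo.query("2528000 <= Salary <= 6528000")

# 'or' selects rows that satisfy at least one condition
outside_range = df_demo.query("Salary < 2528000 or Salary > 6528000")

print(in_range_query)
print(outside_range)
```

The query form avoids repeating the DataFrame name and the parentheses that the mask form needs around each condition.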


### **We can use the 'in' operator with query()**

- In the following example, we check whether a categorical column (`Name`) matches any item in a list.




In [None]:
print(df.query("Name in ('James','Smith')"))


     Name  Age  Weight   Salary
1   James   36      75  5428000
4   Smith   34      80  4428000
11  Smith   44      60  1428000
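The `in` check also has a negated form, `not in`, and it corresponds to `isin()` in regular boolean indexing. The sketch below uses a reduced copy of the lab data and a hypothetical list variable `names` to show both.

```python
import pandas as pd

# Reduced copy of the lab data, for illustration only
df_demo = pd.DataFrame(
    {"Name": ["James", "VKole", "Smith", "Gayle", "Smith"],
     "Age": [36, 31, 34, 40, 44]},
    index=pd.RangeIndex(1, 6),
)

names = ["James", "Smith"]  # hypothetical list of values to look for

# Rows whose Name is in the list
selected = df_demo.query("Name in @names")

# Rows whose Name is NOT in the list
excluded = df_demo.query("Name not in @names")

# Equivalent boolean-indexing form using isin()
selected_isin = df_demo[df_demo["Name"].isin(names)]

print(selected)
print(excluded)
```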


### **We can use string methods with query()**

For conditions that require partial string matches, string methods (`str.xxx()`) can be used. Exact matches can be done with `==` or `in`, but string methods provide more flexibility.

Here are some useful methods:

- `str.contains()`: Checks if a specific string is contained.
- `str.endswith()`: Checks if a string ends with a specific string.
- `str.startswith()`: Checks if a string starts with a specific string.
- `str.match()`: Checks if the beginning of a string matches a regular expression pattern.
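The sketch below applies two of the methods listed above, `str.startswith()` and `str.match()`, to a reduced copy of the lab data. Passing `engine="python"` is an optional precaution assumed here: in some environments string methods inside `query()` may not work with the default engine, and the lab's own cells run without it.

```python
import pandas as pd

# Reduced copy of the lab data, for illustration only
df_demo = pd.DataFrame(
    {"Name": ["James", "Villers", "VKole", "John"],
     "Age": [36, 38, 31, 41]},
    index=pd.RangeIndex(1, 5),
)

# Names that start with "V"
starts_with_v = df_demo.query('Name.str.startswith("V")', engine="python")

# Names that begin with J or V and contain an "s" later on;
# str.match() anchors the pattern at the start of the string
regex_match = df_demo.query('Name.str.match("[JV].*s")', engine="python")

print(starts_with_v)
print(regex_match)
```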


Let's create the pandas DataFrame from the nested list again.

In [19]:
# Initializing the nested list with Data set
employee_list = [['James', 36, 75, 5428000],
                 ['Villers', 38, 74, 3428000],
                 ['VKole', 31, 70, 8428000],
                 ['Smith', 34, 80, 4428000],
                 ['Gayle', 40, 100, 4528000],
                 ['Rooter', 33, 72, 7028000],
                 ['Peterson', 42, 85, 2528000],
                 ['John', 41, 85, 1528000],
                 ['Rome', 45, 85, 152890],
                 ['Dave', 55, 85, 152890],
                 ['Smith', 44, 60, 1428000]]

# creating a pandas dataframe
idx = pd.RangeIndex(1,12)
df = pd.DataFrame(employee_list, columns=['Name', 'Age', 'Weight', 'Salary'], index = idx)

print(' ------data frame before slicing-----')
df

 ------data frame before slicing-----


        Name  Age  Weight   Salary
1      James   36      75  5428000
2    Villers   38      74  3428000
3      VKole   31      70  8428000
4      Smith   34      80  4428000
5      Gayle   40     100  4528000
6     Rooter   33      72  7028000
7   Peterson   42      85  2528000
8       John   41      85  1528000
9       Rome   45      85   152890
10      Dave   55      85   152890
11     Smith   44      60  1428000


In [20]:
print(df.query('Name.str.endswith("s")'))

      Name  Age  Weight   Salary
1    James   36      75  5428000
2  Villers   38      74  3428000


In [21]:
print(df.query('Name.str.contains("le")'))

      Name  Age  Weight   Salary
2  Villers   38      74  3428000
3    VKole   31      70  8428000
5    Gayle   40     100  4528000


### Non-string type columns can be converted to strings with **astype()**, allowing string methods to be applied. This conversion can also be executed within query().

In [22]:
print(df.query('Age.astype("str").str.endswith("5")'))

    Name  Age  Weight  Salary
9   Rome   45      85  152890
10  Dave   55      85  152890


In [24]:
print(df.query('Age.astype("str").str.startswith("3")'))

      Name  Age  Weight   Salary
1    James   36      75  5428000
2  Villers   38      74  3428000
3    VKole   31      70  8428000
4    Smith   34      80  4428000
6   Rooter   33      72  7028000
