In PySpark, you can apply multiple conditions in a single filter() call (where() is an alias) by combining column expressions with the operators & (and), | (or), and ~ (not). Python's own and, or, and not keywords do not work on Column objects.

Here’s how to modify your example to filter on two or more columns:

# Example: Select columns and filter rows using multiple conditions

In [2]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

In [3]:
# Start Spark session
spark = SparkSession.builder.appName("FilterExample").getOrCreate()

In [4]:
# Sample data
data = [
    ("Alice", 30, "NY"),
    ("Bob", 25, "CA"),
    ("Charlie", 35, "NY"),
    ("David", 28, "TX"),
]

In [5]:
# Create DataFrame
df = spark.createDataFrame(data, ["Name", "Age", "State"])

In [6]:
# Select columns and filter with multiple conditions
filtered_df = df.select("Name", "Age", "State").filter(
    (col("Age") > 26) & (col("State") == "NY")
)

# Show result
filtered_df.show()


+-------+---+-----+
|   Name|Age|State|
+-------+---+-----+
|  Alice| 30|   NY|
|Charlie| 35|   NY|
+-------+---+-----+



### Explanation:
col("Age") > 26: checks if age is greater than 26.

col("State") == "NY": checks if state is NY.

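&: the AND operator; a row is kept only if both conditions are true. Wrap each condition in its own parentheses, because Python's & binds more tightly than comparison operators such as > and ==.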
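
To see why the parentheses matter, the sketch below uses Python's standard ast module to show how the two spellings are parsed. It needs no Spark session; age and state are placeholder names that are only parsed, never evaluated.

```python
import ast

# Without parentheses, Python groups the bitwise & before the comparisons.
unparenthesized = ast.parse('age > 26 & state == "NY"', mode="eval").body

# With parentheses, & combines two complete comparisons.
parenthesized = ast.parse('(age > 26) & (state == "NY")', mode="eval").body

# The unparenthesized form is a chained comparison: age > (26 & state) == "NY".
top_level_unparenthesized = type(unparenthesized).__name__
middle_operand = ast.unparse(unparenthesized.comparators[0])

# The parenthesized form is a single & whose two sides are comparisons.
top_level_parenthesized = type(parenthesized).__name__

print(top_level_unparenthesized)  # Compare
print(middle_operand)             # 26 & state
print(top_level_parenthesized)    # BinOp
```

In the unparenthesized form, 26 & state is grouped first and the result is a chained comparison, not an AND of two conditions. With real Column objects this typically surfaces as an error about converting a Column to a bool rather than as a silently wrong filter.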

### Other operators you can use:
|: OR

~: NOT

## Example using OR

In [7]:
df.filter((col("Age") > 30) | (col("State") == "TX")).show()

+-------+---+-----+
|   Name|Age|State|
+-------+---+-----+
|Charlie| 35|   NY|
|  David| 28|   TX|
+-------+---+-----+



## Stop the SparkSession

In [None]:
spark.stop()