

# 🧠 PySpark Column Syntax Cheat Sheet — Reference, Transform, Alias

PySpark offers several ways to reference and manipulate columns, depending on whether you are selecting, transforming, filtering, or aliasing. This guide collects the most common patterns in one place.

---

## 🔹 1. Dot Notation

```python
df.colName
df.salary
```

- Simple and readable.
- Works only if `colName` is a valid Python identifier (no spaces or special characters) and does not collide with an existing DataFrame attribute or method such as `count` or `filter`.

---

## 🔹 2. Bracket Notation

```python
df["colName"]
df["salary"]
```

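- More flexible: works for names that are not valid Python identifiers, such as names with spaces or special characters.
- A name that contains a literal dot must be wrapped in backticks (``df["`col.name`"]``); an unquoted dot is read as access to a field inside a struct column.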

---

## 🔹 3. `col()` Function

```python
from pyspark.sql.functions import col

col("salary")         # top-level column
col("employee.name")  # field `name` inside a struct column `employee`
```

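- Not bound to a particular DataFrame, which makes it convenient for reusable expressions in transformations and filters.
- Returns a `Column`, so methods like `.alias()`, `.cast()`, and `.substr()` can be chained (the same is true of `df.colName` and `df["colName"]`).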

---

## 🔹 4. `df.selectExpr()` with SQL Expressions

```python
df.selectExpr("salary * 1.1 as updated_salary", "name")
```

- Convenient for inline calculations and aliasing in a single string.
- Accepts SQL expression strings; it is a variant of `select()` that parses each argument as SQL.

---

## 🔹 5. `df.select()` with Column Objects

```python
df.select(col("salary").alias("updated_salary"), col("name"))
```

- Explicit and readable.
- Great for chaining transformations.

---

## 🔹 6. `df.withColumn()` for Adding or Modifying Columns

```python
df.withColumn("bonus", col("salary") * 0.1)
```

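- Returns a new DataFrame; the column is added, or replaced if a column with that name already exists.
- Common in pipelines, but each call adds a projection to the query plan. When adding many columns, a single `select()` keeps the plan smaller.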

---

## 🔹 7. `df["colName"].alias("newName")`

```python
df.select(df["salary"].alias("updated_salary"))
```

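- Combines bracket notation with aliasing.
- Handy in joins, where a DataFrame-bound reference disambiguates columns that share a name, and for renaming inside a `select()`.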

---

## 🔹 8. SQL via Temp View

```python
df.createOrReplaceTempView("Employee")
spark.sql("SELECT name, salary FROM Employee WHERE salary > 50000")
```

- Ideal for SQL lovers.
- Column access via SQL syntax.

---

## 🔹 9. Nested Struct Columns

```python
df.select("employee.name", "employee.salary")
col("employee.name")
```

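- For accessing fields inside structs.
- Use a dotted path, either as a plain string in `select()` or inside `col()`; `col("employee").getField("name")` is an equivalent explicit form.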

---

## 🧪 Bonus: Column Functions

You can apply transformations directly:

```python
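col("name").substr(1, 3)      # first three characters (start position is 1-based)
col("salary").cast("double")  # convert the column to double
col("name").like("J%")        # SQL LIKE pattern: names starting with "J"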
```

---

