Final stretch. This part focuses on **production readiness + performance + internals**, exactly what **senior interviews and real projects** test.

---

# ✅ PySpark Top 100 Methods: **PART 4 (76–100)**

**Category: Writing Data, File Formats, Partitioning, Performance, Internals, Streaming**

---

## 7️⃣6️⃣ `df.write`

### **What**

Entry point for writing data.

### **Why**

Persists DataFrame to storage.

In [None]:
df.write.parquet("output/")

---

## 7️⃣7️⃣ `df.write.format()`

### **What**

Explicit output format.

### **Why**

Production pipelines require explicit formats.

In [None]:
df.write.format("parquet").save("s3://bucket/path")

---

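## 7️⃣8️⃣ `mode()`

### **What**

Sets what happens when the output path or table already exists.

| Mode                  | Meaning                               |
| --------------------- | ------------------------------------- |
| overwrite             | Replace existing data                 |
| append                | Add to existing data                  |
| ignore                | Skip the write if data already exists |
| error / errorifexists | Fail if data already exists (default) |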

In [None]:
df.write.mode("overwrite").parquet("out/")
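
A minimal sketch of passing the save mode explicitly, assuming `df` is an existing DataFrame. The helper name `write_parquet` and the `VALID_MODES` set are hypothetical and only illustrate the table above; `errorifexists` is the long form of `error`.

```python
# Hypothetical helper: check the save mode before handing it to Spark.
VALID_MODES = {"overwrite", "append", "ignore", "error", "errorifexists"}


def write_parquet(df, path, mode="errorifexists"):
    """Write df to path as Parquet using an explicit save mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported save mode: {mode!r}")
    # mode() returns the DataFrameWriter, so the calls chain.
    df.write.mode(mode).parquet(path)
```

Calling `write_parquet(df, "out/", mode="append")` adds new files next to the existing ones, while the default fails fast if `out/` already holds data.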

---

## 7️⃣9️⃣ `partitionBy()`

### **What**

Creates directory-based partitions.

### **Why**

Enables **partition pruning**.

In [None]:
df.write.partitionBy("year", "month").parquet("out/")

### **Interview**

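> Too many partitions = small file problem. Partitioning on a high-cardinality column produces many tiny files, which slows file listing and reads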
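
A short sketch of how a partitioned write pairs with a pruned read, assuming the `year` and `month` columns from the example above. The function names are hypothetical. The write produces directories such as `out/year=2024/month=1/`, and a filter on those columns gives Spark the chance to skip directories that cannot match.

```python
def write_partitioned(df, path):
    """Write one directory level per partition column, e.g. path/year=2024/month=1/."""
    df.write.mode("overwrite").partitionBy("year", "month").parquet(path)


def read_one_month(spark, path, year, month):
    """Filter on the partition columns so non-matching directories can be skipped."""
    condition = f"year = {int(year)} AND month = {int(month)}"
    return spark.read.parquet(path).where(condition)
```

Running `read_one_month(spark, "out/", 2024, 1).explain()` is one way to check whether the scan lists partition filters.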

---

## 8️⃣0️⃣ `bucketBy()`

### **What**

Hashes rows into a fixed number of buckets by column value.

### **Why**

Can reduce shuffles in joins & aggregations on the bucketed column.

In [None]:
df.write.bucketBy(10, "user_id").sortBy("user_id").saveAsTable("users")
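
Bucketing is only supported together with `saveAsTable()`, not with path-based writes such as `parquet("out/")`. The sketch below wraps the call above in a hypothetical helper, `write_bucketed`, so that both sides of a frequent join can be written with the same key and bucket count.

```python
def write_bucketed(df, table_name, key, num_buckets=10):
    """Save df as a table bucketed and sorted by key."""
    (
        df.write
        .mode("overwrite")
        .bucketBy(num_buckets, key)  # rows are hashed on key into num_buckets files
        .sortBy(key)                 # rows are sorted by key within each bucket
        .saveAsTable(table_name)     # bucket metadata is stored with the table
    )
```

For example, writing `users` and `orders` with `write_bucketed(..., key="user_id")` and the same `num_buckets` may let Spark join them on `user_id` without a full shuffle.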

---

## 8️⃣1️⃣ `saveAsTable()`

### **What**

Saves the DataFrame as a table registered in the metastore (the Hive metastore when Hive support is enabled).

### **Use Case**

Spark SQL + BI tools.

---

## 8️⃣2️⃣ `insertInto()`

### **What**

Inserts into existing table.

### **Interview**

> Columns are matched by position, not by name, so column order and types must line up with the target table
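
Because `insertInto()` resolves columns by position, a DataFrame with the right names in the wrong order can silently load values into the wrong columns. A minimal sketch of one way to guard against that, assuming the target table exists and `df` has every column it needs; the helper name is hypothetical.

```python
def insert_by_table_order(spark, df, table_name):
    """Reorder df's columns to match the target table, then append."""
    # Read the column order from the existing table definition.
    target_columns = spark.table(table_name).columns
    # select() fails early if df lacks any of the target columns.
    df.select(*target_columns).write.insertInto(table_name)
```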

---

## 8️⃣3️⃣ `parquet()`

### **What**

Columnar storage.

### **Why**

Best default format for Spark.

### **Interview**

> Supports predicate pushdown + compression

---

## 8️⃣4️⃣ `orc()`

### **What**

Columnar format that originated in the Hive ecosystem.

---

## 8️⃣5️⃣ `json()`

### **What**

Semi-structured output.

### **Downside**

❌ No schema enforcement

---

## 8️⃣6️⃣ `csv()`

### **What**

Text-based format.

### **Interview**

> Avoid for large-scale analytics

---

## 8️⃣7️⃣ `option("compression")`

### **What**

Compression type.

In [None]:
df.write.option("compression", "snappy").parquet("out/")

---

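## 8️⃣8️⃣ `checkpoint()`

### **What**

Truncates lineage by saving the DataFrame to the checkpoint directory.

### **Why**

Keeps long or iterative jobs from building very deep DAGs, which can slow planning and cause failures.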
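
A minimal sketch of the usual checkpoint sequence, assuming `spark` is an active session and `checkpoint_dir` is a path on reliable storage. The helper name `truncate_lineage` is hypothetical.

```python
def truncate_lineage(spark, df, checkpoint_dir):
    """Materialize df under checkpoint_dir and return a DataFrame with a short plan."""
    # A checkpoint directory must be set before checkpoint() is called.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # checkpoint() is eager by default and returns a new DataFrame;
    # keep using the returned object, not the original df.
    return df.checkpoint(eager=True)
```

In an iterative job this would typically be called every few iterations, for example `df = truncate_lineage(spark, df, "/tmp/checkpoints")`, where the path is a placeholder.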

---

## 8️⃣9️⃣ `unpersist()`

### **What**

Removes cached data.

In [None]:
df.unpersist()
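
Cached data stays in executor memory until it is released or evicted, so it helps to pair every `cache()` with an `unpersist()`. One possible pattern is a small context manager; the name `cached` is hypothetical and not part of PySpark.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache df for the duration of a with-block, then release it."""
    df.cache()
    try:
        yield df
    finally:
        # Runs even if the block raises, so the cache is always released.
        df.unpersist()
```

Usage would look like `with cached(df) as hot: hot.count()`, with any further actions on `hot` inside the block reusing the cached data.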

---

## 9️⃣0️⃣ `spark.sql()`

### **What**

Run SQL queries.

### **Why**

Easy migration from RDBMS.

In [None]:
spark.sql("SELECT * FROM table")

---

## 9️⃣1️⃣ `createOrReplaceTempView()`

### **What**

Temporary SQL view.

In [None]:
df.createOrReplaceTempView("emp")

---

## 9️⃣2️⃣ `explain("formatted")`

### **What**

Detailed execution plan.

### **Interview**

> Understand **Scan → Filter → Exchange → Aggregate**

---

## 9️⃣3️⃣ `spark.conf.set()`

### **What**

Set Spark configs.

In [None]:
spark.conf.set("spark.sql.shuffle.partitions", 200)

---

## 9️⃣4️⃣ `spark.sql.shuffle.partitions`

### **What**

Default shuffle partitions (200).

### **Interview**

> Tune this to data volume: 200 is often too many for small datasets and too few for very large shuffles

---

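## 9️⃣5️⃣ `spark.sql.adaptive.enabled`

### **What**

Adaptive Query Execution (AQE), enabled by default since Spark 3.2.

### **Why**

Re-optimizes the plan at runtime using shuffle statistics, e.g. coalescing small partitions and handling skewed joins.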
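
The shuffle and AQE settings are often set together at the start of a job. A minimal sketch, assuming `spark` is an active session; the helper name `configure_shuffle` and its defaults are hypothetical, while the three configuration keys are standard Spark SQL settings.

```python
def configure_shuffle(spark, shuffle_partitions=200, adaptive=True):
    """Set the shuffle partition count and toggle AQE for this session."""
    flag = str(adaptive).lower()  # Spark expects "true" / "false"
    spark.conf.set("spark.sql.shuffle.partitions", str(shuffle_partitions))
    spark.conf.set("spark.sql.adaptive.enabled", flag)
    # With AQE on, small shuffle partitions can be merged at runtime.
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", flag)
```

With AQE and coalescing enabled, the partition count acts more like a starting point than a fixed value, so a somewhat generous setting is usually less risky than a low one.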

---

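## 9️⃣6️⃣ `spark.readStream`

### **What**

Streaming source.

In [None]:
# The Kafka source needs a broker list and a topic; both values here are placeholders.
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "events").load()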

---

## 9️⃣7️⃣ `writeStream`

### **What**

Streaming sink.

In [None]:
query = df.writeStream.format("console").start()

---

## 9️⃣8️⃣ `trigger()`

### **What**

Controls micro-batch timing.
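
A minimal sketch of a fixed-interval trigger, assuming `df` is a streaming DataFrame. The helper name and the 10-second default are hypothetical; `processingTime` is the keyword PySpark uses for interval-based micro-batches.

```python
def start_console_query(df, interval="10 seconds"):
    """Start a console sink that runs one micro-batch per interval."""
    return (
        df.writeStream
        .format("console")
        .trigger(processingTime=interval)  # e.g. "10 seconds", "1 minute"
        .start()
    )
```

Without a trigger, Spark starts the next micro-batch as soon as the previous one finishes. `trigger(once=True)` instead processes the available data in a single batch and then stops.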

---

## 9️⃣9️⃣ `foreachBatch()`

### **What**

Custom batch processing.

### **Use Case**

Streaming → DB writes.
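
A sketch of the streaming-to-database pattern, assuming a JDBC target. The function names, and the `url`, `table` and `checkpoint_dir` arguments, are hypothetical; a real job would usually also need driver and credential options.

```python
def make_jdbc_writer(url, table):
    """Build a foreachBatch callback that appends each micro-batch over JDBC."""
    def write_batch(batch_df, batch_id):
        # batch_df is a regular (non-streaming) DataFrame, so the batch
        # writer API is available. batch_id can be used for deduplication.
        (
            batch_df.write
            .format("jdbc")
            .option("url", url)
            .option("dbtable", table)
            .mode("append")
            .save()
        )
    return write_batch


def start_db_sink(stream_df, url, table, checkpoint_dir):
    """Start a streaming query that hands every micro-batch to the JDBC writer."""
    return (
        stream_df.writeStream
        .foreachBatch(make_jdbc_writer(url, table))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A batch can be retried after a failure, so the callback should be idempotent if duplicate rows in the target table would be a problem.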

---

## 🔟0️⃣ `awaitTermination()`

### **What**

Blocks the calling thread until the streaming query stops or fails, which keeps the application alive.

In [None]:
query.awaitTermination()
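
`awaitTermination()` also accepts a timeout in seconds and then returns whether the query ended within that time. A minimal sketch of a bounded run, assuming `query` is the object returned by `start()`; the helper name `run_for` is hypothetical.

```python
def run_for(query, seconds):
    """Block for up to `seconds`, then stop the query if it is still running."""
    finished = query.awaitTermination(seconds)
    if not finished:
        # The timeout expired while the query was still active.
        query.stop()
    return finished
```

This shape can be handy in tests and notebooks, where an unbounded `awaitTermination()` would block the cell indefinitely.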

---

# 🎯 FULL INTERVIEW READINESS ACHIEVED

You now have:

* **Top 100 PySpark methods**
* **What / Why / How / Scenario**
* **Performance + internals**
* **Production & streaming**

---

## 🚀 Next-Level Add-Ons (Highly Recommended)

I can now help you with:

1. **PySpark vs SQL mapping (method → SQL)**
2. **50 real interview scenarios with answers**
3. **Spark DAG & Catalyst optimizer deep dive**
4. **Hands-on mini projects per category**
5. **PySpark performance tuning checklist**

Just tell me **what you want next**.