# Unfinished Parts

## Tesla SQL Interview Question

### Question

Tesla is investigating production bottlenecks and they need your help to extract the relevant data.  
Write a query to determine which parts have begun the assembly process but are not yet finished.

---

### Assumptions:

- The `parts_assembly` table contains all parts currently in production, each at a different stage of the assembly process.
- An unfinished part is one that lacks a `finish_date`.

The question is straightforward, so the solution can be equally simple: keep only the rows that have no `finish_date`.

---

### Table: `parts_assembly`

| Column Name    | Type     |
|----------------|----------|
| part           | string   |
| finish_date    | datetime |
| assembly_step  | integer  |

---

### Example Input for `parts_assembly` Table:

| part    | finish_date           | assembly_step |
|---------|-----------------------|---------------|
| battery | 01/22/2022 00:00:00   | 1             |
| battery | 02/22/2022 00:00:00   | 2             |
| battery | 03/22/2022 00:00:00   | 3             |
| bumper  | 01/22/2022 00:00:00   | 1             |
| bumper  | 02/22/2022 00:00:00   | 2             |
| bumper  | NULL                  | 3             |
| bumper  | NULL                  | 4             |

---

### Example Output:

| part   | assembly_step |
|--------|---------------|
| bumper | 3             |
| bumper | 4             |

---

### Explanation

The bumper rows at **step 3** and **step 4** are the only ones that remain unfinished, since they have no recorded `finish_date`.
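Since the question asks for a query, here is a minimal sketch of the SQL itself, run against the example input. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in engine. That choice, the `TEXT` storage for `finish_date`, and the `ORDER BY` clause are assumptions made for illustration and are not part of the original problem.

```python
import sqlite3

# In-memory SQLite database as a stand-in SQL engine (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts_assembly (part TEXT, finish_date TEXT, assembly_step INTEGER)"
)

# Example input from the table above; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO parts_assembly VALUES (?, ?, ?)",
    [
        ("battery", "2022-01-22 00:00:00", 1),
        ("battery", "2022-02-22 00:00:00", 2),
        ("battery", "2022-03-22 00:00:00", 3),
        ("bumper", "2022-01-22 00:00:00", 1),
        ("bumper", "2022-02-22 00:00:00", 2),
        ("bumper", None, 3),
        ("bumper", None, 4),
    ],
)

# A part is unfinished when its finish_date is NULL.
# ORDER BY is added only to make the result order deterministic.
query = """
    SELECT part, assembly_step
    FROM parts_assembly
    WHERE finish_date IS NULL
    ORDER BY part, assembly_step
"""
unfinished = conn.execute(query).fetchall()
print(unfinished)  # [('bumper', 3), ('bumper', 4)]

conn.close()
```

Note the use of `IS NULL` rather than `= NULL`: in SQL, a comparison with `NULL` never evaluates to true, so `finish_date = NULL` would return no rows.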


In [3]:
from datetime import datetime

from pyspark.sql import SparkSession

# Create a local Spark session
spark = SparkSession.builder.master('local[1]').getOrCreate()
sc = spark.sparkContext

# Rows of the parts_assembly table as (part, finish_date, assembly_step) tuples.
# Note: despite its name, `df` is an RDD; toDF() is called only to display it.
df = sc.parallelize([
    ("battery", datetime(2022, 1, 22, 0, 0), 1),
    ("battery", datetime(2022, 2, 22, 0, 0), 2),
    ("battery", datetime(2022, 3, 22, 0, 0), 3),
    ("bumper", datetime(2022, 1, 22, 0, 0), 1),
    ("bumper", datetime(2022, 2, 22, 0, 0), 2),
    ("bumper", None, 3),
    ("bumper", None, 4)
])

# Show the rows as a DataFrame (columns get the default names _1, _2, _3)
df.toDF().show(truncate=False)


+-------+-------------------+---+
|_1     |_2                 |_3 |
+-------+-------------------+---+
|battery|2022-01-22 00:00:00|1  |
|battery|2022-02-22 00:00:00|2  |
|battery|2022-03-22 00:00:00|3  |
|bumper |2022-01-22 00:00:00|1  |
|bumper |2022-02-22 00:00:00|2  |
|bumper |NULL               |3  |
|bumper |NULL               |4  |
+-------+-------------------+---+



In [None]:
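# Keep rows whose finish_date (index 1) is missing,
# then project each row to (part, assembly_step).
df2 = (
    df
    .filter(lambda x: x[1] is None)
    .map(lambda x: (x[0], x[2]))
)

df2.toDF(['part', 'assembly_step']).show()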



+------+-------------+
|  part|assembly_step|
+------+-------------+
|bumper|            3|
|bumper|            4|
+------+-------------+


