## 2041 - Accepted Candidates From the Interviews
### Table: Candidates

| Column Name  | Type    |
|--------------|---------|
| candidate_id | int     |
| name         | varchar |
| years_of_exp | int     |
| interview_id | int     |

candidate_id is the primary key column for this table.  
Each row of this table indicates the name of a candidate, their number of years of experience, and their interview ID.

---

### Table: Rounds

| Column Name  | Type |
|--------------|------|
| interview_id | int  |
| round_id     | int  |
| score        | int  |

(interview_id, round_id) is the primary key (combination of columns) for this table.  
Each row of this table indicates the score of one round of an interview.

---

### Problem Statement

Write an SQL query to report the IDs of the candidates who have at least two years of experience and whose interview rounds have a total score strictly greater than 15.

Return the result table in any order.

---

### Example 1:

#### Input:

##### Candidates table:

| candidate_id | name    | years_of_exp | interview_id |
|--------------|---------|--------------|--------------|
| 11           | Atticus | 1            | 101          |
| 9            | Ruben   | 6            | 104          |
| 6            | Aliza   | 10           | 109          |
| 8            | Alfredo | 0            | 107          |

##### Rounds table:

| interview_id | round_id | score |
|--------------|----------|-------|
| 109          | 3        | 4     |
| 101          | 2        | 8     |
| 109          | 4        | 1     |
| 107          | 1        | 3     |
| 104          | 3        | 6     |
| 109          | 1        | 4     |
| 104          | 4        | 7     |
| 104          | 1        | 2     |
| 109          | 2        | 1     |
| 104          | 2        | 7     |
| 107          | 2        | 3     |
| 101          | 1        | 8     |

---

#### Output:

| candidate_id |
|--------------|
| 9            |

---

### Explanation:

- Candidate 11: The total score is 16, and they have one year of experience. We do not include them in the result table because of their years of experience.  
- Candidate 9: The total score is 22, and they have six years of experience. We include them in the result table.  
- Candidate 6: The total score is 10, and they have ten years of experience. We do not include them in the result table because their total score is not strictly greater than 15.  
- Candidate 8: The total score is 6, and they have zero years of experience. We do not include them in the result table because of both their years of experience and their total score.
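The totals in the explanation can be re-derived with a few lines of plain Python. This is a minimal sketch that hard-codes the example rows; the names `candidates`, `rounds`, `totals`, and `accepted` are illustrative and not part of the problem. It sums scores per `interview_id` first, then applies both conditions to each candidate.

```python
from collections import defaultdict

# Example rows copied from the tables above
candidates = [  # (candidate_id, name, years_of_exp, interview_id)
    (11, "Atticus", 1, 101),
    (9, "Ruben", 6, 104),
    (6, "Aliza", 10, 109),
    (8, "Alfredo", 0, 107),
]
rounds = [  # (interview_id, round_id, score)
    (109, 3, 4), (101, 2, 8), (109, 4, 1), (107, 1, 3),
    (104, 3, 6), (109, 1, 4), (104, 4, 7), (104, 1, 2),
    (109, 2, 1), (104, 2, 7), (107, 2, 3), (101, 1, 8),
]

# Sum the round scores for each interview
totals = defaultdict(int)
for interview_id, _round_id, score in rounds:
    totals[interview_id] += score

# Keep candidates with at least 2 years of experience and a total strictly above 15
accepted = [
    cid
    for cid, _name, years, interview_id in candidates
    if years >= 2 and totals[interview_id] > 15
]

for cid, name, years, interview_id in candidates:
    print(cid, name, years, totals[interview_id])
print(accepted)  # [9]
```

The printed totals (16, 22, 10, 6) match the four explanation bullets, and only candidate 9 passes both checks.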

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Define schemas
candidates_schema = StructType([
    StructField("candidate_id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("years_of_exp", IntegerType(), True),
    StructField("interview_id", IntegerType(), True)
])

rounds_schema = StructType([
    StructField("interview_id", IntegerType(), True),
    StructField("round_id", IntegerType(), True),
    StructField("score", IntegerType(), True)
])

# Sample data
candidates_data = [
    (11, "Atticus", 1, 101),
    (9, "Ruben", 6, 104),
    (6, "Aliza", 10, 109),
    (8, "Alfredo", 0, 107)
]

rounds_data = [
    (109, 3, 4),
    (101, 2, 8),
    (109, 4, 1),
    (107, 1, 3),
    (104, 3, 6),
    (109, 1, 4),
    (104, 4, 7),
    (104, 1, 2),
    (109, 2, 1),
    (104, 2, 7),
    (107, 2, 3),
    (101, 1, 8)
]

# Create DataFrames
candidates_df = spark.createDataFrame(candidates_data, schema=candidates_schema)
rounds_df = spark.createDataFrame(rounds_data, schema=rounds_schema)

# Register temp views
candidates_df.createOrReplaceTempView("Candidates")
rounds_df.createOrReplaceTempView("Rounds")
```


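```python
# Use the F namespace instead of a star import, which would shadow Python's built-in sum()
from pyspark.sql import functions as F

# Rename the join keys so the two interview_id columns do not collide
c_df = candidates_df.selectExpr(
    "candidate_id as c_id", "name", "years_of_exp", "interview_id as c_i_id"
)
r_df = rounds_df.selectExpr("interview_id as r_i_id", "round_id", "score")

(
    c_df.join(r_df, F.col("c_i_id") == F.col("r_i_id"), "left")
    .filter(F.col("years_of_exp") >= 2)         # at least two years of experience
    .groupBy("c_id")
    .agg(F.sum("score").alias("total_score"))   # total score across all rounds
    .filter(F.col("total_score") > 15)          # strictly greater than 15
    .selectExpr("c_id as candidate_id")
    .display()
)
```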

```sql
%sql
SELECT c.candidate_id
FROM Candidates c
LEFT JOIN Rounds r
  ON c.interview_id = r.interview_id
WHERE c.years_of_exp >= 2
GROUP BY c.candidate_id
HAVING SUM(r.score) > 15
```
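Because this query uses only standard SQL, it can be sanity-checked outside Databricks. The sketch below is not part of the original notebook: it assumes SQLite (via Python's built-in `sqlite3` module) behaves the same as Spark SQL for this simple `LEFT JOIN ... GROUP BY ... HAVING` query, loads the example rows into an in-memory database, and runs the query without the Databricks-specific `%sql` magic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Candidates (candidate_id INTEGER PRIMARY KEY, name TEXT, "
    "years_of_exp INTEGER, interview_id INTEGER)"
)
conn.execute(
    "CREATE TABLE Rounds (interview_id INTEGER, round_id INTEGER, score INTEGER, "
    "PRIMARY KEY (interview_id, round_id))"
)
conn.executemany(
    "INSERT INTO Candidates VALUES (?, ?, ?, ?)",
    [(11, "Atticus", 1, 101), (9, "Ruben", 6, 104),
     (6, "Aliza", 10, 109), (8, "Alfredo", 0, 107)],
)
conn.executemany(
    "INSERT INTO Rounds VALUES (?, ?, ?)",
    [(109, 3, 4), (101, 2, 8), (109, 4, 1), (107, 1, 3),
     (104, 3, 6), (109, 1, 4), (104, 4, 7), (104, 1, 2),
     (109, 2, 1), (104, 2, 7), (107, 2, 3), (101, 1, 8)],
)

# WHERE filters individual rows before grouping; HAVING filters whole groups after SUM
query = """
SELECT c.candidate_id
FROM Candidates c
LEFT JOIN Rounds r ON c.interview_id = r.interview_id
WHERE c.years_of_exp >= 2
GROUP BY c.candidate_id
HAVING SUM(r.score) > 15
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # [9]
```

The experience check belongs in `WHERE` because it is a per-row condition, while the score check must go in `HAVING` because it depends on the aggregated `SUM`.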

```python
# Pre-aggregate each interview's total score, then join it to the candidates
# and apply both conditions in a single WHERE clause
query = """
SELECT c.candidate_id
FROM Candidates c
JOIN (
    SELECT interview_id, SUM(score) AS total_score
    FROM Rounds
    GROUP BY interview_id
) r ON c.interview_id = r.interview_id
WHERE c.years_of_exp >= 2 AND r.total_score > 15
"""

result_df = spark.sql(query)
display(result_df)
```
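The pre-aggregated subquery and the join-then-group query should return the same IDs. One way to check this is to run both in SQLite, again assuming its behavior matches Spark SQL for these queries. The sketch adds one hypothetical candidate (ID 99, named "Hypothetical", with no recorded rounds) to illustrate an edge case: the inner `JOIN` drops that candidate outright, while the `LEFT JOIN` version keeps the row but `SUM` over no scores is `NULL`, and `NULL > 15` is not true, so both queries exclude it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Candidates (candidate_id INTEGER, name TEXT, years_of_exp INTEGER, interview_id INTEGER)")
conn.execute("CREATE TABLE Rounds (interview_id INTEGER, round_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO Candidates VALUES (?, ?, ?, ?)",
    [
        (11, "Atticus", 1, 101),
        (9, "Ruben", 6, 104),
        (6, "Aliza", 10, 109),
        (8, "Alfredo", 0, 107),
        (99, "Hypothetical", 5, 999),  # hypothetical candidate with no rounds
    ],
)
conn.executemany(
    "INSERT INTO Rounds VALUES (?, ?, ?)",
    [(109, 3, 4), (101, 2, 8), (109, 4, 1), (107, 1, 3),
     (104, 3, 6), (109, 1, 4), (104, 4, 7), (104, 1, 2),
     (109, 2, 1), (104, 2, 7), (107, 2, 3), (101, 1, 8)],
)

# Join every round, then aggregate per candidate
group_then_filter = """
SELECT c.candidate_id
FROM Candidates c
LEFT JOIN Rounds r ON c.interview_id = r.interview_id
WHERE c.years_of_exp >= 2
GROUP BY c.candidate_id
HAVING SUM(r.score) > 15
"""

# Aggregate per interview first, then join one total per candidate
pre_aggregated = """
SELECT c.candidate_id
FROM Candidates c
JOIN (
    SELECT interview_id, SUM(score) AS total_score
    FROM Rounds
    GROUP BY interview_id
) r ON c.interview_id = r.interview_id
WHERE c.years_of_exp >= 2 AND r.total_score > 15
"""

a = sorted(row[0] for row in conn.execute(group_then_filter))
b = sorted(row[0] for row in conn.execute(pre_aggregated))
print(a, b)  # [9] [9]
```

Pre-aggregating shrinks `Rounds` to one row per interview before the join, which can reduce the size of the join on large tables; with this tiny example the two plans are interchangeable.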