## 1699. Number of Calls Between Two Persons
### Table: Calls

| Column Name | Type |
|-------------|------|
| from_id     | int  |
| to_id       | int  |
| duration    | int  |

This table does not have a primary key, so it may contain duplicate rows.  
This table contains the duration of a phone call between from_id and to_id.  
from_id != to_id

Write an SQL query to report the number of calls and the total call duration between each pair of distinct persons (person1, person2) where person1 < person2.

Return the result table in any order.

---

### Calls table:

| from_id | to_id | duration |
|---------|-------|----------|
| 1       | 2     | 59       |
| 2       | 1     | 11       |
| 1       | 3     | 20       |
| 3       | 4     | 100      |
| 3       | 4     | 200      |
| 3       | 4     | 200      |
| 4       | 3     | 499      |

---

### Result table:

| person1 | person2 | call_count | total_duration |
|---------|---------|------------|----------------|
| 1       | 2       | 2          | 70             |
| 1       | 3       | 1          | 20             |
| 3       | 4       | 4          | 999            |

---

**Explanation:**  
Users 1 and 2 had 2 calls and the total duration is 70 (59 + 11).  
Users 1 and 3 had 1 call and the total duration is 20.  
Users 3 and 4 had 4 calls and the total duration is 999 (100 + 200 + 200 + 499).
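
The expected result above can be reproduced without Spark as a quick sanity check. The following is a minimal plain-Python sketch, not part of the original notebook; the names `calls` and `summarize_calls` are hypothetical and chosen only for illustration. It orders each pair with `min`/`max` so that calls in either direction land in the same group.

```python
from collections import defaultdict

# Sample rows copied from the Calls table above: (from_id, to_id, duration)
calls = [
    (1, 2, 59),
    (2, 1, 11),
    (1, 3, 20),
    (3, 4, 100),
    (3, 4, 200),
    (3, 4, 200),
    (4, 3, 499),
]


def summarize_calls(rows):
    """Return sorted (person1, person2, call_count, total_duration) tuples."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [call_count, total_duration]
    for from_id, to_id, duration in rows:
        # Order the pair so that person1 < person2, regardless of call direction
        pair = (min(from_id, to_id), max(from_id, to_id))
        totals[pair][0] += 1
        totals[pair][1] += duration
    return sorted((p1, p2, count, total) for (p1, p2), (count, total) in totals.items())


result = summarize_calls(calls)
for row in result:
    print(row)
```

Duplicate rows such as the two `(3, 4, 200)` entries are counted separately, which matches the statement that the table may contain duplicates.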

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType
from pyspark.sql.functions import col, when, least, greatest

# Start Spark session
spark = SparkSession.builder.appName("CallAggregation").getOrCreate()

# Define schema
schema = StructType([
    StructField("from_id", IntegerType(), True),
    StructField("to_id", IntegerType(), True),
    StructField("duration", IntegerType(), True)
])

# Sample data
data = [
    (1, 2, 59),
    (2, 1, 11),
    (1, 3, 20),
    (3, 4, 100),
    (3, 4, 200),
    (3, 4, 200),
    (4, 3, 499)
]

# Create DataFrame
df = spark.createDataFrame(data, schema)
df.createOrReplaceTempView("Calls")


In [0]:
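from pyspark.sql.functions import col, count, when
from pyspark.sql.functions import sum as spark_sum  # avoid shadowing Python's built-in sum

# Approach 1: order each pair with when/otherwise so that person1 < person2
df_result = (
    df.withColumn("person1", when(col("from_id") < col("to_id"), col("from_id")).otherwise(col("to_id")))
      .withColumn("person2", when(col("from_id") < col("to_id"), col("to_id")).otherwise(col("from_id")))
      .select("person1", "person2", "duration")
)

# Count calls and total the duration for each ordered pair
# (display() is a Databricks notebook helper; use .show() elsewhere)
df_result.groupBy("person1", "person2").agg(
    count("*").alias("call_count"),
    spark_sum("duration").alias("total_duration"),
).display()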

In [0]:

# Create normalized pairs where person1 < person2
normalized_df = df.select(
    least(col("from_id"), col("to_id")).alias("person1"),
    greatest(col("from_id"), col("to_id")).alias("person2"),
    col("duration")
)
normalized_df.createOrReplaceTempView("NormalizedCalls")



In [0]:
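# SQL logic: aggregate over the normalized pairs so calls in both directions are grouped together
query = """
SELECT
    person1,
    person2,
    COUNT(*) AS call_count,
    SUM(duration) AS total_duration
FROM NormalizedCalls
GROUP BY person1, person2
"""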

# Execute and display
result = spark.sql(query)
display(result)
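
The pair-normalizing SQL can also be checked outside Spark. The sketch below is an illustration only and assumes Python's built-in `sqlite3` module as a stand-in engine. It uses `CASE` expressions to order each pair, since that form is portable across SQL dialects; the subquery alias `normalized` is a hypothetical name.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark temp view (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calls (from_id INTEGER, to_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO Calls VALUES (?, ?, ?)",
    [(1, 2, 59), (2, 1, 11), (1, 3, 20), (3, 4, 100), (3, 4, 200), (3, 4, 200), (4, 3, 499)],
)

# Order each pair in a subquery, then aggregate over the ordered pair
query = """
SELECT person1, person2, COUNT(*) AS call_count, SUM(duration) AS total_duration
FROM (
    SELECT
        CASE WHEN from_id < to_id THEN from_id ELSE to_id END AS person1,
        CASE WHEN from_id < to_id THEN to_id ELSE from_id END AS person2,
        duration
    FROM Calls
) AS normalized
GROUP BY person1, person2
ORDER BY person1, person2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Grouping on the raw `from_id` and `to_id` columns instead would split the calls between 1 and 2 into two rows, one per direction, which is why the pair is ordered before aggregating.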