# Teams Power Users

## Microsoft SQL Interview Question

### Question

Write a query to identify the top 2 Power Users who sent the highest number of messages on Microsoft Teams in August 2022. Display the IDs of these 2 users along with the total number of messages each one sent, ordered by message count in descending order.

---

### Assumption:

- No two users have sent the same number of messages in August 2022.
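The assumption matters because `LIMIT 2` (or `TOP 2` in SQL Server) cuts the sorted result at exactly two rows, so a tie at the cutoff would arbitrarily drop one of the tied senders. The sketch below uses hypothetical counts (not from the example data) to contrast that cutoff with dense ranking, the behaviour SQL's `DENSE_RANK()` provides, where tied senders share a rank. The helper `top_n_with_ties` is illustrative only and is not part of the original solution.

```python
# Hypothetical August counts with a tie for second place (not from the example data)
counts = {3601: 5, 4500: 3, 8752: 3, 1200: 1}


def top_n_with_ties(counts, n):
    """Return (sender_id, count) pairs whose dense rank is <= n, highest count first."""
    distinct = sorted(set(counts.values()), reverse=True)
    # Dense rank: equal counts share a rank and the next distinct count gets the next rank
    rank = {c: i + 1 for i, c in enumerate(distinct)}
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(sender, c) for sender, c in ranked if rank[c] <= n]


# A plain "sort and take 2" keeps only one of the two senders tied at 3 messages
naive = sorted(counts.items(), key=lambda kv: -kv[1])[:2]
print(naive)                       # [(3601, 5), (4500, 3)]
print(top_n_with_ties(counts, 2))  # [(3601, 5), (4500, 3), (8752, 3)]
```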

---

### Table: `messages`

| Column Name | Type       |
|-------------|------------|
| message_id  | integer    |
| sender_id   | integer    |
| receiver_id | integer    |
| content     | varchar    |
| sent_date   | datetime   |

---

### Example Input for `messages` Table:

| message_id | sender_id | receiver_id | content                | sent_date             |
|------------|-----------|-------------|------------------------|-----------------------|
| 901        | 3601      | 4500        | You up?                | 08/03/2022 00:00:00   |
| 902        | 4500      | 3601        | Only if you're buying  | 08/03/2022 00:00:00   |
| 743        | 3601      | 8752        | Let's take this offline| 06/14/2022 00:00:00   |
| 922        | 3601      | 4500        | Get on the call        | 08/10/2022 00:00:00   |

---

### Example Output:

| sender_id | message_count |
|-----------|---------------|
| 3601      | 2             |
| 4500      | 1             |

---

### Explanation

**sender_id 3601** sent **2 messages** in August 2022 (message_ids 901 and 922). Message 743 was sent in June 2022, so it does not count.  
**sender_id 4500** sent **1 message** in August 2022 (message_id 902).
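As a cross-check of the expected output, here is a minimal plain-Python sketch (no Spark required) that applies the same logic to the four example rows: keep messages in the half-open range from 2022-08-01 up to, but not including, 2022-09-01, count them per sender, and take the two largest counts. Using `collections.Counter` is a choice made for this sketch, not part of the original solution.

```python
from collections import Counter
from datetime import datetime

# The four example rows: (message_id, sender_id, receiver_id, content, sent_date)
messages = [
    (901, 3601, 4500, "You up?", datetime(2022, 8, 3)),
    (902, 4500, 3601, "Only if you're buying", datetime(2022, 8, 3)),
    (743, 3601, 8752, "Let's take this offline", datetime(2022, 6, 14)),
    (922, 3601, 4500, "Get on the call", datetime(2022, 8, 10)),
]

# Half-open range [2022-08-01, 2022-09-01) covers every timestamp in August 2022
start, end = datetime(2022, 8, 1), datetime(2022, 9, 1)

# Count messages per sender inside the range
counts = Counter(sender for _, sender, _, _, sent in messages if start <= sent < end)

# most_common(2) returns the two highest counts in descending order
top_two = counts.most_common(2)
for sender_id, message_count in top_two:
    print(sender_id, message_count)
```

This prints `3601 2` followed by `4500 1`, matching the example output.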


```python
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType
# Import only the functions used below; a wildcard import shadows Python built-ins such as sum and max
from pyspark.sql.functions import col, count, month, year

# Create a local Spark session
spark = SparkSession.builder.master('local[1]').appName("TeamsMessages").getOrCreate()

# Define the schema for the messages table
schema = StructType([
    StructField("message_id", IntegerType(), True),
    StructField("sender_id", IntegerType(), True),
    StructField("receiver_id", IntegerType(), True),
    StructField("content", StringType(), True),
    StructField("sent_date", TimestampType(), True)
])

# Example rows from the problem statement
data = [
    (901, 3601, 4500, "You up?", datetime(2022, 8, 3, 0, 0)),
    (902, 4500, 3601, "Only if you're buying", datetime(2022, 8, 3, 0, 0)),
    (743, 3601, 8752, "Let's take this offline", datetime(2022, 6, 14, 0, 0)),
    (922, 3601, 4500, "Get on the call", datetime(2022, 8, 10, 0, 0))
]

# Create the Spark DataFrame
messages_df = spark.createDataFrame(data, schema=schema)

# Show the DataFrame without truncating long content values
messages_df.show(truncate=False)
```


```text
+----------+---------+-----------+-----------------------+-------------------+
|message_id|sender_id|receiver_id|content                |sent_date          |
+----------+---------+-----------+-----------------------+-------------------+
|901       |3601     |4500       |You up?                |2022-08-03 00:00:00|
|902       |4500     |3601       |Only if you're buying  |2022-08-03 00:00:00|
|743       |3601     |8752       |Let's take this offline|2022-06-14 00:00:00|
|922       |3601     |4500       |Get on the call        |2022-08-10 00:00:00|
+----------+---------+-----------+-----------------------+-------------------+
```



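```python
# Filter on year and month rather than LIKE on a timestamp, which relies on an implicit string cast.
# limit(2) restricts the result itself; show(2) would only restrict how many rows are printed.
(
    messages_df
    .where((year('sent_date') == 2022) & (month('sent_date') == 8))
    .groupBy('sender_id')
    .agg(count('message_id').alias('message_count'))
    .orderBy(col('message_count').desc())
    .limit(2)
    .show()
)
```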

```text
+---------+-------------+
|sender_id|message_count|
+---------+-------------+
|     3601|            2|
|     4500|            1|
+---------+-------------+
```



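```python
# Register the DataFrame as a temporary view named after the problem's table
messages_df.createOrReplaceTempView('messages')

# LIMIT 2 returns only the top 2 senders, as the question requires
spark.sql("""
    SELECT sender_id, COUNT(message_id) AS message_count
    FROM messages
    WHERE year(sent_date) = 2022 AND month(sent_date) = 8
    GROUP BY sender_id
    ORDER BY message_count DESC
    LIMIT 2
""").show()
```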

```text
+---------+-------------+
|sender_id|message_count|
+---------+-------------+
|     3601|            2|
|     4500|            1|
+---------+-------------+
```
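The Spark SQL query can also be checked without Spark using Python's built-in `sqlite3` module. This is a sketch under one assumption: SQLite has no native datetime type, so `sent_date` is stored here as ISO-8601 text, which sorts chronologically and lets a half-open range (`>= '2022-08-01' AND < '2022-09-01'`) select August 2022. The same range predicate works on a `datetime` column in SQL Server, where `LIMIT 2` would instead be written as `SELECT TOP 2 ...`.

```python
import sqlite3

# In-memory database; sent_date is ISO-8601 text because SQLite lacks a datetime type
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        message_id  INTEGER,
        sender_id   INTEGER,
        receiver_id INTEGER,
        content     TEXT,
        sent_date   TEXT
    )
""")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (901, 3601, 4500, "You up?", "2022-08-03 00:00:00"),
        (902, 4500, 3601, "Only if you're buying", "2022-08-03 00:00:00"),
        (743, 3601, 8752, "Let's take this offline", "2022-06-14 00:00:00"),
        (922, 3601, 4500, "Get on the call", "2022-08-10 00:00:00"),
    ],
)

# Half-open date range, then the same GROUP BY / ORDER BY / LIMIT as the Spark SQL query
rows = conn.execute("""
    SELECT sender_id, COUNT(message_id) AS message_count
    FROM messages
    WHERE sent_date >= '2022-08-01' AND sent_date < '2022-09-01'
    GROUP BY sender_id
    ORDER BY message_count DESC
    LIMIT 2
""").fetchall()
print(rows)  # [(3601, 2), (4500, 1)]
conn.close()
```

A half-open range avoids the end-of-month edge cases of `BETWEEN '2022-08-01' AND '2022-08-31'`, which would miss any message sent after midnight on August 31 when the column holds times.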

