## 1303. Find the Team Size
### Table: Employee

| Column Name   | Type |
|---------------|------|
| employee_id   | int  |
| team_id       | int  |

employee_id is the primary key for this table.  
Each row of this table contains an employee's ID and the ID of the team they belong to.  
Write an SQL query to find the team size of each of the employees.

Return the result table in any order.

#### Employee Table:

| employee_id | team_id |
|-------------|---------|
|     1       |    8    |
|     2       |    8    |
|     3       |    8    |
|     4       |    7    |
|     5       |    9    |
|     6       |    9    |

#### Result Table:

| employee_id | team_size |
|-------------|-----------|
|     1       |     3     |
|     2       |     3     |
|     3       |     3     |
|     4       |     1     |
|     5       |     2     |
|     6       |     2     |

Employees with IDs 1, 2, and 3 are part of the team with team_id = 8.  
The employee with ID 4 is the only member of the team with team_id = 7.  
Employees with IDs 5 and 6 are part of the team with team_id = 9.
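
Before turning to Spark, the logic can be sketched in plain Python: count the members of each team once, then look up that count for every employee. This is only an illustration of the reasoning; the helper name `team_sizes` and the `rows` list are assumptions made for this sketch and are not part of the notebook.

```python
from collections import Counter

def team_sizes(rows):
    """Given (employee_id, team_id) pairs, return (employee_id, team_size) pairs."""
    # Pass 1: count how many employees belong to each team
    counts = Counter(team_id for _, team_id in rows)
    # Pass 2: attach each employee's team count to that employee
    return [(employee_id, counts[team_id]) for employee_id, team_id in rows]

# Same sample data as the Employee table above
rows = [(1, 8), (2, 8), (3, 8), (4, 7), (5, 9), (6, 9)]
result_pairs = team_sizes(rows)
print(result_pairs)
```

The two passes correspond to the "group by team, then join back to the employees" pattern used in the cells below.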

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType
from pyspark.sql.functions import count, col

# Create Spark session
spark = SparkSession.builder.appName("TeamSize").getOrCreate()

# Define schema
schema = StructType([
    StructField("employee_id", IntegerType(), True),
    StructField("team_id", IntegerType(), True)
])

# Sample data
data = [
    (1, 8),
    (2, 8),
    (3, 8),
    (4, 7),
    (5, 9),
    (6, 9)
]

# Create DataFrame
df = spark.createDataFrame(data, schema)

# Register as temp view
df.createOrReplaceTempView("Employee")



In [0]:
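from pyspark.sql.functions import count

# Step 1: Subquery equivalent: count the employees on each team.
# No filter here: every team, including single-member teams, must appear in the result.
team_sizes = df.groupBy("team_id").agg(count("*").alias("team_size"))

# Step 2: Join back to the original rows so each employee gets their team's size
result = df.join(team_sizes, "team_id").select("employee_id", "team_size")
result.show()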


In [0]:
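# Count employees per team (col and count are imported in the first cell)
df_1 = df.groupBy(col("team_id")).agg(count("employee_id").alias("team_size"))
# Rename the employee-side columns so the join condition is unambiguous
df_2 = df.selectExpr("team_id as t2_id", "employee_id as e_id")
# The left join keeps every employee; .display() is the Databricks notebook renderer
df_2.join(df_1, col("t2_id") == col("team_id"), "left").selectExpr("e_id as employee_id", "team_size").display()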


In [0]:
# SQL logic
query = """
SELECT e.employee_id, t.team_size
FROM Employee e
JOIN (
    SELECT team_id, COUNT(*) AS team_size
    FROM Employee
    GROUP BY team_id
) t
ON e.team_id = t.team_id
ORDER BY e.employee_id ASC
"""

# Execute and display
result = spark.sql(query)
display(result)
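
An alternative to the join is a window function, which counts the rows in each team partition without a subquery. The sketch below is not one of the notebook's cells: it runs the query against an in-memory SQLite database from Python's standard library (assuming a SQLite build of 3.25 or later, which is when window functions were added) so that it can be tried without a Spark session. The names `conn` and `window_rows` are introduced here for illustration only.

```python
import sqlite3

# In-memory database holding the same sample Employee rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 8), (2, 8), (3, 8), (4, 7), (5, 9), (6, 9)],
)

# COUNT(*) OVER (PARTITION BY team_id) yields the team's row count on every row
window_rows = conn.execute(
    """
    SELECT employee_id,
           COUNT(*) OVER (PARTITION BY team_id) AS team_size
    FROM Employee
    ORDER BY employee_id
    """
).fetchall()
conn.close()

print(window_rows)
```

The `SELECT` statement uses standard SQL window syntax, so it should also be usable with `spark.sql` against the `Employee` temp view, although that has not been verified here.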