# X - Fan Engagement Metrics for Sports Coverage

```SQL
CREATE TABLE dim_sports_categories (
    category_id INTEGER,
    category_name VARCHAR
);

CREATE TABLE fct_user_interactions (
    interaction_id INTEGER,
    user_id INTEGER,
    content_type VARCHAR,
    interaction_duration INTEGER,
    interaction_date DATE,
    category_id INTEGER
);

INSERT INTO dim_sports_categories (category_id, category_name)
VALUES
    (1, 'Football'),
    (2, 'Basketball'),
    (3, 'Baseball'),
    (4, 'Tennis'),
    (5, 'Hockey'),
    (6, 'Soccer'),
    (7, 'Cricket'),
    (8, 'Rugby'),
    (9, 'Golf'),
    (10, 'Formula 1');

INSERT INTO fct_user_interactions (interaction_id, user_id, content_type, interaction_duration, interaction_date, category_id)
VALUES
    (1, 1, 'live sports commentary', 130, '2024-04-05', 1),
    (2, 1, 'live sports commentary', 138, '2024-04-12', 2),
    (3, 2, 'live sports commentary', 140, '2024-04-15', 3),
    (4, 3, 'live sports commentary', 136, '2024-04-18', 4),
    (5, 4, 'live sports commentary', 136, '2024-04-28', 5),
    (6, 5, 'live sports commentary', 150, '2024-05-02', 6),
    (7, 6, 'highlights', 80, '2024-05-03', 7),
    (8, 7, 'live sports commentary', 120, '2024-05-04', 8),
    (9, 8, 'highlights', 90, '2024-05-05', 9),
    (10, 9, 'live sports commentary', 130, '2024-05-06', 10),
    (11, 10, 'live sports commentary', 140, '2024-05-07', 1),
    (12, 11, 'highlights', 70, '2024-05-08', 2),
    (13, 12, 'live sports commentary', 155, '2024-05-09', 3),
    (14, 13, 'highlights', 85, '2024-05-10', 4),
    (15, 14, 'live sports commentary', 145, '2024-05-11', 5),
    (16, 15, 'highlights', 95, '2024-05-12', 6),
    (17, 16, 'live sports commentary', 125, '2024-05-13', 7),
    (18, 17, 'highlights', 100, '2024-05-14', 8),
    (19, 18, 'live sports commentary', 135, '2024-05-15', 9),
    (20, 19, 'highlights', 110, '2024-05-16', 10),
    (21, 20, 'live sports commentary', 132, '2024-05-17', 1),
    (22, 21, 'highlights', 88, '2024-05-18', 2),
    (23, 22, 'live sports commentary', 142, '2024-05-19', 3),
    (24, 23, 'highlights', 77, '2024-05-20', 4),
    (25, 24, 'live sports commentary', 138, '2024-05-21', 5),
    (26, 25, 'highlights', 83, '2024-05-22', 6),
    (27, 26, 'live sports commentary', 147, '2024-05-23', 7),
    (28, 27, 'highlights', 92, '2024-05-24', 8),
    (29, 28, 'live sports commentary', 136, '2024-05-25', 9),
    (30, 29, 'highlights', 99, '2024-05-26', 10);

SELECT * FROM dim_sports_categories;

SELECT * FROM fct_user_interactions;
```
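The notebook below loads both tables from `Data/026/*.csv`, and those files are not part of this document. Assuming they contain exactly the rows inserted by the SQL script above, the same two DataFrames can be built in memory with pandas instead. This is a sketch. The names `df_sport` and `df_user` mirror the notebook, and `LIVE`/`HL` are shorthands introduced here.

```python
import pandas as pd

# Shorthands introduced for this sketch (not in the original notebook)
LIVE, HL = "live sports commentary", "highlights"

# Same rows as the dim_sports_categories INSERT above
df_sport = pd.DataFrame({
    "category_id": range(1, 11),
    "category_name": ["Football", "Basketball", "Baseball", "Tennis", "Hockey",
                      "Soccer", "Cricket", "Rugby", "Golf", "Formula 1"],
})

# Same rows as the fct_user_interactions INSERT above
rows = [
    (1, 1, LIVE, 130, "2024-04-05", 1), (2, 1, LIVE, 138, "2024-04-12", 2),
    (3, 2, LIVE, 140, "2024-04-15", 3), (4, 3, LIVE, 136, "2024-04-18", 4),
    (5, 4, LIVE, 136, "2024-04-28", 5), (6, 5, LIVE, 150, "2024-05-02", 6),
    (7, 6, HL, 80, "2024-05-03", 7), (8, 7, LIVE, 120, "2024-05-04", 8),
    (9, 8, HL, 90, "2024-05-05", 9), (10, 9, LIVE, 130, "2024-05-06", 10),
    (11, 10, LIVE, 140, "2024-05-07", 1), (12, 11, HL, 70, "2024-05-08", 2),
    (13, 12, LIVE, 155, "2024-05-09", 3), (14, 13, HL, 85, "2024-05-10", 4),
    (15, 14, LIVE, 145, "2024-05-11", 5), (16, 15, HL, 95, "2024-05-12", 6),
    (17, 16, LIVE, 125, "2024-05-13", 7), (18, 17, HL, 100, "2024-05-14", 8),
    (19, 18, LIVE, 135, "2024-05-15", 9), (20, 19, HL, 110, "2024-05-16", 10),
    (21, 20, LIVE, 132, "2024-05-17", 1), (22, 21, HL, 88, "2024-05-18", 2),
    (23, 22, LIVE, 142, "2024-05-19", 3), (24, 23, HL, 77, "2024-05-20", 4),
    (25, 24, LIVE, 138, "2024-05-21", 5), (26, 25, HL, 83, "2024-05-22", 6),
    (27, 26, LIVE, 147, "2024-05-23", 7), (28, 27, HL, 92, "2024-05-24", 8),
    (29, 28, LIVE, 136, "2024-05-25", 9), (30, 29, HL, 99, "2024-05-26", 10),
]
df_user = pd.DataFrame(rows, columns=["interaction_id", "user_id", "content_type",
                                      "interaction_duration", "interaction_date", "category_id"])

# Mirrors parse_dates=['interaction_date'] in the notebook's read_csv call
df_user["interaction_date"] = pd.to_datetime(df_user["interaction_date"])
```

With these DataFrames in place of the two `read_csv` calls, the later cells only use the six named columns, so they should give the same answers. The `head()` outputs would simply lack the `Unnamed: 0` column.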

```python
import pandas as pd
import numpy as np
```

```python
df_sport = pd.read_csv('Data/026/dim_sports_categories.csv')
df_user = pd.read_csv('Data/026/fct_user_interactions.csv', parse_dates=['interaction_date'])

df_sport.head()
```

| Unnamed: 0 | category_id | category_name |
|---|---|---|
| 0 | 1 | Football |
| 1 | 2 | Basketball |
| 2 | 3 | Baseball |
| 3 | 4 | Tennis |
| 4 | 5 | Hockey |

```python
df_user.head()
```

| Unnamed: 0 | interaction_id | user_id | content_type | interaction_duration | interaction_date | category_id |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | live sports commentary | 130 | 2024-04-05 | 1 |
| 1 | 2 | 1 | live sports commentary | 138 | 2024-04-12 | 2 |
| 2 | 3 | 2 | live sports commentary | 140 | 2024-04-15 | 3 |
| 3 | 4 | 3 | live sports commentary | 136 | 2024-04-18 | 4 |
| 4 | 5 | 4 | live sports commentary | 136 | 2024-04-28 | 5 |
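The `Unnamed: 0` column in both outputs is most likely the pandas index that was written out when the CSVs were saved. `DataFrame.to_csv` writes the index as an unlabeled first column by default, and `read_csv` names such a column `Unnamed: 0`. The sketch below uses a hypothetical two-row CSV string to show the effect and how `index_col=0` turns that column back into the index.

```python
import io

import pandas as pd

# Hypothetical CSV text in the shape DataFrame.to_csv() writes by default:
# the index is saved as a first column with an empty header
csv_text = ",category_id,category_name\n0,1,Football\n1,2,Basketball\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))    # ['Unnamed: 0', 'category_id', 'category_name']

# index_col=0 uses the unlabeled first column as the row index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))  # ['category_id', 'category_name']
```

In the notebook, this would mean passing `index_col=0` to both `read_csv` calls. None of the later answers depend on the extra column.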


# Question 1

### What is the average duration of user interactions with 'live sports commentary' content during April 2024? Round the result to the nearest whole number.

```python
df_abril = df_user[
    (df_user['content_type'] == 'live sports commentary') &
    (df_user['interaction_date'].between('2024-04-01','2024-04-30'))
]

respuesta1 = df_abril['interaction_duration'].mean().round(0)

respuesta1
```

```text
np.float64(136.0)
```
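`between('2024-04-01', '2024-04-30')` is inclusive on both ends. That is exact here because `interaction_date` holds plain dates. If the column ever carried a time of day, the upper bound would mean midnight on 30 April, and later interactions that day would be dropped. The sketch below uses a hypothetical three-row sample with timestamps to compare that filter with a calendar-month filter built on `Series.dt.to_period`.

```python
import pandas as pd

# Hypothetical mini-sample with a time of day, including one row late on 30 April
df = pd.DataFrame({
    "content_type": ["live sports commentary"] * 3,
    "interaction_duration": [130, 140, 150],
    "interaction_date": pd.to_datetime(
        ["2024-04-05 09:00", "2024-04-30 18:30", "2024-05-02 12:00"]
    ),
})
live = df["content_type"] == "live sports commentary"

# between() with date strings treats '2024-04-30' as midnight, so the 18:30 row is excluded
in_range = df["interaction_date"].between("2024-04-01", "2024-04-30")
by_between = df.loc[live & in_range, "interaction_duration"].mean()

# Comparing calendar months keeps every April timestamp, whatever the time of day
is_april = df["interaction_date"].dt.to_period("M") == pd.Period("2024-04", freq="M")
by_month = df.loc[live & is_april, "interaction_duration"].mean()

print(by_between, by_month)  # 130.0 135.0
```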

```SQL
SELECT
    ROUND(AVG(interaction_duration), 0) AS avg_interaction_duration
FROM fct_user_interactions
WHERE content_type = 'live sports commentary'
AND interaction_date BETWEEN '2024-04-01' AND '2024-04-30';
```
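The SQL answers can be cross-checked without a database server by running them in an in-memory SQLite database through Python's standard `sqlite3` module. This sketch loads only the first six sample rows, the five April rows plus one May row, which is enough for this query.

It also shows one dialect detail worth knowing. SQLite's `ROUND` rounds halfway values away from zero, while NumPy's `round` rounds them to the nearest even number. The SQL and pandas answers could therefore differ by one if an average landed exactly on .5.

```python
import sqlite3

import numpy as np

con = sqlite3.connect(":memory:")

# Schema from the script above plus the first six sample rows only
con.executescript("""
CREATE TABLE fct_user_interactions (
    interaction_id INTEGER,
    user_id INTEGER,
    content_type VARCHAR,
    interaction_duration INTEGER,
    interaction_date DATE,
    category_id INTEGER
);
INSERT INTO fct_user_interactions VALUES
    (1, 1, 'live sports commentary', 130, '2024-04-05', 1),
    (2, 1, 'live sports commentary', 138, '2024-04-12', 2),
    (3, 2, 'live sports commentary', 140, '2024-04-15', 3),
    (4, 3, 'live sports commentary', 136, '2024-04-18', 4),
    (5, 4, 'live sports commentary', 136, '2024-04-28', 5),
    (6, 5, 'live sports commentary', 150, '2024-05-02', 6);
""")

# The Question 1 query; ISO date strings compare correctly as text in SQLite
query = """
SELECT ROUND(AVG(interaction_duration), 0) AS avg_interaction_duration
FROM fct_user_interactions
WHERE content_type = 'live sports commentary'
  AND interaction_date BETWEEN '2024-04-01' AND '2024-04-30';
"""
(avg_april,) = con.execute(query).fetchone()
print(avg_april)  # 136.0

# Halfway values: SQLite rounds away from zero, NumPy rounds half to even
(sql_half,) = con.execute("SELECT ROUND(136.5, 0)").fetchone()
print(sql_half, np.round(136.5))  # 137.0 136.0

con.close()
```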

# Question 2

### For May 2024, determine the total number of users who interacted with 'live sports commentary' and 'highlights'. Be sure to include users who interacted with either content type or with both.

```python
df_mayo = df_user[
    (df_user['content_type'].isin(['live sports commentary','highlights'])) &
    (df_user['interaction_date'].between('2024-05-01','2024-05-31'))
].copy()

respuesta2 = df_mayo['user_id'].nunique()

respuesta2
```

```text
25
```

```SQL
SELECT
    COUNT(DISTINCT user_id) AS total_user_may
FROM fct_user_interactions
WHERE content_type IN ('live sports commentary','highlights')
AND interaction_date BETWEEN '2024-05-01' AND '2024-05-31';
```
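Counting distinct users across both content types gives the size of the union of the two user sets, so a user who watched both is counted once. The sketch below uses a hypothetical five-row sample in which user 3 watched both. It makes the inclusion-exclusion identity |A ∪ B| = |A| + |B| − |A ∩ B| explicit.

```python
import pandas as pd

# Hypothetical May sample: user 3 interacted with both content types
may = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4],
    "content_type": ["live sports commentary", "highlights",
                     "live sports commentary", "highlights", "highlights"],
})

live_users = set(may.loc[may["content_type"] == "live sports commentary", "user_id"])
highlight_users = set(may.loc[may["content_type"] == "highlights", "user_id"])

either_or_both = live_users | highlight_users  # union: {1, 2, 3, 4}
both = live_users & highlight_users            # intersection: {3}

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(either_or_both) == len(live_users) + len(highlight_users) - len(both)

# nunique() on the filtered rows yields the union size directly
print(len(either_or_both), may["user_id"].nunique(), len(both))  # 4 4 1
```

In the May 2024 sample data, no user has more than one interaction. The union is therefore the 13 live-commentary users plus the 12 highlights users, which matches the answer of 25.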

# Question 3

### Identify the top 3 sports categories for 'live sports commentary' based on user engagement in May 2024. Focus on the categories with the highest total interaction time.

```python
df_merge = df_user.merge(df_sport, on='category_id')

# Keep May 2024 'live sports commentary' rows (note the double m in 'commentary';
# a misspelled value would silently match nothing)
df_may = df_merge[
    (df_merge['interaction_date'].between('2024-05-01','2024-05-31')) &
    (df_merge['content_type'] == 'live sports commentary')
]

# Total interaction time per sports category
respuesta3 = df_may.groupby('category_name')['interaction_duration'].sum()

# Series.sort_values takes no 'by' argument (that is only needed on a DataFrame)
respuesta3 = respuesta3.sort_values(ascending=False).head(3)

respuesta3
```

```text
category_name
Baseball    297
Hockey      283
Football    272
Name: interaction_duration, dtype: int64
```

```SQL
SELECT
    s.category_name,
    SUM(u.interaction_duration) AS sum_interaction_duration
FROM dim_sports_categories s
JOIN fct_user_interactions u ON s.category_id = u.category_id
WHERE u.interaction_date BETWEEN '2024-05-01' AND '2024-05-31'
AND u.content_type = 'live sports commentary'
GROUP BY s.category_name
ORDER BY sum_interaction_duration DESC
LIMIT 3;
```
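The top-3 result hides a tie. In the May 2024 sample data, Football (140 + 132) and Cricket (125 + 147) both total 272. Which of the two takes third place therefore depends on how the sort breaks ties, both in pandas and under SQL's `LIMIT 3`.

One way to surface the tie is `Series.nlargest(3, keep='all')`, or equivalently a rank-based filter with `rank(method='min')`. The sketch below applies both to the thirteen May live-commentary rows from the sample data. In SQL, the counterpart would filter on `RANK() OVER (ORDER BY SUM(u.interaction_duration) DESC) <= 3` in a subquery.

```python
import pandas as pd

# May 2024 'live sports commentary' rows from the sample data,
# already joined to category names: (category_name, interaction_duration)
may_live = pd.DataFrame(
    [
        ("Soccer", 150), ("Rugby", 120), ("Formula 1", 130), ("Football", 140),
        ("Baseball", 155), ("Hockey", 145), ("Cricket", 125), ("Golf", 135),
        ("Football", 132), ("Baseball", 142), ("Hockey", 138), ("Cricket", 147),
        ("Golf", 136),
    ],
    columns=["category_name", "interaction_duration"],
)

totals = may_live.groupby("category_name")["interaction_duration"].sum()

# keep='all' retains every category tied with the third-largest total
top3_with_ties = totals.nlargest(3, keep="all")
print(top3_with_ties)

# Same selection via ranks: tied values share the lowest rank, like SQL RANK()
ranks = totals.rank(method="min", ascending=False)
print(totals[ranks <= 3].sort_values(ascending=False))
```

Whether ties should expand the list to four categories or be broken by a secondary key is a reporting choice that the question leaves open. An alphabetical tie-break, for example, would pick Cricket rather than Football.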