# Google - Gmail Label Usage and User Efficiency

```SQL
CREATE TABLE emails (
    email_id INTEGER,
    label_id INTEGER
);

CREATE TABLE email_labels (
    label_id INTEGER,
    user_id INTEGER,
    created_date DATE
);

INSERT INTO emails (email_id, label_id)
VALUES
    (1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 3), (7, 3), (8, 3), (9, 3), (10, 3),
    (11, 4), (12, 4), (13, 4), (14, 4), (15, 4), (16, 4), (17, 4), (18, 4), (19, 4), (20, 4),
    (21, 5), (22, 5), (23, 5), (24, 6), (25, 6), (26, 6), (27, 7), (28, 8), (29, 9), (30, 10),
    (31, 11), (32, 12), (33, 13), (34, 14), (35, 15), (36, 15), (37, 15), (38, 15), (39, 15), (40, 15);

INSERT INTO email_labels (label_id, user_id, created_date)
VALUES
    (1, 101, '2024-10-05'),
    (2, 102, '2024-10-10'),
    (3, 103, '2024-10-15'),
    (4, 101, '2024-10-20'),
    (5, 104, '2024-10-25'),
    (6, 105, '2024-10-30'),
    (7, 106, '2024-11-01'),
    (8, 107, '2024-11-05'),
    (9, 108, '2024-11-10'),
    (10, 109, '2024-11-15'),
    (11, 110, '2024-11-20'),
    (12, 111, '2024-11-25'),
    (13, 112, '2024-12-01'),
    (14, 113, '2024-12-05'),
    (15, 114, '2024-12-10');
```

In [1]:
import pandas as pd  
import numpy as np

In [5]:
df_emails = pd.read_csv('Data/003/emails.csv')
df_email_labels = pd.read_csv('Data/003/email_labels.csv')

In [6]:
df_emails.head()

Unnamed: 0,email_id,label_id
0,1,1
1,2,1
2,3,2
3,4,2
4,5,2


In [7]:
df_email_labels.head()

Unnamed: 0,label_id,user_id,created_date
0,1,101,2024-10-05
1,2,102,2024-10-10
2,3,103,2024-10-15
3,4,101,2024-10-20
4,5,104,2024-10-25


# Question 1

### Can you find out how many labels each user has created? We want to understand how many labels users typically create to manage their emails.

In [9]:
df_conteo = df_email_labels.groupby('user_id')['label_id'].count().reset_index()

df_conteo.columns = ['user_id', 'total_labels']

df_conteo

Unnamed: 0,user_id,total_labels
0,101,2
1,102,1
2,103,1
3,104,1
4,105,1
5,106,1
6,107,1
7,108,1
8,109,1
9,110,1


```SQL
SELECT
    user_id,
    COUNT(label_id) AS total_labels
FROM email_labels
GROUP BY user_id;
```
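The question asks how many labels users "typically" create, so a short summary of the per-user counts may be a useful follow-up to the grouped result. The sketch below is illustrative only: it uses the standard library rather than pandas, the names `label_owners`, `labels_per_user`, and `typical` are made up for this example, and the rows are copied from the `email_labels` INSERT statement at the top (the notebook's CSV may contain different rows).

```python
from collections import Counter
from statistics import mean, median

# (label_id, user_id) pairs copied from the email_labels INSERT statement.
label_owners = [
    (1, 101), (2, 102), (3, 103), (4, 101), (5, 104),
    (6, 105), (7, 106), (8, 107), (9, 108), (10, 109),
    (11, 110), (12, 111), (13, 112), (14, 113), (15, 114),
]

# Count labels per user: the same aggregation as GROUP BY user_id.
labels_per_user = Counter(user_id for _, user_id in label_owners)

# Summarise the distribution of labels per user.
counts = list(labels_per_user.values())
typical = {
    "users": len(counts),
    "mean": round(mean(counts), 2),
    "median": median(counts),
    "max": max(counts),
}
print(typical)
```

With this sample, only user 101 owns more than one label, so the median is a more representative "typical" value than the maximum.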

# Question 2

### Your team wants to know which labels have more than 5 emails assigned to them. Can you retrieve them?

In [10]:
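# Count emails per label; selecting 'email_id' explicitly keeps any other columns out of the count
conteo_labels = df_emails.groupby('label_id')['email_id'].count().reset_index()

# Keep only the labels with more than 5 emails
resultado = conteo_labels[conteo_labels['email_id'] > 5]

resultado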

Unnamed: 0,label_id,email_id
3,4,10
14,15,6


```SQL
SELECT
    label_id,
    COUNT(email_id) AS total_emails
FROM emails
GROUP BY label_id
HAVING COUNT(email_id) > 5;
```
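One way to check the `HAVING` query against the sample data is to run it in an in-memory SQLite database. This is a minimal sketch, assuming SQLite is an acceptable stand-in for the target database. The `emails_per_label` dictionary is a hypothetical shorthand that summarises the `emails` INSERT statement, and `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

# Emails per label, summarised from the emails INSERT statement (40 rows in total).
emails_per_label = {1: 2, 2: 3, 3: 5, 4: 10, 5: 3, 6: 3, 7: 1, 8: 1,
                    9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 6}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email_id INTEGER, label_id INTEGER)")

# Rebuild the rows with sequential email_id values, as in the sample data.
rows = []
for label_id, n in emails_per_label.items():
    for _ in range(n):
        rows.append((len(rows) + 1, label_id))
conn.executemany("INSERT INTO emails VALUES (?, ?)", rows)

# HAVING filters groups after aggregation, so it can compare the per-label count.
busy_labels = conn.execute("""
    SELECT label_id, COUNT(email_id) AS total_emails
    FROM emails
    GROUP BY label_id
    HAVING COUNT(email_id) > 5
    ORDER BY label_id
""").fetchall()
print(busy_labels)  # [(4, 10), (15, 6)]
conn.close()
```

Label 3 has exactly 5 emails, so the strict `> 5` comparison excludes it.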

# Question 3

### For labels created in October 2024, determine the number of emails associated with each one. If a label created in October has no associated emails, include it in the result anyway. This will help us understand how email usage is distributed across labels.

In [12]:
df_email_labels['created_date'] = pd.to_datetime(df_email_labels['created_date'])

In [13]:
labels_octubre = df_email_labels[
    (df_email_labels['created_date'].dt.year == 2024) &
    (df_email_labels['created_date'].dt.month == 10)
]

In [14]:
df_unido = pd.merge(labels_octubre, df_emails, on='label_id', how='left')

In [15]:
resultado = df_unido.groupby('label_id')['email_id'].count().reset_index()
resultado.columns = ['label_id','total_emails']
resultado

Unnamed: 0,label_id,total_emails
0,1,2
1,2,3
2,3,5
3,4,10
4,5,3
5,6,3


```SQL
SELECT
    l.label_id,
    COUNT(e.email_id) AS total_emails
FROM email_labels l
LEFT JOIN emails e ON l.label_id = e.label_id
WHERE EXTRACT(YEAR FROM l.created_date) = 2024
  AND EXTRACT(MONTH FROM l.created_date) = 10
GROUP BY l.label_id;
```
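Every October label in the sample data has at least one email, so the "include labels with no emails" requirement is never exercised. The sketch below runs a similar query in an in-memory SQLite database with a reduced data set. Label 16 (user 115) is a hypothetical row added only for illustration. Because SQLite does not support `EXTRACT`, the date filter is written as a half-open range on ISO-formatted text dates, which is an assumption about how the dates are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Label 1 and label 7 come from the sample data; label 16 is hypothetical
# (created in October, no emails) to exercise the zero-count case.
conn.executescript("""
    CREATE TABLE emails (email_id INTEGER, label_id INTEGER);
    CREATE TABLE email_labels (label_id INTEGER, user_id INTEGER, created_date TEXT);
    INSERT INTO email_labels VALUES
        (1, 101, '2024-10-05'),
        (7, 106, '2024-11-01'),
        (16, 115, '2024-10-31');
    INSERT INTO emails VALUES (1, 1), (2, 1), (27, 7);
""")

# LEFT JOIN keeps every October label; COUNT(e.email_id) ignores the NULL
# produced for labels without emails, while COUNT(*) counts the joined row.
october_counts = conn.execute("""
    SELECT l.label_id,
           COUNT(e.email_id) AS total_emails,
           COUNT(*) AS joined_rows
    FROM email_labels AS l
    LEFT JOIN emails AS e ON l.label_id = e.label_id
    WHERE l.created_date >= '2024-10-01'
      AND l.created_date < '2024-11-01'
    GROUP BY l.label_id
    ORDER BY l.label_id
""").fetchall()
print(october_counts)  # [(1, 2, 2), (16, 0, 1)]
conn.close()
```

The `joined_rows` column shows why the count targets `e.email_id`: `COUNT(*)` would report 1 for a label that has no emails at all. The same reasoning applies to the pandas version, where `count()` on `email_id` skips the `NaN` values introduced by the left merge.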