# User Engagement with Photo Categorization Features

```SQL
CREATE TABLE automatic_photo_categorization (
    photo_id INTEGER,
    user_id INTEGER,
    categorization_date DATE
);

INSERT INTO automatic_photo_categorization (photo_id, user_id, categorization_date)
VALUES
    (1, 1, '2024-01-03'), (2, 2, '2024-01-15'), (3, 3, '2024-01-20'),
    (4, 4, '2024-02-05'), (5, 5, '2024-02-10'), (6, 1, '2024-02-15'),
    (7, 2, '2024-02-20'), (8, 3, '2024-03-01'), (9, 4, '2024-03-05'),
    (10, 5, '2024-03-10'), (11, 6, '2024-01-08'), (12, 7, '2024-01-18'),
    (13, 8, '2024-02-12'), (14, 9, '2024-02-22'), (15, 10, '2024-03-15'),
    (16, 1, '2024-03-20'), (17, 2, '2024-03-25'), (18, 3, '2024-01-12'),
    (19, 4, '2024-02-18'), (20, 5, '2024-03-22'), (21, 6, '2024-01-25'),
    (22, 7, '2024-02-28'), (23, 8, '2024-03-08'), (24, 9, '2024-01-30'),
    (25, 10, '2024-02-07'), (26, 1, '2024-03-12'), (27, 2, '2024-01-22'),
    (28, 3, '2024-02-14'), (29, 4, '2024-03-18'), (30, 5, '2024-01-27'),
    (31, 6, '2024-02-03'), (32, 7, '2024-03-05'), (33, 8, '2024-01-10'),
    (34, 9, '2024-02-25'), (35, 10, '2024-03-30'), (36, 1, '2024-02-19'),
    (37, 2, '2024-03-02'), (38, 3, '2024-01-14'), (39, 4, '2024-02-21'),
    (40, 5, '2024-03-25');

SELECT * FROM automatic_photo_categorization;
```

In [2]:
import pandas as pd
import numpy as np

In [6]:
df_photo = pd.read_csv('Data/001/auto_photo_categorization.csv')
df_photo.head()

Unnamed: 0,photo_id,user_id,categorization_date
0,1,1,2024-01-03
1,2,2,2024-01-15
2,3,3,2024-01-20
3,4,4,2024-02-05
4,5,5,2024-02-10


# Question 1

### We need to measure users' initial level of engagement with the categorization features. How many photos were categorized by the system in January 2024?

In [10]:
df_photo['categorization_date'] = pd.to_datetime(df_photo['categorization_date'])
df_photo['categorization_date'].head()

0   2024-01-03
1   2024-01-15
2   2024-01-20
3   2024-02-05
4   2024-02-10
Name: categorization_date, dtype: datetime64[ns]
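
As an alternative to converting the column after loading, the dates can be parsed while the file is read. The following is a minimal sketch, assuming the CSV has a header row with a `categorization_date` column in ISO format; the in-memory `csv_text` sample is hypothetical and only mirrors the first rows shown above.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for the CSV file on disk.
csv_text = """photo_id,user_id,categorization_date
1,1,2024-01-03
2,2,2024-01-15
3,3,2024-01-20
"""

# parse_dates converts the named column to datetime during the read,
# so a separate pd.to_datetime call is not needed afterwards.
df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["categorization_date"])

is_datetime = pd.api.types.is_datetime64_any_dtype(df_sample["categorization_date"])
print(is_datetime)
```

Either approach leaves the column ready for the `.dt` accessors used in the cells that follow.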

In [13]:
enero_fotos = df_photo[
    (df_photo['categorization_date'] >= '2024-01-01') &
    (df_photo['categorization_date'] <= '2024-01-31')
]
enero_fotos.head()

Unnamed: 0,photo_id,user_id,categorization_date
0,1,1,2024-01-03
1,2,2,2024-01-15
2,3,3,2024-01-20
10,11,6,2024-01-08
11,12,7,2024-01-18


In [17]:
total_enero = len(enero_fotos)
print(f"The number of photos categorized in January 2024 is: {total_enero}")

The number of photos categorized in January 2024 is: 12


In [19]:
total_enero = df_photo[(df_photo['categorization_date'].dt.year == 2024) & (df_photo['categorization_date'].dt.month == 1)].shape[0]
total_enero

12

```SQL
SELECT count(*)
FROM automatic_photo_categorization
WHERE categorization_date BETWEEN '2024-01-01' AND '2024-01-31';
```

```SQL
SELECT count(*)
FROM automatic_photo_categorization
WHERE EXTRACT(MONTH FROM categorization_date) = 1
AND EXTRACT(YEAR FROM categorization_date) = 2024;
```
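
Both queries work because `categorization_date` is a plain `DATE`. If the column ever carried a time component, `BETWEEN ... AND '2024-01-31'` could miss rows from late on the last day, so a half-open range (`>=` first day, `<` first day of the next month) is a common safeguard. The sketch below tries that form on an in-memory SQLite database. SQLite and the `TEXT` storage of ISO dates are assumptions made for the illustration, and `rows` is a small subset of the inserted data, not the full table.

```python
import sqlite3

# Subset of the rows inserted at the top of the notebook (5 in January, 2 in February).
rows = [
    (1, 1, "2024-01-03"), (2, 2, "2024-01-15"), (3, 3, "2024-01-20"),
    (4, 4, "2024-02-05"), (5, 5, "2024-02-10"),
    (11, 6, "2024-01-08"), (12, 7, "2024-01-18"),
]

conn = sqlite3.connect(":memory:")
# Assumption: dates are stored as ISO-8601 text, which sorts chronologically.
conn.execute(
    "CREATE TABLE automatic_photo_categorization ("
    "photo_id INTEGER, user_id INTEGER, categorization_date TEXT)"
)
conn.executemany(
    "INSERT INTO automatic_photo_categorization VALUES (?, ?, ?)", rows
)

# Half-open range: includes all of January, excludes February 1 onward.
january_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM automatic_photo_categorization
    WHERE categorization_date >= '2024-01-01'
      AND categorization_date <  '2024-02-01'
    """
).fetchone()[0]
conn.close()

print(january_count)
```

On this subset the query counts the five January rows. Against the full table it should agree with the two queries above.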


# Question 2

### What is the total number of unique users who interacted with the categorization feature in February 2024?

In [22]:
total_unique_feb = df_photo[(df_photo['categorization_date'].dt.year == 2024) & (df_photo['categorization_date'].dt.month == 2)]['user_id'].nunique()
total_unique_feb

10

In [27]:
usuarios_febrero = df_photo[
    (df_photo['categorization_date'].dt.year == 2024) &
    (df_photo['categorization_date'].dt.month == 2)
]['user_id'].unique()  # Returns an array of the distinct user IDs

total_unique_feb = len(usuarios_febrero)

usuarios_febrero

array([ 4,  5,  1,  2,  8,  9,  7, 10,  3,  6])

```SQL
SELECT COUNT(DISTINCT user_id)
FROM automatic_photo_categorization
WHERE EXTRACT(MONTH FROM categorization_date) = 2
AND EXTRACT(YEAR FROM categorization_date) = 2024;
```
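
The same distinct-user count can be computed for every month at once by grouping on a monthly period instead of filtering one month at a time. This is a minimal sketch on a hypothetical eight-row sample taken from the data above; the names `sample` and `unique_users_by_month` are introduced here for illustration only.

```python
import pandas as pd

# Small sample of rows from the table (not the full dataset).
sample = pd.DataFrame({
    "photo_id": [1, 2, 18, 38, 4, 19, 5, 8],
    "user_id": [1, 2, 3, 3, 4, 4, 5, 3],
    "categorization_date": pd.to_datetime([
        "2024-01-03", "2024-01-15", "2024-01-12", "2024-01-14",
        "2024-02-05", "2024-02-18", "2024-02-10", "2024-03-01",
    ]),
})

# Group by calendar month and count distinct users within each group.
unique_users_by_month = (
    sample.groupby(sample["categorization_date"].dt.to_period("M"))["user_id"]
    .nunique()
)

print(unique_users_by_month)
```

A user who appears several times in one month, such as user 3 in January of the sample, is counted once for that month.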

# Question 3

### For March 2024, calculate the total number of photos categorized per user and name the resulting column total_categorized_photos. We want to identify the most active users for user research purposes.

In [29]:
df_marzo = df_photo[
    (df_photo['categorization_date'].dt.year == 2024) &
    (df_photo['categorization_date'].dt.month == 3)
]

df_marzo.head()

Unnamed: 0,photo_id,user_id,categorization_date
7,8,3,2024-03-01
8,9,4,2024-03-05
9,10,5,2024-03-10
14,15,10,2024-03-15
15,16,1,2024-03-20


In [33]:
resumen_marzo = df_marzo.groupby('user_id')['photo_id'].count().reset_index()
resumen_marzo.columns = ['user_id', 'total_categorized_photos']

print(resumen_marzo.sort_values(by='total_categorized_photos', ascending=False))

   user_id  total_categorized_photos
4        5                         3
0        1                         2
3        4                         2
1        2                         2
7       10                         2
2        3                         1
5        7                         1
6        8                         1


```SQL
SELECT
    user_id,
    count(photo_id) AS total_categorized_photos
FROM automatic_photo_categorization
WHERE EXTRACT(MONTH FROM categorization_date) = 3
AND EXTRACT(YEAR FROM categorization_date) = 2024
GROUP BY user_id
ORDER BY total_categorized_photos DESC;
```
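
On the pandas side, named aggregation can produce the `total_categorized_photos` column in a single step, and a secondary sort key keeps the order of tied users stable between runs. The sketch below assumes a hypothetical seven-row sample of the March rows; `sample` and `summary` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Subset of the March 2024 rows from the table (photo_id, user_id).
sample = pd.DataFrame({
    "photo_id": [8, 9, 10, 16, 20, 26, 40],
    "user_id": [3, 4, 5, 1, 5, 1, 5],
})

summary = (
    sample.groupby("user_id", as_index=False)
    # Named aggregation: output column name = (input column, function).
    .agg(total_categorized_photos=("photo_id", "count"))
    # Most active users first; ties broken by user_id for a deterministic order.
    .sort_values(["total_categorized_photos", "user_id"], ascending=[False, True])
    .reset_index(drop=True)
)

print(summary)
```

Taking `summary.head(n)` afterwards would give the `n` most active users for the research shortlist.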