# PayPal - Peer-to-Peer Social Sharing Impact Analysis

```SQL
CREATE TABLE fct_transactions (
    transaction_id INTEGER,
    user_id INTEGER,
    transaction_date DATE,
    amount DECIMAL
);

CREATE TABLE fct_social_shares (
    share_id INTEGER,
    user_id INTEGER,
    share_date DATE
);

INSERT INTO fct_transactions (transaction_id, user_id, transaction_date, amount)
VALUES
    (1, 1, '2024-10-02', 20.5),
    (2, 1, '2024-11-15', 35.0),
    (3, 1, '2024-12-10', 50.0),
    (4, 2, '2024-10-05', 15.0),
    (5, 2, '2024-10-20', 22.0),
    (6, 3, '2024-11-01', 40.0),
    (7, 3, '2024-12-25', 60.0),
    (8, 4, '2024-12-31', 80.0),
    (9, 5, '2024-10-15', 25.0),
    (10, 5, '2024-11-20', 55.0),
    (11, 5, '2024-11-22', 65.0),
    (12, 6, '2024-10-30', 30.0),
    (13, 6, '2024-12-05', 90.0),
    (14, 2, '2024-09-30', 100.0),
    (15, 7, '2024-12-12', 45.0),
    (16, 7, '2024-11-08', 55.0),
    (17, 3, '2024-09-15', 75.0),
    (18, 8, '2024-09-20', 100.0);

INSERT INTO fct_social_shares (share_id, user_id, share_date)
VALUES
    (2, 2, '2024-11-10'),
    (4, 5, '2024-11-23'),
    (5, 5, '2024-12-01'),
    (6, 7, '2024-12-13'),
    (8, 6, '2024-10-29'),
    (9, 8, '2024-10-15'),
    (10, 2, '2024-09-30');


SELECT * FROM fct_transactions;

SELECT * FROM fct_social_shares;
```

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_social = pd.read_csv('Data/030/fct_social_shares.csv', parse_dates=['share_date'])
df_transaction = pd.read_csv('Data/030/fct_transactions.csv', parse_dates=['transaction_date'])

df_social.head()

|   | share_id | user_id | share_date |
|---|----------|---------|------------|
| 0 | 2 | 2 | 2024-11-10 |
| 1 | 4 | 5 | 2024-11-23 |
| 2 | 5 | 5 | 2024-12-01 |
| 3 | 6 | 7 | 2024-12-13 |
| 4 | 8 | 6 | 2024-10-29 |


In [3]:
df_transaction.head()

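|   | transaction_id | user_id | transaction_date | amount |
|---|----------------|---------|------------------|--------|
| 0 | 1 | 1 | 2024-10-02 | 20.5 |
| 1 | 2 | 1 | 2024-11-15 | 35.0 |
| 2 | 3 | 1 | 2024-12-10 | 50.0 |
| 3 | 4 | 2 | 2024-10-05 | 15.0 |
| 4 | 5 | 2 | 2024-10-20 | 22.0 |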


# Question 1

### What is the floor of the average number of transactions per user made between October 1, 2024 and December 31, 2024? This helps establish a baseline for user engagement on Venmo.

In [6]:
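# Keep only transactions dated October 1 through December 31, 2024 (both ends inclusive)
df_q4 = df_transaction[
    df_transaction['transaction_date'].between('2024-10-01', '2024-12-31')
]

# Transactions per user; users with no transaction in the window do not appear
df_count = df_q4.groupby('user_id')['transaction_id'].count().reset_index(name='conteo_transacciones')

# Mean transactions per active user, rounded down
promedio = df_count['conteo_transacciones'].mean()

respuesta1 = np.floor(promedio)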

respuesta1

np.float64(2.0)

```SQL
SELECT
    FLOOR(AVG(conteo_transacciones)) AS floor_avg_transactions
FROM (SELECT
          user_id,
          COUNT(transaction_id) AS conteo_transacciones
      FROM fct_transactions
      WHERE transaction_date BETWEEN '2024-10-01' AND '2024-12-31'
      GROUP BY user_id) AS subconsulta;
```
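
The result depends on which users enter the denominator: both solutions above average over users with at least one transaction in the window, so user 8 (whose only transaction is dated 2024-09-20) is left out. A minimal standard-library sketch, using the rows from the INSERT statement at the top of the notebook, makes the arithmetic explicit. The variable names are illustrative only and are not part of the original notebook.

```python
from collections import Counter
from datetime import date
from math import floor

# (user_id, transaction_date) pairs copied from the INSERT statement above
transactions = [
    (1, '2024-10-02'), (1, '2024-11-15'), (1, '2024-12-10'),
    (2, '2024-10-05'), (2, '2024-10-20'), (3, '2024-11-01'),
    (3, '2024-12-25'), (4, '2024-12-31'), (5, '2024-10-15'),
    (5, '2024-11-20'), (5, '2024-11-22'), (6, '2024-10-30'),
    (6, '2024-12-05'), (2, '2024-09-30'), (7, '2024-12-12'),
    (7, '2024-11-08'), (3, '2024-09-15'), (8, '2024-09-20'),
]

start, end = date(2024, 10, 1), date(2024, 12, 31)

# Count in-window transactions per user (both boundary dates included)
counts = Counter(
    user_id
    for user_id, day in transactions
    if start <= date.fromisoformat(day) <= end
)

active_users = len(counts)  # users with at least one in-window transaction
all_users = len({user_id for user_id, _ in transactions})

avg_active = sum(counts.values()) / active_users
avg_all = sum(counts.values()) / all_users  # hypothetical: treats user 8 as a zero

print(floor(avg_active), floor(avg_all))  # prints: 2 1
```

With 15 in-window transactions across 7 active users the mean is about 2.14, which floors to 2. Counting user 8 as a zero would give 15 / 8 = 1.875 and a floor of 1, so the definition of "user" matters for this metric.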

# Question 2

### How many distinct users made at least one social share between October 1, 2024 and December 31, 2024? This helps gauge how prevalent social sharing is among active users.

In [7]:
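# Keep only shares dated October 1 through December 31, 2024 (both ends inclusive)
df_social_q4 = df_social[
    df_social['share_date'].between('2024-10-01', '2024-12-31')
]

# Count each sharing user once, however many shares they made
respuesta2 = df_social_q4['user_id'].nunique()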

respuesta2

5

```SQL
SELECT
    COUNT(DISTINCT user_id) AS distinct_social_users
FROM fct_social_shares
WHERE share_date BETWEEN '2024-10-01' AND '2024-12-31';
```
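
Two details decide this count: SQL `BETWEEN` and pandas `Series.between` both include the boundary dates by default, and a user with several shares is counted once. The following standard-library sketch, built from the rows in the INSERT statement at the top of the notebook, shows both effects. The names `in_window` and `distinct_users` are illustrative and do not come from the original.

```python
from datetime import date

# (share_id, user_id, share_date) rows copied from the INSERT statement above
shares = [
    (2, 2, '2024-11-10'),
    (4, 5, '2024-11-23'),
    (5, 5, '2024-12-01'),
    (6, 7, '2024-12-13'),
    (8, 6, '2024-10-29'),
    (9, 8, '2024-10-15'),
    (10, 2, '2024-09-30'),
]

start, end = date(2024, 10, 1), date(2024, 12, 31)

# Shares inside the window; share 10 (2024-09-30) falls one day short
in_window = [row for row in shares if start <= date.fromisoformat(row[2]) <= end]

# A set keeps each user_id once, so user 5's two shares count a single time
distinct_users = {user_id for _, user_id, _ in in_window}

print(len(in_window), len(distinct_users))  # prints: 6 5
```

User 2 still counts because of the share on 2024-11-10, even though the 2024-09-30 share is filtered out.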

# Question 3

### What is the average difference in days between a user's first and last transaction from October 1, 2024 through December 31, 2024, for users who made 2 transactions versus those who made 3 or more transactions?

In [10]:
# Per-user transaction count and first/last transaction dates within the window
df_user = df_transaction[
    df_transaction['transaction_date'].between('2024-10-01', '2024-12-31')
].groupby('user_id').agg(
    total_trans=('transaction_id', 'count'),
    first_date=('transaction_date', 'min'),
    last_date=('transaction_date', 'max')
)

# Users with a single transaction have no first-to-last span, so drop them
df_user = df_user[df_user['total_trans'] >= 2].copy()
df_user['days_diff'] = (df_user['last_date'] - df_user['first_date']).dt.days

# Label each user by transaction volume, then average the span per segment
df_user['segment'] = np.where(df_user['total_trans'] == 2, '2 Transaction', '3+ Transaction')
resultado = df_user.groupby('segment')['days_diff'].mean()

resultado

segment
2 Transaction     34.75
3+ Transaction    53.50
Name: days_diff, dtype: float64

```SQL
SELECT
    CASE
        WHEN total_trans = 2 THEN '2 Transaction'
        ELSE '3+ Transaction'
    END AS segment,
    AVG(days_diff) AS avg_days_active
FROM (SELECT user_id,
             COUNT(transaction_id) AS total_trans,
             (MAX(transaction_date) - MIN(transaction_date)) AS days_diff
      FROM fct_transactions
      WHERE transaction_date BETWEEN '2024-10-01' AND '2024-12-31'
      GROUP BY user_id
      HAVING COUNT(transaction_date) >= 2
) AS subconsulta
GROUP BY segment;
```
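
The subtraction `MAX(transaction_date) - MIN(transaction_date)` returns a whole number of days in PostgreSQL; other engines handle date arithmetic differently. As a hedged illustration, assuming SQLite with dates stored as ISO-8601 text (an assumption, since the notebook does not name its database), the same query can be run from Python's built-in `sqlite3` module by replacing the subtraction with a difference of `julianday()` values.

```python
import sqlite3

# Rows copied from the INSERT statement at the top of the notebook
rows = [
    (1, 1, '2024-10-02', 20.5), (2, 1, '2024-11-15', 35.0),
    (3, 1, '2024-12-10', 50.0), (4, 2, '2024-10-05', 15.0),
    (5, 2, '2024-10-20', 22.0), (6, 3, '2024-11-01', 40.0),
    (7, 3, '2024-12-25', 60.0), (8, 4, '2024-12-31', 80.0),
    (9, 5, '2024-10-15', 25.0), (10, 5, '2024-11-20', 55.0),
    (11, 5, '2024-11-22', 65.0), (12, 6, '2024-10-30', 30.0),
    (13, 6, '2024-12-05', 90.0), (14, 2, '2024-09-30', 100.0),
    (15, 7, '2024-12-12', 45.0), (16, 7, '2024-11-08', 55.0),
    (17, 3, '2024-09-15', 75.0), (18, 8, '2024-09-20', 100.0),
]

conn = sqlite3.connect(':memory:')
# Assumption: dates are kept as ISO-8601 text, which sorts chronologically
conn.execute(
    'CREATE TABLE fct_transactions ('
    'transaction_id INTEGER, user_id INTEGER, transaction_date TEXT, amount REAL)'
)
conn.executemany('INSERT INTO fct_transactions VALUES (?, ?, ?, ?)', rows)

query = """
SELECT
    CASE WHEN total_trans = 2 THEN '2 Transaction' ELSE '3+ Transaction' END AS segment,
    AVG(days_diff) AS avg_days_active
FROM (SELECT user_id,
             COUNT(transaction_id) AS total_trans,
             -- SQLite variant: difference of Julian day numbers gives days
             julianday(MAX(transaction_date)) - julianday(MIN(transaction_date)) AS days_diff
      FROM fct_transactions
      WHERE transaction_date BETWEEN '2024-10-01' AND '2024-12-31'
      GROUP BY user_id
      HAVING COUNT(transaction_id) >= 2
) AS subconsulta
GROUP BY segment
ORDER BY segment;
"""

result = conn.execute(query).fetchall()
print(result)  # prints: [('2 Transaction', 34.75), ('3+ Transaction', 53.5)]
conn.close()
```

The figures match the pandas output above: users 2, 3, 6, and 7 span 15, 54, 36, and 34 days (mean 34.75), while users 1 and 5 span 69 and 38 days (mean 53.5).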