# 03_EXPLORE_REPAYMENTS.ipynb
===========================

Objective: Explore and understand the repayments dataset
- What are repayments?
- How many repayment transactions are there?
- Which revenue components do we have?
- How do they relate to the loans?

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path


In [2]:
# Load data
base_path = Path.cwd().parent if 'analisis_adhoc' in str(Path.cwd()) else Path.cwd()
data_path = base_path / 'data' / 'raw'

repayments = pd.read_csv(data_path / 'AE_challenge_repayments.csv')

In [4]:

print("="*80)
print("📊 DATASET: REPAYMENTS")
print("="*80)

# 1. How many repayment transactions are there?
print(f"\n1️⃣ Total repayment transactions: {len(repayments):,}")

# 2. Which columns are available?
print("\n2️⃣ Available columns:")
for col in repayments.columns:
    print(f"   - {col}")


📊 DATASET: REPAYMENTS

1️⃣ Total repayment transactions: 91,296

2️⃣ Available columns:
   - user_id
   - loan_id
   - event_date
   - amount_trans
   - principalamount_trans
   - interestamount_trans
   - feesamount_trans
   - penaltyamount_trans
   - taxoninterestamount_trans
   - taxonfeesamount_trans
   - taxonpenaltyamount_trans
   - repayment_transaction_id


In [5]:

# 3. View the first records
print("\n3️⃣ First 10 transactions:")
display(repayments.head(10))


3️⃣ First 10 transactions:


,user_id,loan_id,event_date,amount_trans,principalamount_trans,interestamount_trans,feesamount_trans,penaltyamount_trans,taxoninterestamount_trans,taxonfeesamount_trans,taxonpenaltyamount_trans,repayment_transaction_id
0,653447fc27850141b2f20297,1594355219349761440,2025-02-28T11:47:13.000Z,233.0,177.32,55.68,0.0,0.0,7.68,0.0,0.0,9f1209828461229d24d56aac406a012f
1,67d742e33af83067488611b0,1613562262206601029,2025-08-14T06:40:53.000Z,285.2,272.4,12.8,0.0,0.0,1.76,0.0,0.0,58ac4456d225b04dbd0ff50059cbe295
2,67b220a8ccfcc9f45b24df4c,1671753878018852328,2025-09-30T09:04:14.000Z,677.0,529.3,147.7,0.0,0.0,20.37,0.0,0.0,b5515ec651bc675e16a738cb7c654909
3,67b367b6cf6db569d6cabba5,1595142157422010181,2025-03-18T21:02:48.000Z,1115.0,530.36,584.64,0.0,0.0,80.64,0.0,0.0,8e09cb27846d83e74f873f6f3504ffef
4,67a97646279802db05fbc37d,1691910864328386191,2025-12-01T13:38:12.000Z,241.0,101.73,139.27,0.0,0.0,19.21,0.0,0.0,8498c63233b566125c26c85f7c40588f
5,6728648e4e33705045a053f7,1588573041868736423,2025-02-15T11:52:41.000Z,9.21,7.41,1.8,0.0,0.0,0.25,0.0,0.0,d0c9319453b2ee34f9592250302e2f82
6,6788de8fdddcd4d40a1d85d7,1633064582792390226,2025-09-29T22:13:12.000Z,466.78,444.51,22.27,0.0,0.0,3.07,0.0,0.0,e3caffdfa2b896180e9e4044447b72fc
7,65e2625ff590f8e29ad2a61f,1591123272765585251,2025-03-18T13:27:48.000Z,105.0,85.44,19.56,0.0,0.0,2.7,0.0,0.0,2b565b97de7783fd634ec15d9852130c
8,67cd1c4ce8d26cfc8d4df4a0,1691161040162090471,2025-12-02T12:42:25.000Z,300.0,300.0,0.0,0.0,0.0,0.0,0.0,0.0,ffd356b90e16a7e3d3cf20d78a29d546
9,677d6e4067bd056888e0df03,1580247362012028881,2025-03-05T15:17:23.000Z,59.0,1.0,0.0,58.0,0.0,0.0,8.0,0.0,0a208f1b1074ba62f206a4c05978cc9a
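
The ten rows displayed above allow a quick consistency check on how the amount columns relate to each other. The sketch below re-enters those rows by hand (only the columns it needs) and tests two hypotheses. Both are assumptions, not documented facts about the dataset: that `amount_trans` equals principal + interest + fees + penalty, and that each tax column is roughly 16/116 of its base column, which would mean the interest and fee amounts already include a 16% tax. The constant `TAX_SHARE` and the tolerances are illustrative choices.

```python
import pandas as pd

# Rows copied from the head(10) output above (subset of columns).
cols = [
    "amount_trans", "principalamount_trans", "interestamount_trans",
    "feesamount_trans", "penaltyamount_trans",
    "taxoninterestamount_trans", "taxonfeesamount_trans",
]
rows = [
    (233.0, 177.32, 55.68, 0.0, 0.0, 7.68, 0.0),
    (285.2, 272.4, 12.8, 0.0, 0.0, 1.76, 0.0),
    (677.0, 529.3, 147.7, 0.0, 0.0, 20.37, 0.0),
    (1115.0, 530.36, 584.64, 0.0, 0.0, 80.64, 0.0),
    (241.0, 101.73, 139.27, 0.0, 0.0, 19.21, 0.0),
    (9.21, 7.41, 1.8, 0.0, 0.0, 0.25, 0.0),
    (466.78, 444.51, 22.27, 0.0, 0.0, 3.07, 0.0),
    (105.0, 85.44, 19.56, 0.0, 0.0, 2.7, 0.0),
    (300.0, 300.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (59.0, 1.0, 0.0, 58.0, 0.0, 0.0, 8.0),
]
sample = pd.DataFrame(rows, columns=cols)

# Hypothesis 1 (assumption): amount = principal + interest + fees + penalty.
parts = sample[
    ["principalamount_trans", "interestamount_trans",
     "feesamount_trans", "penaltyamount_trans"]
].sum(axis=1)
amount_matches = (sample["amount_trans"] - parts).abs() < 0.005

# Hypothesis 2 (assumption): tax is 16/116 of a tax-inclusive base amount.
TAX_SHARE = 16 / 116
tax_int_matches = (
    sample["taxoninterestamount_trans"]
    - sample["interestamount_trans"] * TAX_SHARE
).abs() < 0.01
tax_fee_matches = (
    sample["taxonfeesamount_trans"]
    - sample["feesamount_trans"] * TAX_SHARE
).abs() < 0.01

print("amount identity holds:", bool(amount_matches.all()))
print("interest tax ratio holds:", bool(tax_int_matches.all()))
print("fees tax ratio holds:", bool(tax_fee_matches.all()))
```

If both checks also hold on the full `repayments` frame, adding the tax columns on top of interest and fees would count the tax twice, so it is worth running them before settling on a revenue definition.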


In [6]:
# 4. Are there missing values?
print("\n4️⃣ Null values per column:")
print(repayments.isnull().sum())


4️⃣ Null values per column:
user_id                      0
loan_id                      0
event_date                   0
amount_trans                 0
principalamount_trans        0
interestamount_trans         0
feesamount_trans             0
penaltyamount_trans          0
taxoninterestamount_trans    0
taxonfeesamount_trans        0
taxonpenaltyamount_trans     0
repayment_transaction_id     0
dtype: int64
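
A null check does not reveal repeated keys. As a complementary sketch, the snippet below counts duplicated transaction IDs on a toy frame with made-up values; in the notebook the same calls would run on `repayments`, assuming `repayment_transaction_id` is meant to be unique per row.

```python
import pandas as pd

# Toy frame with made-up IDs; the notebook would use `repayments` instead.
toy = pd.DataFrame({
    "repayment_transaction_id": ["a1", "b2", "b2", "c3"],
    "loan_id": [10, 11, 11, 12],
    "amount_trans": [100.0, 50.0, 50.0, 75.0],
})

# keep=False flags every row whose ID appears more than once.
dup_mask = toy["repayment_transaction_id"].duplicated(keep=False)
n_dup_rows = int(dup_mask.sum())
n_dup_ids = int(toy.loc[dup_mask, "repayment_transaction_id"].nunique())

print(f"Rows with a repeated transaction ID: {n_dup_rows}")
print(f"Distinct repeated IDs: {n_dup_ids}")
```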


In [None]:

# 5. How many loans do these repayments correspond to?
print("\n5️⃣ RELATIONSHIP TO LOANS:")
unique_loans_with_payments = repayments['loan_id'].nunique()
print(f"   - Loans with at least 1 repayment: {unique_loans_with_payments:,}")
print(f"   - Average repayments per loan: {len(repayments) / unique_loans_with_payments:.1f}")



5️⃣ RELATIONSHIP TO LOANS:
   - Loans with at least 1 repayment: 26,497
   - Average repayments per loan: 3.4
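
An average of 3.4 says little about how repayments are spread across loans. A minimal sketch of looking at the full distribution, using a toy `loan_id` column with made-up values (the notebook would group `repayments` the same way):

```python
import pandas as pd

# Toy data: loan 1 has 3 repayments, loan 2 has 1, loan 3 has 2, loan 4 has 6.
toy = pd.DataFrame({"loan_id": [1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4, 4]})

# Number of repayment rows per loan.
payments_per_loan = toy.groupby("loan_id").size()

mean_payments = payments_per_loan.mean()
median_payments = payments_per_loan.median()

# How many loans have exactly 1, 2, 3, ... repayments.
distribution = payments_per_loan.value_counts().sort_index()

print(f"mean={mean_payments:.1f}, median={median_payments:.1f}")
print(distribution)
```

Comparing mean and median is a cheap way to spot skew, for example a few loans with many small partial payments.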


In [15]:

# Load loans for comparison
loans = pd.read_csv(data_path / 'AE_challenge_loans.csv')
# Total unique loans
total_loans = loans['loan_id'].nunique()

# Loans WITH repayments
loans_with_payments = repayments['loan_id'].nunique()

# Loans WITHOUT repayments
# (assumes every loan_id in repayments also exists in loans)
loans_without_payments = total_loans - loans_with_payments

print("="*80)
print("🔍 LOANS WITH vs WITHOUT REPAYMENTS")
print("="*80)

print(f"\nTotal loans: {total_loans:,}")
print(f"Loans WITH repayments: {loans_with_payments:,} ({loans_with_payments/total_loans*100:.1f}%)")
print(f"Loans WITHOUT repayments: {loans_without_payments:,} ({loans_without_payments/total_loans*100:.1f}%)")


🔍 LOANS WITH vs WITHOUT REPAYMENTS

Total loans: 29,222
Loans WITH repayments: 26,497 (90.7%)
Loans WITHOUT repayments: 2,725 (9.3%)
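
Subtracting the two unique counts is only exact if every `loan_id` in the repayments file also appears in the loans file. A hedged sketch of checking that with an outer merge and `indicator=True`, on toy frames with made-up IDs (`loans_toy` and `repayments_toy` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy data: loan 9 has repayments but no row in the loans table.
loans_toy = pd.DataFrame({"loan_id": [1, 2, 3, 4]})
repayments_toy = pd.DataFrame({"loan_id": [1, 1, 2, 9]})

# One row per loan that has at least one repayment.
paid_ids = repayments_toy[["loan_id"]].drop_duplicates()

# The _merge column records which side each loan_id came from.
merged = loans_toy.merge(paid_ids, on="loan_id", how="outer", indicator=True)
counts = merged["_merge"].value_counts()

loans_with = int(counts["both"])            # in both tables
loans_without = int(counts["left_only"])    # loans with no repayments
orphan_repaid = int(counts["right_only"])   # repaid IDs missing from loans

print(f"with: {loans_with}, without: {loans_without}, orphans: {orphan_repaid}")
```

In this toy case the subtraction shortcut would report 4 - 3 = 1 loan without repayments, while the merge shows 2 plus one orphan ID. If the orphan count is zero on the real data, the shortcut and the merge agree.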


In [17]:
print("Actual columns in repayments:")
print(repayments.columns.tolist())

Actual columns in repayments:
['user_id', 'loan_id', 'event_date', 'amount_trans', 'principalamount_trans', 'interestamount_trans', 'feesamount_trans', 'penaltyamount_trans', 'taxoninterestamount_trans', 'taxonfeesamount_trans', 'taxonpenaltyamount_trans', 'repayment_transaction_id', 'mes']


In [18]:
# 6. Which revenue components are there?
print("\n6️⃣ REVENUE COMPONENTS:")

# Column names mapped to display labels
revenue_columns = {
    'interestamount_trans': 'Interest',
    'feesamount_trans': 'Fees',
    'penaltyamount_trans': 'Penalties',
    'taxoninterestamount_trans': 'IVA on interest',
    'taxonfeesamount_trans': 'IVA on fees',
    'taxonpenaltyamount_trans': 'IVA on penalties'
}

for col, label in revenue_columns.items():
    total = repayments[col].sum()
    avg = repayments[col].mean()
    print(f"   {label:20} → Total: ${total:>15,.2f}  |  Average: ${avg:>10,.2f}")

# Compute total revenue as the sum of all six components.
# Note: this definition includes the IVA (tax) columns.
repayments['revenue_total'] = (
    repayments['interestamount_trans'] +
    repayments['feesamount_trans'] +
    repayments['penaltyamount_trans'] +
    repayments['taxoninterestamount_trans'] +
    repayments['taxonfeesamount_trans'] +
    repayments['taxonpenaltyamount_trans']
)

print(f"\n   {'TOTAL REVENUE':20} → ${repayments['revenue_total'].sum():>15,.2f}")




6️⃣ REVENUE COMPONENTS:
   Interest             → Total: $   4,269,624.23  |  Average: $     46.77
   Fees                 → Total: $     574,041.17  |  Average: $      6.29
   Penalties            → Total: $           0.00  |  Average: $      0.00
   IVA on interest      → Total: $     588,912.72  |  Average: $      6.45
   IVA on fees          → Total: $      79,178.03  |  Average: $      0.87
   IVA on penalties     → Total: $           0.00  |  Average: $      0.00

   TOTAL REVENUE        → $   5,511,756.15
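
The printed totals can be cross-checked with plain arithmetic. The sketch below copies the six totals from the output above, confirms they add up to the printed grand total, and computes the ratio of each tax total to its base. Both ratios come out near 0.138, which is close to 16/116; reading that as a 16% tax already embedded in the interest and fee amounts is an assumption to verify, not something the dataset documents.

```python
# Totals copied from the printed output above.
totals = {
    "interest": 4_269_624.23,
    "fees": 574_041.17,
    "penalties": 0.00,
    "tax_interest": 588_912.72,
    "tax_fees": 79_178.03,
    "tax_penalties": 0.00,
}
printed_total = 5_511_756.15

# The six components should reproduce the printed grand total.
components_sum = round(sum(totals.values()), 2)

# Tax collected across all components, and its share of the printed total.
tax_total = round(
    totals["tax_interest"] + totals["tax_fees"] + totals["tax_penalties"], 2
)
tax_share_of_total = tax_total / printed_total

# Ratio of each tax total to its base amount.
tax_to_interest = totals["tax_interest"] / totals["interest"]
tax_to_fees = totals["tax_fees"] / totals["fees"]

print(f"components sum:     {components_sum:,.2f}")
print(f"tax total:          {tax_total:,.2f} ({tax_share_of_total:.2%} of total)")
print(f"tax / interest:     {tax_to_interest:.4f}")
print(f"tax / fees:         {tax_to_fees:.4f}")
print(f"16/116 (reference): {16 / 116:.4f}")
```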


In [19]:
# 7. Which component matters most?
print("\n7️⃣ REVENUE COMPOSITION (%):")
total_rev = repayments['revenue_total'].sum()
for col, label in revenue_columns.items():
    pct = (repayments[col].sum() / total_rev * 100)
    print(f"   {label:20} → {pct:>6.2f}%")


7️⃣ REVENUE COMPOSITION (%):
   Interest             →  77.46%
   Fees                 →  10.41%
   Penalties            →   0.00%
   IVA on interest      →  10.68%
   IVA on fees          →   1.44%
   IVA on penalties     →   0.00%


In [21]:

# 8. When did these repayments occur?
repayments['event_date'] = pd.to_datetime(repayments['event_date'])
# Drop the UTC timezone explicitly before converting to monthly periods;
# calling to_period on tz-aware timestamps emits a UserWarning.
repayments['mes_pago'] = repayments['event_date'].dt.tz_localize(None).dt.to_period('M')

print("\n8️⃣ REPAYMENTS BY MONTH:")
pagos_por_mes = repayments.groupby('mes_pago').agg({
    'repayment_transaction_id': 'count',
    'revenue_total': 'sum'
}).round(2)
pagos_por_mes.columns = ['Num_Transacciones', 'Revenue_Total']
print(pagos_por_mes.sort_index())


8️⃣ REPAYMENTS BY MONTH:
          Num_Transacciones  Revenue_Total
mes_pago                                  
2025-01                1312       53411.30
2025-02                5100      259980.12
2025-03                9331      509256.03
2025-04               10374      582553.74
2025-05                9081      520747.17
2025-06               11070      679605.06
2025-07                9355      592494.51
2025-08                7863      480895.02
2025-09                8063      515080.33
2025-10                7118      456959.55
2025-11                6387      425721.77
2025-12                6242      435051.55
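
The `event_date` values carry a `Z` suffix, so the monthly buckets above are computed on UTC timestamps. A payment made late in the evening local time can therefore land in the following month. The sketch below shows the effect on two made-up timestamps; the zone `America/Mexico_City` is purely an assumption for illustration, since the source does not say which local time zone applies.

```python
import pandas as pd

# Made-up timestamps in the same format as event_date.
events = pd.Series(pd.to_datetime(
    ["2025-02-01T03:30:00.000Z", "2025-02-15T12:00:00.000Z"], utc=True
))

# Month bucket using UTC wall-clock time.
month_utc = events.dt.tz_localize(None).dt.to_period("M")

# Month bucket using local wall-clock time (zone is an assumption).
month_local = (
    events.dt.tz_convert("America/Mexico_City")
    .dt.tz_localize(None)
    .dt.to_period("M")
)

print(pd.DataFrame({
    "event_date": events,
    "month_utc": month_utc,
    "month_local": month_local,
}))
```

The first timestamp falls in February in UTC but in January in the assumed local zone. Whether this matters depends on how month-end cutoffs are defined for reporting.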


In [22]:
# 9. Are there principal repayments?
print("\n9️⃣ PRINCIPAL REPAYMENTS:")
print(f"   - Total paid: ${repayments['principalamount_trans'].sum():,.2f}")
print(f"   - Average per transaction: ${repayments['principalamount_trans'].mean():,.2f}")



9️⃣ PRINCIPAL REPAYMENTS:
   - Total paid: $26,758,941.47
   - Average per transaction: $293.10


In [23]:

# 10. Example: view all repayments for ONE loan
ejemplo_loan = repayments['loan_id'].iloc[0]
ejemplo_pagos = repayments[repayments['loan_id'] == ejemplo_loan].sort_values('event_date')

print(f"\n🔟 EXAMPLE: Repayments for loan {ejemplo_loan}")
print(f"   Total repayments: {len(ejemplo_pagos)}")
print("\n   Repayment detail:")
display(ejemplo_pagos[['event_date', 'principalamount_trans', 'interestamount_trans', 'feesamount_trans', 'penaltyamount_trans', 'revenue_total']])


🔟 EXAMPLE: Repayments for loan 1594355219349761440
   Total repayments: 1

   Repayment detail:


,event_date,principalamount_trans,interestamount_trans,feesamount_trans,penaltyamount_trans,revenue_total
0,2025-02-28 11:47:13+00:00,177.32,55.68,0.0,0.0,63.36


In [24]:

print("\n" + "="*80)
print("✅ CONCLUSIONS")
print("="*80)
print(f"""
Total transactions: {len(repayments):,}
Loans with repayments: {unique_loans_with_payments:,}
Total revenue generated: ${repayments['revenue_total'].sum():,.2f}

💰 REVENUE COMPONENTS:
- Interest: main source of revenue
- Fees: second-largest non-tax component
- Penalties: for late payments (total is $0.00 in this dataset)
- Taxes (IVA): reported on interest and fees

🔗 RELATIONSHIP TO LOANS:
Each row = 1 repayment transaction for a loan
A loan can have multiple repayments (partial payments, installments)
""")



✅ CONCLUSIONS

Total transactions: 91,296
Loans with repayments: 26,497
Total revenue generated: $5,511,756.15

💰 REVENUE COMPONENTS:
- Interest: main source of revenue
- Fees: second-largest non-tax component
- Penalties: for late payments (total is $0.00 in this dataset)
- Taxes (IVA): reported on interest and fees

🔗 RELATIONSHIP TO LOANS:
Each row = 1 repayment transaction for a loan
A loan can have multiple repayments (partial payments, installments)



In [26]:
# 11. Aggregation per loan (what dbt does)
print("\n" + "="*80)
print("🎯 AGGREGATION PER LOAN (what dbt does)")
print("="*80)

repayments_agg = repayments.groupby('loan_id').agg({
    'repayment_transaction_id': 'count',
    'principalamount_trans': 'sum',
    'interestamount_trans': 'sum',
    'feesamount_trans': 'sum',
    'penaltyamount_trans': 'sum',
    'taxoninterestamount_trans': 'sum',
    'taxonfeesamount_trans': 'sum',
    'taxonpenaltyamount_trans': 'sum',
    'revenue_total': 'sum'
}).round(2)

# Rename positionally; the order must match the agg dict above.
repayments_agg.columns = [
    'num_pagos',
    'principal_total',
    'interest_total',
    'fee_total',
    'penalty_total',
    'tax_interest_total',
    'tax_fees_total',
    'tax_penalty_total',
    'revenue_total'
]

print("\nFirst 10 aggregated loans:")
display(repayments_agg.head(10))

print(f"""
💡 KEY TRANSFORMATION:
From {len(repayments):,} individual transactions
To {len(repayments_agg):,} loans with aggregated totals

This is exactly what int_loan_repayments_agg.sql does

CONCEPT:
- Before: many rows per loan (1 per repayment)
- After: 1 row per loan (with the SUM of all its repayments)
""")


🎯 AGGREGATION PER LOAN (what dbt does)

First 10 aggregated loans:


loan_id,num_pagos,principal_total,interest_total,fee_total,penalty_total,tax_interest_total,tax_fees_total,tax_penalty_total,revenue_total
1576125439529146951,1,525.43,117.55,232.0,0.0,16.21,32.0,0.0,397.76
1576243277606037629,1,299.0,0.0,58.0,0.0,0.0,8.0,0.0,66.0
1576248312676539117,3,368.35,167.65,58.0,0.0,23.13,8.0,0.0,256.78
1576290024839510647,1,3267.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1576293755050773816,2,504.99,0.0,58.0,0.0,0.0,8.0,0.0,66.0
1576338876673445477,6,635.0,112.72,0.0,0.0,15.54,0.0,0.0,128.26
1576341642440826733,2,340.0,33.13,0.0,0.0,4.57,0.0,0.0,37.7
1576346441525034318,4,901.33,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1576351632625651492,7,511.62,52.58,0.0,0.0,7.25,0.0,0.0,59.83
1576357023369715555,4,1854.79,777.35,58.0,0.0,107.21,8.0,0.0,950.56



💡 KEY TRANSFORMATION:
From 91,296 transactions individual
To 26,497 loans with aggregated totals

This is exactly what int_loan_repayments_agg.sql does

CONCEPT:
- Before: many rows per loan (1 per repayment)
- After: 1 row per loan (with the SUM of all its repayments)
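
The many-rows-to-one-row step can also be written with pandas named aggregation, which ties each output column name to its source column and function in one place instead of relying on a positional rename. A minimal sketch on a toy frame with made-up values; it illustrates the pandas pattern only and makes no claim about the contents of `int_loan_repayments_agg.sql`.

```python
import pandas as pd

# Toy data: loan 1 has two repayments, loan 2 has one.
toy = pd.DataFrame({
    "loan_id": [1, 1, 2],
    "repayment_transaction_id": ["a", "b", "c"],
    "principalamount_trans": [100.0, 50.0, 300.0],
    "interestamount_trans": [10.0, 5.0, 0.0],
})

# Each keyword is an output column: (source column, aggregation function).
agg = toy.groupby("loan_id").agg(
    num_pagos=("repayment_transaction_id", "count"),
    principal_total=("principalamount_trans", "sum"),
    interest_total=("interestamount_trans", "sum"),
).round(2)

print(agg)
```

With this form, adding or reordering a metric cannot silently mislabel another column.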

