# Marketing Campaign Personalization: Customer Segmentation

In this task, the goal is to apply **Machine Learning** to accurately segment the **10,000 customers** previously selected by the recommendation algorithm. The approach is the **K-Means** clustering algorithm (`sklearn.cluster.KMeans`), an unsupervised method that groups customers into well-defined segments based on the demographic and behavioral characteristics provided by the marketing team.

### Segmentation Variables

The segmentation uses the following key characteristics:

- Age
- Sex
- Income level
- Purchasing behavior and preferences
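As a rough illustration of how these variables could feed a clustering model, the sketch below scales the numeric features with `RobustScaler`, one-hot encodes the categorical one, and fits `KMeans`. It runs on synthetic data. The column names `age`, `gender`, `median_salary` and `num_products_contracts` mirror columns that appear in the notebook's data, but the feature selection, the number of clusters (4) and the output column `kmeans_segment` are assumptions made for this example, not the settings used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Synthetic stand-in for the customer table (all values are made up).
rng = np.random.default_rng(0)
n_customers = 200
demo = pd.DataFrame({
    "age": rng.integers(18, 90, size=n_customers),
    "gender": rng.choice(["H", "V"], size=n_customers),
    "median_salary": rng.lognormal(mean=11.5, sigma=0.5, size=n_customers),
    "num_products_contracts": rng.integers(0, 7, size=n_customers),
})

numeric_cols = ["age", "median_salary", "num_products_contracts"]
categorical_cols = ["gender"]

# RobustScaler centers on the median and scales by the IQR, so extreme
# salaries distort the distances less than with mean/std scaling.
preprocess = ColumnTransformer([
    ("num", RobustScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# n_clusters=4 is an arbitrary choice for the example (assumption).
segmenter = Pipeline([
    ("preprocess", preprocess),
    ("kmeans", KMeans(n_clusters=4, n_init=10, random_state=42)),
])

# Hypothetical output column holding the cluster label of each customer.
demo["kmeans_segment"] = segmenter.fit_predict(demo[numeric_cols + categorical_cols])
print(demo["kmeans_segment"].value_counts().sort_index())
```

In practice the number of clusters would normally be chosen by comparing several values, for example with the inertia (elbow) curve or the silhouette score.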

### Segmentation Goal

The goal of this analysis is to produce **customer groups** that share similar characteristics, so that the marketing team, led by **Erin**, can tailor the creatives and messages of the e-mail campaign more effectively. Segmenting these customers is expected to maximize the relevance of the communications and increase conversion rates.

The following sections describe how the K-Means model is implemented and the results obtained when generating these groups.




In [2]:
import numpy as np 
import pandas as pd
pd.options.display.float_format = '{:,.3f}'.format
pd.set_option('display.max_columns', 100)
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

from sklearn.preprocessing import RobustScaler,OneHotEncoder
from sklearn.cluster import KMeans

In [7]:
# Load the dataframe with the data for the 10,000 selected customers.

df_seleccionados = pd.read_parquet(r'C:\Users\Usuario\Desktop\Nuclio\TFM_Nuclio\easy_money_project\Tarea_3_Recomendación\df_seleccionados.parquet')

In [8]:
df_seleccionados

Unnamed: 0,pk_cid,recomendacion,prob_compra,precio,beneficio,cluster
226680,1119834,[credit_card],0.987,60,59.212,4
226681,88904,[credit_card],0.987,60,59.191,4
226682,1136278,[credit_card],0.985,60,59.100,4
226686,1119669,[credit_card],0.985,60,59.095,4
226685,1109597,[credit_card],0.985,60,59.095,4
...,...,...,...,...,...,...
229484,1304214,[emc_account],0.955,10,9.553,4
229483,1393343,[payroll_account],0.955,10,9.553,4
229482,1334243,[debit_card],0.955,10,9.553,4
229480,1376649,[emc_account],0.955,10,9.553,4


In [10]:
# Load the data directly from AWS S3
df_full_cleaned = pd.read_parquet("https://easy-money-project-bucket.s3.eu-west-3.amazonaws.com/df_full_cleaned.parquet")
# Sociodemographic data
sdg_df_cleaned = pd.read_parquet('https://easy-money-project-bucket.s3.eu-west-3.amazonaws.com/sociodemographic_df_adrian.parquet')


df_full_clean = df_full_cleaned.copy()
# Merge in the age column, which was missing from df_full_cleaned
df_full_clean = df_full_clean.merge(sdg_df_cleaned[["pk_cid", "pk_partition", "age"]], on=["pk_cid", "pk_partition"], how="inner")

In [11]:
df_full_clean


Unnamed: 0,pk_cid,pk_partition,short_term_deposit,loans,mortgage,funds,securities,long_term_deposit,em_account_pp,credit_card,payroll,pension_plan,payroll_account,emc_account,debit_card,em_account_p,em_acount,num_products_contracts,p_cuenta_bancaria,cuentas_sum,p_inversion,inversion_sum,p_financiacion,financiacion_sum,profit_cuentas,profit_inversion,profit_financiacion,country_id,gender,mes_partition,mes_nombre_partition,grupo_edad,median_salary,region_code,entry_date,entry_channel,active_customer,segment,categoria_antiguedad,age
0,1375586,2018-01-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,1,January,Adultos jóvenes,87218.100,Málaga,2018-01-12,Otros,1,02 - PARTICULARES,1-2 años,35
1,1050611,2018-01-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,1,January,Jóvenes,35548.740,Ciudad Real,2015-08-10,KHE,0,03 - UNIVERSITARIO,Más de 3 años,23
2,1050612,2018-01-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,1,January,Jóvenes,122179.110,Ciudad Real,2015-08-10,KHE,0,03 - UNIVERSITARIO,Más de 3 años,23
3,1050613,2018-01-28,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,40,0,ES,H,1,January,Jóvenes,119775.540,Zaragoza,2015-08-10,KHD,0,03 - UNIVERSITARIO,Más de 3 años,22
4,1050614,2018-01-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,1,January,Jóvenes,101469.135,Zaragoza,2015-08-10,KHE,1,03 - UNIVERSITARIO,Más de 3 años,23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5962919,1166765,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,5,May,Jóvenes,43912.170,Zaragoza,2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,22
5962920,1166764,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,5,May,Jóvenes,23334.990,"Rioja, La",2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,23
5962921,1166763,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,5,May,Adultos,87930.930,Zaragoza,2016-08-14,KHE,1,02 - PARTICULARES,2-3 años,47
5962922,1166789,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,5,May,Jóvenes,199592.820,Zaragoza,2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,22


In [14]:
# Keep only the rows of the most recent partition (2019-05-28)
df_full_clean_last = df_full_clean[df_full_clean["pk_partition"] == "2019-05-28"]
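The filter above hard-codes the partition date. A minimal sketch of an alternative, assuming `pk_partition` can be parsed as a date, is to derive the latest partition from the data so the cell keeps working if newer partitions are added. The toy `history` frame and its values are invented for illustration.

```python
import pandas as pd

# Toy monthly history: one row per customer and partition (made-up values).
history = pd.DataFrame({
    "pk_cid": [1, 1, 2, 2, 3],
    "pk_partition": ["2019-04-28", "2019-05-28", "2019-04-28", "2019-05-28", "2019-05-28"],
    "em_acount": [1, 1, 0, 1, 1],
})

# Parse the partition column so that max() compares dates, not strings.
history["pk_partition"] = pd.to_datetime(history["pk_partition"])

# Pick the most recent partition present in the data.
latest_partition = history["pk_partition"].max()
latest = history[history["pk_partition"] == latest_partition].copy()

# Each customer should appear at most once in a single partition.
assert latest["pk_cid"].is_unique
print(latest_partition.date(), len(latest))
```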

In [15]:
df_full_clean_last

Unnamed: 0,pk_cid,pk_partition,short_term_deposit,loans,mortgage,funds,securities,long_term_deposit,em_account_pp,credit_card,payroll,pension_plan,payroll_account,emc_account,debit_card,em_account_p,em_acount,num_products_contracts,p_cuenta_bancaria,cuentas_sum,p_inversion,inversion_sum,p_financiacion,financiacion_sum,profit_cuentas,profit_inversion,profit_financiacion,country_id,gender,mes_partition,mes_nombre_partition,grupo_edad,median_salary,region_code,entry_date,entry_channel,active_customer,segment,categoria_antiguedad,age
5519929,657826,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,5,May,Adultos,54493.380,Lleida,2015-05-24,Otros,1,02 - PARTICULARES,Más de 3 años,44
5519930,657817,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ES,V,5,May,Adultos jóvenes,120141.600,Barcelona,2019-05-12,Otros,0,03 - UNIVERSITARIO,0-3 meses,32
5519931,657986,2019-05-28,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,6,1,4,1,1,1,1,40,40,60,ES,H,5,May,Adultos jóvenes,100993.170,Sevilla,2016-02-18,Otros,1,02 - PARTICULARES,Más de 3 años,39
5519932,657905,2019-05-28,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,2,1,1,1,1,0,0,10,40,0,ES,H,5,May,Longevos,154059.090,Madrid,2017-02-07,KAT,1,01 - TOP,2-3 años,85
5519933,657336,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,5,May,Adultos jóvenes,108223.410,Madrid,2019-03-28,KAT,1,02 - PARTICULARES,0-3 meses,38
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5962919,1166765,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,5,May,Jóvenes,43912.170,Zaragoza,2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,22
5962920,1166764,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,V,5,May,Jóvenes,23334.990,"Rioja, La",2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,23
5962921,1166763,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,5,May,Adultos,87930.930,Zaragoza,2016-08-14,KHE,1,02 - PARTICULARES,2-3 años,47
5962922,1166789,2019-05-28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,10,0,0,ES,H,5,May,Jóvenes,199592.820,Zaragoza,2016-08-14,KHE,0,03 - UNIVERSITARIO,2-3 años,22


In [17]:
# Check that no customer appears more than once in the last partition
df_full_clean_last['pk_cid'].duplicated().sum()

0

In [20]:
# Left join: keep every selected customer and attach the last-partition attributes
df_joined = df_seleccionados.merge(df_full_clean_last, on='pk_cid', how='left')

# Show the result
df_joined

Unnamed: 0,pk_cid,recomendacion,prob_compra,precio,beneficio,cluster,pk_partition,short_term_deposit,loans,mortgage,funds,securities,long_term_deposit,em_account_pp,credit_card,payroll,pension_plan,payroll_account,emc_account,debit_card,em_account_p,em_acount,num_products_contracts,p_cuenta_bancaria,cuentas_sum,p_inversion,inversion_sum,p_financiacion,financiacion_sum,profit_cuentas,profit_inversion,profit_financiacion,country_id,gender,mes_partition,mes_nombre_partition,grupo_edad,median_salary,region_code,entry_date,entry_channel,active_customer,segment,categoria_antiguedad,age
0,1119834,[credit_card],0.987,60,59.212,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,0.000,0.000,1.000,3.000,1.000,3.000,0.000,0.000,0.000,0.000,30.000,0.000,0.000,ES,V,5.000,May,Adultos mayores,198675.690,Madrid,2016-01-29,KFC,1.000,01 - TOP,Más de 3 años,63.000
1,88904,[credit_card],0.987,60,59.191,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,0.000,0.000,3.000,1.000,3.000,0.000,0.000,0.000,0.000,30.000,0.000,0.000,ES,H,5.000,May,Adultos,871801.680,Madrid,2015-08-08,KFA,1.000,01 - TOP,Más de 3 años,48.000
2,1136278,[credit_card],0.985,60,59.100,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,1.000,0.000,0.000,0.000,4.000,1.000,3.000,1.000,1.000,0.000,0.000,30.000,40.000,0.000,ES,V,5.000,May,Adultos,95704.380,Cantabria,2016-06-04,KAT,1.000,01 - TOP,Más de 3 años,53.000
3,1119669,[credit_card],0.985,60,59.095,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,0.000,1.000,1.000,0.000,1.000,5.000,1.000,4.000,1.000,1.000,0.000,0.000,40.000,40.000,0.000,ES,V,5.000,May,Adultos mayores,82723.080,Madrid,2016-01-28,KFC,1.000,01 - TOP,Más de 3 años,58.000
4,1109597,[credit_card],0.985,60,59.095,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,1.000,1.000,0.000,0.000,5.000,1.000,4.000,1.000,1.000,0.000,0.000,40.000,40.000,0.000,ES,V,5.000,May,Adultos mayores,413278.020,Madrid,2015-12-04,KFC,1.000,01 - TOP,Más de 3 años,59.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,1304214,[emc_account],0.955,10,9.553,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,0.000,1.000,0.000,1.000,5.000,1.000,4.000,1.000,1.000,0.000,0.000,40.000,40.000,0.000,ES,V,5.000,May,Adultos jóvenes,62895.900,Badajoz,2017-09-09,RED,1.000,02 - PARTICULARES,1-2 años,37.000
9996,1393343,[payroll_account],0.955,10,9.553,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,2.000,1.000,1.000,0.000,0.000,1.000,1.000,10.000,0.000,60.000,ES,V,5.000,May,Adultos jóvenes,88963.110,Badajoz,2018-04-21,RED,1.000,02 - PARTICULARES,1-2 años,38.000
9997,1334243,[debit_card],0.955,10,9.553,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,0.000,0.000,0.000,0.000,3.000,1.000,2.000,1.000,1.000,0.000,0.000,20.000,40.000,0.000,ES,V,5.000,May,Adultos,63117.990,Badajoz,2017-10-16,RED,1.000,02 - PARTICULARES,1-2 años,41.000
9998,1376649,[emc_account],0.955,10,9.553,4,2019-05-28,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,1.000,1.000,1.000,0.000,1.000,0.000,0.000,4.000,1.000,3.000,1.000,1.000,0.000,0.000,30.000,40.000,0.000,ES,V,5.000,May,Adultos,69536.970,Badajoz,2018-01-16,RED,1.000,02 - PARTICULARES,1-2 años,41.000


We now have a dataframe with all the data for the 10,000 customers, ready for the segmentation.
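One detail worth checking before clustering: in `df_joined`, columns that were integers in `df_full_clean_last` (for example `age`) are displayed as floats such as `63.000`. In pandas this usually happens when a left join leaves some rows without a match and fills them with `NaN`, so it may indicate that some selected customers have no row in the last partition. The sketch below shows one way to test for that on toy data, using the `indicator` and `validate` parameters of `DataFrame.merge`. The frames `selected` and `snapshot` and their values are invented for the example.

```python
import pandas as pd

# Toy stand-ins for df_seleccionados and df_full_clean_last (made-up values).
selected = pd.DataFrame({"pk_cid": [10, 20, 30], "prob_compra": [0.99, 0.98, 0.97]})
snapshot = pd.DataFrame({"pk_cid": [10, 20, 40], "age": [63, 48, 22]})

# validate raises if pk_cid is duplicated on either side;
# indicator adds a _merge column that flags rows without a match.
joined = selected.merge(
    snapshot, on="pk_cid", how="left", validate="one_to_one", indicator=True
)

unmatched = joined[joined["_merge"] == "left_only"]
print(len(unmatched), "selected customers have no row in the snapshot")

# The integer column is upcast to float64 because of the NaN in the unmatched row.
print(joined["age"].dtype)
```

If unmatched rows do exist in the real data, they would need to be dropped or imputed before scaling, since `KMeans` does not accept missing values.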