# 5. Evaluation: Assessing the Churn Model

In this step we evaluate the performance of the trained model
and interpret its results from a business perspective.

Beyond the statistical metrics, we analyze the strategic
impact of predicting churn.


In [0]:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
from pyspark.sql.functions import when, col
from pyspark.ml.functions import vector_to_array

In [0]:
# Load the gold-layer churn table (RFM features plus the churn label)
df = spark.table("`crisp-dm`.gold_churn")

display(df)


In [0]:
df_model = df.select(
    "recency",
    "frequency",
    "monetary",
    "churn"
)


In [0]:
assembler = VectorAssembler(
    inputCols=["recency", "frequency", "monetary"],
    outputCol="features"
)

df_vector = assembler.transform(df_model)


In [0]:
# 70/30 split; the fixed seed keeps the split reproducible across runs
train_df, test_df = df_vector.randomSplit([0.7, 0.3], seed=42)

print("Train:", train_df.count())
print("Test:", test_df.count())


In [0]:
lr = LogisticRegression(
    featuresCol="features",
    labelCol="churn"
)

model = lr.fit(train_df)


In [0]:
predictions = model.transform(test_df)

display(predictions)


## 5.1 Confusion Matrix


In [0]:
predictions.groupBy("churn", "prediction").count().display()
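
The grouped counts above are the four cells of a confusion matrix. As a minimal sketch in plain Python, the function below shows how precision and recall for the churn class (label 1) follow from those cells. The function name and the counts are hypothetical and are not results from this model.

```python
def churn_metrics(tp, fp, fn, tn):
    """Derive metrics for the positive (churn = 1) class from confusion-matrix cells."""
    total = tp + fp + fn + tn
    # Of the customers flagged as churners, how many actually churned
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Of the customers who actually churned, how many were flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical counts, for illustration only
metrics = churn_metrics(tp=40, fp=10, fn=60, tn=890)
print(metrics)
```

With these made-up counts, accuracy is high while recall for churners is low, which is why the individual cells are worth reading alongside the aggregate metrics.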


In [0]:
evaluator = BinaryClassificationEvaluator(
    labelCol="churn",
    metricName="areaUnderROC"
)

auc = evaluator.evaluate(predictions)

print("AUC:", auc)
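
One way to read the AUC value: it estimates the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it by brute-force pairwise comparison in plain Python. It illustrates the interpretation only and is not how Spark computes `areaUnderROC` internally. The helper name, labels and scores are hypothetical.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # churner correctly ranked above non-churner
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and churn scores, for illustration only
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc_example = pairwise_auc(example_labels, example_scores)
print("Pairwise AUC:", auc_example)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, regardless of the classification threshold.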


In [0]:
accuracy_eval = MulticlassClassificationEvaluator(
    labelCol="churn",
    metricName="accuracy"
)

accuracy = accuracy_eval.evaluate(predictions)

print("Accuracy:", accuracy)


In [0]:
precision_eval = MulticlassClassificationEvaluator(
    labelCol="churn",
    metricName="weightedPrecision"
)

recall_eval = MulticlassClassificationEvaluator(
    labelCol="churn",
    metricName="weightedRecall"
)

precision = precision_eval.evaluate(predictions)
recall = recall_eval.evaluate(predictions)

print("Precision:", precision)
print("Recall:", recall)


In [0]:
predictions_array = predictions.withColumn(
    "prob_array",
    vector_to_array(col("probability"))
)

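# Flag churn when P(churn) > 0.3 instead of the default 0.5 threshold,
# accepting more false positives in exchange for catching more churners
predictions_custom = predictions_array.withColumn(
    "prediction_custom",
    when(col("prob_array")[1] > 0.3, 1).otherwise(0)
)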

predictions_custom.groupBy("churn", "prediction_custom").count().display()
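
To make the effect of the custom threshold concrete, here is a small plain-Python sketch that recomputes the confusion-matrix cells at 0.5 and at 0.3 for the same scores. The helper name, labels and probabilities are hypothetical; only the `> threshold` rule mirrors the cell above.

```python
def confusion_at_threshold(labels, probs, threshold):
    """Return (tp, fp, fn, tn) when churn is predicted for prob > threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical labels and churn probabilities, for illustration only
example_labels = [1, 1, 1, 0, 0, 0, 0, 0]
example_probs = [0.8, 0.45, 0.35, 0.6, 0.32, 0.2, 0.1, 0.05]

for t in (0.5, 0.3):
    tp, fp, fn, tn = confusion_at_threshold(example_labels, example_probs, t)
    print(f"threshold={t}: tp={tp} fp={fp} fn={fn} tn={tn}")
```

Lowering the threshold can only move customers from the predicted-retained column to the predicted-churn column, so recall for churners never decreases, while false positives may grow.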


In [0]:
top_risk = (
    predictions
    .withColumn("prob_churn", vector_to_array(col("probability"))[1])
    .orderBy(col("prob_churn").desc())
)

display(
    top_risk.select(
        "recency",
        "frequency",
        "monetary",
        "prob_churn"
    ).limit(20)
)

## Evaluation Conclusion

The model was evaluated using AUC, accuracy, and class-weighted precision and recall.

These metrics should be interpreted with possible class imbalance
in the churn variable in mind.
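
A minimal sketch of why imbalance matters, in plain Python: a model that always predicts the majority class already reaches an accuracy equal to the majority share. The function name and the 5% churn rate below are hypothetical, not measurements from this dataset.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a trivial model that always predicts the most common class."""
    churned = sum(labels)
    retained = len(labels) - churned
    return max(churned, retained) / len(labels)


# Hypothetical label distribution: 5 churners out of 100 customers
example_labels = [1] * 5 + [0] * 95
baseline = majority_baseline_accuracy(example_labels)
print("Majority-class baseline accuracy:", baseline)
```

Comparing the reported accuracy against this baseline on the actual test set gives a quick sense of how much the model adds beyond the class distribution.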

Customers with the highest predicted churn probability can be prioritized
in retention campaigns.
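
One way to judge such a prioritization is to ask what share of all actual churners falls within the top-k ranked customers. The plain-Python sketch below illustrates that idea; the function name, labels and probabilities are hypothetical and not taken from this model.

```python
def capture_rate_at_k(labels, probs, k):
    """Share of all actual churners found among the k highest-scored customers."""
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    churners_in_top = sum(y for _, y in ranked[:k])
    total_churners = sum(labels)
    return churners_in_top / total_churners if total_churners else 0.0


# Hypothetical labels and churn probabilities, for illustration only
example_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
example_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
rate = capture_rate_at_k(example_labels, example_probs, k=3)
print("Churners captured in top 3:", rate)
```

A campaign budget that covers k customers can then be weighed against the fraction of churners it is expected to reach.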
