# 🧴 Skincare Review Analysis (Text Analytics)
This notebook applies basic text cleaning and flags reviews of skincare products that contain positive language.


In [1]:
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Initialize Spark with Delta Lake support
builder = SparkSession.builder \
    .appName("Skincare Reviews Analytics") \
    .master("local[*]") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Load the cleaned Delta dataset
df = spark.read.format("delta").load("/home/davidmc/MASTER_BIGDATA/TFM/TFM-Fragancias/data/exploitation/products/skincare_reviews")
df_pd = df.toPandas()

# Preview
df_pd.head()


Unnamed: 0,product_id,review_text
0,P218700,"evidently, no one has commented on this produc..."
1,P248407,it stings my skin once i apply it. i would not...
2,P248407,i’ve been using this cream for about 2 months ...
3,P248407,love this cream. feels great on my skin and i ...
4,P248407,i have repurchased this so many times! it is s...


In [2]:
import string

# Basic positive words
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

def clean_and_detect(text):
    # Treat missing or non-string values as not positive
    if not isinstance(text, str):
        return False
    # Strip ASCII punctuation and convert to lowercase
    text_clean = text.translate(str.maketrans("", "", string.punctuation)).lower()
    # Substring test: True if any positive word occurs anywhere in the cleaned text
    return any(word in text_clean for word in positive_words)

# Apply the detection
df_pd["is_positive"] = df_pd["review_text"].apply(clean_and_detect)

# Show the first rows
df_pd[["product_id", "review_text", "is_positive"]].head()


Unnamed: 0,product_id,review_text,is_positive
0,P218700,"evidently, no one has commented on this produc...",True
1,P248407,it stings my skin once i apply it. i would not...,False
2,P248407,i’ve been using this cream for about 2 months ...,True
3,P248407,love this cream. feels great on my skin and i ...,True
4,P248407,i have repurchased this so many times! it is s...,False
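
Because `clean_and_detect` tests for substrings, a word such as `like` also fires on `dislike` or `unlike`. A minimal sketch of a whole-word variant follows, assuming whitespace tokenization is adequate for these reviews. The name `detect_positive_tokens` is hypothetical and not part of the notebook, and applying it would give different counts from the ones reported for the substring version.

```python
import string

# Same word list as in the cell above
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

# Translation table that deletes ASCII punctuation (curly quotes such as ’ are left untouched)
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def detect_positive_tokens(text):
    """Return True if any whole word of the review is in positive_words (hypothetical helper)."""
    if not isinstance(text, str):
        return False
    # Clean as before, then split on whitespace so only complete words are compared
    tokens = text.translate(_PUNCT_TABLE).lower().split()
    return not positive_words.isdisjoint(tokens)

print(detect_positive_tokens("I dislike the smell."))  # False
print(detect_positive_tokens("Love this cream!"))      # True
```

The trade-off is that exact word matching misses inflected forms such as `loved` or `likes` unless they are added to `positive_words`.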


In [3]:
print("📝 Total reviews:", len(df_pd))
print("👍 Positive reviews detected:", df_pd['is_positive'].sum())


📝 Total reviews: 599634
👍 Positive reviews detected: 414207
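
For scale, the two counts above can be turned into a share of flagged reviews. The sketch below hard-codes the printed figures so it runs on its own; inside the notebook the same value should come from `df_pd['is_positive'].mean()`.

```python
# Figures copied from the cell output above
total_reviews = 599634
positive_reviews = 414207

# Fraction of reviews flagged as positive by the keyword check
share = positive_reviews / total_reviews
print(f"Positive share: {share:.1%}")  # Positive share: 69.1%
```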


In [4]:
# If there are fewer than 5 positive reviews, show them all
positivas = df_pd[df_pd["is_positive"]]
sample_size = min(5, len(positivas))

positivas[["product_id", "review_text"]].sample(sample_size, random_state=42)


Unnamed: 0,product_id,review_text
100220,P423688,my face felt smoother after using it. i don’t ...
164372,P481084,"this is my new holy grail! i have sensitive, a..."
37820,P94421,i really wanted to love this product but i’ve ...
505560,P467038,i really like this product! i have purchased ...
581246,P454794,love this so much! goes on light and smooth bu...
