# 🧴 Skincare Review Analysis (Text Analytics)
This notebook performs basic text cleaning and flags reviews that contain positive language about skincare products.


In [8]:
import os
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Initialize Spark with Delta
builder = SparkSession.builder \
    .appName("Skincare Reviews Analytics") \
    .master("local[*]") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Load the clean Delta dataset

ruta = os.path.expanduser("~/MASTER_BIGDATA/TFM/TFM-Fragancias/data/exploitation/products/skincare_reviews")
df = spark.read.format("delta").load(ruta)
df_pd = df.toPandas()

# Preview
df_pd.head()



Unnamed: 0,product_id,review_text
0,P427417,i love this brand & product. it has made my sk...
1,P427417,i gave it 2 months and used 2 bottles but this...
2,P427417,i’m an esthetician and i so wanted badly to lo...
3,P427417,reading through all the reviews (on the basis ...
4,P427417,this line of products never disappoints. this...


In [9]:
import string

# Basic positive words
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

def clean_and_detect(text):
    if not isinstance(text, str):
        return False
    # Remove ASCII punctuation and convert to lowercase
    text_clean = text.translate(str.maketrans("", "", string.punctuation)).lower()
    # Substring test: "like" also matches inside longer words such as "unlike"
    return any(word in text_clean for word in positive_words)

# Apply the detection
df_pd["is_positive"] = df_pd["review_text"].apply(clean_and_detect)

# Show the first rows
df_pd[["product_id", "review_text", "is_positive"]].head()


Unnamed: 0,product_id,review_text,is_positive
0,P427417,i love this brand & product. it has made my sk...,True
1,P427417,i gave it 2 months and used 2 bottles but this...,False
2,P427417,i’m an esthetician and i so wanted badly to lo...,True
3,P427417,reading through all the reviews (on the basis ...,True
4,P427417,this line of products never disappoints. this...,False
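
The check in `clean_and_detect` is a substring test, so a review containing "unlike" or "glove" is also flagged as positive. A minimal sketch of a whole-word variant is shown below. The names `POSITIVE_WORDS`, `TOKEN_RE`, and `detect_positive_tokens` are hypothetical and not part of the notebook, and the counts reported in this notebook come from the substring version, not from this sketch.

```python
import re

# Same word list as in the notebook; the constant name is hypothetical.
POSITIVE_WORDS = {"love", "like", "amazing", "great", "perfect",
                  "awesome", "wonderful", "excellent"}

# A token is a run of ASCII letters; digits, punctuation, and curly
# quotes all act as separators.
TOKEN_RE = re.compile(r"[a-z]+")

def detect_positive_tokens(text):
    """Return True if any whole token of `text` is in POSITIVE_WORDS."""
    if not isinstance(text, str):
        return False
    tokens = TOKEN_RE.findall(text.lower())
    return any(token in POSITIVE_WORDS for token in tokens)

print(detect_positive_tokens("I love this brand & product."))           # True
print(detect_positive_tokens("Unlike my old serum, this one stings."))  # False
```

Whole-word matching trades one kind of error for another: it no longer flags "unlike", but it also misses inflected forms such as "loved" or "likes" unless they are added to the word list.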


In [10]:
print("📝 Total reviews:", len(df_pd))
print("👍 Positive reviews detected:", df_pd['is_positive'].sum())


📝 Total reviews: 599634
👍 Positive reviews detected: 414207
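
The two counts above are easier to read as a proportion. A short sketch of the arithmetic follows, with the two figures copied from the output and hypothetical variable names:

```python
# Figures copied from the cell output above.
total_reviews = 599634
positive_reviews = 414207

# Share of reviews flagged as positive by the keyword check.
positive_share = positive_reviews / total_reviews
print(f"Positive share: {positive_share:.2%}")  # Positive share: 69.08%
```

Inside the notebook the same value should be obtainable with `df_pd["is_positive"].mean()`, since the column is boolean.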


In [11]:
# If there are fewer than 5 positive reviews, show all of them
positivas = df_pd[df_pd["is_positive"]]
sample_size = min(5, len(positivas))

positivas[["product_id", "review_text"]].sample(sample_size, random_state=42)


Unnamed: 0,product_id,review_text
100377,P411403,"the it cosmetics products never disappoint, th..."
164488,P42204,i always use this lip balm to hydrate my lips ...
38132,P443840,i was excited to try the inkey list caffeine e...
505656,P406712,gave my skin great hydration....10/10 would re...
581477,P421275,i received these complimentary from influenste...
