# 🧴 Review Analysis - Skincare (Text Analytics)
This notebook performs basic text cleaning and flags reviews that contain positive language about skincare products.


In [1]:
import os
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Initialize Spark with Delta Lake support
builder = SparkSession.builder \
    .appName("Skincare Reviews Analytics") \
    .master("local[*]") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Load the cleaned Delta dataset
ruta = os.path.expanduser("~/MASTER_BIGDATA/TFM/TFM-Fragancias/data/exploitation/products/skincare_reviews")
df = spark.read.format("delta").load(ruta)

# Collect every row to the driver as a pandas DataFrame
df_pd = df.toPandas()

# Preview
df_pd.head()



Unnamed: 0,product_id,review_text
0,P420652,i would not repurchase as there are better one...
1,P420652,"i love it, the flavour and the little applicat..."
2,P420652,i can not live with out this product it leaves...
3,P420652,"before buying this product, you have to unders..."
4,P420652,this lip mask is so good for dehydrated lips. ...


In [2]:
import string

# Basic positive words
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

def clean_and_detect(text):
    """Return True if the text contains any of the positive words."""
    if not isinstance(text, str):
        return False
    # Strip punctuation and lowercase
    text_clean = text.translate(str.maketrans("", "", string.punctuation)).lower()
    # Substring test: "like" also matches inside "dislike" or "unlike"
    return any(word in text_clean for word in positive_words)

# Apply detection
df_pd["is_positive"] = df_pd["review_text"].apply(clean_and_detect)

# Show the first rows
df_pd[["product_id", "review_text", "is_positive"]].head()


Unnamed: 0,product_id,review_text,is_positive
0,P420652,i would not repurchase as there are better one...,False
1,P420652,"i love it, the flavour and the little applicat...",True
2,P420652,i can not live with out this product it leaves...,False
3,P420652,"before buying this product, you have to unders...",False
4,P420652,this lip mask is so good for dehydrated lips. ...,False
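
Because `word in text_clean` is a substring test, a review containing "dislike" or "gloves" is flagged as positive. The sketch below shows one possible whole-word variant built on the same `positive_words` set. The function name `detect_positive_tokens` and the compiled pattern are illustrative assumptions, not part of the original notebook, and swapping it in would change the counts reported further down.

```python
import re

# Same word list as the notebook
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

# Hypothetical helper pattern: one alternation wrapped in word boundaries,
# so "like" no longer matches inside "dislike" or "unlike"
_POSITIVE_RE = re.compile(
    r"\b(?:" + "|".join(sorted(re.escape(w) for w in positive_words)) + r")\b"
)

def detect_positive_tokens(text):
    """Return True if the text contains a positive word as a whole word."""
    if not isinstance(text, str):
        return False
    return _POSITIVE_RE.search(text.lower()) is not None

print(detect_positive_tokens("i love it, the flavour is nice"))
print(detect_positive_tokens("i dislike the texture"))
```

The trade-off is that inflected forms such as "loved" or "likes" stop matching, and negated phrases ("did not like") are still counted as positive, so this is only a partial improvement.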


In [3]:
print("📝 Total reviews:", len(df_pd))
print("👍 Positive reviews detected:", df_pd['is_positive'].sum())


📝 Total reviews: 599634
👍 Positive reviews detected: 414207
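
The two counts are easier to interpret as a proportion. A minimal sketch, with the printed totals hardcoded as constants (the variable names are illustrative):

```python
# Counts copied from the cell output above
total_reviews = 599634
positive_reviews = 414207

# Share of reviews flagged by the keyword rule
positive_share = positive_reviews / total_reviews
print(f"Positive share: {positive_share:.1%}")
```

Inside the notebook, `df_pd["is_positive"].mean()` gives the same ratio directly, since the mean of a boolean column is the fraction of `True` values.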


In [4]:
# If there are fewer than 5 positive reviews, show all of them
positivas = df_pd[df_pd["is_positive"]]
sample_size = min(5, len(positivas))

positivas[["product_id", "review_text"]].sample(sample_size, random_state=42)


Unnamed: 0,product_id,review_text
101419,P427414,i’m just finishing this and i really do love i...
165346,P418218,"i really wanted to like this cream, but it clo..."
38927,P411540,i have really fair skin that’s super hard to c...
507328,P309308,pricey but good results! i’ve gone through 1.5...
581536,P453825,i love this product. i wasn’t sure since it’s ...
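
The sample includes a review that begins "i really wanted to like this cream, but", which suggests the keyword rule also flags reviews that are not actually positive. To see which word triggered a flag, a small helper like the one below can be used. It is a hypothetical addition (`matched_positive_words` does not appear in the notebook) that reuses the same cleaning and substring rule as `clean_and_detect`.

```python
import string

# Same word list as the notebook
positive_words = {"love", "like", "amazing", "great", "perfect", "awesome", "wonderful", "excellent"}

def matched_positive_words(text):
    """Return the positive words found in the text, using the notebook's substring rule."""
    if not isinstance(text, str):
        return []
    # Same cleaning as clean_and_detect: strip ASCII punctuation, lowercase
    text_clean = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return sorted(word for word in positive_words if word in text_clean)

# Visible prefix of sampled row 165346 (the full review text is truncated in the output)
print(matched_positive_words("i really wanted to like this cream, but it clo"))
```

Applied to the sampled rows, for example with `positivas["review_text"].apply(matched_positive_words)`, this shows which keyword caused each review to be flagged.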
