# **Importar Librerias Necesarias**

In [0]:
%pip install emoji
%pip install wordcloud
%pip install matplotlib
%pip install tensorflow

[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m
[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
import re
import emoji
import builtins

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from wordcloud import WordCloud

from pyspark.sql import Row
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from IPython.display import display, HTML

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from pyspark.sql.types import StringType, IntegerType, NumericType, ArrayType, MapType
from pyspark.ml.feature import Tokenizer, StopWordsRemover, RegexTokenizer

# **Crear Sesión Spark**

In [0]:
spark = SparkSession.builder \
    .appName("EDA_Suicide_Watch") \
    .getOrCreate()

# **Cargar Tabla Registrada Unity Catalog**

Aqui se realizará la carga del conjunto de datos directamente desde la tabla registrada en el entorno de trabajo de Spark. Una vez importada, se muestra su contenido para verificar que la lectura se haya realizado correctamente y para obtener una primera vista general de las filas y columnas disponibles.

In [0]:
df = spark.table("workspace.suicide_detection.suicide_detection_raw")
display(df)

DataFrame[id: bigint, text: string, class: string]

---


---
#<center> **Pre-Procesamiento Conjunto Datos**</center>
---


---


En esta sección se llevará a cabo la preparación inicial del conjunto de datos con el fin de garantizar que la información se encuentre organizada, depurada y lista para ser transformada. Se aplicarán procedimientos orientados a estructurar el conjunto de datos, revisar la calidad de los registros y establecer las bases necesarias para iniciar la limpieza y normalización del texto.

Estos pasos permitirán disponer de un dataset coherente y manejable, asegurando que las etapas posteriores de procesamiento se ejecuten sobre datos consistentes y adecuadamente preparados.

##**Normalización & Limpieza Básica Texto**

En esta sección se llevará a cabo el proceso fundamental de depuración y estandarización del texto, aplicando una serie de transformaciones que permitirán obtener una representación más clara y uniforme del contenido. Estas operaciones estarán enfocadas en eliminar variaciones superficiales, reducir ruido lingüístico y asegurar que cada registro cuente con un formato coherente.

Con estas acciones, se establecerá la base necesaria para que los pasos posteriores puedan trabajar sobre texto limpio y homogéneo, facilitando el análisis y mejorando la calidad del procesamiento subsiguiente.

###**Conversión Texto Minúsculas**

En este apartado se llevará a cabo una visualización comparativa que permitirá observar cómo se verá el texto antes y después de aplicar la conversión a minúsculas. Para ello, se seleccionará una muestra de registros y se generarán dos columnas: una con el texto original y otra con su versión transformada. Esta comparación facilitará verificar que la normalización se esté realizando correctamente antes de aplicarla al resto del conjunto de datos.

Posteriormente, una vez validado el resultado, se aplicará la transformación al dataframe completo, garantizando que todos los registros cuenten con una representación textual unificada en minúsculas.

In [0]:
# Muestra antes/después
df_preview = df.select("id", F.col("text").alias("before")).limit(10)
df_preview = df_preview.withColumn("after", F.lower(F.col("before")))

# Visualizar el Antes vs Después (lowercase)
display(df_preview.toPandas())

# Aplicar transformación al dataframe original
df = df.withColumn("text", F.lower(F.col("text")))

Unnamed: 0,id,before,after
0,2,Ex Wife Threatening SuicideRecently I left my ...,ex wife threatening suiciderecently i left my ...
1,3,Am I weird I don't get affected by compliments...,am i weird i don't get affected by compliments...
2,4,Finally 2020 is almost over... So I can never ...,finally 2020 is almost over... so i can never ...
3,8,i need helpjust help me im crying so hard,i need helpjust help me im crying so hard
4,9,"I’m so lostHello, my name is Adam (16) and I’v...","i’m so losthello, my name is adam (16) and i’v..."
5,11,Honetly idkI dont know what im even doing here...,honetly idki dont know what im even doing here...
6,12,[Trigger warning] Excuse for self inflicted bu...,[trigger warning] excuse for self inflicted bu...
7,13,It ends tonight.I can’t do it anymore. \nI quit.,it ends tonight.i can’t do it anymore. \ni quit.
8,16,"Everyone wants to be ""edgy"" and it's making me...","everyone wants to be ""edgy"" and it's making me..."
9,18,My life is over at 20 years oldHello all. I am...,my life is over at 20 years oldhello all. i am...


###**Eliminación Emojis & Caracteres Unicode Especiales**

En este apartado se llevará a cabo la creación de una función personalizada que permitirá remover emojis del texto, sustituyéndolos por una cadena vacía. Para validar su correcto funcionamiento, primero se seleccionará una muestra aleatoria de registros que contengan caracteres fuera del rango ASCII, ya que estos suelen incluir emojis u otros símbolos especiales.

A partir de esta muestra, se generará una vista comparativa con las columnas “before” y “after”, lo que permitirá observar directamente cuáles textos experimentarán cambios tras la limpieza. Luego, se filtrarán únicamente aquellos casos donde la transformación haya producido diferencias, y se mostrará una tabla desplazable que facilitará revisar el resultado de forma ordenada.

Una vez verificada la efectividad del procedimiento, se aplicará la función de eliminación de emojis a todo el dataframe, garantizando que el conjunto completo de textos quede libre de estos caracteres.

In [0]:
# Función para remover emojis
def remove_emoji(text):
    if text is None:
        return text
    return emoji.replace_emoji(text, replace='')

remove_emoji_udf = F.udf(remove_emoji, StringType())

# Crear preview de textos con emojis
df_emoji_preview = (
    df
    .filter(F.col("text").rlike("[^\x00-\x7F]"))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Crear preview Antes/Despues
df_emoji_preview = df_emoji_preview.withColumn(
    "after",
    remove_emoji_udf(F.col("before"))
)

# Filtrar solo los registros donde sí hubo cambio
df_emoji_changed = df_emoji_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a pandas
preview_pd = df_emoji_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll

display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn("text", remove_emoji_udf(F.col("text")))

Unnamed: 0,id,before,after
0,83805,was vibing with a girl then i found out she’s thirteen💔💔 i mean it’s only a year difference i’m 14 but still i can’t shake the fact that she was 12 last year bro like fuck out of here🤢,was vibing with a girl then i found out she’s thirteen i mean it’s only a year difference i’m 14 but still i can’t shake the fact that she was 12 last year bro like fuck out of here
1,236523,if i were a dinosaur i would have big jaw and pretty feathers 😎😎😎😎😎😎😎 i would also be hot af and get all the heterosexual/bisexual lady dinos 😏😏😏😏,if i were a dinosaur i would have big jaw and pretty feathers i would also be hot af and get all the heterosexual/bisexual lady dinos
2,272807,we did it reddit we bullied an innocent person off this website 😎😎\n\nshame on all of you who took part. you know who you are.,we did it reddit we bullied an innocent person off this website \n\nshame on all of you who took part. you know who you are.
3,323078,"it's sad boi hours, someone tell me a joke yee, idk, i dont wanna be sad cause i justed aced my government final, but i am cause my girl left me less than a week ago and is already flirting with new guys😀","it's sad boi hours, someone tell me a joke yee, idk, i dont wanna be sad cause i justed aced my government final, but i am cause my girl left me less than a week ago and is already flirting with new guys"
4,309886,"daily reminder that god loves you ❤️🙏 “‘the mountains may shift, and the hills may be shaken, but my faithful love won’t shift from you, and my covenant of peace won’t be shaken,’ says the lord, the one who pities you.” — isaiah 54:10\n\n\n“but you, my lord, are a god of compassion and mercy; you are very patient and full of faithful love.” — psalm 86:15\n\n“the lord your god is in your midst — a warrior bringing victory. he will create calm with his love; he will rejoice over you with singing.” — zephaniah 3:17\n\n“see what kind of love the father has given to us in that we should be called god’s children, and that is what we are! because the world didn’t recognize him, it doesn’t recognize us.” — 1 john 3:1","daily reminder that god loves you “‘the mountains may shift, and the hills may be shaken, but my faithful love won’t shift from you, and my covenant of peace won’t be shaken,’ says the lord, the one who pities you.” — isaiah 54:10\n\n\n“but you, my lord, are a god of compassion and mercy; you are very patient and full of faithful love.” — psalm 86:15\n\n“the lord your god is in your midst — a warrior bringing victory. he will create calm with his love; he will rejoice over you with singing.” — zephaniah 3:17\n\n“see what kind of love the father has given to us in that we should be called god’s children, and that is what we are! because the world didn’t recognize him, it doesn’t recognize us.” — 1 john 3:1"
5,310395,who else agrees that this emoji is the worst 😆 like i just die from the inside whenever i see someone use it,who else agrees that this emoji is the worst like i just die from the inside whenever i see someone use it
6,233067,plz give me an award plz 🙏🙏 plz i beg of u.\ni made a bet with my friend and if i get an award i get 5 bucks.\n\nplz i want ur charity i will thank u very much 💕💕,plz give me an award plz plz i beg of u.\ni made a bet with my friend and if i get an award i get 5 bucks.\n\nplz i want ur charity i will thank u very much
7,289890,i’m not fat bro 😵 if that’s your only insult please for the love of all things unholy try again. there is so much more to insult me over. smh my head,i’m not fat bro if that’s your only insult please for the love of all things unholy try again. there is so much more to insult me over. smh my head
8,73303,guys a youtuber called optimus made a video about this sub lol. chungus chungus chungus chungus keanu keanu reeves. 😭😩😢🤨😀👌🍆🍆😭😀😩🤬👅👄👁. optimus is really 🏃 out of ideas. penis penis penis penis penis penis. 😀😅😇😍😚🤪😃😂🙂🥰😋🤨😄🤣🙃😘😛😏😁☺️😉😗😝🤓😆😊😌😙😜😎🤩🥳😒😞😔😟😕☹️☹️😣😫😩🥺😢😤😠😡🤯🥵🥶😰😥😓😱,guys a youtuber called optimus made a video about this sub lol. chungus chungus chungus chungus keanu keanu reeves. . optimus is really out of ideas. penis penis penis penis penis penis.
9,172291,i don’t know who needs to read this but... i hope you have a good night and i love you kings 👑❤️,i don’t know who needs to read this but... i hope you have a good night and i love you kings


###**Remover Caracteres Control**

En este apartado se llevará a cabo la identificación y limpieza de caracteres de control presentes en el texto, los cuales pertenecen a rangos Unicode que no representan contenido legible y que pueden interferir con los procesos posteriores. Para ello, se filtrará una muestra de registros que contengan este tipo de caracteres y se generará una vista comparativa que mostrará el texto antes y después de la transformación.

La visualización permitirá verificar que dichos caracteres serán reemplazados por espacios en blanco, evitando así distorsiones o rupturas en el contenido. Una vez validado el resultado en la muestra, se aplicará la misma expresión regular al dataframe completo, asegurando que todos los registros queden libres de estos símbolos no imprimibles.

In [0]:
# Filtrar solo textos que contienen caracteres Unicode a limpiar
df_preview = (
    df
    .filter(F.col("text").rlike(r"[\u0000-\u001F\u007F]"))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(10)
)

# Aplicar la normalización solo para visualización
df_preview = df_preview.withColumn(
    "after",
    F.regexp_replace("before", r"[\u0000-\u001F\u007F]", " ")
)

# Convertir a pandas
preview_pd = df_preview.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", r"[\u0000-\u001F\u007F]", " ")
)

Unnamed: 0,id,before,after
0,37412,should i paint my hair pink fheehusensienskenidjdudnrhdhdksodnrbdhrhdhhejfbegrnurbdhenudndjenuedksmdndjdnjdjdndndndjsjrjdjd \n\nim a guy btw,should i paint my hair pink fheehusensienskenidjdudnrhdhdksodnrbdhrhdhhejfbegrnurbdhenudndjenuedksmdndjdnjdjdndndndjsjrjdjd im a guy btw
1,93146,"i’m going for my second driving lesson tonight i’m really excited because this time, i’m getting my usual car. \n \nand all i have to say is that it’s clearly the best car in the entire driving school.","i’m going for my second driving lesson tonight i’m really excited because this time, i’m getting my usual car. and all i have to say is that it’s clearly the best car in the entire driving school."
2,258719,"this one goes out to the fellas you ever been on a date with a guy, and it turns out hes gay\n\nlike wtf ive on a date with a gay dude this whole time","this one goes out to the fellas you ever been on a date with a guy, and it turns out hes gay like wtf ive on a date with a gay dude this whole time"
3,291138,bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle \nbank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle,bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle bank angle
4,267697,"i tried to kill myself today.back in december i tried to kill myself twice. once with pills, and once with a train. the pills didn't work so about a week later i walked a couple miles to some railroad tracks outside of town by a cemetery. i waited in the cold for a few hours there but a train never came. i ended up calling the suicide hotline while i was there and they had a sheriff come pick me up and take me to the hospital where i spent 5 days in the psych ward till they kicked me out on christmas eve. \n\nafter that i put in my 2 weeks notice at my job of 5 years to move back home to my dad's house to get away from the all the drugs and violence in the town that i moved to after high school. i had fallen in with a bad crowd in that town and used weed to self medicate and became a stoner but i stopped smoking when i went into the hospital. luckily i was able to pass a drug test and got a job at a walgreens about a mile from my dad's house but i hated that job. i worked there for 2 days until i just stopped going to work. i stopped taking my meds and waited for the cold chill here in the midwest to stop so i could go down to the tracks again to kill myself. \n\ni waited on an old railroad tie while drinking the tall boys i brought with me waiting for a train. i ended up missing the first train that went by because i couldn't get to the tracks fast enough so i walked a little further down and waited under a bridge for another train to come, but it never did so i just walked home.\n\ni don't even know what to do anymore, i guess tomorrow i'll check myself into the hospital again. i don't really want to kill myself but i hate my life and don't want to live anymore. i just don't know anymore.","i tried to kill myself today.back in december i tried to kill myself twice. once with pills, and once with a train. the pills didn't work so about a week later i walked a couple miles to some railroad tracks outside of town by a cemetery. i waited in the cold for a few hours there but a train never came. i ended up calling the suicide hotline while i was there and they had a sheriff come pick me up and take me to the hospital where i spent 5 days in the psych ward till they kicked me out on christmas eve. after that i put in my 2 weeks notice at my job of 5 years to move back home to my dad's house to get away from the all the drugs and violence in the town that i moved to after high school. i had fallen in with a bad crowd in that town and used weed to self medicate and became a stoner but i stopped smoking when i went into the hospital. luckily i was able to pass a drug test and got a job at a walgreens about a mile from my dad's house but i hated that job. i worked there for 2 days until i just stopped going to work. i stopped taking my meds and waited for the cold chill here in the midwest to stop so i could go down to the tracks again to kill myself. i waited on an old railroad tie while drinking the tall boys i brought with me waiting for a train. i ended up missing the first train that went by because i couldn't get to the tracks fast enough so i walked a little further down and waited under a bridge for another train to come, but it never did so i just walked home. i don't even know what to do anymore, i guess tomorrow i'll check myself into the hospital again. i don't really want to kill myself but i hate my life and don't want to live anymore. i just don't know anymore."
5,126874,join my sub filler filler filler its better than this one \n\nr/teenagerstwo2,join my sub filler filler filler its better than this one r/teenagerstwo2
6,287501,i 100% agree with this and stand by this https://twitter.com/albywalter/status/1347912033933484032?s=21\n\nhttps://twitter.com/albywalter/status/1347912156814036992?s=21,i 100% agree with this and stand by this https://twitter.com/albywalter/status/1347912033933484032?s=21 https://twitter.com/albywalter/status/1347912156814036992?s=21
7,178678,i'm about to go on omegle and fish for compliments on being a femboy lmao haha i have self image problems lmao \n\nbig funny,i'm about to go on omegle and fish for compliments on being a femboy lmao haha i have self image problems lmao big funny
8,275021,nice to meet you im basically the most pessimistic creature you will ever see\n\nthats basically what i wanted to say for some reason,nice to meet you im basically the most pessimistic creature you will ever see thats basically what i wanted to say for some reason
9,307656,"can i talk to someone please or someone give me a reason why i shouldn’t overdose right now.[15m]\nhi. can someone please convice me not to overdose rn. i had to build up massive courage to tell one of my friends that i was suicidle and tried to od last week on melatonin. i was asleep for 15 hours and was throwing up and fainting. he didn’t believe me and made jokes about it. i feel like shit. i want to escape. i have adhd, severe anxiety, depression, ptsd. \ni get bullied. i get told to kill myself infront of my face. i get hit “as a joke” by people at school. if i’m outside i get shouted mean things to me. i just can’t live anymore. i want to make the people who tell me to end myself happy. it would be a better place for everyone. i’m such a burden, i can’t do anything right at all. i have hurt myself because of it. and i will probably delete this account either in regret or if i do decide to do it. \n\nand yes, i do have a therapist who doesn’t know of my attempt last week yet. i don’t want anyone to know. \n\ni have melatonin, anti depressants, adderall, lexapro all next to me. idk what to do. i’m not calling anyone. or texting.","can i talk to someone please or someone give me a reason why i shouldn’t overdose right now.[15m] hi. can someone please convice me not to overdose rn. i had to build up massive courage to tell one of my friends that i was suicidle and tried to od last week on melatonin. i was asleep for 15 hours and was throwing up and fainting. he didn’t believe me and made jokes about it. i feel like shit. i want to escape. i have adhd, severe anxiety, depression, ptsd. i get bullied. i get told to kill myself infront of my face. i get hit “as a joke” by people at school. if i’m outside i get shouted mean things to me. i just can’t live anymore. i want to make the people who tell me to end myself happy. it would be a better place for everyone. i’m such a burden, i can’t do anything right at all. i have hurt myself because of it. and i will probably delete this account either in regret or if i do decide to do it. and yes, i do have a therapist who doesn’t know of my attempt last week yet. i don’t want anyone to know. i have melatonin, anti depressants, adderall, lexapro all next to me. idk what to do. i’m not calling anyone. or texting."


###**Normalización Espacios Múltiples**

En este apartado se llevará a cabo una revisión del texto para identificar aquellos casos donde aparezcan espacios repetidos o distribuidos de manera irregular. Para facilitar esta revisión, primero se tomará una muestra del dataset y se mostrará una comparación entre el texto original y la versión corregida, lo que permitirá confirmar que los cambios se están aplicando correctamente.

Una vez validada la transformación, el proceso se aplicará al conjunto completo de datos, reemplazando los espacios múltiples por un único espacio y ajustando la estructura final del texto. Además, se realizará un conteo que permitirá conocer cuántos registros presentaban este tipo de inconsistencias. Con ello, se garantizará que el texto quede más ordenado y uniforme antes de avanzar a las siguientes etapas de procesamiento.

In [0]:
# Tomar una muestra grande para encontrar cambios
df_sample = (
    df
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Aplicar el proceso a la muestra
df_sample = (
    df_sample
    .withColumn("after_temp", F.regexp_replace("before", r"\s{2,}", " "))
    .withColumn("after", F.trim(F.col("after_temp")))
    .drop("after_temp")
)

# Filtrar solo donde hubo cambios reales
df_changed = df_sample.filter(F.col("before") != F.col("after")).limit(10)

# Transformar para hacer visibles los espacios múltiples
df_vis = (
    df_changed
    .withColumn(
        "before_vis",
        F.regexp_replace("before", r"\s{2,}", "⟶⟶")
    )
)

# Convertir a Pandas
preview_pd = df_vis.select("id", "before_vis", "after").toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: pre;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn("text", F.regexp_replace("text", r"\s{2,}", " "))
df = df.withColumn("text", F.trim(F.col("text")))

# Contar registros con espacios múltiples
count_multi_spaces = df.filter(F.col("text").rlike(r"\s{2,}")).count()

# Crear DataFrame resumen simple
summary = spark.createDataFrame([
    Row(
        Descripcion="Registros con espacios múltiples en el dataframe original",
        Cantidad=count_multi_spaces
    )
])

# Mostrar tabla básica
summary.show(truncate=False)

Unnamed: 0,id,before_vis,after
0,327126,"debated about not even putting this upso i'm thinking about trying suicide tonight. tuesday seems like a weird night for it, but i'm really in a bad spot and i don't know if i can come back from this.⟶⟶to make the story short, there's been a lot of things at my work that make me feel really useless. most that problem is me being awkward and not knowing how to communicate properly, some people influencing me in ways so i don't trust people and then they don't trust me.⟶⟶i guess the main problem is my sleeping habits. i've tried sleeping for 10 hours and being tired and hard to wake up, to only sleeping 4 hours and waking up really early.⟶⟶my job gave me the funding for a sleep study to see if i have any sleeping disorders, and i've tried making appointments. i tried getting it done january 2017, and they only called me at the beginning of the month for an initial session, and i had to reschedule getting the results. i have not received a call back and that would help with some of the stress.⟶⟶and then at work, i try to do the best work i can as a mechanic, but always feel like i'm falling short. it helps i'm small, so i can fit into small spaces easy, where other people can't reach.⟶⟶but they treat me like i'm useless and a problem most of the time. i can admit, i internalize things too much. i've been trying to let people in, but when i do, i feel like i get stabbed in the back all the time. what's worse, is i try not to let stuff like this show at work, which only makes it worse when it does.⟶⟶tl;dr: i'm a failure to everyone i know and just want to take a swim with an anchor right now.⟶⟶my phone is at 12%, if i can't calm down, i'm going to try it when it dies.⟶⟶i'm sorry if this stresses you guys out, i'm just so done with it all.","debated about not even putting this upso i'm thinking about trying suicide tonight. tuesday seems like a weird night for it, but i'm really in a bad spot and i don't know if i can come back from this. to make the story short, there's been a lot of things at my work that make me feel really useless. most that problem is me being awkward and not knowing how to communicate properly, some people influencing me in ways so i don't trust people and then they don't trust me. i guess the main problem is my sleeping habits. i've tried sleeping for 10 hours and being tired and hard to wake up, to only sleeping 4 hours and waking up really early. my job gave me the funding for a sleep study to see if i have any sleeping disorders, and i've tried making appointments. i tried getting it done january 2017, and they only called me at the beginning of the month for an initial session, and i had to reschedule getting the results. i have not received a call back and that would help with some of the stress. and then at work, i try to do the best work i can as a mechanic, but always feel like i'm falling short. it helps i'm small, so i can fit into small spaces easy, where other people can't reach. but they treat me like i'm useless and a problem most of the time. i can admit, i internalize things too much. i've been trying to let people in, but when i do, i feel like i get stabbed in the back all the time. what's worse, is i try not to let stuff like this show at work, which only makes it worse when it does. tl;dr: i'm a failure to everyone i know and just want to take a swim with an anchor right now. my phone is at 12%, if i can't calm down, i'm going to try it when it dies. i'm sorry if this stresses you guys out, i'm just so done with it all."
1,147480,"disappointed with the worldfirst of all i’m not suicidal. i just want a secret place to complain without letting anyone in real life knows.⟶⟶i think i have been extremely disappointed with everything since teenage years. when i was a kid, i was strongly influenced by children’s books written by andersen and wilde, etc. i think i completely believed in the morality that those books encourage. but ever since i tried to become a responsible adult, i have been feeling like why is the world so fucked up? the love stories in the children’s literature are so pure and beautiful, whereas in real life i haven’t fucking even seen any happy marriage. my parents’ marriage, my grandparents’s marriage and my aunt and uncle’s marriage are all fucked up. none of them divorced but everyone’s marriage is ugly. people are all selfish and had all sorts of bad qualities in their nature, such as cheating, bullying the weaker people (usually women) in the family. some are egoistic, some are control freaks, and some are lazy as fuck and doesn’t want to be responsible even at the age of 40.⟶⟶other aspects of life are also fucked up. why are some people so poor? and no matter how hard i work, i will never be as rich as those lazy idiots who have rich parents. i’m female and why the fuck do i feel unsafe all the time, like people literally stalk me on the street? i wanted to fucking shout at him and tell him to have some respect but i was worried he would become more aggressive. i really wish that i have never been born so i don’t have to fucking face these questions.⟶⟶no matter how, i’m still going to live. and no matter what happens, tomorrow’s sun will always rise and shine. i will do work like a donkey and have fun whenever i can. fuck everyone. fuck everything. fuck you all. you are very welcome.","disappointed with the worldfirst of all i’m not suicidal. i just want a secret place to complain without letting anyone in real life knows. i think i have been extremely disappointed with everything since teenage years. when i was a kid, i was strongly influenced by children’s books written by andersen and wilde, etc. i think i completely believed in the morality that those books encourage. but ever since i tried to become a responsible adult, i have been feeling like why is the world so fucked up? the love stories in the children’s literature are so pure and beautiful, whereas in real life i haven’t fucking even seen any happy marriage. my parents’ marriage, my grandparents’s marriage and my aunt and uncle’s marriage are all fucked up. none of them divorced but everyone’s marriage is ugly. people are all selfish and had all sorts of bad qualities in their nature, such as cheating, bullying the weaker people (usually women) in the family. some are egoistic, some are control freaks, and some are lazy as fuck and doesn’t want to be responsible even at the age of 40. other aspects of life are also fucked up. why are some people so poor? and no matter how hard i work, i will never be as rich as those lazy idiots who have rich parents. i’m female and why the fuck do i feel unsafe all the time, like people literally stalk me on the street? i wanted to fucking shout at him and tell him to have some respect but i was worried he would become more aggressive. i really wish that i have never been born so i don’t have to fucking face these questions. no matter how, i’m still going to live. and no matter what happens, tomorrow’s sun will always rise and shine. i will do work like a donkey and have fun whenever i can. fuck everyone. fuck everything. fuck you all. you are very welcome."
2,122852,silly putty implies there is a serious putty. ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍⟶⟶‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍,silly putty implies there is a serious putty. ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍ ‍
3,196183,"guys i need your help!!! i have this really great french teacher , she is really fun and you can talk to her about anything and she wouldn't judge you which is a great quality for an adult in my opinion. i was looking for ways to show her what she means to me and my classmates and thought a good idea would be to grow her youtube channel. the subscribers mainly⟶⟶are only students of her class so it would mean a lot to her if she could get a lot of subs. im doing this anonymously so she thinks it was her luck. im not karma whoring or begging for subs but do what you can guys. if you don't want to sub then just leave an wholesome comment.⟶⟶this is the link⟶⟶[unlearn & relearn](https://www.youtube.com/channel/ucul_xra7m9fardnk-25a4ta)","guys i need your help!!! i have this really great french teacher , she is really fun and you can talk to her about anything and she wouldn't judge you which is a great quality for an adult in my opinion. i was looking for ways to show her what she means to me and my classmates and thought a good idea would be to grow her youtube channel. the subscribers mainly are only students of her class so it would mean a lot to her if she could get a lot of subs. im doing this anonymously so she thinks it was her luck. im not karma whoring or begging for subs but do what you can guys. if you don't want to sub then just leave an wholesome comment. this is the link [unlearn & relearn](https://www.youtube.com/channel/ucul_xra7m9fardnk-25a4ta)"
4,40055,i honesty can’t comprehend how the sub went from 12k online to 68k online⟶⟶huhhh,i honesty can’t comprehend how the sub went from 12k online to 68k online huhhh
5,85622,"i don't know what else to doi've been depressed for the last 6 months. this is not the first time it happaned to me. i had a major clinic depression from age 16 to 26 with some upsides in the middle, but mostly i couldn't even get out of my house because i had agoraphobia. my ex boyfriend took care of me all those years. money never was a problem with him. but i left the house an took my chance out there when i realized he was only helping me to depend on him and to have control over me. he sabotaged any chance i could get to get better or to have economic independency. when i realized he was being an obtacle and told him to back up he got violent and abusive to me.⟶⟶so i went to live with a couple of friends. one of them is actually my current boyfriend. going back with my parents is not an option since i run away from their house and lose contact with them because they were abusive to me.⟶⟶right now i'm 29. last couple of years were nice. i was really happy and got better. i got out of agoraphobia and never came back. but i never found the way to be independent and it's becoming a problem because we don't have much money right now. we are barely paying the bills and i'm not emotionally stable to have a job and help with that. even so, getting a job is really difficult for me because i didn't even finish high school and never had a job before.⟶⟶i need therapy and meds but we can't afford it and my emotional inestability is taking a toll on my boyfriend because he is finishing his studies and is under a lot of stress. he can't deal with both things at the same time. plus we were told we had 1 year to move out from the place we are living but we don't have savings.⟶⟶i tried to apply to free therapy at my local university last month ago. they accepted my request and told me they would call me but they never did. i sent and email one week ago telling them i really need help but i didn't get an answer yet.⟶⟶now i can't even get out of my bed. i feel trapped in a system which mental health plans are only accesible to people who can afford it or that have someone to take care of them. i feel like a burden for everyone who cares about me and as if evetime i try to reach out for help i'm remember that i don't deserve it because i can't afford it. i just need things to be easy from time to time. i can't keep up with this constant struggle. i just need everything to stop.","i don't know what else to doi've been depressed for the last 6 months. this is not the first time it happaned to me. i had a major clinic depression from age 16 to 26 with some upsides in the middle, but mostly i couldn't even get out of my house because i had agoraphobia. my ex boyfriend took care of me all those years. money never was a problem with him. but i left the house an took my chance out there when i realized he was only helping me to depend on him and to have control over me. he sabotaged any chance i could get to get better or to have economic independency. when i realized he was being an obtacle and told him to back up he got violent and abusive to me. so i went to live with a couple of friends. one of them is actually my current boyfriend. going back with my parents is not an option since i run away from their house and lose contact with them because they were abusive to me. right now i'm 29. last couple of years were nice. i was really happy and got better. i got out of agoraphobia and never came back. but i never found the way to be independent and it's becoming a problem because we don't have much money right now. we are barely paying the bills and i'm not emotionally stable to have a job and help with that. even so, getting a job is really difficult for me because i didn't even finish high school and never had a job before. i need therapy and meds but we can't afford it and my emotional inestability is taking a toll on my boyfriend because he is finishing his studies and is under a lot of stress. he can't deal with both things at the same time. plus we were told we had 1 year to move out from the place we are living but we don't have savings. i tried to apply to free therapy at my local university last month ago. they accepted my request and told me they would call me but they never did. i sent and email one week ago telling them i really need help but i didn't get an answer yet. now i can't even get out of my bed. i feel trapped in a system which mental health plans are only accesible to people who can afford it or that have someone to take care of them. i feel like a burden for everyone who cares about me and as if evetime i try to reach out for help i'm remember that i don't deserve it because i can't afford it. i just need things to be easy from time to time. i can't keep up with this constant struggle. i just need everything to stop."
6,108844,"after 20 yrs of just barely surviving due to being a rso, i can't fight anymore, and an thinking about just offing myself.*pardon any coarse analogies and language⟶⟶at 17 i had got in trouble for messing with this girl off and on, having our several times and eventually we got drunk and stoned, partied together. we had what would likely be considered a few 3rd base encounters (making out, touching each other). after like the 4th or so time forget told by my friends that for lack of better terms she is a player, had more than me talking to her etc. so i decide to outflank her and hang out and do our usual thing and not say anything about the stuff that i've heard until the next day and then call her out in front of everybody.⟶⟶so the next morning i put her on the spot in front of our friends and was like you're talking to other people too and you're always asking me for stuff and leading me to believe that this was on its way to a relationship, like i said something to the effect of how does it feel to have your own game played on you 'cause i'm never talking to you again.⟶⟶mistake!!⟶⟶three or four weeks later i get pulled out of class, the state police want to talk to me. they are asking me all these questions about our encounter. i tell them the truth and that it wasn't the first time we'd messed around, but that once i found out she was doing the same thing with a few other people too i realize she wasn't the type of person i wanted to have his girlfriend.⟶⟶several months later i get arrested and charged with statutory csc. (at this point i'm basically a homeless youth who's getting himself to high school, shitty parents, shitty childhood etc)...⟶⟶long story short; i get a terrible court appointed attorney (my young dumb ass wouldn't have been able to decipher any difference from a good or bad one anyhow) and he works out a plea bargain that he says is going to keep me out of jail and minimal probation...⟶⟶wrong!!⟶⟶turns out he pled me out to a crime that was a felony instead of the one i was charged with which could have been a misdemeanor under the correct analogy.⟶⟶and since the laws have changed where i live this terrible plea bargain prevents me from taking advantage of that because it's based solely on conviction.⟶⟶that was 20 years ago almost to the day, and yesterday i got my⟶⟶114th or 115th canning from a job \ interview for this misappropriated 20 yrs old charge... i have never been in any trouble since then.⟶⟶i've paid hundreds of dollars to get risk assessments done i different psychologists (results say not a risk..., (imagine my surprise... lol) i've begged, pleaded, tried and tried and tried.⟶⟶yet i can't finish my degree, can't get hired anywhere, can't participate in society in even the most basic sense of the concept.⟶⟶i'm outta ideas, going bankrupt, and becoming a burden to my friends and family.⟶⟶i'm constantly depressed and anxious lately, aaaaand ya know.. i'm tired, i can't fight anymore and kinda just wanna quit.. it's a lot of frustration to deal with, i literally cannot take it. i'm afraid i might just do it.","after 20 yrs of just barely surviving due to being a rso, i can't fight anymore, and an thinking about just offing myself.*pardon any coarse analogies and language at 17 i had got in trouble for messing with this girl off and on, having our several times and eventually we got drunk and stoned, partied together. we had what would likely be considered a few 3rd base encounters (making out, touching each other). after like the 4th or so time forget told by my friends that for lack of better terms she is a player, had more than me talking to her etc. so i decide to outflank her and hang out and do our usual thing and not say anything about the stuff that i've heard until the next day and then call her out in front of everybody. so the next morning i put her on the spot in front of our friends and was like you're talking to other people too and you're always asking me for stuff and leading me to believe that this was on its way to a relationship, like i said something to the effect of how does it feel to have your own game played on you 'cause i'm never talking to you again. mistake!! three or four weeks later i get pulled out of class, the state police want to talk to me. they are asking me all these questions about our encounter. i tell them the truth and that it wasn't the first time we'd messed around, but that once i found out she was doing the same thing with a few other people too i realize she wasn't the type of person i wanted to have his girlfriend. several months later i get arrested and charged with statutory csc. (at this point i'm basically a homeless youth who's getting himself to high school, shitty parents, shitty childhood etc)... long story short; i get a terrible court appointed attorney (my young dumb ass wouldn't have been able to decipher any difference from a good or bad one anyhow) and he works out a plea bargain that he says is going to keep me out of jail and minimal probation... wrong!! turns out he pled me out to a crime that was a felony instead of the one i was charged with which could have been a misdemeanor under the correct analogy. and since the laws have changed where i live this terrible plea bargain prevents me from taking advantage of that because it's based solely on conviction. that was 20 years ago almost to the day, and yesterday i got my 114th or 115th canning from a job \ interview for this misappropriated 20 yrs old charge... i have never been in any trouble since then. i've paid hundreds of dollars to get risk assessments done i different psychologists (results say not a risk..., (imagine my surprise... lol) i've begged, pleaded, tried and tried and tried. yet i can't finish my degree, can't get hired anywhere, can't participate in society in even the most basic sense of the concept. i'm outta ideas, going bankrupt, and becoming a burden to my friends and family. i'm constantly depressed and anxious lately, aaaaand ya know.. i'm tired, i can't fight anymore and kinda just wanna quit.. it's a lot of frustration to deal with, i literally cannot take it. i'm afraid i might just do it."
7,298523,"i don't want to be part of the current world this is not a depression post. i am not having suicidal thoughts. this is about the general world's situation, not mine.⟶⟶this may not be the correct subreddit, but it's the one i relate to the most.⟶⟶now that's out of the way, let's talk.⟶⟶firstly, i'm a teenager. you all know it sucks.⟶⟶treated like children, expected to act like adults, school's ramping up, etc. this is most likely the worst stage in human life.⟶⟶secondly, i'm a boy. this may seem like an advantage, as we live in a male-dominated world, but when you can get hate for saying something as innocent as men deserve happiness (https://twitter.com/word2mymothaa/status/1370428957871443971 just read the comments) and everyone blaming age-old wrongdoings on your gender, it gets to you.⟶⟶third, i'm young. our generation is expected to clean up messes older people made with no knowledge of how to and less resources then the people who did. did you know that true irreversible damage to the enviroment from climate change, if things keep going at this rate,⟶⟶will happen in 7 years? i will either be in or just graduated from college. i won't have a full life on a healthy world.⟶⟶tl:dr the world sucks and i am forced to be a part of it.","i don't want to be part of the current world this is not a depression post. i am not having suicidal thoughts. this is about the general world's situation, not mine. this may not be the correct subreddit, but it's the one i relate to the most. now that's out of the way, let's talk. firstly, i'm a teenager. you all know it sucks. treated like children, expected to act like adults, school's ramping up, etc. this is most likely the worst stage in human life. secondly, i'm a boy. this may seem like an advantage, as we live in a male-dominated world, but when you can get hate for saying something as innocent as men deserve happiness (https://twitter.com/word2mymothaa/status/1370428957871443971 just read the comments) and everyone blaming age-old wrongdoings on your gender, it gets to you. third, i'm young. our generation is expected to clean up messes older people made with no knowledge of how to and less resources then the people who did. did you know that true irreversible damage to the enviroment from climate change, if things keep going at this rate, will happen in 7 years? i will either be in or just graduated from college. i won't have a full life on a healthy world. tl:dr the world sucks and i am forced to be a part of it."
8,63635,"triggered by my 10 yomy daughter expressed thoughts of suicidal ideation last week. she outcried to my mil at lunch because she was caught sneaking our house cell phone out of the house after being grounded for a week. she was grounded because her sister walked out of the house while she was helping me watch her and she didn't notice. i notified the school, and the counselor pulled to evaluate her. she is low risk, no plan, just thoughts.⟶⟶it was an extreme reaction to stress but it triggered my suicidal ideation. i feel like i've failed her as a parent. i know everyone messes up raising their first kid. i've tried the best i can and i know that's all i can do. unlike her, i have a plan. unlike her, i've struggled with these thoughts since i was a child. unlike her, i've attempted before.⟶⟶i told her my story, as kid appropriate as i could.⟶⟶so that she knows she is not alone. i did mention even though we hadn't gone through the same things, i could certainly understand what it's like to have those feelings and to remind her that it may seem bad but it could always be worse; that she was not alone. i made sure not to compare our feelings because her feelings are valid and its ""okay"" to have them. that it's ""okay"" to have the thoughts but it's not okay to close up and not talk it out. it's not okay to attempt. it's not okay to blame others for getting to this point because it's a choice she made to let it get this far without asking for help. maybe it's not the right choice of words but no one prepares you for this stuff.⟶⟶we discussed the side of suicide no one talks about - the aftermath on friends, family, the community. we talked about the laws in place to prevent cyber bullying and bullying in school. we talked about resources she has available and healthy coping mechanisms. that she is loved and cared for by her friends, family, community. that it's a temporary solution that leaves a lasting impact on everyone who⟶⟶i'm terrified to come home and find her dead because if she goes, i go. i just needed to get this out. i'm actively seeking her counseling and have made an appointment to get her seen by an md as well.","triggered by my 10 yomy daughter expressed thoughts of suicidal ideation last week. she outcried to my mil at lunch because she was caught sneaking our house cell phone out of the house after being grounded for a week. she was grounded because her sister walked out of the house while she was helping me watch her and she didn't notice. i notified the school, and the counselor pulled to evaluate her. she is low risk, no plan, just thoughts. it was an extreme reaction to stress but it triggered my suicidal ideation. i feel like i've failed her as a parent. i know everyone messes up raising their first kid. i've tried the best i can and i know that's all i can do. unlike her, i have a plan. unlike her, i've struggled with these thoughts since i was a child. unlike her, i've attempted before. i told her my story, as kid appropriate as i could. so that she knows she is not alone. i did mention even though we hadn't gone through the same things, i could certainly understand what it's like to have those feelings and to remind her that it may seem bad but it could always be worse; that she was not alone. i made sure not to compare our feelings because her feelings are valid and its ""okay"" to have them. that it's ""okay"" to have the thoughts but it's not okay to close up and not talk it out. it's not okay to attempt. it's not okay to blame others for getting to this point because it's a choice she made to let it get this far without asking for help. maybe it's not the right choice of words but no one prepares you for this stuff. we discussed the side of suicide no one talks about - the aftermath on friends, family, the community. we talked about the laws in place to prevent cyber bullying and bullying in school. we talked about resources she has available and healthy coping mechanisms. that she is loved and cared for by her friends, family, community. that it's a temporary solution that leaves a lasting impact on everyone who i'm terrified to come home and find her dead because if she goes, i go. i just needed to get this out. i'm actively seeking her counseling and have made an appointment to get her seen by an md as well."
9,43175,"if you do this, god will give you free v-bucks or else you will be beaten up in the locker room by satan and his homies &#x200b;⟶⟶https://preview.redd.it/ernkmjroccj61.png?width=277&format=png&auto=webp&s=759bd617c587f1faa2ff7e33932fa03f3b5fb081","if you do this, god will give you free v-bucks or else you will be beaten up in the locker room by satan and his homies &#x200b; https://preview.redd.it/ernkmjroccj61.png?width=277&format=png&auto=webp&s=759bd617c587f1faa2ff7e33932fa03f3b5fb081"


+---------------------------------------------------------+--------+
|Descripcion                                              |Cantidad|
+---------------------------------------------------------+--------+
|Registros con espacios múltiples en el dataframe original|0       |
+---------------------------------------------------------+--------+



###**Reducción Signos Repetidos**

En este apartado se llevará a cabo la detección de textos que contienen signos de puntuación repetidos de forma exagerada, como múltiples signos de emoción o interrogación consecutivos, los cuales serán removidos y/o corregidos.

Para revisar este comportamiento, primero se mostrará una muestra de registros donde aparezcan estos signos repetidos, permitiendo visualizar el antes y después de la corrección. Luego de verificar que la normalización funciona como se espera, se aplicará la transformación a todo el dataset, reemplazando las secuencias excesivas por un único signo. De esta manera, el texto quedará más equilibrado y uniforme, facilitando su interpretación y su procesamiento en etapas futuras.

In [0]:
# Reducción de signos repetidos
pattern = r"([!?.])\1{2,}"

# Mostrar antes/después
df_preview = (
    df
    .filter(F.col("text").rlike(pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(10)
)

# Reemplazar seguro sin backreferences
df_preview = df_preview.withColumn(
    "after",
    F.regexp_replace("before", r"[!?.]{2,}", F.regexp_extract("before", r"([!?.])\1{2,}", 1))
)

# Convertir a pandas
preview_pd = df_preview.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style='max-height:500px; overflow:auto; font-size:15px;'>
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", r"[!?.]{2,}", ".")
)

Unnamed: 0,id,before,after
0,246879,well if you’re not gonna bleed for me go he said he likes crazy girls....but he hates when i act crazy just bc i’m fucked up it doesn’t mean i don’t feel,well if you’re not gonna bleed for me go he said he likes crazy girls.but he hates when i act crazy just bc i’m fucked up it doesn’t mean i don’t feel
1,73664,"hello.so, i’m not even sure what to say really. i’m just really... exhausted, i guess you could say? i don’t really feel like holding on anymore. i usually feel pretty sad about it and sob uncontrollably when i’m right on the edge like i am tonight. but tonight i feel peaceful. no emotions at all, really. i think i’m finally ready to let go. but i guess it doesn’t hurt to reach out one last time, right?","hello.so, i’m not even sure what to say really. i’m just really. exhausted, i guess you could say? i don’t really feel like holding on anymore. i usually feel pretty sad about it and sob uncontrollably when i’m right on the edge like i am tonight. but tonight i feel peaceful. no emotions at all, really. i think i’m finally ready to let go. but i guess it doesn’t hurt to reach out one last time, right?"
2,125996,"i'm not sure where else to turn.my story is a little weird. i'm a first time poster, as in i've never posted on reddit at all, i've commented on a few posts from others but i really don't do much but lurk. my dad left my mom, my sister and myself at a young age, around 8, and since then i've clung to my family. now, i'm 19 and i recently had to drop out of school due to financial issues. basically, now i'm 2,500 in the hole for school bills, i've run through 4 cars in the three short years i've had my license and i'm victim to a stepfather who sees me as nothing but a nuisance, and my mother feels i only stress her out now. due to those financial complications, i had to move home and try to get back on my own feet. i have a full time job and i use every bit of my paycheck to save for a car, pay off school debts and miscellaneous debts alike, leaving me with a few dollars to buy packs of cigarettes so i don't lose my mind. i never fit in in school; i even joined a fraternity in college, but i feel sort of alienated by them because of my financial situation (i wasn't allowed to be active because of my dropping out and i can understand why they'd be mad, it just sucks because i'm not sure if i'm welcome at the house at this point) i've contemplated taking my own life for several years and i don't think i have ever had more motive to do than right now. it's pathetic, and i feel pathetic, and i don't really know where to turn at this point. i'm not looking for sympathy; i'm looking for constructive points to better my life, if i seem in the wrong, and if you want to ask any questions or more details about my situation... well i'd be more than happy to let you know within reasonable means to really see if i'm in the wrong about my life as a whole and if i'm overreacting with these suicidal thoughts that seem to plague me. tldr my family thinks i'm nothing and i dont know how to deal with the hurdles life is throwing at me and if i wasn't such a huge pussy, i would absolutely kill myself and i'm scared i might one day get the balls. help.","i'm not sure where else to turn.my story is a little weird. i'm a first time poster, as in i've never posted on reddit at all, i've commented on a few posts from others but i really don't do much but lurk. my dad left my mom, my sister and myself at a young age, around 8, and since then i've clung to my family. now, i'm 19 and i recently had to drop out of school due to financial issues. basically, now i'm 2,500 in the hole for school bills, i've run through 4 cars in the three short years i've had my license and i'm victim to a stepfather who sees me as nothing but a nuisance, and my mother feels i only stress her out now. due to those financial complications, i had to move home and try to get back on my own feet. i have a full time job and i use every bit of my paycheck to save for a car, pay off school debts and miscellaneous debts alike, leaving me with a few dollars to buy packs of cigarettes so i don't lose my mind. i never fit in in school; i even joined a fraternity in college, but i feel sort of alienated by them because of my financial situation (i wasn't allowed to be active because of my dropping out and i can understand why they'd be mad, it just sucks because i'm not sure if i'm welcome at the house at this point) i've contemplated taking my own life for several years and i don't think i have ever had more motive to do than right now. it's pathetic, and i feel pathetic, and i don't really know where to turn at this point. i'm not looking for sympathy; i'm looking for constructive points to better my life, if i seem in the wrong, and if you want to ask any questions or more details about my situation. well i'd be more than happy to let you know within reasonable means to really see if i'm in the wrong about my life as a whole and if i'm overreacting with these suicidal thoughts that seem to plague me. tldr my family thinks i'm nothing and i dont know how to deal with the hurdles life is throwing at me and if i wasn't such a huge pussy, i would absolutely kill myself and i'm scared i might one day get the balls. help."
3,193513,"i think it's time...it's taken me a while of lurking, to get up the courage to make a throwaway to write on this subreddit with... i'm 6 months out of a ltr that i ended. i ended it because i cheated on him, and felt guilty. i couldn't bring myself to tell him, but he found out after we broke it off. now, things are just terrible. we kinda sorta keep in contact, as we have mutual friends. i love him with all my heart. i've tried seeing other people, but that just led to such a degredation of my self esteem. i don't remember the first 3 months after the relationship because i was either high on pills, drunk off alcohol or high off trees. i slept with any guy who'd i'd known for awhile that showed interest. (no.. i just met you lets fuck! tho.. thankfully) the fact that i did that has just added to my guilt. i hate myself. for what i did to him, for sleeping with those other guys when i was only trying to replace my ex... about 3 weeks ago, i was alone at home and just started drinking. 1,2,3,6,9... i don't remember it, but i ended up slitting my wrist. somehow i came around long enough to call a friend - she called 911. 5 stitches, could weeks and lots of bills later .. it's all i can think about. i want to do it right this time - or maybe just skip back to the pills. i miss him so much, and he doesn't give a shit about me anymore. i think it's time i ended both of our suffering. i just wish i was brave enough to do it in this exact moment... [edit: spelling]","i think it's time.it's taken me a while of lurking, to get up the courage to make a throwaway to write on this subreddit with. i'm 6 months out of a ltr that i ended. i ended it because i cheated on him, and felt guilty. i couldn't bring myself to tell him, but he found out after we broke it off. now, things are just terrible. we kinda sorta keep in contact, as we have mutual friends. i love him with all my heart. i've tried seeing other people, but that just led to such a degredation of my self esteem. i don't remember the first 3 months after the relationship because i was either high on pills, drunk off alcohol or high off trees. i slept with any guy who'd i'd known for awhile that showed interest. (no. i just met you lets fuck! tho. thankfully) the fact that i did that has just added to my guilt. i hate myself. for what i did to him, for sleeping with those other guys when i was only trying to replace my ex. about 3 weeks ago, i was alone at home and just started drinking. 1,2,3,6,9. i don't remember it, but i ended up slitting my wrist. somehow i came around long enough to call a friend - she called 911. 5 stitches, could weeks and lots of bills later . it's all i can think about. i want to do it right this time - or maybe just skip back to the pills. i miss him so much, and he doesn't give a shit about me anymore. i think it's time i ended both of our suffering. i just wish i was brave enough to do it in this exact moment. [edit: spelling]"
4,174612,"how should i die?it has been a week (?) trying to keep up the convo with the life coaching line provided by our company. they seem to respond poorly. i never get to the point why i wanna die already. so, i’ll just post here and rant about my life. this will become my suicide note — the one that’s hard to find. no one knows i’m on reddit. or if anyone knows, they don’t know how to use the app and/or they don’t know that i post here. because i vent out on twitter all the time. let’s start with my family, on how i became the biggest disappointment amongst my sibs... eldest is a cpa, she never failed any subject, quiz, exam, etc. in school. she is very bright and humorous. my parents loves her. bought her a laptop when she graduated from college. which is a big thing because we are not rich. second is now a cpe. my parents bought him almost all kinds of instruments (e.g. electric guitar, acoustic guitar, drum set, bass guitar, violin, keyboard, and many more) and supports him on his every gig (my mom is always there on the front seats when they perform) since he is very good in music. he is now working in japan. they both are graduate from a high-end university. and i am the youngest. i never graduated from college. i took 2 years and finished pre-dentistry at a so-so school, but never got to dentistry proper. why? i got pregnant at 18. my parents threw me out of the house while i was pregnant. i lived with my then boyfriend (that is also undergraduate and doesn’t have a job) at a squatters area. neighbors are drug addicts, a person dies every other day because of illegal drugs, police are around the area every nights, some houses are caught on fire (which, everyone though it a movement of some vigilantes because those houses are ‘spots’ where people do drugs). i gave birth and still lived there for another 3 months. i went home to my parents since i don’t want my daughter to grow up in that kind of environment. my parents finally accepted my daughter. i found a new boyfriend who has a stable job, he is my boss on my new job. i lived with him for (6 or 7) months, while my daughter lives with my parents. he abuses me physically, emotionally, and sexually. (i don’t wanna talk about it in details.) so, we broke up, left my job, and i lived with my parents for a couple of months. and, after i found my new job, i also found my new boyfriend. the one i have now. i love him so much. we’re running at almost two years now. we already have my daughter together with us for a year now. here’s the story of why: my now boyfriend and i went over to my parents’ since they want us to took after my daughter because all of ‘em have something to do that day. there’s no problem. we even went out on a family date (just the three of us). and when we got home, my father is already there drinking himself to death. and talked to my boyfriend. when he is drunk, he makes these unrealistic scenarios. and because of that, he thought my boyfriend isn’t serious about our relationship. my dad pulled his gun out to frighten my boyfriend. we went away from there and after a few weeks, we took my daughter with us. we are now a family, finally. my kid is turning three, and i don’t feel like i am being a good (enough) mother for her. there are times that she doesn’t have milk anymore, or diapers. she’s not even going to school yet. i feel like i can’t provide her the basic needs she should have. and my boyfriend doesn’t seem to understand me when i vent out to him. he doesn’t listen very good. but i love him. and i love how my kid loves him too. i don’t thing i’m giving them, both, enough to keep me as their mother and partner. my shitty life and low-paying job is not enough to live anymore. for some, this may be too shallow. but, fuck, i’m sorry. i’m sorry for this long post, too. i just don’t want to live anymore since i don’t feel like i’m doing enough. it never leaves my mind on how worthless i am. i just wanna kill myself already and end this miserable thoughts and what-ifs... how do you think should i die?","how should i die?it has been a week (?) trying to keep up the convo with the life coaching line provided by our company. they seem to respond poorly. i never get to the point why i wanna die already. so, i’ll just post here and rant about my life. this will become my suicide note — the one that’s hard to find. no one knows i’m on reddit. or if anyone knows, they don’t know how to use the app and/or they don’t know that i post here. because i vent out on twitter all the time. let’s start with my family, on how i became the biggest disappointment amongst my sibs. eldest is a cpa, she never failed any subject, quiz, exam, etc. in school. she is very bright and humorous. my parents loves her. bought her a laptop when she graduated from college. which is a big thing because we are not rich. second is now a cpe. my parents bought him almost all kinds of instruments (e.g. electric guitar, acoustic guitar, drum set, bass guitar, violin, keyboard, and many more) and supports him on his every gig (my mom is always there on the front seats when they perform) since he is very good in music. he is now working in japan. they both are graduate from a high-end university. and i am the youngest. i never graduated from college. i took 2 years and finished pre-dentistry at a so-so school, but never got to dentistry proper. why? i got pregnant at 18. my parents threw me out of the house while i was pregnant. i lived with my then boyfriend (that is also undergraduate and doesn’t have a job) at a squatters area. neighbors are drug addicts, a person dies every other day because of illegal drugs, police are around the area every nights, some houses are caught on fire (which, everyone though it a movement of some vigilantes because those houses are ‘spots’ where people do drugs). i gave birth and still lived there for another 3 months. i went home to my parents since i don’t want my daughter to grow up in that kind of environment. my parents finally accepted my daughter. i found a new boyfriend who has a stable job, he is my boss on my new job. i lived with him for (6 or 7) months, while my daughter lives with my parents. he abuses me physically, emotionally, and sexually. (i don’t wanna talk about it in details.) so, we broke up, left my job, and i lived with my parents for a couple of months. and, after i found my new job, i also found my new boyfriend. the one i have now. i love him so much. we’re running at almost two years now. we already have my daughter together with us for a year now. here’s the story of why: my now boyfriend and i went over to my parents’ since they want us to took after my daughter because all of ‘em have something to do that day. there’s no problem. we even went out on a family date (just the three of us). and when we got home, my father is already there drinking himself to death. and talked to my boyfriend. when he is drunk, he makes these unrealistic scenarios. and because of that, he thought my boyfriend isn’t serious about our relationship. my dad pulled his gun out to frighten my boyfriend. we went away from there and after a few weeks, we took my daughter with us. we are now a family, finally. my kid is turning three, and i don’t feel like i am being a good (enough) mother for her. there are times that she doesn’t have milk anymore, or diapers. she’s not even going to school yet. i feel like i can’t provide her the basic needs she should have. and my boyfriend doesn’t seem to understand me when i vent out to him. he doesn’t listen very good. but i love him. and i love how my kid loves him too. i don’t thing i’m giving them, both, enough to keep me as their mother and partner. my shitty life and low-paying job is not enough to live anymore. for some, this may be too shallow. but, fuck, i’m sorry. i’m sorry for this long post, too. i just don’t want to live anymore since i don’t feel like i’m doing enough. it never leaves my mind on how worthless i am. i just wanna kill myself already and end this miserable thoughts and what-ifs. how do you think should i die?"
5,219123,"i gave my roommate my razors and told him about cutting.he didn't freak out. but he said he was uncomfortable and that he now can't trust me as far as he can throw me.... and the reason i'm writing this is because it's always funny that if/when i tell someone, i always feel the need to apologize for it. i barely understand why i hurt myself (when i do, because, as said before, i don't want to die) but now i've got this added guilt/need to apologize when i do fess up. gah.","i gave my roommate my razors and told him about cutting.he didn't freak out. but he said he was uncomfortable and that he now can't trust me as far as he can throw me. and the reason i'm writing this is because it's always funny that if/when i tell someone, i always feel the need to apologize for it. i barely understand why i hurt myself (when i do, because, as said before, i don't want to die) but now i've got this added guilt/need to apologize when i do fess up. gah."
6,318779,"i got news today that ny auntie is completely sober from drugs! my auntie has been on pills, cocaine and much more since i was around 2 and after around 13 years i recently found out shes off drugs from my nan. she is now back on track and i thought i'd tell you guys, all of the fear i had from the amount of overdoses she's been through and even a coma at one point is now gone. this is the longest she's went without an overdose too, it became a frequent thing in 2016 when my cousin was born and was immediately put into care, but now hes back with her and everyone is happy!!! please stay away from drugs, you're not just putting yourself in danger its friends and family too.","i got news today that ny auntie is completely sober from drugs! my auntie has been on pills, cocaine and much more since i was around 2 and after around 13 years i recently found out shes off drugs from my nan. she is now back on track and i thought i'd tell you guys, all of the fear i had from the amount of overdoses she's been through and even a coma at one point is now gone. this is the longest she's went without an overdose too, it became a frequent thing in 2016 when my cousin was born and was immediately put into care, but now hes back with her and everyone is happy! please stay away from drugs, you're not just putting yourself in danger its friends and family too."
7,60104,i just want to...disappear? sleep? rest? laugh? dream? astral project? meditate? die? not feel helpless.,i just want to.disappear? sleep? rest? laugh? dream? astral project? meditate? die? not feel helpless.
8,289871,"i keep struggling and then i’m slammed down again i can’t keep doing thisi’m 33/f my parents are dead. my brother died in 2007, dad died 5/2013 and mom died 12/4/16 and i was her full time caregiver. long story short my siblings from her side did absolutely nothing. they made life hell for me. they criticized my care for her and made thing so hard. i no longer associate with them. at one point i was suicidal. my other sister is severely mentally and i love her dearly but i can’t rely on her for emotional support. my damn dog died 3 mths after my mom my best friend dropped me...when i wouldn’t give her mone for a vacation . i tried to change my life, tried to be positive move to a new city. got a promotion. i thought something good was finally happening but now in the past 3 days i totaled my car and my new dog as of today may have cancer and i have to wait 3 days to know. this whole 14 mths has meant nothing i tried to talk to a friend and his advice was i wouldn’t blow money on a sick animal. everything i try doesn’t work out, everybody around me leaves me or dies. i thought i was doing better but it’s all lies. i’ve cried for the past 4 hrs so hard my face hurts. i’m so scared if my boy dies i’ll have to join him. i don’t want to be alone...i wish i had one person to be with me. i don’t want to go to work tomorrow i have no one else to talk to","i keep struggling and then i’m slammed down again i can’t keep doing thisi’m 33/f my parents are dead. my brother died in 2007, dad died 5/2013 and mom died 12/4/16 and i was her full time caregiver. long story short my siblings from her side did absolutely nothing. they made life hell for me. they criticized my care for her and made thing so hard. i no longer associate with them. at one point i was suicidal. my other sister is severely mentally and i love her dearly but i can’t rely on her for emotional support. my damn dog died 3 mths after my mom my best friend dropped me.when i wouldn’t give her mone for a vacation . i tried to change my life, tried to be positive move to a new city. got a promotion. i thought something good was finally happening but now in the past 3 days i totaled my car and my new dog as of today may have cancer and i have to wait 3 days to know. this whole 14 mths has meant nothing i tried to talk to a friend and his advice was i wouldn’t blow money on a sick animal. everything i try doesn’t work out, everybody around me leaves me or dies. i thought i was doing better but it’s all lies. i’ve cried for the past 4 hrs so hard my face hurts. i’m so scared if my boy dies i’ll have to join him. i don’t want to be alone.i wish i had one person to be with me. i don’t want to go to work tomorrow i have no one else to talk to"
9,156844,i've been wondering what does a bag of freezing cold cheetos tastes like... filler filler filler filler filler filler filler,i've been wondering what does a bag of freezing cold cheetos tastes like. filler filler filler filler filler filler filler


###**Eliminación URLs**

En este apartado se llevará a cabo la identificación de textos que contienen enlaces o direcciones web, ya que este tipo de elementos no aporta información útil para el análisis lingüístico y puede introducir ruido innecesario en el procesamiento. Para revisar su presencia, se tomará una muestra de registros donde aparezcan URLs y se generará una vista comparativa que mostrará cómo se verá el texto antes y después de eliminarlas.

Una vez confirmado que la limpieza se está aplicando correctamente, la eliminación de URLs se extenderá al resto del conjunto de datos, dejando cada texto libre de enlaces externos. Con ello, se garantizará que el contenido final esté enfocado únicamente en la información textual relevante para las siguientes etapas del pre-procesamiento.

In [0]:
# Detectar registros con URLs
url_pattern = r"(https?://\S+|www\.\S+)"

df_url_preview = (
    df
    .filter(F.col("text").rlike(url_pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Crear columna after (limpieza)
df_url_preview = df_url_preview.withColumn(
    "after",
    F.regexp_replace("before", url_pattern, "")
)

# Filtrar solo registros donde sí hubo cambios
df_url_changed = df_url_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a Pandas
preview_pd = df_url_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", url_pattern, "")
)

Unnamed: 0,id,before,after
0,79923,"i dont know if i am normal im 15 .in past months i started to see that i have problem with communicating with people, and started to think about it, not making excuses for my behavior. i clearly use ""mathematical"" approah to my life, i mean i analyze almost everything to make decisions to benefit me, and people that i care about. even if it will mean someones else loss. i also dont feel much emotions, if i get yelled at, i shrug it off, i think i feel hapiness the ""normal"" way, i am almost not able to cry (for last few years the best i got was wet eyes). i dont like beeing surrounded by attention (i dont know what to do in those situations), i have a very hard time making new contacts, i mean speaking for the first time to sbd, i will try to make a trustable way of portraying it as a belivable accident. i dont really often feel guilt, or negative emotions for not adjusting to moral, or any other rules (as long as what i did was well thouht, worth the risk, or i thought of it as stupid. also, i write here beacouse i am not really concerned about it, but i just want to hear sbd anonymous and their view of this topic. sorry for any grammatical, or ortgraphical mistakes. &#x200b; bellow this point i will paste results of some free psyho tests (i know that they are probably shit, but they give something at least) there is no clear indication that you might have a psychopathic / antisocial personality disorder. you reached, however, 50,00% of factor 1 which captures the core personality traits of psychopathy that define the interpersonal and affective deficits of this personality disorder (e.g. shallow affect, superficial charm, manipulativeness, lack of empathy) and that are correlated with narcissistic personality disorder, low anxiety, low empathy, low stress reaction and low suicide risk. but factor 1 is also associated with extraversion and positive affect - affected persons usually score high on scales of achievement and well-being, so some aspects of the personality disorder may even be beneficial for the psychopath (in terms of nondeviant social functioning or if it comes to profit from manipulation or lies). you reached, however, 22,22% of factor 2 which captures the traits of antisocial behavior (e.g. criminal versatility, impulsiveness, irresponsibility, poor behaviour controls, juvenile delinquency) and is associated with reactive anger, social deviance, sensation seeking, anxiety, increased risk of suicide, low socio-economic status, criminality, and impulsive violence. you reached, however, 100,00% in other relevant traits that can indicate this personality disorder. you might have certain traits of antisocial or psychopathic personalities but certainly not in a form that would justify a personality disorder diagnosis according to the standards. score: 16 of 38 \[16:8/4/4\] &#x200b; there are strong indications that you might have a narcissistic personality disorder. there are slight differences between the major diagnostic manuals in how to diagnose a narcissistic personality disorder, with the icd-10 manual stating that a person may only be diagnosed with a narcissistic personality disorder if she/he does not meet the diagnostic criteria for a dissocial (antisocial, psychopathic), histrionic or any of the other personality disorders at the same time. for the dsm-iv manual, there is no such exclusion. it is unusual for npd personality types to seek therapy, as they unconsciously fear exposure or inadequacy and will usually disdain therapeutic processes or the idea of psychotherapy itself, sabotage the therapeutic process or openly oppose it. pharmacotherapy is rarely effective. score: 5 of 9 &#x200b; there is no indication that you might have a histrionic personality disorder.\[d:4/i:3\] &#x200b; you meet 100% of the range of general personality disorder criteria. this further indicates that you might have to deal with a severe personality disorder. thus, is strongly recommended you seek a professional diagnosis to be sure what exactly you are dealing with. it might turn out useful to print the previous page including your selections and take it to a psychotherapist, psychiatrist or psychologist.\[g:5\] &#x200b; \-- quelle: [https://www.counseling-office.com/surveys/survey\_p.php](https://www.counseling-office.com/surveys/survey_p.php) &#x200b; you have completed the levenson self-report psychopathy scale. the lsrp measures two scales. scores range from 1 (low) to 5 (high). your score from primary psychopathy has been calculated as 4.1. primary psychopathy is the affective aspects of psychopathy; a lack of empathy for other people and tolerance for antisocial orientations. your score from secondary psychopathy has been calculated as 1.8. secondary psychopathy is the antisocial aspects of psychopathy; rule breaking and a lack of effort towards socially rewarded behavior. with two scores, results of the lsrp are very suitable for being plotted. below is the distribution of how other you score for primary psychopathy was higher than 91.63% of people who have taken this test. you score for secondary psychopathy was higher than 16.44% of people who have taken this test.","i dont know if i am normal im 15 .in past months i started to see that i have problem with communicating with people, and started to think about it, not making excuses for my behavior. i clearly use ""mathematical"" approah to my life, i mean i analyze almost everything to make decisions to benefit me, and people that i care about. even if it will mean someones else loss. i also dont feel much emotions, if i get yelled at, i shrug it off, i think i feel hapiness the ""normal"" way, i am almost not able to cry (for last few years the best i got was wet eyes). i dont like beeing surrounded by attention (i dont know what to do in those situations), i have a very hard time making new contacts, i mean speaking for the first time to sbd, i will try to make a trustable way of portraying it as a belivable accident. i dont really often feel guilt, or negative emotions for not adjusting to moral, or any other rules (as long as what i did was well thouht, worth the risk, or i thought of it as stupid. also, i write here beacouse i am not really concerned about it, but i just want to hear sbd anonymous and their view of this topic. sorry for any grammatical, or ortgraphical mistakes. &#x200b; bellow this point i will paste results of some free psyho tests (i know that they are probably shit, but they give something at least) there is no clear indication that you might have a psychopathic / antisocial personality disorder. you reached, however, 50,00% of factor 1 which captures the core personality traits of psychopathy that define the interpersonal and affective deficits of this personality disorder (e.g. shallow affect, superficial charm, manipulativeness, lack of empathy) and that are correlated with narcissistic personality disorder, low anxiety, low empathy, low stress reaction and low suicide risk. but factor 1 is also associated with extraversion and positive affect - affected persons usually score high on scales of achievement and well-being, so some aspects of the personality disorder may even be beneficial for the psychopath (in terms of nondeviant social functioning or if it comes to profit from manipulation or lies). you reached, however, 22,22% of factor 2 which captures the traits of antisocial behavior (e.g. criminal versatility, impulsiveness, irresponsibility, poor behaviour controls, juvenile delinquency) and is associated with reactive anger, social deviance, sensation seeking, anxiety, increased risk of suicide, low socio-economic status, criminality, and impulsive violence. you reached, however, 100,00% in other relevant traits that can indicate this personality disorder. you might have certain traits of antisocial or psychopathic personalities but certainly not in a form that would justify a personality disorder diagnosis according to the standards. score: 16 of 38 \[16:8/4/4\] &#x200b; there are strong indications that you might have a narcissistic personality disorder. there are slight differences between the major diagnostic manuals in how to diagnose a narcissistic personality disorder, with the icd-10 manual stating that a person may only be diagnosed with a narcissistic personality disorder if she/he does not meet the diagnostic criteria for a dissocial (antisocial, psychopathic), histrionic or any of the other personality disorders at the same time. for the dsm-iv manual, there is no such exclusion. it is unusual for npd personality types to seek therapy, as they unconsciously fear exposure or inadequacy and will usually disdain therapeutic processes or the idea of psychotherapy itself, sabotage the therapeutic process or openly oppose it. pharmacotherapy is rarely effective. score: 5 of 9 &#x200b; there is no indication that you might have a histrionic personality disorder.\[d:4/i:3\] &#x200b; you meet 100% of the range of general personality disorder criteria. this further indicates that you might have to deal with a severe personality disorder. thus, is strongly recommended you seek a professional diagnosis to be sure what exactly you are dealing with. it might turn out useful to print the previous page including your selections and take it to a psychotherapist, psychiatrist or psychologist.\[g:5\] &#x200b; \-- quelle: [ &#x200b; you have completed the levenson self-report psychopathy scale. the lsrp measures two scales. scores range from 1 (low) to 5 (high). your score from primary psychopathy has been calculated as 4.1. primary psychopathy is the affective aspects of psychopathy; a lack of empathy for other people and tolerance for antisocial orientations. your score from secondary psychopathy has been calculated as 1.8. secondary psychopathy is the antisocial aspects of psychopathy; rule breaking and a lack of effort towards socially rewarded behavior. with two scores, results of the lsrp are very suitable for being plotted. below is the distribution of how other you score for primary psychopathy was higher than 91.63% of people who have taken this test. you score for secondary psychopathy was higher than 16.44% of people who have taken this test."
1,89365,cat i drew while sitting alone in the classroom &#x200b; https://preview.redd.it/92elts11brr61.jpg?width=4160&format=pjpg&auto=webp&s=2de21a014d04a3120e31cff4448623118a334bd0,cat i drew while sitting alone in the classroom &#x200b;
2,117686,day #20 of recommending songs. [the killers - dying breed](https://youtu.be/toev-efobzy) comment what your favorite songs is of the killers. except mr. brightside ofcourse.,day #20 of recommending songs. [the killers - dying breed]( comment what your favorite songs is of the killers. except mr. brightside ofcourse.
3,84606,test post haha test post [test test haha](https://youtu.be/eoadltjcyxw),test post haha test post [test test haha](
4,294376,"please someone out there fucking help mei (19f) can't take it anymore. i'm not immediately suicidal but i've lost all motivation to draw, play video games, and do basic activities on a daily basis. i've basically been in ""lockdown"" my whole life (way before the pandemic), with my only real life contact with the outside world being 9 hours of therapy with a bunch of middle aged adults. aside from the cold, i can't step a single foot out the door because of how trashy my father has made the porch. both my guinea pigs are fucking dead and my parents screams my cat. in fact, everyone screams across the house so much that it's given me severe anxiety and possibly even hearing loss. i'm always stressed bc they all have booming loud voices and there's nothing i can do about it. it's always so demanding and makes my nerves explode off the roof with me just nodding my head in response as it's the quickest thing i can do. i long to go for car rides whatsoever as my one and only temporary escape from such madness. my desperation got so bad that i will start to pop a dayquil and a terazosin every time they time they leave me at home. anything helps at this point, please i'm begging. [i've made a post about my situation here](https://www.reddit.com/r/advice/comments/keis1r/what_do_you_do_when_nothing_more_can_be_done/) big warning for nsfw","please someone out there fucking help mei (19f) can't take it anymore. i'm not immediately suicidal but i've lost all motivation to draw, play video games, and do basic activities on a daily basis. i've basically been in ""lockdown"" my whole life (way before the pandemic), with my only real life contact with the outside world being 9 hours of therapy with a bunch of middle aged adults. aside from the cold, i can't step a single foot out the door because of how trashy my father has made the porch. both my guinea pigs are fucking dead and my parents screams my cat. in fact, everyone screams across the house so much that it's given me severe anxiety and possibly even hearing loss. i'm always stressed bc they all have booming loud voices and there's nothing i can do about it. it's always so demanding and makes my nerves explode off the roof with me just nodding my head in response as it's the quickest thing i can do. i long to go for car rides whatsoever as my one and only temporary escape from such madness. my desperation got so bad that i will start to pop a dayquil and a terazosin every time they time they leave me at home. anything helps at this point, please i'm begging. [i've made a post about my situation here]( big warning for nsfw"
5,196148,hey please sign my petition it may not seem like a big issue to you but it matters a lot to me so please help https://www.change.org/helpaaemployees i made this petition because my mom works at american airlines as a customer service agent and she has to stand for 10 hours a day it really puts a strain on her. her coworkers have to stand for a long period of time as well. the petition is pretty much just asking that american airlines give them chairs. thanks for reading,hey please sign my petition it may not seem like a big issue to you but it matters a lot to me so please help i made this petition because my mom works at american airlines as a customer service agent and she has to stand for 10 hours a day it really puts a strain on her. her coworkers have to stand for a long period of time as well. the petition is pretty much just asking that american airlines give them chairs. thanks for reading
6,174916,day 2 of posting music no one i know has heard of [up all night-viagra boys](https://open.spotify.com/track/3pdmuthj2kmahpysoio6fg?si=ukdqygioqvsnotke1un4na),day 2 of posting music no one i know has heard of [up all night-viagra boys](
7,204455,"feeling pretty good right now. stark contrast to my usual. sending you all the positive vibes i possibly can. broke free from my anxiety a bit today, so i’ve been riding that high. i’ve made a few friends these past 6 months. they’re all truly wonderful people, i’m beyond lucky to have them in my life. i’ve got a roof over my head, access to water, food, etc. i’m leaning back on my bed in a hoodie & sweats that i pulled out of the dryer a while ago, feeling comfortable as hell. got some lofi filling the room totally enhancing this good vibe my head isn’t clouded with depression or anxiety. any worry or doubt is whisked away by the soft sound of this music. for the first night in a very long time, i feel okay. i hope you do too. and if not, please, i invite you to join me in stopping for a bit to enjoy some lofi with me, [here.](https://youtu.be/li9cfmag7_a) i’ll hang out for a bit longer to talk to anyone who feels like commenting, but if nobody does, that’s perfectly fine. just know that whatever it is you’re going through, you’re not alone. there is hope. and if you were looking for a sign not to give up: this is it, right here. you can do the thing. i believe in you & i’m so proud of you for making it to this moment. i genuinely hope all of you lovely peoples’ nights/mornings go well, that you can find a way to beat your demons. probably going to doze off soon, but i just wanted to share this small victory with all of you. love ya ♡","feeling pretty good right now. stark contrast to my usual. sending you all the positive vibes i possibly can. broke free from my anxiety a bit today, so i’ve been riding that high. i’ve made a few friends these past 6 months. they’re all truly wonderful people, i’m beyond lucky to have them in my life. i’ve got a roof over my head, access to water, food, etc. i’m leaning back on my bed in a hoodie & sweats that i pulled out of the dryer a while ago, feeling comfortable as hell. got some lofi filling the room totally enhancing this good vibe my head isn’t clouded with depression or anxiety. any worry or doubt is whisked away by the soft sound of this music. for the first night in a very long time, i feel okay. i hope you do too. and if not, please, i invite you to join me in stopping for a bit to enjoy some lofi with me, [here.]( i’ll hang out for a bit longer to talk to anyone who feels like commenting, but if nobody does, that’s perfectly fine. just know that whatever it is you’re going through, you’re not alone. there is hope. and if you were looking for a sign not to give up: this is it, right here. you can do the thing. i believe in you & i’m so proud of you for making it to this moment. i genuinely hope all of you lovely peoples’ nights/mornings go well, that you can find a way to beat your demons. probably going to doze off soon, but i just wanted to share this small victory with all of you. love ya ♡"
8,9092,"my band's 3rd single! indie, slightly prog https://open.spotify.com/track/3nlo0nol6zfasrg2nx7t7n?si=kpikv5soqsum92kle2ufva 100% listening to feedback!","my band's 3rd single! indie, slightly prog 100% listening to feedback!"
9,241174,"i have a plan ok, so basically, i came out as trans recently and i want to buy some girly clothes. however, my parents buy things for me ( i still pay ) so i can't buy anything without them knowing. so here's my plan to bypass my parent's control. ( they can also track my phone and where i go and when ) [https://docs.google.com/document/d/1s36y5tnzpkvhf-rsz7nq681ia50xjvjl7mfcxjmnude/edit?usp=sharing](https://docs.google.com/document/d/1s36y5tnzpkvhf-rsz7nq681ia50xjvjl7mfcxjmnude/edit?usp=sharing)","i have a plan ok, so basically, i came out as trans recently and i want to buy some girly clothes. however, my parents buy things for me ( i still pay ) so i can't buy anything without them knowing. so here's my plan to bypass my parent's control. ( they can also track my phone and where i go and when ) ["


###**Eliminación Correos Electrónicos (Emails)**

En este apartado se llevará a cabo la detección de textos que contienen direcciones de correo electrónico, un tipo de información que no aporta valor al análisis del lenguaje y que podría generar ruido o sesgos innecesarios. Para revisar su presencia, se tomará una muestra de registros donde aparezcan estos patrones y se mostrará una comparación entre el texto original y su versión sin correos electrónicos.

Después de verificar que la limpieza se esté aplicando correctamente, la eliminación de correos electrónicos se realizará sobre todo el conjunto de datos, asegurando que cada texto quede libre de este tipo de información puntual. Con ello, el contenido quedará más enfocado y mejor preparado para las siguientes etapas del proceso de pre-procesamiento.

In [0]:
# Detectar registros con emails
email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

df_email_preview = (
    df
    .filter(F.col("text").rlike(email_pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Crear columna after (limpieza)
df_email_preview = df_email_preview.withColumn(
    "after",
    F.regexp_replace("before", email_pattern, "")
)

# Filtrar solo los registros donde sí hubo cambio
df_email_changed = df_email_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a pandas
preview_pd = df_email_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", email_pattern, "")
)

Unnamed: 0,id,before,after
0,227347,"have you experienced worrisome changes in thoughts or feelings? participate in online research!**what is the purpose of the study?** we want to better understand the development of psychiatric symptoms in adolescents and young adults **who can participate?** males and females in georgia, usa ages 12-34 who have experienced one or more of the following: · unusual thoughts · suspiciousness · a sense of having special powers or unrealistic plans for the future · changes in sensory experiences such as unusual experiences with seeing or hearing things that are not there **what will participants be asked to do?** participants will be asked to attend 2 or more remote sessions on zoom to do the following: \- complete interviews and questionnaires about life experiences, lasting about 3 ½ hours \- complete tasks that measure attention and memory, lasting about 2 hours \- take surveys about emotion for 1 week using your android or iphone \- repeat the same procedures 12 months later **where is the study taking place?** online using zoom video conferencing **how much does it cost?** there is no cost for participation **will i receive payment of some kind?** you can receive up to $311 at baseline and up to $421 at 12-month follow up for more information, call our lab at **706-510-7101** or email [**canlab@uga.edu**](mailto:canlab@uga.edu) for a confidential screening.","have you experienced worrisome changes in thoughts or feelings? participate in online research!**what is the purpose of the study?** we want to better understand the development of psychiatric symptoms in adolescents and young adults **who can participate?** males and females in georgia, usa ages 12-34 who have experienced one or more of the following: · unusual thoughts · suspiciousness · a sense of having special powers or unrealistic plans for the future · changes in sensory experiences such as unusual experiences with seeing or hearing things that are not there **what will participants be asked to do?** participants will be asked to attend 2 or more remote sessions on zoom to do the following: \- complete interviews and questionnaires about life experiences, lasting about 3 ½ hours \- complete tasks that measure attention and memory, lasting about 2 hours \- take surveys about emotion for 1 week using your android or iphone \- repeat the same procedures 12 months later **where is the study taking place?** online using zoom video conferencing **how much does it cost?** there is no cost for participation **will i receive payment of some kind?** you can receive up to $311 at baseline and up to $421 at 12-month follow up for more information, call our lab at **706-510-7101** or email [****](mailto:) for a confidential screening."
1,330735,"have you experienced worrisome changes in thoughts or feelings? participate in online research!**what is the purpose of the study?** we want to better understand the development of psychiatric symptoms in adolescents and young adults **who can participate?** males and females in georgia, usa ages 12-34 who have experienced one or more of the following: · unusual thoughts · suspiciousness · a sense of having special powers or unrealistic plans for the future · changes in sensory experiences such as unusual experiences with seeing or hearing things that are not there **what will participants be asked to do?** participants will be asked to attend 2 or more remote sessions on zoom to do the following: \- complete interviews and questionnaires about life experiences, lasting about 3 ½ hours \- complete tasks that measure attention and memory, lasting about 2 hours \- take surveys about emotion for 1 week using your android or iphone \- repeat the same procedures 12 months later **where is the study taking place?** online using zoom video conferencing **how much does it cost?** there is no cost for participation **will i receive payment of some kind?** you can receive up to $311 at baseline and up to $421 at 12-month follow up &#x200b; for more information, call our lab at **706-510-7101** or email [**canlab@uga.edu**](mailto:canlab@uga.edu) for a confidential screening.","have you experienced worrisome changes in thoughts or feelings? participate in online research!**what is the purpose of the study?** we want to better understand the development of psychiatric symptoms in adolescents and young adults **who can participate?** males and females in georgia, usa ages 12-34 who have experienced one or more of the following: · unusual thoughts · suspiciousness · a sense of having special powers or unrealistic plans for the future · changes in sensory experiences such as unusual experiences with seeing or hearing things that are not there **what will participants be asked to do?** participants will be asked to attend 2 or more remote sessions on zoom to do the following: \- complete interviews and questionnaires about life experiences, lasting about 3 ½ hours \- complete tasks that measure attention and memory, lasting about 2 hours \- take surveys about emotion for 1 week using your android or iphone \- repeat the same procedures 12 months later **where is the study taking place?** online using zoom video conferencing **how much does it cost?** there is no cost for participation **will i receive payment of some kind?** you can receive up to $311 at baseline and up to $421 at 12-month follow up &#x200b; for more information, call our lab at **706-510-7101** or email [****](mailto:) for a confidential screening."
2,83626,"recruiting for a research study on suicide in seattlewe're now recruiting for a study focused on youth and young adult suicidal thoughts and behaviors. this study is funded by the national institute of mental health (nimh) and involves participants (16-20 years old) coming to the university of washington - seattle campus (with their parents if under 18) for an initial appointment and evaluation, and then participants filling out surveys administered over cell phones for 14 days. please go to [ariseuw.com]( to learn more about the study and fill out a screening survey to determine eligibility. participants are compensated up to $125 for their time. &#x200b; thanks, &#x200b; arise study team [arise@uw.edu](mailto:arise@uw.edu)","recruiting for a research study on suicide in seattlewe're now recruiting for a study focused on youth and young adult suicidal thoughts and behaviors. this study is funded by the national institute of mental health (nimh) and involves participants (16-20 years old) coming to the university of washington - seattle campus (with their parents if under 18) for an initial appointment and evaluation, and then participants filling out surveys administered over cell phones for 14 days. please go to [ariseuw.com]( to learn more about the study and fill out a screening survey to determine eligibility. participants are compensated up to $125 for their time. &#x200b; thanks, &#x200b; arise study team [](mailto:)"
3,269644,"suicide isn t a crimewhy are countries spending so much money and effort in creating weapons to kill , but they won’t let me die . i just want to die witout felling hurt. if some one has an idee my email is : benjamin.ramosdeolival@gmail.com","suicide isn t a crimewhy are countries spending so much money and effort in creating weapons to kill , but they won’t let me die . i just want to die witout felling hurt. if some one has an idee my email is :"
4,305077,"how i got my life back in orderthis is a long post, sorry for my english. i want to start by saying that it is important to listen to your psychologist, treat yourself like you would treat your bestfriend if he/she were in your shoes. also, take your medication, i know how hard it is to get out of bed when dealing with depression and (for me) meds were the thing that suppressed my negative emotions just enough to get things done. remember, getting out of depression is a marathon, not a sprint. about a year and a half ago, i started really hating my self, and for good reasons (i thought). i started doing all the drugs i could find, almost overdosed a couple times cause i didnt care, i stopped working, stopped going to school, stopped working out, i was very self destructive because i was gonna kill myself whenever it was too much. in september, i was a good student, i was kind and had never touched drugs, i was working out 5 times a week and by december, i had nothing, i was done, i never thought i could ever get out of the mess that was my life. just before christmas, my best friend (now girlfriend) invited me to go to miami with her for a week, i said yes and during that trip, we finally got in a serious relationship, this is what i needed, i had hope, i wanted to get better. so when i got back, i started seeing a psychologist, and i started taking medication, sadly, i couldnt afford to see my psychologist after a while, but i still had hope. heres how i got my life back on track. i was going to ""tidy up my life"", one thing at a time. first, i wanted to get rid of my addiction, to do that, i smoked 3 joints a day until i was used to it, then i smoked 2 joints a day, after that it was 1, then i started only smoking before going to sleep and one day, i was smoking 1 puff a day before going to sleep. after that, i stopped. remember, its marathon. now i was addiction free, but i wanted to clean my place because it was disgusting, i think some of you will relate, i couldnt see the floor in my room, my bed was a storage space for anything that could fit, i had nothing to eat with because all of my plates had been in the sink for a while, i had no furniture except for a tv, a couch and an old table that used to be white but was now a mess of colours. so one day, i decided to clean my room, it took me 3 days, but in the end my clothes were in bags and my bed was only a bed. after that i did the kitchen, same thing there, it took days and i had to throw the plates in the garbage, but it was clean, same thing for the living room and bathroom. after 2 weeks, my place was clean. now, i was addiction free and my place was clean, but i wanted some money so i got a job, i worked only 14h a week because i knew that i couldnt do more, and it was okay. now, i was addiction free, my place was clean and i made 800$ a month. but i was tired of the meds, so i stopped taking them, sadly, i didnt stop the right way so it was hard. i started being very anxious all the time, i was having panic attacks and for 3 months life was hell, but since my life was somewhat in order now, i got through it, and after 3 months, i wasnt on any medication. now i was addiction free blah blah blah and i wanted some furniture, so i got some with my 800$ a month, i got a 50$ table and a 25$ table for my tv. not much but it was good. then, at the beginning of summer, i started working out again, life wasnt great, but it was okay. then, in august, i wanted to go back to school, so i looked for online programs in universities outside of quebec because my grades were too bad to go to university in quebec, and god damn i was accepted by a university for the fall semester. i was now addiction free, my place was clean, i had a stable job, i was working out, i was off meds and i was in university. my life was back on track. right now, im making 1700$ a month working a job i like, im getting ready for my exams, im trying to start a buisness with my cousin and i smoke weed from time to time. it was a marathon, it took a year and a half but im now better than i ever was. dont give up, it is a lot to deal with, there is a lot of work to do, but every single thing you do will make the next thing easier, take your time. do not hate yourself for who you are, depression is not a choice, it is real, and there is hope. do one thing at a time, not matter how small it is, if you wash 1 dish a day, well god damn it is still 1 more every day, remember it took me 1 year and a half to deal with my shit. i hope my post was enough to help at least one person. depression is a marathon, if you sprint, you will lose your breath, but if you walk, you will find the end, no matter how far it is. heres my email for anyone who feels like it could help them: felixlarocque21@gmail.com","how i got my life back in orderthis is a long post, sorry for my english. i want to start by saying that it is important to listen to your psychologist, treat yourself like you would treat your bestfriend if he/she were in your shoes. also, take your medication, i know how hard it is to get out of bed when dealing with depression and (for me) meds were the thing that suppressed my negative emotions just enough to get things done. remember, getting out of depression is a marathon, not a sprint. about a year and a half ago, i started really hating my self, and for good reasons (i thought). i started doing all the drugs i could find, almost overdosed a couple times cause i didnt care, i stopped working, stopped going to school, stopped working out, i was very self destructive because i was gonna kill myself whenever it was too much. in september, i was a good student, i was kind and had never touched drugs, i was working out 5 times a week and by december, i had nothing, i was done, i never thought i could ever get out of the mess that was my life. just before christmas, my best friend (now girlfriend) invited me to go to miami with her for a week, i said yes and during that trip, we finally got in a serious relationship, this is what i needed, i had hope, i wanted to get better. so when i got back, i started seeing a psychologist, and i started taking medication, sadly, i couldnt afford to see my psychologist after a while, but i still had hope. heres how i got my life back on track. i was going to ""tidy up my life"", one thing at a time. first, i wanted to get rid of my addiction, to do that, i smoked 3 joints a day until i was used to it, then i smoked 2 joints a day, after that it was 1, then i started only smoking before going to sleep and one day, i was smoking 1 puff a day before going to sleep. after that, i stopped. remember, its marathon. now i was addiction free, but i wanted to clean my place because it was disgusting, i think some of you will relate, i couldnt see the floor in my room, my bed was a storage space for anything that could fit, i had nothing to eat with because all of my plates had been in the sink for a while, i had no furniture except for a tv, a couch and an old table that used to be white but was now a mess of colours. so one day, i decided to clean my room, it took me 3 days, but in the end my clothes were in bags and my bed was only a bed. after that i did the kitchen, same thing there, it took days and i had to throw the plates in the garbage, but it was clean, same thing for the living room and bathroom. after 2 weeks, my place was clean. now, i was addiction free and my place was clean, but i wanted some money so i got a job, i worked only 14h a week because i knew that i couldnt do more, and it was okay. now, i was addiction free, my place was clean and i made 800$ a month. but i was tired of the meds, so i stopped taking them, sadly, i didnt stop the right way so it was hard. i started being very anxious all the time, i was having panic attacks and for 3 months life was hell, but since my life was somewhat in order now, i got through it, and after 3 months, i wasnt on any medication. now i was addiction free blah blah blah and i wanted some furniture, so i got some with my 800$ a month, i got a 50$ table and a 25$ table for my tv. not much but it was good. then, at the beginning of summer, i started working out again, life wasnt great, but it was okay. then, in august, i wanted to go back to school, so i looked for online programs in universities outside of quebec because my grades were too bad to go to university in quebec, and god damn i was accepted by a university for the fall semester. i was now addiction free, my place was clean, i had a stable job, i was working out, i was off meds and i was in university. my life was back on track. right now, im making 1700$ a month working a job i like, im getting ready for my exams, im trying to start a buisness with my cousin and i smoke weed from time to time. it was a marathon, it took a year and a half but im now better than i ever was. dont give up, it is a lot to deal with, there is a lot of work to do, but every single thing you do will make the next thing easier, take your time. do not hate yourself for who you are, depression is not a choice, it is real, and there is hope. do one thing at a time, not matter how small it is, if you wash 1 dish a day, well god damn it is still 1 more every day, remember it took me 1 year and a half to deal with my shit. i hope my post was enough to help at least one person. depression is a marathon, if you sprint, you will lose your breath, but if you walk, you will find the end, no matter how far it is. heres my email for anyone who feels like it could help them:"
5,73448,"i don't wanna diebeen without a job for almost 2years now, living miserable depressed life. tried killing myself couple of times but was afraid of the pain. wanted to join airforce, also tried for electrical engineering jobs, well nobody wants to take me in. as am physically strong, at least wanna join terror groups. i don't care anymore, what society thinks. can ping me personally mail: vinuhigs11121@gmail.com","i don't wanna diebeen without a job for almost 2years now, living miserable depressed life. tried killing myself couple of times but was afraid of the pain. wanted to join airforce, also tried for electrical engineering jobs, well nobody wants to take me in. as am physically strong, at least wanna join terror groups. i don't care anymore, what society thinks. can ping me personally mail:"
6,212511,i just want someone that i can talk to.i really just want to talk to somebody about how i feel. add me on yahoo messenger please? iamsosorrylove@yahoo.com,i just want someone that i can talk to.i really just want to talk to somebody about how i feel. add me on yahoo messenger please?
7,48830,i want to diei want to die really bad. im couldn't do it alone. if anyone interested to help me. email. saavudanee@protonmail.com. but only of you you really interested and you are above 18.i hate me .my life and everything. it's a waste to.live.i don't hav anything to live for,i want to diei want to die really bad. im couldn't do it alone. if anyone interested to help me. email. . but only of you you really interested and you are above 18.i hate me .my life and everything. it's a waste to.live.i don't hav anything to live for
8,27298,"back on bills, i need help.i work as a waiter. i serve coffee. i get about 1,000 a month. part of it is given to my parents as allowance due to me being asian and what not. i hardly have time to sleep, i work 14 hours a day. i just need somebody to get me $300 so i can pay my bills. i don't know what to do now, i'm on the verge of suicide. if anybody wants to help me, my paypal is thatpizzaguy@hotmail.sg i need $300. it should be enough to cover the bills for now. i'm going to sleep, hopefully somebody is kind enough to help me. i've had my friends bail me out of this sort of situation many times but they just don't care about me anymore. otherwise, anybody have a good way to commit suicide?","back on bills, i need help.i work as a waiter. i serve coffee. i get about 1,000 a month. part of it is given to my parents as allowance due to me being asian and what not. i hardly have time to sleep, i work 14 hours a day. i just need somebody to get me $300 so i can pay my bills. i don't know what to do now, i'm on the verge of suicide. if anybody wants to help me, my paypal is i need $300. it should be enough to cover the bills for now. i'm going to sleep, hopefully somebody is kind enough to help me. i've had my friends bail me out of this sort of situation many times but they just don't care about me anymore. otherwise, anybody have a good way to commit suicide?"
9,65496,"communei am considering starting a commune in california on a 20 acre ranch. it would either be free or very inexpensive like 25-100 a month and we could do fundraising and take donations in nearby cities. cheap food like rice and beans would be cooked and we could have chicken coops and garden. water and camping gear will be provided. transportation to town provided. we can have campfires and coffee or tea. my goal is to provide nonprofit service to homeless, hopeless, lonely, and hungry people who may be penniless and or suicidal and in deep debt and unable to get and hold a job. police will be welcome on site so illegal drug addicts criminals will not be welcome and will be banned. i need a substantial list of names of interested people to proceed and all ideas considered. please email phillipanthonybiondo@gmail.com with your name and contact info and any ideas like how to recruit members, please try to get names of interested people, i really want to do this, i think 10 people minimum would be enough to start but ideally would like to have a large group and maybe someday a huge community.","communei am considering starting a commune in california on a 20 acre ranch. it would either be free or very inexpensive like 25-100 a month and we could do fundraising and take donations in nearby cities. cheap food like rice and beans would be cooked and we could have chicken coops and garden. water and camping gear will be provided. transportation to town provided. we can have campfires and coffee or tea. my goal is to provide nonprofit service to homeless, hopeless, lonely, and hungry people who may be penniless and or suicidal and in deep debt and unable to get and hold a job. police will be welcome on site so illegal drug addicts criminals will not be welcome and will be banned. i need a substantial list of names of interested people to proceed and all ideas considered. please email with your name and contact info and any ideas like how to recruit members, please try to get names of interested people, i really want to do this, i think 10 people minimum would be enough to start but ideally would like to have a large group and maybe someday a huge community."


###**Eliminación Menciones (@Usuario)**

En este apartado se identificarán y eliminarán las menciones típicas de redes sociales, es decir, aquellas expresiones que comienzan con @ y hacen referencia a un usuario. Este tipo de elementos no aportará información útil para el análisis y podría interferir con la interpretación del lenguaje.

Primero se generará una muestra para visualizar cómo lucirán los textos antes y después de limpiar estas menciones. Una vez verificado el comportamiento, la transformación se aplicará sobre todo el conjunto de datos, dejando los textos libres de referencias directas a usuarios y conservando únicamente el contenido relevante para las siguientes etapas.

In [0]:
# Detectar registros con menciones
mention_pattern = r"@\w+"

df_mention_preview = (
    df
    .filter(F.col("text").rlike(mention_pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Crear columna after (limpieza)
df_mention_preview = df_mention_preview.withColumn(
    "after",
    F.regexp_replace("before", mention_pattern, "")
)

# Filtrar solo los registros donde sí hubo cambio
df_mention_changed = df_mention_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a pandas
preview_pd = df_mention_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", mention_pattern, "")
)

Unnamed: 0,id,before,after
0,55914,i love arguing with people and they assume stuff about me i remember being told i couldn't say f@g because i *don't look gay enough*. i'm not republican because i disagree with you.,i love arguing with people and they assume stuff about me i remember being told i couldn't say f because i *don't look gay enough*. i'm not republican because i disagree with you.
1,152734,"i'm fucking donei just lost all the fucking money i had. i've been fucking sim swapping for 7 months now and like an idiot i fucking lost my recovery words for my blockchain. i just lost my $350,000.00 in crypto and blockchain won't fucking help me. i'm fucking done with life. i'm giving away my instagrams. @pin & @frown. if anyone wishes to see me kill myself after i shoot up the entire fucking area, i will be doing so tomorrow at 12:00am at eastlink centre in grande prairie. but if anyone wishes to fucking end my life for me i live at 9501 72 ave unit 136 grande prairie, ab t8v 6a1. like you can think i'm trolling but wait and see what i fucking do to everyone in that fucking building and on that fucking street you fucking pussies. no one can fucking touch me. what are they gonna do to me? a fucking 16 year old. nothing. hahahaha enjoy losing ur fucking worthless lives","i'm fucking donei just lost all the fucking money i had. i've been fucking sim swapping for 7 months now and like an idiot i fucking lost my recovery words for my blockchain. i just lost my $350,000.00 in crypto and blockchain won't fucking help me. i'm fucking done with life. i'm giving away my instagrams. & . if anyone wishes to see me kill myself after i shoot up the entire fucking area, i will be doing so tomorrow at 12:00am at eastlink centre in grande prairie. but if anyone wishes to fucking end my life for me i live at 9501 72 ave unit 136 grande prairie, ab t8v 6a1. like you can think i'm trolling but wait and see what i fucking do to everyone in that fucking building and on that fucking street you fucking pussies. no one can fucking touch me. what are they gonna do to me? a fucking 16 year old. nothing. hahahaha enjoy losing ur fucking worthless lives"
2,93744,can we still post pictures? i'm trying to post some pics of some c@ke but reddit has taken 15 minutes stuck on submitting,can we still post pictures? i'm trying to post some pics of some c but reddit has taken 15 minutes stuck on submitting
3,106923,"crisis text will only talk to you every other day.when you text 741741 they auto respond with a message that you can only text with them for 45 minutes at a time, every 48 hours. no idea if the hold time counts. thanks assholes. what would i do without waiting around for your shitty resources and two-hour long hold time. i guess the only option left is to kill myself. everyone abandoned me. and don't @me with that ""it gets better go to therapy"" shit. not all problems are temporary.","crisis text will only talk to you every other day.when you text 741741 they auto respond with a message that you can only text with them for 45 minutes at a time, every 48 hours. no idea if the hold time counts. thanks assholes. what would i do without waiting around for your shitty resources and two-hour long hold time. i guess the only option left is to kill myself. everyone abandoned me. and don't with that ""it gets better go to therapy"" shit. not all problems are temporary."
4,306424,my phone is to dry here is one of my socials @s if you wanna talk. lukewaid19 u have to find which one though. someone talk to me,my phone is to dry here is one of my socials if you wanna talk. lukewaid19 u have to find which one though. someone talk to me
5,325563,"just a rant because this guy assumes everything. so i was just looking through people’s snapchat stories and noticed someone posted something with a bunch of popular girls from school with their @s. i went to add them because i’ve talked to them a few times but don’t have their snap. either way one of the girls snaps me and turns out it’s her boyfriend saying “sorry bud she’s got a mans.” i’m like what? just because a guy adds a girl doesn’t mean they wants to talk to them or even likes them. i did like this girl in freshman year but that was 2 years ago. i’ve moved on and haven’t talked to her in atleast a year. this guy goes and digs up dms from years ago. either way i don’t think they will be together very long, she will find out the type of person he is soon enough.","just a rant because this guy assumes everything. so i was just looking through people’s snapchat stories and noticed someone posted something with a bunch of popular girls from school with their . i went to add them because i’ve talked to them a few times but don’t have their snap. either way one of the girls snaps me and turns out it’s her boyfriend saying “sorry bud she’s got a mans.” i’m like what? just because a guy adds a girl doesn’t mean they wants to talk to them or even likes them. i did like this girl in freshman year but that was 2 years ago. i’ve moved on and haven’t talked to her in atleast a year. this guy goes and digs up dms from years ago. either way i don’t think they will be together very long, she will find out the type of person he is soon enough."
6,130076,"(vent)i hate feeling like this. hate it. fffffffnnnnndmsmsm.uuuughghhghghfhjdxbhbhifgtf vbbhhbk. uuugggghgghgfggggggggg. jhhhjhtfhuoiikjjjhknjhgfdrtttttyt. f*******★***★★*********★*********k. gooooooooooooood. hddhdudduhdhfhvvdhxhfudu.*@@(@(@@8@8@9#9#88#8@8!&'&!""%#%4)&$&3&37636#63%$3'%%-&3&3@&3&@*3&2:35#$4""&%'.#!xdxxxxdrdxxxx∞.%%%5xdddffffgffffdssxxcvx56%""%65%%%66)6)777&7-------)))((((((($.xffffgggggb¡. but sceriously, i just needed to be heard and to vent. you should try this. just type it out. as long as it needs to be. it feels great.*;@(@@201)10100&?&xvdhb cxbxbxbbxbxxx","(vent)i hate feeling like this. hate it. fffffffnnnnndmsmsm.uuuughghhghghfhjdxbhbhifgtf vbbhhbk. uuugggghgghgfggggggggg. jhhhjhtfhuoiikjjjhknjhgfdrtttttyt. f*******★***★★*********★*********k. gooooooooooooood. hddhdudduhdhfhvvdhxhfudu.*@@(@(@#9#88#8!&'&!""%#%4)&$&3&37636#63%$3'%%-&3&3@&3&@*3&2:35#$4""&%'.#!xdxxxxdrdxxxx∞.%%%5xdddffffgffffdssxxcvx56%""%65%%%66)6)777&7-------)))((((((($.xffffgggggb¡. but sceriously, i just needed to be heard and to vent. you should try this. just type it out. as long as it needs to be. it feels great.*;@(@)10100&?&xvdhb cxbxbxbbxbxxx"
7,5894,help me pleasei was going to talk about my problems and why i want to kill myself tonight but i'm scared of people can someone please text me on snapchat @ryxntumblzz,help me pleasei was going to talk about my problems and why i want to kill myself tonight but i'm scared of people can someone please text me on snapchat
8,154579,"honestly, what's the point anymore? i have done everything, yet i can never truly feel that i am freei got top 3 in high honors in high school and,top 10 honors in elementary school, with few medals here and there. i still need to go through all that tedious difficult bullsh!t again and again once im in college? and the fact that even if i did top the board exams, i have to go through all that grueling work again when i have a job? and when i have a super lucrative job, a loving family, trustworthy friends, and a bed of roses since i have everything i could have imagined i wanted i still have to go through all that grueling tedious work again. because of the fact that evil exists and that there are at least some people that are not afraid to do horrible things to you to get what they wanted? like stealing your hardearned work? and that not all of my friends and family are absolutely modest, and any of them can betray me once there comes a time where i need to lean on for dear life. so i have to maintain it, by keeping up my best? as if i'm not human? that has no limits and no emotions? just performs and thats f@cking it. when can i have freedom? a peaceful life where nothing can bother me, when? when im dead? cus i really feel like i want to end myself. because what is the point anymore? how can i be happy? when pursuing it seems impossible since there will always be more problems that will arise when i solve the current ones. when will i take a break? why is everything in life a competition? why do humans do this to each other? when in reality there is no point to it, why can't we all be at peace? why do i and others have to go through all of this with no end, why? &#x200b; i just need somebody who truly understands, i just need somebody who thinks not everything has to be a competition, why is reality like this? i know it can't be removed, but why can't it be changed? am i the only one? i need to know","honestly, what's the point anymore? i have done everything, yet i can never truly feel that i am freei got top 3 in high honors in high school and,top 10 honors in elementary school, with few medals here and there. i still need to go through all that tedious difficult bullsh!t again and again once im in college? and the fact that even if i did top the board exams, i have to go through all that grueling work again when i have a job? and when i have a super lucrative job, a loving family, trustworthy friends, and a bed of roses since i have everything i could have imagined i wanted i still have to go through all that grueling tedious work again. because of the fact that evil exists and that there are at least some people that are not afraid to do horrible things to you to get what they wanted? like stealing your hardearned work? and that not all of my friends and family are absolutely modest, and any of them can betray me once there comes a time where i need to lean on for dear life. so i have to maintain it, by keeping up my best? as if i'm not human? that has no limits and no emotions? just performs and thats f it. when can i have freedom? a peaceful life where nothing can bother me, when? when im dead? cus i really feel like i want to end myself. because what is the point anymore? how can i be happy? when pursuing it seems impossible since there will always be more problems that will arise when i solve the current ones. when will i take a break? why is everything in life a competition? why do humans do this to each other? when in reality there is no point to it, why can't we all be at peace? why do i and others have to go through all of this with no end, why? &#x200b; i just need somebody who truly understands, i just need somebody who thinks not everything has to be a competition, why is reality like this? i know it can't be removed, but why can't it be changed? am i the only one? i need to know"
9,236015,"my friends hate mei made a mistake. everything was finally going well at school. i had friends, i have a boyfriend, my anxiety and depression were under control, it was okay. these friends - two close ones - let's call them amy and hannah and i were part of two group chats. one with just us three, and one with us three plus amy's boyfriend, my boyfriend, and two other male friends. each of the group chats were full of good banter, memes, and constant discussion. on the group chat with just myself and amy and hannah we were joking about how horny we were and amy was talking about her boyfriend and how much she missed him and joking about making all her decisions with her ""pussy"". i was like ""someone should screenshot this and send it to the other group chat"". hannah laughed along and then amy said, ""okay"". she said i could send it. i took a screenshot. she said to blank out her name. i did. she said - direct quote - ""just don't @ michael"" (her boyfriend - not real name). i took that obviously as don't mention him in the message - tag him as in @michaelsmith. she said i should send the cropped screenshot and say ""guess who?"" along with it. so i did all that and sent it. immediately i got shit. apparently, i was meant to black out michael's name too. instantly i felt like shit. i began to apologize over and over. it's been two days since this happened. neither amy nor hannah are talking to me. amy thinks i did it on purpose. she has been saying petty and obviously snarky comments in reply to anything i send in the big group chat. i feel like throwing up and i've been crying constantly. i feel like i have no friends, i feel like i have no one to talk to, i feel like they may never talk to me again.","my friends hate mei made a mistake. everything was finally going well at school. i had friends, i have a boyfriend, my anxiety and depression were under control, it was okay. these friends - two close ones - let's call them amy and hannah and i were part of two group chats. one with just us three, and one with us three plus amy's boyfriend, my boyfriend, and two other male friends. each of the group chats were full of good banter, memes, and constant discussion. on the group chat with just myself and amy and hannah we were joking about how horny we were and amy was talking about her boyfriend and how much she missed him and joking about making all her decisions with her ""pussy"". i was like ""someone should screenshot this and send it to the other group chat"". hannah laughed along and then amy said, ""okay"". she said i could send it. i took a screenshot. she said to blank out her name. i did. she said - direct quote - ""just don't @ michael"" (her boyfriend - not real name). i took that obviously as don't mention him in the message - tag him as in . she said i should send the cropped screenshot and say ""guess who?"" along with it. so i did all that and sent it. immediately i got shit. apparently, i was meant to black out michael's name too. instantly i felt like shit. i began to apologize over and over. it's been two days since this happened. neither amy nor hannah are talking to me. amy thinks i did it on purpose. she has been saying petty and obviously snarky comments in reply to anything i send in the big group chat. i feel like throwing up and i've been crying constantly. i feel like i have no friends, i feel like i have no one to talk to, i feel like they may never talk to me again."


###**Eliminación Hashtags (#Palabra)**

En este apartado se detectarán y eliminarán los hashtags, aquellos términos precedidos por el símbolo # que suelen usarse para etiquetar temas. Aunque son útiles para categorizar contenido, no aportarán significado directo al análisis textual y podrán generar ruido innecesario.

Primero se mostrará una vista previa que permitirá comparar cómo luce el texto antes y después de remover los hashtags. Esto ayudará a validar la transformación. Luego, una vez comprobado el resultado, la limpieza se aplicará a todo el conjunto de datos, asegurando que las etiquetas queden fuera y el contenido permanezca centrado en el mensaje real del usuario.

In [0]:
# Detectar registros con hashtags
hashtag_pattern = r"#\w+"

df_hash_preview = (
    df
    .filter(F.col("text").rlike(hashtag_pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Generar columna after (limpieza)
df_hash_preview = df_hash_preview.withColumn(
    "after",
    F.regexp_replace("before", hashtag_pattern, "")
)

# Filtrar solo los registros donde sí hubo cambio
df_hash_changed = df_hash_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a Pandas
preview_pd = df_hash_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", hashtag_pattern, "")
)

Unnamed: 0,id,before,after
0,142203,"gonna turn 15 soon it's like 9:27 pm, i'm tired as fuck, prob gonna sleep &#x200b; goodnight, might change my flair when i wake up","gonna turn 15 soon it's like 9:27 pm, i'm tired as fuck, prob gonna sleep &; goodnight, might change my flair when i wake up"
1,316893,"heartbroken.16 y/o south of spain &#x200b; after 2 years i've finally managed to get out with my crush we have been for over 4 months almost 5 already, what i didn't know was that she was suicidal, just because of me. i went to her place had a cute time hugging each other and that but she went to the bathroom i realized that her diary was on the desk i couldn't help myself i had to take it with me, and i rode it all. &#x200b; learnt out she had been suffering to be with me for also some years, she really liked me and didn't want to lose me on the last pages i rode that she was that kind of person that gets tired of others fast even if she doesn't want to and some pages where you cant even see whats written because they're full of tears except some things that i could read were: &#x200b; can't image not being with him , we probably wont last long because i know myself i will get tired of him and suffer even more than these years of wanting to be with him, nothing makes sense anymore i should't even be here. &#x200b; &#x200b; &#x200b; she didn't realize i rode it all since i took it and brought it back the next day, &#x200b; now everything makes sense she's been really sad lately can't even hug her im really worried and she doesn't admit she's sad, she is all i basically have , the only thing that wants me to wake up every morning call her talk with her, be with her &#x200b; i guess this will all take an end the only person ive truly loved, im just constantly thinking of her crying 24/7 when im home or just alone i dont want to end this but its coming closer everyday the only person i can truly tell the truth to, someone im not scared of talking to asking for help anything this is just coming to an end and there is no need for me anymore just cant take it anymore","heartbroken.16 y/o south of spain &; after 2 years i've finally managed to get out with my crush we have been for over 4 months almost 5 already, what i didn't know was that she was suicidal, just because of me. i went to her place had a cute time hugging each other and that but she went to the bathroom i realized that her diary was on the desk i couldn't help myself i had to take it with me, and i rode it all. &; learnt out she had been suffering to be with me for also some years, she really liked me and didn't want to lose me on the last pages i rode that she was that kind of person that gets tired of others fast even if she doesn't want to and some pages where you cant even see whats written because they're full of tears except some things that i could read were: &; can't image not being with him , we probably wont last long because i know myself i will get tired of him and suffer even more than these years of wanting to be with him, nothing makes sense anymore i should't even be here. &; &; &; she didn't realize i rode it all since i took it and brought it back the next day, &; now everything makes sense she's been really sad lately can't even hug her im really worried and she doesn't admit she's sad, she is all i basically have , the only thing that wants me to wake up every morning call her talk with her, be with her &; i guess this will all take an end the only person ive truly loved, im just constantly thinking of her crying 24/7 when im home or just alone i dont want to end this but its coming closer everyday the only person i can truly tell the truth to, someone im not scared of talking to asking for help anything this is just coming to an end and there is no need for me anymore just cant take it anymore"
2,260107,"i got accepted to the best university in the country. bois and girls, i did it. today i got the results of my exams. i got accepted to the most famous and popular university in my country. i’m just happy that i finished school. that shit made me depressed and anxious af. university is a whole different thing. #lets gooooooooo","i got accepted to the best university in the country. bois and girls, i did it. today i got the results of my exams. i got accepted to the most famous and popular university in my country. i’m just happy that i finished school. that shit made me depressed and anxious af. university is a whole different thing. gooooooooo"
3,308763,"i feel off, typing this out to give myself some perspective24/m i've had depression on and off most of my life. ever since i was in grade school i kind of knew i was a little off. i've never really looked for help about it, because until not too long ago i wasn't even sure i had it. i only recently got diagnosed in navy basic training of all places. that was a fun month to say the least. i felt low, really low, after getting out. didn't really think of suicide though. i eventually got a decent job by my standards and have been at it for a few months now. i signed up for college classes when i was feeling pretty good a few months back. however, a few days ago i got a huge wave of anxiety about the whole thing and just dropped all the classes and canceled everything. since then i've had thoughts on and off of suicide. i'll think of it for hours, when i'm working, when i'm driving, reading, talking to people, etc. then the switch will flip and it'll be the last thing on my mind. i've had the thoughts in limited amounts before but now they are much more vivid, organized. i've even narrowed down the methods. #1 being a bridge about 5 minutes away from my house. never once thought about jumping off of it until yesterday. had to drive over it today and was noticing how low the railings were. one part of me knows it's stupid, another thinks it's inevitable. just typing that bridge sentence up made me a bit teary eyed. i don't *really* want to end it, but on the other hand it's the one thing i really want most. right now i don't feel the strong urge i felt earlier today, but the thoughts are still back there. wow, just rereading that bridge sentence. i feel like jumping would be my way to go. i'm making excuses for myself not to jump for now. a mix between ""i'll wait for the weather to warm up first"", to ""if i'm going to jump off a bridge it might as well be the golden gate"" at least that way i've got a goal to hold out for. never been to cali before, and i want to at least feel nice and warm before the jump. the plane tickets are pretty cheap. maybe the drive to just get it over with will make me impatient, but i don't know. main thing keeping me from doing it is family. they haven't done anything wrong, and i know they'll blame themselves for it. on one hand they haven't helped me with my depression, but on the other they have pretty much been there for me in general. it's tough thought. in the end, if i do go through with it, i'll probably try and do it in a way where i can't be id'ed. like jump off the gg bridge without id. based on my families track record, it would take at least 1-2 months before they realized something was up at the earliest. i'd rather they think i ran off than killed myself. tldr: i want to say i'll never do it, but who knows.","i feel off, typing this out to give myself some perspective24/m i've had depression on and off most of my life. ever since i was in grade school i kind of knew i was a little off. i've never really looked for help about it, because until not too long ago i wasn't even sure i had it. i only recently got diagnosed in navy basic training of all places. that was a fun month to say the least. i felt low, really low, after getting out. didn't really think of suicide though. i eventually got a decent job by my standards and have been at it for a few months now. i signed up for college classes when i was feeling pretty good a few months back. however, a few days ago i got a huge wave of anxiety about the whole thing and just dropped all the classes and canceled everything. since then i've had thoughts on and off of suicide. i'll think of it for hours, when i'm working, when i'm driving, reading, talking to people, etc. then the switch will flip and it'll be the last thing on my mind. i've had the thoughts in limited amounts before but now they are much more vivid, organized. i've even narrowed down the methods. being a bridge about 5 minutes away from my house. never once thought about jumping off of it until yesterday. had to drive over it today and was noticing how low the railings were. one part of me knows it's stupid, another thinks it's inevitable. just typing that bridge sentence up made me a bit teary eyed. i don't *really* want to end it, but on the other hand it's the one thing i really want most. right now i don't feel the strong urge i felt earlier today, but the thoughts are still back there. wow, just rereading that bridge sentence. i feel like jumping would be my way to go. i'm making excuses for myself not to jump for now. a mix between ""i'll wait for the weather to warm up first"", to ""if i'm going to jump off a bridge it might as well be the golden gate"" at least that way i've got a goal to hold out for. never been to cali before, and i want to at least feel nice and warm before the jump. the plane tickets are pretty cheap. maybe the drive to just get it over with will make me impatient, but i don't know. main thing keeping me from doing it is family. they haven't done anything wrong, and i know they'll blame themselves for it. on one hand they haven't helped me with my depression, but on the other they have pretty much been there for me in general. it's tough thought. in the end, if i do go through with it, i'll probably try and do it in a way where i can't be id'ed. like jump off the gg bridge without id. based on my families track record, it would take at least 1-2 months before they realized something was up at the earliest. i'd rather they think i ran off than killed myself. tldr: i want to say i'll never do it, but who knows."
4,152957,"guys, what happened to this award? &#x200b;","guys, what happened to this award? &;"
5,215765,"random thought #20 most people live boring lives. from birth they learn, and learn and learn and learn until the get a job, then they do that job for years and years. maybe they'll find love in between but that isn't always a good thing. after you retire there isn't much to do. you'll be old and probably wouldn't understand the current technology to well, maybe you'd go back and play minecraft or something, or do a project. but in the end you're just living a boring life. for some people it's better though. things may be easier, and other things harder. being rich must be nice.","random thought most people live boring lives. from birth they learn, and learn and learn and learn until the get a job, then they do that job for years and years. maybe they'll find love in between but that isn't always a good thing. after you retire there isn't much to do. you'll be old and probably wouldn't understand the current technology to well, maybe you'd go back and play minecraft or something, or do a project. but in the end you're just living a boring life. for some people it's better though. things may be easier, and other things harder. being rich must be nice."
6,184299,annie are you okay? &#x200b;,annie are you okay? &;
7,124261,"the feeling of wanting to scream but nothing comes outi just dont want it anymore today she was hitting me, and i'm so used to it, i crave it. i craved she made a mistake, maybe would knock me down and my head would bust open, i'd die. any moment something is about to happen, i think about what it would be like if it killed me, and everythought i have satisfys me. &#x200b; i can't talk to friends about it, they don't know how to handle it. i guess i can't blame them, i don't really know how to handle myself either. i'm stuck. i don't know what to do. killing myself would make them sad, but they don't even listen to me, they tell me to get professional help, and i know they mean that with good intent but but ic ant","the feeling of wanting to scream but nothing comes outi just dont want it anymore today she was hitting me, and i'm so used to it, i crave it. i craved she made a mistake, maybe would knock me down and my head would bust open, i'd die. any moment something is about to happen, i think about what it would be like if it killed me, and everythought i have satisfys me. &; i can't talk to friends about it, they don't know how to handle it. i guess i can't blame them, i don't really know how to handle myself either. i'm stuck. i don't know what to do. killing myself would make them sad, but they don't even listen to me, they tell me to get professional help, and i know they mean that with good intent but but ic ant"
8,277347,"about to suicide in the next few days. (hopefully)15.5 years old, the cliche i had low self esteem and self hate since i was a little boy. the little boy became big boy, really fat over the years. in middle school the boy was bullied by the 1 strong guy who was liked by the class. little boy was bad at school, but over time he somehow got to a computer science class. the boy realized after 2 months that he lost all of his motivation, didn't do anything, no one bothered really. the boy failed and failed and failed and failed and failed. the boy realized that he is about to fall out of the honor class to the lowest level class. the boy was also interested in economics (the only thing kept him alive), and he did 1+1 and he knew that if he wasn't going to be a software engineer he is doomed to join the majority of the population which has the same real wage since 1998. &#x200b; he had many medical problems which made him practically, unemployable at low skill employment. &#x200b; the boy has decided a few months back to cease relationships with all of his friends because he didn't want to be a bad influence on them. &#x200b; there's a relatively tall mall/commercial building in a 5 minutes walk from his home. tomorrow, on the 26/2/2019, he is going to head to the top floor of the building, and he is going to jump. &#x200b; hopefully, the boy might have the guts to do it. &#x200b; to be continued, or not.","about to suicide in the next few days. (hopefully)15.5 years old, the cliche i had low self esteem and self hate since i was a little boy. the little boy became big boy, really fat over the years. in middle school the boy was bullied by the 1 strong guy who was liked by the class. little boy was bad at school, but over time he somehow got to a computer science class. the boy realized after 2 months that he lost all of his motivation, didn't do anything, no one bothered really. the boy failed and failed and failed and failed and failed. the boy realized that he is about to fall out of the honor class to the lowest level class. the boy was also interested in economics (the only thing kept him alive), and he did 1+1 and he knew that if he wasn't going to be a software engineer he is doomed to join the majority of the population which has the same real wage since 1998. &; he had many medical problems which made him practically, unemployable at low skill employment. &; the boy has decided a few months back to cease relationships with all of his friends because he didn't want to be a bad influence on them. &; there's a relatively tall mall/commercial building in a 5 minutes walk from his home. tomorrow, on the 26/2/2019, he is going to head to the top floor of the building, and he is going to jump. &; hopefully, the boy might have the guts to do it. &; to be continued, or not."
9,5802,real suppleroot hours #922 who up? what've you been up to today?,real suppleroot hours who up? what've you been up to today?


###**Expansión Contracciones & Jerga**

En este apartado se normalizará el lenguaje informal del texto, ampliando tanto contracciones (“don’t”, “i’m”, “can’t”) como expresiones de slang o jerga digital (“idk”, “lol”, “btw”). Este tipo de palabras, pueden dificultar la correcta interpretación del lenguaje por parte de los modelos, ya que acortan o distorsionan conceptos que deberían analizarse en su forma completa.

Para mejorar la claridad del texto, se usará un diccionario que reemplazará estas formas abreviadas por sus equivalentes plenamente expresados. Antes de aplicar el cambio de manera global, se mostrará una vista previa que permitirá comparar cómo se verá el contenido antes y después de la expansión. Con esto se garantizará que el texto gane coherencia y significado, facilitando el procesamiento en etapas posteriores.

In [0]:
# Diccionario contracciones + slang
abb = {
    # Contracciones
    "ain't": "am not",
    "aren't": "are not",
    "can't": "cannot",
    "can't've": "cannot have",
    "'cause": "because",
    "could've": "could have",
    "couldn't": "could not",
    "didn't": "did not",
    "doesn't": "does not",
    "don't": "do not",
    "dont": "do not",
    "hadn't": "had not",
    "hasn't": "has not",
    "haven't": "have not",
    "he's": "he is",
    "i'm": "i am",
    "im": "i am",
    "i've": "i have",
    "isn't": "is not",
    "it's": "it is",
    "let's": "let us",
    "mightn't": "might not",
    "mustn't": "must not",
    "shan't": "shall not",
    "she's": "she is",
    "should've": "should have",
    "shouldn't": "should not",
    "that's": "that is",
    "there's": "there is",
    "they're": "they are",
    "they've": "they have",
    "wasn't": "was not",
    "weren't": "were not",
    "what's": "what is",
    "where's": "where is",
    "who's": "who is",
    "won't": "will not",
    "wouldn't": "would not",
    "you're": "you are",
    "you've": "you have",
    "y'all": "you all",

    # Slang
    "idk": "i do not know",
    "omg": "oh my god",
    "wtf": "what the fuck",
    "wth": "what the hell",
    "lmao": "laughing my ass off",
    "lol": "laughing out loud",
    "btw": "by the way",
    "brb": "be right back",
    "imo": "in my opinion",
    "imho": "in my humble opinion",
    "irl": "in real life",
    "u": "you",
    "ur": "your",
    "thx": "thanks",
    "pls": "please",
    "plz": "please",
    "ppl": "people"
}

# Compilar regex
abb_re = re.compile(r"(?i)\b(" + "|".join(map(re.escape, abb.keys())) + r")\b")

# Función expandir contracciones
def expand_contractions(text):
    if text is None:
        return text

    def replace(match):
        key = match.group(0).lower()
        return abb.get(key, key)

    return abb_re.sub(replace, text)

expand_contractions_udf = F.udf(expand_contractions, StringType())

# Crear preview Antes/Despues
df_preview = (
    df
    .withColumn("before", F.col("text"))
    .withColumn(
        "after",
        expand_contractions_udf(
            F.regexp_replace("before", "’", "'")
        )
    )
    .select("before", "after")
    .orderBy(F.rand())
    .limit(15)
)

# Convertir a pandas
preview_pd = df_preview.toPandas()
html_preview = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_preview}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    expand_contractions_udf(
        F.regexp_replace("text", "’", "'")
    )
)

Unnamed: 0,before,after
0,"goodbyelife is fucking retarded. i cannot fucking relax, everyone and everything just keeps fucking annoying me, i keep losing at everything. its all complete fucking bullshit. its the same every single fucking day. and now that school is starting and i need to move to a different city, i just dont feel like living. im extremely stressed, and my parents are doing the opposite of helping. it feels like i wasnt supposed to be fucking born. i just keep losing, i keep getting beaten down, i cant get up anymore. ill just finish myself off","goodbyelife is fucking retarded. i cannot fucking relax, everyone and everything just keeps fucking annoying me, i keep losing at everything. its all complete fucking bullshit. its the same every single fucking day. and now that school is starting and i need to move to a different city, i just do not feel like living. i am extremely stressed, and my parents are doing the opposite of helping. it feels like i wasnt supposed to be fucking born. i just keep losing, i keep getting beaten down, i cant get up anymore. ill just finish myself off"
1,"my friend's situation is really bad, should i tell his parents?my friend is an introvert and he is really shy. there's a girl that he ""indirectly"" likes. he keeps thinking about her, and gets jealous if some guy, or say, any guy talks to her, in the college. he even warned a guy to stay away from her. and also called a guy who he thought was talking to her, just to check. and he sometimes spaces out, becomes blank, and tears start coming from his eyes. he also says and thinks that everyone in the college is better than him, and also that he is a ""hopeless piece of shit"". and also that he doesn't have any talent or whatsoever, and just wants to give up. he also tells me that everyone in his class thinks he is a ""loser"", and no one wants to talk to him. me and my other friend are trying everything we can, to help him. but we are limited by the business in our own individual lives. what do we do to help him? apart from that, he was also involved in a toxic relationship with a couple. the girl would annoy him telling him why isn't her boyfriend messaging her. and he (my friend) would also talk to that ""boyfriend"" all the time, and even play online games with him, and then at one point, he grew really annoyed of my friend's messages, and told him to stay away. and also, there is too much workload on him right now, from his college. and yesterday, as it turns out, he tried to ""kill himself"". it was night, i was leaving his home, and then he insisted that he comes to drop me off. i said okay. but this time, he chose his car, rather than his two-wheelers. i asked him whether he drove car everyday for college, and he said no. i also asked him whether he drove it before this time, he also said no. the tone of his reply was monotonic. he replied with no interest, and then after dropping me off, we said bye, and then he was off his home. but i waited for him to take a u-turn, so i could see him returning his home. it was just a hunch. but i didn't see him take the turn, instead he took another road. he was all alone in the car, filled with misery and pain, probably driving on the highway, with no one inside. and then i got a reply from him saying that he has reached home. and now he uses me and my other friend as a ""diary"". he constantly texts us whatever comes up in his mind, including his negative thoughts and what others think of him. so i'm planning to put an end to this, tomorrow, by telling his mother (who is the only person there currently), while my other friend takes him to a garden. but the feeling of doubt i get is, what if she treats him badly after listening to me? and it becomes more worse for him? though there was time when i saw his mother be sympathetic to him when he got low grades in the final exam of our 12th class. but i still don't know if it's a good idea. p.s. i also tried telling his mother yesterday, but he was there, and dragged my shirt and stopped me. he kept saying, ""please, you don't know everything, please."" and most of the time, the talk wasn't even in person. he would text me, rather than speak to me directly. ""you are the only person i can trust."" ""if not you, then whom?"" i'm really sorry to have you waste your time reading this long paragraphs, but any of your help is appreciated at this point.","my friend's situation is really bad, should i tell his parents?my friend is an introvert and he is really shy. there is a girl that he ""indirectly"" likes. he keeps thinking about her, and gets jealous if some guy, or say, any guy talks to her, in the college. he even warned a guy to stay away from her. and also called a guy who he thought was talking to her, just to check. and he sometimes spaces out, becomes blank, and tears start coming from his eyes. he also says and thinks that everyone in the college is better than him, and also that he is a ""hopeless piece of shit"". and also that he does not have any talent or whatsoever, and just wants to give up. he also tells me that everyone in his class thinks he is a ""loser"", and no one wants to talk to him. me and my other friend are trying everything we can, to help him. but we are limited by the business in our own individual lives. what do we do to help him? apart from that, he was also involved in a toxic relationship with a couple. the girl would annoy him telling him why is not her boyfriend messaging her. and he (my friend) would also talk to that ""boyfriend"" all the time, and even play online games with him, and then at one point, he grew really annoyed of my friend's messages, and told him to stay away. and also, there is too much workload on him right now, from his college. and yesterday, as it turns out, he tried to ""kill himself"". it was night, i was leaving his home, and then he insisted that he comes to drop me off. i said okay. but this time, he chose his car, rather than his two-wheelers. i asked him whether he drove car everyday for college, and he said no. i also asked him whether he drove it before this time, he also said no. the tone of his reply was monotonic. he replied with no interest, and then after dropping me off, we said bye, and then he was off his home. but i waited for him to take a you-turn, so i could see him returning his home. it was just a hunch. but i did not see him take the turn, instead he took another road. he was all alone in the car, filled with misery and pain, probably driving on the highway, with no one inside. and then i got a reply from him saying that he has reached home. and now he uses me and my other friend as a ""diary"". he constantly texts us whatever comes up in his mind, including his negative thoughts and what others think of him. so i am planning to put an end to this, tomorrow, by telling his mother (who is the only person there currently), while my other friend takes him to a garden. but the feeling of doubt i get is, what if she treats him badly after listening to me? and it becomes more worse for him? though there was time when i saw his mother be sympathetic to him when he got low grades in the final exam of our 12th class. but i still do not know if it is a good idea. p.s. i also tried telling his mother yesterday, but he was there, and dragged my shirt and stopped me. he kept saying, ""please, you do not know everything, please."" and most of the time, the talk was not even in person. he would text me, rather than speak to me directly. ""you are the only person i can trust."" ""if not you, then whom?"" i am really sorry to have you waste your time reading this long paragraphs, but any of your help is appreciated at this point."
2,"is 5'4 normal for a teenager? just the title lol &; p.s: don't comment ""i'm 6 foot and i'm 13"" because you're just being an asshole and it has nothing to do with the question.","is 5'4 normal for a teenager? just the title laughing out loud &; p.s: do not comment ""i am 6 foot and i am 13"" because you are just being an asshole and it has nothing to do with the question."
3,"here it is. againa couple of days ago i said that maybe one day i'll give in to this stupid thing. a year ago i said that when i'm overseas i'll end it. in 3 months i'll be leaving for the uk and i have plans to end it there. i'm so tired. i don't know who else to tell who won't be overly alarmed. like. i don't know what to tell people anymore. it always feels like the same time. the whole i'm tired i've had enough i don't want to go on. i feel like it's always the same, just always a little bit worse. i don't even know like, how to share what i feel. i just want to be loved. all i ask is for that. i just want someone to hold me close and tell me it'll be ok. i just wish there was someone who could that now and tell me it'll all go away. i'm so tired. i'm so tired. i'm so tired. i want all of this to end.","here it is. againa couple of days ago i said that maybe one day i'll give in to this stupid thing. a year ago i said that when i am overseas i'll end it. in 3 months i'll be leaving for the uk and i have plans to end it there. i am so tired. i do not know who else to tell who will not be overly alarmed. like. i do not know what to tell people anymore. it always feels like the same time. the whole i am tired i have had enough i do not want to go on. i feel like it is always the same, just always a little bit worse. i do not even know like, how to share what i feel. i just want to be loved. all i ask is for that. i just want someone to hold me close and tell me it'll be ok. i just wish there was someone who could that now and tell me it'll all go away. i am so tired. i am so tired. i am so tired. i want all of this to end."
4,"alone foreveri don't feel right in the past 2 years ,i don't feel right ,i feel like i just wanna kill myself but i can't do it it's just too hard i feel like i wanna disappear forever and just see what the fuck happens after i die,who would care.i'm in 8th grade,14 but i don't think it's just a phase. i feel like i'm going to be like this my whole life i don't feel like anything would change everything is the same every fucking day. i'm very social awkward and anti social i don't go outside even if ""friends"" or classmates or neighbours call me to get outside i tried to but i just don't feel right i feel like i'm useless i just look at the ground and say nothing i can't be happy ,3 days ago i had my 8th grade banquet thing i just stood there the whole time i didn't even eat. my everyday life is just wake up playing games on my pc or watching memes or anime and watching youtube just wasting time,i also go to sleep late. when i go to school i just get bullied by classmates that when they see that i'm sad or just being alone for the whole day they bother me and they think that they are my friends again but i don't even consider that ,i just act like i do just so they don't get mad at me for my fake personality with them. i'm a failure for my parents i can't learn anything i got exams on 12 and 13 june and if i fail them i'm going to become more depressed. i tried to kill myself when i was like 11 i tried to jump off the balcony from floor 3 and took some sleep pills in a row recently. people don't talk to me because i'm awkward but i just want to have someone to talk with about my problems in real life . i might end it all soon","alone foreveri do not feel right in the past 2 years ,i do not feel right ,i feel like i just wanna kill myself but i cannot do it it is just too hard i feel like i wanna disappear forever and just see what the fuck happens after i die,who would care.i am in 8th grade,14 but i do not think it is just a phase. i feel like i am going to be like this my whole life i do not feel like anything would change everything is the same every fucking day. i am very social awkward and anti social i do not go outside even if ""friends"" or classmates or neighbours call me to get outside i tried to but i just do not feel right i feel like i am useless i just look at the ground and say nothing i cannot be happy ,3 days ago i had my 8th grade banquet thing i just stood there the whole time i did not even eat. my everyday life is just wake up playing games on my pc or watching memes or anime and watching youtube just wasting time,i also go to sleep late. when i go to school i just get bullied by classmates that when they see that i am sad or just being alone for the whole day they bother me and they think that they are my friends again but i do not even consider that ,i just act like i do just so they do not get mad at me for my fake personality with them. i am a failure for my parents i cannot learn anything i got exams on 12 and 13 june and if i fail them i am going to become more depressed. i tried to kill myself when i was like 11 i tried to jump off the balcony from floor 3 and took some sleep pills in a row recently. people do not talk to me because i am awkward but i just want to have someone to talk with about my problems in real life . i might end it all soon"
5,garlic bread garlic bread garlic bread garlic bread garlic bread,garlic bread garlic bread garlic bread garlic bread garlic bread
6,she actually said yes boys. my friend agreed with me that this sub needs new jokes holy shit the exact same setup and punchline happen every single week please find some originality.,she actually said yes boys. my friend agreed with me that this sub needs new jokes holy shit the exact same setup and punchline happen every single week please find some originality.
7,ive done it ive open the door on my refrigerator before the light came on it was amazing,ive done it ive open the door on my refrigerator before the light came on it was amazing
8,guys i need help asap! my girlfriend asked me what animal i think she would be and i said cockatoo which *apparently* was not the right answer.idk what to do i'm hiding in my closet help,guys i need help asap! my girlfriend asked me what animal i think she would be and i said cockatoo which *apparently* was not the right answer.i do not know what to do i am hiding in my closet help
9,"thinking about ending itim 20, about to fail out of university, for the past year ive thought about killing myself every single day im a polisubstance abuser who can't stop taking everything. these past 2 weeks i have been blacked out/in psychosis the whole time from an enormous pcp binge. and the previous year i have been abusing otc cough medicine among basically every other drug every day. i feel mentally slow, ive started stuttering, i have forgotten how to do basic stuff, and i have no real/meaningful memories from the past year, just nothing at all. monday this week i was going to kill myself but fell asleep beside the tracks instead of on them after a huge oral dose of pcp, i called the suicide hotline 2 times and close friends but i still cant get the thought out of my head. i will update in a day or 2 on my situation. i wonder if ill finally get a rest","thinking about ending itim 20, about to fail out of university, for the past year ive thought about killing myself every single day i am a polisubstance abuser who cannot stop taking everything. these past 2 weeks i have been blacked out/in psychosis the whole time from an enormous pcp binge. and the previous year i have been abusing otc cough medicine among basically every other drug every day. i feel mentally slow, ive started stuttering, i have forgotten how to do basic stuff, and i have no real/meaningful memories from the past year, just nothing at all. monday this week i was going to kill myself but fell asleep beside the tracks instead of on them after a huge oral dose of pcp, i called the suicide hotline 2 times and close friends but i still cant get the thought out of my head. i will update in a day or 2 on my situation. i wonder if ill finally get a rest"


###**Eliminación Caracteres No Alfabéticos**

En este apartado se eliminarán aquellos símbolos que no forman parte del alfabeto, los números, los espacios o los signos de puntuación básicos. Este tipo de caracteres suele aparecer por errores de codificación, símbolos fuera de contexto, marcas extrañas o elementos que no aportan significado al análisis textual.

Para asegurar que solo se retire contenido realmente innecesario, primero se generará una muestra comparando el texto antes y después de la limpieza. Esto permitirá observar cómo se reducirán estos elementos sin afectar la estructura general del mensaje. Al finalizar, el dataframe quedará compuesto únicamente por caracteres útiles y consistentes para las siguientes secciones del pre-procesamiento.

In [0]:
# Detectar registros con caracteres no alfanuméricos
non_alpha_pattern = r"[^a-zA-Z0-9\s\.\,\!\?\']"

df_nonalpha_preview = (
    df
    .filter(F.col("text").rlike(non_alpha_pattern))
    .orderBy(F.rand())
    .select("id", F.col("text").alias("before"))
    .limit(200)
)

# Crear columna after (limpieza)
df_nonalpha_preview = df_nonalpha_preview.withColumn(
    "after",
    F.regexp_replace("before", non_alpha_pattern, "")
)

# Filtrar solo los registros donde sí hubo cambio
df_nonalpha_changed = df_nonalpha_preview.filter(F.col("before") != F.col("after")).limit(10)

# Convertir a Pandas
preview_pd = df_nonalpha_changed.toPandas()
html_table = preview_pd.to_html(escape=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    white-space: nowrap;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_table}
</div>
"""))

# Aplicar transformación al dataframe original
df = df.withColumn(
    "text",
    F.regexp_replace("text", non_alpha_pattern, "")
)

Unnamed: 0,id,before,after
0,115317,"looking for (mainly) french-speaking people we are looking for people to begin a five nights at freddy's roleplay on messenger. people who are interested are very welcomed, rules will be sent to you in your dms. thanks for at least paying attention to this little announcement. have a great day !","looking for mainly frenchspeaking people we are looking for people to begin a five nights at freddy's roleplay on messenger. people who are interested are very welcomed, rules will be sent to you in your dms. thanks for at least paying attention to this little announcement. have a great day !"
1,302355,help i please am be harassed the r/teenagers bot keep tell me to play minecraft server over and over every time i tell it to stop it says it again i have to help i please stop i only said one word and i help need help!,help i please am be harassed the rteenagers bot keep tell me to play minecraft server over and over every time i tell it to stop it says it again i have to help i please stop i only said one word and i help need help!
2,287684,"work ironically keeping me alivethe only thing keeping me going right now is work and that is almost done. people need me to do my part until next week and they would be fucked without me, so i guess i'll be here until then. once the projects i am working on are done, well, that is a good stopping point. my boss is a friend and a mentor to me, so the idea of disappointing her is all that is really there. i guess that is kinda funny: i am past concern about how my family will be devastated when i die, but the idea of my boss being disappointed in me is keeping me here right now. once that baggage is gone at the end of the month, i do not know what i am going to do. i bought a gun to kill myself a couple weeks ago. i hope i can finally use it and just be done. i am tired of life. it is just a bleak, endless, featureless expanse. there is no hope of it ever being better. i cannot do it anymore. i just want to give up. i do not know why i am bothering to try to put my thoughts into words right now. this is no different than any other fucking post here.","work ironically keeping me alivethe only thing keeping me going right now is work and that is almost done. people need me to do my part until next week and they would be fucked without me, so i guess i'll be here until then. once the projects i am working on are done, well, that is a good stopping point. my boss is a friend and a mentor to me, so the idea of disappointing her is all that is really there. i guess that is kinda funny i am past concern about how my family will be devastated when i die, but the idea of my boss being disappointed in me is keeping me here right now. once that baggage is gone at the end of the month, i do not know what i am going to do. i bought a gun to kill myself a couple weeks ago. i hope i can finally use it and just be done. i am tired of life. it is just a bleak, endless, featureless expanse. there is no hope of it ever being better. i cannot do it anymore. i just want to give up. i do not know why i am bothering to try to put my thoughts into words right now. this is no different than any other fucking post here."
3,138825,“are you even a shitposter if you do not have 20k karma?” ~ sir william shakespeare true story,are you even a shitposter if you do not have 20k karma? sir william shakespeare true story
4,146447,so when i stand up i sometimes get real dizzy and feel like i am about to pass out has this happened to anyone else and if so why does it happen and/or how could i stop it,so when i stand up i sometimes get real dizzy and feel like i am about to pass out has this happened to anyone else and if so why does it happen andor how could i stop it
5,135570,"ok guys so i might be a comedy genius so i got this very fun idea when i was walking home drunk friday. the idea is about making some serious discussion and it blowing up and all the comments are just very serious people talking about a big theme like politics so the comments are going to be like ""yeah so i think we should move the military to mars"" or whatever. and then, when everything is serious talk, i strike, and i strike like a mfing missile. i will then edit whatever the question is to some stupid word like ""monke"" and everyone are just having a nice serious and maybe a bit riled up discussion about monke.","ok guys so i might be a comedy genius so i got this very fun idea when i was walking home drunk friday. the idea is about making some serious discussion and it blowing up and all the comments are just very serious people talking about a big theme like politics so the comments are going to be like yeah so i think we should move the military to mars or whatever. and then, when everything is serious talk, i strike, and i strike like a mfing missile. i will then edit whatever the question is to some stupid word like monke and everyone are just having a nice serious and maybe a bit riled up discussion about monke."
6,203905,"what does your schedule look like rn, this is mine 1. wake up dizzily at 7:55 and open laptop 2. stay awake long enough during 1st period to get important info, sleep until the end of class 3. sleep during passing time 4. do a little bit of work 2nd period, sleep. 5. lunch time! i do one of the following: a. sleep, b. eat, c. shower 6. i can finally open my eyes without my head hurting , space out but do some work in class. 7. complete all work in 4th period. 8. after school, i do one or more of the following: watch yt, eat, sleep, annoy people. 9. dinner; eat 10. chores 11. stare at computer like i am acc gonna do any hw 12. stay up til 3-4 am doing literally anything but being productive. 13. finally realize i am not gonna complete anything and return to my favorite activity. comment your schedules too!","what does your schedule look like rn, this is mine 1. wake up dizzily at 755 and open laptop 2. stay awake long enough during 1st period to get important info, sleep until the end of class 3. sleep during passing time 4. do a little bit of work 2nd period, sleep. 5. lunch time! i do one of the following a. sleep, b. eat, c. shower 6. i can finally open my eyes without my head hurting , space out but do some work in class. 7. complete all work in 4th period. 8. after school, i do one or more of the following watch yt, eat, sleep, annoy people. 9. dinner eat 10. chores 11. stare at computer like i am acc gonna do any hw 12. stay up til 34 am doing literally anything but being productive. 13. finally realize i am not gonna complete anything and return to my favorite activity. comment your schedules too!"
7,203516,"i need advice ok, so i was finally allowed to go out with the bois by myself,ik, my mums overprotective, anyway, she thinks i am “growing up so fast” or some crap because i bought a monster, then, when i told her there were also a couple girls she started thinking the the one ive known for ages is now my “girlfriend” and we “went on a date” i mean, honestly, can i not be friends with a girl without being her girlfriend? i want her to stop but i do not know how, what should i do? thank you for listening to my ted talk","i need advice ok, so i was finally allowed to go out with the bois by myself,ik, my mums overprotective, anyway, she thinks i am growing up so fast or some crap because i bought a monster, then, when i told her there were also a couple girls she started thinking the the one ive known for ages is now my girlfriend and we went on a date i mean, honestly, can i not be friends with a girl without being her girlfriend? i want her to stop but i do not know how, what should i do? thank you for listening to my ted talk"
8,317235,so i was at psychiatristhe told me i am socially undeveloped and its true i feel like he never had a problem like mine i just want to die i hate myself i do not want to be like this i do not want to feel like this and there is much more to it i hope he can help me but deep down i know nothing will help whyý why i am crying so fucking much should just kill myself rly why keep going i do not want do be like this,so i was at psychiatristhe told me i am socially undeveloped and its true i feel like he never had a problem like mine i just want to die i hate myself i do not want to be like this i do not want to feel like this and there is much more to it i hope he can help me but deep down i know nothing will help why why i am crying so fucking much should just kill myself rly why keep going i do not want do be like this
9,168897,day of recommending songs. [daft punk - instant crush]( daft punk is still in the meta. night,day of recommending songs. daft punk instant crush daft punk is still in the meta. night


##**Limpieza Estructural Conjunto Datos**

En esta etapa se llevará a cabo una depuración profunda de la estructura del dataset, garantizando que los registros conserven la coherencia, representatividad y calidad necesarias para continuar con el proceso de modelado. Aquí se identificarán y eliminarán valores atípicos, entradas ruidosas o contenido que no aporte al objetivo analítico, además de revisar la distribución y características generales del texto.

El propósito será asegurar que el conjunto de datos mantenga una base sólida, libre de distorsiones y con una composición adecuada para las etapas posteriores del pre-procesamiento y análisis.

##**Eliminación Outliers Clase Suicide**

En este apartado se llevará a cabo la detección y eliminación de outliers dentro de la clase suicide, con el fin de asegurar que los textos extremadamente cortos o inusualmente largos no afecten la calidad del conjunto de datos. Para ello, se medirá la longitud de cada registro y se identificará qué tan lejos se encuentra del comportamiento típico de esta categoría.

Para mantener la información relevante, se generará una tabla resumen donde se visualizará cuántos casos serán descartados por ser demasiado breves o demasiado extensos. Finalmente, se conservarán únicamente los textos que representen adecuadamente el patrón natural de la clase, integrándolos nuevamente al dataset general y dejándolo listo para continuar con las siguientes etapas del pre-procesamiento.

In [0]:
# Filtrar la clase suicide
df_suicide = df.filter(F.col("class") == "suicide")

# Añadir longitud
df_suicide = df_suicide.withColumn("text_length", F.length("text"))

# Calcular estadísticas Q1, Q3, IQR
stats = df_suicide.select(
    F.percentile_approx("text_length", 0.25).alias("q1"),
    F.percentile_approx("text_length", 0.75).alias("q3")
).collect()[0]

q1 = stats["q1"]
q3 = stats["q3"]
iqr = q3 - q1

py_max = __builtins__.max
lower_bound = py_max(3, q1 - 1.5 * iqr)
upper_bound = int(q3 + 1.5 * iqr)

# Contar outliers
short_outliers = df_suicide.filter(F.col("text_length") < 3).count()
long_outliers  = df_suicide.filter(F.col("text_length") > upper_bound).count()
total_outliers = short_outliers + long_outliers

# Tabla resumen
html_table = f"""
<table border="1" style="border-collapse: collapse; padding: 8px;">
    <tr style="background:#f2f2f2; font-weight:bold;">
        <th>Clase</th>
        <th>Categoría</th>
        <th>Outliers detectados</th>
    </tr>
    <tr>
        <td>suicide</td>
        <td>Textos demasiado cortos (&lt; 3 caracteres)</td>
        <td>{short_outliers}</td>
    </tr>
    <tr>
        <td>suicide</td>
        <td>Textos demasiado largos (&gt; {upper_bound} caracteres)</td>
        <td>{long_outliers}</td>
    </tr>
    <tr style="font-weight:bold;">
        <td>suicide</td>
        <td>Total eliminado</td>
        <td>{total_outliers}</td>
    </tr>
</table>
"""

display(HTML(html_table))

# Eliminar solo outliers de suicide
df_suicide_clean = df_suicide.filter(
    (F.col("text_length") >= 3) &
    (F.col("text_length") <= upper_bound)
).drop("text_length")

# Reintegrar con el dataframe original y mantener clase non-suicide intacta
df = (
    df.filter(F.col("class") != "suicide")
    .unionByName(df_suicide_clean)
)

Clase,Categoría,Outliers detectados
suicide,Textos demasiado cortos (< 3 caracteres),5
suicide,Textos demasiado largos (> 2772 caracteres),8196
suicide,Total eliminado,8201


##**Eliminación Outliers Clase Non-Suicide**

En este apartado se llevará a cabo la detección y eliminación de outliers dentro de la clase non-suicide, con el fin de asegurar que los textos extremadamente cortos o inusualmente largos no afecten la consistencia del conjunto de datos. Para ello, se medirá la longitud de cada registro y se identificará qué tan lejos se encuentra del comportamiento típico de esta categoría.

Para mantener únicamente la información representativa, se generará una tabla resumen donde se visualizará cuántos casos serán descartados por ser demasiado breves o demasiado extensos. Finalmente, se conservarán solo los textos que reflejen adecuadamente el patrón natural de la clase, integrándolos nuevamente al dataset general y dejándolo listo para continuar con las siguientes etapas del pre-procesamiento.

In [0]:
# Filtrar la clase non-suicide
df_non = df.filter(F.col("class") == "non-suicide")

# Añadir longitud
df_non = df_non.withColumn("text_length", F.length("text"))

# Calcular estadísticas Q1, Q3, IQR
stats = df_non.select(
    F.percentile_approx("text_length", 0.25).alias("q1"),
    F.percentile_approx("text_length", 0.75).alias("q3")
).collect()[0]

q1 = stats["q1"]
q3 = stats["q3"]
iqr = q3 - q1

py_max = __builtins__.max
lower_bound = py_max(3, q1 - 1.5 * iqr)
upper_bound = int(q3 + 1.5 * iqr)

# Contar outliers
short_outliers = df_non.filter(F.col("text_length") < 3).count()
long_outliers  = df_non.filter(F.col("text_length") > upper_bound).count()
total_outliers = short_outliers + long_outliers

# Tabla resumen
html_table = f"""
<table border="1" style="border-collapse: collapse; padding: 8px;">
    <tr style="background:#f2f2f2; font-weight:bold;">
        <th>Clase</th>
        <th>Categoría</th>
        <th>Outliers detectados</th>
    </tr>
    <tr>
        <td>non-suicide</td>
        <td>Textos demasiado cortos (&lt; 3 caracteres)</td>
        <td>{short_outliers}</td>
    </tr>
    <tr>
        <td>non-suicide</td>
        <td>Textos demasiado largos (&gt; {upper_bound} caracteres)</td>
        <td>{long_outliers}</td>
    </tr>
    <tr style="font-weight:bold;">
        <td>non-suicide</td>
        <td>Total eliminado</td>
        <td>{total_outliers}</td>
    </tr>
</table>
"""

display(HTML(html_table))

# Eliminar solo outliers de non-suicide
df_non_clean = df_non.filter(
    (F.col("text_length") >= 3) &
    (F.col("text_length") <= upper_bound)
).drop("text_length")

# Reintegrar con el dataframe original y mantener clase suicide intacta
df = (
    df.filter(F.col("class") != "non-suicide")
    .unionByName(df_non_clean)
)

Clase,Categoría,Outliers detectados
non-suicide,Textos demasiado cortos (< 3 caracteres),11
non-suicide,Textos demasiado largos (> 634 caracteres),11948
non-suicide,Total eliminado,11959


##**Eliminación Ruido Irrelevante (Spam, Publicidad, Etc.)**

En este apartado se llevará a cabo la detección y eliminación de contenido que corresponda a spam, publicidad o mensajes claramente promocionales. Este tipo de registros no aporta información útil para el objetivo del proyecto y, además, puede distorsionar la distribución natural del lenguaje dentro del conjunto de datos.

Para realizar esta depuración, se empleará un conjunto de patrones diseñados para identificar frases comunes de publicidad, llamados a la acción, enlaces externos y expresiones típicas de mensajes automatizados. A partir de esta detección, se generará una vista previa, junto con una tabla que resumirá cuántos registros serán eliminados por pertenecer a esta categoría.

Finalmente, se excluirán por completo los textos marcados como spam, garantizando que el conjunto de datos mantenga únicamente información auténtica y alineada con el propósito del pre-procesamiento.

In [0]:
# Definir patrón de SPAM / publicidad
spam_pattern = r"""(?i)(
    click|clic|haz\s+click|presiona
    |gana\s+dinero|trabaja\s+desde\s+tu\s+casa
    |compra\s+ahora|compra\s+ya|oferta|promoción|descuento
    |suscríbete|subscribe
    |http[s]?:\/\/|www\.
    |llama\s+ya|número\s+gratis
)"""

# Crear columna indicando si es spam
df = df.withColumn("is_spam", F.col("text").rlike(spam_pattern))

# Contar spam detectado
spam_count = df.filter("is_spam = true").count()

# Vista previa antes (top 10)
preview_df = (
    df.filter("is_spam = true")
      .select("id", "text")
      .limit(10)
)

# Después (reemplazo del contenido spam)
preview_df = preview_df.withColumn(
    "text",  
    F.regexp_replace("text", spam_pattern, "")
)

# Convertir a pandas y mostrar tabla HTML
preview_pd = preview_df.toPandas()
html_preview = preview_pd.to_html(escape=False)

display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 650px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
<h3>Vista previa</h3>
{html_preview}
</div>
"""))

# Crear tabla resumen
html_table = f"""
<table border="1" style="border-collapse: collapse; padding: 8px;">
    <tr style="background:#f2f2f2; font-weight:bold;">
        <th>Concepto</th>
        <th>Cantidad</th>
    </tr>
    <tr>
        <td>Registros identificados como SPAM / publicidad</td>
        <td>{spam_count}</td>
    </tr>
    <tr style="font-weight:bold; background:#f9f9f9;">
        <td>Total eliminado</td>
        <td>{spam_count}</td>
    </tr>
</table>
"""

display(HTML(html_table))

# Eliminar spam del dataframe final
df = df.filter("is_spam = false").drop("is_spam")

Unnamed: 0,id,text
0,216,"i probably will not. but it just seems like an easy end to the pain.truthfully, like most people here. i just want an outside reason or voice to give me some light and make it seem worth it. for the past few weeks i'll grab my gun, unloaded, cock it, and just put it to my head and pull the trigger just to hear the 'k'. it is pretty fucked up considering i have not felt like this since high school. if anyone cares for the story, i have posted it in roffmychest and rrelationshipadvice. of course this is all over a girl and i know it'll pass. i just hate that my mind is trying to kill me."
1,1742,"the last playlisthere it is. i am going to load up the list, put it on shuffle. and when the music runs out. i am done. anyone want to help me add to it before i hit play? 3 doors down be like that alex clare too close america sister golden hair awolnation not your fault awolnation kill your heroes avril lavigne alice blake shelton i'll just hold on blue october hate me brad paisley, alison krauss whiskey lullaby brooks amp dunn she used to be mine coldplay fix you eric church hell on the heart fuel bad day gary allen airplanes elvis presley suspicious minds eminem amp rihanna love the way you lie enrique iglesias hero george strait i can still make cheyenne george strait she'll leave you with a smile howie day collide iron amp wine flightless bird, american mouth jack johnson flake johnny rivers poor side of town lifehouse everything les miserables little fall of rain maroon 5 she will be loved matchbox twenty push matchbox twenty she is so mean mgmt kids mumford amp sons white blank page owl city air traffic owl city enchanted taylor swift cover owl city the technicolor phase owl city vanilla twilight one republic apologize rob thomas her diamonds shinedown her name is alice snow patrol chasing cars staind outside the airborne toxic event sometime around midnight the all american rejects the poison the k five just the girl u2 stuck in a moment you cannot get out of"
2,2239,"i am tiredi'm really tired of mental illness and struggle and not knowing who i am. i am tired of living for everyone else. i am tired of being depressed. i am tired of being pushed aside. i am tired of being invisible. i am tired of laying here and wondering when it will get better. i am tired of knowing that i cannot be positive no matter how hard i try. the meds only help somewhat. i wanna sleep forever as he as that is, i do not want there to be a heaven or hell. i just want there to be peace in death. i just wanna give up and be in peace."
3,2276,"tricy antidepressantsthey are more dangerous than the new antidepressants, no?"
4,2514,"what could i do?hi, everyone, thanks for king on this link. i am a nihilist as you may have seen on my name i have thought about killing myself a lot of times lately, i know that life is pointless, every human being is just dictated by its conservation and survival instincts as well as their personal pleasure, i know that killing myself would be painless as i know some fast and good methods. i also do not care much about what impact it would have on my friends, family and society as i will not be there to regret anything i do not believe in gods or afterworlds, to me these are just stuff that humans invented because they could not believe that we are pointless and just a complete coincidence i was wondering what could i do before i decide to take my final step towards death, i am ready to die anytime, it does not matter a lot to me anyway but i'd like to have a little fun before that happens. i have been watching a lot of anime, playing some video games and making stuff lately, however, i feel like i have not experienced something that make people want to stay alive longer i have already kissed a girl but i never had a girlfriend, nor have i ever had sex, i failed my last two trimesters at school, i could get good grades if i worked a lot but i feel like it is not worth it, i feel like if i work hard or not, it will not change anything, i am still going to die, i just want to have fun, to do what i want but there is no way i could live by doing the things i want. so, can you think of anything i could do before i kill myself? except watching dank memes all day, i have already done that"
5,2643,"no kbaitsome days i wish i could just stop. it is started to become more then less. this is where i am. churning down a hallway trying to work as hard as i can with the work i am signed to do, but still not getting forward. it is deliriously tiring. seeing people get promoted in front of me for doing jackshit or having an opportunity in front of me without actually being clever enough to do the job. it is all just tiring. it feels like i am bathing in hellfire. scratching through each piercing arrow. happily conjoining with them, just do sear myself even more. i am so tired and so fed up with the situation. i wish. i just wish somebody close to me knew what i feel. and i wish somebody could give me a push and an answer. i am so fed up"
6,2769,"i am tired of trying.i cannot make friends for the life of me. i have got buddies i hang around with at uni and thats it. i guess my dumb trigger today was those buddies talking about how they went to a hip hip party the other night with buddy a's friends. buddy b does not even like hip hop, i do and live 10 minutes away from buddy a but they like each other. i do not have that with anyone. get hobbies i have multiple social and socialish hobbies volunteer i have done in the past but with my hobbies, university and work i do not have time at the moment it does not make a difference because i am still the same boring old me no matter where i go. i have been to social skills classes, i have tried to invite people out only to receive silence or rejection. i have read book after book and watched ted talk after ted talk. i suck at conversations, thinking of things to say. i am probably an energy drain without realising it. i do not want to wait for the one in a romantic sense. i am lonely and do not want to live like this anymore. what is the point of working hard for money and then is suicide only a depression thing? why cannot it be a logical choice? maybe i am a damn alien i cannot connect with people. even when i think i have ked with a person it means nothing to them. maybe people can smell desperation but i do not know what else to do. i am normally one to try to fix an issue instead of complain but i am done. i want to spend time with people. i'll probably end it after university so i would not have failed college twice. i want to at least get a 1sta, go out on a high."
7,3497,"i hope my family reads my reddit posts to understand more clearly why i have done it. i hope i do it soon.i mean, they have no idea what is going on on my mind, if i do not leave anything they would just guilt themselves thinking they are the ones who caused it, and that is the last thing i want. now that i think about it, i really regret posting some mean shit about some people on reddit, especially on this subreddit, and especially my family. i just do not want to turn temporary arguments into permanent wounds yeah ik that sounded really he. i just do not wanna write in my native language because it makes me feel like what i am typing is more real and it just makes me sad. i mean, its real either way, but writing in another language kinda gives you an illusion. its hard to explain i do not know i hope you get it. i do not really know what to say. its too much to type into one thing."
8,4805,"when i lost my mindi have created a throwaway account just so i do not get tarnished on my other one. i am a frequent visitor here although it has been a while since i last visited. i think i have lost my mind. now i truly know the meaning of the word. genuinely, i think my mind is now lost forever. i used to be suicidal. i would not do too much. but i was attempting mildly about once every few months. i have minimal scarring due to my method of choice. anyways, about 9 months ago something deep within me ked. suddenly, i was happy and active. i was diagnosed bipolar. i was working around until 3 am every night. i was working one full time job, part time classes, and i was doing extracurricular work all the time. i was extremely happy, but in an unhealthy way. after about 3 months of super energized bliss, the light did not shut off. it kind of. burned low. now, like an ember in a bulb, it radiates a general content with life. what some people might call happy. however, along with it is the ashes of my previous depression. 20 long years of constant suicidal thoughts perverted my mind. i was a sociopath by many accounts. though sensibly ethical and occasionally expressive, i was always subtly empty. that emptiness carried over after my sudden bout of happiness. now, i am content, but empty. i feel like a shell. i have goals and aspirations now. i have dreams and hopes. but i cannot feel anything. its like instead of being suicidal and having a deep fulfilling desire for suicide, i am now just me. just a sociopath trying his best. i do not feel like i am me anymore. i feel like a robot. i walk into the office, i make jokes, i talk to friends and family, i am pursuing other things, but all the while i feel like a robot. literally. emotions deleted. i guess depression was the one thing that made me feel. human. it was the occasional despair that tied me to this earth. now, i feel beyond it. if i was fired today, i would feel nothing. if someone hated me deeply, nothing. now, instead of falling deeper into sadness, i feel nothing. it sometimes makes me want to murder or steal. just to feel that rush of guilt and despair as my life falls apart. but i am much too ethical to abuse another life just for my own. sometimes though i fantasize about the court room. about the jail time. how i'd ache to be free again, ache for something. my only driving force is to educate myself further and further until i have conquered the educational world. until i have learned physics and math to a level that i consider prideful. so, essentially, i feel weightless and empty now. like i am floating around the world. i have nothing to ground me. does anyone else feel this way?"
9,6940,"i do not know how to dream anymore and i want to give upi do not know what happened but around the time i was 10 something ked in me. within a day i completely lost my ability to dream about a future. i stopped trying in school, i lost normal habits showering regularly, cleaning my room, brushing my teeth and lost all responsibility, under the impression i was never going to grow up so there is no use keeping those habits. now i am 21 and i completely fucked myself over. i do not have any social skills, do not have a job, fuck i barely even have teeth anymore because i have not brushed them in so long half of them are broken or corroded. no matter how much i tell myself i want and need to get help i know i'll never be able to push myself to get the help i need. everything became so much worse when last year i ignored a friend who desperately wanted and needed help because i felt too awkward around someone who was just as depressed as i am and they ended up killing themselves, which i have blamed myself for for 417 days straight, and now i straight up hate the one person who was there for me through it all. all i can do is distract myself from the bad thoughts by watching videos and playing video games. i even force myself to take large doses of sleeping pills to the point where i cannot even feel my own face because if i am left alone with my thoughts sober i start to convince myself that it is not worth it anymore. i know this is all taking a huge toll on my mother because she is forced to support me through all of this but whenever i consider if i should at least get a job i worry the added pressure will just push me over the edge. i do not remember what happened to me when i suddenly changed like that but whatever it was, took away my ability to dream, took away my drive for living, and all my passions and i just wish i could go back in time and prevent it from ever happening."


Concepto,Cantidad
Registros identificados como SPAM / publicidad,1162
Total eliminado,1162


###**Distribución Longitud Texto Mediante Diagramas De Caja**

En este apartado se generará un boxplot interactivo para observar cómo se distribuyen las longitudes de los textos en cada clase después de la eliminación de outliers. Esta visualización permitirá apreciar la forma general de las distribuciones, su rango típico y las diferencias entre categorías, ofreciendo una perspectiva clara y limpia del comportamiento estructural del dataset.

In [0]:
# Parámetros
max_rows = 232074
seed = 42

# Generar nueva columna con longitud de texto
df_temp = df.withColumn("text_length", length(col("text")))

# Convertir a pandas
pdf = df_temp.select("class", "text_length").toPandas()
pdf["class"] = pdf["class"].astype(str)

# Condicional a max_rows
if len(pdf) > max_rows:
    pdf = pdf.sample(max_rows, random_state=seed)

# Colores pastel ultra suaves
soft_colors = [
    "rgba(102, 194, 165, 0.50)",
    "rgba(141, 160, 203, 0.50)",
    "rgba(252, 141, 98, 0.50)",
    "rgba(231, 138, 195, 0.50)",
    "rgba(166, 216, 84, 0.50)"
]

# Gráfico interactivo
fig = px.box(
    pdf,
    x="class",
    y="text_length",
    points="all",
    hover_data=["text_length", "class"],
    color="class",
    color_discrete_sequence=soft_colors,
    title="Distribución Longitud Texto X Clase - Boxplot"
)

# Ajustes al gráfico interactivo
fig.update_traces(
    pointpos=0,
    jitter=0.30,
    marker=dict(size=3, opacity=0.55, line=dict(width=0)),
    width=0.95
)

# Layout grande y elegante
fig.update_layout(
    title_x=0.5,
    xaxis_title="Clase",
    yaxis_title="Longitud del Texto",
    font=dict(
        family="Algerian, DejaVu Sans, sans-serif",
        size=18
    ),
    template="simple_white",
    height=800,
    width=800,
    showlegend=False,
    margin=dict(l=80, r=80, t=110, b=80)
)

# Zoom basado en IQR global para alargar visualmente las cajas
q1 = pdf['text_length'].quantile(0.25)
q3 = pdf['text_length'].quantile(0.75)
iqr = q3 - q1

# Ajustar el factor para aumentar/disminuir el zoom (1.0 = menos zoom, 2.5 = más zoom)
zoom_factor = 2.5
margin = iqr * zoom_factor

y_min = builtins.max(0, q1 - margin)
y_max = q3 + margin

fig.update_yaxes(range=[y_min, y_max], tickformat=",")

# Mostrar gráfico
fig.show()

%md
###**Distribución Longitud Texto Mediante Diagramas De Violín**

En este apartado se generará un gráfico de violín para analizar la distribución de las longitudes de texto por clase. Esta representación permitirá observar la densidad, variabilidad y forma general de cada categoría después de la limpieza previa, incluyendo outliers eliminados. Además, el gráfico incorporará puntos individuales y una línea de media para ofrecer una visión más completa del comportamiento real de los datos en cada clase.


In [0]:
# Parámetros
max_rows = 232074
seed = 42

# Generar nueva columna con longitud de texto
df_temp = df.withColumn("text_length", length(col("text")))

# Convertir a pandas
pdf = df_temp.select("class", "text_length").toPandas()
pdf["class"] = pdf["class"].astype(str)

# Condicional a máximo registros
if len(pdf) > max_rows:
    pdf = pdf.sample(max_rows, random_state=seed)

# Colores pastel ultra suaves
soft_colors = [
    "rgba(102, 194, 165, 0.50)",
    "rgba(141, 160, 203, 0.50)",
    "rgba(252, 141, 98, 0.50)",
    "rgba(231, 138, 195, 0.50)",
    "rgba(166, 216, 84, 0.50)"
]

# Gráfico de violín
fig = px.violin(
    pdf,
    x="class",
    y="text_length",
    color="class",
    color_discrete_sequence=soft_colors,
    box=True,
    points="all",
    hover_data=["class", "text_length"],
    title="Distribución Longitud Texto X Clase - Violín"
)

# Ajustes estéticos
fig.update_traces(
    meanline_visible=True,
    jitter=0.30,
    marker=dict(
        size=3,
        opacity=0.55,
        line=dict(width=0)
    )
)

# Layout elegante con fuente Algerian
fig.update_layout(
    title_x=0.5,
    xaxis_title="Clase",
    yaxis_title="Longitud del Texto",
    font=dict(
        family="Algerian, DejaVu Sans, sans-serif",
        size=18
    ),
    template="simple_white",
    height=700,
    width=900,
    showlegend=False,
    margin=dict(l=80, r=80, t=110, b=60)
)

fig.update_yaxes(tickformat=",")

# Mostrar gráfico
fig.show()

###**Distribución Clases**

En este apartado se genera un gráfico de barras que muestra la cantidad de textos pertenecientes a cada clase del dataset. Esta visualización permite identificar el nivel de equilibrio o desbalance entre categorías, aportando una vista general del estado final del conjunto de datos después de los procesos de limpieza aplicados.

In [0]:
# Convertir dataframe a pandas
class_dist = df.groupBy("class").count().toPandas().copy()

# Tipos
class_dist["class"] = class_dist["class"].astype(str)
class_dist["count"] = class_dist["count"].astype(int)

# Colores pastel ultra suaves con transparencia
soft_colors = [
    "rgba(102, 194, 165, 0.50)",
    "rgba(141, 160, 203, 0.50)",
    "rgba(252, 141, 98, 0.50)",
    "rgba(231, 138, 195, 0.50)",
    "rgba(166, 216, 84, 0.50)"
]

# Gráfico interactivo
fig = px.bar(
    class_dist,
    x="class",
    y="count",
    text="count",
    color="class",
    color_discrete_sequence=soft_colors,
    hover_data={"count": ":,"}
)

# Ajustar estilo de barras
fig.update_traces(
    width=0.35,
    marker=dict(line=dict(width=0)),
    texttemplate="%{text:,}",
    textposition="outside"
)

# Layout con fuente Algerian
fig.update_layout(
    title="Distribución Clases",
    title_x=0.5,
    xaxis_title="Clase",
    yaxis_title="Cantidad de textos",
    font=dict(
        family="Algerian, DejaVu Sans, sans-serif",
        size=16
    ),
    template="simple_white",
    bargap=0.42,
    showlegend=False,
    height=500,
    width=650,
    margin=dict(l=30, r=30, t=70, b=40)
)

# Números eje Y con separador de miles
fig.update_yaxes(tickformat=",")

fig.show()

###**DataFrame Pre-Procesado Almacenado Unity Catalog**

En este apartado se llevará a cabo la persistencia del conjunto de datos procesado dentro de Unity Catalog, con el fin de garantizar un almacenamiento confiable y versionado mediante el formato Delta. Para ello, se escribirá la tabla resultante sobrescribiendo cualquier versión previa y, posteriormente, se cargará nuevamente desde el catálogo para verificar su correcta creación. Esta operación permitirá disponer de un punto de referencia limpio y centralizado, facilitando consultas posteriores y asegurando continuidad en el flujo de trabajo.

In [0]:
df.write.format("delta").mode("overwrite").saveAsTable(
    "workspace.suicide_detection.suicide_detection_clean"
)

df_clean = spark.table("workspace.suicide_detection.suicide_detection_clean")
display(df_clean)

DataFrame[id: bigint, text: string, class: string]

##**Tokenización & Eliminación Básica**

En esta sección se aplicarán transformaciones iniciales orientadas a descomponer el texto en unidades manejables y depurar elementos que no aportan información útil. El objetivo será obtener representaciones más limpias y consistentes de cada registro, reduciendo ruido lingüístico y preparando la base para etapas posteriores de modelado.

###**Tokenización**

En este apartado se llevará a cabo la tokenización del texto mediante un enfoque basado en expresiones regulares, lo que permitirá descomponer cada mensaje en unidades léxicas limpias y normalizadas. El proceso se enfocará en identificar secuencias válidas de caracteres alfabéticos y descartando fragmentos vacíos o irrelevantes. Para facilitar la inspección del resultado, se generará una vista previa que mostrará cómo los textos originales quedarán representados como listas de tokens depurados, sirviendo como punto de partida para las etapas posteriores del pre-procesamiento.

In [0]:
# Tokenización con RegexTokenizer
tokenizer = RegexTokenizer(
    inputCol="text",
    outputCol="tokens",
    pattern=r"[^a-zA-ZáéíóúñÁÉÍÓÚÑ]+",
    toLowercase=True,
    minTokenLength=1
)

# Crear dataframe df_clean
df_clean = tokenizer.transform(df)

# Eliminar tokens vacíos o nulos
df_clean = df_clean.withColumn(
    "tokens",
    F.expr("filter(tokens, x -> x != '' AND x IS NOT NULL)")
)

# Crear preview
sample_pd = (
    df_clean.select("id", "tokens")
            .limit(10)
            .toPandas()
)

html_preview = sample_pd.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 500px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_preview}
</div>
"""))

id,tokens
2,"[ex, wife, threatening, suiciderecently, i, left, my, wife, for, good, because, she, has, cheated, on, me, twice, and, lied, to, me, so, much, that, i, have, decided, to, refuse, to, go, back, to, her, as, of, a, few, days, ago, she, began, threatening, suicide, i, have, tirelessly, spent, these, paat, few, days, talking, her, out, of, it, and, she, keeps, hesitating, because, she, wants, to, believe, i, ll, come, back, i, know, a, lot, of, people, will, threaten, this, in, order, to, get, their, way, but, what, happens, if, she, really, does, what, do, i, do, and, how, am, i, ...]"
8,"[i, need, helpjust, help, me, i, am, crying, so, hard]"
9,"[i, am, so, losthello, my, name, is, adam, and, i, have, been, struggling, for, years, and, i, am, afraid, through, these, past, years, thoughts, of, suicide, fear, anxiety, i, am, so, close, to, my, limit, i, have, been, quiet, for, so, long, and, i, am, too, scared, to, come, out, to, my, family, about, these, feelings, about, years, ago, losing, my, aunt, triggered, it, all, everyday, feeling, hopeless, lost, guilty, and, remorseful, over, her, and, all, the, things, i, have, done, in, my, life, but, thoughts, like, these, with, the, little, i, have, experienced, in, life, only, time, i, have, ...]"
11,"[honetly, idki, do, not, know, what, i, am, even, doing, here, i, just, feel, like, there, is, nothing, and, nowhere, for, me, all, i, can, feel, is, either, nothing, or, unbearably, sad, i, am, ignoring, friends, every, opitunity, i, can, i, feel, like, i, am, loosing, my, girlfriend, i, only, hurt, everyone, i, talk, too, and, i, do, not, cause, anything, good, i, am, behind, on, my, education, i, feel, alone, but, for, the, first, time, its, not, a, feeling, ive, enjoyed, i, have, no, hopes, or, dreams, i, care, about, nothing, not, family, not, friends, not, even, my, girlfriend, ...]"
12,"[trigger, warning, excuse, for, self, inflicted, burnsi, do, know, the, crisis, line, and, used, it, after, when, i, was, having, a, panic, attack, i, know, it, is, not, a, healthy, thing, to, do, but, i, did, i, did, something, stupid, out, of, impulse, i, burned, myself, i, really, need, help, with, an, excuse, as, the, father, of, my, daughter, knows, my, history, we, were, together, years, he, is, seen, my, at, my, worst, but, i, had, always, only, cut, on, my, ankles, and, wrists, i, am, thinking, the, excuse, for, this, would, be, easier, than, one, for, cuts, i, did, ...]"
13,"[it, ends, tonight, i, cannot, do, it, anymore, i, quit]"
18,"[my, life, is, over, at, years, oldhello, all, i, am, a, year, old, balding, male, my, hairline, is, trash, and, to, make, matters, worse, my, head, is, huge, i, have, bipolar, depression, and, crippling, social, anxiety, balding, has, been, the, cherry, on, top, i, wear, a, hat, even, in, my, room, when, i, am, alone, because, i, cannot, stop, thinking, about, it, i, pop, xanax, all, day, to, try, and, numb, the, pain, and, it, works, for, a, little, bit, but, it, all, comes, crashing, back, twice, as, hard, once, i, come, down, i, do, not, know, how, to, communicate, ...]"
19,"[i, took, the, rest, of, my, sleeping, pills, and, my, painkillersi, cannot, wait, for, it, to, end, i, have, struggled, for, the, past, years, and, i, am, finally, ending, it]"
20,"[can, you, imagine, getting, old, me, neither, wrinkles, weight, gain, hair, loss, messed, up, teeth, and, bones, health, issues, menopause, hormones, hating, new, generations, amp, the, way, world, progress, being, a, useless, angry, piece, of, shit, who, cannot, take, care, of, itself, being, totally, depended, on, people, who, secretly, wants, you, to, die, already, can, you, even, imagine, yourself, there, absolutely, not, even, if, i, was, happy, i, d, take, my, life, just, to, avoid, this]"
21,"[do, you, think, getting, hit, by, a, train, would, be, painful, guns, are, hard, to, come, by, in, my, country, but, trains, are, not, i, just, do, not, want, to, suffer, though, do, you, think, this, would, be, a, painless, method, of, suicide]"


###**Eliminación Stopwords**

En este apartado se procederá a eliminar las stopwords en inglés con el fin de reducir términos que, aunque frecuentes, no aportan significado relevante al análisis. Para ello, se conservarán los tokens originales en una columna temporal y se aplicará el filtrado utilizando un conjunto estándar de palabras vacías. Posteriormente, se generará una vista comparativa que permitirá observar cómo se depuran las listas de tokens antes y después del proceso. Esta etapa ayudará a concentrar el contenido en unidades realmente informativas, preparando el dataset para fases más avanzadas de procesamiento lingüístico.

In [0]:
# Stopwords en inglés
stopwords_en = StopWordsRemover.loadDefaultStopWords("english")

# Crear columna temporal tokens_before para mostrar antes
df_clean = df_clean.withColumn("tokens_before", F.col("tokens"))

# Configurar y aplicar StopWordsRemover
remover = StopWordsRemover(
    inputCol="tokens",
    outputCol="tokens_after",
    stopWords=stopwords_en
)

df_clean = remover.transform(df_clean)

# Crear preview Antes/Despues
preview = (
    df_clean
    .select("id", "tokens_before", "tokens_after")
    .limit(12)
    .toPandas()
)

html_preview = preview.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 550px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
    font-size: 15px;
">
{html_preview}
</div>
"""))

# Aplicar transformación al dataframe procesado
df_clean = (
    df_clean
    .drop("tokens")
    .drop("tokens_before")
    .withColumnRenamed("tokens_after", "tokens")
)

id,tokens_before,tokens_after
2,"[ex, wife, threatening, suiciderecently, i, left, my, wife, for, good, because, she, has, cheated, on, me, twice, and, lied, to, me, so, much, that, i, have, decided, to, refuse, to, go, back, to, her, as, of, a, few, days, ago, she, began, threatening, suicide, i, have, tirelessly, spent, these, paat, few, days, talking, her, out, of, it, and, she, keeps, hesitating, because, she, wants, to, believe, i, ll, come, back, i, know, a, lot, of, people, will, threaten, this, in, order, to, get, their, way, but, what, happens, if, she, really, does, what, do, i, do, and, how, am, i, ...]","[ex, wife, threatening, suiciderecently, left, wife, good, cheated, twice, lied, much, decided, refuse, go, back, days, ago, began, threatening, suicide, tirelessly, spent, paat, days, talking, keeps, hesitating, wants, believe, ll, come, back, know, lot, people, threaten, order, get, way, happens, really, supposed, handle, death, hands, still, love, wife, deal, getting, cheated, constantly, feeling, insecure, worried, today, may, day, hope, much, happen]"
8,"[i, need, helpjust, help, me, i, am, crying, so, hard]","[need, helpjust, help, crying, hard]"
9,"[i, am, so, losthello, my, name, is, adam, and, i, have, been, struggling, for, years, and, i, am, afraid, through, these, past, years, thoughts, of, suicide, fear, anxiety, i, am, so, close, to, my, limit, i, have, been, quiet, for, so, long, and, i, am, too, scared, to, come, out, to, my, family, about, these, feelings, about, years, ago, losing, my, aunt, triggered, it, all, everyday, feeling, hopeless, lost, guilty, and, remorseful, over, her, and, all, the, things, i, have, done, in, my, life, but, thoughts, like, these, with, the, little, i, have, experienced, in, life, only, time, i, have, ...]","[losthello, name, adam, struggling, years, afraid, past, years, thoughts, suicide, fear, anxiety, close, limit, quiet, long, scared, come, family, feelings, years, ago, losing, aunt, triggered, everyday, feeling, hopeless, lost, guilty, remorseful, things, done, life, thoughts, like, little, experienced, life, time, revealed, feelings, family, broke, saw, cuts, watching, get, worried, something, portrayed, average, day, made, feel, absolutely, dreadful, later, found, attempt, survivor, attempt, odoverdose, pills, attempt, hanging, happened, blackout, pills, never, went, noose, still, afraid, first, therapy, diagnosed, severe, depression, social, anxiety, eating, disorder, later, transferred, fucken, group, therapy, reason, made, feel, anxious, eventually, last, session, therapy, showed, results, daily, check, ...]"
11,"[honetly, idki, do, not, know, what, i, am, even, doing, here, i, just, feel, like, there, is, nothing, and, nowhere, for, me, all, i, can, feel, is, either, nothing, or, unbearably, sad, i, am, ignoring, friends, every, opitunity, i, can, i, feel, like, i, am, loosing, my, girlfriend, i, only, hurt, everyone, i, talk, too, and, i, do, not, cause, anything, good, i, am, behind, on, my, education, i, feel, alone, but, for, the, first, time, its, not, a, feeling, ive, enjoyed, i, have, no, hopes, or, dreams, i, care, about, nothing, not, family, not, friends, not, even, my, girlfriend, ...]","[honetly, idki, know, even, feel, like, nothing, nowhere, feel, either, nothing, unbearably, sad, ignoring, friends, every, opitunity, feel, like, loosing, girlfriend, hurt, everyone, talk, cause, anything, good, behind, education, feel, alone, first, time, feeling, ive, enjoyed, hopes, dreams, care, nothing, family, friends, even, girlfriend, still, love, complicated, words, describe, something, end, know, strong, brave, enough, knowing, weak, makes, sadder, thing, push, away, emotion, empty, bad, used, way, normal, understand, people, hopes, dreams, mentioned, bad, feeling, girlfriend, got, scared, die, havnt, brought, talk, realised, cant, even, comprehend, life, meaning, anyone, know, rambling, probably, regret, posting, ill, think, taking, place, someone, worse, ...]"
12,"[trigger, warning, excuse, for, self, inflicted, burnsi, do, know, the, crisis, line, and, used, it, after, when, i, was, having, a, panic, attack, i, know, it, is, not, a, healthy, thing, to, do, but, i, did, i, did, something, stupid, out, of, impulse, i, burned, myself, i, really, need, help, with, an, excuse, as, the, father, of, my, daughter, knows, my, history, we, were, together, years, he, is, seen, my, at, my, worst, but, i, had, always, only, cut, on, my, ankles, and, wrists, i, am, thinking, the, excuse, for, this, would, be, easier, than, one, for, cuts, i, did, ...]","[trigger, warning, excuse, self, inflicted, burnsi, know, crisis, line, used, panic, attack, know, healthy, thing, something, stupid, impulse, burned, really, need, help, excuse, father, daughter, knows, history, together, years, seen, worst, always, cut, ankles, wrists, thinking, excuse, easier, one, cuts, work, car, last, night, self, harmed, long, time, without, thinking, usual, impulse, lost, moment, say, touched, something, hood, car, still, hot, almost, curved, like, pattern, first, forearm, little, side, wrist, inch, long, kind, wide, little, deep, think, car, excuse, good, one, need, say, working, explain, burns, maybe, wire, smooshed, behind, engine, went, fix, touched, engine, want, self, harm, need, able, ...]"
13,"[it, ends, tonight, i, cannot, do, it, anymore, i, quit]","[ends, tonight, anymore, quit]"
18,"[my, life, is, over, at, years, oldhello, all, i, am, a, year, old, balding, male, my, hairline, is, trash, and, to, make, matters, worse, my, head, is, huge, i, have, bipolar, depression, and, crippling, social, anxiety, balding, has, been, the, cherry, on, top, i, wear, a, hat, even, in, my, room, when, i, am, alone, because, i, cannot, stop, thinking, about, it, i, pop, xanax, all, day, to, try, and, numb, the, pain, and, it, works, for, a, little, bit, but, it, all, comes, crashing, back, twice, as, hard, once, i, come, down, i, do, not, know, how, to, communicate, ...]","[life, years, oldhello, year, old, balding, male, hairline, trash, make, matters, worse, head, huge, bipolar, depression, crippling, social, anxiety, balding, cherry, top, wear, hat, even, room, alone, stop, thinking, pop, xanax, day, try, numb, pain, works, little, bit, comes, crashing, back, twice, hard, come, know, communicate, people, anymore, know, keep, relationship, used, one, popular, kids, dad, passed, away, feel, deep, dark, hole, arrested, numerous, times, rehab, mental, hospitals, name, reason, killed, yet, mom, brothers, d, dead, long, ago, getting, point, even, love, support, going, enough, keep, alive, anymore, either, going, guy, killed, guy, went, bald, looks, like, child, molestor, one, ...]"
19,"[i, took, the, rest, of, my, sleeping, pills, and, my, painkillersi, cannot, wait, for, it, to, end, i, have, struggled, for, the, past, years, and, i, am, finally, ending, it]","[took, rest, sleeping, pills, painkillersi, wait, end, struggled, past, years, finally, ending]"
20,"[can, you, imagine, getting, old, me, neither, wrinkles, weight, gain, hair, loss, messed, up, teeth, and, bones, health, issues, menopause, hormones, hating, new, generations, amp, the, way, world, progress, being, a, useless, angry, piece, of, shit, who, cannot, take, care, of, itself, being, totally, depended, on, people, who, secretly, wants, you, to, die, already, can, you, even, imagine, yourself, there, absolutely, not, even, if, i, was, happy, i, d, take, my, life, just, to, avoid, this]","[imagine, getting, old, neither, wrinkles, weight, gain, hair, loss, messed, teeth, bones, health, issues, menopause, hormones, hating, new, generations, amp, way, world, progress, useless, angry, piece, shit, take, care, totally, depended, people, secretly, wants, die, already, even, imagine, absolutely, even, happy, d, take, life, avoid]"
21,"[do, you, think, getting, hit, by, a, train, would, be, painful, guns, are, hard, to, come, by, in, my, country, but, trains, are, not, i, just, do, not, want, to, suffer, though, do, you, think, this, would, be, a, painless, method, of, suicide]","[think, getting, hit, train, painful, guns, hard, come, country, trains, want, suffer, though, think, painless, method, suicide]"


###**Eliminación Tokens Cortos/Irrelevantes (1–2 Letras)**

En este apartado se llevará a cabo la eliminación de tokens cortos, con el objetivo de descartar palabras de una o dos letras que no ofrecen valor semántico relevante. Para ello, se conservará temporalmente la versión original de los tokens, lo que permitirá visualizar una comparación directa antes y después del proceso. Una vez aplicado el filtro, se presentará una vista previa de los cambios realizados y, posteriormente, se eliminará la columna auxiliar. Con esta etapa, el conjunto de tokens quedará más depurado y listo para avanzar hacia análisis más precisos y representativos.

In [0]:
# Crear columna temporal con tokens antes del filtrado
df_clean = df_clean.withColumn("tokens_before_shortfilter", F.col("tokens"))

# Aplicar filtro: dejar solo tokens con len >= 3
df_clean = df_clean.withColumn(
    "tokens",
    F.expr("filter(tokens, x -> length(x) >= 3)")
)

# Crear preview Antes/Despues
preview_short = (
    df_clean
    .filter("tokens_before_shortfilter != tokens")
    .select(
        "id",
        F.col("tokens_before_shortfilter").alias("before"),
        F.col("tokens").alias("after")
    )
    .limit(10)
    .toPandas()
)

html_preview_short = preview_short.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 600px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
">
{html_preview_short}
</div>
"""))

# Eliminar columna temporal
df_clean = df_clean.drop("tokens_before_shortfilter")

id,before,after
2,"[ex, wife, threatening, suiciderecently, left, wife, good, cheated, twice, lied, much, decided, refuse, go, back, days, ago, began, threatening, suicide, tirelessly, spent, paat, days, talking, keeps, hesitating, wants, believe, ll, come, back, know, lot, people, threaten, order, get, way, happens, really, supposed, handle, death, hands, still, love, wife, deal, getting, cheated, constantly, feeling, insecure, worried, today, may, day, hope, much, happen]","[wife, threatening, suiciderecently, left, wife, good, cheated, twice, lied, much, decided, refuse, back, days, ago, began, threatening, suicide, tirelessly, spent, paat, days, talking, keeps, hesitating, wants, believe, come, back, know, lot, people, threaten, order, get, way, happens, really, supposed, handle, death, hands, still, love, wife, deal, getting, cheated, constantly, feeling, insecure, worried, today, may, day, hope, much, happen]"
18,"[life, years, oldhello, year, old, balding, male, hairline, trash, make, matters, worse, head, huge, bipolar, depression, crippling, social, anxiety, balding, cherry, top, wear, hat, even, room, alone, stop, thinking, pop, xanax, day, try, numb, pain, works, little, bit, comes, crashing, back, twice, hard, come, know, communicate, people, anymore, know, keep, relationship, used, one, popular, kids, dad, passed, away, feel, deep, dark, hole, arrested, numerous, times, rehab, mental, hospitals, name, reason, killed, yet, mom, brothers, d, dead, long, ago, getting, point, even, love, support, going, enough, keep, alive, anymore, either, going, guy, killed, guy, went, bald, looks, like, child, molestor, one, ...]","[life, years, oldhello, year, old, balding, male, hairline, trash, make, matters, worse, head, huge, bipolar, depression, crippling, social, anxiety, balding, cherry, top, wear, hat, even, room, alone, stop, thinking, pop, xanax, day, try, numb, pain, works, little, bit, comes, crashing, back, twice, hard, come, know, communicate, people, anymore, know, keep, relationship, used, one, popular, kids, dad, passed, away, feel, deep, dark, hole, arrested, numerous, times, rehab, mental, hospitals, name, reason, killed, yet, mom, brothers, dead, long, ago, getting, point, even, love, support, going, enough, keep, alive, anymore, either, going, guy, killed, guy, went, bald, looks, like, child, molestor, one, choose]"
20,"[imagine, getting, old, neither, wrinkles, weight, gain, hair, loss, messed, teeth, bones, health, issues, menopause, hormones, hating, new, generations, amp, way, world, progress, useless, angry, piece, shit, take, care, totally, depended, people, secretly, wants, die, already, even, imagine, absolutely, even, happy, d, take, life, avoid]","[imagine, getting, old, neither, wrinkles, weight, gain, hair, loss, messed, teeth, bones, health, issues, menopause, hormones, hating, new, generations, amp, way, world, progress, useless, angry, piece, shit, take, care, totally, depended, people, secretly, wants, die, already, even, imagine, absolutely, even, happy, take, life, avoid]"
25,"[scared, everything, seems, getting, worse, worse, young, think, transgender, even, sure, tell, lying, actually, trans, feel, overwhelmed, thoughts, emotions, take, anymore, wish, least, know, sure, trans, even, worry, religious, family, accepting, actually, anything, alleviate, pain, bit, cut, first, time, yesterday, barely, even, drew, blood, even, fucking, hurt, correctly, think, ll, ever, able, anything, correctly, want, pursue, music, know, money, found, field, unless, become, famous, happening, currently, seriously, debating, suicide, thoughts, keep, coming, back, keep, getting, worse, sure, really, take, much, longer, wish, born, girl, want, cry]","[scared, everything, seems, getting, worse, worse, young, think, transgender, even, sure, tell, lying, actually, trans, feel, overwhelmed, thoughts, emotions, take, anymore, wish, least, know, sure, trans, even, worry, religious, family, accepting, actually, anything, alleviate, pain, bit, cut, first, time, yesterday, barely, even, drew, blood, even, fucking, hurt, correctly, think, ever, able, anything, correctly, want, pursue, music, know, money, found, field, unless, become, famous, happening, currently, seriously, debating, suicide, thoughts, keep, coming, back, keep, getting, worse, sure, really, take, much, longer, wish, born, girl, want, cry]"
38,"[think, today, may, last, everything, becoming, overwhelming, late, enough, night, think, enough, go, finally, end, miserable, life, mine, plan, works, certain, friend, call, know, ll, able, actually, move, know, want, anymore, dying, fixes, everything, like, pain, chest, everything, else, dying, fix, everything, d, stop, nuisance, past, people, care, waste, anyone, time, dead, amp, think, time, right, ll, go]","[think, today, may, last, everything, becoming, overwhelming, late, enough, night, think, enough, finally, end, miserable, life, mine, plan, works, certain, friend, call, know, able, actually, move, know, want, anymore, dying, fixes, everything, like, pain, chest, everything, else, dying, fix, everything, stop, nuisance, past, people, care, waste, anyone, time, dead, amp, think, time, right]"
41,"[best, way, looking, talked, effective, easiest, way, go]","[best, way, looking, talked, effective, easiest, way]"
42,"[man, hope, someone, finds, thisi, drunk, fuck, found, hodgkins, lymphoma, want, fam, suffer, shit, taking, life, tomorrow, guys, think, point, knife, heart, fall, downwards, ll, done, easy, hope, going, man, hope, girls, moves, sweet, loves, falling, sleep, arm, next, hopes, thinks, coward, hope, hates, instead, feeling, sad, man, hope, brother, affected, good, soul, fuck, man, rambling, sorry, gone, tomorrow, bullshit, posts, like]","[man, hope, someone, finds, thisi, drunk, fuck, found, hodgkins, lymphoma, want, fam, suffer, shit, taking, life, tomorrow, guys, think, point, knife, heart, fall, downwards, done, easy, hope, going, man, hope, girls, moves, sweet, loves, falling, sleep, arm, next, hopes, thinks, coward, hope, hates, instead, feeling, sad, man, hope, brother, affected, good, soul, fuck, man, rambling, sorry, gone, tomorrow, bullshit, posts, like]"
44,"[feel, like, drowningi, used, go, school, state, university, drop, financial, issues, year, since, moved, anywhere, life, one, job, hours, week, make, enough, money, essentials, like, food, soap, toothpaste, etc, trying, find, another, job, find, one, feel, like, going, stuck, hole, forever, part, wants, everything, apply, jobs, day, make, sure, call, back, get, explanation, need, also, able, afford, haircut, months, selfconscious, hair, never, gets, long, feels, gross, spent, month, bank, account, overdrafted, clue, pay, rent, sorry, rambling, stop, thinking, killing, really, want, help]","[feel, like, drowningi, used, school, state, university, drop, financial, issues, year, since, moved, anywhere, life, one, job, hours, week, make, enough, money, essentials, like, food, soap, toothpaste, etc, trying, find, another, job, find, one, feel, like, going, stuck, hole, forever, part, wants, everything, apply, jobs, day, make, sure, call, back, get, explanation, need, also, able, afford, haircut, months, selfconscious, hair, never, gets, long, feels, gross, spent, month, bank, account, overdrafted, clue, pay, rent, sorry, rambling, stop, thinking, killing, really, want, help]"
48,"[lonely, year, old, guy, feeling, like, loser, future, feel, alone, insecure, fat, pound, guy, really, small, inch, penis, gf, experience, girls, cause, feel, ugly, think, laugh, size, one, real, friend, barely, family, left, still, cares, top, college, educaton, idea, kind, career, pursue, leaves, working, shit, minimum, wage, job, hate, living, life, paycheck, paycheck, feel, like, saying, fuck, giving, life, good, feel, like, biggest, fucking, loser, world, year, olds, successful, sad]","[lonely, year, old, guy, feeling, like, loser, future, feel, alone, insecure, fat, pound, guy, really, small, inch, penis, experience, girls, cause, feel, ugly, think, laugh, size, one, real, friend, barely, family, left, still, cares, top, college, educaton, idea, kind, career, pursue, leaves, working, shit, minimum, wage, job, hate, living, life, paycheck, paycheck, feel, like, saying, fuck, giving, life, good, feel, like, biggest, fucking, loser, world, year, olds, successful, sad]"
51,"[revenge, suicidedoes, thought, ever, cross, mind, ever, alone, isolated, uncared, feel, like, youve, reached, yet, one, hears, takes, seriously, ever, feel, like, everyone, life, given, feel, like, jumping, roof, right, way, anyone, ever, notice, care, much, pain, feel, onewill, care, dead, people, finally, hear, ll, understand, serious, theyll, understand, much, pain, right, theyll, wish, tried, harder, listen, every, time, spoke, hurting, ending, life, past, twelve, months, may, dead, may, never, see, butpeople, finally, care, peop, e, finally, notice, pain, itll, okay, ill, dead, wont, pain, wont, anything, thats, really, really, okay, coward, enormou, fucking, coward, sitting, roof, slowly, dosing, medication, ...]","[revenge, suicidedoes, thought, ever, cross, mind, ever, alone, isolated, uncared, feel, like, youve, reached, yet, one, hears, takes, seriously, ever, feel, like, everyone, life, given, feel, like, jumping, roof, right, way, anyone, ever, notice, care, much, pain, feel, onewill, care, dead, people, finally, hear, understand, serious, theyll, understand, much, pain, right, theyll, wish, tried, harder, listen, every, time, spoke, hurting, ending, life, past, twelve, months, may, dead, may, never, see, butpeople, finally, care, peop, finally, notice, pain, itll, okay, ill, dead, wont, pain, wont, anything, thats, really, really, okay, coward, enormou, fucking, coward, sitting, roof, slowly, dosing, medication, hope, wit, ...]"


##**Procesamiento Lingüístico Profundo**

En esta sección se aplicarán transformaciones lingüísticas profundas orientadas a normalizar y enriquecer el texto, buscando extraer información relevante y representaciones más consistentes de cada registro. El objetivo será reducir ruido semántico y preparar los datos para análisis y modelado más precisos en etapas posteriores.

###**Lematización**

En este apartado se aplicará un proceso de lematización diseñado para reducir cada palabra a una forma más básica y coherente. Para lograrlo, se utilizarán reglas específicas que permiten reconocer variaciones irregulares, eliminar terminaciones innecesarias y unificar palabras que comparten el mismo significado. Antes de transformar los tokens, se conservará una copia temporal para comparar los cambios realizados y visualizar una muestra del resultado. Con este paso, el texto quedará más normalizado y listo para análisis lingüísticos más precisos en fases posteriores.

In [0]:
# Mapeo de formas irregularares
irregulars = {
    # Formas y verbos comunes
    "went":"go","gone":"go","going":"go",
    "was":"be","were":"be","is":"be","are":"be","been":"be","being":"be",
    "has":"have","had":"have","have":"have",
    "does":"do","did":"do","done":"do","doing":"do",
    "saw":"see","seen":"see","seen":"see",
    "took":"take","taken":"take",
    "gave":"give","given":"give",
    "thought":"think","thinking":"think",
    "bought":"buy","brought":"bring",
    "felt":"feel","felt":"feel",
    "found":"find","found":"find",
    "kept":"keep","kept":"keep",
    "left":"leave","left":"leave",
    "lost":"lose","lost":"lose",
    "made":"make","made":"make",
    "met":"meet","met":"meet",
    "paid":"pay","paid":"pay",
    "said":"say","said":"say",
    "sat":"sit","sat":"sit",
    "stood":"stand","stood":"stand",
    "told":"tell","told":"tell",
    "understood":"understand","understood":"understand",
    "won":"win","won":"win",
    "wrote":"write","written":"write",
    "ran":"run","running":"run",
    "ate":"eat","eaten":"eat",
    "drank":"drink","drunk":"drink",
    "sang":"sing","sung":"sing",
    "broke":"break","broken":"break",
    "chose":"choose","chosen":"choose",

    # Formas comunes de auxiliares y modales
    "can":"can","could":"can","may":"may","might":"may","must":"must",
    "shall":"shall","should":"shall","will":"will","would":"will",

    # Adjetivos irregulares / comparativos
    "better":"good","best":"good","worse":"bad","worst":"bad",

    # Algunos sustantivos con formas irregulares
    "children":"child","mice":"mouse","geese":"goose","teeth":"tooth","feet":"foot",
    "men":"man","women":"woman","people":"person","lice":"louse",

    # Participios comunes / formas
    "succeeded":"succeed","failed":"fail","uploaded":"upload","downloaded":"download",
}

# Palabras que NO se deben modificar aunque coincidan con las reglas (lista corta)
safe_exceptions = set([
    "us","as","is","this","that","bus","gas","news","lens","glass","class","boss"
])

# Reemplazos de sufijos con prioridad (primero los más largos)
suffix_rules = [
    # Transformaciones de verbos/sustantivos/adjetivos (sufijo, reemplazo, longitud mínima de la raíz)
    ("ization", "ize", 4),
    ("ational", "ate", 4),
    ("ication", "ic", 4),
    ("tional", "tion", 4),
    ("ational", "ate", 4),
    ("fulness", "ful", 4),
    ("iveness", "ive", 4),
    ("ization", "ize", 4),
    ("ational", "ate", 4),
    ("ization", "ize", 4),
    ("ational", "ate", 4),
    ("isation", "ise", 4),
    ("isation", "ise", 4),
    ("ization", "ize", 4),
    ("ational", "ate", 4),
    ("ication", "ic", 4),
    ("ness", "", 3),
    ("ment", "", 3),
    ("ship", "", 3),
    ("able", "", 4),
    ("ible", "", 4),
    ("ence", "", 4),
    ("ance", "", 4),
    ("ology", "o", 4),
]

# Pequeñas utilidades auxiliares
vowels = set("aeiou")

def has_vowel(s):
    for ch in s:
        if ch in vowels:
            return True
    return False

def safe_candidate(orig, cand):
    """Devuelve True si cand es seguro de usar en lugar de orig"""
    # Debe ser más corto o diferente y tener ≥3 caracteres (a menos que orig sea pequeño)
    if cand == orig:
        return False
    if len(cand) < 3 and len(orig) >= 3:
        return False
    # El candidato debe contener una vocal (para evitar sin sentido)
    if not has_vowel(cand):
        return False
    # Evitar generar tokens sin sentido extremadamente cortos
    if len(cand) < 2:
        return False
    # Evitar modificar las excepciones de la lista blanca
    if orig in safe_exceptions:
        return False
    return True

# Heurísticas para adivinar la categoría gramatical (POS) (muy aproximado) usando tokens del contexto local
def guess_pos(tokens, i):
    """
    Devuelve 'v' (verbo), 'n' (sustantivo), 'adj' (adjetivo), 'adv' (adverbio) o None
    Heurísticas muy simples:
      - si el token anterior es 'to' o un modal/auxiliar → probablemente verbo
      - si termina en 'ly' → adverbio
      - si el sufijo sugiere sustantivo → sustantivo
      - si el token anterior es 'the'/'a' → probablemente sustantivo
    """
    tok = tokens[i].lower() if tokens[i] else tokens[i]
    prev = tokens[i-1].lower() if i-1 >= 0 and tokens[i-1] else None
    if prev in ("to","will","would","should","could","might","may","can","must","did","do","does","did","have","has","had","be","being","been"):
        return "v"
    if tok.endswith("ly"):
        return "adv"
    if tok.endswith("ness") or tok.endswith("ment") or tok.endswith("tion") or tok.endswith("sion") or tok.endswith("ship"):
        return "n"
    if tok.endswith("able") or tok.endswith("ible") or tok.endswith("al") or tok.endswith("ous") or tok.endswith("ive"):
        return "adj"
    if prev in ("the","a","an","this","those","these"):
        return "n"
    return None

# La función principal de lematización
def super_advanced_lemmatize(tokens):
    if tokens is None:
        return []
    out = []
    # Trabajar con la lista de tokens originales para usar heurísticas del contexto local
    for i, t in enumerate(tokens):
        if t is None:
            continue
        orig = t
        w = orig.lower()

        # Mantener URLs/menciones/hashtags/emoticonos tal como están
        if w.startswith("http") or w.startswith("@") or w.startswith("#") or re.search(r"[:;=8][\-~]?[)DPOp/\\]", w):
            out.append(w)
            continue

        # Mantener signos de puntuación o números puros
        if re.fullmatch(r"[\W_]+", w) or re.fullmatch(r"\d+", w):
            out.append(w)
            continue

        # Mantener si el token es corto y común
        if w in safe_exceptions or len(w) <= 2:
            out.append(w)
            continue

        # Búsqueda directa de irregulares
        if w in irregulars:
            out.append(irregulars[w])
            continue

        # Intentar aplicar reglas explícitas de sufijos (orden de prioridad)
        applied = False
        for suf, rep, minroot in suffix_rules:
            if w.endswith(suf) and len(w) - len(suf) >= minroot:
                cand = w[:-len(suf)] + rep
                # Normalizar letras dobles creadas o eliminadas
                if cand.endswith("ie") and not has_vowel(cand[:-2]):
                    pass
                if safe_candidate(w, cand):
                    out.append(cand)
                    applied = True
                    break
        if applied:
            continue

        # Heurística de categoría gramatical (POS)
        pos = guess_pos(tokens, i)

        # Reglas morfológicas según la categoría gramatical estimada
        if pos == "v" or (w.endswith("ing") or w.endswith("ed") or w.endswith("en")):
            # Manejar 'being' → be
            if w == "being":
                out.append("be"); continue

            # -ing
            if w.endswith("ing") and len(w) > 4:
                base = w[:-3]
                # Consonante doble: running → run
                if len(base) > 2 and base[-1] == base[-2]:
                    cand = base[:-1]
                    if safe_candidate(w, cand):
                        out.append(cand); continue
                # Verbos donde la raíz termina en 'e' eliminada: making → make
                if base + "e" not in ("",):
                    cand = base
                    if safe_candidate(w, cand):
                        out.append(cand); continue
                # Recurso/alternativa de respaldo
                if safe_candidate(w, base):
                    out.append(base); continue

            # -ied -> y (studied -> study)
            if w.endswith("ied") and len(w) > 4:
                cand = w[:-3] + "y"
                if safe_candidate(w, cand):
                    out.append(cand); continue

            # -ed
            if w.endswith("ed") and len(w) > 3:
                base = w[:-2]
                if len(base) > 2 and base[-1] == base[-2]:
                    cand = base[:-1]
                    if safe_candidate(w, cand):
                        out.append(cand); continue
                if safe_candidate(w, base):
                    out.append(base); continue

            # -en participio pasado
            if w.endswith("en") and len(w) > 4:
                cand = w[:-2]
                if safe_candidate(w, cand):
                    out.append(cand); continue

        # Manejo tipo sustantivo
        if pos == "n" or (w.endswith("s") or w.endswith("ies") or w.endswith("ves") or w.endswith("es")):
            # 'ies' -> 'y'
            if w.endswith("ies") and len(w) > 4:
                cand = w[:-3] + "y"
                if safe_candidate(w, cand):
                    out.append(cand); continue
            # 'ves' -> 'f'/'fe' elegir 'f'
            if w.endswith("ves") and len(w) > 4:
                cand = w[:-3] + "f"
                if safe_candidate(w, cand):
                    out.append(cand); continue
            # 'es' -> remover
            if w.endswith("es") and len(w) > 3:
                cand = w[:-2]
                if safe_candidate(w, cand):
                    out.append(cand); continue
            # 's' -> singular (evitar ss/us/is)
            if w.endswith("s") and not (w.endswith("ss") or w.endswith("us") or w.endswith("is")) and len(w) > 3:
                cand = w[:-1]
                if safe_candidate(w, cand):
                    out.append(cand); continue

        # Manejo de adjetivos/adverbios
        if pos == "adj" or pos == "adv":
            # -ly -> remover
            if w.endswith("ly") and len(w) > 4:
                cand = w[:-2]
                if safe_candidate(w, cand):
                    out.append(cand); continue
            # -ness -> remover
            if w.endswith("ness") and len(w) > 5:
                cand = w[:-4]
                if safe_candidate(w, cand):
                    out.append(cand); continue
            # Comparativos / superlativos -er/-est
            if w.endswith("er") and len(w) > 4:
                cand = w[:-2]
                if safe_candidate(w, cand):
                    out.append(cand); continue
            if w.endswith("est") and len(w) > 5:
                cand = w[:-3]
                if safe_candidate(w, cand):
                    out.append(cand); continue

        # Verificar manualmente algunos finales comunes
        # Si termina en 'ies' pero el candidato es demasiado corto, volver al original
        if w.endswith("ies") and len(w) <= 4:
            out.append(w); continue

        # Pequeñas heurísticas morfológicas (captura general)
        # Eliminar sufijos simples solo si es seguro
        if w.endswith("ing") and len(w) > 5:
            cand = w[:-3]
            if safe_candidate(w, cand):
                out.append(cand); continue
        if w.endswith("ed") and len(w) > 4:
            cand = w[:-2]
            if safe_candidate(w, cand):
                out.append(cand); continue
        if w.endswith("ly") and len(w) > 4:
            cand = w[:-2]
            if safe_candidate(w, cand):
                out.append(cand); continue

        # Recurso final: mantener el token original en minúsculas
        out.append(w)

    return out

# Registrar UDF para Spark
lemmatize_udf = F.udf(super_advanced_lemmatize, ArrayType(StringType()))

# Columna temporal before
df_clean = df_clean.withColumn("tokens_before_lemma", F.col("tokens"))

# Aplicar lematización
df_clean = df_clean.withColumn("tokens", lemmatize_udf(F.col("tokens")))

# Crear preview Antes/Después
preview_lemma = (
    df_clean
    .filter("tokens_before_lemma != tokens")
    .select(
        "id",
        F.col("tokens_before_lemma").alias("before"),
        F.col("tokens").alias("after")
    )
    .limit(10)
    .toPandas()
)

# Mostrar tabla con scroll
html_preview_lemma = preview_lemma.to_html(escape=False, index=False)

display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 600px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    padding: 15px;
">
{html_preview_lemma}
</div>
"""))

# Eliminar columna temporal
df_clean = df_clean.drop("tokens_before_lemma")

id,before,after
2,"[wife, threatening, suiciderecently, left, wife, good, cheated, twice, lied, much, decided, refuse, back, days, ago, began, threatening, suicide, tirelessly, spent, paat, days, talking, keeps, hesitating, wants, believe, come, back, know, lot, people, threaten, order, get, way, happens, really, supposed, handle, death, hands, still, love, wife, deal, getting, cheated, constantly, feeling, insecure, worried, today, may, day, hope, much, happen]","[wife, threaten, suiciderecent, leave, wife, good, cheat, twice, lied, much, decid, refuse, back, day, ago, began, threaten, suicide, tireless, spent, paat, day, talk, keep, hesitat, want, believe, come, back, know, lot, person, threat, order, get, way, happen, real, suppos, handle, death, hand, still, love, wife, deal, get, cheat, constant, feel, insecure, worry, today, may, day, hope, much, happ]"
9,"[losthello, name, adam, struggling, years, afraid, past, years, thoughts, suicide, fear, anxiety, close, limit, quiet, long, scared, come, family, feelings, years, ago, losing, aunt, triggered, everyday, feeling, hopeless, lost, guilty, remorseful, things, done, life, thoughts, like, little, experienced, life, time, revealed, feelings, family, broke, saw, cuts, watching, get, worried, something, portrayed, average, day, made, feel, absolutely, dreadful, later, found, attempt, survivor, attempt, odoverdose, pills, attempt, hanging, happened, blackout, pills, never, went, noose, still, afraid, first, therapy, diagnosed, severe, depression, social, anxiety, eating, disorder, later, transferred, fucken, group, therapy, reason, made, feel, anxious, eventually, last, session, therapy, showed, results, daily, check, ...]","[losthello, name, adam, struggl, year, afraid, past, year, thought, suicide, fear, anxiety, close, limit, quiet, long, scar, come, fami, feeling, year, ago, los, aunt, trigger, everyday, feel, hopeless, lose, guilty, remorseful, thing, do, life, thought, like, little, experienc, life, time, reveal, feeling, fami, break, see, cut, watch, get, worry, someth, portray, average, day, make, feel, absolute, dreadful, later, find, attempt, survivor, attempt, odoverdose, pill, attempt, hang, happen, blackout, pill, never, go, noose, still, afraid, first, therapy, diagnos, severe, depression, social, anxiety, eat, disorder, later, transfer, fuck, group, therapy, reason, make, feel, anxious, eventual, last, session, therapy, show, result, dai, check, ...]"
11,"[honetly, idki, know, even, feel, like, nothing, nowhere, feel, either, nothing, unbearably, sad, ignoring, friends, every, opitunity, feel, like, loosing, girlfriend, hurt, everyone, talk, cause, anything, good, behind, education, feel, alone, first, time, feeling, ive, enjoyed, hopes, dreams, care, nothing, family, friends, even, girlfriend, still, love, complicated, words, describe, something, end, know, strong, brave, enough, knowing, weak, makes, sadder, thing, push, away, emotion, empty, bad, used, way, normal, understand, people, hopes, dreams, mentioned, bad, feeling, girlfriend, got, scared, die, havnt, brought, talk, realised, cant, even, comprehend, life, meaning, anyone, know, rambling, probably, regret, posting, ill, think, taking, place, someone, worse, ...]","[honet, idki, know, even, feel, like, noth, nowhere, feel, either, noth, unbearab, sad, ignor, friend, every, opitunity, feel, like, loos, girlfriend, hurt, everyone, talk, cause, anyth, good, behind, education, feel, alone, first, time, feel, ive, enjoy, hop, dream, care, noth, fami, friend, even, girlfriend, still, love, complicat, word, describe, someth, end, know, strong, brave, enough, know, weak, mak, sadder, thing, push, away, emotion, empty, bad, used, way, normal, understand, person, hop, dream, mention, bad, feel, girlfriend, got, scar, die, havnt, bring, talk, realis, cant, even, comprehend, life, mean, anyone, know, rambl, probab, regret, post, ill, think, tak, place, someone, bad, ...]"
12,"[trigger, warning, excuse, self, inflicted, burnsi, know, crisis, line, used, panic, attack, know, healthy, thing, something, stupid, impulse, burned, really, need, help, excuse, father, daughter, knows, history, together, years, seen, worst, always, cut, ankles, wrists, thinking, excuse, easier, one, cuts, work, car, last, night, self, harmed, long, time, without, thinking, usual, impulse, lost, moment, say, touched, something, hood, car, still, hot, almost, curved, like, pattern, first, forearm, little, side, wrist, inch, long, kind, wide, little, deep, think, car, excuse, good, one, need, say, working, explain, burns, maybe, wire, smooshed, behind, engine, went, fix, touched, engine, want, self, harm, need, able, ...]","[trigger, warn, excuse, self, inflict, burnsi, know, crisis, line, used, panic, attack, know, healthy, thing, someth, stupid, impulse, burn, real, need, help, excuse, father, daughter, know, history, together, year, see, bad, alway, cut, ankl, wrist, think, excuse, easier, one, cut, work, car, last, night, self, harm, long, time, without, think, usual, impulse, lose, moment, say, touch, someth, hood, car, still, hot, almost, curv, like, pattern, first, forearm, little, side, wrist, inch, long, kind, wide, little, deep, think, car, excuse, good, one, need, say, work, explain, burn, maybe, wire, smoosh, behind, engine, go, fix, touch, engine, want, self, harm, need, able, ...]"
13,"[ends, tonight, anymore, quit]","[end, tonight, anymore, quit]"
18,"[life, years, oldhello, year, old, balding, male, hairline, trash, make, matters, worse, head, huge, bipolar, depression, crippling, social, anxiety, balding, cherry, top, wear, hat, even, room, alone, stop, thinking, pop, xanax, day, try, numb, pain, works, little, bit, comes, crashing, back, twice, hard, come, know, communicate, people, anymore, know, keep, relationship, used, one, popular, kids, dad, passed, away, feel, deep, dark, hole, arrested, numerous, times, rehab, mental, hospitals, name, reason, killed, yet, mom, brothers, dead, long, ago, getting, point, even, love, support, going, enough, keep, alive, anymore, either, going, guy, killed, guy, went, bald, looks, like, child, molestor, one, choose]","[life, year, oldhello, year, old, bald, male, hairline, trash, make, matter, bad, head, huge, bipolar, depression, crippl, social, anxiety, bald, cherry, top, wear, hat, even, room, alone, stop, think, pop, xanax, day, try, numb, pain, work, little, bit, com, crash, back, twice, hard, come, know, communicate, person, anymore, know, keep, relation, used, one, popular, kid, dad, pas, away, feel, deep, dark, hole, arrest, numerous, tim, rehab, mental, hospital, name, reason, kil, yet, mom, brother, dead, long, ago, get, point, even, love, support, go, enough, keep, alive, anymore, either, go, guy, kil, guy, go, bald, look, like, child, molestor, one, choose]"
19,"[took, rest, sleeping, pills, painkillersi, wait, end, struggled, past, years, finally, ending]","[take, rest, sleep, pill, painkillersi, wait, end, struggl, past, year, final, end]"
20,"[imagine, getting, old, neither, wrinkles, weight, gain, hair, loss, messed, teeth, bones, health, issues, menopause, hormones, hating, new, generations, amp, way, world, progress, useless, angry, piece, shit, take, care, totally, depended, people, secretly, wants, die, already, even, imagine, absolutely, even, happy, take, life, avoid]","[imagine, get, old, neither, wrinkl, weight, gain, hair, loss, mes, tooth, bon, health, issu, menopause, hormon, hat, new, generation, amp, way, world, progress, useless, angry, piece, shit, take, care, total, depend, person, secret, want, die, already, even, imagine, absolute, even, happy, take, life, avoid]"
21,"[think, getting, hit, train, painful, guns, hard, come, country, trains, want, suffer, though, think, painless, method, suicide]","[think, get, hit, train, painful, gun, hard, come, country, train, want, suffer, though, think, painless, method, suicide]"
22,"[death, continuedi, posted, saw, something, interesting, asked, information, know, got, back, bunch, people, wanted, thing, always, spit, back, personal, information, makes, things, worse, obviously, least, bunch, trolls, laughs, end, desire, selfterminate, grows, stronger, little, left, still, bitterness, bit, stronger, main, goal, throughout, process, minimize, subsequent, fallout, certainly, nice, patrons, forum, respectful, privacy, obviously, bit, ridiculous, expectation, considering, source]","[death, continuedi, post, see, someth, interest, ask, information, know, got, back, bunch, person, want, thing, alway, spit, back, personal, information, mak, thing, bad, obvious, least, bunch, troll, laugh, end, desire, selfterminate, grow, stronger, little, leave, still, bitter, bit, stronger, main, goal, throughout, process, minimize, subsequent, fallout, certain, nice, patron, forum, respectful, privacy, obvious, bit, ridiculous, expectation, consider, source]"


###**Extracción Entidades (NER)**

En este apartado se realizará la extracción automática de entidades relevantes dentro del texto, un proceso conocido como Named Entity Recognition (NER). Para ello se definirán listas de palabras y patrones capaces de detectar menciones sobre medicamentos, sustancias, métodos autolesivos, lugares, personas cercanas, cantidades y expresiones que indiquen riesgo o intención. La función creada analizará cada mensaje y marcará cualquier término asociado a estos grupos, permitiendo identificar señales importantes dentro del contenido. Finalmente, se agregarán columnas con el conteo de entidades encontradas y se generará una vista preliminar para revisar los resultados de manera rápida.

In [0]:
# Listas NER (Named Entity Recognition)

# Medicamentos
medications = [
    "acetaminophen","paracetamol","ibuprofen","aspirin","naproxen",
    "amoxicillin","azithromycin","penicillin","metformin","insulin",
    "atorvastatin","simvastatin","lisinopril","amlodipine","omeprazole",
    "pantoprazole","prednisone","hydrocortisone","albuterol","salbutamol",
    "levothyroxine","warfarin","clopidogrel","gabapentin","pregabalin",
    "tramadol","oxycodone","oxycotin","oxycodone","hydrocodone","vicodin",
    "morphine","fentanyl","codeine","buprenorphine","methadone",
    "diazepam","alprazolam","lorazepam","clonazepam","temazepam",
    "zolpidem","eszopiclone","sertraline","fluoxetine","paroxetine",
    "citalopram","escitalopram","venlafaxine","duloxetine","bupropion",
    "amitriptyline","nortriptyline","imipramine","quetiapine","olanzapine",
    "risperidone","aripiprazole","haloperidol","clozapine","metformin",
    "insulin glargine","insulin lispro","sitagliptin","glipizide",
    "pioglitazone","levetiracetam","valproate","carbamazepine","phenytoin",
    "montelukast","furosemide","hydrochlorothiazide","spironolactone",
    "amlodipine","losartan","valsartan","atenolol","propranolol",
    "digoxin","isotretinoin","tamoxifen","letrozole","trastuzumab",
    "prednisone","methotrexate","cyclophosphamide","infliximab",
    "adalimumab","etanercept","rituximab","ondansetron","promethazine",
    "benzodiazepine","benzo","ssri","snri","tca","mirtazapine",
    "xanax","prozac","zoloft","lexapro","celexa","cymbalta","effexor",
    "valium","ativan","klonopin","ambien","lunesta","seroquel","abilify",
    "vyvanse","adderall","ritalin","concerta","xanac","xanex","prozack"
]

# Sustancias
substances = [
    # recreativo / común
    "alcohol","ethanol","weed","marijuana","cannabis","coke","cocaine",
    "crack","heroin","opiates","opiates","opiate","meth","methamphetamine",
    "mdma","ecstasy","lsd","ketamine","pcp","synthetic cannabinoid",
    "nicotine","tobacco","vape","cigarette","smoke",

    # términos comunes para pastillas/contenedores
    "pill","pills","tablet","tablets","capsule","capsules","dose","doses",
    "bottle","bottles","packet","blister",

    # productos químicos domésticos/peligrosos
    "bleach","detergent","pesticide","insecticide","rodenticide","cyanide",
    "mercury","lead","arsenic","ammonia","chlorine","benzene","toluene",
    "paint thinner","solvent","gasoline","kerosene","antifreeze","ethylene glycol",

    # opioides / sintéticos
    "fentanyl","carfentanil","tramadol","oxycodone","hydrocodone","morphine",
    "codeine","heroin","opium",

    # fármacos comunes repetidos para detección de sustancias
    "acetaminophen","paracetamol","ibuprofen","aspirin","naproxen",

    # patrones regex para formas de nombres químicos
    r"\b[A-Za-z0-9\-]+ine\b",         # ej., chlorine, benzocaine, codeine
    r"\b[A-Za-z0-9\-]+ane\b",         # ej., propane, butane, methane
    r"\b[A-Za-z0-9\-]+ol\b",          # ej., ethanol, methanol
    r"\b[A-Za-z0-9\-]+one\b",         # ej., acetone
    r"\b[A-Za-z0-9\-]+ide\b",         # ej., cyanide, chloride
]

# Metodos suicidas
methods = [
    # sobredosis / envenenamiento
    "overdose","overdosed","take overdose","od","poison","poisoning","ingest poison",
    # corte / instrumento punzante
    "cut","cutting","cut myself","slit","slitting","slash","stab","stabbing",
    # ahorcamiento / asfixia
    "hang","hanging","hang myself","strangle","strangling","asphyxiate",
    # armas de fuego
    "shoot","shoot myself","gun","firearm","shot","shooting",
    # saltos / alturas
    "jump","jumped","jumping","jump off","jump from","jumped off","throw myself",
    # ahogamiento
    "drown","drowning","drown myself",
    # monóxido de carbono
    "carbon monoxide","gas leak","gas oven","car exhaust",
    # vehículo / autopista
    "drive into","run over","pit maneuver","steer into",
    # sofocación / bolsa de plástico
    "suffocate","suffocating","smother","plastic bag","bag over head",
    # fuerza contundente / saltar frente a un tren
    "train","railroad","tracks","jump in front of","bus","truck","fast moving",
    # otros
    "end my life","kill myself","take my life","suicide","self harm","self-harm",
    "self harm attempt","selfharm","attempted suicide"
]

# Ubicación
location = [
    # natural
    "mountain","hill","cliff","ridge","valley","canyon","beach","coast","shore","lake",
    "river","stream","waterfall","sea","ocean","island","forest","wood","desert",
    # urbano / construido por el hombre
    "bridge","tunnel","overpass","underpass","highway","freeway","motorway",
    "road","street","avenue","boulevard","lane","drive","alley","parkway","roundabout",
    "intersection","crosswalk","plaza","square","park","playground","parking lot",
    "parking garage","rooftop","roof","balcony","terrace","stadium","arena",
    "station","train station","bus station","terminal","airport","pier","dock",
    "harbor","pier","wharf","dockside","bridge",
    # instituciones / edificios
    "hospital","clinic","emergency room","er","pharmacy","school","college","university",
    "library","museum","hotel","motel","hostel","casino","church","temple","synagogue",
    "police station","courthouse","court","prison","jail","factory","warehouse",
    # transporte
    "railway","railroad","tracks","subway","metro","tram","light rail",
    "highway ramp","cliff edge","waterfront","pier","dock",
    # regex para direcciones / nombres de calles
    r"\b\d{1,5}\s+[A-Z][a-z0-9\-\. ]{2,50}\b",
    r"\b[A-Z][a-z]+ (Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Lane|Ln|Drive|Dr|Court|Ct)\b"
]

# Personas Familiares
person_family = [
    "mom","mother","mommy","mama","mum","dad","father","daddy","pa","pop",
    "son","daughter","sister","brother","uncle","aunt","grandma","grandfather",
    "grandmother","granddad","grandpa","wife","husband","girlfriend","boyfriend",
    "partner","ex","ex-wife","ex-husband","stepfather","stepmother","stepson",
    "stepdaughter","stepbrother","stepsister","cousin","friend","roommate",
    "neighbor","colleague","boss","manager","therapist","doctor","psychiatrist",
    "nurse","teacher","professor"
]

# Palabras de riesgo de acción
action_risk_words = [
    "think about","thinking about","thought about","plan to","planning to","intend to",
    "intend","intend to kill myself","want to die","want to end my life","want to die",
    "considered suicide","considered","attempt","attempted","tried to","tried",
    "bought pills","bought a gun","got pills","took","took pills","took overdose",
    "overdosed","i want to die","i want to kill myself","i might kill myself",
    "i might end it","ready to die","i cant go on","i can't go on","end it all",
    "end it","suicidal","suicidal thoughts","suicide ideation","kill myself",
    "hurt myself","cut myself","self harm","self-harm","selfharm","i tried",
    "i attempted","going to kill myself","planning to kill myself","i will kill myself"
]

# Patrón de ubicación
location_patterns = [
    # puente con identificador opcional
    r"\bbridge(?:\s+\d{1,4})?\b",
    r"\b(?:railroad|railway|train) (?:tracks|track)\b",
    r"\b(?:parking lot|parking garage|car park)\b",
    r"\brooftop\b",
    r"\bbalcony\b",
    r"\bcliff\b",
    r"\b(seaside|waterfront|pier|dock)\b",
    r"\bhighway\b",
    r"\bfreeway\b",
    r"\bstreet\b",
    r"\b(?:bus|train|subway|metro|tram) station\b",
    r"\bhospital\b",
    r"\ber\b",
    r"\bhome\b",
    r"\bhouse\b",
]

# Mapeo de palabras a números
word_nums = {
    "one":"1","two":"2","three":"3","four":"4","five":"5","six":"6","seven":"7",
    "eight":"8","nine":"9","ten":"10","eleven":"11","twelve":"12","dozen":"12",
    "twenty":"20","thirty":"30"
}

# Busca todas las coincidencias de un patrón regex en un texto
def find_regex_phrases(text, pattern):
    return [m.group(0).strip() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

# Normaliza una lista: minúsculas, trim, y sin duplicados
def norm_list_unique(lst):
    seen = set()
    out = []
    for x in lst:
        try:
            xn = str(x).strip().lower()
        except:
            xn = ""
        if xn and xn not in seen:
            seen.add(xn)
            out.append(xn)
    return out

# Función principal de extracción (usa text + tokens)
def extract_entities_from(text, tokens):
    """
    Devuelve un diccionario con listas: Person, Location, Medication, Substance, MethodSuicide, Quantity, ActionRisk
    """
    entities = {
        "Person": [],
        "Location": [],
        "Medication": [],
        "Substance": [],
        "MethodSuicide": [],
        "Quantity": [],
        "ActionRisk": []
    }
    if (text is None or str(text).strip() == "") and (tokens is None or len(tokens)==0):
        return entities

    txt = (text or "").strip()
    txt_low = txt.lower()
    toks = [t.lower() for t in (tokens or []) if t is not None]

    # Medications: coincidencia de token O subcadena en el texto (límites de palabra)
    for med in medications:
        if (med in toks) or re.search(r"\b" + re.escape(med.lower()) + r"\b", txt_low):
            entities["Medication"].append(med)

    #Substances: coincidencia de token O patrones regex
    for sub in substances:
        if isinstance(sub, str):
            if (sub in toks) or re.search(r"\b" + re.escape(sub.lower()) + r"\b", txt_low):
                entities["Substance"].append(sub)
        else:
            try:
                if re.search(sub, txt, flags=re.IGNORECASE):
                    entities["Substances"].append(re.search(sub, txt, flags=re.IGNORECASE).group(0))
            except:
                pass

    # Methods: coincidir frases o tokens
    for m in methods:
        if (m in toks) or re.search(r"\b" + re.escape(m.lower()) + r"\b", txt_low):
            entities["MethodSuicide"].append(m)

    # Locations: patrones + heurísticas de tokens
    for pat in location_patterns:
        matches = find_regex_phrases(txt, pat)
        for mm in matches:
            entities["Location"].append(mm)

    # Intentar palabras de ubicación directas de la lista Location
    for loc in location:
        if isinstance(loc, str):
            if (loc.lower() in toks) or re.search(r"\b" + re.escape(loc.lower()) + r"\b", txt_low):
                entities["Location"].append(loc)
        else:
            try:
                m = re.search(loc, txt, flags=re.IGNORECASE)
                if m:
                    entities["LOCATION"].append(m.group(0))
            except:
                pass

    # Person: patrones de menciones familiares y algunas heurísticas de título/nombre
    for fam in person_family:
        if re.search(r"\b(my|the|a|his|her)\s+" + re.escape(fam.lower()) + r"\b", txt_low):
            entities["Person"].append(fam)

    # Título + Apellido / Nombre Apellido (heurística simple sobre el texto original)
    for m in re.finditer(r"\b(?:Dr|Dr\.|Mr|Mr\.|Mrs|Mrs\.|Ms|Ms\.)\s+[A-Z][a-z]{1,}\b", text or ""):
        entities["Person"].append(m.group(0))
    for m in re.finditer(r"\b[A-Z][a-z]{2,}\s+[A-Z][a-z]{2,}\b", text or ""):
        entities["Person"].append(m.group(0))

    # Quantity: dígitos y números escritos en palabras
    for num in re.finditer(r"\b(\d+)\b", txt_low):
        entities["Quantity"].append(num.group(1))
    for w in toks:
        if w in word_nums:
            entities["Quantity"].append(word_nums[w])

    # ActionRisk: frases/verbos que indican planificación o acción
    for ar in action_risk_words:
        if ar in txt_low or ar in toks:
            entities["ActionRisk"].append(ar)

    # Post-procesamiento: normalizar, eliminar duplicados, pasar a minúsculas, mantener un orden razonable
    for k in list(entities.keys()):
        entities[k] = norm_list_unique([str(x) for x in entities[k] if x is not None])

    return entities

# Register UDF en Spark
extract_udf = F.udf(lambda text, tokens: extract_entities_from(text, tokens),
                    MapType(StringType(), ArrayType(StringType())))

# Aplicar y crear la columna entities
df_entities = df_clean.withColumn("entities", extract_udf(F.col("text"), F.col("tokens")))

# Crear columnas numéricas derivadas (counts)
df_entities = (
    df_entities
    .withColumn("ent_person_ct", F.size(F.col("entities").getItem("Person")))
    .withColumn("ent_location_ct", F.size(F.col("entities").getItem("Location")))
    .withColumn("ent_med_ct", F.size(F.col("entities").getItem("Medication")))
    .withColumn("ent_substance_ct", F.size(F.col("entities").getItem("Substance")))
    .withColumn("ent_method_ct", F.size(F.col("entities").getItem("MethodSuicide")))
    .withColumn("ent_qty_ct", F.size(F.col("entities").getItem("Quantity")))
    .withColumn("ent_actionrisk_ct", F.size(F.col("entities").getItem("ActionRisk")))
)

# Aplicar transformación al dataframe original
df_clean = df_entities

# Crear preview de registros
preview_full = (
    df_clean
    .limit(100)
    .toPandas()
)

html_preview_full = preview_full.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 700px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    border-radius: 8px;
    padding: 15px;
    box-shadow: 0px 2px 8px rgba(0,0,0,0.15);
    font-size: 14px;
">
{html_preview_full}
</div>
"""))

id,text,class,tokens,entities,ent_person_ct,ent_location_ct,ent_med_ct,ent_substance_ct,ent_method_ct,ent_qty_ct,ent_actionrisk_ct
2,"ex wife threatening suiciderecently i left my wife for good because she has cheated on me twice and lied to me so much that i have decided to refuse to go back to her. as of a few days ago, she began threatening suicide. i have tirelessly spent these paat few days talking her out of it and she keeps hesitating because she wants to believe i'll come back. i know a lot of people will threaten this in order to get their way, but what happens if she really does? what do i do and how am i supposed to handle her death on my hands? i still love my wife but i cannot deal with getting cheated on again and constantly feeling insecure. i am worried today may be the day she does it and i hope so much it does not happen.",suicide,"[wife, threaten, suiciderecent, leave, wife, good, cheat, twice, lied, much, decid, refuse, back, day, ago, began, threaten, suicide, tireless, spent, paat, day, talk, keep, hesitat, want, believe, come, back, know, lot, person, threat, order, get, way, happen, real, suppos, handle, death, hand, still, love, wife, deal, get, cheat, constant, feel, insecure, worry, today, may, day, hope, much, happ]","{'ActionRisk': [], 'MethodSuicide': ['suicide'], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': ['wife'], 'Location': []}",1,0,0,0,1,0,0
8,i need helpjust help me i am crying so hard,suicide,"[need, helpjust, help, crying, hard]","{'ActionRisk': [], 'MethodSuicide': [], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': [], 'Location': []}",0,0,0,0,0,0,0
9,"i am so losthello, my name is adam 16 and i have been struggling for years and i am afraid. through these past years thoughts of suicide, fear, anxiety i am so close to my limit . i have been quiet for so long and i am too scared to come out to my family about these feelings. about 3 years ago losing my aunt triggered it all. everyday feeling hopeless , lost, guilty, and remorseful over her and all the things i have done in my life,but thoughts like these with the little i have experienced in life? only time i have revealed these feelings to my family is when i broke down where they saw my cuts. watching them get so worried over something i portrayed as an average day made me feel absolutely dreadful. they later found out i was an attempt survivor from attempt odoverdose from pills and attempt hanging. all that happened was a blackout from the pills and i never went through with the noose because i am still so afraid. during my first therapy i was diagnosed with severe depression, social anxiety, and a eating disorder. i was later transferred to a fucken group therapy for some reason which made me feel more anxious. eventually before my last session with a 1 on 1 therapy she showed me my results from a daily check up on my feelingswhich was a 2 step survey for me and my momdad come to find out as i have been putting feeling horrible and afraidanxious everyday , my mom has been doing i have been doing absolutely amazing with me described as happiest she is ever seen me, therapy has helped him i eventually was put on sertaline anti anxiety or anti depression i am sorry i forgot but i never finished my first prescription nor ever found the right type of anti depressant because my mom thought i only wanted the drugs so she took me off my recommended pill schedule after 3 week and stopped me from taking them. all this time i have been feeling worse afraid of the damage worry i have caused them even more. now here with everything going on, i am as afraid as i have ever been . i have relapsed on cutting and have developed severe insomnia . day after day feeling more hopeless, worthless questioning why am i still here? what is my motivation to move out of bed and keep going? i ask these to myself nearly every night almost having a break down everytime. please please please someone. anyone help me. i am so scared i might do something drastic, i have been shaped by fear and anxiety. i do not know what to do anymore",suicide,"[losthello, name, adam, struggl, year, afraid, past, year, thought, suicide, fear, anxiety, close, limit, quiet, long, scar, come, fami, feeling, year, ago, los, aunt, trigger, everyday, feel, hopeless, lose, guilty, remorseful, thing, do, life, thought, like, little, experienc, life, time, reveal, feeling, fami, break, see, cut, watch, get, worry, someth, portray, average, day, make, feel, absolute, dreadful, later, find, attempt, survivor, attempt, odoverdose, pill, attempt, hang, happen, blackout, pill, never, go, noose, still, afraid, first, therapy, diagnos, severe, depression, social, anxiety, eat, disorder, later, transfer, fuck, group, therapy, reason, make, feel, anxious, eventual, last, session, therapy, show, result, dai, check, ...]","{'ActionRisk': ['attempt', 'took'], 'MethodSuicide': ['cut', 'cutting', 'hang', 'hanging', 'suicide'], 'Medication': [], 'Quantity': ['16', '3', '1', '2'], 'Substance': ['pill', 'pills'], 'Person': ['mom', 'aunt'], 'Location': []}",2,0,0,2,5,4,2
11,"honetly idki do not know what i am even doing here. i just feel like there is nothing and nowhere for me. all i can feel is either nothing or unbearably sad. i am ignoring friends every opitunity i can. i feel like i am loosing my girlfriend. i only hurt everyone i talk too and i do not cause anything good. i am behind on my education, i feel alone but for the first time its not a feeling ive enjoyed. i have no hopes or dreams. i care about nothing, not family, not friends, not even my girlfriend i still love her, its complicated and i do not have the words to describe it. i would do something to end myself but i know i am not strong and brave enough to do it, and knowing i am that weak makes me sadder. the only thing i can do is push away all emotion and be empty, because as bad as it is i am used to it, its my way of being normal. i do not understand how people have hopes or dreams, and i mentioned how bad i was feeling to my girlfriend but she just got scared i would die so i havnt brought it up again. but in that talk i realised i cant even comprehend my life having meaning to anyone. i know this is just me rambling and i will probably regret posting this as ill think i am taking the place of someone having a worse time with a gun to their head. i encoage all people who see this to help them instead of me. ill probably suvive, they might not. plus my life is meaningless and my future bleak, while they could cure cancer or something useful. sorry for wasting your time",suicide,"[honet, idki, know, even, feel, like, noth, nowhere, feel, either, noth, unbearab, sad, ignor, friend, every, opitunity, feel, like, loos, girlfriend, hurt, everyone, talk, cause, anyth, good, behind, education, feel, alone, first, time, feel, ive, enjoy, hop, dream, care, noth, fami, friend, even, girlfriend, still, love, complicat, word, describe, someth, end, know, strong, brave, enough, know, weak, mak, sadder, thing, push, away, emotion, empty, bad, used, way, normal, understand, person, hop, dream, mention, bad, feel, girlfriend, got, scar, die, havnt, bring, talk, realis, cant, even, comprehend, life, mean, anyone, know, rambl, probab, regret, post, ill, think, tak, place, someone, bad, ...]","{'ActionRisk': [], 'MethodSuicide': ['gun'], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': ['girlfriend'], 'Location': []}",1,0,0,0,1,0,0
12,"trigger warning excuse for self inflicted burnsi do know the crisis line and used it after when i was having a panic attack. i know it is not a healthy thing to do. but, i did. i did something stupid out of impulse. i burned myself. i really need help with an excuse as the father of my daughter knows my history we were together 12 years. he is seen my at my worst but! i had always only cut on my ankles and wrists, i am thinking the excuse for this would be easier than one for cuts. i did work on my car last night and i had not self harmed in a long time. i just did it without thinking, as usual impulse and lost in the moment. should i say i touched something under the hood while the car was still hot? i have 3 almost in a curved like pattern the first on forearm then down a little then on the side of my wrist. they are about an inch long, kind of wide and a little deep. i think the car excuse is a good one but i would need to say what i was working on to explain the 3 burns. maybe that there was a wire smooshed behind the engine and when i went to fix it i touched the engine? i do not want to self harm again, i just need to be able to explain this.",suicide,"[trigger, warn, excuse, self, inflict, burnsi, know, crisis, line, used, panic, attack, know, healthy, thing, someth, stupid, impulse, burn, real, need, help, excuse, father, daughter, know, history, together, year, see, bad, alway, cut, ankl, wrist, think, excuse, easier, one, cut, work, car, last, night, self, harm, long, time, without, think, usual, impulse, lose, moment, say, touch, someth, hood, car, still, hot, almost, curv, like, pattern, first, forearm, little, side, wrist, inch, long, kind, wide, little, deep, think, car, excuse, good, one, need, say, work, explain, burn, maybe, wire, smoosh, behind, engine, go, fix, touch, engine, want, self, harm, need, able, ...]","{'ActionRisk': ['self harm'], 'MethodSuicide': ['cut', 'self harm'], 'Medication': [], 'Quantity': ['12', '3', '1'], 'Substance': [], 'Person': ['father', 'daughter'], 'Location': []}",2,0,0,0,2,3,1
13,it ends tonight.i cannot do it anymore. i quit.,suicide,"[end, tonight, anymore, quit]","{'ActionRisk': [], 'MethodSuicide': [], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': [], 'Location': []}",0,0,0,0,0,0,0
18,"my life is over at 20 years oldhello all. i am a 20 year old balding male. my hairline is trash and to make matters worse my head is huge. i have bipolar, depression and crippling social anxiety. balding has been the cherry on top. i wear a hat 247 even in my room when i am alone because i cannot stop thinking about it. i pop xanax all day to try and numb the pain and it works for a little bit but it all comes crashing back twice as hard once i come down. i do not know how to communicate with people anymore and i do not know how to keep a relationship. i used to be one of the popular kids but after my dad passed away i feel into a deep dark hole. i have been arrested numerous times, been in rehab, mental hospitals, you name it. the only reason i have not killed myself yet is because of my mom and brothers. if i did not have them i'd be dead long ago. but it is getting to the point where even their love and support is not going to be enough to keep me alive anymore. i am either going to be the guy who killed himself, or the guy who went bald and 20 and looks like a child molestor. which one would you choose?",suicide,"[life, year, oldhello, year, old, bald, male, hairline, trash, make, matter, bad, head, huge, bipolar, depression, crippl, social, anxiety, bald, cherry, top, wear, hat, even, room, alone, stop, think, pop, xanax, day, try, numb, pain, work, little, bit, com, crash, back, twice, hard, come, know, communicate, person, anymore, know, keep, relation, used, one, popular, kid, dad, pas, away, feel, deep, dark, hole, arrest, numerous, tim, rehab, mental, hospital, name, reason, kil, yet, mom, brother, dead, long, ago, get, point, even, love, support, go, enough, keep, alive, anymore, either, go, guy, kil, guy, go, bald, look, like, child, molestor, one, choose]","{'ActionRisk': ['thinking about'], 'MethodSuicide': [], 'Medication': ['xanax'], 'Quantity': ['20', '247', '1'], 'Substance': [], 'Person': ['mom', 'dad'], 'Location': ['hospital']}",2,1,1,0,0,3,1
19,"i took the rest of my sleeping pills and my painkillersi cannot wait for it to end, i have struggled for the past 6 years and i am finally ending it.",suicide,"[take, rest, sleep, pill, painkillersi, wait, end, struggl, past, year, final, end]","{'ActionRisk': ['took'], 'MethodSuicide': [], 'Medication': [], 'Quantity': ['6'], 'Substance': ['pill', 'pills'], 'Person': [], 'Location': []}",0,0,0,2,0,1,1
20,"can you imagine getting old? me neither.wrinkles, weight gain, hair loss, messed up teeth and bones, health issues, menopause, hormones, hating new generations amp the way world progress. being a useless angry piece of shit who cannot take care of itself. being totally depended on people who secretly wants you to die already. can you even imagine yourself there? absolutely not. even if i was happy, i'd take my life just to avoid this.",suicide,"[imagine, get, old, neither, wrinkl, weight, gain, hair, loss, mes, tooth, bon, health, issu, menopause, hormon, hat, new, generation, amp, way, world, progress, useless, angry, piece, shit, take, care, total, depend, person, secret, want, die, already, even, imagine, absolute, even, happy, take, life, avoid]","{'ActionRisk': [], 'MethodSuicide': ['take my life'], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': [], 'Location': []}",0,0,0,0,1,0,0
21,"do you think getting hit by a train would be painful?guns are hard to come by in my country but trains are not. i just do not want to suffer though, do you think this would be a painless method of suicide?",suicide,"[think, get, hit, train, painful, gun, hard, come, country, train, want, suffer, though, think, painless, method, suicide]","{'ActionRisk': [], 'MethodSuicide': ['gun', 'train', 'suicide'], 'Medication': [], 'Quantity': [], 'Substance': [], 'Person': [], 'Location': []}",0,0,0,0,3,0,0


##**Preparación Final**

En esta sección se realizarán las transformaciones finales necesarias para que los datos puedan ser utilizados por los modelos de aprendizaje automático. Se convertirán los tokens en secuencias numéricas, se aplicará padding para unificar longitudes y se codificarán las etiquetas en un formato compatible con el entrenamiento. Con estos pasos se dejarán las entradas completamente estructuradas y listas para la fase de modelado.

###**Conversión Tokens Secuencias Numéricas**

En este apartado se construirán las representaciones numéricas que utilizará el modelo durante el entrenamiento. Para ello se transformarán los tokens en secuencias de números mediante un tokenizer, se integrarán estas secuencias con las demás características extraídas y se organizará un dataframe final con todos los atributos necesarios. Al finalizar, se generará una vista preliminar que permitirá revisar cómo quedan estructurados los datos antes de pasar a la fase de modelado.

In [0]:
from tensorflow.keras.preprocessing.text import Tokenizer

# Copiar dataframe df_clean 
df_temp = df_clean.withColumn(
    "tokens_str",
    F.concat_ws(" ", F.col("tokens"))
)

# Convertir tokens_str a pandas
df_pd = df_temp.select("tokens_str").toPandas()
tokens_list = df_pd["tokens_str"].tolist()

# Crear y entrenar tokenizer
vocab_size = 20000
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(tokens_list)

# Convertir a secuencias numéricas
df_pd["text_token_numeric"] = tokenizer.texts_to_sequences(df_pd["tokens_str"])

# Preparar dataframe final
columns_final = [
    "ent_person_ct", "ent_location_ct", "ent_med_ct",
    "ent_substance_ct", "ent_method_ct",
    "ent_qty_ct", "ent_actionrisk_ct",
    "class"
]

# Extraer las columnas a pandas
df_base = df_temp.select(columns_final).toPandas()

# Insertar text_token_numeric como primera columna
df_base.insert(0, "text_token_numeric", df_pd["text_token_numeric"])

# Convertir a Spark DataFrame final
spark = SparkSession.builder.getOrCreate()
df_features = spark.createDataFrame(df_base)

# Crear preview de registros
preview = df_features.limit(10).toPandas()
html_preview = preview.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 400px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    border-radius: 6px;
    padding: 10px;
">
{html_preview}
</div>
"""))

text_token_numeric,ent_person_ct,ent_location_ct,ent_med_ct,ent_substance_ct,ent_method_ct,ent_qty_ct,ent_actionrisk_ct,class
"[526, 1063, 1, 66, 526, 10, 668, 752, 1167, 26, 285, 1340, 51, 19, 114, 1238, 1063, 45, 11330, 449, 1, 19, 28, 46, 5160, 2, 243, 146, 51, 5, 107, 9, 1412, 705, 6, 33, 134, 12, 346, 411, 149, 340, 53, 56, 526, 190, 6, 668, 178, 3, 1682, 239, 108, 97, 19, 122, 26, 265]",1,0,0,0,1,0,0,suicide
"[31, 13169, 22, 284, 105]",0,0,0,0,0,0,0,suicide
"[1, 420, 4854, 323, 17, 257, 136, 17, 106, 45, 349, 169, 194, 1179, 992, 87, 120, 146, 47, 293, 17, 114, 433, 1597, 677, 248, 3, 540, 101, 699, 12140, 20, 98, 7, 106, 4, 150, 609, 7, 18, 2148, 293, 47, 113, 27, 184, 209, 6, 239, 50, 6203, 1097, 19, 23, 3, 354, 4728, 347, 49, 174, 3024, 174, 1, 275, 174, 234, 134, 4417, 275, 24, 11, 1260, 53, 257, 109, 267, 600, 514, 103, 230, 169, 191, 542, 347, 1950, 13, 459, 267, 80, 23, 3, 683, 490, 63, 1551, 267, 281, 747, 626, 439, ...]",2,0,0,2,5,4,2,suicide
"[1, 6262, 5, 16, 3, 4, 44, 757, 3, 219, 44, 6782, 125, 731, 15, 55, 1, 3, 4, 2677, 193, 81, 62, 28, 238, 38, 10, 469, 1032, 3, 102, 109, 18, 3, 262, 287, 414, 268, 71, 44, 47, 15, 16, 193, 53, 56, 1920, 296, 1336, 50, 29, 5, 446, 1772, 100, 5, 592, 69, 4064, 20, 361, 91, 489, 438, 25, 187, 33, 233, 160, 9, 414, 268, 728, 25, 3, 193, 41, 120, 34, 5007, 254, 28, 1274, 152, 16, 2828, 7, 130, 40, 5, 1619, 137, 547, 60, 310, 8, 179, 172, 35, 25, ...]",1,0,0,0,1,0,0,suicide
"[677, 1531, 1098, 176, 2749, 1, 5, 1070, 855, 187, 602, 482, 5, 891, 20, 50, 195, 2697, 807, 12, 31, 22, 1098, 360, 842, 5, 774, 300, 17, 27, 25, 54, 184, 6399, 746, 8, 1098, 617, 14, 184, 36, 117, 63, 129, 176, 427, 87, 18, 124, 8, 400, 2697, 101, 263, 21, 801, 50, 4288, 117, 53, 764, 164, 9767, 4, 2498, 109, 4855, 150, 481, 746, 1829, 87, 206, 2774, 150, 436, 8, 117, 1098, 10, 14, 31, 21, 36, 462, 807, 111, 5713, 1, 469, 4818, 11, 408, 801, 4818, 2, 176, 427, 31, 155, ...]",2,0,0,0,2,3,1,suicide
"[29, 274, 32, 558]",0,0,0,0,0,0,0,suicide
"[7, 17, 1, 17, 128, 2794, 636, 6706, 1051, 23, 159, 25, 165, 640, 846, 103, 1405, 230, 169, 2794, 4315, 484, 659, 325, 16, 278, 102, 67, 8, 1034, 1603, 19, 84, 557, 83, 36, 150, 244, 145, 958, 51, 752, 105, 146, 5, 2188, 9, 32, 5, 46, 212, 187, 14, 1410, 185, 167, 523, 91, 3, 436, 456, 892, 1907, 2171, 140, 2742, 139, 269, 420, 80, 132, 197, 118, 260, 177, 87, 114, 6, 70, 16, 56, 273, 11, 100, 46, 175, 32, 219, 11, 82, 132, 82, 11, 2794, 64, 4, 314, 19080, 14, 590]",2,1,1,0,0,3,1,suicide
"[37, 369, 116, 275, 1, 196, 29, 323, 136, 17, 141, 29]",0,0,0,2,0,1,1,suicide
"[509, 6, 128, 1292, 7504, 569, 977, 656, 988, 856, 1132, 2385, 286, 351, 17511, 2571, 325, 148, 2164, 220, 33, 104, 1251, 432, 441, 479, 61, 37, 71, 605, 1348, 9, 1141, 2, 34, 144, 16, 509, 354, 16, 77, 37, 7, 734]",0,0,0,0,1,0,0,suicide
"[8, 6, 301, 660, 528, 384, 105, 146, 435, 660, 2, 188, 147, 8, 873, 562, 45]",0,0,0,0,3,0,0,suicide


###**Padding Tokens Secuencias Numéricas**

En este apartado se estandarizarán las longitudes de todas las secuencias numéricas generadas previamente. Para ello, se calculará la longitud máxima presente en el conjunto y se aplicará padding en modo post, garantizando que cada registro posea una secuencia uniforme y apta para su ingreso a modelos neuronales. Luego, estas secuencias normalizadas se integrarán nuevamente con el resto de características del dataframe, generando una estructura final coherente y lista para la etapa de entrenamiento.

In [0]:
# Extraer solo la columna de secuencias a pandas
df_seq_pd = df_features.select("text_token_numeric").toPandas()
sequences = df_seq_pd["text_token_numeric"].tolist()

# Obtener la máxima longitud observada en la columna text_token_numeric
max_len = builtins.max(len(seq) for seq in sequences) if len(sequences) > 0 else 0
print("Longitud Máxima = ", max_len)

# Aplicar padding (post)
padded = pad_sequences(
    sequences,
    maxlen=max_len,
    padding="post",
    truncating="post"
)

# Guardar padded (listas) en el dataframe pandas auxiliar
df_seq_pd["text_token_numeric"] = padded.tolist()

# Traer el resto de columnas a pandas
other_cols = [c for c in df_features.columns if c != "text_token_numeric"]
df_other_pd = df_features.select(other_cols).toPandas()

# Insertar la columna padded como primera columna con el nombre original
df_other_pd.insert(0, "text_token_numeric", df_seq_pd["text_token_numeric"])

# Convertir de vuelta a Spark (reemplazando el DataFrame)
spark = SparkSession.builder.getOrCreate()
df_features = spark.createDataFrame(df_other_pd)

# Crear preview de registros
preview = df_features.limit(10).toPandas()
html_preview = preview.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 500px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    border-radius: 8px;
    padding: 12px;
    box-shadow: 0px 2px 8px rgba(0,0,0,0.08);
    font-size: 13px;
">
{html_preview}
</div>
"""))

Longitud Máxima =  288


text_token_numeric,ent_person_ct,ent_location_ct,ent_med_ct,ent_substance_ct,ent_method_ct,ent_qty_ct,ent_actionrisk_ct,class
"[526, 1063, 1, 66, 526, 10, 668, 752, 1167, 26, 285, 1340, 51, 19, 114, 1238, 1063, 45, 11330, 449, 1, 19, 28, 46, 5160, 2, 243, 146, 51, 5, 107, 9, 1412, 705, 6, 33, 134, 12, 346, 411, 149, 340, 53, 56, 526, 190, 6, 668, 178, 3, 1682, 239, 108, 97, 19, 122, 26, 265, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",1,0,0,0,1,0,0,suicide
"[31, 13169, 22, 284, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,0,0,0,suicide
"[1, 420, 4854, 323, 17, 257, 136, 17, 106, 45, 349, 169, 194, 1179, 992, 87, 120, 146, 47, 293, 17, 114, 433, 1597, 677, 248, 3, 540, 101, 699, 12140, 20, 98, 7, 106, 4, 150, 609, 7, 18, 2148, 293, 47, 113, 27, 184, 209, 6, 239, 50, 6203, 1097, 19, 23, 3, 354, 4728, 347, 49, 174, 3024, 174, 1, 275, 174, 234, 134, 4417, 275, 24, 11, 1260, 53, 257, 109, 267, 600, 514, 103, 230, 169, 191, 542, 347, 1950, 13, 459, 267, 80, 23, 3, 683, 490, 63, 1551, 267, 281, 747, 626, 439, ...]",2,0,0,2,5,4,2,suicide
"[1, 6262, 5, 16, 3, 4, 44, 757, 3, 219, 44, 6782, 125, 731, 15, 55, 1, 3, 4, 2677, 193, 81, 62, 28, 238, 38, 10, 469, 1032, 3, 102, 109, 18, 3, 262, 287, 414, 268, 71, 44, 47, 15, 16, 193, 53, 56, 1920, 296, 1336, 50, 29, 5, 446, 1772, 100, 5, 592, 69, 4064, 20, 361, 91, 489, 438, 25, 187, 33, 233, 160, 9, 414, 268, 728, 25, 3, 193, 41, 120, 34, 5007, 254, 28, 1274, 152, 16, 2828, 7, 130, 40, 5, 1619, 137, 547, 60, 310, 8, 179, 172, 35, 25, ...]",1,0,0,0,1,0,0,suicide
"[677, 1531, 1098, 176, 2749, 1, 5, 1070, 855, 187, 602, 482, 5, 891, 20, 50, 195, 2697, 807, 12, 31, 22, 1098, 360, 842, 5, 774, 300, 17, 27, 25, 54, 184, 6399, 746, 8, 1098, 617, 14, 184, 36, 117, 63, 129, 176, 427, 87, 18, 124, 8, 400, 2697, 101, 263, 21, 801, 50, 4288, 117, 53, 764, 164, 9767, 4, 2498, 109, 4855, 150, 481, 746, 1829, 87, 206, 2774, 150, 436, 8, 117, 1098, 10, 14, 31, 21, 36, 462, 807, 111, 5713, 1, 469, 4818, 11, 408, 801, 4818, 2, 176, 427, 31, 155, ...]",2,0,0,0,2,3,1,suicide
"[29, 274, 32, 558, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,0,0,0,suicide
"[7, 17, 1, 17, 128, 2794, 636, 6706, 1051, 23, 159, 25, 165, 640, 846, 103, 1405, 230, 169, 2794, 4315, 484, 659, 325, 16, 278, 102, 67, 8, 1034, 1603, 19, 84, 557, 83, 36, 150, 244, 145, 958, 51, 752, 105, 146, 5, 2188, 9, 32, 5, 46, 212, 187, 14, 1410, 185, 167, 523, 91, 3, 436, 456, 892, 1907, 2171, 140, 2742, 139, 269, 420, 80, 132, 197, 118, 260, 177, 87, 114, 6, 70, 16, 56, 273, 11, 100, 46, 175, 32, 219, 11, 82, 132, 82, 11, 2794, 64, 4, 314, 19080, 14, 590, ...]",2,1,1,0,0,3,1,suicide
"[37, 369, 116, 275, 1, 196, 29, 323, 136, 17, 141, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,2,0,1,1,suicide
"[509, 6, 128, 1292, 7504, 569, 977, 656, 988, 856, 1132, 2385, 286, 351, 17511, 2571, 325, 148, 2164, 220, 33, 104, 1251, 432, 441, 479, 61, 37, 71, 605, 1348, 9, 1141, 2, 34, 144, 16, 509, 354, 16, 77, 37, 7, 734, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,1,0,0,suicide
"[8, 6, 301, 660, 528, 384, 105, 146, 435, 660, 2, 188, 147, 8, 873, 562, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,3,0,0,suicide


###**Codificación Etiquetas Clase Suicida (Encoding)**

En este apartado se transformarán las etiquetas de la clase suicida del dataset en valores numéricos adecuados para el entrenamiento del modelo. Se asignará 1 a los casos clasificados como suicide y 0 a aquellos marcados como non-suicide, dejando como nulos los valores no reconocidos o inconsistentes. Con esta codificación, el dataframe quedará estandarizado y listo para integrarse en el pipeline de modelado supervisado.

In [0]:
# Reemplazar: non-suicide -> 0, suicide -> 1; valores no reconocidos -> NULL
df_features = df_features.withColumn(
    "class",
    F.when(F.trim(F.lower(F.col("class"))) == "suicide", F.lit(1))
     .when(F.trim(F.lower(F.col("class"))) == "non-suicide", F.lit(0))
     .otherwise(F.lit(None))
     .cast(IntegerType())
)

# Crear preview de registros
preview = df_features.limit(10).toPandas()
html_preview = preview.to_html(escape=False, index=False)

# Mostrar tabla con scroll
display(HTML(f"""
<div style="
    max-width: 100%;
    max-height: 500px;
    overflow-x: auto;
    overflow-y: auto;
    border: 1px solid #ccc;
    border-radius: 8px;
    padding: 12px;
    box-shadow: 0px 2px 8px rgba(0,0,0,0.08);
    font-size: 13px;
">
{html_preview}
</div>
"""))

text_token_numeric,ent_person_ct,ent_location_ct,ent_med_ct,ent_substance_ct,ent_method_ct,ent_qty_ct,ent_actionrisk_ct,class
"[526, 1063, 1, 66, 526, 10, 668, 752, 1167, 26, 285, 1340, 51, 19, 114, 1238, 1063, 45, 11330, 449, 1, 19, 28, 46, 5160, 2, 243, 146, 51, 5, 107, 9, 1412, 705, 6, 33, 134, 12, 346, 411, 149, 340, 53, 56, 526, 190, 6, 668, 178, 3, 1682, 239, 108, 97, 19, 122, 26, 265, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",1,0,0,0,1,0,0,1
"[31, 13169, 22, 284, 105, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,0,0,0,1
"[1, 420, 4854, 323, 17, 257, 136, 17, 106, 45, 349, 169, 194, 1179, 992, 87, 120, 146, 47, 293, 17, 114, 433, 1597, 677, 248, 3, 540, 101, 699, 12140, 20, 98, 7, 106, 4, 150, 609, 7, 18, 2148, 293, 47, 113, 27, 184, 209, 6, 239, 50, 6203, 1097, 19, 23, 3, 354, 4728, 347, 49, 174, 3024, 174, 1, 275, 174, 234, 134, 4417, 275, 24, 11, 1260, 53, 257, 109, 267, 600, 514, 103, 230, 169, 191, 542, 347, 1950, 13, 459, 267, 80, 23, 3, 683, 490, 63, 1551, 267, 281, 747, 626, 439, ...]",2,0,0,2,5,4,2,1
"[1, 6262, 5, 16, 3, 4, 44, 757, 3, 219, 44, 6782, 125, 731, 15, 55, 1, 3, 4, 2677, 193, 81, 62, 28, 238, 38, 10, 469, 1032, 3, 102, 109, 18, 3, 262, 287, 414, 268, 71, 44, 47, 15, 16, 193, 53, 56, 1920, 296, 1336, 50, 29, 5, 446, 1772, 100, 5, 592, 69, 4064, 20, 361, 91, 489, 438, 25, 187, 33, 233, 160, 9, 414, 268, 728, 25, 3, 193, 41, 120, 34, 5007, 254, 28, 1274, 152, 16, 2828, 7, 130, 40, 5, 1619, 137, 547, 60, 310, 8, 179, 172, 35, 25, ...]",1,0,0,0,1,0,0,1
"[677, 1531, 1098, 176, 2749, 1, 5, 1070, 855, 187, 602, 482, 5, 891, 20, 50, 195, 2697, 807, 12, 31, 22, 1098, 360, 842, 5, 774, 300, 17, 27, 25, 54, 184, 6399, 746, 8, 1098, 617, 14, 184, 36, 117, 63, 129, 176, 427, 87, 18, 124, 8, 400, 2697, 101, 263, 21, 801, 50, 4288, 117, 53, 764, 164, 9767, 4, 2498, 109, 4855, 150, 481, 746, 1829, 87, 206, 2774, 150, 436, 8, 117, 1098, 10, 14, 31, 21, 36, 462, 807, 111, 5713, 1, 469, 4818, 11, 408, 801, 4818, 2, 176, 427, 31, 155, ...]",2,0,0,0,2,3,1,1
"[29, 274, 32, 558, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,0,0,0,1
"[7, 17, 1, 17, 128, 2794, 636, 6706, 1051, 23, 159, 25, 165, 640, 846, 103, 1405, 230, 169, 2794, 4315, 484, 659, 325, 16, 278, 102, 67, 8, 1034, 1603, 19, 84, 557, 83, 36, 150, 244, 145, 958, 51, 752, 105, 146, 5, 2188, 9, 32, 5, 46, 212, 187, 14, 1410, 185, 167, 523, 91, 3, 436, 456, 892, 1907, 2171, 140, 2742, 139, 269, 420, 80, 132, 197, 118, 260, 177, 87, 114, 6, 70, 16, 56, 273, 11, 100, 46, 175, 32, 219, 11, 82, 132, 82, 11, 2794, 64, 4, 314, 19080, 14, 590, ...]",2,1,1,0,0,3,1,1
"[37, 369, 116, 275, 1, 196, 29, 323, 136, 17, 141, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,2,0,1,1,1
"[509, 6, 128, 1292, 7504, 569, 977, 656, 988, 856, 1132, 2385, 286, 351, 17511, 2571, 325, 148, 2164, 220, 33, 104, 1251, 432, 441, 479, 61, 37, 71, 605, 1348, 9, 1141, 2, 34, 144, 16, 509, 354, 16, 77, 37, 7, 734, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,1,0,0,1
"[8, 6, 301, 660, 528, 384, 105, 146, 435, 660, 2, 188, 147, 8, 873, 562, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]",0,0,0,0,3,0,0,1


###**DataFrame Final Procesado Almacenado Unity Catalog**

En este último paso de la etapa de pre-procesamiento se almacenarán las características finales en una tabla Delta dentro del Unity Catalog. Esto permitirá preservar el conjunto de datos limpio, tokenizado, vectorizado y completamente estructurado para su uso en fases posteriores de entrenamiento, validación y experimentación. Al cargar nuevamente la tabla, se verificará que los registros estén disponibles y correctamente formateados.

In [0]:
df_features.write.format("delta").mode("overwrite").saveAsTable(
    "workspace.suicide_detection.suicide_detection_features"
)

df_features = spark.table("workspace.suicide_detection.suicide_detection_features")
display(df_features)

DataFrame[text_token_numeric: array<bigint>, ent_person_ct: int, ent_location_ct: int, ent_med_ct: int, ent_substance_ct: int, ent_method_ct: int, ent_qty_ct: int, ent_actionrisk_ct: int, class: int]