# Práctica 2 - Word2Vec

- Martínez Ostoa Néstor I.
- Procesamiento de Lenguaje Natural
- IeC - FI - UNAM

--- 
**Objetivo**: A partir del corpus seleccionado en el notebook anterior **lab-1-bpe-algorithm.ipynb** realizar un modelo de embeddings basado en Word2Vec. 

Pasos a realizar: 

1. Trabajar con el corpus tokenizado
2. Obtener los pares de entrenamiento a partir de los contextos
3. Construir una red neuronal con una capa con 128 unidades ocultas. Entrenar la red para obtener los embeddings
4. Evaluar el modelo (capa de salida) con Entropía o Perplejidad
5. Visualizar los embeddings
6. Guardar los vectores de la capa de embedding asociados a las palabras

---

**Corpus elegido:** Don't Patronize Me! dataset ([link](https://github.com/Perez-AlmendrosC/dontpatronizeme))

- Este corpus contiene $10,468$ párrafos extraídos de artículos de noticias con el objetivo principal de realizar un análisis para detectar lenguaje condescendiente (*patronizing and condescending language PCL*) en grupos socialmente vulnerables (refugiados, familias pobres, personas sin casa, etc)
- Cada uno de estos párrafos están anotados con etiquetas que indican el tipo de lenguaje PCL que se encuentra en él (si es que está presente). Los párrafos se extrajeron del corpus [News on Web (NOW)](https://www.english-corpora.org/now/)
- [Link al paper principal](https://aclanthology.org/2020.coling-main.518/)


**Estructura del corpus (original - antes del proceso de limpieza)**

- De manera general, el dataset contiene párrafos anotados con una etiqueta con valores entre $0$ y $4$ que indican el nivel de lenguaje PCL presente
- Cada instancia del dataset está conformada de la siguiente manera:
    - ```<doc-id>```: id del documento dentro del corpus NOW
    - ```<keyword>```: término de búsqueda utilizado para extraer textos relacionados con una comunidad en específico
    - ```<country-code>```: código de dos letras ISO Alpha-2
    - ```<paragraph>```: párrafo perteneciente al ```<keyword>```
    - ```<label>```: entero que indica el nivel de PCL presente
    
**Estructura del corpus actual (después del proceso de limpieza)**

- ```paragraph```: párrafo limpio sin stop words, signos de puntuación

## 0. Bibliotecas requeridas

In [107]:
import string
import re
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from utils import sigmoid, get_batches, compute_pca, get_dict
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import nltk
from nltk.corpus import stopwords
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/nestorivanmo/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## 1. Tokenización del corpus

- Como el corpus ya está limpio, lo único que nos queda por hacer es tokenizarlo

In [41]:
path = "../dontpatronizeme_v1.4/dontpatronizeme_pcl_clean.tsv"
df = pd.read_csv(path)
df = df.sample(frac=0.3, random_state=0)
print(f"Número de párrafos: {df.shape[0]}")
df.head()

Número de párrafos: 3140


Unnamed: 0,paragraph
4007,refugees identified ioc possible contenders va...
9503,mention moments highlight illustrate potential...
6994,many celebrities wore blue ribbons support ame...
3629,game latest longrunning growing strategyrpg se...
8107,amy fischer policy director texasbased immigra...


In [42]:
tokenized_data = []
for idx, row in df.iterrows():
    tdata = nltk.word_tokenize(row["paragraph"])
    tokenized_data.append(tdata)
    
tokenized_df = pd.DataFrame({"tokenized_paragraph": tokenized_data})
tokenized_df.head()

Unnamed: 0,tokenized_paragraph
0,"[refugees, identified, ioc, possible, contende..."
1,"[mention, moments, highlight, illustrate, pote..."
2,"[many, celebrities, wore, blue, ribbons, suppo..."
3,"[game, latest, longrunning, growing, strategyr..."
4,"[amy, fischer, policy, director, texasbased, i..."


In [43]:
print(f"Original data:\n{df.iloc[0,0]} \n")
print(f"Tokenized data:\n{tokenized_df.iloc[0,0]}")

Original data:
refugees identified ioc possible contenders various sports selection made june un refugee agency source told afp 

Tokenized data:
['refugees', 'identified', 'ioc', 'possible', 'contenders', 'various', 'sports', 'selection', 'made', 'june', 'un', 'refugee', 'agency', 'source', 'told', 'afp']


## 2. Obtención de pares de entrenamiento a partir de los contextos

**Entrada**: 
- DataFrame con los párrafos tokenizados

**Salida**: 
- $X$: matriz de $V\times m$ con los vectores de las palabras de contexto: ```<Pandas DataFrame>```
- $Y$: matriz de $V \times m$ con los vectores de las palabras centradas: ```<Pandas DataFrame>```

donde $V$ es el tamaño del vocabulario de palabras del corpus y $m$ es el tamaño de la ventana y se define como $m=2c + 1$

---
**Proceso**:

1. **Definir $C$**
2. **Obtener un vocabulario del corpus**
3. **Para cada párrafo tokenizado**:
    - *Obtener el vector de palabras de contexto*:
        - Con base en $C$, obtener una lista de palabras de contexto
        - Para cada palabra de contexto, obtener su codificación *one-hot*
        - Hacer el promedio de cada uno de los vectores de contexto
    - *Obtener el vector de palabra de centrado*:
        - Realizar la codificación *one-hot*
    - *Almacenar ambos vectores en dos matrices: $X$ y $Y$*


### $C$ - Tamaño del contexto

In [44]:
C = 2

### Vocabulario del corpus

A parte del vocabulario del corpus, obtendremos dos diccionarios útiles:

1. ```word_to_index```:
    - **Llave**: palabra del corpus
    - **Valor**: índice numérico dentro del corpus
    
2. ```index_to_word```: 
    - **Llave**: índice numérico de la palabra dentro del corpus
    - **Valor**: palabra del corpus

In [45]:
def get_word_vocab(tokenized_data):
    """
    Params:
    -------
    tokenized_data: <Pandas Dataframe>
    
    Returns:
    --------
    word_vocab: <set>
    
    N: <int>
        - Size of the word vocabulary
    """
    word_vocab = set()
    for _, row in tokenized_data.iterrows():
        tokenized_paragraph = row[tokenized_data.columns[0]]
        
        for word in tokenized_paragraph:
            word_vocab.add(word)
    
    return word_vocab, len(word_vocab)

In [46]:
word_vocab, V = get_word_vocab(tokenized_df)

In [47]:
print(f"Vocabulary size:\n- {V}\n")
print(f"Vocabulary sample:")
for w in list(word_vocab)[130:135]: print(f"- {w}")

Vocabulary size:
- 16003

Vocabulary sample:
- monitored
- backtotheland
- steer
- kinky
- detainers


In [48]:
def get_dictionaries(word_vocab):
    """
    Params:
    -------
    word_vocab: <set>
        - Contains all the words in the corpus
        
    Returns:
    --------
    word_to_index: <dictionary>
        - Key: word
        - Value: index of the word in the corpus
        
    index_to_word: <dictionary>
        - Key: index of the word in the corpus
        - Value: word
    """
    word_to_index = dict()
    index_to_word = dict()
    
    words = sorted(list(word_vocab))
    for idx, word in enumerate(words):
        index_to_word[idx] = word
        word_to_index[word] = idx
        
    return word_to_index, index_to_word


In [49]:
word_to_index, index_to_word = get_dictionaries(word_vocab)

### Obtención de $X$ y $Y$

In [50]:
def get_one_hot_vector(word, word_to_index, V):
    """
    Params:
    -------
    word: <str>
    
    word_to_index: <dictionary>
        - Key: word
        - Value: index of the word in the corpus
    
    V: <int>
        - size of the corpus' vocabulary of words
    
    Returns:
    -------
    one_hot_vector: <Numpy's ndarray>
    """
    one_hot_vector = np.zeros(V)
    one_hot_vector[word_to_index[word]] = 1
    return one_hot_vector


def get_one_hot_from_context_words(context_words, word_to_index, V):
    """
    Params:
    -------
    context_words: <list>

    word_to_index: <dictionary>
    
    V: <int>
    
    Returns:
    --------
    one_hot_vector: <numpy's ndarray>
        - Mean representation of all the context words' one hot vectors
    """
    one_hot_vectors = [get_one_hot_vector(w, word_to_index, V) for w in context_words]
    return np.mean(one_hot_vectors, axis=0)
    

def get_context_centered_words(tokenized_paragraph, C):
    """
    Params:
    -------
    tokenized_paragraph: <list>
    
    C: <int>
        - Size of the context
    
    Returns:
    -------
    context_words: <list>
    
    centered_words: <matrix>
    """
    context_words_matrix = []
    centered_words = tokenized_paragraph
    
    m = len(tokenized_paragraph)
    for idx, word in enumerate(centered_words):
        context_words = []
        
        # Context words before centered word
        if idx < C and idx != 0: context_words += tokenized_paragraph[:idx]
        else:                    context_words += tokenized_paragraph[idx-C:idx]
            
        # Context words after centered word
        if idx > m-C and idx != m-1: context_words += tokenized_paragraph[idx:]
        else:                        context_words += tokenized_paragraph[idx+1:idx+C+1]
            
        context_words_matrix.append(context_words)
    
    return context_words_matrix, centered_words

In [51]:
def get_X_Y(tokenized_paragraphs_df, word_to_index, V, C):
    """
    Params:
    -------
    tokenized_paragraphs_df: <Pandas DataFrame>
    
    word_to_index: dictionary where keys are words and values are indices of the word in the corpus
    
    C: <int>
        - Size of the context
    
    V: <int>
        - Size of the vocabulary
    
    Returns:
    --------
    XY: <Pandas DataFrame>
    """
    X = []
    Y = []
    centered_words_list = []
    context_words_list = []
    for _, row in tokenized_paragraphs_df.iterrows():
        paragraph = row[tokenized_paragraphs_df.columns[0]]
        context_words_matrix, centered_words = get_context_centered_words(paragraph, C)
        
        for idx, context_words in enumerate(context_words_matrix):
            centered_words_list.append(centered_words[idx])
            Y.append(
                get_one_hot_vector(centered_words[idx], word_to_index, V)
            )
            
            context_words_list.append(context_words)
            X.append(
                get_one_hot_from_context_words(context_words, word_to_index, V)
            )
    
    df_dict = {
        "centered_word": np.array(centered_words_list),
        "context_words": np.array(context_words_list),
        "X": X,
        "Y": Y,
    }
    XY = pd.DataFrame(df_dict)
    return XY
        

In [52]:
data_df = get_X_Y(tokenized_df, word_to_index, V, C)

  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
  "context_words": np.array(context_words_list),


In [53]:
print(tokenized_df.shape)
print(data_df.shape)
data_df.head()

(3140, 1)
(75759, 4)


Unnamed: 0,centered_word,context_words,X,Y
0,refugees,"[identified, ioc]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
1,identified,"[refugees, ioc, possible]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
2,ioc,"[refugees, identified, possible, contenders]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
3,possible,"[identified, ioc, contenders, various]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,contenders,"[ioc, possible, various, sports]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [54]:
data_df.iloc[1, 2].shape

(16003,)

In [55]:
X = data_df['X']
Y = data_df['Y']

print(f"X shape: {X.shape[0]} x {X.iloc[0].shape[0]}")
print(f"Y shape: {Y.shape[0]} x {Y.iloc[0].shape[0]}")

X shape: 75759 x 16003
Y shape: 75759 x 16003


## 3. Red Neuronal

**Instrucción**: construir una red neuronal cocn 128 unidades ocultas. Entrenar la red para obtener los embeddings. 

**Desarrollo**: en esta práctica estaremos utilizando la arquitectura neuronal **word2vec** para genear embeddings de palabras. Concretamente, estaremos utilizando **CBOW (continuous bag-of-words)** como diseño de arquitectura neuronal en la cual notaremos lo siguiente: 

- La entrada a esta red neuronal serán los vectores de palabras de contexto $X$
- La salida de esta red neuronal será el vector de palabras de centrado estimado $\hat{y}$

---

**Arquitectura neuronal - CBOW**

- **Capa de entrada $X$:**
    - Matriz de vectores de palabras de contexto $X$
    - $X$ es de dimensiones $V\times m$
        - $V=38069$
        - $m=10672$
- **Capa oculta $H$:**
    - $H$ es de dimensiones $N\times m$
        - $N = 128$ 
    - $H = \text{ReLU}(Z_1)$
    - $Z_1 = W_1X + B_1 $
- **Capa de salida $\hat{y}$:**
    - $\hat{y}$ es de dimensiones $V \times m$
    - $\hat{y} = \text{softmax}(Z_2)$
    - $Z_2 = W_2H + B_2$

In [56]:
tokenized_train, tokenized_test = train_test_split(tokenized_df, test_size=0.3)
print(f"{tokenized_df.shape}")
print(f"{tokenized_train.shape}")
print(f"{tokenized_test.shape}")

(3140, 1)
(2198, 1)
(942, 1)


In [95]:
# Data - list of tokens
training_data = []
for idx, row in tokenized_train.iterrows():
    training_data += row["tokenized_paragraph"]
print(len(data), data[:10])


N = 128
word2Ind, Ind2word = get_dict(training_data)
V = len(word2Ind)
print(f"Train vocab size: {V}")

52141 ['antimuslim', 'environment', 'encouraged', 'hindu', 'religious', 'right', 'hobnob', 'people', 'like', 'trump']
Train vocab size: 12858


### 3.1 Funciones de activación

Para implementar esta red neuronal, utilizamos dos funciones de activación:

1. **Rectified Linear Unit (ReLU)**:
    - $\text{ReLU}(z) = \max(0, z)$
2. **Softmax**
    - $\hat{y} = \frac{\exp(z)}{\sum_{j=1}^V \exp(z_j)}$

In [58]:
# Funciones de activación
def ReLU(z):
    result = z.copy()
    result[result < 0] = 0    
    return result

def softmax(z):
    e_z = np.exp(z)
    yhat = e_z/np.sum(e_z,axis=0)
    return yhat

### 3.2 Propagación hacía adelante

1. Definición del hiperparámetro $N$
2. Inicialización de las matrices $W_1, W_2, B_1, B_2$
3. Funciones para calcular $Z_1, H$

In [59]:
def init_model(N,V, random_state=1):
    """
    Params:
    -------
    N:  <int>
        Dimension of the hidden layer
        
    V:  <int>
        Dimension of vocabulary
        
    random_state: <int>
        Make random results consistent - could be any number

    Returns:
    --------
    W1: <numpy's ndarray>
        - Weights of the hidden layer of dimensions NxV
        
    b1: <numpy's ndarray>
        - Biases of the hidden layer of dimensions Nx1
        
    W2: <numpy's ndarray>
        - Weights of the output layer of dimensions VxN
        
    b2: <numpy's ndarray>
        - Biases of the output layer of dimensions Vx1
    """
    
    np.random.seed(random_state)
    
    W1 = np.random.rand(N, V)
    b1 = np.random.rand(N, 1)
    
    W2 = np.random.rand(V, N)
    b2 = np.random.rand(V, 1)

    return W1, b1, W2, b2

def forward_propagation(x, W1, b1, W2, b2):
    """
    Params:
    -------
    x: <numpy's ndarray>
        - Average One hot encoded vector
        
    W1: <numpy's ndarray>
        - Weights of the hidden layer of dimensions NxV
        
    b1: <numpy's ndarray>
        - Biases of the hidden layer of dimensions Nx1
        
    W2: <numpy's ndarray>
        - Weights of the output layer of dimensions VxN
        
    b2: <numpy's ndarray>
        - Biases of the output layer of dimensions Vx1

    Returns:
    --------
    z: <numpy's ndarray>
        - Score vector
    """
    
    h = np.dot(W1,x)+b1
    h = np.maximum(0,h)
    z = np.dot(W2,h)+b2
    
    return z, h

### 3.3 Función de costo

Para esta red neuronal utilizaremos la función pérdida entropía cruzada la cual se define como: 

$$ J=-\sum_{k=1}^{V}y_k\log{\hat{y}_k} \tag{6}$$

In [60]:
def compute_cost(y, yhat, batch_size):
    """
    Cost using Cross Entropy Loss
    
    Params:
    -------
    y: <numpy's ndarray>
        - Original vector (of centered words)
    
    yhat: <numpy's ndarray>
        - Predicted vector
    
    batch_size: <int>
        - Indicates the amount of training and real values to take on each batch
    
    Returns:
    --------
    cost: <double>
    """
    
    logprobs = np.multiply(np.log(yhat),y) + np.multiply(np.log(1 - yhat), 1 - y)
    cost = - 1/batch_size * np.sum(logprobs)
    cost = np.squeeze(cost)
    
    return cost

### 3.4 Retropropagación

In [26]:
def backward_propagation(x, yhat, y, h, W1, b1, W2, b2, batch_size):
    """
    Params:
    -------
    x: <numpy's ndarray>
        - Average one hot encoded vector (context words)
        
    yhat: <numpy's ndarray>
        - Predicted vector (centered word)
        
    y: <numpy's ndarray>
        - Target vector
        
    h: <numpy's ndarray>
        - Hidden vector
        
    W1: <numpy's ndarray>
        - Weights of the hidden layer of dimensions NxV
        
    b1: <numpy's ndarray>
        - Biases of the hidden layer of dimensions Nx1
        
    W2: <numpy's ndarray>
        - Weights of the output layer of dimensions VxN
        
    b2: <numpy's ndarray>
        - Biases of the output layer of dimensions Vx1
    
    batch_size: <int>
        - Indicates the amount of training and real values to take on each batch
        
    Returns:
    --------
    W1_grad: <numpy's ndarray>
        - Weights of the gradient's hidden layer of dimensions NxV
        
    b1_grad: <numpy's ndarray>
        - Biases of the gradient's hidden layer of dimensions Nx1
        
    W2_grad: <numpy's ndarray>
        - Weights of the gradient's output layer of dimensions VxN
        
    b2_grad: <numpy's ndarray>
        - Biases of the gradient's output layer of dimensions Vx1
    """
    
    l1 = np.dot(W2.T,(yhat-y))
    l1 = np.maximum(0,l1) #ReLU
    
    grad_W1 = (1/batch_size)*np.dot(l1,x.T)    #1/m * relu(w2.T(yhat-y)) . xT
    grad_b1 = np.sum((1/batch_size)*np.dot(l1,x.T),axis=1,keepdims=True)
    grad_W2 = (1/batch_size)*np.dot(yhat-y,h.T)
    grad_b2 = np.sum((1/batch_size)*np.dot(yhat-y,h.T),axis=1,keepdims=True)
    
    return grad_W1, grad_b1, grad_W2, grad_b2


### 3.5 Descenso del gradiente

In [34]:
def gradient_descent(data, word2Ind, N, V, C, num_iters, alpha=0.03):
    """
      Inputs: 
        data:      text
        word2Ind:  words to Indices
        N:         dimension of hidden vector  
        V:         dimension of vocabulary 
        num_iters: number of iterations  
     Outputs: 
        W1, W2, b1, b2:  updated matrices and biases   

    """
    W1, b1, W2, b2 = init_model(N, V, random_state=0)
    batch_size = 128
    iters = 0
    C = 2
    for x, y in get_batches(data, word2Ind, V, C, batch_size):
        z, h = forward_propagation(x, W1, b1, W2, b2)
        yhat = softmax(z)
        cost = compute_cost(y, yhat, batch_size)
        
        if ((iters+1) % 10 == 0):
            print(f"Number of iterations: {iters + 1} => cost: {cost:.6f}")
            
        grad_W1, grad_b1, grad_W2, grad_b2 = backward_propagation(
            x, yhat, y, h, W1, b1, W2, b2, batch_size
        )
        
        W1 -= alpha*grad_W1 
        W2 -= alpha*grad_W2
        b1 -= alpha*grad_b1
        b2 -= alpha*grad_b2
        
        iters += 1 
        if iters == num_iters:  break
        if iters % 100 == 0: alpha *= 0.66
            
    return W1, W2, b1, b2

### 3.6 Entrenamiendo del modelo

In [62]:
C = 2
N = 128
word2Ind, Ind2word = get_dict(data)
V = len(word2Ind)
num_iters = 150
W1, W2, b1, b2 = gradient_descent(training_data, word2Ind, N, V, C, num_iters)

Number of iterations: 10 cost: 0.008509
Number of iterations: 20 cost: 0.006306
Number of iterations: 30 cost: 0.005012
Number of iterations: 40 cost: 0.004160
Number of iterations: 50 cost: 0.003556
Number of iterations: 60 cost: 0.003106
Number of iterations: 70 cost: 0.002757
Number of iterations: 80 cost: 0.002479
Number of iterations: 90 cost: 0.002252
Number of iterations: 100 cost: 0.002063
Number of iterations: 110 cost: 0.001950
Number of iterations: 120 cost: 0.001853
Number of iterations: 130 cost: 0.001765
Number of iterations: 140 cost: 0.001686
Number of iterations: 150 cost: 0.001613


## 4. Evaluación del modelo

Para evaluar el modelo primero obtendremos los embeddings de la última capa de salida de la red neuronal. Para eso, utilizaremos el promedio de los pesos de las matrices $W_1$ y $W_2$

In [63]:
# Data - list of tokens
test_data = []
for idx, row in tokenized_test.iterrows():
    test_data += row["tokenized_paragraph"]
word2Ind_test, Ind2word_test = get_dict(test_data)
V_test = len(word2Ind_test)
print(f"Test vocab size: {V}")

Test vocab size: 12858


In [64]:
embeddings = (W1.T + W2) / 2.0
print(f"Embeddings dimensions: {embeddings.shape}")

Embeddings dimensions: (12858, 128)


## 5. Visualización de los Word Embeddings

1. Obtenemos los embeddings
2. Aplicamos técnicas de reducción de dimensionalidad (PCA por ejemplo)
3. Normalización de los datos
4. Visualización

In [109]:
words_df = pd.DataFrame({'word': training_data})
words_df = words_df.sample(frac=0.0005)
words = words_df.values
print(f"Words: {len(words)}")
words_df.head(15)

Words: 26


Unnamed: 0,word
46769,many
17533,system
8176,libby
17042,border
41133,controversial
5460,debuts
48601,useless
21645,money
40472,line
25916,track


In [110]:
# 1. Embeddings
embeddings = (W1.T + W2) / 2.0
idx = [word2Ind[word[0]] for word in words]
X = embeddings[idx, :]

# 2. PCA
X = PCA(n_components=3).fit_transform(X)

# 3. Normalización
X = StandardScaler().fit_transform(X)

In [111]:
# 4. Visualización
embeddings_df = pd.DataFrame(data=X, columns=['X', 'Y', 'Z'])
embeddings_df['word'] = [word[0] for word in words]

fig = px.scatter_3d(
    embeddings_df, x='X', y='Y', z='Z', text="word"
)
fig.update_traces(
    marker=dict(size=5), textposition='top center'
)
fig.show()

## 6. Almacenamiento

In [134]:
embeddings = (W1.T + W2) / 2.0
embeddings = [e for e in embeddings]

embeddings_df = pd.DataFrame({'word':list(word2Ind.keys()), 
                              'embedding': embeddings})
print(embeddings_df.shape)
embeddings_df.head()

(12858, 2)


Unnamed: 0,word,embedding
0,aap,"[0.7244269791785534, 0.8359355001269977, 0.700..."
1,ababa,"[0.7745357827248547, 0.765039999143007, 0.7689..."
2,abadies,"[0.7825410524052706, 0.313356671072614, 0.5257..."
3,abandon,"[0.48739499506886336, 0.493720983620092, 0.514..."
4,abandoned,"[0.46609545188343415, 0.4776234384153497, 0.67..."


In [135]:
embeddings_df.to_csv('embeddings.csv', index=False)

**Referencias**:


- [V. Mijangos - Curso Procesamiento de Lenguaje Natural](https://github.com/VMijangos/Curso-Procesamiento-de-Lenguaje-Natural/tree/master/Notebooks)
- [ElotlMX - Curso Redes Neuronales](https://github.com/ElotlMX/Curso_redes)
- [Deep Learning AI NLP Specialization](https://www.coursera.org/specializations/natural-language-processing)