<a href="https://colab.research.google.com/github/psyling/Chinese-Word-Vectors/blob/master/create_the_python_code_for_word_embeddings_in_thi_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import nltk
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import Word2Vec

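# Download the NLTK tokenizer models used by sent_tokenize/word_tokenize
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load 'punkt_tab' instead of 'punkt'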

# 1. Prepare the corpus
def prepare_corpus(text_data):
  """
  Tokenizes and preprocesses text data for word embedding training.

  Args:
    text_data: A list of text documents.

  Returns:
    A list of tokenized sentences.
  """
  sentences = []
  for text in text_data:
    # Split each document into sentences
    sentences.extend(nltk.sent_tokenize(text))
  # Lowercase each sentence, then split it into word tokens
  return [nltk.word_tokenize(sentence.lower()) for sentence in sentences]

# Example usage with your data sources (replace with your actual data)
leadership_speeches = [
    # ... load speeches from your dataset ...
]
employee_feedback = [
    # ... load feedback data ...
]
online_discussions = [
    # ... load online discussion data ...
]

corpus = prepare_corpus(leadership_speeches + employee_feedback + online_discussions)

# 2. Train the word embedding model
# min_count=5 ignores words that occur fewer than 5 times; lower it for small corpora
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=5, workers=4)

# 3. Define pronoun and perception-related words
pronouns = ["i", "me", "my", "mine", "we", "us", "our", "ours"]
perception_words = [
    "trustworthy", "competent", "inspiring", "motivated", "engaged",
    "collaborative", "accountable", "inclusive", "respectful", "leader",
    "leadership", "effective", "reliable", "credible", "supportive"
]

# 4. Analyze word embedding relationships
def analyze_associations(model, pronouns, perception_words):
  """
  Calculates cosine similarity between pronoun and perception word embeddings.

  Args:
    model: A trained word embedding model.
    pronouns: A list of pronouns.
    perception_words: A list of perception-related words.

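  Returns:
    A nested dictionary {pronoun: {perception_word: cosine similarity}}.
    Words missing from the model's vocabulary are skipped.
  """
  similarity_scores = {}
  for pronoun in pronouns:
    # One inner dictionary of scores per pronoun
    similarity_scores[pronoun] = {}
    for word in perception_words:
      # Words below min_count (or absent from the corpus) have no vector
      if pronoun in model.wv and word in model.wv:
        # scikit-learn expects 2-D arrays of shape (n_samples, n_features)
        similarity_scores[pronoun][word] = cosine_similarity(
            model.wv[pronoun].reshape(1, -1), model.wv[word].reshape(1, -1)
        )[0][0]
  return similarity_scores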

similarity_scores = analyze_associations(model, pronouns, perception_words)

# Print the similarity scores
for pronoun, scores in similarity_scores.items():
  print(f"Pronoun: {pronoun}")
  for word, score in scores.items():
    print(f"  {word}: {score:.4f}")

# Further analysis and visualization (t-SNE, PCA, etc.) can be done here
# ...
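
The nested dictionary returned by `analyze_associations` holds up to 120 raw scores (8 pronouns × 15 perception words). One way to condense it toward the research question is to compare first-person singular and plural pronouns directly. The sketch below assumes that split; `SINGULAR`, `PLURAL`, `group_means`, and `singular_vs_plural` are illustrative names, not part of the original code, and the averaging is a descriptive summary rather than a statistical test.

```python
from statistics import mean

# Hypothetical split of the pronoun list into singular and plural groups
SINGULAR = {"i", "me", "my", "mine"}
PLURAL = {"we", "us", "our", "ours"}


def group_means(similarity_scores, group):
    """Average similarity per perception word across the pronouns in `group`.

    `similarity_scores` is the nested dict {pronoun: {word: score}}.
    Pronouns or words missing from the scores are simply skipped.
    """
    collected = {}
    for pronoun in group:
        for word, score in similarity_scores.get(pronoun, {}).items():
            collected.setdefault(word, []).append(float(score))
    return {word: mean(scores) for word, scores in collected.items()}


def singular_vs_plural(similarity_scores):
    """Return {word: plural_mean - singular_mean} for words scored in both groups."""
    singular = group_means(similarity_scores, SINGULAR)
    plural = group_means(similarity_scores, PLURAL)
    return {
        word: plural[word] - singular[word]
        for word in sorted(singular.keys() & plural.keys())
    }


# Toy scores standing in for the output of analyze_associations
toy = {
    "i": {"leader": 0.25, "inclusive": 0.5},
    "we": {"leader": 0.75, "inclusive": 1.0},
}
print(singular_vs_plural(toy))  # {'inclusive': 0.5, 'leader': 0.5}
```

In the notebook, `singular_vs_plural(similarity_scores)` would be called on the real scores; a positive value means the perception word sits closer to the plural pronouns than to the singular ones in this corpus's embedding space.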

**Explanation:**

* **Import necessary libraries:** `nltk` for text processing, `numpy` for numerical operations, `cosine_similarity` from `sklearn` to calculate similarity, and `Word2Vec` from `gensim` for word embedding training.
* **Prepare the corpus:** This function splits each document into sentences, lowercases each sentence, and tokenizes it into words. The three source lists are placeholders; replace them with your actual data, because `Word2Vec` cannot train on an empty corpus.
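* **Train the word embedding model:** This code trains a `Word2Vec` model on the prepared corpus. You can adjust parameters like `vector_size`, `window`, and `min_count` based on your data and needs; note that `min_count=5` drops every word seen fewer than five times, so in a small corpus some pronouns or perception words may have no vector at all.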
* **Define pronoun and perception-related words:** Create lists of relevant words for analysis.
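* **Analyze word embedding relationships:** This function calculates the cosine similarity between each pronoun's embedding and each perception word's embedding, returns the scores as a nested dictionary, and skips any word that is not in the model's vocabulary.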
* **Print the similarity scores:** This code displays the calculated similarity scores.
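
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1 and ignores vector magnitude. The NumPy sketch below (the helper name `cosine` is illustrative) computes the same quantity that `cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))[0][0]` returns for two 1-D vectors; gensim's `model.wv.similarity(w1, w2)` also returns the cosine similarity of two words.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity of two 1-D vectors: (u . v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine([1, 0], [1, 0]))   # 1.0  (same direction)
print(cosine([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine([1, 0], [-1, 0]))  # -1.0 (opposite direction)
print(cosine([3, 4], [6, 8]))   # 1.0  (scaling a vector does not change the result)
```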

**Further Steps:**

* **Visualize the relationships:** Use dimensionality reduction techniques like t-SNE or PCA to visualize the word embeddings and observe clusters and patterns.
* **Contextual analysis:** Analyze pronoun use in different contexts within your corpus to understand how it relates to employee perceptions in those specific situations.
* **Combine with other methods:** Integrate this analysis with sentiment analysis, topic modeling, or surveys to gain a more comprehensive understanding.
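
For the visualization step, a 2-D PCA projection of just the pronoun and perception-word vectors is often easier to read than a projection of the whole vocabulary. The sketch below computes PCA directly with NumPy's SVD: center the vectors, then project them onto the top two right singular vectors. `sklearn.decomposition.PCA(n_components=2)` should give the same coordinates up to the sign of each axis. It assumes `vectors` behaves like `model.wv` (supports `in` and `[]` lookup); the function name `project_2d` is illustrative.

```python
import numpy as np


def project_2d(vectors, words):
    """Project the embeddings of `words` onto their first two principal components.

    `vectors` can be a gensim KeyedVectors (e.g. model.wv) or any mapping
    from word to 1-D array. Words missing from `vectors` are skipped.
    Returns {word: (x, y)}, or {} if there is too little data for two axes.
    """
    present = [w for w in words if w in vectors]
    if len(present) < 2:
        return {}
    X = np.vstack([np.asarray(vectors[w], dtype=float) for w in present])
    if min(X.shape) < 2:
        return {}
    X = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T
    return {w: (float(x), float(y)) for w, (x, y) in zip(present, coords)}


# Toy vectors standing in for model.wv; "absent" is skipped
toy = {
    "i": [1.0, 0.0, 0.0],
    "we": [0.0, 1.0, 0.0],
    "leader": [0.0, 0.0, 1.0],
}
print(project_2d(toy, ["i", "we", "leader", "absent"]))
```

With the trained model, `project_2d(model.wv, pronouns + perception_words)` yields coordinates that can be drawn with any scatter-plot routine, for example matplotlib's `plt.scatter` with `plt.annotate` for the word labels.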

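For the contextual-analysis step, a simple starting point before looking at embeddings is to compare how often each source uses first-person singular versus plural pronouns. The sketch below assumes its input is the tokenized output of `prepare_corpus` for a single source; `pronoun_rates` and the singular/plural split are illustrative, not part of the original code.

```python
from collections import Counter

# Hypothetical split of the pronoun list into singular and plural groups
SINGULAR = {"i", "me", "my", "mine"}
PLURAL = {"we", "us", "our", "ours"}


def pronoun_rates(tokenized_sentences):
    """Share of all tokens that are first-person singular vs. plural pronouns."""
    counts = Counter(token for sentence in tokenized_sentences for token in sentence)
    total = sum(counts.values())
    if total == 0:
        return {"singular": 0.0, "plural": 0.0}
    return {
        "singular": sum(counts[p] for p in SINGULAR) / total,
        "plural": sum(counts[p] for p in PLURAL) / total,
    }


# Toy tokenized sentences: 9 tokens, 1 singular pronoun, 2 plural pronouns
toy = [["we", "did", "it"], ["i", "think", "our", "team", "is", "great"]]
print(pronoun_rates(toy))
```

Applied per source, for example `pronoun_rates(prepare_corpus(leadership_speeches))` next to the same call on `employee_feedback`, this shows whether pronoun use differs by context before those differences are compared in embedding space.
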
Remember to adapt this code and analysis to your specific data and research questions.