# 🚀 Google Colab Ready!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fairsayan/rbs-nlp-rl-dm/blob/main/explore-embeddings/explore_embeddings_lab.ipynb)

This notebook is set up for Google Colab, with automatic dependency installation. It runs on the default CPU runtime; a GPU is optional.

## 📋 Prerequisites
- No local setup required - everything runs in the cloud!
- The notebook will automatically install required packages
- Runs on Colab's free cloud runtime; the embedding lookups use NumPy on the CPU, so GPU/TPU hardware does not speed them up

## 🔧 Colab Features Used
- **Automatic package installation** (gensim, scikit-learn, matplotlib, etc.)
- **Temporary data storage** in Colab's file system (cleared when the runtime is reset)
- **GPU detection** (optional; reported for information only)
- **Google Drive integration** for saving results (optional)

In [None]:
# 📦 Install Required Packages (Google Colab)
# This cell will install all necessary packages for the embedding exploration

import sys
import subprocess

def install_package(package):
    """Install a package using pip"""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# List of required packages
required_packages = [
    "gensim>=4.0.0",
    "scikit-learn>=1.0.0",
    "matplotlib>=3.5.0",
    "seaborn>=0.11.0",
    "pandas>=1.3.0",
    "numpy>=1.21.0"
]

print("üîß Installing required packages for Google Colab...")
print("="*50)

for package in required_packages:
    try:
        print(f"üì¶ Installing {package}...")
        install_package(package)
        print(f"‚úÖ {package} installed successfully!")
    except Exception as e:
        print(f"‚ùå Error installing {package}: {e}")

print("\nüéâ Package installation complete!")
print("üìù Note: You may need to restart the runtime if prompted by Colab.")

In [None]:
# 🔧 Google Colab Environment Setup
# Check if we're running in Google Colab and set up the environment accordingly

import os
import sys

# Check if we're in Google Colab
try:
    import google.colab
    IN_COLAB = True
    print("🌐 Running in Google Colab!")
    
    # Mount Google Drive (optional - for saving results)
    from google.colab import drive
    print("📁 Mounting Google Drive...")
    drive.mount('/content/drive')
    print("✅ Google Drive mounted successfully!")
    
    # Set up working directory
    WORK_DIR = '/content/embeddings_lab'
    os.makedirs(WORK_DIR, exist_ok=True)
    os.chdir(WORK_DIR)
    print(f"📂 Working directory set to: {WORK_DIR}")
    
except ImportError:
    IN_COLAB = False
    print("💻 Running in local environment")
    WORK_DIR = os.getcwd()

print(f"🏠 Current working directory: {os.getcwd()}")
print(f"🐍 Python version: {sys.version}")

# Check available resources
if IN_COLAB:
    # Check GPU availability (torch and psutil come preinstalled on Colab)
    import torch
    if torch.cuda.is_available():
        print(f"🚀 GPU available: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
    else:
        print("🔄 CPU only (fine for this lab: gensim does not use the GPU)")
        
    # Check RAM
    import psutil
    ram_gb = psutil.virtual_memory().total / (1024**3)
    print(f"💾 Total RAM: {ram_gb:.1f} GB")

print("\n🎯 Environment setup complete! Ready to explore embeddings.")

# Hands-on Lab: Explore Embeddings 🚀

## Goal
Visualize word similarity using pretrained Word2Vec embeddings

## Tools
- **gensim**: For loading pretrained Word2Vec models
- **sklearn**: For dimensionality reduction and similarity calculations
- **matplotlib**: For visualization
- **Google Colab**: Cloud-based execution with automatic setup

## Learning Objectives
By the end of this lab, you will be able to:
1. Load and work with pretrained Word2Vec embeddings
2. Find nearest neighbors in vector space
3. Visualize word similarities using 2D plots
4. Analyze business-relevant vocabulary through embeddings
5. Run everything seamlessly in Google Colab

## 🌟 Colab Features
- **No local setup required** - everything runs in the cloud
- **Automatic package installation** - just run the cells
- **CPU is enough** - gensim's similarity lookups do not use a GPU, so the default runtime works
- **Persistent storage** - save your results to Google Drive

## Step 1: Import Required Libraries

First, let's import all the necessary libraries for our embedding exploration.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# NLP and embeddings
import gensim
from gensim.models import Word2Vec
from gensim.models import KeyedVectors

# Machine learning
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity

# Utilities
import warnings
warnings.filterwarnings('ignore')

# Configure matplotlib for Google Colab
plt.style.use('default')  # Using default style for better Colab compatibility

# Set better defaults for Colab
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

# Set seaborn palette
sns.set_palette("husl")

# Configure display options for better output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print("✅ Libraries imported successfully!")
print(f"📊 Gensim version: {gensim.__version__}")
print(f"🐍 NumPy version: {np.__version__}")
print(f"🐼 Pandas version: {pd.__version__}")
print("📈 Matplotlib configured for Google Colab")

# Report which environment we're running in
try:
    import google.colab
    print("🌐 Google Colab environment detected")
    print("⚡ Display settings above applied for notebook output")
except ImportError:
    print("💻 Local environment detected")

## Step 2: Load Pretrained Word2Vec Model

We'll use Google's pretrained Word2Vec model, trained on a Google News dataset. It contains 300-dimensional vectors for 3 million words and phrases.

**Google Colab Integration**: 
- The model is downloaded automatically with gensim's downloader API (`gensim.downloader`)
- Files are stored in Colab's temporary storage
- gensim prints download progress while the file is fetched
- No manual file management needed

**Model Details**:
- **Source**: Google News dataset (3 million words and phrases)
- **Dimensions**: 300-dimensional vectors
- **Download**: Automatic via gensim API (~1.6 GB compressed)
- **Storage**: Colab's temporary file system (persists only for the current session)
- **Memory**: about 3.4 GB of RAM for the vectors alone (3 million × 300 float32 values), which fits in a standard Colab runtime

**⚡ Performance Note**: 
- First run: a few minutes to download, plus time to load the vectors into memory
- Later runs in the same session: no reload is needed while `word_vectors` is still in memory; otherwise the model is read from the local copy instead of being downloaded again
- No GPU needed: similarity lookups run on the CPU with NumPy
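How much RAM the loaded model needs follows from simple arithmetic: vocabulary size × dimensions × bytes per value. A minimal sketch, assuming float32 storage (4 bytes per value, gensim's default when loading word2vec files) and ignoring the extra overhead of the vocabulary dictionary; `embedding_ram_gib` is a hypothetical helper name:

```python
# Back-of-the-envelope RAM estimate for a dense embedding matrix.
# Assumption: float32 storage (4 bytes per value); dictionary overhead is ignored.
def embedding_ram_gib(vocab_size, dims, bytes_per_value=4):
    """Return the size of a vocab_size x dims matrix in GiB."""
    return vocab_size * dims * bytes_per_value / 1024**3

# Google News model: 3 million words and phrases, 300 dimensions
ram = embedding_ram_gib(3_000_000, 300)
print(f"Vectors alone: ~{ram:.2f} GiB")
```

This is also why the notebook's later memory printout multiplies the vocabulary size, the vector size and 4.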

In [None]:
# 📡 Load Google's pretrained Word2Vec model
# Optimized for Google Colab with automatic downloading and caching

import os
from pathlib import Path
import gensim.downloader as api
from gensim.models import KeyedVectors

print("üöÄ Loading Word2Vec model for Google Colab...")
print("="*60)

# Check if we're in Colab environment
try:
    import google.colab
    IN_COLAB = True
    print("üåê Google Colab detected - using optimized loading")
except ImportError:
    IN_COLAB = False
    print("üíª Local environment detected")

# Set data directory based on environment
if IN_COLAB:
    data_dir = '/content/data'
else:
    data_dir = 'data'

os.makedirs(data_dir, exist_ok=True)

model_path = os.path.join(data_dir, 'word2vec-google-news-300.bin')

if os.path.exists(model_path):
    # Method 1: load the copy saved by a previous run of this cell
    print("📁 Found cached model, loading from local storage...")
    word_vectors = KeyedVectors.load_word2vec_format(model_path, binary=True)
    print("✅ Model loaded from cache!")
else:
    print("📦 Downloading Word2Vec model (this may take a few minutes)...")
    print("💡 Tip: The model will be cached for faster loading later in this session")

    # Method 2: download using gensim's downloader API
    # (gensim also keeps its own copy under ~/gensim-data)
    print("\n🔄 Downloading word2vec-google-news-300...")
    print("📊 Download size: ~1.6 GB")

    try:
        word_vectors = api.load('word2vec-google-news-300')
        print("✅ Model downloaded successfully!")
    except Exception as e:
        # No alternative source is tried: fail loudly instead of
        # continuing without a model
        print(f"❌ Error downloading model: {e}")
        print("💡 Check the internet connection and rerun this cell")
        raise

    # Save a local copy for future use (uncompressed, roughly 3.6 GB on disk)
    print("Caching model for future runs...")
    word_vectors.save_word2vec_format(model_path, binary=True)
    print("✅ Model cached successfully!")

# Model verification and info
print(f"\nüîç Model loaded successfully!")
print(f"üìä Vocabulary size: {len(word_vectors.key_to_index):,}")
print(f"üìê Vector dimensions: {word_vectors.vector_size}")
print(f"? Model type: {type(word_vectors)}")

# Test the model
test_words = ['business', 'technology', 'profit', 'innovation']
available_test_words = [word for word in test_words if word in word_vectors.key_to_index]

print(f"\nüß™ Model verification:")
print(f"‚úÖ Test words in vocabulary: {available_test_words}")
print(f"‚úÖ Model ready for embedding exploration!")

if IN_COLAB:
    print(f"\n‚òÅÔ∏è Running in Google Colab - optimal performance achieved!")
    print(f"üöÄ GPU acceleration: {'Available' if 'torch' in locals() and torch.cuda.is_available() else 'CPU only'}")
    print(f"üíæ Memory usage: {len(word_vectors.key_to_index) * word_vectors.vector_size * 4 / 1024**3:.1f} GB")

## Step 3: Define Business-Relevant Words

Let's select a set of business-relevant words to explore their embeddings and relationships.
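The Google News vocabulary is case-sensitive and joins multi-word phrases with underscores, so a term can be present under a different spelling than the one typed in the list below. A small sketch of a lookup that tries a few variants; `find_in_vocab` is a hypothetical helper, and the toy dictionary (including its `Artificial_Intelligence` entry) stands in for `word_vectors.key_to_index`:

```python
def find_in_vocab(word, vocab):
    """Return the first casing/spacing variant of `word` found in `vocab`, or None.

    Hypothetical helper: tries the word with spaces turned into underscores,
    then lowercase, Title_Case and UPPERCASE versions of it.
    """
    base = word.strip().replace(' ', '_')
    candidates = [base, base.lower(), base.title(), base.upper()]
    for candidate in candidates:
        if candidate in vocab:
            return candidate
    return None

# Toy vocabulary standing in for word_vectors.key_to_index
toy_vocab = {'profit': 0, 'CEO': 1, 'Artificial_Intelligence': 2}
print(find_in_vocab('artificial intelligence', toy_vocab))
print(find_in_vocab('ceo', toy_vocab))
print(find_in_vocab('synergy', toy_vocab))
```

In the notebook you would pass `word_vectors.key_to_index` as `vocab` and keep whichever variant is returned.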

In [None]:
# Define business-relevant words for analysis
business_words = [
    # Finance & Economics
    'profit', 'revenue', 'investment', 'budget', 'finance', 'economy',
    
    # Technology & Innovation
    'technology', 'innovation', 'digital', 'software', 'artificial_intelligence', 'data',
    
    # Marketing & Sales
    'marketing', 'sales', 'customer', 'brand', 'advertising', 'promotion',
    
    # Operations & Management
    'management', 'leadership', 'strategy', 'operations', 'efficiency', 'quality',
    
    # Human Resources
    'employee', 'talent', 'training', 'performance', 'recruitment', 'teamwork'
]

# Filter words that exist in our vocabulary
available_words = [word for word in business_words if word in word_vectors.key_to_index]
missing_words = [word for word in business_words if word not in word_vectors.key_to_index]

print(f"✅ Available words ({len(available_words)}): {available_words}")
print(f"❌ Missing words ({len(missing_words)}): {missing_words}")

# Use available words for our analysis
target_words = available_words[:15]  # Limit to first 15 for better visualization
print(f"\n🎯 Words selected for analysis: {target_words}")

## Step 4: Explore Word Similarity - Find Nearest Neighbors

Let's find the nearest neighbors for each of our target words in the vector space.
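gensim's `most_similar` normalizes the vectors, ranks the vocabulary by cosine similarity to the query, and leaves the query word out of the results. A toy NumPy sketch of the same idea; the 3-dimensional vectors are made up for illustration, and `nearest_neighbors` is a hypothetical helper, not gensim's implementation:

```python
import numpy as np

def nearest_neighbors(query, words, vectors, top_n=2):
    """Rank `words` by cosine similarity to `query`, excluding the query itself."""
    # Normalize every row to unit length so a dot product equals cosine similarity
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = unit[words.index(query)]
    sims = unit @ q
    order = np.argsort(-sims)  # highest similarity first
    return [(words[i], float(sims[i])) for i in order if words[i] != query][:top_n]

# Made-up 3-d "embeddings": three finance words and one unrelated word
words = ['profit', 'revenue', 'earnings', 'football']
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.85, 0.15, 0.05],
    [0.0, 0.1, 0.9],
])
print(nearest_neighbors('profit', words, vectors))
```

With the real model, the same ranking runs over all 3 million vectors, which is why each `most_similar` call takes noticeably longer than a single `similarity` call.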

In [None]:
def find_nearest_neighbors(word, model, top_n=5):
    """Find nearest neighbors for a given word"""
    try:
        neighbors = model.most_similar(word, topn=top_n)
        return neighbors
    except KeyError:
        return None

# Find nearest neighbors for each target word
print("üîç Finding nearest neighbors for each business word...\n")

neighbors_data = {}
for word in target_words:
    neighbors = find_nearest_neighbors(word, word_vectors, top_n=5)
    if neighbors:
        neighbors_data[word] = neighbors
        print(f"üìä **{word.upper()}** - Nearest neighbors:")
        for neighbor, similarity in neighbors:
            print(f"   {neighbor}: {similarity:.3f}")
        print()

## Step 5: Visualize Word Embeddings in 2D

Now let's visualize our business words in a 2D space using dimensionality reduction techniques.
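PCA keeps the two orthogonal directions of greatest variance, which is why the PCA axes report an explained-variance percentage; t-SNE instead tries to preserve local neighborhoods, so its axes have no direct meaning. A from-scratch sketch of the PCA projection via SVD on centered data, using random vectors as stand-ins for word embeddings (`pca_2d` is a hypothetical name; sklearn's `PCA` may flip the sign of each axis):

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their top-2 principal components via SVD.

    Returns (projected, explained_variance_ratio) for the two components.
    """
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ Vt[:2].T        # coordinates along the top-2 directions
    variance = S**2                          # proportional to variance per component
    return projected, variance[:2] / variance.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 300))   # 15 random stand-ins for 300-d word vectors
coords, ratio = pca_2d(X)
print(coords.shape, ratio)
```

With 300 dimensions and only 15 points, two components typically capture a small share of the variance, so distances in the PCA plot should be read as a rough picture.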

In [None]:
# Extract vectors for our target words
word_vectors_matrix = np.array([word_vectors[word] for word in target_words])

# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
word_vectors_2d_pca = pca.fit_transform(word_vectors_matrix)

# Apply t-SNE for dimensionality reduction
tsne = TSNE(n_components=2, random_state=42, perplexity=min(5, len(target_words)-1))
word_vectors_2d_tsne = tsne.fit_transform(word_vectors_matrix)

# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Plot PCA results
scatter1 = ax1.scatter(word_vectors_2d_pca[:, 0], word_vectors_2d_pca[:, 1], 
                      c=range(len(target_words)), cmap='viridis', s=100, alpha=0.7)

for i, word in enumerate(target_words):
    ax1.annotate(word, (word_vectors_2d_pca[i, 0], word_vectors_2d_pca[i, 1]), 
                xytext=(5, 5), textcoords='offset points', fontsize=10, fontweight='bold')

ax1.set_title('Word Embeddings Visualization - PCA', fontsize=14, fontweight='bold')
ax1.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
ax1.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
ax1.grid(True, alpha=0.3)

# Plot t-SNE results
scatter2 = ax2.scatter(word_vectors_2d_tsne[:, 0], word_vectors_2d_tsne[:, 1], 
                      c=range(len(target_words)), cmap='viridis', s=100, alpha=0.7)

for i, word in enumerate(target_words):
    ax2.annotate(word, (word_vectors_2d_tsne[i, 0], word_vectors_2d_tsne[i, 1]), 
                xytext=(5, 5), textcoords='offset points', fontsize=10, fontweight='bold')

ax2.set_title('Word Embeddings Visualization - t-SNE', fontsize=14, fontweight='bold')
ax2.set_xlabel('t-SNE Component 1')
ax2.set_ylabel('t-SNE Component 2')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"üìà Visualized {len(target_words)} business words in 2D space")
print(f"üìä PCA explained variance: {pca.explained_variance_ratio_.sum():.1%}")

## Step 6: Calculate and Visualize Similarity Matrix

Let's create a similarity matrix to see how related our business words are to each other.
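Cosine similarity is cos(a, b) = (a · b) / (‖a‖ ‖b‖), so values range from -1 to 1, and the heatmap's `vmin=0` shows any negative pairs in the lowest color. A minimal NumPy sketch of the full pairwise matrix, which should agree with sklearn's `cosine_similarity` up to floating-point error (`cosine_similarity_matrix` is a hypothetical name):

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X.

    Normalizing each row to unit length turns the whole matrix
    into a single matrix product.
    """
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T

# Three 2-d vectors: one pair at 45 degrees, one pair pointing in opposite directions
X = np.array([[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
S = cosine_similarity_matrix(X)
print(np.round(S, 3))
```

The diagonal is always 1 (every word is identical to itself), which is why the heatmap cell masks it along with the redundant upper triangle.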

In [None]:
# Calculate cosine similarity matrix
similarity_matrix = cosine_similarity(word_vectors_matrix)

# Create a DataFrame for better visualization
similarity_df = pd.DataFrame(similarity_matrix, 
                           index=target_words, 
                           columns=target_words)

# Create heatmap
plt.figure(figsize=(12, 10))
mask = np.triu(np.ones_like(similarity_df, dtype=bool))  # Hide the upper triangle and the diagonal (self-similarity = 1)

sns.heatmap(similarity_df, 
            annot=True, 
            cmap='RdYlBu_r', 
            vmin=0, 
            vmax=1,
            center=0.5,
            square=True,
            mask=mask,
            fmt='.2f',
            cbar_kws={'label': 'Cosine Similarity'})

plt.title('Business Words Similarity Matrix', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Words', fontsize=12)
plt.ylabel('Words', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

print("üî• Similarity matrix created! Higher values indicate more similar words.")

## Step 7: Find Most and Least Similar Word Pairs

Let's identify the most and least similar word pairs from our business vocabulary.
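Each unordered pair appears once in the strictly upper triangle of the similarity matrix, which `np.triu_indices(n, k=1)` enumerates directly. A vectorized sketch of the same pair ranking on a made-up 3 × 3 matrix (`ranked_pairs` is a hypothetical helper); reading the reversed list gives the least similar pairs starting from the very lowest:

```python
import numpy as np

def ranked_pairs(words, sim):
    """Return (word1, word2, similarity) for every unordered pair, most similar first.

    np.triu_indices(n, k=1) yields strictly-upper-triangle indices:
    each pair exactly once and no self-pairs from the diagonal.
    """
    rows, cols = np.triu_indices(len(words), k=1)
    values = sim[rows, cols]
    order = np.argsort(-values)
    return [(words[rows[i]], words[cols[i]], float(values[i])) for i in order]

words = ['profit', 'revenue', 'brand']
sim = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])
pairs = ranked_pairs(words, sim)
print(pairs[0])          # most similar pair
print(pairs[::-1][0])    # least similar pair
```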

In [None]:
# Find most and least similar pairs
word_pairs = []
similarities = []

for i in range(len(target_words)):
    for j in range(i+1, len(target_words)):
        word1, word2 = target_words[i], target_words[j]
        similarity = similarity_matrix[i, j]
        word_pairs.append((word1, word2))
        similarities.append(similarity)

# Sort by similarity
sorted_pairs = sorted(zip(word_pairs, similarities), key=lambda x: x[1], reverse=True)

print("üîù TOP 10 MOST SIMILAR WORD PAIRS:")
print("="*50)
for i, ((word1, word2), sim) in enumerate(sorted_pairs[:10]):
    print(f"{i+1:2d}. {word1:12} ‚Üî {word2:12} | Similarity: {sim:.3f}")

print("\nüîª TOP 10 LEAST SIMILAR WORD PAIRS:")
print("="*50)
for i, ((word1, word2), sim) in enumerate(sorted_pairs[-10:]):
    print(f"{i+1:2d}. {word1:12} ‚Üî {word2:12} | Similarity: {sim:.3f}")

## Step 8: Interactive Word Similarity Explorer

Let's create an interactive function to explore word relationships in our vocabulary.

In [None]:
def explore_word_relationships(word, model, target_words_list):
    """Explore relationships between a word and our target vocabulary"""
    if word not in model.key_to_index:
        print(f"❌ '{word}' not found in vocabulary")
        return
    
    print(f"🔍 Exploring relationships for: **{word.upper()}**")
    print("="*60)
    
    # Find similarities with our target words (skipping the word itself)
    similarities = []
    for target_word in target_words_list:
        if target_word in model.key_to_index and target_word != word:
            sim = model.similarity(word, target_word)
            similarities.append((target_word, sim))
    
    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    print("📊 Similarity with business words:")
    for target_word, sim in similarities[:10]:
        print(f"   {target_word:15} | {sim:.3f} {'🔥' if sim > 0.5 else '📊' if sim > 0.3 else '📉'}")
    
    # Find general nearest neighbors
    print("\n🎯 Top 5 nearest neighbors:")
    neighbors = model.most_similar(word, topn=5)
    for neighbor, sim in neighbors:
        print(f"   {neighbor:15} | {sim:.3f}")

# Example usage
print("Try exploring different words! For example:")
explore_word_relationships('profit', word_vectors, target_words)

## Step 9: Word Arithmetic and Analogies

One fascinating aspect of word embeddings is their ability to capture semantic relationships through vector arithmetic.
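For an analogy a : b :: c : ?, the usual recipe (often called 3CosAdd) ranks words by cosine similarity to b - a + c and skips the three input words, which is what `most_similar(positive=[b, c], negative=[a])` does. A toy sketch with hand-made 2-d vectors, where one axis loosely stands for "royalty" and the other for "gender"; the vectors and the `analogy` helper are illustrative assumptions, not the Google News model:

```python
import numpy as np

def analogy(a, b, c, words, vectors, top_n=1):
    """Solve a : b :: c : ? by ranking words by cosine similarity to b - a + c.

    The input words a, b and c are excluded from the answers.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(words)}
    target = unit[idx[b]] - unit[idx[a]] + unit[idx[c]]
    target /= np.linalg.norm(target)   # scaling does not change the ranking
    sims = unit @ target
    ranked = [words[i] for i in np.argsort(-sims) if words[i] not in (a, b, c)]
    return ranked[:top_n]

# Toy 2-d space: axis 0 ~ "royalty", axis 1 ~ "gender"
words = ['man', 'woman', 'king', 'queen', 'apple']
vectors = np.array([
    [0.1, 1.0],
    [0.1, -1.0],
    [1.0, 1.0],
    [1.0, -1.0],
    [-1.0, 0.1],
])
print(analogy('man', 'king', 'woman', words, vectors))
```

Note the argument order: "man is to king as woman is to ?" computes king - man + woman, so the word being subtracted comes first.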

In [None]:
def find_analogy(word1, word2, word3, model, top_n=5):
    """Find word that completes the analogy: word1 is to word2 as word3 is to ?"""
    try:
        result = model.most_similar(positive=[word2, word3], negative=[word1], topn=top_n)
        return result
    except KeyError as e:
        return f"Error: {e}"

# Analogies to explore, each read as "word1 is to word2 as word3 is to ?"
# (find_analogy ranks words by similarity to word2 - word1 + word3)
analogies = [
    ('man', 'king', 'woman'),  # Classic example: king - man + woman ≈ queen
    ('company', 'CEO', 'school'),  # company is to CEO as school is to ?
    ('business', 'profit', 'education'),  # business is to profit as education is to ?
    ('product', 'marketing', 'candidate'),  # product is to marketing as candidate is to ?
    ('money', 'investment', 'time'),  # money is to investment as time is to ?
]

print("üßÆ WORD ARITHMETIC AND ANALOGIES")
print("="*60)

for word1, word2, word3 in analogies:
    print(f"\nüîç {word1} is to {word2} as {word3} is to...")
    result = find_analogy(word1, word2, word3, word_vectors, top_n=3)
    
    if isinstance(result, str):
        print(f"   {result}")
    else:
        print(f"   Top predictions:")
        for word, score in result:
            print(f"     {word} ({score:.3f})")

## Step 10: Summary and Key Insights

Let's summarize our findings and extract key insights from our embedding exploration.

In [None]:
# Summary statistics
print("üìà EMBEDDING EXPLORATION SUMMARY")
print("="*60)

print(f"üìä Total words analyzed: {len(target_words)}")
print(f"üìê Vector dimensions: {word_vectors.vector_size}")
print(f"üìö Total vocabulary size: {len(word_vectors.key_to_index):,}")

# Calculate some statistics
avg_similarity = np.mean(similarity_matrix[np.triu_indices(len(target_words), k=1)])
max_similarity = np.max(similarity_matrix[np.triu_indices(len(target_words), k=1)])
min_similarity = np.min(similarity_matrix[np.triu_indices(len(target_words), k=1)])

print(f"\nüìä Similarity Statistics:")
print(f"   Average similarity: {avg_similarity:.3f}")
print(f"   Maximum similarity: {max_similarity:.3f}")
print(f"   Minimum similarity: {min_similarity:.3f}")

print("\nüéØ KEY INSIGHTS:")
print("1. Word embeddings capture semantic relationships between business terms")
print("2. Similar words cluster together in the vector space")
print("3. Vector arithmetic can reveal analogical relationships")
print("4. Dimensionality reduction helps visualize high-dimensional embeddings")
print("5. Cosine similarity is effective for measuring word relationships")

print("\nüöÄ NEXT STEPS:")
print("‚Ä¢ Experiment with different word lists (industry-specific terms)")
print("‚Ä¢ Try different pretrained models (GloVe, FastText, etc.)")
print("‚Ä¢ Explore domain-specific embedding models")
print("‚Ä¢ Apply embeddings to text classification or clustering tasks")
print("‚Ä¢ Create custom embeddings from your own text data")

## 🎯 Lab Exercise: Your Turn!

Now it's your turn to explore embeddings! Complete the following exercises:

### Exercise 1: Custom Word List
Create your own list of words related to your field of interest (e.g., healthcare, education, sports) and repeat the analysis above.

### Exercise 2: Word Arithmetic
Try to find interesting analogies using word arithmetic. Can you find business-related analogies?

### Exercise 3: Similarity Threshold
Experiment with different similarity thresholds to group words into clusters.
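One possible starting point, not the only valid approach: treat every pair whose similarity is at or above the threshold as a link, and collect connected groups with a depth-first search. In the notebook you would pass `target_words` and `similarity_matrix`; the 4-word matrix below is made up, and `threshold_groups` is a hypothetical helper:

```python
import numpy as np

def threshold_groups(words, sim, threshold):
    """Group words linked by pairwise similarity >= threshold (connected components)."""
    n = len(words)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from `start` over above-threshold links
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(words[i])
            for j in range(n):
                if j not in seen and sim[i, j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups

words = ['profit', 'revenue', 'brand', 'advertising']
sim = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])
for t in (0.5, 0.7):
    print(t, threshold_groups(words, sim, t))
```

Raising the threshold splits groups apart; lowering it far enough merges everything into one group, since links chain together even between words that are not directly similar.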

Use the cells below to implement your solutions:

In [None]:
# Exercise 1: Your custom word list
my_words = []
# Add your words here and run the analysis

# Your code here...

In [None]:
# Exercise 2: Word arithmetic experiments
# Try your own analogies here

# Your code here...

In [None]:
# Exercise 3: Clustering with similarity thresholds
# Group words based on similarity thresholds

# Your code here...

## 💾 Save Results to Google Drive (Optional)

If you want to save your results and visualizations to Google Drive for later use, run the cell below.
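Similarity values computed from the model's float32 vectors may come back as NumPy scalars such as `np.float32`, which Python's `json` module refuses to serialize. A small sketch of a `default=` hook that converts NumPy values to plain Python types; `to_json_safe` is a hypothetical helper, and wrapping each value in `float(...)` works just as well:

```python
import json
import numpy as np

def to_json_safe(value):
    """json `default=` hook: convert NumPy scalars/arrays to Python types."""
    if isinstance(value, np.generic):
        return value.item()          # e.g. np.float32 -> float
    if isinstance(value, np.ndarray):
        return value.tolist()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")

record = {'word1': 'profit', 'word2': 'revenue', 'similarity': np.float32(0.75)}
try:
    json.dumps(record)
except TypeError as err:
    print("Plain json.dumps fails:", err)
print(json.dumps(record, default=to_json_safe))
```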

In [None]:
# 💾 Save Results to Google Drive
# This cell saves your analysis results to Google Drive for future reference

import os
from datetime import datetime
import json

try:
    import google.colab
    IN_COLAB = True
    
    # Create a folder in Google Drive for results
    drive_folder = '/content/drive/MyDrive/Embedding_Lab_Results'
    os.makedirs(drive_folder, exist_ok=True)
    
    # Create timestamped folder
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    session_folder = os.path.join(drive_folder, f'session_{timestamp}')
    os.makedirs(session_folder, exist_ok=True)
    
    print(f"üìÅ Created results folder: {session_folder}")
    
    # Save similarity matrix
    if 'similarity_df' in locals():
        similarity_df.to_csv(os.path.join(session_folder, 'similarity_matrix.csv'))
        print("‚úÖ Similarity matrix saved")
    
    # Save word pairs analysis
    if 'sorted_pairs' in locals():
        pairs_data = {
            'most_similar': [{'word1': pair[0], 'word2': pair[1], 'similarity': sim} 
                           for (pair, sim) in sorted_pairs[:10]],
            'least_similar': [{'word1': pair[0], 'word2': pair[1], 'similarity': sim} 
                            for (pair, sim) in sorted_pairs[-10:]]
        }
        
        with open(os.path.join(session_folder, 'word_pairs_analysis.json'), 'w') as f:
            json.dump(pairs_data, f, indent=2)
        print("‚úÖ Word pairs analysis saved")
    
    # Save target words list
    if 'target_words' in locals():
        with open(os.path.join(session_folder, 'target_words.txt'), 'w') as f:
            f.write('\n'.join(target_words))
        print("✅ Target words list saved")
    
    # Save session summary
    summary = {
        'timestamp': timestamp,
        'total_words_analyzed': len(target_words) if 'target_words' in locals() else 0,
        'vector_dimensions': word_vectors.vector_size if 'word_vectors' in locals() else 0,
        'vocabulary_size': len(word_vectors.key_to_index) if 'word_vectors' in locals() else 0,
        'average_similarity': float(avg_similarity) if 'avg_similarity' in locals() else 0,
        'max_similarity': float(max_similarity) if 'max_similarity' in locals() else 0,
        'min_similarity': float(min_similarity) if 'min_similarity' in locals() else 0
    }
    
    with open(os.path.join(session_folder, 'session_summary.json'), 'w') as f:
        json.dump(summary, f, indent=2)
    
    print("✅ Session summary saved")
    print(f"\n🎉 All results saved to Google Drive!")
    print(f"📂 Location: {session_folder}")
    print(f"🔗 Access via: Google Drive > Embedding_Lab_Results > session_{timestamp}")
    
except ImportError:
    print("💻 Not running in Google Colab - skipping Drive save")
    print("💡 To save results locally, modify the paths in this cell")
    
except Exception as e:
    print(f"❌ Error saving to Google Drive: {e}")
    print("💡 Make sure Google Drive is mounted and you have write permissions")

## 🎯 Next Steps & Sharing

### 📤 Share Your Work
1. **Save a copy to Drive**: `File > Save a copy in Drive`
2. **Share with others**: `Share` button → Add collaborators
3. **Download as notebook**: `File > Download > Download .ipynb`
4. **Export to GitHub**: `File > Save a copy in GitHub`

### 🚀 Advanced Exploration
- **Try different models**: FastText, GloVe, domain-specific embeddings
- **Scale up**: Use larger word lists or entire documents
- **Custom training**: Train embeddings on your own text data
- **Applications**: Text classification, clustering, recommendation systems

### 🛠️ Troubleshooting
- **Out of memory**: The model itself needs several GB of RAM; restart the runtime to free memory, or load only the first N vectors with the `limit` argument of `KeyedVectors.load_word2vec_format`
- **Slow downloads**: Check internet connection or try different times
- **Runtime type**: Runtime → Change runtime type (a GPU is not required for this lab)
- **Drive mounting**: Rerun the environment setup cell

### 📚 Resources
- [Gensim Documentation](https://radimrehurek.com/gensim/)
- [Word2Vec Paper](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
- [Google Colab Tips](https://colab.research.google.com/notebooks/welcome.ipynb)

**🎉 Congratulations! You've successfully explored word embeddings in Google Colab!**