# Gensim FastText

Gensim provides a convenient implementation of FastText, which can be used to train word vectors on a custom corpus or to use pre-trained models for various tasks such as finding similar words and computing similarity scores between words.

### Installing Gensim

> pip install gensim

### Using Gensim FastText

Here are the key steps to use Gensim FastText:

1. Loading a Pre-trained Model
2. Training a FastText Model on a Custom Corpus
3. Finding Similar Words
4. Computing Similarity Scores Between Words

### Links

[Migrating-from-Gensim-3.x-to-4](https://github.com/piskvorky/gensim/wiki/Migrating-from-Gensim-3.x-to-4)

[cub-200-2011_paper](https://paperswithcode.com/dataset/cub-200-2011)

[cub-200-2011_dataset](https://www.kaggle.com/datasets/wenewone/cub2002011?resource=download)

[Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)

### 1. Loading a Pre-trained Model

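Gensim can load pre-trained FastText vectors through its downloader API. For example, the following loads the `fasttext-wiki-news-subwords-300` vectors published by Facebook; as the output shows, the downloader returns them as a `KeyedVectors` object: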

In [4]:
import gensim.downloader as api

# Load pre-trained FastText model
model = api.load('fasttext-wiki-news-subwords-300')

print(model)

KeyedVectors<vector_size=300, 999999 keys>


### 2. Training a FastText Model on a Custom Corpus

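You can also train a FastText model on your own corpus. In the example below each "sentence" is a single class name, so no word has neighbors inside the context window; the similarities reported later therefore come largely from shared character n-grams rather than from learned co-occurrence. Here is the example: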

In [12]:
from gensim.models import FastText
from gensim.utils import simple_preprocess

# Example sentences
# sentences = [
#     "Cats and dogs are both popular household pets, yet cats are more independent and often prefer solitude. They share some hunting instincts with their larger feline cousins like lions and tigers.",
#     "Dogs and cats are common pets, but dogs are known for their loyalty and tendency to form strong bonds with humans. Unlike solitary big cats, dogs are social animals that thrive in packs.",
#     "Horses, like elephants, have been domesticated to assist humans in various tasks. However, horses are known for their speed and agility, whereas elephants are prized for their strength and intelligence.",
#     "Lions and tigers are both apex predators, but lions are social animals living in prides. In contrast, tigers are solitary creatures, only coming together during mating or to raise cubs.",
#     "Tigers share their powerful physique and hunting prowess with lions. Unlike the social lions, tigers are mostly solitary, showcasing a stark behavioral difference between the two big cats.",
#     "Elephants, similar to horses, have been used by humans for labor due to their strength. Elephants, however, are highly intelligent with complex social structures, unlike the more individually task-oriented horses.",
# ]
sentences = [
    'african_buffalo', 'alligator', 'amphibian', 'amur_leopard', 
    'ants', 'bear', 'bird', 'blue_whale', 'bobcat', 'cat', 'chimp', 
    'chimpanzee', 'cow', 'dog', 'dolphin', 'domestic_water_buffalo', 
    'eagle', 'elephant', 'fish', 'frog', 'giant', 'giant_panda', 'goat', 
    'gorilla', 'hen', 'horse', 'killer_whale', 'lion', 'lizard', 'monkey', 
    'mouse', 'orangutan', 'ostrich', 'ox', 'panda', 'polar_bear', 'rabbit', 
    'rat', 'rhino', 'rhinoceros', 'rhinoceroses', 'seal', 'sealskin', 
    'siamese_cat', 'skunk', 'spider_monkey', 'squirrel', 'tiger', 'turtle', 
    'walrus', 'whale', 'bird', 'fish', 'lion', 'tiger', 'bull'
]

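# Preprocess: lowercase and tokenize each entry. By default simple_preprocess
# drops tokens longer than 15 characters (max_len=15), so
# 'domestic_water_buffalo' becomes an empty list in the output below.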
sentences = [simple_preprocess(sentence) for sentence in sentences]

print(f"sentences : {sentences} \n")

# Train FastText model
model = FastText(sentences, vector_size=300, window=5, min_count=1, epochs=10000)

print(f"model : {model} \n")

sentences : [['african_buffalo'], ['alligator'], ['amphibian'], ['amur_leopard'], ['ants'], ['bear'], ['bird'], ['blue_whale'], ['bobcat'], ['cat'], ['chimp'], ['chimpanzee'], ['cow'], ['dog'], ['dolphin'], [], ['eagle'], ['elephant'], ['fish'], ['frog'], ['giant'], ['giant_panda'], ['goat'], ['gorilla'], ['hen'], ['horse'], ['killer_whale'], ['lion'], ['lizard'], ['monkey'], ['mouse'], ['orangutan'], ['ostrich'], ['ox'], ['panda'], ['polar_bear'], ['rabbit'], ['rat'], ['rhino'], ['rhinoceros'], ['rhinoceroses'], ['seal'], ['sealskin'], ['siamese_cat'], ['skunk'], ['spider_monkey'], ['squirrel'], ['tiger'], ['turtle'], ['walrus'], ['whale'], ['bird'], ['fish'], ['lion'], ['tiger'], ['bull']] 

model : FastText<vocab=51, vector_size=300, alpha=0.025> 



In [24]:
model.wv, len(model.wv), len(model.wv[0])

(<gensim.models.fasttext.FastTextKeyedVectors at 0x17e562e5600>, 51, 300)

### 3. Finding Similar Words

Once you have a trained or pre-trained model, you can find similar words using the `most_similar` method:

In [13]:
# Find similar words to 'cat'
similar_words = model.wv.most_similar('cat', topn=10)

print(similar_words)

[('bobcat', 0.2737165093421936), ('goat', 0.15198037028312683), ('bird', 0.12732714414596558), ('ostrich', 0.09741493314504623), ('seal', 0.09225074201822281), ('siamese_cat', 0.08397927135229111), ('african_buffalo', 0.08353123813867569), ('ants', 0.06892219930887222), ('walrus', 0.05629545822739601), ('rat', 0.055904362350702286)]


### 4. Computing Similarity Scores Between Words

You can compute similarity scores between two words using the `similarity` method:

In [14]:
# Compute similarity scores between 'cat' and several other words
similarity_score = model.wv.similarity('cat', 'bobcat')
print(similarity_score)

# In-vocabulary token (underscore form, as it appeared in training)
similarity_score = model.wv.similarity('cat', 'siamese_cat')
print(similarity_score, ' *')

# Out-of-vocabulary string (space instead of underscore); FastText still
# returns a vector for it, built from character n-grams
similarity_score = model.wv.similarity('cat', 'siamese cat')
print(similarity_score, ' *')

similarity_score = model.wv.similarity('cat', 'dog')
print(similarity_score)

similarity_score = model.wv.similarity('cat', 'rat')
print(similarity_score)

# 'ferret' is also out of vocabulary
similarity_score = model.wv.similarity('cat', 'ferret')
print(similarity_score)

0.2737165
0.08397927  *
0.1394797  *
0.012166994
0.055904355
0.007804998
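
FastText represents a word through its character n-grams, which is why 'bobcat' scores relatively high against 'cat' and why strings that were never in the training list (such as 'siamese cat' or 'ferret') still get a vector. The sketch below is a simplified illustration, not Gensim's internal code: the helper name `char_ngrams` is hypothetical, and the 3 to 6 character range is assumed to match the default `min_n` and `max_n` settings.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    can be told apart from n-grams in the middle of a word.
    """
    wrapped = f"<{word}>"
    ngrams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.add(wrapped[start:start + n])
    return ngrams


cat = char_ngrams("cat")
bobcat = char_ngrams("bobcat")
dog = char_ngrams("dog")

# n-grams that 'cat' shares with each of the other words
print(sorted(cat & bobcat))  # ['at>', 'cat', 'cat>']
print(sorted(cat & dog))     # []
```

Words that share n-grams share part of their vector, so some similarity exists before any training signal is available.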


# Gensim Word2Vec

### Using Gensim Word2Vec

Here are the key steps to use Gensim Word2Vec:

1. Training a Word2Vec Model on a Custom Corpus
2. Finding Similar Words
3. Computing Similarity Scores Between Words

### 1. Training a Word2Vec Model on a Custom Corpus

You can train a Word2Vec model on your custom corpus. Here is an example:

In [31]:
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Example sentences
sentences = [
    'african_buffalo', 'alligator', 'amphibian', 'amur_leopard', 
    'ants', 'bear', 'bird', 'blue_whale', 'bobcat', 'cat', 'chimp', 
    'chimpanzee', 'cow', 'dog', 'dolphin', 'domestic_water_buffalo', 
    'eagle', 'elephant', 'fish', 'frog', 'giant', 'giant_panda', 'goat', 
    'gorilla', 'hen', 'horse', 'killer_whale', 'lion', 'lizard', 'monkey', 
    'mouse', 'orangutan', 'ostrich', 'ox', 'panda', 'polar_bear', 'rabbit', 
    'rat', 'rhino', 'rhinoceros', 'rhinoceroses', 'seal', 'sealskin', 
    'siamese_cat', 'skunk', 'spider_monkey', 'squirrel', 'tiger', 'turtle', 
    'walrus', 'whale', 'bird', 'fish', 'lion', 'tiger', 'bull'
]

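# Preprocess: lowercase and tokenize each entry. Tokens longer than 15
# characters are dropped by default (max_len=15), which is why
# 'domestic_water_buffalo' becomes an empty list in the output below.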
sentences = [simple_preprocess(sentence) for sentence in sentences]
print(f"sentences : {sentences} \n")

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, epochs=10000)
print(f"model : {model} \n")

sentences : [['african_buffalo'], ['alligator'], ['amphibian'], ['amur_leopard'], ['ants'], ['bear'], ['bird'], ['blue_whale'], ['bobcat'], ['cat'], ['chimp'], ['chimpanzee'], ['cow'], ['dog'], ['dolphin'], [], ['eagle'], ['elephant'], ['fish'], ['frog'], ['giant'], ['giant_panda'], ['goat'], ['gorilla'], ['hen'], ['horse'], ['killer_whale'], ['lion'], ['lizard'], ['monkey'], ['mouse'], ['orangutan'], ['ostrich'], ['ox'], ['panda'], ['polar_bear'], ['rabbit'], ['rat'], ['rhino'], ['rhinoceros'], ['rhinoceroses'], ['seal'], ['sealskin'], ['siamese_cat'], ['skunk'], ['spider_monkey'], ['squirrel'], ['tiger'], ['turtle'], ['walrus'], ['whale'], ['bird'], ['fish'], ['lion'], ['tiger'], ['bull']] 

model : Word2Vec<vocab=51, vector_size=300, alpha=0.025> 
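
Word2Vec learns from (center word, context word) pairs drawn from within each sentence, so the shape of the corpus matters. The sketch below is a simplified illustration with a fixed window (the helper `context_pairs` is hypothetical, and Gensim's actual sampling differs in detail). It shows that a list of one-word sentences, like the class names above, yields no pairs at all, which may explain why the scores reported for this model stay close to zero.

```python
def context_pairs(sentences, window=5):
    """Enumerate (center, context) pairs within each tokenized sentence."""
    pairs = []
    for tokens in sentences:
        for i, center in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


single_word_corpus = [["cat"], ["dog"], ["lion"]]
sentence_corpus = [["lions", "and", "tigers", "are", "big", "cats"]]

print(len(context_pairs(single_word_corpus)))  # 0
print(len(context_pairs(sentence_corpus)))     # 30
```

Feeding the model full tokenized sentences, rather than isolated names, gives it co-occurrence information to learn from.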



### 2. Finding Similar Words

Once you have a trained model, you can find similar words using the `most_similar` method:

In [32]:
# Find similar words to 'cat'
similar_words = model.wv.most_similar('cat', topn=10)
print("Most similar words to 'cat':")
for word, score in similar_words:
    print(f"{word}: {score}")

Most similar words to 'cat':
ants: 0.11463356018066406
goat: 0.10705526173114777
rhinoceros: 0.09309180825948715
monkey: 0.09122835099697113
rhino: 0.08179710805416107
rabbit: 0.07725443691015244
orangutan: 0.07632383704185486
bear: 0.07548326253890991
rhinoceroses: 0.06613056361675262
chimpanzee: 0.042745448648929596


### 3. Computing Similarity Scores Between Words

You can compute similarity scores between two words using the `similarity` method:

In [34]:
# Compute similarity score between 'cat' and 'bobcat'
similarity_score = model.wv.similarity('cat', 'bobcat')
print(f"Similarity score between 'cat' and 'bobcat': {similarity_score}")

Similarity score between 'cat' and 'bobcat': -0.01531929150223732


# ZSL

Zero-shot learning (ZSL) is a machine learning paradigm in which a model must recognize classes or perform tasks for which it saw no labeled examples during training. Instead of relying on labeled examples for every possible category, ZSL uses auxiliary information (such as semantic attributes, descriptions, or relationships between classes) to make predictions about unseen classes.

# Code for Zero-Shot Learning

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from sklearn.preprocessing import normalize
from scipy.spatial.distance import cdist
import fasttext

# Load a pre-trained ResNet50 model
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Function to extract visual features
def extract_features(image_path):
    image = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    image = tf.keras.preprocessing.image.img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = tf.keras.applications.resnet50.preprocess_input(image)
    features = model.predict(image)
    return features

# Load FastText word vectors (requires the `fasttext` package and a local copy of cc.en.300.bin)
fasttext_model = fasttext.load_model('cc.en.300.bin')
# from gensim.models.keyedvectors import KeyedVectors
# import gensim.downloader as api
# fast_text_vectors = api.load("fasttext-wiki-news-subwords-300")
# fast_text_vectors.save('fstwk_1.d2v')
# fast_text_vectors = KeyedVectors.load("fstwk_1.d2v")

# Example seen and unseen classes
seen_classes = ['cat', 'dog', 'horse']
unseen_classes = ['lion', 'tiger', 'elephant']

# Get word vectors for classes
def get_class_vectors(classes):
    return np.array([fasttext_model.get_word_vector(cls) for cls in classes])

# Normalize the word vectors
seen_vectors = normalize(get_class_vectors(seen_classes))
unseen_vectors = normalize(get_class_vectors(unseen_classes))

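# Function to perform zero-shot classification
# NOTE: ResNet50 pooled features are 2048-dimensional while the FastText class
# vectors are 300-dimensional, so cdist raises a ValueError on these inputs.
# The features must first be mapped into the word-vector space, for example
# with a projection learned on images of the seen classes.
def zero_shot_classify(image_path):
    features = extract_features(image_path)
    features = normalize(features)
    distances = cdist(features, unseen_vectors, metric='cosine')
    return unseen_classes[np.argmin(distances)]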

# Example usage
image_path = 'Sample_Images/cat1.jpg'
predicted_class = zero_shot_classify(image_path)
print(f'Predicted class: {predicted_class}')





ModuleNotFoundError: No module named 'fasttext'
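
Apart from the missing package, the cell above compares 2048-dimensional ResNet50 features with 300-dimensional word vectors directly, which cannot work without a mapping between the two spaces. One common approach, sketched below under stated assumptions, is to fit a linear projection from image features to class word vectors on the seen classes and then pick the nearest unseen class vector. Everything here is illustrative: the data is synthetic (random arrays stand in for ResNet features and FastText vectors), the dimensions are shrunk for speed, and the helper names `make_features`, `fit_projection` and `zero_shot_predict` and the ridge strength `lam` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, sem_dim = 64, 16   # stand-ins for 2048 (ResNet50) and 300 (FastText)
n_seen, n_unseen = 40, 5     # synthetic class counts, chosen for illustration

# Synthetic class embeddings (random stand-ins for FastText word vectors)
seen_vectors = rng.normal(size=(n_seen, sem_dim))
unseen_vectors = rng.normal(size=(n_unseen, sem_dim))

# Synthetic image features: a hidden linear map of the class vector plus noise
hidden_map = rng.normal(size=(sem_dim, feat_dim))


def make_features(class_vectors, labels):
    noise = 0.01 * rng.normal(size=(len(labels), feat_dim))
    return class_vectors[labels] @ hidden_map + noise


def fit_projection(features, targets, lam=1e-3):
    """Ridge regression from feature space to semantic space."""
    gram = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


def zero_shot_predict(features, projection, class_vectors):
    """Project features, then return the index of the nearest class by cosine."""
    projected = features @ projection
    projected = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    classes = class_vectors / np.linalg.norm(class_vectors, axis=1, keepdims=True)
    return np.argmax(projected @ classes.T, axis=1)


# Train the projection on seen classes only
train_labels = np.repeat(np.arange(n_seen), 10)
train_features = make_features(seen_vectors, train_labels)
projection = fit_projection(train_features, seen_vectors[train_labels])

# Evaluate on classes that never appeared during training
test_labels = np.repeat(np.arange(n_unseen), 20)
test_features = make_features(unseen_vectors, test_labels)
predictions = zero_shot_predict(test_features, projection, unseen_vectors)
accuracy = float(np.mean(predictions == test_labels))
print(f"Zero-shot accuracy on unseen classes: {accuracy:.2f}")
```

In this toy setup the mapping is linear by construction, so a linear projection can recover it. With real image features, three seen classes as in the cell above would likely be far too few to fit a useful projection.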

Conditional Autoencoders (CAEs) are a variant of autoencoders where additional information is used to condition the encoding and decoding processes. This conditioning can help the autoencoder learn more structured and relevant representations based on the context provided by the additional information.

### Autoencoders Recap

Autoencoders are neural networks designed to learn efficient representations (encodings) of input data, typically for the purpose of dimensionality reduction or data denoising.

Components:
- Encoder: Compresses the input data into a latent-space representation.
- Decoder: Reconstructs the input data from the latent representation.

### Conditional Autoencoders

In a Conditional Autoencoder, the encoder and decoder receive additional information (the condition) alongside the data. The condition can be a label, a set of attributes, or any other relevant context that should influence how the input is encoded and reconstructed.
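
One simple way to condition an autoencoder is to concatenate the condition vector to the encoder input and to the latent code before decoding. The NumPy sketch below shows only this data flow with untrained random weights; the layer sizes, the one-hot condition, and the names `encode` and `decode` are assumptions for illustration, not a prescribed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, cond_dim, latent_dim = 8, 3, 2

# Untrained random weights; a real model would learn these by minimizing
# the reconstruction error
W_enc = rng.normal(scale=0.1, size=(input_dim + cond_dim, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim + cond_dim, input_dim))


def encode(x, c):
    """The encoder sees the input together with the condition."""
    return np.tanh(np.concatenate([x, c], axis=1) @ W_enc)


def decode(z, c):
    """The decoder sees the latent code together with the same condition."""
    return np.concatenate([z, c], axis=1) @ W_dec


x = rng.normal(size=(4, input_dim))   # batch of 4 inputs
c = np.eye(cond_dim)[[0, 1, 2, 0]]    # one one-hot condition per input

z = encode(x, c)
x_hat = decode(z, c)
print(z.shape, x_hat.shape)  # (4, 2) (4, 8)
```

Because the decoder receives the condition directly, the same latent code can be decoded under a different condition to produce a different reconstruction.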