<a href="https://colab.research.google.com/github/karthik111/Skylab/blob/main/Gorilla_hosted.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gorilla Hosted - Try it out in less than 60s 🚀

[![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/ShishirPatil/gorilla)  [![arXiv](https://img.shields.io/badge/arXiv-2305.15334-<COLOR>.svg?style=flat-square)](https://arxiv.org/abs/2305.15334)   [![Discord](https://img.shields.io/discord/1111172801899012102?label=Discord&logo=discord&logoColor=green&style=flat-square)](https://discord.gg/grXXvj9Whz)  [![Twitter](https://img.shields.io/twitter/url?url=https://twitter.com/shishirpatil_/status/1661780076277678082)](https://twitter.com/shishirpatil_/status/1661780076277678082)

Play around with Gorilla! Here, we host the Gorilla zero-shot models so you can try them out. The endpoint is compatible with the OpenAI chat completion API: plug and play!

🟢 Now with Apache-2.0! Gorilla is commercially usable with no obligations 🚀

We are happy to launch three zero-shot models: `gorilla-7b-hf-v1`, which chooses from 925 Hugging Face APIs; `gorilla-7b-th-v0`, for 94 (exhaustive) Torch Hub APIs; and `gorilla-7b-tf-v0`, for 626 (exhaustive) TensorFlow Hub APIs. In addition, `gorilla-mpt-7b-hf-v0` and `gorilla-falcon-7b-hf-v0` are two Apache-2.0 licensed models for Hugging Face APIs. This Colab has a hosted endpoint for `gorilla-mpt-7b-hf-v0`, and we are in the process of adding one for `gorilla-falcon-7b-hf-v0`. In the spirit of openness, we do not filter or post-process either the prompt or the response. We will gradually release the combined {HF+TF+TH} model, which also has generic chat capability, to accommodate server demand.

💃 If you want to use Gorilla or build on top of it, feel absolutely free to do so! We believe in open-source research, and you don't even have to tell us. If you do choose to, we have a vibrant community on Discord! Stop by and say Hi 👋

<img src="https://github.com/ShishirPatil/gorilla/blob/gh-pages/assets/img/logo.png?raw=true" width=30% height=30%>

## Gorilla 🦍 is hosted by UC Berkeley Sky lab for FREE 🤩 as a research prototype 🤓
## Please don't use it for commercial serving 👀
## The hosted models are only trained to serve Hugging Face/TF/Torch APIs. They are NOT trained to serve other RESTful APIs.

In [2]:
# Import Chat completion template and set-up variables
!pip install openai==0.28.1 &> /dev/null
import openai
import urllib.parse

openai.api_key = "EMPTY" # Key is ignored and does not matter
openai.api_base = "http://zanino.millennium.berkeley.edu:8000/v1"
# Alternate mirrors
# openai.api_base = "http://34.132.127.197:8000/v1"

# Report issues
def raise_issue(e, model, prompt):
    issue_title = urllib.parse.quote("[bug] Hosted Gorilla: <Issue>")
    issue_body = urllib.parse.quote(f"Exception: {e}\nFailed model: {model}, for prompt: {prompt}")
    issue_url = f"https://github.com/ShishirPatil/gorilla/issues/new?assignees=&labels=hosted-gorilla&projects=&template=hosted-gorilla-.md&title={issue_title}&body={issue_body}"
    print(f"An exception has occurred: {e} \nPlease raise an issue here: {issue_url}")

# Query Gorilla server
def get_gorilla_response(prompt="I would like to translate from English to French.", model="gorilla-7b-hf-v1"):
  try:
    completion = openai.ChatCompletion.create(
      model=model,
      messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0].message.content
  except Exception as e:
    raise_issue(e, model, prompt)
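Because the server speaks the OpenAI chat-completions protocol, the same query can in principle be made without the `openai` package. This is an illustrative sketch, not part of Gorilla: `build_chat_request` and `send_chat_request` are hypothetical helpers, it assumes the server accepts a standard `POST {api_base}/chat/completions` JSON body, and the hosted endpoint may no longer be reachable, so only the request construction is exercised here.

```python
import json
import urllib.request

# Mirrors the endpoint configured above; the hosted server may no longer be online.
API_BASE = "http://zanino.millennium.berkeley.edu:8000/v1"

def build_chat_request(prompt, model="gorilla-7b-hf-v1", api_base=API_BASE):
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{api_base}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The key is ignored by the server, mirroring openai.api_key = "EMPTY" above
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )

def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's message content (needs network)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

req = build_chat_request("I would like to translate from English to French.")
print(req.full_url)
```

Calling `send_chat_request(req)` would perform the actual network call, assuming the server is up.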

## 🧑‍💻 [Update Jun 15] With our new v1 model `gorilla-7b-hf-delta-v1`, Gorilla now returns code snippets you can use directly in your workflow!
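The v1 responses use a `<<<field>>>:` layout (`domain`, `api_call`, `api_provider`, `explanation`, `code`), as in the examples below. Here is a minimal sketch for pulling out the code snippet, assuming that layout holds. `parse_gorilla_response` is a hypothetical helper: any text before the first marker is discarded, and responses in another shape (such as the dict-style Torch Hub output further down) yield an empty dict.

```python
import re

def parse_gorilla_response(text):
    """Split a Gorilla v1 response into its <<<field>>> sections (hypothetical helper)."""
    # With a capturing group, re.split returns [preamble, name1, body1, name2, body2, ...]
    parts = re.split(r"<<<(\w+)>>>:", text)
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}

sample = (
    "<<<domain>>>: Natural Language Processing Text2Text Generation\n"
    "<<<api_call>>>: M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')\n"
    "<<<api_provider>>>: Hugging Face Transformers\n"
    "<<<explanation>>>: 1. Import the model.<<<code>>>:\n"
    "from transformers import M2M100ForConditionalGeneration\n"
)
fields = parse_gorilla_response(sample)
print(fields["code"])
```

In the notebook, `parse_gorilla_response(get_gorilla_response(prompt)).get("code")` would return the snippet, assuming the call succeeded.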

## Example 1: Translation ✍ with 🤗

In [3]:
# Gorilla `gorilla-7b-hf-v1` with code snippets
# Translation
prompt = "I would like to translate 'I feel very good today.' from English to Hindi."
print(get_gorilla_response(prompt, model="gorilla-7b-hf-v1"))

<<<domain>>>: Natural Language Processing Text2Text Generation
<<<api_call>>>: m2m100_model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>:1. Import the M2M100ForConditionalGeneration and M2M100Tokenizer from the transformers library.
2. Load the pretrained 'facebook/m2m100_418M' model and its tokenizer.
3. Set the source language and input text in Hindi, and tokenize the input text.
4. Generate the translated output using the model and decode it back to text.
5. Print the translated text.<<<code>>>:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

def load_model():
    model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
    tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M')
    return model, tokenizer

def process_data(english_text, language_code_to_target, model, tokenizer):
    tokenizer.src_lang = language_code_to_source
    

## Example 2: Object detection 🔷 with 🤗

In [None]:
# Gorilla `gorilla-7b-hf-v1` with code snippets
# Object Detection
prompt = "I want to build a robot that can detecting objects in an image ‘cat.jpeg’. Input: [‘cat.jpeg’]"
print(get_gorilla_response(prompt, model="gorilla-7b-hf-v1"))

<<<domain>>>: Computer Vision Object Detection
<<<api_call>>>: model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-101-dc5')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the necessary components from the Hugging Face Transformers library, torch, and PIL (Python Imaging Library).
2. Open the image using PIL's Image.open() function with the provided image path.
3. Initialize the pretrained DETR (DEtection TRansformer) model and the image processor.
4. Generate inputs for the model using the image processor.
5. Pass the inputs to the model, which returns object detection results.
<<<code>>>:

from transformers import AutoFeatureExtractor, AutoModelForObjectDetection
from PIL import Image
import torch

def load_model():
    feature_extractor = AutoFeatureExtractor.from_pretrained('facebook/detr-resnet-101-dc5')
    model = AutoModelForObjectDetection.from_pretrained('facebook/detr-resnet-101-dc5')
    return feature_extractor, model

def proce

## Let's try to invoke APIs from Torch Hub instead for the same prompts!

In [None]:
# Translation ✍ with Torch Hub
prompt = "I would like to translate from English to Chinese."
print(get_gorilla_response(prompt, model="gorilla-7b-th-v0"))

{'domain': 'Machine Translation', 'api_call': "model = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')", 'api_provider': 'PyTorch', 'explanation': 'Load the Transformer model from PyTorch Hub, which is specifically trained on the WMT 2014 English-French translation task.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')'}



## ⛳️ With Gorilla fine-tuned on MPT and Falcon, you can use Gorilla commercially with no obligations! 🟢

In [None]:
# Gorilla with `gorilla-mpt-7b-hf-v0`
prompt = "I would like to translate from English to Chinese."
print(get_gorilla_response(prompt, model="gorilla-mpt-7b-hf-v0"))

Please provide the English text you would like to translate: "Translate this text to Chinese:"
<<<domain>>>: Natural Language Processing Text2Text Generation
<<<api_call>>>: M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_1.2B')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the necessary libraries - M2M100ForConditionalGeneration and M2M100Tokenizer from the transformers library.
2. Load the pretrained model 'facebook/m2m100_1.2B' and its corresponding tokenizer.
3. Set the source language to English (en) and use the tokenizer to tokenize the input text.
4. Use the model to generate the translated text in Chinese by providing the tokenized input to the 'generate' function.
5. Decode the generated tokens back into a readable text string using the tokenizer.
<<<code>>>: from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
src_text = "Translate this text to Chinese:"
src_lang = "en"
model = M2M100ForConditi

## We will deprecate the `gorilla-7b-hf-v0` model on July 4, when we will automatically upgrade all v0 model requests to v1. The only change between v0 and v1 is better code snippets.
Below are example prompts and responses for the legacy `gorilla-7b-hf-v0` model for 🤗

In [None]:
prompt = "I would like to translate from English to Chinese."
print(get_gorilla_response(prompt, model="gorilla-7b-hf-v0"))

<<<domain>>>: Natural Language Processing Text2Text Generation
<<<api_call>>>: M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_1.2B')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the necessary libraries, which are M2M100ForConditionalGeneration and M2M100Tokenizer from the transformers package.
2. Load the pre-trained model 'facebook/m2m100_1.2B' using the M2M100ForConditionalGeneration.from_pretrained() method. This model is designed for machine-to-machine translation tasks.
3. Load the tokenizer using the M2M100Tokenizer.from_pretrained() method. This tokenizer is used to prepare the input text for the model and convert the translated output back into human-readable text.
4. Define the source text for translation and tokenize it using the tokenizer.
5. Use the model to generate the translated text using the source tokens as input.
6. Decode the translated text using the tokenizer and print the result.


In [None]:
prompt = "I want to build a robot that can detect objects in an image."
print(get_gorilla_response(prompt, model="gorilla-7b-hf-v0"))

<<<domain>>>: Computer Vision Object Detection
<<<api_call>>>: YolosForObjectDetection.from_pretrained('hustvl/yolos-tiny')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. We first import the necessary classes from the transformers, PIL, and requests packages. This includes YolosForObjectDetection for the object detection model and Image for processing image data.
2. We then use the from_pretrained method of the YolosForObjectDetection class to load the pre-trained model 'hustvl/yolos-tiny'. This model has been trained for object detection tasks, which is exactly what we need for detecting objects in an image.
3. We load the image data from a file or a URL, and then use the model to analyze the image and identify the objects within it.



# 🚀 Using Gorilla is as easy as calling `get_gorilla_response()` with your prompt! Try out Gorilla, and share your interesting findings in `#showcase` 🤩 [Discord](https://discord.gg/3apqwwME)!

In [4]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

def load_model():
    model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
    tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M')
    return model, tokenizer

def process_data(english_text, language_code_to_target, model, tokenizer, language_code_to_source='en'):
    # Source language is an explicit argument (default 'en') rather than a global variable
    tokenizer.src_lang = language_code_to_source
    tokenizer.tgt_lang = language_code_to_target
    encoded_input = tokenizer(english_text, return_tensors='pt')
    # Force the first generated token to be the target-language id
    generated_tokens = model.generate(**encoded_input, forced_bos_token_id=tokenizer.get_lang_id(language_code_to_target))
    generated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
    return generated_text

english_text = 'I feel very good today.'
language_code_to_source = 'en'
language_code_to_target = 'hi'

In [5]:
process_data(english_text, language_code_to_target, *load_model())

'आज मैं बहुत अच्छा महसूस करता हूं।'

In [6]:
process_data('I will go to Barrie today', language_code_to_target, *load_model())

'आज मैं बार्सिलोना जाऊंगा।'

In [7]:
process_data('I will go to Niagra today and return via Hamilton tomorrow, stopping by my office in the afternoon to see my boss.', language_code_to_target, *load_model())

'मैं आज नियाग्रा जाऊंगा और कल हमिल्टन के माध्यम से वापस आऊंगा, दोपहर में मेरे कार्यालय में रुक जाऊंगा और मेरे बॉस को देखूंगा।'

In [8]:
prompt = "I want to extract the last layer features from the R3D model fror"
print(get_gorilla_response(prompt, model="gorilla-7b-hf-v1"))

<<<domain>>>: Multimodal Feature Extraction
<<<api_call>>>: r3d_model = AutoModel.from_pretrained('hf-tiny-model-private/tiny-random-R3D')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the necessary components from the Hugging Face Transformers library.
2. Load the pretrained 'hf-tiny-model-private/tiny-random-R3D' model using the `AutoModel.from_pretrained` method.
3. Process the input images and extract their last layer features using the loaded model.
4. The extracted features can be used for further analysis or other downstream tasks.
<<<code>>>:

from transformers import AutoModel, AutoTokenizer
import torch

def load_model(model_name):
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

def process_data(images, model, tokenizer):
    inputs = tokenizer(images, return_tensors='pt', padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    features =

In [5]:
from transformers import ViTImageProcessor, ViTModel
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

url_c = 'https://farm4.staticflickr.com/3545/3409800178_24c6f790e6_z.jpg'
image_c = Image.open(requests.get(url_c, stream=True).raw)

url_d = 'https://farm6.staticflickr.com/5332/9374828651_07f9433075_z.jpg'
image_d = Image.open(requests.get(url_d, stream=True).raw)

url_d1 = 'https://farm3.staticflickr.com/2556/4228514131_81f3416db3_z.jpg'
image_d1 = Image.open(requests.get(url_d1, stream=True).raw)

# Load the ViT image processor and backbone (ImageNet-21k pretrained, 16x16 patches, 224x224 input)
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k')

# Preprocess all four images as one batch
inputs = processor(images=[image, image_c, image_d, image_d1], return_tensors="pt")

outputs = model(**inputs)
# One hidden state per token: shape (batch, 197, 768)
last_hidden_states = outputs.last_hidden_state


In [6]:
last_hidden_states.shape

torch.Size([4, 197, 768])

In [26]:
last_hidden_states[0].shape

torch.Size([197, 768])
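Each image yields 197 token vectors of width 768: ViT-Base/16 splits a 224×224 image into (224/16)² = 196 patches and prepends one [CLS] token. Instead of flattening all 197 tokens (as the cells below do), a common alternative is to keep one 768-dimensional vector per image, either the [CLS] token or the mean of the patch tokens. The sketch below shows that swapped-in technique, not what this notebook does; it uses random data as a stand-in for `last_hidden_states.detach().numpy()`, and the helper names are hypothetical.

```python
import numpy as np

def pooled_embeddings(hidden_states, mode="cls"):
    """Reduce ViT hidden states of shape (N, 197, 768) to one vector per image.

    mode="cls"  -> token 0 (the [CLS] token), shape (N, 768)
    mode="mean" -> mean over the 196 patch tokens, shape (N, 768)
    """
    hidden_states = np.asarray(hidden_states)
    if mode == "cls":
        return hidden_states[:, 0, :]
    if mode == "mean":
        return hidden_states[:, 1:, :].mean(axis=1)
    raise ValueError(f"unknown mode: {mode!r}")

def pairwise_cosine(x):
    """Pairwise cosine similarity between the rows of x, shape (N, N)."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T

# Random stand-in for the real hidden states: 4 images, 197 tokens, 768 dims
rng = np.random.default_rng(0)
fake_hidden = rng.standard_normal((4, 197, 768))
sims = pairwise_cosine(pooled_embeddings(fake_hidden, mode="mean"))
print(sims.round(3))
```

Pooling keeps the vector size fixed at 768, which sidesteps the 151296-to-1024 downsampling experiments further down.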

In [23]:
import numpy as np

def cosine_similarity_pairs(vectors):
  """
  Computes the cosine similarity between every pair of rows in a 2D array.

  Args:
    vectors: A NumPy array of shape (n, d) containing n vectors of dimension d.

  Returns:
    A NumPy array of shape (n, n) whose entry [i, j] is the cosine similarity of vectors i and j.
  """

  # Calculate the dot product of each pair of vectors.
  dot_products = np.dot(vectors, vectors.T)

  # Calculate the magnitude of each vector.
  magnitudes = np.linalg.norm(vectors, axis=1)

  # Avoid division by zero.
  magnitudes[magnitudes == 0] = 1

  # Calculate the cosine similarity.
  cosine_similarity = dot_products / (magnitudes[:, np.newaxis] * magnitudes)

  return cosine_similarity

In [24]:
vectors = np.array([[1, 0], [0, 1], [1, 1]])

cosine_similarity = cosine_similarity_pairs(vectors)

print(cosine_similarity)

[[1.         0.         0.70710678]
 [0.         1.         0.70710678]
 [0.70710678 0.70710678 1.        ]]


In [18]:
type(image)

In [10]:
from transformers import AutoModel, AutoTokenizer
import torch
import numpy as np

def load_model(model_name):
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

def process_data(images, model, tokenizer):
    inputs = tokenizer(images, return_tensors='pt', padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    features = outputs.last_hidden_state
    return features

# Define input data
images = [np.random.randn(3, 224, 224) for _ in range(10)]

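# Load the model and tokenizer. Note: this model id comes from Gorilla's response above and
# does not resolve on the Hugging Face Hub (see the OSError below). Image inputs would also
# need an image processor rather than a text tokenizer.
model, tokenizer = load_model('hf-tiny-model-private/tiny-random-R3D')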

# Process the data
features = process_data(images, model, tokenizer)

print(features)

OSError: hf-tiny-model-private/tiny-random-R3D is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

In [31]:
import numpy as np

# Example list of vectors, where each vector is of shape (197, 768)
vector_list = [np.random.rand(197, 768) for _ in range(3)]  # Replace with your actual data

# Flatten each vector of shape (197, 768) to (197*768,)
flattened_vectors = [v.flatten() for v in vector_list]

# Convert list of vectors to a numpy array
data = np.array(flattened_vectors)

# Normalize the vectors to unit length
norms = np.linalg.norm(data, axis=1, keepdims=True)
normalized_data = data / norms

# Compute the cosine similarity matrix
cosine_similarity_matrix = np.dot(normalized_data, normalized_data.T)

print(cosine_similarity_matrix)


[[1.         0.75157936 0.75077004]
 [0.75157936 1.         0.74982464]
 [0.75077004 0.74982464 1.        ]]
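The ~0.75 similarities above come from the random data, not from any structure: every entry of `np.random.rand` is positive. For i.i.d. Uniform(0, 1) entries, E[x] = 1/2 and E[x²] = 1/3, so for long vectors the cosine similarity concentrates around (1/4)/(1/3) = 0.75. The check below compares that expectation with a fresh random draw and shows that subtracting the mean (0.5) drives the similarity toward 0; it is useful as a baseline when reading the real ViT numbers.

```python
import numpy as np

# For i.i.d. Uniform(0, 1) entries: E[x] = 1/2 and E[x^2] = 1/3, so for large D
#   cos(x, y) ~ D * E[x] * E[y] / (D * E[x^2]) = (1/4) / (1/3) = 0.75
expected = (0.5 * 0.5) / (1 / 3)

rng = np.random.default_rng(0)
D = 197 * 768
x, y = rng.random(D), rng.random(D)
empirical = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Centering removes the shared positive offset, leaving near-orthogonal noise
xc, yc = x - 0.5, y - 0.5
centered = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(round(expected, 3), round(empirical, 3), round(centered, 3))
```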


In [7]:
last_hidden_states = last_hidden_states.detach()

In [8]:
import numpy as np

# Flatten each image's hidden states from (197, 768) to (197*768,)
flattened_vectors = [v.flatten() for v in last_hidden_states]

# Convert list of vectors to a numpy array
data = np.array(flattened_vectors)

# Normalize the vectors to unit length
norms = np.linalg.norm(data, axis=1, keepdims=True)
normalized_data = data / norms

# Compute the cosine similarity matrix
cosine_similarity_matrix = np.dot(normalized_data, normalized_data.T)

print(cosine_similarity_matrix)

[[0.99999994 0.2181006  0.03333519 0.02924326]
 [0.2181006  1.         0.02654453 0.11337216]
 [0.03333519 0.02654453 1.0000004  0.1966406 ]
 [0.02924326 0.11337216 0.1966406  0.9999998 ]]


In [34]:
197*768

151296

In [9]:
import torch
import torch.nn.functional as F

# last_hidden_states holds one (197, 768) embedding per image
# Flatten each embedding from (197, 768) to (197 * 768,)
flattened_embeddings = [v.flatten() for v in last_hidden_states]

# Stack embeddings into a single tensor of shape (N, 197 * 768)
embedding_tensor = torch.stack(flattened_embeddings)  # Shape (N, 151296)

# Calculate the cosine similarity matrix
cosine_similarity_matrix = F.cosine_similarity(embedding_tensor.unsqueeze(1), embedding_tensor.unsqueeze(0), dim=-1)

print(cosine_similarity_matrix)


tensor([[1.0000, 0.2181, 0.0333, 0.0292],
        [0.2181, 1.0000, 0.0265, 0.1134],
        [0.0333, 0.0265, 1.0000, 0.1966],
        [0.0292, 0.1134, 0.1966, 1.0000]])


In [38]:
embedding_tensor.shape

torch.Size([3, 151296])

In [39]:
embedding_tensor.unsqueeze(1).shape

torch.Size([3, 1, 151296])

In [10]:
embedding_tensor.unsqueeze(0).shape

torch.Size([1, 4, 151296])
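`F.cosine_similarity(embedding_tensor.unsqueeze(1), embedding_tensor.unsqueeze(0), dim=-1)` relies on broadcasting: (N, 1, D) against (1, N, D) expands to (N, N, D), and the reduction over the last dimension leaves an (N, N) matrix. Judging by the execution counts, the shape outputs above appear to come from different runs, which is why N shows up as 3 in two of them and 4 in the last. Below is a NumPy sketch of the same broadcast on random stand-in data, checked against the normalize-then-matmul form used earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 151296))  # stand-in for embedding_tensor, shape (N, D)

# (N, 1, D) * (1, N, D) broadcasts to (N, N, D); summing over D gives all pairwise dot products
dots = (X[:, None, :] * X[None, :, :]).sum(axis=-1)
norms = np.linalg.norm(X, axis=-1)
cos_broadcast = dots / (norms[:, None] * norms[None, :])

# Same result without materializing the (N, N, D) intermediate
Xn = X / norms[:, None]
cos_matmul = Xn @ Xn.T
print(np.allclose(cos_broadcast, cos_matmul))  # True
```

For larger N, the matmul form is the cheaper choice because it never builds the (N, N, D) array.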

In [24]:
def avg_pool(original_array, target_length=1024):
  import numpy as np
  import torch

  # Accept either a torch tensor or a NumPy array
  if isinstance(original_array, torch.Tensor):
    original_array = original_array.numpy()

  # Length of the flattened embedding: 151296 (= 197 * 768) for the ViT outputs above
  n = original_array.shape[0]

  # Segment length, rounded down: 151296 // 1024 = 147
  segment_length = n // target_length

  # Average each consecutive segment; the last segment may be shorter
  downsampled_array = np.array([np.mean(original_array[i:i + segment_length], dtype=np.float64) for i in range(0, n, segment_length)])

  # Shape (1030,) for n = 151296: 1029 full 147-element segments plus one 33-element segment
  return downsampled_array


In [33]:
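def pca(original_array):
  import numpy as np
  import torch
  from sklearn.decomposition import PCA

  if isinstance(original_array, torch.Tensor):
    original_array = original_array.numpy()

  # Reshape to a 2D array as PCA works on 2D arrays: one sample, 151296 features
  reshaped_array = original_array.reshape(1, -1)  # Shape (1, 151296)

  # Caution: PCA is fitted here on a single sample. Centering subtracts that sample from
  # itself, so the projection is all zeros, and the explained variance divides by
  # n_samples - 1 = 0 (hence the zero similarities and the warnings below). PCA needs
  # several samples and can return at most min(n_samples, n_features) components.
  pca = PCA(n_components=1024)
  downsampled_array = pca.fit_transform(reshaped_array)

  # Flatten to a 1D array
  downsampled_array = downsampled_array.flatten()

  return downsampled_array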

In [28]:
flattened_embeddings_avg = [avg_pool(v) for v in flattened_embeddings]
# Convert numpy arrays to tensors
flattened_embeddings_avg = [torch.from_numpy(v) for v in flattened_embeddings_avg]

# Stack the pooled embeddings into a single tensor of shape (N, 1030)
embedding_tensor = torch.stack(flattened_embeddings_avg)  # Shape (N, 1030)

# Calculate the cosine similarity matrix
cosine_similarity_matrix = F.cosine_similarity(embedding_tensor.unsqueeze(1), embedding_tensor.unsqueeze(0), dim=-1)

print(cosine_similarity_matrix)

tensor([[1.0000, 0.7671, 0.7789, 0.6616],
        [0.7671, 1.0000, 0.6831, 0.5856],
        [0.7789, 0.6831, 1.0000, 0.7509],
        [0.6616, 0.5856, 0.7509, 1.0000]], dtype=torch.float64)
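A quick check of the segment arithmetic in `avg_pool`: 151296 / 1024 = 147.75 is rounded down to 147, and stepping through 151296 elements in strides of 147 yields 1029 full windows plus a final 33-element window, so 1030 values rather than 1024. If exactly 1024 outputs are wanted, one option (an alternative, not what the notebook uses) is `np.array_split`, which splits into pieces whose sizes differ by at most one; `avg_pool_exact` is a hypothetical name.

```python
import numpy as np

n, target = 151296, 1024
segment_length = n // target          # 147 (151296 / 1024 = 147.75, rounded down)
n_segments = -(-n // segment_length)  # ceiling division -> 1030
print(segment_length, n_segments)     # 147 1030

def avg_pool_exact(x, target_length=1024):
    """Average-pool a 1D array into exactly target_length nearly equal segments."""
    x = np.asarray(x, dtype=np.float64)
    # array_split allows unequal pieces: the first n % target_length pieces get one extra element
    return np.array([chunk.mean() for chunk in np.array_split(x, target_length)])

pooled = avg_pool_exact(np.arange(n))
print(pooled.shape)                   # (1024,)
```

Here 151296 = 1024 × 147 + 768, so the first 768 pieces hold 148 elements and the remaining 256 hold 147.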


In [32]:
flattened_embeddings_avg = [pca(v) for v in flattened_embeddings]
# Convert numpy arrays to tensors
flattened_embeddings_avg = [torch.from_numpy(v) for v in flattened_embeddings_avg]

# Stack the PCA outputs into a single tensor, one row per image
embedding_tensor = torch.stack(flattened_embeddings_avg)

# Calculate the cosine similarity matrix
cosine_similarity_matrix = F.cosine_similarity(embedding_tensor.unsqueeze(1), embedding_tensor.unsqueeze(0), dim=-1)

print(cosine_similarity_matrix)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


  explained_variance_ = (S**2) / (n_samples - 1)
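The all-zero matrix and the `(n_samples - 1)` warnings follow from fitting PCA on one sample at a time: centering subtracts the sample from itself, and the variance estimate divides by zero. PCA has to be fitted across the embeddings jointly, and with N images it can yield at most N - 1 informative components after centering (3 here), so it cannot produce 1024-dimensional codes from four images. Below is a minimal sketch of a joint fit using NumPy's SVD instead of scikit-learn, on random stand-in data; `pca_joint` is a hypothetical helper.

```python
import numpy as np

def pca_joint(X, n_components):
    """Project the rows of X (N, D) onto their top principal components, giving (N, k).

    PCA is fitted across all N embeddings together; with N samples there are
    at most N - 1 informative components after mean-centering.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0, keepdims=True)  # center each feature across samples
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 151296))  # stand-in for the 4 flattened ViT embeddings
Z = pca_joint(X, n_components=3)
print(Z.shape)  # (4, 3)
```

Cosine similarities between these codes are computed on mean-centered data, so they are not directly comparable with the uncentered similarities above.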


In [42]:
def conv1d(original_array):
  import torch
  import torch.nn as nn

  # Reshape the flattened embedding to (batch=1, channels=1, length=151296) for Conv1d
  original_array = original_array.reshape(1, 1, -1)

  # Kernel size and stride chosen so the output has roughly target_length elements
  target_length = 1024
  kernel_size = 151296 // target_length  # 147
  stride = kernel_size  # Same as kernel size for non-overlapping downsampling

  # Caution: a new, randomly initialized (untrained) Conv1d is created on every call,
  # so each embedding is projected with different random weights.
  conv1d = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size, stride=stride)

  # Apply the convolutional layer
  downsampled_tensor = conv1d(original_array)

  # Flatten to a 1D tensor
  downsampled_tensor = downsampled_tensor.view(-1)

  # torch.Size([1029]) for length 151296: floor((151296 - 147) / 147) + 1 = 1029
  return downsampled_tensor


In [44]:
flattened_embeddings_avg = [conv1d(v) for v in flattened_embeddings]
# conv1d already returns tensors, so no NumPy conversion is needed

# Stack the downsampled embeddings into a single tensor of shape (N, 1029)
embedding_tensor = torch.stack(flattened_embeddings_avg)

# Calculate the cosine similarity matrix
cosine_similarity_matrix = F.cosine_similarity(embedding_tensor.unsqueeze(1), embedding_tensor.unsqueeze(0), dim=-1)

print(cosine_similarity_matrix)

tensor([[ 1.0000,  0.0465, -0.1088,  0.0234],
        [ 0.0465,  1.0000, -0.1353,  0.0883],
        [-0.1088, -0.1353,  1.0000,  0.0554],
        [ 0.0234,  0.0883,  0.0554,  1.0000]], grad_fn=<SumBackward1>)
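Because `conv1d` builds a new `nn.Conv1d` on every call, each embedding passes through a different, randomly initialized and untrained filter, so the near-zero similarities above say little about the images. An untrained strided convolution is just a random linear projection, and applying one shared projection to every embedding keeps them comparable. The NumPy sketch below assumes one shared random kernel drawn once (`strided_conv1d` is a hypothetical helper). It also shows that with uniform weights 1/k, the same operation reduces to average pooling over full windows. As with `nn.Conv1d(kernel_size=147, stride=147)`, the output has 1029 values and the trailing 33 elements are dropped.

```python
import numpy as np

def strided_conv1d(x, weight, bias=0.0):
    """Single-channel conv1d with stride == kernel_size, written as a matrix product.

    Matches the output length of nn.Conv1d(1, 1, kernel_size=k, stride=k) with no padding:
    the trailing n % k elements are dropped.
    """
    x = np.asarray(x, dtype=np.float64)
    k = weight.shape[0]
    n_windows = x.shape[0] // k  # floor((n - k) / k) + 1 == n // k
    windows = x[: n_windows * k].reshape(n_windows, k)
    return windows @ weight + bias

n = 197 * 768
k = n // 1024                                # 147
rng = np.random.default_rng(0)
shared_weight = rng.standard_normal(k)       # drawn ONCE and reused for every embedding

embeddings = rng.standard_normal((4, n))     # stand-in for flattened_embeddings
projected = np.stack([strided_conv1d(e, shared_weight) for e in embeddings])
print(projected.shape)                       # (4, 1029)

# With uniform weights 1/k and no bias, the operation is average pooling over full windows
mean_weight = np.full(k, 1.0 / k)
print(np.allclose(strided_conv1d(embeddings[0], mean_weight),
                  embeddings[0][: 1029 * k].reshape(1029, k).mean(axis=1)))  # True
```

In the notebook itself, the equivalent change would be to create the `nn.Conv1d` once outside `conv1d` (for example after `torch.manual_seed(...)`) and reuse it for every embedding.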
