In [4]:
import torch
import numpy as np

In [31]:
from transformers import pipeline, TextStreamer

In [35]:
llm_pipeline = pipeline(
    task="text-generation",
    model="google/gemma-2-2b",
    torch_dtype=torch.float16,
    device="cuda",
    token=""
)

Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.01it/s]
Device set to use cuda
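Loading with `torch_dtype=torch.float16` stores each weight in 2 bytes instead of the 4 bytes `float32` uses. The sketch below estimates weight memory only. It assumes roughly 2.6 billion parameters for `google/gemma-2-2b`, which is an approximate figure and was not read from the checkpoint. Activations and the KV cache need memory on top of this.

```python
# Bytes per parameter for common torch dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory (decimal GB) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: ~2.6e9 parameters; check the model card for the exact count.
N_PARAMS = 2.6e9
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```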


In [34]:
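import gc

# Run the cycle collector first so unreachable tensors are actually freed,
# then release the unoccupied memory held by PyTorch's CUDA caching allocator.
n_collected = gc.collect()
torch.cuda.empty_cache()
n_collected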

13839
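The `13839` above is the value returned by `gc.collect()`, which is the number of unreachable objects the cycle collector found. Reference counting alone cannot free objects that sit in a reference cycle. A tensor kept alive by such a cycle therefore holds on to its GPU memory until the collector runs. The stdlib-only sketch below shows that effect. The `Node` class is hypothetical and stands in for an object that owns a large buffer.

```python
import gc


class Node:
    """Hypothetical stand-in for an object that owns a large buffer (e.g. a CUDA tensor)."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle: refcounts never reach zero
    # a and b become unreachable when this function returns, but are not freed yet


gc.collect()          # start from a clean slate
gc.disable()          # stop automatic collection from running during the demo
make_cycle()
found = gc.collect()  # the cycle collector finds (at least) the two Node objects
gc.enable()
print(found)
```

This is why the usual order is `gc.collect()` first and `torch.cuda.empty_cache()` second. The cache can only hand back blocks whose tensors have already been freed.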

In [5]:
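# Use a new name: assigning to `pipeline` would shadow the transformers factory function
fill_mask_pipeline = pipeline(
    task="fill-mask",
    model="google-bert/bert-base-uncased",
    torch_dtype=torch.float16,
    device="cuda",
)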

Some weights of the model checkpoint at google-bert/bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda
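At each `[MASK]` position, the fill-mask pipeline applies a softmax to the model's vocabulary logits and returns the top-k candidates. The pure-Python sketch below covers only that ranking step. It uses a tiny hypothetical vocabulary and hand-picked logits instead of real BERT outputs. The pipeline's actual results also carry extra fields, such as the completed `sequence`.

```python
import math


def fill_mask_top_k(logits, vocab, k=3):
    """Rank candidate tokens for one masked position by softmax probability."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
    return [{"token_str": vocab[i], "score": probs[i]} for i in ranked[:k]]


# Hypothetical vocabulary and logits for "The capital of France is [MASK]."
vocab = ["paris", "london", "berlin", "banana"]
logits = [6.0, 3.0, 2.5, -1.0]
for cand in fill_mask_top_k(logits, vocab, k=3):
    print(f"{cand['token_str']:>8}  {cand['score']:.3f}")
```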


In [37]:
text_streamer = TextStreamer(llm_pipeline.tokenizer, skip_prompt=True)
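`skip_prompt=True` matters because `generate()` first passes the prompt ids to the streamer, then passes each new token as it is produced. The hypothetical `CollectingStreamer` below imitates that `put()`/`end()` call pattern. It uses plain lists instead of tensors, and it collects token ids instead of decoding and printing them. It sketches the idea only and is not the transformers implementation.

```python
class CollectingStreamer:
    """Hypothetical streamer following the put()/end() call pattern used by generate()."""

    def __init__(self, skip_prompt: bool = False):
        self.skip_prompt = skip_prompt
        self.next_tokens_are_prompt = True  # the first put() carries the prompt
        self.token_ids: list[int] = []
        self.finished = False

    def put(self, value: list[int]) -> None:
        if self.skip_prompt and self.next_tokens_are_prompt:
            self.next_tokens_are_prompt = False
            return  # drop the prompt ids
        self.token_ids.extend(value)

    def end(self) -> None:
        self.finished = True


# Simulated generate() loop: prompt ids first, then one new token per step.
streamer = CollectingStreamer(skip_prompt=True)
streamer.put([2, 5559, 682])  # hypothetical prompt ids
for new_id in (573, 1160, 1):
    streamer.put([new_id])
streamer.end()
print(streamer.token_ids)
```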

In [52]:
llm_pipeline("Write me a poem about Embeddings.",
             max_length=100,
             streamer=text_streamer
             )[0]['generated_text']

Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)




[Answer 1]

I'm a little late to the party (but I'm not sure if there's a limit for answers here)

Embeddings are the way we can use one model to make multiple models, in this case a classification model (<code>kNN</code>) can be used to make a <code>tree</code> model.

Embeddings can be thought of as the vectors that hold the information that we want to represent in the tree model.

For example, my embeddings would be the words that describe the tree.

And if I'm learning about tree rings, the embeddings of the tree would be the width of the rings.

The embeddings give the data another point of view that is completely different from the data, so we can use the embeddings to make a new model, that has a completely new point of view.

<em>P.S. I made the poem using R to generate the embeddings and then use them to make the tree</em>

<code>## Embedding Data
dat <- setNames(
  c(
    rep("a",20),
    rep("b",20),
    rep("c",20),
    rep("d",20),
    rep("e


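The length warning appears because the two limits count different things. `max_length` caps prompt tokens plus generated tokens, whereas `max_new_tokens` caps only the generated tokens, and it takes precedence when both are set. The call above did not pass `max_new_tokens`, so the value 256 must be a default from the pipeline or the model's generation config. The sketch below models that precedence rule. `generation_budget` is a hypothetical helper, not a transformers function.

```python
def generation_budget(prompt_len, max_length=None, max_new_tokens=None):
    """Number of tokens generate() may add, following the precedence in the warning."""
    if max_new_tokens is not None:
        return max_new_tokens  # takes precedence over max_length
    if max_length is not None:
        return max(max_length - prompt_len, 0)  # max_length includes the prompt
    raise ValueError("set max_length or max_new_tokens")


# Hypothetical 10-token prompt.
print(generation_budget(10, max_length=100))                      # 90
print(generation_budget(10, max_length=100, max_new_tokens=256))  # 256
```

If you want to cap the poem at 100 new tokens, pass `max_new_tokens=100` rather than `max_length=100`.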

In [1]:
# device_map="auto" requires the accelerate package: pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token="")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", revision="float16", token="")

Fetching 2 files: 100%|██████████| 2/2 [00:23<00:00, 11.69s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.40it/s]


In [None]:
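input_text = "Write me a poem about Machine Learning."
# The tokenizer returns a BatchEncoding (input_ids + attention_mask); move it to the model's device
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Cap on newly generated tokens (value chosen for illustration)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))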
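`generate()` extends the prompt one token at a time. Whether it takes the most likely token or samples one depends on the model's generation config (`do_sample` and related settings). The loop below therefore sketches only the greedy case. The hypothetical `toy_logits` function replaces the model's forward pass.

```python
def toy_logits(ids, vocab_size=5):
    """Hypothetical stand-in for a model: always favours token (last_id + 1) % vocab_size."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]


def greedy_generate(prompt_ids, next_token_logits, max_new_tokens, eos_id):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop early once end-of-sequence is produced
            break
    return ids


print(greedy_generate([0], toy_logits, max_new_tokens=10, eos_id=4))  # [0, 1, 2, 3, 4]
print(greedy_generate([0], toy_logits, max_new_tokens=2, eos_id=4))   # [0, 1, 2]
```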

In [3]:
from transformers import AutoTokenizer

model_name = "google-bert/bert-base-uncased"
bert_tokenizer = AutoTokenizer.from_pretrained(model_name)

In [14]:
bert_tokenizer("My", return_tensors="pt")

{'input_ids': tensor([[ 101, 2026,  102]]), 'token_type_ids': tensor([[0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1]])}
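The three ids above are `[CLS]` (101), the WordPiece id for `my` (2026; the checkpoint is uncased, so "My" is lower-cased), and `[SEP]` (102). For a single sentence, the tokenizer wraps the WordPiece ids in those special tokens. It sets every `token_type_ids` entry to segment 0 and, since there is no padding, every `attention_mask` entry to 1. The sketch below rebuilds that output with plain lists. `encode_single` is a hypothetical helper.

```python
CLS_ID, SEP_ID = 101, 102  # special-token ids seen in the bert-base-uncased output above


def encode_single(wordpiece_ids):
    """Build BERT-style inputs for one unpadded sentence."""
    input_ids = [CLS_ID, *wordpiece_ids, SEP_ID]
    return {
        "input_ids": input_ids,
        "token_type_ids": [0] * len(input_ids),  # single segment -> all zeros
        "attention_mask": [1] * len(input_ids),  # no padding -> attend to every token
    }


print(encode_single([2026]))
```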

In [9]:
a = np.array([1.0, 2.0])
b = np.array([[1.0, 0.0], [1.0, 0.0]])
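`a` has shape `(2,)` and `b` has shape `(2, 2)`, so matrix multiplication and element-wise multiplication produce quite different results. `a @ b` treats `a` as a row vector, and `b @ a` treats it as a column vector. `a * b` broadcasts `a` across each row of `b`. The result names below (`row_times_b` and the others) are for illustration only.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([[1.0, 0.0], [1.0, 0.0]])

row_times_b = a @ b  # [1*1 + 2*1, 1*0 + 2*0] = [3, 0]
b_times_col = b @ a  # [1*1 + 0*2, 1*1 + 0*2] = [1, 1]
broadcast = a * b    # a is repeated for each row of b: [[1*1, 2*0], [1*1, 2*0]]

print(row_times_b)
print(b_times_col)
print(broadcast)
```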

In [None]:
# Same arrays written as plain nested lists:
# a = [1, 2]          shape (2,)
# b = [[1, 0],
#      [1, 0]]        shape (2, 2)

In [None]:
model_checkpoint = "distilbert-base-uncased"
# Separate name so the Gemma `tokenizer` defined above is not overwritten
distilbert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [38]:
import threading

In [43]:
# Define a train function to be used in different threads
def train_fn():
    x = torch.ones(5, 5, requires_grad=True)
    # forward
    y = (x + 3) * (x + 4) * 0.5
    # backward: populates x.grad
    y.sum().backward()
    # potential optimizer update
    print("DONE")


# Users write their own threading code to drive train_fn
threads = []
for _ in range(10):
    p = threading.Thread(target=train_fn)
    p.start()
    threads.append(p)

for p in threads:
    p.join()

DONE
DONE
DONE
DONE
DONE
DONE
DONE
DONE
DONE
DONE
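Each thread above builds `x = torch.ones(5, 5)` and differentiates `y = 0.5 * (x + 3) * (x + 4)`. The derivative is `0.5 * (2x + 7) = x + 3.5`, so after `backward()` every entry of `x.grad` should be 4.5. To give each thread its own input, note that `threading.Thread` expects `args` to be a tuple. `args=(i)` is just the integer `i`, while `args=(i,)` is a one-element tuple. The stdlib-only sketch below uses a hypothetical `grad_at` worker. Each worker checks the derivative with central finite differences and writes the result into its own slot of a shared list.

```python
import threading


def f(x):
    # Same scalar function as in train_fn: y = 0.5 * (x + 3) * (x + 4)
    return 0.5 * (x + 3) * (x + 4)


def grad_at(i, x, results, h=1e-5):
    """Hypothetical worker: central-difference estimate of f'(x), stored in results[i]."""
    results[i] = (f(x + h) - f(x - h)) / (2 * h)


results = [None] * 10
threads = []
for i in range(10):
    # args must be a tuple: (i, x, results) unpacks into grad_at's positional parameters
    t = threading.Thread(target=grad_at, args=(i, float(i), results))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

print([round(g, 4) for g in results])  # analytic derivative is x + 3.5
```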


In [None]:
text = "My name is omer"
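With an uncased BERT checkpoint, `text` is first lower-cased and split on whitespace and punctuation. Each word is then broken into WordPiece sub-words by greedy longest-match-first lookup, and continuation pieces get a `##` prefix. The sketch below uses a tiny hypothetical vocabulary. It skips punctuation splitting and accent stripping, and the real bert-base-uncased vocabulary may split `omer` differently.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1  # shrink the candidate from the right
        if piece is None:
            return [unk]  # no prefix matched: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    # Uncased checkpoints lower-case before splitting into words.
    return [p for word in text.lower().split() for p in wordpiece(word, vocab)]


vocab = {"my", "name", "is", "om", "##er"}  # hypothetical toy vocabulary
print(tokenize("My name is omer", vocab))
```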
