## Exercise 1: Write Translation Functions
- Create two translation functions:  
  1. **`bert2gpt4()`**: Converts BERT tokens into GPT-4 tokens.  
  2. **`gpt42bert()`**: Converts GPT-4 tokens into BERT tokens.  
- These functions will serve as bridges between the two tokenization systems.  

In [1]:
# Import the libraries

import tiktoken  # for the GPT-4 tokenizer
from transformers import BertTokenizer  # for the BERT tokenizer

In [2]:
gbt4_tokenizer = tiktoken.get_encoding("cl100k_base")
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

In [3]:
testText = "Roses are red, herbs are green, why Acar's eyes are not purple?"

In [4]:
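# Define the BERT-to-GPT-4 translation function

def bert2gbt4(bertToks):
    # Decode the BERT token IDs to text; decode() keeps special tokens
    # such as [CLS] and [SEP] if they are present in the input
    text = bert_tokenizer.decode(bertToks)
    # Re-encode that text with the GPT-4 tokenizer
    return gbt4_tokenizer.encode(text)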

BERTtoks=bert_tokenizer.encode(testText)
gbtToks=bert2gbt4(BERTtoks)
print(gbtToks)
print(gbt4_tokenizer.decode(gbtToks))


[58, 88816, 60, 61741, 527, 2579, 11, 42393, 527, 6307, 11, 3249, 1645, 277, 596, 6548, 527, 539, 25977, 30, 510, 82476, 60]
[CLS] roses are red, herbs are green, why acar's eyes are not purple? [SEP]


In [5]:
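# Define the GPT-4-to-BERT translation function

def gbt42bert(gbt4Toks):
    # Decode the GPT-4 token IDs to text
    text = gbt4_tokenizer.decode(gbt4Toks)
    # Re-encode with BERT, then drop the [CLS] and [SEP] tokens that encode() adds
    b = bert_tokenizer.encode(text)
    return b[1:-1]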

GBT4tokens = gbt4_tokenizer.encode(testText)
bertToks = gbt42bert(GBT4tokens)
print(bertToks)
print(bert_tokenizer.decode(bertToks))


[10529, 2024, 2417, 1010, 17561, 2024, 2665, 1010, 2339, 9353, 2906, 1005, 1055, 2159, 2024, 2025, 6379, 1029]
roses are red, herbs are green, why acar's eyes are not purple?
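
Both functions above follow the same pattern: decode the IDs to text with the source tokenizer, then re-encode that text with the target tokenizer. The sketch below isolates that pattern in a helper that takes the two callables as arguments. The name `translate_tokens` and both toy tokenizers (`ToyWordTokenizer`, `char_encode`/`char_decode`) are hypothetical stand-ins, not part of tiktoken or transformers; they are used only so the example runs without downloading any model.

```python
from typing import Callable, List


def translate_tokens(
    tokens: List[int],
    decode_src: Callable[[List[int]], str],
    encode_dst: Callable[[str], List[int]],
) -> List[int]:
    """Decode with the source tokenizer, then re-encode with the target one."""
    return encode_dst(decode_src(tokens))


# Hypothetical toy tokenizer 1: one ID per character (Unicode code point).
def char_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def char_decode(tokens: List[int]) -> str:
    return "".join(chr(t) for t in tokens)


# Hypothetical toy tokenizer 2: one ID per whitespace-separated word,
# with the vocabulary built on the fly.
class ToyWordTokenizer:
    def __init__(self) -> None:
        self.vocab = {}    # word -> ID
        self.inverse = []  # ID -> word

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab[word] = len(self.inverse)
                self.inverse.append(word)
            ids.append(self.vocab[word])
        return ids

    def decode(self, tokens: List[int]) -> str:
        return " ".join(self.inverse[t] for t in tokens)


word_tok = ToyWordTokenizer()
word_ids = word_tok.encode("roses are red")

# Word IDs -> character IDs, then back again.
char_ids = translate_tokens(word_ids, word_tok.decode, char_encode)
back = translate_tokens(char_ids, char_decode, word_tok.encode)

print(char_decode(char_ids))
print(back == word_ids)
```

With the notebook's own objects, the same helper could presumably be called as `translate_tokens(BERTtoks, bert_tokenizer.decode, gbt4_tokenizer.encode)`. Note that it does not strip `[CLS]` and `[SEP]` the way `gbt42bert()` does.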


## Exercise 2: BERT ➡ GPT-4 ➡ BERT 
- Convert BERT tokens into GPT-4 tokens using the `bert2gpt4()` function.  
- Then, translate back to BERT tokens using `gpt42bert()`.  
- Validate the round-trip conversion to ensure accuracy.

In [6]:
text01 = "World is so violent, I shouldn't be so humble to anyone."

print(f"Original text:\n\t{text01}")

bertToks01 = bert_tokenizer.encode(text01)[1:-1]
print(f"\nBERT tokens:\n\t{bertToks01}")

gbtToks01 = bert2gbt4(bertToks01)
print(f"\nBERT to GPT-4:\n\t{ gbt4_tokenizer.decode(gbtToks01) }")

back_to_bertToks = gbt42bert(gbtToks01)
print(f"\nBack to BERT:\n\t{ bert_tokenizer.decode(back_to_bertToks) }")

Original text:
	World is so violent, I shouldn't be so humble to anyone.

BERT tokens:
	[2088, 2003, 2061, 6355, 1010, 1045, 5807, 1005, 1056, 2022, 2061, 15716, 2000, 3087, 1012]

BERT to GPT-4:
	world is so violent, i shouldn't be so humble to anyone.

Back to BERT:
	world is so violent, i shouldn't be so humble to anyone.
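
The cell above validates the round trip by eye. A programmatic check is sketched below, with one caveat: `bert-base-uncased` lowercases its input, so the round-tripped text cannot match the original exactly and the comparison has to be made after normalization. The helper name `round_trip_report` is hypothetical, and lowercasing as the normalization step is an assumption that fits this example; other inputs (accents, unusual whitespace) may need more. The two strings are copied from the output above.

```python
from typing import Callable, Dict


def round_trip_report(
    original_text: str,
    round_trip_text: str,
    normalize: Callable[[str], str] = str.lower,  # assumption: lowercasing is enough here
) -> Dict[str, bool]:
    """Compare original and round-tripped text, exactly and after normalization."""
    return {
        "exact_match": original_text == round_trip_text,
        "normalized_match": normalize(original_text) == normalize(round_trip_text),
    }


original = "World is so violent, I shouldn't be so humble to anyone."
round_trip = "world is so violent, i shouldn't be so humble to anyone."

report = round_trip_report(original, round_trip)
print(report)
# {'exact_match': False, 'normalized_match': True}
```

In the notebook itself, a token-level check such as `back_to_bertToks == bertToks01` would be a stricter test, since both lists come from the same tokenizer.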


## Exercise 3: GPT-4 ➡ BERT ➡ GPT-4
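- Start with GPT-4 tokens.
- Translate them into BERT tokens using `gpt42bert()`.
- Then, translate back to GPT-4 tokens using `bert2gpt4()`.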

In [7]:
text02 = "Can a human live 300 years?"

print(f"Original text:\n\t{text02}")

gbtToks02 = gbt4_tokenizer.encode(text02)
print(f"\nGPT-4 tokens:\n\t{gbtToks02}")

bertToks02 = gbt42bert(gbtToks02)
print(f"\nGPT-4 to BERT:\n\t{bert_tokenizer.decode(bertToks02)}")

back_to_gbtToks = bert2gbt4(bertToks02)
print(f"\nBack to GPT-4:\n\t{gbt4_tokenizer.decode(back_to_gbtToks)}")


Original text:
	Can a human live 300 years?

GPT-4 tokens:
	[6854, 264, 3823, 3974, 220, 3101, 1667, 30]

GPT-4 to BERT:
	can a human live 300 years?

Back to GPT-4:
	can a human live 300 years?
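
The output above shows that this direction is lossy: "Can" comes back as "can", because the text passes through the uncased BERT tokenizer on the way. One way to locate such changes is to find the first position where the original and round-tripped sequences differ. In the sketch below, `first_divergence` is a hypothetical helper, and whitespace-split words stand in for token IDs, since the notebook does not print the round-tripped GPT-4 IDs.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(a: Sequence, b: Sequence) -> Optional[int]:
    """Return the first index where a and b differ, or None if they are equal.

    A length mismatch counts as a difference at the end of the shorter sequence.
    Assumes None never appears as an element, since it is used as the fill value.
    """
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


# Stand-in "tokens": whitespace-split words from the output above.
original_pieces = "Can a human live 300 years?".split()
round_trip_pieces = "can a human live 300 years?".split()

idx = first_divergence(original_pieces, round_trip_pieces)
print(idx, original_pieces[idx], round_trip_pieces[idx])
```

The same helper works on lists of integers, so in the notebook it could be applied to `gbtToks02` and `back_to_gbtToks` directly.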
