# **Encoder-Decoder for Everyone**
#### By Jashank Kshirsagar
#### **Connect with me on LinkedIn**: [linkedin.com/in/jashank-kshirsagar](https://www.linkedin.com/in/jashank-kshirsagar/)

### **Recommended Prerequisites:**  
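To really understand what the following script does, you should first understand what an Encoder and a Decoder are, as well as the architecture of a Transformer model. Keep in mind that the script itself demonstrates the tokenizer's encode and decode steps (text to token IDs and back), which are separate from the encoder and decoder blocks inside a Transformer. Come back to this script once you have read the following articles:  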
1. https://medium.com/@amanatulla1606/transformer-architecture-explained-2c49e2257b4c  
2. https://kikaben.com/transformers-encoder-decoder/ 

### **Required Library Installs in Terminal:**    
pip install transformers
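
To confirm that the install worked before running the steps below, a quick check like the following can help. This is a minimal sketch that uses only Python's standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Prints True if 'pip install transformers' succeeded in this environment.
print("transformers installed:", is_installed("transformers"))
```

If this prints `False`, run the install command again in the same environment that your notebook uses.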

**STEP 1: LOAD THE TOKENIZER**

In [None]:
from transformers import AutoTokenizer # 'AutoTokenizer' automatically loads the correct tokenizer for the chosen model. 
# A tokenizer is what converts human text into numbers (tokens) that the model can understand.

tokenizer_model_name = "microsoft/phi-2" # We’re selecting the 'microsoft/phi-2' tokenizer because it's small and runs well on most computers.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_model_name)  # This line actually downloads (if not already cached) and loads the tokenizer from Hugging Face’s model hub.
print("Tokenizer Loaded")  # Simple print statement so you know the tokenizer was loaded successfully.

Tokenizer Loaded


**STEP 2: DEFINE YOUR INPUT TEXT**

In [None]:
input_text = "Hello, how are you today?"  # You can change this to whatever text you'd like to encode.

**STEP 3: ENCODE THE ORIGINAL TEXT INTO NUMERIC VECTORS (TOKENS)**  
Each number is a token ID that stands for a word, subword, or punctuation mark in the tokenizer's vocabulary.

In [None]:
encoded_text = tokenizer.encode(input_text)   # Converts the input text into tokens (numbers) using the loaded tokenizer.
print(f"Original Input Text: {input_text}")    # Shows the text as you entered it.
print(f"Encoded Form of Input Text: {encoded_text}")  # Shows the tokenized numerical representation of the text.

Original Input Text: Hello, how are you today?
Encoded Form of Input Text: [15496, 11, 703, 389, 345, 1909, 30]
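
To make the idea of word, subword, and punctuation tokens concrete, here is a minimal sketch of a toy encoder. Everything in it is an assumption for illustration: the vocabulary `TOY_VOCAB`, its made-up IDs, and the helper `toy_encode` are hypothetical, and the greedy longest-match rule is a simplification rather than the algorithm the phi-2 tokenizer actually uses.

```python
import re

# Hypothetical toy vocabulary. These IDs are made up and are NOT phi-2's real IDs.
TOY_VOCAB = {
    "hello": 0, ",": 1, "how": 2, "are": 3, "you": 4,
    "to": 5, "day": 6, "?": 7, "<unk>": 8,
}


def toy_encode(text):
    """Split text into words and punctuation, then map each piece to IDs."""
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    ids = []
    for piece in pieces:
        start = 0
        while start < len(piece):
            # Try the longest remaining chunk first, then shrink it.
            for end in range(len(piece), start, -1):
                chunk = piece[start:end]
                if chunk in TOY_VOCAB:
                    ids.append(TOY_VOCAB[chunk])
                    start = end
                    break
            else:
                # No chunk matched: emit the unknown token and skip one character.
                ids.append(TOY_VOCAB["<unk>"])
                start += 1
    return ids


print(toy_encode("Hello, how are you today?"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Notice that `today` is not in the toy vocabulary, so it is split into the subwords `to` and `day`. Real tokenizers learn their vocabularies from large amounts of text, which is why the IDs in the output above look very different.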


**STEP 4: DECODE THE NUMERIC VECTORS BACK TO TEXT**

In [None]:
decoded_text = tokenizer.decode(encoded_text) # Converts tokens (numbers) back into human-readable text.
print(f"Encoded Form of Input Text: {encoded_text}")  # Displays the numeric token sequence.
print(f"Decoded Text Returned from Encoded form: {decoded_text}") # Shows the text reconstructed from tokens.

Encoded Form of Input Text: [15496, 11, 703, 389, 345, 1909, 30]
Decoded Text Returned from Encoded form: Hello, how are you today?
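
Decoding is essentially a reverse lookup: each ID is mapped back to its piece of text, and the pieces are joined together. The sketch below assumes a toy vocabulary in which a token carries its own leading space, an approach that GPT-2-style tokenizers commonly take. The vocabulary, the IDs, and the helper `toy_decode` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy vocabulary. Note that some tokens include a leading space.
TOY_VOCAB = {
    "Hello": 0, ",": 1, " how": 2, " are": 3,
    " you": 4, " today": 5, "?": 6,
}

# Invert the vocabulary so we can look up text by ID.
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def toy_decode(ids):
    """Map each ID back to its text piece and join the pieces in order."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)


print(toy_decode([0, 1, 2, 3, 4, 5, 6]))  # Hello, how are you today?
```

Because the spaces live inside the tokens, joining the pieces with nothing in between reproduces the original sentence exactly.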


### **Summary:**

🔑 In short, here's why Encoding & Decoding Matter:
 - **Encoding** turns text into numbers (tokens) the model can understand.
 - A **Model** works only on numbers, not raw text.
 - **Decoding** converts the model’s numeric output back into readable text.
 - Together, this cycle lets humans talk to AI models in natural language. Simple!
