# Large language models

![Figure P-1. All parts and chapters of the book](./figures/FigureP-1.Allpartsandchaptersofthebook.png)


## Part 1: Understanding Language Models

### 1. An introduction to large language models

#### What Is Language AI / Natural Language Processing?

+ A subfield of AI that focuses on developing technologies capable of understanding, processing, and generating human language


![Figure 1-2. Language AI is capable of many tasks by processing textual input](./figures/Figure1-2.LanguageAIiscapableofmanytasksbyprocessingtextualinput.png)


+ Bag-of-words: 
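    - a method for representing unstructured text as numbers
    - Steps: tokenize the text => count the words to build a vector representation (model)
    - Limitation: it treats text as an unordered collection of words and ignores its semantic nature, or meaning.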

    ![Figure 1-5. A bag-of-words is created by counting individual words. These values are referred to as vector representations.](./figures/Figure1-5.Abag-of-wordsiscreatedbycountingindividualwords.Thesevaluesarereferredtoasvectorrepresentations.png)
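
A minimal sketch of the bag-of-words idea above, assuming a naive lowercase-and-split tokenizer. The example sentences and the helper names `tokenize` and `bag_of_words` are illustrative and do not come from any library.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    tokenized = [tokenize(sentence) for sentence in sentences]
    # The vocabulary is every distinct token seen across all sentences.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One position per vocabulary word; the value is how often it occurs.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


sentences = ["That is a cute dog", "My cat is cute"]
vocabulary, vectors = bag_of_words(sentences)
print(vocabulary)
for sentence, vector in zip(sentences, vectors):
    print(vector, "<-", sentence)
```

Because only counts are kept, "dog bites man" and "man bites dog" receive identical vectors, which is one way to see the limitation noted in the list.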

+ Word2vec (*neural networks*):
    - captures the meaning of text in *embeddings*
    - if words share the same neighbors, they tend to have similar embeddings.
    - an embedding has a fixed size, and its values can represent many properties that together capture the meaning of a word.

    ![Figure 1-8. The values of embeddings represent properties that are used to represent words](./figures/Figure1-8.Thevaluesofembeddingsrepresentpropertiesthatareusedtorepresentwords.png)
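
To make "similar embeddings" concrete, here is a small sketch that compares fixed-size vectors with cosine similarity. The three 4-dimensional vectors are hypothetical values chosen by hand for illustration; real word2vec embeddings are learned from data and have far more dimensions.

```python
import math

# Hypothetical embeddings (hand-picked values, not the output of a trained model).
# Every word maps to a vector of the same fixed size.
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print("cat vs dog:", cosine_similarity(embeddings["cat"], embeddings["dog"]))
print("cat vs car:", cosine_similarity(embeddings["cat"], embeddings["car"]))
```

With these made-up values, "cat" scores higher against "dog" than against "car", which mirrors how words with shared neighbors end up close together.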


+ Types of Embeddings:
    - Word embeddings
    - Sentence embeddings

    ![Figure 1-10. Embeddings can be created for different types of input](./figures/Figure1-10.Embeddingscanbecreatedfordifferenttypesofinput.png)

+ Encoding and Decoding Context with Attention
    - With word2vec, a word gets the same (static) embedding regardless of the context in which it is used.
        - **NOTE**: find examples that demonstrate this
    - RNNs are used for two tasks:
        - encoding: representing an input sentence
        - decoding: generating an output sentence

    ![Figure 1-11. Two recurrent neural networks (decoder and encoder)](./figures/Figure1-11.Tworecurrentneuralnetworks(decoderandencoder).png)

    - Each step in this architecture is *autoregressive*
        - When generating the next word, this architecture needs to consume all previously generated words.

        ![Figure 1-12. Each previous output token is used as input to generate the next token](./figures/Figure1-12.Eachpreviousoutputtokenisusedasinputtogeneratethenexttoken.png)
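
The autoregressive loop can be sketched in a few lines. The "model" here is a hypothetical lookup table (`NEXT_TOKEN`) standing in for a real network, so only the control flow is meaningful: each step receives everything generated so far, and its output is appended to the input of the next step.

```python
# Hypothetical stand-in for a trained model: maps the latest token to the next one.
# A real decoder conditions on the whole sequence; the table only keeps the sketch short.
NEXT_TOKEN = {"<start>": "I", "I": "love", "love": "llamas", "llamas": "<end>"}


def predict_next(tokens):
    """Stand-in for a forward pass; it is handed all tokens generated so far."""
    return NEXT_TOKEN.get(tokens[-1], "<end>")


def generate(max_new_tokens=10):
    tokens = ["<start>"]
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # the full history goes in at every step
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens[1:]  # drop the <start> marker


print(generate())
```

The `max_new_tokens` cap plays the same role as the parameter of that name in text-generation APIs: it bounds how many steps the loop may take.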

+ Attention mechanism: lets a model focus on the parts of a sequence that are relevant to one another
    
    ![Figure 1-14. Attention allows a model to “attend” to certain parts of sequences that might relate more or less to one another](./figures/Figure1-14.Attentionallowsamodelto“attend”tocertainpartsofsequencesthatmightrelatemoreorlesstooneanother.png)

    - Adding attention mechanisms to the decoder step lets the RNN generate, for each input word, a signal that relates it to the potential output

    ![Figure 1-15. RNN with Attention](./figures/Figure1-15.RNNwithAttention.png)
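
A rough sketch of how attention turns "relevance" into numbers. It uses scaled dot-product scoring, the variant popularized by the Transformer, purely for simplicity; the RNN attention in the figure may use a different scoring function. The encoder and decoder states are hypothetical hand-picked vectors, and `attend` is an illustrative helper name.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, keys, values):
    """Return (weights, context) for one query over a sequence of key/value vectors."""
    scale = math.sqrt(len(query))
    # One relevance score per input position: scaled dot product of query and key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The context vector is the weighted average of the value vectors.
    context = [
        sum(weight * value[i] for weight, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, context


# Hypothetical encoder states for a three-word input and one decoder state as the query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states, encoder_states)
print("weights:", weights)
print("context:", context)
```

Input positions whose states point in the same direction as the decoder state receive larger weights, so they contribute more to the context vector used for the next output.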

+ Attention Is All You Need
    - Introduced the *Transformer*, an architecture based solely on attention, with no recurrence:
        - it can be trained in parallel -> speeds up training

        ![Figure 1-16. The Transformer is a combination of stacked encoder and decoder blocks](./figures/Figure1-16.TheTransformerisacombinationofstackedencoderanddecoderblocks.png)


#### What are large language models?

#### What are the common use cases and applications of large language models?

#### How can we use large language models ourselves?

### 2. Tokens & Embeddings

+ What tokens are & the tokenization methods used to power LLMs

    ![Figure 2-1. Language models deal with text in small chunks called tokens](./figures/Figure2-1.Languagemodelsdealwithtextinsmallchunkscalledtokens.PNG)
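
To give a feel for what "small chunks" means, here is a toy sketch of greedy longest-match subword tokenization. The vocabulary `VOCAB` is hypothetical and hand-written; real LLM tokenizers learn their vocabularies from data with schemes such as BPE or WordPiece, so this only illustrates the general idea of splitting words into known pieces.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = {"token", "ization", "iz", "ing", "s", "un", "happy"}


def tokenize_word(word, vocab=VOCAB, unknown="<unk>"):
    """Greedily take the longest vocabulary entry that prefixes the remaining text."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is a known piece.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry starts here: emit an unknown token for this character.
            tokens.append(unknown)
            start += 1
        else:
            tokens.append(word[start:end])
            start = end
    return tokens


for word in ["tokenization", "tokens", "unhappy"]:
    print(word, "->", tokenize_word(word))
```

A word the vocabulary has never seen whole, such as "tokenizing", still breaks into known pieces, which is why subword vocabularies can cover open-ended text with a limited number of entries.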


#### LLM Tokenization

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.cuda.empty_cache()

In [2]:
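def tokenize_by_llm(model_name, device='cpu'):
    """
    Load a causal language model and its matching tokenizer.

    model_name (str): Hugging Face model ID
    device (str): cpu / cuda
    """
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map=device,
        torch_dtype='auto',
        trust_remote_code=True
    )
    # Load the tokenizer that belongs to the same checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer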

In [3]:
input_prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened."
LL_MODEL = "microsoft/Phi-3-mini-4k-instruct"
# Load the model onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    LL_MODEL,
    device_map='cuda',
    torch_dtype='auto',
    trust_remote_code=True
)

# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained(LL_MODEL)



In [5]:
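# Tokenize the prompt; pass only the token ID tensor, on the same device as the model
input_ids = tokenizer(input_prompt,
                      return_tensors='pt').input_ids.to('cuda')
# Generate the text
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=5
)
# Decode the generated token IDs back into text
print(tokenizer.decode(generation_output[0]))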

#### Text embeddings (for sentences & whole docs)

#### Word embeddings beyond LLMs

#### Embeddings for Recommendation Systems

### 3. Looking inside large language models

## Part 2: Using Pretrained Language Models

### 4. Text classification

### 5. Text clustering and topic modeling

### 6. Prompt Engineering

### 7. Advanced Text Generation techniques and tools

### 8. Semantic Search & Retrieval Augmented Generation

### 9. Multimodal large language models

## Part 3: Training & Fine-tuning Language Models

### 10. Creating text embedding models

### 11. Fine-tuning representation models for classification

### 12. Fine-tuning Generation Models