### Author: Hoang Mau Trung
### Position: Machine Learning Engineer
### Email: hoangmautrung@gmail.com

# Part 1: Theoretical Questions

## 1.1 Why do GPT models only use the decoder part of the Transformer architecture?

Answer:

The original Transformer has two parts: an encoder and a decoder. The encoder reads the whole input and builds a contextual representation of it, which suits tasks where the full input determines the output, such as machine translation or classification. The decoder generates the output sequence.
GPT models are designed for autoregressive text generation, that is, producing text from a given context or prompt. In this setting the prompt and the generated text form a single sequence, so a separate encoder is unnecessary. GPT uses only the decoder with masked self-attention, so each token is predicted from the tokens before it. This also keeps the architecture and the training process simple, and it allows the model to be trained on vast amounts of unannotated text to learn language patterns and structures.
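
To make masked self-attention concrete, here is a minimal sketch in plain Python. It is not taken from any GPT implementation: the function name `causal_attention_weights` and the toy scores are illustrative assumptions. It applies a causal mask to a matrix of raw attention scores, so position `i` can only attend to positions `0..i`, and then normalizes each row with softmax.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Apply a causal mask and a row-wise softmax to raw attention scores.

    scores[i][j] is the raw score of query position i against key position j.
    Future positions (j > i) receive weight 0.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]            # keys at positions 0..i only
        max_score = max(visible)          # subtract the max for numerical stability
        exps = [math.exp(s - max_score) for s in visible]
        total = sum(exps)
        masked_row = [e / total for e in exps] + [0.0] * (len(row) - i - 1)
        weights.append(masked_row)
    return weights


# Toy example (hypothetical values): 3 tokens with identical raw scores.
toy_scores = [[1.0, 1.0, 1.0] for _ in range(3)]
for row in causal_attention_weights(toy_scores):
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.333, 0.333, 0.333]
```

Each row sums to 1, and the zeros above the diagonal are what prevent a token from seeing the tokens that come after it.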


## 1.2 Explain the Attention mechanism in Transformer and why it is more effective than traditional RNN architectures. 



## 1.3 Explain the Chain-of-Thought technique. Analyze the advantages and disadvantages of this Prompt Engineering technique.

* Explain CoT technique 

CoT mimics how a human handles a complex task: it breaks the problem into simple steps and works through them to reach the final result, the same way we solve a math exercise in high school. It encourages the model to reason through intermediate steps before giving the final answer. When prompted with examples of reasoning chains, the model can generalize better to new, complex problems.

Example: How many 'r' characters are in the string "hja34r87392rrr83873rrrr"?

Without CoT, the LLM starts generating the answer right away:

```Number of 'r' in string 'hja34r87392rrr83873rrrr' is 10.```

With CoT:

```
Let's count the number of occurrences of the letter 'r' in the string "hja34r87392rrr83873rrrr" step by step.

Given string:
hja34r87392rrr83873rrrr

Step-by-step counting:
hja34r87392rrr83873rrrr
First 'r' at index 5
Second 'r' at index 11
Third 'r' at index 12
Fourth 'r' at index 13
Fifth 'r' at index 19
Sixth 'r' at index 20
Seventh 'r' at index 21
Eighth 'r' at index 22
Final count:
There are 8 occurrences of the letter 'r' in the given string.
```
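
For reference, the correct count for this example can be verified deterministically. The following is a minimal sketch with illustrative variable names; it lists the 0-based index of every 'r' and the total.

```python
text = "hja34r87392rrr83873rrrr"

# Collect the 0-based index of every 'r' in the string.
positions = [i for i, ch in enumerate(text) if ch == "r"]

print(positions)       # [5, 11, 12, 13, 19, 20, 21, 22]
print(len(positions))  # 8
```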

* Advantages and Disadvantages

**Advantages**

- Improves LLM performance by splitting a complex task into steps.
- Makes the step-by-step output of the LLM easier to control and understand.
- The extra detail in the solution helps us build and refine the chain of thought.
- Can be combined with tools or other systems to complete a task (the idea behind agents).

**Disadvantages** 
- Depends heavily on the prompt.
- Consumes more tokens because the output is longer, which means higher cost and slower responses.
- Can overfit to the reasoning pattern given in the prompt, so the model keeps following a single way of solving the problem.


# Part 2: Practical Exercises

## 2.1 

```python
from typing import List


def parse_amount(amount_str: str) -> float:
    """Convert a string amount to float by removing thousands separators."""
    return float(amount_str.replace(",", ""))


def extract_amounts(text: str) -> List[float]:
    """Extract all numeric amounts from the input string.

    Args:
        text: Input string containing amounts (e.g. "donate1.23buyapple12,390")

    Returns:
        List of extracted amounts as floats
    """
    amount_chars = set("0123456789.,")
    amounts = []
    current = []

    for char in text:
        if char in amount_chars:
            current.append(char)
        elif current:
            amounts.append("".join(current))
            current.clear()

    if current:
        amounts.append("".join(current))

    # Skip runs without any digit (e.g. a lone "." or ","), which float() cannot parse.
    return [parse_amount(amt) for amt in amounts if any(c.isdigit() for c in amt)]


def format_amount(total_amount: float) -> str:
    """Format the total with thousands separators, dropping zero cents."""
    formatted = f"{total_amount:,.2f}"  # e.g. 12391.23 -> "12,391.23"
    if formatted.endswith(".00"):
        formatted = formatted[:-3]  # e.g. "10.00" -> "10"
    return formatted


def calculate_total(text: str) -> str:
    """Sum all amounts in the input string and return the formatted total."""
    # Binary floating point introduces small errors, for example:
    #   0.01 + 0.05 -> 0.060000000000000005
    #   0.01 + 0.06 -> 0.06999999999999999
    # so the sum is rounded to 2 decimal places before formatting.
    total_amount = round(sum(extract_amounts(text)), 2)
    return format_amount(total_amount)
```

```python
inputs = ["donate1.23buyapple12,390", "aa0.01t0.02", "a1b2c3.45", "p0.05c9.95", "a0.01b0.05"]
for _input in inputs:
    print("------------------------")
    print(calculate_total(_input))
```

Output:

```
------------------------
12,391.23
------------------------
0.03
------------------------
6.45
------------------------
10
------------------------
0.06
```
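
The floating-point issue noted in the solution can also be avoided by doing the arithmetic in decimal instead of rounding afterwards. The following is a minimal sketch using the standard library's `decimal.Decimal`. The helper name `sum_amounts_exact` is hypothetical, and it takes already extracted amount strings as input rather than replacing the functions in the solution.

```python
from decimal import Decimal
from typing import List


def sum_amounts_exact(amount_strs: List[str]) -> Decimal:
    """Sum amount strings such as "12,390" or "0.05" without binary float error."""
    return sum((Decimal(s.replace(",", "")) for s in amount_strs), Decimal("0"))


print(0.01 + 0.05)                            # 0.060000000000000005 (binary float)
print(sum_amounts_exact(["0.01", "0.05"]))    # 0.06
print(sum_amounts_exact(["1.23", "12,390"]))  # 12391.23
```

The trade-off is that amounts must stay as strings until they are converted to `Decimal`; converting through `float` first would bring the binary error back.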


## 2.2 

The company's product currently needs a module to classify medical documents into 10 different types (Patient Records, Prescriptions, Infusion Guidelines, Treatment Protocols, etc.), given a labeled dataset of pairs (medical document PDF, document type label).

Briefly describe a solution that would effectively solve this problem (which model to use, how to train and test, etc.)

Based on these requirements, the work is divided into two phases: modeling (choosing the model and the train/test cycle) and MLOps (serving the model in production).

### Phase 1: Modeling


#### Requirements Summary
Input: pairs (document, label)
- Labels come from 10 medical document classes.
- Documents are PDFs containing medical text of varying lengths.

Output:
- The model classifies medical documents into the 10 types.
- The model should handle variable-length inputs effectively.

#### Problems to Consider
- Variation in document length
- Medical domain
- Technical constraints: speed, accuracy, production environment

#### Research and Discussion
- Model type: text/document classification
- Model options:
    + BERT
    + Sentence Transformer
    + Longformer (or other variants optimized for long documents)

Many models are designed to handle long-context inputs, but in practice we set a fixed input size, such as 512 tokens for BERT or Sentence Transformer, or a larger size for Longformer.

To improve classification performance on long documents, we split each document into multiple chunks, with the chunk size determined by the model's input limit. During prediction, we classify each chunk individually and combine the results with a voting mechanism (averaging the per-chunk predictions across all chunks) to determine the final document class.

Final solution: 

Document -> N chunks -> Model (BERT or ST) -> N class predictions -> Voting -> Final class

* Chunk size = the model's input size
* N depends on the length of the document (consecutive chunks should overlap)
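
The chunking and voting steps of this pipeline can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `chunk_tokens` and `vote`, the overlap parameter, and the toy probabilities are assumptions, and the model call that would produce per-chunk class probabilities is left out.

```python
from typing import Dict, List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most chunk_size tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def vote(chunk_probs: List[Dict[str, float]]) -> str:
    """Average per-chunk class probabilities and return the top class."""
    if not chunk_probs:
        raise ValueError("need at least one chunk prediction")
    totals: Dict[str, float] = {}
    for probs in chunk_probs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    averages = {label: total / len(chunk_probs) for label, total in totals.items()}
    return max(averages, key=averages.get)


# Toy document of 10 "tokens", windows of 4 tokens with 1 token of overlap.
chunks = chunk_tokens(list("abcdefghij"), chunk_size=4, overlap=1)
print(len(chunks))  # 3

# Hypothetical per-chunk outputs; a real model would produce these.
predictions = [
    {"Prescriptions": 0.6, "Patient Records": 0.4},
    {"Prescriptions": 0.3, "Patient Records": 0.7},
    {"Prescriptions": 0.2, "Patient Records": 0.8},
]
print(vote(predictions))  # Patient Records
```

Averaging probabilities is one option; a majority vote over per-chunk labels would be an alternative when the model only returns hard labels.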

- Extension idea: combine the classifier output with full-text search (using ELK) to improve the results.

#### Training Stage

* Stage 1: Data preprocessing: reading, chunking, formatting, and splitting into train/test/eval sets
    + Train:Test:Eval = 70:15:15

* Stage 2: Training

* Stage 3: Evaluation, with the following metrics:
    + Accuracy
    + F1 score (if the data is imbalanced)
    + Confusion matrix
    + Recall/precision per class
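
The split and the evaluation metrics above can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the actual training code: the function names are hypothetical, the split is stratified by class, and it is assumed to run on whole documents before chunking so that chunks of one document never land in both train and test.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    ratios: Tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return document indices for train/test/eval, split separately per class."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, test, eval_ = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_test = round(len(indices) * ratios[1])
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        eval_ += indices[n_train + n_test:]  # the remainder goes to eval
    return train, test, eval_


def per_class_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per class, computed from true and predicted labels."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(report: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of per-class F1, which treats rare classes like common ones."""
    return sum(m["f1"] for m in report.values()) / len(report)


# Toy labels (hypothetical): 2 classes with 20 documents each.
train, test, eval_ = stratified_split(["A"] * 20 + ["B"] * 20)
print(len(train), len(test), len(eval_))  # 28 6 6

report = per_class_report(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(round(macro_f1(report), 4))  # 0.7333
```

Macro-averaged F1 is used here because it is a common choice for imbalanced classes; a weighted average would be an alternative if class frequency should count.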


### Phase 2: MLOps: Designing a System to Serve the Model in Production

![](SystemDesignDocumentClassification.drawio.png)

Some notes on the design:

+ Data Preparation: Documents -> Preprocessing (Reading/Chunking/Splitting) -> Storage (S3 & Database)
+ Model Training: Data Retrieval -> Training Model -> Tracking (MLflow) -> Checkpoint Selection
+ Deployment: Best Checkpoint -> Model Transformation (ONNX, TensorRT) -> Containerization (Triton) -> Deployment -> Monitoring

+ Trigger CI/CD (GitHub Actions/Jenkins): Automate the deployment process. When a new best checkpoint is identified (e.g., by MLflow), trigger a CI/CD pipeline to:

    + Transform the model.

    + Build the container.

    + Run tests (unit tests, integration tests).

    + Deploy the new model version to the serving environment.

+ Monitoring:

    + System Metrics: CPU usage, memory usage, GPU utilization, request latency, error rates.
    + Model Performance Metrics: Track the model's accuracy, precision, recall, etc., in production.
    + Data Drift Detection: Compare the distribution of incoming data to the training data distribution.
    + Alerting: Notify the team when a metric crosses its threshold.
    + Sentry to log errors in the serving services.
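
One common way to quantify the data drift mentioned above is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its distribution in the training data. The sketch below is a minimal illustration in plain Python; the choice of feature (document length in tokens), the bin edges, and the sample values are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples over shared bin edges.

    PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share).
    Empty bins are floored at eps to avoid division by zero and log(0).
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of the bin that v falls into
        return [max(c / len(values), eps) for c in counts]

    exp_s, act_s = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_s, act_s))


# Hypothetical document lengths (in tokens) at training time and in production.
train_lengths = [100, 200, 300, 400, 500, 600]
live_lengths = [700, 800, 900, 1000, 1100, 1200]
edges = [250, 500, 750]

print(psi(train_lengths, train_lengths, edges))     # 0.0 (identical distributions)
print(psi(train_lengths, live_lengths, edges) > 0)  # True (distribution has shifted)
```

A PSI of 0 means the two binned distributions match; larger values indicate a larger shift, and the alerting threshold would need to be chosen for the specific feature and data.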