# 3. Prompt Engineering Techniques and Test-time Scaling (TTS) 
---

### Background

After an LLM has been pre-trained, a decoding strategy determines how output text is generated from the model's next-token probability distributions, and test-time scaling (TTS) techniques can further improve the quality of that output.

#### Greedy Search

A basic decoding method is greedy search, which at each step selects the most likely token given the previously generated tokens. In the figure below, greedy search selects `coffee`, the token with the highest probability at the current step. Greedy search can achieve satisfactory results in tasks where the output depends heavily on the input (e.g., machine translation and text summarization). However, in open-ended generation tasks (e.g., story generation and dialogue), it sometimes produces awkward and repetitive sentences.

<center><img src="images/decode-strategy.png" width="450px" height="350px" /></center>
<center><i>source: <a href="https://arxiv.org/abs/2303.18223">arXiv:2303.18223 [cs.CL] </a></i></center>
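The greedy loop can be sketched in plain Python. The `TOY_LM` table is a hypothetical stand-in for a real model (it conditions only on the previous token, and its probabilities are illustrative, not read from the figure), and `greedy_decode` is an illustrative helper rather than a NeMo API.

```python
# Hypothetical next-token probabilities, conditioned only on the previous token.
# A real LLM conditions on the whole prefix; this table only illustrates decoding.
TOY_LM = {
    "<s>": {"I": 0.6, "You": 0.4},
    "I": {"like": 0.7, "want": 0.3},
    "You": {"like": 1.0},
    "like": {"coffee": 0.5, "tea": 0.3, "water": 0.2},
    "want": {"rice": 0.6, "tea": 0.4},
}
END = "</s>"


def greedy_decode(lm, start="<s>", max_len=10):
    """Pick the single most likely next token at every step."""
    tokens = [start]
    for _ in range(max_len):
        # Tokens missing from the table are assumed to end the sentence.
        dist = lm.get(tokens[-1], {END: 1.0})
        next_token = max(dist, key=dist.get)  # argmax over the candidate tokens
        if next_token == END:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


print(" ".join(greedy_decode(TOY_LM)))  # I like coffee
```

Because only the argmax is kept, the output is deterministic: running it twice always yields the same sentence.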

#### Greedy Search Improvement
Selecting the highest-probability token at each step may miss a sentence that has a higher overall probability but starts with a locally less likely token. Strategies that alleviate this issue include:
- **Beam search**: Beam search keeps the n partial sequences (n is the beam size) with the highest probabilities at each decoding step and finally selects the complete sequence with the highest probability.
- **Length penalty**: Because every additional token multiplies the sequence probability by another factor below 1, beam search tends to favor shorter sentences. A length penalty (a.k.a. length normalization) counters this by normalizing the sentence score by its length, dividing the log-probability by the length raised to a power α.
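A simplified beam search with the length penalty described above might look like the sketch below. The toy probabilities are hypothetical and chosen so that greedy search and beam search disagree; `beam_search` is an illustrative helper, and, unlike production implementations, it lets the beam shrink when hypotheses finish instead of refilling it.

```python
import math

END = "</s>"
# Hypothetical next-token probabilities conditioned on the previous token.
# Greedy search picks "nice woman" (0.5 * 0.4 = 0.20), but "dog has" is more
# likely overall (0.4 * 0.9 = 0.36).
TOY_LM = {
    "<s>": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.9, END: 0.1},
}


def beam_search(lm, beam_size=2, alpha=0.0, start="<s>", max_len=10):
    """Simplified beam search; alpha is the length-penalty exponent."""
    beams = [([start], 0.0)]  # (tokens, sum of log-probabilities)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            # Tokens missing from the table are assumed to end the sentence.
            for tok, p in lm.get(tokens[-1], {END: 1.0}).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            if tokens[-1] == END:
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached

    def normalized_score(hyp):
        tokens, logp = hyp
        length = max(len(tokens) - 1, 1)  # generated tokens, excluding the start token
        return logp / length**alpha  # alpha = 0 means no length normalization

    best_tokens, _ = max(finished, key=normalized_score)
    return [t for t in best_tokens[1:] if t != END]


print(beam_search(TOY_LM, beam_size=1))  # ['nice', 'woman'] (same as greedy)
print(beam_search(TOY_LM, beam_size=2))  # ['dog', 'has']
```

With `beam_size=1` the search reduces to greedy decoding; widening the beam lets the lower-probability first token `dog` survive long enough to win overall.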



#### Sampling-Based

Sampling-based methods randomly select the next token according to its probability distribution, which increases randomness and diversity during generation. For example, in the figure above, sampling is most likely to pick `coffee`, the highest-probability token, while still allowing other words such as `water`, `tea`, and `rice` to be selected.

#### Sampling-Based Improvement

Because plain sampling draws tokens from the entire vocabulary, it may select tokens that are incorrect or irrelevant in context (e.g., `happy` and `Boh` in the figure above). Common improvements include:
- **Temperature sampling**: A practical way to control the randomness of sampling is to adjust the temperature t of the softmax function that converts the logits into token probabilities over the vocabulary. Lowering t increases the probability of selecting high-probability words. When t is set to 1, this reduces to standard random sampling; as t approaches 0, it becomes equivalent to greedy search.
- **Top-k sampling**: Unlike temperature sampling, top-k sampling directly truncates low-probability tokens and samples only from the k tokens with the highest probabilities.
- **Top-p sampling**: Because top-k sampling ignores the shape of the overall probability distribution, a fixed value of k may not suit every context. Top-p sampling (a.k.a. nucleus sampling) instead samples from the smallest set of tokens whose cumulative probability is greater than or equal to p.
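The three sampling controls can be sketched with the standard library alone. The vocabulary and logits below are hypothetical values loosely modeled on the figure, and the helper names are illustrative; inference frameworks apply the same ideas to full logit tensors on the GPU.

```python
import math
import random


def softmax_with_temperature(logits, t=1.0):
    """Convert logits to probabilities; t < 1 sharpens, t > 1 flattens (t must be > 0)."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability is >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample(vocab, probs, rng=random):
    """Draw one token according to the (filtered) distribution."""
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and logits, loosely modeled on the figure above.
vocab = ["coffee", "water", "tea", "rice", "happy", "Boh"]
logits = [3.0, 2.0, 1.8, 1.0, -1.0, -2.0]

rng = random.Random(0)
probs = softmax_with_temperature(logits, t=0.7)
print(sample(vocab, top_k_filter(probs, k=3), rng))
print(sample(vocab, top_p_filter(probs, p=0.9), rng))
```

The filters compose: temperature reshapes the distribution first, and top-k or top-p then removes the unlikely tail (such as `happy` and `Boh`) before a token is drawn.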

### Overview of Prompt Engineering

#### What is Prompting?
In the context of LLMs, `prompting` refers to crafting precise instructions (i.e., input text) and providing them to a model to generate the desired outputs or responses. The effectiveness of a prompt lies in its ability to guide the model to understand the task and generate responses that align with user expectations.

#### Prompt Engineering
[Prompt engineering](https://www.promptingguide.ai/) is the process of developing and optimizing prompts to use language models (LMs) efficiently across a wide range of applications. Prompt engineering skills help you better understand the capabilities and limitations of large language models (LLMs) and interact with them more effectively. Researchers use prompt engineering to improve LLM performance on a wide range of common and complex tasks, such as question answering and arithmetic reasoning.

#### Prompting Basics

Following the [basic principles of prompting](https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/free_courses/Applied_LLMs_Mastery_2024/week2_prompting.md), a prompt is built from components that help the model solve the task at hand:
- **Instruction**: Clearly specify the task or action you want the model to perform. This sets the context for the model's response and guides its behavior.
- **Context**: Provide external information or additional context that helps the model better understand the task and generate more accurate responses. Context can be crucial in steering the model towards the desired outcome.
- **Input Data**: Include the input or question for which you seek a response. This is the information on which you want the model to act or provide insights.
- **Output Indicator**: Define the type or format of the desired output. This guides the model in presenting the information in a way that aligns with your expectations.

Let's examine an example of our dialogue summarization prompt used for inference in the previous notebook:

```text
"### Instruction: Write a summary of the conversation below. ### Input: Will: hey babe, what do you want for dinner tonight?\nEmma: gah, don't even worry about it tonight\nWill: what do you mean? everything ok?\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\nWill: Well what time will you be home?\nEmma: soon, hopefully\nWill: you sure? Maybe you want me to pick you up?\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home.\nWill: Alright, love you.\nEmma: love you too."

```
In this example:

- **Instruction**: *"Write a summary of the conversation below"*
- **Context**:  *N/A*
- **Input data**: *"Input: Will: hey babe, what do you want for dinner tonight?\nEmma: gah, don't even worry about it tonight\nWill: what do you mean? everything ok?\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\nWill: Well what time will you be home?\nEmma: soon, hopefully\nWill: you sure? Maybe you want me to pick you up?\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home.\nWill: Alright, love you.\nEmma: love you too."*
- **Output indicator**: *default is plain text. It could be sentiment, intent, etc.*

In this example, the context and output indicator are not used explicitly, but they can be added to the prompt to give the model additional information about the task and the desired output format (e.g., JSON). See these [tips for designing effective prompts](https://www.promptingguide.ai/introduction/tips).
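The four components can be assembled programmatically. `build_prompt` below is a hypothetical helper: it mirrors the `### Instruction:` / `### Input:` markers of the summarization prompt above, and the `### Context:` label is an assumption, since that component is not used in the example.

```python
def build_prompt(instruction, input_data, context=None, output_indicator=None):
    """Join prompt components with '### ' section markers (hypothetical helper).

    The section labels mirror the summarization prompt above; the model does not
    require them, they simply keep the components visually separated.
    """
    parts = [f"### Instruction: {instruction}"]
    if context:
        parts.append(f"### Context: {context}")  # assumed label for the context component
    parts.append(f"### Input: {input_data}")
    if output_indicator:
        parts.append(f"### {output_indicator}")  # e.g. "Summary:" or "Sentiment:"
    return " ".join(parts)


dialogue = "Will: hey babe, what do you want for dinner tonight?\nEmma: gah, don't even worry about it tonight"
print(build_prompt("Write a summary of the conversation below.", dialogue, output_indicator="Summary:"))
```

The resulting string could be assigned to `os.environ["prompts"]` before calling `!torchrun run_inference.py`.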

Before exploring the various prompting techniques, let's write an inference script, `run_inference.py`, to test the prompts. To reuse the script across techniques, each prompt is passed through the environment variable `prompts` (set with `os.environ["prompts"]`), which we overwrite before each run.


In [None]:
%%writefile run_inference.py

import os

import torch
import nemo.lightning as nl
from megatron.core.inference.common_inference_params import CommonInferenceParams
from nemo.collections.llm import api


def run(prompts):
    # Single-GPU setup: no tensor, pipeline, or context parallelism.
    strategy = nl.MegatronStrategy(
        tensor_model_parallel_size=1,
        pipeline_model_parallel_size=1,
        context_parallel_size=1,
        sequence_parallel=False,
        setup_optimizers=False,  # inference only, no optimizer state needed
    )

    trainer = nl.Trainer(
        accelerator="gpu",
        devices=1,
        num_nodes=1,
        strategy=strategy,
        plugins=nl.MegatronMixedPrecision(
            precision="bf16-mixed",
            params_dtype=torch.bfloat16,
            pipeline_dtype=torch.bfloat16,
        ),
    )

    # Fine-tuned checkpoint to load for generation.
    adapter_checkpoint = "/workspace/log/checkpoints/model_name=0--val_loss=0.00-step=99-consumed_samples=3200.0-last"
    # adapter_checkpoint = "/workspace/model/Qwen/Qwen2.5-Coder-32B-Instruct"
    results = api.generate(
        path=adapter_checkpoint,
        prompts=prompts,
        trainer=trainer,
        # Low temperature and top-p keep the output focused; generate at most 100 tokens.
        inference_params=CommonInferenceParams(temperature=0.2, top_p=0.7, num_tokens_to_generate=100),
        text_only=True,
    )
    return results


if __name__ == "__main__":
    # The prompt is passed in through the "prompts" environment variable set in the notebook.
    prompts = [os.environ["prompts"]]
    results = run(prompts)
    for summary in results:
        # Keep the text before any "###" marker, then truncate at the first period.
        top_summary = summary.split("###")[0].split(".")[0]
        print("=" * 50)
        print("Prompt Output")
        print("=" * 50, "\n")
        print(top_summary)
        print("=" * 50, "\n")

### Exploring the Fine-tuned Model with Prompting Techniques

In this section, we will prompt our fine-tuned `Llama-3.1-8B` model using each of the prompting techniques described below.

- **Zero-shot prompting**: Zero-shot prompting means that the prompt used to interact with the model won't contain examples or demonstrations. The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it. 

```text
Prompt:
         "Classify the text into neutral, negative, or positive. Text: I think the food was okay. Sentiment:"

Output:  Neutral
```

Run the cell below to test the prompt.

In [None]:
import os
os.environ["prompts"] = "### Classify the text into neutral, negative, or positive. ### Text: I think the food was okay. ### Sentiment:"
!torchrun run_inference.py

**Expected Output:**

```python
...

[NeMo I 2025-08-10 15:35:41 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754840141.548s : Time spent in load_checkpoint: 0.131s
static requests: 100%|███████████████████████████| 1/1 [01:58<00:00, 118.77s/it]
==================================================
Prompt Output
================================================== 

 neutral
================================================== 
```

- **Few-shot prompting**: Few-shot prompting can be used as a technique to enable in-context learning where we provide demonstrations in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.

```text
Prompt:

        A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
        We were traveling in Africa and we saw these very cute whatpus.
 
        To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:

Output: 
        When we won the game, we all started to farduddle in celebration.

```

Run the cell below to test the prompt.

In [None]:
os.environ["prompts"] = "### Generate the output using the given example and output . ### Example: A \"whatpu\" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: \nOutput: We were traveling in Africa and we saw these very cute whatpus. \nExample: When we won the game, we all started to farduddle in celebration. ### Output:"

!torchrun run_inference.py

**Expected Output**
```python
...

[NeMo I 2025-08-10 15:43:52 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754840632.192s : Time spent in load_checkpoint: 0.131s
static requests: 100%|███████████████████████████| 1/1 [02:00<00:00, 120.75s/it]
==================================================
Prompt Output 
================================================== 

 We won the game and started farduddling in celebration
================================================== 
```

- **Chain-of-thought (CoT)**: CoT prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

```text
Prompt:

        The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
        A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

        The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
        A: Adding all the odd numbers (17, 19) gives 36. The answer is True.

        The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
        A: Adding all the odd numbers (11, 13) gives 24. The answer is True.

        The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
        A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.

        The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
        A:

Output:

        Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.

```

Run the cell below to test the prompt.

In [None]:
os.environ["prompts"] = "### Generate the output by following the given question and answer examples. ### Example: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.\
                             Output: Adding all the odd numbers (9, 15, 1) gives 25, the answer is False. \
                           Example: The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.\
                           Output: Adding all the odd numbers (17, 19) gives 36, the answer is True.\
                           Example: The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.\
                           Output: Adding all the odd numbers (11, 13) gives 24, the answer is True.\
                           Example: The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.\
                           Output: Adding all the odd numbers (17, 9, 13) gives 39, the answer is False.\
                           Example: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\
                           ### Output:"

!torchrun run_inference.py

**Expected Output:**

```python
...
[NeMo I 2025-08-10 16:17:12 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754842632.669s : Time spent in load_checkpoint: 0.130s
static requests: 100%|███████████████████████████| 1/1 [02:02<00:00, 122.26s/it]
==================================================
Prompt Output 
================================================== 

 Adding all the odd numbers (15, 5, 13, 7, 1) gives 41, the answer is False
================================================== 

```
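Because the odd-number task has an exact answer, the model's chain of thought can be checked programmatically. The helpers below are hypothetical: `odd_sum_is_even` computes the ground truth, and `parse_cot_answer` assumes the output ends with the phrase "the answer is True/False", as in the demonstrations.

```python
import re


def odd_sum_is_even(numbers):
    """Ground truth for the prompt: do the odd numbers add up to an even number?"""
    odd_total = sum(n for n in numbers if n % 2 == 1)
    return odd_total % 2 == 0


def parse_cot_answer(text):
    """Pull the final True/False verdict out of a reasoning chain (None if absent)."""
    match = re.search(r"answer is\s+(True|False)", text)
    return None if match is None else match.group(1) == "True"


model_output = "Adding all the odd numbers (15, 5, 13, 7, 1) gives 41, the answer is False"
expected = odd_sum_is_even([15, 32, 5, 13, 82, 7, 1])
print(parse_cot_answer(model_output) == expected)  # True
```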
**You can check [here](https://www.promptingguide.ai/techniques) for more advanced prompt engineering techniques.**

#### Zero-shot Prompting With Dialogue Text

In this section, we will apply prompts containing dialogue text and extract attributes such as the `urgency`, `main issue`, and `sentiment` of the dialogue.

- **Prompt Composition Sample**:
  
  ***Instruction:***  *Write the main issue for the dialogue*
  
  ***Input data:***  *Dialogue: user: #Sprint Absolutely HORRIBLE customer service and from a so-called \"supervisor\" at that.\nagent: This concerns us. Customer service is one of our main priorities. Please allow us to turn this around. -Maria Q\nuser: REALLY - Haven't seen GOOD customer service since the day I signed up back in 2000 - I don't think longtime customers is a \"priority\" Just got hung up on after a 1:24 call -- 75% was on hold including the last 30 minutes. people need to stay away from\nagent: Please DM us to talk about this. I'd be more than able to provide feedback on the agent that placed you on hold for so long. -Maria Q\nuser: one question. I need to switch carriers BEFORE cancelling service to keep my phone number right? ?\nagent: We hate that you had that experience! We want to make sure that you are satisfied with Sprint -Maria Q What were you trying to do? I bet I can handle it for you! Please DM us -Maria Q.*
  
  ***Output Indicator:*** *Main issue:*
  

Now let's test our fine-tuned model to deduce the `main issue` in the dialogue text. 

In [None]:
os.environ["prompts"] = "### Write the main issue for the dialogue ### Dialogue: user: #Sprint Absolutely HORRIBLE customer service and from a so-called \"supervisor\" at that.\nagent: This concerns us. Customer service is one of our main priorities.\
        Please allow us to turn this around. -Maria Q\nuser: REALLY - Haven't seen GOOD customer service since the day I signed up back in 2000 - I don't think longtime customers is a \"priority\" Just got hung up on after a 1:24 call -- 75% \
        was on hold including the last 30 minutes. people need to stay away from\nagent: Please DM us to talk about this. I'd be more than able to provide feedback on the agent that placed you on hold for so long. -Maria Q\nuser: one question.\
        I need to switch carriers BEFORE cancelling service to keep my phone number right? ?\nagent: We hate that you had that experience! We want to make sure that you are satisfied with Sprint -Maria Q What were you trying to do? I bet I can handle it for you! Please DM us -Maria Q. ### Main issue:"

!torchrun run_inference.py

**Expected Output:**

```python
...
[NeMo I 2025-08-10 17:07:40 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754845659.952s : Time spent in load_checkpoint: 0.132s
static requests: 100%|████████████████████████████| 1/1 [00:42<00:00, 42.22s/it]
==================================================
Prompt Output 
================================================== 

 Poor customer service
================================================== 
```

Next, let's test our fine-tuned model to deduce the `sentiment` in the dialogue text.

In [None]:
os.environ["prompts"] = "### Classify the dialogue into this sentiment neutral, negative, or positive ### Dialogue: user: #Sprint Absolutely HORRIBLE customer service and from a so-called \"supervisor\" at that.\nagent: This concerns us. Customer service is one of our main priorities.\
        Please allow us to turn this around. -Maria Q\nuser: REALLY - Haven't seen GOOD customer service since the day I signed up back in 2000 - I don't think longtime customers is a \"priority\" Just got hung up on after a 1:24 call -- 75% \
        was on hold including the last 30 minutes. people need to stay away from\nagent: Please DM us to talk about this. I'd be more than able to provide feedback on the agent that placed you on hold for so long. -Maria Q\nuser: one question.\
        I need to switch carriers BEFORE cancelling service to keep my phone number right? ?\nagent: We hate that you had that experience! We want to make sure that you are satisfied with Sprint -Maria Q What were you trying to do? I bet I can handle it for you! Please DM us -Maria Q. ### Sentiment:"

!torchrun run_inference.py

**Expected Output:**

```python
...
[NeMo I 2025-08-10 17:16:11 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754846171.813s : Time spent in load_checkpoint: 0.131s
static requests: 100%|████████████████████████████| 1/1 [00:41<00:00, 41.72s/it]
==================================================
Prompt Output 
================================================== 

 negative
agent: I'm sorry to hear that
================================================== 

```

The sentiment is `negative`. The output also includes some extra text, which can be filtered out with post-processing (for example, by keeping only the first line of the output).
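One possible post-processing step is sketched below. `extract_label` is a hypothetical helper that keeps only the first word of the first non-empty line and accepts it only if it belongs to the expected label set.

```python
def extract_label(raw_output, labels=("negative", "neutral", "positive")):
    """Return the first word of the first non-empty line if it is a known label, else None."""
    for line in raw_output.splitlines():
        words = line.strip().split()
        if words:
            word = words[0].strip(".,:;!").lower()
            return word if word in labels else None
    return None


raw = " negative\nagent: I'm sorry to hear that"
print(extract_label(raw))  # negative
print(extract_label(" High", labels=("high", "medium", "low")))  # high
```

Returning `None` for anything outside the label set makes malformed generations easy to detect instead of silently accepting them.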

Lastly, let's test our fine-tuned model to deduce the `Urgency` in the dialogue text.

In [None]:
os.environ["prompts"] = "### Classify the dialogue urgency into High, medium, or low. ### Dialogue: user: #Sprint Absolutely HORRIBLE customer service and from a so-called \"supervisor\" at that.\nagent: This concerns us. Customer service is one of our main priorities.\
        Please allow us to turn this around. -Maria Q\nuser: REALLY - Haven't seen GOOD customer service since the day I signed up back in 2000 - I don't think longtime customers is a \"priority\" Just got hung up on after a 1:24 call -- 75% \
        was on hold including the last 30 minutes. people need to stay away from\nagent: Please DM us to talk about this. I'd be more than able to provide feedback on the agent that placed you on hold for so long. -Maria Q\nuser: one question.\
        I need to switch carriers BEFORE cancelling service to keep my phone number right? ?\nagent: We hate that you had that experience! We want to make sure that you are satisfied with Sprint -Maria Q What were you trying to do? I bet I can handle it for you! Please DM us -Maria Q. ### Urgency:"

!torchrun run_inference.py

**Expected Output**

```python
...
[NeMo I 2025-08-10 17:26:34 nemo_logging:393] Global Checkpoint Load : Rank : 0 : Start time : 1754846794.044s : Time spent in load_checkpoint: 0.132s
static requests: 100%|████████████████████████████| 1/1 [00:41<00:00, 41.85s/it]
==================================================
Prompt Output 
================================================== 

 High
================================================== 
```

### Test-time Scaling (TTS) Approaches

Test-time scaling (TTS) improves reasoning by spending additional compute at inference time, typically without updating the model weights. The figure below presents a taxonomy of TTS methods grouped by their underlying technique (parallel scaling, sequential scaling, and search-based methods) and shows how they fit into a compute-optimal strategy. For more details on TTS, see [arXiv:2502.21321 cs.CL](https://arxiv.org/abs/2502.21321).

<center><img src="images/tts.png" width="600px" height="350px" /></center>
<center><i>source: <a href="https://arxiv.org/abs/2502.21321">arXiv:2502.21321 [cs.CL] </a></i></center>

Besides `Beam Search` and `Best-of-N (BoN) Search`, the TTS approaches we highlight include:

- **Compute-Optimal Scaling (COS)**: COS is a dynamic method for allocating computational resources efficiently during LLM inference, improving accuracy without unnecessary expense. It groups prompts into five difficulty levels, ranging from easy to hard, using either oracle difficulty (ground-truth success rates) or model-predicted difficulty (e.g., scores from a verifier such as a process reward model, PRM).
Once prompts are categorized, compute is allocated according to difficulty: easier prompts benefit from sequential refinement, where the model iteratively revises its output to improve correctness, while harder prompts benefit from parallel sampling or beam search, which explore multiple response variations to increase the likelihood of finding a correct solution. For a deep dive, see [arXiv:2408.03314 cs.LG](https://arxiv.org/abs/2408.03314).

- **Chain-of-thought (CoT) Prompting**: CoT prompting enables LLMs to produce intermediate reasoning steps by breaking problems down into logical sub-steps rather than jumping directly to the final answer. This allows LLMs to perform multi-step inference, which improves performance on complex tasks such as math word problems, logical puzzles, and multi-hop question answering. Properties of CoT that facilitate reasoning in LLMs include:
    - Chain of thought, in principle, allows models to decompose multi-step problems into intermediate steps, which means that additional computation can be allocated to problems that require more reasoning steps
    - A chain of thought provides an interpretable window into the behavior of the model, suggesting how it might have arrived at a particular answer and providing opportunities to debug where the reasoning path went wrong.
    - Chain-of-thought reasoning can be used for tasks such as math word problems, commonsense reasoning, and symbolic manipulation, and is potentially applicable (at least in principle) to any task that humans can solve via language.
    - Chain-of-thought reasoning can be readily elicited in sufficiently large off-the-shelf language models simply by including examples of chain-of-thought sequences into the exemplars of few-shot prompting.

<center><img src="images/cot.png" width="500px" height="300px" /></center>
<center><i>source: <a href="https://arxiv.org/abs/2201.11903">arXiv:2201.11903 [cs.CL] </a></i></center>
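The selection step of the compute-optimal scaling (COS) bullet above can be sketched as follows. Both helpers are hypothetical: `difficulty_level` uses equal-width bins over an estimated success rate (an assumption; the COS paper bins by quantiles), and `compute_optimal_strategy` simply picks, for each difficulty level, the strategy that scored best on your own held-out evaluation at a fixed compute budget.

```python
from bisect import bisect_right

# Assumption: equal-width bins over an estimated success rate in [0, 1].
# The COS paper bins by quantiles instead; these edges are illustrative.
BIN_EDGES = [0.2, 0.4, 0.6, 0.8]


def difficulty_level(estimated_success_rate, edges=BIN_EDGES):
    """Map an estimated success rate to level 1 (easiest) .. 5 (hardest)."""
    return len(edges) + 1 - bisect_right(edges, estimated_success_rate)


def compute_optimal_strategy(level, accuracy_by_level):
    """Choose the strategy with the best held-out accuracy for this difficulty level.

    accuracy_by_level[level][strategy] must come from your own evaluation at a
    fixed compute budget; no real numbers are implied here.
    """
    scores = accuracy_by_level[level]
    return max(scores, key=scores.get)
```

At inference time, the success rate would be estimated (for example, from verifier scores), mapped to a level, and the chosen strategy, such as sequential revisions or beam search, would then be run with the allotted budget.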


- **Self-Consistency Decoding**: Self-consistency was proposed as an alternative to greedy decoding for chain-of-thought prompts. It samples a diverse set of reasoning chains from the model, using temperature sampling (and, optionally, prompt variations) to encourage different CoTs. It then selects the answer that is most consistent across these reasoning paths, either by majority vote or by choosing the highest-probability answer after marginalizing out the latent reasoning. The idea is that if a complex problem has a unique correct answer, different valid reasoning paths should converge to that same answer. In practice, one might sample, e.g., 20 CoTs for a math problem and take the final answer that appears most frequently as the model's output. An extension, Universal Self-Consistency, which uses the LLM itself to select the most consistent answer for free-form outputs, is described [here](https://arxiv.org/abs/2311.17311).

- **Tree-of-Thought (ToT)**: ToT is an iterative prompting procedure where the model generates thoughts, evaluates them, and refines its approach, mimicking how a human might mentally map out various ways to solve a problem. ToT generalizes the chain-of-thought approach by allowing the model to branch out into multiple possible thought sequences instead of following a single linear chain. [Tree of Thoughts](https://arxiv.org/abs/2305.10601) treats intermediate reasoning steps as “nodes” in a search tree and uses the language model to expand possible next steps (thoughts) from a given state. Rather than sampling one long reasoning path, the model explores a tree of branching thoughts and can perform lookahead and backtracking. At each step, the LLM might generate several candidate next thoughts, and a heuristic or value function evaluates each partial solution state. Then a search algorithm (e.g., depth-first, breadth-first, beam search) navigates this tree, deciding which branches to explore further. ToT is more computationally intensive; however, it shows that allocating extra “thinking time” (computing power) to explore alternatives can yield significantly better reasoning and planning performance.

- **Graph of Thoughts (GoT)**: [Graph of Thoughts](https://arxiv.org/abs/2308.09687) improves on ToT by using graph-based structures, which allow more dynamic and efficient reasoning than the strict hierarchical trees used in ToT. In GoT, thoughts are nodes in a graph, so they can have flexible dependencies and interconnections, whereas in ToT each thought is a node in a tree with fixed parent-child relationships. This graph-based approach enables the following key transformations:
    - aggregation (merging multiple solutions into a unified answer), 
    - refinement (iteratively improving thoughts over time), and 
    - generation (producing diverse candidates).
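The majority-vote step of self-consistency can be sketched in a few lines. The reasoning chains below are hypothetical, written as if they had been sampled with temperature > 0 for the CoT question used earlier; `extract_final_answer` assumes each chain ends with "the answer is ...".

```python
import re
from collections import Counter


def extract_final_answer(chain):
    """Take the word after 'answer is' in a reasoning chain (None if absent)."""
    match = re.search(r"answer is\s+(\w+)", chain, flags=re.IGNORECASE)
    return match.group(1) if match else None


def self_consistency(chains):
    """Majority vote over the final answers of independently sampled chains."""
    answers = [a for a in map(extract_final_answer, chains) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical chains, as if sampled with temperature > 0 for the same question.
chains = [
    "Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.",
    "The odd numbers are 15, 5, 13, 7, 1; their sum is 41, which is odd. The answer is False.",
    "Adding the odd numbers (15, 5, 13, 7) gives 40. The answer is True.",
]
print(self_consistency(chains))  # False
```

The third chain skips a number and reaches the wrong verdict, but the vote across chains still recovers the correct answer.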


<center><img src="images/comparison.png" width="650px" height="500px" /></center>
<center><i>source: <a href="https://arxiv.org/abs/2502.21321">arXiv:2502.21321 [cs.CL] </a></i></center>
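In ToT, both the thought generator and the state evaluator are LLM calls. The sketch below replaces them with plain Python stand-ins on a Game-of-24-style puzzle (Game of 24 is one of the tasks evaluated in the ToT paper) so that the breadth-first search loop itself is visible. `propose`, `evaluate`, and `tot_bfs` are hypothetical names, and the distance-to-24 heuristic is an assumption standing in for an LLM value prompt.

```python
import operator

TARGET = 24
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def propose(state):
    """Generate candidate next thoughts: combine two remaining numbers with one operator."""
    numbers, steps = state
    children = []
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i == j:
                continue
            rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
            for symbol, fn in OPS.items():
                if symbol == "/" and b == 0:
                    continue
                result = fn(a, b)
                children.append((tuple(rest + [result]), steps + (f"{a} {symbol} {b} = {result}",)))
    return children


def evaluate(state):
    """Heuristic value (stand-in for an LLM): closeness of the best number to the target."""
    numbers, _ = state
    return -min(abs(n - TARGET) for n in numbers)


def is_solution(state):
    numbers, _ = state
    return len(numbers) == 1 and abs(numbers[0] - TARGET) < 1e-6


def tot_bfs(numbers, beam_width=3, max_depth=3):
    """Breadth-first ToT: expand every frontier state, keep the best beam_width states."""
    frontier = [(tuple(numbers), ())]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for child in candidates:
            if is_solution(child):
                return child
        candidates.sort(key=evaluate, reverse=True)  # most promising thoughts first
        frontier = candidates[:beam_width]
        if not frontier:
            break
    return None


solution = tot_bfs([1, 2, 3, 4])
if solution is not None:
    print("\n".join(solution[1]))
```

Pruning to `beam_width` states per level is what keeps the search tractable; a weaker evaluator can prune away the only path to a solution, which is why ToT invests LLM calls in evaluating partial states.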

- **Confidence-based Sampling**: In confidence-based sampling, the language model generates multiple candidate solutions or reasoning paths and then prioritizes or selects among them based on its own confidence in each outcome. Approaches include:
    - *Selection*: Generate N outputs and pick the one with the highest log-probability (i.e., the model's most confident output). This is essentially a best-of-N approach scored by probability: the model chooses the answer it considers most likely to be correct.
    - *Guided exploration*: When exploring a reasoning tree or multi-step solution, use the model's token probabilities to decide which branch to expand (higher-confidence branches are explored first). The model's probability estimates serve as a heuristic that guides the search through the solution space.

  Application areas for confidence-based sampling include:
    - *Inference-time search*: A tree-based search over LLM generations assigns each possible completion (leaf) a confidence score. These scores are used to decide which paths to extend, when to halt, or whether to ask a follow-up question.
    - *Ensemble settings*: An LLM may generate multiple answers, and a secondary model estimates the confidence that each answer is correct, selecting the answer with the highest confidence.
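The *Selection* approach can be sketched as a best-of-N choice over per-token log-probabilities. This assumes your inference stack can return log-probabilities for each generated token; the candidate texts and values below are made up for illustration, and both helper names are hypothetical.

```python
def sequence_confidence(token_logprobs, length_normalize=True):
    """Sum (or mean) of per-token log-probabilities; higher means more confident."""
    if not token_logprobs:
        return float("-inf")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def select_most_confident(candidates, length_normalize=True):
    """candidates: list of (text, per-token log-probabilities) pairs."""
    best_text, _ = max(
        candidates, key=lambda c: sequence_confidence(c[1], length_normalize)
    )
    return best_text


# Hypothetical candidates with made-up per-token log-probabilities.
candidates = [
    ("The answer is False.", [-0.1, -0.2, -0.1, -0.3]),
    ("The answer is True.", [-0.1, -0.2, -0.1, -1.6]),
]
print(select_most_confident(candidates))  # The answer is False.
```

Averaging instead of summing plays the same role as the length penalty in beam search: without it, shorter candidates win simply because they accumulate fewer negative terms.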


Congratulations! You have completed the lessons. To solidify your understanding, please proceed to the challenge notebook and attempt the problem statement. Good luck!

## <center><div style="text-align:center; color:#FF0000; border:3px solid red;height:80px;"> <b><br/> [Next Notebook](challenge.ipynb) </b> </div></center>

---
### References
- [Prompt Engineering Guides: Prompting Techniques](https://www.promptingguide.ai/techniques)
- [Prompt and Prompt Engineering](https://github.com/aishwaryanr/awesome-generative-ai-guide/blob/main/free_courses/Applied_LLMs_Mastery_2024/week2_prompting.md)
- [NVIDIA NIM](https://build.nvidia.com/)
- [A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)
- [Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314)
- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [Universal Self-Consistency for Large Language Model Generation](https://arxiv.org/abs/2311.17311)
- [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
- [Graph of Thoughts: Solving Elaborate Problems with Large Language Models](https://arxiv.org/abs/2308.09687)


### Licensing
Copyright © 2025 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.