#### Chain of Thought (CoT) in RAG Applications
- CoT can be leveraged in RAG applications to help the model break down complex queries into simpler sub-steps before generating a final output.
  
- The model can first retrieve relevant information from external sources (like documents or databases) and then reason step-by-step based on that information.

- **Use case**: In a question-answering RAG application, retrieve the relevant information first, then prompt the model to reason through it in explicit steps before giving its final answer.
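
As a rough illustration of the retrieve-then-reason flow described above, the sketch below uses a toy keyword-overlap retriever to select the context that would be passed to a step-by-step prompt. The `tokenize` and `retrieve` helpers, the stopword list, and the sample documents are assumptions for illustration only; a real RAG application would typically use embeddings and a vector index instead.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"the", "a", "of", "is", "to", "how", "will"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query
    and return up to k documents that share at least one word."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

# Hypothetical document store
documents = [
    "The forecast predicts a temperature of 30°C tomorrow.",
    "The patient reports a persistent dry cough.",
    "Rain is expected to continue tomorrow morning.",
]

query = "How will the weather change tomorrow?"
context = "\n".join(retrieve(query, documents))
print(context)
```

The resulting `context` string plays the same role as the hand-written `context` passed to `chain_of_thought(query, context)` in the cells below.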

In [1]:
import openai

In [2]:
client = openai.OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    # api_key = api_key
)

In [3]:
# Chain of Thought Example: Reasoning through steps
def chain_of_thought(query, context):
    # Constructing the chat messages
    messages = [
        {"role": "system", "content": "You are a helpful assistant in the field of weather forecasting."},
        {"role": "user", "content": f"Let us break down the reasoning for the following question step by step:\n\n{query}\n\nRelevant context:\n{context}\n\nFirst, summarize the key concepts involved, then explore the different theories before giving a final conclusion."}
    ]
    
    response = client.chat.completions.create(
        messages = messages,
        model    = "gpt-4o-mini",
    )
    
    return response.choices[0].message.content

In [4]:
# data (weather related facts)
context = """
The temperature today is 28°C, which is considered warm.
The forecast predicts a slight increase in temperature tomorrow, reaching 30°C.
It has been raining for two days straight, which might bring cooler weather.
The humidity level is also high, which can make the temperature feel warmer.
"""

query = "How will the weather change tomorrow?"

In [5]:
# Run Chain of Thought reasoning
result = chain_of_thought(query, context)
print(result)

### Key Concepts Involved:

1. **Current Temperature**: Today’s temperature is 28°C.
2. **Forecasted Temperature**: Tomorrow’s forecast estimates a rise to 30°C.
3. **Precipitation Pattern**: It has been raining for two consecutive days.
4. **Impact of Rain**: Continuous rain may lead to cooler weather.
5. **Humidity Levels**: High humidity is present, which can affect how warm the temperature feels.

### Exploring Different Theories:

1. **Temperature Increase (Thermal Theory)**:
   - The forecast indicates a slight increase in temperature, meaning that by tomorrow, the air is expected to warm up despite the ongoing cloud cover from the rain.
   - A rise from 28°C to 30°C suggests that daytime warming may still occur even after consecutive rainy days.

2. **Influence of Previous Rain (Precipitation Theory)**:
   - Rain can have a cooling effect, particularly if it has been continuous as is the case. The evaporation of water can lead to a drop in temperature.
   - However, if the rain 

In [6]:
# Chain of Thought example for a medical question
# (redefines chain_of_thought with a medical system prompt)
def chain_of_thought(query, context):
    # Constructing the chat messages
    messages = [
        {"role": "system", "content": "You are a helpful assistant in the field of medicine."},
        {"role": "user", "content": f"Let's break down the reasoning for the following question step by step:\n\n{query}\n\nRelevant context:\n{context}\n\nFirst, summarize the possible causes, then explore diagnostic tests or other relevant factors before giving a final conclusion."}
    ]
    
    response = client.chat.completions.create(
        messages = messages,
        model    = "gpt-4o-mini",
    )
    
    return response.choices[0].message.content


In [7]:
# Example Data (Patient's symptoms)
context = """
The patient has been experiencing a persistent dry cough for the past three weeks.
Other symptoms include mild fever, shortness of breath, and occasional chest tightness.
There is a history of seasonal allergies but no known history of asthma or chronic lung disease.
The patient is a non-smoker and has not been exposed to any known irritants.
"""

query = "What could be causing a patient's persistent cough?"

In [8]:
result = chain_of_thought(query, context)
print(result)

Let's break down the potential causes of the patient's persistent dry cough, considering the relevant context provided.

### Possible Causes of Persistent Cough:

1. **Upper Respiratory Infection (URI)**:
   - Viral infections (e.g., common cold) can cause a dry cough that persists even after other symptoms have resolved.
   - Bacterial infections (e.g., bronchitis) may also contribute.

2. **Allergic Rhinitis**:
   - The patient's history of seasonal allergies could lead to post-nasal drip, which can cause a dry cough.

3. **Asthma**:
   - While there is no known history of asthma, new-onset asthma or allergic asthma can present with a dry cough, especially if triggered by allergens.

4. **Chronic Sinusitis**:
   - Inflammation of the sinuses can lead to persistent cough due to drainage into the throat (post-nasal drip).

5. **Gastroesophageal Reflux Disease (GERD)**:
   - Acid reflux can irritate the throat and airways, leading to a chronic cough.

6. **Environmental Irritants**:
   

#### How to check that the LLM considers all possible causes in a Chain of Thought reasoning process

#### 1. Define Expected Causes and Reasoning Pathways
- Pre-define a set of known causes or hypotheses for the query at hand (e.g., for a medical diagnosis, there are standard differential diagnoses). This helps set a baseline of what should be considered in the model's response.
- Create a comprehensive list of all potential causes based on domain knowledge, research papers, textbooks, or expert opinions. This list will serve as a reference to compare the LLM's response.

**Example**: If you are asking a question about **chronic cough**, your expected causes might include:
- **Infections** (e.g., viral, bacterial, tuberculosis)
- **Asthma**
- **Gastroesophageal reflux disease (GERD)**
- **Post-nasal drip**
- **Chronic obstructive pulmonary disease (COPD)**
- **Lung cancer**

#### 2. Comprehensive CoT Prompting
Ensure the prompt explicitly asks the model to list all potential causes, systematically reason through them, and eliminate or support each cause based on the symptoms or context provided.

```python
"Given the following symptoms: chronic cough, low-grade fever, and fatigue, please list all potential causes of these symptoms. For each cause, explain why it could or could not be the most likely reason. Then, provide a ranking of these causes from most likely to least likely."
```

#### 3. Manual Evaluation: Compare Responses to Known Causes
- **Verify the model’s response** by comparing the causes it lists with your pre-defined set of potential reasons. This comparison allows you to check if the model has missed any critical causes.
- **Check the reasoning quality**: After listing the causes, review the model’s explanation for each cause. Does it explain the reasoning process in a clear and logical way? Is there a chain of reasoning that supports the choice of each cause, or does the model seem to jump to conclusions?

**Manual Evaluation Criteria**:
- **Completeness**: Did the model consider the major categories of causes? For instance, did it include **infectious, chronic, environmental, and lifestyle factors** for a medical diagnosis?
- **Relevance**: Are the causes listed logically related to the symptoms or context provided?
- **Elimination of causes**: Did the model eliminate unlikely causes based on the reasoning (e.g., ruling out a certain cause due to lack of supporting symptoms or medical history)?
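
A first pass of the completeness check can be automated before a human reviews the reasoning. The sketch below compares a response against the chronic cough reference list from step 1 using plain substring matching. The `check_coverage` function, the keyword variants per cause, and the sample response are assumptions for illustration.

```python
# Reference causes for a chronic cough, taken from the example list in step 1.
# The keyword variants for each cause are illustrative assumptions.
EXPECTED_CAUSES = {
    "Infections": ["infection", "viral", "bacterial", "tuberculosis"],
    "Asthma": ["asthma"],
    "GERD": ["gerd", "reflux"],
    "Post-nasal drip": ["post-nasal drip", "postnasal drip"],
    "COPD": ["copd", "chronic obstructive pulmonary disease"],
    "Lung cancer": ["lung cancer"],
}

def check_coverage(response, expected=EXPECTED_CAUSES):
    """Split the expected causes into those the response mentions
    and those it does not mention at all."""
    text = response.lower()
    covered = [
        cause for cause, keywords in expected.items()
        if any(keyword in text for keyword in keywords)
    ]
    missing = [cause for cause in expected if cause not in covered]
    return covered, missing

# Hypothetical model response used only to exercise the function
sample_response = (
    "Possible causes include a viral infection, allergic rhinitis with "
    "post-nasal drip, new-onset asthma, and GERD."
)

covered, missing = check_coverage(sample_response)
print("Covered:", covered)
print("Missing:", missing)
```

Substring matching only shows that a cause was mentioned. It cannot tell whether the model supported or ruled out that cause, so the reasoning quality and elimination criteria still need a manual read.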

#### 4. Use of Follow-up Questions
After receiving the initial response, you can use **follow-up questions** to probe deeper into the reasoning process and ensure all possibilities were considered:

**Follow-up Examples**:
- "What about less common causes like **X** or **Y**? Could they also explain the symptoms?"
- "Are there any other **environmental** or **lifestyle factors** that could contribute to these symptoms?"
- "Have you considered rare conditions or conditions that mimic these symptoms?"

If the model fails to consider or address these follow-up queries adequately, it might indicate that it hasn't covered all possible causes.
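
For a follow-up to build on the first answer, the earlier exchange has to be sent back along with the new question. The sketch below shows one way to extend a chat message list in the same role/content format used in the cells above. The `add_follow_up` helper and the placeholder answer are assumptions, and no API call is made here.

```python
def add_follow_up(messages, previous_answer, follow_up):
    """Return a new message list that extends the conversation with the
    model's previous answer and a follow-up question from the user."""
    return messages + [
        {"role": "assistant", "content": previous_answer},
        {"role": "user", "content": follow_up},
    ]

messages = [
    {"role": "system", "content": "You are a helpful assistant in the field of medicine."},
    {"role": "user", "content": "What could be causing a patient's persistent cough?"},
]

# Placeholder standing in for the model's first response
first_answer = "Possible causes include a viral infection, asthma, and GERD."

follow_up = (
    "Are there any other environmental or lifestyle factors "
    "that could contribute to these symptoms?"
)

extended = add_follow_up(messages, first_answer, follow_up)
for message in extended:
    print(message["role"], ":", message["content"][:60])
```

The `extended` list can then be passed as the `messages` argument of `client.chat.completions.create`, as in the earlier cells.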

#### 5. Check for Diversity in Reasoning
Evaluate whether the model offers a **diverse range of reasons** or focuses on only a few common causes. For example, a response that covers only common viral infections, without considering autoimmune conditions, environmental triggers, or rare diseases, suggests incomplete reasoning.

You can check for diversity by introducing **alternative hypotheses** in your prompt:
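
As an illustrative sketch, the function below builds a prompt that names the alternative hypotheses explicitly, so the model has to address each one. The `diversity_prompt` name and the prompt wording are assumptions; the symptoms are reused from the example in step 2.

```python
def diversity_prompt(symptoms, alternative_hypotheses):
    """Build a prompt asking the model to evaluate named alternative
    hypotheses in addition to the causes it proposes on its own."""
    hypotheses = "\n".join(f"- {h}" for h in alternative_hypotheses)
    return (
        f"Given the following symptoms: {symptoms}, list all potential causes.\n"
        "In addition to the causes you consider most likely, explicitly evaluate "
        "each of these alternative hypotheses and state whether the symptoms "
        "support it or rule it out:\n"
        f"{hypotheses}"
    )

prompt = diversity_prompt(
    "chronic cough, low-grade fever, and fatigue",
    ["autoimmune conditions", "environmental triggers", "rare diseases"],
)
print(prompt)
```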


#### 6. Use Multiple LLMs or Expert Validation
- **Cross-validation with multiple models**: If you're unsure whether the LLM has considered all possible causes, you can run the query through multiple instances or different models (e.g., GPT-4 and GPT-3) to compare responses and check for consistency.
- **Consult domain experts**: In specialized fields like medicine, you can compare the model's responses to expert opinions or existing literature. Expert feedback can confirm whether the model has covered all major causes or missed important ones.
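
Once the causes listed by each model have been extracted (by hand or with a keyword check), the comparison itself is simple set arithmetic. The sketch below is a minimal example; `compare_models`, the model names, and the cause sets are placeholders, not real model output.

```python
def compare_models(causes_by_model):
    """Given {model_name: set_of_causes}, return the causes every model
    listed and, per model, the causes other models listed but it did not."""
    all_causes = set().union(*causes_by_model.values())
    consensus = set.intersection(*causes_by_model.values())
    gaps = {
        model: all_causes - causes
        for model, causes in causes_by_model.items()
    }
    return consensus, gaps

# Placeholder data: causes extracted from two hypothetical model responses
causes_by_model = {
    "model_a": {"Asthma", "GERD", "Infections"},
    "model_b": {"Asthma", "Infections", "COPD"},
}

consensus, gaps = compare_models(causes_by_model)
print("Consensus:", sorted(consensus))
for model, missed in gaps.items():
    print(model, "did not list:", sorted(missed))
```

Causes that appear in a model's gap set are candidates for a follow-up question or for expert review.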

#### 7. Test the Model with Edge Cases
Test the model's reasoning by **deliberately introducing edge cases** or less typical scenarios:
- **Uncommon causes**: For a medical symptom, for example, ask whether any rare conditions could explain it.
- **Overlooked possibilities**: Check if the model recognizes environmental, lifestyle, or psychological factors that might influence the condition.
  
If the model fails to mention these less common causes, you might need to adjust the prompt to guide it toward considering a broader range of possibilities.

#### 8. Scoring or Ranking Responses
A more systematic evaluation can **score the model’s responses** on completeness and accuracy. For example, you can use a scoring system that assigns points for:
- **Number of causes mentioned** (e.g., did it cover a broad range or focus only on one or two obvious causes?).
- **Correctness of reasoning** (e.g., did the model logically eliminate improbable causes based on symptoms?).
- **Depth of explanation** (e.g., did the model provide a detailed rationale for each cause?).
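
One possible shape for such a scoring system is sketched below. The `ResponseScore` class, the 0 to 1 reviewer ratings, and the default weights are all assumptions chosen for illustration; any real rubric should be set by someone with domain knowledge.

```python
from dataclasses import dataclass

@dataclass
class ResponseScore:
    causes_mentioned: int     # expected causes the response covered
    causes_expected: int      # size of the reference list
    reasoning_correct: float  # reviewer rating between 0 and 1
    explanation_depth: float  # reviewer rating between 0 and 1

    def total(self, weights=(0.4, 0.4, 0.2)):
        """Weighted score between 0 and 1.
        The default weights are arbitrary assumptions."""
        completeness = self.causes_mentioned / self.causes_expected
        w_complete, w_reason, w_depth = weights
        return (
            w_complete * completeness
            + w_reason * self.reasoning_correct
            + w_depth * self.explanation_depth
        )

# Hypothetical ratings for a single response
score = ResponseScore(
    causes_mentioned=4,
    causes_expected=6,
    reasoning_correct=0.75,
    explanation_depth=0.5,
)
print(round(score.total(), 3))
```

Keeping the three components separate makes it easy to see whether a low score comes from missing causes or from weak reasoning.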
