# Task 1 - Zero Shot Classification

### Load OpenAI API Credentials

In [3]:
from getpass import getpass
openai_key = getpass("Enter your openai API Key: ")

Enter your openai API Key:  ········


In [5]:
import openai
from IPython.display import HTML
openai.api_key = openai_key

In [7]:
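def get_completion(prompt, model="gpt-4o-mini"):
    """Send a single user message to the chat completions API and return the reply text."""
    messages = [{"role": "user", "content": prompt}]
    # temperature=0 keeps the output as repeatable as the API allows
    response = openai.chat.completions.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message.content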

### Let's try out the ChatGPT API!

In [10]:
from IPython.display import Markdown, display

In [14]:
response = get_completion(prompt="Write a blog about DecisionTree", model="gpt-4o-mini")
display(Markdown(response))                          

# Understanding Decision Trees: A Comprehensive Guide

In the realm of machine learning and data science, decision trees stand out as one of the most intuitive and widely used algorithms. Their simplicity, interpretability, and effectiveness make them a popular choice for both beginners and seasoned professionals. In this blog, we will explore what decision trees are, how they work, their advantages and disadvantages, and some practical applications.

## What is a Decision Tree?

A decision tree is a flowchart-like structure used for classification and regression tasks. It consists of nodes that represent features (or attributes) of the data, branches that represent decision rules, and leaves that represent outcomes (or target values). The tree is built by splitting the data into subsets based on the value of the features, ultimately leading to a decision or prediction.

### Structure of a Decision Tree

1. **Root Node**: The top node of the tree, representing the entire dataset.
2. **Decision Nodes**: Nodes that split the data based on certain conditions.
3. **Leaf Nodes**: Terminal nodes that provide the final output or prediction.
4. **Branches**: The connections between nodes that represent the flow of decisions.

## How Decision Trees Work

The process of building a decision tree involves several steps:

1. **Selecting the Best Feature**: The algorithm evaluates all features and selects the one that best separates the data into distinct classes. Common criteria for this selection include:
   - **Gini Impurity**: Measures the impurity of a dataset. A lower Gini score indicates a better split.
   - **Entropy**: Measures the amount of disorder or uncertainty in the dataset. The goal is to minimize entropy after the split.
   - **Mean Squared Error (MSE)**: Used for regression tasks to minimize the variance in the target variable.

2. **Splitting the Data**: Once the best feature is selected, the data is split into subsets based on the feature's values.

3. **Recursion**: The process is repeated for each subset, creating new decision nodes until a stopping criterion is met (e.g., maximum depth, minimum samples per leaf, or no further improvement).

4. **Pruning**: To avoid overfitting, the tree may be pruned by removing nodes that provide little predictive power.

## Advantages of Decision Trees

- **Interpretability**: Decision trees are easy to understand and visualize, making them accessible for non-technical stakeholders.
- **No Need for Feature Scaling**: Unlike algorithms such as SVM or K-means, decision trees do not require normalization or standardization of features.
- **Handles Both Numerical and Categorical Data**: Decision trees can work with various data types without the need for extensive preprocessing.
- **Robust to Outliers**: Decision trees are less sensitive to outliers compared to other algorithms.

## Disadvantages of Decision Trees

- **Overfitting**: Decision trees can easily become too complex, capturing noise in the data rather than the underlying pattern.
- **Instability**: Small changes in the data can lead to different tree structures, making them less robust.
- **Bias towards Dominant Classes**: In imbalanced datasets, decision trees may favor the majority class.

## Practical Applications of Decision Trees

Decision trees are versatile and can be applied in various domains, including:

- **Healthcare**: Predicting patient outcomes based on medical history and symptoms.
- **Finance**: Assessing credit risk and determining loan approvals.
- **Marketing**: Segmenting customers based on purchasing behavior and preferences.
- **Manufacturing**: Predicting equipment failures and optimizing maintenance schedules.

## Conclusion

Decision trees are a powerful tool in the data scientist's toolkit, offering a blend of simplicity and effectiveness. While they have their limitations, understanding how to leverage decision trees can lead to valuable insights and predictions across various fields. As you delve deeper into machine learning, consider incorporating decision trees into your projects, and explore their potential to enhance your data-driven decision-making processes. 

Whether you're a beginner or an experienced practitioner, mastering decision trees can significantly enhance your analytical capabilities and help you tackle complex problems with confidence. Happy learning!

In [18]:
print(response)

# Understanding Decision Trees: A Comprehensive Guide

In the realm of machine learning and data science, decision trees stand out as one of the most intuitive and widely used algorithms. Their simplicity, interpretability, and effectiveness make them a popular choice for both beginners and seasoned professionals. In this blog, we will explore what decision trees are, how they work, their advantages and disadvantages, and some practical applications.

## What is a Decision Tree?

A decision tree is a flowchart-like structure used for classification and regression tasks. It consists of nodes that represent features (or attributes) of the data, branches that represent decision rules, and leaves that represent outcomes (or target values). The tree is built by splitting the data into subsets based on the value of the features, ultimately leading to a decision or prediction.

### Structure of a Decision Tree

1. **Root Node**: The top node of the tree, representing the entire dataset.
2. **

# Task 1 - Zero-Shot Classification (without providing any output examples)

In [21]:
reviews = [
    """
    Just received the Bluetooth speaker I ordered for beach outings, and it's fantastic.
    The sound quality is impressively clear with just the right amount of bass.
    It's also waterproof, which tested true during a recent splashing incident.
    Though it's compact, the volume can really fill the space.
    The price was a bargain for such high-quality sound.
    Shipping was also on point, arriving two days early in secure packaging.
    """,
    """
    Needed a new kitchen blender, but this model has been a nightmare.
    It's supposed to handle various foods, but it struggles with anything tougher than cooked vegetables.
    It's also incredibly noisy, and the 'easy-clean' feature is a joke; food gets stuck under the blades constantly.
    I thought the brand meant quality, but this product has proven me wrong.
    Plus, it arrived three days late. Definitely not worth the expense.
    """
]

In [27]:
response = []

for review in reviews:
    prompt = f"""
    Act as a product review analyst. Given the following review, display the overall sentiment for the review as only one of the following:
    Positive, Negative OR Neutral

    '''{review}'''
    """
    responses = get_completion(prompt=prompt, model="gpt-4o-mini")
    response.append(responses)

In [29]:
for review, sentiment in zip(reviews, response):  # separate loop variable so the `response` list is not rebound
    print("Review: ", review)
    print("Sentiment :", sentiment)
    print("***************************")
    print('\n')

Review:  
    Just received the Bluetooth speaker I ordered for beach outings, and it's fantastic.
    The sound quality is impressively clear with just the right amount of bass.
    It's also waterproof, which tested true during a recent splashing incident.
    Though it's compact, the volume can really fill the space.
    The price was a bargain for such high-quality sound.
    Shipping was also on point, arriving two days early in secure packaging.
    
Sentiment : Positive
***************************


Review:  
    Needed a new kitchen blender, but this model has been a nightmare.
    It's supposed to handle various foods, but it struggles with anything tougher than cooked vegetables.
    It's also incredibly noisy, and the 'easy-clean' feature is a joke; food gets stuck under the blades constantly.
    I thought the brand meant quality, but this product has proven me wrong.
    Plus, it arrived three days late. Definitely not worth the expense.
    
Sentiment : Negative
*********
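
The prompt above asks for exactly one of three labels, but nothing in the loop checks that the reply complies. A minimal sketch of such a check, assuming the reply is plain text. `normalize_sentiment` and `ALLOWED_LABELS` are hypothetical names introduced here, not part of the notebook or the openai library.

```python
ALLOWED_LABELS = ("Positive", "Negative", "Neutral")  # labels requested in the prompt


def normalize_sentiment(raw_reply, allowed=ALLOWED_LABELS):
    """Map a free-text model reply onto one allowed label, or None if unclear."""
    cleaned = raw_reply.strip().strip(".!\"'` ").lower()
    # Exact match first: this is the case when the model follows the instruction
    for label in allowed:
        if cleaned == label.lower():
            return label
    # Otherwise accept a substring match, but only if exactly one label appears
    hits = [label for label in allowed if label.lower() in cleaned]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` for ambiguous replies keeps unexpected output visible instead of silently counting it as one of the classes.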

# Task 2 - Few-Shot Classification

In [32]:
response = []

for review in reviews:
    prompt = f"""
              Act as a product review analyst.
              Given the following review,
              Display only the overall sentiment for the review:

              Try to classify it by using the following examples as a reference:

              Review: Just received the Laptop I ordered for work, and it's amazing.
              Sentiment: 😊

              Review: Needed a new mechanical keyboard, but this model has been totally disappointing.
              Sentiment: 😡

              Review: ```{review}```
              """
    responses = get_completion(prompt=prompt, model="gpt-4o-mini")
    response.append(responses)

In [34]:
for review, sentiment in zip(reviews, response):  # separate loop variable so the `response` list is not rebound
    print("Review: ", review)
    print("Sentiment :", sentiment)
    print("***************************")
    print('\n')

Review:  
    Just received the Bluetooth speaker I ordered for beach outings, and it's fantastic.
    The sound quality is impressively clear with just the right amount of bass.
    It's also waterproof, which tested true during a recent splashing incident.
    Though it's compact, the volume can really fill the space.
    The price was a bargain for such high-quality sound.
    Shipping was also on point, arriving two days early in secure packaging.
    
Sentiment : Sentiment: 😊
***************************


Review:  
    Needed a new kitchen blender, but this model has been a nightmare.
    It's supposed to handle various foods, but it struggles with anything tougher than cooked vegetables.
    It's also incredibly noisy, and the 'easy-clean' feature is a joke; food gets stuck under the blades constantly.
    I thought the brand meant quality, but this product has proven me wrong.
    Plus, it arrived three days late. Definitely not worth the expense.
    
Sentiment : Sentiment: 😡
*
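
In the output above the reply already starts with `Sentiment:`, so the print statement shows the prefix twice. The sketch below is a hedged illustration, not the notebook's own code: it builds a few-shot prompt from a list of examples and strips an echoed prefix from the reply. `build_few_shot_prompt`, `strip_label_prefix`, and the exact prompt wording are assumptions.

```python
# The two reference examples used in the few-shot prompt above
FEW_SHOT_EXAMPLES = [
    ("Just received the Laptop I ordered for work, and it's amazing.", "😊"),
    ("Needed a new mechanical keyboard, but this model has been totally disappointing.", "😡"),
]


def build_few_shot_prompt(review, examples=FEW_SHOT_EXAMPLES):
    """Assemble instruction + labelled examples + the new review to classify."""
    shots = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return (
        "Act as a product review analyst.\n"
        "Display only the overall sentiment for the review, "
        "using the following examples as a reference:\n\n"
        f"{shots}\n\n"
        f"Review: {review}\nSentiment:"
    )


def strip_label_prefix(reply, prefix="Sentiment:"):
    """Remove a leading 'Sentiment:' that the model may echo back."""
    reply = reply.strip()
    if reply.lower().startswith(prefix.lower()):
        reply = reply[len(prefix):].strip()
    return reply
```

Keeping the examples in a list makes it easy to add a third class (for instance a neutral emoji) without editing the prompt text by hand.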

# Task 3 - Coding Tasks - Python

In [37]:
prompt =f"""
Act as an expert in generating python code.

Your task is to generate python code that builds a "chain of thought" prompt pattern, similar to how DeepSeek R1 reasons,
using the openai python library
"""

response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

To create a "chain of thought" prompt pattern for deep seeking using the OpenAI Python library, we can structure the code to generate a prompt that encourages the model to think step-by-step through a problem. This approach is particularly useful for complex tasks where reasoning is required.

Below is an example of how to implement this in Python using the OpenAI library. Make sure you have the OpenAI library installed (`pip install openai`) and that you have your API key set up.

```python
import openai

# Set your OpenAI API key
openai.api_key = 'your-api-key-here'

def generate_chain_of_thought_prompt(question):
    # Create a prompt that encourages a step-by-step reasoning process
    prompt = (
        "Let's think through this step by step. "
        "I will break down the problem and analyze each part. "
        f"Here is the question: {question}\n\n"
        "1. First, I need to understand the key components of the question.\n"
        "2. Next, I will identify any relevant information or concepts.\n"
        "3. Then, I will analyze how these components interact with each other.\n"
        "4. Finally, I will draw a conclusion based on my analysis.\n\n"
        "Now, let's start with step 1."
    )
    return prompt

def get_response_from_openai(prompt):
    # openai>=1.0 interface; the legacy openai.ChatCompletion class was removed in 1.0
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",  # or any other model you want to use
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=1500,  # Adjust based on your needs
        temperature=0.7,  # Adjust for creativity
    )
    return response.choices[0].message.content

def main():
    question = "What are the implications of quantum computing on cryptography?"
    prompt = generate_chain_of_thought_prompt(question)
    response = get_response_from_openai(prompt)
    print("Response from OpenAI:\n", response)

if __name__ == "__main__":
    main()
```

### Explanation:
1. **API Key**: Replace `'your-api-key-here'` with your actual OpenAI API key.
2. **Prompt Generation**: The `generate_chain_of_thought_prompt` function creates a structured prompt that guides the model to think through the question step-by-step.
3. **OpenAI API Call**: The `get_response_from_openai` function sends the generated prompt to the OpenAI API and retrieves the response.
4. **Main Function**: The `main` function sets a sample question, generates the prompt, and prints the response from the model.

### Usage:
- You can modify the `question` variable in the `main` function to test different queries.
- Adjust the `max_tokens` and `temperature` parameters in the API call to control the length and creativity of the response.

This code provides a framework for generating a "chain of thought" prompt pattern that can be adapted for various questions and scenarios.

In [39]:
prompt =f"""
Act as an expert in generating python code.

Your task is to generate python code to build a "supervised machine learning by using Logistic Regression" for
Titanic Dataset by openai python library
"""

response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

To build a supervised machine learning model using Logistic Regression on the Titanic dataset, you can use the `pandas`, `scikit-learn`, and `numpy` libraries in Python. Below is a step-by-step guide along with the code to achieve this.

### Step 1: Install Required Libraries
Make sure you have the required libraries installed. You can install them using pip if you haven't done so already:

```bash
pip install pandas scikit-learn numpy
```

### Step 2: Load the Titanic Dataset
You can download the Titanic dataset from Kaggle or use a CSV file if you have it locally. For this example, let's assume you have a CSV file named `titanic.csv`.

### Step 3: Preprocess the Data
You need to preprocess the data by handling missing values, encoding categorical variables, and selecting relevant features.

### Step 4: Build and Train the Logistic Regression Model
Finally, you will build and train the Logistic Regression model.

Here is the complete code:

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Step 1: Load the dataset
data = pd.read_csv('titanic.csv')

# Step 2: Preprocess the data
# Drop columns that are not useful for prediction
data = data.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)

# Fill missing values (assign back rather than using chained inplace=True, which newer pandas versions deprecate)
data['Age'] = data['Age'].fillna(data['Age'].median())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])

# Convert categorical variables to numerical
data = pd.get_dummies(data, columns=['Sex', 'Embarked'], drop_first=True)

# Step 3: Define features and target variable
X = data.drop('Survived', axis=1)
y = data['Survived']

# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 6: Create and train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 7: Make predictions
y_pred = model.predict(X_test)

# Step 8: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)
```

### Explanation of the Code:
1. **Loading the Dataset**: The Titanic dataset is loaded into a pandas DataFrame.
2. **Preprocessing**:
   - Unnecessary columns are dropped.
   - Missing values in the 'Age' and 'Embarked' columns are filled.
   - Categorical variables ('Sex' and 'Embarked') are converted to numerical using one-hot encoding.
3. **Feature and Target Definition**: Features (X) and target variable (y) are defined.
4. **Train-Test Split**: The dataset is split into training and testing sets.
5. **Feature Scaling**: StandardScaler is used to standardize the features.
6. **Model Creation and Training**: A Logistic Regression model is created and trained on the training data.
7. **Predictions**: Predictions are made on the test set.
8. **Evaluation**: The model's accuracy, confusion matrix, and classification report are printed.

Make sure to adjust the path to the Titanic dataset CSV file as needed. This code provides a basic implementation of Logistic Regression for the Titanic dataset. You can further enhance it by performing more advanced feature engineering, hyperparameter tuning, and model evaluation techniques.
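
To make the feature-scaling step concrete, here is a small pure-Python sketch of what standardization does: the mean and standard deviation are learned from the training values only and then reused for the test values. The helper names and the sample ages are made up for illustration; this mirrors the idea behind `fit_transform` on the training set and `transform` on the test set, not scikit-learn's actual implementation.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Learn mean and (population) standard deviation from training data only."""
    mean = fmean(train_values)
    std = pstdev(train_values) or 1.0  # guard against a zero-variance column
    return mean, std


def standardize(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]


train_ages = [22.0, 38.0, 26.0, 35.0]  # illustrative values, not real Titanic rows
test_ages = [54.0, 2.0]

mean, std = fit_standardizer(train_ages)
train_scaled = standardize(train_ages, mean, std)
test_scaled = standardize(test_ages, mean, std)  # reuses the training statistics
```

Because the test values never influence `mean` or `std`, the evaluation is not contaminated by information from the held-out split.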

In [43]:
prompt =f"""
Act as an expert in generating python code.

Your task is to generate python code to build a "supervised machine learning by using Deep Neural Network (MLP)" for
Titanic Dataset by openai python library.
Please preprocess the data extensively per the instructions below, and then build the Multi-Layer Perceptron model:
1) missing value treatment by using KNN imputation
2) encoding 3) feature scaling 4) outlier treatment 5) imbalance treatment
"""

response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

To build a supervised machine learning model using a Deep Neural Network (MLP) for the Titanic dataset, we will follow the steps you outlined: preprocessing the data, treating missing values, encoding categorical variables, scaling features, handling outliers, and addressing class imbalance. Below is a complete Python code that accomplishes this using libraries such as `pandas`, `numpy`, `scikit-learn`, and `tensorflow`.

Make sure you have the necessary libraries installed. You can install them using pip if you haven't done so already:

```bash
pip install pandas numpy scikit-learn tensorflow imbalanced-learn
```

Here’s the complete code:

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.utils import resample
from tensorflow import keras
from tensorflow.keras import layers

# Load the Titanic dataset
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
data = pd.read_csv(url)

# Display the first few rows of the dataset
print(data.head())

# Step 1: Preprocessing
# Drop columns that won't be used
data.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1, inplace=True)

# Step 2: Missing value treatment using KNN Imputation
# Identify categorical and numerical columns
categorical_cols = ['Sex', 'Embarked']
numerical_cols = ['Age', 'Fare', 'SibSp', 'Parch']

# Create a KNN Imputer
imputer = KNNImputer(n_neighbors=5)

# Impute missing values
data[numerical_cols] = imputer.fit_transform(data[numerical_cols])

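# Step 3: Encoding categorical variables
# One-hot encoding for categorical features; dtype=float keeps the dummy columns
# numeric so the feature matrix has a single float dtype when passed to Keras
data = pd.get_dummies(data, columns=categorical_cols, drop_first=True, dtype=float)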

# Step 4: Feature scaling
scaler = StandardScaler()
data[numerical_cols] = scaler.fit_transform(data[numerical_cols])

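# Step 5: Outlier treatment using Isolation Forest
# Fit on the features only (not the 'Survived' target) and fix the seed for reproducibility
iso_forest = IsolationForest(contamination=0.05, random_state=42)
outliers = iso_forest.fit_predict(data.drop('Survived', axis=1))
data = data[outliers != -1]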

# Step 6: Imbalance treatment using resampling
X = data.drop('Survived', axis=1)
y = data['Survived']

# Separate majority and minority classes
X_majority = X[y == 0]
y_majority = y[y == 0]
X_minority = X[y == 1]
y_minority = y[y == 1]

# Upsample minority class
X_minority_upsampled, y_minority_upsampled = resample(X_minority, y_minority,
                                                       replace=True,     # sample with replacement
                                                       n_samples=len(y_majority),    # to match majority class
                                                       random_state=123) # reproducible results

# Combine majority class with upsampled minority class
X_balanced = pd.concat([X_majority, X_minority_upsampled])
y_balanced = pd.concat([y_majority, y_minority_upsampled])

# Step 7: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_balanced, y_balanced, test_size=0.2, random_state=42)

# Step 8: Build the MLP model
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # Output layer for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Step 9: Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Step 10: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {accuracy:.4f}')
```

### Explanation of the Code:
1. **Data Loading**: The Titanic dataset is loaded from a URL.
2. **Preprocessing**:
   - Unnecessary columns are dropped.
   - Missing values in numerical columns are treated using KNN imputation.
   - Categorical variables are encoded using one-hot encoding.
   - Features are scaled using `StandardScaler`.
   - Outliers are detected and removed using the Isolation Forest algorithm.
   - Class imbalance is addressed by upsampling the minority class.
3. **Model Building**: A simple MLP model is created using Keras with two hidden layers.
4. **Training and Evaluation**: The model is trained on the training set and evaluated on the test set.

Make sure to adjust the parameters and architecture of the model as needed based on your specific requirements and dataset characteristics.
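
One caveat about the generated pipeline: it upsamples the minority class before the train-test split, so copies of the same passenger can land in both sets and inflate the test accuracy. A minimal pure-Python sketch of the safer ordering (split first, then upsample the training part only) follows. `split_then_upsample` and the toy rows are hypothetical and only illustrate the ordering, not the notebook's actual data handling.

```python
import random


def split_then_upsample(rows, label_of, test_fraction=0.2, seed=42):
    """Hold out a test set first, then balance classes inside the training set only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Group training rows by class label
    by_class = {}
    for row in train:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())

    # Sample with replacement until every class reaches the majority size
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced, test


# Toy data: (id, label) pairs with 10 positives and 30 negatives
rows = [(i, 1 if i % 4 == 0 else 0) for i in range(40)]
train, test = split_then_upsample(rows, label_of=lambda r: r[1])
```

The test set keeps its original class distribution and contains no duplicated rows, so its accuracy reflects unseen passengers.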

# Task 4 - Coding Tasks - SQL

In [46]:
prompt = f"""
Act as an expert in generating SQL code.

understanding the following schema of the database tables carefully.
Table departments, columns = [DepID, DepName]
Table employees, columns = [EmpID, EmpName, DepName]
Table salaries, columns = [EmpID, Salary]

create a MySQL query for the employee with max salary in the 'IT' Department.
"""

response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

To retrieve the employee with the maximum salary in the 'IT' department, you can use a SQL query that joins the `employees`, `salaries`, and `departments` tables. Here’s how you can construct the query:

```sql
SELECT e.EmpID, e.EmpName, s.Salary
FROM employees e
JOIN salaries s ON e.EmpID = s.EmpID
JOIN departments d ON e.DepName = d.DepName
WHERE d.DepName = 'IT'
ORDER BY s.Salary DESC
LIMIT 1;
```

### Explanation:
1. **SELECT**: We select the employee ID, employee name, and salary.
2. **FROM**: We start from the `employees` table (aliased as `e`).
3. **JOIN**: We join the `salaries` table (aliased as `s`) on the `EmpID` to get the salary information. We also join the `departments` table (aliased as `d`) on the `DepName` to filter by department.
4. **WHERE**: We filter the results to only include employees in the 'IT' department.
5. **ORDER BY**: We order the results by salary in descending order to get the highest salary first.
6. **LIMIT 1**: We limit the results to just one record, which will be the employee with the maximum salary in the 'IT' department.

This query will return the employee with the highest salary in the specified department.

In [48]:
prompt = f"""
Act as an expert in generating SQL code.

understanding the following schema of the database tables carefully.
Table departments, columns = [DepID, DepName]
Table employees, columns = [EmpID, EmpName, DepName]
Table salaries, columns = [EmpID, Salary]

create a ProcSQL query in SAS for the employee with max salary in the 'IT' Department.
"""

response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

To create a Proc SQL query in SAS that retrieves the employee with the maximum salary in the 'IT' department, you can follow these steps:

1. Join the `employees` table with the `salaries` table to get the salary information for each employee.
2. Filter the results to include only those employees who belong to the 'IT' department.
3. Use the `MAX` function to find the maximum salary and then retrieve the corresponding employee details.

Here is the SAS Proc SQL code to achieve this:

```sas
proc sql;
    select e.EmpID, e.EmpName, s.Salary
    from employees e
    join salaries s on e.EmpID = s.EmpID
    where e.DepName = 'IT'
    and s.Salary = (select max(Salary) 
                    from salaries s2 
                    join employees e2 on s2.EmpID = e2.EmpID 
                    where e2.DepName = 'IT')
    ;
quit;
```

### Explanation:
- The `select` statement retrieves the employee ID, employee name, and salary.
- The `from` clause specifies the `employees` table (aliased as `e`) and joins it with the `salaries` table (aliased as `s`) on the `EmpID`.
- The `where` clause filters the results to include only employees in the 'IT' department.
- The subquery `(select max(Salary) ...)` finds the maximum salary for employees in the 'IT' department, ensuring that only the employee(s) with that maximum salary are returned.

This query will return the employee(s) with the highest salary in the 'IT' department. If there are multiple employees with the same maximum salary, all of them will be included in the result.
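
The difference between the `ORDER BY ... LIMIT 1` form and the `MAX` subquery form shows up when two employees share the top salary. The sketch below checks this with Python's built-in `sqlite3` module on made-up rows. SQLite stands in for MySQL and SAS here, which is an assumption, and the `departments` join is left out because `employees` already carries `DepName`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (EmpID INTEGER PRIMARY KEY, EmpName TEXT, DepName TEXT);
    CREATE TABLE salaries (EmpID INTEGER, Salary REAL);
    -- Two IT employees tie for the highest IT salary
    INSERT INTO employees VALUES (1, 'Asha', 'IT'), (2, 'Ben', 'IT'), (3, 'Chen', 'HR');
    INSERT INTO salaries VALUES (1, 9000), (2, 9000), (3, 12000);
""")

limit_query = """
    SELECT e.EmpID, e.EmpName, s.Salary
    FROM employees e
    JOIN salaries s ON e.EmpID = s.EmpID
    WHERE e.DepName = 'IT'
    ORDER BY s.Salary DESC
    LIMIT 1
"""

subquery = """
    SELECT e.EmpID, e.EmpName, s.Salary
    FROM employees e
    JOIN salaries s ON e.EmpID = s.EmpID
    WHERE e.DepName = 'IT'
      AND s.Salary = (SELECT MAX(s2.Salary)
                      FROM salaries s2
                      JOIN employees e2 ON s2.EmpID = e2.EmpID
                      WHERE e2.DepName = 'IT')
    ORDER BY e.EmpID
"""

limit_rows = conn.execute(limit_query).fetchall()
tie_rows = conn.execute(subquery).fetchall()
print(len(limit_rows), len(tie_rows))  # prints: 1 2
```

With `LIMIT 1`, which of the tied employees is returned is not defined, so the subquery form is the safer choice when ties matter.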

# Task 5 - Information Extraction

In [51]:
clinical_note = """
60-year-old man in NAD with a h/o CAD, DM2, asthma, pharyngitis, SBP,
and HTN on altace for 8 years awoke from sleep around 1:00 am this morning
with a sore throat and swelling of the tongue.
He came immediately to the ED because he was having difficulty swallowing and
some trouble breathing due to obstruction caused by the swelling.
He did not have any associated SOB, chest pain, itching, or nausea.
He has not noticed any rashes.
He says that he feels like it is swollen down in his esophagus as well.
He does not recall vomiting but says he might have retched a bit.
In the ED he was given 25mg benadryl IV, 125 mg solumedrol IV,
and pepcid 20 mg IV.
Family history of CHF and esophageal cancer (father).
"""

In [53]:
prompt = f"""
Act as an expert in analyzing and understanding clinical doctor notes in healthcare.
Extract all symptoms only from the clinical note information below.
Differentiate between symptoms that are present vs. absent.
Give me the probability (high/ medium/ low) of how sure you are about the result.
Add a note on the probabilities and why you think so.

Output as a markdown table with the following columns,
all symptoms should be expanded and no acronyms unless you don't know:

Symptoms | Present/Denies | Probability.

Also expand all acronyms.
Output that also as a separate appendix table in Markdown.

Clinical Note:
```{clinical_note}```
"""
response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

Here is the extracted information from the clinical note regarding symptoms:

| Symptoms                                   | Present/Denies | Probability |
|--------------------------------------------|----------------|-------------|
| Sore throat                                | Present        | High        |
| Swelling of the tongue                     | Present        | High        |
| Difficulty swallowing                       | Present        | High        |
| Trouble breathing                           | Present        | High        |
| Shortness of breath                         | Denies         | High        |
| Chest pain                                 | Denies         | High        |
| Itching                                    | Denies         | High        |
| Nausea                                     | Denies         | High        |
| Rashes                                      | Denies         | High        |
| Swelling in the esophagus                  | Present        | Medium      |
| Vomiting                                   | Denies         | Medium      |
| Retching                                   | Present        | Medium      |

### Note on Probabilities:
- **High Probability**: Symptoms such as sore throat, swelling of the tongue, difficulty swallowing, trouble breathing, and the absence of shortness of breath, chest pain, itching, nausea, and rashes are clearly stated in the clinical note. The language used is definitive, indicating a strong presence or absence of these symptoms.
- **Medium Probability**: The swelling in the esophagus and retching are mentioned but are less definitive. The patient expresses a feeling of swelling in the esophagus, which is subjective and not directly observed. Retching is mentioned as a possibility rather than a certainty, leading to a medium probability rating.

### Appendix Table

| Acronym          | Expanded Form                     |
|------------------|-----------------------------------|
| NAD              | No Acute Distress                 |
| h/o              | History of                        |
| CAD              | Coronary Artery Disease           |
| DM2              | Diabetes Mellitus Type 2         |
| SBP              | Systolic Blood Pressure           |
| HTN              | Hypertension                      |
| IV               | Intravenous                       |
| CHF              | Congestive Heart Failure          | 
| esophageal cancer | Esophageal Cancer                 |
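
Since the reply is a Markdown table, it can be turned into Python records for further filtering. The parser below is a rough sketch that assumes one well-formed pipe table per call; `parse_markdown_table` is a hypothetical helper, and the sample rows are shortened from the output above.

```python
def parse_markdown_table(markdown):
    """Convert a single Markdown pipe table into a list of dicts keyed by header."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # ignore prose around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the |---|---| separator row
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, cells)) for cells in body]


sample = """
| Symptoms     | Present/Denies | Probability |
|--------------|----------------|-------------|
| Sore throat  | Present        | High        |
| Chest pain   | Denies         | High        |
"""

records = parse_markdown_table(sample)
present = [r["Symptoms"] for r in records if r["Present/Denies"] == "Present"]
```

If a reply contains several tables, as it does here with the appendix, each table would need to be passed in separately.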

# Task 6 - Translation

In [66]:
prompt = """You are an expert translator. Translate the given text from English to German, Spanish, Hindi, Tamil, Telugu,
Gujarati, Marathi, Bengali, Tulu, Oriya, Maithili, Malayalam, Kannada, Punjabi, and Nepali.

Text:"Subject: Request for Sick Leave on [Leave Dates]
Dear Manager,
I am feeling unwell and would like to request sick leave for [Leave Dates]. I will ensure to complete any pending tasks once I return and keep you updated if there are any changes.
Please let me know if you need any further information."

Translation:
"""
response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

Here are the translations of the provided text into the requested languages:

**German:**
Betreff: Antrag auf Krankheitsurlaub für [Urlaubsdaten]
Sehr geehrter Manager,
ich fühle mich unwohl und möchte für [Urlaubsdaten] Krankheitsurlaub beantragen. Ich werde sicherstellen, dass ich alle ausstehenden Aufgaben erledige, sobald ich zurück bin, und Sie informieren, falls es Änderungen gibt.
Bitte lassen Sie mich wissen, ob Sie weitere Informationen benötigen.

**Spanish:**
Asunto: Solicitud de licencia por enfermedad del [Fechas de licencia]
Estimado/a Gerente,
me siento mal y me gustaría solicitar una licencia por enfermedad del [Fechas de licencia]. Me aseguraré de completar cualquier tarea pendiente una vez que regrese y le mantendré informado/a si hay algún cambio.
Por favor, hágame saber si necesita más información.

**Hindi:**
विषय: [अवकाश तिथियों] पर बीमार छुट्टी के लिए अनुरोध
प्रिय प्रबंधक,
मैं अस्वस्थ महसूस कर रहा हूँ और [अवकाश तिथियों] के लिए बीमार छुट्टी का अनुरोध करना चाहता हूँ। मैं सुनिश्चित करूंगा कि लौटने के बाद कोई भी लंबित कार्य पूरा कर दूं और यदि कोई परिवर्तन होता है तो आपको अपडेट रखूंगा।
कृपया मुझे बताएं कि क्या आपको किसी और जानकारी की आवश्यकता है।

**Tamil:**
หัวข้อ: [விடுமுறை தேதிகள்] இல் நோய்வாய்ப்பு விடுமுறை கோரிக்கை
அன்புள்ள மேலாளர்,
நான் உடல் நலக்குறைவால் பாதிக்கப்படுகிறேன் மற்றும் [விடுமுறை தேதிகள்] இல் நோய்வாய்ப்பு விடுமுறை கோர விரும்புகிறேன். நான் திரும்பிய பிறகு எந்தவொரு நிலுவையில் உள்ள பணிகளையும் முடிக்க உறுதி செய்கிறேன் மற்றும் எந்த மாற்றங்களும் இருந்தால் உங்களை புதுப்பிக்கிறேன்.
மேலும் தகவலுக்கு நீங்கள் என்னை தொடர்பு கொள்ளவும்.

**Telugu:**
విషయం: [విడుదల తేదీలు] పై రోగ సెలవు కోసం అభ్యర్థన
ప్రియమైన మేనేజర్,
నేను ఆరోగ్యంగా లేను మరియు [విడుదల తేదీలు] కోసం రోగ సెలవు కోరుతున్నాను. నేను తిరిగి వచ్చిన తర్వాత ఎలాంటి పెండింగ్ పనులను పూర్తి చేస్తాను మరియు మార్పులు ఉంటే మీకు సమాచారం అందిస్తాను.
మీకు మరింత సమాచారం అవసరమైతే దయచేసి నాకు తెలియజేయండి.

**Gujarati:**
વિષય: [છુટ્ટા તારીખો] પર બીમારીની રજા માટે વિનંતી
પ્રિય મેનેજર,
હું અસ્વસ્થ અનુભવી રહ્યો છું અને [છુટ્ટા તારીખો] માટે બીમારીની રજા માંગું છું. હું પાછા ફર્યા પછી કોઈપણ બાકી કામ પૂર્ણ કરવા ખાતરી આપીશ અને જો કોઈ ફેરફાર થાય તો તમને અપડેટ રાખીશ.
કૃપા કરીને મને જણાવો કે શું તમને વધુ માહિતીની જરૂર છે.

**Marathi:**
विषय: [अवकाश तारीख] वर आजारी सुट्टीसाठी विनंती
प्रिय व्यवस्थापक,
मी अस्वस्थ आहे आणि [अवकाश तारीख] साठी आजारी सुट्टीची विनंती करतो. मी परत आल्यानंतर कोणतीही प्रलंबित कामे पूर्ण करण्याची खात्री करतो आणि कोणतेही बदल झाल्यास तुम्हाला अपडेट ठेवतो.
कृपया तुम्हाला आणखी माहिती हवी असल्यास मला कळवा.

**Bengali:**
বিষয়: [ছুটির তারিখ] এ অসুস্থ ছুটির জন্য আবেদন
প্রিয় ম্যানেজার,
আমি অসুস্থ অনুভব করছি এবং [ছুটির তারিখ] এ অসুস্থ ছুটির জন্য আবেদন করতে চাই। আমি ফিরে আসার পর যে কোনও বাকি কাজ সম্পন্ন করার নিশ্চয়তা দেব এবং যদি কোনও পরিবর্তন হয় তবে আপনাকে আপডেট রাখব।
আপনার যদি আরও তথ্যের প্রয়োজন হয় তবে দয়া করে আমাকে জানান।

**Tulu:**
ವಿಷಯ: [ಊರ ದಿನಾಂಕಗಳು] ರೋಗಾವಸ್ಥೆ ರಜೆಗೆ ವಿನಂತಿ
ಪ್ರಿಯ ಮ್ಯಾನೇಜರ್,
ನಾನು ಆರೋಗ್ಯವಾಗಿಲ್ಲ ಮತ್ತು [ಊರ ದಿನಾಂಕಗಳು] ರೋಗಾವಸ್ಥೆ ರಜೆಗೆ ವಿನಂತಿ ಮಾಡುತ್ತೇನೆ. ನಾನು ಹಿಂದಿರುಗಿದ ನಂತರ ಯಾವುದೇ ಬಾಕಿ ಕೆಲಸಗಳನ್ನು ಪೂರ್ಣಗೊಳಿಸಲು ಖಚಿತಪಡಿಸುತ್ತೇನೆ ಮತ್ತು ಯಾವುದೇ ಬದಲಾವಣೆಗಳಿದ್ದರೆ ನಿಮಗೆ ಮಾಹಿತಿ ನೀಡುತ್ತೇನೆ.
ನಿಮಗೆ ಇನ್ನಷ್ಟು ಮಾಹಿತಿಯ ಅಗತ್ಯವಿದ್ದರೆ ದಯವಿಟ್ಟು ನನಗೆ ತಿಳಿಸಿ.

**Odia:**
ବିଷୟ: [ଛୁଟି ତାରିଖ] ରେ ରୋଗ ଛୁଟି ପାଇଁ ଅନୁରୋଧ
ପ୍ରିୟ ମ୍ୟାନେଜର,
ମୁଁ ଅସୁସ୍ଥ ଅନୁଭବ କରୁଛି ଏବଂ [ଛୁଟି ତାରିଖ] ରେ ରୋଗ ଛୁଟି ପାଇଁ ଅନୁରୋଧ କରୁଛି। ମୁଁ ଫେରିବା ପରେ କୌଣସି ବକି ତାଲିକା କାମ ସମ୍ପୂର୍ଣ୍ଣ କରିବାକୁ ନିଶ୍ଚିତ କରିବି ଏବଂ ଯଦି କୌଣସି ପରିବର୍ତ୍ତନ ହୁଏ ତେବେ ଆପଣଙ୍କୁ ଅପଡେଟ୍ ରଖିବି।
ଦୟାକରି ଆପଣଙ୍କୁ ଅଧିକ ସୂଚନା ଆବଶ୍ୟକ ହେଲେ ମୋତେ ଜଣାଇବେ।

**Maithili:**
विषय: [छुट्टी के तिथि] पर बीमार छुट्टी के लिए अनुरोध
प्रिय प्रबंधक,
हम अस्वस्थ महसूस कर रहल छी और [छुट्टी के तिथि] पर बीमार छुट्टी के लिए अनुरोध कर रहल छी। हम लौटने के बाद कोई भी लंबित कार्य पूरा करब और यदि कोई परिवर्तन होए त हम अपने के अपडेट रखब।
कृपया बताउ कि यदि अपने के और जानकारी के आवश्यकता अछि।

**Malayalam:**
വിഷയം: [അവധിയുടെ തീയതികൾ] നുള്ള രോഗ അവധിക്ക് അപേക്ഷ
പ്രിയ മാനേജർ,
ഞാൻ അസുഖം അനുഭവിക്കുന്നു, [അവധിയുടെ തീയതികൾ] നുള്ള രോഗ അവധിക്ക് അപേക്ഷിക്കാനാണ് ആഗ്രഹിക്കുന്നത്. ഞാൻ തിരികെ വന്ന ശേഷം ബാക്കി ഉള്ള എല്ലാ ജോലികളും പൂർത്തിയാക്കാൻ ഉറപ്പുനൽകുന്നു, കൂടാതെ മാറ്റങ്ങൾ ഉണ്ടെങ്കിൽ നിങ്ങളെ അപ്ഡേറ്റ് ചെയ്യാൻ ഞാൻ ശ്രമിക്കും.
കൂടുതൽ വിവരങ്ങൾ ആവശ്യമുണ്ടെങ്കിൽ ദയവായി എന്നെ അറിയിക്കുക.

**Kannada:**
ವಿಷಯ: [ಅವಕಾಶ ದಿನಾಂಕಗಳು] ರೋಗ ರಜೆಗೆ ವಿನಂತಿ
ಪ್ರಿಯ ಮ್ಯಾನೇಜರ್,
ನಾನು ಆರೋಗ್ಯವಾಗಿಲ್ಲ ಮತ್ತು [ಅವಕಾಶ ದಿನಾಂಕಗಳು] ರೋಗ ರಜೆಗೆ ವಿನಂತಿ ಮಾಡುತ್ತೇನೆ. ನಾನು ಹಿಂದಿರುಗಿದ ನಂತರ ಯಾವುದೇ ಬಾಕಿ ಕೆಲಸಗಳನ್ನು ಪೂರ್ಣಗೊಳಿಸಲು ಖಚಿತಪಡಿಸುತ್ತೇನೆ ಮತ್ತು ಯಾವುದೇ ಬದಲಾವಣೆಗಳಿದ್ದರೆ ನಿಮಗೆ ಮಾಹಿತಿ ನೀಡುತ್ತೇನೆ.
ನಿಮಗೆ ಇನ್ನಷ್ಟು ಮಾಹಿತಿಯ ಅಗತ್ಯವಿದ್ದರೆ ದಯವಿಟ್ಟು ನನಗೆ ತಿಳಿಸಿ.

**Punjabi:**
ਵਿਸ਼ਾ: [ਛੁੱਟੀ ਦੀਆਂ ਤਾਰੀਖਾਂ] 'ਤੇ ਬਿਮਾਰੀ ਦੀ ਛੁੱਟੀ ਲਈ ਬੇਨਤੀ
ਪਿਆਰੇ ਮੈਨੇਜਰ,
ਮੈਂ ਅਸੁਥ ਹੋ ਰਿਹਾ ਹਾਂ ਅਤੇ [ਛੁੱਟੀ ਦੀਆਂ ਤਾਰੀਖਾਂ] ਲਈ ਬਿਮਾਰੀ ਦੀ ਛੁੱਟੀ ਦੀ ਬੇਨਤੀ ਕਰਨਾ ਚਾਹੁੰਦਾ ਹਾਂ। ਮੈਂ ਵਾਪਸ ਆਉਣ 'ਤੇ ਕਿਸੇ ਵੀ ਬਾਕੀ ਕੰਮ ਨੂੰ ਪੂਰਾ ਕਰਨ ਦੀ ਯਕੀਨੀ ਬਣਾਵਾਂਗਾ ਅਤੇ ਜੇ ਕੋਈ ਬਦਲਾਅ ਹੋਵੇ ਤਾਂ ਤੁਹਾਨੂੰ ਅਪਡੇਟ ਰੱਖਾਂਗਾ।
ਕਿਰਪਾ ਕਰਕੇ ਮੈਨੂੰ ਦੱਸੋ ਕਿ ਕੀ ਤੁਹਾਨੂੰ ਹੋਰ ਜਾਣਕਾਰੀ ਦੀ ਲੋੜ ਹੈ।

**Nepali:**
विषय: [छुट्टीको मिति] मा बिरामी बिदाको लागि अनुरोध
प्रिय प्रबन्धक,
म अस्वस्थ महसुस गर्दैछु र [छुट्टीको मिति] मा बिरामी बिदाको लागि अनुरोध गर्न चाहन्छु। म फर्केपछि कुनै पनि बाँकी काम पूरा गर्ने सुनिश्चित गर्नेछु र यदि कुनै परिवर्तन भएमा तपाईंलाई अपडेट राख्नेछु।
कृपया मलाई बताउनुहोस् कि यदि तपाईंलाई थप जानकारीको आवश्यकता छ भने। 

Feel free to ask if you need any further assistance!
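In the response above, the Tamil subject line opens with a word in Thai script ("หัวข้อ") rather than Tamil, which is easy to miss when reading fifteen translations by eye. The sketch below is one possible sanity check, not part of the original notebook: it uses the standard-library `unicodedata` module to list the scripts present in a string. The helper names `scripts_used` and `unexpected_scripts` are hypothetical, and taking the first word of the Unicode character name as the script is a rough heuristic.

```python
import unicodedata

def scripts_used(text):
    """Return the set of script names found in `text`.

    Heuristic: the first word of a character's Unicode name
    (e.g. "TAMIL LETTER VA" -> "TAMIL"). Only letters and
    combining marks are inspected; digits and punctuation are skipped.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def unexpected_scripts(text, expected_script):
    """Return scripts present in `text` other than `expected_script`."""
    return scripts_used(text) - {expected_script}

# Start of the Tamil subject line from the response above.
tamil_subject = "หัวข้อ: [விடுமுறை தேதிகள்]"
print(unexpected_scripts(tamil_subject, "TAMIL"))
```

A non-empty result only flags a line for manual review; it says nothing about whether the translation itself is accurate.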

In [70]:
prompt = """You are an expert translator. Translate the given text from English to German, Spanish, Hindi, Tamil, Telugu,
Gujarati, Marathi, Bengali, Tulu, Oriya, Maithili, Malayalam, Kannada, Punjabi, and Nepali.

Text:"Hi, how are you"

Translation:
"""
response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

Here are the translations for "Hi, how are you" in the requested languages:

- **German**: "Hallo, wie geht es dir?"
- **Spanish**: "Hola, ¿cómo estás?"
- **Hindi**: "नमस्ते, आप कैसे हैं?"
- **Tamil**: "வணக்கம், நீங்கள் எப்படி இருக்கிறீர்கள்?"
- **Telugu**: "హాయ్, మీరు ఎలా ఉన్నారు?"
- **Gujarati**: "હાય, તમે કેમ છો?"
- **Marathi**: "नमस्कार, तुम्ही कसे आहात?"
- **Bengali**: "হ্যালো, তুমি কেমন আছো?"
- **Tulu**: "ನಮಸ್ಕಾರ, ನಿನಗೆ ಹೇಗಿದೆ?"
- **Oriya**: "ନମସ୍କାର, ତୁମେ କେମିତି ଅଛ?"
- **Maithili**: "नमस्कार, अहाँ केहन छी?"
- **Malayalam**: "ഹായ്, നീ എങ്ങനെയുണ്ട്?"
- **Kannada**: "ಹಾಯ್, ನೀವು ಹೇಗಿದ್ದೀರಿ?"
- **Punjabi**: "ਸਤ ਸ੍ਰੀ ਅਕਾਲ, ਤੁਸੀਂ ਕਿਵੇਂ ਹੋ?"
- **Nepali**: "नमस्ते, तपाईँलाई कस्तो छ?"

If you need any further assistance, feel free to ask!
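To use these translations in later cells, the bulleted response can be turned into a dictionary. The following is a minimal sketch, assuming the model keeps the `- **Language**: "text"` layout shown above; that layout is not guaranteed by the API, so unmatched lines are simply skipped. `parse_translations` is a hypothetical helper name.

```python
import re

# Matches lines shaped like: - **German**: "Hallo, wie geht es dir?"
LINE_PATTERN = re.compile(r'^\s*-\s*\*\*(?P<language>[^*]+)\*\*:\s*"(?P<text>.*)"\s*$')

def parse_translations(response_text):
    """Map language name -> translated text for every matching bullet line."""
    translations = {}
    for line in response_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            translations[match.group("language").strip()] = match.group("text")
    return translations

# Shortened copy of the response above, used here instead of a live API call.
sample = '''Here are the translations for "Hi, how are you" in the requested languages:

- **German**: "Hallo, wie geht es dir?"
- **Spanish**: "Hola, ¿cómo estás?"

If you need any further assistance, feel free to ask!'''

parsed = parse_translations(sample)
print(parsed)
```

In the notebook, `parse_translations(response)` could be called on the string returned by `get_completion`. If the model changes its formatting, the result will be an empty or partial dictionary, so checking the number of keys against the number of requested languages is a reasonable guard.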

# Task 7 - Document Summarization

In [73]:
doc = """
Coronaviruses are a large family of viruses which may cause illness in animals or humans.
In humans, several coronaviruses are known to cause respiratory infections ranging from the
common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).
The most recently discovered coronavirus causes coronavirus disease COVID-19.
COVID-19 is the infectious disease caused by the most recently discovered coronavirus.
This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.
COVID-19 is now a pandemic affecting many countries globally.
The most common symptoms of COVID-19 are fever, dry cough, and tiredness.
Other symptoms that are less common and may affect some patients include aches
and pains, nasal congestion, headache, conjunctivitis, sore throat, diarrhea,
loss of taste or smell or a rash on skin or discoloration of fingers or toes.
These symptoms are usually mild and begin gradually.
Some people become infected but only have very mild symptoms.
Most people (about 80%) recover from the disease without needing hospital treatment.
Around 1 out of every 5 people who gets COVID-19 becomes seriously ill and develops difficulty breathing.
Older people, and those with underlying medical problems like high blood pressure, heart and lung problems,
diabetes, or cancer, are at higher risk of developing serious illness.
However, anyone can catch COVID-19 and become seriously ill.
People of all ages who experience fever and/or cough associated with difficulty breathing/shortness of breath,
chest pain/pressure, or loss of speech or movement should seek medical attention immediately.
If possible, it is recommended to call the health care provider or facility first,
so the patient can be directed to the right clinic.
People can catch COVID-19 from others who have the virus.
The disease spreads primarily from person to person through small droplets from the nose or mouth,
which are expelled when a person with COVID-19 coughs, sneezes, or speaks.
These droplets are relatively heavy, do not travel far and quickly sink to the ground.
People can catch COVID-19 if they breathe in these droplets from a person infected with the virus.
This is why it is important to stay at least 1 meter away from others.
These droplets can land on objects and surfaces around the person such as tables, doorknobs and handrails.
People can become infected by touching these objects or surfaces, then touching their eyes, nose or mouth.
This is why it is important to wash your hands regularly with soap and water or clean with alcohol-based hand rub.
Practicing hand and respiratory hygiene is important at ALL times and is the best way to protect others and yourself.
When possible maintain at least a 1 meter distance between yourself and others.
This is especially important if you are standing by someone who is coughing or sneezing.
Since some infected persons may not yet be exhibiting symptoms or their symptoms may be mild,
maintaining a physical distance with everyone is a good idea if you are in an area where COVID-19 is circulating."""

In [75]:
prompt = f"""
You are an expert in generating accurate document summaries.
Generate a summary of the given document.

Document:
{doc}

Constraints: Please start the summary with the delimiter "SUMMARY:" and limit the summary to 5 lines only.

SUMMARY:
"""
response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

SUMMARY: Coronaviruses can cause respiratory infections in humans, with COVID-19 being the most recent and severe outbreak, first identified in Wuhan, China, in December 2019. Common symptoms include fever, cough, and tiredness, while serious cases can lead to difficulty breathing, especially in older adults and those with underlying health issues. The virus spreads primarily through respiratory droplets, making physical distancing and hand hygiene crucial for prevention. Most people recover without hospitalization, but immediate medical attention is advised for severe symptoms.
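The prompt asks for a fixed delimiter and a length limit, but nothing in the notebook verifies that the response complies. The sketch below is one way to check, under two assumptions that are not in the original: the delimiter is written as `SUMMARY:` (as in the response above), and "5 lines" is approximated by counting sentences, since the rendered Markdown arrives as a single paragraph. `check_summary` is a hypothetical helper.

```python
import re

def check_summary(response_text, delimiter="SUMMARY:", max_sentences=5):
    """Report whether a summary starts with the delimiter and stays within the limit.

    Sentences are split naively on '.', '!' or '?' followed by whitespace,
    so abbreviations such as "M.K." will be over-counted.
    """
    text = response_text.strip()
    has_delimiter = text.startswith(delimiter)
    body = text[len(delimiter):].strip() if has_delimiter else text
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]
    return {
        "has_delimiter": has_delimiter,
        "sentence_count": len(sentences),
        "within_limit": len(sentences) <= max_sentences,
    }

# Illustrative input only; in the notebook this would be `response`.
report = check_summary("SUMMARY: First point. Second point. Third point.")
print(report)
```

A failed check could trigger a retry with a stricter prompt, though the notebook does not do this.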

In [79]:
editorial = """
By calling for an all-party meeting on March 5 to discuss the delimitation exercise, Tamil Nadu Chief Minister M.K. Stalin has sought to stir up a national debate. Delimitation, as an exercise, has not seen any increase in the legislative seats since 1973, having been frozen as per the 1971 Census for parity in population growth across States. This was done to prevent States with a higher population growth from getting “rewarded” with a higher number of representatives at the cost of other States with better health indices and lower growth. The 84th constitutional Amendment had stipulated that the delimitation exercise would be based on the first Census after 2026. Is the Union government inexplicably delaying the Census exercise to allow for the delimitation exercise to be held earlier? In the normal scheme of things, the exercise would have been done after the 2031 Census, but it is now possible after 2026 once the Census (due in 2021), is conducted.
The concern of Tamil Nadu that the exercise could hurt its representation in Parliament is legitimate if the Centre intends the process to be only proportionally representative of each State’s population. This is illustrated in the population growth rates (1971-2024) in Tamil Nadu and undivided Bihar. The electorate, for which recent data is available, grew by 171% in the former as against 233% in the latter, while they had a comparable number of Lok Sabha MPs (39 versus 54, including Jharkhand). If delimitation was held and constituencies redrawn to match population growth, and even if the overall Lok Sabha tally was increased, the final number for Tamil Nadu would clearly be much lower than Bihar’s. Other States with reduced fertility rates, such as Kerala and Karnataka, will also be affected. Home Minister Amit Shah has said that there will be no “reduction on a pro-rata basis” for Southern States and that they would get their “rightful share” but there has been little clarity on whether this would mean that their proportion of representatives will be retained after delimitation. The significant increase in population since 1973 should lead to an increased number of representatives, and, therefore, a higher number of seats, especially in north India’s highly populated States. Yet, the equally important principle of federalism should suggest the need for the proportions of representation to be maintained to keep the balance of power intact across States. More importantly, the government must expedite the Census just to allay concerns that it has been delayed to facilitate an earlier and controversial delimitation. A nation striving to reach the higher pedestals of world power cannot afford to delay the basic exercise of counting the number of its own people.


"""

In [81]:
prompt = f"""
You are an expert in generating accurate document summaries.
Generate a summary of the given document.

Document:
{editorial}

Constraints: Please start the summary with the delimiter "SUMMARY:" and limit the summary to 5 lines only.

SUMMARY:
"""
response = get_completion(prompt, model="gpt-4o-mini")
display(Markdown(response))

SUMMARY: Tamil Nadu Chief Minister M.K. Stalin has called for an all-party meeting on March 5 to discuss the implications of the delimitation exercise, which has not seen an increase in legislative seats since 1973. Concerns arise that the Union government may delay the Census to facilitate earlier delimitation, potentially reducing Tamil Nadu's parliamentary representation. Home Minister Amit Shah has assured that Southern States will retain their rightful share, but clarity is lacking. The document emphasizes the need for an expedited Census to address these concerns and maintain federal balance.
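The two summarization cells repeat the same prompt with only the document changed. A small wrapper avoids the duplication; the version below is a sketch, not the notebook's own code. `build_summary_prompt` and `summarize` are hypothetical names, and `complete` stands for any function that maps a prompt string to a response string, such as the `get_completion` helper defined at the top of the notebook. A stand-in function is used here so the example runs without an API key.

```python
def build_summary_prompt(document, max_lines=5):
    """Build the summarization prompt used in the cells above for any document."""
    return f"""
You are an expert in generating accurate document summaries.
Generate a summary of the given document.

Document:
{document}

Constraints: Please start the summary with the delimiter "SUMMARY:" and limit the summary to {max_lines} lines only.

SUMMARY:
"""

def summarize(document, complete, max_lines=5):
    """Send the summary prompt through `complete` and return the raw response."""
    return complete(build_summary_prompt(document, max_lines))

# Stand-in for get_completion; it returns a fixed string instead of calling the API.
def fake_complete(prompt):
    return "SUMMARY: (model output would appear here)"

result = summarize("Some document text.", fake_complete)
print(result)
```

In the notebook, the call would look like `summarize(doc, get_completion)` or `summarize(editorial, get_completion, max_lines=3)`.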