<a href="https://colab.research.google.com/github/klemenp950/ElektronskaTajnica/blob/main/appointment_email_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Doctor Appointment Email Generator with Phi-3

This notebook uses Microsoft's Phi-3 language model to intelligently process transcribed doctor appointment conversations and generate professional emails.

## 1. Install Required Libraries

Run this cell to install the necessary dependencies.

In [1]:
!pip install transformers torch accelerate bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl (59.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m59.4/59.4 MB[0m [31m41.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: bitsandbytes
Successfully installed bitsandbytes-0.48.2


## 2. Import Libraries

In [2]:
import json
import re
from datetime import datetime
from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 3. Load Phi-3 Model

Load Microsoft's Phi-3.5-mini-instruct. This is a small but powerful language model optimized for instruction following.

In [4]:
# Model selection
MODEL_NAME = "microsoft/Phi-3.5-mini-instruct"
print("Loading Phi-3 model...")
print(f"Model: {MODEL_NAME}")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Load model with automatic device mapping
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

print("✓ Model loaded successfully!")
print(f"Device: {next(model.parameters()).device}")

Loading Phi-3 model...
Model: microsoft/Phi-3.5-mini-instruct


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

configuration_phi3.py: 0.00B [00:00, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py: 0.00B [00:00, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/195 [00:00<?, ?B/s]

✓ Model loaded successfully!
Device: cuda:0


## 4. Load Transcription from File

Provide the path to your .txt file containing the transcribed conversation.

In [5]:
def load_transcription(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()

## 5. Process Single Transcription File

Convert an individual transcription file into an email-ready JSON package.

In [6]:
def process_transcription_file(file_path, output_dir='email_outputs'):
    conversation = load_transcription(file_path)
    email_data = generate_email_with_llm(conversation)
    source_name = Path(file_path).stem
    output_path = save_email_to_json(email_data, conversation, output_dir, source_name=source_name)
    return email_data, output_path

## 6. Batch Process Multiple Transcription Files

Process multiple .txt files at once from a directory.

In [7]:
def process_transcription_files(directory_path, file_pattern="*.txt", output_dir='email_outputs'):
    from glob import glob
    import os

    # Find all matching files
    search_path = os.path.join(directory_path, file_pattern)
    txt_files = glob(search_path)

    if not txt_files:
        print(f"No files found matching: {search_path}")
        return []

    print(f"Found {len(txt_files)} transcription file(s)")
    print("="*60)

    results = []

    for i, txt_file in enumerate(txt_files, 1):
        print(f"\n[{i}/{len(txt_files)}] Processing: {os.path.basename(txt_file)}")

        try:
            email, output_path = process_transcription_file(txt_file, output_dir)
            results.append((txt_file, email, output_path))
            print(f"✓ Success: {os.path.basename(output_path)}")
        except Exception as e:
            print(f"✗ Error processing {os.path.basename(txt_file)}: {str(e)}")
            continue

    print("\n" + "="*60)
    print(f"Processed {len(results)}/{len(txt_files)} files successfully")

    return results

## 7. Generate Professional Email Using Phi-3

Use the language model to generate a well-formatted, professional email directly from the conversation.

In [8]:
def generate_email_with_llm(conversation):
    prompt = f"""<|system|>
You are a professional email writer. Convert the following phone conversation into a professional email that a patient would send to their doctor's office to request or confirm an appointment.

The email should:
- Be polite and professional
- Include all relevant details (patient name, reason for visit, preferred date/time)
- Be concise and well-formatted
- Use proper email etiquette

Provide the email in the following format:
SUBJECT: [subject line]

BODY:
[email body]<|end|>
<|user|>
Conversation:
{conversation}

Write a professional email for this appointment request.<|end|>
<|assistant|>
"""

    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        temperature=0.7,
        do_sample=True,
        top_p=0.95,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        use_cache=False # Disable caching
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract the assistant's response
    if "<|assistant|>" in response:
        response = response.split("<|assistant|>")[-1].strip()

    # Parse subject and body
    subject = "Appointment Request"
    body = response

    subject_match = re.search(r'SUBJECT:\s*(.+?)(?:\n|$)', response, re.IGNORECASE)
    if subject_match:
        subject = subject_match.group(1).strip()
        # Extract body after BODY: marker
        body_match = re.search(r'BODY:\s*(.+)', response, re.IGNORECASE | re.DOTALL)
        if body_match:
            body = body_match.group(1).strip()

    # Remove "BODY:" and any text before it from the body
    body = re.sub(r'^.*?BODY:\s*', '', body, flags=re.IGNORECASE | re.DOTALL)

    # Remove everything after "---"
    if "---" in body:
        body = body.split("---")[0].strip()


    # Return only the email body
    return body

## 8. Save Email to JSON File

In [9]:
def save_email_to_json(email_data, conversation_text, output_dir='email_outputs', source_name=None):
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    if source_name:
        safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", source_name).strip("_")
        filename = f"{safe_name}_{timestamp}.json" if safe_name else f"appointment_email_{timestamp}.json"
    else:
        filename = f"appointment_email_{timestamp}.json"
    filepath = output_path / filename

    output_data = {
        'transcript': conversation_text,
        'email': email_data,
        'created_at': datetime.now().isoformat(),
        'model_used': MODEL_NAME
    }

    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(output_data, f, indent=2, ensure_ascii=False)

    return str(filepath)

## 9. Usage Examples

### Single File Processing
Process one transcription file at a time.

In [10]:
transcription_file = "/content/drive/MyDrive/transcriptions/transcript_002.txt"

email, filepath = process_transcription_file(transcription_file)

print("\n" + "="*60)
print("GENERATED EMAIL:")
print("="*60)
print("EMAIL BODY")
print("="*60)
print(email)
print("="*60)
print(f"\nSaved to: {filepath}")

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



GENERATED EMAIL:
EMAIL BODY
Dear Greenfield Family Clinic Office,


I hope this message finds you well. My name is Isabella Carter, and I am reaching out to request an appointment with Dr. Brown, as I have been experiencing some digestive issues that I believe warrant professional evaluation.


I would like to schedule a visit for the upcoming Tuesday at 1 PM, if that is available. Please let me know if this time slot can be confirmed for me, or if alternative arrangements are necessary.


Thank you for your assistance and attention to this matter. I look forward to your confirmation and am hopeful for a resolution to my health concerns.


Warm regards,


Isabella Carter

[Contact Information: Phone Number and/or Email Address]

Saved to: email_outputs/transcript_002_20251109_195426.json


### Batch Processing
Process multiple transcription files from a directory.

In [11]:
transcriptions_directory = "/content/drive/MyDrive/transcriptions/"

results = process_transcription_files(transcriptions_directory, "*.txt")

# Display summary of generated emails
print("\n" + "="*60)
print("PROCESSING SUMMARY:")
print("="*60)
for txt_file, email, output_path in results:
    print(f"\nFile: {Path(txt_file).name}")
    print(f"Subject: {email}")
    print(f"Output: {output_path}")

Found 81 transcription file(s)

[1/81] Processing: transcript_002.txt
✓ Success: transcript_002_20251109_195608.json

[2/81] Processing: transcript_003.txt
✓ Success: transcript_003_20251109_195625.json

[3/81] Processing: transcript_004.txt
✓ Success: transcript_004_20251109_195637.json

[4/81] Processing: transcript_005.txt
✓ Success: transcript_005_20251109_195705.json

[5/81] Processing: transcript_007.txt
✓ Success: transcript_007_20251109_195721.json

[6/81] Processing: transcript_009.txt
✓ Success: transcript_009_20251109_195734.json

[7/81] Processing: transcript_010.txt
✓ Success: transcript_010_20251109_195756.json

[8/81] Processing: transcript_012.txt
✓ Success: transcript_012_20251109_195815.json

[9/81] Processing: transcript_013.txt
✓ Success: transcript_013_20251109_195829.json

[10/81] Processing: transcript_014.txt
✓ Success: transcript_014_20251109_195842.json

[11/81] Processing: transcript_015.txt
✓ Success: transcript_015_20251109_195858.json

[12/81] Processing: 

TypeError: string indices must be integers, not 'str'

In [13]:
!zip -r /content/email_outputs.zip /content/email_outputs
from google.colab import files
files.download('/content/email_outputs.zip')

updating: content/email_outputs/ (stored 0%)
updating: content/email_outputs/transcript_037_20251109_200421.json (deflated 49%)
updating: content/email_outputs/transcript_053_20251109_200743.json (deflated 48%)
updating: content/email_outputs/transcript_005_20251109_195705.json (deflated 48%)
updating: content/email_outputs/transcript_083_20251109_201557.json (deflated 49%)
updating: content/email_outputs/transcript_082_20251109_201542.json (deflated 48%)
updating: content/email_outputs/transcript_026_20251109_200142.json (deflated 48%)
updating: content/email_outputs/transcript_031_20251109_200250.json (deflated 51%)
updating: content/email_outputs/transcript_089_20251109_201726.json (deflated 51%)
updating: content/email_outputs/transcript_027_20251109_200159.json (deflated 50%)
updating: content/email_outputs/transcript_002_20251109_195608.json (deflated 49%)
updating: content/email_outputs/transcript_075_20251109_201338.json (deflated 49%)
updating: content/email_outputs/transcript

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [14]:
!jupyter nbconvert --ClearMetadataPreprocessor.enabled=True --to notebook --inplace appointment_transcript_generator.ipynb

This application is used to convert notebook files (*.ipynb)
        to various other formats.


Options
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--debug
    set log level to logging.DEBUG (maximize logging output)
    Equivalent to: [--Application.log_level=10]
--show-config
    Show the application's configuration (human-readable format)
    Equivalent to: [--Application.show_config=True]
--show-config-json
    Show the application's configuration (json format)
    Equivalent to: [--Application.show_config_json=True]
--generate-config
    generate default config file
    Equivalent to: [--JupyterApp.generate_config=True]
-y
    Answer yes to any questions instead of prompting.
    Equivalent to: [--JupyterApp.answer_yes=True]
--execute
    Execute the notebook prior to export.
    Equivalent to: [--ExecutePr