The docs with a person's name are the input docs; the docs whose names start with MSR are the output docs.

I want to create a Python Streamlit app that lets managers upload the input docs and then processes each input doc in turn.
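
A minimal sketch of the upload step, assuming Streamlit's `st.file_uploader` is called with `accept_multiple_files=True`. The helper name `collect_uploads` is hypothetical. It only relies on each uploaded item being an in-memory byte buffer with a `name` attribute, so it can be tried without Streamlit installed.

```python
import io
from typing import Dict, Iterable


def collect_uploads(files: Iterable[io.BytesIO]) -> Dict[str, bytes]:
    """Map each uploaded .docx file name to its raw bytes.

    Each item needs a `name` attribute and a `getvalue()` method,
    which io.BytesIO-based upload objects provide.
    """
    collected = {}
    for f in files:
        name = getattr(f, "name", "")
        if name.lower().endswith(".docx"):
            collected[name] = f.getvalue()
    return collected


# In the Streamlit script this would be wired up roughly as:
#   import streamlit as st
#   uploaded = st.file_uploader("Input docs", type=["docx"],
#                               accept_multiple_files=True)
#   docs = collect_uploads(uploaded or [])
```

Keeping the file handling in a plain function means the same logic can be reused from the notebook cells that follow.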

Process:

* Create a master file: just a variable in the script (save to a Word or text file later).
   a. Look into each file:
       i. Extract the file name.
       ii. Extract the data from the file.
   b. Do the same for all input docs.


* Create output docs
   1. Dump all into the LLM:
      1. Sonnet: 1 hallucination
      2. Gemini:
   2. By task order
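
The master-file step above can be sketched as a single in-memory string, assuming the text of each input doc has already been extracted. The function name `build_master_text` and the sample file names are hypothetical.

```python
from typing import Dict


def build_master_text(docs: Dict[str, str]) -> str:
    """Join extracted documents into one string, labelling each with its file name."""
    sections = []
    for filename in sorted(docs):  # sorted so the output order is stable
        sections.append(f"File: {filename}\nContent: {docs[filename]}\n" + "-" * 50)
    return "\n\n".join(sections)


master = build_master_text({"b.docx": "second", "a.docx": "first"})
print(master)
```

Labelling every section with its file name keeps the person's name next to their content when the text is passed to the LLM.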

Design decisions:
- Do one task order at a time, instead of dumping all files in at once.
  - Why: 
    - Simplifies the task: avoids the complexity of managing many files.
    - Reduces the risk of hallucination from a very long context.
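
One way to process a single task order at a time is to walk the per-task-order folders in numeric order. This is a sketch that assumes the folders follow the `TO<number>_<short name>_<division>` naming used later in this notebook; `iter_task_order_dirs` is a hypothetical helper.

```python
import os
import re
import tempfile
from typing import Iterator, Tuple

TO_DIR = re.compile(r"^TO(\d+)_")


def iter_task_order_dirs(base_dir: str) -> Iterator[Tuple[int, str]]:
    """Yield (task order number, folder path) in numeric order."""
    found = []
    for entry in os.listdir(base_dir):
        path = os.path.join(base_dir, entry)
        match = TO_DIR.match(entry)
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    yield from sorted(found)


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as base:
    for name in ("TO4_LFD_LeasedFacilitiesDivision",
                 "TO1_FFD_FederalFacilitiesDivision",
                 "all"):
        os.mkdir(os.path.join(base, name))
    orders = [number for number, _ in iter_task_order_dirs(base)]
    print(orders)  # [1, 4]
```

Each yielded folder can then be turned into its own master file and its own LLM call.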

Edge case:
- Some people may not have a report: 
    - Administrative Services Support   
        Kyerra Jones
        - NO INPUT DOCUMENT
- Make sure each person's name is in the file name: "MSR TO1 BernardonL September 24.docx"
- Emphasize the accuracy of the information in the prompt.
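
The first two edge cases can be checked before calling the LLM. This is a sketch under two assumptions: the file-name pattern is inferred from the single example above, and the roster keys (such as `JonesK`) are hypothetical and follow the same last-name-plus-initial style.

```python
import re
from typing import Iterable, List

# Pattern inferred from the single example "MSR TO1 BernardonL September 24.docx"
NAME_IN_FILE = re.compile(r"^MSR TO(\d+) (\S+) (\w+) (\d{2})\.docx$")


def missing_reports(roster: Iterable[str], filenames: Iterable[str]) -> List[str]:
    """Return roster keys that have no matching input file."""
    seen = set()
    for filename in filenames:
        match = NAME_IN_FILE.match(filename)
        if match:
            seen.add(match.group(2))  # group 2 is the name part of the file name
    return [person for person in roster if person not in seen]


files = ["MSR TO1 BernardonL September 24.docx"]
roster = ["BernardonL", "JonesK"]  # hypothetical roster keys
for person in missing_reports(roster, files):
    print(f"{person}: NO INPUT DOCUMENT")
```

Files that do not match the pattern are ignored here, so a misnamed file shows up as a missing report rather than being silently attributed to the wrong person.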



# Testing:
- Create a good example file.
- Combine all docx files into 1 file.
- Upload it to the LLM together with the example file.

In [None]:
!python -m pip install python-docx google-generativeai python-dotenv pypandoc anthropic openai

- Upload TO files
- Extract text from docx file into 1 master file
- Pass the master file with example file to LLM
- Output the MSR file

# Cell: Organize Input Files by Task Order and Division

In [None]:
import os
import shutil
from docx import Document
import re

def get_task_order_info(file_path):
    try:
        doc = Document(file_path)
        task_order_info = None
        division_info = None
        for paragraph in doc.paragraphs:
            if "Task Order" in paragraph.text:
                task_order_info = paragraph.text.strip()
            if "Division" in paragraph.text:
                division_info = paragraph.text.strip()
            if task_order_info and division_info:
                break
        
        if task_order_info and division_info:
            to_match = re.search(r'Task Order (\d+)', task_order_info)
            to_number = to_match.group(1) if to_match else ""
            
            division_full = division_info.strip()
            short_name_match = re.search(r'\(([^)]+)\)', division_full)
            short_name = short_name_match.group(1) if short_name_match else ""
            
            return to_number, short_name, division_full.replace('('+short_name+')', '').strip()
    except Exception as e:
        print(f"Error reading file: {file_path}. Error: {e}")
    return None, None, None

def organize_files(input_dir, output_base_dir):
    for filename in os.listdir(input_dir):
        if filename.endswith('.docx'):
            file_path = os.path.join(input_dir, filename)
            to_number, short_name, division_full = get_task_order_info(file_path)
            
            if to_number and short_name and division_full:
                # Create folder name
                folder_name = f"TO{to_number}_{short_name}_{division_full.replace(' ', '')}"
                output_dir = os.path.join(output_base_dir, folder_name)
                
                # Create the output directory if it doesn't exist
                os.makedirs(output_dir, exist_ok=True)
                
                # Move the file
                shutil.move(file_path, os.path.join(output_dir, filename))
                print(f"Moved {filename} to {output_dir}")
            else:
                print(f"Could not determine task order info for {filename}")

# Usage
input_directory = "./data_files/inputs/all"
output_base_directory = "./data_files/inputs"

organize_files(input_directory, output_base_directory)
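
The parsing logic in `get_task_order_info` can be exercised without a .docx file. The sketch below repeats the same two regular expressions on plain strings; `parse_header` is a hypothetical helper, and the sample lines are taken from the report header shown later in this notebook.

```python
import re
from typing import Optional, Tuple


def parse_header(task_line: str, division_line: str) -> Optional[Tuple[str, str, str]]:
    """Return (task order number, short name, folder name), or None if parsing fails."""
    to_match = re.search(r"Task Order (\d+)", task_line)
    short_match = re.search(r"\(([^)]+)\)", division_line)
    if not (to_match and short_match):
        return None
    short_name = short_match.group(1)
    # Drop the "(SHORT)" suffix to get the full division name
    full_name = division_line.replace(f"({short_name})", "").strip()
    folder = f"TO{to_match.group(1)}_{short_name}_{full_name.replace(' ', '')}"
    return to_match.group(1), short_name, folder


result = parse_header("Task Order 4", "Leased Facilities Division (LFD)")
print(result)  # ('4', 'LFD', 'TO4_LFD_LeasedFacilitiesDivision')
```

Returning `None` when either pattern is missing mirrors the "Could not determine task order info" branch in `organize_files`.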

# Generate MSR from input files

In [13]:
import os
import google.generativeai as genai
from typing import List, Dict
from docx import Document
from dotenv import load_dotenv
from IPython.display import Markdown, display
import pypandoc
import anthropic
import openai

# Load environment variables
load_dotenv()

# Setup for different AI models
def setup_ai_model(model_name: str):
    if model_name == "gemini":
        api_key = os.getenv("GOOGLE_API_KEY")
        genai.configure(api_key=api_key)
        
        generation_config = {
            "temperature": 0,
            "max_output_tokens": 8192,
        }
        
        # Disable blocking for each Gemini harm category
        safety_settings = [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
        ]
        
        return genai.GenerativeModel(
            model_name="gemini-1.5-pro",
            generation_config=generation_config,
            safety_settings=safety_settings
        )
    elif model_name == "claude":
        api_key = os.getenv("ANTHROPIC_API_KEY")
        return anthropic.Anthropic(api_key=api_key)
    elif model_name == "gpt4":
        api_key = os.getenv("OPENAI_API_KEY")
        return openai.OpenAI(api_key=api_key)
    else:
        raise ValueError(f"Unsupported model: {model_name}")

# Generate monthly status report
def generate_monthly_status_report(model_name: str, master_content: str, example_content: str) -> str:
    # Load the prompt from the file
    prompt = load_prompt("monthly_status_report")
    
    # Format the prompt with the variables
    formatted_prompt = prompt.format(
        master_content=master_content,
        example_content=example_content
    )
    
    model = setup_ai_model(model_name)
    
    try:
        if model_name == "gemini":
            response = model.generate_content(formatted_prompt, stream=True)
            full_response = ""
            for chunk in response:
                if chunk.text:
                    full_response += chunk.text
            
            # Handling Safety Filters (finish_reason is an enum, so compare by name)
            if response.candidates and response.candidates[0].finish_reason.name == "SAFETY":
                safety_ratings = response.candidates[0].safety_ratings
                safety_message = "Content was filtered due to safety concerns:\n"
                for rating in safety_ratings:
                    safety_message += f"- Category: {rating.category}, Probability: {rating.probability}\n"
                print(safety_message)
                return safety_message
            
            # Retrieving Usage Metadata
            if hasattr(response, 'usage_metadata'):
                prompt_tokens = response.usage_metadata.prompt_token_count
                candidates_tokens = response.usage_metadata.candidates_token_count
                print(f"Prompt tokens: {prompt_tokens}")
                print(f"Response tokens: {candidates_tokens}")
            
            return full_response
        
        elif model_name == "claude":
            response = model.messages.create(
                model="claude-3-sonnet-20240229",
                max_tokens=4096,
                temperature=0,
                messages=[
                    {
                        "role": "user",
                        "content": formatted_prompt
                    }
                ]
            )
            return response.content[0].text
        
        elif model_name == "gpt4":
            response = model.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": formatted_prompt}],
                max_tokens=4096,
                temperature=0
            )
            return response.choices[0].message.content
        
    except Exception as e:
        error_message = f"An error occurred: {e}"
        print(error_message)
        return error_message

# File processing functions
def read_word_file(file_path: str) -> str:
    doc = Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    return '\n'.join(full_text)

def process_input_docs(directory: str) -> Dict[str, str]:
    input_docs = {}
    for filename in os.listdir(directory):
        if filename.endswith('.docx'):
            file_path = os.path.join(directory, filename)
            content = read_word_file(file_path)
            input_docs[filename] = content
    return input_docs

# Prompt management functions
def load_prompt(prompt_name: str) -> str:
    prompt_folder = "./prompts"
    prompt_path = os.path.join(prompt_folder, f"{prompt_name}.txt")
    
    if os.path.exists(prompt_path):
        with open(prompt_path, 'r') as file:
            return file.read()
    else:
        raise FileNotFoundError(f"Prompt file '{prompt_name}.txt' not found in the prompts folder.")

# File saving and conversion functions
def save_markdown_to_file(markdown_content: str, file_path: str):
    with open(file_path, 'w') as md_file:
        md_file.write(markdown_content)

def convert_markdown_to_docx(markdown_file_path: str, output_file_path: str):
    pypandoc.convert_file(markdown_file_path, 'docx', outputfile=output_file_path)

# Main execution
if __name__ == "__main__":

    # Process input documents
    input_directory = "./data_files/inputs/TO1_FFD_FederalFacilitiesDivision"
    processed_docs = process_input_docs(input_directory)

    # Export into a text file in the data_files/master_file folder
    os.makedirs("./data_files/master_file", exist_ok=True)
    output_file = "./data_files/master_file/master_file.txt"

    # Use an explicit encoding so text extracted from Word (curly quotes etc.)
    # is written and read back the same way on every platform
    with open(output_file, 'w', encoding='utf-8') as file:
        for filename, content in processed_docs.items():
            file.write(f"File: {filename}\n")
            file.write(f"Content: {content}\n")
            file.write("-" * 50 + "\n")
            file.write("\n\n")

    print(f"Combined documents exported to: {output_file}")

    # Read the master file and example file
    with open(output_file, 'r', encoding='utf-8') as file:
        master_content = file.read()

    with open("./example/example.txt", 'r') as file:
        example_content = file.read()

    # Let user select the model
    model_choice = input("Choose a model (gemini/claude/gpt4): ").lower()
    while model_choice not in ["gemini", "claude", "gpt4"]:
        print("Invalid choice. Please choose gemini, claude, or gpt4.")
        model_choice = input("Choose a model (gemini/claude/gpt4): ").lower()

    # Generate the monthly status report in Markdown
    report = generate_monthly_status_report(model_choice, master_content, example_content)

    # Create outputs folder if it doesn't exist
    os.makedirs("./data_files/outputs", exist_ok=True)

    # Get the input folder name
    input_folder_name = os.path.basename(input_directory)

    # Save the Markdown report
    markdown_file = f"./data_files/outputs/{input_folder_name}_{model_choice}.md"
    save_markdown_to_file(report, markdown_file)

    # Convert the Markdown report to Word document
    docx_file = f"./data_files/outputs/{input_folder_name}_{model_choice}.docx"
    convert_markdown_to_docx(markdown_file, docx_file)

    print(f"Monthly Status Report has been generated using {model_choice} and saved to '{docx_file}'")

    # Display the generated report (if running in a Jupyter notebook)
    with open(markdown_file, 'r') as md_file:
        generated_report_content = md_file.read()
    display(Markdown(generated_report_content))

Combined documents exported to: ./data_files/master_file/master_file.txt
Monthly Status Report has been generated using gpt4 and saved to './data_files/outputs/TO4_LFD_LeasedFacilitiesDivision_gpt4.docx'


# Monthly Status Report

**HQ0034-20-F-0237**  
**Task Order 4**  
**Leased Facilities Division (LFD)**  
**Administrative and Financial Services Support**

## For Work Performed:  
**September 2024**  

### Submitted to:  
Mrs. Tina Hall  
Contracting Officer’s Representative  
Washington Headquarters Service (WHS)  
Acquisition Directorate (AD)  
1155 Defense Pentagon Room 5B951  
Washington, DC 20301-1155  
tina.m.hall70.civ@mail.mil  
(202) 819-2679

### Submitted by:  
Adrian Nicholas  
Redhorse Corporation  
1777 N. Kent St, Suite 1200  
Arlington, VA 22209  
adrian.nicholas@redhorsecorp.com  
(347) 204-8125

---

## Administrative and Financial Services Support Team  
- Eddy Biniam  
- Miguel Vega

---

### Work Performed During September 2024  
**Administrative Services Support**  

#### Eddy Biniam  
- Assisted a colleague in gathering necessary 3 DAI reports and 1 Maximo report to send out a Cost Transfer in a timely fashion.  
- Completed a hot item that my Supervisor and lead needed immediately. Did a contract PR for a CMTSS contract for goods, demonstrating the ability to remain calm and focused under pressure and submit the document without any mistakes for immediate signing.  
- Worked on the Status of Funds for FY22 and FY23, verifying no funds remained. Communicated findings to my supervisor and edited the Status of Funds data for clarity.  
- Completed 2 contract PRs for goods for 20K, providing a solution to my supervisor. Successfully created 2 PRs with my supervisor observing, ensuring visibility in PD2.  
- Assisted my supervisor in locating a specific MIPR on the Open Commitment Pivot Tables, ensuring the correct email chain was used to provide the agency (DLA) with the requested MIPR for urgent processing of the 448-2 acceptance.

#### Miguel Vega  
- Assisted a colleague in gathering necessary 3 DAI reports and 1 Maximo report to send out a Cost Transfer in a timely fashion.  
- Completed a hot item that my Supervisor and lead needed immediately. Did a contract PR for a CMTSS contract for goods, demonstrating the ability to remain calm and focused under pressure and submit the document without any mistakes for immediate signing.  
- Worked on the Status of Funds for FY22 and FY23, verifying no funds remained. Communicated findings to my supervisor and edited the Status of Funds data for clarity.  
- Completed 2 contract PRs for goods for 20K, providing a solution to my supervisor. Successfully created 2 PRs with my supervisor observing, ensuring visibility in PD2.  
- Assisted my supervisor in locating a specific MIPR on the Open Commitment Pivot Tables, ensuring the correct email chain was used to provide the agency (DLA) with the requested MIPR for urgent processing of the 448-2 acceptance.

---

## Deliverables Completed  
- Monthly Status Report  
- Provided administrative and financial services support.

---

## Highlights  
- None

---

## Issues/Resolutions  
- **Issue:** None  
- **Resolution:** None

---

## Planned Work for Next Two Months  
**Administrative Services Support**  
- No new bullet points

**Financial Services Support**  
- No new bullet points

---

## Leave  
| Name         | Planned Leave September | Planned Leave October |
|--------------|-------------------------|-----------------------|
| Eddy Biniam  | NA                      | NA                    |
| Miguel Vega  | NA                      | NA                    |

---

## Recommendations  
- None

---

## Contractual/Staffing Actions  
- None
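
In the sample output above, the bullets under the two team members are word-for-word identical. That may mean one person's input was attributed to the other, or the inputs may really have been the same, so it is worth flagging for manual review. The sketch below is one possible check, assuming the report uses `####` headings for names and `- ` for bullets as shown; `duplicated_bullets` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def duplicated_bullets(report_md: str) -> Dict[str, List[str]]:
    """Map each bullet text to the '####' names it appears under, keeping only repeats."""
    owners = defaultdict(list)
    current = None
    for raw in report_md.splitlines():
        line = raw.strip()
        if line.startswith("#### "):
            current = line[5:].strip()
        elif line.startswith("#"):
            current = None  # any other heading ends the person's section
        elif line.startswith("- ") and current:
            owners[line[2:].strip()].append(current)
    return {text: names for text, names in owners.items() if len(set(names)) > 1}


sample = """#### Person A
- Prepared the funds report.
- Filed two requests.

#### Person B
- Prepared the funds report.
"""
flags = duplicated_bullets(sample)
print(flags)
```

An empty result does not prove the report is accurate; it only rules out this one kind of duplication.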