# Local RFP Staffing Requirements Extractor (Azure OpenAI)

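This standalone notebook extracts staffing requirements from local Request for Proposal (RFP) documents using the Azure OpenAI service with a GPT-4-class chat model deployment (the sample run below uses `gpt-4o-mini`). To use it, create an `Input_RFPs` folder in the same directory as this notebook, add one subfolder per RFP, and place that RFP's PDF or DOCX documents in its subfolder. The notebook extracts the staffing requirements from each RFP and saves the results as text files in the `Extracted_RFP_key_personnel` folder. A JSON schema for structured outputs is defined in the notebook but is not yet passed to the model call, so the saved output is free-form text.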

## Setup

In [1]:
import os
import json
import logging
from dotenv import load_dotenv
from openai import AzureOpenAI
from docx import Document
import pypdf

# Load environment variables
load_dotenv()

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Initialize Azure OpenAI client
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2024-06-01"
)

# Set the deployment name for the model
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
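
Because `os.getenv` returns `None` for an unset variable, a missing setting can produce an error that is harder to trace further along. A minimal sketch of an up-front check, assuming the three variable names used above; the helper name `missing_settings` is hypothetical and not part of the notebook:

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report any gaps before the client is used.
missing = missing_settings()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.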

## Helper Functions

In [2]:
def read_file(file_path):
    _, file_extension = os.path.splitext(file_path)
    content = ""
    
    if file_extension.lower() == '.pdf':
        with open(file_path, 'rb') as file:
            pdf_reader = pypdf.PdfReader(file)
            for page in pdf_reader.pages:
                # Separate pages with a newline so text from adjacent pages does not run together
                content += (page.extract_text() or "") + "\n"
    elif file_extension.lower() == '.docx':
        doc = Document(file_path)
        for para in doc.paragraphs:
            content += para.text + "\n"
    else:
        logging.warning(f"Unsupported file type: {file_path}")
    
    return content

def process_rfp_folder_local(folder_path):
    logging.info(f"Processing RFP folder: {folder_path}")
    rfp_id = os.path.basename(folder_path)
    all_content = ""
    
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)
        if file_path.lower().endswith(('.pdf', '.docx')):
            all_content += read_file(file_path) + "\n\n"
    print(all_content)  # Debug: echo the extracted text
     
    return rfp_id, all_content

def process_rfp_folder_web(folder_path):
    logging.info(f"Processing RFP folder: {folder_path}")
    rfp_id = os.path.basename(folder_path)
    file_urls = []
    
    url_file_path = os.path.join(folder_path, "file_urls.txt")
    
    if os.path.exists(url_file_path):
        with open(url_file_path, 'r') as url_file:
            file_urls = [url.strip() for url in url_file.readlines() if url.strip()]
        logging.info(f"Collected {len(file_urls)} file URLs for RFP {rfp_id}")
    else:
        logging.warning(f"No file_urls.txt found in {folder_path}")
    
    return rfp_id, file_urls
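
`process_rfp_folder_web` expects each RFP folder to contain a `file_urls.txt` with one URL per line; blank lines are skipped and surrounding whitespace is stripped. A small self-contained sketch of that parsing step, run against a temporary folder. The URLs are placeholders and `read_url_list` is a hypothetical name, not part of the notebook:

```python
import os
import tempfile


def read_url_list(url_file_path):
    """Return the non-empty, stripped lines of a URL list file."""
    with open(url_file_path, "r", encoding="utf-8") as url_file:
        return [line.strip() for line in url_file if line.strip()]


# Build a throwaway folder containing a sample file_urls.txt.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file_urls.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("https://example.com/rfp/part1.pdf\n\n  https://example.com/rfp/part2.docx  \n")
    demo_urls = read_url_list(path)

print(demo_urls)
```

Note that the web variant of the extraction function passes only these URL strings to the model in the prompt; nothing in the notebook downloads the documents themselves.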

## Azure OpenAI Function with Structured Output

In [3]:
# Schema for Structured Outputs. It is not passed to the calls below; add
# response_format=response_format to the create call when using OpenAI directly,
# or once the Azure OpenAI API version in use supports Structured Outputs.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "RFP_Staff_Requirements",
        "schema": {
            "type": "object",
            "properties": {
                "RFP_ID": {"type": "string"},
                "Title": {"type": "string"},
                "Required_Roles": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "Role": {"type": "string"},
                            "Requirements": {
                                "type": "array",
                                "items": {"type": "string"}
                            }
                        },
                        "required": ["Role", "Requirements"],
                        # Strict mode requires additionalProperties to be false on every object
                        "additionalProperties": False
                    }
                }
            },
            "required": ["RFP_ID", "Title", "Required_Roles"],
            "additionalProperties": False
        },
        "strict": True
    }
}

def extract_roles_and_requirements_local(content, rfp_id):
    logging.info(f"Extracting roles and requirements for RFP: {rfp_id}")
 
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are an expert in analyzing RFP documents and extracting staffing requirements."},
            {"role": "user", "content": f"Analyze the following RFP content and extract the required roles and their requirements.\n\nRFP Content:\n{content}"}
        ]
    )
    # extracted_info = json.loads(response.choices[0].message.content)
    extracted_info = response.choices[0].message.content
    #extracted_info["RFP_ID"] = rfp_id
    
    logging.info(f"Token usage for RFP {rfp_id}: {response.usage.total_tokens} tokens")
    
    return extracted_info 

def extract_roles_and_requirements_web(file_urls, rfp_id):
    logging.info(f"Extracting roles and requirements for RFP: {rfp_id}")
    urls_content = "\n".join([f"- {url}" for url in file_urls])

    # response = client.beta.chat.completions.parse(
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are an expert in analyzing RFP documents and extracting staffing requirements."},
            {"role": "user", "content": f"Analyze the RFP documents available at the following URLs:\n{urls_content}\n\nExtract the required roles and their requirements."}
        ]
    )
    
    # extracted_info = json.loads(response.choices[0].message.content)
    extracted_info = response.choices[0].message.content
    #extracted_info["RFP_ID"] = rfp_id
    
    logging.info(f"Token usage for RFP {rfp_id}: {response.usage.total_tokens} tokens")
    
    return extracted_info
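
The commented-out lines above hint at the intended structured flow: pass `response_format` to the call, parse the reply with `json.loads`, and overwrite `RFP_ID` with the folder name. A minimal sketch of that flow, assuming the deployment and API version accept a `json_schema` response format (the `2024-06-01` version configured above may not). The function name `extract_structured` is hypothetical, and the stub client and its canned reply exist only so the sketch runs without network access:

```python
import json
from types import SimpleNamespace


def extract_structured(client, deployment_name, content, rfp_id, response_format):
    """Request schema-constrained JSON and return it as a dict (hypothetical helper)."""
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[
            {"role": "system", "content": "You are an expert in analyzing RFP documents and extracting staffing requirements."},
            {"role": "user", "content": f"Analyze the following RFP content and extract the required roles and their requirements.\n\nRFP Content:\n{content}"},
        ],
        response_format=response_format,
    )
    extracted_info = json.loads(response.choices[0].message.content)
    extracted_info["RFP_ID"] = rfp_id  # Use the folder name rather than the model's value
    return extracted_info


class StubCompletions:
    """Stand-in for client.chat.completions that returns a canned, made-up reply."""

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        reply = {
            "RFP_ID": "unknown",
            "Title": "Sample RFP",
            "Required_Roles": [
                {"Role": "Project Manager", "Requirements": ["PMP certification"]}
            ],
        }
        message = SimpleNamespace(content=json.dumps(reply))
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Placeholder schema for the demo; the real call would pass the full schema defined above.
placeholder_format = {
    "type": "json_schema",
    "json_schema": {"name": "demo", "schema": {"type": "object"}},
}

stub_client = SimpleNamespace(chat=SimpleNamespace(completions=StubCompletions()))
demo = extract_structured(stub_client, "my-deployment", "sample text", "rfp1", placeholder_format)
print(demo["RFP_ID"], demo["Required_Roles"][0]["Role"])
```

Taking the client as a parameter keeps the function testable; in the notebook the module-level `client` and `deployment_name` would be passed in.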

## Main Processing Loop
This is the main function that processes all RFP folders and saves the extracted information.

In [4]:
def main():
    input_dir = "Input_RFPs"
    output_dir = "Extracted_RFP_key_personnel"
    
    os.makedirs(output_dir, exist_ok=True)
    
    for folder_name in os.listdir(input_dir):
        folder_path = os.path.join(input_dir, folder_name)
        if os.path.isdir(folder_path):
            try:
                rfp_id, content = process_rfp_folder_local(folder_path)
                extracted_info = extract_roles_and_requirements_local(content, rfp_id)
                output_file = os.path.join(output_dir, f"{rfp_id}_extracted_info.txt")
                #with open(output_file, 'w') as f:
                #    json.dump(extracted_info, f, indent=2)
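                # Write as UTF-8 so non-ASCII characters in the model output do not fail on platforms with a different default encoding
                with open(output_file, 'w', encoding='utf-8') as file: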
                    file.write(extracted_info)


                logging.info(f"Extracted information saved to {output_file}")
            
            except Exception as e:
                logging.error(f"Error processing RFP {folder_name}: {str(e)}")
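
If the extraction step returns a parsed dictionary rather than free-form text, the commented-out `json.dump` lines above become applicable. A sketch of that variant, assuming a dictionary shaped like the schema defined earlier; `save_extracted_info` and the sample record are hypothetical and used only for illustration:

```python
import json
import os
import tempfile


def save_extracted_info(extracted_info, output_dir, rfp_id):
    """Write one RFP's extracted data as pretty-printed JSON and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    output_file = os.path.join(output_dir, f"{rfp_id}_extracted_info.json")
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(extracted_info, f, indent=2, ensure_ascii=False)
    return output_file


# Made-up record for the demo, matching the schema's required keys.
sample = {
    "RFP_ID": "rfp1",
    "Title": "Sample RFP",
    "Required_Roles": [{"Role": "Project Manager", "Requirements": ["PMP certification"]}],
}

# Write into a temporary directory and read the file back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_extracted_info(
        sample, os.path.join(tmp_dir, "Extracted_RFP_key_personnel"), "rfp1"
    )
    with open(saved_path, encoding="utf-8") as f:
        round_trip = json.load(f)

print(os.path.basename(saved_path))
```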

## Run the Extraction Process


In [5]:
if __name__ == "__main__":
    main()

2024-08-08 16:52:32,349 - INFO - Processing RFP folder: Input_RFPs/rfp2
2024-08-08 16:52:32,481 - INFO - Extracting roles and requirements for RFP: rfp2



Request For Proposal (RFP)

Solicitation Number 47QFAA24R0004

Budget Department Management System (BDMS) Implementation & Support Services

in support of:

Pension Benefit Guaranty Corporation (PBGC)




Issued to:
All Certified OneStream Partners   
under the General Services Administration (GSA) 
Indefinite Delivery, Indefinite Quantity (IDIQ) Single Award Contract 




Issued by:
General Services Administration (GSA)
Federal Acquisition Service (FAS) / AAS Civilian

26 June 2024 - Initial Solicitation

3 July 2024 - Amendment 1: Updated ASSIST registration link and information

12 July 2024 - Amendment 2: Updated Solicitation Section L on page limitations, Updated Key Personnel information on page 35, Section H.1 and in Attachment 4, Updated Attachment 6, Tab 1 to include a fourth space for Prior Experience, and Extension of Proposal due date to 7/19/2024

16 July 2024 - Amendment 3: Updated to allow for submission of a Compensation Plan in Volume 2 - Pricing, separate from the pa

2024-08-08 16:52:49,698 - INFO - HTTP Request: POST https://oai-llm072024-dev-eastus-01.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-06-01 "HTTP/1.1 200 OK"
2024-08-08 16:52:49,705 - INFO - Token usage for RFP rfp2: 42130 tokens
2024-08-08 16:52:49,707 - INFO - Extracted information saved to Extracted_RFP_key_personnel/rfp2_extracted_info.txt
2024-08-08 16:52:49,707 - INFO - Processing RFP folder: Input_RFPs/rfp1
2024-08-08 16:52:50,228 - INFO - Extracting roles and requirements for RFP: rfp1


SOLICITATION, OFFER AND AWARD1. THIS CONTRACT IS A RATED ORDER 
UNDER DPAS (15 CFR 7900)RATING
 PAGE             OF       PAGES
2. CONTRACT NUMBER 3. SOLICITATION NUMBER 4. TYPE OF SOLICITATION 5. DATE ISSUED 6. REQUISITION/PURCHASE NUMBER
CODE 7. ISSUED BY 8. ADDRESS OFFER TO (If other than item 7)
NOTE: In sealed bid solicitations "offer" and "offeror" mean "bid" and "bidder".
SOLICITATION
9. Sealed offers in original and copies for furnishings the supplies or services in the Schedule will be received at the place specified in item 8, or if
hand carried , in the depository located in until local time
CAUTION - LATE Submissions, Modifications, and Withdrawals: See Section L, Provision No. 52.214-7 or 52.215-1. All offers are subject to all terms and conditions  
contained in this solicitation.
10. FOR  
INFORMATION  
CALL:A. NAME B. TELEPHONE (NO COLLECT CALLS)
AREA CODE NUMBER EXTENSIONC. E-MAIL ADDRESS
11. TABLE OF CONTENTS
(X) SEC. DESCRIPTION PAGE(S) (X) SEC. PAGE(S) DESCRIPTION
A

2024-08-08 16:53:29,958 - INFO - HTTP Request: POST https://oai-llm072024-dev-eastus-01.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-06-01 "HTTP/1.1 200 OK"
2024-08-08 16:53:29,959 - INFO - Token usage for RFP rfp1: 61477 tokens
2024-08-08 16:53:29,960 - INFO - Extracted information saved to Extracted_RFP_key_personnel/rfp1_extracted_info.txt


## Conclusion
This notebook processes every RFP folder in the `Input_RFPs` directory and saves the extracted staffing requirements as text files (`<rfp_id>_extracted_info.txt`) in the `Extracted_RFP_key_personnel` directory. Check the log output for any error messages.