# Automated Grant Discovery System

A web scraping system that discovers and prioritizes grant opportunities from 7+ major foundations. It uses a tiered scraping strategy designed to scale to 800+ foundations, with relevance scoring tuned to Delta's priorities.

**Key Features:**
- Anti-bot detection & dynamic content handling
- Delta-specific relevance scoring (AI/ML climate solutions prioritized)
- Tiered foundation strategy (custom scrapers → platform-aware → generic)
- Robust error handling with multiple fallback mechanisms
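
The tiered strategy with fallbacks can be pictured as a chain of scrapers tried in order. The sketch below is only an illustration under stated assumptions: `scrape_with_fallbacks` and the three stand-in scrapers are hypothetical names, not functions defined in this notebook.

```python
from typing import Callable, Dict, List

Grant = Dict[str, str]
Scraper = Callable[[str], List[Grant]]


def scrape_with_fallbacks(url: str, scrapers: List[Scraper]) -> List[Grant]:
    """Try each scraper in order and return the first non-empty result."""
    for scraper in scrapers:
        try:
            grants = scraper(url)
        except Exception as exc:
            # A failing tier should not stop the run; move on to the next one.
            print(f"{scraper.__name__} failed for {url}: {exc}")
            continue
        if grants:
            return grants
    return []


# Hypothetical stand-ins for the three tiers (assumptions for illustration).
def custom_scraper(url: str) -> List[Grant]:
    raise RuntimeError("selector changed")


def platform_scraper(url: str) -> List[Grant]:
    return []


def generic_scraper(url: str) -> List[Grant]:
    return [{"Title": "Example grant", "URL": url}]


result = scrape_with_fallbacks(
    "https://example.org/grants",
    [custom_scraper, platform_scraper, generic_scraper],
)
print(result)
```

Here the custom scraper raises, the platform-aware scraper finds nothing, and the generic scraper supplies the result, so one broken selector does not leave a foundation without coverage.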

**Output:** Prioritized CSV of active grants ranked by alignment with organizational priorities.
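
As a rough sketch of how such a prioritized CSV could be produced, the example below sorts grant dictionaries by score and serializes them with the standard library. The `Relevance Score` column name and the sample rows are assumptions made for illustration; the notebook itself imports pandas, which could do the same job.

```python
import csv
import io


def grants_to_ranked_csv(grants, score_key="Relevance Score"):
    """Return CSV text with grants sorted by score, highest first."""
    ranked = sorted(grants, key=lambda g: g.get(score_key, 0), reverse=True)

    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for grant in ranked:
        for key in grant:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="Not specified",  # fill columns a grant does not provide
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(ranked)
    return buffer.getvalue()


# Made-up sample data, not real grants.
sample = [
    {"Foundation": "Foundation A", "Title": "Community garden", "Relevance Score": 20.0},
    {"Foundation": "Foundation B", "Title": "Climate data tools", "Relevance Score": 66.7},
]
csv_text = grants_to_ranked_csv(sample)
print(csv_text)
```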

In [None]:
# Cell 1: Install dependencies and import libraries
!pip install selenium
!apt-get update -qq
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')

# Import necessary libraries
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import re
from datetime import datetime
import json
import traceback

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
chromium-chromedriver is already the newest version (1:85.0.4183.83-0ubuntu2.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.
cp: '/usr/lib/chromium-browser/chromedriver' and '/usr/bin/chromedriver' are the same file


In [None]:
# Cell 2: Configure Chrome options for headless browsing
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')

# Initialize the Chrome driver
driver = webdriver.Chrome(options=options)

In [None]:
# Cell 3: Define utility functions
def get_text_or_default(element, selector, default="Not specified"):
    """Extract text from an element using a selector with a default fallback."""
    try:
        found_element = element.find_element(By.CSS_SELECTOR, selector)
        text = found_element.text.strip()
        return text if text else default
    except Exception:
        return default

def get_attribute_or_default(element, selector, attribute, default=""):
    """Extract an attribute from an element using a selector with a default fallback."""
    try:
        found_element = element.find_element(By.CSS_SELECTOR, selector)
        attr_value = found_element.get_attribute(attribute)
        return attr_value if attr_value else default
    except Exception:
        return default

def log_error(error_type, foundation_name, details):
    """Log errors for debugging."""
    print(f"ERROR ({error_type}) with {foundation_name}: {details}")
    traceback.print_exc()

## Foundation Prioritization & Relevance Scoring

Grants are scored by keyword match against Delta's strategic priorities, weighted from 5 (highest) to 1 (lowest):
1. **AI/ML climate solutions** (highest weight)
2. **Sustainable architecture**
3. **Redwood conservation & research**
4. **Native/drought-resistant landscaping**
5. **Sustainability showcases**

Foundation tiers determine scraping approach: Tier 1 (custom scrapers), Tier 2 (platform-aware), Tier 3 (generic patterns).
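
One detail worth illustrating for the weighted keyword scoring is how keywords are matched. Short keywords such as `ai` appear inside unrelated words ("maintain", "available"), so a plain substring test can inflate scores. The sketch below uses a reduced, illustrative weight table (`WEIGHTS` and `score_text` are hypothetical names) and matches whole words with an optional plural `s`.

```python
import re

# Illustrative subset of the priority weights (5 = highest, 1 = lowest).
WEIGHTS = {
    "ai": 5,
    "climate": 5,
    "architecture": 4,
    "redwood": 3,
    "drought": 2,
    "showcase": 1,
}


def score_text(text, weights=WEIGHTS):
    """Sum the weights of keywords that occur as whole words in text."""
    lowered = text.lower()
    total = 0
    for keyword, weight in weights.items():
        # \b anchors the keyword at word boundaries; s? tolerates simple plurals.
        pattern = r"\b" + re.escape(keyword) + r"s?\b"
        if re.search(pattern, lowered):
            total += weight
    return total


print(score_text("Maintaining available trails"))
print(score_text("AI tools for climate resilience in redwood forests"))
```

The first call scores 0 because "ai" only occurs inside longer words; the second scores 13 (5 for "ai", 5 for "climate", 3 for "redwood"). The trade-off is that irregular plurals and other word forms are no longer matched unless they are listed as keywords.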

In [None]:
# New Cell: Define priority matching system to rank grants by relevance to Delta's priorities
def calculate_relevance_score(grant):
    """
    Calculate relevance score based on Delta's priorities:
    1. AI/ML climate solutions
    2. Sustainable architecture
    3. Redwood conservation, research, natural growth patterns
    4. Native, Drought-Resistant, and Edible Landscaping Nursery
    5. Sustainability Showcases
    """
    # Extract text to analyze
    text = ' '.join([
        str(grant.get('Title', '')),
        str(grant.get('Type', '')),
        str(grant.get('Description', ''))
    ]).lower()

    # Priority weights (5 = highest, 1 = lowest)
    priorities = {
        # Priority 1: AI/ML climate solutions (weight 5)
        'ai': 5, 'machine learning': 5, 'ml': 5, 'artificial intelligence': 5,
        'climate': 5, 'carbon': 5, 'emissions': 5, 'climate change': 5,
        'climate solution': 5, 'climate tech': 5, 'climate data': 5,

        # Priority 2: Sustainable architecture (weight 4)
        'architecture': 4, 'building': 4, 'sustainable architecture': 4,
        'green building': 4, 'sustainable design': 4, 'sustainable construction': 4,

        # Priority 3: Redwood conservation (weight 3)
        'redwood': 3, 'conservation': 3, 'forest': 3, 'tree': 3,
        'ecosystem': 3, 'biodiversity': 3, 'natural growth': 3,

        # Priority 4: Native/Drought-Resistant Landscaping (weight 2)
        'drought': 2, 'native': 2, 'plant': 2, 'landscaping': 2,
        'edible': 2, 'nursery': 2,

        # Priority 5: Sustainability Showcases (weight 1)
        'showcase': 1, 'exhibition': 1, 'demonstration': 1, 'education': 1,
        'sustainability': 1
    }

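    # Calculate score, matching whole words or phrases (with an optional plural
    # "s") so that short keywords such as 'ai' or 'ml' do not fire inside
    # unrelated words like 'maintain' or 'html'
    score = 0
    for keyword, weight in priorities.items():
        if re.search(r'\b' + re.escape(keyword) + r's?\b', text):
            score += weight

    # Normalize to a 0-100 scale; a raw score of 30 or more maps to 100
    normalized_score = min(100, (score / 30) * 100)

    return normalized_score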

# Define foundation tiers based on importance to Delta
def get_foundation_tier(foundation_name):
    """
    Assign foundations to tiers based on their relevance to Delta:
    - Tier 1: Most important foundations with dedicated scrapers
    - Tier 2: Medium priority foundations
    - Tier 3: Lower priority foundations with generic scraping
    """
    tier1_foundations = [
        "Sloan Foundation",
        "Simons Foundation",
        "Chan Zuckerberg Initiative"
    ]

    tier2_foundations = [
        "Rose Foundation",
        "Dreyfus Foundation",
        "Ford Foundation"
    ]

    if any(name.lower() in foundation_name.lower() for name in tier1_foundations):
        return 1
    elif any(name.lower() in foundation_name.lower() for name in tier2_foundations):
        return 2
    else:
        return 3

## Tier 1 Foundation Scrapers

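Custom scrapers for high-priority foundations, each with foundation-specific parsing logic. The Sloan scraper adds anti-detection measures: a desktop user agent, the `AutomationControlled` Blink feature disabled, and extended waits for challenge pages.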
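
The scrapers below wait for challenge pages with fixed `time.sleep` calls. One possible alternative, sketched here as an assumption rather than as what the notebook does, is to poll until the page title no longer looks like a challenge page. `wait_until` and `is_challenge_title` are hypothetical helpers, and the browser is simulated with a list of titles so the example runs without Selenium.

```python
import time

# Title fragments the Sloan scraper treats as a security challenge.
CHALLENGE_MARKERS = ("challenge", "just a moment")


def is_challenge_title(title):
    """Return True if the page title looks like an anti-bot challenge page."""
    lowered = title.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def wait_until(predicate, timeout=25.0, poll_interval=0.5):
    """Poll predicate() until it is truthy or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Simulated browser: the title changes once the challenge clears.
titles = iter(["Just a moment...", "Just a moment...", "Open Calls"])
current = {"title": next(titles)}


def challenge_cleared():
    cleared = not is_challenge_title(current["title"])
    if not cleared:
        # Advance the simulation; a real check would re-read driver.title.
        current["title"] = next(titles, current["title"])
    return cleared


cleared = wait_until(challenge_cleared, timeout=2.0, poll_interval=0.01)
print(cleared, current["title"])
```

With a real driver the predicate could be `lambda: not is_challenge_title(driver.title)`, which returns as soon as the challenge clears instead of always waiting the full 10 to 25 seconds.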

In [None]:
# New Cell: Define Sloan Foundation scraper with anti-bot detection handling
def extract_sloan_foundation_grants(url):
    """
    Extracts information about open calls from the Sloan Foundation with anti-bot detection handling.

    Args:
        url (str): URL of the open calls page

    Returns:
        list: List of dictionaries containing grant information
    """
    print(f"Scraping open calls from {url}")

    # Configure options to avoid bot detection
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')

    # Create a new driver instance for this specific site
    sloan_driver = webdriver.Chrome(options=options)

    try:
        sloan_driver.get(url)
        # Use a longer wait time for Cloudflare and other protection systems
        print("Waiting for page to load and protection systems to clear...")
        time.sleep(10)

        # Check if we're still on a challenge page
        if "challenge" in sloan_driver.title.lower() or "just a moment" in sloan_driver.title.lower():
            print("Detected security challenge. Waiting longer...")
            time.sleep(15)  # Wait longer for challenge to clear

        print(f"Page title: {sloan_driver.title}")

        grants_data = []

        # Find all grant sections
        grant_sections = sloan_driver.find_elements(By.CSS_SELECTOR, "section.scheme-none")

        if not grant_sections:
            print("No grant sections found with primary selector. Trying alternatives...")
            # Try more general selectors
            grant_sections = sloan_driver.find_elements(By.XPATH, "//h2/parent::*/parent::section")

        print(f"Found {len(grant_sections)} grant sections")

        for section in grant_sections:
            try:
                # Extract title
                title = section.find_element(By.CSS_SELECTOR, "h2").text.strip()
                print(f"Processing grant: {title}")

                # Extract the content
                content_container = section.find_element(By.CSS_SELECTOR, ".content-container")
                paragraphs = content_container.find_elements(By.CSS_SELECTOR, "p")

                grant_info = {
                    "Foundation": "Alfred P. Sloan Foundation",
                    "Title": title,
                    "Type": "Not specified",
                    "Description": "Not specified",
                    "Geographic Scale": "Not specified",
                    "Amount": "Not specified",
                    "Deadline": "Not specified",
                    "Status": "Open",
                    "URL": url
                }

                # Process paragraphs to extract structured data
                for p in paragraphs:
                    text = p.text.strip()

                    if text.startswith("Call for:"):
                        grant_info["Type"] = text.replace("Call for:", "").strip()
                    elif text.startswith("Deadline:"):
                        grant_info["Deadline"] = text.replace("Deadline:", "").strip()
                    elif text.startswith("Summary:"):
                        grant_info["Description"] = text.replace("Summary:", "").strip()
                    elif text.startswith("Link:"):
                        try:
                            link_elem = p.find_element(By.TAG_NAME, "a")
                            grant_info["URL"] = link_elem.get_attribute("href")
                        except:
                            grant_info["URL"] = text.replace("Link:", "").strip()

                # Extract amount if present in the description
                if "$" in grant_info["Description"]:
                    # Each amount must end in a digit so trailing commas or spaces are not captured
                    amount_matches = re.findall(r'\$[\d,]*\d(?:\s*-\s*\$?[\d,]*\d)?', grant_info["Description"])
                    if amount_matches:
                        grant_info["Amount"] = ", ".join(amount_matches)

                # Only add if it doesn't appear to be a closed grant; match whole
                # words so that e.g. "extended" does not trigger "ended"
                closed_terms = ["closed", "ended", "past"]
                is_closed = any(re.search(r'\b' + term + r'\b', title.lower()) for term in closed_terms)

                if not is_closed:
                    grants_data.append(grant_info)
                    print(f"Added grant: {title}")
                else:
                    print(f"Skipping closed grant: {title}")

            except Exception as e:
                print(f"Error processing grant section: {str(e)}")

        return grants_data

    except Exception as e:
        print(f"Error scraping Sloan Foundation: {str(e)}")
        return []

    finally:
        sloan_driver.quit()

In [None]:
# Updated Rose Foundation scraper for fully dynamic approach
def extract_rose_grants(url, grant_type):
    """
    Extracts information about active grants from the Rose Foundation with fully dynamic approach.

    Args:
        url (str): URL of the grants page
        grant_type (str): Type of grant (Environmental or Consumer Rights)

    Returns:
        list: List of dictionaries containing grant information
    """
    print(f"Scraping {grant_type} grants from {url}")

    driver.get(url)
    # Wait for the page to load completely
    time.sleep(5)

    # Find all grant entries on the page
    grant_entries = driver.find_elements(By.CSS_SELECTOR, "div.post")

    if not grant_entries:
        print(f"No grant entries found using div.post selector. Trying alternative approach...")
        # Try an alternative approach
        grant_entries = driver.find_elements(By.CSS_SELECTOR, "article")

        if not grant_entries:
            print("Still no entries found. Trying links...")
            links = driver.find_elements(By.CSS_SELECTOR, "a[href*='grant']")
            print(f"Found {len(links)} potential grant links")

    grants_data = []
    closed_count = 0
    active_count = 0

    for entry in grant_entries:
        try:
            # Check if the grant is active or open
            is_closed = False

            # First check tags/status indicators
            try:
                status_elements = entry.find_elements(By.CSS_SELECTOR, "div.tags, span.status")
                for status_element in status_elements:
                    status_text = status_element.text.strip()
                    if "closed" in status_element.get_attribute("class").lower() or "closed" in status_text.lower():
                        is_closed = True
                        break
            except:
                # If can't check status directly, check the entry text
                if "closed" in entry.text.lower():
                    is_closed = True

            if is_closed:
                closed_count += 1
                continue

            # Extract grant title
            try:
                title_element = entry.find_element(By.CSS_SELECTOR, "h3 a, h2 a, h4 a, .title a")
                title = title_element.text.strip()
                link = title_element.get_attribute("href")
            except:
                # Try to find any link
                try:
                    link_element = entry.find_element(By.TAG_NAME, "a")
                    link = link_element.get_attribute("href")
                    title = link_element.text.strip() or "Unnamed Grant"
                except:
                    title = "Unnamed Grant"
                    link = url

            # Extract grant description
            description = "Not specified"
            try:
                desc_elements = entry.find_elements(By.CSS_SELECTOR, "p.post-desc, .description, .excerpt")
                if desc_elements:
                    description = desc_elements[0].text.strip()
            except:
                pass

            # Extract amount/scale information
            amount = "Not specified"
            scale = "Not specified"

            try:
                amount_elements = entry.find_elements(By.CSS_SELECTOR, ".post-amount, .amount, [class*=amount]")
                if amount_elements:
                    amount = amount_elements[0].text.strip() or amount
            except Exception:
                pass

            if amount == "Not specified":
                # find_elements returns an empty list rather than raising, so the
                # regex fallback on the description has to run outside the except
                amount_match = re.search(r'\$[\d,]+(?:\s*-\s*\$[\d,]+)?', description)
                if amount_match:
                    amount = amount_match.group(0)

            try:
                scale_elements = entry.find_elements(By.CSS_SELECTOR, ".post-scale, .scale, [class*=scale]")
                if scale_elements:
                    scale = scale_elements[0].text.strip()
            except:
                pass

            # Extract deadline
            deadline = "Not specified"
            try:
                deadline_elements = entry.find_elements(By.CSS_SELECTOR, ".deadline, [class*=deadline], .date")
                if deadline_elements:
                    deadline = deadline_elements[0].text.strip()
                elif "deadline" in entry.text.lower():
                    # Try to extract deadline with regex
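                    # Match case-insensitively on the original text so the extracted date keeps its capitalization
                    deadline_match = re.search(r'deadline:?\s*([^\n]+)', entry.text, re.IGNORECASE)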
                    if deadline_match:
                        deadline = deadline_match.group(1).strip()
            except:
                pass

            # Compile grant information
            grant_info = {
                "Foundation": "Rose Foundation",
                "Title": title,
                "Type": grant_type,
                "Description": description,
                "Geographic Scale": scale,
                "Amount": amount,
                "Deadline": deadline,
                "Status": "Open",
                "URL": link
            }

            grants_data.append(grant_info)
            active_count += 1
            print(f"Extracted active grant: {title}")

        except Exception as e:
            print(f"Error extracting grant: {e}")

    print(f"Found {active_count} active grants and skipped {closed_count} closed grants")
    return grants_data

In [None]:
# Updated Dreyfus Foundation scraper for fully dynamic approach
def extract_dreyfus_foundation_info(url):
    """
    Extracts grant application information from the Max and Victoria Dreyfus Foundation.
    Fully dynamic approach without hard-coded values.

    Args:
        url (str): URL of the Dreyfus Foundation page

    Returns:
        list: List of dictionaries containing grant information
    """
    print(f"Scraping grant information from {url}")

    driver.get(url)
    time.sleep(5)

    grants_data = []

    try:
        # Find main content sections
        content_elements = driver.find_elements(By.CSS_SELECTOR, "div[data-mesh-id*='comp-'], div.SxM0TO, div.content")

        if not content_elements:
            print("No content elements found. Trying alternative selectors...")
            content_elements = driver.find_elements(By.XPATH, "//div[contains(text(), 'Award Round') or contains(text(), 'Deadline')]")

        # Extract text from all content elements
        all_content = ""
        for elem in content_elements:
            all_content += elem.text + " "

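        # Extract rounds information using regex. The date groups use [^.]+? (any text
        # within one sentence); a class like [^and] would reject every date containing
        # the letters a, n, or d (e.g. "January").
        spring_pattern = r"(?:requests|applications)[^.]*?(?:between|from)\s+([^.]+?)\s+and\s+([^.]+?)\s+are considered[^.]*Spring Award Round"
        fall_pattern = r"(?:requests|applications)[^.]*?(?:between|from)\s+([^.]+?)\s+and\s+([^.]+?)\s+are considered[^.]*Fall Award Round"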

        # Find spring and fall periods
        spring_match = re.search(spring_pattern, all_content, re.IGNORECASE)
        fall_match = re.search(fall_pattern, all_content, re.IGNORECASE)

        # Extract funding range
        funding_range = "Not specified"
        funding_match = re.search(r'\$[\d,]+(?:\s*(?:to|-)\s*\$[\d,]+)?', all_content)
        if funding_match:
            funding_range = funding_match.group(0)

        # Create grant entries for each round found
        if spring_match:
            spring_start = spring_match.group(1).strip()
            spring_end = spring_match.group(2).strip()

            spring_grant = {
                "Foundation": "Max and Victoria Dreyfus Foundation",
                "Title": "Spring Award Round",
                "Type": "General Support",
                "Description": "Organizations can apply for general support or specific projects.",
                "Geographic Scale": "National",
                "Amount": funding_range,
                "Deadline": f"{spring_start} to {spring_end}",
                "Status": "Open",
                "URL": url
            }
            grants_data.append(spring_grant)
            print(f"Extracted Spring Award Round: {spring_start} to {spring_end}")

        if fall_match:
            fall_start = fall_match.group(1).strip()
            fall_end = fall_match.group(2).strip()

            fall_grant = {
                "Foundation": "Max and Victoria Dreyfus Foundation",
                "Title": "Fall Award Round",
                "Type": "General Support",
                "Description": "Organizations can apply for general support or specific projects.",
                "Geographic Scale": "National",
                "Amount": funding_range,
                "Deadline": f"{fall_start} to {fall_end}",
                "Status": "Open",
                "URL": url
            }
            grants_data.append(fall_grant)
            print(f"Extracted Fall Award Round: {fall_start} to {fall_end}")

        # If no rounds found through regex, try to extract from the page structure
        if not grants_data:
            print("No grant rounds found using regex. Trying direct extraction...")

            # Look for headings or sections that might contain round information
            round_elements = driver.find_elements(By.XPATH, "//*[contains(text(), 'Round') or contains(text(), 'Award') or contains(text(), 'Grant')]")

            for elem in round_elements:
                round_text = elem.text
                if "spring" in round_text.lower():
                    grants_data.append({
                        "Foundation": "Max and Victoria Dreyfus Foundation",
                        "Title": "Spring Award Round",
                        "Type": "General Support",
                        "Description": "Contact foundation for details",
                        "Geographic Scale": "National",
                        "Amount": funding_range,
                        "Deadline": "Check website for current deadlines",
                        "Status": "Open",
                        "URL": url
                    })
                elif "fall" in round_text.lower():
                    grants_data.append({
                        "Foundation": "Max and Victoria Dreyfus Foundation",
                        "Title": "Fall Award Round",
                        "Type": "General Support",
                        "Description": "Contact foundation for details",
                        "Geographic Scale": "National",
                        "Amount": funding_range,
                        "Deadline": "Check website for current deadlines",
                        "Status": "Open",
                        "URL": url
                    })

    except Exception as e:
        print(f"Error extracting Dreyfus Foundation info: {str(e)}")
        traceback.print_exc()

    # If still no data, add a note about the issue
    if not grants_data:
        print("WARNING: Could not extract Dreyfus Foundation grant information dynamically.")
        print("Consider visiting the website directly: https://www.mvdreyfusfoundation.org/")

    return grants_data

In [None]:
def extract_simons_foundation_grants(url):
    """
    Extracts active funding opportunities from Simons Foundation using deadline date comparison.
    """
    print(f"Scraping funding opportunities from {url}")

    driver.get(url)
    time.sleep(5)

    grants_data = []
    skipped_count = 0

    # Today's date for comparison
    today = datetime.now().date()
    print(f"Today's date: {today}")

    try:
        grant_entries = driver.find_elements(By.CSS_SELECTOR, "article.m-post--tabular")

        if not grant_entries:
            grant_entries = driver.find_elements(By.CSS_SELECTOR, ".m-post")

        print(f"Found {len(grant_entries)} potential grant entries")

        for entry in grant_entries:
            try:
                title = get_text_or_default(entry, "h4.m-post__title", "Unnamed Grant")
                aside_text = get_text_or_default(entry, ".m-post__aside", "")

                # Method 1: Check explicit status text
                if "Closed" in aside_text and "Status" in aside_text:
                    print(f"Skipping explicitly closed grant: {title}")
                    skipped_count += 1
                    continue

                # Method 2: Check deadline
                deadline = "Not specified"
                deadline_date = None

                if "Deadline" in aside_text:
                    deadline_match = re.search(r'(?:Application )?Deadline[^A-Za-z0-9]*([^\n]+)', aside_text)
                    if deadline_match:
                        deadline = deadline_match.group(1).strip()

                        # Parse deadline date with multiple formats
                        date_formats = [
                            "%B %d, %Y", "%b %d, %Y", "%B %d %Y",
                            "%b %d %Y", "%Y-%m-%d", "%m/%d/%Y"
                        ]

                        for date_format in date_formats:
                            try:
                                deadline_date = datetime.strptime(deadline, date_format).date()
                                break
                            except ValueError:
                                continue

                # Skip if deadline has passed
                if deadline_date and deadline_date < today:
                    print(f"Skipping grant with passed deadline ({deadline_date}): {title}")
                    skipped_count += 1
                    continue

                # Extract other info
                link = get_attribute_or_default(entry, "h4.m-post__title a", "href", url)
                description = get_text_or_default(entry, ".m-post__main", "Not specified")

                program_area = get_text_or_default(entry, ".program-area", "Not specified")
                if "Program Area: " in program_area:
                    program_area = program_area.split("Program Area: ")[1].strip()

                # Extract status
                status = "Open"  # Default
                if "Status" in aside_text:
                    status_match = re.search(r'Status\s*-\s*([^\n]+)', aside_text)
                    if status_match and "Rolling" in status_match.group(1):
                        status = "Rolling"

                # Extract amount from description
                amount = "Not specified"
                amount_match = re.search(r'\$[\d,.]+(?:\s*-\s*\$[\d,.]+)?', description)
                if amount_match:
                    amount = amount_match.group(0)

                grant_info = {
                    "Foundation": "Simons Foundation",
                    "Title": title,
                    "Type": program_area,
                    "Description": description,
                    "Geographic Scale": "National",
                    "Amount": amount,
                    "Deadline": deadline,
                    "Status": status,
                    "URL": link
                }

                grants_data.append(grant_info)
                print(f"Extracted active grant: {title}")

            except Exception as e:
                print(f"Error extracting grant entry: {str(e)}")
                traceback.print_exc()

    except Exception as e:
        print(f"Error scraping Simons Foundation: {str(e)}")
        traceback.print_exc()

    print(f"Extracted {len(grants_data)} active Simons Foundation grants (skipped {skipped_count})")
    return grants_data
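
The deadline check in the scraper above tries several date formats in turn and skips an entry only when a deadline parses and lies in the past. A minimal standalone sketch of that pattern follows. The names `DATE_FORMATS`, `parse_deadline`, and `is_expired` are hypothetical, and the format list is an assumption for illustration rather than the list the scraper actually uses.

```python
from datetime import date, datetime

# Assumed formats for illustration; the scraper's own list may differ.
DATE_FORMATS = ["%B %d, %Y", "%b %d, %Y", "%m/%d/%Y", "%Y-%m-%d"]


def parse_deadline(text, formats=DATE_FORMATS):
    """Return a date for the first format that matches, or None if none do."""
    cleaned = text.strip()
    for date_format in formats:
        try:
            return datetime.strptime(cleaned, date_format).date()
        except ValueError:
            continue
    return None


def is_expired(text, today):
    """True only when the deadline parses and lies strictly before today."""
    deadline_date = parse_deadline(text)
    return deadline_date is not None and deadline_date < today


today = date(2025, 3, 8)
print(is_expired("March 6, 2025", today))   # True
print(is_expired("March 31, 2025", today))  # False
print(is_expired("Not specified", today))   # False
```

Treating an unparseable deadline as "not expired" matches the scraper's behavior of keeping grants whose deadline text cannot be interpreted.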

In [None]:
# New Cell: Define Chan Zuckerberg Initiative scraper
def extract_czi_grants(url):
    """
    Extracts information about active grants from Chan Zuckerberg Initiative.

    Args:
        url (str): URL of the CZI science funding page

    Returns:
        list: List of dictionaries containing grant information
    """
    print(f"Scraping grants from {url}")

    driver.get(url)
    time.sleep(5)  # Allow page to load

    grants_data = []

    try:
        # Find all grant cards
        grant_cards = driver.find_elements(By.CSS_SELECTOR, "div.card.rfa-card")

        if not grant_cards:
            print("No grant cards found with primary selector. Trying alternatives...")
            grant_cards = driver.find_elements(By.CSS_SELECTOR, ".cards-section .card")

        print(f"Found {len(grant_cards)} potential grant cards")
        active_count = 0
        closed_count = 0

        for card in grant_cards:
            try:
                # Check if the grant is closed
                is_closed = False
                alert_elements = card.find_elements(By.CSS_SELECTOR, ".card__alert")

                if alert_elements:
                    for alert in alert_elements:
                        if "closed" in alert.text.lower():
                            is_closed = True
                            closed_count += 1
                            break

                if is_closed:
                    continue

                # Extract grant information
                title = get_text_or_default(card, ".card__title", "Unnamed Grant")
                description = get_text_or_default(card, ".card__text", "Not specified")

                # Extract program/type
                program_type = get_text_or_default(card, ".card__surtitle.color--off-black", "Not specified")

                # Extract deadline
                deadline = "Not specified"
                due_date_elements = card.find_elements(By.CSS_SELECTOR, ".card__due-date")

                if due_date_elements:
                    date_parts = []
                    for element in due_date_elements:
                        date_day = get_text_or_default(element, ".date-day", "")
                        date_month = get_text_or_default(element, "div:not(.date-day):not(.background--red)", "")
                        if date_day and date_month:
                            date_parts = [date_day, date_month]
                    if date_parts:
                        deadline = " ".join(date_parts)

                # Extract amount
                amount = "Not specified"
                amount_elements = card.find_elements(By.CSS_SELECTOR, ".card__info__desc")
                for element in amount_elements:
                    text = element.text
                    if "$" in text:
                        amount = text
                        break

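                # Extract URL; find_elements returns an empty list instead of raising,
                # so a card without a CTA button falls back to the listing page URL
                url_elements = card.find_elements(By.CSS_SELECTOR, "a.button-cta")
                link = (url_elements[0].get_attribute("href") or url) if url_elements else url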

                # Create grant record
                grant_info = {
                    "Foundation": "Chan Zuckerberg Initiative",
                    "Title": title,
                    "Type": program_type,
                    "Description": description,
                    "Geographic Scale": "Not specified",
                    "Amount": amount,
                    "Deadline": deadline,
                    "Status": "Open",
                    "URL": link
                }

                grants_data.append(grant_info)
                active_count += 1
                print(f"Extracted active grant: {title}")

            except Exception as e:
                print(f"Error extracting grant: {str(e)}")
                traceback.print_exc()

        print(f"Found {active_count} active grants and skipped {closed_count} closed grants")

    except Exception as e:
        print(f"Error scraping CZI grants: {str(e)}")
        traceback.print_exc()

    return grants_data
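
The CZI scraper stores deadlines as day and month only (for example `18 June`), so they cannot be compared with today's date directly. One way to attach a year is sketched below. The helper name `resolve_day_month` is hypothetical, and two assumptions are built in: the month is spelled out in full in English, and a day and month that has already passed this year refers to next year.

```python
from datetime import date, datetime


def resolve_day_month(text, today):
    """Turn a 'DD Month' string into the next matching date on or after today.

    Returns None when the text does not have the expected shape.
    """
    for year in (today.year, today.year + 1):
        try:
            # Parse with the year included so that 29 February is validated correctly.
            candidate = datetime.strptime(f"{text.strip()} {year}", "%d %B %Y").date()
        except ValueError:
            continue
        if candidate >= today:
            return candidate
    return None


today = date(2025, 3, 8)
print(resolve_day_month("18 June", today))        # 2025-06-18
print(resolve_day_month("5 January", today))      # 2026-01-05
print(resolve_day_month("Not specified", today))  # None
```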

In [None]:
# New Cell: Define Ford Foundation scraper
def extract_ford_foundation_grants(url):
    """
    Extracts information about current grant opportunities from the Ford Foundation website.

    Args:
        url (str): URL of the Ford Foundation grant opportunities page

    Returns:
        list: List of dictionaries containing grant information
    """
    print(f"Scraping grant opportunities from {url}")

    driver.get(url)
    # Allow page to load completely
    time.sleep(5)

    grants_data = []

    try:
        # Find the "Current Opportunities" section
        opportunities_header = driver.find_elements(By.XPATH, "//h2[contains(text(), 'Current Opportunities')]")

        if not opportunities_header:
            print("Could not find 'Current Opportunities' section. Trying alternative approach...")
            opportunities_header = driver.find_elements(By.XPATH, "//h2[contains(@class, 'wp-block-heading')]")

        if opportunities_header:
            # Look for media-text blocks that contain grant opportunities
            # These appear to be the containers for individual grant listings
            media_blocks = driver.find_elements(By.CSS_SELECTOR, ".wp-block-media-text")

            if not media_blocks:
                print("No media blocks found. Trying alternative selectors...")
                media_blocks = driver.find_elements(By.CSS_SELECTOR, "div[class*='media-text']")

            print(f"Found {len(media_blocks)} potential grant blocks")

            for block in media_blocks:
                try:
                    # Extract grant title
                    title_element = block.find_element(By.CSS_SELECTOR, "h2, .wp-block-heading")
                    title = title_element.text.strip()

                    # Skip if this doesn't look like a grant opportunity
                    if not title or "Opportunities" in title:
                        continue

                    # Extract description
                    description = get_text_or_default(block, "p, .wp-block-paragraph", "Not specified")

                    # Extract link if available
                    link = url  # Default to the main page
                    link_elements = block.find_elements(By.CSS_SELECTOR, "a.wp-block-button__link")
                    if link_elements:
                        link = link_elements[0].get_attribute("href")

                    # Create grant info dictionary
                    grant_info = {
                        "Foundation": "Ford Foundation",
                        "Title": title,
                        "Type": "Not specified",  # Try to extract from content if possible
                        "Description": description,
                        "Geographic Scale": "Not specified",
                        "Amount": "Not specified",
                        "Deadline": "Not specified",
                        "Status": "Open",
                        "URL": link
                    }

                    # Try to identify grant type from the title or description
                    if "film" in title.lower() or "film" in description.lower():
                        grant_info["Type"] = "Film/Media"
                    elif "neighbor" in title.lower() or "york" in title.lower():
                        grant_info["Type"] = "Local Community"

                    # Try to extract geographic information if available
                    if "york" in title.lower() or "york" in description.lower():
                        grant_info["Geographic Scale"] = "New York City"

                    # Add to results
                    grants_data.append(grant_info)
                    print(f"Extracted grant: {title}")

                except Exception as e:
                    print(f"Error extracting grant from block: {str(e)}")
                    continue
        else:
            print("Could not locate grant opportunities section")

    except Exception as e:
        print(f"Error scraping Ford Foundation grants: {str(e)}")
        traceback.print_exc()

    print(f"Extracted {len(grants_data)} Ford Foundation grants")
    return grants_data
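
The Ford scraper infers a grant's type from keywords with an `if`/`elif` chain. If more categories are added, the same logic can be expressed as an ordered rule table. This is a sketch only: `TYPE_RULES` and `classify_type` are hypothetical names, and the example inputs are made up for illustration.

```python
# Ordered rules: the first rule with a matching keyword wins, like an if/elif chain.
# Each rule is (label, keywords, whether the description is searched as well).
TYPE_RULES = [
    ("Film/Media", ("film",), True),
    ("Local Community", ("neighbor", "york"), False),
]


def classify_type(title, description, rules=TYPE_RULES, default="Not specified"):
    """Return the label of the first rule whose keyword appears in the text."""
    title_text = title.lower()
    full_text = f"{title_text} {description.lower()}"
    for label, keywords, include_description in rules:
        haystack = full_text if include_description else title_text
        if any(keyword in haystack for keyword in keywords):
            return label
    return default


# Illustrative inputs, not scraped data.
print(classify_type("Documentary Film Fund", ""))               # Film/Media
print(classify_type("Good Neighbor Committee", "Local groups")) # Local Community
print(classify_type("General Support", "Based in New York"))    # Not specified
```

The last call returns the default because, as in the scraper, the "Local Community" keywords are checked against the title only.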

## Tiered Execution Strategy

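Runs each foundation's scraper in tier order, then post-processes the combined results: entries marked closed and news items are dropped, missing fields are filled with "Not specified", and each remaining grant receives a relevance score and its foundation's tier. The final list is sorted by relevance score, with tier as the tiebreaker. Duplicate rows are not removed at this stage.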
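
The sample output later in this notebook lists the Dreyfus "Spring Award Round" twice with identical fields. A de-duplication pass could be added before scoring. The sketch below assumes that two grants are the same when their `Foundation`, `Title`, and `URL` match after trimming and lowercasing. The function name `deduplicate_grants` and that choice of key are assumptions, not part of the original pipeline.

```python
def deduplicate_grants(grants, key_fields=("Foundation", "Title", "URL")):
    """Keep the first grant seen for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for grant in grants:
        key = tuple(str(grant.get(field, "")).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(grant)
    return unique


dreyfus_url = "https://www.mvdreyfusfoundation.org/contact"
sample = [
    {"Foundation": "Max and Victoria Dreyfus Foundation", "Title": "Spring Award Round", "URL": dreyfus_url},
    {"Foundation": "Max and Victoria Dreyfus Foundation", "Title": "Fall Award Round", "URL": dreyfus_url},
    {"Foundation": "Max and Victoria Dreyfus Foundation", "Title": "Spring Award Round", "URL": dreyfus_url},
]
print([g["Title"] for g in deduplicate_grants(sample)])
# ['Spring Award Round', 'Fall Award Round']
```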

In [None]:
# Modified Cell 11: Execute the scrapers with tiered strategy and sort by relevance
# Define foundation URLs
env_grants_url = "https://rosefdn.org/granting/environmental-grants/"
consumer_grants_url = "https://rosefdn.org/granting/consumer-rights-grants/"
dreyfus_timeline_url = "https://www.mvdreyfusfoundation.org/contact"
sloan_open_calls_url = "https://sloan.org/grants/open-calls"
simons_foundation_url = "https://www.simonsfoundation.org/funding-opportunities/"
czi_grants_url = "https://chanzuckerberg.com/science/science-funding/"
ford_grants_url = "https://www.fordfoundation.org/work/our-grants/grant-opportunities/"

# Store all grants in a single list
all_grants = []

# Define the foundation tiers and their scrapers
foundations = [
    # Tier 1 Foundations (most important to Delta)
    {"name": "Alfred P. Sloan Foundation", "url": sloan_open_calls_url, "scraper": extract_sloan_foundation_grants, "tier": 1},
    {"name": "Simons Foundation", "url": simons_foundation_url, "scraper": extract_simons_foundation_grants, "tier": 1},
    {"name": "Chan Zuckerberg Initiative", "url": czi_grants_url, "scraper": extract_czi_grants, "tier": 1},

    # Tier 2 Foundations
    {"name": "Rose Foundation (Environmental)", "url": env_grants_url, "scraper": lambda url: extract_rose_grants(url, "Environmental"), "tier": 2},
    {"name": "Rose Foundation (Consumer Rights)", "url": consumer_grants_url, "scraper": lambda url: extract_rose_grants(url, "Consumer Rights"), "tier": 2},
    {"name": "Max and Victoria Dreyfus Foundation", "url": dreyfus_timeline_url, "scraper": extract_dreyfus_foundation_info, "tier": 2},
    {"name": "Ford Foundation", "url": ford_grants_url, "scraper": extract_ford_foundation_grants, "tier": 2}
]

# Sort foundations by tier
foundations.sort(key=lambda x: x["tier"])

print("Starting extraction process with tiered scraping strategy...")

# Scrape each foundation based on its tier
for foundation in foundations:
    print(f"\n--- {foundation['name']} (TIER {foundation['tier']}) ---")
    try:
        # Use the dedicated scraper for this foundation
        foundation_grants = foundation["scraper"](foundation["url"])

        # Add grants to the list
        all_grants.extend(foundation_grants)
        print(f"Extracted {len(foundation_grants)} grants")
    except Exception as e:
        print(f"Error scraping {foundation['name']}: {e}")
        traceback.print_exc()

# Post-process the grants and calculate relevance scores
processed_grants = []
for grant in all_grants:
    # Skip entries whose text fields contain the whole word "closed"
    # (this also covers "applications closed" and "app closed"; matching on
    # word boundaries avoids false positives such as "disclosed")
    is_closed = any(
        isinstance(value, str) and re.search(r"\bclosed\b", value, re.IGNORECASE)
        for value in grant.values()
    )

    if is_closed:
        print(f"Skipping closed grant: {grant.get('Title', 'Unnamed')}")
        continue

    # Skip news items
    if "latest" in grant.get('URL', '').lower():
        continue

    # Ensure all grants have required fields
    for field in ['Description', 'Geographic Scale', 'Amount', 'Deadline', 'Type']:
        if field not in grant or not grant[field]:
            grant[field] = "Not specified"

    # Calculate relevance score based on Delta's priorities
    grant['Relevance_Score'] = calculate_relevance_score(grant)

    # Add the foundation tier
    foundation_name = grant.get('Foundation', '')
    grant['Tier'] = get_foundation_tier(foundation_name)

    processed_grants.append(grant)

# Sort grants by relevance score (primary) and tier (secondary)
processed_grants.sort(key=lambda x: (-x['Relevance_Score'], x['Tier']))

# Create a DataFrame from the processed grants data
grants_df = pd.DataFrame(processed_grants)

# Display the DataFrame
print(f"\nFinal dataframe size: {len(grants_df)} rows")
print(f"Columns: {grants_df.columns.tolist()}")

# Display top grants by relevance
print("\nTop grants by relevance to Delta's priorities:")
if not grants_df.empty:
    display_columns = ['Relevance_Score', 'Tier', 'Foundation', 'Title', 'Type', 'Deadline']
    display_df = grants_df[display_columns].head(10)
    print(display_df)

grants_df

Starting extraction process with tiered scraping strategy...

--- Alfred P. Sloan Foundation (TIER 1) ---
Scraping open calls from https://sloan.org/grants/open-calls
Waiting for page to load and protection systems to clear...
Page title: Open Calls
Found 2 grant sections
Processing grant: Metascience and AI Postdoctoral Fellowships
Added grant: Metascience and AI Postdoctoral Fellowships
Processing grant: Letters of Inquiry on Interdisciplinary Social Science Research on Energy System Interactions in the United States
Added grant: Letters of Inquiry on Interdisciplinary Social Science Research on Energy System Interactions in the United States
Extracted 2 grants

--- Simons Foundation (TIER 1) ---
Scraping funding opportunities from https://www.simonsfoundation.org/funding-opportunities/
Today's date: 2025-03-08
Found 51 potential grant entries
Skipping grant with passed deadline (2025-03-06): Autism Rat Models Consortium 2.0 RFA
Extracted active grant: Simons Dissertation Fellowship 

Unnamed: 0,Foundation,Title,Type,Description,Geographic Scale,Amount,Deadline,Status,URL,Relevance_Score,Tier
0,Chan Zuckerberg Initiative,Accelerating and Scaling Biological Sciences W...,Not specified,This RFA invites applications to build large-s...,Not specified,Not specified,18 June,Open,https://chanzuckerberg.com/rfa/ai-computing-gpu/,33.333333,1
1,Alfred P. Sloan Foundation,Metascience and AI Postdoctoral Fellowships,Submissions,"Grants of up to $250,000 (USD) over up to two ...",Not specified,"$250,000","April 10th, 2025",Open,https://sloan.org/programs/digital-technology/...,16.666667,1
2,Simons Foundation,Pivot Fellowship,Simons Foundation,The fellowship will enable today’s brightest m...,National,Not specified,Not specified,Open,https://www.simonsfoundation.org/grant/pivot-f...,16.666667,1
3,Max and Victoria Dreyfus Foundation,Spring Award Round,General Support,Contact foundation for details,National,Not specified,Check website for current deadlines,Open,https://www.mvdreyfusfoundation.org/contact,16.666667,2
4,Max and Victoria Dreyfus Foundation,Fall Award Round,General Support,Contact foundation for details,National,Not specified,Check website for current deadlines,Open,https://www.mvdreyfusfoundation.org/contact,16.666667,2
5,Max and Victoria Dreyfus Foundation,Spring Award Round,General Support,Contact foundation for details,National,Not specified,Check website for current deadlines,Open,https://www.mvdreyfusfoundation.org/contact,16.666667,2
6,Alfred P. Sloan Foundation,Letters of Inquiry on Interdisciplinary Social...,Letters of Inquiry,"Grants of $500,000 - $1,000,000 to be made for...",Not specified,"$500,000 - $1,000,000","March 25, 2025",Open,https://apply.sloan.org/prog/energy_system_int...,0.0,1
7,Simons Foundation,Simons Dissertation Fellowship in Mathematics,Mathematics & Physical Sciences,The Simons Foundation’s Mathematics and Physic...,National,Not specified,"March 31, 2025",Open,https://www.simonsfoundation.org/grant/simons-...,0.0,1
8,Simons Foundation,Targeted Grants in MPS,Mathematics & Physical Sciences,The program is intended to support high-risk t...,National,Not specified,Not specified,Rolling,https://www.simonsfoundation.org/grant/targete...,0.0,1
9,Chan Zuckerberg Initiative,Global Science Scholars,BIOHUB NETWORK,An up to two-year international postdoctoral f...,Not specified,Not specified,27 May,Open,https://www.czbiohub.org/program-ssfcz-global-...,0.0,1
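
The row order in the table above follows from the sort key `(-Relevance_Score, Tier)`: higher scores come first, and among equal scores the lower tier number wins. Python's sort is stable, so grants that tie on both values keep their scrape order. A small worked example, using titles from the table with scores rounded for readability:

```python
grants = [
    {"Title": "Spring Award Round", "Relevance_Score": 16.67, "Tier": 2},
    {"Title": "Targeted Grants in MPS", "Relevance_Score": 0.0, "Tier": 1},
    {"Title": "Pivot Fellowship", "Relevance_Score": 16.67, "Tier": 1},
    {"Title": "Metascience and AI Postdoctoral Fellowships", "Relevance_Score": 16.67, "Tier": 1},
]

# Negating the score sorts it in descending order while the tier stays ascending.
ranked = sorted(grants, key=lambda g: (-g["Relevance_Score"], g["Tier"]))

for grant in ranked:
    print(grant["Relevance_Score"], grant["Tier"], grant["Title"])
# 16.67 1 Pivot Fellowship
# 16.67 1 Metascience and AI Postdoctoral Fellowships
# 16.67 2 Spring Award Round
# 0.0 1 Targeted Grants in MPS
```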


In [None]:
# Cell 8: Export to CSV and clean up
# Export the grants data to a CSV file
current_date = datetime.now().strftime("%Y-%m-%d")
filename = f"foundation_active_grants_{current_date}.csv"

if not grants_df.empty:
    # Ensure consistent columns order for better readability
    desired_columns = [
        'Foundation', 'Title', 'Type', 'Description',
        'Geographic Scale', 'Amount', 'Deadline', 'Status', 'URL'
    ]

    # Add any additional columns that might be present
    all_columns = desired_columns + [col for col in grants_df.columns if col not in desired_columns]

    # Select columns that actually exist in the dataframe
    actual_columns = [col for col in all_columns if col in grants_df.columns]

    # Reorder and export
    grants_df = grants_df[actual_columns]
    grants_df.to_csv(filename, index=False)
    print(f"Active grants data exported to '{filename}'")

    # Print a preview of what's being saved
    print("\nPreview of CSV content:")
    print(grants_df[['Foundation', 'Title', 'Status']].head(10))
else:
    print("WARNING: No grants were found to export!")

# Clean up
driver.quit()



Active grants data exported to 'foundation_active_grants_2025-03-08.csv'

Preview of CSV content:
                            Foundation  \
0           Chan Zuckerberg Initiative   
1           Alfred P. Sloan Foundation   
2                    Simons Foundation   
3  Max and Victoria Dreyfus Foundation   
4  Max and Victoria Dreyfus Foundation   
5  Max and Victoria Dreyfus Foundation   
6           Alfred P. Sloan Foundation   
7                    Simons Foundation   
8                    Simons Foundation   
9           Chan Zuckerberg Initiative   

                                               Title   Status  
0  Accelerating and Scaling Biological Sciences W...     Open  
1        Metascience and AI Postdoctoral Fellowships     Open  
2                                   Pivot Fellowship     Open  
3                                 Spring Award Round     Open  
4                                   Fall Award Round     Open  
5                                 Spring Award Round   

In [None]:
# Modified Cell 12: Export to CSV with relevance scores and tiers
# Export the grants data to a CSV file
current_date = datetime.now().strftime("%Y-%m-%d")
filename = f"delta_prioritized_grants_{current_date}.csv"

if not grants_df.empty:
    # Ensure consistent columns order for better readability
    desired_columns = [
        'Relevance_Score', 'Tier', 'Foundation', 'Title', 'Type', 'Description',
        'Geographic Scale', 'Amount', 'Deadline', 'Status', 'URL'
    ]

    # Add any additional columns that might be present
    all_columns = desired_columns + [col for col in grants_df.columns if col not in desired_columns]

    # Select columns that actually exist in the dataframe
    actual_columns = [col for col in all_columns if col in grants_df.columns]

    # Reorder and export
    grants_df = grants_df[actual_columns]
    grants_df.to_csv(filename, index=False)
    print(f"Prioritized grants data exported to '{filename}'")

    # Print a preview of what's being saved
    print("\nPreview of CSV content (sorted by relevance to Delta's priorities):")
    print(grants_df[['Relevance_Score', 'Tier', 'Foundation', 'Title']].head(10))
else:
    print("WARNING: No grants were found to export!")

# Clean up
driver.quit()

Prioritized grants data exported to 'delta_prioritized_grants_2025-03-08.csv'

Preview of CSV content (sorted by relevance to Delta's priorities):
   Relevance_Score  Tier                           Foundation  \
0        33.333333     1           Chan Zuckerberg Initiative   
1        16.666667     1           Alfred P. Sloan Foundation   
2        16.666667     1                    Simons Foundation   
3        16.666667     2  Max and Victoria Dreyfus Foundation   
4        16.666667     2  Max and Victoria Dreyfus Foundation   
5        16.666667     2  Max and Victoria Dreyfus Foundation   
6         0.000000     1           Alfred P. Sloan Foundation   
7         0.000000     1                    Simons Foundation   
8         0.000000     1                    Simons Foundation   
9         0.000000     1           Chan Zuckerberg Initiative   

                                               Title  
0  Accelerating and Scaling Biological Sciences W...  
1        Metascience and AI
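
Both export cells build the column order the same way: preferred columns first, then whatever else is present. That step can be factored into a small helper, sketched here without pandas so it runs on plain lists. The name `order_columns` is hypothetical.

```python
def order_columns(present, preferred):
    """Return preferred columns that exist, followed by the remaining columns."""
    present = list(present)
    front = [col for col in preferred if col in present]
    rest = [col for col in present if col not in preferred]
    return front + rest


present = ["Foundation", "Title", "URL", "Relevance_Score", "Tier", "Extra"]
preferred = ["Relevance_Score", "Tier", "Foundation", "Title", "Type"]
print(order_columns(present, preferred))
# ['Relevance_Score', 'Tier', 'Foundation', 'Title', 'URL', 'Extra']
```

With a DataFrame, the result could be used as `grants_df[order_columns(grants_df.columns, desired_columns)]`. Preferred columns that are missing, such as `Type` here, are skipped rather than raising a `KeyError`.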

## Results Summary

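The 2025-03-08 run extracted active grant opportunities from the configured foundation pages, scored them for relevance, and exported them to `delta_prioritized_grants_2025-03-08.csv`, ranked by alignment with Delta's strategic priorities. The highest-scoring entry was a Chan Zuckerberg Initiative RFA with a relevance score of 33.3. The scrapers handled a range of website structures and dynamically loaded content.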