In [1]:
# 🌿 Botanical Species Identification Tool v1.0
### (e-Flora of South Africa Edition)

Welcome! This tool helps you identify botanical species in South Africa by comparing your observations against data from the e-Flora of South Africa and occurrence records from GBIF.

---

## 🚀 How to Use This Tool (3 Simple Steps)

1.  **Provide Your API Key:**
    *   When the notebook runs, the final cell (`CELL 4`) asks for your Google AI API key. You can get a free key from [Google AI Studio](https://aistudio.google.com/app/apikey).
    *   **Tip:** For convenience, you can add your key to Colab's "Secrets" manager (click the 🔑 icon on the left) with the name `GOOGLE_API_KEY`. The tool will find it automatically.

2.  **Run the Entire Notebook:**
    *   Go to the menu and click **`Runtime` -> `Run all`**.
    *   This will execute all the setup, data download, and function definitions automatically. It may take a minute or two.

3.  **Interact with the Tool:**
    *   Scroll down to the **very last cell (`CELL 4`)**.
    *   Adjust the `Latitude`, `Longitude`, `Radius`, and `Taxon Name` in the interactive form.
    *   Optionally, add your own specimen description.
    *   Run the cell: the results, map, and raw data appear below the form. After changing any parameter, re-run the cell to refresh the output.

---
**Data Source Credit:** This tool utilizes data from the [e-Flora of South Africa](http://www.sanbi.org.za/), managed by the South African National Biodiversity Institute (SANBI).


In [2]:
# ==============================================================================
# BOTANICAL SPECIES IDENTIFICATION TOOL v1.0 (e-Flora of South Africa Edition)
# ==============================================================================

# ==============================================================================
# CELL 1: SETUP AND INSTALLATION
# Run this cell once at the start of your session.
# ==============================================================================

# --- 1. Install necessary libraries ---
!pip install -q google-generativeai pandas folium tqdm markdown pygbif

# --- 2. Import all required tools ---
import pandas as pd
import google.generativeai as genai
import pygbif.species as gbif_species
import pygbif.occurrences as gbif_occ
import math
import time
import os
from datetime import datetime
import markdown
from google.colab import files, userdata, drive
from IPython.display import display, Markdown, HTML
import folium
from tqdm.auto import tqdm
import warnings

# Ignore minor warnings to keep the output clean.
warnings.filterwarnings('ignore')

print("✅ Setup complete. You can now proceed to Cell 2.")
print(f"📅 Session started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m70.2/70.2 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.4/61.4 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m70.0/70.0 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[?25h✅ Setup complete. You can now proceed to Cell 2.
📅 Session started: 2025-09-24 12:03:18


In [3]:
# ==============================================================================
# CELL 2: LOAD E-FLORA OF SOUTH AFRICA DATA (AUTOMATIC DOWNLOAD)
# This cell automatically downloads the required data files from public repositories.
# ==============================================================================
import os
import pandas as pd
from tqdm.auto import tqdm

# --- 1. Define Public Data URLs ---

# --- Files on GitHub (for small files) ---
GITHUB_BASE_URL = "https://raw.githubusercontent.com/Gouania/botanical-id-tool-sa/main/"

# --- File on Google Drive (for the large description.txt) ---
# We define the File ID separately to make the command cleaner
FILE_ID = "1eqLf_WrdZOZj6yxxq0018feKuJIqIQcc"
DESCRIPTION_FILE_URL = f"https://drive.google.com/uc?export=download&id={FILE_ID}"

FILES_TO_DOWNLOAD = {
    "taxon.txt": GITHUB_BASE_URL + "taxon.txt",
    "vernacularname.txt": GITHUB_BASE_URL + "vernacularname.txt",
    "description.txt": DESCRIPTION_FILE_URL
}

# --- 2. Download the data files ---
print("📂 Downloading required e-Flora data files...")
for filename, url in FILES_TO_DOWNLOAD.items():
    print(f"   -> Downloading {filename}...")
    # ==========================================================================
    # === KEY FIX: Use the correct wget command for each file source ===
    # ==========================================================================
    if filename == "description.txt":
        # Use the complex command ONLY for the Google Drive file
        !wget -q --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id={FILE_ID}' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id={FILE_ID}" -O {filename} && rm -rf /tmp/cookies.txt
    else:
        # Use the simple command for the GitHub files
        !wget -q -O {filename} {url}

# --- 3. Load and Process the Data ---
def load_dwca_data_from_local():
    """
    Loads the downloaded text files into a pandas DataFrame.
    """
    if not all(os.path.exists(f) for f in FILES_TO_DOWNLOAD.keys()):
        print("\n❌ ERROR: One or more data files failed to download. Please run the cell again.")
        return None

    try:
        print("\n🔄 Processing downloaded files... (This may take a moment)")
        taxa_df = pd.read_csv('taxon.txt', sep='\t', header=0, usecols=['id', 'scientificName'], dtype={'id': str})
        desc_df = pd.read_csv('description.txt', sep='\t', header=0, usecols=['id', 'description', 'type'], dtype={'id': str})
        # The dtype key must match the renamed column so taxonID stays a string for the merges below.
        vernacular_df = pd.read_csv('vernacularname.txt', sep='\t', header=0, usecols=[0, 1], names=['taxonID', 'vernacularName'], dtype={'taxonID': str})

        taxa_df.rename(columns={'id': 'taxonID'}, inplace=True)
        desc_df.rename(columns={'id': 'taxonID'}, inplace=True)
        taxa_df['cleanScientificName'] = taxa_df['scientificName'].apply(lambda x: ' '.join(str(x).split()[:2]))
        desc_agg = desc_df.groupby('taxonID').apply(lambda x: x.set_index('type')['description'].to_dict()).reset_index(name='descriptions')
        vernacular_agg = vernacular_df.groupby('taxonID')['vernacularName'].apply(lambda x: list(set(x))).reset_index()
        eflora_data = pd.merge(taxa_df, desc_agg, on='taxonID', how='left')
        eflora_data = pd.merge(eflora_data, vernacular_agg, on='taxonID', how='left')
        eflora_data.set_index('cleanScientificName', inplace=True)

        if eflora_data.index.has_duplicates:
            print(f"   ⚠️ Found {eflora_data.index.duplicated().sum()} duplicate names after cleaning. Keeping first entry for each.")
            eflora_data = eflora_data[~eflora_data.index.duplicated(keep='first')]

        print(f"\n✅ Successfully loaded and processed data for {len(eflora_data)} taxa.")
        print("   Ready for analysis. Proceed to Cell 3.")
        return eflora_data

    except Exception as e:
        print(f"\n❌ An error occurred during data loading: {e}")
        return None

# --- Execute the data loading process ---
EFLORA_DATA = load_dwca_data_from_local()

📂 Downloading required e-Flora data files...
   -> Downloading taxon.txt...
   -> Downloading vernacularname.txt...
   -> Downloading description.txt...

🔄 Processing downloaded files... (This may take a moment)
   ⚠️ Found 1412 duplicate names after cleaning. Keeping first entry for each.

✅ Successfully loaded and processed data for 20174 taxa.
   Ready for analysis. Proceed to Cell 3.
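
The duplicate-name warning above follows from how Cell 2 builds `cleanScientificName`: each name is truncated to its first two words, so infraspecific taxa collapse onto their parent species and only the first row survives. A minimal sketch of that effect, using made-up rows (the genus `Examplea`, the authority `Auth.`, and the taxon IDs are hypothetical and not taken from the e-Flora files):

```python
import pandas as pd

# Hypothetical rows for illustration only; real e-Flora IDs and names differ.
taxa = pd.DataFrame({
    "taxonID": ["t1", "t2", "t3"],
    "scientificName": [
        "Examplea alba Auth.",
        "Examplea alba Auth. subsp. minor",  # hypothetical infraspecific taxon
        "Examplea rubra Auth.",
    ],
})

# Same cleaning rule as Cell 2: keep only the first two words.
taxa["cleanScientificName"] = taxa["scientificName"].apply(
    lambda name: " ".join(str(name).split()[:2])
)

# Rows whose cleaned name has already appeared earlier in the table.
duplicate_mask = taxa["cleanScientificName"].duplicated(keep="first")
print(f"Duplicates after cleaning: {duplicate_mask.sum()}")

# Keeping the first entry drops the later infraspecific row.
deduplicated = taxa[~duplicate_mask].set_index("cleanScientificName")
print(deduplicated.index.tolist())
```

Because the surviving row depends on file order, a description attached to a subspecies or variety may be the one that ends up representing the species, or may be dropped entirely.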


In [4]:
# ==============================================================================
# CELL 3: CORE FUNCTIONS
# Run this cell once after loading the data to define the tool's capabilities.
# ==============================================================================

# --- Configuration ---
MODEL_NAME = "models/gemini-2.5-flash"
CACHE = {'gbif_taxa': {}} # Cache for avoiding redundant GBIF calls

# --- Helper Functions ---
def format_species_name(name):
    """Clean and format species names to 'Genus species' for matching."""
    if not name: return None
    parts = name.split()
    return f"{parts[0]} {parts[1]}" if len(parts) >= 2 else name

# --- GBIF Data Retrieval ---
def get_species_list_from_gbif(latitude, longitude, radius_km, taxon_name, limit=1000):
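    """
    Queries GBIF for species recorded near a point. Accepts a taxon name of any
    rank (matched within kingdom Plantae) and searches the latitude/longitude
    bounding box that encloses the requested radius, so records in the corners
    of the box may lie somewhat farther than radius_km from the centre.
    """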
    print(f"\n📍 Searching GBIF for '{taxon_name}' within {radius_km}km of ({latitude:.4f}, {longitude:.4f})")
    cache_key = f"{taxon_name}_{latitude}_{longitude}_{radius_km}"
    if cache_key in CACHE['gbif_taxa']:
        print("   ✓ Using cached GBIF data for this location.")
        return CACHE['gbif_taxa'][cache_key]

    try:
        print(f"   > Looking up '{taxon_name}' in the GBIF backbone (Kingdom: Plantae)...")
        taxon_info = gbif_species.name_backbone(name=taxon_name, kingdom='Plantae', verbose=False)

        if 'usageKey' not in taxon_info or taxon_info.get('matchType') == 'NONE':
            print(f"   ❌ Taxon '{taxon_name}' could not be matched within Kingdom Plantae in the GBIF backbone.")
            print("      Please check the spelling or try a different taxonomic name.")
            return []

        found_name = taxon_info.get('scientificName', 'N/A')
        found_rank = taxon_info.get('rank', 'N/A').title()
        classification = " -> ".join(filter(None, [
            taxon_info.get('kingdom'), taxon_info.get('phylum'), taxon_info.get('class'),
            taxon_info.get('order'), taxon_info.get('family'), taxon_info.get('genus')
        ]))
        print(f"   ✓ GBIF matched '{found_name}' (Rank: {found_rank})")
        print(f"     Classification: {classification}")
        taxon_key = taxon_info['usageKey']

    except Exception as e:
        print(f"   ❌ An error occurred while contacting the GBIF backbone API: {e}")
        return []

    lat_offset = radius_km / 111.32
    lon_offset = radius_km / (111.32 * abs(math.cos(math.radians(latitude))))
    params = {'taxonKey': taxon_key, 'decimalLatitude': f'{latitude - lat_offset},{latitude + lat_offset}',
              'decimalLongitude': f'{longitude - lon_offset},{longitude + lon_offset}',
              'hasCoordinate': True, 'hasGeospatialIssue': False, 'limit': 300}

    all_records, offset = [], 0
    pbar = tqdm(total=limit, desc="   Fetching records", unit="rec", leave=False)
    while offset < limit:
        params['offset'] = offset
        try:
            response = gbif_occ.search(**params)
            batch = response.get('results', [])
            if not batch: break
            all_records.extend(batch)
            pbar.update(len(batch))
            if len(batch) < 300: break
            offset += len(batch)
            time.sleep(0.1)
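        except Exception as e:
            # Report the failure instead of stopping silently; keep the records fetched so far.
            print(f"   ⚠️ GBIF request failed at offset {offset}: {e}")
            break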
    pbar.close()

    species_dict = {}
    for record in all_records:
        species_name = record.get('species')
        if species_name:
            if species_name not in species_dict:
                species_dict[species_name] = {'name': species_name, 'count': 0, 'family': record.get('family', 'Unknown')}
            species_dict[species_name]['count'] += 1
    species_list = sorted(species_dict.values(), key=lambda x: x['count'], reverse=True)

    print(f"\n   ✓ Found {len(species_list)} unique species from {len(all_records)} records.")
    CACHE['gbif_taxa'][cache_key] = species_list
    return species_list

# --- Local e-Flora Data Retrieval ---
def get_local_eflora_description(scientific_name, eflora_data):
    """Retrieves botanical descriptions from the pre-loaded e-Flora DataFrame."""
    if scientific_name not in eflora_data.index:
        return (False, "Species not found in local database.")
    record = eflora_data.loc[scientific_name]
    descriptions = record.get('descriptions')
    vernacular_names = record.get('vernacularName')
    full_scientific_name = record.get('scientificName')
    if not isinstance(descriptions, dict): return (False, "No description data available.")
    priority_sections = ["Morphological description", "Diagnostic characters", "Habitat", "Distribution", "Morphology", "Diagnostic"]
    extracted_data = [f"**Scientific Name:** {full_scientific_name}"]
    if isinstance(vernacular_names, list) and not pd.isna(vernacular_names).all():
        valid_names = [name for name in vernacular_names if pd.notna(name)]
        if valid_names: extracted_data.append(f"**Common Names:** {', '.join(valid_names)}")
    for section in priority_sections:
        if section in descriptions and pd.notna(descriptions[section]):
            extracted_data.append(f"**{section}:**\n{descriptions[section]}")
    return (True, "\n\n".join(extracted_data)) if len(extracted_data) > 2 else (False, "No relevant sections found.")

# --- AI Analysis ---
def analyze_with_gemini(combined_descriptions, user_input, failed_list, species_metadata):
    """Sends the collected data to the Gemini AI for analysis."""
    # ==========================================================================
    # === KEY CHANGE: Improved user feedback during the wait ===
    # ==========================================================================
    print("\n🤖 Analyzing with Gemini AI... (This may take 10-30 seconds, please wait)")

    safety_settings = [{"category": c, "threshold": "BLOCK_NONE"} for c in ["HARM_CATEGORY_HARASSMENT", "HARM_CATEGORY_HATE_SPEECH", "HARM_CATEGORY_SEXUALLY_EXPLICIT", "HARM_CATEGORY_DANGEROUS_CONTENT"]]
    metadata_summary = "\n**Species Occurrence Data from GBIF (Top 10):**\n" + "".join([f"- {sp['name']} (Family: {sp['family']}, Records: {sp['count']})\n" for sp in species_metadata[:10]])
    failed_summary = f"**Species without local descriptions:** {', '.join(failed_list) if failed_list else 'None'}"
    if user_input and user_input.strip():
        prompt = f"""You are an expert field botanist. Your task is to identify a user's specimen based on their description, comparing it against a list of candidate species found in the area.
**USER'S SPECIMEN DESCRIPTION:**
{user_input}
**CANDIDATE SPECIES DATA (from local e-Flora):**
{combined_descriptions}
**CONTEXTUAL DATA:**
{metadata_summary}
{failed_summary}
**YOUR TASK:**
Provide a systematic identification analysis in this exact structure:
## 🎯 TOP CANDIDATES
List the 3 most likely species. For each, provide a **Match Confidence** percentage. Justify your choice by listing key **Matching Features** and any **Discrepancies**. Consider the GBIF record count as an indicator of how common a species is.
## 🔍 DIAGNOSTIC COMPARISON
Create a markdown table comparing the most important diagnostic features (e.g., leaves, flowers, habit) of the user's specimen against your top candidates.
## ⚠️ CRITICAL OBSERVATIONS & NEXT STEPS
What single, key feature would best confirm the identification? What should the user look for or photograph next to be certain?
"""
    else:
        prompt = f"""You are creating a practical field guide for botanists based on species known to occur in a specific area.
**AVAILABLE SPECIES DATA (from local e-Flora):**
{combined_descriptions}
**CONTEXTUAL DATA:**
{metadata_summary}
{failed_summary}
**YOUR TASK:**
Create a practical field guide using this exact structure, focusing only on the species for which descriptions were provided.
## 🌿 QUICK IDENTIFICATION MATRIX
Create a markdown table comparing the most diagnostic features (e.g., Habit, Leaf Shape, Flower Color, Habitat) for all available species. Use the GBIF record count to hint at which species are more commonly encountered.
## 🔑 SIMPLE DICHOTOMOUS KEY
Create a simple, practical dichotomous key to help differentiate between these species.
**CRITICAL FORMATTING RULES:**
1.  Each lead of a couplet (e.g., `1a` and `1b`) **MUST** be on its own, separate line. **NEVER** combine `...a` and `...b` leads onto the same line.
2.  Do not use dot leaders (`.......`).
3.  Use an arrow `->` to point to the result.
4.  Bold the species name or the "Go to" instruction.
**EXAMPLE OF PERFECT FORMAT:**
1a. Flowers yellow -> **Go to 2**
1b. Flowers white or pink -> **Go to 3**
2a. Leaves needle-like -> ***Species A***
2b. Leaves broad -> ***Species B***
3a. Shrub over 1m tall -> ***Species C***
3b. Shrub under 1m tall -> ***Species D***
## 👀 KEY FIELD MARKS
For each species, list the 2-3 most distinctive "at-a-glance" features that a botanist in the field could use for rapid identification.
"""
    try:
        model = genai.GenerativeModel(MODEL_NAME, safety_settings=safety_settings)
        response = model.generate_content(prompt)
        if response.parts:
            return response.text
        elif response.prompt_feedback and response.prompt_feedback.block_reason:
            reason = response.prompt_feedback.block_reason
            return f"⚠️ **Gemini Analysis Error:** The request was blocked by the API's safety filters (Reason: **{reason}**). Try reducing `MAX_SPECIES_TO_PROCESS`."
        else:
            return "⚠️ **Gemini Analysis Error:** The AI returned an empty response. This might be a temporary issue."
    except Exception as e:
        return f"⚠️ **Gemini Analysis Error:** An exception occurred during the API call. **Details:** {str(e)}"

# --- Main Workflow ---
def run_analysis(latitude, longitude, radius_km, taxon_name, user_input, max_species=20):
    """The main workflow that orchestrates data collection and analysis."""
    if EFLORA_DATA is None:
        print("\n❌ e-Flora data not loaded. Please run Cell 2 successfully first.")
        return None, None, None, None
    gbif_species_list = get_species_list_from_gbif(latitude, longitude, radius_km, taxon_name)
    if not gbif_species_list:
        print("\n❌ No species found in the specified area according to GBIF.")
        return None, None, None, None
    print(f"\n📚 Collecting local e-Flora descriptions for up to {max_species} most common species...")
    successful_lookups, failed_species = [], []
    species_to_process = gbif_species_list[:max_species]
    for species_info in tqdm(species_to_process, desc="   Processing species", unit="taxa"):
        name = species_info['name']
        clean_name = format_species_name(name)
        success, desc = get_local_eflora_description(clean_name, EFLORA_DATA)
        if success:
            successful_lookups.append({'name': name, 'description': desc, 'family': species_info['family'], 'gbif_count': species_info['count']})
        else:
            failed_species.append(name)
    print("\n" + "─" * 60 + "\n📊 Data Collection Summary:")
    print(f"   • Descriptions found: {len(successful_lookups)} / {len(species_to_process)}")
    if not successful_lookups:
        print("\n⚠️ No descriptions found for any of the most common species. Cannot perform analysis.")
        return None, None, failed_species, gbif_species_list
    combined_descriptions = "\n\n".join([f"### {s['name']} (Family: {s['family']}, GBIF Records in Area: {s['gbif_count']})\n{s['description']}" for s in successful_lookups])
    analysis_result = analyze_with_gemini(combined_descriptions, user_input, failed_species, gbif_species_list)
    return analysis_result, successful_lookups, failed_species, gbif_species_list

print("✅ All functions loaded successfully! Proceed to the final cell to run your analysis.")

✅ All functions loaded successfully! Proceed to the final cell to run your analysis.
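
`get_species_list_from_gbif` queries a latitude/longitude bounding box, not a true circle, so records near the corners of the box can lie farther from the centre than `radius_km`. The sketch below reproduces the notebook's offset arithmetic and adds a great-circle distance check that could be used to drop those corner records. The helper names `bounding_box` and `haversine_km`, the 6371 km mean Earth radius, and the two sample records are assumptions for illustration; none of them are part of the notebook.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # same constant the notebook uses
EARTH_RADIUS_KM = 6371.0    # assumed mean Earth radius


def bounding_box(latitude, longitude, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the radius."""
    lat_offset = radius_km / KM_PER_DEGREE_LAT
    lon_offset = radius_km / (KM_PER_DEGREE_LAT * abs(math.cos(math.radians(latitude))))
    return (latitude - lat_offset, latitude + lat_offset,
            longitude - lon_offset, longitude + lon_offset)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


center_lat, center_lon, radius_km = -33.90537, 25.21772, 15
min_lat, max_lat, min_lon, max_lon = bounding_box(center_lat, center_lon, radius_km)

# Distance from the centre to the middle of the northern edge and to a corner.
edge_distance = haversine_km(center_lat, center_lon, max_lat, center_lon)
corner_distance = haversine_km(center_lat, center_lon, max_lat, max_lon)
print(f"Edge: {edge_distance:.1f} km, corner: {corner_distance:.1f} km")

# Hypothetical records: one at the centre, one at a corner of the box.
records = [
    {"label": "centre", "decimalLatitude": center_lat, "decimalLongitude": center_lon},
    {"label": "corner", "decimalLatitude": max_lat, "decimalLongitude": max_lon},
]
kept = [
    r for r in records
    if haversine_km(center_lat, center_lon, r["decimalLatitude"], r["decimalLongitude"]) <= radius_km
]
print([r["label"] for r in kept])
```

Applying a filter like this to the fetched records would make the species counts match the circle drawn on the map, at the cost of discarding some records that the bounding-box query returned.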


In [5]:
# ==============================================================================
# CELL 4: MAIN EXECUTION - CONFIGURE AND RUN YOUR ANALYSIS
# This is the main control panel for the tool.
# ==============================================================================

# --- Step 1: Configure API Key ---
print("🔑 Configuring API access...")
try:
    # Attempt to get the key from Colab Secrets
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)
    print("✅ API key configured successfully from Colab Secrets.\n")
except (ImportError, userdata.SecretNotFoundError, userdata.NotebookAccessError):
    print("⚠️ Colab Secrets not found or key is missing.")
    # Fallback to manual input if secrets fail
    import getpass
    api_key_input = getpass.getpass("Please enter your Google API Key: ")
    if api_key_input:
        GOOGLE_API_KEY = api_key_input
        genai.configure(api_key=GOOGLE_API_KEY)
        print("✅ API key configured manually for this session.\n")
    else:
        print("❌ No API key provided. Cannot proceed.")
        raise SystemExit()
except Exception as e:
    print(f"❌ An unexpected error occurred during API configuration: {e}")
    raise SystemExit()

# --- Step 2: User Interface ---
print("="*60)
print("🌿 BOTANICAL SPECIES IDENTIFICATION TOOL (e-Flora SA)")
print("="*60)

#@title 📍 Location Settings
LATITUDE = -33.90537  #@param {type:"number"}
LONGITUDE = 25.21772   #@param {type:"number"}
RADIUS_KM = 15      #@param {type:"slider", min:1, max:100, step:1}

#@title 🌱 Search Parameters (Try any rank: Genus, Family, Order, etc.)
TAXON_NAME = "Phylica" #@param {type:"string"}
MAX_SPECIES_TO_PROCESS = 15 #@param {type:"slider", min:5, max:50, step:5}

#@title 🔬 Specimen Details (Optional: Leave blank for a general field guide)
USER_INPUT = "" #@param {type:"string"}

# --- Step 3: Run Pre-Analysis Checks ---
if 'EFLORA_DATA' not in globals() or EFLORA_DATA is None:
    print("\n❌ ERROR: e-Flora data is not loaded. Please run Cell 2 successfully before this cell.")
else:
    # --- Step 4: Display Search Area Map ---
    print("\n🗺️ Visualizing Search Area...")
    m = folium.Map(location=[LATITUDE, LONGITUDE], zoom_start=13)
    folium.Circle(
        location=[LATITUDE, LONGITUDE], radius=RADIUS_KM * 1000,
        popup=f"{RADIUS_KM}km search radius", color='blue', fill=True, fillOpacity=0.2
    ).add_to(m)
    folium.Marker(
        [LATITUDE, LONGITUDE], popup=f"Center: {LATITUDE:.4f}, {LONGITUDE:.4f}",
        icon=folium.Icon(color='red')
    ).add_to(m)
    display(m)

    # --- Step 5: Run the Full Analysis ---
    print("\n🚀 Starting analysis... This may take a few moments.")
    analysis_result, successful_lookups, failed_list, species_metadata = run_analysis(
        LATITUDE, LONGITUDE, RADIUS_KM, TAXON_NAME, USER_INPUT, MAX_SPECIES_TO_PROCESS
    )

    # --- Step 6: Display the Final Results ---
    if analysis_result:
        print("\n" + "="*60)
        print("📋 ANALYSIS RESULTS")
        print("="*60)
        display(Markdown(analysis_result))

        # --- Display Collapsible Raw Data ---
        if successful_lookups:
            html_parts = ["<br><h3>📖 Raw e-Flora Descriptions Used in Analysis (Click to Expand)</h3>"]
            for item in successful_lookups:
                description_html = markdown.markdown(item['description'])
                html_parts.append(f"""
                <details style="margin-bottom: 8px; border: 1px solid #ddd; border-radius: 4px; padding: 10px;">
                    <summary style="cursor: pointer; font-weight: bold;">
                        {item['name']}
                        <span style="font-weight: normal; color: #555;"> (GBIF Occurrences: {item['gbif_count']})</span>
                    </summary>
                    <div style="margin-top: 10px; padding-left: 15px; border-left: 2px solid #eee;">
                        {description_html}
                    </div>
                </details>
                """)
            display(HTML("".join(html_parts)))

        if failed_list:
            print("\n⚠️ Could not find local descriptions for the following species:")
            print("   " + ", ".join(failed_list))
    else:
        print("\n❌ Analysis could not be completed. Please check the logs above for errors.")

    print("\n" + "="*60)
    print("✨ Process complete!")
    print("="*60)

🔑 Configuring API access...
✅ API key configured successfully from Colab Secrets.

🌿 BOTANICAL SPECIES IDENTIFICATION TOOL (e-Flora SA)

🗺️ Visualizing Search Area...



🚀 Starting analysis... This may take a few moments.

📍 Searching GBIF for 'Phylica' within 15km of (-33.9054, 25.2177)
   > Looking up 'Phylica' in the GBIF backbone (Kingdom: Plantae)...
   ✓ GBIF matched 'Phylica L.' (Rank: Genus)
     Classification: Plantae -> Tracheophyta -> Magnoliopsida -> Rosales -> Rhamnaceae -> Phylica




   ✓ Found 9 unique species from 69 records.

📚 Collecting local e-Flora descriptions for up to 15 most common species...




────────────────────────────────────────────────────────────
📊 Data Collection Summary:
   • Descriptions found: 9 / 9

🤖 Analyzing with Gemini AI...

📋 ANALYSIS RESULTS


## 🌿 QUICK IDENTIFICATION MATRIX

| Species (GBIF Records)      | Max Height        | Leaf Characteristics                                                                                                        | Inflorescence Type                                                                | Flower Color/Characteristics                                       | Primary Habitat                               |
| :-------------------------- | :---------------- | :-------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------- | :----------------------------------------------------------------- | :-------------------------------------------- |
| **P. axillaris** (22)       | 0.8 m             | 5-15 mm, smooth upper, subacute base, apiculate apex.                                                                       | Lax, axillary racemes, well below branch tips.                                    | Whitish, 2.5-3.0 mm.                                               | Rocky slopes, mountain crests, coastal margins. |
| **P. willdenowiana** (18)   | 0.6 m             | 0.5-1.5 cm, linear/linear-lanceolate, mucronulate, revolute (covering lower surface), upper finely tubercled, grey-silky (young). | Many-flowered capituliform or lax racemes.                                        | 3.5-5 mm, covered with short/long grey hairs.                      | Sandstone slopes.                             |
| **P. gnidioides** (10)      | 1 m               | 8-10 mm, linear, smooth above, margins closely revolute.                                                                    | Rounded capitula, grouped in small corymbs, surrounded by ciliate leaves.         | **Pink**.                                                          | Dunes and grassy slopes.                      |
| **P. litoralis** (2)        | 0.3-1.0 m         | 7-15 mm, lanceolate, cordate base, margins revolute (half lower surface exposed), white-pubescent.                          | Hemispheric, many-flowered capitula, ± 10 mm wide.                                | 3-4 mm, white-velutinous.                                          | Coastal dunes.                                |
| **P. purpurea** (1)         | 2(3) m            | 5-10 mm, lanceolate, rounded/cordate base, margins revolute (half lower surface covered), upper tubercled white-hirsute.    | Hemispheric or orbicular capitula, 7-10 mm wide, surrounded by white-velutinous bracts. | 3-4 mm, white-velutinous. **Fruit silky white-villous.**           | Sandstone slopes, forest margins.             |
| **P. ericoides** (1)        | 0.6 m             | 5-8 mm, linear/lanceolate-linear, obtuse/subacute, cordate/rounded base, margins closely revolute (covering lower surface). | Solitary or clustered hemisphaeric capitula, 4-7 mm wide.                         | 1.5-2 mm, sepals with dense, coarse, often curly, white hairs.     | Coastal slopes and deep sands, renosterveld.  |
| **P. paniculata** (1)       | 5 m               | 10-15 x 3 mm, crowded, overlapping, margins revolute (concealing less than half of lower surface).                          | **Paniculate thyrses.**                                                           | Creamy white.                                                      | Woodland, rocky situations.                   |
| **P. abietina** (1)         | 0.5-1.5 m         | 4-6 mm, linear-lanceolate, base rounded, smooth above, apex laterally compressed, truncate, mucronulate.                    | Rounded capitula, ± 10 mm wide, surrounded by many leaves with enlarged petioles.  | **6-8 mm, densely white or pinkish tomentose**, calyx tube 3-4 mm deep. | Dry sandstone slopes.                         |
| **P. pinea** (1)            | 1 m               | ±12 mm, lanceolate to linear-lanceolate, cordate at base, margins strongly revolute.                                        | Short, mostly terminal racemes.                                                   | White.                                                             | Sandstone slopes.                             |

---

## 🔑 SIMPLE DICHOTOMOUS KEY

1a. Inflorescence a paniculate thyrse -> **Phylica paniculata**
1b. Inflorescence not a paniculate thyrse (racemes or capitula) -> **Go to 2**

2a. Inflorescence of lax racemes, sometimes capituliform -> **Go to 3**
2b. Inflorescence of rounded, hemispheric, or orbicular capitula -> **Go to 5**

3a. Inflorescences lax, axillary, well below branch tips; flowers whitish, 2.5-3.0 mm long -> **Phylica axillaris**
3b. Inflorescences terminal or lax, not strictly axillary well below branch tips -> **Go to 4**

4a. Inflorescence a many-flowered capituliform raceme (0.5-1 cm long) or lax raceme; flowers 3.5-5 mm long, covered with grey hairs; leaves with finely tubercled upper surface -> **Phylica willdenowiana**
4b. Inflorescence of short, mostly terminal racemes; flowers white; leaves lanceolate to linear-lanceolate, cordate at base, mostly ±12 mm long -> **Phylica pinea**

5a. Flowers distinctly pink -> **Phylica gnidioides**
5b. Flowers white or whitish, sometimes pinkish-tomentose, but not distinctly pink -> **Go to 6**

6a. Shrub or small tree, up to 2(3) m high; fruit silky white-villous; inflorescence hemispheric or orbicular capitula, 7-10 mm wide, surrounded by white-velutinous bracts -> **Phylica purpurea**
6b. Shrublet or shrub, usually less than 1.5 m high; fruit glabrous or velvety pubescent, not silky white-villous -> **Go to 7**

7a. Flowers relatively large (6-8 mm long), densely white or pinkish tomentose; calyx tube 3-4 mm deep; leaves 4-6 mm long, apex laterally compressed, truncate, mucronulate -> **Phylica abietina**
7b. Flowers smaller (1.5-4 mm long); calyx tube shallower (up to 0.5 mm deep); leaf apex not laterally compressed and truncate -> **Go to 8**

8a. Leaves 7-15 mm long, lanceolate, cordate at base, margins revolute (± half of lower surface exposed), white-pubescent; habitat primarily coastal dunes -> **Phylica litoralis**
8b. Leaves mostly 5-8 mm long, linear or lanceolate-linear, cordate or rounded at base, margins closely revolute (covering lower surface); outer surface of sepals with dense, coarse, often curly, white hairs; habitat coastal slopes/sands/renosterveld -> **Phylica ericoides**

---

## 👀 KEY FIELD MARKS

**Phylica axillaris**
*   Lax, axillary racemes appearing well below the branch tips.
*   Whitish flowers, 2.5-3.0 mm long.
*   Tomentose shrub on rocky slopes, mountain crests.

**Phylica willdenowiana**
*   Upper surface of leaves finely tubercled, often with grey, silky pubescence when young.
*   Flowers (3.5-5 mm) covered with short or long grey hairs.
*   Inflorescence often a distinctive, many-flowered capituliform raceme.

**Phylica gnidioides**
*   Distinctive pink flowers in rounded capitula.
*   Linear leaves with smooth upper surface and closely revolute margins.
*   Found on dunes and grassy slopes.

**Phylica litoralis**
*   Strongly associated with coastal dune habitats.
*   Leaves lanceolate, white-pubescent, with roughly half of the lower surface exposed by revolute margins.
*   Hemispheric, white-velutinous capitula.

**Phylica purpurea**
*   The only species with distinctly silky, white-villous fruit.
*   Can be a small tree up to 2-3 m tall.
*   Capitula surrounded by white-velutinous, foliaceous bracts.

**Phylica ericoides**
*   Small capitula (4-7 mm wide) with outer sepals densely covered in coarse, often curly, white hairs.
*   Leaves with closely revolute margins completely covering the lower surface.
*   Found on coastal slopes and deep sands.

**Phylica paniculata**
*   Distinctive paniculate thyrses (branched inflorescence of racemes).
*   Can grow into a large shrub or small tree up to 5 m high.
*   Flowers creamy white.

**Phylica abietina**
*   Large flowers (6-8 mm long), densely white or pinkish tomentose, with a deep calyx tube (3-4 mm).
*   Leaves 4-6 mm long, with a unique laterally compressed, truncate, mucronulate apex.
*   Rounded capitula surrounded by many leaves with enlarged petioles.

**Phylica pinea**
*   Leaves lanceolate to linear-lanceolate, cordate at base, typically around 12 mm long.
*   Flowers white, borne in short, mostly terminal racemes.
*   Found on sandstone slopes.


✨ Process complete!
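
The formatting rules that the Cell 3 prompt imposes on the dichotomous key (one lead per line, an `->` arrow, no dot leaders) can be checked mechanically before the key is shown to the user. The following is a minimal sketch of such a check; the function `check_key_format` and its regular expressions are hypothetical additions, not part of the notebook, and they only cover the rules listed in the prompt.

```python
import re

# One lead per line: "<number><a|b>. <text> -> <result>"
LEAD_PATTERN = re.compile(r"^(\d+)([ab])\.\s+(.+?)\s+->\s+(.+)$")
# A second lead label appearing later on the same line, e.g. " 1b. "
EXTRA_LEAD_PATTERN = re.compile(r"\b\d+[ab]\.\s")


def check_key_format(key_text):
    """Return a list of formatting problems found in a dichotomous key."""
    problems = []
    leads_seen = {}  # couplet number -> set of lead letters found
    for line_number, raw_line in enumerate(key_text.strip().splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue
        if re.search(r"\.{3,}", line):
            problems.append(f"line {line_number}: dot leaders found")
        match = LEAD_PATTERN.match(line)
        if not match:
            problems.append(f"line {line_number}: not in 'Na. text -> result' form")
            continue
        rest_of_line = match.group(3) + " " + match.group(4)
        if EXTRA_LEAD_PATTERN.search(rest_of_line):
            problems.append(f"line {line_number}: combined leads on one line")
        leads_seen.setdefault(int(match.group(1)), set()).add(match.group(2))
    for number, letters in sorted(leads_seen.items()):
        if letters != {"a", "b"}:
            problems.append(f"couplet {number}: expected leads a and b, found {sorted(letters)}")
    return problems


# The example key from the prompt passes the check.
good_key = """
1a. Flowers yellow -> **Go to 2**
1b. Flowers white or pink -> **Go to 3**
2a. Leaves needle-like -> ***Species A***
2b. Leaves broad -> ***Species B***
3a. Shrub over 1m tall -> ***Species C***
3b. Shrub under 1m tall -> ***Species D***
"""
print(check_key_format(good_key))

# A key that merges both leads of couplet 1 onto one line is flagged.
bad_key = "1a. Flowers yellow -> **Go to 2** 1b. Flowers white -> **Go to 3**"
for problem in check_key_format(bad_key):
    print(problem)
```

A check like this verifies layout only; it says nothing about whether the botanical content of the key is correct.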
