# VR Weapons Training Scoring Pipeline

## Overview
This pipeline scores defense companies based on their relevance to VR/AR/MR weapons and tactical training solutions.

## Scoring Logic

**Total Score: 0-100 points**

### Scoring Criteria:

1. **Tags Analysis (30 points)**
   - VR/AR/MR tags (including "Modelling and Simulation" and "Training and Simulation") → +30 points

2. **Weapons Training Tags (25 points)**
   - Weapons/Small Arms Training → +25 points

3. **Overview Keywords (45 points)**
   - "VR" / "virtual reality" / "immersive" / "simulation" → +15 points
   - "weapons" / "firearms" / "shooting" / "small arms" / "gunnery" → +15 points
   - "police" / "law enforcement" / "military training" / "armed forces" → +10 points
   - "tactical" / "combat simulation" / "combat training" / "tactical training" → +5 points

### Relevance Categories:

| Score Range | Category | Description |
|-------------|----------|-------------|
| 70-100 | **HIGH** | Direct VR weapons training provider |
| 40-69 | **MEDIUM** | Partially relevant solution |
| 0-39 | **LOW** | Not relevant |
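
As a quick illustration of how the criteria combine, the sketch below sums six component scores and maps the total onto the categories in the table above. The component values are the ones reported for Shoot House in the results section of this notebook; the helper name `categorize` and the dictionary keys are hypothetical and exist only for this example.

```python
# Hypothetical helper; the thresholds mirror the Relevance Categories table.
def categorize(total: int) -> str:
    """Map a 0-100 total onto HIGH / MEDIUM / LOW."""
    if total >= 70:
        return "HIGH"
    if total >= 40:
        return "MEDIUM"
    return "LOW"

# Component scores as reported for "Shoot House" in the results section.
components = {
    "tags_vr": 30,
    "tags_weapons": 25,
    "overview_vr": 15,
    "overview_weapons": 15,
    "overview_audience": 10,
    "overview_tactical": 0,
}

# The six maxima add up to exactly 100, so the cap is only a safeguard.
total = min(sum(components.values()), 100)
print(total, categorize(total))  # 95 HIGH
```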

In [2]:
import pandas as pd

df = pd.read_csv('/Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies.csv')

df.head(5)


| | company_name | slug_name | url | stand | tags | overview | website |
|---|--------------|-----------|-----|-------|------|----------|---------|
| 0 | "NT Service" UAB | nt-serviceuab | https://www.dsei.co.uk/exhibitors-list/nt-serv... | N9-205 | Telecommunications; Communication Systems; Com... | “NT SERVICE” a Lithuania based company with 27... | |
| 1 | 1MILLIKELVIN PTY LTD | 1millikelvin-pty | https://www.dsei.co.uk/exhibitors-list/1millik... | S2-110 | Maintenance and Repair; Technical Services; Se... | 1MILLIKELVIN are an innovative Australian manu... | http://www.1millikelvin.com |
| 2 | 2Excel Aviation | 2excel-aviation | https://www.dsei.co.uk/exhibitors-list/2excel-... | S3-165 | Fixed Wing; Surveillance; Engineering Services... | 2Excel is an innovative aviation and aerospace... | http://www.2excelaviation.com |
| 3 | 2T Security | 2t-security | https://www.dsei.co.uk/exhibitors-list/2t-secu... | S15-154 | Cloud and Infrastructure; Artificial Intellige... | 2T Security provides agile, field-ready cyber ... | http://2T Security |
| 4 | 3A Composites Mobility AG | 3a-composites-mobility-ag | https://www.dsei.co.uk/exhibitors-list/3a-comp... | N9-150 | Utility Vehicles; Emergency Vehicles; Rough Te... | 3A Composites Mobility: Advanced Lightweight S... | http://Park Altenrhein |


In [3]:
# Check dataset structure
print("Columns:", df.columns.tolist())
print("\nShape:", df.shape)
print("\nSample tags:", df['tags'].head(3).tolist() if 'tags' in df.columns else "No tags column")
print("\nSample overview:", df['overview'].head(1).tolist() if 'overview' in df.columns else "No overview column")

Columns: ['company_name', 'slug_name', 'url', 'stand', 'tags', 'overview', 'website']

Shape: (1733, 7)

Sample tags: ['Telecommunications; Communication Systems; Command and Control; Countermeasures; Electronic Warfare; Counter UAV', 'Maintenance and Repair; Technical Services; Sensors; Infrared Technologies; Test Equipment and Facilities', 'Fixed Wing; Surveillance; Engineering Services; Furnishing and Fixings; Electrical Components and Subcomponents; Avionics; Sensing Technologies; Unmanned and Autonomous; Fixed Wing Aircraft Simulation and Training']

Sample overview: ['“NT SERVICE” a Lithuania based company with 27 years of industry experience in designing, implementing and servicing of communication and IT solutions for public, private and critical infrastructure networks.  “NT SERVICE” presents and manufactures Multi-dimensional security solutions for defense and public safety authorities. We put our best effort to meet the most demanding market requirements for Counter 

In [12]:
# Get all unique tags from dataset
all_tags = df['tags'].dropna().str.split(';').explode().str.strip().unique()
all_tags_sorted = sorted(all_tags)
print(f"Total unique tags: {len(all_tags_sorted)}\n")
for tag in all_tags_sorted:
    print(f"- {tag}")

Total unique tags: 374

- Access Management
- Accessories
- Active Microwave Technologies
- Actuators
- Additive Manufacturing
- Adhesive Bonding
- Adhesives
- Air Platforms
- Aircraft
- Airfield and Air Traffic Control Services
- Ammunition
- Amphibious Vehicles
- Anchor
- Antennas
- Anti-Aircraft
- Anti-Ship and Submarine
- Anti-Tank and Mines
- Architectures and Systems
- Armour
- Armouring
- Artificial Intelligence
- Artificial Intelligence (AI)
- Artillery
- Assembly
- Asset Tracking
- Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR)
- Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR) Based Training
- Augmented Service Technologies
- Automated Materials Handling
- Auxiliary power unit (APU)
- Avionics
- Ballistic Protection
- Base and Camp Protection and Security
- Batteries
- Battlefield Digitisation
- Battlefield Support
- Biometrics
- Biotechnology
- Boats
- Bodies and Panels
- Bomb Disposal (Explosive Ordnance Displosal - EOD / Improvised Exp
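
Beyond listing unique tags, it can help to see how often each tag occurs before picking scoring keywords. The following is a minimal standard-library sketch, assuming the same semicolon-separated format as the `tags` column; `sample_tags` and `split_tags` are hypothetical stand-ins, not part of the notebook.

```python
from collections import Counter

# Hypothetical sample rows in the same "tag; tag; tag" format as the tags column.
sample_tags = [
    "Sensors; Infrared Technologies; Avionics",
    "Avionics; Sensors",
    None,  # companies without tags
]

def split_tags(raw):
    """Split a semicolon-separated tag string into stripped, non-empty tags."""
    if not raw:
        return []
    return [t.strip() for t in raw.split(";") if t.strip()]

# Count every tag occurrence across all rows.
tag_counts = Counter(tag for raw in sample_tags for tag in split_tags(raw))
for tag, n in tag_counts.most_common():
    print(f"{n:>3}  {tag}")
```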

In [13]:
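# Search for VR/AR/training-related tags (case-insensitive substring match).
# Note: the short key 'ar' also matches inside unrelated tags such as
# "Armour" or "Footwear", so the list printed below over-matches.
search_keywords = ['vr', 'virtual', 'ar', 'augment', 'mixed', 'xr', 'simulat',
                   'train', 'weapon', 'small arm', 'firearm', 'shoot', 'tactical', 'combat']
vr_related = [tag for tag in all_tags_sorted
              if any(kw in tag.lower() for kw in search_keywords)]
print("Tags potentially relevant to VR/Weapons Training:\n")
for tag in vr_related:
    print(f"- {tag}")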

Tags potentially relevant to VR/Weapons Training:

- Anti-Ship and Submarine
- Architectures and Systems
- Armour
- Armouring
- Artificial Intelligence
- Artificial Intelligence (AI)
- Artillery
- Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR)
- Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR) Based Training
- Augmented Service Technologies
- Auxiliary power unit (APU)
- Breathing Apparatus
- Carriage and Mounts
- Circuit Boards
- Circuit Card Assemblies
- Combat Identification
- Counter-IED Training and Simulation
- Design and Information Management Software
- E-Learning
- Electronic Warfare
- Explosive Search Equipment
- Fixed Wing Aircraft Simulation and Training
- Flares
- Footwear
- Grenades and Mortars and Discharges
- Guided Weapons
- Hardware
- Harnesses and Climbing Equipment and Ropes
- Incident Management Training and Simulation
- Infrared Technologies
- Lidar Technologies
- Machine Learning
- Maintenance and Diagnostics Training
- Mecha
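
The list above includes tags such as "Armour", "Footwear" and "Hardware" because the two-letter key `ar` is matched as a plain substring. One possible way to tighten this, shown here only as a sketch, is to match short keys as whole words with a regular expression. The function name `matches_whole_word` is hypothetical, and the four sample tags are taken from the tag lists in this notebook.

```python
import re

def matches_whole_word(text: str, keywords: list[str]) -> bool:
    """True if any keyword occurs in text as a whole word or phrase (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

tags = [
    "Armour",
    "Footwear",
    "Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR)",
    "Small Arms Training and Simulation",
]
short_keys = ["vr", "ar", "mr", "xr"]

# Plain substring matching, as used in the cell above.
substring_hits = [t for t in tags if any(k in t.lower() for k in short_keys)]
# Whole-word matching only accepts the standalone abbreviations.
whole_word_hits = [t for t in tags if matches_whole_word(t, short_keys)]

print(len(substring_hits), len(whole_word_hits))  # 4 1
```

With whole-word matching only the tag containing the standalone abbreviations "(AR)", "(VR)" and "(MR)" is kept, while longer keys such as `simulat` would still need substring or prefix matching.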

## Scoring Pipeline Implementation

In [14]:
# =============================================================================
# SCORING CONFIGURATION
# =============================================================================
# Keywords and weights are kept in one place so they are easy to adjust later

SCORING_CONFIG = {
    # Tags Analysis (max 30 points)
    # Based on actual dataset tags found
    "tags": {
        "vr_ar_mr": {
            "keywords": [
                # Exact tags from dataset
                "Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR)",
                "Augmented Reality (AR) / Virtual Reality (VR) / Mixed Reality (MR) Based Training",
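                # Partial matches (case-insensitive substrings; note that short
                # keys such as "ar" also match inside tags like "Small Arms")
                "virtual reality", "augmented reality", "mixed reality",
                "vr", "ar", "mr", "xr",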
                # Related
                "Modelling and Simulation",
                "Training and Simulation",
            ],
            "points": 30
        }
    },
    
    # Weapons Training Tags (max 25 points)
    # Based on actual dataset tags found
    "weapons_tags": {
        "keywords": [
            # Exact tags from dataset
            "Small Arms Training and Simulation",
            "Weapons Training and Simulation",
            "Small Arms and Guns",
            "Weapons",
            # Partial matches
            "weapons", "small arms", "firearms", "shooting", "marksmanship", "gunnery"
        ],
        "points": 25
    },
    
    # Overview Keywords Analysis (max 45 points)
    "overview": {
        "vr_keywords": {
            "keywords": ["vr", "virtual reality", "immersive", "simulation"],
            "points": 15
        },
        "weapons_keywords": {
            "keywords": ["weapons", "firearms", "shooting", "small arms", "gunnery"],
            "points": 15
        },
        "target_audience": {
            "keywords": ["police", "law enforcement", "military training", "armed forces"],
            "points": 10
        },
        "tactical_keywords": {
            "keywords": ["tactical", "combat simulation", "combat training", "tactical training"],
            "points": 5
        }
    }
}

# Relevance Categories
RELEVANCE_THRESHOLDS = {
    "HIGH": (70, 100),    # Direct VR weapons training provider
    "MEDIUM": (40, 69),   # Partially relevant solution
    "LOW": (0, 39)        # Not relevant
}
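
Because weights and thresholds are edited by hand, a small consistency check can catch a configuration whose maxima no longer add up to 100 or whose category bands leave a gap. The sketch below is an assumption about how such a check might look; it uses a flattened, hypothetical copy of the point values rather than `SCORING_CONFIG` itself, and `validate` is not part of the notebook.

```python
# Hypothetical, flattened copy of the config: only the point values matter here.
points = {
    "tags.vr_ar_mr": 30,
    "weapons_tags": 25,
    "overview.vr_keywords": 15,
    "overview.weapons_keywords": 15,
    "overview.target_audience": 10,
    "overview.tactical_keywords": 5,
}
thresholds = {"HIGH": (70, 100), "MEDIUM": (40, 69), "LOW": (0, 39)}

def validate(points: dict, thresholds: dict, max_score: int = 100) -> None:
    """Raise ValueError if weights or category bands are inconsistent."""
    total = sum(points.values())
    if total != max_score:
        raise ValueError(f"weights sum to {total}, expected {max_score}")
    bands = sorted(thresholds.values())
    if bands[0][0] != 0 or bands[-1][1] != max_score:
        raise ValueError("bands must span 0..max_score")
    # Each band must start exactly one point above the previous band's end.
    for (_, hi), (lo, _) in zip(bands, bands[1:]):
        if lo != hi + 1:
            raise ValueError(f"gap or overlap between {hi} and {lo}")

validate(points, thresholds)
print("config is consistent")
```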

In [15]:
# =============================================================================
# SCORING FUNCTIONS
# =============================================================================

def check_keywords(text: str, keywords: list) -> bool:
    """Check if any keyword occurs in text as a case-insensitive substring."""
    if pd.isna(text):
        return False
    text_lower = text.lower()
    return any(kw.lower() in text_lower for kw in keywords)


def score_tags_vr(tags: str) -> int:
    """Score VR/AR/MR tags (max 30 points)."""
    config = SCORING_CONFIG["tags"]["vr_ar_mr"]
    if check_keywords(tags, config["keywords"]):
        return config["points"]
    return 0


def score_tags_weapons(tags: str) -> int:
    """Score weapons training tags (max 25 points)."""
    config = SCORING_CONFIG["weapons_tags"]
    if check_keywords(tags, config["keywords"]):
        return config["points"]
    return 0


def score_overview(overview: str) -> dict:
    """
    Score overview text based on keywords.
    Returns dict with individual scores for transparency.
    """
    scores = {
        "vr_score": 0,
        "weapons_score": 0,
        "audience_score": 0,
        "tactical_score": 0
    }
    
    if pd.isna(overview):
        return scores
    
    config = SCORING_CONFIG["overview"]
    
    # VR keywords (+15 points)
    if check_keywords(overview, config["vr_keywords"]["keywords"]):
        scores["vr_score"] = config["vr_keywords"]["points"]
    
    # Weapons keywords (+15 points)
    if check_keywords(overview, config["weapons_keywords"]["keywords"]):
        scores["weapons_score"] = config["weapons_keywords"]["points"]
    
    # Target audience keywords (+10 points)
    if check_keywords(overview, config["target_audience"]["keywords"]):
        scores["audience_score"] = config["target_audience"]["points"]
    
    # Tactical keywords (+5 points)
    if check_keywords(overview, config["tactical_keywords"]["keywords"]):
        scores["tactical_score"] = config["tactical_keywords"]["points"]
    
    return scores


def get_relevance_category(score: int) -> str:
    """Determine relevance category based on total score."""
    for category, (min_score, max_score) in RELEVANCE_THRESHOLDS.items():
        if min_score <= score <= max_score:
            return category
    return "LOW"
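
With inclusive integer ranges, a fractional score such as 69.5 would match no range and fall through to the `"LOW"` default. All current weights are integers, so this does not occur in the notebook; the sketch below only shows one possible alternative, storing a lower bound per category, in case fractional weights are ever introduced. `CATEGORY_FLOORS` and `categorize` are hypothetical names.

```python
# Hypothetical alternative: store only each category's lower bound, highest first.
CATEGORY_FLOORS = [("HIGH", 70), ("MEDIUM", 40), ("LOW", 0)]

def categorize(score: float) -> str:
    """Return the first category whose floor the score reaches."""
    for name, floor in CATEGORY_FLOORS:
        if score >= floor:
            return name
    return "LOW"  # negative scores, should they ever occur

print(categorize(69.5))  # MEDIUM
```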

In [16]:
# =============================================================================
# MAIN SCORING PIPELINE
# =============================================================================

def score_company(row: pd.Series) -> pd.Series:
    """
    Calculate all scores for a single company.
    Returns a Series with all scoring components.
    """
    # Tags scoring
    tags_vr_score = score_tags_vr(row.get("tags", ""))
    tags_weapons_score = score_tags_weapons(row.get("tags", ""))
    
    # Overview scoring
    overview_scores = score_overview(row.get("overview", ""))
    
    # Calculate total
    total_score = (
        tags_vr_score +
        tags_weapons_score +
        overview_scores["vr_score"] +
        overview_scores["weapons_score"] +
        overview_scores["audience_score"] +
        overview_scores["tactical_score"]
    )
    
    # Cap at 100
    total_score = min(total_score, 100)
    
    return pd.Series({
        "tags_vr_score": tags_vr_score,
        "tags_weapons_score": tags_weapons_score,
        "overview_vr_score": overview_scores["vr_score"],
        "overview_weapons_score": overview_scores["weapons_score"],
        "overview_audience_score": overview_scores["audience_score"],
        "overview_tactical_score": overview_scores["tactical_score"],
        "total_score": total_score,
        "relevance": get_relevance_category(total_score)
    })

## Apply Scoring Pipeline

In [17]:
# Apply scoring to all companies
scores_df = df.apply(score_company, axis=1)

# Combine original data with scores
df_scored = pd.concat([df, scores_df], axis=1)

# Sort by total score (highest first)
df_scored = df_scored.sort_values("total_score", ascending=False)

print(f"✅ Scored {len(df_scored)} companies")
print("\nRelevance Distribution:")
print(df_scored["relevance"].value_counts())

✅ Scored 1733 companies

Relevance Distribution:
relevance
LOW       1512
MEDIUM     175
HIGH        46
Name: count, dtype: int64


## Results: Top Scored Companies

In [8]:
# Display top companies with score breakdown
display_cols = [
    "company_name", 
    "total_score", 
    "relevance",
    "tags_vr_score",
    "tags_weapons_score",
    "overview_vr_score",
    "overview_weapons_score",
    "overview_audience_score",
    "overview_tactical_score"
]

print("🏆 TOP 20 Companies by Score:\n")
df_scored[display_cols].head(20)

🏆 TOP 20 Companies by Score:



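| | company_name | total_score | relevance | tags_vr_score | tags_weapons_score | overview_vr_score | overview_weapons_score | overview_audience_score | overview_tactical_score |
|---|--------------|-------------|-----------|---------------|--------------------|-------------------|------------------------|-------------------------|-------------------------|
| 1390 | Shoot House | 95 | HIGH | 30 | 25 | 15 | 15 | 10 | 0 |
| 1119 | Operator XR Pty Ltd | 90 | HIGH | 30 | 25 | 15 | 15 | 0 | 5 |
| 243 | Bluedrop Training & Simulation Inc | 85 | HIGH | 30 | 25 | 15 | 15 | 0 | 0 |
| 1394 | Sig Sauer | 85 | HIGH | 30 | 25 | 0 | 15 | 10 | 5 |
| 216 | BERETTA DEFENSE TECHNOLOGIES | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |
| 673 | Grovtec USA, Inc. | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |
| 700 | Heckler & Koch GmbH | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |
| 994 | Militec Ltd | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |
| 355 | Colt Canada Corporation | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |
| 1008 | MKE | 80 | HIGH | 30 | 25 | 0 | 15 | 10 | 0 |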


In [9]:
# Filter by relevance category
def get_companies_by_relevance(df: pd.DataFrame, relevance: str) -> pd.DataFrame:
    """Get companies filtered by relevance category."""
    return df[df["relevance"] == relevance][["company_name", "total_score", "website", "overview"]].copy()

# HIGH relevance companies (direct VR weapons training providers)
high_relevance = get_companies_by_relevance(df_scored, "HIGH")
print(f"🎯 HIGH Relevance Companies ({len(high_relevance)}):")
high_relevance

🎯 HIGH Relevance Companies (46):


| | company_name | total_score | website | overview |
|---|--------------|-------------|---------|----------|
| 1390 | Shoot House | 95 | http://www.shoothouse.co.uk | ShootHouse is a leading provider of cutting-ed... |
| 1119 | Operator XR Pty Ltd | 90 | https://operatorxr.com/ | Operator XR provides a complete VR solution fo... |
| 243 | Bluedrop Training & Simulation Inc | 85 | http://katherinesmith@bluedrop.com | Bluedrop is a Canadian-based leader in advance... |
| 1394 | Sig Sauer | 85 | http://WWW.SIGSAUER.COM | SIG SAUER, Inc. is a leading provider and manu... |
| 216 | BERETTA DEFENSE TECHNOLOGIES | 80 | https://www.berettadefensetechnologies.com | Beretta Defense Technologies (BDT) is the stra... |
| 673 | Grovtec USA, Inc. | 80 | | GrovTec is a U.S. manufacturer dedicated to de... |
| 700 | Heckler & Koch GmbH | 80 | http://www.heckler-koch.com | Heckler & Koch: Perfection for more than 75 ye... |
| 994 | Militec Ltd | 80 | http://www.militec.co.uk | Founded in 1997, Militec Ltd specialise in the... |
| 355 | Colt Canada Corporation | 80 | http://www.coltcanada.com | In 1976, Canada hired Diemaco Inc. to repair a... |
| 1008 | MKE | 80 | https://www.mke.gov.tr | MAKİNE ve KİMYA ENDÜSTRİSİ (MKE) is a state-ow... |


## Export Scored Data

In [None]:
# Export scored data to CSV
output_path = "/Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_scored.csv"

df_scored.to_csv(output_path, index=False)
print(f"✅ Exported to: {output_path}")

# Also export HIGH relevance companies separately
high_output_path = "/Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_high_relevance.csv"
high_relevance.to_csv(high_output_path, index=False)
print(f"✅ HIGH relevance companies exported to: {high_output_path}")


# # Export to Excel with multiple sheets
# excel_output_path = "/Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_scored.xlsx"

# with pd.ExcelWriter(excel_output_path, engine='openpyxl') as writer:
#    df_scored.to_excel(writer, sheet_name='All Companies', index=False)
#    high_relevance.to_excel(writer, sheet_name='HIGH Relevance', index=False)

# print(f"✅ Excel file exported to: {excel_output_path}")

✅ Exported to: /Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_scored.csv
✅ HIGH relevance companies exported to: /Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_high_relevance.csv
✅ Excel file exported to: /Users/s.rublivskyi/Documents/Dev/scrape-DSEI.co.uk/data/processed/companies_scored.xlsx
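
The export cell writes to absolute paths on one machine. As a hedged sketch of a more portable layout, the example below builds the output path from a base directory with `pathlib` and creates the folder if it is missing. It uses the standard-library `csv` module instead of `DataFrame.to_csv` so that it runs without the dataset; `export_rows`, the sample row and the temporary directory are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path

def export_rows(rows: list[dict], out_dir: Path, filename: str) -> Path:
    """Write rows to out_dir/filename as CSV, creating the directory if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out_path

# Hypothetical row and a temporary directory stand in for the real data folder.
rows = [{"company_name": "Shoot House", "total_score": 95}]
with tempfile.TemporaryDirectory() as tmp:
    path = export_rows(rows, Path(tmp) / "data" / "processed", "companies_scored.csv")
    print(path.name, path.exists())
```

In the notebook itself the same idea would amount to passing `out_dir / "companies_scored.csv"` to `df_scored.to_csv(...)`, with `out_dir` defined once near the top.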
