# Semantic Search Prototype

This notebook demonstrates semantic search over the EV India 13 dataset: each query is ranked against the precomputed entry embeddings by cosine similarity.
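
For reference, ranking by cosine similarity can be sketched in a few lines of NumPy. The function name `cosine_top_k` and the toy vectors below are illustrative assumptions; the project's `SemanticSearch` class is not shown in this notebook and may be implemented differently.

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, top_k=5):
    """Return (row_index, similarity) pairs for the top_k most similar rows."""
    # Scale everything to unit length so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q
    # Sort by descending similarity and keep the best top_k rows.
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]

# Toy 2-D "embeddings" (hypothetical, not taken from the dataset).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
ranked = cosine_top_k(query, docs, top_k=2)
```

Here `ranked` lists row 0 first (same direction as the query) and row 2 second (45 degrees away); the orthogonal row 1 is dropped by `top_k=2`.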


In [1]:
import json
import numpy as np
import sys
from pathlib import Path

# Add src/ to the import path (assumes the notebook runs from a subdirectory of the project root)
project_root = Path().resolve().parent
sys.path.insert(0, str(project_root / "src"))

from search import SemanticSearch, format_search_results

# Set up paths
data_path = project_root / "data" / "processed" / "cleaned_data.json"
embeddings_path = project_root / "data" / "processed" / "embeddings.npy"

print(f"Loading data from: {data_path}")
print(f"Loading embeddings from: {embeddings_path}")


Loading data from: /Users/kevinmcpherson/github-projects/emergent-ventures-semantic-map/data/processed/cleaned_data.json
Loading embeddings from: /Users/kevinmcpherson/github-projects/emergent-ventures-semantic-map/data/processed/embeddings.npy


In [2]:
# Load data and embeddings
with open(data_path, 'r', encoding='utf-8') as f:
    cleaned_data = json.load(f)

embeddings = np.load(embeddings_path)

print(f"Loaded {len(cleaned_data)} entries")
print(f"Embeddings shape: {embeddings.shape}")

# Initialize search engine
search_engine = SemanticSearch(cleaned_data, embeddings)
print("\nSemantic search engine initialized!")


Loaded 18 entries
Embeddings shape: (18, 3072)

Semantic search engine initialized!
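
Search results are presumably mapped back to entries by row position, so it may be worth confirming that the two files line up before searching. The helper below is a minimal sketch; `check_alignment` is a hypothetical name and is not part of the `search` module, and the stand-in arrays only mimic the shape reported above.

```python
import numpy as np

def check_alignment(entries, embeddings):
    """Raise ValueError unless embeddings map one-to-one onto entries."""
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    if embeddings.shape[0] != len(entries):
        raise ValueError(
            f"{len(entries)} entries but {embeddings.shape[0]} embedding rows"
        )
    if not np.all(np.isfinite(embeddings)):
        raise ValueError("embeddings contain NaN or infinite values")
    # A zero vector has no direction, so its cosine similarity is undefined.
    if np.any(np.linalg.norm(embeddings, axis=1) == 0):
        raise ValueError("at least one embedding has zero length")
    return True

# Stand-in data with the same shape as the real output (18 x 3072).
demo_entries = [{"name": f"entry {i}"} for i in range(18)]
demo_embeddings = np.ones((18, 3072))
aligned = check_alignment(demo_entries, demo_embeddings)
```

In the notebook itself the call would be `check_alignment(cleaned_data, embeddings)`, placed before the engine is constructed.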


In [3]:
# Test query 1: AI and machine learning projects
query1 = "AI and machine learning projects"
print(f"Query: {query1}")
print("=" * 80)

results1 = search_engine.search(query1, top_k=5)
print(format_search_results(results1))


Query: AI and machine learning projects

Rank 1 (Similarity: 0.4264)
  Name: Adithya Sakaray, Steve Aldrin, and Aadhithya D
  Project: Recruitr AI
  Description: Automating video interviews using AI.
  Domains: AI, HR tech, automation
  Category: software

Rank 2 (Similarity: 0.3017)
  Name: Deev Mehta
  Project: Autonomous Farming Rover
  Description: Developing a rover to make farming autonomous.
  Domains: robotics, agritech, automation
  Category: hardware

Rank 3 (Similarity: 0.2982)
  Name: Uddhav Gupta
  Project: Speech Therapy App
  Description: Developing a speech therapy application for children with special needs.
  Domains: healthtech, education, AI, assistive technology
  Category: software

Rank 4 (Similarity: 0.2633)
  Name: Vatsal Hariramani
  Project: Affordable Neonatal Incubator
  Description: Developing a smart and affordable neonatal incubator for remote terrain.
  Domains: healthcare, medical devices, hardware
  Category: hardware

Rank 5 (Similarity: 0.2583)
  Na

In [4]:
# Test query 2: Healthcare and medical devices
query2 = "Healthcare and medical devices"
print(f"Query: {query2}")
print("=" * 80)

results2 = search_engine.search(query2, top_k=5)
print(format_search_results(results2))


Query: Healthcare and medical devices

Rank 1 (Similarity: 0.4104)
  Name: Vatsal Hariramani
  Project: Affordable Neonatal Incubator
  Description: Developing a smart and affordable neonatal incubator for remote terrain.
  Domains: healthcare, medical devices, hardware
  Category: hardware

Rank 2 (Similarity: 0.2816)
  Name: Rushab M
  Project: Temperature-Regulating Jacket
  Description: Developing a jacket that controls body temperature for outdoor workers.
  Domains: hardware, wearables, labor safety
  Category: hardware

Rank 3 (Similarity: 0.2773)
  Name: Uddhav Gupta
  Project: Speech Therapy App
  Description: Developing a speech therapy application for children with special needs.
  Domains: healthtech, education, AI, assistive technology
  Category: software

Rank 4 (Similarity: 0.2621)
  Name: Soumil Nema
  Project: Stem Cell Therapies for Neurological Disorders
  Description: Developing stem cell therapies aimed at treating stroke and neuropathy.
  Domains: biotech, regene

In [5]:
# Test query 3: Hardware and robotics
query3 = "Hardware and robotics"
print(f"Query: {query3}")
print("=" * 80)

results3 = search_engine.search(query3, top_k=5)
print(format_search_results(results3))


Query: Hardware and robotics

Rank 1 (Similarity: 0.4504)
  Name: Deev Mehta
  Project: Autonomous Farming Rover
  Description: Developing a rover to make farming autonomous.
  Domains: robotics, agritech, automation
  Category: hardware

Rank 2 (Similarity: 0.4004)
  Name: Vatsal Hariramani
  Project: Affordable Neonatal Incubator
  Description: Developing a smart and affordable neonatal incubator for remote terrain.
  Domains: healthcare, medical devices, hardware
  Category: hardware

Rank 3 (Similarity: 0.3718)
  Name: Rushab M
  Project: Temperature-Regulating Jacket
  Description: Developing a jacket that controls body temperature for outdoor workers.
  Domains: hardware, wearables, labor safety
  Category: hardware

Rank 4 (Similarity: 0.3468)
  Name: Prakyath Gowda
  Project: Lightweight EV Battery
  Description: Developing a lightweight and efficient electric vehicle battery.
  Domains: energy, EVs, battery technology, hardware
  Category: hardware

Rank 5 (Similarity: 0.3446)

In [6]:
# Test query 4: Education and learning platforms
query4 = "Education and learning platforms"
print(f"Query: {query4}")
print("=" * 80)

results4 = search_engine.search(query4, top_k=5)
print(format_search_results(results4))


Query: Education and learning platforms

Rank 1 (Similarity: 0.4118)
  Name: Sudhir Sarnobat and Rajendra Bagwe
  Project: HowFrameworks
  Description: Helping Indian SMEs unlock sustainable growth through a learning portal.
  Domains: SMEs, business education, learning platforms
  Category: organization

Rank 2 (Similarity: 0.2982)
  Name: Uddhav Gupta
  Project: Speech Therapy App
  Description: Developing a speech therapy application for children with special needs.
  Domains: healthtech, education, AI, assistive technology
  Category: software

Rank 3 (Similarity: 0.2938)
  Name: Habel Anwar
  Project: Physics Olympiad Preparation
  Description: Furthering physics Olympiad preparation and advancing physics knowledge and research.
  Domains: physics, education, talent development
  Category: education

Rank 4 (Similarity: 0.2666)
  Name: Adithya Sakaray, Steve Aldrin, and Aadhithya D
  Project: Recruitr AI
  Description: Automating video interviews using AI.
  Domains: AI, HR tech, automation

In [7]:
# Test query 5: Sustainability and climate adaptation
query5 = "Sustainability and climate adaptation"
print(f"Query: {query5}")
print("=" * 80)

results5 = search_engine.search(query5, top_k=5)
print(format_search_results(results5))


Query: Sustainability and climate adaptation

Rank 1 (Similarity: 0.2889)
  Name: Anushka Punukollu
  Project: SucroSoil
  Description: Repurposing sugarcane waste into hydrogels to combat soil erosion in rural India.
  Domains: agriculture, sustainability, materials science, climate adaptation
  Category: startup

Rank 2 (Similarity: 0.2092)
  Name: Rushab M
  Project: Temperature-Regulating Jacket
  Description: Developing a jacket that controls body temperature for outdoor workers.
  Domains: hardware, wearables, labor safety
  Category: hardware

Rank 3 (Similarity: 0.2047)
  Name: Sajal Deolikar
  Project: Hybrid Powertrains
  Description: Developing hybrid powertrains for commercial vehicles to improve mileage and reduce emissions.
  Domains: energy, transportation, hardware
  Category: hardware

Rank 4 (Similarity: 0.2017)
  Name: Sudhir Sarnobat and Rajendra Bagwe
  Project: HowFrameworks
  Description: Helping Indian SMEs unlock sustainable growth through a learning portal.
  

In [8]:
# Find entries similar to a specific entry (here the first entry, index 0)
print("Finding entries similar to first entry:")
print("=" * 80)
print(f"Query entry: {cleaned_data[0]['name']} - {cleaned_data[0]['project_name']}")

similar = search_engine.find_similar_entries(0, top_k=5, exclude_self=True)
print(format_search_results(similar))


Finding entries similar to first entry:
Query entry: Khyathi Komalan - Category Theory Research

Rank 1 (Similarity: 0.4286)
  Name: Prakyath Gowda
  Project: Lightweight EV Battery
  Description: Developing a lightweight and efficient electric vehicle battery.
  Domains: energy, EVs, battery technology, hardware
  Category: hardware

Rank 2 (Similarity: 0.4089)
  Name: Habel Anwar
  Project: Physics Olympiad Preparation
  Description: Furthering physics Olympiad preparation and advancing physics knowledge and research.
  Domains: physics, education, talent development
  Category: education

Rank 3 (Similarity: 0.4055)
  Name: Samarth K J
  Project: Career Development Grant
  Description: General career development support.
  Domains: career development, engineering
  Category: career

Rank 4 (Similarity: 0.3848)
  Name: Sajal Deolikar
  Project: Hybrid Powertrains
  Description: Developing hybrid powertrains for commercial vehicles to improve mileage and reduce emissions.
  Domains: energy, transportation, hardware
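
A lookup such as `find_similar_entries(0, top_k=5, exclude_self=True)` could plausibly reuse the stored embedding row as the query and drop that row from the ranking. The sketch below works under that assumption; `similar_rows` is a hypothetical name, and the actual method in the `search` module may behave differently.

```python
import numpy as np

def similar_rows(embeddings, index, top_k=5, exclude_self=True):
    """Rank all rows by cosine similarity to the row at `index`."""
    # Unit-normalize once; dot products are then cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[index]
    order = np.argsort(-sims)
    if exclude_self:
        # Remove the query row, which always matches itself with similarity 1.
        order = order[order != index]
    order = order[:top_k]
    return [(int(i), float(sims[i])) for i in order]

# Toy 2-D "embeddings" (hypothetical, not taken from the dataset).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
neighbors = similar_rows(toy, 0, top_k=2)
```

With `exclude_self=True` the query row never appears in `neighbors`, even when `top_k` is larger than the number of remaining rows.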