<a href="https://colab.research.google.com/github/swathypk93/saasquatch-enhancer-/blob/main/SaaSquatch_Enhancer_Caprae.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Caprae AI-Readiness Challenge

## Project Title:
**Smart Lead Scraper & Enricher (Lite Version)**

## Objective:
To build a lean, functional tool that scrapes professional profiles and enriches them with contact data, helping companies quickly generate quality sales leads.

---

## What It Does:
- Searches Google for public **LinkedIn profiles** matching a role and location (e.g., CEOs in Bangalore)
- Extracts a **name** (derived from the profile URL slug) and the **LinkedIn URL**
- Generates **dummy email addresses** from each name (simulated, not verified)
- Saves the enriched leads to a downloadable **CSV file**

---

## Tools & Technologies Used:
- Python
- Google Colab
- `googlesearch-python`, `pandas`, `re`, `random`
- Colab's file download helper (`google.colab.files`)

---

## Why This Adds Value:
- Demonstrates an understanding of **lead generation workflows**
- Simulates the **email enrichment** step when paid API services aren't available
- The simulated enrichment step can be replaced with a provider such as **Hunter.io** or **Clearbit** for real use
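One way to keep that swap cheap is to pass the email-finding step in as a function, so the rest of the pipeline does not change. The sketch below is hypothetical: `enrich_leads`, `EmailFinder` and `simulated_finder` are illustrative names, not part of this notebook or of any provider's SDK. A real Hunter.io or Clearbit finder would wrap that provider's HTTP API and return `None` when it finds nothing.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature for any email-finding backend: name -> email or None
EmailFinder = Callable[[str], Optional[str]]


def enrich_leads(leads: List[Dict[str, str]], finder: EmailFinder) -> List[Dict[str, str]]:
    """Attach an email to each lead using whichever finder is plugged in."""
    enriched = []
    for lead in leads:
        email = finder(lead["name"])
        enriched.append({
            "Name": lead["name"],
            "LinkedIn": lead["linkedin_url"],
            # Keep the row even when the finder returns nothing
            "Email": email or "",
        })
    return enriched


def simulated_finder(name: str) -> Optional[str]:
    """Stand-in for a paid API: derives a placeholder address from the name."""
    return name.lower().replace(" ", ".") + "@example.com"


sample = [{"name": "Anandsriganesh", "linkedin_url": "https://in.linkedin.com/in/anandsriganesh"}]
print(enrich_leads(sample, simulated_finder))
```

Swapping providers then means writing one new function with the same `name -> email or None` shape and passing it to `enrich_leads`.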

---

## Alignment with Caprae Goals:
The tool sources relevant professionals from public platforms and enriches their records for outbound contact, which fits Caprae’s AI-first, operator-led approach to transforming businesses.

---


**Note:** Email enrichment is simulated using logic-based dummy email generation. This mimics the shape of a real enrichment flow, but the generated addresses are placeholders: they are not verified and should not be assumed deliverable.

---


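```python
# Install the Google search client. email-validator and python-dotenv are
# installed here but are not used by the cells below.
!pip install googlesearch-python email-validator python-dotenv
```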


Successfully installed dnspython-2.7.0 email-validator-2.2.0 googlesearch-python-1.3.0 python-dotenv-1.1.1


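```python
import re

from googlesearch import search


def scrape_linkedin_profiles(role, location, num_results=100):
    """Search Google for public LinkedIn profile URLs matching a role and location."""
    # Restrict results to LinkedIn profile pages and require both phrases
    query = f'site:linkedin.com/in "{role}" "{location}"'
    results = search(query, num_results=num_results)  # yields result URLs

    profiles = []
    for url in results:
        # Capture the profile slug after /in/, stopping at "/" or "?"
        match = re.search(r'linkedin\.com/in/([^/?]+)', url)
        if match:
            # Approximate a display name from the slug,
            # e.g. "hari-marar-77841a4" -> "Hari Marar 77841A4"
            name = match.group(1).replace('-', ' ').title()
            profiles.append({'name': name, 'linkedin_url': url})

    return profiles


# Example usage
leads = scrape_linkedin_profiles("CEO", "Bangalore", num_results=10)
for lead in leads:
    print(lead)
```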


```text
{'name': 'Bangalorerobotics', 'linkedin_url': 'https://in.linkedin.com/in/bangalorerobotics'}
{'name': 'Anandsriganesh', 'linkedin_url': 'https://in.linkedin.com/in/anandsriganesh'}
{'name': 'Hari Marar 77841A4', 'linkedin_url': 'https://in.linkedin.com/in/hari-marar-77841a4'}
{'name': 'Sharanhegde95', 'linkedin_url': 'https://in.linkedin.com/in/sharanhegde95'}
{'name': 'Drlakshmijagannathan', 'linkedin_url': 'https://in.linkedin.com/in/drlakshmijagannathan'}
{'name': 'Sunil Bangalore 4160135', 'linkedin_url': 'https://in.linkedin.com/in/sunil-bangalore-4160135'}
{'name': 'Krishna Kumar Shetty 1026A255', 'linkedin_url': 'https://in.linkedin.com/in/krishna-kumar-shetty-1026a255'}
{'name': 'Salim Javed 28B998172', 'linkedin_url': 'https://in.linkedin.com/in/salim-javed-28b998172'}
{'name': 'Lalit Ahuja 298139171', 'linkedin_url': 'https://in.linkedin.com/in/lalit-ahuja-298139171'}
```
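The names above come straight from URL slugs, so several carry LinkedIn's numeric or hex ID suffixes (`77841A4`, `298139171`). A minimal cleanup sketch, assuming a heuristic: any token after the first that contains a digit is an ID suffix, and trailing digits on the remaining tokens are noise. `clean_slug_name` is a hypothetical helper, and it cannot split run-together handles such as `Anandsriganesh`.

```python
import re
from urllib.parse import unquote


def clean_slug_name(url: str) -> str:
    """Approximate a display name from a LinkedIn profile URL slug (heuristic)."""
    match = re.search(r"linkedin\.com/in/([^/?#]+)", url)
    if not match:
        return ""
    # Decode percent-escapes (e.g. %C3%A9) before splitting on hyphens
    parts = [p for p in unquote(match.group(1)).split("-") if p]
    kept = []
    for i, part in enumerate(parts):
        # Assumption: later tokens containing digits are LinkedIn ID suffixes
        if i > 0 and any(ch.isdigit() for ch in part):
            continue
        # Strip trailing digits from kept tokens, e.g. "sharanhegde95"
        part = part.rstrip("0123456789")
        if part:
            kept.append(part.capitalize())
    return " ".join(kept)


for url in [
    "https://in.linkedin.com/in/hari-marar-77841a4",
    "https://in.linkedin.com/in/krishna-kumar-shetty-1026a255",
    "https://in.linkedin.com/in/sharanhegde95",
]:
    print(clean_slug_name(url))
```

Used in place of the `.title()` line inside `scrape_linkedin_profiles`, this would yield names like `Hari Marar` instead of `Hari Marar 77841A4`.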


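```python
import random


def generate_dummy_email(name):
    """Build a placeholder email address from a display name (simulated, not verified)."""
    domains = ["gmail.com", "outlook.com", "startuphub.com", "bizmail.com"]
    # "Hari Marar 77841A4" -> "hari.marar.77841a4@<randomly chosen domain>"
    email = name.lower().replace(" ", ".") + "@" + random.choice(domains)
    return email
```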
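Because some of the domains above (e.g. `gmail.com`) are real mail providers, a generated address could coincide with a real person's mailbox, and `random.choice` produces a different CSV on every run. A variant sketch that addresses both, assuming deterministic output is wanted: it only uses `example.com`, `example.org` and `example.net`, which RFC 2606 reserves for documentation, and picks among them with a hash of the name. `generate_placeholder_email` is a hypothetical name.

```python
import hashlib
import re

# RFC 2606 reserves these second-level domains for documentation/examples
PLACEHOLDER_DOMAINS = ["example.com", "example.org", "example.net"]


def generate_placeholder_email(name: str) -> str:
    """Deterministically map a display name to a placeholder address."""
    # Keep runs of ASCII lowercase letters/digits, joined by dots
    # (non-ASCII letters and punctuation are dropped by this pattern)
    local = ".".join(re.findall(r"[a-z0-9]+", name.lower())) or "lead"
    # Hash the local part so the same name always maps to the same domain
    digest = hashlib.sha256(local.encode("utf-8")).digest()
    domain = PLACEHOLDER_DOMAINS[digest[0] % len(PLACEHOLDER_DOMAINS)]
    return f"{local}@{domain}"


print(generate_placeholder_email("Hari Marar"))
```

Rerunning the notebook then produces the same `Email` column for the same leads.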


```python
import pandas as pd

# Pair each scraped lead with a generated placeholder email
enriched_data = []

for lead in leads:
    email = generate_dummy_email(lead['name'])
    enriched_data.append({
        'Name': lead['name'],
        'LinkedIn': lead['linkedin_url'],
        'Email': email
    })

# Save to CSV without the DataFrame index, then preview the first rows
df = pd.DataFrame(enriched_data)
df.to_csv("leads.csv", index=False)
df.head()
```


|   | Name | LinkedIn | Email |
|---|---|---|---|
| 0 | Bangalorerobotics | https://in.linkedin.com/in/bangalorerobotics | bangalorerobotics@gmail.com |
| 1 | Anandsriganesh | https://in.linkedin.com/in/anandsriganesh | anandsriganesh@gmail.com |
| 2 | Hari Marar 77841A4 | https://in.linkedin.com/in/hari-marar-77841a4 | hari.marar.77841a4@gmail.com |
| 3 | Sharanhegde95 | https://in.linkedin.com/in/sharanhegde95 | sharanhegde95@outlook.com |
| 4 | Drlakshmijagannathan | https://in.linkedin.com/in/drlakshmijagannathan | drlakshmijagannathan@startuphub.com |
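Search results can list the same profile under different LinkedIn hosts (e.g. `in.linkedin.com` and `www.linkedin.com`) or with tracking query strings, which would produce duplicate CSV rows. A stdlib-only sketch, assuming the `/in/<slug>` path alone identifies a profile, that canonicalizes URLs and drops duplicates from `leads` before the DataFrame is built; `canonical_profile_url` and `dedupe_leads` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import unquote


def canonical_profile_url(url: str) -> Optional[str]:
    """Normalize a LinkedIn profile URL to https://www.linkedin.com/in/<slug>."""
    match = re.search(r"linkedin\.com/in/([^/?#]+)", url)
    if not match:
        return None
    return f"https://www.linkedin.com/in/{unquote(match.group(1))}"


def dedupe_leads(leads: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first lead per canonical profile URL, preserving order."""
    seen = set()
    unique = []
    for lead in leads:
        key = canonical_profile_url(lead["linkedin_url"])
        # Skip non-profile URLs and profiles already seen
        if key is None or key in seen:
            continue
        seen.add(key)
        unique.append({**lead, "linkedin_url": key})
    return unique


sample = [
    {"name": "Anandsriganesh", "linkedin_url": "https://in.linkedin.com/in/anandsriganesh"},
    {"name": "Anandsriganesh", "linkedin_url": "https://www.linkedin.com/in/anandsriganesh?trk=x"},
]
print(dedupe_leads(sample))
```

Calling `leads = dedupe_leads(leads)` just before the enrichment loop would keep one row per profile in `leads.csv`.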


```python
# Trigger a browser download of the CSV (works only inside Google Colab)
from google.colab import files
files.download("leads.csv")
```
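`google.colab` is only importable inside a Colab runtime, so the cell above raises an `ImportError` elsewhere (for example, on a local Jupyter server). A small sketch, assuming the notebook should run in both places: `download_or_report` is a hypothetical helper that uses Colab's download when available and otherwise prints where the CSV was saved.

```python
import os


def download_or_report(path: str) -> bool:
    """Download via Colab if available; otherwise report the local file path."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f"Not running in Colab; file saved at {os.path.abspath(path)}")
        return False
    files.download(path)
    return True


download_or_report("leads.csv")
```

The return value lets later cells tell whether a browser download was actually triggered.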

