# 🎯 Lead Scoring & Prioritization Tool Walkthrough
This notebook demonstrates the **Lead Scoring Tool** developed in Python.  
It covers:
- Data generation for sample leads
- Lead scoring algorithm
- Confidence calculation
- Lead categorization
- Console demo outputs (CSV & plots)
- Unit testing for core functions


In [1]:
import sys
import os
import random
import argparse
from datetime import datetime, timedelta

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import unittest

# Streamlit may not be installed in the notebook environment
try:
    import streamlit as st
    STREAMLIT_AVAILABLE = True
except ImportError:
    STREAMLIT_AVAILABLE = False


In [2]:
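# Hide Streamlit's main menu, header, footer, toolbar and Deploy button
# (the CSS only takes effect when the app runs under `streamlit run`)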
if STREAMLIT_AVAILABLE:
    hide_streamlit_style = """
        <style>
        #MainMenu {visibility: hidden;}
        footer {visibility: hidden;}
        header {visibility: hidden;}
        [data-testid="stToolbar"] {visibility: hidden !important;}
        [data-testid="stDeployButton"] {display: none !important;}
        </style>
    """
    st.markdown(hide_streamlit_style, unsafe_allow_html=True)




In [3]:
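def generate_sample_leads(n=50, seed=42):
    """Return a DataFrame of n synthetic leads; the same seed gives the same leads."""
    rng = random.Random(seed)
    np.random.seed(seed)  # not used below; seeds NumPy's global RNG for consistency

    companies = [f"Company {i+1}" for i in range(n)]
    leads_data = []
    for i in range(n):
        # Each contact channel is missing for a share of leads:
        # email ~15%, phone ~30%, LinkedIn ~25%
        has_email = rng.random() > 0.15
        has_phone = rng.random() > 0.30
        has_linkedin = rng.random() > 0.25
        employees = rng.choice([10, 25, 50, 100, 250, 500, 1000])
        # Revenue is 60k to 160k USD per employee
        revenue = employees * (40000 + rng.uniform(20000, 120000))
        recent_funding = rng.random() > 0.65  # ~35% of leads recently funded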
        lead = {
            'company_name': companies[i],
            'contact_name': f"Person {i+1}",
            'title': rng.choice(['CEO', 'VP Sales', 'CTO', 'Marketing Manager', 'COO']),
            'email': f"lead{i+1}@company.com" if has_email else None,
            'phone': f"+1 (555) {rng.randint(100,999)}-{rng.randint(1000,9999)}" if has_phone else None,
            'linkedin_url': f"linkedin.com/in/person{i+1}" if has_linkedin else None,
            'company_size': employees,
            'estimated_revenue': revenue,
            'recent_funding': recent_funding
        }
        leads_data.append(lead)
    return pd.DataFrame(leads_data)

# Generate sample leads
df = generate_sample_leads(n=10)
df.head()


| | company_name | contact_name | title | email | phone | linkedin_url | company_size | estimated_revenue | recent_funding |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Company 1 | Person 1 | COO | lead1@company.com | | linkedin.com/in/person1 | 25 | 1848845.0 | False |
| 1 | Company 2 | Person 2 | COO | | +1 (555) 303-9928 | | 25 | 2081652.0 | False |
| 2 | Company 3 | Person 3 | Marketing Manager | lead3@company.com | +1 (555) 448-5552 | linkedin.com/in/person3 | 1000 | 60649880.0 | True |
| 3 | Company 4 | Person 4 | CTO | lead4@company.com | +1 (555) 718-5333 | linkedin.com/in/person4 | 10 | 979927.3 | False |
| 4 | Company 5 | Person 5 | COO | lead5@company.com | +1 (555) 982-6925 | linkedin.com/in/person5 | 100 | 6788002.0 | False |


**Explanation:**  
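This function generates reproducible **sample lead data** (the same `seed` always yields the same leads), including:
- Company and contact details
- Employee count and revenue (revenue is 60k to 160k USD per employee)
- Funding status (about 35% of leads are recently funded)
- Email, phone, and LinkedIn availability (missing for roughly 15%, 30%, and 25% of leads, respectively)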
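Because revenue is derived from employee count, each company size maps to a fixed revenue band. The short calculation below makes those bands explicit; `revenue_band` and the constants are hypothetical names introduced for illustration, and the per-employee bounds come straight from the `40000 + rng.uniform(20000, 120000)` expression above.

```python
# Revenue bands implied by generate_sample_leads:
# revenue = employees * (40_000 + uniform(20_000, 120_000)),
# i.e. between 60k and 160k USD per employee.
SIZES = [10, 25, 50, 100, 250, 500, 1000]
LOW_PER_EMPLOYEE = 40_000 + 20_000    # 60_000
HIGH_PER_EMPLOYEE = 40_000 + 120_000  # 160_000


def revenue_band(employees):
    """Return the (min, max) revenue in USD the generator can produce for a size."""
    return employees * LOW_PER_EMPLOYEE, employees * HIGH_PER_EMPLOYEE


for size in SIZES:
    lo, hi = revenue_band(size)
    print(f"{size:>5} employees: {lo / 1e6:6.2f}M to {hi / 1e6:6.2f}M USD")
```

Read against the revenue tiers in the scoring function below, every 1000-employee company lands at 50M or more, every 100-employee company falls in the 5M to 20M tier, and every company with 25 or fewer employees stays under 5M. In this synthetic data the size and revenue factors are therefore strongly correlated.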


In [4]:
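def calculate_lead_score(row, weights, rng=None):
    """Score a lead (capped at 100) and return (score, [(factor, weighted_points), ...]).

    `weights` maps factor keys to multipliers (missing keys default to 1.0);
    pass a seeded `random.Random` as `rng` for reproducible engagement points.
    """
    _rand = random.random if rng is None else rng.random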

    score = 0.0
    factors = []

    # Company size
    if row['company_size'] >= 1000:
        pts = 25
    elif row['company_size'] >= 500:
        pts = 20
    elif row['company_size'] >= 100:
        pts = 15
    else:
        pts = 10
    score += pts * weights.get('company_size', 1.0)
    factors.append(("Company Size", pts * weights.get('company_size', 1.0)))

    # Revenue
    rev_m = row['estimated_revenue'] / 1_000_000
    if rev_m >= 50:
        pts = 25
    elif rev_m >= 20:
        pts = 20
    elif rev_m >= 5:
        pts = 15
    else:
        pts = 10
    score += pts * weights.get('revenue', 1.0)
    factors.append(("Revenue", pts * weights.get('revenue', 1.0)))

    # Data completeness
    comp = 0
    if row.get('email'): comp += 8
    if row.get('phone'): comp += 7
    if row.get('linkedin_url'): comp += 5
    score += comp * weights.get('data', 1.0)
    factors.append(("Data Completeness", comp * weights.get('data', 1.0)))

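    # Engagement readiness: 10 pts for recent funding, plus a random 5-pt
    # coin flip (placeholder; not derived from the lead's data)
    pts = 0
    if row.get('recent_funding'): pts += 10
    if _rand() > 0.5: pts += 5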
    score += pts * weights.get('engagement', 1.0)
    factors.append(("Engagement", pts * weights.get('engagement', 1.0)))

    # Title relevance
    if row.get('title') in ['CEO', 'CTO', 'COO', 'VP Sales', 'Chief Revenue Officer']:
        pts = 15
    else:
        pts = 8
    score += pts * weights.get('title', 1.0)
    factors.append(("Title", pts * weights.get('title', 1.0)))

    return min(100, round(score, 2)), factors

# Example scoring
weights = {'company_size': 1, 'revenue': 1, 'data': 1, 'engagement': 1, 'title': 1}
score, factors = calculate_lead_score(df.iloc[0], weights)
score, factors


(48.0,
 [('Company Size', 10),
  ('Revenue', 10),
  ('Data Completeness', 13),
  ('Engagement', 0),
  ('Title', 15)])

**Explanation:**  
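The **lead scoring algorithm** calculates a score based on:
1. Company Size (10–25 pts)
2. Revenue (10–25 pts)
3. Data Completeness (0–20 pts: email 8, phone 7, LinkedIn 5)
4. Engagement Readiness (0–15 pts: recent funding 10, plus a random 5-pt bonus)
5. Title Relevance (8–15 pts)

Each factor's points are multiplied by its weight (default 1.0) and summed; the total is capped at 100. With all weights at 1, the maximum is exactly 25 + 25 + 20 + 15 + 15 = 100.  
`factors` shows the **weighted points per factor**. Because of the random engagement bonus, the same lead can score differently between calls unless a seeded `rng` is passed; this is why Company 1 scores 48 here but 53 in the batch run below.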
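The if/elif ladders for size and revenue can also be written as lookup tables, which keeps thresholds and points in one place. This is a sketch assuming the same thresholds as `calculate_lead_score`, using the standard library's `bisect_right`; `tier_points`, `weighted_total`, and the constants are hypothetical names, not part of the tool.

```python
from bisect import bisect_right

# Lower bounds of each tier, mirroring calculate_lead_score:
# below the first bound -> 10 pts, >= first -> 15, >= second -> 20, >= third -> 25.
SIZE_THRESHOLDS = [100, 500, 1000]   # employees
REVENUE_THRESHOLDS = [5, 20, 50]     # millions of USD
TIER_POINTS = [10, 15, 20, 25]


def tier_points(value, thresholds, points=TIER_POINTS):
    """Return the points of the highest tier whose lower bound `value` reaches."""
    return points[bisect_right(thresholds, value)]


def weighted_total(points_by_factor, weights):
    """Multiply each factor's points by its weight (default 1.0), sum, cap at 100."""
    total = sum(p * weights.get(name, 1.0) for name, p in points_by_factor.items())
    return min(100, round(total, 2))


# Best case for every factor: the unweighted maximum is exactly 100.
max_points = {'company_size': 25, 'revenue': 25, 'data': 20, 'engagement': 15, 'title': 15}
print(weighted_total(max_points, {}))

# Company 1 from the example above: size 25 and revenue 1.85M both hit the lowest tier.
print(tier_points(25, SIZE_THRESHOLDS), tier_points(1.848845, REVENUE_THRESHOLDS))
```

`bisect_right` counts how many lower bounds the value has reached, so a value equal to a threshold (e.g. exactly 1000 employees) falls into the higher tier, matching the `>=` comparisons in the original code.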


In [5]:
def calculate_confidence(row):
    c = 0
    if row.get('email'): c += 33
    if row.get('phone'): c += 33
    if row.get('linkedin_url'): c += 34
    return int(c)

# Example confidence
calculate_confidence(df.iloc[0])


67

**Explanation:**  
**Confidence score** is calculated based on available contact information:
- Email = 33%
- Phone = 33%
- LinkedIn = 34%
- Maximum = 100%
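`calculate_confidence` relies on truthiness, which works here because the generator stores missing values as `None`. If leads are loaded from a CSV instead, pandas reads empty cells as `NaN`, and `bool(float('nan'))` is `True`, so a missing email would be counted as present. A vectorized sketch using `DataFrame.notna`, which treats `None` and `NaN` alike; `confidence_vectorized` and `CHANNEL_WEIGHTS` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Per-channel confidence points, as in calculate_confidence.
CHANNEL_WEIGHTS = pd.Series({'email': 33, 'phone': 33, 'linkedin_url': 34})


def confidence_vectorized(df):
    """Confidence for every row at once; None and NaN both count as missing."""
    present = df[CHANNEL_WEIGHTS.index].notna()      # boolean frame, one column per channel
    return present.astype(int).dot(CHANNEL_WEIGHTS)  # row-wise weighted sum


leads = pd.DataFrame({
    'email': ['a@b.com', None, np.nan],
    'phone': [None, '+1 555', None],
    'linkedin_url': ['ln', None, None],
})
print(confidence_vectorized(leads).tolist())  # [67, 33, 0]
print(bool(np.nan))  # True: a truthiness check would treat NaN as a present value
```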


In [6]:
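def apply_scoring(df, weights):
    df = df.copy()  # avoid mutating the caller's DataFrame
    rng = random.Random(42)  # fixed seed: reproducible engagement bonuses
    results = df.apply(lambda row: calculate_lead_score(row, weights, rng), axis=1)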
    df['lead_score'] = results.apply(lambda x: x[0])
    df['factors'] = results.apply(lambda x: x[1])
    df['confidence'] = df.apply(calculate_confidence, axis=1)
    df['category'] = df['lead_score'].apply(lambda s: 'Hot' if s >= 70 else 'Warm' if s >= 40 else 'Cold')
    return df

df = apply_scoring(df, weights)
df.head()


| | company_name | contact_name | title | email | phone | linkedin_url | company_size | estimated_revenue | recent_funding | lead_score | factors | confidence | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Company 1 | Person 1 | COO | lead1@company.com | | linkedin.com/in/person1 | 25 | 1848845.0 | False | 53.0 | [(Company Size, 10), (Revenue, 10), (Data Comp... | 67 | Warm |
| 1 | Company 2 | Person 2 | COO | | +1 (555) 303-9928 | | 25 | 2081652.0 | False | 42.0 | [(Company Size, 10), (Revenue, 10), (Data Comp... | 33 | Warm |
| 2 | Company 3 | Person 3 | Marketing Manager | lead3@company.com | +1 (555) 448-5552 | linkedin.com/in/person3 | 1000 | 60649880.0 | True | 88.0 | [(Company Size, 25), (Revenue, 25), (Data Comp... | 100 | Hot |
| 3 | Company 4 | Person 4 | CTO | lead4@company.com | +1 (555) 718-5333 | linkedin.com/in/person4 | 10 | 979927.3 | False | 55.0 | [(Company Size, 10), (Revenue, 10), (Data Comp... | 100 | Warm |
| 4 | Company 5 | Person 5 | COO | lead5@company.com | +1 (555) 982-6925 | linkedin.com/in/person5 | 100 | 6788002.0 | False | 70.0 | [(Company Size, 15), (Revenue, 15), (Data Comp... | 100 | Hot |


**Explanation:**  
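We apply **scoring and confidence calculation** to all leads (using a fixed `random.Random(42)`, so the engagement bonuses are reproducible) and categorize them:
- **Hot**: 70 points or more
- **Warm**: 40 to under 70 points
- **Cold**: under 40 points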
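The category lambda in `apply_scoring` can be replaced by a single vectorized `pd.cut` call. A sketch assuming the same thresholds; `categorize` is a hypothetical helper. `right=False` makes every bin include its lower edge, which matches the `>=` comparisons in the lambda, and also handles fractional scores such as 69.5 that non-integer weights can produce.

```python
import pandas as pd


def categorize(scores):
    """Map scores to Cold (< 40), Warm (40 to < 70) and Hot (>= 70)."""
    return pd.cut(
        scores,
        bins=[float('-inf'), 40, 70, float('inf')],
        labels=['Cold', 'Warm', 'Hot'],
        right=False,  # left-closed bins: [40, 70) is Warm, so exactly 70 is Hot
    )


scores = pd.Series([53.0, 42.0, 88.0, 55.0, 70.0])  # lead_score values from the table above
print(categorize(scores).tolist())  # ['Warm', 'Warm', 'Hot', 'Warm', 'Hot']
```

`pd.cut` returns an ordered categorical, so the result can be sorted or compared as Cold < Warm < Hot.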


In [None]:
# Histogram of lead scores
fig = go.Figure()
fig.add_trace(go.Histogram(x=df['lead_score'], nbinsx=10, name='Lead Scores'))
fig.update_layout(title='Lead Score Distribution', xaxis_title='Score', yaxis_title='Count')
fig.show()


In [None]:
df[['company_name', 'contact_name', 'title', 'lead_score', 'confidence', 'category']].sort_values('lead_score', ascending=False)


**Explanation:**  
This table lists **all leads sorted by lead score, highest first**, with:
- Company and contact
- Title
- Lead score
- Confidence score
- Category
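To show only a shortlist rather than every lead, the sorted view can be narrowed to the top N or to one category. A sketch on a small frame built from the scored values in the table above; the column subset and the tie-break on `confidence` are illustrative choices, not part of the tool.

```python
import pandas as pd

# lead_score / confidence / category values from the scored table above
leads = pd.DataFrame({
    'company_name': ['Company 1', 'Company 2', 'Company 3', 'Company 4', 'Company 5'],
    'lead_score': [53.0, 42.0, 88.0, 55.0, 70.0],
    'confidence': [67, 33, 100, 100, 100],
    'category': ['Warm', 'Warm', 'Hot', 'Warm', 'Hot'],
})

# Top N leads by score (like sort_values('lead_score', ascending=False).head(3))
top3 = leads.nlargest(3, 'lead_score')

# Only Hot leads, best first, breaking score ties by confidence
hot = (leads[leads['category'] == 'Hot']
       .sort_values(['lead_score', 'confidence'], ascending=False))

print(top3['company_name'].tolist())  # ['Company 3', 'Company 5', 'Company 4']
print(hot['company_name'].tolist())   # ['Company 3', 'Company 5']
```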


In [None]:
class LeadScoringTests(unittest.TestCase):
    def test_calculate_confidence_full(self):
        row = {'email': 'a@b.com', 'phone': '+1', 'linkedin_url': 'ln'}
        self.assertEqual(calculate_confidence(row), 100)

    def test_calculate_confidence_empty(self):
        row = {'email': None, 'phone': None, 'linkedin_url': None}
        self.assertEqual(calculate_confidence(row), 0)

    def test_calculate_lead_score_high(self):
        row = {
            'company_size': 1000,
            'estimated_revenue': 60_000_000,
            'email': 'a@b.com',
            'phone': '123',
            'linkedin_url': 'ln',
            'recent_funding': True,
            'title': 'CEO'
        }
        # Engagement weight 0 removes the random bonus, so the score is deterministic
        weights = {'company_size': 1, 'revenue': 1, 'data': 1, 'engagement': 0, 'title': 1}
        score, _ = calculate_lead_score(row, weights, rng=random.Random(0))
        self.assertEqual(score, 85)

# Run tests
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeadScoringTests)
unittest.TextTestRunner(verbosity=2).run(suite)


**Explanation:**  
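Unit tests check that the **core functions behave as expected**:
- Confidence calculation (all channels present gives 100, none gives 0)
- Lead scoring for a high-potential lead (the engagement weight is set to 0, so the random bonus cannot affect the expected score of 85)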
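The existing tests sidestep the random engagement bonus by zeroing its weight. A complementary check is that seeding makes the bonus reproducible. The sketch below isolates the coin-flip step in a hypothetical `engagement_bonus` helper (not part of the tool) and relies on Python's documented guarantee that `random()` produces the same sequence for the same seed.

```python
import random
import unittest


def engagement_bonus(rng):
    """Hypothetical extract of the coin-flip step in calculate_lead_score."""
    return 5 if rng.random() > 0.5 else 0


class ReproducibilityTests(unittest.TestCase):
    def test_same_seed_same_bonuses(self):
        a, b = random.Random(42), random.Random(42)
        self.assertEqual([engagement_bonus(a) for _ in range(50)],
                         [engagement_bonus(b) for _ in range(50)])

    def test_seed_42_bonus_pattern(self):
        # First five draws of Random(42): 0.639, 0.025, 0.275, 0.223, 0.736
        rng = random.Random(42)
        self.assertEqual([engagement_bonus(rng) for _ in range(5)], [5, 0, 0, 0, 5])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReproducibilityTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The `[5, 0, 0, 0, 5]` pattern matches the batch results above: only Company 1 (48 + 5 = 53) and Company 5 (65 + 5 = 70) received the bonus.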


## 🚀 Streamlit Demo

The full **interactive demo** can be run using Streamlit. This allows you to:
- Adjust weights for lead scoring
- View live lead scores, confidence, and categories
- Explore charts and top leads interactively

### How to run:

1. Save this notebook’s code (or the Python script) as `lead_scoring.py`.
2. Open a terminal and run:

```bash
streamlit run lead_scoring.py
```