# Investor-Startup Matching Using Gemini API

## Overview
This project uses the **Google Gemini API** to assess compatibility between startups and potential investors. It considers industry alignment, funding stage, and investment range, and produces a **match score (1-100)** with an explanation of the reasoning.

## Features
### ✅ Data Processing
- Cleans and structures startup funding data from a CSV file.
- Standardizes numerical values for accurate comparisons.
- Handles missing data for improved accuracy.

### 🤖 Gemini API Integration
- Uses **Google Gemini** models to generate compatibility scores between investors and startups.
- Provides a natural language explanation for each match.
- Handles empty responses and API errors by returning an error message instead of raising.
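
Transient API failures can often be absorbed by retrying. The following is a minimal sketch of a retry wrapper with exponential backoff, not the notebook's actual behavior: the helper name `call_with_retries`, the attempt count, and the delays are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def call_with_retries(
    call: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run call() up to `attempts` times; return None if every try fails or is empty."""
    for attempt in range(attempts):
        try:
            text = call()
            if text:
                return text
        except Exception as exc:
            # Broad on purpose: this sketch does not assume specific SDK exception types.
            print(f"Attempt {attempt + 1} failed: {exc}")
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return None
```

With the notebook's objects, a call might look like `call_with_retries(lambda: model.generate_content(prompt).text)`.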

### 📊 Investor Preferences Matching
- Matches startups with investors based on:
  - **Industry** (FinTech, HealthTech, EdTech, etc.)
  - **Investment Stage** (Seed, Series A, Series B, etc.)
  - **Investment Range** ($500K - $5M, etc.)
- Computes a **compatibility score (1-100)** for ranking matches.
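
Investment ranges are stored as strings such as `$500K - $5M`, so a deterministic range check first needs them as numbers. The sketch below shows one way to do that; the helpers `parse_amount`, `parse_range`, and `in_range` are hypothetical and are not part of the notebook, which passes the range string to Gemini as-is.

```python
import re
from typing import Tuple

# Suffix multipliers assumed for this sketch (K = thousand, M = million, B = billion).
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_amount(text: str) -> float:
    """Convert a string such as '$500K' or '2.5M' into a dollar amount."""
    match = re.fullmatch(r"\$?\s*([\d.,]+)\s*([KMB]?)", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized amount: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIERS.get(match.group(2).upper(), 1)


def parse_range(text: str) -> Tuple[float, float]:
    """Split 'LOW - HIGH' on the hyphen and parse both ends."""
    low, high = (parse_amount(part) for part in text.split("-"))
    return low, high


def in_range(amount: float, range_text: str) -> bool:
    """Return True when amount lies within the inclusive range."""
    low, high = parse_range(range_text)
    return low <= amount <= high


print(parse_range("$500K - $5M"))
print(in_range(35_000_000, "$1M - $5M"))
```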

### 🏆 Ranked Results
- Displays ranked matches with compatibility scores.
- Provides insights into the alignment of startups with investor preferences.
- Filters out mismatched investors to optimize the matching process.
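
One simple way to filter mismatches is to drop results below a score threshold after scoring. This is a sketch only: the function name `filter_matches`, the default threshold of 50, and the sample scores are assumptions, while the dictionary keys follow the result shape built in the notebook.

```python
from typing import Dict, List


def filter_matches(match_results: List[Dict], min_score: int = 50) -> List[Dict]:
    """Keep matches scoring at least min_score, ordered best first."""
    kept = [match for match in match_results if match["Score"] >= min_score]
    return sorted(kept, key=lambda match: match["Score"], reverse=True)


# Made-up scores purely to exercise the function.
sample_results = [
    {"Investor": "Investor A", "Score": 40, "Explanation": "..."},
    {"Investor": "Investor B", "Score": 90, "Explanation": "..."},
    {"Investor": "Investor C", "Score": 65, "Explanation": "..."},
]

for match in filter_matches(sample_results):
    print(match["Investor"], match["Score"])
```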

## Installation
### Prerequisites
- Ensure you have **Python 3.8+** installed.

### Required Libraries
Install dependencies using:
```bash
pip install google-generativeai pandas
```

### Set Up Gemini API
To use the Gemini API, configure your API key:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_ACTUAL_API_KEY")
```
Replace `YOUR_ACTUAL_API_KEY` with your actual Google Gemini API key.
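
To keep the key out of the notebook and out of version control, it can be read from an environment variable instead. The sketch below assumes a variable named `GEMINI_API_KEY`; both that name and the helper `load_api_key` are illustrative choices, not something the project defines.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message rather than sending an empty key to the API.
        raise RuntimeError(f"Set the {var_name} environment variable before running the notebook.")
    return key
```

The configuration call then becomes `genai.configure(api_key=load_api_key())`.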

## Usage
1. **Prepare the dataset**: Update the `file_path` variable in `investor_matching.ipynb` with your dataset path.
2. **Define investor preferences**: Customize the investor criteria (e.g., preferred industry, funding range, preferred stage).
3. **Run the notebook**: Execute the cells to generate ranked investor-startup matches.
4. **View results**: The output displays ranked matches with compatibility scores and explanations.

### Example Output
```
Investor: Investor A
Match Score: 90/100
Explanation:
- Strong industry match (FinTech)
- Investment range aligns ($500K - $2M)
- Series A funding preference matches startup stage
```
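
The score shown in output like this has to be extracted from Gemini's free-text reply, which the prompt asks to contain a `Compatibility Score: X/100` line. A tolerant way to do that is sketched below, assuming the reply may add Markdown bold markers or extra spaces; the helper `extract_score` and its pattern are illustrative, not the notebook's own parsing code.

```python
import re
from typing import Optional

# Accepts optional '**' markers and spaces around the number and the slash.
_SCORE_PATTERN = re.compile(
    r"Compatibility Score:\s*\**\s*(\d{1,3})\s*\**\s*/\s*100",
    re.IGNORECASE,
)


def extract_score(response_text: str) -> Optional[int]:
    """Return the 1-100 score found in the reply, or None if it is absent or out of range."""
    match = _SCORE_PATTERN.search(response_text)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 100 else None
```

Returning `None` rather than `0` lets the caller distinguish a failed API call from a genuinely poor match.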

## File Structure
```
📂 Investor-Startup-Matching
│── investor_matching.ipynb   # Jupyter Notebook implementation
```

## Potential Improvements 🚀
- **Enhance Investor Criteria**:
  - Consider factors like investor location, past investments, and risk appetite.
- **Advanced Data Preprocessing**:
  - Improve missing data handling with machine learning techniques.
  - Standardize currency values for global investor-startup matching.
- **Web UI Development**:
  - Create an interactive web interface for user-friendly startup-investor matching.
  - Allow users to input investor preferences dynamically.
- **Machine Learning Model for Match Prediction**:
  - Train an ML model using historical startup-investor data for improved predictions.

## Deliverables 📦
- **Jupyter Notebook (`investor_matching.ipynb`)**: Contains code, explanations, and results.
- **README File (`README.md`)**: Detailed project documentation.
- **Dataset (`startups_data.csv`)**: Sample startup funding dataset for testing.

## License
This project is **open-source** and available for modifications under the MIT License. Contributions are welcome! 😊



In [2]:
import google.generativeai as genai
import pandas as pd

# Configure Gemini API
genai.configure(api_key="YOUR_ACTUAL_API_KEY")  # Replace with your actual key

# Load the dataset
file_path = r"C:\Users\kanch\Desktop\Founder-Investor-Matching\data\Indian startups funding in 2021.csv"
df = pd.read_csv(file_path)

# Clean dataset
df.dropna(inplace=True)
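df["Amount(in dollars)"] = pd.to_numeric(df["Amount(in dollars)"].astype(str).str.replace(",", "", regex=False), errors="coerce")
# Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas
df["Amount(in dollars)"] = df["Amount(in dollars)"].fillna(0)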

# Example list of investors (Replace with actual investor dataset)
investors_list = [
    {"Name": "Investor A", "Preferred Sector": "FinTech", "Investment Range": "$500K - $2M", "Preferred Stage": "Seed, Series A"},
    {"Name": "Investor B", "Preferred Sector": "EdTech", "Investment Range": "$1M - $5M", "Preferred Stage": "Series A, Series B"},
    {"Name": "Investor C", "Preferred Sector": "HealthTech", "Investment Range": "$500K - $3M", "Preferred Stage": "Seed, Series A"},
]

def get_match_score(startup, investor):
    """
    Use Gemini API to analyze the match between a startup and an investor.
    Returns a compatibility score (1-100) and explanation.
    """

    prompt = f"""
    You are an expert in startup-investor matching. Given the following startup details:

    **Startup Name:** {startup['Company/Brand']}
    **Industry:** {startup['Sector']}
    **Description:** {startup['What it does']}
    **Founder(s):** {startup['Founder/s']}
    **Funding Amount Required:** {startup['Amount(in dollars)']}
    **Investment Stage:** {startup['Stage']}

    And the investor's preferences:
    - Investor Name: {investor['Name']}
    - Preferred Industry: {investor['Preferred Sector']}
    - Investment Range: {investor['Investment Range']}
    - Preferred Stage: {investor.get('Preferred Stage', 'Not Specified')}

    **Task:** 
    1. Analyze how well this startup aligns with the investor's preferences.
    2. Assign a **match score (1-100)** based on compatibility.
    3. Explain the reasoning behind the score.

    Provide your response in this format:
    ```
    Compatibility Score: X/100

    Explanation:
    - [Factor 1]: [Reason]
    - [Factor 2]: [Reason]
    - [Factor 3]: [Reason]
    ```
    """

    try:
        # Call the Gemini API
        model = genai.GenerativeModel("gemini-1.5-pro-latest")
        response = model.generate_content(prompt)
        
        if response.text:
            return response.text  # Return score and explanation
        else:
            return "API Error: No response received"
    
    except Exception as e:
        return f"API Error: {str(e)}"

def match_startup_to_investors(startup):
    """
    Matches a startup with multiple investors, ranks them by match score.
    """

    match_results = []

    for investor in investors_list:
        result = get_match_score(startup, investor)
        
        # Extract score from response; fall back to 0 if the reply is an error or is malformed
        try:
            score = int(result.split("Compatibility Score:")[1].split("/100")[0].strip())
        except (IndexError, ValueError):
            score = 0
        
        match_results.append({
            "Investor": investor["Name"],
            "Score": score,
            "Explanation": result
        })

    # Rank investors by match score
    match_results = sorted(match_results, key=lambda x: x["Score"], reverse=True)

    return match_results

# Test with a startup example
startup_example = df.iloc[0].to_dict()
ranked_matches = match_startup_to_investors(startup_example)

# Display top matches
for match in ranked_matches:
    print(f"Investor: {match['Investor']}")
    print(f"Match Score: {match['Score']}/100")
    print(f"Explanation: {match['Explanation']}")
    print("-" * 50)


Investor: Investor B
Match Score: 90/100
Explanation: ```
Compatibility Score: 90/100

Explanation:
- Industry: Excellent match. The investor explicitly prefers EdTech, and CollegeDekho operates within the E-learning space, which falls directly under the EdTech umbrella.
- Investment Stage: Excellent match.  The startup is seeking Series B funding, which aligns perfectly with the investor's preferred investment stages of Series A and Series B.
- Funding Amount: Good match.  The startup is requesting $35,000,000, which falls slightly outside the investor's preferred range of $1M-$5M. However, given the investor's focus on EdTech and the stage of the startup, this discrepancy might not be a deal-breaker, particularly if the investor is open to slightly larger investments for promising companies.  It could also indicate a potential for syndication with other investors.
- Description/Potential: Strong potential. The description suggests a focus on student career goals, a large and growing 