An AI-powered platform that analyzes compatibility between prospective PhD students and professors by matching student CVs against professor profiles. The system uses LLMs to produce a comprehensive match analysis, research-alignment insights, and personalized recommendations.
This application helps prospective PhD students:
- Analyze compatibility with potential advisors by comparing their CVs against professor profiles
- Get detailed insights on research alignment, mentorship style, and lab culture
- Receive actionable recommendations for improving their application strategy
- Access structured reports with fit scores, risk assessments, and gap analysis
- Intelligent CV Processing: Extracts and structures information from PDF/DOCX CVs using an LLM
- Multi-Page Website Crawling: Automatically crawls professor websites (up to 10 pages, depth 2) using Crawl4AI
- Comprehensive Analysis:
- Professor profile extraction and lab analysis
- Multi-dimensional match analysis (9 dimensions)
- Structured JSON reports for dashboard visualization
- Smart Caching: Reduces API calls with intelligent caching for CVs, professor profiles, and analysis reports
- Parallel Processing: Optimized execution with parallel CV and professor profile processing
- Retry Logic: Robust error handling with exponential backoff for rate limits
- URL Validation: Security checks to ensure only valid public HTTPS URLs are processed
- FastAPI for REST API endpoints
- SQLite database for professor profiles and match reports
- Crawl4AI for web crawling
- OpenAI-compatible API for LLM processing
- Async/await for concurrent processing
- React 19 with modern hooks
- Tailwind CSS for styling
- Framer Motion & GSAP for animations
- Recharts for data visualization
- React Markdown for report rendering
- Python 3.8+
- Node.js 18+ and npm
- OpenAI-compatible API key (or Zeabur API)
- Crawl4AI dependencies (browser automation)
git clone <repository-url>
cd PhDuoV2
cd backend
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
cd frontend
# Install dependencies
npm install
Create a .env file in the backend directory:
# LLM API Configuration
ZEABUR_API_KEY=your_api_key_here
ZEABUR_BASE_URL=https://hnd1.aihub.zeabur.ai/v1
# Database (optional, defaults to SQLite)
DATABASE_URL=sqlite:///./professors.db
The backend uses environment variables for configuration:
- ZEABUR_API_KEY: Your OpenAI-compatible API key
- ZEABUR_BASE_URL: Base URL for the LLM API (default: Zeabur endpoint)
- DATABASE_URL: Database connection string (default: SQLite)
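Reading these variables with sensible fallbacks might look like the sketch below. This is illustrative only, using just the standard library; the actual backend may structure its settings differently (the defaults shown are the documented ones).

```python
import os

# Hypothetical settings loader; the real backend may organize this differently.
def load_settings() -> dict:
    api_key = os.getenv("ZEABUR_API_KEY")
    if not api_key:
        raise RuntimeError("ZEABUR_API_KEY must be set in backend/.env")
    return {
        "api_key": api_key,
        # Defaults mirror the documented fallbacks above.
        "base_url": os.getenv("ZEABUR_BASE_URL", "https://hnd1.aihub.zeabur.ai/v1"),
        "database_url": os.getenv("DATABASE_URL", "sqlite:///./professors.db"),
    }
```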
The crawler is configured in backend/app/services/crawl4ai_service.py:
- max_depth=2: Maximum link depth to crawl
- max_pages=10: Maximum number of pages to crawl
- include_patterns: Only crawls pages within the root URL's domain
cd backend
python run_server.py
The server will start on http://localhost:7878
cd frontend
npm run dev
The frontend will be available at http://localhost:5173 (or the port Vite assigns)
- Open the web interface in your browser
- Upload your CV (PDF or DOCX format, max 5MB)
- Enter the professor's website URL (must be HTTPS and public domain)
- Click "Analyze Match" and wait for processing
- View the results:
- Professor & Lab Analysis Report
- Match Analysis Report
- Structured Dashboard View with scores and visualizations
The system performs the following steps:
- CV Processing (LLM call #1): Extracts and structures CV data
- Professor Profile Extraction & Analysis (LLM call #2): Crawls the website and generates the profile and analysis in a single combined call
- Match Analysis & Refined Report (LLM call #3): Analyzes compatibility and generates the structured JSON report
Note: Steps 1 and 2 run in parallel for faster execution.
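The parallel execution of steps 1 and 2 can be sketched with asyncio.gather. The function names below are illustrative stand-ins, not the backend's actual API:

```python
import asyncio

# Hypothetical stand-ins for the real services; names are illustrative only.
async def process_cv(cv_bytes: bytes) -> dict:
    await asyncio.sleep(0)  # placeholder for the real LLM call
    return {"skills": ["ml"]}

async def build_professor_profile(url: str) -> dict:
    await asyncio.sleep(0)  # placeholder for crawling + profile extraction
    return {"name": "Prof. Example"}

async def run_pipeline(cv_bytes: bytes, url: str) -> dict:
    # Steps 1 and 2 are independent, so they run concurrently.
    cv_data, profile = await asyncio.gather(
        process_cv(cv_bytes),
        build_professor_profile(url),
    )
    # Step 3 (match analysis) needs both results, so it runs afterwards.
    return {"cv": cv_data, "professor": profile}
```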
POST /analyze
- Description: Analyzes the match between a student CV and a professor profile
- Request:
- cv: File upload (PDF/DOCX)
- url: Professor website URL (form data)
- Response: Server-Sent Events (SSE) stream with status updates and final report
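A client consuming this stream needs to split the SSE format into individual events. A minimal parser is sketched below; it assumes each event's `data:` field carries a JSON payload, which is one plausible shape for the status updates (the real payload schema is not documented here):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Parse a raw SSE stream into a list of JSON payloads.

    Assumes each event carries JSON in its `data:` field; the
    actual /analyze payloads may differ.
    """
    events = []
    for block in raw.split("\n\n"):          # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events
```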
GET /dbp/professors
- List all professors with pagination
GET /dbp/professors/{professor_id}
- Get professor by ID
GET /dbp/professors/url/{root_url}
- Get professor by URL
GET /dbp/professors/search?query={query}
- Search professors by keyword
GET /dbp/professors/by-name?name={name}
- Get professors by name
GET /dbp/professors/by-university?university={university}
- Get professors by university
GET /dbp/professors/by-research?keyword={keyword}
- Get professors by research interest
GET /dbp/professors/by-venue?venue={venue}
- Get professors by publication venue
GET /dbp/stats
- Get database statistics
GET /api/match-reports
- List all match reports (history)
GET /api/match-reports/{report_id}
- Get specific match report by ID
PhDuoV2/
├── backend/
│   ├── app/
│   │   ├── api/                     # API endpoints
│   │   ├── services/                # Business logic services
│   │   │   ├── crawl4ai_service.py  # Web crawling service
│   │   │   ├── llm_service.py       # LLM operations
│   │   │   └── pdf_parser.py        # PDF/DOCX parsing
│   │   ├── database.py              # Database configuration
│   │   └── models.py                # SQLAlchemy models
│   ├── prompts/                     # LLM prompt templates
│   ├── main.py                      # Main FastAPI application
│   ├── run_server.py                # Development server runner
│   └── requirements.txt             # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── components/              # React components
│   │   ├── App.jsx                  # Main application component
│   │   └── main.jsx                 # Entry point
│   └── package.json                 # Node dependencies
├── crawl_cache/                     # Cached website content
├── cv_cache/                        # Cached CV extractions
├── match_log/                       # Match analysis logs
└── metadata/                        # Structured report JSONs
You can test the Crawl4AI service independently:
cd backend/app/services
python crawl4ai_service.py
Modify the test_url variable in the main() function to test with different websites.
The system uses SQLite by default. Database operations are handled through:
- database_operations.py: Helper functions for database queries
- Automatic table creation on first run
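The "create tables automatically on first run" idea can be illustrated with the standard library's sqlite3 module. The project itself uses SQLAlchemy models, so the schema below is a simplified, hypothetical stand-in:

```python
import sqlite3

# Illustrative only: the backend uses SQLAlchemy, but the same
# "create tables if missing on first run" pattern looks like this in raw SQL.
def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS professors (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               root_url TEXT UNIQUE
           )"""
    )
    conn.commit()
    return conn
```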
The application implements multi-level caching:
- CV Cache: Stores extracted CV data by file hash
- Professor Profile Cache: Stores crawled website content
- Analysis Cache: Stores professor analysis reports (invalidated on profile changes)
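Keying the CV cache by file hash means identical uploads reuse the same cached extraction. A minimal sketch (SHA-256 is an assumption; the backend may use a different digest):

```python
import hashlib

def cv_cache_key(cv_bytes: bytes) -> str:
    # Hash the raw file contents so identical uploads hit the same cache entry.
    # SHA-256 here is an assumption; the backend may use a different digest.
    return hashlib.sha256(cv_bytes).hexdigest()
```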
- URL Validation: Only accepts HTTPS URLs from public domains
- File Size Limits: Maximum 5MB for CV uploads
- Input Sanitization: All inputs are validated and sanitized
- Error Handling: Comprehensive error handling with user-friendly messages
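A validator mirroring the documented URL rules (HTTPS only, public hosts) might look like the following. This is a hypothetical sketch, not the backend's actual check; a production version would also resolve DNS to guard against private addresses behind hostnames:

```python
import ipaddress
from urllib.parse import urlparse

def is_valid_public_https_url(url: str) -> bool:
    """Hypothetical validator: HTTPS only, reject obvious non-public hosts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost":
        return False
    try:
        # IP literals must be globally routable (rejects 192.168.x.x, 10.x, etc.).
        return ipaddress.ip_address(host).is_global
    except ValueError:
        # Not an IP literal; require at least one dot so bare hostnames fail.
        return "." in host
```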
- Parallel Processing: CV and professor profile processing run concurrently
- Combined LLM Calls: Reduced from 5 to 3 maximum LLM calls per analysis
- Intelligent Caching: Reduces redundant API calls and processing
- Retry Logic: Handles rate limits with exponential backoff
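The exponential-backoff retry behavior can be sketched generically. The retry count, delays, and exception type below are illustrative assumptions; the real service's parameters are not shown in this README:

```python
import random
import time

def with_backoff(fn, max_retries: int = 4, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `fn` on RuntimeError with exponential backoff plus jitter.

    A generic sketch of the documented retry behavior; the real service's
    parameters and exception types may differ.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_retries:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus small jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Injecting `sleep` keeps the helper testable without real waiting.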
Issue: "Crawling failed" error
- Solution: Ensure the URL is a valid HTTPS public domain. Check Crawl4AI dependencies are installed.
Issue: "Rate limit" errors
- Solution: The system automatically retries with exponential backoff. Check your API key limits.
Issue: Frontend can't connect to backend
- Solution: Ensure backend is running on port 7878 and CORS is properly configured.
Issue: Database errors
- Solution: Check file permissions for the SQLite database. Delete professors.db to reset.
Note: This application requires an OpenAI-compatible API key. Make sure to configure your API credentials in the .env file before use.