    class RishabhVaibhav:
        def __init__(self):
            self.role = "Senior Data Engineer"
            self.companies = ["GlobalLogic (Client: Google)", "ALTEN India (Client: Airbus)"]
            self.experience = "3.9+ years"
            self.education = "B.Tech CSE — Haldia Institute of Technology (CGPA: 7.9)"
            self.cert = "Microsoft Certified: Azure Developer Associate (AZ-204)"
            self.focus = ["ETL Pipelines", "Web Scraping @ Scale", "LLM-Powered Workflows"]
            self.data_scale = "1TB+ monthly | 5M+ documents | 500GB+ daily"

        def current_mission(self):
            return "Building production-grade data infrastructure that actually works at scale."

🔵 GlobalLogic — Senior Data Engineer | Oct 2024 – Mar 2026

🟢 Client: Google — Government Schemes Pipeline
| Metric | Achievement |
|---|---|
| Records Crawled | 15,000+ across 20+ websites |
| Data Processed | 50GB+ with Pydantic validation |
| Error Rate Reduction | 8% → 0.5% |
| Pipelines Built | 12 scalable pipelines with retry logic |
| Storage | 1M+ records in Knowledge Graph |

Stack: Selenium · Playwright · BeautifulSoup · LangChain · Pydantic · SC Studio · CI/CD

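
The "retry logic" listed in the table above can be illustrated with a minimal sketch. This is not the production code: the `retry` decorator, its parameters (`max_attempts`, `base_delay`, `exceptions`, `sleep`), and the `fetch_scheme_page` function are hypothetical names chosen for illustration, and exponential backoff is an assumed strategy.

```python
import time
from functools import wraps


def retry(max_attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped callable with exponential backoff (hypothetical helper)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


@retry(max_attempts=4, base_delay=0.1, exceptions=(ConnectionError,))
def fetch_scheme_page(url):
    """Placeholder for a real page fetch; assumed to raise ConnectionError on failure."""
    return f"<html>content of {url}</html>"
```

Passing `sleep` as a parameter keeps the decorator easy to test, because a test can record the delays without actually waiting.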
🟠 Client: AuditBoard — Regulatory Data Scraper Engine
| Metric | Achievement |
|---|---|
| Documents Collected | 5M+ from 200+ regulatory websites |
| Monthly Data | 2TB+ with 98% schema compliance |
| Manual Work Eliminated | 320+ hours/month via n8n |
| Scrapers Managed | 400+ via Flet Python desktop UI |
| Jurisdictions | UK, US & Global |

Stack: BFS/DFS Crawler · Azure OpenAI · LangChain · n8n · Flet · Azure VM · EC2 · Delta Versioning

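
For readers unfamiliar with the "BFS/DFS Crawler" entry in the stack, here is a rough sketch of a breadth-first crawl. It is an illustration only, not the actual engine: `bfs_crawl`, `get_links`, and `max_pages` are hypothetical names, and link extraction is left to a caller-supplied function so the example runs without network access.

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first; get_links(url) returns the URLs a page links to."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    order = []                  # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()   # FIFO order makes this breadth-first
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for real pages (assumed data).
site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": []}
print(bfs_crawl("/", lambda url: site.get(url, [])))
```

Swapping `popleft()` for `pop()` turns the queue into a stack, which gives a depth-first-style traversal with the same bookkeeping.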
🔵 ALTEN India — Data Engineer | Jul 2022 – Sep 2024
| Metric | Achievement |
|---|---|
| Daily Data Processed | 500GB+ flight records |
| Query Performance | Sub-2s on 10M+ records |
| Dashboard Users | 200+ Airbus engineers & analysts |
| Data Sources | 50+ flight data sources |
| Visualizations | 15+ interactive charts |

Stack: PySpark · Pandas · NumPy · AWS RDS · Django REST API · Django Auth · SQL

- 🔭 Building LLM-powered data pipelines at GlobalLogic (Client: Google)
- 🌱 Exploring advanced RAG architectures and vector databases
- ⚡ Scaling web scraping systems to handle millions of documents
- 🛡️ Implementing robust anti-bot bypass strategies at scale
- 📊 Designing Knowledge Graph solutions for structured data storage