Rishabhvaibhav/README.md

🧠 About Me

class RishabhVaibhav:
    def __init__(self):
        self.role        = "Senior Data Engineer"
        self.companies   = ["GlobalLogic (Client: Google)", "ALTEN India (Client: Airbus)"]
        self.experience  = "3.9+ years"
        self.education   = "B.Tech CSE — Haldia Institute of Technology (CGPA: 7.9)"
        self.cert        = "Microsoft Certified: Azure Developer Associate (AZ-204)"
        self.focus       = ["ETL Pipelines", "Web Scraping @ Scale", "LLM-Powered Workflows"]
        self.data_scale  = "1TB+ monthly | 5M+ documents | 500GB+ daily"

    def current_mission(self):
        return "Building production-grade data infrastructure that actually works at scale."

💼 Experience

🔵 GlobalLogic — Senior Data Engineer  |  Oct 2024 – Mar 2026

🟢 Client: Google — Government Schemes Pipeline

| Metric | Achievement |
| --- | --- |
| Records Crawled | 15,000+ across 20+ websites |
| Data Processed | 50GB+ with Pydantic validation |
| Error Rate Reduction | 8% → 0.5% |
| Pipelines Built | 12 scalable pipelines with retry logic |
| Storage | 1M+ records in Knowledge Graph |

Stack: Selenium · Playwright · BeautifulSoup · LangChain · Pydantic · SC Studio · CI/CD
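For readers curious what "retry logic" in a scraping pipeline can look like, here is a minimal sketch using only the standard library. It is not the code from the project above: the `retry` decorator, its parameters, and the `fetch_page` stand-in are all hypothetical names chosen for illustration, and exponential backoff is just one common strategy.

```python
import time
from functools import wraps


def retry(max_attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Hypothetical helper: retry a callable with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


# Stand-in for a flaky network call: fails twice, then succeeds.
calls = {"count": 0}


@retry(max_attempts=3, base_delay=0.01, exceptions=(ConnectionError,))
def fetch_page():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "<html>ok</html>"


page = fetch_page()
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can substitute a function that records the delays without actually waiting.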


🟠 Client: AuditBoard — Regulatory Data Scraper Engine

| Metric | Achievement |
| --- | --- |
| Documents Collected | 5M+ from 200+ regulatory websites |
| Monthly Data | 2TB+ with 98% schema compliance |
| Manual Work Eliminated | 320+ hours/month via n8n |
| Scrapers Managed | 400+ via Flet Python desktop UI |
| Jurisdictions | UK, US & Global |

Stack: BFS/DFS Crawler · Azure OpenAI · LangChain · n8n · Flet · Azure VM · EC2 · Delta Versioning
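As a rough illustration of the "BFS/DFS Crawler" item in the stack above, the sketch below walks a small in-memory link graph breadth-first. It is an assumption-laden toy, not the production engine: `bfs_crawl`, `get_links`, and the `SITE` dictionary are hypothetical, and a real crawler would fetch pages over the network and extract their links.

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=100):
    """Hypothetical helper: visit pages breadth-first, each at most once."""
    visited = {start_url}
    order = []
    queue = deque([start_url])
    while queue and len(order) < max_pages:
        url = queue.popleft()  # FIFO queue gives breadth-first order
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for a real site (illustrative paths only).
SITE = {
    "/": ["/rules", "/news"],
    "/rules": ["/rules/2024", "/"],
    "/news": ["/rules/2024"],
    "/rules/2024": [],
}

pages = bfs_crawl("/", lambda url: SITE.get(url, []))
print(pages)  # ['/', '/rules', '/news', '/rules/2024']
```

Replacing `queue.popleft()` with `queue.pop()` turns the queue into a stack and gives a depth-first traversal instead.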

🔵 ALTEN India — Data Engineer  |  Jul 2022 – Sep 2024

✈️ Client: Airbus — Flight Data Analytics Platform

| Metric | Achievement |
| --- | --- |
| Daily Data Processed | 500GB+ flight records |
| Query Performance | Sub-2s on 10M+ records |
| Dashboard Users | 200+ Airbus engineers & analysts |
| Data Sources | 50+ flight data sources |
| Visualizations | 15+ interactive charts |

Stack: PySpark · Pandas · NumPy · AWS RDS · Django REST API · Django Auth · SQL


🛠️ Tech Stack

🐍 Languages & Core Frameworks

🕷️ Web Scraping & Automation

☁️ Cloud & Infrastructure

🗄️ Databases

🤖 AI & LLM

🔧 Dev Tools


📊 GitHub Stats


🏆 GitHub Trophies


🎖️ Certification


🚀 What I'm Working On

🔭  Building LLM-powered data pipelines at GlobalLogic (Client: Google)
🌱  Exploring advanced RAG architectures and vector databases
⚡  Scaling web scraping systems to handle millions of documents
🛡️  Implementing robust anti-bot bypass strategies at scale
📊  Designing Knowledge Graph solutions for structured data storage

💬 Let's Connect & Build Something at Scale

"Good data pipelines are invisible. Great data pipelines are unforgettable."

Popular repositories

  1. Django_Web_App (Public)

    Flight Data Analysis and Visualization Project

    CSS 1

  2. Flight-analysis (Public)

    Flight Delays and Cancellations

    Jupyter Notebook 1

  3. Rishabhvaibhav (Public)

    Config files for my GitHub profile.

  4. My-Calc-v1.0- (Public)

    This calc is based on python code through import tkinter

    Python

  5. Certification (Public)

    Certification

  6. googlecolab (Public)

    Jupyter Notebook