- Web scraping at scale with Python (Selenium, Playwright, Scrapy, BeautifulSoup)
- Automation pipelines for data collection, cleaning, and storage
- API integrations (REST/GraphQL) and browser automation
- Data wrangling with Pandas, exporting to CSV/JSON/DB
- Learning ML & AI to build smarter data products
- Built bots that extract thousands of pages/day with rotating proxies & retries
- Designed resilient anti-bot bypass flows (stealth drivers, human-like waits, CAPTCHA solving via third-party services)
- Delivered clean datasets ready for analysis & model training
- Currently exploring feature engineering, vector databases, and LLM-powered scraping assistants
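
A minimal sketch of the "rotating proxies & retries" idea mentioned above, not the actual bot code. The function name `fetch_with_retries`, its parameters, and the injected `fetch` callable are all hypothetical; in practice `fetch` would wrap whatever HTTP client or browser driver is in use.

```python
import itertools
import time
from typing import Callable, Sequence


def fetch_with_retries(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> str:
    """Call fetch(url, proxy), switching to the next proxy after each failure."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_cycle = itertools.cycle(proxies)
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return fetch(url, proxy)
        except Exception as exc:  # assumption: a real scraper would catch narrower errors
            last_error = exc
            # Exponential backoff between attempts: base_delay, 2x, 4x, ...
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error
```

Passing `fetch` in as a parameter keeps the retry logic independent of the transport, so the same loop can sit in front of a plain HTTP request or a headless browser session.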
- 🎯 Current focus: data labeling, feature engineering, small ML models for classification/regression
- Next up: LLM-assisted scraping, RAG for document-heavy sites, agent workflows
- Notes & experiments live here → /labs
- Full-site data extraction (anti-bot aware) → CSV/JSON/DB
- PDF/image capture & text extraction (OCR)
- API discovery & reverse engineering for private endpoints
- Dashboard/API to deliver data (FastAPI + simple UI)
- Ongoing monitoring for price changes, stock, new listings
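
As a rough illustration of the monitoring item above, change detection can be as simple as diffing two scraped snapshots. This is a sketch under assumptions: the snapshot shape (listing ID mapped to price) and the name `diff_snapshots` are hypothetical, not a description of a delivered system.

```python
from typing import Dict, List, Tuple, TypedDict


class SnapshotDiff(TypedDict):
    added: List[str]
    removed: List[str]
    changed: Dict[str, Tuple[float, float]]


def diff_snapshots(old: Dict[str, float], new: Dict[str, float]) -> SnapshotDiff:
    """Compare two {listing_id: price} snapshots from consecutive runs."""
    added = sorted(new.keys() - old.keys())      # new listings
    removed = sorted(old.keys() - new.keys())    # delisted or out of stock
    changed = {
        key: (old[key], new[key])                # (previous price, current price)
        for key in sorted(old.keys() & new.keys())
        if old[key] != new[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Each run would store its snapshot, and the diff against the previous one is what gets reported or pushed to a dashboard.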
Need data? Open an issue or reach out!
Made with ❤️, Python, and a lot of headless browsers.