This project is a browser automation script built with Playwright (Python) that scrapes public profile information from a GitHub user page and stores the extracted data into a Supabase database.
## Features

- Automates a Chromium browser using Playwright
- Scrapes GitHub profile details:
  - Username
  - Name (if available)
  - Bio (if available)
  - Followers count
  - Following count
  - Number of repositories
- Stores the data in a Supabase table
- Handles optional fields safely

## Tech Stack

- Python
- Playwright (browser automation)
- Supabase (database)
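The scraping step can be sketched with Playwright's sync API. This is a minimal sketch, not the exact `main.py`; the CSS selectors are assumptions and may break if GitHub changes its markup:

```python
def profile_url(username: str) -> str:
    """Build the public profile URL for a GitHub username."""
    return f"https://github.com/{username}"

def scrape_profile(username: str) -> dict:
    # Imported lazily so the pure helper above stays importable anywhere.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(profile_url(username))
        # Illustrative selectors, not verified against the live GitHub UI.
        data = {
            "username": username,
            "name": page.locator(".p-name").inner_text().strip(),
            "bio": page.locator(".p-note").inner_text().strip(),
        }
        browser.close()
    return data
```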
## Project Structure

```text
github-scraper/
│
├── main.py        # Main scraper script
├── .gitignore
├── README.md
└── venv/          # Virtual environment (ignored)
```
## Setup

Clone the repository and set up a virtual environment:

```bash
git clone <your-repo-url>
cd github-scraper
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux
```

Install the dependencies and the Playwright browser binaries:

```bash
pip install playwright supabase python-dotenv
playwright install
```
Create a `.env` file in the project root:

```env
SUPABASE_URL=your_project_url
SUPABASE_KEY=your_anon_key
```
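`main.py` can then load these variables and build the Supabase client. A minimal sketch, assuming the `python-dotenv` and `supabase` packages from the install step:

```python
import os

def load_config() -> tuple:
    """Read SUPABASE_URL and SUPABASE_KEY, loading .env first if possible."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv()                   # does not override existing env vars
    except ImportError:
        pass  # fall back to variables already set in the environment
    return os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"]

def get_client():
    # create_client comes from the supabase package installed above.
    from supabase import create_client
    url, key = load_config()
    return create_client(url, key)
```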
## Usage

```bash
python main.py
```

When executed:

- A browser opens automatically
- The profile data is scraped
- The data is inserted into the Supabase table
## Database Schema

```sql
create table github_profiles (
  id uuid default uuid_generate_v4() primary key,
  username text,
  name text,
  bio text,
  followers text,
  following text,
  repositories text
);
```
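A scraped record can then be written to this table through the Supabase Python client. A minimal sketch, where `client` is assumed to be the object returned by `create_client`:

```python
def save_profile(client, profile: dict):
    """Insert one scraped profile into the github_profiles table."""
    # The keys of `profile` should match the columns in the schema above.
    return client.table("github_profiles").insert(profile).execute()
```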
## Assumptions

- The target GitHub profile is public.
- GitHub's UI selectors remain stable.
- Some fields, such as `name` or `bio`, may be empty.
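Since `name` and `bio` can be absent, the scraper should read them defensively. A hypothetical helper built on Playwright locators (the fallback policy here is an illustrative choice, not the exact code in `main.py`):

```python
def text_or_none(page, selector: str):
    """Return the stripped text of the first match, or None if absent/empty."""
    locator = page.locator(selector)
    if locator.count() == 0:      # element not present on this profile
        return None
    text = locator.first.inner_text().strip()
    return text or None           # treat an empty string as missing too
```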
## Future Improvements

- Add error handling and retry logic
- Convert follower counts to integers
- Support scraping multiple profiles
- Add a logging system
- Dockerize the application
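For the follower-count conversion, a small parser could handle GitHub's abbreviated numbers. The `k`/`m` suffix handling below is an assumption about the displayed format (e.g. `1.2k`):

```python
def parse_count(raw: str) -> int:
    """Convert a count string such as '42', '1,234', '1.2k', or '3m' to int."""
    text = raw.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)
```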
## Author

Rithik