Scrapes team stats and player game logs from nrl.com, plus named team lists for upcoming rounds. Data is stored as JSON and optionally converted to CSV.
Install dependencies (requires Playwright):

    pip install playwright pandas
    playwright install chromium

The scripts run in this order:

    1. scrape_nrl_seasons.py    → nrl_outputs/nrl_{season}_team_gamelogs.json
                                → nrl_outputs/nrl_{season}_player_gamelogs.json
    2. get_future_teams.py      → nrl_outputs/nrl_{season}_round_{N}_team_lists.json (upcoming rounds only)
    3. convert_jsons_to_csvs.py → nrl_outputs/nrl_{season}_team_gamelogs.csv
                                → nrl_outputs/nrl_{season}_player_gamelogs.csv
                                → nrl_outputs/nrl_all_seasons_team_gamelogs.csv
                                → nrl_outputs/nrl_all_seasons_player_gamelogs.csv
    4. clean_games_new.py       → cleaned/transformed DataFrames ready for analysis
To scrape one season (for example, only the 2026 season), set both `--start-season` and `--end-season` to the same year:

    python scrape_nrl_seasons.py --start-season 2026 --end-season 2026

The scraper automatically detects which rounds are already saved and only fetches missing ones, so it is safe to re-run at any time to pick up new rounds.
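The resume behaviour described above can be sketched roughly as follows. This is an illustration, not the scraper's actual code: the helper names `saved_rounds` and `missing_rounds` are hypothetical, and it assumes each JSON file holds a list of row objects carrying a `round_num` key.

```python
import json
from pathlib import Path


def saved_rounds(json_path):
    """Return the set of round numbers already present in a gamelog JSON file."""
    path = Path(json_path)
    if not path.exists():
        return set()  # nothing has been scraped for this season yet
    rows = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the file is a list of row dicts, each with a round_num key.
    return {int(row["round_num"]) for row in rows}


def missing_rounds(json_path, last_round=31):
    """Return the rounds in 1..last_round that are not saved yet, in order."""
    done = saved_rounds(json_path)
    return [r for r in range(1, last_round + 1) if r not in done]
```

Only the rounds returned by `missing_rounds` would need a page visit, which is why a re-run is cheap once most of a season is saved.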
To also write CSV files alongside the JSON outputs, add `--write-csv`:

    python scrape_nrl_seasons.py --start-season 2026 --end-season 2026 --write-csv

To see the browser while scraping (useful for debugging):

    python scrape_nrl_seasons.py --start-season 2026 --end-season 2026 --show-browser

Output files:

| File | Contents |
|---|---|
| `nrl_outputs/nrl_2026_team_gamelogs.json` | One row per team per game (both home and away stats) |
| `nrl_outputs/nrl_2026_player_gamelogs.json` | One row per player per game (full stat line) |
To scrape multiple seasons in one run (e.g., 2020 through 2026):

    python scrape_nrl_seasons.py --start-season 2020 --end-season 2026

Seasons that are already fully scraped (all 31 rounds present) are skipped automatically. To force a full re-scrape of every season in the range:

    python scrape_nrl_seasons.py --start-season 2020 --end-season 2026 --force-rescrape

To scrape the full historical dataset (2015 onwards), use the defaults:

    python scrape_nrl_seasons.py

Available flags:

| Flag | Default | Description |
|---|---|---|
| `--start-season` | 2015 | First season to scrape |
| `--end-season` | 2026 | Last season to scrape |
| `--write-csv` | off | Also write per-season CSVs |
| `--force-rescrape` | off | Re-scrape even if data already exists |
| `--force-round` | off | Re-scrape specific round(s) even if already saved |
| `--show-browser` | off | Run with a visible browser window |
| `--empty-round-stop` | 2 | Stop a season after N consecutive empty rounds |
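For orientation, the flag table maps onto an `argparse` definition along these lines. This is a sketch under assumptions: the function name `build_parser` is hypothetical, and the real script may declare its arguments differently (in particular, treating `--force-round` as a list of integers is inferred from the usage examples).

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Scrape NRL team and player game logs.")
    parser.add_argument("--start-season", type=int, default=2015,
                        help="First season to scrape")
    parser.add_argument("--end-season", type=int, default=2026,
                        help="Last season to scrape")
    parser.add_argument("--write-csv", action="store_true",
                        help="Also write per-season CSVs")
    parser.add_argument("--force-rescrape", action="store_true",
                        help="Re-scrape even if data already exists")
    # Assumption: one or more round numbers, e.g. --force-round 3 4
    parser.add_argument("--force-round", type=int, nargs="+", default=[],
                        help="Re-scrape specific round(s) even if already saved")
    parser.add_argument("--show-browser", action="store_true",
                        help="Run with a visible browser window")
    parser.add_argument("--empty-round-stop", type=int, default=2,
                        help="Stop a season after N consecutive empty rounds")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

`argparse` converts the dashes in each flag to underscores, so `--force-round` is read back as `args.force_round`.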
Once a round is saved to the JSON, the scraper considers it complete and will not re-visit it on subsequent runs. If you ran the scraper mid-week and only some of the round's games had been played, use `--force-round` to re-scrape that round and pick up the remaining results:

    # Re-scrape round 4 only
    python scrape_nrl_seasons.py --start-season 2026 --end-season 2026 --force-round 4

You can also target multiple rounds at once:

    python scrape_nrl_seasons.py --start-season 2026 --end-season 2026 --force-round 3 4

`--force-round` strips the old rows for those rounds from the existing JSON, re-scrapes them fresh, then merges the result back in. All other saved rounds are left untouched.

Note: `--force-round` and `--force-rescrape` are different. `--force-round` targets specific rounds only; `--force-rescrape` re-scrapes the entire season from scratch.
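The strip-and-merge step that `--force-round` performs might look something like the sketch below. The function name `replace_rounds` is hypothetical, and the code assumes rows are dicts with a `round_num` key; the scraper's own merge logic may differ.

```python
def replace_rounds(existing_rows, fresh_rows, rounds):
    """Drop saved rows for the given rounds, then merge in freshly scraped rows."""
    targets = set(rounds)
    # Keep every saved row that does not belong to a forced round.
    kept = [row for row in existing_rows if row["round_num"] not in targets]
    merged = kept + list(fresh_rows)
    # A stable sort keeps rows within a round in their original relative order.
    merged.sort(key=lambda row: row["round_num"])
    return merged
```

Because stale rows are removed before the merge, a round scraped mid-week does not leave half-finished rows sitting next to the final results.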
`get_future_teams.py` scrapes the named squads (players, jersey numbers, positions) announced ahead of a round, before match stats are available. This is useful for building pre-game prediction features.

Auto-detect the next upcoming round:

    python get_future_teams.py

Target a specific season and round:

    python get_future_teams.py --season 2026 --round-num 5

Output files:

| File | Contents |
|---|---|
| `nrl_outputs/nrl_2026_round_5_team_lists.json` | Named squad rows (player, number, position) |
| `nrl_outputs/nrl_2026_round_5_team_lists.csv` | Same data as CSV |

Available flags:

| Flag | Default | Description |
|---|---|---|
| `--season` | auto-detect | Season to target |
| `--round-num` | auto-detect | Round number to target |
| `--show-browser` | off | Run with a visible browser window |
After scraping, run `convert_jsons_to_csvs.py` to convert all JSON files in `nrl_outputs/` into clean, deduplicated CSVs:

    python convert_jsons_to_csvs.py

This will:

- Write a per-season CSV for every JSON file found (e.g., `nrl_2026_team_gamelogs.csv`)
- Write two combined all-seasons CSVs:
  - `nrl_outputs/nrl_all_seasons_team_gamelogs.csv`
  - `nrl_outputs/nrl_all_seasons_player_gamelogs.csv`

To skip writing per-season CSVs and only produce the combined files, set `WRITE_PER_SEASON_CSVS = False` at the top of the script.
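As a rough picture of the combine-and-deduplicate step, here is a minimal sketch that uses only the standard library; the actual script may well use pandas instead. The helper names `load_rows`, `dedupe`, and `write_csv` are hypothetical, and the code assumes each JSON file is a list of flat row objects.

```python
import csv
import json
from pathlib import Path


def load_rows(json_paths):
    """Concatenate the row lists from several JSON files."""
    rows = []
    for path in json_paths:
        rows.extend(json.loads(Path(path).read_text(encoding="utf-8")))
    return rows


def dedupe(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = json.dumps(row, sort_keys=True)  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def write_csv(rows, csv_path):
    """Write rows to CSV using the union of their keys as the header."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are written as empty cells
```

A combined file would then be something like `write_csv(dedupe(load_rows(paths)), "nrl_outputs/nrl_all_seasons_team_gamelogs.csv")`, where `paths` lists the per-season JSON files.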
`clean_games_new.py` loads the combined CSVs and applies transformations ready for modelling or analysis:

    python clean_games_new.py

Key transformations applied:

- Sorts by `season` and `round_num`
- Merges player game logs with future team lists
- Strips `Coach` rows from player data
- Converts `mins_played`, `stint_one`, `stint_two` from `mm:ss` to decimal minutes
- Converts `time_in_possession` from `mm:ss` to total seconds
- Creates a `game_id` column (`{home_3_letters}{away_3_letters}{season}{round_num}`)
- Saves cleaned output to `nrl_outputs/nrl_tryscorers.csv`
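The time conversions and the `game_id` format listed above can be illustrated with a few small helpers. These are hypothetical functions, not code from `clean_games_new.py`; in particular, taking the first three characters of each team name without changing case is an assumption about what `{home_3_letters}` means.

```python
def mmss_to_seconds(value):
    """Convert an 'mm:ss' string to total seconds."""
    minutes, seconds = value.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def mmss_to_minutes(value):
    """Convert an 'mm:ss' string to decimal minutes."""
    return mmss_to_seconds(value) / 60


def make_game_id(home, away, season, round_num):
    """Build a game_id as {home_3_letters}{away_3_letters}{season}{round_num}."""
    # Assumption: the three letters are simply the first three characters of each name.
    return f"{home[:3]}{away[:3]}{season}{round_num}"
```

For example, `mmss_to_minutes("12:30")` gives `12.5`, and `mmss_to_seconds("28:15")` gives `1695`.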
All output files:

| File | Script | Description |
|---|---|---|
| `nrl_outputs/nrl_{season}_team_gamelogs.json` | `scrape_nrl_seasons.py` | Raw team stats per game, per season |
| `nrl_outputs/nrl_{season}_player_gamelogs.json` | `scrape_nrl_seasons.py` | Raw player stats per game, per season |
| `nrl_outputs/nrl_{season}_round_{N}_team_lists.json` | `get_future_teams.py` | Named squads for an upcoming round |
| `nrl_outputs/nrl_{season}_team_gamelogs.csv` | `convert_jsons_to_csvs.py` | Per-season team stats CSV |
| `nrl_outputs/nrl_{season}_player_gamelogs.csv` | `convert_jsons_to_csvs.py` | Per-season player stats CSV |
| `nrl_outputs/nrl_all_seasons_team_gamelogs.csv` | `convert_jsons_to_csvs.py` | All seasons combined team stats |
| `nrl_outputs/nrl_all_seasons_player_gamelogs.csv` | `convert_jsons_to_csvs.py` | All seasons combined player stats |
| `nrl_outputs/nrl_tryscorers.csv` | `clean_games_new.py` | Cleaned and transformed player data |

Finals rounds are mapped to the following round numbers:
| Round Number | Label |
|---|---|
| 28 | Finals Week 1 |
| 29 | Finals Week 2 |
| 30 | Finals Week 3 |
| 31 | Grand Final |
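
When working with the data, the finals mapping in the table above can be kept as a small lookup. This is only a sketch: the names `FINALS_LABELS` and `round_label` are hypothetical, and labelling regular rounds as `Round N` is an assumption.

```python
# Finals round numbers and their labels, as listed in the table above.
FINALS_LABELS = {
    28: "Finals Week 1",
    29: "Finals Week 2",
    30: "Finals Week 3",
    31: "Grand Final",
}


def round_label(round_num):
    """Return a readable label for a round number."""
    # Assumption: non-finals rounds are simply labelled "Round N".
    return FINALS_LABELS.get(round_num, f"Round {round_num}")
```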