Conversation
- Add full support for national teams across all club endpoints
- Add new /clubs/{club_id}/competitions endpoint to retrieve club competitions
- Add isNationalTeam field to Club Profile response schema
- Make Club Profile fields optional to accommodate national teams
- Enhance Club Players endpoint to handle national team HTML structure
- Update XPath expressions to support both club and national team structures
- Add intelligent detection logic for national teams
- Maintain backward compatibility with existing club endpoints
This update enables the API to work seamlessly with both regular clubs
and national teams, providing a unified interface for all club-related
data retrieval.
|
Warning Rate limit exceeded@eskoubar95 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 18 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (5)
WalkthroughAdds national-team support across schemas, services, and endpoints; introduces club competitions and competition seasons services/endpoints; expands XPath selectors and parsing to handle alternate page structures; updates CI, Dockerfile, docs, settings, and tests; adjusts several schemas to allow optional fields and adds tournament-size configuration. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as "API Endpoint"
participant Service as "Service Layer"
participant Parser as "HTML Parser/XPath"
participant Response
rect rgb(200,220,255)
note over Client,Response: Club Competitions Flow
Client->>API: GET /clubs/{club_id}/competitions?season_id=...
API->>Service: instantiate TransfermarktClubCompetitions(club_id, season_id)
Service->>Service: __post_init__: build URL, fetch page
Service->>Parser: locate RECORD_TABLE via XPath
Parser-->>Service: competition rows/links
Service->>Service: __parse_club_competitions(): extract id,name,url
Service-->>API: dict {id, seasonId, competitions[]}
API-->>Response: serialized ClubCompetitions JSON
Response-->>Client: 200 OK
end
sequenceDiagram
participant Client
participant API as "API Endpoint"
participant Service as "Service Layer"
participant Parser as "HTML Parser/XPath"
participant Response
rect rgb(220,255,220)
note over Client,Response: Competition Seasons Flow
Client->>API: GET /competitions/{competition_id}/seasons
API->>Service: instantiate TransfermarktCompetitionSeasons(competition_id)
Service->>Service: __post_init__: build URL, fetch page
Service->>Parser: extract season options via multiple XPath strategies
Parser-->>Service: season strings/options
Service->>Service: __parse_seasons(): detect type & parse to start/end years
Service-->>API: dict {id, name, seasons[]}
API-->>Response: serialized CompetitionSeasons JSON
Response-->>Client: 200 OK
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Comment |
- Add GET /competitions/{competition_id}/seasons endpoint
- Implement TransfermarktCompetitionSeasons service to scrape season data
- Add CompetitionSeason and CompetitionSeasons Pydantic schemas
- Support both cross-year (e.g., 25/26) and single-year (e.g., 2025) seasons
- Handle historical seasons correctly (e.g., 99/00 -> 1999-2000)
- Extract seasons from competition page dropdown/table structure
- Return season_id, season_name, start_year, and end_year for each season
- Sort seasons by start_year descending (newest first)
Closes #[issue-number]
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (9)
.github/workflows/check.yml (1)
12-13: Consider updating GitHub Actions to the latest stable versions.Actionlint flagged that
actions/checkout@v3andactions/setup-python@v4are outdated. While still functional, updating tov4andv5respectively will ensure compatibility with current GitHub Actions infrastructure and receive bug fixes and security updates.Apply this diff to modernize the actions:
- - uses: actions/checkout@v3 - - uses: actions/setup-python@v4 + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5.gitignore (1)
165-167: Consider grouping with other IDE-specific rules.The OpenMemory/Cursor rules could be grouped with the PyCharm
.idea/section (around line 161) for better organization, since both are IDE-specific ignores.openmemory.md (1)
1-90: Consider adding competition seasons documentation.This project memory guide documents Club Profile and Club Competitions services but is missing documentation for the new Competition Seasons service (
app/services/competitions/seasons.py) mentioned in the PR objectives. Consider adding a section similar to the Club Competitions Service section.app/services/competitions/search.py (1)
41-41: Consider paddingnamefor consistency.The
namelist is not padded like the other fields. If the NAMES XPath returns fewer elements than URLS, the zip will silently truncate results. For consistency with other fields, consider paddingnameas well.app/services/clubs/players.py (1)
115-120: Consider paddingplayers_namesandplayers_positionsfor robustness.The
base_lengthalignment is well-implemented for most fields. However,players_names(line 57) andplayers_positions(line 58, used in the non-national-team path) are not padded. If these XPath queries return fewer elements thanplayers_ids, the zip will silently truncate results.+ # Ensure players_names and players_positions are aligned + if len(players_names) != base_length: + players_names = (players_names + [None] * base_length)[:base_length] + if not is_national_team and len(players_positions) != base_length: + players_positions = (players_positions + [None] * base_length)[:base_length] + players_nationalities = [nationality.xpath(Clubs.Players.NATIONALITIES) for nationality in page_nationalities]app/services/competitions/seasons.py (2)
19-27: Verify URL domain consistency with other services.This service uses
https://www.transfermarkt.com/...whileTransfermarktClubCompetitionsuseshttps://www.transfermarkt.us/.... If the rest of the codebase standardizes on one domain (often.ushere), consider aligning this URL to avoid inconsistent behavior, redirects, or geo‑specific differences.
140-275: Season extraction and response structure look robust overall.The multi‑stage fallback in
__parse_seasons(table cell → dropdown → generic list/text search) plus deduping and filtering logic yields a cleanseasonslist, andget_competition_seasonswires it into the response in a way consistent with other services (id,name,seasons, populated via XPath and helpers).Aside from the year‑normalization details noted elsewhere, this structure and flow look good.
Also applies to: 277-289
app/services/clubs/competitions.py (2)
23-31: Revisit defaultseason_idheuristic for “current season”.
self.season_iddefaults todatetime.now().year. On many competitions Transfermarkt uses the start year of a season (e.g., 2024 for season 2024/25). With this heuristic:
- In Jan–Jun 2025, the “current” active season is 2024/25, but
datetime.now().yearwould give2025, which might not exist yet or might represent 2025/26.If you want “current active season”, consider something like:
- If month >= 7 →
year, elseyear - 1.Or make the default explicit in the API/docs (e.g., “defaults to calendar year, not necessarily current active season”) to avoid surprises.
32-38: Minor style: prefer top‑level imports fordatetimeandHTTPException.Importing
datetimeandHTTPExceptioninside__post_init__is safe but non‑idiomatic for this codebase; moving them to module scope will simplify the function and avoid repeated imports if this method is ever called frequently.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
.github/workflows/check.yml(1 hunks).gitignore(1 hunks)CHANGELOG.md(1 hunks)app/api/endpoints/clubs.py(2 hunks)app/api/endpoints/competitions.py(2 hunks)app/schemas/clubs/__init__.py(1 hunks)app/schemas/clubs/competitions.py(1 hunks)app/schemas/clubs/profile.py(2 hunks)app/schemas/competitions/__init__.py(1 hunks)app/schemas/competitions/seasons.py(1 hunks)app/services/clubs/competitions.py(1 hunks)app/services/clubs/players.py(1 hunks)app/services/clubs/profile.py(1 hunks)app/services/competitions/search.py(2 hunks)app/services/competitions/seasons.py(1 hunks)app/utils/xpath.py(3 hunks)openmemory.md(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
app/schemas/competitions/__init__.py (1)
app/schemas/competitions/seasons.py (1)
CompetitionSeasons(13-16)
app/schemas/clubs/__init__.py (1)
app/schemas/clubs/competitions.py (1)
ClubCompetitions(12-15)
app/services/clubs/players.py (3)
app/utils/xpath.py (6)
Clubs(105-191)Clubs(223-236)Players(1-102)Players(148-184)Past(179-184)Present(172-177)app/services/base.py (1)
get_list_by_xpath(132-150)app/utils/utils.py (1)
safe_regex(65-84)
app/schemas/competitions/seasons.py (1)
app/schemas/base.py (2)
AuditMixin(10-11)TransfermarktBaseModel(14-90)
app/schemas/clubs/competitions.py (1)
app/schemas/base.py (2)
AuditMixin(10-11)TransfermarktBaseModel(14-90)
app/services/clubs/competitions.py (3)
app/services/base.py (2)
TransfermarktBase(16-223)request_url_page(104-117)app/utils/utils.py (1)
extract_from_url(19-46)app/utils/xpath.py (4)
Clubs(105-191)Clubs(223-236)Competitions(186-191)Competitions(194-236)
app/api/endpoints/competitions.py (2)
app/services/competitions/seasons.py (2)
TransfermarktCompetitionSeasons(10-289)get_competition_seasons(277-289)app/schemas/competitions/seasons.py (1)
CompetitionSeasons(13-16)
app/services/competitions/seasons.py (4)
app/services/base.py (4)
TransfermarktBase(16-223)request_url_page(104-117)raise_exception_if_not_found(119-130)get_text_by_xpath(152-206)app/utils/utils.py (1)
trim(49-62)app/utils/xpath.py (5)
Competitions(186-191)Competitions(194-236)Profile(18-57)Profile(106-138)Profile(195-210)app/api/endpoints/competitions.py (1)
get_competition_seasons(28-31)
app/services/competitions/search.py (3)
app/services/base.py (1)
get_list_by_xpath(132-150)app/utils/xpath.py (5)
Competitions(186-191)Competitions(194-236)Search(59-70)Search(140-146)Search(212-221)app/utils/utils.py (1)
extract_from_url(19-46)
🪛 actionlint (1.7.9)
.github/workflows/check.yml
12-12: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
13-13: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 GitHub Actions: Code Check
app/schemas/competitions/seasons.py
[error] 1-17: Ruff reported multiple issues: F401 imported but unused (Optional), W391 extra newline at end of file, and other formatting problems on this file. Fix imports, trailing newline, and spacing.
app/services/competitions/seasons.py
[error] 1-290: Ruff reported linting errors in this file (multiple issues: unused imports F401, trailing/blank line warnings W391/W293, line length E501, and formatting/compression issues). Please run ruff --fix and address all listed items.
app/utils/xpath.py
[error] 199-210: Line length (E501) and related formatting issues detected in xpath definitions. Please shorten long lines or split expressions and re-run linting.
🔇 Additional comments (15)
.github/workflows/check.yml (2)
10-10: Solid updates for runner and Python environment consistency.Upgrading to
ubuntu-latestand explicitly pinning Python 3.9 ensures the workflow runs on stable, current infrastructure and maintains consistent test environments. This aligns well with the new Pydantic schemas and type hints introduced in this PR.Also applies to: 14-15
15-15: Verify that Python 3.9 is the intended minimum version.Python 3.9 reached end-of-life in October 2025. Consider whether the project should target a newer minimum version (e.g., 3.10 or 3.11) for an extended support window and access to newer language features.
Please confirm:
- Is Python 3.9 the intentional minimum version for this project?
- If not, which Python version(s) should be supported?
app/schemas/clubs/__init__.py (1)
1-4: LGTM!The re-export follows the established pattern and correctly exposes
ClubCompetitionsat the package level.app/schemas/clubs/profile.py (2)
11-11: LGTM!Making
national_team_playersoptional correctly handles national teams that don't have this field.
40-46: LGTM!Making
stadium_name,stadium_seats, andcurrent_transfer_recordoptional, and adding theis_national_teamflag appropriately supports national teams while maintaining backward compatibility with regular clubs.app/schemas/clubs/competitions.py (1)
1-15: LGTM!The schema structure is clean, follows existing patterns with
TransfermarktBaseModelandAuditMixin, and appropriately models club competitions data.app/utils/xpath.py (3)
154-164: LGTM!The XPath unions correctly handle both regular club structures and national team structures, enabling broader data extraction coverage.
186-191: LGTM!The new
Clubs.Competitionsclass provides well-structured XPath constants for extracting competition data from the Record table.
224-236: LGTM!The comments clearly explain the purpose of the union XPaths, and the patterns correctly match both regular clubs and national teams.
app/services/clubs/profile.py (1)
73-97: Approve with verification note on edge cases.The national team detection logic is reasonable, combining structural indicators (no legal form/stadium) with competition-based checks. The None-safety for
league_nameat line 86 is correct.Consider whether legitimate clubs with incomplete profile data (missing legal form or stadium) participating in international tournaments might be incorrectly flagged. The dual condition (structural + competition) mitigates this, but verify with edge cases like smaller national associations or youth national teams.
app/api/endpoints/clubs.py (1)
35-39: LGTM!The new endpoint follows the established patterns: consistent naming, optional
season_idparameter, and proper response model configuration matching other endpoints.app/api/endpoints/competitions.py (1)
27-31: LGTM!The new seasons endpoint is well-structured and correctly uses
response_model_exclude_none=Trueto handle optional fields cleanly. The implementation follows the established service pattern.app/schemas/competitions/__init__.py (1)
3-3: LGTM!The re-export follows the established pattern and properly exposes the new
CompetitionSeasonsschema.app/services/clubs/players.py (1)
59-114: Solid national team parsing implementation.The row-by-row extraction with URL matching effectively handles the different HTML structure for national teams. The
seen_urlsset correctly prevents duplicates.One consideration: the detection heuristic at line 61 (
PAGE_INFOSempty + players exist) might trigger for edge cases where a regular club's page has an unusual structure. This is a pragmatic approach, but consider if additional signals could strengthen the detection.app/services/clubs/competitions.py (1)
39-72: Club competitions parsing and response wiring look solid.
- Using
Clubs.Competitions.RECORD_TABLEas a guard ensures you only parse when the expected structure is present.- Extracting
href, derivingcomp_idviaextract_from_url, and building{id, name, url}objects per row is straightforward and resilient (skips rows without a competition link or missing id/name).get_club_competitionsreturns{id, seasonId, competitions}using the sharedresponsedict fromTransfermarktBase, which aligns with existing patterns.No functional issues spotted here.
Also applies to: 74-86
| from typing import Optional | ||
|
|
||
| from app.schemas.base import AuditMixin, TransfermarktBaseModel | ||
|
|
||
|
|
||
| class CompetitionSeason(TransfermarktBaseModel): | ||
| season_id: str | ||
| season_name: str | ||
| start_year: int | ||
| end_year: int | ||
|
|
||
|
|
||
| class CompetitionSeasons(TransfermarktBaseModel, AuditMixin): | ||
| id: str | ||
| name: str | ||
| seasons: list[CompetitionSeason] | ||
|
|
There was a problem hiding this comment.
Fix linting issues flagged by pipeline.
The pipeline reports unused import and trailing newline issues. Apply these fixes:
-from typing import Optional
-
from app.schemas.base import AuditMixin, TransfermarktBaseModel
class CompetitionSeason(TransfermarktBaseModel):
season_id: str
season_name: str
start_year: int
end_year: int
class CompetitionSeasons(TransfermarktBaseModel, AuditMixin):
id: str
name: str
seasons: list[CompetitionSeason]
-📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from typing import Optional | |
| from app.schemas.base import AuditMixin, TransfermarktBaseModel | |
| class CompetitionSeason(TransfermarktBaseModel): | |
| season_id: str | |
| season_name: str | |
| start_year: int | |
| end_year: int | |
| class CompetitionSeasons(TransfermarktBaseModel, AuditMixin): | |
| id: str | |
| name: str | |
| seasons: list[CompetitionSeason] | |
| from app.schemas.base import AuditMixin, TransfermarktBaseModel | |
| class CompetitionSeason(TransfermarktBaseModel): | |
| season_id: str | |
| season_name: str | |
| start_year: int | |
| end_year: int | |
| class CompetitionSeasons(TransfermarktBaseModel, AuditMixin): | |
| id: str | |
| name: str | |
| seasons: list[CompetitionSeason] |
🧰 Tools
🪛 GitHub Actions: Code Check
[error] 1-17: Ruff reported multiple issues: F401 imported but unused (Optional), W391 extra newline at end of file, and other formatting problems on this file. Fix imports, trailing newline, and spacing.
🤖 Prompt for AI Agents
In app/schemas/competitions/seasons.py around lines 1 to 17, remove the unused
import "Optional" from the typing imports and ensure the file ends with a single
trailing newline (no extra blank lines) so the linter stops flagging an unused
import and trailing-newline issues; save and re-run the pipeline to confirm the
fixes.
| def pad_list(lst, length): | ||
| """ | ||
| Pad a list to a specified length with None values. | ||
|
|
||
| Args: | ||
| lst: The list to pad. | ||
| length: The target length for the list. | ||
|
|
||
| Returns: | ||
| The original list if it's already the target length, otherwise a padded list. | ||
| """ | ||
| return lst if len(lst) == length else lst + [None] * (length - len(lst)) |
There was a problem hiding this comment.
Handle lists longer than base_length.
If any extracted list is longer than base_length, the padding logic returns the original list unchanged (since [None] * negative is empty). This would cause a length mismatch in the zip. Consider truncating:
- return lst if len(lst) == length else lst + [None] * (length - len(lst))
+ if len(lst) >= length:
+ return lst[:length]
+ return lst + [None] * (length - len(lst))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def pad_list(lst, length): | |
| """ | |
| Pad a list to a specified length with None values. | |
| Args: | |
| lst: The list to pad. | |
| length: The target length for the list. | |
| Returns: | |
| The original list if it's already the target length, otherwise a padded list. | |
| """ | |
| return lst if len(lst) == length else lst + [None] * (length - len(lst)) | |
| def pad_list(lst, length): | |
| """ | |
| Pad a list to a specified length with None values. | |
| Args: | |
| lst: The list to pad. | |
| length: The target length for the list. | |
| Returns: | |
| The original list if it's already the target length, otherwise a padded list. | |
| """ | |
| if len(lst) >= length: | |
| return lst[:length] | |
| return lst + [None] * (length - len(lst)) |
🤖 Prompt for AI Agents
In app/services/competitions/search.py around lines 54 to 65, pad_list currently
returns the original list when it's longer than the target length which breaks
downstream zipping; change it so that if len(lst) > length it returns
lst[:length], if len(lst) < length it returns lst + [None] * (length -
len(lst)), and if len(lst) == length return lst; ensure you handle non-positive
length defensively (e.g., treat length <= 0 as an empty list) to avoid negative
repeats.
| ## [Unreleased] | ||
|
|
||
| ### Added | ||
| - **National Teams Support**: Full support for national teams across all club endpoints | ||
| - Added `isNationalTeam` boolean field to Club Profile response schema | ||
| - Club Profile endpoint now automatically detects and indicates if a club is a national team | ||
| - Club Players endpoint now supports fetching players from national teams | ||
| - Club Competitions endpoint supports national teams | ||
|
|
||
| - **New Endpoint**: `/clubs/{club_id}/competitions` | ||
| - Retrieves all competitions a club participates in for a given season | ||
| - Supports both regular clubs and national teams | ||
| - Returns competition ID, name, and URL for each competition | ||
| - Defaults to current season if `season_id` is not provided | ||
|
|
||
| ### Changed | ||
| - **Club Profile Schema**: Made several fields optional to accommodate national teams | ||
| - `stadium_name`, `stadium_seats`, `current_transfer_record` are now optional | ||
| - `squad.national_team_players` is now optional | ||
| - `league` fields can now be `None` for national teams | ||
|
|
||
| - **Club Players Endpoint**: Enhanced to handle different HTML structures | ||
| - Automatically detects national teams and uses appropriate parsing logic | ||
| - Handles different table structures between clubs and national teams | ||
| - Ensures all player data lists are properly aligned for correct parsing | ||
|
|
||
| ### Technical Details | ||
| - Updated XPath expressions to support both club and national team HTML structures | ||
| - Added intelligent detection logic for national teams based on HTML structure | ||
| - Improved error handling and data validation for edge cases | ||
| - All changes maintain backward compatibility with existing club endpoints |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Missing documentation for competition seasons endpoint.
The PR objectives describe a new /competitions/{competition_id}/seasons endpoint, but the changelog only documents /clubs/{club_id}/competitions. Consider adding an entry for the competition seasons feature:
- **New Endpoint**: `/clubs/{club_id}/competitions`
- Retrieves all competitions a club participates in for a given season
- Supports both regular clubs and national teams
- Returns competition ID, name, and URL for each competition
- Defaults to current season if `season_id` is not provided
+- **New Endpoint**: `/competitions/{competition_id}/seasons`
+ - Retrieves all available seasons for a competition
+ - Returns season ID, name, start year, and end year
+ - Seasons sorted by start year descending
+ - Handles cross-year (25/26), single-year (2025), and century-crossing (99/00) formats📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## [Unreleased] | |
| ### Added | |
| - **National Teams Support**: Full support for national teams across all club endpoints | |
| - Added `isNationalTeam` boolean field to Club Profile response schema | |
| - Club Profile endpoint now automatically detects and indicates if a club is a national team | |
| - Club Players endpoint now supports fetching players from national teams | |
| - Club Competitions endpoint supports national teams | |
| - **New Endpoint**: `/clubs/{club_id}/competitions` | |
| - Retrieves all competitions a club participates in for a given season | |
| - Supports both regular clubs and national teams | |
| - Returns competition ID, name, and URL for each competition | |
| - Defaults to current season if `season_id` is not provided | |
| ### Changed | |
| - **Club Profile Schema**: Made several fields optional to accommodate national teams | |
| - `stadium_name`, `stadium_seats`, `current_transfer_record` are now optional | |
| - `squad.national_team_players` is now optional | |
| - `league` fields can now be `None` for national teams | |
| - **Club Players Endpoint**: Enhanced to handle different HTML structures | |
| - Automatically detects national teams and uses appropriate parsing logic | |
| - Handles different table structures between clubs and national teams | |
| - Ensures all player data lists are properly aligned for correct parsing | |
| ### Technical Details | |
| - Updated XPath expressions to support both club and national team HTML structures | |
| - Added intelligent detection logic for national teams based on HTML structure | |
| - Improved error handling and data validation for edge cases | |
| - All changes maintain backward compatibility with existing club endpoints | |
| ## [Unreleased] | |
| ### Added | |
| - **National Teams Support**: Full support for national teams across all club endpoints | |
| - Added `isNationalTeam` boolean field to Club Profile response schema | |
| - Club Profile endpoint now automatically detects and indicates if a club is a national team | |
| - Club Players endpoint now supports fetching players from national teams | |
| - Club Competitions endpoint supports national teams | |
| - **New Endpoint**: `/clubs/{club_id}/competitions` | |
| - Retrieves all competitions a club participates in for a given season | |
| - Supports both regular clubs and national teams | |
| - Returns competition ID, name, and URL for each competition | |
| - Defaults to current season if `season_id` is not provided | |
| - **New Endpoint**: `/competitions/{competition_id}/seasons` | |
| - Retrieves all available seasons for a competition | |
| - Returns season ID, name, start year, and end year | |
| - Seasons sorted by start year descending | |
| - Handles cross-year (25/26), single-year (2025), and century-crossing (99/00) formats | |
| ### Changed | |
| - **Club Profile Schema**: Made several fields optional to accommodate national teams | |
| - `stadium_name`, `stadium_seats`, `current_transfer_record` are now optional | |
| - `squad.national_team_players` is now optional | |
| - `league` fields can now be `None` for national teams | |
| - **Club Players Endpoint**: Enhanced to handle different HTML structures | |
| - Automatically detects national teams and uses appropriate parsing logic | |
| - Handles different table structures between clubs and national teams | |
| - Ensures all player data lists are properly aligned for correct parsing | |
| ### Technical Details | |
| - Updated XPath expressions to support both club and national team HTML structures | |
| - Added intelligent detection logic for national teams based on HTML structure | |
| - Improved error handling and data validation for edge cases | |
| - All changes maintain backward compatibility with existing club endpoints |
fbeced8 to
93929b3
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
app/services/competitions/seasons.py (1)
61-70: 2-digit year parsing heuristic may misparse historical seasons.The threshold of 90 means seasons like
"85/86"would be parsed as 2085–2086 instead of 1985–1986. Consider using a dynamic pivot based on current year (e.g., years > current_year % 100 + 10 map to 1900s).
🧹 Nitpick comments (1)
app/services/clubs/players.py (1)
115-168: Padding logic provides robustness, but be aware of silent truncation.The normalization pattern
(list + [default] * base_length)[:base_length]handles short lists well but will silently truncate lists longer thanbase_length. This is likely acceptable as a defensive measure against malformed HTML, but consider logging when truncation occurs for debugging purposes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
app/api/endpoints/competitions.py(2 hunks)app/schemas/competitions/__init__.py(1 hunks)app/schemas/competitions/search.py(1 hunks)app/schemas/competitions/seasons.py(1 hunks)app/services/clubs/players.py(1 hunks)app/services/competitions/seasons.py(1 hunks)app/utils/xpath.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- app/schemas/competitions/init.py
- app/schemas/competitions/seasons.py
🧰 Additional context used
🧬 Code graph analysis (3)
app/api/endpoints/competitions.py (2)
app/services/competitions/seasons.py (1)
get_competition_seasons(282-294)app/schemas/competitions/seasons.py (1)
CompetitionSeasons(11-14)
app/services/clubs/players.py (3)
app/utils/xpath.py (6)
Clubs(105-191)Clubs(227-240)Players(1-102)Players(148-184)Past(179-184)Present(172-177)app/services/base.py (1)
get_list_by_xpath(132-150)app/utils/utils.py (1)
safe_regex(65-84)
app/services/competitions/seasons.py (4)
app/services/base.py (4)
TransfermarktBase(16-223)request_url_page(104-117)raise_exception_if_not_found(119-130)get_text_by_xpath(152-206)app/utils/utils.py (1)
trim(49-62)app/utils/xpath.py (5)
Competitions(186-191)Competitions(194-240)Profile(18-57)Profile(106-138)Profile(195-214)app/api/endpoints/competitions.py (1)
get_competition_seasons(28-31)
🔇 Additional comments (7)
app/schemas/competitions/search.py (1)
9-11: Relaxingcountry/clubs/playersto optional fits partial-data/national-team casesMaking these fields optional with a
Nonedefault is consistent with search results that may not have those counts or a country (e.g., national teams or incomplete data). Just double‑check key consumers ofCompetitionSearchResult(sorting, aggregations, formatting) to ensure they don’t assumeclubs/playersare always non‑null integers orcountryalways a non‑empty string, and handleNoneexplicitly where needed.app/api/endpoints/competitions.py (1)
27-31: LGTM!The new endpoint follows the established pattern used by
get_competition_clubs, with appropriate response model and service integration.app/services/competitions/seasons.py (1)
282-294: LGTM!The public API follows the established pattern used by other Transfermarkt services, with clear documentation and consistent response structure.
app/utils/xpath.py (4)
154-164: LGTM – Union patterns correctly support both club and national team structures.The XPath unions allow graceful fallback between
posrelastructure (regular clubs) and alternative patterns (national teams). The formatting is clean and within line limits.
186-192: LGTM!The new
Clubs.Competitionsnested class provides well-structured XPath constants for competition record extraction.
198-214: Season dropdown selectors look good.The multi-line formatting addresses previous line-length concerns, and the fallback patterns provide robustness for varying HTML structures.
227-240: LGTM – National team support added to club URLs and names.The inline comments clearly document the purpose of each union branch.
|
|
||
| # Detect if this is a national team (no posrela structure) | ||
| is_national_team = len(self.page.xpath(Clubs.Players.PAGE_INFOS)) == 0 and len(players_names) > 0 | ||
|
|
||
| # Handle national teams: extract data from rows directly | ||
| if is_national_team: | ||
| # Get all player rows (may contain duplicates for home/away) | ||
| all_player_rows = self.page.xpath( | ||
| "//div[@id='yw1']//tbody//tr[.//td[@class='hauptlink']//a[contains(@href, '/profil/spieler')]]", | ||
| ) | ||
|
|
||
| # Extract data from rows, matching with player URLs to avoid duplicates | ||
| players_positions = [] | ||
| players_dobs_raw = [] | ||
| seen_urls = set() | ||
|
|
||
| for url in self.get_list_by_xpath(Clubs.Players.URLS): | ||
| # Skip if we've already processed this URL | ||
| if url in seen_urls: | ||
| continue | ||
| seen_urls.add(url) | ||
|
|
||
| # Find first row containing this player URL | ||
| for row in all_player_rows: | ||
| row_urls = row.xpath(".//td[@class='hauptlink']//a[contains(@href, '/profil/spieler')]/@href") | ||
| if url in row_urls: | ||
| tds = row.xpath(".//td") | ||
| # Position is in TD[4] for national teams | ||
| if len(tds) > 4: | ||
| pos_text = "".join(tds[4].xpath(".//text()")).strip() | ||
| players_positions.append(pos_text if pos_text else None) | ||
| else: | ||
| players_positions.append(None) | ||
|
|
||
| # DOB/Age is in TD[5] for national teams | ||
| if len(tds) > 5: | ||
| dob_text = "".join(tds[5].xpath(".//text()")).strip() | ||
| players_dobs_raw.append(dob_text if dob_text else None) | ||
| else: | ||
| players_dobs_raw.append(None) | ||
| break | ||
| else: | ||
| # URL not found in any row | ||
| players_positions.append(None) | ||
| players_dobs_raw.append(None) | ||
|
|
||
| players_dobs = [safe_regex(dob_age, REGEX_DOB, "dob") for dob_age in players_dobs_raw] | ||
| players_ages = [safe_regex(dob_age, REGEX_DOB, "age") for dob_age in players_dobs_raw] |
There was a problem hiding this comment.
Potential data misalignment between players_ids and national team parsed lists.
When is_national_team is true, the loop (lines 75-103) skips duplicate URLs via seen_urls, producing players_positions and players_dobs_raw with only unique entries. However, players_ids (line 56) is built from all URLs before this check, potentially including duplicates.
This causes players_ids to have more elements than players_positions/players_dobs, leading to incorrect player-to-attribute mapping even after padding normalization.
Consider deduplicating players_ids for national teams:
+ # For national teams, deduplicate players_ids to match positions/dobs
+ if is_national_team:
+ unique_ids = []
+ seen_ids = set()
+ for pid in players_ids:
+ if pid not in seen_ids:
+ seen_ids.add(pid)
+ unique_ids.append(pid)
+ players_ids = unique_ids
+
# Ensure all lists have the same length as players_ids
base_length = len(players_ids)Alternatively, rebuild players_ids using the same seen_urls set from the national team parsing loop.
Committable suggestion skipped: line range outside the PR's diff.
- Detect national team competitions (FIWC, EURO, COPA, AFAC, GOCU, AFCN) - Use /teilnehmer/pokalwettbewerb/ URL for national team competitions - Handle season_id correctly (year-1 for national teams in URL) - Add XPath expressions for participants table - Limit participants to expected tournament size to exclude non-qualified teams - Make season_id optional in CompetitionClubs schema - Update Dockerfile PYTHONPATH configuration
27b61c7 to
c3f2410
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
app/services/competitions/seasons.py (1)
108-142: Consider extracting shared year-normalization logic to reduce duplication.The 2-digit year to full year conversion logic (lines 128-135) duplicates the logic in
__parse_season_to_years(lines 62-69). Extracting a helper method would improve maintainability and prevent future divergence.def __normalize_two_digit_year(self, two_digit: int) -> int: """Convert a 2-digit year to a full year using the 90-threshold heuristic.""" if two_digit >= 90: return 1900 + two_digit return 2000 + two_digitThen use it in both methods:
- if start_num >= 90: - return str(1900 + start_num) - else: - return str(2000 + start_num) + return str(self.__normalize_two_digit_year(start_num))app/services/competitions/clubs.py (1)
162-190: Minor: Unreachable fallback code.At lines 176-177, the fallback
if url_season_id:inside theexcept ValueErrorblock is unreachable becauseint(url_season_id)at line 172 would only fail ifurl_season_idis not a valid integer, but line 169 already checkedurl_season_id.isdigit(). The try/except is still reasonable for defensive coding, but the inner fallback condition is redundant.if url_season_id and url_season_id.isdigit(): try: # Convert back to tournament year (add 1) tournament_year = str(int(url_season_id) + 1) self.response["seasonId"] = tournament_year except ValueError: - # Fallback to original season_id if conversion fails - if url_season_id: - self.response["seasonId"] = url_season_id + # This should not happen since isdigit() was checked + pass elif self.season_id:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.gitignore(1 hunks)Dockerfile(1 hunks)app/api/endpoints/competitions.py(2 hunks)app/schemas/competitions/clubs.py(2 hunks)app/services/competitions/clubs.py(4 hunks)app/services/competitions/seasons.py(1 hunks)app/utils/xpath.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .gitignore
- app/api/endpoints/competitions.py
🧰 Additional context used
🧬 Code graph analysis (2)
app/services/competitions/clubs.py (3)
app/services/base.py (3)
TransfermarktBase(16-223)get_list_by_xpath(132-150)get_text_by_xpath(152-206)app/utils/utils.py (2)
extract_from_url(19-46)trim(49-62)app/utils/xpath.py (7)
Competitions(186-191)Competitions(194-265)Clubs(105-191)Clubs(232-265)Profile(18-57)Profile(106-138)Profile(195-219)
app/services/competitions/seasons.py (4)
app/services/base.py (4)
TransfermarktBase(16-223)request_url_page(104-117)raise_exception_if_not_found(119-130)get_text_by_xpath(152-206)app/utils/utils.py (1)
trim(49-62)app/utils/xpath.py (5)
Competitions(186-191)Competitions(194-265)Profile(18-57)Profile(106-138)Profile(195-219)app/api/endpoints/competitions.py (1)
get_competition_seasons(28-31)
🔇 Additional comments (12)
Dockerfile (1)
4-4: Clean PYTHONPATH configuration for containerized environment.The change to set
PYTHONPATH=/appexplicitly is appropriate for a containerized application. This is cleaner than concatenating with$PYTHONPATH, which would be empty in the base Python image anyway. The explicit absolute path ensures consistent module discovery for the new services, schemas, and endpoints added in this PR.Verify that the import paths work correctly by ensuring that app/main.py and all new modules can properly resolve imports like
from app.services.competitions.seasons import ...with this PYTHONPATH setup.app/schemas/competitions/clubs.py (1)
1-15: LGTM!Making
season_idoptional withOptional[str] = Noneproperly accommodates national team competitions where the season identifier may not be available or applicable.app/utils/xpath.py (4)
154-164: LGTM!The union-based XPath expressions appropriately handle multiple HTML structures for player data extraction, providing fallback paths for different page layouts.
186-192: LGTM!The new
Clubs.Competitionsnested class provides well-structured XPath selectors for extracting competition records and links.
197-219: LGTM!The season-related XPath selectors provide comprehensive fallback patterns to handle various HTML structures on competition pages. The multi-line string concatenation is properly formatted.
232-265: LGTM!The updated XPath selectors with union patterns properly handle both regular clubs and national teams. The inline comments explaining the rationale are helpful for maintainability.
Minor consideration: The
PARTICIPANTS_URLSandPARTICIPANTS_NAMESselectors depend on the heading text "Clubs starting into tournament", which could break if Transfermarkt changes the heading. This is acceptable for scraping but worth noting for future maintenance.app/services/competitions/seasons.py (3)
144-223: LGTM with a note on the robust fallback strategy.The multiple XPath fallback patterns provide good resilience against HTML structure variations. The cascading approach (table cell → dropdown → alternative patterns → generic text extraction) is well-organized.
242-283: LGTM!The season filtering logic properly handles deduplication, filters out non-season text entries, and validates season formats. Sorting by
startYeardescending provides a good user experience with the most recent seasons first.
285-297: LGTM!The public API method is clean and returns a well-structured response with competition metadata and the parsed seasons list.
app/services/competitions/clubs.py (3)
8-16: LGTM!The
NATIONAL_TEAM_COMPETITIONSmapping provides a clean way to identify and handle national team competitions. The URL slugs align with Transfermarkt's URL structure.
126-149: LGTM!The name cleaning logic handles various page title formats appropriately. The cascading string replacements and the fallback in the
next()call provide good robustness.
70-85: LGTM!The
__post_init__method cleanly branches between national team and regular league URL construction, maintaining good separation of concerns.
| if is_cross_year: | ||
| # Cross-year format: "25/26" -> start_year: 2025, end_year: 2026 | ||
| # Handle both "25/26" and "2024/25" formats, and old seasons like "99/00" | ||
| parts = season_name.split("/") | ||
| if len(parts) == 2: | ||
| start_part = parts[0].strip() | ||
| end_part = parts[1].strip() | ||
|
|
||
| # Parse start year | ||
| if len(start_part) == 2: | ||
| start_num = int(start_part) | ||
| # If start is >= 90, it's likely 1900s (e.g., "99/00" = 1999/2000) | ||
| # Otherwise assume 2000s (e.g., "25/26" = 2025/2026) | ||
| if start_num >= 90: | ||
| start_year = 1900 + start_num | ||
| else: | ||
| start_year = 2000 + start_num | ||
| else: | ||
| start_year = int(start_part) | ||
|
|
||
| # Parse end year | ||
| if len(end_part) == 2: | ||
| end_num = int(end_part) | ||
| start_num = int(start_part) if len(start_part) == 2 else (start_year % 100) | ||
| # If end is less than start (e.g., "99/00"), it means we crossed century | ||
| # So if start is 1999, end should be 2000 | ||
| if end_num < start_num: | ||
| # Crossed century boundary - end is in next century | ||
| if start_year >= 2000: | ||
| end_year = 2000 + end_num | ||
| else: | ||
| # Start is in 1900s, end is in 2000s | ||
| end_year = 2000 + end_num | ||
| else: | ||
| # Same century as start | ||
| if start_year >= 2000: | ||
| end_year = 2000 + end_num | ||
| else: | ||
| end_year = 1900 + end_num | ||
| else: | ||
| end_year = int(end_part) | ||
|
|
||
| return (start_year, end_year) |
There was a problem hiding this comment.
Missing exception handling for malformed cross-year season strings.
The int() calls at lines 63, 71, 75, 76, and 93 can raise ValueError if the season string contains non-numeric parts (e.g., "25/abc"). While the single-year path (lines 98-103) has a try/except, the cross-year path does not.
Apply this diff to wrap the cross-year parsing in a try-except:
if is_cross_year:
# Cross-year format: "25/26" -> start_year: 2025, end_year: 2026
# Handle both "25/26" and "2024/25" formats, and old seasons like "99/00"
parts = season_name.split("/")
if len(parts) == 2:
- start_part = parts[0].strip()
- end_part = parts[1].strip()
-
- # Parse start year
- if len(start_part) == 2:
- start_num = int(start_part)
+ try:
+ start_part = parts[0].strip()
+ end_part = parts[1].strip()
+
+ # Parse start year
+ if len(start_part) == 2:
+ start_num = int(start_part)
+ # ... rest of parsing logic ...
+ # ... complete cross-year parsing ...
+ return (start_year, end_year)
+ except ValueError:
+ return (None, None)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In app/services/competitions/seasons.py around lines 53 to 95, the cross-year
parsing block performs multiple int() conversions that can raise ValueError for
malformed inputs; wrap the entire cross-year parsing logic in a try/except
ValueError block, catch ValueError, handle it the same way as the single-year
path (e.g., log or swallow the error and return None / appropriate failure
value), and ensure no uncaught ValueError propagates out of this function.
- Add length validation for ids and names before zip() to prevent silent data loss - Raise descriptive ValueError with logging if ids and names mismatch - Simplify seasonId assignment logic for national teams - Remove unnecessary try/except block (isdigit() prevents ValueError) - Clean up unreachable fallback code
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
app/services/competitions/clubs.py (1)
106-119: Hardcoded tournament sizes remain outdated.This issue was flagged in a previous review and remains unaddressed. The values for
FIWC(32) andCOPA(12) don't reflect current/upcoming tournament formats:
- World Cup expands to 48 teams in 2026
- Copa America has been 16 teams in recent editions (2016, 2024)
Consider making this configuration-aware or deriving expected size dynamically, since tournament formats change over time:
expected_sizes = { - "FIWC": 32, # World Cup has 32 participants + "FIWC": 48, # World Cup: 48 teams from 2026 "EURO": 24, # EURO has 24 participants - "COPA": 12, # Copa America typically has 12 participants + "COPA": 16, # Copa America: 16 teams (recent format) "AFAC": 24, # Asian Cup has 24 participants "GOCU": 16, # Gold Cup has 16 participants "AFCN": 24, # Africa Cup has 24 participants }
🧹 Nitpick comments (2)
app/services/competitions/clubs.py (2)
60-66: Redundant try/except block.The
isdigit()check on line 61 already guarantees the string can be parsed as an integer, soint(url_season_id)cannot raiseValueError. The try/except is dead code.url_season_id = self.season_id if url_season_id and url_season_id.isdigit(): - try: - year = int(url_season_id) - url_season_id = str(year - 1) - except ValueError: - pass + url_season_id = str(int(url_season_id) - 1)
148-161: Fragile string manipulation for name cleaning.The sequential string replacements and splits depend on specific text patterns from Transfermarkt pages, which could break if the site changes its formatting. The magic number
3on line 158 is unclear.Consider documenting the expected input formats and using a more explicit approach:
- # Take the part that doesn't contain "Participants" or is the longest - name = next((p for p in parts if "Participants" not in p and len(p) > 3), parts[0]) + # Filter out short fragments and "Participants" text + MIN_NAME_LENGTH = 4 # Exclude fragments like "20", "WC", etc. + name = next((p for p in parts if "Participants" not in p and len(p) >= MIN_NAME_LENGTH), parts[0])Also, the
trim(name)on line 150 is redundant sinceget_text_by_xpathalready trims the result.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/services/competitions/clubs.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/services/competitions/clubs.py (3)
app/services/base.py (3)
TransfermarktBase(16-223)get_list_by_xpath(132-150)get_text_by_xpath(152-206)app/utils/utils.py (2)
extract_from_url(19-46)trim(49-62)app/utils/xpath.py (2)
Competitions(186-191)Competitions(194-265)
🔇 Additional comments (3)
app/services/competitions/clubs.py (3)
11-19: LGTM!The national team competition mapping is clear and well-structured. The URL slugs align with Transfermarkt's URL patterns.
127-136: Good defensive validation.The length mismatch check with detailed error logging is a solid safeguard against silent data corruption from XPath parsing issues.
175-200: Season ID handling is well-structured with appropriate fallbacks.The year conversion logic correctly inverts the URL construction (subtracts 1 for URL, adds 1 for response). The fallback chain provides resilience.
One consideration: line 186-188 uses
url_season_idas-is when it exists but isn't a digit. Verify this is intentional and won't propagate unexpected values.
- Add tournament size configuration to Settings class with environment variable support - Replace hardcoded dict with settings.get_tournament_size() method - Add warning logging when tournament size is not configured (instead of silent truncation) - Proceed without truncation when size is unavailable (no silent data loss) - Add validation for tournament sizes (must be positive integers) - Add comprehensive unit tests for both configured and fallback paths - Update README.md with new environment variables documentation This prevents silent truncation when tournament sizes change (e.g., World Cup expanding to 48) and allows easy configuration via environment variables.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
app/services/competitions/clubs.py (1)
100-127: CRITICAL: Outdated tournament size defaults will truncate valid participant data.The past review comment correctly identified that tournament sizes are outdated. While the PR moved these to configuration (which is good), the default values remain incorrect:
- World Cup (FIWC): Default is 32, but expanding to 48 teams starting in 2026. This will truncate 16 valid participants for 2026+ tournaments.
- Copa America (COPA): Default is 12, but recent editions (2016, 2024) have 16 teams. This will truncate 4 valid participants.
Update the defaults in
app/settings.py(lines 16-18) andREADME.md(lines 63, 65):- TOURNAMENT_SIZE_FIWC: Optional[int] = Field(default=32, description="World Cup expected participants") + TOURNAMENT_SIZE_FIWC: Optional[int] = Field(default=48, description="World Cup expected participants") - TOURNAMENT_SIZE_COPA: Optional[int] = Field(default=12, description="Copa America expected participants") + TOURNAMENT_SIZE_COPA: Optional[int] = Field(default=16, description="Copa America expected participants")Consider making truncation more robust.
The current truncation simply takes the first N items (
urls[:expected_size]), assuming they appear in the correct order. If the HTML structure changes or the XPath captures items differently, this could truncate actual participants while keeping non-participants. Consider:
- Only truncating when a clear "not qualified" marker is present in the HTML
- Deriving the expected size from the page content rather than hardcoding
- Adding validation that the truncated items match expected patterns (e.g., team URLs contain expected path segments)
README.md (1)
63-69: Update default values to match current tournament formats.The documented defaults are outdated and should be updated along with the code changes in
app/settings.py:-| `TOURNAMENT_SIZE_FIWC` | Expected number of participants for World Cup | `32` | +| `TOURNAMENT_SIZE_FIWC` | Expected number of participants for World Cup | `48` | -| `TOURNAMENT_SIZE_COPA` | Expected number of participants for Copa America | `12` | +| `TOURNAMENT_SIZE_COPA` | Expected number of participants for Copa America | `16` |This aligns with the critical issue raised in the
app/services/competitions/clubs.pyreview.tests/competitions/test_competitions_clubs.py (1)
94-104: Update test assertions when fixing tournament size defaults.Lines 98 and 100 assert the current (outdated) default values. When you update the defaults in
app/settings.pyto FIWC=48 and COPA=16, remember to update these test assertions:- assert settings.get_tournament_size("FIWC") == 32 + assert settings.get_tournament_size("FIWC") == 48 - assert settings.get_tournament_size("COPA") == 12 + assert settings.get_tournament_size("COPA") == 16app/settings.py (1)
16-21: CRITICAL: Update default tournament sizes to match current formats.The default values are outdated and will cause data truncation:
- TOURNAMENT_SIZE_FIWC: Optional[int] = Field(default=32, description="World Cup expected participants") + TOURNAMENT_SIZE_FIWC: Optional[int] = Field(default=48, description="World Cup expected participants") - TOURNAMENT_SIZE_COPA: Optional[int] = Field(default=12, description="Copa America expected participants") + TOURNAMENT_SIZE_COPA: Optional[int] = Field(default=16, description="Copa America expected participants")Context:
- World Cup expanded to 48 teams starting in 2026
- Copa America recent editions (2016, 2024) have 16 teams
This is the root cause of the truncation issue flagged across multiple files.
🧹 Nitpick comments (1)
app/services/competitions/clubs.py (1)
178-204: Season ID extraction logic is correct but could be simplified.The logic correctly handles the year conversion for national teams (subtracts 1 for URL, adds 1 back for response). The fallback chain is reasonable.
Consider simplifying the conditional nesting for better readability:
if self.is_national_team: # Extract URL season_id and convert to tournament year (add 1) url_season_id = extract_from_url( self.get_text_by_xpath(Competitions.Profile.URL), "season_id", ) if url_season_id and url_season_id.isdigit(): self.response["seasonId"] = str(int(url_season_id) + 1) elif url_season_id: self.response["seasonId"] = url_season_id elif self.season_id: self.response["seasonId"] = self.season_id else: # For regular leagues, extract season_id normally season_id = extract_from_url( self.get_text_by_xpath(Competitions.Profile.URL), "season_id", ) if season_id: self.response["seasonId"] = season_id
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
README.md(1 hunks)app/services/competitions/clubs.py(4 hunks)app/settings.py(2 hunks)tests/competitions/test_competitions_clubs.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
app/services/competitions/clubs.py (3)
app/utils/utils.py (2)
extract_from_url(19-46)trim(49-62)app/utils/xpath.py (7)
Competitions(186-191)Competitions(194-265)Clubs(105-191)Clubs(232-265)Profile(18-57)Profile(106-138)Profile(195-219)app/settings.py (1)
get_tournament_size(38-56)
tests/competitions/test_competitions_clubs.py (1)
app/settings.py (2)
Settings(7-56)get_tournament_size(38-56)
🔇 Additional comments (6)
app/services/competitions/clubs.py (2)
1-20: LGTM!The imports and national team competition mapping are well-structured and clearly documented.
131-138: Excellent defensive validation!The length mismatch check prevents silent data loss and provides clear diagnostics. This is a great safeguard against XPath parsing issues.
tests/competitions/test_competitions_clubs.py (2)
42-73: Well-structured tests with good coverage.The tests for tournament size configuration, warning logging, and environment variable overrides are comprehensive and properly isolated using mocks.
107-113: Good validation testing.The tests properly verify that invalid tournament sizes (negative and zero) are rejected with appropriate error messages.
app/settings.py (2)
23-36: Excellent validation logic.The validator ensures tournament sizes are positive when provided, with clear error messages. The use of
@field_validatorwith multiple fields is efficient.
38-56: Clean and well-documented accessor method.The
get_tournament_sizemethod provides a clear interface for retrieving tournament sizes with proper type hints and documentation. The mapping approach is straightforward and maintainable.
| def __get_national_team_url(self) -> str: | ||
| """ | ||
| Construct the URL for national team competition participants page. | ||
|
|
||
| Returns: | ||
| str: The URL for the participants page. | ||
| """ | ||
| slug = NATIONAL_TEAM_COMPETITIONS.get(self.competition_id) | ||
| if not slug: | ||
| raise ValueError(f"Unknown national team competition: {self.competition_id}") | ||
|
|
||
| # For national teams, season_id in URL is the year BEFORE the tournament | ||
| # E.g., 2006 World Cup uses saison_id=2005 | ||
| url_season_id = self.season_id | ||
| if url_season_id and url_season_id.isdigit(): | ||
| try: | ||
| year = int(url_season_id) | ||
| url_season_id = str(year - 1) | ||
| except ValueError: | ||
| pass | ||
|
|
||
| return ( | ||
| f"https://www.transfermarkt.com/{slug}/teilnehmer/" | ||
| f"pokalwettbewerb/{self.competition_id}/saison_id/{url_season_id}" | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, locate the file and examine the context around it
find . -type f -name "clubs.py" | head -5Repository: eskoubar95/transfermarkt-api
Length of output: 173
🏁 Script executed:
#!/bin/bash
# Find the NATIONAL_TEAM_COMPETITIONS definition
rg -n "NATIONAL_TEAM_COMPETITIONS" --type py -B 2 -A 10Repository: eskoubar95/transfermarkt-api
Length of output: 2563
🏁 Script executed:
#!/bin/bash
# Check for tests related to this method
fd -e "test*.py" -o -e "*_test.py" | xargs rg -l "__get_national_team_url\|national_team" 2>/dev/null | head -5Repository: eskoubar95/transfermarkt-api
Length of output: 188
🏁 Script executed:
#!/bin/bash
# Examine the full context of the method and surrounding code
rg -n "def __get_national_team_url" --type py -B 5 -A 30Repository: eskoubar95/transfermarkt-api
Length of output: 2781
🏁 Script executed:
#!/bin/bash
# Search for tests related to competitions/clubs
find . -type f -name "*.py" -path "*/test*" | xargs rg -l "clubs\|competition" 2>/dev/null | head -10Repository: eskoubar95/transfermarkt-api
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Search for any test files in the repo
fd -e "test*.py" -o -e "*_test.py"Repository: eskoubar95/transfermarkt-api
Length of output: 188
🏁 Script executed:
#!/bin/bash
# Look for any comments or documentation about URL construction patterns
rg -n "saison_id\|season_id.*URL\|year.*tournament" --type py -B 2 -A 2Repository: eskoubar95/transfermarkt-api
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Check if there are any other methods that construct URLs to understand the pattern
rg -n "def __get.*url\|transfermarkt.com" --type py -B 1 -A 8Repository: eskoubar95/transfermarkt-api
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Try to understand if there's a specific pattern by checking the entire file structure
wc -l app/services/competitions/clubs.py
cat -n app/services/competitions/clubs.py | head -100Repository: eskoubar95/transfermarkt-api
Length of output: 4417
Remove the redundant try/except block.
Lines 63-67 are unreachable. The isdigit() check on line 62 guarantees the string contains only digits, so int() cannot raise ValueError.
url_season_id = self.season_id
if url_season_id and url_season_id.isdigit():
- try:
- year = int(url_season_id)
- url_season_id = str(year - 1)
- except ValueError:
- pass
+ year = int(url_season_id)
+ url_season_id = str(year - 1)Regarding the year-1 URL logic: The pattern is consistently applied to all six national team competitions (FIWC, EURO, COPA, AFAC, GOCU, AFCN) as documented. Verification that this pattern applies correctly across different tournament years requires testing against the actual TransferMarkt site, which cannot be done in this environment.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __get_national_team_url(self) -> str: | |
| """ | |
| Construct the URL for national team competition participants page. | |
| Returns: | |
| str: The URL for the participants page. | |
| """ | |
| slug = NATIONAL_TEAM_COMPETITIONS.get(self.competition_id) | |
| if not slug: | |
| raise ValueError(f"Unknown national team competition: {self.competition_id}") | |
| # For national teams, season_id in URL is the year BEFORE the tournament | |
| # E.g., 2006 World Cup uses saison_id=2005 | |
| url_season_id = self.season_id | |
| if url_season_id and url_season_id.isdigit(): | |
| try: | |
| year = int(url_season_id) | |
| url_season_id = str(year - 1) | |
| except ValueError: | |
| pass | |
| return ( | |
| f"https://www.transfermarkt.com/{slug}/teilnehmer/" | |
| f"pokalwettbewerb/{self.competition_id}/saison_id/{url_season_id}" | |
| ) | |
| def __get_national_team_url(self) -> str: | |
| """ | |
| Construct the URL for national team competition participants page. | |
| Returns: | |
| str: The URL for the participants page. | |
| """ | |
| slug = NATIONAL_TEAM_COMPETITIONS.get(self.competition_id) | |
| if not slug: | |
| raise ValueError(f"Unknown national team competition: {self.competition_id}") | |
| # For national teams, season_id in URL is the year BEFORE the tournament | |
| # E.g., 2006 World Cup uses saison_id=2005 | |
| url_season_id = self.season_id | |
| if url_season_id and url_season_id.isdigit(): | |
| year = int(url_season_id) | |
| url_season_id = str(year - 1) | |
| return ( | |
| f"https://www.transfermarkt.com/{slug}/teilnehmer/" | |
| f"pokalwettbewerb/{self.competition_id}/saison_id/{url_season_id}" | |
| ) |
🤖 Prompt for AI Agents
In app/services/competitions/clubs.py around lines 48 to 72, the try/except that
wraps int(url_season_id) is redundant because url_season_id.isdigit() already
guarantees numeric input; remove the try/except and directly compute the
previous year by converting to int and subtracting one (e.g., url_season_id =
str(int(url_season_id) - 1)) while preserving the existing isdigit() guard and
comments.
- Remove extra HTTP request to fetch club profile just to read isNationalTeam - Set is_national_team=None to let TransfermarktClubPlayers use DOM heuristics - Remove broad except Exception that silently swallowed all errors - Improve performance by eliminating redundant network call - Players class already has robust DOM-based detection for national teams
- Move datetime and HTTPException imports from method level to module level - Improves code readability and marginally improves performance - Follows Python best practices for import organization
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/api/endpoints/clubs.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/api/endpoints/clubs.py (2)
app/services/clubs/competitions.py (2)
TransfermarktClubCompetitions(9-86)get_club_competitions(74-86)app/schemas/clubs/competitions.py (1)
ClubCompetitions(12-15)
🔇 Additional comments (2)
app/api/endpoints/clubs.py (2)
6-6: LGTM!The import follows the established pattern and is necessary for the new endpoint.
37-41: LGTM!The new endpoint follows the established pattern in the file perfectly. The implementation is clean, type hints are correct, and the service integration matches the existing endpoints.
| # Let TransfermarktClubPlayers use its DOM heuristics to detect national teams | ||
| # This avoids an extra HTTP request and relies on the players page structure | ||
| tfmkt = TransfermarktClubPlayers(club_id=club_id, season_id=season_id, is_national_team=None) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find the TransfermarktClubPlayers class definition
find . -type f -name "*.py" | xargs grep -l "class TransfermarktClubPlayers"Repository: eskoubar95/transfermarkt-api
Length of output: 103
🏁 Script executed:
# Search for the class and its initialization logic
rg -n "class TransfermarktClubPlayers" -A 30Repository: eskoubar95/transfermarkt-api
Length of output: 2494
🏁 Script executed:
# Look for is_national_team parameter handling
rg -n "is_national_team" -B 3 -A 3Repository: eskoubar95/transfermarkt-api
Length of output: 7065
🏁 Script executed:
# Check if TransfermarktClubPlayers is a dataclass and see full class definition
head -50 app/services/clubs/players.pyRepository: eskoubar95/transfermarkt-api
Length of output: 2170
🏁 Script executed:
# Check the parent class TransfermarktBase
rg -n "class TransfermarktBase" -A 20Repository: eskoubar95/transfermarkt-api
Length of output: 2338
🏁 Script executed:
# Check if is_national_team is defined anywhere in TransfermarktClubPlayers
rg -n "is_national_team" app/services/clubs/players.pyRepository: eskoubar95/transfermarkt-api
Length of output: 212
🏁 Script executed:
# Search for tests that call get_club_players or TransfermarktClubPlayers
rg -n "get_club_players\|TransfermarktClubPlayers" tests/ --type pyRepository: eskoubar95/transfermarkt-api
Length of output: 54
🏁 Script executed:
# Check git log to see recent changes to clubs/players.py
git log --oneline -20 app/services/clubs/players.pyRepository: eskoubar95/transfermarkt-api
Length of output: 148
🏁 Script executed:
# Check if there are any **kwargs handling in the dataclass or parent
git show HEAD:app/services/clubs/players.py | head -80Repository: eskoubar95/transfermarkt-api
Length of output: 3621
Remove the invalid is_national_team=None parameter.
The TransfermarktClubPlayers class does not accept an is_national_team parameter. Passing is_national_team=None will cause a TypeError at runtime. The DOM heuristics for detecting national teams work independently within the __parse_club_players() method (line 61 checks len(self.page.xpath(Clubs.Players.PAGE_INFOS)) == 0), so the parameter is unnecessary. Change line 32 to:
tfmkt = TransfermarktClubPlayers(club_id=club_id, season_id=season_id)🤖 Prompt for AI Agents
In app/api/endpoints/clubs.py around lines 30 to 32, the call to
TransfermarktClubPlayers incorrectly passes an unsupported parameter
is_national_team=None which will raise a TypeError at runtime; remove that
argument so the constructor is invoked as
TransfermarktClubPlayers(club_id=club_id, season_id=season_id) and rely on the
class's internal DOM heuristics for national team detection.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
app/services/clubs/competitions.py (3)
26-31: Calendar-year default may not match “current season” semantics
self.season_id = str(datetime.now().year)treats the current season as the current calendar year, which can diverge from competition seasons that span years (e.g., 24/25 vs 2025). If API consumers expect “current season” in league terms rather than pure year, consider delegating to a shared “current season” resolver (e.g., from the seasons service or config) instead ofdatetime.now().year.
38-71: Parsing logic is solid; consider refining type hints and XPath reuseThe overall extraction from
RECORD_TABLEviaCOMPETITION_LINKSis straightforward and defensively coded (skips rows without links, requires both id and name before appending).A couple of small polish points:
- The return annotation can be made more precise, e.g.
list[dict[str, str]], which helps static checking and downstream consumers.- Since
app.utils.xpath.Clubs.Competitionsalready exposesCOMPETITION_NAME/COMPETITION_URL, consider using those instead of inlining".//text()"and"href"here for better centralization if the DOM ever changes.- If you ever observe duplicate competitions in this table, you might want to de‑duplicate by
comp_idbefore returning, but that can be deferred until there’s a concrete need.
11-36: Tighten dataclass typing to useOptionalfor nullable fieldsThe type hints
club_id: str = Noneandseason_id: str = Noneconflict with the declared types and will trip static analyzers. UseOptionalinstead:club_id: Optional[str] = Noneandseason_id: Optional[str] = None, or makeclub_idrequired without a default since it's essential to form the URL. This aligns with the existing pattern used elsewhere in the codebase (e.g.,competitions/seasons.pyusescompetition_id: Optional[str] = None).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/services/clubs/competitions.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
app/services/clubs/competitions.py (4)
app/services/base.py (2)
TransfermarktBase(16-223)request_url_page(104-117)app/utils/utils.py (1)
extract_from_url(19-46)app/utils/xpath.py (2)
Competitions(186-191)Competitions(194-265)app/api/endpoints/clubs.py (1)
get_club_competitions(38-41)
🔇 Additional comments (1)
app/services/clubs/competitions.py (1)
73-85: Public getter wiring looks clean and consistentPopulating
self.responsewithid,seasonId, andcompetitionsand returning it directly is clear and matches the intended schema; no issues from a correctness perspective as long asTransfermarktBaseinitializesresponseas a dict.
- Move datetime and HTTPException imports from method level to module level - Improves code readability and marginally improves performance - Follows Python best practices for import organization
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
app/services/clubs/players.py (3)
86-113: Consider pre-indexing rows by URL for O(n) lookup.The current nested loop has O(n×m) complexity. For pages with many players, consider building a URL-to-row map first:
- # Iterate over the same URL list used to create players_ids to maintain order and count - for url in players_urls: - # Find first row containing this player URL - found = False - for row in all_player_rows: - row_urls = row.xpath(".//td[@class='hauptlink']//a[contains(@href, '/profil/spieler')]/@href") - if url in row_urls: + # Build URL-to-row map for O(1) lookups + url_to_row = {} + for row in all_player_rows: + row_urls = row.xpath(".//td[@class='hauptlink']//a[contains(@href, '/profil/spieler')]/@href") + for row_url in row_urls: + if row_url not in url_to_row: + url_to_row[row_url] = row + + # Iterate over the same URL list used to create players_ids to maintain order and count + for url in players_urls: + row = url_to_row.get(url) + if row is not None: tds = row.xpath(".//td") # ... rest of extraction logic
94-106: Consider extracting TD indices as named constants.The hardcoded indices (4, 5) for position and DOB columns are fragile. For consistency with other XPath selectors in
app/utils/xpath.py, consider defining these as constants:# In Clubs.Players or similar NATIONAL_TEAM_POSITION_TD_INDEX = 4 NATIONAL_TEAM_DOB_TD_INDEX = 5
125-178: Silent truncation may cause data loss.The padding pattern
(list + [default] * base_length)[:base_length]silently truncates lists longer thanbase_length. If a list has more elements due to parsing quirks, those extra entries are discarded without warning.Consider logging a warning when truncation occurs to aid debugging:
+import logging + +logger = logging.getLogger(__name__) + +def _normalize_list(data: list, base_length: int, default, name: str) -> list: + if len(data) > base_length: + logger.warning(f"{name} has {len(data)} elements, truncating to {base_length}") + return (data + [default] * base_length)[:base_length]This also reduces the repetitive normalization code throughout the method.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
app/services/clubs/players.py(4 hunks)app/services/competitions/search.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/services/competitions/search.py
🧰 Additional context used
🧬 Code graph analysis (1)
app/services/clubs/players.py (2)
app/services/base.py (1)
get_list_by_xpath(132-150)app/utils/utils.py (2)
extract_from_url(19-46)safe_regex(65-84)
🔇 Additional comments (4)
app/services/clubs/players.py (4)
18-25: LGTM!The
is_national_teamoptional flag is well-documented with clear behavior for when it'sNone(falling back to DOM-based detection).
62-64: Good fix for data alignment.Storing
players_urlsas the canonical source and derivingplayers_idsfrom it ensures consistent ordering across all player attribute lists. This addresses the previously flagged alignment issue.
68-74: LGTM!The detection logic is clean: uses explicit flag when provided, otherwise falls back to a sensible DOM heuristic. The local variable shadowing of
self.is_national_teamis intentional and improves readability within the method.
180-215: LGTM!The zip-based return structure is clean. With proper normalization above, all lists will have matching lengths, ensuring correct player-to-attribute mapping.
- Keep imports at module level in clubs/competitions.py (from CodeRabbit review) - Preserve is_national_team flag logic in clubs/players.py - Keep name padding in competitions/search.py - Add .DS_Store to .gitignore
be652e8 to
1383dc9
Compare
- Remove whitespace from blank lines (W293) - Add missing trailing commas (COM812) - Split long XPath lines to comply with E501 line length limit
- Format XPath strings to comply with line length - Format list comprehensions - Format is_season condition
📋 Summary
This PR adds a new endpoint to retrieve all available seasons for a specific competition, and enhances the existing clubs endpoint with full support for national team competitions.
🎯 Changes
New Endpoint
GET /competitions/{competition_id}/seasons- Returns all available seasons for a competitionEnhanced Endpoint
GET /competitions/{competition_id}/clubs- Now fully supports national team competitions/teilnehmer/) for national teams instead of record winnersNew Files
app/schemas/competitions/seasons.py- Pydantic schemas for season dataCompetitionSeason- Individual season schema withseason_id,season_name,start_year,end_yearCompetitionSeasons- Response schema with competition info and list of seasonsapp/services/competitions/seasons.py- Service implementationTransfermarktCompetitionSeasons- Scrapes season data from competition pagesModified Files
app/api/endpoints/competitions.py- Added new seasons endpoint, updated clubs endpointapp/schemas/competitions/clubs.py- Madeseason_idoptionalapp/schemas/competitions/__init__.py- Exported new schemasapp/services/competitions/clubs.py- Added national team competition supportapp/utils/xpath.py- Added XPath expressions for season dropdown and participants tableDockerfile- Fixed PYTHONPATH configuration🔍 Features
Competition Seasons Endpoint
season_id,season_name,start_year, andend_yearstart_yeardescending (newest first)National Team Competitions Support
/teilnehmer/pokalwettbewerb/) for accurate participant lists📝 Example Responses
Seasons Endpoint
{ "id": "DK1", "name": "Superliga", "seasons": [ { "seasonId": "2025", "seasonName": "25/26", "startYear": 2025, "endYear": 2026 } ], "updatedAt": "2025-12-01T17:34:55.269244" }Clubs Endpoint (National Team)
{ "id": "FIWC", "name": "World Cup", "seasonId": "2006", "clubs": [ { "id": "3439", "name": "Brazil" }, { "id": "3262", "name": "Germany" } ], "updatedAt": "2025-12-01T18:37:02.313924" }✅ Testing
Seasons Endpoint
National Team Competitions
🔗 Related
This endpoint complements the existing
/competitions/{competition_id}/clubsendpoint by providing season information that can be used to query clubs for specific seasons. The enhanced clubs endpoint now provides accurate participant lists for national team competitions, not just record winners.Summary by CodeRabbit
New Features
Improvements
Documentation
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.