-
Couldn't load subscription status.
- Fork 3
refactor: Move Poetry configuration to project root for cleaner monorepo structure #501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…epo structure ## Summary Move pyproject.toml and poetry.lock from backend/ to project root to centralize Python dependency management and fix virtual environment accessibility issues. ## Changes ### Poetry Configuration (Moved) - backend/pyproject.toml → pyproject.toml - backend/poetry.lock → poetry.lock ### Makefile (100+ lines across 20+ targets) - Changed VENV_DIR from backend/.venv to .venv - Updated all Poetry commands to run from project root with PYTHONPATH=backend - Added venv dependency to local-dev-backend and local-dev-all targets - Updated build targets to use project root as Docker build context - Updated all test targets (atomic, unit, integration, e2e) - Updated code quality targets (lint, format, security-check, coverage) - Fixed clean target to reference root-level paths ### CI/CD Workflows (5 files) - poetry-lock-check.yml: Updated paths and removed cd backend commands - 01-lint.yml: Removed working-directory, updated all tool paths - 04-pytest.yml: Updated cache keys and test commands - 05-ci.yml: Updated dependency installation commands - makefile-testing.yml: Updated test execution paths ### Docker Configuration - backend/Dockerfile.backend: Updated COPY commands for new build context - docker-compose.dev.yml: Changed context from ./backend to . + fixed indentation ## Benefits - Single source of truth for Python dependencies at project root - Simplified virtual environment management (.venv/ at root) - Consistent build context across all tools (Makefile, docker-compose, CI/CD) - Better monorepo structure for future frontend/backend separation - Fixes dependency accessibility issues (e.g., docling import errors) ## Breaking Changes Developers need to: 1. Remove old venv: rm -rf backend/.venv 2. Create new venv: make venv 3. Update IDE Python interpreter from backend/.venv to .venv 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🚀 Development Environment OptionsThis repository supports Dev Containers for a consistent development environment. Option 1: GitHub Codespaces (Recommended)Create a cloud-based development environment:
Option 2: VS Code Dev Containers (Local)Use Dev Containers on your local machine:
Option 3: Traditional Local SetupSet up the development environment manually: # Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout refactor/poetry-to-root-clean
# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validateAvailable CommandsOnce in your development environment: make help # Show all available commands
make dev-validate # Validate environment setup
make test-atomic # Run atomic tests
make test-unit # Run unit tests
make lint # Run lintingServices AvailableWhen running
This automated message helps reviewers quickly set up the development environment. |
Pull Request Review: Poetry Configuration Move to Project RootSummaryThis PR successfully restructures the project to centralize Python dependency management by moving Poetry configuration from ✅ Strengths1. Comprehensive Implementation
2. Strong Documentation
3. Consistency Improvements
|
| Category | Rating | Notes |
|---|---|---|
| Code Quality | 🟡 7/10 | Inconsistent patterns in test targets |
| Documentation | ✅ 9/10 | Excellent PR description and migration guide |
| Test Coverage | 🟡 7/10 | Some test targets not updated |
| Security | ✅ 10/10 | No concerns |
| Performance | ✅ 9/10 | Positive improvements |
✅ Verdict
REQUEST CHANGES - Fix critical Makefile inconsistencies before merge.
This is a well-thought-out refactoring with excellent documentation. The core concept is sound and implementation is 90% complete. However, the inconsistent test target patterns will cause CI failures and developer confusion. Once the 6 problematic Makefile targets are updated to use the new pattern consistently, this PR will be ready to merge.
Great work overall! 🎉 Just needs those final touches to ensure consistency across all test execution paths.
Review conducted with reference to project's CLAUDE.md guidelines and testing best practices.
When poetry files were moved from backend/ to project root, Docker cached layers still referenced the old file structure. Adding an ARG before the COPY command forces Docker to invalidate the cache at this layer. Fixes CI build failure in PR #501.
Code Review - PR #501: Poetry Configuration MigrationThis PR moves Poetry configuration from backend/ to project root. Overall strong refactoring with clear benefits, but has critical issues that must be fixed before merge. Critical Issues1. Test Files Not UpdatedTwo test files still reference old paths and WILL FAIL in CI:
These use backend_dir / poetry.lock but files are now at project root. Fix: Change to project_root = Path(file).parent.parent.parent 2. .gitignore May Need UpdateVerify .gitignore includes .venv/ at root (not just backend/.venv) Strengths
Code Quality
RecommendationsMust fix before merge:
Should verify: VerdictRating: 7/10 (would be 9/10 after fixes) DO NOT MERGE YET - Fix test paths first. Architecture change is sound and implementation is comprehensive. Great work on complex refactoring! |
Fixes #502 - Update all Docker and CI/CD references after moving Poetry config from backend/ to project root (Issue #501). Changes: 1. **Dockerfiles** (backend/Dockerfile.backend, Dockerfile.codeengine): - Add POETRY_ROOT_MIGRATION cache-bust ARG to both stages - Update COPY commands to reference pyproject.toml and poetry.lock from project root - Move poetry.lock copy alongside pyproject.toml for consistency - Add explanatory comments about Issue #501 migration 2. **GitHub Actions Workflows**: - Update 05-ci.yml: Fix poetry cache key to use 'poetry.lock' instead of 'backend/poetry.lock' - Update 03-build-secure.yml: Change backend context from 'backend' to '.' for correct file resolution 3. **PyTorch Version Update**: - Upgrade torch from 2.5.0+cpu to 2.6.0+cpu - Upgrade torchvision from 0.20.0+cpu to 0.21.0+cpu - Reason: 2.5.0+cpu not available for ARM64 architecture - New versions are compatible with both ARM64 and x86_64 4. **Secret Management**: - Add pragma: allowlist secret comments to test secrets in 05-ci.yml - Prevents false positives in detect-secrets pre-commit hook Impact: - Fixes failing CI/CD test: TestMakefileTargetsDirect.test_make_build_backend_minimal - Docker builds now correctly find pyproject.toml and poetry.lock at project root - Maintains compatibility with both local development (ARM64) and CI (x86_64) - GitHub Actions workflows correctly cache Poetry dependencies Testing: - Docker build context validated - All references to backend/pyproject.toml and backend/poetry.lock removed - Cache keys updated to match new file locations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Poetry Configuration Migration to Project RootOverviewThis PR successfully migrates Poetry configuration from backend/ to the project root, creating a cleaner monorepo structure. The refactoring is comprehensive and well-executed across 12 files. ✅ Strengths1. Excellent Architecture & Rationale
2. Thorough CI/CD Updates
3. Smart Docker Layer Invalidation
4. Consistent Makefile Refactoring
5. PyTorch Version Upgrade
|
Updated test to copy pyproject.toml and poetry.lock from project root instead of backend/ directory. Changes: - Added pyproject.toml and poetry.lock to root files_to_copy list - Removed these files from backend_files list - Added comment explaining Poetry root migration (Issue #501) This fixes the Docker build failure: ERROR: failed to compute cache key: "/pyproject.toml": not found Root cause: Makefile test was copying Poetry files from backend/ but they've been moved to project root in the Poetry migration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…506) * refactor: Move Poetry configuration to project root for cleaner monorepo structure ## Summary Move pyproject.toml and poetry.lock from backend/ to project root to centralize Python dependency management and fix virtual environment accessibility issues. ## Changes ### Poetry Configuration (Moved) - backend/pyproject.toml → pyproject.toml - backend/poetry.lock → poetry.lock ### Makefile (100+ lines across 20+ targets) - Changed VENV_DIR from backend/.venv to .venv - Updated all Poetry commands to run from project root with PYTHONPATH=backend - Added venv dependency to local-dev-backend and local-dev-all targets - Updated build targets to use project root as Docker build context - Updated all test targets (atomic, unit, integration, e2e) - Updated code quality targets (lint, format, security-check, coverage) - Fixed clean target to reference root-level paths ### CI/CD Workflows (5 files) - poetry-lock-check.yml: Updated paths and removed cd backend commands - 01-lint.yml: Removed working-directory, updated all tool paths - 04-pytest.yml: Updated cache keys and test commands - 05-ci.yml: Updated dependency installation commands - makefile-testing.yml: Updated test execution paths ### Docker Configuration - backend/Dockerfile.backend: Updated COPY commands for new build context - docker-compose.dev.yml: Changed context from ./backend to . + fixed indentation ## Benefits - Single source of truth for Python dependencies at project root - Simplified virtual environment management (.venv/ at root) - Consistent build context across all tools (Makefile, docker-compose, CI/CD) - Better monorepo structure for future frontend/backend separation - Fixes dependency accessibility issues (e.g., docling import errors) ## Breaking Changes Developers need to: 1. Remove old venv: rm -rf backend/.venv 2. Create new venv: make venv 3. Update IDE Python interpreter from backend/.venv to .venv 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): add cache-bust ARG to invalidate stale Docker layers When poetry files were moved from backend/ to project root, Docker cached layers still referenced the old file structure. Adding an ARG before the COPY command forces Docker to invalidate the cache at this layer. Fixes CI build failure in PR #501. * fix(docker): Update Dockerfiles and workflows for Poetry root migration Fixes #502 - Update all Docker and CI/CD references after moving Poetry config from backend/ to project root (Issue #501). Changes: 1. **Dockerfiles** (backend/Dockerfile.backend, Dockerfile.codeengine): - Add POETRY_ROOT_MIGRATION cache-bust ARG to both stages - Update COPY commands to reference pyproject.toml and poetry.lock from project root - Move poetry.lock copy alongside pyproject.toml for consistency - Add explanatory comments about Issue #501 migration 2. **GitHub Actions Workflows**: - Update 05-ci.yml: Fix poetry cache key to use 'poetry.lock' instead of 'backend/poetry.lock' - Update 03-build-secure.yml: Change backend context from 'backend' to '.' for correct file resolution 3. **PyTorch Version Update**: - Upgrade torch from 2.5.0+cpu to 2.6.0+cpu - Upgrade torchvision from 0.20.0+cpu to 0.21.0+cpu - Reason: 2.5.0+cpu not available for ARM64 architecture - New versions are compatible with both ARM64 and x86_64 4. **Secret Management**: - Add pragma: allowlist secret comments to test secrets in 05-ci.yml - Prevents false positives in detect-secrets pre-commit hook Impact: - Fixes failing CI/CD test: TestMakefileTargetsDirect.test_make_build_backend_minimal - Docker builds now correctly find pyproject.toml and poetry.lock at project root - Maintains compatibility with both local development (ARM64) and CI (x86_64) - GitHub Actions workflows correctly cache Poetry dependencies Testing: - Docker build context validated - All references to backend/pyproject.toml and backend/poetry.lock removed - Cache keys updated to match new file locations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Comprehensive test fixes after PYTHONPATH removal from Makefile This commit addresses all test failures that occurred after removing PYTHONPATH from the Makefile, ensuring clean separation between backend and root directories. Test Results: 1,508 unit + 177 atomic + 126 integration = 1,811 passing tests Changes: - Import path fixes (relative imports after PYTHONPATH removal) - Test logic fixes (3 files updated to match service behavior) - Pydantic V2 migration (removed deprecated json_encoders) - Atomic test configuration (pytest-atomic.ini moved to root) - Integration test fixes (removed backend. prefix from patches) - Configuration updates (pre-commit, markdownlint, secrets baseline) Related: Poetry root migration (branch: refactor/poetry-to-root-clean) * fix(ci): Exclude playwright tests from pytest collection Playwright tests require the playwright package which is not in project dependencies. Added norecursedirs to exclude tests/playwright from test collection. This fixes CI failure: ModuleNotFoundError: No module named 'playwright' * fix(tests): Fix 5 failing unit tests in CI Fixed test issues after PYTHONPATH removal: 1. test_audio_storage.py - Removed Path mocking, test actual behavior - Test now verifies default path creation instead of mock interaction 2-5. test_conversation_message_repository.py - Fixed schema mocking - Patched from_db_message at schema module level, not repository - Added proper refresh mock to set required fields (id, created_at) - All 4 failing tests now pass Tests passing locally: - test_initialization_with_default_path - test_create_message_success - test_create_message_integrity_error - test_get_by_id_success - test_get_by_id_not_found 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(tests): Fix Docker build test after Poetry root migration Updated test to copy pyproject.toml and poetry.lock from project root instead of backend/ directory. Changes: - Added pyproject.toml and poetry.lock to root files_to_copy list - Removed these files from backend_files list - Added comment explaining Poetry root migration (Issue #501) This fixes the Docker build failure: ERROR: failed to compute cache key: "/pyproject.toml": not found Root cause: Makefile test was copying Poetry files from backend/ but they've been moved to project root in the Poetry migration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Comprehensive CI/CD fixes for PR #506 - Remove PYTHONPATH from GitHub Actions workflows (pyproject.toml handles it) - Fix pytest collection to use tests/unit/ instead of marker filtering (1508 tests) - Fix Dockerfile torchvision installation (use PyPI version, not +cpu) - Install PyTorch CPU wheels BEFORE Poetry to prevent CUDA builds (saves 6GB disk space) - Normalize import paths in vectordb stores and service layer - Remove obsolete test_docling_processor.py (644 lines deleted) - Update tests to use correct package paths Fixes: - GitHub Actions pytest workflow now runs all 1508 unit tests - Docker build no longer runs out of disk space - Makefile direct tests pass - All local tests pass (verified with poetry run pytest) * fix(docker): Update Dockerfile comments for clarity Updated comments in Dockerfile to better explain the PyTorch CPU-only installation process. No functional changes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): Use Docling's approach for CPU-only PyTorch installation Simplified PyTorch CPU-only installation by using PIP_EXTRA_INDEX_URL environment variable, matching Docling's official Docker approach. Changes: - Removed complex multi-step PyTorch installation - Use PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu - Single Poetry install command installs all deps with CPU-only PyTorch - Saves ~6GB vs CUDA version This approach is officially recommended by Docling project: https://github.com/docling-project/docling/blob/main/Dockerfile Root cause: poetry.lock has PyTorch 2.8.0 (CUDA). Setting extra-index-url during poetry install ensures CPU-only wheels are used instead. Fixes: https://github.com/manavgup/rag_modulo/actions/runs/18861121839 Issue: #506 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): Bust Docker cache to force CPU-only PyTorch rebuild Updated CACHE_BUST ARG from 20251027 to 20251028 to invalidate Docker layer cache and force rebuild with PIP_EXTRA_INDEX_URL for CPU-only PyTorch. Issue: Even though Dockerfile was fixed to use CPU-only PyTorch, Docker was using cached layers from previous builds that had CUDA PyTorch. Solution: Change CACHE_BUST ARG value to force all layers after it to rebuild, ensuring the poetry install step uses the CPU-only index. Related: #506 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): Actually use CACHE_BUST ARG to invalidate cache Added RUN command that references CACHE_BUST ARG to force Docker to invalidate cache and rebuild subsequent layers. Issue: ARG was declared but never used, so Docker continued using cached layers with CUDA PyTorch. Fix: Added 'RUN echo "Cache bust: $CACHE_BUST"' which forces Docker to execute this layer whenever CACHE_BUST value changes, invalidating all subsequent cached layers including poetry install. Related: #506 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): Use pip with CPU-only PyTorch index to bypass poetry.lock **Problem**: PR #506 CI failing with "no space left on device" due to NVIDIA CUDA libraries (~6-8GB) being installed from poetry.lock. **Root Cause**: poetry.lock has torch==2.8.0 (CUDA version) with NVIDIA dependencies as transitive deps for Linux systems. Even with PIP_EXTRA_INDEX_URL set, `poetry install` installs exactly what's in poetry.lock, ignoring the extra index. **Solution**: Use pip install directly to bypass poetry.lock and install dependencies from pyproject.toml with CPU-only PyTorch index. This matches Docling's official Docker approach. **Changes**: - Copy backend/ directory before pip install (needed for -e .) - Use pip install -e . with --extra-index-url for CPU-only PyTorch - Bypasses poetry.lock entirely, resolving deps from pyproject.toml - Reduces image size by ~6-8GB (NVIDIA libs not installed) **Testing**: Will validate in CI that NVIDIA libraries are not installed. Related: #506, #507 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docker): Install dependencies from pyproject.toml bypassing poetry.lock **Problem**: Previous fix failed because pyproject.toml has package-mode=false, which prevents editable install with `pip install -e .`. **Solution**: Extract dependencies from pyproject.toml and install them directly with pip using CPU-only PyTorch index. This approach: - Bypasses poetry.lock completely - Installs CPU-only PyTorch from https://download.pytorch.org/whl/cpu - Works with package-mode=false - Matches Docling's Docker approach **Changes**: - Extract dependencies using tomllib from pyproject.toml - Install each dependency with pip --extra-index-url - Removed editable install (-e .) which doesn't work with package-mode=false Related: #506 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: manavgup <manavg@gmail.com> * fix(docker): Normalize dependency strings to handle spaces and parentheses **Problem**: Dependencies like "psutil (>=7.0.0,<8.0.0)" were being split by xargs into separate arguments, causing pip to fail with "Invalid requirement". **Root Cause**: pyproject.toml uses format with spaces before parentheses (e.g., "psutil (>=7.0.0,<8.0.0)"). When piped through xargs, the space causes splitting into "psutil" and "(>=7.0.0,<8.0.0)", which pip treats as invalid. **Solution**: Normalize dependency strings by removing spaces and parentheses: - "psutil (>=7.0.0,<8.0.0)" -> "psutil>=7.0.0,<8.0.0" - "docling (>=2.0.0)" -> "docling>=2.0.0" **Benefits**: - Maintains all version constraints correctly - Works with xargs without quoting issues - Still bypasses poetry.lock for CPU-only PyTorch Related: #506 Signed-off-by: Manav Gupta <manavgup@ca.ibm.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Signed-off-by: manavgup <manavg@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
Summary
Move
pyproject.tomlandpoetry.lockfrombackend/to project root to centralize Python dependency management and fix virtual environment accessibility issues.Motivation
Original Issue: The backend process couldn't access the
doclingpackage despite it being installed in Poetry's virtual environment, causing "Docling not installed" errors during PDF document processing.Root Cause: The backend was running with system Python instead of through Poetry's virtual environment, making dependencies inaccessible.
Solution: Restructure the project to use Poetry from the project root with a centralized virtual environment at
.venv/, ensuring all processes use the correct Python environment.Changes Made
1. Moved Poetry Configuration Files
backend/pyproject.tomlandbackend/poetry.lockpyproject.tomlandpoetry.lockat project root2. Updated Makefile (100+ lines across 20+ targets)
Virtual Environment:
VENV_DIRfrombackend/.venvto.venvPYTHONPATH=backendDevelopment Targets:
local-dev-backend: Addedvenvdependency, usesPYTHONPATH=backend poetry run uvicornlocal-dev-all: Addedvenvdependency, properly sets PYTHONPATH for background processBuild Targets:
build-backend: Changed fromcd backend && docker build -f Dockerfile.backend .todocker build -f backend/Dockerfile.backend .Test Targets (9 targets updated):
PYTHONPATH=backend poetry run pytesthtmlcov/at project rootCode Quality Targets (7 targets updated):
backend/directory3. Updated CI/CD Workflows (5 files)
cd backendcommandsworking-directory: backend, updated all tool paths4. Updated Docker Configuration (2 files)
backend/subdirectory./backendto., fixed indentationTechnical Benefits
.venv/) for all toolsTesting
✅ Verified:
make venvcreates virtual environment at.venv/make local-dev-backendstarts backend with hot-reloaddocling❓ Will be verified in CI:
Breaking Changes
rm -rf backend/.venvmake venv(creates.venv/at root)backend/pyproject.tomlbackend/.venvto.venv)Migration Guide
For developers pulling this PR:
Files Changed (10 total)
Moved:
backend/pyproject.toml→pyproject.tomlbackend/poetry.lock→poetry.lockModified:
Makefile.github/workflows/poetry-lock-check.yml.github/workflows/01-lint.yml.github/workflows/04-pytest.yml.github/workflows/05-ci.yml.github/workflows/makefile-testing.ymlbackend/Dockerfile.backenddocker-compose.dev.ymlNext Steps
After merging this PR, we'll return to the original Docling issue and verify PDF document processing works correctly with the properly configured virtual environment.
Checklist
🤖 Generated with Claude Code