
LCORE-417: Convert tests to pytest #42

Merged
tisnik merged 8 commits into lightspeed-core:main from max-svistunov:lcore-417-convert-to-pytest
Oct 23, 2025

Conversation

@max-svistunov
Contributor

@max-svistunov max-svistunov commented Oct 21, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue # LCORE-417
  • Closes # LCORE-417

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Test suite migrated from unittest to pytest across converters, CLI, and processors; uses fixtures, pytest-style assertions, and pytest-mock for mocking.
    • Introduced a deterministic mock embedding fixture and removed legacy test utility helpers.
  • Chores
    • Dev tooling updated: added pytest-mock and linting rule to discourage unittest usage.

@coderabbitai

coderabbitai Bot commented Oct 21, 2025

Walkthrough

Migrates tests from unittest to pytest (fixtures, mocker, caplog, pytest.raises), consolidates test helpers by moving RagMockEmbedding into tests/conftest.py and removing tests/utils.py, adds pytest-mock to dev dependencies, and applies minor formatting edits to scripts/remove_pytorch_cpu_pyproject.py. No production API or runtime control-flow changes.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Test infra: consolidation | tests/conftest.py, tests/utils.py | Adds RagMockEmbedding to tests/conftest.py; deletes tests/utils.py (removes previous RagMockEmbedding, subtest decorator, and TestCase test-helper class). |
| Asciidoc tests | tests/asciidoc/test__main__.py, tests/asciidoc/test_asciidoc_converter.py | Converts tests from unittest to pytest: introduces fixtures, replaces unittest patching with mocker.patch, switches to pytest assertions, and adapts tests to new fixture-driven args. |
| Document-processor tests | tests/test_document_processor.py, tests/test_document_processor_llama_index.py, tests/test_document_processor_llama_stack.py | Replaces class-based setUp with pytest fixtures, migrates mocks to mocker.patch, converts assertions to plain assert, and adds parametrization / fixture-driven setups. |
| Metadata / OKP / utils tests | tests/test_metadata_processor.py, tests/test_okp.py, tests/test_utils.py | Migrates from unittest to pytest: adds fixtures, swaps @patch for mocker.patch, updates assertions to pytest style, and uses caplog / pytest.raises where applicable. |
| Test helper addition | tests/conftest.py | New deterministic RagMockEmbedding providing 768-dim mock embeddings for tests. |
| Dev dependencies | pyproject.toml | Adds pytest-mock>=3.15.1 to dev dependencies and a banned-api entry for unittest/unittest.mock. |
| Build script formatting | scripts/remove_pytorch_cpu_pyproject.py | Minor docstring and import-order formatting changes only; no behavioral changes. |

Sequence Diagram(s)

None — changes are limited to test harness refactors, dev-dependency updates, and minor formatting; no runtime control-flow modifications warranting a sequence diagram.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Llama stack dbs #28 — overlaps in areas touching document processor, llama-stack support, and pyproject/dependency changes; likely related at code/test surface.

Suggested reviewers

  • umago
  • tisnik
  • lpiwowar

Poem

🐰 I hopped through tests at break of dawn,
Swapped classes for fixtures on the lawn.
Mocker stitched seams with pytest thread,
Conftest cradles embeddings in its bed.
Hooray — the test-burrow’s tidy and gone!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "LCORE-417: Convert tests to pytest" is fully related to the main change in the changeset and clearly summarizes the primary objective. The changeset predominantly consists of migrations from unittest-based test style to pytest across multiple test files (test__main__.py, test_asciidoc_converter.py, test_document_processor.py, test_document_processor_llama_index.py, test_document_processor_llama_stack.py, test_metadata_processor.py, test_okp.py, test_utils.py), along with supporting changes like moving RagMockEmbedding to conftest.py, updating pyproject.toml to ban unittest, and adding pytest-mock as a dependency. The title is concise, specific, and directly communicates the core refactoring effort without vague language. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03f4d47 and 16f2c1a.

📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-and-push-dev
  • GitHub Check: Pylinter

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
tests/asciidoc/test_asciidoc_converter.py (1)

116-132: Consider more specific caplog assertion.

The assertion assert "WARNING" in caplog.text is broad and could match unrelated warnings. For more precise verification, consider checking for the specific warning message:

     with caplog.at_level(logging.WARNING):
         adoc_text_converter.convert(converter_data["input_file"], mock_output_file)
-        assert "WARNING" in caplog.text
+        assert any("exists" in record.message and "overwritten" in record.message 
+                   for record in caplog.records)

or simply check that a warning was logged:

-        assert "WARNING" in caplog.text
+        assert len(caplog.records) > 0
+        assert caplog.records[0].levelname == "WARNING"
tests/test_document_processor.py (1)

68-90: Consider using monkeypatch for environment variables.

The manual cleanup of environment variables with os.environ.pop works but could be more robust. Pytest's monkeypatch fixture provides automatic cleanup:

-def test_init_default(self, mock_processor):
+def test_init_default(self, mock_processor, monkeypatch):
     """Test DocumentProcessor initialization with default vector store type (faiss)."""
+    monkeypatch.delenv("HF_HOME", raising=False)
+    monkeypatch.delenv("TRANSFORMERS_OFFLINE", raising=False)
+    
     doc_processor = document_processor.DocumentProcessor(**mock_processor["params"])
     
     # ... rest of test ...
     
     assert expected_params["embeddings_model_dir"] == os.environ["HF_HOME"]
     assert os.environ["TRANSFORMERS_OFFLINE"] == "1"
-    os.environ.pop("TRANSFORMERS_OFFLINE", None)

This ensures cleanup even if the test fails midway.
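
The restore behaviour behind this suggestion can be sketched outside a test run with pytest's public `pytest.MonkeyPatch` class, which backs the `monkeypatch` fixture. This is a minimal illustration and not code from the PR: the starting value `"0"` is an assumption made only so the restore is visible. Keys changed through the `MonkeyPatch` object are put back when the context exits.

```python
import os

import pytest

# Assumed starting state, chosen only so the restore is observable.
os.environ["TRANSFORMERS_OFFLINE"] = "0"

with pytest.MonkeyPatch.context() as mp:
    # Same calls the monkeypatch fixture exposes inside a test.
    mp.delenv("HF_HOME", raising=False)
    mp.setenv("TRANSFORMERS_OFFLINE", "1")
    inside = os.environ["TRANSFORMERS_OFFLINE"]

# On leaving the context, the recorded changes are undone.
after = os.environ["TRANSFORMERS_OFFLINE"]
```

Inside a test the fixture performs the same undo at teardown, including when an assertion fails partway through.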

tests/test_document_processor_llama_index.py (2)

28-57: Consider using a structured return type for better type safety.

The fixture returns a dictionary with string keys, which works but lacks type hints and IDE support. For better maintainability, consider using a dataclass, NamedTuple, or SimpleNamespace:

Example with dataclass:

from dataclasses import dataclass

@dataclass
class ProcessorFixture:
    processor: document_processor.DocumentProcessor
    model_name: str
    chunk_size: int
    chunk_overlap: int
    num_workers: int
    embeddings_model_dir: str

@pytest.fixture
def doc_processor(mocker) -> ProcessorFixture:
    # ... setup code ...
    return ProcessorFixture(
        processor=processor,
        model_name=model_name,
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        num_workers=num_workers,
        embeddings_model_dir=embeddings_model_dir,
    )

Then access as doc_processor.processor instead of doc_processor["processor"].


213-233: Use pytest-mock's mocker.patch.dict for consistency.

This test uses @mock.patch.dict (unittest style) while all other tests in this file use pytest-mock's mocker fixture. For consistency with the pytest migration, consider using mocker.patch.dict instead.

Apply this diff:

-    @mock.patch.dict(
-        os.environ,
-        {
-            "POSTGRES_USER": "postgres",
-            "POSTGRES_PASSWORD": "somesecret",
-            "POSTGRES_HOST": "localhost",
-            "POSTGRES_PORT": "15432",
-            "POSTGRES_DATABASE": "postgres",
-        },
-    )
-    def test_pgvector(self, doc_processor):
+    def test_pgvector(self, mocker, doc_processor):
         """Test that DocumentProcessor initializes successfully with postgres vector store."""
+        mocker.patch.dict(
+            os.environ,
+            {
+                "POSTGRES_USER": "postgres",
+                "POSTGRES_PASSWORD": "somesecret",
+                "POSTGRES_HOST": "localhost",
+                "POSTGRES_PORT": "15432",
+                "POSTGRES_DATABASE": "postgres",
+            },
+        )
         proc = document_processor.DocumentProcessor(
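
For reference, `mocker.patch.dict` is pytest-mock's wrapper around `unittest.mock.patch.dict`, with the undo tied to the end of the test. The sketch below uses the standard-library context-manager form only so that it runs on its own; the variable values are taken from the diff above, and removing `POSTGRES_PORT` first is an assumption made to get a known starting state.

```python
import os
from unittest import mock

# Assumed starting state: the variable is absent before patching.
os.environ.pop("POSTGRES_PORT", None)

with mock.patch.dict(
    os.environ,
    {"POSTGRES_HOST": "localhost", "POSTGRES_PORT": "15432"},
):
    port_inside = os.environ["POSTGRES_PORT"]

# patch.dict restores the mapping to its earlier contents on exit.
port_after = os.environ.get("POSTGRES_PORT")
```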
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a346c2f and 07a8aa9.

📒 Files selected for processing (11)
  • scripts/remove_pytorch_cpu_pyproject.py (2 hunks)
  • tests/asciidoc/test__main__.py (2 hunks)
  • tests/asciidoc/test_asciidoc_converter.py (2 hunks)
  • tests/conftest.py (1 hunks)
  • tests/test_document_processor.py (2 hunks)
  • tests/test_document_processor_llama_index.py (3 hunks)
  • tests/test_document_processor_llama_stack.py (7 hunks)
  • tests/test_metadata_processor.py (1 hunks)
  • tests/test_okp.py (7 hunks)
  • tests/test_utils.py (2 hunks)
  • tests/utils.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/utils.py
🧰 Additional context used
🧬 Code graph analysis (8)
tests/asciidoc/test_asciidoc_converter.py (1)
src/lightspeed_rag_content/asciidoc/asciidoctor_converter.py (3)
  • AsciidoctorConverter (63-188)
  • _get_converter_file (104-119)
  • _get_attribute_list (132-146)
tests/test_metadata_processor.py (2)
src/lightspeed_rag_content/metadata_processor.py (4)
  • MetadataProcessor (26-96)
  • ping_url (44-56)
  • get_file_title (34-42)
  • populate (58-91)
tests/test_okp.py (1)
  • test_get_file_title (194-198)
tests/test_document_processor_llama_stack.py (2)
tests/conftest.py (1)
  • RagMockEmbedding (19-28)
src/lightspeed_rag_content/document_processor.py (6)
  • _Config (44-61)
  • _LlamaStackDB (184-378)
  • write_yaml_config (288-307)
  • _start_llama_stack (309-319)
  • add_docs (137-140)
  • add_docs (321-350)
tests/test_document_processor.py (2)
tests/conftest.py (1)
  • RagMockEmbedding (19-28)
src/lightspeed_rag_content/document_processor.py (6)
  • DocumentProcessor (381-490)
  • _Config (44-61)
  • _check_config (425-430)
  • process (443-485)
  • add_docs (137-140)
  • add_docs (321-350)
tests/test_document_processor_llama_index.py (2)
tests/conftest.py (1)
  • RagMockEmbedding (19-28)
src/lightspeed_rag_content/document_processor.py (9)
  • DocumentProcessor (381-490)
  • _got_whitespace (77-82)
  • _filter_out_invalid_nodes (85-94)
  • _save_index (149-156)
  • _save_metadata (158-181)
  • process (443-485)
  • save (142-147)
  • save (352-378)
  • save (487-490)
tests/test_utils.py (1)
src/lightspeed_rag_content/utils.py (1)
  • get_common_arg_parser (19-71)
tests/asciidoc/test__main__.py (1)
src/lightspeed_rag_content/asciidoc/__main__.py (3)
  • main_convert (38-52)
  • main_get_structure (55-76)
  • get_argument_parser (79-140)
tests/test_okp.py (1)
src/lightspeed_rag_content/okp.py (7)
  • metadata_has_url_and_title (55-67)
  • is_file_related_to_projects (28-52)
  • parse_metadata (112-138)
  • yield_files_related_to_projects (70-109)
  • OKPMetadataProcessor (141-152)
  • url_function (144-147)
  • get_file_title (149-152)
🔇 Additional comments (23)
scripts/remove_pytorch_cpu_pyproject.py (1)

1-50: Formatting and import order changes look good.

The docstring formatting and import reordering are purely cosmetic. The remove_sections function logic is sound — the for-else+break pattern correctly handles cases where intermediate section keys are missing, ensuring safe removal without errors. Type hints and the # type: ignore annotation are appropriate for tomlkit's API.

tests/asciidoc/test_asciidoc_converter.py (3)

15-44: LGTM! Clean fixture-based test data setup.

The migration to pytest fixtures is well-structured. The converter_data fixture provides a clear, reusable test data dictionary.


47-114: LGTM! Proper pytest-mock usage.

The tests correctly use mocker.patch and verify subprocess invocations with fixture data. The migration from unittest mocks to pytest-mock is clean.


133-191: LGTM! Complete pytest migration.

The remaining tests properly use pytest patterns including pytest.raises for exception testing and mocker for file I/O. The migration is consistent and correct.

tests/asciidoc/test__main__.py (3)

31-54: LGTM! Well-structured test fixtures.

The main_data fixture and get_mock_parsed_args helper provide clean test setup for the main module tests.


57-84: LGTM! Correct pytest migration.

This test properly uses mocker and verifies the subprocess call with fixture data.


113-133: LGTM! Proper test structure.

The test correctly mocks dependencies and verifies the expected subprocess invocation.

tests/test_utils.py (1)

16-64: LGTM! Clean pytest migration.

The test class correctly uses pytest assertions and pytest.raises for exception testing. The migration is straightforward and correct.

tests/conftest.py (1)

19-28: LGTM! Proper relocation of test utility.

The RagMockEmbedding class is correctly moved from tests/utils.py to tests/conftest.py, which is the appropriate location for shared test fixtures. The implementation provides deterministic mock embeddings for testing.

tests/test_document_processor.py (2)

25-63: LGTM! Well-designed pytest fixture.

The mock_processor fixture provides a clean, reusable setup for mocking document processor dependencies. The pattern of yielding a dictionary with mocks and test parameters is effective.


91-197: LGTM! Excellent use of pytest parametrization.

The tests effectively use pytest.mark.parametrize to cover multiple vector store types and chunking strategies. The mocking patterns and assertions are correct and thorough.

tests/test_metadata_processor.py (1)

25-147: LGTM! Thorough pytest migration.

The metadata processor tests are well-migrated to pytest with proper use of fixtures, mocker, and caplog. The test coverage is maintained and the assertions are appropriate.

tests/test_document_processor_llama_stack.py (4)

26-52: LGTM! Comprehensive test fixture.

The llama_stack_processor fixture properly mocks all dependencies for testing the Llama Stack DB processor. The fixture design is clean and provides reusable test configuration.


57-101: LGTM! Thorough initialization testing.

Both init tests properly verify the processor initialization with different model directory configurations. The environment variable checks and mock assertions are correct.


102-213: LGTM! Comprehensive YAML configuration testing.

Both tests properly verify the YAML configuration generation for different vector store backends. The use of mock_open and assertion of the complete expected output is correct.


214-355: LGTM! Complete test coverage.

The tests thoroughly cover document addition and saving for both manual and automatic chunking modes. The helper function _test_save reduces duplication effectively.

tests/test_okp.py (1)

16-198: LGTM! Excellent pytest migration with autouse fixture.

The OKP tests are well-migrated to pytest. The use of an autouse fixture at lines 175-186 to mock parse_metadata for all tests in TestOKPMetadataProcessor is a clean pattern that reduces duplication. All mocking and assertions are correct.

tests/test_document_processor_llama_index.py (6)

16-25: LGTM!

The imports are correctly updated for the pytest migration. The RagMockEmbedding is now imported from tests.conftest, and pytest is properly imported. Retaining unittest.mock for mock.Mock and mock.sentinel usage is acceptable.


63-85: LGTM!

The whitespace and filter node tests are correctly migrated to pytest style with proper fixture usage and assertions.


87-125: LGTM!

The _save_index and _save_metadata tests properly use mocker for patching and verify the expected behavior. The use of mock.sentinel for test values is appropriate.


127-179: LGTM!

The process method tests correctly handle the different unreachable document scenarios (normal, drop, fail) and use appropriate patching and assertions.


181-211: LGTM!

The tests for failure on unreachable documents and the save method are correctly implemented with proper use of pytest.raises and fixture-based mocking.


235-245: LGTM!

The test correctly verifies that an invalid vector store type raises a RuntimeError using pytest.raises.

Comment thread tests/asciidoc/test__main__.py Outdated
Comment thread tests/asciidoc/test__main__.py Outdated
Comment thread tests/asciidoc/test__main__.py Outdated
Comment thread tests/asciidoc/test__main__.py Outdated
@max-svistunov max-svistunov force-pushed the lcore-417-convert-to-pytest branch from 07a8aa9 to c5355b6 Compare October 21, 2025 13:45
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
tests/asciidoc/test__main__.py (4)

98-100: Move assertion outside pytest.raises context.

The assertion at line 100 never executes because main_convert(mock_args) raises SystemExit and immediately exits the context manager. The captured exception is only available after the with block completes.

Apply this diff:

     with pytest.raises(SystemExit) as e:
         main_convert(mock_args)
-        assert e.value.code != 0
+    assert e.value.code != 0
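
A small standalone sketch of why the placement matters: once the call raises, control leaves the `with` body, so any statement after it in the same block is skipped. `exit_with_error` is a hypothetical stand-in for `main_convert`, and the exit code 2 is chosen only for illustration.

```python
import sys

import pytest


def exit_with_error() -> None:
    """Hypothetical stand-in for a CLI entry point that fails."""
    sys.exit(2)


reached = []

with pytest.raises(SystemExit) as excinfo:
    exit_with_error()
    reached.append("inside")  # skipped: the line above already raised

# The captured exception is inspected after the block has finished.
exit_code = excinfo.value.code
```

Here `reached` stays empty, which is why an assertion placed inside the block can never fail the test.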

109-111: Move assertion outside pytest.raises context.

Same issue: the assertion at line 111 is unreachable inside the context manager.

Apply this diff:

     with pytest.raises(SystemExit) as e:
         main_convert(mock_args)
-        assert e.value.code != 0
+    assert e.value.code != 0

134-146: Fix mock patch target and assertion placement.

Two issues:

  1. The patch targets asciidoctor_converter.subprocess.run, but main_get_structure calls subprocess.run from __main__.py
  2. The assertion at line 146 is inside the pytest.raises context and never executes

Apply this diff:

     def test_main_incorrect_asciidoctor_cmd(self, mocker, main_data):
         mock_run = mocker.patch(
-            "lightspeed_rag_content.asciidoc.asciidoctor_converter.subprocess.run"
+            "lightspeed_rag_content.asciidoc.__main__.subprocess.run"
         )
         mock_run.side_effect = subprocess.CalledProcessError(
             cmd=main_data["asciidoctor_cmd"], returncode=1
         )
         mock_args = Mock()
         mock_args.input_file = main_data["input_file"]

         with pytest.raises(SystemExit) as e:
             main_get_structure(mock_args)
-            assert e.value.code != 0
+        assert e.value.code != 0

148-160: Fix mock patch target and assertion placement.

Two issues:

  1. The patch targets asciidoctor_converter.shutil.which, but main_get_structure calls shutil.which from __main__.py
  2. Both assertions (lines 159-160) are inside the nested context managers and never execute

Apply this diff:

     def test_main_missing_asciidoctor_cmd(self, mocker, main_data, caplog):
         mock_which = mocker.patch(
-            "lightspeed_rag_content.asciidoc.asciidoctor_converter.shutil.which"
+            "lightspeed_rag_content.asciidoc.__main__.shutil.which"
         )
         mock_which.return_value = ""
         mock_args = Mock()
         mock_args.input_file = main_data["input_file"]

-        with pytest.raises(SystemExit) as e:
-            with caplog.at_level(logging.ERROR):
-                main_get_structure(mock_args)
-                assert e.value.code != 0
-                assert "ERROR" in caplog.text
+        with caplog.at_level(logging.ERROR):
+            with pytest.raises(SystemExit) as e:
+                main_get_structure(mock_args)
+        assert e.value.code != 0
+        assert "ERROR" in caplog.text
🧹 Nitpick comments (1)
tests/test_okp.py (1)

118-163: Consider using Path objects in the mock for accuracy.

The mock returns strings (["file1.md", "file2.md", ...]) but Path.glob() returns Path objects in the actual implementation. While the test passes, it doesn't accurately simulate the real behavior where filepath would be a Path object that gets converted via str(filepath) before being passed to parse_metadata.

Apply this diff to better match the real behavior:

+from pathlib import Path
+
 def test_yield_files_related_to_projects(self, mocker):
     """Test yielding files related to specific projects."""
     mock_glob = mocker.patch("lightspeed_rag_content.okp.Path.glob")
     mock_glob.return_value = [
-        "file1.md",
-        "file2.md",
-        "file3.md",  # Should be ignored, missing metadata
+        Path("file1.md"),
+        Path("file2.md"),
+        Path("file3.md"),  # Should be ignored, missing metadata
     ]
 
     # ... (rest of mock setup)
     
     projects = ["foo", "bar"]
     files = list(okp.yield_files_related_to_projects("/fake", projects))
 
     # Check that the correct files are yielded
     assert len(files) == 2
-    assert "file1.md" in files
-    assert "file2.md" in files
+    assert Path("file1.md") in files
+    assert Path("file2.md") in files
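
The type difference this nitpick points at can be checked with the standard library alone. The sketch below is independent of the `okp` module; the temporary directory and file contents are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, created only so glob() has something to find.
    (Path(tmp) / "file1.md").write_text("title = 'demo'\n")
    found = list(Path(tmp).glob("*.md"))

# glob() yields Path objects, and a Path never compares equal to a str.
names = [p.name for p in found]
string_matches_path = "file1.md" in found
```

A mock that returns plain strings therefore lets `"file1.md" in files` pass in a way the real `Path.glob()` results would not.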
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07a8aa9 and c5355b6.

📒 Files selected for processing (11)
  • scripts/remove_pytorch_cpu_pyproject.py (2 hunks)
  • tests/asciidoc/test__main__.py (2 hunks)
  • tests/asciidoc/test_asciidoc_converter.py (2 hunks)
  • tests/conftest.py (1 hunks)
  • tests/test_document_processor.py (2 hunks)
  • tests/test_document_processor_llama_index.py (3 hunks)
  • tests/test_document_processor_llama_stack.py (7 hunks)
  • tests/test_metadata_processor.py (1 hunks)
  • tests/test_okp.py (7 hunks)
  • tests/test_utils.py (2 hunks)
  • tests/utils.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/conftest.py
  • tests/test_document_processor.py
🧰 Additional context used
🧬 Code graph analysis (7)
tests/asciidoc/test__main__.py (1)
src/lightspeed_rag_content/asciidoc/__main__.py (3)
  • main_convert (38-52)
  • main_get_structure (55-76)
  • get_argument_parser (79-140)
tests/test_document_processor_llama_index.py (2)
tests/conftest.py (1)
  • RagMockEmbedding (19-28)
src/lightspeed_rag_content/document_processor.py (9)
  • DocumentProcessor (384-495)
  • _got_whitespace (80-85)
  • _filter_out_invalid_nodes (88-97)
  • _save_index (152-159)
  • _save_metadata (161-184)
  • process (448-490)
  • save (145-150)
  • save (355-381)
  • save (492-495)
tests/asciidoc/test_asciidoc_converter.py (1)
src/lightspeed_rag_content/asciidoc/asciidoctor_converter.py (3)
  • AsciidoctorConverter (63-188)
  • _get_converter_file (104-119)
  • _get_attribute_list (132-146)
tests/test_utils.py (1)
src/lightspeed_rag_content/utils.py (1)
  • get_common_arg_parser (19-78)
tests/test_okp.py (1)
src/lightspeed_rag_content/okp.py (7)
  • metadata_has_url_and_title (55-67)
  • is_file_related_to_projects (28-52)
  • parse_metadata (112-138)
  • yield_files_related_to_projects (70-109)
  • OKPMetadataProcessor (141-152)
  • url_function (144-147)
  • get_file_title (149-152)
tests/test_metadata_processor.py (2)
src/lightspeed_rag_content/metadata_processor.py (4)
  • MetadataProcessor (26-96)
  • ping_url (44-56)
  • get_file_title (34-42)
  • populate (58-91)
tests/test_okp.py (1)
  • test_get_file_title (194-198)
tests/test_document_processor_llama_stack.py (2)
tests/conftest.py (1)
  • RagMockEmbedding (19-28)
src/lightspeed_rag_content/document_processor.py (6)
  • _Config (45-62)
  • _LlamaStackDB (187-381)
  • write_yaml_config (291-310)
  • _start_llama_stack (312-322)
  • add_docs (140-143)
  • add_docs (324-353)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pylinter
  • GitHub Check: build-and-push-dev
🔇 Additional comments (16)
scripts/remove_pytorch_cpu_pyproject.py (1)

1-50: LGTM! Minor formatting improvements.

The formatting changes (docstring alignment and alphabetical import ordering) are harmless and follow common conventions. No functional impact.

Note: This file appears to be a utility script for removing PyTorch CPU dependencies, which doesn't seem directly related to the PR objective of converting tests to pytest. If this cleanup is intentional, consider noting it in the PR description for clarity.

tests/test_document_processor_llama_stack.py (1)

19-356: Excellent pytest migration!

The conversion from unittest to pytest is well-executed:

  • Clean fixture-based setup replacing setUp methods
  • Proper use of mocker from pytest-mock for patching
  • Consistent pytest-style assertions throughout
  • Maintains full test coverage while improving readability
tests/test_document_processor_llama_index.py (1)

20-245: Well-structured pytest migration!

The migration demonstrates excellent pytest practices:

  • Fixture returns dictionary for flexible access to processor and configuration
  • Consistent use of mocker.patch.object for targeted mocking
  • Proper use of pytest.raises for exception assertions (lines 198, 237)
  • Appropriate use of mock.patch.dict decorator for environment variable testing (line 213)
tests/test_metadata_processor.py (4)

25-38: Well-structured fixtures!

The fixtures provide clean separation of concerns and make the tests more maintainable by centralizing test data and setup.


44-73: Excellent test coverage for ping_url!

The tests properly cover the success, failure, and exception scenarios using pytest-mock. The migration is clean and correct.


75-93: Correct migration of file mocking!

The use of mocker.mock_open properly simulates file operations and the exception handling test verifies graceful degradation.


95-147: Thorough testing of populate method!

The tests correctly mock the abstract url_function method and other dependencies using mocker.patch.object. The use of caplog for logging assertions follows pytest best practices and properly verifies warning messages for unreachable URLs.

tests/test_utils.py (2)

17-22: LGTM! Clean migration to pytest.

The addition of the pytest import and removal of unittest.TestCase inheritance correctly converts this test class to pytest style. Class-based tests with instance methods remain fully supported in pytest.


29-64: LGTM! All assertions correctly migrated.

All unittest assertions have been properly converted to pytest equivalents:

  • assertIsInstance → assert isinstance(...)
  • assertEqual → assert ... ==
  • assertRaises(SystemExit) → pytest.raises(SystemExit) context manager
  • assertFalse → assert not ...
  • assertTrue → assert ...

The conversions are semantically equivalent and maintain the original test behavior.
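
The same mapping, written out as a runnable sketch. The parser here is a hypothetical one built inline, not the project's `get_common_arg_parser`, and the option names are assumptions for illustration.

```python
import argparse

import pytest

# Hypothetical parser standing in for the one under test.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--count", type=int, default=1)
parser.add_argument("--verbose", action="store_true")

args = parser.parse_args(["--count", "3"])

assert isinstance(parser, argparse.ArgumentParser)  # was assertIsInstance
assert args.count == 3                              # was assertEqual
assert not args.verbose                             # was assertFalse
assert parser.parse_args(["--verbose"]).verbose     # was assertTrue

# was assertRaises(SystemExit): argparse exits on an invalid value
with pytest.raises(SystemExit) as excinfo:
    parser.parse_args(["--count", "not-a-number"])
```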

tests/asciidoc/test_asciidoc_converter.py (1)

28-191: Excellent pytest migration!

The test migration from unittest to pytest is well-executed:

  • Clean fixture-based test data organization
  • Proper use of mocker for all mocking
  • Correct pytest.raises and caplog usage
  • All assertions properly placed outside context managers
tests/test_okp.py (6)

16-21: LGTM! Clean pytest migration.

The import and class declaration correctly follow pytest conventions by removing the unittest.TestCase inheritance and importing pytest for fixture support.


24-62: LGTM! Assertions converted correctly.

All unittest-style assertions have been properly converted to pytest-style plain assert statements. The test logic is preserved and the assertions are clear and idiomatic.


64-116: LGTM! Proper mocker fixture usage.

The test correctly uses the mocker fixture to mock file I/O operations. The mock_open with read_data appropriately simulates reading the TOML metadata from the file, and the assertion validates the complete parsed structure.


166-169: LGTM! Well-structured fixture.

The okp_mp fixture is properly defined and provides a clean way to instantiate the OKPMetadataProcessor for tests. This follows pytest best practices for test setup.


172-186: LGTM! Excellent use of autouse fixture.

The autouse=True fixture efficiently mocks parse_metadata for all tests in the class, providing consistent test data. This is a clean pattern that reduces code duplication and ensures all tests in this class have the necessary mocking in place.


188-198: LGTM! Clean fixture-based tests.

Both tests properly utilize the okp_mp fixture parameter and rely on the autouse fixture for mocking. The tests are concise and follow pytest conventions correctly.

Comment thread tests/asciidoc/test__main__.py
Comment thread tests/test_metadata_processor.py Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c984c7f and bec475e.

📒 Files selected for processing (2)
  • tests/asciidoc/test__main__.py (2 hunks)
  • tests/test_metadata_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_metadata_processor.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/asciidoc/test__main__.py (1)
src/lightspeed_rag_content/asciidoc/__main__.py (3)
  • main_convert (38-52)
  • main_get_structure (55-76)
  • get_argument_parser (79-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: mypy
  • GitHub Check: build-and-push-dev
  • GitHub Check: Pylinter
🔇 Additional comments (7)
tests/asciidoc/test__main__.py (7)

15-29: LGTM: Imports updated for pytest.

The imports correctly include pytest, logging for caplog usage, and Mock from unittest.mock. The test targets are properly imported from the main module.


31-43: LGTM: Well-structured pytest fixture.

The main_data fixture provides centralized test data that's reused across multiple tests. Good pytest practice.


45-53: LGTM: Helper reduces test duplication.

The get_mock_parsed_args helper appropriately constructs mock arguments from fixture data, reducing boilerplate across tests.


57-83: LGTM: Correct mock targets for main_convert.

The test correctly patches asciidoctor_converter.shutil.which and asciidoctor_converter.subprocess.run since main_convert invokes AsciidoctorConverter which uses these modules.


85-111: LGTM: Error scenarios correctly tested.

Both error tests properly use pytest.raises with assertions outside the context block (addressing past review feedback). Mock targets are appropriate for testing main_convert error paths.
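
A minimal sketch of why the assertions belong outside the `pytest.raises` block: any statement placed after the raising call inside the `with` body never runs, so a check written there would silently be skipped. `main_convert` below is a hypothetical stand-in that exits with an assumed code of 127; the real function's failure behavior is not shown in this thread.

```python
import pytest


def main_convert(converter_available: bool) -> int:
    """Hypothetical stand-in: exit when the converter binary is missing."""
    if not converter_available:
        raise SystemExit(127)  # assumed exit code, for illustration only
    return 0


def check_exit_code() -> int:
    with pytest.raises(SystemExit) as exc_info:
        main_convert(converter_available=False)
        # Anything written here would be skipped: the line above raised.

    # Outside the block the exception has been captured and can be inspected.
    assert exc_info.value.code == 127
    return exc_info.value.code
```

In a real test module the body of `check_exit_code` would live directly in a `test_...` function.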


113-132: LGTM: Correct patch targets for main_get_structure.

The test correctly patches __main__.shutil.which and __main__.subprocess.run since main_get_structure uses these directly from the __main__ module (addressing past review feedback).


162-165: LGTM: Simple assertion correctly checks parser type.

The pytest-style assertion appropriately verifies the return type.

Comment thread tests/asciidoc/test__main__.py
Comment thread tests/asciidoc/test__main__.py
@max-svistunov max-svistunov force-pushed the lcore-417-convert-to-pytest branch from bec475e to 9fe59eb Compare October 21, 2025 14:51

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
tests/test_metadata_processor.py (8)

24-27: Fixture may break if class becomes an ABC; consider a tiny concrete subclass.

If MetadataProcessor is ever changed to subclass abc.ABC, instantiating it here will fail. A minimal test subclass that implements url_function (or patching the instance method) makes the fixture future-proof.
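
The concern can be sketched as below. The abstract `MetadataProcessor` here is a hypothetical variant (the thread does not say the real class is an ABC), and the `populate` body and `docs_url` key are placeholders chosen only to make the example run.

```python
import abc


class MetadataProcessor(abc.ABC):
    """Hypothetical abstract variant of the class under test."""

    @abc.abstractmethod
    def url_function(self, file_path: str) -> str:
        """Return the URL that corresponds to a file path."""

    def populate(self, file_path: str) -> dict:
        # Placeholder logic; the real populate() does more than this.
        return {"docs_url": self.url_function(file_path)}


class ConcreteProcessor(MetadataProcessor):
    """Tiny test-only subclass that satisfies the abstract interface."""

    def url_function(self, file_path: str) -> str:
        return "https://example.com/" + file_path


def make_processor() -> MetadataProcessor:
    # What a fixture body could return instead of MetadataProcessor().
    return ConcreteProcessor()
```

Instantiating the abstract class directly raises `TypeError`, while the subclass keeps the fixture working whether or not the base class is abstract.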


43-53: Patch where used and assert call details (robustness).

Patch the imported symbol in the target module and assert the single-call/timeout behavior.

-        mock_get = mocker.patch("requests.get")
+        mock_get = mocker.patch("lightspeed_rag_content.metadata_processor.requests.get")
@@
         result = md_processor.ping_url(processor_data["url"])
 
         assert result is True
+        assert mock_get.call_count == 1
+        assert mock_get.call_args == ((processor_data["url"],), {"timeout": 30})

54-64: Verify retries and patch at the module path.

Given ping_url retries on non-200, assert it tried three times and used the expected args.

-        mock_get = mocker.patch("requests.get")
+        mock_get = mocker.patch("lightspeed_rag_content.metadata_processor.requests.get")
@@
         result = md_processor.ping_url(processor_data["url"])
 
         assert result is False
+        assert mock_get.call_count == 3
+        for args, kwargs in mock_get.call_args_list:
+            assert args == (processor_data["url"],)
+            assert kwargs == {"timeout": 30}

65-73: Also assert retries on exceptions and patch at the module path.

-        mock_get = mocker.patch("requests.get")
+        mock_get = mocker.patch("lightspeed_rag_content.metadata_processor.requests.get")
@@
         result = md_processor.ping_url(processor_data["url"])
 
         assert result is False
+        assert mock_get.call_count == 3
+        for args, kwargs in mock_get.call_args_list:
+            assert args == (processor_data["url"],)
+            assert kwargs == {"timeout": 30}
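
The two retry checks above can be sketched with the standard library alone. This `ping_url` is a hypothetical stand-in that takes the HTTP getter as a parameter instead of calling `requests.get`; the three attempts and `timeout=30` are taken from the suggested assertions, not from the real implementation.

```python
from unittest.mock import Mock, call

URL = "https://example.com/doc"  # placeholder URL


def ping_url(url, http_get, retries=3, timeout=30):
    """Hypothetical stand-in: True on the first HTTP 200, False after `retries` tries."""
    for _ in range(retries):
        try:
            if http_get(url, timeout=timeout).status_code == 200:
                return True
        except Exception:
            # Treat any transport error as a failed attempt and retry.
            pass
    return False


def run_with(http_get):
    """Return the result plus what the mock recorded."""
    result = ping_url(URL, http_get)
    return result, http_get.call_count, http_get.call_args_list


# Three scenarios: success, non-200 responses, and raised exceptions.
ok = run_with(Mock(return_value=Mock(status_code=200)))
not_found = run_with(Mock(return_value=Mock(status_code=404)))
failing = run_with(Mock(side_effect=ConnectionError("unreachable")))
expected_calls = [call(URL, timeout=30)] * 3
```

Checking `call_args_list` against `expected_calls` verifies both the number of attempts and the arguments of each one in a single comparison.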

74-84: Include a newline in mocked file data to mirror real files.

This exercises rstrip("\n") as in production reads.

-            read_data=f'# {processor_data["title"]}',
+            read_data=f'# {processor_data["title"]}\n',
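
A small sketch of what the trailing newline changes. `get_file_title` here is a hypothetical reader written for illustration (first line, newline stripped, leading `# ` removed); the real method may differ in detail.

```python
from unittest.mock import mock_open, patch


def get_file_title(file_path: str) -> str:
    """Hypothetical reader: title from a leading '# ' heading, else ''."""
    with open(file_path, "r", encoding="utf-8") as handle:
        first_line = handle.readline().rstrip("\n")
    return first_line[2:] if first_line.startswith("# ") else ""


def title_from(read_data: str) -> str:
    # mock_open serves read_data through read(), readline() and iteration.
    with patch("builtins.open", mock_open(read_data=read_data)):
        return get_file_title("doc.md")
```

With `read_data` ending in `"\n"`, the mocked `readline()` returns the newline as a real file would, so the `rstrip("\n")` branch is actually exercised.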

85-93: Assert open() was called with the expected encoding.

Strengthens the contract that get_file_title reads as UTF‑8.

         result = md_processor.get_file_title(processor_data["file_path"])
 
         assert "" == result
+        mock_file.assert_called_once_with(
+            processor_data["file_path"], "r", encoding="utf-8"
+        )
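
To illustrate what the extra assertion buys, the sketch below compares a reader that passes `encoding="utf-8"` with one that omits it. `read_first_line` and its `explicit_encoding` switch are hypothetical, added only to show both outcomes side by side.

```python
from unittest.mock import mock_open, patch


def read_first_line(file_path: str, *, explicit_encoding: bool) -> str:
    """Hypothetical reader with a switch for passing the encoding."""
    if explicit_encoding:
        opened = open(file_path, "r", encoding="utf-8")
    else:
        opened = open(file_path, "r")  # relies on the platform default
    with opened as handle:
        return handle.readline()


def encoding_contract_holds(explicit_encoding: bool) -> bool:
    mock_file = mock_open(read_data="# Title\n")
    with patch("builtins.open", mock_file):
        read_first_line("doc.md", explicit_encoding=explicit_encoding)
    try:
        mock_file.assert_called_once_with("doc.md", "r", encoding="utf-8")
    except AssertionError:
        return False
    return True
```

The assertion fails as soon as the keyword is dropped, which is the regression the suggested check would catch.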

94-118: Patch instance methods to simplify call assertions and verify interactions.

Patching on the instance avoids self in call args and lets you assert exact inputs.

-        mock_url_func = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "url_function"
-        )
-        mock_get_title = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "get_file_title"
-        )
-        mock_ping_url = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "ping_url"
-        )
+        mock_url_func = mocker.patch.object(md_processor, "url_function")
+        mock_get_title = mocker.patch.object(md_processor, "get_file_title")
+        mock_ping_url = mocker.patch.object(md_processor, "ping_url")
@@
         result = md_processor.populate(processor_data["file_path"])
 
         expected_result = {
@@
         }
         assert expected_result == result
+        mock_url_func.assert_called_once_with(processor_data["file_path"])
+        mock_get_title.assert_called_once_with(processor_data["file_path"])
+        mock_ping_url.assert_called_once_with(processor_data["url"])
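
A sketch of the difference, using a hypothetical `Processor` class. One caveat worth stating: `self` appears in the recorded call only when the class-level patch uses `autospec=True`; a plain `MagicMock` set on the class is not bound to the instance and records no `self`. The sketch therefore uses `autospec=True` for the class-level case.

```python
from unittest.mock import patch


class Processor:
    """Hypothetical class standing in for the processor under test."""

    def url_function(self, file_path: str) -> str:
        return "https://real.example/" + file_path

    def populate(self, file_path: str) -> dict:
        return {"docs_url": self.url_function(file_path)}


def class_level_patch() -> dict:
    proc = Processor()
    with patch.object(
        Processor, "url_function", autospec=True, return_value="mocked"
    ) as mock_url:
        result = proc.populate("a.md")
        # The autospecced function is bound, so the instance is the first arg.
        mock_url.assert_called_once_with(proc, "a.md")
    return result


def instance_level_patch() -> dict:
    proc = Processor()
    with patch.object(proc, "url_function", return_value="mocked") as mock_url:
        result = proc.populate("a.md")
        # Only the explicit argument is recorded.
        mock_url.assert_called_once_with("a.md")
    return result
```

Patching the instance also limits the mock to that one object, leaving other `Processor` instances untouched.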

119-146: Target the module logger in caplog and strengthen assertions.

Make logging capture deterministic and verify message contents and interactions.

-        mock_url_func = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "url_function"
-        )
-        mock_get_title = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "get_file_title"
-        )
-        mock_ping_url = mocker.patch.object(
-            metadata_processor.MetadataProcessor, "ping_url"
-        )
+        mock_url_func = mocker.patch.object(md_processor, "url_function")
+        mock_get_title = mocker.patch.object(md_processor, "get_file_title")
+        mock_ping_url = mocker.patch.object(md_processor, "ping_url")
@@
-        with caplog.at_level(logging.WARNING):
+        with caplog.at_level(logging.WARNING, logger=metadata_processor.__name__):
             result = md_processor.populate(processor_data["file_path"])
@@
         assert expected_result == result
-        assert "URL not reachable" in caplog.text
+        assert "URL not reachable" in caplog.text
+        assert processor_data["url"] in caplog.text
+        assert processor_data["title"] in caplog.text
+        mock_url_func.assert_called_once_with(processor_data["file_path"])
+        mock_get_title.assert_called_once_with(processor_data["file_path"])
+        mock_ping_url.assert_called_once_with(processor_data["url"])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bec475e and 9fe59eb.

📒 Files selected for processing (2)
  • tests/asciidoc/test__main__.py (2 hunks)
  • tests/test_metadata_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/asciidoc/test__main__.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_metadata_processor.py (1)
src/lightspeed_rag_content/metadata_processor.py (4)
  • MetadataProcessor (26-96)
  • ping_url (44-56)
  • get_file_title (34-42)
  • populate (58-91)
🔇 Additional comments (1)
tests/test_metadata_processor.py (1)

40-146: Overall: migration looks solid.

Good pytest idioms, clear fixtures, and coverage of success/failure paths. With the minor robustness tweaks above, this suite will be very resilient.

@max-svistunov max-svistunov force-pushed the lcore-417-convert-to-pytest branch from da6d3e3 to 03f4d47 Compare October 21, 2025 15:19
@max-svistunov max-svistunov force-pushed the lcore-417-convert-to-pytest branch from f274973 to 1337d7b Compare October 21, 2025 20:53
Collaborator

@tisnik tisnik left a comment

LGTM

Comment thread pyproject.toml
@tisnik tisnik merged commit 0d4c9c8 into lightspeed-core:main Oct 23, 2025
14 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Jan 15, 2026
18 tasks

2 participants