@dev-jonathan
Contributor

@dev-jonathan dev-jonathan commented Oct 28, 2025

Issue Link / Problem Description

Changes Made

  • Added metadata fields to dataset_schema.py:
    • persona_name: Optional[str]
    • query_style: Optional[str]
    • query_length: Optional[str]
  • Updated single_hop/base.py to populate these fields during synthetic data generation:
    return SingleTurnSample(
        user_input=response.query,
        reference=response.answer,
        reference_contexts=[reference_context],
        persona_name=getattr(scenario.persona, "name", None),
        query_style=getattr(scenario.style, "name", None),
        query_length=getattr(scenario.length, "name", None),
    )
  • Updated class documentation with descriptions for new fields
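
For readers without the diff open, the pattern above can be sketched in isolation. This is a minimal illustration, not the ragas source: `SampleSketch` and `build_sample` are hypothetical stand-ins for `SingleTurnSample` and the synthesizer method, and the scenario objects are plain namespaces. It shows why `getattr(..., "name", None)` keeps the new fields optional when a scenario attribute is missing.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List, Optional


@dataclass
class SampleSketch:
    """Hypothetical stand-in for SingleTurnSample, reduced to the fields in this PR."""

    user_input: str
    reference: str
    reference_contexts: List[str]
    # New metadata fields default to None, so existing callers keep working.
    persona_name: Optional[str] = None
    query_style: Optional[str] = None
    query_length: Optional[str] = None


def build_sample(scenario, response, reference_context: str) -> SampleSketch:
    """Hypothetical helper mirroring the return statement shown in the PR."""
    return SampleSketch(
        user_input=response.query,
        reference=response.answer,
        reference_contexts=[reference_context],
        # getattr with a default returns None when the attribute (or the
        # object itself) is absent, instead of raising AttributeError.
        persona_name=getattr(scenario.persona, "name", None),
        query_style=getattr(scenario.style, "name", None),
        query_length=getattr(scenario.length, "name", None),
    )


# Usage: a scenario with a persona and a style, but no length set.
scenario = SimpleNamespace(
    persona=SimpleNamespace(name="Student"),
    style=SimpleNamespace(name="POOR_GRAMMAR"),
    length=None,
)
response = SimpleNamespace(query="What is Python?", answer="A programming language.")
sample = build_sample(scenario, response, "Python is a language...")
```

Because the fields default to `None`, code that constructs samples without metadata is unaffected.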

Testing

How to Test

  • Manual testing steps:
    1. Run synthetic data generation using SingleHopQuerySynthesizer
    2. Verify metadata fields are properly populated in generated samples
    3. Confirm values match the scenario settings (persona, style, length)
    4. Check backwards compatibility with existing code

References

Screenshots/Examples

# Example of generated sample with metadata:
{
    "user_input": "What are the key features of Python?",
    "reference": "Python is a versatile programming language...",
    "persona_name": "Student",
    "query_style": "POOR_GRAMMAR",
    "query_length": "MEDIUM"
}
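
As a rough illustration of what the preserved metadata makes possible, the sketch below tallies generated samples by persona and style. It assumes the samples have already been exported as plain dicts shaped like the example above; `tally_by` is a hypothetical helper, not part of the library.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def tally_by(samples: Iterable[Dict[str, Optional[str]]], field: str) -> Counter:
    """Count samples per value of a metadata field; missing values count as None."""
    return Counter(sample.get(field) for sample in samples)


# Illustrative data only; the second sample predates the metadata fields.
samples = [
    {"user_input": "q1", "persona_name": "Student", "query_style": "POOR_GRAMMAR"},
    {"user_input": "q2"},
    {"user_input": "q3", "persona_name": "Student", "query_style": "PERFECT_GRAMMAR"},
]

persona_counts = tally_by(samples, "persona_name")
style_counts = tally_by(samples, "query_style")
```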

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Oct 28, 2025
Member

@anistark anistark left a comment

Thanks for the PR @dev-jonathan

Overall looks good.

Could you please also add tests to verify this?

Also, what do you think about similar changes for MultiHop?

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Oct 28, 2025
@dev-jonathan
Contributor Author

Update: Metadata field tests and CI/CD performance fix

Added tests for new metadata fields in SingleTurnSample:

  • persona_name, query_style, query_length

New simple tests:

  1. test_generate_sample_includes_metadata - Verifies SingleHopQuerySynthesizer correctly populates metadata fields in SingleTurnSample
  2. test_single_turn_sample_metadata_roundtrip_hf_and_jsonl - Ensures fields serialize/deserialize correctly in EvaluationDataset (HF/JSONL)
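
The idea behind the roundtrip test can be sketched with the standard library alone. This is not the actual test and does not call the `EvaluationDataset` API; `to_jsonl` and `from_jsonl` are hypothetical helpers that only show the property being checked: metadata fields, including `None` values, survive serialization unchanged.

```python
import json
from typing import Any, Dict, List


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows to JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def from_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


rows = [
    {
        "user_input": "What are the key features of Python?",
        "persona_name": "Student",
        "query_style": "POOR_GRAMMAR",
        "query_length": "MEDIUM",
    },
    # A sample without metadata: None must roundtrip as None (JSON null).
    {
        "user_input": "q2",
        "persona_name": None,
        "query_style": None,
        "query_length": None,
    },
]

restored = from_jsonl(to_jsonl(rows))
```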

Fixed Windows CI performance test failure:

Issue: test_performance_find_n_indirect_clusters_large_web_constant_n was failing on Windows CI, most likely due to timing fluctuations.

Solution:

  • Increased micro-time skip threshold from 1e-6 to 1e-4 (100 microseconds)
  • Added tolerance factors similar to other performance tests in the file:
    • tolerance_factor = 3.0 for very fast operations
    • tolerance_factor = 2.0 for larger operations
  • Updated error message to be clearer about thresholds
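
One way to read the threshold and tolerance changes above, as a minimal sketch: the actual test body is not shown in this thread, so `timing_ok`, `FAST_CUTOFF`, and the ratio-based comparison are assumptions made for illustration. Only the values `1e-4`, `3.0`, and `2.0` come from the description.

```python
from typing import Optional

MICRO_TIME_THRESHOLD = 1e-4  # from the PR: below this, timings are treated as noise
FAST_CUTOFF = 1e-3           # hypothetical boundary between "very fast" and "larger"


def tolerance_for(elapsed: float) -> float:
    """Looser tolerance for very fast operations, where timer jitter dominates."""
    return 3.0 if elapsed < FAST_CUTOFF else 2.0


def timing_ok(prev_time: float, curr_time: float, expected_ratio: float) -> Optional[bool]:
    """Return None when the check is skipped, else whether the growth in
    runtime stays within expected_ratio scaled by the tolerance factor."""
    if prev_time < MICRO_TIME_THRESHOLD or curr_time < MICRO_TIME_THRESHOLD:
        return None  # too fast to measure reliably, skip the comparison
    limit = expected_ratio * tolerance_for(prev_time)
    return (curr_time / prev_time) <= limit


skipped = timing_ok(5e-5, 1.0, 1.0)       # previous timing is under the threshold
fast_case = timing_ok(2e-4, 5e-4, 1.0)    # ratio 2.5 against a limit of 3.0
slow_case = timing_ok(0.01, 0.025, 1.0)   # ratio 2.5 against a limit of 2.0
```

Skipping sub-threshold timings avoids dividing by values that are mostly timer noise, which is a plausible source of the Windows flakiness described above.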

Note: I'm not 100% certain this tolerance is perfect, but the test suite now passes consistently. If you think I should adjust the limits or use a different approach, please let me know.

Next steps:

I plan to add similar coverage for multi-hop queries in a follow-up, but I ran into local execution errors that made it difficult to implement in this PR.

Test status:

All related tests pass locally:

  • test_generate_sample_includes_metadata
  • test_single_turn_sample_metadata_roundtrip_hf_and_jsonl
  • test_performance_find_n_indirect_clusters_large_web_constant_n (with adjusted tolerance)

If you need any changes, please let me know and I'll update accordingly.

@anistark anistark merged commit 35e884b into explodinggradients:main Oct 29, 2025
10 checks passed
Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Testset generator not preserving persona and scenario metadata in generated samples

2 participants