
SearchQA Data Split Configuration and max_completion_tokens Setting #13

@wenny823

Thank you for this excellent project! I have two questions regarding the SearchQA environment configuration:

  1. SearchQA Data Split
    In SkillOpt/configs/searchqa/default.yaml, I noticed train_size: 400.
    Could you clarify how the SearchQA dataset is split into training, validation, and test sets? Specifically, is it:
      Training set: 400 samples (randomly sampled?)
      Validation set: 200 samples?
      Test set: 1400 samples?
    Is this split fixed, or is it randomly resampled on each run?

  2. max_completion_tokens=512 Truncation Concern
    In SkillOpt/skillopt/envs/searchqa/rollout.py, the rollout sets max_completion_tokens=512.
    Does this limit often truncate the agent's output during rollouts? Have you observed significant truncation in practice?
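For context, this is roughly how I would try to measure truncation on my side. It is a minimal sketch, assuming the rollout calls an OpenAI-compatible Chat Completions endpoint, where a choice whose finish_reason is "length" was cut off by the token limit. The helper name truncation_rate and the sample values are hypothetical and not part of SkillOpt.

```python
from collections import Counter
from typing import Iterable


def truncation_rate(finish_reasons: Iterable[str]) -> float:
    """Return the fraction of completions that stopped at the token limit.

    Hypothetical helper: assumes each item is the finish_reason of one
    Chat Completions choice, where "length" means the output was cut off
    by max_completion_tokens (or max_tokens).
    """
    counts = Counter(finish_reasons)
    total = sum(counts.values())
    if total == 0:
        # No completions observed, so nothing was truncated.
        return 0.0
    return counts["length"] / total


if __name__ == "__main__":
    # Hypothetical finish reasons collected from a few rollout steps.
    reasons = ["stop", "length", "stop", "tool_calls", "stop"]
    print(f"truncated: {truncation_rate(reasons):.0%}")  # truncated: 20%
```

In a real rollout, the reasons would come from response.choices[0].finish_reason on each call, so a high rate would suggest that 512 tokens is too tight for the agent's reasoning plus its answer.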

Thank you for your time and clarification!