Random subsets for reproduction

Thank you for your wonderful work!

I just ran `run_pipeline.py` and got a missing-file error:
```
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/random-api-completion.test.jsonl'
```

As far as I understand, the `random-` prefix denotes the randomly sampled evaluation subsets described in Section 3 of the original paper:
> Eventually, a total of 1600 test samples are generated for the line completion dataset.

and
> From these candidates, we then randomly select 200 non-repetitive API invocations from each repository, resulting in a total of 1600 test samples for the API invocation completion dataset. 
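Since 200 samples per repository add up to 1600, this implies eight repositories. For illustration only, here is how I imagine such a per-repository sample could be drawn. It is a minimal sketch assuming each candidate is a dict with hypothetical `repo` and `target` fields and an arbitrary seed; the paper does not state the seed or the data schema, so this cannot recover the exact released samples, which is why I'm asking about the files themselves.

```python
import random
from collections import defaultdict


def sample_per_repo(candidates, per_repo=200, seed=0):
    """Draw up to `per_repo` non-repetitive samples from each repository.

    `candidates` is a list of dicts with hypothetical keys 'repo' and
    'target' (the line or API invocation to complete). These names are
    illustrative assumptions, not the repository's actual schema.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_repo = defaultdict(list)
    seen = set()
    for cand in candidates:
        key = (cand["repo"], cand["target"])
        if key in seen:  # skip targets already seen in this repository
            continue
        seen.add(key)
        by_repo[cand["repo"]].append(cand)

    subset = []
    for repo in sorted(by_repo):  # stable repository order across runs
        pool = by_repo[repo]
        subset.extend(rng.sample(pool, min(per_repo, len(pool))))
    return subset
```

With eight repositories that each have at least 200 unique candidates, this returns 8 × 200 = 1600 samples; a different seed (or a different candidate order) yields a different subset, so published numbers are only comparable on the exact released files.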

For reproduction purposes, I would like to ask about the following four subsets referenced in `utils.py`:
```python
class FilePathBuilder:
    api_completion_benchmark = 'datasets/random-api-completion.test.jsonl'
    random_line_completion_benchmark = 'datasets/random-line-completion.test.jsonl'
    # short version for codegen
    short_api_completion_benchmark = 'datasets/random-api-completion-short-version.test.jsonl'
    short_random_line_completion_benchmark = 'datasets/random-line-completion-short-version.test.jsonl'
```
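For anyone else hitting this, a quick way to see which of the four files are missing locally. The class body is copied from the snippet above so this runs on its own; the helper `missing_benchmarks` is a hypothetical name of mine, and the check assumes the paths are resolved relative to the directory `run_pipeline.py` is launched from.

```python
from pathlib import Path


class FilePathBuilder:
    # Attribute values copied from utils.py as quoted above.
    api_completion_benchmark = 'datasets/random-api-completion.test.jsonl'
    random_line_completion_benchmark = 'datasets/random-line-completion.test.jsonl'
    short_api_completion_benchmark = 'datasets/random-api-completion-short-version.test.jsonl'
    short_random_line_completion_benchmark = 'datasets/random-line-completion-short-version.test.jsonl'


BENCHMARK_ATTRS = [
    'api_completion_benchmark',
    'random_line_completion_benchmark',
    'short_api_completion_benchmark',
    'short_random_line_completion_benchmark',
]


def missing_benchmarks(root='.'):
    """Return the benchmark paths that are not regular files under `root`."""
    base = Path(root)
    paths = [getattr(FilePathBuilder, name) for name in BENCHMARK_ATTRS]
    return [p for p in paths if not (base / p).is_file()]


if __name__ == '__main__':
    for path in missing_benchmarks():
        print('missing:', path)
```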
