Reproduction Results Show Significant EM Difference in Iter1 Compared to Paper #51

@JayPritchet

Description

Hi, thanks for your awesome work!

I'm trying to reproduce the paper's results with CodeGen-350M-mono and noticed some discrepancies in the metrics, particularly the EM score at Iter1.

Paper Results:

| Metric | Infile | Iter1 | Iter2 | Iter3 | Iter4 |
|--------|--------|-------|-------|-------|-------|
| EM     | 22.19  | 31.75 | 33.88 | 33.75 | 33.81 |
| ES     | 52.24  | 59.82 | 61.03 | 60.96 | 61.06 |

My Reproduction Results:

| Metric | Infile | Iter1 | Iter2 | Iter3 | Iter4 |
|--------|--------|-------|-------|-------|-------|
| EM     | 22.25  | 33.12 | 34.06 | 34.12 | 33.81 |
| ES     | 52.23  | 60.33 | 60.94 | 61.23 | 60.92 |
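For context on how I'm reading the numbers: here is a minimal sketch of how I understand EM and ES to be computed. I'm assuming EM is an exact match after whitespace normalization and ES is a Levenshtein-based edit similarity scaled to 0-100 (one common normalization, in the spirit of `fuzz.ratio`); the function names are my own, not the repo's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def exact_match(pred: str, target: str) -> bool:
    """EM: exact token-sequence match after whitespace normalization."""
    return pred.split() == target.split()

def edit_similarity(pred: str, target: str) -> float:
    """ES: edit distance normalized by the longer string, scaled to 0-100
    (assumed definition; the paper may normalize differently)."""
    denom = max(len(pred), len(target))
    if denom == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(pred, target) / denom)
```

If the repo scores ES with a different normalization (e.g. summed lengths), small per-sample differences like the ones above could compound across the benchmark.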

My Hyperparameters:

  • max_retrieval_length = 900
  • window_size = 20
  • slice_size = 2
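For reference, my understanding is that `window_size` and `slice_size` control a sliding-window pass over repository files, with `slice_size` acting as the overlap factor (stride = `window_size // slice_size`). This is a sketch of that assumed interpretation; the function and variable names are illustrative, not the repo's actual API.

```python
def build_windows(lines, window_size=20, slice_size=2):
    """Split a file's lines into overlapping retrieval windows.

    Each window spans `window_size` lines; the start advances by
    window_size // slice_size lines per step (assumed meaning of
    slice_size). Returns (window_text, start_line) pairs.
    """
    step = max(1, window_size // slice_size)
    windows = []
    for start in range(0, max(1, len(lines)), step):
        chunk = lines[start:start + window_size]
        if chunk:
            windows.append(("\n".join(chunk), start))
    return windows
```

If the defaults for these values changed between commits, retrieval candidates (and hence Iter1 scores) would shift even with identical prompts.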

Observations:

  1. The Infile metrics are very close to the paper results (EM: 22.25 vs 22.19, ES: 52.23 vs 52.24)
  2. However, EM at Iter1 differs significantly (33.12 vs 31.75, a gap of 1.37 points)
  3. Other iterations show smaller but noticeable variations in both EM and ES metrics

I'm running the code from the /zfj/RepoCoder branch for my tests. The repository has received a number of updates since its initial commit, which leads me to wonder:

  • Could these recent commits be the primary reason for the elevated performance metrics I'm seeing?
  • If not, what other factors should I investigate? (For example, were there specific hyperparameters or environmental settings used for the paper's experiments that differ from the current defaults?)

Thank you for your time and assistance!
