Dataset comparison - USPTO full vs USPTO MIT #6
Comments
Just to add one thing: in ref 33 (AutoSynRoute), NeuralSym is evaluated at 47.8% on the MIT dataset, while Dai et al. evaluated NeuralSym at 35.8% on the full dataset. Given that both groups implemented NeuralSym in the same way, it seems the full dataset is considerably harder to perform well on.
Hello, While I understand your concern, I would not fully concur with the assertion that "the evaluation on a dataset 2x the size is surely more difficult." The relationship between dataset size and accuracy is multifaceted and depends on several factors, including the quality of the additional data and the training procedure employed. To illustrate, one could argue that evaluation on the USPTO-MIT dataset should be considerably more challenging than on the well-curated USPTO-50K dataset, given that the former is roughly ten times larger. The discrepancy between the NeuralSym results you mention could potentially be explained by the USPTO-full dataset being noisier than the USPTO-MIT dataset.

As you point out, the lack of consensus on dataset selection can indeed lead to inconsistencies when comparing studies. We acknowledge this issue with a footnote stating that some of the results in the comparison table are derived from either the USPTO-full or the USPTO-MIT dataset.
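(Editorial note for context: the single-number results being compared here are typically top-k exact-match accuracies on canonicalized SMILES, so figures such as 47.8% and 35.8% are only directly comparable if both evaluations canonicalize and match predictions in the same way. Below is a minimal, purely illustrative sketch of such an evaluation using RDKit; it is not the evaluation code of this repository, and the function names are made up.)

```python
from rdkit import Chem

def canonicalize(smiles):
    """Return the RDKit-canonical SMILES, or None if the string does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def top_k_accuracy(predictions, targets, k):
    """Illustrative sketch, not the evaluation code of this repository.

    predictions: list of ranked candidate reactant SMILES per example.
    targets: list of ground-truth reactant SMILES.
    An example counts as correct if the canonicalized target appears
    among the first k canonicalized candidates.
    """
    hits = 0
    for candidates, target in zip(predictions, targets):
        target_can = canonicalize(target)
        top_k = {canonicalize(c) for c in candidates[:k]}
        if target_can is not None and target_can in top_k:
            hits += 1
    return hits / len(targets)
```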
Hello, Thank you very much for your prompt reply. In Dai et al., they say that:
Possibly, this is an indication that this dataset is harder to perform well on (as it was curated by Dai et al. themselves). Regarding the footnote, are you referring to this?
If so, there are no superscripts or subscripts indicating which of the algorithms were evaluated on USPTO-full and which on USPTO-MIT. That is why I got confused, and other readers possibly will be, too. The performance difference between the two datasets becomes more evident in the paper "Root-aligned SMILES: a tight representation for chemical reaction prediction" by Zhong et al., where we can see a performance decrease of 12-15%, which is rather large. Of course, your approach might not exhibit the same drop; however, I still believe that the comparison in Table 3 is misleading for the reader.
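(Editorial note: one simple, purely illustrative way to probe the "noisier dataset" hypothesis raised above would be to count reactions that fail basic sanity checks, for example reactant or product SMILES that do not parse, or reactions whose product already appears among the reactants. Neither paper reports exactly this statistic; the sketch below only shows what such a check could look like.)

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence per-molecule parse warnings

def noise_report(reaction_smiles):
    """Illustrative only: simple noise indicators for 'reactants>agents>product'
    reaction SMILES, not a statistic reported by either paper."""
    unparsable = trivial = 0
    for rxn in reaction_smiles:
        try:
            reactants, _, product = rxn.split(">")
        except ValueError:  # not exactly three '>'-separated fields
            unparsable += 1
            continue
        r_mols = [Chem.MolFromSmiles(s) for s in reactants.split(".")]
        p_mol = Chem.MolFromSmiles(product)
        if p_mol is None or any(m is None for m in r_mols):
            unparsable += 1
            continue
        if Chem.MolToSmiles(p_mol) in {Chem.MolToSmiles(m) for m in r_mols}:
            trivial += 1  # product already present among the reactants
    return {"total": len(reaction_smiles), "unparsable": unparsable, "trivial": trivial}
```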
Hi, Regarding the footnote, I agree that we should have been more explicit in indicating which methods were evaluated on the USPTO-full and USPTO-MIT datasets. The current form of the table may not be sufficiently clear for readers. Additionally, I appreciate the reference to the paper by Zhong et al., which highlights the performance difference between the two datasets. Best.
Hi, Thanks for the clarification. Again, well done on this very interesting framework.
Hi,
this is a very interesting approach. Well done!
I have one concern regarding the comparison with other existing methods. For evaluation, you made use of the USPTO-MIT dataset, which is commonly used for (forward) reaction prediction. I saw that the self-correcting transformer and AutoSynRoute used the same dataset. However, other retrosynthesis algorithms such as GLN, AT, Retrosim and Retroprime were trained and evaluated on a dataset double the size (USPTO-full, curated by Dai et al.).
Would you not agree that a comparison here is a bit unfair as the evaluation on a dataset 2x the size is surely more difficult?
Thank you for clarifying this.