Please specify citation coverage #4

Closed
domenicrosati opened this issue Jul 11, 2022 · 2 comments

Comments

@domenicrosati

domenicrosati commented Jul 11, 2022

Hey there,

I'm not sure how to get in touch with the authors, but my primary concern is that Common Crawl / Sphere does not cover many scientific publications or monographs, which of course you point out yourself. There may be exceptions in some disciplines, for instance ACL and arXiv, but I don't think Common Crawl comprises much of what is available through open access (via Unpaywall, for example), though I could be wrong about that. I also acknowledge that closed-access sources and print monographs make verification hard! But perhaps this is why users prefer the sources SIDE suggests to the originals.

Either way, SIDE is limited to documents available on Common Crawl. Since I would guess that scientific publications and monographs make up many, if not most, Wikipedia citations, I am wondering what share of existing Wikipedia citations is covered by Sphere. While this is a big ask, I would have a hard time trusting this system if only a small percentage of citations are retrievable, not to mention the systematic bias introduced by excluding documents not available on Common Crawl. I am particularly interested because many of the setups in the paper rely on the original citation as an anchor.
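
To make the request concrete, here is a minimal sketch of how such a coverage number could be estimated, assuming one has a list of URLs cited on Wikipedia and a list of URLs present in the index. The function names `normalize_url` and `citation_coverage` are hypothetical and not part of SIDE or Sphere, and the URL normalization is deliberately crude (it ignores scheme, `www.`, trailing slashes, and query strings).

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a rough comparison key (hypothetical helper).

    Keeps the lowercased host without a leading 'www.' and the path
    without a trailing slash; scheme, query, and fragment are dropped.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path


def citation_coverage(cited_urls, indexed_urls) -> float:
    """Return the fraction of cited URLs whose normalized form is in the index."""
    index = {normalize_url(u) for u in indexed_urls}
    cited = [normalize_url(u) for u in cited_urls]
    if not cited:
        return 0.0
    covered = sum(1 for u in cited if u in index)
    return covered / len(cited)
```

Citations without a URL (print monographs, for example) would simply count as uncovered under this definition, which is the bias I am worried about.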

Thanks for putting out this system and considering my request!

@fabiopetroni
Contributor

Hey @domenicrosati,

Thanks a lot for your message.

We only considered references corresponding to web pages, but Wikipedia also cites books, scientific articles and other kinds of documents. These include modalities other than text, such as images and videos. To fully assess the quality of Wikipedia references, Side needs to become multi-modal.

Note that Side is a proof of concept (POC) showing that the technology is there. There is still a lot to do to build a production system. :)

@domenicrosati
Author

Thanks @fabiopetroni. I work in this domain, doing similar things to SIDE but with scientific articles (at scite.ai), so if you are looking for collaborators in that space, let me know.
