Not sure how to get in touch with the authors, but my primary concern is that Common Crawl / SPHERE does not cover many scientific publications or monographs, which of course you point out yourselves. There may be exceptions in some disciplines, for instance ACL and arXiv, but I don't think Common Crawl contains much of what is available through open access (e.g. via Unpaywall), though I could be wrong about that. I also acknowledge that closed-access sources and print monographs make verification hard! But perhaps this is why users prefer the sources SIDE suggests to the original ones.
Either way, SIDE is limited to documents available on Common Crawl. Since I would guess that scientific publications and monographs make up many, if not most, Wikipedia citations, I am wondering what share of existing Wikipedia citations is covered by SPHERE. I know this is a big ask, but I would have a hard time trusting this system if only a small percentage of citations are retrievable, not to mention the systematic bias introduced by excluding documents not available on Common Crawl. I am particularly interested because many of the setups in the paper rely on the original citation as an anchor.
Thanks for putting out this system and considering my request!
We only considered references corresponding to web pages, but Wikipedia also cites books, scientific articles and other kinds of documents. These include modalities other than text, such as images and videos. To fully assess the quality of Wikipedia references, Side needs to become multi-modal.
Note that Side is a proof of concept (POC) that shows the technology is there. To build a production system, there is still a lot to do. :)
Thanks @fabiopetroni - I work in the same domain, doing similar things to SIDE but with scientific articles (at scite.ai), so if you are looking for collaborators in that space, let me know.