OMP Chapter - Googlescholar support #7003
Comments
Hi @asmecher and @NateWr , @ajnyga @mwestin-googlescholar and @gemusehandler |
I think we have three separate issues here:
For 2 my suggestion would be to add an independent handler that would look like Concerning 3, Ubiquity seems to redirect URLs. For example https://www.ubiquitypress.com/site/books/10.5334/bcj is used in the navigation and is registered with Crossref DOIs and probably in other places too. But clicking it will redirect to https://www.ubiquitypress.com/site/books/e/10.5334/bcj, thus giving Google the information they need. We have asked whether this data could be in the metadata of the HTML page, but Google informed us that they need the information before they start to read the page content. |
Older issue on landing pages here: #5280 |
I'm sorry but I think we need to push back on the URL requirements to distinguish between monographs and edited volumes. Google Scholar should not be dictating the URL structure of any site. They can ask for an HTTP address for every link they want to provide (e.g. a URL for every chapter or galley), but setting requirements on the structure of the URL, beyond the constraints of the DNS system, violates the core principles of HTTP. |
Hi @NateWr, maybe the way I wrote the specs gave the impression that Google Scholar dictated the structure. @ajnyga and I thought that it could be a proposal. GS actually wanted a marker in the URL to indicate whether the item is a chapter / edited volume or a monograph, and their only restriction is that they can't distinguish this information from the metadata. |
Hi @NateWr , Many platforms and publishers, from Ubiquity to MIT Press and Taylor & Francis, have added URL markers distinguishing chapters from books and standalone monographs from edited volumes where chapters are standalone content. The string is entirely up to the publisher/platform-- I had suggested what other publishers have put in place that works well for indexing. Adding various path markers to item-level URLs is totally unrelated to HTTP protocols. We see many different types of URL markers for publisher sites, both related and unrelated to indexing. Cheers, |
Hello @NateWr, @mwestin-googlescholar and @withanage, Nate, I respect your position. If we stick to it (Google Scholar not telling us what to do) then I see two simple solutions. In the administration I could use "monograph" as the path. Using the same URL I could add a second press where I would use "edited-volume". It is a bit clumsy to have two presses where there could be one, but hey, it would work, right? But I see another solution as well. I spotted that the PDF reader adds some additional information to the download URL: there is ?inline=1 at the end. I played around a bit with the Google Scholar plugin and noticed that I can add _monograph to the download link as well. My browser doesn't warn of any errors and the link that is produced works as it should. If this would indeed work then we need some code that tells the plugin: I get the feeling that this issue can be resolved within the plugin... However, now I bumped into something strange. But the URLs that appear in the "meta name" from book 12 on and higher produce a 404 page. So I have two questions. Is the way that the download link is generated in the Scholar Plugin correct? |
No apology needed, @withanage, your description was perfectly clear. My concern is not with the specific proposal ( URLs are an important part of UX, and something that we are actively trying to improve (#5932). URLs need to be clear and concise for humans, and Google Scholar's insistence on a fixed string to parse data from a URL is not compatible with what we need to do for our community. At the moment, our URLs are hard-coded in English. But this will not always be the case. A Spanish journal or press needs to be able to run their site with URLs in Spanish. Eventually, we will support localised routes, such as In the long-term, it is not viable for us to fix the URL structure to an English language word ( The expectation that Google Scholar can identify a resource from a URL structure is premised on a publishing oligopoly in which there are a small number of publishing systems with fixed URLs. Unfortunately, this is not compatible with our vision of a distributed, multilingual publishing infrastructure. @mwestin-googlescholar it's my understanding from talking with colleagues that Google Scholar is not willing to budge on this point. But from my perspective the clock is running out. If GS is unwilling to prioritize adaptations that are important to us, for example removing the requirement to include |
Hi all-- totally understood if this isn't a feature you want to pursue for OMP. As I had mentioned, the string for URL markers for book indexing could be any string you like, i.e. no need for using any particular language. If you'd ever like to explore this as a future feature for OMP, just let me know. Cheers, |
@mwestin-googlescholar, this is a feature we'd like to pursue -- but we're looking for alternatives to deriving the kind of information you need from URLs, when that's a big imposition on our community and is at odds with longer-term goals. We can't and don't want to exert control over our community; the |
Hi @asmecher, Happy to speak further about this, but the short version is that the only way to set this up well and consistently is within the book/chapter-level URLs themselves. To handle different types of books, the indexing system needs to know how to treat the item before it analyzes the metadata. If this ends up being a direction your community wants to go in, happy to pick this conversation back up. |
Hi @mwestin-googlescholar, I'm following this thread with much attention. I have been an active member of the PKP community for the last decade, which let me join the PKP technology committee as an "at-large" member. I provide support on local networks (Spain) as well as on Latin American ones, and I offer support to the community in the forum. I don't intend to present a CV here, but I thought it was necessary to explain where I am speaking from when I say that I know the needs of the community well. That said, I want to note that there is no IF in "this ends up being a direction your community wants to go in". |
Hi @marcbria, That is great to hear! Just to be crystal clear, this direction would need to involve URL markers in the URL paths themselves. As I mentioned a few times, always happy to continue the conversation. |
Sorry @mwestin-googlescholar but my English is not good and I'm probably missing something. Does it mean that Google is telling the PKP community how things need to be done? Because "need to involve" does not sound like a good start for an open conversation. |
I think we're talking in circles a bit here. Let me put it a different way: to systematically and accurately distinguish chapters from books, and different book types from each other, the indexing system can only work with URL markers. Unfortunately there isn't another way that works well. I wish there were, as I know adding URL markers can be difficult for a few different reasons. To be clear: the indexing system will still index OMP publications without these URL markers. This is a potential future improvement project to further refine how we work together to index books and chapters of different kinds. There is no pressure at all from my side to implement this refinement - I am sharing answers to questions I've been asked (so the comment above feels a little uncomfortable to me). I hope that clears things up. |
I have been looking at how Springer, Elsevier and Taylor & Francis work with their URLs. I will divide the solutions into a few categories and make a suggestion for how we could work in OMP.
URL markers
Springer and Elsevier do not seem to add URL markers concerning the book type, or I could not find them. Springer (SpringerLink) and Elsevier (ScienceDirect) have independent handlers for showing single chapters. Springer: Taylor and Francis uses
Metadata tags (Highwire Press)
Taylor & Francis will add metadata tags to landing pages:
Effectively this is saying, "for monographs just index the main book page and for edited volumes just index the chapter pages". I will use this idea in my solution below. Elsevier and Springer seem to never add metadata tags to book main landing pages, for either monographs or edited volumes, only to chapters, regardless of whether the book is a monograph or an edited volume. Elsevier also adds All publishers use these (among others) on chapter landing pages to determine that this is a chapter and the relation to a book:
And Elsevier also uses these to define the type of the chapter (could not find anything similar from others):
Sitemaps
Although Taylor & Francis does not add meta tags to monograph chapters, interestingly in the sitemap they only seem to list chapter URLs, also for monographs. I could not find any direct URLs leading to book landing pages at all. But there are several sitemaps, so it could be that I just did not find the right one. The Springer sitemap has URLs leading to book main landing pages. I could not find links leading to single chapters. For SpringerLink I could not find a sitemap, so this is the situation with the main Springer site. The Elsevier sitemap has URLs leading to both books and single articles/chapters.
Suggestion
So what I have heard is that the problem in OMP indexing lies in the fact that we do not distinguish between monographs and edited volumes, and Google does not know how to handle their chapters. I have understood that indexing monographs is working already, but for edited volumes Google would like to only serve hits to single articles/chapters. Altering the URL structure is the solution provided by Google to solve this and is used by Taylor & Francis. However, they also have other solutions in place, namely the way they show metadata in different cases. My suggestion is that OMP would work like this.
OMP core functionalities
OMP Google Scholar plugin functionalities
This should lead to a situation where for monographs only the book main page is being indexed and for edited volumes only the chapters are indexed. edit: of course if I have misunderstood some of the requirements, let me know |
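To make the intended outcome concrete, here is a minimal sketch of the placement rule described above. It is only an illustration in Python, not OMP or plugin code; the names `BookType`, `PageKind` and `should_emit_scholar_tags` are hypothetical, and which meta tags are emitted is deliberately left out.

```python
from enum import Enum


class BookType(Enum):
    MONOGRAPH = "monograph"
    EDITED_VOLUME = "edited_volume"


class PageKind(Enum):
    BOOK_LANDING = "book_landing"
    CHAPTER_LANDING = "chapter_landing"


def should_emit_scholar_tags(book_type: BookType, page_kind: PageKind) -> bool:
    """Return True if the Google Scholar meta tags belong on this page.

    Monographs: tags only on the book landing page.
    Edited volumes: tags only on the chapter landing pages.
    """
    if book_type is BookType.MONOGRAPH:
        return page_kind is PageKind.BOOK_LANDING
    return page_kind is PageKind.CHAPTER_LANDING
```

With this rule a monograph's chapter pages and an edited volume's book page can still exist and be public; they simply carry no Google Scholar meta tags.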
This is great, thanks @ajnyga! So if I understand correctly, the key distinction is where the Google Scholar meta tags are placed. For monographs, they're only placed on the book landing page. For edited volumes, they're only placed on the chapter pages. Is that right?
I don't think we should do this because every public URL should be indexable by general search engines. What Google Scholar needs shouldn't override the general compatibility with search engines. Maybe this has already been discussed, but chapter pages are going to be opt-in for monographs, right? So by default my monograph will only have the one page ( |
Yes, exactly. This is the way T&F is handling it, besides having the URL markers. Elsevier and Springer have tags just on chapter pages for both book types and never on the book landing page. I think this means that they are targeting Google Scholar indexing this way and just try to make sure individual chapters end up there. Yes, I am aware of the downsides of robots rules. We could of course limit them to Googlebot. Too bad there is no separate bot rule for Google Scholar. It would make this very easy and exact. Making chapter landing pages optional for monographs is probably a good idea. Many monographs might just have the chapter metadata available, but no chapter-specific full text. |
At the UB Heidelberg we're working on a chapter landing page plugin for OMP 3.3. For now, all chapters with a 'deposited' or 'marked registered' DOI automatically get their own landing page. But we're discussing making this configurable. The path is /book/book_id/c+chapter_id. Maybe we can make the part after the book_id configurable too. Without these configurable parts we have a working solution on our development system. Only the template is not finished yet. |
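As an illustration of the path scheme mentioned above (/book/book_id/c+chapter_id), the following Python sketch builds and parses such paths with a configurable chapter prefix. This is not the Heidelberg plugin's code; the function names and the `chapter_prefix` parameter are assumptions made for the example.

```python
import re
from typing import Optional, Tuple


def build_chapter_path(book_id: int, chapter_id: int, chapter_prefix: str = "c") -> str:
    """Build a chapter landing page path such as /book/12/c3."""
    return f"/book/{book_id}/{chapter_prefix}{chapter_id}"


def parse_chapter_path(path: str, chapter_prefix: str = "c") -> Optional[Tuple[int, int]]:
    """Return (book_id, chapter_id) for a chapter path, or None if it does not match."""
    # The prefix is escaped so that a configured value cannot change the pattern.
    pattern = rf"/book/(\d+)/{re.escape(chapter_prefix)}(\d+)"
    match = re.fullmatch(pattern, path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Keeping the prefix as a parameter is one way to make "the part after the book_id" configurable without touching the routing logic.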
That sounds interesting, @nongenti. I'd recommend using a full chapter path |
Introduction
Edited volumes require chapter landing pages for exposing metadata and content to UI users, indexing systems and archiving systems, and for other purposes.
Requirements from Google Scholar
Background
Google Scholar (GS) drives end users to the component level, e.g. chapters. GS needs a URL marker to determine the type of the standalone content.
For monographs, the standalone piece of content is the book content and metadata. Chapters are not indexed.
For edited volumes, the standalone content is the chapters. The GS Indexing System (IS) matches the full text of each chapter and drives end users to the chapters, thus increasing ranking.
Requirements
The GS IS needs an indication of the item type in the URL path (not as parameters) and a top-level landing page.
Possible solution after discussions between @withanage, @ajnyga, @mwestin-googlescholar and @gemusehandler
We allow an alternative path for indicating the type. e.g.
catalog/book/19 will always lead to book 19 as in the current implementation
Additionally, URL-marked paths will indicate the type of the content to GS.
Robots.txt needs a URL map indicating those alternative paths.
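The alternative-path idea above can be sketched as a small resolver in which both the plain path and a type-marked path lead to the same book. This is a hedged Python illustration only: the marked path shape `catalog/book/<marker>/<id>` and the marker `m` are assumptions, while `e` follows the Ubiquity Press example; none of this is existing OMP routing.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical markers: "e" follows the Ubiquity Press example, "m" is assumed.
TYPE_MARKERS = {"e": "edited_volume", "m": "monograph"}


class Resolved(NamedTuple):
    book_id: int
    marked_type: Optional[str]  # None when the plain, unmarked path was used


_BOOK_PATH = re.compile(r"catalog/book/(?:(?P<marker>[a-z]+)/)?(?P<id>\d+)")


def resolve(path: str) -> Optional[Resolved]:
    """Resolve a plain or type-marked book path to the same book id."""
    match = _BOOK_PATH.fullmatch(path.strip("/"))
    if match is None:
        return None
    marker = match.group("marker")
    if marker is None:
        # catalog/book/19 keeps working as in the current implementation.
        return Resolved(int(match.group("id")), None)
    if marker not in TYPE_MARKERS:
        return None  # unknown marker: treat as not found
    return Resolved(int(match.group("id")), TYPE_MARKERS[marker])
```

For example, `resolve("catalog/book/19")` and `resolve("catalog/book/e/19")` both point to book 19; only the second carries a type for the indexer.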
Related ticket
Previous work by @ajnyga on chapters
Reference implementation by Ubiquity Press to support GS
Base URL : https://www.ubiquitypress.com/site/books/e/10.5334/bcj
Alternative URL: https://www.ubiquitypress.com/site/books/e/10.5334/bcg/ (e means Edited Volume)