Optimise Google Scholar indexation #798

Closed
pronguen opened this issue Mar 9, 2022 · 0 comments · Fixed by #866
Assignees
Labels
client request · enhancement (Enhancement of an existing feature) · f: data migration (Data migration from a legacy system or a previous version) · p-High (To set a high priority!)

Comments

@pronguen
Contributor

pronguen commented Mar 9, 2022

Current behaviour

Our service RERO DOC, which is well indexed by Google Scholar, is progressively being replaced by SONAR, which is used by various universities in Switzerland.
Some documents in SONAR are not indexed by Google Scholar, for example: https://sonar.ch/global/documents/312841

Diagnosis from Google:

  • there is no way for the crawler to get a list of all the individual records

Wanted behaviour

All documents are indexed by Google Scholar.

To do

  1. Add a sitemap that is updated daily or weekly, and list its URL in robots.txt. Then contact Google again so that the crawler can be configured to use it to get the list of document URLs. If individual sitemap files are larger than 5MB, it is best to split them into files of at most 5MB each and use a sitemap index file to list the sitemaps. For an example, see https://hal.inria.fr/robots, which lists the sitemap https://hal.inria.fr/robots/sitemap.
    • To clarify: for a dedicated repository, should we present the dedicated URL, the global URL, or both?
  2. Display the full abstract for users arriving from Google Scholar and Google web search (no "Show more" option).
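Step 1 can be sketched in Python, assuming (as the flask-sitemap remark suggests) a Python/Flask stack. The helpers below are hypothetical and not SONAR's actual implementation: `build_sitemaps` takes `(url, lastmod)` pairs and starts a new sitemap file whenever the next entry would push the current file over the byte limit (5MB by default, as suggested above) or over the 50,000-URL cap of the sitemaps.org protocol; `build_sitemap_index` then lists the resulting files in a sitemap index. Where the files are published, and therefore which URLs go into the index, is left to the deployment.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS = 50_000             # per-file URL cap from the sitemaps.org protocol
MAX_BYTES = 5 * 1024 * 1024   # 5MB per file, the size suggested in this issue


def url_entry(loc, lastmod=None):
    """Render one <url> element; loc is XML-escaped (&, <, >)."""
    parts = [f"<url><loc>{escape(loc)}</loc>"]
    if lastmod:
        parts.append(f"<lastmod>{lastmod}</lastmod>")
    parts.append("</url>\n")
    return "".join(parts)


def build_sitemaps(entries, max_bytes=MAX_BYTES, max_urls=MAX_URLS):
    """Split (loc, lastmod) pairs into <urlset> documents within both limits.

    Sizes are counted in UTF-8 bytes. A single entry larger than max_bytes
    on its own is still emitted, alone in its file.
    """
    opening = f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n'
    closing = "</urlset>\n"
    overhead = len((opening + closing).encode("utf-8"))
    sitemaps, current, size = [], [], overhead
    for loc, lastmod in entries:
        entry = url_entry(loc, lastmod)
        entry_size = len(entry.encode("utf-8"))
        # Close the current file if this entry would break either limit.
        if current and (size + entry_size > max_bytes or len(current) >= max_urls):
            sitemaps.append(opening + "".join(current) + closing)
            current, size = [], overhead
        current.append(entry)
        size += entry_size
    if current:
        sitemaps.append(opening + "".join(current) + closing)
    return sitemaps


def build_sitemap_index(sitemap_urls):
    """Render a <sitemapindex> that lists the URL of each sitemap file."""
    items = "".join(
        f"<sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{items}</sitemapindex>\n'
```

Each string returned by `build_sitemaps` would be written to its own file, and the URL of the index file is the one to put on a `Sitemap:` line in robots.txt. A daily or weekly job can simply regenerate everything, which keeps the logic stateless at the cost of rewriting unchanged files.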

Remarks

Maybe look at this: https://github.com/inveniosoftware/flask-sitemap

@pronguen pronguen added enhancement Enhancement of an existing feature p-High To set a high priority! labels Mar 9, 2022
@PascalRepond PascalRepond added the f: data migration Data migration from a legacy system or a previous version label Mar 23, 2022
@Garfield-fr Garfield-fr self-assigned this Jun 15, 2022
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Jun 27, 2022
* Implements sitemap generation for global and dedicated organisations.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Jun 27, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Jun 28, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Jun 28, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
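The dynamic robots.txt generation described in this commit can be sketched as follows; it also illustrates one possible answer to the open question in step 1 (dedicated or global URL). This is a hypothetical sketch, not the code from the commit: it assumes that each dedicated organisation's `serverName` is a plain lowercase hostname, and that every host serves its own sitemap at `/sitemap.xml`. Neither that path nor the function names come from SONAR.

```python
SITEMAP_PATH = "/sitemap.xml"  # hypothetical path, not taken from SONAR


def normalise_host(host):
    """Lowercase a Host header value and drop a :port suffix.

    IPv6 literals such as "[::1]:5000" are not handled by this sketch.
    """
    return host.split(":", 1)[0].strip().lower()


def robots_txt(host, server_names, global_host="sonar.ch", scheme="https"):
    """Build robots.txt content for the host that received the request.

    server_names: collection of lowercase hostnames configured as an
    organisation's serverName (dedicated instances). Requests on any other
    host point crawlers at the global sitemap instead.
    """
    host = normalise_host(host)
    target = host if host in server_names else global_host
    lines = [
        "User-agent: *",
        "Disallow:",  # an empty Disallow value blocks nothing
        f"Sitemap: {scheme}://{target}{SITEMAP_PATH}",
    ]
    return "\n".join(lines) + "\n"
```

In a Flask application, such a function would typically back a `/robots.txt` route, called with `request.host` and returned with a `text/plain` mimetype, so that each dedicated domain advertises its own sitemap.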
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Aug 10, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit that referenced this issue Aug 16, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes #798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Aug 17, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit that referenced this issue Aug 17, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes #798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit to Garfield-fr/sonar that referenced this issue Aug 18, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes rero#798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
Garfield-fr added a commit that referenced this issue Aug 18, 2022
* Adds a new field serverName on organisation resource.
* Implements sitemap generation for global and dedicated organisations.
* Adds a cli to generate sitemap files.
* Adds a task to generate sitemap files.
* Implements generation of the file robot.txt dynamically.
* Closes #798.

⚠️  ES Update mapping

Co-Authored-by: Bertrand Zuchuat <bertrand.zuchuat@rero.ch>
3 participants