
Automatically add SkoHub blog posts to team site #485

Open
acka47 opened this issue Dec 7, 2022 · 7 comments

@acka47
Contributor

acka47 commented Dec 7, 2022

While working on #484, I've noticed that the last three or four posts from the SkoHub blog are missing at http://lobid.org/product/skohub. (I have added the missing presentations with 36d54d2, though.) As we will be publishing more frequently in the coming months, we should think about automating the addition of these posts.

This could be implemented by either @sroertgen or @fsteeg, I guess.

@acka47 acka47 added this to Backlog in lobid board via automation Dec 7, 2022
@sroertgen
Contributor

So I had a first look, and what we could maybe do is fetch the XML feed of each blog and build the publications from there.
This would then have to happen on the client side, I guess.
Is this the kind of automated addition you have in mind?
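
To make the feed idea concrete, here is a minimal sketch of reading entries out of a feed. It assumes a plain RSS 2.0 document with `<item>` elements; an Atom feed would need namespaced tags instead. The sample XML and the function name `parse_rss_items` are made up for illustration and are not taken from any of the blogs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real feeds may use different elements.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.org/first-post</link>
      <pubDate>Wed, 07 Dec 2022 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/second-post</link>
      <pubDate>Thu, 26 Jan 2023 09:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "pubDate": item.findtext("pubDate", default="").strip(),
        })
    return items


posts = parse_rss_items(SAMPLE_RSS)
```

In a real setup the XML text would come from an HTTP request to each blog's feed URL rather than from a string constant.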

@fsteeg
Member

fsteeg commented Jan 26, 2023

Hm, so I think our goal should be to add files in gatsby/lobid/static/publication to have a uniform database. That could happen from within the repo here, as you describe, by fetching the feeds and creating the files for them here (if that's what you mean).

However, I'd think the cleanest approach would be to keep the creation of these files out of scope for this repo, and instead create them elsewhere. Maybe triggered by a GitHub Action when we push to the blogs, which then calls some conversion and then pushes the files here? Not sure if that makes sense, just some thoughts.
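
As a rough sketch of the conversion step, the following turns one feed entry into a JSON file in an output directory. The field names in `record`, the file naming via `slugify`, and the function `write_publication` are all assumptions for illustration; the actual schema of the files in gatsby/lobid/static/publication is not shown in this thread and would have to be matched.

```python
import json
import re
import tempfile
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse runs of non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def write_publication(item, out_dir):
    """Write one feed item as a JSON file and return the file's path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        # Illustrative field names, not the repo's actual schema.
        "id": item["link"],
        "type": "BlogPosting",
        "name": item["title"],
        "datePublished": item["date"],
    }
    path = out_dir / f"{slugify(item['title'])}.json"
    path.write_text(
        json.dumps(record, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return path


# Demo against a temporary directory instead of the real repo path.
with tempfile.TemporaryDirectory() as tmp:
    path = write_publication(
        {
            "title": "Hello, SkoHub World!",
            "link": "https://example.org/hello",
            "date": "2023-01-26",
        },
        tmp,
    )
    filename = path.name
    written = json.loads(path.read_text(encoding="utf-8"))
```

A script like this could run in either place under discussion: inside this repo, or in a separate job that pushes the resulting files here.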

@acka47
Contributor Author

acka47 commented Jan 26, 2023

At first I liked the RSS approach as it may be independent of the actual blog software (we will have to integrate two Gatsby blogs and one Jekyll blog). However, after taking a short look at the RSS XML of the SkoHub blog, I am afraid that the RSS doesn't convey important structured data from the YAML frontmatter like author and tags. Or am I missing something? If the RSS could be tweaked to include this, the approach might work after all; otherwise we will have to fetch the structured data from elsewhere. Also, the HTML of the blog post does not include structured data. I guess this might be configured with Gatsby (a schema.org plugin maybe, see https://snappywebdesign.net/blog/how-to-add-structured-data-to-blog-posts-in-gatsby/). Otherwise we could think about @fsteeg's approach to fetch it/push it directly from the git repo.

@sroertgen
Contributor

I am afraid that the RSS doesn't convey important structured data from the YAML frontmatter like author and tags

I think this can be configured, e.g. the lobid blog feed also contains author information: https://blog.lobid.org/feed.xml. There are no author IDs given; I would have to look at how far this can be configured.

[...] otherwise we will have to fetch the structured data from elsewhere. Also the HTML of the blog post does not include structured data.

I think this is a good hint. We should add structured data to the blog posts and then we can use the RSS feeds to get the links and from there we get the structured data.

If you agree, @acka47, I will open issues in our three blog systems (lobid, metafacture, skohub) and add the structured metadata there. Then I will continue on this issue and pull the structured data from there.
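
To illustrate the second half of that plan, here is a minimal sketch of pulling structured data out of a post's HTML. It assumes the blogs would embed the metadata as JSON-LD in a `<script type="application/ld+json">` tag, which is one common way to publish schema.org data but has not been decided in this thread. The sample HTML, the class `JsonLdExtractor` and the function `extract_jsonld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text, possibly in several chunks.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return every JSON-LD document embedded in the given HTML."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents


# Hypothetical post markup; the blogs do not emit this yet.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BlogPosting",
 "headline": "Example post",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "keywords": ["skos", "rss"]}
</script>
</head><body><p>Post body</p></body></html>"""

docs = extract_jsonld(SAMPLE_HTML)
```

The links to feed into `extract_jsonld` would come from the RSS feeds, so the feed only needs to list the posts while author and tags travel in the page itself.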

@fsteeg I get your point as well, because this will lead to an inconsistent publication database: since the blog posts get fetched dynamically, one does not find every publication there. However, the approach we want to take depends on how important it is that this database contains all data. If it is meant to be authoritative, we should switch to an approach where these files are created. If it is okay to have all data only on the website, the RSS approach is enough (we could also think about adding structured data about all the publications there after they have been fetched).

I'm open to both, though I think the RSS approach is easier to implement.

@sroertgen sroertgen assigned acka47 and unassigned acka47 Jan 27, 2023
@acka47
Contributor Author

acka47 commented Jan 27, 2023

the lobid blog feed also contains author information: https://blog.lobid.org/feed.xml. There are no author IDs given; I would have to look at how far this can be configured.

There are no IDs in the YAML frontmatter of the lobid blog, so this is fine. See e.g. https://github.com/hbz/lobid-blog/blob/master/_posts/2022-08-19-job-projektkoordinatorin.md

If you agree, @acka47, I will open issues in our three blog systems (lobid, metafacture, skohub) and add the structured metadata there. Then I will continue on this issue and pull the structured data from there.

+1. I think only the tags are missing in the lobid blog feed, so there is not much to be done there.

I'm open to both, though I think the RSS approach is easier to implement.

@fsteeg let us know if you still have problems with this approach. Then we should schedule a 30 min meeting to discuss this.

@acka47 acka47 assigned fsteeg and unassigned sroertgen Jan 27, 2023
@fsteeg
Member

fsteeg commented Jan 27, 2023

I like the idea of using the RSS, my point was more about what we do with it (create JSON files) and where (not in this repo). I don't think it would be a nice solution to create the publication list on https://lobid.org/team both from files and from RSS feeds, if that's the suggestion, since that whole system is based on the files, the queries against the files etc. But maybe it's worth reconsidering that whole 'knowledge graph' approach to the website.

@fsteeg fsteeg removed their assignment Jan 27, 2023
@fsteeg
Member

fsteeg commented Jan 27, 2023

Maybe it makes sense to approach this from a different angle: we could set up a new page to list the team publications, which uses the https://lobid.org/team/feed.xml RSS feed, and other feeds like the SkoHub blog, to create a complete list of publications (which we should publish as RSS again).

That way, we basically have two separate things: 1) a list of publications aggregated from different RSS sources and 2) a system to publish JSON files as RSS (our current setup). All sources that already publish RSS could come in via 1), and for all sources that we have no RSS for, we create JSON files in 2).
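
Part 1) could look roughly like the sketch below: read several feeds, drop entries whose link appears more than once, sort newest first, and serialize the result as RSS again. It assumes all sources are RSS 2.0 with RFC 822 `pubDate` values that carry a time zone; the sample feeds, URLs and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime, parsedate_to_datetime

# Two hypothetical source feeds that share one entry.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>Old post</title><link>https://example.org/a1</link>
<pubDate>Wed, 07 Dec 2022 10:00:00 +0000</pubDate></item>
<item><title>Shared post</title><link>https://example.org/shared</link>
<pubDate>Thu, 26 Jan 2023 09:30:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>Shared post</title><link>https://example.org/shared</link>
<pubDate>Thu, 26 Jan 2023 09:30:00 +0000</pubDate></item>
<item><title>New post</title><link>https://example.org/b1</link>
<pubDate>Fri, 27 Jan 2023 08:00:00 +0000</pubDate></item>
</channel></rss>"""


def read_items(xml_text):
    """Parse the <item> elements of one RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def aggregate(feeds):
    """Merge items from several feeds, one per link, newest first."""
    by_link = {}
    for xml_text in feeds:
        for item in read_items(xml_text):
            by_link.setdefault(item["link"], item)  # first occurrence wins
    return sorted(by_link.values(), key=lambda i: i["date"], reverse=True)


def to_rss(items, title, link):
    """Serialize the merged items as a single RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "pubDate").text = format_datetime(item["date"])
    return ET.tostring(rss, encoding="unicode")


merged = aggregate([FEED_A, FEED_B])
combined_feed = to_rss(merged, "Team publications", "https://example.org/publications")
```

Deduplicating by link is a design choice made here for simplicity; if the same publication can appear under different URLs, a different key would be needed.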

@sroertgen sroertgen self-assigned this Jan 30, 2023