Change this repository into a collection of links? #207

Open
jgm opened this issue Dec 31, 2021 · 60 comments

@jgm
Member

jgm commented Dec 31, 2021

One drawback of the current structure is that people submit code here but then don't monitor the repository, and issues are neglected. Perhaps it would be better to make this simply a collection of links to lua filters that are maintained in independent repositories?

@cagix

cagix commented Dec 31, 2021

I kind of like this idea. Maybe this repo could serve as a kind of collection of "official" scripts from the Pandoc creators and all other filters could be linked in the README (sorted by topic)? That would reduce the maintenance to checking the links every year. In addition, the "official" code provided could serve as a live demo / live documentation of the Lua API.

@tarleb
Member

tarleb commented Dec 31, 2021

I'm very much in favor of that; it would save me a lot of time. It takes a significant amount of effort, on almost each new pandoc release, to adjust the tests and filters to the changes. It's tiresome, and pinging all authors and waiting for them to change the code would take just as long. I'd be glad to get out of that obligation.

We could still do occasional automatic "releases", which pack the filters into a single archive. This shouldn't be too hard if the individual repos use a common structure.

@cagix

cagix commented Dec 31, 2021

We could still do occasional automatic "releases", which pack the filters into a single archive.

That sounds interesting, but might not be quite easy with regard to the then presumably different licences in different repos?

This shouldn't be too hard if the individual repos use a common structure.

Would maintaining a template repo help with this?

@tarleb
Member

tarleb commented Dec 31, 2021

Pinging everyone who contributed a filter so far: what do you think of this idea? What would you need to make this as painless as possible for you?

@jdutant @tolot27 @blake-riley @not-my-profile @svenevs @b3 @jkr @cole-miller @sokotim @korintje @gtuckerkellogg @stroobandt @frederik-elwert @odkr

@cagix

cagix commented Dec 31, 2021

Since each filter belongs to a subfolder, it should be easy to split your repository into several individual repositories using git filter-branch and retain the individual history :)
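As a rough illustration of the subfolder split mentioned above, here is a minimal Python sketch that builds (and optionally runs) a `git filter-branch --subdirectory-filter` call. The helper names `split_command` and `split_filter` and the `dry_run` switch are assumptions made for this example. Note that the git-filter-branch manual itself recommends `git filter-repo` as the replacement tool, so treat this as a sketch of the idea rather than a recommended procedure.

```python
import subprocess


def split_command(subdir):
    """Build the argv that rewrites history so `subdir` becomes the repo root."""
    return [
        "git", "filter-branch",
        "--prune-empty",                  # drop commits that no longer touch anything
        "--subdirectory-filter", subdir,  # keep only the history of this folder
        "--", "--all",                    # rewrite all refs
    ]


def split_filter(clone_path, subdir, dry_run=True):
    """Return the command; run it inside `clone_path` only when dry_run is False."""
    cmd = split_command(subdir)
    if not dry_run:
        # History is rewritten in place, so this should only be run
        # on a throwaway clone, never on the original working copy.
        subprocess.run(cmd, cwd=clone_path, check=True)
    return cmd
```

With the default `dry_run=True` nothing is executed, which makes it easy to inspect the command before pointing it at a fresh clone.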

@not-my-profile
Contributor

not-my-profile commented Dec 31, 2021

I am not sure about this. There are advantages to having a common repository. Having them all here guarantees that other people can contribute improvements even when the original author has ceased maintenance.

It takes a significant amount of effort, on almost each new pandoc release, to adjust the tests and filters to the changes. It's tiresome, and pinging all authors and waiting for them to change the code would take just as long. I'd be glad to get out of that obligation.

I think an easy fix for that would be to have a latest_pandoc_supported variable for each filter. When a new pandoc version is released, that variable could be automatically bumped for each filter whose tests pass with the new version. The script could also automatically update a table in the README of this repository that lists all filters known to work with the latest version. Even the pinging of authors when their filter is no longer compatible with the latest version of pandoc could be easily automated.

Especially if you still occasionally want to release filter bundles having all filters in a single repository should make that easier. Otherwise you just have new potential problems to deal with (e.g. some repository went offline, some repository suddenly has an unexpected file structure, etc.).
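The latest_pandoc_supported idea could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the `FilterStatus` record, the function names, and the shape of `test_results` (a mapping from filter name to pass/fail) are all hypothetical, and how the tests are actually run against a new pandoc release is left out.

```python
from dataclasses import dataclass


@dataclass
class FilterStatus:
    name: str
    latest_pandoc_supported: str  # e.g. "2.16"


def bump_supported(filters, new_version, test_results):
    """Bump the version for every filter whose tests passed.

    Returns the names of filters whose tests failed (or were not run),
    i.e. the authors that an automated job would need to ping.
    """
    failing = []
    for f in filters:
        if test_results.get(f.name, False):
            f.latest_pandoc_supported = new_version
        else:
            failing.append(f.name)
    return failing


def readme_table(filters, new_version):
    """Render a Markdown table of the filters known to work with new_version."""
    lines = ["| Filter | Latest pandoc supported |", "| --- | --- |"]
    for f in sorted(filters, key=lambda item: item.name):
        if f.latest_pandoc_supported == new_version:
            lines.append(f"| {f.name} | {f.latest_pandoc_supported} |")
    return "\n".join(lines)
```

A filter that fails keeps its old version, so the table only ever lists filters that were actually tested against the new release.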

@bpj

bpj commented Dec 31, 2021 via email

@korintje
Contributor

I agree with this idea. The roles of the filters are highly independent, so we cannot expect large synergy effects from collecting them in one repository. Instead, I think it is better to focus on keeping the documentation of each filter accessible, readable, and consistent. Listing the filters on the official Pandoc web page or on GitHub Pages would be nice. As far as I know, one ideal example is crates.io, a kind of library repository for the Rust language, which is known to have well-formatted and easily accessible documentation.

@jgm
Member Author

jgm commented Dec 31, 2021

There are advantages to having a common repository. Having them all here guarantees that other people can contribute improvements even when the original author has ceased maintenance.

In that case you could always fork the original repository, make your changes, and submit a PR here for an updated link to the fork.

I am in favor, but what about having this repository contain
submodules/subtrees/subrepos linking to contributors' repositories so that
people can still pull this repository and get all filters?

I'm not sure it's all that valuable to be able to get all the filters in one repository. Generally you only need some of them; why not just clone those separately?

@tolot27
Contributor

tolot27 commented Dec 31, 2021

I like the idea of submodules, and switching to them should be easy because we already have subdirectories. Submodules can be checked out from their origin and individually.
Having this repository as the main repository has the advantage that checks (e.g. in case of a new pandoc version) can be maintained in a central place.

@svenevs
Member

svenevs commented Dec 31, 2021

I don't have any preference either way, if it makes things easier for maintainers then I'm all for it ❤️ I'm pretty sure my filter is feature complete, but I'm sorry if I've missed any issues related to it.

@alerque
Collaborator

alerque commented Dec 31, 2021

I'm really not convinced this would be an advantageous move. Having some complex filters that see a lot of development in their own repos is perhaps a good thing (and we have a history of suggesting that), but for the small one-off ones that tend to be submitted, used by the submitter a few times, and then left behind when the submitter moves on, I think a large chunk would fall below the minimum threshold that would make them viable FOSS projects on their own. Having a team of maintainers at least reviewing submissions here adds some amount of normalizing and consistency that makes filters in this repo much more attractive than random ones out of people's Gists. As for maintenance, not having the bottleneck of one maintainer who got it working for themselves and then is never motivated to tweak it to be more generally useful seems like a benefit to most simpler filters.

@jgm
Member Author

jgm commented Jan 1, 2022

@alerque I think the filters could still be reviewed -- at any rate, we wouldn't need to include links to filters that didn't look good. The aim would be to change where bug reports and enhancement or support requests go. They should go to the author of the filter, not to the pandoc maintainers.

@stroobandt
Contributor

As a first-time, one-off contributor, I have to admit that the current process together with the suggestions and help provided by @tarleb rendered my contribution more worthwhile and universally applicable.
That would not have happened without the "editorial work" of @tarleb.
The current process could be considered as a very valuable peer-review, where the value eventually goes to the end user.

Another admission of mine is that my extended family and I usually employ my filter only with the version of pandoc that comes packed with the latest Ubuntu LTS release and upgrades. The reason for this being the fact that too many of my users on too many machines require a stable system environment for work/study.

This certainly does not mean that I would not maintain my filter. However, if the user community at large fails to prod me, I would typically notice a version compatibility problem with my filter only when a new version of pandoc eventually lands in the Ubuntu LTS repositories.

I hope this straightforwardness helps with reaching a consensus about how to proceed with this great, curated collection of filters.

@b3
Contributor

b3 commented Jan 3, 2022

I do not have a smart, definitive answer to this good question. I added a thumbs up to the comments with ideas that I like.

It is a fact that I didn't follow issues here for my small filters (thanks to @jgm I will now try to check them).

It is also a fact that, as @stroobandt states, @tarleb work rendered my small contributions more worthwhile and universally applicable.

IMHO a common framework (at least for tests and descriptions, for instance) nevertheless needs to be kept.

Being able to fetch all the code at once is also a nice facility (which helps me get inspired), but it can still be offered if this repo is changed to a simple list of links.

Sorry for not being able to help more concretely.

@alerque
Collaborator

alerque commented Jan 6, 2022

The aim would be to change where bug reports and enhancement or support requests go. They should go to the author of the filter, not to the pandoc maintainers.

To some extent, we can get the best of both worlds. If the code stays here and we add contributors to a GitHub team with limited access to this repo, we can use .gitattributes to specify GitHub users as code owners for the filters they contribute. That way they would not only get asked to be involved in code review if somebody touched their code, but they could be assigned to related issues and such.

My experience is that people are even more likely to stay involved and take some ownership over their code if it has the publicity of being in an official repository rather than being in their own ad-hoc repos. Anybody that is going to keep on top of issue reports on their own repo is also likely to stay involved with it if they have some ownership in a bigger project.

@benabel

benabel commented Jan 6, 2022

I really like this repository and it is a great source of inspiration when writing filters. It is very useful to have all these filters in one place. Of course, I agree the plugin creators should maintain their plugins (if they have the time to). Some plugins could be placed in an unmaintained folder or repo if they can't. Also, a table in the README indicating each filter's name, description, and formats processed would be useful.

@jgm
Member Author

jgm commented Jan 6, 2022

we can use .gitattributes to specify GitHub users as code owners for the filters they contribute.

Can you elaborate? What would the syntax be? That would certainly be an improvement, as now there's no way to figure out who contributed which filter other than looking at git history.

@alerque
Collaborator

alerque commented Jan 6, 2022

Here is an example CODEOWNERS file that uses .gitattributes syntax to assign code in a repository to different people. The @... names can be individual accounts or teams (or mix and match) that can have multiple members. This will automatically request that they review any PRs touching those code paths, and it opens the door to other GitHub tooling, such as letting PRs be approved once the code owners approve them.

What it doesn't do is triage bug reports and assign them to those owners. That would still need to be done manually, only PRs are automatically assigned.
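To make the CODEOWNERS idea a bit more concrete, here is a minimal Python sketch that renders such a file from a mapping of filter folders to owners. The folder names and handles used with it would be placeholders, and `render_codeowners` is a hypothetical helper, not part of any GitHub tooling. One caveat: GitHub's documentation describes CODEOWNERS patterns as following mostly gitignore-style rules, and when several patterns match a file the last matching line wins, so output order matters once patterns overlap.

```python
def render_codeowners(owners):
    """Render CODEOWNERS content from a {path_pattern: [handles]} mapping.

    Patterns are emitted in sorted order; with one non-overlapping folder
    per filter the order has no effect on who owns what.
    """
    lines = []
    for pattern in sorted(owners):
        handles = " ".join(owners[pattern])
        lines.append(f"{pattern} {handles}")
    return "\n".join(lines) + "\n"
```

Each resulting line has the form `/some-folder/ @owner-one @owner-two`, with handles being either individual accounts or `@org/team` names.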

@jgm
Member Author

jgm commented Jan 7, 2022

I've added .github/CODEOWNERS, but I don't know the github handles of the contributors. Maybe people can update this themselves with PRs?

@ickc
Member

ickc commented Jan 7, 2022

As a side note, I think this is really about having a package index and a package manager.

People like this repo and pandocfilters (the Python one) because they act like both. It is a centralized location that, once cloned, someone else is maintaining for you, which should guarantee it is working with the latest-ish pandoc.

The problem with this repo is that it isn't going to scale well (to many filters) and the work of maintenance is transferred to the maintainer.

Years ago some of us proposed to have a package manager, and there was a prototype. But there were a few problems. First, we mixed the two related concepts in one solution, and second, it wasn't official.

In short, I think the right direction would be to have an official package index (like CTAN). This is similar to the "link" concept above, but more formal. Maybe a YAML file with a certain spec. The official pandoc community would advertise this as the pandoc package index that people should submit to as authors and discover filters through as users.

Then we can let 3rd parties build an ecosystem around it, e.g. a package manager (similar to a 3rd party filter framework) or a website (like the 3rd party Mac App Store-like website for Homebrew).
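What "a YAML file with a certain spec" might mean in practice can be sketched as a small validation step. The following Python sketch is purely illustrative: the field names (`name`, `repository`, `license`, `description`) and the https-only rule are assumptions, no such spec exists in this thread, and the entry is shown as an already parsed mapping because reading YAML would need a third-party library.

```python
# Hypothetical required fields of one index entry and their expected types.
REQUIRED_FIELDS = {
    "name": str,
    "repository": str,
    "license": str,
    "description": str,
}


def validate_entry(entry):
    """Return a list of problems found in one index entry (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"wrong type for {field}")
    repo = entry.get("repository")
    # Assumed rule: entries must point at an https URL.
    if isinstance(repo, str) and not repo.startswith("https://"):
        problems.append("repository must be an https URL")
    return problems
```

A check like this could run on every submission to the index, leaving deeper review (code quality, security) to humans.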

odkr added a commit to odkr/lua-filters that referenced this issue Jan 9, 2022
jgm pushed a commit that referenced this issue Jan 9, 2022
See issue #207.

Co-authored-by: Odin Kroeger <odkr@users.noreply.github.com>
@tarleb
Member

tarleb commented Jan 17, 2022

Allow me to think out loud for a moment; this gets a bit fundamental and includes some of the good points others already made above.

What I like about this repo:

  • It has become a beautiful collection of useful filters. Users can browse it, use the filters, and hopefully learn from the code.
  • Filters share a similar structure and allow customization in a similar manner.
  • I get to work with contributors, which is a chance for me to bring in the experience from writing many filters and large parts of the Lua subsystem.

What I dislike:

  • Filter authors don't really get the recognition they deserve. I'm not at all a fan of "GitHub is my resumé", but that's often the way it is. A filter in a personal repository is easier to show off as achievement than a contribution in a repo like this one. It seems fair to encourage people to highlight their work.
  • We are currently excluding people who prefer other platforms like GitLab.
  • My code standards are opinionated and often high, quite possibly too high. This effectively prevents useful filters from becoming available as they are stuck in code review for far too long. (But I appreciate the kind words noting the positive side of this.)
  • The previous point is made worse by my time constraints. It felt in the past like I was often the bottleneck for changes and new filters. Not a good situation, neither for contributors, nor for me.
  • Tests are often difficult to write, and most tests depend on the specific output created by pandoc. That makes them, and the repo, high maintenance.

In conclusion, I'd still rather turn this repository into a collection of links. My proposal:

  1. Create a template repository for Lua filters. This way we can still encourage a certain standard layout, but filter authors have the freedom to do whatever they feel is right.
  2. Add an issue template for adding new links: this should include a checkbox to select if the author wishes for a detailed code review of their filter. We could go as far as to encourage community review by sending an automated mail to pandoc-discuss whenever such an issue is opened.
  3. Slowly move filters to separate repos, but explore ways to create collections of all filters listed here and adhering to certain conventions.

Edit: Forgot to make my main point: it seems unreasonable to expect people to maintain code that they no longer control; the sense of ownership is much stronger if authors can retain full control over their code.

@jgm
Member Author

jgm commented Jan 17, 2022

I think this is a good plan!

@alerque
Collaborator

alerque commented Jan 27, 2022

@bpj Template repositories are meant to be used as a base to clone from (and GH has a function for doing this), and you want things named what the final name is going to be, not something that will need to be shuffled around.

Converting existing repos is a bit trickier. Merging as you describe is technically possible with some next level Git ninja commands to join histories with no common root, but it also brings with it a pile of issues that most people would struggle to deal with later (e.g. git blame needing special handling).

I suggest just using a tree diff on existing repositories and manually massage them to be as alike or different as you feel like without doing any merge foo. A how-to on this could be useful to add to the template, but I would focus usage on getting new projects going.


On a different topic, I'll be looking into some subtree splits to help people with filters here already get them split out with full history for use in stand alone repositories. Once the dust settles a little bit on what we are recommending for stand alone repos we can look at migrating current ones to that model.

@mfhepp

mfhepp commented Mar 15, 2022

Joining this discussion quite late: I think there is one huge argument in favor of a central repository for the most common Lua filters for Pandoc, and that is security. Since Pandoc typically runs with full user privileges, a Lua script can do really nasty stuff (steal information, load malicious content, ...). While this central repository does not make it impossible to inject malicious code into the most popular filters, it at least provides

  • some kind of community review and
  • a bit of centralization for a better flow of information (vulnerabilities can be reported in here, people are likely to take notice, more eyes increase the likelihood of spotting a malicious filter, ...).

A mere collection of links will cause more fragmentation and hence make it less likely and slower to spot and mitigate security risks.

Also, there are lots of commonalities among filters; with a central repository, it will be easier to modularize and reuse code.

I have a really bad feeling watching a growing community of non-developers running arbitrary code from some private GitHub repositories found by googling for some Pandoc/LaTeX problem.

Search for "supply chain attacks" to get a glimpse. This is even more of an issue given that Lua is a bit of a niche language, which makes it harder for many to understand what a piece of Lua code is doing on their machine.

@jgm
Member Author

jgm commented Mar 15, 2022

From the pandoc manual:

A note on security

If you use pandoc to convert user-contributed content in a web
application, here are some things to keep in mind:

  1. Although pandoc itself will not create or modify any files other
    than those you explicitly ask it create (with the exception
    of temporary files used in producing PDFs), a filter or custom
    writer could in principle do anything on your file system. Please
    audit filters and custom writers very carefully before using them.

You are right, of course, that people can get into big trouble by running filters they download. And having a central repository would help with that. The problem is that it takes a lot of human-power to review the code, integrate pull requests, etc. We just don't have enough of that.
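As a small aid for the kind of auditing the manual asks for, here is a minimal Python sketch that flags lines in a Lua filter which call functions able to touch the file system or start programs. It rests on loud assumptions: the `RISKY_CALLS` list is illustrative and incomplete, the comment stripping is crude, and a determined author can trivially evade it (for example via `os["execute"]`). It can point a reviewer at lines worth reading first, and cannot replace reading the code.

```python
import re

# Lua and pandoc calls that can reach the file system or run programs.
# Assumption: illustrative only, far from a complete list.
RISKY_CALLS = [
    "os.execute", "io.popen", "io.open",
    "os.remove", "os.rename", "pandoc.pipe",
]


def flag_risky_calls(lua_source):
    """Return (line_number, call) pairs for risky-looking calls in a filter."""
    hits = []
    for number, line in enumerate(lua_source.splitlines(), start=1):
        # Crudely drop Lua line comments; a "--" inside a string
        # literal would wrongly truncate the line as well.
        code = line.split("--", 1)[0]
        for call in RISKY_CALLS:
            if re.search(r"\b" + re.escape(call) + r"\b", code):
                hits.append((number, call))
    return hits
```

An empty result says nothing about safety; it only means none of the listed names appear literally outside comments.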

@ickc
Member

ickc commented Mar 15, 2022

What is described is related to the web of trust: basically, what you said is that as you trust pandoc, you also trust other stuff maintained by the same or related developers.

Another related concept here is the package manager. It does not solve the trust issue by itself, but basically you are now trusting the maintainer of a package index rather than the developer. (Of course trusting both, i.e. selecting packages only from authors you trust, is better.) Also, just to mention that a package index can typically be dangerous because there is no “maintainer” you need to be approved by, as in the example of PyPI.

Put this way, the problem above is that a “monolithic package index” like this one puts too much burden on the maintainers, who are doing both jobs. A “proper package index” splits the burden between individual maintainers managing their own packages and package index maintainer(s) who maintain the quality.

@ickc
Member

ickc commented Mar 15, 2022

Just to elaborate a bit more, there’s also more incentive for the developer to maintain their script as they typically are the biggest user of that script. The problem then is to have a package index that people will have incentive to use, including official blessing and simplicity (and adequate level of trust.)

But to name the cons of having a centralized package index like this, it makes releasing a breaking release AST slightly easier. (But then the blame should be put to end users that upgrade without considering “pinning” their version. Again, a problem arises when not thinking this in terms of packaging.)

By the way, I’m not complaining, as there’s no perfect solution. Look at LaTeX, for example: while they have a package index, packaging is a mess because versions cannot easily be controlled, so packages can break mysteriously, and authors are then conditioned to release only backward-compatible changes, which leads to a worse experience (bad behavior should be discontinued).

@gtuckerkellogg
Contributor

I'm largely agnostic; I contributed a filter which my PhD students have used, but I'll own up to neglecting it recently if issues have come up. I'll maintain my agnosticism, but I'll be happy to rectify my neglect of the issue no matter how it's decided.

@chrisaga

@alerque

On a different topic, I'll be looking into some subtree splits to help people with filters here already get them split out with full history for use in stand alone repositories. Once the dust settles a little bit on what we are recommending for stand alone repos we can look at migrating current ones to that model.

I am not a git expert, but I can share what I figured out when moving two Lua filters, which I had proposed as my contribution to lua-filters, to their own repository.

  1. Create the repository on GitHub from @tarleb's lua-filter-template. NB: the default branch is named main (this comes from the template).
  2. Make sure everything is clean in the forked lua-filters repository I am working on and make a fresh clone (named after my new GitHub repo):
      git clone lua-filters hk-pandoc-filters
  3. Install git-filter-repo, since the git-filter-branch man page advises switching to it.
  4. Remove the remote link, remove everything but my two filters, and add the newly created GitHub repository as a remote:
      cd hk-pandoc-filters
      git remote remove origin
      git filter-repo --path column-div/ --path tables-vrules/
      git remote add -f origin git@github.com:chrisaga/hk-pandoc-filters.git
  5. At this point I have my two Lua filters in the master branch and the new files from @tarleb's template in the main branch. The two branches are not related (no common ancestor), but I can still merge them with the appropriate option:
      git checkout main
      git merge --allow-unrelated-histories master
  6. Do some cleaning with git mv and git rm and push everything to GitHub:
      git commit -a
      git push
  7. Check that everything is OK on GitHub and remove the now useless branch:
      git branch --delete master

@tarleb
Member

tarleb commented Aug 10, 2022

I have created a new organization pandoc-ext and have started to migrate filters there. Each filter will be placed in a separate repository, as this makes it easier to use the filters with RStudio's quarto. I will only transfer those filters that I intend to maintain.

The main impediment right now is my template repository, which still needs more work.

@cagix

cagix commented Aug 10, 2022

Step 4 could also be done more easily through git subtree: In hk-pandoc-filters you can perform a git subtree push --prefix=<yourfolder> <cloneofluafiltertemplate> <branch>, where <branch> should be different from main or master.

@ickc
Member

ickc commented Nov 2, 2022

@tarleb, if you want to, I can invite you as a maintainer of https://github.com/pandoc-extras, which is intended for any "pandoc extras" kind of stuff.

@tarleb
Member

tarleb commented May 1, 2023

Thank you @ickc, and sorry for the late reply, I had forgotten about this. I went with the pandoc-ext name to mirror the quarto-ext org. Now we have two such orgs, but I think that's ok.

I've sent you an invite to become a maintainer at the new org.

@cagix

cagix commented May 1, 2023

so, now we have both, https://github.com/pandoc-extras and https://github.com/pandoc-ext? does that mean some kind of split in the "pandoc extras"? and also there is https://github.com/pandoc/lua-filters ...

@tarleb
Member

tarleb commented May 1, 2023

I don't see it as a split, it's two separate orgs with slightly different goals.

As for this repo, it should probably be archived at some point.

@bpj

bpj commented May 3, 2023 via email

@ickc
Member

ickc commented May 8, 2023

Thank you @ickc, and sorry for the late reply, I had forgotten about this. I went with the pandoc-ext name to mirror the quarto-ext org. Now we have two such orgs, but I think that's ok.

I've sent you an invite to become a maintainer at the new org.

The invitation just expired. Could you send one again? Thanks.

@hholst80

and issues are neglected.

Is that worse or better than neglecting things in their own repos? At least here the issue is seen by many more users and perhaps fixed.

Just because something has problems does not mean that the problem is resolved by changing the physical location of the code. The problem just moved. 99% of the code here will be written by PhD students and as soon as they are out of academia they will forget all about their pandoc hacks. I believe that moving things out will make it less probable for code to be 'inherited' by new students, and rather they will reinvent the same solutions, ad infinitum.

As a legacy user of pandoc (I wrote filters of this kind myself when I was a PhD student), I find the pandoc vs pandoc-ext split confusing, as I find the reasons for this change confusing. With 30 open issues, the problem rationale appears to me more theoretical than practical. Consider a repository like Ansible, which has a myriad of custom domain plugins. If they went down this route, I could sympathize. They have >30K issues, and often have thousands of open issues and hundreds of open PRs. This repo has 30 open issues. I am a skeptic, but I can be convinced by reason: what problem is being solved and what problem is invented?

@tarleb
Member

tarleb commented Jan 25, 2024

Before I invest time in an answer, I'd like to learn more about your motivation for this question. Do you plan to contribute, maybe by writing explanatory paragraphs to place in some of the readme files? Or is this just curiosity and/or frustration?

@hholst80

hholst80 commented Jan 26, 2024

I think the users of pandoc deserve the best possible ecosystem. My time is as important as yours, and I have already invested it because I see that this is important. I stated that I already wanted to contribute the d2 filter. I have used pandoc for some time: at least 10 years, but likely 15. The little code that I made available is largely unused by the community. That is wasted effort, and I also have no interest in maintaining small Lua hacks in my private GitHub. Put together, these "hacks" become valuable.

You can expand on the rationale and we can strawman on this as a case study:

https://github.com/JensErat/pandoc-scrlttr2

It seems to be a very nicely set up repository. Last changed 8 years ago. (I think we can keep up with the maintenance work?) Not everything would need to go into this repo either. Let the users decide what they want to put here. Perhaps there is no need to have pandoc-scrlttr2 (weird name, only makes sense in academia) in this repo because it is well maintained and discoverable in other places. I want to make it easy for people to hack up some code and share it with others, to collaborate and build together, as not everyone wants to set up a complete OSS project and put it on their GitHub.

@jgm
Member Author

jgm commented Jan 26, 2024

That repository is a single project by a single person who maintains it. That's exactly what we want to move to for Lua filters.

We want to get away from the model where you contribute your script to a giant repository, then expect others to field bug reports for it, deal with questions about it, and so on.

@hholst80

hholst80 commented Feb 7, 2024

We want to get away from the model where you contribute your script to a giant repository, then expect others to field bug reports for it, deal with questions about it, and so on.

I see nothing wrong with that expectation. This is exactly what happens with all successful open source. People learn about the code and knowledge is shared, and software is improved.

@jgm
Member Author

jgm commented Feb 7, 2024

You're saying that your expectation is that others will maintain your code, answer questions about it, and so on? Where others = @tarleb and me? No, thanks. I'm already overburdened with my open source maintenance work. I'd rather have you maintain your code in your own repository. Happy to link to it.

@bpj

bpj commented Feb 10, 2024

I took the liberty to add a page on the repository wiki where authors can add links to their filters themselves with zero hassle for the owners of this repository.

Thoughts:

  • Perhaps that page can be (prominently) linked from and/or periodically copied to the repository README.
  • Maybe it should better go into the pandoc-ext wiki as/if that is activated. On the other hand this repository will probably come up if people search for "Pandoc (Lua) filters".
  • I will try to find time/remember to do some housekeeping on that page, but the more people help with that and with adding entries, the better, of course!

@odkr
Contributor

odkr commented Feb 10, 2024

There's already https://github.com/jgm/pandoc/wiki/Pandoc-Filters. But perhaps it'd be a good idea to place a prominent link to that Wiki in the README.
