Project state and Help Wanted: rga 1.0 with configurable external adapters and async rust #146

Closed
11 of 13 tasks
phiresky opened this issue Nov 28, 2022 · 15 comments

Comments

@phiresky
Owner

phiresky commented Nov 28, 2022

The current version of rga is 0.9.6, released in 2020.

This is a small side project for me, so I've spent very little time on it, even though I've regularly been using the tool myself.

For the next version the focus is on being able to configure custom preprocessors in addition to the internal ones.

For example, the integrated PDF adapter is rewritten and would look pretty much like this in ~/.config/ripgrep-all/config.jsonc:

{
    "custom_adapters": [
        {
            "name": "poppler",
            "version": 1,
            "description": "Uses pdftotext (from poppler-utils) to extract plain text from PDF files",

            "extensions": ["pdf"],
            "mimetypes": ["application/pdf"],

            "binary": "pdftotext",
            "args": ["-", "-"],
            "disabled_by_default": false,
            "match_only_by_mime": false,
            "postprocessors": [{"name": "add_page_numbers_by_pagebreaks"}]
        }
    ]
}
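To make the config entry above more concrete, here is a minimal sketch of how such an adapter could be executed: spawn the configured binary with its args, feed the file on stdin, and collect the extracted text from stdout. The `CustomAdapter` struct and `run_custom_adapter` function are hypothetical names for illustration, not rga's actual types, and the sketch uses blocking `std` I/O rather than async.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical mirror of a few fields of one `custom_adapters` entry.
/// Illustrative only; this is not rga's real type.
pub struct CustomAdapter {
    pub name: String,
    pub binary: String,
    pub args: Vec<String>,
}

/// Spawn the adapter's binary, write `input` to its stdin and return
/// everything it prints on stdout.
pub fn run_custom_adapter(adapter: &CustomAdapter, input: Vec<u8>) -> io::Result<Vec<u8>> {
    let mut child = Command::new(&adapter.binary)
        .args(&adapter.args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write on a separate thread so that a full pipe buffer on either
    // side cannot deadlock the parent process.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    let writer = thread::spawn(move || stdin.write_all(&input));

    let mut stdout = child.stdout.take().expect("stdout was requested as piped");
    let mut output = Vec::new();
    stdout.read_to_end(&mut output)?;

    writer.join().expect("writer thread panicked")?;
    let status = child.wait()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("adapter {} failed: {}", adapter.name, status),
        ));
    }
    Ok(output)
}
```

With the poppler entry, `binary` would be `pdftotext` and `args` would be `["-", "-"]`, which (assuming the usual pdftotext convention) means reading the PDF from stdin and writing text to stdout.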

While implementing this, though, I hit some threading issues that exceeded my Rust skills, so I stopped working on it for a while.

More recently, I converted the core of the code to async rust (now passing around Box<dyn AsyncRead + Send>).
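The idea of passing boxed streams between adapters can be sketched as follows. This is a synchronous stand-in that uses `Box<dyn Read + Send>` from `std` instead of the `Box<dyn AsyncRead + Send>` mentioned above, so it runs without an async runtime. `ReadBox`, `StreamAdapter`, `UppercaseAdapter` and `run_chain` are all hypothetical names, and the upper-casing adapter is only a toy placeholder for a real extractor.

```rust
use std::io::{self, Cursor, Read};

/// Synchronous stand-in for `Box<dyn AsyncRead + Send>`.
pub type ReadBox = Box<dyn Read + Send>;

/// Hypothetical adapter interface: consume one byte stream, produce another.
pub trait StreamAdapter {
    fn adapt(&self, input: ReadBox) -> io::Result<ReadBox>;
}

/// Reader wrapper that upper-cases ASCII bytes as they pass through,
/// so nothing is buffered beyond the caller's own read buffer.
struct AsciiUpper {
    inner: ReadBox,
}

impl Read for AsciiUpper {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        buf[..n].make_ascii_uppercase();
        Ok(n)
    }
}

/// Toy adapter standing in for a real text extractor.
pub struct UppercaseAdapter;

impl StreamAdapter for UppercaseAdapter {
    fn adapt(&self, input: ReadBox) -> io::Result<ReadBox> {
        Ok(Box::new(AsciiUpper { inner: input }))
    }
}

/// Feed the stream through each adapter in order, as would be needed
/// when one adapter's output is the input of the next.
pub fn run_chain(adapters: &[Box<dyn StreamAdapter>], input: ReadBox) -> io::Result<ReadBox> {
    let mut stream = input;
    for adapter in adapters {
        stream = adapter.adapt(stream)?;
    }
    Ok(stream)
}

/// Convenience helper for trying the sketch on an in-memory buffer.
pub fn from_bytes(bytes: &[u8]) -> ReadBox {
    Box::new(Cursor::new(bytes.to_vec()))
}
```

Because every adapter both takes and returns the same boxed stream type, adapters can be stacked in any order without knowing about each other.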

The following work still needs to be done:

  • Fixing / Converting the postprocessors to async.
  • Reenabling and converting the other internal adapters to async (see ripgrep-all/src/adapters.rs, lines 120 to 126 in 54799f1):

    //Rc::new(ffmpeg::FFmpegAdapter::new()),
    // Rc::new(zip::ZipAdapter::new()),
    //Rc::new(decompress::DecompressAdapter::new()),
    // Rc::new(tar::TarAdapter::new()),
    //Rc::new(sqlite::SqliteAdapter::new()),
    // Rc::new(pdfpages::PdfPagesAdapter::new()),
    // Rc::new(tesseract::TesseractAdapter::new()),
    • ffmpeg
    • sqlite
    • zip
    • tar
    • decompress
  • Fixing all the failing tests and possibly adding new ones.
  • Making sure recursion into archives works with any combination of adapters
  • Fixing the PDF page number bug
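For readers unfamiliar with what a postprocessor such as `add_page_numbers_by_pagebreaks` has to do, here is a minimal synchronous sketch. It assumes that pages are separated by form feed characters (which is how pdftotext marks page breaks) and that each line should get a `Page N: ` prefix. The function name `add_page_numbers` and the exact prefix format are assumptions for illustration, not a copy of rga's implementation.

```rust
/// Prefix every line with `Page N: `, where N starts at 1 and advances
/// at each form feed character (U+000C).
pub fn add_page_numbers(text: &str) -> String {
    let mut out = String::new();
    for (index, page) in text.split('\x0c').enumerate() {
        for line in page.lines() {
            out.push_str(&format!("Page {}: {}\n", index + 1, line));
        }
    }
    out
}
```

Prefixing each line means the page number survives line-based filtering by a grep-style tool. An async version would have to do the same on a stream, where a page break can fall anywhere relative to the boundaries of the chunks being read.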

I'll implement these myself at some point, but only at a trickle, so it may be a long time until the next release.
So PRs that help are very welcome.

@phiresky phiresky pinned this issue Nov 28, 2022
@phiresky
Owner Author

Here's a bit of info about the architecture: https://github.com/phiresky/ripgrep-all/wiki/Architecture

@phiresky
Owner Author

phiresky commented Mar 4, 2023

I think most (all?) of the missing stuff is implemented now. I'll release 1.0-beta soon I think

@lafrenierejm
Contributor

The following work still needs to be done:

  • Making sure recursion into archives works with any combination of adapters

@phiresky I noticed that the above item has not been checked off in the opening post. Has it been completed yet?

@phiresky
Owner Author

The alpha is now released.

@lafrenierejm
Contributor

@phiresky Are there any remaining known issues or pending features blocking 1.0.0?

@phiresky
Owner Author

phiresky commented Jul 4, 2023

I don't think so, I think it all works. Just needs some testing maybe, and maybe feedback on the config format. I kinda don't want to release 1.0 then realize I did something dumb and either break semver or immediately release 2.0

@lafrenierejm
Contributor

For what it's worth, I wrote a custom adapter yesterday (added the result to the wiki a few minutes ago) and found the config format perfectly satisfactory.

@phiresky
Owner Author

phiresky commented Jul 4, 2023

Great! I forgot to change that, but I think it makes more sense maybe if we put the community adapters in discussions: https://github.com/phiresky/ripgrep-all/discussions/categories/show-your-adapter

@lafrenierejm
Contributor

Great! I forgot to change that, but I think it makes more sense maybe if we put the community adapters in discussions: https://github.com/phiresky/ripgrep-all/discussions/categories/show-your-adapter

Thanks for letting me know. I've copied the content I contributed to the wiki into a new discussion.

Thoughts on removing the "Show Your Adapter" page entirely from the wiki? I anticipate that having both the discussion category and the wiki page would cause confusion.

@lafrenierejm
Contributor

Thoughts on removing the "Show Your Adapter" page entirely from the wiki? I anticipate that having both the discussion category and the wiki page would cause confusion.

Just noticed that you've already deleted the wiki page. Thanks!

@phiresky
Owner Author

phiresky commented Jul 4, 2023

yeah. just need to link the wiki from the readme i guess

@phiresky
Owner Author

phiresky commented Jul 4, 2023

Ah, I remember I also wanted to go through all the existing issues to see whether they are now fixed or not and if not fix them. Didn't have time for that yet.

@Freed-Wu

When will a new version be published (it could be 0.9.7 or 1.0.0)? It would help fix NixOS/nixpkgs#250306.

@phiresky
Owner Author

I'll maybe release the rewrite as 0.10.0 as a compromise, since I don't feel it's had enough testing to qualify as a 1.0, but the current stable version 0.9.6 has even more issues due to its age. #188 still remains, which I'd like to figure out.

@phiresky
Owner Author

What were previously the 1.0 alphas are now released as 0.10.x stable.

@phiresky phiresky unpinned this issue Jan 16, 2024
3 participants