Support for RST importer #378

Closed
orzelmichal opened this issue Feb 9, 2024 · 5 comments · Fixed by #386

@orzelmichal

Description

More and more documents are written in reStructuredText (RST), especially in open source projects, whereas OFT currently expects documents to be written in Markdown. It would be great to meet users' expectations by adding support for RST, which is more powerful and supports more complex data structures. Despite the many similarities between the two formats, converting existing documents from one to the other takes a long time.

Intermediate solution

With the following modification, we can hack OFT into treating RST files as if they were Markdown files. However, because the heading formats differ, the tool cannot detect the titles in those files.

diff --git a/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java b/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
index d112694afba5..f1c068fe2a85 100644
--- a/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
+++ b/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
@@ -11,7 +11,7 @@ public class MarkdownImporterFactory extends RegexMatchingImporterFactory
     /** Creates a new instance. */
     public MarkdownImporterFactory()
     {
-        super("(?i).*\\.markdown", "(?i).*\\.md");
+        super("(?i).*\\.markdown", "(?i).*\\.md", "(?i).*\\.rst");
     }

     @Override
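
To illustrate what the added pattern does, here is a minimal sketch that is independent of OFT. The class `FileNamePatternDemo` and its method `isSupported` are hypothetical names, and the sketch assumes that a file name must match one of the expressions in full, which may differ from how `RegexMatchingImporterFactory` actually applies them.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of OFT: checks file names against the patterns from the patch. */
final class FileNamePatternDemo
{
    // The three expressions passed to the super constructor in the patched factory.
    private static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("(?i).*\\.markdown"),
            Pattern.compile("(?i).*\\.md"),
            Pattern.compile("(?i).*\\.rst"));

    /** Returns true if the complete file name matches at least one pattern (assumed semantics). */
    static boolean isSupported(final String fileName)
    {
        return PATTERNS.stream().anyMatch(pattern -> pattern.matcher(fileName).matches());
    }

    public static void main(final String[] args)
    {
        for (final String name : List.of("spec.md", "doc/design.RST", "notes.txt", "spec.rst.bak"))
        {
            System.out.println(name + " -> " + isSupported(name));
        }
    }
}
```

The `(?i)` flag makes the match case-insensitive, so `design.RST` is accepted just like `design.rst`. Under the full-match assumption, a name such as `spec.rst.bak` is rejected because it does not end in one of the listed extensions.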
redcatbear added this to the 3.10.0 RST Support milestone Feb 11, 2024
@redcatbear
Collaborator

While Markdown is in general still more popular due to its simplicity, RST has become the de facto standard for documentation in Python projects. That alone is already a very good reason to support it.

Preparation

First order of business: checking RST parsers.

So far OFT has no external dependencies (with the exception of the Java runtime). We need to decide:

a) we keep it this way → we need to write the parser ourselves
b) we accept the external dependency and take an existing one

Criteria for an RST parser we could accept:

  1. No transitive dependencies
  2. Regular releases
  3. No unpatched CVEs
  4. Proper test coverage
  5. Decent code quality

@redcatbear
Collaborator

@orzelmichal, I did some research. I found no active Java project that provides an RST parser. There are two abandoned projects, but that's it.

So I will have to write a parser that is based on the one we use for Markdown. In fact it will for the most part probably be the same code. My goal here is not to cover the whole feature set of RST, but instead a very limited subset that makes defining requirements possible. For instance, support for extensions is definitely out-of-scope.

In our discussions you mentioned that headlines were the one thing that has given you trouble so far, so that's what I will put my main focus on. The underlined headline style also exists in Markdown, but I have not supported it in the Markdown parser so far, since there is a more commonly used alternative that is also easier to parse. I am guessing that in the end the Markdown parser will gain the capability to parse that headline style too, even if I have yet to see someone use it. Anyway, that's a nice side effect.

@orzelmichal
Author

@redcatbear Thanks for the investigation. Yes, headlines would be a good starting point. Markdown supports only = and - as underline characters, whereas RST supports several more (some of which are very rarely used), such as =, -, ^, " and ~.
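
The difference described above can be sketched as a small underline detector whose set of allowed characters is configurable. This is only an illustration, not the OFT implementation: the class `UnderlinedTitleDemo`, its constants and its methods are hypothetical, and the character sets contain just the characters named in this thread. The sketch also assumes the RST rule that an underline must be at least as long as the title text, which Markdown does not require.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not OFT code: finds underlined titles in a list of lines. */
final class UnderlinedTitleDemo
{
    /** Underline characters named in the discussion (assumption: not a complete list for RST). */
    static final String MARKDOWN_UNDERLINES = "=-";
    static final String RST_UNDERLINES = "=-^\"~";

    /** Returns true if the line consists of a single repeated character from the allowed set. */
    static boolean isUnderline(final String line, final String allowedChars)
    {
        final String trimmed = line.stripTrailing();
        if (trimmed.isEmpty() || allowedChars.indexOf(trimmed.charAt(0)) < 0)
        {
            return false;
        }
        return trimmed.chars().allMatch(c -> c == trimmed.charAt(0));
    }

    /** Collects every non-blank line that is directly followed by a sufficiently long underline. */
    static List<String> findTitles(final List<String> lines, final String allowedChars)
    {
        final List<String> titles = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++)
        {
            final String candidate = lines.get(i).strip();
            final String next = lines.get(i + 1).stripTrailing();
            // Assumption: the underline must be at least as long as the title (RST rule).
            if (!candidate.isEmpty() && !isUnderline(candidate, allowedChars)
                    && isUnderline(next, allowedChars) && next.length() >= candidate.length())
            {
                titles.add(candidate);
            }
        }
        return titles;
    }

    public static void main(final String[] args)
    {
        final List<String> lines = List.of("Introduction", "============", "",
                "Some text", "Details", "^^^^^^^", "more");
        System.out.println("Markdown: " + findTitles(lines, MARKDOWN_UNDERLINES));
        System.out.println("RST:      " + findTitles(lines, RST_UNDERLINES));
    }
}
```

Keeping the allowed characters as a parameter means the same detection logic could serve both formats, with only the character set differing between them.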

@redcatbear
Collaborator

#384 adds support for underline-style titles to the Markdown importer, which means we have already cracked the tricky part. Next step: extracting common code.

redcatbear added a commit that referenced this issue Feb 15, 2024
* #378: Added support for underlined headlines in `MarkdownImporter`.
* #378: Fixed some requirements.
* #378: Improved test coverage, documentation and test reliability.
* #378: Introduced merging step for coverage.
* #378: Improved change log.
@redcatbear
Collaborator

redcatbear commented Feb 15, 2024

@orzelmichal, the code for underline-style headlines in Markdown is now on main. Using your patch to make the Markdown parser ingest RST files, you can already try this out. At the moment only --- and === underlines are supported. We will extend that in the RST importer, since RST allows a lot more underline characters than that.
