OnLineBible export improvements
- Add UTF-8 BOM to file
- Export cross references in structured form instead of plain text
- When using the `IgnoreKJV` option, do not split/reorder/merge verses

Fixes #45.
Fixes #46.
schierlm committed Mar 29, 2021
1 parent c2ac186 commit 61fc0b5
Showing 1 changed file with 25 additions and 5 deletions.
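
The first item in the commit message (the UTF-8 BOM) can be tried in isolation. The following is a minimal sketch, not the exporter itself: the class name `BomSketch` and the helper `writeWithBom` are hypothetical, and the output goes to an in-memory buffer instead of a file so the leading bytes can be inspected.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class BomSketch {
    /** Writes a BOM followed by one verse header line and one text line; returns the raw bytes. */
    public static byte[] writeWithBom(String verseHeader, String verseText) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            bw.write("\uFEFF"); // U+FEFF is encoded as the bytes EF BB BF in UTF-8
            bw.write(verseHeader);
            bw.write("\n");
            bw.write(verseText);
            bw.write("\n");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeWithBom("$$$ Ge 1:1 ", "In the beginning");
        // Show the first three bytes of the output in hex
        System.out.printf("%02X %02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);
    }
}
```

Running `main` prints `EF BB BF`, the byte sequence a reader can use to detect UTF-8.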
@@ -35,6 +35,8 @@ public class OnLineBible implements ExportFormat {
 		"Put <namesfile> as NewBkNms.Lst into the note control directory of the Bible notes set."
 	};
 
+	private static final Map<BookID, String> BOOK_TO_ABBR = new EnumMap<>(BookID.class);
+
 	private static final BookMeta[] BOOK_META = new BookMeta[] {
 		new BookMeta("Ge", BookID.BOOK_Gen),
 		new BookMeta("Ex", BookID.BOOK_Exod),
@@ -153,20 +155,29 @@ public void doExport(Bible bible, String... exportArgs) throws Exception {
 		}
 
 		try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile), StandardCharsets.UTF_8))) {
+			bw.write("\uFEFF");
 			for (BookMeta bm : BOOK_META) {
 				String prefix = "";
 				if (bm.id == BookID.BOOK_Matt && includeStrongs) {
 					prefix = "0 ";
 				}
 				Book bk = bookMap.remove(bm.id);
-				int[] verseCount = StandardVersification.KJV.getVerseCount(bm.id);
 				if (bk != null && ignoreKJV) {
-					verseCount = new int[bk.getChapters().size()];
-					for (int i = 0; i < verseCount.length; i++) {
-						verseCount[i] = bk.getChapters().get(i).createVirtualVerses(false, false).stream()
-								.mapToInt(vv -> vv.getNumber()).max().orElse(1);
+					for (int i = 0; i < bk.getChapters().size(); i++) {
+						Chapter ch = bk.getChapters().get(i);
+						for (Verse v : ch.getVerses()) {
+							bw.write("$$$ " + bm.abbr + " " + (i + 1) + ":" + v.getNumber() + " ");
+							bw.newLine();
+							StringBuilder text = new StringBuilder(prefix);
+							v.accept(new OnLineBibleVisitor(text, includeStrongs));
+							bw.write(text.toString().replaceAll(" +", " "));
+							bw.newLine();
+							prefix = "";
+						}
 					}
+					continue;
 				}
+				int[] verseCount = StandardVersification.KJV.getVerseCount(bm.id);
 				for (int i = 0; i < verseCount.length; i++) {
 					Chapter ch = bk != null && i < bk.getChapters().size() ? bk.getChapters().get(i) : null;
 					int maxVerse = verseCount[i];
@@ -214,6 +225,7 @@ private static class BookMeta {
 		public BookMeta(String abbr, BookID id) {
 			this.abbr = abbr;
 			this.id = id;
+			BOOK_TO_ABBR.put(id, abbr);
 		}
 	}
 
@@ -268,6 +280,14 @@ public Visitor<RuntimeException> visitFootnote() throws RuntimeException {
 
 		@Override
 		public Visitor<RuntimeException> visitCrossReference(String bookAbbr, BookID book, int firstChapter, String firstVerse, int lastChapter, String lastVerse) throws RuntimeException {
+			if (BOOK_TO_ABBR.containsKey(book) && firstChapter == lastChapter) {
+				content.append("\\\\#" + BOOK_TO_ABBR.get(book) + " " + firstChapter + ":" + firstVerse);
+				if (!firstVerse.equals(lastVerse))
+					content.append("-" + lastVerse);
+				content.append("\\\\");
+				return null;
+			}
+			System.out.println("WARNING: Cross reference references more than one book: " + bookAbbr + " " + firstChapter + ":" + firstVerse + "-" + lastChapter + ":" + lastVerse + " - replacing by plain text");
 			return new OnLineBibleVisitor(content, includeStrongs);
 		}
 
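
To make the structured form from the last hunk easier to picture, here is a minimal standalone sketch of the same string building. The class `CrossRefSketch` and the method `formatReference` are hypothetical names used only for illustration. The book abbreviation is passed in directly instead of being looked up in `BOOK_TO_ABBR`, and only the same-chapter case is covered, as in the diff.

```java
public class CrossRefSketch {
    /**
     * Builds a same-chapter reference: two backslashes, '#', the book abbreviation,
     * chapter:verse, an optional "-lastVerse", and two closing backslashes.
     */
    public static String formatReference(String abbr, int chapter, String firstVerse, String lastVerse) {
        StringBuilder content = new StringBuilder();
        // In Java source, "\\\\" is a string of two backslash characters
        content.append("\\\\#").append(abbr).append(' ').append(chapter).append(':').append(firstVerse);
        if (!firstVerse.equals(lastVerse)) {
            content.append('-').append(lastVerse);
        }
        content.append("\\\\");
        return content.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatReference("Mt", 12, "14", "14")); // single verse
        System.out.println(formatReference("Ge", 1, "1", "3"));    // verse range
    }
}
```

The two calls print `\\#Mt 12:14\\` and `\\#Ge 1:1-3\\`.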

4 comments on commit 61fc0b5

@Michahel

I made a test (using my example module from #42).

In Mt 12:14, the following conversion occurs:

  • Before:
...
<para style="s1">Headline</para>
<para style="r">(<ref loc="xxx1">yyy1</ref>; <ref loc="xxx2">yyy2</ref>)</para>
...
  • After BibleMultiConverter:
$$$ Mt 12:14 
... {\$Headline\$} {\$\@(yyy1 {\\#xxx1\\} ; yyy2 {\\#xxx2\\} )\@\$} ...

It should be like this:

  • After BibleMultiConverter:
$$$ Mt 12:14 
... {\$Headline\$ (\\#xxx1 xxx2\\)} ...

@schierlm
Owner Author

Putting references in headlines / subheadlines into nested footnotes is a feature of the Paratext import and is needed for many other formats (like MyBibleZone, ZefaniaXML, E-Sword, BrowserBible), as those formats would not render references in headlines as clickable links. It can be disabled by passing -Dparatext.allowrefsoutsidefootnotes=true.

But that would result in

$$$ Mt 12:14 
... {\$Headline\$} {\$\@(\\#xxx1\\ ; \\#xxx2\\)\@\$} ...

That is still some way from your desired result of having

$$$ Mt 12:14 
... {\$Headline\$ (\\#xxx1 xxx2\\)} ...

As you wrote in #47, collapsing footnotes creates problems when there is text directly adjacent to the footnote. Does this also apply to other footnotes? So this could probably be improved a bit more by joining adjacent footnotes into one.

But I guess anything that goes beyond that would require special-casing conversions for individual import/export format combinations, and I won't go down that road. If someone submits a pull request, I might accept it, but in general my conversions will be generic enough that every input format can be converted into every output format and produce a result that may not be optimal, but is acceptable.

It is obviously always possible to preprocess the input file, postprocess the output file or process any intermediate file (ParatextDump, Diffable, RoundtripXML, RoundtripTaggedText) to improve the conversion results.
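
As an illustration of the postprocessing route, the following is a minimal sketch that merges adjacent references separated by a semicolon into a single reference. It is not part of BibleMultiConverter. The class `MergeRefsSketch` and the method `mergeAdjacentReferences` are hypothetical, and the sketch assumes that references always have the form `\\#...\\` and that OnLineBible accepts several references inside one pair of delimiters, as the desired output above suggests.

```java
import java.util.regex.Pattern;

public class MergeRefsSketch {
    // Pattern.quote avoids hand-escaping the backslashes for the regex engine.
    private static final String CLOSE = Pattern.quote("\\\\");  // literal \\ (end of a reference)
    private static final String OPEN = Pattern.quote("\\\\#");  // literal \\# (start of a reference)

    // Matches the end of one reference, a semicolon with optional whitespace, and the start of the next.
    private static final Pattern BETWEEN_REFS = Pattern.compile(CLOSE + "\\s*;\\s*" + OPEN);

    /** Joins references that are separated only by a semicolon into one reference. */
    public static String mergeAdjacentReferences(String line) {
        return BETWEEN_REFS.matcher(line).replaceAll(" ");
    }

    public static void main(String[] args) {
        String line = "{\\$Headline\\$} {(\\\\#xxx1\\\\ ; \\\\#xxx2\\\\)}";
        System.out.println(mergeAdjacentReferences(line));
    }
}
```

For the sample line, this prints `{\$Headline\$} {(\\#xxx1 xxx2\\)}`. Removing the surrounding formatting codes or parentheses would be a separate step and is deliberately left out here.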

@Michahel

Thanks for all your work.

> As you wrote in #47, collapsing footnotes creates problems when there is text directly adjacent to the footnote. Does this also apply to other footnotes? So this could probably be improved a bit more by joining adjacent footnotes into one.

The problem with adjacent footnotes is a matter of correct style, which I do not particularly care about either, so it may be left unchanged.

However, the second footnote is displayed very incorrectly. {See screen capture} There is a conflict between the Formatting Codes \$ and \\ (the conflict starts after the second \\). The USX specification describes the purpose of <para style="r"> as follows:

Parallel passage reference(s).
A reference to a parallel passage usually located under a section heading s#.

In the example itself, you can see that the text with this style is displayed in italics and in a smaller font than the heading. Such font attributes already apply to text that is enclosed in brackets {...}, so there is no need for additional Formatting Codes.

In addition to the unwanted Formatting Codes being applied to the footnote text, the footnotes are created as separate hyperlinks. This means that the user must open each hyperlink separately. If these links are merged, they will all open in one window, which is much more convenient.

To summarize: I would like to change only the second footnote so that it looks like this:

{\\#xxx1 xxx2\\}

@schierlm
Owner Author

I updated #47 and created #49 for the two issues (no italics in footnotes, merge references).

About your link to the USX specification: that exact specification (or rather its counterpart, the USFM specification) also gives guidance on how to format these attributes for Bible formats that do not support them (which is usually also reflected in the screenshot). The suggestion for \r is to create a smaller headline in italics, set in parentheses.

Which is exactly what BibleMultiConverter is doing on import. That exactly this combination (a headline in italics) causes problems is not something the Paratext import should fix (if you convert to MyBibleZone, you won't be affected by it), so the fix needs to go into the OnLineBible export.
