MyBible to USFM/USX issues #32

Closed
viktor-zhuromskyy opened this issue Mar 30, 2020 · 2 comments
@viktor-zhuromskyy

When trying to convert MyBible SQLite3 Bible modules to USFM/USX, I am getting a flood of the following warnings:

WARNING: Raw HTML is not supported
WARNING: No tag found for formatting: font-style: italic; myBibleType=note

The command I am running is java -Dmybiblezone.morphology.raw=true -jar ./BibleMultiConverter-SQLiteEdition/BibleMultiConverter-SQLiteEdition.jar MyBibleZone ./LCV’19r+.SQLite3 USFM ./zzz '#-*.usx'

Here you can download my file to trace it yourself: https://s2.igrnt.info/MyBible/%D0%BE%D1%84%D0%B8%D1%86%D0%B8%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9%20%D1%80%D0%B5%D0%BB%D0%B8%D0%B7/LCV%E2%80%9919r+.SQLite3

@schierlm
Owner

schierlm commented Mar 31, 2020

Thank you for your report.

First, you are using the USFM export filter with a .usx file extension, but that was probably just a mistake.

I am not sure what the desired outcome is for you (i.e. how you would expect the program to behave instead). There are obviously situations where Bible formats have features that cannot (with acceptable effort) be mapped to other Bible formats. The warnings tell you when this happens, to indicate where you may improve the quality by manually editing the input file, an intermediary Roundtrip file or the output file.

Do you want an option to not show such warnings, or to count/tally them instead of printing them when they are encountered? Or are the warnings unclear and need better wording? Or do the warnings result in a suboptimal output file and you want to suggest how conversion could have been improved here?

For the first warning, WARNING: Raw HTML is not supported: MyBible SQLite format can embed arbitrary HTML inside of footnotes and introduction texts. Your example module includes Facebook links, ordered lists with various CSS, etc. Only a very limited set of tags is converted to actual Bible format features; all the other tags are stored as "Raw HTML" in the output format. If the destination format also supports Raw HTML, this information can be preserved that way. USFM/USX, on the other hand, does not support any HTML, so it can only export the tags which are supported. Raw HTML tags will get stripped, but the text content will be preserved. I do not have plans to increase the number of HTML tags that can be converted, so in case there is anything important in the input HTML, I'd suggest editing it either in the source format or in an intermediate Roundtrip format.
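To illustrate what "tags stripped, text preserved" means in practice, here is a minimal Java sketch. It is not the converter's actual code: the class `RawHtmlStripSketch` and the method `stripTags` are hypothetical names, and the regular expression is a deliberately naive assumption that ignores HTML comments, entities, and `>` characters inside attribute values.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of BibleMultiConverter.
final class RawHtmlStripSketch {

    // Naive assumption: anything between '<' and the next '>' is a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Removes all tags from the given HTML fragment but keeps the text between them. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }
}
```

With this sketch, a footnote such as `See <a href="https://example.org">our page</a>.` would come out as `See our page.`: the link target is lost, the readable text survives.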

The second warning is about a Formatting tag with custom CSS, which has no output mapping for USFM. The tag originally was a <n> tag in MyBible.Zone, and probably it would be a good idea to map it to the \add USFM tag here. I will be improving this by mapping it to \add, and also by adding a fuzzy matcher, so that if there is no exact mapping, it will find the italic and make it an \it tag instead. This can also be done manually in the Bible file, or by using the OptimizeFormatting option of StrippedDiffable, e.g.

OptimizeFormatting 'C=font-style: italic; myBibleType=note->F=ITALIC'

To replace all custom formatting with the given CSS by Italic formatting.

In both cases, the resulting file should be usable without any postprocessing, if you can live with the non-optimal tagging.

schierlm added a commit that referenced this issue Apr 6, 2020
First, add a special case to convert CSS created by MyBibleZone's `<n>`
tag to \add tag.

In addition, when there is custom CSS to be exported and there is no
exactly matching USFM tag, check if the CSS includes bold and/or italic
and fall back to \bd or \it or \bdit tags.

Finally, convert red text to \wj and small caps to \sc.

When these fuzzy rules find nothing, behave as before (print a warning
and leave the content unformatted).

See #32.
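
The fallback order described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: the class and method names are hypothetical, the exact CSS property values checked (`font-weight: bold`, `color: red`, `font-variant: small-caps`) are guesses at what the input might contain, and the relative priority of bold/italic over red and small caps is assumed from the order of the paragraphs above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the code from the referenced commit.
final class FuzzyUsfmTagSketch {

    /** Splits "a: b; c=d" into {a=b, c=d}; keys and values are trimmed and lowercased. */
    static Map<String, String> parseDeclarations(String css) {
        Map<String, String> result = new HashMap<>();
        for (String declaration : css.split(";")) {
            String[] parts = declaration.split("[:=]", 2);
            if (parts.length == 2) {
                result.put(parts[0].trim().toLowerCase(Locale.ROOT),
                        parts[1].trim().toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    /** Returns a USFM character marker name (without backslash), or empty if no rule applies. */
    static Optional<String> fallbackTag(String css) {
        Map<String, String> d = parseDeclarations(css);
        // Special case: CSS produced for MyBibleZone's <n> tag maps to \add.
        if ("note".equals(d.get("mybibletype"))) return Optional.of("add");
        // Fuzzy rules: bold and/or italic (assumed property values).
        boolean bold = "bold".equals(d.get("font-weight"));
        boolean italic = "italic".equals(d.get("font-style"));
        if (bold && italic) return Optional.of("bdit");
        if (bold) return Optional.of("bd");
        if (italic) return Optional.of("it");
        // Red text and small caps (assumed property values).
        if ("red".equals(d.get("color"))) return Optional.of("wj");
        if ("small-caps".equals(d.get("font-variant"))) return Optional.of("sc");
        // Nothing matched: the caller would warn and leave the content unformatted.
        return Optional.empty();
    }
}
```

Under these assumptions, the CSS from the original warning, `font-style: italic; myBibleType=note`, resolves to `add`, while a plain `font-style: italic` resolves to `it`.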
schierlm added this to the v0.0.8 milestone Apr 25, 2020
@schierlm
Owner

Closing for inactivity. Feel free to reopen in case anything is still open for you.
