Strong's numbers and Morphological tags in custom format #30

Closed
viktor-zhuromskyy opened this issue Mar 14, 2020 · 10 comments

@viktor-zhuromskyy

I am converting MyBible modules into TheWord format, and I need the Multi Converter to accept my custom Strong's numbers and morphological tags rather than throw them out.

I have this type of Strongs:
3306, H3306, G3306, L3306
The H3306, G3306, L3306 are not accepted by the converter, at the moment.

I have custom type of Morphological tags.

When I run the converter, I get the following warnings:

WARNING: Invalid Strong number: L245
WARNING: Skipping malformed RMAC morphology code: N-N.MS
WARNING: Skipping malformed RMAC morphology code: R-PG.2S
WARNING: Skipping malformed RMAC morphology code: V-IFA.1P
WARNING: Skipping malformed RMAC morphology code: R-PA.MS
WARNING: Skipping malformed RMAC morphology code: R-PG.2S

Can you please make your code more flexible in how it treats "malformed" attribute types?

@schierlm
Owner

schierlm commented Mar 14, 2020

Thank you for your report.

First, both H3306 and G3306 should be supported when importing from MyBible.Zone, but 3306 will be treated the same way as H3306 in the Old Testament and G3306 in the New Testament. I have never seen the L ones; how should they be converted (to TheWord or other formats)? TheWord only supports <WGxxxx> and <WHxxxx> for Strongs and <WTxxxx> for morphology, but no <WLxxxx>.
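
As an illustration of the rule just described, here is a minimal Python sketch (not the converter's actual Java code). The name `to_theword_tag` and the `is_old_testament` flag are hypothetical; the sketch only assumes that an unprefixed number defaults to H in the Old Testament and G in the New Testament, and that any other prefix is rejected.

```python
import re

# Hypothetical helper for illustration; not part of BibleMultiConverter.
# Accepts an optional G or H prefix followed by digits.
STRONG_RE = re.compile(r"([GH]?)(\d+)")

def to_theword_tag(strong: str, is_old_testament: bool) -> str:
    """Render a Strong's number as a TheWord <WGxxxx> or <WHxxxx> tag."""
    match = STRONG_RE.fullmatch(strong)
    if match is None:
        # Mirrors the warning in the report, e.g. for "L245".
        raise ValueError(f"Invalid Strong number: {strong}")
    prefix, number = match.groups()
    if not prefix:
        # Unprefixed numbers default by testament.
        prefix = "H" if is_old_testament else "G"
    return f"<W{prefix}{number}>"
```

With this sketch, `3306` in an Old Testament book becomes `<WH3306>`, while `L3306` is rejected, which is the behaviour the report complains about.
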

For morphology, there already is a "morphology.raw" option, but unfortunately it is not supported by all module formats yet. It is supported by MyBible.Zone import, but not yet by TheWord export (although there is no particular reason for that, as in both formats you can use arbitrary strings for morphology tags). Other formats like Logos will obviously not support it, as they use their own way of encoding morphology, which only works for morphology codes that follow the RMAC format.

So I'll take this as a feature request to

  1. add support for raw morphology tags for TheWord format (at least export)
  2. add some kind of "raw strong" support (which supports at least unlabeled, G, H, L) for at least the MyBibleZone and TheWord formats, while it is still unclear how to treat the "L" ones.

schierlm added a commit that referenced this issue Mar 21, 2020
When importing MyBibleZone bibles that contain Strongs numbers starting
with L or S, they are treated like the equivalent G/H numbers (depending
on OT/NT) and a warning is printed.

When importing MyBibleZone with the -Dmybiblezone.morphology.raw=true
option, the resulting morphology codes can now also be exported as
TheWord (since TheWord does not mandate any format for morphology
codes).

See #30.
@schierlm
Owner

I implemented a quick and dirty fix:

  • Strong numbers in MyBibleZone that start with L or S (which I also found somewhere) are treated like the equivalent G/H numbers, as if they had no letter prefix.
  • When importing using the -Dmybiblezone.morphology.raw=true option, the imported morphology can now also be exported in TheWord format.

Can you please check if this solves your use case? If not, please clarify how you want these Strongs/Morphology tags treated.

In case you cannot compile from the repo and need a precompiled version, please drop me a short notice and I can send you one.

@viktor-zhuromskyy
Author

viktor-zhuromskyy commented Mar 28, 2020

Thank you so much. Will check it later.

Can you please compile a release?

@schierlm
Owner

Find attached a build of 4e2456e:
BibleMultiConverter-SQLiteEdition-4e2456e7.zip

@viktor-zhuromskyy
Author

Appreciate it so much!

@viktor-zhuromskyy
Author

I checked the build, but I cannot figure out how to add the morphology option to my command line. I am also not happy at all with the L and S prefixes of Strong's numbers being replaced. I want these to be preserved, since the L-prefixed so-called Strong's numbers are in reality references to an LXX dictionary. Can you please make L... be output as L..., and also NOT replace S..., L... and G... numbers in Old Testament books with H...? When I convert a Septuagint module, everything is screwed up, because the Greek Strong's numbers are substituted with Hebrew ones.

@schierlm
Owner

  1. You have to add the option before the -jar in your command line, e.g. java -Dmybiblezone.morphology.raw=true -jar BibleMultiConverter.jar MyBibleZone 1.sqlite TheWord 1.ont.

  2. yeah the replacing of Greek to Hebrew for LXX is definitely a bug, and it probably affects more formats. Will have to thoroughly test it.

  3. Can you tell me how to encode the L numbers so that TheWord does not complain? Perhaps (if you have a matching Strong's dictionary) you can try changing in a text editor and check in TheWord how they need to be to work?

According to TheWord documentation, Strongs numbers have to look like <WGxxxx> or <WHxxxx>, and morphology tags <WTxxxx>. So do I have to make them <WGL1234> to conform to specification, or <WL1234> against specification (because the specification is wrong)?

If you don't want to try, I don't use TheWord so it will take me a while to set up a Windows VM and test it there myself.

@viktor-zhuromskyy
Author

Just leave the L as it is, since those records need a special dictionary. Sure, that is going to be a perfect fit.

Currently, I am doing text replacement in the SQLite3 database before exporting to TheWord format, and after that doing text replacement again to recreate the <WT and <WG tags.

schierlm added a commit that referenced this issue Mar 30, 2020
This is a massive commit, touching almost all format modules. Before,
the Strong numbers were treated as numbers, by default H in Old
Testament and G in New Testament, with some module specific quirks (in
case modules support to have custom prefix letters). As
@viktor-zhuromskyy pointed out, this is insufficient for modules like
LXX which have greek text in OT and which also have an extra L
dictionary in addition to G and H.

Therefore change the default that every strong reference can have a
letter A-Z attached to it and let the modules handle it. If a module
only supports G and H (or even only one kind of Strong number), the
module can ignore the prefix; other modules can use the prefix. Also
provide `StrippedDiffable` options to add prefix to Strongs (G or H
depending on Testament), remove prefixes that are default, or completely
remove Strongs numbers that have a prefix (if the export format has
trouble exporting it).

This commit probably needs some more testing before the next release :)

See #30.
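
The three prefix-handling options described in this commit message can be pictured with a small Python sketch. This is not the converter's Java code, and the function names below are hypothetical; the sketch only assumes that a Strong reference is an optional letter A-Z followed by digits, and that the default prefix is H in the Old Testament and G in the New Testament.

```python
import re
from typing import Optional, Tuple

# Hypothetical model: optional A-Z prefix letter followed by digits.
STRONG_RE = re.compile(r"([A-Z]?)(\d+)")

def split_strong(strong: str) -> Tuple[str, str]:
    match = STRONG_RE.fullmatch(strong)
    if match is None:
        raise ValueError(f"Invalid Strong number: {strong}")
    return match.group(1), match.group(2)

def default_prefix(is_old_testament: bool) -> str:
    return "H" if is_old_testament else "G"

def add_prefix(strong: str, is_old_testament: bool) -> str:
    """Add G or H (depending on testament) to unprefixed numbers."""
    prefix, number = split_strong(strong)
    return (prefix or default_prefix(is_old_testament)) + number

def remove_default_prefix(strong: str, is_old_testament: bool) -> str:
    """Strip the prefix only when it equals the testament's default."""
    prefix, number = split_strong(strong)
    return number if prefix == default_prefix(is_old_testament) else strong

def drop_prefixed(strong: str) -> Optional[str]:
    """Remove a Strong number entirely (None) if it carries a prefix."""
    prefix, _ = split_strong(strong)
    return None if prefix else strong
```

Under these assumptions, a G number in an Old Testament book (as in an LXX module) keeps its G, and an L number passes through untouched unless it is explicitly dropped.
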
@schierlm
Owner

Ok, so I will now leave the L (or S) prefix in, in addition to <WG or <WH. H and G prefixes will (as they should have done before) override the <WG or <WH. If you want them differently, you can manually edit them.

Find attached a build of b557089:
BibleMultiConverter-SQLiteEdition-b557089.zip
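
For the manual editing mentioned above, a post-processing pass over the exported file could look like the following Python sketch. It assumes, and this is only an assumption from the description in this comment, that L numbers are exported as `<WGL245>` or `<WHL245>`. The function name `unwrap_lxx_tags` is hypothetical, and whether TheWord accepts the resulting `<WL245>` form was left open in this thread.

```python
import re

# Assumed export shape: <WG> or <WH> followed by the retained L prefix.
LXX_TAG = re.compile(r"<W[GH](L\d+)>")

def unwrap_lxx_tags(line: str) -> str:
    """Rewrite <WGL245>/<WHL245> to <WL245>; other tags are untouched."""
    return LXX_TAG.sub(r"<W\1>", line)
```

Ordinary `<WGxxxx>` and `<WHxxxx>` tags do not match the pattern, so they are left as exported.
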

@schierlm schierlm added this to the v0.0.8 milestone Apr 25, 2020
@schierlm
Owner

Closing this for now. Feel free to reopen if anything else is open.
