Raw data from JMDict? #747
First, please allow me to thank you for your support on Patreon; it is highly appreciated! [1] - I'll come to that later on ;)
(1) That's quite an unusual request :-) I am somewhat hesitant to add raw XML to the dictionary files, since that would enlarge the data files considerably. Currently I'm storing the info in a binary blob rather than as XML, since it takes up less space and is faster to parse. I'll experiment and let you know.
[2] Fixed in Aedict 3.39.34
[1]: Dictionary file without XML: 78 MB; with XML: 171 MB. I'm not sure about this, that's more than twice the original size... Maybe I can reconstruct the XML from the binary data? Not sure though, the entry XML structure may get quite complex... Let me check whether the frequency notes are transferred to the binary data.
Okay, maybe [1] was a bad idea.
[1A] Bug report: http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1547720 but Aedict v3.39.33 on Android shows both kanji as 'common'.
[1B] Enhancement discussion: it might be nice if this info was retained in full: [ichi1,news1,nf16]. I've seen cases where one kanji had 3 or 4 of these markers and the other only 1, yet both are displayed as 'Common' in Aedict. It would feel somewhat reassuring to know how an entry was marked in the raw JMDict data. I realize that this is a much bigger change than [1A] and that it may end up being less compatible with other dictionaries.
[1C] I seem to recollect seeing cases where there was more than one spelling and more than one reading, and one reading was marked as especially relevant for one of the spellings. But I can't find that example any more and I'm not sure it existed in the first place. Much less do I remember whether Aedict was able to present the information in the same detail as the raw JMDict data.
Thank you, I agree, let's split the issue.
There already exists a mechanism: once you tap a marker like 'Common' in Aedict, a pop-up appears giving some detail. I imagine this is where the [ichi1,news1,nf16] markers could appear. We already have separate 'Common' markers for kanji and for readings, which matches the JMDict format quite nicely.
[1a] quoting from JMDict.xml: (The entries with news1, ichi1, spec1 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files.). Quoting from the xml file:
So 来る is rightfully marked (P) - common, yet 來る should not be marked common. A clear bug, I'll fix that. The reading is common, so maybe Aedict incorrectly assumes that all kanji it connects to are common as well. Thank you for spotting that!
Fixed in Aedict 3.39.34; if you are still missing more raw data, please open a new feature request.
Thanks a lot for this improvement, Martin.
Sorry about that :-) For future reference, just press the blackish (i) button; it should show this frequency information.
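To make the per-element rule concrete, here is a minimal Python sketch (not Aedict's actual code). It assumes the standard JMdict element names (`k_ele`/`keb`/`ke_pri` for kanji, `r_ele`/`reb`/`re_pri` for readings). The sample entry is hand-written for illustration and is not copied from JMdict; `COMMON_TAGS`, `SAMPLE_ENTRY` and `common_flags` are hypothetical names. The point is that each kanji and reading is judged only by its own priority tags, so a common reading never makes a kanji common.

```python
import xml.etree.ElementTree as ET

# Priority values that, per the JMdict note quoted above, earn the "(P)" marker.
COMMON_TAGS = {"news1", "ichi1", "spec1", "gai1"}

# Hand-written sample for illustration only; NOT the real JMdict entry data.
SAMPLE_ENTRY = """
<entry>
  <k_ele><keb>来る</keb><ke_pri>ichi1</ke_pri></k_ele>
  <k_ele><keb>來る</keb></k_ele>
  <r_ele><reb>くる</reb><re_pri>ichi1</re_pri></r_ele>
</entry>
"""

def common_flags(entry_xml):
    """Return {spelling_or_reading: is_common}, judging each element on its own tags."""
    entry = ET.fromstring(entry_xml)
    flags = {}
    # (element tag, text tag, priority tag) for kanji and reading elements.
    layout = (("k_ele", "keb", "ke_pri"), ("r_ele", "reb", "re_pri"))
    for ele_tag, text_tag, pri_tag in layout:
        for ele in entry.findall(ele_tag):
            own_tags = {pri.text for pri in ele.findall(pri_tag)}
            # Common only if this very element carries one of the (P) tags.
            flags[ele.findtext(text_tag)] = bool(own_tags & COMMON_TAGS)
    return flags

if __name__ == "__main__":
    for form, is_common in common_flags(SAMPLE_ENTRY).items():
        print(form, "common" if is_common else "not common")
```

With this sample, 来る and くる come out as common while 來る does not, which is the behaviour described for the fix.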
Hi,
Android AEdict has become a daily tool for me, thanks a bunch.
[1]
One thing which I'm still sorely missing is the ability to view raw data from JMDict
It would be nice if I could click a small button in the 'Dictionary Entry Screen', or slide my finger from one of the screen edges, or something else, to view raw data like this
http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1225790
or even like this
http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&disp=jm&e=21440
That way I would be able to get the full richness of the info, particularly the full details of the frequency notes (spec2,news2,nf25) and the full relationship between spellings and glosses. I appreciate all the hard work that has gone into presenting this info nicely and wish you all the best in improving it further, but I still treat, and I guess always will treat, the raw data as the ultimate truth, and I would like to be able to access it from the app (I don't always have access to the internet, so having it all on my phone is super-handy).
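As an illustration of what the nfXX part of such a note carries: as far as I recall from the JMdict documentation, "nfXX" gives the number of the 500-word set in a word-frequency ranking in which the entry is found, with "01" being the first 500 words. Under that assumption, a small Python sketch can turn a tag into a rank range; `BUCKET_SIZE` and `nf_rank_range` are hypothetical names, not part of Aedict or JMdict.

```python
import re

# Assumption: each nfXX bucket covers 500 ranks, with nf01 = ranks 1..500.
BUCKET_SIZE = 500

def nf_rank_range(tag):
    """Return (first_rank, last_rank) for an 'nfXX' tag, or None for any other tag."""
    match = re.fullmatch(r"nf(\d{2})", tag)
    if match is None:
        return None  # e.g. spec2 or news2 carry no bucket number
    bucket = int(match.group(1))
    if bucket < 1:
        return None  # nf00 is not a meaningful bucket under this assumption
    return ((bucket - 1) * BUCKET_SIZE + 1, bucket * BUCKET_SIZE)

if __name__ == "__main__":
    for tag in ("spec2", "news2", "nf25"):
        print(tag, nf_rank_range(tag))
```

Under the stated assumption, nf25 corresponds to ranks 12001 to 12500, detail that a plain 'Common' marker cannot convey.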
[2]
If I can abuse the issue system as a communication tool: would it be possible to ask you to name the source of the pitch accent data? Just as I'm able to view JMDict data outside of AEDict, I would like to keep the option open to look up pitch data myself too. BTW, it doesn't look like this info is available on the About screen. Maybe it would be nice to have it there?
[3]
The source code for these libraries that AEDict depends on
sk.baka.autils:autils
sk.baka.tools:bakatools
is presently not available online, is it? I'm not saying that's wrong or that it has to be, just making sure. In fact, I only wanted to see the source to find out where the pitch accent data comes from, because I couldn't find the word 'pitch' in the main source code repo :)