Test and possibly migrate to lmdbjava #75

Open
kermitt2 opened this issue Mar 20, 2018 · 11 comments · May be fixed by #83

Comments

@kermitt2
Owner

lmdbjava is apparently better maintained (more features and builds for more OSes) and faster. I also never got the zero-copy mode working reliably with lmdbjni, so it is worth trying lmdbjava for this too.

@kermitt2 kermitt2 added this to the 0.0.4 milestone Mar 20, 2018
@lfoppiano lfoppiano self-assigned this Apr 18, 2018
lfoppiano added a commit that referenced this issue Apr 18, 2018
lfoppiano added a commit that referenced this issue May 2, 2018
@lfoppiano
Collaborator

Everything seems to work properly except one thing: the number of readers is limited, and asynchronous calls from the JavaScript frontend result in exceptions. lmdbjava/lmdbjava#65 (comment)
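
For context, LMDB keeps a fixed-size reader table, and every open read transaction occupies one slot, so a burst of parallel requests can exhaust it. Besides raising the limit when the environment is opened, one possible mitigation is to cap the number of concurrent reads on the application side. The following is only a minimal sketch of that idea, using the JDK alone; `ReaderGate` and its methods are hypothetical names for illustration and are not part of lmdbjava or of this project, and this is not necessarily how the problem is addressed in this branch.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of lmdbjava or this project): caps the number
 * of read operations running at once so they never exceed the reader slots.
 */
class ReaderGate {
    private final Semaphore slots;

    ReaderGate(int maxReaders) {
        // fair = true so that waiting requests are served in arrival order
        this.slots = new Semaphore(maxReaders, true);
    }

    /** Runs the read while holding one slot; blocks while all slots are taken. */
    <T> T read(Supplier<T> readOperation) {
        slots.acquireUninterruptibly();
        try {
            return readOperation.get();
        } finally {
            // Always give the slot back, even if the read throws
            slots.release();
        }
    }

    /** Number of slots currently free, useful for monitoring. */
    int freeSlots() {
        return slots.availablePermits();
    }
}
```

A request handler would wrap each lookup in `gate.read(...)`, with the gate sized at or below the environment's reader limit, so that extra requests wait for a slot instead of failing.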

lfoppiano added a commit that referenced this issue May 11, 2018
lfoppiano added a commit that referenced this issue May 11, 2018
@lfoppiano
Collaborator

The commit 7372366 should have solved the last issue with the number of readers.

@lfoppiano
Collaborator

lfoppiano commented Jun 18, 2018

KB: 37413613 concepts.
EN: 14899737 pages.
DE: 3579552 pages.
FR: 3681264 pages.
ES: 3322291 pages.
IT: 2291751 pages.

@lfoppiano
Collaborator

For languages other than English, the domains are not resolved. Not a clue why.

@kermitt2
Owner Author

It's because the domains are derived from the English categories only. The other languages do not have the same category hierarchy (so we would need a mapping per language), and they have a much smaller set of categories.

@lfoppiano
Collaborator

I wasn't clear. I meant that with this branch there are no domains at all, while on the master version the domains are in the output JSON.

Might this be solved by rebuilding all the databases?

@kermitt2
Owner Author

Mmm, I don't understand. Are the domains not produced for English, or are they missing from the disambiguation result JSON?

The domains are built once by the Upper KB, just after the Lower KB for English is built. It's like any db: if you want to force a rebuild, just delete the lmdb files and relaunch.
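
As a rough illustration of the "delete the lmdb files" step, here is a minimal sketch that removes the `*.mdb` files (LMDB's default `data.mdb` and `lock.mdb`) from a single database directory. The class name `LmdbReset` and the directory passed in are hypothetical; the actual location of the database directories depends on the local configuration, and the service should be stopped before running anything like this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, for illustration only: not part of this project. */
class LmdbReset {

    /** Deletes the *.mdb files directly inside dbDir and returns the removed paths. */
    static List<Path> deleteLmdbFiles(Path dbDir) {
        List<Path> targets = new ArrayList<>();
        try {
            // Collect first, then delete, to avoid modifying the directory while listing it
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dbDir, "*.mdb")) {
                for (Path file : files) {
                    targets.add(file);
                }
            }
            for (Path file : targets) {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return targets;
    }
}
```

After the files are gone, relaunching triggers the rebuild described above; other files in the directory are left untouched.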

@lfoppiano
Collaborator

They are not in the output JSON. But it was just a note on the task.

@lfoppiano
Collaborator

Before lmdbjava:

[screenshot, 2018-06-18 15:52:25]

After lmdbjava:

[screenshot, 2018-06-18 15:52:32]

@lfoppiano
Collaborator

Interestingly, the total numbers of concepts and pages match (see #50).

@kermitt2
Owner Author

It might be that the interlingual files are missing from your resource files, so the KB cannot relate the English domain to an Italian entity.

@lfoppiano lfoppiano linked a pull request Aug 10, 2018 that will close this issue
@kermitt2 kermitt2 modified the milestones: 0.0.4, 0.0.5 Jun 12, 2020
@lfoppiano lfoppiano linked a pull request Jul 1, 2020 that will close this issue
@kermitt2 kermitt2 removed this from the 0.0.5 milestone Jul 19, 2022