ISSUE-234: Make Flavor search aware of CWS/Children based OCR #235
Does not yet ensure the right PAGE ID for IAB. More work in progress (not yet committed).
The OCR returns were overwriting previous results! @patdunlavey @giancarlobi this should now match more things on a Book search. Still incomplete and needs more checking (we need to be sure the new Solr fields are always present). One note: if an OCR contains "Queen;", "Queen" won't match. I wonder if we need to either add spacing to the OCR itself to allow this to match, or alter the HOCR processor/field type to also tokenize on ",", "-" and ";" (Solr config)?
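To illustrate the "Queen;" note, here is a minimal Python sketch of the two options. It is not the Solr analyzer or the HOCR processor; it only assumes, for illustration, that the field currently behaves like a whitespace-only split. The function names and the sample OCR line are hypothetical.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only (assumed current behaviour), lowercasing tokens."""
    return [token.lower() for token in text.split()]


def punctuation_aware_tokens(text):
    """Split on whitespace and also on ',', '-' and ';', lowercasing tokens."""
    return [token.lower() for token in re.split(r"[\s,;\-]+", text) if token]


# Hypothetical OCR line, not taken from a real HOCR file.
ocr_line = "The Queen; the King, and a well-known Pumpkin"

# With a whitespace-only split the token is "queen;", so "queen" is not found.
print("queen" in whitespace_tokens(ocr_line))
# Splitting on the extra characters yields a clean "queen" token.
print("queen" in punctuation_aware_tokens(ocr_line))
```

Note the trade-off: splitting on "-" also breaks "well-known" into "well" and "known", which may or may not be desirable for highlighting.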
This allows any change on a parent (meaning the attached ADO) to trigger a refresh. This is needed so that changes in titles, sequence_ids or any other arbitrary metadata permeate into an SBF document.
Basic CWS/Parent/Child search using the parent sequence_id (meaning Children ADO based ordering).
@patdunlavey this is the code that solves 85% of the use cases. Will add a guide with screenshots tomorrow, but the code as it is should be ready. The other part goes into
@patdunlavey, @alliomeria and @karomabiles (if you want to see this working), instructions for testing OCR for compounds:
And make sure each one also has "sequence_id" set to 1, 2 and 3 respectively. If your webforms don't have that element/key (we need it), please add it or edit the raw JSON. Save. Make sure the Queue is processed (all the background ones that will generate OCR).
Mine is named: and has these settings:
Basically you want to have the IABookreader but using the IIIF V3 CWS template as source.
Now apply that View Mode to the top object by editing it and forcing that Display Mode.
You should not need to reindex at this stage (if you followed these steps for this demo object).
Search for "Queen", "Pumpkin" and "King". Each should be highlighted correctly on its own page. Now search for "OCR" multiple pages.
This covers the basic use case where all children have a
Please let me know if you have issues/questions/needs.
@DiegoPino starting to look at this now (sorry for the delay!!!)
We already cast to (int) later on. I can of course be EVEN more thorough, but is_int causes webform-generated node IDs to be skipped. BAD!
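The commit above is about PHP's is_int and (int), not Python. As a rough analogue only, the sketch below shows why a strict integer type check drops IDs that arrive as strings, under the assumption (not confirmed in this thread) that webform-generated node IDs are stored as numeric strings. The function names and sample values are hypothetical.

```python
def strict_node_ids(values):
    """Keep only values that are already integers (analogue of an is_int check)."""
    return [v for v in values if isinstance(v, int) and not isinstance(v, bool)]


def lenient_node_ids(values):
    """Accept integers and digit-only strings, then cast them to int."""
    ids = []
    for v in values:
        if isinstance(v, bool):
            continue  # bool is a subclass of int in Python; never a node ID
        if isinstance(v, int):
            ids.append(v)
        elif isinstance(v, str) and v.strip().isdecimal():
            ids.append(int(v.strip()))
    return ids


# Hypothetical submitted values: one native int, two numeric strings, two junk values.
submitted = [12, "34", " 56 ", "abc", None]

print(strict_node_ids(submitted))   # the string IDs are skipped
print(lenient_node_ids(submitted))  # the string IDs survive and are cast
```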
@DiegoPino I was able to reproduce your steps, and your result! The only problem I noticed is that I don't get the pins in the result bar. I suspect that's because I'm not fully caught up with the changes in the IIIF Presentation API 3 Creative Works Series Manifest. I tested what happens when I add a second image file to one of the child objects. It seems to OCR correctly, but it is saved in the key_value table with the sequence number of "1", rather than that found at Not sure if this is a simple problem to solve (and whether it's in the 15% you referred to!).
Looking here, it seems like the sequence number should be correct. Not sure why it isn't!
@patdunlavey adding a new page and having key_value = 1 is OK. I wonder if you added the "sequence_id" JSON key to your new page/ADO?
The actual page matching here depends on having a sequence_id at the Child ADO level. Without it, the Manifest is going to show pages in any order and won't match the response order (and the relative new ordering of results from the search) that happens here now. The re-paging of the results happens here: strawberryfield/src/Controller/StrawberryfieldFlavorDatasourceSearchController.php Line 290 in 3d022ae
So if your ADO (the one that produced the HOCR) has no sequence_id it will return 1 and thus will offset everything. Your new page should have sequence_id = 4 (in the JSON) now.
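The offset described above can be sketched in a few lines of Python. This is a simplified illustration, not the logic in StrawberryfieldFlavorDatasourceSearchController.php: it assumes pages are ordered by each child's sequence_id and that a missing sequence_id falls back to 1, as stated in the comment. The "label" key and the sample children are hypothetical.

```python
def page_order(children):
    """Return child labels ordered by sequence_id, falling back to 1 when missing."""
    keyed = [(int(child.get("sequence_id", 1)), child["label"]) for child in children]
    # sorted() is stable, so children with the same sequence_id keep their input order.
    return [label for _, label in sorted(keyed, key=lambda pair: pair[0])]


# Three children set up as in the testing instructions, plus a new page.
children = [
    {"label": "Queen", "sequence_id": 1},
    {"label": "Pumpkin", "sequence_id": 2},
    {"label": "King", "sequence_id": 3},
]

# New page without a sequence_id: it collides with page 1 and shifts later pages.
without_id = children + [{"label": "New page"}]
print(page_order(without_id))

# New page with sequence_id = 4: it lands at the end and nothing shifts.
with_id = children + [{"label": "New page", "sequence_id": 4}]
print(page_order(with_id))
```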
Also, the lack of pins in the result bar is strange. Are you using this on top of a custom code piece? E.g., have you started modifying any other part of Archipelago already? Weird, because on a fresh 1.0.0 I do see the pins... maybe we need to have a call!
@patdunlavey I will merge and we open a new Pull/ISSUE for troubleshooting? There is more work to be done on SBFlavors for sure, and I can add any corrections to a new pull.
Sorry, I meant to get the results of my investigation in earlier! I'll make a new ticket for the multi-file sequencing issue.
Still WIP don't even test