Revert erroneous index to a coherent state #6

@benlabbe

Description

Could we imagine a process that reverts the NindIndex to its last coherent state?

Context

For some reason, the documents I try to index in Nind systematically trigger an exception.

~:$  indexXmlMultimedia corpus_xml/*.mult
...
Indexing corpus_xml/02148_HORZ-ISBL_lot_A_-_EXHIBIT_F_-_Annex_F3_-_rev0.xml.mult (2148/2554, 84%)
terminate called after throwing an instance of 'latecon::nindex::OutWriteBufferException'
  what():  Out write buffer error
Abandon (core dumped)

Since my documents are large (frequently more than 200 pages), indexing runs for nearly 6 hours before hitting this error at 84% of the corpus.

First of all, I don't know yet what causes this OutWriteBufferException. The .mult file itself seems to be at fault, because when it is excluded, indexing of the corpus runs to completion.

Sadly, this means restarting the indexing from scratch, because after the OutWriteBufferException the Nind index files are corrupted. Indexing on top of the corrupted Nind files causes a NindPadFileException that doesn't even say in which file the corruption was found!

terminate called after throwing an instance of 'latecon::nindex::NindPadFileException'
  what():  Nind Pad error
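
To illustrate what a more helpful message could look like, here is a minimal sketch of an exception hierarchy that puts the file name into the text returned by what(). The class names FileExceptionSketch and PadFileExceptionSketch, their constructors, and the fileName() accessor are hypothetical stand-ins, not the actual Nind classes, whose constructors I have not checked.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a file-related base exception (NOT the real
// Nind FileException). The message handed to std::runtime_error already
// contains the file name, so the inherited what() reports it without
// needing an override.
class FileExceptionSketch : public std::runtime_error {
public:
    FileExceptionSketch(const std::string& reason, const std::string& fileName)
        : std::runtime_error(reason + ": " + fileName), m_fileName(fileName) {}

    // Lets calling code retrieve the offending file programmatically.
    const std::string& fileName() const noexcept { return m_fileName; }

private:
    std::string m_fileName;
};

// Hypothetical stand-in for a pad-corruption exception.
class PadFileExceptionSketch : public FileExceptionSketch {
public:
    explicit PadFileExceptionSketch(const std::string& fileName)
        : FileExceptionSketch("Nind Pad error", fileName) {}
};
```

With something along these lines, an uncaught exception would make the what() line printed at termination name the file, and a caller that catches it could read fileName() directly.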

Propositions

  1. How could we ensure that the NindPadFileException tells us in which file the corruption was found?
    • In its inheritance chain from FileException and std::runtime_error, NindPadFileException does not seem to override the what() method, which is supposed to return a context message naming the file in which the corruption was found.
  2. Could we imagine a process that rolls back the indexing of the last document, so that the Nind files return to a safe state?
    • This would allow indexing to resume with the next .mult files through to the end of the corpus, instead of restarting from scratch.

Calling for help
@kleag, @jys, could you help here?
