WebSearch: prevent infinite synonyms lookup #804

Closed
jrbl opened this Issue · 6 comments

4 participants

Joe Blaylock jeromecaffaro Tibor Simko Samuele Kaplun
Joe Blaylock
Collaborator

Originally on 2011-09-02

The synonym getter in search_unit calls search_unit again with each synonym. If looking up the synonyms of a synonym leads back to the original term, the calls recurse without end and overflow the stack.

So I added a flag to search_unit that disables synonym lookups in the calls that search_unit makes for synonyms.
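
To illustrate the idea, here is a minimal sketch and not the actual patch. The `SYNONYMS` table, the function name `search_unit_sketch` and the `expand_synonyms` flag are all made up for this example; the real search_unit has a different signature and returns hit sets rather than terms.

```python
# Hypothetical synonym table containing a cycle: A -> B and B -> A.
SYNONYMS = {"A": ["B"], "B": ["A"]}


def search_unit_sketch(term, expand_synonyms=True):
    """Return the set of terms that would be searched for `term`."""
    searched = {term}
    if expand_synonyms:
        for synonym in SYNONYMS.get(term, []):
            # The nested call is told not to expand synonyms again.
            # Without the flag, A -> B -> A -> ... would never terminate.
            searched |= search_unit_sketch(synonym, expand_synonyms=False)
    return searched


terms_for_a = search_unit_sketch("A")
```

With the flag set to False in the nested call, the lookup for A visits A and B once each and then stops.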

jeromecaffaro was assigned by jrbl
jrbl changed the title from WebSearch: prevent infinite synonyms lookup to [PATCH] 500 Internal Server Errors on Journal Lookup
jrbl changed the title from [PATCH] 500 Internal Server Errors on Journal Lookup to WebSearch: prevent infinite synonyms lookup
jeromecaffaro
Collaborator

Originally on 2013-04-25

One issue with the provided solution is that synonym expansion stops after a single lookup. For example, with the following knowledge base:

A->B
B->C

a search for A will search for either A or B but not C.

A branch that solves the above limitation is available in jerome/804-master-websearch-fix-synonyms-infinite-recursion
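
One way to get full expansion while still stopping on cycles is to remember which terms have already been expanded. The sketch below only illustrates that idea and is not the code in the branch; `expand_terms`, `kb` and `seen` are hypothetical names, and the knowledge base is modelled as a plain dict.

```python
def expand_terms(term, kb, seen=None):
    """Return `term` plus every synonym reachable through `kb`.

    `kb` maps a term to a list of its synonyms. `seen` collects the terms
    already expanded, so a cycle such as A -> B, B -> A stops cleanly.
    """
    if seen is None:
        seen = set()
    seen.add(term)
    terms = [term]
    for synonym in kb.get(term, []):
        if synonym not in seen:
            terms.extend(expand_terms(synonym, kb, seen))
    return terms


chain = expand_terms("A", {"A": ["B"], "B": ["C"]})
cycle = expand_terms("A", {"A": ["B"], "B": ["A"]})
```

With the chained knowledge base above, a search for A covers A, B and C; with a cyclic one, each term is visited once and the recursion ends.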

Tibor Simko
Owner

Originally on 2013-08-14

Thanks, merging into maint-1.1 as well.

jeromecaffaro
Collaborator

Originally on 2013-08-14

In ed6d4d5:

#CommitTicketReference repository="invenio" revision="ed6d4d54d29c1a791855fa245fa9a9cecc542c12"
WebSearch: fix infinite synonym lookup cases

- Fixes infinite recursion when a knowledge base that is used for
  synonym lookup contains a cycle (A->B, B->A).  Adds 'ignore_synonyms'
  parameter to `search_unit()` in order to control which synonyms have
  already been translated and should consequently be ignored.
  (closes #804)

Reviewed-by: Tibor Simko <tibor.simko@cern.ch>
Samuele Kaplun
Collaborator

Originally on 2013-11-15

We seem to have a regression in INSPIRE:

2013-11-14 23:58:57 --> Unexpected error occurred: 'list' object is not callable.
2013-11-14 23:58:57 --> Traceback is:
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 531, in task_init
2013-11-14 23:58:57 -->     ret = _task_run(task_run_fnc)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 1067, in _task_run
2013-11-14 23:58:57 -->     if callable(task_run_fnc) and task_run_fnc():
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank.py", line 159, in task_run_core
2013-11-14 23:58:57 -->     func_object(key)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 443, in citation
2013-11-14 23:58:57 -->     return bibrank_engine(run)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 356, in bibrank_engine
2013-11-14 23:58:57 -->     func_object(rank_method_code, cfg_name, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 68, in citation_exec
2013-11-14 23:58:57 -->     dic, index_update_time = get_citation_weight(rank_method_code, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 140, in get_citation_weight
2013-11-14 23:58:57 -->     weights = process_and_store(updated_recids, config, chunk_size)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 176, in process_and_store
2013-11-14 23:58:57 -->     cites, refs = process_chunk(chunk, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 209, in process_chunk
2013-11-14 23:58:57 -->     config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 766, in ref_analyzer
2013-11-14 23:58:57 -->     config=config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 90, in get_recids_matching_query
2013-11-14 23:58:57 -->     ret = search_pattern(p=p, f=f, m=m) & recids_cache(collections)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2064, in search_pattern
2013-11-14 23:58:57 -->     basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m, wl)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2311, in search_unit
2013-11-14 23:58:57 -->     ignore_synonyms)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2311, in search_unit
[...]
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2290, in search_unit
2013-11-14 23:58:58 -->     tokenizer = get_field_tokenizer_type(f)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 460, in get_field_tokenizer_type
2013-11-14 23:58:58 -->     field_tokenizer_cache.recreate_cache_if_needed()
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/data_cacher.py", line 76, in recreate_cache_if_needed
2013-11-14 23:58:58 -->     if self.timestamp_verifier() > self.timestamp:
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 448, in timestamp_verifier
2013-11-14 23:58:58 -->     return get_table_update_time('idxINDEX')
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/dbquery.py", line 419, in get_table_update_time
2013-11-14 23:58:58 -->     run_on_slave=run_on_slave)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/dbquery.py", line 256, in run_sql
2013-11-14 23:58:58 -->     rc = cur.execute(sql, param)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 168, in execute
2013-11-14 23:58:58 -->     self.errorhandler(self, TypeError, m)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2013-11-14 23:58:58 -->     raise errorclass, errorvalue
2013-11-14 23:58:58 --> Exiting.