Query timeout limit reached while updating German Nouns #124

@shashank-iitbhu

Description

(scribedev) shashankmittal@ShashanksLaptop Scribe-Data % python3 src/scribe_data/extract_transform/wikidata/update_data.py '["German"]' '["nouns", "verbs"]' 
Data updated:   0%|                                                                                                                   | 0/2 [00:00<?, ?dirs/s]Querying and formatting German nouns
Data updated:   0%|                                                                                                                   | 0/2 [01:00<?, ?dirs/s]
Traceback (most recent call last):
  File "/Users/shashankmittal/Documents/Developer/scribe/Scribe-Data/src/scribe_data/extract_transform/wikidata/update_data.py", line 141, in <module>
    results = sparql.query().convert()
  File "/opt/anaconda3/envs/scribedev/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py", line 1196, in convert
    return self._convertJSON()
  File "/opt/anaconda3/envs/scribedev/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py", line 1059, in _convertJSON
    json_str = json.loads(self.response.read().decode("utf-8"))
  File "/opt/anaconda3/envs/scribedev/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/envs/scribedev/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/envs/scribedev/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 320797 column 115 (char 6713171)

Query builder Link

The query time limit is reached, which is why results = sparql.query().convert() in update_data.py throws json.decoder.JSONDecodeError (Invalid control character at: line 320797 column 115 (char 6713171)). The body of sparql.query().response contains the timeout error logs, so it is not valid JSON.
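To make this failure mode easier to recognize, the raw response body could be checked before the error reaches the caller as a bare JSONDecodeError. The following is a minimal sketch, not the code in update_data.py: the names parse_sparql_json, QueryTimeoutError and TIMEOUT_MARKER are hypothetical, and the marker string is an assumption about what the timeout logs in the response body contain.

```python
import json


class QueryTimeoutError(RuntimeError):
    """Hypothetical error type: the endpoint returned timeout logs instead of JSON."""


# Assumption: the timeout log embedded in the response body mentions this text.
TIMEOUT_MARKER = "TimeoutException"


def parse_sparql_json(raw_body: bytes) -> dict:
    """Decode a SPARQL JSON response body, reporting timeouts explicitly."""
    text = raw_body.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # A body that fails to parse and carries the marker is treated as a timeout.
        if TIMEOUT_MARKER in text:
            raise QueryTimeoutError(
                "query timed out; response body is not valid JSON"
            ) from err
        # Any other malformed body keeps its original JSONDecodeError.
        raise
```

With SPARQLWrapper, the bytes passed in would presumably come from sparql.query().response.read(), the same call that appears in the traceback above.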

Suggested Changes

  • Considered splitting the SPARQL query into smaller queries, such as one query for nouns and another for pronouns, or querying singular and plural forms separately.
  • Still got the Query timeout limit reached error, since German has 165869 nouns and pronouns in total. Verified here.
  • Use LIMIT and OFFSET to split the work into multiple queries.
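The LIMIT and OFFSET suggestion could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: fetch_all and run_query are hypothetical names, run_query stands in for whatever function sends one query and returns its list of result rows, and the default page size of 50000 is an arbitrary choice.

```python
from typing import Callable, List


def fetch_all(
    base_query: str,
    run_query: Callable[[str], List[dict]],
    page_size: int = 50000,
) -> List[dict]:
    """Fetch every row of base_query by issuing LIMIT/OFFSET pages in sequence."""
    rows: List[dict] = []
    offset = 0
    while True:
        # Append the paging clauses to the otherwise unchanged query.
        page_query = f"{base_query}\nLIMIT {page_size}\nOFFSET {offset}"
        page = run_query(page_query)
        rows.extend(page)
        # A short (or empty) page means the result set is exhausted.
        if len(page) < page_size:
            break
        offset += page_size
    return rows
```

With a page size of 50000, the 165869 German results would take four queries (offsets 0, 50000, 100000 and 150000). Pages are only guaranteed to be disjoint if the query has a stable ORDER BY, which may itself add cost on a large result set, so the page size would likely need tuning against the timeout.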

Labels: -priority- (High priority), bug (Something isn't working)

Status: Done
