Examples not using default reconciliation service #11

Closed
gitonthescene opened this issue May 25, 2021 · 4 comments · Fixed by #12
Labels
bug Something isn't working enhancement New feature or request

Comments

@gitonthescene

Hi there, thanks for this project! Might it be useful for the README to have at least one example using a reconciliation service other than the default one?

Also, I think exposing the data extension service through this API might also be useful.

@gitonthescene
Author

I tried running against my own reconciliation service and I ran into the following error:

>>> reconcile(toreconcile['reps'], reconciliation_endpoint='http://127.0.0.1:5000/reconcile')
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13706.88it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/venv/lib/python3.9/site-packages/reconciler/reconcile.py", line 59, in reconcile
    full_df = parse_raw_results(input_keys, response)
  File "/private/tmp/venv/lib/python3.9/site-packages/reconciler/webutils.py", line 121, in parse_raw_results
    current_df.drop(["features"], axis=1, inplace=True)
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 4308, in drop
    return super().drop(
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/generic.py", line 4153, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/generic.py", line 4188, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5591, in drop
    raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['features'] not found in axis"
>>> ^D

Is it possible that my version of pandas gives a KeyError instead of the IndexError that's trapped here?

>>> pd.__version__
'1.2.4'
>>> sys.version
'3.9.5 (default, May  7 2021, 21:28:16) \n[Clang 12.0.5 (clang-1205.0.22.9)]'
>>> 

Is it worth trying to trap both? If I replace IndexError with KeyError on my copy, I'm able to reconcile just fine.

>>> reconcile(toreconcile['reps'], reconciliation_endpoint='http://127.0.0.1:5000/reconcile')
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 65.24it/s]
        id  match                name      score      input_value
0  Q434890  False  Nydia M. Velázquez  86.666667  Nydia velazques
>>>

Hmm... I get a KeyError even when I just try to access a non-existent column in a DataFrame. Is this just a bug?

>>> toreconcile['crap']
Traceback (most recent call last):
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'crap'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/private/tmp/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'crap'
>>> 

@jvfe
Owner

jvfe commented May 30, 2021

Hey, good catch!

Actually, in this case we want to catch both IndexErrors and KeyErrors, to account for missing top-level keys in the JSON response (which raise a KeyError) as well as lower-level keys that are accessed in the list comprehensions in webutils.py (which can raise IndexErrors).
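To illustrate the distinction, here is a minimal sketch using plain dicts and lists rather than the actual webutils.py code. The response layout and the helper name `first_result_id` are hypothetical, chosen only so that each error type can occur.

```python
def first_result_id(response, query_key):
    """Return the id of the first candidate for query_key, or None.

    Hypothetical helper: the response layout is an assumption made for
    illustration, not the structure handled in webutils.py.
    """
    try:
        # A missing dict key raises KeyError;
        # indexing an empty "result" list raises IndexError.
        return response[query_key]["result"][0]["id"]
    except (KeyError, IndexError):
        return None


response = {
    "q0": {"result": [{"id": "Q434890"}]},
    "q1": {"result": []},  # no candidates: IndexError inside the helper
}

print(first_result_id(response, "q0"))  # candidate present
print(first_result_id(response, "q1"))  # empty candidate list
print(first_result_id(response, "q2"))  # missing top-level key
```

Both `KeyError` and `IndexError` are subclasses of `LookupError`, so `except LookupError` would catch the same cases, at the cost of being less explicit about which failures are expected.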

Raising a KeyError when accessing a non-existent column is standard behaviour for pandas. I catch that error because the reconciliation web APIs can return differently shaped responses. I initially developed this only for Wikidata reconciliation, so I needed to adapt some things to make it usable with other reconciliation services, and this part is still a bit janky.
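As a small sketch of that pandas behaviour, the snippet below uses a toy DataFrame (an assumption standing in for a parsed response without a "features" column). It also shows the `errors="ignore"` option of `DataFrame.drop` as one possible alternative to a try/except; this is not necessarily how the fix in #12 was done.

```python
import pandas as pd

# Toy frame standing in for a parsed response that has no "features" column.
current_df = pd.DataFrame({"id": ["Q434890"], "match": [False]})

# Default behaviour: dropping a missing column raises KeyError, not IndexError.
try:
    current_df.drop(["features"], axis=1)
    raised = None
except KeyError as err:
    raised = type(err).__name__

# errors="ignore" drops the column only when it is actually present.
cleaned = current_df.drop(columns=["features"], errors="ignore")

print(raised)
print(list(cleaned.columns))
```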

As for an example using another reconciliation service, that would be a very good addition to the README. If you want to include it in your current pull request, it would be very welcome!

I've never used the data extension API before, but it seems interesting. I'll take a look at it later, too.

Thanks for posting this issue!

@jvfe jvfe added bug Something isn't working enhancement New feature or request labels May 30, 2021
@jvfe jvfe closed this as completed in #12 May 31, 2021
@gitonthescene
Author

Thanks very much! Would you mind pinging me when you’ve pushed the fix to PyPI?

@jvfe
Owner

jvfe commented May 31, 2021

@gitonthescene A quick patch solving both of your issues just went up. Thank you!

2 participants