Improve debug messaging for errors caused by type mismatch #16

Closed
lucaspatel opened this issue Aug 2, 2022 · 0 comments · Fixed by #18

I recently ran into an issue with Qupid that stumped me for half an hour. Basically, I ran Qupid using data from Qiita as follows:

example_data = prepare_data("data/analyses/wisc_meta/153241_metadata.tsv")

from qupid import match_by_multiple

nor_str = "Normal"
ad_str = "Dementia-AD"

# pairs for no rarefaction
background = example_data.query("diagnosis == @nor_str")
focus = example_data.query("diagnosis == @ad_str")

example_data = match_by_multiple(
    focus=focus,
    background=background,
    categories=["sex", "mars_age"],
    tolerance_map={"mars_age": 4.5}
)

Qupid raised the following error:

---------------------------------------------------------------------------
NoMatchesError                            Traceback (most recent call last)
/Users/lucas/Documents/knight-rotation/LNP_02_U19_Prepare_SPSS.ipynb Cell 12 in <cell line: 14>()
     11 background = example_data.query("diagnosis == @nor_str")
     12 focus = example_data.query("diagnosis == @ad_str")
---> 14 example_data = match_by_multiple(
     15     focus=focus,
     16     background=background,
     17     categories=["sex", "mars_age"],
     18     tolerance_map={"mars_age": 4.5}
     19 )

File ~/Downloads/miniconda3-intel/envs/qiime2-2022.2/lib/python3.8/site-packages/qupid/qupid.py:107, in match_by_multiple(focus, background, categories, tolerance_map, on_failure)
    105 for cat in categories:
    106     tol = tolerance_map.get(cat, 1e-08)
--> 107     observed = match_by_single(focus[cat], background[cat],
    108                                tol, on_failure).case_control_map
    109     for fidx, fhits in observed.items():
    110         # Reduce the matches with successive categories
    111         matches[fidx] = matches[fidx] & fhits

File ~/Downloads/miniconda3-intel/envs/qiime2-2022.2/lib/python3.8/site-packages/qupid/qupid.py:55, in match_by_single(focus, background, tolerance, on_failure)
     53 else:
     54     if on_failure == "raise":
---> 55         raise exc.NoMatchesError(f_idx)
     56     else:
     57         matches[f_idx] = set()

NoMatchesError: No valid matches found for sample 13663.mars00006.

After working at it for a bit, I realized the problem was that the input focus and background data frames contained a few non-numeric mars_age values. I fixed this with the following line, run before the Qupid lines:

example_data.loc[(example_data["mars_age"] == '>90'),'mars_age'] = "90"

Even though this resolves the non-numeric values, the error persists (with identical output) until the type of the dataframe column is updated (from object to numeric):

example_data["mars_age"] = pd.to_numeric(example_data["mars_age"])

After including this line, Qupid works as expected.
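
To illustrate why the replacement alone is not enough, here is a minimal sketch on a made-up three-row frame (the values are illustrative, not the actual Qiita metadata). It assumes only pandas: replacing the '>90' string still leaves a column of strings, and the column only becomes numeric after pd.to_numeric. The errors="coerce" call is an optional extra step for listing any entries that still cannot be parsed.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up stand-in for the metadata column; these values are illustrative only.
df = pd.DataFrame({"mars_age": ["72", ">90", "85"]})

# Replacing the offending string leaves a column of strings behind.
df.loc[df["mars_age"] == ">90", "mars_age"] = "90"
numeric_before = is_numeric_dtype(df["mars_age"])

# errors="coerce" turns anything unparseable into NaN, which makes
# leftover non-numeric entries easy to list before converting.
leftover = df.loc[pd.to_numeric(df["mars_age"], errors="coerce").isna(), "mars_age"]

# Convert the column itself; this is the step that changes the dtype.
df["mars_age"] = pd.to_numeric(df["mars_age"])
numeric_after = is_numeric_dtype(df["mars_age"])

print(numeric_before, numeric_after, len(leftover))
```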

The error message does not reflect the root cause of the issue: that the type within the dataframe column is invalid. It would be preferable if Qupid could catch such type incompatibilities and report them to the user in a more meaningful way.
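
As a rough sketch of the kind of check being requested, something like the following could run before matching. The function name check_tolerance_columns and the wording of its message are hypothetical; this is not part of Qupid's API, and it only assumes that columns listed in tolerance_map are expected to be numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_tolerance_columns(df, tolerance_map):
    """Hypothetical pre-flight check; not part of Qupid.

    Raises TypeError if a column that was given a numeric tolerance
    does not have a numeric dtype.
    """
    for col in tolerance_map:
        if is_numeric_dtype(df[col]):
            continue
        # List the values that cannot be parsed as numbers, if any.
        coerced = pd.to_numeric(df[col], errors="coerce")
        bad = df.loc[coerced.isna() & df[col].notna(), col].unique().tolist()
        raise TypeError(
            f"Column '{col}' has dtype {df[col].dtype} but a numeric "
            f"tolerance was given; non-numeric values: {bad}. "
            f"Consider converting it with pd.to_numeric first."
        )
```

Called as check_tolerance_columns(example_data, {"mars_age": 4.5}) on the data above, a check like this would have pointed at the '>90' entries directly. When every entry is a numeric string such as "90", the list of bad values is empty, but the dtype mismatch is still reported.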
