KeyError: 'None of [...] are in the [index]' #21

Closed
paulperry opened this issue Feb 4, 2019 · 4 comments

@paulperry

I'm a bit lost here. Is there a toy example I can play with?

from rankeval.analysis.effectiveness import model_performance

model_perf = model_performance(
    datasets=[rank_train], 
    models=[rankeval_model], 
    metrics=[precision_5, recall_5, ndcg_5])

model_perf.to_dataframe()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-211-bb7252cc105d> in <module>()
      4     datasets=[rank_train],
      5     models=[rankeval_model],
----> 6     metrics=[ ndcg_5])
      7 
      8 model_perf.to_dataframe()

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,
---> 59                                                                        y_pred)[0]
     60 
     61     performance = xr.DataArray(data,

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/ndcg.py in eval(self, dataset, y_pred)
     91             for rel_id, (qid, q_y, _) in enumerate(
     92                     self.query_iterator(dataset, dataset.y)):
---> 93                 idcg_score[rel_id] = self.dcg.eval_per_query(q_y, q_y)
     94 
     95             self._cache_idcg_score[self._current_dataset] = idcg_score

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/dcg.py in eval_per_query(self, y, y_pred)
     97             gain = y[idx_y_pred_sorted]
     98         elif self.implementation == "exp":
---> 99             gain = np.exp2(y[idx_y_pred_sorted]) - 1.0
    100 
    101         dcg = (gain / discount).sum()

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    840             if key_type == 'integer':
    841                 if self.index.is_integer() or self.index.is_floating():
--> 842                     return self.loc[key]
    843                 else:
    844                     return self._get_values(key)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476 
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479 
   1480     def _is_scalar_access(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1899                     raise ValueError('Cannot index with multidimensional key')
   1900 
-> 1901                 return self._getitem_iterable(key, axis=axis)
   1902 
   1903             # nested tuple slicing

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1141             if labels.is_unique and Index(keyarr).is_unique:
   1142                 indexer = ax.get_indexer_for(key)
-> 1143                 self._validate_read_indexer(key, indexer, axis)
   1144 
   1145                 d = {axis: [ax.reindex(keyarr)[0], indexer]}

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1204                 raise KeyError(
   1205                     u"None of [{key}] are in the [{axis}]".format(
-> 1206                         key=key, axis=self.obj._get_axis_name(axis)))
   1207 
   1208             # we skip the warning on Categorical/Interval

KeyError: 'None of [3807    76\n4956    59\n3972    72\n635     73\n3664    20\nName: target, dtype: int64] are in the [index]'

@strani
Contributor

strani commented Feb 11, 2019

I need more context here. How are rank_train, rankeval_model, and the metric objects in the code above defined?

A toy example showing how to use the model_performance method is here

It looks like a pandas object is being used somewhere in the dataset (perhaps the y labels?) where a numpy array is expected.
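
To illustrate the pandas-versus-numpy point, here is a minimal sketch, not rankeval code. The names `y_series`, `order`, and `sorted_labels` are made up for the example, and it assumes the labels arrived as a pandas Series with a non-default integer index. Indexing such a Series with an array of integer positions is treated as a label lookup, which can raise the same kind of `KeyError` as in the traceback, while a NumPy array is indexed by position.

```python
import numpy as np
import pandas as pd

# Hypothetical relevance labels for one query, stored as a pandas Series
# whose integer index does not start at 0 (e.g. a slice of a larger frame).
y_series = pd.Series([0, 2, 1], index=[10, 11, 12], name="target")

# Positions that sort the labels in descending order.
order = np.argsort(-y_series.to_numpy())

try:
    # On an integer-indexed Series, an array of integers is looked up by
    # label, and none of the labels 0, 1, 2 exist in the index.
    y_series[order]
    raised = False
except KeyError:
    raised = True

# Converting to a plain NumPy array makes the same lookup positional.
y_array = y_series.to_numpy()
sorted_labels = y_array[order]
```

If this is what is happening, converting the labels with `.to_numpy()` (or `.values` on older pandas) before building the dataset would be one thing to try.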

@paulperry
Author

paulperry commented Feb 26, 2019

I'm trying the toy example, but I get stuck at cell 6:

# Loading Models
from rankeval.model import RTEnsemble

msn_qr_lmart_1Ktrees = RTEnsemble(msn_qr_lmart_1Ktrees_file, name="QR_lmart_1K", format="QuickRank")
msn_qr_lmart_15Ktrees = RTEnsemble(msn_qr_lmart_15Ktrees_file, name="QR_lmart_15K", format="QuickRank")
msn_xgb_lmart_1Ktrees = RTEnsemble(msn_xgb_lmart_1Ktrees_file, name="XGB_lmart_1K", format="XGBoost")
msn_lgbm_lmart_1Ktrees = RTEnsemble(msn_lgbm_lmart_1Ktrees_file, name="LGBM_lmart_1K", format="LightGBM")
Traceback (most recent call last):

  File "/home/user/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-dbc6ad59c659>", line 5, in <module>
    msn_qr_lmart_15Ktrees = RTEnsemble(msn_qr_lmart_15Ktrees_file, name="QR_lmart_15K", format="QuickRank")

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/rt_ensemble.py", line 120, in __init__
    ProxyQuickRank.load(file_path, self)

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/proxy_QuickRank.py", line 52, in load
    n_trees, n_nodes = ProxyQuickRank._count_nodes(file_path)

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/proxy_QuickRank.py", line 230, in _count_nodes
    _, root = next(context)

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1221, in iterator
    yield from pullparser.read_events()

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1296, in read_events
    raise event

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1268, in feed
    self._parser.feed(data)

  File "<string>", line unknown
ParseError: syntax error: line 1, column 0
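
The `ParseError: syntax error: line 1, column 0` above is raised by the standard-library XML parser, which suggests that the file does not begin with XML at all (for example, an incomplete or failed download). A small sketch for checking a file before handing it to `RTEnsemble`, using only `xml.etree.ElementTree`; the helper `first_xml_tag` and the temporary files are hypothetical and not part of rankeval:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def first_xml_tag(path):
    """Return the root tag if the file starts as well-formed XML, else None."""
    try:
        with open(path, "rb") as handle:
            # iterparse is the stdlib entry point that appears in the traceback;
            # the first "start" event is the root element.
            for _, element in ET.iterparse(handle, events=("start",)):
                return element.tag
    except ET.ParseError:
        return None
    return None


# Demo on two throwaway files: one XML document and one that is not XML.
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good_model.xml")
    bad_path = os.path.join(tmp_dir, "bad_model.xml")
    with open(good_path, "w") as handle:
        handle.write("<ranker><ensemble/></ranker>")
    with open(bad_path, "w") as handle:
        handle.write("this is not an XML model file")
    good_tag = first_xml_tag(good_path)
    bad_tag = first_xml_tag(bad_path)
```

Calling `first_xml_tag` on each model path before loading would show which file the parser rejects.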

The error doesn't make it obvious, but it is the msn_qr_lmart_15Ktrees load that fails. I commented it out, carried on, and was able to load the XGBoost model. I then ran into the following error in cell 8:

from rankeval.analysis.effectiveness import model_performance

msn_model_perf = model_performance(
    datasets=[msn_test], 
    models=[msn_qr_lmart_1Ktrees, msn_xgb_lmart_1Ktrees],
    metrics=[precision_10, recall_10, ndcg_10])
msn_model_perf.to_dataframe()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-d205ca4bd2ab> in <module>()
      4     datasets=[msn_test],
      5     models=[msn_qr_lmart_1Ktrees, msn_xgb_lmart_1Ktrees],
----> 6     metrics=[precision_10, recall_10, ndcg_10])
      7 msn_model_perf.to_dataframe()

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     54     for idx_dataset, dataset in enumerate(datasets):
     55         for idx_model, model in enumerate(models):
---> 56             y_pred = model.score(dataset, detailed=False, cache=cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/rt_ensemble.py in score(self, dataset, detailed, cache)
    328         # check that the features used by the model are "compatible" with the
    329         # features in the dataset (at least, in terms of their number)
--> 330         if np.max(self.trees_nodes_feature) + 1 > dataset.n_features:
    331             raise RuntimeError("Dataset features are not compatible with "
    332                                "model features")

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial)
   2503     """
   2504     return _wrapreduction(a, np.maximum, 'max', axis, None, out, keepdims=keepdims,
-> 2505                           initial=initial)
   2506 
   2507 

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     84                 return reduction(axis=axis, out=out, **passkwargs)
     85 
---> 86     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     87 
     88 

ValueError: zero-size array to reduction operation maximum which has no identity
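
The `ValueError` above is what NumPy raises when `np.max` receives an empty array, so `trees_nodes_feature` was presumably empty for one of the loaded models. A minimal sketch of that behaviour and of a guard against it; the array below is a hypothetical stand-in and `max_feature_index` is a hypothetical helper, neither taken from rankeval:

```python
import numpy as np

# Hypothetical stand-in for a model whose tree arrays were never populated.
trees_nodes_feature = np.array([], dtype=np.int32)

try:
    np.max(trees_nodes_feature)
    error_message = None
except ValueError as exc:
    # NumPy cannot reduce an empty array with max because max has no identity.
    error_message = str(exc)


def max_feature_index(features):
    """Return the largest feature index, or -1 when the array is empty."""
    return int(np.max(features)) if features.size else -1
```

Checking `features.size` first turns the failure into an explicit "model has no nodes" case that can be reported clearly.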

FYI: cell 5 also failed because I'm on Python 3.6 and the notebook uses the Python 2.7 print statement, so I fixed that.

I'm now on Ubuntu 16.04 with Python 3.6.4. I built rankeval from the develop branch and it passed the nose tests.

@strani
Contributor

strani commented Jul 18, 2019

I made a new release on PyPI that fixes all the notebooks and some other minor issues so that the project is fully compatible with Python 3. Let me know if it works.

strani closed this as completed Jul 18, 2019

@paulperry
Author

Confirming it works. Thank you!
