KeyError: 'None of [...] are in the [index]' #21

paulperry · 2019-02-04T23:35:11Z

I'm a bit lost here. Is there a toy example I can play with?

from rankeval.analysis.effectiveness import model_performance

model_perf = model_performance(
    datasets=[rank_train], 
    models=[rankeval_model], 
    metrics=[precision_5, recall_5, ndcg_5])

model_perf.to_dataframe()

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-211-bb7252cc105d> in <module>()
      4     datasets=[rank_train],
      5     models=[rankeval_model],
----> 6     metrics=[ ndcg_5])
      7 
      8 model_perf.to_dataframe()

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,
---> 59                                                                        y_pred)[0]
     60 
     61     performance = xr.DataArray(data,

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/ndcg.py in eval(self, dataset, y_pred)
     91             for rel_id, (qid, q_y, _) in enumerate(
     92                     self.query_iterator(dataset, dataset.y)):
---> 93                 idcg_score[rel_id] = self.dcg.eval_per_query(q_y, q_y)
     94 
     95             self._cache_idcg_score[self._current_dataset] = idcg_score

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-macosx-10.7-x86_64.egg/rankeval/metrics/dcg.py in eval_per_query(self, y, y_pred)
     97             gain = y[idx_y_pred_sorted]
     98         elif self.implementation == "exp":
---> 99             gain = np.exp2(y[idx_y_pred_sorted]) - 1.0
    100 
    101         dcg = (gain / discount).sum()

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    840             if key_type == 'integer':
    841                 if self.index.is_integer() or self.index.is_floating():
--> 842                     return self.loc[key]
    843                 else:
    844                     return self._get_values(key)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476 
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479 
   1480     def _is_scalar_access(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1899                     raise ValueError('Cannot index with multidimensional key')
   1900 
-> 1901                 return self._getitem_iterable(key, axis=axis)
   1902 
   1903             # nested tuple slicing

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1141             if labels.is_unique and Index(keyarr).is_unique:
   1142                 indexer = ax.get_indexer_for(key)
-> 1143                 self._validate_read_indexer(key, indexer, axis)
   1144 
   1145                 d = {axis: [ax.reindex(keyarr)[0], indexer]}

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1204                 raise KeyError(
   1205                     u"None of [{key}] are in the [{axis}]".format(
-> 1206                         key=key, axis=self.obj._get_axis_name(axis)))
   1207 
   1208             # we skip the warning on Categorical/Interval

KeyError: 'None of [3807    76\n4956    59\n3972    72\n635     73\n3664    20\nName: target, dtype: int64] are in the [index]'

strani · 2019-02-11T17:47:35Z

I need more context here. How are rank_train, rankeval_model, and the metric objects in the code above defined?

A toy example showing how to use the model_performance method is here

It looks like something in the dataset (the y labels, perhaps?) is a pandas object rather than a numpy array...
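
As a minimal sketch of that suspicion (not a confirmed diagnosis of this report): when labels are kept as a pandas Series with a non-contiguous integer index, indexing them with positions looks up labels instead and raises a KeyError, while the same indexing works on a numpy array. The data and the helper name `gain_at_sorted_positions` are made up for illustration and are not part of rankeval.

```python
import numpy as np
import pandas as pd

# Hypothetical relevance labels for one query, kept as a pandas Series whose
# integer index comes from the original DataFrame rows (not 0..n-1).
y = pd.Series([2, 0, 1, 3, 0],
              index=[3807, 4956, 3972, 635, 3664],
              name="target")

# Positions that sort the labels in descending order.
order = np.argsort(-np.asarray(y), kind="stable")


def gain_at_sorted_positions(labels, positions):
    """Index labels by position, as a DCG-style computation would (illustrative helper)."""
    return labels[positions]


# With a Series, the integer keys are treated as index labels; none of
# 0..4 exist in the index, so pandas raises KeyError.
try:
    gain_at_sorted_positions(y, order)
    raised = False
except KeyError:
    raised = True

# Converting to a numpy array first makes the same lookup purely positional.
y_np = np.asarray(y)
gains = gain_at_sorted_positions(y_np, order)
```

If this is the cause, converting the labels (and likely the features and query ids) to numpy arrays before building the dataset should avoid the error.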

paulperry · 2019-02-26T20:40:51Z

I'm trying the toy example, but I get stuck at cell 6:

# Loading Models
from rankeval.model import RTEnsemble

msn_qr_lmart_1Ktrees = RTEnsemble(msn_qr_lmart_1Ktrees_file, name="QR_lmart_1K", format="QuickRank")
msn_qr_lmart_15Ktrees = RTEnsemble(msn_qr_lmart_15Ktrees_file, name="QR_lmart_15K", format="QuickRank")
msn_xgb_lmart_1Ktrees = RTEnsemble(msn_xgb_lmart_1Ktrees_file, name="XGB_lmart_1K", format="XGBoost")
msn_lgbm_lmart_1Ktrees = RTEnsemble(msn_lgbm_lmart_1Ktrees_file, name="LGBM_lmart_1K", format="LightGBM")

Traceback (most recent call last):

  File "/home/user/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-7-dbc6ad59c659>", line 5, in <module>
    msn_qr_lmart_15Ktrees = RTEnsemble(msn_qr_lmart_15Ktrees_file, name="QR_lmart_15K", format="QuickRank")

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/rt_ensemble.py", line 120, in __init__
    ProxyQuickRank.load(file_path, self)

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/proxy_QuickRank.py", line 52, in load
    n_trees, n_nodes = ProxyQuickRank._count_nodes(file_path)

  File "/home/user/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/proxy_QuickRank.py", line 230, in _count_nodes
    _, root = next(context)

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1221, in iterator
    yield from pullparser.read_events()

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1296, in read_events
    raise event

  File "/home/user/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1268, in feed
    self._parser.feed(data)

  File "<string>", line unknown
ParseError: syntax error: line 1, column 0
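
A `ParseError` at line 1, column 0 usually means the file does not start with XML at all (for example an empty or partially downloaded file, or an error page saved in its place). The sketch below is one way to check a model file before handing it to a loader; it uses only the standard library, and the helper name `first_xml_tag` and the file contents are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def first_xml_tag(path):
    """Return the root tag of an XML file, or None if it does not parse as XML."""
    try:
        with open(path, "rb") as f:
            # "start" events yield the root element first, so only the
            # beginning of a large file needs to be read.
            for _, root in ET.iterparse(f, events=("start",)):
                return root.tag
    except ET.ParseError:
        return None
    return None


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.xml")
    bad = os.path.join(tmp, "bad.xml")
    with open(good, "w") as f:
        f.write("<root><child/></root>")  # placeholder XML, not a real model
    with open(bad, "w") as f:
        f.write("404: Not Found")  # e.g. a failed download saved to disk
    good_tag = first_xml_tag(good)
    bad_tag = first_xml_tag(bad)
```

A `None` result for a model path would point to the file itself rather than to the loading code.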

The error doesn't make it obvious, but it is msn_qr_lmart_15Ktrees that fails to load, so I commented it out, continued, and was able to load the XGBoost model. Then I ran into the following error in cell 8:

from rankeval.analysis.effectiveness import model_performance

msn_model_perf = model_performance(
    datasets=[msn_test], 
    models=[msn_qr_lmart_1Ktrees, msn_xgb_lmart_1Ktrees],
    metrics=[precision_10, recall_10, ndcg_10])
msn_model_perf.to_dataframe()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-d205ca4bd2ab> in <module>()
      4     datasets=[msn_test],
      5     models=[msn_qr_lmart_1Ktrees, msn_xgb_lmart_1Ktrees],
----> 6     metrics=[precision_10, recall_10, ndcg_10])
      7 msn_model_perf.to_dataframe()

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/analysis/effectiveness.py in model_performance(datasets, models, metrics, cache)
     54     for idx_dataset, dataset in enumerate(datasets):
     55         for idx_model, model in enumerate(models):
---> 56             y_pred = model.score(dataset, detailed=False, cache=cache)
     57             for idx_metric, metric in enumerate(metrics):
     58                 data[idx_dataset][idx_model][idx_metric] = metric.eval(dataset,

~/anaconda3/lib/python3.6/site-packages/rankeval-0.7.2-py3.6-linux-x86_64.egg/rankeval/model/rt_ensemble.py in score(self, dataset, detailed, cache)
    328         # check that the features used by the model are "compatible" with the
    329         # features in the dataset (at least, in terms of their number)
--> 330         if np.max(self.trees_nodes_feature) + 1 > dataset.n_features:
    331             raise RuntimeError("Dataset features are not compatible with "
    332                                "model features")

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial)
   2503     """
   2504     return _wrapreduction(a, np.maximum, 'max', axis, None, out, keepdims=keepdims,
-> 2505                           initial=initial)
   2506 
   2507 

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     84                 return reduction(axis=axis, out=out, **passkwargs)
     85 
---> 86     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     87 
     88 

ValueError: zero-size array to reduction operation maximum which has no identity
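
For reference, this ValueError is what `np.max` raises on an empty array, which would be consistent with one of the models having been loaded with no tree nodes (an assumption, not something the traceback confirms). A small sketch of the behaviour and of a guard that gives a clearer message; `max_feature_id` is a hypothetical helper, not part of rankeval.

```python
import numpy as np

# np.max has no identity value for empty input, so it raises ValueError.
try:
    np.max(np.array([], dtype=np.int32))
    empty_max_raised = False
except ValueError:
    empty_max_raised = True


def max_feature_id(trees_nodes_feature):
    """Return the largest feature id used by a model, failing clearly on empty input."""
    arr = np.asarray(trees_nodes_feature)
    if arr.size == 0:
        raise RuntimeError(
            "Model has no tree nodes; was the model file loaded correctly?")
    return int(arr.max())
```

Checking the size of the node arrays right after loading each model would show which one came back empty.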

FYI: cell 5 also failed because the notebook uses the Python 2.7 print statement and I'm on Python 3.6, so I fixed that.

I'm now on Ubuntu 16.04 with Python 3.6.4. I built rankeval from the develop branch and it passed the nose tests.

strani · 2019-07-18T14:42:59Z

I made a new release on PyPI that fixes all the notebooks and some other minor issues so that the project is fully compatible with Python 3. Let me know if it works.

paulperry · 2019-11-28T12:51:24Z

Confirming it works. Thank you!

strani closed this as completed Jul 18, 2019
