This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

'PcaAnomalyDetector' meets error "System.FormatException: 'One of the identified items was in an invalid format.'" in predicting #497

Closed
chjinche opened this issue Aug 5, 2020 · 1 comment · Fixed by dotnet/machinelearning#5349

Comments

chjinche commented Aug 5, 2020

Describe the bug
'PcaAnomalyDetector' fails at random during prediction with the following error. RuntimeError: Error: *** System.FormatException: 'One of the identified items was in an invalid format.'

To Reproduce
Steps to reproduce the behavior:

  1. Run the following code.
import pandas as pd
from nimbusml import Pipeline
from nimbusml.decomposition import PcaAnomalyDetector

data = pd.DataFrame({"f1": [0, 0.5, 1], "f2": [1, 1, 1]})

pipeline = Pipeline([
    PcaAnomalyDetector(rank=2, feature=['f1', 'f2'],
                       oversampling=2,
                       center=False,
                       random_state=123)])

# train, predict
pipeline.fit(data)
pred = pipeline.predict(data)
print(pred)
  2. See error
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
Not training a calibrator because it is not needed.
Elapsed time: 00:00:00.6893169
Error: *** System.FormatException: 'One of the identified items was in an invalid format.'
Traceback (most recent call last):
  File "debug.py", line 15, in <module>
    pred = pipeline.predict(data)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py", line 220, in wrapper
    params = func(*args, **kwargs)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/pipeline.py", line 2228, in predict
    as_binary_data_stream=as_binary_data_stream, **params)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py", line 220, in wrapper
    params = func(*args, **kwargs)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/pipeline.py", line 2172, in _predict
    raise e
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/pipeline.py", line 2169, in _predict
    **params)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/internal/utils/entrypoints.py", line 449, in run
    output_predictor_modelfilename)
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/internal/utils/entrypoints.py", line 306, in _try_call_bridge
    raise e
  File "/mnt/miniconda3/envs/py36/lib/python3.7/site-packages/nimbusml/internal/utils/entrypoints.py", line 278, in _try_call_bridge
    ret = px_call(call_parameters)
RuntimeError: Error: *** System.FormatException: 'One of the identified items was in an invalid format.'

Expected behavior
We expected PcaAnomalyDetector to predict successfully. This issue is affecting our product.

Additional context
Oddly, when we changed random_state to 42, the code example above passed. However, when we tried other input data, it failed again.

Could you please help clarify the root cause and provide a solution?

@antoniovs1029
Member

As discussed offline, the problem wasn't in the random_state but in the dataset. The dataset was:

0 1
0.5 1
1 1

Since the second column was all 1s, it was normalized to all 0s, leaving a small dataset in which all the rows were linearly dependent. This caused problems when extracting the 2 eigenvectors required by PCA (with rank = 2), and the resulting eigenvectors contained NaN values.
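
To make the linear dependence concrete, here is a minimal NumPy sketch. It does not call nimbusml; `min_max_normalize` is a hypothetical helper that assumes a plain (x - min) / (max - min) scaling in which constant columns map to 0, mirroring the behavior described above rather than the library's actual normalizer.

```python
import numpy as np

# The dataset from the report: columns f1 and f2.
X = np.array([[0.0, 1.0],
              [0.5, 1.0],
              [1.0, 1.0]])


def min_max_normalize(X):
    """Scale each column to [0, 1]. Constant columns become all zeros
    (an assumption that mirrors the description in this thread)."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    # Avoid dividing by zero for constant columns; their numerator is 0 anyway.
    safe_span = np.where(span == 0, 1.0, span)
    return (X - lo) / safe_span


X_norm = min_max_normalize(X)

# Only one linearly independent direction remains, so a rank-2
# decomposition has no second well-defined component to extract.
effective_rank = np.linalg.matrix_rank(X_norm)
print(X_norm)
print("effective rank:", effective_rank)
```

After normalization the second column carries no information, so the matrix has a single independent direction, fewer than the 2 components requested with rank=2.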

Although the problem is in the dataset, the exception that is thrown is hard to interpret, so changing the exception message makes sense.
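
Until a clearer message is available, a caller-side check can surface the problem before training. The sketch below is only an illustration: `check_rank_feasible` is a hypothetical helper, not part of nimbusml, and it catches only the constant-column case discussed here, not every form of linear dependence.

```python
import pandas as pd


def check_rank_feasible(df, features, rank):
    """Hypothetical pre-check: raise a readable error when the requested
    PCA rank exceeds the number of non-constant feature columns.
    Returns the list of constant columns that were found."""
    constant = [c for c in features if df[c].nunique() <= 1]
    usable = len(features) - len(constant)
    if rank > usable:
        raise ValueError(
            "rank={} requested, but only {} feature column(s) vary; "
            "constant column(s): {}".format(rank, usable, constant)
        )
    return constant


data = pd.DataFrame({"f1": [0, 0.5, 1], "f2": [1, 1, 1]})

try:
    check_rank_feasible(data, ["f1", "f2"], rank=2)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Calling the check before `pipeline.fit(data)` would flag the dataset from this report, since `f2` is constant and only one column varies.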
