DAGRegressor prediction data type changes results #140

Closed
nick-gorman opened this issue Oct 5, 2021 · 1 comment

Comments

@nick-gorman

Description

When using the nonlinear DAGRegressor, predict returns very different results depending on whether the prediction data is provided as float or int. See the example below.

Context

When writing unit tests for a project that uses causalnex, I provided the prediction data as ints, which gave unexpected results. Because I did not expect the data type to matter, this took several hours to debug.

Steps to Reproduce

import pandas as pd
import numpy as np
from causalnex.structure.pytorch import DAGRegressor

training_data = pd.DataFrame({
    'x': np.linspace(0, 500, num=500),
    'y': np.linspace(0, 500, num=500)})

reg = DAGRegressor(threshold=0.0,
                   alpha=0.0001,
                   beta=0.5,
                   fit_intercept=True,
                   hidden_layer_units=[10],
                   standardize=True)

X = training_data.loc[:, ['x']]
y = training_data['y']

reg.fit(X, y)

test_data_int = pd.DataFrame({
    'x': [0, 250, 500]})

test_data_float = pd.DataFrame({
    'x': [0.0, 250.0, 500.0]})

print(reg.predict(test_data_int))
# [ 99.82387053 250.16034682 400.16523327]

print(reg.predict(test_data_float))
# [ 22.48109366 250.16034682 477.11883298]

Expected Result

Ints and floats passed to predict should yield the same result, i.e. passing 0 or 0.0 should give the same prediction.

Actual Result

See example above.
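Since the float input is the one that behaves as expected in the example above, one possible caller-side workaround is to cast the prediction frame to float before calling predict. The helper name `as_float_frame` below is hypothetical and not part of causalnex; this is only a sketch of the cast itself, using pandas' `DataFrame.astype`.

```python
import pandas as pd


def as_float_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with every column cast to float64."""
    return df.astype("float64")


test_data_int = pd.DataFrame({"x": [0, 250, 500]})
test_data_float = pd.DataFrame({"x": [0.0, 250.0, 500.0]})

# After the cast, the int frame is indistinguishable from the float frame,
# so a fitted regressor would receive identical input in both cases.
converted = as_float_frame(test_data_int)
print(converted.dtypes)
```

With a fitted model this would be used as `reg.predict(as_float_frame(test_data_int))`.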

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • CausalNex version 0.10.0
  • Python version 3.7
  • Windows version 10.0.19043 Build 19043
@qbphilip
Contributor

qbphilip commented Nov 9, 2021

Hello Nick,
Thanks for raising this issue. This should definitely not be the case, and I feel your frustration with the debugging 😓!
I much appreciate the example; I have put it into a unit test :)

I had a quick look, and it seems to be due to some numpy dtype behaviour that was not taken into account (and that I personally didn't know about either). If a numpy array has an integer dtype, assigning floats to a column drops the decimals, because the assignment does not change the array's dtype.
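The numpy behaviour described here can be reproduced in isolation. The sketch below is not the causalnex code path; the helper names and the mean/std values are illustrative assumptions. It only shows that writing float results back into an integer array silently truncates them, while the same operation on a float array keeps the decimals.

```python
import numpy as np


def standardize_in_place(X, mean, std):
    """Hypothetical helper: overwrite column 0 with its standardized values."""
    X = X.copy()
    # The right-hand side is a float array, but X keeps its own dtype,
    # so for an integer X the values are truncated on assignment.
    X[:, 0] = (X[:, 0] - mean) / std
    return X


def standardize_safely(X, mean, std):
    """Same operation, but cast to float first so no decimals are lost."""
    return standardize_in_place(X.astype(np.float64), mean, std)


X_int = np.array([[0], [250], [500]])   # integer dtype
X_float = X_int.astype(np.float64)      # same values, float dtype

mean, std = 250.0, 144.6  # illustrative values, not taken from the library

int_result = standardize_in_place(X_int, mean, std)
float_result = standardize_in_place(X_float, mean, std)

print(int_result.ravel(), int_result.dtype)
print(float_result.ravel(), float_result.dtype)
```

Casting the input to float before any in-place arithmetic, as `standardize_safely` does, is one way to avoid the truncation.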

I have a local fix and will push it as part of our upcoming v0.11 release in the next few days.

@qbphilip qbphilip mentioned this issue Nov 10, 2021