DAGRegressor prediction data type changes results #140

nick-gorman · 2021-10-05T00:23:09Z

Description

When using the nonlinear DAGRegressor, very different results are returned depending on whether the prediction data is provided as float or int. See the example below.

Context

When writing unit tests for a project that uses causalnex, I provided the prediction data as ints, which gave unexpected results. As I did not expect the data type to cause this issue, it took several hours to debug.

Steps to Reproduce

import pandas as pd
import numpy as np
from causalnex.structure.pytorch import DAGRegressor

training_data = pd.DataFrame({
    'x': np.linspace(0, 500, num=500),
    'y': np.linspace(0, 500, num=500)})

reg = DAGRegressor(threshold=0.0,
                   alpha=0.0001,
                   beta=0.5,
                   fit_intercept=True,
                   hidden_layer_units=[10],
                   standardize=True)

X = training_data.loc[:, ['x']]
y = training_data['y']

reg.fit(X, y)

test_data_int = pd.DataFrame({
    'x': [0, 250, 500]})

test_data_float = pd.DataFrame({
    'x': [0.0, 250.0, 500.0]})

print(reg.predict(test_data_int))
# [ 99.82387053 250.16034682 400.16523327]

print(reg.predict(test_data_float))
# [ 22.48109366 250.16034682 477.11883298]

Expected Result

Ints and floats passed to predict should yield the same result, e.g. passing 0.0 or 0 should give the same prediction.

Actual Result

See example above.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

CausalNex version 0.10.0
Python version 3.7
Windows version 10.0.19043 Build 19043

qbphilip · 2021-11-09T22:26:41Z

Hello Nick,
Thanks for raising this issue. This should definitely not be the case, and I feel your frustration with the debugging 😓!
I very much appreciate the example; I have put it into a unit test :)

I had a quick look, and it seems like it's due to some numpy dtype behaviour that was not taken into account (and that I personally didn't know about either). If a numpy array has dtype int, assigning floats to a column makes them lose their decimals, because the assignment does not overwrite the array's dtype.
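The numpy behaviour described above can be reproduced without causalnex. The following is only a minimal sketch: `standardize_column` is a hypothetical helper written for illustration, not the library's actual code, and it assumes the standardization step writes scaled values back into the input array.

```python
import numpy as np

# Training feature from the example above; only its mean and std are needed here.
x_train = np.linspace(0, 500, num=500)
mean, std = x_train.mean(), x_train.std()


def standardize_column(X, mean, std):
    """Hypothetical helper (not causalnex code): scale column 0 and write it back."""
    X = X.copy()
    # The right-hand side is a float array, but X keeps its original dtype.
    # If X is an integer array, numpy silently truncates the decimals.
    X[:, 0] = (X[:, 0] - mean) / std
    return X


X_int = np.array([[0], [250], [500]])   # integer dtype, like test_data_int
X_float = X_int.astype(float)           # float dtype, like test_data_float

scaled_int = standardize_column(X_int, mean, std)
scaled_float = standardize_column(X_float, mean, std)

print(scaled_int.dtype, scaled_int.ravel())
print(scaled_float.dtype, scaled_float.ravel())
```

With integer input, the scaled values for 0 and 500 are truncated towards zero, while the value for 250 is close to zero in both cases. That would be consistent with the middle prediction matching and the outer two differing in the report, assuming this is indeed the mechanism. Casting the input with `astype(float)` before writing into it avoids the truncation in this sketch.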

I have a local fix and will push it as part of our upcoming v0.11 release in the next few days.

tsanikgr closed this as completed in 3118e16 Nov 10, 2021

qbphilip mentioned this issue Nov 10, 2021

Release/0.11.0 #141

Merged

