Please indicate the following details about the environment in which you found the bug:
SDMetrics version: 0.9.2
Python version: 3.9.7
Operating System: WSL
Error Description
Hello,
The denormalized table in SDMetrics uses the following code:
def _denormalize(data, foreign_key):
    """Denormalize the child table over the parent."""
    parent_table, parent_key, child_table, child_key = foreign_key
    flat = data[parent_table].set_index(parent_key).merge(
        data[child_table].set_index(child_key),
        how='outer',
        left_index=True,
        right_index=True,
    ).reset_index(drop=True)
    return flat
The “how” parameter of the “merge” call is set to “outer”. As a result, every row of the parent table that has no child rows produces an extra row whose child table columns are NaN. For example, consider the denormalized table for the train table (7381 rows) of the Telstra database. The train table has the severity_type table (18552 rows) as its parent. With the above code, we get the following denormalized table:
| Index | severity_type | location | fault_severity |
|-------|-----------------|--------------|----------------|
| 0 | severity_type 1 | location 601 | 1.0 |
| 1 | severity_type 2 | NaN | NaN |
| 2 | severity_type 1 | NaN | NaN |
| 3 | severity_type 4 | NaN | NaN |
| 4 | severity_type 2 | location 460 | 0.0 |
| ... | ... | ... | ... |
| 18547 | severity_type 2 | location 278 | 0.0 |
| 18548 | severity_type 1 | NaN | NaN |
| 18549 | severity_type 1 | location 12 | 0.0 |
| 18550 | severity_type 1 | NaN | NaN |
| 18551 | severity_type 2 | NaN | NaN |
The expected denormalized table is:
| Index | severity_type | location | fault_severity |
|-------|-----------------|---------------|----------------|
| 0 | severity_type 2 | location 118 | 1 |
| 1 | severity_type 2 | location 91 | 0 |
| 2 | severity_type 2 | location 152 | 1 |
| 3 | severity_type 1 | location 931 | 1 |
| 4 | severity_type 1 | location 120 | 0 |
| ... | ... | ... | ... |
| 7376 | severity_type 2 | location 167 | 0 |
| 7377 | severity_type 1 | location 106 | 0 |
| 7378 | severity_type 2 | location 1086 | 2 |
| 7379 | severity_type 1 | location 7 | 0 |
| 7380 | severity_type 1 | location 885 | 0 |
with only 7381 rows, one per row of the train table. We can get this result by setting the “how” parameter to “right” instead of “outer”.
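The difference can be reproduced on a small example. The following is only a sketch: the two toy tables, the "id" key column, and the extra "how" argument on the function are assumptions made for illustration. They are not the Telstra data and not the actual SDMetrics signature.

```python
import pandas as pd


def denormalize(data, foreign_key, how="right"):
    """Same logic as the quoted _denormalize, with the join type exposed.

    The `how` argument is a hypothetical addition for this demonstration.
    """
    parent_table, parent_key, child_table, child_key = foreign_key
    flat = data[parent_table].set_index(parent_key).merge(
        data[child_table].set_index(child_key),
        how=how,
        left_index=True,
        right_index=True,
    ).reset_index(drop=True)
    return flat


# Toy tables (made up): 4 parent rows, but only 2 of them have a child row.
data = {
    "severity_type": pd.DataFrame({
        "id": [1, 2, 3, 4],
        "severity_type": [
            "severity_type 1",
            "severity_type 2",
            "severity_type 1",
            "severity_type 4",
        ],
    }),
    "train": pd.DataFrame({
        "id": [1, 3],
        "location": ["location 601", "location 460"],
        "fault_severity": [1, 0],
    }),
}
foreign_key = ("severity_type", "id", "train", "id")

outer = denormalize(data, foreign_key, how="outer")
right = denormalize(data, foreign_key, how="right")

# The outer join keeps the childless parents (ids 2 and 4) as NaN rows,
# while the right join keeps exactly one row per child row.
print(len(outer), len(right))  # 4 2
```

Note that the NaN rows also turn the integer column fault_severity into floats in the outer result, which matches the 1.0 / 0.0 values in the first table above.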
Best regards!
Hi @mohamedgy, thanks for catching this. I agree that "outer" doesn't make sense as the join here. A "right" join makes much more sense, since each right row (child table) should point to exactly 1 left row (parent table).
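The "exactly 1 parent row" assumption can also be checked at merge time. The sketch below uses two existing options of pandas merge, validate and indicator. The toy tables and the "id" column are made up for illustration, and nothing here is taken from the SDMetrics code base.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy tables (made up): parent id 2 is duplicated, child id 5 has no parent.
parent = pd.DataFrame({"id": [1, 2, 2], "severity_type": ["a", "b", "c"]})
child = pd.DataFrame({"id": [1, 2, 5], "location": ["x", "y", "z"]})

# validate="one_to_many" makes pandas raise if the parent key is not unique,
# i.e. if a child row could point to more than one parent row.
try:
    parent.set_index("id").merge(
        child.set_index("id"),
        how="right",
        left_index=True,
        right_index=True,
        validate="one_to_many",
    )
    duplicated_parent_key = False
except MergeError:
    duplicated_parent_key = True

# With a unique parent key, indicator=True adds a "_merge" column that marks
# child rows without any matching parent as "right_only".
unique_parent = parent.drop_duplicates("id")
flat = unique_parent.set_index("id").merge(
    child.set_index("id"),
    how="right",
    left_index=True,
    right_index=True,
    indicator=True,
)
orphans = flat[flat["_merge"] == "right_only"]
print(duplicated_parent_key, len(flat), len(orphans))  # True 3 1
```

A "right" join still keeps child rows that have no parent (with NaN in the parent columns), so a check like the one on "_merge" is one way to detect them.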