refactor tree encoder: add encoding dict, param unseen, inverse transform #757

solegalli · 2024-05-06T10:36:54Z

closes #729
closes #728
closes #588

solegalli · 2024-05-06T12:53:32Z

If you have a minute to spare, a quick look to check that I am not missing anything big would be much appreciated :)

codecov · 2024-05-06T12:55:29Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.19%. Comparing base (46155a0) to head (c0c1407).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #757   +/-   ##
=======================================
  Coverage   98.18%   98.19%           
=======================================
  Files         105      105           
  Lines        4074     4093   +19     
  Branches      795      802    +7     
=======================================
+ Hits         4000     4019   +19     
  Misses         29       29           
  Partials       45       45

tests/test_encoding/test_decision_tree_encoder.py

feature_engine/encoding/decision_tree.py

solegalli · 2024-05-07T10:43:18Z

Thank you guys! @93lorenzo @glevv @VascoSch92

Really appreciate it :)

glevv · 2024-05-07T11:12:42Z

tests/test_encoding/test_decision_tree_encoder.py

+        "Parameter `unseen` takes only values ignore, raise, encode. "
+        f"Got {unseen} instead."
+    )
+    with pytest.raises(ValueError) as record:


This one should also be fixed, but I'm not sure why there would be an error

Is this comment new? I don't think I left any record in the file I committed last. I just double-checked and didn't find any left.

The error comes from the values within parametrize: when I pass a list or a tuple, say ["raise", "ignore"], and that value is then interpolated as {unseen}, the match doesn't like it. It was resolved with re.escape(message here).
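To illustrate why the list and tuple values trip up the match, here is a minimal sketch that uses only the standard `re` module. It relies on the fact that `pytest.raises(..., match=...)` checks the pattern with `re.search`. The helper `build_msg` is hypothetical and only mirrors the message shown in the diff above; it is not the test from the PR.

```python
import re


def build_msg(unseen):
    # Hypothetical helper that mirrors the error message in the diff above.
    return (
        "Parameter `unseen` takes only values ignore, raise, encode. "
        f"Got {unseen} instead."
    )


results = {}
for unseen in [("raise", "ignore"), ["ignore"]]:
    text = build_msg(unseen)
    # Used as a raw pattern, "(...)" becomes a group and "[...]" a character
    # class, so the pattern no longer matches its own literal text.
    raw_found = re.search(text, text) is not None
    # re.escape() turns every special character into a literal.
    escaped_found = re.search(re.escape(text), text) is not None
    results[repr(unseen)] = (raw_found, escaped_found)
    print(repr(unseen), raw_found, escaped_found)
```

For both values the raw pattern fails to match while the escaped one succeeds, which is consistent with the fix described above.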

glevv · 2024-05-07T11:13:23Z

tests/test_encoding/test_decision_tree_encoder.py

+        f"{var_ls}."
+    )
+    # new categories will raise an error
+    with pytest.raises(ValueError) as record:


Also look at this one

Yep, I didn't miss them. I changed them to match=msg, but then these 2 returned an error which I didn't understand and was too lazy to resolve. Although the messages were identical, the test failed, and it said something like "did you mean to escape something?" Have you come across this before? It probably has something to do with regex, which is not my forte :/

both lines of the msg should be f-strings in both cases

Hmm, not sure I follow: This is the error

E AssertionError: Regex pattern did not match.
E Regex: 'During the encoding, NaN values were introduced in the feature(s) var_A, var_B.'
E Input: 'During the encoding, NaN values were introduced in the feature(s) var_A, var_B.'
E Did you mean to `re.escape()` the regex?

And this is the code:

def test_fit_errors_if_new_cat_values_and_unseen_is_raise_param(df_enc):
    encoder = DecisionTreeEncoder(unseen="raise", regression=False)
    encoder.fit(df_enc[["var_A", "var_B"]], df_enc["target"])
    X = pd.DataFrame(
        {
            "var_A": ["A", "ZZZ", "YYY"],
            "var_B": ["C", "YYY", "ZZZ"],
        }
    )
    var_ls = "var_A, var_B"
    msg = (
        f"During the encoding, NaN values were introduced in the feature(s) {var_ls}."
    )
    # new categories will raise an error
    with pytest.raises(ValueError, match=msg):
        encoder.transform(X)
    # assert str(record.value) == msg

What am I doing wrong?

Oh, I see. match does a regex pattern search, so the parentheses in (s) are read as a group instead of literal characters. Escape them with a backslash (\), like this:

rf"During the encoding, NaN values were introduced in the feature\(s\) {var_ls}."

Other options would be calling re.escape() on msg:

re.escape(f"During the encoding, NaN values were introduced in the feature(s) {var_ls}.")

or shortening the msg (since it's not an equality check but a regex search, this would work):

f"{var_ls}."
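The three options above can be compared side by side. This is a minimal sketch using only `re.search`, which is what `pytest.raises(..., match=...)` applies to the exception text. The variable `raised` is an assumption standing in for the message the encoder raises, copied from the error output quoted earlier in this thread.

```python
import re

var_ls = "var_A, var_B"
# Stand-in for the text of the raised ValueError (copied from the thread).
raised = (
    f"During the encoding, NaN values were introduced in the feature(s) {var_ls}."
)

# Unescaped: "(s)" is a capturing group, so the pattern expects "features".
unescaped = re.search(raised, raised)

# Option 1: escape the parentheses by hand in a raw f-string.
manual = re.search(
    rf"During the encoding, NaN values were introduced in the feature\(s\) {var_ls}.",
    raised,
)

# Option 2: let re.escape() escape every special character.
escaped = re.search(re.escape(raised), raised)

# Option 3: search for a shorter fragment without grouping characters.
short = re.search(f"{var_ls}.", raised)

print(unescaped is None, manual is not None, escaped is not None, short is not None)
```

Only the unescaped pattern finds no match. Note that option 3 is the loosest check: it passes for any message that contains the variable list followed by one more character, since the trailing `.` is still a regex wildcard.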

Updated the comment

Life saver @glevv thank you so much! Now it works :)

solegalli · 2024-05-08T13:32:46Z

tests/test_encoding/test_decision_tree_encoder.py

+    "unseen", ["string", False, ("raise", "ignore"), ["ignore"], np.nan]
+)
+def test_error_if_unseen_gets_not_permitted_value(unseen):
+    msg = re.escape(


@glevv fixed here

solegalli added 2 commits May 6, 2024 12:35

refactor tree encoder

0e61d2f

add precision parameter

60a0e26

solegalli mentioned this pull request May 6, 2024

refactor: exposing the unseen var of the categorical encoder #729

Closed

finalize files and expand user guide

c33cbdd

solegalli changed the title ~~refactor tree encoder: add encoding dict, param unseen, inverse transform~~ [MRG] refactor tree encoder: add encoding dict, param unseen, inverse transform May 6, 2024

glevv reviewed May 6, 2024

tests/test_encoding/test_decision_tree_encoder.py (outdated)

tests/test_encoding/test_decision_tree_encoder.py (outdated)

VascoSch92 reviewed May 6, 2024

feature_engine/encoding/decision_tree.py

make tests more readable

9ae8317

93lorenzo approved these changes May 7, 2024

glevv reviewed May 7, 2024

solegalli added 3 commits May 7, 2024 16:22

add escape character

24062d3

undo escape, it causes style error

00e3773

add escape syntax

c0c1407

glevv approved these changes May 8, 2024

solegalli commented May 8, 2024

solegalli changed the title ~~[MRG] refactor tree encoder: add encoding dict, param unseen, inverse transform~~ refactor tree encoder: add encoding dict, param unseen, inverse transform May 14, 2024

solegalli merged commit 389b515 into main May 14, 2024
10 checks passed

solegalli deleted the reformat_tree_encoder branch May 14, 2024 13:45
