
@solegalli solegalli commented Nov 8, 2021

Hi @gverbock

I made small edits to the docstrings and the class, and added a few more asserts and tests to the test file.

I have one remaining concern: I am not sure split_distinct is working as I expect. Maybe my expectation is wrong.

For example, if df["A"] = [A, A, A, A, A, A, B, C, D, E]:

  • if split_distinct=False and split_frac=0.5, I expect the basis set to be [A, A, A, A, A] and the test set [A, B, C, D, E]
  • if split_distinct=True and split_frac=0.5, I expect the basis set to be [A, A, A, A, A, A] and the test set [B, C, D, E]

Also, when split_distinct=True and split_col is numerical, do we get what we expect?

For example, if df["A"] = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]:

  • if split_distinct=False and split_frac=0.5, I expect the basis set to be [0, 0, 0, 0, 0] and the test set [0, 1, 2, 3, 4]
  • if split_distinct=True and split_frac=0.5, I expect the basis set to be [0, 0, 0, 0, 0, 0] and the test set [1, 2, 3, 4]
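To make these expectations concrete, here is a minimal sketch that reproduces them. It is a hypothetical helper (split_basis_test is not part of the class) and it assumes that rows are ordered by split_col and that split_frac is the fraction of rows going to the basis set. With split_distinct=True, every row sharing the value at the cut point is kept in the basis set, so no value appears in both sets. It only encodes the expected behaviour above; it does not show what the current implementation does.

```python
import pandas as pd


def split_basis_test(df, split_col, split_frac=0.5, split_distinct=False):
    """Hypothetical helper encoding the expected split described above.

    Assumes rows are ordered by split_col and split_frac is the fraction of
    rows that goes to the basis set. This is NOT the selector's actual code.
    """
    # Stable sort so tied values keep their original relative order.
    ordered = df.sort_values(split_col, kind="mergesort").reset_index(drop=True)
    n_basis = int(len(ordered) * split_frac)

    if not split_distinct or n_basis == 0:
        # Positional cut: the first n_basis rows form the basis set.
        return ordered.iloc[:n_basis], ordered.iloc[n_basis:]

    # Distinct cut: all rows sharing the value at the cut point go to the
    # basis set, so no value appears in both basis and test.
    boundary = ordered[split_col].iloc[n_basis - 1]
    in_basis = ordered[split_col] <= boundary
    return ordered[in_basis], ordered[~in_basis]


for values in (["A"] * 6 + ["B", "C", "D", "E"], [0] * 6 + [1, 2, 3, 4]):
    df = pd.DataFrame({"A": values})
    for distinct in (False, True):
        basis, test = split_basis_test(df, "A", 0.5, split_distinct=distinct)
        print(distinct, basis["A"].tolist(), test["A"].tolist())

# False ['A', 'A', 'A', 'A', 'A'] ['A', 'B', 'C', 'D', 'E']
# True ['A', 'A', 'A', 'A', 'A', 'A'] ['B', 'C', 'D', 'E']
# False [0, 0, 0, 0, 0] [0, 1, 2, 3, 4]
# True [0, 0, 0, 0, 0, 0] [1, 2, 3, 4]
```

If the class computes the cut point differently (for example, as a fraction of the distinct values rather than of the rows), its basis and test sets will not match these lists, which is the kind of difference a test built on these examples would catch.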

Is that what we get? Can we test it, or could you point me to the test that covers this? Or am I interpreting the functionality wrongly?

Sorry to ask, but I can't see it directly from the logic, and I've been going back and forth on this for a while now, so my brain is burnt out.

I also left comments on the test file in the original PR to elaborate on my doubts. Could you have a look at them?

Thank you!

@solegalli solegalli changed the title from "still working on this" to "[WIP] final edits to the class" Nov 8, 2021
@solegalli solegalli changed the title from "[WIP] final edits to the class" to "[MRG] final edits to the class" Nov 8, 2021
@gverbock gverbock merged commit 05b7fd6 into gverbock:psi_selector Nov 8, 2021
@solegalli solegalli deleted the psi_selector_3 branch November 16, 2021 13:38