Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New Transformer] ChiMergeDiscretisor #459

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

Morgan-Sell
Copy link
Collaborator

@solegalli
Copy link
Collaborator

Hi @Morgan-Sell

Thanks for kicking this off.

I reckon this one is not ready for review, right?

It would be great to have some tests with the expected result for the transformer.

Thank you!

@Morgan-Sell
Copy link
Collaborator Author

hola @solegalli,

Correct, it's not ready for review. I'm still working on it. And, yes, I'll create tests.

@Morgan-Sell
Copy link
Collaborator Author

Morgan-Sell commented Jun 14, 2022

@solegalli,

Should we avoid using dataframes and instead use dictionaries and numpy arrays? I suspect iterating through dataframes increases computational costs.

I'm going to keep the question, but I think the answer is "yes". Dictionaries and numpy arrays simplify the merging of frequency distributions ;)

…hod now returns a 2-d numpy array and 1-d numpy array instead of a dictionary.
…hod now returns a 2-d numpy array and 1-d numpy array instead of a dictionary.
… New method is incomplete. Issue with some of the chi-square calculations. It only happens w/ certain distributions
…the first 2 and last 2 chi-square values do not match expected results. meanwhile, the other 9 chi-square values match. unsure what is the cause of the discrepancy
@Morgan-Sell
Copy link
Collaborator Author

hola @solegalli,

I think I need an extra set of eyes. I'm struggling to identify what is causing the error for test_chi_merge(). I believe the root cause is in _calc_chi_square(); however, I cannot identify where.

In test_chi_merge(), the expected results are [4.1, 2.4, 8.6, 2.9, 1.7, 1.8, 2.2, 4.8, 4.1, 3.2, 1.5, 3.6]. These are the results stated in the Chi-Merge paper.

The transformer returns the following results: [5.9, 5.2, 8.6, 2.9, 1.8, 1.8, 3.1, 4.8, 4.1, 3.2, 1.8, 13.0].

The above values are the results from the chi-square tests of the consecutive distributions. 5 of the 12 expected results are incorrect.

Indices of the values that don't reconcile: 0, 1, 6, 10, and 11.

Do you see the bug?

@Morgan-Sell
Copy link
Collaborator Author

Morgan-Sell commented Jul 6, 2022

hola @solegalli,

Did you have a chance to look at this bug? I'm stumped.

@Morgan-Sell
Copy link
Collaborator Author

hi @solegalli,

Are you still getting around to reviewing this discretizer? I think it's super cool!

I know you're quite busy. I'm trying to organize myself.

@solegalli
Copy link
Collaborator

Still pending. I send you an email? would that work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new discretisation transformer: chimerge
2 participants