Code optimization #33

Merged
merged 6 commits into from
Mar 13, 2022

zStupan (Collaborator) commented Mar 13, 2022

  • Updated get_rules to accept the algorithm as a string (the class name) and to set the algorithm parameters from kwargs
  • Replaced Dataset.feature_report() with a __repr__ method, which outputs detailed information about the dataset: the number of transactions, the number of features, and the data type, min, max, and categories of each feature.
  • Optimized code for performance by:
    • Converting the Feature and Rule classes to use __slots__ (which gives faster attribute access and lower memory usage)
    • Rewriting the __post_init__ method to avoid DataFrame.agg and to store the boolean masks of transactions containing the antecedent and/or consequent as numpy arrays, so that counts come from summing those masks instead of sub-indexing the transactions
  • Fixed Yule's Q computation
  • Updated CLI docs
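
To illustrate the two performance changes listed above, here is a minimal sketch. It is not the code from this PR: the `Rule` class below is a hypothetical stand-in, and the one-hot boolean `transactions` matrix is an assumed layout chosen to keep the example short.

```python
import numpy as np


class Rule:
    """Hypothetical stand-in for a rule class; not the project's actual code."""

    # __slots__ removes the per-instance __dict__, which lowers memory use.
    __slots__ = ("antecedent", "consequent", "support", "confidence")

    def __init__(self, antecedent, consequent, transactions):
        self.antecedent = antecedent
        self.consequent = consequent

        # One boolean per transaction: does it contain every listed item?
        ant_mask = transactions[:, antecedent].all(axis=1)
        con_mask = transactions[:, consequent].all(axis=1)

        # Summing a boolean mask counts its True entries, so no
        # sub-indexing of the transactions is needed.
        ant_count = ant_mask.sum()
        both_count = (ant_mask & con_mask).sum()

        self.support = both_count / len(transactions)
        self.confidence = both_count / ant_count if ant_count else 0.0


# Toy one-hot data (assumed layout): rows are transactions, columns are items.
transactions = np.array(
    [
        [1, 1, 0],
        [1, 1, 1],
        [1, 0, 1],
        [0, 1, 1],
    ],
    dtype=bool,
)

rule = Rule(antecedent=[0], consequent=[1], transactions=transactions)
print(rule.support, rule.confidence)
```

Because the class declares `__slots__`, assigning an attribute that is not listed (for example `rule.lift = 1.0`) raises `AttributeError`, which is the trade-off for the smaller footprint.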

zStupan (Collaborator, Author) commented Mar 13, 2022

I'm still not sure whether I got Yule's Q correct. The definition in that IEEE article gives a different result from this one.

EDIT: I've decided to use the Wikipedia definition, which is basically the same as the one in the IEEE paper, just more compact.
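
For reference, the compact form of Yule's Q on a 2x2 contingency table is Q = (ad - bc) / (ad + bc), where a counts transactions containing both the antecedent and the consequent, b the antecedent only, c the consequent only, and d neither. A rough sketch of that formula follows. The function name `yules_q` and the boolean-mask inputs are assumptions made for illustration; this is not necessarily how the merged code computes it.

```python
import numpy as np


def yules_q(ant_mask, con_mask):
    """Yule's Q from two boolean masks over the same transactions."""
    ant_mask = np.asarray(ant_mask, dtype=bool)
    con_mask = np.asarray(con_mask, dtype=bool)

    # Cells of the 2x2 contingency table.
    a = int(np.sum(ant_mask & con_mask))    # antecedent and consequent
    b = int(np.sum(ant_mask & ~con_mask))   # antecedent only
    c = int(np.sum(~ant_mask & con_mask))   # consequent only
    d = int(np.sum(~ant_mask & ~con_mask))  # neither

    ad, bc = a * d, b * c
    if ad + bc == 0:
        return float("nan")  # Q is undefined when both products are zero
    return (ad - bc) / (ad + bc)


ant = [True, True, True, False, False, False]
con = [True, True, False, True, False, False]
print(yules_q(ant, con))  # a=2, b=1, c=1, d=2, so (4 - 1) / (4 + 1) = 0.6
```

Q ranges from -1 (perfect negative association) through 0 (no association) to 1 (perfect positive association).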

zStupan merged commit ad99667 into firefly-cpp:main on Mar 13, 2022