feat: optimize metric mappings #51

Merged
merged 11 commits into from
Jun 6, 2024
gcalcedo
Owner

@gcalcedo gcalcedo commented Jun 6, 2024

Resolves #50.

This PR includes considerable changes to the core functionality.

  • Metric maps are now written to the filesystem as a dataframe containing every sampling combination, instead of as the complete map. The resulting CSV is orders of magnitude smaller.
  • Metrics now take **kwargs instead of explicit arguments. Different metrics require different inputs, but they still need to be invoked through a single, uniform call.
  • The complete BERTopic pipeline is no longer run during metric sampling; only the clusters are computed in this step. Similarly, dimensionality reduction is now performed once per run of the MetricMapper. Both changes substantially improve performance.
  • A tqdm progress bar is added so that sampling progress is visible.
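For readers unfamiliar with the pattern, the changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: every name here (`cluster_count`, `noise_ratio`, `mean_doc_length`, `toy_cluster`, `sample_metrics`, `rows_to_csv`, and the grid parameters) is hypothetical, the clustering is a toy stand-in, and the standard-library `csv` module is used in place of a dataframe to keep the sketch dependency-free. The `points` list plays the role of embeddings that were already reduced once, before sampling starts.

```python
import csv
import io
from itertools import product


# Hypothetical metrics: each one names only the inputs it needs and
# lets **kwargs absorb whatever else the caller passes.
def cluster_count(*, labels, **kwargs):
    """Number of distinct clusters, ignoring the noise label -1."""
    return len({label for label in labels if label != -1})


def noise_ratio(*, labels, **kwargs):
    """Fraction of points labelled as noise (-1)."""
    return sum(1 for label in labels if label == -1) / len(labels)


def mean_doc_length(*, documents, **kwargs):
    """Average number of whitespace-separated tokens per document."""
    return sum(len(doc.split()) for doc in documents) / len(documents)


METRICS = {
    "cluster_count": cluster_count,
    "noise_ratio": noise_ratio,
    "mean_doc_length": mean_doc_length,
}


def toy_cluster(points, bucket_width, min_cluster_size):
    """Toy stand-in for the clustering step (an assumption, not BERTopic).

    Points are grouped into fixed-width buckets; buckets with fewer than
    min_cluster_size members are labelled -1 (noise).
    """
    buckets = [int(p // bucket_width) for p in points]
    counts = {}
    for b in buckets:
        counts[b] = counts.get(b, 0) + 1
    return [b if counts[b] >= min_cluster_size else -1 for b in buckets]


def sample_metrics(points, documents, grid):
    """Evaluate every metric for every parameter combination in grid.

    points are assumed to be reduced once, up front; only the clustering
    is recomputed per combination. Returns one dict (row) per combination.
    """
    names = list(grid)
    rows = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        labels = toy_cluster(points, **params)
        # Every metric receives the same keyword arguments.
        inputs = {"labels": labels, "documents": documents, "points": points}
        row = dict(params)
        for metric_name, metric in METRICS.items():
            row[metric_name] = metric(**inputs)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text (stand-in for writing a dataframe)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


points = [0.1, 0.2, 0.3, 1.1, 1.2, 5.0]
documents = ["a b", "c d e", "f", "g h", "i j k l", "m"]
grid = {"bucket_width": [1.0, 2.0], "min_cluster_size": [2, 3]}

rows = sample_metrics(points, documents, grid)
print(rows_to_csv(rows))
```

Because each metric ignores the keyword arguments it does not use, adding a metric with new inputs only means adding a key to `inputs`; the calling loop stays unchanged. The loop over `product(...)` is also the natural place to wrap an iterable in a progress bar.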

@gcalcedo gcalcedo merged commit 029b69d into develop Jun 6, 2024
@gcalcedo gcalcedo deleted the feat/optimize-metric-mappings branch June 6, 2024 22:07
Development

Successfully merging this pull request may close these issues.

Optimize metric mappings
1 participant