[DRAFT]feat: boost revise_paragraph_classification #52

Open: prnake wants to merge 1 commit into miso-belica:main from prnake:main


prnake commented Mar 4, 2025

This is largely a draft for discussion. It includes several changes:

Based on https://gist.github.com/prnake/e40fb3dd9b0af1f7a5fc73f2ee5236e9, the new implementation is much faster than the original while producing the same results: it reduces the time complexity from O(n^2) to O(n).
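The gist itself is not reproduced in this thread, so the following is only a sketch of one common way such an O(n^2) → O(n) gain can be obtained, under stated assumptions. If each "short" paragraph finds its nearest "good"/"bad" neighbour by scanning outward paragraph by paragraph, a run of k consecutive short paragraphs costs O(k^2) lookups. Precomputing the nearest classified neighbour on each side with one forward and one backward sweep makes every lookup O(1). The function names, the flat list-of-class-strings input, and treating the document boundary as "bad" are assumptions of this sketch, not the PR's code; any special handling of "neargood" paragraphs is omitted.

```python
from typing import List, Tuple

GOOD, BAD = "good", "bad"
# Assumption for this sketch: a missing neighbour (document boundary) counts as "bad".
BOUNDARY = BAD


def nearest_neighbours_naive(classes: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical reference version: scan outward from every paragraph.

    Worst case O(n^2), e.g. when most paragraphs are neither good nor bad.
    """
    n = len(classes)
    result = []
    for i in range(n):
        prev = BOUNDARY
        for j in range(i - 1, -1, -1):
            if classes[j] in (GOOD, BAD):
                prev = classes[j]
                break
        nxt = BOUNDARY
        for j in range(i + 1, n):
            if classes[j] in (GOOD, BAD):
                nxt = classes[j]
                break
        result.append((prev, nxt))
    return result


def nearest_neighbours_linear(classes: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical O(n) version: one forward sweep and one backward sweep."""
    n = len(classes)

    # Forward sweep: remember the last good/bad class seen strictly before i.
    prev_cls = [BOUNDARY] * n
    last = BOUNDARY
    for i in range(n):
        prev_cls[i] = last
        if classes[i] in (GOOD, BAD):
            last = classes[i]

    # Backward sweep: remember the first good/bad class seen strictly after i.
    next_cls = [BOUNDARY] * n
    last = BOUNDARY
    for i in range(n - 1, -1, -1):
        next_cls[i] = last
        if classes[i] in (GOOD, BAD):
            last = classes[i]

    return list(zip(prev_cls, next_cls))
```

For example, `nearest_neighbours_linear(["good", "short", "short", "bad", "short"])` returns `[("bad", "bad"), ("good", "bad"), ("good", "bad"), ("good", "bad"), ("bad", "bad")]`, the same as the naive version. The design trade-off is two extra O(n) lists in exchange for removing the nested scan.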

===== Test data size: 1000 =====
Performance comparison for 1000 paragraphs:
Original function time: 0.055953 seconds
Optimized function time: 0.001080 seconds
Performance improvement: 51.79x
Results consistency: Pass

===== Test data size: 5000 =====
Performance comparison for 5000 paragraphs:
Original function time: 1.181464 seconds
Optimized function time: 0.005710 seconds
Performance improvement: 206.91x
Results consistency: Pass

===== Test data size: 10000 =====
Performance comparison for 10000 paragraphs:
Original function time: 4.699047 seconds
Optimized function time: 0.011343 seconds
Performance improvement: 414.28x
Results consistency: Pass

===== Test data size: 20000 =====
Performance comparison for 20000 paragraphs:
Original function time: 18.514682 seconds
Optimized function time: 0.022034 seconds
Performance improvement: 840.28x
Results consistency: Pass
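As a quick sanity check on the complexity claim, the timings above can be read on a log-log scale: the slope between the 1000- and 20000-paragraph runs is a two-point estimate of the exponent k in t ∝ n^k. This sketch only uses the reported numbers; the helper name `growth_exponent` is hypothetical, and timings on other machines will differ.

```python
import math

# Timings copied from the benchmark output above (seconds).
sizes = [1000, 5000, 10000, 20000]
original = [0.055953, 1.181464, 4.699047, 18.514682]
optimized = [0.001080, 0.005710, 0.011343, 0.022034]


def growth_exponent(ns, ts):
    """Two-point estimate of k in t ~ n**k from the smallest and largest runs."""
    return math.log(ts[-1] / ts[0]) / math.log(ns[-1] / ns[0])


for name, times in (("original", original), ("optimized", optimized)):
    print(f"{name}: k ~ {growth_exponent(sizes, times):.2f}")
```

This prints exponents of about 1.94 for the original function and 1.01 for the optimized one, which is consistent with quadratic and linear scaling respectively.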

prnake changed the title from "[Draft]feat: boost revise_paragraph_classification" to "[DRAFT]feat: boost revise_paragraph_classification" on Mar 4, 2025
miso-belica (Owner)

Thank you for the changes. Can you write tests for these, please? It seems like a lot of code with quite a big impact. Also, these are 3 different changes, so maybe we could split the smaller ones into separate PRs to merge them faster.

prnake (Author) commented Mar 5, 2025

Sure, I’ll do these later.
