Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance when parsing many strings in the same format #28

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

robin-xyzt-ai
Copy link

Proposal for #17 .

By keeping track of which rules were used to parse the first string, parsing the next strings can try to use a matcher that only uses a subset of those rules.

The case in the benchmark is between 2 and 3 times faster on my machine:

Benchmark                                                          Mode  Cnt     Score     Error  Units
OptimizeForReuseSimilarFormattedBenchmark.optimizedForReuseParser  avgt    6   462.362 ±  54.300  ms/op
OptimizeForReuseSimilarFormattedBenchmark.regularParser            avgt    6  1130.171 ± 162.117  ms/op

Copy link

@akash-yadagouda akash-yadagouda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit more explanation on code.

* Some additional comments
* Some small tweaks to the code in an attempt to improve readability
* Extended the test case a bit
@robin-xyzt-ai
Copy link
Author

I tried to make the code a bit more clear by leaving some additional comments and doing a bit more code cleanup.

Let me know if there are specific parts that are still unclear.

pom.xml Outdated
@@ -5,7 +5,7 @@
<modelVersion>4.0.0</modelVersion>

<groupId>com.github.sisyphsu</groupId>
<artifactId>dateparser</artifactId>
<artifactId>dateparser-xyzt-ai</artifactId>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow, this was a mistake.
This was done for our local fork, and should not have been included in this PR. Will revert this for the PR.

@robin-xyzt-ai
Copy link
Author

Looks like this PR isn't ready to be merged. The following test fails:

    @Test
    void foo() {
        DateParser parser = DateParser.newBuilder().optimizeForReuseSimilarFormatted(true).build();
        String inputString = "2022-08-09 19:04:31.600000+00:00";
        assertEquals(parser.parseDate(inputString), parser.parseDate(inputString));
    }

I'm afraid it will require a fix for #29 first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants