Tests are a bit flaky with the new redis backend #95

Closed · Ch4s3 opened this issue Jan 6, 2017 · 19 comments · Fixed by #97

Ch4s3 (Member) commented Jan 6, 2017

It looks like the tests are a bit flaky now and seem to hang from time to time.

ibnesayeed (Contributor) commented:

I usually test in isolated Docker containers and I didn't notice anything unusual, except that it is taking longer than the memory backend. Additionally, the integration test takes noticeably longer because it needs to train about 7K instances, classify 1K instances, untrain 2K instances, and classify 1K instances again, for both the Memory and Redis backends. If Redis is not running or can't be connected to, those tests are skipped. I think the lag comes from the periodic save operations, when Redis tries to log the write operations into the dump.rdb file. However, since we don't need persistence on disk, we can disable this behavior. I will give it a try and see if the speed improves.
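
For illustration, disabling snapshotting on a test instance could look something like the sketch below, assuming the redis-rb client and a locally running server; the exact change made in #97 may differ.

```ruby
require 'redis'

redis = Redis.new(host: 'localhost', port: 6379)

# Turn off periodic RDB snapshots so test writes never trigger a dump.rdb save.
redis.config(:set, 'save', '')

# Make sure append-only persistence is also off for the test instance.
redis.config(:set, 'appendonly', 'no')
```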

Ch4s3 (Member Author) commented Jan 6, 2017

Let's try disabling that. If the slower tests are mostly due to the larger test case, I'm fine with that. We aren't running tons of builds back to back, so I think better test quality beats speed here.

ibnesayeed (Contributor) commented:

Data size does affect the testing time. I have refactored the code so that the sizes are configurable constants rather than numbers hard-coded in the test itself.

TRAINING_SIZE = 4000, TESTING_SIZE = 1000 => About 17 sec.
TRAINING_SIZE = 1000, TESTING_SIZE = 100 => About 4 sec.
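
Roughly, the refactor described above amounts to something like the following sketch (the constant names and helper are illustrative, not necessarily the ones used in the test):

```ruby
# Illustrative constants controlling how much of the corpus the integration test uses.
TRAINING_SIZE = 1_000
TESTING_SIZE  = 100

# Hypothetical helper slicing a loaded corpus into training and testing sets.
def split_corpus(documents)
  [documents.first(TRAINING_SIZE), documents.last(TESTING_SIZE)]
end
```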

ibnesayeed (Contributor) commented:

I would much prefer to move the backend initialization and configuration so that it runs once per test suite rather than for each test case. However, Minitest does not provide this functionality out of the box. It is possible with a custom runner, though.
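
For illustration, one common workaround is to memoize the expensive setup at the class level so it runs only once per process; this is only a sketch, and the class and backend names are assumptions rather than the project's actual code.

```ruby
require 'minitest/autorun'
require 'classifier-reborn'

class RedisBackendCommonTests < Minitest::Test
  # Minitest has no built-in before-suite hook, so memoize the backend in a
  # class-level method; the expensive initialization then runs only once.
  def self.shared_backend
    @shared_backend ||= ClassifierReborn::BayesRedisBackend.new
  end

  def setup
    @redis_backend = self.class.shared_backend
  end
end
```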

Ch4s3 (Member Author) commented Jan 6, 2017

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

ibnesayeed (Contributor) commented:

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

I am fairly new to actually writing tests, so I will have to understand what is involved and how that custom runner would work. Test::Unit had various lifecycle callback methods, including before and after hooks for each test case and for the whole suite, but for some reason Minitest decided not to make them available. I initially tried writing an initialize method in my test class, but it turned out that it was going to be called for each test case, so it was not going to help. However, we can perhaps cache the instance variable that holds the reference to the Redis backend. On the other hand, I feel like this would be a micro-optimization, because currently the slowest test is the integration one with lots of data.

That said, #97 can still be merged and we can revisit the custom runner option later.

Ch4s3 (Member Author) commented Jan 6, 2017

I'll leave this open for now so we can continue to discuss and explore.

ibnesayeed (Contributor) commented Jan 6, 2017

It looks like writing the custom runner will not benefit us in this case. I did some experiments and here are my findings:

  • Caching instance variables (such as @redis_backend or @classifier) that are defined in the setup method is not going to help, because each test case runs as an independent instance with its own initialization.
  • Changing those instance variables to class variables (such as @@redis_backend) would allow caching, but it does not improve the overall execution time (evident from the empirical results).
  • Connecting to the Redis server and creating the @redis_backend instance variable takes less than 0.3ms on my machine (average ≈0.25ms). We currently run 23 tests that require @redis_backend, and we also create an @alternate_redis_backend each time (though it is only needed in some tests). That means we attempt to connect to Redis 23 * 2 = 46 times, for a total of roughly 23 * 2 * 0.25 = 11.5ms. (A rough sketch of this measurement follows the list.)
  • So, is it really worth the effort? Not to mention the added complexity it would bring to the test code, along with the ugliness of putting data in the class rather than in instances.
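
For reference, the ≈0.25ms figure could be reproduced with a micro-benchmark along these lines (a sketch assuming the redis-rb client and a local server; the numbers will of course vary by machine):

```ruby
require 'benchmark'
require 'redis'

# 23 tests, each creating two backend connections (@redis_backend and
# @alternate_redis_backend), gives 46 connection setups per suite run.
RUNS = 46

total = Benchmark.realtime do
  RUNS.times { Redis.new(host: 'localhost', port: 6379).ping }
end

puts format('%d connections in %.2f ms (%.3f ms each)',
            RUNS, total * 1000, total * 1000 / RUNS)
```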

Ch4s3 (Member Author) commented Jan 6, 2017

Excellent profiling. I'd say based on your results, it's safe to close this.

Ch4s3 closed this as completed Jan 6, 2017

ibnesayeed (Contributor) commented Jan 6, 2017

The integration test is where all the time is consumed. Here are some stats based on different values of the training and testing data sizes (time values are the sum of the Memory and Redis backends for the same task):

TRAINING_SIZE = 4000
TESTING_SIZE = 1000

Load data (training: 4000) + (testing: 1000) => 2.80472ms
Train 4000 records => 5790.959149ms
Test 1000 records => 2065.361196ms
Untrain 2000 records => 6426.361111ms
Test 1000 records => 2320.733856ms

Time spent in Memory classifier => 410.059577ms
Time spent in Redis classifier => 16948.311882ms

TRAINING_SIZE = 1000
TESTING_SIZE = 100

Load data (training: 1000) + (testing: 100) => 0.760892ms
Train 1000 records => 1522.433455ms
Test 100 records => 200.561411ms
Untrain 500 records => 1623.415373ms
Test 100 records => 193.476481ms

Time spent in Memory classifier => 85.74332ms
Time spent in Redis classifier => 3573.399931ms

TRAINING_SIZE = 100
TESTING_SIZE = 10

Load data (training: 100) + (testing: 10) => 0.114501ms
Train 100 records => 151.393102ms
Test 10 records => 22.735926ms
Untrain 50 records => 187.779914ms
Test 10 records => 18.576174ms

Time spent in Memory classifier => 10.057782ms
Time spent in Redis classifier => 363.122439ms

Ch4s3 (Member Author) commented Jan 6, 2017

Cool. Let's keep an eye on it. I think after #97 it's pretty good.

ibnesayeed (Contributor) commented:

I have added some more data to the previous comment to show how much cumulative time the integration test took in each of the two backends. It looks like Memory is about 40 times faster than Redis, but obviously the benefit of Redis is in persistence and collaboration.

Ch4s3 (Member Author) commented Jan 6, 2017

Not terribly surprising. We may want to document that.

ibnesayeed (Contributor) commented:

Not terribly surprising. We may want to document that.

With #98 it is not only documented but also reproducible. If by documenting you mean we should add it to the README file, then yes, we can add a summarized note there.
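
A minimal sketch of what such a benchmark comparison could look like is below; the sample data, categories, and the backend: option are assumptions based on this discussion, not the exact script added in #98.

```ruby
require 'benchmark'
require 'classifier-reborn'

# Hypothetical sample corpus: repeated labeled snippets, just to have data to train on.
samples = [['Interesting', 'here are some interesting things'],
           ['Uninteresting', 'here are some dull things']] * 500

{ 'Memory' => ClassifierReborn::BayesMemoryBackend.new,
  'Redis'  => ClassifierReborn::BayesRedisBackend.new }.each do |name, backend|
  classifier = ClassifierReborn::Bayes.new('Interesting', 'Uninteresting', backend: backend)
  elapsed = Benchmark.realtime do
    samples.each { |category, text| classifier.train(category, text) }
  end
  puts format('%s backend: trained %d samples in %.2f ms', name, samples.size, elapsed * 1000)
end
```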

Ch4s3 pushed a commit that referenced this issue Jan 7, 2017
* Disabled Redis disc persistence and refactored integration test, fixes #95

* Added Bayes backend benchmarks

Ch4s3 (Member Author) commented Jan 7, 2017

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

Ch4s3 pushed a commit that referenced this issue Jan 7, 2017
* Disabled Redis disc persistence and refactored integration test, fixes #95

* Changed test class and file name as per #84

ibnesayeed (Contributor) commented Jan 7, 2017

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

I have added a brief note about the performance of the Redis backend in PR #103. However, I agree that the README needs some restructuring, with more headings and subheadings to reorganize the information that has accumulated over the years.

Ch4s3 (Member Author) commented Jan 7, 2017

Yeah, the readme is a bit of a mess, but shouldn't be too hard to reorganize. I'll probably throw up a quick github page or something similar to help with discovery as well.

Ch4s3 pushed a commit that referenced this issue Jan 7, 2017
* Disabled Redis disc persistence and refactored integration test, fixes #95

* rb-gsl gem is now maintained as gsl

* Documented Redis backend performance

ibnesayeed (Contributor) commented:

I'll probably throw up a quick github page or something similar to help with discovery as well.

This is a rather good idea, actually. We can use the /docs folder to keep the static documentation files and configure the repo to serve static pages automatically from the /docs folder of the master branch. This way the documentation will be maintained along with the code in the master branch.

Ch4s3 (Member Author) commented Jan 8, 2017

That sounds good.

Ch4s3 pushed a commit that referenced this issue Jan 8, 2017
* Disabled Redis disc persistence and refactored integration test, fixes #95

* Added Dockerfile with documentation