Fix [LB-68] Memory Leak | Scraper #72
Conversation
Cool. Just fix the indentation, please.
I really need to work on my indentation. :)
Force-pushed from bbbf3bd to 50bef41
I suggest using an additional tool that can check all this and more for you. I don't know which editor you use, but take a look at pylint.
I don't see how this could do anything, because you can't delete local variables in JavaScript, only variables that are properties of an object (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/delete). At best you can assign `null`.
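For illustration, the distinction looks like this (a minimal sketch in sloppy mode; the names are made up):

```js
var importer = { xhr: new XMLHttpRequest() };
delete importer.xhr;  // works: removes the property, so the XHR becomes collectable

function makeRequest() {
    var xhr = new XMLHttpRequest();
    delete xhr;       // no-op on a local variable: returns false (SyntaxError in strict mode)
    xhr = null;       // assigning null is the most you can do to drop the reference
}
```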
My earlier implementation was assigning `null`. I will try some tests, otherwise revert the changes. :)
I did some tests using the Memory Profiler in Chrome, for three scripts: original_script, script_with_null, and script_with_delete.
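The three variants differ only in how the request object is released once a response arrives; roughly this pattern (a sketch, not the actual scraper code; names and endpoint are illustrative):

```js
function submitPage(payload, done) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/submit");
    xhr.onload = function () {
        // original_script:    leave xhr alone; the closure keeps it alive until GC
        // script_with_null:   xhr = null;    drop the reference explicitly
        // script_with_delete: delete xhr;    a no-op on a local variable
        done();
    };
    xhr.send(JSON.stringify(payload));
}
```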
I am in favour of using `delete`.
You could've just tried:

> (function () { var foo = {a: 1}; delete foo; console.log(foo); }());
Object {a: 1} // foo is still referenceable, so can't possibly have been GC'd
Git displays tabs as 8 spaces, while the LB code is written with 1 tab = 4 spaces and tab_expand enabled. Fixed my styling issues.
Ready for merge. :)
How many pages did you try to import? We have reports of the importer using lots of memory for users with over 6000 pages of data, so it'd be good if you could test it on that too. Can you show screenshots of memory usage with and without the `delete`? Are you also able to measure the usage using Firefox?
I used Chrome's Memory Profiler for this. I tried with Firefox but could not get the hang of it. :(
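For rough numbers outside the profiler UI, Chrome also exposes the non-standard performance.memory counter, which can be sampled while the import runs (a sketch; this API only exists in Chromium-based browsers, which may be part of why Firefox is harder to measure):

```js
// Log the JS heap size once a second while the import runs (Chrome only).
var heapTimer = setInterval(function () {
    if (performance.memory) {
        var mib = performance.memory.usedJSHeapSize / (1024 * 1024);
        console.log("used JS heap:", mib.toFixed(1), "MiB");
    }
}, 1000);
// call clearInterval(heapTimer) once the import finishes
```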
I have set up a page to test the importer. I'll post the screenshot soon, but anyone else can also test and comment with their experience.
@alastair The above images show the frame rates during a test of submitting listens. The complete result dump can be found at https://github.com/pinkeshbadjatiya/memory-leak-test-ListenBrainz/tree/gh-pages/testresults
This is kind of a wild guess from looking at the code a bit while thinking of other ways to help with memory, but the recursive calls within reportScrobbles (in xhr.onload, line 177) look like they might use a fair bit of memory if a bunch of the requests were coming back with errors or otherwise failing consecutively. I'm not sure what the GC does with all that, but it might not get to the part where xhr is set to null until a successful request goes through.
That's not actually recursive, because the next call is made from the xhr.onload callback, which runs asynchronously after the original call has already returned, so stack frames never accumulate.
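A contrived sketch of why the stack doesn't grow (setTimeout stands in for the asynchronous xhr.onload callback):

```js
function pump(n) {
    if (n === 0) return;
    // pump() has already returned by the time this callback fires,
    // so each call starts from a fresh, shallow stack.
    setTimeout(function () {
        pump(n - 1);
    }, 0);
}
pump(100000); // eventually completes; synchronous recursion this deep would overflow
```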
I spent some time digging into this with the Chrome memory profiler. Here is what I found:
I want to do a few more things here:
- The importer uses lots of RAM.

The GC takes its own time to clean up the xhr variables held in closures, which is slower than the rate at which requests are made. By manually removing the reference, RAM usage is reduced.
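In code, the fix amounts to dropping the closure's reference as soon as a request completes, so the GC can reclaim the XHR and its response buffers before the next request starts. A minimal sketch of the pattern (assuming a sequential page-by-page importer; names and endpoint are illustrative):

```js
function submitNextPage(pages, i) {
    if (i >= pages.length) return;
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/submit");       // illustrative endpoint
    xhr.onload = function () {
        xhr = null;                    // drop the closure's only reference...
        submitNextPage(pages, i + 1);  // ...before kicking off the next request
    };
    xhr.send(JSON.stringify(pages[i]));
}
```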