Page crashes after 4,500 Quora answers downloaded #97

Open
vttoth opened this issue Oct 14, 2018 · 7 comments

@vttoth

vttoth commented Oct 14, 2018

I am trying to download my Quora answers (over 6,000). When the count reaches 4,505, the page crashes (Chrome shows its "Aw, Snap!" message).

@eloquence
Owner

Thanks for the report, trying to reproduce now.

@eloquence
Owner

Hi Viktor, the browser crash was most likely due to running out of memory. I can see if I can improve memory management during the extension run, but in the meantime, ensuring sufficient free memory before attempting the download should fix the issue.

How much is sufficient? I was able to download 6,072 answers for you on a machine with 16GB RAM, but I did have to end all other applications before it would go all the way through (it did crash on the first run, with an out of memory error from the operating system).

I do have the 6,072 answers in JSON format if that would be helpful and would be happy to email them to you, just ping me at eloquence AT gmail DOT com.

@vttoth
Author

vttoth commented Oct 15, 2018

Thanks for the response. This machine (Windows 10 64-bit) actually has 32 GB of RAM, yet Chrome crashes (with plenty of free RAM remaining). But I was able to download my stuff just fine on a Linux machine (also 32 GB). In fact, I just came back to report that fact when I saw your message. Thanks for the quick support!

@vttoth
Author

vttoth commented Dec 11, 2018

I am now running into the same issue on Linux, too. After a little fewer than 4,000 answers, Chrome shows "Aw, Snap!". Chrome is up to date and the machine has 32 GB of RAM; the same issue occurs on a machine with much less memory, and on Windows 10 and Windows Server 2016. Suggestion (forgive me if it is just incompatible with the plugin architecture): Would it be possible to break up the download into, say, 1000-answer chunks?

@vttoth
Author

vttoth commented Dec 11, 2018

I should have added: Chrome is up to date and all other plugins were disabled.

@eloquence
Owner

Thanks for the report, and sorry you're now experiencing this issue on all machines. Unfortunately I don't see an obvious way to split the download into chunks. We're basically pretending to keep paging through the content the way a user would, and there does not appear to be any support for offsets in Quora's internal APIs, at least not in a way that I can determine from the highly obfuscated nature of the network requests.

There is one other technical avenue which could work, which is the https://www.quora.com/content set of pages, which is at least segmented by year. The downsides:

  • It's only accessible to the logged-in user, which makes it hard for me, for example, to test with larger accounts (I only have a handful of answers on mine).
  • Each answer is its own page, which could have its own problems in terms of performance and reliability.

Since that's a possible dead end, and hard for me to test, I'm not going down that road yet, but I would encourage others to try that approach as well.

As far as I can tell, the problem with our current approach is that memory usage keeps growing with each request, even though elements are removed from the DOM as we go. I suspect there are standard optimization techniques we can use to make sure the process frees up more memory as it goes, which would then dramatically reduce the likelihood of an "Aw, Snap!" crash. That seems the most fruitful avenue to dig into further, but it would take a few hours of research, so it will take me a while to get to it.
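
One generic technique of this kind, sketched here in TypeScript under stated assumptions: if the extension accumulated every scraped answer in a single array (not confirmed anywhere in this thread), it could instead hand answers off in fixed-size batches, such as the 1000-answer chunks suggested above, and drop its own reference to each batch. `BatchCollector` and its `sink` callback are hypothetical names, not part of the extension. This only bounds memory that the extension itself holds; it cannot release memory held by the page's own scripts.

```typescript
// Hypothetical helper, not part of the extension: buffers items and hands
// them to a sink in fixed-size batches so the buffer never grows unbounded.
class BatchCollector<T> {
  private buffer: T[] = [];
  private readonly batchSize: number;
  private readonly sink: (batch: T[]) => void;

  constructor(batchSize: number, sink: (batch: T[]) => void) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
      throw new RangeError("batchSize must be a positive integer");
    }
    this.batchSize = batchSize;
    this.sink = sink;
  }

  // Number of items currently held in memory by the collector.
  get pending(): number {
    return this.buffer.length;
  }

  // Add one item; emits a batch as soon as the buffer is full.
  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.batchSize) {
      this.flush();
    }
  }

  // Emit whatever is buffered (call once more at the end of the run).
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    // Start a fresh buffer so the collector no longer references the batch;
    // it becomes collectable once the sink has finished with it.
    this.buffer = [];
    this.sink(batch);
  }
}
```

A sink could, for example, serialize each batch to its own JSON file. With a batch size of 1000, a 2,500-answer run followed by a final `flush()` would emit batches of 1000, 1000, and 500 answers.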

If you yourself are interested in poking at the extension, and would like a code walkthrough, please do let me know, and I'd be happy to assist with that.

@eloquence
Owner

eloquence commented Jan 22, 2019

I did a bit more poking today to see if I can do anything in the extension itself to improve memory usage.

Unfortunately, my preliminary investigation suggests that the increased memory usage as we load more and more answers is caused by the code that Quora itself runs. Beyond just rendering the answers, it holds references to them in memory, which the extension cannot clear out.

Your best bet right now is to use Quark, which is a Firefox extension. It doesn't let you publish your answers to FreeYourStuff.cc, but it does let you download them: https://addons.mozilla.org/en-US/firefox/addon/quark/

Quark does what I'm suggesting above, which is to spider https://www.quora.com/content year-by-year and answer-by-answer. That approach is much less prone to memory leaks. I've taken a quick look at the code, and it doesn't look like it's doing anything evil. :)
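
The traversal order described here (year by year, then answer by answer) can be sketched as follows. This is not Quark's code, just a minimal illustration of the idea. How years and answer URLs are discovered on the content pages is not described in this thread, so `listUrls` and `visit` are hypothetical callbacks supplied by the caller, and they are synchronous only for brevity; a real crawler would await network requests.

```typescript
// Hypothetical callbacks: discovery of answer URLs and the handling of a
// single answer page are injected, since neither is specified in the thread.
type ListUrls = (year: number) => string[];
type Visit = (year: number, url: string) => void;

// Walks the years one at a time, and the answers within a year one at a time.
// Only the URL list of the current year is held while that year is processed.
function crawlByYear(
  years: number[],
  listUrls: ListUrls,
  visit: Visit
): Record<number, number> {
  const visitedPerYear: Record<number, number> = {};
  for (const year of years) {
    const urls = listUrls(year);
    for (const url of urls) {
      visit(year, url); // e.g. extract one answer and save it right away
    }
    visitedPerYear[year] = urls.length;
  }
  return visitedPerYear;
}
```

Because each answer is handled and handed off before the next one is requested, the amount of data held at any moment depends on a single year's URL list and a single answer, not on the total number of answers.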

The biggest problem with this approach is still that I can't easily test it with accounts other than my own, as https://www.quora.com/content is per-user, whereas the "answers" URL is public. Since I only have a few answers, I'm worried that if I switch to that approach, I'll lose the ability to test. That said, it may be worth offering it as an experimental option at least.

In any case, if you haven't already done so, it would be useful if you could give Quark a spin and let me know if it works on your account.
