Note to self: Execution time is unnecessarily high #61

Open
jacobwb opened this issue Nov 14, 2014 · 11 comments
@jacobwb
Owner

jacobwb commented Nov 14, 2014

Consider the following:

  1. XML parse time for 10 files: 0.0003 seconds.
  2. HashOver execution time with 10 XML files: 0.0025 seconds.

Now let's say there are 1,000 files:

HashOver's execution time would be 2.5 seconds, whereas the XML parse time is only 0.3 seconds. That means 88% of the execution time is HashOver itself. Considering HashOver is doing very little work to display the comments, and most of that work is done in JavaScript in the client's web browser, this is unacceptable. It needs to be fixed before HashOver becomes like Disqus :-)

Memory usage is also high, with a script memory peak of 0.51 MB and a system memory peak of 0.75 MB for only 10 XML files. These numbers should be around 0.22 MB, and for 1,000 files, not 10.
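
For reference, numbers like these can be gathered with PHP's built-in timing and memory functions; this is not HashOver's actual instrumentation, just a minimal sketch of the measurement itself:

```php
<?php
// Minimal measurement sketch (not HashOver's actual instrumentation):
// microtime() for execution time, memory_get_peak_usage() for the two
// memory peaks quoted above.

$start = microtime(true);

// ... load and parse the comment XML files, run the rest of HashOver ...

printf(
    "Execution time: %.4f seconds\n" .
    "Script memory peak: %.2f MB\n" .
    "System memory peak: %.2f MB\n",
    microtime(true) - $start,
    memory_get_peak_usage() / 1048576,     // memory used by the script itself
    memory_get_peak_usage(true) / 1048576  // memory allocated from the system
);
```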

jacobwb self-assigned this Nov 14, 2014
jacobwb added this to the 2.0 milestone Nov 14, 2014
@Lux-Delux
Contributor

Does it have to do with the way PHP parses XML files? I'm just trying to educate myself a bit more with PHP, and found this interesting tip, for example:
http://stackoverflow.com/a/1835324

If that even covers part of this problem.

@jacobwb
Owner Author

jacobwb commented Nov 14, 2014

@Lux-Delux

I've looked into XMLReader, and various other methods of reading XML files. What I've found is that XMLReader is indeed faster, but only on large files. Since HashOver uses many small files, and uses everything in each file, XMLReader isn't really practical, and isn't faster either.

However, currently each file is read and parsed separately, meaning simplexml_load_file() is called multiple times. What I've done so far is change it to load all of the files into a single string and then call simplexml_load_string() once on that large string. This has improved performance a little, by about 17.64% it looks like, so I may do the same thing with JSON.
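
For reference, a minimal sketch of that batching approach; the directory layout, the wrapper root element, and the assumption that each comment file has a single root element are mine, not HashOver's actual code:

```php
<?php
// Sketch: concatenate many small XML files into one string under a
// synthetic root, then parse the whole batch with a single call to
// simplexml_load_string() instead of one simplexml_load_file() per file.

$files = glob('comments/*.xml');  // assumed layout: one file per comment
$batch = '<comments>';

foreach ($files as $file) {
    $xml = file_get_contents($file);

    // Strip any per-file XML declaration so the combined string stays well-formed
    $batch .= preg_replace('/<\?xml[^?]*\?>/', '', $xml);
}

$batch .= '</comments>';

// One parse for all files
$comments = simplexml_load_string($batch);

foreach ($comments->children() as $comment) {
    // ... process each original file's root element here ...
}
```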

That being said, 88% of the execution time is spent on HashOver's other functions, and that is the problem that really has to be addressed. XML is slightly slower to parse than JSON and some SQL configurations, but the difference between them is quite small. To demonstrate this I recently wrote this code: https://github.com/jacobwb/lorem-ipsum-article-generator

@jacobwb
Owner Author

jacobwb commented Jan 16, 2015

It turns out I did my math wrong. The execution time above is for 10,000 files, not 1,000. For 1,000 files the XML parse time is 0.03 seconds and the total execution time is 0.25 seconds. The percentages are the same, and the memory usage is still high for only 10 files, so I won't close this just yet.

@taophp
Contributor

taophp commented Jan 19, 2016

It seems to me that heavy disk input/output is often an explanation for slowness. Have you thought about storing all comments in only one file? You could concatenate the XML strings, one per comment, into a single file for a page, so you could still easily append a new one.

@jacobwb
Owner Author

jacobwb commented Jan 25, 2016

@taophp

It has been considered, but ultimately rejected. If all comments were in a single file, a single XML error or wrong file permission would bring down every comment, and it would make finding and fixing the XML error very difficult. This separated, individual-file nature of XML is one of its benefits.

@taophp
Contributor

taophp commented Jan 26, 2016

I did not suggest using only one XML file, but rather concatenating the XML files (kept separate in memory) into one raw file to reduce disk I/O. Have you considered using PharData? http://php.net/manual/en/class.phardata.php
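
For illustration, a minimal sketch of what the PharData suggestion could look like; the paths, file names, and element names are assumptions, not HashOver's actual layout:

```php
<?php
// Sketch: keep every comment for a page inside one tar archive so reads
// and writes touch a single file on disk instead of many small files.

$newCommentXml = '<comment><body>Example comment body</body></comment>';

$archive = new PharData(__DIR__ . '/pages/example-page.tar');

// Appending a new comment only adds one entry to the archive
$archive->addFromString('42.xml', $newCommentXml);

// Reading all comments iterates the single archive
foreach (new RecursiveIteratorIterator($archive) as $entry) {
    $comment = simplexml_load_string(file_get_contents($entry->getPathname()));
    // ... build the comment output from $comment here ...
}
```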

@jacobwb
Owner Author

jacobwb commented Jan 26, 2016

@taophp

Very interesting idea. While this may technically introduce the same issues I mentioned before (corrupt the archive file and you lose all the comments, or fail to update the archive file and new or edited comments won't appear), I would not be opposed to this feature for all of the flat-file storage formats (XML and JSON), provided it actually improves performance.

You may have noticed the odd repository on my account named "lorem-ipsum-article-generator". That repository was created during a discussion I was having with some people on Google+ about flat-file storage format performance. The discussion involved running those tests in RAM to remove any possible performance bottlenecks from filesystem read/write, and if I remember correctly, the read/write from the hard drive wasn't very detrimental to performance.

@SikoSoft

Stupid question perhaps, but: why not JSON?

What is the need for storing all the data as XML files?

JSON encoding/decoding is built into PHP core, is incredibly reliable and fast, and the files themselves will be considerably smaller.

Am I missing something?

@mro

mro commented Aug 11, 2016

Is server-side parsing necessary at all? Whether it's JSON or XML doesn't matter too much, IMO. For a different endeavor I render XML to HTML in the browser, both for tiny XML files as well as consolidated, larger ones. See (oh, the server is down right now for another 2 hours) http://rec.mro.name/stations/b2/2016/08/10/

@jacobwb
Owner Author

jacobwb commented Aug 19, 2016

@SikoSoft

JSON is being considered, actually.

I would recommend you read the "JSON" section of the "Data storage format" issue.

While it's true that JSON encoding/decoding is built into PHP core, so is XML. Also, the performance difference isn't very significant, especially compared to a database; in fact, JSON performs worse than XML when writing the file. Here is some code I wrote that tests the read/write performance of a few different data formats: lorem-ipsum-article-generator

The main fault with JSON is that it isn't easily readable for a human, whereas XML is, which is why XML is the current default format. When I say "easily readable for a human", I mean that if you need to edit a comment file manually, it would be quite difficult, since the entire comment, perhaps multiple paragraphs long, would be on a single line (with or without pretty print), and this is very inconvenient.
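
To illustrate the readability point, here is a hypothetical multi-paragraph comment written out both ways with PHP's built-in functions; the 'body' field name is an assumption:

```php
<?php
// Illustration only: the same hypothetical comment stored both ways.
$body = "First paragraph of a comment.\n\nSecond paragraph of the same comment.";

// JSON: even with JSON_PRETTY_PRINT the newlines are escaped as \n,
// so the entire comment body ends up on a single line in the file.
echo json_encode(array('body' => $body), JSON_PRETTY_PRINT), "\n\n";

// XML: the body keeps its literal line breaks inside the element,
// which is easier to read and hand-edit.
$xml = new SimpleXMLElement('<comment/>');
$xml->addChild('body', $body);
echo $xml->asXML();
```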

@jacobwb
Owner Author

jacobwb commented Aug 19, 2016

@mro

Yes and no.

Technically speaking the XML files could indeed be used as-is.

However, where enabled, the comment files store sensitive user information, such as e-mail addresses and passwords. For this reason the files cannot simply be displayed as-is on a web page, or transmitted asynchronously via XMLHttpRequest in JavaScript; some parsing is required to remove this information before sending the comments to visitors.

Specifically, HashOver uses the files to generate the HTML necessary to display the comments in a structured and style-able way, as well as to load and display avatars, remotely hosted images, etc. The parsing that is done displays the comments in a user-friendly way; I doubt this could be done by simply displaying the XML files themselves.
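
A minimal sketch of that sanitization step, assuming illustrative element names rather than HashOver's actual schema:

```php
<?php
// Sketch: strip sensitive elements from a stored comment before the
// comment is rendered or sent to a visitor. Element names are assumptions.

$comment = simplexml_load_file('comments/example-page/1.xml');

foreach (array('email', 'password') as $sensitive) {
    if (isset($comment->{$sensitive})) {
        unset($comment->{$sensitive});  // remove the element entirely
    }
}

// Only the sanitized version would be used to build the HTML or be
// transmitted via XMLHttpRequest
echo $comment->asXML();
```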
