This repository has been archived by the owner on Nov 26, 2017. It is now read-only.

JGoogle Package #1480

Closed
wants to merge 29 commits into from

Conversation

aaronschmitz
Contributor

This is my GSoC project. I'd love to get some feedback from the maintainers.

@@ -11,6 +11,8 @@

<xi:include href="packages/database.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />

<xi:include href="packages/google.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
Contributor


#nitpick - strictly speaking, google comes after github alphabetically.

@aaronschmitz
Contributor Author

If there's anything else I can do to help speed up the merging process, let me know!

*
* @since 12.2
*/
static protected function arrayToJavascript($array, $escape = true)
Contributor


I'm unclear as to why this is necessary instead of just using json_encode()

Contributor Author


I believe json_encode() quotes array keys, which doesn't seem to suit my use case. I could be wrong though.
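For context, the behavior under discussion is that JSON output always quotes object keys, while a JavaScript object literal may use bare keys. A minimal sketch (in Python, as an analogy to PHP's json_encode(); the hand-built literal is purely illustrative):

```python
import json

options = {"width": 250, "height": 250}

# Standard JSON encoding quotes every key, per the JSON spec
# (same behavior as PHP's json_encode()).
print(json.dumps(options))  # {"width": 250, "height": 250}

# Emitting a JavaScript object literal with bare keys would have
# to be assembled by hand, e.g. for generated <script> snippets:
js = "{" + ", ".join(f"{k}: {json.dumps(v)}" for k, v in options.items()) + "}"
print(js)  # {width: 250, height: 250}
```

Since JavaScript accepts quoted keys in object literals anyway, the quoting turned out to be harmless here, which is why the thread ends with a switch to standard JSON encoding.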

Contributor


Ah yes, that makes sense.

Contributor


Is there something breaking by having the keys quoted?

@LouisLandry
Contributor

In my initial look-through I have a little bit of concern over the use of SimpleXMLElement for processing XML for GData feeds simply because it is a very memory intensive method of parsing/processing XML. Not as bad as DOM, but the memory usage gets unwieldy quickly for moderate to large XML strings. If we aren't going to try to parse large data files with that though, it could be a non-issue. Have you noticed any sort of memory spike in your usage?

Admittedly I only gave it a quick look-over tonight. I'm going to give it another look hopefully tomorrow if I can find the time. I'm hoping I can also look deeper into some of the other GSoC pull requests... time is just hard to come by at the moment. I need to digest this a little bit before I can give you any more detailed feedback. I left you a couple of line notes after a first pass, and I think we should consider expanding the manual stuff a bit given the sheer amount of functionality that you have added in this pull request.

That being said, dude you absolutely KILLED this thing. I'm incredibly impressed at the code quality, the structure, and the thoroughness that I am seeing in here. I personally cannot wait to get this into the platform and even more so I can't wait to find a reason to use it in one of my projects. You clearly put a lot of thought, time, and passion into this work and it shows. Well done.

@aaronschmitz
Contributor Author

Thanks for the positive feedback, @LouisLandry. I can't say I noticed any performance problems, but running this on a dedicated machine with 8GB of RAM probably isn't a standard configuration.

I thought SimpleXMLElement was the preferred method for XML handling after JSimpleXML was deprecated. If there's a more optimal technique, let me know. The XML strings can grow rather unwieldy (maybe 100 KB) since some queries list several photos/albums at once and each photo/album contains 10-12 long URLs in addition to lots of other nodes.

@LouisLandry
Contributor

I tend to use a hybrid SimpleXMLElement and XMLReader technique when I have to deal with potentially large XML strings. You can see a pretty good writeup on the issue at http://posterous.richardcunningham.co.uk/using-a-hybrid-of-xmlreader-and-simplexml. You can see how I did this for my JFeedParser class at https://github.com/LouisLandry/joomla-platform/blob/feed/libraries/joomla/feed/parser.php.

The principle is fairly simple: you use XMLReader to stream-parse over the larger XML string/file and only use SimpleXMLElement for the sub-elements where you really need that fine-grained control and flexibility. If you are dealing with a long list of "entities", you stream-parse the list and use SimpleXMLElement for processing each "entity" in the list so that you don't end up with massive memory usage.
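The hybrid technique Louis describes can be sketched in Python terms, using ElementTree's iterparse as a stand-in for XMLReader's streaming cursor (a rough analogy only; the function name iter_entities is illustrative, not part of any real API):

```python
import io
import xml.etree.ElementTree as ET

def iter_entities(source):
    """Stream-parse <entity> elements so only one subtree is held at a time."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entity":
            # Fine-grained access to the small subtree, analogous to
            # handing one element off to SimpleXMLElement in PHP.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # free the subtree we just consumed

xml = (b"<entities>"
       b"<entity><entity_id>1</entity_id><code>A1</code></entity>"
       b"<entity><entity_id>2</entity_id><code>B2</code></entity>"
       b"</entities>")
ids = [e["entity_id"] for e in iter_entities(io.BytesIO(xml))]
print(ids)  # ['1', '2']
```

The key point is the same in both languages: the parser never materializes the whole document tree, only the current entity.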

@aaronschmitz
Contributor Author

  1. Ok - it worked fine with quoted keys so I switched to standard json encoding.
  2. Louis, that article is talking about 100,000 - 1,500,000 records in files from 100MB to 1.6GB. Do you know if this performance problem occurs for significantly fewer records? (As I recall, albums are capped at 2000 photos, which run about 4 KB per record, i.e. roughly an 8 MB cap.)

@LouisLandry
Contributor

The memory usage curve is essentially the same for smaller XML strings as it is for larger ones. Obviously it isn't as impactful for smaller ones, but the concern is the same. I created a very simple test script and 3 sample XML files (trivial elements) with record counts of 200, 2000 and 20,000 to test memory usage. Something like:

<entities>
    <entity>
        <entity_id>1</entity_id>
        <code>AJV11QPL5JS</code>
        <email>Integer.aliquam.adipiscing@Suspendissesagittis.com</email>
        <description>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Curabitur sed tortor. Integer aliquam adipiscing lacus. Ut nec urna et arcu imperdiet ullamcorper. Duis at lacus. Quisque purus sapien, gravida non, sollicitudin a, malesuada id, erat. Etiam vestibulum massa rutrum magna. Cras</description>
        <color>violet</color>
    </entity>
    ...
</entities>
  • What I found was that with no XML processing the memory usage peaked at 768KB.
  • Processing 200 entities memory usage peaked at 768KB as well.
  • Processing 2000 entities memory usage peaked at 1.75MB.
  • Processing 20000 entities memory usage peaked at 8.5MB.

Now the interesting thing about this is that there was no real depth to my XML. If the XML starts adding extra levels of depth at the entity level then this memory footprint starts to get larger pretty quickly. I'd say that we aren't talking about a deal breaker, but Joomla is supposed to be built for the least common denominator from a hardware requirements perspective... at least as much as is possible. That would indicate that it is something to consider.

@aaronschmitz
Contributor Author

Sorry for the delay - school starts this week so I've been a bit busy.

Looking at the docs for XMLReader, it appears as though the main reason XMLReader is more efficient is that it reads on demand from the document stream rather than loading the raw file into memory (not 100% sure on this).

To make XMLReader a viable solution it appears as though I would need to make some considerable changes to the classes. While for your feeds application everything is a simple GET request as far as I can tell, for Picasa everything is authenticated via OAuth with some fun POST requests and fancy HTTP headers.

From what I can tell, the HTTP headers can be moved to query parameters, and the two queries prone to large results (list photos and list albums) are both GET, so technically it would be feasible to switch to using XMLReader. Unfortunately, it would require substantial changes to how the code is currently implemented, because right now all IO is handled by JHTTP through the JOauthV2Client class.

Perhaps long term it would be good to have explicit XML and JSON handling available in the OAuth class, but it's not a quick addition. Moreover, my Picasa class is poorly designed in the sense that at present it just returns all list data in a big array of objects (i.e. no more efficient than loading the whole string at once). Perhaps I could make available an option to use an iterator to loop through and return each object for processing rather than storing them all at once.
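The iterator option being floated can be sketched as follows (a hypothetical Python sketch only: PicasaClient, list_photos_all and list_photos_iter are illustrative names, not the real JGoogle API, and the feed here is a local byte string rather than an authenticated HTTP response):

```python
import io
import xml.etree.ElementTree as ET

class PicasaClient:
    def __init__(self, feed_xml: bytes):
        self._feed = feed_xml  # in the real class this would arrive via OAuth'd HTTP

    def list_photos_all(self):
        # Current design: materialize every entry in one big list.
        return list(self.list_photos_iter())

    def list_photos_iter(self):
        # Proposed design: yield entries lazily, one in memory at a time.
        for _, elem in ET.iterparse(io.BytesIO(self._feed), events=("end",)):
            if elem.tag == "entry":
                yield {child.tag: child.text for child in elem}
                elem.clear()

feed = b"<feed><entry><title>a</title></entry><entry><title>b</title></entry></feed>"
titles = [p["title"] for p in PicasaClient(feed).list_photos_iter()]
print(titles)  # ['a', 'b']
```

Callers that only loop over the results could switch to the lazy form with no other changes, while the eager method stays available for code that genuinely needs the whole list.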

Do you think these changes would be worthwhile? It seems like it would be a lot of work which isn't exactly well-timed considering the heavy course load I have this semester (read: it won't get done quickly).

TL;DR: To really improve the efficiency would take a lot of work. Is it worth it?

@LouisLandry
Contributor

OK, that's fair. I'd rather err on the side of getting this merged than worry too much about it, but I think we will want to have a go at optimizing that at some point. Your use of JResponse is the only thing giving me pause here then. Is there a way we can just use JApplication::redirect() instead? JApplicationWeb doesn't rely on JResponse but encapsulates that logic in its own header-based methods.

@elinw
Contributor

elinw commented Sep 12, 2012

@LouisLandry Do you want JOAuth instead of JOauth?

@aaronschmitz
Contributor Author

Ok @LouisLandry. Now using $application->redirect(). That took more work than I expected.

@aaronschmitz aaronschmitz mentioned this pull request Oct 4, 2012
@aaronschmitz
Contributor Author

Ok, @LouisLandry, I renamed it as requested.

@LouisLandry
Contributor

OK, I'm pretty much ready to get this merged into 12.3. There are two things I'd like done before we do that ... hopefully you have a little time @aaronschmitz. The first is that I need you to change all of the 12.2 to 12.3 obviously. Secondly, I would really love it if you could squash some of these commits. Not all, but some. Whatever seems logical to you. It feels like 116 commits is an awful lot of "cruft" for this. I know it is a ton of work, not questioning that, but it would just be cleaner that way.

I'm going to close the request for now. It's been marked for 12.3 inclusion so I'm obviously not rejecting it. Once you've got the @since tags sorted out (and hopefully a little commit squashing) please re-open the pull request and we'll get it merged. Thanks a bunch for all the work that went into this. Totally looking forward to using it.

@LouisLandry LouisLandry closed this Oct 9, 2012
@aaronschmitz
Contributor Author

Well, I changed the tags and squashed it to 28 commits. Not sure how to proceed...

@LouisLandry LouisLandry reopened this Oct 10, 2012
@LouisLandry
Contributor

I've reopened it. Once I get a positive result from the pull tester I'll merge it in. Thanks!

@LouisLandry
Contributor

I rebased and merged this manually. Thanks a bunch Aaron!
