Is speed an issue? #18
OK, I've done some quick-and-dirty profiling, and I think the problem lies in the line in … . Creating a query that returns more records (~1'400) makes the discrepancy worse:
I'd be interested in trying to improve this. Is there a faster XML library? Does one really need to convert the whole XML to a list? Would JSON be quicker?
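On the question of whether the whole XML needs to become a list first: a minimal sketch, assuming the xml2 package (the one the maintainer mentions further down) and a simplified, namespace-free stand-in for a query response. The element names and XPath expressions are illustrative only; a real Salesforce response has namespaces and extra wrapper elements, so they would need adjusting.

```r
library(xml2)

# Hypothetical, simplified stand-in for a query response. The real
# Salesforce payload has namespaces and more nesting.
resp <- '<queryResult>
  <records><Id>001A</Id><Name>Acme</Name></records>
  <records><Id>001B</Id><Name>Globex</Name></records>
</queryResult>'

doc  <- read_xml(resp)
recs <- xml_find_all(doc, "//records")

# Pull each field as a character vector with one XPath lookup per column,
# instead of turning every record node into a nested R list first.
# Missing fields come back as NA.
field <- function(name) xml_text(xml_find_first(recs, paste0("./", name)))

df <- data.frame(Id = field("Id"), Name = field("Name"),
                 stringsAsFactors = FALSE)
print(df)
```

Each column costs one vectorised lookup over all records; whether this is actually faster for RForcecom's real responses would need benchmarking.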
More updates:
OK, I've done some further benchmarking, and it seems that we can get some good speed increases if we use … . @hiratake55, I think refactoring the whole library in one shot will be a challenge. What is your appetite for doing this one function at a time? It would mean requiring both XML libraries and both curl libraries (as some functions are likely to use one, and others the other).
Hi @ax42, thank you for contacting me. I'll check whether the xml2 package fits RForcecom.
@ax42 Have you considered using the package's Bulk API features? Some timings: with roughly 650K records, I've pulled 50K in a second or two. Salesforce caches queries, so they become faster if you repeat them. Note: rforcecom.bulkQuery is a convenience wrapper I've written around …
That's cool, thanks. I hadn't seen the Bulk API pieces because I'm using v0.7 (from CRAN). One thing the bulk query does not seem able to do is deal with foreign keys (e.g. fetching the details of an Account's owner). Salesforce returns: … So it's probably really useful in some situations (straight dumps) but not in others (complex queries), although it may be faster to pull straight dumps from Salesforce and combine them in R than to run complex queries.
Yes, for large joins I would recommend pulling straight dumps of each object and joining them in R (I use the …). If you want to experiment with some of the Bulk functions (they're not on CRAN yet), you can install the package from the maintainer's GitHub repository or from mine.
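For the join-in-R approach, a minimal sketch using base R's merge(); the package the poster uses is not named above, so merge() is a stand-in. The two data frames are hypothetical straight dumps of Account and User, joined the way a SOQL lookup on the Account owner would be.

```r
# Hypothetical straight dumps of two objects (e.g. from two bulk queries).
accounts <- data.frame(
  Id      = c("001A", "001B", "001C"),
  Name    = c("Acme", "Globex", "Initech"),
  OwnerId = c("005X", "005Y", "005X"),
  stringsAsFactors = FALSE
)
users <- data.frame(
  Id   = c("005X", "005Y"),
  Name = c("Alice", "Bob"),
  stringsAsFactors = FALSE
)

# Roughly what selecting Owner.Name does in SOQL: a left join from
# Account.OwnerId to User.Id. The clashing "Name" column from users
# becomes "Name.Owner"; the Account column keeps its name.
joined <- merge(accounts, users,
                by.x = "OwnerId", by.y = "Id",
                all.x = TRUE, suffixes = c("", ".Owner"))

res <- joined[order(joined$Id), c("Id", "Name", "Name.Owner")]
print(res)
```

The trade-off discussed in the thread still applies: you are responsible for pulling the dumps at a consistent point in time.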
I've installed the GitHub version from the maintainer and copy/pasted the code you kindly provided. I'll try to get some benchmarks done over the next few days (although our Salesforce instance seems to be a lot smaller than yours). My workflow so far has been to use http://dataloader.io to help me formulate my queries, and then to run them with … . The advantage of letting Salesforce do the joins is that you don't have to worry about consistency, and you always get your dataset back exactly as you want it (especially if you are calling a bunch of lookup fields in a query). Each approach has its place, and having both available in the library is great!
Any update on this? My queries from SFDC are still extremely slow.
Hi
RForcecom feels slow: it takes about 4 minutes to execute a query that returns about 18'500 records with 9 variables.
I've not started tracing through this to figure out where the issue could be (SF? Network? XML vs JSON?). What's the best way to start digging into this issue?
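One way to start is to separate local parsing cost from network and Salesforce time. A minimal sketch, assuming the xml2 package and a synthetic, namespace-free response of the same size as the query above (18'500 records, 9 fields; the element names are hypothetical): if parsing alone is fast, the bottleneck is more likely on the network or Salesforce side. Wrapping the real query call in system.time() or Rprof() then gives a total to compare against.

```r
library(xml2)

# Synthetic stand-in for a query response: 18,500 records x 9 fields.
# Element names (queryResult, records, F1..F9) are hypothetical.
n      <- 18500
fields <- paste0("F", 1:9)
rec    <- paste0("<records>",
                 paste0("<", fields, ">v</", fields, ">", collapse = ""),
                 "</records>")
xml_str <- paste0("<queryResult>", strrep(rec, n), "</queryResult>")

# Parse the response into a data frame, one XPath lookup per field.
parse_records <- function(s, fields) {
  doc  <- read_xml(s)
  recs <- xml_find_all(doc, "//records")
  cols <- lapply(fields, function(f) {
    xml_text(xml_find_first(recs, paste0("./", f)))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Time only the local parsing step; no network call is involved.
timing <- system.time(df <- parse_records(xml_str, fields))
print(timing[["elapsed"]])
```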