Fewer than total number of rows returned #21

Closed
dmfenton opened this issue May 18, 2015 · 32 comments

@dmfenton
Contributor

  1. http://koop.dc.esri.com/socrata/seattle/3k2p-39jp.csv
    -> 1000 rows
  2. curl -XGET 'https://data.seattle.gov/resource/3k2p-39jp.json?$select=count(*)'
[ {
  "count_cad_cdw_id" : "1248159"
}
 ]

cc @pholleran
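
The comparison above (rows served versus rows the portal reports) can be sketched in a few lines. This is only an illustration, not koop-socrata code: the function names are made up here, and the only things taken from the thread are the `$select=count(*)` query and the shape of its response.

```python
import json
from urllib.parse import urlencode


def count_url(domain, dataset_id):
    """Build a SODA URL that asks for the total row count of a dataset."""
    # Keep "$", "(", ")" and "*" unescaped so the URL matches the curl call above.
    query = urlencode({"$select": "count(*)"}, safe="$()*")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def parse_count(body):
    """Extract the count from a SODA count response.

    The key name varies by dataset (here it was "count_cad_cdw_id"),
    so take the only value in the first row instead of hard-coding it.
    """
    rows = json.loads(body)
    return int(next(iter(rows[0].values())))


def missing_rows(expected, served):
    """Number of rows the cached export is short by (never negative)."""
    return max(expected - served, 0)
```

With the numbers from this report, `missing_rows(1248159, 1000)` shows how far the 1000-row CSV is from the full dataset.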

@dmfenton
Contributor Author

Hmm works locally:
http://koop.dc.esri.com/status does not list the koop-socrata version, so I can't tell whether the provider is out of date.

@dmfenton
Contributor Author

Oddly:

wc -l 3k2p-39jp.csv 
189001 3k2p-39jp.csv

@chelm
Contributor

chelm commented May 18, 2015

@dmfenton koop.dc is running 0.1.2 of koop-socrata. I'll open a PR to make the version show up in the status.

@dmfenton
Contributor Author

Anddd, even after the file is served my node process seems to get permanently pegged. Lots of postgres action too. I guess that's the geohash indexing on Opendata-koop?

@dmfenton
Contributor Author

Now it's just 504s at http://koop.dc.esri.com/socrata/seattle/3k2p-39jp.csv

@chelm
Contributor

chelm commented May 18, 2015

@dmfenton geohash is very low impact, I doubt that has any impact here...

@chelm
Contributor

chelm commented May 19, 2015

I've implemented a new way to page large datasets like this. However, we are not using externally managed queues yet, so the paging state lives only in memory and paging over Socrata data is not very stable. If a koop process dies (because of an issue with another provider, or for any other reason), the paging state is lost and the dataset stays stuck in a processing state.

This means that in order to actually use this provider in production, we'll need to use a request worker strategy that is more durable and persistent.
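
One way to picture the "durable" part is a checkpoint that survives a process restart. The sketch below is an assumption-heavy illustration, not what koop does: `PagingCheckpoint` and the JSON file it writes are hypothetical, and a real deployment would more likely use an external queue or the database.

```python
import json
import math
import os


def page_offsets(total_rows, page_size):
    """Return the offset of every page needed to cover total_rows."""
    return [i * page_size for i in range(math.ceil(total_rows / page_size))]


class PagingCheckpoint:
    """Hypothetical on-disk record of which pages have been cached."""

    def __init__(self, path):
        self.path = path

    def done(self):
        """Offsets already recorded as finished (empty if no file yet)."""
        if not os.path.exists(self.path):
            return set()
        with open(self.path) as fh:
            return set(json.load(fh))

    def mark_done(self, offset):
        """Record one finished page, writing via a temp file and a rename."""
        finished = self.done() | {offset}
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(sorted(finished), fh)
        os.replace(tmp, self.path)  # swap in the new file in one step

    def remaining(self, total_rows, page_size):
        """Offsets still to fetch, e.g. after a restart."""
        finished = self.done()
        return [o for o in page_offsets(total_rows, page_size) if o not in finished]
```

After a crash, a new process can call `remaining(...)` and pick up where the old one stopped instead of leaving the dataset in a processing state.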

@dmfenton
Contributor Author

Is there a way to abstract the process so that it's easier for other providers to tap in to worker resources?

@chelm
Contributor

chelm commented May 19, 2015

@dmfenton potentially

@chelm
Contributor

chelm commented May 19, 2015

We could create a centralized request worker that would take an array of page URLs, make the requests, and insert the data into tables without knowing what specific type of work it is doing.
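
A rough sketch of that idea, assuming the provider hands the worker two callables. Everything here (`run_page_jobs`, `fetch`, `insert`, `max_attempts`) is a hypothetical name for illustration and not an existing koop API.

```python
def run_page_jobs(page_urls, fetch, insert, max_attempts=3):
    """Fetch each page URL and hand the rows to insert().

    fetch(url) -> list of rows and insert(rows) -> None are supplied by the
    provider, so this loop knows nothing about Socrata, AGOL, or any schema.
    Returns the URLs that still failed after max_attempts tries.
    """
    failed = []
    for url in page_urls:
        for attempt in range(1, max_attempts + 1):
            try:
                rows = fetch(url)
            except Exception:
                # Give up on this URL only after the last allowed attempt.
                if attempt == max_attempts:
                    failed.append(url)
                continue
            insert(rows)
            break
    return failed
```

Because the worker only sees URLs and callables, any provider could reuse it; the returned list of failed URLs is what a durable queue would need to retry later.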

@chelm
Contributor

chelm commented May 19, 2015

I had trouble getting this particular dataset from Socrata to cache locally. Are there rate limits that I am not aware of, @dmfenton?

@dmfenton
Contributor Author

Throttling and Application Tokens

Hold on a second! Before you go storming off to make the next great open data app, you should understand how SODA handles throttling. You can make a certain number of requests without an application token, but they come from a shared pool and you’re eventually going to get cut off.

If you want more requests, register for an application token (http://dev.socrata.com/register) and your application will be granted up to 1000 requests per rolling hour period. If you need even more than that, special exceptions are made by request. Use the Help! tab on the right of this page to file a trouble ticket.


@chelm
Contributor

chelm commented May 19, 2015

so we probably need to add support for sending an app key/token with requests - this would be a config param
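
As a sketch of what that config param could feed into: SODA accepts an application token in the `X-App-Token` request header. The `config` dict and its `"app_token"` key below are assumptions for illustration, not an existing koop-socrata option.

```python
def soda_request_headers(config):
    """Return HTTP headers for a SODA request.

    `config` is a hypothetical provider config dict; "app_token" is an
    assumed key name, not an existing koop-socrata option.
    """
    headers = {"Accept": "application/json"}
    token = config.get("app_token")
    if token:
        # Only send the header when a token is configured; without it,
        # requests fall back to the shared, throttled pool.
        headers["X-App-Token"] = token
    return headers
```

Leaving the token optional keeps anonymous requests working for small datasets while letting production installs opt in to the higher limit.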

@sirws

sirws commented May 28, 2015

@chelm I am having the same issue here. When I used to access this: http://koop.dc.esri.com/socrata/wastate/9ubz-5r4b/FeatureServer/0/query?where=1=1 I would get over 5000 records and now it is limited to 1000. Is there something I need to do to make it get all of the records?

@chelm
Contributor

chelm commented May 28, 2015

ahh so the FeatureServices respect the maxRecordCount. I'm surprised it ever returned more than 1000 features though.

The koop-socrata provider needs to support setting a limit and offset before it goes to the DB to get the data. koop-agol does this https://github.com/Esri/koop-agol/blob/master/controller/index.js#L585-L586
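
A minimal sketch of that limit/offset step, under assumptions: it reads the FeatureServer query parameters `resultRecordCount` and `resultOffset` and caps the limit at a `max_record_count` of 1000. The function name is hypothetical, and this is not a copy of what koop-agol does at the linked lines.

```python
def paging_from_query(query, max_record_count=1000):
    """Turn FeatureServer query parameters into a (limit, offset) pair."""

    def to_int(value, default):
        try:
            return int(value)
        except (TypeError, ValueError):
            return default

    requested = to_int(query.get("resultRecordCount"), max_record_count)
    offset = max(to_int(query.get("resultOffset"), 0), 0)
    # Never hand back more than max_record_count rows in one response.
    limit = min(requested, max_record_count) if requested > 0 else max_record_count
    return limit, offset
```

The resulting pair would be passed to the cache query, so a client can still reach every row by stepping `resultOffset` forward one page at a time.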

@chelm
Contributor

chelm commented May 28, 2015

FYI @sirws I just added this locally and it works great. I'll PR it.

@chelm
Contributor

chelm commented May 28, 2015

These are actually separate issues, @sirws: what you are seeing is an issue getting data out of the FeatureService, and what @dmfenton was seeing was an issue with getting all the data from the Socrata server.

I've made a PR for your issue @sirws #30

@dmfenton's is already fixed.

@sirws

sirws commented May 28, 2015

I implemented the PR and it does not seem to work at all for me.

Thu, 28 May 2015 19:02:48 GMT express deprecated res.send(body, status): Use res.status(status).send(body) instead at node_modules\koop-socrata\controller\index.js:54:17
Application has thrown an uncaught exception and is terminated:
TypeError: Cannot read property 'features' of undefined
at Object.module.exports as bounds
at Object.module.exports.extent (D:\koop-sample-app\node_modules\koop\lib\FeatureServices.js:94:19)
at Object.module.exports.info (D:\koop-sample-app\node_modules\koop\lib\FeatureServices.js:126:65)
at Object.processFeatureServer (D:\koop-sample-app\node_modules\koop\lib\BaseController.js:81:27)
at D:\koop-sample-app\node_modules\koop-socrata\controller\index.js:160:24
at D:\koop-sample-app\node_modules\koop-socrata\models\Socrata.js:153:23
at D:\koop-sample-app\node_modules\koop\lib\Cache.js:11:9
at D:\koop-sample-app\node_modules\koop-pgcache\index.js:563:11
at null.callback (D:\koop-sample-app\node_modules\koop-pgcache\index.js:596:11)
at Query.handleReadyForQuery (D:\koop-sample-app\node_modules\koop-pgcache\node_modules\pg\lib\query.js:80:10)

@chelm
Contributor

chelm commented May 28, 2015

@sirws hmmm how would that happen? What URL are you trying?

@sirws

sirws commented May 28, 2015

Weird, http://geodata.wa.gov/koop/socrata/wa/9ubz-5r4b/FeatureServer/0 seems to be working now. But it is still only returning 1000 records: http://geo.wa.gov/datasets/405e3ffff86b4de48bb4ade6b57c8054_0?filterByExtent=false&uiTab=table

How do I get it to return more records?

@chelm
Contributor

chelm commented May 28, 2015

@sirws I think the service only has 1000 features in the DB... that is odd

http://geodata.wa.gov/koop/socrata/wa/9ubz-5r4b/FeatureServer/0/query?returnCountOnly=true -> 1000
http://koop.dc.esri.com/socrata/wastate/9ubz-5r4b/FeatureServer/0/query?returnCountOnly=true -> 5800

Did you drop the cache?

@sirws

sirws commented May 28, 2015

Ok, I thought I dropped the cache. It is now saying 5830, but the OD app is still saying 1000. Do I need to reindex on the opendata site?

@chelm
Contributor

chelm commented May 28, 2015

@sirws will you try re-indexing it?

@sirws

sirws commented May 28, 2015

I kicked off the re-index. It could take a while.

@dmfenton
Contributor Author

FYI @sirws, you can reindex individual datasets. But indeed this could take a while.

@sirws

sirws commented May 28, 2015

But there should be 5830

@dmfenton
Contributor Author

http://geodata.wa.gov/koop/socrata/wa/9ubz-5r4b
Count at the bottom says 1148. So that's interesting.

@sirws

sirws commented May 28, 2015

Something goofy is going on with my indexes. I check them and they show 5830, then 1148. Not sure what is going on. I dropped them again and it is now showing 5830. I can download all of the records, but the OD app says there are only 1000.

http://geo.wa.gov/datasets/405e3ffff86b4de48bb4ade6b57c8054_0?filterByExtent=false&uiTab=table

@dmfenton
Contributor Author

Well the OD app is not going to update until reharvest. And we’re having some delays with that right now.

@sirws

sirws commented May 28, 2015

Ok. I will let that go for a while then. Will check later on.

@dmfenton
Contributor Author

I'm going to close this. Locally I get the correct results and the original bug has been fixed.
