filter data with geq, neq and not in #24

Open
NielsBosNL opened this issue Aug 28, 2019 · 3 comments
NielsBosNL commented Aug 28, 2019

Hi Edwin,
very nice package.
In your examples you show how to filter on a specific variable-value:
cbs_get_data(id="03759ned", Perioden=c("2013JJ00","2014JJ00"), Geslacht="T001038")

However, in this large table of 45 million records I want to filter on, e.g.,
Perioden > "1990JJ00",
Geslacht != "T001038",
! Leeftijd %in% c(10000, 60100,60200,60300,60400,60500,60600,60700,60800,60900,21900)

Is that possible?
What would be the correct syntax for "not equal", "greater than" or "not in"?

Or filter on substr(RegioS,1,2) == "GM" to select just municipalities :-)

edwindj (Owner) commented May 27, 2020

This syntax is currently not supported, sorry!
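
As a stopgap, the conditions from the question can be applied locally once a table (or a slice of it narrowed with the supported equality selections) has been downloaded. The sketch below uses a small made-up data frame in place of real CBS data; the column names follow the question, and the rows and the shortened `excluded_ages` vector are purely illustrative.

```r
# Toy stand-in for a downloaded table; the rows are invented for illustration.
population <- data.frame(
  Perioden = c("1989JJ00", "1995JJ00", "2000JJ00", "2010JJ00", "2014JJ00"),
  Geslacht = c("3000", "T001038", "4000", "3000", "4000"),
  Leeftijd = c("10100", "10200", "10000", "10300", "10400"),
  RegioS   = c("GM0363", "GM0363", "NL01", "PV27", "GM0599"),
  stringsAsFactors = FALSE
)

# Shortened, illustrative version of the age codes to exclude.
excluded_ages <- c("10000", "60100", "21900")

keep <- population$Perioden > "1990JJ00" &          # "greater than" (string comparison)
  population$Geslacht != "T001038" &                # "not equal"
  !(population$Leeftijd %in% excluded_ages) &       # "not in"
  startsWith(population$RegioS, "GM")               # municipality codes only

filtered <- population[keep, ]
print(filtered)
```

This filters after the download, so it does not reduce the amount of data transferred; for a table of this size it only helps when the request is already narrowed by other means.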

edwindj changed the title from "filter data" to "filter data with geq, neq and not in" on May 27, 2020

edwindj (Owner) commented Sep 21, 2020

Only a small subset of filters is supported: has_substring matches values that contain a given substring.
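
A minimal sketch of how this could cover the "just municipalities" case from the question, assuming `has_substring()` is passed as a column selection to `cbs_get_data()` in the same way as a plain value. The wrapper name `fetch_municipalities` is hypothetical, and the period is taken from the example in the question. Note that this matches "GM" anywhere in the code, not only at the start.

```r
# Sketch only: assumes cbsodataR is installed and the CBS service is reachable.
# fetch_municipalities() is a hypothetical helper, not part of cbsodataR.
fetch_municipalities <- function(period = "2014JJ00") {
  cbsodataR::cbs_get_data(
    id = "03759ned",
    Perioden = period,                        # exact-match selection
    RegioS = cbsodataR::has_substring("GM")   # region codes containing "GM"
  )
}

# Usage (performs a network request):
# municipalities <- fetch_municipalities("2014JJ00")
```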

lverweijen commented

I tried performing your query using ODataQuery (I'm the author).

It worked on http://beta-odata4.cbs.nl/

library(ODataQuery)

leeftijden <- c(
  "10000", "60100", "60200", "60300", "60400",
  "60500", "60600", "60700", "60800", "60900",
  "21900")

opendata_service <- ODataQuery$new("http://beta-odata4.cbs.nl/")
observations_path <- opendata_service$path('CBS', '03759ned', "Observations")
observations_query <-
  observations_path$filter(to_odata(Perioden > "1990JJ00"
                                    && Geslacht != "T001038"
                                    && Leeftijd %in% !!leeftijden))

print(observations_query$url)  
observations_df <- observations_query$all()
head(observations_df)

http://beta-odata4.cbs.nl/CBS/03759ned/Observations?$filter=(Perioden%20gt%20'1990JJ00'%20and%20Geslacht%20ne%20'T001038'%20and%20Leeftijd%20in%20('10000','60100','60200','60300','60400','60500','60600','60700','60800','60900','21900'))

        Id Measure ValueAttribute   Value StringValue BurgerlijkeStaat Geslacht Leeftijd RegioS Perioden
1 30690548 M000352           None 7419501          NA          T001019     3000    10000   NL01 1991JJ00
2 30690549 M000352           None 7480422          NA          T001019     3000    10000   NL01 1992JJ00
3 30690550 M000352           None 7535268          NA          T001019     3000    10000   NL01 1993JJ00
4 30690551 M000352           None 7585887          NA          T001019     3000    10000   NL01 1994JJ00
5 30690552 M000352           None 7627482          NA          T001019     3000    10000   NL01 1995JJ00
6 30690553 M000365           None 7644886          NA          T001019     3000    10000   NL01 1995JJ00
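
The query above selects rows whose Leeftijd is in the list, whereas the original question asked for "not in". OData v4 has a `not` operator, so one way to express the negation is to build the `$filter` string by hand, as in the base-R sketch below. The helper names (`odata_quote`, `odata_in`, `odata_not`) are hypothetical and not part of ODataQuery or cbsodataR, the age list is shortened, and whether the beta service accepts this exact expression has not been verified here.

```r
# Quote values as OData string literals, doubling any embedded single quotes.
odata_quote <- function(values) {
  paste0("'", gsub("'", "''", values, fixed = TRUE), "'", collapse = ",")
}

# Build "<field> in ('a','b',...)".
odata_in <- function(field, values) {
  sprintf("%s in (%s)", field, odata_quote(values))
}

# Negate an expression with the OData `not` operator.
odata_not <- function(expr) {
  sprintf("not (%s)", expr)
}

filter <- paste(
  "Perioden gt '1990JJ00'",
  "Geslacht ne 'T001038'",
  odata_not(odata_in("Leeftijd", c("10000", "60100"))),
  sep = " and "
)

# Percent-encode the spaces; the base URL is the beta endpoint used above.
url <- paste0(
  "http://beta-odata4.cbs.nl/CBS/03759ned/Observations?$filter=",
  utils::URLencode(filter)
)

print(filter)
print(url)
```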

Unfortunately, it didn't work on the stable ODataService:

http://opendata.cbs.nl/ODataApi/odata/03759ned/TypedDataSet?$filter=(Perioden%20gt%20'1990JJ00'%20and%20Geslacht%20ne%20'T001038'%20and%20Leeftijd%20in%20('10000','60100','60200','60300','60400','60500','60600','60700','60800','60900','21900'))
Error getting TypedDataSet for '03759ned': Object reference not set to an instance of an object.
