Limits for geocodePL_get #41

Closed
BERENZ opened this issue Oct 28, 2020 · 12 comments

@BERENZ
Contributor

BERENZ commented Oct 28, 2020

Is there a limit on the number of queries, or a required interval between them, when geocoding with geocodePL_get? I tried to find this information on the GUGiK website, but without success.

@kadyb
Owner

kadyb commented Oct 28, 2020

I have not found such information anywhere either. There are probably no such restrictions. However, in my experience, GUGiK's servers and services are problematic.
I think the safe solution is to set some interval (maybe 1 s?) between requests.

BTW: At this moment the geocodePL_get() function needs some output improvements (#11).

@BERENZ
Contributor Author

BERENZ commented Oct 28, 2020

OK, I understand. Maybe you could contact GUGiK's staff to ask about the limits?

BTW, could geocodePL_get() return an sf object instead of a list? That would be very useful for speeding up processing and merging with other data.

@kadyb
Owner

kadyb commented Oct 28, 2020

OK, I will write a message asking if there are limits on the number of requests and the time between them.

Yes. This is a very good idea. There is definitely room for improvement. Currently we don't have time to do it, but I will definitely keep it in mind in the future.

Edit: I have sent the email.

@BERENZ
Contributor Author

BERENZ commented Oct 28, 2020

OK, so here is a small proposal that combines the results of geocodePL_get into an sf object.

library(rgugik)
library(sf)

output <- geocodePL_get(address = "Marki")

if (sapply(output, length)[1] == 1) {
  # single result: `output` is a flat list of attributes
  df <- as.data.frame(do.call(cbind, output), stringsAsFactors = FALSE)
  df$geometry_wkt <- NULL
  df <- st_as_sf(x = df, coords = c("x", "y"), crs = 2180)
} else {
  # multiple results: `output` is a list of such lists, one per match
  df <- lapply(output, FUN = function(x) as.data.frame(do.call(cbind, x), stringsAsFactors = FALSE))
  df <- do.call(rbind, df)
  df$geometry_wkt <- NULL
  df <- st_as_sf(x = df, coords = c("x", "y"), crs = 2180)
}

Here's how it works:

  1. For multiple results
> output <- geocodePL_get(address = "Marki") ## list of 10
> df[,1:5]

Simple feature collection with 10 features and 5 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 469003.1 ymin: 193553.4 xmax: 710402 ymax: 631605
CRS:            EPSG:2180
    city  teryt    simc  voivodeship            county                  geometry
1  Marki 100103 0538774      łódzkie      bełchatowski POINT (523435.6 398347.3)
2  Marki 120702 0960993  małopolskie powiat limanowski POINT (576498.3 199686.8)
3  Marki 120709 0453724  małopolskie powiat limanowski POINT (583279.3 195401.8)
4  Marki 120711 0467212  małopolskie powiat limanowski POINT (597554.8 204842.4)
5  Marki 121508 0994934  małopolskie             suski   POINT (537196 193553.4)
6  Marki 143402 0920901  mazowieckie        wołomiński POINT (644467.9 498763.1)
7  Marki 160804 0143432     opolskie     powiat oleski POINT (469003.1 358536.7)
8  Marki 160804 0143366     opolskie            oleski POINT (469243.6 358790.1)
9  Marki 182001 0787721 podkarpackie      tarnobrzeski     POINT (685860 289770)
10 Marki 200602 0397167    podlaskie         kolneński     POINT (710402 631605)
  2. For one result
> output <- geocodePL_get(address = "Marki, Andersa")
> df[,1:5]

Simple feature collection with 1 feature and 5 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 643949.4 ymin: 499656.9 xmax: 643949.4 ymax: 499656.9
CRS:            EPSG:2180
   street  teryt    simc  ulic  city                  geometry
1 Andersa 143402 0920901 00285 Marki POINT (643949.4 499656.9)
  3. Also works for other objects listed in the documentation
> output <- geocodePL_get(rail_crossing = "001 018 478")
> df[,1:5]

Simple feature collection with 1 feature and 4 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 620704.5 ymin: 478258.4 xmax: 620704.5 ymax: 478258.4
CRS:            EPSG:2180
          operator category            phone    mobile phone                  geometry
1 PKP PLK WARSZAWA        A +48 22 473 37 34 +48 600 084 183 POINT (620704.5 478258.4)

EDIT: If you like this proposal, I can prepare a PR updating geocodePL_get.R and test-geocodePL_get.R.

EDIT2: I don't know how to use the geometry_wkt element, which holds the geometry as WKT text. Using it may be a better idea than coords = c("x", "y").

@kadyb
Owner

kadyb commented Oct 29, 2020

I looked at your code (but I didn't test it). Maybe we can simplify it?

output = geocodePL_get(address = "Marki")
df_output = do.call(rbind.data.frame, output)
# use "geometry_wkt"
df_output = sf::st_as_sf(df_output, wkt = "geometry_wkt", crs = 2180)

Also, in geocodePL_get.R, we can remove

if (length(output) == 1) {
  output = output[[1]]
}

so that a nested list is always returned. Then we can drop the length condition (in your code) and just use rbind.data.frame.

A question: what if any column (attribute) is empty (NULL)? Will the function even work?
The next point is that we should select only the relevant columns at the end (#11).
One more thing: there will probably be some duplicated code, so we should create a helper function.
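
As a rough illustration of the NULL question, here is a minimal sketch of such a helper in base R. The name `results_to_df` and the mock records are hypothetical (they are not part of rgugik), and the sketch assumes each record is a named list of scalar attributes.

```r
# Hypothetical helper: turn a nested list of records into one data frame,
# replacing NULL (or absent) attributes with NA so all rows share columns.
results_to_df = function(results) {
  # collect every attribute name that occurs in any record
  cols = unique(unlist(lapply(results, names)))
  rows = lapply(results, function(rec) {
    # rec[cols] yields NULL for attributes that are NULL or missing
    vals = lapply(rec[cols], function(v) if (is.null(v)) NA else v)
    names(vals) = cols
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# mock records for illustration only; the second has a NULL attribute
mock = list(
  list(city = "Marki", teryt = "143402", x = 644467.9, y = 498763.1),
  list(city = "Marki", teryt = NULL, x = 710402, y = 631605)
)
df = results_to_df(mock)
```

With this approach a NULL attribute becomes NA in the resulting row instead of silently shortening the record, which is what would otherwise break the row binding.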

@BERENZ
Contributor Author

BERENZ commented Oct 29, 2020

If you simplify it that way, a query that returns only one result gives incorrect output, see below:

> output <- geocodePL_get(address = "Marki, Andersa")
> df_output <- do.call(rbind.data.frame, output)
> df_output
1 Andersa
2 143402
3 0920901
4 00285
5 Marki
6 643949.3987
7 499656.945800001
8 LINESTRING(643691.7537 499759.7709,643714.492 499753.1515,643768.427 499731.363399999,643801.4207 499717.6074,643827.3306 499706.843599999,643949.3987 499656.945800001,644044.1973 499614.359099999,644077.5194 499600.2992,644169.6761 499559.555500001,644200.1808 499546.196699999,644271.0002 499515.1812,644276.6037 499513.287)
9 1
10 1
11 {Marki,143402}
> df_output = sf::st_as_sf(df_output, wkt = "geometry_wkt", crs = 2180)
Error in `[[<-.data.frame`(`*tmp*`, wkt, value = list()) : 
  replacement has 0 rows, data has 11

As for NULL results, maybe they could be checked before applying these lines?

EDIT: I noticed that geocodePL_get(rail_crossing = "001 018 478") will give results without geometry_wkt so we cannot use wkt = "geometry_wkt" in sf::st_as_sf.

@kadyb
Owner

kadyb commented Oct 29, 2020

I think we should remove

if (length(output) == 1) {
  output = output[[1]]
}

in the source code and then use rbind.data.frame, because the output will then always be a nested list.
But I could be wrong.
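
The idea can be sketched with mock data, without calling the service. The helpers `as_nested` and `to_df` below are hypothetical, and the sketch assumes that a single result arrives as a flat list of scalar attributes while multiple results arrive as a list of such lists. For the binding step it uses lapply() with as.data.frame() followed by rbind, which is a swapped-in variant rather than the exact rbind.data.frame call discussed above.

```r
# Hypothetical helper: make sure the results are always a list of records.
# A flat record is detected by its first element not being a list.
as_nested = function(output) {
  if (length(output) > 0 && !is.list(output[[1]])) list(output) else output
}

# Hypothetical helper: one row per record, whatever shape came in.
to_df = function(output) {
  records = as_nested(output)
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# mock inputs for illustration only
single = list(street = "Andersa", teryt = "143402", city = "Marki")
many = list(
  list(street = NA, teryt = "100103", city = "Marki"),
  list(street = NA, teryt = "143402", city = "Marki")
)

df_single = to_df(single)  # one row, three columns
df_many = to_df(many)      # two rows, three columns
```

Once the shape is normalized like this, the same code path handles one result and many results, so the length condition is no longer needed.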

@kadyb
Owner

kadyb commented Oct 29, 2020

You can check for NULLs after

output = jsonlite::fromJSON(prepared_URL)[["results"]]

@kadyb
Owner

kadyb commented Oct 29, 2020

EDIT: I noticed that geocodePL_get(rail_crossing = "001 018 478") will give results without geometry_wkt so we cannot use wkt = "geometry_wkt" in sf::st_as_sf.

There probably is a geometry_wkt attribute; we just don't return it in the output currently.

rgugik/R/geocodePL_get.R

Lines 69 to 71 in 5e01945

# remove unnecessary attributes
sel = c("operator", "category", "phone", "mobile phone", "x", "y")
output = output[[1]][sel]

@BERENZ
Contributor Author

BERENZ commented Oct 29, 2020

OK, I will come back with some improvements by the end of this week.

@kadyb
Owner

kadyb commented Oct 30, 2020

Response from GUGiK (translated from Polish):

In response to your question, please be informed that the service has an IP address blocking mechanism, which is triggered when a massive number of queries is sent to the service's source server. This restriction is intended to protect the service at the application level against an excessive number of queries sent by a user, in particular DDoS attacks.

If such a block is imposed on your IP address, you should follow the instructions in the displayed message.

So we don't know what the limit is, but I think we can assume that there should be a 1 second delay between requests. If the limit is exceeded, the function will stop working (fromJSON() will throw an error).
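
A minimal sketch of that approach in base R is shown below. The wrapper `geocode_throttled` and its arguments are hypothetical and not part of rgugik, and the 1 second default is only the assumption discussed above, not a documented limit. It stops at the first failed request, so a blocked IP is not hit with further queries.

```r
# Hypothetical wrapper: call `geocode_fun` once per address, waiting `delay`
# seconds between requests and stopping at the first error.
geocode_throttled = function(addresses, geocode_fun, delay = 1) {
  results = vector("list", length(addresses))
  names(results) = addresses
  for (i in seq_along(addresses)) {
    if (i > 1) Sys.sleep(delay)  # pause between consecutive requests
    res = tryCatch(geocode_fun(addresses[i]), error = function(e) e)
    if (inherits(res, "error")) {
      message("Request failed for '", addresses[i], "': ", conditionMessage(res))
      break  # remaining addresses are left as NULL
    }
    if (!is.null(res)) results[[i]] = res
  }
  results
}

# Possible usage (not run here, needs rgugik and network access):
# res = geocode_throttled(c("Marki", "Marki, Andersa"),
#                         function(a) rgugik::geocodePL_get(address = a))
```

Addresses that failed or were never attempted stay NULL in the returned list, so they can be retried later.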

@kadyb
Owner

kadyb commented Oct 31, 2020

Fixed in #43.

@kadyb kadyb closed this as completed Oct 31, 2020