Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vignettes #31

Closed
wants to merge 39 commits into from
Closed

Vignettes #31

wants to merge 39 commits into from

Conversation

jannes-m
Copy link
Collaborator

Hey @paleolimbot,
I should have started writing the vignettes in another branch right from the beginning instead of editing the master. So here we go. This is still work in progress but since it appears that the multilayer input bug was fixed (#25, qgis/QGIS#40287), I will resume working on the vignettes in the coming weeks.

@jannes-m jannes-m marked this pull request as draft December 15, 2020 09:39
@florisvdh
Copy link
Member

@jannes-m perhaps you can give an update about the status and your further plans for this draft PR? Should it be reviewed already?

@jannes-m
Copy link
Collaborator Author

It's been a while so I guess before reviewing I would have to update the vignette so that it reflects the many changes in the code base, great work, by the way 😊👍. See also my reply to #124.

@jannes-m
Copy link
Collaborator Author

jannes-m commented Mar 20, 2023

@florisvdh, could you pls have a look at the qgisprocess.Rmd vignette and let me know if there is anything which we need to add.
Following topics came already to mind:

  • introduce qgis_pipe()
  • introduce qgis_plugins() and qgis_enable_plugins() and qgis_disable_plugins()
  • check if geocompr/geocompr:qgis-ext still works (I have to admit I have lost track which versions are actually supported: qgis, qgis-ext, qgis-min, qgis-dev). Could you please clarify @Robinlovelace? And Robin do you think it is still necessary to write a vignette on how to run qgisprocess from within a docker container or is it sufficient to just show here the docker pull command and reference the corresponding READMEs on github and dockerhub?
  • reference package qgis
  • show compat methods
  • @florisvdh maybe we should add something regarding the package structure (see Revising the namespace #133)?
  • should we say something on json serialization?
  • introduce qgis_run()
  • multilayer input example

Originally, we thought it might be a good idea to write another vignette replicating the use case we have introduced in the RQGIS paper but I have to admit I am a bit unsure about the added value. Thoughts? In any case, first and foremost we should finish this introductory vignette.

@florisvdh
Copy link
Member

@jannes-m thanks for reviving this!

I have looked into qgisprocess.Rmd and have proposed some changes in PR #142. These changes use the vignettes branch in this repo as base; that branch is at the same commit as your jannes-m:vignettes branch. Can you update current PR to use vignettes instead of jannes-m:vignettes (your fork), this will make it easier to propose changes in a branch based on yours. I have given you commit rights to the repo (with main branch protected) so that we can work and merge intermittently within this repo. And so you can work on the vignettes branch in this repo directly.

Of course PR #142 is best handled first before making new changes here.

Further comments (not tackled in PR #142):

  • we will need to decide whether we want the vignettes that depend on QGIS included in the installed package. CRAN cannot build them, and hardcoding all output means more maintenance. Dewey's recommendation is not to do it (using .Rbuildignore), and only include them in the pkgdown site (usethis::use_article). But perhaps an intermediate solution is to put most code in a rmarkdown child document for such vignettes, and use the has_qgis() condition to trigger including the child document. If !has_qgis(), then print a line that refers to the pkgdown article URL. What do you think?
  • I recommend using the <- operator everywhere, can you change it? (it's used most often in R, also in the package code and examples, as in other r-spatial packages).

Regarding your questions:

  • yes, qgis_pipe() (qgis_run_algorithm_2()?: what's your pref) has a place here IMO. Same for the reference to qgis package.
  • perhaps the plugin handling can go into a small separate vignette. I think the basic-usage vignette does not need to get much longer.
  • actually the current 'package configuration' part (paragraph distinguished in the other PR) could also be moved into its own vignette, where the plugins can then be easily handled and then there's no problem to add some more config stuff.
  • I doubt whether qgis_run() needs to be included here. Not all functions need to be introduced in 'basic usage' and qgis_run() is more advanced since one needs a feeling with qgis_process CLI (debugging, special cases perhaps). It's in the function reference (and according function family) anyway.
  • JSON serialization: it has been touched in the other PR; feel free to elaborate but I think it's best not to complicate within this vignette.
  • indeed some functions are planned to change names (deprecating old names); implementing that will be followed by automated search-and-replace for the whole package, including the vignettes, so there's no need to anticipate. Does that answer your question wrt Revising the namespace #133?
  • multilayer input: I'm curious, but all examples are welcome! I leave to you whether it deserves its own vignette or is best put inside basic-usage (here). If you mean a multilayer argument (e.g. in native:aggregate), then the layers can be passed as a list in R and will need QGIS>=3.24 since that depends on JSON input (cannot be specified as regular CLI argument to qgis_process).

I'll look into the other vignette later on.

@jannes-m
Copy link
Collaborator Author

Hey @florisvdh,
since you already have created a vignettes branch in the qgisprocess repo, we can basically delete this PR, right?

we will need to decide whether we want the vignettes that depend on QGIS included in the installed package. CRAN cannot build them, and hardcoding all output means more maintenance. Dewey's recommendation is not to do it (using .Rbuildignore), and only include them in the pkgdown site (usethis::use_article). But perhaps an intermediate solution is to put most code in a rmarkdown child document for such vignettes, and use the has_qgis() condition to trigger including the child document. If !has_qgis(), then print a line that refers to the pkgdown article URL. What do you think?

It would be easier to present no output at all, i.e., just setting eval=FALSE as one knitr default option at the beginning of the vignette. But it is up to you to decide.

I recommend using the <- operator everywhere, can you change it? (it's used most often in R, also in the package code and examples, as in other r-spatial packages).
Ok, I'll change that.

Otherwise, I'll agree with your suggestions and will act on them accordingly (qgis_pipe(), I have admit I have never used it so far, but with regard to the naming I would stick with qgis_pipe(), because you can run (pipe) more than two geoalgorithms consecutively, right?)

Today, I have talked to @Robinlovelace, and he agreed that we can just reference the Dockerfile in the basic vignette and link to the according help pages instead of writing an explicit docker vignette. And I also think that I would just delete the other vignette (use_case.Rmd).

@florisvdh
Copy link
Member

  • this PR: well, first see whether you can replace the feature branch jannes-m:vignettes by the vignettes branch of this repo, in the definition of this PR (at the top of the GH page). I cannot, but that may be because it comes from your fork. If you cannot change that either, then yes best make a new PR and close this one.
  • regarding knitting and eval=FALSE, I'll see if I can achieve something with a child document in PR Vignette on using QGIS expressions with qgisprocess #143. A good test is the GHA workflow 'macos + none', since that one misses QGIS (like CRAN).
  • qgis_pipe() advantage is that it's a shorter name, but its name pinpoints a special feature rather than what it does essentially (= what a function name should do), i.e. doing the same as qgis_run_algorithm(). That's not clear from the name, so I was rather thinking of e.g. qgis_run_algorithm_2() or qgis_run_algorithm_p().
  • qgis_pipe() after Increase capability of qgis_pipe() wrt qgis_result objects #134 has been merged will be able to just use the function repeatedly with pipes between them, because it will accept a qgis_result object, on condition the latter has an 'OUTPUT' element which it then uses.

@jannes-m
Copy link
Collaborator Author

jannes-m commented Mar 24, 2023

  • I cannot change the source branch of this PR either, so I will close this one, and then we can just work further on your vignettes branch, i.e., create a PR from it.
  • knitr child documents: sounds good.
  • Regarding qgis_pipe(), ok, I get your point. Not tested but shouldn't work something like this as well:
qgis_run_algorithm("qgis_alg", args) |>
 {\(x) qgis_run_algorithm("qgis_alg2", input = x$OUTPUT)} ()

I know the lambda function looks odd but this would be imo the most intuitive way of piping. This would also let the user specify the output actually needed for further processing instead of just using the first output object by default. Possibly it would also make qgis_pipe() superfluous.

@jannes-m jannes-m closed this Mar 24, 2023
@jannes-m jannes-m mentioned this pull request Mar 24, 2023
@florisvdh
Copy link
Member

@jannes-m Indeed, piping is always possible if the user undertakes the effort needed to correctly pass arguments. However qgis_run_algorithm() cannot, by nature (because of the dynamic dots), be called pipe-friendly, which would mean that a meaningful object can be passed as its first argument. However qgis_pipe() does accept such object and passes it to qgis_process as the first argument to be used for the algorithm.

@jannes-m
Copy link
Collaborator Author

Hey @florisvdh,
I prefer the lambda piping over qgis_pipe() because:

  • one would not have to write, teach and maintain qgis_pipe()
  • in the case of qgis_pipe() the user has to know that the piped object becomes automatically the argument for the first parameter of the chosen geoalgorithm. This might work for most geoalgorithms but I would assume that there are at least a few for which this does not work. The lambda piping avoids this because the user has to explicitly pass arguments to the corresponding parameter which is more explicit and also allows for more flexibility.
  • the effort of teaching the lambda piping or qgis_pipe() is the same, e.g., in a vignette. However, the former could be used for every function whose first parameter is not the input object on which to work on, e.g., gsub(), etc.

Of course, this might be just personal taste, and I just want to let you know my reasoning. If you still prefer to use qgis_pipe(), that's perfectly fine. But then I wonder if it would not make for a better user experience to provide a qgis_run_algorithm.data_input_path and a qgis_run_algorithm.qgis_result method instead of qgis_pipe() or qgis_run_algorithm_2()?

P.S.: I forgot the {} in the pseudo code lamba piping example, so I have added them, ah, what the hack, let's try a real world example:

library("qgisprocess")
#> Attempting to load the cache ... Success!
#> QGIS version: 3.28.2-Firenze
#> Having access to 1164 algorithms from 6 QGIS processing providers.
#> Run `qgis_configure(use_cached_data = TRUE)` to reload cache and get more details.
#> >>> Run `qgis_enable_plugins()` to enable 1 disabled plugin(s) and
#>     access their algorithms: otbprovider
library("terra")
#> terra 1.7.18

dem = rast(system.file("raster/dem.tif", package = "spDataLarge"))
out = qgis_run_algorithm(algorithm = "saga:sinkremoval", DEM = dem, 
                         METHOD = 1, .quiet = TRUE) |>
   (\(x) qgis_run_algorithm(algorithm = "saga:sagawetnessindex",
                            DEM = x$DEM_PREPROC[1], .quiet = TRUE)) ()
#> Argument `SINKROUTE` is unspecified (using QGIS default value).
#> Using `DEM_PREPROC = qgis_tmp_raster()`
#> Argument `THRESHOLD` is unspecified (using QGIS default value).
#> Argument `THRSHEIGHT` is unspecified (using QGIS default value).
#> Argument `WEIGHT` is unspecified (using QGIS default value).
#> Using `AREA = qgis_tmp_raster()`
#> Using `SLOPE = qgis_tmp_raster()`
#> Using `AREA_MOD = qgis_tmp_raster()`
#> Using `TWI = qgis_tmp_raster()`
#> Argument `SUCTION` is unspecified (using QGIS default value).
#> Using `AREA_TYPE = "[0] total catchment area"`
#> Using `SLOPE_TYPE = "[0] local slope"`
#> Argument `SLOPE_MIN` is unspecified (using QGIS default value).
#> Argument `SLOPE_OFF` is unspecified (using QGIS default value).
#> Argument `SLOPE_WEIGHT` is unspecified (using QGIS default value).

# and doing the same using `qgis_pipe()`, though there certainly is a more
# elegant way of doing it

# out_2 = qgis_run_algorithm(algorithm = "saga:sinkremoval", DEM = dem, 
#                            METHOD = 1, .quiet = TRUE) |>
#   qgis_result_single() |> 
#   unclass() |> 
#   qgis_pipe("saga:sagawetnessindex")

Created on 2023-03-24 with reprex v2.0.2

Excellent, this has worked!
FYI, starting with R 4.3.0, the placeholder of the pipe operator also accepts the $ operator (see here, and then something like the code below should work:

out = qgis_run_algorithm(algorithm = "saga:sinkremoval", DEM = dem, 
                         METHOD = 1, .quiet = TRUE) |>
  qgis_run_algorithm(algorithm = "saga:sagawetnessindex",
                     DEM = _$DEM_PREPROC[1], .quiet = TRUE)

This would be pretty amazing. IMO this would make qgis_pipe() obsolete but maybe I am missing something.

@florisvdh
Copy link
Member

Hi @jannes-m thanks for your involvement and these examples!

Let's try to make a comparison (which can evolve), since there are multiple differences:

criterion piping to qgis_run_algorithm() piping to qgis_pipe()
control + - (works mostly, not always)
simple syntax - (incoming object needs redirection) + (redirection unneeded. But only useable where incoming object can go into first algorithm argument)
handles qgis_result out of the box - (manual OUTPUT extraction) + (automated OUTPUT extraction, on condition it's called OUTPUT)
cleaning tempfiles of intermediate objects - (one cannot get hold of the intermediate qgis_result) + (cleaning tempfiles of incoming qgis_result after it is processed *)

* that is after #134 is completed

So the syntax for qgis_pipe() is as follows:

result2 <- result |>
qgis_pipe("native:subdivide", MAX_NODES = 10) |>
qgis_pipe("native:dissolve")

That being said, @paleolimbot is not a fan of piping algorithms at all IIUC, especially because it looses track of intermediate results, which can then not be easily cleaned (if you want to remove temp output files after further reading/processing, you need to pass a qgis_result object to qgis_result_clean(); you loose them in regular piping). That's why I intend to add that in qgis_pipe(), where this is possible for the case it gets a qgis_result in.

Personally I'm in favour of having multiple possibilities. If qgis_pipe() works for a user's situation (it mostly should), it will be the simplest code and provide automated cleanup (except for the final result of course). If it doesn't for some reason, or more probably if you want manual control over inputs and outputs, then nothing's wrong with regular piping with qgis_run_algorithm() except that you are piling up temp files; but they should get cleaned after the R session finishes anyway! But it may be an inconvenience in case of larger files / more limited space.

Two other points that you touched:

  • alternative syntaxes for piping with qgis_run_algorithm(), where by necessity the incoming object must be redirected to another argument than the first (which is the algorithm id). There's the anonymous function, its shorter lambda form, and the '_' placeholder. Personally I much favour the placeholder because it minimizes complexity. But I don't think users must be recommended one or the other way in the context of qgisprocess, since such choice belongs to the general syntax of using the native pipe.
  • methods for qgis_run_algorithm(). I don't expect this will happen (you're welcome to propose a makeover though), especially since qgis_run_algorithm() has no explicit argument for the input or the qgis_result object to come in (all algorithm arguments are being captured by ...). Making that possible would imply the same 'guessing' as in qgis_pipe() (i.e. that the first algorithm argument is the one to use for this object). It could be done for qgis_pipe() itself though, which has the .data argument as first argument, but then it would still remain a separate function distinguished from qgis_run_algorithm() right because of that argument.

So to me several options seem viable and could co-exist.

@jannes-m
Copy link
Collaborator Author

Hey @florisvdh,
thank you for the nice summary of the advantages and disadvantages of the two approaches and the fruitful discussion!
And I absolutely agree with you, that several options can co-exist.
Hence, I suggest to add a section 'piping' to the basic usage vignette presenting the two approaches with their pros and cons, and explicitly telling the user that all piping approaches share the problem that one cannot (easily) access (and explore) intermediate results. This is especially painful when after an already long processing time the last pipe returns an error.

@florisvdh
Copy link
Member

Hence, I suggest to add a section 'piping' to the basic usage vignette presenting the two approaches with their pros and cons, and explicitly telling the user that all piping approaches share the problem that one cannot (easily) access (and explore) intermediate results. This is especially painful when after an already long processing time the last pipe returns an error.

Yes, good ideas!

Coming week (the 2 weeks after that I may be not much online) I hope to:

  • conclude and merge PR Increase capability of qgis_pipe() wrt qgis_result objects #134
    • beside adding behaviour to clean intermediate results, I believe qgis_pipe() should also gain an argument, by default equal to "OUTPUT", in order to control which element to extract from an incoming qgis_result.
  • implement the namespace changes enlisted in Revising the namespace #133 (with old names deprecated).
    • do you follow the reasoning to drop the qgis_pipe() name? And then what would you rather prefer (I'm quite undecided here): qgis_run_algorithm_2() or qgis_run_algorithm_p() ('p' referring to pipe), or still something else?

If you could have a quick look at PR #143, that would be great. I actually want to publish that vignette (or its first version) soon since I'm referring it in a newsletter of my organization (we supported QGIS financially to improve QGIS expression support in qgis_process).

@jannes-m
Copy link
Collaborator Author

Hey @florisvdh,
yes, I think it might be a good idea to rename qgis_pipe(). I still don't fully understand why qgis_run_algorithm() could not get the method you have written for qgis_pipe(), so basically a qgis_run_algorithm.qgis_result method which calls the stuff now living in qgis_pipe(). But if you don't like that, then what about qgis_run_next_algorithm()?

@florisvdh
Copy link
Member

qgis_run_algorithm() doesn't provide an input argument (the latter is implicit in ...) so the input object is not available for method dispatch. Hence qgis_run_algorithm.qgis_result will not work in current package philosophy.

Apart from that, it could be an idea to move the handling of a qgis_result to a new as_qgis_argument.qgis_result method. This would effectively allow qgis_run_algorithm(INPUT = some_qgis_result_piped_or_not) to do something similar in terms of auto-extracting output element; however control over which output to select or over result cleaning would be missing (or that should be incorporated at high level too and passed on deep down to as_qgis_argument(), just for the qgis_result case ...). So since that would be missing qgis_pipe() couldn't take advantage of it through qgis_run_algorithm() (which it calls through qgis_function()!). So after all I doubt whether it's worth the trouble to also make as_qgis_argument.qgis_result: in the case of qgis_run_algorithm() you'd still need to take manual control of output extraction (if different from OUTPUT) and do placeholder (piping) redirection, while qgis_pipe() wouldn't make use of as_qgis_argument.qgis_result given that it wants control over output extraction and cleaning.

qgis_run_next_algorithm(): well, I might like it 🙂, sounds good. However it is not necessarily used after a previous qgis_run_algorithm() step, which the name seems to suggest though. For example, this looks a bit odd to me:

library(sf)
library(qgisprocess)
filepath <- system.file("gpkg/buildings.gpkg", package = "sf")
read_sf(filepath) |>
  dplyr::filter(cat < 20) |>
  qgis_run_next_algorithm("native:buffer", DISTANCE = 300)

So still looking for some more independent name, _2 suffix keeps in my mind as a fallback.

qgis_pipe_algorithm() ? Although if thinking about IDE's: it is nicer that if user starts typing 'qgis_run...', the function is still among the results.

qgis_run_algorithm_pipe() ?

@jannes-m
Copy link
Collaborator Author

qgis_run_algorithm() doesn't provide an input argument (the latter is implicit in ...) so the input object is not available for method dispatch. Hence qgis_run_algorithm.qgis_result will not work in current package philosophy.

But qgis_run_algorithm.qgis_result() could be structured the same way as qgis_pipe(), i.e.,

     qgis_run_algorithm.qgis_result = function(
       .data,
       algorithm,
       ...,
       .select = "OUTPUT",
       .clean = TRUE,
       .quiet = TRUE
     ) {...}

But anyways, this is just me rambling from a bird's eye perspective without ever having touched the underlying source code, so you are definitely in a better position to judge what makes sense and what not and if this complies with the package philosophy or not.
Yes, I also thought that qgis_run_next_algorithm() sounds nice, but I agree this does not fully capture what the function is doing. Hence, the best names capturing the function's behavior are so far qgis_run_algorithm_p() or qgis_run_algorithm_pipe().

@florisvdh
Copy link
Member

florisvdh commented Mar 29, 2023

Thank you for challenging this 🙂. OK, what I learned now is that a method can indeed take a superset of the generic's arguments (e.g. adding .data) on condition that the generic has ..., which is the case. (https://adv-r.hadley.nz/s3.html#s3-arguments). An example is given in https://rstudio-education.github.io/hopr/s3.html#methods.

But. The dispatch takes place at the level of the generic, so needs an argument at the level of the generic to dispatch by. Hence the argument by which to dispatch cannot belong to .... And the generic cannot have .data here IMO since it is not handled by (regular) qgis_run_algorithm(), that would break how it works (i.e. sanitizing and serializing the arguments, which all are expected in ...).

@florisvdh
Copy link
Member

OK, settling on qgis_run_algorithm_p(), it advertises the 'pipe' less prominently and is shorter.

@jannes-m
Copy link
Collaborator Author

Maybe, I am wrong but I sill think it should work. But instead of guessing, one should try. So if I find the time (but I cannot promise anything, it's really hard to code for fun with three little kids), I'll write a qgis_run_algorithm.qgis_result() method, and if it works you can still decide if you like the approach or not.

@florisvdh
Copy link
Member

Feel free to try! For it to be used when piping a qgis_result to qgis_run_algorithm(), qgis_run_algorithm() would need to call UseMethod() based on some argument that gets the qgis_result in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants