Shutting down phylodiversity.net #43

Closed
camwebb opened this issue Oct 5, 2020 · 17 comments

@camwebb

camwebb commented Oct 5, 2020

@sckott I'm going to shut down phylodiversity.net in a year (it's been over twenty years!). Phylomatic has phylodiversity.net/phylomatic as its front-end, but the code actually runs elsewhere, on a cloud server I also manage. I know phylomatic is still used heavily: ~300 POSTs to pmws a week, most from r-curl which I assume are generated from taxize/brranching. Can you suggest a clean way to transition to a sustainable, long-term cloud solution for phylomatic? Does it have to be a cloud solution? If we could package it right, the script engine (gawk) and data files are only a few MB.

Also, there are many requests from 128.123.63.10 (Mesilla Park, NM) via python-requests. Any idea who is running this interface?

Thanks, Cam

@sckott
Contributor

sckott commented Oct 6, 2020

thanks for your message @camwebb

Looks like that IP address points to https://phylo.cs.nmsu.edu/

It's possible most of the R requests are coming via this package, hard to say. If you record user-agent strings, UA strings from this package should look like libcurl/7.72.0 r-curl/4.3 crul/1.0.0 (pkg versions can change of course)
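For sorting logged requests by client, a minimal sketch (not part of the service) that splits a user-agent string into its `name/version` tokens. The function names and the `python-requests` version shown are illustrative assumptions; only the R-side example string comes from the comment above.

```python
def ua_products(ua):
    """Split a UA string like 'libcurl/7.72.0 r-curl/4.3' into {name: version}."""
    products = {}
    for token in ua.split():
        name, sep, version = token.partition("/")
        if sep:  # skip tokens that have no '/version' part
            products[name] = version
    return products


def is_crul_client(ua):
    """True if the UA string carries a 'crul' product token (hypothetical helper)."""
    return "crul" in ua_products(ua)


r_ua = "libcurl/7.72.0 r-curl/4.3 crul/1.0.0"
py_ua = "python-requests/2.24.0"  # illustrative version number, not from the logs

print(ua_products(r_ua))
print(is_crul_client(r_ua), is_crul_client(py_ua))
```

Matching on the token name rather than the full string keeps the check stable when package versions change.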

I imagine you remember I also maintain the phylocomr package wrapping the Phylocom library with an interface to its phylomatic equivalent https://github.com/ropensci/phylocomr#phylomatic - What are the differences between the gawk version running on the web and the version in Phylocom?

I don't know the details of what the service needs if run on a server. I do run some servers for various web services I maintain. It sounds like the gawk based version of Phylomatic could just run from a local software installation. Could the application of gawk scripts plus data be packaged up as a binary that folks could run from any major OS? It's possible to use gawk from within R (for example, this package), but the integration with C/C++ is better.

@camwebb
Author

camwebb commented Oct 6, 2020

Thinking about this more, I realize the simplest thing is just to rewrite the basic phylomatic functionality in R - its core function is a simple graft and prune that would be quite easy (I guess) to implement in R with ape. I know R loops can be slow when iterated many thousands of times, but there may be ways to vectorize the operations. The basic code is here - it's well-documented, so you should be able to see roughly what it does even if the Awk (C-ish) looks unfamiliar. Could you maybe assess how this would translate to R?

(To assist reading: a tree is simply a node, parent[node] structure. The code reads through the slash-path user input format, matching node labels and attaching to the megatree at the most distal level it can. Then all the unmatched branches are pruned out.)
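A rough sketch of that graft-and-prune idea, written in Python purely for illustration. It is not the Awk code: the function name, the toy megatree, and the taxa are invented here, and it assumes each input line is a slash path such as `family/genus/species` with the tip label last.

```python
def graft_and_prune(parent, taxa):
    """parent: dict of node -> parent label (root maps to None).
    taxa: list of slash paths, most basal label first, tip label last.
    Returns a new parent dict holding only the matched tips and their ancestors."""
    tree = dict(parent)  # work on a copy of the megatree
    matched_tips = []
    for path in taxa:
        labels = path.split("/")
        tip = labels[-1]
        # Try the most distal label first, then work back toward the base.
        for label in reversed(labels):
            if label in tree:
                if label != tip:
                    tree[tip] = label  # graft the tip under the matched node
                matched_tips.append(tip)
                break
    # Prune: keep only nodes on a path from a matched tip to the root.
    keep = set()
    for tip in matched_tips:
        node = tip
        while node is not None and node not in keep:
            keep.add(node)
            node = tree[node]
    return {n: p for n, p in tree.items() if n in keep}


# Toy megatree and taxa, invented for this example.
megatree = {
    "root": None,
    "fagaceae": "root", "quercus": "fagaceae", "fagus": "fagaceae",
    "rosaceae": "root", "rosa": "rosaceae",
}
taxa = [
    "fagaceae/quercus/quercus_alba",      # genus matches
    "fagaceae/castanea/castanea_sativa",  # only family matches
    "pinaceae/pinus/pinus_nigra",         # nothing matches, so it is dropped
]
print(graft_and_prune(megatree, taxa))
```

Here `quercus_alba` attaches under `quercus`, `castanea_sativa` attaches directly under `fagaceae`, and the `fagus` and `rosaceae` branches are pruned because no input taxon reached them.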

@sckott
Contributor

sckott commented Oct 6, 2020

Good idea. I'd imagine a pure R implementation would be rather slow, but I don't know for sure. Maybe this: translating to C++ would keep it fast and make it easy to provide a thin R interface on top, which would make it portable to all operating systems.

@camwebb
Author

camwebb commented Oct 6, 2020

Yes, that makes sense. And since it came from C (in the phylocom distribution), I could start there. Would you be available to help with the R side and integration into your library? This will be a low-priority project, but I should be able to get it done in a year.

@sckott
Contributor

sckott commented Oct 6, 2020

Yes, definitely available to help on the R side.

I can help with the translation if we do it in C++, but I've not much experience in C.

@camwebb
Author

camwebb commented Oct 7, 2020 via email

@sckott
Contributor

sckott commented Oct 7, 2020

Sounds good

@camwebb
Author

camwebb commented Oct 25, 2021

Hi again @sckott. I'm working on this now. Using old/simple .C() call. Have got reading and writing ape phylo objects into C working. Now to add the phylomatic C code... A few more days. Repo is https://github.com/camwebb/phylomatic-r/

@sckott
Contributor

sckott commented Oct 25, 2021

That's great there's progress on this! However, I've moved on to a new job and am no longer doing R work for the most part. If you want any help on this @LunaSare has taken over this package and https://github.com/ropensci/phylocomr - I can point out some other folks that might be interested as well, just let me know

@camwebb
Author

camwebb commented Oct 25, 2021

OK, good luck! I will definitely need help for incorporating this C code into a package. @LunaSare can you please help me? @sckott just one question: should this new feature (R native phylomatic function) go in taxize or brranching or somewhere else? A small package of its own?

@LunaSare
Contributor

Hi @camwebb and @sckott!
I should be able to help, however I have various deadlines coming up soon. I will have time to start with a new project in a couple weeks. Would that work?

@camwebb
Author

camwebb commented Oct 26, 2021

Hi @LunaSare, thanks and no hurry. What I'm hoping for is

  1. for me just to build the C extension and R function (which reads an ape phylo and taxa list, and returns an ape phylo) and...
  2. then have someone else bundle it in an R package and handle the issues of compiling the C code for various binary packages (if the package is released as binaries).

I could learn (2) and release my own package, but would rather not have to, especially as brranching was already designed to run the web service version of phylomatic, which I am shortly shutting down.

@sckott
Contributor

sckott commented Oct 26, 2021

I'm not sure where is best. The https://github.com/ropensci/phylocomr package calls out to the Phylocom C code - I'm not sure how the code in https://github.com/camwebb/phylomatic-r/ will relate to the Phylocom code? Perhaps it will make sense to include the new code in phylocomr.

It probably makes sense to deprecate the brranching package b/c the web service will be gone and the only other thing it does is call out to phylocomr. Make sense @LunaSare ?

I don't think it makes sense to put the https://github.com/camwebb/phylomatic-r/ in taxize - that pkg is focused on taxonomic names, not trees.

A separate package is also a good option. You'd have the most freedom in that case as opposed to including in another package. And then other packages could depend on your package.

@camwebb
Author

camwebb commented Oct 27, 2021

Would rather it not be in phylocomr, because the whole point of rewriting it as a C extension for R is to make it R 'native', which phylocom is not. So I guess a small standalone package makes sense. But I'm wary of spending the time needed to learn how to package something myself - not intending to do it for anything else. This is why getting @LunaSare's help would be great.

@sckott
Contributor

sckott commented Oct 27, 2021

Agree, makes sense then to not include in phylocomr. Probably makes sense as a separate standalone package. Before getting to CRAN (which builds binaries for you), you can use https://r-universe.dev/help/ which allows for the normal install path using install.packages and builds binaries as well - which is important when the pkg has compiled code.

@LunaSare
Contributor

Hi! Yesterday was my last pressing deadline, so I'll have time from now on to help with this @camwebb! I agree with all the points @sckott raised. I think it makes more sense to build it as a standalone package; it would be cleaner than trying to incorporate it into an existing one.

@camwebb
Author

camwebb commented Nov 16, 2021

Thanks @LunaSare. I may let this thread languish a bit. The CLI phylomatic is working well, and it will take a fair amount of work to re-write it in C for an R package, so it's a project of 2-3 days, I think. I also have no indication whether anyone would actually use an R package. It's on the list though, and I'll contact you when I get to it. I'll close this thread, add an issue on the new repo, and ping you there.
