Distinguish NA and 0 #9

Open
hadley opened this Issue May 7, 2015 · 5 comments


hadley commented May 7, 2015

I have no idea how hard this would be, but it would be nice to distinguish "package was not on CRAN" (e.g. NA) from "package was not downloaded" (e.g. 0).

Owner

gaborcsardi commented May 7, 2015

Yeah, I was thinking about this, but then did not do it, because it is somewhat hard.

Whether the package was on CRAN is recorded in my other DB about packages, in JSON. So it is not impossible: I just need to add another table to the cranlogs DB about package availability (essentially just the first submission date and time) and update it from crandb periodically.
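To make the NA versus 0 distinction concrete, here is a minimal sketch in Python. It assumes the only availability information is a first-submission date, as described above. The function name `daily_counts` and its parameters are hypothetical and are not part of cranlogs or crandb.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def daily_counts(
    downloads: Dict[date, int],
    first_on_cran: Optional[date],
    start: date,
    end: date,
) -> List[Tuple[date, Optional[int]]]:
    """Return (day, count) pairs for every day in start..end inclusive.

    count is None (the NA case) for days before the package's first
    submission, and 0 for days on CRAN with no recorded downloads.
    """
    result = []
    day = start
    while day <= end:
        if first_on_cran is None or day < first_on_cran:
            # Package was not on CRAN yet (or is unknown): report NA.
            result.append((day, None))
        else:
            # Package was available: a missing entry means zero downloads.
            result.append((day, downloads.get(day, 0)))
        day += timedelta(days=1)
    return result


# Made-up data: first submitted on 2015-05-02, 12 downloads on 2015-05-03.
example = daily_counts(
    downloads={date(2015, 5, 3): 12},
    first_on_cran=date(2015, 5, 2),
    start=date(2015, 5, 1),
    end=date(2015, 5, 4),
)
```

With this made-up input, the first day comes back as None and the remaining days as 0, 12 and 0.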

The thing is, I am running so many small services and updates now that I need to create a check and notification system, so that I can be sure everything is working properly. I am focusing on this (and on improving www.r-pkg.org) right now.

So I am a little reluctant to add more updater scripts before this dashboard is up.

But soonish.

hadley commented May 7, 2015

Yeah, and if you really want to be thorough, you'd also want NAs if the package was temporarily archived. This isn't a big deal for me, just a nice-to-have.

Owner

gaborcsardi commented May 7, 2015

As for temporarily archived packages, you can still download them from the archives. Downloads of old package versions are actually counted currently.

hadley commented May 7, 2015

Oh hmmm, I didn't think about that. In that case, it would also be nice to expose some information about the package versions being downloaded.
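As a rough illustration of what per-version information could look like, the sketch below counts downloads per (package, version) pair. It assumes the raw log is a CSV with `package` and `version` columns. The helper name `downloads_by_version` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def downloads_by_version(log_csv: str) -> Counter:
    """Count log rows per (package, version) pair.

    Assumes one row per download, with 'package' and 'version' columns.
    """
    reader = csv.DictReader(io.StringIO(log_csv))
    counts = Counter()
    for row in reader:
        counts[(row["package"], row["version"])] += 1
    return counts


# Made-up sample log: two downloads of the current version, one of an old one.
sample = """date,package,version
2015-05-07,examplepkg,1.0.0
2015-05-07,examplepkg,1.0.0
2015-05-07,examplepkg,0.9.0
"""

per_version = downloads_by_version(sample)
```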

Owner

gaborcsardi commented May 7, 2015

Yeah, first I would need to put it in the DB. :)

When I first started, I wanted to put everything in a DB, and have a rich API. But then it turned out that this would require a much bigger machine, in terms of disk, memory and CPU.

If the daily download log is about 15MB, then yearly I need ~5GB in the DB. That is not huge, but it is not something for a tiny DigitalOcean instance, especially if the download numbers are growing fast.
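The estimate above can be checked with a few lines of Python. The 15MB per day figure comes from the comment. The growth rate in the second helper is a purely hypothetical parameter, included only to show how quickly the yearly total moves if download numbers keep growing.

```python
DAYS_PER_YEAR = 365


def yearly_gb(daily_mb: float) -> float:
    """Yearly storage in GB for a given daily log size in MB (1 GB = 1000 MB)."""
    return daily_mb * DAYS_PER_YEAR / 1000


def yearly_gb_after_growth(daily_mb: float, annual_growth: float, years: int) -> float:
    """Yearly storage after `years` of compound growth in daily log size.

    annual_growth is a hypothetical rate, e.g. 0.5 for 50% growth per year.
    """
    return yearly_gb(daily_mb * (1 + annual_growth) ** years)


current = yearly_gb(15)  # 15 MB/day * 365 days = 5475 MB, about 5.5 GB
```

So 15MB per day works out to roughly 5.5GB per year, in line with the ~5GB quoted above.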

Of course something in the middle is also possible, I mean between the current simple DB, and including all data.

But do downloads of old versions matter much, anyway? Given R's dependency handling, they should not happen very often.
