Hi @serkor1, thanks for taking the time and for the detailed write-up. I agree with your point that the current
tl;dr
The official logs API (consumed via the JSON endpoint in CRAN/E) can't deliver fresh data: the most recent data available is from the "last day", which I assume is relative to UTC. CRAN/E currently doesn't want to show stale data, so
Immediate action
I'll update the wording + API call to always use
More context
After looking a little bit more into it, I understand that
are closed source, so we have no way to hook into this system. Apparently, the logs DB (i.e. downloads + trends metadata) is a simple Postgres DB. The SQL code in
In other words, if the referenced script shows the actual schema of the DB, CRAN is not using Postgres to its full extent, not even close TBH. A refactor of the DB would be required to use
As of writing, CRAN has 21,520 packages. Assuming that, on average, 20% of all packages are downloaded within any given hour, 4,304 entries would be aggregated per hour. The schema for the "daily" table is currently minimal:
So, assuming one package/hour row takes 70 bytes, we would be at 301.28 KB per hour, i.e. ~7.2 MB per day. This would yield ~216 MB per month for a time bucket with an hourly scope. This size can be drastically reduced by only projecting e.g. the 100 most downloaded packages (as is done now in the referenced script), which would keep the size minimal.
Conclusion
I would really like to set up the time series database for CRAN 😄 Sadly, that's not possible, as the real data (and access to it) is closed source. The current implementation of the
Thanks again for taking a look into it @serkor1 👍
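The storage arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the 20% hourly share and the 70-byte row size are the assumptions stated in the comment, decimal units (1 KB = 1,000 bytes) and 30-day months are assumed, and storage_estimate is a hypothetical helper, not part of CRAN/E or any CRAN tooling.

```python
from typing import Dict, Optional


def storage_estimate(
    n_packages: int = 21_520,        # CRAN packages at the time of writing
    active_share: float = 0.20,      # assumed share of packages downloaded in a given hour
    bytes_per_row: int = 70,         # assumed size of one package/hour row
    top_n: Optional[int] = None,     # optionally keep only the N most downloaded packages
    days_per_month: int = 30,        # assumed month length
) -> Dict[str, float]:
    """Estimate rows per hour and storage per hour/day/month (decimal KB/MB)."""
    rows_per_hour = round(n_packages * active_share)
    if top_n is not None:
        # Projecting only the top-N packages caps the number of rows per hour.
        rows_per_hour = min(rows_per_hour, top_n)
    bytes_per_hour = rows_per_hour * bytes_per_row
    return {
        "rows_per_hour": rows_per_hour,
        "kb_per_hour": bytes_per_hour / 1_000,
        "mb_per_day": bytes_per_hour * 24 / 1_000_000,
        "mb_per_month": bytes_per_hour * 24 * days_per_month / 1_000_000,
    }
```

With the defaults this gives 4,304 rows and 301.28 KB per hour, about 7.23 MB per day and about 216.9 MB per 30-day month, matching the rounded figures above; storage_estimate(top_n=100) brings that down to 5.04 MB per month.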
At the time of writing, packages like tidyverse, data.table and Rcpp show 0 downloads for the last 24 hours. This is clearly wrong. The error comes, I believe, from how CRAN updates the package downloads. See the following for an elaboration:
Last-day download counts
Using last-day gives the latest published downloads, which correspond to the last available 24 hours of downloads.
Date-specific download counts
Using Sys.Date() - 1 asks for a specific date, which, depending on what time you make the call, is either the last or the second-to-last day. It returns 0 if you call it before the numbers for that date are published, or once we pass into the next day.
I am not familiar with the language you are using, so I couldn't locate the error. But my guess is that your API uses date-specific calls, which don't return any sensible value until CRAN releases the actual numbers. And once the time passes midnight (I guess), you are back to the date-specific issue, which returns 0 downloads.
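To make the difference between the two kinds of call concrete, here is a minimal Python sketch. It assumes the logs API is the public cranlogs JSON service at cranlogs.r-pkg.org, with paths of the form /downloads/total/{period}/{package} and a JSON array of objects carrying downloads, start, end and package fields; the helper names (total_url, parse_total, fetch_total, yesterday) are hypothetical and not CRAN/E's actual code. It defaults to last-day, the call described above as returning the latest published numbers, and keeps the end date from the response so a UI can show which day the count covers.

```python
import json
import urllib.request
from datetime import date, timedelta
from typing import Optional, Tuple

# Assumption: the public cranlogs JSON API; adjust if CRAN/E uses another endpoint.
BASE_URL = "https://cranlogs.r-pkg.org/downloads/total"


def total_url(package: str, period: str = "last-day") -> str:
    """Build a total-downloads URL; period is 'last-day' or a 'YYYY-MM-DD' date."""
    return f"{BASE_URL}/{period}/{package}"


def parse_total(body: str) -> Tuple[int, str]:
    """Return (downloads, end date) from the first entry of the JSON array."""
    entry = json.loads(body)[0]
    return int(entry["downloads"]), entry["end"]


def fetch_total(package: str, period: str = "last-day") -> Tuple[int, str]:
    """Fetch and parse total downloads for one package (performs a network request)."""
    with urllib.request.urlopen(total_url(package, period), timeout=10) as resp:
        return parse_total(resp.read().decode("utf-8"))


def yesterday(today: Optional[date] = None) -> str:
    """Rough equivalent of R's Sys.Date() - 1 (local date), formatted as YYYY-MM-DD."""
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()
```

For example, fetch_total("Rcpp") requests the last-day path and returns the count together with the date it refers to, whereas fetch_total("Rcpp", yesterday()) asks for a fixed date and, per the description above, can come back as 0 until that day's numbers are published.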