Add incoming time to history snapshots #36

Closed
nx10 opened this issue Feb 9, 2021 · 4 comments · Fixed by #51
Comments


nx10 commented Feb 9, 2021

Can you please add the incoming time to the history branch CSV snapshots?

Those currently only include snapshot_time, which is constant for all of them.

I am trying to use the GitHub API to build an alternative frontend for the data.

Thank you for your time.

Edit: For reference, this is what I am doing (TypeScript):

async function fetchCranSays() {

    // fetch the last commit by the actions bot on the history branch
    const reCommits: any[] = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits?sha=history&author=actions-user&per_page=1"
    ).then((response) => response.json());

    const commitSha = reCommits[0].sha;
    console.log(commitSha);

    // fetch the full commit (for the snapshot filename)
    const reCommitExt: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits/" + commitSha
    ).then((response) => response.json());

    const csvFilename = reCommitExt.files[0].filename;
    console.log(csvFilename);

    // fetch the csv itself (base64-encoded by the contents API)
    const reCsv: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/contents/" +
        csvFilename +
        "?ref=history"
    ).then((response) => response.json());

    const csv = atob(reCsv.content);
    console.log(csv);
}
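The CSV text could then be turned into rows with something as simple as this (a rough sketch; it assumes the first line is a header and that no field contains an embedded comma):

// Rough sketch: split the snapshot CSV into one object per row.
// Assumes a header line and no quoted fields with embedded commas.
function parseCsv(csv: string): Record<string, string>[] {
    const [header, ...rows] = csv.trim().split("\n");
    const columns = header.split(",");
    return rows.map((row) => {
        const values = row.split(",");
        const record: Record<string, string> = {};
        columns.forEach((col, i) => (record[col] = values[i]));
        return record;
    });
}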
Member

maelle commented Feb 9, 2021

👋 @nx10! Thanks for your interest!
I wouldn't recommend using the history branch, as it might disappear soon (I think at some point there will be too many files, cc @llrs; a better solution would use some sort of cloud storage).
I'd recommend creating your own GitHub Actions workflow instead, potentially building on ours.
If you simply fork this repo, the line to change to add more info to the CSVs is https://github.com/lockedata/cransays/blob/1ecabc11740a4bbfe7445765b0845ec7a6499bf8/vignettes/dashboard.Rmd#L45
I have little to no time to work on this myself.

maelle closed this as completed Feb 9, 2021
Contributor

llrs commented Feb 9, 2021

Hi Florian @nx10, I'm not sure what this incoming time would be. If you mean when the package was submitted, that information is not available in the folders (afaik). snapshot_time is constant for each batch, but I don't think there is any workaround: CRAN does not provide an API that notifies about changes to the queue, so the only option is to scan at some frequency, which cannot be too high, to avoid putting more pressure on CRAN's website. But if you want to calculate how long it takes, I've done an analysis of the data here.

The number of files is going to be a problem. I've set up a similar GitHub Action to record CRAN and Bioconductor packages hourly, and sometimes downloading the branch fails due to its size. Moving to a more general solution would help provide a better service and better records.

Metacran checks every minute whether there are new packages available on CRAN and stores the results in a database. If you want this level of detail you will need to set up something outside a GitHub Action.
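Something minimal outside of Actions could look like this (just a sketch in Node/TypeScript; it assumes Node 18+ for the built-in fetch, that the incoming queue can be fetched as a plain directory listing at the URL below, and that appending JSON lines to a local file is enough storage for a prototype):

// Sketch of a standalone poller (Node 18+ for the global fetch).
// The listing URL and the JSON-lines storage are assumptions for illustration.
import { appendFile } from "fs/promises";

const INCOMING_URL = "https://cran.r-project.org/incoming/"; // assumed listing location
const INTERVAL_MS = 60 * 60 * 1000; // hourly, to keep the load on CRAN low

async function pollOnce(): Promise<void> {
    const listing = await fetch(INCOMING_URL).then((r) => r.text());
    const record = { polled_at: new Date().toISOString(), listing };
    await appendFile("incoming-snapshots.jsonl", JSON.stringify(record) + "\n");
}

pollOnce().catch(console.error);
setInterval(() => pollOnce().catch(console.error), INTERVAL_MS);

Anything that runs on a schedule (a cron job, for example) would do the same job as the setInterval loop here.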

Author

nx10 commented Feb 9, 2021

Thank you for your responses.

@maelle I already wrote an implementation using the foghorn package, but wanted to avoid duplicating effort (and to minimize traffic to the CRAN servers, though I think that is pretty low when querying only every hour anyway).

@llrs I am sorry for being unclear: with "incoming time" I meant the timestamp on the FTP server. I want to use it to show time in folder, placement in the current "queue" (folder), and (maybe) estimated time remaining using some regression model. While I agree that there should be a dedicated server that collects the data in a proper database and exposes a public API, I do not have the resources to run one myself (although I would be happy to contribute code). I already saw your analysis when you posted it on the mailing list, very cool!
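To illustrate what I mean: given records that carry that FTP timestamp, time in folder and queue placement are straightforward to derive (a sketch; the field names here are made up):

// Sketch of the two derived values, assuming each submission record carries
// its folder and the timestamp reported by the FTP server. Field names are
// illustrative only.
interface Submission {
    package: string;
    folder: string;     // e.g. "inspect", "pretest", ...
    incomingTime: Date; // timestamp from the FTP listing
}

function hoursInFolder(sub: Submission, now: Date = new Date()): number {
    return (now.getTime() - sub.incomingTime.getTime()) / (1000 * 60 * 60);
}

// 1-based position among submissions in the same folder, oldest first.
function queuePosition(sub: Submission, all: Submission[]): number {
    const sameFolder = all
        .filter((s) => s.folder === sub.folder)
        .sort((a, b) => a.incomingTime.getTime() - b.incomingTime.getTime());
    return sameFolder.findIndex((s) => s.package === sub.package) + 1;
}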

Contributor

llrs commented Feb 9, 2021

Thanks! Mmh, yes, including the timestamp reported by the FTP site would be useful; it would have helped my analysis. I'm sure that if this moves forward to a dedicated API it will be included.

Bisaloo added a commit that referenced this issue Feb 11, 2022
snapshot_time is largely redundant with information contained in the file name anyways
Fix #36
Fix #49