If version is not specified installation fails #9

Closed
DavidGOrtega opened this issue Feb 1, 2021 · 9 comments · Fixed by #13

@DavidGOrtega
Contributor

Version should be latest by default. However, this works:

- uses: iterative/setup-dvc@v1
  with:
    version: 'latest'

and this returns an error:

- uses: iterative/setup-dvc@v1


Run iterative/setup-dvc@v1
Error: Command failed: /usr/bin/sudo dpkg -i 'dvc.deb' && /usr/bin/sudo rm -f 'dvc.deb'
dpkg-deb: error: 'dvc.deb' is not a Debian format archive
dpkg: error processing archive dvc.deb (--install):
 dpkg-deb --control subprocess returned error exit status 2
Errors were encountered while processing:
 dvc.deb
DavidGOrtega changed the title from "In version is not specified fails" to "If version is not specified installation fails" on Feb 1, 2021
@fabiosantoscode

fabiosantoscode commented Feb 3, 2021

I get this sometimes when installing deb packages by file path using dpkg. Usually apt-get install ./package.deb fixes it (the ./ makes it not look up the package in its internal list) but I have no idea why.

@DavidGOrtega
Contributor Author

@fabiosantoscode I got this error about 1 time out of 20. I was hoping it was due to differences in the GH image. It's also very hard to fix because it's very hard to reproduce.

@0x2b3bfa0
Member

0x2b3bfa0 commented Feb 24, 2021

I've configured a cron job in order to automatically reproduce the bug.

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 3, 2021

Error

# apt install ··· ./dvc.deb
E: Invalid archive signature
E: Internal error, could not locate member control.tar.{zstlz4gzxzbz2lzma}
E: Could not read meta data from /home/runner/work/dvc-action-test/dvc-action-test/dvc.deb
E: The package lists or status file could not be parsed or opened.

Timeline

Cause

This race condition occurs in a window of approximately 10 minutes after each DVC release: a published release is reported as latest even before the platform-specific asset-building actions have been triggered, and the code in this repository does not handle this edge case.

setup-dvc/src/utils.js

Lines 34 to 40 in 19880d0

const get_latest_version = async () => {
  const endpoint = 'https://api.github.com/repos/iterative/dvc/releases/latest';
  const response = await fetch(endpoint, { method: 'GET' });
  const { tag_name } = await response.json();
  return tag_name;
};

This issue was especially hard to narrow down because the 404 error for the package artifact wasn't being handled in the code, so apt was fed the contents of an error page instead of the expected package file. Checking for a successful response through the res.ok property would probably have produced a more meaningful error.

setup-dvc/src/utils.js

Lines 20 to 32 in 19880d0

const download = async (url, path) => {
  const res = await fetch(url);
  const fileStream = fs.createWriteStream(path);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on('error', err => {
      reject(err);
    });
    fileStream.on('finish', function() {
      resolve();
    });
  });
};
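
As a rough illustration of the res.ok check mentioned above, here is a minimal sketch, not the code that was eventually merged. The ensureOk helper and the fetchImpl parameter are hypothetical names introduced for this example, and the response body is assumed to be a Node.js readable stream, as with node-fetch v2.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: reject any non-2xx response before touching the disk,
// so an HTML error page is never written out as dvc.deb.
function ensureOk(res, url) {
  if (!res.ok) {
    throw new Error(`Download failed for ${url}: HTTP ${res.status} ${res.statusText}`);
  }
  return res;
}

// Variant of download() that receives the fetch implementation as a parameter
// (an assumption made here so the function can be exercised without network access).
async function download(url, path, fetchImpl) {
  const res = ensureOk(await fetchImpl(url), url);
  // pipeline() rejects on errors from either the response body or the file stream.
  await pipeline(res.body, fs.createWriteStream(path));
}
```

With a check like this, a missing asset would surface as an explicit HTTP 404 error instead of a dpkg or apt complaint about an invalid archive.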

After reviewing the code, I'm inclined to think that there would not be any difference between specifying version: 'latest' and not specifying any version, so the observation in the original post would only be a direct consequence of the erratic nature of this issue.

let { version = 'latest' } = opts;
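
The behaviour of that default can be checked in isolation. The sketch below wraps the destructuring line in a hypothetical resolveVersion function; only the destructuring itself comes from the repository. One caveat worth keeping in mind: a destructuring default applies only when the property is undefined, so an empty string would pass through unchanged. Whether the action can ever receive an empty string here is an assumption that has not been verified.

```javascript
// Hypothetical stand-in for the option handling; only the destructuring
// line is taken from the repository snippet.
function resolveVersion(opts = {}) {
  const { version = 'latest' } = opts;
  return version;
}

console.log(resolveVersion({}));                     // latest
console.log(resolveVersion({ version: 'latest' }));  // latest
console.log(JSON.stringify(resolveVersion({ version: '' }))); // ""
```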

Solutions

Check the latest version for matching assets and, if there isn't any, resort to the penultimate version.

⚠️ Not feasible due to the GitHub API rate limits for unauthenticated users; asking for a GITHUB_TOKEN is impractical.

import { writeFile } from 'fs/promises';
import { Octokit } from '@octokit/rest';

async function install() {
  const release = await get_matching_asset({
    owner: 'iterative',
    repository: 'dvc',
    condition: (asset) => /^dvc_.+_amd64\.deb$/.test(asset.name),
    tag: 'latest',
  });
  await writeFile('dvc.deb', Buffer.from(release));
}

async function get_matching_asset({
  owner,
  repository,
  condition,
  tag = 'latest',
  depth = 2,
}) {
  const octokit = new Octokit();

  let releases = [];
  if (tag === 'latest') {
    const { data } = await octokit.repos.listReleases({
      owner: owner,
      repo: repository,
      per_page: depth,
    });
    releases.push(...data);
  } else {
    const { data } = await octokit.repos.getReleaseByTag({
      owner: owner,
      repo: repository,
      tag: tag,
    });
    releases.push(data);
  }

  for (const release of releases)
    for (const asset of release.assets)
      if (condition(asset))
        return (
          await octokit.repos.getReleaseAsset({
            headers: {
              Accept: 'application/octet-stream',
            },
            owner: owner,
            repo: repository,
            asset_id: asset.id,
          })
        ).data;
  throw new Error(`Asset not found for the ${tag} release.`);
}

Use pip install for all the operating systems.

See #12

Logs

2021-03-03 10:26:40
2021-03-03 10:27:13

@DavidGOrtega
Contributor Author

Awesome catch!

Not feasible due to the GitHub API rate limits for unauthenticated users; asking for a GITHUB_TOKEN is impractical.

GITHUB_TOKEN is accessible in the workflow. Is that the GITHUB_TOKEN you are referring to?

@0x2b3bfa0
Member

@DavidGOrtega, yes, it's accessible, but it doesn't look appropriate to ask for it when setting up a workflow dependency; at least, not without an excellent reason.

I would prefer either (1) to get DVC to publish releases once assets have been uploaded, or (2) to install the package from other official sources as @efiop suggested here. Nevertheless, it looks like our brew recipe only has the latest version, so we would have to go the extra mile anyway.

@DavidGOrtega
Contributor Author

but it doesn't look appropriate to ask for it when setting up a workflow dependency; at least, not without an excellent reason.

You mean this?

- uses: iterative/setup-cml@v1
  with:
    token: ${{ secrets.GITHUB_TOKEN }}

@0x2b3bfa0
Member

0x2b3bfa0 commented Mar 4, 2021

Yes, this would allow us to issue authenticated API calls and raise the rate limit, but at the cost of asking the user for a token on a step that (ideally) should not require it.
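
For reference, the only difference between an unauthenticated and an authenticated GitHub REST API call is the Authorization header; unauthenticated requests are limited to 60 per hour per IP address. The sketch below uses a hypothetical githubHeaders helper (not part of this repository) to show how an optional token could be passed through without making it mandatory.

```javascript
// Hypothetical helper: build request headers for the GitHub REST API.
// The token is optional; without it the request is unauthenticated.
function githubHeaders(token) {
  const headers = { Accept: 'application/vnd.github+json' };
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return headers;
}

console.log(githubHeaders());
console.log(githubHeaders('example-token'));
```

Keeping the token optional would let the step work without extra configuration while still allowing users who hit the rate limit to opt in.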

Another possible option would be to hardcode the latest version on this repository and update it automatically with a cron GitHub Action, but it's still a bit of an X-Y problem.

@0x2b3bfa0
Member

Definitely, the ideal solution would be having all the DVC release assets uploaded before publishing. All the alternatives I can think of would add to the technical debt laundry list, one way or another.

📖   Note: this issue reduces our layman availability metric to 99.88 %

Considering that the build packages action takes approximately 8 minutes to upload the release assets after publishing, we can perform some quick downtime estimations with the following formulas:

release_dates = [
    1614942525,
    1614900985,
    1614785918,
    1614766914,
    1614715882,
    1614615657,
    1614106844,
    1614339349,
    1613533904,
    1613415465,
    1612260018,
    1612208847,
    1611628591,
    1611527735,
    1611227359,
    1609850097,
    1608723473,
    1608124083,
    1608054971,
    1607970359,
    1607819710,
    1607443941,
    1606930143,
    1606926602,
    1606924068,
    1606235653,
    1605813745,
    1605662310,
    1603812829,
    1603387362
]

# Absolute gap, in seconds, between each pair of consecutive releases.
release_intervals = [
    abs(first - second)
    for first, second in
    zip(release_dates, release_dates[1:])
]

average_release_interval = sum(release_intervals) / len(release_intervals)
estimated_downtime_ratio = 8 * 60 / average_release_interval

print(f"Estimated availability {1 - estimated_downtime_ratio:.2%}")
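
Written out as a formula, the estimate above works as follows, with t_i denoting the n = 30 release timestamps in seconds and the 8-minute upload delay taken as given. The intermediate figures were worked out by hand from the listed timestamps and are rounded.

```latex
\bar{\Delta} = \frac{1}{n-1} \sum_{i=1}^{n-1} \lvert t_i - t_{i+1} \rvert
\approx \frac{12\,020\,173\ \text{s}}{29}
\approx 414\,489\ \text{s} \approx 4.8\ \text{days}

\text{availability} \approx 1 - \frac{8 \cdot 60\ \text{s}}{\bar{\Delta}}
\approx 1 - 0.00116 \approx 99.88\,\%
```

In other words, with a release roughly every 4.8 days on average, an 8-minute window per release corresponds to about 0.12 % of the time.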

3 participants