Missing Hetionet Link #1220

Open
3 tasks done
HxyScotthuang opened this issue Feb 2, 2023 · 13 comments

@HxyScotthuang

HxyScotthuang commented Feb 2, 2023

Describe the bug

When I try to load Hetionet in PyKEEN, I get a 404 error because the target URL in the source code no longer works. Please check and update the Hetionet() class.

Source code: https://pykeen.readthedocs.io/en/stable/_modules/pykeen/datasets/hetionet.html#Hetionet

URL in source code: https://github.com/hetio/hetionet/raw/master/hetnet/tsv/hetionet-v1.0-edges.sif.gz

How to reproduce

from pykeen.datasets import Hetionet

dataset = Hetionet()

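# create_triples_from_pykeen_dataset is the reporter's own helper; its definition is not shown in this issue.
train_triples = create_triples_from_pykeen_dataset(dataset.training)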

Environment

Key Value
OS posix
Platform Linux
Release 5.10.147+
Time Thu Feb 2 22:07:29 2023
Python 3.8.10
PyKEEN 1.10.0
PyKEEN Hash UNHASHED
PyKEEN Branch
PyTorch 1.13.1+cu116
CUDA Available? true
CUDA Version 11.6
cuDNN Version 8302

Additional information

No response

Issue Template Checks

  • This is not a feature request (use a different issue template if it is)
  • This is not a question (use the discussions forum instead)
  • I've read the text explaining why including environment information is important and understand if I omit this information that my issue will be dismissed
HxyScotthuang added the bug (Something isn't working) label Feb 2, 2023
cthoyt self-assigned this Feb 2, 2023
@cthoyt
Member

cthoyt commented Feb 2, 2023

Hi @HxyScotthuang, it looks like @dhimmel did some reorganizing on the hetionet repo. I'll try to get to this tomorrow; it just requires changing that URL in the source code. I'd be glad to accept a PR if you want to take care of this yourself!

@HxyScotthuang
Author

It seems that the URL has changed to https://github.com/hetio/hetionet/blob/main/hetnet/tsv/hetionet-v1.0-edges.sif.gz, but substituting it raises a BadGzipFile: Not a gzipped file (b'\n\n') error.

@HxyScotthuang
Author

Please have a look, as I am not sure if this is the correct URL. Thanks!

@HxyScotthuang
Author

HxyScotthuang commented Feb 2, 2023

It seems that the URL has changed to https://github.com/hetio/hetionet/blob/main/hetnet/tsv/hetionet-v1.0-edges.sif.gz, but substituting it raises a BadGzipFile: Not a gzipped file (b'\n\n') error.

Code to reproduce the error:

from pykeen.datasets.base import SingleTabbedDataset

URL = 'https://github.com/hetio/hetionet/blob/main/hetnet/tsv/hetionet-v1.0-edges.sif.gz'

class Hetionet(SingleTabbedDataset):
    def __init__(self, **kwargs):
        super().__init__(url=URL, **kwargs)

dataset = Hetionet()
print(dataset.training)
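
A note on the error above: a GitHub /blob/ URL serves the HTML file-viewer page rather than the file itself, and gzip data always starts with the two bytes 1f 8b, so a response beginning with b'\n\n' cannot be a gzip stream. The sketch below only illustrates that check and the /blob/ to /raw/ rewrite. The names looks_like_gzip and blob_to_raw are hypothetical helpers made up for this example, not part of PyKEEN.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(payload: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC


def blob_to_raw(url: str) -> str:
    """Rewrite a GitHub file-viewer URL (/blob/) to its raw-download form (/raw/).

    Hypothetical helper for illustration only.
    """
    return url.replace("/blob/", "/raw/", 1)


blob_url = "https://github.com/hetio/hetionet/blob/main/hetnet/tsv/hetionet-v1.0-edges.sif.gz"
raw_url = blob_to_raw(blob_url)
print(raw_url)

# An HTML page is not gzip data; a freshly compressed payload is.
print(looks_like_gzip(b"\n\n<!DOCTYPE html>"))
print(looks_like_gzip(gzip.compress(b"a\tb\tc\n")))
```

Checking the first two bytes of a download before handing it to a gzip reader gives a clearer failure than BadGzipFile when the server returned a web page instead of the data file.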

@cthoyt
Member

cthoyt commented Feb 2, 2023

Unfortunately, the original URL did not pin to a specific version. All you need to do is swap master with v1.0.0 (the tag) so it should read:

https://github.com/hetio/hetionet/raw/v1.0.0/hetnet/tsv/hetionet-v1.0-edges.sif.gz
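
A minimal sketch of why the ref segment matters, assuming the /raw/<ref>/<path> layout visible in the URLs in this thread. The helper github_raw_url is hypothetical and written only for this example; it is not part of PyKEEN or of any GitHub client library.

```python
def github_raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build a raw-download URL; `ref` may be a branch, a tag, or a commit hash."""
    return f"https://github.com/{owner}/{repo}/raw/{ref}/{path}"


PATH = "hetnet/tsv/hetionet-v1.0-edges.sif.gz"

# A branch name is a moving ref: the URL breaks if the branch is renamed or deleted.
moving = github_raw_url("hetio", "hetionet", "master", PATH)

# A release tag is meant to stay put, so the URL keeps pointing at the same content.
pinned = github_raw_url("hetio", "hetionet", "v1.0.0", PATH)

print(moving)
print(pinned)
```

The pinned string can be passed as the url argument of the SingleTabbedDataset subclass shown earlier in this thread.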

cthoyt reopened this Feb 2, 2023
@cthoyt
Member

cthoyt commented Feb 2, 2023

@HxyScotthuang it's not closed until we have a code fix in the package, sketchy patches aren't good for the future ;)

@HxyScotthuang
Author

Oh I see. The fix is working on my end. Thank you very much for the update!

@dhimmel

dhimmel commented Feb 3, 2023

Ah I renamed the master branch to main. I thought that old URLs would continue to work. It seems like they sort of do... https://github.com/hetio/hetionet/raw/master/hetnet/tsv/hetionet-v1.0-edges.sif.gz returns 404 but still downloads the file in my browser.

I wonder if I should make a master tag so these existing download URLs continue to work (or create a master branch, but it can be confusing to have both main and master).

@mberr
Member

mberr commented Feb 3, 2023

You could also consider either making a GitHub release with a release artifact for downloading the data or, maybe even better, creating a persistent dump of the data on Zenodo.

@dhimmel

dhimmel commented Feb 3, 2023

There is a Zenodo release, but that doesn't fix the problem. The problem is that there are links in the wild to the master branch that are now broken since we renamed that branch.

I think I will try creating a master tag.

@dhimmel

dhimmel commented Feb 3, 2023

Created a master tag. See https://github.com/hetio/hetionet/releases/tag/master. Old links should continue to work now. Let me know if they don't.

Still not a bad idea to specify an actual versioned tag like v1.0.0 or a commit hash.

@mberr
Member

mberr commented Feb 3, 2023

> There is a zenodo release, but that doesn't fix the problem.

It might fix the problem, at least for use within PyKEEN, if we switch to the Zenodo URL instead of accessing github/.../raw/... URLs 🙂

@dhimmel

dhimmel commented Feb 3, 2023

Zenodo archived the GitHub repo as a single zip of the repo contents as of the v1.0.0 tag. It doesn't allow URL download access to individual files in the repo.
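
For what it's worth, a whole-repo zip can still be consumed by downloading it once and reading the single member that is needed. The sketch below shows that pattern with the standard library only. It does not contact Zenodo; the archive is a tiny stand-in built in memory, and its top-level folder name, file contents, and the helper read_gzipped_member are all assumptions made up for the demonstration.

```python
import gzip
import io
import zipfile


def read_gzipped_member(archive_bytes: bytes, suffix: str) -> bytes:
    """Return the decompressed content of the first zip member whose name ends with `suffix`."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                # The member is itself gzip-compressed, so unpack it a second time.
                return gzip.decompress(archive.read(name))
    raise FileNotFoundError(f"no member ending with {suffix!r}")


# Build a stand-in archive in memory; names and contents are invented for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("repo-root/README.md", "placeholder")
    archive.writestr(
        "repo-root/hetnet/tsv/hetionet-v1.0-edges.sif.gz",
        gzip.compress(b"a\tb\tc\n"),
    )

edges = read_gzipped_member(buffer.getvalue(), "hetnet/tsv/hetionet-v1.0-edges.sif.gz")
print(edges.decode())
```

Matching on the path suffix avoids depending on whatever top-level folder name the archive happens to use.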
