FigShare uses with external resources are kinda broken #45

Open
oxinabox opened this issue Jul 6, 2018 · 3 comments

@oxinabox
Owner

oxinabox commented Jul 6, 2018

This is a pathological case:
http://doi.org/10.6084/m9.figshare.5557801.v1
It is a document on Figshare whose file is an external resource (a GitHub repository).

I do not think this is worth fixing any time soon.
It is a fairly rare corner case, and fiddly to fix.

I am just noting it down for record keeping.

Wrong Outputs:

Figshare generator:

julia> generate(Figshare(), "http://doi.org/10.6084/m9.figshare.5557801.v1") |> println
WARNING: Generated registration block uses MD5 hashes, the MD5.jl package will be required.
register(DataDep(
    "Practices and documentation in the Open Source community",
    """
        Dataset: Practices and documentation in the Open Source community
        Website: https://figshare.com/articles/Practices_and_documentation_in_the_Open_Source_community/5557801
        Author: Chris Holdgraf, 0000-0002-8748-6546
        License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
        Date: 2017-10-31T20:44:31Z

        Responses and analysis code to a questionnaire asking open source developers about their practices in open source software, and their beliefs about documentation's role in the community.

        Please cite this work:
        Holdgraf, Chris; 0000-0002-8748-6546 (2017): Practices and documentation in the Open Source community. figshare. Dataset.
        if you use this in your research.
    """,
    Any["https://github.com/choldgraf/blog-documentation_questionnaire"],
    [(md5, )]
))

The hash is broken (the md5 tuple has no digest), and the URL points to a GitHub repository page, not to a downloadable file.
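A minimal sketch of how a generated URL like this one could be flagged. `is_github_repo_page` is a hypothetical helper, not something DataDepsGenerators.jl provides, and it only recognises the one pattern seen here (a bare `github.com/<owner>/<repo>` address); any other kind of landing page would need its own check.

```julia
# Hypothetical helper, not part of DataDepsGenerators.jl.
# Returns true when `url` has the shape https://github.com/<owner>/<repo>,
# i.e. a repository landing page rather than a file.
function is_github_repo_page(url::AbstractString)
    m = match(r"^https?://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$", url)
    return m !== nothing
end

# The URL the Figshare generator emitted above:
is_github_repo_page("https://github.com/choldgraf/blog-documentation_questionnaire")  # true

# A direct file URL of the kind the GitHub generator emits:
is_github_repo_page("https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/README.md")  # false
```

A check like this only detects the problem; it does not say which files should be downloaded instead.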

JSONLD_Web generator

julia> generate(JSONLD_Web(), "http://doi.org/10.6084/m9.figshare.5557801.v1") |> println
register(DataDep(
    "Practices and documentation in the Open Source community",
    """
        Dataset: Practices and documentation in the Open Source community
        Website: http://doi.org/10.6084/m9.figshare.5557801.v1
        Author: Chris Holdgraf, 0000-0002-8748-6546
        Date: missing
        License: missing

        Responses and analysis code to a questionnaire asking open source developers about their practices in open source software, and their beliefs about documentation's role in the community.
    """,
    String["https://github.com/choldgraf/blog-documentation_questionnaire"],
))

The URL is still wrong: it is the same GitHub repository page.

Incomplete outputs (normal):

DataCite Generator

julia> generate(DataCite(), "http://doi.org/10.6084/m9.figshare.5557801.v1") |> println
INFO: DataCite based generation can only generate partial registration blocks, as DataCite metadata does not (currently) include the URL to the resource. You will have to edit in the URL after generation.
register(DataDep(
    "Practices and documentation in the Open Source community",
    """
        Dataset: Practices and documentation in the Open Source community
        Website: https://doi.org/10.6084/m9.figshare.5557801.v1
        Author: Chris Holdgraf, 0000-0002-8748-6546
        License: https://creativecommons.org/licenses/by/4.0/
        Date: 2017

        Responses and analysis code to a questionnaire asking open source developers about their practices in open source software, and their beliefs about documentation's role in the community.

        Please cite this dataset:
        Holdgraf, C., & 0000-0002-8748-6546. (2017). Practices and documentation in the Open Source community [Data set]. Figshare. https://doi.org/10.6084/m9.figshare.5557801.v1

        if you use this in your research.
    """,
    String["PUT DOWNLOAD URL HERE"],

))

This is actually as good as DataCite ever is.

JSONLD_DOI generator

julia> generate(JSONLD_DOI(), "http://doi.org/10.6084/m9.figshare.5557801.v1") |> println
register(DataDep(
    "Practices and documentation in the Open Source community",
    """
        Dataset: Practices and documentation in the Open Source community
        Website: http://doi.org/10.6084/m9.figshare.5557801.v1
        Author: Chris Holdgraf, 0000-0002-8748-6546
        Date: 2017
        License: https://creativecommons.org/licenses/by/4.0

        Responses and analysis code to a questionnaire asking open source developers about their practices in open source software, and their beliefs about documentation's role in the community.
    """,
    missing,
))

This is fine; like DataCite, it is missing the URLs, as usual.

@SebastinSanty
Collaborator

SebastinSanty commented Jul 6, 2018

Maybe implement some sort of recursive download for GitHub-based repos (or any folder-based format, for that matter) in DataDeps.jl?

@oxinabox
Owner Author

oxinabox commented Jul 6, 2018

Maybe, yes.
Something like an (opt-in?) post-processing step that tries to generate metadata for the URLs that are being downloaded, and then takes the file URLs from that, or something.

The GitHub generator gets the files right, but has inferior metadata on creator etc.:

julia> generate(GitHub(), "https://github.com/choldgraf/blog-documentation_questionnaire") |> println
register(DataDep(
    "blog-documentation_questionnaire",
    """
        Dataset: blog-documentation_questionnaire
        Website: https://github.com/choldgraf/blog-documentation_questionnaire
        License: Unknown

        # blog-documentation_questionnaire
        A public repository for data + analyses for a blog post on documentation
    """,
    Any[Any["https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/data/contribs.csv", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/data/credit_enjoyment.csv"], Any["https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_contrib_type_bar.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_credit_enjoyment.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_diff_hist.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_docs_diff_compare.png", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/figures/plot_docs_usual_should.png"], "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/.gitignore", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/README.md", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/analysis.py", "https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/plot_figs.py"],

))
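A rough sketch of what that post-processing could look like: keep the descriptive fields from the richer source and take the file URLs from the source that actually lists files. The `Metadata` struct, `firstpresent` and `merge_metadata` are all hypothetical names made up for illustration; the package's internal representation may look quite different.

```julia
# Hypothetical stand-in for what a generator extracts; not the package's real type.
struct Metadata
    title::Union{String,Missing}
    author::Union{String,Missing}
    license::Union{String,Missing}
    urls::Vector{String}
end

# Take `a` unless it is missing, in which case fall back to `b`.
firstpresent(a, b) = ismissing(a) ? b : a

# Descriptive fields come from `primary` (falling back to `files`);
# the download URLs always come from `files`.
function merge_metadata(primary::Metadata, files::Metadata)
    return Metadata(
        firstpresent(primary.title, files.title),
        firstpresent(primary.author, files.author),
        firstpresent(primary.license, files.license),
        files.urls,
    )
end

# Values copied from the Figshare and GitHub outputs in this issue
# (only one of the GitHub file URLs is shown, for brevity).
figshare = Metadata(
    "Practices and documentation in the Open Source community",
    "Chris Holdgraf, 0000-0002-8748-6546",
    "CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)",
    ["https://github.com/choldgraf/blog-documentation_questionnaire"],
)
github = Metadata(
    "blog-documentation_questionnaire",
    missing,
    missing,
    ["https://cdn.rawgit.com/choldgraf/blog-documentation_questionnaire/1e145ef3d167d7fe8fd48434433069ae3d3f0193/data/contribs.csv"],
)

merged = merge_metadata(figshare, github)
```

Here `merged` keeps the Figshare title, author and license, but its `urls` are the GitHub file URLs. Deciding when to trigger the second lookup is the fiddly part, and this sketch does not address it.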

@oxinabox
Owner Author

oxinabox commented Oct 1, 2018

While I remember:
FigShare is actually breaking the spec, as per https://schema.org/DataDownload .

contentUrl is only for linking to the "Actual bytes of the media object".

They should be using url or mainEntityOfPage when linking to external sites like that.
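To illustrate the distinction, here is a small sketch of how a consumer could read a DataDownload entry if publishers followed the spec. `classify_links` is a hypothetical helper, and the `Dict` below is a hand-written stand-in for parsed JSON-LD, not FigShare's actual response.

```julia
# Hypothetical helper: split a schema.org DataDownload entry into
# a direct download link and a landing-page link.
function classify_links(entry::AbstractDict)
    # contentUrl is meant to point at the actual bytes of the file.
    download = get(entry, "contentUrl", missing)
    # url / mainEntityOfPage are the properties suited to external pages.
    landing = get(entry, "url", get(entry, "mainEntityOfPage", missing))
    return (download = download, landing = landing)
end

# Hand-written example of how the external repository could be described
# without using contentUrl for it.
entry = Dict(
    "@type" => "DataDownload",
    "url" => "https://github.com/choldgraf/blog-documentation_questionnaire",
)

links = classify_links(entry)
ismissing(links.download)  # true: there is nothing to download directly
```

With metadata shaped like this, a generator could tell that there is no direct file and leave the URL for the user to fill in, as the DataCite generator does.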
