Verify file integrity of downloaded files by hash sum #115

Open
6 of 16 tasks
skasberger opened this issue Mar 1, 2021 · 2 comments
Assignees
Labels
pkg:api api related activities prio:medium status:confirmed Is a valid issue and will be moved forward soon. type:feature New feature

Comments

@skasberger
Member

skasberger commented Mar 1, 2021

Verify the integrity of downloaded files by comparing their hash values. Mentioned in a call by @atrisovic.

Prepare

  • check Python hash implementation
    • md5
    • sha-1
    • sha-256
    • sha-512
  • check what has to be hashed from the response: can resp.content be hashed directly, or does a temporary file need to be saved before hashing? -> requests.Response.content
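
As a minimal sketch of the two checks above: Python's standard `hashlib` module covers all four algorithms, and in-memory bytes (such as `requests.Response.content`) can be hashed directly without writing a temporary file. The helper name `hash_bytes` and the `ALGORITHMS` tuple are hypothetical and only serve this illustration.

```python
import hashlib

# Names that hashlib accepts for the four algorithms listed above.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of in-memory bytes (hypothetical helper)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    payload = b"hello world"  # stand-in for resp.content
    for name in ALGORITHMS:
        print(name, hash_bytes(payload, name))
```

For very large files, hashing the full `content` in memory may be impractical; feeding chunks to `update()` gives the same digest.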

Implementation

  • Write tests
    • add argument to enable/disable this
    • add argument to pass checksum algorithm to be used: default = MD5, other = SHA-1, SHA-256 or SHA-512
  • Update code: get_datafile()
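import hashlib
from pyDataverse.api import NativeApi
# Connect to the Dataverse installation and download the data file.
api = NativeApi("https://data.aussda.at")
resp = api.get_datafile(3702)
# Pick one hash algorithm; MD5 is the proposed default.
m = hashlib.md5()
# m = hashlib.sha1()
# m = hashlib.sha256()
# m = hashlib.sha512()
# Hash the raw response bytes and print the hex digest.
m.update(resp.content)
print(m.hexdigest())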
  • Update Docs
  • Update Docstrings
  • Run pytest
  • Run tox
  • Run pylint
  • Run mypy
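
The two proposed arguments (a switch to enable or disable verification, and the checksum algorithm with MD5 as default) could be sketched as follows. This is not pyDataverse code: `verify_content`, `ChecksumMismatch`, and the parameter names are hypothetical, and the sketch assumes the expected checksum has already been obtained elsewhere, for example from the file's metadata.

```python
import hashlib
from typing import Optional

# Maps the algorithm labels used in this issue to hashlib names.
SUPPORTED = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


class ChecksumMismatch(Exception):
    """Hypothetical error raised when the computed and expected digests differ."""


def verify_content(
    content: bytes,
    expected: str,
    algorithm: str = "MD5",
    verify: bool = True,
) -> Optional[str]:
    """Check downloaded bytes against an expected hex digest.

    Returns the computed digest, or None if verification is disabled.
    """
    if not verify:
        return None
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    computed = hashlib.new(SUPPORTED[algorithm], content).hexdigest()
    # Hex digests are case-insensitive, so normalize before comparing.
    if computed != expected.strip().lower():
        raise ChecksumMismatch(f"{algorithm}: expected {expected}, got {computed}")
    return computed


if __name__ == "__main__":
    data = b"example file content"  # stand-in for resp.content
    known = hashlib.sha256(data).hexdigest()
    print(verify_content(data, known, algorithm="SHA-256"))
```

Raising on mismatch rather than returning False is one possible design choice; it makes a corrupted download hard to ignore, and the same cases (match, mismatch, disabled, unknown algorithm) map directly onto the tests to be written.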

Review

Follow-Ups

  • [ ]
@skasberger skasberger added type:feature New feature pkg:api api related activities prio:medium status:confirmed Is a valid issue and will be moved forward soon. labels Mar 1, 2021
@skasberger skasberger added this to the v0.4.0 milestone Mar 1, 2021
@skasberger skasberger self-assigned this Mar 1, 2021
@skasberger skasberger mentioned this issue Mar 14, 2021
35 tasks
@atrisovic
Member

Hey @skasberger!

This is how I checked for checksum errors in my previous project: https://github.com/atrisovic/dataverse-r-study/blob/0fc1c223ed0a0777633f94f9b7cad699003aaf7a/docker/download_dataset.py#L32-L39

I tried playing with the client to incorporate the code, but I think it's quite awkward to do it the same way.
I can still share the code if you think it would be helpful, but I think a different approach is needed x)

@pdurbin
Member

pdurbin commented Feb 14, 2024

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

@pdurbin pdurbin removed this from the v0.4.0 milestone Feb 14, 2024
3 participants