Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contributing to pyvista/vtk-data #1790

Closed
adam-grant-hendry opened this issue Nov 7, 2021 Discussed in #1789 · 14 comments
Closed

Contributing to pyvista/vtk-data #1790

adam-grant-hendry opened this issue Nov 7, 2021 Discussed in #1789 · 14 comments

Comments

@adam-grant-hendry
Copy link
Contributor

adam-grant-hendry commented Nov 7, 2021

Discussed in #1789

Originally posted by adam-grant-hendry November 7, 2021
If I want to create a pull request to add a dataset to pyvista/vtk-data, should I fork the repository? It contains lfs files and downloading counts against the maintainer's bandwidth.

Are there details on contributing to vtk-data? The Contributing.md only talks about licenses (i.e. don't add a data set for which there isn't a license).

The above may be a good opportunity to elaborate in Contributing.md

@banesullivan
Copy link
Member

I'm going to remove the LFS files stored in that repo and place them somewhere else (they're geo example files I put in there for another PyVista-based project). Unfortunately, the only way I've been able to find how to do this is to delete and recreate the whole repository removing history. I'm perfectly content doing this as the history of that repo really isn't important, though it may mean that our examples will go down for a bit as the files are re-uploaded and github's mirrors update.

@pyvista/developers, does anyone have a problem with me moving forward with this in the next few days?

@adeak
Copy link
Member

adeak commented Nov 28, 2021

I'm not personally affected, so just out of curiosity: why do the files need removing?

@banesullivan
Copy link
Member

why do the files need removing?

Honestly, I don't know... just when I was looking up how to remove git-lfs files from a repo, I stumbled up this: https://stackoverflow.com/questions/34579211/how-to-delete-a-file-tracked-by-git-lfs-and-release-the-storage-quota

There are a lot of mixed answers in there. First, I'm going to try to remove the files, uninstall LFS from the repo, and clean the history to see if this works. But there's some discussion about GitHub not actually purging the files and any clones will still count against the users LFS quota.

At the end of the day, it may just be easier to delete the repo and re-upload all non git-lfs files rather than spending an hour or so to get it right to preserve some history that really does not matter

@adeak
Copy link
Member

adeak commented Nov 28, 2021

So I take it the answer is "because they take up git LFS quota" 😆

For what it's worth there's a link to the authoritative reference: https://docs.github.com/en/repositories/working-with-files/managing-large-files/removing-files-from-git-large-file-storage

@adam-grant-hendry
Copy link
Contributor Author

adam-grant-hendry commented Nov 28, 2021

As long as it doesn't affect the maintainer, I'm fine with any approach 🙂. I have a PR to add a DICOM stack dataset thats around 65 MB, if I remember correctly, to

  • implement the VTKDicomReader in pyvista. and
  • add memory use and speed improvements to add_volume()

which is why I was asking

@banesullivan
Copy link
Member

We already have a small dicom dataset that we use for testing: https://github.com/pyvista/vtk-data/blob/master/Data/DICOM_KNEE.dcm and https://github.com/pyvista/vtk-data/tree/master/Data/DICOMDirectory

its the examples.download_knee() data

Further the vtkDICOMImageReader is implemented in PyVista:

'.dcm': _vtk.vtkDICOMImageReader,

@MatthewFlamm
Copy link
Contributor

The DICOMDirectory example would probably suffice for testing. This data set would provide a more interesting example.

We have added Reader classes which allow for more control over reading data files. This request was about adding more control. See https://github.com/pyvista/pyvista/blob/main/pyvista/utilities/reader.py.

@banesullivan
Copy link
Member

This request was about adding more control

Ah, forgive me - drive by commenting here to be honest. I just wanted to make sure you all were aware we had and use that data since there was talk in pyvista/pyvista-support#500 (comment) about adding a reader and a dataset

@MatthewFlamm
Copy link
Contributor

For this PR, I think it is about whether this additional data set would provide some advantage for testing or example usage. I'm not a DICOM user, so I'm not sure if the single file in the existing DICOMDirectory is sufficient.

@banesullivan
Copy link
Member

I'm not a DICOM user, so I'm not sure if the single file in the existing DICOMDirectory is sufficient.

Me either, I've assumed it is sufficient though. @adam-grant-hendry, would this new dataset improve our examples/tests to make sure we sufficiently cover what you hope to do?

@adam-grant-hendry
Copy link
Contributor Author

adam-grant-hendry commented Nov 29, 2021

would this new dataset improve our examples/tests to make sure we sufficiently cover what you hope to do?

@banesullivan Yes, this will. In addition to porting over the VTKDICOMReader and add the ability to read a directory of files, there are two things my update to add_volume() does:

  1. Fixes a memory leak that causes RAM to explode and pyvista to crash, and
  2. Loads a volume at a speed comparable to paraview

I can run a memory profile before and after the commit that implements the change in the PR to show the improvement, both in memory usage and load time.

(FYI, need to test the ability for the DICOMReader to open multiple files)

@adam-grant-hendry
Copy link
Contributor Author

adam-grant-hendry commented Nov 29, 2021

However, ignoring the multi-file reader, I could probably show the memory and time improvements with the knee DICOM example

@akaszynski
Copy link
Member

Can this be closed?

@adam-grant-hendry
Copy link
Contributor Author

Can this be closed?

I think so, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants