Remove Git LFS files #40

Closed
GemmaTuron opened this issue Apr 9, 2024 · 4 comments

Comments

@GemmaTuron
Member

We have a lot of legacy files tracked in Git LFS, which adds up to a history of almost 2 GB.
We can clean this up by rewriting the commit history (otherwise the LFS pointers are never actually deleted).

Steps:

git lfs ls-files  # list the files currently tracked by Git LFS
git lfs migrate export --include-ref=main --include="requirements.txt"  # take requirements.txt out of Git LFS by rewriting the history of main
git push origin main -f  # force-push the rewritten history

We can remove most Git LFS files from tracking in the same fashion, which saves a lot of space for us and for users when cloning the repo.
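To apply the same step to several files at once, the command line can be assembled programmatically. The following is a minimal sketch, not something taken from this repository: the helper name `build_export_command` is hypothetical, and it relies on `--include` accepting a comma-separated list of paths or patterns, so it would not work for paths that themselves contain commas. It only builds the command; nothing is executed.

```python
import shlex


def build_export_command(paths, ref="main"):
    """Build the argv for one `git lfs migrate export` call (hypothetical helper).

    `paths` is a list of file paths or glob patterns to take out of Git LFS.
    """
    if not paths:
        raise ValueError("at least one path or pattern is required")
    # --include takes a comma-separated list of paths or patterns.
    include = ",".join(paths)
    return [
        "git", "lfs", "migrate", "export",
        f"--include-ref={ref}",
        f"--include={include}",
    ]


cmd = build_export_command(["requirements.txt", "zairachem/data/atom_pols.txt"])
# Print a shell-safe version for review before running it by hand.
print(shlex.join(cmd))
```

Printing the command instead of running it keeps the history rewrite and the subsequent force push as deliberate manual steps.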
These are the current git lfs files:

feee072247 * zairachem/data/atom_pols.txt
4c79cd109b * zairachem/tools/fpsim2/FPSim2/docs/_sources/index.rst.txt
715b45b3e0 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.io.backends.rst.txt
7ca37e817c * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.io.rst.txt
da1e5a9c83 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/FPSim2.rst.txt
33f87ccb8e * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/modules.rst.txt
4e300b8af9 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/create_fp_db.rst.txt
a32031e60f * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/gpu_sim.rst.txt
2ac4ca5bcb * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/install.rst.txt
5c152d2d9c * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/limitations.rst.txt
77545a4d9b * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/sim.rst.txt
b0a14ded29 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/sim_matrix.rst.txt
d18b20bd9d * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/subs.rst.txt
929d824e27 * zairachem/tools/fpsim2/FPSim2/docs/_sources/source/user_guide/tversky.rst.txt
0dd174b621 * zairachem/tools/fpsim2/FPSim2/tests/data/test.h5
527d498610 * zairachem/tools/fpsim2/data/reference_library.csv
92ca86fed2 * zairachem/tools/fpsim2/data/reference_library.h5
1571cfaac2 * zairachem/tools/ghost/ghostml/test_data/chembl3371_testing_data.pkl
5feeae0837 * zairachem/tools/mollib/virtual_libraries/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar.txt
1abc0e3b08 * zairachem/tools/mollib/virtual_libraries/data/chembl24_cleaned_unique_canon.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/data/my_molecules.txt
5feeae0837 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/1_140_x0.txt
801a3aafb8 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/data_tr.txt
835b3235f5 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/data_val.txt
97fbc7d936 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/desc.pkl
14d2f3e81d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/fp.pkl
1d5beca471 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/generic_scaffolds.txt
4d8400f42a * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/idx_tr.pkl
bfd663a77f * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/idx_val.pkl
53c4615f48 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/scaf.pkl
27fdbc2468 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/scaffolds.txt
ec0f78f70d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/1_140_x10.txt
8f55532089 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/data_tr.txt
fb1816171d * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/data_val.txt
6101313bf4 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/desc.pkl
a453a0541e * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/fp.pkl
af2e34e529 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/generic_scaffolds.txt
588d6b7f14 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/idx_tr.pkl
329fc06062 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/idx_val.pkl
3bdbcd6fbb * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/scaf.pkl
157c174924 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/chembl24_cleaned_unique_canon/1_140_x10/scaffolds.txt
866154deb5 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/1_140_x10.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/data_tr.txt
3c9e79bb69 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/data_val.txt
b3cbfab822 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/desc.pkl
ad3a6af4cb * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/fp.pkl
8f6ed5c635 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/generic_scaffolds.txt
f4909f12da * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/idx_tr.pkl
7225963e04 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/idx_val.pkl
3bb58bee46 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/scaf.pkl
3f1fc605b1 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/my_molecules/1_140_x10/scaffolds.txt
1f9074e73e * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/02.h5
171d3a40c0 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/04.h5
c3392cb261 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/06.h5
051838d651 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/08.h5
8d721c6f1c * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/10.h5
68d1e33663 * zairachem/tools/mollib/virtual_libraries/experiments/results/my_molecules/models/history.pkl
de7a6e875d * zairachem/tools/mollib/virtual_libraries/models/c24_augmentationx10_minlen1_maxlen140.h5
2aacc3e996 * zairachem/tools/mollib/virtual_libraries/models/molecules_start_0.7.txt
5eafefebe7 * zairachem/tools/mollib/virtual_libraries/src/python/fcd/model_FCD_all.h5
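
A listing like the one above can be summarised before deciding what to remove. The sketch below assumes the `git lfs ls-files` line layout shown here (short object ID, a marker character, then the path) and uses four lines copied from the listing as sample input. The function names and the grouping rule (first three path components under `zairachem/tools/`, otherwise the first two) are illustrative choices only.

```python
from collections import Counter, defaultdict

# Four lines copied from the listing above, used as sample input.
SAMPLE = """\
feee072247 * zairachem/data/atom_pols.txt
0dd174b621 * zairachem/tools/fpsim2/FPSim2/tests/data/test.h5
5feeae0837 * zairachem/tools/mollib/virtual_libraries/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar.txt
5feeae0837 * zairachem/tools/mollib/virtual_libraries/experiments/results/data/MEGx_Release_180901_All_4557_length_filtered_wo_sugar/1_140_x0/1_140_x0.txt
"""


def parse_ls_files(text):
    """Split each non-empty line into (oid, marker, path)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two spaces only, so paths containing spaces survive.
        oid, marker, path = line.split(" ", 2)
        entries.append((oid, marker, path))
    return entries


def count_by_tool(entries):
    """Count tracked files per tool directory."""
    counts = Counter()
    for _oid, _marker, path in entries:
        parts = path.split("/")
        depth = 3 if len(parts) > 2 and parts[1] == "tools" else 2
        counts["/".join(parts[:depth])] += 1
    return counts


def shared_oids(entries):
    """Return object IDs that appear under more than one path."""
    paths = defaultdict(list)
    for oid, _marker, path in entries:
        paths[oid].append(path)
    return {oid: found for oid, found in paths.items() if len(found) > 1}


entries = parse_ls_files(SAMPLE)
print(count_by_tool(entries))
print(sorted(shared_oids(entries)))
```

In the full listing, a repeated object ID such as `5feeae0837` indicates that the same content is tracked under more than one path.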
@GemmaTuron
Member Author

@miquelduranfrigola and @DhanshreeA I'd like to do this to remove the heavy parts of ZairaChem that are not needed, but I need your confirmation that you are OK with it.

@DhanshreeA
Member

If the files are truly legacy and not needed by any tool (i.e. fpsim and mollib, as far as I can see here), then I don't see why we can't remove them.

@GemmaTuron
Member Author

OK, these files need to be removed one by one and the entire Git history rewritten. We are in a situation similar to the one we encountered with Chem Sampler: it does not make much sense to do this one by one, given that we want to refactor all of the code. The problem is that there is so much history and so many legacy files that cloning will always take forever unless we delete all the history. @miquelduranfrigola what do you suggest?

FYI @DhanshreeA, deleting the files does not eliminate them from the Git LFS registry, which is what is so annoying. The pointers remain there forever unless you rewrite the entire commit history (>200 commits, so it takes forever).
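
For context on what those pointers are: Git LFS commits a small text file in place of each tracked file, so every historical commit that touched the file still holds one. Below is a minimal sketch of reading such a pointer, assuming the v1 pointer layout (`version`, `oid`, `size` lines). The function name `parse_lfs_pointer` is hypothetical, and the hash and size in the example are illustrative values, not taken from this repository.

```python
def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into a dict; return None if it is not a pointer."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line is "key value"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Illustrative pointer content (hash and size are placeholders).
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)

pointer = parse_lfs_pointer(EXAMPLE_POINTER)
print(pointer["oid_algo"], pointer["size"])
```

A check like this could be run over blobs from older commits to see which paths still carry pointers after a file has been deleted from the working tree.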

@GemmaTuron
Member Author

OK, I have deleted Mollib entirely and removed the files that were not needed from FPSim2 (including the reference libraries).

We can close this issue for the moment.


2 participants