Split IO #2 #2972

Merged
merged 132 commits into from Jul 22, 2022

Conversation

francisco-dlp
Member

@francisco-dlp francisco-dlp commented Jul 5, 2022

Description of the change

Split HyperSpy's IO plugins into a separate package RosettaSciIO. See #1978 for the related discussion.

This supersedes #2174

Progress of the PR

How to test it

To test it, simply install rosettasciio (currently at hyperspy's root folder, it'll move to its own repository once there are no more code exchanges between hyperspy and RosettaSciIO) in addition to hyperspy. Everything should work as usual.

Additionally, it is possible to import the readers directly from RosettaSciIO as follows:

from rsciio.msa import api
api.file_reader("your_msa_file.msa")
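For anyone trying this out, here is a minimal sketch of how the reader output could be inspected. It assumes that `file_reader` returns a list of dictionaries with a `"data"` array and an `"axes"` list of dictionaries; that layout is not stated in this PR, so treat it as an assumption. The helper name `summarize` is hypothetical.

```python
def summarize(signal_dicts):
    """Return one (shape, axis_names) tuple per dictionary in the reader output.

    Assumes each entry has a "data" object with a ``shape`` attribute and an
    optional "axes" list of dictionaries carrying a "name" key.
    """
    summary = []
    for entry in signal_dicts:
        data = entry["data"]
        # Fall back to an empty shape if the data object has no ``shape``.
        shape = tuple(getattr(data, "shape", ()))
        axis_names = [axis.get("name") for axis in entry.get("axes", [])]
        summary.append((shape, axis_names))
    return summary


# Usage (requires rosettasciio; the file name is a placeholder):
#   from rsciio.msa import api
#   print(summarize(api.file_reader("your_msa_file.msa")))
```

The helper only touches plain dictionaries, so it can be tried without HyperSpy installed.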

@ericpre
Member

ericpre commented Jul 21, 2022

Considering that there is not much left to do, I would prefer to finish in the next few days. After that, I am happy to give a go at splitting! :)

@jlaehne
Contributor

jlaehne commented Jul 21, 2022

> Considering that there is not much left to do, I would prefer to finish in the next few days. After that, I am happy to give a go at splitting! :)

+1 - would be great if you could do so!

@pietsjoh will be starting to contribute new readers in the next weeks and it would be great to do so directly to the new repo.

P.S.: I am available until Tuesday to review the remaining steps.

@francisco-dlp francisco-dlp modified the milestones: v1.8, v2.0 HyperSpy Split Jul 21, 2022
@ericpre
Member

ericpre commented Jul 21, 2022

The remaining tasks are done in francisco-dlp#58.

@francisco-dlp, just in case, we can merge and push to your branch; would you have time to review and merge francisco-dlp#58? The changes are mostly trivial.

Worst-case scenario, we create another branch but it would be better to keep it all in this PR!

@jlaehne jlaehne linked an issue Jul 22, 2022 that may be closed by this pull request
@jlaehne
Contributor

jlaehne commented Jul 22, 2022

Just realized that

  1. We are missing a changelog entry for this PR.
  2. v1.7.1 was not automatically merged from RnP into RnM and RnM is still on 1.7.1.dev0 - including the changelog.

I would propose merging this one nevertheless and sorting out anything related to the v1.7.1 merge into RnM in a separate PR, possibly including the changelog entry of this PR (which should actually be about the split and not about the preparation of the split).

@ericpre
Member

ericpre commented Jul 22, 2022

Yes and there is some tidying up needed on the RELEASE_next_major branch too!

@ericpre ericpre merged commit 076e88d into hyperspy:RELEASE_next_major Jul 22, 2022
@ericpre
Member

ericpre commented Jul 23, 2022

For the record, here are the steps that I took to split the code into a separate repository:

Command to split the repository

`git filter-repo --path rosettascio/ --path rosettasciio/ --path hyperspy/io_plugins/ --path hyperspy/misc/io/ --path hyperspy/tests/io/ --path doc/user_guide/io.rst --path-rename rosettasciio/: --force`

I needed to add all relevant paths to keep the history after folder renames, etc.
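As a rough illustration of what `--path-rename rosettasciio/:` does to the kept paths, here is a simplified model of the rename as a prefix substitution. This is a sketch of the behaviour as I understand it, not git-filter-repo's actual code; the function name and the example paths are illustrative.

```python
def apply_path_rename(path, old, new):
    """Replace a leading ``old`` prefix of ``path`` with ``new``.

    Simplified model of ``--path-rename OLD:NEW``: paths that do not start
    with ``old`` are returned unchanged.
    """
    if path.startswith(old):
        return new + path[len(old):]
    return path


# An empty NEW moves the folder content to the repository root.
print(apply_path_rename("rosettasciio/rsciio/msa/api.py", "rosettasciio/", ""))
# Paths outside the renamed folder keep their location.
print(apply_path_rename("hyperspy/io_plugins/msa.py", "rosettasciio/", ""))
```

This is why the other `--path` arguments keep their `hyperspy/...` locations in the filtered history: only the `rosettasciio/` prefix is rewritten.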

Tidy up left over code

Since it was necessary to also filter doc/user_guide/io.rst to keep the history, I removed it in a commit.

Clean some large files from the history

We had some large files which were added and then removed during a pull request, and never cleaned up from the history. It was possible to find them using the following command:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

ref: https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history
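For those who prefer Python to the `sed`/`sort`/`cut` stages, here is a minimal sketch of the same filtering. It assumes the input is the text produced by the first two stages of the pipeline (`git rev-list --objects --all | git cat-file --batch-check=...` with the format string used here); the function name `largest_blobs` and the `top` parameter are hypothetical.

```python
def largest_blobs(batch_check_output, top=10):
    """Return the ``top`` largest blobs as (size_in_bytes, short_sha, path).

    Each input line is expected as '<type> <sha> <size> <path>', i.e. the
    '%(objecttype) %(objectname) %(objectsize) %(rest)' format.
    """
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        _, sha, size, path = parts
        blobs.append((int(size), sha[:12], path))
    # Largest first, unlike the ascending order of the shell pipeline.
    blobs.sort(reverse=True)
    return blobs[:top]
```

Sizes are kept in bytes rather than converted to IEC units, so the result can be filtered or summed directly.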

The largest files before history cleaning were:

3c272c431de2  8.7MiB rsciio/tests/emd_files/fei_emd_files.zip
b4dce2fe8c67   10MiB hyperspy/tests/io/sur_data/test_spectral_map_compressed.sur
5fdd25abfa0b   11MiB hyperspy/tests/io/edax_files/spd_map.spd.gz
f89958aa88ac   11MiB hyperspy/tests/io/edax_files.zip
48e1eba14394   21MiB hyperspy/tests/io/nexus_files/file2.nxs
b949accc2fcf   32MiB hyperspy/tests/io/sur_data/test_spectral_map.sur
4a8123661a42   35MiB hyperspy/tests/io/edax_files.zip
502c66aed3c6   35MiB rsciio/tests/edax_files.zip
e5ad43b77116   56MiB hyperspy/tests/io/nexus_files/file1.nxs

And they are now:

3c272c431de2  8.7MiB rsciio/tests/emd_files/fei_emd_files.zip
5fdd25abfa0b   11MiB hyperspy/tests/io/edax_files/spd_map.spd.gz
f89958aa88ac   11MiB hyperspy/tests/io/edax_files.zip
4a8123661a42   35MiB hyperspy/tests/io/edax_files.zip
502c66aed3c6   35MiB rsciio/tests/edax_files.zip
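To make the difference between the two listings explicit, here is a small sketch that compares them by blob hash. The data are copied from the listings in this comment; the helper names `parse` and `removed_blobs` are hypothetical.

```python
BEFORE = """\
3c272c431de2  8.7MiB rsciio/tests/emd_files/fei_emd_files.zip
b4dce2fe8c67   10MiB hyperspy/tests/io/sur_data/test_spectral_map_compressed.sur
5fdd25abfa0b   11MiB hyperspy/tests/io/edax_files/spd_map.spd.gz
f89958aa88ac   11MiB hyperspy/tests/io/edax_files.zip
48e1eba14394   21MiB hyperspy/tests/io/nexus_files/file2.nxs
b949accc2fcf   32MiB hyperspy/tests/io/sur_data/test_spectral_map.sur
4a8123661a42   35MiB hyperspy/tests/io/edax_files.zip
502c66aed3c6   35MiB rsciio/tests/edax_files.zip
e5ad43b77116   56MiB hyperspy/tests/io/nexus_files/file1.nxs
"""

AFTER = """\
3c272c431de2  8.7MiB rsciio/tests/emd_files/fei_emd_files.zip
5fdd25abfa0b   11MiB hyperspy/tests/io/edax_files/spd_map.spd.gz
f89958aa88ac   11MiB hyperspy/tests/io/edax_files.zip
4a8123661a42   35MiB hyperspy/tests/io/edax_files.zip
502c66aed3c6   35MiB rsciio/tests/edax_files.zip
"""


def parse(listing):
    """Map short blob hash -> (human-readable size, path)."""
    entries = {}
    for line in listing.splitlines():
        sha, size, path = line.split(None, 2)
        entries[sha] = (size, path)
    return entries


def removed_blobs(before, after):
    """List (sha, size, path) for blobs present before but not after."""
    remaining = set(parse(after))
    return [
        (sha, size, path)
        for sha, (size, path) in parse(before).items()
        if sha not in remaining
    ]


for sha, size, path in removed_blobs(BEFORE, AFTER):
    print(sha, size, path)
```

Running it lists the four blobs that were dropped from the history: the two `.sur` files and the two `.nxs` files.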

I kept a local copy of the git repository from before the cleaning, in case something was incorrect! After hyperspy 2.0 is released, we should possibly do the same to the hyperspy repository.

@jlaehne
Contributor

jlaehne commented Jul 23, 2022

Thanks Eric for finishing it up!

jlaehne added a commit that referenced this pull request Aug 24, 2022
Follow up of #2972 and tidy up left up rosettasciio
@ericpre ericpre mentioned this pull request Aug 25, 2022
57 tasks
@ericpre ericpre mentioned this pull request Jul 21, 2023
7 tasks
Development

Successfully merging this pull request may close these issues.

Making separate IO-library
5 participants