Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate test files by modules and collect test files csv/cif into folders #3746

Merged
merged 112 commits into from
Apr 19, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Apr 11, 2024

Summary

  • Collect test files csv/cif into their own folders
  • Separate json files by modules
  • Relocate test files by modules

Tasks

  • Separate folders by modules
  • Separate json files by modules
  • Many of the files under molecules are specific to io.qchem, but there is terribly strong interdependence need to be clarified
  • Some test files are located inside test dir

Important: removed unused test files

Side note

  • The re pattern used for renaming json file path in VS code: source (f"\{TEST_FILES_DIR\})(/[^/]+\.json(\.gz)?"), target $1/json$2, reference here
  • The re pattern for cif: just replace above json with cif

Copy link

coderabbitai bot commented Apr 11, 2024

Note

Reviews Paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@DanielYang59
Copy link
Contributor Author

@coderabbitai ignore

@DanielYang59 DanielYang59 marked this pull request as ready for review April 11, 2024 11:47
@DanielYang59
Copy link
Contributor Author

@janosh Can you please review this? Thank you!

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about instead of subfolders based on filetype, we split test files by the corresponding pymatgen module?

so a lot of the tests/files/json would go in tests/files/entries instead if they contain serialized phase diagram or Pourbaix entries.
CIF files and some JSON files would go in tests/files/structures.

other folders could be tests/files/electronic_structure (or just tests/files/electronic) for band structures and DOS, tests/files/phonons for phonon BS/DOS, tests/files/lobster (or perhaps tests/files/io/lobster), etc.

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Apr 11, 2024

how about instead of subfolders based on filetype, we split test files by the corresponding pymatgen module?

It would be good to make the test file folders follow the same hierarchy as the code, but I'm afraid there would be huge overlap (and confusion) at least at this stage? (Currently we don't have a one-to-one mapping between test and test files) Multiple module might require access to the same format of file (or even the same file). For example electronic_structure might require DOSCAR from vasp, how can we decide where DOSCAR should go then (not sure if it truly did, just for example)?

On the other hand, separating files by format seems much easier to access for developers? And would encourage test file re-use (If we group them by modules, people would tend to have a set of test files for each module, and test files from other module may not be re-used). Though people might need to access multiple test file folders if multiple formats are involved within the test.

@janosh
Copy link
Member

janosh commented Apr 11, 2024

On the other hand, separating files by format seems much easier to access for developers

hmmm... not sure. how does it become easier? the file extension already makes searching by file type easy. i just type .json in the file picker to get a list

Screenshot 2024-04-11 at 14 21 23

putting it in a JSON folder doesn't confer additional meaning on what the file was meant to be used for.
putting it in tests/files/phonons gives more information. that doesn't mean a phonon test file can't be used for tests elsewhere. we just have to decide on a location. e.g. in your DOSCAR example, i think it should definitely go in tests/files/io/vasp

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Apr 11, 2024

how does it become easier? the file extension already makes searching by file type easy. i just type .json in the file picker to get a list. putting it in a JSON folder doesn't confer additional meaning on what the file was meant to be used for. putting it in tests/files/phonons gives more information.

I think my intention for cleaning up test files is not to add more information, but to clean up the "root directory" of test file folder 😄 . Currently we have quite a lot json and cif files lie flat under the test file dir, and it would flood ls command (of course if you do a more explicit search there would be no difference).

we just have to decide on a location. e.g. in your DOSCAR example, i think it should definitely go in tests/files/io/vasp

Yes that what I'm concerned about, others would need to "learn" about this agreement then. People would not need to read a positioning_convention file or README file to know cif files would reside in the cif dir I assume?

@DanielYang59
Copy link
Contributor Author

I would say it's great to make test files follow the same hierarchy as the test in the long term. But that would take more effort and is not what this PR is intended for (just want to avoid having everything lie flat under the root of test file dir).

Though I would be happy to help push towards this in the long term :)

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the cif/ folder could make sense but i'm less sure about the JSON folder since it's quite a lot more generic, containing files of very different purpose. how about we leave the cif/ folder as you currently have it but split the json/ based on what pymatgen module the test file belongs to? that way we minimize git diff from repeatedly moving any files around in subsequent PRs

@janosh janosh added tests Issues with or changes to the pymatgen test suite housekeeping Moving around or cleaning up old code/files labels Apr 13, 2024
@DanielYang59
Copy link
Contributor Author

how about we leave the cif/ folder as you currently have it but split the json/ based on what pymatgen module the test file belongs to?

That's a great idea! Also sounds like a good start towards organizing test files by modules. I would get my hands on this next week.

@DanielYang59 DanielYang59 changed the title Collect test files (csv/json/cif) into folders Collect test files csv/cif into folders and separate json by modules Apr 15, 2024
@DanielYang59 DanielYang59 changed the title Collect test files csv/cif into folders and separate test files by module Separate test files by modules and collect test files csv/cif into folders Apr 19, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review April 19, 2024 12:01
@DanielYang59
Copy link
Contributor Author

DanielYang59 commented Apr 19, 2024

@janosh Please review this as well. I wasn't thinking that we could just do everything in one PR, but luckily here we are :) All test files sorted by module.

A side note being: the test files for io.qchem seem very large (146 MB alone, total size 554 MB), might be good to compress those as well.

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, huge PR 👍

Github crashes when i scroll down the list of changes. but tests are passing so this looks safe to merge 😄

@janosh janosh merged commit 2d008e0 into materialsproject:master Apr 19, 2024
22 checks passed
@DanielYang59
Copy link
Contributor Author

Thanks for reviewing!

Yes should be safe. No functional changes just relocating test files.

Several test files removed with caution though (did global search before removing any), but still could be used implicitly by those disabled tests, so I tagged them in the summary section in case we need to recover any.

Do we need to run garbage collection or something? Current clone size seems over 800 MB?

@DanielYang59 DanielYang59 deleted the cleanup-test-files branch April 19, 2024 12:18
@janosh
Copy link
Member

janosh commented Apr 19, 2024

Do we need to run garbage collection or something? Current clone size seems over 800 MB?

it used to be ~1GB so 800MB is already an improvement. don't think there's such a thing as garbage collection for git. history just keeps growing. you can decrease clone time by only getting the last commit

git clone https://github.com/materialsproject/pymatgen --depth 1

@DanielYang59
Copy link
Contributor Author

it used to be ~1GB so 800MB is already an improvement.

Great to know!

don't think there's such a thing as garbage collection for git. history just keeps growing.

There actually is https://git-scm.com/docs/git-gc. Not sure if it would be run automatically or not.

you can decrease clone time by only getting the last commit

git clone https://github.com/materialsproject/pymatgen --depth 1

Thanks for the tip. Just saying though, I don't clone very often so it's not a problem :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping Moving around or cleaning up old code/files tests Issues with or changes to the pymatgen test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants