Skip to content

Refactored config file loading logic for easier default config usage.#149

Merged
AdrianDAlessandro merged 23 commits intomainfrom
default_config_refactor
Mar 25, 2025
Merged

Refactored config file loading logic for easier default config usage.#149
AdrianDAlessandro merged 23 commits intomainfrom
default_config_refactor

Conversation

@Thomas-Rowlands
Copy link
Copy Markdown
Collaborator

@Thomas-Rowlands Thomas-Rowlands commented Mar 18, 2025

Description

Config files are now loaded separately depending on whether a default config is used or a new, user-made custom one.

Fixes #143

This new way of loading a default config file can be done like this:

from autocorpus.Autocorpus import Autocorpus
from autocorpus.configs.default_config import DefaultConfig

ac = Autocorpus(base_dir="my_base_dir", main_text="path/to/html/file")
ac.config = DefaultConfig.load_config(DefaultConfig.LEGACY_PMC)
ac.process_files("output/path")

For custom config files, users can do the following:

from autocorpus.Autocorpus import Autocorpus

ac = Autocorpus(base_dir="my_base_dir", main_text="path/to/html/file")
ac.config = ac.read_config("path/to/my_config.json")
ac.process_files("output/path")

Type of change

  • Documentation (non-breaking change that adds or improves the documentation)
  • New feature (non-breaking change which adds functionality)
  • Optimization (non-breaking, back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Key checklist

  • All tests pass (eg. pytest)
  • The documentation builds and looks OK (eg. mkdocs)
  • Pre-commit hooks run successfully (eg. pre-commit run --all-files)

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added or an issue has been opened to tackle that in the future. (Indicate issue here: # (issue))

@Thomas-Rowlands Thomas-Rowlands marked this pull request as ready for review March 18, 2025 15:57
@Thomas-Rowlands Thomas-Rowlands self-assigned this Mar 18, 2025
Copy link
Copy Markdown
Collaborator

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! There's a missing call to process_files() needed and I'd replace that assert statement, but other than that I think it's all ok. I've made some other suggestions, but I'll leave it up to you whether you think they're worth it or not.

Comment thread autocorpus/Autocorpus.py Outdated
Comment thread autocorpus/__main__.py Outdated
Comment thread autocorpus/configs/default_config.py Outdated
Comment thread autocorpus/configs/default_config.py Outdated
Comment thread autocorpus/configs/default_config.py Outdated
Copy link
Copy Markdown
Collaborator

@AdrianDAlessandro AdrianDAlessandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall pretty good. Alex made some relevant comments that should be addressed

Comment thread autocorpus/Autocorpus.py
Comment thread autocorpus/__main__.py Outdated
Comment thread autocorpus/configs/default_config.py Outdated
Comment thread autocorpus/__main__.py
@Thomas-Rowlands
Copy link
Copy Markdown
Collaborator Author

Re-requested reviews as plenty of requested changes have been implemented but one may have been missed, I've put a refactor of the DefaultConfig class in which outdated some of the issues

Copy link
Copy Markdown
Collaborator

@AdrianDAlessandro AdrianDAlessandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good overall! I've made one suggestion about making the type of config consistent, which should also simplify the Enum. Will be happy to merge after that change

Comment thread autocorpus/__main__.py Outdated
Copy link
Copy Markdown
Collaborator

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @AdrianDAlessandro's comment, but aside from that it looks good!

Comment thread autocorpus/configs/default_config.py Outdated
@Thomas-Rowlands
Copy link
Copy Markdown
Collaborator Author

load_config is now called to retrieve the config data as a dict, with the class docs adjusted

Copy link
Copy Markdown
Collaborator

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@alexdewar
Copy link
Copy Markdown
Collaborator

Oh, except that ruff is complaining about missing docstrings. You'll need to fix that before it'll let you merge.

Tip: you can install a hook locally to check this stuff before you commit if you run pre-commit install.

@Thomas-Rowlands
Copy link
Copy Markdown
Collaborator Author

Oh, except that ruff is complaining about missing docstrings. You'll need to fix that before it'll let you merge.

Tip: you can install a hook locally to check this stuff before you commit if you run pre-commit install.

Just tried my local ruff and its not reporting what is listed in the pre-commit.ci check on GitHub, so looks like I have a config issue. I've made changes based on the listed problems in the ruff GitHub action so far

Comment thread autocorpus/configs/default_config.py Outdated
Thomas-Rowlands and others added 2 commits March 25, 2025 11:15
Co-authored-by: Adrian D'Alessandro <a.dalessandro@imperial.ac.uk>
Copy link
Copy Markdown
Collaborator

@AdrianDAlessandro AdrianDAlessandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I've enabled auto-merge and updated the branch, so this will merge once the checks have passed.

@AdrianDAlessandro AdrianDAlessandro merged commit 8ff3824 into main Mar 25, 2025
14 checks passed
@AdrianDAlessandro AdrianDAlessandro deleted the default_config_refactor branch March 25, 2025 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No way to use bundled config files

3 participants