refactor: Rename, reorganize schema module #1963

Merged on May 10, 2022 (35 commits)

@ksbrar ksbrar commented Apr 26, 2022

Redo of #1936 (which was reverted by 6d8474a). This ends up being a significant reorganization:

Changes

  • Moves ludwig.utils.schema into ludwig.marshmallow, then renames the ludwig.marshmallow module to ludwig.schema and ludwig/marshmallow/marshmallow_schema_utils.py to ludwig/schema/utils.py. For convenience, the functionality of the current schema.py (namely get_schema(), which returns the full Ludwig schema, and validate_config()) now lives in ludwig/schema/__init__.py, which makes imports elsewhere more readable.
  • Moves TrainerConfig and get_trainer_jsonschema() into ludwig/schema/trainer.py.
  • Moves all optimizer configs out of ludwig/modules/optimization_modules.py and into ludwig/schema/optimizers.py.
  • Moves sequence_encoder_registry out of ludwig/combiners/combiners.py, converts it into a full-fledged Registry(), and places it in ludwig/encoders/registry.py.
  • Moves all combiner configs out of ludwig/combiners/combiners.py and into a new ludwig/schema/combiners/ directory (module) with one file per combiner. Again, to make things more readable elsewhere, __init__.py is set up so that a user can import directly from this submodule (e.g. a user can write from ludwig.schema.combiners import ConcatCombinerConfig).
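
For readers unfamiliar with the registry pattern mentioned in the list above, here is a minimal sketch of a name-to-object registry. This is an illustration only: `SimpleRegistry`, its `register` decorator, and the demo encoder classes are hypothetical names, and Ludwig's actual `Registry` may look different.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class SimpleRegistry(Generic[T]):
    """Hypothetical name-to-object registry (not Ludwig's actual Registry)."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def register(self, name: str) -> Callable[[T], T]:
        """Return a decorator that stores the decorated object under `name`."""

        def wrap(obj: T) -> T:
            if name in self._items:
                raise KeyError(f"'{name}' is already registered")
            self._items[name] = obj
            return obj

        return wrap

    def __getitem__(self, name: str) -> T:
        return self._items[name]

    def __contains__(self, name: str) -> bool:
        return name in self._items

    def names(self) -> List[str]:
        """Return all registered names in sorted order."""
        return sorted(self._items)


# Hypothetical usage: the encoder names and classes below are placeholders.
demo_encoder_registry: SimpleRegistry[type] = SimpleRegistry()


@demo_encoder_registry.register("rnn")
class DemoRNNEncoder:
    pass


@demo_encoder_registry.register("transformer")
class DemoTransformerEncoder:
    pass


print(demo_encoder_registry.names())
```

Compared with a bare dict, a dedicated class gives one place to reject duplicate names and to add lookup helpers later.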

Misc.

  • Small change to train_with_backend integration test.
  • Some docstring/comment changes.

Overview:

  • The structure of the new schema module is as follows:
ludwig.schema               <-- Meant to contain all schemas, utilities, helpers related to describing and validating Ludwig configs.
├── __init__.py             <-- Contains fully assembled Ludwig schema (`get_schema()`), `validate_config()` for YAML validation, and all "top-level" schema functions.
├── utils.py                <-- An extensive set of marshmallow-related fields, methods, and schemas that are used elsewhere in Ludwig.
├── trainer.py              <-- Contains `TrainerConfig` and `get_trainer_jsonschema()`.
├── optimizers.py           <-- Contains every optimizer config (e.g. `SGDOptimizerConfig`, `AdamOptimizerConfig`, etc.) and related marshmallow fields/methods.
└── combiners/
    ├── __init__.py         <-- Imports for each combiner config file (making imports elsewhere more convenient).
    ├── utils.py            <-- New location of `combiner_registry`, `get_combiner_jsonschema()`, `get_combiner_conds()`
    ├── base.py             <-- New location of `BaseCombinerConfig`
    │
    ├── comparator.py       <-- New location of `ComparatorCombinerConfig`
    ... <file for each combiner> ...
    └── transformer.py      <-- New location of `TransformerCombinerConfig`
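
To illustrate the re-export pattern that `combiners/__init__.py` uses in the tree above, here is a small self-contained sketch. It builds a throwaway package in a temporary directory, so the package name `demo_combiners` and the file name `concat.py` are assumptions for illustration rather than the real Ludwig layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_combiners_package(root: Path) -> None:
    """Write a tiny package with one file per (placeholder) combiner config."""
    pkg = root / "demo_combiners"
    pkg.mkdir()
    (pkg / "concat.py").write_text(
        "class ConcatCombinerConfig:\n    type = 'concat'\n"
    )
    (pkg / "comparator.py").write_text(
        "class ComparatorCombinerConfig:\n    type = 'comparator'\n"
    )
    # The __init__.py re-exports each config so callers can skip the file name.
    (pkg / "__init__.py").write_text(
        "from demo_combiners.comparator import ComparatorCombinerConfig\n"
        "from demo_combiners.concat import ConcatCombinerConfig\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_combiners_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Short form, enabled by the re-export in __init__.py.
        from demo_combiners import ConcatCombinerConfig
        # Long form, importing from the defining file directly.
        from demo_combiners.concat import ConcatCombinerConfig as SameClass
    finally:
        sys.path.remove(tmp)

# Both import paths resolve to the same class object.
same_object = ConcatCombinerConfig is SameClass
print(same_object)
```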

Do not merge until after #1961 goes through! Ready for review.

@ksbrar ksbrar added the productivity and code quality Engineer productivity, maintainability, consistency, readability label Apr 26, 2022
@ksbrar ksbrar self-assigned this Apr 26, 2022
@ksbrar ksbrar changed the title refactor: Fix, rename, reorganize schema module refactor: Rename, reorganize schema module Apr 26, 2022

github-actions bot commented Apr 26, 2022

Unit Test Results

       6 files  ±0         6 suites  ±0   1h 34m 7s ⏱️ + 3m 29s
2 774 tests ±0  2 741 ✔️ ±0    33 💤 ±0  0 ±0 
8 322 runs  ±0  8 219 ✔️ ±0  103 💤 ±0  0 ±0 

Results for commit de7769e. ± Comparison against base commit a01336d.

♻️ This comment has been updated with latest results.

@ksbrar ksbrar marked this pull request as ready for review April 26, 2022 18:35
@ksbrar ksbrar requested review from tgaddair and removed request for tgaddair April 26, 2022 18:35
@justinxzhao

Could you add your directory structure description as header """ comments in the relevant files (perhaps replacing the copyright header, which we shouldn't add for new files)?

ludwig.schema               <-- Meant to contain all schemas, utilities, helpers related to describing and validating Ludwig configs.
├── utils.py                <-- An extensive set of marshmallow-related fields, methods, and schemas that are used elsewhere in Ludwig.
├── schema.py               <-- Contains the fully assembled Ludwig schema and validate() function that is used for user-input YAML validation. Users should generally only need to look at this.
└── __init__.py

ksbrar commented Apr 26, 2022

Added

ludwig/models/trainer.py (outdated, resolved)
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Module structure:
Collaborator

ludwig/schema/schema.py is weird. can you move this function into ludwig/schema/__init__.py instead?

Collaborator Author

moved

Collaborator Author

this actually causes a circular import error

I don't know enough about how Python initializes submodules, but it seems that importing schema.utils (which is used in e.g. the combiner and trainer files) also pulls in that __init__.py file (which itself pulls in schemas from those trainer/combiner files)

something I talked about with @justinxzhao in the past was moving the trainer, combiner, optimizer schemas into this new ludwig.schema module as well. I was going to do that in a follow-up PR, but should I do this now?
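
For reference, Python does run a package's `__init__.py` before importing any of its submodules, which is how a cycle like the one described above can arise. Here is a minimal reproduction in a throwaway package. The package `demo_schema` and its contents are hypothetical, and the exact import chain in Ludwig is not shown in this thread, so treat this as a sketch of the failure mode rather than the actual cause.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a hypothetical package whose __init__.py closes an import cycle."""
    pkg = root / "demo_schema"
    pkg.mkdir()
    # __init__.py imports the trainer schema BEFORE defining get_schema().
    (pkg / "__init__.py").write_text(
        "from demo_schema.trainer import TrainerConfig\n"
        "def get_schema():\n"
        "    return {'trainer': TrainerConfig}\n"
    )
    (pkg / "utils.py").write_text("def helper():\n    return 'ok'\n")
    # trainer.py imports back from the package while it is still initializing.
    (pkg / "trainer.py").write_text(
        "from demo_schema.utils import helper\n"
        "from demo_schema import get_schema\n"
        "class TrainerConfig:\n"
        "    pass\n"
    )


def try_import_utils() -> str:
    """Import demo_schema.utils and report whether the import succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Importing the submodule first executes demo_schema/__init__.py.
            importlib.import_module("demo_schema.utils")
            return "imported"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)
            # Drop any leftover module entries from the failed import.
            for name in [n for n in sys.modules if n.startswith("demo_schema")]:
                del sys.modules[name]


result = try_import_utils()
print(result)
```

In this sketch, even a plain import of `utils` fails, because `__init__.py` runs first and `trainer.py` asks the half-initialized package for a name it has not defined yet.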

Collaborator Author

Going to move the other schemas over into this module, per conversation with @tgaddair

Collaborator

@tgaddair tgaddair left a comment

Looks great, just two things to be fixed.

@tgaddair tgaddair merged commit 5c1a3b3 into ludwig-ai:master May 10, 2022