refactor: Rename, reorganize schema module #1963

ksbrar · 2022-04-26T03:43:24Z

Redo of #1936 (which was reverted by 6d8474a). This ends up being a significant reorganization:

Changes

Moves ludwig.utils.schema into ludwig.marshmallow, then renames the ludwig.marshmallow module to ludwig.schema and ludwig/marshmallow/marshmallow_schema_utils.py to ludwig/schema/utils.py. For convenience, everything in the current schema.py (namely get_schema(), which returns the full Ludwig schema, and validate_config()) now lives in ludwig/schema/__init__.py, which makes imports elsewhere more readable.
Moves TrainerConfig and get_trainer_jsonschema() into ludwig/schema/trainer.py.
Moves all optimizer configs out of ludwig/modules/optimization_modules.py and into ludwig/schema/optimizers.py.
Moves sequence_encoder_registry out of ludwig/combiners/combiners.py, converts it into a full-fledged Registry(), and places it in ludwig/encoders/registry.py.
Moves all combiner configs out of ludwig/combiners/combiners.py and into a new ludwig/schema/combiners/ directory (module) with one file per combiner. Again, to make things more readable elsewhere, __init__.py is set up so that a user can import directly from this submodule (e.g. from ludwig.schema.combiners import ConcatCombinerConfig).
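
For readers unfamiliar with the registry pattern behind the sequence_encoder_registry change, here is a minimal sketch. Only the name sequence_encoder_registry comes from this PR; SimpleRegistry, ToyRNNEncoder, and get_encoder_cls are hypothetical names for illustration and are not Ludwig's actual Registry API.

```python
from typing import Callable, Type


class SimpleRegistry(dict):
    """Hypothetical stand-in for a registry: a dict mapping names to classes."""

    def register(self, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under `name`."""

        def wrap(cls: Type) -> Type:
            self[name] = cls
            return cls

        return wrap


# Living in its own module, a registry like this can be imported by both
# encoders and combiners without either one importing the other.
sequence_encoder_registry = SimpleRegistry()


@sequence_encoder_registry.register("rnn")
class ToyRNNEncoder:
    """Placeholder encoder used only for this illustration."""


def get_encoder_cls(name: str) -> Type:
    """Look up a registered encoder class, failing loudly on unknown names."""
    try:
        return sequence_encoder_registry[name]
    except KeyError:
        raise ValueError(f"Unknown sequence encoder: {name!r}") from None
```

With this in place, get_encoder_cls("rnn") returns the ToyRNNEncoder class, and an unregistered name raises ValueError.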

Misc.

Small change to train_with_backend integration test.
Some docstring/comment changes.

Overview:

The structure of the new schema module is as follows:

ludwig.schema               <-- Meant to contain all schemas, utilities, helpers related to describing and validating Ludwig configs.
├── __init__.py             <-- Contains fully assembled Ludwig schema (`get_schema()`), `validate_config()` for YAML validation, and all "top-level" schema functions.
├── utils.py                <-- An extensive set of marshmallow-related fields, methods, and schemas that are used elsewhere in Ludwig.
├── trainer.py              <-- Contains `TrainerConfig` and `get_trainer_jsonschema()`.
├── optimizers.py           <-- Contains every optimizer config (e.g. `SGDOptimizerConfig`, `AdamOptimizerConfig`, etc.) and related marshmallow fields/methods.
└── combiners/
    ├── __init__.py         <-- Imports for each combiner config file (making imports elsewhere more convenient).
    ├── utils.py            <-- New location of `combiner_registry`, `get_combiner_jsonschema()`, `get_combiner_conds()`
    ├── base.py             <-- New location of `BaseCombinerConfig`
    |
    ├──  comparator.py      <-- New location of `ComparatorCombinerConfig`
    ... <file for each combiner> ...
    └──  transformer.py     <-- New location of `TransformerCombinerConfig`
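
The re-export arrangement in combiners/__init__.py can be sketched as follows. This is a self-contained toy, not the Ludwig code: the package name toy_schema and the class bodies are assumptions for illustration, and only the file names base.py and comparator.py and the class names mirror the tree above.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk that mirrors the combiners/ layout.
root = Path(tempfile.mkdtemp())
pkg = root / "toy_schema" / "combiners"
pkg.mkdir(parents=True)

(root / "toy_schema" / "__init__.py").write_text("")
(pkg / "base.py").write_text(textwrap.dedent('''
    class BaseCombinerConfig:
        """Toy base class standing in for the real config base."""
'''))
(pkg / "comparator.py").write_text(textwrap.dedent('''
    from toy_schema.combiners.base import BaseCombinerConfig

    class ComparatorCombinerConfig(BaseCombinerConfig):
        """Toy config for illustration only."""
'''))
# The package __init__ re-exports each config so callers can skip the file name.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    from toy_schema.combiners.base import BaseCombinerConfig  # noqa: F401
    from toy_schema.combiners.comparator import ComparatorCombinerConfig  # noqa: F401
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short import path, even though the class is defined in comparator.py.
from toy_schema.combiners import ComparatorCombinerConfig

print(ComparatorCombinerConfig.__module__)
```

The class still reports the module it was defined in, so the re-export changes only how callers spell the import, not where the code lives.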

Do not merge until after #1961 goes through! Ready for review.

github-actions · 2022-04-26T04:08:10Z

Unit Test Results

      6 files ±0       6 suites ±0 1h 34m 7s ⏱️ + 3m 29s
2 774 tests ±0 2 741 ✔️ ±0   33 💤 ±0 0 ❌ ±0
8 322 runs ±0 8 219 ✔️ ±0 103 💤 ±0 0 ❌ ±0

Results for commit de7769e. ± Comparison against base commit a01336d.

♻️ This comment has been updated with latest results.

ludwig/combiners/combiners.py

justinxzhao · 2022-04-26T21:41:54Z

Could you add your directory structure description as header """ comments in the relevant files (perhaps replacing the copyright header, which we shouldn't add for new files)?

ludwig.schema               <-- Meant to contain all schemas, utilities, helpers related to describing and validating Ludwig configs.
├── utils.py                <-- An extensive set of marshmallow-related fields, methods, and schemas that are used elsewhere in Ludwig.
├── schema.py               <-- Contains the fully assembled Ludwig schema and validate() function that is used for user-input YAML validation. Users should generally only need to look at this.
└── __init__.py

ksbrar · 2022-04-26T21:46:39Z

Added

ludwig/models/trainer.py

tgaddair · 2022-04-27T17:42:35Z

ludwig/schema/schema.py

-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
+# Module structure:


ludwig/schema/schema.py is weird. Can you move this function into ludwig/schema/__init__.py instead?

This actually causes a circular import error.

I don't know enough about how Python handles submodule initialization, but it seems that importing schema.utils (which is used in, e.g., the combiner and trainer files) also pulls in that __init__ file, which itself pulls in schemas from those trainer/combiner files.

Something I talked about with @justinxzhao in the past was moving the trainer, combiner, and optimizer schemas into this new ludwig.schema module as well. I was going to do that in a follow-up PR, but should I do it now?

Going to move the other schemas over into this module, per conversation with @tgaddair.
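
The import behavior discussed in this thread can be reproduced with a small self-contained sketch. Python runs a package's __init__.py before any of its submodules, so a package __init__ that imports from a module which itself needs one of the package's submodules forms a cycle. The package name toy_app and the file contents are assumptions for illustration; this is not the Ludwig code, and the actual failure in the PR may have involved different modules.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
schema_dir = root / "toy_app" / "schema"
schema_dir.mkdir(parents=True)

(root / "toy_app" / "__init__.py").write_text("")
(schema_dir / "utils.py").write_text("def describe():\n    return 'field helpers'\n")
# The package __init__ assembles the full schema, so it imports from combiners...
(schema_dir / "__init__.py").write_text(
    "from toy_app.combiners import get_combiner_jsonschema\n"
)
# ...while combiners needs helpers that live inside the schema package.
(root / "toy_app" / "combiners.py").write_text(textwrap.dedent('''
    from toy_app.schema import utils  # runs toy_app/schema/__init__.py first

    def get_combiner_jsonschema():
        return {"type": "object"}
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

circular_import_error = None
try:
    # combiners starts loading, triggers schema/__init__.py, which asks
    # combiners for a function that has not been defined yet.
    import toy_app.combiners
except ImportError as exc:
    circular_import_error = exc

print(circular_import_error)
```

Moving the combiner, trainer, and optimizer schemas inside the schema package, as decided above, removes the dependency from the package __init__ back out to modules that import the package.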

tgaddair

Looks great, just two things to be fixed.

ksbrar and others added 22 commits April 23, 2022 22:07

Remove generated files.

b7ce8c4

Remove extraction script.

6424655

Adjust marshmallow_schema_utils.

74a4d32

change manifest.

3e166d4

adjust combiners, rename unload method.

bfb2e61

adjust trainer

5ee4eca

convert optimizers.

664b4c7

update FloatRange allow_none

765c783

Add descriptions to optimizer, gradientclipping dataclass fields.

11a2857

update marshmallow tests.

4e8a232

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e57cb2

for more information, see https://pre-commit.ci

fix

b266763

Merge branch 'refactor_remove_marshmallow_extraction' of github.com:ksbrar/ludwig into refactor_remove_marshmallow_extraction

39dfdc0

remove old method refs.

f7601e7

fix

f3df813

replace default description None with TODO

a3406f4

fix test

a927865

additionalProperties fix

1affa5b

Rename marshamllow_schema_utils file -> utils

7742720

rename ludwig.marshmallow folder -> ludwig.schema

d8863b1

Move ludwig/utils/schema file to new ludwig/schema folder.

26008fd

style: clean up some import aliases.

b3e93dd

ksbrar added the productivity and code quality Engineer productivity, maintainability, consistency, readability label Apr 26, 2022

ksbrar self-assigned this Apr 26, 2022

ksbrar changed the title ~~refactor: Fix, rename, reorganize schema module~~ refactor: Rename, reorganize schema module Apr 26, 2022

Merge in latest.

dd5e36c

ksbrar marked this pull request as ready for review April 26, 2022 18:35

ksbrar requested review from tgaddair and removed request for tgaddair April 26, 2022 18:35

ksbrar requested review from justinxzhao, tgaddair and w4nderlust April 26, 2022 18:35

integration test fix.

1800687

justinxzhao reviewed Apr 26, 2022

ludwig/combiners/combiners.py (outdated, resolved)

add header comments.

9279905

ksbrar added 3 commits April 26, 2022 17:59

add header comments.

31b1b58

add header comments.

77ea23a

Rename import ... utils -> import ... utils as schema_utils.

09e06e0

tgaddair reviewed Apr 27, 2022

ludwig/models/trainer.py (outdated, resolved)

tgaddair reviewed Apr 27, 2022

fix replace-all, move schema.py into schema/__init__.py

9fb1f48

ksbrar requested review from tgaddair and justinxzhao April 27, 2022 18:24

ksbrar mentioned this pull request Apr 27, 2022

fix: Various marshmallow improvements. #1975

Merged

justinxzhao approved these changes Apr 27, 2022

ksbrar mentioned this pull request Apr 27, 2022

Move marshmallow schemas into single schema module. #1964

Closed

ksbrar added 4 commits April 27, 2022 20:19

Move trainer and combiner schemas over. (still need to fix tests, maybe some imports).

d9eb45d

tmp - fix tests, move sequence_encoder_registry, fix some imports.

f6dd4e3

more import fixes.

4c0a2fc

fix import.

b6f48af

ksbrar requested a review from justinxzhao April 28, 2022 02:35

ksbrar mentioned this pull request May 9, 2022

Refactored validation scripts in a single package #2006

Closed

ksbrar and others added 2 commits May 9, 2022 13:34

Merge in latest.

3197e31

[pre-commit.ci] auto fixes from pre-commit.com hooks

de7769e

for more information, see https://pre-commit.ci

tgaddair approved these changes May 10, 2022

tgaddair merged commit 5c1a3b3 into ludwig-ai:master May 10, 2022
