Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REFACTOR-#3257: Move logging and caching to the gen_data function #7046

Merged
merged 4 commits into from Mar 13, 2024

Conversation

arunjose696
Copy link
Collaborator

@arunjose696 arunjose696 commented Mar 11, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves [ASV] Move logging and caching to the gen_data function #3257
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

… function

Signed-off-by: arunjose696 <arunjose696@gmail.com>
@YarShev YarShev changed the title REFACTOR-#3257: Move logging and caching to the gen_data… REFACTOR-#3257: Move logging and caching to the gen_data function Mar 11, 2024
YarShev
YarShev previously approved these changes Mar 11, 2024
@YarShev
Copy link
Collaborator

YarShev commented Mar 11, 2024

@anmyachev, any comments?

nrows, ncols, rand_low, rand_high
)
)
data = gen_int_data(nrows, ncols, rand_low, rand_high).copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is gen_int_data used anywhere else? If not, we could remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function would be called from gen_data function as data_generator can be gen_int_data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

assert data_type in type_to_generator
data_generator = type_to_generator[data_type]

data = data_generator(nrows, ncols, rand_low, rand_high)
data = data_generator(nrows, ncols, rand_low, rand_high, cache_key)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove that new line from the lower level functions?

Suggested change
data = data_generator(nrows, ncols, rand_low, rand_high, cache_key)
data = data_generator(nrows, ncols, rand_low, rand_high, cache_key)
data_cache[cache_key] = weakdict(data)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

)
)
data = gen_int_data(nrows, ncols, rand_low, rand_high).copy()
data = gen_data("int", nrows, ncols, rand_low, rand_high).copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call a higher level function from a lower level function? gen_int_data will be called anyway, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call gen_int_data the data wouldnt be cached as the caching logic has been moved to get_data, Thus I figured calling gen_data higher level function would be better to cache data for later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call gen_int_data the data wouldnt be cached as the caching logic has been moved to get_data,

That's ok, we shouldn't use a low level function if there is a way to use it through a high level function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I dont need to make any changes here right? (As gen_data higher level function is called)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, we should leave this line as it was before, imagine the execution flow, we design it so that it starts with function gen_data, since it contains additional caching functionality. If the function of the lower part of the flow (for example, gen_str_int_data) calls the upper one, then the flow is not unidirectional (this is usually more difficult to work with and understand).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

…ired

Signed-off-by: arunjose696 <arunjose696@gmail.com>
asv_bench/benchmarks/utils/common.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/utils/common.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/utils/common.py Outdated Show resolved Hide resolved
@arunjose696 arunjose696 force-pushed the gen_data_log_refactor branch 2 times, most recently from 4d03c40 to 80c0147 Compare March 13, 2024 12:47
anmyachev
anmyachev previously approved these changes Mar 13, 2024
Copy link
Collaborator

@anmyachev anmyachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

YarShev
YarShev previously approved these changes Mar 13, 2024
@YarShev
Copy link
Collaborator

YarShev commented Mar 13, 2024

asv job failed with ValueError: All arrays must be of the same length.

Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
Signed-off-by: arunjose696 <arunjose696@gmail.com>
@arunjose696 arunjose696 dismissed stale reviews from YarShev and anmyachev via f53c7bf March 13, 2024 16:08
@anmyachev anmyachev merged commit 8710994 into modin-project:master Mar 13, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ASV] Move logging and caching to the gen_data function
3 participants