Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: reducing coverage loss #619

Merged
merged 163 commits into from
Mar 23, 2023
Merged

chore: reducing coverage loss #619

merged 163 commits into from
Mar 23, 2023

Conversation

jarulraj
Copy link
Member

@jarulraj jarulraj commented Mar 20, 2023

rbatch
), f"Left and Right batch does not have equal elements: left: {len(lbatch)} right: {len(rbatch)}"

assert self.etype in [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't keeping it in the else clause easier to maintain? Similar to the switch clause in cpp. Future changes won't need to worry about modifying assert. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. But, it will be harder to hit the else without a synthetic test case (from a coverage standpoint).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, if the assert is not updated and a new expression type is added, then they will have to update the assert before passing the test case. So, they will be forced to update the assert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we move the assert condition to the test case? Any update to the logic will fail the test case and will require updating it.
It is a one-time effort, either we do it in the code using assert or in the test. I prefer test from readability perspective. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure if I understand. Can you take care of this change?

Copy link
Collaborator

@xzdandy xzdandy Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like eva/catalog/catalog_manager.py, shall we also replace logger.execption and raise BinderError with assert for consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done

except Exception as e:
logger.error(e)
raise ExecutorError(e)
if table_catalog.table_type == TableType.STRUCTURED_DATA:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need this if check, the previous assert ensures that table_catalog.table_type is TableType.STRUCTURED_DATA

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, resolved.

raise ExecutorError(e)

def __call__(self, *args, **kwargs) -> Generator[Batch, None, None]:
yield from self.exec(*args, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ray supporting lateral join now? I think the __call__ is needed for ray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this \ cc @kaushikravichandran @jiashenC @gaurav274

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it were the case, this code was not being used in any of the test cases and repeated in a few operators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I think it could be triggered by Ray if there is a LateralJoin under Projection. Maybe we can move this to the AbstractExecutor to improve the coverage? I think it will anyway have the same function implementation for any executor.

raise ExecutorError(
f"StorageEngine {storage_engine} create call failed"
)
storage_engine.create(table_obj)
Copy link
Collaborator

@xzdandy xzdandy Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here do we still need to check the success from create? We probably don't want the next write if create failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this logic breaks these two test cases -- test_should_fail_to_load_videos_with_same_path and test_should_fail_to_load_images_with_same_path

            storage_engine = StorageEngine.factory(table_obj)
            if do_create:
                create_status = storage_engine.create(table_obj)
                if create_status:
                    storage_engine.write(
                        table_obj,
                        Batch(pd.DataFrame({"file_path": valid_files})),
                    )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine. create will raise error

setattr(result, k, deepcopy(v, memo))
return result

def copy(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is copy not needed anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not being used

for column in by:
if column not in self._frames.columns:
raise KeyError(
"Can not orderby non-projected column: {}".format(column)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also remove this raise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

raise BinderError("Index input needs to be float32.")
if not len(output.array_dimensions) == 2:
raise BinderError("Index input needs to be 2 dimensional.")
assert IndexType.is_faiss_index_type(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if we support index type other than Faiss later, we still need to do it with if else right (assuming we have some index specific checks)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@gaurav274
Copy link
Member

This PR affects ray. However, upon my testing ray does not work on master anyways. So, planning to take care of it in another PR.

@gaurav274 gaurav274 merged commit 77c07d7 into master Mar 23, 2023
@gaurav274 gaurav274 deleted the coverage branch March 23, 2023 04:13
@jarulraj jarulraj mentioned this pull request Apr 3, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants