@singhals
Contributor

@singhals commented Feb 15, 2024

Summary

  • Expose get_cluster_activity_events as a public function
  • Add a mechanism that stops the monitor's cluster polling once the cluster reaches a terminated state (used when we host the monitoring ourselves)
  • Create a submission from cluster report info supplied from outside the library

Checklist

Before formally opening this PR, please adhere to the following standards:

  • Branch/PR names begin with the related Jira ticket ID (e.g. PROD-31) for Jira integration
  • File names are lower_snake_case
  • Relevant unit tests have been added, or are not applicable
  • Relevant documentation has been added, or is not applicable
  • Mark yourself as the assignee (makes it easier to scan the PR list)

Related Jira Ticket (add id)

Add any relevant testing examples or screenshots.

@taylorgaw
Contributor

At first pass, the logic here is 👍🏻. My concern is that a lot of the method parameters have Dict types, which isn't great for future code readability. I don't think this is a "must have", but it would be nice to have these dicts defined as types of their own so that other developers know how to interact with these objects

@singhals
Contributor Author

Agreed, but we were trying to avoid making drastic changes here, since we're dealing with legacy code that the library still relies on today.
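
Here is a rough sketch of the reviewer's suggestion. This is not what the PR does. typing.TypedDict (Python 3.8+) can name the shape of these dicts without changing runtime behavior, because a TypedDict value is still a plain dict. Only static type checkers enforce the keys. The ClusterInfo and ActivityEvents types, their fields, and create_submission are all hypothetical. The real report dicts come from the Databricks API, and their shape is not spelled out in this PR.

```python
from typing import Any, Dict, List, TypedDict  # TypedDict needs Python 3.8+


class ClusterInfo(TypedDict):
    # Hypothetical fields, chosen for illustration only.
    cluster_id: str
    state: str


class ActivityEvents(TypedDict):
    # Hypothetical shape: a list of raw event dicts.
    events: List[Dict[str, Any]]


def create_submission(
    cluster: Dict[str, Any],
    cluster_info: ClusterInfo,
    cluster_activity_events: ActivityEvents,
    tasks: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Bundle externally supplied report pieces into one payload (hypothetical)."""
    return {
        "cluster": cluster,
        "cluster_id": cluster_info["cluster_id"],
        "state": cluster_info["state"],
        "event_count": len(cluster_activity_events["events"]),
        "tasks": tasks,
    }
```

Because the TypedDicts are erased to dict at runtime, callers that already pass plain dicts keep working. The gain is that readers and type checkers can see which keys are expected.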

@singhals requested a review from taylorgaw February 26, 2024 19:16
taylorgaw previously approved these changes Feb 26, 2024
Comment on lines +311 to +314
cluster: dict,
cluster_info: dict,
cluster_activity_events: dict,
tasks: List[dict],
Contributor

I think these type hints need to be Dict (from typing) to maintain compatibility with older Python versions.

Contributor Author

Interesting. There's already a mix of the two in the library, but I'll go with Dict.

Contributor

Oh hm, maybe it's only an issue when you try to type the contents of the dict. E.g. I'm almost certain that dict[str] would fail ... but maybe a plain dict is fine.
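
The sketch below illustrates the distinction being discussed. A bare dict annotation only references the class, so it works on every Python 3 version. Subscripting the builtin (for example dict[str, int]) only works at runtime from Python 3.9 onward (PEP 585), and on older versions typing.Dict[...] is needed instead. The function names here are hypothetical.

```python
import sys
from typing import Dict, List


# A bare builtin ``dict`` annotation is just a reference to the class,
# so this definition works on every Python 3 version.
def submit_plain(cluster: dict, tasks: List[dict]) -> int:
    return len(tasks)


# ``typing.Dict`` can be subscripted on every version that ships ``typing``.
def submit_typed(cluster: Dict[str, str], tasks: List[Dict[str, str]]) -> int:
    return len(tasks)


def builtin_dict_is_subscriptable() -> bool:
    """Return True if ``dict[str, int]`` can be evaluated (Python 3.9+)."""
    try:
        _ = dict[str, int]  # raises TypeError before Python 3.9
    except TypeError:
        return False
    return True


print(sys.version_info[:2], builtin_dict_is_subscriptable())
```

This matches where the thread ends up: the plain dict and List[dict] hints in the PR are safe on older Pythons, and only subscripted builtins such as dict[str, int] need typing.Dict there. On Python 3.7+, `from __future__ import annotations` also defers annotation evaluation, so subscripted builtins inside annotations would not be evaluated at definition time.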

Comment on lines +447 to +450
if kill_on_termination:
    cluster_state = get_default_client().get_cluster(cluster_id).get("state")
    if cluster_state == "TERMINATED":
        while_condition = False
Contributor

Do we actually need the kill_on_termination flag at all? At a glance, it looks like if you did away with it but kept the check for TERMINATED then we're set for hosted monitoring and the existing functionality is unchanged (existing functionality being the monitoring running on the job cluster, in which case it's killed when the cluster terminates anyway).

Contributor Author

This is mostly a safeguard against changing the behavior of existing functionality, and it avoids having to check the cluster state repeatedly from the init script. When we use hosted monitoring, we set kill_on_termination to True.
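
Below is a minimal sketch of how the kill_on_termination check fits into a polling loop. It assumes a client whose get_cluster(cluster_id) returns a dict with a "state" key, mirroring the get_default_client().get_cluster(...) call in the diff. The names monitor_cluster, get_cluster, poll_interval_s, and max_polls are hypothetical. The get_cluster callable is injected so the sketch runs without Databricks, and max_polls exists only to keep the demo bounded.

```python
import time
from typing import Any, Callable, Dict, Optional


def monitor_cluster(
    get_cluster: Callable[[str], Dict[str, Any]],
    cluster_id: str,
    kill_on_termination: bool = False,
    poll_interval_s: float = 0.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll the cluster, stopping on TERMINATED if requested; return poll count.

    Hypothetical helper: the real monitor's metric collection is omitted.
    """
    polls = 0
    while_condition = True
    while while_condition:
        polls += 1
        # ... collect and report metrics for this tick (omitted) ...
        if kill_on_termination:
            # Hosted monitoring: stop once the cluster has terminated.
            cluster_state = get_cluster(cluster_id).get("state")
            if cluster_state == "TERMINATED":
                while_condition = False
        # Demo-only bound. On-cluster monitoring would otherwise loop until
        # the cluster, and this process with it, goes away.
        if max_polls is not None and polls >= max_polls:
            break
        if while_condition:
            time.sleep(poll_interval_s)
    return polls


states = iter(["RUNNING", "RUNNING", "TERMINATED"])
print(monitor_cluster(lambda cid: {"state": next(states)}, "c1", kill_on_termination=True))
```

With kill_on_termination=False, the loop never queries the cluster state. That matches the point above that existing on-cluster behavior is unchanged, and the extra API calls only happen for hosted monitoring.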

Comment on lines +149 to +154
def set_databricks_config(db_config: DatabricksConf):
    global _db_config
    if _db_config is not None:
        raise RuntimeError("Databricks config has already been set and the library does not support resetting "
                           "credentials")
    _db_config = db_config
Contributor

This is cool, nice.
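
The sketch below exercises the set-once pattern from set_databricks_config in isolation. The DatabricksConf fields (host, token) and the get_databricks_config accessor are hypothetical stand-ins. Only the guard logic mirrors the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabricksConf:
    # Hypothetical fields; the library's real DatabricksConf may differ.
    host: str
    token: str


_db_config: Optional[DatabricksConf] = None


def set_databricks_config(db_config: DatabricksConf) -> None:
    """Set the module-level config exactly once; any later call raises."""
    global _db_config
    if _db_config is not None:
        raise RuntimeError("Databricks config has already been set and the library "
                           "does not support resetting credentials")
    _db_config = db_config


def get_databricks_config() -> DatabricksConf:
    """Hypothetical accessor that fails loudly if nothing was configured."""
    if _db_config is None:
        raise RuntimeError("Databricks config has not been set")
    return _db_config
```

Raising on a second call, rather than silently overwriting, means a caller cannot swap credentials out from under code that has already read the first config.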

Comment on lines +232 to +235
cluster: dict,
cluster_info: dict,
cluster_activity_events: dict,
tasks: List[dict],
Contributor

Same comment about the Dict type hints here.

Contributor

@gorskysd left a comment

There are a few dict type hints that I think need to be changed to Dict to maintain compatibility with Python versions before 3.9. We might also be able to clean up the monitoring flag, but I'm not blocking on that. Otherwise looks good -- congratulations on navigating the hot mess of the library ;)

@singhals force-pushed the singhals/PROD-16170-override-cluster-report-submission-location branch from 114898c to 4335ae0 February 27, 2024 18:07
Contributor

@gorskysd left a comment

Looks like the Dict change is unnecessary after all; plain dict hints already exist elsewhere in the library. LGTM!

@singhals merged commit dc7a217 into main Feb 27, 2024
@singhals deleted the singhals/PROD-16170-override-cluster-report-submission-location branch February 27, 2024 20:36

5 participants