[hailctl] Spark monitor #7087

Merged
merged 4 commits into from
Sep 19, 2019

Conversation

tpoterba
Contributor

No description provided.

@johnc1231
Contributor

I checked out your branch, ran make install-hailctl, started a cluster, connected to a notebook, and ran hl.utils.range_table(1_000_000, 10000)._force_count(). I did not see any monitor UI show up.

@johnc1231
Contributor

johnc1231 commented Sep 18, 2019

However, running pprint(dict(os.environ.items())) yielded:

{'CLICOLOR': '1',
 'GIT_PAGER': 'cat',
 'HOME': '/root',
 'INVOCATION_ID': '0faec80a970f4cf29ce69112519fe641',
 'JOURNAL_STREAM': '8:38888',
 'JPY_PARENT_PID': '5858',
 'LANG': 'en_US.UTF-8',
 'LOGNAME': 'root',
 'MPLBACKEND': 'module://ipykernel.pylab.backend_inline',
 'PAGER': 'cat',
 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
 'SHELL': '/bin/sh',
 'SPARKMONITOR_KERNEL_PORT': '38853',
 'TERM': 'xterm-color',
 'USER': 'root'}

That does not include the environment variable you added to switch on the new monitor, even though it is clearly set in init_notebook.py.
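
A check like the one above can be narrowed to just the variables of interest. The following is only a sketch: the thread never names the variable that init_notebook.py sets, so `HYPOTHETICAL_MONITOR_FLAG` is a placeholder, and `missing_env_vars` is a hypothetical helper, not part of hailctl or sparkmonitor.

```python
import os


def missing_env_vars(expected, environ=None):
    """Return the names in `expected` that are absent from the environment.

    `environ` defaults to the current process environment (os.environ);
    passing a plain dict makes the check easy to try outside a notebook.
    """
    env = os.environ if environ is None else environ
    return [name for name in expected if name not in env]


# Two entries copied from the kernel environment printed above.
kernel_env = {"SPARKMONITOR_KERNEL_PORT": "38853", "USER": "root"}

# HYPOTHETICAL_MONITOR_FLAG is a placeholder, not the real variable name.
print(missing_env_vars(
    ["SPARKMONITOR_KERNEL_PORT", "HYPOTHETICAL_MONITOR_FLAG"],
    kernel_env,
))
```

Running the same call without the second argument inside the notebook kernel would inspect the kernel's real environment.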

@tpoterba
Contributor Author

huh. I just tested again and it worked!

@johnc1231
Contributor

I'll try again tomorrow, maybe I made a mistake.

@tpoterba
Contributor Author

bad news, though -- after playing with the thing a bit longer and trying to bump the partition number up, it's really not reliable. At all.

@tpoterba
Contributor Author

I think we should probably go back to jupyter spark.

@danking
Contributor

danking commented Sep 18, 2019

:((((((((((((

@tpoterba
Contributor Author

aha it gets a push for every task end! Running even a 5000-partition short job is a denial of service attack against the extension

@tpoterba
Contributor Author

tpoterba commented Sep 18, 2019

after running one 5000-partition job, the extension log is 300k lines
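
For illustration, one general way to avoid a push per task end is to coalesce events and send a summary at most once per interval. This is a sketch of that idea only; it is not taken from the sparkmonitor fork and may not match how the problem was eventually fixed. `TaskEndBatcher`, `send`, and `min_interval_s` are hypothetical names.

```python
import time


class TaskEndBatcher:
    """Coalesce per-task events into at most one push per interval."""

    def __init__(self, send, min_interval_s=1.0, clock=time.monotonic):
        self._send = send                    # callable that delivers one message
        self._min_interval_s = min_interval_s
        self._clock = clock                  # injectable so the demo needs no sleeping
        self._pending = 0                    # task ends seen since the last push
        self._last_push = None               # clock reading at the last push

    def on_task_end(self):
        """Record one finished task; push only if the interval has elapsed."""
        self._pending += 1
        now = self._clock()
        if self._last_push is None or now - self._last_push >= self._min_interval_s:
            self.flush(now)

    def flush(self, now=None):
        """Send a single summary for everything pending, if anything is pending."""
        if self._pending:
            self._send({"tasks_ended": self._pending})
            self._pending = 0
        self._last_push = self._clock() if now is None else now


# Demo with a frozen clock: 5000 task ends arrive within one interval.
sent = []
batcher = TaskEndBatcher(sent.append, min_interval_s=1.0, clock=lambda: 0.0)
for _ in range(5000):
    batcher.on_task_end()
batcher.flush()  # deliver whatever is still pending when the job ends
print(len(sent), sum(m["tasks_ended"] for m in sent))
```

The trade-off is latency: the UI would lag by up to one interval, in exchange for message volume that no longer scales with partition count.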

@tpoterba
Contributor Author

OK, give this a try, John. I fixed the Spark UI linking issue, and added some other QoL features.

@tpoterba
Contributor Author

The changes to the fork are in a PR here: https://github.com/hail-is/sparkmonitor/pull/1/files

This PR points to an artifact built from that PR.

@tpoterba
Contributor Author

@johnc1231 fixed the problem, ready to go.

Contributor

@johnc1231 johnc1231 left a comment

Going to be great!

@danking danking merged commit bda4b11 into hail-is:master Sep 19, 2019