
Added support for ViewFs #58

Merged: 3 commits, Aug 30, 2018

@superbobry (Contributor) commented on Aug 9, 2018:

Prior to this commit the Daemon only acquired a delegation token for
defaultFs, which did not work well in the presence of federation. Now the
application spec contains a list of the namenodes to acquire delegation
tokens for in addition to defaultFs. In theory, all tokens should be
automatically renewed by the RM. However, I did not have the occasion
to test this.
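For illustration, here is a minimal Java sketch of what acquiring tokens for defaultFs plus the listed namenodes could look like. FileSystem.addDelegationTokens is the standard Hadoop API; collectTokens and the renewer parameter are hypothetical names, not the Daemon's actual code:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

// Sketch only: gather delegation tokens for defaultFs and every extra
// namenode from the application spec into one Credentials object.
public static Credentials collectTokens(Configuration conf,
                                        List<Path> nameNodes,
                                        String renewer) throws IOException {
  Credentials creds = new Credentials();
  // defaultFs is always included.
  FileSystem.get(conf).addDelegationTokens(renewer, creds);
  for (Path nameNode : nameNodes) {
    nameNode.getFileSystem(conf).addDelegationTokens(renewer, creds);
  }
  return creds;
}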

@superbobry (Contributor, Author) commented on Aug 9, 2018:

A proper way to test this is, of course, to add federation to hadoop-test-cluster, but I'd rather not have any more iterations on YARN/HDFS configuration this week :)

@jcrist (Owner) left a comment:

Overall this seems fairly sane to me. I might add a test where the normal namenode is specified manually and see if things still work; no need to mess with the configuration here.

@@ -472,20 +472,24 @@ class ApplicationSpec(Specification):
     The queue to submit to. Defaults to the default queue.
 tags : set, optional
     A set of strings to use as tags for this application.
+name_nodes : list, optional
+    A list of namenode URIs to acquire delegation tokens for.

@jcrist (Owner) commented on Aug 14, 2018:

I haven't used ViewFs before. Is this a normal way of handling this (what do other tools do)? Is there a way to introspect what namenode uris are needed from configuration somewhere?

@superbobry (Author, Contributor) commented on Aug 16, 2018:

> Is this a normal way of handling this (what do other tools do)?

Spark does the same; see spark.yarn.access.hadoopFileSystems here.

> Is there a way to introspect what namenode uris are needed from configuration somewhere?

In theory, yes, but the API for doing this is not available in 2.6.X. It is possible to back-port it (I did just that), but the list of mountpoints I got for our HDFS configuration was incomplete. I decided not to investigate further and took the Spark route instead. After all, it is probably rare that an app needs all of the mountpoints.
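As a hedged illustration of the introspection route: newer Hadoop releases expose FileSystem.getChildFileSystems(), which ViewFileSystem overrides to return the filesystems behind its mount table. Whether this is the exact API referred to above is an assumption; mountedUris is a hypothetical helper:

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch only: list the URIs of the filesystems mounted under the
// default (view) filesystem, e.g. as candidates for token acquisition.
public static List<URI> mountedUris(Configuration conf) throws IOException {
  List<URI> uris = new ArrayList<URI>();
  for (FileSystem child : FileSystem.get(conf).getChildFileSystems()) {
    uris.add(child.getUri());
  }
  return uris;
}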

@jcrist (Owner) commented on Aug 16, 2018:

Thanks for the information.

Given Spark deprecated spark.yarn.access.namenodes in favor of spark.yarn.access.hadoopFileSystems, perhaps we should call this field something similar as well. A few ideas (no strong opinions):

  • hadoop_file_systems
  • additional_file_systems (since the default one is always acquired)
  • file_systems

@superbobry (Author, Contributor) commented on Aug 17, 2018:

additional_file_systems looks good; I will amend the PR.

@superbobry (Contributor, Author) commented on Aug 16, 2018:

> I might add a test where the normal namenode is specified manually and see if things still work [...]

Do you mean an integration test using hadoop-test-cluster? I have one checking that YAML/ProtoBuf I/O works.

@jcrist (Owner) left a comment:

> Do you mean an integration test using hadoop-test-cluster?

Yeah. Something like submitting an application with the following spec and checking for successful completion:

name: test_namenodes
name_nodes:
    - hdfs://master.example.com

services:
    test:
        resources:
            vcores: 1
            memory: 128
        commands:
            - echo ok
List<Path> nameNodes = Lists.transform(
    spec.getNameNodesList(),
    new Function<String, Path>() {
      public Path apply(String uri) { return new Path(uri); }
    });

@jcrist (Owner) commented on Aug 16, 2018:

Any reason not to do (untested):

List<Path> nameNodes = new ArrayList<Path>(spec.getNameNodesCount());
for (String uri : spec.getNameNodesList()) {
  nameNodes.add(new Path(uri));
}

I find this version more readable and concise, but admit it's just a matter of style. I am quite new at Java, and find the deep nesting needed here to do what amounts to name_nodes = map(Path, name_node_uris) harder to read than manually writing out the loop. Perhaps my opinion on this will change over time.

Not a strong opinion though, feel free to keep it as is.

@superbobry (Author, Contributor) commented on Aug 17, 2018:

The deep nesting comes from the lack of lambdas in Java prior to version 8. In Java 8 this would look like

Lists.transform(..., uri -> new Path(uri))

which is clearly much more readable :) This brings us back to Java 8 or Kotlin (#41) for improving the development experience.
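For comparison, a minimal Java 8 sketch of the same transformation using streams instead of Guava (toPaths is a hypothetical helper, not code from this PR):

import java.util.List;
import java.util.stream.Collectors;

import org.apache.hadoop.fs.Path;

// Eagerly maps each namenode URI string to a Path. Note that
// Lists.transform returns a lazy view, whereas this builds the
// list up front.
static List<Path> toPaths(List<String> nameNodeUris) {
  return nameNodeUris.stream()
      .map(Path::new)
      .collect(Collectors.toList());
}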

@superbobry (Contributor, Author) commented on Aug 20, 2018:

I've added a test and validated it in htcluster.

Note that the latest commit uses file_systems, as the hadoop_ prefix is probably redundant (we are running on YARN after all), and additional_ makes the name too long.

@jcrist (Owner) commented on Aug 30, 2018:

Thanks for your patience here; I've been busy with other things. Just needed one small fix and everything passed. Merging.

@jcrist merged commit db1e0c5 into jcrist:master on Aug 30, 2018

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)