-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for rubix scheme and populating filesystem counters #77
Conversation
rubix-core/pom.xml
Outdated
@@ -15,19 +15,15 @@ | |||
|
|||
<properties> | |||
<main.basedir>${project.parent.basedir}</main.basedir> | |||
<dep.hadoop2.version>2.6.0</dep.hadoop2.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not required, it is provided by root pom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
@VisibleForTesting | ||
public CachedReadRequestChain(String fileToRead) | ||
throws IOException | ||
{ | ||
this(fileToRead, ByteBuffer.allocate(1024)); | ||
this(fileToRead, ByteBuffer.allocate(1024), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simpler to create new Statistics object here to keep rest of the code simple
@@ -102,13 +107,17 @@ public void initialize(URI uri, Configuration conf) throws IOException | |||
throw new IOException("Cluster Manager not set"); | |||
} | |||
super.initialize(uri, conf); | |||
fs.initialize(uri, conf); | |||
this.uri = URI.create(uri.getScheme() + "://" + uri.getAuthority()); | |||
this.workingDir = new Path("/user", System.getProperty("user.name")).makeQualified(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to set workingDir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most of the filesystem implementations, the initialize method sets the working directory and uri for that class.Working directory is required to create the absolute path. Just to be consistent for all the filesystem implementations, i did this change
@@ -100,6 +103,9 @@ public Integer call() | |||
log.info(String.format("Read %d bytes from cached file", read)); | |||
fileChannel.close(); | |||
raf.close(); | |||
if (statistics != null) { | |||
statistics.incrementBytesRead(read); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way that we can provide our own Statistics implementation? Basically, we would want to differentiate amount of data read locally (i.e. in CachenRRC) and amount of data read from other nodes (i.e. in NonLocalRRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes right now we are only populating bytes read counter for local read. Current implementation of NonLocalRead stats gives you both remote read and non local read combined. But once we have the asynchronous cache warm up feature, it will be easy to differentiate between the data read from the non local cache and data we are downloading in the remote node as part of this request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how are these statistics shown by Hive jobs? i.e. does it call FileSystem.printStatistics method? Based on that we can think of workarounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileSystem has the statistics object. This is part of hadoop file system level counter. Hive just gets a report from AM and prints it
@@ -206,7 +219,27 @@ public boolean mkdirs(Path path, FsPermission fsPermission) | |||
public FileStatus getFileStatus(Path path) | |||
throws IOException | |||
{ | |||
return fs.getFileStatus(path); | |||
FileStatus originalStatus = fs.getFileStatus(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So changing listing's path to rubix scheme was the main change to support rubix scheme!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this one is main along with initializing the remotefilesystem with appropriate scheme (for CachingNativeS3FileSystem, its s3n) so that when we are reading from the remote filesystem, it gets reflected properly
No description provided.