Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for rubix scheme and populating filesystem counters #77

Merged
merged 7 commits into from
Nov 28, 2017

Conversation

abhishekdas99
Copy link
Contributor

No description provided.

@@ -15,19 +15,15 @@

<properties>
<main.basedir>${project.parent.basedir}</main.basedir>
<dep.hadoop2.version>2.6.0</dep.hadoop2.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not required, it is provided by root pom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

@VisibleForTesting
public CachedReadRequestChain(String fileToRead)
throws IOException
{
this(fileToRead, ByteBuffer.allocate(1024));
this(fileToRead, ByteBuffer.allocate(1024), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpler to create new Statistics object here to keep rest of the code simple

@@ -102,13 +107,17 @@ public void initialize(URI uri, Configuration conf) throws IOException
throw new IOException("Cluster Manager not set");
}
super.initialize(uri, conf);
fs.initialize(uri, conf);
this.uri = URI.create(uri.getScheme() + "://" + uri.getAuthority());
this.workingDir = new Path("/user", System.getProperty("user.name")).makeQualified(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to set workingDir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most of the filesystem implementations, the initialize method sets the working directory and uri for that class.Working directory is required to create the absolute path. Just to be consistent for all the filesystem implementations, i did this change

@@ -100,6 +103,9 @@ public Integer call()
log.info(String.format("Read %d bytes from cached file", read));
fileChannel.close();
raf.close();
if (statistics != null) {
statistics.incrementBytesRead(read);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way that we can provide our own Statistics implementation? Basically, we would want to differentiate amount of data read locally (i.e. in CachenRRC) and amount of data read from other nodes (i.e. in NonLocalRRC).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes right now we are only populating bytes read counter for local read. Current implementation of NonLocalRead stats gives you both remote read and non local read combined. But once we have the asynchronous cache warm up feature, it will be easy to differentiate between the data read from the non local cache and data we are downloading in the remote node as part of this request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how are these statistics shown by Hive jobs? i.e. does it call FileSystem.printStatistics method? Based on that we can think of workarounds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileSystem has the statistics object. This is part of hadoop file system level counter. Hive just gets a report from AM and prints it

@@ -206,7 +219,27 @@ public boolean mkdirs(Path path, FsPermission fsPermission)
public FileStatus getFileStatus(Path path)
throws IOException
{
return fs.getFileStatus(path);
FileStatus originalStatus = fs.getFileStatus(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So changing listing's path to rubix scheme was the main change to support rubix scheme!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this one is main along with initializing the remotefilesystem with appropriate scheme (for CachingNativeS3FileSystem, its s3n) so that when we are reading from the remote filesystem, it gets reflected properly

@abhishekdas99 abhishekdas99 merged commit 3242001 into qubole:master Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants