Adding support for rubix scheme and populating filesystem counters #77

abhishekdas99 · 2017-09-28T22:16:52Z

No description provided.

shubhamtagra · 2017-09-29T05:49:41Z

rubix-core/pom.xml

@@ -15,19 +15,15 @@

    <properties>
        <main.basedir>${project.parent.basedir}</main.basedir>
+        <dep.hadoop2.version>2.6.0</dep.hadoop2.version>


This is not required; it is provided by the root POM.

shubhamtagra · 2017-09-29T05:51:24Z

rubix-core/src/main/java/com/qubole/rubix/core/CachedReadRequestChain.java

    }

    @VisibleForTesting
    public CachedReadRequestChain(String fileToRead)
            throws IOException
    {
-        this(fileToRead, ByteBuffer.allocate(1024));
+        this(fileToRead, ByteBuffer.allocate(1024), null);


It is simpler to create a new Statistics object here. That keeps the rest of the code simple, for example with no null checks before incrementing counters.
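Below is a minimal sketch of that suggestion. It makes two assumptions. First, `CachedReadChainSketch` is a hypothetical, trimmed-down stand-in for `CachedReadRequestChain` that keeps only the Statistics plumbing. Second, `StatisticsStandIn` is a local class that mirrors just `incrementBytesRead`/`getBytesRead` from Hadoop's `FileSystem.Statistics`, so the example compiles without Hadoop. In the real class, the test-only constructor would presumably pass a fresh `FileSystem.Statistics` instead of `null`.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for org.apache.hadoop.fs.FileSystem.Statistics, reduced to the
// calls this sketch needs so it compiles without Hadoop on the classpath.
class StatisticsStandIn
{
    private final String scheme;
    private final AtomicLong bytesRead = new AtomicLong();

    StatisticsStandIn(String scheme)
    {
        this.scheme = scheme;
    }

    void incrementBytesRead(long newBytes)
    {
        bytesRead.addAndGet(newBytes);
    }

    long getBytesRead()
    {
        return bytesRead.get();
    }

    String getScheme()
    {
        return scheme;
    }
}

// Hypothetical, trimmed-down read chain: only the Statistics wiring is shown,
// not the actual cached-file reads.
class CachedReadChainSketch
{
    private final String fileToRead;
    private final ByteBuffer directBuffer;
    private final StatisticsStandIn statistics;

    CachedReadChainSketch(String fileToRead, ByteBuffer directBuffer, StatisticsStandIn statistics)
    {
        this.fileToRead = fileToRead;
        this.directBuffer = directBuffer;
        this.statistics = statistics;
    }

    // Test-only constructor: a throwaway Statistics object instead of null,
    // so the read path never needs a null check.
    CachedReadChainSketch(String fileToRead)
    {
        this(fileToRead, ByteBuffer.allocate(1024), new StatisticsStandIn("test"));
    }

    // Stands in for the end of call(): record the bytes just read from the cache.
    int recordRead(int read)
    {
        statistics.incrementBytesRead(read);
        return read;
    }

    StatisticsStandIn getStatistics()
    {
        return statistics;
    }
}
```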

shubhamtagra · 2017-09-29T05:56:12Z

rubix-core/src/main/java/com/qubole/rubix/core/CachingFileSystem.java

@@ -102,13 +107,17 @@ public void initialize(URI uri, Configuration conf) throws IOException
            throw new IOException("Cluster Manager not set");
        }
        super.initialize(uri, conf);
-        fs.initialize(uri, conf);
+        this.uri = URI.create(uri.getScheme() + "://" + uri.getAuthority());
+        this.workingDir = new Path("/user", System.getProperty("user.name")).makeQualified(this);


Is it necessary to set workingDir?

In most filesystem implementations, the initialize method sets the working directory and URI for the class. The working directory is needed to turn relative paths into absolute ones. I made this change to stay consistent with those implementations.
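The sketch below illustrates why both fields matter, using only `java.net.URI`. It assumes that Hadoop's `Path.makeQualified` roughly resolves a relative path against the working directory and then fills in the filesystem's scheme and authority. `QualifySketch` is a hypothetical class, not part of Rubix or Hadoop.

```java
import java.net.URI;

// Plain-JDK approximation of the initialize() change: keep only
// scheme://authority as the filesystem URI, and qualify relative paths
// against a working directory of /user/<user.name>.
class QualifySketch
{
    private final URI fsUri;
    private final String workingDir;

    QualifySketch(URI uri, String userName)
    {
        // Drop any path component of the URI handed to initialize().
        this.fsUri = URI.create(uri.getScheme() + "://" + uri.getAuthority());
        this.workingDir = "/user/" + userName;
    }

    URI getUri()
    {
        return fsUri;
    }

    // Relative paths are made absolute via the working directory; the result
    // then inherits the filesystem's scheme and authority.
    URI qualify(String path)
    {
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        return fsUri.resolve(absolute);
    }
}
```

Without a working directory, a relative path such as `data.csv` has nothing to be resolved against. That is the gap the `initialize` change closes.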

shubhamtagra · 2017-09-29T06:00:46Z

rubix-core/src/main/java/com/qubole/rubix/core/CachedReadRequestChain.java

@@ -100,6 +103,9 @@ public Integer call()
        log.info(String.format("Read %d bytes from cached file", read));
        fileChannel.close();
        raf.close();
+        if (statistics != null) {
+          statistics.incrementBytesRead(read);


Is there a way we can provide our own Statistics implementation? Basically, we want to differentiate the amount of data read locally (i.e. in CachedRRC) from the amount of data read from other nodes (i.e. in NonLocalRRC).

Yes. Right now we only populate the bytes-read counter for local reads. The current NonLocalRead stats combine remote reads and non-local reads. Once we have the asynchronous cache warm-up feature, it will be easy to tell apart data read from the non-local cache and data downloaded on the remote node as part of this request.
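One way to keep those sources apart is sketched below. It is a minimal sketch assuming a hypothetical counter holder (`ReadSourceCounters`, not a Rubix or Hadoop API) with one bucket per read source. Local-cache, non-local-cache and remote-download bytes are then recorded separately instead of being folded into a single bytes-read total.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-source byte counters; each read-request chain would report
// into the bucket matching where its data actually came from.
class ReadSourceCounters
{
    enum Source
    {
        LOCAL_CACHE,     // e.g. CachedReadRequestChain
        NON_LOCAL_CACHE, // another node's cache
        REMOTE           // downloaded from the underlying filesystem
    }

    private final Map<Source, AtomicLong> bytes = new EnumMap<>(Source.class);

    ReadSourceCounters()
    {
        for (Source source : Source.values()) {
            bytes.put(source, new AtomicLong());
        }
    }

    void add(Source source, long n)
    {
        bytes.get(source).addAndGet(n);
    }

    long get(Source source)
    {
        return bytes.get(source).get();
    }

    // Equivalent of today's single combined counter.
    long total()
    {
        long sum = 0;
        for (AtomicLong counter : bytes.values()) {
            sum += counter.get();
        }
        return sum;
    }
}
```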

Do you know how these statistics are shown by Hive jobs, i.e. does it call the FileSystem.printStatistics method? Based on that, we can think of workarounds.

FileSystem holds the Statistics object; it is part of Hadoop's filesystem-level counters. Hive just gets a report from the AM and prints it.
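Filesystem-level counters are grouped by URI scheme. MapReduce job output typically lists lines such as `FILE: Number of bytes read` under File System Counters, which is why a distinct rubix scheme gets its own counters in that report. The sketch below is a simplified model of that idea, not Hadoop's implementation. `SchemeCounters` is hypothetical: one shared bytes-read counter per scheme, snapshotted into a sorted report.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of per-scheme filesystem counters: every filesystem
// instance of a given scheme shares one counter, and the job report
// lists one entry per scheme.
class SchemeCounters
{
    private final Map<String, AtomicLong> bytesReadByScheme = new ConcurrentHashMap<>();

    // Returns the shared counter for a scheme, creating it on first use.
    AtomicLong forScheme(String scheme)
    {
        return bytesReadByScheme.computeIfAbsent(scheme, s -> new AtomicLong());
    }

    // Sorted snapshot, similar in spirit to the per-scheme lines in a job report.
    Map<String, Long> report()
    {
        Map<String, Long> out = new TreeMap<>();
        bytesReadByScheme.forEach((scheme, counter) -> out.put(scheme, counter.get()));
        return out;
    }
}
```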

shubhamtagra · 2017-09-29T06:01:40Z

rubix-core/src/main/java/com/qubole/rubix/core/CachingFileSystem.java

@@ -206,7 +219,27 @@ public boolean mkdirs(Path path, FsPermission fsPermission)
    public FileStatus getFileStatus(Path path)
            throws IOException
    {
-        return fs.getFileStatus(path);
+        FileStatus originalStatus = fs.getFileStatus(path);


So changing the listing's path to the rubix scheme was the main change needed to support the rubix scheme!

Yes, this is the main one, along with initializing the remote filesystem with the appropriate scheme (s3n for CachingNativeS3FileSystem). That way, reads from the remote filesystem are reflected properly.
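The core of that path rewrite is replacing only the scheme while keeping authority and path intact. Below is a minimal sketch with `java.net.URI`. It assumes `rubix` as the caching filesystem's scheme string and uses a hypothetical `SchemeRewrite` helper. The real change works on Hadoop `Path`/`FileStatus` objects returned by the underlying filesystem.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: rewrite only the scheme of a URI, keeping authority,
// path, query and fragment. getFileStatus() could apply the same idea to the
// path reported by the underlying filesystem (e.g. s3n://...).
final class SchemeRewrite
{
    private SchemeRewrite()
    {
    }

    static URI withScheme(URI original, String scheme)
    {
        try {
            return new URI(scheme, original.getAuthority(), original.getPath(),
                    original.getQuery(), original.getFragment());
        }
        catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite " + original, e);
        }
    }
}
```

Going the other way (rubix back to s3n) would be the same call with the remote scheme. Presumably that is the direction needed when the remote filesystem is initialized.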

Abhishek Das added 4 commits August 28, 2017 14:14

filesystem counters

74a6ab3

Merge branch 'master' into filesystem

4f22ed8

Rubix scheme

4b44070

Merge remote-tracking branch 'upstream/master' into filesystem

6a54f58

abhishekdas99 requested review from vrajat, pvam, shubhamtagra and ankitdixit September 28, 2017 22:16

shubhamtagra reviewed Sep 29, 2017


Abhishek Das added 3 commits September 29, 2017 15:11

removed hadoop version

220b8a3

change for rubix scheme

6ce70d6

Merge branch 'master' into filesystem

2dd655e

abhishekdas99 merged commit 3242001 into qubole:master Nov 28, 2017


abhishekdas99 commented Sep 28, 2017

shubhamtagra Sep 29, 2017

abhishekdas99 Sep 29, 2017

shubhamtagra Sep 29, 2017

shubhamtagra Sep 29, 2017

abhishekdas99 Sep 29, 2017

shubhamtagra Sep 29, 2017

abhishekdas99 Sep 29, 2017

shubhamtagra Oct 11, 2017

abhishekdas99 Nov 21, 2017

shubhamtagra Sep 29, 2017

abhishekdas99 Sep 29, 2017
