Skip to content
This is a project to integrate COMPSs with HDFS
Java Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Java
Python
docs
.gitignore
README.md
apiHdfs.png

README.md

Integration: COMPSs and HDFS


In order to COMPSs be able to access the HDFS's data, it has been developed an integration between COMPSs and HDFS that consists on a HDFS connector.

The idea of this API is to use a file in the HDFS as a commom file. You can read the entire file from HDFS and then split it or you can read the same file in the HDFS already splitted. The second way is more elegant and the number of fragmentations of the file (number of splits) can be choosed. There are two options about the number of fragmentations: the first one is retrieve the natural number of logical blocks (it means, the same fragmentation adopted by HDFS to store the data), using the command findALLBlocks(filename) and the second way is to retrieve the blocks by forcing a number of split, using the command findBlocksByRecords(filename,number).

Schema

Such integration is available for COMPSs applications in Java and Python. Support for C/C++ language can be developed later using the same technologies in the Python version.

You can’t perform that action at this time.