[DS] Generate repositories partitions from HDFS blocks #10

ajnavarro · 2017-08-30T09:58:07Z

Repositories will be in a specific folder. Example:

repositories/
├── repo1/
│   └── ...
├── repo2/
│   └── ...
└── repo3/
    └── ...

We should list all the files of each repository, get their HDFS block information, and group repositories by the datanode that holds the most blocks of each repository.
With this information we need to create a new class called RepositoryPartition that extends the trait org.apache.spark.Partition and holds a list of repository folders.

These partitions will be passed to each relation so that RDD partitions are created correctly, according to data locality.
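
The grouping step could look roughly like the sketch below. It is plain Python for illustration only, not the project's code: `RepositoryPartition` here is a stand-in dataclass for the proposed class (which would extend org.apache.spark.Partition), and `preferred_host`, `build_partitions`, and the `dn1`/`dn2`/`dn3` host names are hypothetical. It assumes the block hosts of each repository have already been collected (for example through Hadoop's `FileSystem.getFileBlockLocations`), that every repository has at least one block, and that ties are broken alphabetically by host name.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RepositoryPartition:
    """Stand-in for the proposed class; hypothetical field names."""
    index: int                           # partition index, as Spark's Partition requires
    host: str                            # datanode holding most blocks of these repositories
    repository_folders: Tuple[str, ...]  # repository folders assigned to this partition


def preferred_host(block_hosts: List[List[str]]) -> str:
    """Return the host that stores the most blocks of one repository.

    block_hosts has one entry per block, listing the hosts with a replica.
    Ties are broken alphabetically (an assumption made for determinism).
    """
    counts = Counter(h for hosts in block_hosts for h in set(hosts))
    if not counts:
        raise ValueError("repository has no block information")
    return min(counts, key=lambda h: (-counts[h], h))


def build_partitions(repo_blocks: Dict[str, List[List[str]]]) -> List[RepositoryPartition]:
    """Group repositories by preferred host, one partition per host."""
    by_host = defaultdict(list)
    for repo in sorted(repo_blocks):
        by_host[preferred_host(repo_blocks[repo])].append(repo)
    return [
        RepositoryPartition(index, host, tuple(repos))
        for index, (host, repos) in enumerate(sorted(by_host.items()))
    ]


# Made-up block layout for the three example repositories.
example_blocks = {
    "repositories/repo1": [["dn1", "dn2"], ["dn1", "dn3"]],
    "repositories/repo2": [["dn2", "dn3"], ["dn2", "dn1"]],
    "repositories/repo3": [["dn1", "dn2"]],
}
partitions = build_partitions(example_blocks)
```

With this layout, repo1 and repo3 end up in the partition for `dn1` and repo2 in the partition for `dn2`. In Spark itself, the host stored in each partition would presumably be returned from the RDD's `getPreferredLocations` so the scheduler can place tasks close to the data.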

ajnavarro · 2017-09-06T10:06:44Z

Not needed anymore.

ajnavarro closed this as completed Sep 6, 2017
