Processing of siva files bigger than 2Gb #31

smacker · 2018-01-03T18:50:08Z

A job in Spark hits a 2 GB limit. We need to investigate whether the limit can be changed and how changing it would affect Spark (it was presumably introduced for a reason).

A tip for anyone else who looks at this: the limit does not actually seem to come from Spark, but from the JVM. I may be wrong. JFYI

Exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 415 in stage 1.0 failed 4 times, most recent failure: Lost task 415.3 in stage 1.0 (TID 1072, 10.2.15.79, executor 8): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
	at tech.sourced.siva.SivaReader.getEntry(SivaReader.java:42)
	at tech.sourced.engine.provider.RepositoryObjectFactory$$anonfun$genSivaRepository$1.apply(RepositoryProvider.scala:209)
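The top frame of the trace is `FileChannelImpl.map`, so the failure can be reproduced without Spark or a large file. A minimal sketch, assuming a standard JDK: `MapLimitDemo` and `tryMap` are hypothetical names made up for illustration, and the temporary file is empty because the size check appears to run before the file is touched.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapLimitDemo {
    /**
     * Tries to map `size` bytes of an empty temporary file.
     * Returns the IllegalArgumentException message, or null if no such exception was thrown.
     */
    static String tryMap(long size) throws IOException {
        Path tmp = Files.createTempFile("map-limit", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // FileChannel.map takes a long size, but rejects anything above Integer.MAX_VALUE.
            ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // One byte over the limit is enough to trigger the check.
        System.out.println(tryMap((long) Integer.MAX_VALUE + 1));
    }
}
```

If the message printed matches the one in the trace above, the limit is in the JDK's mapping call rather than in Spark's scheduling of the task.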


smacker · 2018-01-04T15:27:54Z

The siva reader uses MappedByteBuffer:
https://github.com/src-d/siva-java/blob/master/src/main/java/tech/sourced/siva/SivaReader.java#L43

And there is a limit in the JDK: https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/nio/ch/FileChannelImpl.java#L788
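One common way around a per-mapping size cap is to cover a large byte range with several smaller windows and map (or read) each one separately. The sketch below only shows the window arithmetic. It is an assumption about one possible approach, not a description of how siva-java handles it, and `ChunkedWindows` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedWindows {
    /**
     * Splits the byte range [offset, offset + length) into consecutive windows
     * of at most maxChunk bytes. Each window is a {position, length} pair.
     */
    static List<long[]> split(long offset, long length, long maxChunk) {
        if (offset < 0 || length < 0 || maxChunk <= 0) {
            throw new IllegalArgumentException("offset/length must be >= 0 and maxChunk > 0");
        }
        List<long[]> windows = new ArrayList<>();
        long pos = offset;
        long remaining = length;
        while (remaining > 0) {
            long len = Math.min(remaining, maxChunk);
            windows.add(new long[] {pos, len});
            pos += len;
            remaining -= len;
        }
        return windows;
    }

    public static void main(String[] args) {
        // A 5,000,000,000-byte range needs more than one window of Integer.MAX_VALUE bytes.
        for (long[] w : split(0, 5_000_000_000L, Integer.MAX_VALUE)) {
            System.out.println("position=" + w[0] + " length=" + w[1]);
        }
    }
}
```

Each `{position, length}` pair stays within what a single `FileChannel.map` call accepts, at the cost of the caller having to handle data that straddles a window boundary.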

Then there is an issue in Spark: a partition can't be larger than 2 GB:
https://issues.apache.org/jira/browse/SPARK-6235
The issue has been open since 2015, with not much progress since then.

We can't change it.

@bzz @dpordomingo

bzz · 2018-01-05T09:41:21Z

2 issues here:

1. siva-java fails to read .siva files > 2 GB

To understand whether that is by design or a 🐛, I would suggest:
- find a repository that results in a > 2 GB .siva file
- borges pack it locally
- try siva unpack using the Go implementation

If it works, log an issue in https://github.com/src-d/siva-java

2. Apache Spark can't have a partition larger than 2 GB

Will update this later today

smacker · 2018-01-05T10:01:13Z

I have already created an issue, because even if it's by design, it should be documented.

But here is a test:

$ ls -lah
-rw-r--r--. 1 root root 4.6G Jan  5 09:48 ec644e00e3cab40629bc32562269c011ec2a6b14.siva
$ siva unpack ec644e00e3cab40629bc32562269c011ec2a6b14.siva
$ ls
HEAD  config  ec644e00e3cab40629bc32562269c011ec2a6b14.siva  go  objects  refs

The big files are available in /apps/borges/too-big if anybody else needs them.

bzz · 2018-01-05T10:04:27Z

Nice, why not link the issue here?

smacker · 2018-01-05T10:06:17Z

Good point haha. I honestly believed I did, but no. Here you go: src-d/siva-java#18

bzz · 2018-01-17T10:04:31Z

The siva-java issue was fixed (src-d/siva-java#18 (comment)), and a new v0.1.3 was released, plus a new engine version that includes it.

smacker · 2018-01-23T12:03:48Z

With the latest engine, processing works.

smacker closed this as completed Jan 23, 2018
