Commits on Dec 29, 2013
  1. Add some TODOs

    Evan Chan committed Dec 29, 2013
Commits on Dec 20, 2013
  1. Add a /healthz route

    Evan Chan committed Dec 20, 2013
  2. Enable assembly task for jobserver project

    The job server assembly excludes everything in spark core, resulting in
    a much smaller assembly.
    Evan Chan committed Dec 19, 2013
Commits on Dec 18, 2013
Commits on Dec 16, 2013
Commits on Dec 9, 2013
Commits on Dec 8, 2013
Commits on Dec 2, 2013
  1. Add jobserver project, README, docs

    Evan Chan committed Nov 28, 2013
Commits on Nov 25, 2013
  1. Merge pull request #101 from colorant/yarn-client-scheduler

    For SPARK-527, Support spark-shell when running on YARN
    
    Synced to trunk and resubmitted here.
    
    In the current YARN mode, the application runs in the Application Master as a user program, so the whole Spark context is on a remote node.
    
    This approach does not support applications that involve local interaction and need to run where they are launched.
    
    This pull request therefore adds a YarnClientClusterScheduler and backend.
    
    With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes with a thin AM that only launches the executors and monitors the Driver Actor status, so that when the client app is done, it can finish the YARN application as well.
    
    This enables spark-shell to run on YARN.
    
    This also enables other Spark applications to run the Spark context locally with the master URL "yarn-client". For example, SparkPi can print its result on the local console instead of in the log of the remote machine where the AM is running.
    
    The docs are also updated to show how to use this yarn-client mode.
    mateiz committed Nov 25, 2013
  2. Merge pull request #203 from witgo/master

    Fix Maven build for metrics-graphite
    rxin committed Nov 25, 2013
  3. Merge pull request #151 from russellcardullo/add-graphite-sink

    Add graphite sink for metrics
    
    This adds a metrics sink for Graphite. The sink must
    be configured with the host and port of a Graphite node
    and may optionally be configured with a prefix that is
    prepended to all metrics sent to Graphite.
    mateiz committed Nov 25, 2013
Commits on Nov 24, 2013
  1. Merge pull request #185 from mkolod/random-number-generator

    XORShift RNG with unit tests and benchmark
    
    This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950
    
    To run the unit test, start the SBT console and type:
    compile
    test-only org.apache.spark.util.XORShiftRandomSuite
    To run the benchmark, type:
    project core
    console
    Once the Scala console starts, type:
    org.apache.spark.util.XORShiftRandom.benchmark(100000000)
    XORShiftRandom is also an object with a main method taking the
    number of iterations as an argument, so you can also run it
    from the command line.
    mateiz committed Nov 24, 2013
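
For readers unfamiliar with the technique named in pull request #185, the following is a minimal sketch of a 64-bit xorshift generator. It is not the code from that patch: the class name `XorShiftSketch`, the shift amounts (21, 35, 4), and the zero-seed replacement are all assumptions made for illustration.

```scala
// Hypothetical stand-in for illustration; not the XORShiftRandom class from the patch.
final class XorShiftSketch(seed: Long) {
  // Xorshift state must never be zero, or every later output would also be zero.
  private var state: Long = if (seed == 0L) 1L else seed

  /** Advances the state with three shift-and-xor steps and returns the new state. */
  def nextLong(): Long = {
    state ^= state << 21
    state ^= state >>> 35
    state ^= state << 4
    state
  }

  /** Returns the top `bits` bits (1 to 32) of the next state as an Int. */
  def next(bits: Int): Int = (nextLong() >>> (64 - bits)).toInt
}
```

Each step is an invertible bit operation, so a non-zero state can never collapse to zero, and two generators built from the same seed produce the same sequence.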
  2. Merge pull request #197 from aarondav/patrick-fix

    Fix 'timeWriting' stat for shuffle files
    
    Due to concurrent git branches, changes from shuffle file consolidation patch
    caused the shuffle write timing patch to no longer actually measure the time,
    since it requires time be measured after the stream has been closed.
    rxin committed Nov 24, 2013
  3. Merge pull request #200 from mateiz/hash-fix

    AppendOnlyMap fixes
    
    - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`)
    - Some other small optimizations throughout (see commit comments)
    rxin committed Nov 24, 2013
  4. Some other optimizations to AppendOnlyMap:

    - Don't check keys for equality when re-inserting due to growing the
      table; the keys will already be unique
    - Remember the grow threshold instead of recomputing it on each insert
    mateiz committed Nov 24, 2013
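
The two optimizations in this commit can be sketched on a small open-addressing table. This is an illustration under stated assumptions, not Spark's AppendOnlyMap: the class name `TinyAppendOnlyMap`, the 0.7 load factor, linear probing, and the `Int` value type are all hypothetical.

```scala
// Hypothetical illustration; names and layout are assumptions, not Spark's AppendOnlyMap.
final class TinyAppendOnlyMap(initialCapacity: Int = 8) {
  require(initialCapacity > 0 && (initialCapacity & (initialCapacity - 1)) == 0,
    "capacity must be a power of two")

  private val LoadFactor = 0.7
  private var capacity = initialCapacity
  private var keys = new Array[AnyRef](capacity)
  private var values = new Array[Int](capacity)
  private var count = 0
  // Cached so that put() compares against a field instead of recomputing it.
  private var growThreshold = (LoadFactor * capacity).toInt

  def size: Int = count

  def get(key: AnyRef): Option[Int] = {
    var pos = key.hashCode & (capacity - 1)
    while (keys(pos) != null) {
      if (keys(pos).equals(key)) return Some(values(pos))
      pos = (pos + 1) & (capacity - 1)
    }
    None
  }

  def put(key: AnyRef, value: Int): Unit = {
    var pos = key.hashCode & (capacity - 1)
    while (keys(pos) != null && !keys(pos).equals(key)) {
      pos = (pos + 1) & (capacity - 1)
    }
    if (keys(pos) == null) {
      keys(pos) = key
      count += 1
    }
    values(pos) = value
    if (count > growThreshold) grow()
  }

  private def grow(): Unit = {
    val oldKeys = keys
    val oldValues = values
    capacity *= 2
    keys = new Array[AnyRef](capacity)
    values = new Array[Int](capacity)
    growThreshold = (LoadFactor * capacity).toInt
    for (i <- oldKeys.indices if oldKeys(i) != null) {
      // Keys in the old table are already unique, so probe only for an empty slot.
      var pos = oldKeys(i).hashCode & (capacity - 1)
      while (keys(pos) != null) pos = (pos + 1) & (capacity - 1)
      keys(pos) = oldKeys(i)
      values(pos) = oldValues(i)
    }
  }
}
```

In `grow()` the re-insertion loop never calls `equals`, and `put()` reads `growThreshold` rather than multiplying on every insert, which are the two points the commit message describes.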
  5. Fixes to AppendOnlyMap:

    - Use Murmur Hash 3 finalization step to scramble the bits of HashCode
      instead of the simpler version in java.util.HashMap; the latter one
      had trouble with ranges of consecutive integers. Murmur Hash 3 is used
      by fastutil.
    - Use Object.equals() instead of Scala's == to compare keys, because the
      latter does extra casts for numeric types (see the equals method in
      https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)
    mateiz committed Nov 24, 2013
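
Both points in this commit can be shown in a few lines. The sketch below is not taken from the patch: the object and method names are hypothetical, the constants are the standard MurmurHash3 32-bit finalizer constants, and `javaHashMapSpread` reproduces, from memory, the supplemental hash of older java.util.HashMap versions for comparison only.

```scala
object HashScramble {
  /** MurmurHash3 32-bit finalization step: alternating xor-shifts and multiplies. */
  def murmur3Finalize(hash: Int): Int = {
    var h = hash
    h ^= h >>> 16
    h *= 0x85ebca6b
    h ^= h >>> 13
    h *= 0xc2b2ae35
    h ^= h >>> 16
    h
  }

  /** Shift-and-xor-only spreading, as in older java.util.HashMap (assumed form). */
  def javaHashMapSpread(hash: Int): Int = {
    val h = hash ^ (hash >>> 20) ^ (hash >>> 12)
    h ^ (h >>> 7) ^ (h >>> 4)
  }
}

object EqualityDemo {
  val boxedInt: Any = 1
  val boxedLong: Any = 1L
  // Scala's == compares boxed numbers by value across numeric types.
  val cooperative: Boolean = boxedInt == boxedLong
  // Object.equals() requires the same boxed class, so it skips the numeric casts.
  val strict: Boolean = boxedInt.equals(boxedLong)
}
```

The finalizer is a bijection on 32-bit integers, so it never merges two distinct hash codes; it only redistributes them, which is what helps with runs of consecutive integers.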
Commits on Nov 23, 2013
  1. Merge pull request #198 from ankurdave/zipPartitions-preservesPartitioning
    
    Support preservesPartitioning in RDD.zipPartitions
    
    In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.
    rxin committed Nov 23, 2013
Commits on Nov 22, 2013
  1. Fix 'timeWriting' stat for shuffle files

    Due to concurrent git branches, changes from shuffle file consolidation patch
    caused the shuffle write timing patch to no longer actually measure the time,
    since it requires time be measured after the stream has been closed.
    aarondav committed Nov 22, 2013
  2. Merge pull request #193 from aoiwelle/patch-1

    Fix Kryo Serializer buffer documentation inconsistency
    
    The documentation here is inconsistent with the coded default and other documentation.
    rxin committed Nov 22, 2013
  3. Merge pull request #196 from pwendell/master

    TimeTrackingOutputStream should pass on calls to close() and flush().
    
    Without this fix you get a huge number of open files when running shuffles.
    rxin committed Nov 22, 2013
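
The bug class fixed here is easy to reproduce with any stream wrapper. The following is a minimal sketch, assuming a hypothetical `TimedOutputStream` class; it is not Spark's TimeTrackingOutputStream, and the timing fields are invented for illustration.

```scala
import java.io.OutputStream

// Hypothetical wrapper for illustration; not Spark's TimeTrackingOutputStream.
final class TimedOutputStream(underlying: OutputStream) extends OutputStream {
  private var nanosSpent = 0L

  /** Total nanoseconds spent inside calls to the wrapped stream. */
  def timeWriting: Long = nanosSpent

  private def timed[A](body: => A): A = {
    val start = System.nanoTime()
    try body finally nanosSpent += System.nanoTime() - start
  }

  override def write(b: Int): Unit = timed(underlying.write(b))
  override def write(b: Array[Byte], off: Int, len: Int): Unit =
    timed(underlying.write(b, off, len))

  // OutputStream's default flush() and close() do nothing, so without these
  // overrides the wrapped stream would never be flushed or closed.
  override def flush(): Unit = timed(underlying.flush())
  override def close(): Unit = timed(underlying.close())
}
```

Because `close()` is timed as well, a reading of `timeWriting` is only complete once the stream has been closed, which matches the note in pull request #197 that the time must be measured after the stream is closed.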
  4. Add YarnClientClusterScheduler and Backend.

    With this scheduler, the user application is launched locally,
    while the executors are launched by YARN on remote nodes.
    
    This enables spark-shell to run on YARN.
    colorant committed Oct 23, 2013
  5. TimeTrackingOutputStream should pass on calls to close() and flush().

    Without this fix you get a huge number of open files when running
    shuffles.
    pwendell committed Nov 22, 2013
Commits on Nov 21, 2013
  1. Fix Kryo Serializer buffer inconsistency

    The documentation here is inconsistent with the coded default and other documentation.
    aoiwelle committed Nov 21, 2013
Commits on Nov 20, 2013
  1. Merge branch 'master' of github.com:tbfenet/incubator-spark

    PartitionPruningRDD is using index from parent
    
    I was getting an ArrayIndexOutOfBoundsException after doing a union on a pruned RDD. The index used for each partition was its index in the original RDD, not its index in the new pruned RDD.
    rxin committed Nov 20, 2013
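
The indexing problem described above can be sketched without Spark. The types and the `prune` helper below are hypothetical stand-ins, not the PartitionPruningRDD classes; they only show pruned partitions receiving fresh consecutive indices while keeping a reference to the parent partition.

```scala
// Hypothetical types for illustration; not Spark's PartitionPruningRDD classes.
final case class ParentPartition(index: Int)
final case class PrunedPartition(index: Int, parent: ParentPartition)

object PruningSketch {
  /** Keeps the parent partitions accepted by `keep` and renumbers them from 0. */
  def prune(parents: Seq[ParentPartition], keep: Int => Boolean): Seq[PrunedPartition] =
    parents
      .filter(p => keep(p.index))
      .zipWithIndex
      .map { case (parent, newIndex) => PrunedPartition(newIndex, parent) }
}
```

Reusing the parent index as the partition's own index would leave gaps (for example 1, 3, 5), and code that treats the index as an array position, such as a union, would then read out of bounds.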
  2. Merge pull request #191 from hsaputra/removesemicolonscala

    Cleanup to remove semicolons (;) from Scala code
    
    - The main reason for this PR is to remove semicolons from single statements in Scala code.
    - Remove unused imports where found
    - Fix the ASF comment header in some files (probably a bad copy-paste)
    mateiz committed Nov 20, 2013
  3. Merge branch 'master' into removesemicolonscala

    Henry Saputra committed Nov 20, 2013
  4. Another set of changes to remove unnecessary semicolons (;) from Scala code.
    
    Passed the sbt/sbt compile and test
    Henry Saputra committed Nov 20, 2013
  5. Merge pull request #181 from BlackNiuza/fix_tasks_number

    correct number of tasks in ExecutorsUI
    
    Index `a` is not `execId` here
    mateiz committed Nov 20, 2013