Pulling functionality from Apache Spark #1

Merged · 1,286 commits · May 5, 2015

Commits
517bdf3
[doc][streaming] Fixed broken link in mllib section
BenFradet Apr 20, 2015
ce7ddab
[SPARK-6368][SQL] Build a specialized serializer for Exchange operator.
yhuai Apr 21, 2015
c736220
[SPARK-6635][SQL] DataFrame.withColumn should replace columns with id…
viirya Apr 21, 2015
8136810
[SPARK-6490][Core] Add spark.rpc.* and deprecate spark.akka.*
zsxwing Apr 21, 2015
ab9128f
[SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression
Apr 21, 2015
1f2f723
[SPARK-5990] [MLLIB] Model import/export for IsotonicRegression
yanboliang Apr 21, 2015
5fea3e5
[SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOver…
Apr 21, 2015
c035c0f
[SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized C…
kayousterhout Apr 21, 2015
c25ca7c
SPARK-3276 Added a new configuration spark.streaming.minRememberDuration
emres Apr 21, 2015
45c47fa
[SPARK-6845] [MLlib] [PySpark] Add isTranposed flag to DenseMatrix
MechCoder Apr 21, 2015
04bf34e
[SPARK-7011] Build(compilation) fails with scala 2.11 option, because…
ScrapCodes Apr 21, 2015
2e8c6ca
[SPARK-6994] Allow to fetch field values by name in sql.Row
Apr 21, 2015
03fd921
[SQL][minor] make it more clear that we only need to re-throw GetFiel…
cloud-fan Apr 21, 2015
6265cba
[SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used
yhuai Apr 21, 2015
2a24bf9
[SPARK-6996][SQL] Support map types in java beans
Apr 21, 2015
7662ec2
[SPARK-5817] [SQL] Fix bug of udtf with column names
chenghao-intel Apr 21, 2015
f83c0f1
[SPARK-3386] Share and reuse SerializerInstances in shuffle paths
JoshRosen Apr 21, 2015
a70e849
[minor] [build] Set java options when generating mima ignores.
Apr 21, 2015
7fe6142
[SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls
MechCoder Apr 21, 2015
686dd74
[SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark
mengxr Apr 21, 2015
ae036d0
[Minor][MLLIB] Fix a minor formatting bug in toString method in Node.…
Apr 21, 2015
b063a61
Avoid warning message about invalid refuse_seconds value in Mesos >=0…
Apr 22, 2015
e72c16e
[SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.
Apr 22, 2015
3134c3f
[SPARK-6953] [PySpark] speed up python tests
rxin Apr 22, 2015
41ef78a
Closes #5427
rxin Apr 22, 2015
a0761ec
[SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XX…
texasmichelle Apr 22, 2015
3a3f710
[SPARK-6490][Docs] Add docs for rpc configurations
zsxwing Apr 22, 2015
70f9f8f
[MINOR] Comment improvements in ExternalSorter.
pwendell Apr 22, 2015
607eff0
[SPARK-6113] [ML] Small cleanups after original tree API PR
jkbradley Apr 22, 2015
bdc5c16
[SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution…
srowen Apr 22, 2015
33b8562
[SPARK-7052][Core] Add ThreadUtils and move thread methods from Utils…
zsxwing Apr 22, 2015
cdf0328
[SQL] Rename some apply functions.
rxin Apr 22, 2015
fbe7106
[SPARK-7039][SQL]JDBCRDD: Add support on type NVARCHAR
szheng79 Apr 22, 2015
baf865d
[SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin.
rxin Apr 22, 2015
f4f3998
[SPARK-6827] [MLLIB] Wrap FPGrowthModel.freqItemsets and make it cons…
yanboliang Apr 23, 2015
04525c0
[SPARK-6967] [SQL] fix date type convertion in jdbcrdd
adrian-wang Apr 23, 2015
b69c4f9
Disable flaky test: ReceiverSuite "block generator throttling".
rxin Apr 23, 2015
1b85e08
[MLlib] UnaryTransformer nullability should not depend on PrimitiveType.
rxin Apr 23, 2015
d206860
[SPARK-7066][MLlib] VectorAssembler should use NumericType not Native…
rxin Apr 23, 2015
03e85b4
[SPARK-7046] Remove InputMetrics from BlockResult
kayousterhout Apr 23, 2015
d9e70f3
[HOTFIX][SQL] Fix broken cached test
viirya Apr 23, 2015
2d33323
[MLlib] Add support for BooleanType to VectorAssembler.
rxin Apr 23, 2015
29163c5
[SPARK-7068][SQL] Remove PrimitiveType
rxin Apr 23, 2015
f60bece
[SPARK-7069][SQL] Rename NativeType -> AtomicType.
rxin Apr 23, 2015
a7d65d3
[HOTFIX] [SQL] Fix compilation for scala 2.11.
ScrapCodes Apr 23, 2015
975f53e
[minor][streaming]fixed scala string interpolation error
Apr 23, 2015
cc48e63
[SPARK-7044] [SQL] Fix the deadlock in script transformation
chenghao-intel Apr 23, 2015
534f2a4
[SPARK-6752][Streaming] Allow StreamingContext to be recreated from c…
tdas Apr 23, 2015
c1213e6
[SPARK-7055][SQL]Use correct ClassLoader for JDBC Driver in JDBCRDD.g…
Apr 23, 2015
6afde2c
[SPARK-7058] Include RDD deserialization time in "task deserializatio…
JoshRosen Apr 23, 2015
3e91cc2
[SPARK-7085][MLlib] Fix miniBatchFraction parameter in train method c…
Apr 23, 2015
baa83a9
[SPARK-6879] [HISTORYSERVER] check if app is completed before clean i…
WangTaoTheTonic Apr 23, 2015
6d0749c
[SPARK-7087] [BUILD] Fix path issue change version script
tijoparacka Apr 23, 2015
1ed46a6
[SPARK-7070] [MLLIB] LDA.setBeta should call setTopicConcentration.
mengxr Apr 23, 2015
6220d93
[SQL] Break dataTypes.scala into multiple files.
rxin Apr 23, 2015
73db132
[SPARK-6818] [SPARKR] Support column deletion in SparkR DataFrame API.
Apr 23, 2015
336f7f5
[SPARK-7037] [CORE] Inconsistent behavior for non-spark config proper…
Apr 24, 2015
2d010f7
[SPARK-7060][SQL] Add alias function to python dataframe
yhuai Apr 24, 2015
67bccbd
Update sql-programming-guide.md
kgeis Apr 24, 2015
d3a302d
[SQL] Fixed expression data type matching.
rxin Apr 24, 2015
4c722d7
Fixed a typo from the previous commit.
rxin Apr 24, 2015
8509519
[SPARK-5894] [ML] Add polynomial mapper
yinxusen Apr 24, 2015
78b39c7
[SPARK-7115] [MLLIB] skip the very first 1 in poly expansion
mengxr Apr 24, 2015
6e57d57
[SPARK-6528] [ML] Add IDF transformer
yinxusen Apr 24, 2015
ebb77b2
[SPARK-7033] [SPARKR] Clean usage of split. Use partition instead whe…
Apr 24, 2015
caf0136
[SPARK-6852] [SPARKR] Accept numeric as numPartitions in SparkR.
Apr 24, 2015
438859e
[SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3
calvinjia Apr 24, 2015
d874f8b
[PySpark][Minor] Update sql example, so that can read file correctly
Sephiroth-Lin Apr 25, 2015
59b7cfc
[SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and…
dbsiegel Apr 25, 2015
cca9905
update the deprecated CountMinSketchMonoid function to TopPctCMS func…
caikehe Apr 25, 2015
a61d65f
Revert "[SPARK-6752][Streaming] Allow StreamingContext to be recreate…
pwendell Apr 25, 2015
a7160c4
[SPARK-6113] [ML] Tree ensembles for Pipelines API
jkbradley Apr 25, 2015
aa6966f
[SQL] Update SQL readme to include instructions on generating golden …
yhuai Apr 25, 2015
a11c868
[SPARK-7092] Update spark scala version to 2.11.6
ScrapCodes Apr 25, 2015
f5473c2
[SPARK-6014] [CORE] [HOTFIX] Add try-catch block around ShutDownHook
nishkamravi2 Apr 26, 2015
9a5bbe0
[MINOR] [MLLIB] Refactor toString method in MLLIB
Apr 26, 2015
ca55dc9
[SPARK-7152][SQL] Add a Column expression for partition ID.
rxin Apr 26, 2015
d188b8b
[SQL][Minor] rename DataTypeParser.apply to DataTypeParser.parse
scwf Apr 27, 2015
82bb7fd
[SPARK-6505] [SQL] Remove the reflection call in HiveFunctionWrapper
baishuo Apr 27, 2015
998aac2
[SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact
chernetsov Apr 27, 2015
7078f60
[SPARK-6856] [R] Make RDD information more useful in SparkR
Jeffrharr Apr 27, 2015
ef82bdd
SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputfor…
tedyu Apr 27, 2015
ca9f4eb
[SPARK-6991] [SPARKR] Adds support for zipPartitions.
hlin09 Apr 27, 2015
b9de9e0
[SPARK-7103] Fix crash with SparkContext.union when RDD has no partit…
stshe Apr 27, 2015
8e1c00d
[SPARK-6738] [CORE] Improve estimate the size of a large array
shenh062326 Apr 27, 2015
5d45e1f
[SPARK-3090] [CORE] Stop SparkContext if user forgets to.
Apr 27, 2015
ab5adb7
[SPARK-7145] [CORE] commons-lang (2.x) classes used instead of common…
srowen Apr 27, 2015
62888a4
[SPARK-7162] [YARN] Launcher error in yarn-client
witgo Apr 27, 2015
4d9e560
[SPARK-7090] [MLLIB] Introduce LDAOptimizer to LDA to further improve…
hhbyyh Apr 28, 2015
874a2ca
[SPARK-7174][Core] Move calling `TaskScheduler.executorHeartbeatRecei…
zsxwing Apr 28, 2015
29576e7
[SPARK-6829] Added math functions for DataFrames
brkyvz Apr 28, 2015
9e4e82b
[SPARK-5946] [STREAMING] Add Python API for direct Kafka stream
jerryshao Apr 28, 2015
bf35edd
[SPARK-7187] SerializationDebugger should not crash user code
Apr 28, 2015
d94cd1a
[SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs.
rxin Apr 28, 2015
e13cd86
[SPARK-6352] [SQL] Custom parquet output committer
Apr 28, 2015
7f3b3b7
[SPARK-7168] [BUILD] Update plugin versions in Maven build and centra…
srowen Apr 28, 2015
75905c5
[SPARK-7100] [MLLIB] Fix persisted RDD leak in GradientBoostTrees
Apr 28, 2015
268c419
[SPARK-6435] spark-shell --jars option does not add all jars to class…
tsudukim Apr 28, 2015
6a827d5
[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN
Apr 28, 2015
b14cd23
[SPARK-7140] [MLLIB] only scan the first 16 entries in Vector.hashCode
mengxr Apr 28, 2015
52ccf1d
[Core][test][minor] replace try finally block with tryWithSafeFinally
liyezhang556520 Apr 28, 2015
8aab94d
[SPARK-4286] Add an external shuffle service that can be run as a dae…
dragos Apr 28, 2015
2d222fb
[SPARK-5932] [CORE] Use consistent naming for size properties
Apr 28, 2015
8009810
[SPARK-6314] [CORE] handle JsonParseException for history server
liyezhang556520 Apr 28, 2015
53befac
[SPARK-5338] [MESOS] Add cluster mode support for Mesos
tnachen Apr 28, 2015
28b1af7
[MINOR] [CORE] Warn users who try to cache RDDs with dynamic allocati…
Apr 28, 2015
f0a1f90
[SPARK-7201] [MLLIB] move Identifiable to ml.util
mengxr Apr 28, 2015
555213e
Closes #4807
mengxr Apr 28, 2015
d36e673
[SPARK-6965] [MLLIB] StringIndexer handles numeric input.
mengxr Apr 29, 2015
5c8f4bd
[SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple…
tdas Apr 29, 2015
a8aeadb
[SPARK-7208] [ML] [PYTHON] Added Matrix, SparseMatrix to __all__ list…
jkbradley Apr 29, 2015
5ef006f
[SPARK-6756] [MLLIB] add toSparse, toDense, numActives, numNonzeros, …
mengxr Apr 29, 2015
271c4c6
[SPARK-7215] made coalesce and repartition a part of the query plan
brkyvz Apr 29, 2015
f98773a
[SPARK-7205] Support `.ivy2/local` and `.m2/repositories/` in --packages
brkyvz Apr 29, 2015
8dee274
MAINTENANCE: Automated closing of pull requests.
pwendell Apr 29, 2015
fe917f5
[SPARK-7188] added python support for math DataFrame functions
brkyvz Apr 29, 2015
1fd6ed9
[SPARK-7204] [SQL] Fix callSite for Dataframe and SQL operations
pwendell Apr 29, 2015
f49284b
[SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use managed memory for aggr…
JoshRosen Apr 29, 2015
baed3f2
[SPARK-6918] [YARN] Secure HBase support.
deanchen Apr 29, 2015
687273d
[SPARK-7223] Rename RPC askWithReply -> askWithReply, sendWithReply -…
rxin Apr 29, 2015
3df9c5d
Better error message on access to non-existing attribute
ksonj Apr 29, 2015
81ea42b
[SQL][Minor] fix java doc for DataFrame.agg
cloud-fan Apr 29, 2015
c0c0ba6
Fix a typo of "threshold"
yinxusen Apr 29, 2015
1868bd4
[SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable
tdas Apr 29, 2015
a9c4e29
[SPARK-6752] [STREAMING] [REOPENED] Allow StreamingContext to be recr…
tdas Apr 29, 2015
3a180c1
[SPARK-6629] cancelJobGroup() may not work for jobs whose job groups …
JoshRosen Apr 29, 2015
15995c8
[SPARK-7222] [ML] Added mathematical derivation in comment and compre…
Apr 29, 2015
c9d530e
[SPARK-6529] [ML] Add Word2Vec transformer
yinxusen Apr 29, 2015
d7dbce8
[SPARK-7156][SQL] support RandomSplit in DataFrames
brkyvz Apr 29, 2015
7f4b583
[SPARK-7181] [CORE] fix inifite loop in Externalsorter's mergeWithAgg…
chouqin Apr 29, 2015
3fc6cfd
[SPARK-7155] [CORE] Allow newAPIHadoopFile to support comma-separated…
yongtang Apr 29, 2015
f8cbb0a
[SPARK-7229] [SQL] SpecificMutableRow should take integer type as int…
chenghao-intel Apr 29, 2015
b1ef6a6
[SPARK-7259] [ML] VectorIndexer: do not copy non-ML metadata to outpu…
jkbradley Apr 29, 2015
1fdfdb4
[SQL] [Minor] Print detail query execution info when spark answer is …
scwf Apr 30, 2015
114bad6
[SPARK-7176] [ML] Add validation functionality to Param
jkbradley Apr 30, 2015
1b7106b
[SPARK-6862] [STREAMING] [WEBUI] Add BatchPage to display details of …
zsxwing Apr 30, 2015
7143f6e
[SPARK-7234][SQL] Fix DateType mismatch when codegen on.
Apr 30, 2015
5553198
[SPARK-7156][SQL] Addressed follow up comments for randomSplit
brkyvz Apr 30, 2015
ba49eb1
Some code clean up.
Apr 30, 2015
4459514
[SPARK-7225][SQL] CombineLimits optimizer does not work
pzzs Apr 30, 2015
254e050
[SPARK-1406] Mllib pmml model export
selvinsource Apr 30, 2015
47bf406
[HOTFIX] Disabling flaky test (fix in progress as part of SPARK-7224)
pwendell Apr 30, 2015
7dacc08
[SPARK-7224] added mock repository generator for --packages tests
brkyvz Apr 30, 2015
6c65da6
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan Apr 30, 2015
adbdb19
[SPARK-7207] [ML] [BUILD] Added ml.recommendation, ml.regression to S…
jkbradley Apr 30, 2015
e0628f2
Revert "[SPARK-5342] [YARN] Allow long running Spark apps to run on s…
pwendell Apr 30, 2015
6702324
[SPARK-7196][SQL] Support precision and scale of decimal type for JDBC
viirya Apr 30, 2015
07a8620
[SPARK-7288] Suppress compiler warnings due to use of sun.misc.Unsafe…
JoshRosen Apr 30, 2015
77cc25f
[SPARK-7267][SQL]Push down Project when it's child is Limit
pzzs Apr 30, 2015
fa01bec
[Build] Enable MiMa checks for SQL
JoshRosen Apr 30, 2015
1c3e402
[SPARK-7279] Removed diffSum which is theoretical zero in LinearRegre…
Apr 30, 2015
149b3ee
[SPARK-7242][SQL][MLLIB] Frequent items for DataFrames
brkyvz Apr 30, 2015
ee04413
[SPARK-7280][SQL] Add "drop" column/s on a data frame
rakeshchalasani May 1, 2015
0797338
[SPARK-7093] [SQL] Using newPredicate in NestedLoopJoin to enable cod…
scwf May 1, 2015
a0d8a61
[SPARK-7109] [SQL] Push down left side filter for left semi join
scwf May 1, 2015
e991255
[SPARK-6913][SQL] Fixed "java.sql.SQLException: No suitable driver fo…
SlavikBaranov May 1, 2015
3ba5aaa
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 1, 2015
473552f
[SPARK-7123] [SQL] support table.star in sqlcontext
scwf May 1, 2015
beeafcf
Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support"
pwendell May 1, 2015
69a739c
[SPARK-7282] [STREAMING] Fix the race conditions in StreamingListener…
zsxwing May 1, 2015
b5347a4
[SPARK-7248] implemented random number generators for DataFrames
brkyvz May 1, 2015
36a7a68
[SPARK-6479] [BLOCK MANAGER] Create off-heap block storage API
zhzhan May 1, 2015
a9fc505
HOTFIX: Disable buggy dependency checker
pwendell May 1, 2015
0a2b15c
[SPARK-4550] In sort-based shuffle, store map outputs in serialized form
sryza May 1, 2015
7cf1eb7
[SPARK-7287] enabled fixed test
brkyvz May 1, 2015
14b3288
[SPARK-7291] [CORE] Fix a flaky test in AkkaRpcEnvSuite
zsxwing May 1, 2015
c24aeb6
[SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation
MechCoder May 1, 2015
7fe0f3f
[SPARK-3468] [WEBUI] Timeline-View feature
sarutak May 1, 2015
3052f49
[SPARK-4705] Handle multiple app attempts event logs, history server.
May 1, 2015
3b514af
[SPARK-3066] [MLLIB] Support recommendAll in matrix factorization model
May 1, 2015
7630213
[SPARK-5891] [ML] Add Binarizer ML Transformer
viirya May 1, 2015
c8c481d
Limit help option regex
May 1, 2015
27de6fe
changing persistence engine trait to an abstract class
nirandaperera May 1, 2015
7d42722
[SPARK-5854] personalized page rank
dwmclary May 1, 2015
1262e31
[SPARK-6846] [WEBUI] [HOTFIX] return to GET for kill link in UI since…
srowen May 1, 2015
1686032
[SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.str…
viirya May 1, 2015
3753776
[SPARK-7274] [SQL] Create Column expression for array/struct creation.
rxin May 1, 2015
58d6584
Revert "[SPARK-7287] enabled fixed test"
pwendell May 1, 2015
c6d9a42
Revert "[SPARK-7224] added mock repository generator for --packages t…
pwendell May 1, 2015
f53a488
[SPARK-7213] [YARN] Check for read permissions before copying a Hadoo…
nishkamravi2 May 1, 2015
7b5dd3e
[SPARK-7281] [YARN] Add option to set AM's lib path in client mode.
May 1, 2015
4dc8d74
[SPARK-7240][SQL] Single pass covariance calculation for dataframes
brkyvz May 1, 2015
b1f4ca8
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan May 1, 2015
5c1faba
Ignore flakey test in SparkSubmitUtilsSuite
pwendell May 1, 2015
41c6a44
[SPARK-7312][SQL] SPARK-6913 broke jdk6 build
yhuai May 1, 2015
e6fb377
[SPARK-7304] [BUILD] Include $@ in call to mvn consistently in make-d…
May 2, 2015
98e7045
[SPARK-6999] [SQL] Remove the infinite recursive method (useless)
chenghao-intel May 2, 2015
ebc25a4
[SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in Received…
zsxwing May 2, 2015
b88c275
[SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the …
jerryshao May 2, 2015
4786484
[SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
koeninger May 2, 2015
ae98eec
[SPARK-3444] Provide an easy way to change log level
holdenk May 2, 2015
099327d
[SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a…
sryza May 2, 2015
2022193
[SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI.
tnachen May 2, 2015
b4b43df
[SPARK-6443] [SPARK SUBMIT] Could not submit app in standalone cluste…
WangTaoTheTonic May 2, 2015
8f50a07
[SPARK-2691] [MESOS] Support for Mesos DockerInfo
hellertime May 2, 2015
38d4e9e
[SPARK-6229] Add SASL encryption to network library.
May 2, 2015
b79aeb9
[SPARK-7317] [Shuffle] Expose shuffle handle
May 2, 2015
2e0f357
[SPARK-7242] added python api for freqItems in DataFrames
brkyvz May 2, 2015
7394e7a
[SPARK-7120] [SPARK-7121] Closure cleaner nesting + documentation + t…
May 2, 2015
ecc6eb5
[SPARK-7315] [STREAMING] [TEST] Fix flaky WALBackedBlockRDDSuite
tdas May 2, 2015
856a571
[SPARK-3444] Fix typo in Dataframes.py introduced in []
deanchen May 2, 2015
da30352
[SPARK-7323] [SPARK CORE] Use insertAll instead of insert while mergi…
May 2, 2015
bfcd528
[SPARK-6030] [CORE] Using simulated field layout method to compute cl…
advancedxy May 2, 2015
82c8c37
[MINOR] [HIVE] Fix QueryPartitionSuite.
May 2, 2015
5d6b90d
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 2, 2015
ea841ef
[SPARK-7255] [STREAMING] [DOCUMENTATION] Added documentation for spar…
BenFradet May 2, 2015
49549d5
[SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMOR…
WangTaoTheTonic May 2, 2015
f4af925
[SPARK-7022] [PYSPARK] [ML] Add ML.Tuning.ParamGridBuilder to PySpark
May 3, 2015
daa70bf
[SPARK-6907] [SQL] Isolated client for HiveMetastore
marmbrus May 3, 2015
9e25b09
[SPARK-7302] [DOCS] SPARK building documentation still mentions build…
srowen May 3, 2015
1ffa8cb
[SPARK-7329] [MLLIB] simplify ParamGridBuilder impl
mengxr May 4, 2015
9646018
[SPARK-7241] Pearson correlation for DataFrames
brkyvz May 4, 2015
3539cb7
[SPARK-5563] [MLLIB] LDA with online variational inference
hhbyyh May 4, 2015
343d3bf
[SPARK-5100] [SQL] add webui for thriftserver
tianyi May 4, 2015
5a1a107
[MINOR] Fix python test typo?
May 4, 2015
e0833c5
[SPARK-5956] [MLLIB] Pipeline components should be copyable.
mengxr May 4, 2015
f32e69e
[SPARK-7319][SQL] Improve the output from DataFrame.show()
May 4, 2015
fc8b581
[SPARK-6943] [SPARK-6944] DAG visualization on SparkUI
May 4, 2015
8055411
[SPARK-7243][SQL] Contingency Tables for DataFrames
brkyvz May 5, 2015
678c4da
[SPARK-7266] Add ExpectsInputTypes to expressions when possible.
rxin May 5, 2015
8aa5aea
[SPARK-7236] [CORE] Fix to prevent AkkaUtils askWithReply from sleepi…
BryanCutler May 5, 2015
e9b16e6
[SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4
mengxr May 5, 2015
da738cf
[MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and…
pippobaudos May 5, 2015
c5790a2
[MINOR] [BUILD] Declare ivy dependency in root pom.
May 5, 2015
1854ac3
[SPARK-7139] [STREAMING] Allow received block metadata to be saved to…
tdas May 5, 2015
8776fe0
[HOTFIX] [TEST] Ignoring flaky tests
tdas May 5, 2015
8436f7e
[SPARK-7113] [STREAMING] Support input information reporting for Dire…
jerryshao May 5, 2015
4d29867
[SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark…
zsxwing May 5, 2015
fc8feaa
[SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor …
May 5, 2015
4222da6
[SPARK-5112] Expose SizeEstimator as a developer api
sryza May 5, 2015
51f4620
[SPARK-7357] Improving HBaseTest example
JihongMA May 5, 2015
d497358
[SPARK-3454] separate json endpoints for data in the UI
squito May 5, 2015
b83091a
[MINOR] Minor update for document
viirya May 5, 2015
5ffc73e
[SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map…
zsxwing May 5, 2015
c6d1efb
[SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when callin…
zsxwing May 5, 2015
5ab652c
[SPARK-7202] [MLLIB] [PYSPARK] Add SparseMatrixPickler to SerDe
MechCoder May 5, 2015
5995ada
[SPARK-6612] [MLLIB] [PYSPARK] Python KMeans parity
FlytxtRnD May 5, 2015
9d250e6
Closes #5591
mengxr May 5, 2015
d4cb38a
[MLLIB] [TREE] Verify size of input rdd > 0 when building meta data
May 5, 2015
1fdabf8
[SPARK-7237] Many user provided closures are not actually cleaned
May 5, 2015
57e9f29
[SPARK-7318] [STREAMING] DStream cleans objects that are not closures
May 5, 2015
9f1f9b1
[SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager
jerryshao May 5, 2015
18340d7
[SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames
brkyvz May 5, 2015
ee374e8
[SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark
mengxr May 5, 2015
47728db
[SPARK-5888] [MLLIB] Add OneHotEncoder as a Transformer
sryza May 5, 2015
489700c
[SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs fo…
zsxwing May 5, 2015
735bc3d
[SPARK-7294][SQL] ADD BETWEEN
May 5, 2015
fec7b29
[SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatche…
zsxwing May 5, 2015
3059291
[SQL][Minor] make StringComparison extends ExpectsInputTypes
scwf May 5, 2015
c688e3c
[SPARK-7230] [SPARKR] Make RDD private in SparkR.
shivaram May 5, 2015
Files changed
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,6 +6,7 @@
*.iml
*.iws
*.pyc
*.pyo
.idea/
.idea_modules/
build/*.jar
@@ -62,6 +63,8 @@ ec2/lib/
rat-results.txt
scalastyle.txt
scalastyle-output.xml
R-unit-tests.log
R/unit-tests.out

# For Hive
metastore_db/
18 changes: 18 additions & 0 deletions .rat-excludes
@@ -1,4 +1,5 @@
target
cache
.gitignore
.gitattributes
.project
@@ -14,10 +15,12 @@ TAGS
RELEASE
control
docs
docker.properties.template
fairscheduler.xml.template
spark-defaults.conf.template
log4j.properties
log4j.properties.template
metrics.properties
metrics.properties.template
slaves
slaves.template
@@ -27,7 +30,13 @@ spark-env.sh.template
log4j-defaults.properties
bootstrap-tooltip.js
jquery-1.11.1.min.js
d3.min.js
dagre-d3.min.js
graphlib-dot.min.js
sorttable.js
vis.min.js
vis.min.css
vis.map
.*avsc
.*txt
.*json
@@ -65,3 +74,12 @@ logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
known_translations
json_expectation
local-1422981759269/*
local-1422981780767/*
local-1425081759269/*
local-1426533911241/*
local-1426633911242/*
local-1427397477963/*
DESCRIPTION
NAMESPACE
22 changes: 13 additions & 9 deletions CONTRIBUTING.md
@@ -1,12 +1,16 @@
## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org?
- Is the change being proposed clearly explained and motivated?

Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
When you contribute code, you affirm that the contribution is your original work and that you
license the work to the project under the project's open source license. Whether or not you
state this explicitly, by submitting any copyrighted material via pull request, email, or
other means you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.
47 changes: 47 additions & 0 deletions LICENSE
@@ -643,6 +643,36 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

========================================================================
For d3 (core/src/main/resources/org/apache/spark/ui/static/d3.min.js):
========================================================================

Copyright (c) 2010-2015, Michael Bostock
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* The name Michael Bostock may not be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL MICHAEL BOSTOCK BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

========================================================================
For Scala Interpreter classes (all .scala files in repl/src/main/scala
@@ -771,6 +801,22 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

========================================================================
For TestTimSort (core/src/test/java/org/apache/spark/util/collection/TestTimSort.java):
========================================================================
Copyright (C) 2015 Stijn de Gouw

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

========================================================================
For LimitedInputStream
@@ -798,6 +844,7 @@ BSD-style licenses
The following components are provided under a BSD-style license. See project link for details.

(BSD 3 Clause) core (com.github.fommil.netlib:core:1.1.2 - https://github.com/fommil/netlib-java/core)
(BSD 3 Clause) JPMML-Model (org.jpmml:pmml-model:1.1.15 - https://github.com/jpmml/jpmml-model)
(BSD 3-clause style license) jblas (org.jblas:jblas:1.2.3 - http://jblas.org/)
(BSD License) AntLR Parser Generator (antlr:antlr:2.7.7 - http://www.antlr.org/)
(BSD License) Javolution (javolution:javolution:5.5.1 - http://javolution.org)
6 changes: 6 additions & 0 deletions R/.gitignore
@@ -0,0 +1,6 @@
*.o
*.so
*.Rd
lib
pkg/man
pkg/html
12 changes: 12 additions & 0 deletions R/DOCUMENTATION.md
@@ -0,0 +1,12 @@
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated with
`roxygen2`. After making changes to the documentation, you can regenerate the man pages
by running the following from an R console in the SparkR home directory:

```
library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))
```

You can verify that your changes are good by running

```
R CMD check pkg/
```
67 changes: 67 additions & 0 deletions R/README.md
@@ -0,0 +1,67 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run
```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

./bin/sparkR

The `sparkR` script automatically creates a SparkContext, running Spark in local mode
by default. To specify the Spark master of a cluster for the automatically created
SparkContext, you can run

./bin/sparkR --master "local[2]"

To set other options, such as driver memory or executor memory, you can pass the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
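
An illustrative invocation (a sketch only: these are standard spark-submit flags, and the values shown are placeholders rather than recommendations):

```
./bin/sparkR --master "local[2]" --driver-memory 2g
```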

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:
```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e., no Scala changes), then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `run-tests.sh` script as described below.
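
A minimal edit-and-test loop, run from the Spark root directory (a sketch; it assumes the `testthat` package needed by the tests is already installed, as described in the unit test section below):

```
# Re-install the SparkR package after editing R sources under R/pkg
./R/install-dev.sh
# Run the existing SparkR unit tests
./R/run-tests.sh
```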

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script.

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

./bin/sparkR examples/src/main/r/pi.R local[2]

You can also run the unit tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh

### Running on YARN
The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN conf dir before doing so. For example, on CDH you can run:
```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
```
13 changes: 13 additions & 0 deletions R/WINDOWS.md
@@ -0,0 +1,13 @@
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required:

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.
2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`; a sample session is sketched below.
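
A sample build session (a sketch only: the `MAVEN_OPTS` value is an illustrative JDK7-era setting, assumed here rather than prescribed by these steps):

```
rem Illustrative SparkR build on Windows; adjust memory settings to your machine
set MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M
mvn -DskipTests -Psparkr package
```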
46 changes: 46 additions & 0 deletions R/create-docs.sh
@@ -0,0 +1,46 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
pushd $FWDIR

# Generate Rd file
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'

# Install the package
./install-dev.sh

# Now create HTML files

# knit_rd puts html in current working directory
mkdir -p pkg/html
pushd pkg/html

Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'

popd

popd
27 changes: 27 additions & 0 deletions R/install-dev.bat
@@ -0,0 +1,27 @@
@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements. See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License. You may obtain a copy of the License at
rem
rem http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

rem Install development version of SparkR
rem

set SPARK_HOME=%~dp0..

MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\
36 changes: 36 additions & 0 deletions R/install-dev.sh
@@ -0,0 +1,36 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script packages the SparkR source files (R and C files) and
# creates a package that can be loaded in R. The package is by default installed to
# $FWDIR/lib and the package can be loaded by using the following command in R:
#
# library(SparkR, lib.loc="$FWDIR/lib")
#
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory
# to load the SparkR package on the worker nodes.


FWDIR="$(cd `dirname $0`; pwd)"
LIB_DIR="$FWDIR/lib"

mkdir -p $LIB_DIR

# Install R
R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/
28 changes: 28 additions & 0 deletions R/log4j.properties
@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file R-unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R-unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN