Feat/data frame methods #121

Merged 327 commits on Oct 7, 2020
4c8b377
Implement GetField
tools4origins Aug 17, 2020
966f630
Implement Contains
tools4origins Aug 17, 2020
bca996b
Implement StartsWith
tools4origins Aug 17, 2020
bd6b085
Implement EndsWith
tools4origins Aug 17, 2020
aeb18c3
Implement IsIn
tools4origins Aug 17, 2020
b58c31c
Implement IsNotNull
tools4origins Aug 17, 2020
d6beaff
Implement Cast
tools4origins Aug 17, 2020
f7f20ba
Implement Substring
tools4origins Aug 17, 2020
d2dd5eb
Implement IsNull
tools4origins Aug 17, 2020
14ac86f
Implement Alias
tools4origins Aug 17, 2020
c606a91
Expose operator functions
tools4origins Aug 17, 2020
9fd0c9c
Define SortOrder
tools4origins Aug 17, 2020
acb7b08
Define main orders
tools4origins Aug 17, 2020
5866720
Define Alias orders
tools4origins Aug 17, 2020
d48c464
Implement Literal Expression
tools4origins Aug 17, 2020
410be9d
Add a format schema util
tools4origins Aug 17, 2020
93e6f41
Add a util to find an expr position in a schema
tools4origins Aug 17, 2020
5c38890
Add a FieldAsExpression function
tools4origins Aug 17, 2020
18fe649
Implement StarOperator Expression
tools4origins Aug 17, 2020
0790308
Implement CaseWhen Expression
tools4origins Aug 17, 2020
193d05f
Implement Otherwise Expression
tools4origins Aug 17, 2020
8b96a83
Implement col function
tools4origins Aug 17, 2020
4a06d2c
Implement when function
tools4origins Aug 17, 2020
3e3ea47
Add basic Column implementation
tools4origins Aug 18, 2020
ab68c9c
Implement a column operator parser util
tools4origins Aug 18, 2020
0a3d7a1
Implement Column arithmetic operators
tools4origins Aug 18, 2020
a39ccf4
Implement Column arithmetic right operators
tools4origins Aug 18, 2020
e2a2a15
Implement Column comparison operators
tools4origins Aug 18, 2020
9244178
Implement Column null safe equal operator
tools4origins Aug 18, 2020
4f36f2e
Implement Column logic operators
tools4origins Aug 18, 2020
09f6b18
Implement Column bitwise logic methods
tools4origins Aug 18, 2020
a20a8a5
Implement Column contains operator
tools4origins Aug 18, 2020
405cf2e
Implement Column getField method
tools4origins Aug 18, 2020
11a8e3d
Implement Column getattr method
tools4origins Aug 18, 2020
b5d21ff
Implement Column getitem method
tools4origins Aug 18, 2020
28bbaab
Implement Column iter method
tools4origins Aug 18, 2020
1134e96
Implement Column getItem method
tools4origins Aug 18, 2020
4fbae51
Implement Column contain method
tools4origins Aug 18, 2020
57dc79b
Clarify like and rlike implementation status
tools4origins Aug 18, 2020
6a64f48
Implement Column nullity methods
tools4origins Aug 18, 2020
6938c2f
Implement Column prefix check
tools4origins Aug 18, 2020
d120991
Implement Column suffix check
tools4origins Aug 18, 2020
4ea1ddb
Implement Column substring extraction method
tools4origins Aug 18, 2020
ba90dd1
Implement Column isin method
tools4origins Aug 18, 2020
76520d0
Reorder Column methods
tools4origins Aug 18, 2020
7b78d8e
Implement column order methods
tools4origins Aug 18, 2020
a3b7e3e
Implement Column alias method
tools4origins Aug 18, 2020
709794d
Alias Column alias method to name
tools4origins Aug 18, 2020
521fa1f
Implement Column cast methods
tools4origins Aug 18, 2020
96f83c8
Implement Column between method
tools4origins Aug 18, 2020
a2d9c1c
Implement a parse expression util
tools4origins Aug 18, 2020
24eb545
Implement Column when/otherwise methods
tools4origins Aug 18, 2020
3b31fe7
Implement a helper Column find position in schema method
tools4origins Aug 18, 2020
a312cde
Implement a helper Column find fields in schema method
tools4origins Aug 18, 2020
d491841
Implement Column eval method
tools4origins Aug 18, 2020
012bf26
Add Column multiple output boolean properties
tools4origins Aug 18, 2020
9a74386
Add Column aggregation boolean property
tools4origins Aug 18, 2020
c42d6f9
Expose Column output fields
tools4origins Aug 18, 2020
d0118e0
Implement Column aggregation methods
tools4origins Aug 18, 2020
befb64a
Implement Column initialization methods
tools4origins Aug 18, 2020
82f2195
Add Column sort order property
tools4origins Aug 18, 2020
6ce91b6
Clarify window function implementation status
tools4origins Aug 18, 2020
dd3e08b
Prevent casting a column to a bool
tools4origins Aug 18, 2020
763a7d8
Add Column data type info properties
tools4origins Aug 18, 2020
c6342e3
Add Column string representation
tools4origins Aug 18, 2020
aaf2a30
Shorten lines that are longer than project code style
tools4origins Oct 3, 2020
dd25c8d
Remove not yet supported tests that rely on DataFrame creation
tools4origins Oct 3, 2020
e0d8f70
Implement schema from list inference
tools4origins Aug 23, 2020
b7f6d1a
Implement schema from RDD inference
tools4origins Aug 23, 2020
18c9b16
Implement Schema from Columns builder
tools4origins Aug 23, 2020
7dd2035
Implement join column getter
tools4origins Aug 23, 2020
c38ce56
Implement schema merge logic for joins
tools4origins Aug 23, 2020
584f021
Implement Column computation method
tools4origins Aug 23, 2020
a0a09e3
Implement Field ID generator
tools4origins Aug 23, 2020
932a23f
Implement DataFrameInternal base object
tools4origins Aug 23, 2020
4cbdb03
Expose DFI unbound schema
tools4origins Aug 23, 2020
0040fd6
Implement DFI builder
tools4origins Aug 23, 2020
5f612e1
Expose DFI rdd
tools4origins Aug 23, 2020
d112137
Implement range in DFI
tools4origins Aug 23, 2020
bf0ad13
Implement count in DFI
tools4origins Aug 23, 2020
ea21cae
Implement collect in DFI
tools4origins Aug 23, 2020
8c722c3
Create SQL context object
tools4origins Aug 23, 2020
73661de
Implement SQL Context builder
tools4origins Aug 23, 2020
46411c0
Implement SQL context session builder
tools4origins Aug 23, 2020
30adc82
Add a Spark Conf setter to SQL context
tools4origins Aug 23, 2020
f5425b0
Implement DataFrame object
tools4origins Aug 23, 2020
15227a0
Bind SQL context to Session
tools4origins Aug 23, 2020
3bf5471
Implement RDD creation from local type
tools4origins Aug 23, 2020
d6e0129
Implement DataFrame creation
tools4origins Aug 23, 2020
b98baa8
Implement DataFrame creation from a range
tools4origins Aug 23, 2020
2b6e0b0
Implement DFI toLocalIterator
tools4origins Aug 23, 2020
cb23601
Implement DFI limit
tools4origins Aug 23, 2020
7a299ba
Implement DFI take
tools4origins Aug 23, 2020
738b50b
Implement DFI foreach
tools4origins Aug 23, 2020
fd02a6b
Implement DFI foreachPartition
tools4origins Aug 23, 2020
4743577
Implement DFI cache
tools4origins Aug 23, 2020
9ab9e42
Implement DFI persist
tools4origins Aug 23, 2020
0041b3e
Implement DFI unpersist
tools4origins Aug 23, 2020
76971c7
Implement DFI coalesce
tools4origins Aug 23, 2020
5920f56
Implement DFI distinct
tools4origins Aug 23, 2020
8b3b5c1
Implement DFI sample
tools4origins Aug 23, 2020
c84dffe
Implement DFI randomSplit
tools4origins Aug 23, 2020
69c2207
Add storageLevel property to DFI
tools4origins Aug 23, 2020
3d1f145
Implement DFI is_cached
tools4origins Aug 23, 2020
bef9501
Implement DFI simple_repartition
tools4origins Aug 23, 2020
abaf506
Implement DFI repartitionByValues
tools4origins Aug 23, 2020
51071f0
Implement DFI repartition
tools4origins Aug 23, 2020
c47c25c
Implement subset per partition extraction from a DFI
tools4origins Aug 23, 2020
a2d836e
Implement DFI repartitionByRange
tools4origins Aug 23, 2020
7a141ee
Implement DFI toJSON
tools4origins Aug 23, 2020
050508f
Implement DFI sortWithinPartitions
tools4origins Aug 23, 2020
a146534
Implement DFI sort
tools4origins Aug 23, 2020
df7a3eb
Implement DFI select
tools4origins Aug 23, 2020
bb17c91
Clarify DFI selectExpr status
tools4origins Aug 23, 2020
0eff052
Implement data frame internal filter method
tools4origins Aug 23, 2020
aefaca0
Implement data frame internal union method
tools4origins Aug 23, 2020
0c3819d
Implement data frame internal unionByName method
tools4origins Aug 23, 2020
1d68222
Implement data frame internal withColumn method
tools4origins Aug 23, 2020
86321e7
Implement data frame internal withColumnRenamed method
tools4origins Aug 23, 2020
9a5b80e
Implement data frame internal toDF method
tools4origins Aug 23, 2020
a8d5520
Implement data frame internal aggregate method
tools4origins Aug 23, 2020
e7b6207
Implement data frame internal describe method
tools4origins Aug 23, 2020
6f4689e
Implement data frame internal show method
tools4origins Aug 23, 2020
92daf50
Implement data frame approxQuantile method
tools4origins Aug 23, 2020
efe7a9d
Implement data frame correlation and covariance methods
tools4origins Aug 23, 2020
85cba7b
Implement data frame internal join methods
tools4origins Aug 23, 2020
de77418
Add a method to repartition and sort partitions
tools4origins Aug 23, 2020
5bf9aca
Implement data frame internal exceptAll method
tools4origins Aug 23, 2020
9b18960
Implement data frame internal intersectAll method
tools4origins Aug 23, 2020
d1a0e7d
Implement data frame internal intersect method
tools4origins Aug 23, 2020
c6062b9
Implement data frame internal drop method
tools4origins Aug 23, 2020
8845284
Clarify some method implementation status
tools4origins Aug 23, 2020
ebebb1f
Align code style with project guidelines
tools4origins Oct 3, 2020
2758f13
Implement the GroupedStats object
tools4origins Aug 23, 2020
b7955da
Implement GroupedDataFrame operations
tools4origins Aug 23, 2020
0bf97b1
Add abstract Aggregation class
tools4origins Oct 3, 2020
5c44c34
Add SimpleStatAggregation abstract class
tools4origins Oct 3, 2020
c034088
Implement Rand expression
tools4origins Oct 3, 2020
e9d6b00
Implement CreateStruct Expression
tools4origins Oct 3, 2020
d0127af
Implement MapFromArraysColumn Expression
tools4origins Oct 3, 2020
d955fa5
Implement Count aggregation
tools4origins Oct 3, 2020
74efe5c
Implement ArrayColumn Expression
tools4origins Oct 3, 2020
47b59c7
Implement CollectSet Aggregation
tools4origins Oct 3, 2020
fee8721
Implement typedLit function
tools4origins Oct 3, 2020
71a950a
Implement lit function
tools4origins Oct 3, 2020
d88c3b1
Implement rand function
tools4origins Oct 3, 2020
933cd58
Implement struct function
tools4origins Oct 3, 2020
a833168
Implement array function
tools4origins Oct 3, 2020
6a08443
Implement map_from_arrays function
tools4origins Oct 3, 2020
e1edd24
Implement count function
tools4origins Oct 3, 2020
96fe05b
Implement collect_set function
tools4origins Oct 3, 2020
5adedfb
Comment tests as they rely on the not yet implemented DataFrame.selec…
tools4origins Oct 3, 2020
99a6108
Store DataFrame SQL context
tools4origins Oct 3, 2020
0fbce05
Implement DataFrame.rdd
tools4origins Oct 3, 2020
4053b4c
Implement DataFrame.is_cached
tools4origins Oct 3, 2020
55f1e68
Implement DataFrame.dropna
tools4origins Oct 3, 2020
ed904b3
Implement DataFrame.fillna
tools4origins Oct 3, 2020
f45a37f
Implement DataFrame._check_replace_input
tools4origins Oct 3, 2020
9e5aa9e
Implement DataFrame.replace
tools4origins Oct 3, 2020
1b9105d
Implement DataFrameNaFunctions
tools4origins Oct 3, 2020
860e4b2
Implement DataFrame.na
tools4origins Oct 3, 2020
11e801a
Implement DataFrame.approxQuantile
tools4origins Oct 3, 2020
04baebd
Implement DataFrame.corr
tools4origins Oct 3, 2020
88d9bf4
Implement DataFrame.cov
tools4origins Oct 3, 2020
6ca1ae1
Implement DataFrame.crosstab
tools4origins Oct 3, 2020
ab6541d
Implement DataFrame.freqItems
tools4origins Oct 3, 2020
5df4576
Implement DataFrame.sampleBy
tools4origins Oct 3, 2020
99ec26d
Implement DataFrameStatFunctions
tools4origins Oct 3, 2020
57c00cc
Implement DataFrame.stat
tools4origins Oct 3, 2020
811e475
Implement DataFrame.toJSON
tools4origins Oct 3, 2020
228984d
Implement DataFrame.createTempView
tools4origins Oct 3, 2020
be4c032
Implement DataFrame.createOrReplaceTempView
tools4origins Oct 3, 2020
fbd7cfe
Implement DataFrame.createGlobalTempView
tools4origins Oct 3, 2020
da32e11
Implement DataFrame.createOrReplaceGlobalTempView
tools4origins Oct 3, 2020
903f15f
Implement DataFrame.schema
tools4origins Oct 3, 2020
17f51c4
Implement DataFrame.printSchema
tools4origins Oct 3, 2020
2f81c51
Clarify DataFrame.explain status
tools4origins Oct 3, 2020
794ad34
Implement DataFrame.exceptAll
tools4origins Oct 3, 2020
1cf84bc
Implement DataFrame.isLocal
tools4origins Oct 3, 2020
47f52cd
Implement DataFrame.isStreaming
tools4origins Oct 3, 2020
0985795
Implement DataFrame.show
tools4origins Oct 3, 2020
920cdb5
Implement DataFrame repr
tools4origins Oct 3, 2020
c867b2b
Clarify DataFrame checkpoint implementation status
tools4origins Oct 3, 2020
2833303
Clarify DataFrame localCheckpoint implementation status
tools4origins Oct 3, 2020
0d14693
Clarify DataFrame withWatermark implementation status
tools4origins Oct 3, 2020
0042264
Implement DataFrame.hint
tools4origins Oct 3, 2020
56d04b3
Implement DataFrame.count
tools4origins Oct 3, 2020
20b8040
Move DataFrame.collect definition
tools4origins Oct 3, 2020
4923b85
Implement DataFrame.toLocalIterator
tools4origins Oct 3, 2020
48e9bf5
Implement DataFrame.limit
tools4origins Oct 3, 2020
ddb9416
Implement DataFrame.take
tools4origins Oct 3, 2020
55bd41e
Implement DataFrame.foreach
tools4origins Oct 3, 2020
c842a9c
Implement DataFrame.foreachPartition
tools4origins Oct 3, 2020
f32ebb1
Implement DataFrame.cache
tools4origins Oct 3, 2020
ac5fe43
Implement DataFrame.persist
tools4origins Oct 3, 2020
4feb5d3
Implement DataFrame.storageLevel
tools4origins Oct 3, 2020
23d8644
Implement DataFrame.unpersist
tools4origins Oct 3, 2020
c00225d
Implement DataFrame.coalesce
tools4origins Oct 3, 2020
f078629
Implement DataFrame.repartition
tools4origins Oct 3, 2020
e3cdf49
Implement DataFrame.repartitionByRange
tools4origins Oct 3, 2020
ede630a
Implement DataFrame.distinct
tools4origins Oct 3, 2020
ceb0571
Implement DataFrame.sample
tools4origins Oct 3, 2020
817fafb
Implement DataFrame.randomSplit
tools4origins Oct 3, 2020
f6235b9
Implement DataFrame.dtypes
tools4origins Oct 3, 2020
007de20
Implement DataFrame.columns
tools4origins Oct 3, 2020
d886c3b
Clarify DataFrame.alias support
tools4origins Oct 3, 2020
c6d0c2f
Implement DataFrame.crossJoin
tools4origins Oct 3, 2020
c68dfd1
Implement DataFrame.join
tools4origins Oct 3, 2020
575cf06
Implement DataFrame.sortWithinPartitions
tools4origins Oct 3, 2020
2372609
Implement DataFrame._sort_cols
tools4origins Oct 3, 2020
b271370
Implement DataFrame.sort
tools4origins Oct 3, 2020
7f6d8a8
Implement DataFrame.orderBy
tools4origins Oct 3, 2020
aab5fdb
Implement DataFrame.describe
tools4origins Oct 3, 2020
55cbcfd
Implement DataFrame.summary
tools4origins Oct 3, 2020
edc0ec4
Implement DataFrame.head
tools4origins Oct 3, 2020
fe729cc
Implement DataFrame.first
tools4origins Oct 3, 2020
199bec6
Implement DataFrame.__getitem__
tools4origins Oct 3, 2020
06dc760
Implement DataFrame.__getattr__
tools4origins Oct 3, 2020
077e449
Implement DataFrame.select
tools4origins Oct 3, 2020
2bfd0d5
Implement DataFrame.selectExpr
tools4origins Oct 3, 2020
4a0ccd7
Implement DataFrame.filter
tools4origins Oct 3, 2020
4c444ed
Implement DataFrame.union
tools4origins Oct 3, 2020
cc0ccf6
Implement DataFrame.unionAll
tools4origins Oct 3, 2020
84d8b0e
Implement DataFrame.unionByName
tools4origins Oct 3, 2020
4f1754c
Implement DataFrame.intersect
tools4origins Oct 3, 2020
a6ea851
Implement DataFrame.intersectAll
tools4origins Oct 3, 2020
22a6be1
Implement DataFrame.subtract
tools4origins Oct 3, 2020
1ebca85
Implement DataFrame.dropDuplicates
tools4origins Oct 3, 2020
5a050fa
Implement DataFrame.withColumn
tools4origins Oct 3, 2020
844ce23
Implement DataFrame.withColumnRenamed
tools4origins Oct 3, 2020
020be36
Implement DataFrame.drop
tools4origins Oct 3, 2020
081c326
Implement DataFrame.toDF
tools4origins Oct 3, 2020
379f581
Implement DataFrame.transform
tools4origins Oct 3, 2020
754c62f
Implement a Spark to Panda type converter
tools4origins Oct 3, 2020
bbf3742
Implement DataFrame.toPandas
tools4origins Oct 3, 2020
e8b1b84
Implement DataFrame.drop_duplicates
tools4origins Oct 3, 2020
9c6eadf
Implement DataFrame.where
tools4origins Oct 3, 2020
85bb26d
Implement DataFrame.toPandas
tools4origins Oct 3, 2020
88ed28a
Enable logic that relies on previously unavailable components
tools4origins Oct 3, 2020
41dbf77
Enable tests that rely on previously not implemented components
tools4origins Oct 3, 2020
549acf2
Fix date to JSON conversion
tools4origins Oct 3, 2020
97fe02e
Clarify XORShiftRandom API
tools4origins Oct 3, 2020
40101b9
Align utils with newly implemented functions
tools4origins Oct 3, 2020
0f1190c
Implement RDD.toDF
tools4origins Oct 3, 2020
02c769a
Implement DFI sampleBy
tools4origins Oct 3, 2020
67ad2c9
Add support of aggregations in DataFrame.select
tools4origins Oct 3, 2020
ee6a05d
Implement DFI.summary
tools4origins Oct 3, 2020
41d4608
Implement DFI.crosstab
tools4origins Oct 3, 2020
ba84bbc
Implement DFI.dropDuplicates
tools4origins Oct 3, 2020
4d6ba0e
Add support of pivot without pivot value being specified
tools4origins Oct 3, 2020
20 changes: 9 additions & 11 deletions pysparkling/rdd.py
@@ -2079,19 +2079,17 @@ def toDF(self, schema=None, sampleRatio=None):
         :param samplingRatio: the sample ratio of rows used for inferring
         :return: a DataFrame
 
-        # todo: Activate those tests once pysparkling.sql is implemented
-        # >>> from pysparkling import Context, Row
-        # >>> rdd = Context().parallelize([Row(age=1, name='Alice')])
-        # >>> rdd.toDF().collect()
-        # [Row(age=1, name='Alice')]
+        >>> from pysparkling import Context, Row
+        >>> rdd = Context().parallelize([Row(age=1, name='Alice')])
+        >>> rdd.toDF().collect()
+        [Row(age=1, name='Alice')]
         """
-        # # Top level import would cause cyclic dependencies
+        # Top level import would cause cyclic dependencies
         # pylint: disable=import-outside-toplevel
-        # from pysparkling import Context
-        # from pysparkling.sql.session import SparkSession
-        # sparkSession = SparkSession._instantiatedSession or SparkSession(Context())
-        # return sparkSession.createDataFrame(self, schema, sampleRatio)
-        return NotImplementedError
+        from pysparkling import Context
+        from pysparkling.sql.session import SparkSession
+        sparkSession = SparkSession._instantiatedSession or SparkSession(Context())
+        return sparkSession.createDataFrame(self, schema, sampleRatio)
 
 
 class MapPartitionsRDD(RDD):
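
Reviewer note on the removed placeholder: the old `toDF` body ended with `return NotImplementedError`, which returns the exception class to the caller instead of raising it, so a call would appear to succeed. The sketch below uses only the standard library to show the difference; `to_df_stub` and `to_df_raising` are hypothetical names for illustration and are not part of pysparkling.

```python
def to_df_stub():
    # Mirrors the removed line: returns the exception class, raises nothing.
    return NotImplementedError


def to_df_raising():
    # Conventional placeholder: fails loudly at the call site.
    raise NotImplementedError("toDF is not implemented yet")


# The stub call completes normally and yields the class object itself.
result = to_df_stub()
stub_returned_class = result is NotImplementedError

# The raising variant has to be caught explicitly.
try:
    to_df_raising()
    raised = False
except NotImplementedError:
    raised = True
```

With the stub, a caller would only notice the problem later, for example when calling `.collect()` on the returned class. This PR replaces the stub with the real call to `createDataFrame`.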
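
The new `toDF` body reuses an existing session through `SparkSession._instantiatedSession or SparkSession(Context())`. A minimal sketch of that "reuse or create" fallback follows, using stand-in classes. `FakeContext`, `FakeSession` and `get_session` are hypothetical, and the sketch assumes the session constructor registers itself as the active session, which this diff does not show.

```python
class FakeContext:
    """Hypothetical stand-in for pysparkling.Context."""


class FakeSession:
    """Hypothetical stand-in for SparkSession; not the pysparkling class."""

    _instantiatedSession = None

    def __init__(self, context):
        self.context = context
        # Assumption: the constructor records itself as the active session.
        FakeSession._instantiatedSession = self


def get_session():
    # Reuse the registered session if there is one, else build a new one
    # on a fresh context. `or` only falls through when the left side is falsy.
    return FakeSession._instantiatedSession or FakeSession(FakeContext())


first = get_session()   # no session yet, so one is created
second = get_session()  # the registered session is reused
```

Under that assumption, repeated `toDF` calls would share one session rather than creating a new context each time.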