milesgranger/pontem

pontem

Treat PySpark DataFrames like pandas.

This is currently just a hobby project, not suitable for use.

Turn something like this:

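# Pure PySpark API; df is type pyspark.sql.DataFrame
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def multiply(n):
    return udf(lambda col: col * n, FloatType())
# withColumn expects a Column, so pass the UDF result directly.
df = df.withColumn('new_col', multiply(2)(df['other_col']))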

...into this:

# Using pontem.core.DataFrame object.
df['new_col'] = df['other_col'] * 2
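For the curious, here is a rough sketch of how item assignment like this could be mapped onto the PySpark API. It is not pontem's actual implementation: the class name `FrameWrapper` and the method `to_spark` are hypothetical, made up for illustration. It assumes only that the wrapped object supports `df[name]` and `withColumn(name, column)`, as `pyspark.sql.DataFrame` does, and that the column objects handle arithmetic such as `* 2` themselves, as `pyspark.sql.Column` does.

```python
class FrameWrapper:
    """Hypothetical pandas-style wrapper around a PySpark DataFrame.

    Illustrative sketch only; not the pontem.core.DataFrame implementation.
    """

    def __init__(self, spark_df):
        # Spark DataFrames are immutable, so keep a reference we can swap out.
        self._df = spark_df

    def __getitem__(self, name):
        # Delegate column lookup; the returned Column handles `* 2` on its own.
        return self._df[name]

    def __setitem__(self, name, column):
        # withColumn returns a new DataFrame; rebind to mimic in-place assignment.
        self._df = self._df.withColumn(name, column)

    def to_spark(self):
        # Hand back the underlying DataFrame (hypothetical helper name).
        return self._df
```

With a real `pyspark.sql.DataFrame` called `spark_df`, `wrapped = FrameWrapper(spark_df)` followed by `wrapped['new_col'] = wrapped['other_col'] * 2` would call `withColumn` behind the scenes, with no UDF needed for simple arithmetic.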