In [1]:
from pyspark.sql import SparkSession

In [2]:
spark = SparkSession.builder.appName('Transformations').getOrCreate()
sc = spark.sparkContext

### Narrow Transformations
Narrow transformations, such as map() and filter(), compute each output partition from a single input partition, so no data moves between partitions when they execute.

- map: applies a function to each element in the RDD and returns a new RDD containing the results.
- flatMap: applies a function to each element in the RDD and returns a new RDD containing the concatenated (flattened) results.
- filter: returns a new RDD containing only the elements that satisfy a given predicate.
- union: returns a new RDD containing all elements of both RDDs (duplicates are kept).
- sample: returns a random sample of the elements in an RDD.
- coalesce: reduces the number of partitions by merging existing ones; with the default shuffle=False it avoids a shuffle.

### Wide Transformations
Wide transformations require shuffling data between partitions, because one output partition may depend on many input partitions.

- groupByKey: groups the values for each key in an RDD and returns a new RDD containing the grouped values.
- reduceByKey: aggregates the values for each key in an RDD and returns a new RDD containing the reduced values.
- aggregateByKey: aggregates the values for each key in an RDD using user-defined aggregation functions and returns a new RDD containing the aggregated values.
- join: joins two RDDs on a key and returns a new RDD containing the joined values.
- cogroup: groups the values for each key in two RDDs and returns a new RDD containing the grouped values.
- distinct: returns a new RDD containing the distinct elements of an RDD; duplicates must be brought together, which needs a shuffle.
- repartition: redistributes the data of an RDD and returns a new RDD with the desired number of partitions.
- sortBy: sorts the elements of an RDD by the result of a key function.
- sortByKey: sorts the elements of an RDD based on the keys and returns a new RDD sorted by the keys.
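
To make the narrow/wide distinction concrete, here is a plain-Python sketch (no Spark needed) that models an RDD as a list of partitions. The helpers `narrow_map`, `narrow_filter`, and `wide_reduce_by_key` are hypothetical names that only mimic the behavior, and the CRC32-based partitioner is an assumption chosen for illustration; Spark's real implementation is far more involved.

```python
from collections import defaultdict
from zlib import crc32

# Model an RDD as a list of partitions; each partition holds (key, value) pairs.
partitions = [
    [("iPhone", 1), ("LG Dryer", 1)],
    [("iPhone", 2), ("Google Phone", 1)],
    [("LG Dryer", 3), ("iPhone", 1)],
]

def narrow_map(parts, fn):
    # Each output partition is built from exactly one input partition.
    return [[fn(item) for item in part] for part in parts]

def narrow_filter(parts, predicate):
    # Rows never leave the partition they started in.
    return [[item for item in part if predicate(item)] for part in parts]

def wide_reduce_by_key(parts, fn, num_partitions):
    # Shuffle step: route every record to the partition that owns its key,
    # so one output partition may read from every input partition.
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            target = crc32(key.encode()) % num_partitions
            shuffled[target][key].append(value)
    # Reduce step: combine the values that now sit together.
    result = []
    for bucket in shuffled:
        reduced = []
        for key, values in bucket.items():
            total = values[0]
            for v in values[1:]:
                total = fn(total, v)
            reduced.append((key, total))
        result.append(reduced)
    return result

doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
multi = narrow_filter(partitions, lambda kv: kv[1] > 1)
totals = wide_reduce_by_key(partitions, lambda a, b: a + b, num_partitions=2)

print(doubled)
print(multi)
print(totals)
```

The map and filter results keep the original three-partition layout, while the reduce step has to regroup records by key before it can combine them. That regrouping is the part Spark performs as a shuffle.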

In [3]:
from pyspark.sql.functions import col
from pyspark.sql.functions import sum,min,max,avg,count
from pyspark.sql.functions import upper,round,to_date,date_format,to_timestamp

In [4]:
sales = spark.read.csv('./data/SalesAnalysis.csv',header=True,inferSchema=True)
sales.show(5)

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|
|  176560|        Google Phone|               1|     600.0|04/12/19 14:38|669 Spruce St, Lo...|
|  176560|    Wired Headphones|               1|     11.99|04/12/19 14:38|669 Spruce St, Lo...|
+--------+--------------------+----------------+----------+--------------+--------------------+
only showing top 5 rows



In [5]:
type(sales)

pyspark.sql.dataframe.DataFrame

In [6]:
sales.createOrReplaceTempView('sales_view')

In [7]:
spark.sql('Select distinct(Product) from sales_view;').show()

+--------------------+
|             Product|
+--------------------+
|    Wired Headphones|
|  Macbook Pro Laptop|
|Apple Airpods Hea...|
|              iPhone|
|Lightning Chargin...|
|Bose SoundSport H...|
|USB-C Charging Cable|
|AAA Batteries (4-...|
|        20in Monitor|
|    27in FHD Monitor|
|     Vareebadd Phone|
|34in Ultrawide Mo...|
|            LG Dryer|
|AA Batteries (4-p...|
|        Google Phone|
|       Flatscreen TV|
|  LG Washing Machine|
|             Product|
|27in 4K Gaming Mo...|
|     ThinkPad Laptop|
+--------------------+
only showing top 20 rows



In [5]:
spark.sql(' select * from sales_view  limit 5;').show()

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|
|  176560|        Google Phone|               1|       600|04/12/19 14:38|669 Spruce St, Lo...|
|  176560|    Wired Headphones|               1|     11.99|04/12/19 14:38|669 Spruce St, Lo...|
+--------+--------------------+----------------+----------+--------------+--------------------+



In [8]:
# show() returns None, so keep the DataFrame and display it separately
new_df = sales.groupBy('Product').agg(sum('Price Each')/count('Product'))
new_df.show()

+--------------------+----------------------------------+
|             Product|(sum(Price Each) / count(Product))|
+--------------------+----------------------------------+
|    Wired Headphones|                11.989999999999549|
|  Macbook Pro Laptop|                            1700.0|
|Apple Airpods Hea...|                             150.0|
|              iPhone|                             700.0|
|                NULL|                              NULL|
|Lightning Chargin...|                14.949999999998477|
|Bose SoundSport H...|                 99.98999999999525|
|USB-C Charging Cable|                11.949999999998806|
|AAA Batteries (4-...|                2.9899999999998124|
|        20in Monitor|                 109.9900000000018|
|    27in FHD Monitor|                 149.9899999999961|
|     Vareebadd Phone|                             400.0|
|34in Ultrawide Mo...|                379.98999999999336|
|            LG Dryer|                             600.0|
|AA Batteries 
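
The long decimals in the output above (for example 11.989999999999549 instead of 11.99) are what floating-point accumulation looks like when many prices are summed and then divided by the count. The plain-Python sketch below reproduces the same kind of computation on made-up data; the 1,000 identical rows are an assumption for illustration, not a figure taken from the sales file.

```python
# Hypothetical data: 1,000 orders of the same item at 11.99 each.
prices = [11.99] * 1000

# Same shape of computation as sum('Price Each') / count('Product').
running_total = 0.0
for price in prices:
    running_total += price  # each addition can introduce a tiny rounding error
naive_avg = running_total / len(prices)

# Formatting to two decimals for display hides the accumulated error.
display_avg = f"{naive_avg:.2f}"

print(naive_avg)
print(display_avg)
```

In the notebook itself, wrapping the aggregate in the round function imported from pyspark.sql.functions gives a similarly tidy display.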

In [12]:
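# Single quotes make 'Quantity Ordered' a string literal, not a column reference,
# so the comparison matches nothing and no rows come back. Backticks
# (`Quantity Ordered`) would reference the column; the next cells rename it instead.
spark.sql("select * from sales_view where 'Quantity Ordered' > 1 ;").show()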

+--------+-------+----------------+----------+----------+----------------+
|Order ID|Product|Quantity Ordered|Price Each|Order Date|Purchase Address|
+--------+-------+----------------+----------+----------+----------------+
+--------+-------+----------------+----------+----------+----------------+



In [13]:
sales  = sales.withColumnRenamed('Quantity Ordered','Quantity_Ordered')

In [14]:
sales.show()

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|
|  176560|        Google Phone|               1|     600.0|04/12/19 14:38|669 Spruce St, Lo...|
|  176560|    Wired Headphones|               1|     11.99|04/12/19 14:38|669 Spruce St, Lo...|
|  176561|    Wired Headphones|               1|     11.99|04/30/19 09:27|333 8th St, Los A...|
|  176562|USB-C Charging Cable|               1|     11.95|04/29/19 13:03|381 Wilson St, Sa...|
|  176563|Bose SoundSport H...|         

In [15]:
sales.createOrReplaceTempView('sales_view')

In [16]:
spark.sql("select * from sales_view where Quantity_Ordered > 1 limit 5;").show()

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|
|  176583|AAA Batteries (4-...|               2|      2.99|04/20/19 12:00|146 Jackson St, P...|
|  176586|AAA Batteries (4-...|               2|      2.99|04/10/19 17:00|365 Center St, Sa...|
|  176593|Lightning Chargin...|               2|     14.95|04/15/19 13:45|906 7th St, Portl...|
|  176595|    Wired Headphones|               3|     11.99|04/02/19 09:11|383 6th St, Los A...|
+--------+--------------------+----------------+----------+--------------+--------------------+



In [18]:
sales.filter(sales.Quantity_Ordered> 2).show(5)

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176595|    Wired Headphones|               3|     11.99|04/02/19 09:11|383 6th St, Los A...|
|  176645|AAA Batteries (4-...|               3|      2.99|04/26/19 18:38|514 Lake St, Dall...|
|  176674|AAA Batteries (4-...|               3|      2.99|04/20/19 20:53|907 West St, Aust...|
|  176841|AAA Batteries (4-...|               3|      2.99|04/26/19 22:50|177 Highland St, ...|
|  176847|AA Batteries (4-p...|               3|      3.84|04/03/19 16:00|616 9th St, Austi...|
+--------+--------------------+----------------+----------+--------------+--------------------+
only showing top 5 rows



- DataFrame.filter() expects its condition as a Column expression, such as sales.Quantity_Ordered > 2 or col('Quantity_Ordered') > 2, or as a SQL expression string. A Python lambda, which RDD.filter() accepts, is not a valid argument for this method.

In [19]:
sales.filter(col('Quantity_Ordered') > 2).show(5)

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order ID|             Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176595|    Wired Headphones|               3|     11.99|04/02/19 09:11|383 6th St, Los A...|
|  176645|AAA Batteries (4-...|               3|      2.99|04/26/19 18:38|514 Lake St, Dall...|
|  176674|AAA Batteries (4-...|               3|      2.99|04/20/19 20:53|907 West St, Aust...|
|  176841|AAA Batteries (4-...|               3|      2.99|04/26/19 22:50|177 Highland St, ...|
|  176847|AA Batteries (4-p...|               3|      3.84|04/03/19 16:00|616 9th St, Austi...|
+--------+--------------------+----------------+----------+--------------+--------------------+
only showing top 5 rows



In [25]:
# order by
sales.filter(col('Quantity_Ordered') > 2).orderBy('Product',ascending=False).show(5)

+--------+----------------+----------------+----------+--------------+--------------------+
|Order ID|         Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+----------------+----------------+----------+--------------+--------------------+
|  182857|Wired Headphones|               3|     11.99|04/24/19 12:12|817 Lake St, New ...|
|  221301|Wired Headphones|               3|     11.99|06/15/19 13:15|785 Hill St, New ...|
|  315983|Wired Headphones|               4|     11.99|12/12/19 12:59|520 Wilson St, Sa...|
|  165537|Wired Headphones|               4|     11.99|03/13/19 18:02|472 Cherry St, Se...|
|  291169|Wired Headphones|               3|     11.99|11/01/19 21:30|228 Highland St, ...|
+--------+----------------+----------------+----------+--------------+--------------------+
only showing top 5 rows



In [27]:
# order by
sales.filter(col('Quantity_Ordered') > 2).orderBy(col('Product').desc(),col('Order ID').asc()).show(5)

+--------+----------------+----------------+----------+--------------+--------------------+
|Order ID|         Product|Quantity_Ordered|Price Each|    Order Date|    Purchase Address|
+--------+----------------+----------------+----------+--------------+--------------------+
|  146446|Wired Headphones|               3|     11.99|01/01/19 18:20|678 2nd St, New Y...|
|  146495|Wired Headphones|               3|     11.99|01/18/19 10:17|296 7th St, Bosto...|
|  147101|Wired Headphones|               3|     11.99|01/26/19 10:08|898 Center St, Sa...|
|  150417|Wired Headphones|               3|     11.99|01/23/19 22:08|351 13th St, Atla...|
|  150439|Wired Headphones|               3|     11.99|01/27/19 20:08|193 9th St, San F...|
+--------+----------------+----------------+----------+--------------+--------------------+
only showing top 5 rows



In [7]:
sales = sales.withColumnRenamed('Order ID','Order_ID')

In [29]:
sales.select('Product','Order_ID').filter(col('Quantity_Ordered') > 2).orderBy(col('Product').desc(),col('Order_ID').asc()).show(5)

+----------------+--------+
|         Product|Order_ID|
+----------------+--------+
|Wired Headphones|  146446|
|Wired Headphones|  146495|
|Wired Headphones|  147101|
|Wired Headphones|  150417|
|Wired Headphones|  150439|
+----------------+--------+
only showing top 5 rows



In [30]:
sales = sales.withColumnRenamed('Price Each','Price_Each')

In [34]:
sales.select('Product','Order_ID').filter((col('Quantity_Ordered') > 2) & (col('Price_Each') == 11.99 )).orderBy(col('Product').desc(),col('Order_ID').asc()).show(5)

+----------------+--------+
|         Product|Order_ID|
+----------------+--------+
|Wired Headphones|  146446|
|Wired Headphones|  146495|
|Wired Headphones|  147101|
|Wired Headphones|  150417|
|Wired Headphones|  150439|
+----------------+--------+
only showing top 5 rows



In [36]:
sales.groupBy('Product').agg(sum('Quantity_Ordered')).show(5)

+--------------------+---------------------+
|             Product|sum(Quantity_Ordered)|
+--------------------+---------------------+
|    Wired Headphones|              20557.0|
|  Macbook Pro Laptop|               4728.0|
|Apple Airpods Hea...|              15661.0|
|              iPhone|               6849.0|
|                NULL|                 NULL|
+--------------------+---------------------+
only showing top 5 rows



In [37]:
# having: filter on the aggregated column, like SQL HAVING
sales.groupBy('Product') \
    .agg(sum('Quantity_Ordered').alias('total_quantity')) \
    .filter(col('total_quantity') > 100) \
    .show(5)

+--------------------+--------------+
|             Product|total_quantity|
+--------------------+--------------+
|    Wired Headphones|       20557.0|
|  Macbook Pro Laptop|        4728.0|
|Apple Airpods Hea...|       15661.0|
|              iPhone|        6849.0|
|Lightning Chargin...|       23217.0|
+--------------------+--------------+
only showing top 5 rows



### Functions

In [39]:
#upper
from pyspark.sql.functions import upper
sales.withColumn("product_upp",upper(col('Product'))).select(col('product'),col('product_upp')).show(5)

+--------------------+--------------------+
|             product|         product_upp|
+--------------------+--------------------+
|USB-C Charging Cable|USB-C CHARGING CABLE|
|                NULL|                NULL|
|Bose SoundSport H...|BOSE SOUNDSPORT H...|
|        Google Phone|        GOOGLE PHONE|
|    Wired Headphones|    WIRED HEADPHONES|
+--------------------+--------------------+
only showing top 5 rows



In [40]:
sales.show()

+--------+--------------------+----------------+----------+--------------+--------------------+
|Order_ID|             Product|Quantity_Ordered|Price_Each|    Order Date|    Purchase Address|
+--------+--------------------+----------------+----------+--------------+--------------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|
|  176560|        Google Phone|               1|       600|04/12/19 14:38|669 Spruce St, Lo...|
|  176560|    Wired Headphones|               1|     11.99|04/12/19 14:38|669 Spruce St, Lo...|
|  176561|    Wired Headphones|               1|     11.99|04/30/19 09:27|333 8th St, Los A...|
|  176562|USB-C Charging Cable|               1|     11.95|04/29/19 13:03|381 Wilson St, Sa...|
|  176563|Bose SoundSport H...|         

In [8]:
sales = sales.withColumnRenamed('Purchase Address','Purchase_Address')

In [43]:
#substring
sales.withColumn('subs',col('Purchase_Address').substr(1,4)).show(5)

+--------+--------------------+----------------+----------+--------------+--------------------+----+
|Order_ID|             Product|Quantity_Ordered|Price_Each|    Order Date|    Purchase_Address|subs|
+--------+--------------------+----------------+----------+--------------+--------------------+----+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|917 |
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|682 |
|  176560|        Google Phone|               1|       600|04/12/19 14:38|669 Spruce St, Lo...|669 |
|  176560|    Wired Headphones|               1|     11.99|04/12/19 14:38|669 Spruce St, Lo...|669 |
+--------+--------------------+----------------+----------+--------------+--------------------+----+
only showing top 5 rows



In [12]:
sales.printSchema()

root
 |-- Order_ID: integer (nullable = true)
 |-- Product: string (nullable = true)
 |-- Quantity_Ordered: integer (nullable = true)
 |-- Price Each: double (nullable = true)
 |-- Order Date: string (nullable = true)
 |-- Purchase_Address: string (nullable = true)



In [13]:
sales = sales.withColumnRenamed('Price Each','Price_Each')

In [20]:
sales = sales.withColumnRenamed('Order Date','Order_Date')

In [19]:
sales.withColumn('Revenue',round(col('Quantity_Ordered')*col('Price_Each'))).select('Product','Revenue').show()

+--------------------+-------+
|             Product|Revenue|
+--------------------+-------+
|USB-C Charging Cable|   24.0|
|                NULL|   NULL|
|Bose SoundSport H...|  100.0|
|        Google Phone|  600.0|
|    Wired Headphones|   12.0|
|    Wired Headphones|   12.0|
|USB-C Charging Cable|   12.0|
|Bose SoundSport H...|  100.0|
|USB-C Charging Cable|   12.0|
|  Macbook Pro Laptop| 1700.0|
|    Wired Headphones|   12.0|
|        Google Phone|  600.0|
|Lightning Chargin...|   15.0|
|27in 4K Gaming Mo...|  390.0|
|AA Batteries (4-p...|    4.0|
|Lightning Chargin...|   15.0|
|Apple Airpods Hea...|  150.0|
|USB-C Charging Cable|   12.0|
|        Google Phone|  600.0|
|USB-C Charging Cable|   12.0|
+--------------------+-------+
only showing top 20 rows



### Date functions

In [26]:
sales.printSchema()

root
 |-- Order_ID: integer (nullable = true)
 |-- Product: string (nullable = true)
 |-- Quantity_Ordered: integer (nullable = true)
 |-- Price_Each: double (nullable = true)
 |-- Order_Date: string (nullable = true)
 |-- Purchase_Address: string (nullable = true)
 |-- std_fmt: date (nullable = true)



In [35]:
sales = sales.withColumn('std_fmt', to_date(col('Order_Date'), 'MM/dd/yy HH:mm'))

In [30]:
# Optional: set the datetime parser policy if the pattern fails to parse under Spark 3
#spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')
#spark.conf.set('spark.sql.legacy.timeParserPolicy', 'CORRECTED')

In [39]:
sales.show(4)

+--------+--------------------+----------------+----------+--------------+--------------------+----------+--------------+
|Order_ID|             Product|Quantity_Ordered|Price_Each|    Order_Date|    Purchase_Address|   std_fmt|      date_fmt|
+--------+--------------------+----------------+----------+--------------+--------------------+----------+--------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|2019-04-19|19/04/19 00:00|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|      NULL|          NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|2019-04-07|07/04/19 00:00|
|  176560|        Google Phone|               1|     600.0|04/12/19 14:38|669 Spruce St, Lo...|2019-04-12|12/04/19 00:00|
+--------+--------------------+----------------+----------+--------------+--------------------+----------+--------------+
only showing top 4 rows


In [47]:
sales.withColumn('k_fmt', date_format(to_timestamp(col('Order_Date'),'MM/dd/yy HH:mm'),'dd/MM/yy HH:mm')).show(4)

+--------+--------------------+----------------+----------+--------------+--------------------+----------+--------------+--------------+
|Order_ID|             Product|Quantity_Ordered|Price_Each|    Order_Date|    Purchase_Address|   std_fmt|      date_fmt|         k_fmt|
+--------+--------------------+----------------+----------+--------------+--------------------+----------+--------------+--------------+
|  176558|USB-C Charging Cable|               2|     11.95|04/19/19 08:46|917 1st St, Dalla...|2019-04-19|19/04/19 00:00|19/04/19 08:46|
|    NULL|                NULL|            NULL|      NULL|          NULL|                NULL|      NULL|          NULL|          NULL|
|  176559|Bose SoundSport H...|               1|     99.99|04/07/19 22:30|682 Chestnut St, ...|2019-04-07|07/04/19 00:00|07/04/19 22:30|
|  176560|        Google Phone|               1|     600.0|04/12/19 14:38|669 Spruce St, Lo...|2019-04-12|12/04/19 00:00|12/04/19 14:38|
+--------+--------------------+----------
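
As a quick cross-check outside Spark, the same conversion can be reproduced with Python's standard datetime module. This is only a sketch: the strptime/strftime directives below are assumed equivalents of the Spark patterns 'MM/dd/yy HH:mm' and 'dd/MM/yy HH:mm', and the sample value is the first Order_Date shown above.

```python
from datetime import datetime

raw = "04/19/19 08:46"  # first Order_Date value from the table above

# Parse month/day/two-digit-year plus time (Spark pattern: MM/dd/yy HH:mm).
parsed = datetime.strptime(raw, "%m/%d/%y %H:%M")

# Date only, like the std_fmt column produced by to_date().
std_fmt = parsed.date().isoformat()

# Day-first rendering that keeps the time, like the k_fmt column.
k_fmt = parsed.strftime("%d/%m/%y %H:%M")

print(std_fmt)  # 2019-04-19
print(k_fmt)    # 19/04/19 08:46
```

The difference between the two columns is visible here as well: dropping to a date discards the time of day, while parsing to a full timestamp and reformatting keeps it.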

In [52]:
sales.agg(min('Price_Each')).show()

+---------------+
|min(Price_Each)|
+---------------+
|           2.99|
+---------------+



### Joins

In [None]:
# df1, df2 and the key columns 'd' and 'k' are placeholders
j_df = df1.join(df2, df1['d'] == df2['k'], 'inner')

In [None]:
df.na.fill("",["column"])

### Creating new columns

In [54]:
from pyspark.sql.functions import lit
from pyspark.sql.types import IntegerType
sales = sales.withColumn('new',lit(None).cast(IntegerType()))
# None can be replaced with a default value

In [55]:
sales.printSchema()

root
 |-- Order_ID: integer (nullable = true)
 |-- Product: string (nullable = true)
 |-- Quantity_Ordered: integer (nullable = true)
 |-- Price_Each: double (nullable = true)
 |-- Order_Date: string (nullable = true)
 |-- Purchase_Address: string (nullable = true)
 |-- std_fmt: date (nullable = true)
 |-- date_fmt: string (nullable = true)
 |-- new: integer (nullable = true)



### Drop columns

In [56]:
sales = sales.drop('new')

In [57]:
sales.printSchema()

root
 |-- Order_ID: integer (nullable = true)
 |-- Product: string (nullable = true)
 |-- Quantity_Ordered: integer (nullable = true)
 |-- Price_Each: double (nullable = true)
 |-- Order_Date: string (nullable = true)
 |-- Purchase_Address: string (nullable = true)
 |-- std_fmt: date (nullable = true)
 |-- date_fmt: string (nullable = true)



### Save files

In [None]:
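df.write.csv(path)                  # default mode fails if the path already exists
df.coalesce(1).write.csv(path)      # merge into one partition to get a single output file
df.write.mode('append').csv(path)   # add to existing output instead of failing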

In [61]:
sales.write.csv('./data1/',header=True,sep ='|')

Py4JJavaError: An error occurred while calling o316.csv.
: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
	at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
	at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)
	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:334)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:404)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:377)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:192)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$writeAndCommit$3(FileFormatWriter.scala:275)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:552)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:275)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
	at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:850)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:748)
