# Pivot in PySpark

The pivot operation in PySpark turns the unique values of one column into new column headers and fills each new column with aggregated values from another column. It is useful for reshaping long-format data into wide format: the rows are grouped by one or more columns, the pivot column supplies the headers, and an aggregation function decides what goes into each cell. The general syntax is:

```
dataframe.groupBy("group_column").pivot("pivot_column").agg(aggregation_function)
```
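To make the three steps in the syntax above concrete, here is a plain-Python sketch of the same semantics: group, collect the distinct pivot values as headers, then aggregate into each cell. It only illustrates the idea and is not how Spark executes a pivot; the helper name `pivot_rows` and the sample rows are made up for this example, and the aggregation is fixed to a sum.

```python
from collections import defaultdict


def pivot_rows(rows, group_key, pivot_key, value_key):
    """Hypothetical helper: group by one key, pivot on another, sum the values."""
    # The distinct pivot values become the new column headers.
    headers = sorted({row[pivot_key] for row in rows})

    # Sum the values for every (group, pivot value) pair that occurs.
    totals = defaultdict(dict)
    for row in rows:
        cells = totals[row[group_key]]
        cells[row[pivot_key]] = cells.get(row[pivot_key], 0) + row[value_key]

    # One output row per group; combinations that never occur stay None,
    # which corresponds to NULL in a Spark pivot.
    return [
        {group_key: group, **{header: cells.get(header) for header in headers}}
        for group, cells in sorted(totals.items())
    ]


# Made-up sample rows for illustration.
rows = [
    {"Product": "A", "Region": "North", "Sales": 1000},
    {"Product": "A", "Region": "South", "Sales": 1500},
    {"Product": "C", "Region": "North", "Sales": 3000},
]

result = pivot_rows(rows, "Product", "Region", "Sales")
for out in result:
    print(out)
```

Product C has no South row in the sample, so its South cell is left empty rather than set to zero.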



In [8]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum  # PySpark's aggregate sum; it shadows Python's built-in sum
spark = SparkSession.builder.getOrCreate()


#### Code Implementation

In [10]:
# Create the sample data
data = [
    ("A", "North", 1000),
    ("A", "South", 1500),
    ("B", "North", 2000),
    ("B", "South", 1250),
    ("C", "North", 3000)
]
# Define the column names
columns = ["Product", "Region", "Sales"]

# Create the DataFrame
df = spark.createDataFrame(data, columns)

# Display the data
df.show()

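# Create the pivoted DataFrame: one row per Product, one column per Region.
# Here `sum` is pyspark.sql.functions.sum (imported above), not Python's built-in.
pivot_df = df.groupBy("Product").pivot("Region").agg(sum("Sales"))

# Display the pivoted DataFrame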
pivot_df.show()



+-------+------+-----+
|Product|Region|Sales|
+-------+------+-----+
|      A| North| 1000|
|      A| South| 1500|
|      B| North| 2000|
|      B| South| 1250|
|      C| North| 3000|
+-------+------+-----+

+-------+-----+-----+
|Product|North|South|
+-------+-----+-----+
|      B| 2000| 1250|
|      C| 3000| NULL|
|      A| 1000| 1500|
+-------+-----+-----+

