# Union and UnionByName in PySpark
In PySpark, both `union()` and `unionByName()` combine the rows of two DataFrames into one (chain the calls to combine more than two). They differ in how they line up columns: `union()` matches columns by position, while `unionByName()` matches them by name.


In [1]:
from pyspark.sql import SparkSession

# Reuse the active Spark session, or start a new one if none exists
spark = SparkSession.builder.getOrCreate()


## union()
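The `union()` function appends the rows of one DataFrame to another. Both DataFrames must have the same number of columns, and columns are matched by position, not by name, so the column order and data types should agree. The result takes its column names from the first DataFrame and keeps duplicate rows, like `UNION ALL` in SQL.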

**Syntax:**
```
DataFrame.union(otherDataFrame)
```


In [3]:
# Sample data for two DataFrames that share the same columns
data1 = [("Alice", 1), ("Bob", 2)]
data2 = [("Cathy", 3), ("David", 4)]
columns = ["Name", "Id"]

# Create the DataFrames
df1 = spark.createDataFrame(data1, columns)
df2 = spark.createDataFrame(data2, columns)

# Perform the union (rows of df2 are appended to df1)
result_of_union = df1.union(df2)

# Display the result
result_of_union.show()


+-----+---+
| Name| Id|
+-----+---+
|Alice|  1|
|  Bob|  2|
|Cathy|  3|
|David|  4|
+-----+---+



## unionByName()

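The `unionByName()` function combines two DataFrames by matching columns by name rather than by position, so the column order may differ between them. By default, both DataFrames must contain the same set of column names; otherwise the call fails. With `allowMissingColumns=True` (available since Spark 3.1), a column that is missing from either DataFrame is filled with null.

**Syntax:**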
```
DataFrame.unionByName(otherDataFrame, allowMissingColumns = False)
```



In [12]:
# Sample data for two DataFrames with different columns
data3 = [("Eve", 5), ("Frank", 6)]
data4 = [("Grace", "New York"), ("Hannah", "Los Angeles")]
columns1 = ["Name", "Id"]
columns2 = ["Name", "City"]

# Create the DataFrames
df3 = spark.createDataFrame(data3, columns1)
df4 = spark.createDataFrame(data4, columns2)

# Combine by column name; columns missing on either side are filled with null
result_union_by_name_df = df3.unionByName(df4, allowMissingColumns=True)

# Display the result
result_union_by_name_df.show()





+------+----+-----------+
|  Name|  Id|       City|
+------+----+-----------+
|   Eve|   5|       NULL|
| Frank|   6|       NULL|
| Grace|NULL|   New York|
|Hannah|NULL|Los Angeles|
+------+----+-----------+

