# Consistency Issues

To be a proper abstraction layer, Fugue puts a lot of effort into guaranteeing consistency. A solution prototyped on the Pandas engine must behave the same way when running on Spark, Dask, and Ray. The core Fugue repository has a unified test suite so that every operation produces the same results on every engine. Without such a layer, even data teams with the bandwidth to rewrite Python and Pandas solutions in native Spark would still have to worry about consistency themselves.

Consistency comes in two forms: result consistency and execution consistency.

## Result Consistency

Dask is largely compatible with Pandas, but Spark is less so. The following table outlines some of the differences between Pandas and Spark.

<img src="https://miro.medium.com/v2/resize:fit:1400/0*fv0FKyt3jB0ehVrU" align="left" width="700"/>

### Setup

First we create an identical DataFrame in both Pandas and Spark.

In [2]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

In [4]:
import pandas as pd

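# Two small frames that share the key column "a"; both contain missing keys.
df = pd.DataFrame({"a": [None, None, 1, 1, 2, 2], "b": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"a": [None, 1, 2], "c": ["a", "b", "c"]})

# Build the Spark equivalents from the same Pandas data.
sdf = spark.createDataFrame(df)
sdf2 = spark.createDataFrame(df2)

# Preview the first five rows of the Pandas frame.
df.head()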

|   | a | b |
|---|---|---|
| 0 | NaN | 1 |
| 1 | NaN | 2 |
| 2 | 1.0 | 3 |
| 3 | 1.0 | 4 |
| 4 | 2.0 | 5 |
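
The `1.0` and `2.0` values in the output above come from how Pandas stores missing values. As a minimal sketch (using only the `df` defined in the setup), the snippet below checks that the `None` entries in `a` become `NaN` and that the column is therefore held as floats:

```python
import pandas as pd

# Same data as the setup cell above.
df = pd.DataFrame({"a": [None, None, 1, 1, 2, 2], "b": [1, 2, 3, 4, 5, 6]})

# A numeric column containing None is stored as float64, with None turned into NaN.
a_dtype = str(df["a"].dtype)
missing_count = int(df["a"].isna().sum())

print(a_dtype)        # dtype of column "a"
print(missing_count)  # number of missing keys in "a"
```

This matters for the join comparison that follows, because the two engines treat these missing keys differently.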


### Joining

**Pandas**

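Recall that `DataFrame.join()` in Pandas joins on the index by default. To join on a column without setting the index and calling `reset_index()` afterwards, we use `merge()`, which joins on the columns the two DataFrames share (here, `a`). Note that Pandas matches missing keys with each other: the rows of `df` where `a` is missing are joined to the row of `df2` where `a` is missing.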

In [7]:
df.merge(df2)

|   | a | b | c |
|---|---|---|---|
| 0 | NaN | 1 | a |
| 1 | NaN | 2 | a |
| 2 | 1.0 | 3 | b |
| 3 | 1.0 | 4 | b |
| 4 | 2.0 | 5 | c |
| 5 | 2.0 | 6 | c |


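**Spark**

The same inner join in Spark returns only four rows: the rows where `a` is missing do not appear in the result.

In [8]: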
sdf.join(sdf2, on="a").show()

+---+---+---+
|  a|  b|  c|
+---+---+---+
|1.0|  3|  b|
|1.0|  4|  b|
|2.0|  5|  c|
|2.0|  6|  c|
+---+---+---+
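
One way to reproduce the four-row Spark result in plain Pandas is to drop rows with a missing key before merging. The sketch below is only an illustration of the difference shown above. `sql_style_inner_join` is a hypothetical helper written for this example, not part of Pandas or Fugue.

```python
import pandas as pd

df = pd.DataFrame({"a": [None, None, 1, 1, 2, 2], "b": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"a": [None, 1, 2], "c": ["a", "b", "c"]})


def sql_style_inner_join(left, right, on):
    """Hypothetical helper: inner join that never matches missing keys."""
    # Remove rows whose key is missing on either side, then merge as usual.
    left_valid = left.dropna(subset=[on])
    right_valid = right.dropna(subset=[on])
    return left_valid.merge(right_valid, on=on, how="inner").reset_index(drop=True)


# Default Pandas behavior: NaN keys match each other, giving 6 rows.
pandas_default = df.merge(df2, on="a")

# SQL-style behavior: rows with a missing key are excluded, giving 4 rows.
sql_like = sql_style_inner_join(df, df2, on="a")

print(len(pandas_default), len(sql_like))
print(sql_like)
```

Writing this kind of workaround by hand for every operation is exactly the consistency burden described at the start of this section.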

