# Differences between Spark and pandas DataFrames in Great Expectations
### Before starting: the dependencies we will use
```
jupyterlab==3.1.13
great-expectations==0.13.35
pyspark==3.1.2
```
### Create the pandas DataFrame and, based on it, the Spark DataFrame

In [1]:
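import great_expectations as ge

import pandas as pd
from pyspark.sql import SparkSession

# outside a PySpark shell no `spark` session exists yet, so create (or reuse) one
spark = SparkSession.builder.getOrCreate()

# first let's create a simple dataset
data = {
  "String": ["one", "two", "two"],
  "Value": [1, 2, 2],
}

# let's create a pandas dataframe
pd_df = pd.DataFrame.from_dict(data)

# building the Spark dataframe from pandas lets Spark infer the schema,
# so we don't need to define it ourselves
df = spark.createDataFrame(pd_df)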

### Now let us create the appropriate Great Expectations objects

In [None]:
# for pandas we create a Great Expectations dataset like this
pd_df_ge = ge.from_pandas(pd_df)

# while for PySpark we wrap the Spark dataframe like this
df_ge = ge.dataset.SparkDFDataset(df)

# Running Great Expectations tests

Each expectation returns a dictionary-like result of metadata, including a boolean "success" value.

In [None]:
# this works the same for both pandas and PySpark Great Expectations datasets
print(pd_df_ge.expect_table_row_count_to_be_between(1, 10))

print(df_ge.expect_table_row_count_to_be_between(1, 10))
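
Because every result carries a boolean "success" value, it can be used to gate a later step. The sketch below is a plain-Python stand-in and not the Great Expectations implementation: `expect_row_count_between` and `check` are hypothetical names, and the dictionary only mimics the general shape of a result (a success flag plus an observed value).

```python
# Hypothetical stand-in that mimics the shape of an expectation result.
# It is NOT the Great Expectations implementation.
def expect_row_count_between(rows, min_value, max_value):
    observed = len(rows)
    return {
        "success": min_value <= observed <= max_value,
        "result": {"observed_value": observed},
    }


def check(result):
    """Raise if the expectation failed, otherwise return the observed value."""
    if not result["success"]:
        raise ValueError(f"expectation failed: {result['result']}")
    return result["result"]["observed_value"]


rows = [("one", 1), ("two", 2), ("two", 2)]
passing = expect_row_count_between(rows, 1, 10)
failing = expect_row_count_between(rows, 5, 10)

print(check(passing))      # observed row count of the passing check
print(failing["success"])  # the second range excludes the row count
```

Raising on failure is only one possible choice; logging the result and continuing would fit an exploratory notebook just as well.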

# Differences between Great Expectations pandas and PySpark datasets

In [None]:
# pandas datasets inherit all the pandas DataFrame methods
print(pd_df_ge.count())

# GE PySpark datasets do not, so the following call leads to an error
print(df_ge.count())

In [None]:
# however, you can access the original PySpark dataframe through df_ge.spark_df
df_ge.spark_df.count()
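
One way to picture this difference: the pandas dataset behaves like a subclass of the dataframe, while the PySpark dataset behaves like a wrapper that keeps the dataframe in an attribute. The following is a minimal plain-Python sketch under that assumption. `Frame`, `InheritingDataset` and `WrappingDataset` are hypothetical names used for illustration, not classes from pandas, PySpark or Great Expectations.

```python
class Frame:
    """Hypothetical stand-in for a dataframe."""

    def __init__(self, rows):
        self.rows = list(rows)

    def count(self):
        return len(self.rows)


class InheritingDataset(Frame):
    """Subclass: every Frame method is available directly."""

    def expect_row_count_between(self, lo, hi):
        return {"success": lo <= self.count() <= hi}


class WrappingDataset:
    """Wrapper: the frame is only reachable through an attribute."""

    def __init__(self, frame):
        self.frame = frame

    def expect_row_count_between(self, lo, hi):
        return {"success": lo <= self.frame.count() <= hi}


rows = [("one", 1), ("two", 2), ("two", 2)]
inheriting = InheritingDataset(rows)
wrapping = WrappingDataset(Frame(rows))

print(inheriting.count())          # method inherited from Frame
print(hasattr(wrapping, "count"))  # the wrapper does not expose count()
print(wrapping.frame.count())      # reach the wrapped frame explicitly
```

Both styles support the same expectation call; they differ only in whether the underlying dataframe methods are reachable on the dataset object itself.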