# Transformation from Pyspark to Pandas Table

In this notebook, we show how to transform a Pyspark table (a Spark `DataFrame`) into a Pandas table (a pandas `DataFrame`).

We start by importing the Pyspark machinery we need, plus a helper function that generates the Pyspark table used in this example.

```python
# Spark related machinery
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session with Hive support enabled
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
```

```python
# Import the helper that builds the example table, plus pandas
from pyspark_functions import create_sp_table3
import pandas as pd
```

In the following block of code, we create a table called **data** using the function `create_sp_table3` and then print it to the screen:

```python
# Create the table
data = create_sp_table3()

# Print the table (rows appear in creation order; show() does not sort them by id)
data.show()
```

```
+---+----+-----+
| id|team|score|
+---+----+-----+
|  5|   3|   81|
|  1|   2|   14|
|  2|   1|   48|
|  5|   1|   44|
|  3|   2|   97|
|  1|   2|   70|
|  2|   2|   60|
|  5|   2|   89|
|  4|   3|   38|
|  1|   2|   30|
|  3|   3|   79|
|  1|   3|   43|
|  2|   2|   75|
|  1|   3|   64|
|  1|   1|   16|
|  4|   2|   61|
|  2|   2|    0|
|  5|   3|   57|
|  5|   3|   87|
|  3|   3|   83|
+---+----+-----+
```



To transform this table into a Pandas table, we use the `toPandas` method, which collects all rows to the Spark driver and returns a pandas `DataFrame`:

```python
# Transform to a Pandas table
pd_data = data.toPandas()

# Print the first 4 rows of the new Pandas table
pd_data.head(4)
```

```
   id  team  score
0   5     3     81
1   1     2     14
2   2     1     48
3   5     1     44
```
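After the conversion, `pd_data` is an ordinary pandas `DataFrame`, so the usual pandas operations apply. So that it runs without a Spark session, the sketch below rebuilds the same 20 rows by hand with `pd.DataFrame` (in the notebook you would use the `pd_data` returned by `toPandas` instead), then sorts by id and computes the mean score per team.

```python
import pandas as pd

# Same rows as the Spark table above; in the notebook, use pd_data = data.toPandas()
pd_data = pd.DataFrame(
    [
        (5, 3, 81), (1, 2, 14), (2, 1, 48), (5, 1, 44), (3, 2, 97),
        (1, 2, 70), (2, 2, 60), (5, 2, 89), (4, 3, 38), (1, 2, 30),
        (3, 3, 79), (1, 3, 43), (2, 2, 75), (1, 3, 64), (1, 1, 16),
        (4, 2, 61), (2, 2, 0), (5, 3, 57), (5, 3, 87), (3, 3, 83),
    ],
    columns=["id", "team", "score"],
)

# A stable sort keeps the original order among rows that share the same id
sorted_data = pd_data.sort_values("id", kind="stable").reset_index(drop=True)
print(sorted_data.head(6))

# Mean score per team, computed by pandas on the driver
team_means = pd_data.groupby("team")["score"].mean()
print(team_means)
```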


# Final Words
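Now you know how to transform Pyspark tables into Pandas tables. Keep in mind that `toPandas` collects the entire table into the memory of the driver, so it works well only when the table is relatively small. For big tables (millions of rows), the conversion can be slow or fail with out-of-memory errors. Enabling Apache Arrow (through PyArrow) makes the transformation faster, but the data must still fit on the driver, so for very large tables it is best to filter, aggregate, or limit the data in Pyspark before calling `toPandas`.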