## A beautiful way to work with data

This is an API proposal for accessing data.

A DataFrame has rows and columns.

* To access columns, use `df.cols()`.
* To access rows, use `df.rows()`.
* I/O operations to load and save data live in Optimus: `op.load.csv()` and `op.save.csv()`.

Easy and simple.
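The accessor idea above can be sketched in plain Python without Spark. This is only a toy, in-memory illustration of the pattern: `ToyFrame`, `ColumnAccessor`, and `RowAccessor` are hypothetical names invented for this example, not Optimus classes, and the real proposal may be implemented quite differently.

```python
# Toy illustration of the df.cols() / df.rows() accessor pattern.
# ToyFrame, ColumnAccessor and RowAccessor are hypothetical names,
# not part of Optimus or Spark.


class ColumnAccessor:
    """Column-wise operations; transforms return a new ToyFrame for chaining."""

    def __init__(self, frame):
        self._frame = frame

    def upper(self, column):
        # Upper-case one column, leaving None values untouched.
        idx = self._frame.columns.index(column)
        new_rows = [
            tuple(
                value.upper() if i == idx and value is not None else value
                for i, value in enumerate(row)
            )
            for row in self._frame.data
        ]
        return ToyFrame(new_rows, self._frame.columns)

    def max(self, column):
        # Return the result keyed by column name, like the outputs in this notebook.
        idx = self._frame.columns.index(column)
        values = [row[idx] for row in self._frame.data if row[idx] is not None]
        return {column: max(values)}


class RowAccessor:
    """Row-wise operations."""

    def __init__(self, frame):
        self._frame = frame

    def count(self):
        return len(self._frame.data)


class ToyFrame:
    """A tiny stand-in for a dataframe: a list of tuples plus column names."""

    def __init__(self, data, columns):
        self.data = list(data)
        self.columns = list(columns)

    def cols(self):
        return ColumnAccessor(self)

    def rows(self):
        return RowAccessor(self)


frame = ToyFrame([("dog", 1), ("cat", 2), (None, 3)], ["animals", "num"])
result = frame.cols().upper("animals")

print(result.data)
print(frame.cols().max("num"))
print(frame.rows().count())
```

Because every transform returns a new frame, calls can be chained, and the original `frame` is left unchanged.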

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from optimus import *

from pyspark.sql.session import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, BooleanType, IntegerType, ArrayType

sc = SparkSession.builder.getOrCreate()

In [3]:
# Create optimus
op = Optimus(sc)

Using a created Spark Session...
Done.


## Create dataframe
### Spark

Creating a DataFrame in plain Spark (Scala) is verbose and, frankly, ugly:

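```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// The rows, as untyped Row objects
val someData = Seq(
  Row(8, "bat"),
  Row(64, "mouse"),
  Row(-27, "horse")
)

// The schema, declared separately from the data
val someSchema = List(
  StructField("number", IntegerType, true),
  StructField("word", StringType, true)
)

// `spark` is the SparkSession (predefined in spark-shell)
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)
```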

In [15]:
# Thanks Mr Powers
df = op.create.df([
                ("  I like     fish  ", 1, "dog", "housé", 5 ),
                ("    zombies", 2, "cat", "tv", 6),
                ("simpsons   cat lady", 2, "frog", "table", 7),
                (None, 3, "eagle", "glass", 8)
            ],
            [
                ("words", "str", True),
                ("num", "int", True),
                ("animals", "str", True),
                ("thing", StringType(), True),
                ("second", "int", True)
            ])

df.show()

+-------------------+---+-------+-----+------+
|              words|num|animals|thing|second|
+-------------------+---+-------+-----+------+
|  I like     fish  |  1|    dog|housé|     5|
|            zombies|  2|    cat|   tv|     6|
|simpsons   cat lady|  2|   frog|table|     7|
|               null|  3|  eagle|glass|     8|
+-------------------+---+-------+-----+------+



### Math Operations

In [73]:
print(df.cols().min("num"))
print(df.cols().max("num"))
print(df.cols().range(["num","second"]))
print(df.cols().median(["num","second"]))

print(df.cols().stddev("num"))
print(df.cols().kurt("num"))
print(df.cols().mean("num"))
print(df.cols().skewness("num"))
print(df.cols().sum("num"))
print(df.cols().variance("num"))

{'num': 1}
{'num': 3}
{'num': 2.0, 'second': 2.0}
{'num': 0.816496580927726}
{'num': -1.0000000000000002}
{'num': 2.0}
{'num': 0.0}
{'num': 0.0}
{'num': 0.6666666666666666}
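
As a cross-check, several of the values above can be reproduced with the Python standard library for `num = [1, 2, 2, 3]`. The sketch below is an independent calculation, not how Optimus or Spark computes them. It assumes that `stddev` and `variance` are the sample (n - 1) versions and that `skewness` and `kurt` are based on population moments, with `kurt` reported as excess kurtosis; those assumptions are consistent with the printed 0.816..., 0.666..., 0.0 and -1.0.

```python
import statistics

nums = [1, 2, 2, 3]  # the "num" column from the dataframe above


def central_moment(values, k):
    """k-th population central moment: mean of (x - mean) ** k."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** k for v in values) / len(values)


mean = statistics.mean(nums)
sample_variance = statistics.variance(nums)  # divides by n - 1
sample_stddev = statistics.stdev(nums)       # square root of the sample variance

m2 = central_moment(nums, 2)
skewness = central_moment(nums, 3) / m2 ** 1.5
excess_kurtosis = central_moment(nums, 4) / m2 ** 2 - 3

print({"mean": mean})
print({"variance": sample_variance})
print({"stddev": sample_stddev})
print({"skewness": skewness})
print({"kurt": excess_kurtosis})
```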


### String Operations

In [82]:
df\
    .cols().trim("words")\
    .cols().lower("words")\
    .cols().upper("animals")\
    .cols().reverse("thing")\
    .show()
    

+-------------------+---+-------+-----+------+
|              words|num|animals|thing|second|
+-------------------+---+-------+-----+------+
|    i like     fish|  1|    DOG|ésuoh|     5|
|            zombies|  2|    CAT|   vt|     6|
|simpsons   cat lady|  2|   FROG|elbat|     7|
|               null|  3|  EAGLE|ssalg|     8|
+-------------------+---+-------+-----+------+
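
For comparison, the per-value effect of the chained calls above can be mimicked in plain Python. This is a rough sketch of the semantics only: `apply_or_none` is a hypothetical helper, and using `str.strip` as the counterpart of `trim` is an assumption (it removes all surrounding whitespace, which may differ from what `trim` removes).

```python
def apply_or_none(value, func):
    """Apply func to value, passing None (a null cell) through unchanged."""
    return None if value is None else func(value)


words = ["  I like     fish  ", "    zombies", "simpsons   cat lady", None]
animals = ["dog", "cat", "frog", "eagle"]
things = ["hous\u00e9", "tv", "table", "glass"]  # "\u00e9" is a precomposed é

# trim then lower on "words"; inner whitespace is not collapsed
clean_words = [apply_or_none(apply_or_none(w, str.strip), str.lower) for w in words]

# upper on "animals"
upper_animals = [apply_or_none(a, str.upper) for a in animals]

# reverse on "thing" (slicing reverses by code point)
reversed_things = [apply_or_none(t, lambda s: s[::-1]) for t in things]

print(clean_words)
print(upper_animals)
print(reversed_things)
```

Note that only leading and trailing whitespace is removed: the run of spaces inside `"i like     fish"` survives, as it does in the table above.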

