# Coming to Tablite from Pandas

If you're coming to Tablite from Pandas you will notice some differences.

Here's an ultra-short comparison against the Pandas documentation:

### 10 minutes intro to pandas
The tutorials give a general overview:

-  [pandas tutorial](https://pandas.pydata.org/docs/user_guide/10min.html#minutes-to-pandas) 
-  [tablite tutorial](https://github.com/root-11/tablite/blob/master/tutorial.ipynb) 

Some key differences:


| topic | Tablite |
|---|---|
| Viewing data | Just use `table.show()` |
| Selection | Slicing works on both columns and rows, and you can filter using `any` or `all`: `table['A','B', 2:30:3].any(A=lambda x:x>3)` <br>To copy: `t2 = t.copy()`|
| Missing data | Tablite uses a `mixed` column format for any column whose values aren't of a uniform type.<br>To deal with rows containing `None`s and `np.nan`s, use any of:<br>`table.drop_na()` <br> `table.replace(None,5)` <br> `table.replace_missing_values(sources, target)` |
|Operations| Descriptive statistics are computed column by column:<br>`table['a'].statistics()` <br> The pandas function `df.apply` doesn't exist in tablite. Use a list comprehension instead, or call the function on each column: `df.apply(np.cumsum)` becomes `np.cumsum(t['a'])` per column. <br> Histograms in tablite are also per column: `table['a'].histogram()` <br> String methods? Use list comprehensions, or filter with lambdas: `table['A', 'B'].any(A=lambda x: "hello" in x, B=lambda x: "world" in x)` |
|Merge | Concatenation: just use `+` or `+=`, as in `t1 = t2 + t3` or `t1 += t4`. <br>If the columns are out of order, tablite will sort the headers according to the order in the first table.<br>If you're worried that the headers don't match, use `t1.stack(t2)` <br> Joins are ANSI-92 compliant: `t1.join(t2, <...args...>)`. |
|Grouping| Tablite supports multi-key groupby using `from tablite import Groupby as gb`:<br> `table.groupby(keys, functions)` |
|Reshaping| To reshape a table, use `transpose`. <br> For pivot-table-like operations, use `table.pivot(rows, columns, functions)`. Subtotals aside, tablite gives you everything Excel's pivot table can do. |
|Time series| To convert time series, use a list comprehension:<br>`t1['GMT'] = [timedelta(hours=1) + v for v in t1['date']]` <br> To generate a date range, use:<br>`from datetime import datetime, timedelta`<br>`from tablite import date_range`<br>`t['date'] = date_range(start=datetime(2022, 1, 1), stop=datetime(2023, 1, 1), step=timedelta(days=1))`|
|Categorical| Pandas seems to use categoricals mainly for sorting and grouping. <br> Tablite tables have `.sort`, `.groupby` and `.pivot` for the same tasks. |
|Plotting| Import your favorite plotting package and feed it the values, for example:<br>`import matplotlib.pyplot as plt` <br> `plt.plot(t['a'], t['b'])` <br> `plt.show()` |
|Import/Export| Tablite supports the same import/export options as pandas, but requires slightly less work, because tablite guesses the datatype of all ISO formats and uses multiprocessing. |
|Gotchas| None, really. Should you come across something non-pythonic, please post it on the [issue list](https://github.com/root-11/tablite/issues). |
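
The list-comprehension approach from the table can be sketched in plain Python. This is a minimal sketch, not tablite code: a dict of lists stands in for a `Table` (column name to list of values), `itertools.accumulate` replaces `np.cumsum` so the example has no dependencies, and the row filters assume that `any` keeps a row when at least one per-column predicate holds and `all` keeps it only when every predicate holds.

```python
from datetime import datetime, timedelta
from itertools import accumulate

# A dict of lists stands in for a tablite Table (assumption: no tablite calls here).
table = {
    "a": [1, 2, 3, 4],
    "A": ["hello there", "goodbye", "hello again", "hi"],
    "B": ["world", "sun", "moon", "world"],
    "date": [datetime(2022, 1, d, 12) for d in range(1, 5)],
}

# df.apply(np.cumsum) on a single column: a running total per row.
running_total = list(accumulate(table["a"]))

# String methods are plain list comprehensions over a column.
shouting = [s.upper() for s in table["A"]]

# Row filters in the spirit of .any(...) / .all(...), one predicate per column.
rows = list(zip(table["A"], table["B"]))
any_rows = [i for i, (a, b) in enumerate(rows) if "hello" in a or "world" in b]
all_rows = [i for i, (a, b) in enumerate(rows) if "hello" in a and "world" in b]

# Time series: shift every timestamp by one hour, as in the GMT example.
gmt = [timedelta(hours=1) + v for v in table["date"]]
```

With a real `Table`, the same comprehensions iterate over `t['a']` and friends, and a result can be written back as a new column, e.g. `t['total'] = running_total`.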




### Intro to data structures

Pandas has `Series` and `DataFrame`. 

Tablite calls them `Column`s and `Table`s. A `Table` should have one or more `Column`s. `Table`s behave like a dict: `{column name: [list of values], ...}`
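
Because a `Table` behaves like a dict of column names to lists of values, ordinary dict operations make a useful mental model. The sketch below uses a plain Python dict, not a tablite `Table`, to show columns by name, rows as tuples read across the columns, and a derived column built with a list comprehension.

```python
# An ordinary dict used as a stand-in for a Table: {column name: [values], ...}
columns = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}

names = list(columns)                # column names, in insertion order
second = columns["two"]              # one column is just a list of values
rows = list(zip(*columns.values()))  # rows are tuples read across the columns

# A new column derived from existing ones, row by row.
columns["total"] = [one + two for one, two in rows]
```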

Pandas has a thorough explanation of how to get data in, such as:



```python
import pandas as pd
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

df = pd.DataFrame(d)
```

Doing the same in tablite requires:

```python
from tablite import Table

t = Table()
t['index'] = ["a", "b", "c", "d"]
t['one'] = [1.0, 2.0, 3.0, None]  # pad with None where pandas would insert NaN
t['two'] = [1.0, 2.0, 3.0, 4.0]
```
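
Pandas aligns the two series on their index labels and fills the gap at `"d"` with `NaN`; in tablite you do that alignment yourself. The hypothetical helper `align` below is a minimal plain-Python sketch of that step: it collects the labels in the order they are first seen (pandas may order the union differently) and fills missing entries with `None`, producing one equal-length list per column.

```python
from __future__ import annotations


def align(series: dict[str, dict[str, float]]) -> dict[str, list]:
    """Hypothetical helper: turn {name: {label: value}} into equal-length columns."""
    index: list[str] = []
    for mapping in series.values():
        for label in mapping:
            if label not in index:
                index.append(label)
    columns: dict[str, list] = {"index": index}
    for name, mapping in series.items():
        # dict.get returns None for labels this series lacks (pandas would use NaN)
        columns[name] = [mapping.get(label) for label in index]
    return columns


d = {
    "one": {"a": 1.0, "b": 2.0, "c": 3.0},
    "two": {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0},
}
columns = align(d)
```

Each list can then be assigned to a table column, e.g. `t[name] = values` for every `name, values` in `columns.items()`.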

If you're lazy this also works:

```python
import pandas as pd
from tablite import Table

d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
t = Table.from_dict(d)
```



Pandas also accepts a list of namedtuples, which it unpacks into columns on entry.

```python
from collections import namedtuple
Point = namedtuple("Point", "x y")
points = [Point(0, 0), Point(0, 3)]
pd.DataFrame(points)
```

In tablite this works a bit differently: you unpack the namedtuples explicitly:

```python
t = Table()
t['x'] = [p.x for p in points]
t['y'] = [p.y for p in points]
```
However, should you want to keep the points as namedtuples, tablite lets you store them as they are:

```python
t = Table()
t['points'] = points
```
Tablite will store a serialised version of the points, so your memory overhead will be close to zero.
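
Writing one comprehension per field gets repetitive for namedtuples with many fields. A minimal sketch that unpacks any namedtuple type generically, using the standard `_fields` attribute, is shown below; the resulting dict of lists can then be assigned to a `Table` column by column.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
points = [Point(0, 0), Point(0, 3)]

# _fields lists the field names in declaration order, so this works for any namedtuple type.
columns = {field: [getattr(p, field) for p in points] for field in Point._fields}
```

With a `Table` named `t`, the loop `for name, values in columns.items(): t[name] = values` gives the same result as the two explicit assignments above.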



Transposing in pandas means "rotate the table".

In tablite, `transpose` can mean that too, but it offers more options:

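```python
# rotate the table: the values of column 'a' become the new column headers
# and the current columns become rows
t.transpose(on='a')

# turn the weekday columns into a single column with the heading 'day'.
# `as` is a reserved word in Python and cannot be a keyword argument, so the
# keyword below is an assumption; check tablite's API reference for the real name.
t.transpose(columns=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], column_name='day')
```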
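
To make the two meanings concrete, here is a plain-Python sketch of both operations on a dict of lists. It is not tablite's implementation, and the helper names `rotate` and `unpivot` are hypothetical: `rotate` turns the values of one column into headers, and `unpivot` (what pandas calls `melt`) folds several columns into a single `day` column plus a value column.

```python
from __future__ import annotations


def rotate(table: dict[str, list], on: str) -> dict[str, list]:
    """Values of column `on` become headers; the other column names become rows."""
    others = [name for name in table if name != on]
    rotated: dict[str, list] = {on: others}
    for i, header in enumerate(table[on]):
        rotated[str(header)] = [table[name][i] for name in others]
    return rotated


def unpivot(table: dict[str, list], columns: list[str], column_name: str,
            value_name: str = "value") -> dict[str, list]:
    """Fold `columns` into one `column_name` column and one `value_name` column."""
    keep = [name for name in table if name not in columns]
    out: dict[str, list] = {name: [] for name in keep + [column_name, value_name]}
    for i in range(len(table[columns[0]])):
        for col in columns:
            for name in keep:
                out[name].append(table[name][i])  # repeat the kept values per folded column
            out[column_name].append(col)
            out[value_name].append(table[col][i])
    return out


week = {"employee": ["Ann", "Bob"], "Monday": [8, 6], "Tuesday": [7, 8]}
rotated = rotate(week, on="employee")
long_form = unpivot(week, ["Monday", "Tuesday"], column_name="day", value_name="hours")
```

Rotating stores the old column names under the `on` header; that is one reasonable choice for the sketch, not necessarily what tablite does.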


### Unique to tablite

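- `date_range`: generates a range of dates, similar to `pd.date_range`.
- `datatypes.guess`: guesses the datatype of values.
- `xround`: rounding that also works on dates.
- `replace_missing_values(source=[...], target=name)`: replaces missing values.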
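
For readers who want to see what a date range produces, here is a dependency-free sketch. The helper `date_range_sketch` is hypothetical and not tablite's `date_range`; in particular, whether tablite includes `stop` is not stated here, and this sketch excludes it.

```python
from __future__ import annotations

from datetime import date, timedelta


def date_range_sketch(start: date, stop: date, step: timedelta) -> list[date]:
    """Hypothetical stand-in: start, start + step, ... up to but excluding stop."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    values = []
    current = start
    while current < stop:
        values.append(current)
        current += step
    return values


days = date_range_sketch(date(2022, 1, 1), date(2023, 1, 1), timedelta(days=1))
```

Since 2022 is not a leap year, `days` holds one entry per day from 1 January to 31 December 2022.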

