## Tableau and TabPy

#### Getting Started:

1. In a new terminal window, type:
- ```pip install tabpy```
- ```pip install tabpy-client```
- ```tabpy```


This starts a TabPy server in that terminal window. Leave the window open so Tableau can connect to the server (port `9004` by default).

###### Reference: https://github.com/tableau/TabPy/blob/master/docs/server-install.md
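If you want to confirm the server is reachable before opening Tableau, a small check like the one below may help. It is only a sketch: it assumes the default address `http://localhost:9004`, that TabPy's `/info` REST endpoint is available, and that authentication is not enabled. The helper name `tabpy_info` is made up for this example.

```python
import json
import urllib.error
import urllib.request


def tabpy_info(base_url="http://localhost:9004", timeout=2.0):
    """Return the parsed JSON from TabPy's /info endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/info", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply
        return None


info = tabpy_info()
if info is None:
    print("TabPy does not appear to be running on localhost:9004")
else:
    print("TabPy is up:", info)
```

If this prints the "not running" message, check that the `tabpy` command is still running in the other terminal window.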

#### Loading Tableau

Open Tableau and connect to the "Sample - Superstore" dataset 
- (probably in `Documents > My Tableau Repository > Datasources > 2020.2 > en_US-US > Sample Superstore.xls` )

To connect Tableau to the TabPy server for the first time, do the following:
- Help > Settings and performance > Manage analytics extension connection
- Choose `TabPy/External API` from the `Select an Analytics Extension` dropdown
- Choose `localhost` from the `Server` dropdown.
- Type `9004` in the `Port` section

Next, import the necessary libraries:

```python
import numpy as np
import tabpy_client
```

Define a connection for later, so we know which server to send the function to:

```python
connection = tabpy_client.Client('http://localhost:9004/')
```

Let's do a little feature-engineering by multiplying two columns together.

This is a very simple manipulation, but you can use `tabpy` to make machine-learning models and more.

```python
def multiply_two_columns(col1, col2):
    """
    Multiplies two columns element-wise.

    Arguments
    ---------
    col1 : list
        A column from Tableau

    col2 : list
        A column from Tableau

    Returns
    -------
    col3 : list
        A new column containing the product of each pair of values
    """

    col3 = np.multiply(col1, col2)

    # Convert the NumPy array to a plain list so it can be sent back to Tableau
    return col3.tolist()
```
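Before deploying, it can be worth running the function locally on a few values. The sketch below repeats a condensed version of the function so it runs on its own; the `quantity` and `profit` lists are hypothetical stand-ins for the columns Tableau would pass.

```python
import numpy as np


def multiply_two_columns(col1, col2):
    """Element-wise product of two equal-length columns, as a plain list."""
    return np.multiply(col1, col2).tolist()


# Hypothetical sample values standing in for Tableau columns
quantity = [2, 3, 5]
profit = [10.0, -4.5, 0.5]

result = multiply_two_columns(quantity, profit)
print(result)

# The result should contain one value per input row
assert len(result) == len(quantity)
```

Returning a plain list of the same length as the inputs matters here, because Tableau maps each returned value back to a row.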

```python
# Sends the function to the server to be used by Tableau
connection.deploy('ColumnMultiplier',  # name, can be anything. This is the name you'll be referencing later
                  multiply_two_columns,  # the function we're sending
                  description='Returns the product of two columns',  # a short description
                  override=True)  # force update (in case you want to change the function)
```

Next, go to your Tableau workbook, open a sheet, and right-click in the left pane (where the column names are).

Choose `Create > Calculated field...`

Type the following (or copy and paste) into the new window:

```
SCRIPT_REAL("

return tabpy.query('ColumnMultiplier', _arg1, _arg2)['response']

",

SUM([Quantity]), SUM([Profit]))
```
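To debug the calculated field outside Tableau, you can send a similar request yourself. The sketch below assumes TabPy's `/evaluate` REST endpoint accepts a JSON body with `data` and `script` keys, that the server runs at `http://localhost:9004` without authentication, and that `ColumnMultiplier` has already been deployed. The helper names and sample values are made up for illustration.

```python
import json
import urllib.error
import urllib.request

# The same script body used inside SCRIPT_REAL
SCRIPT = "return tabpy.query('ColumnMultiplier', _arg1, _arg2)['response']"


def build_evaluate_payload(quantity, profit):
    """Build a JSON body for /evaluate, naming the inputs _arg1 and _arg2."""
    return {"data": {"_arg1": quantity, "_arg2": profit}, "script": SCRIPT}


def evaluate(payload, base_url="http://localhost:9004", timeout=5.0):
    """POST the payload to /evaluate; return the parsed result or None on failure."""
    request = urllib.request.Request(
        f"{base_url}/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, endpoint not deployed, or a non-JSON reply
        return None


# Hypothetical values standing in for SUM([Quantity]) and SUM([Profit]) per state
payload = build_evaluate_payload([2, 3, 5], [10.0, -4.5, 0.5])
print(evaluate(payload))
```

A `None` result usually means the server is not reachable or the endpoint name does not match the one passed to `deploy`.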

To see the calculation in action:

- Drag `State` to Rows
- Click `Show Me` in the top-right
- Choose the (second) map
- You should now have a map
- Drag `Calculation 1` (or whatever you named the new Calculated Field) to color
- You're done!



### Further Reading

If you want to try something more complicated, [here](https://www.tableau.com/about/blog/2017/1/building-advanced-analytics-applications-tabpy-64916) is a blog post that uses a Jupyter notebook to build multiple machine-learning models and return a diagnosis based on attributes of a tumor.

[Here](https://github.com/tableau/TabPy) is the TabPy GitHub page.

[Here](https://help.tableau.com/current/prep/en-us/prep_scripts._R.htm) is a post to get you started using R with Tableau.