# SeriesStrMethodTransformer
This notebook demonstrates the `SeriesStrMethodTransformer` class. This transformer applies a `pd.Series.str` method to a specific column in the input `X`. <br>
This generic transformer means that many `pd.Series.str` methods are available for use within the package without having to directly implement a transformer for each specific function.
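
The idea of dispatching to a `pd.Series.str` method by name can be sketched in plain pandas. This is only an illustration of the general technique, not the transformer's actual implementation; the helper name `apply_str_method` is hypothetical and is not part of `tubular`.

```python
import pandas as pd


def apply_str_method(series, method_name, **kwargs):
    """Look up a pd.Series.str method by name and call it with kwargs.

    Hypothetical helper for illustration only; not part of tubular.
    """
    method = getattr(series.str, method_name)
    return method(**kwargs)


s = pd.Series(["apple", "banana", "cherry"])

# The same helper covers different string methods without extra code
upper = apply_str_method(s, "upper")
found = apply_str_method(s, "find", sub="an")
```

Because the method is resolved by name at call time, one piece of code can cover any method exposed by the `.str` accessor.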

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tubular
from tubular.strings import SeriesStrMethodTransformer

In [3]:
tubular.__version__

'0.3.0'

## Create dummy dataset

In [4]:
df = pd.DataFrame(
    {
        "a": [1, 5, 2, 3, 3],
        "b": ["w", "w", "z", "y", "x"],
        "c": ["a", "a", "c", "b", "a"],
    },
    index=[10, 15, 200, 251, 59],
)

df["c"] = df["c"].astype("category")

In [5]:
df

,a,b,c
10,1,w,a
15,5,w,a
200,2,z,c
251,3,y,b
59,3,x,a


## Simple usage

### Initialising SeriesStrMethodTransformer

The user must specify the following: <br>
- `new_column_name`: the name of the column that the output of the `pd.Series.str` method is assigned to
- `pd_method_name`: the name of the `pd.Series.str` method to be called
- `columns`: the **column** to be transformed in the `DataFrame` passed to the `transform` method
- `pd_method_kwargs`: a dictionary of keyword arguments passed to the `pd.Series.str` method when it is called

In [6]:
find_transformer = SeriesStrMethodTransformer(
    columns="c",
    pd_method_name="find",
    new_column_name="c_find",
    pd_method_kwargs={"sub": "a"},
)

### SeriesStrMethodTransformer fit
There is no fit method for the SeriesStrMethodTransformer as the methods that it can run do not 'learn' anything from the data.

### SeriesStrMethodTransformer transform
When running `transform` with this configuration, a new column `c_find` is added to the input `X`, holding the result of running `df['c'].str.find(sub = "a")`.

In [7]:
df_2 = find_transformer.transform(df)

In [8]:
df_2[["c", "c_find"]].head()

,c,c_find
10,a,0
15,a,0
200,c,-1
251,b,-1
59,a,0
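
As a cross-check, the same values can be reproduced with plain pandas, without the transformer. The sketch below rebuilds the dummy dataset from above and calls `str.find` directly on the categorical column; the variable name `expected` is introduced here for illustration only.

```python
import pandas as pd

# Rebuild the dummy dataset used in this notebook
df = pd.DataFrame(
    {
        "a": [1, 5, 2, 3, 3],
        "b": ["w", "w", "z", "y", "x"],
        "c": ["a", "a", "c", "b", "a"],
    },
    index=[10, 15, 200, 251, 59],
)
df["c"] = df["c"].astype("category")

# Direct pandas call that the transformer configuration corresponds to
expected = df["c"].str.find(sub="a")
```

`str.find` returns the position of the first match, or -1 when the substring is absent, which is why rows with `c` equal to `"c"` or `"b"` show -1.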
