<h1>FastAI library basics<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Installation" data-toc-modified-id="Installation-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Installation</a></span><ul class="toc-item"><li><span><a href="#Local" data-toc-modified-id="Local-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Local</a></span></li><li><span><a href="#Google-Colab" data-toc-modified-id="Google-Colab-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Google Colab</a></span></li></ul></li><li><span><a href="#Functions-(v0.7)" data-toc-modified-id="Functions-(v0.7)-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Functions (v0.7)</a></span><ul class="toc-item"><li><span><a href="#Data-pre-processing" data-toc-modified-id="Data-pre-processing-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Data pre-processing</a></span><ul class="toc-item"><li><span><a href="#add_datepart" data-toc-modified-id="add_datepart-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>add_datepart</a></span></li><li><span><a href="#train_cats" data-toc-modified-id="train_cats-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>train_cats</a></span></li><li><span><a href="#apply_cats" data-toc-modified-id="apply_cats-2.1.3"><span class="toc-item-num">2.1.3&nbsp;&nbsp;</span>apply_cats</a></span></li><li><span><a href="#proc_df" data-toc-modified-id="proc_df-2.1.4"><span class="toc-item-num">2.1.4&nbsp;&nbsp;</span>proc_df</a></span></li></ul></li></ul></li></ul></div>

In this notebook I'll explain the most useful bits of the **fastai** library :)

In [1]:
from fastai.imports import *
from fastai.structured import *

# Installation

## Local
1. Clone the https://github.com/fastai/fastai repository.
2. Follow the README.
   * For the ML course: https://forums.fast.ai/t/fastai-v0-7-install-issues-thread/24652
   * If you are on Windows: https://forums.fast.ai/t/howto-installation-on-windows/10439?source_topic_id=10663
3. Install PyTorch: https://pytorch.org/get-started/locally/
   ```
   conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
   ```

4. Install additional modules: https://medium.com/@GuruAtWork/fast-ai-lesson-1-7fc38e978d37
5. (Windows) To make fast.ai 0.7 available to every project in your Anaconda environment, create a directory junction from the environment's `site-packages` folder to the cloned `old\fastai` folder:
   ```
   [C:\Users\<User>\.conda\envs\<env>\Lib\site-packages >]
   mklink /J fastai C:\Users\<User>\Dropbox\DEV\projects\cloned\fastai\old\fastai
   ```

## Google Colab
 * Follow [this article](https://medium.com/@prakash_31206/fastest-way-to-setup-fast-ai-course-notebooks-for-free-using-google-colab-gpu-and-clouderizer-c8a004e1d50d)

# Functions (v0.7)

## Data pre-processing

FastAI provides several helper functions for preparing tabular datasets before they are fed to ML algorithms.

### add_datepart

`add_datepart(df, fldname, drop=True, time=False, errors="raise")`

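Expands the column `fldname` of the dataframe `df` into several columns holding the parts of the date (year, month, day, day of week, and so on), plus an `Elapsed` column with the Unix timestamp in seconds. If the column is not already datetime64, it is first parsed with `pd.to_datetime`. A trailing `Date` or `date` in the column name is stripped to build the new column names. The original column is dropped by default (`drop=True`), and hour, minute and second columns are added when `time=True`.
The changes are applied in place.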


Code:

```
    fld = df[fldname]
    fld_dtype = fld.dtype
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64

    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True, errors=errors)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
            'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    if time: attr = attr + ['Hour', 'Minute', 'Second']
    for n in attr: df[targ_pre + n] = getattr(fld.dt, n.lower())
    df[targ_pre + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9
    if drop: df.drop(fldname, axis=1, inplace=True)
```
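
To see the kind of columns this produces, here is a minimal pandas-only sketch of the same idea. It is not the fastai implementation: the helper name `expand_date` and the sample column `saledate` are made up for illustration, and only a few of the attributes are generated.

```python
import re

import pandas as pd


def expand_date(df, fldname, drop=True):
    """Hypothetical helper: a reduced, pandas-only imitation of add_datepart."""
    fld = pd.to_datetime(df[fldname])
    # Strip a trailing 'Date'/'date' so that 'saledate' yields the prefix 'sale'.
    prefix = re.sub('[Dd]ate$', '', fldname)
    for attr in ['Year', 'Month', 'Day', 'Dayofweek', 'Is_month_end']:
        df[prefix + attr] = getattr(fld.dt, attr.lower())
    # Whole seconds since the Unix epoch.
    df[prefix + 'Elapsed'] = (fld - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
    if drop:
        df.drop(fldname, axis=1, inplace=True)


sales = pd.DataFrame({'saledate': ['2011-11-16', '2004-03-31'], 'price': [66000, 57000]})
expand_date(sales, 'saledate')
print(sales.columns.tolist())
```

After the call, `sales` no longer has a `saledate` column; it has `saleYear`, `saleMonth`, `saleDay`, `saleDayofweek`, `saleIs_month_end` and `saleElapsed` instead.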

### train_cats

`train_cats(df)`

Changes every column of string values in a Pandas dataframe (`df`) into a column of ordered categorical values (from object to category type).
The changes are applied in place.

Code:

```
for n,c in df.items():
    if is_string_dtype(c): df[n] = c.astype('category').cat.as_ordered()
```
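
A small self-contained run of that loop shows the effect. The dataframe and its column names (`band`, `weight`) are invented for this example, and `is_string_dtype` is imported from `pandas.api.types` here instead of coming from `fastai.imports`.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({'band': ['low', 'high', 'medium', 'low'],
                   'weight': [1.0, 2.5, 1.7, 0.9]})

# Same loop as train_cats: only string columns are converted.
for n, c in df.items():
    if is_string_dtype(c):
        df[n] = c.astype('category').cat.as_ordered()

# Categories are sorted alphabetically, and each value gets the index of its category.
print(df['band'].cat.categories.tolist())
print(df['band'].cat.codes.tolist())
print(df.dtypes)
```

Note that the resulting order is alphabetical (`high`, `low`, `medium`), not a meaningful ranking; if the order matters, it has to be set explicitly afterwards.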

### apply_cats

`apply_cats(df, trn)`

Changes the columns of a Pandas dataframe (`df`) to categorical values, using another dataframe (`trn`) as the template for the category codes, so the category-to-code mapping is the same in `df` as in `trn`. Only columns that exist in `trn` with category type are converted.
The changes are applied in place.

Code:

```
for n,c in df.items():
    if (n in trn.columns) and (trn[n].dtype.name=='category'):
        df[n] = c.astype('category').cat.as_ordered()
        df[n].cat.set_categories(trn[n].cat.categories, ordered=True, inplace=True)
```
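
The following sketch reproduces the idea on toy data. It is an approximation, not the library code: it builds the column with `pd.Categorical` instead of calling `set_categories(..., inplace=True)`, because recent pandas releases no longer accept that `inplace` argument. The dataframes `trn` and `val` and the column `band` are invented for the example.

```python
import pandas as pd

trn = pd.DataFrame({'band': ['low', 'high', 'medium']})
trn['band'] = trn['band'].astype('category').cat.as_ordered()

val = pd.DataFrame({'band': ['medium', 'low', 'unknown']})

# Reuse the training categories; values never seen in training become NaN (code -1).
for n, c in val.items():
    if n in trn.columns and trn[n].dtype.name == 'category':
        val[n] = pd.Categorical(c, categories=trn[n].cat.categories, ordered=True)

print(val['band'].cat.codes.tolist())
```

Because both dataframes share the same category list, a given string maps to the same integer code in training and validation data, which is what a model trained on the codes needs.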

### proc_df

`proc_df(df, y_fld=None, skip_flds=None, ignore_flds=None, do_scale=False, na_dict=None, preproc_fn=None, max_n_cat=None, subset=None, mapper=None)`

Basic example:
`df, y, nas, mapper = proc_df(df, 'ColumnName', do_scale=True)`

Takes a dataframe (`df`) as input and:
 * splits off the response variable, if `y_fld` is given.
 * converts the columns of type category to numeric, adding 1 to the Pandas category code (so -1, the code for NA values, becomes 0; 0 becomes 1, 1 becomes 2, etc.). If `max_n_cat` is set, only columns with more than `max_n_cat` categories are converted this way; the remaining ones are one-hot encoded by `pd.get_dummies`.
 * for each numeric column of `df` that is in neither `skip_flds` nor `ignore_flds`, replaces NA values with the median of the column. A new boolean column, named after the original one with the *_na* suffix, flags the rows that were filled.
 * applies a preprocessing function if one is passed in `preproc_fn`.
 * applies normalization (mean 0, std dev 1) if `do_scale=True` (recommended for linear models).
 * returns the resulting dataframe, the target variable, and a dictionary that maps each column with filled NA values to the median used to fill it (so it can be passed in the `na_dict` field for validation/test datasets). An additional `mapper` is returned if `do_scale=True` (so the same normalization can be applied to the validation/test datasets).

Code:

```
if not ignore_flds: ignore_flds=[]
if not skip_flds: skip_flds=[]
if subset: df = get_sample(df,subset)
else: df = df.copy()
ignored_flds = df.loc[:, ignore_flds]
df.drop(ignore_flds, axis=1, inplace=True)
if preproc_fn: preproc_fn(df)
if y_fld is None: y = None
else:
    if not is_numeric_dtype(df[y_fld]): df[y_fld] = pd.Categorical(df[y_fld]).codes
    y = df[y_fld].values
    skip_flds += [y_fld]
df.drop(skip_flds, axis=1, inplace=True)

if na_dict is None: na_dict = {}
else: na_dict = na_dict.copy()
na_dict_initial = na_dict.copy()
for n,c in df.items(): na_dict = fix_missing(df, c, n, na_dict)
if len(na_dict_initial.keys()) > 0:
    df.drop([a + '_na' for a in list(set(na_dict.keys()) - set(na_dict_initial.keys()))], axis=1, inplace=True)
if do_scale: mapper = scale_vars(df, mapper)
for n,c in df.items(): numericalize(df, c, n, max_n_cat)
df = pd.get_dummies(df, dummy_na=True)
df = pd.concat([ignored_flds, df], axis=1)
res = [df, y, na_dict]
if do_scale: res = res + [mapper]
return res
```
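
Two of these steps, the median fill with its *_na* flag and the shift of the category codes, can be sketched with plain pandas. The helper `fill_and_encode` below is hypothetical and much simpler than `proc_df` (no scaling, no dummies, no field skipping); the column names are invented too.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_and_encode(df, na_dict=None):
    """Hypothetical helper imitating two proc_df steps on a copy of df."""
    df = df.copy()
    na_dict = {} if na_dict is None else dict(na_dict)
    for name in list(df.columns):
        col = df[name]
        if is_numeric_dtype(col):
            if col.isnull().any() or name in na_dict:
                # Flag the missing rows, then fill with the stored or computed median.
                df[name + '_na'] = col.isnull()
                filler = na_dict.get(name, col.median())
                df[name] = col.fillna(filler)
                na_dict[name] = filler
        elif col.dtype.name == 'category':
            # Pandas uses -1 for missing categories; adding 1 maps it to 0.
            df[name] = col.cat.codes + 1
    return df, na_dict


cats = ['high', 'low']
train = pd.DataFrame({'age': [10.0, None, 30.0],
                      'band': pd.Categorical(['low', None, 'high'], categories=cats, ordered=True)})
valid = pd.DataFrame({'age': [None, 50.0],
                      'band': pd.Categorical(['high', 'low'], categories=cats, ordered=True)})

train_out, nas = fill_and_encode(train)
# Passing the training dictionary makes the validation set use the training median.
valid_out, _ = fill_and_encode(valid, nas)
print(train_out)
print(valid_out)
```

The second call mirrors how `na_dict` is meant to be reused: the missing `age` in `valid` is filled with the median computed on `train`, not with a value derived from the validation data.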