## Chapter 10: Nonlinear Learning and Feature Engineering

# 10.1 Introduction

## 10.1.1 Mathematical, algorithmic, and empirical functions

Throughout this text we have often used the word *function* to refer to mathematical formulae like that of a line, a quadratic, or a logistic sigmoid. Such formulae are often called *mathematical functions*, hence our use of the word 'function' to refer to them. However, the word 'function' is a much more general term: it broadly encompasses any rule or set of rules for transforming some kind of 'input' into some kind of 'output'. For example, a recipe for chocolate cake is a function, that is, a set of rules for taking certain inputs - like eggs, flour, sugar, and cocoa powder - and transforming them into an 'output': a chocolate cake.

Another very common type of function is an *algorithm*, that is, a function that is naturally expressed programmatically (in code). A classic example of an *algorithmic function* is a basic *sorting scheme*. Sorting schemes take as *input* a list of numbers like the following

\begin{equation}
[18,25,10,33,16,75,50,14]
\end{equation}

and return as *output* its sorted version 

\begin{equation}
[10, 14, 16, 18, 25, 33, 50, 75].
\end{equation}

Such a function is easy to express in code, but much harder to express as a mathematical formula.
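To make this concrete, here is a minimal sketch of one possible sorting scheme, insertion sort, written in Python. The choice of insertion sort and the name `insertion_sort` are ours, made purely for illustration; any correct sorting scheme defines the same input/output rule.

```python
def insertion_sort(numbers):
    """Return a new list holding the input numbers in ascending order."""
    result = []
    for value in numbers:
        # Walk past every entry already placed that is <= value ...
        position = 0
        while position < len(result) and result[position] <= value:
            position += 1
        # ... and insert value just before the first larger entry.
        result.insert(position, value)
    return result


unsorted_list = [18, 25, 10, 33, 16, 75, 50, 14]
print(insertion_sort(unsorted_list))  # [10, 14, 16, 18, 25, 33, 50, 75]
```

In practice one would simply call Python's built-in `sorted`; the explicit loop is shown only to emphasize that the rule is a sequence of programmatic steps rather than a formula.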

We have also seen a variety of algorithmic functions in prior chapters. For example, *gradient descent* (see Section 3.5) is an algorithmic function that takes in various inputs - a mathematical function to minimize, an initial point, a steplength, and a maximum number of steps to take - and returns certain outputs (e.g., a history of descent steps). The process of building a Bag of Words histogram - as detailed at a high level in Section 9.1 - is another example of an algorithmic function. Here the input, raw text, is transformed into a numerical histogram of counts of the words the input contains. Such a function is much more easily detailed programmatically, in code, than mathematically, using formulae.
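Both examples can be sketched in a few lines of Python. What follows is a simplified illustration under stated assumptions, not the implementations from Sections 3.5 and 9.1: the gradient descent sketch handles only a scalar input and approximates the derivative with a central difference (the parameter `h` is our own addition), and the Bag of Words sketch uses a deliberately naive tokenizer that lowercases the text and keeps only runs of the letters a-z. The names `gradient_descent` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter


def gradient_descent(g, w_init, steplength, max_steps, h=1e-6):
    """Minimize a scalar function g, returning the history of steps taken."""

    def derivative(w):
        # Central-difference approximation (an assumption of this sketch).
        return (g(w + h) - g(w - h)) / (2 * h)

    w = w_init
    history = [w]
    for _ in range(max_steps):
        w = w - steplength * derivative(w)  # standard descent step
        history.append(w)
    return history


def bag_of_words(text):
    """Map raw text to a histogram of word counts (naive tokenizer)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


# Inputs: a function to minimize, an initial point, a steplength,
# and a maximum number of steps. Output: the history of descent steps.
history = gradient_descent(lambda w: w ** 2, w_init=2.0,
                           steplength=0.1, max_steps=50)

# Input: raw text. Output: counts of the words it contains.
histogram = bag_of_words("The dog chased the cat")
```

Here `history` holds the initial point plus one entry per step, and `histogram["the"]` counts both occurrences of "the" because the text is lowercased first.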

Data is the third sort of function we deal with regularly, and we refer to each instance of it as an *empirical function*. The generic supervised learning dataset, which we often denote by $\left\{\mathbf{x}_p,y_p\right\}_{p=1}^P$, always defines a trivial rule for assigning input to output: pairing. The input $\mathbf{x}_p$ is simply paired with its output $y_p$. This is perhaps more easily seen by listing each data pair in a table, as shown below.



<br>
<table>
    <thead style="background-color: #eee;">
        <tr>
            <th>input: &nbsp;$\mathbf{x}_p$</th>
            <th>output: &nbsp;$y_p$</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="centered">$\mathbf{x}_1$</td>
            <td class="centered">$y_1$</td>
        </tr>
        <tr>
            <td class="centered">$\mathbf{x}_2$</td>
            <td class="centered">$y_2$</td>
        </tr>
        <tr>
            <td class="centered">$\vdots$</td>
            <td class="centered">$\vdots$</td>
        </tr>
        <tr>
            <td class="centered">$\mathbf{x}_P$</td>
            <td class="centered">$y_P$</td>
        </tr>
    </tbody>
</table>
<br>
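In code, the pairing rule of an empirical function amounts to a lookup table. The sketch below assumes a tiny, made-up dataset with $P = 3$ points (the values are purely illustrative and are not taken from any real dataset), and the names `empirical_function` and `evaluate` are hypothetical.

```python
# Made-up inputs x_p (stored as tuples so they can serve as keys)
# and made-up outputs y_p, for illustration only.
inputs = [(0.0, 1.0), (0.5, -1.0), (2.0, 3.0)]
outputs = [1.0, -1.0, 1.0]

# The "rule" is pure pairing: each x_p is matched with its y_p.
empirical_function = dict(zip(inputs, outputs))


def evaluate(x):
    """Return the output paired with input x (KeyError if x is not in the data)."""
    return empirical_function[x]


print(evaluate((0.5, -1.0)))  # -1.0
```

Unlike a formula or an algorithm, this rule says nothing about inputs that are not in the table: evaluating at an unseen point simply fails.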


These three categories of functions - *mathematical*, *algorithmic*, and *empirical* - are not mutually exclusive by any means. For example, we can write down the formula of a line (a mathematical function) and implement it in code (an algorithmic function). To visualize this line we can plug a fine range of inputs into our implementation, producing a corresponding set of outputs, and plot the resulting input/output pairs (an empirical function).
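The line example can be sketched as follows. The particular line $y = 1 + 2x$, the input range $[-1, 1]$, and the number of sample points are arbitrary choices made for illustration, as are the names `line` and `dataset`.

```python
def line(x, w0=1.0, w1=2.0):
    """Algorithmic implementation of the mathematical function y = w0 + w1 * x."""
    return w0 + w1 * x


# A fine range of evenly spaced inputs on [-1, 1].
num_points = 101
xs = [-1.0 + 2.0 * i / (num_points - 1) for i in range(num_points)]

# Evaluating the implementation yields the corresponding outputs ...
ys = [line(x) for x in xs]

# ... and pairing inputs with outputs gives an empirical function.
dataset = list(zip(xs, ys))
```

Passing `xs` and `ys` to a plotting routine (for example `matplotlib.pyplot.plot`) would then draw the line from these sampled pairs.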