## Writing Functions

You can’t do anything in data science without using functions, but have you ever written your own? Why would you?

- Efficiency
- Customized functionality
- Reproducibility
- Extend the work that’s already been done

There are many benefits to writing your own functions, and it’s actually easy to do. Once you get the basic concept down, you’ll likely find yourself using your own functions more and more.

Python users need less convincing to write functions. If anything, many Pythonistas seem quick to write a custom function rather than rely on tested code from an imported module. They also seem eager to write custom classes that only work with their own custom-built functions. I believe this stems from Python being a general-purpose programming language rather than a data-science-specific one: most courses teach the basic programming part before data science applications, even when the latter is the focus. In addition, while R and other statistical programming languages assume interactive, line-by-line use, Python does not, and many people use it in ways that are not especially well suited to data science.

In general, if tested code that already does the job is available, I suggest using it.
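As a rough illustration of leaning on existing, tested code, the sketch below uses built-in pandas methods to get a few common summaries without writing any function at all. The Series `x` and its values are made up for the example. Note that pandas' `std` is the sample standard deviation (`ddof=1`) and that these methods skip missing values by default.

```python
import numpy as np
import pandas as pd

# Made-up example data with one missing value
x = pd.Series([1.0, 2.0, 3.0, np.nan])

# Built-in, tested summaries; NaN values are skipped by default
builtin_summary = x.agg(['mean', 'std', 'count'])

# Number of missing values
n_missing = x.isna().sum()

print(builtin_summary)
print('N_missing:', n_missing)
```

When built-ins like these cover what you need, a custom function mostly adds maintenance. Writing your own starts to pay off when you want a specific combination or format of results that nothing off the shelf provides.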

### A Starting Point

In [1]:
import pandas as pd
import numpy as np

In [18]:
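def my_summary(x):
    # Collect the summary statistics in a one-row DataFrame
    out = pd.DataFrame(
        {
            'mean': np.mean(x),
            'sd': np.std(x),                     # population SD (ddof=0) by default
            'N_missing': np.sum(np.isnan(x))     # number of NaN values
        },
        index=['row1']   # an index is required when every value is a scalar
    )
    return out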


In [19]:
my_summary([1,2,3])

|      | mean | sd       | N_missing |
|------|------|----------|-----------|
| row1 | 2.0  | 0.816497 | 0         |
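One thing to be aware of with this starting point: `np.mean` and `np.std` return `nan` as soon as the input contains a missing value, so the `mean` and `sd` columns become uninformative exactly when `N_missing` is nonzero. A minimal sketch of one possible variant follows, using NumPy's nan-aware functions. The name `my_summary_nan_aware` is hypothetical and not part of the original code, and the input values are made up.

```python
import numpy as np
import pandas as pd

def my_summary_nan_aware(x):
    """Hypothetical variant of my_summary that ignores missing values."""
    x = np.asarray(x, dtype=float)   # accept lists as well as arrays
    out = pd.DataFrame(
        {
            'mean': np.nanmean(x),               # mean of the non-missing values
            'sd': np.nanstd(x),                  # population SD (ddof=0), as before
            'N_missing': np.sum(np.isnan(x))     # number of NaN values
        },
        index=['row1']
    )
    return out

# Made-up input with one missing value
result = my_summary_nan_aware([1, 2, 3, np.nan])
print(result)
```

Whether to skip missing values silently or let them propagate is a design choice. Reporting `N_missing` alongside the other statistics at least keeps the omission visible.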
