# Methods and Functions

This lesson covers:

* Calling functions with more than one input and output 
* Calling functions when some inputs are not used

Read the data in `momentum.csv` and create the variables `mom_01` and
`mom_10` that are used in the problems below.

In [1]:
# Setup: Load the momentum data
import pandas as pd

momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)

print(momentum.head())

mom_01 = momentum["mom_01"]
mom_10 = momentum["mom_10"]

            mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  \
date                                                                         
2016-01-04    0.67   -0.03   -0.93   -1.11   -1.47   -1.66   -1.40   -2.08   
2016-01-05   -0.36    0.20   -0.37    0.28    0.16    0.18   -0.22    0.25   
2016-01-06   -4.97   -2.33   -2.60   -1.16   -1.70   -1.45   -1.15   -1.46   
2016-01-07   -4.91   -1.91   -3.03   -1.87   -2.31   -2.30   -2.70   -2.31   
2016-01-08   -0.40   -1.26   -0.98   -1.26   -1.13   -1.02   -0.96   -1.42   

            mom_09  mom_10  
date                        
2016-01-04   -1.71   -2.67  
2016-01-05    0.29    0.13  
2016-01-06   -1.14   -0.45  
2016-01-07   -2.36   -2.66  
2016-01-08   -0.94   -1.32  


This data set contains 2 years of daily data on the 10 momentum portfolios,
from the start of 2016 to the end of 2017. The variables are named mom_XX where XX ranges from 01 (worst
return over the past 12 months) to 10 (best return over the past 12 months).

## Problem: Calling Methods
Get used to calling methods by computing the mean, standard deviation, skewness, kurtosis, max, and min. 

Use the `Series` methods `mean`, `std`, `skew`, `kurt`, `min` and `max` to print the
values for `mom_01`.

In the second cell, call `describe` on `mom_01`. This method summarizes a `Series` or a `DataFrame`.

In [2]:
# Use the functions attached to the Series
print(
    mom_01.mean(),
    mom_01.std(),
    mom_01.skew(),
    mom_01.kurt(),
    mom_01.min(),
    mom_01.max(),
)

0.10190854870775357 1.720167442855678 -0.10718993942161407 3.6858942336434177 -7.28 7.67


In [3]:
mom_01.describe()

count    503.000000
mean       0.101909
std        1.720167
min       -7.280000
25%       -0.615000
50%        0.080000
75%        0.890000
max        7.670000
Name: mom_01, dtype: float64
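A method is a function attached to an object, so the object itself is the first input. The sketch below illustrates this with a small made-up `Series` (the name `returns` and its values are hypothetical, not taken from the momentum data).

```python
import pandas as pd

# Hypothetical stand-in for a column of returns
returns = pd.Series([0.5, -1.0, 2.0, 0.0, -0.5], name="example")

# Calling the method on the object ...
via_method = returns.mean()
# ... gives the same result as passing the object to the function on its class
via_class = pd.Series.mean(returns)

print(via_method, via_class)
```

The method form is the one normally used; the second form is shown only to make the connection between methods and functions explicit.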

## Problem: Use NumPy and SciPy functions

Use the NumPy functions `mean`, `std`, `min`, `max` and the SciPy `stats` functions
`skew` and `kurtosis` to produce the same output.

In [4]:
# Use the NumPy functions and the statistics functions in SciPy
# These are the same up to some bias-adjustment constants that depend only
# on sample size
import numpy as np
import scipy.stats as stats

print(
    np.mean(mom_01),
    np.std(mom_01),
    stats.skew(mom_01),
    stats.kurtosis(mom_01),
    np.min(mom_01),
    np.max(mom_01),
)

0.10190854870775357 1.7184566841600861 -0.10687002235784658 3.6374521972731158 -7.28 7.67
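The small differences from the pandas results come from the default bias adjustments. The following sketch, which uses simulated data rather than the momentum file (the names `x` and `values` are illustrative), suggests how the optional inputs `ddof` and `bias` can be set so that the two sets of functions agree.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Simulated data used only for illustration
rng = np.random.default_rng(0)
x = pd.Series(rng.standard_normal(500))
values = x.to_numpy()

# pandas divides by n - 1 by default; NumPy divides by n unless ddof=1
print(x.std(), np.std(values, ddof=1))

# pandas applies a small-sample correction; SciPy does so when bias=False
print(x.skew(), stats.skew(values, bias=False))
print(x.kurt(), stats.kurtosis(values, bias=False))
```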


## Problem: Calling Functions with 2 Outputs

Some useful functions return 2 or more outputs. One example is ``np.linalg.slogdet``, which
computes the signed log determinant of a square array. It returns two outputs,
the sign and the log of the absolute value of the determinant.

Use this function to compute the sign and log determinant of the 2 by 2 array:

```
1  2
2  9
```  

In [5]:
# The full set of outputs is returned as a tuple
data = np.array([[1, 2], [2, 9]])
output = np.linalg.slogdet(data)
print(output)

(1.0, 1.6094379124341005)


In [6]:
# Alternatively, supply one name per output to assign each component
sign, log_det = np.linalg.slogdet(data)
print(sign)
print(log_det)

1.0
1.6094379124341005
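As a quick check, the two outputs can be combined to recover the determinant. This is a minimal sketch; the variable names `rebuilt` and `log_det_only` are illustrative. It also shows the common convention of assigning an unwanted output to `_`.

```python
import numpy as np

data = np.array([[1, 2], [2, 9]])
sign, log_det = np.linalg.slogdet(data)

# Rebuild the determinant from the two outputs and compare with np.linalg.det
rebuilt = sign * np.exp(log_det)
print(rebuilt, np.linalg.det(data))

# By convention, _ holds an output that is not needed
_, log_det_only = np.linalg.slogdet(data)
print(log_det_only)
```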


## Problem: Calling Functions with 2 Inputs

Many functions take two or more inputs. Like outputs, the inputs are simply
listed in order separated by commas. Use `np.linspace` to produce a series
of 11 points evenly spaced between 0 and 1.

```
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
```  

In [7]:
np.linspace(0, 1, 11)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
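Because positional inputs are matched by their order, swapping two of them changes the meaning of the call. A small sketch (the names `ascending` and `descending` are illustrative):

```python
import numpy as np

ascending = np.linspace(0, 1, 11)   # start=0, stop=1, num=11
descending = np.linspace(1, 0, 11)  # start and stop swapped

# The second grid runs from 1 down to 0
print(descending)
```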

## Problem: Calling Functions using Keyword Arguments

Many functions have optional arguments. You can see these in a docstring since
optional arguments take the form `variable=default`. For example, see
the help for `scipy.special.comb`, which has the function signature

```
comb(N, k, exact=False, repetition=False)
```

This tells us that `N` and `k` are required and
that the other 2 inputs can be omitted if you are happy with the defaults.
However, if we want to change some of the optional inputs, then we can
directly use the input's name in the function call.

Compute the number of distinct combinations of 5 objects from a set of 10.

In [8]:
from scipy.special import comb

comb(10, 5)

252.0

Compute the total number of combinations allowing for repetition 
using the `repetition=True` keyword argument.

In [9]:
comb(10, 5, repetition=True)

2002.0

Compute the number of combinations using the exact representation using 
only positional arguments for all 3 inputs.  Repeat using the keyword
argument for `exact`.

In [10]:
comb(10, 5, True)

252

In [11]:
comb(10, 5, exact=True)

252
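Keyword arguments can also be combined, and their order does not matter once the required positional inputs have been given. A minimal sketch, with illustrative variable names `a` and `b`:

```python
from scipy.special import comb

# Required inputs come first by position; optional inputs follow by name
a = comb(10, 5, exact=True, repetition=True)
# Keyword arguments may appear in any order
b = comb(10, 5, repetition=True, exact=True)

print(a, b)
```

Naming the optional inputs also makes the call easier to read than a bare `True` in the third position.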

## Problem: Function Help

Explore the help available for functions using the `?` operator. For example,

```python
import scipy.stats as stats

stats.kurtosis?
```  

opens a help window that shows the inputs and output, while

```python
help(stats.kurtosis)
```

shows the help in the console.

**Note**: VS Code does **not** support the `?` form of help.

In [12]:
# Opens the help in a special window
stats.kurtosis?

In [13]:
help(stats.kurtosis)

Help on function kurtosis in module scipy.stats.stats:

kurtosis(a, axis=0, fisher=True, bias=True, nan_policy='propagate')
    Compute the kurtosis (Fisher or Pearson) of a dataset.
    
    Kurtosis is the fourth central moment divided by the square of the
    variance. If Fisher's definition is used, then 3.0 is subtracted from
    the result to give 0.0 for a normal distribution.
    
    If bias is False then the kurtosis is calculated using k statistics to
    eliminate bias coming from biased moment estimators
    
    Use `kurtosistest` to see if result is close enough to normal.
    
    Parameters
    ----------
    a : array
        Data for which the kurtosis is calculated.
    axis : int or None, optional
        Axis along which the kurtosis is calculated. Default is 0.
        If None, compute over the whole array `a`.
    fisher : bool, optional
        If True, Fisher's definition is used (normal ==> 0.0). If False,
        Pearson's definition is used (normal ==> 3.0)
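The same information can also be read programmatically. The sketch below uses the standard library `inspect` module to list each input of `stats.kurtosis` along with its default; the exact set of inputs printed may depend on the installed SciPy version.

```python
import inspect

import scipy.stats as stats

sig = inspect.signature(stats.kurtosis)

# Inputs without a default are required; the rest are optional
for name, param in sig.parameters.items():
    if param.default is inspect.Parameter.empty:
        print(name, "(required)")
    else:
        print(name, "default =", param.default)
```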

## Problem: Use `help` with a method

Use `help` to get the help for the `kurt` method attached to `momentum`.

In [14]:
help(momentum.kurt)

Help on method kurt in module pandas.core.generic:

kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) method of pandas.core.frame.DataFrame instance
    Return unbiased kurtosis over requested axis.
    
    Kurtosis obtained using Fisher's definition of
    kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
    
    Parameters
    ----------
    axis : {index (0), columns (1)}
        Axis for the function to be applied on.
    skipna : bool, default True
        Exclude NA/null values when computing the result.
    level : int or level name, default None
        If the axis is a MultiIndex (hierarchical), count along a
        particular level, collapsing into a Series.
    numeric_only : bool, default None
        Include only float, int, boolean columns. If None, will attempt to use
        everything, then use only numeric data. Not implemented for Series.
    **kwargs
        Additional keyword arguments to be passed to the function.
    
    Returns
   

## Exercises

### Exercise: Use `info`

Use the `info` method on `momentum` to get information about this `DataFrame`.

In [15]:
momentum.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 503 entries, 2016-01-04 to 2017-12-29
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   mom_01  503 non-null    float64
 1   mom_02  503 non-null    float64
 2   mom_03  503 non-null    float64
 3   mom_04  503 non-null    float64
 4   mom_05  503 non-null    float64
 5   mom_06  503 non-null    float64
 6   mom_07  503 non-null    float64
 7   mom_08  503 non-null    float64
 8   mom_09  503 non-null    float64
 9   mom_10  503 non-null    float64
dtypes: float64(10)
memory usage: 43.2 KB


### Exercise: Compute the day-by-day mean

Compute the day-by-day mean return of the portfolios in the momentum `DataFrame` using
the `axis` keyword argument. Use `head` and `tail` to show
the first 5 rows and the last 5 rows.

In [16]:
day_by_day = momentum.mean(axis=1)
day_by_day.head()

date
2016-01-04   -1.239
2016-01-05    0.054
2016-01-06   -1.841
2016-01-07   -2.636
2016-01-08   -1.069
dtype: float64

In [17]:
day_by_day.tail()

date
2017-12-22   -0.025
2017-12-26    0.111
2017-12-27   -0.069
2017-12-28    0.218
2017-12-29   -0.555
dtype: float64
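To see what the `axis` keyword does, it may help to try it on a table small enough to check by hand. The `DataFrame` below is made up for illustration and is not part of the momentum data.

```python
import pandas as pd

# Hypothetical 3-by-2 table
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [3.0, 4.0, 8.0]},
    index=["r1", "r2", "r3"],
)

col_means = df.mean()        # default axis=0: one mean per column
row_means = df.mean(axis=1)  # axis=1: one mean per row

print(col_means)
print(row_means)
```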

### Exercise: Compute the standard deviation of mean returns

Compute the standard deviation of the mean returns by chaining methods.

In [18]:
momentum.mean().std()

0.019303560023756744
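Chaining works because each method returns an object that has its own methods. A minimal sketch on a made-up `DataFrame` (all names and values are hypothetical) compares the chained call with the step-by-step version:

```python
import pandas as pd

# Hypothetical data with column means of 2, 5 and 1
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [3.0, 4.0, 8.0], "c": [0.0, 1.0, 2.0]}
)

# Step by step: mean() returns a Series, which has a std() method
column_means = df.mean()
step_by_step = column_means.std()

# Chained: the same two calls written in a single expression
chained = df.mean().std()

print(step_by_step, chained)
```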

### Exercise: Compute the average standard deviation

Compute the mean standard deviation as:

$$ \sqrt{N^{-1} \sum_{i=1}^N V[r_i]} $$

where $V[r_i]$ is the variance of portfolio $i$.

In [19]:
import numpy as np

np.sqrt(momentum.var().mean())

0.9702282548279761
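Note that the formula averages the variances before taking the square root, which is generally not the same as averaging the standard deviations. The sketch below uses a made-up `DataFrame` whose columns have variances 1 and 4 to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column variances are 1 and 4
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

root_mean_var = np.sqrt(df.var().mean())  # sqrt((1 + 4) / 2)
mean_std = df.std().mean()                # (1 + 2) / 2

print(root_mean_var, mean_std)
```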