# STAT 440 Statistical Data Management - Fall 2021
## Week 13 Notes
### Created by Christopher Kinson and Huiqin Xin


***


## Table of Contents

- [Introduction to R Chapter 10](#introduction-to-r-chapter10)
  - [Custom Functions AKA Tools](#functions)
  


***

## <a name="introduction-to-r-chapter10"></a>Introduction to R Chapter 10

Below, I introduce the idea of creating your own functions and data management tools. The bulk of this material is covered in STAT 385. One reference textbook for that course is [An Introduction to R by Venables, Smith and the R Core Team](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf). Another useful reference textbook for that course is [Hands-On Programming with R by G. Grolemund](https://rstudio-education.github.io/hopr/). **For Python users, please check out [https://www.w3schools.com/python/python_functions.asp](https://www.w3schools.com/python/python_functions.asp).**

The crux of what you do as a programmer is create your own tools. These tools can greatly speed up data analysis, data wrangling, data visualization, modeling, etc. They speed up these processes because we can replace redundant, case-specific code with general, flexible code that works across many uses and cases.

### <a name="functions"></a>Custom Functions AKA Tools

Python allows users to create their own functions (i.e. user-defined functions), AKA tools, using the `def` keyword. A user-defined function is built from

- arguments (optional), which may be passed by position or by name; think of arguments as inputs

- expressions, which are the code that performs the actions; think of expressions as the body of the function

- indentation, which marks where the body begins and ends (Python uses a colon and an indented block where R wraps the expressions in braces)

- returned objects (optional), handed back as outputs with a `return` statement

```
def newfunction(namedarg):
  expressions
```

Using the function to see the results becomes:

```
newfunction(input)
```
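As a concrete, minimal sketch of these pieces, the hypothetical functions `standardize` and `announce` below (neither is part of the original notes) show a required argument, an optional argument with a default value, an indented body, and the difference between returning an object and returning nothing. Only the standard library's `statistics` module is assumed.

```python
import statistics


def standardize(values, center=True):
    """Scale values by their sample standard deviation, optionally centering first."""
    shift = statistics.mean(values) if center else 0
    scale = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return [(v - shift) / scale for v in values]


def announce(message):
    """A function with no return statement hands back None."""
    print(message)


centered = standardize([1, 2, 3])                          # argument passed by position
uncentered = standardize(values=[1, 2, 3], center=False)   # arguments passed by name
nothing = announce("this function only prints")
```

Here `centered` holds the values shifted to mean zero, `uncentered` keeps the original location, and `nothing` is `None` because `announce` never reaches a `return` statement.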

Your custom functions can be complicated or simple. It will depend on the nature of what you want to build and your programming skill set. Here are some examples.

1. A $t$-statistic for the two-sample case (found in the Introduction to R textbook)

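```python
import math

import numpy as np


def twosam(y1, y2):
    """Two-sample t-statistic using a pooled variance estimate."""
    n1, n2 = len(y1), len(y2)
    yb1, yb2 = np.mean(y1), np.mean(y2)
    # ddof=1 gives the sample variance (divisor n - 1), which the pooled
    # formula below expects; np.var defaults to ddof=0.
    s1, s2 = np.var(y1, ddof=1), np.var(y2, ddof=1)
    s = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    tst = (yb1 - yb2) / math.sqrt(s * (1 / n1 + 1 / n2))
    return tst


twosam(np.random.normal(size=20), np.random.normal(size=20))
```

The inputs are random draws, so the value changes on every run. For example:

```
1.2314817318210705
```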
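Because the call above uses random inputs, it is hard to check by hand. The sketch below works the same statistic on two small fixed samples, assuming the usual pooled-variance form

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

The function name `pooled_t` and the sample values are made up for illustration, and the standard library's `statistics` module stands in for NumPy.

```python
import math
import statistics


def pooled_t(y1, y2):
    """Two-sample t-statistic with pooled variance (illustrative sketch)."""
    n1, n2 = len(y1), len(y2)
    # statistics.variance divides by n - 1 (sample variance);
    # statistics.pvariance divides by n (population variance).
    s1, s2 = statistics.variance(y1), statistics.variance(y2)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    diff = statistics.mean(y1) - statistics.mean(y2)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2))


y1 = [1, 2, 3, 4, 5]      # mean 3, sample variance 2.5
y2 = [2, 4, 6, 8, 10]     # mean 6, sample variance 10
t_stat = pooled_t(y1, y2)  # -3 / sqrt(6.25 * 0.4) = -3 / sqrt(2.5)
```

By hand, the pooled variance is $(4 \cdot 2.5 + 4 \cdot 10)/8 = 6.25$, so $t = -3/\sqrt{2.5} \approx -1.897$. Using the population variance instead (2.0 and 8.0 for these samples) would give a different, incorrect pooled estimate.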

2. An $n$-dimensional correlation matrix with exchangeable correlation $\rho$

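```python
import numpy as np


def exc(n, rho):
    """n x n correlation matrix with exchangeable correlation rho."""
    # Fill every entry with rho (stored as floats), then set the diagonal to 1.
    mat = np.full((n, n), rho, dtype=float)
    np.fill_diagonal(mat, 1.0)
    return mat


exc(5, 0.5)
```

```
array([[1. , 0.5, 0.5, 0.5, 0.5],
       [0.5, 1. , 0.5, 0.5, 0.5],
       [0.5, 0.5, 1. , 0.5, 0.5],
       [0.5, 0.5, 0.5, 1. , 0.5],
       [0.5, 0.5, 0.5, 0.5, 1. ]])
```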
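Not every value of $\rho$ yields a valid correlation matrix. An exchangeable matrix has eigenvalues $1 - \rho$ (repeated $n - 1$ times) and $1 + (n - 1)\rho$, so it is positive semidefinite only when $-1/(n-1) \le \rho \le 1$. The sketch below adds that check to the same construction. The name `exchangeable_corr` is hypothetical, and the function returns a plain list of lists so that it runs without NumPy.

```python
def exchangeable_corr(n, rho):
    """Exchangeable correlation matrix as a list of lists, with input checks."""
    if n < 2:
        raise ValueError("n must be at least 2")
    lower = -1.0 / (n - 1)  # smallest rho that keeps the matrix positive semidefinite
    if not lower <= rho <= 1.0:
        raise ValueError(f"rho must lie between {lower} and 1 for n = {n}")
    # 1 on the diagonal, rho everywhere else.
    return [[1.0 if i == j else float(rho) for j in range(n)] for i in range(n)]


small = exchangeable_corr(3, 0.5)

try:
    exchangeable_corr(3, -0.9)  # below the lower bound of -0.5 for n = 3
    rejected = False
except ValueError:
    rejected = True
```

Raising an error on invalid input is one design choice among several; returning the matrix along with a warning would also be reasonable for exploratory work.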

3. In STAT 385, we discussed card shuffling and the concept of randomization. Using the Uno deck of cards created as a dataset, we can create a function that shuffles the deck. This function is general because we supply the deck, and we do not require the function to use only the cards from those notes.


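```python
import random

import pandas as pd

# Build the 108-card Uno deck: 100 number and action cards followed by 8 wild cards.
cards_faces = (list(range(10)) + list(range(1,10)) + ['Skip','Reverse','Draw+2']*2)*4 + ['Wild','WildDraw+4']*4
cards_colors = ["red","blue","green","yellow"]*25 + ['any']*8
# Number cards score their face value, action cards 20, and wild cards 50.
cards_points = (list(range(10)) + list(range(1,10)) + [20]*6)*4 + [50]*8
cards_tibble = pd.DataFrame(data={'face':cards_faces, 'color':cards_colors, 'point': cards_points})


def shuffle(deck):
    """Return the rows of deck in a random order; deck must be a data frame."""
    if not isinstance(deck, pd.DataFrame):
        return "Not a data frame. Supply a data frame as input."
    n = deck.shape[0]
    # Draw all n row positions without replacement, i.e. a random permutation.
    return deck.iloc[random.sample(range(n), n)]


shuffle(cards_tibble)
```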

```
           face   color  point
91            7  yellow      7
32            7     red      7
46       Draw+2   green     20
100        Wild     any     50
82            7   green      7
...         ...     ...    ...
98      Reverse   green     20
103  WildDraw+4     any     50
39            5  yellow      5
44         Skip     red     20
```
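Two variations on this idea are worth sketching: raising an error instead of returning a message when the input has the wrong type, and accepting a seed so that a shuffle can be reproduced. The example below is an assumption-laden illustration, not the function from the notes: `shuffle_deck`, its `seed` argument, and the tiny `mini_deck` are all hypothetical, and the deck is a plain list of tuples so the code runs without pandas.

```python
import random


def shuffle_deck(deck, seed=None):
    """Return a new list with the cards of deck in random order."""
    if not isinstance(deck, list):
        raise TypeError("Not a list. Supply the deck as a list of cards.")
    rng = random.Random(seed)  # a private generator; a fixed seed repeats the order
    # Sampling every card without replacement yields a random permutation
    # and leaves the original list untouched.
    return rng.sample(deck, len(deck))


# A small stand-in deck: three number cards in each of four colors.
mini_deck = [(color, number)
             for color in ["red", "blue", "green", "yellow"]
             for number in range(3)]

first = shuffle_deck(mini_deck, seed=440)
second = shuffle_deck(mini_deck, seed=440)   # same seed, same order as first

try:
    shuffle_deck("not a deck")
    raised = False
except TypeError:
    raised = True
```

An exception stops the program at the point of misuse, whereas a returned message can be mistaken for a valid result further along in an analysis.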


#### END OF NOTES