# STATS INTRO NOTES

## DIFFERENT TESTING:
- Chi-square test = χ² test
- t-test
- correlation test

## VOCABULARY:

### Measures of Central Tendency
- mean: average value
- median: middle value of the sorted data
- mode: most frequently occurring value
    - bimodal: two values tie as mode
- expected value: like the mean, but each possible outcome is weighted by its probability
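
As a quick illustration of the measures above, here is a minimal sketch using pandas and NumPy. The sample values are made up for this example, and the expected value is computed for a fair six-sided die.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration
values = pd.Series([2, 3, 3, 5, 7, 7, 8])

mean = values.mean()      # arithmetic average
median = values.median()  # middle value of the sorted data
modes = values.mode()     # every value tied for most frequent (3 and 7, so bimodal)

# Expected value of a fair six-sided die:
# each outcome is weighted by its probability (1/6 each)
outcomes = np.array([1, 2, 3, 4, 5, 6])
probabilities = np.full(6, 1 / 6)
expected_value = (outcomes * probabilities).sum()

print(mean, median, modes.tolist(), expected_value)
```

Note that `mode()` returns every tied value, which is how a bimodal sample shows up in practice.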

### Measures of Spread
- min: lowest value
- max: highest value
- range: difference between max and min
- percentile: cut points that divide the sorted data into 100 equal parts

- quantile: a set of cut points that divides the values into equal-sized groups

- quartile: cut points that slice a data set into four equal pieces
    - 25% of observations between min and Q1
    - 50% of observations between min and Q2
    - 75% of observations between min and Q3
    
- IQR: Q3-Q1
    
- Variance: average squared deviation from the mean

- Standard Deviation: square root of the variance

- Skew: how asymmetric the distribution is
    - symmetric: hump in the middle and tails even on each side
    - left-skewed: mean is typically less than the median, left tail is longer
    - right-skewed: mean is typically greater than the median, right tail is longer
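
The measures of spread above can be sketched with pandas as follows. The data are hypothetical, with one large value included on purpose so the sample comes out right-skewed. Keep in mind that pandas computes the sample variance (dividing by n - 1), while `np.var` divides by n unless told otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the value 20 stretches the right tail
data = pd.Series([1, 2, 2, 3, 4, 5, 6, 8, 9, 20])

value_range = data.max() - data.min()            # max minus min
q1, q2, q3 = data.quantile([0.25, 0.50, 0.75])   # the three quartile cut points
iqr = q3 - q1                                    # interquartile range

variance = data.var()    # sample variance (divides by n - 1)
std_dev = data.std()     # square root of the sample variance
skewness = data.skew()   # positive for a right-skewed sample

print(value_range, q1, q2, q3, iqr)
print(np.isclose(std_dev, np.sqrt(variance)))    # standard deviation is sqrt(variance)
print(data.mean() > data.median(), skewness > 0) # both hold for this right-skewed sample
```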
    

# SIMULATIONS:
- Madeline file copy

In [8]:
%matplotlib inline
import numpy as np
import pandas as pd

np.random.seed(1349)

### How will we utilize Python to obtain probabilities?

We will utilize Monte Carlo simulations.

A Monte Carlo simulation recreates a random event many times and uses the results of the simulated trials to obtain a reasonably precise empirical estimate of a desired probability.

What does this mean for us here?

In [7]:
# Let's take a hypothetical base probability. 
# What is the probability of rolling a one (1) on a single, standard, fair six-sided die?

In [9]:
# Potential outcomes of a die roll:
possible_outcomes = [1,2,3,4,5,6]

In [None]:
# Outcomes that count as a success: only one of the six (rolling a 1)

In [10]:
ideal_roll = 1

In [None]:
#theoretical probability: 1/6

In [11]:
1/6

0.16666666666666666

In [None]:
# Now how would we do this with a simulation?

In [None]:
# We will run a large number of trials and calculate the proportion of them in which the event occurs.

In [None]:
# Let's examine the same problem: Probability of rolling a 1 on a fair six-sided die.

In [12]:
# First, we will set a value for the number of trials that we want to conduct.
# We have the power of computation at our fingertips, so let's shoot for something like one hundred thousand.

num_trials = 10 ** 5

In [13]:
# Each trial (a single simulation) consists of one event: a single die roll.
n_dice = 1

In [None]:
# We will run a single simulation one hundred thousand times, with each simulation being a die roll.

In [15]:
rolls = np.random.choice(possible_outcomes, num_trials * n_dice).reshape(num_trials, n_dice)

In [16]:
type(rolls)

numpy.ndarray

In [17]:
rolls

array([[3],
       [2],
       [5],
       ...,
       [4],
       [5],
       [5]])

In [18]:
rolls.shape

(100000, 1)

In [19]:
rolls == 1
# ^ True where the roll is 1, False otherwise

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

In [20]:
(rolls == 1).mean()
# ^ this gives the proportion of rolls that are 1 (the mean of booleans counts True as 1 and False as 0)

0.16894
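
To see why a large number of trials matters, the sketch below repeats the same estimate with increasing trial counts and compares each to the theoretical 1/6. It uses a local generator from `np.random.default_rng` instead of `np.random.seed`, an assumption made here so the example does not disturb the global random state used elsewhere in these notes. The names `estimates` and `sample_rolls` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1349)  # local generator; leaves np.random's global state alone

theoretical = 1 / 6
estimates = {}

for n in [100, 10_000, 1_000_000]:
    sample_rolls = rng.choice([1, 2, 3, 4, 5, 6], size=n)  # n simulated die rolls
    estimates[n] = (sample_rolls == 1).mean()               # proportion of rolls equal to 1
    error = abs(estimates[n] - theoretical)
    print(f"{n:>9,} trials: estimate = {estimates[n]:.4f}, error = {error:.4f}")
```

The estimate generally lands closer to 1/6 as the number of trials grows, although any single run can be lucky or unlucky.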

## Generating Random Numbers with Numpy

The `numpy.random` module provides a number of functions for generating random numbers.

- `np.random.choice`: selects random options from a list
- `np.random.uniform`: generates numbers between a given lower and upper bound
- `np.random.random`: generates numbers between 0 and 1
- `np.random.randn`: generates numbers from the standard normal distribution
- `np.random.normal`: generates numbers from a normal distribution with a specified mean and standard deviation
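
A short sketch of each function listed above. The option lists, bounds, and the mean and standard deviation are arbitrary values picked for illustration, and the output differs on every run unless a seed is set first.

```python
import numpy as np

# Pick 5 random options from a list
coin_flips = np.random.choice(["heads", "tails"], size=5)

# 5 floats drawn uniformly from [10, 20)
uniform_vals = np.random.uniform(10, 20, size=5)

# 5 floats drawn uniformly from [0, 1)
unit_vals = np.random.random(size=5)

# 5 draws from the standard normal distribution (mean 0, standard deviation 1)
std_normal = np.random.randn(5)

# 5 draws from a normal distribution with mean 170 and standard deviation 10
heights = np.random.normal(loc=170, scale=10, size=5)

print(coin_flips, uniform_vals, unit_vals, std_normal, heights, sep="\n")
```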

## Example Problems

### Carnival Dice Rolls

> You are at a carnival and come across a person in a booth offering you a game
> of "chance" (as people in booths at carnivals tend to do).

> You pay 5 dollars and roll 3 dice. If the sum of the dice rolls is greater
> than 12, you get 15 dollars. If it's less than or equal to 12, you get
> nothing.

> Assuming the dice are fair, should you play this game? How would this change
> if the winning condition was a sum greater than *or equal to* 12?

In [24]:
n_trials = nrows= 10_000 #number of times we're going to roll
n_dice = ncols= 3 #number of dice

rolls = np.random.choice([1,2,3,4,5,6], n_trials * n_dice)
rolls

array([3, 1, 3, ..., 5, 6, 5])

In [26]:
rolls = np.random.choice([1,2,3,4,5,6], n_trials * n_dice).reshape(nrows,ncols)
rolls #reshape into 10,000 rows (trials) and 3 columns (dice)


array([[4, 3, 4],
       [4, 6, 5],
       [2, 3, 4],
       ...,
       [5, 5, 3],
       [1, 5, 2],
       [5, 2, 1]])

In [27]:
# AGAIN, we want an outcome that is OVER 12
sums_by_trial = rolls.sum()
sums_by_trial
# without an axis argument, this adds every value into one grand total, not a sum per trial

104735

In [32]:
# to do it CORRECTLY, sum across each row (axis=1) to get one total per trial
sums_by_trial = rolls.sum(axis=1)
sums_by_trial

array([11, 15,  9, ..., 13,  8,  8])

In [29]:
# We can now convert each value in our array to a boolean value indicating whether or not we won:

wins = sums_by_trial >12
wins
#this will give us whether the 3 rolls added up to over 12 in True and False

In [37]:
win_rate = wins.mean()
win_rate

0.2525
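
Because three dice have only 6 × 6 × 6 = 216 equally likely outcomes, the simulated win rate can be checked against an exact count. This is a sanity check added alongside the simulation, not a replacement for it, and the variable names are illustrative.

```python
from itertools import product

# Every possible result of rolling three fair six-sided dice
all_rolls = list(product(range(1, 7), repeat=3))

# Exact win rate when the sum must be greater than 12
exact_win_rate = sum(sum(r) > 12 for r in all_rolls) / len(all_rolls)

# Exact win rate when the sum must be greater than or equal to 12
exact_win_rate_inclusive = sum(sum(r) >= 12 for r in all_rolls) / len(all_rolls)

print(len(all_rolls))            # 216
print(exact_win_rate)            # 56 of the 216 outcomes win, about 0.2593
print(exact_win_rate_inclusive)  # 81 of the 216 outcomes win, exactly 0.375
```

A simulated win rate within a percentage point or so of 0.2593 is what we would expect from 10,000 trials.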

### with win rate, we can calculate profit:
- $15 if your 3 rolls add up to over 12

- $5 to play the game


In [40]:
expected_winnings = win_rate * 15
cost = 5
expected_profit = expected_winnings - cost
expected_profit
#losing 1.2125 on average

-1.2125

In [41]:
expected_winnings

3.7875

In [42]:
# change the standard to 12 or greater, instead of just greater than 12

In [45]:
wins= sums_by_trial >=12
win_rate = wins.mean()
expected_winnings= win_rate * 15 #prize
cost= 5
expected_profit = expected_winnings -cost
expected_profit
# just by changing to 12 or greater, the win rate rises enough to make the expected profit positive

0.4764999999999997

## Expected profit = -1.2125 (greater than 12) VS 0.4765 (12 or greater)
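
The two scenarios can be wrapped in one reusable function. This is a sketch: `simulate_expected_profit` and its parameters are hypothetical names introduced here, and it draws rolls with `rng.integers` from a local generator, not the `np.random.choice` calls used in the cells above. A sum "greater than 12" is expressed as a minimum winning sum of 13.

```python
import numpy as np

def simulate_expected_profit(min_winning_sum, n_trials=10_000, n_dice=3,
                             prize=15, cost=5, seed=None):
    """Estimate the average profit per game by simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One row per trial, one column per die; the upper bound 7 is exclusive
    rolls = rng.integers(1, 7, size=(n_trials, n_dice))
    win_rate = (rolls.sum(axis=1) >= min_winning_sum).mean()
    return win_rate * prize - cost

profit_over_12 = simulate_expected_profit(min_winning_sum=13, seed=1349)
profit_12_or_more = simulate_expected_profit(min_winning_sum=12, seed=1349)

print(f"sum > 12:  {profit_over_12:.4f}")
print(f"sum >= 12: {profit_12_or_more:.4f}")
```

The exact figures will not match the cells above because the random draws differ, but the sign of each result should: a loss on average when the sum must exceed 12, and a small gain when 12 also wins.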