## Understanding Basic Statistics

# Notes

---

# Chapter 2: Organizing Data

## Section 2.1 _Frequency Distributions, Histograms, and Related Topics_

### np.histogram
https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
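
As a minimal sketch of building a frequency distribution with `np.histogram`, the snippet below counts values in four equal-width classes. The data and the choice of four classes over the range 0 to 10 are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 2, 3, 5, 6, 6, 6, 8, 9])

# Four equal-width classes spanning the range [0, 10].
counts, edges = np.histogram(data, bins=4, range=(0, 10))

# Print each class with its frequency.
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower:4.1f} to {upper:4.1f}: {count}")
```

Each class includes its lower boundary and excludes its upper boundary, except the last class, which includes both.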

## Section 2.2 _Bar Graphs, Circle Graphs, and Time-Series Graphs_

## Section 2.3 _Stem-and-Leaf Displays_

A __stem-and-leaf display__ is a method of exploratory data analysis that is used to rank-order and arrange data into groups.
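
A rough sketch of a stem-and-leaf display, assuming non-negative whole numbers where the tens part is the stem and the units digit is the leaf. The helper name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data, for illustration only.
data = [24, 12, 37, 15, 41, 24, 30, 21]

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) with sorted leaves (units)."""
    groups = defaultdict(list)
    for v in sorted(values):          # rank-order the data first
        stem, leaf = divmod(v, 10)    # split each value into stem and leaf
        groups[stem].append(leaf)
    return dict(groups)

display = stem_and_leaf(data)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```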


---

# Chapter 3: Averages and Variation

## Section 3.1 _Measures of Central Tendency: Mode, Median, and Mean_

---

### Mode
The __mode__ of a data set is the value that occurs most frequently.  _Note:_ If a data set has no single value that occurs more frequently than any other, then that data set has no mode.

`statistics.mode(data)` <br>
`vals, cnts = np.unique(data, return_counts=True)` <br>
`scipy.stats.mode(data)` <br>
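
The `np.unique` call above only returns the counts; one possible way to turn them into a mode, while honoring the note that a tie means no mode, is sketched below. The helper name `single_mode` is hypothetical, and the sketch assumes non-empty data.

```python
import numpy as np

def single_mode(data):
    """Return the unique most frequent value, or None when there is a tie.

    Assumes `data` is non-empty.
    """
    vals, cnts = np.unique(data, return_counts=True)
    top = vals[cnts == cnts.max()]   # every value that reaches the highest count
    return top[0].item() if len(top) == 1 else None

# Hypothetical data, for illustration only.
print(single_mode([3, 7, 7, 2, 9, 7, 3]))  # 7 occurs three times
print(single_mode([1, 1, 2, 2]))           # two values tie, so no single mode
```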

---

### Median
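The __median__ is the central value of an ordered distribution.

For an ordered data set of size $n$, the middle value is at position $\frac{n + 1}{2}$. When $n$ is even, this position falls halfway between two values, and the median is the average of those two values.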

`np.median(data)`
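
A small sketch of the position rule, compared against `np.median`. The helper name `median_by_position` and the data are hypothetical.

```python
import numpy as np

def median_by_position(data):
    """Median from the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (index mid, 0-based).
        return ordered[mid]
    # Even n: the position falls between two values, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 1, 5, 3, 9]        # hypothetical
even_data = [7, 1, 5, 3, 9, 11]   # hypothetical

print(median_by_position(odd_data), np.median(odd_data))
print(median_by_position(even_data), np.median(even_data))
```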

---

### Mean
Sample mean, $\bar{x} = \frac{\Sigma x}{n}$

Population mean, $\mu = \frac{\Sigma x}{N}$

where $n$ is the number of data values in the sample, and $N$ is the number of data values in the population.

`np.mean(data)`
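
A quick check that the formula and `np.mean` agree, using hypothetical data. The arithmetic is the same for a sample and a population; only the symbols differ.

```python
import numpy as np

data = [2, 4, 4, 10]   # hypothetical values

mean_by_formula = sum(data) / len(data)   # sum of x divided by n
mean_numpy = np.mean(data)

print(mean_by_formula, mean_numpy)
```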

---

### Trimmed mean

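A trimmed mean drops an equal proportion of the smallest and largest values from the ordered data, then averages what remains. The call below computes a 5% trimmed mean.

`scipy.stats.trim_mean(data, 0.05)`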
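
A hand-rolled sketch of trimming with NumPy only. The helper name `trimmed_mean` is hypothetical, and truncating the trim count to a whole number is an assumption meant to mirror how `scipy.stats.trim_mean` is documented to behave.

```python
import numpy as np

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the values from EACH end."""
    ordered = np.sort(np.asarray(data, dtype=float))
    k = int(proportion * len(ordered))   # count to drop per end (assumed: truncated)
    return ordered[k:len(ordered) - k].mean()

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(np.mean(data))              # plain mean, pulled up by the extreme value
print(trimmed_mean(data, 0.10))   # drops one value from each end
print(trimmed_mean(data, 0.05))   # 5% of 10 values truncates to 0, so nothing is dropped
```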

---

### Weighted Average

Weighted Average, $\frac{\Sigma x w}{\Sigma w}$

where $x$ is a data value and $w$ is the weight assigned to that data value. The sum is taken over all data values.

`np.average(data, weights=weights)`
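
A short check of the weighted-average formula against `np.average`, using hypothetical scores and weights.

```python
import numpy as np

scores = np.array([80, 90, 70])   # hypothetical data values x
weights = np.array([2, 3, 5])     # hypothetical weights w

# Sum of x * w divided by the sum of w.
weighted_by_formula = (scores * weights).sum() / weights.sum()
weighted_numpy = np.average(scores, weights=weights)

print(weighted_by_formula, weighted_numpy)
```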



## Section 3.2 _Measures of Variation_

---

### Sum of Squares

$$ \Sigma(x - \bar x)^2 = \Sigma x^2 - \frac{(\Sigma x)^2}{n}$$

The defining formula (left side) and the computation formula (right side) give the same result.
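
A numeric check of the two forms of the sum of squares, on hypothetical data.

```python
import numpy as np

x = np.array([2, 4, 4, 10], dtype=float)   # hypothetical data
n = len(x)

# Defining formula: sum of squared deviations from the mean.
ss_defining = np.sum((x - x.mean()) ** 2)

# Computation formula: sum of x^2 minus (sum of x)^2 / n.
ss_computation = np.sum(x ** 2) - np.sum(x) ** 2 / n

print(ss_defining, ss_computation)
```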

---

### Sample Variance

$$s^2 = \frac{\Sigma (x - \bar x)^2}{n-1}$$

`np.var(data, ddof=1)`

---

### Sample Standard Deviation

$$s = \sqrt{\frac{\Sigma (x - \bar x)^2}{n-1}}$$

`np.std(data, ddof=1)`

---

### Population Variance

$$ \sigma^2 = \frac{\Sigma (x - \mu)^2}{N}$$

`np.var(data, ddof=0)`

---

### Population Standard Deviation

$$ \sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}} $$

`np.std(data, ddof=0)`
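
The sketch below computes the sample and population versions side by side from the same sum of squares, to show what `ddof` changes. The data are hypothetical.

```python
import numpy as np

x = np.array([2, 4, 4, 10], dtype=float)   # hypothetical data
n = len(x)
ss = np.sum((x - x.mean()) ** 2)           # sum of squares

sample_var = ss / (n - 1)   # divide by n - 1, same as ddof=1
pop_var = ss / n            # divide by N, same as ddof=0

sample_std = np.sqrt(sample_var)
pop_std = np.sqrt(pop_var)

print(sample_var, np.var(x, ddof=1))
print(pop_var, np.var(x, ddof=0))
print(sample_std, np.std(x, ddof=1))
print(pop_std, np.std(x, ddof=0))
```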

---

### Coefficient of variation (CV)

For a sample:

$$ CV = \frac{s}{\bar x} \cdot 100 \% $$

For a population:

$$ CV = \frac{\sigma}{\mu} \cdot 100 \% $$
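
NumPy has no dedicated CV function that these notes rely on, so one possible sketch builds it from `std` and `mean`. The helper name `coefficient_of_variation` and the data are hypothetical.

```python
import numpy as np

def coefficient_of_variation(data, ddof=1):
    """CV as a percentage; ddof=1 for a sample, ddof=0 for a population."""
    x = np.asarray(data, dtype=float)
    return x.std(ddof=ddof) / x.mean() * 100

data = [2, 4, 4, 10]   # hypothetical data

cv_sample = coefficient_of_variation(data, ddof=1)
cv_population = coefficient_of_variation(data, ddof=0)

print(f"sample CV: {cv_sample:.2f}%")
print(f"population CV: {cv_population:.2f}%")
```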

---

### Chebyshev's Theorem

For _any_ set of data and any $k > 1$, the proportion of the data that must lie within $k$ standard deviations on either side of the mean is _at least_

$$1 - \frac{1}{k^2}$$
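
A sketch comparing the Chebyshev lower bound with the proportion actually observed in a small data set. The helper names `chebyshev_bound` and `proportion_within` are hypothetical, the data are made up, and the population standard deviation (`ddof=0`) is an assumption of this example.

```python
import numpy as np

def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    x = np.asarray(data, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=0)   # assumed: population standard deviation
    return np.mean(np.abs(x - mu) <= k * sigma)

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

for k in (1.5, 2, 3):
    print(k, chebyshev_bound(k), proportion_within(data, k))
```

The observed proportion is never below the bound, but it is often well above it, since the theorem makes no assumption about the shape of the distribution.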

---



## Section 3.3 _Percentiles and Box-and-Whisker Plots_

---

### Percentile

For a whole number $P$ (where $1 \le P \le 99$), the $P$th __percentile__ of a distribution is a value such that $P\%$ of the data fall at or below it and $(100 - P)\%$ of the data fall at or above it.

`np.percentile(data, q)` where __q : array_like of float__ in the range \[0, 100\].

---

### Quartile

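The quartiles split the ordered data into four parts: $Q_1$ is the 25th percentile, $Q_2$ (the median) is the 50th percentile, and $Q_3$ is the 75th percentile.

`np.quantile(data, q)` where __q : array_like of float__ in the range \[0, 1\].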
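
A brief sketch showing that `np.quantile` with fractions and `np.percentile` with percentages give the same quartiles. The data are hypothetical, and NumPy's default linear interpolation may give values that differ slightly from a hand method on other data sets.

```python
import numpy as np

data = np.arange(1, 10)   # hypothetical data: 1, 2, ..., 9

# Quartiles as quantiles (fractions in [0, 1]).
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# The same quartiles as percentiles (percentages in [0, 100]).
p25, p50, p75 = np.percentile(data, [25, 50, 75])

# Share of the data at or below the first quartile.
share_at_or_below_q1 = np.mean(data <= q1)

print(q1, q2, q3)
print(p25, p50, p75)
print(share_at_or_below_q1)
```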

---

### Interquartile range (IQR)

The difference between the third and first quartiles: $IQR = Q_3 - Q_1$.
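
One way to compute the IQR with NumPy, sketched on hypothetical data and using NumPy's default interpolation for the quartiles.

```python
import numpy as np

data = [2, 4, 4, 5, 7, 8, 10, 12, 15]   # hypothetical data

q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
iqr = q3 - q1

print(q1, q3, iqr)
```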

---

### Five-number summary

Lowest value, $Q_1$, median, $Q_3$, highest value
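
A possible sketch of the five-number summary with a single `np.percentile` call, where the 0th and 100th percentiles give the lowest and highest values. The data are hypothetical, and the quartiles use NumPy's default interpolation, which may differ slightly from a hand calculation.

```python
import numpy as np

data = [2, 4, 4, 5, 7, 8, 10, 12, 15]   # hypothetical data

# Lowest value, Q1, median, Q3, highest value.
lowest, q1, median, q3, highest = np.percentile(data, [0, 25, 50, 75, 100])

print(lowest, q1, median, q3, highest)
```

These five values are the ones a box-and-whisker plot draws.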

---


---

# Chapter 4: Correlation and Regression
