---
title: Basic Probability and Statistics
sidebar_label: Probability and Statistics
---
The expectation value of a function $g$ of a continuous random variable $X$ with density $p(x)$ is defined as:
$$ \mathrm{E}_{X}[g(x)] = \int_{\mathcal{X}} g(x) p(x) \, dx $$
In case of a discrete random variable, the integral becomes a sum:
$$ \mathrm{E}_{X}[g(x)] = \sum \limits_{x \in \mathcal{X}} g(x) p(x) $$
For a multivariate random variable, the expectation value is defined element-wise:
$$ \mathrm{E}_{X}[g(x)] = \begin{bmatrix} \mathrm{E}_{X_1}[g(x_1)] \\ \mathrm{E}_{X_2}[g(x_2)] \\ \vdots \\ \mathrm{E}_{X_D}[g(x_D)] \end{bmatrix} \in \mathbb{R}^D $$
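As a quick numerical check, the discrete expectation formula can be evaluated directly. The fair six-sided die and $g(x) = x^2$ below are an assumed example, not taken from the text:

```python
# Discrete expectation E[g(X)] for a fair six-sided die with g(x) = x**2.
outcomes = range(1, 7)
p = 1 / 6                                  # uniform pmf: p(x) = 1/6 for each face
e_g = sum(x**2 * p for x in outcomes)      # sum over x in the sample space
print(e_g)                                 # 91/6 ≈ 15.1667
```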
Also see Bessel's correction.
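Bessel's correction divides the sum of squared deviations by $n - 1$ instead of $n$, giving an unbiased estimator of the population variance. A minimal sketch with NumPy (the sample data is an assumed example):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # assumed example sample

var_biased = data.var(ddof=0)   # divide by n (biased for a sample)
var_bessel = data.var(ddof=1)   # divide by n - 1 (Bessel's correction)
print(var_biased, var_bessel)   # 4.0 vs 32/7 ≈ 4.571
```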
- Uncorrelated: there is no linear relationship between the two variables.
- Positive correlation: as one variable increases, the other tends to increase.
- Negative correlation: as one variable increases, the other tends to decrease.
- Measure of correlation: covariance, $\mathrm{Cov}(X, Y) = \mathrm{E}[(X - \mu_X)(Y - \mu_Y)]$.
- Pearson's correlation coefficient: $\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.
- Range for Pearson's coefficient: $[-1, 1]$.
Note: If two variables are independent, their covariance is 0. However, the reverse need not be true: two variables can be dependent on each other and still have zero covariance.
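The classic counterexample is $Y = X^2$ with $X$ symmetric around zero: $Y$ is completely determined by $X$, yet their covariance is zero. A sketch with NumPy:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # symmetric around 0
y = x**2                                    # fully dependent on x

# Off-diagonal entry of the covariance matrix is Cov(X, Y)
cov_xy = np.cov(x, y, ddof=0)[0, 1]
print(cov_xy)  # 0.0 despite the deterministic dependence
```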
Confidence Interval = Best Estimate ± Margin of Error

For a 95% confidence interval for a proportion, the formula becomes: $$ \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$

Where $\hat{p}$ is the sample proportion, $n$ is the sample size, and 1.96 is the z-multiplier for 95% confidence. Below is the table of z-multipliers for common confidence levels:
| Confidence interval (%) | z-multiplier |
| --- | --- |
| 90 | 1.645 |
| 95 | 1.96 |
| 98 | 2.326 |
| 99 | 2.576 |
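These z-multipliers are quantiles of the standard normal distribution, so they can be computed with the standard library rather than looked up:

```python
from statistics import NormalDist

def z_multiplier(confidence):
    # Two-sided interval: put (1 - confidence)/2 of the probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(round(z_multiplier(0.90), 3))  # 1.645
print(round(z_multiplier(0.95), 2))  # 1.96
print(round(z_multiplier(0.99), 3))  # 2.576
```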
When finding the difference between two proportion confidence intervals: $$ \hat{p}_1 - \hat{p}_2 \pm 1.96 \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} $$
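The variances of the two independent proportions add under the square root. A minimal sketch (the function name and the example proportions are assumptions for illustration):

```python
import math

def diff_proportion_ci(p1, n1, p2, n2, z=1.96):
    # Standard error of the difference of two independent sample proportions:
    # the variances add, hence the "+" under the square root.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = diff_proportion_ci(0.5, 100, 0.4, 100)
print(lo, hi)  # interval around 0.10; it contains 0, so no significant difference
```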
There are also other approaches to define the standard error: $$ \hat{p} \pm z^* \frac{1}{2 \sqrt{n}} \approx \hat{p} \pm \frac{1}{\sqrt{n}} $$
In case of quantitative data, the standard error is given by: $$ \frac{\sigma}{\sqrt{n}} $$
So the confidence interval is given by: $$ \mu \pm \frac{\sigma}{\sqrt{n}} $$
For a specific confidence level: $$ \mu \pm t^* \frac{\sigma}{\sqrt{n}} $$
Where the $t^*$ multiplier comes from a t-distribution with $(n - 1)$ degrees of
freedom. For a 95% confidence interval with sample size $n = 25$, $t^* = 2.064$,
and with a sample size of 1000, $t^* = 1.962$. For large sample sizes, the $t^*$
value will be closer to the corresponding z-multiplier (1.96 for 95%).
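The $t^*$ values above can be reproduced as quantiles of the t-distribution, assuming SciPy is available:

```python
from scipy.stats import t

def t_multiplier(confidence, n):
    # Two-sided t-multiplier with n - 1 degrees of freedom.
    return t.ppf(1 - (1 - confidence) / 2, df=n - 1)

print(round(t_multiplier(0.95, 25), 3))    # 2.064
print(round(t_multiplier(0.95, 1000), 3))  # 1.962, close to the z-multiplier 1.96
```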
Differences in the population means with confidence interval for two independent groups: $$ \mu_1 - \mu_2 \pm t^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
We can express one probability event in terms of other events, especially through unions, intersections and complements. Oftentimes, one expression is easier to calculate than another. In this regard, De Morgan's laws are very helpful: $$ (A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c $$
Note that analogous results hold for unions and intersection of more than two events.
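De Morgan's laws can be checked directly with Python sets, taking set difference from a finite universe as the complement (the universe and events below are an assumed example):

```python
U = set(range(10))   # assumed finite universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Complement of the union equals the intersection of the complements.
assert U - (A | B) == (U - A) & (U - B)
# Complement of the intersection equals the union of the complements.
assert U - (A & B) == (U - A) | (U - B)
```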
For any nonnegative integers $n$ and $k$ with $k \le n$, the binomial coefficient is: $$ \binom{n}{k} = \frac{n!}{k!(n - k)!} $$
For $k > n$, $\binom{n}{k} = 0$.
Binomial Theorem: $$ (x + y)^n = \sum \limits_{k=0}^{n} \binom{n}{k} x^k y^{n-k} $$
Vandermonde's identity: $$ \binom{m + n}{k} = \sum \limits_{j=0}^{k} \binom{m}{j} \binom{n}{k - j} $$
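Vandermonde's identity is easy to verify numerically with `math.comb` (the values of $m$, $n$, $k$ below are arbitrary):

```python
import math

m, n, k = 5, 7, 4
lhs = math.comb(m + n, k)                                     # C(12, 4) = 495
rhs = sum(math.comb(m, j) * math.comb(n, k - j) for j in range(k + 1))
print(lhs, rhs)  # both 495
```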
Definition of conditional probability: $$ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0 $$
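The definition can be applied by counting outcomes in a finite sample space. A sketch for two fair dice (the specific events are an assumed example):

```python
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of rolling two dice.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if sum(o) == 8}   # event: the sum is 8
B = {o for o in outcomes if o[0] == 3}     # event: the first die shows 3

p_b = Fraction(len(B), 36)
p_a_and_b = Fraction(len(A & B), 36)
p_a_given_b = p_a_and_b / p_b              # P(A|B) = P(A ∩ B) / P(B)
print(p_a_given_b)                         # 1/6: only (3, 5) sums to 8
```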
- Probability and statistics: https://projects.iq.harvard.edu/stat110
- Standard deviation Wikipedia: https://en.wikipedia.org/wiki/Standard_deviation
- Seeing Theory
- Bayes' rule Wikipedia: https://en.m.wikipedia.org/wiki/Bayes%27_theorem