This repository is a reorganization of the work I did during my MSc in Mathematical Engineering at Politecnico di Milano. Its purpose is to provide an easy-to-use collection of statistical methods and techniques. It does not attempt to address all the problems that these methods and techniques may encounter in a real-world data context: all the datasets used are toy datasets, included only to introduce an application.
All the code is written in R. A Python counterpart is planned for the future.
The collection is divided into two main sections:
- Standard Statistics - parametric statistics, classical approaches that often rely on strong assumptions about the data
- Nonparametric Statistics - modern approaches, free from heavy assumptions about the data
I uploaded the files in .r (script version, easy to download and use directly with a custom dataset), .rmd, and .html (for visualization purposes) formats. Since GitHub does not provide a preview for .rmd (nor .html) files, I used an extension that allows the .html files to be viewed.
The files are viewable (code chunks, outputs and plots) at the links below.
Standard Statistics:
- 01 - PCA
- 02 - Multivariate Gaussian, One Population - Test and CR for the Mean
- 03 - Paired Gaussian Data - Test for the Mean
- 04 - Repeated Measures
- 05 - Multivariate Gaussian, Two Populations - Test for the Mean
- 06 - One-way ANOVA (p=1, g=6)
- 07 - One-way MANOVA (p=4, g=3)
- 08 - Two-way ANOVA (p=1, g=2, b=2)
- 09 - Two-way MANOVA (p=3, g=2, b=2)
- 10 - Supervised learning - LDA (univariate, bivariate), QDA (bivariate), KNN, Fisher's argument
- 11 - Unsupervised learning - Hierarchical, K-means clustering
- 12 - Linear Models - Model, CI, PI, PCA Regression, Ridge Regression, Lasso Regression
- 13 - Linear Models - Variables Selection
- 14 - Functional Data Analysis (FDA)
- 15 - Functional Data Analysis (FDA) - Example
- 16 - Geostatistics
Nonparametric Statistics:
- 01 - Depth Measures
- 02 - Sign Test
- 03 - Rank Test (Mann-Whitney U Test)
- 04 - Signed Rank Test (Wilcoxon Signed Rank W Test)
- 05 - Permutation Test - Two Independent Samples (1-dim)
- 06 - Permutation Test - Two Independent Samples (n-dim)
- 07 - Permutation Test - Center of Symmetry
- 08 - Permutation Test - Regression
- 09 - Permutation Test - ANOVA, MANOVA
- 10 - Permutation Test - Confidence Intervals
- 11 - Bootstrap - Confidence Intervals
- 12 - Bootstrap - Regression
- 13 - Bootstrap - Two Independent Samples (1-dim)
- 14 - Bootstrap - One Sample (n-dim)
- 15 - Bootstrap - Test and p-values
- 16 - Nonparametric Regression
- 17 - Splines
- 18 - GAMs
- 19 - Full Conformal Prediction
- 20 - Split Conformal Prediction
- 21 - Conformal Prediction Intervals
- 22 - Survival Analysis
If any code is not working, or if any dataset is missing, let me know.
As a Python fan, I plan to "translate" the work into Python (where I know that, with the right libraries, it will take far fewer lines of code). In addition, it would be interesting to push beyond toy datasets and try some real-world applications.