Commit
feat: statistical test flow chart (#106)
* experiment: flowchart

Add React flow library.
Create data.

* experiment: flowChart basis

* experiment: flowchart - complete basic hidden elements

* experiment: add description to node data

Change node style on click.

* experiment: track clicked nodes and current path

* experiment: fix path tracking

* experiment: add more data & node components

* experiment: add more data

* experiment: add bottom branch data

* experiment: complete all data

* fix: hide child nodes connected from other nodes

Add animated stroke on hover elements.

* experiment: add path tracking info

* experiment: add help text

Create new visualisation item

* docs: add flow chart visualisation

Modify `2021-07-22-Millions-of-UK-residents-struggle-to-access-food`
for better interpretation of the chart.

* docs: add flow chart to the learning path

Add links to optional chapters of chapter 2.

* fix: styles being removed in production mode

* fix: image not rendering correctly in flowChart

Replace gif with png.

* refactor: remove duplicated flow chart files

* ci: add script for new build without cache
yld-weng committed Aug 5, 2021
1 parent 8631e69 commit d2ea672
Showing 34 changed files with 3,187 additions and 92 deletions.
48 changes: 48 additions & 0 deletions .github/workflows/ci-without-cache.yml
@@ -0,0 +1,48 @@
name: CI-without-cache

on:
  workflow_dispatch:
    inputs:
      Name:
        required: true
        default: 'Re-run the workflow'
      Description:
        default: ''

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Cancel Previous Runs
        uses: styfle/cancel-workflow-action@0.9.0

      - uses: actions/checkout@v2
      # Retrieve cache
      - name: Gatsby Cache
        id: gatsby-ci-cache
        uses: actions/cache@v2
        with:
          path: |
            public
            .cache
          key: ${{ runner.os }}-gatsby-ci-${{ github.run_id }}
          # Not restore cache
          # restore-keys: ${{ runner.os }}-gatsby-ci-

      - name: Install dependencies
        run: npm ci

      # Not using incremental build
      - name: Build without Cache
        run: npm run build:noCache
        env:
          EVENT_API_KEY_1: ${{ secrets.EVENT_API_KEY_1 }}
          EVENT_API_KEY_2: ${{ secrets.EVENT_API_KEY_2 }}
          EVENT_ORG_ID_1: ${{ secrets.EVENT_ORG_ID_1 }}
          EVENT_ORG_ID_2: ${{ secrets.EVENT_ORG_ID_2 }}
          GA_TRACKING_ID: ${{ secrets.GA_TRACKING_ID }}
          GATSBY_GH_APP_GITALK_ID: ${{ secrets.GATSBY_GH_APP_GITALK_ID }}
          GATSBY_GH_APP_GITALK_SECRET: ${{ secrets.GATSBY_GH_APP_GITALK_SECRET }}
          GATSBY_ENV: ${{ secrets.GATSBY_ENV }}

@@ -7,6 +7,8 @@ description: Stratified, systematic, and cluster sampling methods.
date: 2021-03-18
---

Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.

## Random Sampling
Random sampling (or probability sampling) refers to nonsubjective sampling methods that apply some mechanism to ensure randomness.
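The definition above can be illustrated with simple random sampling without replacement, the most basic probability sampling mechanism. The sketch below is written in JavaScript to match the site's own code; the function name `simpleRandomSample` and its optional `rng` parameter are hypothetical illustrations, not part of this repository. It uses a partial Fisher-Yates shuffle so that every subset of size `n` is equally likely.

```javascript
// Simple random sampling without replacement (illustrative sketch).
// Every subset of size n is equally likely to be drawn.
// `rng` should return a float in [0, 1); it defaults to Math.random.
function simpleRandomSample(population, n, rng = Math.random) {
  if (!Number.isInteger(n) || n < 0 || n > population.length) {
    throw new RangeError("n must be an integer between 0 and population.length");
  }
  const pool = population.slice(); // copy so the caller's array is not mutated
  for (let i = 0; i < n; i++) {
    // Partial Fisher-Yates step: pick uniformly from the unselected tail [i, length).
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n); // the first n slots now hold the sample
}
```

For example, `simpleRandomSample([...Array(100).keys()], 10)` returns 10 distinct values between 0 and 99; passing a seeded generator as `rng` would make the draw reproducible.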

@@ -7,6 +7,7 @@ description: Subjective methods of extracting samples based on the researcher's
date: 2021-03-18
---

Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.

## Non-random Sampling
Non-random sampling (or non-probability sampling) refers to subjective sampling methods in which researchers draw samples according to their own convenience or subjective judgment. Because it does not strictly follow the principle of random sampling, the sampling error cannot be determined, and it cannot be established to what extent statistics computed from the sample are representative of the population.
@@ -7,7 +7,7 @@ description: Statistical Modeling Part 2 Optional - Sampling techniques in compu
date: 2021-03-18
---


Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.

## Sampling from distributions
If we have samples from a population then we can use these samples to estimate the distribution and parameters. Why do we need sampling from distributions if the distribution is already known or partially known? There are several applications:
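One common way to sample from a known distribution is inverse transform sampling: draw $U \sim \text{Uniform}(0, 1)$ and return $F^{-1}(U)$, where $F$ is the target CDF. The JavaScript sketch below applies it to the exponential distribution, for which $F^{-1}(u) = -\ln(1 - u)/\lambda$. The function name `sampleExponential` is a hypothetical illustration, and inverse transform sampling is our chosen example rather than necessarily the technique covered in this chapter.

```javascript
// Inverse transform sampling for Exp(lambda) (illustrative sketch).
// `rng` should return a float in [0, 1); it defaults to Math.random.
function sampleExponential(lambda, n, rng = Math.random) {
  const out = new Array(n);
  for (let i = 0; i < n; i++) {
    const u = rng(); // U ~ Uniform[0, 1)
    // Inverse CDF of the exponential distribution; 1 - u is in (0, 1], so the log is finite.
    out[i] = -Math.log(1 - u) / lambda;
  }
  return out;
}
```

With `lambda = 2`, the mean of a large sample should be close to $1/\lambda = 0.5$.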
@@ -22,18 +22,18 @@ learningPathDescription: "Interested in statistics and inferences."

<div className="mx-auto max-w-3xl px-3 md:px-0">
<p>
<span className="text-5xl">A</span> statistical model is a mathematical model used to describe the relationship between different variables. It contains a set of assumptions about the sample data and usually represent the data generation process in a idealised form. Statistical modeling is the process of exploring of statistical models which could represent and best describe observed data. This process includes (but not limited to) initial selection of probability distributions, encapsulate assumptions as parameters, estimation of stochastic variables, sampling, and comparison between different models. Once a statistical model is drafted, the predictive model will be used to test hypotheses, create predicted values (make prediction), and compute confidence interval.
<span className="text-5xl">A</span> statistical model is a mathematical model used to describe the relationship between different variables. It contains a set of assumptions about the sample data and usually represents the data generation process in an idealised form. Statistical modelling is the process of exploring statistical models that could represent and best describe observed data. This process includes (but is not limited to) the initial selection of probability distributions, encapsulating assumptions as parameters, estimating stochastic variables, sampling, and comparing different models. Once a statistical model is drafted, it can be used to test hypotheses, create predicted values (make predictions), and compute confidence intervals (ranges constructed so that, over repeated sampling, a chosen proportion of such intervals would contain the true population parameter).
</p>
<img src="distribution.png" style={{maxWidth: '350px', margin: '3rem auto'}} />
<p>
Most modelling methods that do not model the random component de facto assume Gaussian distribution. The most common exceptions are often growth models which will often assume a lognormal. The random component in statistical models can be either from soemthing that is nondeterministic or it might be due to deterministic elements that are unknown or noises in the system caused by elements that are not captured. Statistics is therefore the art of handling elements in a model that create random noise, and statisticians are agnostic about the nature of the randomness with respect to whether it is deterministic or not.
Most modelling methods that do not explicitly model the random component assume, de facto, a Gaussian distribution. The most common exceptions are growth models, which often assume a lognormal distribution. The random component in a statistical model can come from something that is genuinely nondeterministic, from deterministic elements that are unknown, or from noise in the system caused by elements that are not captured. Statistics is therefore the art of handling the elements of a model that create random noise, and statisticians are agnostic about whether that randomness is deterministic or not.
</p>
</div>
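As an illustration of the confidence intervals mentioned above, the sketch below computes a two-sided 95% interval for a population mean as $\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n}$. It assumes the population standard deviation $\sigma$ is known (when it is estimated from a small sample, a t-based interval is more appropriate); the function name `meanConfidenceInterval` is a hypothetical illustration.

```javascript
// z-based confidence interval for a population mean (illustrative sketch).
// Assumes the population standard deviation `sigma` is known.
// z = 1.959964 is the 97.5th percentile of N(0, 1), giving 95% confidence.
function meanConfidenceInterval(sample, sigma, z = 1.959964) {
  const n = sample.length;
  const mean = sample.reduce((acc, x) => acc + x, 0) / n;
  const halfWidth = (z * sigma) / Math.sqrt(n); // z times the standard error
  return [mean - halfWidth, mean + halfWidth];
}
```

For example, `meanConfidenceInterval([2, 4, 2, 4], 1)` gives roughly `[2.02, 3.98]`.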

<div className="my-16">
<div className="max-w-3xl mx-auto bg-shefPurple p-5 md:p-8 rounded-md">
<p className="mt-0 mb-10 text-white">
In this learning path we will be introducing you to probability distributions for common variable types, sampling and how to describe a sample from a certain distribution, basic of statistical model, common statistical testing techniques, and many more.
In this learning path we will introduce you to probability distributions for common variable types, sampling and how to describe a sample from a given distribution, the basics of statistical models, common statistical testing techniques, and more.
</p>
{/* <p>
Throughout the course, we will aim for two objectives:
@@ -9,10 +9,16 @@ published: false
---

import { HiOutlineLightBulb } from "react-icons/hi"

import FlowChart from "../../visualisation/2021-08-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/flowChart"
import { ReactFlowProvider } from 'react-flow-renderer';

## Introduction
Statistical (hypothesis) testing is a methodology that used to determine whether the difference between sample and sample, sample and population is caused by sampling error or underlying difference. The significance test is one of the most commonly used methods of hypothesis testing and the most basic form of statistical inference. The basic principle of which is to make some assumptions about the characteristics of the population first, then make inference through studies of samples and decide whether the hypothesis/assumption should be rejected or accepted. Commonly used hypothesis test methods are Z-test, T-test, and F-test. These tests are called parametric tests because they rely on several assumptions for data related to underlying distribution and sampling variances, and parameters are fully specified.
Statistical (hypothesis) testing is a methodology used to determine whether a difference between two samples, or between a sample and a population, is caused by sampling error or by a genuine underlying difference. The significance test is one of the most commonly used methods of hypothesis testing and the most basic form of statistical inference. Its basic principle is to
- make some assumptions about the characteristics of the population first,
- then make inferences by studying samples, and
- decide whether the hypothesis/assumption should be rejected or retained

Commonly used hypothesis test methods are the Z-test, T-test, and F-test. These tests are called parametric tests because they rely on several assumptions about the underlying distribution of the data and the sampling variances, and the parameters of the distribution are fully specified.


Parametric tests have the following applications:
@@ -29,10 +35,16 @@ Performing a test usually consists the following steps:
3. Choose a suitable statistical test.
4. Based on the value of the test statistic and the p-value obtained from the test, decide whether to reject the null hypothesis. The p-value is the probability, assuming $H_{0}$ is true, of obtaining a test statistic at least as extreme as the one observed. If the p-value is less than $\alpha$, we reject $H_{0}$; otherwise ($p \geq \alpha$) we fail to reject $H_{0}$.
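The decision procedure in these steps can be sketched as a one-sample, two-sided z-test. This is a minimal JavaScript sketch, assuming the population standard deviation `sigma` is known; the helper names (`erf`, `normalCdf`, `oneSampleZTest`) are hypothetical, and `erf` uses the Abramowitz and Stegun 7.1.26 approximation (absolute error below about 1.5e-7) because JavaScript has no built-in error function.

```javascript
// Error function erf(x), Abramowitz & Stegun 7.1.26 approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// H0: mu = mu0 versus H1: mu != mu0, at significance level alpha.
function oneSampleZTest(sample, mu0, sigma, alpha = 0.05) {
  const n = sample.length;
  const mean = sample.reduce((acc, x) => acc + x, 0) / n;
  const z = (mean - mu0) / (sigma / Math.sqrt(n)); // test statistic
  const p = 2 * (1 - normalCdf(Math.abs(z)));      // two-sided p-value
  return { z, p, rejectH0: p < alpha };            // reject H0 only if p < alpha
}
```

For example, `oneSampleZTest([1, 2, 3, 4, 5], 2, 1)` gives $z = \sqrt{5} \approx 2.24$ and $p \approx 0.025$, so $H_{0}$ is rejected at $\alpha = 0.05$.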


In part 4 we will be looking at common hypothesis test methods and their applications in linear models. You can download the R script file [here](./script.R).

We have also prepared a flow chart that helps you choose a statistical test for comparing two variables. Please click the button below to start.

<ReactFlowProvider>
<FlowChart />
</ReactFlowProvider>

<br/>

## Prerequisites

@@ -67,7 +79,7 @@ There is no built-in function in R for calculating z-test, however, you can work
The T-test is a hypothesis test that investigates whether the difference between the means of two groups is significant, under the following assumptions:
- samples are independent
- populations follow a normal distribution
- data are ordinal or continuous
- the outcome (dependent) variable is continuous, and the grouping variable is categorical
- homogeneity of variance (the variances of the independent samples are equal), depending on which t-test you use: Student's t-test requires this assumption, whereas [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) does not. Levene's test can be used to check whether the assumption holds
- sample size is not large
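Because Welch's t-test drops the equal-variance assumption, its statistic uses each group's own variance, $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and its degrees of freedom come from the Welch-Satterthwaite equation. The JavaScript sketch below computes only these two quantities (a p-value would also need the t-distribution CDF, which JavaScript does not provide); the function names `sampleVariance` and `welchT` are hypothetical illustrations.

```javascript
// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const n = xs.length;
  const mean = xs.reduce((acc, x) => acc + x, 0) / n;
  return xs.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
}

// Welch's t statistic and Welch-Satterthwaite degrees of freedom (illustrative sketch).
function welchT(xs, ys) {
  const n1 = xs.length;
  const n2 = ys.length;
  const m1 = xs.reduce((acc, x) => acc + x, 0) / n1;
  const m2 = ys.reduce((acc, y) => acc + y, 0) / n2;
  const v1 = sampleVariance(xs) / n1; // squared standard error of the first mean
  const v2 = sampleVariance(ys) / n2; // squared standard error of the second mean
  const t = (m1 - m2) / Math.sqrt(v1 + v2);
  const df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1));
  return { t, df };
}
```

For example, `welchT([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` gives $t \approx -1.897$ and $\text{df} \approx 5.88$.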

@@ -2,7 +2,7 @@
type: visualisation
author: [Yu Liang Weng]
title: Millions of UK residents struggle to access food
thumbnail: thumb.jpg
thumbnail: thumb.png
description: >
A new study from the University of Sheffield has revealed the areas in the UK where residents most struggle to afford or access food.
For the first time researchers were able to identify food insecurity at local authority scale across three categories, from those experiencing hunger, to those just one emergency away from going without food.
@@ -64,6 +64,13 @@ const radarPlot = () => {
color: 'rgb(55, 65, 81)'
}

const percentageInfo = {
position: 'absolute',
left: '48%',
color: '#eee',
fontSize: '.82rem'
}

const theme = {
dots: {
text: {
@@ -162,6 +169,9 @@ const radarPlot = () => {
% adults experiencing hunger, <br/>struggled to have food, <br/>worried about having<br/>enough food in<br/>five UK cities.
</h3>
<img src={Dish} alt="Food dish" style={{opacity: '0.05', maxWidth: '180px', position: 'absolute', top: '13%', right: 0, margin: '1.5rem'}} />
<h3 style={{bottom: '10%', ...percentageInfo}}>21%</h3>
<h3 style={{bottom: '20%', ...percentageInfo}}>14%</h3>
<h3 style={{bottom: '30%', ...percentageInfo}}>&nbsp;7%</h3>
<h1 style={{fontWeight: 800, left: 0, fontSize: '.9rem', ...sourceInfo}}>Dataviz.Shef</h1>
<h1 style={{right: 0, ...sourceInfo}}>Source: The University of Sheffield - News</h1>
</div>
Binary file not shown.