feat: statistical test flow chart (#106)

* experiment: flowchart
  Add React flow library. Create data.
* experiment: flowChart basis
* experiment: flowchart - complete basic hidden elements
* experiment: add description to node data
  Change node style on click.
* experiment: track clicked nodes and current path
* experiment: fix path tracking
* experiment: add more data & node components
* experiment: add more data
* experiment: add bottom branch data
* experiment: complete all data
* fix: hide child nodes connected from other nodes
  Add animated stroke on hover elements.
* experiment: add path tracking info
* experiment: add help text
  Create new visualisation item
* docs: add flow chart visualisation
  Modify `2021-07-22-Millions-of-UK-residents-struggle-to-access-food` for better interpretation of the chart.
* docs: add flow chart to the learning path
  Add links to optional chapters of chapter 2.
* fix: styles being removed in production mode
* fix: image not rendering correctly in flow flowChart
  Replace gif with png.
* refactor: remove duplicated flow chart files
* ci: add script for new build without cache
researchdata-sheffield · Aug 5, 2021 · d2ea672
1 parent 8631e69
commit d2ea672

Showing 34 changed files with 3,187 additions and 92 deletions.
diff --git a/.github/workflows/ci-without-cache.yml b/.github/workflows/ci-without-cache.yml
@@ -0,0 +1,48 @@
+name: CI-without-cache
+
+on:
+  workflow_dispatch:
+    inputs:
+      Name:   
+        required: true
+        default: 'Re-run the workflow'
+      Description:
+        default: ''  
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Cancel Previous Runs
+        uses: styfle/cancel-workflow-action@0.9.0
+
+      - uses: actions/checkout@v2
+      # Retrieve cache
+      - name: Gatsby Cache
+        id: gatsby-ci-cache
+        uses: actions/cache@v2
+        with:
+          path: |
+            public
+            .cache
+          key: ${{ runner.os }}-gatsby-ci-${{ github.run_id }}
+          # Do not restore the cache (restore-keys is left disabled)
+          # restore-keys: ${{ runner.os }}-gatsby-ci-
+
+      - name: Install dependencies
+        run: npm ci
+
+      # Not using incremental build
+      - name: Build without Cache
+        run: npm run build:noCache
+        env:
+          EVENT_API_KEY_1: ${{ secrets.EVENT_API_KEY_1 }}
+          EVENT_API_KEY_2: ${{ secrets.EVENT_API_KEY_2 }}
+          EVENT_ORG_ID_1: ${{ secrets.EVENT_ORG_ID_1 }}
+          EVENT_ORG_ID_2: ${{ secrets.EVENT_ORG_ID_2 }}
+          GA_TRACKING_ID: ${{ secrets.GA_TRACKING_ID }}
+          GATSBY_GH_APP_GITALK_ID: ${{ secrets.GATSBY_GH_APP_GITALK_ID }}
+          GATSBY_GH_APP_GITALK_SECRET: ${{ secrets.GATSBY_GH_APP_GITALK_SECRET }}
+          GATSBY_ENV: ${{ secrets.GATSBY_ENV }}
+
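
Why this workflow never restores a cache can be sketched with a small model. The code below is a simplified illustration, not the `actions/cache` implementation: `lookupCache`, the stored entries and the run ids are all hypothetical. It assumes an entry is restored on an exact key match, and otherwise on the newest entry whose key starts with one of the restore-key prefixes.

```javascript
// Simplified model of a keyed build cache (NOT the real actions/cache code).
// Exact key match wins; otherwise the first restore-key prefix that matches
// any stored key is used, preferring the most recently created entry.
function lookupCache(storedEntries, key, restoreKeys = []) {
  const exact = storedEntries.find((entry) => entry.key === key);
  if (exact) return exact;

  for (const prefix of restoreKeys) {
    const candidates = storedEntries
      .filter((entry) => entry.key.startsWith(prefix))
      .sort((a, b) => b.createdAt - a.createdAt);
    if (candidates.length > 0) return candidates[0];
  }
  return null; // cache miss: the build starts from scratch
}

// Hypothetical entries saved by two earlier runs.
const stored = [
  { key: "Linux-gatsby-ci-1001", createdAt: 1 },
  { key: "Linux-gatsby-ci-1002", createdAt: 2 },
];

// Every run has a new run id, so its full key has never been saved before.
const missWithoutRestoreKeys = lookupCache(stored, "Linux-gatsby-ci-1003");
const hitWithRestoreKeys = lookupCache(stored, "Linux-gatsby-ci-1003", [
  "Linux-gatsby-ci-",
]);

console.log(missWithoutRestoreKeys); // null
console.log(hitWithRestoreKeys.key); // Linux-gatsby-ci-1002
```

Because the key ends in the run id and `restore-keys` is commented out, the lookup in this model always misses, which matches the intent of a build without cache.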
diff --git a/...s/2021-03-18-LearningPath-Statistical-Modeling-2-more-random-sampling/index.mdx b/...s/2021-03-18-LearningPath-Statistical-Modeling-2-more-random-sampling/index.mdx
@@ -7,6 +7,8 @@ description: Stratified, systematic, and cluster sampling methods.
 date: 2021-03-18
 ---  
 
+Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.  
+
 ## Random Sampling
 Random sampling (or probability sampling) refers to nonsubjective sampling methods that apply some mechanism to ensure randomness. 
 

diff --git a/...cs/2021-03-18-LearningPath-Statistical-Modeling-2-non-random-sampling/index.mdx b/...cs/2021-03-18-LearningPath-Statistical-Modeling-2-non-random-sampling/index.mdx
@@ -7,6 +7,7 @@ description: Subjective methods of extracting samples based on the researcher's
 date: 2021-03-18
 ---  
 
+Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.  
 
 ## Non-random Sampling
 Non-random sampling (or non-probability sampling) refers to subjective sampling methods in which researchers draw samples according to his/her own convenience or subjective judgment. It does not strictly follow the principle of random sampling to draw samples so it cannot determine the sampling error, and cannot correctly explain to what extent the statistical value of the sample is suitable for the population.  

diff --git a/content/docs/2021-03-18-LearningPath-Statistical-Modeling-2-optional/index.mdx b/content/docs/2021-03-18-LearningPath-Statistical-Modeling-2-optional/index.mdx
@@ -7,7 +7,7 @@ description: Statistical Modeling Part 2 Optional - Sampling techniques in compu
 date: 2021-03-18
 ---  
 
-
+Note: This page is an optional chapter of <Link to="/docs/18/03/2021/LearningPath-Statistical-Modeling-2">Statistical Modeling Part 2 - Sampling</Link>.  
 
 ## Sampling from distributions
 If we have samples from a population then we can use these samples to estimate the distribution and parameters. Why do we need sampling from distributions if the distribution is already known or partially known? There are several applications:  

diff --git a/content/docs/2021-03-18-LearningPath-Statistical-Modeling/index.mdx b/content/docs/2021-03-18-LearningPath-Statistical-Modeling/index.mdx
@@ -22,18 +22,18 @@ learningPathDescription: "Interested in statistics and inferences."
 
 <div className="mx-auto max-w-3xl px-3 md:px-0">
   <p>
-    <span className="text-5xl">A</span> statistical model is a mathematical model used to describe the relationship between different variables. It contains a set of assumptions about the sample data and usually represent the data generation process in a idealised form. Statistical modeling is the process of exploring of statistical models which could represent and best describe observed data. This process includes (but not limited to) initial selection of probability distributions, encapsulate assumptions as parameters, estimation of stochastic variables, sampling, and comparison between different models. Once a statistical model is drafted, the predictive model will be used to test hypotheses, create predicted values (make prediction), and compute confidence interval.
+    <span className="text-5xl">A</span> statistical model is a mathematical model used to describe the relationship between different variables. It contains a set of assumptions about the sample data and usually represents the data generation process in an idealised form. Statistical modelling is the process of exploring statistical models which could represent and best describe observed data. This process includes (but is not limited to) initial selection of probability distributions, encapsulating assumptions as parameters, estimation of stochastic variables, sampling, and comparison between different models. Once a statistical model is drafted, the model will be used to test hypotheses, create predicted values (make predictions), and compute confidence intervals (intervals that we are confident the values will fall into).
   </p>
   <img src="distribution.png" style={{maxWidth: '350px', margin: '3rem auto'}} /> 
   <p>
-    Most modelling methods that do not model the random component de facto assume Gaussian distribution. The most common exceptions are often growth models which will often assume a lognormal. The random component in statistical models can be either from soemthing that is nondeterministic or it might be due to deterministic elements that are unknown or noises in the system caused by elements that are not captured. Statistics is therefore the art of handling elements in a model that create random noise, and statisticians are agnostic about the nature of the randomness with respect to whether it is deterministic or not.
+    Most modelling methods that do not model the random component de facto assume a Gaussian distribution. The most common exceptions are growth models, which will often assume a lognormal distribution. The random component in statistical models can come either from something that is non-deterministic, or from deterministic elements that are unknown, or from noise in the system caused by elements that are not captured. Statistics is therefore the art of handling elements in a model that create random noise, and statisticians are agnostic about the nature of the randomness with respect to whether it is deterministic or not.
   </p>
 </div>
 
 <div className="my-16"> 
   <div className="max-w-3xl mx-auto bg-shefPurple p-5 md:p-8 rounded-md">
     <p className="mt-0 mb-10 text-white">
-      In this learning path we will be introducing you to probability distributions for common variable types, sampling and how to describe a sample from a certain distribution, basic of statistical model, common statistical testing techniques, and many more.
+      In this learning path we will be introducing you to probability distributions for common variable types, sampling and how to describe a sample from a certain distribution, the basics of statistical models, common statistical testing techniques, and much more.
     </p>
     {/* <p>
       Throughout the course, we will aim for two objectives:

diff --git a/content/docs/2021-04-07-LearningPath-Statistical-Modeling-4/index.mdx b/content/docs/2021-04-07-LearningPath-Statistical-Modeling-4/index.mdx
@@ -9,10 +9,16 @@ published: false
 ---  
 
 import { HiOutlineLightBulb } from "react-icons/hi"  
-
+import FlowChart from "../../visualisation/2021-08-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/flowChart"
+import { ReactFlowProvider } from 'react-flow-renderer';
 
 ## Introduction
-Statistical (hypothesis) testing is a methodology that used to determine whether the difference between sample and sample, sample and population is caused by sampling error or underlying difference. The significance test is one of the most commonly used methods of hypothesis testing and the most basic form of statistical inference. The basic principle of which is to make some assumptions about the characteristics of the population first, then make inference through studies of samples and decide whether the hypothesis/assumption should be rejected or accepted. Commonly used hypothesis test methods are Z-test, T-test, and F-test. These tests are called parametric tests because they rely on several assumptions for data related to underlying distribution and sampling variances, and parameters are fully specified.  
+Statistical (hypothesis) testing is a methodology used to determine whether a difference between two samples, or between a sample and a population, is caused by sampling error or by an underlying difference. The significance test is one of the most commonly used methods of hypothesis testing and the most basic form of statistical inference. Its basic principle is to  
+- make some assumptions about the characteristics of the population first
+- then make inferences through studies of samples, and
+- decide whether the hypothesis/assumption should be rejected or accepted  
+
+Commonly used hypothesis tests are the Z-test, T-test, and F-test. These tests are called parametric tests because they rely on several assumptions about the data related to the underlying distribution and sampling variances, and the parameters of the distribution are fully specified.  
 
 
 Parametric tests has the following applications:  
@@ -29,10 +35,16 @@ Performing a test usually consists the following steps:
 3. Choose a suitable statistical test.
 4. Based on the size of the statistic and the p-value obtained from the test, determine whether to accept or reject the null hypothesis. If the p-value (probability of the observed value being as a rare event) is less than $\alpha$ then we can reject $H_{0}$. Otherwise in the case of $p > \alpha$ we accept $H_{0}$.  
 
+
 In part 4 we will be looking at common hypothesis test method and their application in linear models. You can download the R script file [here](./script.R).  
+
+We have also prepared a flow chart to help you choose a statistical test for comparing two variables. Please click the button below to start.  
 
+<ReactFlowProvider>
+  <FlowChart />
+</ReactFlowProvider>
 
-  
+<br/>
 
 ## Prerequisites
 
@@ -67,7 +79,7 @@ There is no built-in function in R for calculating z-test, however, you can work
 The T-test is a hypothesis test that investigate the significance difference between two groups with the following assumptions:  
 - samples are independent 
 - populations follows normal distribution
-- data are ordinal or continuous  
+- data are categorical or continuous  
 - homogeneity of variance (meaning variances are equal among independent samples), depending on what t-test you use. Though the [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) doesn't require this. Whereas the Levene's test and Student's t-test requires this assumption to be true
 - sample size is not large  
 

diff --git a/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/index.mdx b/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/index.mdx
@@ -2,7 +2,7 @@
 type: visualisation
 author: [Yu Liang Weng]
 title: Millions of UK residents struggle to access food
-thumbnail: thumb.jpg
+thumbnail: thumb.png
 description: >
   A new study from the University of Sheffield has revealed the areas in the UK where residents most struggle to afford or access food.
   For the first time researchers were able to identify food insecurity at local authority scale across three categories, from those experiencing hunger, to those just one emergency away from going without food.

diff --git a/...ts-struggle-to-access-food/millions-of-uk-residents-struggle-to-access-food.png b/...ts-struggle-to-access-food/millions-of-uk-residents-struggle-to-access-food.png
diff --git a/...ts-struggle-to-access-food/millions-of-uk-residents-struggle-to-access-food.svg b/...ts-struggle-to-access-food/millions-of-uk-residents-struggle-to-access-food.svg
diff --git a/...t/visualisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/radarPlot.jsx b/...t/visualisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/radarPlot.jsx
@@ -64,6 +64,13 @@ const radarPlot = () => {
     color: 'rgb(55, 65, 81)'
   }
 
+  const percentageInfo = {
+    position: 'absolute',  
+    left: '48%', 
+    color: '#eee', 
+    fontSize: '.82rem'
+  }
+
   const theme = {
     dots: {
       text: {
@@ -162,6 +169,9 @@ const radarPlot = () => {
         % adults experiencing hunger, <br/>struggled to have food, <br/>worried about having<br/>enough food in<br/>five UK cities.
       </h3>
       <img src={Dish} alt="Food dish" style={{opacity: '0.05', maxWidth: '180px', position: 'absolute', top: '13%', right: 0, margin: '1.5rem'}}  />
+      <h3 style={{bottom: '10%', ...percentageInfo}}>21%</h3>
+      <h3 style={{bottom: '20%', ...percentageInfo}}>14%</h3>
+      <h3 style={{bottom: '30%', ...percentageInfo}}>&nbsp;7%</h3>
       <h1 style={{fontWeight: 800, left: 0, fontSize: '.9rem', ...sourceInfo}}>Dataviz.Shef</h1>
       <h1 style={{right: 0, ...sourceInfo}}>Source: The University of Sheffield - News</h1>
     </div>
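
The three percentage labels share one style object and differ only in `bottom`. A minimal sketch of that spread pattern follows, outside React and with a hypothetical `labelStyle` helper. It also shows that key order matters when a key appears on both sides of the spread.

```javascript
// Shared style, copied from the diff above.
const percentageInfo = {
  position: "absolute",
  left: "48%",
  color: "#eee",
  fontSize: ".82rem",
};

// Hypothetical helper: build one label style per vertical offset.
const labelStyle = (bottom) => ({ bottom, ...percentageInfo });
const styles = ["10%", "20%", "30%"].map(labelStyle);

// When a key exists on both sides, the later one wins.
const sharedWins = { color: "red", ...percentageInfo }; // color stays '#eee'
const overrideWins = { ...percentageInfo, color: "red" }; // color becomes 'red'

console.log(styles[1].bottom); // 20%
console.log(sharedWins.color); // #eee
console.log(overrideWins.color); // red
```

Since `percentageInfo` does not define `bottom`, writing `bottom` before or after the spread gives the same result in the diff above.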

diff --git a/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/thumb.jpg b/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/thumb.jpg
diff --git a/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/thumb.png b/...alisation/2021-07-22-Millions-of-UK-residents-struggle-to-access-food/thumb.png
diff --git a/...8-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/consultation.gif b/...8-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/consultation.gif
diff --git a/...8-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/consultation.png b/...8-04-Which-Statistical-Test-To-Use-For-Two-Variables/flowChart/consultation.png