
Commit

docs: blog posts grammar checks (#102)
* docs: Grammarpolice2 (#101)

* Hackathon
* Stats. Central Tendancy
* Colour Schemes
* Stats. Practical
* Non-Numeric
* Dash

Co-authored-by: yld-weng <59968766+yld-weng@users.noreply.github.com>
Co-authored-by: yld-weng <Y.Weng@sheffield.ac.uk>

Co-authored-by: GemmaRIT <60690118+GemmaRIT@users.noreply.github.com>
yld-weng and GemmaRIT committed Jul 28, 2021
1 parent 19f7cbe commit 375b6f2
Showing 6 changed files with 62 additions and 63 deletions.
11 changes: 5 additions & 6 deletions content/blog/2020-02-28-Urban-Observatories-hackathon/index.mdx
@@ -12,13 +12,12 @@ tag: ["Data Engineering", "Data Analytics", "Research & Innovation", "Urban Obse

---

-In February, The University of Sheffield hosted a three-day hackathon, an intensive collaborative software development event, organised by
-the <a href="https://www.dafni.ac.uk/">Data & Analytics Facility for National Infrastructure</a> (DAFNI). The event aimed to begin tackling the
-complex challenges involved in coordinating analytics using data from urban observatories across the UK. DAFNI provides an analytics platform to enable innovative data analysis solutions
+In February, The University of Sheffield hosted a three-day hackathon, an intensive collaborative software development event organised by the <a href="https://www.dafni.ac.uk/">Data & Analytics Facility for National Infrastructure</a> (DAFNI).
+The event aimed to begin tackling the complex challenges involved in coordinating analytics using data from urban observatories across the UK. DAFNI provides an analytics platform to enable innovative data analysis solutions
for infrastructure research.

An urban observatory is a network of sensors capturing atmospheric and energy flow data from across a city. Several <a href="https://urbanobservatory.ac.uk/">such projects</a> are being set up nationally aiming to offer the largest set of
-publicly-available real-time urban data in the country by UKRIC (UK Collaboratorium for Research on Infrastructure and Cities).
+publicly available real-time urban data in the country by UKRIC (UK Collaboratorium for Research on Infrastructure and Cities).

The Research & Innovation team supports the Sheffield observatory being run by the <a href="https://urbanflows.ac.uk/">Urban Flows</a> project by providing data engineering work. IT Services provide infrastructure and security support.

@@ -44,8 +43,8 @@ on JavaScript Object Notation (JSON.) This provided the front-end discussed abov

![solution](./solution.jpg)

-Another team focussed on integrated data from the different platforms in each city. It developed data pipelines using Python to extract and transform data from the Newcastle and Sheffield observatories
-respectively. These data were then loaded into a PostgreSQL database that ran in a Docker container hosted on a cloud container service provider. The database used a Postgres foreign data wrapper called <a href="https://github.com/citusdata/cstore_fdw">cstore</a>
+Another team focussed on integrated data from the different platforms in each city. It developed data pipelines using Python to extract and transform data from the Newcastle and Sheffield observatories, respectively.
+These data were then loaded into a PostgreSQL database that ran in a Docker container hosted on a cloud container service provider. The database used a Postgres foreign data wrapper called <a href="https://github.com/citusdata/cstore_fdw">cstore</a>
to store the data in columnar form to facilitate faster analytics queries.

A system was developed that allowed the end-user to use a simple web-based form to select the data they are interested in, run a query against this database and spawn a small container
34 changes: 17 additions & 17 deletions content/blog/2020-05-02-dataviz-stats-1/index.mdx
@@ -12,40 +12,40 @@ import { Link } from "gatsby"


# Descriptive statistics
-Statistics is the science of the collection, analysis, interpretation and presentation of data. Statistics can be applied to varies of areas such as education, biology, engineering,
+Statistics is the science of the collection, analysis, interpretation and presentation of data. Statistics can be applied to various areas such as education, biology, engineering,
chemistry, psychology, sports etc. Statistics is mainly divided into <b>descriptive statistics</b> and <b>inferential statistics</b> (statistical inference).

Given a set of data, the usage of descriptive statistics is to summarise and describe the data. For example, Use specific numbers or charts to reflect the concentration and dispersion of data.
The average score, the highest score, the distribution of the number of people in each segment, etc., also belong to the scope of descriptive statistics.
-On the other hand, the usage of the observer establishes a mathematical model to explain randomness and uncertainty of data, and uses it to infer the steps in the research is called inferential statistics.
+On the other hand, the usage of the observer establishes a mathematical model to explain randomness and uncertainty of data and uses it to infer the steps in the research is called inferential statistics.
For example, infer the overall data characteristics based on the sample data. Product quality inspection is generally conducted by random inspection, and the overall quality qualification rate is estimated based on the quality qualification rate of the sample.
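The inspection example in this hunk (estimating an overall qualification rate from a sample) can be illustrated with a minimal Python sketch. It is not from the post itself: the batch of 10,000 items, its 95% pass rate, the sample size of 200 and the helper `estimate_pass_rate` are all hypothetical, chosen only to show a sample proportion standing in for the unknown population rate.

```python
import random


def estimate_pass_rate(population, sample_size, seed=0):
    """Estimate the share of items passing inspection from a simple random sample.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)                      # seeded so the example is repeatable
    sample = rng.sample(population, sample_size)   # draw without replacement
    return sum(sample) / sample_size               # 1 = passed, 0 = failed


# Hypothetical batch of 10,000 products: 1 = passed inspection, 0 = failed
population = [1] * 9_500 + [0] * 500

estimate = estimate_pass_rate(population, sample_size=200)
print(f"Estimated pass rate: {estimate:.3f}")  # an estimate of the true rate, 0.95
```

Changing `seed` draws a different sample and gives a slightly different estimate; quantifying that sampling variability is the job of inferential statistics.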

![chart](./img_2.png)

-Since **descriptive statistics** is one of ingradients in exploring a dataset (data exploratory analysis), this articles covers commonly used tools when analysing and summarising data.
+Since **descriptive statistics** is one of the ingredients in exploring a dataset (data exploratory analysis), this article covers commonly used tools when analysing and summarising data.
If you are looking for specific charts or want to choose an appropriate chart for your findings, check out this <Link to="/blog/06/04/2020/chart-choice">article</Link>.

<br />

# Measures of Central Tendency

-The central tendency describes the concentration situation of collected data (and its subgroup), and is also a value describing the central position of a group.
-There are many types of concentration, including arithmetic average (mean), median, mode, weighted average, geometric average, harmonic average, etc,
-the central tendency mostly represented by mean, median and mode.
+The central tendency describes the concentration situation of collected data (and its subgroup) and is also a value describing the central position of a group.
+There are many types of concentration, including arithmetic average (mean), median, mode, weighted average, geometric average, harmonic average. However,
+the central tendency is mostly represented by mean, median and mode.

### Mean
-**Mean** is the arithmetic average of dataset, the calculation is simple and suitable for further calculation, and is less affected by sampling changes.
+**Mean** is the arithmetic average of a dataset; the calculation is simple and suitable for further calculation and is less affected by sampling changes.
At the same time, there are certain disadvantages that limit its use. The arithmetic mean is susceptible to extreme data where every change in the data will affect the final result.
-If the data appears blurred, average cannot be calculated.
+If the data appears blurred, an average cannot be calculated.
> Application principles for **mean**:
1. Homogeneous data
2. Consider the combination of average and individual values
3. The average is considered by combining variance and standard deviation

### Median
After sorting a group of numbers, **median** is the number in the middle (the number of numbers is odd); or the average of the middle two numbers (the number of numbers is even).
-This number may be one of the data, or it may not be the original number at all. Similar to mean, median is simple to calculate and easy to understand, but it doesn't affected by extreme value.
-However, median not sensitive enough and is greatly affected by sampling. In most cases, no further algebraic operations involves median.
+This number may be one of the data, or it may not be the original number at all. Similar to mean, median is simple to calculate and easy to understand and is less affected by extreme values.
+However, median is often not sensitive enough and is greatly affected by sampling. In most cases, algebraic operations no longer use median.
> Application principles for **median**:
1. When you need to quickly estimate the concentration value
2. When there is extreme data
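To illustrate the contrast the post draws between the mean (susceptible to extreme data) and the median (less affected by extreme values), here is a short sketch using Python's standard `statistics` module. The score values are made up for the example and are not from the post.

```python
from statistics import mean, median

# Hypothetical test scores, then the same scores with one extreme value added
scores = [62, 65, 68, 70, 71]
with_outlier = scores + [300]

print(mean(scores), median(scores))              # mean 67.2, median 68
print(mean(with_outlier), median(with_outlier))  # mean 106, median 69.0
```

A single extreme value pulls the mean from 67.2 to 106, while the median only moves from 68 to 69.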
@@ -63,7 +63,7 @@ The most frequent value in a set of data is called the **mode**. Mode is unstabl

# Measures of Variation

-The measure of variation also known as the tendency of dispersion. It is one of the basic concepts of statistics.
+The measure of variation is also known as the tendency of dispersion. It is one of the basic concepts of statistics.
It is the quantity that represents the trend of the sample data deviating from the intermediate value, or it reflects the degree of dispersion of the sample frequency distribution.
Common tools of variation are **mean absolute deviation**, **variance**, **standard deviation**, **range**, and **interquartile range**.

@@ -79,21 +79,21 @@ Mean absolute deviation of a set **${x_{1}, x_{2}, ..., x_{n}}$** contains $n$ s
$$
\frac{1}{n} \sum_{i=1}^{n} \left|x_{i} - m(X) \right|
$$
-where $m(X)$ is the choice of measure of central tendency of the dataset, in this case the mean of the dataset.
-It is the average of the arithmetic average and the distance of each data point, and can directly reflect the degree of difference of data.
+where $m(X)$ is the choice of measure of central tendency of the dataset, in this case, the mean of the dataset.
+It is the average of the arithmetic average and the distance of each data point and can directly reflect the degree of difference of data.
However, due to the use of absolute values, it is difficult to carry out algebraic operations and conduct theoretical analysis, so it is rarely used.
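A minimal sketch of the mean absolute deviation formula in this hunk, assuming $m(X)$ is the mean as the post states. The helper name `mean_absolute_deviation` and the sample data are illustrative only.

```python
from statistics import mean


def mean_absolute_deviation(xs):
    """(1/n) * sum(|x_i - m(X)|), with m(X) taken to be the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


# Hypothetical data: mean is 5, absolute deviations sum to 12 over 8 values
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_absolute_deviation(data))  # 1.5
```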

### Variance
Variance is obtained by changing the distance in **mean absolute deviation** to the square of the distance, use the same dataset above:
$$
\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}
$$
-It can effectively use information from data, and can well reflect the degree of difference of data. After this change, although it is not as direct as the mean absolute deviation in
-reflects the difference, the absolute value is avoided, so that algebraic operations is more convenient and therefore more widely used.
+Variance can effectively use information from data and can reflect the degree of difference. Although it is not as direct as the mean absolute deviation in
+reflecting the difference, it is more verstile than the absolute value which is typically avoided within algebraic operations, therefore variance is more widely used.

### Standard deviation
The standard deviation is defined as the **square root of the variance**, reflecting the degree of dispersion between individuals within the group.
-It is effectively avoid the measurement problem caused by unit square.
+It effectively avoids the measurement problem caused by the unit square.
$$
\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}}
$$
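The variance and standard deviation formulas in this hunk divide by $n$ (the population form). The sketch below, using hypothetical data and a hypothetical helper name, computes both directly from those formulas and cross-checks them against `statistics.pvariance` and `statistics.pstdev` from the Python standard library.

```python
import math
from statistics import pstdev, pvariance, mean


def population_variance(xs):
    """sigma^2 = sum((x_i - mean)^2) / n, dividing by n as in the formula above."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
var = population_variance(data)   # squared deviations sum to 32, so 32 / 8
std = math.sqrt(var)              # standard deviation is the square root of the variance

print(var, std)                       # 4.0 2.0
print(pvariance(data), pstdev(data))  # same values from the standard library
```

`statistics.variance` and `statistics.stdev` divide by $n - 1$ instead (the sample form), so they give larger values whenever the data are not all equal.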
@@ -105,7 +105,7 @@ It applies to isometric variables, ratio variables, not to nominal variables or

### Interquartile range
The interquartile range (IQR) is defined as $IQR = Q_{3} - Q_{1}$, where $Q_{3}$ and $Q_{1}$ is 75th and 25th percentiles respectively.
-The interquartile range is usually used to construct box plots, and a brief graphical overview of the probability distribution. IQR only uses part of the information of the data.
+The interquartile range is usually used to construct box plots and a brief graphical overview of the probability distribution. IQR only uses part of the information of the data.
Generally, it is used when the data information is incomplete and the mean difference and variance and their improvement cannot be used.
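A short sketch of $IQR = Q_{3} - Q_{1}$ using `statistics.quantiles` from the Python standard library. Quartile conventions differ between tools, so the sketch exposes the `method` parameter; the data and the helper name `interquartile_range` are illustrative assumptions.

```python
from statistics import quantiles


def interquartile_range(xs, method="inclusive"):
    """IQR = Q3 - Q1, using statistics.quantiles for the quartile cut points."""
    q1, _, q3 = quantiles(xs, n=4, method=method)  # three cut points: Q1, median, Q3
    return q3 - q1


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical data
print(interquartile_range(data))                      # 6.0
print(interquartile_range(data, method="exclusive"))  # 7.5
```

The same data give an IQR of 6.0 with the `"inclusive"` method and 7.5 with the module's default `"exclusive"` method, so it is worth stating which quartile definition a box plot uses.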


14 changes: 7 additions & 7 deletions content/blog/2020-05-06-Colour-Schemes/index.mdx
@@ -34,14 +34,14 @@ thumbnail: "./8389.jpg"
<div className="container mx-auto">
<br />
<p> Colour can also be used effectively to distinguish different groups or categories when plotting.
-The chart below looks at the average Kickstarter campaign goal between 2009 and 2017, here, colour is used to differentiate between different categories of Kickstarter campaigns.
+The chart below looks at the average Kickstarter campaign goal between 2009 and 2017; here, colour is used to differentiate between different categories of Kickstarter campaigns.
Colours should be easy to distinguish from one another to maximise the effect, however, do ensure that they are not garish or uncomfortable to the eye.
-Just to note, when using colour to distinguish between two groups, you may wish to ensure that each have a similar visual attraction, for example when combining red and grey, the eye will likely be drawn to the red which may imply that those values are of higher importance to the message.
+Just to note, when using colour to distinguish between two groups, you may wish to ensure that each has a similar visual attraction; for example, when combining red and grey, the eye will likely be drawn to the red, which may imply that those values are of higher importance to the message.
</p>
<br />
-<img src="./Kickstarter2.png" alt="Using colur to distinguish categories"></img>
+<img src="./Kickstarter2.png" alt="Using colour to distinguish categories"></img>
<br />
-<p> As addressed by <a href="https://serialmentor.com/dataviz/color-pitfalls.html#encoding-too-much-or-irrelevant-information/"> Claus O. Wilke</a>, when using colour to distinguish categories, it is important that you don’t give colour too great of task.
+<p> As addressed by <a href="https://serialmentor.com/dataviz/color-pitfalls.html#encoding-too-much-or-irrelevant-information/"> Claus O. Wilke</a>, when using colour to distinguish categories, it is important that you don’t give colour too great of a task.
If the number of categories becomes too large, it can overwhelm the possibility of perceptually different colours, meaning the reader cannot effectively and reliably distinguish differences in categories.
</p>
</div>
@@ -66,21 +66,21 @@ thumbnail: "./8389.jpg"
<div className="container mx-auto">
<p> <strong> Colour Vision Deficiency </strong></p>
<p> When using colour, it is important to take into account that some readers may have Colour Vision Deficiency (CVD).
-By including a perceptible change in contrast you can maximise the clarity of your palette for those with CVD.
+By including a perceptible change in contrast, you can maximise the clarity of your palette for those with CVD.
Alternatively, Viz Palette offers the opportunity to easily check the clarity of your colour scheme for those with CVD, and tools like ColorBrewer allow you to choose only palettes whose changes in colour will be perceptible to those with CVD.
</p>
<p> <strong> Colour Expectations </strong></p>
<p> Often, readers will have an expectation of certain colours. To list some of the most common examples; blue for water and green for land,
red for warm anomalies and blue for cold anomalies, each of the colours of prominent political parties.
When you invert these expectations, it removes the simplicity of the graph.
-For example, the figure below shows global temperature anomalies, the non-standard continuous colour scale, where the warmest anomalies are represented by a pale yellow, and the coolest anomalies by dark green, makes the interpretation of the graph far more challenging that necessary.
+For example, the figure below shows global temperature anomalies, the non-standard continuous colour scale - where the warmest anomalies are represented by a pale yellow, and the coolest anomalies by dark green, makes the interpretation of the graph far more challenging than necessary.
<br />
<br />
<img src="./WrongColors2.png" alt="A figure that use wrong colours"></img>
<br />
</p>
<p> <strong> Consistency </strong></p>
-<p> One final note, when creating multiple figures, you could consider a colour scheme for your docuemnt.
+<p> One final note, when creating multiple figures, you could consider a colour scheme for your document.
Particularly if you plan to use the same categories in different contexts.
</p>
</div>
