diff --git a/.DS_Store b/.DS_Store index 32bdf2a..8e98f38 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/story/OneNum.html b/story/OneNum.html index 84412e2..e8f7004 100644 --- a/story/OneNum.html +++ b/story/OneNum.html @@ -1,234 +1,268 @@ - - - - - - - - - - - - - - - - - -
-
- -

-

-

Airbnb prices on the French Riviera

-
- A few data analytics ideas from Data-to-Viz.com -
-

- - - - - - - - - - - - - -
- - - - - - - - + + + + + + + + + +
+
+ +

+

+

Airbnb prices on the French Riviera

+
+
+ A few data analytics ideas from + Data-to-Viz.com +
+
+

+ + + + + +
+ + + OneNum.knit + + + + - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - -

+ + + + + + + + + + + + +
+ + + + + + + + +

@@ -238,21 +272,29 @@
-




This document provides a few suggestions for analysing a dataset composed of a single numeric variable.
It considers the nightly price of about 10,000 Airbnb apartments on the French Riviera in France.
This example dataset has been downloaded from the Airbnb website and is available on this GitHub repository. It looks like the table to the right.

+




This document provides a few suggestions for analysing a +dataset composed of a single numeric variable.
It considers the +nightly price of about 10,000 Airbnb +apartments on the French Riviera in France.
This example dataset has +been downloaded from the Airbnb website and +is available on this GitHub +repository. It looks like the table to the right.

-
# Libraries
-library(tidyverse)
-library(hrbrthemes)
-library(kableExtra)
-options(knitr.table.format = "html")
-
-# Load dataset from github
-data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)
-
-# show data
-data %>% head(6) %>% kable() %>%
-  kable_styling(bootstrap_options = "striped", full_width = F)
+
# Libraries
+library(tidyverse)
+library(hrbrthemes)
+library(kableExtra)
+options(knitr.table.format = "html")
+
+# Load dataset from github
+data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)
+
+# show data
+data %>% head(6) %>% kable() %>%
+  kable_styling(bootstrap_options = "striped", full_width = F)
@@ -296,111 +338,134 @@
-
-

Histogram

-
-

The most common way to represent a single numeric variable is with a histogram. The variable is cut into several bins: between 0 and 10 euros a night, between 10 and 20, and so on. The bins are shown on the X axis. The number of apartments in each bin is then counted and shown on the Y axis.

-

Here, it appears that about 500 apartments have a price between 80 and 90 euros. A histogram is a convenient way to visualize the data: it allows us to understand its distribution.

-
data %>%
-  filter( price<300 ) %>%
-  ggplot( aes(x=price)) +
-    stat_bin(breaks=seq(0,300,10), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
-    ggtitle("Night price distribution of Airbnb apartments") +
-    theme_ipsum() +
-    theme(
-      plot.title = element_text(size=12)
-    )
+

#Histogram ***

+

The most common way to represent a single numeric variable is with a +histogram. The numeric variable is cut into several +bins: between 0 and 10 euros a night, between 10 and 20, and +so on. The bins are shown on the X axis. The number of apartments +in each bin is then counted and shown on the Y axis.

+

Here, it appears that about 500 apartments have a price between 80 +and 90 euros. A histogram is a convenient way to visualize the data: it +allows us to understand its distribution.

+
data %>%
+  filter( price<300 ) %>%
+  ggplot( aes(x=price)) +
+    stat_bin(breaks=seq(0,300,10), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
+    ggtitle("Night price distribution of Airbnb apartments") +
+    theme_ipsum() +
+    theme(
+      plot.title = element_text(size=12)
+    )
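The binning step described above can be sketched in base R. This is only an illustration: the `prices` vector holds made-up values rather than the Airbnb dataset, and the right-closed intervals are an assumption about how the bins are delimited.

```r
# Hypothetical nightly prices in euros (made-up values, not the Airbnb dataset)
prices <- c(5, 12, 18, 25, 85, 88, 90)

# Cut the variable into bins of 10 euros. Intervals are closed on the right,
# e.g. (80,90], and include.lowest = TRUE turns the first bin into [0,10].
bins <- cut(prices, breaks = seq(0, 100, 10), include.lowest = TRUE)

# Count the number of values per bin: this is the height of each bar
counts <- table(bins)
print(counts)
```

Each entry of `counts` corresponds to one bar of the histogram, so the three values between 80 and 90 end up in the same bar.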

-

Note that it is important to experiment with the bin size during your exploratory analysis. Let’s check what happens when splitting prices into bins of 3 euros instead of 10:

-
data %>%
-  filter( price<300 ) %>%
-  ggplot( aes(x=price)) +
-    stat_bin(breaks=seq(0,300,3), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
-    ggtitle("Night price distribution of Airbnb apartments") +
-    theme_ipsum() +
-    theme(
-      plot.title = element_text(size=12)
-    )
+

Note that it is important to experiment with the bin size +during your exploratory analysis. Let’s check what happens when splitting +prices into bins of 3 euros instead of 10:

+
data %>%
+  filter( price<300 ) %>%
+  ggplot( aes(x=price)) +
+    stat_bin(breaks=seq(0,300,3), fill="#69b3a2", color="#e9ecef", alpha=0.9) +
+    ggtitle("Night price distribution of Airbnb apartments") +
+    theme_ipsum() +
+    theme(
+      plot.title = element_text(size=12)
+    )
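To see why the bin width matters without drawing anything, the two binnings can be compared numerically. The sketch below uses base R `hist()` with `plot = FALSE` on a made-up `prices` vector (an assumption for illustration, not the Airbnb data) in which a "round" price of 80 is deliberately repeated.

```r
# Hypothetical prices (made-up values): 80 is repeated, like a popular round price
prices <- c(rep(80, 5), rep(75, 3), 76, 77, 78, 79, 81, 82, 83)

# Same data, two bin widths; plot = FALSE returns the counts without drawing
wide   <- hist(prices, breaks = seq(0, 300, 10), plot = FALSE)
narrow <- hist(prices, breaks = seq(0, 300, 3),  plot = FALSE)

# Number of bins in each case
length(wide$counts)
length(narrow$counts)

# Midpoint of the tallest narrow bin, i.e. the bin containing the repeated price
narrow$mids[which.max(narrow$counts)]
```

With wide bins the repeated value is merged with its neighbours, while narrow bins leave it standing out as a separate spike.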

-

There is a huge difference between these 2 histograms. A few values are overrepresented in the dataset (such as 58, 64, 69, 75 and 80). This is definitely a signal that you want to understand when analysing your dataset.

-
-
-

Density

-
-

A variation of the histogram is the density plot, which is a smoothed version of the histogram. It represents a kernel density estimate of the variable. As with the bin size of the histogram, it is important to try several values for the bandwidth argument:

+

There is a huge difference between these 2 histograms. +A few values are overrepresented in the dataset (such as 58, 64, +69, 75 and 80). This is definitely a signal that you want to understand +when analysing your dataset.

+

#Density ***

+

A variation of the histogram is the density plot, which is +a smoothed version of the histogram. It represents a +kernel density estimate of the variable. As with the +bin size of the histogram, it is important to try several values for the +bandwidth argument:

-
data %>%
-  filter( price<300 ) %>%
-  ggplot( aes(x=price)) +
-    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.7, bw=10) +
-    ggtitle("Bandwidth: 10") +
-    theme_ipsum()
+
data %>%
+  filter( price<300 ) %>%
+  ggplot( aes(x=price)) +
+    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.7, bw=10) +
+    ggtitle("Bandwidth: 10") +
+    theme_ipsum()

-
data %>%
-  filter( price<300 ) %>%
-  ggplot( aes(x=price)) +
-    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.7, bw=2) +
-    ggtitle("Bandwidth: 2") +
-    theme_ipsum()
+
data %>%
+  filter( price<300 ) %>%
+  ggplot( aes(x=price)) +
+    geom_density(fill="#69b3a2", color="#e9ecef", alpha=0.7, bw=2) +
+    ggtitle("Bandwidth: 2") +
+    theme_ipsum()
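The effect of the bandwidth can also be checked numerically with base R `density()`. This is a minimal sketch on made-up prices (an assumption, not the Airbnb dataset). It also assumes that the `bw` value passed to `geom_density()` plays the same role as the `bw` argument of `density()`.

```r
# Hypothetical prices (made-up values), clustered around 75 and 80
prices <- c(rep(80, 5), rep(75, 3), 76, 77, 78, 79, 81, 82, 83)

# Kernel density estimates with two bandwidths (Gaussian kernel by default)
smooth <- density(prices, bw = 10)
rough  <- density(prices, bw = 2)

# A smaller bandwidth follows the data more closely, so its peak is higher
max(rough$y) > max(smooth$y)
```

A large bandwidth spreads each observation over a wide range and flattens the curve, while a small one keeps local clusters visible at the cost of a bumpier estimate.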

-
- - - -

Going further

-
-

You can learn more about each type of graphic presented in this story in the dedicated sections. Click the icon below:

- - - - - - - - - - - - - - - -
-

Comments

-
-

Any thoughts on this? Found a mistake? Have another way to show the data? Please drop me a line on Twitter or in the comment section below:

-
+ +

Going further

+
+

+ You can learn more about each type of graphic presented in this + story in the dedicated sections. Click the icon below: +

+ + + + +
+
+
-
- - +
+

+ Dataviz decision tree +

+

+ Data To Viz is a + comprehensive classification of chart types organized by data + input format. Get a high-resolution version of our decision tree + delivered to your inbox now! +

+
+ +
+
+
+ High Resolution Poster +
+
- - - +
- - -  +  

A work by Yan Holtz for data-to-viz.com

@@ -423,28 +488,28 @@

Comments

gtag('config', 'UA-79254642-3'); - - -  - - - - -
- +  + + diff --git a/story/OneNum_files/figure-html/unnamed-chunk-2-1.png b/story/OneNum_files/figure-html/unnamed-chunk-2-1.png index 45a2f30..ba2290c 100644 Binary files a/story/OneNum_files/figure-html/unnamed-chunk-2-1.png and b/story/OneNum_files/figure-html/unnamed-chunk-2-1.png differ diff --git a/story/OneNum_files/figure-html/unnamed-chunk-3-1.png b/story/OneNum_files/figure-html/unnamed-chunk-3-1.png index 03f66ab..382300f 100644 Binary files a/story/OneNum_files/figure-html/unnamed-chunk-3-1.png and b/story/OneNum_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/story/OneNum_files/figure-html/unnamed-chunk-4-1.png b/story/OneNum_files/figure-html/unnamed-chunk-4-1.png index c328d87..738a469 100644 Binary files a/story/OneNum_files/figure-html/unnamed-chunk-4-1.png and b/story/OneNum_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/story/OneNum_files/figure-html/unnamed-chunk-5-1.png b/story/OneNum_files/figure-html/unnamed-chunk-5-1.png index 66f2737..5a06d4a 100644 Binary files a/story/OneNum_files/figure-html/unnamed-chunk-5-1.png and b/story/OneNum_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/story/libs/header-attrs-2.26/header-attrs.js b/story/libs/header-attrs-2.26/header-attrs.js new file mode 100644 index 0000000..dd57d92 --- /dev/null +++ b/story/libs/header-attrs-2.26/header-attrs.js @@ -0,0 +1,12 @@ +// Pandoc 2.9 adds attributes on both header and div. We remove the former (to +// be compatible with the behavior of Pandoc < 2.8). 
+document.addEventListener('DOMContentLoaded', function(e) { + var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); + var i, h, a; + for (i = 0; i < hs.length; i++) { + h = hs[i]; + if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 + a = h.attributes; + while (a.length > 0) h.removeAttribute(a[0].name); + } +}); diff --git a/story/template_story.html b/story/template_story.html index 3ff5834..334671e 100644 --- a/story/template_story.html +++ b/story/template_story.html @@ -573,70 +573,60 @@

Going further

- $endif$ + -
+ +
+
+
+

+ Dataviz decision tree +

+

+ Data To Viz is a + comprehensive classification of chart types organized by data + input format. Get a high-resolution version of our decision tree + delivered to your inbox now! +

+
-
-
-

- Dataviz decision tree -

-

- Data To Viz is a - comprehensive classification of chart types organized - by data input format. Get a high-resolution version of our - decision tree delivered to your inbox now! -

-
- -
-
-
- High Resolution Poster -
-
-
-
- - $for(include-after)$ $include-after$ $endfor$ $if(theme)$ - $if(toc_float)$ + + +
+
+ High Resolution Poster +
- $endif$ - +
+ + $for(include-after)$ $include-after$ $endfor$ $if(theme)$ $if(toc_float)$ + $endif$