Commit

First draft for second edition.
rafalab committed Jan 4, 2024
1 parent d9830b6 commit d15a5fb
Showing 163 changed files with 9,140 additions and 5,805 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,3 +9,5 @@ copy-qmds.R
crossref.sh
cover.png
/.quarto/
*.tex
*.pdf
13 changes: 13 additions & 0 deletions _quarto.yml
@@ -43,6 +43,7 @@ book:
- inference/clt.qmd
- inference/confidence-intervals.qmd
- inference/hypothesis-testing.qmd
- inference/bootstrap.qmd
- inference/models.qmd
- inference/bayes.qmd
- inference/hierarchical-models.qmd
@@ -83,6 +84,18 @@ format:
code-link: true
author-meta: Rafael A. Irizarry
callout-appearance: simple
pdf:
documentclass: krantz
classoption: [krantz2,10pt,twoside,onecolumn,final,openright]
include-in-header: preamble.tex
header-includes: |
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{makeidx}
\usepackage{multicol}
keep-tex: true

knitr:
opts_chunk:
318 changes: 162 additions & 156 deletions docs/highdim/dimension-reduction.html

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
60 changes: 33 additions & 27 deletions docs/highdim/intro-highdim.html
@@ -205,23 +205,29 @@
<a href="../inference/hypothesis-testing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Hypothesis testing</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../inference/bootstrap.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Bootstrap</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../inference/models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Data-driven models</span></span></a>
<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">Data-driven models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../inference/bayes.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">Bayesian statistics</span></span></a>
<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">Bayesian statistics</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../inference/hierarchical-models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">12</span>&nbsp; <span class="chapter-title">Hierarchichal Models</span></span></a>
<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">Hierarchichal Models</span></span></a>
</div>
</li>
</ul>
@@ -238,37 +244,37 @@
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/regression.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">13</span>&nbsp; <span class="chapter-title">Regression</span></span></a>
<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">Regression</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/multivariate-regression.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">14</span>&nbsp; <span class="chapter-title">Multivariate Regression</span></span></a>
<span class="menu-text"><span class="chapter-number">15</span>&nbsp; <span class="chapter-title">Multivariate Regression</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/measurement-error-models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">15</span>&nbsp; <span class="chapter-title">Measurement error models</span></span></a>
<span class="menu-text"><span class="chapter-number">16</span>&nbsp; <span class="chapter-title">Measurement error models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/treatment-effect-models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">16</span>&nbsp; <span class="chapter-title">Treatment effect models</span></span></a>
<span class="menu-text"><span class="chapter-number">17</span>&nbsp; <span class="chapter-title">Treatment effect models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/association-tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">17</span>&nbsp; <span class="chapter-title">Association tests</span></span></a>
<span class="menu-text"><span class="chapter-number">18</span>&nbsp; <span class="chapter-title">Association tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../linear-models/association-not-causation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">18</span>&nbsp; <span class="chapter-title">Association is not causation</span></span></a>
<span class="menu-text"><span class="chapter-number">19</span>&nbsp; <span class="chapter-title">Association is not causation</span></span></a>
</div>
</li>
</ul>
@@ -285,31 +291,31 @@
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../highdim/matrices-in-R.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">19</span>&nbsp; <span class="chapter-title">Matrices in R</span></span></a>
<span class="menu-text"><span class="chapter-number">20</span>&nbsp; <span class="chapter-title">Matrices in R</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../highdim/linear-algebra.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">20</span>&nbsp; <span class="chapter-title">Applied Linear Algebra</span></span></a>
<span class="menu-text"><span class="chapter-number">21</span>&nbsp; <span class="chapter-title">Applied Linear Algebra</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../highdim/dimension-reduction.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">21</span>&nbsp; <span class="chapter-title">Dimension reduction</span></span></a>
<span class="menu-text"><span class="chapter-number">22</span>&nbsp; <span class="chapter-title">Dimension reduction</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../highdim/regularization.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">22</span>&nbsp; <span class="chapter-title">Regularization</span></span></a>
<span class="menu-text"><span class="chapter-number">23</span>&nbsp; <span class="chapter-title">Regularization</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../highdim/matrix-factorization.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">23</span>&nbsp; <span class="chapter-title">Matrix factorization</span></span></a>
<span class="menu-text"><span class="chapter-number">24</span>&nbsp; <span class="chapter-title">Matrix Factorization</span></span></a>
</div>
</li>
</ul>
@@ -326,49 +332,49 @@
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/notation-and-terminology.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">24</span>&nbsp; <span class="chapter-title">Notation and Terminology</span></span></a>
<span class="menu-text"><span class="chapter-number">25</span>&nbsp; <span class="chapter-title">Notation and terminology</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/evaluation-metrics.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">25</span>&nbsp; <span class="chapter-title">Evaluation metrics</span></span></a>
<span class="menu-text"><span class="chapter-number">26</span>&nbsp; <span class="chapter-title">Evaluation metrics</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/conditionals.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">26</span>&nbsp; <span class="chapter-title">Conditional probabilities and expectations</span></span></a>
<span class="menu-text"><span class="chapter-number">27</span>&nbsp; <span class="chapter-title">Conditional probabilities and expectations</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/smoothing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">27</span>&nbsp; <span class="chapter-title">Smoothing</span></span></a>
<span class="menu-text"><span class="chapter-number">28</span>&nbsp; <span class="chapter-title">Smoothing</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/cross-validation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">28</span>&nbsp; <span class="chapter-title">Cross validation</span></span></a>
<span class="menu-text"><span class="chapter-number">29</span>&nbsp; <span class="chapter-title">Resampling methods</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/algorithms.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">29</span>&nbsp; <span class="chapter-title">Examples of algorithms</span></span></a>
<span class="menu-text"><span class="chapter-number">30</span>&nbsp; <span class="chapter-title">Examples of algorithms</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/ml-in-practice.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">30</span>&nbsp; <span class="chapter-title">Machine learning in practice</span></span></a>
<span class="menu-text"><span class="chapter-number">31</span>&nbsp; <span class="chapter-title">Machine learning in practice</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../ml/clustering.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">31</span>&nbsp; <span class="chapter-title">Clustering</span></span></a>
<span class="menu-text"><span class="chapter-number">32</span>&nbsp; <span class="chapter-title">Clustering</span></span></a>
</div>
</li>
</ul>
@@ -401,9 +407,9 @@ <h1 class="title">High dimensional data</h1>

</header>

<p>There is a variety of computational techniques and statistical concepts that are useful for analysis of datasets for which each observation is associated with a large number of numerical variables. In this chapter we provide a basic introduction to these techniques and concepts by describing matrix operations in R, dimension reduction, regularization, and matrix factorization. Handwritten digits data and movie recommendation systems serve as motivating examples.</p>
<p>A task that serves as motivation for this part of the book is quantifying the similarity between any two observations. For example, we might want to know how much two handwritten digits look like each other. However, note that each observations is associated with <span class="math inline">\(28 \times 28 = 784\)</span> pixels so we can’t simply use subtraction as we would do if our data was one dimensional. Instead, we will define observations as <em>points</em> in a <em>high-dimensional</em> space and mathematically define a <em>distance</em>. Many machine learning techniques, discussed in the next part of the book, require this calculation.</p>
<p>Additionally, this part of the book discusses dimension reduction. Here we search of data summaries that result in more manageable lower dimension versions of the data, but preserve most or all the <em>information</em> we need. Here too we can use distance between observations as specific challenge: we will reduce the dimensions summarize the data into lower dimensions, but in a way that preserves the distance between any two observations. We use <em>linear algebra</em> as a mathematical foundation for all the techniques presented here.</p>
<p>There is a variety of computational techniques and statistical concepts that are useful for analysis of datasets for which each observation is associated with a large number of numerical variables. In this chapter, we provide a basic introduction to these techniques and concepts by describing matrix operations in R, dimension reduction, regularization, and matrix factorization. Handwritten digits data and movie recommendation systems serve as motivating examples.</p>
<p>A task that serves as motivation for this part of the book is quantifying the similarity between any two observations. For example, we might want to know how much two handwritten digits look like each other. However, note that each observation is associated with <span class="math inline">\(28 \times 28 = 784\)</span> pixels so we can’t simply use subtraction as we would if our data was one dimensional. Instead, we will define observations as <em>points</em> in a <em>high-dimensional</em> space and mathematically define a <em>distance</em>. Many machine learning techniques, discussed in the next part of the book, require this calculation.</p>
<p>Additionally, this part of the book discusses dimension reduction. Here we search for data summaries that provide more manageable lower dimension versions of the data, but preserve most or all the <em>information</em> we need. We again use distance between observations as a specific example: we will summarize the data into lower dimensions, but in a way that preserves distance between any two observations. We use <em>linear algebra</em> as a mathematical foundation for all the techniques presented here.</p>



@@ -644,12 +650,12 @@ <h1 class="title">High dimensional data</h1>
<nav class="page-navigation">
<div class="nav-page nav-page-previous">
<a href="../linear-models/association-not-causation.html" class="pagination-link">
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">18</span>&nbsp; <span class="chapter-title">Association is not causation</span></span>
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">19</span>&nbsp; <span class="chapter-title">Association is not causation</span></span>
</a>
</div>
<div class="nav-page nav-page-next">
<a href="../highdim/matrices-in-R.html" class="pagination-link">
<span class="nav-page-text"><span class="chapter-number">19</span>&nbsp; <span class="chapter-title">Matrices in R</span></span> <i class="bi bi-arrow-right-short"></i>
<span class="nav-page-text"><span class="chapter-number">20</span>&nbsp; <span class="chapter-title">Matrices in R</span></span> <i class="bi bi-arrow-right-short"></i>
</a>
</div>
</nav>