<div align=left style="margin-left:10%; margin-top:5%">
<font face="LMRoman17-Regular" size=10 color="0068b4">
1<br>Introduction
</font>
</div>

---


# An Overview of Statistical Learning
Statistical learning tools can be classified as **supervised** or **unsupervised**.
In **supervised** learning, each observation pairs one or more inputs with a known (labeled) output, and the goal is to predict or estimate that output. In **unsupervised** learning, the inputs are unlabeled and there is no supervising output.
Three real-world data sets are considered in this book:
### Wage Data
We refer to this as the <font color='#be5100'>Wage</font> data set throughout this book.
It contains a number of factors that relate to wages for a group of men from the Atlantic region of the United States. In particular, we wish to understand how an employee’s <font color='#be5100'>age</font> and <font color='#be5100'>education</font>, as well as the calendar <font color='#be5100'>year</font>, are associated with his <font color='#be5100'>wage</font>.
### Stock Market Data
In certain cases we may wish to predict a non-numerical value—that is, a categorical or qualitative output. For example, in Chapter 4 we examine a stock market data set that contains the daily movements in the Standard & Poor’s 500
(S&P) stock index over a 5-year period between 2001 and 2005. We refer to this as the <font color='#be5100'>Smarket</font> data.
The goal is to predict whether the index will increase or decrease on a given day, using the past 5 days’ percentage changes in the index.
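
As a rough illustration of how this prediction problem can be laid out, the sketch below pairs each day's direction (the qualitative output) with the previous five days' percentage changes (the inputs). The return values are made-up illustrative numbers, not taken from <font color='#be5100'>Smarket</font>; the helper name `make_lagged_examples` and the labels `"Up"` and `"Down"` are assumptions made for this example only.

```python
# Hypothetical daily percentage changes (illustrative values, NOT Smarket data).
returns = [0.38, 0.96, 1.03, -0.62, 0.21, -0.19, 0.68, -0.45, 0.30]

N_LAGS = 5  # number of previous days used as inputs


def make_lagged_examples(series, n_lags):
    """Pair each day's direction with the previous n_lags percentage changes."""
    examples = []
    for t in range(n_lags, len(series)):
        # lags[0] is the previous day, lags[1] is two days back, and so on.
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        # Assumption: a change of exactly zero is labeled "Down".
        direction = "Up" if series[t] > 0 else "Down"
        examples.append((lags, direction))
    return examples


examples = make_lagged_examples(returns, N_LAGS)
for lags, direction in examples:
    print(lags, "->", direction)
```

The first five days have no complete history of five lags, so they only serve as inputs for later days and produce no example of their own.
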
### Gene Expression Data
Another important class of problems involves situations in which we only observe input variables, with no corresponding output. For example, in a marketing setting, we might have demographic information for a number of current or potential customers. We may wish to understand which types of customers are similar to each other by grouping individuals according to their observed characteristics. This is known as a **clustering** problem. Unlike in the previous examples, here we are not trying to predict an output variable.

We devote Chapter 12 to a discussion of statistical learning methods for problems in which no natural output variable is available. We consider the <font color='#be5100'>NCI60</font> data set, which consists of 6,830 gene expression measurements for each of 64 cancer cell lines. Instead of predicting a particular output variable, we are interested in determining whether there are groups, or clusters, among the cell lines based on their gene expression measurements. This is a difficult question to address, in part because there are thousands of gene expression measurements per cell line, making it hard to visualize the data.



# A Brief History of Statistical Learning

1. At the beginning of the **nineteenth** century, the method of *least squares* was developed, implementing the earliest form of what is now known as *linear regression*. The approach was first successfully applied to problems in astronomy.

   Linear regression is used for predicting **quantitative** values, such as an individual’s salary.

2. In **1936**, *linear discriminant analysis* was proposed to predict **qualitative** values, such as whether a patient survives or dies, or whether the stock market increases or decreases.

3. In the **1940s**, various authors put forth an alternative approach: *logistic regression*.

4. In the early **1970s**, the term *generalized linear model* was developed to describe an entire class of statistical learning methods that include both linear and logistic regression.

5. By the **1980s**, computing technology had improved sufficiently that *non-linear* methods were no longer computationally prohibitive. In the mid **1980s**, *classification and regression trees* were developed, followed shortly by *generalized additive models*.

6. *Neural networks* gained popularity in the **1980s**, and *support vector machines* arose in the **1990s**.

7. Since that time, statistical learning has emerged as a new subfield in statistics, focused on supervised and unsupervised modeling and prediction. In recent years, progress in statistical learning has been marked by the


# Notation and Simple Matrix Algebra

* $n$: the number of distinct data points, or observations, in our sample.

* $p$: the number of variables that are available for use in making predictions.

* $x_{ij}$: the value of the $j$th variable for the $i$th observation, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$.

* $\mathbf{X}$: an $n \times p$ matrix whose $(i, j)$th element is $x_{ij}$:

$$
\mathbf{X} = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np} \\
\end{pmatrix}
$$
<br>

* $x_i$: the rows of $\mathbf{X}$, which we write as $x_1, x_2, \ldots, x_n$. Here $x_i$ is a vector of length $p$, containing the $p$ variable measurements for the $i$th observation. That is,

$$
x_i = \begin{pmatrix}
x_{i1} \\
x_{i2} \\
\vdots \\
x_{ip} \\
\end{pmatrix}
$$
<br>

* $\mathbf{x}_j$: the columns of $\mathbf{X}$, which we write as $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p$. Here $\mathbf{x}_j$ is a vector of length $n$, containing the values of the $j$th variable for all $n$ observations. That is,

$$
\mathbf{x}_j = \begin{pmatrix}
x_{1j} \\
x_{2j} \\
\vdots \\
x_{nj} \\
\end{pmatrix}
$$

* Using this notation, the matrix can be written as $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p)$, or

  $\mathbf{X} = \begin{pmatrix}
  x_1^T \\
  x_2^T \\
  \vdots \\
  x_n^T \\
  \end{pmatrix}$
<br>

* The $^T$ notation denotes the transpose of a matrix or vector. So, for example,
$$
\mathbf{X}^T = \begin{pmatrix}
x_{11} & x_{21} & \cdots & x_{n1} \\
x_{12} & x_{22} & \cdots & x_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np} \\
\end{pmatrix}
$$
<br>

* In this text, a vector of length $n$ will always be denoted in lowercase bold, e.g.
$$
\mathbf{a} = \begin{pmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n \\
\end{pmatrix}
$$
However, vectors that are not of length $n$ (such as feature vectors of length $p$) will be denoted in lowercase normal font, e.g. $a$.
<br>

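* Matrix multiplication: Suppose that $\mathbf{A} \in \mathbb{R}^{r \times d}$ and $\mathbf{B} \in \mathbb{R}^{d \times s}$. Then the product $\mathbf{AB}$ is an $r \times s$ matrix whose $(i, j)$th element is $(\mathbf{AB})_{ij} = \sum_{k=1}^d a_{ik}b_{kj}$.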
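
The notation above can be tried out on a tiny example. The following is a minimal sketch in plain Python, assuming a made-up $3 \times 2$ data matrix; the helper names `row`, `column`, `transpose`, and `matmul` are hypothetical and written only for illustration. In practice a numerical library would normally be used instead of nested lists.

```python
# A small data matrix X with n = 3 observations (rows) and p = 2 variables
# (columns). The numbers are arbitrary illustrative values.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
n = len(X)     # number of observations
p = len(X[0])  # number of variables


def row(M, i):
    """x_i: the measurements for observation i (1-based, as in the text)."""
    return M[i - 1]


def column(M, j):
    """Bold x_j: the values of variable j for every observation (1-based)."""
    return [r[j - 1] for r in M]


def transpose(M):
    """Swap rows and columns: element (i, j) of M becomes element (j, i)."""
    return [list(c) for c in zip(*M)]


def matmul(A, B):
    """(AB)_{ij} = sum over k of a_{ik} * b_{kj}, for A (r x d) and B (d x s)."""
    d = len(B)
    if any(len(r) != d for r in A):
        raise ValueError("columns of A must equal rows of B")
    s = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(d)) for j in range(s)]
        for i in range(len(A))
    ]


x_2 = row(X, 2)       # second observation, a vector of length p
col_1 = column(X, 1)  # first variable, a vector of length n
Xt = transpose(X)     # p x n matrix
XtX = matmul(Xt, X)   # (p x n)(n x p) gives a p x p matrix

print(n, p)
print(x_2, col_1)
print(Xt)
print(XtX)
```

Python indexes from zero, so the helpers subtract one to match the 1-based subscripts used in the text. The shape check in `matmul` mirrors the requirement that the number of columns of the first matrix equals the number of rows of the second.
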

# Organization of This Book
1. [Chapter 1](https://colab.research.google.com/drive/1rVZoTW_6vTt1QZZA5Ngx-Ba8c-9qHD6C?usp=sharing) : Introduction
2. [Chapter 2](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : Introduces the basic terminology and concepts behind statistical learning and presents the K-nearest neighbor classifier
3. [Chapter 3](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : linear regression
4. [Chapter 4](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : logistic regression and linear discriminant analysis
5. [Chapter 5](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : cross-validation and the bootstrap
6. [Chapter 6](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : stepwise selection, ridge regression, principal components regression, and the lasso
7. [Chapter 7](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : non-linear methods
8. [Chapter 8](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : tree-based methods, including bagging, boosting, and random forests
9. [Chapter 9](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : support vector machines
10. [Chapter 10](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : deep learning
11. [Chapter 11](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : survival analysis
12. [Chapter 12](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : principal components analysis, K-means clustering, and hierarchical clustering
13. [Chapter 13](https://colab.research.google.com/drive/1I86AG8APgP4vls-HFjCVTDgJykFpnIsQ?usp=sharing) : multiple hypothesis testing

At the end of each chapter, we present one or more Python lab sections.

The ![picture](https://drive.google.com/uc?export=view&id=1DhoHFe5Uzd6VLYksRGAssvoc7cqq8qb-) symbol denotes sections or exercises that contain more challenging concepts. These can be easily skipped by readers who do not wish to delve as deeply into the material, or who lack the mathematical background.


# Data Sets Used in Labs and Exercises

Here we illustrate statistical learning methods using applications from marketing, finance, biology, and other areas.
The following table contains a summary of the data sets required to perform the labs and exercises.

| Name | Description |
| :---: | :---: |
| <font color='#be5100'>Auto</font> | Gas mileage, horsepower, and other information for cars. |
| <font color='#be5100'>Bikeshare</font> | Hourly usage of a bike sharing program in Washington, DC. |
| <font color='#be5100'>Boston</font> | Housing values and other information about Boston census tracts. |
| <font color='#be5100'>BrainCancer</font> | Survival times for patients diagnosed with brain cancer. |
| <font color='#be5100'>Caravan</font> | Information about individuals offered caravan insurance. |
| <font color='#be5100'>Carseats</font> | Information about car seat sales in 400 stores. |
| <font color='#be5100'>College</font> | Demographic characteristics, tuition, and more for USA colleges. |
| <font color='#be5100'>Credit</font> | Information about credit card debt for 400 customers. |
| <font color='#be5100'>Default</font> | Customer default records for a credit card company. |
| <font color='#be5100'>Fund</font> | Returns of 2,000 hedge fund managers over 50 months. |
| <font color='#be5100'>Hitters</font> | Records and salaries for baseball players. |
| <font color='#be5100'>Khan</font> |Gene expression measurements for four cancer types. |
| <font color='#be5100'>NCI60</font> | Gene expression measurements for 64 cancer cell lines. |
| <font color='#be5100'>NYSE</font> | Returns, volatility, and volume for the New York Stock Exchange. |
| <font color='#be5100'>OJ</font> | Sales information for Citrus Hill and Minute Maid orange juice. |
| <font color='#be5100'>Portfolio</font> | Past values of financial assets, for use in portfolio allocation. |
| <font color='#be5100'>Publication</font> | Time to publication for 244 clinical trials. |
| <font color='#be5100'>Smarket</font> | Daily percentage returns for S&P 500 over a 5-year period. |
| <font color='#be5100'>USArrests</font> | Crime statistics per 100,000 residents in the 50 states of the USA. |
| <font color='#be5100'>Wage</font> | Income survey data for men in the central Atlantic region of the USA. |
| <font color='#be5100'>Weekly</font> | 1,089 weekly stock market returns for 21 years. |
|  |  |
| |A list of data sets needed to perform the labs and exercises in this textbook.<br> All data sets are available in the <font color='#be5100'>ISLP</font> package, with the exception of <font color='#be5100'>USArrests</font>,<br> which is part of the base <font color='#be5100'>R</font> distribution, but accessible from <font color='#be5100'>Python</font>.|
