**1. Explain the difference between artificial intelligence, machine learning, and deep learning**

1. **Artificial Intelligence (AI):**
   - AI can be described as the effort to automate intellectual tasks normally performed by humans. As such, AI is a general field that encompasses machine learning and deep learning, but it also includes many approaches that do not involve any learning, such as hand-coded rule-based systems.

2. **Machine Learning (ML):**
   - Machine learning is a subset of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn from data and make predictions or decisions without being explicitly programmed to perform specific tasks. Machine learning discovers rules for executing a data processing task, given examples of what’s expected.

3. **Deep Learning (DL):**
   - Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations. The “deep” in “deep learning” isn’t a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations. How many layers contribute to a model of the data is called the depth of the model.

**2. Explain supervised learning and unsupervised learning. Give 3 application examples for each type of machine learning method. For each application, give examples of its corresponding loss function.**

**Supervised Learning:**

Supervised learning is a type of machine learning where the model is trained on a labeled dataset, meaning each input data point is associated with a corresponding target label. The goal is to learn a mapping function from input variables to output variables. During training, the model adjusts its parameters to minimize the discrepancy between its predictions and the actual labels in the training data.

Examples of supervised learning applications:

1. **Email Spam Classification**: In this application, the task is to classify emails into spam and non-spam categories. Each email is represented as a set of features (such as the presence of certain keywords) and labeled as either spam or non-spam. Loss Function: Cross-entropy loss.

2. **Medical Diagnosis**: Supervised learning can be used to diagnose medical conditions based on patient data such as symptoms, test results, and medical history. Each patient's data is associated with a diagnosis label (e.g., presence or absence of a disease). Loss Function: Binary cross-entropy loss or categorical cross-entropy loss depending on the number of classes.

3. **Stock Price Prediction**: Predicting stock prices based on historical stock data, market trends, and other relevant factors. The task involves predicting the future price of a stock given its past performance. Loss Function: Mean squared error (MSE) or mean absolute error (MAE) to measure the difference between predicted and actual stock prices.
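
To make the loss functions named above concrete, here is a minimal NumPy sketch. The labels, probabilities, and prices are made-up illustrative values, and the helper names (`binary_cross_entropy`, `mse`, `mae`) are hypothetical, not taken from any course code.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1} and predicted P(y = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # keep log() away from 0
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical spam labels (1 = spam) and predicted spam probabilities
spam_labels = np.array([1, 0, 1, 0])
spam_probs = np.array([0.9, 0.2, 0.6, 0.1])
bce = binary_cross_entropy(spam_labels, spam_probs)

# Hypothetical actual and predicted stock prices
actual = np.array([100.0, 102.0, 101.0])
predicted = np.array([101.0, 100.0, 101.0])

print(f"cross-entropy: {bce:.4f}")
print(f"MSE: {mse(actual, predicted):.4f}, MAE: {mae(actual, predicted):.4f}")
```

Cross-entropy penalizes confident wrong probabilities heavily, which suits classification. MSE penalizes large price errors more than MAE does, so MAE is the more outlier-tolerant choice for regression.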

**Unsupervised Learning:**

Unsupervised learning involves training a model on an unlabeled dataset, where the algorithm learns patterns and structures from the input data without explicit supervision. The objective is typically to find hidden patterns, groupings, or representations within the data.

Examples of unsupervised learning applications:

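1. **Customer Segmentation (Clustering)**: Grouping customers based on their purchasing behavior, demographics, or other features without any prior labels. This can help businesses target specific customer segments with tailored marketing strategies. Loss Function: for k-means, the within-cluster sum of squared distances between each point and its cluster centroid.

2. **Anomaly Detection**: Identifying unusual patterns or outliers in data that do not conform to expected behavior. This is useful in domains such as fraud detection in financial transactions or detecting abnormal behavior in industrial machinery. Loss Function: reconstruction error (for example, the mean squared error of an autoencoder or PCA reconstruction), where poorly reconstructed points are flagged as anomalies.

3. **Dimensionality Reduction**: Reducing the number of features in a dataset while preserving its essential information. Techniques like Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) are commonly used for this purpose. Loss Function: PCA minimizes the squared reconstruction error (equivalently, it maximizes the retained variance); t-SNE minimizes the Kullback-Leibler divergence between pairwise similarity distributions in the original and embedded spaces.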
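
Unsupervised methods optimize objectives as well, even without labels. The sketch below uses scikit-learn on synthetic data (two made-up "customer" groups, not the assignment's data) to show three such quantities: the k-means within-cluster sum of squares, the PCA reconstruction error, and a per-row reconstruction error that could serve as a simple anomaly score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two synthetic, well-separated "customer" groups described by 3 features
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
customers = np.vstack([group_a, group_b])

# Clustering: k-means minimizes the within-cluster sum of squared distances
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
wcss = kmeans.inertia_

# Dimensionality reduction: PCA minimizes squared reconstruction error
pca = PCA(n_components=1).fit(customers)
reconstructed = pca.inverse_transform(pca.transform(customers))
recon_mse = float(np.mean((customers - reconstructed) ** 2))

# Anomaly detection: rows with a large reconstruction error are candidates
anomaly_scores = np.sum((customers - reconstructed) ** 2, axis=1)

print(f"k-means within-cluster sum of squares: {wcss:.2f}")
print(f"PCA reconstruction MSE: {recon_mse:.4f}")
print(f"largest anomaly score: {anomaly_scores.max():.4f}")
```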

**3. Explain underfitting and overfitting, and optimization and generalization.**

1. **Underfitting**:

   Underfitting occurs when a model is too simple to capture the underlying structure of the data. The model fails to learn the patterns present in the training data, so it performs poorly on both the training data and unseen (validation or test) data. Underfitting typically happens when the model lacks capacity or has not been trained for long enough.

   Characteristics of underfitting:
   - High training error
   - High testing error
   - Poor performance on both training and unseen data

2. **Overfitting**:

   Overfitting occurs when a model learns the training data too well, capturing noise and random fluctuations rather than the underlying patterns. As a result, an overfitted model performs very well on the training data but generalizes poorly to new, unseen data.

   Characteristics of overfitting:
   - Low training error (model fits training data very well)
   - High testing error (poor performance on unseen data)
   - Model captures noise and irrelevant details from the training data
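
The characteristics above can be observed by fitting polynomials of increasing degree to noisy data. This is a minimal sketch on synthetic data: the cubic `true_fn`, the noise level, and the degrees 1, 3, and 12 are all illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    """Hypothetical ground-truth signal (a cubic)."""
    return x**3 - x

# Small noisy training sample and a larger held-out sample
x_fit = rng.uniform(-1, 1, 30)
y_fit = true_fn(x_fit) + rng.normal(0, 0.1, 30)
x_new = rng.uniform(-1, 1, 200)
y_new = true_fn(x_new) + rng.normal(0, 0.1, 200)

errors = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_fit, y_fit, deg=degree)  # least-squares fit
    train_mse = float(np.mean((model(x_fit) - y_fit) ** 2))
    test_mse = float(np.mean((model(x_new) - y_new) ** 2))
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

Training error can only decrease as the degree grows, because each model contains the lower-degree ones. A straight line (degree 1) cannot represent the cubic, so it underfits. A degree-12 polynomial has enough freedom to chase noise, so a gap between its training and test error is the typical sign of overfitting.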

**Optimization and Generalization**:

Optimization and generalization are two key concepts in machine learning that are closely related to the training and performance of models.

1. **Optimization**:

   Optimization refers to the process of adjusting the parameters of a model to minimize a loss function (or, equivalently, to maximize an objective). During the training phase, the goal of optimization is to find the set of parameters that yields the best performance on the training data. This is typically achieved through iterative algorithms such as gradient descent, where the parameters are repeatedly updated in the direction that reduces the loss.

2. **Generalization**:

   Generalization refers to how well a trained model performs on new, unseen data. The ultimate goal of machine learning is to build models that generalize well, meaning they can make accurate predictions on data that they haven't seen during training. A model that generalizes well has learned the underlying patterns in the data rather than memorizing specific instances from the training set. Generalization is crucial because the true performance of a model is evaluated on its ability to make predictions on unseen data.
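
The two ideas can be seen side by side in a short sketch. Gradient descent drives down the training loss (optimization), and the loss on held-out data measures generalization. The data, learning rate, and step count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data with known (hypothetical) weights
X_all = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_all = X_all @ true_w + rng.normal(scale=0.1, size=200)

# Hold out the last 50 rows to measure generalization
X_tr, y_tr = X_all[:150], y_all[:150]
X_te, y_te = X_all[150:], y_all[150:]

def mse(w, X, y):
    """Mean squared error of the linear model X @ w."""
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(3)
learning_rate = 0.1
initial_loss = mse(w, X_tr, y_tr)

for step in range(200):
    # Gradient of the training MSE with respect to w
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= learning_rate * grad  # move against the gradient

train_loss = mse(w, X_tr, y_tr)
test_loss = mse(w, X_te, y_te)
print(f"training loss: {initial_loss:.4f} -> {train_loss:.4f}")
print(f"held-out loss: {test_loss:.4f}")
```

Only the training rows enter the gradient, so the held-out loss is an honest estimate of performance on unseen data.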

In summary, underfitting and overfitting represent two types of poor model performance, whereas optimization and generalization are key concepts in the training and evaluation of machine learning models. Achieving a balance between fitting the training data well (optimization) and performing well on unseen data (generalization) is essential for building effective and robust models. Regularization helps reduce overfitting, while increasing model capacity or training for longer addresses underfitting; appropriate model selection balances the two and leads to better generalization performance.

**4. We consider the same stock price data used in class. This data set contains the daily price of 470 stocks from Feb 8, 2013 – Feb 7, 2018. We are interested in predicting whether the return of Adobe Inc (stock symbol: ADBE) will be negative, low, or high using the remaining 469 stocks. Let us use the first 4 years (2/8/2013-2/7/2017) for training and the last year (after 2/8/2017) for testing. Here, a stock has a negative return if the return is negative, a low return if its return falls between 0 and 2%, and a high return if its return exceeds 2%. Follow the steps below to achieve this goal:**
- a. Download the Python script from Blackboard used in class for analyzing the same data set. Use the code to convert the stock price data to return data. Modify the code to convert the continuous return to a categorical variable. In particular, the response variable takes the value 1 if negative, 2 if low, and 3 if high.
- b. Divide the data into training and test sets.
- c. Fit a multinomial regression model. The document below should be able to help. Note that the Python code I used in class works for the binary response, so you should not directly apply it here.
  https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
- d. Use your model in step 3 to predict the value of y on both the training and test data. Note that the model you built returns a probability vector of dimension 3 specifying the probabilities of y=1,2,3 at each time point. For example, if on 1/1/2018, your model returns a probability vector (0.9,0.1,0), then you should set yhat=1 because the probability vector achieves its maximum value at location 1.

**1. Loading the dataset and creating response variable**

In [1]:
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression

data_path = "https://www.dropbox.com/s/jtecbtn6mktwspr/stock.csv?dl=1"
df = pd.read_csv(data_path)

In [2]:
df.head()

Unnamed: 0,Time,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,...,XL,XLNX,XOM,XRAY,XRX,XYL,YUM,ZBH,ZION,ZTS
0,2013-02-08,45.08,14.75,78.9,67.8542,36.25,46.89,34.41,73.31,39.12,...,28.24,37.51,88.61,42.87,31.84,27.09,65.3,75.85,24.14,33.05
1,2013-02-11,44.6,14.46,78.39,68.5614,35.85,46.76,34.26,73.07,38.64,...,28.31,37.46,88.28,42.84,31.96,27.46,64.55,75.65,24.21,33.26
2,2013-02-12,44.62,14.27,78.6,66.8428,35.42,46.96,34.3,73.37,38.89,...,28.41,37.58,88.46,42.87,31.84,27.95,64.75,75.44,24.49,33.74
3,2013-02-13,44.75,14.66,78.97,66.7156,35.27,46.64,34.46,73.56,38.81,...,28.42,37.8,88.67,43.08,32.0,28.26,64.41,76.0,24.74,33.55
4,2013-02-14,44.58,13.99,78.84,66.6556,36.57,46.77,34.7,73.13,38.61,...,28.22,38.44,88.52,42.91,32.12,28.47,63.89,76.34,24.63,33.27


In [3]:
df.tail()

Unnamed: 0,Time,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,...,XL,XLNX,XOM,XRAY,XRX,XYL,YUM,ZBH,ZION,ZTS
1254,2018-02-01,72.83,53.88,117.29,167.78,116.34,99.29,62.18,160.46,199.38,...,36.79,72.49,89.07,60.73,32.75,74.84,83.98,128.19,54.98,77.82
1255,2018-02-02,71.25,52.1,113.93,160.5,115.17,96.02,61.69,156.9,195.64,...,38.25,70.64,84.53,60.06,31.63,75.66,82.63,125.79,54.15,76.78
1256,2018-02-05,68.22,49.76,109.86,156.49,109.51,91.9,58.73,151.83,190.27,...,37.68,66.97,79.72,58.54,31.38,72.66,79.8,123.18,51.65,73.83
1257,2018-02-06,68.45,51.18,112.2,163.03,111.2,91.54,58.86,154.69,194.47,...,37.34,68.99,78.35,58.46,30.85,71.33,80.58,122.3,52.52,73.27
1258,2018-02-07,68.06,51.4,109.93,159.54,113.62,94.22,58.67,155.15,192.34,...,42.0,66.97,76.94,58.3,31.18,71.79,80.13,120.78,54.02,73.86


In [4]:
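# Daily simple returns: (p_t - p_{t-1}) / p_{t-1} for every stock column
Prices_diff = df.iloc[:, 1:].diff()
Prices_diff.dropna(inplace=True)  # drop rows with missing values (the first row has no previous price)
X = pd.DataFrame(Prices_diff.to_numpy() / df.iloc[:-1, 1:].to_numpy(),
                 columns=df.columns[1:])

# Response: 1 if ADBE's return is negative, 2 if it lies in [0, 2%], 3 if it exceeds 2%
X['return_tier'] = [1 if r < 0 else 2 if r <= 0.02 else 3 for r in X['ADBE']]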

# take the return tier variable as y
y = X.loc[:,'return_tier']
# print(y)

X = X.drop(['ADBE', 'return_tier'], axis=1)
print(X.shape)

(1258, 469)
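
As a cross-check of the conversion above, here is a minimal sketch of the same two steps on a tiny made-up price series. It uses `pct_change` for the returns and `np.select` for the tiers, which are alternative calls rather than the ones used in the class script.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for one stock on five consecutive days
prices = pd.Series([100.0, 99.0, 100.0, 103.0, 103.0])

# Simple return: (p_t - p_{t-1}) / p_{t-1}; the first day has no return
returns = prices.pct_change().dropna()
r = returns.to_numpy()

# Conditions are checked in order: negative -> 1, [0, 2%] -> 2, otherwise -> 3
tiers = np.select([r < 0, r <= 0.02], [1, 2], default=3)

print(np.round(r, 4))  # [-0.01    0.0101  0.03    0.    ]
print(tiers)           # [1 2 3 2]
```

A return of exactly 0 falls in the "low" tier, matching the `>= 0` boundary used in the notebook.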


In [5]:
# the sample size (number of observations or data points)
n = X.shape[0]
print(n)

# the dimensionality (number of predictor or X variables)
p = X.shape[1]
print(p)

1258
469


In [6]:
# data training period: 2/8/2013 - 2/7/2017

df[(df.Time >='2013-02-08') & (df.Time <= '2017-02-07')].head()

Unnamed: 0,Time,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,...,XL,XLNX,XOM,XRAY,XRX,XYL,YUM,ZBH,ZION,ZTS
0,2013-02-08,45.08,14.75,78.9,67.8542,36.25,46.89,34.41,73.31,39.12,...,28.24,37.51,88.61,42.87,31.84,27.09,65.3,75.85,24.14,33.05
1,2013-02-11,44.6,14.46,78.39,68.5614,35.85,46.76,34.26,73.07,38.64,...,28.31,37.46,88.28,42.84,31.96,27.46,64.55,75.65,24.21,33.26
2,2013-02-12,44.62,14.27,78.6,66.8428,35.42,46.96,34.3,73.37,38.89,...,28.41,37.58,88.46,42.87,31.84,27.95,64.75,75.44,24.49,33.74
3,2013-02-13,44.75,14.66,78.97,66.7156,35.27,46.64,34.46,73.56,38.81,...,28.42,37.8,88.67,43.08,32.0,28.26,64.41,76.0,24.74,33.55
4,2013-02-14,44.58,13.99,78.84,66.6556,36.57,46.77,34.7,73.13,38.61,...,28.22,38.44,88.52,42.91,32.12,28.47,63.89,76.34,24.63,33.27


**2. Divide the data into training and test sets**

In [7]:
# divide the data into training and testing
# n_train is the number of daily returns assigned to the training period (through 2/7/2017)
n_train = 1006
n_test = n - n_train

X_train = X.iloc[:n_train, :]
y_train = y[:n_train]

X_test = X.iloc[n_train:, :]
y_test = y[n_train:]
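
The hard-coded split size could also be derived from the dates. The sketch below shows one way to do this on a made-up five-day frame. The column name `Time` mirrors the notebook, while the ticker `AAA` and the prices are hypothetical. It assumes each return is dated by the later of the two prices used to compute it.

```python
import pandas as pd

# Hypothetical miniature stand-in for the notebook's price table
prices = pd.DataFrame({
    "Time": ["2017-02-03", "2017-02-06", "2017-02-07", "2017-02-08", "2017-02-09"],
    "AAA": [10.0, 10.1, 10.2, 10.1, 10.3],
})

# The first date has no return, so return i belongs to date i + 1
return_dates = pd.to_datetime(prices["Time"]).iloc[1:].reset_index(drop=True)

# Count the returns that fall inside the training period
cutoff = pd.Timestamp("2017-02-07")
n_train_demo = int((return_dates <= cutoff).sum())

print(n_train_demo)  # 2
```

Applying the same count to the full data set is a quick way to confirm that a manually chosen split size matches the intended cutoff date.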

**3. Fitting multinomial regression model**

In [8]:
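from sklearn.linear_model import LogisticRegression

# multi_class='multinomial' requests a single softmax model over the three classes.
# Note: this argument is deprecated since scikit-learn 1.5, where the default
# solver already fits a multinomial model for multiclass targets.
logit_fitting_model = LogisticRegression(random_state=0, multi_class='multinomial').fit(X_train, y_train)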
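
For reference, the multinomial (softmax) model fitted here can be written as follows. The notation is an assumption introduced for illustration: $x$ is the vector of the 469 predictor returns on a given day, and $\beta_{k0}$ and $\beta_k$ are the intercept and coefficient vector for class $k$.

$$
P(y = k \mid x) = \frac{\exp\left(\beta_{k0} + x^\top \beta_k\right)}{\sum_{j=1}^{3} \exp\left(\beta_{j0} + x^\top \beta_j\right)}, \qquad k \in \{1, 2, 3\}
$$

$$
\hat{y} = \arg\max_{k \in \{1, 2, 3\}} P(y = k \mid x)
$$

The three probabilities sum to 1 for every day, and the predicted class is the one with the largest probability. Note that scikit-learn's `LogisticRegression` applies an L2 penalty by default (`C=1.0`), so the fitted coefficients are shrunk relative to an unpenalized fit.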

**4. Predicting values on both the training and test data**

In [9]:
yhat_train = logit_fitting_model.predict(X_train)
yhat_train_prob = logit_fitting_model.predict_proba(X_train)

yhat_test = logit_fitting_model.predict(X_test)
yhat_test_prob = logit_fitting_model.predict_proba(X_test)

In [10]:
print(yhat_train[:10])
print(yhat_train_prob[:10])

print(yhat_test[:10])
print(yhat_test_prob[:10])

[1 2 2 2 2 2 1 1 2 1]
[[0.55049891 0.40293902 0.04656207]
 [0.38221577 0.52733701 0.09044722]
 [0.39278209 0.53774917 0.06946874]
 [0.44237046 0.49417497 0.06345457]
 [0.38770461 0.55355773 0.05873765]
 [0.25692891 0.6277563  0.11531479]
 [0.90889522 0.08821783 0.00288695]
 [0.77126336 0.21461166 0.01412499]
 [0.14251007 0.6767797  0.18071023]
 [0.94315388 0.05528298 0.00156314]]
[2 2 2 2 2 2 1 2 2 1]
[[0.37236162 0.55149288 0.07614551]
 [0.22822421 0.62530094 0.14647485]
 [0.3289157  0.56240863 0.10867567]
 [0.34952556 0.54827241 0.10220203]
 [0.42756569 0.49245105 0.07998326]
 [0.25261295 0.64610262 0.10128443]
 [0.55920936 0.40032123 0.04046941]
 [0.38749245 0.53565552 0.07685203]
 [0.21356212 0.66712875 0.11930913]
 [0.53956104 0.42030917 0.04012979]]
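
The printed labels are the positions of the largest entries in each probability row, mapped to class labels. The sketch below checks this relationship on synthetic three-class data (not the stock data) and reproduces the assignment's (0.9, 0.1, 0) example. All names ending in `_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic features and labels in {1, 2, 3} driven by a noisy score
X_demo = rng.normal(size=(300, 4))
score = X_demo[:, 0] + 0.5 * rng.normal(size=300)
y_demo = np.where(score < -0.5, 1, np.where(score <= 0.5, 2, 3))

clf_demo = LogisticRegression(random_state=0).fit(X_demo, y_demo)
proba_demo = clf_demo.predict_proba(X_demo)  # one row of 3 probabilities per sample

# Column j of predict_proba corresponds to clf_demo.classes_[j]
yhat_from_proba = clf_demo.classes_[np.argmax(proba_demo, axis=1)]

# The example from the assignment: (0.9, 0.1, 0) gives yhat = 1
example = np.array([0.9, 0.1, 0.0])
yhat_example = int(np.array([1, 2, 3])[np.argmax(example)])

print(np.array_equal(yhat_from_proba, clf_demo.predict(X_demo)))
print(yhat_example)  # 1
```

Because `np.argmax` returns a zero-based position, indexing `classes_` with it is what converts positions 0, 1, 2 into the labels 1, 2, 3.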


**5. Calculating prediction error**

In [11]:
print(f'Prediction error on train set: {1 - np.mean(y_train == yhat_train)}')
print(f'Prediction error on test set: {1 - np.mean(y_test == yhat_test)}')

Prediction error on train set: 0.2813121272365805
Prediction error on test set: 0.33333333333333337
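
A single error rate hides which classes are being confused, and it is easier to interpret next to a naive baseline. The sketch below uses small made-up label arrays (not the notebook's predictions) to compute the misclassification rate, a confusion matrix, and the error of always predicting the most frequent class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted return tiers
y_true = np.array([1, 2, 2, 3, 1, 2, 3, 2])
y_pred = np.array([1, 2, 2, 2, 2, 2, 2, 1])

# Misclassification rate, as computed in the cell above
error_rate = float(1 - np.mean(y_true == y_pred))

# Rows are true classes, columns are predicted classes (order: 1, 2, 3)
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])

# Baseline: always predict the most frequent true class
_, counts = np.unique(y_true, return_counts=True)
baseline_error = float(1 - counts.max() / len(y_true))

print(error_rate)      # 0.5
print(cm)
print(baseline_error)  # 0.5
```

In this toy example the third column of the confusion matrix is all zeros: class 3 is never predicted. That pattern is worth checking for when one class, such as "high" returns, is rare.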
