%run Latex_macros.ipynb


<html>
<p style="font-size:32px"><strong>Classical Machine Learning</strong></p>
</html>

<html>
<p style="font-size:26px"><strong>Week 0</strong></p>
</html>
 

**Plan**
- Setting up your learning and programming environment


**Getting started**
- [Setting up your ML environment](Setup_NYU.ipynb)
    - [Choosing an ML environment](Choosing_an_ML_Environment_NYU.ipynb)
- [Quick intro to the tools](Getting_Started.ipynb)

<!--- #include (README.md) --->

# Week 1: Introduction
**Plan**
- Motivate Machine Learning
- Introduce notation used throughout the course
- Plan for initial lectures
    - *What*: Introduce, motivate a model
    - *How*: How to use a model: function signature, code (API)
    - *Why*:  Mathematical basis -- enhance understanding and ability to improve results

        
- [Course Overview](Course_overview_NYU.ipynb)
- [Machine Learning: Overview](ML_Overview.ipynb)
- [Intro to Classical ML](Intro_Classical_ML.ipynb)

## Using an AI Assistant

AI Assistants are often very good at coding.

But using one just to "get the answer" deprives you of its most valuable use:
- you can ask the Assistant *why* it chose to do something
- keep on asking
- treat it as a private tutor !

[Learning about the Landscape of ML](https://www.perplexity.ai/search/i-am-interested-in-the-landsca-_yO63NWfSGS8iHR5nyQYVA)

[Learning about KNN using an Assistant as a private tutor](https://www.perplexity.ai/search/using-python-and-sklearn-pleas-407oe3uzTXu1i9xEHVR2MQ)
- [Code answer from Assistant](KNN_illustration_Perplexity.ipynb)

## Week 2 (early start in Week 1)

We began covering the 
**Recipe, as illustrated by Linear Regression**

[The Recipe for Machine Learning: Solving a Regression task](Recipe_via_Linear_Regression.ipynb)
- A *process* for Machine Learning
    - Go through the methodical, multi-step process
        - Quick first pass, followed by Deeper Dives
 

# Week 2: Regression task

**Plan**

We will learn the Recipe for Machine Learning, a disciplined approach to solving problems in Machine Learning.

We will illustrate the Recipe while, at the same time,
introducing a model for the Regression task: Linear Regression.

Our coverage of the Recipe will be rapid and shallow (we use an extremely simple example for illustration).

I highly recommend reviewing and understanding
this [Geron notebook](external/handson-ml2/02_end_to_end_machine_learning_project.ipynb)
in order to acquire a more in-depth appreciation of the Recipe.

<table>
    <tr>
        <th><center>Recipe for Machine Learning</center></th>
    </tr>
    <tr>
        <td><img src="images/W1_L3_S4_ML_Process.png" width="100%"></td>
    </tr>
</table>



**Recipe, as illustrated by Linear Regression**

[The Recipe for Machine Learning: Solving a Regression task (continued)](Recipe_via_Linear_Regression.ipynb#Create-a-test-set)

- A *process* for Machine Learning
    - Go through the methodical, multi-step process
        - Quick first pass, followed by Deeper Dives
     
**Fitting a model: details**

Recall: fitting a model (finding the optimal values for its parameters) is accomplished by minimizing a Loss function.

Let's examine a typical Loss function for Regression
- [Regression: Loss Function](Linear_Regression_Loss_Function.ipynb)
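As a minimal sketch, assuming the typical Regression Loss is the Mean Squared Error (MSE) and using made-up synthetic data: the Loss can be computed directly, and the least-squares parameters returned by `numpy.linalg.lstsq` have a lower Loss than any perturbed parameter vector.

```python
import numpy as np

def mse_loss(X, y, theta):
    """Mean Squared Error of the linear model y_hat = X @ theta."""
    residuals = y - X @ theta
    return np.mean(residuals ** 2)

# Synthetic data (illustrative only): y = 1 + 2x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

# Design matrix with a constant column for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the parameters that minimize the MSE
theta_best, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any other parameter vector has a larger Loss
theta_other = theta_best + np.array([0.5, -0.1])
loss_best = mse_loss(X, y, theta_best)
loss_other = mse_loss(X, y, theta_other)
print(theta_best, loss_best, loss_other)
```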

**Iterative training: when to stop**

Increasing the number of parameters of a model improves in-sample fit (reduces Loss) but may compromise
out-of-sample prediction (generalization).

We examine the issues of having too many/too few parameters.
- [When to stop iterating: Bias and Variance](Bias_and_Variance.ipynb)
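A small sketch of the trade-off, assuming polynomial models fit to synthetic noisy sine data (the data and degrees are illustrative choices): each extra degree adds a parameter, so the in-sample MSE can only go down, while the out-of-sample MSE need not follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic data: a sine curve plus noise (illustrative only)."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)   # stand-in for out-of-sample data

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

train_err, test_err = {}, {}
for degree in [1, 3, 9]:
    # Fit a polynomial with (degree + 1) parameters by least squares
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_err[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree={degree}  train MSE={train_err[degree]:.3f}  test MSE={test_err[degree]:.3f}")
```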

**Get the data: Fundamental Assumption of Machine Learning**

- [Getting *good* training examples](Recipe_Training_data.ipynb)

**Regression: final thoughts (for now)**

- [Regression: coda](Regression_coda.ipynb)

**Deeper dives**
- [Fine tuning techniques](Fine_tuning.ipynb)

## Recipe "Prepare the Data" step: Transformations

We discuss the importance of adding *synthetic* features to our Linear Regression example
- and *preview* the *mechanical* process of creating these features via *Transformations*

**Transformations**
 - [Prepare Data: Intro to Transformations](Prepare_data_Overview.ipynb)
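A minimal sketch of why a *synthetic* feature can matter, assuming synthetic data with a quadratic relationship: Linear Regression on the raw feature `x` alone fits poorly, while adding `x**2` via `sklearn`'s `PolynomialFeatures` lets the same (still linear in its parameters) model capture the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(100, 1))          # raw feature, shape (n_samples, 1)
y = 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=100)

# Raw feature only: a straight line cannot follow a parabola
raw_model = LinearRegression().fit(x, y)
r2_raw = raw_model.score(x, y)

# Raw feature plus the synthetic feature x**2
poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
poly_model.fit(x, y)
r2_poly = poly_model.score(x, y)

print(f"R^2 raw: {r2_raw:.3f}   R^2 with x**2: {r2_poly:.3f}")
```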

## Validation

Our test dataset can be used only once, yet
- we have an iterative process for developing models
- each iteration requires a proxy for out-of-sample data to use in the Performance Metric

The solution: create a proxy for out-of-sample data from a *subset* of the training data.


- [Validation and Cross-Validation](Recipe_via_Linear_Regression.ipynb#Validation-and-Cross-Validation)
- [Avoiding cheating in Cross-Validation](Prepare_data_Overview.ipynb#Using-pipelines-to-avoid-cheating-in-cross-validation)
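A minimal sketch of the idea, assuming `sklearn` and its bundled diabetes dataset (the `Ridge` model and 5 folds are illustrative choices): putting the scaler inside a `Pipeline` means `cross_val_score` refits it on each training fold only, so the validation fold never influences the preprocessing.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# The scaler and the model are fit together, fold by fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])

# 5-fold CV: each fold's validation subset plays the role of out-of-sample data
scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
print("Mean validation MSE:", -scores.mean())

# The "cheating" version, for contrast (do NOT do this):
#   X_scaled = StandardScaler().fit_transform(X)   # statistics computed on ALL rows
#   cross_val_score(Ridge(), X_scaled, y, cv=5)    # validation folds leaked into the scaling
```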

## Using an AI Assistant

[Learning about Linear Regression using an Assistant as private tutor](https://www.perplexity.ai/search/using-python-sklearn-and-matpl-vTYy7oGdRQ6upR5L5OSjrg)
- [Code Answer from Assistant](LinearRegression_Illustration_Perplexity.ipynb)

## Week 3 (early start in Week 1)

**Classification intro**
- [Classification: Overview](Classification_Overview.ipynb)
- [Classification and Categorical Variables](Classification_Notebook_Overview.ipynb)
    - [linked notebook](Classification_and_Non_Numerical_Data.ipynb)

**Categorical variables** (contained as subsections of Classification and Categorical Variables)

We examine the proper treatment of categorical variables (target or feature).

Along the way, we run into a subtle difficulty: the Dummy Variable Trap.

- [Classification and Categorical Variables: Categorical Variables](Classification_Notebook_Overview.ipynb#Categorical-variables)
    - [Categorical variables, One Hot Encoding (OHE)](Categorical_Variables.ipynb)
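A small sketch of One Hot Encoding and the Dummy Variable Trap, using `pandas.get_dummies` on a hypothetical `color` feature: with one indicator per level the columns always sum to 1, which makes them perfectly collinear with an intercept column; dropping one (reference) level avoids this.

```python
import pandas as pd

# Hypothetical categorical feature with 3 levels
df = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

# Full One Hot Encoding: one indicator column per level (levels sorted alphabetically)
full = pd.get_dummies(df["color"], prefix="color", dtype=int)
print(full.columns.tolist())

# Each row has exactly one 1, so the columns sum to 1 in every row:
# together with a constant (intercept) column, the features are perfectly collinear
print(full.sum(axis=1).tolist())

# Dropping the first level (the reference category) avoids the Dummy Variable Trap
reduced = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=int)
print(reduced.columns.tolist())
```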

# Week 3: Classification task

**Non-feature dimensions**

In response to questions about Assignment 1,
- we will clarify the limitations of our current tools in handling *timeseries* data.


[Non-feature dimensions: preview](Non-feature_dimensions_preview.ipynb)
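A minimal sketch of the limitation, assuming a hypothetical dataset of `n` examples, each a timeseries of `T` time steps with `F` features per step: `sklearn` estimators expect a 2D array of shape `(n_samples, n_features)`, so the time dimension must be eliminated, for example by flattening it into the feature dimension. The model then treats "feature 0 at step 1" and "feature 0 at step 2" as unrelated columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

n, T, F = 100, 10, 3                        # hypothetical sizes
rng = np.random.default_rng(0)
X_seq = rng.normal(size=(n, T, F))          # shape (examples, time steps, features)
y = X_seq[:, -1, 0] + rng.normal(scale=0.1, size=n)   # target depends on the last step

# sklearn requires 2D input: flatten each (T, F) example into T*F "features"
X_flat = X_seq.reshape(n, T * F)
model = LinearRegression().fit(X_flat, y)
print(X_flat.shape)
```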



**Plan**
- We introduce a model for the Classification task: Logistic Regression
- How to deal with Categorical (non-numeric) variables
    - classification target
    - features

**Classification intro**
- [Classification: Overview](Classification_Overview.ipynb)  **Covered last week**
- [Classification and Categorical Variables (continued)](Classification_and_Non_Numerical_Data.ipynb#Recipe-Step-B:-Exploratory-Data-Analysis-(EDA))
    - [linked notebook](Classification_and_Non_Numerical_Data.ipynb)

**Categorical variables** (contained as subsections of Classification and Categorical Variables)

We examine the proper treatment of categorical variables (target or feature).

Along the way, we run into a subtle difficulty: the Dummy Variable Trap.

- [Classification and Categorical Variables: Categorical Variables](Classification_Notebook_Overview.ipynb#Categorical-variables)
    - [Categorical variables, One Hot Encoding (OHE)](Categorical_Variables.ipynb)

**Multinomial Classification**

We generalize Binary Classification to classification into more than two classes.

- [Multinomial Classification](Multinomial_Classification.ipynb)
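One common way to go from two classes to `K` classes (assumed here: the softmax used by multinomial Logistic Regression) converts a vector of `K` scores into a probability distribution over the classes. A minimal sketch with hypothetical scores:

```python
import numpy as np

def softmax(scores):
    """Convert a vector of K class scores into K probabilities that sum to 1."""
    shifted = scores - np.max(scores)      # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])        # hypothetical scores for 3 classes
probs = softmax(scores)
predicted_class = int(np.argmax(probs))   # the class with the highest probability
print(probs.round(3), predicted_class)
```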

**Error Analysis**

We can only improve our model's out-of-sample Performance Metric
- by diagnosing the in-sample errors
- that is the goal of the Error Analysis step of the Recipe
- We explain Error Analysis for the Classification Task, with a detailed example
- How Training Loss can be improved

The conversion of a probability (e.g., model output) to a Class (categorical variable) for Classification
- often involves the comparison of a probability to a threshold
- we show how varying the threshold changes the conditional Performance Metric for Classification
    - the threshold is a hyper-parameter, thus this is a kind of Fine-Tuning
  
- [Error Analysis](Error_Analysis_Overview.ipynb)
    - [linked notebook](Error_Analysis.ipynb)
        - Summary statistics
        - Conditional statistics
    - [Worked example](Error_Analysis_MNIST.ipynb) **Deferred**

- [Loss Analysis: Using training loss to improve models](Training_Loss.ipynb)
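A small sketch of treating the threshold as a hyper-parameter, using hypothetical predicted probabilities and labels: raising the threshold makes the classifier more conservative, which here raises Precision and lowers Recall. (`sklearn.metrics.precision_score` and `recall_score` compute the same quantities; they are written out by hand for clarity.)

```python
import numpy as np

# Hypothetical model outputs: probability of the Positive class, and the true labels
probs  = np.array([0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05])
y_true = np.array([1,    1,    0,    1,    1,    0,    0,    0,    1,    0])

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

results = {}
for threshold in [0.3, 0.5, 0.8]:
    y_pred = (probs >= threshold).astype(int)    # probability -> class
    results[threshold] = precision_recall(y_true, y_pred)
    p, r = results[threshold]
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```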

**Classification and Categorical variables wrapup**

- [Classification Loss Function](Classification_Loss_Function.ipynb)
- [Baseline model for Classification](Classification_Baseline_Model.ipynb)
- [OHE issue: Dummy variable trap](Dummy_Variable_Trap.ipynb)
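A common Loss for binary Classification is the binary cross-entropy (also called log loss). Assuming that is the Loss in question, a minimal sketch computes it by hand on hypothetical predictions and checks it against `sklearn.metrics.log_loss`:

```python
import numpy as np
from sklearn.metrics import log_loss

def binary_cross_entropy(y_true, p):
    """Average binary cross-entropy: -[y log p + (1 - y) log(1 - p)]."""
    p = np.clip(p, 1e-15, 1 - 1e-15)        # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.95, 0.4])    # hypothetical predicted P(class = 1)

ours = binary_cross_entropy(y_true, p)
theirs = log_loss(y_true, p)                 # sklearn's implementation, for comparison
print(ours, theirs)
```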


**Classification: final thoughts (for now)**

- [Classification: coda](Classification_coda.ipynb)

**Deeper dives**
- [Log odds](Classification_Log_Odds.ipynb)
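In Logistic Regression the linear score is interpreted as the log odds of the Positive class, and the sigmoid maps it back to a probability. A minimal sketch of the two functions being inverses of each other:

```python
import numpy as np

def log_odds(p):
    """Log odds (logit) of a probability p: log(p / (1 - p))."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps a score back to a probability."""
    return 1 / (1 + np.exp(-z))

p = np.array([0.1, 0.5, 0.9])
z = log_odds(p)        # 0.5 maps to log odds 0; 0.1 and 0.9 are symmetric about 0
p_back = sigmoid(z)
print(z, p_back)
```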

# Week 4: Transformations

**Plan**

Now you know how to create models.  What happens if the Performance Metric for your model
is disappointing ?

The first step is recognizing the issue, and diagnosing it.  That is the role of Error Analysis.

The second step is attempting to improve Performance.  Quite often we will need to perform
*Feature Engineering*.

We explain
- why it is often necessary to create *synthetic* features to augment or replace *raw* features
- the mechanical process in `sklearn` that makes the application of transformations easy and consistent

## Error Analysis: worked example (deferred from prior week)

- [Error Analysis](Error_Analysis_Overview.ipynb)
    - [linked notebook](Error_Analysis.ipynb)
        - Summary statistics
        - Conditional statistics
    - [Worked example](Error_Analysis_MNIST.ipynb)
- [Loss Analysis: Using training loss to improve models](Training_Loss.ipynb)




## Transformations: the "why"

Part of becoming a better Data Scientist is transforming raw features into more useful synthetic features.

We focus on the necessity (the "why"): transforming raw data into something that tells a story.

We will then discuss the mechanics of Transformations (how to use `sklearn` to implement transformation Pipelines).



<!--- #include (nvda_normalization_data.csv) --->
<!--- #include (MORTGAGE30US.csv) --->

- [Becoming a successful Data Scientist](Becoming_a_successful_Data_Scientist.ipynb)
- [Transformations: overview](Transformations_Overview.ipynb)
    - linked notebooks:
        - [Transformations: adding a missing feature](Transformations_Missing_Features.ipynb)

##  Transformations: the "how"

Having hopefully motivated the use of transformations in theory
- we turn to the *mechanical* process of creating these features via *Transformations in `sklearn`*

**Transformations**
 - [Prepare Data: Intro to Transformations](Prepare_data_Overview.ipynb)
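A minimal sketch of the mechanics, assuming a hypothetical DataFrame with two numeric columns and one categorical column (`ColumnTransformer` and `Pipeline` are real `sklearn` classes; the data and column names are made up). Each transformation applies only to the columns it names, and the same fitted transformations are reused at predict time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: two numeric features and one categorical feature
X = pd.DataFrame({
    "income": [40, 85, 62, 120, 33, 95],
    "age": [25, 47, 35, 52, 23, 41],
    "region": ["north", "south", "south", "east", "north", "east"],
})
y = [0, 1, 0, 1, 0, 1]

# Scale the numeric columns; One Hot Encode the categorical column
preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["income", "age"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

# Preprocessing and model are fit together, and applied consistently when predicting
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
clf.fit(X, y)

X_transformed = clf.named_steps["prep"].transform(X)
print(X_transformed.shape)   # 2 scaled numeric columns + 3 one-hot columns
```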

### Transformations: Avoiding cheating when using Cross-Validation

Our test dataset can be used only once, yet
- we have an iterative process for developing models
- each iteration requires a proxy for out-of-sample data to use in the Performance Metric

The solution: create a proxy for out-of-sample data from a *subset* of the training data.

- [Validation and Cross-Validation](Recipe_via_Linear_Regression.ipynb#Validation-and-Cross-Validation)
(**Covered in week 1**)
- [Avoiding cheating in Cross-Validation](Prepare_data_Overview.ipynb#Using-pipelines-to-avoid-cheating-in-cross-validation)




# Week 5: Other Classification models

## Imbalanced data
- [Imbalanced data](Imbalanced_Data.ipynb)

## More models for classification

**Plan**
- More models: Decision Trees, Naive Bayes, Support Vector Classifier
    - Different flavor: more procedural, less mathematical
    - Decision Trees: a model with *non-linear* boundaries
- Ensembles
    - Bagging and Boosting
    - Random Forests
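A small sketch of what a *non-linear* boundary buys, assuming synthetic "XOR" data (class 1 in quadrants I and III): no straight line separates the classes, so Logistic Regression does poorly, while an unrestricted Decision Tree carves the plane into axis-aligned regions. Accuracies below are in-sample; an unrestricted tree can always fit its training data perfectly, which is also why trees are prone to overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic XOR data: class 1 in quadrants I and III, class 0 in II and IV
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# A linear boundary cannot separate the quadrants
linear_acc = LogisticRegression().fit(X, y).score(X, y)

# A Decision Tree builds a piecewise, axis-aligned (non-linear) boundary
tree_acc = DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y)

print(f"Logistic Regression in-sample accuracy: {linear_acc:.2f}")
print(f"Decision Tree in-sample accuracy:       {tree_acc:.2f}")
```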

**Decision Trees, Ensembles**

- [Decision Trees: Overview](Decision_Trees_Overview.ipynb)
- [Decision Trees](Decision_Trees_Notebook_Overview.ipynb)
    - [linked notebook](Decision_Trees.ipynb)
- [Trees, Forests, Ensembles](Ensembles.ipynb)


**Naive Bayes**
- [Naive Bayes](Naive_Bayes.ipynb)


**Support Vector Classifiers**
- [Support Vector Machines: Overview](SVM_Overview.ipynb)
- [SVC Loss function](SVM_Hinge_Loss.ipynb)
- [SVC: Large Margin Classification](SVM_Large_Margin.ipynb)  
- [SVM: Kernel Transformations](SVM_Kernel_Functions.ipynb)
- [SVM Wrapup](SVM_Coda.ipynb)

**Classification: final thoughts**

- [Classification: coda -- review again](Classification_coda.ipynb)


## Using an AI Assistant to learn about other models for classification
- [SVC conversation](https://www.perplexity.ai/search/what-is-the-relationship-betwe-Pq8r22pISH.gUGmM1gkgbg#4)


# Week 6: Unsupervised Learning

## More models for classification (continued)

**SVC: review**

- [SVC: Key points](SVM_Large_Margin.ipynb#SVC:-Key-points)

- [SVC conversation](https://www.perplexity.ai/search/what-is-the-relationship-betwe-Pq8r22pISH.gUGmM1gkgbg#4)


**SVM (deferred from last week)**
- [SVM: Kernel Transformations](SVM_Kernel_Functions.ipynb)
- [SVM Wrapup](SVM_Coda.ipynb)


**Naive Bayes** (deferred from last week)
- [Naive Bayes](Naive_Bayes.ipynb)

**Classification coda**
- [Classification: probability distribution over classes](Classification_coda.ipynb#Output:-probabilities-or-just-classes-?)

## Unsupervised Learning
**Unsupervised Learning: PCA**
- [Unsupervised Learning: Overview](Unsupervised_Overview.ipynb)
- [PCA Notebook Overview](Unsupervised_Notebook_Overview.ipynb)
    - [linked notebook](Unsupervised.ipynb)
- [PCA in Finance](PCA_Yield_Curve_Intro.ipynb)

**Unsupervised Learning: PCA** (continued)
- [Importance of number of components: visualization](Unsupervised.ipynb#Visualizing-the-fidelity-of-the-reduced-dimension-representation)
- [Interpreting the components](Unsupervised.ipynb#Can-we-interpret-the-components-?)
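A minimal sketch of how the number of components affects fidelity, using `sklearn`'s bundled digits dataset (an assumption; the linked notebook may use different data): project to `k` dimensions, map back with `inverse_transform`, and measure the reconstruction error. Keeping all 64 components reproduces the data exactly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 1797 images, 64 pixel features each

errors = {}
for k in [2, 10, 30, 64]:
    pca = PCA(n_components=k).fit(X)
    X_reduced = pca.transform(X)                      # shape (n_samples, k)
    X_reconstructed = pca.inverse_transform(X_reduced)
    errors[k] = np.mean((X - X_reconstructed) ** 2)   # fidelity of the k-dim representation
    print(f"k={k:2d}  variance explained={pca.explained_variance_ratio_.sum():.3f}  "
          f"reconstruction MSE={errors[k]:.3f}")
```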

# Week 7: DL Week 1 Introduction to Neural Networks and Deep Learning

## Bridge between Classical ML and Deep Learning

**Gradient Descent** 

Machine Learning is based on minimization of a Loss Function.  Gradient Descent is one algorithm
to achieve that.
- [Gradient Descent](Gradient_Descent.ipynb)
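A minimal sketch, assuming the MSE Loss of Linear Regression, synthetic data, and a hand-picked learning rate: repeatedly step against the gradient, then compare with the closed-form least-squares answer.

```python
import numpy as np

# Synthetic regression data (illustrative): y = 3 - 2x + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 - 2.0 * x + rng.normal(scale=0.1, size=200)
X = np.column_stack([np.ones_like(x), x])     # constant column for the intercept

theta = np.zeros(2)          # initial parameters
learning_rate = 0.1
for step in range(2000):
    residuals = X @ theta - y
    gradient = 2 * X.T @ residuals / len(y)   # gradient of the MSE Loss w.r.t. theta
    theta -= learning_rate * gradient         # step "downhill"

# Compare with the closed-form least-squares solution
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta, theta_exact)
```

The learning rate is a hyper-parameter: too large and the iterates diverge, too small and convergence is slow.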


**Recommender Systems (Pseudo SVD)**

How does Amazon/Netflix/etc. recommend products/films to us ?  We describe a method similar to SVD
that is solved using Gradient Descent.

Creating a custom Loss Function and minimizing it via Gradient Descent is a recurring
theme in the Deep Learning second half of the course.

- [Recommender Systems](Recommender_Systems.ipynb)
- [Preview: Some cool Loss functions](Loss_functions.ipynb#Loss-functions-for-Deep-Learning:-Preview)
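A minimal sketch of the idea under stated assumptions (synthetic rank-2 "ratings", a Loss of squared error on the *observed* entries only plus L2 regularization; the linked notebook's formulation may differ): factor the ratings matrix as `U @ V.T` by Gradient Descent, then read predictions for the missing entries off the product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings matrix (users x items) that truly has rank 2 (illustrative only)
n_users, n_items, rank = 6, 5, 2
R = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))

# Only some ratings are observed: mask[i, j] = 1 if user i rated item j
mask = (rng.uniform(size=R.shape) < 0.7).astype(float)

# Factors to learn, so that R is approximated by U @ V.T
U = 0.1 * rng.normal(size=(n_users, rank))
V = 0.1 * rng.normal(size=(n_items, rank))
lam, learning_rate = 0.01, 0.01

def loss(U, V):
    """Squared error on OBSERVED entries only, plus L2 regularization."""
    errors = mask * (R - U @ V.T)
    return 0.5 * np.sum(errors ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

initial_loss = loss(U, V)
for step in range(5000):
    errors = mask * (R - U @ V.T)      # unobserved entries contribute nothing
    grad_U = -errors @ V + lam * U     # both gradients use the current U and V
    grad_V = -errors.T @ U + lam * V
    U -= learning_rate * grad_U
    V -= learning_rate * grad_V
final_loss = loss(U, V)

# Predictions for the unobserved entries are the "recommendations"
predictions = U @ V.T
print(initial_loss, final_loss)
```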

**Deeper Dives**


- [Other matrix factorization methods](Unsupervised_Other_Factorizations.ipynb)


## Classical ML: deeper dives

**Loss functions: mathematical basis** (deferred)

Where do the Loss functions of Classical Machine Learning come from ?  We take a brief mathematical
detour into Loss functions.

- [Entropy, Cross Entropy, and KL Divergence](Entropy_Cross_Entropy_KL_Divergence.ipynb)
- [Loss functions: the math](Loss_functions.ipynb)
    - Maximum likelihood
    - Preview: custom loss functions and Deep Learning
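The three quantities are linked by a standard identity: cross entropy equals the entropy of the true distribution `p` plus the KL divergence $D_{KL}(p \| q)$. Since the entropy of `p` does not depend on the model's distribution `q`, minimizing cross entropy over `q` is the same as minimizing the KL divergence. A small numerical check with hypothetical distributions:

```python
import numpy as np

def entropy(p):
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """D_KL(p || q); always >= 0, and 0 only when q == p."""
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])   # "true" distribution (hypothetical)
q = np.array([0.5, 0.3, 0.2])   # model's distribution (hypothetical)

print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```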

**Deeper Dives**
- [Linear Regression in more depth](Linear_Regression_fitting.ipynb)
- [Interpretation: Linear Models](Linear_Model_Interpretation.ipynb)
- [Missing data: clever ways to impute values](Missing_Data.ipynb)
- [Feature importance](Feature_Importance.ipynb)
- [SVC Loss function derivation](SVM_Derivation.ipynb)

## Deep Learning: Introduction

**Plan**

Deep Learning/Neural networks

- [Set up your Tensorflow environment](Tensorflow_setup.ipynb)
- [Neural Networks Overview](Neural_Networks_Overview.ipynb)


**Neural network: practical**
- Coding Neural Networks:  Keras
    - [Intro to Keras](Keras_intro.ipynb)

    - **Note**
        - If you have problems using the `plot_model` function in Keras on your local machine: see [here](Setup_ML_Environment_NYU.ipynb#Tools-for-visualization-of-graphs-(optional)) for a fix.

    - Linked notebooks
    <!--- #include (DNN_Keras_example.ipynb) --->
        - [DNN Keras example](DNN_Keras_example.ipynb) **local machine**
        - [DNN Keras example Notebook from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/DNN_Keras_example.ipynb) (**Google Colab**)


- Practical Colab
<!--- The Colab notebook imports some modules; make sure they are in the repo --->
<!--- #include (neural_net_helper.py) --->
<!--- The Colab notebook imports some modules; make sure they are in the repo --->
<!--- #include (Colab_practical.ipynb)) --->
<!--- #include (CommonLib.py) --->
   - **Colab**: [Practical Colab Notebook from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/Colab_practical.ipynb)
   
**Practical advice**

- Karpathy: [Recipe for training Neural Nets](Karpathy_Recipe_for_training_NN.ipynb)





# Week 8 (DL Week 2): Intro to NN (continued); Convolutional Neural Networks

## Introduction (continued)

Here is a quick review of Neural Networks

- [Neural Network summary](Neural_Network_summary.ipynb)

Overview continued:

- [Overview (continued)](Neural_Networks_Overview.ipynb#What-is-$W_\llp$-?-Where-did-$\Theta$-go-?)

Coding a Neural Network

- Coding Neural Networks:  Keras
    - [Intro to Keras](Keras_intro.ipynb) **covered last lecture**

    - **Note**
        - If you have problems using the `plot_model` function in Keras on your local machine: see [here](Setup_ML_Environment_NYU.ipynb#Tools-for-visualization-of-graphs-(optional)) for a fix.

    - Linked notebooks
    <!--- #include (DNN_Keras_example.ipynb) --->
        - [DNN Keras example](DNN_Keras_example.ipynb) **local machine**
        - [DNN Keras example Notebook from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/DNN_Keras_example.ipynb) (**Google Colab**)


- Practical Colab
<!--- The Colab notebook imports some modules; make sure they are in the repo --->
<!--- #include (neural_net_helper.py) --->
<!--- The Colab notebook imports some modules; make sure they are in the repo --->
<!--- #include (Colab_practical.ipynb)) --->
<!--- #include (CommonLib.py) --->
   - **Colab**: [Practical Colab Notebook from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/Colab_practical.ipynb)
   
**Practical advice** (continued)
- Karpathy: [Recipe for training Neural Nets](Karpathy_Recipe_for_training_NN.ipynb)

## NN: in depth

**Plan**

The topics introduced in the Neural Networks Overview are now covered more in-depth.
- Where do Neural Networks get their power from ?
- How exactly do we compute the gradients ?
- How does a special language/library facilitate automatic computation of the gradients ?

**Neural network theory**
- [A neural network is a Universal Function Approximator](Universal_Function_Approximator.ipynb)

**Training Neural Networks (introduction)**
- [Intro to Training](Neural_Networks_Intro_to_Training.ipynb)
- [Training Neural Networks - Back propagation](Training_Neural_Network_Backprop.ipynb)
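A minimal numpy sketch of back propagation for a hypothetical tiny network (3 inputs, 4 tanh hidden units, 1 output, MSE Loss): the backward pass applies the chain rule layer by layer, and the result is checked against a finite-difference estimate of the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network (hypothetical sizes): 3 inputs -> 4 hidden units (tanh) -> 1 output
X = rng.normal(size=(5, 3))
y = rng.normal(size=(5, 1))
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(W1):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    y_hat = h @ W2 + b2                 # output
    loss = np.mean((y_hat - y) ** 2)
    return loss, h, y_hat

# Forward pass, then backward pass via the chain rule
loss, h, y_hat = forward(W1)
d_yhat = 2 * (y_hat - y) / len(y)      # dLoss/dy_hat
d_h = d_yhat @ W2.T                    # dLoss/dh
d_pre = d_h * (1 - h ** 2)             # through tanh: d tanh(a)/da = 1 - tanh(a)^2
grad_W1 = X.T @ d_pre                  # dLoss/dW1

# Numerical (central finite-difference) estimate of every entry of dLoss/dW1
eps = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W1_plus, W1_minus = W1.copy(), W1.copy()
        W1_plus[i, j] += eps
        W1_minus[i, j] -= eps
        numeric[i, j] = (forward(W1_plus)[0] - forward(W1_minus)[0]) / (2 * eps)

print(np.max(np.abs(grad_W1 - numeric)))
```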

**How to compute gradients automatically**
- [Why TensorFlow ?: Gradients made easy](Training_Neural_Network_Operation_Forward_and_Backward_Pass.ipynb)

   
**Deeper Dives**
<!--- #include (Raw_TensorFlow.ipynb)) --->
- [Keras, from past to present](Tensorflow_Keras_Archaeology.ipynb)
- [History/Computation Graphs: Tensorflow version 1](DNN_TensorFlow_Using_TF_version_1.ipynb)
- [Raw_TensorFlow example Notebook from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/Raw_TensorFlow.ipynb) (**Colab**)
- [Computation Graphs](Computation_Graphs.ipynb)

# Week 9 (DL Week 3): Training dynamics; Convolutional Neural Network Layer

## Training Neural Networks: details

**Plan**
- Why training a Neural Network can be difficult: fine-details of training

**Training Neural Networks: the fine details**
- [The dynamics of training](Training_Neural_Networks_Overview.ipynb)
    - Effects of changing: activation functions; weight initialization
    - initialization and scaling
    - dropout
    - learning rate schedules
    - vanishing/exploding gradients
    
## Convolutional Neural Network (CNN) Layer

**Plan**

We introduce a new layer type.

This is motivated by inputs with dimensions in addition to the feature dimension.

The new layer type is the Convolutional Neural Network (CNN) layer.

- [Non-feature dimensions: preview](Non-feature_dimensions_preview.ipynb)

- [Introduction to CNN](Intro_to_CNN.ipynb)
    - [CNN pictorial](CNN_pictorial.ipynb)
- [Notational standards, definitions](CNN_Notation.ipynb)
- [CNN: Space and Time](CNN_Space_and_Time.ipynb)
    <!--- #include (CNN_Keras.ipynb) --->
    - [CNN example from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/CNN_Keras.ipynb) (**Colab**) 
    - [CNN example from github](CNN_Keras.ipynb) (**local machine**) 

The following notebooks are an older attempt at a *visual* explanation of the CNN.

Hopefully, the "Introduction" notebook is more intuitive and may supersede these visual notebooks.

- [CNN: explained in pictures](CNN_Overview.ipynb)


**Deeper dives**
- [Convolution as Matrix Multiplication](CNN_Convolution_as_Matrix_Multiplication.ipynb)
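A minimal 1D sketch of the idea, with a hypothetical input and kernel: sliding the kernel over the input gives the same result as multiplying the input by a matrix whose rows hold shifted copies of the kernel. (As in most Deep Learning libraries, "convolution" here means cross-correlation: the kernel is not flipped.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # 1D input (hypothetical)
k = np.array([1.0, 0.0, -1.0])            # kernel of width 3

# Direct computation: slide the kernel over the input ("valid" positions only)
n_out = len(x) - len(k) + 1
direct = np.array([np.dot(x[i:i + len(k)], k) for i in range(n_out)])

# Same result as a matrix multiplication: row i of M holds the kernel, shifted by i
M = np.zeros((n_out, len(x)))
for i in range(n_out):
    M[i, i:i + len(k)] = k
as_matmul = M @ x

print(direct, as_matmul)
```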


# Week 10 (DL Week 4): Recurrent Neural Networks

**CNN: code**

<!--- #include (CNN_Keras.ipynb) --->
- [CNN example from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/CNN_Keras.ipynb) (**Colab**) 
- [CNN example from github](CNN_Keras.ipynb) (**local machine**) 

## Recurrent Neural Networks

**Recurrent Neural Networks (RNN)**
- [Introduction to Recurrent Neural Network (RNN)](Intro_to_RNN.ipynb)
- [Recurrent Neural Network Overview](RNN_Overview.ipynb)
    - [linked notebook: RNN in code -- Imdb sentiment classification](NLP_Keras.ipynb#Try-an-LSTM-as-a-means-of-obtaining-a-finite-length-representation-of-the-sequence)


**RNN: Issues**
- [Gradients of an RNN](RNN_Gradients.ipynb)
- [RNN: Gradients that Vanish/Explode](RNN_Vanishing_and_exploding_gradients.ipynb)
- [RNN: Visualization](RNN_Visualization.ipynb)
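A simplified sketch of why RNN gradients vanish or explode, assuming a *linear* recurrence (no activation, no inputs) whose weight matrix has all singular values equal to `scale`: back-propagating through `T` steps multiplies the gradient by `W.T` `T` times, so its norm scales like `scale ** T`.

```python
import numpy as np

T = 50                                           # number of time steps
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # random orthogonal matrix

norms = {}
for scale in [0.9, 1.1]:
    W = scale * Q                    # every singular value of W equals `scale`
    grad = np.ones(4)                # gradient arriving at the last time step
    for t in range(T):
        grad = W.T @ grad            # back-propagate one step (linear RNN)
    norms[scale] = np.linalg.norm(grad)
    print(f"scale={scale}: gradient norm after {T} steps = {norms[scale]:.3g}")
```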

## Advanced Recurrent Architectures: LSTM


**Plan**

The "vanilla" Recurrent Neural Network (RNN) layer we learned is highly susceptible to the problem of vanishing/exploding gradients.

We will review the issue and demonstrate a related layer type (the LSTM) designed to mitigate the problem.

We also present an extremely useful trick (Transfer Learning) for leveraging the hard work that others have done.

**Concepts**

There are a number of pieces of the LSTM which can appear overwhelming when seen together for the first time.  We will explore these concepts separately before seeing how they are integrated into the LSTM.

- [Residual connections](RNN_Residual_Networks.ipynb)
- [Neural Programming](Neural_Programming.ipynb)

**LSTM: An improved RNN**

- [Introduction to the LSTM](Intro_to_LSTM.ipynb)
- [LSTM Overview](LSTM_Overview.ipynb)





## Layer types: review

The sprint is over ! We have covered the basic layer types; now it is time for you to learn by experimenting.

**Review of layer types**
- [What layer type to choose](Neural_Network_Layer_Review.ipynb)

**Deeper dives**
- [RNN: How to deal with long sequences](RNN_Long_Sequences.ipynb)


# Week 11 (DL Week 5): Transfer Learning; Natural Language Processing

## Transfer Learning

Transfer learning allows us to adapt a model trained for one task to be able to solve a new task with a small amount of work.  As models get bigger and bigger, the future of Deep Learning may be one where you use Transfer Learning more than developing your own models from scratch.

- [Transfer Learning (Continued)](Transfer_Learning.ipynb)

     - [Transfer Learning example from github](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/TransferLearning_Keras.ipynb) (**Colab**)
     - [Transfer Learning example from github](TransferLearning_Keras.ipynb) (**local machine**)

     - [Utility notebook](Dogs_and_Cats_reformat.ipynb)
         - Takes the *very large* raw data (from Kaggle) used in the Transfer Learning example
         - Creates a much smaller subset, using a different directory structure
         - The above notebook uses this reorganized, smaller subset

## NLP

**Plan**

We will make an initial pass on the topic of learning from text: Natural Language Processing.

The first pass will use well-established techniques that are relatively easy to follow.

We then explore some recent advances that have greatly increased the power of NLP.

The Transformer architecture is a key contributor.

**Learning from text: Deep Learning for Natural Language Processing (NLP)**
- [Natural Language Processing Overview](NLP_Overview.ipynb)

We revisit some code we had previously studied
- in the RNN module: to illustrate various ways to eliminate the time dimension
- but this time with an emphasis on the NLP aspects
    - [NLP from github (Colab)](https://colab.research.google.com/github/kenperry-public/ML_Fall_2025/blob/master/NLP_Keras.ipynb)
    - [NLP from github (local machine)](NLP_Keras.ipynb)
   
<!--- #include (squad_show.csv) --->

**Evolution of Word representations**
- [How to represent a word: syntax](NLP_Tokenization.ipynb)
- [How to represent a word: meaning](NLP_Word_Representations.ipynb)

## Transformers: motivation

**Plan**

We present Attention, a way to enhance the power of RNNs, which is heavily used in a new layer type for sequence processing: the Transformer.  

The Transformer layer type is now predominant in the area of Natural Language Processing (NLP).
We give a quick introduction but we will revisit it in the module on advanced NLP.

### Attention

- [Attention: motivation](Intro_to_Attention.ipynb)


# Week 12 (DL Week 6)

We summarize (with illustrations) the key points regarding attention.
- The evolution of the Encoder/Decoder RNN to
    - a "loop-free" architecture via Self-Attention
    - combined with Cross attention from Encoder to Decoder

- [Transformer motivation: Illustrated](Attention_motivation_illustrated.ipynb)

These ideas are combined into a new architecture called the **Transformer**


## Transformers: details
- [Transformer](Transformer.ipynb)


### Attention: in depth

- [Implementing Attention](Attention_Lookup.ipynb)
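A minimal numpy sketch of scaled dot-product attention (the form used in "Attention is all you need", listed under Further reading), with hypothetical sizes and random inputs: each query computes its similarity to every key, a softmax turns the similarities into weights that sum to 1 (a "soft" lookup), and the output is the weighted average of the values.

```python
import numpy as np

def softmax(scores, axis=-1):
    shifted = scores - scores.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each query retrieves a weighted average of the values, weighted by query-key similarity."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))   # 2 queries, dimension 8 (hypothetical sizes)
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 3))   # 5 values, dimension 3
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.sum(axis=1))
```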


**Deeper dives**
- Transformer
    - [Keras example: pre-defined Attention layers](https://keras.io/examples/nlp/text_classification_with_transformer/)
        - [notebook](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/nlp/ipynb/text_classification_with_transformer.ipynb)
    - [TensorFlow tutorial: implements Attention, positional encoding](https://www.tensorflow.org/text/tutorials/transformer)
       - [notebook](https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/transformer.ipynb)


**Further reading**
- Attention
    - [Neural Machine Translation by Jointly Learning To Align and Translate](https://arxiv.org/pdf/1409.0473.pdf)
    - [Attention is all you need](https://arxiv.org/pdf/1706.03762.pdf)
- Transfer Learning    
    - [Sebastian Ruder: Transfer Learning](https://ruder.io/transfer-learning/)

## Reinforcement Learning (RL)

**Reference**

[Sutton and Barto: Reinforcement Learning: An Introduction, 2nd edition](http://incompleteideas.net/sutton/book/the-book-2nd.html)

- Note: this is the website of one author: Sutton

A preview

# Week 13 (DL Week 7): Reinforcement Learning (continued), Language Models

## Reinforcement Learning (do-over)

**Reference**

[Sutton and Barto: Reinforcement Learning: An Introduction, 2nd edition](http://incompleteideas.net/sutton/book/the-book-2nd.html)

- Note: this is the website of one author: Sutton

**Introduction to Reinforcement Learning**

- [Introduction to Reinforcement Learning](RL_intro.ipynb)

    **Colab**
    - [RL Intro via Gymnasium](https://colab.research.google.com/drive/1d-JjgUp7Xjjtf5TsDFULBJvwj-dP_E40#scrollTo=ddf2bfa9)
    - [RL Playground](https://colab.research.google.com/drive/1Ei39dUXXA3d3H5AzdxFmbFOF-QS__vqT#scrollTo=de23d490)

**Value based methods**

- [Value-based methods: Introduction](RL_Value_based_intro.ipynb)
- [Value-based methods (model-based)](RL_Value_based_model_based.ipynb)
- [Value-based methods (model-free)](RL_Value_based_model_free.ipynb)

    **On-Policy vs Off-Policy: Supplemental notebooks**
    
    - [On-Policy vs Off-Policy: code examples](RL_OnPolicy_vs_OffPolicy_code_examples.ipynb)
    - [On-Policy SARSA vs DQN](https://colab.research.google.com/drive/1vItk1GUHLYd4vsYma5lLGBnNEbSUR1ya)

**Policy based methods**

- [Policy-based methods: Introduction](RL_Policy_gradient_methods_intro.ipynb)
- [Policy-based methods for RL](RL_Policy_gradient_methods_classic.ipynb)


**Preference methods**

[RL Preference methods: introduction](RL_Preference_methods_intro.ipynb)




## Language Models

**Language Models: the future (present ?) of NLP ?**

- [Language Models, the future (present ?) of NLP: Review](Review_LLM.ipynb)

<!--- #include (squad_show.csv) --->

# Additional Deep Learning resources

Here are some resources that I have found very useful.

Some of them are very nitty-gritty, deep-in-the-weeds (even the "introductory" courses)
- For example: let's make believe PyTorch (or Keras/TensorFlow) didn't exist; let's invent Deep Learning without it !
    - You will gain a deeper appreciation and understanding by re-inventing that which you take for granted
    

## [Andrej Karpathy course: Neural Networks, Zero to Hero](https://karpathy.ai/zero-to-hero.html)
- PyTorch
- Introductory, but at a very deep level of understanding
    - you will get very deep into the weeds (hand-coding gradients !) but develop a deeper appreciation
    
## fast.ai

`fast.ai` is a website with free courses from Jeremy Howard.
- PyTorch
- Introductory courses "for coders"
- The same courses are offered every few years, but are sufficiently different to make it worthwhile to repeat them !
    - [Practical Deep Learning](https://course.fast.ai/)
    - [Stable diffusion](https://course.fast.ai/Lessons/part2.html)
        - Very nitty-gritty details (like Karpathy) that will give you a deeper appreciation
        
## [Stefan Jansen: Machine Learning for Trading](https://github.com/stefan-jansen/machine-learning-for-trading)

An excellent github repo with notebooks
- using Deep Learning for trading
- Keras
- many notebooks are cleaner implementations of published models


# Assignments

Your assignments should follow the [Assignment Guidelines](assignments/Assignment_Guidelines.ipynb)

## Regression
- Assignment notebook: [Using Machine Learning for Hedging](assignments/Regression%20task/Using_Machine_Learning_for_Hedging.ipynb)
- Data
    - There is an archive file containing the data
    - You can find it
        - Under the course page: Content --> Data --> Assignments --> Regression task
        - You won't be able to view the file in the browser, but you **will** be able to Download it
    - You should unzip this archive into *the same directory* as the assignment notebook
    - The end result is that the directory should contain
        - The assignment notebook and a helper file
        - A directory named `Data`


## Classification
- Assignment notebook: [Ships in satellite images](assignments/Classification%20task/Ships_in_satellite_images.ipynb#)
- Data
    - There is an archive file containing the data
    - You can find it
        - Under the course page: Content --> Data --> Assignments --> Classification task
        - You won't be able to view the file in the browser, but you **will** be able to Download it
    - You should unzip this archive into *the same directory* as the assignment notebook
    - The end result is that the directory should contain
        - The assignment notebook and a helper file


## Midterm Project: Bankruptcy One Year Ahead
- Assignment notebook [Bankruptcy One Year Ahead](assignments/bankruptcy_one_yr/Bankruptcy_oya.ipynb)
- Data
    - There is an archive file containing the data
    - You can find it
        - Under the course page: Content --> Data --> Assignments --> Bankruptcy One Year Ahead
        - You won't be able to view the file in the browser, but you **will** be able to Download it
    - You should unzip this archive into *the same directory* as the assignment notebook
    - The end result is that the directory should contain
        - The assignment notebook and a helper file
        - A directory named `Data`


## Keras practice
- Assignment notebook [Ships in satellite images: Neural Network](assignments/keras_intro/Ships_in_satellite_images_P1.ipynb)
- Data (same as for the Classification assignment)


## Convolutional Neural Networks (CNN)
- Assignment notebook [Ships in satellite images: Neural Network](assignments/CNN_intro/Ships_in_satellite_images_P2.ipynb)
- Data (same as for the Classification assignment)
    - please repeat the directions given in that assignment for obtaining the data

## Final project: Stock prediction

 - Assignment notebooks:
    - [Stock prediction](assignments/stock_prediction/Final_project_StockPrediction.ipynb)
    - [Submission guidelines](assignments/stock_prediction/Final_project.ipynb)
   
 - Data
    - There is an archive file containing the data
    - You can find it
        - Under the course page: Content --> Data --> Assignments --> Stock Prediction
        - You won't be able to view the file in the browser, but you **will** be able to Download it
    - You should unzip this archive into *the same directory* as the assignment notebook
    - The end result is that the directory should contain
        - The assignment notebook, submission guidelines notebook
        - A directory named `Data/train`


