# 8. Machine learning

> **Note**. Djalgo's AI approach is experimental. 

We introduced machine learning while fitting Gaussian processes in section [5. Walks](05_walks.html). Djalgo's `djai` module includes tools for modeling music from MIDI data, relying on TensorFlow (a package for deep learning). `djai` is not loaded by default when importing Djalgo, since otherwise TensorFlow, a large and complicated package, would have to be added to Djalgo's dependencies. To use `djai`, you must [install TensorFlow](https://www.tensorflow.org/install) in your environment. `djai` also relies on pretty-midi to load and process MIDI files: you should also install it with `!pip install pretty-midi`. `djai` should be loaded as:

In [1]:
import djalgo as dj
from djalgo import djai

2024-04-18 15:07:15.500285: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
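
Since `djai` needs both optional packages, it can help to check that they are importable before loading it. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Djalgo.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# pretty-midi is imported under the module name `pretty_midi`
missing = missing_packages(["tensorflow", "pretty_midi"])
if missing:
    print("Install these before importing djai:", ", ".join(missing))
else:
    print("All optional dependencies for djai were found.")
```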


The `djai` module is designed for processing MIDI files with deep learning models. It extracts pitches, durations, offsets, an extra timing feature (time delta) and the (one-hot encoded) track each note belongs to. This module could be useful for music researchers, AI enthusiasts, and developers working on automated music generation.

Before going into coding...

## Ethics: art as the witness of experience

My ethos will fluctuate and evolve, as anything should in the precious, short time we exist. There is nothing inherently wrong with AI, but if your piece was generated with a banal command prompt, your creative process is nothing but banal and uninteresting, no matter the result. In times when any artistic piece needed years of work, the result was more important than the process. Now, when anyone can ask an LLM to generate an image of a cat riding a dinosaur in a 5D space in a style mixing Dalí and cyberpunk, results are generated within seconds, and the process becomes more relevant. If, like me, you have spent years designing your own AI, the *process* (not the result) behind the musical piece has an artistic value as good as that of any composer who has spent those years studying music theory. Artists are people who spend the precious time they own thinking about the narrative of the object they create. When the process becomes applying a recipe, it belongs with the home-sweet-home printed carpets sold on Amazon.

The `djai` module doesn't come with pre-trained models. That would have been too easy, right? I prefer seeing you tweak it and train it with your own compositions rather than just use it on a Leonard Cohen song to generate a new one. You are worth more than this, and the world deserves more than command-prompt artists.

## Key Features

`djai` has the following features.

1. **MIDI File Scanning**: Scans directories for MIDI files, allowing for selective processing based on user-defined limits.
1. **Feature Extraction**: Extracts musical features such as pitch, duration, and timing from MIDI files.
1. **Data Preprocessing**: Handles scaling and one-hot encoding of musical features for neural network processing.
1. **Model Training and Prediction**: Supports building and training LSTM, GRU and Transformer-based models for music prediction.
1. **Music Generation**: Generates new music tracks by predicting sequences of musical notes.
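
To make the feature list concrete, here is a rough sketch of feature extraction and preprocessing in plain Python. It is not `djai`'s implementation: the note tuple layout, the helper names, the definition of time delta as the gap between consecutive note onsets, and the min-max scaling are all assumptions made for illustration.

```python
def one_hot(index, size):
    """Return a list of zeros of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def min_max_scale(values):
    """Scale values to [0, 1]; a constant column maps to zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]

def extract_features(notes, num_instruments):
    """notes: list of (pitch, start, end, track_index), a hypothetical layout."""
    notes = sorted(notes, key=lambda n: n[1])  # order by onset time
    pitches = [n[0] for n in notes]
    durations = [n[2] - n[1] for n in notes]
    offsets = [n[1] for n in notes]
    # Assumed time delta: gap between this onset and the previous one (0 for the first note)
    deltas = [0.0] + [b - a for a, b in zip(offsets, offsets[1:])]
    return {
        "pitch": min_max_scale(pitches),
        "duration": min_max_scale(durations),
        "offset": min_max_scale(offsets),
        "time_delta": min_max_scale(deltas),
        "track": [one_hot(n[3], num_instruments) for n in notes],
    }

notes = [(60, 0.0, 0.5, 0), (64, 0.5, 1.0, 0), (67, 1.0, 2.0, 1)]
features = extract_features(notes, num_instruments=2)
print(features["time_delta"])
print(features["track"])
```

Scaling keeps every numeric feature in a comparable range, while the one-hot track vector lets a network treat the instrument as a category rather than a magnitude.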

## Components

There are three classes in `djai`. The `DataProcessor` class is used internally to manage feature extraction and sequence generation from MIDI files, and to perform preprocessing tasks such as feature scaling and encoding. `DataProcessor` is automatically called by the second class, `ModelManager`, which facilitates the creation, training, and management of neural network models. `ModelManager` supports three kinds of architectures (*LSTM*, *GRU* and *transformer*) and provides functionalities for model training, prediction, and music generation. The third class, `PositionalEncoding`, is a custom TensorFlow layer used internally to build transformer models.
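
For readers curious about what a positional-encoding layer contributes, the following sketch computes the classic sinusoidal encoding from the original transformer paper in plain Python. Whether `djai`'s `PositionalEncoding` layer uses exactly this formulation is an assumption, and the function name is hypothetical.

```python
import math

def sinusoidal_encoding(sequence_length, d_model):
    """Return a sequence_length x d_model table of sinusoidal position codes."""
    table = []
    for pos in range(sequence_length):
        row = []
        for i in range(d_model):
            # Dimension pairs (2k, 2k+1) share one frequency
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

encoding = sinusoidal_encoding(sequence_length=30, d_model=64)
print(len(encoding), len(encoding[0]))  # 30 64
```

Because attention has no built-in notion of order, a table like this is typically added to the note embeddings so that the model can tell the first note of a window from the last.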

## Example

The MAESTRO dataset comprises hundreds of MIDI files. Only three were selected to showcase the `ModelManager` class. To scan the files, use the `scan_midi_files` utility.

In [2]:
midi_files = djai.scan_midi_files('_djai-files/_maestro-sample')
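
As an illustration of what a directory scan with a file limit might look like, here is a sketch based on `pathlib`. It is not the source of `djai.scan_midi_files`; the function name `scan_midi_sketch` and its `max_files` parameter are hypothetical.

```python
from pathlib import Path

def scan_midi_sketch(directory, max_files=None):
    """Recursively collect .mid/.midi paths under `directory`, sorted by path."""
    root = Path(directory)
    paths = sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".mid", ".midi"}
    )
    return paths if max_files is None else paths[:max_files]

# Only scan if the sample folder from this page exists locally
if Path("_djai-files/_maestro-sample").is_dir():
    print(scan_midi_sketch("_djai-files/_maestro-sample", max_files=3))
```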

The model can be created by instantiating the class with a long list of arguments.

In [3]:
deep_djmodel = djai.ModelManager(
    sequence_length_i=30, sequence_length_o=10,
    num_instruments=1, model_type='gru',
    n_layers=5, n_units=64, dropout=0.1, batch_size=32,
    learning_rate=0.005, loss_weights=None
)

### Understanding Model Configuration in `djai`

#### Key Parameters and Their Impact on Model Performance

In the `djai` module, several parameters play critical roles in defining how the neural network learns and generates music based on MIDI files. Let's break down these parameters for better clarity.

##### Sequence Length
- **`sequence_length_i`** and **`sequence_length_o`** determine the number of notes the model uses to make predictions. Specifically, `sequence_length_i` refers to the number of input notes used to predict the next `sequence_length_o` notes. For example, setting `sequence_length_i` to 30 and `sequence_length_o` to 10 means the model uses 30 notes to predict the subsequent 10 notes.
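
The windowing described above can be sketched in a few lines. This is a generic sliding-window split, not `djai`'s code; the function name `make_windows` and the stride of one note are assumptions.

```python
def make_windows(notes, sequence_length_i, sequence_length_o):
    """Pair each input window with the notes that immediately follow it."""
    inputs, targets = [], []
    window = sequence_length_i + sequence_length_o
    for start in range(len(notes) - window + 1):
        split = start + sequence_length_i
        inputs.append(notes[start:split])
        targets.append(notes[split:start + window])
    return inputs, targets

notes = list(range(100))  # stand-in for 100 encoded notes
X, y = make_windows(notes, sequence_length_i=30, sequence_length_o=10)
print(len(X), len(X[0]), len(y[0]))  # 61 30 10
```

A file with fewer than `sequence_length_i + sequence_length_o` notes yields no training pair at all under this scheme, which is one reason short files contribute little.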

##### Number of Instruments
- **`num_instruments`** specifies how many different instruments the model should consider. This parameter is crucial for models trained on diverse ensembles. Note that training on MIDI files with fewer instruments than specified can lead to inefficiencies and unnecessary computational overhead.

##### Model Type
- **`model_type`** can be set to `'lstm'`, `'gru'`, or `'transformer'`:
  - **LSTMs** (Long Short-Term Memory networks) are more traditional and capable but tend to be complex.
  - **GRUs** (Gated Recurrent Units) aim to simplify the architecture of LSTMs with fewer parameters while maintaining performance.
  - **Transformers** are at the forefront of current large language model (LLM) technology, offering potentially superior learning capabilities due to their attention mechanisms, albeit at the cost of increased complexity and computational demands.
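
To put a number on "fewer parameters", the sketch below counts the weights of a single recurrent layer using the textbook formulation, in which every gate has input weights, recurrent weights and one bias vector. The function name is hypothetical, and the layer sizes are arbitrary.

```python
def recurrent_params(n_inputs, n_units, n_gates):
    """Weights of one recurrent layer with `n_gates` gate-like transforms."""
    return n_gates * (n_units * n_inputs + n_units * n_units + n_units)

n_inputs, n_units = 64, 64
lstm = recurrent_params(n_inputs, n_units, n_gates=4)  # input, forget, output, candidate
gru = recurrent_params(n_inputs, n_units, n_gates=3)   # update, reset, candidate
print(lstm, gru)  # 33024 24768
```

For the same width, a GRU layer carries about three quarters of the weights of an LSTM layer. Keras's default GRU keeps a second bias vector per gate, so its reported count is slightly higher than this formula gives.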

##### Architecture Configuration
- **`n_layers`** and **`n_units`** control the depth and width of the neural network. `n_layers` is the number of layers in the network, and `n_units` represents the number of neurons in each of these layers.

##### Training Dynamics
- **`dropout`** is a technique to prevent overfitting by randomly deactivating a portion of the neurons during training, specified by a ratio between 0 and 1.
- **`batch_size`** affects how many samples are processed before the model updates its internal parameters, impacting both training speed and convergence behavior.
- **`learning_rate`** influences the step size at each iteration in the training process. A higher learning rate can cause overshooting optimal solutions, while a very low rate may lead to slow convergence.
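
The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on `f(x) = x**2`, which is far simpler than the optimizer a deep learning framework would use; the function name and the sample count are made up for illustration.

```python
import math

def gradient_descent(learning_rate, steps=20, x0=1.0):
    """Minimize f(x) = x**2 starting from x0 and return the final x."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x  # the gradient of x**2 is 2x
    return x

# Too high overshoots and diverges, moderate converges, very low barely moves
for lr in (1.1, 0.1, 0.001):
    print(lr, abs(gradient_descent(lr)))

# batch_size sets how many parameter updates happen in one epoch
n_samples, batch_size = 1000, 32  # n_samples is an arbitrary example value
print(math.ceil(n_samples / batch_size))  # 32
```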

##### Loss Weights
- **`loss_weights`** allows customization of the importance of different prediction components such as pitch, duration, offset, and time delta, potentially skewing the model to prioritize accuracy in specific areas.
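
Conceptually, loss weights turn several per-output losses into one number to minimize. The sketch below shows that weighted sum in plain Python; the dictionary keys, the function name and the idea that missing weights default to 1.0 are assumptions, and the format `djai` actually expects for `loss_weights` may differ.

```python
def weighted_total_loss(losses, loss_weights=None):
    """Combine per-output losses; outputs without a weight count as 1.0."""
    weights = loss_weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())

losses = {"pitch": 2.0, "duration": 0.5, "offset": 0.25, "time_delta": 0.25}
print(weighted_total_loss(losses))                  # 3.0
print(weighted_total_loss(losses, {"pitch": 2.0}))  # 5.0
```

Doubling the pitch weight makes pitch errors dominate the total, so the optimizer spends more of its effort on getting pitches right.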

### Fitting the Model

To train the model, use the `.fit()` method with a list of MIDI file paths. The number of epochs, each being a complete pass over the entire dataset, can be adjusted according to the complexity of the task and the desired accuracy. More epochs typically improve the fit to the training data, but they take longer to complete and can lead to overfitting.

These settings show how `djai` uses neural network architectures to generate music, letting users tailor the learning process to specific needs and datasets.

In [4]:
history = deep_djmodel.fit(midi_files, epochs=5)

Epoch 1/5
311/311 ━━━━━━━━━━━━━━━━━━━━ 61s 103ms/step - instrument_index_accuracy: 0.9812 - loss: 7.2406
Epoch 2/5
311/311 ━━━━━━━━━━━━━━━━━━━━ 33s 105ms/step - instrument_index_accuracy: 0.9994 - loss: 2.6191
Epoch 3/5
311/311 ━━━━━━━━━━━━━━━━━━━━ 31s 98ms/step - instrument_index_accuracy: 1.0000 - loss: 2.5502
Epoch 4/5
311/311 ━━━━━━━━━━━━━━━━━━━━ 31s 99ms/step - instrument_index_accuracy: 1.0000 - loss: 2.3235
Epoch 5/5
311/311 ━━━━━━━━━━━━━━━━━━━━ 30s 98ms/step - instrument_index_accuracy: 1.0000 - loss: 2.1008


Models take a long time to fit, so you might want to save them for future use.

In [6]:
deep_djmodel.save('_djai-files/lstm.keras')

To predict a new sequence, use the `.generate()` method of the `ModelManager` object. It takes the first notes of a MIDI file (as many as `sequence_length_i`) and returns a Djalgo track or, for multiple instruments, a list of tracks. Make sure that the MIDI file has enough notes.

In [22]:
predictions = deep_djmodel.generate(midi_files[0], length=10)
predictions

[[(71, 0.0021451586, 1.2516425),
  (71, 0.0021453972, 1.2528331),
  (71, 0.0021450035, 1.2544166),
  (71, 0.0021549403, 1.2575257),
  (71, 0.00215076, 1.2582741),
  (71, 0.0021510506, 1.2592133),
  (71, 0.0021390854, 1.2571211),
  (71, 0.002152808, 1.2582511),
  (71, 0.002152864, 1.2610896),
  (71, 0.0021475654, 1.2620306)]]

Predictions are clearly not yet suited for music: the pitch stays at 71 and the other two values barely change from note to note.
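
A quick summary of a generated track makes the problem measurable. The sketch below uses the first and last notes copied from the output above; reading each tuple as (pitch, duration, offset) is an assumption, and the `summarize` helper is hypothetical.

```python
from statistics import mean

# First and last notes of the generated track shown above
track = [(71, 0.0021451586, 1.2516425), (71, 0.0021475654, 1.2620306)]

def summarize(track):
    """Report pitch variety and the average of the second tuple field."""
    pitches = {note[0] for note in track}
    return {
        "distinct_pitches": len(pitches),
        "mean_duration": mean(note[1] for note in track),  # assumed to be the duration
    }

print(summarize(track))
```

A single distinct pitch and a near-zero average duration are signs of a model that has collapsed onto one output, which more data, more epochs or different loss weights may help with.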