# 8. 🤖 Machine learning

> **Note**. Djalgo's AI approach produces uniform outcomes. Want to help? github.com/essicolo/djalgo 

We introduced machine learning while fitting Gaussian processes in section [5. Walks](05_walks.html). Djalgo's `djai` module includes tools for modeling music from MIDI data, relying on TensorFlow (a package for deep learning). `djai` is not loaded by default when importing Djalgo, since that would require adding TensorFlow, a large and complicated package, to Djalgo's dependencies. To use `djai`, you must [install TensorFlow](https://www.tensorflow.org/install) in your environment. `djai` also relies on Music21 to load and process MIDI files, so you should install it as well, for instance with `!pip install music21` from a notebook. Although Music21 is not as fast as Pretty-midi at processing MIDI files, I had a better experience processing files with it. `djai` should be loaded as:

In [1]:
import djalgo as dj
from djalgo import djai

2024-04-30 15:00:38.225849: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


The `djai` module is designed for processing MIDI files using deep learning models. From each note, it extracts the pitch, the duration, an offset computed as the difference in quarter lengths from the previous note (called the tick delta), and the (one-hot encoded) track the note belongs to.
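To make these features concrete, here is a minimal sketch that derives the tick delta from absolute offsets and one-hot encodes the track index. The note tuples and the `to_features` helper are hypothetical illustrations, not part of `djai`, whose internal representation may differ.

```python
import numpy as np

# Hypothetical notes: (pitch, duration, offset in quarter lengths, track index)
notes = [
    (60, 1.0, 0.0, 0),
    (64, 0.5, 1.0, 0),
    (67, 0.5, 1.5, 1),
    (72, 2.0, 2.0, 1),
]

def to_features(notes, n_tracks):
    """Return one row per note: [pitch, duration, tick delta, one-hot track...]."""
    rows = []
    previous_offset = notes[0][2]  # the first note gets a delta of 0
    for pitch, duration, offset, track in notes:
        delta = offset - previous_offset  # distance from the previous note
        one_hot = np.zeros(n_tracks)
        one_hot[track] = 1.0
        rows.append(np.concatenate(([pitch, duration, delta], one_hot)))
        previous_offset = offset
    return np.array(rows)

features = to_features(notes, n_tracks=2)
print(features)
```

Each row has three numeric columns followed by one column per track, which is the kind of table a sequence model can consume.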

## 💭 Ethics: art as the witness of experience

Even though `djai` is the module that took me the most time to develop, it is, in my opinion, the least interesting these days. Who needs to DIY their own AI when interesting results can already be generated by prompting a large language model (LLM)? My ethos will fluctuate and evolve, as anything should in the precious, short time we exist. There is nothing inherently wrong with AI, but if your piece was generated with a banal prompt, your creative process is nothing but banal and uninteresting, no matter the result. In times when any artistic piece needed years of work, the result was more important than the process. Now that anyone can ask an LLM to generate an image of a cat riding a dinosaur in space in a style mixing Dalí and cyberpunk, results are generated within seconds, and the process becomes more relevant. The process can, of course, be interesting *and* involve AI. Indeed, if, like me, you have spent months designing your own AI (which is still not working well...), the *process* (not the result) behind the musical piece has as much artistic value as that of any composer who has spent those months studying music theory. Artists are people who spend the precious time they have thinking about the narrative of the object they create. When the process becomes applying a recipe, the result leaves art and falls into the same category as the "home sweet home" printed carpets sold on Amazon.

That's why the `djai` module doesn't come with pre-trained models. That would have been too easy, right? I prefer seeing you tweak it and train it with your own compositions rather than just use it on Leonard Cohen's songs to generate new ones. You are worth more than this, and the world deserves more than command-prompt artists.

> In the quiet moments between the shadow and the light, we find the songs that our hearts forgot to sing. — *"Write an original quote in the style of Leonard Cohen", sent to ChatGPT-4.*

Finally, the process includes both the originality of the approach and the enjoyment of the artist.

![Mastodon post by @AuthorJMac@indiepocalypse.social : You know what the biggest problem with pushing all-things-AI is? Wrong direction. I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.](_images/authorjmac.jpg)

## 🗝️ Features

`djai` has the following features.

1. **MIDI File Scanning**: Scans directories for MIDI files, allowing for selective processing based on user-defined limits.
1. **Feature Extraction**: Extracts musical features such as pitch, duration, and timing from MIDI files.
1. **Data Preprocessing**: Handles scaling and one-hot encoding of musical features for neural network processing.
1. **Model Training and Prediction**: Supports building and training models for music prediction.
1. **Music Generation**: Generates new music tracks by predicting sequences of musical notes.
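As an illustration of the preprocessing step, the sketch below rescales pitches to the range 0 to 1 and converts them back afterwards. Min-max scaling is an assumption chosen for simplicity; the helper names are hypothetical and `djai` may use a different scaler.

```python
import numpy as np

def min_max_scale(values):
    """Scale values to [0, 1]; also return the bounds needed to undo it."""
    values = np.asarray(values, dtype=float)
    low, high = values.min(), values.max()
    if high == low:  # constant input: avoid dividing by zero
        return np.zeros_like(values), low, high
    return (values - low) / (high - low), low, high

def inverse_scale(scaled, low, high):
    """Map scaled values back to their original range."""
    return np.asarray(scaled, dtype=float) * (high - low) + low

pitches = [48, 60, 72]
scaled, low, high = min_max_scale(pitches)
print(scaled)                             # values between 0 and 1
print(inverse_scale(scaled, low, high))   # back to MIDI pitches
```

Keeping the bounds around matters because the model's predictions come out in the scaled range and must be mapped back to pitches and durations before they can be played.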

## 🧩 Components

There are three classes in `djai`. The `DataProcessor` class is used internally to manage feature extraction and sequence generation from MIDI files, and to perform preprocessing tasks such as feature scaling and encoding. `DataProcessor` is called automatically by the second class, `ModelManager`, which facilitates the creation, training, and management of neural network models. `ModelManager` supports three kinds of architectures (*LSTM*, *GRU* and *transformer*) and provides functionalities for model training, prediction, and music generation. The third class, `PositionalEncoding`, is a custom TensorFlow layer used internally to build transformer models.
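Transformers have no built-in notion of note order, which is why a positional encoding is added to their inputs. The sketch below computes the classic sinusoidal encoding with NumPy. Whether `djai`'s `PositionalEncoding` layer uses this exact formulation is an assumption, and the function name is hypothetical.

```python
import numpy as np

def sinusoidal_positional_encoding(sequence_length, d_model):
    """Return a (sequence_length, d_model) array of sinusoidal encodings."""
    positions = np.arange(sequence_length)[:, np.newaxis]  # shape (seq, 1)
    dims = np.arange(d_model)[np.newaxis, :]               # shape (1, d_model)
    # Each pair of dimensions (2i, 2i + 1) shares one frequency
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((sequence_length, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

encoding = sinusoidal_positional_encoding(sequence_length=24, d_model=256)
print(encoding.shape)
```

The resulting array is added to the embedded note features so that two identical notes at different positions in the sequence look different to the attention layers.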

## 🪜 Example

I downloaded five MIDI files to showcase `djai`. To scan the files, use the `scan_midi_files` utility.

In [2]:
midi_files = djai.scan_midi_files('_midi')
midi_files

['_midi/tetris.mid',
 '_midi/pinkpanther.mid',
 '_midi/adams.mid',
 '_midi/rocky.mid',
 '_midi/mario.mid']

The model is created by instantiating the `ModelManager` class with a list of arguments, which are explained below.

In [3]:
model_manager = djai.ModelManager(
    sequence_length_i=24, sequence_length_o=8,
    n_instruments=1, model_type='transformer',
    n_layers=8, n_units=256, dropout=0.3, batch_size=64,
    learning_rate=0.0001, n_heads=8
)

### 🎛️ Understanding Model Configuration in `djai`

#### Key Parameters and Their Impact on Model Performance

In the `djai` module, several parameters play critical roles in defining how the neural network learns and generates music based on MIDI files. Let's break down these parameters for better clarity.

##### Sequence Length
- `sequence_length_i` and `sequence_length_o` determine the number of notes the model uses to make predictions. Specifically, `sequence_length_i` is the number of input notes used to predict the next `sequence_length_o` notes. For example, setting `sequence_length_i` to 30 and `sequence_length_o` to 10 means the model uses 30 notes to predict the subsequent 10 notes. Even though the model predicts a whole sequence, DjAI retains only the first predicted note when generating. Generation is autoregressive: the first item of the input sequence is removed, the newly predicted note is appended, and the updated sequence serves as the basis for predicting the next note. This is sometimes confused with *teacher forcing*, which instead feeds ground-truth notes back as inputs during training.
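The sliding-window generation loop can be sketched in a few lines. Here `dummy_predict` is a hypothetical stand-in for a trained model and `generate` is an illustrative helper, not the `djai` implementation; notes are reduced to bare pitches to keep the example short.

```python
from collections import deque

def dummy_predict(window, sequence_length_o):
    """Stand-in for a trained model: returns `sequence_length_o` pitches.

    Each predicted pitch is the last pitch of the window plus 1, 2, 3...
    """
    last = window[-1]
    return [last + step for step in range(1, sequence_length_o + 1)]

def generate(seed, length, sequence_length_o):
    """Generate `length` notes autoregressively from a seed window."""
    window = deque(seed, maxlen=len(seed))  # oldest note drops out on append
    generated = []
    for _ in range(length):
        predicted = dummy_predict(list(window), sequence_length_o)
        next_note = predicted[0]   # keep only the first prediction
        generated.append(next_note)
        window.append(next_note)   # slide the window forward by one note
    return generated

seed = [60, 62, 64, 65]
print(generate(seed, length=5, sequence_length_o=3))
```

The window always holds `sequence_length_i` notes, so the seed must contain at least that many notes before generation can start.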

##### Number of Instruments
- `n_instruments` specifies how many different instruments the model should consider, starting from the first of each MIDI file. This parameter is crucial for models trained on diverse ensembles. Note that training on MIDI files with fewer instruments than specified can lead to inefficiencies and unnecessary computational overhead.

##### Model Type
- `model_type` can be set to `'lstm'`, `'gru'`, or `'transformer'`:
  - LSTMs (Long Short-Term Memory networks) are more traditional and capable but tend to be complex.
  - GRUs (Gated Recurrent Units) aim to simplify the architecture of LSTMs with fewer parameters while maintaining performance.
  - Transformers are at the forefront of current large language model (LLM) technology, offering potentially superior learning capabilities due to their attention mechanisms, albeit at the cost of increased complexity and computational demands.

##### Architecture Configuration
- `n_layers` and `n_units` control the depth and width of the neural network. `n_layers` is the number of layers in the network, and `n_units` is the number of neurons in each of these layers. `n_heads` is the number of heads in the multi-head attention mechanism, and is only taken into account with the *transformer* model type.

##### Training Dynamics
- `dropout` is a technique to prevent overfitting by randomly deactivating a portion of the neurons during training, specified by a ratio between 0 and 1.
- `batch_size` affects how many samples are processed before the model updates its internal parameters, impacting both training speed and convergence behavior.
- `learning_rate` influences the step size at each iteration in the training process. A higher learning rate can cause overshooting optimal solutions, while a very low rate may lead to slow convergence.
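The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which is only an illustration and has nothing to do with the optimizer `djai` actually uses.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 and return the final value of x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x**2
        x = x - learning_rate * gradient  # one update step
    return x

# A tiny rate barely moves, a moderate rate converges, a large rate diverges
for rate in (0.001, 0.1, 1.1):
    print(rate, gradient_descent(rate))
```

With the smallest rate, x is still close to its starting value after 20 steps; with the largest, every step overshoots the minimum and x grows instead of shrinking.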

##### Loss Weights
- `loss_weights` allows customization of the importance of different prediction components such as pitch, duration, offset, and time delta, potentially skewing the model to prioritize accuracy in specific areas.
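Conceptually, loss weights turn several per-feature losses into a single number to minimize. The sketch below shows that weighted sum with made-up loss values; the dictionary keys and the `weighted_total_loss` helper are hypothetical. Keras's `Model.compile` accepts a `loss_weights` argument that works along these lines, but whether `djai` passes its own `loss_weights` straight to it is an assumption.

```python
def weighted_total_loss(losses, loss_weights):
    """Combine per-feature losses into one weighted sum."""
    return sum(loss_weights[name] * value for name, value in losses.items())

# Made-up per-feature losses for one training step
losses = {'pitch': 2.0, 'duration': 0.5, 'offset': 0.25, 'time_delta': 0.25}

equal_weights = {'pitch': 1.0, 'duration': 1.0, 'offset': 1.0, 'time_delta': 1.0}
pitch_heavy = {'pitch': 2.0, 'duration': 0.5, 'offset': 0.5, 'time_delta': 0.5}

print(weighted_total_loss(losses, equal_weights))
print(weighted_total_loss(losses, pitch_heavy))
```

Raising the weight on pitch makes pitch errors dominate the total, so the optimizer spends more of its effort reducing them.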

### 🏋️ Fitting the Model

To train the model, call the `.fit()` method with a list of MIDI file paths. The number of epochs, each of which is a complete pass over the entire dataset, can be adjusted according to the complexity of the task and the desired accuracy. More epochs typically improve the fit, up to the point where the model starts overfitting, but require more time to complete.

In [None]:
history = model_manager.fit(midi_files, epochs=250)

Models take a long time to fit, so you might want to save yours for future use.

In [6]:
model_manager.save('_output/transformer.keras')

To predict a new sequence, use the `.generate()` method of the `djai.ModelManager` object. It takes the first `sequence_length_i` notes of a MIDI file as a seed and returns a Djalgo track or, for multiple instruments, a list of tracks. Make sure that the MIDI file contains at least `sequence_length_i` notes.

In [7]:
predictions = model_manager.generate('_output/polyloop.mid', length=10)
predictions

[(46, 0.922152, 0.54616755),
 (46, 0.9221335, 0.546182),
 (46, 0.9220556, 0.5462458),
 (46, 0.9221041, 0.54620606),
 (46, 0.9221695, 0.54615307),
 (46, 0.9225299, 0.5458617),
 (46, 0.92227435, 0.5460685),
 (46, 0.9221095, 0.546202),
 (46, 0.9220371, 0.5462603),
 (46, 0.9221659, 0.5461564)]

Predictions are uniform and clearly not yet suited for music.