# Infer-11-Sequences: Hidden Markov Models and Time Series

**Series**: Probabilistic Programming with Infer.NET (11/12)  
**Estimated duration**: 65 minutes  
**Prerequisites**: Infer-10-Crowdsourcing

---

## Objectives

- Understand Hidden Markov Models (HMMs)
- Implement Gaussian emissions
- Decode sequences of hidden states
- Apply the approach to motif finding (bioinformatics)

---

## Navigation

| Previous | Next |
|-----------|--------|
| [Infer-10-Crowdsourcing](Infer-10-Crowdsourcing.ipynb) | [Infer-12-Recommenders](Infer-12-Recommenders.ipynb) |

---

## 1. Configuration

In [None]:
#r "nuget: Microsoft.ML.Probabilistic"
#r "nuget: Microsoft.ML.Probabilistic.Compiler"

using Microsoft.ML.Probabilistic;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Utilities;
using Microsoft.ML.Probabilistic.Math;
using Microsoft.ML.Probabilistic.Models;
using Microsoft.ML.Probabilistic.Algorithms;
using Microsoft.ML.Probabilistic.Compiler;

Console.WriteLine("Infer.NET ready!");

## 2. Introduction to HMMs

### Structure

An HMM is defined by:
- **Hidden states**: $z_t$, which are not observable
- **Observations**: $x_t$, which depend on the hidden state
- **Transitions**: $P(z_t | z_{t-1})$
- **Emissions**: $P(x_t | z_t)$
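
Taken together, these ingredients imply the usual factorization of the joint distribution over a sequence of length $T$. The initial distribution $P(z_1)$ is written out explicitly here; it is not in the list above, but the chain needs it to start:

$$
P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(x_t \mid z_t)
$$

Each hidden state depends only on its predecessor, and each observation depends only on the hidden state at the same time step.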

### Diagram

```
z_1 --> z_2 --> z_3 --> ... --> z_T  (hidden states)
 |       |       |               |
 v       v       v               v
x_1     x_2     x_3     ...     x_T  (observations)
```
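
The diagram can also be read as a recipe for generating data: draw a first state, then alternate between emitting an observation and moving to the next state. The sketch below follows that recipe in plain C#, without Infer.NET. All parameter values (`initProbs`, `transProbs`, `emitMeans`, `emitStdDev`) and the helper names are assumptions chosen for illustration only.

```csharp
using System;

// Hypothetical parameters, chosen only for illustration.
double[] initProbs = { 0.9, 0.1 };     // P(z_1)
double[][] transProbs = {              // P(z_t | z_{t-1}), rows = current state
    new[] { 0.8, 0.2 },
    new[] { 0.3, 0.7 }
};
double[] emitMeans = { 10.0, 25.0 };   // mean of x_t for each state
double emitStdDev = 1.0;

var rng = new Random(42);

// Draw an index from a discrete distribution by walking its cumulative sum.
int SampleDiscrete(double[] probs)
{
    double u = rng.NextDouble(), cumulative = 0.0;
    for (int k = 0; k < probs.Length; k++)
    {
        cumulative += probs[k];
        if (u < cumulative) return k;
    }
    return probs.Length - 1;   // guards against rounding error
}

// Draw a Gaussian sample with the Box-Muller transform.
double SampleGaussian(double mean, double stdDev)
{
    double u1 = 1.0 - rng.NextDouble();   // in (0, 1], avoids log(0)
    double u2 = rng.NextDouble();
    return mean + stdDev * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
}

int length = 10;
int[] z = new int[length];
double[] x = new double[length];
for (int t = 0; t < length; t++)
{
    // The first state comes from the initial distribution, later ones from the transition row.
    z[t] = t == 0 ? SampleDiscrete(initProbs) : SampleDiscrete(transProbs[z[t - 1]]);
    x[t] = SampleGaussian(emitMeans[z[t]], emitStdDev);
    Console.WriteLine($"t={t + 1}: z={z[t]}, x={x[t]:F1}");
}
```

Inference runs this process in reverse: given only the `x` values, it recovers a distribution over the `z` values.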

### Applications

| Domain | Hidden states | Observations |
|---------|-------------|---------------|
| NLP | Part-of-speech tags | Words |
| Finance | Market regime | Prices |
| Biology | Gene / intergenic region | DNA sequence |
| Weather | Actual weather | Sensor measurements |

## 3. HMM with Gaussian Emissions

In [None]:
// HMM parameters
int nStates = 2;  // Two states: "Normal" and "Anomaly"
int T = 10;       // Sequence length

// Observed data (simulated: normal ~10, anomaly ~25)
double[] observations = { 9.5, 11.2, 10.8, 24.5, 26.1, 25.3, 10.1, 9.8, 11.5, 10.2 };
// True states: 0, 0, 0, 1, 1, 1, 0, 0, 0, 0

Console.WriteLine("=== HMM: Anomaly Detection ===");
Console.WriteLine($"\nObservations: {string.Join(", ", observations.Select(o => o.ToString("F1")))}");
Console.WriteLine("\nExpected states: Normal(~10) -> Anomaly(~25) -> Normal(~10)");

In [None]:
// HMM model definition
// Note: these variables declare the HMM's components, but the simplified
// inference in the next cell does not use them.

Range stateRange = new Range(nStates).Named("state");
Range timeRange = new Range(T).Named("time");

// Initial state distribution
Variable<Vector> probInit = Variable.Dirichlet(new Dirichlet(1, 1)).Named("probInit");

// Transition matrix (rows = current state, columns = next state)
VariableArray<Vector> transMatrix = Variable.Array<Vector>(stateRange).Named("transMatrix");
// Dirichlet(5, 1) on every row favors moving to state 0 from either state.
// Favoring self-transitions would need a per-row prior, e.g. Dirichlet(1, 5) for row 1.
transMatrix[stateRange] = Variable.Dirichlet(new Dirichlet(5, 1)).ForEach(stateRange);

// Emission parameters per state
VariableArray<double> emitMean = Variable.Array<double>(stateRange).Named("emitMean");
VariableArray<double> emitPrec = Variable.Array<double>(stateRange).Named("emitPrec");

// Priors on the emissions
emitMean[0] = Variable.GaussianFromMeanAndVariance(10, 10);  // State 0: Normal
emitMean[1] = Variable.GaussianFromMeanAndVariance(25, 10);  // State 1: Anomaly
emitPrec[stateRange] = Variable.GammaFromShapeAndScale(2, 0.5).ForEach(stateRange);

// State sequence
VariableArray<int> states = Variable.Array<int>(timeRange).Named("states");

// Observations
VariableArray<double> obs = Variable.Array<double>(timeRange).Named("obs");

Console.WriteLine("HMM variables defined.");

In [None]:
// Sequence model (simplified, without a ForEach over time)
// Note: Infer.NET has limitations when modeling full HMMs

// Simplified approach: infer each state independently.
// This drops the temporal dependencies (each step is a two-component Gaussian
// mixture with fixed parameters) but illustrates the concept.

Console.WriteLine("\n=== State inference (simplified approach) ===");
Console.WriteLine();

for (int t = 0; t < T; t++)
{
    // For each observation, find the most probable state
    Variable<int> etat = Variable.DiscreteUniform(nStates);
    Variable<double> obsVar = Variable.New<double>();
    
    // Emission depends on the state
    using (Variable.Case(etat, 0))
    {
        obsVar.SetTo(Variable.GaussianFromMeanAndPrecision(10, 1));  // Normal
    }
    using (Variable.Case(etat, 1))
    {
        obsVar.SetTo(Variable.GaussianFromMeanAndPrecision(25, 1));  // Anomaly
    }
    
    obsVar.ObservedValue = observations[t];
    
    InferenceEngine eng = new InferenceEngine();
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    
    Discrete etatPost = eng.Infer<Discrete>(etat);
    int etatMAP = etatPost.GetProbs()[0] > 0.5 ? 0 : 1;
    string nomEtat = etatMAP == 0 ? "Normal" : "Anomaly";
    
    Console.WriteLine($"t={t}: obs={observations[t]:F1}, P(Normal)={etatPost.GetProbs()[0]:F2}, P(Anomaly)={etatPost.GetProbs()[1]:F2} -> {nomEtat}");
}
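
Because each step above has a uniform prior over two states and fixed Gaussian emissions, its posterior can also be computed directly with Bayes' rule by normalizing the two likelihoods. The plain C# sketch below does that as a sanity check on the engine's output. The helper names `LogGaussian` and `StatePosterior` are hypothetical and not part of Infer.NET.

```csharp
using System;
using System.Linq;

// Log-density of a Gaussian parameterized by mean and precision.
double LogGaussian(double value, double mean, double prec) =>
    0.5 * Math.Log(prec / (2.0 * Math.PI)) - 0.5 * prec * (value - mean) * (value - mean);

// Posterior over the state under a uniform prior: normalize the per-state likelihoods.
double[] StatePosterior(double value, double[] stateMeans, double prec)
{
    double[] logLik = stateMeans.Select(m => LogGaussian(value, m, prec)).ToArray();
    double maxLogLik = logLik.Max();   // subtracted for numerical stability
    double[] weights = logLik.Select(l => Math.Exp(l - maxLogLik)).ToArray();
    double total = weights.Sum();
    return weights.Select(w => w / total).ToArray();
}

// Same data and emission parameters as the simplified model above.
double[] xs = { 9.5, 11.2, 10.8, 24.5, 26.1, 25.3, 10.1, 9.8, 11.5, 10.2 };
double[] emissionMeans = { 10.0, 25.0 };
double emissionPrecision = 1.0;

foreach (double obsValue in xs)
{
    double[] p = StatePosterior(obsValue, emissionMeans, emissionPrecision);
    Console.WriteLine($"obs={obsValue:F1}: P(Normal)={p[0]:F2}, P(Anomaly)={p[1]:F2}");
}
```

The two means are 15 units apart with unit precision, so every posterior is close to 0 or 1. The decision boundary sits at the midpoint, 17.5.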

## 4. Weather Regime Detection

In [None]:
// Example: detecting weather regimes (Sun/Rain) from temperature

// Sun: temperature ~22C
// Rain: temperature ~15C

double[] tempJour = { 21, 23, 22, 20, 15, 14, 16, 15, 14, 21, 22, 23 };
int Tmeteo = tempJour.Length;

Console.WriteLine("=== Weather Regime Detection ===");
Console.WriteLine($"\nTemperatures: {string.Join(", ", tempJour)}\n");

for (int t = 0; t < Tmeteo; t++)
{
    Variable<int> meteo = Variable.DiscreteUniform(2);
    Variable<double> temp = Variable.New<double>();
    
    using (Variable.Case(meteo, 0))  // Sun
    {
        temp.SetTo(Variable.GaussianFromMeanAndVariance(22, 4));
    }
    using (Variable.Case(meteo, 1))  // Rain
    {
        temp.SetTo(Variable.GaussianFromMeanAndVariance(15, 4));
    }
    
    temp.ObservedValue = tempJour[t];
    
    InferenceEngine eng = new InferenceEngine();
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    
    Discrete meteoPost = eng.Infer<Discrete>(meteo);
    string regime = meteoPost.GetProbs()[0] > 0.5 ? "Sun" : "Rain";
    
    Console.WriteLine($"Day {t+1,2}: {tempJour[t]:F0}C -> {regime} (P={meteoPost.GetProbs().Max():F2})");
}
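
The loop above classifies each day in isolation. To bring back the temporal dependencies of an HMM, one option is forward-backward smoothing, sketched below in plain C# without Infer.NET. The transition matrix `regimeTrans` and initial distribution `regimeInit` are assumed values, not taken from the notebook; the emission means and variance match the model above. `GaussPdf` and `Smooth` are hypothetical helper names.

```csharp
using System;
using System.Linq;

// Gaussian density parameterized by mean and variance.
double GaussPdf(double x, double mean, double variance) =>
    Math.Exp(-0.5 * (x - mean) * (x - mean) / variance) / Math.Sqrt(2.0 * Math.PI * variance);

// Forward-backward with per-step normalization.
// Returns gamma[t][k] = P(z_t = k | x_1..x_T).
double[][] Smooth(double[] obs, double[] init, double[][] trans, double[] means, double variance)
{
    int n = obs.Length, k = init.Length;
    var alpha = new double[n][];
    var beta = new double[n][];

    // Forward pass: alpha[t][j] is proportional to P(z_t = j, x_1..x_t).
    for (int t = 0; t < n; t++)
    {
        alpha[t] = new double[k];
        for (int j = 0; j < k; j++)
        {
            double prior = t == 0
                ? init[j]
                : Enumerable.Range(0, k).Sum(i => alpha[t - 1][i] * trans[i][j]);
            alpha[t][j] = prior * GaussPdf(obs[t], means[j], variance);
        }
        double norm = alpha[t].Sum();
        for (int j = 0; j < k; j++) alpha[t][j] /= norm;
    }

    // Backward pass: beta[t][i] is proportional to P(x_{t+1}..x_T | z_t = i).
    beta[n - 1] = Enumerable.Repeat(1.0, k).ToArray();
    for (int t = n - 2; t >= 0; t--)
    {
        beta[t] = new double[k];
        for (int i = 0; i < k; i++)
            beta[t][i] = Enumerable.Range(0, k)
                .Sum(j => trans[i][j] * GaussPdf(obs[t + 1], means[j], variance) * beta[t + 1][j]);
        double norm = beta[t].Sum();
        for (int i = 0; i < k; i++) beta[t][i] /= norm;
    }

    // Combine both passes and normalize to get the smoothed marginals.
    var gamma = new double[n][];
    for (int t = 0; t < n; t++)
    {
        gamma[t] = alpha[t].Zip(beta[t], (a, b) => a * b).ToArray();
        double norm = gamma[t].Sum();
        for (int j = 0; j < k; j++) gamma[t][j] /= norm;
    }
    return gamma;
}

double[] temperatures = { 21, 23, 22, 20, 15, 14, 16, 15, 14, 21, 22, 23 };
double[] regimeInit = { 0.5, 0.5 };                                          // assumed
double[][] regimeTrans = { new[] { 0.8, 0.2 }, new[] { 0.2, 0.8 } };         // assumed: regimes persist
double[] regimeMeans = { 22.0, 15.0 };                                       // Sun, Rain

double[][] smoothed = Smooth(temperatures, regimeInit, regimeTrans, regimeMeans, 4.0);
for (int day = 0; day < temperatures.Length; day++)
{
    string label = smoothed[day][0] > 0.5 ? "Sun" : "Rain";
    Console.WriteLine($"Day {day + 1,2}: {temperatures[day]:F0}C -> {label} (P(Sun)={smoothed[day][0]:F2})");
}
```

With a persistent transition matrix, a borderline temperature is pulled toward the regime of its neighbors, which the independent per-day model cannot do.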

## 5. Motif Finding (Bioinformatics)

### Problem

Find conserved motifs in DNA sequences.

### Model

- Background: uniformly distributed nucleotides (A, C, G, T)
- Motif: positions with their own nucleotide distributions
In [None]:
// Simplified motif finding

// DNA sequences (encoded: A=0, C=1, G=2, T=3)
int[][] sequences = {
    new[] { 0, 1, 2, 0, 0, 1, 3, 2, 0, 1 },  // ...ACGAACTGAC
    new[] { 3, 0, 0, 1, 2, 3, 0, 1, 2, 0 },  // ...TAACGTACGA
    new[] { 2, 0, 0, 1, 1, 0, 3, 0, 1, 2 }   // ...GAACCATACG
};

// Target motif: "AAC" at 0-based positions 3-5 in seq 1, 1-3 in seq 2, 1-3 in seq 3

Console.WriteLine("=== Motif Finding ===");
Console.WriteLine("\nDNA sequences (A=0, C=1, G=2, T=3):");

string[] bases = { "A", "C", "G", "T" };
for (int s = 0; s < sequences.Length; s++)
{
    string seqStr = string.Join("", sequences[s].Select(n => bases[n]));
    Console.WriteLine($"  Seq {s+1}: {seqStr}");
}

Console.WriteLine("\nSearching for the conserved motif...");

In [None]:
// Count the k-mers
int motifLen = 3;
var kmerCounts = new Dictionary<string, int>();

foreach (var seq in sequences)
{
    for (int i = 0; i <= seq.Length - motifLen; i++)
    {
        string kmer = string.Join("", seq.Skip(i).Take(motifLen).Select(n => bases[n]));
        if (!kmerCounts.ContainsKey(kmer)) kmerCounts[kmer] = 0;
        kmerCounts[kmer]++;
    }
}

var topKmers = kmerCounts.OrderByDescending(kv => kv.Value).Take(5);

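Console.WriteLine("\nTop 5 k-mers (length 3):");
foreach (var kv in topKmers)
{
    Console.WriteLine($"  {kv.Key}: {kv.Value} occurrences");
}

// Raw counts are only a heuristic: on these sequences ACG (4 occurrences)
// outranks the target motif AAC (3 occurrences).
Console.WriteLine($"\n=> Candidate motif: {topKmers.First().Key}");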

## 6. Exercise: Anomaly Detection

### Task

Use an HMM to detect abnormal periods in a time series of sales.

In [None]:
// EXERCISE: anomaly detection in sales data

// Daily sales (normal ~100, promotion ~200)
double[] ventes = { 98, 105, 102, 99, 195, 210, 205, 198, 103, 97, 101, 100 };

Console.WriteLine("=== Promotion Period Detection ===");
Console.WriteLine($"\nSales: {string.Join(", ", ventes.Select(v => v.ToString("F0")))}\n");

for (int t = 0; t < ventes.Length; t++)
{
    Variable<int> regime = Variable.DiscreteUniform(2);
    Variable<double> venteVar = Variable.New<double>();
    
    using (Variable.Case(regime, 0))  // Normal
    {
        venteVar.SetTo(Variable.GaussianFromMeanAndVariance(100, 100));
    }
    using (Variable.Case(regime, 1))  // Promotion
    {
        venteVar.SetTo(Variable.GaussianFromMeanAndVariance(200, 100));
    }
    
    venteVar.ObservedValue = ventes[t];
    
    InferenceEngine eng = new InferenceEngine();
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    
    Discrete regimePost = eng.Infer<Discrete>(regime);
    string etat = regimePost.GetProbs()[1] > 0.5 ? "PROMO" : "Normal";
    
    Console.WriteLine($"Day {t+1,2}: {ventes[t],3:F0} -> {etat} (P={regimePost.GetProbs().Max():F2})");
}

Console.WriteLine("\n=> Days 5-8 detected as a promotion period");

## 7. Summary

| Concept | Description |
|---------|-------------|
| **HMM** | Hidden-state model with temporal dependencies |
| **Emissions** | Distribution of the observations given the state |
| **Transitions** | Probabilities of moving from one state to another |
| **Viterbi** | Algorithm that finds the most probable state sequence |
| **Forward-Backward** | Computes the marginal probability of each state |
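
Viterbi appears in the table but not in the cells above, so here is a minimal sketch in plain C#, without Infer.NET. It decodes the anomaly observations from section 3 in log space. The initial and transition probabilities are assumed values, and `LogGauss` and `Viterbi` are hypothetical helper names; the emission means and precision match the simplified model from section 3.

```csharp
using System;
using System.Linq;

// Log-density of a Gaussian with the given mean and precision.
double LogGauss(double x, double mean, double precision) =>
    0.5 * Math.Log(precision / (2.0 * Math.PI)) - 0.5 * precision * (x - mean) * (x - mean);

// Viterbi decoding in log space: returns the most probable state sequence.
int[] Viterbi(double[] obs, double[] init, double[][] trans, double[] means, double precision)
{
    int n = obs.Length, k = init.Length;
    var score = new double[n, k];   // best log-probability of any path ending in state j at time t
    var back = new int[n, k];       // predecessor state on that best path

    for (int j = 0; j < k; j++)
        score[0, j] = Math.Log(init[j]) + LogGauss(obs[0], means[j], precision);

    for (int t = 1; t < n; t++)
    {
        for (int j = 0; j < k; j++)
        {
            int bestPrev = 0;
            double best = double.NegativeInfinity;
            for (int i = 0; i < k; i++)
            {
                double cand = score[t - 1, i] + Math.Log(trans[i][j]);
                if (cand > best) { best = cand; bestPrev = i; }
            }
            score[t, j] = best + LogGauss(obs[t], means[j], precision);
            back[t, j] = bestPrev;
        }
    }

    // Backtrack from the best final state.
    var path = new int[n];
    int last = 0;
    for (int j = 1; j < k; j++)
        if (score[n - 1, j] > score[n - 1, last]) last = j;
    path[n - 1] = last;
    for (int t = n - 1; t > 0; t--)
        path[t - 1] = back[t, path[t]];
    return path;
}

// Observations from section 3; initial and transition probabilities are assumed values.
double[] series = { 9.5, 11.2, 10.8, 24.5, 26.1, 25.3, 10.1, 9.8, 11.5, 10.2 };
double[] initial = { 0.5, 0.5 };
double[][] transition = { new[] { 0.9, 0.1 }, new[] { 0.1, 0.9 } };
double[] stateMeans = { 10.0, 25.0 };

int[] decoded = Viterbi(series, initial, transition, stateMeans, 1.0);
Console.WriteLine(string.Join(", ", decoded));
```

With these assumed values the decoded path matches the true states listed in section 3 (0, 0, 0, 1, 1, 1, 0, 0, 0, 0). Unlike forward-backward, which gives a marginal for each time step, Viterbi returns one jointly optimal sequence.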

---

## Next step

In [Infer-12-Recommenders](Infer-12-Recommenders.ipynb), we will explore:

- Recommender systems
- Matrix factorization
- The ClickModel for multiple sources