# Infer-8-Model-Selection: Model Selection and Comparison

**Series**: Probabilistic Programming with Infer.NET (8/12)  
**Estimated duration**: 45 minutes  
**Prerequisite**: Infer-7-Classification

---

## Objectives

- Understand the overfitting problem
- Compute the model evidence (marginal likelihood)
- Use the Bayes factor to compare models
- Implement Automatic Relevance Determination (ARD)

---

## Navigation

| Previous | Next |
|-----------|--------|
| [Infer-7-Classification](Infer-7-Classification.ipynb) | [Infer-9-Topic-Models](Infer-9-Topic-Models.ipynb) |

---

## 1. Configuration

In [None]:
#r "nuget: Microsoft.ML.Probabilistic"
#r "nuget: Microsoft.ML.Probabilistic.Compiler"

using Microsoft.ML.Probabilistic;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Utilities;
using Microsoft.ML.Probabilistic.Math;
using Microsoft.ML.Probabilistic.Models;
using Microsoft.ML.Probabilistic.Algorithms;
using Microsoft.ML.Probabilistic.Compiler;

Console.WriteLine("Infer.NET ready!");

## 2. The Overfitting Problem

### Observation

A complex model can fit the training data perfectly yet generalize poorly.

### Example

Fitting a polynomial of degree n-1 to n points gives a perfect fit on those points, but predictions away from them can be wildly wrong.
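
To make this concrete, here is a minimal sketch in plain C# (no Infer.NET). The six data points are illustrative values invented for this example: noisy samples of the line y = 2x + 1. The helper `Interpolate` is a hypothetical name; it evaluates the unique degree n-1 polynomial through all n points in Lagrange form.

```csharp
using System;

// Six noisy samples of the line y = 2x + 1 (illustrative values, not from the notebook).
double[] xs = { 0, 1, 2, 3, 4, 5 };
double[] ys = { 1.3, 2.8, 5.2, 6.9, 9.1, 10.8 };

// Evaluate the unique polynomial of degree n-1 through all n points (Lagrange form).
double Interpolate(double x)
{
    double sum = 0.0;
    for (int i = 0; i < xs.Length; i++)
    {
        // Basis polynomial i: equals 1 at xs[i] and 0 at every other training point.
        double basis = 1.0;
        for (int j = 0; j < xs.Length; j++)
        {
            if (j != i) basis *= (x - xs[j]) / (xs[i] - xs[j]);
        }
        sum += ys[i] * basis;
    }
    return sum;
}

// On the training points the polynomial reproduces the data (up to rounding).
double maxTrainError = 0.0;
for (int i = 0; i < xs.Length; i++)
{
    maxTrainError = Math.Max(maxTrainError, Math.Abs(Interpolate(xs[i]) - ys[i]));
}
Console.WriteLine($"Max training error: {maxTrainError:E2}");

// Outside the training range the polynomial drifts far from the underlying line.
double xNew = 8.0;
Console.WriteLine($"Polynomial at x = {xNew}: {Interpolate(xNew):F2}");
Console.WriteLine($"Underlying line at x = {xNew}: {2 * xNew + 1:F2}");
```

The training error is zero by construction, so it says nothing about how the polynomial behaves at new inputs; the extrapolated value is dominated by the noise in the six samples.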

### Bayesian solution

- Priors penalize complex models, whose probability mass is spread over more parameter settings
- The model evidence balances goodness of fit against complexity
- This is the **Bayesian Occam's razor**

## 3. Model Evidence (Marginal Likelihood)

### Definition

$$P(D|M) = \int P(D|\theta, M) P(\theta|M) d\theta$$

The evidence is the probability of the data under the model, marginalized over the parameters.

### Interpretation

- A simple model can only generate a narrow range of datasets, so it concentrates its probability mass on them
- A complex model can generate many different datasets, so it spreads its probability mass more thinly over each one
- The evidence favors the simplest model that still explains the observed data well
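
A toy case where the integral has a closed form may help. The sketch below is plain C# and uses a coin-flip example that is not part of this notebook: model M0 is a fair coin with no free parameter, and model M1 gives the bias p a uniform Beta(1,1) prior. For one specific sequence with h heads and t tails, integrating p^h (1-p)^t over [0, 1] gives h! t! / (h + t + 1)!. All function names are hypothetical.

```csharp
using System;

// log(n!) by direct summation; adequate for small n.
double LogFactorial(int n)
{
    double s = 0.0;
    for (int k = 2; k <= n; k++) s += Math.Log(k);
    return s;
}

// M0: fair coin, no free parameter. Every sequence of length h + t has probability 0.5^(h+t).
double LogEvidenceFair(int h, int t) => (h + t) * Math.Log(0.5);

// M1: unknown bias p ~ Beta(1,1). Marginalizing over p gives h! t! / (h + t + 1)!.
double LogEvidenceUniform(int h, int t) =>
    LogFactorial(h) + LogFactorial(t) - LogFactorial(h + t + 1);

// A balanced sequence and a lopsided one.
foreach (var (h, t) in new[] { (5, 5), (9, 1) })
{
    double l0 = LogEvidenceFair(h, t);
    double l1 = LogEvidenceUniform(h, t);
    Console.WriteLine($"h={h}, t={t}: log P(D|M0) = {l0:F3}, log P(D|M1) = {l1:F3}");
}
```

With balanced data the parameter-free model has the higher evidence, because the flexible model has spent probability mass on biased coins. With lopsided data the flexible model wins, because the fair coin explains the data poorly.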

In [None]:
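// Computing the evidence with Infer.NET
// The model is wrapped in Variable.If(evidence) with a Bernoulli(0.5) prior on the
// evidence variable; the posterior log-odds of that variable is then log P(D|M).

// Data
double[] observations = { 13, 15, 17, 14, 16, 15, 18 };
int n = observations.Length;

// MODEL 1: a single Gaussian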
Variable<bool> evidence1 = Variable.Bernoulli(0.5).Named("evidence1");

using (Variable.If(evidence1))
{
    Variable<double> moyenne1 = Variable.GaussianFromMeanAndPrecision(15, 0.01);
    Variable<double> precision1 = Variable.GammaFromShapeAndScale(2, 0.5);
    
    for (int i = 0; i < n; i++)
    {
        Variable<double> obs1 = Variable.GaussianFromMeanAndPrecision(moyenne1, precision1);
        obs1.ObservedValue = observations[i];
    }
}

InferenceEngine moteur1 = new InferenceEngine();
moteur1.Compiler.CompilerChoice = CompilerChoice.Roslyn;

double logEvidence1 = moteur1.Infer<Bernoulli>(evidence1).LogOdds;

Console.WriteLine("=== Model Evidence ===");
Console.WriteLine($"\nModel 1 (1 Gaussian): log evidence = {logEvidence1:F2}");

In [None]:
// MODEL 2: mixture of two Gaussians
Variable<bool> evidence2 = Variable.Bernoulli(0.5).Named("evidence2");

using (Variable.If(evidence2))
{
    Variable<double> moyenne2a = Variable.GaussianFromMeanAndPrecision(10, 0.01);
    Variable<double> moyenne2b = Variable.GaussianFromMeanAndPrecision(20, 0.01);
    Variable<double> precision2 = Variable.GammaFromShapeAndScale(2, 0.5);
    Variable<double> poidsMixte = Variable.Beta(1, 1);
    
    for (int i = 0; i < n; i++)
    {
        Variable<bool> composante = Variable.Bernoulli(poidsMixte);
        Variable<double> obs2 = Variable.New<double>();
        using (Variable.If(composante))
        {
            obs2.SetTo(Variable.GaussianFromMeanAndPrecision(moyenne2a, precision2));
        }
        using (Variable.IfNot(composante))
        {
            obs2.SetTo(Variable.GaussianFromMeanAndPrecision(moyenne2b, precision2));
        }
        obs2.ObservedValue = observations[i];
    }
}

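// VMP returns a lower bound on the log evidence, whereas model 1 used the default
// algorithm (expectation propagation), so the comparison is approximate.
InferenceEngine moteur2 = new InferenceEngine(new VariationalMessagePassing());
moteur2.Compiler.CompilerChoice = CompilerChoice.Roslyn;

double logEvidence2 = moteur2.Infer<Bernoulli>(evidence2).LogOdds;

Console.WriteLine($"Model 2 (mixture of 2 Gaussians): log evidence = {logEvidence2:F2}");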

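## 4. Bayes Factor

### Definition

$$BF_{12} = \frac{P(D|M_1)}{P(D|M_2)} = \exp(\log E_1 - \log E_2)$$

where $E_k = P(D|M_k)$ is the evidence of model $M_k$.

### Interpretation (Jeffreys scale)

The thresholds below are approximate and use the natural logarithm. Negative values of ln(BF) are read the same way, in favor of M2.

| ln(BF) | BF | Evidence for M1 |
|---------|----|-----------------|
| 0-1.1 | 1-3 | Negligible |
| 1.1-2.3 | 3-10 | Substantial |
| 2.3-3.4 | 10-30 | Strong |
| 3.4-5 | 30-150 | Very strong |
| >5 | >150 | Decisive |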
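
A Bayes factor can also be turned into posterior model probabilities. The sketch below is plain C# and assumes every model has the same prior probability; the two log evidences are hypothetical numbers chosen for illustration, and `PosteriorModelProbabilities` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Convert log evidences into posterior model probabilities, assuming equal prior
// probability for every model. Subtracting the maximum first avoids underflow
// when the log evidences are large and negative.
double[] PosteriorModelProbabilities(double[] logEvidences)
{
    double max = logEvidences.Max();
    double[] weights = logEvidences.Select(l => Math.Exp(l - max)).ToArray();
    double total = weights.Sum();
    return weights.Select(w => w / total).ToArray();
}

// Hypothetical log evidences for two models (illustrative values only).
double[] logE = { -18.4, -20.4 };
double[] probs = PosteriorModelProbabilities(logE);
double logBF12 = logE[0] - logE[1];

Console.WriteLine($"ln BF12 = {logBF12:F2}, BF12 = {Math.Exp(logBF12):F2}");
Console.WriteLine($"P(M1|D) = {probs[0]:F3}, P(M2|D) = {probs[1]:F3}");
```

Working with differences of log evidences, rather than exponentiating each one, is what keeps the computation stable: evidences of realistic datasets are often too small to represent directly as a `double`.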

In [None]:
// Bayes factor
double logBF = logEvidence1 - logEvidence2;
double BF = Math.Exp(logBF);

Console.WriteLine("=== Bayes Factor ===");
Console.WriteLine($"\nln(BF) = {logBF:F2}");
Console.WriteLine($"BF = {BF:F2}");

// Strength of the evidence, whichever model it favors (thresholds on BF: 3, 10, 30, 150)
double absLogBF = Math.Abs(logBF);
string interpretation;
if (absLogBF < Math.Log(3)) interpretation = "Negligible evidence";
else if (absLogBF < Math.Log(10)) interpretation = "Substantial evidence";
else if (absLogBF < Math.Log(30)) interpretation = "Strong evidence";
else if (absLogBF < Math.Log(150)) interpretation = "Very strong evidence";
else interpretation = "Decisive evidence";

string favori = logBF > 0 ? "Model 1 (1 Gaussian)" : "Model 2 (mixture)";
Console.WriteLine($"\n{interpretation} in favor of: {favori}");

## 5. Selecting the Number of Components

### Application

Use the evidence to choose the number of components in a mixture model.

In [None]:
// Bimodal data
double[] dataBimodal = { 5, 6, 7, 5.5, 6.5, 15, 16, 17, 14, 15.5, 16.5, 6, 15 };
int nBi = dataBimodal.Length;

Console.WriteLine("=== Selecting the number of components ===");
Console.WriteLine($"Data: {string.Join(", ", dataBimodal)}\n");

// Compare 1 and 2 components
double[] logEvidences = new double[2];

// 1 component
{
    Variable<bool> ev = Variable.Bernoulli(0.5);
    using (Variable.If(ev))
    {
        Variable<double> m = Variable.GaussianFromMeanAndPrecision(10, 0.01);
        Variable<double> p = Variable.GammaFromShapeAndScale(2, 0.5);
        foreach (var d in dataBimodal)
        {
            Variable<double> o = Variable.GaussianFromMeanAndPrecision(m, p);
            o.ObservedValue = d;
        }
    }
    var eng = new InferenceEngine();
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    logEvidences[0] = eng.Infer<Bernoulli>(ev).LogOdds;
}

Console.WriteLine($"1 component: log evidence = {logEvidences[0]:F2}");

// 2 components (simplified: shared precision, prior means placed near the two clusters)
{
    Variable<bool> ev = Variable.Bernoulli(0.5);
    using (Variable.If(ev))
    {
        Variable<double> m1 = Variable.GaussianFromMeanAndPrecision(6, 0.1);
        Variable<double> m2 = Variable.GaussianFromMeanAndPrecision(15, 0.1);
        Variable<double> p = Variable.GammaFromShapeAndScale(2, 1);
        Variable<double> w = Variable.Beta(1, 1);
        
        foreach (var d in dataBimodal)
        {
            Variable<bool> c = Variable.Bernoulli(w);
            Variable<double> o = Variable.New<double>();
            using (Variable.If(c)) { o.SetTo(Variable.GaussianFromMeanAndPrecision(m1, p)); }
            using (Variable.IfNot(c)) { o.SetTo(Variable.GaussianFromMeanAndPrecision(m2, p)); }
            o.ObservedValue = d;
        }
    }
    var eng = new InferenceEngine(new VariationalMessagePassing());
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    logEvidences[1] = eng.Infer<Bernoulli>(ev).LogOdds;
}

Console.WriteLine($"2 components: log evidence = {logEvidences[1]:F2}");

// Best model
int meilleur = logEvidences[0] > logEvidences[1] ? 1 : 2;
Console.WriteLine($"\n=> The model with {meilleur} component(s) is preferred");

## 6. Automatic Relevance Determination (ARD)

### Principle

ARD uses hierarchical priors to determine automatically which features are relevant.

### Model

$$\alpha_f \sim \text{Gamma}(a, b)$$
$$w_f \sim \mathcal{N}(0, \alpha_f^{-1})$$

Each feature $f$ has its own precision $\alpha_f$. If $\alpha_f$ becomes large, the weight $w_f$ is constrained to lie close to 0, which marks the feature as irrelevant.
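
To see how the data move $\alpha_f$, consider the conditional posterior for a fixed weight value. This is a sketch under one assumption: $\text{Gamma}(a, b)$ is read as shape $a$ and rate $b$, which the definition above does not specify. Multiplying the Gamma prior by the Gaussian density of $w_f$ gives

$$p(\alpha_f \mid w_f) \propto \alpha_f^{a-1} e^{-b \alpha_f} \cdot \alpha_f^{1/2} e^{-\alpha_f w_f^2 / 2}$$

$$\alpha_f \mid w_f \sim \text{Gamma}\left(a + \tfrac{1}{2},\; b + \tfrac{w_f^2}{2}\right), \qquad \mathbb{E}[\alpha_f \mid w_f] = \frac{a + 1/2}{b + w_f^2 / 2}$$

For example, with $a = b = 1$ this mean is $1.5$ for $w_f = 0$, $0.5$ for $w_f = 2$ and about $0.27$ for $w_f = 3$. A small weight therefore pushes the precision up and a large weight pushes it down, but with an informative prior the spread between relevant and irrelevant features stays modest. In a full inference run $w_f$ is itself uncertain, so these values are only indicative.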

In [None]:
// ARD for regression

// Data: y = 2*x1 + 0*x2 + 3*x3 + noise
// x2 is an irrelevant feature
int nSamples = 20;
int nFeatures = 3;
Random rng = new Random(42);

double[,] X = new double[nSamples, nFeatures];
double[] y = new double[nSamples];
double[] vraisPoids = { 2.0, 0.0, 3.0 };  // x2 has weight 0

for (int i = 0; i < nSamples; i++)
{
    for (int f = 0; f < nFeatures; f++)
    {
        X[i, f] = rng.NextDouble() * 2 - 1;  // [-1, 1]
    }
    y[i] = vraisPoids[0] * X[i, 0] + vraisPoids[1] * X[i, 1] + vraisPoids[2] * X[i, 2]
           + rng.NextDouble() * 0.5 - 0.25;  // Uniform noise in [-0.25, 0.25)
}

Console.WriteLine("=== ARD: Automatic Relevance Determination ===");
Console.WriteLine($"\nTrue weights: w1={vraisPoids[0]}, w2={vraisPoids[1]} (irrelevant), w3={vraisPoids[2]}");

In [None]:
// ARD model
Range sampleRange = new Range(nSamples).Named("sample");
Range featureRange = new Range(nFeatures).Named("feature");

// Per-feature precisions (ARD)
VariableArray<double> alpha = Variable.Array<double>(featureRange).Named("alpha");
alpha[featureRange] = Variable.GammaFromShapeAndScale(1, 1).ForEach(featureRange);

// Weights, whose prior depends on alpha
VariableArray<double> poids = Variable.Array<double>(featureRange).Named("poids");
using (Variable.ForEach(featureRange))
{
    poids[featureRange] = Variable.GaussianFromMeanAndPrecision(0, alpha[featureRange]);
}

// Observation noise
Variable<double> noisePrecision = Variable.GammaFromShapeAndScale(2, 1).Named("noise");

// Data
VariableArray2D<double> xVar = Variable.Array<double>(sampleRange, featureRange).Named("x");
VariableArray<double> yVar = Variable.Array<double>(sampleRange).Named("y");

using (Variable.ForEach(sampleRange))
{
    Variable<double> prediction = Variable.Constant(0.0);
    for (int f = 0; f < nFeatures; f++)
    {
        prediction = prediction + poids[f] * xVar[sampleRange, f];
    }
    yVar[sampleRange] = Variable.GaussianFromMeanAndPrecision(prediction, noisePrecision);
}

xVar.ObservedValue = X;
yVar.ObservedValue = y;

InferenceEngine moteurARD = new InferenceEngine(new ExpectationPropagation());
moteurARD.Compiler.CompilerChoice = CompilerChoice.Roslyn;

Gaussian[] poidsPost = moteurARD.Infer<Gaussian[]>(poids);
Gamma[] alphaPost = moteurARD.Infer<Gamma[]>(alpha);

Console.WriteLine("\nARD results:");
for (int f = 0; f < nFeatures; f++)
{
    double wMean = poidsPost[f].GetMean();
    double wStd = Math.Sqrt(poidsPost[f].GetVariance());
    double alphaMean = alphaPost[f].GetMean();
    // Heuristic thresholds. Under the Gamma(1, 1) prior the posterior mean of alpha tends
    // to stay of order 1, so compare alpha across features rather than reading the labels literally.
    string relevance = alphaMean > 5 ? "low" : alphaMean > 1 ? "medium" : "high";
    Console.WriteLine($"  Feature {f+1}: weight = {wMean:F2} +/- {wStd:F2}, alpha = {alphaMean:F2} (relevance {relevance})");
}

Console.WriteLine("\n=> Features with a high alpha are considered irrelevant");

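## 7. Bayesian Cross-Validation

### Principle

Instead of relying on a single train/test split, score each observation with the posterior predictive distribution fitted to all the other observations.

### Leave-One-Out (LOO)

$$\text{LOO-CV} = \sum_{i=1}^n \log P(y_i | y_{-i}, M)$$

where $y_{-i}$ denotes all observations except $y_i$.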
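
Each term of this sum is a posterior predictive density. As a sketch, for a Gaussian model with unknown mean $\mu$ and precision $\tau$ (symbols introduced here for illustration), it reads

$$P(y_i \mid y_{-i}, M) = \int P(y_i \mid \mu, \tau)\, P(\mu, \tau \mid y_{-i}, M)\, d\mu\, d\tau$$

A common shortcut, when only the posterior marginals of $\mu$ and $\tau$ are available, is to replace this integral with a moment-matched Gaussian:

$$P(y_i \mid y_{-i}, M) \approx \mathcal{N}\left(y_i;\; \mathbb{E}[\mu],\; \mathrm{Var}[\mu] + \frac{1}{\mathbb{E}[\tau]}\right)$$

where the expectations are taken under the posterior given $y_{-i}$. This is an approximation: the exact predictive generally has heavier tails than a Gaussian, so the shortcut can be optimistic for points far from the bulk of the data.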

In [None]:
// Simplified LOO validation

double[] dataLOO = { 10, 12, 11, 13, 12, 11, 14, 10, 12, 11 };
int nLOO = dataLOO.Length;

double totalLogPred = 0;

for (int i = 0; i < nLOO; i++)
{
    // Train on all the data except point i
    Variable<double> mu = Variable.GaussianFromMeanAndPrecision(10, 0.01);
    Variable<double> prec = Variable.GammaFromShapeAndScale(2, 0.5);
    
    for (int j = 0; j < nLOO; j++)
    {
        if (j != i)
        {
            Variable<double> obs = Variable.GaussianFromMeanAndPrecision(mu, prec);
            obs.ObservedValue = dataLOO[j];
        }
    }
    
    InferenceEngine eng = new InferenceEngine();
    eng.Compiler.CompilerChoice = CompilerChoice.Roslyn;
    
    Gaussian muPost = eng.Infer<Gaussian>(mu);
    Gamma precPost = eng.Infer<Gamma>(prec);
    
    // Predictive probability for point i, using a Gaussian approximation:
    // the noise variance is plugged in as 1 / E[precision]
    double predMean = muPost.GetMean();
    double predVar = muPost.GetVariance() + 1.0 / precPost.GetMean();
    
    double logProb = Gaussian.FromMeanAndVariance(predMean, predVar).GetLogProb(dataLOO[i]);
    totalLogPred += logProb;
}

Console.WriteLine("=== Leave-One-Out Validation ===");
Console.WriteLine($"\nTotal log predictive: {totalLogPred:F2}");
Console.WriteLine($"Mean log predictive: {totalLogPred / nLOO:F2}");

## 8. Exercise: Comparing Polynomials

### Problem statement

Compare three regression models:
- Linear: y = a*x + b
- Quadratic: y = a*x^2 + b*x + c
- Cubic: y = a*x^3 + b*x^2 + c*x + d

The data are linear with added noise.

In [None]:
// EXERCISE: comparing polynomial models

// Linear data: y = 2*x + 1 + noise
double[] xPoly = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
double[] yPoly = { 1.2, 3.1, 4.8, 7.2, 8.9, 11.1, 13.0, 14.8, 17.2, 19.1 };
int nPoly = xPoly.Length;

Console.WriteLine("=== Comparing Polynomial Models ===");
Console.WriteLine("True relationship: y = 2*x + 1\n");

// Linear model
Variable<bool> evLin = Variable.Bernoulli(0.5);
using (Variable.If(evLin))
{
    Variable<double> a = Variable.GaussianFromMeanAndVariance(0, 10);
    Variable<double> b = Variable.GaussianFromMeanAndVariance(0, 10);
    Variable<double> noise = Variable.GammaFromShapeAndScale(2, 0.5);
    
    for (int i = 0; i < nPoly; i++)
    {
        Variable<double> pred = a * xPoly[i] + b;
        Variable<double> obs = Variable.GaussianFromMeanAndPrecision(pred, noise);
        obs.ObservedValue = yPoly[i];
    }
}
var engLin = new InferenceEngine();
engLin.Compiler.CompilerChoice = CompilerChoice.Roslyn;
double logEvLin = engLin.Infer<Bernoulli>(evLin).LogOdds;
Console.WriteLine($"Linear model: log evidence = {logEvLin:F2}");

// Quadratic model
Variable<bool> evQuad = Variable.Bernoulli(0.5);
using (Variable.If(evQuad))
{
    Variable<double> a = Variable.GaussianFromMeanAndVariance(0, 10);
    Variable<double> b = Variable.GaussianFromMeanAndVariance(0, 10);
    Variable<double> c = Variable.GaussianFromMeanAndVariance(0, 10);
    Variable<double> noise = Variable.GammaFromShapeAndScale(2, 0.5);
    
    for (int i = 0; i < nPoly; i++)
    {
        Variable<double> pred = a * xPoly[i] * xPoly[i] + b * xPoly[i] + c;
        Variable<double> obs = Variable.GaussianFromMeanAndPrecision(pred, noise);
        obs.ObservedValue = yPoly[i];
    }
}
var engQuad = new InferenceEngine();
engQuad.Compiler.CompilerChoice = CompilerChoice.Roslyn;
double logEvQuad = engQuad.Infer<Bernoulli>(evQuad).LogOdds;
Console.WriteLine($"Quadratic model: log evidence = {logEvQuad:F2}");

// Cubic model: left as the exercise; it follows the same pattern with a fourth coefficient

// Best model
string meilleurMod = logEvLin > logEvQuad ? "Linear" : "Quadratic";
Console.WriteLine($"\n=> The {meilleurMod} model is preferred (Occam's razor)");

## 9. Summary

| Concept | Description |
|---------|-------------|
| **Evidence** | P(D\|M), the marginal likelihood |
| **Bayes factor** | Ratio of evidences, used to compare models |
| **Occam's razor** | Automatic preference for simpler models |
| **ARD** | Automatic feature selection |
| **LOO-CV** | Validation without a fixed train/test split |

---

## Next step

In [Infer-9-Topic-Models](Infer-9-Topic-Models.ipynb), we will explore:

- Latent Dirichlet Allocation (LDA)
- Topic modeling in documents
- Inference on complex hierarchical structures