# Generating networks from the rules and a trial set of forms

We need to decide where phonological corrections belong, both in processing and in evaluation.

- Swim1, based on the rules from the initial sample
    - Phon:
        - without phonological correction
        - with phonological correction
    - Gen-1: generate the forms from the phonological contexts
    - Gen-2: generate the network from the phonological contexts
    - Filt-1: extract the symmetric subnetwork
    - Filt-2: generate the undirected network corresponding to Filt-1
    - Filt-3: extract the maximal, faithful cliques

- Swim2, based on the network computed by Swim1
    - Exp: generate a new sample based on Swim1
        - Phon:
            - without phonological correction
            - with phonological correction
    - Gen-1: generate the forms without phonological context
    - Gen-2: generate the network without phonological context
    - Filt-1: extract the symmetric subnetwork
    - Filt-2: generate the undirected network corresponding to Filt-1
    - Filt-3: extract the maximal, faithful cliques
- Evaluation
    - Phon:
        - without phonological correction
        - with phonological correction
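
The three filtering steps (Filt-1 to Filt-3) can be tried on a toy directed network. This is a minimal pure-Python sketch, not the notebook's implementation (which relies on networkx and on helpers from SWiM_Network). The edge list, the node names and the helpers `symmetric_edges`, `maximal_cliques` and `faithful` are all made up for the illustration, and "faithful" is simplified here to "contains every attested node".

```python
from collections import defaultdict

# Hypothetical directed network: an edge (a, b) means "form a predicts form b".
# Node names follow a lexeme-form-cell pattern chosen for this example.
edges = [
    ("laver-lav-pi3P", "laver-lava-ai3S"),
    ("laver-lava-ai3S", "laver-lav-pi3P"),
    ("laver-lav-pi3P", "laver-lave-inf"),
    ("laver-lave-inf", "laver-lav-pi3P"),
    ("laver-lava-ai3S", "laver-lave-inf"),
    ("laver-lave-inf", "laver-lava-ai3S"),
    ("laver-lav-pi3P", "laver-lavi-ai3S"),  # one-way only: dropped by Filt-1
]

def symmetric_edges(directed_edges):
    """Filt-1 + Filt-2: keep edges present in both directions, as undirected pairs."""
    present = set(directed_edges)
    return {frozenset((a, b)) for (a, b) in present if (b, a) in present and a != b}

def maximal_cliques(undirected_edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    neighbours = defaultdict(set)
    for edge in undirected_edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(sorted(clique))
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & neighbours[node], excluded & neighbours[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(neighbours), set())
    return found

def faithful(cliques, attested):
    """Filt-3 (simplified): keep cliques that contain every attested node."""
    return [c for c in cliques if set(attested) <= set(c)]

cliques = maximal_cliques(symmetric_edges(edges))
print(faithful(cliques, ["laver-lav-pi3P"]))
```

The one-way edge towards `laver-lavi-ai3S` disappears at the symmetric step, so that competing form cannot join any clique.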

## Imports
- codecs for encodings
- pandas and numpy for array computations
- matplotlib for plots
- itertools for advanced iterators (pairs over a list, ...)

<a id="top"></a>

Links:
- [top](#top)
- [finRegles](#finRegles)
- [laverForms](#laverForms)

In [1]:
# -*- coding: utf8 -*-
# __future__ imports must come before any other statement in the cell
from __future__ import print_function
import codecs,operator,datetime,os,glob,cellbell
import features
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import itertools as it
import pickle
import networkx as nx
#%pylab inline
#pd.options.display.mpl_style = 'default'
debug=False

from FrenchPhonology import makeFrench,setNeutralisation,normalizePhono
import SWiM_Utils as SwimU
import SWiM_Network as SwimN
from SWiM_Utils import *
from SWiM_Network import *

In [2]:
%matplotlib inline

In [3]:
import yaml

In [4]:
from ipywidgets import FloatProgress
from IPython.display import display, HTML

In [5]:
import datetime
def dateheure():
    # UTC timestamp formatted as YYMMDDhhmm
    return datetime.datetime.utcnow().strftime('%y%m%d%H%M')

In [6]:
saut="\n"

### Preparing the feature matrices

In [7]:
features.add_config('bdlexique.ini')
fs=features.FeatureSystem('phonemes')
tableNeutralise=setNeutralisation("NS")

# Reading the trial set
- load the file
    - trialFormes
    - listeTest
- normalize the phonology

In [8]:
trialPrefix="/Users/gilles/ownCloud/Recherche/Boye/HDR/Memoire/Longitudinales/"
trialFile="JeuEssai-200901.csv"
trialFormes=pd.read_csv(trialPrefix+trialFile,sep=";",encoding="utf8")

List of lexemes in the trial set
- Probably ineffective
    - a lexeme may appear several times in the trial set, with different lexicalizations

In [9]:
listeTest=trialFormes.lexeme.unique().tolist()

List of paradigm cells in the trial set

## Normalizing the input

⚠️ Caution: normalizing the input interacts with the analysis already performed. The encoding must stay consistent with the one used to generate the rules.

In [10]:
# trialFormes=normalizePhono(trialFormes,tableNeutralise)

# Reading the gold standard
- load the file
- normalize the phonology

In [11]:
goldPrefix="/Users/gilles/pCloud Drive/FOD/GB/2015-Data/"
goldFile="MGC-171229-Verbes3.pkl"
# path passed positionally: the keyword name differs across pandas versions
lexiqueGold=pd.read_pickle(goldPrefix+goldFile)

In [12]:
'''Phonological corrections'''
lexiqueGold["phono"]=lexiqueGold["phono"].apply(lambda x: makeFrench(x,tableNeutralise))
completeParadigmes=pd.pivot_table(lexiqueGold, values='phono', index=['lexeme'], columns=['case'], aggfunc=lambda x: ",".join(x)).reset_index().reindex()
completeParadigmes

case,lexeme,ai1P,ai1S,ai2P,ai2S,ai3P,ai3S,fi1P,fi1S,fi2P,...,ppFP,ppFS,ppMP,ppMS,ps1P,ps1S,ps2P,ps2S,ps3P,ps3S
0,abaisser,abEsam,abEsE,abEsat,abEsa,abEsEr,abEsa,abEs9rô,abEs9rE,abEs9rE,...,abEsE,abEsE,abEsE,abEsE,abEsjô,abEs,abEsjE,abEs,abEs,abEs
1,abandonner,abâdOnam,abâdOnE,abâdOnat,abâdOna,abâdOnEr,abâdOna,abâdOn9rô,abâdOn9rE,abâdOn9rE,...,abâdOnE,abâdOnE,abâdOnE,abâdOnE,abâdOnjô,abâdOn,abâdOnjE,abâdOn,abâdOn,abâdOn
2,abasourdir,abazurdim,abazurdi,abazurdit,abazurdi,abazurdir,abazurdi,abazurdirô,abazurdirE,abazurdirE,...,abazurdi,abazurdi,abazurdi,abazurdi,abazurdisjô,abazurdis,abazurdisjE,abazurdis,abazurdis,abazurdis
3,abattre,abatim,abati,abatit,abati,abatir,abati,abatrô,abatrE,abatrE,...,abaty,abaty,abaty,abaty,abatjô,abat,abatjE,abat,abat,abat
4,abcéder,absEdam,absEdE,absEdat,absEda,absEdEr,absEda,absEd9rô,absEd9rE,absEd9rE,...,,,,absEdE,absEdjô,absEd,absEdjE,absEd,absEd,absEd
5,abdiquer,abdikam,abdikE,abdikat,abdika,abdikEr,abdika,abdik9rô,abdik9rE,abdik9rE,...,abdikE,abdikE,abdikE,abdikE,abdikjô,abdik,abdikjE,abdik,abdik,abdik
6,abecquer,abEkam,abEkE,abEkat,abEka,abEkEr,abEka,abEk9rô,abEk9rE,abEk9rE,...,abEkE,abEkE,abEkE,abEkE,abEkjô,abEk,abEkjE,abEk,abEk,abEk
7,aberrer,abEram,abErE,abErat,abEra,abErEr,abEra,abEr9rô,abEr9rE,abEr9rE,...,,,,abErE,abErjô,abEr,abErjE,abEr,abEr,abEr
8,abhorrer,abOram,abOrE,abOrat,abOra,abOrEr,abOra,abOr9rô,abOr9rE,abOr9rE,...,abOrE,abOrE,abOrE,abOrE,abOrjô,abOr,abOrjE,abOr,abOr,abOr
9,abjurer,abZyram,abZyrE,abZyrat,abZyra,abZyrEr,abZyra,abZyr9rô,abZyr9rE,abZyr9rE,...,abZyrE,abZyrE,abZyrE,abZyrE,abZyrjô,abZyr,abZyrjE,abZyr,abZyr,abZyr


In [13]:
'''Flatten the reference forms into a list'''
goldTestForms=pd.melt(completeParadigmes[completeParadigmes["lexeme"].isin(listeTest)],id_vars=["lexeme"]).dropna()
goldTestForms["lexeme-case"]=goldTestForms["lexeme"]+"-"+goldTestForms["case"]
goldTestForms.drop(labels=["lexeme","case"],axis=1,inplace=True)
goldTestForms.set_index(["lexeme-case"],inplace=True)

# Choosing the rules
⚠️ To load the PKL rules, the [corresponding classes must already be declared](https://www.stefaanlippens.net/python-pickling-and-dealing-with-attributeerror-module-object-has-no-attribute-thing.html)

- *rulesPrefix* is part of the rules file name

If the rules are based on morphomes, a mapping between morphomes and cells is needed:
- dictMorphomeCases, to fill in the paradigm after processing
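
As an illustration of the morphome-to-cell correspondence, here is a minimal sketch that builds the two lookup tables from a dictionary of morphome labels. The sample data and the names `base_map` and `build_maps` are invented for the example. The only thing taken from the notebook is the convention, used in `makeMorphomeMaps`, that a morphome label is a list of cells joined by "/".

```python
# Hypothetical mapping: morphome label (cells joined by "/") -> representative cell.
base_map = {
    "pi1S/pi2S/pi3S": "pi1S",
    "ppMS/ppMP": "ppMS",
    "inf": "inf",
}

def build_maps(morphome_to_cell):
    """Return (cell -> representative cell, representative cell -> list of cells)."""
    cell_to_rep = {}
    rep_to_cells = {}
    for morphome, representative in morphome_to_cell.items():
        for cell in morphome.split("/"):
            cell_to_rep[cell] = representative
            rep_to_cells.setdefault(representative, []).append(cell)
    return cell_to_rep, rep_to_cells

cell_to_rep, rep_to_cells = build_maps(base_map)
print(cell_to_rep["pi3S"])     # pi1S
print(rep_to_cells["ppMS"])    # ['ppMS', 'ppMP']
```

The second table is the one needed to fill a full paradigm from a clique: each representative cell is expanded back into all the cells of its morphome.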

In [14]:
def prefixRules(numero,sampleType="",casesType=""):
    candidats=[]
    matchFile=ur"(^.*/Longitudinal-%s-T\d+-F\d+%s)-Regles\.pkl"%(numero,sampleType+casesType)
#     matchFile=ur"^.*/Longitudinal-Lexique3(-%s-T\d+-F\d+)%s\.pkl"%(numero,sampleType+casesType)
    print (matchFile)
    for sample in rulesFiles:
        m=re.match(matchFile,sample)
        if m:
#            print (sample)
#            print (m.group(1))
            candidats.append(m.group(1))
    if len(candidats)==1:
        print (candidats[0])
        return candidats[0]
    else:
        print ("PROBLEM: no unique matching name",len(candidats),numero,sampleType,casesType)

In [15]:
rulesRep="/Users/gilles/pCloud Drive/FOD/GB/2015-Data/Longitudinales/"
rulesFiles=glob.glob(rulesRep+"*.pkl")

In [20]:
sampleNumero="10"
sampleType="-X"
casesType="-Morphomes"
#sampleType=""
casesType=""
rulesPrefix=prefixRules(sampleNumero,sampleType,casesType)
# rulesStream rather than input, to avoid shadowing the builtin
with open(rulesPrefix+'-Regles.pkl', 'rb') as rulesStream:
    analyseRules = pickle.load(rulesStream)
SwimN.analyseRules=analyseRules
analyseCases=list(set([c1 for (c1,c2) in analyseRules.keys()]))
SwimN.analyseCases=analyseCases

(^.*/Longitudinal-10-T\d+-F\d+-X)-Regles\.pkl
/Users/gilles/pCloud Drive/FOD/GB/2015-Data/Longitudinales/Longitudinal-10-T110000-F15616-X
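
The file-name matching that produces the prefix printed above can be tried in isolation. This is only a sketch: `rules_prefix` and the file list are hypothetical, the paths are invented, and returning `None` when the match is not unique is a choice made for the example.

```python
import re

def rules_prefix(paths, numero, suffix=""):
    """Return the unique path prefix preceding '-Regles.pkl', or None if not unique."""
    pattern = re.compile(
        r"(^.*/Longitudinal-%s-T\d+-F\d+%s)-Regles\.pkl"
        % (re.escape(numero), re.escape(suffix))
    )
    matches = [m.group(1) for m in (pattern.match(p) for p in paths) if m]
    return matches[0] if len(matches) == 1 else None

# Hypothetical file list: only the first name matches sample "10" with suffix "-X".
files = [
    "/data/Longitudinal-10-T110000-F15616-X-Regles.pkl",
    "/data/Longitudinal-10-T110000-F15616-X.pkl",
    "/data/Longitudinal-11-T120000-F16000-X-Regles.pkl",
]
print(rules_prefix(files, "10", "-X"))
```

Escaping the sample number and suffix keeps a stray regex metacharacter in either value from changing the pattern.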


In [21]:
def makeMorphomeMaps():
    if "Morphome" in casesType:
        baseMap=pd.read_pickle(rulesPrefix+'.pkl').groupby("morphome")["case"].first().to_dict()
        msp2omp={}
        omp2msp={}
        for k in baseMap:
            mspCases=k.split("/")
            ompCase=baseMap[k]
            for case in mspCases:
                msp2omp[case]=ompCase
                if ompCase not in omp2msp:
                    omp2msp[ompCase]=[]
                omp2msp[ompCase].append(case)
    else:
        msp2omp={case:case for case in analyseCases}
        omp2msp={case:[case] for case in analyseCases}
    return (msp2omp,omp2msp)

In [22]:
(msp2omp,omp2msp)=makeMorphomeMaps()
SwimN.omp2msp=omp2msp
SwimN.msp2omp=msp2omp

<a id="finRegles"></a>
# SWIM1

Links:
- [top](#top)
- [finRegles](#finRegles)
- [laverForms](#laverForms)

In [23]:
trialGen1=generateFromTrial(trialFormes,contextFree=False)
display(trialGen1)
trialGen2=generateFromTrial(trialGen1,contextFree=True)
display(trialGen2)

Unnamed: 0,lexeme,ii1P,pP,is1S,ii1S,ai3S,ppMP,is3P,is3S,ai1P,...,pc3S,pc3P,pc1P,pc1S,fi3S,fi3P,ai3P,fi1P,fi1S,pI1P
0,laver,,lavâ,,,lava,lave,,,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,
1,laver,,lavâ,,,lava,lave,,,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,
2,laver,,lavâ,,,lava,lave,,,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,
3,laver,,lavâ,,,lava,lave,,,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,
4,laver,,lavâ,,lavE,lava,lave,,,,...,,,,,,lav6rô,lavEr,lav6rô,lav6rE,
5,finir,,finisâ,,,fini,,,fini,,...,finirE,,,,finira,finirô,,finirô,finirE,
6,finir,,finisâ,,,fini,,,fini,,...,finirE,,,,finira,finirô,,finirô,finirE,
7,finir,,finisâ,,,fini,,,fini,,...,finirE,,,,finira,finirô,,finirô,finirE,
8,finir,,finisâ,,,fini,,,fini,,...,finirE,,,,finira,finirô,,finirô,finirE,
9,être,Etjô,Etâ,fys,EtE,fy,,fys,fy,fym,...,s6rE,s6rE,s6rjô,s6rE,s6ra,s6rô,fyr,s6rô,s6rE,swajô


Unnamed: 0,lexeme,ii1P,pP,is1S,ii1S,ai3S,ppMP,is3P,is3S,ai1P,...,pc3S,pc3P,pc1P,pc1S,fi3S,fi3P,ai3P,fi1P,fi1S,pI1P
0,laver,lavjô,lavâ,,lavE,lava,lave,,lava,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,lavô
1,laver,lavjô,lavâ,,lavE,lava,lave,,lava,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,lavô
2,laver,lavjô,lavâ,,lavE,lava,lave,,lava,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,lavô
3,laver,lavjô,lavâ,,lavE,lava,lave,,lava,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,lavô
4,laver,lavjô,lavâ,,lavE,lava,lave,,lava,,...,lav6rE,lav6rE,,lav6rE,lav6ra,lav6rô,lavEr,lav6rô,lav6rE,lavô
5,finir,,finisâ,,,fini,fini,,fini,,...,finirE,,,,finira,finirô,finir,finirô,finirE,
6,finir,,finisâ,,,fini,fini,,fini,,...,finirE,,,,finira,finirô,finir,finirô,finirE,
7,finir,,finisâ,,,fini,fini,,fini,,...,finirE,,,,finira,finirô,finir,finirô,finirE,
8,finir,,finisâ,,,fini,fini,,fini,,...,finirE,,,,finira,finirô,finir,finirô,finirE,
9,être,Etjô,Etâ,fys,EtE,fy,,fys,fy,fym,...,s6rE,s6rE,s6rjô,s6rE,s6ra,s6rô,fyr,s6rô,s6rE,swajô


# To Do
- handle morphomes
- handle tied cliques
- color the output forms

In [None]:
row=trialGen1.iloc[0]
print (len(row.dropna())-1, row.dropna().to_dict())
rowForms1=generateRowForms(row,True)
rowForms2,digraphe,graphe=generateRowParadigm(row,rowForms1,True)
lexCliques=list(nx.algorithms.clique.find_cliques(graphe))

In [None]:
laverForms=paradigmeDistribution("laver",analyseCases)

laverForms.ajouterFormes("pi3P",analyseRules[("ai3S", "pi3P")].sortirForme(u"lava",True))
laverForms.ajouterFormes("pi3P",analyseRules[("pP", "pi3P")].sortirForme(u"lavâ",True))
laverForms.ajouterFormes("pi3P",analyseRules[("ppFS", "pi3P")].sortirForme(u"lave",True))
laverForms.ajouterFormes("pi3P",analyseRules[("pi3P", "pi3P")].sortirForme(u"lav",True))
laverForms.ajouterFormes("pi3P",analyseRules[("ppMS", "pi3P")].sortirForme(u"lave",True))
laverForms.ajouterFormes("pi3P",analyseRules[("ppMP", "pi3P")].sortirForme(u"lave",True))
laverForms.ajouterFormes("pi3P",analyseRules[("pi1S", "pi3P")].sortirForme(u"lav",True))
laverForms.ajouterFormes("pi3P",analyseRules[("inf", "pi3P")].sortirForme(u"lave",True))
laverForms.ajouterFormes("pi3P",analyseRules[("pi2S", "pi3P")].sortirForme(u"lav",True))
laverForms.ajouterFormes("pi3P",analyseRules[("pi3S", "pi3P")].sortirForme(u"lav",True))


<a id="laverForms"></a>

Links:
- [top](#top)
- [finRegles](#finRegles)
- [laverForms](#laverForms)

In [None]:
laverForms=paradigmeDistribution("laver",analyseCases)
laverForms.ajouterFormes("ai3S",analyseRules[("pi3P", "ai3S")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("pP",analyseRules[("pi3P", "pP")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("ppFS",analyseRules[("pi3P", "ppFS")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("pi3P",analyseRules[("pi3P", "pi3P")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("ppMS",analyseRules[("pi3P", "ppMS")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("ppMP",analyseRules[("pi3P", "ppMP")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("pi1S",analyseRules[("pi3P", "pi1S")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("inf",analyseRules[("pi3P", "inf")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("pi2S",analyseRules[("pi3P", "pi2S")].sortirForme(u"lav",True))
# laverForms.ajouterFormes("pi3S",analyseRules[("pi3P", "pi3S")].sortirForme(u"lav",True))


In [None]:
def dictCliqueForms(clique):
    result={}
    for element in clique:
        lexeme,forme,case=splitArrivee(element)
        for c in dictMorphomeCases[case]:
            result[c]=forme
    return result

def dictPdRowForms(row):
    result={}
    for case in sampleCases:
        print (case,row[case].values[0])
    return result

def tableZero(case):
    if case in sampleCases:
        return u"Ø"
    else:
        return u"="

def makeTable(dictForms,title=""):
    tabular=[]
    # the last key was a duplicate "ppMP"; the feminine plural cell is "ppFP"
    labelTenseCode={"pi":"Present","ii":"Imperfective","ai":"Simple Past","fi":"Future",
                    "ps":"Subjunctive Pres.","is":"Subjunctive Imp.","pc":"Conditional","pI":"Imperative",
                    "inf":"Infinitive",
                    "ppMS":"Past Part. MS","ppMP":"Past Part. MP",
                    "ppFS":"Past Part. FS","ppFP":"Past Part. FP"
                   }
    def makeLine6(tenseCode):
        line=[]
        line.append(r"<th>%s</th>"%labelTenseCode[tenseCode])
        for person in [per+nb for nb in ["S","P"] for per in ["1","2","3"]]:
            case=tenseCode+person
            if (case in dictForms) and (not (type(dictForms[case]) == float and np.isnan(dictForms[case]))):
                line.append(r"<td>%s</td>"%(dictForms[case]))
            else:
                line.append(r"<td>%s</td>"%(tableZero(case)))
        return r"<tr>"+r"".join(line)+r"</tr>"

    def makeLine3(tenseCode):
        line=[]
        line.append(r"<th>%s</th>"%labelTenseCode[tenseCode])
        for person in [per+nb for nb in ["S","P"] for per in ["1","2","3"]]:
            if person in ["2S","1P","2P"]:
                case=tenseCode+person
                if case in dictForms and (not (type(dictForms[case]) == float and np.isnan(dictForms[case]))):
                    line.append(r"<td>%s</td>"%(dictForms[case]))
                else:
                    line.append(r"<td>%s</td>"%(tableZero(case)))
            else:
                line.append(r"<td>%s</td>"%(u"---"))
        return r"<tr>"+r"".join(line)+r"</tr>"
    
    def makeLineNF():
        line=[]
        line.append(r"<th>%s</th>"%"NF")
        for case in ["inf","pP","ppMS","ppMP","ppFS","ppFP"]:
            if case in dictForms and (not (type(dictForms[case]) == float and np.isnan(dictForms[case]))):
                line.append(r"<td>%s</td>"%(dictForms[case]))
            else:
                line.append(r"<td>%s</td>"%(tableZero(case)))
        return r"<tr>"+r"".join(line)+r"</tr>"
    
        
    top=[
        r"<table>",
        r"<caption style='caption-side:bottom;text-align:center'>",
        "Verb: %s"%title,
        r"</caption>",
#        r"<tr><th/><th>1S</th><th>2S</th><th>3S</th><th>1P</th><th>2P</th><th>3P</th></tr>"
        r"<tr><th/><th>1SG</th><th>2SG</th><th>3SG</th><th>1PL</th><th>2PL</th><th>3PL</th></tr>"
        ]
    bottom=[
        r"</table>"
        ]
    tabular.append("\n".join(top))
    for tenseCode in ["pi","ii","fi","pc", "ps","ai", "is"]:
        tabular.append(makeLine6(tenseCode))
    tabular.append(makeLine3("pI"))
    tabular.append(makeLineNF())
    tabular.append("\n".join(bottom))
    return "\n".join(tabular)    

def diffParadigme(lexeme):
    outLen=lexemeMaxCliques[lexeme][0]
    inLen=paradigmes[paradigmes["lexeme"]==lexeme].notnull().sum(axis=1).values[0]-1
    if outLen>inLen:
        print (lexemeMaxCliques[lexeme][1])
        print (paradigmes[paradigmes["lexeme"]==lexeme].values)
    return outLen-inLen
    

In [None]:
def checkFidelite(fidelite,clique):
    lFidele=False
    for element in clique:
        if fidelite in element:
            lFidele=True
    return lFidele

def generateCliques(contextFree=False):
    
    def bruteCliques(lexeme,maxCliqueSize=51):
        cliquesBrutes={n+1:0 for n in range(maxCliqueSize)}
        for l in cliquesListes[lexeme].values():
            longueur=len(l)
            if longueur>1:
                if not longueur in cliquesBrutes:
                    cliquesBrutes[longueur]=0
                cliquesBrutes[longueur]+=1
        return cliquesBrutes
    
    globDigraphe=nx.DiGraph()
    globGraphe=nx.Graph()

    globDigraphe,globGraphe,numClique=generateAnalysis(globDigraphe,globGraphe,contextFree)
    print ()

    lexemeMaxCliques={}
    lexemeParadigmes={}
    progressBarCliques = FloatProgress(min=0, max=len(cliquesListes)-1,description="Analysis (%d verbs)"%len(cliquesListes))
    display(progressBarCliques)
    for lexeme in cliquesListes:
        progressBarCliques.value+=1
        maxLen=max([len(c) for c in cliquesListes[lexeme].values()])
        lexemeMaxCliques[lexeme]=bruteCliques(lexeme,maxLen)
        print (lexeme,"Nombre de cliques",sum([v for k,v in lexemeMaxCliques[lexeme].iteritems()]))
        maxNbCliques=max([v for k,v in lexemeMaxCliques[lexeme].iteritems()])
        if plotDistributionCliques:
            ax=pd.DataFrame.from_dict(lexemeMaxCliques[lexeme],orient="index").plot(kind="bar",legend=False,grid=True,figsize=(10,3))
            ax.set(xlim=(0,maxLen+.5),ylim=(0,maxNbCliques+10))
            ax.set_xlabel("Clique Size in Cells",fontsize=16)
            ax.set_ylabel("Number of Cliques",fontsize=16)

        dictParadigmes=paradigmes.set_index("lexeme").to_dict(orient="index")

        cliquesFideles={}
        fidelites=[v+"-"+k for k,v in dictParadigmes[lexeme].iteritems() if isinstance(v,unicode)]
        for l in cliquesListes[lexeme].values():
            longueur=len(l)
            if longueur>1:
                fidele=True
                for fidelite in fidelites:
                    if "," in fidelite:
                        fideliteForme,fideliteCase=fidelite.split("-")
                        fideliteFormes=fideliteForme.split(",")
                        fideliteItems=[fideliteF+"-"+fideliteCase for fideliteF in fideliteFormes]
#                        print (fideliteItems)
                        lFidele=True
                        for f in fideliteItems:
                            lFidele=lFidele & checkFidelite(f,l)
                    else:
                        lFidele=checkFidelite(fidelite,l)
                    if not lFidele:
                        fidele=False
                        break
                if fidele:
                    if not longueur in cliquesFideles:
                        cliquesFideles[longueur]=[]
                    cliquesFideles[longueur].append(l)
#                else:
#                    if lexeme==u"bégayer": print ()
#        print ([(k,len(v)) for k,v in cliquesFideles.iteritems()])
        if cliquesFideles:
            maxCliquesCard=max([k for k,v in cliquesFideles.iteritems()])
    #        print (maxCliquesCard)
    #        print (cliquesScores[lexeme])
    #        print (cliquesListes[lexeme])
            lexemeParadigmes[lexeme]=[]
            maxScoreCliques=max([clique for cliqueNumber, clique in cliquesScores[lexeme].items()])
            maxCardScoreNums=[numC for numC, c in cliquesListes[lexeme].items() if c in cliquesFideles[maxCliquesCard]]
            maxCardScore=max([scoreC for numC, scoreC in cliquesScores[lexeme].items() if numC in maxCardScoreNums])
    #        print ("max score among all cliques:",maxScoreCliques)
            print ("max score among faithful cliques of %d forms:"%maxCliquesCard,maxCardScore)
            for c in cliquesFideles[maxCliquesCard]:
                cNumber=[cliqueNumber for cliqueNumber, clique in cliquesListes[lexeme].items() if clique == c]
                if len(cNumber)!=1:
                    print ("TOO MANY SCORES PROBLEM WITH CLIQUE", cNumber)
    #            print ("Liste n°",cNumber[0],cliquesScores[lexeme][cNumber[0]])
    #            print (sorted(cliquesScores[lexeme].items(), key=operator.itemgetter(1)))
    #            display(HTML(makeTable(dictCliqueForms(c),title=c[0].split("-")[0])))
    #            print (cliquesScores[lexeme][cNumber[0]], maxCardScore)
                if cliquesScores[lexeme][cNumber[0]]==maxCardScore:
                    lexemeParadigmes[lexeme].append(c)
        else:
            lexemeParadigmes[lexeme]=[[lexeme+"-"+f for f in fidelites]]
            print (u"no new faithful clique, the previous one contained %d forms"%len(lexemeParadigmes[lexeme][0]))
    return lexemeParadigmes
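
The selection logic at the end of `generateCliques` (faithful cliques first, then the largest ones, then the best score, ties kept) can be summarized in a small standalone sketch. It is a simplification under stated assumptions: `select_paradigms` and the sample data are hypothetical, cliques are plain lists of node names, and faithfulness is tested by exact node membership, whereas `checkFidelite` uses a substring test on "form-case" strings.

```python
def select_paradigms(cliques, scores, attested):
    """Keep faithful cliques of maximal size, then those with the best score (ties kept)."""
    faithful = [
        (number, clique) for number, clique in cliques.items()
        if len(clique) > 1 and all(node in clique for node in attested)
    ]
    if not faithful:
        # Fallback mirroring the notebook: keep only the attested forms.
        return [list(attested)]
    max_size = max(len(clique) for _, clique in faithful)
    largest = [(number, clique) for number, clique in faithful if len(clique) == max_size]
    best_score = max(scores[number] for number, _ in largest)
    return [clique for number, clique in largest if scores[number] == best_score]

# Hypothetical cliques (keyed by clique number) and scores for one lexeme.
cliques = {
    0: ["laver-lav-pi3P", "laver-lava-ai3S", "laver-lave-inf"],
    1: ["laver-lav-pi3P", "laver-lavi-ai3S", "laver-lave-inf"],
    2: ["laver-lav-pi3P", "laver-lave-inf"],
    3: ["laver-lavi-ai3S", "laver-lave-inf", "laver-lavE-ii1S"],
}
scores = {0: 0.9, 1: 0.4, 2: 0.95, 3: 0.99}
attested = ["laver-lav-pi3P"]

selected = select_paradigms(cliques, scores, attested)
print(selected)
```

Clique 3 has the highest score but is not faithful, and clique 2 is faithful but smaller, so only clique 0 survives. Two cliques with the same size and score would both be returned, which is the tied-cliques case listed in the To Do.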

In [None]:
def cutNodeName(nodeName):
    items=nodeName.split("-")
    nbItems=len(items)
    if nbItems>3:
        items=["-".join(items[0:nbItems-2]),items[-2],items[-1]]
    return items

def filledOutClique(cliques):
    if cliques:
        result=[]
        for clique in cliques:
#            print (clique)
            fullClique=[]
            for element in clique:
                if debug: print ("element",element)
                lexeme,forme,case=splitArrivee(element)
                for c in dictMorphomeCases[case]:
                    fullClique.append("-".join([lexeme,forme,c]))
            result.append(fullClique)
        return result
    else:
        return cliques
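
For node names with at least three hyphen-separated parts, the splitting done by `cutNodeName` amounts to splitting from the right. The sketch below shows that equivalent form. `split_node_name` is a hypothetical helper and the form given for the hyphenated lexeme is invented. Unlike `cutNodeName`, it raises a `ValueError` on names with fewer than three parts instead of returning a shorter list.

```python
def split_node_name(node_name):
    """Split 'lexeme-form-cell' from the right so hyphens inside the lexeme survive."""
    lexeme, form, cell = node_name.rsplit("-", 2)
    return [lexeme, form, cell]

print(split_node_name("laver-lav-pi3P"))              # ['laver', 'lav', 'pi3P']
print(split_node_name("sous-entendre-suzatad-pi3P"))  # ['sous-entendre', 'suzatad', 'pi3P']
```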

In [None]:
def extendParadigmes(contextParadigmes,extendMorphomes=False):
    lexemesParadigmeListe=[]
    for lexeme in contextParadigmes:
        if extendMorphomes:
            lexParadigmes=filledOutClique(contextParadigmes[lexeme])
        else:
            lexParadigmes=contextParadigmes[lexeme]
        if len(lexParadigmes)!=1:
            if debug:
                print ("LEXEME WITH A NON UNIQUE PARADIGM PB",len(lexParadigmes),lexeme)
                print (lexParadigmes)
        lexParadigme=lexParadigmes[0]
        for lexForme in lexParadigme:
            lexemesParadigmeListe.append(cutNodeName(lexForme))
    newForms=pd.DataFrame(lexemesParadigmeListe)
    newForms.columns=["lexeme","form","case"]
#    newParadigmes=newForms.pivot(index="lexeme", columns="case", values="form")
    newParadigmes=pd.pivot_table(newForms, values='form', index=['lexeme'], columns=['case'], aggfunc=lambda x: ",".join(x)).reset_index().reindex()
    for i in newParadigmes.itertuples():
#        print (i[0],i[1])
        lexeme=i[1]
        lexemeIndexes=paradigmes.lexeme[paradigmes.lexeme==lexeme].index.tolist()
        if not lexemeIndexes:
            # lexeme missing from paradigmes: report it and skip, instead of reusing a stale index
            print (i,lexeme,lexemeIndexes)
            continue
        lexemeIndex=lexemeIndexes[0]
        newParadigmes.loc[newParadigmes.lexeme==lexeme,"index"]=int(lexemeIndex)
    newParadigmes.set_index("index",inplace=True)   
    return paradigmes.combine_first(newParadigmes)

In [None]:
def countSplits(dfForms):
    dfForms.loc[:,"split"]=dfForms.loc[:,"value"].str.split(",")
    return dfForms["split"].str.len().sum()

def calculerResultats(contextParadigmes,extension="-Swim2"):
    
    '''Prepare the paradigm of predictions'''
    brutParadigmes=extendParadigmes(contextParadigmes,extendMorphomes=False)
    finalParadigmes=extendParadigmes(contextParadigmes,extendMorphomes=True)
    finalParadigmes.to_csv(path_or_buf=analysisPrefix+"-paradigmes%s.csv"%extension,encoding="utf8",sep=";")
    finalTestForms=pd.melt(finalParadigmes[finalParadigmes["lexeme"].isin(listeTest)],id_vars=["lexeme"]).dropna()
#     finalTestForms["lexeme-case"]=finalTestForms["lexeme"]+"-"+finalTestForms["variable"]
#     finalTestForms.drop(labels=["lexeme","variable"],axis=1,inplace=True)
    finalTestForms["lexeme-case"]=finalTestForms["lexeme"]+"-"+finalTestForms["case"]
    finalTestForms.drop(labels=["lexeme","case"],axis=1,inplace=True)
    finalTestForms.set_index(["lexeme-case"],inplace=True)
    
    '''Remove the initial forms'''
    finalForms=finalTestForms.loc[~finalTestForms.index.isin(initialFormsIndex)]
    finalFormsIndex=finalForms.index.tolist()
    
    '''Compute over- and under-generation'''
    underGeneration=goldForms.loc[~goldForms.index.isin(finalFormsIndex)]
    overGeneration=finalForms.loc[~finalForms.index.isin(goldFormsIndex)]
    
    '''Restrict the predictions and the reference to their shared cells'''
    predictedForms=finalForms.loc[finalForms.index.isin(goldFormsIndex)]
    actualForms=goldForms.loc[goldForms.index.isin(finalFormsIndex)]
    
    '''Build a table for the comparisons'''
    compareForms=predictedForms.copy()
    compareForms.loc[:,"right"]=actualForms.loc[:,"value"]
    
    '''Split identical cells from differing ones'''
    # .copy() so the columns added later do not write into a view of compareForms
    sameForms=compareForms[compareForms["value"]==compareForms["right"]].copy()
    diffForms=compareForms[compareForms["value"]!=compareForms["right"]].copy()

    
    '''Save the comparison tables'''
    overGeneration.to_csv(path_or_buf=analysisPrefix+"-overGeneration%s.csv"%extension,encoding="utf8")
    underGeneration.to_csv(path_or_buf=analysisPrefix+"-underGeneration%s.csv"%extension,encoding="utf8")
    sameForms.to_csv(path_or_buf=analysisPrefix+"-sameForms%s.csv"%extension,encoding="utf8")
    diffForms.to_csv(path_or_buf=analysisPrefix+"-diffForms%s.csv"%extension,encoding="utf8")
    
    
    '''Turn overabundant cells into lists'''
    diffForms.loc[:,"split-value"]=diffForms.loc[:,"value"].str.split(",")
    diffForms.loc[:,"split-right"]=diffForms.loc[:,"right"].str.split(",")

    '''Turn overabundant cells into set()'''
    diffForms.loc[:,"split-value"]=diffForms.loc[:,"split-value"].apply(set)
    diffForms.loc[:,"split-right"]=diffForms.loc[:,"split-right"].apply(set)
    
    '''Count the forms (overabundant ones included)'''
    nbValues=diffForms["split-value"].str.len().sum()
    nbRights=diffForms["split-right"].str.len().sum()

    '''Compute identities and inclusions'''
    nbIdenticalSets=diffForms[diffForms["split-value"]==diffForms["split-right"]]["split-value"].str.len().sum()
    nbIncludedSets=diffForms[diffForms["split-value"]<diffForms["split-right"]]["split-value"].str.len().sum()
    nbWrongForms=(nbValues-nbIdenticalSets-nbIncludedSets)
    underBonus=(nbRights-nbIdenticalSets-nbIncludedSets)

    UG=countSplits(underGeneration)+underBonus
    OG=countSplits(overGeneration)
    TP=countSplits(sameForms)+nbIdenticalSets+nbIncludedSets
    FP=nbWrongForms
    resultCharacteristics=(UG,OG,TP,FP)
    recall=float(TP)/(UG+TP+FP)
    precision=float(TP)/(OG+TP+FP)
    fMeasure=2*recall*precision/(recall+precision)
    resultMeasures=(precision,recall,fMeasure)
    print ("UG",UG ,"OG",OG,"TP",TP,"FP",FP)
    print ("recall", recall, "precision", precision)
    print (fMeasure)
    return (brutParadigmes,finalParadigmes,resultCharacteristics,resultMeasures)
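
The evaluation in `calculerResultats` rests on two ideas: comparing overabundant cells as sets of forms, and deriving precision, recall and F-measure from the UG/OG/TP/FP counts. The sketch below isolates both. `classify_cell` and `measures` are hypothetical helpers, the denominators are copied from the function above, and the guards against empty denominators are an addition for the example.

```python
def classify_cell(predicted, reference):
    """Compare two comma-separated sets of forms for a single cell."""
    predicted_set = set(predicted.split(","))
    reference_set = set(reference.split(","))
    if predicted_set == reference_set:
        return "identical"
    if predicted_set < reference_set:  # proper subset: nothing wrong, something missing
        return "included"
    return "wrong"

def measures(UG, OG, TP, FP):
    """Precision, recall and F-measure with the denominators used in calculerResultats."""
    recall = float(TP) / (UG + TP + FP) if (UG + TP + FP) else 0.0
    precision = float(TP) / (OG + TP + FP) if (OG + TP + FP) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

print(classify_cell("a,b", "b,a"))   # identical
print(classify_cell("a", "a,b"))     # included
print(classify_cell("a,c", "a,b"))   # wrong
print(measures(UG=10, OG=5, TP=80, FP=10))
```

With these definitions, wrong forms lower both measures, while under-generation only affects recall and over-generation only affects precision.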


# SWIM1

In [None]:
cliques=[]
cliquesScores={}
cliquesListes={}

formesScores={}
formesScoresNormes={}

print (datetime.datetime.now())
%time swim1ContextParadigmes=generateCliques()

In [None]:
newParadigmes,paradigmesSwim1,characteristicsSwim1,measuresSwim1=calculerResultats(swim1ContextParadigmes,"-Swim1")
%ding

# SWIM2

## Preparing the paradigms

In [None]:
paradigmesOriginaux=paradigmes.copy()
paradigmesSample=paradigmesOriginaux[paradigmesOriginaux["lexeme"].isin(listeTest)]

In [None]:
paradigmes=newParadigmes.copy()
paradigmesColumns=paradigmes.columns.tolist()
for c in sampleCases:
    if not c in paradigmesColumns:
        print (c)
        paradigmes[c]=np.nan

In [None]:
cliques=[]
cliquesScores={}
cliquesListes={}

formesScores={}
formesScoresNormes={}

print (datetime.datetime.now())
%time swim2ContextParadigmes=generateCliques(contextFree=True)

In [None]:
newParadigmes,paradigmesSwim2,characteristicsSwim2,measuresSwim2=calculerResultats(swim2ContextParadigmes,"-Swim2")

In [None]:
nomFichierResultats=filePrefix+"-X-Resultats.yaml"
if os.path.isfile(nomFichierResultats):
    with open(nomFichierResultats, 'r') as stream:
        # safe_load is enough: the file is written with safe_dump below
        resultats=yaml.safe_load(stream) or {}
else:
    resultats={}

if casesType:
    sampleExt=casesType
else:
    sampleExt=sampleType
# sampleNumero is the name defined when the rules were chosen
sampleId=sampleNumero.strip("-")+sampleExt
resultats[sampleId]={}
for swimName,characteristics,measures in [("Swim1",characteristicsSwim1,measuresSwim1),
                                          ("Swim2",characteristicsSwim2,measuresSwim2)]:
    # cast to plain int/float: safe_dump cannot represent numpy scalars
    resultats[sampleId][swimName]={
        "UG":int(characteristics[0]),
        "OG":int(characteristics[1]),
        "TP":int(characteristics[2]),
        "FP":int(characteristics[3]),
        "Precision":float(measures[0]),
        "Recall":float(measures[1]),
        "F-Measure":float(measures[2]),
    }

with open(nomFichierResultats, 'w') as stream:
    yaml.safe_dump(resultats, stream, encoding='utf-8', allow_unicode=True)

In [None]:
%ding
%ding

In [None]:
print ("Sample",sampleNumero.strip("-"))
print ("Swim1",characteristicsSwim1,measuresSwim1)
print ("Swim2",characteristicsSwim2,measuresSwim2)

# End of processing