# Classification de mots pour des annonces immobilières

### Le but de ce projet est de récuperer les informations importantes d'une annonce immobilière grâce à un classifieur de mots pour des annonces immobilières.

Les informations à récupérer sont : 
+ M2: surface en m2
+ N_PIECES : nombre de pièces
+ N_CHAMBRES: nombre de chambres
+ VILLE:
+ QUARTIER: nom du quartier
+ ADRESSE: nom du la rue (avec le numero si indiqué)
+ TRANSPORTS_PROXIMITE: transports à proximité
+ ANNEE_CONSTRUCTION: annee de construction de l'immeuble
+ CODE_POSTAL: code postal (92130)
+ LOYER_CC: montant du loyer charges comprises
+ LOYER_HC: montant du loyer hors charges
+ CHARGES_LOCATAIRE_MOIS: montant des charges mensuelles
+ DEPOT_GARANTIE: montant du depot de garantie
+ N_ETAGE:numero etage
+ AVEC_ASCENSEUR:
+ DATE_DISPO:
+ TYPE_CHAUFFAGE: individuel /collectif
+ TYPE_LOCATION: meublé ou non meublé
+ PARKING :
+ EXTERIEUR : présence d'un jardin/balcon/terrasse
+ COPROPRIETE :
+ HONORAIRE : montant des honoraires de l'agence
+ STOCKAGE : présence d'une cave/box ou autre élément de stockage

Pour cela nous allons utiliser le NER tagging et des techniques features-based.

# Sommaire

* I/ Importation de la base de données
* II/ Création des features
* III/ Classification avec le modele CRF 
    + A/ Creation base de donnée pour CRF
    + B/ Apprentissage et Validation du modele CRF
        + 1/ Apprentissage et Validation simple du modele CRF
        + 2/ Apprentissage après optimisation des hyperparametres et k-cross Validation sur le train et validation du modele CRF sur le test
* IV/ Interpretation avec le modele CRF
* V/ Exportation Prediction vers Fichier Json importable sur Doccano
* VI/ Sequence Tagging avec bi LSTM-CRF

# Importation des librairies

In [1]:
import json
import pandas as pd
import numpy as np
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize 
import unicodedata
from difflib import SequenceMatcher
from sklearn.model_selection import train_test_split
import sklearn_crfsuite
from sklearn_crfsuite import scorers
from sklearn_crfsuite import metrics
import scipy.stats
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RandomizedSearchCV
from collections import Counter
import eli5
from sklearn.metrics import confusion_matrix

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\sujiv\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Using TensorFlow backend.


### Fonctions

# I/ Importation de la base de données

In [2]:
def from_ad_to_dataframe(line, nb_line):
    """Fonction qui, à partir d'une ligne d'un fichier Json, permet de créer une dataframe
    à trois colonnes. La première colonne correspond au numéro de l'annonce, la seconde 
    contient les mots de l'annonce et la troisième les positions du mot dans l'annonce.""" 
    Vect_word=(word_tokenize(eval(line.strip().replace('\xa0',' '))["text"])) # Tokenisation
    nb_sent_list=list(map(int, nb_line*np.ones(len(Vect_word)))) # Numéro annonce
    # Position
    offset = 0                                                                  
    list_pos=list()
    for token in Vect_word:
        offset = eval(line.strip().replace('\xa0',' '))["text"].find(token, offset)
        list_pos.append([offset, offset+len(token)])
        offset += len(token)
    # Creation de la dataframe
    data={'Ad#':nb_sent_list,'Words':Vect_word,'Pos':list_pos}
    dataframe=pd.DataFrame(data)
    return dataframe

def clean_text(text):
    """Fonction qui permet de corriger les annotations qui surlignent un espace blanc au début 
    ou à la fin de l'annotations sous Doccano"""
    if text[0]==' ':      
        if text[-1]==' ':
            return 3
        return 1
    elif text[-1]==' ':
        return 2
    else :
        return 0
    
def from_line_to_list_label(line):
    """Fonction qui permet de sortir les informations des labels (text, label et positions)
    à partir d'une ligne du fichier json"""
    list_word_label=list()
    for i in range(len(eval(line)["labels"])):
        start=eval(line)["labels"][i][0]  # position de depart
        end=eval(line)["labels"][i][1]    # position d'arrivee
        label=eval(line)["labels"][i][2]  # label
        # Distinction des cas au fonction de la fonction clean_text
        # on supprime les \xa0 de nos annonces 
        if (clean_text(eval(line.strip().replace('\xa0',' '))["text"][start:end])==0):
            list_word_label.append([eval(line.strip().replace('\xa0',' '))["text"][start:end],label,start,end])
        elif (clean_text(eval(line.strip().replace('\xa0',' '))["text"][start:end])==1):
            list_word_label.append([eval(line.strip().replace('\xa0',' '))["text"][(start+1):end],label,start+1,end])
        elif (clean_text(eval(line.strip().replace('\xa0',' '))["text"][start:end])==2):
            list_word_label.append([eval(line.strip().replace('\xa0',' '))["text"][start:(end-1)],label,start,end-1])
        else:
            list_word_label.append([eval(line.strip().replace('\xa0',' '))["text"][(start+1):(end-1)],label,start+1,end-1])
    return list_word_label

def column_tag(vect_word,list_word_pos_label):
    """Creation de la colonne contenant les labels pour chaque mot d'une annonce avec la convention Inside–outside–beginning tagging"""
    list_tag=["O"]*len(vect_word["Pos"])
    for i in range(len(vect_word["Pos"])):
        for elmt in list_word_pos_label:
            if vect_word["Pos"][i][0]==elmt[2] and vect_word["Pos"][i][1]<=elmt[3]:
                list_tag[i]="B-"+elmt[1]
            elif vect_word["Pos"][i][0]>elmt[2] and vect_word["Pos"][i][1]<=elmt[3]:
                list_tag[i]="I-"+elmt[1]
    return list_tag

### Importation et creation de la base de données à partir des fichiers Json1 de Doccano

In [3]:
%%time
def dataframe_from_json():
    cnt = 1 # Numéro annonce
    for i in range(1,6): # 5 fichier Json
        with open('data/doccano/bdd'+str(i)+'.json1', encoding="utf-8") as fp: #ouverture fichier
            line = fp.readline() # lecture de la ligne
            # Modification de la dataframe
            if i==1: # Creation initiale de la dataframe 
                dataframe=from_ad_to_dataframe(line.replace('null','"null"'),cnt)
                list_word_pos_label=from_line_to_list_label(line.replace('null','"null"'))
                list_tag=column_tag(dataframe,list_word_pos_label)
                dataframe["Tag"]=list_tag
            else :
                df_ad=from_ad_to_dataframe(line.replace('null','"null"'),cnt)
                list_word_pos_label=from_line_to_list_label(line.replace('null','"null"'))
                list_tag=column_tag(df_ad,list_word_pos_label)
                df_ad["Tag"]=list_tag
                dataframe=dataframe.append(df_ad, ignore_index = True)
            while line: # pour toutes les lignes
                if cnt!=1:
                    df_ad=from_ad_to_dataframe(line.replace('null','"null"'),cnt)
                    list_word_pos_label=from_line_to_list_label(line.replace('null','"null"'))
                    list_tag=column_tag(df_ad,list_word_pos_label)
                    df_ad["Tag"]=list_tag
                    dataframe=dataframe.append(df_ad, ignore_index = True)
                line = fp.readline()
                cnt += 1
    return dataframe

df=dataframe_from_json()

Wall time: 1min 8s


In [4]:
print(df.loc[df['Ad#']==482,:]) # On affiche une partie de la dataframe pour l'annonce 482

       Ad#         Words         Pos                       Tag
41057  482            24      [0, 2]                 B-ADRESSE
41058  482           rue      [3, 6]                 I-ADRESSE
41059  482            du      [7, 9]                 I-ADRESSE
41060  482     Capitaine    [10, 19]                 I-ADRESSE
41061  482       Ferber-    [20, 27]                         O
41062  482   Copropriété    [28, 39]             B-COPROPRIETE
41063  482            de    [40, 42]                         O
41064  482          2004    [43, 47]      B-ANNEE_CONSTRUCTION
41065  482            de    [48, 50]                         O
41066  482      standing    [51, 59]                         O
41067  482             ,    [59, 60]                         O
41068  482            au    [61, 63]                         O
41069  482         2ième    [64, 69]                 B-N_ETAGE
41070  482         étage    [70, 75]                         O
41071  482             ,    [75, 76]                   

A ce niveau la dataframe contient donc une colonne qui indique le numéro de l'annonce, une colonne avec les mots, une colonne pour les postions du mots dans l'annonce et le label/tag avec les convientions IOB

# II/ Creation des features

Nous allons créer les features suivantes : 
+ le mot contient 4 caractères ou moins 
+ le mot est un nombre
+ le mot commence par une lettre majuscule 
+ le mot est en majuscule
+ le mot contient un symbole 
+ le mot contient des chiffres et des lettres 
+ le mot contient un mot clé (nous avons une liste de mots clés que nous avons choisi judicieusement)
+ le mot precedent et reconnaitre si c'est un mot clé
+ le mot precedent le mot precedent et reconnaitre si c'est un mot clé
+ le mot suivant et reconnaitre si c'est un mot clé
+ le mot suivant le mot suivant et reconnaitre si c'est un mot clé

### Fonctions

In [5]:
# Feature mot court
def is_small_word(Vect_word):
    list_small=list()
    for word in Vect_word:      
        if(len(word)<=4):
            list_small.append(1)
        else:
            list_small.append(0)
    return list_small

# Feature le mot est un nombre (decimal ou non)
def is_number(Vect_word):
    list_number=list()
    for word in Vect_word:      
        word=word.replace(",","").replace(".","") # on peut ecrire un nombre decimal avec un point ou une virgule
        try:
            float(word)
            list_number.append(1)
        except ValueError:
            list_number.append(0)
    return list_number

# Feature premiere lettre en majuscule
def is_first_letter_upper(Vect_word):
    list_upper=list()
    for word in Vect_word:
        list_upper.append(int(word[0].isupper()))
    return list_upper

# Feature mot en majuscule
def is_all_upper(Vect_word):
    list_upper=list()
    for word in Vect_word:
        list_upper.append(int(word.isupper()))
    return list_upper

# Feature symbole dans le mot
def symbole_in_word(Vect_word):
    list_symbole=list()
    for word in Vect_word:
        list_symbole.append( int( not( word.isalpha() or word.isnumeric() ) ) ) # pas un chiffre, pas une lettre donc un symbole
    return list_symbole

#Feature nombre et lettre dans le mot
def is_number_and_letter(Vect_word):
    list_number_and_letter = list()
    for word in Vect_word:
        numeric = 0
        alpha = 0
        for c in word:
            if c.isnumeric():
                numeric=1
            if c.isalpha():
                alpha=1
        list_number_and_letter.append(alpha*numeric)
    return(list_number_and_letter)

#Feature mot clé

def strip_accents(text):
    """Fonction qui retire les accents du texte"""
    try:
        text = unicode(text, 'utf-8')
    except NameError: # unicode is a default on python 3 
        pass
    text = unicodedata.normalize('NFD', text)\
           .encode('ascii', 'ignore')\
           .decode("utf-8")
    return text

def is_key_word(Vect_word):
    keywords = ["chambres","pieces","m2","m","loyer","cc","hc","rue","avenue","quartier","euro","eur","etage","€","individuel","collectif",
                "meuble","jardin","balcon","terasse","stationnement","parking","cave","box","immediatement","suite"]
    list_keyword = list()
    for word in Vect_word:
        test = 0
        #remove all accents
        s = strip_accents(word)
        #put it in lower case
        s = s.lower()
        #use sequencematcher
        for key in keywords:
            if(SequenceMatcher(None,s,key).ratio() > 0.7): # 0.7 est le threshold
                test = 1
                break
        list_keyword.append(test)
    return(list_keyword)

On complete la dataframe

In [6]:
%%time

# Features mots precedents, mots suivants, et reconnaitre si ce sont des mots clés
def list_word_features(dataframe):    
    Number_ad=dataframe['Ad#'].iloc[-1]

    list_prev_word=list()
    list_2prev_word=list()
    list_next_word=list()
    list_2next_word=list()

    list_prev_key=list()
    list_2prev_key=list()
    list_next_key=list()
    list_2next_key=list()

    for i in range(Number_ad):
        Vect_ad_word=dataframe.loc[dataframe['Ad#']==i+1,:]["Words"]
        Vect_ad_key=dataframe.loc[dataframe['Ad#']==i+1,:]["is_key_word"]
        for j in range(len(Vect_ad_word)):
            # Pour chaque annonce, on fait attention aux deux premiers et deux derniers mots  
            if j==0:
                list_prev_word.append("__Start1__")
                list_2prev_word.append("__Start2__")
                list_next_word.append(Vect_ad_word.iloc[j+1])
                list_2next_word.append(Vect_ad_word.iloc[j+2])

                list_prev_key.append(0)
                list_2prev_key.append(0)
                list_next_key.append(Vect_ad_key.iloc[j+1])
                list_2next_key.append(Vect_ad_key.iloc[j+2])

            elif j==1:
                list_prev_word.append(Vect_ad_word.iloc[j-1])
                list_2prev_word.append("__Start1__")
                list_next_word.append(Vect_ad_word.iloc[j+1])
                list_2next_word.append(Vect_ad_word.iloc[j+2])

                list_prev_key.append(Vect_ad_key.iloc[j-1])
                list_2prev_key.append(0)
                list_next_key.append(Vect_ad_key.iloc[j+1])
                list_2next_key.append(Vect_ad_key.iloc[j+2])

            elif j==len(Vect_ad_word)-2:
                list_prev_word.append(Vect_ad_word.iloc[j-1])
                list_2prev_word.append(Vect_ad_word.iloc[j-2])
                list_next_word.append(Vect_ad_word.iloc[j+1])
                list_2next_word.append("__End1__")

                list_prev_key.append(Vect_ad_key.iloc[j-1])
                list_2prev_key.append(Vect_ad_key.iloc[j-2])
                list_next_key.append(Vect_ad_key.iloc[j+1])
                list_2next_key.append(0)

            elif j==len(Vect_ad_word)-1:
                list_prev_word.append(Vect_ad_word.iloc[j-1])
                list_2prev_word.append(Vect_ad_word.iloc[j-2])
                list_next_word.append("__End1__")
                list_2next_word.append("__End2__")

                list_prev_key.append(Vect_ad_key.iloc[j-1])
                list_2prev_key.append(Vect_ad_key.iloc[j-2])
                list_next_key.append(0)
                list_2next_key.append(0)
            else :
                list_prev_word.append(Vect_ad_word.iloc[j-1])
                list_2prev_word.append(Vect_ad_word.iloc[j-2])
                list_next_word.append(Vect_ad_word.iloc[j+1])
                list_2next_word.append(Vect_ad_word.iloc[j+2])

                list_prev_key.append(Vect_ad_key.iloc[j-1])
                list_2prev_key.append(Vect_ad_key.iloc[j-2])
                list_next_key.append(Vect_ad_key.iloc[j+1])
                list_2next_key.append(Vect_ad_key.iloc[j+2])
    return [list_2prev_word, list_prev_word, list_next_word, list_2next_word, list_2prev_key, list_prev_key, list_next_key, list_2next_key]

def add_features(dataframe):
    '''Fonction qui permet de rajouter les features à la dataframe'''
    dataframe["is_small_word"]=is_small_word(dataframe['Words'])
    dataframe["is_number"]=is_number(dataframe['Words'])
    dataframe["is_first_letter_upper"]=is_first_letter_upper(dataframe['Words'])
    dataframe["is_all_upper"]=is_all_upper(dataframe['Words'])
    dataframe["symbole_in_word"]=symbole_in_word(dataframe['Words'])
    dataframe["is_number_and_letter"]=is_number_and_letter(dataframe['Words'])
    dataframe["is_key_word"]=is_key_word(dataframe['Words'])
    
    word_features=list_word_features(dataframe)
    dataframe["2prev_word"]=word_features[0]
    dataframe["2prev_key"]=word_features[4]
    dataframe["prev_word"]=word_features[1]
    dataframe["prev_key"]=word_features[5]
    dataframe["next_word"]=word_features[2]
    dataframe["next_key"]=word_features[6]
    dataframe["2next_word"]=word_features[3]
    dataframe["2next_key"]=word_features[7]

add_features(df)

Wall time: 1min 16s


In [7]:
print(df.loc[df['Ad#']==482,:].head()) # On affiche une partie de la dataframe pour l'annonce 482

       Ad#      Words       Pos        Tag  is_small_word  is_number  \
41057  482         24    [0, 2]  B-ADRESSE              1          1   
41058  482        rue    [3, 6]  I-ADRESSE              1          0   
41059  482         du    [7, 9]  I-ADRESSE              1          0   
41060  482  Capitaine  [10, 19]  I-ADRESSE              0          0   
41061  482    Ferber-  [20, 27]          O              0          0   

       is_first_letter_upper  is_all_upper  symbole_in_word  \
41057                      0             0                0   
41058                      0             0                0   
41059                      0             0                0   
41060                      1             0                0   
41061                      1             0                1   

       is_number_and_letter  is_key_word  2prev_word  2prev_key   prev_word  \
41057                     0            0  __Start2__          0  __Start1__   
41058                     0   

In [8]:
print(df.isnull().sum()) # On verifie qu'il n'y a pas de NA

Ad#                      0
Words                    0
Pos                      0
Tag                      0
is_small_word            0
is_number                0
is_first_letter_upper    0
is_all_upper             0
symbole_in_word          0
is_number_and_letter     0
is_key_word              0
2prev_word               0
2prev_key                0
prev_word                0
prev_key                 0
next_word                0
next_key                 0
2next_word               0
2next_key                0
dtype: int64


# III/ Classification avec le modele CRF

Pour utiliser le modele CRF, on doit modifier notre base de donnée et créer des dictionnaire pour pouvoir mettre notre dataframe en entrée du modele

### A/ Creation base de donnée pour CRF

In [9]:
# Pour chaque mot on crée un dictionaire à partir de la dataframe existante
def word2features(df_one_ad, i):
    df_one_word = df_one_ad.iloc[i]
    features = {
        'bias': 1.0,
        'word.lower()': df_one_word["Words"].lower(),
        'is_small_word': bool(df_one_word["is_small_word"]),
        'is_number': bool(df_one_word["is_number"]),
        'is_first_letter_upper': bool(df_one_word["is_first_letter_upper"]),
        'is_all_upper': bool(df_one_word["is_all_upper"]),
        'symbole_in_word': bool(df_one_word["symbole_in_word"]),      
        'is_number_and_letter': bool(df_one_word["is_number_and_letter"]),
        'is_key_word': bool(df_one_word["is_key_word"]),
        '2prev_word': df_one_word["2prev_word"].lower(),
        '2prev_key': bool(df_one_word["2prev_key"]),
        'prev_word': df_one_word["prev_word"].lower(),
        'prev_key': bool(df_one_word["prev_key"]), 
        'next_word': df_one_word["next_word"].lower(),
        'next_key': bool(df_one_word["next_key"]), 
        '2next_word': df_one_word["2next_word"].lower(),
        '2next_key': bool(df_one_word["2next_key"]), 
    }
    return features

# Vecteur contenant les dictionnaires d'une annonce 
def ad2features(df_one_ad):
    return [word2features(df_one_ad, i) for i in range(len(df_one_ad))]

# Return le tag d'un mot
def word2tags(df_one_ad, i):
    return df_one_ad.iloc[i]["Tag"]

# Vecteur de tags d'une annonce
def ad2tags(df_one_ad):
    return [word2tags(df_one_ad, i) for i in range(len(df_one_ad))]

Number_ad=df['Ad#'].iloc[-1]

# Base de données pour CRF
X_crf=list() 
y_crf=list() # Target 
for i in range(Number_ad):
    X_crf.append(ad2features(df.loc[df['Ad#']==i+1,:]))
    y_crf.append(ad2tags(df.loc[df['Ad#']==i+1,:]))
    

In [10]:
X_crf[0][0:2] # On observe la base de donnée CRF pour les deux premiers mots 

[{'bias': 1.0,
  'word.lower()': 'situé',
  'is_small_word': False,
  'is_number': False,
  'is_first_letter_upper': True,
  'is_all_upper': False,
  'symbole_in_word': False,
  'is_number_and_letter': False,
  'is_key_word': True,
  '2prev_word': '__start2__',
  '2prev_key': False,
  'prev_word': '__start1__',
  'prev_key': False,
  'next_word': 'à',
  'next_key': False,
  '2next_word': '6',
  '2next_key': False},
 {'bias': 1.0,
  'word.lower()': 'à',
  'is_small_word': True,
  'is_number': False,
  'is_first_letter_upper': False,
  'is_all_upper': False,
  'symbole_in_word': False,
  'is_number_and_letter': False,
  'is_key_word': False,
  '2prev_word': '__start1__',
  '2prev_key': False,
  'prev_word': 'situé',
  'prev_key': True,
  'next_word': '6',
  'next_key': False,
  '2next_word': 'stations',
  '2next_key': False}]

In [11]:
y_crf[0][0:2] # On observe la base de donnée CRF pour les deux premiers mots 

['O', 'O']

### B/ Apprentissage et Validation du modele CRF

Creation de la base de train et de test (proportion 2/3, 1/3)

Attention, l'argument random_state est utilisé.

In [12]:
X_crf_train, X_crf_test, y_crf_train, y_crf_test = train_test_split(X_crf,y_crf,test_size=0.33, random_state=12) # ATTENTION AU RANDOM STATE

#### 1/ Apprentissage et Validation simple du modele CRF

On entraine le modele avec la base de train.

In [13]:
%%time
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs', 
    c1=0.1, 
    c2=0.1, 
    max_iterations=100, 
    all_possible_transitions=True
)
crf.fit(X_crf_train, y_crf_train)

Wall time: 1min 3s


In [14]:
# On affiche les différents modeles que l'on doit recuperer
labels = list(crf.classes_)
labels.remove('O')
labels

['B-ADRESSE',
 'I-ADRESSE',
 'B-N_ETAGE',
 'B-AVEC_ASCENSEUR',
 'I-AVEC_ASCENSEUR',
 'B-N_PIECES',
 'B-N_CHAMBRES',
 'B-STOCKAGE',
 'B-PARKING',
 'B-DATE_DISPO',
 'I-VILLE',
 'B-TRANSPORTS_PROXIMITE',
 'I-TRANSPORTS_PROXIMITE',
 'B-TYPE_CHAUFFAGE',
 'B-M2',
 'B-EXTERIEUR',
 'B-QUARTIER',
 'B-VILLE',
 'B-LOYER_CC',
 'I-DATE_DISPO',
 'B-HONORAIRE',
 'I-QUARTIER',
 'I-N_PIECES',
 'B-TYPE_LOCATION',
 'B-ANNEE_CONSTRUCTION',
 'B-LOYER_HC',
 'B-CHARGES_LOCATAIRE_MOIS',
 'B-DEPOT_GARANTIE',
 'I-N_ETAGE',
 'I-LOYER_HC',
 'I-ANNEE_CONSTRUCTION',
 'I-DEPOT_GARANTIE',
 'B-COPROPRIETE',
 'I-HONORAIRE',
 'B-CODE_POSTAL',
 'I-LOYER_CC',
 'I-M2',
 'I-TYPE_CHAUFFAGE',
 'I-TYPE_LOCATION',
 'I-COPROPRIETE',
 'I-STOCKAGE',
 'I-PARKING',
 'I-CHARGES_LOCATAIRE_MOIS']

On predit les classes sur la base de test

In [15]:
y_pred = crf.predict(X_crf_test)
metrics.flat_f1_score(y_crf_test, y_pred, 
                      average='weighted', labels=labels)

# WARNING EXPLANATION some labels in y_true don't appear in y_pred
# This means that there is no F-score to calculate for this label, and thus the F-score for this case is considered to be 0.0. Since you requested an average of the score, you must take into account that a score of 0 was included in the calculation, and this is why scikit-learn is showing you that warning.

  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


0.8412282068598782

On observe une accuracy proche de 85%, ce qui pas mal vu le nombre de classes à predire. 

Nous allons calculer la precision, le recall et le f1-score pour chaque classe.

In [16]:
# group B and I results
sorted_labels = sorted(
    labels, 
    key=lambda name: (name[1:], name[0])
)
print(metrics.flat_classification_report(
    y_crf_test, y_pred, labels=sorted_labels, digits=3
))

  'precision', 'predicted', average, warn_for)
  'recall', 'true', average, warn_for)


                          precision    recall  f1-score   support

               B-ADRESSE      0.957     0.800     0.871        55
               I-ADRESSE      0.919     0.771     0.839       118
    B-ANNEE_CONSTRUCTION      0.950     0.679     0.792        28
    I-ANNEE_CONSTRUCTION      1.000     0.600     0.750         5
        B-AVEC_ASCENSEUR      0.754     0.776     0.765        67
        I-AVEC_ASCENSEUR      0.767     0.868     0.814        53
B-CHARGES_LOCATAIRE_MOIS      0.895     0.919     0.907        37
I-CHARGES_LOCATAIRE_MOIS      0.000     0.000     0.000         0
           B-CODE_POSTAL      0.833     0.833     0.833         6
           B-COPROPRIETE      0.850     1.000     0.919        17
           I-COPROPRIETE      0.000     0.000     0.000         0
            B-DATE_DISPO      0.816     0.705     0.756        44
            I-DATE_DISPO      0.903     0.651     0.757        43
        B-DEPOT_GARANTIE      1.000     0.955     0.977        22
        I

Pour tout les F1_score inferieur à 0.75 et un support superieur à 10, nous allons regarder la matrice de confusion pour éventuellement trouver des features à rajouter. 

Nous allons regarder B-LOYER-CC (l18), B-QUARTIER (l31), I-QUARTIER (l32) et B-TYPE_LOCATION (l39)

In [17]:
flat_list_y_test = [item for sublist in y_crf_test for item in sublist]
flat_list_y_pred= [item for sublist in y_pred for item in sublist]
conf_mat=confusion_matrix(flat_list_y_test, flat_list_y_pred, labels=sorted_labels)
print(conf_mat[18])
print(conf_mat[31])
print(conf_mat[32])
print(conf_mat[39])

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0 14  0  4  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
[ 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0 17  2  0  0  0  0  0  0  0  0  0  0]
[ 0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  3 12  0  0  0  0  0  0  0  0  1  0]
[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 31  0  0  0]


In [18]:
nb_support=0
nb_predict_not_O_from_true_O=0
for i in range(len(flat_list_y_test)):
    if flat_list_y_test[i]!='O':
        nb_support+=1
    else :
        if flat_list_y_pred[i]!='O':
            nb_predict_not_O_from_true_O+=1

nb_good_pred=0
sum_line=0
for i in range(len(conf_mat)):
    nb_good_pred+=conf_mat[i,i]
    sum_line+=sum(conf_mat[i])
    
print("Nombre d'erreur en prevoyant 'O' à partir d'un label different de 'O'", nb_support-sum_line)
print("Nombre d'erreur en prevoyant un autre label que 'O' à partir d'un label different de 'O'", sum_line-nb_good_pred)
print("Nombre de labels 'O' mal prédits", nb_predict_not_O_from_true_O)
print("Nombre de labels 'O' bien prédits", len(flat_list_y_test)-nb_support-nb_predict_not_O_from_true_O)

Nombre d'erreur en prevoyant 'O' à partir d'un label different de 'O' 308
Nombre d'erreur en prevoyant un autre label que 'O' à partir d'un label different de 'O' 63
Nombre de labels 'O' mal prédits 155
Nombre de labels 'O' bien prédits 14299


Lorsque le tag est mauvais c'est souvent dû au fait qu'on predit un label 'O' (ce n'est pas indiqué sur la matrice de confusion mais pour calculer le nombre de mauvaises prédictions en 'O', on a juste à faire support - somme element sur la ligne). Nous avons deja des features mots cles, il est difficile de trouver encore d'autres features pour differencier 'O' et les 'vrais' tag. De plus si on regarde le nombre de labels 'O' mal prédits, ce nombre est faible par rapport aux nombres de labels 'O' bien prédits. 

#### 2/ Apprentissage après optimisation des hyperparametres et k-cross Validation sur le train et validation du modele CRF sur le test

##### Optimisation Hyperparametres

ATTENTION : Le prochain chunk met du temps à s'executer

On utilise la fonction RandomizedSearchCV pour optimiser les hyperparametres. 
Remarque : en argument de cette fonction, on peut mettre en entrée le nombre k, utilisé pour le k-cross Validation.
Nous avons choisi k=3, on pourrait mettre un plus grand nombre mais vu le temps elevé pour l'execution du code, on se contente de k=3.

In [19]:
%%time
# define fixed parameters and parameters to search
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs', 
    max_iterations=100, 
    all_possible_transitions=True
)
params_space = {
    'c1': scipy.stats.expon(scale=0.5),
    'c2': scipy.stats.expon(scale=0.05),
}

# use the same metric for evaluation
f1_scorer = make_scorer(metrics.flat_f1_score, 
                        average='weighted', labels=labels)

# search
rs = RandomizedSearchCV(crf, params_space, 
                        cv=3, 
                        verbose=1, 
                        n_jobs=-1, 
                        n_iter=50, 
                        scoring=f1_scorer)
rs.fit(X_crf_train, y_crf_train)

Fitting 3 folds for each of 50 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed: 11.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 36.4min finished


Wall time: 37min 4s


Meilleur paramètre

In [20]:
print(rs.best_estimator_)
print('best params:', rs.best_params_)
print('best CV score:', rs.best_score_)

CRF(algorithm='lbfgs', all_possible_states=None,
  all_possible_transitions=True, averaging=None, c=None,
  c1=0.3862339436562489, c2=0.02116947246432221,
  calibration_candidates=None, calibration_eta=None,
  calibration_max_trials=None, calibration_rate=None,
  calibration_samples=None, delta=None, epsilon=None, error_sensitive=None,
  gamma=None, keep_tempfiles=None, linesearch=None, max_iterations=100,
  max_linesearch=None, min_freq=None, model_filename=None,
  num_memories=None, pa_type=None, period=None, trainer_cls=None,
  variance=None, verbose=False)
best params: {'c1': 0.3862339436562489, 'c2': 0.02116947246432221}
best CV score: 0.8528094661561127


##### Validation sur le test avec les meilleurs paramètres

In [21]:
crf = rs.best_estimator_
y_pred = crf.predict(X_crf_test)
print(metrics.flat_classification_report(
    y_crf_test, y_pred, labels=sorted_labels, digits=3
))

                          precision    recall  f1-score   support

               B-ADRESSE      0.957     0.800     0.871        55
               I-ADRESSE      0.930     0.788     0.853       118
    B-ANNEE_CONSTRUCTION      0.950     0.679     0.792        28
    I-ANNEE_CONSTRUCTION      1.000     0.600     0.750         5
        B-AVEC_ASCENSEUR      0.754     0.776     0.765        67
        I-AVEC_ASCENSEUR      0.746     0.887     0.810        53
B-CHARGES_LOCATAIRE_MOIS      0.902     1.000     0.949        37
I-CHARGES_LOCATAIRE_MOIS      0.000     0.000     0.000         0
           B-CODE_POSTAL      0.833     0.833     0.833         6
           B-COPROPRIETE      0.850     1.000     0.919        17
           I-COPROPRIETE      0.000     0.000     0.000         0
            B-DATE_DISPO      0.737     0.636     0.683        44
            I-DATE_DISPO      0.824     0.651     0.727        43
        B-DEPOT_GARANTIE      1.000     1.000     1.000        22
        I

Les résultats sont similaires même apres l'optimisation.

# IV/ Interpretation avec le modele CRF

On regarde les transitions possibles entre les labels (plus le nombre est grand plus la possibilité de transition entre ces deux labels est grande).

In [22]:
def print_transitions(trans_features):
    for (label_from, label_to), weight in trans_features:
        print("%-6s -> %-7s %0.6f" % (label_from, label_to, weight))

print("Top likely transitions:")
print_transitions(Counter(crf.transition_features_).most_common(20))

print("\nTop unlikely transitions:")
print_transitions(Counter(crf.transition_features_).most_common()[-20:])

Top likely transitions:
I-ADRESSE -> I-ADRESSE 7.486226
B-ADRESSE -> I-ADRESSE 6.958531
I-N_ETAGE -> I-N_ETAGE 6.762510
B-DATE_DISPO -> I-DATE_DISPO 6.554792
B-QUARTIER -> I-QUARTIER 6.316372
I-VILLE -> I-VILLE 5.916148
I-QUARTIER -> I-QUARTIER 5.897852
I-DATE_DISPO -> I-DATE_DISPO 5.820874
B-AVEC_ASCENSEUR -> I-AVEC_ASCENSEUR 5.515281
B-TRANSPORTS_PROXIMITE -> I-TRANSPORTS_PROXIMITE 4.676148
I-AVEC_ASCENSEUR -> I-AVEC_ASCENSEUR 4.675277
B-LOYER_CC -> I-LOYER_CC 4.657026
B-HONORAIRE -> I-HONORAIRE 4.636944
B-DEPOT_GARANTIE -> I-DEPOT_GARANTIE 4.628858
I-TRANSPORTS_PROXIMITE -> I-TRANSPORTS_PROXIMITE 4.557614
I-TYPE_LOCATION -> I-TYPE_LOCATION 4.337032
I-ANNEE_CONSTRUCTION -> I-ANNEE_CONSTRUCTION 4.307428
B-CODE_POSTAL -> B-VILLE 4.304589
B-M2   -> I-M2    4.219815
B-TYPE_LOCATION -> I-TYPE_LOCATION 3.973663

Top unlikely transitions:
O      -> I-CHARGES_LOCATAIRE_MOIS -0.524001
O      -> I-VILLE -0.572797
O      -> I-STOCKAGE -0.575285
B-DEPOT_GARANTIE -> B-DEPOT_GARANTIE -0.684227
O  

Au vu de la convention IOB qu'on suit, les resultats sont coherents.

On affiche les features les plus communs et les moins communs.

In [23]:
def print_state_features(state_features):
    for (attr, label), weight in state_features:
        print("%0.6f %-8s %s" % (weight, label, attr))    

print("Top positive:")
print_state_features(Counter(crf.state_features_).most_common(30))

print("\nTop negative:")
print_state_features(Counter(crf.state_features_).most_common()[-30:])

Top positive:
10.169776 B-TRANSPORTS_PROXIMITE word.lower():tram
9.233750 B-STOCKAGE word.lower():cave
8.910010 B-TRANSPORTS_PROXIMITE word.lower():rer
8.657974 B-M2     next_word:m²
8.557143 B-N_PIECES next_word:pièces
8.292009 B-COPROPRIETE word.lower():copropriété
8.032126 O        bias
7.776240 B-QUARTIER word.lower():centre-ville
7.709535 B-TRANSPORTS_PROXIMITE word.lower():métro
7.454967 B-N_CHAMBRES next_word:chambres
7.440319 B-STOCKAGE word.lower():box
7.347194 B-VILLE  word.lower():issy
7.185743 B-TRANSPORTS_PROXIMITE word.lower():tramway
7.155676 B-EXTERIEUR word.lower():balcon
7.050116 B-AVEC_ASCENSEUR word.lower():ascenseur
6.892365 B-PARKING word.lower():parking
6.839029 B-DEPOT_GARANTIE 2prev_word:garantie
6.662178 B-TYPE_LOCATION word.lower():vide
6.564131 B-PARKING word.lower():stationnement
6.562711 B-EXTERIEUR word.lower():jardin
6.354313 B-TYPE_LOCATION word.lower():meublé
6.321068 B-ADRESSE word.lower():impasse
6.299143 B-TRANSPORTS_PROXIMITE word.lower():bus
6.140

On affiche les poids des transition et les top features pour chaque classe.

In [24]:
 eli5.show_weights(crf, top=10) # Currently ELI5 allows to explain weights and predictions of scikit-learn classifier

From \ To,O,B-ADRESSE,I-ADRESSE,B-ANNEE_CONSTRUCTION,I-ANNEE_CONSTRUCTION,B-AVEC_ASCENSEUR,I-AVEC_ASCENSEUR,B-CHARGES_LOCATAIRE_MOIS,I-CHARGES_LOCATAIRE_MOIS,B-CODE_POSTAL,B-COPROPRIETE,I-COPROPRIETE,B-DATE_DISPO,I-DATE_DISPO,B-DEPOT_GARANTIE,I-DEPOT_GARANTIE,B-EXTERIEUR,B-HONORAIRE,I-HONORAIRE,B-LOYER_CC,I-LOYER_CC,B-LOYER_HC,I-LOYER_HC,B-M2,I-M2,B-N_CHAMBRES,B-N_ETAGE,I-N_ETAGE,B-N_PIECES,I-N_PIECES,B-PARKING,I-PARKING,B-QUARTIER,I-QUARTIER,B-STOCKAGE,I-STOCKAGE,B-TRANSPORTS_PROXIMITE,I-TRANSPORTS_PROXIMITE,B-TYPE_CHAUFFAGE,I-TYPE_CHAUFFAGE,B-TYPE_LOCATION,I-TYPE_LOCATION,B-VILLE,I-VILLE
O,1.545,0.428,-3.202,0.2,-2.712,1.682,-2.899,0.454,-0.524,-0.068,1.015,-0.512,1.529,-2.299,0.279,-1.159,2.056,1.519,-0.517,0.105,-1.049,0.466,-0.494,1.575,-1.08,0.609,1.702,-2.372,1.541,-1.2,1.519,-0.513,1.399,-1.688,1.374,-0.575,2.074,-2.554,1.258,-0.685,1.499,-1.231,1.507,-0.573
B-ADRESSE,-0.521,-0.006,6.959,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I-ADRESSE,-0.09,0.0,7.486,0.0,0.0,0.0,0.0,0.0,0.0,0.856,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-ANNEE_CONSTRUCTION,0.002,0.0,0.0,0.0,3.549,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.53,-0.066,0.0,0.0,0.0,0.0,0.0,0.0
I-ANNEE_CONSTRUCTION,0.0,0.0,0.0,0.0,4.307,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-AVEC_ASCENSEUR,-1.049,0.0,0.0,0.0,0.0,-0.376,5.515,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I-AVEC_ASCENSEUR,0.538,0.0,0.0,0.0,0.0,0.0,4.675,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-CHARGES_LOCATAIRE_MOIS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I-CHARGES_LOCATAIRE_MOIS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-CODE_POSTAL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.305,0.0

Weight?,Feature,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,Unnamed: 7_level_0,Unnamed: 8_level_0,Unnamed: 9_level_0,Unnamed: 10_level_0,Unnamed: 11_level_0,Unnamed: 12_level_0,Unnamed: 13_level_0,Unnamed: 14_level_0,Unnamed: 15_level_0,Unnamed: 16_level_0,Unnamed: 17_level_0,Unnamed: 18_level_0,Unnamed: 19_level_0,Unnamed: 20_level_0,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0,Unnamed: 24_level_0,Unnamed: 25_level_0,Unnamed: 26_level_0,Unnamed: 27_level_0,Unnamed: 28_level_0,Unnamed: 29_level_0,Unnamed: 30_level_0,Unnamed: 31_level_0,Unnamed: 32_level_0,Unnamed: 33_level_0,Unnamed: 34_level_0,Unnamed: 35_level_0,Unnamed: 36_level_0,Unnamed: 37_level_0,Unnamed: 38_level_0,Unnamed: 39_level_0,Unnamed: 40_level_0,Unnamed: 41_level_0,Unnamed: 42_level_0,Unnamed: 43_level_0
Weight?,Feature,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1
Weight?,Feature,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2,Unnamed: 25_level_2,Unnamed: 26_level_2,Unnamed: 27_level_2,Unnamed: 28_level_2,Unnamed: 29_level_2,Unnamed: 30_level_2,Unnamed: 31_level_2,Unnamed: 32_level_2,Unnamed: 33_level_2,Unnamed: 34_level_2,Unnamed: 35_level_2,Unnamed: 36_level_2,Unnamed: 37_level_2,Unnamed: 38_level_2,Unnamed: 39_level_2,Unnamed: 40_level_2,Unnamed: 41_level_2,Unnamed: 42_level_2,Unnamed: 43_level_2
Weight?,Feature,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3,Unnamed: 23_level_3,Unnamed: 24_level_3,Unnamed: 25_level_3,Unnamed: 26_level_3,Unnamed: 27_level_3,Unnamed: 28_level_3,Unnamed: 29_level_3,Unnamed: 30_level_3,Unnamed: 31_level_3,Unnamed: 32_level_3,Unnamed: 33_level_3,Unnamed: 34_level_3,Unnamed: 35_level_3,Unnamed: 36_level_3,Unnamed: 37_level_3,Unnamed: 38_level_3,Unnamed: 39_level_3,Unnamed: 40_level_3,Unnamed: 41_level_3,Unnamed: 42_level_3,Unnamed: 43_level_3
Weight?,Feature,Unnamed: 2_level_4,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4,Unnamed: 6_level_4,Unnamed: 7_level_4,Unnamed: 8_level_4,Unnamed: 9_level_4,Unnamed: 10_level_4,Unnamed: 11_level_4,Unnamed: 12_level_4,Unnamed: 13_level_4,Unnamed: 14_level_4,Unnamed: 15_level_4,Unnamed: 16_level_4,Unnamed: 17_level_4,Unnamed: 18_level_4,Unnamed: 19_level_4,Unnamed: 20_level_4,Unnamed: 21_level_4,Unnamed: 22_level_4,Unnamed: 23_level_4,Unnamed: 24_level_4,Unnamed: 25_level_4,Unnamed: 26_level_4,Unnamed: 27_level_4,Unnamed: 28_level_4,Unnamed: 29_level_4,Unnamed: 30_level_4,Unnamed: 31_level_4,Unnamed: 32_level_4,Unnamed: 33_level_4,Unnamed: 34_level_4,Unnamed: 35_level_4,Unnamed: 36_level_4,Unnamed: 37_level_4,Unnamed: 38_level_4,Unnamed: 39_level_4,Unnamed: 40_level_4,Unnamed: 41_level_4,Unnamed: 42_level_4,Unnamed: 43_level_4
Weight?,Feature,Unnamed: 2_level_5,Unnamed: 3_level_5,Unnamed: 4_level_5,Unnamed: 5_level_5,Unnamed: 6_level_5,Unnamed: 7_level_5,Unnamed: 8_level_5,Unnamed: 9_level_5,Unnamed: 10_level_5,Unnamed: 11_level_5,Unnamed: 12_level_5,Unnamed: 13_level_5,Unnamed: 14_level_5,Unnamed: 15_level_5,Unnamed: 16_level_5,Unnamed: 17_level_5,Unnamed: 18_level_5,Unnamed: 19_level_5,Unnamed: 20_level_5,Unnamed: 21_level_5,Unnamed: 22_level_5,Unnamed: 23_level_5,Unnamed: 24_level_5,Unnamed: 25_level_5,Unnamed: 26_level_5,Unnamed: 27_level_5,Unnamed: 28_level_5,Unnamed: 29_level_5,Unnamed: 30_level_5,Unnamed: 31_level_5,Unnamed: 32_level_5,Unnamed: 33_level_5,Unnamed: 34_level_5,Unnamed: 35_level_5,Unnamed: 36_level_5,Unnamed: 37_level_5,Unnamed: 38_level_5,Unnamed: 39_level_5,Unnamed: 40_level_5,Unnamed: 41_level_5,Unnamed: 42_level_5,Unnamed: 43_level_5
Weight?,Feature,Unnamed: 2_level_6,Unnamed: 3_level_6,Unnamed: 4_level_6,Unnamed: 5_level_6,Unnamed: 6_level_6,Unnamed: 7_level_6,Unnamed: 8_level_6,Unnamed: 9_level_6,Unnamed: 10_level_6,Unnamed: 11_level_6,Unnamed: 12_level_6,Unnamed: 13_level_6,Unnamed: 14_level_6,Unnamed: 15_level_6,Unnamed: 16_level_6,Unnamed: 17_level_6,Unnamed: 18_level_6,Unnamed: 19_level_6,Unnamed: 20_level_6,Unnamed: 21_level_6,Unnamed: 22_level_6,Unnamed: 23_level_6,Unnamed: 24_level_6,Unnamed: 25_level_6,Unnamed: 26_level_6,Unnamed: 27_level_6,Unnamed: 28_level_6,Unnamed: 29_level_6,Unnamed: 30_level_6,Unnamed: 31_level_6,Unnamed: 32_level_6,Unnamed: 33_level_6,Unnamed: 34_level_6,Unnamed: 35_level_6,Unnamed: 36_level_6,Unnamed: 37_level_6,Unnamed: 38_level_6,Unnamed: 39_level_6,Unnamed: 40_level_6,Unnamed: 41_level_6,Unnamed: 42_level_6,Unnamed: 43_level_6
Weight?,Feature,Unnamed: 2_level_7,Unnamed: 3_level_7,Unnamed: 4_level_7,Unnamed: 5_level_7,Unnamed: 6_level_7,Unnamed: 7_level_7,Unnamed: 8_level_7,Unnamed: 9_level_7,Unnamed: 10_level_7,Unnamed: 11_level_7,Unnamed: 12_level_7,Unnamed: 13_level_7,Unnamed: 14_level_7,Unnamed: 15_level_7,Unnamed: 16_level_7,Unnamed: 17_level_7,Unnamed: 18_level_7,Unnamed: 19_level_7,Unnamed: 20_level_7,Unnamed: 21_level_7,Unnamed: 22_level_7,Unnamed: 23_level_7,Unnamed: 24_level_7,Unnamed: 25_level_7,Unnamed: 26_level_7,Unnamed: 27_level_7,Unnamed: 28_level_7,Unnamed: 29_level_7,Unnamed: 30_level_7,Unnamed: 31_level_7,Unnamed: 32_level_7,Unnamed: 33_level_7,Unnamed: 34_level_7,Unnamed: 35_level_7,Unnamed: 36_level_7,Unnamed: 37_level_7,Unnamed: 38_level_7,Unnamed: 39_level_7,Unnamed: 40_level_7,Unnamed: 41_level_7,Unnamed: 42_level_7,Unnamed: 43_level_7
Weight?,Feature,Unnamed: 2_level_8,Unnamed: 3_level_8,Unnamed: 4_level_8,Unnamed: 5_level_8,Unnamed: 6_level_8,Unnamed: 7_level_8,Unnamed: 8_level_8,Unnamed: 9_level_8,Unnamed: 10_level_8,Unnamed: 11_level_8,Unnamed: 12_level_8,Unnamed: 13_level_8,Unnamed: 14_level_8,Unnamed: 15_level_8,Unnamed: 16_level_8,Unnamed: 17_level_8,Unnamed: 18_level_8,Unnamed: 19_level_8,Unnamed: 20_level_8,Unnamed: 21_level_8,Unnamed: 22_level_8,Unnamed: 23_level_8,Unnamed: 24_level_8,Unnamed: 25_level_8,Unnamed: 26_level_8,Unnamed: 27_level_8,Unnamed: 28_level_8,Unnamed: 29_level_8,Unnamed: 30_level_8,Unnamed: 31_level_8,Unnamed: 32_level_8,Unnamed: 33_level_8,Unnamed: 34_level_8,Unnamed: 35_level_8,Unnamed: 36_level_8,Unnamed: 37_level_8,Unnamed: 38_level_8,Unnamed: 39_level_8,Unnamed: 40_level_8,Unnamed: 41_level_8,Unnamed: 42_level_8,Unnamed: 43_level_8
Weight?,Feature,Unnamed: 2_level_9,Unnamed: 3_level_9,Unnamed: 4_level_9,Unnamed: 5_level_9,Unnamed: 6_level_9,Unnamed: 7_level_9,Unnamed: 8_level_9,Unnamed: 9_level_9,Unnamed: 10_level_9,Unnamed: 11_level_9,Unnamed: 12_level_9,Unnamed: 13_level_9,Unnamed: 14_level_9,Unnamed: 15_level_9,Unnamed: 16_level_9,Unnamed: 17_level_9,Unnamed: 18_level_9,Unnamed: 19_level_9,Unnamed: 20_level_9,Unnamed: 21_level_9,Unnamed: 22_level_9,Unnamed: 23_level_9,Unnamed: 24_level_9,Unnamed: 25_level_9,Unnamed: 26_level_9,Unnamed: 27_level_9,Unnamed: 28_level_9,Unnamed: 29_level_9,Unnamed: 30_level_9,Unnamed: 31_level_9,Unnamed: 32_level_9,Unnamed: 33_level_9,Unnamed: 34_level_9,Unnamed: 35_level_9,Unnamed: 36_level_9,Unnamed: 37_level_9,Unnamed: 38_level_9,Unnamed: 39_level_9,Unnamed: 40_level_9,Unnamed: 41_level_9,Unnamed: 42_level_9,Unnamed: 43_level_9
Weight?,Feature,Unnamed: 2_level_10,Unnamed: 3_level_10,Unnamed: 4_level_10,Unnamed: 5_level_10,Unnamed: 6_level_10,Unnamed: 7_level_10,Unnamed: 8_level_10,Unnamed: 9_level_10,Unnamed: 10_level_10,Unnamed: 11_level_10,Unnamed: 12_level_10,Unnamed: 13_level_10,Unnamed: 14_level_10,Unnamed: 15_level_10,Unnamed: 16_level_10,Unnamed: 17_level_10,Unnamed: 18_level_10,Unnamed: 19_level_10,Unnamed: 20_level_10,Unnamed: 21_level_10,Unnamed: 22_level_10,Unnamed: 23_level_10,Unnamed: 24_level_10,Unnamed: 25_level_10,Unnamed: 26_level_10,Unnamed: 27_level_10,Unnamed: 28_level_10,Unnamed: 29_level_10,Unnamed: 30_level_10,Unnamed: 31_level_10,Unnamed: 32_level_10,Unnamed: 33_level_10,Unnamed: 34_level_10,Unnamed: 35_level_10,Unnamed: 36_level_10,Unnamed: 37_level_10,Unnamed: 38_level_10,Unnamed: 39_level_10,Unnamed: 40_level_10,Unnamed: 41_level_10,Unnamed: 42_level_10,Unnamed: 43_level_10
Weight?,Feature,Unnamed: 2_level_11,Unnamed: 3_level_11,Unnamed: 4_level_11,Unnamed: 5_level_11,Unnamed: 6_level_11,Unnamed: 7_level_11,Unnamed: 8_level_11,Unnamed: 9_level_11,Unnamed: 10_level_11,Unnamed: 11_level_11,Unnamed: 12_level_11,Unnamed: 13_level_11,Unnamed: 14_level_11,Unnamed: 15_level_11,Unnamed: 16_level_11,Unnamed: 17_level_11,Unnamed: 18_level_11,Unnamed: 19_level_11,Unnamed: 20_level_11,Unnamed: 21_level_11,Unnamed: 22_level_11,Unnamed: 23_level_11,Unnamed: 24_level_11,Unnamed: 25_level_11,Unnamed: 26_level_11,Unnamed: 27_level_11,Unnamed: 28_level_11,Unnamed: 29_level_11,Unnamed: 30_level_11,Unnamed: 31_level_11,Unnamed: 32_level_11,Unnamed: 33_level_11,Unnamed: 34_level_11,Unnamed: 35_level_11,Unnamed: 36_level_11,Unnamed: 37_level_11,Unnamed: 38_level_11,Unnamed: 39_level_11,Unnamed: 40_level_11,Unnamed: 41_level_11,Unnamed: 42_level_11,Unnamed: 43_level_11
Weight?,Feature,Unnamed: 2_level_12,Unnamed: 3_level_12,Unnamed: 4_level_12,Unnamed: 5_level_12,Unnamed: 6_level_12,Unnamed: 7_level_12,Unnamed: 8_level_12,Unnamed: 9_level_12,Unnamed: 10_level_12,Unnamed: 11_level_12,Unnamed: 12_level_12,Unnamed: 13_level_12,Unnamed: 14_level_12,Unnamed: 15_level_12,Unnamed: 16_level_12,Unnamed: 17_level_12,Unnamed: 18_level_12,Unnamed: 19_level_12,Unnamed: 20_level_12,Unnamed: 21_level_12,Unnamed: 22_level_12,Unnamed: 23_level_12,Unnamed: 24_level_12,Unnamed: 25_level_12,Unnamed: 26_level_12,Unnamed: 27_level_12,Unnamed: 28_level_12,Unnamed: 29_level_12,Unnamed: 30_level_12,Unnamed: 31_level_12,Unnamed: 32_level_12,Unnamed: 33_level_12,Unnamed: 34_level_12,Unnamed: 35_level_12,Unnamed: 36_level_12,Unnamed: 37_level_12,Unnamed: 38_level_12,Unnamed: 39_level_12,Unnamed: 40_level_12,Unnamed: 41_level_12,Unnamed: 42_level_12,Unnamed: 43_level_12
Weight?,Feature,Unnamed: 2_level_13,Unnamed: 3_level_13,Unnamed: 4_level_13,Unnamed: 5_level_13,Unnamed: 6_level_13,Unnamed: 7_level_13,Unnamed: 8_level_13,Unnamed: 9_level_13,Unnamed: 10_level_13,Unnamed: 11_level_13,Unnamed: 12_level_13,Unnamed: 13_level_13,Unnamed: 14_level_13,Unnamed: 15_level_13,Unnamed: 16_level_13,Unnamed: 17_level_13,Unnamed: 18_level_13,Unnamed: 19_level_13,Unnamed: 20_level_13,Unnamed: 21_level_13,Unnamed: 22_level_13,Unnamed: 23_level_13,Unnamed: 24_level_13,Unnamed: 25_level_13,Unnamed: 26_level_13,Unnamed: 27_level_13,Unnamed: 28_level_13,Unnamed: 29_level_13,Unnamed: 30_level_13,Unnamed: 31_level_13,Unnamed: 32_level_13,Unnamed: 33_level_13,Unnamed: 34_level_13,Unnamed: 35_level_13,Unnamed: 36_level_13,Unnamed: 37_level_13,Unnamed: 38_level_13,Unnamed: 39_level_13,Unnamed: 40_level_13,Unnamed: 41_level_13,Unnamed: 42_level_13,Unnamed: 43_level_13
Weight?,Feature,Unnamed: 2_level_14,Unnamed: 3_level_14,Unnamed: 4_level_14,Unnamed: 5_level_14,Unnamed: 6_level_14,Unnamed: 7_level_14,Unnamed: 8_level_14,Unnamed: 9_level_14,Unnamed: 10_level_14,Unnamed: 11_level_14,Unnamed: 12_level_14,Unnamed: 13_level_14,Unnamed: 14_level_14,Unnamed: 15_level_14,Unnamed: 16_level_14,Unnamed: 17_level_14,Unnamed: 18_level_14,Unnamed: 19_level_14,Unnamed: 20_level_14,Unnamed: 21_level_14,Unnamed: 22_level_14,Unnamed: 23_level_14,Unnamed: 24_level_14,Unnamed: 25_level_14,Unnamed: 26_level_14,Unnamed: 27_level_14,Unnamed: 28_level_14,Unnamed: 29_level_14,Unnamed: 30_level_14,Unnamed: 31_level_14,Unnamed: 32_level_14,Unnamed: 33_level_14,Unnamed: 34_level_14,Unnamed: 35_level_14,Unnamed: 36_level_14,Unnamed: 37_level_14,Unnamed: 38_level_14,Unnamed: 39_level_14,Unnamed: 40_level_14,Unnamed: 41_level_14,Unnamed: 42_level_14,Unnamed: 43_level_14
Weight?,Feature,Unnamed: 2_level_15,Unnamed: 3_level_15,Unnamed: 4_level_15,Unnamed: 5_level_15,Unnamed: 6_level_15,Unnamed: 7_level_15,Unnamed: 8_level_15,Unnamed: 9_level_15,Unnamed: 10_level_15,Unnamed: 11_level_15,Unnamed: 12_level_15,Unnamed: 13_level_15,Unnamed: 14_level_15,Unnamed: 15_level_15,Unnamed: 16_level_15,Unnamed: 17_level_15,Unnamed: 18_level_15,Unnamed: 19_level_15,Unnamed: 20_level_15,Unnamed: 21_level_15,Unnamed: 22_level_15,Unnamed: 23_level_15,Unnamed: 24_level_15,Unnamed: 25_level_15,Unnamed: 26_level_15,Unnamed: 27_level_15,Unnamed: 28_level_15,Unnamed: 29_level_15,Unnamed: 30_level_15,Unnamed: 31_level_15,Unnamed: 32_level_15,Unnamed: 33_level_15,Unnamed: 34_level_15,Unnamed: 35_level_15,Unnamed: 36_level_15,Unnamed: 37_level_15,Unnamed: 38_level_15,Unnamed: 39_level_15,Unnamed: 40_level_15,Unnamed: 41_level_15,Unnamed: 42_level_15,Unnamed: 43_level_15
Weight?,Feature,Unnamed: 2_level_16,Unnamed: 3_level_16,Unnamed: 4_level_16,Unnamed: 5_level_16,Unnamed: 6_level_16,Unnamed: 7_level_16,Unnamed: 8_level_16,Unnamed: 9_level_16,Unnamed: 10_level_16,Unnamed: 11_level_16,Unnamed: 12_level_16,Unnamed: 13_level_16,Unnamed: 14_level_16,Unnamed: 15_level_16,Unnamed: 16_level_16,Unnamed: 17_level_16,Unnamed: 18_level_16,Unnamed: 19_level_16,Unnamed: 20_level_16,Unnamed: 21_level_16,Unnamed: 22_level_16,Unnamed: 23_level_16,Unnamed: 24_level_16,Unnamed: 25_level_16,Unnamed: 26_level_16,Unnamed: 27_level_16,Unnamed: 28_level_16,Unnamed: 29_level_16,Unnamed: 30_level_16,Unnamed: 31_level_16,Unnamed: 32_level_16,Unnamed: 33_level_16,Unnamed: 34_level_16,Unnamed: 35_level_16,Unnamed: 36_level_16,Unnamed: 37_level_16,Unnamed: 38_level_16,Unnamed: 39_level_16,Unnamed: 40_level_16,Unnamed: 41_level_16,Unnamed: 42_level_16,Unnamed: 43_level_16
Weight?,Feature,Unnamed: 2_level_17,Unnamed: 3_level_17,Unnamed: 4_level_17,Unnamed: 5_level_17,Unnamed: 6_level_17,Unnamed: 7_level_17,Unnamed: 8_level_17,Unnamed: 9_level_17,Unnamed: 10_level_17,Unnamed: 11_level_17,Unnamed: 12_level_17,Unnamed: 13_level_17,Unnamed: 14_level_17,Unnamed: 15_level_17,Unnamed: 16_level_17,Unnamed: 17_level_17,Unnamed: 18_level_17,Unnamed: 19_level_17,Unnamed: 20_level_17,Unnamed: 21_level_17,Unnamed: 22_level_17,Unnamed: 23_level_17,Unnamed: 24_level_17,Unnamed: 25_level_17,Unnamed: 26_level_17,Unnamed: 27_level_17,Unnamed: 28_level_17,Unnamed: 29_level_17,Unnamed: 30_level_17,Unnamed: 31_level_17,Unnamed: 32_level_17,Unnamed: 33_level_17,Unnamed: 34_level_17,Unnamed: 35_level_17,Unnamed: 36_level_17,Unnamed: 37_level_17,Unnamed: 38_level_17,Unnamed: 39_level_17,Unnamed: 40_level_17,Unnamed: 41_level_17,Unnamed: 42_level_17,Unnamed: 43_level_17
Weight?,Feature,Unnamed: 2_level_18,Unnamed: 3_level_18,Unnamed: 4_level_18,Unnamed: 5_level_18,Unnamed: 6_level_18,Unnamed: 7_level_18,Unnamed: 8_level_18,Unnamed: 9_level_18,Unnamed: 10_level_18,Unnamed: 11_level_18,Unnamed: 12_level_18,Unnamed: 13_level_18,Unnamed: 14_level_18,Unnamed: 15_level_18,Unnamed: 16_level_18,Unnamed: 17_level_18,Unnamed: 18_level_18,Unnamed: 19_level_18,Unnamed: 20_level_18,Unnamed: 21_level_18,Unnamed: 22_level_18,Unnamed: 23_level_18,Unnamed: 24_level_18,Unnamed: 25_level_18,Unnamed: 26_level_18,Unnamed: 27_level_18,Unnamed: 28_level_18,Unnamed: 29_level_18,Unnamed: 30_level_18,Unnamed: 31_level_18,Unnamed: 32_level_18,Unnamed: 33_level_18,Unnamed: 34_level_18,Unnamed: 35_level_18,Unnamed: 36_level_18,Unnamed: 37_level_18,Unnamed: 38_level_18,Unnamed: 39_level_18,Unnamed: 40_level_18,Unnamed: 41_level_18,Unnamed: 42_level_18,Unnamed: 43_level_18
Weight?,Feature,Unnamed: 2_level_19,Unnamed: 3_level_19,Unnamed: 4_level_19,Unnamed: 5_level_19,Unnamed: 6_level_19,Unnamed: 7_level_19,Unnamed: 8_level_19,Unnamed: 9_level_19,Unnamed: 10_level_19,Unnamed: 11_level_19,Unnamed: 12_level_19,Unnamed: 13_level_19,Unnamed: 14_level_19,Unnamed: 15_level_19,Unnamed: 16_level_19,Unnamed: 17_level_19,Unnamed: 18_level_19,Unnamed: 19_level_19,Unnamed: 20_level_19,Unnamed: 21_level_19,Unnamed: 22_level_19,Unnamed: 23_level_19,Unnamed: 24_level_19,Unnamed: 25_level_19,Unnamed: 26_level_19,Unnamed: 27_level_19,Unnamed: 28_level_19,Unnamed: 29_level_19,Unnamed: 30_level_19,Unnamed: 31_level_19,Unnamed: 32_level_19,Unnamed: 33_level_19,Unnamed: 34_level_19,Unnamed: 35_level_19,Unnamed: 36_level_19,Unnamed: 37_level_19,Unnamed: 38_level_19,Unnamed: 39_level_19,Unnamed: 40_level_19,Unnamed: 41_level_19,Unnamed: 42_level_19,Unnamed: 43_level_19
Weight?,Feature,Unnamed: 2_level_20,Unnamed: 3_level_20,Unnamed: 4_level_20,Unnamed: 5_level_20,Unnamed: 6_level_20,Unnamed: 7_level_20,Unnamed: 8_level_20,Unnamed: 9_level_20,Unnamed: 10_level_20,Unnamed: 11_level_20,Unnamed: 12_level_20,Unnamed: 13_level_20,Unnamed: 14_level_20,Unnamed: 15_level_20,Unnamed: 16_level_20,Unnamed: 17_level_20,Unnamed: 18_level_20,Unnamed: 19_level_20,Unnamed: 20_level_20,Unnamed: 21_level_20,Unnamed: 22_level_20,Unnamed: 23_level_20,Unnamed: 24_level_20,Unnamed: 25_level_20,Unnamed: 26_level_20,Unnamed: 27_level_20,Unnamed: 28_level_20,Unnamed: 29_level_20,Unnamed: 30_level_20,Unnamed: 31_level_20,Unnamed: 32_level_20,Unnamed: 33_level_20,Unnamed: 34_level_20,Unnamed: 35_level_20,Unnamed: 36_level_20,Unnamed: 37_level_20,Unnamed: 38_level_20,Unnamed: 39_level_20,Unnamed: 40_level_20,Unnamed: 41_level_20,Unnamed: 42_level_20,Unnamed: 43_level_20
Weight?,Feature,Unnamed: 2_level_21,Unnamed: 3_level_21,Unnamed: 4_level_21,Unnamed: 5_level_21,Unnamed: 6_level_21,Unnamed: 7_level_21,Unnamed: 8_level_21,Unnamed: 9_level_21,Unnamed: 10_level_21,Unnamed: 11_level_21,Unnamed: 12_level_21,Unnamed: 13_level_21,Unnamed: 14_level_21,Unnamed: 15_level_21,Unnamed: 16_level_21,Unnamed: 17_level_21,Unnamed: 18_level_21,Unnamed: 19_level_21,Unnamed: 20_level_21,Unnamed: 21_level_21,Unnamed: 22_level_21,Unnamed: 23_level_21,Unnamed: 24_level_21,Unnamed: 25_level_21,Unnamed: 26_level_21,Unnamed: 27_level_21,Unnamed: 28_level_21,Unnamed: 29_level_21,Unnamed: 30_level_21,Unnamed: 31_level_21,Unnamed: 32_level_21,Unnamed: 33_level_21,Unnamed: 34_level_21,Unnamed: 35_level_21,Unnamed: 36_level_21,Unnamed: 37_level_21,Unnamed: 38_level_21,Unnamed: 39_level_21,Unnamed: 40_level_21,Unnamed: 41_level_21,Unnamed: 42_level_21,Unnamed: 43_level_21
Weight?,Feature,Unnamed: 2_level_22,Unnamed: 3_level_22,Unnamed: 4_level_22,Unnamed: 5_level_22,Unnamed: 6_level_22,Unnamed: 7_level_22,Unnamed: 8_level_22,Unnamed: 9_level_22,Unnamed: 10_level_22,Unnamed: 11_level_22,Unnamed: 12_level_22,Unnamed: 13_level_22,Unnamed: 14_level_22,Unnamed: 15_level_22,Unnamed: 16_level_22,Unnamed: 17_level_22,Unnamed: 18_level_22,Unnamed: 19_level_22,Unnamed: 20_level_22,Unnamed: 21_level_22,Unnamed: 22_level_22,Unnamed: 23_level_22,Unnamed: 24_level_22,Unnamed: 25_level_22,Unnamed: 26_level_22,Unnamed: 27_level_22,Unnamed: 28_level_22,Unnamed: 29_level_22,Unnamed: 30_level_22,Unnamed: 31_level_22,Unnamed: 32_level_22,Unnamed: 33_level_22,Unnamed: 34_level_22,Unnamed: 35_level_22,Unnamed: 36_level_22,Unnamed: 37_level_22,Unnamed: 38_level_22,Unnamed: 39_level_22,Unnamed: 40_level_22,Unnamed: 41_level_22,Unnamed: 42_level_22,Unnamed: 43_level_22
Weight?,Feature,Unnamed: 2_level_23,Unnamed: 3_level_23,Unnamed: 4_level_23,Unnamed: 5_level_23,Unnamed: 6_level_23,Unnamed: 7_level_23,Unnamed: 8_level_23,Unnamed: 9_level_23,Unnamed: 10_level_23,Unnamed: 11_level_23,Unnamed: 12_level_23,Unnamed: 13_level_23,Unnamed: 14_level_23,Unnamed: 15_level_23,Unnamed: 16_level_23,Unnamed: 17_level_23,Unnamed: 18_level_23,Unnamed: 19_level_23,Unnamed: 20_level_23,Unnamed: 21_level_23,Unnamed: 22_level_23,Unnamed: 23_level_23,Unnamed: 24_level_23,Unnamed: 25_level_23,Unnamed: 26_level_23,Unnamed: 27_level_23,Unnamed: 28_level_23,Unnamed: 29_level_23,Unnamed: 30_level_23,Unnamed: 31_level_23,Unnamed: 32_level_23,Unnamed: 33_level_23,Unnamed: 34_level_23,Unnamed: 35_level_23,Unnamed: 36_level_23,Unnamed: 37_level_23,Unnamed: 38_level_23,Unnamed: 39_level_23,Unnamed: 40_level_23,Unnamed: 41_level_23,Unnamed: 42_level_23,Unnamed: 43_level_23
Weight?,Feature,Unnamed: 2_level_24,Unnamed: 3_level_24,Unnamed: 4_level_24,Unnamed: 5_level_24,Unnamed: 6_level_24,Unnamed: 7_level_24,Unnamed: 8_level_24,Unnamed: 9_level_24,Unnamed: 10_level_24,Unnamed: 11_level_24,Unnamed: 12_level_24,Unnamed: 13_level_24,Unnamed: 14_level_24,Unnamed: 15_level_24,Unnamed: 16_level_24,Unnamed: 17_level_24,Unnamed: 18_level_24,Unnamed: 19_level_24,Unnamed: 20_level_24,Unnamed: 21_level_24,Unnamed: 22_level_24,Unnamed: 23_level_24,Unnamed: 24_level_24,Unnamed: 25_level_24,Unnamed: 26_level_24,Unnamed: 27_level_24,Unnamed: 28_level_24,Unnamed: 29_level_24,Unnamed: 30_level_24,Unnamed: 31_level_24,Unnamed: 32_level_24,Unnamed: 33_level_24,Unnamed: 34_level_24,Unnamed: 35_level_24,Unnamed: 36_level_24,Unnamed: 37_level_24,Unnamed: 38_level_24,Unnamed: 39_level_24,Unnamed: 40_level_24,Unnamed: 41_level_24,Unnamed: 42_level_24,Unnamed: 43_level_24
Weight?,Feature,Unnamed: 2_level_25,Unnamed: 3_level_25,Unnamed: 4_level_25,Unnamed: 5_level_25,Unnamed: 6_level_25,Unnamed: 7_level_25,Unnamed: 8_level_25,Unnamed: 9_level_25,Unnamed: 10_level_25,Unnamed: 11_level_25,Unnamed: 12_level_25,Unnamed: 13_level_25,Unnamed: 14_level_25,Unnamed: 15_level_25,Unnamed: 16_level_25,Unnamed: 17_level_25,Unnamed: 18_level_25,Unnamed: 19_level_25,Unnamed: 20_level_25,Unnamed: 21_level_25,Unnamed: 22_level_25,Unnamed: 23_level_25,Unnamed: 24_level_25,Unnamed: 25_level_25,Unnamed: 26_level_25,Unnamed: 27_level_25,Unnamed: 28_level_25,Unnamed: 29_level_25,Unnamed: 30_level_25,Unnamed: 31_level_25,Unnamed: 32_level_25,Unnamed: 33_level_25,Unnamed: 34_level_25,Unnamed: 35_level_25,Unnamed: 36_level_25,Unnamed: 37_level_25,Unnamed: 38_level_25,Unnamed: 39_level_25,Unnamed: 40_level_25,Unnamed: 41_level_25,Unnamed: 42_level_25,Unnamed: 43_level_25
Weight?,Feature,Unnamed: 2_level_26,Unnamed: 3_level_26,Unnamed: 4_level_26,Unnamed: 5_level_26,Unnamed: 6_level_26,Unnamed: 7_level_26,Unnamed: 8_level_26,Unnamed: 9_level_26,Unnamed: 10_level_26,Unnamed: 11_level_26,Unnamed: 12_level_26,Unnamed: 13_level_26,Unnamed: 14_level_26,Unnamed: 15_level_26,Unnamed: 16_level_26,Unnamed: 17_level_26,Unnamed: 18_level_26,Unnamed: 19_level_26,Unnamed: 20_level_26,Unnamed: 21_level_26,Unnamed: 22_level_26,Unnamed: 23_level_26,Unnamed: 24_level_26,Unnamed: 25_level_26,Unnamed: 26_level_26,Unnamed: 27_level_26,Unnamed: 28_level_26,Unnamed: 29_level_26,Unnamed: 30_level_26,Unnamed: 31_level_26,Unnamed: 32_level_26,Unnamed: 33_level_26,Unnamed: 34_level_26,Unnamed: 35_level_26,Unnamed: 36_level_26,Unnamed: 37_level_26,Unnamed: 38_level_26,Unnamed: 39_level_26,Unnamed: 40_level_26,Unnamed: 41_level_26,Unnamed: 42_level_26,Unnamed: 43_level_26
Weight?,Feature,Unnamed: 2_level_27,Unnamed: 3_level_27,Unnamed: 4_level_27,Unnamed: 5_level_27,Unnamed: 6_level_27,Unnamed: 7_level_27,Unnamed: 8_level_27,Unnamed: 9_level_27,Unnamed: 10_level_27,Unnamed: 11_level_27,Unnamed: 12_level_27,Unnamed: 13_level_27,Unnamed: 14_level_27,Unnamed: 15_level_27,Unnamed: 16_level_27,Unnamed: 17_level_27,Unnamed: 18_level_27,Unnamed: 19_level_27,Unnamed: 20_level_27,Unnamed: 21_level_27,Unnamed: 22_level_27,Unnamed: 23_level_27,Unnamed: 24_level_27,Unnamed: 25_level_27,Unnamed: 26_level_27,Unnamed: 27_level_27,Unnamed: 28_level_27,Unnamed: 29_level_27,Unnamed: 30_level_27,Unnamed: 31_level_27,Unnamed: 32_level_27,Unnamed: 33_level_27,Unnamed: 34_level_27,Unnamed: 35_level_27,Unnamed: 36_level_27,Unnamed: 37_level_27,Unnamed: 38_level_27,Unnamed: 39_level_27,Unnamed: 40_level_27,Unnamed: 41_level_27,Unnamed: 42_level_27,Unnamed: 43_level_27
Weight?,Feature,Unnamed: 2_level_28,Unnamed: 3_level_28,Unnamed: 4_level_28,Unnamed: 5_level_28,Unnamed: 6_level_28,Unnamed: 7_level_28,Unnamed: 8_level_28,Unnamed: 9_level_28,Unnamed: 10_level_28,Unnamed: 11_level_28,Unnamed: 12_level_28,Unnamed: 13_level_28,Unnamed: 14_level_28,Unnamed: 15_level_28,Unnamed: 16_level_28,Unnamed: 17_level_28,Unnamed: 18_level_28,Unnamed: 19_level_28,Unnamed: 20_level_28,Unnamed: 21_level_28,Unnamed: 22_level_28,Unnamed: 23_level_28,Unnamed: 24_level_28,Unnamed: 25_level_28,Unnamed: 26_level_28,Unnamed: 27_level_28,Unnamed: 28_level_28,Unnamed: 29_level_28,Unnamed: 30_level_28,Unnamed: 31_level_28,Unnamed: 32_level_28,Unnamed: 33_level_28,Unnamed: 34_level_28,Unnamed: 35_level_28,Unnamed: 36_level_28,Unnamed: 37_level_28,Unnamed: 38_level_28,Unnamed: 39_level_28,Unnamed: 40_level_28,Unnamed: 41_level_28,Unnamed: 42_level_28,Unnamed: 43_level_28
Weight?,Feature,Unnamed: 2_level_29,Unnamed: 3_level_29,Unnamed: 4_level_29,Unnamed: 5_level_29,Unnamed: 6_level_29,Unnamed: 7_level_29,Unnamed: 8_level_29,Unnamed: 9_level_29,Unnamed: 10_level_29,Unnamed: 11_level_29,Unnamed: 12_level_29,Unnamed: 13_level_29,Unnamed: 14_level_29,Unnamed: 15_level_29,Unnamed: 16_level_29,Unnamed: 17_level_29,Unnamed: 18_level_29,Unnamed: 19_level_29,Unnamed: 20_level_29,Unnamed: 21_level_29,Unnamed: 22_level_29,Unnamed: 23_level_29,Unnamed: 24_level_29,Unnamed: 25_level_29,Unnamed: 26_level_29,Unnamed: 27_level_29,Unnamed: 28_level_29,Unnamed: 29_level_29,Unnamed: 30_level_29,Unnamed: 31_level_29,Unnamed: 32_level_29,Unnamed: 33_level_29,Unnamed: 34_level_29,Unnamed: 35_level_29,Unnamed: 36_level_29,Unnamed: 37_level_29,Unnamed: 38_level_29,Unnamed: 39_level_29,Unnamed: 40_level_29,Unnamed: 41_level_29,Unnamed: 42_level_29,Unnamed: 43_level_29
Weight?,Feature,Unnamed: 2_level_30,Unnamed: 3_level_30,Unnamed: 4_level_30,Unnamed: 5_level_30,Unnamed: 6_level_30,Unnamed: 7_level_30,Unnamed: 8_level_30,Unnamed: 9_level_30,Unnamed: 10_level_30,Unnamed: 11_level_30,Unnamed: 12_level_30,Unnamed: 13_level_30,Unnamed: 14_level_30,Unnamed: 15_level_30,Unnamed: 16_level_30,Unnamed: 17_level_30,Unnamed: 18_level_30,Unnamed: 19_level_30,Unnamed: 20_level_30,Unnamed: 21_level_30,Unnamed: 22_level_30,Unnamed: 23_level_30,Unnamed: 24_level_30,Unnamed: 25_level_30,Unnamed: 26_level_30,Unnamed: 27_level_30,Unnamed: 28_level_30,Unnamed: 29_level_30,Unnamed: 30_level_30,Unnamed: 31_level_30,Unnamed: 32_level_30,Unnamed: 33_level_30,Unnamed: 34_level_30,Unnamed: 35_level_30,Unnamed: 36_level_30,Unnamed: 37_level_30,Unnamed: 38_level_30,Unnamed: 39_level_30,Unnamed: 40_level_30,Unnamed: 41_level_30,Unnamed: 42_level_30,Unnamed: 43_level_30
Weight?,Feature,Unnamed: 2_level_31,Unnamed: 3_level_31,Unnamed: 4_level_31,Unnamed: 5_level_31,Unnamed: 6_level_31,Unnamed: 7_level_31,Unnamed: 8_level_31,Unnamed: 9_level_31,Unnamed: 10_level_31,Unnamed: 11_level_31,Unnamed: 12_level_31,Unnamed: 13_level_31,Unnamed: 14_level_31,Unnamed: 15_level_31,Unnamed: 16_level_31,Unnamed: 17_level_31,Unnamed: 18_level_31,Unnamed: 19_level_31,Unnamed: 20_level_31,Unnamed: 21_level_31,Unnamed: 22_level_31,Unnamed: 23_level_31,Unnamed: 24_level_31,Unnamed: 25_level_31,Unnamed: 26_level_31,Unnamed: 27_level_31,Unnamed: 28_level_31,Unnamed: 29_level_31,Unnamed: 30_level_31,Unnamed: 31_level_31,Unnamed: 32_level_31,Unnamed: 33_level_31,Unnamed: 34_level_31,Unnamed: 35_level_31,Unnamed: 36_level_31,Unnamed: 37_level_31,Unnamed: 38_level_31,Unnamed: 39_level_31,Unnamed: 40_level_31,Unnamed: 41_level_31,Unnamed: 42_level_31,Unnamed: 43_level_31
Weight?,Feature,Unnamed: 2_level_32,Unnamed: 3_level_32,Unnamed: 4_level_32,Unnamed: 5_level_32,Unnamed: 6_level_32,Unnamed: 7_level_32,Unnamed: 8_level_32,Unnamed: 9_level_32,Unnamed: 10_level_32,Unnamed: 11_level_32,Unnamed: 12_level_32,Unnamed: 13_level_32,Unnamed: 14_level_32,Unnamed: 15_level_32,Unnamed: 16_level_32,Unnamed: 17_level_32,Unnamed: 18_level_32,Unnamed: 19_level_32,Unnamed: 20_level_32,Unnamed: 21_level_32,Unnamed: 22_level_32,Unnamed: 23_level_32,Unnamed: 24_level_32,Unnamed: 25_level_32,Unnamed: 26_level_32,Unnamed: 27_level_32,Unnamed: 28_level_32,Unnamed: 29_level_32,Unnamed: 30_level_32,Unnamed: 31_level_32,Unnamed: 32_level_32,Unnamed: 33_level_32,Unnamed: 34_level_32,Unnamed: 35_level_32,Unnamed: 36_level_32,Unnamed: 37_level_32,Unnamed: 38_level_32,Unnamed: 39_level_32,Unnamed: 40_level_32,Unnamed: 41_level_32,Unnamed: 42_level_32,Unnamed: 43_level_32
Weight?,Feature,Unnamed: 2_level_33,Unnamed: 3_level_33,Unnamed: 4_level_33,Unnamed: 5_level_33,Unnamed: 6_level_33,Unnamed: 7_level_33,Unnamed: 8_level_33,Unnamed: 9_level_33,Unnamed: 10_level_33,Unnamed: 11_level_33,Unnamed: 12_level_33,Unnamed: 13_level_33,Unnamed: 14_level_33,Unnamed: 15_level_33,Unnamed: 16_level_33,Unnamed: 17_level_33,Unnamed: 18_level_33,Unnamed: 19_level_33,Unnamed: 20_level_33,Unnamed: 21_level_33,Unnamed: 22_level_33,Unnamed: 23_level_33,Unnamed: 24_level_33,Unnamed: 25_level_33,Unnamed: 26_level_33,Unnamed: 27_level_33,Unnamed: 28_level_33,Unnamed: 29_level_33,Unnamed: 30_level_33,Unnamed: 31_level_33,Unnamed: 32_level_33,Unnamed: 33_level_33,Unnamed: 34_level_33,Unnamed: 35_level_33,Unnamed: 36_level_33,Unnamed: 37_level_33,Unnamed: 38_level_33,Unnamed: 39_level_33,Unnamed: 40_level_33,Unnamed: 41_level_33,Unnamed: 42_level_33,Unnamed: 43_level_33
Weight?,Feature,Unnamed: 2_level_34,Unnamed: 3_level_34,Unnamed: 4_level_34,Unnamed: 5_level_34,Unnamed: 6_level_34,Unnamed: 7_level_34,Unnamed: 8_level_34,Unnamed: 9_level_34,Unnamed: 10_level_34,Unnamed: 11_level_34,Unnamed: 12_level_34,Unnamed: 13_level_34,Unnamed: 14_level_34,Unnamed: 15_level_34,Unnamed: 16_level_34,Unnamed: 17_level_34,Unnamed: 18_level_34,Unnamed: 19_level_34,Unnamed: 20_level_34,Unnamed: 21_level_34,Unnamed: 22_level_34,Unnamed: 23_level_34,Unnamed: 24_level_34,Unnamed: 25_level_34,Unnamed: 26_level_34,Unnamed: 27_level_34,Unnamed: 28_level_34,Unnamed: 29_level_34,Unnamed: 30_level_34,Unnamed: 31_level_34,Unnamed: 32_level_34,Unnamed: 33_level_34,Unnamed: 34_level_34,Unnamed: 35_level_34,Unnamed: 36_level_34,Unnamed: 37_level_34,Unnamed: 38_level_34,Unnamed: 39_level_34,Unnamed: 40_level_34,Unnamed: 41_level_34,Unnamed: 42_level_34,Unnamed: 43_level_34
Weight?,Feature,Unnamed: 2_level_35,Unnamed: 3_level_35,Unnamed: 4_level_35,Unnamed: 5_level_35,Unnamed: 6_level_35,Unnamed: 7_level_35,Unnamed: 8_level_35,Unnamed: 9_level_35,Unnamed: 10_level_35,Unnamed: 11_level_35,Unnamed: 12_level_35,Unnamed: 13_level_35,Unnamed: 14_level_35,Unnamed: 15_level_35,Unnamed: 16_level_35,Unnamed: 17_level_35,Unnamed: 18_level_35,Unnamed: 19_level_35,Unnamed: 20_level_35,Unnamed: 21_level_35,Unnamed: 22_level_35,Unnamed: 23_level_35,Unnamed: 24_level_35,Unnamed: 25_level_35,Unnamed: 26_level_35,Unnamed: 27_level_35,Unnamed: 28_level_35,Unnamed: 29_level_35,Unnamed: 30_level_35,Unnamed: 31_level_35,Unnamed: 32_level_35,Unnamed: 33_level_35,Unnamed: 34_level_35,Unnamed: 35_level_35,Unnamed: 36_level_35,Unnamed: 37_level_35,Unnamed: 38_level_35,Unnamed: 39_level_35,Unnamed: 40_level_35,Unnamed: 41_level_35,Unnamed: 42_level_35,Unnamed: 43_level_35
Weight?,Feature,Unnamed: 2_level_36,Unnamed: 3_level_36,Unnamed: 4_level_36,Unnamed: 5_level_36,Unnamed: 6_level_36,Unnamed: 7_level_36,Unnamed: 8_level_36,Unnamed: 9_level_36,Unnamed: 10_level_36,Unnamed: 11_level_36,Unnamed: 12_level_36,Unnamed: 13_level_36,Unnamed: 14_level_36,Unnamed: 15_level_36,Unnamed: 16_level_36,Unnamed: 17_level_36,Unnamed: 18_level_36,Unnamed: 19_level_36,Unnamed: 20_level_36,Unnamed: 21_level_36,Unnamed: 22_level_36,Unnamed: 23_level_36,Unnamed: 24_level_36,Unnamed: 25_level_36,Unnamed: 26_level_36,Unnamed: 27_level_36,Unnamed: 28_level_36,Unnamed: 29_level_36,Unnamed: 30_level_36,Unnamed: 31_level_36,Unnamed: 32_level_36,Unnamed: 33_level_36,Unnamed: 34_level_36,Unnamed: 35_level_36,Unnamed: 36_level_36,Unnamed: 37_level_36,Unnamed: 38_level_36,Unnamed: 39_level_36,Unnamed: 40_level_36,Unnamed: 41_level_36,Unnamed: 42_level_36,Unnamed: 43_level_36
Weight?,Feature,Unnamed: 2_level_37,Unnamed: 3_level_37,Unnamed: 4_level_37,Unnamed: 5_level_37,Unnamed: 6_level_37,Unnamed: 7_level_37,Unnamed: 8_level_37,Unnamed: 9_level_37,Unnamed: 10_level_37,Unnamed: 11_level_37,Unnamed: 12_level_37,Unnamed: 13_level_37,Unnamed: 14_level_37,Unnamed: 15_level_37,Unnamed: 16_level_37,Unnamed: 17_level_37,Unnamed: 18_level_37,Unnamed: 19_level_37,Unnamed: 20_level_37,Unnamed: 21_level_37,Unnamed: 22_level_37,Unnamed: 23_level_37,Unnamed: 24_level_37,Unnamed: 25_level_37,Unnamed: 26_level_37,Unnamed: 27_level_37,Unnamed: 28_level_37,Unnamed: 29_level_37,Unnamed: 30_level_37,Unnamed: 31_level_37,Unnamed: 32_level_37,Unnamed: 33_level_37,Unnamed: 34_level_37,Unnamed: 35_level_37,Unnamed: 36_level_37,Unnamed: 37_level_37,Unnamed: 38_level_37,Unnamed: 39_level_37,Unnamed: 40_level_37,Unnamed: 41_level_37,Unnamed: 42_level_37,Unnamed: 43_level_37
Weight?,Feature,Unnamed: 2_level_38,Unnamed: 3_level_38,Unnamed: 4_level_38,Unnamed: 5_level_38,Unnamed: 6_level_38,Unnamed: 7_level_38,Unnamed: 8_level_38,Unnamed: 9_level_38,Unnamed: 10_level_38,Unnamed: 11_level_38,Unnamed: 12_level_38,Unnamed: 13_level_38,Unnamed: 14_level_38,Unnamed: 15_level_38,Unnamed: 16_level_38,Unnamed: 17_level_38,Unnamed: 18_level_38,Unnamed: 19_level_38,Unnamed: 20_level_38,Unnamed: 21_level_38,Unnamed: 22_level_38,Unnamed: 23_level_38,Unnamed: 24_level_38,Unnamed: 25_level_38,Unnamed: 26_level_38,Unnamed: 27_level_38,Unnamed: 28_level_38,Unnamed: 29_level_38,Unnamed: 30_level_38,Unnamed: 31_level_38,Unnamed: 32_level_38,Unnamed: 33_level_38,Unnamed: 34_level_38,Unnamed: 35_level_38,Unnamed: 36_level_38,Unnamed: 37_level_38,Unnamed: 38_level_38,Unnamed: 39_level_38,Unnamed: 40_level_38,Unnamed: 41_level_38,Unnamed: 42_level_38,Unnamed: 43_level_38
Weight?,Feature,Unnamed: 2_level_39,Unnamed: 3_level_39,Unnamed: 4_level_39,Unnamed: 5_level_39,Unnamed: 6_level_39,Unnamed: 7_level_39,Unnamed: 8_level_39,Unnamed: 9_level_39,Unnamed: 10_level_39,Unnamed: 11_level_39,Unnamed: 12_level_39,Unnamed: 13_level_39,Unnamed: 14_level_39,Unnamed: 15_level_39,Unnamed: 16_level_39,Unnamed: 17_level_39,Unnamed: 18_level_39,Unnamed: 19_level_39,Unnamed: 20_level_39,Unnamed: 21_level_39,Unnamed: 22_level_39,Unnamed: 23_level_39,Unnamed: 24_level_39,Unnamed: 25_level_39,Unnamed: 26_level_39,Unnamed: 27_level_39,Unnamed: 28_level_39,Unnamed: 29_level_39,Unnamed: 30_level_39,Unnamed: 31_level_39,Unnamed: 32_level_39,Unnamed: 33_level_39,Unnamed: 34_level_39,Unnamed: 35_level_39,Unnamed: 36_level_39,Unnamed: 37_level_39,Unnamed: 38_level_39,Unnamed: 39_level_39,Unnamed: 40_level_39,Unnamed: 41_level_39,Unnamed: 42_level_39,Unnamed: 43_level_39
Weight?,Feature,Unnamed: 2_level_40,Unnamed: 3_level_40,Unnamed: 4_level_40,Unnamed: 5_level_40,Unnamed: 6_level_40,Unnamed: 7_level_40,Unnamed: 8_level_40,Unnamed: 9_level_40,Unnamed: 10_level_40,Unnamed: 11_level_40,Unnamed: 12_level_40,Unnamed: 13_level_40,Unnamed: 14_level_40,Unnamed: 15_level_40,Unnamed: 16_level_40,Unnamed: 17_level_40,Unnamed: 18_level_40,Unnamed: 19_level_40,Unnamed: 20_level_40,Unnamed: 21_level_40,Unnamed: 22_level_40,Unnamed: 23_level_40,Unnamed: 24_level_40,Unnamed: 25_level_40,Unnamed: 26_level_40,Unnamed: 27_level_40,Unnamed: 28_level_40,Unnamed: 29_level_40,Unnamed: 30_level_40,Unnamed: 31_level_40,Unnamed: 32_level_40,Unnamed: 33_level_40,Unnamed: 34_level_40,Unnamed: 35_level_40,Unnamed: 36_level_40,Unnamed: 37_level_40,Unnamed: 38_level_40,Unnamed: 39_level_40,Unnamed: 40_level_40,Unnamed: 41_level_40,Unnamed: 42_level_40,Unnamed: 43_level_40
Weight?,Feature,Unnamed: 2_level_41,Unnamed: 3_level_41,Unnamed: 4_level_41,Unnamed: 5_level_41,Unnamed: 6_level_41,Unnamed: 7_level_41,Unnamed: 8_level_41,Unnamed: 9_level_41,Unnamed: 10_level_41,Unnamed: 11_level_41,Unnamed: 12_level_41,Unnamed: 13_level_41,Unnamed: 14_level_41,Unnamed: 15_level_41,Unnamed: 16_level_41,Unnamed: 17_level_41,Unnamed: 18_level_41,Unnamed: 19_level_41,Unnamed: 20_level_41,Unnamed: 21_level_41,Unnamed: 22_level_41,Unnamed: 23_level_41,Unnamed: 24_level_41,Unnamed: 25_level_41,Unnamed: 26_level_41,Unnamed: 27_level_41,Unnamed: 28_level_41,Unnamed: 29_level_41,Unnamed: 30_level_41,Unnamed: 31_level_41,Unnamed: 32_level_41,Unnamed: 33_level_41,Unnamed: 34_level_41,Unnamed: 35_level_41,Unnamed: 36_level_41,Unnamed: 37_level_41,Unnamed: 38_level_41,Unnamed: 39_level_41,Unnamed: 40_level_41,Unnamed: 41_level_41,Unnamed: 42_level_41,Unnamed: 43_level_41
Weight?,Feature,Unnamed: 2_level_42,Unnamed: 3_level_42,Unnamed: 4_level_42,Unnamed: 5_level_42,Unnamed: 6_level_42,Unnamed: 7_level_42,Unnamed: 8_level_42,Unnamed: 9_level_42,Unnamed: 10_level_42,Unnamed: 11_level_42,Unnamed: 12_level_42,Unnamed: 13_level_42,Unnamed: 14_level_42,Unnamed: 15_level_42,Unnamed: 16_level_42,Unnamed: 17_level_42,Unnamed: 18_level_42,Unnamed: 19_level_42,Unnamed: 20_level_42,Unnamed: 21_level_42,Unnamed: 22_level_42,Unnamed: 23_level_42,Unnamed: 24_level_42,Unnamed: 25_level_42,Unnamed: 26_level_42,Unnamed: 27_level_42,Unnamed: 28_level_42,Unnamed: 29_level_42,Unnamed: 30_level_42,Unnamed: 31_level_42,Unnamed: 32_level_42,Unnamed: 33_level_42,Unnamed: 34_level_42,Unnamed: 35_level_42,Unnamed: 36_level_42,Unnamed: 37_level_42,Unnamed: 38_level_42,Unnamed: 39_level_42,Unnamed: 40_level_42,Unnamed: 41_level_42,Unnamed: 42_level_42,Unnamed: 43_level_42
Weight?,Feature,Unnamed: 2_level_43,Unnamed: 3_level_43,Unnamed: 4_level_43,Unnamed: 5_level_43,Unnamed: 6_level_43,Unnamed: 7_level_43,Unnamed: 8_level_43,Unnamed: 9_level_43,Unnamed: 10_level_43,Unnamed: 11_level_43,Unnamed: 12_level_43,Unnamed: 13_level_43,Unnamed: 14_level_43,Unnamed: 15_level_43,Unnamed: 16_level_43,Unnamed: 17_level_43,Unnamed: 18_level_43,Unnamed: 19_level_43,Unnamed: 20_level_43,Unnamed: 21_level_43,Unnamed: 22_level_43,Unnamed: 23_level_43,Unnamed: 24_level_43,Unnamed: 25_level_43,Unnamed: 26_level_43,Unnamed: 27_level_43,Unnamed: 28_level_43,Unnamed: 29_level_43,Unnamed: 30_level_43,Unnamed: 31_level_43,Unnamed: 32_level_43,Unnamed: 33_level_43,Unnamed: 34_level_43,Unnamed: 35_level_43,Unnamed: 36_level_43,Unnamed: 37_level_43,Unnamed: 38_level_43,Unnamed: 39_level_43,Unnamed: 40_level_43,Unnamed: 41_level_43,Unnamed: 42_level_43,Unnamed: 43_level_43
+8.032,bias,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+5.901,2prev_word:*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+5.476,word.lower():m²,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+5.387,word.lower():1900.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+4.574,2prev_word:annuelles,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+4.401,word.lower():-5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+4.336,word.lower():3/4ème,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+4.064,prev_word:sans,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
… 422 more positive …,… 422 more positive …,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
… 364 more negative …,… 364 more negative …,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Weight?,Feature
+8.032,bias
+5.901,2prev_word:*
+5.476,word.lower():m²
+5.387,word.lower():1900.0
+4.574,2prev_word:annuelles
+4.401,word.lower():-5
+4.336,word.lower():3/4ème
+4.064,prev_word:sans
… 422 more positive …,… 422 more positive …
… 364 more negative …,… 364 more negative …

Weight?,Feature
+6.321,word.lower():impasse
+3.710,word.lower():boulevard
+3.620,2next_word:92240
+3.381,word.lower():rue
+3.007,2prev_word:seine
+2.876,next_word:villa
+2.678,2next_word:madelaine
+2.553,word.lower():rue'eugène
+2.553,next_word:derrien'..
+2.223,is_number

Weight?,Feature
+3.149,prev_word:avenue
+3.072,prev_word:boulevard
+2.549,prev_word:place
+2.191,prev_word:de
+2.101,next_word:à
+2.061,prev_word:rue
+2.038,2prev_word:avenue
+1.682,2next_word:quartier
… 63 more positive …,… 63 more positive …
… 10 more negative …,… 10 more negative …

Weight?,Feature
+4.471,word.lower():années
+3.873,2next_word:standing
+2.802,2next_word:briques
+2.687,word.lower():(
+2.478,word.lower():2007
+2.463,word.lower():2019
+2.204,is_number
+2.200,next_word:avec
+2.180,next_word:1974
+2.020,prev_word:de

Weight?,Feature
+2.258,prev_word:années
+2.009,word.lower():1974
+1.640,2next_word:soignée
+1.640,prev_word:1974
+1.593,is_number
+1.223,is_small_word
+1.191,"next_word:,"
+0.840,2prev_word:récente
+0.764,2prev_word:des
+0.411,next_word:)

Weight?,Feature
+7.050,word.lower():ascenseur
+2.129,word.lower():sans
+1.992,next_word:ascenseur
+1.729,word.lower():avec
+1.377,next_word:comprenant
+1.376,2prev_word:étage
+1.327,next_word:assenceur
+1.260,prev_word:5ème
+1.255,2next_word:d'ascenseur
+1.064,2next_word:lumineux

Weight?,Feature
+3.739,prev_word:sans
+2.189,word.lower():ascenseur
+1.849,prev_word:avec
+1.699,prev_word:d'ascenseur
+1.573,next_word:offrant
+1.522,2prev_word:pas
+1.319,word.lower():assenceur
+1.299,2next_word:l'appartement
+1.291,2next_word:au
+1.282,2next_word:place

Weight?,Feature
+5.226,2prev_word:charges
+3.556,prev_word:dont
+2.740,word.lower():19
+2.643,prev_word:pour
+2.608,2prev_word:mensuelles
+2.364,prev_word:+
+2.349,next_word:euros/mois
+2.339,2prev_word:locatives
+2.094,word.lower():100
+2.042,word.lower():80€

Weight?,Feature
1.644,prev_word:155
1.644,next_word:d'avance
1.376,word.lower():eur
1.356,2next_word:pour
0.996,2prev_word:et
0.413,is_all_upper
0.0,is_key_word

Weight?,Feature
+4.089,is_number
+3.722,word.lower():94400
+2.303,prev_word:(
+1.267,word.lower():92240
+0.860,next_word:)
+0.744,word.lower():92120
+0.741,prev_word:verdier
+0.722,2prev_word:avenue
+0.674,"2next_word:,"
… 3 more positive …,… 3 more positive …

Weight?,Feature
+8.292,word.lower():copropriété
+5.757,word.lower():co-propriété
+3.550,prev_word:petite
+1.483,next_word:propriété
+1.483,word.lower():co
+1.016,prev_word:dans
+0.909,2next_word:(
+0.602,2next_key
… 7 more positive …,… 7 more positive …
… 4 more negative …,… 4 more negative …

Weight?,Feature
1.851,prev_word:co
1.851,word.lower():propriété
1.717,2next_word:deux
1.491,2prev_word:petite
1.363,next_word:(

Weight?,Feature
+5.919,word.lower():immédiatement
+3.790,prev_word:disponibilité
+3.701,word.lower():immédiate
+3.394,2prev_word:libre
+2.932,next_word:suite..
+2.607,prev_word:disponible
+2.205,word.lower():de
+2.156,word.lower():30/11/2019
+2.123,2next_word:saisir
+1.839,prev_word:libre

Weight?,Feature
+5.614,word.lower():suite
+4.104,word.lower():décembre
+3.482,2next_word:!
+1.722,2prev_word:du
+1.671,2prev_word:-
+1.416,2next_word:__end1__
+1.347,next_word:suite
+1.084,"next_word:,"
… 19 more positive …,… 19 more positive …
… 3 more negative …,… 3 more negative …

Weight?,Feature
+6.839,2prev_word:garantie
+5.130,prev_word:garantie
+2.833,2next_word:;
+2.220,prev_word::
+2.029,2prev_word:de
+1.438,next_word:euros
+1.258,2prev_word:garanti
+0.901,is_number
+0.732,symbole_in_word
+0.514,word.lower():808.07€

Weight?,Feature
2.706,2prev_word:garantie
1.279,next_word:euros
1.164,prev_word:1
1.025,is_small_word
0.376,next_key

Weight?,Feature
+7.156,word.lower():balcon
+6.563,word.lower():jardin
+5.920,word.lower():terrasse
+4.469,word.lower():parc
+4.326,word.lower():balcons
+4.134,word.lower():balcon-terrasse
+3.988,word.lower():véranda
+3.846,word.lower():jardinet
+3.655,next_word:privative
+2.878,word.lower():cour

Weight?,Feature
+3.810,2prev_word:honoraires
+3.455,prev_word:fa
+3.197,next_word:ttc
+2.851,word.lower():180
+2.827,prev_word:honoraires
+2.780,next_word:eurosttc
+2.740,word.lower():699€
+2.687,2prev_word:locataire
+2.649,2prev_word:agence
+2.554,prev_word:dont

Weight?,Feature
2.704,prev_word:780
1.206,word.lower():200
1.187,2next_word:ttc
1.1,2next_word:honoraires
1.08,next_word:+
0.998,prev_word:1
0.449,next_word:euros
0.0,2prev_word:de

Weight?,Feature
+6.141,prev_word:loyer
+4.849,2prev_word:est
+4.580,2prev_word:comprises
+4.536,2prev_word:cc
+4.124,2prev_word:disponibilité
+3.087,2prev_word:loyer
+1.812,word.lower():888.07€
+1.795,word.lower():1400
+1.782,2next_word:dont
+1.726,prev_word:comprises

Weight?,Feature
3.09,prev_word:1
1.527,next_key
1.087,word.lower():050€
1.067,next_word:euros
0.593,2next_word:et
0.085,2prev_word:à

Weight?,Feature
+3.830,prev_word:mensuel
+3.431,2prev_word:loyer
+2.973,2prev_word:hc
+2.422,prev_word::
+2.203,word.lower():629
+2.186,2next_word:hors
+2.116,2prev_word:charge
+2.109,word.lower():1050eur
+2.087,2prev_key
+2.042,2prev_word:s'élève

Weight?,Feature
2.497,2prev_word::
0.312,word.lower():790euros
0.31,2next_word:provisions
0.232,prev_word:1
0.205,prev_word:2
0.161,word.lower():000
0.065,2next_word:.
0.039,next_word:euros

Weight?,Feature
+8.658,next_word:m²
+4.940,prev_word:d'environ
+3.643,is_number_and_letter
+3.381,prev_word:de
+2.455,word.lower():46m²
+2.311,2next_word:-
+2.235,symbole_in_word
+2.079,next_word:)
+2.065,word.lower():108m²
… 53 more positive …,… 53 more positive …

Weight?,Feature
+1.994,prev_word:44
+1.865,prev_word:41.52m²
+1.568,word.lower():000
+1.542,word.lower():41.52m²
+1.460,2prev_word:gare
+1.452,next_word:dans
+1.363,2next_word:un
+1.361,next_word:m²
+1.237,2next_word:dans
+1.133,prev_word:(

Weight?,Feature
+7.455,next_word:chambres
+5.662,next_word:chambre
+3.561,word.lower():chambre
+2.974,next_word:belles
+2.907,word.lower():une
+2.681,word.lower():d'une
+2.559,2next_word:salle
+2.499,2next_word:chambre
+2.314,word.lower():deux
+2.237,"prev_word:,"

Weight?,Feature
+4.588,next_word:etage
+4.211,next_word:étage
+4.112,prev_word:au
+3.970,2next_word:dernier
+3.453,word.lower():rez-de-chaussée
+3.109,prev_word:en
+2.833,next_word:rez
+2.548,word.lower():3
+2.302,2prev_word:copropriété
… 63 more positive …,… 63 more positive …

Weight?,Feature
+3.084,2prev_word:rez
+2.340,prev_word:rez
+2.118,2next_word:(
+1.896,2prev_word:au
+1.195,2next_word:chaussee
+1.087,prev_word:et
+0.999,word.lower():dernier
+0.981,next_word:étage
+0.831,prev_word:de
+0.799,bias

Weight?,Feature
+8.557,next_word:pièces
+5.901,next_word:séjour/chambre
+4.618,word.lower():pièce
+3.782,next_word:pièce
+3.224,word.lower():t
+3.111,2next_word:pièce
+2.950,prev_word:appartement
+2.723,word.lower():f4
+2.697,prev_word:type
+2.585,2prev_word:ascenseur

Weight?,Feature
3.818,prev_word:t
1.76,"2prev_word:,"
1.174,is_number
0.047,2next_word:superfice
0.009,is_small_word
0.008,next_word:d'une

Weight?,Feature
+6.892,word.lower():parking
+6.564,word.lower():stationnement
+5.714,word.lower():garage
+4.382,word.lower():parkings
+2.247,prev_word:2
+1.973,word.lower():emplacement
+1.967,prev_word:un
+1.796,is_key_word
… 18 more positive …,… 18 more positive …
… 11 more negative …,… 11 more negative …

Weight?,Feature
1.806,2prev_word:aménagés..
1.749,word.lower():parkings
1.54,prev_word:2
1.38,2next_word:une
1.205,next_word:et
0.433,is_key_word

Weight?,Feature
+7.776,word.lower():centre-ville
+4.850,word.lower():centre
+4.401,2prev_word:quartier
+4.142,word.lower():mairie
+3.504,word.lower():gare
+3.110,prev_word:quartier
+2.737,2prev_word:le
+2.601,word.lower():plateau
+1.970,word.lower():quartier
+1.948,2prev_word:clamart

Weight?,Feature
+3.570,prev_word:fort
+3.424,prev_word:petit
+2.744,prev_word:**quais
+2.744,next_word:seine**
+2.487,word.lower():ville
+2.384,prev_word:quartier
+2.376,word.lower():d'issy
+2.149,2next_word:mairie
+2.099,2prev_word:en
+2.011,2prev_word:immédiate

Weight?,Feature
+9.234,word.lower():cave
+7.440,word.lower():box
+5.376,word.lower():cellier
+4.818,word.lower():buanderie
+4.277,word.lower():grenier
+3.299,word.lower():rangements
+3.020,next_word:vélo
+2.921,word.lower():local
+2.708,2next_word:garage
+1.954,is_key_word

Weight?,Feature
2.088,2next_word:le
1.844,prev_word:local
1.844,word.lower():vélo
1.139,2prev_word:.
0.331,next_word:.

Weight?,Feature
+10.170,word.lower():tram
+8.910,word.lower():rer
+7.710,word.lower():métro
+7.186,word.lower():tramway
+6.299,word.lower():bus
+4.805,2prev_word:169
+4.567,word.lower():t6
+3.776,word.lower():gare
+3.388,word.lower():transports
+2.941,word.lower():transilien

Weight?,Feature
+3.650,prev_word:ligne
+3.199,2next_word:place
+2.794,prev_word:''
+2.765,next_word:à
+2.548,prev_word:transilien
+2.445,is_small_word
+2.284,word.lower():t6
+2.039,next_word:un
+1.663,next_word:vitry
+1.639,is_all_upper

Weight?,Feature
+4.554,word.lower():individuels
+4.241,word.lower():individuel
+4.127,word.lower():collectifs
+3.940,prev_word:chauffage
+3.520,2prev_word:chauffage
+3.457,word.lower():collectif
+3.150,is_key_word
+2.917,2next_word:l'appartement
+2.375,next_word:!
+2.238,prev_word:chaude

Weight?,Feature
1.908,word.lower():.
1.796,prev_word:collectif
1.784,2prev_word:chauffage
1.665,next_word:contactez-nous
1.037,2next_word:au
0.761,symbole_in_word
0.565,2next_word:__end2__
0.565,next_word:__end1__

Weight?,Feature
+6.662,word.lower():vide
+6.354,word.lower():meublé
+5.064,word.lower():meublée
+4.360,word.lower():maison
+3.958,2prev_word:tramway
+3.479,word.lower():meuble
+2.516,2next_word:très
+2.179,2next_word:disponible
+2.144,word.lower():non
+2.126,prev_word:location

Weight?,Feature
+2.389,word.lower():meublé
+2.185,prev_word:non
+1.951,2prev_word:(
+1.691,2next_word:dernière
+1.661,is_key_word
+1.369,2prev_word:lumineux
+0.947,2next_key
+0.882,next_word:)
+0.718,prev_word:(
+0.302,next_word:au

Weight?,Feature
+7.347,word.lower():issy
+5.956,word.lower():vitry
+5.730,word.lower():issy-les-moulineaux
+5.326,word.lower():chatillon
+4.860,word.lower():clamart
+4.235,word.lower():villejuif
+4.153,word.lower():malakoff
+4.078,word.lower():montrouge
+3.929,prev_word:fort
+3.449,word.lower():châtillon

Weight?,Feature
+4.077,2prev_word:vitry
+3.176,word.lower():les
+2.514,2prev_word:d'issy
+2.410,prev_word:les
+2.027,2next_word:mme
+1.684,word.lower():roses
+1.684,2prev_word:fontenay
+1.554,word.lower():moulineaux
+1.474,prev_word:aux
+1.443,prev_word:d'issy


# V/ Exportation Prediction vers Fichier Json importable sur Doccano

In [25]:
#Fonction qui regroupe les occurences d'un label si les mots se suivent
def occ_labels(labels):
    i = 0
    x = list()
    j = 1
    while i<len(labels):
        tmp = i
        while j<len(labels) and labels[tmp][2]==labels[j][2] and labels[tmp][1]==labels[j][0]-1:
            tmp+=1
            j+=1
        x.append([labels[i][0],labels[j-1][1],labels[i][2]])
        i=j
        j=i+1
    return x

In [26]:
def from_text_to_dataframe(text, nb_ad):
    """Fonction qui à partir de l'annonce permet de créer une dataframe à trois colonnes. 
    La première colonne correspond à l'annonce sous forme de text, la seconde contient les 
    mots de l'annonce et la troisième les positions du mot dans l'annonce.""" 
    Vect_word=word_tokenize(text) # Tokenisation
    nb_ad_list=list(map(int, nb_ad*np.ones(len(Vect_word)))) # Numéro annonce
    # Position
    offset = 0                                                                  
    list_pos=list()
    for token in Vect_word:
        offset = text.find(token, offset)
        list_pos.append([offset, offset+len(token)])
        offset += len(token)
    # Creation de la dataframe
    data={'Ad#':nb_ad_list,'Words':Vect_word,'Pos':list_pos}
    dataframe=pd.DataFrame(data)
    return dataframe

#Fonction qui permet d'écrire un fichier json à partir du texte de l'annonce, des predictions établies et des positions des mots
def pred_to_json(text,y_pred, pos,new):
    #Format json
    labels = list()
    for l in range(len(y_pred)):
        if (y_pred[l] != 'O'):
            labels.append([pos.iloc[l][0],pos.iloc[l][1],y_pred[l][2:]])
    
    #On elimine les occurences de labels qui se suive
    labels = occ_labels(labels)
    ad = {
        "id" : 1,
        "text" : text,
        "meta" : {},
        "annotation_approver" : "gabrielle",
        "labels" : labels
    }
    
    if new==0:
        with open("data_predicted/test.json","a") as f:
            json.dump(ad,f)
            f.write("\n")
    else:
        with open("data_predicted/test.json","w") as f:
            json.dump(ad,f)
            f.write("\n")
        

In [27]:
# TEST avec 3 annonces
nb_ad_test=1000
list_test=["24 rue du Capitaine Ferber- Copropriété de 2004 de standing, au 2ième étage, appartement de 3 pièces de 71 m² refait à neuf (peintures, parquets, radiateurs à inertie) avec séjour (sud est) de 26 m² ouvrant sur loggia de 5,65 m², cuisine semi- indépendante meublée et équipée (hotte, plaques, four lave-vaisselle), 2 chambres avec placards de 11,7 m² donnant sur balcon, salle de bains et WC indépendants. Parking.. Dépôt de garantie 1.620,00 euros. Honoraires de location 1.065,00 eurosTTC Loyer mensuel 1620 euros - Charges locatives 80 euros - Honoraire TTC à la charge du locataire 1065 euros dont 213 euros d'honoraires d'état des lieux.","\"Rue Rouget de l'Isle, à 5 minutes du RER C \"\"Issy Val de Seine\"\", dans une résidence sécurisée de 2010, au 2ème étage avec ascenseur, un studio meublé comprenant une entrée avec placard, un séjour donnant sur cour (calme), un coin cuisine, une salle d'eau avec WC. Idéal étudiant. Parquet au sol. Balcon de 4 m² donnant sur jardin. Calme. Gardien. Interphone. Eau chaude et chauffage individuels électriques.\"","*EXCLUSIVITÉ* ISSY-LES-MOULINEAUX, rue du Capitaine Ferber, dans un immeuble moderne 2003, un appartement de 2 Pièces 58m² au 5ème étage avec ascenseur, comprenant: une entrée avec placards, un séjour donnant sur une terrasse SUD-EST, une cuisine séparée aménagée et entièrement équipée BOSCH, une chambre avec dressing, une salle de bains, un WC séparé, cave et une place de parking intérieur. CALME donnant entièrement côté jardin, LUMINEUX. Proches commerces et transports (RER TRAMWAY Issy Val de Seine, Métro Mairie d'Issy). Loyer 1480 euros Charges Comprises. Disponible de suite."]

def from_list_text_to_dataframe(list_text, nb_ad):
    """Fonction qui à partir d'une liste d'annonce permet de créer une dataframe à trois colonnes. 
    La première colonne correspond aux annonces sous forme de texte, la seconde contient les 
    mots de l'annonce et la troisième les positions du mot dans l'annonce."""
    for i in range(len(list_text)):
        if i==0:
            data=from_text_to_dataframe(list_text[i], nb_ad+1)
            nb_ad+=1
        else:
            data=data.append(from_text_to_dataframe(list_text[i], nb_ad+1), ignore_index = True)
            nb_ad+=1
    add_features(data)
    return data

data_to_json=from_list_text_to_dataframe(list_test, nb_ad_test)

#Fonction qui prend en argument un dataframe contenant les annonces et ses features et le type de modele à utiliser pour les
#predictions et qui renvoie un fichier json importable sur doccano
def df_to_json(list_text,nb_ad,dataframe,modele):
    #Ecriture du dataframe au format dictionnaire
    X_crf=list()
    for i in range(len(list_text)):
        X_crf.append(ad2features(dataframe.loc[dataframe['Ad#']==nb_ad+1+i,:]))
    y_pred = modele.predict(X_crf)
    for i in range(len(list_text)):
        if i==0:
            pred_to_json(list_text[i],y_pred[i],dataframe.loc[dataframe['Ad#']==nb_ad+1+i,:]['Pos'],1)
        else :
            pred_to_json(list_text[i],y_pred[i],dataframe.loc[dataframe['Ad#']==nb_ad+1+i,:]['Pos'],0)

In [28]:
df_to_json(list_test,nb_ad_test,data_to_json,crf)

# VI/ Sequence Tagging avec bi LSTM-CRF