# First attempt at CommonLit Readability

## Purpose

|Description|Status|
|----------------------------------------------------------------------------|-----------------|
|Verify that I can write a submission file|See `model.save(...)`, below|
|Understand the submission errors I have been experiencing|Any exception can stop submission.csv being generated. Be careful of anything that might throw an exception (e.g. division), because the private test data may have different characteristics.|
|Experiment with regression methods|So far Ridge Regression works best for me. I appear to have hit a wall at a score of 0.777, so I don't plan to take this notebook further.|
|Experiment with Features|See section headed: Compute Features|
|See whether a simple neural network can produce a better match than regression|I've used a simple network, which has been disappointing.|
|Investigate Standard Error|I'm using standard error to generate weights for regression, and samples for the neural net.|
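
The last row of the table can be illustrated with a minimal sketch. It assumes the inverse-variance form that appears in the regression code of this notebook (`1/(EPSILON + standard_error**2)`) and Gaussian sampling around each target; the helper names `sample_weight` and `noisy_target` are hypothetical and exist only for this illustration.

```python
from random import gauss, seed

EPSILON = 0.001  # same role as the EPSILON hyperparameter in this notebook

def sample_weight(standard_error):
    # Inverse-variance weight; EPSILON keeps it finite when standard_error == 0
    return 1 / (EPSILON + standard_error ** 2)

def noisy_target(target, standard_error):
    # Draw one plausible rating from a normal distribution centred on the target
    return gauss(target, standard_error)

seed(0)
print(sample_weight(0.0))        # finite, even for an excerpt with zero standard error
print(sample_weight(0.5))        # noisier excerpts receive smaller weights
print(noisy_target(-1.0, 0.5))   # a resampled target for one training epoch
```

Excerpts whose raters agreed closely get large weights, while excerpts with a wide spread of ratings contribute less.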

## Acknowledgements

I'd like to thank the following people who made their notebooks available for learning.

| Author | Notebook | Remarks |
| --------------- | ---------------------- | --- |
| Bishwajit Shil | [submission score 0.62](https://www.kaggle.com/jitshil143/submission-score-0-62) | Creation of submission file |
|Manish KC|[Text Pre-processing & Data Wrangling](https://www.kaggle.com/manishkc06/text-pre-processing-data-wrangling)|Introduced me to spaCy|

## Notes from Discussions

1. [The reading criterion was developed using a Bradley-Terry model.](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/236402)

1. [So the mean target value of that excerpt is chosen to coincide with the origin of the scale for each and every rater, so that the excerpt's standard error is also zero. This implies that the standard errors of all the other excerpts are inflated by the "true" standard error of the baseline excerpt.](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/236403)
 
1. [Higher scores are considered easier to read than low ones.](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/236538)

# Hyperparameters

In [None]:
M               = 2000       # Number of words from frequency table. 
MODEL           = 'RIDGE_CV' # LR for linear regression, RIDGE for Ridge regression, RIDGE_CV for cross-validated Ridge, ANN for neural network
VALIDATION_SIZE = 0.2        # Controls train/validation split
MI_CUTOFF       = 0.01       # Discard features if mutual information with target is less than this value

# Neural Network Hyperparameters
      
N_EPOCH      = 2**16      # Number of epochs for training
LR           = 0.001      # Training learning Rate
NBURN        = 2**6       # Number of epochs that are excluded from plots
DROPOUT      = 0          # Controls whether dropout will be used
FREQUENCY    = 2**4       # Plot average loss over FREQUENCY epochs
WEIGHT_DECAY = 0.0001     # Training weight decay

# Regression Hyperparameters

N_POLY       = 2      # Degree of polynomial for regression
EPSILON      = 0.001  # Prevents sample weights blowing up when standard_error==0
ALPHA        = 900.0   # Regularization strength for Ridge
#ALPHAS       = [ 900.0,1000.0,1050.0, 1100.0, 1150.0, 1200.0, 1300.0, 1400.0,1500.0]
ALPHAS       = [ 700.0, 800.0, 850.0, 900.0,950.0, 1000.0]

In [None]:
from math                      import log
from matplotlib.pyplot         import figure, title, xlabel, ylabel, scatter, legend, colorbar, plot, barh, yticks
from numpy                     import arange
from pandas                    import read_csv, Series
from os                        import walk
from os.path                   import join
from random                    import gauss
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model      import LinearRegression, Ridge, RidgeCV
from sklearn.model_selection   import train_test_split
from sklearn.preprocessing     import PolynomialFeatures, StandardScaler
from spacy                     import load
from torch                     import FloatTensor, reshape, no_grad
from torch.nn                  import Module, Linear,  MSELoss, Dropout
from torch.nn.functional       import relu
from torch.optim               import Adam


# Load Data

## Data Dictionary

|Train|Public Test|Hidden Test|Description|
|--------------|--------------|----------|----------------------------------------------------|
|id|id|id|Unique ID for excerpt|
|url_legal|url_legal|- |URL of source (Omitted from some records in the test set--see [note](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/238670#1306025))|
|license|license |-|License of source material (Omitted from some records in the test set--see [note](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/238670#1306025))|
|excerpt|excerpt|excerpt|Text for predicting readability|
|target|-|-|Readability|
|standard_error|-|-|Measure of spread of scores among multiple raters for each excerpt|


In [None]:
train_data    = None
test_data     = None
unigram_freq  = None
for dirname, _, filenames in walk('/kaggle/input'):
    for filename in filenames:
        path_name = join(dirname, filename)
        if filename.startswith('train'):
            train_data = read_csv(path_name)
        if filename.startswith('test'):
            test_data = read_csv(path_name)
        if filename.startswith('unigram_freq'):
            unigram_freq = read_csv(path_name)
                       
train_data.describe()


# Prepare table of frequencies for words

We discard all but the most frequent words, assuming that all really infrequent words
are equally infrequent.
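
A minimal sketch of this idea, using a hypothetical five-word table in place of the real `unigram_freq` data and a deliberately tiny `M`: words inside the table are scored by the log of their rank, and every word outside it falls back to the same rank, `M`. The names `words_by_frequency`, `rank_of`, and `log_rank` are assumptions made for illustration.

```python
from math import log

M = 3  # keep only the M most frequent words (the notebook uses 2000)

# Hypothetical miniature frequency table, most frequent word first
words_by_frequency = ['the', 'of', 'and', 'zebra', 'quokka']

# Keep the top M words and map each to its rank (0 = most frequent)
rank_of = {word: rank for rank, word in enumerate(words_by_frequency[:M])}

def log_rank(word):
    # Words outside the truncated table all share the same rank, M
    return log(rank_of.get(word.lower(), M) + 1)

print(log_rank('the'), log_rank('zebra'), log_rank('quokka'))
```

Both `zebra` and `quokka` were cut from the table, so they receive identical scores.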

In [None]:
number_to_drop = len(unigram_freq.index) - M
if number_to_drop>0:
    drop_these_indices = unigram_freq.tail(number_to_drop).index
    unigram_freq.drop(drop_these_indices, inplace = True)
    
unigram_freq.set_index('word', inplace=True) # so we can index on words
unigram_freq.describe()

# Feature extractors

This is a collection of classes, each used to walk through a document and extract one feature.
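
The lookup these classes rely on can be sketched without spaCy. The example below is an illustration only: `Token` is a stand-in namedtuple carrying just the `pos_` and `tag_` attributes, and `bump` is a hypothetical helper. An action is chosen by trying the most specific key (part of speech plus tag), then the part of speech alone, then a default.

```python
from collections import namedtuple

# Stand-in for a spaCy token: only the attributes the lookup needs
Token = namedtuple('Token', ['text', 'pos_', 'tag_'])

def get_key(pos, tag=''):
    return f'{pos}:{tag}'

def get_action(actions, default_action, token):
    # Most specific match first: part of speech and fine-grained tag
    key = get_key(token.pos_, token.tag_)
    if key in actions:
        return actions[key]
    # Then part of speech alone
    key = get_key(token.pos_)
    if key in actions:
        return actions[key]
    return default_action

counts = {'full_stops': 0, 'other_punct': 0, 'words': 0}

def bump(name):
    # Build an action that increments one named counter
    def action(token):
        counts[name] += 1
    return action

actions = {
    get_key('PUNCT', '.'): bump('full_stops'),
    get_key('PUNCT'):      bump('other_punct'),
}

tokens = [Token('Hello', 'INTJ', 'UH'), Token(',', 'PUNCT', ','),
          Token('world', 'NOUN', 'NN'), Token('.', 'PUNCT', '.')]
for token in tokens:
    get_action(actions, bump('words'), token)(token)

print(counts)
```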

In [None]:
# TokenWalker
#
# Used to walk through the tokens in a document, and perform
# processing determined by subclasses

class TokenWalker:
    # Initialization
    # Parameters:
    #     Actions         Dict of key/value pairs for handling tokens
    #     default_action  Action to be performed if key not found in Actions
    def __init__(self,doc,
                Actions        = {},
                default_action = lambda token: None):
        self.doc            = doc
        self.Actions        = Actions
        self.default_action = default_action

    # get_key
    #
    # Calculate key for combination of pos and tag
    
    def get_key(self,pos,tag=''):
          return f'{pos}:{tag}'

    # get_action
    #
    # Look up action to be performed for token
    def get_action(self,token):
        # First try for exact match on pos and tag
        key = self.get_key(token.pos_,token.tag_)
        if key in self.Actions:
            return self.Actions[key]
        
        # Otherwise look for a match on pos only
        key = self.get_key(token.pos_)
        if key in self.Actions:
            return self.Actions[key]

        return self.default_action

    # walk through the tokens in a document
    def walk(self):
        for token in self.doc:
             self.get_action(token)(token)


# SentenceCounter
#
# Walk through document and count sentences, i.e. count full stops

class SentenceCounter(TokenWalker):
    def __init__(self,doc,tag='.'):
            super().__init__(doc,
                           Actions = {self.get_key('PUNCT',tag): lambda token: self.incr()} )
            self.count = 0

    # incr
    # Count each full stop
    def incr(self):
        self.count += 1

    # get
    #
    # Note: check that there is at least one sentence.
    # The program was failing on the hidden test set with a division by zero before I added this check.
    
    def get(self):
        self.walk()
        return max(1,self.count)

# WordCounter
#
# Walk through document and count words

class WordCounter(TokenWalker):
    def __init__(self,doc):
        super().__init__(doc,
                         Actions = {
                           self.get_key('PUNCT')      : lambda token: None,
                           self.get_key('PART','POS') : lambda token: None,
                         },
                         default_action = lambda token: self.incr())
        self.count = 0

    def incr(self):
        self.count += 1

    def get(self):
        self.walk()
        return self.count

# SyllableCounter
#
# Walk through document and count syllables

class SyllableCounter(WordCounter):
    def __init__(self,doc):
        super().__init__(doc)
        self.default_action = lambda token: self.incr(str(token).lower())

    def incr(self,word):
        self.count += max(1,self.countSyllables(word))
    
    def countSyllables(self,word): # https://stackoverflow.com/questions/405161/detecting-syllables-in-a-word
        vowels       = "aeiouy"
        numVowels    = 0
        lastWasVowel = False
        for wc in word:
            foundVowel = False
            for v in vowels:
                if v == wc:
                    if not lastWasVowel: numVowels+=1   #don't count diphthongs
                    foundVowel = lastWasVowel = True
                    break
            if not foundVowel:  #If full cycle and no vowel found, set lastWasVowel to false
                lastWasVowel = False
        if len(word) > 2 and word[-2:] == "es": #Remove es - it's "usually" silent (?)
            numVowels-=1
        elif len(word) > 1 and word[-1:] == "e":    #remove silent e
            numVowels-=1
        return numVowels

# TagCounter
#
# Walk through document and count tokens with a given part of speech.
# The default, VERB, is used as a proxy for clauses.

class TagCounter(TokenWalker):
    def __init__(self,doc,tag='VERB'):
        super().__init__(doc,
                         Actions = {self.get_key(tag): lambda token: self.incr()} )
        self.count = 0

    def incr(self):
        self.count += 1

    def get(self):
        self.walk()
        return self.count 
    
# FreqCounter
#
# Walk through document and calculate average log rank in frequency table

class FreqCounter(TokenWalker):
    def __init__(self,doc):
        super().__init__(doc,
                         Actions = {
                           self.get_key('NOUN')     : lambda token: self.incr(token),
                           self.get_key('VERB')     : lambda token: self.incr(token), 
                           self.get_key('ADJ')      : lambda token: self.incr(token),
                           self.get_key('ADV')      : lambda token: self.incr(token),
                         })
        self.count = 0
        self.freq  = 0
        
    def incr(self,token):
        word = str(token).lower()
        rank = unigram_freq.index.get_loc(word) if word in unigram_freq.index else M
            
        self.freq  += log(rank+1) #unigram_freq.loc[word,'freq']
        self.count += 1

    def get(self):
        self.walk()
        return  self.freq/self.count if self.count>0 else 0 
    
def get_stopwords(doc,lemma=None):
    return [token.lemma_ for token in doc if token.is_stop and (lemma is None or lemma == token.lemma_)]

# UniqueTagCounter
#
# Count number of distinct tag types in document,
# either scaled by length of document or raw.

class UniqueTagCounter:
    def __init__(self,doc,scaled=False):
        self.doc      = doc
        self.tags     = set()
        self.n_tokens = 0
        self.scaled   = scaled
        
    def walk(self):
        for token in self.doc:
            self.tags.add(token.tag_)
            self.n_tokens += 1
            
    def get(self):
        self.walk()   # Populate tags and n_tokens before reading them
        return len(self.tags)/(max(1,self.n_tokens) if self.scaled else 1)

# Compute Features

Feature extraction is driven by `FeatureTable`, a list of feature names, each paired with a function
that will generate the value of the feature from a text extract.
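
The table-driven pattern can be sketched in miniature. This is an illustration under simplifying assumptions: plain strings stand in for parsed spaCy documents, and the three extractors and the name `feature_table` are hypothetical.

```python
# Each entry pairs a feature name with a function of the text.
# Plain strings stand in for parsed spaCy documents here.
feature_table = [
    ('char_count',  lambda text: len(text)),
    ('word_count',  lambda text: len(text.split())),
    ('comma_count', lambda text: text.count(',')),
]

feature_names = [name for (name, _) in feature_table]

def get_features(text):
    # Apply every extractor in table order, so values line up with feature_names
    return tuple(extract(text) for (_, extract) in feature_table)

row = dict(zip(feature_names, get_features('One, two, three.')))
print(row)
```

Adding a feature then means adding one line to the table; the code that applies the extractors does not change.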

In [None]:
FeatureTable = [ 
    ('word_count',           lambda doc: WordCounter(doc).get()),
    ('sentence_count',       lambda doc: SentenceCounter(doc).get()),
    ('sentence_length',      lambda doc: WordCounter(doc).get()/SentenceCounter(doc).get()),
    ('unique_tags',          lambda doc: UniqueTagCounter(doc).get()),
    ('clauses',              lambda doc: TagCounter(doc).get()),
    ('adjs',                 lambda doc: TagCounter(doc,tag='ADJ').get()),
    ('adps',                 lambda doc: TagCounter(doc,tag='ADP').get()),
    ('advs',                 lambda doc: TagCounter(doc,tag='ADV').get()),
    ('auxen',                lambda doc: TagCounter(doc,tag='AUX').get()),
    ('conjs',                lambda doc: TagCounter(doc,tag='CCONJ').get()),
    ('dets',                 lambda doc: TagCounter(doc,tag='DET').get()),
    ('intjs',                lambda doc: TagCounter(doc,tag='INTJ').get()),
    ('nouns',                lambda doc: TagCounter(doc,tag='NOUN').get()),
    ('nums',                 lambda doc: TagCounter(doc,tag='NUM').get()),
    ('parts',                lambda doc: TagCounter(doc,tag='PART').get()),
    ('prons',                lambda doc: TagCounter(doc,tag='PRON').get()),
    ('proper_nouns',         lambda doc: TagCounter(doc,tag='PROPN').get()),
    ('puncts',               lambda doc: TagCounter(doc,tag='PUNCT').get()),
    ('sconjs',               lambda doc: TagCounter(doc,tag='SCONJ').get()),
    ('syms',                 lambda doc: TagCounter(doc,tag='SYM').get()),
    ('xs',                   lambda doc: TagCounter(doc,tag='X').get()),
    ('syllables',            lambda doc: SyllableCounter(doc).get()),
    ('frequencies',          lambda doc: FreqCounter(doc).get()),
    ('commas',               lambda doc: SentenceCounter(doc,tag=',').get()),
    ('semicolons',           lambda doc: SentenceCounter(doc,tag=';').get()),
    ('stopword_count',       lambda doc: len(get_stopwords(doc))),
    ('stopword_count_of',    lambda doc: len(get_stopwords(doc,lemma='of'))),
    ('stopword_count_with',  lambda doc: len(get_stopwords(doc,lemma='with'))),
    ('stopword_count_the',   lambda doc: len(get_stopwords(doc,lemma='the'))),
    ('stopword_count_in',    lambda doc: len(get_stopwords(doc,lemma='in'))),
    ('stopword_count_as',    lambda doc: len(get_stopwords(doc,lemma='as'))),
    ('stopword_count_which', lambda doc: len(get_stopwords(doc,lemma='which'))),
    ('stopword_count_by',    lambda doc: len(get_stopwords(doc,lemma='by'))),
    ('stopword_count_to',    lambda doc: len(get_stopwords(doc,lemma='to'))),
    ('stopword_count_and',   lambda doc: len(get_stopwords(doc,lemma='and'))),
    ('stopword_count_at',    lambda doc: len(get_stopwords(doc,lemma='at'))),
    ('stopword_count_just',  lambda doc: len(get_stopwords(doc,lemma='just'))),
    ('stopword_count_one',   lambda doc: len(get_stopwords(doc,lemma='one'))),
    ('stopword_count_boy',   lambda doc: len(get_stopwords(doc,lemma='boy'))),
    ('stopword_count_live',  lambda doc: len(get_stopwords(doc,lemma='live'))),
    ('stopword_count_snow',  lambda doc: len(get_stopwords(doc,lemma='snow'))),
    ('stopword_count_tree',  lambda doc: len(get_stopwords(doc,lemma='tree'))),
    ('stopword_count_out',   lambda doc: len(get_stopwords(doc,lemma='out'))),
    ('stopword_count_think', lambda doc: len(get_stopwords(doc,lemma='think'))),
    ('stopword_count_away',  lambda doc: len(get_stopwords(doc,lemma='away'))),
    ('stopword_count_right', lambda doc: len(get_stopwords(doc,lemma='right')))
    
]

# https://www.kaggle.com/c/commonlitreadabilityprize/discussion/240064
nlp      = load("en_core_web_sm")   # Initialize English Language

Features = [name for (name,_) in FeatureTable]   # Extract list of feature names

# get_features
#
# Calculate values for feature in specified row

def get_features(row):
    doc            = nlp(row['excerpt'])
    return tuple([extract(doc) for (_,extract) in FeatureTable])



# assign_features
#
# Create and populate new columns for features to be used in regression

def assign_features(dataset):
    dataset[Features] = dataset.apply(get_features,
                                      axis        = 1,
                                      result_type = 'expand')


assign_features(train_data)
train_data, validation_data = train_test_split(train_data,
                                               test_size=VALIDATION_SIZE) 


# Feature Engineering

Compute [Mutual Information](https://www.kaggle.com/ryanholbrook/mutual-information) between each feature and target, then partition features into two groups, `lifters` and `leaners`, by comparing mutual information with `MI_CUTOFF`. Drop the leaners.
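
The partition step on its own can be sketched as follows. The scores here are hypothetical values chosen for illustration; in the notebook they come from `mutual_info_regression`.

```python
MI_CUTOFF = 0.01

# Hypothetical scores, for illustration only
mi_scores = {'syllables': 0.12, 'word_count': 0.05, 'intjs': 0.004, 'xs': 0.0}

# Features at or above the cutoff are kept; the rest are candidates for removal
lifters = [name for name, score in mi_scores.items() if score >= MI_CUTOFF]
leaners = [name for name, score in mi_scores.items() if score <  MI_CUTOFF]

print('keep:', lifters)
print('drop:', leaners)
```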

In [None]:
def make_mi_scores(X, y):
    mi_scores = mutual_info_regression(X, y, discrete_features=False)
    mi_scores = Series(mi_scores, name="MI Scores", index=X.columns)
    mi_scores = mi_scores.sort_values(ascending=False)
    return mi_scores

y         = train_data.target
X         = train_data[Features]
mi_scores = make_mi_scores(X, y)
mi_scores = mi_scores.sort_values(ascending=True)
 

def plot_mi_scores(scores,lifters=[],mi_cutoff=0.03):
    colours = ['r' if len(scores)-i>len(lifters) else 'b' for i in range(len(scores))]
    width  = arange(len(scores))
    ticks  = list(scores.index)
    barh(width, scores,color=colours)
    yticks(width, ticks)
    title(f'Mutual Information Scores: cutoff={mi_cutoff}')


figure(dpi=100, figsize=(8, 5))

lifters = list(mi_scores[mi_scores.ge(MI_CUTOFF)].index)
leaners = list(mi_scores[mi_scores.lt(MI_CUTOFF)].index)
plot_mi_scores(mi_scores,lifters=lifters,mi_cutoff=MI_CUTOFF)

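# NOTE: DataFrame.drop returns a new frame. The results are discarded here,
# so the leaners stay in both frames and the models below still use every column in Features.
train_data.drop(leaners,axis=1)
validation_data.drop(leaners,axis=1)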


# Create model to be used for predicting targets

In [None]:
class Model:    # Generic parent for all models
    def __init__(self):
        pass
    
    def train(self,data):
        pass
    
    def predict(self,data,output='submission.csv'):
        pass

    # save
    #
    # Save predictions to submission file
    # Snarfed from Bishwajit Shil's notebook
    
    def save(self,data,prediction,output='submission.csv'):
        xsub           = data[["id"]].copy()
        xsub["target"] = prediction
        xsub.to_csv(output, index = False)
        
class RegressionModel(Model):   # Model to perform regression
    
    def __init__(self,
                 degree = 1,
                 alpha  = 0,
                 model  = 'LR'):
        super().__init__()
        self.degree = degree
        self.alpha  = alpha
        self.model  = None
        if model  == 'LR':
            self.model  = LinearRegression() 
        if model  == 'RIDGE':
            self.model = Ridge(alpha=self.alpha)
        if model =='RIDGE_CV':
            self.model = RidgeCV(alphas=ALPHAS) 
        self.scaler = StandardScaler()

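    def train(self,data,training=True):
        y             = data.target
        X             = data[Features]
        weights       = 1/(EPSILON + data.standard_error.pow(2))
        if training:
            # Fit the polynomial expansion, scaler and model on the training data only
            self.poly_reg = PolynomialFeatures(degree=self.degree)
            if self.degree>1:
                X    = self.poly_reg.fit_transform(X)
            X        = self.scaler.fit_transform(X)
            self.model.fit(X, y)
        else:
            # Validation: reuse the transforms fitted on the training data
            if self.degree>1:
                X    = self.poly_reg.transform(X)
            X        = self.scaler.transform(X)

        # The weights are used for scoring only, not for fitting
        return self.model.predict(X),y,self.model.score(X, y,weights)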
        
    def predict(self,data,output='submission.csv'):
        assign_features(data)
        X             = data[Features]
        # Use transform, not fit_transform, so test data is scaled like the data the model saw
        if self.degree>1:
            X        = self.poly_reg.transform(X)
        X             = self.scaler.transform(X)
        self.save(data, self.model.predict(X), output=output)


class MLP(Model):
    # Scaler 
    # Used by neural network to scale data into (0,1)
    class Scaler:
        def __init__(self,data=train_data.target,tolerance = 0.1):
            self.min_y = min(data) - tolerance
            self.max_y = max(data) + tolerance
            self.range = self.max_y - self.min_y
            print (f'min={self.min_y}, max={self.max_y}, range={self.range}')
        
        # scale 
        # scale data into (0,1)
        def scale(self,y):
            return (y-self.min_y)/self.range
        
        #elacs
        # Restore original value from scaled
        def elacs(self,y):
            return self.range*y + self.min_y
        
      
    class ANN(Module):
        def __init__(self,input_size=6,hidden=[32,16,12]):
            super().__init__()
            self.fc1    = Linear(in_features  = input_size,
                                 out_features = hidden[0])
            self.fc2    = Linear(in_features  = hidden[0],
                                 out_features = hidden[1])
            self.fc3    = Linear(in_features  = hidden[1],
                                 out_features = hidden[2])
            self.output = Linear(in_features  = hidden[2],
                                 out_features = 1)
            if DROPOUT>0:
                self.dropout = Dropout(DROPOUT)

        def forward(self, x):
            x = relu(self.fc1(x))
            if DROPOUT>0:
                x = self.dropout(x)
            x = relu(self.fc2(x))
            if DROPOUT>0:
                x = self.dropout(x)
            x = relu(self.fc3(x))
            x = self.output(x)
            return x
        
        def predict(self,X,scaler):
            preds = []
            with no_grad():
                for val in X:
                    y_hat = self.forward(val)
                    preds.append(scaler.elacs(y_hat[0].item()))
   
            return preds
        
    def __init__(self):
        super().__init__()
        self.model     = self.ANN(len(Features))
        self.criterion = MSELoss()
        self.optimizer = Adam(self.model.parameters(),lr=LR,weight_decay=WEIGHT_DECAY)
        self.scaler   =  self.Scaler(tolerance=max(train_data.standard_error))   # Scaler is nested in MLP
        self.scaler.scale(train_data.target).describe()
        
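    def train(self,data,nburn=NBURN,training=True):
        X        = data[Features]
        X_train  = FloatTensor(X.values)
        if not training:   # Validation: evaluate only, no weight updates
            with no_grad():
                y_hat = self.model.forward(X_train)
                y     = reshape(FloatTensor(self.scaler.scale(data.target).values),
                                (len(data.target),1))
                loss  = self.criterion(y_hat, y)
            return self.scaler.elacs(y_hat.numpy()), data.target, loss.item()

        loss_arr = []
        loss_sum = 0

        for epoch in range(N_EPOCH):
            # Resample each target from its rating distribution every epoch
            sample = []
            for target,sigma in zip(data.target,data.standard_error):
                sample.append(self.scaler.scale(gauss(target,sigma)))
            y_train  = reshape(FloatTensor(sample),(len(data.target),1))
            y_hat    = self.model.forward(X_train)
            loss     = self.criterion(y_hat, y_train)
            if epoch>nburn:
                loss_sum += loss.item()   # Accumulate a float, not a tensor
                if (epoch - nburn) % FREQUENCY==0:
                    loss_arr.append(loss_sum/FREQUENCY)   # Mean over the last FREQUENCY epochs
                    loss_sum=0
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
        figure(figsize=(10,10))
        plot([FREQUENCY*i for i in range(len(loss_arr))],loss_arr,
             label = f'Mean Loss over {FREQUENCY} epochs')
        title(f'Dropout = {DROPOUT}')
        legend()
        xlabel('Epoch')
        return self.scaler.elacs(y_hat.detach().numpy()), data.target, loss.item()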
  
    def predict(self,data,output='submission.csv'):
        assign_features(data)
        data.drop(leaners,axis=1)
        X       = data[Features]
        X_test  = FloatTensor(X.values)
        self.save(data,
                  self.model.predict(X_test,self.scaler),
                  output=output)

def createModel():
    if MODEL == 'ANN':
        return MLP()
    
    if MODEL == 'LR':  
        return RegressionModel(degree=N_POLY,model=MODEL)

    if MODEL == 'RIDGE' or MODEL == 'RIDGE_CV':  
        return RegressionModel(degree=N_POLY,alpha=ALPHA,model=MODEL)
    
model = createModel()

# Compare predictions with training data

In [None]:
def plot_predictions(data,predictions,
                     y          = [],
                     score      = None,
                     ax         = None,
                     plot_title = ''):
    r2 = r'$R^2$'   # Quick hack to make latex work properly
    scatter1 =ax.scatter(data['target'],predictions,
                          c     = data['standard_error'],
                          cmap  = 'viridis',
                          label = f'Predictions. {r2}={score:.6f}')
    ax.scatter(data['target'],data['target'],
            c     = 'r',
            s     = 2,
            label = 'Ideal')
    
    ax.set_xlabel('Target')
    ax.set_ylabel('Predicted')
    ax.set_title(plot_title)
    ax.legend()
    return scatter1

fig  = figure(figsize=(12,12))
axes = fig.subplots(nrows=1,ncols=2)
train_predictions,y_train,score_train    = model.train(train_data)
plot_predictions(train_data,train_predictions,
                 y          = y_train,
                 score      = score_train,
                 ax         = axes[0],
                 plot_title = 'Training')
validation_predictions,y_validation,score_validation    = model.train(validation_data,
                                                                      training=False)
scatter1 = plot_predictions(validation_data,validation_predictions,
                            y          = y_validation,
                            score      = score_validation, 
                            ax         = axes[1],
                            plot_title = 'Validation')
colorbar(scatter1).set_label('Standard Error', rotation=270)

## Predict result of tests and write submission file



In [None]:
model.predict(test_data)




# Plot Features pairwise

## Purpose

See whether features are redundant by examining pairwise correlations.
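
As a numeric companion to the plots, the same check can be sketched in plain Python. For a one-variable least-squares line with an intercept, $R^2$ equals the squared Pearson correlation, so a value near 1 suggests that one feature of the pair is redundant. The columns below are hypothetical and the helper `r_squared` is an assumption made for illustration.

```python
from itertools import combinations

def r_squared(xs, ys):
    # For a one-variable least-squares line, R^2 equals the squared Pearson correlation
    n      = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy    = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx    = sum((x - mean_x) ** 2 for x in xs)
    syy    = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:   # Constant column: correlation is undefined
        return 0.0
    return sxy * sxy / (sxx * syy)

# Hypothetical feature columns, for illustration only
columns = {
    'word_count': [10, 20, 30, 40],
    'syllables':  [15, 30, 45, 60],   # exactly 1.5 * word_count
    'commas':     [2, 0, 3, 1],
}

for a, b in combinations(columns, 2):
    print(f'{a} vs {b}: R^2 = {r_squared(columns[a], columns[b]):.3f}')
```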

In [None]:
fig = figure(figsize=(15,15))
axs = fig.subplots(nrows = len(Features)-1,
                   ncols = len(Features)-1)
r2 = r'$R^2$'   # Quick hack to make latex work properly
for i in range(len(Features)-1):
    for j in range(1,len(Features)):
        axs[i][j-1].xaxis.set_ticks([])
        axs[i][j-1].yaxis.set_ticks([])
        axs[i][j-1].set_frame_on(False) 

for i in range(len(Features)-1):
    for j in range(i+1,len(Features)):
        y           = train_data[Features[j]]
        X           = train_data[Features[i]].values.reshape(-1,1)
        interaction = LinearRegression()
        interaction.fit(X, y)
        axs[i][j-1].scatter(train_data[Features[i]],interaction.predict(X),
                           c     = 'r',
                           s     = 1,
                           label = f'{r2}={interaction.score(X, y):.6f}')
        axs[i][j-1].scatter(train_data[Features[i]], train_data[Features[j]],
                           c = 'b',
                           s = 1)
        axs[i][j-1].set_xlabel(Features[i])
        axs[i][j-1].set_ylabel(Features[j])
        axs[i][j-1].legend()
