As a pre-processing step, you may want to label every run of consecutive words in your submission that has no predicted NER label as "Nothing", and then act on that label under specific conditions (for example, if class=Nothing and text.contains [Some Text] then predict [Some class]).

In [None]:
import numpy as np
import pandas as pd

# Silence pandas' SettingWithCopyWarning; the cells below modify filtered frames in place.
pd.options.mode.chained_assignment = None



To illustrate, I am using a subset of predictions generated by a trained model.

In [None]:
submission = pd.read_csv("../input/predictionsfromtez/fold0_preds")
# Keep only the predictions for the first 51 essay ids.
ids = submission["id"].unique()[:51]
submission = submission[submission["id"].isin(ids)]

In [None]:
# Load the raw text of each essay in the subset.
id_to_text = {}
for idx in ids:
    filename = f"../input/feedback-prize-2021/train/{idx}.txt"
    with open(filename, "r") as f:
        id_to_text[idx] = f.read()

The cell below adds the features needed to insert the Nothing label. They may also be useful for post-processing: prediction_text, for example, holds the words covered by each prediction, so a rule can change Nothing to a valid class when that text has specified characteristics (e.g., if prediction_text contains some phrase, assign a valid class name).

In [None]:
# Split each prediction string into its word indices.
submission["predstringsplit"] = submission["predictionstring"].apply(lambda x: x.split())
submission["firstword_index"] = submission["predstringsplit"].apply(lambda x: int(x[0]))
submission["lastword_index"] = submission["predstringsplit"].apply(lambda x: int(x[-1]))
submission.sort_values(["id", "firstword_index"], inplace=True)
# Text covered by each prediction (word indices refer to text.split()).
submission["prediction_text"] = submission.apply(
    lambda x: " ".join(id_to_text[x["id"]].split()[x["firstword_index"]:x["lastword_index"] + 1]),
    axis=1,
)
# First word index of the next prediction in the same essay (NaN for an essay's last prediction).
# Shifting within each id keeps the last row of one essay from picking up the next essay's first index.
submission["nextfirstwordindex"] = submission.groupby("id")["firstword_index"].shift(-1)
submission.drop("predstringsplit", axis=1, inplace=True)


In [None]:
submission.head()
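
To make the derived features concrete, here is a minimal sketch on a made-up essay and prediction string. The helper name `span_features` and the sample text are assumptions for illustration only; the logic mirrors the cell above, where word indices refer to positions in `text.split()`.

```python
# Toy essay and prediction string (both made up for illustration).
essay = ("Phones are useful tools. However, some schools ban them. "
         "I disagree with that policy.")
predictionstring = "9 10 11 12 13"


def span_features(predictionstring, text):
    """Return (first index, last index, covered text) for one prediction."""
    indices = [int(tok) for tok in predictionstring.split()]
    first, last = indices[0], indices[-1]
    words = text.split()
    # The slice is inclusive of the last word index, hence the +1.
    return first, last, " ".join(words[first:last + 1])


first, last, covered = span_features(predictionstring, essay)
print(first, last, covered)
```

Here the prediction covers words 9 through 13, which is the final sentence of the toy essay.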

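Add the Nothing label to every gap, that is, wherever the next prediction starts more than one word after the current prediction ends: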

In [None]:
# Rows followed by a gap: the next prediction starts more than one word after this one ends.
gaps = submission.query("nextfirstwordindex > lastword_index + 1")
nothing_rows = []
for _, r in gaps.iterrows():
    first = int(r["lastword_index"]) + 1
    last = int(r["nextfirstwordindex"]) - 1
    words = id_to_text[r["id"]].split()
    nothing_rows.append({
        "id": r["id"],
        "class": "Nothing",
        "predictionstring": " ".join(str(i) for i in range(first, last + 1)),
        "firstword_index": first,
        "lastword_index": last,
        "prediction_text": " ".join(words[first:last + 1]),
        "nextfirstwordindex": last + 1,
    })
# Append the Nothing rows and restore reading order within each essay.
submission = pd.concat([submission, pd.DataFrame(nothing_rows)], ignore_index=True)
submission.sort_values(["id", "firstword_index"], inplace=True)

In [None]:
submission.head()
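
With the Nothing rows in place, a conditional rule of the kind mentioned at the start can be applied. The sketch below uses a made-up toy frame, and the rule itself (a Nothing span that opens with "in conclusion" becomes "Concluding Statement") is purely hypothetical; whether any such rule helps would need to be checked against validation data.

```python
import pandas as pd

# Toy frame mimicking the columns built above; all rows are made up.
toy = pd.DataFrame({
    "id": ["A", "A", "A"],
    "class": ["Claim", "Nothing", "Evidence"],
    "prediction_text": [
        "Phones help students learn.",
        "In conclusion, phones belong in class.",
        "A survey showed higher grades.",
    ],
})

# Hypothetical rule: relabel a Nothing span that opens with "in conclusion".
is_nothing = toy["class"] == "Nothing"
has_phrase = toy["prediction_text"].str.lower().str.startswith("in conclusion")
toy.loc[is_nothing & has_phrase, "class"] = "Concluding Statement"

print(toy[["class", "prediction_text"]])
```

Combining the two boolean masks keeps the rule from touching spans that already have a predicted class.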

After pre-processing and prior to submission, remove the Nothing label and the added features.

In [None]:
submission.query("`class` != 'Nothing'", inplace=True)
submission = submission[["id", "class", "predictionstring"]]
submission.head()
# submission.to_csv("submission.csv", index=False)