# spaCy helper functions
Helper functions for spaCy such as tokenizer, lemmatizer, entity extraction, etc. Note that spaCy and scispaCy are separate packages. scispaCy seems to be faster but doesn't seem to provide the subcategories of entities that spaCy does, so we may use a combination of the two. Also note that scispaCy is trained with an older version of spaCy (3.6.1).

### Usage
1. Install the import-ipynb module (!pip install import-ipynb)
2. In the notebook that will use these helper functions, add the following code:
   - import import_ipynb
   - import spacy_helper_methods as sphelpers

You can then access the methods defined here. For example:
sphelpers.lemmatize(....)

In [1]:
# These packages are needed for this notebook. If they are not installed, uncomment the lines below and run them.
#!pip install spacy
#!pip install pandas

In [2]:
import spacy as sp
import pandas as pd

In [3]:
# Lemmatize. Takes a loaded spaCy pipeline and a pandas Series of strings as input
# and returns a pandas Series of lemmatized strings with stop words removed
def lemmatize(nlpname, ds):
    stop_words = nlpname.Defaults.stop_words
    lemma_ds = ds.apply(lambda x: " ".join(token.lemma_ for token in nlpname(x)
                                           if token.lemma_.lower() not in stop_words))
    return lemma_ds

In [4]:
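# Named entity recognition. Takes a loaded spaCy pipeline and a pandas Series of strings
# and returns a pandas Series of lists of (entity text, entity label) tuples
def get_entities(nlpname, ds):
    ent_ds = ds.apply(lambda x: [(ent.text, ent.label_) for ent in nlpname(x).ents])
    return ent_ds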
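
A minimal sketch of calling get_entities without downloading a trained model. It assumes spaCy v3 and pandas are installed, and it uses a blank English pipeline with a rule-based entity_ruler as a stand-in for a real NER model; the patterns and the sample sentences are made up for illustration.

```python
import pandas as pd
import spacy


def get_entities(nlpname, ds):
    # Same helper as in the cell above: one list of (text, label) tuples per row.
    return ds.apply(lambda x: [(ent.text, ent.label_) for ent in nlpname(x).ents])


# Blank pipeline plus rule-based entities (illustrative stand-in for a trained model).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "London"},
])

ds = pd.Series(["Apple opened an office in London.", "Nothing to tag here."])
ent_ds = get_entities(nlp, ds)
print(ent_ds.tolist())
```

With a trained pipeline such as one returned by spacy.load, the call is the same; only the nlp object changes.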

In [5]:
# Add additional helper functions here

In [6]:
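def filter_non_tech(x, model):
    # Keep the unique entries of x for which model.predict returns a truthy value.
    # An empty or missing list returns [] without calling the model.
    if not x:
        return []
    y_pred = model.predict(x)
    return list(set(ent for ent, keep in zip(x, y_pred) if keep))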

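def extract_tech_entities(nlp, model, ds):
    # Lemmatize, run NER on the lemmatized text, keep only the entity text,
    # then filter each row's entities with the model
    lemma_ds = lemmatize(nlp, ds)
    ent_ds = get_entities(nlp, lemma_ds)
    ent_ds = ent_ds.apply(lambda x: [ent[0] for ent in x])
    ent_ds = ent_ds.apply(lambda x: filter_non_tech(x, model))
    return ent_ds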
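
The notebook does not say what kind of object model is, only that it has a predict method returning one flag per input string. The sketch below illustrates that contract with a hypothetical KeywordModel class (not part of this notebook or of any library) and a version of filter_non_tech that returns an empty list for empty input.

```python
class KeywordModel:
    """Hypothetical stand-in for the real classifier; only predict() is assumed."""

    def __init__(self, keywords):
        self.keywords = {k.lower() for k in keywords}

    def predict(self, items):
        # One boolean per input string: True when the string is a known keyword.
        return [item.lower() in self.keywords for item in items]


def filter_non_tech(x, model):
    # Keep the unique entries the model flags; empty input returns [].
    if not x:
        return []
    y_pred = model.predict(x)
    return list(set(ent for ent, keep in zip(x, y_pred) if keep))


model = KeywordModel(["python", "docker"])
kept = filter_non_tech(["Python", "London", "Docker", "Python"], model)
print(sorted(kept))                # ['Docker', 'Python']
print(filter_non_tech([], model))  # []
```

Because the result goes through a set, duplicates are dropped and the order of the returned list is not guaranteed; sort it if a stable order matters.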
