<a href="https://colab.research.google.com/github/massivetexts/open-scoring/blob/master/notebooks/Term_Weighting_OCS_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Testing Term-Weighting in [Open Creativity Scoring](https://openscoring.du.edu)

Below, you can enter a phrase and see a visual rendering of each word's relative weight within it: words with higher term weights are drawn more opaquely, while common, low-weight words fade out.
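As a rough guide to what the weights mean: a common definition of inverse document frequency for a term $t$, in a corpus of $N$ documents of which $\mathrm{df}(t)$ contain $t$, is given below. This notebook doesn't state the exact formula behind the `IPF` column in `idf-vals.parquet`, so treat this as an assumed, generic form. The display step, in contrast, is taken directly from the code: each token's weight $w_i$ (its `IPF` value) is divided by the largest weight in the phrase to give its opacity.

$$
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad \text{opacity}_i = \frac{w_i}{\max_j w_j}
$$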


In [None]:
#@title Setup
#@markdown <-- Press the run button to run this cell.
import pandas as pd
from IPython.display import display, HTML
import spacy
nlp = spacy.load("en_core_web_sm")

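# Download term weights (-nc skips the download if the file already exists,
# so re-running the cell doesn't create idf-vals.parquet.1, .2, ...)
!wget -nc https://github.com/massivetexts/open-scoring/raw/master/data/idf-vals.parquet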

idf = pd.read_parquet('idf-vals.parquet')
idf.head(1000).sample() # random word from top 1k
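The weights above are downloaded precomputed. For intuition, here is a minimal sketch of how an inverse-frequency weight is commonly computed from a tokenized corpus, using the generic `log(N / df)` form. That formula is an assumption, not necessarily how the `IPF` column was produced, and the three-document corpus is made up purely for illustration.

```python
import math
from collections import Counter


def inverse_frequency(documents):
    """Return {term: log(N / df(term))} for a list of tokenized documents."""
    n_docs = len(documents)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
weights = inverse_frequency(corpus)

# Highest weight first; ties broken alphabetically so the order is stable.
for term, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term:>4}  {w:.3f}")
# 'the' occurs in every document, so its weight is log(3/3) = 0.0
```

Words that appear in fewer documents get larger weights, which is why rare words stand out in the visualization while words like "the" nearly vanish.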

In [30]:
phrase = "This is a test of the Open Scoring System term weighting. It shows the relative value of words in the score." #@param {type:'string'}
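import html

# Tokenize only; no other pipeline components are needed for the lookup.
doc = nlp.make_doc(phrase)
df = pd.DataFrame(
    [(word.text, word.lower_) for word in doc], columns=['original', 'token']
).merge(idf, on='token', how='left')
# For words not in the IDF table, fall back to the IPF of the 10,000th row
# (assumes the table is sorted from most to least frequent word).
df['IPF'] = df['IPF'].fillna(idf.iloc[10000]['IPF'])
# Scale by the phrase's largest weight so the highest-weighted word is fully opaque.
df['opacity'] = df['IPF'] / df['IPF'].max()

def get_span(word, opacity):
    """Wrap a word in a <span> whose opacity reflects its term weight."""
    return f'<span style="opacity:{opacity:.2f}">{html.escape(word)}</span>'

spans = df.apply(lambda x: get_span(x['original'], x['opacity']), axis=1)
final_html = " ".join(spans.tolist())
display(HTML(f"<h2>{final_html}</h2>"))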
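The cell above relies on spaCy and pandas. The same weighting logic (lookup with a fallback, normalization by the phrase's maximum weight, one `<span>` per token) can be sketched in plain Python. `render_weighted_phrase` and the toy weights are hypothetical, for illustration only; they are not part of Open Creativity Scoring.

```python
import html


def render_weighted_phrase(tokens, weights, fallback):
    """Return an HTML string with each token's opacity scaled by its weight.

    tokens:   list of token strings, in phrase order
    weights:  dict mapping lowercased token -> weight (e.g. an IPF value)
    fallback: weight used for tokens missing from `weights`
    """
    if not tokens:
        return ""
    scores = [weights.get(tok.lower(), fallback) for tok in tokens]
    top = max(scores)
    spans = []
    for tok, score in zip(tokens, scores):
        # Divide by the largest weight so the top-weighted token is fully opaque.
        opacity = score / top if top > 0 else 1.0
        # Escape the token so characters like '<' can't break the HTML.
        spans.append(f'<span style="opacity:{opacity:.2f}">{html.escape(tok)}</span>')
    return " ".join(spans)


# Hypothetical weights: "purred" is missing, so it gets the fallback weight.
toy_weights = {"the": 1.0, "cat": 4.0}
print(render_weighted_phrase(["The", "cat", "purred"], toy_weights, fallback=2.0))
# → <span style="opacity:0.25">The</span> <span style="opacity:1.00">cat</span> <span style="opacity:0.50">purred</span>
```

In the notebook, the returned string could be wrapped in `HTML(...)` and passed to `display`, as the cell above does with `final_html`.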