# 1. Generate Simplifications

In [1]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/bart-large-swipe-clean")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/bart-large-swipe-clean")

In [8]:
# Let's simplify this with the model
input_doc = "Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference. An introductory economics textbook describes econometrics as allowing economists to sift through mountains of data to extract simple relationships. Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today. A basic tool for econometrics is the multiple linear regression model. Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods.[10] Econometricians try to find estimators that have desirable statistical properties including unbiasedness, efficiency, and consistency. Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting."
input_ids = tokenizer(input_doc, return_tensors="pt").input_ids

output = model.generate(input_ids, max_length=200, temperature=1.5, num_beams=5, num_return_sequences=1, do_sample=True)
output_doc = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_doc)

# Example output from an earlier run (sampling is enabled, so each run differs): Econometrics is a branch of economics. It is based on the application of statistical methods to economic data in order to give empirical content to the relationships between economic relationships. It is a quantitative analysis of actual economic phenomena based on analysis of theory and observation. Jan Tinbergen was one of the first people to talk about it. The other person was Ragnar Frisch. He invented the term in the sense that it is used today.

Econometrics is a branch of economics that uses statistical methods. It uses empirical analysis of economic data to find economic relationships. It is based on the concurrent development of theory and observation. Jan Tinbergen is the one of the two founding fathers of the field. The other, Ragnar Frisch, also created the term in the sense used today.
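
The simplification printed here differs from the one recorded in the code comment because `do_sample=True` draws tokens at random, and `temperature=1.5` flattens the next-token distribution before each draw. The following is a minimal sketch of temperature scaling on made-up logits. It is not the `transformers` implementation, and `softmax_with_temperature` and `toy_logits` are hypothetical names used only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (illustrative only).
toy_logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature above 1 moves probability mass from the top candidate toward the others, so sampled outputs vary more between runs. Fixing the random seed before calling `generate` is the usual way to make a sampled run repeatable.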


# 2. Print the explicit edit sequence

In [9]:
import utils_diff

print(utils_diff.make_colored_text(input_doc, output_doc))

Econometrics is[1;31man application[0m[1;32ma branch[0m of[1;32meconomics that uses[0m statistical methods[1;31mto[0m[1;32m. It uses empirical analysis of[0m economic data[1;31min order[0m to[1;31mgive empirical content to[0m[1;32mfind[0m economic relationships.[1;31mMore precisely, it[0m[1;32mIt[0m is[1;31mthe quantitative analysis of actual economic phenomena[0m based on the concurrent development of theory and observation[1;31m, related by appropriate methods of inference. An introductory economics textbook describes econometrics as allowing economists to sift through mountains of data to extract simple relationships[0m. Jan Tinbergen is[1;32mthe[0m one of the two founding fathers of[1;31meconometrics[0m[1;32mthe field[0m. The other, Ragnar Frisch, also[1;31mcoined[0m[1;32mcreated[0m the term in the sense[1;31min which it is[0m used today.[1;31mA basic tool for econometrics is the multiple linear regression model. Econometric theory uses statisti

# 3. Identify Edit Groups and Categories with BIC

In [10]:
from model_bic import BIC

bic = BIC("Salesforce/bic_simple_edit_id")

In [11]:
from utils_vis import visualize_edit_groups

edit_groups = bic.predict_from_text_pair(input_doc, output_doc)
visualize_edit_groups(input_doc, output_doc, edit_groups)

BIC Identified 15 edit groups
[lexical_generic               ] Econometrics is {+a branch+}[-an application-] of [...]
[semantic_elaboration_example  ] [...] of {+economics that uses+} statistical methods [...]
[syntactic_sentence_splitting  ] [...] statistical methods{+.+}{+ It uses empirical analysis of+}[-to-] economic data [...]
[semantic_elaboration_example  ] [...] {+ It uses empirical analysis of+} [...]
[syntactic_deletion            ] [...] economic data [-in order-] to [...]
[lexical_generic               ] [...] to {+find+}[-give empirical content to-] economic relationships. [...]
[lexical_generic               ] [...] economic relationships. {+It+}[-More precisely, it-] is [...]
[semantic_deletion             ] [...] is [-the quantitative analysis of actual economic phenomena-] based on the concurrent development of theory and observation [...]
[semantic_deletion             ] [...] based 