# Generative models and a few notes on NLP
> I hope to catch your attention and to cover a few of the essentials from the world of virtual neurons: what it means to train models.

- toc: false 
- badges: false
- comments: true
- categories: [nlp, introduction, machine-learning]

Some time ago I began to discover, with amazement, the world behind data analysis: how much energy you can pour into finding or proving an almost insignificant fact, which then lingers until a few experts notice it, praise it, and comment on it, and afterwards goes on lingering in the hope that it will be of use to someone.


Attending the statistics courses at university, I saw how some absurd numbers, or the answers to a few questions that seem to invade your life, can extract information from the mundane world into a "bar chart" or a "linear regression", and then go on to draw positive or negative conclusions. It fascinated me, it hooked me, and everything I knew about mathematics, together with my programming knowledge, lined up with the desire to synthesize and explain the world. It seemed very simple: you find an equation, feed it some data, compute precisely how close to the truth it is, and your hypothesis about what life looks like will change something, unless someone else has already proven the same thing before you.


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BloomTokenizerFast, BloomForCausalLM

# Summarization checkpoint (a Pegasus model, judging by its id)
seq_tokenizer = AutoTokenizer.from_pretrained("alk/pegasus-scitldr")
seq_model = AutoModelForSeq2SeqLM.from_pretrained("alk/pegasus-scitldr")

# Text generation: load the tokenizer and the weights from the same BLOOM checkpoint
gen_tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b1")
gen_model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b1")

In [None]:
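def paraphrase_text(text):
    # Condense `text` with the seq2seq model; inputs longer than 1024 tokens are truncated
    inputs = seq_tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")
    summary_ids = seq_model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"])
    return seq_tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]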


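def expand_text(text):
    # Continue `text` with the causal model and return three sampled continuations
    inputs = gen_tokenizer(text, return_tensors="pt")  # avoid shadowing the built-in `input`
    output = gen_model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=True,
        max_new_tokens=50,
        top_p=0.92,
        top_k=50,
        temperature=0.5,
        num_return_sequences=3,
    )
    return [gen_tokenizer.decode(suggestion, skip_special_tokens=True) for suggestion in output]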

def print_results(title, abstract):
    summary = paraphrase_text(abstract)
    print('-------- Summary------------')
    print('')
    print(summary)
    print('')
    print('')

    for i,possible_future in enumerate(expand_text(summary)):
        print(f'--------- Possible future {i+1} -----')
        print('')
        print(possible_future)
        print('')
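
The `expand_text` call above samples with `temperature`, `top_k` and `top_p` all set at once. The toy sketch below shows what those three settings do to a next-token distribution. It is a simplified illustration written in plain Python, not the code that `transformers` runs internally; the names `filtered_probs`, `sample_token` and `toy_logits` are hypothetical, and the order of the steps (temperature, then top-k, then top-p) is an assumption about how the filters are usually combined.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filtered_probs(logits, temperature=0.5, top_k=50, top_p=0.92):
    """Return per-token probabilities after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token indices
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]

    # Top-p (nucleus): keep the shortest prefix whose cumulative probability reaches top_p
    probs = softmax([scaled[i] for i in kept])
    nucleus, cumulative = [], 0.0
    for i, p in zip(kept, probs):
        nucleus.append(i)
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens; every other token gets probability 0
    final = softmax([scaled[i] for i in nucleus])
    result = [0.0] * len(logits)
    for i, p in zip(nucleus, final):
        result[i] = p
    return result


def sample_token(logits, rng=random, **kwargs):
    # Draw one token index from the filtered distribution
    probs = filtered_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0]  # made-up scores for a 5-token vocabulary
print(filtered_probs(toy_logits, temperature=0.5, top_k=3, top_p=0.92))
print(sample_token(toy_logits, temperature=0.5, top_k=3, top_p=0.92))
```

With a low temperature and both filters active, most of the probability mass ends up on a handful of tokens, which is one plausible reason the three "possible futures" often start out similarly.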

In [None]:
title = 'The importance of recruitment and selection process for sustainability of total quality management'
abstract = '''
Management literature discusses that the behavioral traits of employees can play an important role in the success of total quality management (TQM). However, little empirical research exists in this regard. Using an international dataset, the present study investigates: the impact of quality management practices on plant competitiveness; and the moderating effect of an employee selection process on the relationship between quality management practices and plant competitiveness. Results show that quality management practices positively impact plant competitiveness. Furthermore, the behavioral traits of employees seem to have a significant impact on the effectiveness of quality management practices. This implies that managers should pay close attention to prospective employees’ behavioral traits and their fit with the TQM philosophy. Managers should not limit their attention to potential employees’ technical skills.
'''

print_results(title, abstract)

In [None]:
import torch
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")


In [10]:
title = 'Discrimination in Recruitment: An Empirical Analysis'
abstract = '''
WITH passage of the Civil Rights Act
and the Equal Employment Opportunity Act, as well as the issuance of Executive Orders 11246 and 11375, considerable
progress has been made in mandating equal
opportunity for minorities and women. The
extent and nature of actual progress made,
however, is currently a subject of considerable debate. Many minority action
groups claim that the position of women
and minorities has not improved, while
critics of affirmative action policies claim
that organizations have been so zealous in
their efforts to hire blacks and females that
white males have become the victims of
reverse discrimination. The arguments offered by either side, however, are seldom
supported by empirical evidence'''


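# CTRL was trained with control codes as the first token, so the prompt must start with one.
# "Science" is one of the entries in tokenizer.control_codes; picking it here is an assumption.
prompt = f"Science {title}"
inputs = tokenizer(prompt, return_tensors="pt")
assert inputs["input_ids"][0, 0].item() in tokenizer.control_codes.values()

# Generate up to 49 tokens beyond the prompt
sequence_ids = model.generate(inputs["input_ids"], max_length=len(inputs["input_ids"][0]) + 49)
sequences = tokenizer.batch_decode(sequence_ids)
sequences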

