# [Transformers](https://github.com/kokchun/Deep-learning-AI21/blob/main/Lectures/Lec8-Transformers.ipynb)

In [3]:
import spacy
from spacy import displacy

# python3 -m spacy download en_core_web_md

# this is not a transformer-based model
nlp_en_md = spacy.load("en_core_web_md")

# text from here 
# https://en.wikipedia.org/wiki/Explainable_artificial_intelligence
text_sample = """As regulators, official bodies, and general users come to depend on AI-based dynamic systems, clearer accountability will be required for automated decision-making processes to ensure trust and transparency. Evidence of this requirement gaining more momentum can be seen with the launch of the first global conference exclusively dedicated to this emerging discipline, the International Joint Conference on Artificial Intelligence: Workshop on Explainable Artificial Intelligence (XAI).[63]

The European Union introduced a right to explanation in the General Data Protection Right (GDPR) as an attempt to deal with the potential problems stemming from the rising importance of algorithms. The implementation of the regulation began in 2018. However, the right to explanation in GDPR covers only the local aspect of interpretability. In the United States, insurance companies are required to be able to explain their rate and coverage decisions.[64]
"""

doc = nlp_en_md(text_sample)
print(type(doc))

displacy.render(doc, style="ent")

<class 'spacy.tokens.doc.Doc'>


In [4]:
nlp_en_trf = spacy.load("en_core_web_trf")
doc = nlp_en_trf(text_sample)
displacy.render(doc, style="ent")



In [5]:
entities = {entity.text: entity.label_ for entity in doc.ents}
entities

{'first': 'ORDINAL',
 'the International Joint Conference on Artificial Intelligence': 'EVENT',
 'The European Union': 'ORG',
 'the General Data Protection Right': 'LAW',
 '2018': 'DATE',
 'GDPR': 'LAW',
 'the United States': 'GPE'}
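
The flat dictionary above can also be grouped by label, which makes it easier to see which entity types the model found. The following is a minimal sketch, assuming the same `{text: label}` mapping as printed above; `group_by_label` is a hypothetical helper for illustration and not part of spaCy.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group a {text: label} mapping into {label: [text, ...]}."""
    grouped = defaultdict(list)
    for text, label in entities.items():
        grouped[label].append(text)
    return dict(grouped)


# Same mapping as the cell output above, copied here so the sketch runs on its own
entities = {
    "first": "ORDINAL",
    "the International Joint Conference on Artificial Intelligence": "EVENT",
    "The European Union": "ORG",
    "the General Data Protection Right": "LAW",
    "2018": "DATE",
    "GDPR": "LAW",
    "the United States": "GPE",
}

grouped = group_by_label(entities)
print(grouped["LAW"])  # ['the General Data Protection Right', 'GDPR']
```

Because the original dictionary is keyed by entity text, an entity that occurs several times in the document appears only once in either representation.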

In [6]:
nlp_swe = spacy.load("sv_core_news_sm")

# text from here
# https://www.svt.se/nyheter/utrikes/klimatkrisen-gar-att-losa-har-ar-sex-tekniker-som-visar-pa-vagen-framat
text_sample_swe = """
Grannlandet Norge har kommit långt med att elektrifiera sin bilflotta. Om ett år kommer nybilsförsäljningen i Norge vara uppe i 100 procent bilar med sladd. Min kollega , techkorrespondenten Alexander Norén berättar att det som förbluffade honom när han åkte till Norge för att få förklaringen till elbilsboomen där var hur starka de ekonomiska incitamenten är, att det för många är en plånboksfråga att dumpa fossilbilen. 
"""

doc_swe = nlp_swe(text_sample_swe)
displacy.render(doc_swe, style="ent")

In [9]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "marma/bert-base-swedish-cased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# reuse the objects loaded above instead of loading the model a second time by name
sentiment = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [10]:
sentiment("cool pryl där")

[{'label': 'POSITIVE', 'score': 0.9981237053871155}]

In [11]:
sentences = [
    "Jag älskar dig så mycket",
    "Skit, vad jag gillar dig",
    "Skitbra eller skitdåligt?",
    "Boken är OK",
    "AI är väl okej coolt, I guess",
    "Den här boken är sådär",
    "svår"
]

for sentence in sentences:
    result = sentiment(sentence)[0]  # classify once and reuse the result
    label, score = result["label"], result["score"]
    print(f"{sentence}: {label}, {score:.3f}")

Jag älskar dig så mycket: POSITIVE, 0.999
Skit, vad jag gillar dig: POSITIVE, 0.999
Skitbra eller skitdåligt?: NEGATIVE, 0.997
Boken är OK: POSITIVE, 0.993
AI är väl okej coolt, I guess: NEGATIVE, 0.972
Den här boken är sådär: NEGATIVE, 0.995
svår: NEGATIVE, 0.995
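
The loop above calls the classifier once per sentence. Hugging Face pipelines also accept a list of strings and return one result dictionary per input, so the same table can be built from a single call. The sketch below assumes that behaviour; `label_sentences` is a hypothetical helper, and `fake_sentiment` is a stand-in with made-up scores so that the example runs without downloading a model.

```python
def label_sentences(classifier, sentences):
    """Classify all sentences in one call and pair each with its result."""
    results = classifier(sentences)  # one {"label", "score"} dict per sentence
    return [(s, r["label"], r["score"]) for s, r in zip(sentences, results)]


def fake_sentiment(texts):
    """Hypothetical stand-in for the real pipeline; labels and scores are made up."""
    return [
        {"label": "POSITIVE" if "älskar" in t else "NEGATIVE", "score": 0.5}
        for t in texts
    ]


rows = label_sentences(fake_sentiment, ["Jag älskar dig så mycket", "svår"])
for sentence, label, score in rows:
    print(f"{sentence}: {label}, {score:.3f}")
```

Passing the real `sentiment` pipeline instead of `fake_sentiment` should give the same labels and scores as the loop above.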


In [13]:
from transformers import pipeline

gpt2 = pipeline('text-generation', model='gpt2')

gpt2("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm a language model, and this topic is an example of me learning language models. In your presentation, I'll describe what would be"},
 {'generated_text': "Hello, I'm a language model, so what do I need to be good at? To be good at speaking, in order to be good with"},
 {'generated_text': "Hello, I'm a language model, not a language. No, I'm neither, nor have I ever been. This is my way of expressing"},
 {'generated_text': 'Hello, I\'m a language model, not a tool of some kind. "I have a programmatic understanding of semantics, of programming language constructs or'},
 {'generated_text': "Hello, I'm a language model, and I'm not using all of my writing in Python. The only programming language I have access to is C"}]
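
Each `generated_text` above repeats the prompt before the model's continuation. A minimal sketch of separating the two follows; `split_continuation` is a hypothetical helper, and the sample string is copied from the third result above.

```python
def split_continuation(prompt, generated_text):
    """Return only the text the model appended after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text  # prompt was not echoed, so return the text unchanged


prompt = "Hello, I'm a language model,"
# Third result from the output above, copied verbatim
generated = (
    "Hello, I'm a language model, not a language. No, I'm neither, "
    "nor have I ever been. This is my way of expressing"
)

continuation = split_continuation(prompt, generated)
print(continuation)
```

Recent versions of the text-generation pipeline also appear to expose a `return_full_text` argument for the same purpose; check the documentation of the installed version before relying on it.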

In [22]:
# max_length counts tokens (prompt included); GPT-2's context window is 1024 tokens
print(gpt2('Welcome to IT-Högskolan, we are a school specialized in IT. Our school has around 500 students, we are in Göteborg and Stockholm', max_length=50)[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Welcome to IT-Högskolan, we are a school specialized in IT. Our school has around 500 students, we are in Göteborg and Stockholm, and our motto is: "We need to create an organization that wants to connect


In [23]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("birgermoell/swedish-gpt")

model = AutoModelForCausalLM.from_pretrained("birgermoell/swedish-gpt")



In [24]:
# reuse the model and tokenizer loaded in the previous cell
gpt_swe = pipeline('text-generation', model=model, tokenizer=tokenizer)
gpt_swe('Jag har ätit pannkaka')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Jag har ätit pannkaka varje dag sedan 2010-2012, men bara för att jag gillar pannkaka. Och somnar lätt. Men det här är mitt svar på era frågor. För om ni frågar om någon vill ha recept på pannkakor blir svaret'}]