## Large Language Models

In [1]:
!pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import requests
from bs4 import BeautifulSoup
import re

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


#### Data collection: let's find some hit songs, in English for now

In [2]:
### 1. Download lyrics (web scraping) - zeneszoveg.hu
url = "https://m.zeneszoveg.hu/m_dalszoveg/119461/azahriah/yesterday-zeneszoveg.html"

def get_lyrics(url):
  # Fetch the page and parse its HTML
  response = requests.get(url, timeout=10)
  soup = BeautifulSoup(response.text, 'html.parser')
  div_with_id_content = soup.find("div", {"id": "tartalom_slide_content"})

  # Fall back to the plain-text container when the first one is missing
  if div_with_id_content is None:
    next_text = soup.find("div", {"class": "lyrics-plain-text"}).get_text()
  else:
    next_text = div_with_id_content.find("p").get_text()
  # Flatten line breaks so the lyrics form a single string
  cleaned_text = next_text.replace('\r', ' ').replace('\n', ' ')
  return cleaned_text

In [3]:
lyrics_hun =  get_lyrics(url)
lyrics_hun

"[Verse 1]  Sunday, I finally say goodbye  Monday, I reconnect my life  Tuesday, feels like I have some time  To roll up my problems and get high  Oh, and I finally get so high  You wait for me and my disguise  'Cause you find comfort in their lie  So, are you happy?  Tango-heika, banzai  I'll be back in the biz with a prose  I keep looking like I'm on both sides  So my sins and the power, they come alive  Can't hold 'em back   [Chorus]  Take my soul just like you did yesterday  Take my soul just like you did yesterday  Take my soul just like you did yesterday  Takе my soul just like you did yesterday   [Post-Chorus]  (You did yеsterday)  (You did yesterday)   [Verse 2]  So baby, please just hear me out  I gotta choose which way I turn  The only thing that's bringing me down  Is the voice that will never be heard  I shoulda let it go, I shoulda kept it all  I got brand new deals, buy my mom a house  I got brand new demons, brand new clothes  And if you don't show me, then I cannot show

In [4]:
### 2. Download lyrics - English-language site
url_en = "http://www.absolutelyrics.com/lyrics/view/lil_nas_x/old_town_road"

def get_lyrics_en(url):
  # Fetch the page and parse its HTML
  response = requests.get(url, timeout=10)
  soup = BeautifulSoup(response.text, 'html.parser')

  # On this site the lyrics sit in a <p> element with id "view_lyrics"
  lyrics_paragraph = soup.find("p", {"id": "view_lyrics"})
  next_text = lyrics_paragraph.get_text()
  # Flatten line breaks and tabs so the lyrics form a single string
  cleaned_text_en = next_text.replace('\r', ' ').replace('\n', ' ').replace('\t', ' ')
  return cleaned_text_en

In [5]:
lyrics_en =  get_lyrics_en(url_en)
lyrics_en

"  [Intro] Yeah, I'm gonna take my horse to the old town road I'm gonna ride 'til I can't no more I'm gonna take my horse to the old town road I'm gonna ride 'til I can't no more Kio, Kio  [Verse 1] I got the horses in the back Horse tack is attached Hat is matte black Got the boots that's black to match Ridin' on a horse, ha You can whip your Porsche I been in the valley You ain't been up off that porch, now  [Chorus] Can't nobody tell me nothin' You can't tell me nothin' Can't nobody tell me nothin' You can't tell me nothin'  [Verse 2] Ridin' on a tractor Lean all in my bladder Cheated on my baby You can go and ask her My life is a movie Bull ridin' and boobies Cowboy hat from Gucci Wrangler on my booty  [Chorus] Can't nobody tell me nothin' You can't tell me nothin' Can't nobody tell me nothin' You can't tell me nothin'  [Outro] Yeah, I'm gonna take my horse to the old town road I'm gonna ride 'til I can't no more I'm gonna take my horse to the old town road I'm gonna ride 'til I ca

#### Sentiment analysis with the VADER model
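
The scraped lyrics shown earlier still contain section markers such as `[Verse 1]` and runs of repeated spaces. Whether removing them changes the score is not tested in this notebook; the following is only a minimal sketch of an optional cleaning step. The helper name `clean_lyrics` and the sample string are assumptions introduced for illustration.

```python
import re

def clean_lyrics(text):
    """Remove bracketed section markers and collapse repeated whitespace."""
    # Drop markers such as [Intro], [Verse 1], [Chorus]
    without_tags = re.sub(r"\[[^\]]*\]", " ", text)
    # Collapse runs of whitespace into single spaces and trim the ends
    return re.sub(r"\s+", " ", without_tags).strip()

# Short sample in the same shape as the scraped text
sample = "  [Intro] Yeah, I'm gonna take my horse  [Verse 1] I got the horses in the back "
print(clean_lyrics(sample))
```

The cleaned string could then be passed to `polarity_scores` in place of the raw lyrics.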

In [6]:
sentiment = SentimentIntensityAnalyzer()
sentiment.polarity_scores(lyrics_hun)['compound']

0.9673

In [7]:

sentiment.polarity_scores(lyrics_en)['compound']

0.7936

## [What do we see?](https://vadersentiment.readthedocs.io/en/latest/pages/about_the_scoring.html) Do we agree with the model?
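
The linked VADER documentation describes `compound` as a normalized score between -1 and +1 and suggests cut-offs of ±0.05 for labelling a text as positive, negative or neutral. A minimal sketch of that labelling follows; the function name `label_compound` is an assumption introduced here.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores reported above for the two songs
for name, score in [("lyrics_hun", 0.9673), ("lyrics_en", 0.7936)]:
    print(name, score, label_compound(score))
```

Both songs come out as strongly positive under these cut-offs, which is worth comparing with your own reading of the lyrics.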

## How does it work?

#### Rule-based lexical sentiment analysis:

1. Build a lexicon

In [8]:
lexikon = dict({'élet': 0.33,
                'vakmerő': 0.12,
                'kaland': 0.24})

2. Find the words that appear in the lexicon

In [9]:
idézet = "Az élet vagy vakmerő kaland vagy semmi."  # "Life is either a daring adventure or nothing."

In [10]:
lexikon['élet']

0.33

In [11]:
szavak = idézet.split()
szavak

['Az', 'élet', 'vagy', 'vakmerő', 'kaland', 'vagy', 'semmi.']

In [12]:
[lexikon[word] for word in szavak if word in lexikon]

[0.33, 0.12, 0.24]

3. Take the average of the word scores.

In [13]:
scores = [lexikon[word] for word in szavak if word in lexikon]
sum(scores) / len(scores)

0.22999999999999998
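
The three steps above can be wrapped into one function. This is a minimal sketch: the name `lexicon_score` is an assumption, and returning `None` when no word matches is a choice made here to avoid dividing by zero.

```python
def lexicon_score(text, lexicon):
    """Average the lexicon values of the words in `text`; None if nothing matches."""
    words = text.split()
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return None  # no word of the text is in the lexicon
    return sum(scores) / len(scores)

# Same toy lexicon and quote as in the cells above
lexikon = {'élet': 0.33, 'vakmerő': 0.12, 'kaland': 0.24}
print(lexicon_score("Az élet vagy vakmerő kaland vagy semmi.", lexikon))
print(lexicon_score("semmi", lexikon))
```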

In [14]:
sentiment.polarity_scores("Life Is Either a Daring Adventure or Nothing")['compound']

0.5859
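
VADER's `compound` value is not a plain average like the one computed by hand above. In the library's published source, the rule-adjusted valences of the matched words are summed, and the sum is squashed into the range -1 to 1 with x / sqrt(x² + α), where α = 15. The sketch below reproduces only that normalization step, not VADER's heuristics for negation, punctuation or capitalization.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

# Larger sums approach 1 but never exceed it
for total in (0.5, 2.0, 10.0):
    print(total, round(normalize(total), 4))
```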

### Could there be any problem with this approach?

In [15]:
lexicon = sentiment.lexicon

# Number of entries in the VADER lexicon
len(lexicon)


7506

#### What happens if every word is missing from the lexicon?

In [16]:
long_sentence = "The ethereal luminescence of the moon cast a spell of dolefulness over the desolate landscape, enveloping me in a cloak of solitude and yearning."

[word for word in  long_sentence.split() if word in lexicon]

[]

#### Is the result surprising?

In [17]:
sentiment.polarity_scores(long_sentence)['compound']

0.128
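
A nonzero score is possible even though the membership check returned an empty list. That check compares raw whitespace-separated tokens, with punctuation still attached and capitalization unchanged, whereas VADER strips punctuation and lowercases each token before looking it up. Below is a minimal sketch of a closer check. The helper `matched_words`, the lexicon `toy_lexicon` and its value are hypothetical and not taken from VADER.

```python
import string

def matched_words(sentence, lexicon):
    """Return the tokens found in `lexicon` after stripping punctuation and lowercasing."""
    matches = []
    for token in sentence.split():
        word = token.strip(string.punctuation).lower()
        if word in lexicon:
            matches.append(word)
    return matches

# Hypothetical toy lexicon; the value is illustrative, not a VADER score
toy_lexicon = {"yearning": 0.5}
sentence = "A cloak of solitude and yearning."

print([w for w in sentence.split() if w in toy_lexicon])  # naive check
print(matched_words(sentence, toy_lexicon))               # normalized check
```

Calling `matched_words(long_sentence, lexicon)` with the VADER lexicon loaded above would show which tokens, if any, the naive check missed.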



![title](https://raw.githubusercontent.com/BognarAndras/girls_day_ds/main/kep.jpeg)

In [None]:
#from google.colab import drive
#drive.mount('/content/drive')

#### The solution: a larger model that understands context better.

In [18]:
!pip install langchain
!pip install ctransformers
!pip install ctransformers[cuda]

Collecting langchain
  Downloading langchain-0.1.16-py3-none-any.whl (817 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.32 (from langchain)
  Downloading langchain_community-0.0.34-py3-none-any.whl (1.9 MB)
Collecting langchain-core<0.2.0,>=0.1.42 (from langchain)
  Down

In [19]:
import os
from langchain.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

In [20]:
model_id = 'TheBloke/Orca-2-13B-GGUF'


In [21]:
os.environ['XDG_CACHE_HOME'] = 'content/cache/'
config = {'temperature':0.00,'max_new_tokens': 512, 'context_length':4000,'gpu_layers':50,'repetition_penalty':1 }
llm = CTransformers(model=model_id,
                    model_type="llama",
                    gpu_layers=50,
                    device = 0,
                    config=config,
                    callbacks=[StreamingStdOutCallbackHandler()],
                   )



Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

config.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

orca-2-13b.Q2_K.gguf:   0%|          | 0.00/5.43G [00:00<?, ?B/s]

#### Orca 2 is a "small" language model: it comes in versions with about 7 and 13 billion parameters (the 13B version is loaded here), whereas GPT-3 has 175 billion.

#### Orca 2 was "taught" to give logical answers, based on the answers of larger models.

#### So this model no longer just performs predefined functions; it can also "interpret" questions.

In [22]:
print(llm(f"How positive is the sentiment of this sentence on a scale from -1 to 1: {long_sentence}"))






Step 1: Analyze key points from the sentence
- The ethereal luminescence of the moon
- The spell of dolefulness
- The desolate landscape
- Enveloping in a cloak of solitude and yearning

Step 2: Show how you are comparing them
- The ethereal luminescence of the moon is a positive phrase, as it describes a beautiful sight.
- The spell of dolefulness is a negative phrase, as it implies a feeling of sadness or melancholy.
- The desolate landscape is a negative phrase, as it describes a lonely and empty place.
- Enveloping in a cloak of solitude and yearning is a negative phrase, as it implies a strong sense of loneliness and longing.

Step 3: Determine the overall sentiment
- The sentence has a mix of positive and negative phrases, but the overall sentiment leans more towards the negative side due to the strong emotions of sadness, loneliness, and yearning.

### Final answer: -0.5
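
Unlike VADER, the model returns free text, so the number has to be extracted before it can be compared with other scores. Below is a minimal sketch, assuming the answer contains a `Final answer:` line as in the output above. The helper name `extract_final_score` is an assumption, and other prompts or models may format their answers differently.

```python
import re

def extract_final_score(answer):
    """Return the number after 'Final answer:' as a float, or None if absent."""
    match = re.search(r"Final answer:\s*([-+]?\d*\.?\d+)", answer)
    if match is None:
        return None
    score = float(match.group(1))
    # Clamp to the -1..1 scale requested in the prompt
    return max(-1.0, min(1.0, score))

sample_answer = "Step 3: Determine the overall sentiment\n### Final answer: -0.5\n"
print(extract_final_score(sample_answer))
print(extract_final_score("no score here"))
```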


#### Let's summarize text.

In [25]:
print(llm(f"What is this song about in 3 words: {lyrics_en}"))




The song is about a person's love for riding horses and their refusal to listen to anyone who tries to tell them what to do or how to live their life. It's a celebration of their independence and their unique lifestyle.



#### In Hungarian

In [23]:
url =  "https://m.zeneszoveg.hu/m_dalszoveg/4867/hobo-blues-band/az-aldozatok-ariaja-zeneszoveg.html"
lyrics_hun2 =  get_lyrics(url)
lyrics_hun2

'Megadták az irányt,  Követik a nyomot,  Indulnak csaholva  Utánad a dogok.   Nem érdekli õket,  E tájra miért jöttél,  Elárul a szagod,  Bárhonnan fúj a szél.   Más vagy, mint õk. Érzik.  Hogy miért? Nem tudják,  Nem is kell, hogy értsék,  Hiszen ezért kutyák.   Mindenütt kopók és vérebek,  Szagot kapott a falka,  Mentsd az életed!   Kutyakórus üvölt,  Csattognak a fogak,  Befogad az erdõ,  Te vagy az áldozat.   Barlangba menekülsz,  Megbújsz bozót alatt,  Élve vagy holtan,  Megkaparintanak.   Rád uszítják õket,  Torkodat harapják,  De ha kuss-t hallanak,  Életed meghagyják.   Mindenütt kopók és vérebek,  Szagot kapott a falka,  Mentsd az életed!   Nyomon a falka, nyomon a falka...'

In [41]:
# Hungarian prompt: "What is this song about"; run the model once and reuse the result
result = llm(f"Miről szól ez a dal: {lyrics_hun2}")
print(result)


”


”


#### Let's write a new song.

In [45]:
happy_lyrics = llm(f"Write lyrics for a song on this topic but with a happier outcome: {result}")
happy_lyrics



A possible English translation of the song is:

I'm being chased by a pack of dogs
They're following my scent and won't stop
I try to hide in a cave and a bush
But they can still smell me and won't give up
I must find a way to escape them
Or I'll end up as their dinner
Be careful and avoid being chased by dogs

A happier outcome for the song could be:

I'm being chased by a pack of dogs
But they're friendly and just want to play
I try to hide in a cave and a bush
But they find me and start wagging their tails
I join them in their game of fetch
And we have a great time in the park
The song is a reminder to you
Be happy and enjoy the company of dogs



In [46]:
sentiment.polarity_scores(happy_lyrics)['compound']


0.9824