In [2]:
! pip install flair

Defaulting to user installation because normal site-packages is not writeable
Collecting flair
  Using cached flair-0.13.1-py3-none-any.whl.metadata (12 kB)
Collecting boto3>=1.20.27 (from flair)
  Downloading boto3-1.34.57-py3-none-any.whl.metadata (6.6 kB)
Collecting bpemb>=0.3.2 (from flair)
  Using cached bpemb-0.3.4-py3-none-any.whl.metadata (19 kB)
Collecting conllu>=4.0 (from flair)
  Using cached conllu-4.5.3-py2.py3-none-any.whl.metadata (19 kB)
Collecting deprecated>=1.2.13 (from flair)
  Using cached Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting ftfy>=6.1.0 (from flair)
  Using cached ftfy-6.1.3-py3-none-any.whl.metadata (6.2 kB)
Collecting gdown>=4.4.0 (from flair)
  Using cached gdown-5.1.0-py3-none-any.whl.metadata (5.7 kB)
Collecting gensim>=4.2.0 (from flair)
  Downloading gensim-4.3.2-cp39-cp39-macosx_11_0_arm64.whl.metadata (8.5 kB)
Collecting huggingface-hub>=0.10.0 (from flair)
  Downloading huggingface_hub-0.21.4-py3-none-any.whl.metadata (13 

Flair is built on PyTorch, so PyTorch must also be installed. `pip install flair` normally pulls it in as a dependency.
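
A quick way to confirm the backend is present before loading a model is to check whether the modules can be found. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of Flair.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages this notebook relies on
for name in ("torch", "flair"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```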

In [3]:
import flair
# English-language model for sentiment analysis
model = flair.models.TextClassifier.load('en-sentiment')

  from .autonotebook import tqdm as notebook_tqdm


Our next step is to tokenize input text. For this we use the Flair Sentence object, which we initialize by passing our text into it:

In [4]:
text = "I like you. I love you"  # we are expecting a confidently positive sentiment here

sentence = flair.data.Sentence(text)

sentence

Sentence[7]: "I like you. I love you"
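
The `[7]` in the output is the number of tokens in the sentence. To see where that count comes from, the sketch below approximates the split with a simple regular expression (runs of word characters, plus each punctuation mark on its own). This is only an illustration: it is not Flair's tokenizer, and the two can disagree on emoji, hashtags, or contractions.

```python
import re


def rough_tokens(text: str) -> list:
    """Approximate tokenization: words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = rough_tokens("I like you. I love you")
print(len(tokens), tokens)
# 7 ['I', 'like', 'you', '.', 'I', 'love', 'you']
```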

In [5]:
# predict() does not return the prediction; it attaches the label to the sentence in place
model.predict(sentence)

sentence

Sentence[7]: "I like you. I love you" → POSITIVE (0.9933)

In [6]:
sentence.get_labels()

['Sentence[7]: "I like you. I love you"'/'POSITIVE' (0.9933)]

In [7]:
sentence.get_labels()[0].value

'POSITIVE'

Now let's try NFT-related posts extracted by the crawler.


In [8]:
post = '📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛️ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art'
sentence = flair.data.Sentence(post)

sentence

Sentence[52]: "📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art"

In [9]:
model.predict(sentence)
sentence.get_labels()

['Sentence[52]: "📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art"'/'POSITIVE' (0.8608)]

Because this post questions the NFT hype, its content reads as negative, or at most neutral, rather than positive. Used as is, the model is probably not reliable for these posts, so we should either build our own model or adapt this one.
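
One inexpensive adjustment is to treat low-confidence predictions as neutral. The sketch below assumes the model only emits POSITIVE or NEGATIVE (an assumption based on the outputs in this notebook). The helper `to_three_way` and the 0.9 threshold are hypothetical choices for illustration, not tuned values.

```python
def to_three_way(label: str, score: float, threshold: float = 0.9) -> str:
    """Map a binary label plus its confidence to POSITIVE / NEGATIVE / NEUTRAL."""
    # Below the (arbitrary) threshold, we do not trust either polarity
    if score < threshold:
        return "NEUTRAL"
    return label


print(to_three_way("POSITIVE", 0.8608))  # NEUTRAL
print(to_three_way("POSITIVE", 0.9933))  # POSITIVE
```

With this rule the NFT post above (0.8608) would be reported as neutral, while the "I like you. I love you" example (0.9933) stays positive. A threshold cannot help when the model is confidently wrong.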

Next, I will test whether removing the hashtags improves the prediction.

In [10]:
post = '📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛️ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art'
sentence = flair.data.Sentence(post)
model.predict(sentence)
sentence.get_labels()

['Sentence[52]: "📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art"'/'POSITIVE' (0.8608)]

In [11]:
import re

def remove_hashtags(text):
    # Replace every hashtag (a '#' followed by word characters) with an empty string
    cleaned_text = re.sub(r'#\w+', '', text)
    # Trim the whitespace left at both ends
    return cleaned_text.strip()
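
Removing a hashtag from the middle of a post leaves a run of spaces behind. If that matters for later processing, a variant can collapse the whitespace as well. This is a sketch, and `remove_hashtags_compact` is a hypothetical name. Note that it also turns newlines into single spaces.

```python
import re


def remove_hashtags_compact(text: str) -> str:
    """Remove hashtags, then collapse any run of whitespace into one space."""
    without_tags = re.sub(r"#\w+", "", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(remove_hashtags_compact("Great event #NFTParis #Web3 see you next year"))
# Great event see you next year
```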


In [12]:
post = '📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛️ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas! #fintech #friday #nfts #nftcommunity #blockchain #cryptocurrency #cryptoexchange #art'
# Strip the hashtags before classifying
post = remove_hashtags(post)
sentence = flair.data.Sentence(post)
model.predict(sentence)
sentence.get_labels()

['Sentence[36]: "📊  Fintech Focus 🔎  📢  NFTs - Are Non-Fungible Tokens The Next Big Thing  ⌛ or Just Hype? Read my latest post  👇 to learn more and share your thoughts/experiences/ideas!"'/'NEGATIVE' (0.9315)]

In [13]:
sentence.get_labels()[0].value

'NEGATIVE'

Removing the hashtags flipped the prediction for this post to NEGATIVE, which matches our reading of it. One example is not proof, so let's test the approach on more posts.

In [14]:
def txt_to_list(file_path):
    """Read a text file and return its lines as a list of stripped strings."""
    lines_list = []

    try:
        # Explicit UTF-8 so emoji and accented characters decode consistently
        with open(file_path, 'r', encoding='utf-8') as file:
            for line in file:
                # Strip surrounding whitespace, including the trailing newline
                lines_list.append(line.strip())
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

    return lines_list


In [15]:
path = 'descriptions.txt'

posts = txt_to_list(path)
posts[:2]

["Last week I had the pleasure to NFT Paris by the hand of Arianee . Witnessing the transformative power of Web3 and NFTs in reshaping the concept of ownership, particularly in relation to our personal data, left me inspired and eager to delve deeper into the evolving landscape of decentralized technologies. The event sparked valuable insights, and I'm excited to continue navigating the dynamic realm of blockchain innovations. #NFTParis #Web3 #OwnershipRevolution",
 'Hello everyone Today, my friends Asma Ghamacha , Hermes Yan NTJAM NDJENG , Harold Geumtcheng , Aloys Aymrick Nzooh , Bryan Fozame and I had the chance to be part of the NFT Paris conference thanks to our school aivancity School for Technology, Business & Society Paris-Cachan where we learned a wealth of new information about blockchain, metaverse, web3 and its use cases across various industries such as finance, gaming, luxury, and more. During this enI had the privilege to engage in discussions with numerous brilliant web

In [16]:
for i in range(len(posts)):
    posts[i] = remove_hashtags(posts[i])
    if posts[i] == '':
        posts[i] = 'This is a neutral comment'

posts



["Last week I had the pleasure to NFT Paris by the hand of Arianee . Witnessing the transformative power of Web3 and NFTs in reshaping the concept of ownership, particularly in relation to our personal data, left me inspired and eager to delve deeper into the evolving landscape of decentralized technologies. The event sparked valuable insights, and I'm excited to continue navigating the dynamic realm of blockchain innovations.",
 'Hello everyone Today, my friends Asma Ghamacha , Hermes Yan NTJAM NDJENG , Harold Geumtcheng , Aloys Aymrick Nzooh , Bryan Fozame and I had the chance to be part of the NFT Paris conference thanks to our school aivancity School for Technology, Business & Society Paris-Cachan where we learned a wealth of new information about blockchain, metaverse, web3 and its use cases across various industries such as finance, gaming, luxury, and more. During this enI had the privilege to engage in discussions with numerous brilliant web3 developers and CEOs from companies 
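
The placeholder text used for empty posts still goes through the classifier, so it receives a POSITIVE or NEGATIVE label like any other post and may skew the counts. An alternative is to drop posts that are empty once the hashtags are removed. This is a sketch, with `clean_posts` as a hypothetical helper and made-up sample posts.

```python
import re


def clean_posts(raw_posts):
    """Strip hashtags from each post and drop posts that end up empty."""
    cleaned = []
    for post in raw_posts:
        text = re.sub(r"#\w+", "", post).strip()
        if text:  # skip posts that consisted only of hashtags
            cleaned.append(text)
    return cleaned


sample = ["Loved the talks #NFTParis", "#nft #web3", "Is this just hype?"]
print(clean_posts(sample))
# ['Loved the talks', 'Is this just hype?']
```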

In [28]:
import pandas as pd

rows = []

for post in posts:
    sentence = flair.data.Sentence(post)
    model.predict(sentence)

    # Extract the predicted label and its confidence score
    score = sentence.labels[0].score
    prediction = sentence.labels[0].value
    rows.append({'content': post, 'score': score, 'prediction': prediction})

    print('content: ', post[:15], 'score: ', score, 'prediction: ', prediction)

# Build the DataFrame once, after all predictions are collected
df = pd.DataFrame(rows, columns=['content', 'score', 'prediction'])


content:  Last week I had score:  0.9998553991317749 prediction:  POSITIVE
content:  Hello everyone  score:  0.9989281296730042 prediction:  POSITIVE
content:  The digital wor score:  0.998602569103241 prediction:  POSITIVE
content:  Can’t wait to w score:  0.999728262424469 prediction:  POSITIVE
content:  DualMint is set score:  0.9997445940971375 prediction:  POSITIVE
content:  This Week on RW score:  0.9849713444709778 prediction:  POSITIVE
content:  Join me Saturda score:  0.973907470703125 prediction:  POSITIVE
content:  NFT Paris 2024  score:  0.9457706212997437 prediction:  POSITIVE
content:  Hello everyone  score:  0.9994600415229797 prediction:  POSITIVE
content:  “Comment achete score:  0.907588005065918 prediction:  POSITIVE
content:  A guide to   an score:  0.9981526732444763 prediction:  POSITIVE
content:  For a long time score:  0.9995279312133789 prediction:  POSITIVE
content:  Brands can now  score:  0.9996111989021301 prediction:  POSITIVE
content:  neutral score:  0.9
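
A printed list like this is hard to scan, so it can help to count the labels and find the least confident prediction. The sketch below uses only the standard library. The `results` list is a small sample copied from three of the rows above, not the full run.

```python
from collections import Counter

# (content prefix, score, prediction) copied from three rows of the output above
results = [
    ("Last week I had", 0.9998553991317749, "POSITIVE"),
    ("Hello everyone ", 0.9989281296730042, "POSITIVE"),
    ("“Comment achete", 0.907588005065918, "POSITIVE"),
]

# How many posts received each label
counts = Counter(prediction for _, _, prediction in results)
# The prediction the model was least sure about
lowest = min(results, key=lambda row: row[1])

print(counts)
print(lowest[0], lowest[1])
```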