Flair is built on PyTorch under the hood, so it's essential that you also have PyTorch installed.
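Before loading a model, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library; the helper name `is_installed` is my own and is not part of Flair.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Report whether the packages this notebook relies on are available.
for name in ("torch", "flair"):
    status = "found" if is_installed(name) else f"missing (try: pip install {name})"
    print(f"{name}: {status}")
```

If either package is reported missing, installing it with pip before running the cells below should be enough in most setups.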

In [2]:
import flair
# English-language model for sentiment analysis
model = flair.models.TextClassifier.load('en-sentiment')

Our next step is to tokenize input text. For this we use the Flair Sentence object, which we initialize by passing our text into it:

In [3]:
text = "I like you. I love you"  # we are expecting a confidently positive sentiment here

sentence = flair.data.Sentence(text)

sentence

Sentence[7]: "I like you. I love you"

In [4]:
model.predict(sentence)
# predict() does not return the prediction; instead, the predicted labels are attached to the sentence in place:

sentence

Sentence[7]: "I like you. I love you" → POSITIVE (0.9933)

In [5]:
sentence.get_labels()

['Sentence[7]: "I like you. I love you"'/'POSITIVE' (0.9933)]

In [6]:
sentence.get_labels()[0].value

'POSITIVE'
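Since each label carries both a class (`value`) and a confidence (`score`), the two can be folded into one signed number, which is sometimes easier to sort or plot. This is only a sketch: the helper `signed_score` is my own, and a `SimpleNamespace` object stands in for a predicted Flair `Sentence` so the example runs without downloading the model.

```python
from types import SimpleNamespace


def signed_score(sentence) -> float:
    """Return the top label's confidence, negated when the label is not POSITIVE."""
    if not sentence.labels:
        raise ValueError("sentence has no labels; call model.predict(sentence) first")
    label = sentence.labels[0]
    return label.score if label.value == "POSITIVE" else -label.score


# Stand-in for a predicted Sentence (assumption: only .labels[0].value/.score are needed).
fake_sentence = SimpleNamespace(
    labels=[SimpleNamespace(value="POSITIVE", score=0.9933)]
)
print(signed_score(fake_sentence))
```

With a real sentence, the call would be `signed_score(sentence)` after `model.predict(sentence)`.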

Now let's try it on NFT-related posts extracted by the crawler.


We can see that removing the hashtags definitely improved the prediction. So now, let's test it with more post contents.

In [7]:
posts = ['Last week I pleasure NFT Paris hand Arianee Witnessing transformative power Web NFTs reshaping concept ownership particularly relation personal data left inspired eager delve deeper evolving landscape decentralized technologies The event sparked valuable insights I excited continue navigating dynamic realm blockchain innovations',
 'Hello everyone Today friends Asma Ghamacha Hermes Yan NTJAM NDJENG Harold Geumtcheng Aloys Aymrick Nzooh Bryan Fozame I chance part NFT Paris conference thanks school aivancity School Technology Business Society Paris Cachan learned wealth new information blockchain metaverse web use cases across various industries finance gaming luxury During enI privilege engage discussions numerous brilliant web developers CEOs companies like Maxence Perray Nomiks Victor Briere Arianee Ubisoft Louis Vuitton many others These conversations provided deeper insights innovative technology intersection data science particularly terms transparency tokenization blockchain',
 'The digital world continues intersect traditional art forms online trading platform Robinhood partners Notable art bring prominent artist Hunt Slonem work wider audience use Non Fungible Tokens NFTs Sign website automatically enter monthly prize raffle https lnkd dMcKcrpf',
 'Can wait welcome everyone Paris week NFT Paris Shout Alexandre Tsydenkov team NFT Paris done awesome job creating curating one relevant B B B C web events world Arianee I present throughout week side events panels course main event rd th demos workshop This year Arianee taking different approach new activations taking VIP lounge exhibiting brands building technology BREITLING Panerai Moncler many showcasing tokenized digital product passports Join stage February rd The opening keynote From Hype Purpose Redefining NFTs next billion users CET A panel ian rogers Chief Experience Officer Ledger Gmoney Unleashing Potential Digital Luxury Market CET Our fellow team members esteemed partners also stage Delphine Edde CMO moderating panel Digital Product Passports Physical Goods From Post Purchase Engagement Circular Business Models Eva Assayag Head IS Organization Projects Panerai Adrian Corsin Managing Director MUGLER Michele Lo Forte Global Head E Commerce Digital Customer Engagement BREITLING Friday rd CET Alexandre Mare joining Fabien Aufrechter Head Web Vivendi Sandy Carter COO Unstoppable Domains moderated Farokh Sarmad Rug Media discuss Onboarding Next Billions Users Web Use Cases Challenges Opportunities Saturday th CET Our Lead Developer Maxime Vaullerin host workshop Enhance Digital Product Passport Utilities Interoperability Saturday th pm CET Last least co hosting Speakers Dinner Thursday nd lunch Polygon Labs rd VIP Lounge If time go check Musee Orsay dear friend Agoria amazing musician NFT artist opened exhibition two extraordinary works created specifically museum Go check I might organizing little private tour DM interested It going crazy week looking forward seeing Thanks amazing Arianee team making happen Don hesitate drop note interested setting meeting want attend side events need food tips Paris Click get full Arianee agenda https lnkd euJePd D Save Date February Location Grand Palais Ephemere Paris',
 'DualMint set launch Toji NFT Japanese sake tokenized blockchain nd March It collaboration year old sake producers Daimon Brewery This partnership big deal showcases real world assets tokenization apart existing real estate tokenization financial product tokenization Want part launch Read latest newsletter This Week RWA Insights Dive Day Day leverages RWAs AI redefine insurance Get ready DualMint Toji NFT launch March nd Explore featured blog post revolutionizing commodities market Don miss groundbreaking insights',
 'Join Saturday February th NFT Paris I speaking alongside Fabien Aufrechter Head Web Vivendi Sandy Carter COO Unstoppable Domains moderated Farokh Sarmad Rug Media discuss Onboarding Next Billions Users Web Use cases Challenges Opportunities Discover Arianee digital product passports unlock new circular economy Arianee also massive presence NFT Paris full panels workshops demo See full agenda Save Date February Location Grand Palais Ephemere Paris NFT Paris starts exactly one week Besides taking VIP Lounge couple things sleeve Catch keynote panels workshop look Arianee team hint might wearing something pink especially Builder Zone booth Check details save share post see next week Day February rd Opening keynote From Hype Purpose Redefining NFTs The Next Billion Users Pierre Nicolas Hurstel CEO Co Founder Arianee Main Stage Panel Digital Product Passports Physical Goods From Post Purchase Engagement Circular Business Models Eva Assayag Head IS Organization Projects Panerai Adrian Corsin Managing Director MUGLER Michele Lo Forte Global Head E Commerce Digital Customer Engagement BREITLING Moderated Delphine Edde CMO Arianee pm Main Stage Panel Unleashing Potential Digital Luxury Market ian rogers Chief Experience Officer Ledger Gmoney Pierre Nicolas CEO Co Founder Arianee Moderated Amanda Cassatt u b u b CEO Founder Serotonin Main Stage Day February th Panel Onboarding The Next Billions Users Web Use Cases Challenges Opportunities Fabien Aufrechter Head Web Vivendi Sandy Carter COO Unstoppable Domains Alexandre Mare COO Arianee Moderated Farokh Sarmad Rug Radio pm Main Stage Workshop Enhance Digital Product Passport Utilities Interoperability The Arianee Case Maxime Vaullerin Lead Developer Arianee pm pm Eiffel Stage Look announcements next week Book meeting us advance https lnkd guSVreC Still secured tickets Follow link get special discount code PN https lnkd dn D xHR Discover digital product passport solutions https lnkd e hD Save Date February Location Grand Palais Ephemere Paris See',
 'Hello everyone Today I chance part NFT Paris conference I learned wealth new information blockchain use cases across various industries finance gaming luxury I privilege engage discussions numerous brilliant web developers CEOs companies like Maxence Perray Victor Briere many others These conversations provided deeper insights innovative technology intersection data science particularly terms transparency tokenization blockchain',
 'Comment acheter les NFT de Eyes Humanity et en tirer profit',
 'A guide Decentraland NFTs snapshot data trends',
 'For long time I imagined term always something Well regardless circumstances I said waiting enough nothing needs done happen Thus I publicly presented alter ego The Knight Rose https lnkd gDWS mas music group MonteCristo https lnkd g p V KQ first NFT book combined NIGHT WITHOUT GAME event With exactly I life born made sense Art lecture conversation music masks fallen games gone We ones left watch hearts',
 'Brands create digital narratives customers actively participate adding immersive dimension marketing efforts Imagine launching product line item accompanied NFT telling different story brand heritage product journey This approach enriches customer experience also fosters deeper connection brand',
 'A bit entries days NFT Paris blast Particularly proud attendance major companies luxury sport gaming finance art sector support year Even Tesla NFTs evolving sector becoming mature From Hype purpose introduction keynote Pierre Nicolas Hurstel defining theme edition Once hype gone remains A brilliant community culture Real use cases finally appear Slowly let consolidation work',
 'A guide Bored Ape Yacht Club NFTs snapshot data trends',
 'A SaaS platform top Arianee Protocol distribute gasless NFTs special features engage customers securely anonymously scale NFTs revolutionizing way think ownership value digital world ways deeper meaningful customer email address But nascent technology lot complexity navigate That NFT Management Platform NMP comes make web accessible brands In article explore created NMP future envision',
 'Is best way capitalize Web conference Some context At end last year decided BGA team double effort provide opportunity BGA members Web Gaming enthusiasts network learn We want BGAConnects embodiment new comitment During NFT Paris happy see different verticals BGAConnects made sense every atendee Learn experience innovating We fortunate Yat Siu Nicolas Gilot among many brilliant entrepreneurs panels Give opportunity start ups meet potential investors Just Play games meet visionaries behing One last note Always try contribute events based expertise could provide network professionals specific ecosystem Super happy members chance play role best side events NFT Paris See takeaways https lnkd eGHhc',
 'In two days YSL Beaute reveal latest NFT chapter The Night Is Ours As wait anticipation let take trip memory lane check fantastic drops previously released Discover YSL Beaute web website https lnkd g hsT Vq',
 'What fantastic week NFT Paris Met many incredible people Web space I spent quality time team members I often see face face like CTO Alexandre Cognard It great sharing memories talking progress made since I started Arianee almost years ago Learning incredible opportunity develop career gain insight fields tech management Getting know ins outs day day work made understand tech team better I gained insights viewpoints strengthening mutual understanding boosting alignment within company Thanks organization Alexandre Tsydenkov Can wait next year',
 'Great coverage Louise Laing happening Digital Fashion NFT Paris User Generated Content taking The Sandbox stands transformative force digital fashion landscape The next big designer emerge platform like The Sandbox customers recognising next big brand This validated initiatives like Art Runway collaborations Digital Fashion Week D designers known voxel creators present creations sale game gaining exposure different audience garnering opportunity monetise creations whilst growing digital community https lnkd eHxX UXZ TrueStarsMedia nft paris DgtlFashionWeek',
 'Fintech Focus NFTs Are Non Fungible Tokens The Next Big Thing Just Hype Read latest post learn share thoughts experiences ideas']

In [8]:
import pandas as pd

rows = []

for post in posts:
    sentence = flair.data.Sentence(post)
    model.predict(sentence)

    # Extract the top label's confidence and class, and keep them for the DataFrame
    score = sentence.labels[0].score
    prediction = sentence.labels[0].value
    rows.append({'content': post, 'score': score, 'prediction': prediction})

    print('content: ', post[:15], 'score: ', score, 'prediction: ', prediction)

# Build the DataFrame once, after all predictions have been collected
df = pd.DataFrame(rows, columns=['content', 'score', 'prediction'])


content:  Last week I ple score:  0.9994392991065979 prediction:  POSITIVE
content:  Hello everyone  score:  0.9984716773033142 prediction:  POSITIVE
content:  The digital wor score:  0.9924715757369995 prediction:  POSITIVE
content:  Can wait welcom score:  0.9996523857116699 prediction:  POSITIVE
content:  DualMint set la score:  0.9991014003753662 prediction:  POSITIVE
content:  Join Saturday F score:  0.9017152190208435 prediction:  POSITIVE
content:  Hello everyone  score:  0.9986838698387146 prediction:  POSITIVE
content:  Comment acheter score:  0.9600447416305542 prediction:  POSITIVE
content:  A guide Decentr score:  0.9980354905128479 prediction:  POSITIVE
content:  For long time I score:  0.9851458072662354 prediction:  POSITIVE
content:  Brands create d score:  0.9983072280883789 prediction:  POSITIVE
content:  A bit entries d score:  0.9938095211982727 prediction:  POSITIVE
content:  A guide Bored A score:  0.765382707118988 prediction:  NEGATIVE
content:  A SaaS platform 
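To read the printed output above at a glance, the predictions can be tallied and the less confident ones flagged. This is a sketch only: the tuples are copied from the complete rows of that output with scores rounded to four decimals, and both the `summarize` helper and the 0.95 threshold are my own assumptions.

```python
from collections import Counter

# (first 15 characters, score rounded to 4 decimals, prediction) from the output above
results = [
    ("Last week I ple", 0.9994, "POSITIVE"),
    ("Hello everyone ", 0.9985, "POSITIVE"),
    ("The digital wor", 0.9925, "POSITIVE"),
    ("Can wait welcom", 0.9997, "POSITIVE"),
    ("DualMint set la", 0.9991, "POSITIVE"),
    ("Join Saturday F", 0.9017, "POSITIVE"),
    ("Hello everyone ", 0.9987, "POSITIVE"),
    ("Comment acheter", 0.9600, "POSITIVE"),
    ("A guide Decentr", 0.9980, "POSITIVE"),
    ("For long time I", 0.9851, "POSITIVE"),
    ("Brands create d", 0.9983, "POSITIVE"),
    ("A bit entries d", 0.9938, "POSITIVE"),
    ("A guide Bored A", 0.7654, "NEGATIVE"),
]


def summarize(results, threshold=0.95):
    """Count predictions per class and list rows whose score is below `threshold`."""
    counts = Counter(prediction for _, _, prediction in results)
    uncertain = [row for row in results if row[1] < threshold]
    return counts, uncertain


counts, uncertain = summarize(results)
print(counts)
for content, score, prediction in uncertain:
    print(f"low confidence: {content!r} {prediction} ({score})")
```

Rows flagged this way are good candidates for a manual look, as with the Bored Ape post discussed next.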

# Removing stopwords vs leaving stopwords

We have a very particular case here. The original post says:

"*A guide to Bored Ape Yacht Club #NFTs and a snapshot of data trends*"

However, if we transform this description with the preprocessing methods in 'nlp_preprocessing.ipynb' and 'preprocessing_no_lemm.ipynb', the results are, respectively:

1. 'guide bored yacht club nfts snapshot data trend'
2. 'A guide Bored Ape Yacht Club NFTs snapshot data trends'

The second version is predicted **negative** by the Flair model, so I experimented with adding back the stopword "*to*" to see whether it makes a difference. The results were the following:

In [9]:
sentence = flair.data.Sentence('A guide Bored Ape Yacht Club NFTs snapshot data trends')
model.predict(sentence)
score = sentence.labels[0].score
prediction = sentence.labels[0].value

print('content: ', sentence[:7], 'score: ', score, 'prediction: ', prediction)

content:  Span[0:7]: "A guide Bored Ape Yacht Club NFTs" score:  0.765382707118988 prediction:  NEGATIVE


In [10]:
sentence = flair.data.Sentence('A guide to bored ape yacht club NFTs snapshot data trends')
model.predict(sentence)
score = sentence.labels[0].score
prediction = sentence.labels[0].value

print('content: ', sentence[:7], 'score: ', score, 'prediction: ', prediction)

content:  Span[0:7]: "A guide to bored ape yacht club" score:  0.9295165538787842 prediction:  POSITIVE


In [15]:
sentence = flair.data.Sentence('My baby looks like a little bear')
model.predict(sentence)
score = sentence.labels[0].score
prediction = sentence.labels[0].value

print('content: ', sentence[:7], 'score: ', score, 'prediction: ', prediction)

content:  Span[0:7]: "My baby looks like a little bear" score:  0.6381192207336426 prediction:  POSITIVE


We can observe that with the stopword *to* added back, the prediction turns out to be positive, which is better than a negative prediction. Note that the second input also lowercases "Bored Ape Yacht Club", so the change cannot be attributed to *to* alone. Still, it could be better not to remove prepositions, or at least to keep the stopword *to*, or to consider not removing any stopwords at all.
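One way to keep selected stopwords during preprocessing is to pass an explicit allow-list to the removal step. The sketch below assumes a tiny, case-sensitive stopword set chosen purely for illustration; it is not the list used in the preprocessing notebooks, and `remove_stopwords` is a hypothetical helper.

```python
# Illustrative stopword set (an assumption, not the notebooks' actual list).
STOPWORDS = {"a", "and", "of", "to"}


def remove_stopwords(text, keep=frozenset()):
    """Drop tokens found in STOPWORDS unless they also appear in `keep`."""
    return " ".join(
        token for token in text.split()
        if token not in STOPWORDS or token in keep
    )


post = "A guide to Bored Ape Yacht Club NFTs and a snapshot of data trends"

print(remove_stopwords(post))
# A guide Bored Ape Yacht Club NFTs snapshot data trends

print(remove_stopwords(post, keep={"to"}))
# A guide to Bored Ape Yacht Club NFTs snapshot data trends
```

Keeping the original casing here also makes it possible to test the effect of *to* separately from the effect of lowercasing.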

# Next steps for Arianee
 - Get more data through my crawler - basically iterating through more pages
 - Label the data myself as positive, negative, or neutral
 - Train a new transformer model with more data
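
Once the posts are labeled, they need to be split into train, dev, and test sets before training. Below is a rough sketch of that step using only the standard library. The helpers, the split fractions, and the placeholder rows are all assumptions of mine; the `__label__<class> <text>` line layout is the FastText-style format that, to my knowledge, Flair's `ClassificationCorpus` reads.

```python
import random


def to_fasttext_line(label, text):
    """Format one example as '__label__<LABEL> <text>' on a single line."""
    return f"__label__{label} {' '.join(text.split())}"


def split_dataset(rows, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle (label, text) rows reproducibly and split into train/dev/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


# Placeholder rows standing in for hand-labeled posts (not real data).
labeled = [("NEUTRAL", f"placeholder post {i}") for i in range(10)]

train, dev, test = split_dataset(labeled)
print(len(train), len(dev), len(test))
print(to_fasttext_line(*train[0]))
```

Each split can then be written to its own file, one formatted line per post.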

https://towardsdatascience.com/text-classification-with-state-of-the-art-nlp-library-flair-b541d7add21f

In [12]:
#from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentLSTMEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

In [13]:
#corpus = NLPTaskDataFetcher.load_classification_corpus(Path('./'), test_file='test.csv', dev_file='dev.csv', train_file='train.csv')
word_embeddings = [WordEmbeddings('glove'), FlairEmbeddings('news-forward-fast'), FlairEmbeddings('news-backward-fast')]
document_embeddings = DocumentLSTMEmbeddings(word_embeddings, hidden_size=512, reproject_words=True, reproject_words_dimension=256)
# TODO: load my own corpus in place of the commented-out NLPTaskDataFetcher call above; `corpus` is undefined until then, so the next line will fail
classifier = TextClassifier(document_embeddings, label_dictionary=corpus.make_label_dictionary(), multi_label=False)

2024-06-18 17:17:22,949 https://flair.informatik.hu-berlin.de/resources/embeddings/token/glove.gensim.vectors.npy not found in cache, downloading to C:\Users\matts\AppData\Local\Temp\tmpudqkhddm


  9%|▉         | 13.4M/153M [00:42<13:57, 174kB/s]   

KeyboardInterrupt: 
