Sentiment Analysis with Finetuned Models

Introduction

Work has already begun towards developing a COVID-19 vaccine. From measles to the common flu, vaccines have lowered the risk of illness and death, and have saved countless lives around the world. Unfortunately in some countries, the 'anti-vaxxer' movement has led to lower rates of vaccination and new outbreaks of old diseases.

Although it may be many months before we see COVID-19 vaccines available on a global scale, it is important to monitor public sentiment towards vaccinations now and especially in the future when COVID-19 vaccines are offered to the public. The anti-vaccination sentiment could pose a serious threat to the global efforts to get COVID-19 under control in the long term.

The objective of this challenge is to develop a machine learning model to assess if a Twitter post related to vaccinations is positive, neutral, or negative. This model will be deployed using streamlit on a Docker Container.

Dataset

Tweets have been classified as pro-vaccine (1), neutral (0) or anti-vaccine (-1). The tweets have had usernames and web addresses removed.

Variable definition:

tweet_id: Unique identifier of the tweet

safe_tweet: Text contained in the tweet. Some sensitive information has been removed like usernames and urls

label: Sentiment of the tweet (-1 for negative, 0 for neutral, 1 for positive)

agreement: The tweets were labeled by three people. Agreement indicates the percentage of the three reviewers that agreed on the given label. You may use this column in your training, but agreement data will not be shared for the test set.

Files available for download are:

Train.csv - Labelled tweets on which to train your model

Test.csv - Tweets that you must classify using your trained model

SampleSubmission.csv - is an example of what your submission file should look like. The order of the rows does not matter, but the names of the ID must be correct. Values in the 'label' column should range between -1 and 1.

NLP_Primer_twitter_challenge.ipynb - is a starter notebook to help you make your first submission on this challenge.

Setup

Fork this repo and run the notebook on Google Colab. The Hugging face models are Deep Learning based, so will need a lot of computational GPU power to train them. Please use Colab to do it, or your other GPU cloud provider, or a local machine having NVIDIA GPU.

Note that Google Colab sessions have time limits and may disconnect after a period of inactivity. However, you can save your progress and re-establish the connection to the GPU when needed.

Hugging Face is an open-source and platform provider of machine learning technologies. You can use install their package to access some interesting pre-built models to use them directly or to fine-tune (retrain it on your dataset leveraging the prior knowledge coming with the first training), then host your trained models on the platform, so that you may use them later on other devices and apps.

Please, go to the website and sign-in to access all the features of the platform.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
data		data
dev		dev
Article.md		Article.md
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

dev

dev

Article.md

Article.md

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

Sentiment Analysis with Finetuned Models

Introduction

Dataset

Setup

Evaluation

Screenshots

Streamlit App

Gradio App

Resources

👏 Support

Author

About

Releases

Packages

Languages

ikoghoemmanuell/Sentiment-Analysis-with-Finetuned-Models

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis with Finetuned Models

Introduction

Dataset

Setup

Evaluation

Screenshots

Streamlit App

Gradio App

Resources

👏 Support

Author

About

Resources

Stars

Watchers

Forks

Languages