Start here: Analysis_OG.ipynb
This project applies concepts and techniques from natural language processing and opinion mining. The goal is to build a system that classifies Hindi and Marathi text code-mixed with English according to its overall polarity (i.e. positive, negative, or neutral).
Sentiment analysis uses natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media content, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine.
The first step was to gather as many comments as possible for the model. We collected about 5,000 comments from major social media platforms such as Facebook and YouTube, mostly related to social and political views, which gave us data spanning all three polarities.
The next step was to tag all the data according to its polarity (i.e. positive, negative, neutral). The tagging scheme, illustrated in the sketch after this list, was:
- Positive Comment : 3
- Negative Comment : 1
- Neutral Comment : 2
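As a small illustration of this scheme (the DataFrame layout and column names below are hypothetical; the actual annotation was done by hand on the collected comments):

```python
import pandas as pd

# Hypothetical layout: one comment per row with a manually assigned sentiment string.
df = pd.DataFrame({
    "comment":   ["movie bahut achha tha", "worst experience ever", "thik thak aahe"],
    "sentiment": ["positive", "negative", "neutral"],
})

# Tagging scheme from above: positive -> 3, neutral -> 2, negative -> 1.
label_map = {"positive": 3, "neutral": 2, "negative": 1}
df["polarity"] = df["sentiment"].map(label_map)
print(df)
```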
Once the data is tagged, we pre-process it before feeding it to the model. The goal of preprocessing text data is to take it from its raw, readable form to a format the computer can work with more easily. Most text data, including the data used in this project, arrives as strings of text; preprocessing is all the work that takes that raw input and prepares it for insertion into a model.
While preprocessing for numerical data depends largely on the data itself, preprocessing text data is a fairly straightforward process, although understanding each step and its purpose is less trivial. Our preprocessing method consists of two stages: preparation and vectorization. The preparation stage cleans the data and cuts the fat, in seven steps: 1. removing URLs, 2. lowercasing all text, 3. removing numbers, 4. removing punctuation, 5. tokenization, 6. removing stopwords, and 7. lemmatization. Stopwords are common words that typically add no meaning.
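A minimal sketch of the preparation stage, assuming an NLTK-based pipeline (the exact implementation in Analysis_OG.ipynb may differ); note that NLTK's English stopword list and WordNet lemmatizer only partially cover the Hindi/Marathi portions of code-mixed comments:

```python
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the NLTK resources used below
# (which of the two tokenizer packages is needed depends on the NLTK version).
nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def prepare(text: str) -> list[str]:
    text = re.sub(r"https?://\S+|www\.\S+", "", text)                 # 1. remove URLs
    text = text.lower()                                               # 2. lowercase
    text = re.sub(r"\d+", "", text)                                   # 3. remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # 4. remove punctuation
    tokens = word_tokenize(text)                                      # 5. tokenization
    tokens = [t for t in tokens if t not in stop_words]               # 6. remove stopwords
    return [lemmatizer.lemmatize(t) for t in tokens]                  # 7. lemmatization

print(prepare("Check https://example.com - yeh movie 100% amazing thi!!!"))
```

The vectorization stage then turns these token lists into numeric features (for example with a count or TF-IDF vectorizer) before they are fed to a model.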
train_test_split returns four arrays: training data, test data, training labels, and test labels. By default, train_test_split splits the data into 75% training data and 25% test data, which is a reasonable rule of thumb.
The test_size keyword argument specifies what proportion of the original data is used for the test set. Here we set test_size=0.3, which gives 70% training data and 30% test data.
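A self-contained sketch of the split, using a toy code-mixed corpus and TF-IDF vectorization as stand-ins (whether the notebook uses TF-IDF or count vectors is not specified here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus and polarity tags (1 = negative, 2 = neutral, 3 = positive); purely illustrative.
comments = ["movie bahut achha tha", "worst experience ever", "thik thak hai",
            "khup chan video", "not good at all", "theek hai, nothing special"]
labels = [3, 1, 2, 3, 1, 2]

X = TfidfVectorizer().fit_transform(comments)      # vectorization stage
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42      # 70% train / 30% test
)
print(X_train.shape, X_test.shape)
```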
Hyperparameter tuning was applied to the algorithms used in Analysis_OG.ipynb, such as Linear Regression and XGBoost.
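As a sketch of what that tuning could look like with scikit-learn's GridSearchCV and XGBoost (the synthetic features and the parameter grid below are stand-ins, not the values used in Analysis_OG.ipynb):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Synthetic 3-class data standing in for the vectorized comments.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Illustrative search space; the grids actually searched in the notebook may differ.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 6],
    "learning_rate": [0.05, 0.1],
}

grid = GridSearchCV(XGBClassifier(eval_metric="mlogloss"),
                    param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```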
Accuracy, precision, recall, and F-score for every algorithm used are given in Values. The overall accuracy obtained was around 70%.
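These metrics can be computed for any of the models with scikit-learn; the label vectors below are purely illustrative and are not the reported results:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative ground-truth and predicted polarity tags (1/2/3), not real output.
y_true = [3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [3, 1, 2, 1, 1, 2, 3, 2, 2, 3]

precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy  = {accuracy_score(y_true, y_pred):.3f}")
print(f"precision = {precision:.3f}  recall = {recall:.3f}  f1 = {fscore:.3f}")
```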
Open questions and topics for further exploration:
- Contextual understanding and tone
- Sentiment analysis at Brandwatch?
- The caveats of sentiment analysis
- Predictions for the future of sentiment analysis
- Is the accuracy proportional to, or otherwise dependent on, the amount of data collected?
- Should the data source closely match the intended uses? -- https://blog.infegy.com/understanding-sentiment-analysis-and-sentiment-accuracy
- Is sentence-level cross-lingual sentiment classification enough to predict the sentiment?