Detection of Hate Speech Against LGBT+ on Social Media

This project detects potential hate speech targeting members of the LGBTQIA+ community. The model was trained with a supervised learning approach: it learned patterns from labeled data in order to make predictions on unseen text. The training dataset contains examples of hate speech and non-hate speech texts related to LGBTQIA+ topics.

The project applies key concepts of Natural Language Processing (NLP) and uses TF-IDF to vectorize the text.

To preprocess the text, URLs, stop words, punctuation marks, and digits were removed, followed by tokenization and lemmatization. After that, the most frequent words were visualized as word clouds.
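The cleaning steps can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `preprocess` function, the `URL_RE` pattern, and the tiny `STOP_WORDS` set are all assumptions made for the example, and lemmatization is left out because it needs an external lexicon (for instance, a library lemmatizer would be applied to the returned tokens).

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); a real pipeline would use
# a fuller list such as the one shipped with an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "at", "this"}

# Matches http(s) links and bare www. links up to the next whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Return a list of cleaned, lower-cased tokens (no lemmatization)."""
    text = URL_RE.sub(" ", text.lower())   # remove URLs before touching punctuation
    text = re.sub(r"\d+", " ", text)        # remove digits
    text = text.translate(PUNCT_TABLE)      # remove punctuation marks
    tokens = text.split()                   # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]
```

URLs are removed first on purpose: stripping punctuation earlier would break a link into fragments that the URL pattern could no longer match.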

TF-IDF (Term Frequency-Inverse Document Frequency) vectorization was used in this project to convert text into numerical features. TF-IDF assigns weights to words based on their frequency in a document relative to their frequency in the entire corpus, helping capture the importance of words in a document.
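To make the weighting concrete, here is a small sketch of the textbook formula, weight = tf × log(N / df), where tf is the term's relative frequency in a document, N is the number of documents, and df is the number of documents containing the term. The `tf_idf` function is a hypothetical helper written for this illustration; library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and normalize each vector by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["pride", "parade", "today"], ["pride", "march"]]
weights = tf_idf(docs)
```

In this toy corpus, "pride" appears in every document and therefore gets weight zero, while "parade" appears in only one document and gets a positive weight.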

For classification, a Random Forest was used. This is an ensemble learning method that builds multiple decision trees during training and combines their predictions to make a final prediction.
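The "combine their predictions" step can be illustrated with a simple majority vote. This sketch is an assumption-laden toy: the three hand-written rules stand in for trained decision trees, and the feature names `slur` and `threat` are invented for the example. Real libraries may combine trees differently; scikit-learn's `RandomForestClassifier`, for instance, averages the trees' predicted class probabilities rather than counting hard votes.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the label chosen by the majority of the trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical "trees", each a simple rule over TF-IDF-like feature weights.
trees = [
    lambda f: "hate" if f.get("slur", 0.0) > 0.1 else "not_hate",
    lambda f: "hate" if f.get("threat", 0.0) > 0.2 else "not_hate",
    lambda f: "hate" if f.get("slur", 0.0) + f.get("threat", 0.0) > 0.25 else "not_hate",
]
```

With an odd number of trees and two labels, a tie cannot occur, which keeps the vote unambiguous.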

The project predicts hate speech from text entered by the user, and it can also scrape tweets, scrape comments from a YouTube video, and accept uploaded CSV files. Tweets are scraped with the Python library ntscraper, and YouTube comments are retrieved through the YouTube API. The Streamlit library was used to build and design the user interface.

Refer to the attached screenshots demonstrating the working of the project.

The "YouTube Comments Analysis" section lets users enter a valid link to a YouTube video, scrapes its comments, and makes predictions on that data. The results are displayed on the screen.
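Before comments can be requested, the video ID has to be pulled out of the link, since the YouTube Data API identifies videos by ID rather than by URL. The following is a minimal sketch of that step; `extract_video_id` is a hypothetical helper, not a function from the project, and it covers only a few common URL shapes.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None
```

Returning `None` for unrecognized links gives the interface a simple way to show an "invalid link" message instead of sending a bad request.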

The "File Upload" section lets users import any CSV file that has a 'text' column. The text is then analysed for the presence of hate speech, and the results are displayed on the screen.
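The 'text' column requirement can be checked before any prediction is attempted. The sketch below uses only the standard library; `read_text_column` is a hypothetical helper written for illustration, and the in-memory sample stands in for an uploaded file.

```python
import csv
import io


def read_text_column(csv_file):
    """Return the non-empty values of the 'text' column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or "text" not in reader.fieldnames:
        raise ValueError("CSV file must contain a 'text' column")
    # Skip rows whose 'text' cell is missing or blank.
    return [row["text"].strip() for row in reader if row["text"] and row["text"].strip()]


# Usage with an in-memory file standing in for an upload.
sample = io.StringIO("id,text\n1,hello world\n2,\n3, love wins \n")
texts = read_text_column(sample)
```

Failing early with a clear error is friendlier than letting a missing column surface later as a key error inside the prediction step.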

The "Tweets Analysis" section takes a search term or a valid Twitter username, scrapes matching tweets, and makes predictions on the scraped data.

Repository files:
README.md
anti-lgbt-cyberbullying.csv
app.ipynb
lgbt_data.csv
tfidf_vectorizer.pkl
trained_model_rf (2).pkl

parth9504/Detection-of-Hate-Speech-Against-LGBT-
