
fidsusj/HateSpeechDetection


HateSpeechDetection


Introduction

This project is part of the text analytics course at Heidelberg University. Its goal is to detect hate speech in social media posts using text analytics methods.

This repository contains all files of the project.

Project team

  • Christopher Klammt
  • Felix Hausberger
  • Nils Krehl

Setup Instructions

Run the project

  1. Install Python 3.7

  2. If your operating system is Windows, install the Microsoft C++ Build Tools (needed to install fastText)

  3. Install pipenv

    pip install pipenv
    
  4. Install all the dependencies defined in the Pipfile

    pipenv install --dev
    
  5. Enter the virtual environment of pipenv

    pipenv shell
    
  6. Download the original datasets (Automated Hate Speech Detection and the Problem of Offensive Language; Hate speech dataset from a white supremacist forum) and add them to the project. The resulting directory structure should look like the following:

    [Image: data folder structure]

  7. Run the program (on our machines this takes about 10 minutes)

    pipenv run main
    
  8. Run the tests

    pipenv run test && pipenv run report
    
  9. Leave the virtual environment of pipenv

    exit
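Before step 3, a quick preflight check can confirm that the interpreter matches the Python 3.7 the setup expects and that pipenv is on the PATH. The following is a minimal sketch, not part of the repository; the function name `preflight_problems` is hypothetical, and only the major/minor Python version is compared.

```python
import shutil
import sys

# Assumption: the setup targets Python 3.7, so only major/minor are compared.
REQUIRED_PYTHON = (3, 7)


def preflight_problems(version_info=sys.version_info, which=shutil.which):
    """Return human-readable setup problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"expected Python 3.7, found {found}")
    # shutil.which returns None when the executable is not on the PATH.
    if which("pipenv") is None:
        problems.append("pipenv not found on PATH (run: pip install pipenv)")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    if issues:
        for issue in issues:
            print("setup issue:", issue)
    else:
        print("preflight checks passed")
```

The version and PATH lookups are passed in as parameters so the check can be exercised without changing the actual interpreter or PATH.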
    

Normally, all required dependencies are downloaded automatically. If they are not, try the following:

  • sudo pipenv run spacy download en (Assignment 2)

  • sudo pipenv run python -m nltk.downloader vader_lexicon

  • sudo pipenv run python -m nltk.downloader averaged_perceptron_tagger
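To check which of the NLTK resources from the fallback list are actually missing before downloading them, something like the following could be run inside the pipenv shell. This is a hypothetical helper, not code from the repository; it assumes the resources live at NLTK's standard data paths (`sentiment/vader_lexicon.zip`, `taggers/averaged_perceptron_tagger`) and it does not cover the spaCy model.

```python
# Assumption: standard NLTK data paths for the two resources named in this README.
NLTK_RESOURCES = {
    "vader_lexicon": "sentiment/vader_lexicon.zip",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def missing_resources(resources, find):
    """Return the names whose data path `find` cannot locate (it raises LookupError)."""
    missing = []
    for name, path in resources.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing


def ensure_nltk_resources(resources=NLTK_RESOURCES):
    """Download only the missing resources and return their names."""
    import nltk  # imported lazily so missing_resources() works without NLTK installed

    missing = missing_resources(resources, nltk.data.find)
    for name in missing:
        nltk.download(name)
    return missing
```

`nltk.data.find` raises `LookupError` when a resource is not installed, which is what `missing_resources` relies on; calling `ensure_nltk_resources()` is then roughly equivalent to running only the needed `nltk.downloader` commands.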

Development setup for the project

  • Set up the git hook scripts

     pre-commit install
    

Run the assignments

Running the assignments requires further dependencies:

  • pdftotext (requires additional OS dependencies) (Assignment 1)
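Because "pdftotext" can refer both to a Python package and to the Poppler command-line tool of the same name, a small check can show which one is available. This sketch assumes Assignment 1 uses the `pdftotext` Python package (whose build needs Poppler's C++ development headers, which would explain the extra OS dependencies); the function names are hypothetical.

```python
import importlib.util
import shutil


def pdftotext_backends(find_spec=importlib.util.find_spec, which=shutil.which):
    """Report which pdftotext variants are usable in this environment."""
    return {
        # The pdftotext Python package (assumed to be what Assignment 1 uses).
        "python_package": find_spec("pdftotext") is not None,
        # The Poppler command-line tool of the same name.
        "cli": which("pdftotext") is not None,
    }


def pdf_to_text(path):
    """Extract text from a PDF with the pdftotext package, one page after another."""
    import pdftotext  # only available once the package and Poppler are installed

    with open(path, "rb") as handle:
        pdf = pdftotext.PDF(handle)
    # Iterating over a pdftotext.PDF yields one string per page.
    return "\n\n".join(pdf)
```

If `pdftotext_backends()` reports `python_package` as `False`, the OS dependencies are the likely culprit, since installing the package fails without them.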
