Language Identification for Stream of Twitter Data Using Scala Language and Apache Kafka and Apache Spark

shayan72/Twitter_Language_Identification

Installation

Mac OS X

Install scala-2.12.1 and sbt

  • Install with Homebrew:

    brew install scala
    brew install sbt

Download and Install hadoop-2.7.3

Follow the installation instructions in the Hadoop documentation.

Download and Install spark-2.1.0-bin-hadoop2.7

Download and install Spark version 2.1.0 pre-built for Hadoop 2.7 and later from this link.

Download and Install kafka-0.10.2.0

Use the Kafka quick start documentation to download Kafka and start the ZooKeeper and Kafka servers.

Quick Start

Import the project into IntelliJ IDEA and install the sbt packages.

Create a new app at Twitter Apps and put its consumer key, consumer secret, access key, and access secret in application.conf.

Run the code using either of the following main functions:

  • DistributedLanguageDetection for running the distributed k-means algorithm
  • DistributedLanguageDetection for getting Twitter data from Kafka and running the streaming k-means algorithm
  • CommonNgrams for preprocessing tweets and finding the common n-grams in each language
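
To make the preprocessing step more concrete, here is a minimal sketch of character n-gram extraction and counting in plain Scala. It is not the repository's CommonNgrams code: the object name NgramSketch, the helpers normalize, charNgrams, and commonNgrams, and the cleaning rules (dropping URLs, @mentions, and the # sign) are all assumptions made for illustration.

```scala
import java.util.Locale

// Hypothetical helper object; names and cleaning rules are illustrative assumptions.
object NgramSketch {

  // Lowercase the tweet, drop URLs and @mentions, remove the '#' sign,
  // and collapse runs of whitespace into a single space.
  def normalize(tweet: String): String =
    tweet
      .toLowerCase(Locale.ROOT)
      .replaceAll("https?://\\S+", " ")
      .replaceAll("@\\w+", " ")
      .replaceAll("#", "")
      .replaceAll("\\s+", " ")
      .trim

  // All overlapping character n-grams of the text.
  // Texts shorter than n yield no n-grams.
  def charNgrams(text: String, n: Int): Seq[String] =
    if (text.length < n) Seq.empty else text.sliding(n).toList

  // The `top` most frequent n-grams across the tweets of one language.
  // Ties are broken alphabetically so the result is deterministic.
  def commonNgrams(tweets: Seq[String], n: Int, top: Int): Seq[(String, Int)] =
    tweets
      .flatMap(t => charNgrams(normalize(t), n))
      .groupBy(identity)
      .map { case (gram, occurrences) => gram -> occurrences.size }
      .toSeq
      .sortBy { case (gram, count) => (-count, gram) }
      .take(top)
}
```

Calling NgramSketch.commonNgrams once per language on a labeled sample gives a per-language list of frequent n-grams. Such counts could then be turned into feature vectors for a clustering step such as k-means, though how the project actually builds its features is not stated here.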
