This is an open-source, Python-based framework for predicting future user communities in a text-streaming social network (e.g., Twitter) based on users' topics of interest. The framework has been benchmarked on a Twitter dataset and shows improvements over the state of the art in downstream applications such as news recommendation and user prediction.
We strongly recommend a Linux OS for installing and running the framework. To install the packages and dependencies, run the following command in your shell:

```shell
pip install -r requirements.txt
```
This command installs compatible versions of the following libraries:
- gensim
- networkx
- scikit-network
- dynamicgem
- tagme
- nltk
- numpy
- pandas
- scikit-learn
- scipy
- sklearn
- requests
- mysql-connector-python
- matplotlib
Our framework has six major layers: the Data Access Layer (DAL), Topic Modeling Layer (TML), User Modeling Layer (UML), Graph Embedding Layer (GEL), Community Prediction Layer (CPL), and the Application Layer. The Application Layer is the last layer and shows how our method improves the performance of an underlying application.
```
├── output
├── src
│   ├── cmn (common functions)
│   │   └── Common.py
│   ├── dal (data access layer)
│   │   ├── DataPreparation.py
│   │   └── DataReader.py
│   ├── tml (topic modeling layer)
│   │   └── TopicModeling.py
│   ├── uml (user modeling layer)
│   │   ├── UsersGraph.py
│   │   └── UserSimilarities.py
│   ├── gel (graph embedding layer)
│   │   ├── GraphEmbedding.py
│   │   └── GraphReconstruction.py
│   ├── cpl (community prediction layer)
│   │   └── GraphClustering.py
│   ├── application
│   │   ├── NewsTopicExtraction.py
│   │   ├── NewsRecommendation.py
│   │   └── ModelEvaluation.py
│   ├── main.py
│   └── params.py
└── requirements.txt
```
We crawled and stored Twitter posts (tweets) for two consecutive months. The data is available as SQL scripts, accessible through the following links. Please download and execute them against your local database engine, and make sure your SQL server is running before you start the framework.
Each of the framework's six layers is governed by multiple parameters. Some of these parameters are fixed in the code via trial and error; major parameters, such as the number of topics, can be adjusted by the user in the 'params.py' file. After modifying 'params.py', run the framework via 'main.py' with the following commands:
```shell
cd src
python main.py
```
```python
import random
import numpy as np

random.seed(0)
np.random.seed(0)

RunID = 1

# SQL settings. Should be set for each MySQL instance.
user = ''
password = ''
host = ''
database = ''

uml = {
    'Comment': '',  # Any comment to express more information about the configuration.
    'RunId': RunID,  # A unique number to identify the configuration per run.
    'start': '2010-12-17',  # First date of system activity.
    'end': '2010-12-17',  # Last date of system activity.
    'lastRowsNumber': 100000,  # Number of rows sampled from the dataset for the whole process.
    'num_topics': 25,  # Number of topics to extract from the corpus.
    'library': 'gensim',  # Library used to extract topics; either 'gensim' or 'mallet'.
    'mallet_home': '--------------',  # Path to the Mallet installation.
    # The following parameters control corpus generation from the dataset:
    'userModeling': True,  # Aggregate all tweets of a user into one document.
    'timeModeling': True,  # Aggregate all tweets of a specific day into one document.
    'preProcessing': False,  # Apply traditional pre-processing methods to the corpus.
    'TagME': False,  # Apply TagMe to the raw dataset. Set to False if the TagMe dataset is used.
    'filterExtremes': True,  # Filter very common and very rare terms across all documents.
    'JO': False,  # (JO := JustOne) If True, only one topic is chosen for each document.
    'Bin': True,  # (Bin := Binary) If True, scores above/below the threshold are set to 1/0 per topic.
    'Threshold': 0.2,  # Threshold for topic-score quantization.
    'UserSimilarityThreshold': 0.2  # Threshold for filtering low user-similarity scores.
}

evl = {
    'RunId': RunID,
    'Threshold': 0,  # Threshold for filtering low news-recommendation scores.
    'TopK': 20  # Number of top news-recommendation candidates to select.
}
```
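The `Bin`, `Threshold`, and `UserSimilarityThreshold` settings can be illustrated with a small sketch. This is our own illustration, not the framework's actual code; the function names `binarize_topic_scores` and `filter_user_similarities` are hypothetical:

```python
import numpy as np

def binarize_topic_scores(scores, threshold=0.2):
    """The 'Bin' option: scores at or above the threshold become 1, the rest 0."""
    return (np.asarray(scores) >= threshold).astype(int)

def filter_user_similarities(sim_matrix, threshold=0.2):
    """Zero out user-user similarity scores that fall below the threshold."""
    sim = np.asarray(sim_matrix, dtype=float)
    sim[sim < threshold] = 0.0
    return sim

# Example: per-document topic scores quantized with Threshold = 0.2.
print(binarize_topic_scores([0.05, 0.3, 0.2, 0.1]))  # [0 1 1 0]
```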
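These evaluation parameters can be read as: keep the `TopK` highest-scoring news candidates per user after dropping scores at or below `Threshold`. A minimal sketch of that selection, assuming list-of-scores input (the function name is hypothetical, not part of the framework):

```python
def top_k_recommendations(scores, k=20, threshold=0):
    """Return the indices of the top-k scores above the threshold, best first."""
    candidates = [(i, s) for i, s in enumerate(scores) if s > threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in candidates[:k]]

print(top_k_recommendations([0.9, 0.0, 0.5, 0.7], k=2, threshold=0))  # [0, 3]
```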
The first three columns (mrr, ndcg5, ndcg10) report news recommendation quality; the last three (Precision, Recall, f1-measure) report user prediction quality.

| Method | mrr | ndcg5 | ndcg10 | Precision | Recall | f1-measure |
|---|---|---|---|---|---|---|
| **Community Prediction** | | | | | | |
| Our approach | 0.255 | 0.108 | 0.105 | 0.012 | 0.035 | 0.015 |
| Appel et al. [PKDD'18] | 0.176 | 0.056 | 0.055 | 0.007 | 0.094 | 0.0105 |
| **Temporal community detection** | | | | | | |
| Hu et al. [SIGMOD'15] | 0.173 | 0.056 | 0.049 | 0.007 | 0.136 | 0.013 |
| Fani et al. [CIKM'17] | 0.065 | 0.040 | 0.040 | 0.007 | 0.136 | 0.013 |
| **Non-temporal link-based community detection** | | | | | | |
| Ye et al. [CIKM'18] | 0.139 | 0.056 | 0.055 | 0.008 | 0.208 | 0.014 |
| Louvain [JSTAT'08] | 0.108 | 0.048 | 0.055 | 0.004 | 0.129 | 0.007 |
| **Collaborative filtering** | | | | | | |
| rrn [WSDM'17] | 0.173 | 0.073 | 0.08 | 0.004 | 0.740 | 0.008 |
| timesvd++ [KDD'08] | 0.141 | 0.058 | 0.064 | 0.003 | 0.657 | 0.005 |
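For readers unfamiliar with the ranking metrics in the table, here is a hedged sketch of MRR and nDCG@k as they are commonly defined (binary relevance). This is our own illustration, not the framework's evaluation code:

```python
import math

def mrr(ranked_relevance):
    """Mean reciprocal rank: average over queries of 1 / rank of the first relevant item."""
    total = 0.0
    for rels in ranked_relevance:
        for pos, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / pos
                break
    return total / len(ranked_relevance)

def ndcg_at_k(rels, k):
    """Normalized discounted cumulative gain at cutoff k."""
    dcg = sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels[:k], start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum(rel / math.log2(pos + 1) for pos, rel in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0

# First query: first relevant item at rank 2; second query: at rank 1.
print(mrr([[0, 1, 0], [1, 0, 0]]))  # 0.75
```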
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
©2021. This work is licensed under a CC BY-NC-SA 4.0 license.
Email: ziaeines@uwindsor.ca - soroushziaeinejad@gmail.com
Project link: https://github.com/soroush-ziaeinejad/Community-Prediction
In this work, we use the dynamicgem library to temporally embed our user graphs. We would like to thank the authors of this library.