
soroush-ziaeinejad/Community-Prediction


Community Prediction in Microblogging Social Networks

This is an open-source, Python-based framework that predicts future user communities in a text-streaming social network (e.g., Twitter) based on users’ topics of interest. The framework has been benchmarked on a Twitter dataset, where it showed improvements over state-of-the-art methods in downstream applications such as news recommendation and user prediction.

Installation

We strongly recommend using Linux to install the packages and run the framework. To install the dependencies, run this command in your shell:

 pip install -r requirements.txt

This command installs compatible versions of the following libraries:

  • gensim
  • networkx
  • scikit-network
  • dynamicgem
  • tagme
  • nltk
  • numpy
  • pandas
  • scikit-learn
  • scipy
  • sklearn
  • requests
  • mysql-connector-python
  • matplotlib
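After installation, it can help to confirm that every dependency is importable before running the framework. The script below is a minimal sketch, not part of the repository: the IMPORT_NAMES mapping from pip package names to import names is an assumption on our side (for example, scikit-network is imported as sknetwork and mysql-connector-python as mysql.connector), and sklearn is left out because it is only an alias of scikit-learn.

```python
import importlib.util

# Assumed mapping from the pip package names listed above to the module
# names they are imported under; not taken from the repository itself.
IMPORT_NAMES = {
    "gensim": "gensim",
    "networkx": "networkx",
    "scikit-network": "sknetwork",
    "dynamicgem": "dynamicgem",
    "tagme": "tagme",
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "requests": "requests",
    "mysql-connector-python": "mysql.connector",
    "matplotlib": "matplotlib",
}


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports the parent "a" first and raises if it is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for module in missing_modules(IMPORT_NAMES.values()):
        print(f"Not importable: {module}")
```

An empty output means every listed module was found; it does not check that the installed versions are the ones pinned in requirements.txt.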

Structure

Framework Structure

Our framework has six major layers: the Data Access Layer (DAL), Topic Modeling Layer (TML), User Modeling Layer (UML), Graph Embedding Layer (GEL), Community Prediction Layer (CPL), and the Application Layer. The Application Layer is the last one and shows how our method improves the performance of a downstream application.
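One way to picture the layers is as a linear chain in which each layer consumes the output of the one before it. The sketch below only illustrates that reading under stated assumptions: run_pipeline, Stage, and LAYER_ORDER are hypothetical names, the strict one-output-feeds-the-next data flow is our assumption, and none of this mirrors how main.py actually wires the layers.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage signature: a layer receives the previous layer's output
# and the run parameters, and returns its own output.
Stage = Callable[[Any, Dict[str, Any]], Any]

# Assumed order, following the layer list above.
LAYER_ORDER = ["DAL", "TML", "UML", "GEL", "CPL", "Application"]


def run_pipeline(stages: List[Tuple[str, Stage]], params: Dict[str, Any]) -> Dict[str, Any]:
    """Run the layers in order, feeding each one the previous layer's output."""
    names = [name for name, _ in stages]
    if names != LAYER_ORDER:
        raise ValueError(f"expected layers {LAYER_ORDER}, got {names}")
    outputs: Dict[str, Any] = {}
    data: Any = None  # the DAL receives no upstream input
    for name, stage in stages:
        data = stage(data, params)
        outputs[name] = data  # keep every intermediate result for inspection
    return outputs


if __name__ == "__main__":
    # Toy stages that just record which layers have run so far.
    toy = [(n, (lambda d, p, n=n: (d or []) + [n])) for n in LAYER_ORDER]
    print(run_pipeline(toy, {})["Application"])
```

Keeping every intermediate output makes it easy to inspect, for example, the user graph produced by the UML before it is embedded.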


Code Structure

│── output
│── src
│   │
│   │── cmn (common functions)
│   │──── Common.py
│   │
│   │── dal (data access layer)
│   │──── DataPreparation.py
│   │──── DataReader.py
│   │
│   │── tml (topic modeling layer)
│   │──── TopicModeling.py
│   │
│   │── uml (user modeling layer)
│   │──── UsersGraph.py
│   │──── UserSimilarities.py
│   │
│   │── gel (graph embedding layer)
│   │──── GraphEmbedding.py
│   │──── GraphReconstruction.py
│   │
│   │── cpl (community prediction layer)
│   │──── GraphClustering.py
│   │
│   │── application
│   │──── NewsTopicExtraction.py
│   │──── NewsRecommendation.py
│   │──── ModelEvaluation.py
│   │── main.py
│   │── params.py
│── requirements.txt
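The user modeling layer (UserSimilarities.py and UsersGraph.py) turns per-user topic vectors into a user graph, and params.py exposes a UserSimilarityThreshold for filtering low similarity scores. The function below is a minimal sketch of that idea, assuming cosine similarity between topic vectors; the repository may use a different similarity measure, and user_similarity_edges is a hypothetical name.

```python
import numpy as np


def user_similarity_edges(user_topics, threshold=0.2):
    """Return (i, j, similarity) for user pairs whose cosine similarity >= threshold.

    user_topics: array of shape (num_users, num_topics), one topic vector per user.
    Cosine similarity is an assumption made for this sketch.
    """
    x = np.asarray(user_topics, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # users with an all-zero topic vector get similarity 0
    unit = x / norms
    sim = unit @ unit.T
    edges = []
    n = sim.shape[0]
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph: each pair once
            if sim[i, j] >= threshold:
                edges.append((i, j, float(sim[i, j])))
    return edges


if __name__ == "__main__":
    topics = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
    print(user_similarity_edges(topics, threshold=0.2))
```

The resulting weighted edge list can be loaded into any graph library for the embedding and clustering layers.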

Usage

Data

We crawled and stored Twitter posts (tweets) over two consecutive months. The data is available as a set of SQL scripts, accessible through the following links. Please download the scripts and execute them in your local database engine (MySQL), and make sure the database server is running before you run the framework.
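A quick way to check that the database server is up before starting a run is to open a TCP connection to it. This is a minimal sketch using only the standard library; sql_server_reachable is a hypothetical helper, and it assumes MySQL's default port 3306. It only confirms that something is listening on that port, not that the credentials in params.py are valid or that the scripts have been imported.

```python
import socket


def sql_server_reachable(host: str = "localhost", port: int = 3306, timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, unreachable hosts, and timeouts
        return False


if __name__ == "__main__":
    # 3306 is MySQL's default port; adjust it if your server listens elsewhere.
    print("MySQL reachable:", sql_server_reachable())
```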

Run

The framework consists of six layers, each controlled by several parameters. Some of these parameters were fixed in the code by trial and error, while the major ones, such as the number of topics, can be adjusted by the user in the 'params.py' file in the 'src' folder.
After modifying 'params.py', run the framework via 'main.py' with the following commands:

cd src
python main.py

Examples

params.py

import random
import numpy as np

random.seed(0)
np.random.seed(0)
RunID = 1

# SQL settings. Set these for your MySQL instance.
user = ''
password = ''
host = ''
database = ''


uml = {
    'Comment': '', # Any comment to express more information about the configuration.
    'RunId': RunID, # A unique number to identify the configuration per run.

    'start': '2010-12-17', # First day of system activity
    'end': '2010-12-17', # Last day of system activity
    'lastRowsNumber': 100000, # Number of dataset rows sampled for the whole process.

    'num_topics': 25, # Number of topics to extract from the corpus.
    'library': 'gensim', # Library used to extract topics from the corpus: 'gensim' or 'mallet'.

    'mallet_home': '--------------', # mallet_home path

    # The following parameters are used to generate the corpus from our dataset:
    'userModeling': True, # Aggregates all tweets of a user into one document
    'timeModeling': True, # Aggregates all tweets of a specific day into one document
    'preProcessing': False, # Applies traditional pre-processing methods to the corpus
    'TagME': False, # Applies TagMe to the raw dataset. Set it to False if the tagme-dataset is used.


    'filterExtremes': True, # Filter very common and very rare terms in all documents.
    'JO': False, # (JO:=JustOne) If True, just one topic is chosen for each document
    'Bin': True, # (Bin:=Binary) If True, each topic score above/below the threshold is set to 1/0
    'Threshold': 0.2, # A threshold for topic scores quantization.
    'UserSimilarityThreshold': 0.2 # A threshold for filtering low user similarity scores.
}

evl = {
    'RunId': RunID,
    'Threshold': 0, # A threshold for filtering low news recommendation scores.
    'TopK': 20 # Number of selected top news recommendation candidates.
}
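The JO, Bin, and Threshold parameters in the uml dictionary describe how document-topic scores are post-processed. The function below is a minimal sketch of that behavior under our own assumptions: JO keeps only the highest-scoring topic of each document, Bin then maps scores at or above Threshold to 1 and the rest to 0, and the order of the two steps and the use of >= are assumptions rather than confirmed details of TopicModeling.py. quantize_topic_scores is a hypothetical name.

```python
import numpy as np


def quantize_topic_scores(scores, just_one=False, binary=True, threshold=0.2):
    """Post-process a (num_documents, num_topics) matrix of topic scores.

    just_one (cf. 'JO'):        keep only each document's highest-scoring topic (ties -> first).
    binary (cf. 'Bin'):         map scores >= threshold to 1 and the rest to 0.
    threshold (cf. 'Threshold'): quantization cut-off; >= is an assumption.
    """
    x = np.asarray(scores, dtype=float)
    if just_one:
        kept = np.zeros_like(x)
        rows = np.arange(x.shape[0])
        best = x.argmax(axis=1)
        kept[rows, best] = x[rows, best]  # zero out every topic except the best one
        x = kept
    if binary:
        x = (x >= threshold).astype(int)
    return x


if __name__ == "__main__":
    s = [[0.1, 0.5, 0.4], [0.3, 0.3, 0.4]]
    print(quantize_topic_scores(s))                 # Bin only
    print(quantize_topic_scores(s, just_one=True))  # JO, then Bin
```

With the defaults shown in params.py ('JO': False, 'Bin': True, 'Threshold': 0.2), only the binarization step applies.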

Results

                                 News Recommendation         User Prediction
Method                           MRR     nDCG@5   nDCG@10    Precision   Recall   F1
Community Prediction
  Our approach                   0.255   0.108    0.105      0.012       0.035    0.015
  Appel et al. [PKDD’18]         0.176   0.056    0.055      0.007       0.094    0.0105
Temporal community detection
  Hu et al. [SIGMOD’15]          0.173   0.056    0.049      0.007       0.136    0.013
  Fani et al. [CIKM’17]          0.065   0.040    0.040      0.007       0.136    0.013
Non-temporal link-based community detection
  Ye et al. [CIKM’18]            0.139   0.056    0.055      0.008       0.208    0.014
  Louvain [JSTAT’08]             0.108   0.048    0.055      0.004       0.129    0.007
Collaborative filtering
  rrn [WSDM’17]                  0.173   0.073    0.08       0.004       0.740    0.008
  timesvd++ [KDD’08]             0.141   0.058    0.064      0.003       0.657    0.005
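The news recommendation columns use MRR and nDCG at cutoffs 5 and 10. The sketch below gives the standard binary-relevance definitions, together with a top-K selection step modeled on the TopK and Threshold entries of evl in params.py. It is an illustration, not the code in ModelEvaluation.py: the function names are hypothetical, binary relevance is assumed, and treating scores at or below Threshold as filtered out (strict >) is our assumption.

```python
import math


def top_k(scores, k=20, threshold=0.0):
    """Rank news items by score, drop scores <= threshold, keep the best k.

    scores: dict mapping news id -> recommendation score for one user.
    """
    kept = [(item, s) for item, s in scores.items() if s > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in kept[:k]]


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)  # best case: all relevant items ranked first
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_reciprocal_rank(rankings, relevants):
    """MRR over users: the average of each user's reciprocal rank."""
    rrs = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(rrs) / len(rrs) if rrs else 0.0


if __name__ == "__main__":
    ranked = top_k({"n1": 0.9, "n2": 0.1, "n3": 0.5}, k=20)
    print(ranked, ndcg_at_k(ranked, {"n3"}, 5))
```

Averaging ndcg_at_k over users with k=5 and k=10 gives figures comparable in form to the nDCG@5 and nDCG@10 columns, assuming the same relevance judgments.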

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

©2021. This work is licensed under a CC BY-NC-SA 4.0 license.

Contact

Email: ziaeines@uwindsor.ca - soroushziaeinejad@gmail.com
Project link: https://github.com/soroush-ziaeinejad/Community-Prediction

Acknowledgments

In this work, we use the dynamicgem library to temporally embed our user graphs. We would like to thank the authors of this library.

