
Mining Call For Papers

Setup

Python version: 3.6

Install the pdftotext system dependencies as described at:

https://pypi.org/project/pdftotext/

Install the Python packages:

pip install -r requirements.txt

Conference Crawling

Run Crawler

python crawl.py <CRAWL_TYPE>

Additional crawl configuration is located in cfp_crawl/config.py, which also specifies the crawl log and data save directories.
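The README does not show the contents of cfp_crawl/config.py; as a rough sketch, the settings it mentions (log and data directories, and the chromedriver path used later) might look like the fragment below. All names here are illustrative assumptions, not the repository's actual configuration:

```python
# Illustrative sketch only -- the real cfp_crawl/config.py defines its own names.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent      # repository root (assumption)
LOG_DIR = BASE_DIR / "logs"                     # crawl log directory (assumption)
DATA_DIR = BASE_DIR / "data"                    # crawled-data save directory (assumption)
CHROMEDRIVER_PATH = BASE_DIR / "chromedriver"   # Selenium driver executable (assumption)
```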

<CRAWL_TYPE> options:

wikicfp_latest crawls details of the most recent conferences listed on the WikiCFP homepage at http://www.wikicfp.com

wikicfp_all traverses and scrapes information from every conference series on WikiCFP, starting from http://www.wikicfp.com/cfp/series?t=c&i=A

conf_crawl assumes a database already populated with basic conference information from wikicfp_latest or wikicfp_all, and stores the HTML of each conference site. Crawled pages are saved to the directory specified in cfp_crawl/config.py.
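The wikicfp_all start URL indexes conference series by initial letter (i=A). Assuming the index simply iterates the letters A through Z (an assumption; the actual crawler may also handle digits or pagination), the series index pages can be enumerated like this:

```python
import string

# Start URL pattern from the README; the i= parameter selects the series initial.
SERIES_URL = "http://www.wikicfp.com/cfp/series?t=c&i={initial}"

def series_index_urls():
    """Yield one series-index URL per initial letter (A-Z assumption)."""
    for letter in string.ascii_uppercase:
        yield SERIES_URL.format(initial=letter)

urls = list(series_index_urls())
# urls[0] == "http://www.wikicfp.com/cfp/series?t=c&i=A"
```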

Notes on conf_crawl

A Selenium chromedriver is needed to better simulate organic access to conference sites (e.g. waiting for JavaScript elements to load). The chromedriver must match your Chrome version and can be downloaded from https://chromedriver.chromium.org/. Place the executable in this repository's root, or at the location specified in cfp_crawl/config.py.

Post Processing

The pipeline is run with python run.py <DB_FILEPATH> from ./post_processing.

Includes:

  • Extraction of page lines from the database.
  • Generation of the vocabulary and model training (ensure the database contains labelled data).
  • Prediction of labels for all page lines.
  • (To be updated) Line-level Named Entity Recognition training to improve extraction.
  • Extraction of <Person/Affiliation/Role-label> tuples.
  • (To be updated) Name disambiguation using dblp.
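The pipeline's actual extraction is model-driven (trained line classification and NER), but the shape of the target output can be illustrated with a toy rule. Assuming a hypothetical committee-line format of "Name, Affiliation (Role)" (an assumption for illustration only), a <Person/Affiliation/Role-label> tuple could be pulled out like this:

```python
import re

# Toy pattern for lines like "Jane Doe, Example University (Program Chair)".
# The real pipeline predicts labels with trained models; this regex only
# illustrates the <Person/Affiliation/Role-label> tuple shape.
LINE_PATTERN = re.compile(
    r"^(?P<person>[^,]+),\s*(?P<affiliation>[^(]+?)\s*\((?P<role>[^)]+)\)$"
)

def extract_tuple(line):
    """Return (person, affiliation, role) or None if the line doesn't match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return (match.group("person"), match.group("affiliation"), match.group("role"))

result = extract_tuple("Jane Doe, Example University (Program Chair)")
# result == ("Jane Doe", "Example University", "Program Chair")
```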
