linkedIn_crawler

A Python script to run searches on LinkedIn and collect the results in JSON format.

So far, the script only works for people search and only provides the id, first name, last name, languages, previous companies, educations, and skills for each person found.
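For illustration, a single person record in the JSON output might look like the sketch below. Only the field list above is given, so the exact key names here are assumptions, not the script's actual output schema.

    {
        "id": "u12345",
        "firstName": "Jane",
        "lastName": "Doe",
        "languages": ["English", "Mandarin"],
        "previousCompanies": ["Acme Corp", "Example Inc"],
        "educations": ["Example University"],
        "skills": ["Python", "Data Mining"]
    }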

To run this script, you need to create your own configuration file (the default one is conf/config.json). Please follow the format of the default configuration file to create your own; a rough sketch of the expected shape is shown after the list below.

  1. You need to define the proxy info. If you don't need a proxy, just leave the 'proxy' array in config.json empty.

  2. You need to input your login email and password in the 'login' element of config.json.

  3. You need to input your own search rules. The rules should be the request parameters of an Advanced Search on LinkedIn. You can conduct a search in your browser, capture the request with the Chrome developer tools or Firefox Firebug, and put those request parameters into the 'searchRules' array in config.json.
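Putting the three rules together, a minimal config.json might look like the sketch below. Only the 'proxy', 'login', and 'searchRules' names come from the description above; the keys inside 'login' and the sample search parameters ('keywords', 'location') are assumptions for illustration, so check the default conf/config.json for the exact schema.

    {
        "proxy": [],
        "login": {
            "email": "you@example.com",
            "password": "your-password"
        },
        "searchRules": [
            {
                "keywords": "software engineer",
                "location": "San Francisco Bay Area"
            }
        ]
    }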

naive bayes classifier

A Python script that sets up a naive Bayes classifier to work on the data crawled by linkedin_crawler.

linkedin_crawler generates ARFF files as its final output, and these serve as the training/testing data set.
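As a rough sketch, a generated ARFF file might have the shape below. The attribute names and types are assumptions based on the person fields listed earlier, not the actual contents of temp/person_attr.arff.

    @relation person_attr

    @attribute id string
    @attribute num_languages numeric
    @attribute num_previous_companies numeric
    @attribute num_educations numeric
    @attribute num_skills numeric
    @attribute class {positive, negative}

    @data
    'u12345', 2, 3, 1, 12, positive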

e.g.

  1. Add your own login email and password in conf/config.json.

  2. Run the following commands under the current dir:

       $ python ./linkedin_crawler.py
       $ python ./classifer.py --train-file=temp/person_attr.arff
