Small CLI tool that scrapes all job posts from emplea.do, then counts the presence of different technologies in each post.
```
.
├── /data/
│   ├── /results/             # Directory where results are saved after running calculate.py.
│   └── /source/              # Directory where the job posts to be analyzed are saved.
│       ├── data.json         # Generated by 'python3 extract.py'. Contains all job posts from emplea.do in JSON format.
│       └── technologies.json # List of programming languages, frameworks...
├── extract.py                # Scrapes every job post on emplea.do. Just run it with 'python3 extract.py'.
├── calculate.py              # Counts the presence of every language/framework and saves the results in the /data/results/ directory.
└── requirements.txt          # Project requirements. Install them with 'pip install -r requirements.txt'.
```
It does 3 things:

- Extracts all job posts from emplea.do, then saves them to `/data/source/data.json`.
- Takes a list of programming languages and frameworks from `/data/source/technologies.json` and counts occurrences of those technologies using the regular expression `(^|[^a-zA-Z])WORD([^a-zA-Z]|$)` after normalizing the job posts to lower case.
- Saves the results to `/data/results/` by year and by category of technology.
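The matching step described above can be sketched as follows. This is a minimal illustration of the word-boundary regex, not the actual `calculate.py`; the helper name and sample posts are hypothetical:

```python
import re

def count_technology(posts, word):
    """Count how many posts mention `word` as a standalone token.

    The pattern mirrors the regex above: the word must be bounded by the
    start/end of the text or a non-letter character, so "java" does not
    match inside "javascript".
    """
    pattern = re.compile(r"(^|[^a-zA-Z])" + re.escape(word) + r"([^a-zA-Z]|$)")
    return sum(1 for post in posts if pattern.search(post.lower()))

posts = ["Senior Java Developer", "JavaScript/React engineer", "We use Java and SQL"]
print(count_technology(posts, "java"))        # 2 ("JavaScript" is not counted)
print(count_technology(posts, "javascript"))  # 1
```

`re.escape` matters here because several entries in `technologies.json` (e.g. `c++`, `asp.net`) contain regex metacharacters.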
```json
{
    "languages": [["javascript"], "html", "css", "python", "java", "c#", "php", "typescript", "c++", "kotlin", "ruby", ["assembly", "asm", "ensamblador"], "swift", "rust", ["objective-c", "objective c"], "scala", "perl", "haskell", "julia", "delphi", "dart"],
    "frontend": ["jquery", ["react", "reactjs", "react.js"], ["angular", "angularjs", "angular.js"], ["vue", "vuejs", "vue.js"], "svelte"],
    "frameworks": [["asp.net", "asp net", "net core", ".net core", "asp", "asp.net core", ".net"], ["express", "expressjs", "express.js"], "spring", "jsf", "grails", "django", "rails", "flask", "laravel", "symfony", "gatsby", "drupal", ["node", "nodejs", "node.js"], "wordpress"],
    "databases": ["mysql", ["postgresql", "postgres", "postgre"], "microsoft sql", "sqlite", "mongodb", "redis", "mariadb", "firebase", ["elasticsearch", "elastic search"], "dynamodb", "cassandra", "couchbase"],
    "clouds": [["aws", "amazon web services", "amazon web service", "amazon cloud"], "azure", ["google cloud", "gc"]],
    "mobile": [["react native", "reactnative"], "flutter", ["cordova", "cordovajs", "cordova-js"], "phonegap", "ionic", "xamarin", "nativescript"]
}
```
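In this file each entry is either a single name or a list of aliases for the same technology. A plausible way to tally an alias group (an assumption for illustration, not taken from `calculate.py`) is to count a post once if any of its aliases matches:

```python
import re

def mentions(post, word):
    # Same word-boundary regex the project describes, applied after lowercasing.
    pattern = re.compile(r"(^|[^a-zA-Z])" + re.escape(word) + r"([^a-zA-Z]|$)")
    return pattern.search(post.lower()) is not None

def count_group(posts, entry):
    """`entry` is a string or a list of alias strings, as in technologies.json."""
    aliases = entry if isinstance(entry, list) else [entry]
    return sum(1 for post in posts if any(mentions(post, alias) for alias in aliases))

posts = ["We need a React.js dev", "Angular + Node", "Backend in PHP"]
print(count_group(posts, ["react", "reactjs", "react.js"]))  # 1
print(count_group(posts, "node"))                            # 1
```

Counting per post (rather than per occurrence) keeps a single post that repeats "React" three times from inflating the totals.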
- Clone the repo with `git clone https://github.com/ivanubi/emplea.do-analyzer.git`.
- Install the requirements with `pip install -r requirements.txt`.
- Extract all of emplea.do's job posts with `python3 extract.py` (note: this will send more than 1200 requests to emplea.do in just a few seconds).
- Count the presence of all languages, frameworks, etc. by running `python3 calculate.py`.
- Check out the results in `/data/results`.
You can check out the raw results without even running the scripts by looking into the `/data/results/` directory or at these beautiful graphics.