A data analysis pipeline for job openings posted on Indeed 🔧

HelioNeves/mut

MUT

Market Understanding Tool

Python 3.6 | Code style: black | License: MIT

About

This project builds a data analysis pipeline for data science job openings posted on Indeed. The pipeline is not limited to data science, however: it can classify job openings from any sector.

This pipeline generates a .html file with:

  1. Clusters 2D Graph
  2. Clusters Keywords Ranking
  3. TF-IDF Ranking
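
To make the TF-IDF ranking in item 3 more concrete, here is a minimal sketch of how such a ranking could be computed with only the Python standard library. It is not taken from this project's code: the function names (`tokenize`, `tfidf_ranking`), the tokenization rule, and the choice to sum scores across documents are all assumptions made for illustration. The `toxic_words` set stands in for the word list described under `[toxicwords-file]`.

```python
import math
import re
from collections import Counter


def tokenize(text, toxic_words):
    """Lowercase, keep alphabetic runs, and drop excluded ("toxic") words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in toxic_words]


def tfidf_ranking(documents, toxic_words=frozenset(), top_n=5):
    """Return the top_n (term, score) pairs ranked by summed TF-IDF score.

    Hypothetical helper: the real pipeline may weight or aggregate differently.
    """
    tokenized = [tokenize(doc, toxic_words) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip documents that are empty after filtering
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores.most_common(top_n)


if __name__ == "__main__":
    # Made-up job descriptions, for illustration only.
    jobs = [
        "Data scientist with Python and SQL",
        "Data engineer with Spark and SQL",
        "Data analyst with Excel",
    ]
    for term, score in tfidf_ranking(jobs, toxic_words={"with", "and"}):
        print(f"{term}: {score:.3f}")
```

A term that appears in every document (here, "data") gets an inverse document frequency of zero, which is why it sinks to the bottom of the ranking even though it is the most frequent word.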

Check the "Brazillian Data Science Jobs Market: A Deep Analysis" on the web!

Project Details

Folders

Folder     Description
db/        Folder where your Scrapy database will be saved
output/    Folder where your graphs and results will be saved

Files

ARGS                 USAGE
[db-title]           Title of your Scrapy database (e.g., datascience_db)
[urls-file]          Name of the file containing your Indeed URLs (take a look at sample.urls)
[toxicwords-file]    Name of the file listing words to exclude from the analysis (take a look at sample.toxicwords)
[num-clusters]       Number of clusters to identify, given as a range (e.g., 2-8) or a single value (e.g., 8)
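
The `[num-clusters]` argument accepts two shapes, a single value or a range. The sketch below shows one way such a string could be parsed. It is an illustration, not this project's code: the function name `parse_num_clusters` is hypothetical, and treating the range as inclusive on both ends is an assumption.

```python
def parse_num_clusters(arg):
    """Parse '8' into [8] and '2-8' into [2, 3, 4, 5, 6, 7, 8].

    Hypothetical helper; assumes the range is inclusive on both ends.
    """
    if "-" in arg:
        low_text, high_text = arg.split("-", 1)
        low, high = int(low_text), int(high_text)
    else:
        low = high = int(arg)
    if low < 1 or high < low:
        raise ValueError(f"invalid cluster specification: {arg!r}")
    return list(range(low, high + 1))


if __name__ == "__main__":
    print(parse_num_clusters("8"))
    print(parse_num_clusters("2-8"))
```

Returning a list in both cases lets the caller loop over candidate cluster counts without checking which form was given.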

Requirements

Paraphrasing The Beatles: "All you need is Docker 🐳"

Install

1. Clone this repo 🍕
git clone https://github.com/HelioNeves/mut.git
cd mut
2. Basic building 🔧
docker build . -t mut

Running this awesome docker image

1. Start an interactive container from the image 🌈
docker run -ti --name MUT-env mut /bin/bash
2. Once inside the container, run the pipeline's Python scripts 🐍
Scrapy
python3 scraper.py [db-title] [urls-file]
Analytics app
python3 app.py [db-title] [toxicwords-file] [num-clusters]