employability

A study of postsecondary graduate employability using topic modeling.

Project Overview

The research goal of this project is to determine the nature of the overlap of those core, academic skills being taught at the postsecondary level and those being expected at the entry level in the workforce. In a nutshell, the research goal is to study how well universities are preparing students for the workforce: to what degree are they promoting graduate employability?

This repository defines an open source software tool to perform that analysis. The core concept leveraged is topic modeling, a method from machine learning and natural language processing. Topic modeling is used to infer concepts from large datasets of job postings and course descriptions.

Installation

Dependencies

docker and docker-compose
java (optional)
sbt (optional)

Import data

docker-compose up -d elasticsearch
./elasticsearch/bin/import-data small  # {small|medium|large}

Start the server

docker-compose up -d web

You should be able to acccess the vis server at localhost:9000.

Data Ingestion

If you would like to run the whole ingestion and analysis process, there are a few more steps.

data.world API token

Register for an account at Data World and export your API token.

export DATA_WORLD_API_TOKEN=

Start up background services

docker-compose up -d elasticsearch kibana postgres

Run data pipelines

sbt ingest/run
sbt preprocess/run

Executing LDA

With default parameters

sbt analysis/run

Configuring LDA

You can modify the behavior of LDA through environment variables. Some pre-defined configurations are made available for you.

# Source one of these before running analysis/run.
source ./analysis/config/small
source ./analysis/config/medium
source ./analysis/config/large

Exporting data

./elasticsearch/bin/export-data NEW_SNAPSHOT_ID

This will do several things:

create a local snapshot repository in your Elasticsearch cluster
- this lives on your local filesystem: ./data/elasticsearch-snapshots/local/
create a new snapshot NEW_SNAPSHOT_ID in the local repository

Project Modules

`core`

Core components, models, and glue code.

"net.rouly" % "employability-core" % "x.x.x"

`elasticsearch`

Elasticsearch read/write services. Interaction is defined using Reactive Streams.

"net.rouly" % "employability-elasticsearch" % "x.x.x"

`postgres`

Postgres read/write services. Interaction is defined using Reactive Streams.

"net.rouly" % "employability-postgres" % "x.x.x"

`ingest`

Entry point application to ingest raw data into Elasticsearch.

Raw data is accepted from the following data providers:

data.world: add data set definitions under resources/datasets/data.world/

"net.rouly" % "employability-ingest" % "x.x.x"

`preprocess`

Entry point application to pre-process and clean ingested data. Cleaned and prepared data is exported to Postgres.

"net.rouly" % "employability-ingest" % "x.x.x"

`analysis`

Entry point application to read processed data from Postgres and execute the primary topic modeling steps. Topics are output to Elasticsearch.

"net.rouly" % "employability-analysis" % "x.x.x"

`web`

User facing web application to explore the generated topics and render various statistics about them.

"net.rouly" % "employability-web" % "x.x.x"

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
analysis		analysis
core/src/main		core/src/main
data/elasticsearch-snapshots		data/elasticsearch-snapshots
demo		demo
elasticsearch		elasticsearch
ingest/src/main		ingest/src/main
paper		paper
postgres/src/main/scala/net/rouly/employability/postgres		postgres/src/main/scala/net/rouly/employability/postgres
preprocess/src/main		preprocess/src/main
project		project
web		web
.gitattributes		.gitattributes
.gitignore		.gitignore
.scalariform.conf		.scalariform.conf
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt
docker-compose.yml		docker-compose.yml

License

jrouly/employability

Folders and files

Latest commit

History

Repository files navigation

employability

Project Overview

Installation

Dependencies

Import data

Start the server

Data Ingestion

data.world API token

Start up background services

Run data pipelines

Executing LDA

With default parameters

Configuring LDA

Exporting data

Project Modules

core

elasticsearch

postgres

ingest

preprocess

analysis

web

About

Topics

Resources

License

Stars

Watchers

Forks

Languages