loanAnalyser

In this project, we use Spark to analyze loan applications in Wisconsin (WI). We load the data into Hive tables and views so that it is easy to query. The big table (loans) stores many of its values as IDs; we join these columns against other tables/views to resolve what each ID means. In addition to the analysis, we study the performance impact of caching and bucketing.

Learning objectives:

load data into Hive tables and views;

write queries that use filtering, joining, grouping, and windowing;

interpret Spark query explanations;

optimize queries with bucketing and caching.
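
As a rough illustration of the query shapes listed above, the sketch below filters, joins an ID column against a lookup table, groups, and then ranks with a window function. It is not taken from the project notebooks: it uses Python's built-in sqlite3 module instead of Spark and Hive so that it runs without a cluster, and everything except the `loans` table name (the `purposes` table, all column names, and the toy rows) is made up for the example. The SELECT is plain SQL and should carry over to Spark SQL with little change, though that is an assumption rather than something tested here.

```python
import sqlite3

# In-memory database standing in for the Hive tables (assumption: the real
# project stores these in Hive and queries them through Spark).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (id INTEGER, purpose_id INTEGER, county TEXT, amount REAL);
    CREATE TABLE purposes (id INTEGER, name TEXT);  -- hypothetical lookup table
    INSERT INTO purposes VALUES (1, 'Home purchase'), (2, 'Refinancing');
    -- Toy rows invented for illustration only.
    INSERT INTO loans VALUES
        (1, 1, 'Dane', 200.0),
        (2, 1, 'Dane', 300.0),
        (3, 2, 'Dane', 150.0),
        (4, 1, 'Milwaukee', 100.0),
        (5, 2, 'Milwaukee', 250.0),
        (6, 2, 'Milwaukee', 50.0);
""")

# Filter (WHERE), join (ID -> readable name), group (SUM per county/purpose),
# then window (RANK within each county). Window functions need SQLite 3.25+.
query = """
    SELECT county, purpose, total,
           RANK() OVER (PARTITION BY county ORDER BY total DESC) AS rnk
    FROM (
        SELECT l.county AS county, p.name AS purpose, SUM(l.amount) AS total
        FROM loans AS l
        JOIN purposes AS p ON l.purpose_id = p.id
        WHERE l.amount >= 100
        GROUP BY l.county, p.name
    ) AS totals
    ORDER BY county, rnk
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these toy rows, the loan of 50 is filtered out before grouping, so each county ends up with two ranked purposes. In Spark, the same join is where bucketing both tables on the join key could matter, since it can let the engine avoid a shuffle.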


smileysim01/loanLab
