Epoch Profiling Web App for Logistic Regression in Spark

Master's Project for UCSB's RACELab

This web application lets users profile how many epochs a particular cluster configuration can complete on their dataset in one hour. It is built with Node.js, Flask, and MongoDB, and it keeps costs low by purchasing machines on the AWS spot market where possible.

Building This Project

Clone this repository.
Change your working directory to the cloned repository.

Copy the template:

```
cp templates/aws_cfg.py.template cloud_configs/aws/aws_cfg.py
```

Modify the copied file to add your AWS Access ID and Secret Key.
If your AWS account has a sufficient spot instance limit, set launch_type = 'spot'. This option can yield considerable savings (up to 10x). Otherwise, keep launch_type = 'on-demand'.
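Because the build step below spends money, it can help to confirm which launch type is actually set before running it. The following is a minimal sketch, assuming the setting appears in aws_cfg.py as a plain quoted assignment such as launch_type = 'spot'; the helper name `launch_type_of` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print the value assigned to launch_type in a config file.
# Assumes a plain assignment like  launch_type = 'spot'  (single or double quotes).
# Commented-out lines are ignored; if the setting appears more than once,
# the last assignment wins, as it would in Python.
launch_type_of() {
  sed -n "s/^[[:space:]]*launch_type[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1" | tail -n 1
}

# Example (run from the repository root):
#   launch_type_of cloud_configs/aws/aws_cfg.py
```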

Copy your AWS private key to the path below and set its permissions to 600:

```
cp "<your private key's current path>" cloud_configs/aws/aws-key.pem
chmod 600 cloud_configs/aws/aws-key.pem
```
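SSH clients typically refuse a private key whose permissions are too open, so a quick check before building can save a failed run. This is a sketch only; `key_mode` and `check_key` are hypothetical helper names, and the `stat` fallback assumes either GNU coreutils (Linux) or BSD `stat` (macOS).

```shell
# Hypothetical helper: print a file's permission bits in octal (e.g. 600).
# Tries GNU stat first, then falls back to the BSD/macOS form.
key_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Hypothetical helper: verify the key exists and has mode 600.
# Defaults to the key path used in this README.
check_key() {
  key="${1:-cloud_configs/aws/aws-key.pem}"
  [ -f "$key" ] || { echo "missing: $key"; return 1; }
  [ "$(key_mode "$key")" = "600" ] || { echo "bad permissions on $key"; return 1; }
  echo "key OK: $key"
}

# Example (run from the repository root):
#   check_key
```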

Build the dependencies with the following command:
```
source setup_app.sh
```
Note: this requests an instance, creates a security group, builds an AWS AMI, and then terminates the instance. The total cost is at most $0.262 if launch_type = 'on-demand'.

Using the Web Application

To start the Node.js server and Flask endpoint:
```
source start_script.sh
```
Output is written to the server_logs directory.
You can view the web application interface at http://localhost:5000.

To stop the Node.js server and Flask endpoint:
```
source stop_script.sh
```
To clear the MongoDB database and delete the synthetic dataset files, all profiles, and all logs:
```
source clean_files.sh
```
WARNING: This deletes all data created by the app and resets it to a clean slate. Only run it if you do not need that data.
To add a dataset:
1. Upload your data to S3.
2. Make the dataset readable by all.
3. Copy the template:
```
cp templates/dataset.cfg.template data_configs/aws/"<your dataset's name>".cfg
```
4. Fill out the fields for the dataset file. For an example see templates/example-dataset.cfg.
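Steps 1 and 2 can be done in a single AWS CLI call. The sketch below assumes the AWS CLI is installed and configured, and that the target bucket permits public ACLs (many newer buckets block them by default, in which case a bucket policy is needed instead). The helper name `upload_public`, the `DRY_RUN` switch, and the bucket and key names in the example are all hypothetical.

```shell
# Hypothetical helper: upload a local file to S3 and make it readable by all.
# Arguments: <local file> <bucket> <object key>
# Set DRY_RUN=1 to print the command instead of running it.
upload_public() {
  src="$1"; bucket="$2"; key="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws s3 cp $src s3://$bucket/$key --acl public-read"
  else
    # --acl public-read is a standard option of `aws s3 cp`.
    aws s3 cp "$src" "s3://$bucket/$key" --acl public-read
  fi
}

# Example (placeholder names; prints the command without uploading):
#   DRY_RUN=1 upload_public my-data.csv my-bucket datasets/my-data.csv
```

Running with DRY_RUN=1 first is a cheap way to confirm the destination path before any data leaves your machine.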
