Welcome to NezukoDB automation!

50.043 Database GreatReads project 2019

Group members:

Abinaya
Bertha Han
Kenneth Soon
Nigel Chan
Sumedha Gn
Tamanna

1. Getting the Dependencies

Make sure you are in the project root folder.

This project requires the boto3 and fabric Python packages. Install them with:
pip3 install boto3
pip3 install fabric
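Before running the setup scripts, you can confirm that both packages are importable. The sketch below is illustrative and not part of the repository. It assumes only the two package names listed above and uses the standard library's `importlib.util.find_spec`.

```python
# Sketch: check that the packages installed above can be found by Python
# before running any of the setup scripts (not part of the original repo).
import importlib.util
import sys

REQUIRED_PACKAGES = ("boto3", "fabric")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install them with: pip3 install " + " ".join(missing))
        sys.exit(1)
    print("All dependencies are installed.")
```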

2. Setting up Production

production_setup.py

From the project root folder, run:

python3 production_setup.py
The script will then prompt you for the following:
- AWS credentials
- Security group names for the three instances (these groups must not already exist)
- A key pair name (this key must not already exist)
The script then sets up the MongoDB, MySQL, and web server instances.
Once the installation is complete, a link to our web server is printed.
To view the functionalities of our Web Server, here is the link
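Because the prompts require security group and key names that do not already exist, it can help to check your chosen names first. The following is a minimal sketch, assuming a boto3 EC2 client with credentials and a default region configured. The helper `find_existing_names` and the example names (`mongo-sg`, `mysql-sg`, `web-sg`, `nezuko-key`) are hypothetical and do not come from `production_setup.py`; the calls `describe_security_groups` and `describe_key_pairs` with `group-name` / `key-name` filters are standard boto3 EC2 client methods.

```python
# Sketch: check whether proposed security group / key pair names are already
# taken in the configured AWS region. Helper and names are hypothetical.


def find_existing_names(ec2_client, group_names, key_name):
    """Return (sorted list of existing group names, whether the key exists)."""
    groups = ec2_client.describe_security_groups(
        Filters=[{"Name": "group-name", "Values": list(group_names)}]
    )["SecurityGroups"]
    existing_groups = sorted({g["GroupName"] for g in groups})

    # Using a filter (rather than KeyNames=[...]) returns an empty list
    # instead of raising an error when the key does not exist.
    keys = ec2_client.describe_key_pairs(
        Filters=[{"Name": "key-name", "Values": [key_name]}]
    )["KeyPairs"]
    return existing_groups, len(keys) > 0


if __name__ == "__main__":
    import boto3  # needs AWS credentials and a default region configured

    ec2 = boto3.client("ec2")
    groups, key_taken = find_existing_names(
        ec2, ["mongo-sg", "mysql-sg", "web-sg"], "nezuko-key"  # hypothetical
    )
    if groups:
        print("Security groups already exist:", ", ".join(groups))
    if key_taken:
        print("Key pair already exists.")
    if not groups and not key_taken:
        print("All proposed names are free.")
```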

3. Setting up Analytics

analytics_functions.py

Again from the project root folder, run:

python3 hadoop_spark_setup.py
As before, the script will prompt you for the necessary information:
- The number of instances to create
- Whether to choose your own AMI and instance type or use the defaults (default AMI: Ubuntu 18.04)
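For orientation, here is one way the instance-count and AMI/instance-type answers could map onto a boto3 `run_instances` call. This is a sketch under stated assumptions, not the logic of `hadoop_spark_setup.py`: `DEFAULT_AMI_ID` is a placeholder (Ubuntu 18.04 AMI IDs differ per region), `DEFAULT_INSTANCE_TYPE` is an assumed value the README does not specify, and the helper name is hypothetical. `ImageId`, `InstanceType`, `MinCount`, `MaxCount`, and `KeyName` are real `run_instances` parameters.

```python
# Sketch: map the setup prompts onto boto3's EC2 run_instances arguments.
# DEFAULT_AMI_ID and DEFAULT_INSTANCE_TYPE are placeholders/assumptions.

DEFAULT_AMI_ID = "ami-REPLACE_WITH_UBUNTU_1804_ID"  # hypothetical placeholder
DEFAULT_INSTANCE_TYPE = "t2.medium"  # assumption, not stated by the project


def build_run_instances_kwargs(count, key_name, ami_id=None, instance_type=None):
    """Build keyword arguments for ec2_client.run_instances(**kwargs)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id or DEFAULT_AMI_ID,
        "InstanceType": instance_type or DEFAULT_INSTANCE_TYPE,
        # MinCount == MaxCount: launch exactly `count` instances or none.
        "MinCount": count,
        "MaxCount": count,
        "KeyName": key_name,
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials and a default region configured

    kwargs = build_run_instances_kwargs(count=3, key_name="nezuko-key")  # hypothetical key
    response = boto3.client("ec2").run_instances(**kwargs)
    print([inst["InstanceId"] for inst in response["Instances"]])
```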
The script then runs the relevant setup steps, which take approximately 27 minutes to complete (feel free to do something else in the meantime). Once it finishes, open the links printed in the terminal to check the Hadoop and Spark web UIs and confirm that the setup succeeded.
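If you prefer to check the UIs from a script rather than a browser, a small HTTP probe is enough. This sketch uses only the standard library. The host is a placeholder for whatever address the setup script prints, and the ports are common defaults (HDFS NameNode UI: 9870 on Hadoop 3.x, 50070 on Hadoop 2.x; Spark standalone master UI: 8080) that may not match what this project configures.

```python
# Sketch: probe the Hadoop/Spark web UIs printed by hadoop_spark_setup.py.
# Host and ports below are placeholders/common defaults, not project values.
import http.client
import urllib.request


def ui_is_up(url, timeout=5.0):
    """Return True if `url` answers with an HTTP 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    # URLError, HTTPError and socket timeouts are all OSError subclasses.
    except (OSError, http.client.HTTPException):
        return False


if __name__ == "__main__":
    master = "ec2-xx-xx-xx-xx.compute.amazonaws.com"  # placeholder host
    for name, url in [
        ("HDFS NameNode UI", f"http://{master}:9870"),
        ("Spark master UI", f"http://{master}:8080"),
    ]:
        print(name, "up" if ui_is_up(url) else "not reachable", url)
```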

4. Getting the Analytics

get_analytics.py

To compute the Pearson correlation and TF-IDF results, run:

python3 get_analytics.py
The server will start the task, and results are generated in approximately 10 minutes. The Pearson correlation is printed in the terminal.
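For reference, the Pearson correlation coefficient between two equal-length sequences is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The plain-Python sketch below implements that formula. It is a generic illustration, not the Spark job in `get_analytics.py`, and this README does not say which two variables the project correlates.

```python
# Sketch: Pearson correlation coefficient in plain Python (generic formula,
# not the project's Spark implementation).
import math


def pearson(xs, ys):
    """r = sum((x-mx)(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))"""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
```

On a Spark DataFrame the same statistic is available as `df.stat.corr("col_a", "col_b")`, which computes the Pearson coefficient; the column names there are placeholders.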
The second function, TF-IDF, ran into OutOfMemory errors and has therefore been commented out. Use this project at your own discretion.
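To illustrate what the disabled TF-IDF step computes, here is a small in-memory sketch. It uses raw term counts for TF and the smoothed IDF documented for Spark ML's `IDF` transformer, idf(t) = ln((N + 1) / (df(t) + 1)), where N is the number of documents and df(t) the number of documents containing t. This is a generic example; the project's commented-out job may tokenize or weight terms differently.

```python
# Sketch: TF-IDF with raw term counts and Spark ML's smoothed IDF formula,
# idf(t) = ln((N + 1) / (df(t) + 1)). Generic illustration only.
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


docs = [["good", "book", "good"], ["bad", "book"]]
for weights in tf_idf(docs):
    print(weights)
```

With this smoothing, a term that appears in every document (here `book`) gets weight 0, so it contributes nothing to distinguishing documents.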
