50.043 Database GreatReads project 2019
Group members:
- Abinaya
- Bertha Han
- Kenneth Soon
- Nigel Chan
- Sumedha Gn
- Tamanna
Make sure you are in the project root folder.
- This project requires the boto3 and fabric Python packages. Install them with:
pip3 install boto3
pip3 install fabric
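Before running the setup scripts, it can help to confirm that both packages are visible to the interpreter that python3 resolves to. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of this project.

```python
import importlib.util

# The two packages this README asks you to install.
REQUIRED = ["boto3", "fabric"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages, install with: pip3 install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the pip3 commands above with the same Python installation you will use to run the scripts.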
production_setup.py
From the project root folder, run the following on your command line:
python3 production_setup.py
- A series of prompts will greet you:
  - <image of prompts here>
  - AWS credentials
  - Security group names that do not already exist, one for each of the three instances
  - A key name that does not already exist
- The program will then set up the MongoDB, MySQL and web server instances.
- Once the installation has been completed, a link to our web server will appear.
- To view the functionalities of our Web Server, here is the link
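Because the prompts expect security group and key names that do not exist yet, you may want to check your chosen names before starting. The sketch below is one possible way to do that with boto3 and is not taken from production_setup.py. The helper names find_conflicts and existing_names, and the region in the usage comment, are assumptions for illustration. describe_security_groups and describe_key_pairs are standard boto3 EC2 client calls; accounts with many security groups may need pagination, which this sketch omits.

```python
def find_conflicts(requested, existing):
    """Return the requested names that already exist, in the order requested."""
    taken = set(existing)
    return [name for name in requested if name in taken]


def existing_names(region_name):
    """Fetch current security group and key pair names (needs AWS credentials)."""
    import boto3  # imported here so find_conflicts can be used without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    groups = [g["GroupName"] for g in ec2.describe_security_groups()["SecurityGroups"]]
    keys = [k["KeyName"] for k in ec2.describe_key_pairs()["KeyPairs"]]
    return groups, keys


# Example usage (hypothetical names and region, requires configured credentials):
#   groups, keys = existing_names("us-east-1")
#   print(find_conflicts(["gr-mongo", "gr-mysql", "gr-web"], groups))
#   print(find_conflicts(["gr-key"], keys))
```

An empty list from find_conflicts means none of the requested names is already in use.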
analytics_functions.py
For this section, once again type the following into your command line and run:
python3 hadoop_spark_setup.py
- As before, the command line will prompt you for the necessary information:
  - Number of instances to create
  - Whether to choose your own AMI and instance type or use the defaults (default: Ubuntu 18.04)
- The command line will then execute the relevant scripts, which takes approximately 27 minutes, so you may want to do something else in the meantime. Once the script has completed, visit the links shown in the terminal to open the Hadoop and Spark UIs and check that the setup was successful.
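Rather than refreshing the UI links by hand, you could poll them until they respond. This is a minimal sketch using only the standard library; the function names, retry count and delay are assumptions, and the URLs should be the ones printed by the script in your terminal.

```python
import time
import urllib.request


def is_up(url, timeout_s=5):
    """Return True if the URL answers an HTTP GET without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts derive from OSError
        return False


def wait_until_up(urls, check=is_up, attempts=30, delay_s=60, sleep=time.sleep):
    """Poll until every URL passes `check`; return the URLs that are still down."""
    pending = list(urls)
    for attempt in range(attempts):
        pending = [url for url in pending if not check(url)]
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next round of checks
    return pending


# Example usage: paste the Hadoop and Spark UI links from the terminal.
#   still_down = wait_until_up(["<hadoop UI link>", "<spark UI link>"])
#   print("All UIs reachable" if not still_down else still_down)
```

The check and sleep parameters are injectable only so the polling loop can be exercised without a network connection.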
get_analytics.py
To get the Pearson Correlation and TF-IDF results, run:
python3 get_analytics.py
- The server will start the task, and results are generated in approximately 10 minutes. The Pearson correlation will be printed in the terminal.
- The second function, TF-IDF, ran into OutOfMemory errors and has therefore been commented out. Use this project at your own discretion.
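For readers unfamiliar with the two analytics, here is a small single-machine sketch of what each one computes. It is not the project's Spark job: this README does not say which columns get_analytics.py correlates, and the whitespace tokenizer and the tf * log(N / df) weighting below are assumptions chosen for simplicity.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With this weighting, a term that appears in every document scores zero, while terms unique to one document score highest. Holding all documents in memory at once, as this sketch does, only works for small inputs.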