Side Effects Management

Getting Started

Required Software

MySQL and MySQL Workbench (https://dev.mysql.com/downloads/installer/ and https://dev.mysql.com/downloads/workbench/)
Postman (https://www.getpostman.com/downloads/)
Install the required packages.

pip3 install -r requirements.txt

EC2 Instance

ubuntu@18.237.156.136
Ask Kien Nguyen (kien.nguyen@usc.edu) for the .pem file
To SSH into the EC2 instance where the server-side code is located, run:

ssh -i /path/my-key-pair.pem ubuntu@18.237.156.136

For more information, visit https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html

Running Locally

Navigate to the top-level of the project directory.
On the command line, type

python wsgi.py

Now you should see something like the following in your terminal.

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 258-984-237
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Now you should be able to test API requests via Postman and see any associated changes in MySQL Workbench.
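As an alternative to Postman for a quick check, the sketch below builds and sends a request to the /search endpoint of the local server. It is only an illustration: the HTTP method (GET), the parameter name "query", and the JSON response format are assumptions, so check the Postman collection for the actual request shape.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # address printed by the development server


def build_search_url(text, base_url=BASE_URL, param="query"):
    """Return the URL for a GET request to /search.

    The parameter name "query" is an assumption, not taken from the project.
    """
    return f"{base_url}/search?{urllib.parse.urlencode({param: text})}"


def search(text):
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(build_search_url(text), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call search(...) once the server is running.
    print(build_search_url("nausea after chemo"))
```

Calling search("nausea after chemo") requires the server from the previous step to be running.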

NLP - /search endpoint

Read the user string passed as parameter via the API call
Pre-process the query string and compute its TF-IDF vector.
The TF-IDF values for the questions in the database are pre-calculated and dumped to a .pkl file.
The cosine similarity is computed between the user input's TF-IDF vector and each of the pre-calculated TF-IDF vectors in the .pkl file.
The ID of the question with the highest similarity is returned to an internal function, which pulls the corresponding links and comments from the database and sends them back to the frontend app.
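The steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in NLPCode.py: the sample questions, the tokenizer, the smoothed IDF formula, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical question IDs and texts standing in for rows in the database.
QUESTIONS = {
    1: "How do I manage nausea after chemotherapy?",
    2: "What helps with hair loss during treatment?",
    3: "Is fatigue normal after radiation therapy?",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf(text, idf):
    """Return a sparse TF-IDF vector (term -> weight); unknown terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, vectors, idf):
    """Return the ID of the question whose vector is most similar to the query."""
    query_vector = tfidf(query, idf)
    return max(vectors, key=lambda qid: cosine(query_vector, vectors[qid]))


# In the project this part is pre-calculated and loaded from the .pkl file.
IDF = fit_idf(list(QUESTIONS.values()))
VECTORS = {qid: tfidf(text, IDF) for qid, text in QUESTIONS.items()}

if __name__ == "__main__":
    print(best_match("losing hair", VECTORS, IDF))
```

The returned ID would then be used to look up the matching links and comments in the database.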

TrainingCode.py - Run this script every time the database is modified. It generates a .pkl file containing the TF-IDF matrix, which NLPCode.py then reads to compute the similarities.
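The save-and-reload cycle between the two scripts might look like the sketch below. It is not the contents of TrainingCode.py: plain term counts stand in for the real TF-IDF weights, and the function names and sample questions are hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical rows; the real script would read the questions from the database.
QUESTIONS = {
    1: "how do i manage nausea",
    2: "what helps with hair loss",
}


def build_vectors(questions):
    """Build one vector per question ID (term counts stand in for TF-IDF here)."""
    return {qid: dict(Counter(text.lower().split())) for qid, text in questions.items()}


def save_vectors(vectors, path):
    """Training step: dump the vectors to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(vectors, handle)


def load_vectors(path):
    """Query step: read the vectors back. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vectors.pkl")
    save_vectors(build_vectors(QUESTIONS), path)
    print(load_vectors(path))
```

Because the file is only a snapshot, it goes stale as soon as questions are added or edited, which is why the training script has to be re-run after each database change.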

Libraries required: nltk, numpy, scikit-learn (for cosine_similarity)

Testing Routes using Postman

Open Postman.
File -> Import -> "SideEffectsApp(AWS).postman_collection.json"

Connecting to the MySQL DB instance on Amazon RDS

Instructions:

Open MySQL Workbench and navigate to the home page (MySQL Connections).
Open a new connection.
Set the "Hostname" field to "[Ask Kien Nguyen (kien.nguyen@usc.edu) for URL of Amazon RDS MySQL DB instance]" (excluding quotes)
Set the "Port" field to 3306
Set the "Username" field to "administrator" (don't include quotes)
In the "Password" field, click "Store in Vault ..." and enter "[Ask Kien Nguyen (kien.nguyen@usc.edu) for password]" (don't include quotes) as the password.
Click OK to connect.
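The same connection settings can also be written as a SQLAlchemy-style database URL for use from Python. The sketch below only assembles the string. The driver name "mysql+pymysql", the database name, and the example hostname are assumptions; the real hostname and password still have to be obtained as described above.

```python
from urllib.parse import quote_plus


def build_database_uri(host, password, database,
                       user="administrator", port=3306,
                       driver="mysql+pymysql"):
    """Assemble a URL of the form driver://user:password@host:port/database.

    The user name and password are percent-encoded so that characters such as
    "@" or "/" do not break the URL. The default driver is an assumption.
    """
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Placeholder values only; never hard-code the real password.
    print(build_database_uri("db.example.com", "p@ss/word", "side_effects"))
```

Keeping the password out of source code (for example, reading it from an environment variable) avoids committing it to the repository.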

Migrating Raw Data into MySQL

Download the first sheet of the "Raw Data" Google Sheets file in Google Drive as a CSV file and save it in the /datadump/ directory.
In the /datadump/ directory, run

python csv_to_mysql.py

The database should be updated. In the terminal, you should see the total number of questions and comments that now exist in the database.
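The sketch below illustrates one way a migration script can be safe to run repeatedly: uniqueness constraints plus "insert or ignore". It is not the code in csv_to_mysql.py. It uses an in-memory SQLite database instead of MySQL so that it runs anywhere, and the column names "question" and "comment" and the table layout are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical column layout; the real raw_data.csv may differ.
SAMPLE_CSV = """question,comment
Is nausea common?,Ginger tea helped me.
Is nausea common?,Small meals worked.
"""


def migrate(csv_file, conn):
    """Insert rows from csv_file, skipping questions and comments already stored.

    Returns the total number of questions and comments in the database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS question ("
        "id INTEGER PRIMARY KEY, text TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment ("
        "id INTEGER PRIMARY KEY, question_id INTEGER, text TEXT, "
        "UNIQUE (question_id, text))"
    )
    for row in csv.DictReader(csv_file):
        # The UNIQUE constraints turn duplicate inserts into no-ops.
        conn.execute(
            "INSERT OR IGNORE INTO question (text) VALUES (?)", (row["question"],)
        )
        (question_id,) = conn.execute(
            "SELECT id FROM question WHERE text = ?", (row["question"],)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO comment (question_id, text) VALUES (?, ?)",
            (question_id, row["comment"]),
        )
    conn.commit()
    questions = conn.execute("SELECT COUNT(*) FROM question").fetchone()[0]
    comments = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
    return questions, comments


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(migrate(io.StringIO(SAMPLE_CSV), connection))
    # A second run adds nothing because every row already exists.
    print(migrate(io.StringIO(SAMPLE_CSV), connection))
```

With MySQL the equivalent of "INSERT OR IGNORE" is "INSERT IGNORE"; an ORM-based script could instead query for an existing row before adding a new one.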

Tech Stack

Frontend:
- Swift
Backend:
- Python3
- Flask
- SQLAlchemy
- Nginx (to route traffic to port 80 of our EC2 instance)

Deep Dive

Frontend

Description of Files / File Structure

app
- SideEffects
- SideEffects.xcodeproj

Backend

The backend uses a client-server architecture. The Python Flask web framework was used to create a REST API. SQLAlchemy was used for its ORM (object-relational mapper), which makes it simple to perform CRUD (create, read, update, delete) operations on our MySQL database. The server-side code is hosted on an AWS EC2 instance, and the MySQL database is hosted on Amazon RDS.

Description of Files / File Structure

app (Contains all files regarding backend development)
- auth (Contains routes related to authentication)
  - routes.py
- comments (Contains routes related to the Comment class)
  - routes.py
- questions (Contains routes related to the Question class)
  - routes.py
- users (Contains routes related to the User class)
  - routes.py
- __init__.py (Initialization script that connects all the components of the Flask app, including the SQLAlchemy and LoginManager plug-ins. Also registers all Flask blueprints to link all routes to the application.)
- .env (Contains our environment variables. Make sure to update the SECRET_KEY!!!)
- config.py
- models.py (Contains the model classes for User, Question, and Comment.)
- vectorizer.joblib (Vectorizer saved with joblib, which stores and loads large arrays efficiently.)
- vectors.pkl (Pickle file containing the feature vectors for the questions in the database)
datadump (Contains files regarding data migration)
- csv_to_mysql.py (Script that takes raw_data.csv generated from the Google Spreadsheet containing all our data and populates the database with all the data. Don't worry about adding duplicate data upon successive runs of this script. It checks for duplicate data.)
- data_dump_11152019.sql (SQL script to create database tables, along with all data collected so far.)
- raw_data.csv (CSV file generated from our Google Spreadsheet. Keep column structure, or else csv_to_mysql.py script won't be able to migrate the latest data to the database.)
images (Directory to hold architecture diagrams and other images)
model (Contains files related to the NLP functions used to perform the 'search' function.)
- NLPCode.py (The code gets executed when there is a new query from the user.)
- TrainingCode.py (Run the code to generate the feature vectors for the question sets that are already available.)
tests (Contains all tests that we've conducted so far.)
- SideEffectsApp(AWS).postman_collection.json (Contains Postman tests that the Fall 2019 team conducted. Import this file in Postman to execute tests.)
.gitignore (Add sensitive files to .gitignore to prevent accidentally committing them to the online repository)
README.md (Contains technical documentation. Project motivation and background are contained in a separate Google Doc.)
requirements.txt (Run 'pip3 install -r requirements.txt' to install all dependencies in your virtual environment to get started)
side-effects-key-pair.pem (File containing SSH key for AWS EC2 instance. Ask Kien Nguyen (kien.nguyen@usc.edu) for this file (keep this information safe!))
wsgi.py (Entry point to start up the backend server/application. To start the backend application, go to your terminal and type in "python3 wsgi.py")
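The .env entry in the listing above asks for SECRET_KEY to be updated. One way to generate a fresh value and to fail early when it is missing is sketched below. The helper names are hypothetical, and the sketch assumes the contents of .env have already been loaded into the process environment; how this project loads the file is not shown here.

```python
import os
import secrets


def generate_secret_key(num_bytes=32):
    """Return a random hex string suitable for pasting into .env as SECRET_KEY."""
    return secrets.token_hex(num_bytes)


def load_secret_key(environ=os.environ):
    """Read SECRET_KEY from the environment and refuse to continue without it."""
    key = environ.get("SECRET_KEY")
    if not key:
        raise RuntimeError("SECRET_KEY is not set; add it to the .env file")
    return key


if __name__ == "__main__":
    # Prints a new random key each time it is run.
    print(generate_secret_key())
```

Failing at startup is a deliberate choice here: falling back to a hard-coded default key would silently weaken session security.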
