Qinbo Li edited this page Jan 6, 2020 · 3 revisions

Welcome to the MINDFIRL wiki!

Environment and Deployment

The software will be delivered as a virtual appliance hosted on a virtual machine (VM). It can be used inside the VM, on the physical machine that runs the VM (outside the VM, like SAS), or from multiple machines on a Local Area Network (LAN). These goals will be reached in three steps:

  1. Inside the virtual machine
    The software is hosted and used inside the virtual machine. Different users will use the software in turn in this setting.

  2. Outside the virtual machine, on the same physical machine
    The software is hosted inside the virtual machine but used from the physical machine outside it, like SAS. Different users will use the software in turn in this setting.

  3. Across the Local Area Network
    The software is hosted inside the virtual machine, and is accessible on the Local Area Network.
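
The three settings differ mainly in where the server has to be reachable from. The following is a minimal sketch, assuming the software is served over a network socket from inside the VM (the page does not say how it is served). The names `DeploymentMode` and `bind_address` and the networking notes in the comments are illustrative assumptions, not part of MINDFIRL.

```python
from enum import Enum


class DeploymentMode(Enum):
    """The three settings listed above (names are hypothetical)."""
    INSIDE_VM = 1      # used from inside the VM
    HOST_MACHINE = 2   # used from the physical machine that runs the VM
    LAN = 3            # used from other machines on the LAN


def bind_address(mode: DeploymentMode) -> str:
    """Return the interface the server would listen on (assumed convention)."""
    if mode is DeploymentMode.INSIDE_VM:
        # Loopback only: nothing outside the guest OS can connect.
        return "127.0.0.1"
    # Settings 2 and 3 need the guest to accept outside connections. Which
    # clients can actually reach it then depends on the VM network setup
    # (for example host-only or NAT port forwarding vs. bridged networking).
    return "0.0.0.0"
```

Under this assumption, moving from step 1 to steps 2 and 3 is mostly a matter of VM network configuration rather than a change to the software itself.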

VM guest OS:
I spoke to a friend who is an expert on operating systems and Linux, and he too recommended CentOS for our use case. He also suggested an alternative, Security-Enhanced Linux (SELinux): we could install any distribution (say, Ubuntu) and enable SELinux in its kernel to make it more secure. He also suggested taking a look at SUSE.

Features

  1. User management
    a. User log in and log out
    b. User sign up
    The software comes with an admin account, whose password can be reset on first login. Users can sign up as a PI or as a normal user.
    c. Forgot password
    The admin can create a password reset link for a user and give it to them offline.

  2. Project
    a. PI
    A PI can create projects, assign projects to users (including themselves), and check the progress of a project. A PI can also split a project into parts, assign the parts to different users, and check each user's progress.
    b. User
    A user can see the projects assigned to them and their own progress on each project.

  3. Working page
    When a user chooses a project to work on, they enter the working page. They can exit the working page at any time; their progress and the opened cells are saved.

  4. Data
    When a PI creates a project, they need to import the data, which should be two CSV files. The PI then needs to specify the data type of each column: String, Date, Category, or Number.

  5. Privacy budget
    A PI can set a privacy budget limit when creating a project. The budget applies per user across the full project.
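
A per-user budget check could look like the sketch below. The field names `kapr_limit` and `current_kapr` are taken from the sample project document in the MongoDB section of this page. How the cost of an action is computed is not described here, so `cost` is a hypothetical input and `try_spend` is an illustrative helper, not MINDFIRL's actual logic.

```python
from dataclasses import dataclass


@dataclass
class BudgetState:
    kapr_limit: float           # limit set by the PI when creating the project
    current_kapr: float = 0.0   # budget this user has spent so far


def try_spend(state: BudgetState, cost: float) -> bool:
    """Spend `cost` if it fits within the limit; otherwise change nothing."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if state.current_kapr + cost > state.kapr_limit:
        return False  # the action would exceed this user's budget
    state.current_kapr += cost
    return True
```

Because the budget is per user on the full project, one such state would be kept for each assignee rather than one per page or per pair.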

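For the four column types named under Data above, an import step could check values with something like the following sketch. The function name `is_valid` and the default date format are assumptions: `%m/%d/%Y` is only guessed from dates such as 07/04/1946 in the sample rows on this page, and the real import rules may differ.

```python
from datetime import datetime

# The four column types a PI can choose from.
COLUMN_TYPES = ("String", "Date", "Category", "Number")


def is_valid(value: str, column_type: str, date_format: str = "%m/%d/%Y") -> bool:
    """Check one CSV cell against the declared type of its column."""
    if column_type not in COLUMN_TYPES:
        raise ValueError(f"unknown column type: {column_type}")
    if column_type == "Number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if column_type == "Date":
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            return False
        return True
    # String and Category accept any text in this sketch.
    return True
```
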
Databases

  1. MongoDB
    MongoDB will be used as the persistent database for the software (users, project settings, progress).

  2. Redis
    Redis will be used as the in-memory database (Working page).

  3. MySQL or CSV file
    MySQL or a CSV file will be used as the database for the records that need to be linked.

MongoDB

use mindfirl

collections:
db.createCollection("users")
db.createCollection("projects")
db.createCollection("log")

Example document in the projects collection:

{
  "pid": "861fd2be35278a34d161dfb52358b6de7d528f9640d8fb955cb32209",
  "project_name": "project1",
  "project_des": "This is a demo project.",
  "owner": "user1",
  "created_by": "blocking",
  "blocking": "fn",
  "file1_path": "/db/csv/project1.csv",
  "file2_path": "/db/csv/project1.csv",
  "intfile_path": "data/internal/user1_p1_intfile.csv",
  "pairfile_path": "data/internal/user1_p1_pairfile.csv",
  "block_id": [ [1,2,3,4],[5,6,7,8,9] ],
  "result_path": "/data/internal/user1_p1_result.csv",
  "assignee": ["user1", "user2"],
  "assignee_stat": [
    {
      "assignee": "user1",
      "pf_path": "data/internal/owner_pname_assignee_pf.csv",
      "result_path": "data/internal/owner_pname_assignee_result.csv",
      "current_page": 1,
      "page_size": 6,
      "kapr_limit": 15,
      "current_kapr": 0,
      "pair_idx": 15,
      "total_pairs": 30
    },
    {
      "assignee": "user1",
      "pf_path": "data/internal/pf_file.csv",
      "result_path": "data/internal/owner_pname_assignee_result.csv",
      "current_page": 1,
      "page_size": 6,
      "kapr_limit": 15,
      "current_kapr": 0,
      "pair_idx": 15,
      "total_pairs": 30
    }
  ]
}
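
The per-assignee fields in this document are enough to derive the progress that PIs and users see. A minimal sketch, assuming `pair_idx` counts the pairs an assignee has already worked through out of `total_pairs` (the page does not define these fields); `assignee_progress` is a hypothetical helper name.

```python
def assignee_progress(project: dict) -> dict:
    """Map each assignee to the fraction of their pairs completed (0.0 to 1.0)."""
    progress = {}
    for stat in project["assignee_stat"]:
        total = stat["total_pairs"]
        # Guard against an empty assignment to avoid dividing by zero.
        progress[stat["assignee"]] = stat["pair_idx"] / total if total else 0.0
    return progress
```

For an entry with `pair_idx` 15 and `total_pairs` 30, as in the sample, this gives 0.5.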

Example document in the log collection:

{
  "username": "user1",
  "timestamp": 1538972114,
  "url": "mindfirl/record_linkage/1",
  "log": "pid: 123456, cell: 1-1-1, result: success"
}
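
The `log` field packs several `key: value` pairs into one string. The sketch below builds and parses entries of that shape. It assumes the timestamp is Unix time in seconds and that values never contain commas; `make_log_entry` and `parse_log_message` are hypothetical helper names, not functions from the MINDFIRL code base.

```python
import time


def make_log_entry(username: str, url: str, **fields) -> dict:
    """Build a log document shaped like the example above."""
    return {
        "username": username,
        "timestamp": int(time.time()),  # assumed: Unix time in seconds
        "url": url,
        "log": ", ".join(f"{key}: {value}" for key, value in fields.items()),
    }


def parse_log_message(message: str) -> dict:
    """Split a 'key: value, key: value' string back into a dict of strings."""
    fields = {}
    for part in message.split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that have no 'key: value' shape
            fields[key.strip()] = value.strip()
    return fields
```

Storing the fields as a nested document instead of a string would make them directly queryable in MongoDB; the string form is kept here only because it matches the example.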

Record linkage

pair_file

file: user_projectname_pairfile.csv
info: a file of pairs generated by blocking
format: ID,voter_reg_num,first_name,last_name,dob,sex,race,type,file_id,cntfn,cntln,same

file: user_projectname_pf.csv
info: a helper file for mindfirl record linkage display
example row: 1,1-B, 1000000657, LYNN, WILDING, 07/04/1946, M, W, **********, ****, *******, **/**/****, *, *, 683, 5, 1, 1

file: user_projectname_intfile.csv
info: an intermediate result file
format: ID,voter_reg_num,first_name,last_name,dob,sex,race,fileid,gid
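
In the sample row of the `pf` helper file, each clear-text value is followed later in the row by a masked copy in which letters and digits are replaced by `*` and separators such as `/` are kept. A minimal sketch of that masking, inferred only from the sample row; `mask` is a hypothetical name and the actual display logic is not described on this page.

```python
def mask(value: str) -> str:
    """Replace every letter and digit with '*', keeping separators such as '/'."""
    return "".join("*" if ch.isalnum() else ch for ch in value)


if __name__ == "__main__":
    # Clear-text fields from the sample row above.
    for field in ["1000000657", "LYNN", "WILDING", "07/04/1946", "M", "W"]:
        print(field, "->", mask(field))
```

Keeping the separators and the length preserves the shape of each value while hiding its content.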
