Kaiming Tao edited this page Jun 21, 2022 · 15 revisions

We're curating a database of SARS-CoV-2 drug resistance data. The data are extracted from publications (mostly papers) and cover treatments with monoclonal antibodies (mAbs), plasma (convalescent and vaccinee plasma), and other antiviral drugs.

This document explains how to contribute to the database, how to enter data, and how we manage the database.

The data curation pipeline

  1. Create a new issue page titled with the author name and the DOI of the paper, then read the paper. Discuss any questions or comments on this issue page.
  2. Extract the key data points and organize them in several Excel spreadsheets, each with the corresponding table headers.
  3. Fork the covid-drdb-payload repository, convert the Excel spreadsheets to CSV files that follow the corresponding naming schemes, and save each CSV file into its corresponding folder.
  4. Create a pull request, which triggers automatic data consistency checks and a review.
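Before opening the pull request, a contributor may want to catch simple problems locally. The following is a minimal Python sketch, not the repository's actual checker: it only verifies that required columns exist and that date columns use YYYY-MM-DD. The column names in the usage example (ref_name, doi, date_added) are hypothetical placeholders.

```python
import csv
import re

# Dates must be written as YYYY-MM-DD, as the guidelines require.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_csv(path, required_columns, date_columns=()):
    """Return a list of human-readable problems found in one CSV file.

    required_columns and date_columns are supplied by the caller; the real
    headers come from the table templates, not from this sketch.
    """
    problems = []
    # utf-8-sig also accepts a leading byte-order mark, if one is present.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        for col in required_columns:
            if col not in header:
                problems.append(f"{path}: missing column {col!r}")
        # Data rows are numbered from 2 because row 1 is the header.
        for lineno, row in enumerate(reader, start=2):
            for col in date_columns:
                value = (row.get(col) or "").strip()
                if value and not DATE_RE.match(value):
                    problems.append(
                        f"{path}:{lineno}: {col}={value!r} is not YYYY-MM-DD"
                    )
    return problems
```

For example, check_csv("refs.csv", ["ref_name", "doi"], ["date_added"]) returns an empty list when the (hypothetical) file passes both checks. Returning a list rather than stopping at the first error lets you fix every problem in one pass.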

Because data entry is complex, we divide it into two main steps. The biomedical steps involve extracting data from papers. The database steps involve converting the data into a database-friendly format and using software to check it before it is merged into the main repository.

Biomedical

Because the data are heterogeneous, we explain each type of data in a separate document.

Note: if the iso_name already exists in the database, you can skip the isolates and isolate_mutations tables. The same applies to other tables, such as antibodies and vaccines. The template is meant to standardize the process, so you don't need to re-enter information that already exists in the database.
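To tell which isolate names are already present, you can compare your candidate names against the existing table. This Python sketch assumes the isolates table is a CSV file with an iso_name header; the file path and the helper names (existing_values, new_names) are hypothetical.

```python
import csv


def existing_values(csv_path, column):
    """Collect the non-empty values of one column from an existing table CSV."""
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        # row.get() returns None if the column is absent, so such rows are skipped.
        return {row[column].strip() for row in csv.DictReader(fh) if row.get(column)}


def new_names(candidates, known):
    """Return candidate names not yet in the database, in their original order."""
    seen = set()
    result = []
    for name in candidates:
        name = name.strip()
        # Skip blanks, names already in the database, and repeats within the input.
        if name and name not in known and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

Only the names returned by new_names would need rows in the isolates and isolate_mutations tables; the same pattern works for other tables by changing the file and column.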

Database

Suggested tools

  • Use a PDF reader to view the paper
  • You may use Adobe Illustrator to extract data from figures
  • You may convert the PDF to Word to extract data from its tables
  • Please use Excel to format data
    • use Excel to open CSV files
    • convert all dates to the "YYYY-MM-DD" format
    • Excel may convert names containing "-" into dates; to keep them as text, prefix them with an apostrophe (')
    • save the CSV files in "CSV UTF-8 (Comma Delimited)" format
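The date and text conventions above can also be checked with a script. The Python sketch below converts a few common date spellings to YYYY-MM-DD and flags values that look like a name Excel has already turned into a date. The list of input formats and the "2-Jan"/"Jan-02" patterns are assumptions: Excel's exact output depends on locale, and an ambiguous date such as 03/04/2021 is read here as month/day/year.

```python
import re
from datetime import datetime

# Assumed input spellings; extend as needed. "%m/%d/%Y" assumes US month/day order.
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y", "%d-%b-%Y")

# Patterns like "2-Jan" or "Jan-02", typical of Excel reading a name such as "1-2" as a date.
_MONTH = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
EXCEL_DATE_LIKE = re.compile(rf"^(\d{{1,2}}-{_MONTH}|{_MONTH}-\d{{2}})$")


def to_iso_date(value):
    """Convert a date string to YYYY-MM-DD, or raise ValueError if no format matches."""
    text = value.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date: {value!r}")


def looks_like_excel_date(value):
    """Return True if a text value resembles an Excel-converted date such as '2-Jan'."""
    return bool(EXCEL_DATE_LIKE.match(value.strip()))
```

A value flagged by looks_like_excel_date is best checked against the paper and re-entered as text by hand, since the original name cannot always be recovered from the converted value.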