Rewrite database description document #9

magnusmanske · 2018-09-07T08:44:04Z

based on previous comments from Richard in email 28/06/2018 13:59

podpearson · 2018-09-28T11:27:50Z

Comments from email mentioned above are as follows:

I think this is a good start. I think what this particularly needs now is an introduction explaining why this is needed, what it is replacing, etc., and also pointers to what comes next. The descriptions of the tables and fields need expanding a bit - I think you should be able to understand every single field in the database using this document. I think we also need, either in this document, or as a separate document, details of the source data for this, how it is populated, etc.

Specifically I would suggest:

Write an introduction which answers the following questions:
- What does this document describe? (an initial prototype of a system to map files to samples?)
- Problem statement - why is the file tracking system needed?
- What does the file tracking system replace?
- What are some known use cases of the file tracking system?
  - Create a build manifest given a set of sequencescape IDs?
  - Create a build manifest given a set of Oxford codes and/or ROMA IDs?
  - Create a build manifest containing all samples from a species?
  - Determine which samples from a given study have been sequenced (Sonia has asked for this a few times recently)
  - Determine how many samples have been sequenced, broken down by species (DK has recently asked for this)
  - Others?
- What are some possible use cases that the file tracking is not intended to cover, but people might assume it would cover? (e.g. any of the above?)
  - If the file tracking system is not intended for this use case, do we know how this use case is expected to be handled in the future?
- Design principles
  - Expand this to full sentences
Database core schema
- Could you make the text a bit bigger?
- Why is there a direct link from storage in file to tag? Why doesn't this go via file2tag? Is storage in some way different to other file properties such as MD5, file size, etc.? If so, need to explain why this is so.
- The SIMS database is using UUIDs as IDs. What do you think about doing the same in the file tracking DB?
Tables
- Could you write a description for each field, describing the data the field contains, and why this is needed. E.g. what are ts_created and ts_touched and why are these needed?
- We also need, either here or as a separate document, details of the source data for each table, how the table is initially populated, and how the table will be maintained going forwards (e.g. regular updates from other systems, manual changes using SQL or a web front end, etc.).
- Could you also include views, either here or as a separate section?
Current list of tags
- What do the terms in brackets represent?
Write a next steps section
- Sequencing not done at WSI. E.g. will tracking system store ENA run accessions?
- Web front end?
- Tie up with SIMS system (maybe just a placeholder for now to say that this needs to happen)
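
One of the reporting use cases listed above (how many samples have been sequenced, broken down by species) reduces to a single aggregate query once files are linked to samples. The following is a minimal sketch against an in-memory SQLite database. The `sample` and `file` tables, their columns, and the example rows are hypothetical stand-ins, not the actual FITS schema, and "sequenced" is assumed here to mean "has at least one file".

```python
import sqlite3


def samples_sequenced_by_species(conn):
    """Count distinct samples that have at least one file, per species."""
    rows = conn.execute(
        """
        SELECT s.species, COUNT(DISTINCT s.id)
        FROM sample AS s
        JOIN file AS f ON f.sample_id = s.id
        GROUP BY s.species
        ORDER BY s.species
        """
    ).fetchall()
    return dict(rows)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE file (id INTEGER PRIMARY KEY, sample_id INTEGER, path TEXT);
    INSERT INTO sample VALUES
        (1, 'S1', 'species_a'), (2, 'S2', 'species_a'),
        (3, 'S3', 'species_b'), (4, 'S4', 'species_b');
    INSERT INTO file VALUES
        (1, 1, 's1.cram'), (2, 1, 's1.bam'), (3, 3, 's3.cram');
    """
)

counts = samples_sequenced_by_species(conn)
print(counts)
```

`COUNT(DISTINCT ...)` matters here: a sample with both a BAM and a CRAM would otherwise be counted twice.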

podpearson · 2018-09-28T11:34:12Z

@magnusmanske, could you talk to @sclaugoncalves to find out which library IDs are available, and ensure that the ones that get included in FITS are documented here?

Also, note that @alimanfoo has suggested (#6) using markdown for documentation, which I think is a good idea.

magnusmanske · 2018-10-05T08:28:43Z

I have ported the Google doc over to markdown, here.

I will incorporate some of the above suggestions. Note, however, that this is the database description document. It is not "FITS MVP", or FITS in general. It describes the database, not the philosophic rationale of having a file tracking system.

podpearson · 2018-10-05T10:10:18Z

> I will incorporate some of the above suggestions. Note, however, that this is the database description document. It is not "FITS MVP", or FITS in general. It describes the database, not the philosophic rationale of having a file tracking system.

Yes, fair point. However, I think the comments above should be captured somewhere in the documentation. If you think this is not the right place, could you decide where is and document there?

magnusmanske · 2018-10-17T10:05:55Z

The document is now here

podpearson · 2018-11-23T18:18:02Z

In the following could you:

Separate out any checkbox from this list that you think is not relevant to this particular document either to a new issue, or else as a new checkbox in an existing issue.

Comments on this document (note some of these were previously in the comment dated Sep 28 and they haven't been addressed in the latest version):

"It does not describe FITS in total". But we need this overview somewhere, right? Could you

magnusmanske · 2018-11-26T12:10:36Z

We already have "overview" and "mvp_v1" for, well, an overview. I have linked to those now. I don't see the point in Yet Another Document to duplicate that information.

I have added some field information to the SQL schema itself, where it does not appear relevant for the main document.

I would rather not add the database access details into a git doc/issue. That's just bad form.

magnusmanske · 2018-11-26T12:11:43Z

I'm not sure we need guidelines for the notes. They are, by definition, free-form. My guideline recommendation is "use common sense".

podpearson · 2018-11-29T09:56:20Z

> We already have "overview" and "mvp_v1" for, well, an overview. I have linked to those now. I don't see the point in Yet Another Document to duplicate that information.

Sorry, I think I forgot there was already an overview document when writing this. The new links go to the raw .md files rather than the correctly rendered versions. Could this be fixed?

> I have added some field information to the SQL schema itself, where it does not appear relevant for the main document.

OK, what might be useful is an example of how to access this information from the schema itself.
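
As a rough illustration of what such an example could look like: the sketch below stores per-field notes as SQL comments in a `CREATE TABLE` statement and reads them back from the database. It uses SQLite as a stand-in because it runs without a server; the thread does not say which DBMS FITS uses. The table definition and the comment texts are hypothetical placeholders, not the real FITS field descriptions. In MySQL, column comments are instead exposed through `information_schema.COLUMNS.COLUMN_COMMENT`.

```python
import re
import sqlite3

# Hypothetical table; the comment texts are placeholders, not FITS documentation.
SCHEMA = """
CREATE TABLE file (
  id INTEGER PRIMARY KEY,  -- hypothetical: internal row ID
  ts_created TEXT,         -- hypothetical: when the row was first written
  ts_touched TEXT          -- hypothetical: when the row was last checked
)
"""


def column_comments(conn, table):
    """Return {column: comment} parsed from the stored CREATE TABLE text."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise KeyError(table)
    comments = {}
    for line in row[0].splitlines():
        # A column line looks like: "<name> <type...>  -- <comment>"
        match = re.match(r"^\s*(\w+)\s.*?--\s*(.*\S)\s*$", line)
        if match:
            comments[match.group(1)] = match.group(2)
    return comments


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
comments = column_comments(conn, "file")
for name, text in comments.items():
    print(f"{name}: {text}")
```

This relies on SQLite keeping the original statement text, including comments, in `sqlite_master`; the line-based parsing is deliberately simple and assumes one column per line.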

> I would rather not add the database access details into a git doc/issue. That's just bad form.

Fair point. How about including the details with the exception of the password and have a note saying "contact @magnusmanske for password" or similar?

magnusmanske · 2018-12-13T09:53:11Z

With this commit, I consider all points here addressed, and close the issue.

podpearson · 2018-12-17T20:12:11Z

@magnusmanske - I've just made a pull request with a few suggested small changes.

I also have a few follow-on questions:

1. "this should be done automatically now, but may be missing from early imports". I find this worrying (see also comments on process doc). Could you retrospectively apply this to older imports? Presumably you still know all of the imports that were done, right?
2. file_relation table. Is there a convention here about whether the BAM or the CRAM is considered the "parent"? I ask in particular because I'm wondering which will get selected in your code for creating a manifest when both are available.
3. Why the need for sample.name? How might this be used? Has it been populated in a consistent way to date, e.g. does the sequencescape number represent anything specific?

Please address the above by making pull requests, and outlining answers to each of the above (including question number) either in the pull request comments, or else as comments in this issue.

This issue should be left open until we have sign-off, i.e. agreement at a production meeting that this is good to go.
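
On the BAM/CRAM question above: the thread does not say which file the manifest code selects, so the following is only a sketch of one possible deterministic rule, not a description of FITS behaviour. The function name, the priority table, and the example paths are all hypothetical; the preference for CRAM is an arbitrary assumption made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical preference order: the lowest number wins.
FORMAT_PRIORITY = {".cram": 0, ".bam": 1}


def pick_manifest_file(paths):
    """Pick one alignment file for a sample, or None if there is no BAM/CRAM."""
    candidates = [
        p for p in paths if PurePosixPath(p).suffix.lower() in FORMAT_PRIORITY
    ]
    if not candidates:
        return None
    # Sort by format priority first, then by path so ties resolve predictably.
    return min(
        candidates,
        key=lambda p: (FORMAT_PRIORITY[PurePosixPath(p).suffix.lower()], p),
    )


chosen = pick_manifest_file(["/data/s1.bam", "/data/s1.cram", "/data/s1.md5"])
print(chosen)
```

Whatever rule is adopted, writing it down next to the file_relation description would answer the question of which file ends up in a manifest when both exist.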

magnusmanske self-assigned this Sep 7, 2018

podpearson mentioned this issue Sep 7, 2018

Review database description document #10

Open

podpearson mentioned this issue Oct 5, 2018

Write document describing how to create a build manifest from the database #14

Open

magnusmanske added this to the MVP V1 milestone Oct 18, 2018

podpearson unassigned magnusmanske Nov 5, 2018

magnusmanske pushed a commit that referenced this issue Nov 26, 2018

issue #9

bd90cbf

magnusmanske closed this as completed Dec 13, 2018

podpearson mentioned this issue Dec 17, 2018

Update database_design_v1.md #45

Merged

podpearson reopened this Dec 17, 2018
