
Figure: Prior Knowledge #107

Closed
cgreene opened this issue Sep 1, 2020 · 7 comments

cgreene commented Sep 1, 2020

It'd be grand to get a figure on how prior knowledge/data can be useful, especially for rare diseases.


jaclyn-taroni commented Nov 9, 2020

@dvenprasad I am including the background info for this figure here, along with any appropriate links I came across. I am also going to quote what I think are the most relevant passages from the manuscript for your reference.

The relevant section of the manuscript covers the following topics:

  • Knowledge graphs - we specifically talk about graphs that are composed of multiple relationship types

Knowledge graphs integrate related-but-different data types, creating a rich data source. Examples of public biomedical knowledge graphs and frameworks that could be useful in rare disease include the Monarch Graph Database[doi:10.1093/nar/gkw1128], hetionet[doi:10.7554/eLife.26726], PheKnowLator[doi:10.1101/2020.04.30.071407], and the Global Network of Biomedical Relationships[doi:10.1093/bioinformatics/bty114]. These graphs connect information like genetic, functional, chemical, clinical, and ontological data to enable the exploration of relationships of data with disease phenotypes...
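
For intuition, here's a minimal sketch of the "multiple relationship types" idea using networkx; the node and edge labels below are illustrative stand-ins, not drawn from any of the cited graphs:

```python
# Toy heterogeneous knowledge graph: nodes have biomedical types and
# edges carry a relationship type (the defining feature of these graphs).
import networkx as nx

kg = nx.MultiDiGraph()

kg.add_node("NF1", type="gene")
kg.add_node("neurofibromatosis type 1", type="disease")
kg.add_node("selumetinib", type="compound")
kg.add_node("MAPK signaling", type="pathway")

kg.add_edge("NF1", "neurofibromatosis type 1", relation="associated_with")
kg.add_edge("NF1", "MAPK signaling", relation="participates_in")
kg.add_edge("selumetinib", "MAPK signaling", relation="inhibits")
kg.add_edge("selumetinib", "neurofibromatosis type 1", relation="treats")

# Walk the typed relationships that point at the disease phenotype
for src, dst, data in kg.in_edges("neurofibromatosis type 1", data=True):
    print(f"{src} --{data['relation']}--> {dst}")
```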

  • Transfer learning - we're mostly focused on feature-representation-transfer (think MultiPLIER)

Transfer learning is an approach where a model trained for one task or domain (source domain) is applied to another, typically related task or domain (target domain). Transfer learning can be supervised (one or both of the source and target domains have labels), or unsupervised (both domains are unlabeled). Though there are multiple types of transfer learning, in a later section we will focus in-depth on feature-representation-transfer. Feature-representation-transfer approaches learn representations from the source domain and apply them to a target domain [doi:10.1109/TKDE.2009.191].
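
And a minimal sketch of feature-representation-transfer, with PCA standing in for PLIER (the method MultiPLIER actually uses) and random matrices standing in for expression compendia:

```python
# Learn a latent representation on a large source compendium, then
# project a small target (rare disease) dataset into that same space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
source = rng.normal(size=(5000, 200))  # source domain: many samples x genes
target = rng.normal(size=(20, 200))    # target domain: few samples, same genes

latent = PCA(n_components=30).fit(source)  # representation learned on source only
target_lvs = latent.transform(target)      # target expressed in source latent variables
print(target_lvs.shape)                    # (20, 30)
```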

  • Multitask learning

Multitask learning is an approach where classifiers are learned for related individual predictions (tasks) at the same time using a shared representation [doi:10.1023/A:1007379606734].

Multitask neural networks (which predict multiple tasks simultaneously) are thought to improve performance over single task models by learning a shared representation, effectively being exposed to more training data than single task models [doi:10.1023/A:1007379606734; arxiv:1606.08793].
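
A toy version of that shared representation in PyTorch; the layer sizes, task count, and fake labels are all arbitrary:

```python
# Multitask net: one shared trunk feeds several task-specific heads, so
# every task's gradients update the shared representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, n_features=200, n_hidden=32, n_tasks=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(n_hidden, 1) for _ in range(n_tasks))

    def forward(self, x):
        h = self.shared(x)                       # representation shared across tasks
        return [head(h) for head in self.heads]  # one prediction per task

model = MultiTaskNet()
x = torch.randn(8, 200)
labels = [torch.randint(0, 2, (8, 1)).float() for _ in range(3)]
loss = sum(F.binary_cross_entropy_with_logits(out, y)
           for out, y in zip(model(x), labels))  # losses summed across tasks
loss.backward()
```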

  • Few-shot learning (also called one-shot learning depending on the number of examples)

Few-shot learning is the generalization of a model trained on related tasks to a new task with limited labeled data (e.g., the detection of a patient with a rare disease from a low number of examples of that rare disease).

One-shot or few-shot learning relies on prior knowledge to generalize to new prediction tasks with only a few labeled examples [arxiv:1904.05046v3]; typically, a distance metric is learned from the input data and used to compare new examples for prediction [doi:10.1021/acscentsci.6b00367].
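
And a bare-bones sketch of the distance-metric flavor of few-shot classification (nearest class prototype over a handful of support examples; random vectors stand in for learned embeddings):

```python
# Each class prototype is the mean of its few "support" examples; a new
# query is assigned to the nearest prototype (prototypical-network style,
# minus the learned embedding).
import numpy as np

rng = np.random.default_rng(0)
support = {                            # 3 labeled examples ("shots") per class
    "disease_A": rng.normal(0.0, 1.0, size=(3, 50)),
    "disease_B": rng.normal(2.0, 1.0, size=(3, 50)),
}
query = rng.normal(2.0, 1.0, size=50)  # unlabeled example to classify

prototypes = {label: x.mean(axis=0) for label, x in support.items()}
distances = {label: np.linalg.norm(query - proto)
             for label, proto in prototypes.items()}
print(min(distances, key=distances.get))  # nearest prototype wins: disease_B
```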

@jaclyn-taroni

This section of the manuscript is divided into two headings: "Knowledge graphs" and "Transfer, multitask, and few-shot learning."

I think the main takeaways are somewhat covered above, but I also drew something up for each of those sections that I will include below. These drawings are not necessarily intended to guide development of this figure, but instead to communicate the takeaways. I'll post over on #108 shortly, but did want to note that I'm not sure we need both this figure and the "putting it all together" figure tracked on #108, which would cover two specific studies using transfer learning in rare diseases (DeepProfile and MultiPLIER) that unify some of the other concepts introduced in the manuscript. As a result, I'm not totally sure what the main takeaway message is yet.

For context, I'm using a "database" as my representation of a model here because that's consistent with the tentative sketch of the statistical technique figure #106 (comment).

Knowledge graphs

[sketch: knowledge graphs]

I'm worried that this is putting the cart before the horse (the model before what the model is supposed to be doing) ☝️ in its current form.

Transfer, multitask, and few-shot learning

[sketch: transfer, multitask, and few-shot learning]

Hopefully this figure makes it somewhat clear why we would put these approaches under the same header! What I didn't include was information about supervised vs. unsupervised tasks, but I think that might muddy things a bit 💭 I'm also very wary of including anything in this figure that implicitly references a specific neural network architecture.

@dvenprasad

Transfer Learning

[figure draft: transfer learning]

Few-shot Learning

[figure draft: few-shot learning]

@allaway

allaway commented Dec 3, 2020

I think these figures are looking great. One comment about similarity metrics: they are often (maybe always?) on a 0 to 1 scale, and they represent either similarity or distance, not both (e.g., I don't think the center should be the "origin" for the similarity bars). Could also use a stylized heatmap representation of distance.
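
For what it's worth, one common way to get a bounded 0-to-1 similarity out of a distance (so the bars would only need one direction from zero) is something like:

```python
import numpy as np

def similarity(a, b):
    # 1.0 for identical points, approaches 0.0 as the distance grows
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(a) - np.asarray(b)))
```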

Also, thinking about these two concepts (transfer learning and few-shot learning), I think a key distinction we should highlight in this figure is that transfer learning leverages knowledge from large, complex datasets (which may not include the disease you're studying at all) to perform a prediction (or related) task, while few-shot or one-shot learning uses the one to few examples of the disease(s) you are studying to perform that task.

@jaclyn-taroni

I think some of the imagery for few-shot learning that I've found helpful usually gives you some kind of intuition about why this is a possible strategy. (Granted, it helps that they are from the natural image domain!) Here are some examples:

[example images omitted]

This one is tricky because we don't want to talk about architectures, etc. – I'm wondering if we even need to get into the part about similarity? I'm not sure it's necessary.

I also know we're trying to keep things at a high-level of abstraction in many cases, but is there a place for having these figures be a little more specific (e.g., we use some kind of representation of medical images)?

@dvenprasad

Transfer Learning

  • The initial dataset still contains some of the classes in the test dataset, but with a low number of samples (see light green and purple)
  • Using a horizontal rectangular box to indicate rare disease data (lots of features, few samples)

[figure draft: transfer learning]

Few-shot Learning
Used @jaclyn-taroni's sketch and modeled it after that layout.

[figure draft: few-shot learning]

@dvenprasad

Updated figures based on Monday's call.

Transfer Learning

  • Updated caption of big dataset

[figure draft: transfer learning]

Few-shot Learning

  • Combined the 4 small datasets to make one big dataset
  • Greyed out everything but pink and green classes
  • Removed the classification of other colors
  • Retained only classes from Query Set in the classification

[figure draft: few-shot learning]
