-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add built in support the mouse igH analysis #25
Comments
Dear Zhou, Regarding building a model for your data: AlignmentsThe first step is to align your sequences. This has proven to be a critical step in order to obtain accurate models and I strongly encourage you to optimize and double check this step by adjusting the different thresholds and parameters. Most importantly I strongly recommend you to use your knowledge of the sequencing process (e.g primers positions) to constrain the alignment offsets (global offsets or template specific offsets) via the dedicated command line options. Building a new model for an immune chainStarting from an existing model topologyThis is most likely the simplest option. Use the model topology (model_parms.txt) provided along with an existing entry in IGoR's model folder by loading it using the -load_custom_model command line, providing no path to a marginal file. IGoR will automatically create the corresponding marginal file with a uniform distribution. See the Advanced Usage section of the documentation. Building a new model topology from scratchIGoR's model is encoded by a Bayesian network for which a C++ API is provided. For this you will need to use the C++ interface. An example of the syntax is given in the -run_demo section of the main.cpp file. Here are the simple steps one should take: One can create a new recombination event, say e.g V choice: //Construct a TCRb model
Gene_choice v_choice(V_gene,v_genomic);
v_choice.set_nickname("v_choice");
v_choice.set_priority(7); Add this event as a node in the Bayesian network: // Create the model graph
Model_Parms parms;
//Add nodes to the graph
parms.add_event(&v_choice); Finally add conditional dependencies (directed edges) between events as desired: // Add conditional dependency between V choice and number of deletions on 3' side of the V gene
// This will lead IGoR to record P(v_3_del|V)
parms.add_edge(&v_choice,&v_3_del); Self consistency checksIn a nutshell: there is no perfect recipe, the only thing I can advise is to remain cautious at every step (alignment, inference, model selection etc). Inference convergenceFirst of all, although it may sound obvious you should make sure that the inference procedure has converged after the number of iterations you have specified have been accomplished. For this you can look at the log likelihood and/or parameters convergence (i.e how much they keep varying in the last iterations). The log likelihoods are given in the Check against simpler methodsAlthough far from perfect it is always good to check that the results you obtain at least make sense vs simpler methods based on alignments. Check against synthetic dataIt is generally good to make sure that your different alignments and inference parameters make sense regarding the sequencing dataset you have. You can perform this check by generating synthetic sequences from your model learned with IGoR. Then cut/edit your sequences in a way that mimics your sequencing data (try to mimic single end sequencing our paired end sequencing by cutting your sequences in regions corresponding to your primers, add a C part after the J if your primers lie in the C part, etc). Then align and re-infer a model based on your synthetic sequences and check the consistency between your originally inferred model and the one inferred on the synthetic data (e.g by computing the DKL between the two distributions using pygor). See the IGoR paper for technical details. Check consistency between datasetsPretty straightforward: simply make sure that by using biological replicates, other donors or even other sequencing techniques that your results hold. This is probably the most bulletproof check, and will also give you information about the limitations of different sequencing technologies. Check higher order statisticsAnother model design strategy is to compute quantities that are not directly encoded in your recombination model. For instance compute the expected amount of shared sequences as done in Pogorelly et al and check that prediction of your model against the actual fraction of shared sequences in your dataset. Another example is what we have done in IGoR's paper to demonstrate the existence of double Ds. Yet another example is to compute mutual information between recombination events in the data (e.g using the scenario counter) vs the conditional dependencies encoded in your recombination model's Bayesian network. Hope this helps, |
Hi @qmarcou
|
Hi, Quentin
Could you add an database for IGoR so it can support mouse IgH analysis? In addition, I have some mouse IgH sequencing data, and could you show me how to build the customized mouse IgH model step by step? Highly appreciated!
Best
Zhou
The text was updated successfully, but these errors were encountered: