Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEVELOP] Adding TAD class #34

Merged
merged 13 commits into from
Dec 19, 2019
Merged

Conversation

gwaybio
Copy link
Collaborator

@gwaybio gwaybio commented Oct 4, 2019

Adding method to facilitate a TAD_Pathways analysis. The method calls all the scripts in order of the tad_pathways pipeline given a file storing SNP info.

This is in preparation of Diana's Longitudinal BMD paper. Not really sure of where this should all live...

The TAD_Pathways portion of the longitudinal BMD analysis consists of three steps:

  1. Running TAD_Pathways on 41 different SNP lists
  2. Simulating gene lists using randomly selected TADs matched by gene content similarity
  3. Comparing implicated genes in real TAD_Pathways to permuted gene lists (and Diana digs deeper into these specific genes in downstream analyses)

At this point, nearly all of the functionality lives on the develop branch in this repo. All of it has been reviewed already (except this PR) (see #21, #22, #25, #26, #30, #31, #33). My original goal was to make this repo a python package so that Diana's analysis could live in an independent repo - but this goal is likely to require several more hours of work.

An alternative is to add the longitudinal BMD analysis here in a separate branch (can branch off of develop) and generate a release on this specific branch. I think this will take the least amount of time.

There is also the element of code review. I certainly don't want to take up any time of anyone in the lab...

Thoughts @cgreene ?

@cgreene
Copy link
Member

cgreene commented Oct 10, 2019

For all of the reasons that code review is important including the opportunity to catch certain bugs, identify new solutions, and making sure code is reusable across the lab (and hopefully also the field), it seems worth going through review. This isn't an inordinately large amount of code. I bet that if you asked @dongbohu he would be happy to take a look.

I'm fine with the develop branch of this repo. You might also be able to get help from @dongbohu to make this repo into a python package. I'm not sure if he has the capacity but it seems like it'd be worth asking.

Copy link

@dongbohu dongbohu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwaygenomics: Most of the code looks good. There are a few minor issues for you to address or clarify.

import yaml


def config_yaml(config_file):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I suppose this module will be used in the future by users to read in a yaml config file?

# Load configuration
with open(config_file, "r") as stream:
config = yaml.load(stream)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a template yaml file? I guess some error handling logic will be needed in the future to validate the input yaml file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't currently used, but will be in the future. I will add error handling in a future PR (see #35)

@@ -0,0 +1,31 @@
"""
Gregory Way 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019?

@@ -0,0 +1,167 @@
"""
Gregory Way 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019?

"""
Gregory Way 2018
TAD Pathways
scripts/genes/genelist.py

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path doesn't match the actual location.


def build_custom_tad_genelist(self):
command_list = [
"python",
Copy link

@dongbohu dongbohu Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scripts/build_custom_tad_genelist.py compatible for Python 2 or 3 (or both)? By default, python command calls python 2 on many (if not most) operating systems.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python 3 - should I change the call to "python3"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this package is for Python 3 only, then it is a good idea to change it to python3,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in fbe303d

output_pval_file = os.path.join(
self.base_dir, "{}_pvals.tsv".format(self.snp_list_name)
)
if os.path.exists(output_pval_file):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, lines 136-139 can be simplified as:

return os.path.exists(output_pval_file)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

def get_evidence(self):
# The first command builds the evidence
command_list = [
"python",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question on Python 2/3 compatibility.


# The second command summarizes this evidence
command_list = [
"python",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question on python 2/3 compatibility.

webgestalt_exists = self.check_webgestalt()
if webgestalt_exists:
self.get_evidence()
return 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, the return value of this method will be used later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, doesn't look like I use this.. Not sure why its here. Maybe some functionality that I decided not to use. I will remove for now

@DCousminer
Copy link

Proposed next steps to complete the enrichment analysis:

  1. Output from step 2: divvy into 100 sets of 41 matched TADs
  2. Filter by protein-coding genes
  3. Submit to DAVID for functional annotation
  4. Count number of times genes appear with specific key words of interest (e.g. "osteoblast"), being careful to count each gene only once
  5. Create null distribution based on 100 iterations
  6. Compare actual and calculate p-value for enrichment of "real" list vs. simulated

We are working on this-- any thoughts before we proceed are welcome.

update documentation, update all_pathways argument, streamline return statements
update documentation, remove remove_hla_tad argument from compile()
@gwaybio
Copy link
Collaborator Author

gwaybio commented Dec 4, 2019

@dongbohu - ready again for re-review!

Copy link

@dongbohu dongbohu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@gwaybio gwaybio merged commit 2de630d into greenelab:develop Dec 19, 2019
@gwaybio gwaybio deleted the add-tad-class branch December 19, 2019 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants