Updated to MSigDB v6.1 and extra information was added #1

Closed
wants to merge 10 commits into from

ToledoEM

Changes:

  • I wanted to update this data to the current version of MSigDB and also add each gene set's subcategory to the tables. You can now filter C5 (Gene Ontology) by BP (GO Biological Process), CC (GO Cellular Component), and MF (GO Molecular Function).
  • Built it with gene symbols, since I prefer working with gene symbols over other gene IDs.
  • The vignette and README file are partially updated.
  • The mouse dataset was made by translating genes with biomaRt.
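
As a rough illustration of the subcategory column described above, here is a minimal Python sketch (not the R code in this PR). It assumes the standard GMT layout, where each line is tab-separated as gene set name, description, then gene symbols, and it assumes the category and subcategory are supplied by the caller (for example, taken from the name of the downloaded file), since the GMT lines themselves do not carry them. The gene set names, URLs, and the `parse_gmt` helper are hypothetical and exist only for this example.

```python
import io


def parse_gmt(handle, category, subcategory):
    """Yield one (category, subcategory, geneset, symbol) row per gene.

    Assumes GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    The category/subcategory labels are passed in by the caller.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        geneset, _description, *symbols = fields
        for symbol in symbols:
            if symbol:  # ignore empty trailing fields
                yield (category, subcategory, geneset, symbol)


# Hypothetical sample content standing in for two downloaded GMT files.
bp_text = "GO_EXAMPLE_PROCESS\thttp://example.org/a\tTP53\tBRCA1\n"
mf_text = "GO_EXAMPLE_FUNCTION\thttp://example.org/b\tEGFR\n"

rows = list(parse_gmt(io.StringIO(bp_text), "c5", "bp"))
rows += list(parse_gmt(io.StringIO(mf_text), "c5", "mf"))

# Filter C5 down to the BP subcategory, as the long-format table allows.
bp_rows = [r for r in rows if r[0] == "c5" and r[1] == "bp"]
print(bp_rows)
```

Keeping one row per gene and gene set pair is what makes the subcategory filter a plain column comparison rather than a walk through nested lists.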

@ToledoEM
Author

Just corrections to the README, .gitignore, and code comments in data-raw.

@ToledoEM ToledoEM closed this Jan 11, 2018
@ToledoEM ToledoEM reopened this Jan 11, 2018
@stephenturner
Owner

Thanks for the updates! I'd like to merge this but before doing so I want to make sure I can maintain it going forward. Could you update your PR with comments throughout the data parsing script in data-raw, explaining at a high level what each new step is doing? If you could also put in some wgets or curls showing how you downloaded the data directly from MSigDB, that would help in the way of reproducibility and future maintenance. Thanks!

@ToledoEM
Author

No problem. I was looking for something like this, since I do not much enjoy working with lists in R. But the update was necessary and didn't take much time.
The code for parsing the data is now properly commented. I tried to use wget or curl to get the gmt files from the Broad website, but since it requires registration, I didn't find a way around it from the CLI.
I keep an eye on MSigDB for new releases; they are very useful datasets for my job.

@ToledoEM
Author

ToledoEM commented Jan 22, 2018

MSigDB wget/curl problem

I have been doing some digging into this. The files cannot be downloaded from the FTP server at the Broad Institute (ftp://ftp.broadinstitute.org/pub/gsea/gene_sets/), since there is only one gene set in the public folder.

If you look closely at the GSEA Java software, it connects to another FTP server (gseaftp.broadinstitute.org), which requires a user and password to download the gmt files. Potentially, since it is FTP and the user and password are sent in cleartext... you get the point. But I will not write a simple bash script with wget --user=$USER --password=$PASSWORD $URLs.

@stephenturner
Owner

Thanks @ToledoEM -- and I fully agree, plain text user/passwords aren't a great idea. It could work easily with a dummy/throwaway user/password, but it still seems sketchy to me. Please excuse my sloth in reviewing this PR, and don't take it as a lack of interest -- still trying to catch up on the holiday backlog. Will review shortly.

@steffenheyne

Can you please review or merge this?
Thanks,
steffen

@ToledoEM
Author

ToledoEM commented Jul 3, 2018

Hi @steffenheyne, you can always get my local version: https://github.com/ToledoEM/msigdf/tree/ToledoEM-local
The branch local-version is just the same but with an updated README.md.

@steffenheyne

@ToledoEM Thanks, yeah I forgot about this possibility!

@stephenturner
Owner

@ToledoEM - thank you so much for the suggestions to update this to the latest gene sets in MSigDB. Unfortunately I don't have the bandwidth to review this PR and confirm that I could maintain the updated code. I also like the idea of keeping the ability to programmatically access the starting files rather than having to sign in and download them locally. I'm going to close this issue, archive this repo, and put a note in the readme to point to your branch for an updated version.

stephenturner added a commit that referenced this pull request Jul 5, 2018
added link to @ToledoEM fork updated as suggested in #1
@ToledoEM
Author

ToledoEM commented Jul 5, 2018

@stephenturner - You are welcome. I have been using this library constantly in my work.
Happy to keep it going.

3 participants