Including external data #3

Open
xiangzhu opened this issue Jun 14, 2018 · 3 comments

Comments

@xiangzhu
Collaborator

@pcarbo Hi Peter -- I had a great meeting with Nick today, and we have a question about including published external data in the LDshrink package. We would appreciate your input.

To make the package user-friendly, we would like to include the following external data files:

  1. Genetic maps from 1000 Genomes: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates/

  2. Approximately independent LD blocks: https://bitbucket.org/nygcresearch/ldetect-data/src

These data files have a fairly simple structure. The genetic maps are data frames with the following headers:

[id] [physical position] [genetic position (cumulative)]

The LD blocks are data frames with the following headers:

[chromosome name] [region start] [region stop]
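To make the two layouts concrete, here is a minimal sketch in base R. The column names (`id`, `pos`, `gen_pos`, `chr`, `start`, `stop`) and all values are made up for illustration; the actual files may label their columns differently. It also assumes the blocks are sorted by start position and treats each block as the half-open interval [start, stop).

```r
# Toy genetic map: one row per marker (all values are invented).
genetic_map <- data.frame(
  id      = c("rs1", "rs2", "rs3"),
  pos     = c(1000L, 2500L, 4000L),  # physical position
  gen_pos = c(0.00, 0.01, 0.03)      # cumulative genetic position
)

# Toy LD blocks: one row per region, sorted by start (all values are invented).
ld_blocks <- data.frame(
  chr   = c("chr1", "chr1"),
  start = c(1L, 3000L),
  stop  = c(3000L, 5000L)
)

# Assign each marker to the block whose start is the largest one <= pos.
# This is only valid if the blocks are contiguous and sorted, as assumed above.
block_idx <- findInterval(genetic_map$pos, ld_blocks$start)
print(block_idx)
```

With these toy values the first two markers fall in the first block and the third marker in the second.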

Both data sets are described in formal publications. Could we include these external files in the LDshrink package, provided that we explicitly cite those publications?

Do we have to worry about licensing issues?

An alternative plan is to provide users with the data preprocessing scripts/functions, but I think this will make the package harder to use.

@pcarbo
Member

pcarbo commented Jun 15, 2018

@xiangzhu @CreRecombinase I don't see any issue in adding these data to the R package as long as you make sure to cite the original sources.

For ease of use, I would recommend that you add .RData files to the data folder inside the package. See here for an example of this. See also the files in the man folder for an example of how these data sets were documented in the package. (You can also document data sets with roxygen2.)
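As a rough sketch of that workflow, the code below saves a data frame as an .RData file under `data/` and shows a roxygen2 block documenting it. The object name `genetic_map`, its columns, the toy values, and the temporary `pkg_root` directory are all assumptions for illustration; in a real package `pkg_root` would be the package source directory and the roxygen2 block would sit in a file such as `R/data.R`.

```r
# Hypothetical package root; a temporary directory stands in for the real source tree.
pkg_root <- file.path(tempdir(), "LDshrink")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Toy stand-in for the processed genetic map (values are invented).
genetic_map <- data.frame(
  id      = c("rs1", "rs2"),
  pos     = c(1000L, 2500L),
  gen_pos = c(0.00, 0.01)
)

# Save to data/genetic_map.RData so the object can be loaded with data(genetic_map)
# once the package is installed.
rdata_file <- file.path(pkg_root, "data", "genetic_map.RData")
save(genetic_map, file = rdata_file, compress = "xz")

# roxygen2 documentation for the data set; this part would live in R/data.R.
#' Genetic map from the 1000 Genomes Project (toy example)
#'
#' @format A data frame with one row per marker and three columns:
#' \describe{
#'   \item{id}{marker identifier}
#'   \item{pos}{physical position}
#'   \item{gen_pos}{cumulative genetic position}
#' }
#' @source \url{ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates/}
"genetic_map"
```

The `@source` field is a natural place to point to the original data, alongside a citation of the corresponding publication.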

Note that data cannot be licensed/copyrighted.

@xiangzhu
Collaborator Author

xiangzhu commented Jun 15, 2018

Thanks for the information!

@xiangzhu
Collaborator Author

Update: Nick and I decided not to include the "approximately independent LD blocks" in LDshrink (at least at this stage), since adding these blocks might complicate the implementation and confuse people.

We are still using these blocks in some projects. Perhaps we can have a vignette showing how to use LDshrink and these blocks together?
