Block Regression Mapping (BRM)

Block Regression Mapping (BRM) is a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. The core function is programmed by R language. For the detailed description of the method, please see the original article "BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing" published in Bioinformatics.

Please cite: Huang L, Tang W, Bu S, et al. BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. Bioinformatics, 2019. https://doi.org/10.1093/bioinformatics/btz861

Content

Introduction
Getting started
Input data file
- Example
Input configuration files
- Chromosome length file
  - Example
- Block regression mapping configuration file
  - Example
About u_α/2
- The u_α/2 values in various populations
  - The u_α/2 values of some species in various populations
  - A tool for calculating u_α/2 in an F_k population
- The u_α/2 values in random-mating progeny populations
  - The u_α/2 values of yeast and maize in random-mating progeny populations
  - A tool for calculating u_α/2 in an R_k population
Output files
- Result 1 file
  - Example
- Result 2 file
  - Example
Q&A

Introduction

BRM is a method of BSA-seq for mapping QTLs or major genes. It can apply to different populations including recombinant inbred lines (RIL), doubled haploid (DH), haploid (H), F₂ , F₃, and so on.

BRM finds out candidate QTL (or gene) peaks in three main steps. The first step is to divide the genome into many small blocks of equal size and calculate the average allele frequency (AF) of each block in each pool, the average allele frequency in the population (AFP) of each block as well as the allele frequency difference (AFD) between two pools in each block. The second step is to figure out the AFD threshold of the 5% overall significance level at every genomic position. The third step is to identify possible QTL positions (significant AFD peaks) and calculate the 95% confidence interval of each QTL. The BRM scripts output the above results in two files. The first file contains the results of step one and step two, and the second file contains the results of step three.

back to top

Getting started

Download all scripts and examples from GitHub, then you can have a try with the example data:

git clone https://github.com/huanglikun/BRM.git
cd BRM
# Usage: 
# Rscript BRM.R <Block regression mapping configuration file> <Chromosome length file> <Input data in bsa format>
# Design A
Rscript BRM.R configureExample/designA/BRM_conf.txt configureExample/chr_length.tsv dataExample/designA/yeast_markers_dp10.bsa
# Design B (the experiment which has a high selected pool and a random pool)
Rscript BRM.R configureExample/designBH/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBH/yeast_markers_dp10.bsa
# Design B (the experiment which has a low selected pool and a random pool)
Rscript BRM.R configureExample/designBL/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBL/yeast_markers_dp10.bsa

back to top

Input data file

The input data file is a tab-separated values file named with bsa format. It contains six columns:

Column 1	Column 2	Column 3	Column 4	Column 5	Column 6
Chromosome code	Marker position (bp)	a	b	c	d

Note: a, b, c and d stand for the counts of marker allele of PARENT 1 in pool 1 (high pool in Design A or selected high/low pool in Design B), PARENT 2 in pool 1, PARENT 1 in pool 2 (low pool in Design A or random pool in Design B) and PARENT 2 in pool 2, respectively.

Example

dataExample/designBL/yeast_markers_dp10.bsa

I	33070	6	6	6	7
I	33147	7	7	2	5
I	33152	4	5	2	5
I	33200	3	9	4	3
I	33293	7	7	7	2  
......

back to top

Input configuration files

Chromosome length file

It's a tab-separated values file with two columns:

Column 1 Column 2

Chromosome code Chromosome length (bp)
- Example
  
  configureExample/chr_length.tsv
```
  I    230218
  II    813184
  III    316620
  ......
```

Note: The chromosome code in data file should corresponding to the chromosome code in chromosome length file.

back to top

Block regression mapping configuration file

It's a key-value file containing parameters required for block regression mapping (see the following table). The separator is "=". The space between key and value will be ignored.

Key	Value type	Description	Recommend value
Design	A, BH, BL	A: Design A; BH: Design B with selected high pool; BL: Design B with selected low pool	Default: A
n1	Integer	Number of individuals in pool1 (high pool)	Depending on experiment design
n2	Integer	Number of individuals in pool2 (low pool)	Depending on experiment design
t	0, 1	Population type	t = 0 for DH, RI or H; t = 1 for F₂, F₃, etc.
ua	float	u_α/2 value	See the table below, or can be calculated by cal_ua_fk.R or cal_ua_rk.R in the "tools" directory (see explanation below).
UNIT	Integer	Unit block size (bp)	Default: 1000
DEG	Integer	The degree of the polynomials used in LOESS (local polynomial regression fitting)	2
BLK	Integer or float	Block size = BLK * UNIT	e.g.: yeast, BLK = 0.2 ; rice, BLK = 20
MIN	Integer	Minimum total depth in a valid block	10
MINVALID	Integer	Minimum valid block number in a chromosome	10
Result1_File	File path	To define the result 1 output path (optional)	Default: result/result1.xls
Result2_File	File path	To define the result 2 output path (optional)	Default: result/result2.xls

Example

configureExample/designBL/BRM_conf.txt

  # version 0.3

  # Experiment
  Design   = BL       # available: A, BH, BL. BH: Design B with HIGH selected pool. BL: Design B with LOW selected pool.
  n1       = 300      # pool 1 size. The high pool for design A or the selected pool for design B.
  n2       = 300      # pool 2 size. The low pool for design A or the random pool for design B.
  t        = 0        # For DH or RI etc., t=0; F2 or F3 etc., t=1
  ua       = 4.08     # For rice, F2:3.65; F3:3.74; F4:3.78.

  # Block regression
  UNIT     = 1000     # block unit(bp)
  DEG      = 2        # The degree of the polynomials to be used in Local Polynomial Regression Fitting.
  BLK      = 0.2      # block size = BLK * UNIT
  MIN      = 10       # min total depth in block
  MINVALID = 10       # min valid blocks in one chromosome (needed to be at least 10)

  # output file determination (optional)
  # Result1_File = result/result1.xls	# including allele frequency and threshold
  # Result2_File = result/result2.xls	# including peak information and confidence interval
  Result1_File = result_random_L/result1.xls	# including allele frequency and threshold
  Result2_File = result_random_L/result2.xls	# including peak information and confidence interval

back to top

About u_α/2

The u_α/2 values in various populations

The u_α/2 values of some species in various populations

Species	n^a	Genome size^b		Ratio (kb/cM)	u_α/2^c				Ref.
Species	n^a	cM	Mb	Ratio (kb/cM)	H/DH/F₂	F₃	F₄	RIL	Ref.
Arabidopsis	5	600	119	199	3.41	3.50	3.54	3.57	[1]
Cucumber	7	1390	192	138	3.62	3.72	3.75	3.78	[2]
Maize	10	2060	2106	1023	3.72	3.81	3.85	3.87	[3]
Rapeseed	18	2520	855	339	3.78	3.87	3.90	3.93	[4]
Rice	12	1530	382	250	3.65	3.74	3.78	3.80	[5]
Tobacco	12	3270	3613	1105	3.84	3.92	3.96	3.98	[6]
Tomato	12	1470	807	549	3.65	3.73	3.77	3.80	[7]
Wheat	21	3140	14547	4633	3.83	3.92	3.95	3.98	[8]
Yeast	16	4900	12	2.5	3.93				[9]

Note: a. n, number of chromosomes in haploid. b. The genetic map length (cM) of each species was from the references listed except for that of yeast, which was calculated by us using the data from [9]. The genome size (Mb) of each species was all from NCBI. c. Corresponding to the overall significance level of 0.05.

References

[1] Garcia-Hernandez,M. et al. (2002) TAIR: a resource for integrated Arabidopsis data. Funct Integr Genomics, 2, 239–253.

[2] Zhou,Q. et al. (2015) A sequencing-based linkage map of cucumber. Molecular Plant, 8, 961–963.

[3] Civardi,L. et al. (1994) The relationship between genetic and physical distances in the cloned a1-sh2 interval of the Zea mays L. genome. Proc Natl Acad Sci U S A, 91, 8268–8272.

[4] Raman,H. et al. (2014) SNP markers-based map construction and genome-wide linkage analysis in Brassica napus. Plant Biotechnology Journal, 12, 851–860.

[5] International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature, 436, 793.

[6] Bindler,G. et al. (2011) A high density genetic map of tobacco (Nicotiana tabacum L.) obtained from large scale microsatellite marker development. Theor Appl Genet, 123, 219.

[7] Shirasawa,K. et al. (2010) SNP Discovery and linkage map construction in cultivated tomato. DNA Research, 17, 381–391.

[8] Yang,Q. et al. (2018) High-density genetic map construction and mapping of the homologous transformation sterility gene (hts) in wheat using GBS markers. BMC Plant Biology, 18, 301.

[9] Bloom,J.S. et al. (2015) Genetic interactions contribute less than additive effects to quantitative trait variation in yeast. Nat Commun, 6, 8712.

A tool for calculating u_α/2 in an F_k population

A tool is provided in tools/cal_ua_fk.R as below for calculating u_α/2 in an F_k population given the values of necessary parameters.

Quick start

# Usage
# Rscript tools/cal_ua_fk.R <population and species information file>
Rscript tools/cal_ua_fk.R configureExample/fk_rice/ua_fk_conf.txt

Example

configureExample/fk_rice/ua_fk_conf.txt

If we know the approximately total genetic distance of this species,

#
precision = 1e-13
k         = 2            # k=0(RIL) k=1(H/DH) k=2(F2) k=3(F3) k=4(F4) k>4(F5,F6...)
#
chrNum         = 12      # chromosome number
geneticLength  = 1530    # Genome size (cM)
#genomeSize    = 382     # Genome size (Mb)
#ratio         = 249.7   # Ratio (kb/cM)

Otherwise,

#
precision = 1e-13
k         = 2            # k=0(RIL) k=1(H/DH) k=2(F2) k=3(F3) k=4(F4) k>4(F5,F6...)
#
chrNum         = 12      # chromosome number
#geneticLength = 1530    # Genome size (cM)
genomeSize     = 382     # Genome size (Mb)
ratio          = 249.7   # Ratio (kb/cM)

It is a key-value file contains required parameters. The separator is "=". The space between key and value will be ignored. There are at least four parameters needed to be set:

Key	Value type	Description	Recommend value
precision	Float	Precision for numerical calculation.	Default: 1e-13
k	Integer	Generation. k=0 means RIL k=1 means H/DH k=2 means F₂ k=3 means F₃ k=4 means F₄ k>4 means F₅, F₆ ..., and is considered as RIL.	Depending on experimental design
chrNum	Integer	Chromosome number.	Depending on species
geneticLength	Integer	Approximately total genetic distance of the species, the unit is cM. Comment this by "#" to use parameters below instead.	Depending on species
genomeSize	Float	Genome size of the species, the unit is Mb. (optional)	Depending on species
ratio	Float	Physical distance/Genetic distance ratio (kb/cM). (optional)	Depending on species

Result

The u_α/2 will be displayed as ua on the screen:

Read global parameters from configureExample/fk_rice/ua_fk_conf.txt .
Parameter precision=1e-13
Parameter k=2
Parameter chrNum=12
Parameter geneticLength=1530

ua is  3.654641

back to top

The u_α/2 values in random-mating progeny populations

For the case that the progeny population of a cross between two pure-line parents is not generated by selfing but by random mating, the u_α/2 value will be larger. The following table shows the u_α/2 values of yeast and maize in different random-mating progeny populations, where H₁ (or H) is the gamete (haploid) generated by F₁, and R₁ (or F₂) is the sporophyte generated by combination of F₁ gametes; H₂ is the gamete generated by R₁ (F₂), and R₂ is the sporophyte generated by combination of R₁ (F₂) gametes; the others can be deduced likewise.

The u_α/2 values of yeast and maize in random-mating progeny populations

Species H₁(H)/R₁(F₂) H₂/R₂ H₃/R₃ H₄/R₄ H₅/R₅ H₆/R₆ H₇/R₇ H₈/R₈ H₉/R₉

Yeast 3.93 4.02 4.08 4.13 4.17 4.21 4.24 4.26 4.29

Maize 3.72 3.81 3.88 3.93 3.97 4.01 4.04 4.07 4.09

A tool for calculating u_α/2 in an R_k population

A tool is provided in tools/cal_ua_rk.R as below for calculating u_α/2 in a random-mating progeny population given the values of necessary parameters.

Quick start

# Usage
# Rscript tools/cal_ua_rk.R <population and species information file>
Rscript tools/cal_ua_rk.R configureExample/rk_yeast/ua_rk_conf.txt

Example

configureExample/rk_yeast/ua_rk_conf.txt

If we know the approximately total genetic distance of this species,

#
precision = 1e-13
k         = 6
#
chrNum         = 16      # chromosome number
geneticLength  = 4900    # Genome size (cM)
#genomeSize    = 12      # Genome size (Mb)
#ratio         = 2.5     # Ratio (kb/cM)

Otherwise,

#
precision = 1e-13
k         = 6
#
chrNum         = 16      # chromosome number
#geneticLength = 4900    # Genome size (cM)
genomeSize     = 12      # Genome size (Mb)
ratio          = 2.5     # Ratio (kb/cM)

It is a key-value file contains required parameters. The separator is "=". The space between key and value will be ignored. There are at least four parameters needed to be set:

Key	Value type	Description	Recommend value
precision	Float	Precision for numerical calculation.	Default: 1e-13
k	Integer	Generation, e.g. k=6 means R₆ for diploid or H₆ for haploid.	Depending on experimental design
chrNum	Integer	Chromosome number.	Depending on species
geneticLength	Integer	Approximately total genetic distance of the species, the unit is cM. Comment this by "#" to use parameters below instead.	Depending on species
genomeSize	Float	Genome size of the species, the unit is Mb. (optional)	Depending on species
ratio	Float	Physical distance/Genetic distance ratio (kb/cM). (optional)	Depending on species

Result

The u_α/2 will be displayed as ua on the screen:

Read global parameters from configureExample/rk_yeast/ua_rk_conf.txt .
Parameter precision=1e-13
Parameter k=6
Parameter chrNum=16
Parameter geneticLength=4900

recombination rate is  0.02545
Genetic distance is  2.5472  cM
ua is  4.207883

back to top

Output files

Result 1 file

This file contains one line to show the theoretical threshold (assuming AF = 0.5 in the population) and a 11 columns table following. The AF is defined by the allele from Parent 1.

The data table described as below:

Column	Heading	Description
1	Chr.	Chromosome
2	Pos.	Block position (bp)
3	AF1-Observed	Observed value of block average allele frequency in pool 1 (AF1)
4	AF1-Expected	Expected value of block average allele frequency in pool 1 (AF1)
5	AF2-Observed	Observed value of block average allele frequency in pool 2 (AF2)
6	AF2-Expected	Expected value of block average allele frequency in pool 2 (AF2)
7	AFD-Observed	Observed value of allele frequency differency (AF1 - AF2)
8	AFD-Expected	Expected value of allele frequency differency (AF1 - AF2)
9	AFP-Observed	Observed value of block average allele frequency in the population (AFP)
10	AFP-Expected	Expected value of Observed value of block average allele frequency in the population (AFP)
11	Sample threshold	Threshold estimated based on the expected AFP

Example

result_random_L/result1.xls

#Theoretical threshold is ±	0.1666
#Chr.	Pos.	AF1-Observed	AF1-Expected	AF2-Observed	AF2-Expected	AFD-Observed	AFD-Expected	AFP-Observed	AFP-Expected	Sample threshold
......
I	178300	NA	0.4655	0.6667	0.4993	NA	0.0409	0.6667	0.4993	0.1666
I	178500	NA	0.4658	NA	0.4993	NA	0.0404	NA	0.4993	0.1666
I	178700	0.5	0.4661	0.4583	0.4993	-0.0417	0.04	0.4583	0.4993	0.1666
I	178900	0.4545	0.4664	NA	0.4992	NA	0.0396	NA	0.4992	0.1666
I	179100	0.4444	0.4667	0.5	0.4992	0.0556	0.0392	0.5	0.4992	0.1666
I	179300	0.6	0.467	0.2727	0.4992	-0.3273	0.0387	0.2727	0.4992	0.1666
......

back to top

Result 2 file

This file shows the information of candidate QTLs (significant AFD peaks). It contains six columns. By far, some peaks are needed to filter out manually to get the final candidate QTL list.

Column	Heading	Description
1	Chr.	Chromosome code
2	Pos.	Position of block center
3	Val.	Peak value of AFD
4	Peak Dir.	Peak direction: +, upward; -,downward
5	Start	Start point of confidence interval
6	End	End point of confidence interval

Example

result_random_L/result2.xls

#Chr.	Pos.	Val.	Peak Dir.	Start	End
IV	599459	-0.1914	-	534520	680013
VIII	222625	-0.2848	-	153041	287275
XI	643900	-0.3316	-	618381	643900
XII	655812	0.2102	+	550089	733741
XIII	9900	-0.2488	-	9900	108758
XIV	377027	0.273	+	321716	450359

back to top

Q&A

If I have the VCF file generated by Freebayes/GATK, how can I convert the VCF format into BSA format?

A perl script is provided for transforming the VCF format into BSA format.

Quick start

# Usage:
# perl tools/vcf2bsa.pl <samples information file> <markers.vcf> <output file>
perl tools/vcf2bsa.pl configureExample/vcf2bsa/vcf2bsa_conf.txt dataExample/vcf2bsa/markers.freebayes.vcf result/markers.bsa

Example

configureExample/vcf2bsa/vcf2bsa_conf.txt

 # Samples in 2x2 Table
 # Bulk/Pool 1: the first sample
 Table2x2.pool1 = low-pool-RG
 # Bulk/Pool 2: the second sample
 Table2x2.pool2 = random-pool-RG
 # Parent 1
 Table2x2.parent1 = P1-RG
 # Parent 2
 Table2x2.parent2 = P2-RG

It's a key-value file contains samples relationship. The separator is "=". And the space between key and value will be ignored. There are four parameters needed to be set:

Key	Value type	Description
Table2x2.pool1	String	Design A: high selected pool RG tag in VCF file. Design B: selected pool RG tag in VCF file.
Table2x2.pool2	String	Design A: low selected pool RG tag in VCF file. Design B: random pool RG tag in VCF file.
Table2x2.parent1	String	High phenotype value parent sample RG tag in VCF file.
Table2x2.parent2	String	Low phenotype value parent sample RG tag in VCF file.

RG tag is the “Read Group” setting at the reads mapping step. For example, the -R parameter value in BWA.

If one parent is missing, the script will consider the genotype different from the known parent sample as the other parent genotype. If both parents are missing, the script will consider the reference genotype as the known parent genotype.

back to top

Why does the output show that the data size of one chromosome is 0?

If the chromosome code in the data file and the chromosome code in the chromosome length file are not the same, such report will be shown. The chromosome codes in the two files should be exactly the same. It is case sensitive.

back to top

What is the suitable block size?

We recommend a physical distance approximately equivalent to a genetic distance of 0.1 cM as the block size. For example, in yeast, 1 cM is approximately equivalent to 2.5 kb on average, so we choose 0.2 kb as the block size. In rice, 1 cM is about equivalent to 250 kb, so we choose 20 kb as the block size.

back to top

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
configureExample		configureExample
dataExample		dataExample
img		img
tools		tools
.gitignore		.gitignore
BRM.R		BRM.R
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Block Regression Mapping (BRM)

Content

Introduction

Getting started

Input data file

Example

Input configuration files

Chromosome length file

Example

Block regression mapping configuration file

Example

About u_α/2

The u_α/2 values in various populations

The u_α/2 values of some species in various populations

A tool for calculating u_α/2 in an F_k population

Quick start

Result

The u_α/2 values in random-mating progeny populations

The u_α/2 values of yeast and maize in random-mating progeny populations

A tool for calculating u_α/2 in an R_k population

Quick start

Example

Result

Output files

Result 1 file

Example

Result 2 file

Example

Q&A

Quick start

Example

About

Releases 8

Packages

Languages

Species	H₁(H)/R₁(F₂)	H₂/R₂	H₃/R₃	H₄/R₄	H₅/R₅	H₆/R₆	H₇/R₇	H₈/R₈	H₉/R₉
Yeast	3.93	4.02	4.08	4.13	4.17	4.21	4.24	4.26	4.29
Maize	3.72	3.81	3.88	3.93	3.97	4.01	4.04	4.07	4.09

License

huanglikun/BRM

Folders and files

Latest commit

History

Repository files navigation

Block Regression Mapping (BRM)

Content

Introduction

Getting started

Input data file

Example

Input configuration files

Chromosome length file

Example

Block regression mapping configuration file

Example

About uα/2

The uα/2 values in various populations

The uα/2 values of some species in various populations

A tool for calculating uα/2 in an Fk population

Quick start

Result

The uα/2 values in random-mating progeny populations

The uα/2 values of yeast and maize in random-mating progeny populations

A tool for calculating uα/2 in an Rk population

Quick start

Example

Result

Output files

Result 1 file

Example

Result 2 file

Example

Q&A

Quick start

Example

About

Resources

License

Stars

Watchers

Forks

Releases 8

Packages 0

Languages

About u_α/2

The u_α/2 values in various populations

The u_α/2 values of some species in various populations

A tool for calculating u_α/2 in an F_k population

The u_α/2 values in random-mating progeny populations

The u_α/2 values of yeast and maize in random-mating progeny populations

A tool for calculating u_α/2 in an R_k population

Packages