In [1]:
import warnings
warnings.filterwarnings(action='ignore')

# Chapter 12. Bio.PopGen: Population genetics

Bio.PopGen is a Biopython module supporting population genetics, available in Biopython 1.44 onwards.
The module aims to support widely used data formats, applications and databases.

## 12.1 GenePop

GenePop (http://genepop.curtin.edu.au/) is a popular population genetics software package supporting
Hardy-Weinberg tests, linkage disequilibrium, population differentiation, basic statistics, Fst and migration
estimates, among others. GenePop does not handle sequence data, so it does not supply sequence-based
statistics.

The GenePop file format is supported by a wide range of other population genetics software applications,
which makes it a relevant format in the field.
Bio.PopGen provides a parser and a generator for the GenePop file format, together with utilities to
manipulate the content of a record. Here is an example of how to read a GenePop file (you can find example
GenePop data files in the [Tests/PopGen](https://github.com/biopython/biopython/blob/master/Tests/PopGen) directory of Biopython):

In [2]:
from copy import deepcopy
from Bio.PopGen import GenePop

In [3]:
# Open the file in a with block so the handle is closed even if parsing fails
with open('./c3line.gen') as handle:
    rec = GenePop.read(handle)

In [4]:
print(rec)

Generated by createGenePop.py - (C) Tiago Antao
136255903
136257048
136257636
Pop
1, 003003 004004 002002
2, 003003 003004 002002
3, 003003 004004 002002
4, 003003 004003 000000
Pop
b1, 000000 004004 002002
b2, 000000 004004 002002
b3, 000000 004004 002002
Pop
1, 003003 004004 002002
2, 003003 001004 002002
3, 003002 001001 002002
4, 000000 004004 002002
5, 003003 004004 002002



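The most important information in rec is the list of loci names and the population information (there is more; use help(GenePop.Record) to check the API documentation). Loci names are stored in rec.loci_list and population information in rec.populations. Each population is a list of (individual name, genotypes) pairs, where genotypes holds one allele pair per locus and missing alleles appear as None.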

In [5]:
rec.loci_list

['136255903', '136257048', '136257636']

In [6]:
rec.populations

[[('1', [(3, 3), (4, 4), (2, 2)]),
  ('2', [(3, 3), (3, 4), (2, 2)]),
  ('3', [(3, 3), (4, 4), (2, 2)]),
  ('4', [(3, 3), (4, 3), (None, None)])],
 [('b1', [(None, None), (4, 4), (2, 2)]),
  ('b2', [(None, None), (4, 4), (2, 2)]),
  ('b3', [(None, None), (4, 4), (2, 2)])],
 [('1', [(3, 3), (4, 4), (2, 2)]),
  ('2', [(3, 3), (1, 4), (2, 2)]),
  ('3', [(3, 2), (1, 1), (2, 2)]),
  ('4', [(None, None), (4, 4), (2, 2)]),
  ('5', [(3, 3), (4, 4), (2, 2)])]]
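
Comparing the printed record with rec.populations suggests how each genotype token maps to an allele pair: a six-digit token holds two three-digit allele codes, and the code 000 marks a missing allele. The following is a minimal pure-Python sketch of that mapping, not Biopython's own parser; the helper name `decode_genotype` and the `marker_len` parameter are hypothetical and used only for illustration.

```python
def decode_genotype(token, marker_len=3):
    """Split a diploid genotype token into a pair of allele codes.

    Hypothetical helper: assumes each allele is written with
    marker_len digits and that an all-zero code means "missing".
    """
    if len(token) != 2 * marker_len:
        raise ValueError(f"unexpected genotype token: {token!r}")

    def to_allele(code):
        value = int(code)
        return None if value == 0 else value

    return (to_allele(token[:marker_len]), to_allele(token[marker_len:]))


# One individual line taken from the first population of the printed record
line = "4, 003003 004003 000000"
name, genotype_text = line.split(",", 1)
decoded = (name.strip(), [decode_genotype(t) for t in genotype_text.split()])
print(decoded)
```

The decoded value has the same shape as the entry for individual '4' in the first population of rec.populations.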

In [7]:
rec.remove_population(1)
# Removes a population from the record. The argument is the population's
# position in rec.populations; positions start at 0, so this removes the
# second population.
# rec is modified in place.

In [8]:
rec.remove_locus_by_position(2)
# Removes a locus by its position. The argument is the locus position in
# rec.loci_list; positions start at 0, so this removes the third locus.
# rec is modified in place.

In [9]:
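rec.remove_locus_by_name(1)
# Removes a locus by its name, where the name is an entry of
# rec.loci_list (a string such as '136255903'). If the name doesn't exist
# the function fails silently, which is what happens here: 1 is not a
# locus name, so no locus is removed.
# When a matching name is given, rec is modified in place.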
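
To make the position and name semantics of the three removal calls concrete, the sketch below applies the same steps to plain Python lists shaped like rec.loci_list and rec.populations. It is an illustration under stated assumptions, not the Biopython implementation: the functions `drop_population`, `drop_locus_by_position` and `drop_locus_by_name` are hypothetical, and the data is a shortened copy of the record shown earlier.

```python
loci_list = ["136255903", "136257048", "136257636"]
populations = [
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (3, 4), (2, 2)])],
    [("b1", [(None, None), (4, 4), (2, 2)])],
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (1, 4), (2, 2)])],
]


def drop_population(populations, pos):
    """Remove the population at a 0-based position, in place."""
    del populations[pos]


def drop_locus_by_position(loci_list, populations, pos):
    """Remove one locus name and the matching genotype of every individual."""
    del loci_list[pos]
    for pop in populations:
        for _name, genotypes in pop:
            del genotypes[pos]


def drop_locus_by_name(loci_list, populations, name):
    """Remove a locus by name; do nothing if the name is unknown."""
    if name in loci_list:
        drop_locus_by_position(loci_list, populations, loci_list.index(name))


drop_population(populations, 1)               # second population goes away
drop_locus_by_position(loci_list, populations, 2)  # third locus goes away
drop_locus_by_name(loci_list, populations, 1)      # 1 is not a name: no change

print(loci_list)
print(populations)
```

Passing the string '136257048' instead of the integer 1 to the last call would remove the second locus as well.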

In [10]:
rec_loci = rec.split_in_loci(0)
# Splits a record by locus: for each locus, it creates a new record
# containing that single locus and all populations.
# The result is returned as a dictionary in which each key is a locus name
# and each value is the corresponding GenePop record.
# rec is not altered.

In [11]:
rec_pops = rec.split_in_pops(['r', 'e', 'c'])
# Splits a record by population: for each population, it creates a new
# record containing that single population and all loci.
# The result is returned as a dictionary in which each key is a population
# name. As population names are not available in GenePop, they are passed
# in as a list (pop_names).
# The value of each dictionary entry is the GenePop record.
# rec is not altered.

In [12]:
rec.populations

[[('1', [(3, 3), (4, 4)]),
  ('2', [(3, 3), (3, 4)]),
  ('3', [(3, 3), (4, 4)]),
  ('4', [(3, 3), (4, 3)])],
 [('1', [(3, 3), (4, 4)]),
  ('2', [(3, 3), (1, 4)]),
  ('3', [(3, 2), (1, 1)]),
  ('4', [(None, None), (4, 4)]),
  ('5', [(3, 3), (4, 4)])]]
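
Because rec.populations is an ordinary nested list, simple summaries can be computed with plain Python. As a sketch, the code below counts alleles per locus across all populations, skipping missing alleles (None). The function name `allele_counts` is hypothetical, and the data is copied from the output above so that the example runs without the input file.

```python
from collections import Counter

# Copy of rec.populations after the removals above
populations = [
    [("1", [(3, 3), (4, 4)]),
     ("2", [(3, 3), (3, 4)]),
     ("3", [(3, 3), (4, 4)]),
     ("4", [(3, 3), (4, 3)])],
    [("1", [(3, 3), (4, 4)]),
     ("2", [(3, 3), (1, 4)]),
     ("3", [(3, 2), (1, 1)]),
     ("4", [(None, None), (4, 4)]),
     ("5", [(3, 3), (4, 4)])],
]


def allele_counts(populations, locus_pos):
    """Count observed alleles at one locus, ignoring missing data."""
    counts = Counter()
    for pop in populations:
        for _name, genotypes in pop:
            for allele in genotypes[locus_pos]:
                if allele is not None:
                    counts[allele] += 1
    return counts


for pos in range(2):
    print(pos, dict(sorted(allele_counts(populations, pos).items())))
```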

In [13]:
rec_loci

{'136255903': <Bio.PopGen.GenePop.Record at 0x7f11343ac250>,
 '136257048': <Bio.PopGen.GenePop.Record at 0x7f11343ac040>}

In [14]:
rec_pops

{'r': <Bio.PopGen.GenePop.Record at 0x7f11343ac430>,
 'e': <Bio.PopGen.GenePop.Record at 0x7f11343ac340>}
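
The keys of the two dictionaries follow from how the splits pair names with data: rec_loci is keyed by the two remaining locus names, and rec_pops has only the keys 'r' and 'e' because the record holds two populations, so the third name 'c' is unused. The sketch below reproduces that keying on plain dictionaries. It assumes names are paired with populations by position, which is consistent with the output above, and it is not the Biopython implementation; the variables `by_pop` and `by_locus` are hypothetical.

```python
loci_list = ["136255903", "136257048"]
populations = [
    [("1", [(3, 3), (4, 4)]), ("2", [(3, 3), (3, 4)])],
    [("1", [(3, 3), (4, 4)]), ("2", [(3, 3), (1, 4)])],
]
pop_names = ["r", "e", "c"]

# One entry per population: all loci, a single population.
# zip stops at the shorter input, so the extra name "c" is ignored.
by_pop = {
    name: {"loci_list": list(loci_list), "populations": [pop]}
    for name, pop in zip(pop_names, populations)
}

# One entry per locus: a single locus, all populations.
by_locus = {}
for i, locus in enumerate(loci_list):
    by_locus[locus] = {
        "loci_list": [locus],
        "populations": [
            [(ind, [genotypes[i]]) for ind, genotypes in pop]
            for pop in populations
        ],
    }

print(list(by_pop))
print(list(by_locus))
print(by_locus["136257048"]["populations"])
```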