# Python Basics

_This notebook lets you practice some basic Python skills: working with different data types, using various data structures, reading and writing text files, using conditionals and other control-flow structures, creating functions, and of course working with IPython notebooks._


__Data File:__ The file `genos.txt` has a column of genotypes.
There are three genotypes: `'AA'`, `'AG'`, and `'GG'`. Missing values are denoted by `'NA'`.

I. Read in the data and store the contents in a list called __genos__

II. Calculate the number of occurrences of each genotype, and store the results in a dictionary called __geno_counts__. Use the following 3 approaches:
1. Use a **for** loop to count the genotypes (store the result in a dictionary)
2. Get the same counts but this time using the `count()` method
3. Another alternative is to use `Counter` from the __collections__ module

III. Once you've counted the genotypes, make a function __get_proportions()__ that takes `geno_counts` and returns a dictionary with relative frequencies (i.e. proportions) of genotypes.

IV. Convert the string values in __genos__ to integers (`'NA'` remains `'NA'`) and put them in a new list called **numeric_genos**:
- `'AA'` = 0
- `'AG'` = 1
- `'GG'` = 2
- `'NA'` = `'NA'`

V. Write the data in **numeric_genos** to a text file called `genos_int.txt`

VI. Finally, convert your notebook to HTML and open the result by running these commands from the shell:

```shell
jupyter nbconvert --to html genotypes.ipynb
open genotypes.html
```

In [1]:
# things to be imported
from __future__ import division  # true division under Python 2; no effect in Python 3
from collections import Counter

---

## I. Reading a text file

Some references on reading files:

- File Operations: [https://github.com/dlab-berkeley/python-fundamentals/blob/master/cheat-sheets/12-Files.ipynb](https://github.com/dlab-berkeley/python-fundamentals/blob/master/cheat-sheets/12-Files.ipynb)
- Reading Text Files: [http://www.jarrodmillman.com/rcsds/lectures/reading_text_files.html](http://www.jarrodmillman.com/rcsds/lectures/reading_text_files.html)

In [2]:
# empty list to hold the genotypes
genos = []
# open 'genos.txt' in read (text) mode; the with block closes the file automatically
with open("genos.txt", "rt") as my_file:
    # strip the newline character from each line
    for line in my_file:
        genos.append(line.strip())

In [3]:
# how many lines
len(genos)

1000

In [4]:
# first 5 lines
genos[:5]

['GG', 'GG', 'AG', 'AG', 'AA']

In [5]:
# last 5 lines
genos[-5:]

['GG', 'AG', 'AG', 'GG', 'AA']

---

## II. Counting Genotypes

In [42]:
# count AA, AG, GG, NA
count_AA = 0
count_AG = 0
count_GG = 0
count_NA = 0

for gen in genos:
    if gen == 'AA':
        count_AA = count_AA + 1
    elif gen == 'AG':
        count_AG = count_AG + 1
    elif gen == 'GG':
        count_GG = count_GG + 1
    else:
        count_NA = count_NA + 1

# build the dictionary once, after the loop has finished
geno_counts = {'AA': count_AA, 'AG': count_AG, 'GG': count_GG, 'NA': count_NA}

geno_counts

{'AA': 339, 'AG': 315, 'GG': 333, 'NA': 13}

### Using the `count()` method

In [7]:
genos.count('AA')

339
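
The cell above counts a single genotype. To get the full dictionary requested in the exercise, one option is to call `count()` once per label. This is a minimal sketch that uses a small made-up list (`sample_genos`, not the real data) so that it runs on its own; the names `labels` and `sample_counts` are illustrative only.

```python
# Hypothetical small sample standing in for the 1000-element `genos` list
sample_genos = ['GG', 'GG', 'AG', 'AG', 'AA', 'NA']

# The four labels described in the exercise
labels = ['AA', 'AG', 'GG', 'NA']

# One count() call per label; each call scans the whole list
sample_counts = {label: sample_genos.count(label) for label in labels}
print(sample_counts)
```

Because every `count()` call walks the entire list, this approach reads the data once per label, whereas the **for** loop reads it only once.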

### Using Counter from collections

In [9]:
b = Counter(genos)

In [10]:
b

Counter({'AA': 339, 'AG': 315, 'GG': 333, 'NA': 13})

In [12]:
b['AA']

339
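
A `Counter` behaves like a dictionary, so it can be converted to a plain `dict` if the exercise's `geno_counts` needs to be one. The sketch below again uses a small made-up list (`sample_genos`) rather than the real data, and the label `'TT'` is a hypothetical value chosen only to show what happens with a key that never occurs.

```python
from collections import Counter

# Hypothetical small sample standing in for the real `genos` list
sample_genos = ['GG', 'GG', 'AG', 'AG', 'AA', 'NA']

sample_counter = Counter(sample_genos)

# Convert to a regular dictionary
plain_counts = dict(sample_counter)
print(plain_counts)

# Unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError
missing = sample_counter['TT']
print(missing)
```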

---

## III. Function to Calculate Proportions

In [19]:
geno_counts

{'AA': 339, 'AG': 315, 'GG': 333, 'NA': 13}

In [22]:
geno_counts.values()

[339, 333, 315, 13]

In [43]:
def get_proportions(counts):
    """Compute the proportion of each genotype.
        - input: dictionary with counts
        - output: dictionary with proportions
    """
    # total number of observations (values() works in Python 2 and 3)
    total = 0
    for value in counts.values():
        total = total + value
    # proportion for each genotype, in the same order as counts.keys()
    proportions = []
    for value in counts.values():
        proportions.append(value / total)
    # pair each genotype with its proportion
    geno_props = dict(zip(counts.keys(), proportions))
    return geno_props

In [44]:
get_proportions(geno_counts)

{'AA': 0.339, 'AG': 0.315, 'GG': 0.333, 'NA': 0.013}
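
The same computation can be written more compactly with `sum()` and a dictionary comprehension. This is only a sketch of an alternative: `get_proportions_compact` is a hypothetical name, and the counts are copied from the output shown earlier so the block runs on its own. It also checks that the proportions add up to 1, allowing for floating-point rounding.

```python
# Counts copied from the output shown earlier in this notebook
geno_counts = {'AA': 339, 'AG': 315, 'GG': 333, 'NA': 13}

def get_proportions_compact(counts):
    """Hypothetical compact variant: dictionary of counts -> dictionary of proportions."""
    total = sum(counts.values())
    # float() keeps this a true division under Python 2 as well
    return {geno: count / float(total) for geno, count in counts.items()}

props = get_proportions_compact(geno_counts)
print(props)

# sanity check: the proportions should sum to 1 (up to rounding error)
sums_to_one = abs(sum(props.values()) - 1.0) < 1e-9
print(sums_to_one)
```

Note that this sketch, like the function above, assumes the counts are not all zero; an empty or all-zero dictionary would raise a `ZeroDivisionError`.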

---

## IV. Converting to numeric genotypes

In [15]:
# map each genotype string to its integer code ('NA' stays as 'NA')
numeric_genos = []

for gen in genos:
    if gen == 'AA':
        numeric_genos.append(0)
    elif gen == 'AG':
        numeric_genos.append(1)
    elif gen == 'GG':
        numeric_genos.append(2)
    else:
        numeric_genos.append('NA')
    
# check it
Counter(numeric_genos)

Counter({0: 339, 1: 315, 2: 333, 'NA': 13})
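
An alternative to the `if`/`elif` chain is to store the coding scheme in a dictionary and look each genotype up. The sketch below assumes the same coding as in the exercise; `code_map`, `sample_genos`, and `sample_numeric` are illustrative names, and the short list is made up so the block is self-contained.

```python
# Coding scheme from the exercise: 'AA' = 0, 'AG' = 1, 'GG' = 2
code_map = {'AA': 0, 'AG': 1, 'GG': 2}

# Hypothetical small sample standing in for the real `genos` list
sample_genos = ['GG', 'AG', 'AA', 'NA']

# get() falls back to 'NA' for any value not in code_map,
# which mirrors the else branch of the loop above
sample_numeric = [code_map.get(gen, 'NA') for gen in sample_genos]
print(sample_numeric)
```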

---

## V. Write Numeric Genotypes to a text file

In [16]:
# open 'genos_int.txt' in write mode; the with block closes the file automatically
with open("genos_int.txt", "w") as new_file:
    for gen in numeric_genos:
        new_file.write(str(gen) + '\n')
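
One way to check the result is to read the file back and compare it with what was written. The sketch below is self-contained and deliberately writes to a temporary location with a hypothetical file name (`genos_int_sample.txt`) and a made-up short list, so it does not touch the real `genos_int.txt`.

```python
import os
import tempfile

# Hypothetical small sample standing in for the real `numeric_genos` list
sample_numeric = [2, 1, 0, 'NA']

# Write to a temporary directory so the real output file is left alone
path = os.path.join(tempfile.mkdtemp(), 'genos_int_sample.txt')
with open(path, 'w') as out_file:
    for gen in sample_numeric:
        out_file.write(str(gen) + '\n')

# Read the file back; every value comes back as a string
with open(path) as in_file:
    read_back = [line.strip() for line in in_file]
print(read_back)
```

Everything read from a text file is a string, so the integer codes come back as `'0'`, `'1'`, and `'2'` and would need `int()` to be used as numbers again.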