# Intro to Python, pandas, and IPython notebooks

### IPython notebooks

Welcome to Jupyter Notebook on the HPC (high-performance compute cluster). IPython notebooks, also known as Jupyter notebooks, are interactive multimedia environments for writing and running code.

If you have written and run your own programs before, you are probably used to running a program once from start to end. IPython notebooks let you run code more interactively.

Each unit in a Jupyter notebook is called a cell. When I use notebooks (ipynbs for short, after the `.ipynb` file extension), I typically run one cell at a time. You can run a cell by clicking on it and pressing Shift+Enter. Try it on the cell below.

In [1]:
print('hello world')

hello world


Another really nice feature of ipynbs, and one that is particularly well suited to data science, is that variables persist after you run a cell of code. That is to say, if you define a variable in one cell, you can then access it in another cell and inspect its contents. This allows you to run small chunks of code at a time, make sure that they're doing what you want them to, and then continue using your variables.

Take the following example: say we have a list and we want to swap out all instances of 'dog' for 'cat'.

In [2]:
pet_list = ['dog', 'cat', 'guinea_pig', 'lizard', 'cat', 'dog', 'mouse']

First, we can check which positions in the list hold 'dog'.

In [3]:
# loop through the list and find the indices of the list where the element is 'dog'
dog_indices = []
for i, element in enumerate(pet_list):
    if element == 'dog':
        dog_indices.append(i)

In [4]:
# make sure that each of the indices we found points to 'dog'
for i in dog_indices:
    print()
    print(pet_list[i])
    print(pet_list[i] == 'dog')


dog
True

dog
True
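
The search and the check above can also be written more compactly. The following is just a sketch of one alternative, not a replacement for the step-by-step cells: it uses a list comprehension to collect the indices and the built-in `all()` to confirm them in one line. The name `all_dogs` is made up for this illustration.

```python
# same list as in the cells above
pet_list = ['dog', 'cat', 'guinea_pig', 'lizard', 'cat', 'dog', 'mouse']

# list comprehension: keep every index i whose element is 'dog'
dog_indices = [i for i, element in enumerate(pet_list) if element == 'dog']
print(dog_indices)

# all() is True only if every single comparison is True
all_dogs = all(pet_list[i] == 'dog' for i in dog_indices)
print(all_dogs)
```

Running the loop version first and then condensing it like this is a common workflow in a notebook, since you can compare the two results cell by cell.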


In [5]:
# now that we can be confident that we've identified all the elements of the list that are 'dog', we can replace them
for i in dog_indices:
    pet_list[i] = 'cat'

In [6]:
# and finally, verify that every 'dog' in the list has been replaced
print(pet_list)
print('dog' in pet_list)

['cat', 'cat', 'guinea_pig', 'lizard', 'cat', 'cat', 'mouse']
False
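
If you don't need the intermediate indices, the same swap can probably be done in a single pass. This is a minimal sketch using a conditional expression inside a list comprehension; `swapped` is a name chosen for this example, and unlike the cells above it builds a new list instead of modifying `pet_list` in place.

```python
pet_list = ['dog', 'cat', 'guinea_pig', 'lizard', 'cat', 'dog', 'mouse']

# build a new list: use 'cat' wherever the element is 'dog', otherwise keep the element
swapped = ['cat' if pet == 'dog' else pet for pet in pet_list]

print(swapped)
print('dog' in swapped)   # membership test on the new list
print('dog' in pet_list)  # the original list is left untouched
```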


This is a pretty simple example, but hopefully it demonstrates the value of being able to stop and examine what your code is doing while you're writing it, instead of debugging it by running everything over again every single time. This convenience really shines when you're dealing with large datasets: you don't have to reload or re-transform the data (which can be very time-consuming steps) every time you want to try something new.

Another nice feature of ipynbs is their built-in support for Markdown. For each cell that you write, you can choose whether it should be interpreted as code, Markdown, or raw text. Markdown supports *text formatting* **such as this**. You can also add headings for the different sections of your Jupyter notebook, as you can see in the other parts of this notebook.

### Python

This notebook is running Python. Python is a programming language that is commonly used in bioinformatics. It is (in my opinion) easy to understand and write, and it has really powerful libraries that you can load for data science and visualization. I'm hoping a lot of you already know how to program, but I'll go over some basic Python syntax and programming concepts here.

In [10]:
import pandas as pd # library for data matrix manipulation
import seaborn as sns # library for plotting pandas-formatted data

In [8]:
my_list = ['frog', 'bat', 'axlotl'] # list
print(my_list[0]) # indexing a list (Python uses 0-based indexing, so the first element is at index 0)
print(my_list[1])
print(my_list[2])

frog
bat
axlotl
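
Beyond indexing from the front, lists also support negative indices and slices. Here is a small sketch of both, reusing `my_list` from above; the variable names `last`, `first_two`, and `length` are only for illustration.

```python
my_list = ['frog', 'bat', 'axlotl']

last = my_list[-1]       # negative indices count from the end of the list
first_two = my_list[:2]  # slice: elements at index 0 and 1 (the stop index is excluded)
length = len(my_list)    # number of elements in the list

print(last)
print(first_two)
print(length)
```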


In [9]:
my_dict = {'axlotl': 'amphibian', 'bat': 'mammal', 'frog': 'amphibian'} # dictionary - store key:value pairs
print(my_dict['axlotl'])
print(my_dict['bat'])
print(my_dict['frog'])

amphibian
mammal
amphibian
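
A few other dictionary operations tend to come up a lot. The sketch below shows a membership test, a lookup with a fallback value, and looping over key:value pairs. The `'lizard'` key and the `'unknown'` default are assumptions added for this example and are not part of the original dictionary.

```python
my_dict = {'axlotl': 'amphibian', 'bat': 'mammal', 'frog': 'amphibian'}

# check whether a key exists before using it
has_bat = 'bat' in my_dict
print(has_bat)

# my_dict['lizard'] would raise a KeyError; .get() returns a default instead
lizard_kind = my_dict.get('lizard', 'unknown')
print(lizard_kind)

# .items() yields (key, value) pairs
amphibians = []
for animal, kind in my_dict.items():
    if kind == 'amphibian':
        amphibians.append(animal)
print(amphibians)
```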


In [12]:
data = ['amphibian', 'mammal', 'amphibian']
ind = ['axlotl', 'bat', 'frog']
df = pd.DataFrame(data=data, index=ind, columns=['kind']) # pandas data frame - I work with these every day!
df

| | kind |
|---|---|
| axlotl | amphibian |
| bat | mammal |
| frog | amphibian |
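
Once the data is in a data frame, you can look things up by label and filter rows by a condition. The following is a hedged sketch of a few common operations on the same `df`; the variable names `bat_kind`, `amphibian_names`, and `counts` are chosen just for this example.

```python
import pandas as pd

data = ['amphibian', 'mammal', 'amphibian']
ind = ['axlotl', 'bat', 'frog']
df = pd.DataFrame(data=data, index=ind, columns=['kind'])

# .loc looks up a value by row label and column label
bat_kind = df.loc['bat', 'kind']
print(bat_kind)

# boolean filtering: keep only the rows where the condition is True
amphibians = df[df['kind'] == 'amphibian']
amphibian_names = amphibians.index.tolist()
print(amphibian_names)

# count how many times each value appears in a column
counts = df['kind'].value_counts().to_dict()
print(counts)
```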


In [15]:
# for loops

# iterate until a certain number
for i in range(10):
    print(i)
    
print()

# iterate through a list
my_list = ['frog', 'bat', 'axlotl'] # list
for animal in my_list:
    print(animal)
    
print()
    
# iterate through a list while getting the number of each iteration using enumerate()
for i, animal in enumerate(my_list):
    print('Animal at {}: {}'.format(i, animal)) # string formatting - this is also useful

0
1
2
3
4
5
6
7
8
9

frog
bat
axlotl

Animal at 0: frog
Animal at 1: bat
Animal at 2: axlotl
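
Two small variations on the loops above may be handy. This sketch assumes Python 3.6 or newer, where f-strings are available as an alternative to `.format()`. It also shows `enumerate()` with a `start` argument and `range()` with a step. The names `lines` and `evens` are made up for this example.

```python
my_list = ['frog', 'bat', 'axlotl']

# enumerate(..., start=1) begins counting at 1 instead of 0
lines = []
for i, animal in enumerate(my_list, start=1):
    # f-string: expressions inside {} are evaluated and inserted into the string
    lines.append(f'Animal {i} of {len(my_list)}: {animal}')

for line in lines:
    print(line)

# range(start, stop, step): stop is excluded, so this gives 0, 2, 4, 6, 8
evens = list(range(0, 10, 2))
print(evens)
```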


In [19]:
# if / else blocks - execute code based on whether or not a condition is met
for animal in my_list:
    if animal == 'frog' or animal == 'axlotl': # or logic - one or the other condition is met
        print('Animal {} is an amphibian'.format(animal))
    else:
        print('Animal {} is not an amphibian'.format(animal))
        
print()

first_element = True # boolean variable, can be True or False
for animal in my_list:
    if first_element and animal == 'frog': # and logic - both conditions must be true
        print('First animal is frog')
    elif first_element and animal == 'bat': # elif: checked only if the preceding if condition was False. here it never runs, because the first animal is 'frog'.
        print('First animal is bat')
    else:
        # you can index strings the same way you can index lists. here we're just trying to see if the word
        # starts with a vowel
        if animal[0] == 'a' or animal[0] == 'e' or animal[0] == 'i' or animal[0] == 'o' or animal[0] == 'u': 
            print('Found an {}'.format(animal))
        else:
            print('Found a {}'.format(animal))
    
    first_element = False

Animal frog is an amphibian
Animal bat is not an amphibian
Animal axlotl is an amphibian

First animal is frog
Found a bat
Found an axlotl
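
The chain of `or` comparisons in the vowel check can be shortened with the `in` operator, which tests whether a character appears in a string. Below is one possible sketch; `article_for` is a hypothetical helper written for this example, and it assumes the word is not empty.

```python
def article_for(word):
    """Return 'an' if word starts with a vowel letter, otherwise 'a'.

    Hypothetical helper for illustration; assumes word is a non-empty string.
    """
    # word[0] is the first character; 'in' checks membership in the string 'aeiou'
    return 'an' if word[0].lower() in 'aeiou' else 'a'

my_list = ['frog', 'bat', 'axlotl']
for animal in my_list:
    print(f'Found {article_for(animal)} {animal}')
```

Wrapping the check in a function keeps the loop body short and lets you reuse the same logic elsewhere in the notebook.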
