# 1. Partial Correlations

Modify the `partial_cor()` function from Class 9d so that it takes a correlation matrix and the name of one variable as input, and returns the partial correlation of each pair of the remaining variables, controlling for the variable passed to the function.

For example, with the correlation matrices that we calculated in Class 9d, calling `partial_cor(cor1, "CAI")` returns the partial correlation of each pair of variables (excluding CAI), controlling for CAI.

To do this, save the `result` dataframe from Class 9d to a file, copy the file to the homework directory, read the file into a dataframe, calculate the correlation matrix, and finally test your function.
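For reference, the first-order partial correlation of two variables $x$ and $y$, controlling for a third variable $z$, is

$$r_{xy \mid z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}$$

The sketch below simply evaluates this formula for a single pair. The helper name `partial_r` and the three input correlations are made up for illustration; they are not taken from the Class 9d data.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))

# Illustrative (hypothetical) correlations, not values from the homework data
example = partial_r(0.5, 0.6, 0.7)
print(round(example, 3))  # 0.14
```

In this made-up case a raw correlation of 0.5 shrinks to about 0.14 once the shared dependence on $z$ is removed. When $x$ and $y$ are both uncorrelated with $z$, the formula returns $r_{xy}$ unchanged.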

In [106]:
import pandas as pd
from math import sqrt

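def partial_cor(cor, varname):
    """Return the partial correlation of every pair of variables in `cor`,
    controlling for `varname` (whose row and column are dropped)."""
    # Start from a copy without the control variable's row and column
    cor2 = cor.copy()
    cor2.drop(varname, axis=1, inplace=True)
    cor2.drop(varname, axis=0, inplace=True)
    for x in cor.columns.tolist():
        if x == varname:
            continue
        for y in cor.columns.tolist():
            if y == varname:
                continue
            # First-order partial correlation of x and y given varname
            rdk = ((cor[x][y] - cor[x][varname] * cor[y][varname]) /
                   (sqrt(1 - cor[x][varname] ** 2) * sqrt(1 - cor[y][varname] ** 2)))
            cor2.at[x, y] = rdk
    return cor2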

# Load DataFrame from CSV file
df = pd.read_csv('result_df.csv')

# Calculate correlation matrix
cor1 = df.corr(numeric_only=True)

# Partial correlations from the Pearson correlation matrix, controlling for the "CAI" column
test = partial_cor(cor1, "CAI")
test

|                | dS       | dN       | dN/dS    | dS adjusted | dN/dS adjusted | fitness  | fdS      | fdN      | fdNdS    | fdSadj   | fdNdSadj |
|:---------------|---------:|---------:|---------:|------------:|---------------:|---------:|---------:|---------:|---------:|---------:|---------:|
| dS             | 1.0      | 0.334624 | 0.081485 | 1.0         | 0.145174       | 0.072793 | 0.922889 | 0.370816 | 0.136501 | 0.994316 | 0.21648  |
| dN             | 0.334624 | 1.0      | 0.94219  | 0.334624    | 0.967828       | 0.171    | 0.227405 | 0.827892 | 0.798871 | 0.322453 | 0.824645 |
| dN/dS          | 0.081485 | 0.94219  | 1.0      | 0.081485    | 0.992111       | 0.174507 | -0.00726 | 0.818125 | 0.854336 | 0.067914 | 0.858034 |
| dS adjusted    | 1.0      | 0.334624 | 0.081485 | 1.0         | 0.145175       | 0.072793 | 0.922889 | 0.370816 | 0.136502 | 0.994316 | 0.21648  |
| dN/dS adjusted | 0.145174 | 0.967828 | 0.992111 | 0.145175    | 1.0            | 0.173285 | 0.056628 | 0.812879 | 0.829747 | 0.132763 | 0.84241  |
| fitness        | 0.072793 | 0.171    | 0.174507 | 0.072793    | 0.173285       | 1.0      | 0.068038 | 0.22495  | 0.21668  | 0.072368 | 0.223637 |
| fdS            | 0.922889 | 0.227405 | -0.00726 | 0.922889    | 0.056628       | 0.068038 | 1.0      | 0.304332 | 0.052298 | 0.933469 | 0.15418  |
| fdN            | 0.370816 | 0.827892 | 0.818125 | 0.370816    | 0.812879       | 0.22495  | 0.304332 | 1.0      | 0.965591 | 0.368648 | 0.985984 |
| fdNdS          | 0.136501 | 0.798871 | 0.854336 | 0.136502    | 0.829747       | 0.21668  | 0.052298 | 0.965591 | 1.0      | 0.131464 | 0.990333 |
| fdSadj         | 0.994316 | 0.322453 | 0.067914 | 0.994316    | 0.132763       | 0.072368 | 0.933469 | 0.368648 | 0.131464 | 1.0      | 0.21311  |
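
The nested loops in `partial_cor()` can also be written with array operations. The following is only a sketch of that alternative, not the Class 9d implementation: the name `partial_cor_vectorized` and the synthetic `demo` data are assumptions made here so that the block runs on its own, without `result_df.csv`.

```python
import numpy as np
import pandas as pd

def partial_cor_vectorized(cor, varname):
    """Partial correlation of every pair in `cor`, controlling for `varname`."""
    # Correlation of each remaining variable with the control variable
    r_z = cor[varname].drop(varname)
    # Correlation matrix without the control variable, in the same label order
    sub = cor.drop(index=varname, columns=varname).loc[r_z.index, r_z.index]
    # Numerator r_xy - r_xz * r_yz for all pairs at once
    numer = sub.to_numpy() - np.outer(r_z, r_z)
    # Denominator sqrt(1 - r_xz^2) * sqrt(1 - r_yz^2) for all pairs at once
    scale = np.sqrt(1.0 - r_z.to_numpy() ** 2)
    denom = np.outer(scale, scale)
    return pd.DataFrame(numer / denom, index=sub.index, columns=sub.columns)

# Synthetic data (hypothetical): "a" and "b" both depend on "z", "c" does not
rng = np.random.default_rng(0)
z = rng.normal(size=500)
demo = pd.DataFrame({
    "z": z,
    "a": z + rng.normal(size=500),
    "b": 0.5 * z + rng.normal(size=500),
    "c": rng.normal(size=500),
})
demo_cor = demo.corr()
demo_partial = partial_cor_vectorized(demo_cor, "z")
print(demo_partial)
```

Two quick sanity checks apply to either version: the diagonal of the result should be 1, and the matrix should be symmetric.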


# 2. Results from Wall et al.

Download the HTML page for the Wall et al. paper from the PNAS website (https://www.pnas.org/content/102/15/5483/) using your browser. Then use BeautifulSoup (not pandas) to parse the HTML and scrape Table 1. Finally, display the table by passing the value of the BeautifulSoup variable that contains it to the `IPython.display.Markdown` function.

(Unfortunately, we can no longer download the HTML for the page automatically: the PNAS site uses Cloudflare to block scraping, so Python's `requests` package cannot fetch the page.)

In [105]:
from bs4 import BeautifulSoup
from IPython.display import Markdown

# Read the locally saved copy of the PNAS page; the file is closed automatically
with open("pnas.html") as data:
    html = data.read()
soup = BeautifulSoup(html, "html.parser")

table = soup.find('table')
fullTable2 = []

rows = table.find_all('tr')
for tr in rows:
    line = []
    for th in tr.find_all('th'):
        line.append(th.get_text().strip())
    for td in tr.find_all('td'):
        line.append(td.get_text().strip())
    fullTable2.append(line)

# Convert the scraped rows to a Markdown table and display it
df = pd.DataFrame(fullTable2)
md_table = df.to_markdown()

Markdown(md_table)

|    | 0              | 1                | 2        | 3              | 4         | 5        | 6         |
|---:|:---------------|:-----------------|:---------|:---------------|:----------|:---------|:----------|
|  0 | Evolution rate | Dispensability   | rdk      | Expression     | rxk       | rdk\|x   | xk\|d     |
|  1 | dN/dS′         | Warringer et al. | 0.239 np | mRNA abundance | -0.368 np | 0.183 np | -0.328 np |
|  2 |                |                  |          | CAI            | -0.528 np | 0.190 np | -0.513 np |
|  3 | dN             | Warringer et al. | 0.237 np | mRNA abundance | -0.363 np | 0.181 np | -0.324 np |
|  4 |                |                  |          | CAI            | -0.493 np | 0.189 np | -0.478 np |
|  5 | dN/dS′         | SGTC             | 0.230 np | mRNA abundance | -0.368 np | 0.166 np | -0.330 np |
|  6 |                |                  |          | CAI            | -0.528 np | 0.187 np | -0.516 np |
|  7 | dN             | SGTC             | 0.227 np | mRNA abundance | -0.363 np | 0.163 np | -0.325 np |
|  8 |                |                  |          | CAI            | -0.493 np | 0.185 np | -0.479 np |
|  9 | dN/dS′         | Warringer et al. | 0.274    | mRNA abundance | -0.279    | 0.259    | -0.256    |
| 10 |                |                  |          | CAI            | -0.522    | 0.241    | -0.505    |
| 11 | dN             | Warringer et al. | 0.274    | mRNA abundance | -0.282    | 0.259    | -0.259    |
| 12 |                |                  |          | CAI            | -0.509    | 0.241    | -0.491    |
| 13 | dN/dS′         | SGTC             | 0.264    | mRNA abundance | -0.279    | 0.252    | -0.258    |
| 14 |                |                  |          | CAI            | -0.522    | 0.232    | -0.505    |
| 15 | dN             | SGTC             | 0.264    | mRNA abundance | -0.282    | 0.251    | -0.262    |
| 16 |                |                  |          | CAI            | -0.509    | 0.232    | -0.491    |
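
Two details of the table above are worth noting: the scraped header row ends up as data row 0, and header cells such as `rdk|x` contain a literal pipe, which Markdown treats as a column separator unless it is escaped. The sketch below shows one way to handle both without pandas. `rows_to_markdown` is a hypothetical helper written for this example, and `sample_rows` is a small excerpt in the same list-of-lists shape as `fullTable2`.

```python
def rows_to_markdown(rows):
    """Render a list of rows as a Markdown table, using the first row as the header."""
    def clean(cell):
        # Escape literal pipes so they are not read as column separators
        return str(cell).replace("|", "\\|")

    width = max(len(row) for row in rows)
    # Escape every cell and pad short rows so all rows have the same length
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * width) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)

# Small excerpt in the same shape as fullTable2 (header row first)
sample_rows = [
    ["Evolution rate", "rdk", "rdk|x"],
    ["dN", "0.237 np", "0.181 np"],
]
md = rows_to_markdown(sample_rows)
print(md)
```

In a notebook, the resulting string can be passed to `Markdown()` in the same way as the output of `to_markdown()`.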