# Assignment

Using matplotlib/pyplot, implement a function that receives a pandas dataframe with three columns (chromosome, position, pvalue) and draws a manhattan plot with a result similar to this:

![manhattan](https://github.com/ne1s0n/dataviz_python/raw/main/resources/proj01_manhattan_target.png)





# Data

The data is a GWAS result dataset originally distributed with the [Plink](https://github.com/ShujiaHuang/qmplot/blob/main/tests/data/gwas_plink_result.tsv) software; a copy is saved in the course repo so that it stays available. The file is:

* tab separated
* without header
* 9224 rows (one per SNP)
* 12 columns:
  * `"CHROM", "POS", "ID", "REF", "ALT", "A1", "TEST", "OBS_CT", "BETA", "SE", "T_STAT", "P"`
* some rows have a missing value in the `P` column, and need to be removed (see [.dropna()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna))

You may need a refresher on the [pandas.read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function.

The dataset is available at the following address:


In [None]:
GWAS_RESULTS_URL = 'https://github.com/ne1s0n/dataviz_python/raw/main/resources/gwas_plink_result_cleaned.tsv'

# Is there anybody NOT familiar with manhattan plots?

Raise your hand!

# Existing solutions

Many software solutions already exist for creating manhattan plots, with more options than we will ever be able to implement. If you *need* a manhattan plot, it's better not to reinvent the wheel and to use something stable and published. The point of this project, though, is to do a meaningful exercise, not to create a competitive tool.

That said, you may want to take a look at:

* [manhattan plots with bioinfokit](https://www.reneshbedre.com/blog/manhattan-plot.html), a bioinformatics-oriented package
* [manhattan plots with plotly](https://plotly.com/python/manhattan-plot/), a generic visualization library with emphasis on interactivity
* [manhattan plots with qmplot](https://pythonawesome.com/a-python-package-for-creating-high-quality-manhattan-and-q-q-plots-from-gwas-results/), a dedicated package for creating high-quality manhattan and Q-Q plots from GWAS results

# That's it

That's the assignment. Keep scrolling if you want to read and follow the subassignments (SA).

<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>
<br><br><br><br><br><br>

# SA0) Read the data

Read the data from the remote URL saved in the `GWAS_RESULTS_URL` variable. Remember that the file has no header, so you should name the columns yourself:

`"CHROM", "POS", "ID", "REF", "ALT", "A1", "TEST", "OBS_CT", "BETA", "SE", "T_STAT", "P"`

Also remember to:

* remove lines with missing (`NA`) P values
* compute -log10(p)

In [None]:
import pandas as pd
import numpy as np

colnames = ["CHROM", "POS", "ID", "REF", "ALT", "A1", "TEST", "OBS_CT", "BETA", "SE", "T_STAT", "P"]
df = pd.read_csv(filepath_or_buffer = GWAS_RESULTS_URL, sep = '\t', names = colnames)
print(df.shape)
#removing only the rows where the P value is missing
df.dropna(subset = ['P'], inplace = True)
print(df.shape)

#computing the -log10(p)
df['SCORE'] = -np.log10(df['P'])
print(df.shape)


df.head()
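
To check the logic without touching the network, the same steps can be tried on a tiny in-memory table. This is only a sketch: the three rows and the reduced column set (`CHROM`, `POS`, `P`) are made up for illustration and are not taken from the real file.

```python
import io

import numpy as np
import pandas as pd

# Made-up, headerless, tab-separated rows: chromosome, position, p-value.
# The second row has an empty P field, which pandas reads as NaN.
toy_tsv = "chr01\t100\t0.01\nchr01\t200\t\nchr02\t150\t0.001\n"

# Passing `names` tells pandas that the file has no header line
toy = pd.read_csv(io.StringIO(toy_tsv), sep="\t", names=["CHROM", "POS", "P"])

# Drop only the rows where P is missing, then compute the score
toy = toy.dropna(subset=["P"])
toy["SCORE"] = -np.log10(toy["P"])

print(toy)
```

Small p-values become large scores, which is why the most significant SNPs end up at the top of a manhattan plot.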

# SA1) A simpler problem: single chromosome manhattan plot 

At this point you should already be able to do a single chromosome plot.

In [None]:
import matplotlib.pyplot as plt

#focusing on chromosome 1
df_ch1 = df[df['CHROM'] == 'chr01']

plt.scatter(x = df_ch1['POS'], y = df_ch1['SCORE'])
plt.xlabel('position')
plt.ylabel('-log10(p)')
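
Since the assignment asks for a function, the single chromosome case can also be wrapped in one. The sketch below is one possible shape, not the reference solution: the name `plot_single_chromosome`, its parameters, and the toy data are all hypothetical, and it assumes the dataframe already has `CHROM`, `POS` and `SCORE` columns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_single_chromosome(data, chrom, ax=None):
    """Scatter -log10(p) against position for one chromosome (hypothetical helper)."""
    if ax is None:
        # No axes supplied: create a new figure with a single axes
        _, ax = plt.subplots()
    subset = data[data["CHROM"] == chrom]
    ax.scatter(subset["POS"], subset["SCORE"], s=10)
    ax.set_xlabel(f"Position on {chrom}")
    ax.set_ylabel("-log10(p)")
    return ax


# Made-up data, only to exercise the function
toy = pd.DataFrame({
    "CHROM": ["chr01", "chr01", "chr02"],
    "POS": [100, 200, 150],
    "P": [0.01, 0.5, 0.001],
})
toy["SCORE"] = -np.log10(toy["P"])

ax = plot_single_chromosome(toy, "chr01")
```

Accepting an optional `ax` makes it possible to reuse the same function later, for example to draw several chromosomes into one figure.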

# SA2) Sort the data by chromosome

In [None]:
df.sort_values(by = 'CHROM', inplace = True)
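
A caveat worth knowing: chromosome labels are strings, so they are sorted alphabetically. With zero-padded labels such as `chr01` this matches the numeric order, but with unpadded labels it would not. The sketch below uses made-up, unpadded labels and a hypothetical helper column `CHROM_NUM` to show the pitfall and one possible way around it.

```python
import pandas as pd

# Hypothetical labels without zero padding, to show the pitfall
toy = pd.DataFrame({
    "CHROM": ["chr10", "chr2", "chr1", "chr2"],
    "POS": [5, 30, 7, 10],
})

# Plain string sort: 'chr10' lands before 'chr2'
print(toy.sort_values(by="CHROM")["CHROM"].tolist())

# Helper column with the chromosome number, then sort by (number, position)
toy["CHROM_NUM"] = toy["CHROM"].str.extract(r"(\d+)", expand=False).astype(int)
toy_sorted = toy.sort_values(by=["CHROM_NUM", "POS"]).reset_index(drop=True)

print(toy_sorted)
```

Sorting by position within each chromosome is not strictly needed for a scatter plot, but it keeps the table easier to inspect.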

# SA3) Compute the length of each chromosome

This will certainly come in handy. Put the values in a variable.



In [None]:
#the largest SNP position is used as an approximation of the chromosome length
chroms = df.groupby('CHROM')['POS'].max()
print(chroms)

In [None]:
chroms['chr01'] * 3
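
One way the lengths could come in handy is to place the chromosomes side by side on a single x axis: each chromosome is shifted by the total length of the chromosomes before it. The sketch below illustrates the idea on made-up lengths and positions; `toy_chroms`, `offsets` and the `X` column are hypothetical names, and this is only one of several possible approaches.

```python
import pandas as pd

# Made-up chromosome lengths, shaped like the `chroms` Series above
toy_chroms = pd.Series({"chr01": 100, "chr02": 80, "chr03": 50})

# Offset of each chromosome = total length of all chromosomes before it
offsets = toy_chroms.cumsum().shift(1, fill_value=0)

toy = pd.DataFrame({
    "CHROM": ["chr01", "chr02", "chr02", "chr03"],
    "POS": [10, 5, 80, 50],
})

# Genome-wide x coordinate: position within the chromosome plus its offset
toy["X"] = toy["POS"] + toy["CHROM"].map(offsets)

print(offsets)
print(toy)
```

Here the offsets are 0, 100 and 180, so the last SNP of `chr02` (position 80) and the first possible position of `chr03` sit next to each other at x = 180.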