Q1. Look at the chapter on interactive graphics and, specifically, the code to display a subject's MRICloud data as a sunburst plot. Do the following. Display this subject's data as a Sankey diagram. Display as many levels as you can for type = 1, starting from the intracranial volume. Put this in a file called hw4.ipynb.

In [6]:
import pandas as pd
import numpy as np
import plotly.express as px
# from plotly.offline import plot # for spyder
import plotly.graph_objects as go
import plotly.io as pio

## load in the hierarchy information
url = "https://raw.githubusercontent.com/bcaffo/MRIcloudT1volumetrics/master/inst/extdata/multilevel_lookup_table.txt"
multilevel_lookup = pd.read_csv(url, sep = "\t").drop(['Level5'], axis = 1)
multilevel_lookup = multilevel_lookup.rename(columns = {
    "modify"   : "roi", 
    "modify.1" : "level4",
    "modify.2" : "level3", 
    "modify.3" : "level2",
    "modify.4" : "level1"})
multilevel_lookup = multilevel_lookup[['roi', 'level4', 'level3', 'level2', 'level1']]
multilevel_lookup.head()

## Now load in the subject data
subject_id = 127  # named subject_id to avoid shadowing the built-in id()
subjectData = pd.read_csv("https://raw.githubusercontent.com/smart-stats/ds4bio_book/main/book/assetts/kirby21AllLevels.csv")
subjectData = subjectData.loc[(subjectData.type == 1) & (subjectData.level == 5) & (subjectData.id == subject_id)]
subjectData = subjectData[['roi', 'volume']]
## Merge the subject data with the multilevel data
subjectData = pd.merge(subjectData, multilevel_lookup, on = "roi")
subjectData = subjectData.assign(icv = "ICV")
subjectData = subjectData.assign(comp = subjectData.volume / np.sum(subjectData.volume))
subjectData.head()

# add color column to the dataframe (one link color per level1 region)
level1_colors = {
    'CSF':             'rgba(171, 99, 250, 0.4)',
    'Telencephalon_L': 'rgba(239, 85, 59, 0.4)',
    'Telencephalon_R': 'rgba(99, 111, 251, 0.4)',
    'Diencephalon_L':  'rgba(255, 161, 90, 0.4)',
    'Diencephalon_R':  'rgba(25, 211, 243, 0.4)',
    'Mesencephalon':   'rgba(255, 102, 146, 0.4)',
    'Metencephalon':   'rgba(0, 204, 150, 0.4)',
    'Myelencephalon':  'rgba(182, 233, 128, 0.4)'}
subjectData['color'] = subjectData['level1'].map(level1_colors)

# plot sunburst diagram
#fig = px.sunburst(subjectData, path=['icv', 'level1', 'level2', 'level3', 'level4', 'roi'], values='comp', width=800, height=800)
#fig.show()
#plot(fig) # for spyder

# define the edge dataframe for each level
# (sum only the numeric columns; recent pandas versions no longer drop the string columns silently)
value_cols = ['volume', 'comp']

level1 = subjectData.groupby(['icv', 'level1', 'color'], as_index=False)[value_cols].sum()
level1.columns = ['source', 'target', 'color', 'volume', 'comp']

level2 = subjectData.groupby(['level1', 'level2', 'color'], as_index=False)[value_cols].sum()
level2.columns = ['source', 'target', 'color', 'volume', 'comp']

level3 = subjectData.groupby(['level2', 'level3', 'color'], as_index=False)[value_cols].sum()
level3.columns = ['source', 'target', 'color', 'volume', 'comp']

level4 = subjectData.groupby(['level3', 'level4', 'color'], as_index=False)[value_cols].sum()
level4.columns = ['source', 'target', 'color', 'volume', 'comp']

# define the node list (unique labels, kept in first-seen order)
node_list = list(dict.fromkeys(
    ['ICV'] + list(level1.target) + list(level2.target) + list(level3.target) + list(level4.target)))

# define the MAIN edge dataframe
edge_df = pd.concat([level1, level2, level3, level4], axis=0, ignore_index=True)
# replace labels (strings) in the source and target columns with node indices
node_index = {label: i for i, label in enumerate(node_list)}
edge_df['source'] = edge_df['source'].map(node_index)
edge_df['target'] = edge_df['target'].map(node_index)
# drop self-loops from the edge dataframe (rows with the same source and target)
edge_df = edge_df[edge_df['source'] != edge_df['target']]

# plot Sankey diagram
fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = node_list,
      color = "blue"
    ),
    link = dict(
      source = edge_df.source,
      target = edge_df.target,
      value = edge_df.comp,
      color = edge_df.color
  ))])

fig.update_layout(title_text="Farzad's Sankey Diagram", height=1000, font_size=10)
fig.show()
# plot(fig) # for spyder

# write the interactive figure to a standalone HTML file
pio.write_html(fig, file='./Sankey_plot.html')
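
The bookkeeping in the cell above (aggregate parent-to-child flows, build a label list, then swap labels for integer indices) can be sketched on a toy hierarchy without pandas or plotly. This is only an illustration under stated assumptions: the helper `build_sankey_links`, the toy rows, and the `value` field are hypothetical and do not come from the subject's data.

```python
from collections import defaultdict

def build_sankey_links(rows, path):
    """Aggregate values over consecutive (parent, child) pairs along `path`.

    rows: list of dicts holding one label per level plus a numeric 'value'.
    path: level names ordered from root to leaf.
    Returns (labels, source, target, value), the pieces go.Sankey asks for.
    """
    totals = defaultdict(float)
    for row in rows:
        for parent_col, child_col in zip(path, path[1:]):
            parent, child = row[parent_col], row[child_col]
            if parent != child:  # skip self-loops such as CSF -> CSF
                totals[(parent, child)] += row["value"]

    # Unique labels in first-seen order, then a label -> integer id lookup.
    labels = list(dict.fromkeys(name for pair in totals for name in pair))
    index = {name: i for i, name in enumerate(labels)}
    source = [index[parent] for parent, _ in totals]
    target = [index[child] for _, child in totals]
    value = list(totals.values())
    return labels, source, target, value

# Hypothetical toy rows (made-up values, not real volumes).
toy_rows = [
    {"icv": "ICV", "level1": "CSF", "level2": "CSF", "value": 20},
    {"icv": "ICV", "level1": "Telencephalon_L", "level2": "CerebralCortex_L", "value": 50},
    {"icv": "ICV", "level1": "Telencephalon_L", "level2": "WhiteMatter_L", "value": 30},
]
labels, source, target, value = build_sankey_links(toy_rows, ["icv", "level1", "level2"])
```

The four returned lists line up with the `label`, `source`, `target`, and `value` arguments used in the `go.Sankey` call above.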

Q2. Create a simple webpage containing this graphic and host it on GitHub Pages. Do not host this from your GitHub Classroom assignment repo, since it is not public. Instead, create a new public repo under your regular GitHub account and add this file. Put the link to your live web page in a markdown cell of your hw4.ipynb file. Note: an easy way to create a webpage with this graphic is to export an ipynb as an HTML file.

<span style="color:red">Link to my live web page:</span>
https://fvfarahani.github.io/Farzad_Plots/Sankey_plot.html
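
As an alternative to exporting the whole notebook, a small landing page can embed the exported plot files. The sketch below is an assumption-laden illustration, not the page actually hosted at the link above: the helper `write_index_page`, the page title, and the `index.html` file name are hypothetical, while `Sankey_plot.html` and `Scatter_plot.html` are the file names written by the plotting cells in this notebook.

```python
from pathlib import Path
import tempfile

def write_index_page(site_dir, plot_files, title="HW4 plots"):
    """Write a minimal index.html that embeds each plot file in an iframe."""
    frames = "\n".join(
        f'    <h2>{name}</h2>\n'
        f'    <iframe src="{name}" width="100%" height="800"></iframe>'
        for name in plot_files
    )
    html = (
        "<!DOCTYPE html>\n<html>\n  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{title}</title>\n"
        "  </head>\n  <body>\n"
        f"{frames}\n"
        "  </body>\n</html>\n"
    )
    index_path = Path(site_dir) / "index.html"
    index_path.write_text(html, encoding="utf-8")
    return index_path

# Demonstrated in a temporary directory; in practice site_dir would be the
# root of the public repository that GitHub Pages serves.
site_dir = tempfile.mkdtemp()
index_path = write_index_page(site_dir, ["Sankey_plot.html", "Scatter_plot.html"])
```

The iframes use relative paths, so the plot files need to sit in the same directory as `index.html` in the public repo.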

Q3. Create the opioid sqlite database from https://smart-stats.github.io/ds4bio_book/book/_build/html/sqlite.html. However, only go to the step where the csv files are read into the database. Then exit sqlite and you should have a file opioid.db that has the data. Next, read the three tables into pandas dataframes and do the data wrangling from the sqlite chapter directly in pandas. Add the python code to your hw4.ipynb file.

In [18]:
import sqlite3 as sq3
import pandas as pd

con = sq3.connect('./opioid.db')
population = pd.read_sql_query('SELECT * from population', con)
annual = pd.read_sql_query('SELECT * from annual', con)
land = pd.read_sql_query('SELECT * from land', con)
con.close() # close the connection

# update countyfips (set countyfips = 05097 where BUYER_STATE = "AR" and BUYER_COUNTY = "MONTGOMERY")
# .loc assigns on the dataframe itself and avoids chained-assignment pitfalls
mask = (annual.BUYER_COUNTY=='MONTGOMERY') & (annual.BUYER_STATE=='AR')
annual.loc[mask, 'countyfips'] = '05097'

# delete rows from the annual table that have missing county data (stored as the string 'NA')
annual = annual[annual['BUYER_COUNTY']!='NA']
annual.head(20)

       BUYER_COUNTY  BUYER_STATE  year  count  DOSAGE_UNIT  countyfips
0   1     ABBEVILLE           SC  2006    877       363620       45001
1   2     ABBEVILLE           SC  2007    908       402940       45001
2   3     ABBEVILLE           SC  2008    871       424590       45001
3   4     ABBEVILLE           SC  2009    930       467230       45001
4   5     ABBEVILLE           SC  2010   1197       539280       45001
5   6     ABBEVILLE           SC  2011   1327       566560       45001
6   7     ABBEVILLE           SC  2012   1509       589010       45001
7   8     ABBEVILLE           SC  2013   1572       596420       45001
8   9     ABBEVILLE           SC  2014   1558       641350       45001
9  10        ACADIA           LA  2006   5802      1969720       22001
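
The wrangling steps in this part of the notebook (read a table from SQLite, patch one county's FIPS code, drop rows with a missing county, average pills per year) can be checked end to end on a tiny in-memory database. This is a minimal sketch with made-up rows: the table contents and the `toy_` names are hypothetical, and the real opioid.db has more columns and many more rows.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Build a tiny in-memory stand-in for opioid.db (values are made up).
with closing(sqlite3.connect(":memory:")) as con:
    con.execute(
        "CREATE TABLE annual (BUYER_COUNTY TEXT, BUYER_STATE TEXT, "
        "year INTEGER, DOSAGE_UNIT INTEGER, countyfips TEXT)"
    )
    con.executemany(
        "INSERT INTO annual VALUES (?, ?, ?, ?, ?)",
        [
            ("MONTGOMERY", "AR", 2006, 1000000, "NA"),
            ("MONTGOMERY", "AR", 2007, 3000000, "NA"),
            ("ABBEVILLE", "SC", 2006, 2000000, "45001"),
            ("NA", "NV", 2007, 5000000, "NA"),
        ],
    )
    toy_annual = pd.read_sql_query("SELECT * FROM annual", con)

# Label-based assignment updates the frame itself, not a temporary copy.
mask = (toy_annual.BUYER_COUNTY == "MONTGOMERY") & (toy_annual.BUYER_STATE == "AR")
toy_annual.loc[mask, "countyfips"] = "05097"

# Drop rows whose county is the literal string 'NA'.
toy_annual = toy_annual[toy_annual.BUYER_COUNTY != "NA"]

# Average pills (in millions) per year, aggregating only the numeric column.
toy_annual = toy_annual.assign(pills=toy_annual.DOSAGE_UNIT / 1_000_000)
toy_by_year = (
    toy_annual.groupby("year", as_index=False)["pills"]
    .mean()
    .rename(columns={"pills": "average_pills"})
)
```

Wrapping the connection in `contextlib.closing` closes it once the block ends; a bare `with sqlite3.connect(...)` only manages the transaction and leaves the connection open.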


In [20]:
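# pills in millions; average only the numeric pills column
# (a mean over the string columns raises an error in recent pandas versions)
annual = annual.assign(pills = pd.to_numeric(annual.DOSAGE_UNIT) / 1000000)
annual_year = annual.groupby('year', as_index=False)['pills'].mean().rename(columns = {'pills' : 'average_pills'})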

# plot interactive scatter plot
import plotly.io as pio
import plotly.express as px
fig = px.scatter(annual_year, x='year', y='average_pills')
fig.show()

# write the interactive figure to a standalone HTML file
pio.write_html(fig, file='./Scatter_plot.html')

Q4. Create an interactive scatter plot of the average number of opioid pills by year using plotly. See the example here. Don't do the intervals (little vertical lines), only the points. Add your plot to an HTML file in the same repo as your Sankey diagram and host it publicly. Put a link to your hosted file in a markdown cell of your hw4.ipynb file. Note: an easy way to create a webpage with this graphic is to export an ipynb as an HTML file.

<span style="color:red">Link to my live web page:</span>
https://fvfarahani.github.io/Farzad_Plots/Scatter_plot.html