Plotting differential scanning fluorimetry (DSF) data exported from a qPCR machine

In [None]:
#Set the Python environment to the micromamba imaging_env, which has the general plotting and data analysis packages we need

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd


In the qPCR machine's CFX Maestro software, export all data sheets to CSV.

You will get 8 data sheets, each named with the file name followed by the type of data sheet it contains.
We only need two of these, titled:
- Melt Curve RFU Results_FRET.csv
- Melt Curve Derivative Results_FRET.csv

Copy these into a new folder that will hold all of your analysis materials. We suggest giving them shorter names, like this:
- experimentID_RFU.csv
- experimentID_derivative.csv

Also make a CSV file with all of your assay and buffer information. We suggest these columns:
- <b>Well:</b> well ID for each sample (string)
- <b>protein_mgmL:</b> protein concentration in mg/mL (number)
- <b>protein:</b> name of protein used (string)
- <b>buffer_mM:</b> concentration of buffer in mM (number)
- <b>buffer:</b> ID of buffer (string)
- <b>pH:</b> pH of buffer (number)
- <b>salt_mM:</b> concentration of salt in mM (number)
- <b>salt:</b> ID of salt (string)
<br>Add additional columns if the experiment includes other buffer components, such as divalent cations (the plotting examples in this notebook use <b>ion</b> and <b>ion_mM</b> columns).

Your analysis folder should now contain three CSV files:
- experimentID_RFU.csv
- experimentID_derivative.csv
- experimentID_buffers.csv
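
If you would rather build the buffer table in Python than in a spreadsheet, the following is a minimal sketch. The well IDs, protein name, and buffer values are made-up placeholders, not values from a real experiment; only the column names come from the list above.

```python
import pandas as pd

# Hypothetical plate layout: replace every value with your own.
buffers_example = pd.DataFrame(
    {
        "Well": ["A1", "A2", "A3"],
        "protein_mgmL": [0.5, 0.5, 0.0],  # 0 marks a buffer blank
        "protein": ["myProtein", "myProtein", "none"],
        "buffer_mM": [50, 50, 50],
        "buffer": ["HEPES", "Tris", "HEPES"],
        "pH": [7.5, 8.0, 7.5],
        "salt_mM": [150, 150, 150],
        "salt": ["NaCl", "NaCl", "NaCl"],
    }
)

# index=False keeps pandas from writing an extra unnamed index column.
# With no path, to_csv returns the CSV text so you can inspect it first.
csv_text = buffers_example.to_csv(index=False)
print(csv_text)
```

To write the file instead of printing it, pass a path as the first argument, for example `buffers_example.to_csv("experimentID_buffers.csv", index=False)`.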


In [None]:
#Import data and buffer information here.
#Copy and paste the path of each file between the quotes ''

#import DSF relative fluorescence units (RFU) data: 
DSF_data = pd.read_csv('/path/to/data/experimentID_RFU.csv')

#Import Derivative data:
DSF_neg_derivative = pd.read_csv('/path/to/data/experimentID_derivative.csv')

#load buffer information:
buffers = pd.read_csv('/path/to/buffer_info/experimentID_buffers.csv')
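
A quick sanity check at this point can save confusion later: any well that appears in the exported data but not in the buffer table will end up with empty buffer columns after the join. The sketch below uses a hypothetical helper, `missing_wells`, and toy dataframes. It assumes the only non-well columns in the export are `Temperature` and `Unnamed: 0`, as in the cells of this notebook.

```python
import pandas as pd

def missing_wells(data_wide, buffers):
    """Return well columns in the exported data that have no row in the buffer table."""
    non_well_columns = {"Temperature", "Unnamed: 0"}
    data_wells = [c for c in data_wide.columns if c not in non_well_columns]
    known_wells = set(buffers["Well"])
    return [w for w in data_wells if w not in known_wells]

# Toy stand-ins for DSF_data and buffers
toy_data = pd.DataFrame(
    {
        "Unnamed: 0": [None, None],
        "Temperature": [25.0, 25.5],
        "A1": [100.0, 110.0],
        "A2": [90.0, 95.0],
    }
)
toy_buffers = pd.DataFrame({"Well": ["A1"], "pH": [7.5]})

print(missing_wells(toy_data, toy_buffers))  # wells to add to the buffer table
```

With the real data you would call `missing_wells(DSF_data, buffers)`. A non-empty result usually means a typo or a different well naming style in one of the files (for example `A1` in one and `A01` in the other).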

Next steps will convert the data to long form and combine everything into one large dataframe. 
<br><br>This will also save the dataframe as a new .csv file in your folder.
<br>Don't forget to give the resulting file a name in the next cell:

In [None]:
#filename for resulting .csv file when everything is combined after next cell is run:
result_filename = "DSF_data_combined.csv" 

In [None]:
#Convert RFU data to long-form:

#change index to temperature column
DSF_index = DSF_data.set_index('Temperature');

#file exported from qPCR machine includes a blank column, use this to delete it
DSF_data_wide = DSF_index.drop(columns='Unnamed: 0');

#want to make DSF data long form, easier to deal with that way
DSF_data_long = DSF_data_wide.melt( var_name="Well", value_name="RFU", ignore_index=False).reset_index()


###############


#Convert RFU Derivative data to long-form:

#change index to temperature column
DSF_nDeriv_index = DSF_neg_derivative.set_index('Temperature')

#file exported from qPCR machine includes a blank column, use this to delete it
DSF_nDeriv_wide = DSF_nDeriv_index.drop(columns='Unnamed: 0')

#data from the qPCR machine is the negative derivative, we want the positive derivative
#multiply all values (the Temperature index is not affected) by -1 to get the derivatives
DSF_deriv_wide = DSF_nDeriv_wide.mul(-1)

#want to make DSF data long form, easier to deal with that way
DSF_deriv_long = DSF_deriv_wide.melt( var_name="Well", value_name="dRFU", ignore_index=False).reset_index()


###############


#make 1 dataframe with all the experimental info together

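#first add dRFU data to RFU dataframe
#(pd.concat matches rows by index, which after reset_index is just row position,
#so both long dataframes must list the same wells and temperatures in the same order)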
DSF_RFU_dRFU = pd.concat([DSF_data_long, DSF_deriv_long['dRFU']], axis=1)


###############


#next calculate and make columns for normalized RFU and dRFU data

#normalize RFU first:
#find max value of RFU for each well
RFU_max = DSF_RFU_dRFU.groupby("Well")['RFU'].agg('max')
#make a new column called norm_RFU, fill values by dividing RFU by the max RFU of each well
DSF_RFU_dRFU["norm_RFU"] = DSF_RFU_dRFU["RFU"]/DSF_RFU_dRFU["Well"].map(RFU_max)

#next normalize dRFU, same way as we did for RFU
dRFU_max = DSF_RFU_dRFU.groupby("Well")['dRFU'].agg('max')
DSF_RFU_dRFU["norm_dRFU"] = DSF_RFU_dRFU["dRFU"]/DSF_RFU_dRFU["Well"].map(dRFU_max)


###############


#Now we want to add the buffer conditions onto all the data points of the corresponding well
DSF_RFU_dRFU_buffers = DSF_RFU_dRFU.join(buffers.set_index('Well'), on='Well')

#Un-comment the following line to check on your resulting dataframe
#print(DSF_RFU_dRFU_buffers)

#Save dataframe to file:
DSF_RFU_dRFU_buffers.to_csv(result_filename)
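
The cell above combines RFU and dRFU by row position, which works as long as both exports list the same wells and temperatures in the same order. As an alternative sketch (not the method used above), the two long-form tables can be merged on `Temperature` and `Well`, so a reordered or missing column cannot silently misalign the data. The toy values are invented, and the merge assumes the temperatures are written identically in both files.

```python
import pandas as pd

# Toy stand-ins for the two exported sheets (blank column already dropped)
toy_rfu = pd.DataFrame(
    {"Temperature": [25.0, 26.0, 27.0], "A1": [10.0, 20.0, 40.0], "A2": [5.0, 10.0, 8.0]}
)
toy_neg_deriv = pd.DataFrame(
    {"Temperature": [25.0, 26.0, 27.0], "A1": [-1.0, -4.0, -2.0], "A2": [-0.5, -1.0, -2.0]}
)

# Wide to long: one row per (Temperature, Well) pair
rfu_long = toy_rfu.melt(id_vars="Temperature", var_name="Well", value_name="RFU")
deriv_long = toy_neg_deriv.melt(id_vars="Temperature", var_name="Well", value_name="dRFU")
deriv_long["dRFU"] = -deriv_long["dRFU"]  # flip the sign of the exported derivative

# Match rows by key rather than by position
combined = rfu_long.merge(deriv_long, on=["Temperature", "Well"], how="left")

# Normalize each well to its own maximum; transform returns one value per row
combined["norm_RFU"] = combined["RFU"] / combined.groupby("Well")["RFU"].transform("max")
combined["norm_dRFU"] = combined["dRFU"] / combined.groupby("Well")["dRFU"].transform("max")

print(combined)
```

`transform("max")` does the same job as the `groupby(...).agg('max')` plus `map` pair used above, in a single step.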


Now that the combined dataframe is put together, we can start plotting the data.
<br> Plot the RFU derivatives for all wells to check that everything was put together correctly:

In [None]:
#Plot derivatives for all wells, color lines by well:

#If half of your samples are buffer blanks and half are protein samples, set the palette to something diverging like "coolwarm" so you can distinguish blanks from samples
#If there are no buffer blanks, set the palette to something like "flare" or "crest" to distinguish lines without the colors being too jarring
palette_name = "coolwarm_r"


########################


g = sns.lineplot(data=DSF_RFU_dRFU_buffers, x="Temperature", y="dRFU", hue="Well", legend=False, palette=palette_name)
g.set(title="RFU derivatives for all wells");

Some experiments will include a set of buffer blanks. It's good practice to check these to make sure there are no peaks. If they look relatively flat, we can leave them out of our plots.
<br>Use the next cell to make a new dataframe that does not include the buffer blanks.
<br>If no buffer blanks were used in the experiment, skip this step.

In [None]:
#Make a new dataframe with rows that contain only protein samples
#(this assumes buffer blanks are recorded with protein_mgmL = 0 in the buffers file)

DSF_RFU_protein = DSF_RFU_dRFU_buffers.query("protein_mgmL != 0").copy()
g = sns.lineplot(data=DSF_RFU_protein, x="Temperature", y="dRFU", hue="Well", legend=False, palette="flare")
g.set(title="RFU derivatives for wells containing protein samples");
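
Checking the blanks by eye works for a few wells. For a full plate, a rough numeric check can help. The sketch below measures how far dRFU swings in each blank well and flags wells that exceed a cutoff. The helper name `blank_peak_heights`, the toy data, and the threshold of 5.0 are all illustrative assumptions; a sensible cutoff depends on your instrument and dye.

```python
import pandas as pd

def blank_peak_heights(df, blank_query="protein_mgmL == 0"):
    """Return max-minus-min dRFU for each buffer-blank well."""
    blanks = df.query(blank_query)
    grouped = blanks.groupby("Well")["dRFU"]
    return grouped.max() - grouped.min()

# Toy stand-in for DSF_RFU_dRFU_buffers: A1 has protein, B1 and C1 are blanks
toy = pd.DataFrame(
    {
        "Well": ["A1"] * 3 + ["B1"] * 3 + ["C1"] * 3,
        "Temperature": [25.0, 26.0, 27.0] * 3,
        "protein_mgmL": [0.5] * 3 + [0.0] * 6,
        "dRFU": [1.0, 50.0, 2.0, 1.0, 1.5, 1.2, 0.5, 30.0, 1.0],
    }
)

heights = blank_peak_heights(toy)
threshold = 5.0  # hypothetical cutoff, tune for your own data
suspicious = heights[heights > threshold].index.tolist()
print(suspicious)  # blank wells worth a closer look
```

A flagged blank is only a prompt to look at that well's curve; it does not by itself show that anything is wrong with the buffer.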

Next we can plot a subset of our normalized dRFU data by specifying the condition we are interested in and coloring the curves according to any variation within that condition.
<br> For example, we can plot all the norm_dRFU peaks for a specific buffer and color-code them by pH.
<br> To make multiple plots, copy and paste the following cell into a new code cell and change the parameters.

In [None]:
#First we choose the dataframe we want to pull our data from
#for original dataframe, use DSF_RFU_dRFU_buffers
#for dataframe without buffer blanks, use DSF_RFU_protein
source_df = DSF_RFU_protein

#next we select the condition we are interested in
#(the column named here must exist in your buffers file)
sel_condition = "ion =='CaCl2'"

#choose how you want to color the graphs. 
#if not varying a specific condition, put "Well" here
variation = "ion_mM"

#set palette to color data 
#to reverse the palette, add _r to the end of the palette's name (e.g. "flare" vs. "flare_r")
palette_name = "flare"

#set lower and upper bounds for x axis to zoom in to region of interest
x_min = 40
x_max = 70

#give a title to your graph:
graph_title = "Normalized RFU derivatives, CaCl2"


#############


#make a new dataframe that contains a subset of data from our big dataframe
new_df = source_df.query(sel_condition).copy()

#plot lineplot
g = sns.lineplot(data=new_df, x="Temperature", y="norm_dRFU", hue=variation, palette=palette_name)
g.axes.set_xlim(x_min,x_max);
g.set(title=graph_title);

We often run melt curves for a range of conditions, and we want to see whether the temperature of the melt curve peak shifts as the condition varies.
<br>Here we find the max value of norm_dRFU for each well (a rough proxy for the peak; note that it does not work for every dataset, for example when there is no clean single peak above background).
<br>Then we make a dataframe with the conditions we are interested in alongside each peak's temperature.
<br>Then we make a plot with the following parameters:
- y_axis: temperature at which the melt curve peaks
- x_axis: value of the condition you are varying (e.g. ion concentration or pH)
- color: name of the condition you are varying (e.g. ion name or buffer name)

In [None]:
#Plot the temperature of the max norm_dRFU vs. a condition of your choice
#Enter the dataframe to be used for plotting, the condition you are varying, and a title for the resulting plot here:

#the dataframe you want to pull your data from - could be a subset of the data or the whole thing
df_start = DSF_RFU_protein

#Choose the condition you want to plot against the max norm_dRFU temperature, could be ion concentration or pH
#must be a numeric column
condition = "ion_mM"

#Add another label to identify your sample, could be buffer or ion name
#must be a string column
label = "ion"

#Add title for resulting plot here:
plot_title = "Peak melt temps over ion concentration"


###########################


#find the temperatures at which the max norm_dRFU occurs
DSF_Temperature_Index = df_start.set_index('Temperature')
norm_dRFU_max_temp = DSF_Temperature_Index.groupby("Well")["norm_dRFU"].idxmax()

#get the condition specified for each well
well_condition = df_start.groupby("Well")[condition].max()

#Combine condition and temperature data into one dataframe
norm_dRFU_max_temp_df = norm_dRFU_max_temp.to_frame('Temperature').reset_index()
well_condition_df = well_condition.to_frame(condition).reset_index()
temp_condition_combined_df = pd.concat([well_condition_df, norm_dRFU_max_temp_df['Temperature']], axis=1)

#Adding label specified earlier to the dataframe (buffer, ion, etc.)
label_df = df_start[['Well', label]].drop_duplicates()

#merge label onto the temperature and condition dataframe, matching rows by Well
#(label_df rows are in order of appearance, which can differ from the sorted groupby order used above)
temp_condition_label_df = temp_condition_combined_df.merge(label_df, on='Well', how='left')

#Un-comment following line to check on final dataframe used to make the plot
#print(temp_condition_label_df)

#plot results
ax = sns.scatterplot(data=temp_condition_label_df, x=condition, y='Temperature', hue=label)
ax = sns.lineplot(data=temp_condition_label_df, x=condition, y='Temperature', hue=label, alpha=0.4, legend=False)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
ax.set(title=plot_title);
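
The cell above assembles the peak table from several separate pieces. A more compact sketch, offered as an alternative rather than a replacement, is to look up the whole row where `norm_dRFU` is highest for each well. The temperature, condition, and label then come from the same row and cannot get out of step. The helper name `peak_rows` and the toy values are assumptions made for illustration.

```python
import pandas as pd

def peak_rows(df, value_col="norm_dRFU"):
    """Return one row per well: the row where value_col is highest."""
    df = df.reset_index(drop=True)  # idxmax needs unique row labels
    peak_index = df.groupby("Well")[value_col].idxmax()
    return df.loc[peak_index].reset_index(drop=True)

# Toy stand-in for DSF_RFU_protein with two wells
toy = pd.DataFrame(
    {
        "Well": ["A2", "A2", "A2", "A10", "A10", "A10"],
        "Temperature": [50.0, 51.0, 52.0] * 2,
        "norm_dRFU": [0.2, 1.0, 0.3, 0.1, 0.4, 1.0],
        "ion": ["CaCl2"] * 3 + ["MgCl2"] * 3,
        "ion_mM": [5] * 3 + [10] * 3,
    }
)

peaks = peak_rows(toy)
print(peaks[["Well", "Temperature", "ion", "ion_mM"]])
```

The result can be passed straight to `sns.scatterplot(data=peaks, x=condition, y="Temperature", hue=label)`. Like the max-based approach above, this picks only the single highest point per well, so the same caveat about curves without a clean single peak applies.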