# Jupyter notebook example

USGS scientists often produce one-off analyses that become part of larger "stories". However, the data and code from these one-off products often cannot be reused for other projects unless the user is an expert in the subject field. Interactive products such as Jupyter notebooks could be used to highlight key findings and to help others interested in these products find them and learn how to use them.

For this example, we examine environmental DNA (eDNA) data generated as part of sea lamprey monitoring research.

First, we read in our data from ScienceBase. We print the `head()` of the data to peek at it and print the `shape` to see how many rows and columns we have.

In [None]:
import io
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import statsmodels.formula.api as smf  # needed for smf.ols in the model cell

url = "https://www.sciencebase.gov/catalog/file/get/59b6cc06e4b08b1644ddf8b3?f=__disk__f7%2F19%2F08%2Ff719084d841c0419e3a7f9a747c156406e32a85b"
s = requests.get(url).content
dat = pd.read_csv(io.StringIO(s.decode('utf-8')))
print(dat.head())
print(dat.shape)
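
The download step above parses whatever the server returns, including an error page. As a sketch only, the version below checks the HTTP status before parsing. The helper names `csv_bytes_to_frame` and `fetch_csv` and the `timeout` value are hypothetical and are not part of the original notebook.

```python
import io

import pandas as pd
import requests


def csv_bytes_to_frame(raw: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Decode raw CSV bytes and parse them into a DataFrame."""
    return pd.read_csv(io.StringIO(raw.decode(encoding)))


def fetch_csv(url: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download a CSV file and parse it (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return csv_bytes_to_frame(response.content)
```

With these helpers, `dat = fetch_csv(url)` would stand in for the `requests.get` and `pd.read_csv` lines in the cell above.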

Then, we might filter our data and do other cleanup steps.

In [None]:
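# Split "Sample" on hyphens into stock level, sample ID, and date.
dat[['Stock_level', 'Sample_ID', 'date']] = dat['Sample'].astype(str).str.split('-', expand=True)
# Keep only rows for the FAM fluorophore and the stock levels of interest.
FAM = dat[dat["Fluor"] == "FAM"]
copies_to_keep = ["0L", "2L", "20L", "200L"]
# .copy() avoids SettingWithCopyWarning on the column assignments below.
FAM = FAM[FAM["Stock_level"].isin(copies_to_keep)].copy()
# Make Stock_level an ordered category so plots follow 0L < 2L < 20L < 200L.
FAM["Stock_level"] = FAM["Stock_level"].astype('category')
FAM["Stock_level"] = FAM["Stock_level"].cat.reorder_categories(copies_to_keep, ordered=True)
# Add 1 before taking log10 so that zero-copy samples stay finite.
FAM["Copies_log10"] = np.log10(FAM["Copies"] + 1)
# Collapse e.g. "<level>-<digit><letter>" to "<level><digit>" to label replicates.
FAM['Sample_replicate'] = FAM['Sample'].str.replace(r"(\d+L)-(\d)([A-Z])", r"\1\2", regex=True)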
print(FAM.head())
print(FAM.shape)
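
To make the string handling in the cleanup cell concrete, here is a small standard-library sketch applied to a single label. The label `"20L-1A-0601"` is a hypothetical value in the layout that the cleanup code implies (stock level, sample ID, date); the real labels in the dataset may look different.

```python
import math
import re

# Hypothetical label, assumed to follow "<stock level>-<sample ID>-<date>".
sample = "20L-1A-0601"

# The three pieces that the notebook stores as Stock_level, Sample_ID, and date.
stock_level, sample_id, date = sample.split("-")

# Same pattern as the notebook: keep the stock level and the digit,
# drop the hyphen between them and the trailing letter.
sample_replicate = re.sub(r"(\d+L)-(\d)([A-Z])", r"\1\2", sample)

# log10(copies + 1) maps a zero-copy sample to 0 rather than -infinity.
copies_log10 = math.log10(0 + 1)

print(stock_level, sample_id, date)  # 20L 1A 0601
print(sample_replicate)              # 20L1-0601
```

Note that the replacement only touches the part of the label that matches the pattern, so anything after the sample ID (here the assumed date) is carried through unchanged.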

Next, we might plot our data and run other analyses.

In this case, we first plot our data using a boxplot. 

In [None]:
FAM.boxplot(column="Copies_log10", by="Stock_level")

In the boxplot above, the amount of eDNA detected in the water increases with stocking level.
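
A numeric summary can back up what the boxplot shows. The sketch below groups by stock level and reports the count, median, and mean of the log-transformed copies. The helper name `summarize_by_stock_level` is hypothetical, and the `toy` values are made-up numbers used only so the block runs on its own; in the notebook the function would be called on `FAM`.

```python
import pandas as pd


def summarize_by_stock_level(df, value="Copies_log10", group="Stock_level"):
    """Return count, median, and mean of `value` for each level of `group`."""
    # observed=True skips unused levels when `group` is categorical.
    return df.groupby(group, observed=True)[value].agg(["count", "median", "mean"])


# Toy data with made-up values, not taken from the eDNA dataset.
toy = pd.DataFrame(
    {
        "Stock_level": ["0L", "0L", "2L", "2L"],
        "Copies_log10": [0.0, 0.5, 1.0, 2.0],
    }
)
summary = summarize_by_stock_level(toy)
print(summary)
```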

Originally, a mixed-effects model was fit to these data. However, I could not figure out how to run that model here, so I fit a linear model instead.
The linear model confirms the pattern shown in the plot above.

In [None]:
#md = smf.mixedlm("Copies_log10 ~ 1", FAM, groups=FAM["Sample"])
#data = sm.datasets.get_rdataset('dietox', 'geepack').data
#md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
md = smf.ols("Copies_log10 ~ C(Stock_level)", FAM)
#md = sm.MixedLM.from_formula( "Copies_log10 ~ Stock_level", FAM, groups = FAM['Sample_replicate'], re_formula = "Stock_level")
md = md.fit()
print(md.summary())
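
For readers who want to try the mixed-effects route, the following is a minimal sketch of a random-intercept model using `smf.mixedlm` from statsmodels. The random-effects structure (one intercept per `Sample_replicate`) is an assumption, since the original R model is not shown here, and the `toy` data are simulated with made-up numbers only so that the call runs end to end. The helper name `fit_random_intercept` is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_random_intercept(df: pd.DataFrame):
    """Fit log10 copies on stock level with a random intercept per replicate."""
    model = smf.mixedlm(
        "Copies_log10 ~ C(Stock_level)",
        data=df,
        groups=df["Sample_replicate"],  # assumed grouping variable
    )
    return model.fit()


# Simulated toy data (made-up numbers, not the eDNA dataset).
rng = np.random.default_rng(0)
rows = []
for level, mean in [("0L", 0.0), ("2L", 1.0), ("20L", 2.0)]:
    for rep in range(1, 5):
        offset = rng.normal(scale=0.3)  # shift shared by one replicate
        for _ in range(3):
            rows.append(
                {
                    "Stock_level": level,
                    "Sample_replicate": f"{level}{rep}",
                    "Copies_log10": mean + offset + rng.normal(scale=0.1),
                }
            )
toy = pd.DataFrame(rows)

result = fit_random_intercept(toy)
print(result.summary())
```

In the notebook, `fit_random_intercept(FAM)` would be the corresponding call. Whether that model converges on the real data, and whether it matches the original R analysis, has not been checked here.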

**Note: My original code and analysis were done in R and included both better figures and more analysis. Think of this notebook as a proof of concept.**