# Welcome to the Week 14 Jupyter notebook! 

###This Notebook will allow you to analyze the data pertaining to your three questions in the Ecology Project. Follow the instructions written in the notebook and copy a plot for each question. You will be using these results in your Group mini-poster project. The code cells themselves require some completion in order to function. You will need to add code where indicated. There are multiple code cells for each question. You should only run those that correspond to the selections you made in the planning process.

## In each box, descriptions of each command are given and are set apart from the coding language by the # symbol. You should click on each box and then the "run cell" button (looks like a "play" button, on the far left side of the cell) to execute the code written inside.

##These first steps will set up the Notebook, group the data by plot type, and display descriptive statistics about each plot type (fenced, unfenced and transect). You must run each of these cells in order for the notebook to work properly.

In [0]:
#Just as you did in Week 11, these first lines of code set up the Notebook environment.
#We are going to want to make some graphs, also called plots, so we first create this environment.
#The code below this line sets up a plotting environment inside the notebook.
%matplotlib inline
#Next we will bring in some shortcut libraries that we will use for our analyses.
#Think of these like toolboxes containing lots of shortcuts. We call these modules.
#When the code line says "import", we are simply bringing in Python modules with code and objects we can use.
import pandas as pd #pandas provides the capability for spreadsheets
import numpy as np #for numerical analysis
import matplotlib.pyplot as plt #for plotting
import scipy.stats as ss #for statistical analysis
import seaborn as sb #for nicer graphics
#now we have a set of tools that we can call on inside the notebook to do things for us.
sb.set_style('darkgrid') #sets background style for graphics

Now we are ready to begin our data analysis. 

The first step is to import the data generated by the class. When you click "play" on the cell below, click the "Choose Files" box and select the "EcologySp19.csv" file you should have downloaded from Blackboard. 

In [0]:
from google.colab import files
uploaded = files.upload()

In [0]:
#This code allows you to verify that your file was correctly uploaded.
#You should see a message that the file was uploaded and has a specific length.
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

In [0]:
#Now you will take that data you just imported and extract the numbers into a form Python understands
#Here you are using the "pandas" library to tell the computer to pull the numbers from the file
#into a table and to call that table "ecology". That way any time you want to reference this data,
#you simply need to use the name "ecology".
#If you were successful here, you should see a table appear showing the first five rows of the data.
import pandas as pd
import io
ecology = pd.read_csv(io.StringIO(uploaded['EcologySp19.csv'].decode('utf-8')))
ecology.head()
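
If you are curious what the cell above is doing, here is a minimal sketch of the same steps on a tiny made-up file. The name `fake_upload` and the values in it are hypothetical stand-ins for `uploaded['EcologySp19.csv']`; they are not the class data.

```python
import io
import pandas as pd

# Hypothetical stand-in for uploaded['EcologySp19.csv']: the raw bytes of a tiny CSV file.
# The column names mirror ones used in this notebook; the values are made up.
fake_upload = b"Plot_type,Groundcover_Shannon\nFenced,1.2\nUnfenced,0.8\nTransect,1.0\n"

# decode() turns the bytes into text, io.StringIO makes that text behave like a file,
# and pd.read_csv reads the "file" into a table (a DataFrame).
demo = pd.read_csv(io.StringIO(fake_upload.decode('utf-8')))

# head() shows the first five rows (here there are only three)
print(demo.head())
```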

In [0]:
#Now you need to put the data into groups
grouped = ecology.groupby('Plot_type')

In [0]:
#This cell will show the descriptive statistics for each of the groups
grouped.describe().transpose()

In [0]:
#Next, we will create individual data sets for each type of plot. This will
#allow you to call one particular set depending on your question
fenced = ecology[ecology['Plot_type']=='Fenced']
unfenced = ecology[ecology['Plot_type']=='Unfenced']
transect = ecology[ecology['Plot_type']=='Transect']
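
The filtering used above can be tried on a small made-up table. This is only a sketch: the table `toy` and its numbers are invented for illustration, and `==` keeps the rows that match while `!=` keeps the rows that do not.

```python
import pandas as pd

# Made-up table for illustration only; these are NOT the class data.
toy = pd.DataFrame({
    'Plot_type': ['Fenced', 'Unfenced', 'Transect', 'Fenced', 'Unfenced'],
    'Groundcover_species_richness': [5, 3, 4, 7, 2],
})

# Keep only the rows where Plot_type is exactly 'Fenced' (case sensitive)
toy_fenced = toy[toy['Plot_type'] == 'Fenced']

# Keep every row EXCEPT the transect rows
toy_not_transect = toy[toy['Plot_type'] != 'Transect']

# groupby gives one summary value per plot type
toy_means = toy.groupby('Plot_type')['Groundcover_species_richness'].mean()

print(toy_fenced)
print(toy_not_transect)
print(toy_means)
```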

In [0]:
#Let's do a quick check and make sure everything up to this point has been done
#correctly. When you run the code in this cell, you should see a condensed 
#summary of the fenced dataset. If this is not the case, verify that you have
#executed ALL cells in order.
fenced.describe()



---



##Question 1: Do deer affect groundcover plant biodiversity?

You developed a more specific question and hypothesis that included one of the measures of diversity we calculated. This question will only consider the fenced and unfenced data sets. Transect data is not considered in this question.

Just like we did in Week 10, the first thing is to check for a normal distribution by plotting a histogram (distplot). You should have chosen species richness, Simpson index or Shannon index for your response variable. Look back to your Mini-poster outline to refresh your memory here.

In [0]:
%matplotlib inline
# Look at the first line of code creating a title for your histogram. Type your chosen response variable between the '' marks.
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
# Look at the second line of code which creates the distribution plot for the fenced data set
# Just as you did above, type your response variable between the '' marks
plt.title('')
sb.distplot(fenced[''])
#Decide if you think the fenced data is normally distributed or not. This plot does not need to be included in the poster.

In [0]:
#This code cell does the same thing but for unfenced sites. Type in your response variable into the 
#code just as you did in the cell above. 
plt.title('')
sb.distplot(unfenced[''])
#Decide if you think the unfenced data is normally distributed or not. This plot does not need to be included in the poster.

In your mini-poster outline, you decided which type of plot to use for this question. 

###Run only the code cell appropriate for your selection! This plot should end up in your mini-poster so make sure to take a screenshot.

In [0]:
#Run this cell if you chose to make a bar/box plot.
#You can see where we've removed the transect data with the != comparison (!= means "not equal to")
#Scroll to the right in this cell and find the y= ''. Here you should once again type your response variable between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Transect'], x='Plot_type',y='',width=0.2)

In [0]:
#Run this cell if you chose to make a scatterplot.
#Make sure to input your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
sb.regplot(x=ecology[''], y=ecology[''])

Now you should do the statistical analysis. In your mini-poster outline, you decided which statistical test to do depending on the normality of the data. In the cells above, you generated distribution plots to help you assess normality. If both data sets (fenced and unfenced) were normally distributed, run the appropriate test for normal data. If one or both of your data sets (fenced or unfenced) were non-normally distributed, run the appropriate test for non-normal data.
###Run only the code cell appropriate for your selection! Make sure you take note of the result (i.e. p-value) of your test. This information needs to be included in your mini-poster.
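
To see what the two kinds of test report before you run your own, here is a small sketch on made-up numbers. The arrays `toy_fenced` and `toy_unfenced` are hypothetical and are not the class data; with real data you would run only the one test that matches your distribution plots.

```python
import numpy as np
import scipy.stats as ss

# Made-up values for illustration only; these are NOT the class data.
toy_fenced = np.array([1.9, 2.1, 2.4, 2.0, 2.3, 2.2])
toy_unfenced = np.array([1.1, 1.4, 1.0, 1.3, 1.2, 1.5])

# Unpaired t-test that does not assume equal variances (equal_var=False),
# the choice for normally distributed data
t_result = ss.ttest_ind(toy_fenced, toy_unfenced, equal_var=False)

# Mann-Whitney U test, the choice for non-normally distributed data
u_result = ss.mannwhitneyu(toy_fenced, toy_unfenced, use_continuity=True, alternative='two-sided')

# In both cases the p-value is the second item of the result
print("t-test p-value:", t_result[1])
print("Mann-Whitney p-value:", u_result[1])
```

In this toy example the two groups do not overlap at all, so both tests report a p-value below 0.05.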

In [0]:
# This cell contains the code to run an unpaired t-test.
# As you have done in the previous cells, you must type in your response variable between the ''. 
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
# You do not need to change the second line of code which simply reports the p-value in an easy to find way.
result=ss.ttest_ind(fenced[''],unfenced[''],equal_var=False)
print("P-value:",result[1])

In [0]:
# This cell contains the code to run a Mann-Whitney U test.
# As you have done in the previous cells, you must type in your response variable between the ''. 
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
# You do not need to change the second line of code which simply reports the p-value in an easy to find way.
result=ss.mannwhitneyu(fenced[''], unfenced[''],use_continuity=True, alternative='two-sided')
print("p-value:",result[1])

In [0]:
# This code will run a Pearson correlation coefficient analysis.
# The results are shown as the coefficient then the p-value.
# Complete the code by typing your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
print(ss.pearsonr(ecology[''],ecology['']))
# This line of code returns the regression analysis
slope,intercept,rval,pvalue,stderr = ss.linregress(x=ecology[''],y=ecology[''])
print('Slope',slope)
print('Intercept',intercept)
print('Rsq',rval**2)
print('pvalue',pvalue)
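
Here is a minimal sketch of how the Pearson and regression outputs relate to each other, using made-up numbers (the arrays `x` and `y` are hypothetical, not class data). The "Rsq" value printed by the regression is simply the correlation coefficient multiplied by itself.

```python
import numpy as np
import scipy.stats as ss

# Made-up values for illustration only; y rises by roughly 2 for every step in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Pearson correlation: returns the coefficient, then the p-value
r, p = ss.pearsonr(x, y)

# Linear regression on the same data
slope, intercept, rval, pvalue, stderr = ss.linregress(x=x, y=y)

print('Pearson r', r)
print('Slope', slope)
print('Intercept', intercept)
print('Rsq', rval**2)
print('pvalue', pvalue)
```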

In [0]:
# This code will run a Spearman correlation coefficient analysis.
# Complete the code by typing your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Groundcover_species_richness, Groundcover_Simpson, or Groundcover_Shannon
ss.spearmanr(ecology[''], ecology[''])



---



##Question 2: Do deer affect invertebrate biodiversity?

You developed a more specific question and hypothesis that included one of the measures of diversity we calculated. This question will only consider the fenced and unfenced data sets. Transect data is not considered in this question.

Just as you did for Question 1, the first step is to check for a normal distribution by plotting a histogram (distplot). You should have chosen invertebrate species richness, Simpson index or Shannon index for your response variable. Look back to your Mini-poster outline to refresh your memory here.

In [0]:
# Look at the first line of code creating a title for your histogram. Type your chosen response variable between the '' marks.
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
# Look at the second line of code which creates the distribution plot for the fenced data set
# Just as you did above, type your response variable between the '' marks
plt.title('')
sb.distplot(fenced[''])
#Decide if you think the fenced data is normally distributed or not. This plot does not need to be in your poster.

In [0]:
# This code cell does the same thing but for unfenced sites. Type in your response variable into the 
# code just as you did in the cell above. 
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
plt.title('')
sb.distplot(unfenced[''])
#Decide if you think the unfenced data is normally distributed or not. This plot does not need to be in your poster.

In your mini-poster outline, you decided which type of plot to use for this question. 

###Run only the code cell appropriate for your selection! This plot should end up in your mini-poster so make sure to take a screenshot.

In [0]:
# Run this cell if you chose to make a bar/box plot.
# Scroll to the right in this cell and find the y= ''. Here you should once again type your response variable between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Transect'], x='Plot_type',y='',width=0.2)

In [0]:
# Run this cell if you chose to make a scatterplot.
# Make sure to input your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
sb.regplot(x=ecology[''], y=ecology[''])

Now you should do the statistical analysis. In your mini-poster outline, you decided which statistical test to do depending on the normality of the data. In the cells above, you generated distribution plots to help you assess normality. If both data sets (fenced and unfenced) were normally distributed, run the appropriate test for normal data. If one or both of your data sets (fenced or unfenced) were non-normally distributed, run the appropriate test for non-normal data.
###Run only the code cell appropriate for your selection! Make sure you take note of the result (i.e. p-value) of your test. This information needs to be included in your mini-poster.

In [0]:
# This cell contains the code to run an unpaired t-test.
# As you have done in the previous cells, you 
# must type in your response variable between the ''. You do not need to change the second line of code which simply 
# reports the p-value in an easy to find way.
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
result=ss.ttest_ind(fenced[''],unfenced[''],equal_var=False)
print("P-value:",result[1])

In [0]:
# This cell contains the code to run a Mann-Whitney U test.
# As you have done in the previous cells, you 
# must type in your variable between the first two '' marks. 
# You do not need to change the second line of code which simply 
# reports the p-value in an easy to find way.
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
result=ss.mannwhitneyu(fenced[''], unfenced[''],use_continuity=True, alternative='two-sided')
print("p-value:",result[1])

In [0]:
# This code will run a Pearson correlation coefficient analysis.
# The results are shown as the coefficient then the p-value.
# Complete the code by typing your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
print(ss.pearsonr(ecology[''],ecology['']))
# This line of code returns the regression analysis
slope,intercept,rval,pvalue,stderr = ss.linregress(x=ecology[''],y=ecology[''])
print('Slope',slope)
print('Intercept',intercept)
print('Rsq',rval**2)
print('pvalue',pvalue)

In [0]:
# This code will run a Spearman correlation coefficient analysis.
# Complete the code by typing your variables between the ''
# The variable name must match EXACTLY to one of these (case sensitive!): 
# Invert_species_richness, Invert_Simpson, or Invert_Shannon
ss.spearmanr(ecology[''], ecology[''])



---



##Question 3: Student's choice

You should have picked one of the seven provided options for the third question. The next sections are divided up based on those questions. As you saw with the first two questions, each section below has several cells that you need to choose from to generate the appropriate plots and statistical tests. 

For simplicity, you do not need to check for normality in the data you use for this third question. The statistical test options are pre-selected for the correct data distribution.

Make sure you are screenshotting any plot and recording the p-value for tests.


###Option 1: Does the act of setting up the unfenced plot alter biodiversity of groundcover plants or invertebrates (pick one)?
With this question, unfenced plots are compared to transect plots. You can see how this is done in the code below with the != 'Fenced' filter.
Here we are excluding fenced sites (!= means "not equal to").

###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Fenced'], x='Plot_type', y='', width=0.2)

In [0]:
# scatterplot
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
sb.lmplot(x='', y='', fit_reg=True, data=unfenced)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
ss.ttest_ind(unfenced[''],transect[''],equal_var=False)

In [0]:
# spearman 
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(unfenced[''], unfenced[''])
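
The sketch below shows one situation that produces a "nan" result: a variable that has the same value at every site. This is an assumption about the cause, offered for illustration; your data may produce "nan" for a different reason. The arrays `constant` and `varying` are made up.

```python
import math
import warnings
import numpy as np
import scipy.stats as ss

# Made-up values for illustration only.
constant = np.array([3, 3, 3, 3, 3])   # the same value at every site
varying = np.array([1, 4, 2, 5, 3])

with warnings.catch_warnings():
    # scipy warns when an input never changes; hide the warning to keep the output short
    warnings.simplefilter("ignore")
    rho, p = ss.spearmanr(constant, varying)

# A ranking cannot be correlated with something that never changes,
# so both the coefficient and the p-value come back as nan ("not a number")
print("coefficient:", rho)
print("p-value:", p)
print("Is the p-value nan?", math.isnan(p))
```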

###Option 2: Is the biodiversity of groundcover plants or invertebrates (pick one) related to the number of deer photographed in the unfenced site?
You are only considering unfenced data here because those sites are the only ones for which we have deer photograph data.

###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Fenced'], x='Plot_type', y='', width=0.2)

In [0]:
# scatterplot
# You MUST put a response variable (whichever you selected) between the '' marks
# Don't change the 'Deer_photos' variable
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
sb.lmplot(x='', y='Deer_photos', fit_reg=True, data=unfenced)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
ss.ttest_ind(unfenced[''],transect[''],equal_var=False)

In [0]:
# spearman 
# You MUST put a response variable (whichever you selected) between the '' marks
# Do not change the 'Deer_photos' variable
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(unfenced[''], unfenced['Deer_photos'])

###Option 3: Is groundcover plant diversity related to invertebrate diversity when deer are either excluded or included (pick one)?
This question does not consider transect data.
###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Transect'], x='Plot_type', y='', width=0.2)

In [0]:
# scatter graph
# You MUST put the variables (whichever you selected) between the sets of '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
# For this question, you should have picked TWO 
# If you selected to examine the relationship when deer are excluded, type fenced after data=
# If you selected to examine the relationship when deer are included, type unfenced after data=
sb.lmplot(x='', y='', fit_reg=True, data=)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
ss.ttest_ind(unfenced[''],fenced[''],equal_var=False)

In [0]:
# spearman 
# If you selected to examine the relationship when deer are excluded, replace SITE with fenced
# If you selected to examine the relationship when deer are included, replace SITE with unfenced
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Invert_species_richness, Invert_Simpson, Invert_Shannon
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(SITE[''], SITE[''])

###Option 4: Do deer alter surface soil pH, nutrient content, or texture (pick one) of the topsoil?
For this question, only fenced and unfenced sites are considered.
###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.boxplot(data=ecology[ecology['Plot_type'] != 'Transect'], x='Plot_type', y='', width=0.2)

In [0]:
# Bar plot ONLY if you chose to examine the effect of deer on topsoil texture
# There is no code to modify here
%matplotlib inline
sb.catplot(x='Plot_type', hue="Topsoil_class", kind="count", palette='pastel', edgecolor='.6', data=ecology[ecology['Plot_type'] != 'Transect'])

In [0]:
# scatter graph
# You MUST put the variables (whichever you selected) between the sets of '' marks
# Options are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.lmplot(x='', y='Fenced', fit_reg=True, data=ecology)

In [0]:
# t-test unpaired samples for pH, N, P, or K 
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
ss.ttest_ind(unfenced[''],fenced[''],equal_var=False)

In [0]:
# spearman 
# You MUST put the variables (whichever you selected) between the '' marks
# Options are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(ecology[''], ecology['Fenced'])

In [0]:
# If you chose to look at the effect of deer on Topsoil_class, you should have 
# noted that for this question, you have two categorical variables with topsoil type
# being the response variable. For this type of question, you will need to perform
# a chi-square test to determine if fenced is different from unfenced in terms of 
# distribution of soil type. The first two lines count the occurrence of each type of soil for fenced sites and unfenced sites
# Then we line up both sets of counts on the same soil types (0 if a type is missing), make an array containing these values and perform the Chi-square test
# Execute this code ONLY if you selected this question variant
# There is no need to modify this code
fencesoiltype = fenced['Topsoil_class'].value_counts()
unfencesoiltype = unfenced['Topsoil_class'].value_counts()
soiltypes = fencesoiltype.index.union(unfencesoiltype.index)
array = np.array([fencesoiltype.reindex(soiltypes, fill_value=0), unfencesoiltype.reindex(soiltypes, fill_value=0)])
chi2_stat, p_val, dof, ex = ss.chi2_contingency(array)
print("===P-Value===")
print(p_val)
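
If you would like to see the table that a chi-square test compares, here is a small sketch on made-up data. It uses `pd.crosstab`, which is a different route to the same kind of count table than the cell above; the table `toy` and its values are hypothetical.

```python
import pandas as pd
import scipy.stats as ss

# Made-up data for illustration only; these are NOT the class data.
toy = pd.DataFrame({
    'Plot_type': ['Fenced'] * 6 + ['Unfenced'] * 6,
    'Topsoil_class': ['sand', 'sand', 'loam', 'loam', 'loam', 'sandy loam',
                      'sand', 'loam', 'sandy loam', 'sandy loam', 'sandy loam', 'sandy loam'],
})

# Count how many plots of each type fall into each soil class.
# Rows are plot types, columns are soil classes, and the columns line up for both rows.
table = pd.crosstab(toy['Plot_type'], toy['Topsoil_class'])
print(table)

# The chi-square test asks whether the two rows have the same distribution of soil classes
chi2_stat, p_val, dof, ex = ss.chi2_contingency(table.values)
print("===P-Value===")
print(p_val)
```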

###Option 5: Does surface soil pH, nutrient content, or texture (pick one) affect groundcover plant diversity (without deer as a confounding factor)?
For this question, only fenced data are considered.
If you elected to examine the effect of topsoil texture on groundcover plant diversity, you must run the cells specific for that situation because that variable (soil texture) is categorical, not continuous like the other variable options.
###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# (Topsoil_N is filled in below as an example; replace it with your own variable)
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.boxplot(data=fenced, x='Plot_type', y='Topsoil_N', width=0.2)

In [0]:
# scatter graph
# You MUST put the variables (whichever you selected) between the sets of '' marks
# (Groundcover_Simpson and Topsoil_pH are filled in below as examples; replace them with your own variables)
# Options for groundcover plant diversity are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon
# Options for other variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
%matplotlib inline
sb.lmplot(x='Groundcover_Simpson', y='Topsoil_pH', fit_reg=True, data=fenced)

In [0]:
# Bar/box plot ONLY if you elected to examine the effect of topsoil texture on groundcover diversity
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
sb.boxplot(data=fenced, x='Topsoil_class', y='', width=0.2)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon,
# Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
ss.ttest_ind(unfenced[''],transect[''],equal_var=False)

In [0]:
# spearman 
# You MUST put the variables (whichever you selected) between the '' marks
# Options for groundcover plant diversity are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon
# Options for other variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(fenced[''], fenced[''])

In [0]:
# If you chose to look at the effect of soil texture on groundcover diversity, you will
# need to run an ANOVA because you are comparing diversity across different groups (soil textures)
# and there are more than two groups
# If your ANOVA p-value is significant,
# make sure to also run the Tukey post-hoc test in the next cell.
# You MUST put the variable (whichever you selected) where VARIABLE is typed.
# Options for the variable are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon
fenceloamysand = fenced[fenced['Topsoil_class']=='loamy sand']
fencesandloam = fenced[fenced['Topsoil_class']=='sandy loam']
fenceloam = fenced[fenced['Topsoil_class']=='loam']
fencesand = fenced[fenced['Topsoil_class']=='sand']
f, p = ss.f_oneway(fenceloamysand.VARIABLE, fencesandloam.VARIABLE, fenceloam.VARIABLE, fencesand.VARIABLE)
print ('One-way ANOVA')
print ('=============')
 
print ('F value:', f)
print ('P value:', p, '\n')
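
Here is a minimal sketch of a one-way ANOVA on made-up numbers, so you can see what the F value and p-value look like when the groups clearly differ. The three lists are hypothetical diversity values for three soil textures, not class data.

```python
import scipy.stats as ss

# Made-up diversity values for three soil textures; these are NOT the class data.
sand = [2.0, 2.2, 1.9, 2.1]
loam = [3.0, 3.1, 2.9, 3.2]
sandy_loam = [2.5, 2.4, 2.6, 2.7]

# One-way ANOVA: compares the means of more than two groups in a single test
f, p = ss.f_oneway(sand, loam, sandy_loam)

print('One-way ANOVA')
print('=============')
print('F value:', f)
print('P value:', p)
```

A p-value below 0.05 here tells you that at least one group differs from the others, but not which one. That is the job of the post-hoc test.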

In [0]:
# Run this code if you chose to do an ANOVA AND your resulting p-value was <0.05.
# If your p-value was >0.05, you do not need to run this test
# You MUST put the variable (whichever you selected) between the '' marks.
# Options for the variable are Groundcover_species_richness, Groundcover_Simpson, Groundcover_Shannon
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

mc = MultiComparison(fenced[''], fenced['Topsoil_class'])
result = mc.tukeyhsd()
 
print(result)
print(mc.groupsunique)

###Option 6: Does surface soil pH, nutrient content, or texture (pick one) affect invertebrate diversity (without deer as a confounding factor)?
For this question, only fenced data are considered. If you elected to examine the effect of topsoil texture on invertebrate diversity, you must run the cells specific for that situation because that variable (soil texture) is categorical, not continuous like the other variable options.
###Run only the cells appropriate for your selection!

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Invert_species_richness, Invert_Simpson, Invert_Shannon,
# Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.boxplot(data=fenced, x='Plot_type', y='', width=0.2)

In [0]:
# scatter graph
# You MUST put the variables (whichever you selected) between the sets of '' marks
# Options for invertebrate diversity are Invert_species_richness, Invert_Simpson, Invert_Shannon
# Options for other variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.lmplot(x='', y='', fit_reg=True, data=fenced)

In [0]:
# Bar/box plot ONLY if you elected to examine the effect of topsoil texture on invertebrate diversity
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Invert_species_richness, Invert_Simpson, Invert_Shannon,
sb.boxplot(data=fenced, x='Topsoil_class', y='', width=0.2)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Invert_species_richness, Invert_Simpson, Invert_Shannon,
# Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
ss.ttest_ind(unfenced[''],transect[''],equal_var=False)

In [0]:
# spearman 
# You MUST put the variables (whichever you selected) between the '' marks
# Options for invertebrate diversity are Invert_species_richness, Invert_Simpson, Invert_Shannon
# Options for other variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(fenced[''], fenced[''])

In [0]:
# If you chose to look at the effect of soil texture on invertebrate diversity, you will
# need to run an ANOVA because you are comparing diversity across different groups (soil textures)
# and there are more than two groups
# If your ANOVA p-value is significant,
# make sure to also run the Tukey post-hoc test in the next cell.
# You MUST put the variable (whichever you selected) where VARIABLE is typed.
# Options for the variable are Invert_species_richness, Invert_Simpson, Invert_Shannon
fenceloamysand = fenced[fenced['Topsoil_class']=='loamy sand']
fencesandloam = fenced[fenced['Topsoil_class']=='sandy loam']
fenceloam = fenced[fenced['Topsoil_class']=='loam']
fencesand = fenced[fenced['Topsoil_class']=='sand']
f, p = ss.f_oneway(fenceloamysand.VARIABLE, fencesandloam.VARIABLE, fenceloam.VARIABLE, fencesand.VARIABLE)
print ('One-way ANOVA')
print ('=============')
 
print ('F value:', f)
print ('P value:', p, '\n')

In [0]:
# Run this code if you chose to do an ANOVA AND your resulting p-value was <0.05.
# If your p-value was >0.05, you do not need to run this test
# You MUST put the variable (whichever you selected) between the '' marks.
# Options for the variable are Invert_species_richness, Invert_Simpson, Invert_Shannon
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

mc = MultiComparison(fenced[''], fenced['Topsoil_class'])
result = mc.tukeyhsd()
 
print(result)
print(mc.groupsunique)

### Option 7: Are clustered sites more likely to have similar topsoil pH or nutrients (pick one) without deer as a confounding factor?
For this question, only fenced data are considered, and only from locations with more than one site (Mill Neck, Nelsen 1&2, Compton).

### Run only the cells appropriate for your selection!

In [0]:
# RUN THIS CELL FIRST REGARDLESS OF SELECTION BELOW
# consider Mill Neck, Nelsen 1&2 and Compton only because these sites have more than one plot of each type
singlesites = ['Loner', 'Yellow Jacket', 'Clown Car', 'Keck']
nodeernosingles = fenced[~fenced.Site_name.isin(singlesites)]
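
The cell above removes rows by site name. Here is a small sketch of the same idea on a made-up table (`toy` and its pH values are hypothetical): `isin` marks the rows whose site is in the list, and the `~` in front flips that so we keep everything else.

```python
import pandas as pd

# Made-up table for illustration only; these are NOT the class data.
toy = pd.DataFrame({
    'Site_name': ['Mill Neck', 'Mill Neck', 'Loner', 'Compton', 'Keck', 'Compton'],
    'Topsoil_pH': [5.1, 5.3, 6.0, 4.8, 5.5, 4.9],
})

singlesites = ['Loner', 'Yellow Jacket', 'Clown Car', 'Keck']

# isin() is True for rows whose Site_name appears in singlesites;
# ~ means "not", so only rows from the other sites are kept
kept = toy[~toy.Site_name.isin(singlesites)]

print(kept)
```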

In [0]:
# Bar/box plot
# You MUST put a response variable (whichever you selected) between the '' marks next to y=
# Options are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.boxplot(data=nodeernosingles, x='Site_name', y='', width=0.2)

In [0]:
# scatter graph
# You MUST put the variables (whichever you selected) between the sets of '' marks
# Options for other variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
sb.lmplot(x='', y='', fit_reg=True, data=fenced)

In [0]:
# Run this code if you chose to do an ANOVA. If your ANOVA p-value is significant,
# make sure to also run the Tukey post-hoc test in the next cell.
# You MUST put the variable (whichever you selected) where VARIABLE is typed.
# Options for the variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
f, p = ss.f_oneway(nodeernosingles[nodeernosingles['Site_name'] == 'Mill Neck'].VARIABLE,
                      nodeernosingles[nodeernosingles['Site_name'] == 'Nelsen 1'].VARIABLE,
                      nodeernosingles[nodeernosingles['Site_name'] == 'Nelsen 2'].VARIABLE,
                      nodeernosingles[nodeernosingles['Site_name'] == 'Compton'].VARIABLE)
 
print ('One-way ANOVA')
print ('=============')
 
print ('F value:', f)
print ('P value:', p, '\n')

In [0]:
# Run this code if you chose to do an ANOVA AND your resulting p-value was <0.05.
# If your p-value was >0.05, you do not need to run this test
# You MUST put the variable (whichever you selected) between the '' marks.
# Options for the variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

mc = MultiComparison(nodeernosingles[''], nodeernosingles['Site_name'])
result = mc.tukeyhsd()
 
print(result)
print(mc.groupsunique)

In [0]:
# t-test unpaired samples
# You MUST put a response variable (whichever you selected) between the '' marks
# Options are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
ss.ttest_ind(unfenced[''],transect[''],equal_var=False)

In [0]:
# spearman 
# You MUST put the variables (whichever you selected) between the '' marks
# Options for the variable are Topsoil_pH, Topsoil_P, Topsoil_N, Topsoil_K
# Depending on the variable selected, there is a chance that your correlation analysis will not return a number
# and the result will instead read “nan”. If this is the case, interpret the result as a p-value = 1.0:
ss.spearmanr(nodeernosingles[''], nodeernosingles[''])