# Irrigation Methods

This notebook recreates the analysis used to study irrigation methods.

Begin by importing the research library and instantiating the `Irrigation` class.

In [None]:
import sys; sys.path.append("../../../")
import tara.SongroveBotanicals.research as hub
irrigation = hub.Irrigation()
%matplotlib inline

### Load the Data
Run the cell below to load the data gathered for the experiment. For more on how the data was collected see https://www.stellargrove.com/irrigation-methods. The data returned is split into two dataframes: yield and growth rate.

In [None]:
df_Yield, df_GrowthRate = irrigation.loadData()


# Analysis of Irrigation Methods Using Crop Yields

## Examine the Data

Run this cell to see what the yield data looks like after being transformed into a usable state.

In [None]:
df_Yield.head()

## Introduction

To begin, we look at the data visually to get a sense of whether our initial hypothesis is likely to hold.

## Yield Box Plots

Running the cell below will create a simple box plot for you to view how the data is distributed amongst the different irrigation methods.

In [None]:
df_Yield.boxplot(grid=False)

Now that a visual representation has been established let's use stats to ensure that our intuition holds up to mathematical rigor.

## Calculate the Mean Yield

For each of the different Irrigation Methods, calculate the mean yield of the crop grown.  The mean along with the Standard Deviation will give us an idea of whether or not we can make claims that one method is better than another.

In [None]:
yields = irrigation.calculateMeans(df_Yield,sort_order="d")
yields
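As a rough illustration of the kind of summary described above, here is a minimal sketch using only the Python standard library. The data are made-up numbers, and `summarize` is a hypothetical helper; it is not the implementation of `calculateMeans`, whose internals are not shown in this notebook.

```python
from statistics import mean, stdev

# Hypothetical yields per irrigation method (made-up numbers, not the study's data).
yields_by_method = {
    "Flood": [10.1, 9.9, 10.2, 10.0],
    "Drip": [11.2, 11.6, 11.5, 11.4],
    "Sprinkler": [9.0, 9.1, 8.9, 9.2],
    "Furrow": [9.5, 9.7, 9.6, 9.8],
}

def summarize(samples):
    """Return (method, mean, sample std) tuples sorted by mean, largest first."""
    rows = [(name, mean(vals), stdev(vals)) for name, vals in samples.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

summary = summarize(yields_by_method)
for name, m, s in summary:
    print(f"{name:<10} mean={m:.2f} std={s:.2f}")
```

Sorting by the mean in descending order mirrors the `sort_order="d"` argument used in the cell above, assuming that flag means "descending".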

## Write Out Hypothesis

Looking at the means, it is natural to rank the effectiveness of the methods as:
<ol>
    <li>Drip
    <li>Flood
    <li>Furrow
    <li>Sprinkler
</ol>

The null hypothesis is that all four methods have the same mean yield; the alternative is that at least one mean differs. In the next section we will use ANOVA to test it.

## ANOVA Analysis

Run the cell below to get the F statistic and p-value for the yields of the different irrigation methods.

In [None]:
f_Yield, p_Yield = irrigation.runANOVA(df_Yield)
print(f_Yield, p_Yield)

As can be seen from the very large F statistic and correspondingly small p-value, we can reject the null hypothesis that all methods share the same mean yield with some confidence.
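For readers who want to see what a one-way ANOVA computes, the F statistic is the ratio of between-group to within-group variance. The sketch below works it out by hand with the standard library on a tiny made-up data set. `one_way_anova_f` is a hypothetical name and not part of the `hub` library; the p-value, which needs the F distribution, is left out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up samples for three groups (illustration only).
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means the group means are spread out far more than the scatter inside each group would explain, which is what drives the small p-value.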

## Dominance Analysis

Using the table below we can test on a pairwise basis whether or not the sample means are the same or not. <br>
In this example, we use the t-test to determine whether the two methods that are being compared are the same or not. <br>
To determine if one variable is greater than another, we first compare the two means, then determine whether or not we can reject the null hypothesis of the means being equal. <br>
The table below outlines each mean, the test statistic, the p-value of the t-test performed and whether or not the test statistic was significant based on the p-value. 

In [None]:
dominance_results = irrigation.runDominance(df_Yield,"t-test")
dominance_results

As can be seen, all the pairwise comparisons were statistically significant, meaning that for each pair of methods we can reject the null hypothesis that their means are equal. <br>
Our initial ranking of <b>Drip -> Flood -> Furrow -> Sprinkler</b> seems to hold true, with the means of each being 11.47, 10.05, 9.63 and 9.02 respectively. Examining all the pairwise comparisons laid out in the table, we see that the Drip method of irrigation worked best when comparing the yields of the crop.
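The pairwise procedure described in this section can be sketched with the standard library as follows. This assumes a pooled-variance (Student's) two-sample t-test; whether `runDominance` uses that variant or Welch's is not stated here. The data are made up, `pooled_t` and `pairwise_table` are hypothetical helpers, and significance is judged against a tabulated critical value rather than a computed p-value.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def pairwise_table(samples, t_crit):
    """One row per pair: names, both means, t statistic, significance flag."""
    rows = []
    for (name_a, a), (name_b, b) in combinations(samples.items(), 2):
        t, _ = pooled_t(a, b)
        rows.append((name_a, name_b, mean(a), mean(b), t, abs(t) > t_crit))
    return rows

# Made-up yields, four observations per method (illustration only).
samples = {
    "Drip": [11.2, 11.6, 11.5, 11.4],
    "Flood": [10.1, 9.9, 10.2, 10.0],
    "Furrow": [9.5, 9.7, 9.6, 9.8],
    "Sprinkler": [9.0, 9.1, 8.9, 9.2],
}

# Two-sided 5% critical value of Student's t for df = 4 + 4 - 2 = 6.
# With 10 observations per method (df = 18) the value would be 2.101.
T_CRIT_DF6 = 2.447

rows = pairwise_table(samples, T_CRIT_DF6)
for name_a, name_b, mean_a, mean_b, t, significant in rows:
    print(f"{name_a} vs {name_b}: {mean_a:.2f} vs {mean_b:.2f}, t={t:.2f}, significant={significant}")
```

Four methods give six pairs. Running that many tests at the 5% level inflates the overall false-positive rate, so a multiple-comparison correction is worth considering when reading such a table.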

## Power of Test

One last note on how far this analysis can be <i>trusted</i>. With only 10 trials, one would rightfully be a little skeptical that all the assumptions required by the tests are met. <br>
To test the power of the t-test we used, we can run the <b><i>determinePower</i></b> function to see how well our test works.<br>
Using an effect size equal to Cohen's d = 0.8 we have the following:

In [None]:
effect_size =  0.8
hub.determinePower(effect_size, 10)

More aptly, from the documentation: Power is the probability that the test correctly rejects the Null Hypothesis if the Alternative Hypothesis is true. <br>
With the power of this test being roughly 0.62, a true effect of this size would be detected about 62% of the time, so we have only moderate confidence in this experiment's ability to reject the null hypothesis of the means being equal in favor of the alternative that they are not. <br>
This raises the question of how many trials must be performed in order to have a good amount of <i>faith</i> in the experiment. To answer it, we solve for the sample size based on the effect size we expect to see and the alpha value we use as our type I error tolerance.

In [None]:
size = hub.determineSampleSize(0.8)
size
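To make the notion of power concrete, the sketch below estimates it by simulation using only the standard library. It assumes a two-sided one-sample (or paired) t-test at alpha = 0.05. The test variant behind `determinePower` is not stated in this notebook, but this choice is consistent with a power near the 0.62 quoted above for 10 trials. `simulated_power` and `smallest_n` are hypothetical helpers, and the target power of 0.8 is a conventional choice, not a value taken from the library.

```python
import random
from math import sqrt

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (standard table values).
T_CRIT = {9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145,
          15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093}

def simulated_power(effect_size, n, reps=20000, seed=0):
    """Estimate the power of a two-sided one-sample t-test by simulation."""
    rng = random.Random(seed)
    t_crit = T_CRIT[n - 1]
    rejections = 0
    for _ in range(reps):
        # Draw n observations whose true mean lies `effect_size` standard
        # deviations away from the null value of 0.
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        m = sum(sample) / n
        var = sum((x - m) ** 2 for x in sample) / (n - 1)
        t = m / sqrt(var / n)
        if abs(t) > t_crit:
            rejections += 1
    return rejections / reps

def smallest_n(effect_size, target=0.8, start=10, stop=20):
    """Return the first n in [start, stop] whose estimated power reaches target."""
    for n in range(start, stop + 1):
        if simulated_power(effect_size, n) >= target:
            return n
    return None

power_10 = simulated_power(0.8, 10)
n_needed = smallest_n(0.8)
print(power_10, n_needed)
```

Simulation trades exactness for transparency: each repetition is one imaginary experiment, and power is simply the fraction of those experiments in which the null hypothesis is rejected.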

# Analysis of Irrigation Methods Using Growth Rates

## Examine the data visually

In [None]:
df_GrowthRate.boxplot(grid=False)

Looking at the box plot above, it looks as though the ranking of the methods for growth rate is:
1. Drip
2. Furrow
3. Flood
4. Sprinkler

Now that a visual representation has been established let's use stats to ensure that our intuition holds up to mathematical rigor.

### Calculate the Mean Growth Rate

For each of the different Irrigation Methods, calculate the mean growth rate of the crop grown. The mean along with the Standard Deviation will give us an idea of whether or not we can make claims that one method is better than another.

In [None]:
growthrates = irrigation.calculateMeans(df_GrowthRate,sort_order="d")
growthrates

### Write Out Hypothesis

Looking at the means, it is natural to rank the effectiveness of the methods as:
<ol>
    <li>Drip
    <li>Furrow
    <li>Flood
    <li>Sprinkler
</ol>

In the next section we will use ANOVA analysis to confirm our assumptions.

### ANOVA Analysis

Run the cell below to get the f-statistic and p-value for the growth rates of different irrigation methods.

In [None]:
f_GrowthRate, p_GrowthRate = irrigation.runANOVA(df_GrowthRate)
print(f_GrowthRate, p_GrowthRate)

As can be seen from the very large F statistic and correspondingly small p-value, we can begin to assert some level of confidence that our null hypothesis is rightfully rejected.

### Dominance Analysis

Using the table below we can test on a pairwise basis whether or not the sample means are the same or not. <br>
In this example, we use the t-test to determine whether the two methods that are being compared are the same or not. <br>
To determine if one variable is greater than another, we first compare the two means, then determine whether or not we can reject the null hypothesis of the means being equal. <br>
The table below outlines each mean, the test statistic, the p-value of the t-test performed and whether or not the test statistic was significant based on the p-value. 

In [None]:
dominance_results = irrigation.runDominance(df_GrowthRate,"t-test")
dominance_results

As can be seen, all the pairwise comparisons were statistically significant, meaning that for each pair of methods we can reject the null hypothesis that their means are equal. <br>
Our initial ranking of <b>Drip -> Furrow -> Flood -> Sprinkler</b> seems to hold true, with the mean of each method given in the growth rate means table above. Examining all the pairwise comparisons laid out in the table, we see that the Drip method of irrigation worked best when comparing the growth rates of the crop.

## Power of Test

One last note on how far this analysis can be <i>trusted</i>. With only 10 trials, one would rightfully be a little skeptical that all the assumptions required by the tests are met. <br>
To test the power of the t-test we used, we can run the <b><i>determinePower</i></b> function to see how well our test works.<br>
Using an effect size equal to Cohen's d = 0.8 we have the following:

In [None]:
effect_size =  0.8
hub.determinePower(effect_size, 10)

More aptly, from the documentation: Power is the probability that the test correctly rejects the Null Hypothesis if the Alternative Hypothesis is true. <br>
With the power of this test being roughly 0.62, a true effect of this size would be detected about 62% of the time, so we have only moderate confidence in this experiment's ability to reject the null hypothesis of the means being equal in favor of the alternative that they are not. <br>
This raises the question of how many trials must be performed in order to have a good amount of <i>faith</i> in the experiment. To answer it, we solve for the sample size based on the effect size we expect to see and the alpha value we use as our type I error tolerance.

In [None]:
size = hub.determineSampleSize(0.8)
size
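The effect size of 0.8 used above is a conventional value for a "large" effect rather than one measured from the data. As a sketch of how an observed effect size could be obtained instead, Cohen's d for two methods is the difference in means divided by the pooled standard deviation. The samples below are made up, and `cohens_d` is a hypothetical helper, not part of the `hub` library.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                     / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

# Made-up growth rates for two methods (illustration only).
method_a = [2.0, 3.0, 4.0]
method_b = [1.0, 2.0, 3.0]

d = cohens_d(method_a, method_b)
print(d)
```

Feeding an observed d into the power and sample-size calculations gives an answer tied to the effect seen in this experiment, although an effect size estimated from only a handful of trials is itself quite uncertain.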