## The Scientific Method
The scientific method is the process by which science is carried out. The general idea is to build on previous knowledge in order to improve our understanding of a given topic.

1. Formulate the question

2. Generate a hypothesis to address the question

3. Make a prediction

4. Conduct an experiment

5. Analyze the data and draw a conclusion

We will continue with an interactive example, but first it is important to note that scientific experiments must be repeatable in order to become reliable evidence.

### Question
The question can be open-ended, and it generally summarizes your business opportunity. Let’s say you work for a small business that manufactures sleds and other winter gear, and you are not sure in which cities to build your next retail locations. You have heard that Utah, Colorado, and Vermont all have high rates of snowfall, but it is unclear which one has the highest.

### Hypothesis
Because the Rocky Mountains are higher in elevation and are well known for fresh powder on their ski slopes, you hypothesize that both Utah and Colorado get more snow than Vermont.

### Prediction
If you were to run a hypothesis test, you would find that Vermont has significantly less snowfall than Colorado or Utah.

### Experiment
You hit the [NOAA weather API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2) to get average annual snowfall by city. We have compiled these data for you in `snowfall.csv`.

---

You could use a 1-way ANOVA to test the validity of your prediction, but let’s start by looking at the data.

First we read in the data:

In [1]:
import pandas as pd
df = pd.read_csv("../data/snowfall.csv")

In [2]:
df.head()

| Unnamed: 0 | rank | location | snowfall | state | city | lat | long | elevation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | VALDEZ | 316.8 | AK | Valdez | 61.12994 | -146.349364 | 6.8 |
| 1 | 2 | MT. WASHINGTON | 260.0 | NH | Mt. Washington | 44.27046 | -71.303531 | 1913.4 |
| 2 | 3 | BLUE CANYON | 240.3 | CA | Blue Canyon | 39.257275 | -120.710825 | 1405.3 |
| 3 | 4 | YAKUTAT | 190.3 | AK | Yakutat | 59.572735 | -139.578312 | 26.0 |
| 4 | 5 | MARQUETTE | 149.1 | MI | Marquette | 46.543491 | -87.396433 | |


Next, subset the data to focus only on the states of interest:

In [3]:
df1 = df[df['state'].isin(['CO','UT','VT'])]

In [7]:
df1

| Unnamed: 0 | rank | location | snowfall | state | city | lat | long | elevation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 23 | 24 | BURLINGTON | 80.9 | VT | Burlington | 44.472399 | -73.211494 | 52.2 |
| 47 | 48 | DENVER | 59.6 | CO | Denver | 39.739154 | -104.984703 | 1608.6 |
| 48 | 49 | SALT LAKE CITY | 58.2 | UT | Salt Lake City | 40.767013 | -111.890431 | 1314.9 |
| 72 | 73 | MILFORD | 45.1 | UT | Milford | 38.396911 | -113.010789 | 1515.2 |
| 91 | 92 | COLORADO SPRINGS | 40.8 | CO | Colorado Springs | 38.833958 | -104.825349 | 1831.7 |
| 110 | 111 | ALAMOSA | 32.5 | CO | Alamosa | 37.469877 | -105.869601 | 2299.6 |
| 111 | 112 | PUEBLO | 32.5 | CO | Pueblo | 38.254447 | -104.609141 | 1425.5 |
| 138 | 139 | GRAND JUNCTION | 23.4 | CO | Grand Junction | 39.063956 | -108.550732 | 1400.0 |


Finally, create a pivot table that keeps only the relevant summary statistics:

In [5]:
df1_pivot = pd.pivot_table(df1, values='snowfall', index='state',
                            aggfunc=['count', 'mean', 'max'])

print(df1_pivot)

         count     mean      max
      snowfall snowfall snowfall
state                           
CO           5    37.76     59.6
UT           2    51.65     58.2
VT           1    80.90     80.9
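
As a cross-check on the pivot, the same count, mean, and max can be recomputed from the eight rows listed above using only the standard library. This is a minimal sketch: the `rows` list is typed in by hand from the subset shown, and `summarize` is a hypothetical helper name, not part of the original notebook.

```python
from collections import defaultdict
from statistics import mean

# (state, snowfall) pairs copied by hand from the CO/UT/VT subset above.
rows = [
    ("VT", 80.9), ("CO", 59.6), ("UT", 58.2), ("UT", 45.1),
    ("CO", 40.8), ("CO", 32.5), ("CO", 32.5), ("CO", 23.4),
]


def summarize(rows):
    """Group snowfall values by state and return count, mean, and max per state."""
    by_state = defaultdict(list)
    for state, snowfall in rows:
        by_state[state].append(snowfall)
    return {
        state: {
            "count": len(values),
            "mean": round(mean(values), 2),
            "max": max(values),
        }
        for state, values in sorted(by_state.items())
    }


summary = summarize(rows)
for state, stats in summary.items():
    print(state, stats)
```

The per-state counts make the imbalance visible at a glance: five cities for CO, two for UT, and a single city for VT.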


### Analyze
1. There is not enough data to run a 1-way ANOVA. The experiment is not a failure, though; it still yields a few pieces of information.

2. There is a small possibility that VT gets more snow on average than either CO or UT.

Our degree of belief in the conclusion drawn from (2) is very small because of (1).
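
One way to make the "not enough data" point concrete is to check group sizes before attempting any test. The sketch below is an illustration rather than the original analysis: the `groups` dictionary is copied by hand from the subset above, `groups_too_small` and `sample_variances` are hypothetical helpers, and the minimum of two observations per group is an assumption (it is the fewest points from which a sample variance can be computed at all, not a recommended sample size).

```python
from statistics import StatisticsError, variance

# Snowfall values per state, copied by hand from the CO/UT/VT subset above.
groups = {
    "CO": [59.6, 40.8, 32.5, 32.5, 23.4],
    "UT": [58.2, 45.1],
    "VT": [80.9],
}


def groups_too_small(groups, min_n=2):
    """Return the states with fewer than min_n observations (min_n is an assumed threshold)."""
    return sorted(state for state, values in groups.items() if len(values) < min_n)


def sample_variances(groups):
    """Return the sample variance per state, or None where it cannot be computed."""
    result = {}
    for state, values in groups.items():
        try:
            result[state] = variance(values)
        except StatisticsError:
            # statistics.variance needs at least two data points.
            result[state] = None
    return result


too_small = groups_too_small(groups)
variances = sample_variances(groups)
print("Groups below the minimum size:", too_small)
print("Sample variances:", variances)
```

Only VT is flagged here, and its sample variance is undefined, so the within-state spread for Vermont cannot be estimated from this sample.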

The notion of degree of belief is central to scientific thinking. It is somehow part of human nature to believe statements that have little to no supporting evidence. In science, belief in a hypothesis is proportional to the evidence. As more evidence becomes available, ideally from repeated experiments, one’s degree of belief should change. Evidence is derived from the process described above; if we have none, we are stuck at the question stage and a proper scientific hypothesis cannot be made.

The other important side of degree of belief is that it never reaches 100 percent certainty. Some hypotheses have become laws, like Newton’s Law of Gravitation, but most natural phenomena outside of physics cannot be explained as a law.

A hypothesis is the simplest explanation of a phenomenon. A scientific theory is an in-depth explanation of the observed phenomenon. Do not be misled by the word theory: there can be enough evidence that your degree of belief all but touches 100%, which is plenty for decision-making purposes. A built-in safeguard of scientific thought is that our degree of belief does not reach 100%, which leaves room for new evidence that could move the dial in the other direction.
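
One common way to formalize a degree of belief that moves with evidence is Bayes' rule in odds form. The text does not prescribe this approach; the sketch below is only an illustration under assumed numbers: a starting belief of 0.5, a hypothetical likelihood ratio of 3 for each supporting experiment, and 1/3 for a contradicting one. `update_belief` is a made-up helper name.

```python
def update_belief(prior, likelihood_ratio):
    """Return the posterior probability after one experiment (Bayes' rule in odds form)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


belief = 0.5  # assumed starting degree of belief
history = [belief]

# Five supporting experiments followed by one contradicting experiment (assumed outcomes).
for ratio in [3.0] * 5 + [1.0 / 3.0]:
    belief = update_belief(belief, ratio)
    history.append(belief)

for step, value in enumerate(history):
    print(f"after {step} experiments: {value:.4f}")
```

Under these assumptions the belief climbs with each supporting result without reaching 1, and the final contradicting result moves it back down.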

There are additional factors, like external peer review, that help ensure the integrity of the scientific method. In the case of implementing a model for a specific business task, this could mean assigning reviewers for a pull request or simply asking other qualified individuals to check over your work.