# Perform statistical analyses

# Document

<table align="left">
    <tr>
        <th class="text-align:left">Title</th>
        <td class="text-align:left">Perform statistical analyses</td>
    </tr>
    <tr>
        <th class="text-align:left">Last modified</th>
        <td class="text-align:left">2019-01-31</td>
    </tr>
    <tr>
        <th class="text-align:left">Author</th>
        <td class="text-align:left">Gilles Pilon <gillespilon13@gmail.com></td>
    </tr>
    <tr>
        <th class="text-align:left">Status</th>
        <td class="text-align:left">Active</td>
    </tr>
    <tr>
        <th class="text-align:left">Type</th>
        <td class="text-align:left">Jupyter notebook</td>
    </tr>
    <tr>
        <th class="text-align:left">Created</th>
        <td class="text-align:left">2018-12-21</td>
    </tr>
    <tr>
        <th class="text-align:left">File name</th>
        <td class="text-align:left">statistics.ipynb</td>
    </tr>
    <tr>
        <th class="text-align:left">Other files required</th>
        <td class="text-align:left">thirteen_weeks.csv, thirteen_weeks_munged.csv, particle_board.csv</td>
    </tr>
</table>

# Ideas

- Built-in statistics.
- Parametric statistics.
- Non-parametric statistics.
- Simple linear regression.

In [1]:
import pandas as pd

In [2]:
# Read the raw (not yet munged) CSV file, using the Time column as a datetime index.
FILE_RAW = 'thirteen_weeks.csv'
df = pd.read_csv(FILE_RAW,
                 parse_dates=True,
                 index_col='Time')
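Here is a minimal, self-contained sketch of what the cell above does. A few made-up rows stand in for `thirteen_weeks.csv`, and all values are hypothetical. With `index_col='Time'` and `parse_dates=True`, pandas parses the index into a `DatetimeIndex`, which enables selection by timestamp.

```python
import io

import pandas as pd

# Hypothetical stand-in for thirteen_weeks.csv: a 'Time' column plus one measurement.
csv_text = """Time,Water Load (lb/MSF)
2018-10-01 00:00:00,1040.5
2018-10-01 00:30:00,1051.2
2018-10-01 01:00:00,1038.9
"""

# parse_dates=True asks pandas to parse the index (here the 'Time' column) as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='Time')

print(type(df.index).__name__)
# Label-based slicing by timestamp works on a DatetimeIndex.
later = df.loc['2018-10-01 00:30:00':]
print(later)
```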

## Built-in statistics

In [3]:
df.dtypes

Water Load (lb/MSF)            float64
Trim Board Density (lb/cft)    float64
Trim Board Thickness (in)      float64
Trim Board Weight (lb/sft)     float64
Wool Target (%)                float64
Wool Usage (%)                 float64
Wool Flow (lb/min)             float64
Starch Target (%)              float64
Starch Usage (%)               float64
Starch Flow (lb/min)           float64
Clay Target (%)                float64
Clay Usage (%)                   int64
Clay Flow (lb/min)             float64
Newsprint Target (%)           float64
Newsprint Usage (%)            float64
Perlite Target (%)             float64
Perlite Usage (%)              float64
Wet Clay Target (%)            float64
Wet Clay Usage (%)               int64
Wet Gypsum Target (%)          float64
Wet Gypsum Usage (%)             int64
Wet Broke Target (%)           float64
Wet Broke Usage (%)            float64
Dust Target (%)                float64
Dust Usage (%)                 float64
Broke Target (%)         
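The listing above shows that some columns, such as `Clay Usage (%)`, were read as `int64` rather than `float64`. The following sketch uses a hypothetical frame with an added text column `Shift`, which is not in the real file. It shows how to count the columns of each dtype and how to keep only the numeric ones.

```python
import pandas as pd

# Hypothetical frame mixing float, int and string columns.
df = pd.DataFrame({
    'Water Load (lb/MSF)': [1040.5, 1051.2, 1038.9],
    'Clay Usage (%)': [0, 0, 0],
    'Shift': ['A', 'B', 'A'],
})

# How many columns have each dtype.
print(df.dtypes.value_counts())

# Keep only numeric (int and float) columns, e.g. before computing statistics.
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))
```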

In [4]:
# For a single numeric column.
df['Water Load (lb/MSF)'].describe()

count    4275.000000
mean     1088.845725
std       128.663227
min       943.770020
25%      1012.230042
50%      1041.569946
75%      1075.800049
max      1486.560059
Name: Water Load (lb/MSF), dtype: float64
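`describe()` reports the quartiles by default. The `percentiles` argument can request other cut points. The sketch below uses a handful of hypothetical readings. The median (50%) is always included.

```python
import pandas as pd

# Hypothetical water-load readings.
s = pd.Series([943.8, 1012.2, 1041.6, 1075.8, 1486.6], name='Water Load (lb/MSF)')

# Ask for the 10th and 90th percentiles instead of the default 25th and 75th.
summary = s.describe(percentiles=[0.1, 0.9])
print(summary)
```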

In [5]:
# For another single numeric column.
df['Trim Board Density (lb/cft)'].describe()

count    4271.000000
mean       11.694717
std         0.481829
min        10.542293
25%        11.435856
50%        11.598742
75%        11.956731
max        22.305628
Name: Trim Board Density (lb/cft), dtype: float64
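Both columns above are numeric. On a text (object) column, `describe()` returns a different summary: the count, the number of unique values, the most frequent value (`top`) and how often it occurs (`freq`). The sketch below uses a hypothetical `Shift` column, which is not part of `thirteen_weeks.csv`.

```python
import pandas as pd

# Hypothetical text column.
shift = pd.Series(['A', 'B', 'A', 'C', 'A'], name='Shift')

summary = shift.describe()
print(summary)
```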

In [6]:
# For all columns in a DataFrame. By default, only numeric columns are summarized.
df.describe()

|  | Water Load (lb/MSF) | Trim Board Density (lb/cft) | Trim Board Thickness (in) | Trim Board Weight (lb/sft) | Wool Target (%) | Wool Usage (%) | Wool Flow (lb/min) | Starch Target (%) | Starch Usage (%) | Starch Flow (lb/min) | ... | Wet Gypsum Target (%) | Wet Gypsum Usage (%) | Wet Broke Target (%) | Wet Broke Usage (%) | Dust Target (%) | Dust Usage (%) | Broke Target (%) | Broke Usage (%) | Consistency Target (%) | Consistency Actual (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4275.0 | 4271.0 | 4271.0 | 4267.0 | 1126.0 | 4275.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | ... | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | 4275.0 |
| mean | 1088.845725 | 11.694717 | 0.598653 | 57.725521 | 18.211368 | 258.104949 | 23.932503 | 10.0 | 2.93394e+17 | 14.815349 | ... | 0.0 | 0.0 | 0.517762 | 0.792902 | 12.791297 | 12.687469 | 7.989343 | 2.382471 | 2.382471 | 0.0 |
| std | 128.663227 | 0.481829 | 0.009049 | 2.953627 | 0.958561 | 901.637413 | 27.605636 | 0.0 | 1.918312e+19 | 14.321005 | ... | 0.0 | 0.0 | 1.586371 | 3.276198 | 1.591074 | 33.573087 | 0.376973 | 4.061182 | 4.061182 | 0.0 |
| min | 943.77002 | 10.542293 | 0.299 | 0.0 | 16.0 | 0.0 | 4.2e-05 | 10.0 | -42.54367 | -1.235398 | ... | 0.0 | 0.0 | 0.0 | -0.000856 | 8.0 | 7.7e-05 | 6.0 | -5.000071 | -5.000071 | 0.0 |
| 25% | 1012.230042 | 11.435856 | 0.594 | 56.299999 | 18.0 | 4.163668 | 0.077812 | 10.0 | 1.169661 | -0.16805 | ... | 0.0 | 0.0 | 0.0 | 0.001372 | 12.0 | 0.329074 | 8.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1041.569946 | 11.598742 | 0.596 | 57.099998 | 18.0 | 17.559425 | 0.470875 | 10.0 | 9.867329 | 6.758324 | ... | 0.0 | 0.0 | 0.0 | 0.0051 | 14.0 | 9.200419 | 8.0 | 0.003685 | 0.003685 | 0.0 |
| 75% | 1075.800049 | 11.956731 | 0.602 | 58.599998 | 18.0 | 20.963467 | 52.839138 | 10.0 | 10.3441 | 29.8506 | ... | 0.0 | 0.0 | 0.0 | 0.257903 | 14.0 | 13.778525 | 8.0 | 4.528879 | 4.528879 | 0.0 |
| max | 1486.560059 | 22.305628 | 0.63 | 67.900002 | 20.0 | 7352.957031 | 232.154449 | 10.0 | 1.25426e+21 | 45.67025 | ... | 0.0 | 0.0 | 8.0 | 100.278595 | 15.0 | 305.015625 | 12.0 | 60.165688 | 60.165688 | 0.0 |


In [7]:
# Describe all columns regardless of data type. Every column here is numeric,
# so the result matches df.describe().
df.describe(include='all')

|  | Water Load (lb/MSF) | Trim Board Density (lb/cft) | Trim Board Thickness (in) | Trim Board Weight (lb/sft) | Wool Target (%) | Wool Usage (%) | Wool Flow (lb/min) | Starch Target (%) | Starch Usage (%) | Starch Flow (lb/min) | ... | Wet Gypsum Target (%) | Wet Gypsum Usage (%) | Wet Broke Target (%) | Wet Broke Usage (%) | Dust Target (%) | Dust Usage (%) | Broke Target (%) | Broke Usage (%) | Consistency Target (%) | Consistency Actual (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4275.0 | 4271.0 | 4271.0 | 4267.0 | 1126.0 | 4275.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | ... | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | 4275.0 |
| mean | 1088.845725 | 11.694717 | 0.598653 | 57.725521 | 18.211368 | 258.104949 | 23.932503 | 10.0 | 2.93394e+17 | 14.815349 | ... | 0.0 | 0.0 | 0.517762 | 0.792902 | 12.791297 | 12.687469 | 7.989343 | 2.382471 | 2.382471 | 0.0 |
| std | 128.663227 | 0.481829 | 0.009049 | 2.953627 | 0.958561 | 901.637413 | 27.605636 | 0.0 | 1.918312e+19 | 14.321005 | ... | 0.0 | 0.0 | 1.586371 | 3.276198 | 1.591074 | 33.573087 | 0.376973 | 4.061182 | 4.061182 | 0.0 |
| min | 943.77002 | 10.542293 | 0.299 | 0.0 | 16.0 | 0.0 | 4.2e-05 | 10.0 | -42.54367 | -1.235398 | ... | 0.0 | 0.0 | 0.0 | -0.000856 | 8.0 | 7.7e-05 | 6.0 | -5.000071 | -5.000071 | 0.0 |
| 25% | 1012.230042 | 11.435856 | 0.594 | 56.299999 | 18.0 | 4.163668 | 0.077812 | 10.0 | 1.169661 | -0.16805 | ... | 0.0 | 0.0 | 0.0 | 0.001372 | 12.0 | 0.329074 | 8.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1041.569946 | 11.598742 | 0.596 | 57.099998 | 18.0 | 17.559425 | 0.470875 | 10.0 | 9.867329 | 6.758324 | ... | 0.0 | 0.0 | 0.0 | 0.0051 | 14.0 | 9.200419 | 8.0 | 0.003685 | 0.003685 | 0.0 |
| 75% | 1075.800049 | 11.956731 | 0.602 | 58.599998 | 18.0 | 20.963467 | 52.839138 | 10.0 | 10.3441 | 29.8506 | ... | 0.0 | 0.0 | 0.0 | 0.257903 | 14.0 | 13.778525 | 8.0 | 4.528879 | 4.528879 | 0.0 |
| max | 1486.560059 | 22.305628 | 0.63 | 67.900002 | 20.0 | 7352.957031 | 232.154449 | 10.0 | 1.25426e+21 | 45.67025 | ... | 0.0 | 0.0 | 8.0 | 100.278595 | 15.0 | 305.015625 | 12.0 | 60.165688 | 60.165688 | 0.0 |
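Because this file has no text columns, `include='all'` returns the same table as the default. The sketch below uses a hypothetical mixed frame to show the difference. Text columns gain `unique`, `top` and `freq` rows, and statistics that do not apply to a column are shown as `NaN`.

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column.
df = pd.DataFrame({
    'Water Load (lb/MSF)': [1040.5, 1051.2, 1038.9, 1045.0],
    'Shift': ['A', 'B', 'A', 'A'],
})

default = df.describe()                  # numeric columns only
everything = df.describe(include='all')  # numeric and text columns

print(default.columns.tolist())
print(everything)
```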


In [8]:
# Exclude object (string) columns. Name the dtype as 'object';
# np.object no longer exists in NumPy 1.24 and later.
df.describe(exclude=['object'])

|  | Water Load (lb/MSF) | Trim Board Density (lb/cft) | Trim Board Thickness (in) | Trim Board Weight (lb/sft) | Wool Target (%) | Wool Usage (%) | Wool Flow (lb/min) | Starch Target (%) | Starch Usage (%) | Starch Flow (lb/min) | ... | Wet Gypsum Target (%) | Wet Gypsum Usage (%) | Wet Broke Target (%) | Wet Broke Usage (%) | Dust Target (%) | Dust Usage (%) | Broke Target (%) | Broke Usage (%) | Consistency Target (%) | Consistency Actual (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4275.0 | 4271.0 | 4271.0 | 4267.0 | 1126.0 | 4275.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | ... | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 1126.0 | 4275.0 | 4275.0 | 4275.0 |
| mean | 1088.845725 | 11.694717 | 0.598653 | 57.725521 | 18.211368 | 258.104949 | 23.932503 | 10.0 | 2.93394e+17 | 14.815349 | ... | 0.0 | 0.0 | 0.517762 | 0.792902 | 12.791297 | 12.687469 | 7.989343 | 2.382471 | 2.382471 | 0.0 |
| std | 128.663227 | 0.481829 | 0.009049 | 2.953627 | 0.958561 | 901.637413 | 27.605636 | 0.0 | 1.918312e+19 | 14.321005 | ... | 0.0 | 0.0 | 1.586371 | 3.276198 | 1.591074 | 33.573087 | 0.376973 | 4.061182 | 4.061182 | 0.0 |
| min | 943.77002 | 10.542293 | 0.299 | 0.0 | 16.0 | 0.0 | 4.2e-05 | 10.0 | -42.54367 | -1.235398 | ... | 0.0 | 0.0 | 0.0 | -0.000856 | 8.0 | 7.7e-05 | 6.0 | -5.000071 | -5.000071 | 0.0 |
| 25% | 1012.230042 | 11.435856 | 0.594 | 56.299999 | 18.0 | 4.163668 | 0.077812 | 10.0 | 1.169661 | -0.16805 | ... | 0.0 | 0.0 | 0.0 | 0.001372 | 12.0 | 0.329074 | 8.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1041.569946 | 11.598742 | 0.596 | 57.099998 | 18.0 | 17.559425 | 0.470875 | 10.0 | 9.867329 | 6.758324 | ... | 0.0 | 0.0 | 0.0 | 0.0051 | 14.0 | 9.200419 | 8.0 | 0.003685 | 0.003685 | 0.0 |
| 75% | 1075.800049 | 11.956731 | 0.602 | 58.599998 | 18.0 | 20.963467 | 52.839138 | 10.0 | 10.3441 | 29.8506 | ... | 0.0 | 0.0 | 0.0 | 0.257903 | 14.0 | 13.778525 | 8.0 | 4.528879 | 4.528879 | 0.0 |
| max | 1486.560059 | 22.305628 | 0.63 | 67.900002 | 20.0 | 7352.957031 | 232.154449 | 10.0 | 1.25426e+21 | 45.67025 | ... | 0.0 | 0.0 | 8.0 | 100.278595 | 15.0 | 305.015625 | 12.0 | 60.165688 | 60.165688 | 0.0 |
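No column in this file has dtype `object`, so the exclusion above changes nothing. The sketch below adds a hypothetical text column, `Shift`, so the exclusion takes effect. The result then matches the default numeric-only `describe()`.

```python
import pandas as pd

# Hypothetical frame; 'Shift' is forced to object dtype explicitly.
df = pd.DataFrame({
    'Water Load (lb/MSF)': [1040.5, 1051.2, 1038.9],
    'Shift': pd.Series(['A', 'B', 'A'], dtype=object),
})

# The string 'object' (or the built-in type object) names the dtype to drop.
without_text = df.describe(exclude=['object'])
print(without_text)
```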


In [9]:
# Read a munged csv file.
FILE_MUNGED = 'thirteen_weeks_munged.csv'
df = pd.read_csv(FILE_MUNGED,
                 parse_dates=True,
                 index_col='Time')

In [10]:
# Show up to 500 columns so that none are elided with '...'.
# This summarizes the munged file, so the statistics differ from the raw file's.
pd.set_option('display.max_columns', 500)
df.describe()

|  | Water Load (lb/MSF) | Trim Board Density (lb/cft) | Trim Board Thickness (in) | Trim Board Weight (lb/sft) | Wool Target (%) | Wool Usage (%) | Wool Flow (lb/min) | Starch Target (%) | Starch Usage (%) | Starch Flow (lb/min) | Clay Target (%) | Clay Usage (%) | Clay Flow (lb/min) | Newsprint Target (%) | Newsprint Usage (%) | Perlite Target (%) | Perlite Usage (%) | Wet Clay Target (%) | Wet Clay Usage (%) | Wet Gypsum Target (%) | Wet Gypsum Usage (%) | Wet Broke Target (%) | Wet Broke Usage (%) | Dust Target (%) | Dust Usage (%) | Broke Target (%) | Broke Usage (%) | Consistency Target (%) | Consistency Actual (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 15841.0 | 15733.0 | 15733.0 | 15607.0 | 12168.0 | 15841.0 | 15841.0 | 12168.0 | 15841.0 | 15841.0 | 12168.0 | 15841.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 12168.0 | 15841.0 | 15841.0 | 15841.0 |
| mean | 1059.56092 | 11.644561 | 0.59572 | 57.206901 | 18.842538 | 1356.251861 | 30.105139 | 10.0 | 1.859247 | 17.007565 | 0.0 | 0.0 | -0.979008 | 14.0 | 10.292116 | 57.157462 | 179.576793 | 0.0 | 0.0 | 0.0 | 0.0 | 0.825583 | 1.001813 | 11.743754 | 5.963129 | 7.924556 | 1.894697 | 1.894697 | 0.0 |
| std | 76.849443 | 0.326772 | 0.006344 | 1.700297 | 1.205529 | 2177.037398 | 27.455114 | 0.0 | 27.583559 | 15.952152 | 0.0 | 0.0 | 0.313479 | 0.0 | 15.378247 | 1.205529 | 986.924703 | 0.0 | 0.0 | 0.0 | 0.0 | 2.116682 | 4.819223 | 1.319167 | 5.396391 | 0.54867 | 3.621378 | 3.621378 | 0.0 |
| min | 948.660095 | 10.720471 | 0.567 | 53.400002 | 16.0 | 0.0 | 3e-06 | 10.0 | -42.655109 | -0.858497 | 0.0 | 0.0 | -1.234821 | 14.0 | 2.1e-05 | 56.0 | -377.465424 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 1.2e-05 | 6.0 | -0.017936 | -0.017936 | 0.0 |
| 25% | 1031.789917 | 11.444907 | 0.594 | 56.200001 | 18.0 | 16.882212 | 0.091215 | 10.0 | -13.481616 | -0.543541 | 0.0 | 0.0 | -1.223897 | 14.0 | 0.125339 | 56.0 | 57.99353 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001902 | 12.0 | 0.341094 | 8.0 | 0.000705 | 0.000705 | 0.0 |
| 50% | 1056.23999 | 11.777571 | 0.596 | 57.799999 | 18.0 | 18.535389 | 42.925354 | 10.0 | 9.7271 | 24.46883 | 0.0 | 0.0 | -1.094978 | 14.0 | 13.64046 | 58.0 | 59.224125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.077472 | 12.0 | 8.005055 | 8.0 | 0.002334 | 0.002334 | 0.0 |
| 75% | 1056.23999 | 11.777571 | 0.597 | 57.900002 | 20.0 | 2287.714111 | 55.27269 | 10.0 | 9.96209 | 31.839529 | 0.0 | 0.0 | -0.80126 | 14.0 | 13.973696 | 58.0 | 135.767273 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.293917 | 12.0 | 10.532058 | 8.0 | 1.792655 | 1.792655 | 0.0 |
| max | 1452.330078 | 13.197065 | 0.617 | 67.300003 | 20.0 | 7372.216797 | 86.67675 | 10.0 | 165.930817 | 47.899357 | 0.0 | 0.0 | 1.220639 | 14.0 | 104.402199 | 60.0 | 19776.449219 | 0.0 | 0.0 | 0.0 | 0.0 | 22.5 | 100.358955 | 14.0 | 45.160435 | 12.0 | 94.812431 | 94.812431 | 0.0 |
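`pd.set_option` changes the display setting for the rest of the session. If the wider display is needed for only one output, `pd.option_context` is an alternative. It restores the previous value when the `with` block ends.

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# Widen the display only inside the with block.
with pd.option_context('display.max_columns', 500):
    inside = pd.get_option('display.max_columns')

after = pd.get_option('display.max_columns')
print(before, inside, after)
```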


## Parametric statistics

In [11]:
# Average of each column.
df.mean()

Water Load (lb/MSF)            1059.560920
Trim Board Density (lb/cft)      11.644561
Trim Board Thickness (in)         0.595720
Trim Board Weight (lb/sft)       57.206901
Wool Target (%)                  18.842538
Wool Usage (%)                 1356.251861
Wool Flow (lb/min)               30.105139
Starch Target (%)                10.000000
Starch Usage (%)                  1.859247
Starch Flow (lb/min)             17.007565
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)               -0.979008
Newsprint Target (%)             14.000000
Newsprint Usage (%)              10.292116
Perlite Target (%)               57.157462
Perlite Usage (%)               179.576793
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              0.825583
Wet Broke Usage (%)               1.001813
Dust Target
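`df.mean()` works here because the summarized columns are numeric. On a frame that also holds text columns, pandas 2.0 and later raise a `TypeError` unless you pass `numeric_only=True`. Missing values are skipped by default (`skipna=True`). The sketch below uses hypothetical data.

```python
import pandas as pd

# Hypothetical frame: one numeric column with a missing value, one text column.
df = pd.DataFrame({
    'Water Load (lb/MSF)': [1040.0, 1050.0, None],
    'Shift': ['A', 'B', 'A'],
})

# numeric_only=True drops 'Shift'; the NaN in the numeric column is skipped.
means = df.mean(numeric_only=True)
print(means)
```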

In [12]:
# Standard deviation of each column.
df.std()

Water Load (lb/MSF)              76.849443
Trim Board Density (lb/cft)       0.326772
Trim Board Thickness (in)         0.006344
Trim Board Weight (lb/sft)        1.700297
Wool Target (%)                   1.205529
Wool Usage (%)                 2177.037398
Wool Flow (lb/min)               27.455114
Starch Target (%)                 0.000000
Starch Usage (%)                 27.583559
Starch Flow (lb/min)             15.952152
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)                0.313479
Newsprint Target (%)              0.000000
Newsprint Usage (%)              15.378247
Perlite Target (%)                1.205529
Perlite Usage (%)               986.924703
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              2.116682
Wet Broke Usage (%)               4.819223
Dust Target
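pandas' `std()` computes the sample standard deviation, dividing by n − 1 (`ddof=1`). NumPy's `np.std` divides by n by default (`ddof=0`). The values are close for large columns like these but differ noticeably for small samples, as this sketch on made-up numbers shows.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_sd = values.std()              # pandas default: ddof=1 (divide by n - 1)
population_sd = values.std(ddof=0)    # divide by n
numpy_sd = np.std(values.to_numpy())  # NumPy default: ddof=0

print(sample_sd, population_sd, numpy_sd)
```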

In [13]:
# Range (maximum minus minimum) of each column.
df.max() - df.min()

Water Load (lb/MSF)              503.669983
Trim Board Density (lb/cft)        2.476594
Trim Board Thickness (in)          0.050000
Trim Board Weight (lb/sft)        13.900002
Wool Target (%)                    4.000000
Wool Usage (%)                  7372.216797
Wool Flow (lb/min)                86.676747
Starch Target (%)                  0.000000
Starch Usage (%)                 208.585926
Starch Flow (lb/min)              48.757854
Clay Target (%)                    0.000000
Clay Usage (%)                     0.000000
Clay Flow (lb/min)                 2.455460
Newsprint Target (%)               0.000000
Newsprint Usage (%)              104.402178
Perlite Target (%)                 4.000000
Perlite Usage (%)              20153.914642
Wet Clay Target (%)                0.000000
Wet Clay Usage (%)                 0.000000
Wet Gypsum Target (%)              0.000000
Wet Gypsum Usage (%)               0.000000
Wet Broke Target (%)              22.500000
Wet Broke Usage (%)             
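The range depends only on the two most extreme readings, so a single outlier can dominate it; see the 20153.9 range of `Perlite Usage (%)` above. The sketch below uses hypothetical data. It puts min, max and range in one table. It also shows that pandas skips missing values, while NumPy's `np.ptp` on the raw array does not.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column 'b' has a missing value.
df = pd.DataFrame({'a': [1.0, 4.0, 2.5], 'b': [10.0, np.nan, 7.0]})

# min, max and range side by side, one row per original column.
# min() and max() skip NaN by default, so 'b' still gets a range.
summary = df.agg(['min', 'max']).T
summary['range'] = summary['max'] - summary['min']
print(summary)

# NumPy's peak-to-peak on the raw array propagates NaN instead.
ptp_b = np.ptp(df['b'].to_numpy())
print(ptp_b)
```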

## Non-parametric statistics

In [14]:
# Median of each column.
df.median()

Water Load (lb/MSF)            1056.239990
Trim Board Density (lb/cft)      11.777571
Trim Board Thickness (in)         0.596000
Trim Board Weight (lb/sft)       57.799999
Wool Target (%)                  18.000000
Wool Usage (%)                   18.535389
Wool Flow (lb/min)               42.925354
Starch Target (%)                10.000000
Starch Usage (%)                  9.727100
Starch Flow (lb/min)             24.468830
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)               -1.094978
Newsprint Target (%)             14.000000
Newsprint Usage (%)              13.640460
Perlite Target (%)               58.000000
Perlite Usage (%)                59.224125
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              0.000000
Wet Broke Usage (%)               0.077472
Dust Target
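The median resists extreme values. In the output above, `Wool Usage (%)` has a mean of about 1356 but a median of about 18.5. The toy example below uses made-up numbers to show the same effect.

```python
import pandas as pd

# Hypothetical readings with one extreme value.
s = pd.Series([10.0, 11.0, 12.0, 13.0, 500.0])

mean_value = s.mean()      # pulled far upward by the single outlier
median_value = s.median()  # middle value, unaffected by how extreme 500 is
print(mean_value, median_value)
```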

In [15]:
df.quantile(0.5)  # the 0.5 quantile is the median

Water Load (lb/MSF)            1056.239990
Trim Board Density (lb/cft)      11.777571
Trim Board Thickness (in)         0.596000
Trim Board Weight (lb/sft)       57.799999
Wool Target (%)                  18.000000
Wool Usage (%)                   18.535389
Wool Flow (lb/min)               42.925354
Starch Target (%)                10.000000
Starch Usage (%)                  9.727100
Starch Flow (lb/min)             24.468830
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)               -1.094978
Newsprint Target (%)             14.000000
Newsprint Usage (%)              13.640460
Perlite Target (%)               58.000000
Perlite Usage (%)                59.224125
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              0.000000
Wet Broke Usage (%)               0.077472
Dust Target
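With the default linear interpolation, the 0.5 quantile equals the median. `quantile` also accepts a list and returns one row per requested quantile. The sketch below uses a small hypothetical frame.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 3.0, 2.0, 8.0], 'b': [5.0, 5.0, 6.0, 7.0]})

# The 0.5 quantile matches median() column by column.
same = (df.quantile(0.5) == df.median()).all()

# Several quantiles at once: rows are the quantiles, columns the variables.
quartiles = df.quantile([0.25, 0.5, 0.75])
print(same)
print(quartiles)
```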

In [16]:
# Find the interquartile range.
df.quantile(0.75) - df.quantile(0.25)

Water Load (lb/MSF)              24.450073
Trim Board Density (lb/cft)       0.332664
Trim Board Thickness (in)         0.003000
Trim Board Weight (lb/sft)        1.700001
Wool Target (%)                   2.000000
Wool Usage (%)                 2270.831900
Wool Flow (lb/min)               55.181475
Starch Target (%)                 0.000000
Starch Usage (%)                 23.443706
Starch Flow (lb/min)             32.383070
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)                0.422637
Newsprint Target (%)              0.000000
Newsprint Usage (%)              13.848357
Perlite Target (%)                2.000000
Perlite Usage (%)                77.773743
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              0.000000
Wet Broke Usage (%)               0.292014
Dust Target
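The cell above calls `quantile` twice. A single call with a list returns both quartiles, and subtracting the two rows gives the IQR. The sketch below uses hypothetical data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0], 'b': [10.0, 20.0, 30.0, 40.0, 50.0]})

# One call returns both quartiles; the rows are labelled 0.25 and 0.75.
q = df.quantile([0.25, 0.75])
iqr = q.loc[0.75] - q.loc[0.25]
print(iqr)
```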

In [17]:
# Compare the interquartile range to the standard deviation.
print((df.quantile(.75) - df.quantile(.25)), df.std())

Water Load (lb/MSF)              24.450073
Trim Board Density (lb/cft)       0.332664
Trim Board Thickness (in)         0.003000
Trim Board Weight (lb/sft)        1.700001
Wool Target (%)                   2.000000
Wool Usage (%)                 2270.831900
Wool Flow (lb/min)               55.181475
Starch Target (%)                 0.000000
Starch Usage (%)                 23.443706
Starch Flow (lb/min)             32.383070
Clay Target (%)                   0.000000
Clay Usage (%)                    0.000000
Clay Flow (lb/min)                0.422637
Newsprint Target (%)              0.000000
Newsprint Usage (%)              13.848357
Perlite Target (%)                2.000000
Perlite Usage (%)                77.773743
Wet Clay Target (%)               0.000000
Wet Clay Usage (%)                0.000000
Wet Gypsum Target (%)             0.000000
Wet Gypsum Usage (%)              0.000000
Wet Broke Target (%)              0.000000
Wet Broke Usage (%)               0.292014
Dust Target
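Printing two Series one after the other makes a column-by-column comparison awkward. The sketch below, on hypothetical data, puts the IQR and standard deviation side by side and adds their ratio. For normally distributed data the IQR is about 1.349 standard deviations. A much smaller ratio suggests heavy tails or outliers, as with column `b` here.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0], 'b': [1.0, 2.0, 3.0, 4.0, 100.0]})

comparison = pd.DataFrame({
    'iqr': df.quantile(0.75) - df.quantile(0.25),
    'std': df.std(),
})
# About 1.349 for normal data; an outlier inflates std but barely moves the IQR.
comparison['iqr/std'] = comparison['iqr'] / comparison['std']
print(comparison)
```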

In [18]:
import datasense as ds
ds.parametric_summary(df['Water Load (lb/MSF)'])

n      15841.000000
min      948.660095
max     1452.330078
ave     1059.560920
s         76.849443
var     5905.836817
dtype: float64
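`datasense` is the author's own package, and its implementation is not shown here. The printed values match the count, mean and standard deviation from `describe()`: `n` = 15841, `ave` = 1059.560920 and `s` = 76.849443. Judging only by those labels and values, a pandas-only approximation might look like the hypothetical helper below. It is a sketch, not the `datasense` code.

```python
import pandas as pd


def parametric_summary_sketch(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for ds.parametric_summary, built from pandas methods only."""
    return pd.Series({
        'n': series.count(),  # number of non-missing values
        'min': series.min(),
        'max': series.max(),
        'ave': series.mean(),
        's': series.std(),    # sample standard deviation (ddof=1)
        'var': series.var(),  # sample variance (ddof=1)
    })


print(parametric_summary_sketch(pd.Series([1.0, 2.0, 3.0, 4.0, None])))
```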

In [19]:
for column_name in df.columns:
    print(column_name, '\n', ds.parametric_summary(df[column_name]))

Water Load (lb/MSF) 
 n      15841.000000
min      948.660095
max     1452.330078
ave     1059.560920
s         76.849443
var     5905.836817
dtype: float64
Trim Board Density (lb/cft) 
 n      15733.000000
min       10.720471
max       13.197065
ave       11.644561
s          0.326772
var        0.106780
dtype: float64
Trim Board Thickness (in) 
 n      15733.000000
min        0.567000
max        0.617000
ave        0.595720
s          0.006344
var        0.000040
dtype: float64
Trim Board Weight (lb/sft) 
 n      15607.000000
min       53.400002
max       67.300003
ave       57.206901
s          1.700297
var        2.891012
dtype: float64
Wool Target (%) 
 n      12168.000000
min       16.000000
max       20.000000
ave       18.842538
s          1.205529
var        1.453300
dtype: float64
Wool Usage (%) 
 n      1.584100e+04
min    0.000000e+00
max    7.372217e+03
ave    1.356252e+03
s      2.177037e+03
var    4.739492e+06
dtype: float64
Wool Flow (lb/min) 
 n      15841.000000
min  
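Printing one summary per column produces a long listing. If plain pandas statistics can stand in for `ds.parametric_summary`, an alternative is to build one table with `DataFrame.agg`: one row per column, one column per statistic. The sketch below uses hypothetical data.

```python
import pandas as pd

# Hypothetical data; the second column has a missing value.
df = pd.DataFrame({
    'Water Load (lb/MSF)': [1040.0, 1050.0, 1060.0],
    'Trim Board Density (lb/cft)': [11.5, 11.7, None],
})

# Transpose so each original column becomes a row of statistics.
summary = df.agg(['count', 'min', 'max', 'mean', 'std', 'var']).T
print(summary)
```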

In [20]:
for column_name in df.columns:
    print(column_name, '\n', ds.nonparametric_summary(df[column_name]))

Water Load (lb/MSF) 
                    n         min           q1          q2          q3  \
interpolation                                                           
linear         15841  948.660095  1031.789917  1056.23999  1056.23999   
lower          15841  948.660095  1031.789917  1056.23999  1056.23999   
higher         15841  948.660095  1031.789917  1056.23999  1056.23999   
nearest        15841  948.660095  1031.789917  1056.23999  1056.23999   
midpoint       15841  948.660095  1031.789917  1056.23999  1056.23999   

                     iqr          max  
interpolation                          
linear         24.450073  1452.330078  
lower          24.450073  1452.330078  
higher         24.450073  1452.330078  
nearest        24.450073  1452.330078  
midpoint       24.450073  1452.330078  
Trim Board Density (lb/cft) 
                    n        min         q1         q2         q3       iqr  \
interpolation                                                                


Newsprint Usage (%) 
                    n       min        q1        q2         q3        iqr  \
interpolation                                                              
linear         15841  0.000021  0.125339  13.64046  13.973696  13.848357   
lower          15841  0.000021  0.125339  13.64046  13.973696  13.848357   
higher         15841  0.000021  0.125339  13.64046  13.973696  13.848357   
nearest        15841  0.000021  0.125339  13.64046  13.973696  13.848357   
midpoint       15841  0.000021  0.125339  13.64046  13.973696  13.848357   

                      max  
interpolation              
linear         104.402199  
lower          104.402199  
higher         104.402199  
nearest        104.402199  
midpoint       104.402199  
Perlite Target (%) 
                    n   min    q1    q2    q3  iqr   max
interpolation                                          
linear         12168  56.0  56.0  58.0  58.0  2.0  60.0
lower          12168  56.0  56.0  58.0  58.0  2.0  60.0
high
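The nonparametric summary above reports quartiles under five interpolation methods. These are the same names pandas' `Series.quantile` accepts for its `interpolation` parameter. For `Water Load (lb/MSF)`, all five agree. Assuming the summary uses pandas' `quantile`, one reason is the sample size. With n = 15841 non-missing values, the quartile positions q·(n − 1) are 3960, 7920 and 11880, all whole numbers, so every method lands on an actual observation. With a small sample the methods differ, as this sketch shows.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# First quartile under each interpolation method; position 0.25 * (4 - 1) = 0.75.
methods = ['linear', 'lower', 'higher', 'nearest', 'midpoint']
q1 = {m: s.quantile(0.25, interpolation=m) for m in methods}
print(q1)
```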

## Simple linear regression

In [21]:
# Regress particle board stiffness (y) on density (x).
import statsmodels.api as sm
df2 = pd.read_csv('particle_board.csv')

In [22]:
x = df2['Density']
y = df2['Stiffness']
x = sm.add_constant(x)
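`sm.add_constant` prepends a column of ones, named `const`, so that OLS estimates an intercept. That is why `const` appears in the coefficient table. The NumPy-only sketch below uses made-up data lying exactly on y = 1 + 2x to show what that column does. Without it, the fitted line is forced through the origin and the slope changes.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a leading column of ones, analogous to sm.add_constant(x).
X = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)

# Without the constant column, least squares fits a line through the origin.
(slope_no_const,), *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
print(intercept, slope, slope_no_const)
```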

In [23]:
model = sm.OLS(y, x)
results = model.fit()
predictions = results.predict(x)
results.summary()

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | Stiffness | R-squared: | 0.845 |
| Model: | OLS | Adj. R-squared: | 0.839 |
| Method: | Least Squares | F-statistic: | 146.9 |
| Date: | Thu, 31 Jan 2019 | Prob (F-statistic): | 1.98e-12 |
| Time: | 15:29:22 | Log-Likelihood: | -103.41 |
| No. Observations: | 29 | AIC: | 210.8 |
| Df Residuals: | 27 | BIC: | 213.6 |
| Df Model: | 1 |  |  |
| Covariance Type: | nonrobust |  |  |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -21.5338 | 4.735 | -4.547 | 0.000 | -31.250 | -11.817 |
| Density | 3.5405 | 0.292 | 12.119 | 0.000 | 2.941 | 4.140 |

|  |  |  |  |
|---|---|---|---|
| Omnibus: | 8.16 | Durbin-Watson: | 1.748 |
| Prob(Omnibus): | 0.017 | Jarque-Bera (JB): | 6.591 |
| Skew: | 0.891 | Prob(JB): | 0.037 |
| Kurtosis: | 4.509 | Cond. No. | 46.8 |
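In the summary above, `R-squared` (0.845) is the share of the variance in stiffness explained by density. The `Density` coefficient (3.5405) is the estimated change in stiffness per unit increase in density. The NumPy sketch below uses hypothetical pairs, not the `particle_board.csv` data, and computes the same quantities from their definitions. For simple linear regression with an intercept, R² also equals the squared Pearson correlation.

```python
import numpy as np

# Hypothetical (density, stiffness) pairs.
x = np.array([9.5, 10.2, 11.0, 12.4, 13.1, 14.8])
y = np.array([12.1, 14.9, 17.0, 22.3, 24.0, 30.2])

slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest power first
fitted = intercept + slope * x

ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(slope, intercept, r_squared)
```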


# References

- [pandas describe](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html)

- [pandas basic functionality](https://pandas.pydata.org/pandas-docs/stable/basics.html)