# Statistics

## Measures of Central Tendency

Measures of central tendency include the mean, mode, and median. To compute them, we combine the gota dataframe library, which reads the data from a CSV file, with gonum, which calculates the mean, mode, and median.

In [20]:
import (
    "fmt"
    "math"
    "os"
    "sort"

    "github.com/go-gota/gota/dataframe"
    "gonum.org/v1/gonum/floats"
    "gonum.org/v1/gonum/stat"
)

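// getDataframe opens the CSV file at the given path and parses it into a DataFrame.
func getDataframe(file string) (dataframe.DataFrame, error) {
    var df dataframe.DataFrame
    f, err := os.Open(file)
    if err != nil {
        return df, err
    }
    defer f.Close()

    // ReadCSV reports parse failures through the Err field, so surface it to the caller.
    df = dataframe.ReadCSV(f)
    return df, df.Err
}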

In [25]:
%%
df, err := getDataframe("../data/iris.csv")
if err != nil {
    fmt.Println(err)
    return
}

sepalLength := df.Col("Sepal Length").Float() 
// stat.Quantile expects sorted input, so sort the data before computing the median
sort.Float64s(sepalLength)
fmt.Println(sepalLength)

mean := stat.Mean(sepalLength, nil) 
mode, modeCount := stat.Mode(sepalLength, nil) 
median := stat.Quantile(0.5, stat.Empirical, sepalLength, nil)

fmt.Printf("\nSepal Length Summary Statistics:\n") 
fmt.Printf("Mean value: %0.2f\n", mean) 
fmt.Printf("Mode value: %0.2f\n", mode)
fmt.Printf("Mode count: %d\n", int(modeCount)) 
fmt.Printf("Median value: %0.2f\n\n", median)

[4.3 4.4 4.4 4.4 4.5 4.6 4.6 4.6 4.6 4.7 4.7 4.8 4.8 4.8 4.8 4.8 4.9 4.9 4.9 4.9 4.9 4.9 5 5 5 5 5 5 5 5 5 5 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.2 5.2 5.2 5.2 5.3 5.4 5.4 5.4 5.4 5.4 5.4 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.6 5.6 5.6 5.6 5.6 5.6 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.7 5.8 5.8 5.8 5.8 5.8 5.8 5.8 5.9 5.9 5.9 6 6 6 6 6 6 6.1 6.1 6.1 6.1 6.1 6.1 6.2 6.2 6.2 6.2 6.3 6.3 6.3 6.3 6.3 6.3 6.3 6.3 6.3 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.5 6.5 6.5 6.5 6.5 6.6 6.6 6.7 6.7 6.7 6.7 6.7 6.7 6.7 6.7 6.8 6.8 6.8 6.9 6.9 6.9 6.9 7 7.1 7.2 7.2 7.2 7.3 7.4 7.6 7.7 7.7 7.7 7.7 7.9]

Sepal Length Summary Statistics:
Mean value: 5.84
Mode value: 5.00
Mode count: 10
Median value: 5.80



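The mean, median, and mode of "Sepal Length" are fairly close to one another (5.84, 5.80, and 5.00). This suggests that Sepal Length is distributed roughly symmetrically, which is consistent with an approximately normal distribution, although close central values alone do not prove normality.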
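
Comparing the mean, median, and mode by eye is only a rough check of symmetry. One way to put a number on it is the moment-based sample skewness, which is close to zero for a symmetric sample, positive for a long right tail, and negative for a long left tail. The sketch below uses only the standard library; the `skewness` helper and the two toy samples are illustrative assumptions, not part of the notebook.

```go
package main

import (
	"fmt"
	"math"
)

// skewness returns the moment-based sample skewness g1 = m3 / m2^(3/2),
// where m2 and m3 are the second and third central moments.
// It returns NaN for empty or constant input.
// (Hypothetical helper, written for illustration only.)
func skewness(xs []float64) float64 {
	n := float64(len(xs))
	if n == 0 {
		return math.NaN()
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / n

	var m2, m3 float64
	for _, x := range xs {
		d := x - mean
		m2 += d * d
		m3 += d * d * d
	}
	m2 /= n
	m3 /= n
	if m2 == 0 {
		return math.NaN()
	}
	return m3 / math.Pow(m2, 1.5)
}

func main() {
	symmetric := []float64{1, 2, 3, 4, 5}    // toy data, mirrored around 3
	rightTailed := []float64{1, 1, 1, 2, 10} // toy data with one large value
	fmt.Printf("symmetric: %0.2f\n", skewness(symmetric))
	fmt.Printf("right-tailed: %0.2f\n", skewness(rightTailed))
}
```

In the notebook itself, gonum's `stat.Skew` should serve the same purpose on the `sepalLength` slice, possibly with a different bias correction than this sketch.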

In [26]:
%%
df, err := getDataframe("../data/iris.csv")
if err != nil {
    fmt.Println(err)
    return
}

petalLength := df.Col("Petal Length").Float() 
// stat.Quantile expects sorted input, so sort the data before computing the median
sort.Float64s(petalLength)
fmt.Println(petalLength)

mean := stat.Mean(petalLength, nil) 
mode, modeCount := stat.Mode(petalLength, nil) 
median := stat.Quantile(0.5, stat.Empirical, petalLength, nil)

fmt.Printf("\nPetal Length Summary Statistics:\n") 
fmt.Printf("Mean value: %0.2f\n", mean) 
fmt.Printf("Mode value: %0.2f\n", mode)
fmt.Printf("Mode count: %d\n", int(modeCount)) 
fmt.Printf("Median value: %0.2f\n\n", median)

[1 1.1 1.2 1.2 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.7 1.7 1.7 1.7 1.9 1.9 3 3.3 3.3 3.5 3.5 3.6 3.7 3.8 3.9 3.9 3.9 4 4 4 4 4 4.1 4.1 4.1 4.2 4.2 4.2 4.2 4.3 4.3 4.4 4.4 4.4 4.4 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.6 4.6 4.6 4.7 4.7 4.7 4.7 4.7 4.8 4.8 4.8 4.8 4.9 4.9 4.9 4.9 4.9 5 5 5 5 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.1 5.2 5.2 5.3 5.3 5.4 5.4 5.5 5.5 5.5 5.6 5.6 5.6 5.6 5.6 5.6 5.7 5.7 5.7 5.8 5.8 5.8 5.9 5.9 6 6 6.1 6.1 6.1 6.3 6.4 6.6 6.7 6.7 6.9]

Petal Length Summary Statistics:
Mean value: 3.76
Mode value: 1.50
Mode count: 14
Median value: 4.30



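The mean, median, and mode of "Petal Length" are far apart (3.76, 4.30, and 1.50). This indicates that Petal Length is not symmetrically distributed but skewed (skewed distribution). The sorted values also show a gap between 1.9 and 3, which hints at two separate clusters rather than a single peak with a long tail.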
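
One detail in the Petal Length output is worth a closer look. The sample has 150 values, and in the sorted list the 75th and 76th values are 4.3 and 4.4. The reported median of 4.30 matches the lower of the two, which is what an empirical quantile returns, whereas the conventional median of an even-sized sample averages the two middle values and would give 4.35. The sketch below shows that conventional definition using only the standard library; the `median` helper is a hypothetical name introduced here, not a gonum function.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the conventional median: the middle value for an odd-sized
// sample, or the mean of the two middle values for an even-sized one.
// It sorts a copy, so the caller's slice is left untouched.
// ok is false when the input is empty.
// (Hypothetical helper, written for illustration only.)
func median(xs []float64) (m float64, ok bool) {
	if len(xs) == 0 {
		return 0, false
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)

	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid], true
	}
	return (sorted[mid-1] + sorted[mid]) / 2, true
}

func main() {
	// The two middle values (75th and 76th) from the sorted Petal Length output.
	m, _ := median([]float64{4.4, 4.3})
	fmt.Printf("Median value: %0.2f\n", m)
}
```

For a sample this large the difference is small, but it can matter when medians are compared against results from other tools that average the middle pair.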

## Measures of Dispersion

Measures of dispersion include the maximum, minimum, range, variance, standard deviation, and quantiles/quartiles.
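
To make these definitions concrete, here is a minimal standard-library sketch of the range, the sample variance, and the standard deviation. It assumes the unbiased form of the variance (dividing by n-1). The helper names `sampleVariance` and `valueRange` and the toy data are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// sampleVariance returns the unbiased sample variance:
// the sum of squared deviations from the mean, divided by n-1.
// It returns NaN when fewer than two values are given.
// (Hypothetical helper, written for illustration only.)
func sampleVariance(xs []float64) float64 {
	n := len(xs)
	if n < 2 {
		return math.NaN()
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(n)

	var ss float64
	for _, x := range xs {
		d := x - mean
		ss += d * d
	}
	return ss / float64(n-1)
}

// valueRange returns max - min, or NaN for empty input.
// (Hypothetical helper, written for illustration only.)
func valueRange(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs[1:] {
		lo = math.Min(lo, x)
		hi = math.Max(hi, x)
	}
	return hi - lo
}

func main() {
	xs := []float64{2, 4, 4, 4, 5, 5, 7, 9} // toy data
	v := sampleVariance(xs)
	fmt.Printf("Range value: %0.2f\n", valueRange(xs))
	fmt.Printf("Variance value: %0.2f\n", v)
	fmt.Printf("Std Dev value: %0.2f\n", math.Sqrt(v))
}
```

The standard deviation is simply the square root of the variance, which brings the measure back to the same unit as the data.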

In [29]:
%%
df, err := getDataframe("../data/iris.csv")
if err != nil {
    fmt.Println(err)
    return
}

sepalLength := df.Col("Sepal Length").Float() 
max := floats.Max(sepalLength)
min := floats.Min(sepalLength)
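// stat.Variance returns the unbiased sample variance (n-1 denominator)
variance := stat.Variance(sepalLength, nil)
stddev := math.Sqrt(variance)

// stat.Quantile expects sorted input
sort.Float64s(sepalLength)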
quant25 := stat.Quantile(0.25, stat.Empirical, sepalLength, nil)
quant75 := stat.Quantile(0.75, stat.Empirical, sepalLength, nil)

fmt.Printf("\nSepal Length Summary Statistics:\n") 
fmt.Printf("Max value: %0.2f\n", max)
fmt.Printf("Min value: %0.2f\n", min)
fmt.Printf("Range value: %0.2f\n", max-min)
fmt.Printf("Variance value: %0.2f\n", variance) 
fmt.Printf("Std Dev value: %0.2f\n", stddev)
fmt.Printf("25 Quantile: %0.2f\n", quant25) 
fmt.Printf("75 Quantile: %0.2f\n\n", quant75)


Sepal Length Summary Statistics:
Max value: 7.90
Min value: 4.30
Range value: 3.60
Variance value: 0.69
Std Dev value: 0.83
25 Quantile: 5.10
75 Quantile: 6.40
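
The two quartiles also give the interquartile range, the spread of the middle half of the data: 6.40 - 5.10 = 1.30. As a rough illustration of what an empirical quantile does, the sketch below picks the smallest sorted value that covers at least a fraction p of the samples. This is one common definition and appears to match the outputs above, but the `empiricalQuantile` helper is a hypothetical stand-in, not gonum's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// empiricalQuantile returns the smallest value in sorted such that at least
// a fraction p of the samples are less than or equal to it.
// sorted must be non-empty and in ascending order; p must lie in [0, 1].
// (Hypothetical helper, written for illustration only.)
func empiricalQuantile(p float64, sorted []float64) float64 {
	idx := int(math.Ceil(p*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0 // p == 0 maps to the minimum
	}
	return sorted[idx]
}

func main() {
	xs := []float64{7, 1, 3, 5, 2, 8, 4, 6} // toy data
	sort.Float64s(xs)

	q1 := empiricalQuantile(0.25, xs)
	q3 := empiricalQuantile(0.75, xs)
	fmt.Printf("25 Quantile: %0.2f\n", q1)
	fmt.Printf("75 Quantile: %0.2f\n", q3)
	fmt.Printf("Interquartile range: %0.2f\n", q3-q1)
}
```

Because this definition always returns a value that actually occurs in the sample, it never interpolates between neighbours.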

