In [71]:
using LightGraphs
using StatsBase
using Plots
using GLM
using DataFrames

Load the graph and populate a dictionary in which each key is a degree and the corresponding value is the number of nodes that have that degree.

In [72]:
g = loadgraph("../testGraphs/internet.lg", "graph")
deg = degree_centrality(g, normalize=false)
occurences = countmap(deg)
# println(deg)
# println(occurences)
# println(length(occurences))

Dict{Float64,Int64} with 161 entries:
  68.0  => 3
  532.0 => 1
  658.0 => 1
  148.0 => 1
  2.0   => 9700
  89.0  => 2
  288.0 => 1
  306.0 => 1
  11.0  => 96
  197.0 => 1
  46.0  => 4
  39.0  => 5
  85.0  => 1
  25.0  => 21
  755.0 => 1
  66.0  => 2
  29.0  => 15
  58.0  => 1
  42.0  => 4
  55.0  => 2
  59.0  => 2
  207.0 => 1
  8.0   => 173
  142.0 => 1
  150.0 => 1
  ⋮     => ⋮
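As a minimal sketch of what the degree => count dictionary holds, the same tally can be built by hand on a toy degree sequence. The values and the helper name `count_degrees` are illustrative assumptions, not taken from `internet.lg` or from StatsBase. The sketch uses integer degrees; the dictionary above has `Float64` keys because `degree_centrality` returns floating-point values.

```julia
# Toy degree sequence (hypothetical values, not taken from internet.lg).
degrees = [2, 2, 3, 2, 5, 3, 2]

# Build a dictionary degree => number of nodes with that degree,
# mirroring what countmap does for the real degree vector.
function count_degrees(degrees)
    counts = Dict{Int,Int}()
    for d in degrees
        counts[d] = get(counts, d, 0) + 1
    end
    return counts
end

counts = count_degrees(degrees)
```

Here degree 2 occurs four times, degree 3 twice and degree 5 once, so the dictionary has three entries.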

The arrays X and Y represent the degree distribution: X holds the distinct degrees and Y the number of nodes with each degree.

In [73]:
X = Array{Int64}(length(occurences))
Y = Array{Int64}(length(occurences))

i = 1
for (x, y) in occurences
    X[i] = x
    Y[i] = y
    i = i + 1
end

# println(X)
# println(Y)
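A dictionary has no guaranteed iteration order, so the loop above fills X and Y in arbitrary order. That is fine for a scatter plot or a regression, but a sorted version can be easier to inspect. The following is a sketch under the assumption of a small hypothetical dictionary standing in for `occurences`:

```julia
# Hypothetical degree => count dictionary standing in for `occurences`.
counts = Dict(2 => 4, 3 => 2, 5 => 1)

# Sort the distinct degrees so that X is increasing,
# then look up the matching count for each degree.
X = sort(collect(keys(counts)))
Y = [counts[k] for k in X]
```

With this layout `X[i]` and `Y[i]` always refer to the same degree, and X is in increasing order.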

In [74]:
scatter(X, Y)

### Ordinary least squares

We use the GLM package to compute the OLS fit.

In [75]:
data = DataFrame(X=X, Y=Y)
OLS = glm(@formula(Y ~ X), data, Normal(), IdentityLink())

DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Normal{Float64},GLM.IdentityLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
              Estimate Std.Error  z value Pr(>|z|)
(Intercept)    191.952   91.2064  2.10459   0.0353
X            -0.247251  0.233602 -1.05843   0.2899
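For intuition, the intercept and slope reported by an OLS fit can also be obtained by solving the least-squares problem directly. This is only a sketch on toy data (the values are assumptions chosen to lie exactly on a line) and assumes a recent Julia 1.x; it is not how GLM computes its result internally.

```julia
using LinearAlgebra

# Toy data lying exactly on y = 1 + 2x (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

# Design matrix with a column of ones for the intercept.
A = hcat(ones(length(x)), x)

# Least-squares solution of A * coeffs ≈ y:
# coeffs[1] is the intercept, coeffs[2] is the slope.
coeffs = A \ y
```

On this data the solution recovers an intercept of 1 and a slope of 2 up to floating-point error.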


### Regression line

In [76]:
f(x) = coef(OLS)[1] + coef(OLS)[2]*x

f (generic function with 1 method)

In [77]:
p1 = plot(X, Y,seriestype=:scatter)
plot!(f)

## Log-log plot
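In my case, fitting the model

$$ \log(f(k)) = \log(\alpha) - \beta \log(k)$$

where $f(k)$ is the number of nodes of degree $k$, gives $\log(\alpha) = 6.35009$ (that is, $\alpha = e^{6.35009} \approx 572.5$) and $\beta = 1.15617$.

$\beta$ is lower than, but of the same order as, the expected value of about 2.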
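The log-log relation above comes from taking logarithms of a power law $f(k) = \alpha k^{-\beta}$, so the intercept of the fitted line is $\log(\alpha)$ and the slope is $-\beta$. The sketch below illustrates this on synthetic data generated from known parameters. The helper name `fit_power_law` and the values of `alpha` and `beta` are assumptions for illustration; the sketch assumes a recent Julia 1.x and uses a plain least-squares solve rather than GLM.

```julia
using LinearAlgebra

# Fit log(f(k)) = log(alpha) - beta * log(k) by least squares
# on the log-transformed data.
# `fit_power_law` is a hypothetical helper, not part of GLM or LightGraphs.
function fit_power_law(k, counts)
    logk = log.(k)
    logc = log.(counts)
    A = hcat(ones(length(logk)), logk)   # column of ones for the intercept
    intercept, slope = A \ logc
    return exp(intercept), -slope        # (alpha, beta)
end

# Synthetic data generated from alpha = 1000, beta = 2 (illustrative only).
k = collect(1.0:20.0)
counts = 1000.0 .* k .^ (-2.0)

alpha, beta = fit_power_law(k, counts)
```

Because the synthetic data follows the power law exactly, the fit recovers the generating parameters up to floating-point error. On real degree counts the sparse, noisy tail of the distribution can pull the estimate of $\beta$ away from the true exponent.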

In [78]:
logX = map(x -> log(x), X)
logY = map(y -> log(y), Y)
data = DataFrame(X=logX, Y=logY)
OLS = glm(@formula(Y ~ X), data, Normal(), IdentityLink())

DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Normal{Float64},GLM.IdentityLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
             Estimate Std.Error  z value Pr(>|z|)
(Intercept)   6.35009  0.257849  24.6272   <1e-99
X            -1.15617 0.0552885 -20.9116   <1e-96


In [79]:
f(x) = coef(OLS)[1] + coef(OLS)[2]*x
p1 = plot(logX, logY,seriestype=:scatter)
plot!(f)

In [80]:
baGraph = barabasi_albert(100000, 3)

{100000, 299991} undirected simple Int64 graph

In [87]:
deg = degree_centrality(baGraph, normalize=false)
occurences = countmap(deg)
X = Array{Int64}(length(occurences))
Y = Array{Int64}(length(occurences))

i = 1
for (x, y) in occurences
    X[i] = x
    Y[i] = y
    i = i + 1
end
logX = map(x -> log(x), X)
logY = map(y -> log(y), Y)
data = DataFrame(X=logX, Y=logY)
OLS = glm(@formula(Y ~ X), data, Normal(), IdentityLink())

DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Normal{Float64},GLM.IdentityLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
             Estimate Std.Error  z value Pr(>|z|)
(Intercept)   11.2994  0.293928  38.4428   <1e-99
X             -2.1162 0.0648773 -32.6185   <1e-99


In [88]:
f(x) = coef(OLS)[1] + coef(OLS)[2]*x
p1 = plot(logX, logY,seriestype=:scatter)
plot!(f)