# Job Applicants and Unemployment Rate + YAGO

I will mine rules over a single data cube. For the values of its area dimension I looked up additional data in the YAGO knowledge graph. The data cube has the IRI:
http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate

In the description I use the prefix czso: http://data.czso.cz/ontology/

It has these dimensions:
* czso:refArea
    * 13 regions (excluding Prague)
    * 76 districts (the count excluding Prague)
* czso:refPeriod
    * years 2005 to 2013
* czso:sex
    * http://purl.org/linked-data/sdmx/2009/code#sex-F -> women
    * http://purl.org/linked-data/sdmx/2009/code#sex-M -> men
    * http://purl.org/linked-data/sdmx/2009/code#sex-T -> women + men

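And these measures:
* czso:neumisteniUchazeciOZamestnani (unplaced job applicants)
* czso:dosazitelniNeumisteniUchazeciOZamestnani (reachable unplaced job applicants)
* czso:podilNezamestnanych (unemployment rate)
* czso:pocetVolnychMist (number of vacancies)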

Each observation carries several measures. The measures czso:pocetVolnychMist and czso:neumisteniUchazeciOZamestnani are given only for observations that cover both sexes together. Example observations:

```turtle
<http://data.czso.cz/resource/observation/job-applicants-and-unemployment-rate/CZ020/2007-12-31/F>
a qb:Observation ;
		czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/27> ;
		czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2007-12-31> ;
		czso:sex sdmx-code:sex-F ;
		czso:dosazitelniNeumisteniUchazeciOZamestnani 15758.0 ;
		czso:podilNezamestnanych 3.6 ;
		qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> .
	
<http://data.czso.cz/resource/observation/job-applicants-and-unemployment-rate/CZ020/2013-12-31/T>
a qb:Observation ;
		czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/27> ;
		czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2013-12-31> ;
		czso:sex sdmx-code:sex-T ;
		czso:neumisteniUchazeciOZamestnani 61681.0 ;
		czso:dosazitelniNeumisteniUchazeciOZamestnani 60772.0 ;
		czso:podilNezamestnanych 6.9 ;
		czso:pocetVolnychMist 4040.0 ;
		qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> .
```

URL of the original data: [https://linked.opendata.cz/soubor/czso-job-applicants-and-unemployment-rate.trig](https://linked.opendata.cz/soubor/czso-job-applicants-and-unemployment-rate.trig)

Dataset page: [https://linked.opendata.cz/dataset/czso-job-applicants-and-unemployment-rate](https://linked.opendata.cz/dataset/czso-job-applicants-and-unemployment-rate)

In [1]:
kernel.silent(true)

I set the task parameters:
* number of equal-frequency intervals
* minimum support
* minimum confidence

In [2]:
var intervalsCount = 10
var minSupport = 30
var minConfidence = 0.65

In [3]:
import coursierapi.MavenRepository
interp.repositories() ++= Seq(MavenRepository.of("https://jitpack.io"))

In [4]:
import $ivy.`com.github.propi:rdfrules:1.5.0`
import collection._
import org.apache.jena.riot.Lang

import com.github.propi.rdfrules.data._
import com.github.propi.rdfrules.algorithm.amie._
import com.github.propi.rdfrules.algorithm.dbscan._
import com.github.propi.rdfrules.utils._
import com.github.propi.rdfrules.index._
import com.github.propi.rdfrules.rule._
import com.github.propi.rdfrules.ruleset._

The cube had to be cut into several datasets, because the measure values get discretized and only mutually comparable values may be discretized together. I cannot compute equal-frequency intervals from the vacancy counts of districts and regions at once. The data therefore had to be split into datasets with regions only and with districts only, and further into datasets with observations for a specific sex and datasets with observations for both sexes in total. This gives 4 smaller cubes in which the measure values are discretized separately:

* districts by sex
* regions by sex
* districts, total
* regions, total
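To illustrate why the sub-cubes need separate discretization, here is a minimal sketch of equal-frequency binning in plain Scala. It is not the RDFRules implementation: the `cutPoints` helper, the midpoint rule, and the vacancy numbers are all assumptions made up for the example.

```scala
object EquiFrequencySketch {
  /** Sorts the values, splits them into `bins` groups of (nearly) equal size
    * and returns the midpoints between neighbouring groups.
    * Assumes values.size >= bins; ties are not treated specially. */
  def cutPoints(values: Seq[Double], bins: Int): Seq[Double] = {
    val sorted = values.sorted
    (1 until bins).map { i =>
      val idx = i * sorted.size / bins        // first index of the next group
      (sorted(idx - 1) + sorted(idx)) / 2     // midpoint between the two neighbours
    }
  }

  // Hypothetical vacancy counts, NOT taken from the dataset.
  val districts = Seq(40.0, 120.0, 200.0, 310.0, 450.0, 900.0)
  val regions   = Seq(700.0, 1500.0, 2600.0, 4000.0)
}
```

With these made-up numbers, `cutPoints(districts, 2)` gives 255.0 and `cutPoints(regions, 2)` gives 2050.0, while the mixed list `districts ++ regions` is cut at 575.0. In the mixed case every region lands in the upper bin, so the interval says almost nothing about regions.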

The easiest way to do the split was in SPARQL. There is no vocabulary for the czso:refArea values that would let me easily tell districts from regions, but the refArea values of the cubes published by ČSSZ have one. Both datasets number areas according to the RÚIAN code list, so I used a regular expression to create a linking ttl file and included it in the data that the SPARQL queries ran over. Sample of the file contents:

```turtle
<https://data.cssz.cz/resource/ruian/vusc/78> owl:sameAs 
<http://ruian.linked.opendata.cz/resource/vusc/78> .
<https://data.cssz.cz/resource/ruian/okresy/3704> owl:sameAs
<http://ruian.linked.opendata.cz/resource/okresy/3704> .
```
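The linking step could look roughly like the sketch below. It only assumes the two IRI shapes visible in the sample; the direction of the rewrite (starting from the CZSO-side IRI) and the `sameAsLine` helper are my assumptions, not necessarily how the file was actually produced.

```scala
object SameAsLinkSketch {
  // Shape of an area IRI as used in the CZSO cube, e.g. .../resource/vusc/78
  private val CzsoArea =
    """http://ruian\.linked\.opendata\.cz/resource/(vusc|okresy)/(\d+)""".r

  /** Returns one Turtle statement linking the CSSZ IRI to the given CZSO IRI,
    * or None when the IRI does not have the expected shape. */
  def sameAsLine(czsoIri: String): Option[String] = czsoIri match {
    case CzsoArea(kind, code) =>
      Some(s"<https://data.cssz.cz/resource/ruian/$kind/$code> owl:sameAs <$czsoIri> .")
    case _ => None
  }
}
```

For example, `SameAsLinkSketch.sameAsLine("http://ruian.linked.opendata.cz/resource/okresy/3704")` yields the second statement of the sample, written on a single line.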

In [5]:
val unemploymentRate = "http://data.czso.cz/ontology/podilNezamestnanych"
val reachableApplicants = "http://data.czso.cz/ontology/dosazitelniNeumisteniUchazeciOZamestnani"
val unplacedApplicants = "http://data.czso.cz/ontology/neumisteniUchazeciOZamestnani"
val vacanciesCount = "http://data.czso.cz/ontology/pocetVolnychMist"

val refArea = "http://data.czso.cz/ontology/refArea"
val sex = "http://data.czso.cz/ontology/sex"
val refPeriod = "http://data.czso.cz/ontology/refPeriod"

val qbObservation = "http://purl.org/linked-data/cube#Observation"
val uri = (value: String) => TripleItem.Uri(value)

Districts by sex

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX czso: <http://data.czso.cz/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
    ?observation ?p ?o
} 
WHERE {
 GRAPH <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> {
    ?observation qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> ;
                 ?p ?o ;
                 czso:refArea ?refAreaCZSO .
    FILTER NOT EXISTS {
        ?observation czso:sex <http://purl.org/linked-data/sdmx/2009/code#sex-T> .
    }
 }
  ?refAreaCSSZ owl:sameAs   ?refAreaCZSO.
 GRAPH <https://data.cssz.cz/resource/dataset/pomocne-ciselniky> {
     ?refAreaCSSZ a <https://data.cssz.cz/ontology/ruian/Okres>
 }
}
```

In [6]:
val jaurDistrictsBySex = Graph("czso","../data/czso-jaur-districts-by-sex.ttl")
println("number of observations: " + jaurDistrictsBySex.filter(t => t.`object` == uri(qbObservation)).size)

number of observations: 1368


Regions by sex

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX czso: <http://data.czso.cz/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
    ?observation ?p ?o
} 
WHERE {
 GRAPH <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> { 
    ?observation qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> ;
                 ?p ?o ;
                 czso:refArea ?refAreaCZSO .
    FILTER NOT EXISTS {
        ?observation czso:sex <http://purl.org/linked-data/sdmx/2009/code#sex-T> .
    }
 }
 ?refAreaCSSZ owl:sameAs   ?refAreaCZSO.
 GRAPH <https://data.cssz.cz/resource/dataset/pomocne-ciselniky> {
     ?refAreaCSSZ a <https://data.cssz.cz/ontology/ruian/Vusc>
 }
}
```

In [7]:
val jaurRegionsBySex = Graph("czso","../data/czso-jaur-regions-by-sex.ttl")
println("number of observations: " + jaurRegionsBySex.filter(t => t.`object` == uri(qbObservation)).size)

number of observations: 234


Districts, total

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX czso: <http://data.czso.cz/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
    ?observation ?p ?o
} 
WHERE {
 GRAPH <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> {
    ?observation qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> ;
                 ?p ?o ;
                 czso:sex <http://purl.org/linked-data/sdmx/2009/code#sex-T> ;
                 czso:refArea ?refAreaCZSO .
 }
 ?refAreaCSSZ owl:sameAs   ?refAreaCZSO.
 GRAPH <https://data.cssz.cz/resource/dataset/pomocne-ciselniky> {
     ?refAreaCSSZ a <https://data.cssz.cz/ontology/ruian/Okres>
 }
}
```

In [8]:
val jaurDistrictsTotal = Graph("czso","../data/czso-jaur-districts-total.ttl")
println("observations: " + jaurDistrictsTotal.filter(t => t.`object` == uri(qbObservation)).size)

observations: 684


Regions, total

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX czso: <http://data.czso.cz/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
    ?observation ?p ?o
} 

WHERE {
 GRAPH <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> {
    ?observation qb:dataSet <http://data.czso.cz/resource/dataset/job-applicants-and-unemployment-rate> ;
                 ?p ?o ;
                 czso:sex <http://purl.org/linked-data/sdmx/2009/code#sex-T> ;
                 czso:refArea ?refAreaCZSO .
 }
 ?refAreaCSSZ owl:sameAs   ?refAreaCZSO.
 GRAPH <https://data.cssz.cz/resource/dataset/pomocne-ciselniky> {
     ?refAreaCSSZ a <https://data.cssz.cz/ontology/ruian/Vusc>
 }
}
```

In [9]:
val jaurRegionsTotal = Graph("czso","../data/czso-jaur-regions-total.ttl")
println("observations: " + jaurRegionsTotal.filter(t => t.`object` == uri(qbObservation)).size)

observations: 117
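The four counts can be cross-checked against the dimension sizes listed at the top. The sketch below assumes exactly one observation per combination of area, year, and sex value, which the numbers are consistent with; the object and value names are only illustrative.

```scala
object ObservationCountCheck {
  val districts = 76                 // districts, excluding Prague
  val regions   = 13                 // regions, excluding Prague
  val years     = 2013 - 2005 + 1    // 2005 to 2013 inclusive

  val districtsBySex = districts * years * 2   // sex-F and sex-M
  val regionsBySex   = regions * years * 2
  val districtsTotal = districts * years       // sex-T only
  val regionsTotal   = regions * years
}
```

The products are 1368, 234, 684, and 117, the same as the sizes of the four loaded graphs.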


This is where the measure values of the cubes get discretized. The intervals are equal-frequency and their number is configurable.

In [10]:
val equiFrequent = DiscretizationTask.Equifrequency(intervalsCount)
val hasPredicate = (quad: Quad, uri: String) => quad.triple.predicate.hasSameUriAs(uri)
import eu.easyminer.discretization.impl.Interval
import eu.easyminer.discretization.impl.IntervalBound._
val rounded = (value: Double, scale: Int) => BigDecimal(value).setScale(scale, BigDecimal.RoundingMode.HALF_UP).toDouble
val intervalToString = (i: Interval) => "<"+rounded(i.minValue.value,2)+ "__"+rounded(i.maxValue.value,2)+")"

println("unemploymentRate")
    
jaurDistrictsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nreachableApplicants")
jaurDistrictsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nunplacedApplicants")
jaurDistrictsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unplacedApplicants))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nvacanciesCount")
jaurDistrictsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, vacanciesCount))
.foreach(i => print(intervalToString(i) + "  "))

val jaurDistrictsTotalDiscretized = jaurDistrictsTotal
    .discretize(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
    .discretize(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
    .discretize(equiFrequent)(quad => hasPredicate(quad, unplacedApplicants))
    .discretize(equiFrequent)(quad => hasPredicate(quad, vacanciesCount))



unemploymentRate
<1.2__3.66)  <3.66__4.56)  <4.56__5.3)  <5.3__5.97)  <5.97__6.69)  <6.69__7.34)  <7.34__7.9)  <7.9__8.74)  <8.74__9.98)  <9.98__16.2)  

reachableApplicants
<901.0__2214.5)  <2214.5__2928.0)  <2928.0__3509.0)  <3509.0__4140.5)  <4140.5__4973.5)  <4973.5__5952.5)  <5952.5__6731.5)  <6731.5__8081.5)  <8081.5__9639.0)  <9639.0__25767.0)  

unplacedApplicants
<1025.0__2332.5)  <2332.5__3045.0)  <3045.0__3699.0)  <3699.0__4340.0)  <4340.0__5234.0)  <5234.0__6207.5)  <6207.5__6989.5)  <6989.5__8333.0)  <8333.0__9877.0)  <9877.0__26549.0)  

vacanciesCount
<37.0__163.0)  <163.0__211.5)  <211.5__262.5)  <262.5__325.5)  <325.5__394.5)  <394.5__501.5)  <501.5__698.5)  <698.5__878.0)  <878.0__1252.5)  <1252.5__8550.0)  

In [11]:
println("unemploymentRate")
jaurRegionsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nreachableApplicants")
jaurRegionsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nunplacedApplicants")
jaurRegionsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unplacedApplicants))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nvacanciesCount")
jaurRegionsTotal.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, vacanciesCount))
.foreach(i => print(intervalToString(i) + "  "))

val jaurRegionsTotalDiscretized = jaurRegionsTotal
    .discretize(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
    .discretize(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
    .discretize(equiFrequent)(quad => hasPredicate(quad, unplacedApplicants))
    .discretize(equiFrequent)(quad => hasPredicate(quad, vacanciesCount))

unemploymentRate
<3.22__4.2)  <4.2__4.9)  <4.9__5.6)  <5.6__6.05)  <6.05__6.5)  <6.5__7.13)  <7.13__7.73)  <7.73__8.3)  <8.3__9.01)  <9.01__11.47)  

reachableApplicants
<12357.0__15848.5)  <15848.5__18343.0)  <18343.0__21524.5)  <21524.5__23417.5)  <23417.5__26083.5)  <26083.5__27795.5)  <27795.5__33840.0)  <33840.0__48943.0)  <48943.0__61297.5)  <61297.5__91177.0)  

unplacedApplicants
<12975.0__16681.0)  <16681.0__19317.5)  <19317.5__22483.5)  <22483.5__24431.5)  <24431.5__26626.0)  <26626.0__29341.5)  <29341.5__35839.0)  <35839.0__50244.0)  <50244.0__64754.0)  <64754.0__96528.0)  

vacanciesCount
<664.0__1095.5)  <1095.5__1316.0)  <1316.0__1840.5)  <1840.5__2227.5)  <2227.5__2415.5)  <2415.5__2751.5)  <2751.5__3651.0)  <3651.0__4511.0)  <4511.0__7139.0)  <7139.0__19691.0)  
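To read these intervals, it helps to look up a concrete value. The sketch below takes the unemploymentRate bounds printed above for the regional totals and finds the interval for 6.9, the value of the example observation for region vusc/27 in 2013 shown near the top. The bounds are the rounded printed ones, so the exact cut points may differ slightly, and `intervalIndex` and `label` are hypothetical helpers, not part of RDFRules.

```scala
object IntervalLookupSketch {
  // Printed bounds of the ten unemploymentRate intervals (regions, total).
  val bounds = Vector(3.22, 4.2, 4.9, 5.6, 6.05, 6.5, 7.13, 7.73, 8.3, 9.01, 11.47)

  /** Index of the interval containing `value`. Intervals are half-open,
    * except the last one, which also includes its upper bound. */
  def intervalIndex(value: Double): Option[Int] =
    if (value < bounds.head || value > bounds.last) None
    else Some(math.min(bounds.lastIndexWhere(_ <= value), bounds.size - 2))

  /** Label in the same style as the notebook's intervalToString. */
  def label(i: Int): String = s"<${bounds(i)}__${bounds(i + 1)})"
}
```

`IntervalLookupSketch.intervalIndex(6.9).map(IntervalLookupSketch.label)` returns `Some("<6.5__7.13)")`, so after discretization that observation carries the sixth interval instead of the number 6.9.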

In [12]:
println("unemploymentRate")
jaurDistrictsBySex.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nreachableApplicants")
jaurDistrictsBySex.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
.foreach(i => print(intervalToString(i) + "  "))

val jaurDistrictsBySexDiscretized = jaurDistrictsBySex
    .discretize(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
    .discretize(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))

unemploymentRate
<1.13__3.55)  <3.55__4.58)  <4.58__5.29)  <5.29__5.94)  <5.94__6.67)  <6.67__7.29)  <7.29__7.89)  <7.89__8.76)  <8.76__10.05)  <10.05__16.49)  

reachableApplicants
<468.0__1136.5)  <1136.5__1467.0)  <1467.0__1783.0)  <1783.0__2115.0)  <2115.0__2491.5)  <2491.5__2923.0)  <2923.0__3395.5)  <3395.5__4019.0)  <4019.0__4855.0)  <4855.0__13697.0)  

In [13]:
println("unemploymentRate")
jaurRegionsBySex.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
.foreach(i => print(intervalToString(i) + "  "))

println("\n\nreachableApplicants")
jaurRegionsBySex.discretizeAndGetIntervals(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))
.foreach(i => print(intervalToString(i) + "  "))

val jaurRegionsBySexDiscretized = jaurRegionsBySex
    .discretize(equiFrequent)(quad => hasPredicate(quad, unemploymentRate))
    .discretize(equiFrequent)(quad => hasPredicate(quad, reachableApplicants))


unemploymentRate
<2.85__4.28)  <4.28__4.89)  <4.89__5.61)  <5.61__6.14)  <6.14__6.74)  <6.74__7.29)  <7.29__7.85)  <7.85__8.36)  <8.36__9.43)  <9.43__11.5)  

reachableApplicants
<6297.0__8219.0)  <8219.0__9024.0)  <9024.0__10695.5)  <10695.5__11971.0)  <11971.0__12994.5)  <12994.5__14786.5)  <14786.5__18294.0)  <18294.0__26473.0)  <26473.0__31649.0)  <31649.0__46450.0)  

Each rule must relate to observations in a single sub-cube only, because the measures have the same IRIs across the sub-cubes. One option is to rename the measures for each of the 4 cubes, but it is simpler to give each of the 4 cubes a different dataset name.

In [14]:
// TODO: a different qb:dataSet for each dataset
val qbDataSet = "http://purl.org/linked-data/cube#dataSet"



val jaurDistrictsTotalNamed = jaurDistrictsTotalDiscretized
.map(t => if (t.predicate.hasSameUriAs(qbDataSet)) t.copy(`object` = uri("jaurDistrictsTotal")) else t)

val jaurRegionsTotalNamed = jaurRegionsTotalDiscretized
.map(t => if (t.predicate.hasSameUriAs(qbDataSet)) t.copy(`object` = uri("jaurRegionsTotal")) else t)

val jaurDistrictsBySexNamed = jaurDistrictsBySexDiscretized
.map(t => if (t.predicate.hasSameUriAs(qbDataSet)) t.copy(`object` = uri("jaurDistrictsBySex")) else t)

val jaurRegionsBySexNamed = jaurRegionsBySexDiscretized
.map(t => if (t.predicate.hasSameUriAs(qbDataSet)) t.copy(`object` = uri("jaurRegionsBySex")) else t)


In the sub-cubes with observations totalled over both sexes, triples with the predicate czso:sex are not needed.

In [15]:
val jaurDistrictsTotalNoSexDimension = jaurDistrictsTotalNamed.filter(t => !t.predicate.hasSameUriAs(sex))
val jaurRegionsTotalNoSexDimension = jaurRegionsTotalNamed.filter(t => !t.predicate.hasSameUriAs(sex))

The sub-cubes can now be assembled into a single cube.

In [16]:
val jaurDataset = Dataset() + 
    jaurDistrictsTotalNoSexDimension + 
    jaurRegionsTotalNoSexDimension + 
    jaurDistrictsBySexNamed + 
    jaurRegionsBySexNamed  

The dataset does not need the triples ```?observation a qb:Observation```; that follows from the rule patterns on its own.

In [17]:
val jaurDatasetFiltered = jaurDataset.filter(quad => !(quad.triple.`object`.equals(uri(qbObservation))))

First I try to mine some rules over the cube alone. The YAGO data will be added later.

In [18]:
val jaurIndex = jaurDatasetFiltered.index()
jaurIndex.cache("../cache/jaurIndex.cache")

2021-03-18 11:52:59:669 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Predicates trimming.
2021-03-18 11:52:59:697 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Subjects indexing.
2021-03-18 11:52:59:755 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Subjects trimming.
2021-03-18 11:52:59:769 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Objects indexing.
2021-03-18 11:52:59:795 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Objects trimming.


These are just helper objects used in the definitions of rule patterns and mining tasks to make them more readable.

In [19]:
val constantsAtObject = RuleConstraint.ConstantsAtPosition.ConstantsPosition.Object
val constantsOnlyAtObject = RuleConstraint.ConstantsAtPosition(constantsAtObject)
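// Note: unemploymentRate is listed twice and unplacedApplicants is missing,
// which matches the OneOf constants in the task log further down.
val oneOfMeasures = OneOf(
    uri(unemploymentRate), 
    uri(unemploymentRate), 
    uri(reachableApplicants), 
    uri(vacanciesCount)
)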
val qbdPredicate = uri(qbDataSet)
val oneOfDimensions = OneOf(
    uri(refArea), 
    uri(sex), 
    uri(refPeriod)
)

A pattern for rules of the type: if an observation in some cube has a value of one measure in a given interval, then it has a value of another measure in a given interval. Because the measures have the same IRIs in all 4 cubes, every rule has to be anchored to a specific dataset. Hence the atom ```AtomPattern(subject = 'a', predicate = qbdPredicate)```, even though I am not mining across several cubes.

In [20]:
val oneCubeTwoMeasures: RulePattern = (
    AtomPattern(subject = 'a', predicate = qbdPredicate) &: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures) 
    =>: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures)
)

A pattern for rules of the type: if an observation in some cube has a value of one measure in a given interval and also has a given value of some dimension, then it has a value of another measure in a given interval.

In [21]:
val oneCubeTwoMeasuresOneDimension: RulePattern = (
    AtomPattern(subject = 'a', predicate = qbdPredicate) &: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures) &: 
    AtomPattern(subject = 'a', predicate = oneOfDimensions)
    =>: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures)
)

I did not want rules longer than their patterns, so I created a separate *mining task* for each pattern, with the maximum rule length equal to the length of the pattern.

In [22]:
val oneCubeTwoMeasuresTask = Amie()
    .addThreshold(Threshold.MinSupport(0))
    .addThreshold(Threshold.MaxRuleLength(3))
    .addThreshold(Threshold.MinHeadSize(0))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(oneCubeTwoMeasures)

val oneCubeTwoMeasuresOneDimensionTask = Amie()
    .addThreshold(Threshold.MinSupport(0))
    .addThreshold(Threshold.MaxRuleLength(4))
    .addThreshold(Threshold.MinHeadSize(0))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(oneCubeTwoMeasuresOneDimension)

The number of rules found by each *mining task* is printed below.

In [23]:
val oneCubeTwoMeasuresRuleset = jaurIndex.mine(oneCubeTwoMeasuresTask)
val oneCubeTwoMeasuresOneDimensionRuleset = jaurIndex.mine(oneCubeTwoMeasuresOneDimensionTask)

println("oneCubeTwoMeasuresRuleset size: " + oneCubeTwoMeasuresRuleset.size)
println("oneCubeTwoMeasuresOneDimensionRuleset size: " + oneCubeTwoMeasuresOneDimensionRuleset.size)

2021-03-18 11:53:09:898 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Amie task settings:
MinHeadSize=1,
MinHeadCoverage=0.0,
MinSupport=1,
MaxThreads=4,
MinAtomSize=0,
MaxRuleLength=3,
WithConstants=true,
ConstantsPosition=Object,
Timeout=-1,
WithDuplicitPredicates=true,
Patterns=List(Mapped(Vector(Mapped(Variable(?a),Constant(Constant(624690160)),Any,Any), Mapped(Variable(?a),OneOf(ArrayBuffer(Constant(Constant(-2070273298)), Constant(Constant(-2070273298)), Constant(Constant(1659106226)), Constant(Constant(1142069620)))),Any,Any)),Some(Mapped(Variable(?a),OneOf(ArrayBuffer(Constant(Constant(-2070273298)), Constant(Constant(-2070273298)), Constant(Constant(1659106226)), Constant(Constant(1142069620)))),Any,Any)),false,false)),
OnlyPredicates=None,
WithoutPredicates=None
2021-03-18 11:53:10:900 +0100 [scala-interpreter-1] INFO com.github.propi.rdfrules.utils.Debugger - Amie task settings:
MinHeadSize=1,
MinHeadCoverage=0.0,
MinSupport=1,
MaxThreads=4,
Min

oneCubeTwoMeasuresRuleset size: 1214
oneCubeTwoMeasuresOneDimensionRuleset size: 9794


I merged the rules of both *mining tasks*, kept only those that reach the minimum *confidence*, saved them to a file, and printed the top of the sorted list.

In [None]:
val jaurRuleset = (oneCubeTwoMeasuresRuleset + oneCubeTwoMeasuresOneDimensionRuleset)
    .computeConfidence(minConfidence)
    .sortBy(Measure.Confidence, Measure.Support)
println("jaurRuleset size: " + jaurRuleset.size)
jaurRuleset.export("../rulesets/jaurRules.txt")
jaurRuleset.slice(0,10).foreach(rule => println("\n" + rule + "\n"))

jaurRuleset size: 3007

(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 10.045 ; 16.49 ]) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3803>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4855.0 ; 13697.0 ]) | support: 13, headCoverage: 0.005409904286308781, confidence: 1.0, headSize: 2403, bodySize: 13


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4019.0 ; 4855.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3713>) -> (?a czso:podilNezamestnanych [ 10.045 ; 16.49 ]) | support: 11, headCoverage: 0.004577611319184353, confidence: 1.0, headSize: 2403, bodySize: 11


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 26473.0 ; 31649.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/60>) -> (?a czso:podilNezamestnanych [ 9.43 ; 11.5 ]) | support: 10, headCoverage: 0.004161464835622139, confidence: 1.0, headS



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.285 ; 5.9350000000000005 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3305>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1136.5 ; 1467.0 )) | support: 6, headCoverage: 0.0024968789013732834, confidence: 1.0, headSize: 2403, bodySize: 6


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 9.98 ; 16.2 ]) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3713>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 8081.5 ; 9639.0 )) | support: 6, headCoverage: 0.0024968789013732834, confidence: 1.0, headSize: 2403, bodySize: 6


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.285 ; 7.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3208>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2115.0 ; 2491.5 )) | support: 6, headCoverage: 0.0024968789013732834, confidence: 1.0, headSize: 2403,



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 3.545 ; 4.575 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3212>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 468.0 ; 1136.5 )) | support: 5, headCoverage: 0.0020807324178110697, confidence: 1.0, headSize: 2403, bodySize: 5


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.9350000000000005 ; 6.665 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3404>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1783.0 ; 2115.0 )) | support: 5, headCoverage: 0.0020807324178110697, confidence: 1.0, headSize: 2403, bodySize: 5


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.9350000000000005 ; 6.665 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3303>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1783.0 ; 2115.0 )) | support: 5, headCoverage: 0.0020807324178110697, confidence: 1.0, h



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3301>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2923.0 ; 3395.5 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3505>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2923.0 ; 3395.5 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 1.2 ; 3.6550000000000002 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3406>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 901.0 ; 2214.5 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, 



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3608>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1136.5 ; 1467.0 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.9350000000000005 ; 6.665 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3212>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1136.5 ; 1467.0 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.285 ; 7.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3603>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2491.5 ; 2923.0 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 240



(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 9.43 ; 11.5 ]) ^ (?a czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2005-12-31>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 31649.0 ; 46450.0 ]) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 8.355 ; 9.43 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/116>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 31649.0 ; 46450.0 ]) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurRegionsTotal>) ^ (?a czso:pocetVolnychMist [ 2751.5 ; 3651.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/132>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 61297.5 ; 91177.0 ]) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4






(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4019.0 ; 4855.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3710>) -> (?a czso:podilNezamestnanych [ 10.045 ; 16.49 ]) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1467.0 ; 1783.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3811>) -> (?a czso:podilNezamestnanych [ 10.045 ; 16.49 ]) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize: 4


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1783.0 ; 2115.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3301>) -> (?a czso:podilNezamestnanych [ 1.13 ; 3.545 )) | support: 4, headCoverage: 0.0016645859342488557, confidence: 1.0, headSize: 2403, bodySize:



(?a qb:dataSet <jaurRegionsTotal>) ^ (?a czso:podilNezamestnanych [ 8.3 ; 9.01 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/51>) -> (?a czso:pocetVolnychMist [ 664.0 ; 1095.5 )) | support: 3, headCoverage: 0.003745318352059925, confidence: 1.0, headSize: 801, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 5.970000000000001 ; 6.6899999999999995 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3303>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 8.74 ; 9.98 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3704>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 6731.5 ; 8081.5 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:da



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 9639.0 ; 25767.0 ]) ^ (?a czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2008-12-31>) -> (?a czso:pocetVolnychMist [ 1252.5 ; 8550.0 ]) | support: 3, headCoverage: 0.003745318352059925, confidence: 1.0, headSize: 801, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 163.0 ; 211.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3308>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4973.5 ; 5952.5 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 1.13 ; 3.545 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3401>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 468.0 ; 1136.5 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3607>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1136.5 ; 1467.0 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 3.545 ; 4.575 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3712>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1136.5 ; 1467.0 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.885 ; 8.76 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3208>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2491.5 ; 2923.0 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3208>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1467.0 ; 1783.0 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 9639.0 ; 25767.0 ]) ^ (?a czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2007-12-31>) -> (?a czso:pocetVolnychMist [ 1252.5 ; 8550.0 ]) | support: 3, headCoverage: 0.003745318352059925, confidence: 1.0, headSize: 801, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 5.970000000000001 ; 6.6899999999999995 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3703>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 8081.5 ; 9639.0 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 262.5 ; 325.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3805>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 9639.0 ; 25767.0 ]) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 7.9 ; 8.74 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3805>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 9639.0 ; 25767.0 ]) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 7.845 ; 8.355 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/116>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 31649.0 ; 46450.0 ]) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a



(?a qb:dataSet <jaurRegionsTotal>) ^ (?a czso:pocetVolnychMist [ 664.0 ; 1095.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/124>) -> (?a czso:podilNezamestnanych [ 8.3 ; 9.01 )) | support: 3, headCoverage: 0.0012484394506866417, confidence: 1.0, headSize: 2403, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 5.970000000000001 ; 6.6899999999999995 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3303>) -> (?a czso:pocetVolnychMist [ 37.0 ; 163.0 )) | support: 3, headCoverage: 0.003745318352059925, confidence: 1.0, headSize: 801, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 4.555 ; 5.3 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3408>) -> (?a czso:pocetVolnychMist [ 37.0 ; 163.0 )) | support: 3, headCoverage: 0.003745318352059925, confidence: 1.0, headSize: 801, bodySize: 3


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnan



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 4.555 ; 5.3 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3610>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurRegionsTotal>) ^ (?a czso:pocetVolnychMist [ 2415.5 ; 2751.5 )) ^ (?a czso:refPeriod <http://reference.data.gov.uk/id/gregorian-day/2013-12-31>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 23417.5 ; 26083.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.885 ; 8.76 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3805>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4855.0 ; 13697.0 ]) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 394.5 ; 501.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3404>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 163.0 ; 211.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3608>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 1.2 ; 3.6550000000000002 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3602>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1783.0 ; 2115.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3604>) -> (?a czso:podilNezamestnanych [ 6.665 ; 7.285 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4855.0 ; 13697.0 ]) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3703>) -> (?a czso:podilNezamestnanych [ 6.665 ; 7.285 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2115.0 ; 2491.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3308>) -> (?a czso:podilNezamestnanych [ 6.665 ; 7.285 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2




(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 8.76 ; 10.045 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3204>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2923.0 ; 3395.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 6.74 ; 7.29 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/27>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 26473.0 ; 31649.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 6.135 ; 6.74 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/27>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 26473.0 ; 31649.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dat



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 3.6550000000000002 ; 4.555 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3306>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 901.0 ; 2214.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2491.5 ; 2923.0 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3504>) -> (?a czso:podilNezamestnanych [ 7.885 ; 8.76 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.285 ; 7.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3506>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2923.0 ; 3395.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bo



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 3.6550000000000002 ; 4.555 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3305>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 901.0 ; 2214.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 3509.0 ; 4140.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3301>) -> (?a czso:pocetVolnychMist [ 1252.5 ; 8550.0 ]) | support: 2, headCoverage: 0.0024968789013732834, confidence: 1.0, headSize: 801, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 7.9 ; 8.74 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3410>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2928.0 ; 3509.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodyS



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 7.9 ; 8.74 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3402>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4973.5 ; 5952.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 211.5 ; 262.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3402>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4973.5 ; 5952.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 1.13 ; 3.545 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3712>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 468.0 ; 1136.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 5.3 ; 5.970000000000001 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3403>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4140.5 ; 4973.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 325.5 ; 394.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3404>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4140.5 ; 4973.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.285 ; 7.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3211>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2491.5 ; 2923.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySi



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 5.3 ; 5.970000000000001 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3809>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4140.5 ; 4973.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 6.6899999999999995 ; 7.34 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3402>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 4140.5 ; 4973.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 4.575 ; 5.285 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3804>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2491.5 ; 2923.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSiz



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 4.555 ; 5.3 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3708>) -> (?a czso:pocetVolnychMist [ 1252.5 ; 8550.0 ]) | support: 2, headCoverage: 0.0024968789013732834, confidence: 1.0, headSize: 801, bodySize: 2


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 4.885 ; 5.605 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/124>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 10695.5 ; 11971.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurRegionsBySex>) ^ (?a czso:podilNezamestnanych [ 4.279999999999999 ; 4.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/vusc/124>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 10695.5 ; 11971.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jau



(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 7.285 ; 7.885 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3302>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 1467.0 ; 1783.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:podilNezamestnanych [ 7.34 ; 7.9 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3403>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 5952.5 ; 6731.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 394.5 ; 501.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3403>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 5952.5 ; 6731.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a q



(?a qb:dataSet <jaurDistrictsTotal>) ^ (?a czso:pocetVolnychMist [ 211.5 ; 262.5 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3509>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 8081.5 ; 9639.0 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.9350000000000005 ; 6.665 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3206>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2115.0 ; 2491.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSize: 2403, bodySize: 2


(?a qb:dataSet <jaurDistrictsBySex>) ^ (?a czso:podilNezamestnanych [ 5.285 ; 5.9350000000000005 )) ^ (?a czso:refArea <http://ruian.linked.opendata.cz/resource/okresy/3714>) -> (?a czso:dosazitelniNeumisteniUchazeciOZamestnani [ 2115.0 ; 2491.5 )) | support: 2, headCoverage: 8.322929671244278E-4, confidence: 1.0, headSi
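
The measures printed after each rule above are tied together by simple ratios. The following is a minimal sketch in plain Scala, not RDFRules code: `RuleCounts` and its fields are hypothetical names, and the definitions `headCoverage = support / headSize` and `confidence = support / bodySize` are assumed here because they reproduce the printed numbers.

```scala
// Hypothetical helper (not part of RDFRules): the counts printed with a rule.
final case class RuleCounts(support: Int, headSize: Int, bodySize: Int) {
  // Share of the triples with the head predicate that the rule covers.
  def headCoverage: Double = support.toDouble / headSize
  // Share of the body matches for which the head holds as well.
  def confidence: Double = support.toDouble / bodySize
}

// Counts copied from a rule above whose head is czso:pocetVolnychMist.
val vacanciesRule = RuleCounts(support = 3, headSize = 801, bodySize = 3)
// Counts copied from a rule above whose head is
// czso:dosazitelniNeumisteniUchazeciOZamestnani.
val applicantsRule = RuleCounts(support = 2, headSize = 2403, bodySize = 2)

// Cube shape from the description at the top: 13 regions + 76 districts, years 2005-2013.
val areaCount = 13 + 76
val yearCount = 9
```

The head sizes also agree with the shape of the cube: 89 areas times 9 years gives 801 observations for one value of the sex dimension (the vacancy measure is only present for the total), and three values of sex give 2403.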

Rules mined with the ```oneCubeTwoMeasuresOneDimension``` pattern contain only the sex and refPeriod dimensions, probably because the support is too low for the area dimension to show up. Mining such rules does not contradict my "rules for building rules", because that single cube is effectively the "consequent" cube and may have free dimensions.

# Adding triples from YAGO 4

I used the public SPARQL endpoint at ```https://yago-knowledge.org/sparql/query```. I tried to download YAGO 4 and run it locally, but I have no machine at home that the whole dataset would fit on. The one-hop and two-hop queries returned so many triples that my browser froze every time, so I called the endpoint with curl, with the query written to a file:

```bash
curl -X POST -d @query.rq -H 'Content-Type: application/x-www-form-urlencoded' 'https://yago-knowledge.org/sparql/query' > result.xml
```
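
The same call can be made from Scala, which keeps the whole workflow in the notebook. This is only a sketch under assumptions: it uses the JDK 11+ `java.net.http` client, sends the query in a `query` form field as the SPARQL protocol describes, and the helper names (`formBody`, `buildRequest`, `fetchToFile`) and the `Accept` values are my own choices, not something the endpoint documentation was checked for.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.nio.file.Path

val endpoint = "https://yago-knowledge.org/sparql/query"

// Form body for a POST with a URL-encoded query in the "query" field.
def formBody(query: String): String =
  "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)

// Builds the POST request; `accept` is e.g. "text/turtle" for DESCRIBE
// or "application/sparql-results+xml" for SELECT (assumed to be honoured).
def buildRequest(query: String, accept: String): HttpRequest =
  HttpRequest.newBuilder(URI.create(endpoint))
    .header("Content-Type", "application/x-www-form-urlencoded")
    .header("Accept", accept)
    .POST(HttpRequest.BodyPublishers.ofString(formBody(query)))
    .build()

// Sends the request and writes the response body straight to a file,
// so a large result never has to be displayed or held in memory.
def fetchToFile(query: String, accept: String, target: Path): Path =
  HttpClient.newHttpClient()
    .send(buildRequest(query, accept), HttpResponse.BodyHandlers.ofFile(target))
    .body()
```

Calling `fetchToFile(queryText, "text/turtle", java.nio.file.Paths.get("result.ttl"))` would then play the role of the curl command above.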

Query for the regions of the Czech Republic


```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>

select distinct ?region where {
?region a yago:Regions_of_the_Czech_Republic
}
```

Query for the districts of the Czech Republic

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX schema: <http://schema.org/>

select distinct ?district where {
?region a yago:Regions_of_the_Czech_Republic; schema:containsPlace ?district.
?district a yago:Districts_of_the_Czech_Republic .
}
```

DESCRIBE query for the regions of the Czech Republic

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>

describe ?region where {
?region a yago:Regions_of_the_Czech_Republic
}
```

Data sample:

```turtle
yago:Prague <http://schema.org/logo> <http://commons.wikimedia.org/wiki/Special:FilePath/Logo%20Praha.jpg> ;
	<http://schema.org/url> "http://www.praha.eu/"^^xsd:anyURI ;
	<http://schema.org/subOrganization> yago:Prague_Conservatory , yago:Akademické_gymnázium_Štěpánská_Q10726782 .
```

In [None]:
val regions = Graph("yago", "../data/describe-region.ttl")
println("triples: " + regions.size)
regions.addPrefixes(Traversable(Prefix("schema", "http://schema.org/")))
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = regions.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

DESCRIBE query for the districts of the Czech Republic

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX schema: <http://schema.org/>

describe ?district where {
?region a yago:Regions_of_the_Czech_Republic; schema:containsPlace ?district.
?district a yago:Districts_of_the_Czech_Republic .
}
```

Data sample:

```turtle
yago:_Q3564604 schema:containsPlace yago:Frýdek-Místek_District .
yago:Moravian-Silesian_Region schema:containsPlace yago:Frýdek-Místek_District .
yago:Čeladná schema:containedInPlace yago:Frýdek-Místek_District ;
	schema:location yago:Frýdek-Místek_District .
```

In [None]:
val districts = Graph("yago", "../data/describe-district.ttl")
println("triples: " + districts.size)
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = districts.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

DESCRIBE query for the regions of the Czech Republic, hop 1

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>

describe ?hop1 where {
?region a yago:Regions_of_the_Czech_Republic .
?hop1 ?p ?region .
}
```

Data sample:

```xml
<rdf:Description rdf:about="http://yago-knowledge.org/resource/Agáta_Prachařová_Q10721001">
	<birthPlace xmlns="http://schema.org/" rdf:resource="http://yago-knowledge.org/resource/Prague"/>
	<parent xmlns="http://schema.org/" rdf:resource="http://yago-knowledge.org/resource/Veronika_Žilková"/>
	<givenName xmlns="http://schema.org/" rdf:resource="http://yago-knowledge.org/resource/Agáta_Q9110539"/>
```

In [None]:
val regionsHop1 = Graph("yago", "../data/describe-region-hop1.xml")
println("triples: " + regionsHop1.size)
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = regionsHop1.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

DESCRIBE query for the districts of the Czech Republic, hop 1

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX schema: <http://schema.org/>

describe ?hop1 where {
?region a yago:Regions_of_the_Czech_Republic; schema:containsPlace ?district .
?district a yago:Districts_of_the_Czech_Republic .
?hop1 ?p ?district .
}
```

Data sample:

```xml
<rdf:Description rdf:about="http://yago-knowledge.org/resource/Frýdek-Místek_District">
	<schema:containedInPlace rdf:resource="http://yago-knowledge.org/resource/_Q3564604"/>
</rdf:Description>

<rdf:Description rdf:about="http://yago-knowledge.org/resource/Přerov_District">
	<schema:containedInPlace rdf:resource="http://yago-knowledge.org/resource/_Q3564604"/>
</rdf:Description>
```

In [None]:
val districtsHop1 = Graph("yago", "../data/describe-district-hop1.xml")
println("triples: " + districtsHop1.size)
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = districtsHop1.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

DESCRIBE query for the regions of the Czech Republic, hop 2

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>

describe ?hop2 where {
?region a yago:Regions_of_the_Czech_Republic .
?hop1 ?p1 ?region .
?hop2 ?p2 ?hop1 .
}
```
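
What the hop queries select can be illustrated on a toy graph. The sketch below is plain Scala with made-up resource names (`regionA`, `townB`, ...); it only mirrors the patterns `?hop1 ?p ?region` and `?hop2 ?p2 ?hop1`, i.e. it follows incoming edges regardless of the predicate. Which triples DESCRIBE then returns for each selected resource is up to the endpoint and is not modelled here.

```scala
// Toy triple type with a hypothetical name, to avoid clashing with library classes.
final case class ToyTriple(s: String, p: String, o: String)

val toyGraph = Seq(
  ToyTriple("regionA", "type", "Regions_of_the_Czech_Republic"),
  ToyTriple("townB", "containedInPlace", "regionA"),
  ToyTriple("personC", "birthPlace", "townB"),
  ToyTriple("personD", "birthPlace", "elsewhere")
)

// Subjects of all triples whose object is one of the targets (any predicate).
def incoming(targets: Set[String], graph: Seq[ToyTriple]): Set[String] =
  graph.collect { case ToyTriple(s, _, o) if targets.contains(o) => s }.toSet

// ?region a yago:Regions_of_the_Czech_Republic
val toyRegions = toyGraph.collect {
  case ToyTriple(s, "type", "Regions_of_the_Czech_Republic") => s
}.toSet

// ?hop1 ?p1 ?region   and   ?hop2 ?p2 ?hop1
val toyHop1 = incoming(toyRegions, toyGraph)
val toyHop2 = incoming(toyHop1, toyGraph)
```

Each hop can fan out by orders of magnitude (every person born in every town of a region, and then everything that points at those people), which is in line with the result sizes I ran into.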

Data sample:

```turtle
<http://yago-knowledge.org/resource/Carl_Ferdinand_Cori> <http://schema.org/award> <http://yago-knowledge.org/resource/Banting_Medal> .
<http://yago-knowledge.org/resource/Carl_Ferdinand_Cori> <http://schema.org/award> <http://yago-knowledge.org/resource/honorary_doctorate_of_the_University_of_Granada_Q50610972> .
```

I had to split the "hop 2" data, for both regions and districts, into several files (see the notebook graph-divide.ipynb in the same repository), because a single file was too large to push to GitHub.
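
The idea of such a split can be sketched as follows. This is not the code from graph-divide.ipynb, just a minimal illustration in plain Scala with a hypothetical helper `splitIntoParts`; it assumes the file holds one complete triple per line (as in the sample above), because statements that span several lines would be cut apart.

```scala
// Splits lines into at most `parts` consecutive chunks of near-equal size.
def splitIntoParts(lines: Vector[String], parts: Int): Vector[Vector[String]] = {
  require(parts > 0, "parts must be positive")
  if (lines.isEmpty) Vector.empty
  else {
    // Round up so that no more than `parts` chunks are produced.
    val chunkSize = math.ceil(lines.size.toDouble / parts).toInt
    lines.grouped(chunkSize).toVector
  }
}

// Ten placeholder triples split into four parts.
val demoLines = Vector.tabulate(10)(i => s"<s$i> <p> <o$i> .")
val demoParts = splitIntoParts(demoLines, 4)
```

Writing each chunk to its own file and later adding the loaded graphs together again gives back the original set of triples.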

In [None]:
val regionsHop2_1 = Graph("yago", "../data/describe-region-hop2_1.ttl")
val regionsHop2_2 = Graph("yago", "../data/describe-region-hop2_2.ttl")
val regionsHop2_3 = Graph("yago", "../data/describe-region-hop2_3.ttl")
val regionsHop2_4 = Graph("yago", "../data/describe-region-hop2_4.ttl")
val regionsHop2 = Dataset() + regionsHop2_1 + regionsHop2_2 + regionsHop2_3 + regionsHop2_4
println("triples: " + regionsHop2.size)
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = regionsHop2.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

DESCRIBE query for the districts of the Czech Republic, hop 2

```sparql
PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX schema: <http://schema.org/>

describe ?hop2 where {
?region a yago:Regions_of_the_Czech_Republic; schema:containsPlace ?district .
?district a yago:Districts_of_the_Czech_Republic .
?hop1 ?p1 ?district .
?hop2 ?p2 ?hop1 .
}
```

Data sample:

```turtle
<http://yago-knowledge.org/resource/Řepiště> <http://schema.org/location> <http://yago-knowledge.org/resource/Frýdek-Místek_District> .
<http://yago-knowledge.org/resource/Kunčice_pod_Ondřejníkem> <http://schema.org/containedInPlace> <http://yago-knowledge.org/resource/Frýdek-Místek_District> .
```


In [None]:
val districtsHop2_1 = Graph("yago", "../data/describe-district-hop2_1.ttl")
val districtsHop2_2 = Graph("yago", "../data/describe-district-hop2_2.ttl")
val districtsHop2_3 = Graph("yago", "../data/describe-district-hop2_3.ttl")
val districtsHop2_4 = Graph("yago", "../data/describe-district-hop2_4.ttl")
val districtsHop2 = Dataset() + districtsHop2_1 + districtsHop2_2 + districtsHop2_3 + districtsHop2_4
println("triples: " + districtsHop2.size)
// Map from each predicate to the counts of its object types
val types: Map[TripleItem.Uri, Map[TripleItemType, Int]] = districtsHop2.types()
println("predicates:")
for ((k,v) <- types) print(k + "\t")

I am adding a file with triples that link the district and region entities between the ČSÚ data and YAGO. These triples could not be retrieved from anywhere with SPARQL: neither the ČSÚ nor the ČSSZ entities link to YAGO or to any other knowledge graph. Example triples:

```turtle
yago:Zlín_Region owl:sameAs <http://ruian.linked.opendata.cz/resource/vusc/141> .
yago:Uherské_Hradiště_District owl:sameAs <http://ruian.linked.opendata.cz/resource/okresy/3711> .
```
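
A hand-made linking file of this kind can be generated from a simple list of pairs. The sketch below is plain Scala; the names `manualLinks`, `sameAsLine` and `linkingTurtle` are hypothetical, and the two pairs are just the examples shown above, not the full mapping.

```scala
// YAGO local name -> IRI used in the cube (only the two pairs quoted above).
val manualLinks = Seq(
  "Zlín_Region" -> "http://ruian.linked.opendata.cz/resource/vusc/141",
  "Uherské_Hradiště_District" -> "http://ruian.linked.opendata.cz/resource/okresy/3711"
)

// One owl:sameAs statement in Turtle.
def sameAsLine(yagoLocalName: String, areaIri: String): String =
  s"yago:$yagoLocalName owl:sameAs <$areaIri> ."

val linkingPrefixes = Seq(
  "@prefix yago: <http://yago-knowledge.org/resource/> .",
  "@prefix owl: <http://www.w3.org/2002/07/owl#> ."
)

// Full Turtle document: prefix declarations followed by one line per pair.
val linkingTurtle =
  (linkingPrefixes ++ manualLinks.map { case (y, iri) => sameAsLine(y, iri) }).mkString("\n")
```

Keeping the pairs in one list makes it easy to check that every region and district of the cube has exactly one counterpart.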

In [None]:
val yagoLinking = Graph("yago", "../data/yagoLinking.ttl")
println("triples: " + yagoLinking.size)

In [None]:
val yagoDataset = (Dataset() + 
    regions + districts + 
    regionsHop1 + districtsHop1 + 
    districtsHop2 + regionsHop2 +
    yagoLinking
)

I filtered out of the dataset the triples with several predicates that would not be useful anyway; this cut the number of triples to 46%.

In [None]:
val rdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label"
val rdfsComment = "http://www.w3.org/2000/01/rdf-schema#comment"
val alternateName = "http://schema.org/alternateName"
val image = "http://schema.org/image"

val yagoDatasetFiltered = yagoDataset.
filter(q => !q.triple.predicate.hasSameUriAs(rdfsLabel) &&
                !q.triple.predicate.hasSameUriAs(rdfsComment) &&
                !q.triple.predicate.hasSameUriAs(alternateName) &&
                !q.triple.predicate.hasSameUriAs(image))

val ratio: Double = (yagoDatasetFiltered.size.toDouble / yagoDataset.size.toDouble)
println(yagoDatasetFiltered.size + " / " + yagoDataset.size + " = " + rounded(ratio,2)*100 + "%")
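
The same filter can be written against a set of predicates, which is easier to extend when more uninformative predicates turn up. This is a sketch in plain Scala over `(subject, predicate, object)` string tuples, not over the RDFRules dataset; `droppedPredicates`, `dropPredicates` and `keptPercent` are hypothetical names.

```scala
// Predicates considered uninformative for rule mining (same four as above).
val droppedPredicates = Set(
  "http://www.w3.org/2000/01/rdf-schema#label",
  "http://www.w3.org/2000/01/rdf-schema#comment",
  "http://schema.org/alternateName",
  "http://schema.org/image"
)

// Keeps only the triples whose predicate is not on the drop list.
def dropPredicates(triples: Seq[(String, String, String)]): Seq[(String, String, String)] =
  triples.filterNot { case (_, p, _) => droppedPredicates.contains(p) }

// Share of kept triples formatted as a whole-number percentage.
def keptPercent(kept: Long, total: Long): String =
  f"${100.0 * kept / total}%.0f%%"

// Placeholder triples: two are dropped, two are kept.
val toyTriples = Seq(
  ("s1", "http://www.w3.org/2000/01/rdf-schema#label", "o1"),
  ("s1", "http://schema.org/containedInPlace", "o2"),
  ("s2", "http://schema.org/image", "o3"),
  ("s2", "http://schema.org/birthPlace", "o4")
)
val toyKept = dropPredicates(toyTriples)
```

Formatting the percentage with a format string also avoids floating-point noise that can appear when a rounded ratio is multiplied by 100.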

In [None]:
val yagoJaurDataset = jaurDatasetFiltered + yagoDatasetFiltered
val yagoJaurIndex = yagoJaurDataset.index()
yagoJaurIndex.cache("../cache/yagoJaurIndex.cache")

The rule patterns for this task are similar to those for the task over the cube alone. Each pattern gains one atom from YAGO whose subject is linked, through the value of the area dimension, to the observations.

In [None]:
val rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

val oneCubeTwoMeasuresYagoPattern = (
    AtomPattern(subject = 'a', predicate = qbdPredicate, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = NoneOf(uri(rdfType)), graph = uri("yago")) &:
    AtomPattern(subject = 'a', `object` = 'b', graph = uri("czso")) &:
    AtomPattern(subject = 'a', predicate = oneOfMeasures, graph = uri("czso"))
    =>: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures, graph = uri("czso"))
)

val oneCubeOneMeasureYagoPattern = (
    AtomPattern(subject = 'a', predicate = qbdPredicate, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = NoneOf(uri(rdfType)), graph = uri("yago")) &:
    AtomPattern(subject = 'a', `object` = 'b', graph = uri("czso"))
    =>: 
    AtomPattern(subject = 'a', predicate = oneOfMeasures, graph = uri("czso"))
)

In [None]:
minSupport = 50
minConfidence = 0.75

In [None]:
val oneCubeTwoMeasuresYagoTask = Amie()
    .addThreshold(Threshold.MinSupport(minSupport))
    .addThreshold(Threshold.MaxRuleLength(5))
    .addThreshold(Threshold.MinHeadSize(0))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(oneCubeTwoMeasuresYagoPattern)

val oneCubeOneMeasureYagoTask = Amie()
    .addThreshold(Threshold.MinSupport(minSupport))
    .addThreshold(Threshold.MaxRuleLength(4))
    .addThreshold(Threshold.MinHeadSize(0))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(oneCubeOneMeasureYagoPattern)

In [None]:
val oneCubeTwoMeasuresYagoTaskRuleset = yagoJaurIndex.mine(oneCubeTwoMeasuresYagoTask)
val oneCubeOneMeasureYagoRuleset = yagoJaurIndex.mine(oneCubeOneMeasureYagoTask)

println("oneCubeTwoMeasuresYagoTaskRuleset size: " + oneCubeTwoMeasuresYagoTaskRuleset.size)
println("oneCubeOneMeasureYagoRuleset size: " + oneCubeOneMeasureYagoRuleset.size)

Six rules have a confidence higher than 0.75.

In [None]:
val jaurYagoRuleset = (oneCubeTwoMeasuresYagoTaskRuleset + oneCubeOneMeasureYagoRuleset)
.computeConfidence(minConfidence).sortBy(Measure.Confidence, Measure.HeadCoverage)
jaurYagoRuleset.export("../rulesets/jaurYagoRules.txt")
jaurYagoRuleset.foreach(rule => println("\n" + rule + "\n"))
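
The effect of the confidence threshold followed by the two-key sort can be sketched on plain values. This is not the RDFRules implementation: `MinedRule` and `keepAndRank` are hypothetical, the threshold is assumed to be inclusive, and the ordering is assumed to be descending by confidence with head coverage as the tie-breaker.

```scala
// Hypothetical stand-in for a mined rule with its two measures.
final case class MinedRule(name: String, confidence: Double, headCoverage: Double)

// Drops rules below the threshold, then ranks by confidence and head coverage.
def keepAndRank(rules: Seq[MinedRule], minConf: Double): Seq[MinedRule] =
  rules
    .filter(_.confidence >= minConf)
    .sortWith { (a, b) =>
      if (a.confidence != b.confidence) a.confidence > b.confidence
      else a.headCoverage > b.headCoverage
    }

// Placeholder rules with invented measures.
val toyRules = Seq(
  MinedRule("r1", 0.80, 0.01),
  MinedRule("r2", 0.70, 0.50),
  MinedRule("r3", 0.80, 0.02),
  MinedRule("r4", 1.00, 0.001)
)
val rankedToyRules = keepAndRank(toyRules, minConf = 0.75)
```

Here `r2` is dropped despite its high head coverage, and `r3` precedes `r1` only because the two tie on confidence.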

Only the last rule comes from the ```oneCubeTwoMeasuresYagoPattern``` pattern. The only predicate from the YAGO triples that appears here is `<http://schema.org/containedInPlace>`. The second rule, for example, is interesting: it states a relationship between two measures that holds for districts of the former North Bohemian Region ([https://www.wikidata.org/wiki/Q3509008](https://www.wikidata.org/wiki/Q3509008)). I could still have assembled by hand the triples saying which district belongs to which region, but I certainly would not have thought of the former regions. This is a rule that would not have been found without data from the knowledge graph.



In [None]:
minSupport = 55
minConfidence = 0.65

I tried to mine rules that have a triple from the knowledge graph in the head. I have not thought much about the structure of such rules yet, but I kept to the principle that an AK must not have free dimensions, and that any cube in a rule whose head is not a triple from a cube is an AK. I created two patterns, one for each cube structure:
* year x area
* year x area x sex
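
The "no free dimensions" condition amounts to a set difference. The sketch below is plain Scala with hypothetical names (`dims2D`, `dims3D`, `freeDimensions`) and assumes that AK stands for the antecedent cube, i.e. the cube whose observations appear in the rule body.

```scala
// Dimensions of the two cube structures listed above.
val dims2D = Set("refPeriod", "refArea")
val dims3D = dims2D + "sex"

// Dimensions of the cube that the rule body leaves unbound.
def freeDimensions(cubeDims: Set[String], boundInBody: Set[String]): Set[String] =
  cubeDims -- boundInBody

// A body that fixes the period and the area but says nothing about sex.
val boundByBody = Set("refPeriod", "refArea")
val freeIn2D = freeDimensions(dims2D, boundByBody)
val freeIn3D = freeDimensions(dims3D, boundByBody)
```

The same body is acceptable for a 2D cube but leaves `sex` free in a 3D cube, which is why the 3D pattern needs the extra atom for the sex dimension.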

In [None]:
val oneOf2Dcubes = OneOf(uri("jaurDistrictsTotal"),uri("jaurRegionsTotal"))
val oneOf3Dcubes = OneOf(uri("jaurDistrictsBySex"),uri("jaurRegionsBySex"))

val yago2DPattern = (
    AtomPattern(subject = 'b', predicate = qbdPredicate, `object` = oneOf2Dcubes,  graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = oneOfMeasures, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refPeriod), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refArea), `object` = 'a', graph = uri("czso"))
    =>: 
    AtomPattern(subject = AnyVariable, graph = uri("yago"))
)

val yago3DPattern = (
    AtomPattern(subject = 'b', predicate = qbdPredicate, `object` = oneOf3Dcubes,  graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = oneOfMeasures, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(sex), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refPeriod), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refArea), `object` = 'a', graph = uri("czso"))
    =>: 
    AtomPattern(subject = AnyVariable, graph = uri("yago"))
)

For this *mining task* I had to set a timeout (in minutes), because without it the run failed with an ```OutOfMemoryError``` and I got no rules.

In [None]:
val yagoClosedDimensionsTask = Amie()
    .addThreshold(Threshold.MinSupport(minSupport))
    .addThreshold(Threshold.MaxRuleLength(10))
    .addThreshold(Threshold.MinHeadSize(0))
    .addThreshold(Threshold.Timeout(10))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(yago2DPattern)
    .addPattern(yago3DPattern)

In [None]:
val yagoClosedDimensionsTaskRuleset = yagoJaurIndex.mine(yagoClosedDimensionsTask)
println("yagoClosedDimensionsTaskRuleset size: " + yagoClosedDimensionsTaskRuleset.size)

It found some rules, but they are all almost the same: every one of them has the rdf:type predicate in the head. The first 10 are shown.

In [None]:
val yagoClosedDimensionsTaskRulesetFiltered = yagoClosedDimensionsTaskRuleset
.slice(0,10)
.computePcaConfidence(minConfidence)
.computeLift()
yagoClosedDimensionsTaskRulesetFiltered.foreach(rule => println("\n" + rule + "\n"))

I tried forbidding rdf:type in the head of the rule patterns ...

In [None]:
val yago2DNoTypePattern = (
    AtomPattern(subject = 'b', predicate = qbdPredicate, `object` = oneOf2Dcubes,  graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = oneOfMeasures, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refPeriod), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refArea), `object` = 'a', graph = uri("czso"))
    =>: 
    AtomPattern(subject = AnyVariable, predicate = NoneOf(uri(rdfType)),  graph = uri("yago"))
)

val yago3DNoTypePattern = (
    AtomPattern(subject = 'b', predicate = qbdPredicate, `object` = oneOf3Dcubes,  graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = oneOfMeasures, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(sex), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refPeriod), `object` = AnyConstant, graph = uri("czso")) &:
    AtomPattern(subject = 'b', predicate = uri(refArea), `object` = 'a', graph = uri("czso"))
    =>: 
    AtomPattern(subject = AnyVariable, predicate = NoneOf(uri(rdfType)),  graph = uri("yago"))
)

and setting the support threshold to its minimum ...

In [None]:
val yagoClosedDimensionsNoTypeTask = Amie()
    .addThreshold(Threshold.MinSupport(0))
    .addThreshold(Threshold.MaxRuleLength(10))
    .addThreshold(Threshold.MinHeadSize(0))
    .addThreshold(Threshold.Timeout(10))
    .addConstraint(constantsOnlyAtObject)
    .addPattern(yago2DNoTypePattern)
    .addPattern(yago3DNoTypePattern)

but it found no rules. I blame the data.

In [None]:
val yagoClosedDimensionsNoTypeTaskRuleset = yagoJaurIndex.mine(yagoClosedDimensionsNoTypeTask)
println("yagoClosedDimensionsNoTypeTaskRuleset size: " + yagoClosedDimensionsNoTypeTaskRuleset.size)