## Import into Allegrograph

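This notebook describes the steps for importing data into your Allegrograph repository.

Unless stated otherwise, the SPARQL queries are to be executed on the Allegrograph SPARQL endpoint.

First we check the basic properties of the population: name, sex, year of birth. Because this first query calls the Wikidata label service directly, without a federated `SERVICE <https://query.wikidata.org/sparql>` clause, it has to be run on the Wikidata Query Service.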

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?item ?itemLabel ?gender ?year WHERE {
  { ?item wdt:P106 wd:Q635734 }  # Occupation: archivist

  ?item wdt:P31 wd:Q5;  # Any instance of a human
    wdt:P21 ?gender;  # Sex or gender
    wdt:P569 ?birthDate.

  # Note: ?country is not linked to ?item, so this clause does not restrict the results;
  # adding "?item wdt:P27 ?country." (country of citizenship) would.
  VALUES ?country { wd:Q142 wd:Q31 wd:Q39 wd:Q16 }  # France, Belgium, Switzerland, Canada

  # Extract the four-digit year from the birth date (the result is a string)
  BIND(REPLACE(str(?birthDate), "(.*)([0-9]{4})(.*)", "$2") AS ?year)

  FILTER(xsd:integer(?year) > 1850 && xsd:integer(?year) < 1990)

  BIND(?itemLabel AS ?itemLabel)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
LIMIT 20
```
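
The regular expression above keeps the last occurrence of four consecutive digits in the string form of the birth date and returns it as a string, which is why the FILTER casts `?year` with `xsd:integer()`. The sketch below, runnable on any SPARQL 1.1 endpoint, compares that extraction with the standard `YEAR()` function on two illustrative dates (sample values, not Wikidata data). It is a suggested check, not the method used in this notebook.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Compare the regex-based extraction with the standard YEAR() function
SELECT ?birthDate ?yearFromRegex ?yearFromFunction
WHERE {
    # Illustrative sample dates, not taken from Wikidata
    VALUES ?birthDate {
        "1901-05-03T00:00:00Z"^^xsd:dateTime
        "1875-11-20T00:00:00Z"^^xsd:dateTime
    }
    # The greedy (.*) makes $2 the last run of four digits: a plain string
    BIND(REPLACE(str(?birthDate), "(.*)([0-9]{4})(.*)", "$2") AS ?yearFromRegex)
    # YEAR() returns an xsd:integer, so no cast would be needed in a FILTER
    BIND(YEAR(?birthDate) AS ?yearFromFunction)
}
```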

### Preparing to import data

* Here we use a CONSTRUCT query to prepare the triples for import into a graph database.
* We limit the test to a few rows to avoid displaying thousands of them.
* Inspect and check the generated triples.
* Reuse Wikidata properties where possible.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
    ?item rdfs:label ?itemLabel.
    ?item wdt:P21 ?gender.
    ?item wdt:P569 ?year.
    ?item wdt:P31 wd:Q5.
}
WHERE {
  SERVICE <https://query.wikidata.org/sparql>
  {
    { ?item wdt:P106 wd:Q635734 }  # Occupation: archivist

    ?item wdt:P31 wd:Q5;  # Any instance of a human
      wdt:P569 ?birthDate;
      wdt:P21 ?gender.  # Sex or gender

    # ?country is not linked to ?item: this clause does not restrict the results
    VALUES ?country { wd:Q142 wd:Q31 wd:Q39 wd:Q16 }  # France, Belgium, Switzerland, Canada

    BIND(REPLACE(str(?birthDate), "(.*)([0-9]{4})(.*)", "$2") AS ?year)

    FILTER(xsd:integer(?year) > 1850 && xsd:integer(?year) < 1990)

    BIND(?itemLabel AS ?itemLabel)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
  }
}
LIMIT 20
```

### Import the triples into a dedicated graph

Two import strategies are possible:
* through a federated query
  * the query can be executed in a SPARQL notebook
  * or directly on the Allegrograph server, if it takes too much time to run from the notebook or does not work there
* through export from Wikidata and import into Allegrograph
  * execute a CONSTRUCT query with the complete data on the Wikidata Query Service (without the outer SERVICE clause and the LIMIT clause) and export the result in Turtle format (suffix: .ttl)
  * then import the file into Allegrograph with its import functionality
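
For the second strategy, the full-export query could look like the sketch below, to be run at https://query.wikidata.org. It reuses the WHERE clause of this notebook's queries, drops the outer `SERVICE` wrapper and the `LIMIT`, and emits `rdf:type` as the INSERT query does. The unlinked `VALUES ?country` clause is left out, since it does not restrict the results. As an assumption on our part, the label is bound with the label service's explicit (manual) mode, `?item rdfs:label ?itemLabel` inside `SERVICE wikibase:label`, instead of the `BIND` workaround. The Wikidata Query Service enforces a query timeout, so very large exports may need to be split.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Run on the Wikidata Query Service itself: no outer SERVICE wrapper, no LIMIT
CONSTRUCT {
    ?item rdfs:label ?itemLabel.
    ?item wdt:P21 ?gender.
    ?item wdt:P569 ?year.      # the extracted year (a string), as in the INSERT query
    ?item rdf:type wd:Q5.
}
WHERE {
    ?item wdt:P106 wd:Q635734;  # occupation: archivist
          wdt:P31 wd:Q5;        # instance of: human
          wdt:P569 ?birthDate;
          wdt:P21 ?gender.      # sex or gender

    BIND(REPLACE(str(?birthDate), "(.*)([0-9]{4})(.*)", "$2") AS ?year)
    FILTER(xsd:integer(?year) > 1850 && xsd:integer(?year) < 1990)

    # Explicit (manual) mode of the label service: binds ?itemLabel directly
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "fr".
        ?item rdfs:label ?itemLabel.
    }
}
```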


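In all cases, enable duplicate suppression of type SPOG in Allegrograph (menu: Repository control -> Manage duplicates -> Duplicate suppression type), so that a quad with the same subject, predicate, object and graph is stored only once, even if an import is run twice.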


The graph URI is in fact a URL pointing to a page that describes the [imported data](../graphs/wikidata-imported-data.md).

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

INSERT {
    # Note that the data is imported into a named graph, not the DEFAULT one
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        ?item rdfs:label ?itemLabel.
        ?item wdt:P21 ?gender.
        # Stores the extracted year (a string), not the full birth date
        ?item wdt:P569 ?year.
        # ?item wdt:P31 wd:Q5.
        # changed to use the standard rdf:type property
        ?item rdf:type wd:Q5.
    }
}
WHERE {
    SERVICE <https://query.wikidata.org/sparql>
    {
        { ?item wdt:P106 wd:Q635734 }  # Occupation: archivist

        ?item wdt:P31 wd:Q5;  # Any instance of a human
            wdt:P569 ?birthDate;
            wdt:P21 ?gender.  # Sex or gender

        # ?country is not linked to ?item: this clause does not restrict the results
        VALUES ?country { wd:Q142 wd:Q31 wd:Q39 wd:Q16 }  # France, Belgium, Switzerland, Canada

        BIND(REPLACE(str(?birthDate), "(.*)([0-9]{4})(.*)", "$2") AS ?year)

        FILTER(xsd:integer(?year) > 1850 && xsd:integer(?year) < 1990)

        BIND(?itemLabel AS ?itemLabel)
        SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
    }
}
```

#### Fix if the previous query was run with wdt:P31 instead of rdf:type

* rdf:type explicitly states that wd:Q5 is an RDF type, and therefore in effect a class
* note that this query must be executed DIRECTLY on the Allegrograph server
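
Before running the correction, a simple `ASK` query on the Allegrograph endpoint can tell whether it is needed: it returns `true` if some persons in the graph are still typed with `wdt:P31`. This is a suggested check, not part of the original workflow.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# true: some persons are still typed with wdt:P31, so the correction is needed
# false: nothing to correct
ASK {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md> {
        ?item wdt:P31 wd:Q5.
    }
}
```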

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# WITH makes DELETE, INSERT and WHERE act on the named graph;
# without it, the DELETE and INSERT templates would target the default graph
WITH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
DELETE { ?item wdt:P31 wd:Q5 }
INSERT { ?item rdf:type wd:Q5 }
WHERE {
    ?item wdt:P31 wd:Q5.
}
```

#### Add a label to the Person class


```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        wd:Q5 rdfs:label "Person".
    }
}
```

### Add the gender class

```sparql
### Inspect the genders:
# number of distinct genders

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(*) AS ?n)
WHERE
   {
   SELECT DISTINCT ?gender
   WHERE {
      GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
         {
            ?s wdt:P21 ?gender.
         }
      }
   }
```

```sparql
### Insert the class 'gender identity' (wd:Q48264) for all genders
# Please note that strictly speaking Wikidata has no ontology,
# and therefore no classes. We add this for our convenience.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

WITH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
INSERT {
   ?gender rdf:type wd:Q48264.
}
WHERE
   {
   SELECT DISTINCT ?gender
   WHERE {
         {
            ?s wdt:P21 ?gender.
         }
      }
   }
```

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        wd:Q48264 rdfs:label "Gender Identity".
    }
}
```
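
Once the `wd:Q5` and `wd:Q48264` classes and their labels are in place, a query along these lines lists each class used in the graph with its label, if any, and its number of instances. It is a suggested verification step that only relies on the named graph used throughout this notebook.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Each class used in the graph, its label (if any) and its number of instances
SELECT ?class ?classLabel (COUNT(?instance) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md> {
        ?instance rdf:type ?class.
        OPTIONAL { ?class rdfs:label ?classLabel }
    }
}
GROUP BY ?class ?classLabel
ORDER BY DESC(?n)
```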

### Verify imported triples and add labels to genders

```sparql
### Number of triples in the graph
SELECT (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s ?p ?o}
}
```
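
To see where those triples come from, the same count can be broken down by predicate. This is a suggested complement to the total, using the same graph.

```sparql
# Number of triples per predicate in the imported graph
SELECT ?p (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s ?p ?o}
}
GROUP BY ?p
ORDER BY DESC(?n)
```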

```sparql
### Persons with more than one label: no person found
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s rdfs:label ?o}
}
GROUP BY ?s
# Standard SPARQL: HAVING repeats the aggregate rather than using the ?n alias
HAVING (COUNT(*) > 1)
```

### Explore the genders

```sparql
### Persons having more than one gender
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?s (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s wdt:P21 ?gen}
}
GROUP BY ?s
HAVING (COUNT(*) > 1)
```
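
To see which values are involved for those persons, `GROUP_CONCAT` can list the genders next to the count. A sketch, using the same graph:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Persons with more than one gender, with the list of their genders
SELECT ?s (COUNT(?gen) AS ?n) (GROUP_CONCAT(STR(?gen); SEPARATOR=", ") AS ?genders)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s wdt:P21 ?gen}
}
GROUP BY ?s
HAVING (COUNT(?gen) > 1)
```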

```sparql
### Number of persons per gender
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?gen (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {?s wdt:P21 ?gen}
}
GROUP BY ?gen
#HAVING (?n > 1)
```

```sparql
### Number of persons per gender, born before 1900
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?gen (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        ?s wdt:P21 ?gen;
           wdt:P569 ?birthDate.
        # ?birthDate holds a four-digit year string, so a string comparison works here
        FILTER (?birthDate < '1900')
    }
}
GROUP BY ?gen
#HAVING (COUNT(*) > 1)
```
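
A finer breakdown by birth decade can be sketched as follows. It assumes, as in the query above, that `wdt:P569` holds the four-digit year string produced by the INSERT query rather than a full date; with full dates, the `xsd:integer()` cast would fail and `?decade` would stay unbound.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Number of persons per birth decade and gender
SELECT ?decade ?gen (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md> {
        ?s wdt:P21 ?gen;
           wdt:P569 ?birthDate.
    }
    # "1901" -> 1901 -> 190.1 -> FLOOR 190 -> 1900
    BIND(xsd:integer(FLOOR(xsd:integer(?birthDate) / 10) * 10) AS ?decade)
}
GROUP BY ?decade ?gen
ORDER BY ?decade ?gen
```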

```sparql
### Retrieve the gender labels from Wikidata (preview with SELECT)

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?gen ?genLabel
WHERE {
    {
        SELECT DISTINCT ?gen
        WHERE {
            GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
                {?s wdt:P21 ?gen}
        }
    }

    SERVICE <https://query.wikidata.org/sparql> {
        ## Add these clauses in order to fill the variables
        BIND(?gen AS ?gen)
        BIND(?genLabel AS ?genLabel)
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
}
```

```sparql
### Preview the gender labels as triples (CONSTRUCT)

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
    ?gen rdfs:label ?genLabel
}
WHERE {
    {
        SELECT DISTINCT ?gen
        WHERE {
            GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
                {?s wdt:P21 ?gen}
        }
    }

    SERVICE <https://query.wikidata.org/sparql> {
        ## Add these clauses in order to fill the variables
        BIND(?gen AS ?gen)
        BIND(?genLabel AS ?genLabel)
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
}
```

```sparql
### Add the labels to the genders (INSERT)

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

WITH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
INSERT {
    ?gen rdfs:label ?genLabel
}
WHERE {
    {
        SELECT DISTINCT ?gen
        WHERE {
            GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
                {?s wdt:P21 ?gen}
        }
    }

    SERVICE <https://query.wikidata.org/sparql> {
        ## Add these clauses in order to fill the variables
        BIND(?gen AS ?gen)
        BIND(?genLabel AS ?genLabel)
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
}
```
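
To check that the INSERT query labelled every gender, the following sketch returns the genders of the graph that still lack an `rdfs:label`; an empty result means that all of them are labelled.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Genders used in the graph that have no label in the same graph
SELECT DISTINCT ?gen
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md> {
        ?s wdt:P21 ?gen.
        FILTER NOT EXISTS { ?gen rdfs:label ?label }
    }
}
```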

```sparql
### Verify the data insertion, using only Allegrograph data

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?gen ?genLabel ?n
WHERE
{
    {
        SELECT ?gen (COUNT(*) AS ?n)
        WHERE {
            GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
            {
                ?s wdt:P21 ?gen.
            }
        }
        GROUP BY ?gen
    }
    ?gen rdfs:label ?genLabel
}
```

### Prepare data to analyse

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label ?birthDate ?gen
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        ?s wdt:P21 ?gen;
           rdfs:label ?label;
           wdt:P569 ?birthDate.
    }
}
ORDER BY ?birthDate
LIMIT 10
```

```sparql
### Number of persons

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT (COUNT(*) AS ?n)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        # ?s wdt:P31 wd:Q5
        ?s a wd:Q5
    }
}
```

```sparql
### Persons, with an arbitrary choice of value (here MAX) for multi-valued variables

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# The aliases must differ from the variables already used in the WHERE clause
SELECT ?s (MAX(?label) AS ?name) (xsd:integer(MAX(?birthDate)) AS ?birthYear) (MAX(?gen) AS ?gender)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        ?s wdt:P21 ?gen;
           rdfs:label ?label;
           wdt:P569 ?birthDate.
    }
}
GROUP BY ?s
LIMIT 10
```
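
`MAX` keeps a single value per variable. If you would rather see every value, `GROUP_CONCAT` can keep them all in one cell, while `SAMPLE` makes the arbitrary choice explicit. A sketch with the same graph and variables (the output names `?name`, `?birthYears` and `?genders` are our own choice):

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One row per person: one label picked arbitrarily, all birth years and genders kept
SELECT ?s
       (SAMPLE(?label) AS ?name)
       (GROUP_CONCAT(DISTINCT STR(?birthDate); SEPARATOR=" | ") AS ?birthYears)
       (GROUP_CONCAT(DISTINCT STR(?gen); SEPARATOR=" | ") AS ?genders)
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        ?s wdt:P21 ?gen;
           rdfs:label ?label;
           wdt:P569 ?birthDate.
    }
}
GROUP BY ?s
LIMIT 10
```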

```sparql
### Number of persons with basic properties, without duplicates (arbitrary choice)

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (COUNT(*) AS ?n)
WHERE {
    SELECT ?s (MAX(?label) AS ?name) (xsd:integer(MAX(?birthDate)) AS ?birthYear) (MAX(?gen) AS ?gender)
    WHERE {
        GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
        {
            ?s wdt:P21 ?gen;
               rdfs:label ?label;
               wdt:P569 ?birthDate.
        }
    }
    GROUP BY ?s
}
```

```sparql
### Add the label for the property "date of birth"

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT DATA {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        wdt:P569 rdfs:label "date of birth"
    }
}
```

```sparql
### Add the label for the property "sex or gender"

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT DATA {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md>
    {
        wdt:P21 rdfs:label "sex or gender"
    }
}
```
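
To check the property labels, a query like the following lists each predicate used in the graph together with its label, if any. Predicates that were not given a label (for instance `rdf:type` or `rdfs:label` itself) come back with an empty `?label`. This is a suggested verification, not part of the original workflow.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Each predicate used in the imported graph, with its label when one exists
SELECT DISTINCT ?p ?label
WHERE {
    GRAPH <https://github.com/mroylem/archivist/blob/main/Wikidata/graph/imported-data.md> {
        ?s ?p ?o.
        OPTIONAL { ?p rdfs:label ?label }
    }
}
ORDER BY ?p
```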