# Analysis and Visualization

This notebook visualizes individual memes and their relations, then computes summary statistics over the combined meme graph.

## 0. Setup

In [8]:
import os
import os.path

from kgtk.configure_kgtk_notebooks import ConfigureKGTK
from kgtk.functions import kgtk, kypher

In [9]:
# Parameters

# Local folders for the input graph (input_path) and for the output and temporary files (output_path):
input_path = "wikidata"
output_path = "projects"
project_name = "tutorial-kypher"

In [10]:
big_files = ["label"]

additional_files = {
    "P31": "derived.P31.tsv.gz",
    "items": "claims.wikibase-item.tsv.gz",
    "P1963": "derived.P1963computed.count.star.tsv.gz",
    "external": "claims.external-id.tsv.gz",
    "indegree": "metadata.in_degree.tsv.gz",
    "outdegree": "metadata.out_degree.tsv.gz",
    "pagerank": "metadata.pagerank.directed.tsv.gz"
}

ck = ConfigureKGTK(big_files)
ck.configure_kgtk(input_graph_path=input_path, 
                  output_path=output_path, 
                  project_name=project_name,
                  additional_files=additional_files)

User home: /Users/filipilievski
Current dir: /Users/filipilievski/mcs/imkg
KGTK dir: /Users/filipilievski/mcs
Use-cases dir: /Users/filipilievski/mcs/use-cases


In [11]:
ck.print_env_variables()

OUT: projects/tutorial-kypher
KGTK_LABEL_FILE: wikidata/labels.en.tsv.gz
GRAPH: wikidata
KGTK_GRAPH_CACHE: projects/tutorial-kypher/temp.tutorial-kypher/wikidata.sqlite3.db
kgtk: kgtk
USE_CASES_DIR: /Users/filipilievski/mcs/use-cases
EXAMPLES_DIR: /Users/filipilievski/mcs/examples
KGTK_OPTION_DEBUG: false
STORE: projects/tutorial-kypher/temp.tutorial-kypher/wikidata.sqlite3.db
kypher: kgtk query --graph-cache projects/tutorial-kypher/temp.tutorial-kypher/wikidata.sqlite3.db
TEMP: projects/tutorial-kypher/temp.tutorial-kypher
label: wikidata/labels.en.tsv.gz
P31: wikidata/derived.P31.tsv.gz
items: wikidata/claims.wikibase-item.tsv.gz
P1963: wikidata/derived.P1963computed.count.star.tsv.gz
external: wikidata/claims.external-id.tsv.gz
indegree: wikidata/metadata.in_degree.tsv.gz
outdegree: wikidata/metadata.out_degree.tsv.gz
pagerank: wikidata/metadata.pagerank.directed.tsv.gz
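The file variables printed above look like the `additional_files` names joined onto `input_path`. A minimal sketch of that mapping follows; it only mirrors the printed values and is not `ConfigureKGTK`'s actual implementation (the helper `resolve_files` is hypothetical):

```python
import posixpath


def resolve_files(input_path, files):
    """Join each file name onto the input folder (hypothetical helper)."""
    return {name: posixpath.join(input_path, fname) for name, fname in files.items()}


# A subset of the additional_files dictionary defined in the setup cell.
additional_files = {
    "P31": "derived.P31.tsv.gz",
    "items": "claims.wikibase-item.tsv.gz",
    "pagerank": "metadata.pagerank.directed.tsv.gz",
}

resolved = resolve_files("wikidata", additional_files)
for name, path in resolved.items():
    print(f"{name}: {path}")
```

Each resolved name is then usable as a shell variable such as `$P31` in the `kgtk` commands.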


## 1. Visualize selected relations for the Distracted-Boyfriend meme

In [5]:
!kgtk query -i $TEMP/templates.kgtk.gz \
            --match '(:`kym:distracted-boyfriend`)-[r]->()' \
            --where 'r.label in ["kym:parent", "kym:child", "kym:year", "rdf:type", "m4s:fromAbout", "m4s:fromTags", "m4s:fromImage"]' \
            -o $TEMP/db_subject.kgtk.gz
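The query above keeps the outgoing edges of one node whose relation is in a fixed list. The same selection can be sketched in plain Python over a toy edge list; the edges and the `ex:` values below are invented placeholders, and `outgoing_edges` is a hypothetical helper, not part of KGTK:

```python
# Relations kept by the --where clause of the query.
KEEP = {
    "kym:parent", "kym:child", "kym:year", "rdf:type",
    "m4s:fromAbout", "m4s:fromTags", "m4s:fromImage",
}

# Toy (node1, label, node2) edges; all values are made up for illustration.
edges = [
    ("kym:distracted-boyfriend", "kym:parent", "ex:some-parent"),
    ("kym:distracted-boyfriend", "kym:tag", "ex:some-tag"),
    ("kym:distracted-boyfriend", "rdf:type", "kym:Meme"),
    ("ex:other-meme", "kym:parent", "ex:some-parent"),
]


def outgoing_edges(edges, subject, labels):
    """Return edges that start at `subject` and whose label is in `labels`."""
    return [e for e in edges if e[0] == subject and e[1] in labels]


selected = outgoing_edges(edges, "kym:distracted-boyfriend", KEEP)
for edge in selected:
    print("\t".join(edge))
```

The `kym:tag` edge is dropped because its label is not in the list, and the last edge is dropped because it starts at a different node.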

Create the node file, which maps each target node of the selected edges to its label:

In [6]:
!kgtk query -i $TEMP/labelfile.kgtk.gz -i $TEMP/db_subject.kgtk.gz \
            --match 'db: ()-[]->(n), \
                label: (n)-[r]->(l)' \
            --return 'n as id, l as label' / deduplicate \
            -o $TEMP/nodefile.kgtk.gz 
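The node-file query is a join: every target node of the selected edges is matched against the label file, and duplicate rows are removed. A rough Python sketch of that logic, using invented toy data and a hypothetical helper `build_node_file`:

```python
# Toy inputs; all identifiers and labels are placeholders.
subject_edges = [
    ("kym:distracted-boyfriend", "rdf:type", "ex:a"),
    ("kym:distracted-boyfriend", "m4s:fromTags", "ex:b"),
    ("kym:distracted-boyfriend", "m4s:fromAbout", "ex:b"),
    ("kym:distracted-boyfriend", "kym:parent", "ex:unlabeled"),
]
label_edges = [
    ("ex:a", "label", "Label A"),
    ("ex:b", "label", "Label B"),
    ("ex:c", "label", "Label C"),
]


def build_node_file(subject_edges, label_edges):
    """Join target nodes with their labels and drop duplicate rows."""
    targets = {node2 for _, _, node2 in subject_edges}
    rows = {(node, label) for node, _, label in label_edges if node in targets}
    return sorted(rows)


node_rows = build_node_file(subject_edges, label_edges)
print("id\tlabel")
for node, label in node_rows:
    print(f"{node}\t{label}")
```

As in the query, this is an inner join, so a target node without an entry in the label file (`ex:unlabeled` here) does not appear in the node file.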

In [9]:
kgtk("""visualize-graph 
        -i $TEMP/db_subject.kgtk.gz
        --node-file $TEMP/nodefile.kgtk.gz
        --show-text above
        --tooltip-column label
        --direction arrow
        --edge-color-column label
        --edge-color-style d3.schemeDark2
        -o distracted.graph.html""")

## 2. Visualize children and parents for the TLDR meme

In [16]:
!kgtk query -i $TEMP/templates.kgtk.gz \
            --match '(n)-[r]->(n2)' \
            --where 'r.label in ["kym:parent", "kym:child"] and (n="kym:tldr" or n2="kym:tldr")' \
            -o $TEMP/db_subject.kgtk.gz

In [19]:
kgtk("""visualize-graph 
        -i $TEMP/db_subject.kgtk.gz
        --show-text above
        --tooltip-column label
        --direction arrow
        --edge-color-column label
        --edge-color-style d3.schemeDark2
        -o tldr.graph.html""")

## 3. Visualize all sibling relations

In [20]:
!kgtk query -i $TEMP/templates.kgtk.gz \
            --match '()-[r:`kym:sibling`]->()' \
            -o $TEMP/siblings.kgtk.gz

In [22]:
kgtk("""visualize-graph 
        -i $TEMP/siblings.kgtk.gz
        --direction arrow
        -o sibling.graph.html""")

## 4. Analyze graph

In [27]:
!kgtk cat -i $TEMP/templates.kgtk.gz \
          -i $TEMP/wikidata_memes.kgtk.gz \
          -i $TEMP/wikidata_ent.kgtk.gz \
          -o $TEMP/templates_with_wd.kgtk.gz

In [28]:
!kgtk graph-statistics \
     -i $TEMP/templates_with_wd.kgtk.gz \
     --log-file $TEMP/meme_summary.txt \
     --output-statistics-only \
     -o $TEMP/meme_stats.tsv




In [35]:
!cat $TEMP/meme_summary.txt

graph loaded! It has 32275 nodes and 245720 edges

*** Top relations:
kym:sibling	111053
m4s:fromImage	53794
kym:tag	11050
m4s:fromAbout	5403
P31	4419
m4s:fromTags	3733
kym:spread	3070
m4s:structured_uri	2812
m4s:structured_value	2812
P279	2656

*** Degrees:
in degree stats: mean=7.613323, std=0.212806, max=1
out degree stats: mean=7.613323, std=0.181539, max=1
total degree stats: mean=15.226646, std=0.329010, max=1

*** PageRank
Max pageranks
28886	http://www.w3.org/2001/XMLSchema#timestamp	0.017156
4367	Q336	0.012597
2089	Q30	0.010275
18832	Q151885	0.008613
21912	Q11862829	0.007663

*** HITS
HITS hubs
570	kym:Meme	0.087287
487	0.5	0.083497
556	Q2927074	0.075007
561	Q4868296	0.070277
1159	kym:lolspeak-chanspeak	0.064123
HITS auth
1112	kym:fap	0.076770
1094	kym:derp	0.076497
1083	kym:cool-story-bro	0.075991
1252	kym:verbose-classy-memes	0.075928
1178	kym:noice	0.075895


What are the most common relations?

In [34]:
!kgtk query -i $TEMP/templates_with_wd.kgtk.gz \
    --match '(n)-[r]->()' \
    --return 'r.label, count(n) as c' \
    --order-by 'c desc' \
    -o $TEMP/rel_stats.tsv
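The query groups edges by relation label and counts the rows in each group, so `count(n)` here counts edges, not distinct subject nodes. An equivalent sketch in Python with `collections.Counter`, using an invented toy edge list:

```python
from collections import Counter

# Toy (node1, label, node2) edges; values are placeholders.
edges = [
    ("ex:m1", "kym:sibling", "ex:m2"),
    ("ex:m2", "kym:sibling", "ex:m1"),
    ("ex:m1", "kym:sibling", "ex:m3"),
    ("ex:m1", "kym:tag", "ex:t1"),
    ("ex:m3", "P31", "ex:c1"),
    ("ex:m2", "kym:tag", "ex:t1"),
]

# Count edges per relation label, most frequent first.
relation_counts = Counter(label for _, label, _ in edges)
for label, count in relation_counts.most_common():
    print(f"{label}\t{count}")
```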

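Let's get the nodes with the highest in-degree. Note that the degree values are compared as strings here, so the ordering in the output is lexicographic (963 appears after 97) and the `--limit 10` cut is not necessarily the true numeric top 10: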

In [38]:
!kgtk query -i $TEMP/meme_stats.tsv \
    --match '(n1)-[:vertex_in_degree]->(n2)' \
    --return 'n1 as node1, printf("(%d),", n2) as node2' \
    --limit 10 \
    --order-by 'n2 desc'

node1	node2
Q277421	(99),
Q9633	(99),
Q42602	(99),
Q712378	(98),
youtube	(97),
Q4868296	(963),
Q42586	(96),
Q56	(95),
catchphrase	(95),
kymt:slang	(94),
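In the output above, 963 is listed after 97, which is what a string comparison produces. The sketch below re-sorts the ten displayed rows both ways to show the difference; it only uses the rows shown above, so it cannot recover nodes that the string ordering pushed out of the top 10:

```python
# The ten (node, in-degree) rows exactly as displayed, degrees kept as strings.
rows = [
    ("Q277421", "99"), ("Q9633", "99"), ("Q42602", "99"),
    ("Q712378", "98"), ("youtube", "97"), ("Q4868296", "963"),
    ("Q42586", "96"), ("Q56", "95"), ("catchphrase", "95"),
    ("kymt:slang", "94"),
]

# String comparison: "97" > "963" because "7" > "6" at the second character.
lexicographic = sorted(rows, key=lambda r: r[1], reverse=True)

# Numeric comparison: convert the degree to an integer before sorting.
numeric = sorted(rows, key=lambda r: int(r[1]), reverse=True)

print("string order reproduces the output:", lexicographic == rows)
print("numeric top entry:", numeric[0])
```

In the query itself, converting the value to an integer inside `--order-by` (for example with a cast) should give the numeric ordering; that variant was not run here.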


## 5. Other stuff

Issue: how are the instances and the templates linked?