# S-Feature Analysis
This is an optional notebook where we will go through the steps of creating random walks using the S-path method. These can be used as a feature when learning HAS-embeddings in the HAS_entity_embeddings notebook. We will consider different implementation decisions and look at the results of using this feature for learning embeddings.

*What is this feature?* --> These random walks are intended to detect structural similarity, i.e., entities whose neighbors have similar types are treated as similar.

## Pre-requisite steps to run this notebook
1. You need to run the 1_candidate_label_creation notebook before this notebook.
2. gensim is a dependency. You can install it with `pip install --upgrade gensim`, or if you want to use Anaconda, `conda install -c conda-forge gensim`
3. faiss is a dependency. You can install it with `conda install -c pytorch faiss-cpu`

In [22]:
import os
import random
import numpy as np
import pandas as pd
from gensim.models import Word2Vec
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm
import json
from sklearn.preprocessing import MinMaxScaler
from collections import defaultdict
import math
import faiss

## parameters

**Embedding model parameters**   
*num_walks*: Number of random walks to start at each node with the S-feature walk method   
*walk_length*: Length of random walk started at each node  
*representation_size*: Number of latent dimensions to learn for each node  
*window_size*: Window size of skipgram model  
*workers*: Number of parallel processes  

**File/Directory parameters**  
*item_file*: File path for the file that contains entity to entity relationships (e.g. wikibase-item).  
*label_file*: File path for the file that contains wikidata labels.  
*work_dir*: The same work_dir that you specified in the label creation notebook. We'll look for files created by that notebook here. Files created by this notebook will also be saved here.  
*store_dir*: Path to folder containing the sqlite3.db file that we will use for our queries. We will reuse an existing file if there is one in this folder. Otherwise we will create a new one.

In [23]:
# Embedding model params
num_walks = 10
walk_length = 10
representation_size = 64
window_size = 5
workers = 16

# File/Directory params
data_dir = "./data/wikidata-20210215-dwd"
item_file = "{}/claims.wikibase-item.tsv.gz".format(data_dir)
label_file = "{}/labels.en.tsv.gz".format(data_dir)
work_dir = "./output/wikidata-20210215-dwd"
store_dir = "./output/wikidata-20210215-dwd/temp-s"

### Process parameters and set up variables / file names

In [24]:
# Ensure paths are absolute
item_file = os.path.abspath(item_file)
label_file = os.path.abspath(label_file)
work_dir = os.path.abspath(work_dir)
store_dir = os.path.abspath(store_dir)
    
# Create directories
if not os.path.exists(work_dir):
    os.makedirs(work_dir)
output_dir = "{}/S_walks_analysis".format(work_dir)
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
if not os.path.exists(store_dir):
    os.makedirs(store_dir)
    
walks_file = "{}/s_walks.txt".format(output_dir)

# Setting up environment variables 
os.environ['ITEM_FILE'] = item_file
os.environ['LABEL_FILE'] = label_file
os.environ['STORE'] = "{}/wikidata.sqlite3.db".format(store_dir)
os.environ['LABEL_CREATION'] = "{}/label_creation".format(work_dir)
os.environ['OUT'] = output_dir
os.environ['kgtk'] = "kgtk" # Need to do this for kgtk to be recognized as a command when passing it through a subprocess call

### 1. Create simple embeddings for entities based on the types of their neighbors

These embeddings will have $|\tau|$ dimensions, where $\tau$ is the set of distinct types amongst entities that share an edge with entities of type $t$ (the type being profiled). Each dimension of an embedding corresponds to one neighbor type. The embedding for an entity is created by filling in the count of its neighbors of each type and then normalizing each dimension. Note that the normalization code in the cell below is commented out, so the results in this notebook use the raw counts.
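
As a minimal sketch of this construction, the snippet below builds count vectors from a few made-up (entity, neighbor type, count) rows and min-max normalizes each dimension. The entity IDs and counts are hypothetical, the helper name `min_max_normalize` is an assumption, and plain Python lists stand in for the NumPy array used later in the notebook.

```python
# Hypothetical (entity, neighbor_type, count) rows, shaped like entity_neighbor_types.tsv
rows = [
    ("Q_a", "Q5", 2),
    ("Q_a", "Q6256", 1),
    ("Q_b", "Q5", 4),
    ("Q_c", "Q6256", 3),
]

# Assign each distinct entity a row index and each distinct type a column index,
# in first-seen order
type_to_idx = {}
ent_to_idx = {}
for ent, neigh_type, _ in rows:
    ent_to_idx.setdefault(ent, len(ent_to_idx))
    type_to_idx.setdefault(neigh_type, len(type_to_idx))

# One row per entity, one column per neighbor type, filled with neighbor counts
counts = [[0.0] * len(type_to_idx) for _ in ent_to_idx]
for ent, neigh_type, count in rows:
    counts[ent_to_idx[ent]][type_to_idx[neigh_type]] = float(count)


def min_max_normalize(matrix):
    """Scale every column to [0, 1]; constant columns become all zeros."""
    n_cols = len(matrix[0])
    lows = [min(r[c] for r in matrix) for c in range(n_cols)]
    highs = [max(r[c] for r in matrix) for c in range(n_cols)]
    return [
        [(r[c] - lows[c]) / ((highs[c] - lows[c]) or 1.0) for c in range(n_cols)]
        for r in matrix
    ]


normalized = min_max_normalize(counts)
```

After normalization, every dimension spans the same range, so a very frequent neighbor type no longer dominates the distance between two entities purely because of its scale.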

#### 1.1 Gather list of entities of the type_to_profile along with the types of their neighbors and the counts of neighbors of those types

In [43]:
!kgtk query -i $ITEM_FILE -i $LABEL_CREATION/type_mapping.tsv --graph-cache $STORE \
-o $OUT/entity_neighbor_types.tsv \
--match '`'"$ITEM_FILE"'`: (e1)-[]->(e2), type: (e2)-[]->(t2)' \
--return 'distinct e1 as node1, t2 as label, count(e2) as node2, printf("%s_%s",e1,t2) as id' \
--order-by 'e1, t2'

In [44]:
!head $OUT/entity_neighbor_types.tsv | column -t -s $'\t'

node1  label      node2  id
P10    Q1209283   1      P10_Q1209283
P10    Q16521     1      P10_Q16521
P10    Q24862     1      P10_Q24862
P10    Q55983715  1      P10_Q55983715
P1000  Q15647814  1      P1000_Q15647814
P1000  Q5         2      P1000_Q5
P1001  Q1129645   1      P1001_Q1129645
P1001  Q11773926  1      P1001_Q11773926
P1001  Q15617994  1      P1001_Q15617994


Order neighbor types by frequency of occurrence

In [45]:
!kgtk query -i $OUT/entity_neighbor_types.tsv -i $LABEL_FILE --graph-cache $STORE \
-o $OUT/neighbor_types_by_freq.tsv \
--match 'types: (ent)-[l {label:neigh_type}]->(), `'"$LABEL_FILE"'`: (neigh_type)-[:label]->(type_lab)' \
--return 'distinct neigh_type as node1, type_lab as label, count(distinct ent) as node2, neigh_type as id' \
--where 'type_lab.kgtk_lqstring_lang_suffix = "en"' \
--order-by 'node2 desc'

In [47]:
!wc -l $OUT/neighbor_types_by_freq.tsv

43185 /data/profiling/kgtk/entity_profiling/output/wikidata-20210215-dwd/S_walks_analysis/neighbor_types_by_freq.tsv


In [48]:
!head -101 $OUT/neighbor_types_by_freq.tsv | column -t -s $'\t'

node1      label                                                     node2     id
Q3624078   'sovereign state'@en                                      18138375  Q3624078
Q6256      'country'@en                                              17963939  Q6256
Q55983715  'organisms known by a particular common name'@en          9480017   Q55983715
Q48264     'gender identity'@en                                      6904029   Q48264
Q4369513   'sex of humans'@en                                        6902238   Q4369513
Q4656150   'Wikimedia project policies and guidelines'@en            6123811   Q4656150
Q28640     'profession'@en                                           5617188   Q28640
Q5         'human'@en                                                4771064   Q5
Q56005592  'Wikimedia help page'@en                                  4752748   Q56005592
Q35252665  'Wikimedia non-main namespace'@en                         4752682   Q35252665
Q7270      'republic'@en             

Trim down neighbor types to top 100

In [49]:
!head -101 $OUT/neighbor_types_by_freq.tsv > $OUT/neighbor_types_by_freq_trimmed_100.tsv

Filter out less-common neighbor types

In [50]:
%%time
!kgtk ifexists --input-file $OUT/entity_neighbor_types.tsv --filter-file $OUT/neighbor_types_by_freq_trimmed_100.tsv \
--output-file $OUT/entity_neighbor_types_trimmed_100.tsv --input-keys label --filter-keys node1

CPU times: user 16.2 s, sys: 3.23 s, total: 19.4 s
Wall time: 14min 23s


In [51]:
!cat $OUT/entity_neighbor_types.tsv | wc -l

278941812


In [52]:
!cat $OUT/entity_neighbor_types_trimmed_100.tsv | wc -l

210865858
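
The trimming and filtering steps above can be mimicked on a small in-memory table. The sketch below ranks neighbor types by the number of distinct entities that have them, keeps the top N, and drops rows for all other types. The rows and the helper name `keep_top_types` are hypothetical; the real pipeline does this with `kgtk query`, `head`, and `kgtk ifexists`.

```python
from collections import defaultdict


def keep_top_types(rows, top_n):
    """Keep only rows whose neighbor type is among the top_n most common types.

    rows: list of (entity, neighbor_type, count) tuples.
    A type's frequency is the number of distinct entities that have it.
    """
    ents_per_type = defaultdict(set)
    for ent, neigh_type, _ in rows:
        ents_per_type[neigh_type].add(ent)
    # Sort by frequency (descending), breaking ties by type ID for determinism
    ranked = sorted(ents_per_type, key=lambda t: (-len(ents_per_type[t]), t))
    top_types = set(ranked[:top_n])
    return [row for row in rows if row[1] in top_types]


rows = [
    ("Q_a", "Q5", 2), ("Q_b", "Q5", 1), ("Q_c", "Q5", 1),
    ("Q_a", "Q6256", 1), ("Q_b", "Q6256", 3),
    ("Q_c", "Q7270", 1),
]
trimmed = keep_top_types(rows, top_n=2)  # drops the single Q7270 row
```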


Create embeddings in Python

In [25]:
%%time
neigh_types_df = pd.read_csv("{}/entity_neighbor_types_trimmed_100.tsv".format(output_dir), delimiter = '\t').fillna("")
print("loaded entity neighbor type counts")

neigh_types = neigh_types_df.label.unique()
entities = neigh_types_df.node1.unique()
neigh_type_to_idx = {neigh_types[ix] : ix for ix in range(len(neigh_types))}
ent_to_idx = {entities[ix] : ix for ix in range(len(entities))}
embeddings = np.zeros((len(entities),len(neigh_type_to_idx)), dtype=np.float32)

print("filling in embeddings...")
for ent, neigh_type, count in zip(neigh_types_df['node1'], neigh_types_df['label'], neigh_types_df['node2']):
    embeddings[ent_to_idx[ent], neigh_type_to_idx[neigh_type]] = count

# normalize each dimension (currently disabled: the raw counts are used as-is)
# embeddings -= np.nanmin(embeddings,0)
# embeddings /= [m if m != 0 else 1 for m in np.nanmax(embeddings,0)]

loaded entity neighbor type counts
filling in embeddings...
CPU times: user 6min 59s, sys: 1min 15s, total: 8min 15s
Wall time: 8min 36s


In [26]:
len(entities)

39030788

In [27]:
len(neigh_types)

100

In [56]:
%%time
dim = embeddings.shape[-1]
nlist = int(np.sqrt(embeddings.shape[0]))
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
assert not index.is_trained
index.train(embeddings)
assert index.is_trained
index.add(embeddings)

CPU times: user 3h 8min 56s, sys: 12min 50s, total: 3h 21min 46s
Wall time: 13min 21s


In [57]:
%time faiss.write_index(index, "{}/IVFFlat_100.index".format(output_dir))

CPU times: user 1.67 s, sys: 13.4 s, total: 15 s
Wall time: 33.3 s


In [28]:
%time index = faiss.read_index("{}/IVFFlat_100.index".format(output_dir))

CPU times: user 2.35 s, sys: 19.7 s, total: 22.1 s
Wall time: 1min 17s


In [7]:
!ls -lh $OUT/IVFFlat_100.index

-rw-r--r-- 1 nmklein div22 15G Apr 24 16:26 /data/profiling/kgtk/entity_profiling/output/wikidata-20210215-dwd/S_walks_analysis/IVFFlat_100.index
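
A rough back-of-the-envelope check of this file size: the estimate below assumes the IVF-Flat index stores every vector uncompressed as float32 plus one 64-bit ID per vector, and it ignores centroids and other overhead. The vector count and dimensionality are taken from the cells above.

```python
num_vectors = 39030788   # len(entities)
dim = 100                # len(neigh_types)
bytes_per_float32 = 4
bytes_per_id = 8         # assumption: one int64 ID stored per vector

vector_bytes = num_vectors * dim * bytes_per_float32
id_bytes = num_vectors * bytes_per_id
total_gib = (vector_bytes + id_bytes) / 2**30

print("vectors: {:.2f} GiB, total: {:.2f} GiB".format(vector_bytes / 2**30, total_gib))
```

The total comes out a little under 15 GiB, which is in line with the 15G that `ls -lh` reports.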


In [40]:
faiss.omp_set_num_threads(16) # limit cpu usage
index.nprobe = 20 # number of clusters scanned per query; value found empirically for 21-nn

Closest neighbors to Putin (Q7747)

In [64]:
%%time
index.nprobe = 20
distances, neighbors = index.search(embeddings[[ent_to_idx["Q7747"]]],21)
neighbors = [entities[n] for n in neighbors[0]]
print(distances)
print(neighbors)

[[  0. 170. 179. 185. 192. 194. 199. 206. 206. 208. 209. 210. 213. 221.
  221. 221. 222. 224. 226. 227. 228.]]
['Q7747', 'Q1394', 'Q694826', 'Q855', 'Q79822', 'Q458702', 'Q11860', 'Q4074457', 'Q36740', 'Q567', 'Q86412932', 'Q7315', 'Q15031', 'Q55704', 'Q180589', 'Q208993', 'Q160318', 'Q2514', 'Q36591', 'Q1239291', 'Q310630']
CPU times: user 54.3 ms, sys: 4.66 ms, total: 59 ms
Wall time: 3.86 ms
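
For intuition about what the search returns, here is a dependency-free sketch of exact nearest-neighbor search under squared Euclidean distance, the metric an L2 faiss index reports. The toy vectors are made up and the function names are hypothetical. An exact search like this can serve as a reference on a small sample when tuning `nprobe`, since the IVF index only scans `nprobe` of its `nlist` clusters and is therefore approximate.

```python
import heapq


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def exact_knn(vectors, query_idx, k):
    """Return (distances, indices) of the k vectors closest to vectors[query_idx].

    The query itself is included, at distance 0, just as in the search above.
    """
    query = vectors[query_idx]
    scored = ((squared_l2(query, vec), idx) for idx, vec in enumerate(vectors))
    nearest = heapq.nsmallest(k, scored)
    return [d for d, _ in nearest], [i for _, i in nearest]


# Toy neighbor-type count vectors (hypothetical)
toy = [
    [0.0, 6.0, 2.0],
    [0.0, 5.0, 2.0],
    [9.0, 0.0, 0.0],
    [1.0, 6.0, 4.0],
]
distances, neighbors = exact_knn(toy, query_idx=0, k=3)
```

Because the vectors hold integer counts, the squared distances are whole numbers, which is also what the output above shows.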


In [13]:
for i in tqdm(range(num_batches)):
    begin = i * batch_size
    end = min((i+1)*batch_size, len(embeddings))
    search_idxs = list(range(begin,end))
    if entities[search_idxs[-1]] not in entity_to_neighs:
        print(i)
        break

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=39031.0), HTML(value='')))

22257



In [101]:
len(entity_to_neighs)

22257000

In [39]:
for i in tqdm(range(22257,num_batches)):
    begin = i * batch_size
    end = min((i+1)*batch_size, len(embeddings))
    search_idxs = list(range(begin,end))
    if entities[search_idxs[-1]] not in entity_to_neighs:
        print(i)
        break

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=16774.0), HTML(value='')))

37382



In [42]:
len(entity_to_neighs)

16773788

In [52]:
d1={1:2,2:3}
d2={4:2,5:3}
d1.update(d2)
d1

{1: 2, 2: 3, 4: 2, 5: 3}

In [49]:
with open("{}/entity_to_neighs_dict_0-57%.json".format(output_dir), 'r') as f:
    entity_to_neighs_1 = json.load(f)

In [51]:
len(entity_to_neighs_1) + len(entity_to_neighs)

39030788

In [54]:
len(entity_to_neighs)

39030788

In [53]:
entity_to_neighs.update(entity_to_neighs_1)

In [41]:
%%time
print("46 hours to get to 57% with 16 cores, another 43 to get to 77 with 8 cores, another 7.5 to get to 90 with 8 cores, another 3.5 with 16 cores to get to 100%")
# entity_to_neighs = dict()

index.nprobe = 20
k = 21 # number of nearest neighbors to find for each entity (including itself)
batch_size = 1000
num_batches = math.ceil(len(embeddings) / batch_size)
saving_points = [i * (num_batches//10) for i in range(1,10)]

for i in tqdm(range(37382,num_batches)):
    begin = i * batch_size
    end = min((i+1)*batch_size, len(embeddings))
    search_idxs = list(range(begin,end))
    if entities[search_idxs[-1]] in entity_to_neighs:
        continue
    distances, neighbors = index.search(embeddings[search_idxs],k)
    for j in range(len(search_idxs)):
        ent_idx = search_idxs[j]
        ent = entities[ent_idx]
        nbrs = [entities[nbr_idx] for nbr_idx in neighbors[j] if nbr_idx != ent_idx]
        entity_to_neighs[ent] = nbrs
    # intermittently save progress
    if i in saving_points:
        with open("{}/entity_to_neighs_dict_57-100%.json".format(output_dir), 'w') as f:
            json.dump(entity_to_neighs, f)
    
# save at very end
with open("{}/entity_to_neighs_dict_57-100%.json".format(output_dir), 'w') as f:
    json.dump(entity_to_neighs, f)

46 hours to get to 57% with 16 cores, another 43 to get to 77 with 8 cores, another 7.5 to get to 90 with 8 cores...


HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=1649.0), HTML(value='')))


CPU times: user 1d 15h 35min 10s, sys: 1h 31min 21s, total: 1d 17h 6min 31s
Wall time: 3h 32min 18s
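
The batching and resume logic used in the cell above can be isolated as follows. This is a simplified sketch: `process_batch` is a hypothetical stand-in for the `index.search` call, the checkpoint schedule is approximated, and checkpoints are recorded in a list instead of being written to JSON.

```python
import math


def batch_bounds(num_items, batch_size):
    """Yield (batch_number, begin, end) with end exclusive, covering all items."""
    num_batches = math.ceil(num_items / batch_size)
    for i in range(num_batches):
        yield i, i * batch_size, min((i + 1) * batch_size, num_items)


def run_resumable(items, batch_size, results, process_batch, num_checkpoints=10):
    """Fill `results` batch by batch, skipping batches that are already done."""
    num_batches = math.ceil(len(items) / batch_size)
    step = max(1, num_batches // num_checkpoints)
    checkpoints = []
    for i, begin, end in batch_bounds(len(items), batch_size):
        # Same resume test as above: a batch counts as done if its last item is present
        if items[end - 1] in results:
            continue
        results.update(process_batch(items[begin:end]))
        if (i + 1) % step == 0:
            checkpoints.append(i)  # the notebook dumps the dict to JSON at this point
    return checkpoints


# Toy run: items 0..24, the first two batches were finished in an earlier session
items = list(range(25))
results = {n: n * n for n in range(20)}
calls = []


def process_batch(batch):
    calls.append(list(batch))
    return {n: n * n for n in batch}


checkpoints = run_resumable(items, batch_size=10, results=results, process_batch=process_batch)
```

Only the unfinished third batch is processed in the toy run, which is the behavior that allowed the long-running search above to be restarted across sessions.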


### 3. Perform the random walks

In [55]:
def random_walks_to_file(entity_to_neighs, walks_file, walk_length=10, num_walks=10):
    entities = entity_to_neighs.keys()
    print("num entities to perform walks from: {}".format(len(entities)))
    with open(walks_file, "w") as f:
        for ent in tqdm(entities):
            for i in range(num_walks):
                walk = random_walk_from_node(entity_to_neighs, ent, walk_length)
                f.write("{}\n".format(walk))


# Returns a string of space separated Q-nodes as a walk
def random_walk_from_node(entity_to_neighs, start_ent, walk_length):
    walk = start_ent
    cur_ent = start_ent
    cur_length = 1
    while cur_length < walk_length:
        next_ent = random.choice(entity_to_neighs[cur_ent])
        walk = "{} {}".format(walk, next_ent)
        cur_ent = next_ent
        cur_length += 1
    return walk

# from functools import lru_cache

# @lru_cache(maxsize = 1000000)
# def faiss_knn_cached(ent_idx, index, k):
#     distances, neighbors = index.search(embeddings[[ent_idx]],k+1)
#     return [nbr_idx for nbr_idx in neighbors[0] if nbr_idx != ent_idx]

# def random_walks_to_file(index, k, entities, ent_to_idx, walks_file, walk_length=10, num_walks=10):
#     print("num entities to perform walks from: {}".format(len(entities)))
#     with open(walks_file, "w") as f:
#         for ent_idx in tqdm(range(len(entities))):
#             for i in range(num_walks):
#                 walk = random_walk_from_node(index, k, entities, ent_to_idx, ent_idx, walk_length)
#                 f.write("{}\n".format(walk))


# # Returns a string of space separated Q-nodes as a walk
# def random_walk_from_node(index, k, entities, ent_to_idx, start_ent_idx, walk_length):
#     walk = entities[start_ent_idx]
#     cur_ent_idx = start_ent_idx
#     cur_length = 1
#     while cur_length < walk_length:
# #         next_ent = random.choice(entity_to_neighs[cur_ent])
#         next_ent_idx = random.choice(faiss_knn_cached(cur_ent_idx, index, k))
#         walk = "{} {}".format(walk, entities[next_ent_idx])
#         cur_ent_idx = next_ent_idx
#         cur_length += 1
#     return walk
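
A toy run helps check the walk logic. The sketch below restates the walk as a list-building helper (named `walk_as_list` here, a hypothetical name) over a small made-up neighbor map, so that each step can be inspected. It assumes that every entity appearing as a neighbor is also a key of the dictionary.

```python
import random


def walk_as_list(entity_to_neighs, start_ent, walk_length, rng=random):
    """Return a walk of `walk_length` entities, starting at `start_ent`."""
    walk = [start_ent]
    while len(walk) < walk_length:
        walk.append(rng.choice(entity_to_neighs[walk[-1]]))
    return walk


# Hypothetical nearest-neighbor lists; every neighbor is also a key
toy_neighs = {
    "Q_a": ["Q_b", "Q_c"],
    "Q_b": ["Q_a", "Q_c"],
    "Q_c": ["Q_a", "Q_b"],
}

rng = random.Random(0)  # seeded so the toy run is reproducible
walks = [walk_as_list(toy_neighs, ent, walk_length=5, rng=rng)
         for ent in toy_neighs for _ in range(2)]
lines = [" ".join(walk) for walk in walks]  # one space-separated walk per line
```

Building the walk as a list and joining it once at the end avoids repeatedly re-formatting a growing string, while producing the same space-separated format that is written to the walks file.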

In [None]:
# %%time
# faiss_knn_cached.cache_clear()
# random_walks_to_file(index, 20, entities, ent_to_idx, walks_file, walk_length, num_walks)

In [40]:
%%time
random_walks_to_file(entity_to_neighs, walks_file, walk_length, num_walks)

num entities to perform walks from: 12207186


HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=12207186.0), HTML(value='')))


CPU times: user 18min 8s, sys: 56.6 s, total: 19min 5s
Wall time: 18min 58s


In [57]:
%%time
random_walks_to_file(entity_to_neighs, walks_file, walk_length, num_walks)

num entities to perform walks from: 39030788


HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=39030788.0), HTML(value='')))

IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)




CPU times: user 49min 15s, sys: 2min 43s, total: 51min 59s
Wall time: 51min 45s


### 4. Let's see what embeddings we learn if we only use this feature
Use a Skip-Gram model to learn representations for the entities.

In [59]:
!wc -l /data/profiling/kgtk/entity_profiling/output/wikidata-20210215-dwd/S_walks_analysis/s_walks.txt

390307880 /data/profiling/kgtk/entity_profiling/output/wikidata-20210215-dwd/S_walks_analysis/s_walks.txt
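
This line count matches what the walk parameters predict: one line per walk and `num_walks` walks per entity. A quick check, using the values reported earlier in this notebook:

```python
num_entities = 39030788  # entities that walks were started from
num_walks = 10
walk_length = 10

expected_lines = num_entities * num_walks
expected_tokens = expected_lines * walk_length  # entity IDs the Skip-Gram model will read

assert expected_lines == 390307880  # the wc -l result above
```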


In [247]:
%%time
# Note: gensim >= 4.0 names this parameter vector_size (it was called size in gensim 3.x)
model = Word2Vec(corpus_file=walks_file, vector_size=representation_size, window=window_size, min_count=0, sg=1, hs=1,
                 workers=workers)
model.wv.save("{}/S_embeddings.kv".format(output_dir))



CPU times: user 9d 12h 42min 2s, sys: 50min 57s, total: 9d 13h 32min 59s
Wall time: 8h 44min 38s
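
To make the role of `window_size` concrete, the sketch below lists the (center, context) pairs a Skip-Gram model would draw from a single walk when every token within the window is used. This is a simplification: word2vec implementations typically also sample a smaller effective window for each token, which is omitted here. The walk and the function name `skipgram_pairs` are made up for illustration.

```python
def skipgram_pairs(walk, window):
    """Return all (center, context) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


walk = "Q_a Q_b Q_c Q_d".split()  # hypothetical walk, as stored in one line of s_walks.txt
pairs = skipgram_pairs(walk, window=1)
```

Entities that frequently co-occur within the window across many walks end up with similar vectors, which is how the nearest-neighbor structure of the walks carries over into the learned embeddings.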


We want similar entities to have more similar embeddings. This feature aims to capture a measure of structural similarity amongst entities of the same type. Therefore we will compare entities within a type to evaluate the embeddings that are learned with this feature.

In [249]:
entity_to_neighs["Q7747"]

['Q855',
 'Q1394',
 'Q36740',
 'Q694826',
 'Q458702',
 'Q23505',
 'Q48990',
 'Q76',
 'Q107441',
 'Q567',
 'Q454925',
 'Q79822',
 'Q61064',
 'Q6294',
 'Q1030228',
 'Q83552',
 'Q93031',
 'Q33391',
 'Q135481',
 'Q7604']

In [252]:
model.wv.most_similar(positive=["Q7747"], topn=10)

[('Q259731', 0.9205145835876465),
 ('Q48990', 0.9030447602272034),
 ('Q107441', 0.8978070020675659),
 ('Q2624201', 0.8975504040718079),
 ('Q93031', 0.8960544466972351),
 ('Q21644165', 0.89321368932724),
 ('Q4491900', 0.8916231989860535),
 ('Q774872', 0.8908449411392212),
 ('Q61064', 0.8905149102210999),
 ('Q855', 0.8889703750610352)]

In [18]:
%%time
index.nprobe = 20
distances, neighbors = index.search(embeddings[[ent_to_idx["Q7747"]]],21)
neighbors = [entities[n] for n in neighbors[0]]
print(distances)
print(neighbors)

[[  0. 257. 379. 392. 398. 402. 406. 425. 431. 438. 439. 442. 450. 451.
  456. 458. 459. 461. 463. 466. 467.]]
['Q7747', 'Q855', 'Q1394', 'Q36740', 'Q694826', 'Q458702', 'Q23505', 'Q48990', 'Q76', 'Q107441', 'Q567', 'Q454925', 'Q79822', 'Q61064', 'Q6294', 'Q1030228', 'Q83552', 'Q93031', 'Q33391', 'Q135481', 'Q7604']
CPU times: user 2.91 s, sys: 162 ms, total: 3.07 s
Wall time: 117 ms


In [24]:
embeddings[[ent_to_idx["Q7747"],ent_to_idx["Q48990"]]]

array([[ 0.,  6.,  2.,  7., 11.,  6.,  1.,  4.,  6.,  2.,  1.,  1.,  1.,
        11.,  1.,  1.,  1.,  1.,  1.,  1.,  7.,  0.,  1.,  2.,  0., 12.,
         0.,  1.,  0.,  0.,  1.,  2.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,
        14.,  0.,  0.,  0.,  2.,  2.,  0.,  0., 11.,  0.,  0.,  0.,  0.,
         4.,  0.,  0.,  2.,  0.,  2.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,
         0.,  0.,  0.,  6.,  0.,  3.,  2.,  0.,  0.,  0.,  1.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  4.,  5.,  4.,  2.,  1.,  1.,  2.,  0.,  1.,  1.,
         9.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  3.,
         0.,  1.,  0.,  1.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        12.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  2.,  0.,  0.,  0.,  1.,
         5.,  0.,  0.,  6.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0

In [32]:
neigh_types[39]

'Q5'
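
To see which neighbor types drive the distance between two entities, their count vectors can be compared dimension by dimension. The sketch below ranks dimensions by their contribution to the squared Euclidean distance. The vectors, the type list, and the helper name `top_differences` are hypothetical; in the notebook the inputs would be two rows of `embeddings` and the `neigh_types` array.

```python
def top_differences(vec_a, vec_b, type_ids, top_n=3):
    """Return the top_n (type_id, squared_difference) pairs, largest first."""
    diffs = [(type_id, (a - b) ** 2) for type_id, a, b in zip(type_ids, vec_a, vec_b)]
    diffs.sort(key=lambda pair: pair[1], reverse=True)
    return diffs[:top_n]


# Hypothetical neighbor-type IDs and count vectors for two entities
type_ids = ["Q5", "Q6256", "Q28640", "Q7270"]
vec_a = [14.0, 6.0, 2.0, 0.0]
vec_b = [12.0, 1.0, 2.0, 1.0]
ranked = top_differences(vec_a, vec_b, type_ids, top_n=2)
```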

Medvedev (former president and prime minister of Russia) to Putin

In [254]:
model.wv.similarity("Q23530","Q7747")

0.43997127

Robin Williams to Putin

In [256]:
model.wv.similarity("Q83338","Q7747")

0.36006907
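
The `similarity` scores in these cells are cosine similarities between the learned vectors. For reference, a dependency-free version of that computation is sketched below; the vectors are made up and far shorter than the 64-dimensional embeddings learned above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # no shared direction
```

Cosine similarity ignores vector length, so a score depends only on the direction of the two embeddings, with values near 1 meaning the entities appeared in very similar walk contexts.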

In [6]:
from gensim.models import KeyedVectors
model = KeyedVectors.load("{}/S_embeddings.kv".format(output_dir))

Robin Williams to Spike Lee

In [8]:
model.similarity("Q83338","Q51566")

0.28939056

In [9]:
model.similarity("Q83338","Q618352")

0.35379958

In [15]:
%%time
index.nprobe = 100
distances, neighbors = index.search(embeddings[[ent_to_idx["Q259731"]]],100)
neighbors = [entities[n] for n in neighbors[0]]
print(distances)
print(neighbors)

[[ 0. 45. 46. 47. 52. 53. 55. 57. 59. 64. 64. 65. 65. 66. 66. 69. 69. 70.
  72. 72. 73. 74. 74. 74. 74. 74. 76. 76. 77. 77. 79. 79. 79. 80. 81. 81.
  82. 82. 83. 83. 85. 85. 85. 86. 86. 86. 86. 87. 87. 87. 87. 87. 87. 87.
  88. 88. 88. 89. 89. 90. 90. 90. 90. 90. 91. 91. 92. 92. 92. 92. 92. 92.
  93. 93. 93. 93. 93. 93. 93. 94. 94. 94. 94. 94. 94. 94. 95. 95. 95. 95.
  95. 95. 95. 95. 95. 96. 96. 96. 96. 96.]]
['Q259731', 'Q449538', 'Q372026', 'Q366471', 'Q55038248', 'Q3710025', 'Q4199550', 'Q1980377', 'Q172844', 'Q1973460', 'Q4220628', 'Q34458', 'Q2371003', 'Q302434', 'Q834947', 'Q246497', 'Q4166843', 'Q2023785', 'Q574605', 'Q353451', 'Q1248105', 'Q289727', 'Q2637042', 'Q1384966', 'Q4400066', 'Q363560', 'Q4421420', 'Q65163420', 'Q451319', 'Q339581', 'Q4396201', 'Q450864', 'Q774872', 'Q1387629', 'Q83552', 'Q10132', 'Q481605', 'Q4494036', 'Q3918013', 'Q712412', 'Q3710055', 'Q81244', 'Q1667121', 'Q3918736', 'Q104668', 'Q4143171', 'Q7186', 'Q898722', 'Q4122907', 'Q621113', 'Q350778', 'Q42