# Upload activity matrix
Natalia Vélez, April 2021

In this notebook, we'll create four new collections in the OHOL database:
* `activity_matrix.{files, chunks}`: Full activity matrix stored in GridFS 
* `activity_vectors`: Activity matrix split into vectors by avatar
* `activity_labels`: Lists of avatar labels (matrix rows) and item labels (matrix columns)

In [24]:
import pymongo, sys, bson, pickle
import numpy as np
import gridfs
from tqdm import notebook

Load data:

In [17]:
# np.int was removed in NumPy 1.24; the builtin int is the equivalent dtype
item_labels = np.loadtxt('outputs/activity_features.txt', dtype=int).tolist()
avatar_labels = np.loadtxt('outputs/activity_agg/avatarIDs.txt', dtype=int).tolist()
activity_matrix = np.loadtxt('outputs/activity_agg/activity_matrix.txt', dtype=int)

print('Item labels: %s' % len(item_labels))
print('Avatar labels: %s' % len(avatar_labels))
print('Activity matrix: %s' % str(activity_matrix.shape))

Item labels: 3044
Avatar labels: 1000
Activity matrix: (1000, 3044)
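
Before uploading, it may be worth confirming that the labels line up with the matrix axes. The sketch below assumes rows correspond to avatars and columns to items, which is consistent with the shapes printed above. `check_alignment` is a hypothetical helper that is not part of the original notebook, and the toy arrays stand in for the real files:

```python
import numpy as np

def check_alignment(item_labels, avatar_labels, matrix):
    """Raise ValueError if the labels do not match the matrix axes.

    Assumption: rows are avatars and columns are items.
    """
    matrix = np.asarray(matrix)
    expected = (len(avatar_labels), len(item_labels))
    if matrix.shape != expected:
        raise ValueError('Matrix shape %s does not match labels %s'
                         % (matrix.shape, expected))
    # Duplicate labels would make label-based lookups ambiguous
    if len(set(avatar_labels)) != len(avatar_labels):
        raise ValueError('Duplicate avatar labels')
    if len(set(item_labels)) != len(item_labels):
        raise ValueError('Duplicate item labels')
    return True

# Toy data: 2 avatars x 3 items
toy_ok = check_alignment([10, 20, 30], [1, 2], np.zeros((2, 3), dtype=int))
```

In the notebook itself this would be called as `check_alignment(item_labels, avatar_labels, activity_matrix)`.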


In [12]:
np.asarray(item_labels).dtype  # item_labels is a plain list after .tolist()

dtype('int64')

## Upload to database

Connect:

In [4]:
keyfile = '../6_database/credentials.key'
# The key file holds the username on line 1 and the password on line 2
with open(keyfile, "r") as f:
    creds = f.read().splitlines()
myclient = pymongo.MongoClient('134.76.24.75', username=creds[0], password=creds[1], authSource='ohol')
db = myclient.ohol

print(db)
print(db.list_collection_names())

Database(MongoClient(host=['134.76.24.75:27017'], document_class=dict, tz_aware=False, connect=True, authsource='ohol'), 'ohol')
['old_svd', 'old_jobmatrix', 'tech_tree', 'tech_tree_demo', 'objects', 'expanded_transitions', 'transitions', 'activity_labels', 'categories']
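
The listing above already contains `activity_labels`, so rerunning the upload cells below could add duplicate documents. One way to guard against that is to check which target collections are still missing before writing. This is a minimal sketch: `missing_collections` and `FakeDB` are hypothetical names, and `FakeDB` only stands in for a pymongo `Database` so the example runs without a server:

```python
def missing_collections(db, names):
    """Return the entries of `names` that are not yet collections in `db`.

    `db` is assumed to be a pymongo Database; any object with a
    list_collection_names() method works.
    """
    existing = set(db.list_collection_names())
    return [name for name in names if name not in existing]

class FakeDB:
    """Offline stand-in for a pymongo Database (illustration only)."""
    def list_collection_names(self):
        return ['tech_tree', 'objects', 'activity_labels']

targets = ['activity_labels', 'activity_vectors',
           'activity_matrix.files', 'activity_matrix.chunks']
todo = missing_collections(FakeDB(), targets)
print(todo)
```

Against the real connection, `missing_collections(db, targets)` would report which of the four collections still need to be created.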


Upload labels:

In [15]:
labels = {'avatars': avatar_labels, 'items': item_labels}
labels_collection = db.activity_labels
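# insert() is deprecated and was removed in PyMongo 4; insert_one() returns a
# result object whose inserted_id is the ObjectId of the new document
labels_collection.insert_one(labels).inserted_id
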
ObjectId('607dc8db7a2558a6fc4bd044')
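
The labels document is what lets a reader translate an avatar ID or item ID into a row or column of the matrix. The sketch below shows one way to do that lookup, assuming the document has the shape uploaded above and that rows are avatars and columns are items. `build_index` and `lookup` are hypothetical helpers, and the toy values are not real OHOL data:

```python
def build_index(labels):
    """Map each label to its position along the matrix axis."""
    return {label: i for i, label in enumerate(labels)}

def lookup(matrix, labels_doc, avatar, item):
    """Return the matrix entry for one (avatar, item) pair.

    Assumption: labels_doc looks like {'avatars': [...], 'items': [...]},
    with avatars indexing rows and items indexing columns.
    """
    row = build_index(labels_doc['avatars'])[avatar]
    col = build_index(labels_doc['items'])[item]
    return matrix[row][col]

# Toy values for illustration
toy_labels = {'avatars': [101, 102], 'items': [7, 8, 9]}
toy_matrix = [[0, 1, 2],
              [3, 4, 5]]
value = lookup(toy_matrix, toy_labels, avatar=102, item=8)
print(value)  # 4
```

With the real database, `labels_doc` could be fetched with `db.activity_labels.find_one()`.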

Upload each row separately:

In [22]:
vectors_collection = db.activity_vectors
for avatar,vec in notebook.tqdm(zip(avatar_labels, activity_matrix), total=len(avatar_labels)):
    vectors_collection.insert_one({'avatar': avatar, 'activity': vec.tolist()})

  0%|          | 0/1000 [00:00<?, ?it/s]
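
Inserting rows one at a time costs one round trip to the server per avatar. If that becomes slow, the same documents can be sent in batches with `insert_many`. This is a sketch rather than the method used above: `iter_batches` and `upload_vectors` are hypothetical helpers, the batch size of 500 is an arbitrary assumption, and `FakeCollection` stands in for a pymongo `Collection` so the example runs offline:

```python
def iter_batches(avatar_labels, matrix, batch_size=500):
    """Yield lists of {'avatar', 'activity'} documents, batch_size at a time."""
    batch = []
    for avatar, vec in zip(avatar_labels, matrix):
        # Cast to builtin int so NumPy integers do not reach the BSON encoder
        batch.append({'avatar': int(avatar), 'activity': [int(x) for x in vec]})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def upload_vectors(collection, avatar_labels, matrix, batch_size=500):
    """Insert one document per avatar in batches; return the document count."""
    n = 0
    for batch in iter_batches(avatar_labels, matrix, batch_size):
        collection.insert_many(batch)
        n += len(batch)
    return n

class FakeCollection:
    """Offline stand-in for a pymongo Collection (illustration only)."""
    def __init__(self):
        self.docs = []
        self.calls = 0
    def insert_many(self, docs):
        self.docs.extend(docs)
        self.calls += 1

fake = FakeCollection()
n_uploaded = upload_vectors(fake, [1, 2, 3], [[0, 1], [2, 3], [4, 5]], batch_size=2)
```

In the notebook this would be called as `upload_vectors(db.activity_vectors, avatar_labels, activity_matrix)`.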

Upload whole matrix through GridFS:

In [25]:
fs = gridfs.GridFS(db, collection='activity_matrix')
mtx_bin = bson.binary.Binary(pickle.dumps(activity_matrix, protocol=2), subtype=128)
mtx_fs = fs.put(mtx_bin, filename='activity_matrix')
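
To use the matrix later, the stored bytes have to be read back and unpickled. The sketch below shows that round trip. `load_matrix` is a hypothetical helper, and `FakeGridFS` is an in-memory stand-in that mimics only the `put` and `get` calls of `gridfs.GridFS` so the example runs without a server. Unpickling executes code embedded in the data, so this should only be done with files from a trusted source:

```python
import io
import pickle
import numpy as np

def load_matrix(fs, file_id):
    """Read a pickled matrix back from a GridFS-like object.

    Assumption: fs.get(file_id) returns a file-like object whose read()
    gives the stored bytes, as gridfs.GridFS does.
    """
    return pickle.loads(fs.get(file_id).read())

class FakeGridFS:
    """In-memory stand-in for gridfs.GridFS (illustration only)."""
    def __init__(self):
        self._files = {}
    def put(self, data, **kwargs):
        file_id = len(self._files)
        self._files[file_id] = bytes(data)
        return file_id
    def get(self, file_id):
        return io.BytesIO(self._files[file_id])

# Toy matrix standing in for activity_matrix
toy = np.arange(6).reshape(2, 3)
fake_fs = FakeGridFS()
toy_id = fake_fs.put(pickle.dumps(toy, protocol=2), filename='activity_matrix')
restored = load_matrix(fake_fs, toy_id)
```

With the real connection, `load_matrix(fs, mtx_fs)` would use the ID returned by `fs.put` above. If the ID is no longer at hand, `fs.get_last_version('activity_matrix').read()` fetches the most recent file stored under that filename.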