# Installation

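Using Docker, we downloaded the official Neo4j image and started a container with the following command. It publishes the HTTP (7474) and Bolt (7687) ports, runs the container in the background (`-d`), and sets the initial credentials to `neo4j`/`test` via `NEO4J_AUTH`: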
```
docker run --platform linux/amd64 -p 7474:7474 -p 7687:7687 -d --env NEO4J_AUTH=neo4j/test neo4j:latest
```
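Because the container runs detached (`-d`), Neo4j may still be starting when the next cell executes. The following is a minimal sketch, using only the standard library, that polls until something accepts TCP connections on the Bolt port. `wait_for_port` is a hypothetical helper, and an open port only approximates "the database is ready".

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Hypothetical helper: an accepted connection only shows that something is
    listening on the port, not that Neo4j has finished initialising.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to test the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: nothing is listening yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the port mapping from the command above, `wait_for_port("localhost", 7687)` would block until the Bolt port accepts connections or 60 seconds pass.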


# DB Connection 

We use Python to connect to the database, so we first install the official `neo4j` driver package from PyPI.

```
%pip install neo4j
```

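```python
from neo4j import GraphDatabase

# Connect over the Bolt protocol to the local container.
# encrypted=False because the container is not configured with TLS.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "test"), encrypted=False)
```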
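Creating the driver does not contact the server, so wrong credentials or a stopped container only show up at the first query. A minimal sketch of an explicit check, assuming the 5.x Python driver, where `Driver.verify_connectivity()` raises an exception if the server is unreachable or rejects the credentials; `check_connection` is a hypothetical name.

```python
def check_connection(driver) -> bool:
    """Return True if the server answers a trivial Cypher query.

    `driver` is expected to be a neo4j Driver. verify_connectivity() raises
    if the server cannot be reached or authentication fails.
    """
    driver.verify_connectivity()
    with driver.session() as session:
        # single() returns the only record, or None if there is none.
        record = session.run("RETURN 1 AS ok").single()
    return record is not None and record["ok"] == 1
```

Calling `check_connection(driver)` right after creating the driver surfaces connection problems before any data is loaded.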

# Data Loading

First, we read the data from the file system, where it is stored as `.tsv` files. The data was downloaded from https://snap.stanford.edu/data/act-mooc.html and is read with pandas.

```python
import pandas as pd

# Read the three tab-separated files from the act-mooc folder
actions = pd.read_csv('act-mooc/mooc_actions.tsv', sep='\t')
labels = pd.read_csv('act-mooc/mooc_action_labels.tsv', sep='\t')
features = pd.read_csv('act-mooc/mooc_action_features.tsv', sep='\t')
```
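Before loading anything, it can help to confirm that each file has the columns the rest of this notebook relies on. A minimal standard-library sketch: the expected column lists are taken from the merged DataFrame displayed later in this notebook, and `EXPECTED_COLUMNS`, `read_tsv_header` and `missing_columns` are hypothetical names.

```python
import csv
from pathlib import Path

# Columns each file contributes to the merged DataFrame shown later.
EXPECTED_COLUMNS = {
    "mooc_actions.tsv": ["ACTIONID", "USERID", "TARGETID", "TIMESTAMP"],
    "mooc_action_labels.tsv": ["ACTIONID", "LABEL"],
    "mooc_action_features.tsv": ["ACTIONID", "FEATURE0", "FEATURE1", "FEATURE2", "FEATURE3"],
}


def read_tsv_header(path: Path) -> list[str]:
    """Return the column names from the first line of a tab-separated file."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter="\t"))


def missing_columns(data_dir: Path) -> dict[str, list[str]]:
    """Map each file name to the expected columns absent from its header."""
    problems = {}
    for name, expected in EXPECTED_COLUMNS.items():
        header = read_tsv_header(data_dir / name)
        absent = [col for col in expected if col not in header]
        if absent:
            problems[name] = absent
    return problems
```

`missing_columns(Path("act-mooc"))` returns an empty dict when every file has the expected header.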

## Load Users

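```python
with driver.session() as session:
    # Create one :User node per distinct user ID in the actions table.
    for user_id in actions["USERID"].unique():
        # int() converts numpy.int64 into a plain int the driver can send.
        session.run("CREATE (n:User {id: $id})", id=int(user_id)).consume()
```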

## Load Targets

```python
with driver.session() as session:
    # Create one :Target node per distinct target ID in the actions table.
    for target_id in actions["TARGETID"].unique():
        session.run("CREATE (n:Target {id: $id})", id=int(target_id)).consume()
```
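Running one query per ID costs one round trip per node. As a hedged alternative, the IDs can be sent in batches with Cypher's `UNWIND`. In this sketch `batches`, `load_nodes`, `ALLOWED_LABELS` and the constraint names are hypothetical; it uses `MERGE` so that re-running it does not duplicate nodes, and it assumes Neo4j 4.4 or later for the `CREATE CONSTRAINT ... IF NOT EXISTS ... REQUIRE` syntax. Creating the constraint fails if duplicate `id` values already exist for that label, for example after running the loops above twice.

```python
from typing import Iterable, Iterator, List

# Labels are interpolated into the query text (Cypher cannot parameterise
# them), so only these known labels are accepted.
ALLOWED_LABELS = {"User", "Target"}


def batches(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` plain ints."""
    batch: List[int] = []
    for value in values:
        batch.append(int(value))  # numpy integers -> plain int for the driver
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_nodes(session, label: str, ids: Iterable[int], batch_size: int = 10_000) -> int:
    """Create one node per id, one UNWIND query per batch; returns the query count."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label}")
    # The uniqueness constraint is backed by an index on :Label(id),
    # which MERGE here and later MATCH-by-id lookups can use.
    session.run(
        f"CREATE CONSTRAINT {label.lower()}_id IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    ).consume()
    queries = 0
    for batch in batches(ids, batch_size):
        session.run(f"UNWIND $ids AS id MERGE (n:{label} {{id: id}})", ids=batch).consume()
        queries += 1
    return queries
```

Inside a `with driver.session() as session:` block, `load_nodes(session, "User", actions["USERID"].unique())` would replace the per-user loop, and the same call with `"Target"` and `actions["TARGETID"].unique()` would replace the per-target loop.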

## Load Actions

To load the actions into the graph database, we first build a single merged DataFrame that holds, for each action, its user, target, timestamp, label, and features.

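```python
# Join labels and features onto the actions by ACTIONID.
# merge() performs an inner join by default.
temp = actions.merge(labels, on=['ACTIONID'])
merged_actions = temp.merge(features, on=['ACTIONID'])
merged_actions
```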

|        | ACTIONID | USERID | TARGETID | TIMESTAMP | LABEL | FEATURE0  | FEATURE1  | FEATURE2 | FEATURE3  |
|-------:|---------:|-------:|---------:|----------:|------:|----------:|----------:|---------:|----------:|
| 0      | 0        | 0      | 0        | 0.0       | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 1      | 1        | 0      | 1        | 6.0       | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 2      | 2        | 0      | 2        | 41.0      | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 3      | 3        | 0      | 1        | 49.0      | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 4      | 4        | 0      | 2        | 51.0      | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| ...    | ...      | ...    | ...      | ...       | ...   | ...       | ...       | ...      | ...       |
| 411744 | 411744   | 7026   | 8        | 2572041.0 | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 411745 | 411745   | 6842   | 8        | 2572043.0 | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 411746 | 411746   | 7026   | 9        | 2572048.0 | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
| 411747 | 411747   | 6842   | 5        | 2572054.0 | 0     | -0.319991 | -0.435701 | 0.106784 | -0.067309 |
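The merge uses pandas' default inner join, which silently drops any action that lacks a label or feature row, and it does not check that `ACTIONID` is unique in each table. A hedged sketch that makes both assumptions explicit: `merge_action_tables` is a hypothetical name, while `how` and `validate="one_to_one"` are standard `DataFrame.merge` parameters (`validate` raises `pandas.errors.MergeError` on duplicate keys).

```python
import pandas as pd


def merge_action_tables(actions: pd.DataFrame, labels: pd.DataFrame,
                        features: pd.DataFrame) -> pd.DataFrame:
    """Merge the three tables on ACTIONID and check nothing was lost or duplicated."""
    merged = (
        actions.merge(labels, on="ACTIONID", how="inner", validate="one_to_one")
        .merge(features, on="ACTIONID", how="inner", validate="one_to_one")
    )
    # An inner join drops actions without a matching label or feature row.
    if len(merged) != len(actions):
        missing = len(actions) - len(merged)
        raise ValueError(f"{missing} actions have no matching label/feature row")
    return merged
```

`merged_actions = merge_action_tables(actions, labels, features)` would produce the same table as above while failing loudly if the inputs do not line up one-to-one.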


```python
# iterrows() yields (index, row) pairs; so far this loop only counts the actions.
count = 0
for _, action in merged_actions.iterrows():
    count += 1
```
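The loop above does not yet write anything to the database. One possible way to load the actions, sketched under assumptions: each action becomes a relationship from its `:User` node to its `:Target` node, with the timestamp, label, and four features stored as relationship properties. The relationship type `ACTION`, the property names, and the helpers `action_rows` and `load_actions` are hypothetical, and the `MATCH` lookups are only fast if `:User(id)` and `:Target(id)` are indexed.

```python
from typing import Any, Dict, Iterable, List

FEATURE_COLUMNS = ["FEATURE0", "FEATURE1", "FEATURE2", "FEATURE3"]

# Hypothetical model: (:User)-[:ACTION {...}]->(:Target), one relationship per action.
LOAD_ACTIONS = """
UNWIND $rows AS row
MATCH (u:User {id: row.user_id})
MATCH (t:Target {id: row.target_id})
CREATE (u)-[:ACTION {id: row.action_id, timestamp: row.timestamp,
                     label: row.label, features: row.features}]->(t)
"""


def action_rows(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert merged rows to plain Python types the driver can send."""
    return [
        {
            "action_id": int(r["ACTIONID"]),
            "user_id": int(r["USERID"]),
            "target_id": int(r["TARGETID"]),
            "timestamp": float(r["TIMESTAMP"]),
            "label": int(r["LABEL"]),
            "features": [float(r[c]) for c in FEATURE_COLUMNS],
        }
        for r in records
    ]


def load_actions(session, records: Iterable[Dict[str, Any]], batch_size: int = 10_000) -> int:
    """Send the actions in UNWIND batches; returns the number of rows sent."""
    rows = action_rows(records)
    for start in range(0, len(rows), batch_size):
        session.run(LOAD_ACTIONS, rows=rows[start:start + batch_size]).consume()
    return len(rows)
```

With an open session, `load_actions(session, merged_actions.to_dict(orient="records"))` would send the merged table in batches of 10,000 rows.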
