
Update Neo4j generators for new batch_num argument #1050

Merged
merged 1 commit into develop from bugfix/1016-neo4j-typeerror on Mar 10, 2020

Conversation

huonw (Member) commented Mar 10, 2020

In 4070ccf (#844), a second argument (batch_num) was added to the sample_features function in the BatchedNodeGenerator class, and most subclasses were updated, but not the Neo4j ones. This PR adds that argument.
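For illustration, the shape of the change can be sketched like this (the class and method names come from the description above; the bodies are hypothetical stand-ins, not the real StellarGraph implementations):

```python
# Sketch of the signature change: subclasses of BatchedNodeGenerator must
# now accept the extra batch_num argument in sample_features.
class BatchedNodeGenerator:
    def sample_features(self, head_nodes, batch_num):
        raise NotImplementedError

class Neo4jGeneratorSketch(BatchedNodeGenerator):  # hypothetical stand-in
    def sample_features(self, head_nodes, batch_num):
        # The real subclass samples neighbourhoods from Neo4j; here we
        # just echo the inputs to show the updated signature.
        return head_nodes, batch_num

gen = Neo4jGeneratorSketch()
print(gen.sample_features(["n1", "n2"], 0))  # → (['n1', 'n2'], 0)
```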

This code is untested on CI (not even the notebooks: #849), but that's being worked on (#1046), and I've manually verified that the notebooks run for now.

See: #1016

codeclimate bot commented Mar 10, 2020

Code Climate has analyzed commit f352b30 and detected 0 issues on this pull request.

View more on Code Climate.

huonw requested a review from kjun9 on March 10, 2020 01:20
huonw merged commit 86fe094 into develop on Mar 10, 2020
huonw deleted the bugfix/1016-neo4j-typeerror branch on March 10, 2020 01:30
@richardmark
Thanks,
I downloaded the updates and ran the code in demo/connector/neo4j, and got an apoc.cypher error that required me to do the following:

  1. Download the APOC jar for my version of Neo4j: I copied apoc-3.5.0.9-all.jar from https://github.com/neo4j-contrib/neo4j-apoc-procedures to the $NEO4J_HOME\plugins directory.

  2. Change the line in $NEO4J_HOME/conf/neo4j.conf to dbms.security.procedures.unrestricted=apoc.*

  3. Restart Neo4j.

  4. In Neo4j, run call dbms.procedures() and confirm all the APOC procedures were there.

  5. Run the demo/connector/neo4j .ipynb files; they all worked.

Thanks
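Step 4 above can also be scripted. With a live server the listing would come from py2neo (e.g. `Graph(...).run("CALL dbms.procedures()")`); since no database is available here, the sketch below applies the same filter to an illustrative, made-up listing:

```python
def list_apoc(procedure_names):
    """Return only the APOC procedures from a dbms.procedures() listing."""
    return sorted(name for name in procedure_names if name.startswith("apoc."))

# Illustrative listing; a real one comes from `CALL dbms.procedures()`
listing = ["db.labels", "apoc.cypher.run", "apoc.coll.sum", "dbms.procedures"]
print(list_apoc(listing))  # → ['apoc.coll.sum', 'apoc.cypher.run']
```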

huonw (Member, Author) commented Mar 10, 2020

Awesome; thanks for letting us know!

richardmark commented Mar 10, 2020 via email

richardmark commented Mar 10, 2020

I am getting the same error as reported by others when using py2neo 4.3.0 and Neo4j version 4.0 on Windows 10.

ClientError: SyntaxError: The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))
"        UNWIND {node_id_list} AS node_id"

Using py2neo 4.3.0 and Neo4j version 3.5 does not cause errors on Windows 10.

Full error from directed-graphsage-on-cora-neo4j-example.ipynb:

---------------------------------------------------------------------------
HydrationError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\py2neo\internal\connectors.py in run(self, statement, parameters, tx, graph, keys, entities)
    371         try:
--> 372             raw_result = hydrator.hydrate_result(r.data.decode("utf-8"))
    373         except HydrationError as e:

~\Anaconda3\lib\site-packages\py2neo\internal\hydration\__init__.py in hydrate_result(self, data, index)
    432         if data.get("errors"):
--> 433             raise HydrationError(*data["errors"])
    434         return data["results"][index]

HydrationError: {'code': 'Neo.ClientError.Statement.SyntaxError', 'message': 'The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))\r\n"        UNWIND {node_id_list} AS node_id"\r\n                ^'}

During handling of the above exception, another exception occurred:

ClientError                               Traceback (most recent call last)
<ipython-input-18-121d29a55fd9> in <module>
----> 1 history = model.fit(train_gen, epochs=20, validation_data=test_gen, verbose=2, shuffle=False)

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817         max_queue_size=max_queue_size,
    818         workers=workers,
--> 819         use_multiprocessing=use_multiprocessing)
    820 
    821   def evaluate(self,

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    233           max_queue_size=max_queue_size,
    234           workers=workers,
--> 235           use_multiprocessing=use_multiprocessing)
    236 
    237       total_samples = _get_total_number_of_samples(training_data_adapter)

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_training_inputs(model, x, y, batch_size, epochs, sample_weights, class_weights, steps_per_epoch, validation_split, validation_data, validation_steps, shuffle, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    591         max_queue_size=max_queue_size,
    592         workers=workers,
--> 593         use_multiprocessing=use_multiprocessing)
    594     val_adapter = None
    595     if validation_data:

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_inputs(model, mode, x, y, batch_size, epochs, sample_weights, class_weights, shuffle, steps, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    704       max_queue_size=max_queue_size,
    705       workers=workers,
--> 706       use_multiprocessing=use_multiprocessing)
    707 
    708   return adapter

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, shuffle, workers, use_multiprocessing, max_queue_size, **kwargs)
    950         use_multiprocessing=use_multiprocessing,
    951         max_queue_size=max_queue_size,
--> 952         **kwargs)
    953 
    954   @staticmethod

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, workers, use_multiprocessing, max_queue_size, **kwargs)
    745     # Since we have to know the dtype of the python generator when we build the
    746     # dataset, we have to look at a batch to infer the structure.
--> 747     peek, x = self._peek_and_restore(x)
    748     assert_not_namedtuple(peek)
    749 

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in _peek_and_restore(x)
    954   @staticmethod
    955   def _peek_and_restore(x):
--> 956     return x[0], x
    957 
    958   def _make_callable(self, x, workers, use_multiprocessing, max_queue_size):

~\Anaconda3\lib\site-packages\stellargraph\mapper\sequences.py in __getitem__(self, batch_num)
    135 
    136         # Get features for nodes
--> 137         batch_feats = self._sample_function(head_ids, batch_num)
    138 
    139         return batch_feats, batch_targets

~\Anaconda3\lib\site-packages\stellargraph\connector\neo4j\mapper.py in sample_features(self, head_nodes, batch_num)
    241             n=1,
    242             in_size=self.in_samples,
--> 243             out_size=self.out_samples,
    244         )
    245 

~\Anaconda3\lib\site-packages\stellargraph\connector\neo4j\sampler.py in run(self, neo4j_graphdb, nodes, n, in_size, out_size, seed)
    156                 neighbor_records = neo4j_graphdb.run(
    157                     in_sample_query,
--> 158                     parameters={"node_id_list": cur_nodes, "num_samples": in_num},
    159                 )
    160                 this_hop.append(neighbor_records.data()[0]["next_samples"])

~\Anaconda3\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
    531         :return:
    532         """
--> 533         return self.begin(autocommit=True).run(cypher, parameters, **kwparameters)
    534 
    535     def separate(self, subgraph):

~\Anaconda3\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
    826                                              graph=self.graph,
    827                                              keys=[],
--> 828                                              entities=entities))
    829         except CypherError as error:
    830             raise GraphError.hydrate({"code": error.code, "message": error.message})

~\Anaconda3\lib\site-packages\py2neo\internal\connectors.py in run(self, statement, parameters, tx, graph, keys, entities)
    375             if tx is not None:
    376                 self.transactions.remove(tx)
--> 377             raise GraphError.hydrate(e.args[0])
    378         else:
    379             result = CypherResult({

ClientError: SyntaxError: The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))
"        UNWIND {node_id_list} AS node_id"

huonw (Member, Author) commented Mar 10, 2020

Thanks for testing. I opened a separate issue: #1055.

> same error as reported by others

Reported by whom? I'm interested to know about any other discussion.

richardmark commented Mar 10, 2020 via email

huonw (Member, Author) commented Mar 10, 2020

Ah, I see. Fortunately I think the problem is entirely in StellarGraph, since we have some Cypher queries that use the `{...}` syntax, so we don't have to wait for a fix in py2neo.
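To illustrate the kind of change needed (the query text below is a made-up fragment in the style of stellargraph/connector/neo4j/sampler.py, not the real query), a mechanical rewrite of the legacy `{param}` placeholders to the `$param` form Neo4j 4.0 requires could look like:

```python
import re

# Made-up query fragment using the legacy parameter syntax
old_query = """
UNWIND {node_id_list} AS node_id
MATCH (node) WHERE node.id = node_id
RETURN collect(node.id)[..{num_samples}] AS next_samples
"""

# Neo4j 4.0 removed `{param}` in favour of `$param`
new_query = re.sub(r"\{(\w+)\}", r"$\1", old_query)
print("$node_id_list" in new_query and "{num_samples}" not in new_query)  # → True
```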

richardmark commented Mar 10, 2020

Thanks,
A fix for Neo4j 4.0 would be great.

Not really a fix, but an improvement would be to add data visualization of the results in the form of a confusion matrix plot. Something like the following, which I modified from Kaggle, could be added to the end of your .ipynb files. You could save plot_confusion_matrix as a separate Python file to keep the notebook code clean.

df["Predicted"] = df["Predicted"].replace("subject=", "", regex=True)
from sklearn.metrics import confusion_matrix
ConfusionMatrix = confusion_matrix(df["True"], df["Predicted"])
print("Confusion matrix:\n%s" % ConfusionMatrix)

def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    # Normalize before plotting, so the image and the cell annotations agree
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        fmt = "{:0.4f}" if normalize else "{:,}"
        plt.text(j, i, fmt.format(cm[i, j]),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

# A raw-count confusion matrix plot
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = False,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix")

# A normalized confusion matrix plot
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = True,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix, Normalized")

huonw (Member, Author) commented Mar 11, 2020

> Not really a fix but more of an improvement would be to add data visualization of the results in the form of a confusion matrix plot something like this I modified from Kaggle can be added to the end of your ipynb file. You could just save def plot_confusion_matrix as a python file to clean the code

Thanks. That might be something great to include in one of our other example notebooks. We're trying to keep the Neo4j notebooks focused on the Neo4j-specific functionality, but data visualisation like that is perfect for one of the other ones, like the GCN one or even the GraphSAGE one. We could add a cell like:

import pandas as pd
from sklearn.metrics import confusion_matrix

names = sorted(set(df["True"]))
confusion = confusion_matrix(df["True"], df["Predicted"], labels=names)

pd.DataFrame(confusion, index=names, columns=names)

It's not quite as nice as the plot, but it's less code because it can leverage the existing DataFrame formatting support.

If you were interested, I'd be very happy to help you open a pull request with an improvement like this.

@richardmark

Thanks,
I will look over the steps for opening a pull request.

Simplifying data visualization is crucial in decision making, as is using an ensemble of tools that includes graph neural networks. I have started to modify your code to integrate it into a medical application that helps break a population of over 12 million people into numerous at-risk groups for a large assortment of diseases and bad outcomes. An ensemble of tools like yours gives significantly better overall predictions than any single tool. Likewise, data visualization and the confusion matrix point out weaknesses in predictions, and can help identify areas where additional features (risk factors) may need to be added to the model, or subsets of at-risk patients that require root cause analysis methods to identify why they are outliers on our predictive models.

Rick

huonw (Member, Author) commented Mar 11, 2020

You're 100% correct about the importance of visualisation. However, StellarGraph is focused on graphs and graph machine learning, and part of that is building on the shoulders of giants: instead of having to invent our own pre-processing and post-processing (e.g. visualisation) pipeline, we can benefit from tools like scikit-learn, Pandas, NumPy and matplotlib, and all the libraries that work with them. Once machine learning results have been computed, conventional visualisation/analysis of them can leverage those great libraries. This allows everyone to benefit from their existing skills and from the numerous resources about using those libraries, and lets us StellarGraph developers focus on adding and improving our graph ML algorithms rather than writing visualisation tutorials (and the ones we write are likely to be worse than the others available on the internet, because we're graph experts, not visualisation ones 😄).

That said, a display of a confusion matrix would be a perfect addition to our notebooks.

Also, we're enthusiastic about people using StellarGraph for interesting applications. Please stay in touch, and file issues for any further help/advice we can offer!
