
Update Neo4j generators for new batch_num argument #1050

Merged
merged 1 commit into develop from bugfix/1016-neo4j-typeerror on Mar 10, 2020

Conversation

huonw (Member) commented Mar 10, 2020

In 4070ccf (#844), a second argument (batch_num) was added to the sample_features function in the BatchedNodeGenerator class, and most subclasses were updated, but not the Neo4j ones. This PR adds that argument.
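For illustration, the shape of the change can be sketched like this (the class and method names come from the description above; the bodies are hypothetical stand-ins, not the real StellarGraph implementations):

```python
# Sketch of the signature change: subclasses of BatchedNodeGenerator must
# now accept the extra batch_num argument in sample_features.
class BatchedNodeGenerator:
    def sample_features(self, head_nodes, batch_num):
        raise NotImplementedError

class Neo4jGeneratorSketch(BatchedNodeGenerator):  # hypothetical stand-in
    def sample_features(self, head_nodes, batch_num):
        # The real subclass samples neighbourhoods from Neo4j; here we
        # just echo the inputs to show the updated signature.
        return head_nodes, batch_num

gen = Neo4jGeneratorSketch()
print(gen.sample_features(["n1", "n2"], 0))  # → (['n1', 'n2'], 0)
```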

This code is untested on CI (not even the notebooks: #849), but that's being worked on (#1046), and I've manually verified that the notebooks run for now.

See: #1016

codeclimate bot commented Mar 10, 2020

Code Climate has analyzed commit f352b30 and detected 0 issues on this pull request.

View more on Code Climate.

huonw requested a review from kjun9 on March 10, 2020 01:20
huonw merged commit 86fe094 into develop on Mar 10, 2020
huonw deleted the bugfix/1016-neo4j-typeerror branch on March 10, 2020 01:30
@richardmark
Thanks,
I downloaded the updates and ran the code in demo/connector/neo4j, and got an apoc.cypher error that required me to do the following:

  1. Download the APOC jar for my version of Neo4j: I copied apoc-3.5.0.9-all.jar from https://github.com/neo4j-contrib/neo4j-apoc-procedures to the $NEO4J_HOME\plugins directory.

  2. Change the line in $NEO4J_HOME/conf/neo4j.conf to dbms.security.procedures.unrestricted=apoc.*

  3. Restart Neo4j.

  4. In Neo4j, run call dbms.procedures() and confirm all the APOC procedures were there.

  5. Run the demo/connector/neo4j .ipynb files; they all worked.

Thanks
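Step 4 above can also be scripted. With a live server the listing would come from py2neo (e.g. `Graph(...).run("CALL dbms.procedures()")`); since no database is available here, the sketch below applies the same filter to an illustrative, made-up listing:

```python
def list_apoc(procedure_names):
    """Return only the APOC procedures from a dbms.procedures() listing."""
    return sorted(name for name in procedure_names if name.startswith("apoc."))

# Illustrative listing; a real one comes from `CALL dbms.procedures()`
listing = ["db.labels", "apoc.cypher.run", "apoc.coll.sum", "dbms.procedures"]
print(list_apoc(listing))  # → ['apoc.coll.sum', 'apoc.cypher.run']
```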

huonw (Member, Author) commented Mar 10, 2020

Awesome; thanks for letting us know!

richardmark commented Mar 10, 2020 via email

richardmark commented Mar 10, 2020

I am getting the same error as reported by others when using py2neo 4.3.0 and Neo4j version 4.0 on Windows 10.

ClientError: SyntaxError: The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))
"        UNWIND {node_id_list} AS node_id"

Using py2neo 4.3.0 and Neo4j version 3.5 does not cause errors on Windows 10.

Full error from directed-graphsage-on-cora-neo4j-example.ipynb:

---------------------------------------------------------------------------
HydrationError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\py2neo\internal\connectors.py in run(self, statement, parameters, tx, graph, keys, entities)
    371         try:
--> 372             raw_result = hydrator.hydrate_result(r.data.decode("utf-8"))
    373         except HydrationError as e:

~\Anaconda3\lib\site-packages\py2neo\internal\hydration\__init__.py in hydrate_result(self, data, index)
    432         if data.get("errors"):
--> 433             raise HydrationError(*data["errors"])
    434         return data["results"][index]

HydrationError: {'code': 'Neo.ClientError.Statement.SyntaxError', 'message': 'The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))\r\n"        UNWIND {node_id_list} AS node_id"\r\n                ^'}

During handling of the above exception, another exception occurred:

ClientError                               Traceback (most recent call last)
<ipython-input-18-121d29a55fd9> in <module>
----> 1 history = model.fit(train_gen, epochs=20, validation_data=test_gen, verbose=2, shuffle=False)

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817         max_queue_size=max_queue_size,
    818         workers=workers,
--> 819         use_multiprocessing=use_multiprocessing)
    820 
    821   def evaluate(self,

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    233           max_queue_size=max_queue_size,
    234           workers=workers,
--> 235           use_multiprocessing=use_multiprocessing)
    236 
    237       total_samples = _get_total_number_of_samples(training_data_adapter)

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_training_inputs(model, x, y, batch_size, epochs, sample_weights, class_weights, steps_per_epoch, validation_split, validation_data, validation_steps, shuffle, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    591         max_queue_size=max_queue_size,
    592         workers=workers,
--> 593         use_multiprocessing=use_multiprocessing)
    594     val_adapter = None
    595     if validation_data:

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_inputs(model, mode, x, y, batch_size, epochs, sample_weights, class_weights, shuffle, steps, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    704       max_queue_size=max_queue_size,
    705       workers=workers,
--> 706       use_multiprocessing=use_multiprocessing)
    707 
    708   return adapter

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, shuffle, workers, use_multiprocessing, max_queue_size, **kwargs)
    950         use_multiprocessing=use_multiprocessing,
    951         max_queue_size=max_queue_size,
--> 952         **kwargs)
    953 
    954   @staticmethod

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, standardize_function, workers, use_multiprocessing, max_queue_size, **kwargs)
    745     # Since we have to know the dtype of the python generator when we build the
    746     # dataset, we have to look at a batch to infer the structure.
--> 747     peek, x = self._peek_and_restore(x)
    748     assert_not_namedtuple(peek)
    749 

~\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\data_adapter.py in _peek_and_restore(x)
    954   @staticmethod
    955   def _peek_and_restore(x):
--> 956     return x[0], x
    957 
    958   def _make_callable(self, x, workers, use_multiprocessing, max_queue_size):

~\Anaconda3\lib\site-packages\stellargraph\mapper\sequences.py in __getitem__(self, batch_num)
    135 
    136         # Get features for nodes
--> 137         batch_feats = self._sample_function(head_ids, batch_num)
    138 
    139         return batch_feats, batch_targets

~\Anaconda3\lib\site-packages\stellargraph\connector\neo4j\mapper.py in sample_features(self, head_nodes, batch_num)
    241             n=1,
    242             in_size=self.in_samples,
--> 243             out_size=self.out_samples,
    244         )
    245 

~\Anaconda3\lib\site-packages\stellargraph\connector\neo4j\sampler.py in run(self, neo4j_graphdb, nodes, n, in_size, out_size, seed)
    156                 neighbor_records = neo4j_graphdb.run(
    157                     in_sample_query,
--> 158                     parameters={"node_id_list": cur_nodes, "num_samples": in_num},
    159                 )
    160                 this_hop.append(neighbor_records.data()[0]["next_samples"])

~\Anaconda3\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
    531         :return:
    532         """
--> 533         return self.begin(autocommit=True).run(cypher, parameters, **kwparameters)
    534 
    535     def separate(self, subgraph):

~\Anaconda3\lib\site-packages\py2neo\database.py in run(self, cypher, parameters, **kwparameters)
    826                                              graph=self.graph,
    827                                              keys=[],
--> 828                                              entities=entities))
    829         except CypherError as error:
    830             raise GraphError.hydrate({"code": error.code, "message": error.message})

~\Anaconda3\lib\site-packages\py2neo\internal\connectors.py in run(self, statement, parameters, tx, graph, keys, entities)
    375             if tx is not None:
    376                 self.transactions.remove(tx)
--> 377             raise GraphError.hydrate(e.args[0])
    378         else:
    379             result = CypherResult({

ClientError: SyntaxError: The old parameter syntax `{param}` is no longer supported. Please use `$param` instead (line 3, column 8 (offset: 79))
"        UNWIND {node_id_list} AS node_id"

huonw (Member, Author) commented Mar 10, 2020

Thanks for testing. I opened a separate issue: #1055.

> same error as reported by others

Reported by whom? I'm interested to know about any other discussion.

richardmark commented Mar 10, 2020 via email

huonw (Member, Author) commented Mar 10, 2020

Ah, I see. Fortunately I think the problem is entirely in StellarGraph, since we have some Cypher queries that use the `{...}` syntax, so we don't have to wait for a fix in py2neo.
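To illustrate the kind of change needed (the query text below is a made-up fragment in the style of stellargraph/connector/neo4j/sampler.py, not the real query), a mechanical rewrite of the legacy `{param}` placeholders to the `$param` form Neo4j 4.0 requires could look like:

```python
import re

# Made-up query fragment using the legacy parameter syntax
old_query = """
UNWIND {node_id_list} AS node_id
MATCH (node) WHERE node.id = node_id
RETURN collect(node.id)[..{num_samples}] AS next_samples
"""

# Neo4j 4.0 removed `{param}` in favour of `$param`
new_query = re.sub(r"\{(\w+)\}", r"$\1", old_query)
print("$node_id_list" in new_query and "{num_samples}" not in new_query)  # → True
```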

richardmark commented Mar 10, 2020

Thanks,
A fix for Neo4j 4.0 would be great.

Not really a fix, but an improvement would be to add data visualization of the results in the form of a confusion matrix plot. Something like the following, which I modified from Kaggle, could be added to the end of your .ipynb files. You could save plot_confusion_matrix as a separate Python file to keep the notebook code clean.

df["Predicted"] = df["Predicted"].replace("subject=", "", regex=True)
from sklearn.metrics import confusion_matrix
ConfusionMatrix = confusion_matrix(df["True"], df["Predicted"])
print("Confusion matrix:\n%s" % ConfusionMatrix)

def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    ---------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    # Normalize before plotting, so the image and the cell annotations agree
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        fmt = "{:0.4f}" if normalize else "{:,}"
        plt.text(j, i, fmt.format(cm[i, j]),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

# A raw-count confusion matrix plot
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = False,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix")

# A normalized confusion matrix plot
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = True,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix, Normalized")

huonw (Member, Author) commented Mar 11, 2020

> Not really a fix but more of an improvement would be to add data visualization of the results in the form of a confusion matrix plot something like this I modified from Kaggle can be added to the end of your ipynb file. You could just save def plot_confusion_matrix as a python file to clean the code

Thanks. That might be something great to include in one of our other example notebooks. We're trying to keep the Neo4j notebooks focused on the Neo4j-specific functionality, but data visualisation like that is perfect for one of the other ones, like the GCN one or even the GraphSAGE one. We could add a cell like:

import pandas as pd
from sklearn.metrics import confusion_matrix

names = sorted(set(df["True"]))
confusion = confusion_matrix(df["True"], df["Predicted"], labels=names)

pd.DataFrame(confusion, index=names, columns=names)

It's not quite as nice as the plot, but it's less code because it can leverage the existing DataFrame formatting support.

If you were interested, I'd be very happy to help you open a pull request with an improvement like this.

@richardmark

Thanks,
I will look over the steps for opening a pull request.

Simplifying data visualization is crucial in decision making, as is using an ensemble of tools that includes graph neural networks. I have started to modify your code to integrate it into a medical application that helps break a population of over 12 million people into numerous at-risk groups for a large assortment of diseases and bad outcomes. An ensemble of tools like yours gives significantly better overall predictions than any single tool. Likewise, data visualization and the confusion matrix point out weaknesses in predictions, and can help identify areas where additional features (risk factors) may need to be added to the model, or subsets of at-risk patients that require root cause analysis methods to identify why they are outliers on our predictive models.

Rick

huonw (Member, Author) commented Mar 11, 2020

You're 100% correct about the importance of visualisation. However, StellarGraph is focused on graphs and graph machine learning, and part of that is building on the shoulders of giants: instead of having to invent our own pre-processing and post-processing (e.g. visualisation) pipeline, we can benefit from tools like scikit-learn, Pandas, NumPy and matplotlib, and all the libraries that work with them. Once machine learning results have been computed, conventional visualisation/analysis of them can leverage those great libraries. This allows everyone to benefit from their existing skills and from the numerous resources about using those libraries, and lets us StellarGraph developers focus on adding and improving our graph ML algorithms rather than writing visualisation tutorials (and the ones we write are likely to be worse than the others available on the internet, because we're graph experts, not visualisation ones 😄).

That said, a display of a confusion matrix would be a perfect addition to our notebooks.

Also, we're enthusiastic about people using StellarGraph for interesting applications. Please stay in touch, and file issues for any further help/advice we can offer!
