Update Neo4j generators for new batch_num argument #1050
Code Climate has analyzed commit f352b30 and detected 0 issues on this pull request. View more on Code Climate. |
Thanks,
Thanks |
Awesome; thanks for letting us know! |
I am going to test your code using Neo4j version 4.0.0 and will let the folks on the Neo4j discussion boards know the results, since so far there is not much on using Neo4j with graph neural networks. Rick
On Monday, March 9, 2020, 09:07:28 PM PDT, Huon Wilson <notifications@github.com> wrote:
Awesome; thanks for letting us know!
|
I am getting the same error as reported by others when using py2neo 4.3.0 and Neo4j version 4.0 on Windows 10.
Using py2neo 4.3.0 and Neo4j version 3.5 does not cause errors on Windows 10. The error comes from directed-graphsage-on-cora-neo4j-example.ipynb.
|
Thanks for testing. I opened a separate issue #1055.
As reported by who? I'm interested to know about any other discussion. |
The error "SyntaxError: The old parameter syntax `{param}` is no longer supported" is related to py2neo; see "SyntaxError: The old parameter syntax `{param}` is no longer supported · Issue #791 · technige/py2neo".
That issue reports: Using neo4j 4.0.0 Using py2neo v4 g.nodes.match(U, _id=3125349375).first() yields: ClientError: SyntaxError: The...
|
Ah, I see. Fortunately I think the problem is entirely in stellargraph, since we have some Cypher queries that use the |
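For context on the error quoted earlier in the thread: Neo4j 4.0 rejects the legacy `{param}` placeholder form and expects `$param` instead. Below is a minimal sketch of that rewrite, assuming placeholders are plain identifiers. The helper name `to_dollar_params` and the sample query are invented for illustration and are not taken from StellarGraph or py2neo.

```python
import re

# Legacy placeholder form: `{name}` with nothing but an identifier inside.
# Assumption: map literals such as `{name: 'a'}` contain a colon, so they
# do not match and are left alone.
_OLD_PARAM = re.compile(r"\{(\w+)\}")


def to_dollar_params(query: str) -> str:
    """Rewrite legacy `{param}` placeholders as `$param`."""
    return _OLD_PARAM.sub(r"$\1", query)


# Hypothetical query, for illustration only.
old_query = "MATCH (n) WHERE n.ID IN {node_ids} RETURN n.ID LIMIT {limit}"
new_query = to_dollar_params(old_query)
print(new_query)
```

A regex like this is only a migration aid. In a codebase with a handful of queries, editing the query strings by hand is likely simpler and safer.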
Thanks. Not really a fix, but an improvement would be to add data visualization of the results in the form of a confusion matrix plot. Something like this, which I modified from Kaggle, can be added to the end of your ipynb file. You could save def plot_confusion_matrix as a separate Python file to keep the notebook clean.
df["Predicted"] = df["Predicted"].replace("subject=", "", regex=True)
from sklearn.metrics import confusion_matrix
ConfusionMatrix = confusion_matrix(df["True"], df["Predicted"])
print("Confusion matrix:\n%s" % ConfusionMatrix)
import numpy as np
def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix
    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']
    title:        the text to display at the top of the matrix
    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues
    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    """
    import matplotlib.pyplot as plt
    import numpy as np
    import itertools

    # Accuracy is computed from the raw counts, before any normalization.
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap('Blues')

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        # Divide each row by its total so cells show per-class proportions.
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    # Cells above this threshold get white text so they stay readable.
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()

# A non-normalized confusion matrix plot (raw counts)
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = False,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix")

# A normalized confusion matrix plot (per-class proportions)
plot_confusion_matrix(cm           = ConfusionMatrix,
                      normalize    = True,
                      target_names = ['Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'],
                      title        = "Confusion Matrix, Normalized")
|
Thanks. That might be something great to include in one of our other example notebooks. We're trying to keep the Neo4j notebooks focused on the Neo4j-specific functionality, but data visualisation like that is perfect to go in one of the other ones, like the GCN one or even the GraphSAGE one. We could add a cell like:
from sklearn.metrics import confusion_matrix
confusion = confusion_matrix(df["True"], df["Predicted"])
names = sorted(set(df["True"]))
pd.DataFrame(confusion, index=names, columns=names)
It's not quite as nice as the plot, but it's less code because it can leverage the existing DataFrame formatting support. If you were interested, I'd be very happy to help you open a pull request with an improvement like this. |
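To make explicit what the cells of such a table mean, here is a dependency-free sketch of the same counting, plus the row normalization used by the normalize=True option in the plotting function earlier in the thread. The function names and the toy labels are invented for illustration. In a notebook, `sklearn.metrics.confusion_matrix` remains the practical choice.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    # Use labels seen in either list so rows and columns always line up.
    names = sorted(set(true_labels) | set(predicted_labels))
    pairs = Counter(zip(true_labels, predicted_labels))
    matrix = [[pairs[(t, p)] for p in names] for t in names]
    return names, matrix


def row_normalize(matrix):
    """Divide each row by its total so cells show per-class proportions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized


# Toy labels, invented for illustration only.
true_labels = ["Theory", "Theory", "Rule_Learning", "Theory"]
predicted_labels = ["Theory", "Rule_Learning", "Rule_Learning", "Theory"]

names, matrix = confusion_counts(true_labels, predicted_labels)
# Accuracy is the diagonal (correct predictions) over all samples.
accuracy = sum(matrix[i][i] for i in range(len(names))) / len(true_labels)
print(names)
print(matrix)
print(row_normalize(matrix))
print(accuracy)
```

The sketch takes the union of true and predicted labels, which matches scikit-learn's default of using every label that appears in either input, in sorted order. If a prediction contains a class that never occurs in the true labels, names built from the true column alone would not match the matrix shape.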
Thanks. Simplifying data visualization is crucial in decision making, as is using an ensemble of tools that includes graph neural networks. I have started to modify your code to integrate it into a medical application that helps break a population of over 12 million people into numerous at-risk groups for a large assortment of diseases and bad outcomes. An ensemble of tools like yours gives significantly better overall predictions than a single tool. Likewise, data visualization and the confusion matrix point out weaknesses in predictions, and can help identify areas where additional features (risk factors) may need to be added to the model, or subsets of at-risk patients that require root cause analysis to identify why they are outliers in our predictive models. Rick |
You're 100% correct about the importance of visualisation. However, StellarGraph is focused on graphs and graph machine learning, and part of that is built on the shoulders of giants: instead of having to invent our own pre-processing and post-processing (e.g. visualisation) pipeline, we can benefit from tools like scikit-learn, Pandas, NumPy and matplotlib, and all the libraries that work with them. Once machine learning results have been computed, conventional visualisation/analysis of them can leverage those great libraries. This allows everyone to benefit from their existing skills, and from the numerous resources about using those libraries, and lets us StellarGraph developers focus on adding and improving our graph ML algorithms, rather than writing visualisation tutorials (and the ones we write are likely to be worse than the others available on the internet, because we're graph experts, not visualisation ones 😄). That said, a display of a confusion matrix would be a perfect addition to our notebooks. Also, we're enthusiastic about people using StellarGraph for interesting applications. Please stay in touch, and file issues for any further help/advice we can offer! |
In 4070ccf (#844), a second argument (batch_num) was added to the sample_features function in the BatchedNodeGenerator class, and most subclasses were updated, but not the Neo4j ones. This PR adds that argument.
This code is untested on CI (even the notebooks #849), but that's being worked on (#1046), and I've manually verified the notebooks run for now.
See: #1016
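The kind of breakage this PR addresses can be sketched with a simplified stand-in: when a base class starts calling sample_features with a second batch_num argument, any subclass still using the one-argument signature fails at call time rather than at definition time. All class names below are hypothetical and the bodies are placeholders. Only the sample_features and batch_num names come from the description above.

```python
from abc import ABC, abstractmethod


class BatchedGeneratorSketch(ABC):
    """Hypothetical stand-in for a batched node generator base class."""

    @abstractmethod
    def sample_features(self, head_nodes, batch_num):
        """Return features for one batch; batch_num identifies the batch."""

    def batches(self, node_ids, batch_size):
        # The caller passes both arguments, so every subclass must accept both.
        for batch_num, start in enumerate(range(0, len(node_ids), batch_size)):
            yield self.sample_features(node_ids[start:start + batch_size], batch_num)


class UpdatedGenerator(BatchedGeneratorSketch):
    def sample_features(self, head_nodes, batch_num):
        # Placeholder "features": just echo the batch number and node IDs.
        return (batch_num, list(head_nodes))


class OutdatedGenerator(BatchedGeneratorSketch):
    # Old one-argument signature: instantiation works, but the call fails.
    def sample_features(self, head_nodes):
        return list(head_nodes)


updated = list(UpdatedGenerator().batches([10, 11, 12], batch_size=2))
print(updated)

try:
    list(OutdatedGenerator().batches([10, 11, 12], batch_size=2))
    outdated_failed = False
except TypeError as error:
    outdated_failed = True
    print("outdated subclass fails:", type(error).__name__)
```

Because Python's abstract-method check only looks at method names, not signatures, a mismatch like this surfaces only when the code actually runs. That is why generators not exercised on CI can fall out of date unnoticed.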