From 491e78a95a3ce90289acb7601ab523da76533777 Mon Sep 17 00:00:00 2001
From: shubhamvij
Date: Sun, 11 Sep 2022 02:49:24 -0700
Subject: [PATCH] graph book

---
 .../notebooks/1_graphs_intro.ipynb          | 120 ++++++++++++++++++
 .../notebooks/2_node_embeddings.ipynb       |  49 +++++++
 books/graph_ml_for_engineers/pyproject.toml |  18 +++
 3 files changed, 187 insertions(+)
 create mode 100644 books/graph_ml_for_engineers/notebooks/1_graphs_intro.ipynb
 create mode 100644 books/graph_ml_for_engineers/notebooks/2_node_embeddings.ipynb
 create mode 100644 books/graph_ml_for_engineers/pyproject.toml

diff --git a/books/graph_ml_for_engineers/notebooks/1_graphs_intro.ipynb b/books/graph_ml_for_engineers/notebooks/1_graphs_intro.ipynb
new file mode 100644
index 0000000..8282b8e
--- /dev/null
+++ b/books/graph_ml_for_engineers/notebooks/1_graphs_intro.ipynb
@@ -0,0 +1,120 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Graphs\n",
+    "\n",
+    "Graphs are a general mechanism for describing and analyzing entities with relations and interactions.\n",
+    "\n",
+    "Rather than thinking of the world as a set of isolated data points:\n",
+    "\n",
+    "| Entities | Feature 1 | Feature 2 |\n",
+    "| ----------- | ----------- | ----------- |\n",
+    "| A | 1000 | 1001 |\n",
+    "| B | 2000 | 2002 |\n",
+    "\n",
+    "we think of these entities together with their relations to other entities:\n",
+    "\n",
+    "| Entities | Feature 1 | Feature 2 |\n",
+    "| ----------- | ----------- | ----------- |\n",
+    "| A | 1000 | 1001 |\n",
+    "| B | 2000 | 2002 |\n",
+    "\n",
+    "➕\n",
+    "\n",
+    "| Src Entity | Dst Entity | Relation Type | Feature 1 | Feature 2 |\n",
+    "| ------------- | ------------- | ------------- | ------------- | ------------- |\n",
+    "| A | B | Is Child Of | 1 | 1 |\n",
+    "| B | A | Is Parent Of | 2 | 2 |\n",
+    "\n",
+    "\n",
+    "\n",
+    "## What is the issue with the non-graph ML toolbox?\n",
+    "\n",
+    "It is designed for tabular data, grids, or sequences:\n",
+    "- Text / audio sequences have a notion of left and right\n",
+    "- Images have a notion of up / down and left / right\n",
+    "\n",
+    "Graphs have arbitrary size and arbitrary topology, and no spatial locality.\n",
+    "\n",
+    "\n",
+    "In traditional ML, we represent our nodes, links, and entire graphs as vectors, then train a classical ML model on them (e.g. random forest, SVM, neural network).\n",
+    "When a new node / link / graph appears, we compute its features and feed them to the model to make a prediction. Traditional ML relies on hand-crafted features, which are discussed below.\n",
+    "\n",
+    "The traditional way to do node-level prediction:\n",
+    "\n",
+    "Given a graph $G = (V, E)$ with vertex set $V$ and edge set $E$, we want to learn a function $f : V \\rightarrow \\mathbb{R}$.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### What are some features we can extract for nodes?\n",
+    "\n",
+    "- Node Degree\n",
+    "  - The degree $d_v$ of a node $v$ is the number of edges the node has.\n",
+    "  - Drawback: it treats all neighboring nodes equally.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "- Node Centrality\n",
+    "  - Tries to capture the importance of a node; it can be modeled by:\n",
+    "    - Eigenvector centrality\n",
+    "    - Betweenness centrality\n",
+    "    - Closeness centrality"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "- Clustering Coefficient"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What are we trying to do?\n",
+    "\n",
+    "Given a node $u$, we are trying to learn a neural network $f$ that generates a $d$-dimensional vector representation of $u$, $f: u \\rightarrow \\mathbb{R}^d$, such that similar nodes have vector representations that are spatially close to each other."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.9.12 ('graph-ml-for-engineers-9GQyHo6a-py3.9')",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.9.12"
+  },
+  "orig_nbformat": 4,
+  "vscode": {
+   "interpreter": {
+    "hash": "c5835645cbd39e77e80fd28b6a8a6b63c0a1f33699bc9c2aaafa2cbac9764660"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/books/graph_ml_for_engineers/notebooks/2_node_embeddings.ipynb b/books/graph_ml_for_engineers/notebooks/2_node_embeddings.ipynb
new file mode 100644
index 0000000..c392be4
--- /dev/null
+++ b/books/graph_ml_for_engineers/notebooks/2_node_embeddings.ipynb
@@ -0,0 +1,49 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What are we trying to do?\n",
+    "\n",
+    "Input Graph --> Feature Engineering --> Learning Algorithm --> Prediction.\n",
+    "With graph representation learning, we want to eliminate the \"Feature Engineering\" step and learn the features automatically.\n",
+    "\n",
+    "\n",
+    "## So what's the idea?\n",
+    "\n",
+    "Learn a function $f: u \\rightarrow \\mathbb{R}^d$ that maps a node $u$ to a $d$-dimensional real-valued vector.\n",
+    "\n",
+    "\n",
+    "Once we have learned $f$, what can we do with it?\n",
+    "The learned vectors can be used for downstream tasks such as node classification, link prediction, graph classification, clustering, and anomaly detection.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    " "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.9.12 ('graph-ml-for-engineers-9GQyHo6a-py3.9')",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.9.12"
+  },
+  "orig_nbformat": 4,
+  "vscode": {
+   "interpreter": {
+    "hash": "c5835645cbd39e77e80fd28b6a8a6b63c0a1f33699bc9c2aaafa2cbac9764660"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/books/graph_ml_for_engineers/pyproject.toml b/books/graph_ml_for_engineers/pyproject.toml
new file mode 100644
index 0000000..0212e2b
--- /dev/null
+++ b/books/graph_ml_for_engineers/pyproject.toml
@@ -0,0 +1,18 @@
+[tool.poetry]
+name = "graph-ml-for-engineers"
+version = "0.1.0"
+description = "Learning Graph Machine Learning made easy for Seasoned Engineers"
+authors = ["shubhamvij "]
+readme = "README.md"
+packages = [{include = "graph_ml_for_engineers"}]
+
+[tool.poetry.dependencies]
+python = "^3.9"
+
+[tool.poetry.group.dev.dependencies]
+networkx = "^2.8.6"
+matplotlib = "^3.5.3"
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
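The two tables in `1_graphs_intro.ipynb`, an entity table plus a relation table, already describe a typed, directed graph. A minimal sketch in plain Python, assuming the table columns become dict keys (`ENTITIES`, `RELATIONS` and `build_typed_graph` are hypothetical names, not part of this commit), that indexes each entity's outgoing relations. In networkx the same data would usually go into a `MultiDiGraph`, which keeps edge direction and allows several typed edges between the same pair of nodes.

```python
from collections import defaultdict

# Entity table from the notebook: entity -> its features.
ENTITIES = {
    "A": {"feature_1": 1000, "feature_2": 1001},
    "B": {"feature_1": 2000, "feature_2": 2002},
}

# Relation table from the notebook: (src, dst, relation type, edge features).
RELATIONS = [
    ("A", "B", "Is Child Of", {"feature_1": 1, "feature_2": 1}),
    ("B", "A", "Is Parent Of", {"feature_1": 2, "feature_2": 2}),
]


def build_typed_graph(entities, relations):
    """Index directed, typed edges by source: src -> [(relation, dst, features)]."""
    out_edges = defaultdict(list)
    for src, dst, rel, feats in relations:
        # Every edge endpoint must appear in the entity table.
        if src not in entities or dst not in entities:
            raise KeyError(f"unknown entity in edge {src!r} -> {dst!r}")
        out_edges[src].append((rel, dst, feats))
    return dict(out_edges)


if __name__ == "__main__":
    graph = build_typed_graph(ENTITIES, RELATIONS)
    print(graph["A"])  # [('Is Child Of', 'B', {'feature_1': 1, 'feature_2': 1})]
```

Keeping the relation type on each edge, rather than collapsing everything into an untyped adjacency list, is what lets the `Is Child Of` / `Is Parent Of` distinction survive.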
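The hand-crafted node features listed in `1_graphs_intro.ipynb` (degree, centrality, clustering coefficient) can be computed directly on a small graph and stacked into one vector per node, which is the representation a classical model would consume. A minimal pure-Python sketch, assuming an undirected, connected toy graph; the `EDGES` list and every function name below are hypothetical, not part of this commit. networkx, declared in `pyproject.toml`, provides library versions of these (`G.degree`, `nx.eigenvector_centrality`, `nx.betweenness_centrality`, `nx.closeness_centrality`, `nx.clustering`); betweenness is left to the library here.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy graph: a triangle (a, b, c) with a tail c - d - e.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, v):
    """d_v: the number of edges incident to v."""
    return len(adj[v])


def eigenvector_centrality(adj, iters=100):
    """Power iteration: a node is important if its neighbors are important.

    Iterating with (A + I) instead of A keeps the same leading eigenvector
    on a connected graph and avoids oscillation on bipartite graphs.
    """
    c = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: c[v] + sum(c[u] for u in adj[v]) for v in adj}
        norm = sum(x * x for x in nxt.values()) ** 0.5
        c = {v: x / norm for v, x in nxt.items()}
    return c


def closeness_centrality(adj, v):
    """(n - 1) / sum of BFS shortest-path lengths from v (connected graph)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
    return links / (k * (k - 1) / 2)


def node_features(adj):
    """Stack the hand-crafted features into one vector per node."""
    eig = eigenvector_centrality(adj)
    return {
        v: [
            degree(adj, v),
            eig[v],
            closeness_centrality(adj, v),
            clustering_coefficient(adj, v),
        ]
        for v in sorted(adj)
    }


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for node, vec in node_features(adj).items():
        print(node, [round(x, 3) for x in vec])
```

On this toy graph, node `c` has the highest degree, eigenvector and closeness centrality but only a 1/3 clustering coefficient, since just one of its three neighbor pairs (`a`, `b`) is connected. That kind of trade-off is why several features are combined rather than relying on degree alone.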