# Grokking Graph Databases
> understanding graph databases and graph-backed apps

## Intro to graphs

### Table of Contents
+ What is a graph?
+ Graph databases
+ Reasons for not using a relational db
+ Graph db decision tree

### Intro

A relational database handles around 90% of the queries modern applications need to run. That leaves around 10% of questions where relational databases struggle.

Most of the remaining 10% of questions deal with links and connections within the data, and those aspects can generate powerful and unique insights.

### What is a graph?

Graphs have been studied by mathematicians for centuries. As a result, we can give a concise and consistent definition:

+ A **graph** is a set of vertices (the plural of vertex) and edges.
+ A **vertex** is a point in a graph where zero or more edges meet. Vertices are often referred to as **nodes** or **entities**.
+ An **edge** joins two vertices within a graph. Edges are also called **relationships**, **links**, or **connections**.

Graphs are extremely simple to illustrate. Graph diagrams usually consist of circles representing vertices and lines representing edges.

![Graph diagram](../images/graph_diagram.png)

Note that we use vertex and node interchangeably, and likewise edge, relationship, and link.
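
The definition above maps almost directly onto a data structure. The following is a minimal sketch, not a production design: the `Graph` class, its method names, and the sample people are all made up for illustration, and edges are assumed to be undirected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph as a set of vertices plus a set of edges between them."""

    vertices: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_edge(self, a: str, b: str) -> None:
        # Adding an edge implicitly adds both endpoints as vertices.
        self.vertices.update((a, b))
        self.edges.add((a, b))

    def neighbors(self, v: str) -> set[str]:
        # Edges are treated as undirected: follow them in either direction.
        outgoing = {b for a, b in self.edges if a == v}
        incoming = {a for a, b in self.edges if b == v}
        return outgoing | incoming


g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.vertices.add("dave")  # a vertex where zero edges meet

print(g.neighbors("bob"))
print(g.neighbors("dave"))
```

Here `dave` is a valid vertex even though no edge touches it, which matches the "zero or more edges" part of the definition.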

### What is a graph database?

A graph database is a data-storage engine that combines the basic graph structures of vertices (nodes) and edges (relationships/links) with a persistence technology and a traversal (query) language to create a database optimized for storage and fast retrieval of highly connected data.

The relationships in relational databases are foreign keys, which are pointers to primary keys in other tables. These pointers are not things we can easily observe and manipulate.

By contrast, graph databases represent these associations as full-fledged constructs of the database that can be easily observed and manipulated: in a graph database edges are "first-class citizens" just like the vertices.

### Types of databases

| Database type | Description | Products |
| :------------ | :---------- | :------- |
| Key-value | Represents all data by a unique identifier (key) and an associated object (the value) | Redis, Memcached |
| Column | Stores data in rows that can each have a large and varying number of columns. | HBase, Cassandra, BigQuery |
| Document | Stores data in a uniquely keyed document that can have varying schema and that can contain nested data. | MongoDB, DynamoDB, CouchDB |
| Relational | Stores data in tables containing rows with strict schema. Relationships can be established between tables allowing the joining of rows. | PostgreSQL, MySQL, Microsoft SQL Server |
| Graph | Stores data as vertices (nodes/entities) and edges (relationships/links) | Neo4j, Apache TinkerPop |

The complexity of the data model grows from the top of the table to the bottom.

![Graph databases](../images/graph_db_types.png)

### Why not use SQL?

While using SQL is possible, graph databases excel in some areas, providing a simpler, more elegant solution than a relational database:
+ Recursive queries
+ Queries with different result types
+ Problems whose results are paths
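
To make the "possible, but less elegant" point concrete, here is a hedged sketch of a recursive query written in SQL, using a recursive common table expression through Python's built-in `sqlite3` module. The `knows` table, its columns, and the sample rows are assumptions invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knows (person TEXT, friend TEXT);
    INSERT INTO knows VALUES
        ('alice', 'bob'), ('bob', 'carol'), ('carol', 'dave'), ('erin', 'frank');
""")

# Everyone reachable from alice by following "knows" any number of hops.
# UNION (not UNION ALL) discards rows already seen, so cycles terminate.
reachable = conn.execute("""
    WITH RECURSIVE reach(name) AS (
        SELECT 'alice'
        UNION
        SELECT k.friend FROM knows AS k JOIN reach AS r ON k.person = r.name
    )
    SELECT name FROM reach WHERE name <> 'alice' ORDER BY name
""").fetchall()

names = [name for (name,) in reachable]
print(names)
```

The query works, but the traversal logic has to be spelled out as a self-referencing join. Returning the actual path between two people, rather than just the set of reachable names, takes noticeably more SQL.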


While many problems seem to be graph problems, you should use a different database (an RDBMS or a search technology such as Elasticsearch/OpenSearch) when you do not need to explore rich relationships between the data.

For example:
+ Selection:
  + give me everyone who works at company A
  + find all customers whose first name starts with "John"
  + locate all stores within a 5-mile radius
+ Aggregation:
  + how many companies are in my system?
  + what are my average sales for each day over the past month?
  + what's the number of transactions processed by my system each day?
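
Questions like these translate into short, flat SQL statements with no traversal involved. The sketch below uses Python's built-in `sqlite3` module; the `customers` table and its rows are hypothetical and exist only to show the shape of a selection and an aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, company TEXT);
    INSERT INTO customers VALUES
        ('John', 'A'), ('Johnny', 'B'), ('Alice', 'A'), ('Maria', 'C');
""")

# Selection: customers whose first name starts with "John".
johns = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE first_name LIKE 'John%' ORDER BY first_name"
).fetchall()

# Aggregation: how many distinct companies are in the system?
(company_count,) = conn.execute(
    "SELECT COUNT(DISTINCT company) FROM customers"
).fetchone()

print(johns, company_count)
```

Neither query needs to follow a chain of relationships, which is why a relational database or a search engine is a natural fit here.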

Other questions do require this rich relationship exploration:
+ Related/recursive data:
  + What's the easiest way for me to be introduced to the CEO of Company A?
  + How do John and Alice know each other?
  + How's company X related to company Y?
+ Pattern matching:
  + who in my system has a similar resume to mine?
  + does this transaction look like any fraudulent transaction seen in the past?
  + Is the user J. Smith the same as Johan S.?
+ Centrality, clustering, and influence:
  + Who's the most influential person I am connected with on LinkedIn?
  + What equipment in my network has the most substantial impact if it breaks?
  + What parts tend to fail at the same time?
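
Two of these questions can be sketched in plain Python to show what "exploring relationships" means in practice. This is an illustrative in-memory example, not how a graph database implements traversal: the `knows` data and the function names are assumptions. "How do John and Alice know each other?" becomes a breadth-first search for a shortest path, and a raw connection count stands in for the simplest form of centrality.

```python
from collections import deque

# Hypothetical "knows" graph as an undirected adjacency list.
knows = {
    "john": {"bob", "carol"},
    "bob": {"john", "alice"},
    "carol": {"john", "dave"},
    "dave": {"carol", "alice"},
    "alice": {"bob", "dave"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns one shortest path, or None if unreachable."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        # Sorting makes the result deterministic when paths tie in length.
        for neighbor in sorted(graph.get(current, ())):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


def degree(graph):
    """Raw (unnormalized) degree: the number of direct connections per vertex."""
    return {vertex: len(neighbors) for vertex, neighbors in graph.items()}


path = shortest_path(knows, "john", "alice")
print(path)
print(degree(knows))
```

Note that the answer to the first question is a path (a list of vertices), not a set of rows, which is one of the result shapes that relational queries handle awkwardly.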

### Using a graph db decision tree

The following workflow describes a series of questions you should ask yourself when deciding whether to choose a graph database.

![Graph DB problem decision tree](../images/graph_choose_graph_db_workflow.png)

The most critical question is:
> Do we care about the relationship between entities as much or more than the entities themselves?



If you can answer yes to one or more of the questions in the workflow depicted above, it's more than likely that you have a graph problem on your hands.

If you are uncertain, run a small pilot project to evaluate a graph database as part of the solution.

Also, it is important to understand that using a graph database is not an "all or nothing" decision. Multi-model approaches, in which graph databases solve some of the problems while other types of databases solve the rest, are common and tend to be very successful.