# Chapter 7 GraphFrames

Graphs are an interesting way to solve data problems because graph structures offer a more intuitive model for many classes of problems.

In this chapter, you will learn about:
- Why use graphs?
- Understanding the classic graph problem: the flights dataset
- Understanding the graph vertices and edges
- Simple queries
- Using motif finding
- Using breadth-first search
- Using PageRank
- Visualizing flights using D3

Whether you are traversing social networks or restaurant recommendations, these data problems are easier to understand in terms of graph structures (vertices, edges, and properties):

![Graph](./asset/ch7-im1.png)

For example, within the context of social networks, the vertices are the people, while the edges are the connections between them. Within the context of restaurant recommendations, the vertices might include locations, cuisine types, and restaurants, while the edges are the connections between them (for example, three restaurants are in Vancouver, BC, but only two of them serve ramen).
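
To make the vocabulary concrete, here is a minimal plain-Python sketch of the restaurant example (it does not use the GraphFrames API). The vertex ids, restaurant names, and the `located_in` and `serves` relationship labels are all made up for illustration:

```python
# Plain-Python sketch of a small property graph (not the GraphFrames API).
# Every id, name, and relationship label here is hypothetical.

# Vertices: each has an id (the dict key), a type, and a display name.
vertices = {
    "vancouver": {"type": "location", "name": "Vancouver, BC"},
    "ramen": {"type": "cuisine", "name": "Ramen"},
    "r1": {"type": "restaurant", "name": "Restaurant A"},
    "r2": {"type": "restaurant", "name": "Restaurant B"},
    "r3": {"type": "restaurant", "name": "Restaurant C"},
}

# Edges: (source vertex, relationship, destination vertex).
edges = [
    ("r1", "located_in", "vancouver"),
    ("r2", "located_in", "vancouver"),
    ("r3", "located_in", "vancouver"),
    ("r1", "serves", "ramen"),
    ("r2", "serves", "ramen"),
]


def sources(relationship, dst):
    """Return the ids of vertices that have a `relationship` edge to `dst`."""
    return {s for (s, rel, d) in edges if rel == relationship and d == dst}


in_vancouver = sources("located_in", "vancouver")
serves_ramen = sources("serves", "ramen")

# Restaurants that are both in Vancouver and serve ramen.
ramen_in_vancouver = sorted(in_vancouver & serves_ramen)
print([vertices[v]["name"] for v in ramen_in_vancouver])
```

All three restaurants are connected to the location vertex, but only two are connected to the cuisine vertex, so the intersection contains two restaurants.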

While the two graphs are seemingly disconnected, you can in fact create a combined social network and restaurant recommendation graph based on the reviews of friends within a social circle, as shown in the following figure:

![Graph](./asset/ch7-im2.png)

Another classic graph problem is the analysis of flight data: airports are represented by vertices, and flights between those airports are represented by edges. Numerous properties are associated with these flights, including, but not limited to, departure delays, plane type, and carrier.
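
As a rough illustration of this layout, the following plain-Python sketch (not the GraphFrames API) stores airports as vertices and flights as edges that carry properties. The airport selection, delay values, and carrier codes are invented for the example, and the field names `src`, `dst`, `delay`, and `carrier` are assumptions chosen for readability:

```python
from collections import defaultdict

# Vertices: airports keyed by airport code (illustrative subset).
airports = {
    "SEA": {"city": "Seattle"},
    "SFO": {"city": "San Francisco"},
    "JFK": {"city": "New York"},
}

# Edges: one record per flight, with properties attached to the edge.
# Delays are in minutes; carrier codes are hypothetical.
flights = [
    {"src": "SEA", "dst": "SFO", "delay": 10, "carrier": "X1"},
    {"src": "SEA", "dst": "JFK", "delay": 30, "carrier": "X2"},
    {"src": "SFO", "dst": "JFK", "delay": -5, "carrier": "X1"},
    {"src": "JFK", "dst": "SEA", "delay": 0, "carrier": "X2"},
]

# Group departure delays by origin airport (the edge's source vertex).
delays_by_origin = defaultdict(list)
for flight in flights:
    delays_by_origin[flight["src"]].append(flight["delay"])

# Average departure delay per origin airport.
avg_delay = {
    airport: sum(delays) / len(delays)
    for airport, delays in delays_by_origin.items()
}

for airport, delay in sorted(avg_delay.items()):
    print(airports[airport]["city"], delay)
```

Because each property lives on the edge it describes, a question such as "what is the average departure delay out of Seattle?" becomes a simple aggregation over the edges leaving one vertex.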


In this chapter, we will use GraphFrames to quickly and easily analyze flight performance data organized in graph structures. Because we're using graph structures, we can easily ask many questions that are not as intuitive with tabular structures, such as finding structural motifs, ranking airports using PageRank, and finding the shortest paths between cities. GraphFrames builds on the distribution and expression capabilities of the DataFrame API to both simplify your queries and take advantage of the performance optimizations of the Apache Spark SQL engine.
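
To see why a question like "shortest path between cities" fits a graph more naturally than a table, consider this small breadth-first search written in plain Python. It is only a sketch of the idea, not how GraphFrames implements it; the `routes` data and the `fewest_hops` helper are hypothetical:

```python
from collections import deque

# Directed routes between airports (made-up data): origin -> destinations.
routes = {
    "SEA": ["SFO", "DEN"],
    "SFO": ["DEN"],
    "DEN": ["JFK"],
    "JFK": [],
}


def fewest_hops(graph, start, goal):
    """Breadth-first search: return a path from start to goal that uses
    the fewest edges, or None if goal cannot be reached."""
    queue = deque([[start]])  # each queue entry is a path so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = fewest_hops(routes, "SEA", "JFK")
print(path)
```

Expressing the same search over a flat table of flights would require repeated self-joins, one per hop, whereas the graph formulation simply follows edges outward from the starting vertex.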

In addition, with GraphFrames, graph analysis is available in Python, Scala, and Java. Just as important, you can leverage your existing Apache Spark skills to solve graph problems (in addition to machine learning, streaming, and SQL) instead of making a paradigm shift to learn a new framework.

## Introduction to GraphFrames
GraphFrames uses the power of Apache Spark DataFrames to support general graph processing. Specifically, the vertices and edges are represented as DataFrames, allowing us to store arbitrary data with each vertex and edge. While GraphFrames is similar to Spark's GraphX library, there are some key differences, including: