# Joy Payton's Hello World Graph

## Import what I need

In [1]:
import graphlab as gl

## Use a little pre-created data from a .csv

This data is intended to reproduce a chart of family members. In my line of work, we often study familial traits to understand the heredity of autism.

In [2]:
subjects = gl.SFrame.read_csv("family.csv")

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


## Show the data, in a table format

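Note that not every family member participates (a missing parent appears as `None`), and we often have mixed or step-families. For example, subject 125 shares a mother with 123 and 124 but has a different father.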

In [3]:
# Print the SFrame's text representation (column types, row count, and the head of the table).
# Note that `subjects.show` without parentheses only returns the bound method; it does not display anything.
print(repr(subjects))

Columns:
	Subject ID	int
	Father ID	int
	Mother ID	int

Rows: 11

Data:
+------------+-----------+-----------+
| Subject ID | Father ID | Mother ID |
+------------+-----------+-----------+
|    123     |    530    |    339    |
|    124     |    530    |    339    |
|    125     |    423    |    339    |
|    218     |    344    |    496    |
|    218     |    344    |    496    |
|    339     |    299    |    None   |
|    344     |    None   |    961    |
|    423     |    None   |    None   |
|    496     |    None   |    None   |
|    299     |    None   |    None   |
+------------+-----------+-----------+
[11 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

## Create a graph

My data sources are often not very clean: values can be missing, several relationships can be packed into columns of the same row, and so on. Here, each row holds two relationships (father and mother), so I add the edges in two passes, and in each pass I first drop the rows where that parent's ID is missing.

In [4]:
myGraph = gl.SGraph()

In [5]:
myGraph = myGraph.add_vertices(subjects, vid_field='Subject ID')
myGraph = myGraph.add_edges(subjects.dropna("Father ID"), src_field='Father ID', dst_field='Subject ID')
myGraph = myGraph.add_edges(subjects.dropna("Mother ID"), src_field='Mother ID', dst_field='Subject ID')
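
To make the "drop missing, then add edges" step concrete, here is a minimal sketch in plain Python, with no GraphLab involved. It illustrates the idea only and is not how `SGraph` works internally. The `rows` list is copied from the ten rows visible in the table head above (the eleventh row is not shown, so it is left out), and the names `rows` and `parent_edges` are made up for this example.

```python
# Rows copied from the table head shown above: (subject, father, mother).
# None marks a parent whose ID is not recorded.
rows = [
    (123, 530, 339),
    (124, 530, 339),
    (125, 423, 339),
    (218, 344, 496),
    (218, 344, 496),  # subject 218 appears twice in the source data
    (339, 299, None),
    (344, None, 961),
    (423, None, None),
    (496, None, None),
    (299, None, None),
]


def parent_edges(rows):
    """Return (parent, child) pairs, skipping any missing parent."""
    edges = []
    for subject, father, mother in rows:
        for parent in (father, mother):
            if parent is not None:  # the plain-Python analogue of dropna
                edges.append((parent, subject))
    return edges


edges = parent_edges(rows)
unique_edges = sorted(set(edges))  # collapse the repeated 218 row

print(len(edges), "edges,", len(unique_edges), "unique")
for parent, child in unique_edges:
    print(parent, "->", child)
```

In this sketch the repeated row for subject 218 produces repeated edges, which is why it also builds a de-duplicated list.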

## Display a graph

In this case, arrows indicate that Source "is a parent of" Destination.  We can "follow the flow" of heredity by following arrows.  For example, 299 will pass some of their genetics to 339, who will in turn pass some genetics to 123, 124, and 125.

In [6]:
gl.canvas.set_target('ipynb')
myGraph.show(vlabel="id", arrows=True)
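
"Following the flow" can also be done without the picture. The sketch below is plain Python, not a GraphLab API: it walks the parent-to-child arrows breadth-first to collect everyone downstream of a given person. The `edges` list is read off the table head shown earlier, and `descendants` is a hypothetical helper written for this example.

```python
from collections import deque

# (parent, child) pairs read off the table head shown earlier
edges = [
    (530, 123), (339, 123),
    (530, 124), (339, 124),
    (423, 125), (339, 125),
    (344, 218), (496, 218),
    (299, 339),
    (961, 344),
]


def descendants(edges, start):
    """Collect every vertex reachable from `start` by following arrows."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(descendants(edges, 299)))  # [123, 124, 125, 339]
```

Starting from 299, the walk reaches 339 first and then 123, 124, and 125, which matches the path described above.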