# Educational Platform Pairs: Which Do Kagglers Take Together?

All Kagglers use their Machine Learning and Data Science skills when competing on this site. However, each person's educational journey to arrive here is different. Here, I explore the [2020 Kaggle Machine Learning & Data Science Survey](https://www.kaggle.com/c/kaggle-survey-2020) to examine the 'Educational Platforms' Kagglers have used to learn and augment their skills. In the survey, Kagglers were asked: <b>On which platforms have you begun or completed data science courses? (Select all that apply)</b>, and given the following platforms: [Cloud-cert](https://www.businessnewsdaily.com/10748-top-5-cloud-certifications.html), [Coursera](https://www.coursera.org/), [DataCamp](https://www.datacamp.com/), [edX](https://www.edx.org/), [Fast.ai](https://www.fast.ai/), [Kaggle](https://www.kaggle.com/), [LinkedIn](https://www.linkedin.com/), [Udacity](https://www.udacity.com/), [Udemy](https://www.udemy.com/), and finally University courses for credit. 

This notebook examines both how individuals use these educational platforms and which educational platforms are taken together. 

### By the end of the notebook, we'll be able to see that University Courses (in red) are less related to all other types of educational platforms. 

In [None]:
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn import manifold, datasets
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
from itertools import combinations

# data used to create network from the bottom cells of the notebook

node_labels = ['Udacity', 'Cloud-cert', 'LinkedIn', 'edX', 'Fast.ai', 'DataCamp',
       'Kaggle', 'Udemy', 'Coursera', 'University']

all_edges = [(0,1,0.51), (0,2,0.42), (0,3,0.59),(0,4,0.66),(0,5,0.44), (0,6,0.18),
            (0,7,0.33), (0,8,0.24), (0,9,-0.01), (1,2,0.73),(1,3,0.38), (1,4,0.45),
            (1,5,0.41), (1,6,0.24), (1,7,0.24), (1,8,0.09), (1,9,0.2), (2,3,0.45),
            (2,4,0.26), (2,5,0.5), (2,6,0.29), (2,7,0.35), (2,8,0.09), (2,9,0.08),
            (3,4,0.46), (3,5,0.44), (3,6,0.14), (3,7,0.21), (3,8,0.21), (3,9,-0.04),
            (4,5,0.15), (4,6,0.32), (4,7,-0.04), (4,8,0.26), (4,9,0.0), (5,6,0.17),
            (5,7,0.21), (5,8,0.08), (5,9,-0.03), (6,7,0.02), (6,8,0.02), (6,9,-0.15),
            (7,8,0.03), (7,9,-0.2), (8,9,-0.1)]

nodes = range(0,len(node_labels))
g = nx.Graph()
g.add_nodes_from(nodes)
g.add_weighted_edges_from(all_edges)
pos = nx.spring_layout(g,seed=3)

fig, ax = plt.subplots(1,1,figsize=(10,6))
options = {"node_size": 700}
colorlist = ['tab:pink','tab:olive', 'tab:gray','tab:brown', 'tab:cyan', 'tab:purple',
             'tab:orange', 'tab:green', 'tab:blue', 'tab:red']
               
nx.draw_networkx_nodes(g, pos, nodelist=nodes,node_color=colorlist, **options)
for edge in all_edges:
    sedge = (edge[0],edge[1])
    style = "solid"
    if edge[2] < 0:
        style="dashed"
    else:
        style="solid"
    width = (int(abs(edge[2] * 10)))
    if width < 1:
        width = 1
    alpha = abs(edge[2])
    if alpha < .15:
        alpha = .15
    nx.draw_networkx_edges(g, pos, edgelist=[sedge],
                           width=width, alpha=alpha,edge_color='tab:gray',
                          style=style)
legend_elements = list()
for edu in range(0,len(node_labels)):
    legend_elements.append(Line2D([0], [0], marker='o', color='w', label=node_labels[edu],
                          markerfacecolor=colorlist[edu], markersize=15))

ax.legend(handles=legend_elements, 
             title="Educational Platforms",bbox_to_anchor=(1, 1), ncol=1);
plt.axis('off');

## Education Platform Analysis
1. Which education platform is most popular, and how many platforms do Kagglers report using?
    - Does Educational Platform popularity or number reported vary greatly based on gender, age or formal education?
1. How are pairs of Educational Platforms related?
    - Do distinctive clusters appear when using a TSNE plot for dimensionality reduction? 
    - What is the pairwise relationship between education platforms? 
    
Briefly, I explore the relative popularity of these Educational Platforms and the degree to which Educational Platforms are likely to be taken together. I first examine the popularity and number of courses taken by the respondents, and whether those broad quantities vary with the demographic variables of gender, formal education, and age. Next, I perform an exploratory data analysis using a [TSNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) to see how the courses are related. I found that certain Platforms shared common regions of the plot, and then examined the pairwise relationship between the platforms. 


# Data Cleaning

First, let's look at the survey contents.

In [None]:
#df = pd.read_csv('data/kaggle_survey_2020_responses.csv',low_memory=False)
df = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv', low_memory=False);
df[:2]

When cleaning this data, I extracted the columns of interest for the Education Platforms and demographic variables (i.e., age, gender, and formal education). After renaming the columns, I changed the course-name/NaN values in the Educational Platform columns to 1's and 0's, respectively. To avoid being overly influenced by small categories in my demographic variables, I collapsed age into 18-24, 25-29, 30-39, and 40+, gender into Man, Woman, and Everyone Else, and formal education into Doctoral, Masters, Bachelors, and Other. 

After the initial processing, I made some further decisions about which data to examine. Because none of the respondents who selected 'None' selected any other Educational Platform, I removed this column. I also removed the 'Other' column because that choice is not specific to a particular Educational Platform. 

At the end of processing, I ended up with 11,890 responses with one or more of the 10 educational platforms.
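The binarize-and-collapse steps described above can be sketched on a toy frame. This is only an illustration under assumptions, not the notebook's actual pipeline: the `toy` DataFrame, the `toy_platforms` list, and the partial `age_map` dictionary are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey: a cell holds the platform name if selected, NaN otherwise.
toy = pd.DataFrame({
    "Age": ["18-21", "22-24", "35-39", "60-69"],
    "Coursera": ["Coursera", np.nan, "Coursera", np.nan],
    "edX": [np.nan, np.nan, "edX", "edX"],
})

# Any non-missing answer becomes 1, a missing answer becomes 0.
toy_platforms = ["Coursera", "edX"]
toy[toy_platforms] = toy[toy_platforms].notna().astype(int)

# Collapse fine-grained age bins with one mapping (hypothetical and partial).
age_map = {"18-21": "18-24", "22-24": "18-24", "35-39": "30-39", "60-69": "40+"}
toy["Age"] = toy["Age"].replace(age_map)

# Keep only respondents who reported at least one platform.
toy = toy[toy[toy_platforms].sum(axis=1) > 0]
print(toy)
```

A single mapping dictionary does the same job as one `replace` call per category, and keeps the collapsing rules in one place.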


In [None]:
# extracting the columns of interest
keep = [1,2,4,231,232,233,234,235,236,237,238,239,240,241,242]
education = df.iloc[:,keep]

# prepare to rename columns: remove redundant question and shorten for graphing
names = education.iloc[0,:].values
renamedict = dict()
for x in range(0,len(names)):
    renamedict[education.columns[x]] = names[x].replace('On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - ','') 

renamedict['Q1'] = 'Age'
renamedict['Q2'] = 'Gender'
renamedict['Q4'] = 'Formal Ed'
renamedict['Q37_Part_3'] = 'Kaggle'
renamedict['Q37_Part_8'] = 'LinkedIn'
renamedict['Q37_Part_9'] = 'Cloud-cert'
renamedict['Q37_Part_10'] = 'University'

# drop the first row, rename columns, gather the platforms in a list
education = education.drop(index=0,axis=0)
education = education.rename(columns=renamedict) 
platforms = list(education.columns[3:])

# change strings to 1's and Nan's to 0 for Education platforms
education[platforms] = education[platforms].replace(to_replace=np.nan, value=0)
education[platforms] = education[platforms].replace(to_replace=r'^\w', value=1, regex=True)
education['Formal Ed'] = education['Formal Ed'].replace(to_replace=np.nan, value='No Response')

# collapse age category
education['Age'] = education['Age'].replace(to_replace='18-21', value='18-24')
education['Age'] = education['Age'].replace(to_replace='22-24', value='18-24')
education['Age'] = education['Age'].replace(to_replace='30-34', value='30-39')
education['Age'] = education['Age'].replace(to_replace='35-39', value='30-39')
education['Age'] = education['Age'].replace(to_replace='40-44', value='40+')
education['Age'] = education['Age'].replace(to_replace='45-49', value='40+')
education['Age'] = education['Age'].replace(to_replace='50-54', value='40+')
education['Age'] = education['Age'].replace(to_replace='55-59', value='40+')
education['Age'] = education['Age'].replace(to_replace='60-69', value='40+')
education['Age'] = education['Age'].replace(to_replace='70+', value='40+')

# collapse gender category
ee = 'Everyone Else'
education['Gender'] = education['Gender'].replace(to_replace='Prefer to self-describe', value=ee)
education['Gender'] = education['Gender'].replace(to_replace='Nonbinary', value=ee)
education['Gender'] = education['Gender'].replace(to_replace='Prefer not to say', value=ee)

# collapse formal education category
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='Doctoral degree', value='Doctoral')
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='Master’s degree', value='Masters')
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='Bachelor’s degree', value='Bachelors')
other = 'Other'
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='Some college/university study without earning a bachelor’s degree', value=other)
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='Professional degree', value=other)
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='No formal education past high school', value=other)
education['Formal Ed'] = education['Formal Ed'].replace(to_replace='I prefer not to answer', value=other)

# check that respondents who selected 'None' did not select any other platform
np.sum(np.logical_and(education['None'] == 1, education[platforms].sum(axis=1) > 1)) == 0

# Remove 'None' column, and drop the name from platforms
platforms.remove('None')
education = education.drop('None',axis=1)

# remove the 'Other' column and drop the name from the platforms
platforms.remove('Other')
education = education.drop('Other',axis=1)

# remove rows that don't have any educational information
education = education[education[platforms].sum(axis=1) > 0]

In [None]:
education.head()

In [None]:
print("Total number of respondents:", len(education),"\nNumber of education platforms:",len(platforms))

# Broad Education Platform Use

First, I wanted to look at the relative popularity of the Educational Platforms, and how many platforms respondents reported taking. In addition, I examined the popularity and number of Educational Platforms across the demographic variables of Age, Gender, and Formal Education.

In the correlation panel, the lowest values are two formal-education points with a correlation of .89, while the majority of the points are correlated at .96 or greater. 

<i>Takeaway: at least within the broad strokes of Percent Taken and Number of Education Platforms, subgroups within the demographic variables of age, formal education and gender are highly correlated. </i>
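To make the subgroup comparison concrete, here is a minimal sketch of the "Platforms" half of that correlation on invented data. The `toy` frame and its values are assumptions for illustration; only the idea (share of each subgroup per platform, then a Spearman correlation between subgroups) follows the analysis described above.

```python
import pandas as pd

# Hypothetical respondents: 1 means the platform was reported.
toy = pd.DataFrame({
    "Gender":   ["Man", "Man", "Man", "Woman", "Woman", "Woman"],
    "Coursera": [1, 1, 1, 1, 1, 0],
    "Kaggle":   [1, 1, 0, 1, 0, 0],
    "edX":      [1, 0, 0, 0, 0, 0],
})

# Share of each subgroup that reported each platform (rows: subgroup, columns: platform).
share = toy.groupby("Gender").mean()

# Spearman compares the rank order of platform popularity between subgroups,
# so transpose to make each subgroup a column before correlating.
rho = share.T.corr(method="spearman")
print(rho)
```

In this toy case both subgroups rank the platforms in the same order, so the correlation is 1 even though the raw shares differ. A high Spearman value therefore says the popularity ordering is shared, not that the usage rates are equal.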

In [None]:
overall_popularity = sorted(education[platforms].columns, key=lambda x: education[x].sum(),reverse=True)

# create a dataframe that I can put the correlations between the demographic variables to be able to use a stripplot
def platforms_and_courses(colname):
    platforms = pd.DataFrame(columns=education[colname].unique())
    courses = pd.DataFrame(columns=education[colname].unique())
    for formtype in education[colname].unique():
        df = education[overall_popularity][education[colname] == formtype]
        platforms[formtype] = (df.sum()/len(df)).values
        courses[formtype] = df.sum(axis=1).value_counts()
    plat = np.tril(platforms.corr(method='spearman'),k=-1)
    cour = np.tril(courses.corr(method='spearman'),k=-1)
    return plat[np.nonzero(plat)], cour[np.nonzero(cour)]

# get lists of correlations between platforms and number of courses
gender_platforms, gender_courses = platforms_and_courses('Gender')
formal_platforms, formal_courses = platforms_and_courses('Formal Ed')
age_platforms, age_courses = platforms_and_courses('Age')

# create lists of the the education type (i.e., platforms or courses), correlation number, and demographic name
ed_type = list(); correlation = list(); demographic = list()
def add_to_list(ed,cor,dem):
    for val in cor:
        ed_type.append(ed)
        correlation.append(val)
        demographic.append(dem)
add_to_list('Platforms',gender_platforms,'Gender')
add_to_list('Courses',gender_courses,'Gender')
add_to_list('Platforms',formal_platforms,'Formal Ed')
add_to_list('Courses',formal_courses,'Formal Ed')
add_to_list('Platforms',age_platforms,'Age')
add_to_list('Courses',age_courses,'Age')

# turn into a dataframe
df_corrs = pd.DataFrame.from_dict({'Ed_type':ed_type,'Correlation':correlation,'Demographic':demographic})

# get the popularity and number of courses taken
pct_platform = education[overall_popularity].sum()/len(education)
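# sort by number of platforms so the bars line up with the 1..10 tick labels set below
num_courses = education[platforms].sum(axis=1).value_counts().sort_index()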

# plot the three graphs
plt.figure(figsize=(16, 4))
ax = plt.subplot(1,3,1)
# pass x and y as keywords: recent seaborn releases no longer accept them positionally
sns.barplot(x=np.arange(len(overall_popularity)), y=pct_platform.values);
ax.set_xticklabels(overall_popularity)
plt.xticks(rotation=90);
ax.set_title('Percent Taken Education Type')
ax.set_ylabel('Percent of Total')

ax = plt.subplot(1,3,2)
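# pass x and y as keywords: recent seaborn releases no longer accept them positionally
sns.barplot(x=np.arange(len(num_courses)), y=num_courses.values, color='tab:gray');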
ax.set_ylabel('Number Respondents')
ax.set_xlabel('Total Courses Taken')
ax.set_title('Number of Courses Taken');
ax.set_xticklabels([1,2,3,4,5,6,7,8,9,10]);

ax = plt.subplot(1,3,3)
ax = sns.stripplot(x="Ed_type", y="Correlation", hue="Demographic", 
                   data=df_corrs,size=15,jitter=.2,alpha=.6)
plt.ylim(top=1.01,bottom=.86)
plt.legend(loc='lower right',markerscale=1.5);
ax.set_title('Correlation within Demographic Subtypes');

plt.subplots_adjust(wspace=.3)

# Exploratory Descriptive Analyses

### Clustermap
As a first step toward finding how the Educational Platforms are related, I plotted a clustermap. Here, we do see some clusters forming.

In [None]:
g = sns.clustermap(education[platforms],row_cluster=True, method="ward")
g.cax.set_visible(False)
ax = g.ax_heatmap
ax.set_yticks([]);

### Exploratory Descriptive Analysis: TSNE

As a next step, I decided to make a [TSNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) plot of the educational platform data. This will allow me to examine the relationships between the variables. Since the data only contains yes/no information for each observation, I chose to use the [Jaccard similarity metric](https://en.wikipedia.org/wiki/Jaccard_index) when computing the similarity between the respondents. 

To aid in interpreting the TSNE plot, I first color the points by the number of courses taken. By doing this, I found that the center of the plot holds the respondents with the most course overlap, while the edges hold respondents who reported only one or two Educational Platforms. 

<i>Takeaway: The Educational Platforms separate into distinctive regions that partly overlap on the TSNE plot.</i>
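For readers unfamiliar with the Jaccard metric, here is a small pure-Python sketch of the distance between two yes/no response vectors. The helper `jaccard_distance` and the three respondents are hypothetical. The all-zero convention in the comment is an assumption of this sketch, and it does not arise in this notebook because rows without any platform were removed.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero vectors are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either

# Hypothetical respondents over (Coursera, Kaggle, Udemy, edX)
r1 = [1, 1, 0, 0]
r2 = [1, 0, 1, 0]
r3 = [1, 1, 0, 0]

d12 = jaccard_distance(r1, r2)  # one shared platform out of three reported in total
d13 = jaccard_distance(r1, r3)  # identical selections
print(d12, d13)
```

Platforms that neither respondent selected do not count toward similarity, which suits select-all-that-apply data where most cells are 0.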

In [None]:
# run the TSNE using the Jaccard similarity with high perplexity and learning rate
tsne = manifold.TSNE(n_components=2, init='pca',metric='jaccard',perplexity=200,learning_rate=500)
Y = tsne.fit_transform(education[platforms].values)

# save to use later if working on personal computer
#np.save('tsne-plx200-ln-500.npy',Y)
#Y = np.load('tsne-plx200-ln-500.npy')

# color the points based on the number of courses taken
numcourses = education[platforms].sum(axis=1)
nums = [1,2,3,4,5,6,7,8,9,10]
cols = ['k',
        'c','c',
        'g','g','g',
        'r','r','r','r']
colnum = (numcourses.replace(to_replace=nums,value=cols)).values

fig,ax = plt.subplots(ncols=2,figsize=(8,4))
ax[0].scatter(Y[:, 0], Y[:, 1],color='tab:gray',s=3)
ax[0].set_xticks([])
ax[0].set_yticks([])
ax[0].set_title('Unlabeled TSNE Output');


ax[1].scatter(Y[:,0], Y[:,1],color=colnum,s=3)
ax[1].set_xticks([])
ax[1].set_yticks([])
ax[1].set_title('Educational Platforms Reported');

from matplotlib.patches import Patch
from matplotlib.lines import Line2D

# create a legend 'by hand' 
legend_elements = [Line2D([0], [0], marker='o', color='w', label='Only 1',
                          markerfacecolor='k', markersize=10),
                   Line2D([0], [0], marker='o', color='w', label='2 - 3',
                          markerfacecolor='c', markersize=10),
                   Line2D([0], [0], marker='o', color='w', label='4 - 6',
                          markerfacecolor='g', markersize=10),
                   Line2D([0], [0], marker='o', color='w', label='7 - 10',
                          markerfacecolor='r', markersize=10),
                   ]

ax[1].legend(handles=legend_elements, 
             title="Number Platforms Reported",bbox_to_anchor=(1, 1), ncol=1);


Next, I colored the points by the specific Educational Platform. The more popular courses have distinct regions within the TSNE, while the less popular courses are more dispersed throughout. 

In [None]:
ncol=5
tablist = ['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 
           'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan']
fig, ax = plt.subplots(nrows=2,ncols=ncol,figsize=(12,5))
for x in range(0,len(platforms)):
    yes = education[overall_popularity[x]]==1
    no = education[overall_popularity[x]] == 0
    xval = x//ncol
    yval = x%ncol
    ax[xval,yval].scatter(Y[:, 0][no], Y[:, 1][no],color='lightgray',s=1,alpha=.2)
    ax[xval,yval].scatter(Y[:, 0][yes], Y[:, 1][yes],color=tablist[x],s=3)
    ax[xval,yval].set_xticks([])
    ax[xval,yval].set_yticks([])
    ax[xval,yval].set_title(overall_popularity[x])
plt.suptitle('Regions of Educational Platforms within a TSNE Plot');

## What classes are most likely to be taken together?

In the TSNE above, we see that different educational platforms go together. Now, we'll look to quantify this by first finding how many times pairs of platforms go together, and how correlated the patterns of platforms are.


<i>Takeaway: Although pairs of educational platforms may go together frequently, the correlation between them may be low. For example, Coursera and Kaggle have the most co-occurrence (left top), but they are only correlated at 0.022.</i>
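The gap between co-occurrence and correlation can be reproduced on a small invented example. The four columns below (`A` to `D`) are assumptions built for the illustration: `A` and `B` are both popular but independent, while `C` and `D` are rare and always taken together.

```python
import numpy as np

# 16 hypothetical respondents
A = [1] * 12 + [0] * 4
B = [1] * 9 + [0] * 3 + [1] * 3 + [0] * 1   # overlaps with A exactly as often as chance predicts
C = [1] * 2 + [0] * 14
D = list(C)                                  # identical to C
X = np.array([A, B, C, D]).T                 # shape (16, 4): rows are respondents

# Entry (i, j) counts respondents who reported both platform i and platform j.
cooc = X.T @ X

# Pearson correlation between platform columns.
corr = np.corrcoef(X, rowvar=False)

print(cooc[0, 1], round(corr[0, 1], 3))   # popular pair: many co-occurrences, no correlation
print(cooc[2, 3], round(corr[2, 3], 3))   # rare pair: few co-occurrences, perfect correlation
```

Raw co-occurrence counts mostly track how popular each platform is, whereas correlation asks whether taking one platform changes the chance of taking the other.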


In [None]:
corr = education[overall_popularity].corr()
bycorr = list(corr.sum().sort_values(ascending=False).index)

corr = education[bycorr].corr()
corr_mask = np.triu(np.ones_like(corr, dtype=bool))

edu = education[overall_popularity].values
coocc = edu.T.dot(edu)
coocc_mask = np.triu(np.ones_like(coocc, dtype=bool))

plt.figure(figsize=(16,8))
ax=plt.subplot(1,2,1)
sns.heatmap(coocc.astype(float), xticklabels=overall_popularity, yticklabels=overall_popularity,
            annot=True, fmt='.0f', cmap=sns.diverging_palette(220, 20, as_cmap=True),mask=coocc_mask,
           vmax=3070,vmin=100);
ax.set_xlabel('Platforms Ordered by Overall Popularity')
ax.set_title('Co-occurrence Matrix for Education Platforms')

ax=plt.subplot(1,2,2)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, mask=corr_mask,
            annot=True, cmap=sns.diverging_palette(220, 20, as_cmap=True),
           vmax=.2, vmin=-.2);
ax.set_xlabel('Platforms Ordered by Average Correlation');
ax.set_title('Correlation Matrix for Education Platforms');


## What courses pattern together? 

In this section, I look at the conditional probability of two courses occurring together, and at how much more (or less) likely two courses are to co-occur than if they were independent. 

On the left side, I have the [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability) of one course given another. This is typically written P(B|A), the probability of B given A. In this data, it is the probability of having taken one course, given that you've taken another. The formula is P(B|A) = P(A and B) / P(A). 

One thing that pops out to me in this is how these results are still somewhat dependent on the overall popularity of the educational platform. For example, the probability that someone took Coursera given they took Fast.ai is 0.8 (bottom left), but the probability that someone took Fast.ai given they took Coursera is 0.12 (upper right). This is influenced by overall popularity, as the probability of Coursera is about 60%, while the probability of Fast.ai is about 10%. 

Instead, a more informative measure is to divide the joint probability of A and B by the product of the probability of A and the probability of B (equivalently, P(B|A) divided by P(B)). This quantity tells us the probability that A and B co-occur, divided by the probability that they would co-occur if they were independent of one another. The log of the quantity is taken so that both an increase and a decrease in probability are on the same scale. (As a side note, feel free to share if this quantity has a specific name. It is almost the mutual information, but not quite.) Using this quantity, we see that on average, taking university courses for credit is less associated with the other education platforms, with the exception of LinkedIn and cloud certifications. In contrast, we see that Fast.ai and Udacity, and LinkedIn and cloud certifications, are more likely to be taken together. 

<i>Take home: by just looking at the conditional probability, Coursera looks likely no matter which other course you have taken. When examining how likely it is that two courses are taken together relative to the independent probability, a more complex picture emerges. </i>
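A worked example with round numbers may help separate the two measures. The counts below are invented, chosen only to resemble a popular platform (`A`, about 60% of respondents) and a niche one (`B`, about 10%). The log ratio computed at the end matches what is usually called pointwise mutual information (PMI).

```python
import math

n = 100     # hypothetical number of respondents
n_a = 60    # took popular platform A
n_b = 10    # took niche platform B
n_ab = 8    # took both

p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n

# Conditional probabilities are asymmetric and follow overall popularity.
p_b_given_a = p_ab / p_a    # P(B|A): niche platform given the popular one
p_a_given_b = p_ab / p_b    # P(A|B): popular platform given the niche one

# The log ratio is symmetric in A and B; 0 means the pair co-occurs
# exactly as often as independence would predict.
log_ratio = math.log(p_ab / (p_a * p_b))

print(round(p_b_given_a, 3), round(p_a_given_b, 3), round(log_ratio, 3))
```

Here P(A|B) is 0.8 while P(B|A) is only about 0.13, yet both describe the same eight respondents. The log ratio summarizes the pair with one number that does not depend on which platform is treated as the condition.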


In [None]:
reverse_overall_popularity = sorted(education[platforms].columns, key=lambda x: education[x].sum(),reverse=False)
cond_prob = np.ones((len(overall_popularity),len(overall_popularity)))
op = overall_popularity
for a in range(0,len(op)):
    for b in range(0,len(op)):
        pab = np.sum(np.logical_and(education[op[a]]==1,education[op[b]]==1))/ len(education)
        pa = education[op[a]].sum()/len(education)
        cond_prob[a,b]= pab/pa
        
condprob_mask = np.zeros_like(cond_prob, dtype=bool)
np.fill_diagonal(condprob_mask,True,wrap=False)

def enrichment_metric(pop):
    enrichment = np.ones((len(pop),len(pop)))
    for a in range(0,len(pop)):
        for b in range(0,len(pop)):
            pab = np.sum(np.logical_and(education[pop[a]]==1,education[pop[b]]==1))/ len(education)
            pa = education[pop[a]].sum()/len(education)
            pb = education[pop[b]].sum()/len(education)
            enrichment[a,b]= np.log(pab/(pa*pb))
    return enrichment

enrichment = enrichment_metric(overall_popularity)
np.fill_diagonal(enrichment, 0, wrap=False)
enrichment = pd.DataFrame(enrichment,columns=overall_popularity)
enrichment_popularity = sorted(enrichment.columns, key=lambda x: enrichment[x].sum(),reverse=True)
enrichment = enrichment_metric(enrichment_popularity)
enrichment_mask = np.triu(np.ones_like(enrichment, dtype=bool))

plt.figure(figsize=(16,8))
ax=plt.subplot(1,2,1)
sns.heatmap(cond_prob,xticklabels=op,yticklabels=op, annot=True,
            cmap=sns.diverging_palette(220, 20, as_cmap=True),mask=condprob_mask,
           vmax=.8,vmin=0);
# cond_prob[a, b] = P(A and B) / P(A) = P(B|A), with a indexing rows and b indexing columns
ax.set_title('Conditional Probability: P(Column|Row)');
ax.set_xlabel('Ordered by Overall Popularity')
ax=plt.subplot(1,2,2)
sns.heatmap(enrichment,xticklabels=enrichment_popularity,yticklabels=enrichment_popularity,
            annot=True, cmap=sns.diverging_palette(220, 20, as_cmap=True),
           vmin=-0.75,vmax=0.75,mask=enrichment_mask);
ax.set_title("Log 'Normalized' Conditional Probability: log(P(A&B) / (P(A)*P(B)))");
ax.set_xlabel("Ordered by 'Normalized' Conditional Probability")
plt.suptitle("Conditional Probability versus 'Normalized' Conditional Probability of Educational Platform Pairs");

Finally, we return to the network presented at the beginning of the notebook. I liked the information presented in the 'Normalized' Conditional Probability, but I wanted a different way to visualize it. I chose to make a network to see which platforms gravitated towards one another. 

<i>Takeaway: University courses are less likely to be taken with other courses. Cloud certification and LinkedIn courses tend to be taken together, as do Fast.ai and Udacity.</i>
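The way an association value turns into a drawn edge can be summarized in one small helper. This is a sketch: `edge_style` is a hypothetical function rather than part of the notebook, and capping `alpha` at 1.0 is an added assumption, since matplotlib expects alpha between 0 and 1 and a log ratio can exceed that.

```python
def edge_style(weight, min_width=1, min_alpha=0.15):
    """Map a signed association weight to (width, alpha, linestyle)."""
    # Thickness grows with the strength of the association, never below min_width.
    width = max(int(abs(weight) * 10), min_width)
    # Transparency follows strength too, clamped to [min_alpha, 1.0].
    alpha = min(max(abs(weight), min_alpha), 1.0)
    # Negative associations are drawn dashed, positive ones solid.
    style = "dashed" if weight < 0 else "solid"
    return width, alpha, style

print(edge_style(0.66))    # strong positive pair
print(edge_style(-0.04))   # weak negative pair
```

Keeping the mapping in one function means the width, transparency, and dash rules are stated once, which makes them easier to adjust than inline `if` blocks inside the drawing loop.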

#### Graphical representation of the 'Normalized' Conditional Probability. Line thickness represents strength of association; dashed lines represent a negative association (seen in the thin lines connected to University).

In [None]:
enrichment = enrichment_metric(enrichment_popularity)
node_labels = enrichment_popularity
# define the nodes before building the edge list from them
nodes = range(0,len(node_labels))
all_edges = list(combinations(nodes, 2))

g = nx.Graph()
g.add_nodes_from(nodes)
for edge in all_edges:
    g.add_edge(edge[0], edge[1], weight=enrichment[edge[0],edge[1]])
pos = nx.spring_layout(g,seed=3)

fig, ax = plt.subplots(1,1,figsize=(10,6))
options = {"node_size": 700}
colorlist = ['tab:pink','tab:olive', 'tab:gray','tab:brown', 'tab:cyan', 'tab:purple',
             'tab:orange', 'tab:green', 'tab:blue', 'tab:red']
               
nx.draw_networkx_nodes(g, pos, nodelist=nodes,node_color=colorlist, **options)
for edge in all_edges:
    sedge = (edge[0],edge[1])
    style = "solid"
    if enrichment[edge[0],edge[1]] < 0:
        style="dashed"
    else:
        style="solid"
    width = (int(abs(enrichment[edge[0],edge[1]] * 10)))
    if width < 1:
        width = 1
    alpha = abs(enrichment[edge[0],edge[1]])
    if alpha < .15:
        alpha = .15
    nx.draw_networkx_edges(g, pos, edgelist=[sedge],
                           width=width, alpha=alpha,edge_color='tab:gray',
                          style=style)
legend_elements = list()
for edu in range(0,len(node_labels)):
    legend_elements.append(Line2D([0], [0], marker='o', color='w', label=node_labels[edu],
                          markerfacecolor=colorlist[edu], markersize=15))

ax.legend(handles=legend_elements, 
             title="Educational Platforms",bbox_to_anchor=(1, 1), ncol=1);
plt.axis('off');

Thanks for looking! My future plans for this kernel include:
- Determining if the Log 'Normalized' Conditional Probability is statistically significant for pairs of Education Platforms. For example, Fast.ai and Udacity have a value of 0.66, but is this more likely than random chance?
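
One possible way to approach that question is a permutation test, sketched below on invented data. This is an assumption about how the check could be done, not a result from this notebook. The helper `permutation_p_value` is hypothetical. It uses the co-occurrence count as the test statistic, because shuffling one column leaves both marginal probabilities unchanged, so the log ratio rises and falls with that count.

```python
import random

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """One-sided p-value: how often does a shuffled b co-occur with a at least as much?"""
    rng = random.Random(seed)
    observed = sum(x * y for x, y in zip(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between the two columns
        if sum(x * y for x, y in zip(a, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# 40 hypothetical respondents, 10 of whom took platform A
a = [1] * 10 + [0] * 30
b_linked = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 28   # 8 of B's 10 takers also took A
b_chance = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 22   # 2 of B's 10 takers also took A

p_linked = permutation_p_value(a, b_linked)
p_chance = permutation_p_value(a, b_chance)
print(p_linked, p_chance)
```

With ten platforms there are 45 pairs, so any real analysis along these lines would also need a correction for multiple comparisons.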

Any other thoughts/suggestions?