# First Graph 
After speaking to Darren, we have an initial categorisation for the data:

## User Categories
- Community energy groups, who would normally be the client / project owner
- Asset owners, who might be the same as the client if the asset is a wholly owned community building. Often, though, the asset on which the system is being built or fitted is owned by a separate entity. For example, the land in South Glos where we are hoping to put the new wind turbines is owned by South Gloucestershire Council.
- Homeowners (individuals looking to get work done on their property)
- Companies providing services (not an individual with a company name)
- Individuals providing services (these might be freelancers, or they might work for a company that doesn't necessarily want to sign up; there are quite a lot of BEN members in this category)
- Product / system installers, who would normally be the project managers and would buy all the various components. At this stage I don't think we need a further category for the sellers of the products.
## Professional Categories
- Legal
- Planning Applications
- System Design
- Civil engineering
- Electrical engineering
- Groundworks
- Fundraising 
- Finance 
## Project Stages
- Project definition
- Feasibility study
- Heads of Terms
- Legal & Finance 
- Build
- Commission, Monitor and Maintain

# Plan
We can now create a database structure that reflects this categorisation and visualise it.
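
As a rough illustration of what "a structure that reflects this categorisation" could look like, here is a minimal plain-Python sketch. The `Member` class, its field names and the example record are all hypothetical (they are not part of the notebook), and only the user and professional categories are covered; project stages could be validated the same way.

```python
from dataclasses import dataclass
from typing import Optional

# Labels taken from the category lists above.
USER_CATEGORIES = (
    "Community Energy Group",
    "Asset Owner",
    "Homeowner",
    "Service Provider (Company)",
    "Service Provider (Individual)",
    "Product / System Installers",
)
PROFESSIONAL_CATEGORIES = (
    "Legal", "Planning Applications", "System Design", "Civil Engineering",
    "Electrical Engineering", "Groundworks", "Fundraising", "Finance",
)


@dataclass
class Member:
    """One row of a hypothetical members table."""
    name: str
    user_category: str
    primary_professional_category: Optional[str] = None

    def __post_init__(self):
        # Reject labels that are not part of the agreed categorisation.
        if self.user_category not in USER_CATEGORIES:
            raise ValueError(f"unknown user category: {self.user_category!r}")
        prof = self.primary_professional_category
        if prof is not None and prof not in PROFESSIONAL_CATEGORIES:
            raise ValueError(f"unknown professional category: {prof!r}")


# Placeholder record, not a real organisation.
example_member = Member("Example Solar Co-op", "Service Provider (Company)", "System Design")
print(example_member)
```

Validating labels at construction time is one way to catch spelling drift between the category lists and the data before it reaches the graph.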



In [1]:
# import libraries
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt


In [67]:
# local file path
local_path = "/Users/oliverbrown/Desktop/Oli/Renewable-Energy-Connect/cee-members.csv"
df = pd.read_csv(local_path)
df = df.dropna()
df

Unnamed: 0,Position,Member Name,Website Link,Logo Image
0,1,361 Energy,http://www.361energy.org,https://communityenergyengland.org/files/img_c...
1,2,Abbotts Ann Community Land Trust,https://aaclt.abbottsann.com/,https://communityenergyengland.org/files/img_c...
2,3,Acocks Greener,https://www.facebook.com/groups/acocksgreener/,https://communityenergyengland.org/files/img_c...
4,5,Aldgate Solar Power,https://www.repowering.org.uk/aldgate-solar-po...,https://communityenergyengland.org/files/img_c...
5,6,Allgreen Energy,http://www.allgreenergy.org/,https://communityenergyengland.org/files/img_c...
6,7,Alwoodley Community Energy,https://alwoodleycommunityenergy.co.uk/,https://communityenergyengland.org/files/img_c...
7,8,Archenfield Community Environment Group,https://www.facebook.com/groups/2002263053275608/,https://communityenergyengland.org/files/img_c...
8,9,Ashford Borough Council,https://www.ashford.gov.uk/,https://communityenergyengland.org/files/img_c...
9,10,Ashton Hayes Community Energy,http://goingcarbonneutral.co.uk,https://communityenergyengland.org/files/img_c...
10,11,Ashwell Parish Council,http://www.ashwell.gov.uk/,https://communityenergyengland.org/files/img_c...


## Define Categories for labelling

In [72]:
user_category = ['Community Energy Group', 'Asset Owner', 'Homeowner','Service Provider (Company)','Service Provider (Individual)', 'Product / System Installers']
professional_category = ['Legal','Planning Applications','System Design','Civil Engineering','Electrical Engineering','Groundworks','Fundraising','Finance']
project_stages = ['Project definition','Feasibility study','Heads of Terms','Legal & Finance','Build','Commision, Monitor and Maintain']
project_stages[-1]
print(len(professional_category))
print(professional_category)

8
['Legal', 'Planning Applications', 'System Design', 'Civil Engineering', 'Electrical Engineering', 'Groundworks', 'Fundraising', 'Finance']


## Create Toy Data for Showcase Purposes

In [286]:
from faker import Faker
fake = Faker()

orgs = []
num_orgs = 100
for i in range(0,num_orgs):
    orgs.append(fake.company())

print(orgs)

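def gen_prob_vec(input_labels,val):
    """Append an 'NA' label and return (labels, probabilities).

    Each input label gets probability `val`; 'NA' gets the remainder.
    """
    return_list = np.append(input_labels,'NA')
    print(return_list)
    prob_vec = np.full(len(input_labels),val)
    prob_vec = np.append(prob_vec,1-sum(prob_vec))
    return (return_list, prob_vec)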

# make a probability vector to assign secondary professional categories to fake data
val_1 = 0.03
val_2 = 0.01
secondary_prof_cat, secondary_prof_vec = gen_prob_vec(professional_category,val_1)
secondary_proj_cat, secondary_proj_vec = gen_prob_vec(project_stages,val_2)

['Bryant-Ball', 'Warren Group', 'Barrett-Porter', 'Stewart-Young', 'Koch Inc', 'Mitchell-Munoz', 'Ramirez, Murphy and Hayden', 'Smith-Hughes', 'Webb-West', 'Jones Ltd', 'Finley and Sons', 'Reeves-Williams', 'Leonard-Flores', 'Ochoa Group', 'Martin-Townsend', 'Mcintosh Group', 'Aguirre, Powell and Escobar', 'Giles and Sons', 'Leonard Ltd', 'Berry and Sons', 'Brown, Mcguire and Barton', 'Parker LLC', 'Barker LLC', 'Gray, Watson and Smith', 'Brown Inc', 'Collins-Thompson', 'Lambert Group', 'Rodriguez LLC', 'Combs-Brown', 'Hoffman Group', 'Nicholson-Summers', 'Johnston and Sons', 'Griffith, Solis and Payne', 'Fisher, Mendoza and Clark', 'Bates LLC', 'Jenkins, Pratt and Diaz', 'White-Thornton', 'Garrison LLC', 'Adams-Leon', 'Flowers-Miller', 'Hudson-Taylor', 'Hancock, Parrish and Lopez', 'Blair, Cooper and Nelson', 'Reed-Clements', 'Flores LLC', 'Pearson Inc', 'Henderson, Terry and Mcmillan', 'Mendez-Bright', 'Hall-Perkins', 'Carter, Cannon and Little', 'Brown Ltd', 'Harrell, Mcguire and Wi
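
To make the arithmetic behind `gen_prob_vec` concrete, the sketch below rebuilds the same idea without NumPy so it runs on its own. The name `padded_probabilities` and the explicit range check are my additions, not part of the notebook. With 8 professional categories at 0.03 each, 'NA' is left with 1 - 0.24 = 0.76.

```python
import math
import random


def padded_probabilities(labels, p_each):
    """Append an 'NA' label and give it whatever probability is left over."""
    probs = [p_each] * len(labels)
    remainder = 1 - sum(probs)
    if remainder < 0:
        raise ValueError("p_each is too large: probabilities would exceed 1")
    return list(labels) + ["NA"], probs + [remainder]


professional_category = ["Legal", "Planning Applications", "System Design",
                         "Civil Engineering", "Electrical Engineering",
                         "Groundworks", "Fundraising", "Finance"]

labels, probs = padded_probabilities(professional_category, 0.03)

# 8 categories at 0.03 each leave 1 - 0.24 = 0.76 for 'NA'.
assert math.isclose(probs[-1], 0.76)
assert math.isclose(sum(probs), 1.0)

# Draw a small seeded sample; 'NA' is the most likely outcome of each draw.
rng = random.Random(0)
sample = rng.choices(labels, weights=probs, k=10)
print(sample)
```

The same check with the 6 project stages at 0.01 each leaves 0.94 for 'NA', which is why most rows in the toy data have no secondary stage.
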

In [301]:
# make a list of colours for the different professions
colour_choices = ['Bisque',
                  'Brown',
                  'BlueViolet',
                  'CadetBlue',
                  'GoldenRod',
                  'IndianRed',
                  'LightCoral',
                  'LightSkyBlue']
prof_colours = pd.DataFrame(professional_category,columns=['primary_professional_category'])
prof_colours['node_colour'] = colour_choices
prof_colours



Unnamed: 0,primary_professional_category,node_colour
0,Legal,Bisque
1,Planning Applications,Brown
2,System Design,BlueViolet
3,Civil Engineering,CadetBlue
4,Electrical Engineering,GoldenRod
5,Groundworks,IndianRed
6,Fundraising,LightCoral
7,Finance,LightSkyBlue


In [302]:
df = pd.DataFrame(data = orgs, columns=['name'])
df['primary_professional_category'] = np.random.choice(professional_category, size = len(df))
df['secondary_professional_category'] = np.random.choice(secondary_prof_cat,size = len(df), p = secondary_prof_vec)
df['primary_project_stage'] = np.random.choice(project_stages, size = len(df))
df['secondary_project_stage'] = np.random.choice(secondary_proj_cat, size = len(df), p = secondary_proj_vec)
#df['node_colour'] = 'Khaki'
df['node_size'] = 5
df = pd.merge(df,prof_colours,on='primary_professional_category',how='left')
df

Unnamed: 0,name,primary_professional_category,secondary_professional_category,primary_project_stage,secondary_project_stage,node_size,node_colour
0,Bryant-Ball,Finance,,Heads of Terms,,5,LightSkyBlue
1,Warren Group,Electrical Engineering,,Legal & Finance,,5,GoldenRod
2,Barrett-Porter,Groundworks,,Feasibility study,,5,IndianRed
3,Stewart-Young,Civil Engineering,,Feasibility study,,5,CadetBlue
4,Koch Inc,Legal,,Feasibility study,,5,Bisque
...,...,...,...,...,...,...,...
95,Stewart-Gray,Groundworks,,Project definition,,5,IndianRed
96,Cruz Group,Planning Applications,,Legal & Finance,,5,Brown
97,"Leon, Robbins and Nelson",Groundworks,,Heads of Terms,,5,IndianRed
98,"Wilson, Harrison and Guzman",Finance,,Build,,5,LightSkyBlue


In [1]:
# add empty email, address and phone number columns to df
# (needs numpy imported as np, so run the imports cell first after a kernel restart)
df['email'] = np.nan
df['address'] = np.nan
df['phone_number'] = np.nan
df

NameError: name 'np' is not defined

## Build Network Graph
Make the network graph to show the stages that projects go through. For each major step of the process, we want to attach all of the relevant service providers.
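
Before any graph library is involved, the links described above can be listed as a plain mapping from stage to providers. This is only a sketch: the three rows are hypothetical stand-ins for the toy DataFrame, and `providers_by_stage` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical rows standing in for the toy DataFrame:
# (name, primary stage, secondary stage or 'NA').
rows = [
    ("Provider A", "Feasibility study", "NA"),
    ("Provider B", "Build", "Feasibility study"),
    ("Provider C", "Build", "NA"),
]

providers_by_stage = defaultdict(list)
for name, primary, secondary in rows:
    providers_by_stage[primary].append(name)
    # 'NA' marks the absence of a secondary stage, so it creates no link.
    if secondary != "NA":
        providers_by_stage[secondary].append(name)

for stage, names in providers_by_stage.items():
    print(stage, "->", names)
```

Each (provider, stage) pair in this mapping corresponds to one edge between an actor node and a stage node in the graph.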

In [399]:
# hand-picked coordinates for the six project stage nodes
#pos_list = [(1,1),(2,2),(3,3),(4,4),(5,5),(6,6)]
x_ords = [10,400,600,350,200,100]
y_ords = [10,10,250,400,500,600]

In [410]:
# create a dataframe for the project stages
df_stages = pd.DataFrame(data = project_stages, columns=['stages'])
df_stages['node_colour'] = 'DarkSalmon'
df_stages['node_size'] = 70
#df_stages['node_position'] = pos_list
df_stages['x'] = x_ords
df_stages['y'] = y_ords
df_stages

Unnamed: 0,stages,node_colour,node_size,x,y
0,Project definition,DarkSalmon,70,10,10
1,Feasibility study,DarkSalmon,70,400,10
2,Heads of Terms,DarkSalmon,70,600,250
3,Legal & Finance,DarkSalmon,70,350,400
4,Build,DarkSalmon,70,200,500
5,"Commision, Monitor and Maintain",DarkSalmon,70,100,600


In [411]:
project_stages_nodes = []
for idx, row in df_stages.iterrows():
    project_stages_nodes.append((row['stages'], {"color": row.node_colour,
                                                 "size": row.node_size,
                                                 "x": row.x,
                                                 "y": row.y}
                                ))

project_stages_nodes

[('Project definition', {'color': 'DarkSalmon', 'size': 70, 'x': 10, 'y': 10}),
 ('Feasibility study',
  {'color': 'DarkSalmon', 'size': 70, 'x': 400, 'y': 10}),
 ('Heads of Terms', {'color': 'DarkSalmon', 'size': 70, 'x': 600, 'y': 250}),
 ('Legal & Finance', {'color': 'DarkSalmon', 'size': 70, 'x': 350, 'y': 400}),
 ('Build', {'color': 'DarkSalmon', 'size': 70, 'x': 200, 'y': 500}),
 ('Commision, Monitor and Maintain',
  {'color': 'DarkSalmon', 'size': 70, 'x': 100, 'y': 600})]

In [412]:
# build the actor nodes manually as (name, attributes) tuples
actor_nodes = []
for idx, row in df.iterrows():
    actor_nodes.append((row['name'], {"Primary Professional Category":row.primary_professional_category,
                                               "secondary_professional_category":row.secondary_professional_category,
                                               "primary_project_stage": row.primary_project_stage,
                                               "secondary_project_stage": row.secondary_project_stage,
                                               "color": row.node_colour,
                                               "size": row.node_size}
                                               ))

actor_nodes[5]

('Mitchell-Munoz',
 {'Primary Professional Category': 'System Design',
  'secondary_professional_category': 'NA',
  'primary_project_stage': 'Build',
  'secondary_project_stage': 'Commision, Monitor and Maintain',
  'color': 'BlueViolet',
  'size': 5})

In [None]:
from pyvis import network as net

G = nx.DiGraph()

# add the project stages to the graph
G.add_nodes_from(project_stages_nodes)

# add the directed edges that link the project stages together
for i in range(len(project_stages)-1):
    G.add_edge(project_stages[i], 
               project_stages[i+1], 
               color = 'black',
               width = 100)

# add the different actors to the graph
G.add_nodes_from(actor_nodes)

# add the edges from the actors to their project stages
for i in range(len(df)):
    G.add_edge(df['name'][i], df['primary_project_stage'][i])
    if df['secondary_project_stage'][i] != 'NA':
        G.add_edge(df['name'][i], df['secondary_project_stage'][i])

# create PyVis object and convert NetworkX object to PyVis Format
graph_to_show = net.Network(height='800px',
                            directed = True,
                            notebook=True,
                            select_menu=True,
                            filter_menu=True)
                            #heading='Renewable Energy Connect - Initial Scoping')

graph_to_show.repulsion()
graph_to_show.from_nx(G)
graph_to_show.write_html('test_graph.html')



To get more control over the node positioning, I'll add the project stage nodes directly via `PyVis`. To then feed in the node attributes for the actors, we should use `NetworkX`.

In [386]:
#project_stages_nodes
df_stages

Unnamed: 0,stages,node_colour,node_size,x,y
0,Project definition,DarkSalmon,10,10,10
1,Feasibility study,DarkSalmon,10,400,400
2,Heads of Terms,DarkSalmon,10,600,600
3,Legal & Finance,DarkSalmon,10,350,350
4,Build,DarkSalmon,10,200,200
5,"Commision, Monitor and Maintain",DarkSalmon,10,100,100


In [392]:
project_stages

['Project definition',
 'Feasibility study',
 'Heads of Terms',
 'Legal & Finance',
 'Build',
 'Commision, Monitor and Maintain']

In [None]:
from pyvis.network import Network

rec_net = Network(height = '750px',
                  width = '100%',
                  bgcolor='#222222',
                  font_color='white',
                  directed = True)

# set the physics layout
rec_net.barnes_hut()

# add the nodes via PyVis (rather than NetworkX)
# position the project stage nodes first
# must do this node by node (annoyingly) to access the physics option

for k in range(len(project_stages)):
    rec_net.add_node(str(df_stages['stages'][k]),
                     label = str(df_stages['stages'][k]),
                     size = int(df_stages['node_size'][k]),
                     x = float(df_stages['x'][k]),
                     y = float(df_stages['y'][k]),
                     color = str(df_stages['node_colour'][k]),
                     # physics = False,  # currently disabled: would pin the node at (x, y)
                     )

# add the directed edges that link the project stages together
for i in range(len(project_stages)-1):
    rec_net.add_edge(project_stages[i], project_stages[i+1])

rec_net.write_html('test_graph.html')