## Building Network Models
# UB Collectors Coworking Network
### CWN Communities

---

In [2]:
# Setting paths
import sys,os
import pathlib
sys.path.insert(0,os.path.expanduser('~/caryocar'))
sys.path.insert(0,os.path.abspath('..'))

In [3]:
# Building the networks from data
from setupmodels import *

In [4]:
# Importing libraries for analysis
import networkx as nx
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams.update(mpl.rcParamsDefault)
plt.style.use('seaborn-paper')
sns.set_color_codes('deep')

%matplotlib inline

In [5]:
graphsdir = os.path.abspath('./graphs')

# Create the output directory if it does not exist yet (portable, no shell call)
os.makedirs(graphsdir, exist_ok=True)

In [28]:
occs['eventDate'] = pd.to_datetime(occs['eventDate'])

---

In [6]:
print(nx.info(cwn))

Name: 
Type: CoworkingNetwork
Number of nodes: 6768
Number of edges: 10391
Average degree:   3.0706


# Filtering for visualization

In [123]:
from copy import deepcopy
g_filt = deepcopy(cwn)

Filtering weaker edges ($k_w < 10$)

In [124]:
g_filt.remove_edges_from([ (u,v) for u,v,w in g_filt.edges(data='weight_hyperbolic') if w < 10 ])

In [125]:
print(nx.info(g_filt))

Name: 
Type: CoworkingNetwork
Number of nodes: 6768
Number of edges: 1259
Average degree:   0.3720
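The filtering step above can be tried on a toy graph. This is a minimal sketch: the node names and weights are made up, and only the attribute name `weight_hyperbolic` and the threshold of 10 come from the cell above.

```python
import networkx as nx

# Toy graph; node names and weights are invented for illustration only.
g = nx.Graph()
g.add_edge('a', 'b', weight_hyperbolic=12.0)
g.add_edge('b', 'c', weight_hyperbolic=3.5)
g.add_edge('c', 'd', weight_hyperbolic=10.0)

# Build the list of weak edges first, then remove them in one call,
# so the graph is not modified while its edges are being iterated.
weak = [(u, v) for u, v, w in g.edges(data='weight_hyperbolic') if w < 10]
g.remove_edges_from(weak)

remaining = sorted(tuple(sorted(e)) for e in g.edges())
n_nodes = g.number_of_nodes()
print(remaining, n_nodes)
```

Removing edges leaves every node in place, which is why the node count printed above is still 6768 while the edge count drops to 1259.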


Keeping only the connected components whose score (the sum of the nodes' `count` attribute) is greater than $600$

In [141]:
sgs = nx.connected_component_subgraphs(g_filt)
sgs_filtered = list(filter(lambda g: sum(cnt for n,cnt in g.nodes(data='count'))>600 ,sgs))

In [166]:
sgs_filtered

[<networkx.classes.graph.Graph at 0x7f72fc533518>,
 <networkx.classes.graph.Graph at 0x7f72fc4f1320>,
 <networkx.classes.graph.Graph at 0x7f72ff4c1908>,
 <networkx.classes.graph.Graph at 0x7f72fc55e940>,
 <networkx.classes.graph.Graph at 0x7f72fc55e240>,
 <networkx.classes.graph.Graph at 0x7f72fc54fdd8>,
 <networkx.classes.graph.Graph at 0x7f72fc4f19b0>,
 <networkx.classes.graph.Graph at 0x7f72fc4f10b8>,
 <networkx.classes.graph.Graph at 0x7f72fc533390>,
 <networkx.classes.graph.Graph at 0x7f72fc5335c0>,
 <networkx.classes.graph.Graph at 0x7f72fc4f14e0>,
 <networkx.classes.graph.Graph at 0x7f72fc4f1828>]
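`nx.connected_component_subgraphs` was removed in networkx 2.4. On newer versions, the same component filter can be sketched with `nx.connected_components` and `Graph.subgraph`. The toy graph and its `count` values below are assumptions made for illustration; only the threshold of 600 and the `count` attribute name come from the notebook.

```python
import networkx as nx

# Toy graph with three components; the 'count' values are invented.
g = nx.Graph()
g.add_nodes_from([
    ('a', {'count': 500}), ('b', {'count': 200}),   # component score 700
    ('c', {'count': 50}),  ('d', {'count': 10}),    # component score 60
    ('e', {'count': 400}), ('f', {'count': 250}),   # component score 650
])
g.add_edges_from([('a', 'b'), ('c', 'd'), ('e', 'f')])

def component_score(graph, nodes):
    """Sum the 'count' attribute over a set of nodes."""
    return sum(graph.nodes[n]['count'] for n in nodes)

threshold = 600
# One independent subgraph copy per component that passes the threshold.
kept = [g.subgraph(c).copy()
        for c in nx.connected_components(g)
        if component_score(g, c) > threshold]

# Merge the surviving components back into a single graph.
merged = nx.compose_all(kept)
print(sorted(merged.nodes()), merged.number_of_edges())
```

Calling `.copy()` on each subgraph view makes the kept components independent of the original graph, so later edits to `g` do not leak into them.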

In [168]:
g_filt = nx.compose_all(sgs_filtered)
# Reset the 'taxons' attribute to an empty string on every edge of the filtered graph
nx.set_edge_attributes(g_filt, '', name='taxons')

In [169]:
print(nx.info(g_filt))

Name: compose( ,  )
Type: Graph
Number of nodes: 545
Number of edges: 1158
Average degree:   4.2495


# Detecting Communities

In [171]:
import community

In [173]:
communities = community.best_partition(g_filt)

How many communities were found?

In [176]:
len(set(communities.values()))

30
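Besides the number of communities, their sizes are often worth a look. The sketch below assumes a partition in the shape that `community.best_partition` returns, a dict mapping each node to an integer community id. The partition itself is a made-up toy, not the one computed above.

```python
from collections import Counter

# Hypothetical partition: node -> community id (toy values).
partition = {'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1, 'f': 2}

# Number of distinct communities, as in the cell above.
n_communities = len(set(partition.values()))

# Number of nodes per community, and the largest community.
sizes = Counter(partition.values())
largest_id, largest_size = sizes.most_common(1)[0]

print(n_communities, dict(sizes), largest_id, largest_size)
```

Louvain is a heuristic, so the exact count of 30 may differ between runs on the same graph.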

Set the community as a node attribute (the GEXF export is left commented out)

In [182]:
nx.set_node_attributes(g_filt,communities,name='community')
#nx.write_gexf(g_filt,'g.gexf')

# Coworking groups

First, we aggregate the SCN by family.

In [6]:
grp = dict(occs[['species','family']].groupby('family').apply(lambda x: set(x['species'])))
scn_family = scn.taxonomicAggregation(grp)
nx.set_edge_attributes(scn_family, dict([ ((u,v),int(ct)) for u,v,ct in scn_family.edges(data='count') ]), name='count')
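The idea behind the family-level aggregation can be sketched in plain Python. This is an illustration only: the internals of `scn.taxonomicAggregation` are not shown in this notebook, and all species and collector names and counts below are hypothetical.

```python
from collections import defaultdict

# Made-up (species, family) rows standing in for occs[['species', 'family']].
rows = [
    ('sp1', 'Asteraceae'),
    ('sp2', 'Asteraceae'),
    ('sp3', 'Myrtaceae'),
    ('sp1', 'Asteraceae'),  # duplicate rows collapse inside the set
]

# family -> set of species, the same shape as `grp` in the cell above.
grp = defaultdict(set)
for species, family in rows:
    grp[family].add(species)

# Invert the mapping to look up the family of each species.
species_to_family = {sp: fam for fam, sps in grp.items() for sp in sps}

# Hypothetical per-species record counts for one collector.
collector_species = {'sp1': 3, 'sp2': 2, 'sp3': 1}

# Roll the species counts up to family level.
family_counts = defaultdict(int)
for sp, ct in collector_species.items():
    family_counts[species_to_family[sp]] += ct

print(dict(grp), dict(family_counts))
```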

## Same coworking group but distinct interests

## Similar interests but distinct coworking groups

### The case of *J.B.A. Bringel* and *D.A. Chaves*

*Bringel* and *Chaves* both collect mainly *Asteraceae* (another notebook placed them in the same SCN interest community).

In [9]:
bringel_families = sorted([ (f,d['count']) for f,d in dict(scn_family['bringel,jba']).items() ], key=lambda x:x[1], reverse=True)
chaves_families = sorted([ (f,d['count']) for f,d in dict(scn_family['chaves,da']).items() ], key=lambda x:x[1], reverse=True)

In [11]:
bringel_families[:5]

[('Asteraceae', 242),
 ('Myrtaceae', 37),
 ('Arecaceae', 22),
 ('Fabaceae', 21),
 ('Lythraceae', 20)]

In [10]:
chaves_families[:5]

[('Asteraceae', 566),
 ('Myrtaceae', 26),
 ('Fabaceae', 22),
 ('Melastomataceae', 18),
 ('Cyperaceae', 7)]

In [8]:
"Pct of Asteraceae: {}".format(bringel_families[0][1]/ sum( cnt for f,cnt in bringel_families ))

'Pct of Asteraceae: 0.5426008968609866'

In [12]:
"Pct of Asteraceae: {}".format(chaves_families[0][1]/sum( cnt for f,cnt in chaves_families ))

'Pct of Asteraceae: 0.7850208044382802'
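The two values above are fractions rather than percentages. A small sketch of how they could be formatted, and of the record totals they imply, is given below. The shares and the Asteraceae counts are copied from the outputs above; the variable names are illustrative.

```python
# Asteraceae counts and shares, copied from the outputs above.
asteraceae = {'bringel,jba': 242, 'chaves,da': 566}
shares = {'bringel,jba': 0.5426008968609866, 'chaves,da': 0.7850208044382802}

# Format each fraction as a percentage with one decimal place.
formatted = {name: "{:.1%}".format(v) for name, v in shares.items()}

# Total records per collector implied by count / share.
totals = {name: round(asteraceae[name] / shares[name]) for name in shares}

print(formatted, totals)
```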

However, they have not collaborated on any specimen record. This is not exclus

In [17]:
"Number of Bringel and Chaves co-authorship records: {}".format(cwn['bringel,jba'].get('chaves,da',0))

'Number of Bringel and Chaves co-authorship records: 0'

Apparently, neither geography nor time prevented them from collaborating

In [22]:
occs.loc[ni['bringel,jba']]['stateProvince'].value_counts(normalize=False)

Goiás               203
Distrito Federal    137
Tocantins            87
Minas Gerais         14
Bahia                 4
Name: stateProvince, dtype: int64

In [21]:
occs.loc[ni['chaves,da']]['stateProvince'].value_counts(normalize=False)

Minas Gerais    577
Goiás           143
Name: stateProvince, dtype: int64

In [49]:
occs.loc[ni['bringel,jba']]['eventDate'].apply(lambda x: x.year).value_counts().sort_index()

2003.0     10
2004.0     56
2005.0     10
2006.0     11
2009.0     45
2010.0     35
2011.0    189
2012.0     81
2013.0      8
Name: eventDate, dtype: int64

In [50]:
occs.loc[ni['chaves,da']]['eventDate'].apply(lambda x: x.year).value_counts().sort_index()

2013    311
2014    397
2016     13
Name: eventDate, dtype: int64