## The realm of possibility on VarageSale is more like Minecraft than Airbnb.

#### TODO:

- Adapt the SQL for other network types
- Move to the active network instead of the all-time network
- Refactor indegree and outdegree


#### Edge Types

- Transaction
- Message
- Follow
- Praise
- Comment
- Interest
- Approximated notifications?

In [119]:
import pandas as pd
import pandas_gbq as pgbq
from scipy import stats

google_project_id = 'solid-ridge-104914'

In [146]:
# Calculate network properties 

with open('../sql/cumulative_networks/transaction_net.txt') as query_file:
    transaction_query = query_file.read()

transaction_net = pgbq.read_gbq(transaction_query, google_project_id, dialect='standard')
pgbq.to_gbq(transaction_net, 'community_networks.transaction_net', google_project_id, if_exists='replace')

Requesting query... ok.
Query running...
  Elapsed 12.01 s. Waiting...
  Elapsed 22.16 s. Waiting...
  Elapsed 32.69 s. Waiting...
  Elapsed 43.16 s. Waiting...
  Elapsed 53.9 s. Waiting...
Query done.
Processed: 19.9 GB

Retrieving results...
  Got page: 1; 100% done. Elapsed 66.45 s.
Got 2023 rows.

Total time taken 66.52 s.
Finished at 2017-06-24 22:26:57.
The existing table has a different schema. Please wait 2 minutes. See Google BigQuery issue #191



Streaming Insert is 100.0% Complete




In [147]:
# Cross correlate community facts and output key correlations

variables = list(transaction_net.columns[3:])  # fragile: assumes the first three columns are identifiers, not metrics

transaction_corr_matrix = pd.DataFrame(index=variables, columns=variables)
transaction_key_correlations = pd.DataFrame(columns=['pair','coefficient'])
corr_checked = set()

for var_i in variables:
    for var_j in variables:
        pair = var_i + ',' + var_j
        rho, pval = stats.spearmanr(transaction_net[var_i], transaction_net[var_j])
        # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
        transaction_corr_matrix.at[var_i, var_j] = rho
        # Record each strong correlation once; rho < 0.99 drops self-pairs
        if pair not in corr_checked and rho < 0.99 and abs(rho) > 0.5:
            transaction_key_correlations.loc[len(transaction_key_correlations)] = [pair, rho]
            corr_checked.add(var_j + ',' + var_i)

transaction_key_correlations.sort_values(by='coefficient', ascending=False, inplace=True)
transaction_key_correlations.reset_index(drop=True, inplace=True)

#Save to BigQuery
pgbq.to_gbq(transaction_corr_matrix, 'community_networks.transaction_corr_matrix', google_project_id, if_exists='replace')
pgbq.to_gbq(transaction_key_correlations, 'community_networks.transaction_key_correlations', google_project_id, if_exists='replace')
transaction_key_correlations

The existing table has a different schema. Please wait 2 minutes. See Google BigQuery issue #191



Streaming Insert is 100.0% Complete





Streaming Insert is 100.0% Complete




Unnamed: 0,pair,coefficient
0,"nodes,mau_may_2017",0.893027
1,"nodes,inventory",0.867513
2,"items_sold_may_2017,mau_may_2017",0.837955
3,"nodes,items_sold_may_2017",0.807662
4,"items_sold_may_2017,inventory",0.770706
5,"inventory,mau_may_2017",0.748851
6,"avg_degree,avg_indegree",0.719062
7,"mau_may_2017,avg_outdegree",0.686545
8,"avg_weight,avg_outdegree",0.65894
9,"inventory,avg_outdegree",0.658868
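
The pairwise loop in the cell above could probably be collapsed into a single call to `DataFrame.corr(method="spearman")`, with the strong pairs read off the upper triangle of the result. The sketch below is one way to do that, not the notebook's actual method: the helper name `key_spearman_pairs`, the `threshold` parameter, and the toy data are all assumptions made for illustration. Unlike the loop, it excludes only self-pairs rather than every pair with a coefficient of 0.99 or higher.

```python
import numpy as np
import pandas as pd


def key_spearman_pairs(df, threshold=0.5):
    """Return the Spearman matrix and the strongly correlated column pairs.

    Hypothetical helper; `threshold` mirrors the 0.5 cutoff used in the loop.
    """
    corr = df.corr(method="spearman")
    # Strict upper triangle: each unordered pair once, diagonal excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().reset_index()
    pairs.columns = ["var_a", "var_b", "coefficient"]
    # The comparison is False for NaN, so masked cells are dropped here
    # even if stack() kept them.
    strong = pairs[pairs["coefficient"].abs() > threshold]
    strong = strong.sort_values("coefficient", ascending=False).reset_index(drop=True)
    return corr, strong


# Toy data standing in for transaction_net; the values are made up.
toy = pd.DataFrame({
    "nodes": [10, 20, 30, 40, 50],
    "inventory": [12, 25, 31, 47, 60],
    "density": [0.9, 0.7, 0.5, 0.4, 0.1],
    "noise": [3, 1, 3, 1, 2],
})
corr_matrix, key_pairs = key_spearman_pairs(toy)
print(key_pairs)
```

Keeping the two variable names in separate columns, instead of one comma-joined `pair` string, makes it easier to filter for a single property later, for example every strong correlation involving `inventory`.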


In [148]:
transaction_corr_matrix

Unnamed: 0,nodes,items_sold_may_2017,items_sold_change_2017,inventory,mau_may_2017,mau_change_2017,avg_weight,network_density_X_100,avg_degree,avg_indegree,avg_outdegree,indegree_skew
nodes,1.0,0.807662,0.151469,0.867513,0.893027,0.214864,0.219732,-0.68885,0.28338,0.265672,0.644698,-0.29365
items_sold_may_2017,0.807662,1.0,0.366256,0.770706,0.837955,0.327119,0.126679,-0.484799,0.346469,0.193568,0.589926,-0.342244
items_sold_change_2017,0.151469,0.366256,1.0,0.0983015,0.224724,0.250215,-0.0301276,-0.0275245,0.157664,0.0758456,0.10783,0.00363176
inventory,0.867513,0.770706,0.0983015,1.0,0.748851,0.0465965,0.256057,-0.597717,0.255058,0.212405,0.658868,-0.387237
mau_may_2017,0.893027,0.837955,0.224724,0.748851,1.0,0.491298,0.159063,-0.410932,0.554571,0.352237,0.686545,-0.242501
mau_change_2017,0.214864,0.327119,0.250215,0.0465965,0.491298,1.0,-0.110955,0.115272,0.430694,0.25005,0.161427,0.124093
avg_weight,0.219732,0.126679,-0.0301276,0.256057,0.159063,-0.110955,1.0,-0.0585791,0.168304,0.547164,0.65894,0.115437
network_density_X_100,-0.68885,-0.484799,-0.0275245,-0.597717,-0.410932,0.115272,-0.0585791,1.0,0.437108,0.306123,-0.186591,0.527594
avg_degree,0.28338,0.346469,0.157664,0.255058,0.554571,0.430694,0.168304,0.437108,1.0,0.719062,0.607329,0.202044
avg_indegree,0.265672,0.193568,0.0758456,0.212405,0.352237,0.25005,0.547164,0.306123,0.719062,1.0,0.58869,0.539643


### Notes on Correlations among transaction network properties
 
OBSERVATIONS
 
 Inventory correlations:
 
 - Communities where the average seller has many buyers are also likely to have more inventory (0.659)
 - Density is inversely correlated with inventory (row 17, -0.598)
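
One possible reading of the negative density correlations is purely arithmetic. In a directed network with n nodes and m edges, density is m / (n(n-1)), so if the number of edges per member stays roughly constant, density shrinks as the community grows. The sketch below illustrates that with made-up numbers. It assumes `network_density_X_100` is 100 times the standard directed density, which the SQL (not shown here) may or may not confirm; `directed_density` and `edges_per_member` are hypothetical names.

```python
def directed_density(n_nodes, n_edges):
    """Share of possible directed edges (no self-loops) that are present."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical communities that differ in size but share the same
# mean number of edges per member.
edges_per_member = 3
for n_nodes in (10, 100, 1000):
    n_edges = edges_per_member * n_nodes
    density_x_100 = 100 * directed_density(n_nodes, n_edges)
    print(n_nodes, round(density_x_100, 3))
```

If this reading holds, part of the density-inventory relationship may simply reflect community size, since `nodes` and `inventory` are themselves strongly correlated (0.868).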