# EE5.R
# loading packages
library(data.table)
library(igraph)
library(tidyverse)
library(lubridate)
# set working directory laptop
setwd("C:\\Users\\jentl\\Documents\\Emory\\Fall 2021\\Social Network Analytics\\EE#5")
# set working directory pc
# setwd("C:\\Users\\Jent\\Documents\\College\\Emory\\Social Network Analytics\\EE#5")
# loading data
comp <- fread('company_details.csv')
deals <- fread('deal_details.csv')
invest <- fread('investor_details.csv')
invdeal <- fread('investors_and_deals.csv')
# setting up network for analysis
# getting venture capital investors
invest <- invest[Investor_Type=='Venture Capital']
# deals from 1990 onwards
deals$Deal_Date <- as_date(deals$Deal_Date)
deals <- deals[year(Deal_Date)>=1990]
# "We can define a status relationship for a pair of firms (A→B) as the proportion of times
# that Firm A has served as a lead investor in deals it has participated in with Firm B."
# status(A→B) = count(deals with B where A is lead) / count(all deals with B)
# "Let this proportion be the entries of a matrix representing a relationship between each
# of the investors. The diagonals of this matrix should be 0."
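# --- illustrative sketch, not part of the original analysis ---
# The proportion defined above, worked on a made-up toy table. The rows and the
# names toy, toy_pairs and toy_status are assumptions for illustration; only the
# column names Deal_Id, Investor_Id and Lead_Investor mirror invdeal. The sketch
# assumes each investor appears at most once per deal.
library(data.table) # already loaded above; repeated so the sketch runs on its own
toy <- data.table(
  Deal_Id       = c('d1', 'd1', 'd2', 'd2', 'd3', 'd3'),
  Investor_Id   = c('A',  'B',  'A',  'B',  'A',  'C'),
  Lead_Investor = c(1,    0,    0,    1,    1,    0)
)
# pair every investor with every co-investor on the same deal (self-join on Deal_Id)
toy_pairs <- merge(toy, toy, by = 'Deal_Id', allow.cartesian = TRUE,
                   suffixes = c('_A', '_B'))
# drop self-pairs, which would sit on the diagonal of the status matrix
toy_pairs <- toy_pairs[Investor_Id_A != Investor_Id_B]
# status(A→B) = deals shared with B where A led / all deals A shares with B
toy_status <- toy_pairs[, .(status = sum(Lead_Investor_A == 1) / .N),
                        by = .(Lead_Inv_Id = Investor_Id_A, Investor_Id = Investor_Id_B)]
# A and B share d1 and d2 and each leads one of them, so both directions are 0.5;
# A leads the only deal it shares with C, so status(A→C) = 1 and status(C→A) = 0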
# joins approach
# initialize empty dt
dt <- data.table(Lead_Inv_Id=character(),Investor_Id=character(),status=numeric())
# investor with many deals for testing
j <- '10013-77'
temp1 <- invdeal[Investor_Id==j] # deals investor is a part of
temp2 <- invdeal[temp1,on='Deal_Id'] # all investors on these deals
d <- temp2[, .N, by=Investor_Id] # N represents total deals with investor j
temp3 <- invdeal[Investor_Id==j & Lead_Investor==1] # lead investor deals
temp4 <- invdeal[temp3, on='Deal_Id'] # all investors on these deals
ld <- temp4[, .N, by=Investor_Id] #number of lead deals between investor j and others
#status column between investor j and Investor_id Column
d[ld, on='Investor_Id', status:=i.N/N]
# setting no deals as zero, self-deals as zero
d[is.na(status) | Investor_Id==j,status:=0]
d$Lead_Inv_Id <- j
d[,N:=NULL]
setcolorder(d,c('Lead_Inv_Id','Investor_Id','status'))
dt <- rbind(dt,d)
# for loop of above test code
# results are collected in a list and bound once at the end; calling rbind()
# inside the loop copies the growing table on every iteration
ids <- unique(invdeal$Investor_Id)
res <- vector('list', length(ids))
for (k in seq_along(ids)){
  inv <- ids[k]
  temp1 <- invdeal[Investor_Id==inv] # deals investor is a part of
  temp2 <- invdeal[temp1,on='Deal_Id'] # all investors on these deals
  d <- temp2[, .N, by=Investor_Id] # N represents total deals with investor inv
  temp3 <- invdeal[Investor_Id==inv & Lead_Investor==1] # lead investor deals
  temp4 <- invdeal[temp3, on='Deal_Id'] # all investors on these deals
  ld <- temp4[, .N, by=Investor_Id] # number of lead deals between investor inv and others
  # status defaults to zero (no lead deals), so the column exists even when ld is empty
  d[, status:=0]
  # status column between investor inv and Investor_Id column
  d[ld, on='Investor_Id', status:=i.N/N]
  # setting self-deals as zero
  d[Investor_Id==inv, status:=0]
  d[, Lead_Inv_Id:=inv]
  d[, N:=NULL]
  setcolorder(d,c('Lead_Inv_Id','Investor_Id','status'))
  res[[k]] <- d
}
dt <- rbindlist(res)
# saving as csv b/c loop takes a long time
fwrite(dt, 'status.csv')
status <- fread('status.csv')
# Creating a matrix looks computationally difficult, so new idea: make an igraph object
# undirected, with edge weight equal to status
# in order to do this, will first need inv1, inv2, status dt
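# --- illustrative sketch, not part of the original analysis ---
# One way the edge-list idea above could look, on a made-up table shaped like
# status.csv (Lead_Inv_Id, Investor_Id, status). Assumptions: the toy values, the
# names toy_edges, g_status and m_status, dropping zero-status pairs, and using
# directed = TRUE (the note above says undirected, but status(A→B) generally
# differs from status(B→A)).
library(data.table) # already loaded above; repeated so the sketch runs on its own
library(igraph)
toy_edges <- data.table(
  Lead_Inv_Id = c('A', 'B', 'A', 'C'),
  Investor_Id = c('B', 'A', 'C', 'A'),
  status      = c(0.5, 0.5, 1,   0)
)
# the first two columns become the edge endpoints; status rides along as an edge attribute
g_status <- graph_from_data_frame(toy_edges[status > 0], directed = TRUE)
E(g_status)$weight <- E(g_status)$status
# the status matrix can be read back from the graph rather than built by hand;
# sparse = FALSE only because the toy is tiny
m_status <- as_adjacency_matrix(g_status, attr = 'weight', sparse = FALSE)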
# "Then, each investor's status can be represented as the Bonacich centrality of this matrix:
# an investor's status is represented by its ability to be a leader on deals, as well as to be
# connected to other firms that lead their own deals. You can calculate Bonacich centrality
# using power_centrality() with the argument exponent = 0.75 to represent the commonly-used
# beta parameter. For the analysis, only consider firms that are actually a part of the status
# hierarchy, i.e., have co-invested with other firms and have non-missing values for status.
# To allow older ties to weaken over time, you can exclude ties that have not been renewed
# after five years."
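# --- illustrative sketch, not part of the original analysis ---
# A rough outline of the two steps described above: drop ties not renewed within
# five years, then score the remaining firms with power_centrality(exponent = 0.75).
# Assumptions: the toy table, its columns inv1, inv2 and last_deal (date of the most
# recent shared deal), the reference date, and the undirected toy graph. As far as
# I can tell, power_centrality() works from the plain adjacency matrix, so whether
# status weights enter the score should be checked against the installed igraph version.
library(data.table) # already loaded above; repeated so the sketch runs on its own
library(igraph)
library(lubridate)
toy_ties <- data.table(
  inv1      = c('A', 'A', 'B', 'C', 'D'),
  inv2      = c('B', 'C', 'C', 'D', 'E'),
  last_deal = as_date(c('2014-03-01', '2013-07-15', '2012-01-10',
                        '2011-05-20', '2004-09-30'))
)
ref_date <- as_date('2015-01-01')
cutoff   <- ref_date - years(5)
# keep only ties renewed within the five-year window; the D-E tie from 2004 drops out
recent <- toy_ties[last_deal >= cutoff]
# vertices are created from the surviving edges only, so a firm with no remaining
# ties (E here) is not part of the hierarchy
g_toy <- graph_from_data_frame(recent[, .(inv1, inv2)], directed = FALSE)
toy_power <- power_centrality(g_toy, exponent = 0.75)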