<h1>Prove the validity of community in graph G_3 in active_plaintiff_patterns4</h1>
<p>设计随机试验：
	现在的graph其plaintiff的连接是基于其是否有合作过同一个律师而决定的，这样连下来的图经过community detection之后得到了几个plaintiff的cluster，我们可以先算一下目前数据里平均一个plaintiff要连接多少个lawyer，然后整理出lawyer的集合，让plaintiff随机(或者看看plaintiff与lawyer有没有明显的地域倾向，连接时可以挑选同一地方的lawyer)和集合中的lawyer做连接，然后重新计算出plaintiff的连接图并做community detection（确保community数量与要验证的plaintiff cluster数量一致），计算cluster的modularity和conductance，重复1000次，得到上述两个指标的均值和方差（此过程模拟了随机情况下plaintiff 和lawyer的连接），比较待验证cluster和随机过程得到的上述两种指标（add:搞清楚graph cart是什么），如果待验证的plaintiff cluster明显指标更好，就可以说明其存在的有效性</p>

In [1]:
import pandas as pd
import networkx as nx
import numpy as np
import random
import scipy.stats as stats
import itertools
import community as community_louvain
from sklearn.cluster import KMeans
import scipy.linalg as linalg
import sklearn.preprocessing
import scipy.sparse as sparse
import networkx.algorithms.community as nx_comm
from timeit import default_timer as timer

In [2]:
# comparative graph, has 7 communities
G_3 = nx.read_gexf("/Users/starice/OwnFiles/cityu/RA/case_study/case_study_result/networks/G_3.gexf")

In [3]:
all_cases = pd.read_csv('/Users/starice/OwnFiles/cityu/RA/case_study/data/total_extracted_result/all_cases.csv', encoding="utf-8")
# print(all_cases['case_id'].drop_duplicates())
all_cases = all_cases[all_cases['defendant'] != all_cases['lawyer']]
all_cases = all_cases[all_cases['lawyer']!="共同委托人"]

<h3>平均一个活跃plaintiff连接多少个lawyer</h3>
<p>plaintiff set: fps_200; lawyer_set: lawyers</p>

In [4]:
#获取所有一审案件
first_cases = all_cases[all_cases['procedure']=="一审"]
# print("一审案件数量： ", len(first_cases['case_id'].drop_duplicates()))

# 获取所有案件原告的节点度并排序
degree_1stplaintiffs = first_cases.groupby("plaintiff")['case_id'].unique().reset_index()
degree_1stplaintiffs['case_count'] = degree_1stplaintiffs['case_id'].apply(lambda r: len(r))
degree_1stplaintiffs.sort_values(by="case_count", inplace=True, ascending=False)
fps_200 = degree_1stplaintiffs[:200]
# fps_200
new_selected_1stcp = first_cases[first_cases['plaintiff'].isin(fps_200['plaintiff'])]

In [5]:
len(new_selected_1stcp['case_id'].drop_duplicates())

16070

In [6]:
temp = new_selected_1stcp.groupby("plaintiff")['lawyer'].nunique().reset_index()
print(temp['lawyer'].mean(), temp['lawyer'].std())
# 平均每个plaintiff和8个不同的lawyer合作

7.375 6.0678222894777845


In [7]:
lawyers = list(new_selected_1stcp['lawyer'].drop_duplicates())
plaintiffs = fps_200['plaintiff']

In [8]:
temp['lawyer'].describe()

count    200.000000
mean       7.375000
std        6.067822
min        0.000000
25%        3.000000
50%        6.000000
75%       11.000000
max       34.000000
Name: lawyer, dtype: float64

<h3>随机生成plaintiff和lawyer的连接</h3>

In [9]:
pdpl = pd.DataFrame(columns=["plaintiff", 'lawyer'])
for i in plaintiffs:
    tlawyers = random.sample(lawyers, 8)
    for j in tlawyers:
        pdpl = pdpl.append({"plaintiff": i, "lawyer": j}, ignore_index=True)

In [10]:
law_pdpl = pdpl.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()

In [11]:
law_pdpl.head()

Unnamed: 0,lawyer,plaintiff
0,丁刘,"(李玉婵, 李海林)"
1,丁长富,"(李波, 林海东)"
2,万丽娜,"(张丙刚, 王玥明, 向云福, 谢志胜)"
3,万方,"(贾龙, 郭勇, 陶然)"
4,万红平,"(李娟, 李双琴, 朱志鹏, 张正荣)"


In [12]:
G = nx.Graph()
G.add_nodes_from(plaintiffs)
for i in range(1, len(law_pdpl)+1):
    a = list(law_pdpl[i-1:i]['plaintiff'].values[0])
    if len(a) > 0:
        G.add_edges_from(list(itertools.combinations(a, 2)))

In [13]:
def randomGenerateGraph(): # 重新加地域信息（市）
    pdpl = pd.DataFrame(columns=["plaintiff", 'lawyer'])
    for i in plaintiffs:
        tlawyers = random.sample(lawyers, 8)
        for j in tlawyers:
            pdpl = pdpl.append({"plaintiff": i, "lawyer": j}, ignore_index=True)
    law_pdpl = pdpl.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
    G= nx.Graph()
    G.add_nodes_from(plaintiffs)
    for i in range(1, len(law_pdpl)+1):
        a = list(law_pdpl[i-1:i]['plaintiff'].values[0])
        if len(a) > 0:
            G.add_edges_from(list(itertools.combinations(a, 2)))
    return G

In [14]:
# 看看有多少节点在G_3中不连通
# print('connected_components of graph: ',list(nx.connected_components(G_3))[0], len(list(nx.connected_components(G_3))[0]))#162 nodes
# print(G_3.nodes(), len(G_3.nodes()))#200 nodes

<h3>Spectral Clustering</h3>

In [15]:
# Spectral Clustering
# 补充提取律师的律所信息（先不做）
'''
https://www3.nd.edu/~kogge/courses/cse60742-Fall2018/Public/StudentWork/KernelPaperFinal/SCD-Sikdar-final.pdf
'''
#----------------------------------------------------------------------

def k_way_spectral(G, k):
    
    #去掉不连通的节点
    connected_nodes = list(list(nx.connected_components(G))[0])
    connected_graph = G.subgraph(connected_nodes)
    
    #再算一下随机连接的图里面最大连通的节点数有多少（可以再次证明abnormal）
    
    assert nx.is_connected(connected_graph), "the graph must be connnected"
    clusters = []
    if connected_graph.order() < k:
        clusters = list(connected_graph.nodes())
    else:
        L = nx.laplacian_matrix(connected_graph)
        # compute the first k + 1 eigenvectors
        _, eigenvecs = sparse.linalg.eigs(L.asfptype(), k=k+1, which='SM')
        eigenvecs = eigenvecs.real
        # discard the first trivial eigenvector
        eigenvecs = eigenvecs[:, 1:]
        # normalize each row by its L2 norm
        eigenvecs = sklearn.preprocessing.normalize(eigenvecs)
        # run K-means
        kmeans = KMeans(n_clusters=k).fit(eigenvecs)
        cluster_labels = kmeans.labels_
        clusters = [[] for _ in range(max(cluster_labels) + 1)]
        for node_id, cluster_id in zip(connected_graph.nodes(), cluster_labels):
            clusters[cluster_id].append(node_id)
    return clusters, connected_graph

In [16]:
# clusters
# Compute mean modularity and mean conductance for G_3(also std in addition)

def result_output(clusters, G):
    set_clusters = []
    for i in clusters:
        newG = G.subgraph(i)
        set_clusters.append(set(i))
#     print(set_clusters)
    conduct = np.mean([nx.conductance(G, cluster_i) for cluster_i in set_clusters])
    modula = nx_comm.modularity(G, set_clusters)
#     print(modula, conduct)
    return modula, conduct

In [17]:
#求G_3 community的modularity和conductance

clusters, connecG = k_way_spectral(G_3, 7)
a, b = result_output(clusters, connecG)
a, b

(0.5375050039655035, 0.1970649700890459)

In [136]:
# 生成随机图 -> 按照指定数量求partition -> 求分隔后整体社区的modularity和conductance -> 累计运行1000次和原有G_3做比较
# The conductance of the whole graph is the minimum conductance over all the possible cuts!!!

start = timer()

modulas, conducs = [], [] #存起来
for i in range(1000):
    G = randomGenerateGraph()
    clusters, connecG = k_way_spectral(G, 7)
    a, b = result_output(clusters, connecG)
    modulas.append(a)
    conducs.append(b)
    
end = timer()
print("------time used = " + str(end - start) + " s")


print(np.mean(modulas), np.std(modulas), np.mean(conducs), np.min(conducs), np.std(conducs)) # 补充std

# 拒绝了0假设，G_3的community是abnormal的

------time used = 1999.4415307470008 s
0.2296116217351217 0.017317788973549648 0.6277363547934968 0.5675226935379116 0.016731176106924452


<h3>按照地理位置（具体到市）重新生成plaintiff和lawyer的链接</h3>

In [18]:
new_selected_1stcp
temp_ns1stcp = new_selected_1stcp[new_selected_1stcp['lawyer'].notna()]
city_lawyer = temp_ns1stcp.groupby(['city'])['lawyer'].apply(set).to_dict()
city_plaintiff = temp_ns1stcp.groupby(['plaintiff'])['city'].apply(set).to_dict()

In [19]:
np.random.choice([1, 2, 3, 4, 5, 6, 7, 8, 9], 8, False).tolist()

[9, 6, 5, 7, 4, 8, 3, 1]

In [20]:
def randGenGraphwithinLocation(): # 重新加地域信息（市）
    pdpl = pd.DataFrame(columns=["plaintiff", 'lawyer'])
    tlawyer = []
    for p in list(city_plaintiff.keys()):
        for c in city_plaintiff[p]:
            if c in city_lawyer.keys():
                tlawyer += list(city_lawyer[c])
        if len(tlawyer) > 0:
            lawyers = np.random.choice(tlawyer, 8, True).tolist() if len(tlawyer) < 8 else np.random.choice(tlawyer, 8, False).tolist()
            for j in lawyers:
                pdpl = pdpl.append({"plaintiff": p, "lawyer": j}, ignore_index=True)
    law_pdpl = pdpl.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
    G = nx.Graph()
    G.add_nodes_from(plaintiffs)
    for i in range(1, len(law_pdpl)+1):
        a = list(law_pdpl[i-1:i]['plaintiff'].values[0])
        if len(a) > 0:
            G.add_edges_from(list(itertools.combinations(a, 2)))
    return G

In [82]:
# 生成随机图 -> 按照指定数量求partition -> 求分隔后整体社区的modularity和conductance -> 累计运行1000次和原有G_3做比较

start = timer()

modulas, conducs = [], [] #存起来
for i in range(1000):
    G = randGenGraphwithinLocation()
    clusters, connecG = k_way_spectral(G, 7)
    a, b = result_output(clusters, connecG)
    modulas.append(a)
    conducs.append(b)

end = timer()
print("------time used = " + str(end - start) + " s")
print(np.mean(modulas), np.std(modulas), np.mean(conducs), np.min(conducs), np.std(conducs)) # 补充std

------time used = 2163.4393698250024 s
0.1674507166579698 0.0163721276292351 0.6899982640587417 0.015281769101389692


<h2>Addition: Analyse the G_3 Community</h2>

In [21]:
partition_3 = community_louvain.best_partition(G_3)
partitions_3 = []
for i in range(50): #看一下前十个community
    partitions_3.append({k for k, v in partition_3.items() if v==i})

In [22]:
len(G_3.edges())

1412

<h4>统计cluster里律师的相关案件次数排序， 选择每个cluster内较高的律师，重新选择由高频律师主导的数据并且画图，比较前后数据集构成的community内的边差多少</h4>

In [23]:
partition_list = []
marginal_list = []
for i in partitions_3:
    if len(i) > 1: partition_list.append(list(i))
    if len(i) == 1: marginal_list.append(list(i)[0])

In [24]:
# partition_list#louvain

In [25]:
clusters#spectral clustering

[['张正荣',
  '秦东',
  '况力彬',
  '晏勇',
  '蒋飞亮',
  '俞光新',
  '强大应',
  '周开礼',
  '谢志桂',
  '叶润军',
  '胡鎏亮',
  '李军',
  '余定勇',
  '覃玉东',
  '胡玉宝',
  '贾龙',
  '朱志鹏',
  '刘庆生',
  '王堂飞',
  '王伟华',
  '李玉婵',
  '李海林',
  '姚金东',
  '李战江',
  '李俊华',
  '阳秋旺',
  '尹前林',
  '张伟',
  '徐忠',
  '胡健',
  '崔毅',
  '邓德波',
  '王飞',
  '谢志胜',
  '许承凯',
  '徐水江',
  '郁德专',
  '张翼',
  '常晓恒',
  '贾伟',
  '陈亚平',
  '张望成',
  '贾涛',
  '张佩',
  '李娟',
  '林海东',
  '何林松',
  '杜文江',
  '胡勇',
  '殷庆',
  '彭海波',
  '李波',
  '程祥',
  '罗伟',
  '余啟红',
  '李季洪',
  '蔡兴杨',
  '彭雪莲',
  '孙桂兰',
  '罗生华',
  '王书培',
  '张园',
  '沈亮',
  '刘旺'],
 ['白世桥',
  '沈凯',
  '申元生',
  '周悟权',
  '张波',
  '李政',
  '丛李松',
  '郑建芳',
  '战伟东',
  '郑细海',
  '刘洋',
  '徐桂锦',
  '刘占奎',
  '杨照',
  '赵磊',
  '吕芝培',
  '梁铭洲',
  '李亮',
  '李双琴',
  '胡祥年',
  '王旭',
  '许智禄',
  '张琦',
  '刘阳',
  '马超',
  '王少勇',
  '李洪岩',
  '孙梦达',
  '徐向新',
  '刘锐哲',
  '吴保利',
  '矫仁辉'],
 ['杨秀欣',
  '王玥明',
  '陶然',
  '魏胜武',
  '孟凡野',
  '魏世录',
  '张宏海',
  '田秋生',
  '于福利',
  '李家亨',
  '李志伟',
  '郭福顺',
  '王志财',
  '简晓华',
  '冯德林',
  '李桐丞',
  '田晓晓',
  '简晓祥',
  '赵铁亮

<h4>Compare two community groups result from louvain and spectral clustering(optional)</h4>

In [26]:
df = pd.DataFrame(columns=["group", "provinces", "success_rate", "object_money", "penalty"])
for i in partition_list:
#     print(i)
    temp_ns1stcp = first_cases[first_cases['plaintiff'].isin(i)]
#     print("province of this group is: ", temp_ns1stcp['province'].unique())
    temp_ns1stcp_cc = len(temp_ns1stcp['case_id'].unique())
    temp_ns1stcp_sc = len(temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['case_id'].unique())
#     print("success_rate of this group is: ", temp_ns1stcp_sc/temp_ns1stcp_cc, "\n")
    df = df.append({"group":i, "provinces":temp_ns1stcp['province'].unique(), 
                    "success_rate": temp_ns1stcp_sc/temp_ns1stcp_cc, 
                    "object_money": temp_ns1stcp['objectmoney'].mean(), 
                    "penalty": temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['penalty'].mean()}, 
                   ignore_index=True)

In [27]:
# add marginal nodes into df as a group
# print(marginal_list)
temp_ns1stcp = first_cases[first_cases['plaintiff'].isin(marginal_list)]
# print("province of this group is: ", temp_ns1stcp['province'].unique())
temp_ns1stcp_cc = len(temp_ns1stcp['case_id'].unique())
temp_ns1stcp_sc = len(temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['case_id'].unique())
# print("success_rate of this group is: ", temp_ns1stcp_sc/temp_ns1stcp_cc, "\n")
df = df.append({"group":marginal_list, 
                "provinces": temp_ns1stcp['province'].unique(), 
                "success_rate": temp_ns1stcp_sc/temp_ns1stcp_cc, 
                "object_money": temp_ns1stcp['objectmoney'].mean(), 
                "penalty": temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['penalty'].mean()}, 
               ignore_index=True)

In [28]:
display(df)

Unnamed: 0,group,provinces,success_rate,object_money,penalty
0,"[丛李松, 薛文英, 许智禄, 孙梦达, 梁铭洲, 白世桥, 张琦, 郑建芳, 胡祥年, 李...","[浙江省, 北京市, 广东省, 上海市, 河南省, 黑龙江省, 天津市, 山东省, 重庆市,...",0.946607,789.412832,6389.558996
1,"[熊佳丽, 阎家明, 杨超, 宇政义, 李富蓉, 于凤星, 俞光新, 王会勇, 程宝全, 郑...","[上海市, 北京市, 安徽省, 江苏省, 浙江省, 山东省, 河北省, 广东省]",0.898944,1946.455129,19554.524683
2,"[胡一定, 葛太玉, 刘宏伟, 张明亮, 杨林茂, 王玥明, 田亚超, 范海, 李成学, 王...","[广东省, 天津市, 辽宁省, 河北省, 四川省, 湖北省, 广西壮族自治区, 河南省, 北...",0.929078,514.801934,4872.354807
3,"[王威, 于福利, 李桐丞, 张宏海, 陶然, 刘树文, 赵铁亮, 魏世录, 李家亨, 杨秀...","[辽宁省, 江苏省, 天津市]",0.924584,48.738495,1058.75834
4,"[谢志胜, 谢志桂, 许承凯, 阳秋旺, 李海林, 李俊华]","[广西壮族自治区, nan]",0.846154,274.150113,3443.229063
5,"[王伟华, 徐忠, 贾龙, 朱志鹏, 郁德专, 晏勇, 李玉婵, 强大应, 罗生华, 崔毅,...","[重庆市, 广西壮族自治区, 安徽省, 湖北省, 江苏省, 四川省, 陕西省, 浙江省, 吉...",0.88113,243.712757,3061.729798
6,"[韩进虎, 刘会]",[广东省],0.921875,2234.871963,12070.8
7,"[孙安民, 陈天高, 谷战, 杜渺, 郭勇, 杨丽, 李魁伟, 代国海, 任满仓, 赵佳斌,...","[陕西省, 广东省, nan, 四川省, 云南省, 江苏省, 湖南省, 湖北省, 宁夏回族自...",0.666935,152.029765,1734.974739


In [29]:
df['group'].apply(lambda r: len(r))

0    36
1    16
2    36
3    14
4     6
5    54
6     2
7    36
Name: group, dtype: int64

In [30]:
df_spc = pd.DataFrame(columns=["group", "provinces", "success_rate", "object_money", "penalty"])
for i in clusters:
#     print(i)
    temp_ns1stcp = first_cases[first_cases['plaintiff'].isin(i)]
#     print("province of this group is: ", temp_ns1stcp['province'].unique())
    temp_ns1stcp_cc = len(temp_ns1stcp['case_id'].unique())
    temp_ns1stcp_sc = len(temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['case_id'].unique())
#     print("success_rate of this group is: ", temp_ns1stcp_sc/temp_ns1stcp_cc, "\n")
    df_spc = df_spc.append({"group":i, "provinces":temp_ns1stcp['province'].unique(), 
                    "success_rate": temp_ns1stcp_sc/temp_ns1stcp_cc, 
                    "object_money": temp_ns1stcp['objectmoney'].mean(), 
                    "penalty": temp_ns1stcp[temp_ns1stcp['is_success']=="TRUE"]['penalty'].mean()}, 
                   ignore_index=True)

In [31]:
df_spc

Unnamed: 0,group,provinces,success_rate,object_money,penalty
0,"[张正荣, 秦东, 况力彬, 晏勇, 蒋飞亮, 俞光新, 强大应, 周开礼, 谢志桂, 叶润...","[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143
1,"[白世桥, 沈凯, 申元生, 周悟权, 张波, 李政, 丛李松, 郑建芳, 战伟东, 郑细海...","[浙江省, 北京市, 广东省, 上海市, 河南省, 黑龙江省, 天津市, 山东省, 重庆市,...",0.9518,878.979564,7005.227557
2,"[杨秀欣, 王玥明, 陶然, 魏胜武, 孟凡野, 魏世录, 张宏海, 田秋生, 于福利, 李...","[辽宁省, 天津市, 江苏省, 河北省, 北京市, 山东省, 上海市]",0.933721,179.037101,2450.829368
3,"[阎家明, 孙丁丁, 郑志军, 杨超, 申亚坤, 熊佳丽, 于凤星, 王会勇, 宇政义, 张...","[上海市, 北京市, 江苏省, 浙江省, 山东省, 河北省, 广东省]",0.899267,2154.884084,21751.144203
4,"[陈明江, 王福岭, 向云福, 杨林茂]",[广东省],0.972678,3180.404225,22248.809127
5,"[江金龙, 葛太玉, 张宝辉, 吕学鹏, 李剑林, 刘宏伟, 邹士伟, 胡一定, 王福群, ...","[广东省, 四川省, 湖北省, 广西壮族自治区, 北京市, 湖南省, 浙江省]",0.904895,729.098021,6262.52261
6,"[郭栋, 王新建, 郭夏天]","[广东省, 河南省]",0.827068,15.277059,312.676056


In [32]:
df_spc['group'].apply(lambda r: len(r))

0    64
1    32
2    32
3    13
4     4
5    14
6     3
Name: group, dtype: int64

<h4>Concat two types of community groups into one dataframe for further comparison(optional)</h4>

In [33]:
comgroups = pd.concat([df.add_prefix('lou_'), df_spc.add_prefix('spc_')], axis=1)

In [34]:
# set(comgroups.loc[4, 'lou_group']).difference(set(comgroups.loc[1, 'spc_group']))
# After compare two groups, the first 3 groups in both louvain and spectral clustering 
# are similar, while the other 3 groups are different

<h4>将律师信息添加进df和df_spc中</h4>

In [35]:
lawdf = pd.DataFrame(columns=["group", "lawyer", "case_count"])
for i in range(7):
    tempGroup = new_selected_1stcp[new_selected_1stcp['plaintiff'].isin(df['group'][i])]
    tempLawyers = list(tempGroup['lawyer'].drop_duplicates())
#     temp = new_selected_1stcp[new_selected_1stcp['lawyer'].isin(tempLawyers)]
    temp = tempGroup.groupby("lawyer")['case_id'].\
    nunique().reset_index().sort_values(by="case_id", ascending=False) # calculate case count for laywers within each community
    temp.rename(columns={"case_id": "case_count"}, inplace=True)
    temp = temp.assign(group=i)
    lawdf = lawdf.append(temp, ignore_index=True)
    
lawdf_spc = pd.DataFrame(columns=["group", "lawyer", "case_count"])
for i in range(7):
    tempGroup = new_selected_1stcp[new_selected_1stcp['plaintiff'].isin(df_spc['group'][i])]
    tempLawyers = list(tempGroup['lawyer'].drop_duplicates())
#     temp = new_selected_1stcp[new_selected_1stcp['lawyer'].isin(tempLawyers)]
    temp = tempGroup.groupby("lawyer")['case_id'].\
    nunique().reset_index().sort_values(by="case_id", ascending=False) # calculate case count for laywers within each community
    temp.rename(columns={"case_id": "case_count"}, inplace=True)
    temp = temp.assign(group=i)
    lawdf_spc = lawdf_spc.append(temp, ignore_index=True)

In [36]:
lawdf_spc[lawdf_spc['group']==0].head()

Unnamed: 0,group,lawyer,case_count
0,0,吴波,444
1,0,赵乾伟,342
2,0,吴金梅,289
3,0,彭丹,221
4,0,段理,126


<h4>找一下G_3中community的子图(based on spectral clustering)</h4>

In [37]:
subgraphs = []
for i in range(7):
    subgraphs.append(G_3.subgraph(list(df_spc['group'][i])))

<h4>找一下每一个group里面的高频律师</h4>

In [38]:
highFreqLaws = lawdf.groupby("group").head(50).reset_index(drop=True) #只取第一个lawyer的话图的连接太少
highFreqLawsSpc = lawdf_spc.groupby("group").head(50).reset_index(drop=True)
highFreqLawsSpc5 = lawdf_spc.groupby("group").head(5).reset_index(drop=True)

In [39]:
pd.concat([highFreqLaws.add_prefix("lou_"), highFreqLawsSpc.add_prefix("spc_")], axis=1) # lawyers in the first 4 groups are same

Unnamed: 0,lou_group,lou_lawyer,lou_case_count,spc_group,spc_lawyer,spc_case_count
0,0,肖丽君,239,0,吴波,444
1,0,牛琨,181,0,赵乾伟,342
2,0,徐洋,27,0,吴金梅,289
3,0,万迎军,24,0,彭丹,221
4,0,吴迪,14,0,段理,126
...,...,...,...,...,...,...
306,,,,6,吴正海,1
307,,,,6,宁振纳,1
308,,,,6,张振宇,1
309,,,,6,朱敏婷,1


In [40]:
# highFreqLawsSpc[['group', 'lawyer']][6*50:6*50+50]
# highFreqLawsSpc5.head(10)

<h4>重新build一下以高频律师为基础的network</h4>

In [41]:
nnew_selected_1stcp = new_selected_1stcp[new_selected_1stcp['lawyer'].isin(list(highFreqLawsSpc5['lawyer']))]

In [42]:
newG = nx.Graph()
newG.add_nodes_from(plaintiffs)
law_nselected_1stcp = nnew_selected_1stcp.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
for i in range(1, len(law_nselected_1stcp)+1):
    a = list(law_nselected_1stcp[i-1:i]['plaintiff'].values[0])
    if len(a) > 1:
        newG.add_edges_from(list(itertools.combinations(a, 2)))

In [43]:
new_subgraphs = []
for i in range(len(clusters)):
    new_subgraphs.append(newG.subgraph(clusters[i]))

<h4>比较前后两个community内部边的差</h4>

In [44]:
print(sum([len(i.edges()) for i in subgraphs]), sum([len(i.edges()) for i in new_subgraphs])) # sum of edges within clusters
print(len(G_3.edges()), len(newG.edges())) # all edges within graph

1292 1056
1412 1094


<h4>去掉某一个group的高频律师重新build一下network</h4>

In [45]:
#去掉group0的高频律师
print(list(highFreqLawsSpc5['lawyer'][:5]))
nnew_selected_1stcp = new_selected_1stcp[~(new_selected_1stcp['lawyer'].isin(list(highFreqLawsSpc5['lawyer'][:5])))]

newG = nx.Graph()
newG.add_nodes_from(plaintiffs)
law_nselected_1stcp = nnew_selected_1stcp.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
for i in range(1, len(law_nselected_1stcp)+1):
    a = list(law_nselected_1stcp[i-1:i]['plaintiff'].values[0])
    if len(a) > 1:
        newG.add_edges_from(list(itertools.combinations(a, 2)))
group0_sub = newG.subgraph(df_spc['group'][0])
print(len(group0_sub.edges()), len(G_3.subgraph(df_spc['group'][0]).edges()))

['吴波', '赵乾伟', '吴金梅', '彭丹', '段理']
278 741


In [46]:
#group1
highFreqLawsSpc5['lawyer'][5:10]
nnew_selected_1stcp = new_selected_1stcp[~(new_selected_1stcp['lawyer'].isin(list(highFreqLawsSpc5['lawyer'][5:10])))]
newG = nx.Graph()
newG.add_nodes_from(plaintiffs)
law_nselected_1stcp = nnew_selected_1stcp.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
for i in range(1, len(law_nselected_1stcp)+1):
    a = list(law_nselected_1stcp[i-1:i]['plaintiff'].values[0])
    if len(a) > 1:
        newG.add_edges_from(list(itertools.combinations(a, 2)))
group1_sub = newG.subgraph(df_spc['group'][1])
print(len(group1_sub.edges()), len(G_3.subgraph(df_spc['group'][1]).edges()))

59 312


In [47]:
# verify whether the first 5 lawyers with high cases are dominant in their groups
resultDf = pd.DataFrame(columns=[
    "dominant_lawyers", 
    "new_com_edge", 
    "ori_com_edge", 
    "ori_group_member", 
    "new_group_member", 
    "dominant_proportion", 
    "ori_conductance", 
    "new_conductance"])

new_clusters = []

nnewG = nx.Graph()
nnewG.add_nodes_from(plaintiffs)

for i in range(7):
#     highFreqLaws['lawyer'][i*5:i*5+5]
    new_cluster = []
    newG = nx.Graph()

    # select group data
    nnew_selected_1stcp = new_selected_1stcp[new_selected_1stcp['plaintiff'].isin(df_spc['group'][i])]
    
    # remove dominant lawyer from that data
    nnew_selected_1stcp = nnew_selected_1stcp[~(nnew_selected_1stcp['lawyer'].isin(list(highFreqLawsSpc['lawyer'][i*50:i*50+5])))]
    
    law_nselected_1stcp = nnew_selected_1stcp.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()
    for j in range(1, len(law_nselected_1stcp)+1):
        a = list(law_nselected_1stcp[j-1:j]['plaintiff'].values[0])
        if len(a) > 1:
            new_cluster += a
            newG.add_edges_from(list(itertools.combinations(a, 2)))
            nnewG.add_edges_from(list(itertools.combinations(a, 2)))
            
    new_clusters.append(list(set(new_cluster)))
    group_sub = G_3.subgraph(new_cluster)
    
    resultDf = resultDf.append({
        "dominant_lawyers": list(highFreqLawsSpc['lawyer'][i*50:i*50+5]), 
        "new_com_edge": len(newG.edges()), 
        "ori_com_edge": len(G_3.subgraph(df_spc['group'][i]).edges()), 
        "ori_group_member": len(df_spc['group'][i]), 
        "new_group_member": len(new_clusters[i]), 
        "dominant_proportion": 1 - (len(newG.edges()) / len(G_3.subgraph(df_spc['group'][i]).edges())), 
    }, ignore_index=True)
    
resultDf['ori_conductance'] = [nx.conductance(G_3, cluster_i) for cluster_i in df_spc['group']]
resultDf['new_conductance'] = [nx.conductance(G_3, cluster_i) if len(cluster_i) > 0 else np.nan for cluster_i in new_clusters]
resultDf
# group0, 2, and 6 are significant

Unnamed: 0,dominant_lawyers,new_com_edge,ori_com_edge,ori_group_member,new_group_member,dominant_proportion,ori_conductance,new_conductance
0,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理]",278,741,64,57,0.624831,0.04031,0.132705
1,"[肖丽君, 牛琨, 万迎军, 张朝阳, 吴迪]",59,312,32,27,0.810897,0.122363,0.271829
2,"[王志扬, 顾雪微, 陈卉, 孟丹妮, 齐芳梅]",140,158,32,31,0.113924,0.122222,0.164265
3,"[王虹, 杨超, 张燕, 滕卫兴, 王瀛]",18,22,13,12,0.181818,0.254237,0.263158
4,"[利继梅, 崔峰, 谢泽烽, 黄道义, 杨玉盟]",2,3,4,3,0.333333,0.454545,0.555556
5,"[莫观培, 梁梭, 杨玉盟, 孙依丰, 钟永标]",33,53,14,14,0.377358,0.242857,0.242857
6,"[曾娜, 蔡泽宇, 袁媛, 叶丽明, 吕远霞]",0,3,3,0,1.0,0.142857,


In [48]:
# 设置dominant阈值，找到不同group的dominant lawyers，find intersection of them去观察有没有律师流窜（活跃律师）
# dominant是0.8

dom_lawyers = []
new_clusters = []
dom_levels = []

for i in range(7):
    nnew_selected_1stcp = new_selected_1stcp[new_selected_1stcp['plaintiff'].isin(df_spc['group'][i])]
    dom_lawyer = []
    
    for l in highFreqLawsSpc['lawyer'][i*50:i*50+50]:
        new_cluster = []
        newG = nx.Graph()
        dom_lawyer.append(l)

        # remove dominant lawyer from that data
        nnnew_selected_1stcp = nnew_selected_1stcp[~nnew_selected_1stcp['lawyer'].isin(list(dom_lawyer))]
        law_nselected_1stcp = nnnew_selected_1stcp.groupby('lawyer')['plaintiff'].unique().apply(tuple).reset_index()

        for j in range(1, len(law_nselected_1stcp)+1):
            a = list(law_nselected_1stcp[j-1:j]['plaintiff'].values[0])
            if len(a) > 1:
                new_cluster += a
                newG.add_edges_from(list(itertools.combinations(a, 2)))
        
        group_sub = G_3.subgraph(new_cluster)
        dom_level = 1 - (len(group_sub.edges()) / len(G_3.subgraph(df_spc['group'][i]).edges()))
        
        if dom_level >= 0.8: break
            
    dom_lawyers.append(dom_lawyer)
    new_clusters.append(new_cluster)
    dom_levels.append(dom_level)

redf = pd.DataFrame({"dom_lawyer": dom_lawyers, "new_cluster": new_clusters, "dom_level": dom_levels})
redf

Unnamed: 0,dom_lawyer,new_cluster,dom_level
0,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,...","[秦东, 晏勇, 姚金东, 李战江, 李俊华, 谢志桂, 张佩, 孙桂兰, 罗伟, 余定勇,...",0.676113
1,"[肖丽君, 牛琨, 万迎军, 张朝阳, 吴迪, 胡帅芳, 宁亚楠, 刘新焱, 陈婷, 许玲,...","[徐桂锦, 杨照, 李政, 赵磊, 胡祥年, 申元生, 白世桥, 申元生, 张琦, 孙梦达,...",0.810897
2,"[王志扬, 顾雪微, 陈卉, 孟丹妮, 齐芳梅, 孙悦, 徐洋, 周洁, 张赛, 丛健, 吴...","[杨秀欣, 于福利, 王瑜, 李志伟, 田秋生, 赵铁亮, 孟凡野, 魏世录]",0.848101
3,"[王虹, 杨超, 张燕, 滕卫兴, 王瀛, 王雅静, 殷国丰, 金越, 孔珊珊, 楼溪, 肖...","[孙丁丁, 阎家明, 郑志军, 杨超]",0.818182
4,"[利继梅, 崔峰, 谢泽烽, 黄道义, 杨玉盟, 张弛, 谢智舜, 王际贵, 陈娟, 吴长阳...",[],1.0
5,"[莫观培, 梁梭, 杨玉盟, 孙依丰, 钟永标, 朱小斌, 武奎元, 胡宁可, 黄秀雅, 林...","[吕学鹏, 张宝辉, 吕学鹏, 李剑林, 江金龙, 葛太玉, 李剑林, 葛太玉]",0.849057
6,"[曾娜, 蔡泽宇]",[],1.0


In [49]:
common_lawyer = set.intersection(*[set(redf['dom_lawyer'][i]) for i in range(6)]) #无重复

In [50]:
common_lawyer

set()

In [51]:
domLawyers = []
for i in range(6):
    domLawyers += set(redf['dom_lawyer'][i])
len(domLawyers) #总共有189个主导律师

189

<h3>Further analysis</h3><br>
<p> 
1. 通过找每个cluster的plaintiff 的centrality, betweeness, pagerank<br>
2. 每个cluster里主导plaintiff的案件数<br>
3. 看一下上两个指标有没有正相关<br>
4. 最活跃的原告在哪些group里面，是否是well-connected的group<br>
5. 看一下这些dominant律师的律所有没有什么相似<br>
</p>

In [72]:
# 1. 2. 3.
dfSpc = pd.concat([df_spc, redf['dom_lawyer']], axis=1).reset_index().rename(
    columns={"index": "group_number", "group": "member"})
dfspcLawyer = dfSpc.explode('dom_lawyer').reset_index(drop=True)
dfSpc = dfSpc.explode('member').reset_index(drop=True)

In [73]:
# group 0
dfSpc[dfSpc['group_number']==0].head()

Unnamed: 0,group_number,member,provinces,success_rate,object_money,penalty,dom_lawyer
0,0,张正荣,"[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,..."
1,0,秦东,"[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,..."
2,0,况力彬,"[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,..."
3,0,晏勇,"[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,..."
4,0,蒋飞亮,"[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,"[吴波, 赵乾伟, 吴金梅, 彭丹, 段理, 杨林, 李元忠, 贺彬霖, 牟琴瑶, 胡伦扬,..."


In [54]:
from networkx.algorithms.centrality import betweenness_centrality, degree_centrality

bnPd = pd.DataFrame(columns=["group_number", "member", "betweenness_centrality", "degree_centrality", "pageRank"])
for i in range(6):
    subGraphi = G_3.subgraph(df_spc['group'][i])
    betCen = betweenness_centrality(subGraphi)
    degCen = degree_centrality(subGraphi)
    nxPage = nx.pagerank(subGraphi)
    data = {
        "group_number": i, 
        "member": list(betCen.keys()), 
        "betweenness_centrality": list(betCen.values()), 
        "degree_centrality": list(degCen.values()), 
        "pageRank": list(nxPage.values())
    }
    bnPd = bnPd.append(pd.DataFrame.from_dict(data), ignore_index=True)
bnPd.head()

Unnamed: 0,group_number,member,betweenness_centrality,degree_centrality,pageRank
0,0,王伟华,0.005799,0.587302,0.021766
1,0,徐忠,0.0,0.015873,0.002898
2,0,贾龙,0.050679,0.746032,0.028782
3,0,朱志鹏,0.006976,0.333333,0.014077
4,0,俞光新,0.0,0.047619,0.006199


In [55]:
dfSpc = dfSpc.merge(bnPd, on=['group_number', 'member'], how="left")

In [56]:
temp = new_selected_1stcp.groupby("plaintiff")['case_id'].nunique().reset_index().rename(columns={"case_id": "case_count"})
dfSpc = dfSpc.merge(temp, left_on="member", right_on="plaintiff")

In [57]:
dfSpc.sort_values(["group_number", "case_count"]).groupby("group_number")\
[["group_number", "member", "case_count", "betweenness_centrality", "degree_centrality", "pageRank"]].head(3)

Unnamed: 0,group_number,member,case_count,betweenness_centrality,degree_centrality,pageRank
12,0,余定勇,29,0.000427,0.174603,0.009323
46,0,何林松,29,0.0,0.460317,0.017411
5,0,俞光新,30,0.0,0.047619,0.006199
88,1,马超,30,0.0,0.774194,0.035736
92,1,徐向新,31,0.0,0.096774,0.012954
94,1,吴保利,31,0.0,0.774194,0.035736
127,2,刘俊梅,29,0.0,0.096774,0.0167
100,2,孟凡野,31,0.269493,0.483871,0.052242
104,2,于福利,31,0.0,0.419355,0.034128
136,3,宇政义,30,0.0,0.166667,0.047324


<h4>regression: relationship between case count and degree_centrality/betweenness_centrality/pageRank</h4>

In [58]:
# degree_centrality significant!!! 
# 节点度的中心度越大（合作过的律师越多，有关联的其他原告越多，案件数量越多）
from statsmodels.formula.api import ols
fit = ols('case_count ~ C(group_number) + degree_centrality', data=dfSpc).fit() 
fit.summary()

0,1,2,3
Dep. Variable:,case_count,R-squared:,0.222
Model:,OLS,Adj. R-squared:,0.192
Method:,Least Squares,F-statistic:,7.239
Date:,"Fri, 05 Nov 2021",Prob (F-statistic):,8.21e-07
Time:,11:55:38,Log-Likelihood:,-907.56
No. Observations:,159,AIC:,1829.0
Df Residuals:,152,BIC:,1851.0
Df Model:,6,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,75.9894,13.219,5.748,0.000,49.872,102.107
C(group_number)[T.1],-92.1964,17.465,-5.279,0.000,-126.701,-57.692
C(group_number)[T.2],-18.6816,16.189,-1.154,0.250,-50.666,13.303
C(group_number)[T.3],-64.2364,22.784,-2.819,0.005,-109.251,-19.222
C(group_number)[T.4],-83.8591,38.571,-2.174,0.031,-160.063,-7.655
C(group_number)[T.5],-87.3761,22.669,-3.854,0.000,-132.162,-42.590
C(group_number)[T.6],0,0,,,0,0
degree_centrality,107.2393,25.509,4.204,0.000,56.842,157.637

0,1,2,3
Omnibus:,74.771,Durbin-Watson:,1.709
Prob(Omnibus):,0.0,Jarque-Bera (JB):,249.937
Skew:,1.878,Prob(JB):,5.33e-55
Kurtosis:,7.86,Cond. No.,1.68e+19


In [59]:
# betweenness_centrality
fit = ols('case_count ~ C(group_number) + betweenness_centrality', data=dfSpc).fit() 
fit.summary()

  return np.sqrt(eigvals[0]/eigvals[-1])


0,1,2,3
Dep. Variable:,case_count,R-squared:,0.137
Model:,OLS,Adj. R-squared:,0.103
Method:,Least Squares,F-statistic:,4.014
Date:,"Fri, 05 Nov 2021",Prob (F-statistic):,0.000925
Time:,11:55:56,Log-Likelihood:,-915.85
No. Observations:,159,AIC:,1846.0
Df Residuals:,152,BIC:,1867.0
Df Model:,6,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,114.6801,9.848,11.645,0.000,95.223,134.138
C(group_number)[T.1],-64.2346,17.005,-3.777,0.000,-97.831,-30.638
C(group_number)[T.2],-25.0844,17.049,-1.471,0.143,-58.768,8.599
C(group_number)[T.3],-77.9552,24.386,-3.197,0.002,-126.134,-29.777
C(group_number)[T.4],-83.8182,43.230,-1.939,0.054,-169.228,1.591
C(group_number)[T.5],-65.9537,23.238,-2.838,0.005,-111.865,-20.042
C(group_number)[T.6],0,0,,,0,0
betweenness_centrality,59.5524,63.814,0.933,0.352,-66.525,185.630

0,1,2,3
Omnibus:,89.528,Durbin-Watson:,1.63
Prob(Omnibus):,0.0,Jarque-Bera (JB):,383.141
Skew:,2.189,Prob(JB):,6.34e-84
Kurtosis:,9.218,Cond. No.,inf


In [60]:
# pageRank
fit = ols('case_count ~ C(group_number) + pageRank', data=dfSpc).fit() 
fit.summary()

  return np.sqrt(eigvals[0]/eigvals[-1])


0,1,2,3
Dep. Variable:,case_count,R-squared:,0.14
Model:,OLS,Adj. R-squared:,0.106
Method:,Least Squares,F-statistic:,4.123
Date:,"Fri, 05 Nov 2021",Prob (F-statistic):,0.000728
Time:,11:55:59,Log-Likelihood:,-915.55
No. Observations:,159,AIC:,1845.0
Df Residuals:,152,BIC:,1867.0
Df Model:,6,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,111.1553,10.421,10.667,0.000,90.567,131.743
C(group_number)[T.1],-68.4072,17.339,-3.945,0.000,-102.664,-34.150
C(group_number)[T.2],-28.1884,17.339,-1.626,0.106,-62.445,6.069
C(group_number)[T.3],-90.0829,27.607,-3.263,0.001,-144.625,-35.541
C(group_number)[T.4],-133.4200,66.775,-1.998,0.047,-265.347,-1.493
C(group_number)[T.5],-79.5167,26.367,-3.016,0.003,-131.611,-27.423
C(group_number)[T.6],0,0,,,0,0
pageRank,272.0586,226.832,1.199,0.232,-176.092,720.210

0,1,2,3
Omnibus:,89.105,Durbin-Watson:,1.644
Prob(Omnibus):,0.0,Jarque-Bera (JB):,378.931
Skew:,2.18,Prob(JB):,5.2e-83
Kurtosis:,9.18,Cond. No.,inf


In [75]:
# 5 TODO
temp = dfspcLawyer.merge(all_cases[['case_id', 'lawyer']], left_on="dom_lawyer", right_on="lawyer", how="left")

In [110]:
temp[temp['dom_lawyer']=="杨潇潇"].head(1)

Unnamed: 0,group_number,member,provinces,success_rate,object_money,penalty,dom_lawyer,case_id,lawyer
3984,0,"[张正荣, 秦东, 况力彬, 晏勇, 蒋飞亮, 俞光新, 强大应, 周开礼, 谢志桂, 叶润...","[上海市, 重庆市, 安徽省, 广西壮族自治区, 湖北省, 江苏省, 浙江省, 四川省, 陕...",0.880352,253.431253,3121.548143,杨潇潇,5ec9356c3cdef0087edcb0d1,杨潇潇


In [111]:
lawyer_maps = {}
lawyer_maps["吴波"] = ["北京大成(重庆)律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所", "重庆森木律师事务所"]
lawyer_maps["吴金梅"] = ["四川银证律师事务所"]
lawyer_maps["彭丹"] = ["重庆翰墨律师事务所"]
lawyer_maps["段理"] = ["重庆中钦律师事务所"]
lawyer_maps["杨林"] = ["江苏金华星律师事务所"]
lawyer_maps["李元忠"] = ["福建名闻律师事务所"]
lawyer_maps["贺彬霖"] = ["重庆翰墨律师事务所"]
lawyer_maps["牟琴瑶"] = ["重庆中钦律师事务所"]
lawyer_maps["胡伦扬"] = ["北京市盈科(福州)律师事务所"]
lawyer_maps["王镜杰"] = ["重庆商策律师事务所"]
lawyer_maps["朱丽丹"] = ["重庆中钦律师事务所"]
lawyer_maps["江前娅"] = ["重庆森木律师事务所"]
lawyer_maps["郑腾飞"] = ["重庆丽达律师事务所"]
lawyer_maps["温衍"] = ["重庆善弘律师事务所"]
lawyer_maps["李春晖"] = ["广西桂兴律师事务所"]
lawyer_maps["韦松明"] = ["广西横原律师事务所"]
lawyer_maps["王玲霞"] = ["北京市盈科(福州)律师事务所"]
lawyer_maps["江前亚"] = ["重庆森木律师事务所"]
lawyer_maps["刘歆琪"] = ["重庆翰墨律师事务所"]
lawyer_maps["梁安武"] = ["重庆渝礼律师事务所"]

In [112]:
lawyer_maps["冉波"] = ["重庆商策律师事务所"]
lawyer_maps["王晖"] = ["北京中伦(武汉)律师事务所"]
lawyer_maps["罗琳"] = ["北京中伦(武汉)律师事务所"]
lawyer_maps["张兴杨"] = ["重庆圣石律师事务所", "重庆圣石牛律师事务所"]
lawyer_maps["李海霞"] = ["广西刘晰律师事务所"]
lawyer_maps["朱清"] = ["重庆鼎圣律师事务所"]
lawyer_maps["王星"] = ["重庆圣石牛律师事务所"]
lawyer_maps["肖翠婷"] = ["重庆善弘律师事务所"]
lawyer_maps["屈刚"] = ["四川丰宜律师事务所"]
lawyer_maps["杨潇潇"] = ["广西刘晰律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]
lawyer_maps["赵乾伟"] = ["重庆长弘律师事务所"]

In [113]:
# TODO 展示每一个原告cluster和其律师的二部图
# 原告cluster告的被告有没有什么特点（类别），违法类型，标的、赢率等等共性