# Analyzing Non-Compliant Contracts

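Basic approach: read the weekly statistics, build a full dataset and save it to a local Excel file, then run the analysis on that full dataset. The statistics logic is also rewritten to follow the latest KPI assessment rules.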


In [1]:
from functools import partial
from pathlib import Path
import pandas as pd
# pd.options.display.max_rows = 5


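Map the sheets in the weekly statistics Excel file to internally defined names, so that the rest of the code is decoupled from the workbook's sheet names.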


In [7]:
# config
sheet_names = ['新签续签合同',
               '应终止合同',
               '应结算合同',
               '不规范合同',
               '产园项目清单',
               '所有组织机构',
               '分公司+项目部+项目'
               ]
name_for_df = ['new',
               'termination',
               'settlement',
               'irregular',
               'projects',
               'org',
               'org_projects'
               ]

df_sheets = pd.DataFrame({
    'sheet_name': sheet_names,
    'df_name': name_for_df
})

df_sheets


Unnamed: 0,sheet_name,df_name
0,新签续签合同,new
1,应终止合同,termination
2,应结算合同,settlement
3,不规范合同,irregular
4,产园项目清单,projects
5,所有组织机构,org
6,分公司+项目部+项目,org_projects


Get a handle to the latest weekly statistics Excel file.


In [8]:
# get or create the 'data' directory
data_dir_name = 'data'
data_dir = Path.cwd() / data_dir_name
data_dir.mkdir(exist_ok=True)
# collect all irregular-contract xlsx files into a sorted list
irregular_contracts_dir = data_dir / 'irregular_contracts'
irregular_contracts_dir.mkdir(exist_ok=True)
files = sorted(irregular_contracts_dir.glob('20*.xlsx'))
filename = files[-1]
filename


PosixPath('/Users/levin/workspace/git-repositories/anaconda/study-pandas-tutorials/Work/data/irregular_contracts/20220408.xlsx')
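The workbooks are named by date in `YYYYMMDD` form, so sorting the names lexicographically also sorts them chronologically. That is why `files[-1]` is the newest file. A minimal sketch of that idea, using made-up file names in a temporary directory (only the naming pattern comes from the notebook):

```python
from pathlib import Path
import tempfile

# Hypothetical file names for illustration; only the YYYYMMDD pattern is from the notebook.
names = ['20220401.xlsx', '20211231.xlsx', '20220408.xlsx', 'whitelist.xlsx']

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in names:
        (folder / name).touch()
    # glob() gives no ordering guarantee, so sort before indexing.
    files = sorted(folder.glob('20*.xlsx'))
    ordered = [f.name for f in files]

latest = ordered[-1]     # newest period
previous = ordered[-2]   # the period before it
print(latest, previous)
```

The `20*` pattern also keeps helper files such as `whitelist.xlsx` out of the list. Indexing with `[-1]` or `[-2]` raises `IndexError` when the folder holds too few workbooks, so a length check may be worth adding.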

Map the sheets of the latest weekly statistics workbook (the current period) to DataFrames.


In [9]:
def map_sheet_name(sheet_name_, df_, lookup, target):
    """Map an Excel sheet name to its internal name.

    Parameters
    ----------
    sheet_name_ : str
        Sheet name in the Excel file.
    df_ : DataFrame
        Mapping table.
    lookup : str
        Column in ``df_`` that holds the Excel sheet names.
    target : str
        Column in ``df_`` that holds the names to map to.

    Returns
    -------
    str
        The mapped name, or ``sheet_name_`` unchanged if there is no match.
    """
    values = df_.loc[df_[lookup] == sheet_name_][target].values
    if len(values) > 0:
        return values[0]
    return sheet_name_


map_name = partial(map_sheet_name, df_=df_sheets,
                   lookup='sheet_name', target='df_name')
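For comparison, the same lookup can be sketched with a plain dictionary. This is an alternative, not the notebook's code: `sheet_to_df` and `map_name_dict` are hypothetical names, and the result matches `map_name` only as long as each sheet name appears once in the mapping.

```python
# A subset of the mapping from the config cell above.
sheet_names = ['新签续签合同', '应终止合同', '应结算合同', '不规范合同']
name_for_df = ['new', 'termination', 'settlement', 'irregular']

# Build the lookup once.
sheet_to_df = dict(zip(sheet_names, name_for_df))


def map_name_dict(sheet_name):
    """Hypothetical helper: return the internal name, or the sheet name itself if unmapped."""
    return sheet_to_df.get(sheet_name, sheet_name)


print(map_name_dict('不规范合同'))
print(map_name_dict('some other sheet'))
```

The fallback to the original sheet name mirrors the `return sheet_name_` branch in `map_sheet_name`, so sheets that are not in the config still end up in `dfs` under their own name.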


In [10]:
dfs = {}
xls = pd.ExcelFile(filename)
for sheet_name in xls.sheet_names:
    dfs[map_name(sheet_name)] = pd.read_excel(xls, sheet_name)


Build the three-level organization hierarchy: branch, project department, project.


In [11]:
def industry_org(df_):
    industry_id = 1001
    df = df_.rename(columns={
        '机构id': 'id',
        '机构名称': 'org_name',
        '上级id': 'pid',
        '上级机构名称': 'p_org_name'
    })
    # branch
    df_branch = df.loc[df['pid'] == industry_id]
    # project department
    df_dept = pd.merge(
        df, df_branch[['id']], left_on='pid', right_on='id', suffixes=('', '_y'))
    df_industry = pd.merge(df_branch, df_dept,
                           left_on='id', right_on='pid', suffixes=('_branch', '_dept'))
    df_industry = df_industry[['id_branch',
                              'org_name_branch',
                               'id_dept',
                               'org_name_dept']] \
        .rename(columns={
            'org_name_branch': 'branch_name',
            'org_name_dept': 'dept_name'
        })

    return df_industry


In [12]:
df_org = industry_org(dfs['org'])
df_org.head()


Unnamed: 0,id_branch,branch_name,id_dept,dept_name
0,1005,园区运营中心,1436204,北京产业创新中心
1,1005,园区运营中心,1436205,价值工厂
2,1005,园区运营中心,1436206,南海意库-商业
3,1005,园区运营中心,1436207,蛇口网谷-商业
4,1005,园区运营中心,1437198,创业壹号A座招商创库
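The pattern inside `industry_org` is a self-join on a parent/child table: select the rows under the root, then join the table to them on `pid`. A minimal sketch on a toy table, where all ids and names are invented for illustration:

```python
import pandas as pd

# Toy organization table; ids and names are made up.
df = pd.DataFrame({
    'id': [1, 10, 11, 100, 101, 110],
    'org_name': ['HQ', 'Branch A', 'Branch B', 'Dept A1', 'Dept A2', 'Dept B1'],
    'pid': [0, 1, 1, 10, 10, 11],
})
root_id = 1

# Level 1: rows whose parent is the root.
branches = df.loc[df['pid'] == root_id, ['id', 'org_name']]

# Level 2: rows whose parent is one of the branches.
hierarchy = branches.merge(
    df[['id', 'org_name', 'pid']],
    left_on='id', right_on='pid',
    suffixes=('_branch', '_dept'),
)[['id_branch', 'org_name_branch', 'id_dept', 'org_name_dept']]

print(hierarchy)
```

Because the merge is an inner join, a branch without any department drops out of the result. That may or may not be the desired behavior for the real data.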


In [13]:
def project_org(df_left, df_right):
    df_projects = df_right.rename(columns={
        'ORGAN_ID': 'org_id',
        '项目名称': 'project_name',
        '上级机构id': 'pid',
        '上级机构名称': 'p_org_name'
    })
    df_all = pd.merge(df_left, df_projects, left_on='id_dept', right_on='pid')
    df_all = df_all[['id_branch', 'branch_name',
                    'id_dept', 'dept_name', 'org_id']]
    df_all = df_all.rename(columns={'id_branch': 'branch_id',
                                    'id_dept': 'dept_id',
                                    'org_id': 'project_id'
                                    })
    return df_all


In [14]:
df_org_projects = project_org(df_org, dfs['projects'])
df_org_projects


Unnamed: 0,branch_id,branch_name,dept_id,dept_name,project_id
0,1005,园区运营中心,1436204,北京产业创新中心,1435203
1,1005,园区运营中心,1436205,价值工厂,1413262
2,1005,园区运营中心,1436205,价值工厂,1413263
3,1005,园区运营中心,1436206,南海意库-商业,1433221
4,1005,园区运营中心,1436207,蛇口网谷-商业,1412260
...,...,...,...,...,...
91,1435224,产园-武汉公司,1434222,东湖网谷,1427224
92,1435224,产园-武汉公司,1434222,东湖网谷,1436221
93,1435224,产园-武汉公司,1434223,高新网谷,1427236
94,1435225,产园-青岛公司,1435226,蓝湾网谷,1421248


In [12]:
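# NOTE: alternative mapping read directly from the 'org_projects' sheet.
# It has no branch_id / dept_id columns, unlike the frame built by project_org(),
# and running this cell overwrites df_org_projects.
df_org_projects = dfs['org_projects'] \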
    .rename(columns={
        'ORGAN_ID': 'project_id',
        '项目公司': 'branch_name',
        '项目部': 'dept_name',
        '项目名称': 'project_name'
    })
df_org_projects


Unnamed: 0,project_id,branch_name,dept_name,project_name
0,1421228,产园-杭州公司,信雅达创库,A1招商创库
1,1421227,产园-杭州公司,信雅达创库,A2信雅达
2,1430201,产园-深圳公司,蛇口网谷,万海大厦
3,1433225,园区运营中心,蛇口网谷-商业,万海大厦-商业
4,1431205,产园-深圳公司,蛇口网谷,万维大厦
...,...,...,...,...
96,1413258,文化产业公司,文化公司其他租赁,青少年活动中心（本部）
97,1413261,文化产业公司,文化公司其他租赁,风华剧院A座
98,1413251,园区运营中心,园区运营中心其他,风华剧院B座
99,1427236,产园-武汉公司,高新网谷,高新网谷


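Combine the irregular contract categories with the branches to build a branch-by-category table. This table is used in the later statistics and handles the case where a branch has no data for a given category.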


In [15]:
df_branch = df_org['branch_name'].drop_duplicates()
df_irregular_category = dfs['irregular']['情况'].drop_duplicates()
df_irregular_category_with_branch = pd.merge(
    df_branch, df_irregular_category, how='cross')
df_irregular_category_with_branch


Unnamed: 0,branch_name,情况
0,园区运营中心,倒签
1,园区运营中心,应结未结
2,园区运营中心,应算未算
3,文化产业公司,倒签
4,文化产业公司,应结未结
5,文化产业公司,应算未算
6,南油平方,倒签
7,南油平方,应结未结
8,南油平方,应算未算
9,番禺科技园,倒签
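The reason for the cross join shows up once the counts are merged in: a plain `groupby` only returns combinations that occur in the data, whereas a left join onto the full grid keeps every combination and lets the missing ones become zero. A small sketch with invented branch and category names:

```python
import pandas as pd

branches = pd.DataFrame({'branch_name': ['A', 'B']})
categories = pd.DataFrame({'category': ['backdated', 'unterminated', 'unsettled']})

# Cross join: every branch paired with every category (2 x 3 = 6 rows).
grid = branches.merge(categories, how='cross')

# Observed counts; several branch/category combinations are absent.
counts = pd.DataFrame({
    'branch_name': ['A', 'B', 'B'],
    'category': ['backdated', 'backdated', 'unterminated'],
    'count': [3, 1, 2],
})

# Left join onto the grid, then turn the missing combinations into zeros.
full = grid.merge(counts, how='left', on=['branch_name', 'category'])
full['count'] = full['count'].fillna(0).astype('int32')

print(full)
```

`how='cross'` requires pandas 1.2 or later.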


In [16]:
df_irregular_category_with_dept = pd.merge(
    df_org, df_irregular_category, how='cross')
df_irregular_category_with_dept

Unnamed: 0,id_branch,branch_name,id_dept,dept_name,情况
0,1005,园区运营中心,1436204,北京产业创新中心,倒签
1,1005,园区运营中心,1436204,北京产业创新中心,应结未结
2,1005,园区运营中心,1436204,北京产业创新中心,应算未算
3,1005,园区运营中心,1436205,价值工厂,倒签
4,1005,园区运营中心,1436205,价值工厂,应结未结
...,...,...,...,...,...
94,1435224,产园-武汉公司,1434223,高新网谷,应结未结
95,1435224,产园-武汉公司,1434223,高新网谷,应算未算
96,1435225,产园-青岛公司,1435226,蓝湾网谷,倒签
97,1435225,产园-青岛公司,1435226,蓝湾网谷,应结未结


Statistics by branch and irregular category

In [17]:
df_irregular_deduplication = dfs['irregular'].drop_duplicates(subset='合同编号')
df_irregular = pd.merge(
    df_org_projects,
    df_irregular_deduplication,
    left_on='project_id',
    right_on='organ_id')
df_irregular


Unnamed: 0,branch_id,branch_name,dept_id,dept_name,project_id,organ_id,项目名称,资源ids,资源名称,合同编号,...,合同终止类型,终止审批状态,终止申请状态,终止申请类型,终止审批创建日期,申请id,old_contract_id,contract_id,情况,说明
0,1005,园区运营中心,1436204,北京产业创新中心,1435203,1435203,北京产业创新中心,551398,北京新时代国际中心A座14-BJCYCXZX-001,bjcycxzx-2022-03-1019,...,,,,,,42718,0,42727,倒签,已审批
1,1005,园区运营中心,1436205,价值工厂,1413262,1413262,价值工厂,526121,集装箱商业1层-2-101,jzgc-2021-12-0087,...,提前终止,审批通过,正常,,,41663,32873,40565,应结未结,已终止
2,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536790,6栋1层-110,nhyk-sy-2021-12-1131,...,正常终止,审批通过,正常,终止申请,2022-02-25,42259,34361,40660,应算未算,未结算
3,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536728,2栋1层-122,nhyk-2019-04-0359,...,正常终止,审批通过,正常,,,41846,0,35400,应结未结,已终止
4,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536741,5栋1层-117-118,nhyk-sy-2022-02-1158,...,提前终止,审批通过,正常,,,43193,0,41510,应结未结,已终止
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515285,1号楼4层-401-3,dhwgyq-2022-01-1089,...,,,,,,40752,0,40987,倒签,已审批
296,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515287,1号楼4层-403,dhwgyq-2019-10-1008,...,提前终止,审批通过,正常,,,41084,0,19170,应结未结,已终止
297,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535702,1号楼9层-904,gxwg-2022-01-1101,...,,,,,,41170,0,41183,倒签,已审批
298,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535727,1号楼10层-1013,gxwg-2022-02-1123,...,,,,,,42019,0,41492,倒签,已审批


Apply the whitelist

In [18]:
df_whitelist = pd.read_excel(irregular_contracts_dir / 'whitelist.xlsx')
# drop whitelisted contracts; reset_index() keeps the old index as an 'index' column
df_irregular = df_irregular[~df_irregular['合同编号']
                            .isin(df_whitelist['合同编号'])] \
    .reset_index()
df_irregular


Unnamed: 0,index,branch_id,branch_name,dept_id,dept_name,project_id,organ_id,项目名称,资源ids,资源名称,...,合同终止类型,终止审批状态,终止申请状态,终止申请类型,终止审批创建日期,申请id,old_contract_id,contract_id,情况,说明
0,0,1005,园区运营中心,1436204,北京产业创新中心,1435203,1435203,北京产业创新中心,551398,北京新时代国际中心A座14-BJCYCXZX-001,...,,,,,,42718,0,42727,倒签,已审批
1,1,1005,园区运营中心,1436205,价值工厂,1413262,1413262,价值工厂,526121,集装箱商业1层-2-101,...,提前终止,审批通过,正常,,,41663,32873,40565,应结未结,已终止
2,2,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536790,6栋1层-110,...,正常终止,审批通过,正常,终止申请,2022-02-25,42259,34361,40660,应算未算,未结算
3,3,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536728,2栋1层-122,...,正常终止,审批通过,正常,,,41846,0,35400,应结未结,已终止
4,4,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536741,5栋1层-117-118,...,提前终止,审批通过,正常,,,43193,0,41510,应结未结,已终止
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
287,295,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515285,1号楼4层-401-3,...,,,,,,40752,0,40987,倒签,已审批
288,296,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515287,1号楼4层-403,...,提前终止,审批通过,正常,,,41084,0,19170,应结未结,已终止
289,297,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535702,1号楼9层-904,...,,,,,,41170,0,41183,倒签,已审批
290,298,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535727,1号楼10层-1013,...,,,,,,42019,0,41492,倒签,已审批
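The whitelist step is an anti-join: keep the rows whose key does not appear in another table. A minimal sketch with invented contract numbers and English column names:

```python
import pandas as pd

contracts = pd.DataFrame({
    'contract_no': ['c-001', 'c-002', 'c-003', 'c-004'],
    'category': ['backdated', 'unterminated', 'backdated', 'unsettled'],
})
whitelist = pd.DataFrame({'contract_no': ['c-002', 'c-004']})

# True for rows whose contract number is on the whitelist.
mask = contracts['contract_no'].isin(whitelist['contract_no'])

# Keep the rest; drop=True discards the old index instead of adding an 'index' column.
kept = contracts[~mask].reset_index(drop=True)

print(kept)
```

The notebook calls `reset_index()` without `drop=True`, which is why its output carries an extra `index` column holding the pre-filter row numbers.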


Filter out apartment projects (currently commented out)

In [19]:
# filter_project_list = ['东湖网谷公寓', '九龙意库公寓']
# df_irregular = df_irregular[~df_irregular['项目名称']
#                             .isin(filter_project_list)] \
#     .reset_index()
# df_irregular

In [20]:
df_irregular_count = df_irregular.groupby(
    ['branch_name', '情况'])['organ_id'] \
    .count() \
    .rename('count') \
    .reset_index()
df_irregular_count


Unnamed: 0,branch_name,情况,count
0,产园-南京公司,倒签,26
1,产园-南京公司,应结未结,24
2,产园-杭州公司,倒签,45
3,产园-杭州公司,应结未结,51
4,产园-武汉公司,倒签,5
5,产园-武汉公司,应结未结,2
6,产园-深圳公司,倒签,69
7,产园-深圳公司,应算未算,4
8,产园-深圳公司,应结未结,39
9,产园-重庆公司,倒签,6


In [21]:
df_irregular_count2 = pd.merge(
    df_irregular_category_with_branch,
    df_irregular_count,
    how='left')[['branch_name',
                 '情况',
                 'count']] \
    .fillna(0) \
    .astype({'count': 'int32'})

df_irregular_count2


Unnamed: 0,branch_name,情况,count
0,园区运营中心,倒签,2
1,园区运营中心,应结未结,3
2,园区运营中心,应算未算,1
3,文化产业公司,倒签,0
4,文化产业公司,应结未结,0
5,文化产业公司,应算未算,1
6,南油平方,倒签,2
7,南油平方,应结未结,0
8,南油平方,应算未算,0
9,番禺科技园,倒签,1


In [22]:
df_irregular_dept_count = df_irregular.groupby(
    ['branch_name', 'dept_name', '情况'])['organ_id'] \
    .count() \
    .rename('count') \
    .reset_index()
df_irregular_dept_count

Unnamed: 0,branch_name,dept_name,情况,count
0,产园-南京公司,紫金智谷,倒签,3
1,产园-南京公司,紫金智谷,应结未结,7
2,产园-南京公司,高铁网谷,倒签,23
3,产园-南京公司,高铁网谷,应结未结,17
4,产园-杭州公司,上海森兰美奂创库,倒签,11
5,产园-杭州公司,上海森兰美奂创库,应结未结,28
6,产园-杭州公司,信雅达创库,倒签,32
7,产园-杭州公司,信雅达创库,应结未结,23
8,产园-杭州公司,豪华邮轮配套产业园,倒签,2
9,产园-武汉公司,东湖网谷,倒签,1


In [23]:
df_irregular_dept_count2 = pd.merge(
    df_irregular_category_with_dept,
    df_irregular_dept_count,
    how='left')[['branch_name',
                 'dept_name',
                 '情况',
                 'count']] \
    .fillna(0) \
    .astype({'count': 'int32'})

df_irregular_dept_count2


Unnamed: 0,branch_name,dept_name,情况,count
0,园区运营中心,北京产业创新中心,倒签,1
1,园区运营中心,北京产业创新中心,应结未结,0
2,园区运营中心,北京产业创新中心,应算未算,0
3,园区运营中心,价值工厂,倒签,0
4,园区运营中心,价值工厂,应结未结,1
...,...,...,...,...
94,产园-武汉公司,高新网谷,应结未结,0
95,产园-武汉公司,高新网谷,应算未算,0
96,产园-青岛公司,蓝湾网谷,倒签,0
97,产园-青岛公司,蓝湾网谷,应结未结,0


In [24]:
def get_irregular(df_irregular_all_, key, irregular_category):
    """Return the rows where column `key` equals `irregular_category`."""
    return df_irregular_all_[df_irregular_all_[key] == irregular_category] \
        .reset_index(drop=True)


def total_irregular(df_irregular_, df_org_projects_):
    """Count contracts per branch after joining them to the org/project table."""
    df = pd.merge(df_org_projects_,
                  df_irregular_,
                  left_on='project_id',
                  right_on='organ_id'
                  )
    df_total = df.groupby('branch_name')['organ_id'] \
        .count() \
        .rename('count') \
        .reset_index()
    return df_total


def report_irregular(df_total_, df_irregular_):
    """Join irregular counts with totals; add per-branch and division-wide ratios."""
    df_report = pd.merge(df_irregular_,
                         df_total_,
                         how='left',
                         on='branch_name'
                         )
    df_report['percent'] = round(
        df_report['count_x'] /
        df_report['count_y'],
        4
    )
    df_report['total'] = df_report['count_x'].sum()
    df_report['grand_total'] = df_report['count_y'].sum()
    df_report['average_percent'] = round(
        df_report['total'] /
        df_report['grand_total'],
        4
    )
    df_report = df_report.sort_values('percent', ascending=False) \
        .reset_index(drop=True) \
        .fillna(0) \
        .astype({'grand_total': 'int32'})
    df_report = df_report.rename(columns={
        'branch_name': '分公司',
        'count_x': '不合规合同数量',
        'count_y': '合同总量',
        'percent': '比率',
        'total': '事业部不合规合同数量',
        'grand_total': '事业部合同总量',
        'average_percent': '平均比率'
    })
    return df_report


Compute backdated contracts (倒签)

In [25]:
df_reverse = get_irregular(df_irregular_count2, '情况', '倒签')
df_total_reverse = total_irregular(dfs['new'], df_org_projects)
df_report_reverse = report_irregular(df_total_reverse, df_reverse)
df_report_reverse

Unnamed: 0,分公司,情况,不合规合同数量,合同总量,比率,事业部不合规合同数量,事业部合同总量,平均比率
0,产园-杭州公司,倒签,45,67,0.6716,156,430,0.3628
1,产园-深圳公司,倒签,69,143,0.4825,156,430,0.3628
2,南油平方,倒签,2,5,0.4,156,430,0.3628
3,产园-南京公司,倒签,26,72,0.3611,156,430,0.3628
4,产园-武汉公司,倒签,5,23,0.2174,156,430,0.3628
5,产园-重庆公司,倒签,6,36,0.1667,156,430,0.3628
6,园区运营中心,倒签,2,32,0.0625,156,430,0.3628
7,番禺科技园,倒签,1,34,0.0294,156,430,0.3628
8,文化产业公司,倒签,0,16,0.0,156,430,0.3628
9,产园-青岛公司,倒签,0,2,0.0,156,430,0.3628
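In `report_irregular`, the 平均比率 column is computed as `total / grand_total`, that is, a pooled ratio over the whole division rather than the mean of the per-branch ratios. The two can differ noticeably. A quick check using the counts from the table above:

```python
# Per-branch figures copied from the 倒签 report above, in the same row order.
irregular = [45, 69, 2, 26, 5, 6, 2, 1, 0, 0]
totals = [67, 143, 5, 72, 23, 36, 32, 34, 16, 2]

# Per-branch ratio, rounded to 4 decimals as in report_irregular.
ratios = [round(x / y, 4) for x, y in zip(irregular, totals)]

# Pooled ratio: division-wide irregular count over division-wide total.
pooled = round(sum(irregular) / sum(totals), 4)

# Unweighted mean of the per-branch ratios, shown only for comparison.
mean_of_ratios = round(sum(ratios) / len(ratios), 4)

print(ratios)
print(pooled, mean_of_ratios)
```

The pooled figure weights each branch by its contract volume, so large branches such as 产园-深圳公司 dominate it, while small branches with few contracts barely move it.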


Compute contracts due for termination but not terminated (应结未结)

In [26]:
df_untermination = get_irregular(df_irregular_count2, '情况', '应结未结')
df_total_untermination = total_irregular(dfs['termination'], df_org_projects)
df_report_untermination = report_irregular(df_total_untermination, df_untermination)
df_report_untermination

Unnamed: 0,分公司,情况,不合规合同数量,合同总量,比率,事业部不合规合同数量,事业部合同总量,平均比率
0,产园-杭州公司,应结未结,51,92.0,0.5543,130,398,0.3266
1,产园-南京公司,应结未结,24,50.0,0.48,130,398,0.3266
2,产园-重庆公司,应结未结,9,25.0,0.36,130,398,0.3266
3,产园-深圳公司,应结未结,39,129.0,0.3023,130,398,0.3266
4,产园-武汉公司,应结未结,2,8.0,0.25,130,398,0.3266
5,园区运营中心,应结未结,3,34.0,0.0882,130,398,0.3266
6,番禺科技园,应结未结,2,41.0,0.0488,130,398,0.3266
7,文化产业公司,应结未结,0,15.0,0.0,130,398,0.3266
8,南油平方,应结未结,0,4.0,0.0,130,398,0.3266
9,产园-青岛公司,应结未结,0,0.0,0.0,130,398,0.3266


Compute contracts due for settlement but not settled (应算未算)

In [27]:
df_unsettlement = get_irregular(df_irregular_count2, '情况', '应算未算')
df_total_unsettlement = total_irregular(dfs['settlement'], df_org_projects)
df_report_unsettlement = report_irregular(df_total_unsettlement, df_unsettlement)
df_report_unsettlement

Unnamed: 0,分公司,情况,不合规合同数量,合同总量,比率,事业部不合规合同数量,事业部合同总量,平均比率
0,文化产业公司,应算未算,1,15.0,0.0667,6,388,0.0155
1,产园-深圳公司,应算未算,4,114.0,0.0351,6,388,0.0155
2,园区运营中心,应算未算,1,35.0,0.0286,6,388,0.0155
3,南油平方,应算未算,0,4.0,0.0,6,388,0.0155
4,番禺科技园,应算未算,0,50.0,0.0,6,388,0.0155
5,产园-重庆公司,应算未算,0,24.0,0.0,6,388,0.0155
6,产园-南京公司,应算未算,0,50.0,0.0,6,388,0.0155
7,产园-杭州公司,应算未算,0,87.0,0.0,6,388,0.0155
8,产园-武汉公司,应算未算,0,9.0,0.0,6,388,0.0155
9,产园-青岛公司,应算未算,0,0.0,0.0,6,388,0.0155


Extract the list of contracts that do not meet the operating requirements

In [28]:
df_irregular

Unnamed: 0,index,branch_id,branch_name,dept_id,dept_name,project_id,organ_id,项目名称,资源ids,资源名称,...,合同终止类型,终止审批状态,终止申请状态,终止申请类型,终止审批创建日期,申请id,old_contract_id,contract_id,情况,说明
0,0,1005,园区运营中心,1436204,北京产业创新中心,1435203,1435203,北京产业创新中心,551398,北京新时代国际中心A座14-BJCYCXZX-001,...,,,,,,42718,0,42727,倒签,已审批
1,1,1005,园区运营中心,1436205,价值工厂,1413262,1413262,价值工厂,526121,集装箱商业1层-2-101,...,提前终止,审批通过,正常,,,41663,32873,40565,应结未结,已终止
2,2,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536790,6栋1层-110,...,正常终止,审批通过,正常,终止申请,2022-02-25,42259,34361,40660,应算未算,未结算
3,3,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536728,2栋1层-122,...,正常终止,审批通过,正常,,,41846,0,35400,应结未结,已终止
4,4,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,南海意库-商业,536741,5栋1层-117-118,...,提前终止,审批通过,正常,,,43193,0,41510,应结未结,已终止
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
287,295,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515285,1号楼4层-401-3,...,,,,,,40752,0,40987,倒签,已审批
288,296,1435224,产园-武汉公司,1434222,东湖网谷,1424227,1424227,东湖网谷一期,515287,1号楼4层-403,...,提前终止,审批通过,正常,,,41084,0,19170,应结未结,已终止
289,297,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535702,1号楼9层-904,...,,,,,,41170,0,41183,倒签,已审批
290,298,1435224,产园-武汉公司,1434223,高新网谷,1427236,1427236,高新网谷,535727,1号楼10层-1013,...,,,,,,42019,0,41492,倒签,已审批


In [29]:
df_report_irregular = df_irregular[[
    'branch_name',
    'dept_name',
    '项目名称',
    '合同编号',
    '资源名称',
    '甲方名称',
    '乙方名称',
    '情况',
    '说明'
]].rename(columns={
    'branch_name': '分公司',
    'dept_name': '项目部'})

df_report_irregular


Unnamed: 0,分公司,项目部,项目名称,合同编号,资源名称,甲方名称,乙方名称,情况,说明
0,园区运营中心,北京产业创新中心,北京产业创新中心,bjcycxzx-2022-03-1019,北京新时代国际中心A座14-BJCYCXZX-001,深圳市招商创业有限公司,北京至曙经贸有限公司,倒签,已审批
1,园区运营中心,价值工厂,价值工厂,jzgc-2021-12-0087,集装箱商业1层-2-101,招商局蛇口工业区控股股份有限公司,恩佐（深圳） 汽车服务有限公司,应结未结,已终止
2,园区运营中心,南海意库-商业,南海意库-商业,nhyk-sy-2021-12-1131,6栋1层-110,招商局蛇口工业区控股股份有限公司,深圳剪刀侠美发管理有限公司,应算未算,未结算
3,园区运营中心,南海意库-商业,南海意库-商业,nhyk-2019-04-0359,2栋1层-122,招商局蛇口工业区控股股份有限公司,深圳市国宾大酒店有限公司,应结未结,已终止
4,园区运营中心,南海意库-商业,南海意库-商业,nhyk-sy-2022-02-1158,5栋1层-117-118,招商局蛇口工业区控股股份有限公司,深圳潮石先生艺术时尚品牌管理有限公司,应结未结,已终止
...,...,...,...,...,...,...,...,...,...
287,产园-武汉公司,东湖网谷,东湖网谷一期,dhwgyq-2022-01-1089,1号楼4层-401-3,武汉右岸网谷产业园有限公司,武汉埃申测控技术有限公司,倒签,已审批
288,产园-武汉公司,东湖网谷,东湖网谷一期,dhwgyq-2019-10-1008,1号楼4层-403,武汉右岸网谷产业园有限公司,湖北荣屹昊机器人科技有限公司,应结未结,已终止
289,产园-武汉公司,高新网谷,高新网谷,gxwg-2022-01-1101,1号楼9层-904,武汉船舶配套工业园有限公司,武汉仕代环境科技有限公司,倒签,已审批
290,产园-武汉公司,高新网谷,高新网谷,gxwg-2022-02-1123,1号楼10层-1013,武汉船舶配套工业园有限公司,湖北天合致远工程有限公司,倒签,已审批


Extract the incremental data: non-compliant contracts that are new in this period

In [30]:
# output_dir_name = 'output'
# out_dir = Path.cwd() / output_dir_name
# if not out_dir.exists():
#     out_dir.mkdir()
# filename_lp = out_dir / '2022-03-25-租赁平台-合同规范性检查（下发）.xlsx'
# use the previous period's workbook (second-newest file) as the baseline
filename_lp = files[-2]
df_lp = pd.read_excel(filename_lp, sheet_name='不规范合同')
df_report_increase = df_report_irregular[~df_report_irregular['合同编号']
                                         .isin(df_lp['合同编号'])] \
    .reset_index(drop=True)
df_report_increase


Unnamed: 0,分公司,项目部,项目名称,合同编号,资源名称,甲方名称,乙方名称,情况,说明
0,南油平方,南油集团-仓库,仓库,CMNY-经（2022）-仓库-0007,保税港一期仓库1-4层（整体）-306A,深圳市南油（集团）有限公司,广东顺丰电子商务有限公司,倒签,已审批
1,南油平方,南油集团-仓库,仓库,CMNY-经（2022）-仓库-0006,保税港一期仓库1-4层（整体）-102,深圳市南油（集团）有限公司,深圳鑫荣鹏程供应链管理有限公司,倒签,已审批
2,产园-深圳公司,蛇口网谷,万融大厦,wrds-2022-04-1137,万融大厦C座3层-309,深圳市万融大厦管理有限公司,华景山海控股（深圳）有限公司,倒签,已审批
3,产园-深圳公司,蛇口网谷,万融大厦,wrds-2022-03-1136,万融大厦C座3层-308,深圳市万融大厦管理有限公司,招商局健康产业发展（苏州）有限公司,倒签,已审批
4,产园-深圳公司,光明科技园,招商局光明科技园,招光加22A024,二期研发楼A2栋4层-A2-0407,招商局光明科技园有限公司,上海梵荣国际贸易有限公司,倒签,已审批
5,产园-深圳公司,光明科技园,招商局光明科技园,招光加20A055 补充协议（新增物管费用）,"二期研发楼A2栋5层-A2-0502,二期研发楼A2栋5层-A2-0503,二期研发楼A2栋...",招商局光明科技园有限公司,深圳礼意久久网络科技有限公司,应结未结,已终止
6,产园-深圳公司,光明科技园,招商局光明科技园,招光加20B007 补充协议（新增物管费用）,二期研发厂房A6栋厂房1层A6-2A,招商局光明科技园有限公司,深圳芯珑电子技术有限公司,应结未结,已终止
7,产园-南京公司,高铁网谷,招商高铁网谷,ZSGTWG-2022-017,B座3层308,南京铁盛商业管理有限公司,南京天加贸易有限公司,应结未结,已终止
8,产园-南京公司,高铁网谷,招商高铁网谷,zsgtwg-2022-02-1315,B座7层-B座-708、709,南京铁盛商业管理有限公司,江苏方进建筑工程有限公司,应结未结,已终止
9,产园-杭州公司,上海森兰美奂创库,森兰美奂创库,slmhck-2021-07-1132,"森兰美奂大厦A栋B座6层-642-CK676,森兰美奂大厦A栋B座6层-642-CK677,...",上海浦隽房地产开发有限公司,吴颖晔,应结未结,执行中


In [31]:
def total_irregular_dept(df_irregular_, df_org_projects_):
    """Count contracts per branch and project department."""
    df = pd.merge(df_org_projects_,
                  df_irregular_,
                  left_on='project_id',
                  right_on='organ_id'
                  )
    df_total = df.groupby(['branch_name', 'dept_name'])['organ_id'] \
        .count() \
        .rename('count') \
        .reset_index()
    return df_total


def report_irregular_dept(df_total_, df_irregular_):
    """Join department-level counts with totals; add department and branch-level ratios."""
    df_report = pd.merge(df_irregular_,
                         df_total_,
                         how='left',
                         on=['branch_name', 'dept_name']
                         )
    df_report['percent'] = round(
        df_report['count_x'] /
        df_report['count_y'],
        4
    )
    df_report['total'] = df_report.groupby(
        ['branch_name']
    )['count_x'].transform('sum')
    df_report['grand_total'] = df_report.groupby(
        ['branch_name']
    )['count_y'].transform('sum')
    df_report['average_percent'] = round(
        df_report['total'] /
        df_report['grand_total'],
        4
    )
    df_report = df_report.sort_values(['branch_name', 'percent'], ascending=False) \
        .reset_index(drop=True) \
        .fillna(0) \
        .astype({'grand_total': 'int32'})
    df_report = df_report.rename(columns={
        'branch_name': '分公司',
        'dept_name': '项目部',
        'count_x': '不合规合同数量',
        'count_y': '合同总量',
        'percent': '比率',
        'total': '分公司不合规合同数量',
        'grand_total': '分公司合同总量',
        'average_percent': '平均比率'
    }).astype({
        '合同总量': 'int32'
    }).set_index(['分公司', '项目部'])
    return df_report


Compute backdated contracts (倒签), by project department

In [32]:
df_dept_reverse = get_irregular(df_irregular_dept_count2, '情况', '倒签')
df_total_dept_reverse = total_irregular_dept(dfs['new'], df_org_projects)
df_report_dept_reverse = report_irregular_dept(df_total_dept_reverse, df_dept_reverse)
df_report_dept_reverse

分公司,项目部,情况,不合规合同数量,合同总量,比率,分公司不合规合同数量,分公司合同总量,平均比率
番禺科技园,番禺科技园,倒签,1,34,0.0294,1,34,0.0294
文化产业公司,文化公司其他租赁,倒签,0,16,0.0,0,16,0.0
文化产业公司,海上世界文化艺术中心,倒签,0,0,0.0,0,16,0.0
园区运营中心,北京产业创新中心,倒签,1,4,0.25,2,32,0.0625
园区运营中心,园区运营中心其他,倒签,1,6,0.1667,2,32,0.0625
园区运营中心,价值工厂,倒签,0,5,0.0,2,32,0.0625
园区运营中心,南海意库-商业,倒签,0,8,0.0,2,32,0.0625
园区运营中心,蛇口网谷-商业,倒签,0,9,0.0,2,32,0.0625
园区运营中心,创业壹号A座招商创库,倒签,0,0,0.0,2,32,0.0625
南油平方,南油集团-仓库,倒签,2,5,0.4,2,5,0.4


Compute contracts due for termination but not terminated (应结未结), by project department

In [33]:
df_dept_untermination = get_irregular(df_irregular_dept_count2, '情况', '应结未结')
df_total_dept_untermination = total_irregular_dept(dfs['termination'], df_org_projects)
df_report_dept_untermination = report_irregular_dept(df_total_dept_untermination, df_dept_untermination)
df_report_dept_untermination

分公司,项目部,情况,不合规合同数量,合同总量,比率,分公司不合规合同数量,分公司合同总量,平均比率
番禺科技园,番禺科技园,应结未结,2,41,0.0488,2,41,0.0488
文化产业公司,文化公司其他租赁,应结未结,0,14,0.0,0,15,0.0
文化产业公司,海上世界文化艺术中心,应结未结,0,1,0.0,0,15,0.0
园区运营中心,南海意库-商业,应结未结,2,11,0.1818,3,34,0.0882
园区运营中心,价值工厂,应结未结,1,7,0.1429,3,34,0.0882
园区运营中心,北京产业创新中心,应结未结,0,2,0.0,3,34,0.0882
园区运营中心,蛇口网谷-商业,应结未结,0,10,0.0,3,34,0.0882
园区运营中心,园区运营中心其他,应结未结,0,4,0.0,3,34,0.0882
园区运营中心,创业壹号A座招商创库,应结未结,0,0,0.0,3,34,0.0882
南油平方,南油集团-仓库,应结未结,0,4,0.0,0,4,0.0


Compute contracts due for settlement but not settled (应算未算), by project department

In [34]:
df_dept_unsettlement = get_irregular(df_irregular_dept_count2, '情况', '应算未算')
df_total_dept_unsettlement = total_irregular_dept(dfs['settlement'], df_org_projects)
df_report_dept_unsettlement = report_irregular_dept(df_total_dept_unsettlement, df_dept_unsettlement)
df_report_dept_unsettlement

分公司,项目部,情况,不合规合同数量,合同总量,比率,分公司不合规合同数量,分公司合同总量,平均比率
番禺科技园,番禺科技园,应算未算,0,50,0.0,0,50,0.0
文化产业公司,海上世界文化艺术中心,应算未算,1,1,1.0,1,15,0.0667
文化产业公司,文化公司其他租赁,应算未算,0,14,0.0,1,15,0.0667
园区运营中心,南海意库-商业,应算未算,1,11,0.0909,1,35,0.0286
园区运营中心,北京产业创新中心,应算未算,0,2,0.0,1,35,0.0286
园区运营中心,价值工厂,应算未算,0,7,0.0,1,35,0.0286
园区运营中心,蛇口网谷-商业,应算未算,0,10,0.0,1,35,0.0286
园区运营中心,园区运营中心其他,应算未算,0,5,0.0,1,35,0.0286
园区运营中心,创业壹号A座招商创库,应算未算,0,0,0.0,1,35,0.0286
南油平方,南油集团-仓库,应算未算,0,4,0.0,0,4,0.0


## Exporting the Data for Distribution


In [32]:
output_dir_name = 'output'
out_dir = Path.cwd() / output_dir_name
if not out_dir.exists():
    out_dir.mkdir()

out_filename = f'{filename.stem}-租赁平台-合同规范性检查（下发）.xlsx'

out_path = out_dir / out_filename

with pd.ExcelWriter(out_path) as writer:
    df_report_irregular.to_excel(writer, sheet_name='不合规范合同清单')
    df_report_increase.to_excel(writer, sheet_name='不合规范合同清单(增量)')
    df_report_reverse.to_excel(writer, sheet_name='倒签统计')
    df_report_untermination.to_excel(writer, sheet_name='应结未结统计')
    df_report_unsettlement.to_excel(writer, sheet_name='应算未算统计')
    df_report_dept_reverse.to_excel(writer, sheet_name='倒签统计(按项目部)')
    df_report_dept_untermination.to_excel(writer, sheet_name='应结未结统计(按项目部)')
    df_report_dept_unsettlement.to_excel(writer, sheet_name='应算未算统计(按项目部)')


## Underlying Implementation Logic

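The code below is the low-level calculation for a single irregular category, kept for reference. These cells were executed out of order (see the `In [...]` counters), so some of the outputs shown are stale: for example, `df_reverse` is filtered to 应结未结 here, while the merged result further down still lists 倒签.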

In [192]:
df_reverse = df_irregular_count2[df_irregular_count2['情况'] == '应结未结'] \
    .reset_index(drop=True)
df_reverse


Unnamed: 0,branch_name,情况,count
0,园区运营中心,应结未结,3
1,文化产业公司,应结未结,0
2,南油平方,应结未结,0
3,番禺科技园,应结未结,2
4,产园-深圳公司,应结未结,38
5,产园-重庆公司,应结未结,9
6,产园-南京公司,应结未结,22
7,产园-杭州公司,应结未结,49
8,产园-武汉公司,应结未结,2
9,产园-青岛公司,应结未结,0


In [193]:
df_new = pd.merge(df_org_projects,
                  dfs['termination'],
                  left_on='project_id',
                  right_on='organ_id'
                  )
df_new.head()


Unnamed: 0,branch_id,branch_name,dept_id,dept_name,project_id,organ_id,community_id,项目名称,资源id,资源名称,...,合同来源,合同终止类型,终止申请类型,终止审批状态,结算状态,终止申请状态,申请id,申请人,old_contract_id,contract_id
0,1005,园区运营中心,1436205,价值工厂,1413262,1413262,1413228,价值工厂,526121,集装箱商业1层-2-101,...,变更合同,提前终止,终止申请类型,审批通过,已结算,正常,41663.0,王昆,32873,40565
1,1005,园区运营中心,1436205,价值工厂,1413262,1413262,1413228,价值工厂,159461594715959,"价值工厂1层-机械大厅-104,价值工厂1层-机械大厅-105,价值工厂1层-机械大厅110",...,变更合同,提前终止,终止申请类型,审批通过,已结算,正常,42605.0,王昆,41067,41265
2,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,1433220,南海意库-商业,536723,2栋1层-115,...,新签合同,提前终止,终止申请类型,审批通过,可结算,正常,41607.0,欧阳冰,0,35385
3,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,1433220,南海意库-商业,536728,2栋1层-122,...,新签合同,正常终止,终止申请类型,审批通过,可结算,正常,41846.0,欧阳冰,0,35400
4,1005,园区运营中心,1436206,南海意库-商业,1433221,1433221,1433220,南海意库-商业,536741,5栋1层-117-118,...,续签合同,正常终止,终止申请类型,审批通过,已结算,正常,41604.0,欧阳冰,35419,38910


In [184]:
df_new_count = df_new.groupby('branch_name')['organ_id'] \
    .count() \
    .rename('count') \
    .reset_index()
df_new_count


Unnamed: 0,branch_name,count
0,产园-南京公司,26
1,产园-杭州公司,61
2,产园-武汉公司,3
3,产园-深圳公司,67
4,产园-重庆公司,22
5,南油平方,2
6,园区运营中心,10
7,文化产业公司,1
8,番禺科技园,18


In [191]:
df_reverse_result = pd.merge(df_reverse,
                             df_new_count,
                             how='left',
                             on='branch_name'
                             )
df_reverse_result['percent'] = round(
    df_reverse_result['count_x'] /
    df_reverse_result['count_y'],
    4
)
df_reverse_result['average_percent'] = round(
    df_reverse_result['count_x'].sum() /
    df_reverse_result['count_y'].sum(),
    4
)

df_reverse_result = df_reverse_result.sort_values('percent', ascending=False) \
    .reset_index(drop=True) \
    .fillna(0)
df_reverse_result


Unnamed: 0,branch_name,情况,count_x,count_y,percent,average_percent
0,产园-武汉公司,倒签,5,3.0,1.6667,0.7476
1,产园-南京公司,倒签,27,26.0,1.0385,0.7476
2,产园-深圳公司,倒签,67,67.0,1.0,0.7476
3,产园-杭州公司,倒签,48,61.0,0.7869,0.7476
4,产园-重庆公司,倒签,6,22.0,0.2727,0.7476
5,园区运营中心,倒签,2,10.0,0.2,0.7476
6,番禺科技园,倒签,2,18.0,0.1111,0.7476
7,文化产业公司,倒签,0,1.0,0.0,0.7476
8,南油平方,倒签,0,2.0,0.0,0.7476
9,产园-青岛公司,倒签,0,0.0,0.0,0.7476


In [73]:
df = dfs['new']
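# NOTE: the original cell read `df.loc[df['']]`; the column name in the filter is
# missing, so it would raise a KeyError. The unfiltered frame is shown instead.
df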

Unnamed: 0,organ_id,community_id,项目名称,资源ids,资源名称,合同编号,甲方名称,乙方名称,合同录入日期,合同开始日期,...,合同终止类型,终止审批状态,终止申请状态,终止申请类型,终止审批创建日期,申请id,old_contract_id,contract_id,情况,说明
0,1032,1008,科健大厦,534279,科健大厦-广告位2,kjds-2022-02-0109,深圳市招商创业有限公司,驰众广告有限公司,2022-02-15,2022-01-01,...,,,,,,41879,0,41441,倒签,已审批
1,1020,1015,招港大厦,94603749460375,"招港大厦7层-704,招港大厦7层-705",zgds-2022-01-0056,深圳市招商创业有限公司,深圳市大田粮食有限公司,2022-01-21,2022-01-24,...,,,,,,41501,0,41316,倒签,已审批
2,1412212,1410209,金山意库,539439,1号楼-1号楼1.5层连廊,jsyk-2022-02-0658,重庆招商金山意库商业管理有限公司,重庆壹艺库文化艺术发展有限公司,2022-02-10,2022-02-15,...,,,,,,41804,0,41424,倒签,已审批
3,1412212,1410209,金山意库,549964,9号楼-场地租赁,jsyk-2022-03-0663,重庆招商金山意库商业管理有限公司,丁思明,2022-03-10,2022-03-13,...,,,,,,42727,0,41743,倒签,已审批
4,1412212,1410209,金山意库,550744,3号楼-3栋室外部分,jsyk-2022-03-0665,重庆招商金山意库商业管理有限公司,丁思明,2022-03-14,2022-03-14,...,,,,,,42824,0,41795,倒签,已审批
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
560,1437215,1437214,重庆金山意库招商创库,545755,8栋4层-JSYKA04FL-104,cqjsykzsck-2022-01-1009,重庆招商金山意库商业管理有限公司,李蓓蓓,2022-01-14,2022-02-01,...,,,,,,41240,0,42182,倒签,已审批
561,1437215,1437214,重庆金山意库招商创库,545987,8栋4层-JSYKR04F3,cqjsykzsck-2022-01-1011,重庆招商金山意库商业管理有限公司,重庆礼悠格诗食品有限公司,2022-01-19,2022-02-01,...,,,,,,41507,0,42231,倒签,已审批
562,1437215,1437214,重庆金山意库招商创库,545806,8栋4层-JSYKA04FL-207,cqjsykzsck-2022-03-1017,重庆招商金山意库商业管理有限公司,重庆梵瑞装饰设计有限公司,2022-03-19,2022-04-01,...,,,,,,43060,0,42839,倒签,已审批
563,1437217,1437216,海门邮轮研究院,548193,海门邮轮研究院4层-独立办公403,hmylyjy-2022-02-1001,南通招海置业有限公司,南通思诺船舶科技有限公司,2022-02-23,2022-02-15,...,,,,,,42475,0,41534,倒签,已审批
