# Validate 
Author: Mark Chinnock  
Date: 19/07/2024

This script reads Excel file(s) from SharePoint that were previously extracted from 3DX, processed to add the additional functions/metrics, and passed to Smartsheet.  It works this way at present because I expect to process the historic files to create summary metrics.

Going forward, this could read the 3DX extract directly to validate the latest attributes.

It can process as many files as you want, for as many product structures as you want, and build a history of data quality.  Output is currently written to an Excel file named after the input file with the suffix '_validated'.

Required Inputs:
* 3dx extract xlsx file - as long as the standard columns used for calculating the metrics are present this will process it.

Outputs Produced:
* xlsx spreadsheet with the full input BOM written to one sheet, with a column for each metric appended at the far right for reporting in Power BI/Excel
* Additional sheets written into xlsx for each validation rule, showing the subset of records that failed each rule (might be useful for quick dashboard displays)

# 1. Validation Rules

## 1.1 Makes without Buys
For each Make part there should be an assembly that needs at least one Buy part below it to make sense - if you're going to bake a cake, you need at least 1 ingredient!  If you're buying a cake, then you don't need anything else!

If a MAKE (source code 'AIH','MIH','MOB') is not followed by a child BUY this is a problem

In [22]:
def check_make_no_buy(df):
    # A MAKE row is flagged when the next row's level is less than or equal to its own level,
    # i.e. the MAKE has no children, so there cannot be a BUY below it.
    # Assumes df is sorted in BOM structure order. The last row compares against NaN and is never flagged.
    is_make = df['Source Code'].isin(['AIH','MIH','MOB'])
    no_children = df['Level'].shift(-1) <= df['Level']
    mask = is_make & no_children

    df['make_no_buy'] = mask
    make_no_buy = sorted(df[mask].index)

    return df, make_no_buy
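
As a quick illustration of the rule, the sketch below applies the same `shift(-1)` comparison to a small made-up BOM (the rows are hypothetical, not taken from a real extract). A Make row is flagged when the next row is not deeper in the structure, which means the Make has no children at all.

```python
import pandas as pd

# Hypothetical five-row BOM, already sorted in structure order
bom = pd.DataFrame({
    'Level':       [4, 5, 5, 4, 4],
    'Source Code': ['MIH', 'BOF', 'FAS', 'AIH', 'BOF'],
})

is_make = bom['Source Code'].isin(['AIH', 'MIH', 'MOB'])
# next row sits at the same or a higher level, so this row has no children
no_children = bom['Level'].shift(-1) <= bom['Level']

bom['make_no_buy'] = is_make & no_children
flagged = bom.index[bom['make_no_buy']].tolist()
print(flagged)
```

Only the childless AIH row is flagged here. Because the comparison looks at the next row's level only, a Make whose children are all Makes themselves would not be caught by this test.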

## 1.2 Parent Source Code

We need to track the source code of each part's parent and use it in the validation checks that follow.

This iterates through the dataframe and appends the parent part's source code to each row.  The parent is the most recent preceding row one level up (Level - 1).

In [23]:
def parent_source_code(df):
    # most recent source code seen at each level while walking down the BOM
    level_source = {}

    for i, x in df.iterrows():
        # take the current level source and store it
        level_source[x['Level']] = x['Source Code']
        # only rows at level 4 and deeper get a parent source code
        if x['Level'] >= 4:
            df.loc[i, 'Parent Source Code'] = level_source[x['Level'] - 1]
    
    return df
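
The level-to-source-code dictionary can be seen at work on a toy structure. This is a minimal sketch on hypothetical rows. Unlike the function above, it fills every level and uses `dict.get`, so a row with no recorded parent gets `None` instead of raising a `KeyError`.

```python
import pandas as pd

# Hypothetical BOM fragment in structure order
bom = pd.DataFrame({
    'Level':       [3, 4, 5, 5, 4, 5],
    'Source Code': ['SYS', 'MIH', 'BOF', 'FAS', 'AIH', 'POA'],
})

level_source = {}   # most recent source code seen at each level
parents = []
for level, source in zip(bom['Level'], bom['Source Code']):
    level_source[level] = source
    # the parent is the latest row seen one level up; None if not seen yet
    parents.append(level_source.get(level - 1))

bom['Parent Source Code'] = parents
print(bom)
```

The second Level 5 row still picks up 'MIH', and the last row picks up 'AIH', because the dictionary entry for Level 4 is overwritten each time a new Level 4 row is read.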

## 1.3 Source Code within parent checks

sc_check_list is a list of invalid scenarios we are checking for with syntax: SOURCE CODE_PARENT SOURCE CODE

A dataframe of invalid rows is written to dict_checks[sc_check] and a column is added to the end of the main dataframe with the sc_check as a column name holding true or false

In [24]:
def source_code_within_parent_checks(dict_checks, df):
    # check for combinations of source codes within a parent source that's not accepted
    sc_check_list = ['AIH_POA','BOP_FIP','FAS_FAS','FIP_FIP','FIP_FAS']
    
    for sc_check in sc_check_list:
        sc, parent_sc = sc_check.split('_')

        dict_checks[sc_check] = df[(df['Source Code'] == sc) & (df['Parent Source Code'] == parent_sc)]

        df[sc_check] = np.where((df['Source Code'] == sc) & (df['Parent Source Code'] == parent_sc), True, False)


    return dict_checks, df
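
To show how one `SOURCE CODE_PARENT SOURCE CODE` entry turns into a filter, here is a small sketch on made-up rows. The `results` dictionary is a stand-in for `dict_checks`.

```python
import pandas as pd

# Hypothetical rows with the parent source code already attached
bom = pd.DataFrame({
    'Source Code':        ['FAS', 'FIP', 'BOF', 'AIH'],
    'Parent Source Code': ['FAS', 'FAS', 'MIH', 'POA'],
})

results = {}
for sc_check in ['AIH_POA', 'FAS_FAS', 'FIP_FAS']:
    sc, parent_sc = sc_check.split('_')
    mask = (bom['Source Code'] == sc) & (bom['Parent Source Code'] == parent_sc)
    bom[sc_check] = mask            # True/False flag on the main frame
    results[sc_check] = bom[mask]   # failing rows only
```

Each check name does double duty as the column name on the main frame and as the key (later the sheet name) for the failing rows.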

## 1.4 Level 4 Source Code Checks

Level 4 (the assembly level, when the first level is 0) should only have Source Code 'MIH' or 'AIH'.

In [25]:
def check_level_4_source_code_checks(dict_checks, df):
    # level 4 can only be MIH or AIH
    dict_checks['Non_MIH_AIH_Level_4'] = df[(df['Level'] == 4) & (~df['Source Code'].isin(['MIH','AIH']))]
    df['Non_MIH_AIH_Level_4'] = np.where((df['Level'] == 4) & (~df['Source Code'].isin(['MIH','AIH'])), True, False)

    return dict_checks, df

## 1.5 Fasteners with wrong parent source code

Fasteners should only be within parents of 'FIP', 'AIH' or 'MIH'.



In [26]:
def FAS_wrong_parent_source_code(dict_checks, df):
    # FAS can only be within a FIP, AIH or MIH parent
    dict_checks['FAS_Wrong_Parent_Source_code'] = df[(df['Source Code'] == 'FAS') & (~df['Parent Source Code'].isin(['FIP','AIH','MIH']))]
    df['FAS_Wrong_Parent_Source_code'] = np.where((df['Source Code'] == 'FAS') & (~df['Parent Source Code'].isin(['FIP','AIH','MIH'])), True, False)

    return dict_checks, df


## 1.6 Fastener checks

Look for rows where the description starts with washer, bolt or grommet but the source code is 'BOF'.  These should be 'FAS'.

In [27]:
def fastener_checks(dict_checks, df):
    # All BOF records that are fasteners should be {FAS}teners in the BOMS
    # Part Description starts with washer, bolt, grommet
    # Source code = "BOF"
    # keep the pattern as a plain string: formatting a list into it would produce a regex character class
    fastener_pattern = r'^(?:washer|bolt|grommet)'

    # na=False treats rows with a missing Description as non-matching
    mask = (df['Description'].str.lower().str.contains(fastener_pattern, na=False)) & (df['Source Code'] == 'BOF')
    dict_checks['FAS_as_BOF'] = df[mask]
    df['FAS_as_BOF'] = mask

    return dict_checks, df
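
A small sketch of the description match on made-up descriptions. It also shows why the pattern is best kept as a plain string: when a one-element list is formatted into the pattern, the result is wrapped in `['...']`, which the regex engine reads as a character class and therefore matches almost any text.

```python
import pandas as pd

# Hypothetical descriptions, including a missing value
desc = pd.Series(['WASHER M6', 'Bolt hex M8', 'BRACKET ASSY', 'SEAL - GROMMET', None])

# plain string pattern: description starts with washer, bolt or grommet
pattern = r'^(?:washer|bolt|grommet)'
starts_with_fastener = desc.str.lower().str.contains(pattern, na=False)

# formatting a list gives "['^washer|^bolt|^grommet']", a character class
as_list = '{}'.format(['^washer|^bolt|^grommet'])
char_class_match = desc.str.lower().str.contains(as_list, na=False)

print(starts_with_fastener.tolist())
print(char_class_match.tolist())
```

With the plain string only the first two descriptions match. With the list-formatted pattern every non-missing description matches, because each contains at least one of the letters in the class.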

## 1.7 Filter check columns

For writing out to excel on separate sheets, only need to keep the pertinent columns

In [28]:
def filter_check_columns(dict_checks):
    # reduce the selection of columns used for writing out later
    check_columns = [
    'orig_sort',
    'Last Modified By',
    'Owner',
    'Function Group',
    'System',
    'Sub System',
    'Level',
    'Title',
    'Revision',
    'Description',
    'Parent Part',
    'Source Code',
    'Quantity',
    'Parent Source Code'
    ]

    for key in dict_checks.keys():
        print (key)
        dict_checks[key] = dict_checks[key][check_columns]

    return dict_checks

## 1.8 Validate GMT Standards 3DX Attributes Requirements

Read in the GMT Standards document from GMT sharepoint folder and confirm 3dx extract contains the same columns and valid values



In [29]:
def check_attributes(df):
    attr_filename = '3DX Attributes Requirements for Release and Clarification.xlsx'

    attr = pd.read_excel(attr_filename, sheet_name='Drop Down Attributes', na_values="", keep_default_na=False)

    # create a dictionary of all the valid values for each column - this drops the nan values for each column
    attr_d = {attr[column].name: [y for y in attr[column] if not pd.isna(y)] for column in attr}

    for key in attr_d:
        # check the column exists
        try:
            mask = df[key].isin(attr_d[key])
            df[key + ' Check'] = np.where(mask, 'Valid','Invalid')
        except KeyError as e:
            df[key + ' Check'] = 'Not in Extract'

    return df
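
The same lookup can be tried without the SharePoint document. In this sketch the `attr` frame is a hypothetical stand-in for the 'Drop Down Attributes' sheet, and the attribute names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Drop Down Attributes' sheet:
# one column per attribute, padded with NaN where the lists differ in length
attr = pd.DataFrame({
    'UOM':          ['EA', 'KG', 'M'],
    'CAD Maturity': ['A', 'B', np.nan],
    'Colour':       ['RED', np.nan, np.nan],
})
valid = {col: attr[col].dropna().tolist() for col in attr}

# Hypothetical extract with no 'Colour' column
bom = pd.DataFrame({'UOM': ['EA', 'LTR'], 'CAD Maturity': ['B', None]})

for col, allowed in valid.items():
    if col in bom.columns:
        bom[col + ' Check'] = np.where(bom[col].isin(allowed), 'Valid', 'Invalid')
    else:
        bom[col + ' Check'] = 'Not in Extract'
```

A missing value counts as 'Invalid' here because it is not in the allowed list. Whether blanks should be reported separately is a design choice this sketch does not settle.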


## 1.9 Validate BoM and Function Group Structure GMT document

This document is stored on GMT-EngineeringBoM sharepoint in GMT Standards folder:  
https://forsevengroup.sharepoint.com/:x:/r/sites/GMT-EngineeringBoM/Shared%20Documents/GMT%20-%20Standards/BoM%20and%20Function%20Group%20Structure%20GMT.xlsx?d=wc3cbfc77631c40b69ba7d5026066a2e7&csf=1&web=1&e=B64OP2

## 1.10 Validate Part No

At the moment the Part Number (Title) should be:

[project]-[functional area][5 digit part number] == 11 characters

In [30]:
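def validate_part_no(df):
    # groups: project (e.g. T48E), an optional unexpected code before the function letter (e.g. 01),
    # function letter, 5 digit part number, optional trailing X
    # ? means the preceding bracketed group is optional
    # [Ee] accepts the project suffix in either case, as titles in the extract use 'T48E'
    pattern = r'([A-Z]\d{2}[Ee])-(\w[A-Za-z0-9]*)?-?([A-Z])(\d{5})(X)?'
    df[['extr_project','extr_invalid_code','extr_function','extr_pn','extr_maturity']] = df['Title'].str.extract(pattern, expand=True)

    return df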
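
To see what each capture group picks up, the sketch below runs the extraction on three sample titles. The pattern follows `validate_part_no`; accepting either case for the project suffix with `[Ee]` is an assumption of this sketch, and the second and third titles are invented.

```python
import pandas as pd

pattern = r'([A-Z]\d{2}[Ee])-(\w[A-Za-z0-9]*)?-?([A-Z])(\d{5})(X)?'
cols = ['extr_project', 'extr_invalid_code', 'extr_function', 'extr_pn', 'extr_maturity']

titles = pd.Series(['T48E-01-Z00005', 'T48E-Z00001X', 'NOT-A-PART'])
parts = titles.str.extract(pattern, expand=True)
parts.columns = cols
print(parts)
```

A title in the expected 11 character form leaves `extr_invalid_code` empty, so a non-empty value in that column points at an extra segment such as '01'. A title that does not match at all comes back with every group empty.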

# 2. Script config setup

In [31]:
import pandas as pd
import numpy as np
import configparser
import os
import re
import io
import xlwings as xw
import openpyxl
from pathlib import Path
import argparse
import platform
import sys


Function to determine whether we're running in a Jupyter notebook or as a command line script

In [32]:
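def type_of_script():
    '''
        determine where this script is running
        return either jupyter, ipython, terminal
    '''
    try:
        ipy_str = str(type(get_ipython()))
    except NameError:
        # get_ipython is only defined inside IPython/Jupyter sessions
        return 'terminal'

    if 'zmqshell' in ipy_str:
        return 'jupyter'
    if 'terminal' in ipy_str:
        return 'ipython'
    # unrecognised IPython shell: treat it as a plain terminal
    return 'terminal'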

In [33]:
def add_missing_metrics(df):
    for col in ['UOM','Provide','Source Code','CAD Mass','CAD Maturity','CAD Material','Electrical Connector']:
        df['Missing {}'.format(col)] = np.where(df[col].isnull(), 1, 0)

    return df

In [34]:
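def CAD_Material_validation(df):
    # count the CAD Material values used by each TPP part, skipping the first row
    subset = df.iloc[1:]
    return subset[subset['Title'].str.contains('TPP', na=False)].groupby(['Title','CAD Material']).size()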


In [35]:
def add_bi_key(df):
    # for use in power_bi reporting
    # replace NaN with '' so missing values do not end up as the text 'nan' in the key
    df['bi_combined_key'] = (df['Product'].fillna('').astype(str)
                             + df['Function Group'].fillna('').astype(str)
                             + df['System'].fillna('').astype(str)
                             + df['Sub System'].fillna('').astype(str))
    
    return df['bi_combined_key']
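
A short sketch of the key on hypothetical rows, one of them with no System or Sub System. Filling the gaps with an empty string first keeps the missing parts out of the key.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second has no System or Sub System
bom = pd.DataFrame({
    'Product':        ['T48E', 'T48E'],
    'Function Group': ['CHASSIS SYSTEMS', 'CHASSIS SYSTEMS'],
    'System':         ['FRONT AXLE', np.nan],
    'Sub System':     ['HUBS', np.nan],
})
key_cols = ['Product', 'Function Group', 'System', 'Sub System']

# fill the gaps, then join the four columns row by row
clean_key = bom[key_cols].fillna('').astype(str).agg(''.join, axis=1)
print(clean_key.tolist())
```

The key has no separator between its parts, matching the concatenation above. If two different combinations could ever join to the same text, adding a delimiter would be worth considering.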

Determine the folder structure based on whether we're running on a test Windows PC, on the Azure server, on a Mac, or in the real world against SharePoint.  This helps Mark test on different devices!

In [36]:
def set_folder_defaults():
    if 'macOS' in platform.platform():
        # set some defaults for testing on mac
        download_dir = Path('/Users/mark/Downloads')
        user_dir = download_dir
        sharepoint_dir = download_dir

    elif 'Server' in platform.platform():
        # we're on the azure server (probably)
        user_dir = Path('Z:/python/FilesIn')

        download_dir = Path(user_dir)
        user_dir = download_dir
        sharepoint_dir = Path('Z:/python/FilesOut')

    elif os.getlogin() == 'mark_':
        # my test windows machine
        download_dir = Path('C:/Users/mark_/Downloads')
        user_dir = download_dir
        sharepoint_dir = download_dir        

    else:
        # personal one drive, under the profile of the current logged on user
        user_dir = Path('C:/Users') / os.getlogin()

        # read in config file
        config = configparser.ConfigParser()
        config.read('user_directory.ini')

        # read in gm_dir and gm_docs from config file
        gm_dir = Path(config[os.getlogin().lower()]['gm_dir'])
        gm_docs = Path(config[os.getlogin().lower()]['gmt'])
        # this may find more than one sharepoint directory
        sharepoint_dir = user_dir / gm_dir / gm_docs

        download_dir = sharepoint_dir / 'Data Shuttle' / 'downloads'

    return sharepoint_dir, download_dir, user_dir

Based on the folder defaults, look for the files we're interested in.

In [37]:
def find_files(download_dir):
    # walk the downloads directory and collect every file with 'Updated_' in its name
    dirpath = download_dir
    files = []
    for p, ds, fs in os.walk(dirpath):
        for fn in fs:
            if 'Updated_' in fn:
                filepath = os.path.join(p, fn)
                files.append(filepath)

    return files
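
If the search should also be limited to recently changed files, a sketch along these lines could work. `find_recent_files`, `name_part` and `max_age_hours` are hypothetical names introduced here, and the two hour default is only an example value.

```python
import time
from pathlib import Path

def find_recent_files(download_dir, name_part='Updated_', max_age_hours=2):
    """Hypothetical helper: files under download_dir whose name contains
    name_part and that were modified within the last max_age_hours."""
    cutoff = time.time() - max_age_hours * 3600
    return sorted(
        p for p in Path(download_dir).rglob('*')
        if p.is_file() and name_part in p.name and p.stat().st_mtime >= cutoff
    )
```

It returns `Path` objects rather than strings, which fits the way the rest of the script builds paths with `pathlib`.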

In [38]:
def lookup_variant(search):

    variant_d = {'T48E-01-Z00001':'VP_5_door',
                'T48E-01-Z00005':'XP_5_door',
                'T48E-02-Z00001':'VP_3_door',
                'T48E-02-Z00005':'XP_3_door'}
    
    return variant_d[search]
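
The lookup key comes from the file name. The sketch below shows how the sample file name used later in the script breaks down. The `'unknown'` fallback is an assumption added here; `lookup_variant` itself raises a `KeyError` for a part number it does not know.

```python
import re

filename = 'Updated_T48e-01-Z00005_2024-07-26.xlsx'

# the second underscore-separated token is the top-level part number
top_part = filename.split('_')[1].upper()
# splitting on '_' or '-' isolates the product code as the second token
product = re.split('_|-', filename)[1].upper()

variant_d = {'T48E-01-Z00005': 'XP_5_door'}    # subset of the lookup table
variant = variant_d.get(top_part, 'unknown')   # fallback value is an assumption
print(top_part, product, variant)
```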

# 3. Write to excel

Call xlwings with your pre-prepared dictionary and write out many sheets to one excel file, naming the sheets whatever you called your dictionary keys

In [39]:
def write_to_xl(outfile, df_dict):
    import xlwings as xw
    with xw.App(visible=True) as app:
        try:
            wb = xw.Book(outfile)
            print ("writing to existing {}".format(outfile))
        except FileNotFoundError:
            # create a new book
            print ("creating new {}".format(outfile))
            wb = xw.Book()
            wb.save(outfile)

        for key in df_dict.keys():
            try:
                ws = wb.sheets.add(key)
            except Exception as e:
                # typically the sheet already exists
                print (e)
            
            ws = wb.sheets[key]

            table_name = key

            ws.clear()

            df = df_dict[key].set_index(list(df_dict[key])[0])
            # compare against the names of the tables already on the sheet
            if table_name in [table.name for table in ws.tables]:
                ws.tables[table_name].update(df)
            else:
                ws.tables.add(source=ws['A1'],
                              name=table_name).update(df)

        # save while the Excel app is still open
        wb.save(outfile)

## 3.1 Write out to excel using sub system

This was used previously (GMD) and might be useful again, so it has not been removed, but it is not currently called.

Writes out the checks to sheets filtered by sub system.  This may be useful if we want to give the problem rows to a team to manage.

In [40]:
def write_to_xl_sub_system(dict_checks, check, writer):
    # check is the dict_checks key to split by sub system; writer is an open pd.ExcelWriter
    # excel_formatting is assumed to be a project helper module; it is not imported in this notebook
    sub_sys = dict_checks[check]['Sub System'].unique()
    sub_sys.sort()

    for s_sys in sub_sys:

        df_temp = dict_checks[check][dict_checks[check]['Sub System'] == s_sys]

        if df_temp.shape[0] > 0:
            df_temp.to_excel(writer, sheet_name=s_sys, index=False)

            ws = writer.sheets[s_sys]

            excel_formatting.adjust_col_width_from_col(ws)

# 4. Main Processing

This is where the processing begins, and where we call the functions defined above.  

In [41]:
if __name__ == '__main__':

    # for reading in multiple files

    # files = find_files()
    dict_df = {}

    filename = 'Updated_T48e-01-Z00005_2024-07-26.xlsx'

    # variant ie T48E-01-Z00001
    variant = lookup_variant(re.split('_', filename)[1].upper())
    # product ie T48E
    product = re.split('_|-', filename)[1].upper()

    sharepoint_dir, download_dir, user_dir = set_folder_defaults()

    file = Path(download_dir) / filename

    with open(file, "rb") as f:
        # reading in the historic excel files; the with block closes the file on exit
        df = pd.read_excel(f, parse_dates=True)

    # copy df without the 1st row of metrics and 1st col of BOM COUNTs
    df = df.iloc[1:,1:]        

    df.reset_index(drop=False, inplace=True)
    df.rename(columns={'index':'bom_order'}, inplace=True)

    # add variant column for merging multiple BOMs together and reporting on 1 dashboard
    df['Variant'] = variant
    df['Product'] = product
    


In [42]:
    df['bi_combined_key'] = add_bi_key(df)    

    df = add_missing_metrics(df)
    # add part validation
    df = validate_part_no(df)

    # add parent source code to each row for validation checks to come
    df = parent_source_code(df)

    # initialise a dictionary to store all the check output as dataframes
    dict_checks = {}

    # complete the source code with parent source code checks
    dict_checks, df = source_code_within_parent_checks(dict_checks, df)

    # check all level 4 have the correct source code
    dict_checks, df = check_level_4_source_code_checks(dict_checks, df)

    # check for FAS with the wrong source code
    dict_checks, df = FAS_wrong_parent_source_code(dict_checks, df)

    # complete the fasteners source code checks
    dict_checks, df = fastener_checks(dict_checks, df)

    # look for make assemblies with no parts to buy
    df, make_no_buy = check_make_no_buy(df)
    dict_checks['make_no_buy'] = df.loc[make_no_buy]

    # validate the 3dx attributes that have dropdowns
    df = check_attributes(df)

    # write out just the cols we need to report against
    dict_checks = filter_check_columns(dict_checks)

    # add the full df to the sheet
    dict_checks['BOM'] = df



AIH_POA
BOP_FIP
FAS_FAS
FIP_FIP
FIP_FAS
Non_MIH_AIH_Level_4
FAS_Wrong_Parent_Source_code
FAS_as_BOF
make_no_buy


# 5. Development

Dumping ground for checks that might come in

## Non Level 4 ASSY

Should any part with ASSY in description be at Level 4 only?

In [43]:
def Non_level_4_ASSY(df):
    # count the parts with ASSY in the description at each level
    return df[df.Description.str.contains('ASSY', na=False)].groupby(['Level']).size()


## Multiple Source Codes

Check whether a part has been configured with more than one source code within the product structure

Is it valid to have a TFF part that's FAS and POA?

In [44]:
def multi_source_code(df):
    unstacked = df.groupby(['Title','Revision','Source Code']).size().unstack()

    # one column per unique source code, so a part with a single source code
    # has NaN in every column except one
    expected_nan_count = len(unstacked.columns) - 1
    unstacked2 = unstacked[unstacked.isna().sum(axis=1) != expected_nan_count]

    multi_sc = unstacked2.reset_index().fillna('')

    # put the identifying columns first, followed by the source code columns
    first_cols = ['Title', 'Revision']
    sc_ordered_cols = first_cols + multi_sc.columns.drop(first_cols).tolist()

    multi_sc = multi_sc[sc_ordered_cols]

    return multi_sc
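
On a toy example the idea looks like this. The part numbers are invented, and the filter is written as "more than one non-empty source code column", which is an equivalent way of expressing the NaN count test used above.

```python
import pandas as pd

# Hypothetical rows: TFF-001 appears as both FAS and POA
bom = pd.DataFrame({
    'Title':       ['TFF-001', 'TFF-001', 'T48E-Z00001', 'T48E-Z00001'],
    'Revision':    ['A', 'A', 'A', 'A'],
    'Source Code': ['FAS', 'POA', 'BOF', 'BOF'],
})

counts = bom.groupby(['Title', 'Revision', 'Source Code']).size().unstack()
# keep parts that appear under more than one source code
multi = counts[counts.notna().sum(axis=1) > 1].reset_index()
print(multi)
```

The part that appears twice with the same source code is not reported, only the one configured with two different codes.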


### Write out source code checks to excel

In [45]:
# Write out to excel
# pathfile = Path(file.name).stem
# outfile_name = Path(product + '_' + variant + '_power_bi_metrics'
# output_file = Path(sys.path[0]) / Path(pathfile + '_validated').with_suffix('.xlsx')
outfile_name = product + '_' + variant
output_file = Path(sys.path[0]) / Path(outfile_name + '_power_bi_metrics').with_suffix('.xlsx')
# write_to_xl(output_file, dict_checks)

# using inline write to excel as this seems to work better on mac.  
outfile = output_file
df_dict = dict_checks

import xlwings as xw
try:
    wb = xw.Book(output_file)
    print ("writing to existing {}".format(outfile))
except FileNotFoundError:
    # create a new book
    print ("creating new {}".format(outfile))
    wb = xw.Book()
    wb.save(outfile)

for key in df_dict.keys():
    try:
        ws = wb.sheets.add(key)
    except Exception as e:
        print (e)
    
    ws = wb.sheets[key]

    table_name = key

    ws.clear()

    df = df_dict[key].set_index(list(df_dict[key])[0])
    if len(df) > 0:
        if table_name in [table.name for table in ws.tables]:
            ws.tables[table_name].update(df)
        else:
            table_name = ws.tables.add(source=ws['A1'],
                                        name=table_name).update(df)

creating new /Users/mark/Documents/BOM_from_3DX/T48E_XP_5_door_power_bi_metrics.xlsx
Command failed:
		OSERROR: -1728
		MESSAGE: The object you are trying to access does not exist
		COMMAND: app(pid=46303).workbooks['T48E_XP_5_door_power_bi_metrics.xlsx'].sheets['Sheet2'].name.get()
Command failed:
		OSERROR: -1728
		MESSAGE: The object you are trying to access does not exist
		COMMAND: app(pid=46303).workbooks['T48E_XP_5_door_power_bi_metrics.xlsx'].sheets['Sheet3'].name.get()
Command failed:
		OSERROR: -1728
		MESSAGE: The object you are trying to access does not exist
		COMMAND: app(pid=46303).workbooks['T48E_XP_5_door_power_bi_metrics.xlsx'].sheets['Sheet4'].name.get()
Command failed:
		OSERROR: -1728
		MESSAGE: The object you are trying to access does not exist
		COMMAND: app(pid=46303).workbooks['T48E_XP_5_door_power_bi_metrics.xlsx'].sheets['Sheet5'].name.get()
Command failed:
		OSERROR: -1728
		MESSAGE: The object you are trying to access does not exist
		COMMAND: app(pid=46303

In [46]:
struct_filename = 'BoM and Function Group Structure GMT.xlsx'

struct = pd.read_excel(struct_filename, sheet_name='T48E')
# drop first row of struct which should be project, model variant, function group area, systems, sub systems, AMs
struct = struct.loc[1:]

# find the first nan row in Level 4s as this will show where the last row in the valid values ends
last_row = struct['Level 4'].isna().idxmax()-1

# drop the rest of the struct rows
struct = struct.loc[:last_row]

# and now create a dictionary of all the valid values for each column - this drops the nan values for each column where we read merged cells from excel
struct_d = {struct[column].name: [y for y in struct[column] if not pd.isna(y)] for column in struct}





In [47]:
for lvl in range(0,5):
    if len(list(struct_d['Level {}'.format(lvl)])) != df[df['Level']==lvl+1].shape[0]:
        print ("Function Group Area mismatch.  Expected {} but got {}".format())


IndexError: Replacement index 0 out of range for positional args tuple

In [None]:
for lvl in range(0,5):
    expected = len(struct_d['Level {}'.format(lvl+1)])
    got = df[df['Level']==lvl].shape[0]
    if  expected != got:
        print ("Level {}: Function/System Group mismatch.  Expected {} but got {}".format(lvl, expected, got))


Level 2: Function/System Group mismatch.  Expected 24 but got 22
Level 3: Function/System Group mismatch.  Expected 81 but got 75
Level 4: Function/System Group mismatch.  Expected 21 but got 367


In [48]:
df[df['Level']==2][['Title','Description']]

bom_order,Title,Description
3,T48E-01-A01,STRUCTURE SYSTEMS-5 DOOR-XP
708,T48E-01-B01,FRONT AXLE-5 DOOR-XP
878,T48E-01-B02,REAR AXLE-5 DOOR-XP
1051,T48E-01-B03,WHEELS & TYRES-5 DOOR-XP
1175,T48E-01-C01,BRAKE ACTUATION SYSTEMS-5 DOOR-XP
1256,T48E-01-D01,STEERING SYSTEMS-5 DOOR-XP
1285,T48E-01-E01,AIR SYSTEMS-5 DOOR-XP
1319,T48E-01-F01,ELECTRIC POWER SYSTEMS-XP
1331,T48E-01-G01,ELECTRIC DRIVE SYSTEMS-XP
1348,T48E-01-J01,POWERTRAIN NVH & HEATSHIELDS-XP


In [49]:
df.filter(regex='Function|Source Code').groupby(['Function Group','Source Code']).count()

Function Group,Source Code,Function System Combined,Source Code Name,Missing Source Code,Parent Source Code,Function Group System Combined Check,Source Code Name Check,Source Code Check
BODY EXTERIORS-5 DOOR-XP,AIH,44,182,182,182,182,182,182
BODY EXTERIORS-5 DOOR-XP,BOF,51,139,139,139,139,139,139
BODY EXTERIORS-5 DOOR-XP,BOP,0,2,2,2,2,2,2
BODY EXTERIORS-5 DOOR-XP,CON,23,27,27,27,27,27,27
BODY EXTERIORS-5 DOOR-XP,ENG,0,15,15,15,15,15,15
BODY EXTERIORS-5 DOOR-XP,FAS,10,23,23,21,23,23,23
BODY EXTERIORS-5 DOOR-XP,POA,124,194,194,185,194,194,194
BODY EXTERIORS-5 DOOR-XP,RAW,4,4,4,4,4,4,4
BODY EXTERIORS-5 DOOR-XP,SYS,1,19,19,19,19,19,19
BODY STRUCTURES-5 DOOR-XP,AIH,57,93,93,93,93,93,93


In [50]:
pd.crosstab([df['Function Group']], df['Source Code'])
# pd.crosstab(power_bi['Function Group'], power_bi[col], margins=True, margins_name='Totals', dropna=False, normalize=True).iloc[:-1].round(4)*100

Function Group,AIH,BOF,BOP,CON,ENG,FAS,Joining,POA,RAW,SOF,SYS
BODY EXTERIORS-5 DOOR-XP,182,139,2,27,15,23,0,194,4,0,19
BODY STRUCTURES-5 DOOR-XP,93,207,2,20,0,57,1,213,0,0,6
CHASSIS SYSTEMS-5 DOOR-XP,69,128,65,0,3,32,0,266,0,0,21
ELECTRIC POWERTRAIN SYSTEMS-XP,67,75,3,0,3,24,0,188,0,0,14
ELECTRICAL SYSTEMS-5 DOOR-XP,100,40,78,0,411,0,0,78,0,1,23
INTERIOR & HVAC SYSTEMS-5 DOOR-XP,51,48,10,0,0,0,0,11,0,0,14
LIFESTYLE PRODUCTS-XP,0,0,0,0,0,0,0,1,0,0,5
MASTER PRODUCT-5 DOOR-XP,0,0,0,0,0,0,0,0,0,0,1
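
For reporting, the counts can also be shown as a share of each Function Group. A minimal sketch on made-up data using the `normalize` argument of `pd.crosstab`:

```python
import pandas as pd

# Hypothetical rows, five parts across two function groups
bom = pd.DataFrame({
    'Function Group': ['BODY', 'BODY', 'BODY', 'CHASSIS', 'CHASSIS'],
    'Source Code':    ['AIH', 'BOF', 'BOF', 'POA', 'FAS'],
})

counts = pd.crosstab(bom['Function Group'], bom['Source Code'])
# normalize='index' makes each row sum to 1; scale to percentages
share = (pd.crosstab(bom['Function Group'], bom['Source Code'], normalize='index') * 100).round(1)
print(share)
```

Row-wise percentages make function groups of very different sizes comparable on one dashboard, which raw counts do not.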


In [59]:
missing_sc = df.filter(regex='Function Group$|Missing|Source Code$').groupby(['Function Group','Source Code']).count()

In [60]:
def create_heatmap(df, figsize):
    import matplotlib.pyplot as plt
    import seaborn as sns

    hmap = plt.figure(figsize=figsize)
    ax = sns.heatmap(df, annot = True, fmt=".0%", cmap='YlGnBu', annot_kws={'fontsize':8}, linewidths=0.5)
    ax.set(xlabel="", ylabel="")
    ax.xaxis.tick_top()
    # set the tick label size on this axes (plt.rc only affects figures created afterwards)
    ax.tick_params(axis='both', labelsize=10)
    cbar = ax.collections[0].colorbar
    cbar.set_ticks([0, .2, .75, 1])
    cbar.set_ticklabels(['0%', '20%', '75%', '100%'])
    # close the figure so the notebook does not draw it twice; the caller displays the returned figure
    plt.close(hmap)
    return hmap

In [76]:
missing_sc

Function Group,Source Code,Missing UOM,Missing Provide,Missing Source Code,Missing CAD Mass,Missing CAD Maturity,Missing CAD Material,Missing Electrical Connector,Parent Source Code
BODY EXTERIORS-5 DOOR-XP,AIH,182,182,182,182,182,182,182,182
BODY EXTERIORS-5 DOOR-XP,BOF,139,139,139,139,139,139,139,139
BODY EXTERIORS-5 DOOR-XP,BOP,2,2,2,2,2,2,2,2
BODY EXTERIORS-5 DOOR-XP,CON,27,27,27,27,27,27,27,27
BODY EXTERIORS-5 DOOR-XP,ENG,15,15,15,15,15,15,15,15
BODY EXTERIORS-5 DOOR-XP,FAS,23,23,23,23,23,23,23,21
BODY EXTERIORS-5 DOOR-XP,POA,194,194,194,194,194,194,194,185
BODY EXTERIORS-5 DOOR-XP,RAW,4,4,4,4,4,4,4,4
BODY EXTERIORS-5 DOOR-XP,SYS,19,19,19,19,19,19,19,19
BODY STRUCTURES-5 DOOR-XP,AIH,93,93,93,93,93,93,93,93
