## Setup

*You must run the cells in this section each time you connect to a new runtime: for example, when you return to the notebook after an idle timeout, when the runtime crashes, or when you restart or factory reset the runtime.*

Install requirements (*Note: ocdskingfishercolab installs google-colab, which expects specific versions of pandas and numpy*):


In [None]:
! pip install --upgrade pip > pip.log
! pip install --upgrade ocdskingfishercolab ipywidgets psycopg2-binary >> pip.log

In [None]:
# @title Import packages and load extensions { display-mode: "form" }

import gzip
import json
import os
import shutil
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta
from google.colab.data_table import DataTable
from google.colab.files import download
from ipywidgets import widgets
from ocdskingfishercolab import (
    authenticate_gspread,
    calculate_coverage,
    download_dataframe_as_csv,
    format_thousands,
    render_json,
    save_dataframe_to_sheet,
    save_dataframe_to_spreadsheet,
    set_dark_mode,
    set_light_mode,
)

# Load https://pypi.org/project/ipython-sql/
%load_ext sql
# Load https://colab.research.google.com/notebooks/data_table.ipynb
%load_ext google.colab.data_table

In [None]:
# @title Configure the notebook environment { display-mode: "form" }

# Increase max columns so that Pandas DataFrames with many columns are rendered as data tables.
DataTable.max_columns = 50
# Remove the index from data tables for easier copy-pasting to Google Docs.
DataTable.include_index = False

# Return Pandas DataFrames instead of regular result sets.
%config SqlMagic.autopandas = True
# Don't print number of rows affected.
%config SqlMagic.feedback = False

# If you set Tools > Settings > Site > Theme to dark, uncomment this line.
# set_dark_mode()
# If you are creating plots to copy-paste into reports, uncomment this line.
# set_light_mode()

## Set up Kingfisher Process

### Connect to the database

In [None]:
import getpass

from ocdskingfishercolab import (
    list_collections,
    list_source_ids,
    set_search_path,
)

Enter your PostgreSQL credentials and connect to the Kingfisher Process database:

In [None]:
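from urllib.parse import quote

user = input("Username:")
password = getpass.getpass("Password:")

# Don't show connection string after execute.
%config SqlMagic.displaycon = False

# Percent-encode the credentials, so that special characters like "@" or "/" don't break the URL.
connection_string = (
    "postgresql://"
    + quote(user, safe="")
    + ":"
    + quote(password, safe="")
    + "@postgres.kingfisher.open-contracting.org/kingfisher_process?sslmode=require"
)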
%sql $connection_string

### Choose collections and schema

*Use this section to choose the collections and schema that you want to query.*

#### Set the collection(s)

Update `collection_ids` with the `id`(s) of the [Kingfisher Process collection(s)](https://kingfisher-process.readthedocs.io/en/latest/data-model.html#collections):

In [None]:
collection_ids = (2358, 2359)

If you don't know which collections you need, run the next cell and use the **Filter** button to filter the [collection table](https://kingfisher-process.readthedocs.io/en/latest/database-structure.html#collection-table) to find the collection(s). You can use the `source_id` column to filter on the `name` of the [Kingfisher Collect spider](https://kingfisher-collect.readthedocs.io/en/latest/spiders.html) used to collect the data. Use the value(s) from the `id` column to update the previous cell.

In [None]:
list_collections()

#### Set the schema

Update `schema_name` with the name of the [Kingfisher Summarize schema](https://kingfisher-summarize.readthedocs.io/en/latest/index.html#how-it-works).

In [None]:
schema_name = "view_data_collection_2358_2359"
set_search_path(schema_name)

If you don't know which schema you need, run the next cell and use the **Filter** button to filter the [selected collections table](https://kingfisher-summarize.readthedocs.io/en/latest/database.html#summaries-selected-collections) to find the schema. You can use the `collection_id` column to filter on the `id` of the collections that you identified in the previous step. Alternatively, you can filter on the `source_id` column. Use the value from the `schema` column to update the previous cell.

In [None]:
%%sql
SELECT
    summaries.selected_collections.*,
    source_id
FROM
    summaries.selected_collections
INNER JOIN
    collection
    ON summaries.selected_collections.collection_id = collection.id


If you can't find a schema containing the collections that you want to query, you can create a schema using [Kingfisher Summarize](https://ocdsdeploy.readthedocs.io/en/latest/use/kingfisher-summarize.html).

## Red flags analysis setup

Use this section to set up the functions needed for a usability analysis of the dataset, which checks whether a publisher provides the fields required to calculate each of the 73 red flag indicators.

In [None]:
# @title Red flags functions { display-mode: "form" }
def check_red_flags_indicators(result):

    gc = authenticate_gspread()

    # NEW Red Flags to OCDS mapping #Public
    worksheet = gc.open_by_key("1GACSPd64X5Tm-nu6LKttyEpaEp1CLsaCUGrEutljnFU").get_worksheet(1)

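    # get_all_values returns a list of rows, each a list of cell values.
    rows = worksheet.get_all_values()

    # Convert to a DataFrame, using the first row as the header.
    indicators = pd.DataFrame(rows)
    indicators = indicators.rename(columns=indicators.iloc[0]).drop(indicators.index[0])
    # Keep the columns at positions 0, 5, 6 and 7 (these must include R_id, which is used for the merge).
    indicatorsdf = indicators.iloc[:, [0, 5, 6, 7]]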

    return result.merge(indicatorsdf, on="R_id")


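def check_rule(rule, fields_list):
    """Check if a DSL rule is satisfied against available fields.

    A rule is a field path (string), {"all": [...]} (every sub-rule must be satisfied)
    or {"any": [...]} (at least one sub-rule must be satisfied).

    Returns a tuple of (satisfied, missing_fields) where satisfied is a boolean
    and missing_fields is a list of fields that are missing. For an unsatisfied "any"
    rule, missing_fields is the shortest list of missing fields among its sub-rules.
    """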
    if isinstance(rule, str):
        satisfied = rule in fields_list
        return satisfied, [] if satisfied else [rule]
    if "all" in rule:
        missing = []
        for sub in rule["all"]:
            ok, sub_missing = check_rule(sub, fields_list)
            if not ok:
                missing.extend(sub_missing)
        return len(missing) == 0, missing
    if "any" in rule:
        all_missings = []
        for sub in rule["any"]:
            ok, sub_missing = check_rule(sub, fields_list)
            if ok:
                return True, []
            all_missings.append(sub_missing)
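        best = min(all_missings, key=len) if all_missings else []
        return False, best
    # Fail with a clear message instead of implicitly returning None, which callers can't unpack.
    raise ValueError(f"Unsupported rule: {rule!r}")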


def get_all_fields(rule):
    """Get all field paths referenced in a DSL rule."""
    if isinstance(rule, str):
        return [rule]
    if "all" in rule:
        fields = []
        for sub in rule["all"]:
            fields.extend(get_all_fields(sub))
        return fields
    if "any" in rule:
        fields = []
        for sub in rule["any"]:
            fields.extend(get_all_fields(sub))
        return fields
    return []


def most_common_fields_to_calculate_indicators(indicators_dict, fields_table):
    flat_list = []
    for data in indicators_dict.values():
        flat_list.extend(get_all_fields(data["rule"]))
    fields_list = Counter(flat_list)

    fields_count = (
        pd.DataFrame.from_dict(fields_list, orient="index")
        .reset_index()
        .rename(columns={"index": "field", 0: "number of indicators"})
    )

    fields_count = fields_count.sort_values("number of indicators", ascending=False).reset_index(drop=True)
    fields_count["published"] = np.where(fields_count["field"].isin(fields_table["path"]), "yes", "no")

    return fields_count


def get_coverage(indicators_dic):
    coverage = []
    for data in indicators_dic.values():
        fields = get_all_fields(data["rule"])
        result = calculate_coverage(fields, "release_summary")
        result_value = pd.to_numeric(result["total_percentage"][0])
        coverage.append(result_value)
    return coverage


_RF_BUYER = {
    "any": [
        {"all": ["buyer/name", "buyer/id"]},
        {"all": ["tender/procuringEntity/name", "tender/procuringEntity/id"]},
        {"all": ["parties/name", "parties/id", "parties/roles"]},
    ]
}

_RF_BIDDERS = {"any": ["tender/tenderers/id", "bids/details/tenderers/id"]}

_RF_BIDDERS_COUNT = {
    "any": [
        "tender/numberOfTenderers",
        "tender/tenderers/id",
        "bids/details/tenderers/id",
        "bids/statistics/value",
    ]
}

_RF_ITEMS = {
    "any": [
        {"all": ["tender/items/classification/id", "tender/items/classification/scheme"]},
        {"all": ["awards/items/classification/id", "awards/items/classification/scheme"]},
        {"all": ["contracts/items/classification/id", "contracts/items/classification/scheme"]},
    ]
}

_RF_UNIT_ITEMS = {
    "any": [
        {"all": ["tender/items/unit/value/amount", "tender/items/unit/value/currency"]},
        {"all": ["awards/items/unit/value/amount", "awards/items/unit/value/currency"]},
        {"all": ["contracts/items/unit/value/amount", "contracts/items/unit/value/currency"]},
    ]
}

_RF_DATE = {"any": ["tender/tenderPeriod/startDate", "awards/date"]}

_RF_AMOUNT = {
    "any": [
        "tender/value/amount",
        "bids/details/value/amount",
        "awards/value/amount",
        "contracts/value/amount",
    ]
}

_RF_WIN_BID = {
    "any": [
        {"all": ["bids/awards/relatedBid"]},
        {"all": ["bids/details/tenderers/id", "awards/suppliers/id"]},
    ]
}

_RF_BIDDER_INFO = {
    "any": [
        "parties/contactPoint/telephone",
        "parties/address/streetAddress",
        "parties/address/postalCode",
    ]
}

_RF_CONTACT_INFO = {
    "any": [
        "parties/contactPoint/telephone",
        "parties/contactPoint/email",
        "parties/contactPoint/name",
    ]
}

_RF_AWARD_CONTRACT_VALUE = {
    "any": [
        {"all": ["awards/status", "awards/date", "awards/value/amount", "awards/value/currency"]},
        {"all": ["contracts/status", "contracts/dateSigned", "contracts/value/amount", "contracts/value/currency"]},
    ]
}

_RF_IMP_VALUE = {
    "any": [
        {"all": ["contracts/implementation/finalValue/amount", "contracts/implementation/finalValue/currency"]},
        {
            "all": [
                "contracts/implementation/transactions/value/amount",
                "contracts/implementation/transactions/value/currency",
            ]
        },
    ]
}

RED_FLAGS_REQUIREMENTS = {
    "Planning documents not available": {
        "id": ["R001"],
        "rule": {"all": ["planning/documents/documentType"]},
    },
    "Manipulation of procurement thresholds": {
        "id": ["R002"],
        "rule": {
            "all": [
                "tender/value/amount",
                "tender/value/currency",
                "tender/procurementMethod",
                "tender/tenderPeriod/startDate",
                _RF_BUYER,
            ]
        },
    },
    "The submission period is too short": {
        "id": ["R003"],
        "rule": {"all": ["tender/tenderPeriod/startDate", "tender/tenderPeriod/endDate", "tender/procurementMethod"]},
    },
    "Failure to adequately advertise the request for bids": {
        "id": ["R004"],
        "rule": {
            "all": [
                "tender/documents/documentType",
                "tender/documents/datePublished",
                "tender/tenderPeriod/startDate",
            ]
        },
    },
    "Key tender information and documents are not available": {
        "id": ["R005"],
        "rule": {
            "all": [
                "tender/documents/documentType",
                "tender/documents/datePublished",
                "tender/tenderPeriod/startDate",
                "tender/tenderPeriod/endDate",
            ]
        },
    },
    "Unreasonable prequalification requirements": {
        "id": ["R006"],
        "rule": {"all": ["tender/eligibilityCriteria"]},
    },
    "Unreasonable technical specifications": {
        "id": ["R007"],
        "rule": {
            "all": [
                "tender/documents/documentType",
                "tender/procurementMethod",
                _RF_ITEMS,
                _RF_BUYER,
                "tender/value/amount",
            ]
        },
    },
    "Unreasonable participation fees": {
        "id": ["R008"],
        "rule": {
            "all": [
                "tender/participationFees/value/amount",
                "tender/participationFees/value/currency",
                "tender/value/amount",
            ]
        },
    },
    "Buyer increases the cost of the bidding documents": {
        "id": ["R009"],
        "rule": {
            "all": [
                "tender/participationFees/value/amount",
                "tender/participationFees/value/currency",
                "date",
            ]
        },
    },
    "Unjustified use of non competitive procedure": {
        "id": ["R010"],
        "rule": {
            "all": [
                "tender/procurementMethod",
                "tender/procurementMethodDetails",
                "tender/procurementMethodRationale",
            ]
        },
    },
    "Splitting purchases to avoid procurement thresholds": {
        "id": ["R011"],
        "rule": {
            "all": [
                "tender/procurementMethod",
                _RF_ITEMS,
                "tender/value/amount",
                "tender/value/currency",
                "tender/tenderPeriod/startDate",
                _RF_BUYER,
            ]
        },
    },
    "Direct awards in contravention of the provisions of the procurement plan": {
        "id": ["R012"],
        "rule": {
            "all": [
                "tender/procurementMethod",
                "tender/procurementMethodDetails",
                "planning/documents/documentType",
            ]
        },
    },
    "High use of non competitive methods": {
        "id": ["R013"],
        "rule": {"all": ["tender/procurementMethod", _RF_BUYER]},
    },
    "Short time between tender advertising and bid opening": {
        "id": ["R014"],
        "rule": {
            "all": [
                "tender/tenderPeriod/startDate",
                "tender/bidOpening/date",
                "tender/procurementMethod",
            ]
        },
    },
    "Long time between bid opening and bid evaluation": {
        "id": ["R015"],
        "rule": {
            "all": [
                "tender/bidOpening/date",
                "tender/awardPeriod/startDate",
                "tender/procurementMethod",
            ]
        },
    },
    "Tender value is higher or lower than average for this item category": {
        "id": ["R016"],
        "rule": {
            "all": [
                "tender/value/amount",
                "tender/value/currency",
                _RF_ITEMS,
                "tender/procurementMethod",
            ]
        },
    },
    "Unreasonably low or high line item": {
        "id": ["R017"],
        "rule": {"all": [_RF_ITEMS, _RF_UNIT_ITEMS]},
    },
    "Single bid received": {
        "id": ["R018"],
        "rule": {"all": ["tender/procurementMethod", _RF_BIDDERS_COUNT]},
    },
    "Low number of bidders for item and procuring entity": {
        "id": ["R019"],
        "rule": {"all": ["tender/procurementMethod", _RF_ITEMS, _RF_BUYER, _RF_BIDDERS_COUNT]},
    },
    "Tender has a complaint": {
        "id": ["R020"],
        "rule": {"all": ["complaints/id"]},
    },
    "High use of discretionary evaluation criteria": {
        "id": ["R021"],
        "rule": {"all": ["tender/awardCriteria", _RF_BUYER]},
    },
    "Wide disparity in bid prices": {
        "id": ["R022"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
            ]
        },
    },
    "Fixed multiple bid prices": {
        "id": ["R023"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
            ]
        },
    },
    "Price close to winning bid": {
        "id": ["R024"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
                _RF_WIN_BID,
            ]
        },
    },
    "Excessive unsuccessful bids": {
        "id": ["R025"],
        "rule": {"all": ["awards/suppliers/id", "bids/details/status", _RF_BIDDERS]},
    },
    "Prevalence of consortia": {
        "id": ["R026"],
        "rule": {
            "all": [
                "awards/suppliers/id",
                "awards/suppliers/name",
                "awards/status",
                "awards/date",
                _RF_ITEMS,
            ]
        },
    },
    "Missing bidders": {
        "id": ["R027"],
        "rule": {"all": ["tender/procurementMethod", _RF_ITEMS, _RF_BIDDERS, _RF_DATE]},
    },
    "Identical bid prices": {
        "id": ["R028"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/tenderers/id",
            ]
        },
    },
    "Bid prices deviate from Benford\u2019s Law": {
        "id": ["R029"],
        "rule": {"all": [_RF_AMOUNT, _RF_ITEMS]},
    },
    "Late bid won": {
        "id": ["R030"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/date",
                "bids/details/status",
                "tender/tenderPeriod/endDate",
                _RF_WIN_BID,
            ]
        },
    },
    "Winning bid price very close or higher than estimated price": {
        "id": ["R031"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
                "tender/value/amount",
                "tender/value/currency",
                _RF_WIN_BID,
            ]
        },
    },
    "Bidders share same beneficial owner": {
        "id": ["R032"],
        "rule": {
            "all": [
                "parties/roles",
                "parties/id",
                "parties/beneficialOwners/name",
                "parties/beneficialOwners/id",
            ]
        },
    },
    "Bidders share same major shareholder": {
        "id": ["R033"],
        "rule": {
            "all": [
                "parties/roles",
                "parties/id",
                "parties/shareholders/shareholder/id",
                "parties/shareholders/shareholding",
            ]
        },
    },
    "Bids submitted in same order": {
        "id": ["R034"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/date",
                "bids/details/tenderers/id",
                "bids/details/tenderers/name",
                "bids/details/status",
            ]
        },
    },
    "All except winning bid disqualified": {
        "id": ["R035"],
        "rule": {"all": ["bids/details/id", "bids/details/status", "awards/status", _RF_WIN_BID]},
    },
    "Lowest bid disqualified": {
        "id": ["R036"],
        "rule": {
            "all": [
                "tender/awardCriteria",
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
            ]
        },
    },
    "Poorly supported disqualifications": {
        "id": ["R037"],
        "rule": {
            "all": [
                "tender/awardCriteria",
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
                "bids/details/documents",
            ]
        },
    },
    "Excessive disqualified bids": {
        "id": ["R038"],
        "rule": {"all": ["bids/details/id", "bids/details/status", _RF_BIDDERS, _RF_BUYER]},
    },
    "Unanswered bidder questions": {
        "id": ["R039"],
        "rule": {
            "all": [
                "tender/enquiries/date",
                "tender/enquiries/dateAnswered",
                "tender/enquiries/answer",
                "tender/status",
            ]
        },
    },
    "High share of buyers contracts": {
        "id": ["R040"],
        "rule": {
            "all": [
                _RF_BUYER,
                "awards/status",
                "awards/date",
                "awards/suppliers/id",
                "awards/suppliers/name",
            ]
        },
    },
    "Physical similarities in documents by different bidders": {
        "id": ["R041"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/tenderers/id",
                "bids/documents/documentType",
            ]
        },
    },
    "Bidder has abnormal address or phone number": {
        "id": ["R042"],
        "rule": {"all": [_RF_BIDDER_INFO]},
    },
    "Bidder has same contact information as project official": {
        "id": ["R043"],
        "rule": {"all": ["parties/roles", "parties/id", _RF_CONTACT_INFO]},
    },
    "Business similarities between bidders": {
        "id": ["R044"],
        "rule": {"all": ["parties/roles", "parties/id", _RF_BIDDER_INFO]},
    },
    "Bidder is not listed in business registries": {
        "id": ["R045"],
        "rule": {"all": ["parties/roles", "parties/id"]},
    },
    "Bidder is debarred or on sanctions list": {
        "id": ["R046"],
        "rule": {"all": ["parties/roles", "parties/id"]},
    },
    "Supplier is not traceable on the web": {
        "id": ["R047"],
        "rule": {"all": ["awards/suppliers/name", "awards/suppliers/id", "parties/contactPoint/url"]},
    },
    "Heterogeneous supplier": {
        "id": ["R048"],
        "rule": {"all": [_RF_ITEMS, "awards/suppliers/id", "awards/suppliers/name"]},
    },
    "Direct awards below threshold": {
        "id": ["R049"],
        "rule": {
            "all": [
                "awards/suppliers/id",
                "awards/suppliers/name",
                "awards/date",
                "tender/procurementMethod",
                _RF_BUYER,
            ]
        },
    },
    "High market share": {
        "id": ["R050"],
        "rule": {
            "all": [
                "awards/suppliers/id",
                "awards/suppliers/name",
                _RF_BUYER,
                "awards/value/amount",
                "awards/value/currency",
                _RF_ITEMS,
                "awards/date",
                "awards/status",
            ]
        },
    },
    "High market concentration": {
        "id": ["R051"],
        "rule": {"all": ["awards/suppliers/id", "awards/suppliers/name", _RF_AWARD_CONTRACT_VALUE, _RF_ITEMS]},
    },
    "Small initial purchase from supplier followed by much larger purchases": {
        "id": ["R052"],
        "rule": {
            "all": [
                "awards/suppliers/id",
                "awards/suppliers/name",
                "tender/procurementMethod",
                _RF_BUYER,
                _RF_AWARD_CONTRACT_VALUE,
            ]
        },
    },
    "Co-bidding pairs have same recurrent winner": {
        "id": ["R053"],
        "rule": {"all": ["bids/details/id", "bids/details/status", _RF_WIN_BID]},
    },
    "Direct award followed by change orders that exceed the competitive threshold": {
        "id": ["R054"],
        "rule": {
            "all": [
                "tender/procurementMethod",
                "awards/value/amount",
                "awards/value/currency",
                "contracts/value/amount",
                "contracts/value/currency",
                "contracts/amendments/description",
            ]
        },
    },
    "Multiple direct awards above or just below competitive threshold": {
        "id": ["R055"],
        "rule": {
            "all": [
                "tender/procurementMethod",
                "awards/suppliers/id",
                "awards/suppliers/name",
                _RF_AWARD_CONTRACT_VALUE,
                _RF_BUYER,
            ]
        },
    },
    "Winning bid does not meet the award criteria": {
        "id": ["R056"],
        "rule": {
            "all": [
                "tender/awardCriteria",
                "bids/details/status",
                "bids/details/documents",
                _RF_WIN_BID,
            ]
        },
    },
    "Bid rotation": {
        "id": ["R057"],
        "rule": {
            "all": [
                "bids/details/tenderers/id",
                "bids/details/tenderers/name",
                "awards/suppliers/id",
                "awards/suppliers/name",
                "bids/details/value/amount",
                "bids/details/value/currency",
                _RF_ITEMS,
            ]
        },
    },
    "Heavily discounted bid": {
        "id": ["R058"],
        "rule": {
            "all": [
                "bids/details/id",
                "bids/details/value/amount",
                "bids/details/value/currency",
                "bids/details/status",
                _RF_WIN_BID,
            ]
        },
    },
    "Large difference between the award value and final contract amount": {
        "id": ["R059"],
        "rule": {
            "all": [
                "awards/id",
                "awards/status",
                "awards/value/amount",
                "awards/value/currency",
                "contracts/awardID",
                "contracts/value/amount",
                "contracts/value/currency",
                "contracts/status",
            ]
        },
    },
    "Long time between award date and contract signature date": {
        "id": ["R060"],
        "rule": {"all": ["awards/date", "contracts/dateSigned", "tender/procurementMethod"]},
    },
    "Decision period extremely short": {
        "id": ["R061"],
        "rule": {"all": ["tender/tenderPeriod/endDate", "awards/date", "tender/procurementMethod"]},
    },
    "Decision period extremely long": {
        "id": ["R062"],
        "rule": {"all": ["tender/tenderPeriod/endDate", "awards/date", "tender/procurementMethod"]},
    },
    "Contract is not published": {
        "id": ["R063"],
        "rule": {"all": ["contracts/documents/documentType"]},
    },
    "Contract has modifications": {
        "id": ["R064"],
        "rule": {"all": ["contracts/status", "contracts/amendments/description"]},
    },
    "Contract amendments to reduce line items": {
        "id": ["R065"],
        "rule": {
            "all": [
                "contracts/status",
                "contracts/amendments/description",
                "contracts/amendments/rationale",
            ]
        },
    },
    "Contract amendments to increase line items": {
        "id": ["R066"],
        "rule": {
            "all": [
                "contracts/status",
                "contracts/amendments/description",
                "contracts/amendments/rationale",
            ]
        },
    },
    "Delivery failure": {
        "id": ["R067"],
        "rule": {
            "all": [
                "contracts/implementation/milestones/type",
                "contracts/implementation/milestones/dueDate",
                "contracts/implementation/milestones/dateMet",
            ]
        },
    },
    "Contract transactions exceed contract amount": {
        "id": ["R068"],
        "rule": {"all": ["contracts/value/amount", "contracts/value/currency", _RF_IMP_VALUE]},
    },
    "Contract amendments to increase price": {
        "id": ["R069"],
        "rule": {
            "all": [
                "contracts/status",
                "contracts/amendments/description",
                "contracts/amendments/rationale",
            ]
        },
    },
    "Losing bidders are hired as subcontractors": {
        "id": ["R070"],
        "rule": {
            "all": [
                "contracts/relatedProcesses",
                "contracts/relatedProcesses/relationship",
                "awards/suppliers/id",
                _RF_BIDDERS,
            ]
        },
    },
    "A contractor subcontracts all or most of the work received": {
        "id": ["R071"],
        "rule": {"all": ["awards/hasSubcontracting", "awards/subcontracting/minimumPercentage"]},
    },
    "High prevalence of subcontracts": {
        "id": ["R072"],
        "rule": {"all": ["awards/hasSubcontracting", _RF_BUYER]},
    },
    "Discrepancies between work completed and contract specifications": {
        "id": ["R073"],
        "rule": {
            "all": [
                "contracts/status",
                "contracts/documents/documentType",
                "contracts/implementation/documents/documentType",
            ]
        },
    },
}


def redflags_checks(fields_list, indicators_dic, check_coverage=False):
    """Return a table of usability checks, with one row per indicator.

    Each row states whether the fields needed to calculate the indicator are published
    and, if not, which fields are missing. Set check_coverage=True to add a "coverage" column.
    """

    results_list = []
    missing_fields = []
    fields_needed = []

    for indicator_data in indicators_dic.values():
        ok, missing = check_rule(indicator_data["rule"], fields_list)
        result = "missing fields" if not ok else "possible to calculate"
        results_list.append(result)
        missing_fields.append(missing)
        fields_needed.append(", ".join(get_all_fields(indicator_data["rule"])))

    # Generate dataframe

    indicatordf = pd.DataFrame(
        list(
            zip(
                list(indicators_dic),
                [indicators_dic[i]["id"] for i in indicators_dic],
                fields_needed,
                strict=True,
            )
        ),
        columns=["red_flag", "R_id", "fields needed"],
    )
    indicatordf["R_id"] = indicatordf["R_id"].apply(lambda x: ", ".join(map(str, x)))
    indicatordf["calculation"] = results_list
    indicatordf["missing fields"] = missing_fields
    indicatordf["missing fields"] = indicatordf["missing fields"].apply(lambda x: ", ".join(map(str, x)))

    if check_coverage:
        # Calculate coverage
        indicatordf["coverage"] = get_coverage(indicators_dic)
    return indicatordf
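
The rule format used by `RED_FLAGS_REQUIREMENTS` can be illustrated with a small, standalone sketch. The `evaluate` function below is a simplified, hypothetical stand-in for `check_rule` (it returns only a boolean, not the missing fields), and the rule and field sets are made-up examples, not taken from a real dataset.

```python
def evaluate(rule, available):
    """Return True if the rule is satisfied by the set of available field paths."""
    if isinstance(rule, str):
        return rule in available
    if "all" in rule:
        return all(evaluate(sub, available) for sub in rule["all"])
    if "any" in rule:
        return any(evaluate(sub, available) for sub in rule["any"])
    raise ValueError(f"Unsupported rule: {rule!r}")


# Hypothetical rule: a procurement method, plus at least one way to identify the buyer.
example_rule = {
    "all": [
        "tender/procurementMethod",
        {
            "any": [
                {"all": ["buyer/name", "buyer/id"]},
                {"all": ["parties/name", "parties/id", "parties/roles"]},
            ]
        },
    ]
}

# Made-up sets of published fields.
with_buyer = {"tender/procurementMethod", "buyer/name", "buyer/id"}
without_buyer_id = {"tender/procurementMethod", "buyer/name"}

satisfied_with_buyer = evaluate(example_rule, with_buyer)
satisfied_without_id = evaluate(example_rule, without_buyer_id)
print(satisfied_with_buyer, satisfied_without_id)
```

In this sketch, the second set fails because neither alternative under `"any"` is complete: `buyer/id` is missing from the first, and all three `parties` fields are missing from the second.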

## Usability analysis

Generate a list of the fields published:

In [None]:
%%sql fields_table <<
SELECT
    path,
    distinct_releases
FROM
    field_counts
WHERE
    release_type = 'compiled_release'


In [None]:
fields_list = fields_table.iloc[:, 0].tolist()

In [None]:
# check_coverage=True adds a "coverage" column to the table.
result = redflags_checks(fields_list, RED_FLAGS_REQUIREMENTS, check_coverage=True)

### Export results

#### Load use case indicators spreadsheet

In [None]:
result_final = check_red_flags_indicators(result)
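
As a rough illustration of what `check_red_flags_indicators` does with the worksheet rows, the sketch below promotes the first row to column headers and merges the result onto a results table by `R_id`. The rows, the `description` column and all values are invented for the example; the real worksheet has more columns.

```python
import pandas as pd

# Invented rows, in the list-of-lists shape that gspread's get_all_values() returns.
rows = [
    ["R_id", "description"],
    ["R001", "Planning documents not available"],
    ["R003", "The submission period is too short"],
]

sheet = pd.DataFrame(rows)
# Use the first row as the header, then drop it from the data.
sheet = sheet.rename(columns=sheet.iloc[0]).drop(sheet.index[0])

# Invented results table, with one row per indicator.
results = pd.DataFrame(
    {
        "R_id": ["R001", "R002"],
        "calculation": ["possible to calculate", "missing fields"],
    }
)

# An inner merge (the pandas default) keeps only the R_id values present in both tables.
merged = results.merge(sheet, on="R_id")
print(merged)
```

Because the merge is an inner join, an indicator whose `R_id` has no matching row in the worksheet is dropped from the merged table.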

#### Table of results

In [None]:
result_final

#### Most common fields for indicators

This table shows the fields most frequently used to calculate indicators and whether they are published. You can use it to highlight key data gaps to the publisher.

In [None]:
common_fields = most_common_fields_to_calculate_indicators(RED_FLAGS_REQUIREMENTS, fields_table)
common_fields
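
The counting step behind this table can be sketched with `collections.Counter`. The rules and the set of published fields below are made up, and `flatten` is a hypothetical helper that plays the role of `get_all_fields`.

```python
from collections import Counter


def flatten(rule):
    """List every field path referenced in a rule, including repeats."""
    if isinstance(rule, str):
        return [rule]
    fields = []
    for key in ("all", "any"):
        for sub in rule.get(key, []):
            fields.extend(flatten(sub))
    return fields


# Made-up rules for three indicators.
rules = [
    {"all": ["tender/procurementMethod", "tender/value/amount"]},
    {"all": ["tender/procurementMethod", {"any": ["awards/date", "tender/tenderPeriod/startDate"]}]},
    {"all": ["tender/procurementMethod", "awards/date"]},
]
# Made-up set of fields that the publisher provides.
published = {"tender/procurementMethod", "awards/date"}

# Count how many times each field is referenced across all rules.
counts = Counter(field for rule in rules for field in flatten(rule))
for field, count in counts.most_common():
    status = "yes" if field in published else "no"
    print(f"{field}: referenced {count} time(s), published: {status}")
```

Fields that rank high in the count but are marked as not published are the gaps that block the most indicators.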

#### Save tables to spreadsheet

In [None]:
spreadsheet_name = input("Enter the name of your spreadsheet:")
save_dataframe_to_sheet(spreadsheet_name, result_final, "red_flags_table")
save_dataframe_to_sheet(spreadsheet_name, common_fields, "common_fields_table")
save_dataframe_to_sheet(spreadsheet_name, fields_table, "fields_list")