# CROM: Climate-Related Opportunity Metrics

Triaging climate-related opportunities is necessary because climate change is an urgent problem.  
Limited budgets must go to the opportunities that are feasible and effective. This is a common problem for cities and companies.

![top_image](https://i.imgur.com/kDxyLOU.png)

I propose **CROM: Climate-Related Opportunity Metrics** to evaluate climate-related opportunities from both qualitative and quantitative aspects.

* Qualitative: The opportunities should be feasible enough to be integrated into the corporate business strategy.
* Quantitative: The opportunities should be effective enough to improve financial performance.

These metrics are useful not only for companies but also for cities when deciding how to fund the promotion of climate-related opportunities.

For that reason, CROM can detect **practical and actionable points** between cities and companies, which is what the original problem statement asks for:

*What are the practical and actionable points where city and corporate ambition join, i.e. where do cities have problems that corporations affected by those problems could solve, and vice versa?*

CROM has the following characteristics.

* Simple: The formula is very simple.
* Assemblable: Each item in the formula has a distinctive meaning.
* Customizable: You can change the weight of the items in the formula according to your interest.

This document is organized as follows.

1. How to calculate CROM.
  * Qualitative Evaluation
  * Quantitative Evaluation
  * Calculate CROM
2. Find the actionable points of city and company by CROM.


# How to calculate CROM

CROM is the product of the qualitative evaluation and the quantitative evaluation.  

*CROM = qualitative evaluation * quantitative evaluation*

CDP data is needed to calculate both, and additional financial data is required for the quantitative evaluation.


# Qualitative Evaluation

The qualitative evaluation is calculated as follows.

*Qualitative Evaluation = Time horizon + Likelihood + Impact*

In short, opportunities that are *short-term, likely to occur, and impactful* receive high grades.  
You can customize the weight of each item in the formula.  
For example, if you consider the time horizon important, add weight to it (e.g. *(1.5 * Time horizon) + Likelihood + Impact*).

I use the following CDP question for the qualitative evaluation.

* C2.4a: *Provide details of opportunities identified with the potential to have a substantive financial or strategic impact on your business.*
  * Column 4: Primary climate-related opportunity driver
  * Column 7: Time horizon
  * Column 8: Likelihood
  * Column 9: Magnitude of impact
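
The weighted sum described above can be sketched as a small helper. This is a minimal illustration, not the notebook's own function: the name `weighted_qualitative_score` and the sample scores are assumptions made for this example.

```python
def weighted_qualitative_score(time_horizon, likelihood, impact,
                               time_horizon_weight=1.0,
                               likelihood_weight=1.0,
                               impact_weight=1.0):
    """Weighted sum of the three scores (hypothetical helper)."""
    return (time_horizon_weight * time_horizon
            + likelihood_weight * likelihood
            + impact_weight * impact)


# Illustrative scores on a 1 to 5 scale (made-up values).
near_term = {"time_horizon": 5.0, "likelihood": 3.0, "impact": 2.0}
long_term = {"time_horizon": 1.0, "likelihood": 4.0, "impact": 5.0}

# With equal weights the two opportunities tie.
equal_near = weighted_qualitative_score(**near_term)
equal_long = weighted_qualitative_score(**long_term)

# Emphasizing the time horizon favors the near-term opportunity.
heavy_near = weighted_qualitative_score(**near_term, time_horizon_weight=1.5)
heavy_long = weighted_qualitative_score(**long_term, time_horizon_weight=1.5)
```

With equal weights both opportunities score the same; raising the time horizon weight to 1.5 separates them in favor of the near-term one.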

The responses are loaded and extracted as follows.


In [None]:
import os
import pandas as pd
import numpy as np
import altair as alt
import re
import json

In [None]:
# Load the Climate Change response dataset for each survey year
RESPONSE_ROOT = "../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses"
YEARS = (2018, 2019, 2020)
cl_dfs = {}

for year in YEARS:
    kind = "Climate Change"
    file_name = "{}_Full_{}_Dataset.csv".format(year, kind.replace(" ", "_"))
    path = "{}/{}/{}".format(RESPONSE_ROOT, kind, file_name)
    df = pd.read_csv(path)
    cl_dfs[year] = df

In [None]:
def extract_c24_responses(year_dfs):
    """
    Extract C2.4a responses
    Column 4: Primary climate-related opportunity driver
    Column 7: Time horizon
    Column 8: Likelihood
    Column 9: Magnitude of impact
    """

    c24s = []
    columns = [
        "account_number",
        "organization",
        "survey_year",
        "question_number",
        "row_number"
    ]
    
    for year in year_dfs:
        year_df = year_dfs[year][(columns + ["column_number", "response_value"])]
        c24 = year_df[(year_df["question_number"] == "C2.4a")]
        c24 = c24[(c24["column_number"].isin([4, 7, 8, 9]))]
        c24 = c24.dropna(subset=["response_value"])
        c24["column_number"] = c24["column_number"].map(int)
        stacked = c24.pivot(index=columns, columns="column_number", values="response_value")
        stacked = stacked.rename_axis(None, axis=1).reset_index()
        stacked.rename(columns={4: "opportunity_driver", 7: "time_horizon", 8: "likelihood", 9: "impact"}, inplace=True)
        c24s.append(stacked)
    
    c24s = pd.concat(c24s)
    return c24s


qualitative_df = extract_c24_responses(cl_dfs)
qualitative_df.head(5)
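
The long-to-wide reshaping done by `extract_c24_responses` may be easier to follow on a toy table. The sketch below assumes made-up rows with only two identifying columns; the real CDP file has more columns, and passing a list as `index` to `pivot` assumes pandas 1.1 or later.

```python
import pandas as pd

# Toy long-format rows shaped like the CDP response file (values are made up).
long_df = pd.DataFrame({
    "account_number": [1, 1, 1, 1, 2, 2, 2, 2],
    "row_number": [1, 1, 1, 1, 1, 1, 1, 1],
    "column_number": [4, 7, 8, 9, 4, 7, 8, 9],
    "response_value": ["Use of recycling", "Short-term", "Likely", "High",
                       "Access to new markets", "Long-term", "Unlikely", "Low"],
})

# One row per (account, row); one column per questionnaire column number.
wide_df = (
    long_df.pivot(index=["account_number", "row_number"],
                  columns="column_number", values="response_value")
    .rename_axis(None, axis=1)
    .reset_index()
    .rename(columns={4: "opportunity_driver", 7: "time_horizon",
                     8: "likelihood", 9: "impact"})
)
```

Each opportunity (one `row_number` per account) ends up as a single row with its driver, time horizon, likelihood, and impact side by side.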

Now I convert *Time horizon*, *Likelihood*, and *Magnitude of impact* to numeric values from 1 to 5 (0 for unknown or missing responses).

In [None]:
def convert_time_horizon(v):    
    convert_dict = {
        "Unknown": 0,
        "Current": 4.0,
        "Short-term": 5.0,
        "Medium-term": 2.0,
        "Long-term": 1.0,
    }

    if not pd.notna(v):
        return 0
    else:
        return convert_dict[v]


qualitative_df["time_horizon_value"] = qualitative_df["time_horizon"].apply(convert_time_horizon)

In [None]:
def convert_likelihood(v):    
    convert_dict = {
        "Unknown": 0,
        "Virtually certain": 5.0,
        "Very likely": 4.5,
        "Likely": 4.0,
        "More likely than not": 3.0,
        "About as likely as not": 2.5,
        "Unlikely": 2.0,
        "Very unlikely": 1.5,
        "Exceptionally unlikely": 1.0
    }

    if not pd.notna(v):
        return 0
    else:
        return convert_dict[v]


qualitative_df["likelihood_value"] = qualitative_df["likelihood"].apply(convert_likelihood)

In [None]:
def convert_impact(v):    
    convert_dict = {
        "Unknown": 0,
        "High": 5.0,
        "Medium-high": 4.0,
        "Medium": 3.0,
        "Medium-low": 2.0,
        "Low": 1.0
    }

    if not pd.notna(v):
        return 0
    else:
        return convert_dict[v]


qualitative_df["impact_value"] = qualitative_df["impact"].apply(convert_impact)
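
The three converters above look up each label in a dictionary and raise `KeyError` for a label that is not listed. A more tolerant variant can be sketched with `Series.map`; scoring unexpected labels as 0 is an assumption of this example, and `to_score` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

IMPACT_SCORES = {
    "Unknown": 0.0,
    "High": 5.0,
    "Medium-high": 4.0,
    "Medium": 3.0,
    "Medium-low": 2.0,
    "Low": 1.0,
}


def to_score(series, mapping, default=0.0):
    """Map labels to scores; missing or unexpected labels get `default`."""
    return series.map(mapping).fillna(default)


# The last label is not in the mapping and the third one is missing.
impact = pd.Series(["High", "Medium", None, "Not yet assessed"])
impact_value = to_score(impact, IMPACT_SCORES)
```

The same helper could be reused for the time horizon and likelihood mappings by passing a different dictionary.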

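The helper below multiplies `time_horizon_value`, `likelihood_value`, and `impact_value` and rescales the product to the 0~5 range; it is used later to allocate financial data.  
The chart then visualizes the weighted average of the three values for each `opportunity_driver`.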

In [None]:
def qualitative_evaluation(df, time_horizon_weight=1, likelihood_weight=1, impact_weight=1, scale=5):
    value = (time_horizon_weight * df["time_horizon_value"]) *\
             (likelihood_weight * df["likelihood_value"]) *\
               (impact_weight * df["impact_value"])
    if scale > 0:
        return value / (scale**3) * scale
    else:
        return value

In [None]:
from altair import expr, datum


def visualize_opportunity_drivers_qualitative(df):
    """
    Visualize qualitative evaluation of each opportunity drivers.
    """

    def merge_driver(x):
        if isinstance(x, str) and x.startswith("Other"):
            return "Other"
        else:
            return x
    
    slider_t = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Time Horizon Weight:")
    select_t = alt.selection_single(fields=["time_horizon_weight"],
                                    bind=slider_t, init={"time_horizon_weight": 1.0})

    slider_l = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Likelihood Weight:")
    select_l = alt.selection_single(fields=["likelihood_weight"],
                                    bind=slider_l, init={"likelihood_weight": 1.0})

    slider_i = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Impact Weight:")
    select_i = alt.selection_single(fields=["impact_weight"],
                                    bind=slider_i, init={"impact_weight": 1.0})
    
    select_year = alt.selection_multi(fields=["year"])
    
    
    data = pd.DataFrame({
            "year": df["survey_year"].apply(str),
            "opportunity_driver": df["opportunity_driver"].apply(merge_driver),
            "t": df["time_horizon_value"],
            "l": df["likelihood_value"],
            "i": df["impact_value"],
           })
    
    data = data.groupby(["year", "opportunity_driver"]).median().reset_index()
    return alt.Chart(data).mark_bar().encode(
        x="evaluation:Q",
        y=alt.Y("opportunity_driver", sort=alt.EncodingSortField(field="evaluation", order="descending")),
        color="year"
    ).add_selection(
        select_t, select_l, select_i, select_year
    ).transform_calculate(
        evaluation=(datum.t * select_t.time_horizon_weight + datum.l * select_l.likelihood_weight + datum.i * select_i.impact_weight) / 3
    ).transform_filter(
        select_year
    )

visualize_opportunity_drivers_qualitative(qualitative_df)

Move the weight sliders to reflect your own priorities!

* If we think *Impact* is important, *Ability to diversify business activities* is highly rated.
* *Likelihood* changes the evaluation only a little, but *Participation in carbon market* moves up if it is weighted heavily.
* As *Time Horizon Weight* decreases, *Ability to diversify business activities* moves up.

Changing the weights and watching the effect helps you think through your priorities.

# Quantitative Evaluation

The quantitative evaluation is calculated as follows (this formula is based on [Hybrid Metrics](https://www.sharedvalue.org/resource/hybrid-metrics/)).

*Quantitative Evaluation = EBITDA / CO2 emission*

In short, this value works like fuel efficiency: earnings generated per unit of CO2 emitted.  
If an opportunity is effective, it will not only decrease emissions but also increase earnings, because it leads to differentiation in the market.

I use the following CDP questions and [financial data](https://www.kaggle.com/takahirokubo0/annual-financial-data-for-hybrid-cdp-kpi) for the quantitative evaluation.

* C6.1: What were your organization’s gross global Scope 1 emissions in metric tons CO2e?
* C6.3: What were your organization's gross global Scope 2 emissions in metric tons CO2e?
* C6.5: Account for your organization’s Scope 3 emissions, disclosing and explaining any exclusions.

Extracting emissions from the above questions is a little complicated.  
[Please refer to this notebook for details](https://www.kaggle.com/takahirokubo0/cdp-extract-emissions-from-corporate-responses).

You can customize the CO2 emission by summing only selected scopes (Scope 1~3), and use another account instead of EBITDA ([Depreciation & Amortization is one of the alternatives](https://www.kaggle.com/takahirokubo0/cdp-hybrid-metrics-for-corporate-sustainability)).
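
The ratio and the selective summation of scopes can be sketched as follows. The helper `earnings_per_emission` and the company figures are hypothetical. Location-based and market-based Scope 2 figures describe the same emissions under two accounting methods, so this sketch defaults to only one of them; that default is an assumption of this example, not the notebook's choice.

```python
import pandas as pd


def earnings_per_emission(df, earnings="EBITDA",
                          scopes=("Scope1", "Scope2-location")):
    """Earnings divided by the summed emissions of the chosen scopes."""
    total = df[list(scopes)].sum(axis=1)
    # Companies with no reported emissions get NaN instead of a division by zero.
    return df[earnings] / total.where(total > 0)


# Made-up figures; column names follow the ones used in this notebook.
companies = pd.DataFrame({
    "EBITDA": [200.0, 200.0, 50.0],
    "Scope1": [40.0, 10.0, 0.0],
    "Scope2-location": [10.0, 10.0, 0.0],
    "Scope3": [50.0, 80.0, 0.0],
})

ratio_s12 = earnings_per_emission(companies)
ratio_all = earnings_per_emission(
    companies, scopes=("Scope1", "Scope2-location", "Scope3"))
```

The first two companies look different on Scope 1 and 2 alone but identical once Scope 3 is included, which shows how much the choice of scopes can change the ranking.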


First, read the financial dataset.

In [None]:
!pip install simfin

In [None]:
import simfin as sf
from simfin.names import *


FINANCIAL_ROOT = "../input/annual-financial-data-for-hybrid-cdp-kpi/cdp_financial_data.csv"
f_df = pd.read_csv(FINANCIAL_ROOT)
f_df.head(5)

Next, read the emissions data.

In [None]:
def extract_c6_emissions(year_df):
    """
    Extract Scope1, Scope2 and Scope3 emissions from C6.
    """
    structure = {
        "C6.1": {
            "column_name": "Scope1",
            "column_number": 1,
            "row_number": 1
        },
        "C6.3": {
            "column_name": ["Scope2-location", "Scope2-market"],
            "column_number": [1, 2],
            "row_number": 1
        },
        "C6.5": {
            "column_name": ["Scope3"],
            "column_number": 2
        }
    }
    
    items = ["account_number", "organization", "survey_year",
             "question_number", "column_number", "row_number",
             "table_columns_unique_reference", "response_value"]
    
    c6_emissions = []
    for target_number in structure:
        location = structure[target_number]
        df = year_df[year_df["question_number"] == target_number]
        
        # Select columns
        columns = location["column_number"]
        columns = columns if isinstance(columns, list) else [columns]
        for i, c in enumerate(columns):
            name = location["column_name"]
            name = name if isinstance(name, str) else name[i]
            selected = df[df["column_number"] == c]
            selected = selected[items]
            
            # Filter by rows
            if "row_number" in location:
                r = location["row_number"]
                selected = selected[selected["row_number"] == r]
            
            # Preprocess response value
            selected["response_value"] = pd.to_numeric(selected["response_value"], errors="coerce")
            selected = selected.dropna(subset=["response_value"])
            selected["scope"] = pd.Series([name] * len(selected), index=selected.index)
            c6_emissions.append(selected)
        
    c6_emissions = pd.concat(c6_emissions)
    items.append("scope")
    items.remove("row_number")
    c6_emissions = c6_emissions.groupby(items).sum().reset_index()
    c6_emissions.rename(columns={"response_value": "emissions"}, inplace=True)
    
    return c6_emissions

In [None]:
def make_quantitive_df(year_dfs, f_df):
    """
    Make financial & non-financial dataset.
    """
    
    emissions = []
    for year in year_dfs:
        e_df = extract_c6_emissions(year_dfs[year])
        e_df["survey_year"] = year
        pivot = e_df.pivot_table(index=["account_number", "survey_year"], columns="scope", values="emissions")
        pivot = pivot.reset_index()
        pivot.fillna(0, inplace=True)
        pivot["emissions"] = pivot["Scope1"] + pivot["Scope2-location"] + pivot["Scope2-market"] + pivot["Scope3"]
        emissions.append(pivot)
    
    emissions = pd.concat(emissions)
    df = emissions.merge(f_df, how="inner", on=["account_number", "survey_year"], suffixes=("_emission", None))    
    return df

In [None]:
quantitive_df = make_quantitive_df(cl_dfs, f_df)
quantitive_df.head(5)

To allocate emissions and financial data to individual opportunities, I use the qualitative evaluation value as the allocation key.

In [None]:
def allocate_quantitives(qualitative_df, quantitive_df):
    """
    Allocate quantitive data by qualitative evaluation.
    """
    
    # Calculate allocation rate
    qualitative_df["qualitative_evaluation"] = qualitative_evaluation(qualitative_df)
    columns = ["account_number", "survey_year", "question_number", "row_number", "opportunity_driver"]
    allocation_rate = qualitative_df[columns + ["qualitative_evaluation"]].groupby(columns).agg(
                        {"qualitative_evaluation": "sum"})
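    # Normalize within each account / year / question so that the rates sum to 1
    group_levels = list(range(len(columns) - 2))
    allocation_rate = allocation_rate / allocation_rate.groupby(level=group_levels).transform("sum")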
    allocation_rate = allocation_rate.reset_index()
    allocation_rate.rename(columns={"qualitative_evaluation": "allocation_rate"}, inplace=True)
    q_df = qualitative_df.merge(allocation_rate, how="left", on=columns, suffixes=("_allocation", None))
    
    # Extract financial values   
    fv_columns = [
        REVENUE,
        COST_REVENUE,
        OPERATING_INCOME,
        OPERATING_EXPENSES,
        DEPR_AMOR,
        "EBITDA",
        "Scope1",
        "Scope2-location",
        "Scope2-market",
        "Scope3",
        "emissions",
    ]
    
    f_columns = [
        "account_number",
        "survey_year",
        "Ticker",
        CURRENCY,
        FISCAL_YEAR,
        FISCAL_PERIOD
    ]
     
    # Merge financial data
    df = q_df.merge(quantitive_df[f_columns + fv_columns],
                    how="left", on=["account_number", "survey_year"], suffixes=("_emission", None))
    
    # Allocate by rate
    for c in fv_columns:
        df[c] = df[c] * df["allocation_rate"]
    
    return df


qq_df = allocate_quantitives(qualitative_df, quantitive_df)
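
The allocation step may be easier to see on a toy example. The sketch below uses made-up numbers and a reduced set of columns; it groups by company and year only, whereas the notebook also groups by question number.

```python
import pandas as pd

# Two opportunities for company 1, one for company 2 (made-up scores).
opps = pd.DataFrame({
    "account_number": [1, 1, 2],
    "survey_year": [2020, 2020, 2020],
    "opportunity_driver": ["A", "B", "A"],
    "qualitative_evaluation": [3.0, 1.0, 2.0],
})
company = pd.DataFrame({
    "account_number": [1, 2],
    "survey_year": [2020, 2020],
    "EBITDA": [100.0, 40.0],
    "emissions": [20.0, 8.0],
})

keys = ["account_number", "survey_year"]

# Share of each opportunity within its company-year.
opps["allocation_rate"] = (
    opps["qualitative_evaluation"]
    / opps.groupby(keys)["qualitative_evaluation"].transform("sum")
)

# Attach company-level figures and split them by the allocation rate.
allocated = opps.merge(company, how="left", on=keys)
for col in ["EBITDA", "emissions"]:
    allocated[col] = allocated[col] * allocated["allocation_rate"]
```

Because EBITDA and emissions are scaled by the same rate, the plain ratio EBITDA / emissions of each opportunity equals that of its company; the opportunities only differ in the absolute amounts assigned to them.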

Let's visualize the quantitative evaluation.  
(I use a sigmoid function to squash the quantitative evaluation into the 0~1 range before applying `scale`.)

In [None]:
def quantitive_evaluation(df, kind="EBITDA", scale=5):

    
    def clip(s, lower_th=1, upper_th=99):
        _lower, _upper = np.percentile(s, [lower_th, upper_th])
        return np.clip(s, _lower, _upper)
    
    
    def f_normalize(s):
        return (s - np.mean(s)) / np.std(s)
    
    
    def sigmoid(s):
        return s.apply(lambda v: 0.0 if v < -709 else 1 / (1 + np.exp(-v)))
    
    value = f_normalize(clip(df[kind]))
    emissions = f_normalize(clip(df["emissions"]))
    value = sigmoid((value / emissions)).fillna(0)
    
    if scale > 0:
        return value * scale
    else:
        return value
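
The three scaling steps used above (percentile clipping, standardization, sigmoid) can be tried on a single series to see what each one does. This is a simplified sketch, not the notebook's exact computation: it scales one made-up series, and the clipping percentiles are chosen only to make the effect visible on five values.

```python
import numpy as np
import pandas as pd


def clip_percentile(s, lower=1, upper=99):
    """Limit extreme values to the given percentiles."""
    lo, hi = np.percentile(s, [lower, upper])
    return s.clip(lo, hi)


def standardize(s):
    """Zero mean, unit (population) standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)


def sigmoid(s):
    """Squash standardized values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))


# One outlier (1000.0) would otherwise dominate the scale.
raw = pd.Series([1.0, 2.0, 3.0, 4.0, 1000.0])
scaled = sigmoid(standardize(clip_percentile(raw, 0, 75)))
```

After clipping, the outlier is treated like the second-largest value, so the remaining values keep a usable spread between 0 and 1.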

In [None]:
def visualize_opportunity_drivers_quantitive(df):
    """
    Visualize quantitive evaluation of each opportunity drivers.
    """

    def merge_driver(x):
        if isinstance(x, str) and x.startswith("Other"):
            return "Other"
        else:
            return x

    _df = df.dropna(subset=["EBITDA", DEPR_AMOR, "emissions"])
    
    columns = ["EBITDA", DEPR_AMOR]
    selector_kind = alt.binding_select(options=columns, name="Calculation Base")
    select_kinds = alt.selection_single(fields=["column"], bind=selector_kind, init={"column": "EBITDA"})

    data = pd.DataFrame({
            "year": _df["survey_year"].apply(str),
            "opportunity_driver": _df["opportunity_driver"].apply(merge_driver),
            "EBITDA": quantitive_evaluation(_df, "EBITDA"),
             DEPR_AMOR: quantitive_evaluation(_df, DEPR_AMOR)
           })
    
    data = data.groupby(["year", "opportunity_driver"]).median().reset_index()
    return alt.Chart(data).transform_fold(
            columns,
            as_=["column", "evaluation"]
           ).transform_filter(
            select_kinds 
           ).mark_bar().encode(
            x="evaluation:Q",
            y=alt.Y("opportunity_driver", sort=alt.EncodingSortField(field="evaluation", order="descending")),
            color="year"
           ).add_selection(
            select_kinds
           )


visualize_opportunity_drivers_quantitive(qq_df)

Changing the calculation base changes the values.

* With the "EBITDA" base, *Participation in carbon market* and *Use of recycling* are highly rated.
* With the "Depreciation & Amortization" base, *Access to new markets* and *Shift in consumer preferences* are important.

The "Depreciation & Amortization" base emphasizes asset procurement efficiency. For details, please refer to [this notebook](https://www.kaggle.com/takahirokubo0/cdp-hybrid-metrics-for-corporate-sustainability).


# Calculate CROM

Now we can calculate CROM.  

*CROM = qualitative evaluation * quantitative evaluation*
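
As a minimal worked example of the formula, the sketch below multiplies two illustrative scores per opportunity driver and ranks the result. The scores are made up for this example and are not taken from the CDP data.

```python
import pandas as pd

scores = pd.DataFrame({
    "opportunity_driver": ["Use of recycling", "Access to new markets", "Other"],
    "qualitative": [4.0, 3.0, 1.0],    # 0 to 5, illustrative
    "quantitative": [2.0, 4.5, 5.0],   # 0 to 5, illustrative
})

# CROM = qualitative evaluation * quantitative evaluation
scores["crom"] = scores["qualitative"] * scores["quantitative"]
ranked = scores.sort_values("crom", ascending=False).reset_index(drop=True)
```

Because the two evaluations are multiplied, an opportunity needs to be reasonably strong on both axes to rank high: here the driver with the best quantitative score still ranks last because of its low qualitative score.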

In [None]:
def visualize_opportunity_drivers_crom(df):
    """
    Visualize qualitative and quantitive evaluation of each opportunity drivers.
    """

    def merge_driver(x):
        if isinstance(x, str) and x.startswith("Other"):
            return "Other"
        else:
            return x
    
    slider_t = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Time Horizon Weight:")
    select_t = alt.selection_single(fields=["time_horizon_weight"],
                                    bind=slider_t, init={"time_horizon_weight": 1.0})

    slider_l = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Likelihood Weight:")
    select_l = alt.selection_single(fields=["likelihood_weight"],
                                    bind=slider_l, init={"likelihood_weight": 1.0})

    slider_i = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Impact Weight:")
    select_i = alt.selection_single(fields=["impact_weight"],
                                    bind=slider_i, init={"impact_weight": 1.0})
    
    columns = ["EBITDA", DEPR_AMOR]
    selector_kind = alt.binding_select(options=columns, name="Calculation Base")
    select_kinds = alt.selection_single(fields=["column"], bind=selector_kind, init={"column": "EBITDA"})

    select_opp = alt.selection_multi(fields=["opportunity_driver"])
    
    _df = df.dropna(subset=["EBITDA", DEPR_AMOR, "emissions"])

    data = pd.DataFrame({
            "year": _df["survey_year"].apply(str),
            "opportunity_driver": _df["opportunity_driver"].apply(merge_driver),
            "t": _df["time_horizon_value"],
            "l": _df["likelihood_value"],
            "i": _df["impact_value"],
            "EBITDA": quantitive_evaluation(_df, "EBITDA"),
             DEPR_AMOR: quantitive_evaluation(_df, DEPR_AMOR)
           })
    
    data = data.groupby(["year", "opportunity_driver"]).median().reset_index()
    base = alt.Chart(data).transform_calculate(
                qualitative=(datum.t * select_t.time_horizon_weight + datum.l * select_l.likelihood_weight + datum.i * select_i.impact_weight) / 3
           ).transform_fold(
                columns,
                as_=["column", "quantitive"]
           ).transform_filter(
                select_kinds
           ).transform_filter(
                select_opp
           ).transform_calculate(
                crom=datum.qualitative * datum.quantitive
           ).add_selection(
                select_t, select_l, select_i, select_kinds, select_opp
            )
    
    location = base.mark_circle().encode(
                x=alt.X("quantitive:Q", scale=alt.Scale(domain=[0, 5])),
                y=alt.Y("qualitative:Q", scale=alt.Scale(domain=[0, 7])),
                size=alt.Size("crom:Q", scale=alt.Scale(align=1.5, domain=[0, 25])),
                color="opportunity_driver",
                tooltip=["year", "opportunity_driver"]
            )
    
    time_series = base.mark_line().encode(
                    x="year",
                    y=alt.Y("sum(crom):Q", scale=alt.Scale(domain=[0, 12])),
                    color="opportunity_driver"
                  ).properties(
                    width=250
                  )
    
    return time_series | location


visualize_opportunity_drivers_crom(qq_df)

Selecting a trend in the left chart and examining its cause (qualitative or quantitative) in the right chart reveals the following.

* Increase
  * *Development and/or expansion of low emission goods and services* increases its CROM score through both its qualitative and quantitative scores.
  * *Use of supportive policy incentives* increases its CROM score through its quantitative score.
* Decrease
  * *Reduced water usage and consumption* decreases its CROM score through its quantitative score.


Finally, add company data for analysis.

In [None]:
DISCLOSURE_ROOT = "../input/cdp-unlocking-climate-solutions/Corporations/Corporations Disclosing"
cl_ddfs = []

for year in YEARS:
    kind = "Climate Change"
    file_name = "{}_Corporates_Disclosing_to_CDP_{}.csv".format(year, kind.replace(" ", "_"))
    path = "{}/{}/{}".format(DISCLOSURE_ROOT, kind, file_name)
    df = pd.read_csv(path)
    cl_ddfs.append(df)

cl_ddfs = pd.concat(cl_ddfs)
qq_df = qq_df.merge(cl_ddfs, how="inner", on=["survey_year", "account_number"], suffixes=(None, "_master"))
qq_df.head(5)

# Find the actionable points of city and company by CROM

Actionable points meet the following two conditions.

1. The opportunities of the city and company are overlapping.
2. The opportunities should be feasible and effective (high CROM).
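
The two conditions can be read as a filter over candidate pairs of city collaboration and corporate opportunity. The sketch below is only an illustration: the collaboration names, the scores, and both thresholds are hypothetical, and the notebook itself does not apply fixed cut-offs.

```python
import pandas as pd

# Hypothetical candidate pairs with made-up similarity and CROM values.
candidates = pd.DataFrame({
    "collaboration": ["Building retrofit program",
                      "Building retrofit program",
                      "EV charging pilot"],
    "opportunity_driver": ["Use of recycling",
                           "Access to new markets",
                           "Participation in carbon market"],
    "similarity": [0.8, 0.3, 0.7],
    "crom": [12.0, 20.0, 4.0],
})


def actionable_points(df, min_similarity=0.6, min_crom=10.0):
    """Keep pairs that overlap (condition 1) and score well (condition 2)."""
    mask = (df["similarity"] >= min_similarity) & (df["crom"] >= min_crom)
    return df[mask].sort_values("crom", ascending=False)


selected = actionable_points(candidates)
```

Only the first pair passes both conditions: the second has a high CROM but little overlap with the city's collaboration, and the third overlaps but has a low CROM.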

To check condition 1, we need to extract the cities' collaborations and match them to corporate opportunities.

* Extract the cities' collaborations from the question *Please provide some key examples of how your city collaborates with business in the table below.*
  * 2018: 5.1a
  * 2019: 6.1a
  * 2020: 6.2a
* Match collaborations and opportunities with the [Universal Sentence Encoder](https://arxiv.org/abs/1803.11175).
  * Converting text to vectors suits CDP data because the responses come in multiple languages.
  
First, extract the collaborations.

In [None]:
CITY_RESPONSE_ROOT = "../input/cdp-unlocking-climate-solutions/Cities/Cities Responses"
CITY_DISCLOSING_ROOT = "../input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing"
city_dfs = []


for year in YEARS:
    response_file_name = "{}_Full_Cities_Dataset.csv".format(year)
    disclosing_file_name = "{}_Cities_Disclosing_to_CDP.csv".format(year)
    response_path = "{}/{}".format(CITY_RESPONSE_ROOT, response_file_name)
    disclosing_path = "{}/{}".format(CITY_DISCLOSING_ROOT, disclosing_file_name)
    
    responses = pd.read_csv(response_path)
    disclosing = pd.read_csv(disclosing_path)
    merged = responses.merge(disclosing, how="inner", on=["Year Reported to CDP", "Account Number"], suffixes=(None, "_master"))
    city_dfs.append(merged)

city_dfs = pd.concat(city_dfs)

In [None]:
def extract_city_collaborations(city_dfs):
    business_collaborations = {
        2018: "5.1a",
        2019: "6.1a",
        2020: "6.2a"
    }
    
    city_collaborations = []
    for year in YEARS:
        df = city_dfs[city_dfs["Year Reported to CDP"] == year]
        question_number = business_collaborations[year]
        question_df = df[
                        (df["Question Number"] == question_number) &\
                        (df["Column Name"] == "Description of collaboration")
                        ].dropna(subset=["Response Answer"])
        city_collaborations.append(question_df)
    
    city_collaborations = pd.concat(city_collaborations)
    
    return city_collaborations


city_collaborations = extract_city_collaborations(city_dfs)
city_collaborations.head(5)

Each city has its own collaborations.

In [None]:
def get_collaboration(df, city_name, year):
    collaboration = df[(df["Organization"] == city_name) & (df["Year Reported to CDP"] == year)]["Response Answer"]
    return collaboration


tokyo_collaboration = get_collaboration(city_collaborations, "Tokyo Metropolitan Government", 2020)
tokyo_collaboration.head(5)

Then match the collaborations and opportunities.  

In [None]:
import tensorflow_hub as hub


encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [None]:
from sklearn.metrics.pairwise import cosine_similarity


class OpportunityMatcher():
    
    def __init__(self, collaborations):
        self.collaborations = collaborations
        self._co_keys, self._co_vectors = self.calculate_vectors(collaborations)

    def calculate_vectors(self, series, batch_size=10):
        keys = series.dropna().unique()
        vectors = None
        texts = keys.tolist()
        for i in range(0, len(texts), batch_size):
            target = texts[i:(i + batch_size)]
            v = encoder(target)
            if vectors is None:
                vectors = v.numpy()
            else:
                vectors = np.vstack([vectors, v.numpy()])

        return keys, vectors
    
    def _sigmoid(self, v):
        return 0.0 if v < -709 else 1 / (1 + np.exp(-v))
        
    def add_similarity_weight(self, df, opportunities_column):
        op_keys, op_vectors = self.calculate_vectors(df[opportunities_column])
        corr = cosine_similarity(op_vectors, self._co_vectors)
        
        folded = []
        for c in self._co_keys:
            co_df = []
            for i, row in df.iterrows():
                d = {}
                for _c in df.columns:
                    d[_c] = row[_c]
                
                d["collaboration"] = c
                
                if not pd.notnull(row[opportunities_column]):
                    d["similarity"] = 0
                else:
                    y = op_keys.tolist().index(row[opportunities_column])
                    x = self._co_keys.tolist().index(c)
                    cr = corr[y, x]
                    d["similarity"] = self._sigmoid(cr)

                co_df.append(d)
            
            folded += co_df
        
        folded = pd.DataFrame(folded)
        return folded
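
The matching idea inside `OpportunityMatcher` (embed both texts, then compare every opportunity with every collaboration) can be sketched without the encoder. The 3-dimensional vectors and the labels below are hypothetical stand-ins for real encoder output, and the cosine similarity is written out with NumPy instead of scikit-learn.

```python
import numpy as np
import pandas as pd


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Hypothetical embeddings standing in for encoder output.
opportunity_vectors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
collaboration_vectors = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

sim = cosine_similarity_matrix(opportunity_vectors, collaboration_vectors)

# Long table with one row per (opportunity, collaboration) pair.
pairs = pd.DataFrame(
    sim,
    index=["Use of recycling", "Access to new markets"],
    columns=["collab A", "collab B"],
).stack().rename("similarity").reset_index()
pairs.columns = ["opportunity_driver", "collaboration", "similarity"]
```

A long table like `pairs` could be merged onto the opportunity rows by `opportunity_driver`, which gives every row its similarity to each collaboration without looping over rows.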

In [None]:
def visualize_collaboration_crom(df):
    """
    Visualize collaboration.
    """

    def merge_driver(x):
        if isinstance(x, str) and x.startswith("Other"):
            return "Other"
        else:
            return x
    
    slider_t = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Time Horizon Weight:")
    select_t = alt.selection_single(fields=["time_horizon_weight"],
                                    bind=slider_t, init={"time_horizon_weight": 1.0})

    slider_l = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Likelihood Weight:")
    select_l = alt.selection_single(fields=["likelihood_weight"],
                                    bind=slider_l, init={"likelihood_weight": 1.0})

    slider_i = alt.binding_range(min=0.5, max=2.0, step=0.1, name="Impact Weight:")
    select_i = alt.selection_single(fields=["impact_weight"],
                                    bind=slider_i, init={"impact_weight": 1.0})
    
    columns = ["EBITDA", DEPR_AMOR]
    selector_kind = alt.binding_select(options=columns, name="Calculation Base")
    select_kinds = alt.selection_single(fields=["column"], bind=selector_kind, init={"column": "EBITDA"})

    select_co = alt.selection_multi(fields=["collaboration"])
    
    _df = df.dropna(subset=["EBITDA", DEPR_AMOR, "emissions"])

    data = pd.DataFrame({
            "year": _df["survey_year"].apply(str),
            "collaboration": _df["collaboration"],
            "similarity": _df["similarity"],
            "t": _df["time_horizon_value"],
            "l": _df["likelihood_value"],
            "i": _df["impact_value"],
            "EBITDA": quantitive_evaluation(_df, "EBITDA") * _df["similarity"],
             DEPR_AMOR: quantitive_evaluation(_df, DEPR_AMOR) * _df["similarity"]
           })
    
    data = data.groupby(["year", "collaboration"]).median().reset_index()
    base = alt.Chart(data).transform_calculate(
                qualitative=(datum.t * select_t.time_horizon_weight + datum.l * select_l.likelihood_weight + datum.i * select_i.impact_weight) * datum.similarity / 3
           ).transform_fold(
                columns,
                as_=["column", "quantitive"]
           ).transform_filter(
                select_kinds
           ).transform_filter(
                select_co
           ).transform_calculate(
                crom=datum.qualitative * datum.quantitive * datum.similarity
           ).add_selection(
                select_t, select_l, select_i, select_kinds, select_co
            )
    
    location = base.mark_circle().encode(
                x=alt.X("quantitive:Q"),
                y=alt.Y("qualitative:Q", scale=alt.Scale(domain=[1, 3])),
                size=alt.Size("crom:Q", scale=alt.Scale(align=1.5, domain=[0, 5])),
                color="collaboration",
                tooltip=["year", "collaboration"]
            )
    
    time_series = base.mark_line().encode(
                    x="year",
                    y=alt.Y("sum(crom):Q", scale=alt.Scale(domain=[0, 5])),
                    color="collaboration"
                  ).properties(
                    width=250
                  )
    
    return time_series | location

In [None]:
def show_collaboration(city_name, year, city_collaborations, qq_df):
    collaboration = get_collaboration(city_collaborations, city_name, year)
    matcher = OpportunityMatcher(collaboration)
    qq_df_with_similarity = matcher.add_similarity_weight(qq_df, "opportunity_driver")
    return visualize_collaboration_crom(qq_df_with_similarity)

In [None]:
show_collaboration("Tokyo Metropolitan Government", 2020, city_collaborations, qq_df)

* The CROM score increases because the quantitative value increases.
* "We are implementing a collaborative research which leverages private companies..." is the most effective collaboration.

You can evaluate other cities' collaborations in the same way.


# Conclusion

I have shown that CROM is useful for evaluating climate-related opportunities, and that combining it with Natural Language Processing reveals actionable points. CROM is very simple yet practical, and it offers many points of customization. I think customizability is necessary because the state of climate change shifts from moment to moment, which means priorities must be updated in a timely manner.

To conclude, CROM is the most suitable metric for triaging climate-related opportunities and evaluating collaboration points.