# Defining the Bioeconomy

For studying the sustainability transition of the Swedish Forest-based Bioeconomy, a definition of the bioeconomy is needed.

Previous research has struggled to define the bioeconomy from a theoretical standpoint, instead providing ad hoc operationalizations for each research project.
These definitions usually start from the central notion that a bioeconomy has at least something to do with the conversion of biomass for economic purposes.

Exactly which sectors are included depends on the research project.
For SWINNO, one previous definition exists.
In their working paper, Chaminade and Bayuo use the following sectors to filter SWINNO for innovations:

|     Sectors                                       |     SNI Code                          |
|:---------------------------------------------------|---------------------------------------:|
|     Agriculture                                   |     011, 012, 12, 17, 18, 14,15,19    |
|     Chemical and Chemical products                |     24                                |
|     Construction                                  |     45                                |
|     Energy                                        |     10                                |
|     Engineering (machine and equipment)           |     29                                |
|     Forestry and logging                          |     20                                |
|     Fishery                                       |     050                               |
|     Food and beverages/tobacco                    |     15,16                             |
|     Health & Social Welfare - Pharmaceuticals, Medical Equipment     |     85, 33     |
|     Paper and paper products                      |     21                                | 

: Sector Codes Used in Chaminade and Bayuo

In addition, they use keywords to capture innovation outside the specified sector boundaries relating to the bioeconomy.
The keywords are: Biochemicals, Biofoods, Biofuel, Biogas, Biogenetic, Biological, Biomass, Biopharma, Bioplant, Biorefineries, Biotech, Biotechnology, Biomaterial, Biotextiles, Biodiversity, Co2, Carbondioxide, Climate, Ecological, Ecology, Environment, GMO, Greenhouse, Green, Food, Forest, Forestry, Fishing, Organic, Paper, Pulp, Renewable, Recycling, Recycle, Timber, Waste, Wood.

With this combination they find 560 innovations, which after manually checking for false positives is reduced to 360 innovations related to the bioeconomy by source or use.
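
The share of false positives implied by these two figures can be worked out directly, assuming that every innovation dropped in the manual check was a false positive:

$$
\frac{560 - 360}{560} = \frac{200}{560} \approx 35.7\%
$$

In other words, roughly one in three hits from the combined sector and keyword filter was discarded on manual inspection.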

I propose a similar approach, combining sector codes of product origin and use with keywords.
However, I take the value chain of the forest-based bioeconomy as the starting point for my definition.
@wolfslehner2016ForestBioeconomyNew [17] illustrate the forest product value chain, shown in @fig-forest-value-chain.

![Forest Sector Value Chain](../../assets/forest-sector-value-chain.png){#fig-forest-value-chain}

For the classification, this implies that capturing the bioeconomy requires including the core forestry sectors.
To capture innovation outside these sectors, it is best to follow the products produced by them.
For this, one must first identify all, or at least the most relevant, final and intermediate products originating from forests.

## Different Sectors of the Forest Bioeconomy

An argument can be made that the core sectors of the forest bioeconomy are too restrictive given the services rendered by the forest ecosystem.
The ecosystem service concept distinguishes four categories of services from which humans benefit: provisioning, supporting, regulating and cultural.
Of those, provisioning services are the easiest to attribute and should, for the most part, be covered by the sectors selected below.

|SNI Code | Description                            |
|--------:|:---------------------------------------|
|02       | Forestry and related services
|20       | Wood and wood product manufacturing except furniture
|21       | Pulp, paper and paper product manufacturing
|36       | Furniture manufacturing; other manufacturing

: Core Sectors of the Forest Bioeconomy

Possible extensions of these core sectors to other sectors involved in the bioeconomy are presented in @tbl-new_sectors, organized by the reason for their identification.

| Category               | Code | Industry                                                                        |
|:-----------------------|-----:|:--------------------------------------------------------------------------------|
| Frequent in Literature | 17   | Textile manufacturing                                                           |
| Frequent in Literature | 19   | Tanning and dressing of leather; manufacture of luggage, handbags, and footwear |
| Frequent in Literature | 24   | Chemical and chemical product manufacturing                                     |
| Frequent in Literature | 25   | Rubber and plastic product manufacturing                                        |
| Frequent in Literature | 37   | Waste collection, treatment, and disposal activities; materials recovery        |
| Frequent in Literature | 40   | Electricity, gas, steam, and hot water supply                                   |
| Frequent in Literature | 45   | Construction                                                                    |
| Cultural               | 92   | Recreational, cultural, and sporting activities                                 |
| Cultural               | 93   | Other service activities                                                        |
| Provisioning           | 01   | Agriculture, hunting, and related services                                      |
| Provisioning           | 0113 | Growing of fruit, berries, nuts, herbs etc.                                     |
| Provisioning           | 0150 | Hunting, game keeping and related services                                      |
| Provisioning           | 1533 | Other processing and preserving of fruit, berries and vegetables                |
| Provisioning           | 05   | Fishing, aquaculture, and related services                                      |
| Provisioning           | 15   | Food and beverage manufacturing                                                 |
| Provisioning           | 18   | Clothing and fur product manufacturing                                          |

: Potential additional sectors {#tbl-new_sectors}

However, given the high rate of false positives, keywords are needed to filter these sectors.
Because of the way the query is structured, it appears more prudent to omit the additional sectors and instead apply the keywords to the descriptions of all innovations in SWINNO.
Apart from being more parsimonious, this also improves the accuracy of the query: any innovation that makes use of at least one forest-based material should be included in the forest-based bioeconomy, regardless of its sectoral use or origin.

The keywords are based on domain knowledge from the literature and from experience of working with SWINNO:

- "virke"
- "cellulos"
- "lignin"
- "spån"
- "bark"
- "levulinsyra" (Levulinic acid)
- "furfural" (Furfural)
- "svarttjära"
- "svartlut"
- "växtbas"
- "ved"
- "trä"
- "skog"
- "papper"
- "biobränsle"
- "biologisk"
- "nedbrytbar"
- "pappret"
- "karton"
- "tencel"

<!-- The following might be too coarse and yield too many false positives -->

- "gren"
- "kvist"
- "grönmaterial"
- "rot"
- "rött"
- "stubb"

The query is structured so that innovations are returned even if a keyword matches only part of a word in the description. For example, an innovation that uses träflis (wood chips) for energy is included because trä is matched in träflis.
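
The substring behaviour can be reproduced on a toy table. The sketch below is a minimal illustration, assuming an SQLite backend (the engine behind SWINNO is not stated here). The table `innovation_demo`, its rows, and the helper `build_like_clause` are hypothetical and only serve to show how a `LIKE '%keyword%'` filter can be generated from a keyword list.

```python
import sqlite3

# Hypothetical subset of the keyword list, for illustration only.
KEYWORDS = ["trä", "cellulos", "svartlut"]


def build_like_clause(column, keywords):
    """Return an OR-joined LIKE clause and the matching bound parameters."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in keywords)
    params = [f"%{kw}%" for kw in keywords]
    return clause, params


# In-memory toy table; the descriptions are invented examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE innovation_demo (sinno_id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO innovation_demo VALUES (?, ?)",
    [
        (1, "Panna som eldas med träflis"),
        (2, "Ny typ av kullager"),
        (3, "Process för återvinning av svartlut"),
    ],
)

clause, params = build_like_clause("description", KEYWORDS)
rows = conn.execute(
    f"SELECT sinno_id FROM innovation_demo WHERE {clause} ORDER BY sinno_id",
    params,
).fetchall()
matched_ids = [row[0] for row in rows]
print(matched_ids)
```

Generating the clause from one list keeps the documented keywords and the executed query in sync. Under the SQLite assumption, note that the built-in `LIKE` is case-insensitive only for ASCII characters, so a pattern such as `%trä%` would not match an all-caps `TRÄ`.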

This list of keywords is not yet exhaustive and more effort is needed to ensure that the rate of false negatives is minimized. 

The possibility of false positives is not concerning at this stage because, during classification, innovations outside the system boundaries are tagged as such for later exclusion.
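
Manual exclusion is easier when each hit records which keyword triggered it. The following sketch uses invented descriptions and a hypothetical helper, `matched_keywords`, to show how a short stem such as trä also fires inside an unrelated word like sträcka (to stretch), which is the kind of false positive the classification step has to catch.

```python
# Subset of the keyword list above, for illustration only.
KEYWORDS = ["trä", "ved", "skog"]


def matched_keywords(description, keywords=KEYWORDS):
    """Return the keywords that occur as substrings of the description."""
    text = description.lower()
    return [kw for kw in keywords if kw in text]


# Invented example descriptions, not taken from SWINNO.
examples = {
    "a": "Byggelement av korslimmat trä",     # genuine wood product
    "b": "Maskin som kan sträcka stålvajer",   # 'trä' inside 'sträcka'
    "c": "Programvara för lagerstyrning",      # no keyword present
}

hits = {key: matched_keywords(text) for key, text in examples.items()}
for key, kws in hits.items():
    print(key, kws)
```

Storing the triggering keywords next to each innovation would let a reviewer sort the hits by keyword and inspect the noisiest stems first.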



TODO: include reindeer in the keywords. The issue is that "ren" occurs too frequently inside other words or has synonyms. Alternatively, one could filter for specific reindeer terms such as "renkött" or "renarna". A cursory exploration yielded no results.

TODO: benzoin is a resin that should also be included as a keyword.

In [1]:
from IPython.display import display, Markdown

import pandas as pd
import seaborn as sns
import altair as alt
from src.swinno_helpers import connect_swinno_db

swinno_db = connect_swinno_db()


In [2]:
core = pd.read_sql_query(
    """
select
  i.sinno_id,
  i.innovation_name_in_swedish AS name,
  i.description_in_swedish AS description,
  i.additional_information_if_origin__new_scientific_discovery || i.additional_information_if_origin__new_technologies_or_materials || i.additional_info_if_origin__official_regulation_legislation_and_standards || i.additional_information_if_origin__solution_for_a_problem || i.additional_information_if_origin__performance || i.additional_information_if_origin__other AS info,
  i.year_of_commercialization AS year
from
  innovation i
  join use_sectors us on us.sinno_id = i.sinno_id
where
  us.use_sectors like '02%'
  or us.use_sectors like '20%'
  or us.use_sectors like '21%'
  or us.use_sectors like '36%'
  or product_code like '02%'
  or product_code like '20%'
  or product_code like '21%'
  or product_code like '36%';
""",
    swinno_db,
)


In [40]:
bioeconomy = pd.read_sql(
    """
select
  i.sinno_id,
  i.innovation_name_in_swedish AS name,
  i.description_in_swedish AS description,
  i.additional_information_if_origin__new_scientific_discovery || i.additional_information_if_origin__new_technologies_or_materials || i.additional_info_if_origin__official_regulation_legislation_and_standards || i.additional_information_if_origin__solution_for_a_problem || i.additional_information_if_origin__performance || i.additional_information_if_origin__other AS info,
  i.year_of_commercialization AS year,
  us.use_sectors
from
  innovation i
  join use_sectors us on i.sinno_id = us.sinno_id
where
  (
    us.use_sectors like '02%'
    or us.use_sectors like '20%'
    or us.use_sectors like '21%'
    or us.use_sectors like '36%'
    or product_code like '02%'
    or product_code like '20%'
    or product_code like '21%'
    or product_code like '36%'
  )
  or (
    description like '%virke%'
    or description like '%cellulos%'
    or description like '%lignin%'
    or description like '%spån%'
    or description like '%bark%'
    or description like '%levulinsyra%'
    or description like '%furfural%'
    or description like '%svarttjära%'
    or description like '%svartlut%'
    or description like '%växtbas%'
    or description like '%ved%'
    or description like '%trä%'
    or description like '%skog%'
    or description like '%biobränsle%'
    or description like '%biologisk%'
    or description like '%nedbrytbar%'
    or description like '%papper%'
    or description like '%pappret%'
    or description like '%karton%'
    or description like '%tencel%'
  );
""",
    swinno_db,
)


In [41]:
count_core = core["sinno_id"].nunique()
count_bioeconomy = bioeconomy["sinno_id"].nunique()
# Share of core-sector innovations in the full bioeconomy query, in percent.
core_share = round(count_core / count_bioeconomy * 100, 2)

display(
    Markdown(
        f"The number of innovations in the forest based bioeconomy sector as defined above is {count_bioeconomy}, "
        f"with {count_core} innovations in the core sector. "
        f"This means that at least {core_share}% of the innovations in the forest based bioeconomy "
        "stem from traditional forest sector activities. "
        "These values are based on uncleaned queries of SWINNO and may include false positives, "
        "especially the full bioeconomy query."
    )
)


The number of innovations in the forest based bioeconomy sector as defined above is 892, with 719 innovations in the core sector. This means that at least 80.61% of the innovations in the forest based bioeconomy stem from traditional forest sector activities. These values are based on uncleaned queries of SWINNO and may include false positives, especially the full bioeconomy query.      

In [42]:
sni_codes = pd.read_sql_query(
    """
select
*
from 
sni_codes
""",
    swinno_db,
)

sni_codes = sni_codes.rename(columns={"code": "use_sectors"})


In [69]:
def plot_innovation_counts_by_sector(df, title):
    """Bar chart of unique innovations per two-digit sector, sorted by descending count."""
    # Only the id and the sector code are needed from the input frame.
    df = df[["sinno_id", "use_sectors"]].copy()
    # Collapse detailed SNI codes to their two-digit sector.
    df["sector"] = df["use_sectors"].str[:2]
    df = (
        df.groupby("sector")["sinno_id"]
        .nunique()
        .sort_values(ascending=False)
        .reset_index()
    )
    df = df.rename(columns={"sinno_id": "count"})
    # Attach the sector names; the "label" column is assumed to come from sni_codes.
    df = df.join(sni_codes.set_index("use_sectors"), on="sector", how="left")

    chart = (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=alt.X("count:Q", title="Number of innovations"),
            y=alt.Y("label:N", title="Sector", sort="-x"),
            tooltip=["count"],
        )
        .properties(title=title, width=900)
    )

    return chart


plot_innovation_counts_by_sector(bioeconomy, "Innovation counts by sector in SWINNO")


In [102]:
swinno = pd.read_sql_query(
    """
select sinno_id, year_of_commercialization as year, innovation_name_in_swedish as name
from innovation;
""",
    swinno_db,
)


In [104]:
swinno["bioeconomy"] = swinno["sinno_id"].isin(bioeconomy["sinno_id"].unique())

swinno.loc[:, ["bioeconomy"]].sum()


bioeconomy    892
dtype: int64

In [124]:
def plot_innovation_counts_by_year(df, title):
    df = df.copy()
    df = df[["sinno_id", "year", "bioeconomy"]]
    df = (
        df.groupby("year")
        .agg(
            total_count=("sinno_id", "nunique"), bioeconomy_count=("bioeconomy", "sum")
        )
        .reset_index()
    )
    df = pd.melt(
        df,
        id_vars=["year"],
        value_vars=["total_count", "bioeconomy_count"],
        var_name="type",
        value_name="count",
    )

    chart = (
        alt.Chart(df)
        .mark_line()
        .encode(
            x=alt.X("year:O", title="Year"),
            y=alt.Y("count:Q", title="Number of innovations"),
            color=alt.Color("type:N", title="Type"),
            tooltip=["count", "year"],
        )
        .transform_filter(alt.datum.year >= 1970)
        .properties(title=title, width=900)
        .configure_title(font="Inconsolata")
        .configure_axis(labelFont="Inconsolata", titleFont="Inconsolata")
        .interactive()
    )

    return chart


plot_innovation_counts_by_year(swinno, "Innovation counts by year in SWINNO")


In [176]:
def calculate_bioeconomy_share(df):

    df = df.copy()
    df = df[["sinno_id", "year", "bioeconomy"]]
    df = (
        df.groupby("year")
        .agg(
            total_count=("sinno_id", "nunique"), bioeconomy_count=("bioeconomy", "sum")
        )
        .reset_index()
    )
    df["share_bioeconomy"] = df["bioeconomy_count"] / df["total_count"]

    return df


def plot_innovation_counts_by_year(df, title):
    df = calculate_bioeconomy_share(df)

    base = alt.Chart(df).encode(alt.X("year:O", axis=alt.Axis(title="Year")))

    count = (
        base.mark_line(color="red", opacity=0.6)
        .encode(
            alt.Y("bioeconomy_count:Q", title="Count of Bioeconomy Innovations"),
            tooltip=["bioeconomy_count", "share_bioeconomy", "year"],
        )
        .interactive()
    )

    share = (
        base.mark_line()
        .encode(
            alt.Y("share_bioeconomy:Q", title="Share of Bioeconomy Innovations"),
            tooltip=["bioeconomy_count", "share_bioeconomy", "year"],
        )
        .interactive()
    )

    chart = (
        alt.layer(count, share)
        .resolve_scale(y="independent")
        .transform_filter((alt.datum.year >= 1970) & (alt.datum.year <= 2010))
        .properties(title=title, width=900)
        .configure_title(font="Inconsolata")
        .configure_axis(labelFont="Inconsolata", titleFont="Inconsolata")
    )

    return chart


plot_innovation_counts_by_year(swinno, "Innovation counts by year in SWINNO")


The plot above suggests that the innovation rate for bioeconomy innovations closely follows the overall rate of innovations in Sweden.
Considering that it should be more difficult to get a high share if there are few innovations, the graph suggests that the 1990s were a period in which more bioeconomy innovations were commercialized than in the years before and after.

From this chart, there does not appear to have been a sustained increase in bioeconomy innovations over the period.
This might contradict the findings of Josef's environmental innovation study.

Still, a bioeconomy that at its peak contributes almost 30% of the country's total innovation count is substantial.
How does this look in different countries and using different metrics (patents)?

<!-- TODO find benchmarks: countries and patents -->
<!-- TODO: check -->

<!-- TODO Is this correct? -->

In [178]:
bio_share = calculate_bioeconomy_share(swinno)
alt.Chart(bio_share).mark_line(point=True).encode(
    alt.X("bioeconomy_count", scale=alt.Scale(domain=[0, 180])),
    alt.Y("total_count", scale=alt.Scale(domain=[0, 180])),
    order="year",
    tooltip=["bioeconomy_count", "total_count", "year"],
).transform_filter((alt.datum.year >= 1970) & (alt.datum.year <= 2010))


This chart suggests that there is no clear trend towards a higher share of bioeconomy innovations in total innovations.
Any improvement is quickly reversed.

## Next Steps

TODO: time series analysis

- stationarity
- Granger causality?
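
For the stationarity item, a rough first look before any formal unit-root test is to compare the lag-1 autocorrelation of a yearly series with that of its first difference. The sketch below uses only the standard library. The series values are invented placeholders, and `lag1_autocorr` and `first_difference` are hypothetical helpers. It is an informal diagnostic under these assumptions, not a stationarity test.

```python
def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / denom


def first_difference(series):
    """Year-on-year changes of the series."""
    return [b - a for a, b in zip(series, series[1:])]


# Invented yearly shares for illustration; replace with the share_bioeconomy column.
share = [0.10, 0.12, 0.15, 0.17, 0.20, 0.22, 0.25, 0.27, 0.30, 0.32]

rho_level = lag1_autocorr(share)
rho_diff = lag1_autocorr(first_difference(share))
print(f"lag-1 autocorrelation, level: {rho_level:.2f}")
print(f"lag-1 autocorrelation, first difference: {rho_diff:.2f}")
```

A level series whose lag-1 autocorrelation is high while that of its first difference is not would be a hint that differencing is worth considering before any causality analysis. A formal test would still be needed to support that conclusion.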