# openmod Parse

This notebook scrapes the [openmod initiative](https://wiki.openmod-initiative.org) wiki for every energy model it lists.

An earlier version of this notebook was published by Samuel Dotson for a [Ph.D. proposal](https://github.com/samgdotson/2023-dotson-prelim).

In [1]:
import requests
from lxml import etree
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

# Show full cell contents and all rows when displaying DataFrames
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

In [2]:
def has_no_class(tag):
    return not tag.has_attr('class')

In [3]:
URL = "https://wiki.openmod-initiative.org/wiki/Open_Models"
BASE_URL = "https://wiki.openmod-initiative.org"
page = requests.get(URL, timeout=30)
page.raise_for_status()  # fail early if the wiki cannot be reached

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id='mw-content-text')
list_elements = results.find_all('ul')

In [4]:
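# Lowercase column names for the combined table; they are assigned by
# position, so the order must match the columns produced by pd.concat below
cols = ["institution",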
    "doi",
    "citation",
    "computation time comments",
    "computation time minutes",
    "contact email",
    "contact persons",
    "data availability",
    "decisions",
    "deterministic",
    "full model name",
    "georegions",
    "georesolution",
    "is suited for many scenarios",
    "license",
    "math model type",
    "math model type shortdesc",
    "model class",
    "model source public",
    "modelling software",
    "open future",
    "open source licensed",
    "processing software",
    "report references",
    "sectors",
    "source download",
    "technologies",
    "text description",
    "time resolution",
    "website",
    "objective",
    "url",
    "authors",
    "example research questions",
    "math objective",
    "network coverage",
    "number of variables"]


In [5]:
model_list = list_elements[-1]

# Loop over every model linked from the list and scrape its wiki page
frames = []
for a in model_list.find_all('a', href=True):

    model_name = list(a.children)[0]
    print(model_name)

    # Build the URL of the model's wiki page
    MODEL_URL = BASE_URL + a['href']
    print(MODEL_URL)

    # Parse every HTML table on the model page
    df_list = pd.read_html(MODEL_URL)

    # The objective is read from the row labelled 4 of the fourth table
    obj = df_list[3].loc[4].values[1]

    # Transpose the fifth table so that the model becomes a single row
    # and the field names in its first row become the column labels
    large_df = df_list[4]
    large_df = large_df.rename(columns={1: f'{model_name}'}).T
    large_df.columns = large_df.iloc[0]
    large_df.drop(large_df.index[0], inplace=True)
    large_df['Objective'] = obj
    large_df['URL'] = MODEL_URL

    frames.append(large_df)

AMIRIS
https://wiki.openmod-initiative.org/wiki/AMIRIS
ASAM
https://wiki.openmod-initiative.org/wiki/ASAM
Antares-Simulator
https://wiki.openmod-initiative.org/wiki/Antares-Simulator
AnyMOD
https://wiki.openmod-initiative.org/wiki/AnyMOD
Backbone
https://wiki.openmod-initiative.org/wiki/Backbone
Balmorel
https://wiki.openmod-initiative.org/wiki/Balmorel
Breakthrough Energy Model
https://wiki.openmod-initiative.org/wiki/Breakthrough_Energy_Model
CAPOW
https://wiki.openmod-initiative.org/wiki/CAPOW
CESAR-P
https://wiki.openmod-initiative.org/wiki/CESAR-P
Calliope
https://wiki.openmod-initiative.org/wiki/Calliope
CapacityExpansion
https://wiki.openmod-initiative.org/wiki/CapacityExpansion
DESSTinEE
https://wiki.openmod-initiative.org/wiki/DESSTinEE
DIETER
https://wiki.openmod-initiative.org/wiki/DIETER
Demod
https://wiki.openmod-initiative.org/wiki/Demod
Dispa-SET
https://wiki.openmod-initiative.org/wiki/Dispa-SET
DynPP
https://wiki.openmod-initiative.org/wiki/DynPP
EA-PSM Electric Arc Fl

In [6]:
model_df = pd.concat(frames, axis=0)
model_df.columns = cols
model_df.index.names = ['Model']

In [7]:
model_df.to_csv('results/esom_database_raw.csv', encoding='utf-8')

# Process Data

* Filter for the frameworks that have open licenses AND have open source code.
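
The filter described above could look roughly like the sketch below. It assumes that "open licenses" maps to the `open source licensed` column and "open source code" maps to the `model source public` column, and that both columns hold affirmative text such as "Yes" or "true". Neither assumption is confirmed by the scraped data shown in this notebook. The helper names `is_truthy` and `filter_open_models`, the `TRUTHY` set, and the toy table are hypothetical and for illustration only.

```python
import pandas as pd

# Assumption: affirmative flags appear as text like "Yes", "true" or "1".
# Adjust this set after inspecting the actual values in the scraped columns.
TRUTHY = {"yes", "true", "1"}


def is_truthy(series: pd.Series) -> pd.Series:
    """Return a boolean mask marking values that look affirmative."""
    return series.astype(str).str.strip().str.lower().isin(TRUTHY)


def filter_open_models(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows that are open source licensed AND have public source code."""
    mask = is_truthy(df["open source licensed"]) & is_truthy(df["model source public"])
    return df[mask]


# Toy data for illustration only (not taken from the wiki)
demo = pd.DataFrame(
    {
        "open source licensed": ["Yes", "No", "true", "-999"],
        "model source public": ["Yes", "Yes", "false", "-999"],
    },
    index=pd.Index(["model_a", "model_b", "model_c", "model_d"], name="Model"),
)

open_df = filter_open_models(demo)
print(open_df.index.tolist())
```

In the toy table only `model_a` satisfies both conditions. The `-999` placeholder used for missing values is treated as not open, which is a conservative choice.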

In [22]:
model_df = pd.read_csv('results/esom_database_raw.csv', encoding='utf-8', index_col='Model')

In [23]:
model_df.fillna(str(-999), inplace=True)

In [19]:
print(f"The parser found {len(model_df.index)} distinct energy models on the openmod wiki.")

The parser found 90 distinct energy models on the openmod wiki.


In [28]:
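# Strip surrounding spaces and '+' signs; non-string values pass through unchanged
strip_str = lambda x: x.strip(' +') if isinstance(x, str) else x
strip_series = lambda x: x.apply(strip_str)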

In [31]:
clean_df = model_df.apply(strip_series)

In [33]:
clean_df[['objective','modelling software', 'processing software']]

Model,objective,modelling software,processing software
AMIRIS,-999,Java,Python
ASAM,-999,Python (Pyomo),"Python, PyPSA, Mesa"
Antares-Simulator,"socio-economic welfare, investment costs, greenhouse gas emissions","C++, C","Python, TypeScript"
AnyMOD,"cost minimization by default, can set other objectives",Julia/JuMP,-999
Backbone,Cost minimization; emission minimization;,GAMS,Spine Toolbox or Excel
Balmorel,economic costs,GAMS,-999
Breakthrough Energy Model,Minimize cost,Julia/JuMP,Python
CAPOW,Cost minimization,Python (Pyomo),-999
CESAR-P,-999,"Python, EnergyPlus",-999
Calliope,"User-dependent, including financial cost, CO2, and water consumption",Python (Pyomo),Python (pandas et al)


In [7]:
model_df.columns

Index(['institution', 'doi', 'citation', 'computation time comments',
       'computation time minutes', 'contact email', 'contact persons',
       'data availability', 'decisions', 'deterministic', 'full model name',
       'georegions', 'georesolution', 'is suited for many scenarios',
       'license', 'math model type', 'math model type shortdesc',
       'model class', 'model source public', 'modelling software',
       'open future', 'open source licensed', 'processing software',
       'report references', 'sectors', 'source download', 'technologies',
       'text description', 'time resolution', 'website', 'objective', 'url',
       'authors', 'example research questions', 'math objective',
       'network coverage', 'number of variables'],
      dtype='object')

In [7]:
other_models = [
"REMix",  # done
"PRIMES",  # added
"MARKAL",  # deprecated, now TIMES
"METIS",  # done
"ENSYSI",
"OSeMOSYS",  # done
"SimREN",  # EnergyPLAN
"NEMS",  # done
"POLES",
"OPERA",  # done
"EnergyPLAN",  # done
"IWES",
"ESME",  # done
"STREAM",  # EnergyPLAN
"ETM",  # done
"LEAP",  # EnergyPLAN
"E4Cast",  # EnergyPLAN
"DynEMo",
"IKARUS",  # EnergyPLAN
]

In [8]:
has_openmod_entry = [(i in model_df.index) for i in other_models]
for m,b in zip(other_models, has_openmod_entry):
    print(m,b)

REMix False
PRIMES False
MARKAL False
METIS False
ENSYSI False
OSeMOSYS True
SimREN False
NEMS False
POLES False
OPERA False
EnergyPLAN False
IWES False
ESME False
STREAM False
ETM False
LEAP False
E4Cast False
DynEMo False
IKARUS False
