Notes on how to install the PyPSA-Eur framework and to run models based on it in GenomeDK

ricnogfer/pypsa-in-genomedk

PyPSA-Eur in GenomeDK

Description

The instructions below describe how to install the PyPSA-Eur framework (version 0.8.1 or higher) and run models based on it on GenomeDK, a high-performance computing facility for the life sciences in Denmark serving scientists, students and SMEs. Although GenomeDK was primarily meant for the life sciences, it is equally well suited to running any CPU-intensive task, including energy system models.

Installation

  1. Request an account to get access to GenomeDK here.

  2. Once the account has been created (an email will be sent to the user announcing this fact), log in to GenomeDK through ssh by opening a terminal and executing the following (replace <username> with the information provided upon requesting an account in step 1):

    privateusername@privatemachine:~$ ssh <username>@login.genome.au.dk
  3. Once logged in to GenomeDK, create a directory named workspace by executing the following:

    username@genomedkfrontend:~$ mkdir workspace
  4. Change into the workspace directory by executing the following:

    username@genomedkfrontend:~$ cd workspace
  5. Download an Anaconda installation file for Linux 64 bit by executing the following (replace <version> with the version of Anaconda of interest):

    username@genomedkfrontend:~/workspace$ wget https://repo.anaconda.com/archive/Anaconda3-<version>-Linux-x86_64.sh
  6. An Anaconda installation file for 64-bit Linux is a shell script with a name similar to Anaconda3-<version>-Linux-x86_64.sh, where <version> refers to a specific version of Anaconda (e.g. Anaconda3-2023.03-1-Linux-x86_64.sh). Launch the installer by executing the following (replace <version> with the version of Anaconda downloaded in step 5):

    username@genomedkfrontend:~/workspace$ bash Anaconda3-<version>-Linux-x86_64.sh
  7. Download a Gurobi installation file for Linux 64 bit by executing the following (replace <version> with the version of Gurobi of interest):

    username@genomedkfrontend:~/workspace$ wget https://packages.gurobi.com/<version>/gurobi<version>_linux64.tar.gz
  8. A Gurobi installation file for 64-bit Linux is a compressed archive with a name similar to gurobi<version>_linux64.tar.gz, where <version> refers to a specific version of Gurobi (e.g. gurobi9.5.2_linux64.tar.gz). Decompress the archive by executing the following (this will create a directory named gurobi<version> containing Gurobi, where <version> refers to the version of Gurobi downloaded in step 7):

    username@genomedkfrontend:~/workspace$ tar -xzvf gurobi<version>_linux64.tar.gz
  9. Gurobi requires a license to run. Aarhus University maintains a server (gurobi.licsrv.au.dk) that provides a license for Gurobi. To obtain the license, Gurobi needs a file describing how to access this server. Create that file by executing the following (this will create a file named gurobi.lic containing the information needed to access the server):

    username@genomedkfrontend:~/workspace$ echo -e "TOKENSERVER=gurobi.licsrv.au.dk\nPORT=41954" > gurobi.lic
  10. Add the following four lines to the .bashrc file (found in the user's home directory on GenomeDK). Replace <username> with the information provided upon requesting an account in step 1 and <version> with the version of Gurobi downloaded in step 7, written without the dots separating its major, minor and revision numbers (e.g. 952 for version 9.5.2):

    export GRB_LICENSE_FILE="/home/<username>/workspace/gurobi.lic"
    export GUROBI_HOME="/home/<username>/workspace/gurobi<version>/linux64"
    export PATH="${GUROBI_HOME}/bin:${PATH}"
    export LD_LIBRARY_PATH="${GUROBI_HOME}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    

    Execute source ~/.bashrc for the addition to the .bashrc file to take effect on the current terminal or, alternatively, close the terminal and open a new one (every time a new terminal is opened, the file is read again).

  11. Install the Gurobi Python package in Anaconda by executing the following (replace <version> with the version of Gurobi downloaded in step 7):

    username@genomedkfrontend:~/workspace$ conda install -c gurobi gurobi=<version>
  12. After installing Gurobi and enabling Anaconda to properly interact with it, clone PyPSA-Eur by executing the following (this will create a directory named pypsa-eur containing this framework):

    username@genomedkfrontend:~/workspace$ git clone https://github.com/PyPSA/pypsa-eur.git
  13. Change into the pypsa-eur directory by executing the following:

    username@genomedkfrontend:~/workspace$ cd pypsa-eur
  14. Copy the PyPSA-Eur default configuration file config.default.yaml (found in directory config) to a new file named config.yaml by executing the following:

    username@genomedkfrontend:~/workspace/pypsa-eur$ cp config/config.default.yaml config/config.yaml
  15. Create an Anaconda environment tailored for PyPSA-Eur by executing the following (this might take a while to complete):

    username@genomedkfrontend:~/workspace/pypsa-eur$ conda env create -f envs/environment.yaml
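
Before moving on, it may help to confirm that the Gurobi settings from steps 9 and 10 are in place. The following is a minimal sketch, not part of PyPSA-Eur or Gurobi: the function name check_gurobi_setup is hypothetical, and it only assumes the two environment variables and the two license keys (TOKENSERVER and PORT) introduced above.

```python
import os
from pathlib import Path


def check_gurobi_setup(env):
    """Return a list of problems found in the Gurobi setup (empty if none)."""
    problems = []

    license_file = env.get("GRB_LICENSE_FILE")
    if not license_file:
        problems.append("GRB_LICENSE_FILE is not set")
    elif not Path(license_file).is_file():
        problems.append(f"license file not found: {license_file}")
    else:
        # The license file written in step 9 holds KEY=VALUE lines.
        entries = dict(
            line.split("=", 1)
            for line in Path(license_file).read_text().splitlines()
            if "=" in line
        )
        for key in ("TOKENSERVER", "PORT"):
            if key not in entries:
                problems.append(f"{key} missing from {license_file}")

    gurobi_home = env.get("GUROBI_HOME")
    if not gurobi_home:
        problems.append("GUROBI_HOME is not set")
    elif not Path(gurobi_home, "bin").is_dir():
        # Step 10 adds ${GUROBI_HOME}/bin to PATH, so it should exist.
        problems.append(f"no bin directory under GUROBI_HOME: {gurobi_home}")

    return problems


if __name__ == "__main__":
    for problem in check_gurobi_setup(os.environ):
        print("WARNING:", problem)
```

Running the script in a new terminal prints one warning per problem and nothing when both variables point at existing locations. It does not contact the license server, so a clean result only means the local files are where .bashrc says they are.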

Usage

  1. Log in to GenomeDK through ssh by opening a terminal and executing the following (replace <username> with the information provided upon requesting an account in step 1 of the installation procedure):

    privateusername@privatemachine:~$ ssh <username>@login.genome.au.dk
  2. Once logged in to GenomeDK, users are initially assigned to one of its front-end machines. GenomeDK front-end machines limit each user to a maximum of 1000 processes at any point in time, which is rather low. As a result, PyPSA-Eur will likely fail to build/run there, showing messages such as "ERROR; return code from pthread_create() is 11", "OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 16", "Resource temporarily unavailable" and "OpenBLAS blas_thread_init: RLIMIT_NPROC 1000 current, 1000 max". The solution is to build/run PyPSA-Eur on one of GenomeDK's back-end machines, where this limit is much higher. Therefore, from a GenomeDK front-end machine, log in to one of its back-end machines by executing the following:

    username@genomedkfrontend:~$ srun --mem 16G --pty bash

    After a few seconds, a back-end machine with 16 GB of RAM and running the Bash shell will be assigned to the user. It is crucial to explicitly allocate (at least) this amount of RAM, because the default value (currently set to 10 GB by GenomeDK administrators) will likely cause PyPSA-Eur to fail to build/run: some of its (Snakemake) rules require a substantial amount of RAM, and when the allocation is exceeded the OS abruptly terminates the process and shows only the cryptic message Killed.

  3. Once logged in to one of GenomeDK's back-end machines, change into the workspace/pypsa-eur directory by executing the following:

    username@genomedkbackend:~$ cd workspace/pypsa-eur
  4. Set the relevant configuration options in file config.yaml (copied in step 14 of the installation procedure) with appropriate values representing the model to run in GenomeDK - see file config.default.yaml to get an enumeration of the configuration options available (a thorough explanation about these can be found here). Some configuration options that are usually set with new values are run:name, scenario:ll, scenario:clusters and scenario:sector_opts, to name a few. In addition, setting configuration option run:shared_resources to true may be a good idea to avoid downloading and processing the same resources every time a new run is launched (otherwise the run will take additional time to complete).

  5. Activate the Anaconda environment tailored for PyPSA-Eur (created in step 15 of the installation procedure) by executing the following:

    username@genomedkbackend:~/workspace/pypsa-eur$ conda activate pypsa-eur
  6. Run the PyPSA-Eur model configured in step 4 by executing the following (this might take a while to complete, especially the first time it runs):

    username@genomedkbackend:~/workspace/pypsa-eur$ snakemake -call all -j1

    Note that an error may occur (yaml.reader.ReaderError: unacceptable character #x009f: special characters are not allowed) when using tools such as PuTTY or WSL (from Windows) to connect to GenomeDK and perform this step. The solution is to connect to the machine using ssh from a Linux distribution instead.
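
The process limit described in step 2 can be checked before launching a run. The sketch below is an illustration only: it uses Python's standard resource module to read the per-user process limit (RLIMIT_NPROC) of the current machine, and the threshold of 1000 is simply the front-end value quoted above. The function name nproc_limit_too_low is hypothetical.

```python
import resource

# Assumption: 1000 is the front-end limit quoted in step 2; any limit at or
# below it is treated as too low for building/running PyPSA-Eur.
FRONTEND_NPROC_LIMIT = 1000


def nproc_limit_too_low(soft_limit, threshold=FRONTEND_NPROC_LIMIT):
    """Return True if the per-user process limit is at or below the threshold."""
    if soft_limit == resource.RLIM_INFINITY:
        # No limit configured at all.
        return False
    return soft_limit <= threshold


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if nproc_limit_too_low(soft):
        print(f"Process limit is {soft}: consider moving to a back-end machine with srun.")
    else:
        print(f"Process limit is {soft}: above the front-end limit.")
```

If the script reports a limit of 1000, the terminal is most likely still on a front-end machine and the srun command from step 2 has not been executed yet.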

Tips & Tricks

  1. To avoid slowing down GenomeDK due to Gurobi constantly reading and writing temporary files, it is a good idea to point the temporary directory to scratch storage. Although option solving:tmpdir (found in the PyPSA-Eur default configuration file config.default.yaml in directory config) was originally implemented with this idea in mind, in practice it has no effect because the option is currently disabled. The workaround is to pass an additional argument specifying the temporary directory when calling function n.optimize or n.optimize.optimize_transmission_expansion_iteratively (in function solve_network found in the Python script solve_network.py), as follows:

    import os
    from pathlib import Path

    # Per-job scratch directory (SLURM_JOB_ID is set by Slurm inside a job).
    tmpdir = "/scratch/%s" % os.environ["SLURM_JOB_ID"]
    Path(tmpdir).mkdir(parents=True, exist_ok=True)

    # n, solver_name, solver_options, kwargs and the other names used below
    # are the variables already available inside function solve_network.
    if skip_iterations:
        status, condition = n.optimize(
            solver_name=solver_name,
            model_kwargs={"solver_dir": tmpdir},
            extra_functionality=extra_functionality,
            **solver_options,
            **kwargs,
        )
    else:
        status, condition = n.optimize.optimize_transmission_expansion_iteratively(
            solver_name=solver_name,
            model_kwargs={"solver_dir": tmpdir},
            track_iterations=track_iterations,
            min_iterations=min_iterations,
            max_iterations=max_iterations,
            extra_functionality=extra_functionality,
            **solver_options,
            **kwargs,
        )
  2. To avoid having to provide user credentials every time a connection to GenomeDK is made, it is a good idea to set up public key authentication. First, generate a key pair using a tool named ssh-keygen by executing the following (press ENTER when asked questions by ssh-keygen, as the default values are appropriate for this case):

    privateusername@privatemachine:~$ ssh-keygen

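    After generating the key pair, copy the public key to GenomeDK by executing the following (replace <username> with the information provided upon requesting an account in step 1 of the installation procedure; if ssh-keygen created a key of a different type, e.g. ~/.ssh/id_ed25519, use that file name instead):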

    privateusername@privatemachine:~$ ssh-copy-id -i ~/.ssh/id_rsa <username>@login.genome.au.dk

    From this point on, every time a connection is made to GenomeDK - through, e.g., ssh or scp - no user credentials are asked and the connection will be made transparently.

  3. To copy files from a machine to GenomeDK, the tool scp is a convenient choice because it is simple and available in practically every Linux distribution. To copy a file to GenomeDK, execute the following (replace <file> with the name of the file to copy, <username> with the information provided upon requesting an account in step 1 of the installation procedure and <location> with the location in the user home in GenomeDK where the file will be copied to):

    privateusername@privatemachine:~$ scp <file> <username>@login.genome.au.dk:~/<location>

    To copy an entire directory (including sub-directories), execute the following (replace <directory> with the name of the directory to copy, <username> with the information provided upon requesting an account in step 1 of the installation procedure and <location> with the location in the user home in GenomeDK where the directory will be copied to):

    privateusername@privatemachine:~$ scp -rp <directory> <username>@login.genome.au.dk:~/<location>
  4. When running a long-running process (e.g. a PyPSA-Eur model) on a remote machine (e.g. GenomeDK), it is important that the session persists if the connection to the machine drops for whatever reason; otherwise all the work done by the process in the meantime will be lost. A tool named screen is very useful here, as it keeps a session on the remote machine alive even when the connection to it is no longer available. To persist a session using screen, execute the following:

    username@genomedkbackend:~$ screen

    This starts a new screen session and returns to a terminal prompt inside it. The user may then launch a long-running process and press Ctrl+a followed by d afterwards. Pressing this key sequence will: 1) detach the session, 2) display information about the detachment (e.g. 23625.pts-0.username (04/28/2023 09:36:05 AM) (Attached)) and 3) return to the original terminal. The session is now preserved even when the connection to the remote machine is lost. To reattach to the session (i.e. to see it again), log in to the remote machine and execute the following (replace <session_id> with the number displayed upon detaching the session, e.g. 23625):

    username@genomedkbackend:~$ screen -r <session_id>
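
One caveat about tip 1 above: reading os.environ["SLURM_JOB_ID"] directly raises a KeyError whenever the script runs outside a Slurm job. A minimal sketch of a more forgiving helper is shown below. The function name solver_tmpdir and the scratch_root parameter are hypothetical, and the /scratch/<job id> layout is taken from tip 1 rather than verified against the GenomeDK documentation.

```python
import os
import tempfile
from pathlib import Path


def solver_tmpdir(env, scratch_root="/scratch"):
    """Pick a directory for the solver's temporary files.

    Inside a Slurm job this is <scratch_root>/<SLURM_JOB_ID>; elsewhere it
    falls back to the system default temporary directory.
    """
    job_id = env.get("SLURM_JOB_ID")
    if job_id:
        return Path(scratch_root) / job_id
    return Path(tempfile.gettempdir())
```

In solve_network.py this could replace the hard-coded path, for example with tmpdir = solver_tmpdir(os.environ) followed by tmpdir.mkdir(parents=True, exist_ok=True), passing str(tmpdir) as the solver_dir value.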

Support

The present notes are actively maintained by the Energy Systems Group at Aarhus University (Denmark). Please open a ticket here in case something is missing, incomplete, outdated or incorrect.
