# Notes, 9/6/18

## Mac homebrew

### Build from source

* This can be used to make sure your libraries have been compiled using the same compilers that you are going to use for JEDI

* This is needed on the Mac since the default compilers are Clang
    * These come with a gcc, but it's an ancient version (4.x)
    
* Add the --build-from-source option to the brew install command
    * --HEAD will check out a version from a repository
    * you can define "head" in the brew formula to select a particular branch or tag

### Edit a formula

* Formulas are Ruby scripts
    * To set an environment variable in a formula
        * ENV\['FC'\] = '/usr/local/bin/gfortran-7'

* brew edit <formula\>


* After finishing you can check your formula by "auditing"
    * brew audit <formula\>
    * This checks whether your new formula, or your edits to an existing formula, are acceptable for merging into the brew repository

### Create your own formula

* Need source code
    * Brew wants source code so that it can demonstrate that the package builds on your platform

* Two ways to provide source code:
    * Tar file
        * Give brew the path to a tarball containing the source code and build configuration
        * Can use autoconf, cmake, etc.
        * Give brew the sha256 sum for the tar file
            * shasum -a256 <tar_file\>
    * Repository
        * Give brew the url for cloning
        * Can specify particular branches or tags
        * Give brew the sha1 value for the commit you are checking out

* Create a new formula
    * brew create <url\>
    
### Environment for homebrew

* export HOMEBREW_CC="gcc-7"
* export HOMEBREW_CXX="g++-7"

* brew --env
* brew config

### Building boost

* brew install --verbose --build-from-source --c++11 boost
   * brew says it is ignoring --c++11, but it uses C++11 anyway
   * c++11 is necessary to get boost name mangling to sync up with JEDI name mangling

### Formulas for open-mpi, hdf5, netcdf

* brew edit open-mpi hdf5 netcdf
    * Change depends_on "gcc" to "gcc@7"
    * For Fortran compiling, add: ENV\['FC'\] = '/usr/local/bin/gfortran-7'
    
* For open-mpi only
    * brew edit open-mpi
        * add to the head section
            * :using =\> :git,
            * :branch =\> "v3.2.1",
            * :revision =\> "<full sha value from git log\>"
        * ENV\['CC'\] = '/usr/local/bin/gcc-7'
        * ENV\['CXX'\] = '/usr/local/bin/g++-7'

* For netcdf only
    * do a brew edit to see the url for downloading netcdf-fortran source tarball
    * do a wget to download this tarball, unpack it, and edit the top level CMakeLists.txt
        * get rid of the macro-backtrace options to gfortran (these are for clang, not gcc)
    * repack the tarball, and store it in a safe place
    * run shasum -a256 on the tarball
    * brew edit netcdf
        * in resource "fortran" do
            * comment out the specs and replace with url and sha256 values in accordance with the new tarball
                * use the file://<full path\> form of the url

* brew install --verbose --build-from-source open-mpi hdf5 netcdf
   * Add the --c++11 option for netcdf
   * Don't use the --HEAD option for open-mpi
        
### Formulas for ecbuild, eckit, fckit

* Need to create new formulas for these

* Run brew create <url\> once for each of ecbuild, eckit, fckit

* For ecbuild:
~~~~~~~~
class Ecbuild < Formula
  desc "CMake macros for building JEDI"
  homepage "https://github.com/ECMWF/"
  url "https://github.com/ECMWF/ecbuild.git",
    :using => :git,
    :tag => "2.9.0",
    :revision => "10d46b3d22df192405aa4da148e5c80bd7a814e0"

  depends_on "cmake" => :build

  def install
    mkdir "build" do
      system "cmake", "..", *std_cmake_args
      system "make", "install"
    end
  end

  test do
  end
end
~~~~~~~~

* For eckit:
~~~~~~~~
class Eckit < Formula
  desc "C++ utilities for JEDI"
  homepage "https://github.com/ECMWF"
  url "https://github.com/ECMWF/eckit.git",
    :using => :git,
    :tag => "0.22.0",
    :revision => "a1d8af4a48cdfc9088761a2ddf69eceb69e4ba13"

  depends_on "cmake" => :build
  depends_on "ecbuild"
  depends_on "eigen"
  depends_on "open-mpi"

  def install
    # Allow open-mpi to oversubscribe processes since
    # this system only has 2 cores.
    ENV['OMPI_MCA_rmaps_base_oversubscribe'] = '1'
    mkdir "build" do
      system "ecbuild", "..", *std_cmake_args
      system "make"
      system "ctest"
      system "make", "install"
    end
  end

  test do
  end
end
~~~~~~~~

* For fckit:
~~~~~~~~
class Fckit < Formula
  desc "Fortran utilities for JEDI"
  homepage "https://github.com/ECMWF/"
  url "https://github.com/ECMWF/fckit.git",
    :using => :git,
    :tag => "0.5.2",
    :revision => "dad53e01cb6353476ca9c890071713dca353da10"

  depends_on "cmake" => :build
  depends_on "ecbuild"
  depends_on "eckit"

  def install
    # Allow open-mpi to oversubscribe processes since
    # this system only has 2 cores.
    ENV['OMPI_MCA_rmaps_base_oversubscribe'] = '1'
    mkdir "build" do
      system "ecbuild", "..", *std_cmake_args
      system "make"
      system "ctest"
      system "make", "install"
    end
  end

  test do
  end
end
~~~~~~~~

* To install: brew install --verbose ecbuild eckit fckit



# MPICH2 on Mac, 7/30/19

* JEDI compiles okay using
    * Mojave
    * Clang 10.0.1
    * GNU 9.1.0
    * Mpich 3.3.1
* But get a lot of test failures
    * "channel initialization failed"
    * "gethostbyname failed, imac.fin.ucar.edu"
* Fix for this is to add an entry to the /etc/hosts file:

~~~~~~~~
hostname      # get the local host name
              # on my imac this returns "sysadmins-imac.fin.ucar.edu"

sudo vi /etc/hosts
  # add entry that uses the result of above as the second item on the line
  # For my imac this is:
  
  127.0.0.1 sysadmins-imac.fin.ucar.edu
  127.0.0.1 imac.fin.ucar.edu            # this works too
~~~~~~~~
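
The `/etc/hosts` edit above can also be scripted. This is a sketch with a hypothetical `add_hosts_entry` function (bash). It appends the `127.0.0.1 <host>` line only when no uncommented line already lists that exact host name, so running it twice is harmless.

```shell
# Hypothetical helper (bash): append "127.0.0.1<TAB><host>" to a hosts file
# unless an uncommented line already lists that exact host name.
# Usage (needs write access to the file, e.g. run as root):
#   add_hosts_entry /etc/hosts "$(hostname)"
add_hosts_entry() {
    local hosts_file=$1 host=$2
    if awk -v h="$host" '
        $1 ~ /^#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "$host already in $hosts_file"
    else
        printf '127.0.0.1\t%s\n' "$host" >> "$hosts_file"
        echo "added $host to $hosts_file"
    fi
}
```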

* When running on my MacBook, using Mpich, I get the following error in addition to the ones I saw on the iMac
    * MacBook config:
        * High Sierra
        * Clang 10.0.0
        * GNU 7.4.0
        * Mpich2 3.3.1
* From OOPS, the test called test_util_intset_parser fails
    * For the test_get_channels_invalid section, EXPECT_THROWS() doesn't catch the exception and the program crashes
    * Haven't debugged this yet
* Went back to openMPI on my MacBook

# Catalina OS, 11/8/19

* Fix file permissions for iTerm
    * Without this, ls won't work on ~/Documents and others
    * Set "Full Disk Access" for iTerm in the Security and Privacy system preferences
    * Instructions: http://osxdaily.com/2018/10/09/fix-operation-not-permitted-terminal-error-macos/
* Use brew to uninstall, then reinstall everything with Catalina compiled versions
    * Use Nan's list from his notes on mac build
* Go to "System Preferences" and install the 11.0 and 11.2 versions of the Command Line Tools
* Use jedi-stack build process to re-build with Clang 11.0
    * Got a link error during the build of eckit
        * Cannot link dylib directly, crypto library from openssl is out of date
        * Workaround:

```
brew upgrade openssl # openssl should already be there, but if not: brew install openssl

cd /usr/local/lib
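# brew --prefix openssl gives the installed keg (the Cellar path includes a version directory)
ln -s "$(brew --prefix openssl)/lib/libssl.1.0.0.dylib" libssl.dylib
ln -s "$(brew --prefix openssl)/lib/libcrypto.1.0.0.dylib" libcrypto.dylib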

# repeat the jedi-stack build process, eckit should complete this time
```

# Vagrant notes, 3/6/20

* Error: dpkg-reconfigure: unable to re-open stdin: No such file or directory
    * From ubuntu VM
    * apt-get is attempting to run in interactive mode, and is waiting for a response from the console
    * Fix:
        * export DEBIAN_FRONTEND=noninteractive
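
One caveat, assuming apt-get is run through sudo (a Vagrant shell provisioner may already run as root, in which case the plain export is enough): with sudo's default `env_reset` setting, an exported DEBIAN_FRONTEND is not passed through to apt-get. Passing the variable on the sudo command line avoids that. The wrapper name below is hypothetical.

```shell
# Hypothetical wrapper (bash): run apt-get non-interactively through sudo.
# The variable is set on the sudo command line because sudo's env_reset
# normally drops exported variables such as DEBIAN_FRONTEND.
apt_noninteractive() {
    sudo DEBIAN_FRONTEND=noninteractive apt-get -y "$@"
}

# Usage: apt_noninteractive install build-essential
```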

# Mac OS, System Integrity Protection (SIP), 5/4/20

* This is enabled by default

* Among other things, this strips DYLD_LIBRARY_PATH and LD_LIBRARY_PATH from the environment when a protected system binary (e.g., anything in /usr/bin) is launched
    * Big impact on python scripts
    * #!/usr/bin/env python
        * /usr/bin/env is protected, so python runs as the interpreter, but the library path variables are missing

* To shut off
    * Reboot and hold down Cmd-R
    * This goes into Recovery Mode
    * From Recovery Mode
        * Select Utilities -> Terminal
        * In the terminal, enter

```
csrutil status  # check whether SIP is enabled or disabled
csrutil disable # shut off SIP
reboot
```

# Ctest abort, 5/4/20

* Tests that try to catch exceptions sometimes abort on Mac OS

* One example is test_ufo_parameters
    * From the build directory, run the following, which results in "Abort trap: 6"
    
```
ufo/test/test_ufo_parameters ufo/test/testinput/parameters.yaml
```

* A backtrace in the debugger on the above command shows that the C++ code in Parameters calls into the GNU runtime library to do the stack unwinding
    * The abort is in the routine \_Unwind\_Resume

* In the debugger (lldb) you can run the following to find where the text for any routine lives

```
image lookup -s _Unwind_Resume # if add -r option, then the arg to -s is a regexp
```

* Running this shows that \_Unwind\_Resume exists in 2 places:
    * ``libunwind.dylib`` - Clang version
    * ``libgcc_s.1.dylib`` - GNU version

* It turns out these are incompatible: the abort comes from C++ code (compiled by Clang) calling the GNU unwind code
    * There are several articles on the web that discuss this issue
    * GCC Bugzilla bug 42159 (gcc.gnu.org/bugzilla/show_bug.cgi?id=42159) is an example with good information

* UFO and other repos have a mix of Fortran and C++ which causes problems when different compilers are used for C++ and Fortran
    * On most systems we use all Intel or all GNU for C/C++/Fortran
        * Just one path to unwind code, all is well
    * On the Mac, using Clang and GNU Fortran
        * This brings in two paths to the unwind code, causing ambiguous references
        * Running ``otool -L lib/libufo.dylib`` shows both paths to unwind code exist

```
# GNU
/usr/local/lib/gcc/9/libgfortran.5.dylib
/usr/local/lib/gcc/9/libgcc_s.1.dylib

# Clang
/usr/lib/libc++.1.dylib
/usr/lib/libSystem.B.dylib
/usr/lib/system/libunwind.dylib
```

* Whichever the dynamic loader finds first is what gets called
    * In the abort case, libgcc_s.1.dylib gets found before libunwind.dylib, but libunwind.dylib is what is needed
    * If libgcc_s.1.dylib is renamed (so the loader can't find it), then the tests pass
        * This is because in this case, libunwind.dylib gets loaded which has the proper unwind code
        * However, two atlas tests that used to pass now fail because of unresolved references
            * The missing references must have been in libgcc_s.1.dylib

* The proper fix seems to be splitting the C++ and Fortran code into two library files
    * The C++ code references only the Clang unwind code
    * The Fortran code references only the GNU unwind code
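
To check a library for both unwinder paths before running its tests, the `otool -L` listing can be filtered. This is a minimal bash sketch. `check_unwinders` is a hypothetical name, and it only looks for the two file names shown in the listing above.

```shell
# Hypothetical helper (bash): read `otool -L` output on stdin and report
# which unwinder libraries it lists. Returns 0 when both are present
# (the mixed Clang/GNU case), 1 otherwise.
# Usage: otool -L lib/libufo.dylib | check_unwinders
check_unwinders() {
    local listing gnu=no clang=no
    listing=$(cat)
    grep -q 'libgcc_s\.1\.dylib' <<<"$listing" && gnu=yes
    grep -q 'libunwind\.dylib' <<<"$listing" && clang=yes
    echo "GNU libgcc_s: $gnu, Clang libunwind: $clang"
    [ "$gnu" = yes ] && [ "$clang" = yes ]
}
```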

# Dynamic loader search order, 5/21/20

* Order is different depending on whether or not the file is specified with a path
    * i.e., in the name passed to dlopen()

* When just a file name (no path)
    1. LD_LIBRARY_PATH
    2. DYLD_LIBRARY_PATH
    3. process' working directory
    4. DYLD_FALLBACK_LIBRARY_PATH

* When path is included
    1. DYLD_LIBRARY_PATH
    2. Given path/filename
    3. DYLD_FALLBACK_LIBRARY_PATH

* Surprising that when the path is included, the first thing tried is DYLD_LIBRARY_PATH instead of the given path/filename
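
The two orders can be modeled with a short bash simulation that only tests whether files exist in each location. `dlopen_search` and `search_dirs` are hypothetical names. This is not dyld's implementation. It is just a way to reason about which copy of a library a given set of environment variables would favor.

```shell
# Simulation (bash) of the dlopen() search order listed above. It ignores
# any default fallback directories dyld may use when
# DYLD_FALLBACK_LIBRARY_PATH is unset.

# Print "<dir>/$2" for the first directory in the colon-separated list $1
# that contains a file named $2; return 1 if none does.
search_dirs() {
    local dirs=$1 leaf=$2 dir
    local IFS=:
    for dir in $dirs; do
        if [ -n "$dir" ] && [ -f "$dir/$leaf" ]; then
            printf '%s\n' "$dir/$leaf"
            return 0
        fi
    done
    return 1
}

# Print the file the documented order would pick; return 1 if none found.
dlopen_search() {
    local name=$1 leaf=${1##*/}
    if [[ $name == */* ]]; then
        # Path given: DYLD_LIBRARY_PATH (leaf name only), the given path,
        # then DYLD_FALLBACK_LIBRARY_PATH (leaf name only)
        search_dirs "${DYLD_LIBRARY_PATH:-}" "$leaf" && return 0
        if [ -f "$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
        search_dirs "${DYLD_FALLBACK_LIBRARY_PATH:-}" "$leaf"
    else
        # Leaf name only: LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, the working
        # directory, then DYLD_FALLBACK_LIBRARY_PATH
        search_dirs "${LD_LIBRARY_PATH:-}" "$name" && return 0
        search_dirs "${DYLD_LIBRARY_PATH:-}" "$name" && return 0
        if [ -f "$PWD/$name" ]; then
            printf '%s\n' "$PWD/$name"
            return 0
        fi
        search_dirs "${DYLD_FALLBACK_LIBRARY_PATH:-}" "$name"
    fi
}
```

For example, `DYLD_LIBRARY_PATH=/tmp/b dlopen_search /tmp/a/libx.dylib` prints `/tmp/b/libx.dylib` when both copies exist, which is the surprising case noted above.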

# Dynamic loader namespace, 5/21/20

* The Mac OS dynamic loader (dyld) by default uses a feature called two-level namespace
   * two-level namespace causes the link step to record, for every reference in one library, which other library resolves it
   * at run time, the dynamic loader follows the record that the link step created
   * this allows both Clang and GNU std libraries to be linked to libraries containing C/C++/Fortran code and have the dynamic loader load the proper std library dylib files

* Using -flat_namespace flag during the link step disables two-level namespace (using a flat namespace instead)
   * This disables the recording of the link structure during the link step
   * At run time, the dynamic loader binds each reference to the first library it finds that defines the symbol, using the typical LD_LIBRARY_PATH, ... search
   * This allows Mac OS to emulate what most Linux systems do (which can be useful)
   * However, this can cause issues with libraries that have a C/C++/Fortran code mix
       * Because the record of which std library should satisfy each call is lost

# MAS (mac app store CLI), 9/30/20

* Can do updates from the App Store on the command line

* `brew install mas`

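* `mas list`
    * shows installed App Store apps with their id numbers
* `mas outdated`
    * shows available upgrades
* `mas search Xcode`
    * shows id numbers for matching apps
* `mas install <id>`
    * install an app by id, e.g., Xcode using the id from `mas search`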
* `mas upgrade`
    * upgrade all packages with pending updates

# Alternative to app store for Xcode, 9/30/20

* `developer.apple.com/download/all`
    * sign in with Apple ID
    * Xcode 11.5 -> clang 11.0.3

# Vagrant, 10/29/20

* Updated to vagrant version 2.2.10

* Update the Vagrantfile
    * download from AWS
        * `aws s3 cp s3://data.jcsda.org/containers/Vagrantfile .`
* Update vagrant plugins
    * `vagrant plugin update`
* Rebuild vagrant

```
# remove old machine
vagrant destroy
rm -rf .vagrant

# rebuild new machine
vagrant up
vagrant ssh

# restore rc files in vagrant home directory
cp vagrant_data/.bash_profile .
cp vagrant_data/.bashrc .
cp vagrant_data/.gitconfig .
cp vagrant_data/.vimrc .
exit

# restart just to be sure
vagrant halt
vagrant up
vagrant ssh
```

# jedi-stack building 7/12/21

* With the latest Xcode/CommandLineTools, the build breaks on the netcdf C++ library
    * Get a strange error when reading the "version" include file
    * Looks like a version string gets returned instead of the inclusion of the file
    
    * Xcode 12.5.1
    * CommandLineTools -> MacOSX11.3.sdk
    * Clang 12.0.5
    * Big Sur -> MacOSX 11.4

* Tried Xcode 13 beta 2, but this is for newer OS beyond Big Sur

* Worked through earlier versions of Xcode/CommandLineTools and found that the following works:
    * Xcode 12.4
    * CommandLineTools -> MacOSX11.1
    * Clang 12.0.0

# jedi-stack building 7/23/21

* The build system is failing on the latest Xcode/CommandLineTools version (12.5.1)
    * Get the weird version issue noted in the prior entry when building the netcdf C++ API

* I tried different versions of Xcode/CommandLineTools to see which work and which don't work
    * Big Sur, 11.5, iMac

| Xcode/CommandLineTools version | SDK version | Clang version | Does build work? |
|--------------------------------|-------------|---------------|------------------|
| 12.5.1 | 11.4 | 12.0.5 | No |
| 12.5   | 11.3 | 12.0.5 | No |
| 12.4   | 11.1 | 12.0.0 | Yes |

# jedi-stack building 8/10/21

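* The issue is a file name conflict between the new `<version>` include file in the C++ stdlib and the VERSION file that autoconf creates
    * The default macOS file system is case-insensitive, so `#include <version>` picks up VERSION when the build directory is on the include path

* The temporary fix is to rename the VERSION file from autoconf after configure is run (and before make is run)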

* Unidata is already aware of this and has a fix in their development track
    * https://github.com/Unidata/netcdf-cxx4/commit/41c0233cb964a3ee1d4e5db5448cd28d617925fb
    * Renames VERSION to VERSION.txt in the configure.ac file
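
The temporary fix can be wrapped in a small bash function (the name `hide_autoconf_version` is hypothetical) that runs between configure and make. It renames the file to VERSION.txt to match the Unidata change. This is a sketch, not the upstream fix.

```shell
# Hypothetical helper (bash): move autoconf's VERSION file out of the way so
# that `#include <version>` cannot resolve to it. Safe to run repeatedly.
# Usage inside the netcdf-cxx4 source tree:
#   ./configure --prefix=<install_dir> && hide_autoconf_version . && make && make install
hide_autoconf_version() {
    local dir=${1:-.}
    if [ -f "$dir/VERSION" ]; then
        mv "$dir/VERSION" "$dir/VERSION.txt"
    fi
    return 0   # succeed even when there is nothing to rename
}
```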

# Mac debug shared libraries, 10/11/21

* DYLD_PRINT_LIBRARIES=YES
    * This causes the paths of dynamic libraries to be printed as they are loaded
    * Very useful for tracking down problems with multiple versions of libraries
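
The output can be long, so a filter helps spot the same library loaded from two directories. This is a sketch under stated assumptions:

* The dyld lines go to stderr.
* Each of those lines contains `dyld` and ends with the library path. The exact prefix varies between macOS versions.
* Only identical file names are compared, so `libhdf5.103.dylib` versus `libhdf5.200.dylib` would not be flagged.

`find_duplicate_libs` is a hypothetical name.

```shell
# Hypothetical filter (bash + awk): report file names that dyld loaded from
# more than one directory. Lines without "dyld" (program output) are ignored.
# Usage (test_exe is a placeholder for the executable being debugged):
#   DYLD_PRINT_LIBRARIES=YES ./test_exe 2>&1 >/dev/null | find_duplicate_libs
find_duplicate_libs() {
    awk '
        /dyld/ {
            path = $NF
            name = path
            sub(/.*\//, "", name)   # keep only the file name
            if (!(name in first)) {
                first[name] = path
            } else if (first[name] != path && !(name in reported)) {
                print name ": " first[name] " and " path
                reported[name] = 1
            }
        }'
}
```

Because SIP strips DYLD_* variables when a protected binary is launched, run the executable directly rather than through a launcher such as /usr/bin/env.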

# M1 vs Intel, 11/13/21

* On the MacBook Pro, two architectures are supported
    * Native M1
        * ARM processor
    * Emulated Intel
        * Rosetta emulator for x86_64
    * /usr/bin/arch
        * Returns which architecture you are using
        * "arm64" --> native M1
        * "i386" --> x86_64 emulation

* Need to keep things consistent across architectures
    * VS Code build exercise created arm64, but lldb crashed when starting up
        * This is because VS Code was arm64, while lldb was i386
        * Fixed this by installing the x86_64 only VS Code binaries
        * The original installation was a universal binary which can run either as arm64 or i386
            * Not sure why it didn't adjust since it was started in iTerm2 which is i386

* iTerm2 is i386
    * Installed this before switching to the new laptop

* jedi-stack built as x86_64
    * Perhaps because iTerm2 is x86_64?
    * Try re-installing iTerm2, making sure to get the i386 version
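
One way to keep builds consistent across architectures is a guard at the top of build scripts. This is a bash sketch with hypothetical `normalize_arch` and `require_arch` functions. It treats the `i386` reported by /usr/bin/arch as x86_64, matching the mapping noted above.

```shell
# Hypothetical helpers (bash): stop a build when the shell is not running
# under the expected architecture.
normalize_arch() {
    case $1 in
        i386|x86_64) echo x86_64 ;;   # /usr/bin/arch says i386 under Rosetta
        arm64|aarch64) echo arm64 ;;
        *) echo "$1" ;;
    esac
}

# require_arch EXPECTED [ACTUAL]; ACTUAL defaults to the output of /usr/bin/arch
require_arch() {
    local want got
    want=$(normalize_arch "$1")
    got=$(normalize_arch "${2:-$(/usr/bin/arch)}")
    if [ "$want" != "$got" ]; then
        echo "expected $want but running as $got" >&2
        return 1
    fi
}

# Usage at the top of a jedi-stack build script:  require_arch x86_64 || exit 1
```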

# Spack-stack, 5/2/22

## Instructions from Dom

* Repo: NOAA-EMC/spack-stack

* Make a recursive clone of the spack-stack repo
    * `git clone -b develop --recursive https://github.com/NOAA-EMC/spack-stack`

* Explore
    * `config/sites/`
        * Check out default, macos, llvm and gnu
    * compilers
        * gnu
        * apple-clang
        * llvm clang
    * MPI
        * mpich
        * openmpi

* Make sure curl and Python poetry are installed
    * `brew install curl`
        * check the list in the README.md file inside the `macos*` directory
    * `python3 -m pip install poetry`

* Spack-isms
    * In spack what comes first takes precedence over subsequent calls
    * '+' --> enable
    * '~' --> disable
    * '::' --> override (instead of append)
        * look in common/packages.yaml for examples

### Steps to do the build

```
# build apple clang, gnu fortran, openmpi

# create spack environment and enter the environment
cd spack-stack
source setup.sh
./create-env.py --site=default --app=jedi-ufs-all --name=jedi-ufs-all-apple-clang-openmpi

spack env activate envs/jedi-ufs-all-apple-clang-openmpi

spack env status      # confirm you are inside the environment

# Use spack to auto edit the config and packages yaml.
cd envs/jedi-ufs-all-apple-clang-openmpi/site
SPACK_SYSTEM_CONFIG_PATH=`pwd` spack external find --all --scope system
SPACK_SYSTEM_CONFIG_PATH=`pwd` spack compiler find --scope system

# Check that site/compilers.yaml has the correct configurations for apple clang
# and gnu. Apple clang should be using gnu fortran.
#
# Edit the envs/jedi-ufs-all-apple-clang-openmpi/site/packages.yaml file
#
#  Remove old python spec (3.8)
#  Remove curl from /usr
#  Remove sqlite from /usr
#
#  At the top, add the following entry
#
#     all:
#       compiler:: [apple-clang]
#       providers:
#         mpi:: [openmpi]
#

# Edit the envs/jedi-ufs-all-apple-clang-openmpi/spack.yaml file
#
# Under the "specs:" section:
#     Comment out: "- ufs-weather-model/debug" line and replace with
#                  "- openmpi"

# set up and run the build
#
# The concretize command builds the dependency graph and places the result
# in envs/jedi-ufs-all-apple-clang-openmpi/spack.lock file.
#
# spack install command:
#     takes several hours when building the first time
#     can use more options
#         --fail-fast    # stop on first error
#         --reuse        # for adding a new package in a subsequent call
#                        # can use this option on the concretize command too
#
cd spack-stack   # top level directory in the clone
spack concretize 2>&1 | tee log.concretize       # builds dependency graph
spack install -v 2>&1 | tee log.install          # build and install stack
```

### Steps to create the lua module scripts

```
# Edit the envs/jedi-ufs-all-apple-clang-openmpi/common/modules.yaml file
#
#  Remove openmpi from the two blacklist sections

spack module lmod refresh   # create the lua scripts

# If the above command reports duplicate errors, then need to remove the
# duplication from the build. This can be done using:
#
#   spack find -l hdf5
#   spack uninstall --dependents /HASH   # Get HASH from the find results associated
#                                        # with ESMF, etc.

./meta_modules/setup_meta_modules.py   # create lua scripts for compiler, mpi,
                                       # python, etc.
```

* Sometimes setup_meta_modules.py doesn't work.
    * Could be the python installation.
    * May need to use a bare-bones python installation as described below

* Spack builds separate modules for each python package that is required for the stack
    * This means that you can use a bare-bones python installation and reference everything needed through PYTHONPATH
        * Ie, don't install all of the python packages in your python installation

# spack-stack debug, 5/2/22

* Two issues:
    * flat namespace creates collision with `_Unwind_Resume` in clang and GNU libs
    * rpaths not getting filled in for executables in JEDI build
        * These are getting filled in for the stack build

## Enabling two level namespace in the spack build
* openmpi doesn't have two level namespace option, but mpich does
* switch to mpich
* enable two level namespace
    * one way to do this is to enable on the spack install command line
    * `spack install -v mpich two_level_namespace=true`
    * but it would be better to find a way to put this in the yaml configuration

```
# go through build instructions for openmpi and substitute "mpich" for "openmpi"
#
# create envs/jedi-ufs-all-apple-clang-mpich environment
# 
# edit envs/jedi-ufs-all-apple-clang-mpich/common/packages.yaml
#   add "+two_level_namespace" to mpich.variants spec
#
#   entry should look like this:
#
#        mpich:
#            variants: ~hwloc +two_level_namespace
#

spack concretize --force 2>&1 | tee log.concretize  # can be used to rebuild
                                                    # spack.lock file
spack install -v 2>&1 | tee log.install      # picks up two_level_namespace from config
```

* Check that two_level_namespace is applied
    * `otool -hV file` should show "TWOLEVEL" in the flags column
    * There should be no `-Wl,-flat_namespace` arguments in the install log file
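
Both checks can be scripted. The bash sketch below runs `otool -hV` on each file and reports whether the TWOLEVEL flag is present. The name `check_namespace` is hypothetical. The log check is a plain grep.

```shell
# Hypothetical helper (bash): report TWOLEVEL or FLAT for each Mach-O file,
# based on the flags that `otool -hV` prints. Returns 1 if any file is flat.
check_namespace() {
    local f status=0
    for f in "$@"; do
        if otool -hV "$f" | grep -qw TWOLEVEL; then
            echo "TWOLEVEL $f"
        else
            echo "FLAT     $f"
            status=1
        fi
    done
    return $status
}

# Usage:  check_namespace <install_prefix>/lib/*.dylib
#         grep -c -- '-flat_namespace' log.install   # expect 0
```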

## RPATHs

* Watch out for lldb and DYLD_LIBRARY_PATH
    * Since lldb is in the SIP-protected area, DYLD_LIBRARY_PATH gets stripped out
    * Some executables/libraries are getting set up with the full paths to the library files defined in their file, while others aren't
    * The files without the rpath specs rely on DYLD_LIBRARY_PATH and won't work properly in lldb
        * Have to add in the rpath specs using install_name_tool
            * `install_name_tool -add_rpath path file`
        * Once this is done lldb will work

* Need to get build process to fill in the rpath directories
    * The spack install process does this with the libraries in the stack
    * JEDI CMake config does not do this
    * Check out https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling for help with cmake configuration

* Use `otool -l file` to see rpath specs and configuration
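
install_name_tool reports an error when asked to add an rpath that a file already has, so repairs are easier to repeat if you check first. This is a bash sketch. `list_rpaths` and `ensure_rpath` are hypothetical names. The parsing assumes the `cmd LC_RPATH` / `path <dir> (offset N)` layout that `otool -l` prints, with no spaces in the paths.

```shell
# Hypothetical helpers (bash): list a file's LC_RPATH entries and add an
# rpath only when it is missing.
list_rpaths() {
    otool -l "$1" | awk '
        $1 == "cmd" { in_rpath = ($2 == "LC_RPATH") }   # track current load command
        in_rpath && $1 == "path" { print $2 }'
}

# Usage: ensure_rpath <executable_or_dylib> <library_dir>
ensure_rpath() {
    local file=$1 dir=$2
    if list_rpaths "$file" | grep -qxF -- "$dir"; then
        echo "rpath $dir already present in $file"
    else
        install_name_tool -add_rpath "$dir" "$file"
    fi
}
```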

# spack-stack debug, 5/2/22

* once two level namespace was enabled, looks like everything gets built with two level namespace

* fv3-bundle
    * build succeeds
    * ctest has 73 failures out of 1560 tests
        * Down from 170 failures for the apple clang, openmpi build
        * Lots of `test_qg_*, saber_test_*`
        * `test_ioda_obsspace_put_db_channels` (Bus Error)
        * Seven `test_ufo_*`
        * `fv3jedi_test_tier1_lgetkf`
    * A fair number of tests finish with a report of status = 0, but then the process crashes

* Debug test_ioda_obsspace_put_db_channels
    * Repaired RPATHs using install_name_tool
    * hdf5 library is crashing after the test finishes
        * Looks like hdf5 is getting loaded from the correct location
        * Is the linking step using the wrong location?
    * `~/projects/SPACK_STACK/fv3-bundle`
        * `build_mpich/ioda/test` check out log.test file

* Do we need to create the bare-bones Python installation to work around this?
    * Ie, get rid of the hdf5 libs installed in the python area

# Specs for new laptop, 6/17/22

* MacBookPro, 16 inch
    * Space Gray
    * M1 Max (10-core CPU, 32-core GPU, 16-core Neural Engine)
    * 64GB Memory
    * 1TB SSD Storage
    * `$3899`

# How to quiet the deprecated warnings from Clang, 6/9/23

* `ecbuild -D CMAKE_CXX_FLAGS="-Wno-deprecated-declarations" ...`