# Record of versions of software when things were working October 2024

This Jupyter notebook logs the software versions in use at a time when everything seems to be working. The record should help if something breaks later: one possible fix is to pin versions back to those recorded here.

(Adapted from [here](https://github.com/fomightez/bendit-binder/blob/master/details_on_versions_noted_when_all_working/Record_versions_2022.ipynb), where I had some older tech, so the section under 'Jupyter-associated software versions', and maybe others, differed and has been updated. Also note that the analysis process there was already well tested when I collected that record, so I was just uploading 'the collection notebook' to the root of the session, running it, and later storing it where it now sits as a record. [In other words, if I ever run it there again, I need to move it to root to run it, and not leave it where it is stored.] Here, I started with that as a basis, and because iterating during development is much easier with the notebook placed where it will remain, **an effort has been made to make it runnable from where it will end up being stored**.)

------------

### Preparation

Let's install some packages to assist in this endeavor.

Sebastian Raschka's [watermark](https://github.com/rasbt/watermark) package is really nice for this sort of thing. See [here](https://nbviewer.org/github/rasbt/watermark/blob/master/docs/watermark.ipynb) for great documented examples of how to use it.

Min RK's Wurlitzer will be used later to suppress the C-level output from any software. (I have not tested whether it is actually needed, but I am leaving it in because, combined with `%%capture`, it did make things 'quiet' [here](https://github.com/fomightez/bendit-binder/blob/master/details_on_versions_noted_when_all_working/Record_versions_2022.ipynb).)

In [1]:
%pip install watermark
%pip install wurlitzer

Note: you may need to restart the kernel to use updated packages.
Collecting wurlitzer
  Downloading wurlitzer-3.1.1-py3-none-any.whl.metadata (2.5 kB)
Downloading wurlitzer-3.1.1-py3-none-any.whl (8.6 kB)
Installing collected packages: wurlitzer
Successfully installed wurlitzer-3.1.1
Note: you may need to restart the kernel to use updated packages.


In [1]:
%load_ext watermark

___ 

## Current date and time of running this notebook

Assuming 'Run All', as is the intention.

In [2]:
%watermark -u -n -t -z 
# based on https://nbviewer.org/github/rasbt/watermark/blob/master/docs/watermark.ipynb#Last-updated-date-and-time

Last updated: Thu Oct 03 2024 19:33:43UTC



## SICILIAN software itself

The SICILIAN software itself has no version releases yet, as can be seen by looking at [the SICILIAN repo](https://github.com/salzman-lab/SICILIAN) under 'Releases' on the right side. So we'll use the commit SHA (see [here](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/about-commits) for more about that hash) for tracking at this time.

I'm going to clone [the SICILIAN repo](https://github.com/salzman-lab/SICILIAN) and get the commit SHA and compare that to what I document in text and in my forked repo.

In [4]:
!mkdir -p gc
%cd gc 
!git clone https://github.com/salzman-lab/SICILIAN.git -q
%cd SICILIAN
!printf "\n\nbelow is the SHA of the current repository which was one before the version I adapted here,\nand only change in between was a small text addition to the README:\n"
!git log -1 --format="%H" # based on https://stackoverflow.com/a/8216896/8508004
!printf "\n\n"
# will reset the working directory back to location of this notebook for rest of notebook
%cd ~/details_on_versions_noted_when_all_working/

/home/jovyan/details_on_versions_noted_when_all_working/gc
/home/jovyan/details_on_versions_noted_when_all_working/gc/SICILIAN


below is the SHA of the current repository which was one before the version I adapted here,
and only change in between was a small text addition to the README:
8bb5ce6c7ccdfadea768902375a3899512ed0608


/home/jovyan/details_on_versions_noted_when_all_working


You'll see that this corresponds to the commit shown [here](https://github.com/salzman-lab/SICILIAN/tree/8bb5ce6c7ccdfadea768902375a3899512ed0608), from Jun 6, 2024, the one after [this commit](https://github.com/salzman-lab/SICILIAN/tree/3e02c46d3f81addd5a69df31df6094bf360a1112), which has been the version of SICILIAN being used/adapted in all the work so far; see [here](https://github.com/fomightez/SICILIAN-binder/commit/6fa12290fd26cddc9375c3fdbb0599d761b8853a), where the parent of my fork is listed.
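The comparison described above is done by eye. As a minimal sketch of how it could be scripted, the following assumes `git` is on the PATH and that the clone lives in a directory you pass in; the helper names `current_commit_sha` and `sha_matches` are hypothetical and are not part of this notebook or of SICILIAN.

```python
import subprocess

# SHA reported by the `git log -1` cell output above
EXPECTED_SHA = "8bb5ce6c7ccdfadea768902375a3899512ed0608"


def current_commit_sha(repo_dir):
    """Return the full SHA of HEAD for the git repository at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def sha_matches(actual, expected):
    """True if one SHA is a prefix (at least 7 characters) of the other."""
    actual, expected = actual.strip().lower(), expected.strip().lower()
    shorter, longer = sorted((actual, expected), key=len)
    return len(shorter) >= 7 and longer.startswith(shorter)
```

With the clone made as in the cell above, something like `sha_matches(current_commit_sha("gc/SICILIAN"), EXPECTED_SHA)` would flag whether upstream has moved on since this record was made.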

## Python Modules imported during running of the Demo Analysis

The demonstration will be run and then **all the imported packages** will be listed using the `--iversions` flag of watermark. That list will be what the pipeline really uses.

First, to establish a baseline, we'll run watermark now so that its output can be compared with what the same command shows after the pipeline runs.

In [3]:
# sanity check, should only show watermark & wurlitzer, AND MAYBE NEITHER, at this point BEFORE WE RUN THE PIPELINE.
%watermark --iversions




The pipeline and watermark are run in two different cells so that most of the output produced by running the notebook can be suppressed with the `%%capture` cell magic, since we aren't interested in that output here.

Note that this takes advantage of the fact that, by default, notebooks called with `%run` are run in the current notebook namespace; that's not the case with scripts, where you need to add the `-i` flag to get that behavior. So I added code that inserts the 'interactive' flag `-i` into the version of the notebook being run here, with the idea that this would cause the imports to happen in this namespace. That wasn't enough to get everything, though. SICILIAN had been set up to run the sub-scripts it uses for each step with `sbatch` on SLURM, which I substituted with `bash`. To set the step-associated sub-scripts up to run as batch jobs on SLURM, the developers had called Python to point at the sub-scripts, and because of that complex handling the called sub-scripts actually get run in temporary shell instances even if I use the `-i` flag. So what I did was collect the names of the scripts and step through 'running' each of them with the `-i` flag here. Even though they don't actually work when run this way, the imports happen at the top of each script, so the modules they import do get imported into this notebook.

NOTE: THE 'RUNNING PYTHON SCRIPT' section will produce stderr. Just let it happen. I couldn't tell why `%%capture` and `wurlitzer` fail to suppress it here, but for `%watermark --iversions` to see the imports they need to happen in this namespace, so just accept it.
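The key point above, that a script's top-level imports take effect even when the script fails afterwards, can be illustrated outside of IPython. This is only a sketch: it uses plain `exec` on a made-up two-import script as a stand-in for `%run -i`, and it runs in a throwaway namespace rather than the notebook's. The script contents and the helper name `modules_bound_by` are hypothetical.

```python
import os
import tempfile
import types

# Hypothetical stand-in for a pipeline sub-script: imports first, then fails.
SCRIPT = "import json\nimport csv\nraise SystemExit(2)\n"


def modules_bound_by(path):
    """Execute the script at `path` in a fresh namespace and return the
    names of the modules its import statements bound before any failure."""
    namespace = {}
    with open(path) as handle:
        code = compile(handle.read(), path, "exec")
    try:
        exec(code, namespace)
    except (Exception, SystemExit):
        pass  # the failure is expected; only the imports matter here
    return sorted(
        name
        for name, value in namespace.items()
        if isinstance(value, types.ModuleType) and not name.startswith("_")
    )


with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "demo_step.py")
    with open(script_path, "w") as handle:
        handle.write(SCRIPT)
    imported = modules_bound_by(script_path)

print(imported)  # ['csv', 'json']
```

Both modules end up bound in the namespace even though the script exits with an error, which is the behavior the cell below relies on.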

In [6]:
%%capture
# That cell magic line above will make this cell 'silent'
#-----------------------------------------------------------------------------------------------#
# Need to switch working directory to where notebook is since it uses other notebooks & resources
%cd ~
# add the interactive  `-i` flag to `%run SICILIAN.py` command in `Demo_SICILIAN_via_MyBinder_with_human_Chr_21.ipynb`
!sed 's/run SICILIAN.py/run -i SICILIAN.py/g' <Demo_SICILIAN_via_MyBinder_with_human_Chr_21.ipynb >Demo_SICILIAN_via_MyBinder_with_human_Chr_21INT_FLAG.ipynb
# similar replacement for another SICILIAN script
!sed -i 's/run scripts\/create_annotator/run -i scripts\/create_annotator/g' Demo_SICILIAN_via_MyBinder_with_human_Chr_21INT_FLAG.ipynb
from wurlitzer import pipes

with pipes() as (out, err):
    %run Demo_SICILIAN_via_MyBinder_with_human_Chr_21INT_FLAG.ipynb
# delete the special version with the interactive flag, after pausing briefly so buffered output can be collected
import time
time.sleep(2)
!rm -rf Demo_SICILIAN_via_MyBinder_with_human_Chr_21INT_FLAG.ipynb
# To overcome the sub-scripts associated with each step being very layered into how they are executed so that nothing
# they import gets imported in the triggering namespace, I am going to run them here to see what they import
import os
import fnmatch
subdir = 'scripts'
fn_pattern = "*.py"
for file in os.listdir(subdir):
    if fnmatch.fnmatch(file, fn_pattern):
        print(f"RUNNING PYTHON SCRIPT: {file}\n")
        with pipes() as (out, err):
            try:
                %run -i {subdir+'/'+file}
            except Exception as e:
                #print(f"Error running {file}: {e}")
                #traceback.print_exc()  # Print detailed error information
                pass
#%run -i SICILIAN.py # This one can be skipped because it is already run as part of `Demo_SICILIAN_via_MyBinder_with_human_Chr_21INT_FLAG.ipynb`.
# Technically, I could also skip `annotator.py`, but it is just easier to run it again as part of running the scripts in the `scripts` sub-directory
# reset the working directory back to the location of this notebook for the rest of the notebook
%cd ~/details_on_versions_noted_when_all_working/

TypeError: expected str, bytes or os.PathLike object, not NoneType

FileNotFoundError: [Errno 2] No such file or directory: '/oak/stanford/groups/horence/JuliaO/pickled/tenX_CI_df.pkl'

TypeError: expected str, bytes or os.PathLike object, not NoneType

SystemExit: 2

ValueError: Invalid file path or buffer object type: <class 'bool'>

Like I said above, disregard the errors. For some reason, we need to put up with them for this to work. Now, to see what got loaded, run the next cell:

In [5]:
%watermark --iversions

pyarrow   : 16.1.0
wurlitzer : 3.1.1
tqdm      : 4.66.5
pysam     : 0.22.1
re        : 2.2.1
sys       : 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
pandas    : 2.2.3
matplotlib: 3.9.2
skimpy    : 0.0.15
argparse  : 1.1
numpy     : 1.26.4



Indeed, the packages listed match up very well with those in [the official `requirements.txt` file for SICILIAN](https://github.com/salzman-lab/SICILIAN/blob/master/requirements.txt).


**Note that this `%watermark --iversions` approach doesn't ALWAYS show all dependent packages at this time.** Specifically, `%watermark --iversions` won't show any that are imported using the syntax `from x import y`. (It turns out this is a known bug at present and a solution is close at hand; see [here](https://github.com/rasbt/watermark/issues/77).) But I looked over the current SICILIAN code in relation to the necessary packages listed, and I don't see them imported that way, so that current deficiency is moot here. (I think, though, that scipy is installed as a dependency of one of the involved packages, and so it will show up when you run watermark with the `--packages` flag to target specific packages present in the environment; note this is distinct from the `--iversions` flag, which details **imported** modules.)

In [7]:
%watermark -p scipy,halo

scipy: 1.13.1
halo : not installed
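The manual look-over for `from x import y` imports mentioned above could also be done programmatically. As a rough sketch (not something this notebook currently runs), the standard-library `ast` module can list the top-level modules a script imports with either syntax, without executing the script. The function name `imported_top_level_modules` and the example source string are made up for illustration.

```python
import ast


def imported_top_level_modules(source):
    """Return the sorted top-level module names imported anywhere in
    `source`, covering both `import x` and `from x import y`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # skip relative imports such as `from . import helper`
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


example = (
    "import numpy as np\n"
    "from scipy import stats\n"
    "from os.path import join\n"
)
print(imported_top_level_modules(example))  # ['numpy', 'os', 'scipy']
```

To apply it to real files, one could read each `*.py` file in the `scripts` sub-directory and pass its text to the function. Because the code is only parsed, none of the listed modules need to be installed for this to work.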



## STAR Aligner

As [the README.md](https://github.com/salzman-lab/SICILIAN/blob/master/README.md) states:

>"SICLIAN uses STAR as the aligner and therefore it needs STAR index files."

Let's record what version of the STAR Aligner is in use. (I would have liked to feature STAR Aligner earlier; however, the cell above that runs the modified version of `Demo_SICILIAN_via_MyBinder_with_human_Chr_21.ipynb` needs to run first to ensure STAR Aligner has been installed and this cell will work. This information will also be listed among the results of running `%conda list` below. However, because it is such an integral part of the SICILIAN pipeline, I am featuring it prominently on its own.)

In [5]:
%%bash
STAR --version

2.7.11b


## Python and Machine info

In [15]:
%watermark -v -m

Python implementation: CPython
Python version       : 3.10.14
IPython version      : 8.26.0

Compiler    : GCC 12.3.0
OS          : Linux
Release     : 5.15.0-116-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 8
Architecture: 64bit



In [16]:
import watermark
print(watermark.watermark(machine=True, globals_=globals(), iversions=True, python=True))

Python implementation: CPython
Python version       : 3.10.14
IPython version      : 8.26.0

Compiler    : GCC 12.3.0
OS          : Linux
Release     : 5.15.0-116-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 8
Architecture: 64bit

watermark: 2.5.0



## All pip ('package installer for Python')-installed software present in the environment

In [18]:
%pip list

Package                   Version
------------------------- --------------
aiohappyeyeballs          2.4.0
aiohttp                   3.10.6
aiosignal                 1.3.1
alembic                   1.13.2
annotated-types           0.7.0
anyio                     4.4.0
argcomplete               3.5.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async_generator           1.10
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     24.2.0
Babel                     2.14.0
beautifulsoup4            4.12.3
bleach                    6.1.0
blinker                   1.8.2
Brotli                    1.1.0
cached-property           1.5.2
certifi                   2024.8.30
certipy                   0.1.3
cffi                      1.17.0
charset-normalizer        3.3.2
click                     8.1.7
comm                      0.2.2
contourpy                 1.3.0
cryptography      

See [here](https://stackoverflow.com/questions/18966564/pip-freeze-vs-pip-list#comment28043550_18966632) and [here](https://stackoverflow.com/a/28330159/8508004) as to the difference between `pip list` and `pip freeze`, used below.

In [2]:
%pip freeze

aiohappyeyeballs==2.4.2
aiohttp==3.10.6
aiosignal==1.3.1
alembic @ file:///home/conda/feedstock_root/build_artifacts/alembic_1719471393232/work
annotated-types @ file:///home/conda/feedstock_root/build_artifacts/annotated-types_1716290248287/work
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1717693030552/work
argcomplete @ file:///home/conda/feedstock_root/build_artifacts/argcomplete_1722977963018/work
argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1692818318753/work
argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1695386546427/work
arrow @ file:///home/conda/feedstock_root/build_artifacts/arrow_1696128962909/work
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-lru @ file:///home/conda/feedstock_root/build_artifacts/async-lru_1690563019058/work
async-timeout==4.0.3
async_generator @ file:///home/conda/feedstock_root/build_artifacts/async_generat
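Note that the `name @ file:///...` entries in the `pip freeze` output above (the conda-built packages) carry no version number, so they cannot be used directly for pinning. As a hedged sketch of one way around that, the standard-library `importlib.metadata` can report every installed distribution in `name==version` form. The function name `pinned_requirements` is made up here; this is not a pip feature.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for every distribution
    visible to the running Python interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Calling `print("\n".join(pinned_requirements()))` in the notebook would give a list that could be saved alongside this record and used later as a starting point for pinning.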

## All conda-installed software present in the environment

**This may be more pertinent in MyBinder-launched sessions, as used here to collect this data, because [`environment.yml` in the `binder` directory](https://github.com/fomightez/SICILIAN-binder/blob/master/binder/environment.yml) is used to specify the environment present.** It should also include many of the R and R library versions detailed below.

In [1]:
%conda list

# packages in environment at /srv/conda/envs/notebook:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_r-mutex                  1.0.1               anacondar_1    conda-forge
aiohappyeyeballs          2.4.2                    pypi_0    pypi
aiohttp                   3.10.6                   pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
alembic                   1.13.2             pyhd8ed1ab_0    conda-forge
annotated-types           0.7.0              pyhd8ed1ab_0    conda-forge
anyio                     4.4.0              pyhd8ed1ab_0    conda-forge
argcomplete               3.5.0              pyhd8ed1ab_0    conda-forge
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py310h2372a71_4    conda-forge
arrow                

## R libraries loaded by SICILIAN's R scripts

The approach is based on [this guide](https://rpubs.com/Mentors_Ubiqum/list_packages) to 'List all loaded packages', that is, to list the libraries that are actually loaded.

As with STAR, the cell running the associated notebook above needs to have been run. That didn't turn out to be enough, though: because of the `sbatch` use, the R scripts in `scripts` were being run in temporary shells using `Rscript`, so nothing was actually loaded in the namespace here (and it wouldn't be anyway, since this notebook is a Python namespace, not R). Long story, not short: extraordinary measures were needed to run those scripts where they load packages and to see what is loaded afterwards. (Some of this involved the trick I used with `rcheck_script.R` below, which, as it turns out, was developed before this step; to understand the steps involved fully, check that out first. Also check out the Python cell above, where I iterated over the scripts before using `%watermark --iversions`, for the concepts. Gemini helped me translate it over.)

In [119]:
%%writefile ../loaded_by_scripts_checking_script.R
list.files("scripts", pattern = "\\.R$") -> r_scripts
for (script in r_scripts) {
  tryCatch({
    source(file.path("scripts", script))
  }, error = function(e) {
    warning(paste0("Error loading script '", script, "': ", e$message))
  })
}
#print(.packages()) #Stopping at that uncommented line will work and give grid of loaded packages, but next few lines will give list with one on each line, without base
# Get all loaded packages
loaded_packages <- .packages()
# Get base packages
base_packages <- c("base", "compiler", "datasets", "graphics", "grDevices", "grid", "methods", "parallel", "splines", "stats", "stats4", "tcltk", "tools", "utils")
# Filter out base packages
non_base_packages <- setdiff(loaded_packages, base_packages)
cat("Below lists non-base libraries loaded as ALL the R scripts in sub-directory 'scripts' were loaded and executed in an R session:\n")
cat(paste0(non_base_packages, collapse = "\n"), "\n")

Overwriting ../loaded_by_scripts_checking_script.R


In [120]:
%%capture out
# using subprocess here instead of `%%bash` so I can use `%%capture`; it also lets me switch the working directory back & forth in the same code,
# instead of needing separate cells before & after, as I would if I were using `%%bash`
%cd ~
import subprocess
result = subprocess.run(["Rscript", "loaded_by_scripts_checking_script.R"], capture_output=True)
sp_output = result.stdout.decode()
sp_stderr = result.stderr.decode()
# change back working directory
%cd ~/details_on_versions_noted_when_all_working/

In [121]:
libraries_loaded_text = sp_output #when code works subprocess output will just be the details of loaded libraries I want
print(libraries_loaded_text)

reading inputs:: 0 sec elapsed
Below lists non-base libraries loaded as ALL the R scripts in sub-directory 'scripts' were loaded and executed in an R session:
GenomicAlignments
Rsamtools
Biostrings
XVector
SummarizedExperiment
Biobase
MatrixGenerics
matrixStats
GenomicRanges
GenomeInfoDb
IRanges
S4Vectors
BiocGenerics
tictoc
dplyr
glmnet
Matrix
ggplot2
cutpointr
stringr
data.table 



Those listed above are the specific libraries loaded in the course of running the R scripts. You should be able to match these up with what the SICILIAN README says. Some may be added as dependencies of those listed there. Importantly, you should be able to match up those with versions listed above and below. (I may add automating that here at a later time.)
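One possible way to automate that matching is sketched below. It assumes the two-column `Package`/`Version` table printed by `installed.packages()` has been captured as text (for example via `subprocess`, as done above), and it uses a three-row excerpt of that table as sample data. The function names `parse_package_table` and `versions_for` are hypothetical.

```python
def parse_package_table(table_text):
    """Parse whitespace-aligned 'Package Version' rows into a dict."""
    versions = {}
    for line in table_text.splitlines():
        parts = line.split()
        # keep only two-column rows, skipping the header row
        if len(parts) == 2 and parts != ["Package", "Version"]:
            versions[parts[0]] = parts[1]
    return versions


def versions_for(loaded, installed):
    """Map each loaded library name to its installed version, if known."""
    return {name: installed.get(name, "not found") for name in loaded}


# Excerpt of the installed-packages table, used here only as sample input
sample_table = """
              Package   Version
              Biobase    2.62.0
         BiocGenerics    0.48.1
           Biostrings    2.70.1
"""
sample_loaded = ["Biostrings", "Biobase", "BiocGenerics", "tictoc"]

report = versions_for(sample_loaded, parse_package_table(sample_table))
for name, version in report.items():
    print(f"{name}: {version}")
```

With the full table as input instead of the excerpt, every loaded library would be expected to resolve to a version. A 'not found' entry, as for `tictoc` in this truncated sample, would point to a gap worth checking.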

In [88]:
# clean up script run to get information
!rm ../loaded_by_scripts_checking_script.R

## R and R library versions (Some already revealed above via `%conda list`)

See the `%conda list` output above for many of these; look for those prefixed with `r-`.
For example, for `r-base` see:

```text
r-base                    4.3.3 
```

So using R version 4.3.3.

Confirmation and additional details are provided in this section using R.

The first cell below uses a Jupyter convenience to make an `R` script that we can run here from the shell, because `R` is installed in the environment:

In [8]:
%%writefile rcheck_script.R
print(R.Version())
installedpackages <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(installedpackages) <- NULL
installedpackages <- installedpackages[is.na(installedpackages$Priority),1:2,drop=FALSE]
print(installedpackages, row.names=FALSE)

Overwriting rcheck_script.R


Use the R command `installed.packages()` to get a list, based on [here](https://heuristicandrew.blogspot.com/2015/06/list-of-user-installed-r-packages-and.html) and [here](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/installed.packages.html).

In [9]:
%%bash
Rscript rcheck_script.R

$platform
[1] "x86_64-conda-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "4"

$minor
[1] "3.3"

$year
[1] "2024"

$month
[1] "02"

$day
[1] "29"

$`svn rev`
[1] "86002"

$language
[1] "R"

$version.string
[1] "R version 4.3.3 (2024-02-29)"

$nickname
[1] "Angel Food Cake"

              Package   Version
                abind     1.4-5
              askpass     1.2.0
           assertthat     0.2.1
            backports     1.5.0
            base64enc     0.1-3
                   BH  1.84.0-0
              Biobase    2.62.0
         BiocGenerics    0.48.1
         BiocParallel    1.36.0
           Biostrings    2.70.1
                  bit     4.5.0
                bit64     4.5.2
               bitops     1.0-8
                 blob     1.2.4
                 brew    1.0-10
                 brio     1.1.5
                broom     1.0.7
                bslib     0.8.0
               cachem     1.1.0
                ca

In [None]:
# clean up script run to get information
!rm rcheck_script.R

## Jupyter-associated software versions

The software listed here that isn't touched upon elsewhere shouldn't cause issues; however, I am documenting the versions in the active session along with the others for now.

In [11]:
!python -c "import sys; print('\n',sys.version);" && jupyter --version && jupyter labextension list


 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Selected Jupyter core packages...
IPython          : 8.26.0
ipykernel        : 6.29.5
ipywidgets       : 8.1.5
jupyter_client   : 8.6.2
jupyter_core     : 5.7.2
jupyter_server   : 2.14.2
jupyterlab       : 4.2.5
nbclient         : 0.10.0
nbconvert        : 7.16.4
nbformat         : 5.10.4
notebook         : 7.2.2
qtconsole        : not installed
traitlets        : 5.14.3
JupyterLab v4.2.5
/srv/conda/envs/notebook/share/jupyter/labextensions
        jupyter-offlinenotebook v0.3.1 enabled  X
        jupyterlab_pygments v0.3.0 enabled OK (python, jupyterlab_pygments)
        @jupyter-notebook/lab-extension v7.2.2 enabled OK
        @jupyter-server/resource-usage v1.1.0 enabled OK (python, jupyter-resource-usage)
        @jupyter-widgets/jupyterlab-manager v5.0.13 enabled OK (python, jupyterlab_widgets)
        @jupyterhu

------

Enjoy.