This is a general guide to bootstrapping and maintaining a complete development environment for working as a curator or developer on the NIF-Ontology, protc, sparc-curation, scibot, etc. For a general introduction to the SPARC curation process see ./background.org. The environment bootstrapped by running this file was originally developed on Gentoo, and is portable to other distributions with a few tweaks.
Please report any bugs you find in this file or during the execution of any of the workflows described in this file to the sparc-curation GitHub issue tracker.
Setup takes about 3 hours.
OS level setup takes about an hour, and user setup takes about two hours.
If you do not have root or sudo access or do not administer the computer you are following this guide on you should start at user setup.
If you do have admin access then do the OS level setup first and then come back to the user setup once you are done.
If you are already on a system that has the prerequisites installed start here. If you are not you will find out fairly quickly when the following commands fail.
These workflows make extensive use of git. Git needs to know who you are (and so do we) so that it can stash files that you change (for example this file, which logs to itself). Use the email that you will use for curation or development for this. You should not use your primary email account for this because it will get a whole bunch of development related emails.
Run the following in a terminal replacing the examples with the fields that apply to you.
git config --global user.name "FIRST_NAME LAST_NAME"
git config --global user.email "MY_NAME@example.com"
Bootstrapping this setup.org file
You can run all the code in this setup.org file automatically using emacs org-mode. The easiest way to accomplish this is to install scimax which is an emacs starterkit for scientists and engineers that has everything we will need. The following steps will do this automatically for you.
All the code blocks in this Bootstrapping section need to be pasted into a terminal (shell) where you are logged in as your user. Run every code block in the order that they appear on this page. Do not skip any blocks. Read all the text between blocks. It will tell you what to do next.
When pasting blocks into the terminal (middle mouse, or C-V (control-shift-v) in the ubuntu terminal), if you do not copy the last newline of the blocks then you will have to hit enter to run the last command.
# TODO emacs auto setup to be able to run this file
mkdir -p ~/.local/bin
mkdir ~/bin
mkdir ~/opt
mkdir ~/git
mkdir ~/files
source ~/.profile
Run the following block to clone this repository and the scimax repository.
pushd ~/git
git clone https://github.com/SciCrunch/sparc-curation.git
git clone https://github.com/jkitchin/scimax.git
popd
Run the following command to initialize texlive for your user. It is needed for scimax to install correctly.
tlmgr init-usertree
Run the following commands to create the scimax command (~/bin/scimax on linux and macos, ~/bin/scimax.ps1 on windows), and the config file user.el that is needed for the rest of the process.
echo '(defvar *path-to-setup.org* "~/git/sparc-curation/docs/setup.org")' > vars.el
emacs --batch --load vars.el --load org --load ob-shell --eval '(org-babel-tangle-file *path-to-setup.org*)' --load ~/opt/scimax/user/user.el --eval '(org-babel-tangle-file *path-to-setup.org*)'
rm vars.el
When running the next block scimax will launch emacs and install a number of packages (DON’T PANIC). It is normal to see errors during this step. When everything finishes installing you should find yourself staring at the next section of this file, Per user setup, and can continue from there in scimax.
scimax --find-file ~/git/sparc-curation/docs/setup.org --eval "(add-hook 'window-setup-hook (lambda () (org-goto-section *section-per-user-setup*)))"
You should now be in scimax and can run the code blocks directly by clicking on a block and typing C-c C-c (control c control c). In the default scimax setup code blocks will appear as yellow or green.
Note that not all yellow blocks are source code, some may be examples. You can tell because examples won’t execute and they start with #+BEGIN_EXAMPLE instead of #+BEGIN_SRC.
All the following should be run as your user in scimax.
If you run these blocks from the command line be sure to run the remote-exports block first.
When you run this block emacs will think for about 3 minutes as it retrieves everything. You can tell that it is thinking because your mouse will be in thinking mode if you hover over emacs, and because in the minibuffer window at the bottom of the window there will be a message saying something to the effect of Wrote /tmp/babel-nonsense/ob-input-nonsense.
If an error window appears when running this block just run it again.
You can also run this block to update an existing installation.
After running this block you can move on to the Configuration files section.
See Developer setup code in the appendix for the source for this block.
This section will create and populate ~/.config/pyontutils/config.yaml, ~/.config/sparcur/config.yaml, and ~/secrets.yaml. They are used to configure the various programs that are used by the SPARC curation workflow, and store the API keys and semi private information such as hypothes.is group names, and google doc ids.
NOTE: If you are on macos =~/.config= is =~/Library/Application Support=
NOTE: If you are on windows =~/.config= is =~/AppData/Local=
The templates below should have already been tangled to the correct locations when setup.org was tangled.
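The per-OS locations in the notes above can be summarised in a few lines of Python. This is only an illustrative sketch: `user_config_path` is a hypothetical helper, not a function from pyontutils or sparcur, and it encodes nothing beyond the three cases listed in the notes.

```python
from pathlib import Path
import sys


def user_config_path(platform=sys.platform, home=None):
    """Return the directory this guide refers to as ~/.config.

    Hypothetical helper for illustration only; it mirrors the NOTEs above.
    """
    home = Path.home() if home is None else Path(home)
    if platform == 'darwin':  # macos
        return home / 'Library' / 'Application Support'
    if platform.startswith('win'):  # windows
        return home / 'AppData' / 'Local'
    return home / '.config'  # linux and anything else


if __name__ == '__main__':
    base = user_config_path()
    for name in ('pyontutils', 'sparcur'):
        print(base / name / 'config.yaml')
```

Running it prints where the two config.yaml files are expected to live on the current machine, which can be handy when checking that the templates were tangled to the right place.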
auth-stores:
  secrets:
    path: '{:user-config-path}/orthauth/secrets.yaml'
auth-variables:
  curies:
  git-local-base: ~/git
  git-remote-base:
  google-api-creds-file:
    path: google api creds-file
  google-api-store-file:
    path: google api store-file
  google-api-store-file-readonly:
    path: google api store-file-readonly
  nifstd-checkout-ok:
  ontology-local-repo:
  ontology-org:
  ontology-repo:
  patch-config:
  resources:
  scigraph-api: https://scigraph.olympiangods.org/scigraph
  scigraph-api-key:
  scigraph-graphload:
  scigraph-java:
  scigraph-services:
  scigraph-start:
  scigraph-stop:
  scigraph-systemd:
  zip-location:
auth-stores:
  secrets:
    path: '{:user-config-path}/orthauth/secrets.yaml'
auth-variables:
  blackfynn-organization:
  cache-path:
  export-path:
  hypothesis-api-key: hypothesis api *replace-me-with-:your-user-name*
  hypothesis-group: hypothesis group sparc-curation
  hypothesis-user:
  log-path:
  protocols-io-api-creds-file: protocols-io api creds-file
  protocols-io-api-store-file: protocols-io api store-file
If everything works then you should be able to run scig t brain and get results.
You can move your ~/.config/orthauth/secrets.yaml to live wherever you want, but you will need to update the auth-stores: secrets: path: entry in ~/.config/pyontutils/config.yaml and ~/.config/sparcur/config.yaml.
blackfynn:
  sparc:
    key: fake-api-key
    secret: fake-api-secret
google:
  api:
    creds-file: /path/to/creds-file.json
    store-file: google-api-token-rw.pickle
    store-file-readonly: google-api-token.pickle
  sheets:
    sparc-consistency: document-hash-id
    sparc-master: document-hash-id
hypothesis:
  api:
    *replace-me-with-:your-user-name*: fake-api-key
  group:
    sparc-curation: FakeId12
protocols-io:
  api:
    creds-file: /path/to/creds-file.json
    store-file: protocols-io-api-token-rw.pickle
At this point installation is complete. Congratulations!
You should log out and log back in to your window manager so that any new terminal you open will have access to all the programs you just installed. Logout on the default ubuntu window manager is located in the upper right.
When you log back in run the following command to start at the next step.
scimax --find-file ~/git/sparc-curation/docs/setup.org --eval "(add-hook 'window-setup-hook (lambda () (org-goto-section *section-accounts-and-api-access*)))"
When you exit emacs it may ask you if you want to save, say yes so that the logs of the install are saved.
NOTE this will cause problems down the line when you try to pull updates for sparc-curation because git will complain.
The next section will walk you through the steps needed to get access to all the various systems holding different pieces of data that we need.
Create accounts, obtain various API keys. After you finish this section you can jump to getting data!
The notation (-> key1 key2 key3) indicates a path in your secrets.yaml file. In a yaml file this looks like the block below. Replace the fake-value with the real value you obtain in the following sections.
key1:
  key2:
    key3: fake-value
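To make the path notation concrete, here is a minimal Python sketch of how a path such as (-> key1 key2 key3) walks a nested mapping. The function name `follow_path` is hypothetical and is not part of orthauth or any other tool used here; the dict stands in for what a yaml parser would produce from the example block.

```python
def follow_path(data, *keys):
    """Walk nested dicts one key at a time, like (-> key1 key2 key3)."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f'no value at (-> {" ".join(keys)})')
        current = current[key]
    return current


# the example yaml block, written as the dict a yaml parser would produce
secrets = {'key1': {'key2': {'key3': 'fake-value'}}}

print(follow_path(secrets, 'key1', 'key2', 'key3'))  # prints fake-value
```

A missing or misspelled key raises a KeyError naming the whole path, which is roughly the failure you would want to see when a secret has not been filled in yet.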
You can open the secrets.yaml file in another buffer by clicking on the link to it here. When you edit the file to add api keys you should save it after each one using the file menu or C-x C-s.
For some use cases you will need access to the SciCrunch production SciGraph endpoint. Register for an account and get an api key.
Edit config.yaml and update the scigraph-api-key: path: entry to point to scicrunch api name-of-user-or-name-for-the-key.
Edit secrets.yaml and add the api key to (-> scicrunch api name-of-user-or-name-for-the-key).
Once you have a Blackfynn account on the sparc org go to your profile and create an API key. Put the key in (-> blackfynn sparc key) and the secret in (-> blackfynn sparc secret).
While you are there you should also connect your ORCiD. Broken at the moment.
Enable the google sheets API from the google api dashboard. If you need other APIs you can enable them via the library page.
If you do not do this then at the end of the client flow you will receive an =invalid_clientUnauthorized= error.
(-> google api creds-file)
https://developers.google.com/identity/protocols/OAuth2
https://developers.google.com/api-client-library/python/guide/aaa_oauth
You will need to get API access for an OAuth client.
https://console.developers.google.com/apis/credentials
create credentials -> OAuth client ID
Fill in the consent screen, you only need the Application name field.
Download JSON
Add the name of the downloaded JSON file to secrets.yaml at (-> google api creds-file). Then run
python ~/git/pyontutils/pyontutils/sheets.py auth sheets
and
python ~/git/pyontutils/pyontutils/sheets.py auth sheets --readonly
Those commands will run the auth workflow and create the file specified at (-> google api store-file) for you.
Get the document ids for the following.
(-> google sheets sparc-master)
(-> google sheets sparc-consistency)
(-> google sheets sparc-affiliations)
(-> google sheets sparc-field-alignment)
Document id matches this pattern https://docs.google.com/spreadsheets/d/{document_id}/edit.
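If you prefer not to pick the id out of the address bar by eye, the pattern above can be applied with a short regular expression. This is a sketch under assumptions: `document_id` is a hypothetical helper, and the character class used for the id (letters, digits, `-` and `_`) is a guess about what the ids contain rather than something stated by Google.

```python
import re

# pattern taken from the URL shape given above; the character class for
# the id itself is an assumption
DOC_ID = re.compile(r'^https://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)')


def document_id(url):
    """Return the {document_id} part of a google sheets url."""
    match = DOC_ID.match(url)
    if match is None:
        raise ValueError(f'not a google sheets url: {url}')
    return match.group(1)


print(document_id('https://docs.google.com/spreadsheets/d/fake-document-id/edit'))
```

The returned string is what goes under the matching (-> google sheets ...) entry in secrets.yaml.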
google-chrome-stable https://chrome.google.com/webstore/detail/hypothesis-web-pdf-annota/bjfhmglciegochdpefhhlphglcehbmek
To get Hypothes.is API keys create an account, login, and go to your developer page.
Temporary additions to .bashrc until this can be sourced from secrets directly
HYP_USER=your-hypothesis-user-name
HYP_GROUP=$(cat ~/secrets.yaml | grep sparc-curation: | awk '{ print $2 }')
HYP_API_TOKEN=$(cat ~/secrets.yaml | grep "${HYP_USER}:" | awk '{ print $2 }')
To get protocols.io API keys create an account, login, and go to your developer page. You will need to set the redirect uri on that page to match the redirect uri in the json below.
Use the information from that page to fill in a json file with the structure below.
Add the full path to that json file to (-> protocols-io api creds-file) in secrets.yaml like you did for the google json file.
{
"installed": {
"client_id": "pr_live_id_fake-client-id<<<",
"client_secret": "pr_live_sc_fake-client-secret<<<",
"auth_uri": "https://www.protocols.io/api/v3/oauth/authorize",
"token_uri": "https://www.protocols.io/api/v3/oauth/token",
"redirect_uris": [
"https://sparc.olympiangods.org/curation/"
]
}
}
You will be prompted for your protocols.io email and password the first time you run.
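Before running anything against protocols.io it can save time to check that the json file has the shape shown above. The sketch below assumes that all five keys in the example are required, which is an assumption read off the example rather than a documented rule; `missing_fields` is a hypothetical helper.

```python
import json

# keys that appear in the "installed" section of the example json
REQUIRED = ('client_id', 'client_secret', 'auth_uri', 'token_uri', 'redirect_uris')


def missing_fields(text):
    """Return the expected keys that are absent from the "installed" section."""
    installed = json.loads(text).get('installed', {})
    return [key for key in REQUIRED if key not in installed]


# deliberately incomplete example using fake values
example = '''
{"installed": {"client_id": "fake-client-id",
               "client_secret": "fake-client-secret",
               "auth_uri": "https://www.protocols.io/api/v3/oauth/authorize",
               "token_uri": "https://www.protocols.io/api/v3/oauth/token"}}
'''
print(missing_fields(example))  # prints ['redirect_uris']
```

An empty list means every key from the example is present; it says nothing about whether the values themselves are correct.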
If you can use python3.7 (>=ubuntu-19.04) you can set the embedded debugger as follows.
pip install --user pudb
export PYTHONBREAKPOINT=pudb.set_trace
~/.vimrc settings to prevent clobbering of xattrs
augroup HasXattrs
autocmd BufRead,BufNewFile * let x=system('getfattr ' . bufname('%')) | if len(x) | call HasXattrs() | endif
augroup END
function HasXattrs()
" don't create new inodes
setlocal backupcopy=yes
endfunction
The steps in this section need to be run as root.
They only need to be run once.
app-editors/emacs
app-editors/gvim
app-text/texlive
dev-vcs/git
dev-scheme/racket
dev-lisp/sbcl
www-client/google-chrome-stable
18.10 cosmic cuttlefish (and presumably other debian derivatives)
The following need to be run in a shell where you have root (e.g. via sudo su -).
apt install openssh-server net-tools
Add your ssh public key to ~/.ssh/authorized_keys if you want to run this remotely.
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' \
>> /etc/apt/sources.list.d/google-chrome.list
add-apt-repository ppa:plt/racket
add-apt-repository ppa:kelleyk/emacs
add-apt-repository ppa:pypy/ppa
apt update
apt install build-essential lib64readline-dev rxvt-unicode htop attr tree sqlite curl git
apt install emacs26 vim-gtk3 texlive-full pandoc hunspell
apt install librdf0-dev python3-dev python3-pip pypy3 jupyter racket sbcl r-base r-base-dev maven
apt install inkscape gimp krita graphviz firefox google-chrome-stable xfce4
apt install nginx
update-alternatives --install /usr/bin/python python /usr/bin/python3 10
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 10
Ubuntu struggles to set user specific PATHs correctly via ~/.profile. This code works when the user logs in. It does not work correctly if you su to the user. Not entirely sure why. Doesn’t work on xfce either apparently. The absolute madness.
# the quoted 'EOL' keeps $HOME and $PATH literal so they expand at login, not when root writes the file
{ cat <<'EOL'
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
# set PATH so it includes user's private .local/bin if it exists
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi
EOL
} > /etc/profile.d/user-home-paths.sh
Other software that you will probably need at some point but that is not packaged on ubuntu.
augpathlib makes extensive use of symlinks to store metadata for remote files that have not been downloaded. By default normal users cannot create symlinks on windows. The best way to fix this is by granting the user that will run sparcur permission to create symlinks (NOT to run the process as Administrator).
Three relevant links: stackoverflow superuser powershell script source.
You will need to log out and log back in for the setting to take effect.
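Once you have logged back in, a quick way to confirm that the permission took effect is to try creating a symlink from Python as the user that will run sparcur. This is a minimal sketch that uses only the standard library; `can_create_symlinks` is a hypothetical helper and is not part of augpathlib.

```python
import os
import tempfile


def can_create_symlinks():
    """Try to create a symlink in a temporary directory and report success."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, 'target')
        link = os.path.join(tmp, 'link')
        with open(target, 'w') as f:
            f.write('test')
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # on windows this is where a missing privilege shows up
            return False
        return os.path.islink(link)


if __name__ == '__main__':
    print('symlinks ok' if can_create_symlinks() else 'cannot create symlinks')
```

If it reports that symlinks cannot be created, revisit the permission steps in this section before going further.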
You can use gpedit.msc to grant these permissions by navigating the menu tree below and adding the user. You can run gpedit.msc directly with Win-r or often Win gpedit enter.
Computer configuration
└── Windows Settings
    └── Security Settings
        └── Local Policies
            └── User Rights Assignment
                └── Create symbolic links
Alternately you can define and run the function below as Administrator. Run it as addSymLinkPermissions("user-to-add").
function addSymLinkPermissions($accountToAdd){
Write-Host "Checking SymLink permissions.."
$sidstr = $null
try {
$ntprincipal = new-object System.Security.Principal.NTAccount "$accountToAdd"
$sid = $ntprincipal.Translate([System.Security.Principal.SecurityIdentifier])
$sidstr = $sid.Value.ToString()
} catch {
$sidstr = $null
}
Write-Host "Account: $($accountToAdd)" -ForegroundColor DarkCyan
if( [string]::IsNullOrEmpty($sidstr) ) {
Write-Host "Account not found!" -ForegroundColor Red
exit -1
}
Write-Host "Account SID: $($sidstr)" -ForegroundColor DarkCyan
$tmp = [System.IO.Path]::GetTempFileName()
Write-Host "Export current Local Security Policy" -ForegroundColor DarkCyan
secedit.exe /export /cfg "$($tmp)"
$c = Get-Content -Path $tmp
$currentSetting = ""
foreach($s in $c) {
if( $s -like "SECreateSymbolicLinkPrivilege*") {
$x = $s.split("=",[System.StringSplitOptions]::RemoveEmptyEntries)
$currentSetting = $x[1].Trim()
}
}
if( $currentSetting -notlike "*$($sidstr)*" ) {
Write-Host "Need to add permissions to SymLink" -ForegroundColor Yellow
Write-Host "Modify Setting ""Create SymLink""" -ForegroundColor DarkCyan
if( [string]::IsNullOrEmpty($currentSetting) ) {
$currentSetting = "*$($sidstr)"
} else {
$currentSetting = "*$($sidstr),$($currentSetting)"
}
Write-Host "$currentSetting"
$outfile = @"
[Unicode]
Unicode=yes
[Version]
signature="`$CHICAGO`$"
Revision=1
[Privilege Rights]
SECreateSymbolicLinkPrivilege = $($currentSetting)
"@
$tmp2 = [System.IO.Path]::GetTempFileName()
Write-Host "Import new settings to Local Security Policy" -ForegroundColor DarkCyan
$outfile | Set-Content -Path $tmp2 -Encoding Unicode -Force
Push-Location (Split-Path $tmp2)
try {
secedit.exe /configure /db "secedit.sdb" /cfg "$($tmp2)" /areas USER_RIGHTS
} finally {
Pop-Location
}
} else {
Write-Host "NO ACTIONS REQUIRED! Account already in ""Create SymLink""" -ForegroundColor DarkCyan
Write-Host "Account $accountToAdd already has permissions to SymLink" -ForegroundColor Green
return $true;
}
}
You can skip this if you will only be using the windows computer locally. In a local administrator powershell install OpenSSH. The rest can then be done remotely.
Get-WindowsCapability -Online | ? Name -like 'OpenSSH*'
Add-WindowsCapability -Online -Name OpenSSH.Client~~~~0.0.1.0
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0
Set-Service sshd -StartupType Automatic
Start-Service sshd
# add your ssh key to %programdata%\ssh\administrators_authorized_keys
# disable password login in %programdata%\ssh\sshd_config
Restart-Service sshd
For managing a windows development/curation environment I highly recommend using the chocolatey package manager. Install chocolatey.
choco install `
autohotkey `
clisp `
emacs `
firefox `
GoogleChrome `
poshgit `
python `
racket `
vim
Update system Path to include packages that don’t add themselves. This needs to be run as administrator.
$path = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine)
$prefix_path = "C:\Program Files\Racket;C:\Program Files\Git\cmd;C:\Program Files\Git\bin;"
[Environment]::SetEnvironmentVariable("Path",
$prefix_path + $path,
[EnvironmentVariableTarget]::Machine)
If you are logged in remotely restarting sshd is the easiest way to refresh the environment so commands are in PATH. This is because new shells inherit the environment of sshd at the time that it was started.
Restart-Service sshd
You will need to reconnect to a new ssh session in order to have access to git and other newly installed commands.
https://www.tug.org/texlive/windows.html https://www.tug.org/texlive/acquire-netinstall.html http://mirror.ctan.org/systems/texlive/tlnet/install-tl-windows.exe This takes quite a while, about 50 mins on a good connection with a fast computer.
https://github.com/protegeproject/protege-distribution/releases/latest
rdf tools http://librdf.org/raptor/INSTALL.html https://github.com/dajobe/raptor Unfortunately to get the latest version of these it seems you have to build them yourself.
add to PATH so we can just link everything there
%HOMEPATH%\bin
%APPDATA%\Python\Python37\Scripts
TODO -l %HOMEPATH%/opt/scimax/init.el setup.org
in the shortcut …
also %HOMEPATH%
for the start in …
You can skip this if you will only be using the osx computer locally.
sudo systemsetup -setremotelogin on
# scp your key over to ~/.ssh/authorized_keys
# set PasswordAuthentication no in /etc/ssh/sshd_config
# set ChallengeResponseAuthentication no in /etc/ssh/sshd_config
sudo launchctl unload /System/Library/LaunchDaemons/ssh.plist
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/5ecca39372cffdc4c9fbacee6e22328a0dc61eac/install)"
brew cask install \
emacs \
firefox \
gimp \
google-chrome \
inkscape \
krita \
mactex \
macvim \
protege \
racket
brew install \
coreutils \
curl \
git \
htop \
hunspell \
pandoc \
postgres \
python \
redland \
rxvt-unicode \
sbcl \
sqlite \
tree
Add the following to your ~/.bash_profile
# This file is sourced by bash for login shells. The following line
# runs your .bashrc and is recommended by the bash info pages.
[[ -f ~/.bashrc ]] && . ~/.bashrc
Add the following to your ~/.bashrc
export PATH=${HOME}/bin:${HOME}/Library/Python/3.7/bin:${PATH}
Run the following to symlink python3 to python
mkdir ~/bin
ln -s /usr/local/bin/python3 ~/bin/python
ln -s /usr/local/bin/pip3 ~/bin/pip
pushd ~/git
for d in *; do if [ -d "${d}/.git" ]; then pushd "${d}"; git pull || break; popd; fi; done
popd
- DO NOT USE cp -a to copy files with xattrs! INSTEAD use rsync -X -u -v. \cp does not remove absent fields from xattrs of the file previously occupying that name! OH NO (is this a cp bug!?)
pushd ~/files/blackfynn_local/
spc clone ${SPARC_ORG_ID} # initialize a new repo and pull existing structure
spc refresh -f
spc fetch # actually download files
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel' # see what to fetch
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel' -f # fetch
spc find -n '*.xlsx' -n '*.csv' -n '*.tsv' -n '*.msexcel' -f -r 10 # slow down if you are seeing errors!
ls -Q | xargs -P10 -r -n 1 sh -c 'spc refresh -r 4 "${1}"' _
find -maxdepth 1 -type d -name '[C-Z]*' -exec spc refresh -r 8 {} \;
find \( -name '*.xlsx' -o -name '*.csv' -o -name '*.tsv' \) -exec ls -hlS {} \+
find -maxdepth 1 -type d -exec rmdir {} \;
Pull local copy of data to a new computer. Note the double escape needed for the space.
rsync -X -u -v -r -e ssh ${REMOTE_HOST}:/home/${DATA_USER}/files/blackfynn_local/SPARC\\\ Consortium ~/files/blackfynn_local/
-X copy extended attributes
-u update files
-v verbose
-r recursive
-e remote shell to use
fetching a whole dataset or a subset of a dataset
spc ** -f
pushd ${SPARCDATA}
spc export datasets
popd
Setup as root
mkdir -p /var/www/sparc/sparc/archive/exports/
chown -R nginx:nginx /var/www/sparc
# export vs exports, no wonder this is so confusing >_<
function sparc-export-to-server () {
: ${SPARCUR_EXPORTS:=/var/lib/sparc/.local/share/sparcur/export}
EXPORT_BASE=${SPARCUR_EXPORTS}/N:organization:618e8dd9-f8d2-4dc4-9abb-c6aaab2e78a0/
FOLDERNAME=$(readlink ${EXPORT_BASE}/LATEST)
FULLPATH=${EXPORT_BASE}/${FOLDERNAME}
pushd /var/www/sparc/sparc
cp -a "${FULLPATH}" archive/exports/ && chown -R nginx:nginx archive && unlink exports ; ln -sT "archive/exports/${FOLDERNAME}" exports
popd
}
spc report completeness
spc server --latest --count
keywords = sorted(set([k for d in asdf['datasets'] if 'meta' in d and 'keywords' in d['meta']
for k in d['meta']['keywords']]))
tar is the only one of the ‘usual’ suspects for file archiving that supports xattrs, zip cannot.
tar --force-local --xattrs -cvzf 2019-07-17T10\:44\:16\,457344.tar.gz '2019-07-17T10:44:16,457344/'
tar --force-local --xattrs -xvzf 2019-07-17T10\:44\:16\,457344.tar.gz
find 2019-07-17T10\:44\:16\,457344 -exec getfattr -d {} \;
function sparc-copy-pull () {
: ${SPARC_PARENT:=${HOME}/files/blackfynn_local/}
local TODAY=$(date +%Y%m%d)
pushd ${SPARC_PARENT} &&
mv SPARC\ Consortium "SPARC Consortium_${TODAY}" &&
rsync -ptgo -A -X -d --no-recursive --exclude=* "SPARC Consortium_${TODAY}/" SPARC\ Consortium &&
mkdir SPARC\ Consortium/.operations &&
mkdir SPARC\ Consortium/.operations/trash &&
rsync -X -u -v -r "SPARC Consortium_${TODAY}/.operations/objects" SPARC\ Consortium/.operations/ &&
pushd SPARC\ Consortium &&
spc pull || echo "spc pull failed"
popd
popd
}
jq -r '[ .datasets[] |
{id: .id,
name: .meta.folder_name,
se: [ .status.submission_errors[].message ] | unique,
ce: [ .status.curation_errors[].message ] | unique } ]' curation-export.json
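For anyone more comfortable in Python than jq, the same summary can be produced with a short function. This is a sketch, not sparcur code: `error_summary` is a hypothetical name, and the example data is made up in the shape implied by the jq filter (the real curation-export.json has many more fields).

```python
import json


def error_summary(export):
    """Mirror the jq filter: id, folder name, and unique error messages."""
    out = []
    for dataset in export['datasets']:
        status = dataset['status']
        out.append({
            'id': dataset['id'],
            'name': dataset['meta']['folder_name'],
            # jq's unique also sorts, so sorted(set(...)) gives the same result
            'se': sorted({e['message'] for e in status['submission_errors']}),
            'ce': sorted({e['message'] for e in status['curation_errors']}),
        })
    return out


# made up data in the shape implied by the jq filter
example = {'datasets': [
    {'id': 'N:dataset:fake-id',
     'meta': {'folder_name': 'fake dataset'},
     'status': {'submission_errors': [{'message': 'b'}, {'message': 'a'}, {'message': 'b'}],
                'curation_errors': []}}]}

print(json.dumps(error_summary(example), indent=2))
```

To run it on a real export, load the file with `json.load` and pass the result in place of `example`.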
Get a list of all file extensions for symlinks (usually data)
find -type l | grep -o '\(\.[a-zA-Z0-9]\+\)\+$' | sort -u
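The same listing can be sketched in Python for systems where find and grep are not available. The function name `symlink_extensions` is hypothetical, and the ordering of the result may differ slightly from `sort -u` depending on your locale.

```python
import os
import re

# same idea as the grep pattern: one or more ".alnum" groups at the end of the name
EXTENSION = re.compile(r'(\.[a-zA-Z0-9]+)+$')


def symlink_extensions(root='.'):
    """Collect the distinct trailing extensions of all symlinks under root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow symlinked directories by default,
        # but it does list them in dirnames, so check both lists
        for name in dirnames + filenames:
            if os.path.islink(os.path.join(dirpath, name)):
                match = EXTENSION.search(name)
                if match:
                    found.add(match.group(0))
    return sorted(found)


if __name__ == '__main__':
    for extension in symlink_extensions():
        print(extension)
```

Compound extensions such as .tar.gz are reported whole, as they are by the grep pattern.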
Find datasets with unknown file types.
find -name '*.roi' -exec spc meta --context --uri {} \+
This is slow, but prototypes functionality useful for the curators.
find -type d -not -name 'ephys' -name 'ses-*' -exec bash -c \
'pushd $1 1>/dev/null; pwd >> ~/manifest-stuff.txt; spc report size --tab-table ./* >> ~/manifest-stuff.txt; popd 1>/dev/null' _ {} \;
See also the sparcur developer guide
Commit any changes and push to master.
make-template-zip () {
local CLEANROOM=/tmp/cleanroom/
mkdir ${CLEANROOM} || return 1
pushd ${CLEANROOM}
git clone https://github.com/SciCrunch/sparc-curation.git &&
pushd ${CLEANROOM}/sparc-curation/resources
zip -r DatasetTemplate.zip DatasetTemplate
mv DatasetTemplate.zip ${CLEANROOM}
popd
rm -rf ${CLEANROOM}/sparc-curation
popd
}
make-template-zip
Once that is done open /tmp/cleanroom/DatasetTemplate.zip in file-roller or similar and make sure everything is as expected.
Create the GitHub release. The tag name should have the format dataset-template-1.1 where the version number should match the metadata version embedded in dataset_description.xlsx. Minor versions such as dataset-template-1.2.1 are allowed. Attach ${CLEANROOM}/DatasetTemplate.zip as a release asset.
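A small check can catch a mistyped tag before the release is created. The sketch below assumes that a tag has exactly two or three numeric components after the `dataset-template-` prefix, which is inferred from the two examples above rather than from a written rule; `is_valid_tag` is a hypothetical helper.

```python
import re

# dataset-template-MAJOR.MINOR with an optional third component
TAG = re.compile(r'^dataset-template-\d+\.\d+(\.\d+)?$')


def is_valid_tag(tag):
    """Return True if tag looks like dataset-template-1.1 or dataset-template-1.2.1."""
    return TAG.match(tag) is not None


for tag in ('dataset-template-1.1', 'dataset-template-1.2.1', 'template-1.1'):
    print(tag, is_valid_tag(tag))
```

The check only covers the format; matching the version against dataset_description.xlsx still has to be done by hand.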
Update
https://github.com/Blackfynn/docs.sparc.science/blob/master/pages/data_submission/submit_data.md
and
https://github.com/Blackfynn/docs.sparc.science/blob/master/pages/sparc_portal/sparc_data_format.md
with the new link.
Link to the local copy.
Use inspect.getclasstree along with pyontutils.utils.subclasses to display hierarchies of classes.
from inspect import getclasstree
from pyontutils.utils import subclasses
from IPython.lib.pretty import pprint
# classes to inspect
import pathlib
from sparcur import paths
def class_tree(root):
    return getclasstree(list(subclasses(root)))
pprint(class_tree(pathlib.PurePosixPath))
View the latest log file with colors using less.
less -R $(ls -d ~sparc/files/blackfynn_local/export/log/* | tail -n 1)
For a permanent fix for less add
alias less='less -R'
You have an error!
maybe_size = c.cache.meta.size # << AttributeError here
Modify to wrap code
try:
    maybe_size = c.cache.meta.size
except AttributeError as e:
    breakpoint()  # << investigate error
Temporary squash by logging as an exception with optional explanation
try:
    maybe_size = c.cache.meta.size
except AttributeError as e:
    log.exception(e)
    log.error(f'explanation for error and local variables {c}')
If a dataset is removed, just move it manually to trash IF it is clear that it was supposed to be removed, otherwise consult the curation team. You can confirm that it was actually removed by checking Blackfynn directly using DATASETID from the error trace.
spc meta -u "$(spc goto ${DATASETID})"
Example trace.
Future exception was never retrieved
future: <Future finished exception=Exception("No dataset matching name or ID 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'.",)>
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 416, in <lambda>
    generator = (lambda:list(limited_gen(chunk, smooth_offset=(i % lc)/lc, time_est=time_est, debug=debug, thread=i)) # this was the slowdown culpret
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 455, in limited_gen
    yield element()
  File "/var/lib/sparc/git/pyontutils/pyontutils/utils.py", line 376, in inner
    return function(*args, **kwargs)
  File "/var/lib/sparc/git/sparc-curation/sparcur/paths.py", line 1156, in refresh
    size_limit_mb=size_limit_mb)
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 816, in refresh
    old_meta = self.meta
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 872, in meta
    return PathMeta(size=self.size,
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 603, in size
    if isinstance(self.bfobject, File):
  File "/var/lib/sparc/git/sparc-curation/sparcur/backends.py", line 401, in bfobject
    bfobject = self._api.get(self._seed)
  File "/var/lib/sparc/git/sparc-curation/sparcur/blackfynn_api.py", line 795, in get
    thing = self.bf.get_dataset(id) # heterogenity is fun!
  File "/var/lib/sparc/.local/lib/python3.6/site-packages/blackfynn/client.py", line 231, in get_dataset
    raise Exception("No dataset matching name or ID '{}'.".format(name_or_id))
Exception: No dataset matching name or ID 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'.
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ spc goto 'N:dataset:83e0ebd2-dae2-4ca0-ad6e-81eb39cfc053'
Hackathon Team Materials
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ mv Hackathon\ Team\ Materials ../.trash/
sparc@cassava:~/files/blackfynn_local/SPARC Consortium$ spc pull
If you make any changes to this section be sure to run the #+SRC and #+CALL: blocks below.
GitHub repositories
augpathlib idlib hyputils orthauth ontquery parsercomb pyontutils protc rrid-metadata rkdf orgstrap |
NIF-Ontology scibot sparc-curation |
Ophirr33/pda zussitarze/qrcode |
Repository local roots. The ordering of the entries matters.
augpathlib idlib pyontutils/htmlfn pyontutils/ttlser hyputils orthauth ontquery parsercomb pyontutils pyontutils/nifstd pyontutils/neurondm protc/protcur sparc-curation scibot |
qrcode/ pda/ protc/protc-lib protc/protc-tools-lib protc/protc protc/protc-tools rkdf/rkdf-lib rkdf/rkdf rrid-metadata/rrid NIF-Ontology/ |
from itertools import chain
urs = chain((('tgbugs', r) for tr in trl for rs in tr for r in rs.split(' ')),
(('SciCrunch', r) for sr in srl for rs in sr for r in rs.split(' ')),
(ur.split('/') for o_r in orl for urs in o_r for ur in urs.split(' ')))
#print(trl, srl, orl)
#print(list(urs)) # will express the generator so there will be no result
out = []
for user, repo in urs:
    out.append(f'https://github.com/{user}/{repo}')
return [' '.join(out)]
for repo in ${REPOS}; do echo ${repo}; done
echo '-------------'
for repo in ${PYROOTS}; do echo ${repo}; done
echo '-------------'
for repo in ${RKTROOTS}; do echo ${repo}; done
export REPOS='
https://github.com/tgbugs/augpathlib
https://github.com/tgbugs/idlib
https://github.com/tgbugs/hyputils
https://github.com/tgbugs/orthauth
https://github.com/tgbugs/ontquery
https://github.com/tgbugs/parsercomb
https://github.com/tgbugs/pyontutils
https://github.com/tgbugs/protc
https://github.com/tgbugs/rrid-metadata
https://github.com/tgbugs/rkdf
https://github.com/tgbugs/orgstrap
https://github.com/SciCrunch/NIF-Ontology
https://github.com/SciCrunch/scibot
https://github.com/SciCrunch/sparc-curation
https://github.com/Ophirr33/pda
https://github.com/zussitarze/qrcode
'
export PYROOTS='
augpathlib
idlib
pyontutils/htmlfn
pyontutils/ttlser
hyputils
orthauth
ontquery
parsercomb
pyontutils
pyontutils/nifstd
pyontutils/neurondm
protc/protcur
sparc-curation
scibot
'
export RKTROOTS='
qrcode/
pda/
protc/protc-lib
protc/protc-tools-lib
protc/protc
protc/protc-tools
rkdf/rkdf-lib
rkdf/rkdf
rrid-metadata/rrid
NIF-Ontology/
'
Tangle the following blocks with C-c C-v C-t
in vanilla emacs or paste it into scimax’s
;; silence ob-ipython complaining about missing command
;; THIS CAN CAUSE RUNTIME ERRORS
(setq ob-ipython-html-to-image-program "/dev/null")
;; requires
(require 'cl) ;; needed for case
;; org goto heading
(defun org-goto-section (heading)
"\`heading' should be a string matching the desired heading"
(goto-char (org-find-exact-headline-in-buffer heading)))
;; workaround for powershell cmd windows braindead handling of strings
(defvar *section-per-user-setup* "Per user setup")
(defvar *section-accounts-and-api-access* "Accounts and API access")
;; recenter a line set using --eval to be at the top of the buffer
(add-hook 'emacs-startup-hook (lambda () (recenter-top-bottom 0)))
;; line numbers so it is harder to get lost in a big file
(when (>= emacs-major-version 26)
(setq display-line-numbers-grow-only 1)
(global-display-line-numbers-mode 1))
;; open setup.org symlink without prompt
(setq vc-follow-symlinks 1)
;; sane python indenting
(setq-default indent-tabs-mode nil)
(setq tab-width 4)
(setq org-src-preserve-indentation nil)
(setq org-src-tab-acts-natively nil)
;; don't hang on tlmgr since it is broken on ubuntu
(setq scimax-installed-latex-packages t)
;; save command history
(setq history-length t)
(savehist-mode 1)
(setq savehist-additional-variables '(kill-ring search-ring regexp-search-ring))
;; racket
(when (fboundp 'use-package)
(use-package racket-mode
:mode "\\.ptc\\'" "\\.rkt\\'" "\\.sxml\\'"
:bind (:map racket-mode-map
("<f5>" . recompile-quietly))
:init
(defun my/buffer-local-tab-complete ()
"Make \`tab-always-indent' a buffer-local variable and set it to 'complete."
(make-local-variable 'tab-always-indent)
(setq tab-always-indent 'complete))
(defun rcc ()
(set (make-local-variable 'compile-command)
(format "raco make %s" (file-name-nondirectory buffer-file-name))))
(add-hook 'racket-mode-hook 'rcc)
(add-hook 'racket-mode-hook 'hs-minor-mode)
(add-hook 'racket-mode-hook 'goto-address-mode)
(add-hook 'racket-mode-hook 'my/buffer-local-tab-complete)
(add-hook 'racket-repl-mode-hook 'my/buffer-local-tab-complete)))
;; config paths
(defun config-paths (&optional os)
(case (or os system-type)
;; ucp udp uchp ulp
(gnu/linux '("~/.config"
"~/.local/share"
"~/.cache"
"~/.cache/log"))
(darwin '("~/Library/Application Support"
"~/Library/Application Support"
"~/Library/Caches"
"~/Library/Logs"))
(windows-nt (let ((ucp "~/AppData/Local"))
(list ucp ucp ucp (concat ucp "/Logs"))))
(otherwise (error (format "Unknown OS %s" (or os system-type))))))
(eval-when-compile (defvar *config-paths* (config-paths)))
(defun fcp (position &optional suffix)
(let ((base-path (funcall position *config-paths*)))
(if suffix
(format "%s/%s" base-path suffix)
base-path)))
(defun user-config-path (&optional suffix) (fcp #'first suffix))
(defun user-data-path (&optional suffix) (fcp #'second suffix))
(defun user-cache-path (&optional suffix) (fcp #'third suffix))
(defun user-log-path (&optional suffix) (fcp #'fourth suffix))
;; vim bindings if you need them
;; if undo-tree fails to install for strange reasons M-x list-packages C-s undo-tree
;; to manually install, mega gnu elpa weirdness
(setq evil-want-keybinding nil)
(when (fboundp 'use-package)
(require 'scimax-evil))
emacs -q -l ~/opt/scimax/init.el $args
emacs -q -l ~/opt/scimax/init.el "$@"
# implicit check for bash by being able to run this block at all
# git check on the off chance that we made it here without cloning this repo
git --version || exit 1
# python version check
python -c "print('python ok') if __import__('sys').version_info.major >= 3 else __import__('sys').exit(1)" || exit 2
pip --version || exit 3
# git email check
[[ -n "$(git config --list | grep user.email)" ]] || exit 4
pushd ~/git
for repo_url in ${REPOS}; do git clone ${repo_url}.git 2>&1; done
popd
pushd ~/git
for repo in ${PYROOTS}; do pushd ${repo}; pip install --user --editable . 2>&1 || break; popd; done
popd
ln -s ~/git/rkdf/bin/ttl-to-rkt ~/bin/ttl-to-rkt
ln -s ~/git/rkdf/bin/rkdf-convert-all ~/bin/rkdf-convert-all
pushd ~/git/NIF-Ontology
git checkout dev
rkdf-convert-all
git checkout master
popd
pushd ~/git
raco pkg install --skip-installed --auto --batch ${RKTROOTS} 2>&1
popd
Paste the results of this block into your shell if you are running the code from this file by pasting it into a terminal.
*NOTE: DO NOT EDIT THE CODE BELOW IT WILL BE OVERWRITTEN.*