Merge remote-tracking branch 'origin/master' into develop
# Conflicts:
#	docs/source/Access/GettingAccess.md
#	docs/source/SLURM/SLURMIntro.md
#	docs/source/index.rst
#	docs/source/software/ansys.rst
#	docs/source/software/jupyter.rst
#	docs/source/software/singularity.rst
#	docs/source/system/deepthoughspecifications.md
The-Scott-Flinders committed Nov 29, 2021
2 parents ff91d92 + e5455e2 commit 71f179e
Showing 7 changed files with 56 additions and 77 deletions.
15 changes: 15 additions & 0 deletions .idea/git_toolbox_prj.xml

Some generated files are not rendered by default.

Empty file.
15 changes: 9 additions & 6 deletions docs/source/SLURM/SLURMIntro.md
@@ -257,9 +257,9 @@ Notice that the $TMP directories are different for every step in the array? This

To reiterate the warning above - if you leave anything in the $TMP or $SHM Directories, SLURM will delete it at the end of the job, so make sure you move any results out to /scratch or /home.
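As a rough sketch (the destination path is illustrative, not a prescribed layout), the last lines of a job script might copy results out before SLURM cleans up:

```bash
# Copy anything you still need out of $TMP before the job ends - SLURM wipes it afterwards.
RESULTS_DIR=/scratch/$USER/results/$SLURM_JOB_ID   # assumed destination; adjust to your own layout
mkdir -p "$RESULTS_DIR"
cp -r "$TMP"/. "$RESULTS_DIR"/
```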

### Filename Patterns

Some commands will take a filename. The following modifiers allow you to generate filenames in which these patterns are substituted with values controlled by SLURM; a short example follows the table.

| Symbol | Substitution |
|-|-|
@@ -276,6 +276,7 @@ Some commands will take a filename. The following modifiers will allow you to g
|%u|User name. |
|%x|Job name. |
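
As a small, hypothetical example, a job script header might combine the %x, %j and %u patterns from the table above like this:

```bash
#SBATCH --job-name=example_job
#SBATCH --output=%x-%j.out.txt     # e.g. example_job-123456.out.txt
#SBATCH --error=%u-%x-%j.err.txt   # prefixed with the submitting user's name
```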


## SLURM: Extras

Here is an assortment of resources that have been passed on to the Support Team as 'Useful to me'. Your mileage may vary on how useful you find them.
@@ -288,7 +289,8 @@ Besides useful commands and ideas, this [FAQ](http://www.ceci-hpc.be/slurm_faq.h

An excellent guide to [submitting jobs](https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html).

## SLURM: Script Template

#!/bin/bash
# Please note that you need to adapt this script to your job
@@ -321,8 +323,8 @@ An excellent guide to [submitting jobs](https://support.ceci-hpc.be/doc/_content
# %j will append the 'Job ID' from SLURM.
# %x will append the 'Job Name' from SLURM
# %
#SBATCH --output=/home/<FAN>/%x-%j.out.txt
#SBATCH --error=/home/<FAN>/%x-%j.err.txt
#SBATCH --output=/home/$FAN/%x-%j.out.txt
#SBATCH --error=/home/$FAN/%x-%j.err.txt
##################################################################
# The default partition is 'general'.
# Valid partitions are general, gpu and melfu
@@ -426,6 +428,7 @@ An excellent guide to [submitting jobs](https://support.ceci-hpc.be/doc/_content

# Using the example above with a shared dataset directory, your final step
# in the script should remove the directory folder
# rm -rf /local/$SLURM_JOBID
# rm -rf $DATADIR

##################################################################
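
Assuming the template above is saved to a file (the name `my_job.sh` below is just a placeholder), a typical submit-and-check cycle might look like this:

```bash
sbatch my_job.sh        # prints: Submitted batch job <jobid>
squeue -u $USER         # is the job queued or running?
sacct -j <jobid>        # state and exit code once it has finished
# The template writes output to /home/<FAN>/<jobname>-<jobid>.out.txt via the %x-%j patterns.
```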

65 changes: 25 additions & 40 deletions docs/source/Storage/storageusage.rst
@@ -6,30 +6,30 @@ The HPC is a little different than your desktop at home when it comes to storage
So, before we start putting files onto the HPC, it's best you know where to put them in the first place.

On DeepThought, there are two main storage tiers. Firstly, our bulk storage is the 'Scratch' area - slower, spinning Hard-Disk Drives (HDDs). Secondly, the smaller, hyper-fast NVMe Solid-State Drives (SSDs) are located at /local. For the exact specifications and capacities, see the `System Specifications`_.

There is a critical difference between these two locations. The /scratch area is a common storage area. You can access it from all of the login, management and compute nodes on the HPC. This is not the same as /local, which is only available on each compute node. That is - if your job is running on Node001, the /local only exists on that particular node - you cannot access it anywhere else on the HPC.

.. attention:: The HPC Job & Data Workflow, along with links to the new Data-Workflow Management Portal, is under construction and will be linked here when completed.

################################
Storage Accessibility Overview
################################
As a general guide, the following table presents the overall storage available on the HPC.

+---------------------+--------------------------+--------------------------------------+
| Filesystem Location | Accessible From          | Capacity                             |
+=====================+==========================+======================================+
| /scratch            | All Nodes                | ~250TB                               |
+---------------------+--------------------------+--------------------------------------+
| /home               | All Nodes                | ~12TB                                |
+---------------------+--------------------------+--------------------------------------+
| /local              | Individual Compute Nodes | ~400GB or ~1.5TB                     |
+---------------------+--------------------------+--------------------------------------+
| /r_drive/\<folder>  | Head Nodes               | N/A                                  |
+---------------------+--------------------------+--------------------------------------+
| /RDrive             | Head Nodes               | Variable, Size of R-Drive Allocation |
+---------------------+--------------------------+--------------------------------------+

.. attention:: The /r_drive/ location is NOT the University R:\\ Drive. It is a remnant of the now-defunct eRSA service that is being phased out; the ability to move data between the R:\\ Drive and the HPC is currently undergoing testing.

The /r_drive/ locations are data mount points from the now defunct eRSA Project and are slowly being phased out. Any point under /r_drive/ will *auto mount on access*. Just attempt to touch or change to the correct directory under the /r_drive/ path and the HPC will handle this automatically for you. Until you do this, the directory **will be invisible**.
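
A short illustration of the automount behaviour (``<your-folder>`` is a placeholder for your own allocation):

.. code-block:: bash

    ls /r_drive/                 # your folder may not be listed yet
    cd /r_drive/<your-folder>    # accessing the path triggers the automount
    ls                           # the contents are now visible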

@@ -47,12 +47,11 @@ Your 'home' directories. This is a small amount of storage to store your small b
^^^^^^^^^^^^^^^^^^^^^^^^
What to store in /home
^^^^^^^^^^^^^^^^^^^^^^^^
Here is a rough guide as to what should live in your /home/$FAN directory. In general, you want to store small, miscellaneous files here.

* SLURM Scripts
* Results from Jobs
* 'Small' Data-Sets (<5GB)
* Self-Installed Programs or Libraries

==========
/Scratch
@@ -65,32 +64,18 @@ What to store in /scratch
^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is a rough guide as to what should live in your /scratch/$FAN directory. In general, anything large, bulky and only needed for a little while should go here.

* Job Working Data-sets
* Large Intermediate Files

=========
/Local
=========

.. _SLURM Temporary Directories: ../SLURM/SLURMIntro.html#tmpdir-and-slurm-job-arrays
Local is the per-node, high-speed flash storage that is specific to each node. When running a job, you want to run your data-sets on /local if at all possible - it's the quickest storage location on the HPC. You MUST clean up /local once you are done.

^^^^^^^^^^^^^^^^^^^^^^^^^
What to Store in /local
^^^^^^^^^^^^^^^^^^^^^^^^^

Only *transient files* should live on /local. Anything that your job is currently working on should be on /local. Once your job has finished with these files, they should be copied (or moved) to /scratch. The directory you were working in on /local should then be cleaned, removing all files from your job.

The HPC creates a /local directory for you per job that can be used in your SLURM scripts. This is covered in more detail in `SLURM Temporary Directories`_.
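
A hedged sketch of that staging pattern inside a job script - the exact per-job directory is described in `SLURM Temporary Directories`_, and the paths below are illustrative only:

.. code-block:: bash

    # Stage data onto the fast node-local disk, work there, then copy results back.
    WORKDIR=/local/$SLURM_JOBID            # assumed per-job directory on /local
    mkdir -p "$WORKDIR"
    cp -r /scratch/$USER/my_dataset "$WORKDIR"/
    cd "$WORKDIR"

    # ... run your computation against the local copy ...

    cp -r "$WORKDIR"/results /scratch/$USER/
    rm -rf "$WORKDIR"                      # leave /local clean for the next job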

===========
/RDrive
===========

/RDrive is the location for all RDrive allocations. The HPC will discover and automatically display any RDrive Folders you have access to.

All /RDrive mount points are only surfaced on the Head-Node. The /RDrive is not present on the compute nodes and you cannot use it as a part of your SLURM scripts.

The /RDrive is not a location to perform any computation on, and is limited in access speed. All data that forms part of a dataset for calculations
must be copied to an HPC local mount before you commence work.
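
For example (run on the head node, since /RDrive is not visible to the compute nodes; folder names are placeholders):

.. code-block:: bash

    # Copy input data from your RDrive allocation onto /scratch before submitting jobs.
    cp -r /RDrive/<your-allocation>/input_data /scratch/$USER/input_data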
6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -5,7 +5,7 @@ The new Flinders HPC is called DeepThought. This new HPC comprises AMD EPYC ba

.. _Upgrade Migration Information: migration/upgrademigration.html

.. warning::
   DeepThought has recently undergone a series of upgrades that require some user intervention when utilising the upgraded cluster.
   Please see `Upgrade Migration Information`_ for actions required.

@@ -47,14 +47,14 @@ The following files are provided for integration into your reference manager of
- RIS_


Table of Contents
====================

.. toctree::
:maxdepth: 1
:caption: User Documentation

Access/accessrequest.rst
Access/windows.rst
Access/unix.rst
Storage/storageusage.rst
8 changes: 4 additions & 4 deletions docs/source/policies/fairuse.rst
@@ -17,16 +17,16 @@ For example, if the HPC had 1000 'shares' that represent its resources, the foll
* 100 'shares' for General


.. _Storage Guidelines: ../Storage/storageusage.html
.. _Module System: ../ModuleSystem/LMod.html

Storage Usage Guidelines
============================
See the page at `Storage Guidelines`_ for details on what storage is present and how to use it. A general reminder for HPC Storage:

- All storage is *volatile* and no backup systems are in place
- Clean up your /home and /scratch regularly
- Clean up any /local storage you used at the end of each job. We attempt to do this automatically, but cannot catch everything. A quick way to check your usage is sketched below.
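
A couple of illustrative commands for such a check (the paths assume your FAN matches ``$USER``):

.. code-block:: bash

    du -sh /home/$USER /scratch/$USER        # how much space am I using?
    find /scratch/$USER -type f -mtime +90   # files untouched for roughly three months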


Software Support Guidelines
@@ -61,7 +61,7 @@ As an example, some of the most used programs on the HPC are:
* R
* Python 3.8
* RGDAL
* CUDA 10.1 Toolkit

While not an exhaustive list of the common software, it does allow the team to focus our efforts and provide more in-depth support for these programs.
This means they are usually first to be updated and have a wider range of tooling attached to them by default.
24 changes: 0 additions & 24 deletions docs/source/upgrades/updatelog.rst

This file was deleted.
