# Introduction

This notebook covers the details of connecting to remote systems, as well as transferring files to and from them using a common set of file transfer protocols.

# Remote Connections via SSH

Computing tasks can often be impractical to run on a local machine. Whether the job is as simple as storing files or as demanding as processing data, it can quickly become too much for a local machine to handle.

In this case, there are computing resources available at the university to support bigger computing jobs:
- Storage for handling large data volumes (/shared/)
- Heavy duty machines for processing (Research nodes)
- Large clusters for parallel processing (Viking)

The university filestores are designed for storing and accessing work files. They are available on all university-managed systems and can be mounted on others. All of these filestores are regularly backed up. The available filestores are:
- *Personal* - File store of around 5GB
- *Shared* - Shared among departments, groups, etc., with a 300GB personal quota
  - If needed, you can request a larger personal quota
- *Vault* - Long-term archival storage used for data preservation

A common way of accessing UNIX systems remotely is via secure shell, usually shortened to just

```
ssh
```

which as it turns out, is also the command we will utilise.


To gain remote access from off campus, you first need to log in to the SSH gateway using `ssh -Y USERNAME@ssh.york.ac.uk`.

It is often helpful to set up the following alias to speed up the ssh connection to the machine:

- `alias york="ssh USERNAME@ssh.york.ac.uk"` (bash/zsh syntax; in csh/tcsh the equivalent is `alias york "ssh USERNAME@ssh.york.ac.uk"`)

An example connection command is included in the cell below.

It is best to run this in a terminal rather than in a Jupyter cell, because the command prompts for a password and notebook cells do not handle interactive input well.

In [None]:
!ssh -Y $USER@ssh.york.ac.uk

## A note on $USER

Note that this command assumes the username on your current machine matches your University username!

If you're on a personal laptop, this may not be true. You can check what your current shell thinks your username is via:

In [None]:
!echo $USER

On Google Colab, this is just blank. You can replace `$USER` in the command above with your York username. This is typically three letters followed by three numbers, e.g.:



```
abc123
```
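As a rough sketch of handling a blank `$USER`, the snippet below builds the gateway address from an explicit username when one is given and only falls back to `$USER` otherwise. The function name `york_address` is hypothetical, and `abc123` is just the placeholder username from above, not a real account.

```shell
# Hypothetical helper: print the gateway address for a given username.
# Falls back to $USER when no argument is given, and fails if both are empty.
york_address() {
    local name="${1:-${USER:-}}"
    if [ -z "$name" ]; then
        echo "No username available: pass your York username explicitly" >&2
        return 1
    fi
    printf '%s@ssh.york.ac.uk\n' "$name"
}

# Placeholder username; replace abc123 with your own.
york_address abc123
```

The printed address can then be passed to `ssh` from a terminal, for example `ssh -Y "$(york_address abc123)"`.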



## Connecting to a Specific Machine

After running this command, you will be prompted to enter your password and authenticate with Duo. Once this is complete, the terminal will ask what you want to connect to; this is where you specify the machine to use.

Try connecting to `teaching0` by following the instructions given in the terminal.

Note that upon your first connection to the gateway, you may be asked to verify that you want to connect. Confirm with `yes` (the standard OpenSSH host-key prompt expects the full word rather than just `y`).

Once in teaching0, you will be able to check and edit your personal user files using the commands covered in the previous notebook. An example here is seen below:


## Task 1

Try moving around within your personal user files and create a new directory. Within this directory, create the counting-to-5 shell script covered in the previous notebook, then run it within teaching0. You can use the cell below to write the file and copy it into the terminal, or you can create a new file within the user directory and edit it using a text editor like emacs.

Note that running

```
ssh -Y
```

starts a session with X11 forwarding, so graphical windows can be opened from the remote machine. These *may* be very slow. Typically, opening X-windows via remote connection isn't recommended. If you're using a text editor, try running it without an interactive window if possible. For example



```
emacs -nw
```

will open a new emacs session within your terminal without opening a window. You can exit via:

```
ctrl+X
ctrl+C
```
save via:
```
ctrl+X
ctrl+S
```

# Transferring Files from Remote Filestores


When connecting to a machine via ssh, you may find that you need to transfer files between the local system and the remote system. This may be to feed input/output back and forth for example.

This can be done via many protocols. Some systems allow only certain ones to be used and some are better than others depending on the application needed.

Some common methods/commands include:

- SFTP
- scp
- rsync

All of these are usable on University systems. We will look at how each of them can be used.

## SFTP


SFTP stands for "SSH File Transfer Protocol" (often also expanded as "Secure File Transfer Protocol"). It runs over SSH, so data is encrypted during transmission.

The command `sftp USERNAME@sftp.york.ac.uk` is used to access files. Running it will display the following and prompt for your password and Duo authentication.

Once complete, files can be retrieved using `get file_name` and transferred using `put file_name`. To list the files on the remote system you can use `ls`; to list local files you can use `lls` (or `!dir`, which runs the local `dir` command). An example is shown below:

`cd` can be used to change the working directory of the remote system and `lcd` is used to change the directory of the local system.

## SCP

SCP stands for "Secure Copy Protocol". It encrypts data while it is in transit between your computer and the central file servers.

To transfer a local file to a remote system, the following command layout is used:
- `scp file_name USERNAME@scp.york.ac.uk:/remotePath/directory`

An example is seen below to transfer a local file "transfer.txt" to the personal remote file store of the user:

To transfer a remote file to a local directory, the following command layout is used:
- `scp USERNAME@scp.york.ac.uk:/remotePath/file_name /localPath/directory`

An example of this is seen below, where the file "test.tex" is retrieved from the remote system and stored in a chosen directory, as seen in the scp command:

Note that the general syntax is



```
scp Source_Files Destination
```

where either side can include the hostname. Note that we can use wildcards in file paths/names too. You can also use the `-r` flag to recursively copy directories and their contents, which lets you move whole folders with scp.
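To make the source-then-destination ordering concrete, here is a small sketch that uses `scp` between two local temporary directories, so it runs without a network connection or password. The directory and file names are made up for illustration; in a real transfer one side would carry the `USERNAME@scp.york.ac.uk:` prefix.

```shell
# Set up a throwaway source folder and an empty destination (hypothetical names).
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/results"
touch "$src/results/run1.txt" "$src/results/run2.txt" "$src/notes.md"

if command -v scp >/dev/null 2>&1; then
    # Wildcard: copy only the .txt files from results/ into the destination.
    scp "$src"/results/*.txt "$dest"/

    # -r: copy the whole results folder, including its contents.
    scp -r "$src/results" "$dest"/

    # Show what arrived.
    ls -R "$dest"
else
    echo "scp is not installed on this machine"
fi
```

When the wildcard sits on the remote side, quoting the path (for example `scp 'USERNAME@scp.york.ac.uk:/remotePath/*.txt' .`) is usually safer, since it stops the local shell from trying to expand the pattern itself.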


## Rsync



Rsync stands for "Remote Sync" and is used to transfer and synchronise files between a local and a remote system. For university systems, it uses the same address as sftp: `USERNAME@sftp.york.ac.uk`.

Rsync minimises the amount of data copied by only moving the portions of files that have changed. This results in fewer files being copied and allows users to transfer only the changes in a directory.

As an example, create a directory containing some files to be transferred to the remote system. The example code is below.

In [None]:
%%bash
# Run the cell in bash so that the {1..100} brace expansion works.
mkdir -p rsyncExample
touch rsyncExample/file{1..100}.txt

Now, using `ssh`, connect to a university system and create a directory to transfer these files to using rsync. In this example the directory is called /rsyncExample2. The syntax to synchronise files from one directory to another is seen below:
- `rsync -a localPath/rsyncExample USERNAME@sftp.york.ac.uk:/remotePath`
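One detail worth checking before a real transfer is the trailing slash on the source path. The sketch below runs `rsync` between local temporary directories (the `remotePath` folder here is only a local stand-in for the remote side) to show the difference: without the slash the directory itself is copied, with the slash only its contents are.

```shell
# Local stand-ins for the source directory and the remote path (hypothetical).
work="$(mktemp -d)"
mkdir -p "$work/rsyncExample" "$work/remotePath"
touch "$work/rsyncExample/file1.txt" "$work/rsyncExample/file2.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: creates remotePath/rsyncExample/ and fills it.
    rsync -a "$work/rsyncExample" "$work/remotePath"

    # Trailing slash: copies only the contents into remotePath/rsyncExample2/.
    rsync -a "$work/rsyncExample/" "$work/remotePath/rsyncExample2"

    ls "$work/remotePath"
else
    echo "rsync is not installed on this machine"
fi
```

The same rule applies when the destination is written as `USERNAME@sftp.york.ac.uk:/remotePath`.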


Some useful flags for rsync include:

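- -r                       recurse into directories
- -a                       archive mode: recurse and preserve permissions, timestamps, symlinks, etc.
- -v                       verbose output
- -P                       show progress and keep partially transferred files (same as --partial --progress)
- --max-size=SIZE          don't transfer any file larger than SIZE
- --min-size=SIZE          don't transfer any file smaller than SIZE
- --ignore-existing        skip files that already exist on the receiving side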

As usual, these can be combined:


```
rsync --ignore-existing -ravP
```

Note - SJDK 27/08/25 - Unfortunately I left my notebook on my desk and I can never remember the particular string of args that are good. I'll add this when I'm in the office tomorrow :)

## Wildcards

Wildcards are symbols that stand in for unknown characters when matching names against a pattern. The symbol `*` matches any sequence of characters (including none), so it is used to find full strings containing a specific substring. It is most often used when finding specific file types.

- `*.root` - This will pick up any files with the .root suffix
- `Test*` - This will pick up any files beginning with Test
- `*-01-*` - This will pick up any files containing the string -01-

The symbol `?` can be used as well; it acts as a wildcard for exactly one character. This is often used to find numbered files.

- `file_0?` - This will pick up all files numbered 00 to 09 in a list of files (along with any other name made of `file_0` plus one character).
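A quick way to see what a pattern matches is to try it with `echo` in a scratch directory before using it in a transfer. The file names below are invented for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch="$(mktemp -d)"
cd "$scratch" || exit 1
touch Test_a.root Test_b.txt data-01-raw.root file_00 file_01 file_10

echo *.root      # names ending in .root
echo Test*       # names beginning with Test
echo *-01-*      # names containing -01-
echo file_0?     # file_0 followed by exactly one character
```

Note that `file_10` is not matched by `file_0?`, because the character after the underscore has to be `0`.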

# File Recovery

To guard against data loss or accidental deletion, the University filestores take snapshots of the stored files each time the central filestores are backed up. Files that were deleted up to 90 days ago can be recovered.

Files can only be recovered if they were lost from the M: drive, the H: drive or a Linux home directory.

This notebook contains the guide for Linux and SFTP recovery. For Windows, see [here](https://support.york.ac.uk/s/article/Filestore-Recovering-files-from-backups).

To begin, **cd** to the directory that files need to be recovered from, then type and execute **cd .snapshot**. This sets the current directory to the hidden snapshot folder.

Using **ls** will list all of the available snapshots to recover from. The naming scheme is:

Hourly:
- For the last 24 hours
- Naming style is **hourly.year-month-day-hourmin**

Daily:
- Taken just after midnight
- For the last two weeks
- Naming style is **daily.year-month-day-hourmin**

Search through these directories to find the files needed for recovery. The snapshot directories are read-only, so files must be copied back to your normal user space before they can be used.
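The copy-back step can be rehearsed locally. The sketch below fakes a `.snapshot` folder inside a temporary directory and then restores a file from it. The snapshot name `hourly.2025-01-01-0900` and the file name `results.txt` are assumptions made to fit the naming style above; real snapshot folders are created by the filestore, not by hand.

```shell
# Stand-in for a backed-up directory, with a hand-made fake snapshot inside it.
home_dir="$(mktemp -d)"
snap="$home_dir/.snapshot/hourly.2025-01-01-0900"   # hypothetical snapshot name
mkdir -p "$snap"
echo "important results" > "$snap/results.txt"      # the copy kept in the snapshot

cd "$home_dir" || exit 1
ls .snapshot        # list the available snapshots

# Copy the file back into the working directory; -p keeps timestamps and permissions.
cp -p ".snapshot/hourly.2025-01-01-0900/results.txt" .
cat results.txt
```

On the real filestore only the `cd`, `ls` and `cp` steps are needed, since the snapshot folders already exist.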


Beyond two weeks, the snapshots are found on the backup server at /shared/backups, and at /shared/backups/userfs for home directories. This contains last night's backups. To access older backups, you will need to enter the hidden .snapshots directory as before.

### Task 2

Delete a file from your University workspace that has been there for at least an hour and is in a backed-up folder. (Don't delete anything important, just in case any issues occur.) Then find a past snapshot and recover the file you deleted by copying it from the snapshot directory to its original storage location.

# Summary Task

As a final exercise, create a shell script that prints the sum of the Fibonacci sequence (0, 1, 1, 2, 3, 5, etc.) up to the nth term, where n = 10, 100, 1000. Create this script locally, either in this notebook or using a text editor.

After this, using one of the file transfer protocols above, transfer this to the university system and execute the script using an ssh connection.