This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

The swapfile and backup tool combine to completely fill the disk of Datalab VMs #1192

Closed
ojarjur opened this issue Feb 15, 2017 · 7 comments · Fixed by #1193

@ojarjur
Contributor

ojarjur commented Feb 15, 2017

There is a new issue with Datalab instances that have a machine type with lots of RAM (e.g. n1-standard-4).

The symptom is that the VM can become completely unresponsive. Anyone digging into the issue would find that the boot disk was out of free space because files under /var/lib/docker/overlay were taking up all of the available space.

The root issue is that the new version of the CLI creates a swapfile on the persistent disk used for storing notebooks. The size of that swapfile is based on the amount of RAM in the host (so this doesn't affect the default machine type).

The backup utility tries to back up all of the contents of the notebook persistent disk, swapfile included, and as a result winds up copying them into the /tmp directory (potentially multiple times, if the hourly, daily, and weekly backups are all running at the same time).

The result is that two issues combine to fill up the boot disk:

  1. The swapfile being included in the backup target
  2. The backup being written to the boot disk rather than the notebook persistent disk
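To see why concurrent backups are the tipping point, here is a minimal Python sketch of the worst-case arithmetic. It assumes each running backup stages one full, uncompressed copy of /content in /tmp; the sizes in the example are hypothetical, and the only figure taken from this thread is the 20 GB boot disk, treated here as 20 GiB of free space (an optimistic assumption).

```python
GIB = 1024 ** 3


def worst_case_tmp_bytes(content_bytes: int, concurrent_backups: int) -> int:
    """Bytes staged in /tmp if every running backup copies all of /content.

    Assumption (not confirmed by the issue): each backup writes one full,
    uncompressed copy of the backup source before cleaning it up.
    """
    return content_bytes * concurrent_backups


def fits_on_boot_disk(content_bytes: int, concurrent_backups: int,
                      boot_free_bytes: int) -> bool:
    """True if the worst-case /tmp usage fits in the boot disk's free space."""
    return worst_case_tmp_bytes(content_bytes, concurrent_backups) <= boot_free_bytes


if __name__ == "__main__":
    # Hypothetical sizes: an 8 GiB swapfile plus 1 GiB of notebooks, with the
    # hourly, daily, and weekly backups all running at once.
    content = 8 * GIB + 1 * GIB
    needed = worst_case_tmp_bytes(content, concurrent_backups=3)
    print(f"worst case: {needed / GIB:.0f} GiB staged in /tmp")
    print("fits in 20 GiB:", fits_on_boot_disk(content, 3, 20 * GIB))
```

With these made-up numbers the worst case is 27 GiB, more than the whole boot disk, even though a single backup (9 GiB) would have fit.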

For now, the simplest workaround is to disable backups when creating the VM by passing the --no-backups flag.

I think the fix should be to change the volume mounts from the host to the Datalab Docker container such that we have the following mapping:

/mnt/disks/datalab-pd/datalab => /content/datalab
/mnt/disks/datalab-pd/tmp => /tmp
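The proposed mapping amounts to two bind mounts on the container. Below is a minimal Python sketch that builds the corresponding `docker run` argument list using Docker's standard `-v host_path:container_path` syntax. The function names and the image name are hypothetical placeholders, and the Datalab CLI may well start the container differently; the point is only that the persistent disk root is never mounted, so nothing else at its top level is visible inside the container.

```python
import shlex

# Root of the notebook persistent disk on the host, as given in the issue.
PD_ROOT = "/mnt/disks/datalab-pd"

# Proposed host -> container mapping. Only these two subdirectories are
# exposed; the rest of the PD (including the swapfile) is not mounted.
VOLUME_MAP = {
    f"{PD_ROOT}/datalab": "/content/datalab",
    f"{PD_ROOT}/tmp": "/tmp",
}


def volume_args(volume_map: dict[str, str]) -> list[str]:
    """Turn a host->container mapping into repeated `-v host:container` flags."""
    args: list[str] = []
    for host_path, container_path in volume_map.items():
        args += ["-v", f"{host_path}:{container_path}"]
    return args


def docker_run_argv(image: str, volume_map: dict[str, str]) -> list[str]:
    """Build a `docker run` argv; `image` is a caller-supplied placeholder."""
    return ["docker", "run", *volume_args(volume_map), image]


if __name__ == "__main__":
    argv = docker_run_argv("hypothetical-datalab-image", VOLUME_MAP)
    # Print a shell-ready command line instead of executing it.
    print(shlex.join(argv))
```

Mounting subdirectories rather than the disk root is what keeps both the swapfile and lost+found out of the container, while /tmp inside the container lands on the larger persistent disk.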

This would give two benefits:

  1. The swapfile would not even be exposed to the Docker container, so it will not get backed up.
  2. The backups would be stored on the notebook persistent disk, which is much larger and does not break the VM if it gets filled.

This would also have the benefit that the lost+found directory no longer shows up in the Datalab file listing.

@ojarjur ojarjur self-assigned this Feb 15, 2017
ojarjur added a commit that referenced this issue Feb 15, 2017
This change alters the way that the persistent disk used for
storing notebooks is exposed to the Datalab container.

Rather than the entire disk being mounted by the container at
`/content`, we just mount the `datalab` subdirectory at
`/content/datalab`.

That change allows us to create a `tmp` subdirectory, and mount
that into the container at `/tmp`. That, in turn, prevents temp
files created in the container from filling up the VM's boot disk.

This fixes #1192
@nikhilk
Contributor

nikhilk commented Feb 15, 2017 via email

@ojarjur
Contributor Author

ojarjur commented Feb 16, 2017

@nikhilk

Are we just backing up the git repo? Or something more?

We are backing up the /content directory.

I wouldn't have expected a swap file in the backup adding to the size...

It was a bug for the swapfile to be included in the backup, but the more important bug was the /tmp directory being on the boot disk, which is only 20 GB.

@nikhilk
Contributor

nikhilk commented Feb 16, 2017 via email

@Di-Ku
Contributor

Di-Ku commented Feb 16, 2017

+1 for backing up only the git repo directory.
Data files on the PD, while discouraged in favor of GCS, are going to get used. The PD also makes sense for temp files, etc.

@yebrahim
Contributor

yebrahim commented Feb 16, 2017 via email

@ramurti

ramurti commented Nov 21, 2017

I had a similar problem today. In my case, the additional disk was out of space (not the boot one). The disk size was 10GB, and there was a 9.7GB swapfile on it.

The symptom was also that the VM was completely unresponsive.

Deleting the file fixed the problem. Before deleting it, however, I tried simply increasing the disk size. I managed to grow the disk from 10GB to 40GB, but when I run "df" on my VM, it shows only 10GB. I'm now wondering whether Google is charging me for a 40GB disk while giving me only 10GB. I tried resetting the VM, but it still didn't recognize the 40GB. Do you know what I am missing?

Thanks

@ojarjur
Contributor Author

ojarjur commented Nov 21, 2017

@ramurti resizing the disk doesn't resize the file system on it; you have to do that manually.

If you don't, then you'll wind up with a 40GB disk that has a 10GB filesystem on it.

Instructions for doing that are included in the "Resizing the file system or partitions on a persistent disk" section of this page.
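A quick way to confirm this situation is to compare the size the filesystem reports with the size you gave the disk. Here is a minimal Python sketch using `shutil.disk_usage`, whose total comes from the same statvfs data that `df` reports on Linux. The function names and the 0.9 tolerance for filesystem overhead are assumptions, and the sketch only detects the mismatch; the resize itself is covered by the instructions referenced above.

```python
import shutil

GIB = 2 ** 30


def filesystem_size_gib(path: str) -> float:
    """Total size of the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).total / GIB


def filesystem_needs_resize(path: str, disk_size_gib: float,
                            tolerance: float = 0.9) -> bool:
    """True if the filesystem is noticeably smaller than the disk under it.

    `tolerance` (an assumed value) leaves room for filesystem metadata, so a
    filesystem that already spans the whole disk is not flagged.
    """
    return filesystem_size_gib(path) < disk_size_gib * tolerance
```

For example, on the host VM, `filesystem_needs_resize("/mnt/disks/datalab-pd", 40)` would be expected to return `True` after growing the disk to 40GB but before resizing the 10GB filesystem, assuming the notebook disk is mounted at that path as the mapping earlier in this issue suggests.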
