# Understanding the Underlying Virtual Machine

We have already seen that we can issue commands to the underlying system from our notebooks.  The prologue for our notebooks makes use of that functionality in order to import data into the virtual machine (VM).  We can use this functionality to investigate the underlying system further.


# Processor Architecture and OS

First, let's find out a bit more about the hardware and the OS we are running on. Spoiler: we are running on Linux, which provides a command called `uname` for probing the hardware and OS versions.

In [1]:
!echo "operating system name"; uname -s 
!echo "operating system release"; uname -r
!echo "operating system version"; uname -v

operating system name
Linux
operating system release
5.4.104+
operating system version
#1 SMP Sat Jun 5 09:50:34 PDT 2021


The `echo` command prints the given string to the terminal. From the output of the three commands above we learn that we are indeed running on Linux, with kernel release 5.4.104. At the time of writing, this deployed kernel lagged about six months behind the most recent kernel release. Finally, we see the OS version string.
Here, `SMP` means the kernel was built with *symmetric multi-processor support*, and `#1` indicates that the kernel is the result of the first build from the kernel source on the machine where it was built. If it had been tweaked in some way and rebuilt, it would show `#2`. The remainder of the string is the date and time of that build.

([Source](https://stackoverflow.com/questions/40916064/how-do-i-know-what-linux-kernel-version-does-a-distribution-use))
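The same three fields can also be read directly from Python, without going through the shell. The sketch below uses the standard-library `os.uname()`, which is only available on Unix-like systems. It also adds a small hypothetical helper, `parse_kernel_version`, that splits a version string like the one above into its build number and its SMP flag. The parsing rule (a leading `#N`, followed by flags such as `SMP`) is an assumption based on the output shown here, not a documented format.

```python
import os
import re

def parse_kernel_version(version: str) -> dict:
    """Split a `uname -v` string such as '#1 SMP Sat Jun 5 ...'.

    Hypothetical helper: assumes the string starts with '#<build number>'
    and that the token 'SMP' appears when the kernel has SMP support.
    """
    match = re.match(r"#(\d+)", version)
    build = int(match.group(1)) if match else None
    return {"build": build, "smp": "SMP" in version.split()}

if hasattr(os, "uname"):  # os.uname() only exists on Unix-like systems
    info = os.uname()
    print("operating system name:   ", info.sysname)   # like `uname -s`
    print("operating system release:", info.release)   # like `uname -r`
    print("operating system version:", info.version)   # like `uname -v`
    print(parse_kernel_version(info.version))
```

For the version string shown above, `parse_kernel_version("#1 SMP Sat Jun 5 09:50:34 PDT 2021")` returns `{'build': 1, 'smp': True}`.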

Next, we'll look at the hardware,

In [2]:
!echo "machine hardware name"; uname -m
!echo "nodename"; uname -n 
!echo "processor architecture name"; uname -p

machine hardware name
x86_64
nodename
ee19fe26a545
processor architecture name
x86_64


Here we see that we are running on a 64-bit x86 processor (`x86_64` is the architecture used by both Intel and AMD chips). The node name is the VM's network hostname. Here it is an automatically generated identifier with no real impact on our work.
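Python's standard `platform` module exposes the hardware fields as well. The sketch below gathers them into one dictionary:

- `platform.machine()` corresponds to `uname -m`.
- `platform.node()` corresponds to `uname -n`.
- `platform.architecture()` reports the bit width of the Python interpreter binary.

`platform.processor()` can return an empty string on some Linux systems, so treat that field as best-effort.

```python
import os
import platform

def hardware_summary() -> dict:
    """Collect the hardware-related fields that `uname` reports."""
    bits, _linkage = platform.architecture()  # e.g. ('64bit', 'ELF') on Linux
    return {
        "machine": platform.machine(),      # like `uname -m`, e.g. 'x86_64'
        "node": platform.node(),            # like `uname -n` (the hostname)
        "processor": platform.processor(),  # may be '' on some Linux systems
        "interpreter_bits": bits,           # bit width of this Python build
        "cpu_count": os.cpu_count(),        # logical CPUs in the system (may be None)
    }

for key, value in hardware_summary().items():
    print(f"{key}: {value}")
```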

# The File System

The file system is where the OS stores its own data and yours. Knowing your way around it will help you find out where your notebook saved something, or locate a file you want to load into your notebook.

Let's start at the beginning. Where in the file system does our notebook look for files by default? We can find this out by issuing the `pwd` (print working directory) command to the underlying Linux system,

In [3]:
!pwd

/content


OK, the working directory of our notebook is a folder called `/content`, so relative file paths are resolved against that folder. As an aside, the name `/` refers to the *root* folder in Linux. It contains all vital OS-related information as well as every other folder. We can take a peek,

In [4]:
!cd ..; ls

bin	 datalab  home	 lib64	opt	    root  srv		     tmp    var
boot	 dev	  lib	 media	proc	    run   sys		     tools
content  etc	  lib32  mnt	python-apt  sbin  tensorflow-1.15.2  usr


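The command `cd ..` (change directory) means go up one level, and `ls` prints a listing of the files in that directory. Each `!` line runs in its own shell, so `cd` affects only the commands on that same line. The notebook itself stays in `/content`, which is why the cells below repeat `cd` before each command. In the listing we see lots of folders with ominous names like `sys`, `boot`, and `root`. It is best to leave all that alone. Of course, we can also see our `content` folder in that list.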
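The same exploration can be done from Python with the standard `os` module. This is a minimal sketch. Note that `os.chdir` changes the working directory of the notebook's Python process itself, so the change persists across cells. The sketch therefore restores the original directory afterwards.

```python
import os

def list_parent_directory() -> list:
    """Return the sorted entries of the directory above the current one."""
    start = os.getcwd()                 # like `pwd`; '/content' in Colab
    try:
        os.chdir("..")                  # like `cd ..`, but persists in this process
        return sorted(os.listdir("."))  # like `ls`, but hidden entries are included too
    finally:
        os.chdir(start)                 # go back so later cells are unaffected

print(os.getcwd())
print(list_parent_directory())
```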

We can check whether there are already files in our `content` folder,

In [5]:
!ls

sample_data


Sure enough, there is a folder called `sample_data`. Let's take a peek at what's in there,

In [6]:
!cd sample_data; ls

anscombe.json		      mnist_test.csv
california_housing_test.csv   mnist_train_small.csv
california_housing_train.csv  README.md


We can see a bunch of CSV data files and a JSON file that we could play around with. We also see a `README.md` file. To find out what it says, we can print it with the `cat` command,

In [7]:
!cd sample_data; cat README.md

This directory includes a few sample datasets to get you started.

*   `california_housing_data*.csv` is California housing data from the 1990 US
    Census; more information is available at:
    https://developers.google.com/machine-learning/crash-course/california-housing-data-description

*   `mnist_*.csv` is a small sample of the
    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
    described at: http://yann.lecun.com/exdb/mnist/

*   `anscombe.json` contains a copy of
    [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet); it
    was originally described in

    Anscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American
    Statistician. 27 (1): 17-21. JSTOR 2682899.

    and our copy was prepared by the
    [vega_datasets library](https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/anscombe.json).
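The two shell commands above, listing `sample_data` and printing its `README.md`, have direct counterparts in Python's `pathlib` module. This is a minimal sketch. The relative path `sample_data` assumes the working directory is `/content`, as it is in Colab, because relative paths are resolved against the working directory.

```python
from pathlib import Path

def list_files(folder: Path) -> list:
    """Return the names of the entries in `folder` sorted by name, like `ls`."""
    return sorted(p.name for p in folder.iterdir())

def read_text_file(path: Path) -> str:
    """Return the contents of a text file, like `cat`."""
    return path.read_text(encoding="utf-8")

sample = Path("sample_data")  # relative to the current working directory
if sample.is_dir():
    print(list_files(sample))
    print(read_text_file(sample / "README.md"))
```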


The datasets described in the README look interesting to explore. However, what we want to do here is to understand in more detail how we import the data we have been working with in all the previous chapters. Our prologue looks like this,

In [8]:
###### Set Up #####
# verify our folder with the data and module assets is installed
# if it is installed make sure it is the latest
!test -e ds-assets && cd ds-assets && git pull && cd ..
# if it is not installed clone it 
!test ! -e ds-assets && git clone https://github.com/lutzhamel/ds-assets.git
# point to the folder with the assets
home = "ds-assets/assets/" 
import sys
sys.path.append(home)      # add home folder to module search path

Cloning into 'ds-assets'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (164/164), done.
remote: Compressing objects: 100% (143/143), done.
remote: Total 164 (delta 60), reused 117 (delta 20), pack-reused 0
Receiving objects: 100% (164/164), 7.40 MiB | 29.94 MiB/s, done.
Resolving deltas: 100% (60/60), done.
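The two `test -e` lines in the prologue implement a simple rule:

- If the `ds-assets` folder already exists, pull the latest changes.
- Otherwise, clone the repository.

The sketch below expresses that rule in Python with a hypothetical helper, `git_sync_command`. The helper only builds the command. It uses git's `-C <dir>` option to run `git pull` inside the folder instead of changing directories.

```python
import os

REPO_URL = "https://github.com/lutzhamel/ds-assets.git"
ASSETS_DIR = "ds-assets"

def git_sync_command(dest: str, url: str) -> list:
    """Hypothetical helper: return the git command that brings `dest` up to date."""
    if os.path.exists(dest):
        return ["git", "-C", dest, "pull"]  # folder exists: fetch the latest version
    return ["git", "clone", url, dest]       # folder missing: make a fresh copy

print(" ".join(git_sync_command(ASSETS_DIR, REPO_URL)))
```

To actually run the command in a notebook cell, you could pass the list to `subprocess.run(..., check=True)`. This assumes `git` is installed and the network is reachable, as they are in Colab.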


If we now run `ls` in our `content` directory,

In [9]:
!ls

ds-assets  sample_data


We can see that the prologue created an additional folder here: `ds-assets`.  This folder contains all the data and Python modules we have been working with in the previous chapters,

In [10]:
!cd ds-assets/assets; ls

2fold-xval.png		   mlp_regression2.py
5fold-xval.png		   mlp_regression.py
abalone.csv		   model-performance-curves.png
bootstrap.py		   newsgroups.csv
caesarian.csv		   newsgroups-noheaders.csv
cars.csv		   PandasPythonForDataScience.jpg
classification1.jpg	   PandasPythonForDataScience.pdf
classification2.jpg	   pdf-badge.png
classification3.jpg	   perceptron-eq.jpg
colab-badge.afdesign	   perceptron.jpg
colab-icon.afdesign	   perceptron.r
colab-icon.png		   perceptron-search.png
confint.py		   perceptron-train.jpg
confusion1.png		   pipeline.png
confusion2.png		   regression1.jpg
crohnd.csv		   rs.png
cross-validated-curve.png  shuttle.csv
data-science.jpg	   shuttle.pdf
divorce.csv		   sobar-72.csv
divorce-readme.txt	   swans.jpg
elbow.py		   tennis.csv
github-icon.png		   tennis_numeric.csv
google_drive.py		   training-curves.jpg
grid-stability.csv	   train-test-curves.png
helloagain.py		   train-test-data.png
helloworld.py		   tree-model.png
iris.csv		   tree_regression2.py
kmean

By the way, you can also explore all this visually by clicking the folder icon in the vertical navigation bar on the left.
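The last line of the prologue, `sys.path.append(home)`, is what makes the `.py` files in `ds-assets/assets` importable. When Python executes an `import` statement, it searches the folders listed in `sys.path`. The self-contained sketch below demonstrates this mechanism with a throwaway folder and a hypothetical module named `greeting_demo`, so it does not depend on the contents of `ds-assets`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary folder holding a tiny module (hypothetical example).
folder = tempfile.mkdtemp()
Path(folder, "greeting_demo.py").write_text("MESSAGE = 'hello from greeting_demo'\n")

sys.path.append(folder)          # same step as sys.path.append(home) in the prologue
importlib.invalidate_caches()    # recommended when module files are created at run time
greeting_demo = importlib.import_module("greeting_demo")
print(greeting_demo.MESSAGE)
```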

# Mounting your Google Drive

In the first chapters we talked about accessing files on Google Drive via share links. There is another way of getting files from your Google Drive into the Colab VM: we can mount Google Drive into the VM's file system.

Here is the code that will accomplish this,

In [12]:
mount_point = '/content/drive'
from google.colab import drive
drive.mount(mount_point)

Mounted at /content/drive


For the mount point we have to pick a directory that is empty or does not exist yet. Here, `/content/drive` fits the bill.
Once we have mounted the Google Drive we can look at its contents,

In [13]:
!ls /content/drive

MyDrive  Shareddrives


We can see that two directories have been created at our mount point: `MyDrive` and `Shareddrives`. I have a directory called `Example-Directory` in `MyDrive`, which in turn contains the file `iris-local.csv`,

In [14]:
!ls /content/drive/MyDrive/Example-Directory/

iris-local.csv


That means we can load this file from that directory into a Pandas dataframe,

In [18]:
import pandas as pd
path = '/content/drive/MyDrive/Example-Directory/'
df = pd.read_csv(path+'iris-local.csv')
df.head(n=10)

Unnamed: 0,id,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,1,5.1,3.5,1.4,0.2,setosa
1,2,4.9,3.0,1.4,0.2,setosa
2,3,4.7,3.2,1.3,0.2,setosa
3,4,4.6,3.1,1.5,0.2,setosa
4,5,5.0,3.6,1.4,0.2,setosa
5,6,5.4,3.9,1.7,0.4,setosa
6,7,4.6,3.4,1.4,0.3,setosa
7,8,5.0,3.4,1.5,0.2,setosa
8,9,4.4,2.9,1.4,0.2,setosa
9,10,4.9,3.1,1.5,0.1,setosa


As expected, the file contains the iris flower data. Of course, we could also create a dataframe and write it to the drive using the dataframe's `to_csv` method.
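Writing works the same way in reverse. Pandas gives the name `Unnamed: 0` to a CSV column that has no header, so the column in the output above most likely means the file was saved together with its row index. The sketch below writes a small dataframe with `to_csv(..., index=False)` and reads it back. A temporary folder stands in for a Drive path such as `/content/drive/MyDrive/Example-Directory/`, so the code runs without a mounted drive. The file name `example-output.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Stand-in for a Drive folder such as '/content/drive/MyDrive/Example-Directory/'.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example-output.csv")  # hypothetical file name

df_out = pd.DataFrame({"Sepal.Length": [5.1, 4.9], "Species": ["setosa", "setosa"]})
df_out.to_csv(path, index=False)  # index=False: do not store the row index as a column

df_in = pd.read_csv(path)
print(df_in.columns.tolist())     # no 'Unnamed: 0' column this time
```

With the default `index=True`, the saved index would come back as an `Unnamed: 0` column. Alternatively, reading such a file with `pd.read_csv(path, index_col=0)` turns that column back into the index.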