# Loading Data into Colab

In this final notebook, we'll go over a number of ways to load CSV data into Colab. This is the one area where Colab is unfortunately more difficult to use than running Python on your own computer.

This difficulty stems from the fact that we need to make the data we are loading available to our Colab Runtime (the virtual machine that is running our Python code). Because the Colab Runtime is an anonymous virtual machine in the cloud, it has the same permissions to access your files as does any anonymous random person on the internet (hopefully none!). We will need to jump through some hoops to either directly provide the data to the Colab Runtime or to make our data accessible to it.

I've listed several ways to load data into Colab below. For any assignments, I will ask that you use Method 1 because that makes it easy for me to run your code myself.

# Method 1: Loading from Google Drive by Link Sharing

The easiest method for loading data into a Colab notebook is to upload it to Google Drive, make it shareable with anyone who has the link, and then use a version of that link in Colab.

The downside of this method is that anyone on the internet who has that link can now access that data.

## Steps

* Upload your data file to Google Drive.
* Enable link access to everyone in the world(!).
  * Select the file that you uploaded and click "Get Sharable Link".
  * Change the access mode from "Restricted" to "Anyone with the link".
* Copy the link.
  * This is a link that opens the file in Google Drive.
  * We want a link that downloads the file directly instead.
* Extract the file ID.
  * The copied link will have the form ``https://drive.google.com/file/d/XXX/view?usp=sharing``.
  * The ``XXX`` part is the Google Drive File ID.
* Create a download link.
  * Replace the ``XXX`` in ``https://drive.google.com/uc?export=download&id=XXX`` with the file ID.
  * This is your download link.
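
The steps above can also be done in code. The following is a minimal sketch, assuming the share link has the ``/file/d/XXX/`` form shown in the steps; the function name ``drive_download_url`` is hypothetical and is not part of any library.

```python
import re

# Matches the file ID in a Drive share link of the form
# https://drive.google.com/file/d/XXX/view?usp=sharing
_FILE_ID_PATTERN = re.compile(r"drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def drive_download_url(share_link: str) -> str:
    """Turn a Drive share link into a direct-download link (hypothetical helper)."""
    match = _FILE_ID_PATTERN.search(share_link)
    if match is None:
        raise ValueError(f"No Drive file ID found in: {share_link!r}")
    file_id = match.group(1)
    return f"https://drive.google.com/uc?export=download&id={file_id}"


# The placeholder ID "XXX" stands in for a real file ID.
print(drive_download_url("https://drive.google.com/file/d/XXX/view?usp=sharing"))
```

The returned string can be passed to ``read_csv`` in place of a hand-edited link.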

## Example

As a concrete example, the following is a link to open a test CSV file in Google Drive (the link that you get from clicking "Copy link").
```
    https://drive.google.com/file/d/1GpMDgogFAnCDpw0iRMTbUUfbWZTu6kzB/view?usp=sharing
```
The file ID for this file is
```
    1GpMDgogFAnCDpw0iRMTbUUfbWZTu6kzB
```
The direct download link is therefore 
```
    https://drive.google.com/uc?export=download&id=1GpMDgogFAnCDpw0iRMTbUUfbWZTu6kzB
```

Now, we can directly load this file into ``pandas`` using ``read_csv``.

In [1]:
import pandas as pd

# read_csv accepts a URL, so the direct download link can be passed as-is.
df = pd.read_csv('https://drive.google.com/uc?export=download&id=1GpMDgogFAnCDpw0iRMTbUUfbWZTu6kzB')
df

Unnamed: 0,BLUE,ORANGE
0,8.7,10.66
1,8.9055,11.0828
2,8.7113,10.71
3,8.4346,11.5907
4,8.7254,12.107
5,9.0551,11.7876
6,8.9514,11.2078
7,9.2439,12.5192
8,9.1276,13.3624
9,9.3976,14.408


# Method 2: Uploading to the Colab Runtime's filesystem

Remember that the Colab Runtime is a virtual machine in the cloud. You can store files on it just like any other computer.

This is the most secure way to share files with Colab (you're only sharing the files directly with your runtime). It can also be annoying because you will have to re-upload your files every time your runtime restarts.

To upload a file:
* Click the folder icon on the left panel in Colab.
* Click the upload icon, and select a file.
* Click the folder-refresh icon next to the upload icon, and the file you uploaded should be visible.
* Hover over the file, click the 3-dot menu, and select "Copy path".

Now you can load the file by pasting its path as a string into ``read_csv`` as below.

In [3]:
df = pd.read_csv('sample_prices.csv')
df

Unnamed: 0,BLUE,ORANGE
0,8.7,10.66
1,8.9055,11.0828
2,8.7113,10.71
3,8.4346,11.5907
4,8.7254,12.107
5,9.0551,11.7876
6,8.9514,11.2078
7,9.2439,12.5192
8,9.1276,13.3624
9,9.3976,14.408
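
Because uploaded files disappear when the runtime restarts, it can help to check what is actually on the runtime's filesystem before calling ``read_csv``. Here is a small sketch; the helper ``list_csv_files`` is hypothetical, and the default folder ``/content`` is an assumption about where Colab puts uploaded files.

```python
from pathlib import Path


def list_csv_files(root="/content"):
    """Return the sorted paths of the CSV files directly under root.

    Hypothetical helper. The default "/content" is an assumption about
    where uploaded files land in a Colab runtime.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to list, e.g. when running outside Colab.
        return []
    return sorted(str(path) for path in root_path.glob("*.csv"))


# Prints an empty list if nothing has been uploaded yet.
print(list_csv_files())
```

If the file you expect is missing from the list, the runtime has probably restarted and the file needs to be uploaded again.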


# Method 3: Mounting your Google Drive

With this method, you are giving the Colab Runtime (that machine in the cloud) permission to add, edit, or delete files in your entire Google Drive. Your Google Drive files will appear as part of the Colab Runtime's filesystem (like the things you uploaded).

This will only give access to your personal drive -- it will not give access to any files that have been shared with you.

**Security Warning:** Only do this when you know and trust all of the code in the Colab Notebook that you are running. For example, a malicious person could write a Colab Notebook that asks for permission to access your Google Drive and then deletes all of the files in it (or searches through them for passwords, credit card numbers, etc.).

With that warning out of the way, this can be very convenient to use if you are writing your own code. To do this:

* Click the folder icon on the left side of your Colab window.
* Click the Google Drive icon.
* Select "Connect to Google Drive" at the permission request that appears (if asked).
* Press the folder-refresh icon.
* You will now see a new directory, ``drive``. You can copy the path of files as you did in the last example.
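
The same mount can be requested from code instead of the file panel. The sketch below assumes the ``google.colab`` package (only available inside a Colab runtime), the conventional mount point ``/content/drive``, and that your personal files appear under a ``MyDrive`` folder inside it. The helper names and the example path ``data/sample_prices.csv`` are hypothetical.

```python
import posixpath

# Assumption: the conventional mount point for Drive in a Colab runtime.
MOUNT_POINT = "/content/drive"


def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return False
    # Triggers the same permission prompt as the Google Drive icon.
    drive.mount(mount_point)
    return True


def drive_path(relative_path, mount_point=MOUNT_POINT):
    """Build the runtime path of a file stored in your personal Drive."""
    # Assumption: personal files appear under "MyDrive" inside the mount point.
    return posixpath.join(mount_point, "MyDrive", relative_path)


# Hypothetical file location inside your Drive.
print(drive_path("data/sample_prices.csv"))
```

Inside Colab, you would call ``mount_drive()`` once and then pass the result of ``drive_path(...)`` to ``read_csv``. The security warning above applies equally to mounting from code.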

If you're wondering where the terminology comes from, you're not the only one! I [looked it up](https://english.stackexchange.com/questions/335105/etymology-of-the-use-of-drive-to-refer-to-a-digital-storage-medium). Drive refers to the spools that would spin magnetic tapes that were used for storage (similar to drivetrain in a car). Mounting something referred to mounting a given tape reel onto the drive (like mounting a frame on the wall).


# Method 4: Accessing Google Sheets directly using gspread

One thing that you may want to try is to open a Google Sheets spreadsheet directly. If you try this using the above methods (using the Google Drive File ID, or a mounted Google Drive), you will quickly find that they don't work!

Google provides [this example code](https://colab.research.google.com/notebooks/snippets/sheets.ipynb#scrollTo=k9q0pp33dckN) to load a Google Sheets spreadsheet into Colab. The mechanism is clunky, and is probably not better than just exporting the Google Sheet as a CSV and then uploading that to either Drive or Colab.
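
For reference, here is a rough sketch of that mechanism. It assumes the ``google.colab``, ``google.auth``, and ``gspread`` packages that Colab provides, so ``load_sheet_rows`` only works inside a Colab runtime. Both function names are hypothetical, and the spreadsheet name you pass in is your own.

```python
def load_sheet_rows(spreadsheet_name):
    """Fetch every row of the first worksheet (works only inside Colab)."""
    # Imported here so the rest of this cell still runs outside Colab.
    from google.colab import auth
    from google.auth import default
    import gspread

    auth.authenticate_user()  # prompts you to authorize access
    creds, _ = default()
    client = gspread.authorize(creds)
    worksheet = client.open(spreadsheet_name).sheet1
    return worksheet.get_all_values()  # a list of rows, each a list of strings


def rows_to_records(rows):
    """Treat the first row as the header and return one dict per data row."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Stand-in for the rows a sheet would return; note the values are strings.
example_rows = [["BLUE", "ORANGE"], ["8.7", "10.66"], ["8.9055", "11.0828"]]
print(rows_to_records(example_rows))
```

The records can be passed to ``pd.DataFrame`` to get a table like the ones above, although the columns hold strings and still need converting to numbers.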