# File Import/Export in Google Colab

## Uploading files from your local file system

`files.upload` returns a dictionary of the files that were uploaded.
Each key is a file name, and each value is the uploaded contents of that file as bytes.

In [1]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving electric-car-sales.csv to electric-car-sales.csv
User uploaded file "electric-car-sales.csv" with length 10144 bytes
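
Because each value in the returned dictionary is raw bytes, the contents usually need to be decoded before use. The following is a minimal sketch, assuming a UTF-8 encoded CSV file; the helper name `rows_from_upload` and the stand-in dictionary are made up for illustration and are not part of `google.colab`.

```python
import csv
import io


def rows_from_upload(uploaded, name, encoding="utf-8"):
    """Decode one uploaded file's bytes and parse it as CSV rows (lists of strings)."""
    text = uploaded[name].decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Stand-in for the dictionary returned by files.upload(); the contents are made up.
uploaded = {"example.csv": b"year,sales\n2020,10\n2021,15\n"}

rows = rows_from_upload(uploaded, "example.csv")
print(rows[0])                     # header row
print(len(rows) - 1, "data rows")  # number of rows after the header
```

In a real session you would pass the dictionary returned by `files.upload()` instead of the stand-in.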


In [2]:
ls

electric-car-sales.csv  sample_data/


Please note that files uploaded to Google Colab do not persist after the session is terminated. (In general, free Colab notebooks can run for at most 12 hours, depending on availability and your usage patterns.) Use uploads with caution and do not rely on them for long-term storage.

## Downloading files to your local file system

`files.download` will invoke a browser download of the file to the user's local computer.

In [3]:
from google.colab import files

files.download('electric-car-sales.csv')

<IPython.core.display.Javascript object>
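
A common pattern is to create a file in the notebook and then download it. The sketch below assumes a hypothetical output file named `results_example.csv` with made-up contents; the `google.colab` import is wrapped in `try`/`except` only so that the cell also runs outside Colab.

```python
import csv
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header and rows to a CSV file and return its path."""
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Hypothetical file name and data, used only for illustration.
out_path = write_csv("results_example.csv", ["model", "sales"], [["A", 10], ["B", 15]])

try:
    from google.colab import files  # only available inside Colab
    files.download(str(out_path))   # triggers the browser download
except ImportError:
    print(f"Not running in Colab; file saved locally at {out_path.resolve()}")
```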

## Removing a file from the Colab container

We can use the Linux command `rm` to remove the file we just uploaded:

In [4]:
rm electric-car-sales.csv

We can check that the file was removed by listing the files in the current directory (`/content`) with the Linux command `ls`:

In [5]:
ls

[0m[01;34msample_data[0m/


## Manual upload/download

To upload a file manually, click the "Files" icon on the top left of your Colab notebook, then click "Upload".

To download a file manually, right-click it in the file list.

**Note:** The working directory for Colab notebooks is `/content`, and we will only use that folder to avoid breaking anything.

The Linux command `pwd` (`p`rint `w`orking `d`irectory) shows the current working directory:

In [6]:
pwd

'/content'
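
The working directory can also be read from Python, which makes it clear how relative paths such as `sample_data` are interpreted. A small sketch, assuming nothing beyond the standard library:

```python
import os
from pathlib import Path

cwd = Path.cwd()   # the same information that `pwd` reports
print(cwd)

# A relative path is interpreted relative to the working directory.
absolute_path = Path("sample_data").absolute()
print(absolute_path)
print(absolute_path == cwd / "sample_data")
```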

## Mounting Google Drive locally

The example below shows how to mount your Google Drive in your virtual machine (you will be asked to authorize access) and how to read and write files there. After running it, check that the new file is also visible at https://drive.google.com/.

In [7]:
# Mount the drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [8]:
ls gdrive

[0m[01;34mMyDrive[0m/  [01;34mOthercomputers[0m/  [01;34mShareddrives[0m/


In [11]:
ls gdrive/MyDrive/_01_Teaching/BA780-Introduction-to-Data-Analytics/Intro-to-Data-Analytics/data

2017_StPaul_MN_Real_Estate.csv  movies_metadata.csv   table3.csv
AnalyticsEdge-Datasets/         movies_ratings.csv    table4a.csv
athlete_events.csv              pollution.csv         table4b.csv
FremontBridge.csv               state-abbrevs.csv     table5.csv
GOOGL.csv                       state-areas.csv       Telco-Customer-Churn.csv
GOT-battles.csv                 state-population.csv  weatherHistory.csv
GOT-character-deaths.csv        table1.csv
House-Prices-Kaggle/            table2.csv


We can now use this path (or any other Google Drive path) to store data that we want to keep. For example, in the cell below, we copy `sample_data/california_housing_train.csv` to a chosen folder in Google Drive.

In [13]:
cp sample_data/california_housing_train.csv gdrive/MyDrive/_01_Teaching/BA780-Introduction-to-Data-Analytics/Intro-to-Data-Analytics/data

The `cp` command copies files or directories from a source path to a destination path, using the syntax:

```cp [OPTION]... <source_file_or_directory> <destination_directory>```

An option such as `-r` copies recursively, which is what you need to copy an entire folder.

Let's confirm the file was copied properly. We can also go to Drive and visually inspect it:

In [14]:
ls gdrive/MyDrive/_01_Teaching/BA780-Introduction-to-Data-Analytics/Intro-to-Data-Analytics/data

2017_StPaul_MN_Real_Estate.csv  House-Prices-Kaggle/  table2.csv
AnalyticsEdge-Datasets/         movies_metadata.csv   table3.csv
athlete_events.csv              movies_ratings.csv    table4a.csv
california_housing_train.csv    pollution.csv         table4b.csv
FremontBridge.csv               state-abbrevs.csv     table5.csv
GOOGL.csv                       state-areas.csv       Telco-Customer-Churn.csv
GOT-battles.csv                 state-population.csv  weatherHistory.csv
GOT-character-deaths.csv        table1.csv
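
The copy-and-confirm steps above can also be written in Python with the standard `shutil` module. This is a sketch rather than a replacement for `cp`: the helper `copy_into` is a hypothetical name, and the demonstration uses throwaway folders instead of a real Drive path.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into(src, dest_dir):
    """Copy a file (like `cp`) or a whole folder (like `cp -r`) into dest_dir."""
    src, dest_dir = Path(src), Path(dest_dir)
    target = dest_dir / src.name
    if src.is_dir():
        shutil.copytree(src, target)   # raises an error if target already exists
    else:
        shutil.copy(src, target)
    return target


# Demonstrate with throwaway folders.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    source.write_text("x,y\n1,2\n")
    dest = Path(tmp) / "backup"
    dest.mkdir()
    copied = copy_into(source, dest)
    # Confirm the copy, similar to checking with `ls`.
    copied_ok = copied.exists() and copied.read_text() == source.read_text()
    listing = sorted(p.name for p in dest.iterdir())

print(copied_ok, listing)
```

In Colab, `dest_dir` could be a folder under `gdrive/MyDrive`, provided that folder already exists.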


## Your Turn

1. From https://grouplens.org/datasets/movielens/, download `ml-latest-small.zip`. Unzip it and upload the following files to your Colab `/content` folder:
  * movies.csv
  * ratings.csv
2. Mount Google Drive locally and copy these two files into a folder called `movie_rating` within your BA780 folder.
  * You can use the following code example to copy movies.csv to a folder called tmp_BA780 (change it to your desired destination):
```
cp movies.csv gdrive/MyDrive/tmp_BA780/movie_rating
```
  * **Note** that `gdrive/MyDrive` is the root folder of your Google Drive. If the specified path doesn't exist, you will encounter an error. To create a new path, use the `mkdir` command. The `-p` option also creates any missing parent folders. For example, the command below creates `movie_rating` inside `tmp_BA780`; if `tmp_BA780` doesn't exist, it creates the parent directory first, then the child directory:
  ```mkdir -p gdrive/MyDrive/tmp_BA780/movie_rating/```
3. Check your Drive and make sure the files are there.
4. Download these files to your local laptop.
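
The behaviour of `mkdir -p` described in step 2 has a direct Python counterpart in `pathlib`. The sketch below reuses the folder names `tmp_BA780` and `movie_rating` from the example but creates them inside a temporary directory, so it is only an illustration and not the exercise solution.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "tmp_BA780" / "movie_rating"
    # parents=True creates tmp_BA780 first, then movie_rating (like -p).
    nested.mkdir(parents=True, exist_ok=True)
    # exist_ok=True makes a second call harmless instead of raising an error.
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(created)
```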

In [None]:
# Your code goes here (use as many cells as needed)