**Note**: Include the following setup code in your own notebooks so that they have access to the CSC310 resources.

In [None]:
###### Set Up #####
import sys
import os
import platform

colab = 'google.colab' in sys.modules  # True when running in Google Colab

if colab:
  # running in google colab
  # update/clone ds-assets repo
  !test -e ds-assets && cd ds-assets && git pull && cd ..
  !test ! -e ds-assets && git clone https://github.com/lutzhamel/ds-assets.git
  home = "ds-assets/assets/"
else:
  # running on local machine
  # set this to the folder containing the DS assets
  home = "ds-assets/assets/"

system = platform.system() # "Windows", "Linux", "Darwin"
sys.path.append(home)      # add home folder to module search path

In [3]:
# notebook imports
import pandas as pd

## Reading Data from Local Files

The CSC310 `assets` repository is local and contains many of the data sets and resources needed for this course.   

The variable `home` defined by the setup code points to that folder, and we can use it to read files from there.  For example, the folder contains a file called `tennis.csv`.  Let's read that file into a pandas dataframe,

In [2]:
home # this variable defined by set-up

'ds-assets/assets/'

In [4]:
df = pd.read_csv(home+"tennis.csv")
df.head()

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,weak,no
1,sunny,hot,high,strong,no
2,overcast,hot,high,weak,yes
3,rainy,mild,high,weak,yes
4,rainy,cool,normal,weak,yes
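
Note that the concatenation `home+"tennis.csv"` only works because `home` ends with a slash. A minimal sketch of a more forgiving alternative uses `os.path.join` from the standard library; the helper name `asset_path` is hypothetical and not part of the course setup code.

```python
import os

def asset_path(home, filename):
    """Join the assets folder and a file name into a single path."""
    # os.path.join adds a separator only when `home` lacks a trailing one
    return os.path.join(home, filename)

# same value the setup code assigns to `home`
home = "ds-assets/assets/"
path = asset_path(home, "tennis.csv")
print(path)
```

The resulting path can be passed to `pd.read_csv` exactly as before.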


## Reading Files from your Google Drive

Mount the 'My Drive' folder of your Google Drive.

In [5]:
# mount 'My Drive' if available
if colab:
  try:
    from google.colab import drive
    drive.mount('/content/drive')
  except Exception as e:
    print(str(e))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Once you have mounted your drive, with the code above or using the icon in the file browser, you can access it through the
```
/content/drive/MyDrive
```
folder.  On my machine I have a folder,
```
/content/drive/MyDrive/Example-Directory
```
that contains the file `iris-local.csv`.
We can read this file.

In [6]:
if colab:
    df = pd.read_csv("/content/drive/MyDrive/Example-Directory/iris-local.csv")
    # an expression inside an if-block is not echoed automatically, so display it explicitly
    display(df.head(n=5))

Unnamed: 0,id,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,1,5.1,3.5,1.4,0.2,setosa
1,2,4.9,3.0,1.4,0.2,setosa
2,3,4.7,3.2,1.3,0.2,setosa
3,4,4.6,3.1,1.5,0.2,setosa
4,5,5.0,3.6,1.4,0.2,setosa
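
If the drive is not mounted or the folder name is mistyped, `pd.read_csv` raises a `FileNotFoundError`. As a sketch, a small check with `os.path` can report the problem before reading. The helper `find_drive_file` is hypothetical, and its default `root` simply repeats the `/content/drive/MyDrive` path from above.

```python
import os

def find_drive_file(*parts, root="/content/drive/MyDrive"):
    """Return the full path below `root`, or None if no such file exists."""
    path = os.path.join(root, *parts)
    return path if os.path.isfile(path) else None

path = find_drive_file("Example-Directory", "iris-local.csv")
if path is None:
    print("file not found; is the drive mounted?")
else:
    print("found", path)
```

Outside of Colab the check returns `None` unless a file with that exact path exists, so the message in the first branch is printed.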


## Reading Data right from the Web

Some websites make data files directly available for download.  One such website is [Vincent Arel-Bundock's data set collection](https://vincentarelbundock.github.io/Rdatasets/).  If we follow the link to the `html` index we find a collection of datasets in `csv` format.  One such dataset is the credit card dataset (\#21) with the corresponding link to the `csv` file,
```
https://vincentarelbundock.github.io/Rdatasets/csv/AER/CreditCard.csv
```
We can read this data right from that website into our notebook.  There is no need to make a local copy of the data,

In [7]:

url = "https://vincentarelbundock.github.io/Rdatasets/csv/AER/CreditCard.csv"
df = pd.read_csv(url)
df.head()

Unnamed: 0,rownames,card,reports,age,income,share,expenditure,owner,selfemp,dependents,months,majorcards,active
0,1,yes,0,37.66667,4.52,0.03327,124.9833,yes,no,3,54,1,12
1,2,yes,0,33.25,2.42,0.005217,9.854167,no,no,3,34,1,13
2,3,yes,0,33.66667,4.5,0.004156,15.0,yes,no,4,58,1,5
3,4,yes,0,30.5,2.54,0.065214,137.8692,no,no,0,25,1,7
4,5,yes,0,32.16667,9.7867,0.067051,546.5033,yes,no,2,64,1,5
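
The `rownames` column in this output merely repeats the row numbers. One option, sketched here on an inline sample so that it runs without network access, is to pass `index_col` to `pd.read_csv` so that the column becomes the dataframe index. The sample reuses the first two rows shown above, trimmed to four columns; the name `df_sample` is only for illustration.

```python
import io
import pandas as pd

# first two rows of the output above, trimmed to four columns for brevity
sample = io.StringIO(
    "rownames,card,reports,age\n"
    "1,yes,0,37.66667\n"
    "2,yes,0,33.25\n"
)

# index_col turns the named column into the row index
df_sample = pd.read_csv(sample, index_col="rownames")
print(df_sample)
```

The same argument should work when reading from the website, as in `pd.read_csv(url, index_col="rownames")`.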


## Reading Data from GitHub Repositories

GitHub maintains the `raw.githubusercontent.com` domain that allows users to access files in repositories unprocessed.
For example, consider the file `tennis.csv` in my GitHub repository `ds-assets` with the following parameters,

* Account: `lutzhamel`
* Repository: `ds-assets`
* Branch: `main`
* Folder: `assets`
* File: `tennis.csv`

Perhaps the only surprising thing here is the 'Branch' parameter.  For most file accesses we are interested in the default branch of the repository, which is usually called either `master` or `main`.  The default branch in my repository is called `main`.  Given this information we can construct a raw access URL to the `tennis.csv` file using the scheme,
```
https://raw.githubusercontent.com/<account>/<repository>/<branch>/<folder>/<filename>
```
This gives us the URL,
```
https://raw.githubusercontent.com/lutzhamel/ds-assets/main/assets/tennis.csv
```
Let's try this with some code,

In [8]:
url = "https://raw.githubusercontent.com/lutzhamel/ds-assets/main/assets/tennis.csv"
df = pd.read_csv(url)
df.head()

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,weak,no
1,sunny,hot,high,strong,no
2,overcast,hot,high,weak,yes
3,rainy,mild,high,weak,yes
4,rainy,cool,normal,weak,yes
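
The URL scheme above can also be captured in a small function, which avoids typing mistakes when accessing several files. This is only a sketch; the function name `raw_github_url` is hypothetical and not part of any library.

```python
def raw_github_url(account, repository, branch, folder, filename):
    """Build a raw.githubusercontent.com URL from its five parts."""
    return (
        "https://raw.githubusercontent.com/"
        f"{account}/{repository}/{branch}/{folder}/{filename}"
    )

# parameters taken from the list above
url = raw_github_url("lutzhamel", "ds-assets", "main", "assets", "tennis.csv")
print(url)
```

The resulting string is the same URL used in the cell above and can be passed to `pd.read_csv` in the same way.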
