[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gdsaxton/GDAN5400/blob/main/Week%202%20Notebooks/GDAN%205400%20-%20Week%202%20Notebooks%20%28I%29%20-%20Read%20in%20and%20Save%20Files%20in%20Colab.ipynb)

This notebook provides recipes for loading and saving data from external sources in Colab.

### Option 1: Uploading files from your local file system

`files.upload` returns a dictionary of the uploaded files.
Each key is a file name, and each value holds that file's contents as bytes.

```python
#Run this code to upload the file:
from google.colab import files
uploaded = files.upload()  # This will prompt you to upload the file
```

```python
for name, data in uploaded.items():
  print(f'User uploaded file "{name}" with length {len(data)} bytes')
```

In [None]:
#Once uploaded, you can open the file using pandas:
import pandas as pd
df = pd.read_excel('final_insurance_fraud.xlsx')
df.head()
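
Because the values in the dictionary returned by `files.upload` are raw bytes, an uploaded file can also be parsed in memory rather than reopened by name. The sketch below is only an illustration: `frames_from_upload` is a hypothetical helper, and `uploaded_demo` is made-up data standing in for a real upload so the cell runs outside Colab too.

```python
import io

import pandas as pd


def frames_from_upload(uploaded):
    """Parse every uploaded CSV (file name -> bytes) into a DataFrame."""
    frames = {}
    for name, data in uploaded.items():
        if name.lower().endswith('.csv'):
            # Wrap the bytes in a file-like object so pandas can read them
            frames[name] = pd.read_csv(io.BytesIO(data))
    return frames


# Hypothetical stand-in for the dictionary returned by files.upload()
uploaded_demo = {'claims.csv': b'claim_id,amount\n1,250.0\n2,990.5\n'}

frames = frames_from_upload(uploaded_demo)
print(frames['claims.csv'])
```

The same pattern should work for Excel files by passing `io.BytesIO(data)` to `pd.read_excel`.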

### Option 2: Mount Google Drive Locally

The example below shows how to mount your Google Drive on your runtime using an authorization code, and how to read files from it. Once mounted, your Drive contents appear under `/content/drive`, and any file you write there will show up at [https://drive.google.com/](https://drive.google.com/).

Mounting only supports reading, writing, and moving files; to programmatically modify sharing settings or other metadata, use the Google Drive API instead (not covered in this notebook).

**Note:** When using the 'Mount Drive' button in the file browser, no authentication codes are necessary for notebooks that have only been edited by the current user.

To access an Excel file stored in your Google Drive:

```python
from google.colab import drive
drive.mount('/content/drive')
```

Access the file:

```python
import pandas as pd
file_path = '/content/drive/My Drive/your_file.xlsx'
df = pd.read_excel(file_path)
```

In [None]:
import pandas as pd
file_path = '/content/drive/My Drive/final_insurance_fraud.xlsx'
df = pd.read_excel(file_path)
df.head()
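
If `pd.read_excel` cannot find the file, the path under the mount point is usually the culprit. One way to check is to search the mounted folder for matching files. This is a minimal sketch: `find_files` is a hypothetical helper, not part of Colab, and it assumes the `/content/drive/My Drive` root used above.

```python
from pathlib import Path


def find_files(root, pattern):
    """Return sorted file paths under `root` whose names match `pattern`."""
    root = Path(root)
    if not root.is_dir():
        # Drive is not mounted, or the root path is wrong
        return []
    return sorted(p for p in root.rglob(pattern) if p.is_file())


# Lists every Excel file in the mounted Drive (empty list if not mounted)
for path in find_files('/content/drive/My Drive', '*.xlsx'):
    print(path)
```

A recursive search can be slow on a large Drive, so pointing `root` at a specific subfolder may be preferable.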

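Files written under `/content/drive` during this Colab session are synced to Drive. To make sure pending writes are flushed before the session ends, call `drive.flush_and_unmount()`.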

### Option 3: Download from an Online URL
If the Excel file is hosted online, use `requests` to download it to the Colab runtime, then load the saved copy with pandas:

```python
import pandas as pd
import requests

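url = 'http://example.com/your_file.xlsx'

# Download the file; raise an error on a failed request (e.g. a 404)
# instead of silently saving an error page
response = requests.get(url)
response.raise_for_status()
with open('temp_file.xlsx', 'wb') as f:
    f.write(response.content)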

df = pd.read_excel('temp_file.xlsx')
```

In [None]:
import pandas as pd
import requests

# NOTE: to turn a GitHub file page into a direct download link, replace
# `https://github.com/` with `https://raw.githubusercontent.com/` and remove `/blob` from the path
# https://github.com/gdsaxton/GDAN5400/blob/main/Coding%20Assignment%201/final_insurance_fraud.xlsx
url = 'https://raw.githubusercontent.com/gdsaxton/GDAN5400/main/Coding%20Assignment%201/final_insurance_fraud.xlsx'

# Download the file; raise an error on a failed request (e.g. a 404)
response = requests.get(url)
response.raise_for_status()
with open('final_insurance_fraud.xlsx', 'wb') as f:
    f.write(response.content)

# Load the Excel file
df = pd.read_excel('final_insurance_fraud.xlsx', engine='openpyxl')

df.head()
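
The URL rewrite used in the cell above (swap the host and drop the `/blob` segment) can be expressed as a small function. This is a sketch only: `github_blob_to_raw` is a hypothetical helper name, and it assumes the standard `https://github.com/<user>/<repo>/blob/<branch>/<path>` page layout.

```python
def github_blob_to_raw(url):
    """Convert a github.com 'blob' page URL into its raw-content URL."""
    prefix = 'https://github.com/'
    if not url.startswith(prefix) or '/blob/' not in url:
        raise ValueError(f'Not a GitHub blob URL: {url}')
    # Drop the first '/blob' segment and switch to the raw-content host
    path = url[len(prefix):].replace('/blob/', '/', 1)
    return 'https://raw.githubusercontent.com/' + path


page_url = ('https://github.com/gdsaxton/GDAN5400/blob/main/'
            'Coding%20Assignment%201/final_insurance_fraud.xlsx')
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```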

# Saving Files
This step corresponds to the `Output` tool in Alteryx.

### Option 1: Save to Your Google Drive

In [None]:
# Mount your Google Drive (skip this cell if it is already mounted from the steps above)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Path to save the file in Google Drive
file_path = '/content/drive/My Drive/final_insurance_fraud.pkl'

# Save the DataFrame as a pickle file (Python's binary serialization format, which preserves pandas dtypes)
df.to_pickle(file_path)

print(f"File saved to {file_path}")
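
To confirm that a pickle file restores the DataFrame exactly, it can be read back with `pd.read_pickle` and compared with the original. The sketch below uses a small made-up DataFrame (`demo`) and a temporary folder rather than Drive, so it runs anywhere; both are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data with a datetime column
demo = pd.DataFrame({
    'claim_id': [1, 2],
    'filed': pd.to_datetime(['2024-01-05', '2024-02-10']),
})

# Save to a temporary folder, then load the file back
path = os.path.join(tempfile.mkdtemp(), 'demo.pkl')
demo.to_pickle(path)
restored = pd.read_pickle(path)

# Values and dtypes should survive the round trip unchanged
print(restored.equals(demo))
print(restored.dtypes)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.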

### Option 2: Save and Download the File to Your Computer
1. Save the DataFrame as a pickled file.  
Use pandas to save the DataFrame to a PKL file in the Colab environment:

In [None]:
df.to_pickle('final_insurance_fraud.pkl')

2. Download the File to Your Computer.   
After saving the file, use the following code to download it:

In [None]:
from google.colab import files

# Download the file
files.download('final_insurance_fraud.pkl')