<a href="https://colab.research.google.com/github/islamicity24/PythonCity/blob/main/Data_eksternal_File_Lokal%2C_Drive%2C_Spreadsheet%2C_dan_Cloud_Storage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook provides recipes for loading and saving data from external sources.

# Local file system

## Uploading files from your local file system

<code>files.upload</code> returns a dictionary of the uploaded files.
The dictionary is keyed by file name, and each value is the uploaded data as bytes.

In [1]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving restaurants.pbix to restaurants.pbix
User uploaded file "restaurants.pbix" with length 22921 bytes
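
Because each value in the dictionary is raw bytes, the data usually needs decoding before use. The following is a minimal sketch of that step, assuming a UTF-8 CSV upload. The `uploaded` dictionary here is a hand-built stand-in for the return value of `files.upload()`, and the file name `scores.csv`, its contents, and the helper `parse_uploaded_csv` are hypothetical.

```python
import csv
import io

# Stand-in for the dictionary returned by files.upload(); the file name
# and contents are made up for illustration.
uploaded = {
    'scores.csv': b'name,score\nada,90\nlin,85\n',
}


def parse_uploaded_csv(uploaded, filename, encoding='utf-8'):
    """Decode one uploaded file's bytes and parse it into a list of dicts."""
    text = uploaded[filename].decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_uploaded_csv(uploaded, 'scores.csv')
for row in rows:
    print(row['name'], row['score'])
```

Note that `csv.DictReader` leaves every field as a string, so numeric columns still need an explicit conversion.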


## Downloading files to your local file system

<code>files.download</code> will invoke a browser download of the file to your local computer.


In [4]:
from google.colab import files

with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')
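
The same pattern works for any file written in the runtime. Below is a rough sketch that writes tabular data to a CSV file and then offers it for download. The helper `write_rows_to_csv`, the file name `results.csv`, and the sample rows are hypothetical; the `try`/`except` guard is only there so the snippet also runs outside Colab, where `google.colab` cannot be imported.

```python
import csv
import os
import tempfile

# google.colab is only importable inside a Colab runtime, so fall back
# to None elsewhere and skip the browser download.
try:
    from google.colab import files
except ImportError:
    files = None


def write_rows_to_csv(rows, path):
    """Write a list of rows to a CSV file and return the path."""
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return path


out_path = os.path.join(tempfile.gettempdir(), 'results.csv')
write_rows_to_csv([['name', 'score'], ['ada', 90], ['lin', 85]], out_path)

if files is not None:
    files.download(out_path)  # triggers the browser download in Colab
```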


# Google Drive

You can access files in Drive in a number of ways, including:
- Mounting your Google Drive in the runtime's virtual machine
- Using a wrapper around the API such as <a href="https://pythonhosted.org/PyDrive/">PyDrive</a>
- Using the <a href="https://developers.google.com/drive/v3/web/about-sdk">native REST API</a>



Examples of each are below.

## Mounting Google Drive locally

The example below shows how to mount your Google Drive on your runtime using an authorization code, and how to write and read files there. Once executed, you will be able to see the new file (<code>foo.txt</code>) at <a href="https://drive.google.com/">https://drive.google.com/</a>.

This option only supports reading, writing, and moving files; to programmatically modify sharing settings or other metadata, use one of the other options below.

<strong>Note:</strong> When using the 'Mount Drive' button in the file browser, no authentication codes are necessary for notebooks that have only been edited by the current user.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
with open('/content/drive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/drive/My\ Drive/foo.txt

Hello Google Drive!
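
Once mounted, Drive behaves like an ordinary directory, so standard library tools such as `pathlib` apply. The sketch below lists text files under the mount point. The helper `list_files` is hypothetical, and the fallback to a temporary directory is only an assumption made so the snippet also runs where Drive is not mounted.

```python
import tempfile
from pathlib import Path


def list_files(root, pattern='*.txt'):
    """Return paths under root that match pattern, relative to root and sorted."""
    root = Path(root)
    return sorted(str(p.relative_to(root)) for p in root.rglob(pattern))


# '/content/drive/My Drive' is the location used in the cells above. Outside
# a mounted Colab runtime, fall back to a temporary directory for the demo.
drive_root = Path('/content/drive/My Drive')
if not drive_root.is_dir():
    drive_root = Path(tempfile.mkdtemp())
    (drive_root / 'foo.txt').write_text('Hello Google Drive!')

print(list_files(drive_root))
```

A recursive search over a large Drive can be slow, so pointing `root` at a specific subfolder is usually the better choice.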

In [None]:
drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')

All changes made in this colab session should now be visible in Drive.


## PyDrive

The examples below demonstrate authentication and file upload/download using PyDrive. More examples are available in the <a href="https://pythonhosted.org/PyDrive/">PyDrive documentation</a>.

In [None]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

Authenticate and create the PyDrive client.


In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Create and upload a text file.


In [None]:
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

Uploaded file with ID 14vDAdqp7BSCQnoougmgylBexIr2AQx2T


Load a file by ID and print its contents.


In [None]:
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

Downloaded content "Sample upload file content"


## Drive REST API

To use the Drive API, we must first authenticate and construct an API client.


In [None]:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

With this client, we can use any of the functions in the <a href="https://developers.google.com/drive/v3/reference/">Google Drive API reference</a>. Examples follow.


### Creating a new Drive file with data from Python

First, create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Upload it using the <a href="https://developers.google.com/drive/v3/reference/files/create"><code>files.create</code></a> method. Further details on uploading files are available in the <a href="https://developers.google.com/drive/v3/web/manage-uploads">developer documentation</a>.

In [None]:
from googleapiclient.http import MediaFileUpload

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

File ID: 1Cw9CqiyU6zbXFD9ViPZu_3yX-sYF4W17


After running the cell above, you will see a new file named 'Sample file' at <a href="https://drive.google.com/">https://drive.google.com/</a>.

### Downloading data from a Drive file into Python

Download the file we uploaded above.

In [None]:
file_id = created.get('id')

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))

Downloaded file contents are: b'my sample file'


To download a different file, set <code>file_id</code> above to the ID of that file, which will look like "1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz".
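
When only a sharing link is at hand, the ID can usually be read from the URL. The sketch below extracts it with regular expressions, assuming links shaped like `.../d/<ID>/...` or `...?id=<ID>`; other link formats are not handled. The helper `extract_file_id` and the example URLs are hypothetical.

```python
import re

# Assumed URL shapes: ".../d/<ID>/..." and "...?id=<ID>".
# Anything else raises ValueError rather than guessing.
_ID_PATTERNS = [
    re.compile(r'/d/([A-Za-z0-9_-]+)'),
    re.compile(r'[?&]id=([A-Za-z0-9_-]+)'),
]


def extract_file_id(url):
    """Return the Drive file ID embedded in a sharing URL."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    raise ValueError('No Drive file ID found in: {}'.format(url))


file_id = extract_file_id(
    'https://drive.google.com/file/d/1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz/view')
print(file_id)
```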

# Google Sheets

The examples below use the open-source <a href="https://github.com/burnash/gspread"><code>gspread</code></a> library to interact with Google Sheets.

Import the library, authenticate, and create the interface to Sheets.

In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

Below is a small set of <code>gspread</code> examples. Additional examples are available at the <a href="https://github.com/burnash/gspread#more-examples"><code>gspread</code> GitHub page</a>.

## Creating a new sheet with data from Python

In [None]:
sh = gc.create('My cool spreadsheet')

After running the cell above, you will see a new spreadsheet named 'My cool spreadsheet' at <a href="https://sheets.google.com/">https://sheets.google.com</a>.

Open the new sheet and add some random data.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

{'spreadsheetId': '1dsQeN0YzXuM387l_CuyEbsYzL2ew9TJFzR-E-RQnwxs',
 'updatedCells': 6,
 'updatedColumns': 3,
 'updatedRange': 'Sheet1!A1:C2',
 'updatedRows': 2}

## Downloading data from a sheet into Python as a Pandas DataFrame

Read back the random data that we inserted above and convert the result into a <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">Pandas DataFrame</a>.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

import pandas as pd
pd.DataFrame.from_records(rows)

[['6', '3', '4'], ['7', '2', '1']]


Unnamed: 0,0,1,2
0,6,3,4
1,7,2,1
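
As the printed rows show, every cell comes back as a string. The sketch below converts numeric cells to integers and attaches column names before any further processing. The helper `rows_to_records` and the header names `a`, `b`, `c` are hypothetical; the `rows` list is copied from the output above.

```python
def rows_to_records(rows, header):
    """Pair each row with the header names, converting numeric strings to int."""
    def convert(value):
        try:
            return int(value)
        except ValueError:
            return value  # leave non-numeric cells as strings

    return [dict(zip(header, (convert(v) for v in row))) for row in rows]


# Sample data copied from the get_all_values() output above.
rows = [['6', '3', '4'], ['7', '2', '1']]
records = rows_to_records(rows, header=['a', 'b', 'c'])
print(records)
```

A list of dictionaries like `records` can be passed straight to `pd.DataFrame`, which then yields named, integer-typed columns.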


# Google Cloud Storage (GCS)

To use Colaboratory with GCS, you need to create a <a href="https://cloud.google.com/storage/docs/projects">Google Cloud project</a> or use an existing one.

Specify your project ID below:

In [None]:
project_id = 'Your_project_ID_here'

Files in GCS are contained in <a href="https://cloud.google.com/storage/docs/buckets">buckets</a>.

Buckets must have a globally unique name, so we generate one here.

In [None]:
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
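
A generated name can be checked before it is sent to GCS. The sketch below is one possible variant, not the notebook's method: it swaps in `uuid.uuid4()` (random) for `uuid.uuid1()` and applies a simplified version of the bucket naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). The helpers `is_valid_bucket_name` and `make_bucket_name` are hypothetical, and the full rules in the GCS documentation include further restrictions not covered here.

```python
import re
import uuid

# Simplified rule set (an assumption, not the complete GCS specification):
# 3-63 characters, lowercase letters, digits, '-', '_' and '.',
# starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]')


def is_valid_bucket_name(name):
    """Return True if name passes the simplified naming check."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


def make_bucket_name(prefix='colab-sample-bucket-'):
    """Build a random bucket name and fail early if it looks invalid."""
    name = prefix + str(uuid.uuid4())
    if not is_valid_bucket_name(name):
        raise ValueError('Invalid bucket name: {}'.format(name))
    return name


example_name = make_bucket_name()
print(example_name)
```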

To access GCS, we must authenticate.

In [None]:
from google.colab import auth
auth.authenticate_user()

GCS can be accessed through the <code>gsutil</code> command-line utility or through the native Python API.

## `gsutil`

First, we configure <code>gsutil</code> to use the project we specified above by using <code>gcloud</code>.

In [None]:
!gcloud config set project {project_id}

Updated property [core/project].


Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Make a bucket to which we will upload the file (<a href="https://cloud.google.com/storage/docs/gsutil/commands/mb">documentation</a>).

In [None]:
!gsutil mb gs://{bucket_name}

Creating gs://colab-sample-bucket-44971372-baaf-11e7-ae30-0242ac110002/...


Copy the file to the new bucket (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cp">documentation</a>).

In [None]:
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/

Copying file:///tmp/to_upload.txt [Content-Type=text/plain]...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       


Print the contents of the file we just copied to make sure everything worked (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cat">documentation</a>).


In [None]:
!gsutil cat gs://{bucket_name}/to_upload.txt

my sample file

In [None]:
#@markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Finally, we'll download the file we just uploaded in the example above. It's as simple as reversing the order of the arguments in the <code>gsutil cp</code> command.

In [None]:
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt

Copying gs://colab-sample-bucket483f20dc-baaf-11e7-ae30-0242ac110002/to_upload.txt...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       
my sample file
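
The upload and download commands differ only in argument order, which the following sketch makes explicit by building both command strings from one helper. The function `gsutil_cp_command` and the bucket name `my-example-bucket` are hypothetical; `shlex.quote` is used so that paths containing spaces survive the shell.

```python
import shlex


def gsutil_cp_command(source, destination):
    """Build a 'gsutil cp' command line with shell-safe quoting."""
    return 'gsutil cp {} {}'.format(shlex.quote(source), shlex.quote(destination))


example_bucket = 'my-example-bucket'  # hypothetical bucket name

# Upload: local path first, bucket URL second.
upload_cmd = gsutil_cp_command(
    '/tmp/to_upload.txt', 'gs://{}/'.format(example_bucket))

# Download: the same two kinds of arguments, in reverse order.
download_cmd = gsutil_cp_command(
    'gs://{}/to_upload.txt'.format(example_bucket), '/tmp/gsutil_download.txt')

print(upload_cmd)
print(download_cmd)
```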

## Python API

These snippets are based on a <a href="https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py">larger example</a> that shows additional uses of the API.

First, create the service client.

In [None]:
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Create a bucket in the project specified above.

In [None]:
# Use a different globally unique bucket name than in the gsutil example above.
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

Done


Upload the file to the bucket we just created.

In [None]:
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name,
                                       name='to_upload.txt',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')

Upload complete


In [None]:
#@markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Download the file we just uploaded.

In [None]:
from googleapiclient.http import MediaIoBaseDownload

with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')

Download complete


Inspect the downloaded file.


In [None]:
!cat /tmp/downloaded_from_gcs.txt

my sample file
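
For files too large to inspect by eye, comparing checksums of the original and the downloaded copy is a common alternative. The sketch below does this locally with SHA-256. The helpers `file_sha256` and `files_match` are hypothetical, and the two temporary files merely stand in for `/tmp/to_upload.txt` and `/tmp/downloaded_from_gcs.txt`.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)


# Stand-ins for the uploaded and downloaded files from the cells above.
demo_dir = tempfile.mkdtemp()
original = os.path.join(demo_dir, 'to_upload.txt')
copy = os.path.join(demo_dir, 'downloaded_from_gcs.txt')
for path in (original, copy):
    with open(path, 'w') as f:
        f.write('my sample file')

print(files_match(original, copy))
```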