<a href="https://colab.research.google.com/github/lemirel/asl-ml-immersion/blob/master/Vanjski_podaci_lokalne_datoteke%2C_Disk%2C_Tablice_i_Cloud_Storage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook provides recipes for loading and saving data from external sources.

# Local file system

## Uploading files from your local file system

<code>files.upload</code> returns a dictionary of the files that were uploaded.
The dictionary is keyed by file name, and its values are the uploaded data as <code>bytes</code>.
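Each value in the returned dictionary is the raw `bytes` of a file, so text formats must be decoded before use. The sketch below assumes the upload was a UTF-8 CSV. The `uploaded` dictionary is a hypothetical stand-in for the real return value of `files.upload()`, so the snippet also runs outside Colab.

```python
import csv
import io

# Hypothetical stand-in for the dict returned by files.upload():
# keys are file names, values are the uploaded bytes.
uploaded = {'people.csv': b'name,age\nAna,31\nIvo,27\n'}

def read_uploaded_csv(uploaded, name, encoding='utf-8'):
    """Decode one uploaded file and parse it into a list of CSV rows."""
    text = uploaded[name].decode(encoding)
    return list(csv.reader(io.StringIO(text)))

rows = read_uploaded_csv(uploaded, 'people.csv')
print(rows)  # [['name', 'age'], ['Ana', '31'], ['Ivo', '27']]
```

If pandas is available, `pd.read_csv(io.BytesIO(uploaded[name]))` parses the same bytes directly into a DataFrame.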

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

## Downloading files to your local file system

<code>files.download</code> will invoke a browser download of the file to your local computer.
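Each `files.download` call starts its own browser download. When you have several outputs, one option is to bundle them into a single archive first and download only that.

The sketch below uses the standard `zipfile` module. The file names are hypothetical. The final `files.download` call is left as a comment so the snippet also runs outside Colab.

```python
import os
import tempfile
import zipfile

def bundle_files(paths, archive_path):
    """Write the given files into one ZIP archive and return the archive path."""
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name, without its directory.
            zf.write(path, arcname=os.path.basename(path))
    return archive_path

# Create two hypothetical output files in a temporary directory.
workdir = tempfile.mkdtemp()
paths = []
for name, content in [('a.txt', 'first'), ('b.txt', 'second')]:
    path = os.path.join(workdir, name)
    with open(path, 'w') as f:
        f.write(content)
    paths.append(path)

archive = bundle_files(paths, os.path.join(workdir, 'results.zip'))
# In Colab: from google.colab import files; files.download(archive)
```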


In [None]:
from google.colab import files

with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')

# Google Drive

You can access files in Drive in a number of ways, including:
- Mounting your Google Drive in the runtime's virtual machine
- Using a wrapper around the API, such as the <a href="https://docs.iterative.ai/PyDrive2/">PyDrive2</a> library
- Using the <a href="https://developers.google.com/drive/v3/web/about-sdk">native REST API</a>



Examples of each are below.

## Mounting Google Drive locally

The example below shows how to mount your Google Drive on your runtime using an authorization code, and how to write and read files there. Once it has run, you will see the new file (<code>foo.txt</code>) at <a href="https://drive.google.com/">https://drive.google.com/</a>.

This approach only supports reading, writing, and moving files. To change sharing settings or other metadata programmatically, use one of the other options below.

<strong>Note:</strong> When you use the Mount Drive button in the file browser, no authentication codes are needed for notebooks that have only been edited by the current user.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
with open('/content/drive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/drive/My\ Drive/foo.txt

Hello Google Drive!

In [None]:
drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')

All changes made in this colab session should now be visible in Drive.
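The `!cat` command above escapes the space in `My Drive` by hand (`My\ Drive`). When you build shell commands from Python variables, the standard library's `shlex.quote` can do the quoting for you.

In this minimal sketch, the mount point matches the `drive.mount` cell above and `drive_path` is a hypothetical helper. The command string is only built, not run.

```python
import shlex
from pathlib import PurePosixPath

# Folder that drive.mount('/content/drive') exposes for "My Drive".
DRIVE_ROOT = PurePosixPath('/content/drive/My Drive')

def drive_path(*parts):
    """Join path components under the mounted 'My Drive' folder."""
    return DRIVE_ROOT.joinpath(*parts)

target = drive_path('foo.txt')
cmd = 'cat ' + shlex.quote(str(target))
print(cmd)  # cat '/content/drive/My Drive/foo.txt'
```

In a notebook cell, `!{cmd}` runs the quoted command, because IPython expands `{...}` expressions in shell commands. Plain Python `open()` calls need no quoting at all.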


## PyDrive2

The examples below show how to authenticate and how to upload and download a file with the PyDrive2 library. More examples are available in the <a href="https://docs.iterative.ai/PyDrive2/">PyDrive2 documentation</a>.

In [None]:
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

Authenticate and create the PyDrive2 client.


In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Create and upload a text file.


In [None]:
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

Uploaded file with ID 14vDAdqp7BSCQnoougmgylBexIr2AQx2T


Load a file by ID and print its contents.


In [None]:
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

Downloaded content "Sample upload file content"


## Drive REST API

To use the Drive API, we first need to authenticate and construct an API client.


In [None]:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

With this client, we can call any of the functions in the <a href="https://developers.google.com/drive/v3/reference/">Google Drive API reference</a>. Examples follow.


### Creating a new Drive file with data from Python

First, create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Upload it with the <a href="https://developers.google.com/drive/v3/reference/files/create"><code>files.create</code></a> method. More details on uploading files are available in the <a href="https://developers.google.com/drive/v3/web/manage-uploads">developer documentation</a>.

In [None]:
from googleapiclient.http import MediaFileUpload

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

File ID: 1Cw9CqiyU6zbXFD9ViPZu_3yX-sYF4W17


After you run the cell above, a new file named 'Sample file' will appear at <a href="https://drive.google.com/">https://drive.google.com/</a>.
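The upload cell above hardcodes `'text/plain'` in both the metadata and `MediaFileUpload`. For other local files, one option is to guess the MIME type from the file extension with the standard `mimetypes` module.

In this sketch, `drive_upload_args` is a hypothetical helper. The commented lines show how its results could feed the same `files().create` call used above.

```python
import mimetypes
import os

def drive_upload_args(local_path, name=None):
    """Return (file_metadata, mimetype) for uploading local_path to Drive."""
    mimetype, _encoding = mimetypes.guess_type(local_path)
    if mimetype is None:
        # Unknown extension: fall back to a generic binary type.
        mimetype = 'application/octet-stream'
    metadata = {'name': name or os.path.basename(local_path), 'mimeType': mimetype}
    return metadata, mimetype

metadata, mimetype = drive_upload_args('/tmp/to_upload.txt', name='Sample file')
print(metadata)  # {'name': 'Sample file', 'mimeType': 'text/plain'}
# media = MediaFileUpload('/tmp/to_upload.txt', mimetype=mimetype, resumable=True)
# drive_service.files().create(body=metadata, media_body=media, fields='id').execute()
```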

### Downloading data from a Drive file into Python

Download the file we uploaded above.

In [None]:
file_id = created.get('id')

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))

Downloaded file contents are: b'my sample file'


To download a different file, set <code>file_id</code> above to the ID of that file, which will look like this: <code>1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz</code>.
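A file's ID is usually easiest to get from its sharing link. This minimal sketch assumes the two common link shapes, `https://drive.google.com/file/d/<ID>/view` and `...?id=<ID>`. It does not handle other URL formats, and `extract_drive_file_id` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qs, urlparse

def extract_drive_file_id(url):
    """Return the file ID from a Drive link, or None if the shape is not recognized."""
    # Shape 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r'/d/([A-Za-z0-9_-]+)', url)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/open?id=<ID>
    ids = parse_qs(urlparse(url).query).get('id')
    return ids[0] if ids else None

file_id = extract_drive_file_id(
    'https://drive.google.com/file/d/1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz/view?usp=sharing')
print(file_id)  # 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
```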

# Google Sheets


## Google Sheets Workspace add-on

We have a Workspace add-on, <a href="https://workspace.google.com/u/0/marketplace/app/sheets_to_colab/945625412720">Sheets to Colab</a>, that lets you import data from Google Sheets into Colab directly from the Sheets user interface. To learn more, follow the link to the Sheets to Colab add-on.

## Interacting with Google Sheets using gspread

You can also use the open-source <a href="https://github.com/burnash/gspread"><code>gspread</code></a> library to work with Google Sheets. The code below shows how to set up and authenticate <code>gspread</code>.

In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

Below is a small set of <code>gspread</code> examples. More examples are available on the <a href="https://github.com/burnash/gspread#more-examples"><code>gspread</code> GitHub page</a>.

### Creating a new sheet with data from Python

In [None]:
sh = gc.create('My cool spreadsheet')

After you run the cell above, a new spreadsheet named 'My cool spreadsheet' will appear at <a href="https://sheets.google.com/">https://sheets.google.com</a>.

Open the new sheet and add some random data.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

{'spreadsheetId': '1dsQeN0YzXuM387l_CuyEbsYzL2ew9TJFzR-E-RQnwxs',
 'updatedCells': 6,
 'updatedColumns': 3,
 'updatedRange': 'Sheet1!A1:C2',
 'updatedRows': 2}
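The `'updatedCells': 6` in the output follows from the A1 notation: `A1:C2` spans columns A to C (3 columns) and rows 1 to 2 (2 rows). Below is a stdlib sketch of that expansion. It assumes plain references without a sheet name or `$` anchors, and `a1_range_cells` is a hypothetical helper, not part of `gspread`.

```python
import re

def col_to_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index

def parse_a1(cell):
    """Split a reference like 'C2' into (row, col), both 1-based."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', cell)
    if not match:
        raise ValueError('Unsupported A1 reference: ' + cell)
    return int(match.group(2)), col_to_index(match.group(1))

def a1_range_cells(a1_range):
    """List the (row, col) pairs covered by a range like 'A1:C2', row by row."""
    start, end = a1_range.split(':')
    (r1, c1), (r2, c2) = parse_a1(start), parse_a1(end)
    return [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

cells = a1_range_cells('A1:C2')
print(len(cells))  # 6
```

`gspread` also ships its own converters in `gspread.utils`, for example `a1_to_rowcol`.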

### Downloading data from a sheet into Python as a Pandas DataFrame

Read back the random data we inserted above and convert the result into a <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">Pandas DataFrame</a>.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

import pandas as pd
pd.DataFrame.from_records(rows)

[['6', '3', '4'], ['7', '2', '1']]


   0  1  2
0  6  3  4
1  7  2  1
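As the printed `rows` show, `get_all_values` returns every cell as a string, so the DataFrame columns above hold text rather than numbers. This sketch converts them with `pd.to_numeric` and also shows a variant for sheets whose first row holds column names. The `rows` values are copied from the output above; the header row in the second example is hypothetical.

```python
import pandas as pd

# Values as returned by worksheet.get_all_values() in the cell above.
rows = [['6', '3', '4'], ['7', '2', '1']]

# Convert every column from strings to numbers.
df = pd.DataFrame.from_records(rows).apply(pd.to_numeric)

# If the first row of a sheet holds column names, use it as the header instead.
rows_with_header = [['A', 'B', 'C'], ['6', '3', '4'], ['7', '2', '1']]
df_named = pd.DataFrame(rows_with_header[1:], columns=rows_with_header[0]).apply(pd.to_numeric)

print(df.sum().tolist())  # [13, 5, 5]
```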


# InteractiveSheet

You can now embed live Google Sheets in Colab with the <code>InteractiveSheet</code> class from <code>google.colab.sheets</code>. This lets you create and edit data in Google Sheets and bring it into your notebook as a Pandas DataFrame, all from Colab.

In [None]:
from google.colab import sheets

# Create a new interactive sheet and add data to it.
sheet = sheets.InteractiveSheet()

In [None]:
# Get a Pandas DataFrame from the selected worksheet
df = sheet.as_df()

In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))

# Create a new sheet and include the column names as the first row.
sheet = sheets.InteractiveSheet(df=df, title='foo', include_column_headers=True)

In [None]:
# Push data from Colab to the selected worksheet
df2 = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
sheet.update(df=df2)

In [None]:
# Display the sheet in the output of the current cell
sheet.display()

# Google Cloud Storage (GCS)

To use Colaboratory with GCS, you'll need to create a <a href="https://cloud.google.com/storage/docs/projects">Google Cloud project</a> or use an existing one.

Specify your project ID below:

In [None]:
project_id = 'Your_project_ID_here'

Files in GCS are contained in <a href="https://cloud.google.com/storage/docs/buckets">buckets</a>.
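Bucket names must follow GCS naming rules. This partial sketch checks the basic ones: 3 to 63 characters; only lowercase letters, digits, dashes, underscores, and dots; and a letter or digit at the start and end. It does not cover every rule in the GCS bucket-naming documentation, such as the longer limit for dotted names or the reserved `goog` prefix. `is_valid_bucket_name` is a hypothetical helper.

The candidate name uses `uuid.uuid4()`, which is random. The generation cell that follows uses `uuid1()`, which can embed the host's network (MAC) address and the current time.

```python
import re
import uuid

# Basic GCS rules: 3-63 chars from [a-z0-9._-], first and last char a letter or digit.
_BUCKET_RE = re.compile(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]')

def is_valid_bucket_name(name):
    """Partial check of GCS bucket-name rules (dotted names and reserved prefixes are skipped)."""
    return _BUCKET_RE.fullmatch(name) is not None

candidate = 'colab-sample-bucket-' + str(uuid.uuid4())
print(len(candidate), is_valid_bucket_name(candidate))  # 56 True
```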

Buckets must have a globally unique name, so we generate one here.

In [None]:
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

You'll need to authenticate to access GCS.

In [None]:
from google.colab import auth
auth.authenticate_user()

You can access GCS through the <code>gsutil</code> command-line utility or through the native Python API.

## `gsutil`

First, we use <code>gcloud</code> to configure <code>gsutil</code> with the project we specified above.

In [None]:
!gcloud config set project {project_id}

Updated property [core/project].


Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Create a bucket to upload the file to (<a href="https://cloud.google.com/storage/docs/gsutil/commands/mb">documentation</a>).

In [None]:
!gsutil mb gs://{bucket_name}

Creating gs://colab-sample-bucket-44971372-baaf-11e7-ae30-0242ac110002/...


Copy the file to our new bucket (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cp">documentation</a>).

In [None]:
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/

Copying file:///tmp/to_upload.txt [Content-Type=text/plain]...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       


Print the contents of the newly copied file to check that everything worked (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cat">documentation</a>).


In [None]:
!gsutil cat gs://{bucket_name}/to_upload.txt

my sample file

In [None]:
# @markdown Once the upload completes, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Finally, we'll download the file we just uploaded in the example above. All it takes is reversing the order of the arguments in the <code>gsutil cp</code> command.

In [None]:
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt

Copying gs://colab-sample-bucket483f20dc-baaf-11e7-ae30-0242ac110002/to_upload.txt...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       
my sample file
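Upload and download differ only in the order of the two `gsutil cp` arguments. This sketch builds both commands as argument lists for Python's `subprocess`, as an alternative to `!` shell cells. `gsutil_cp_args` is a hypothetical helper and `my-bucket` is a placeholder bucket name. The commands are only printed here, because running them requires `gsutil` and credentials.

```python
import shlex

def gsutil_cp_args(bucket_name, object_name, local_path, upload=True):
    """Build the argv for `gsutil cp`; upload=False swaps source and destination."""
    remote = 'gs://{}/{}'.format(bucket_name, object_name)
    src, dst = (local_path, remote) if upload else (remote, local_path)
    return ['gsutil', 'cp', src, dst]

up = gsutil_cp_args('my-bucket', 'to_upload.txt', '/tmp/to_upload.txt')
down = gsutil_cp_args('my-bucket', 'to_upload.txt', '/tmp/gsutil_download.txt', upload=False)
print(shlex.join(up))    # gsutil cp /tmp/to_upload.txt gs://my-bucket/to_upload.txt
print(shlex.join(down))  # gsutil cp gs://my-bucket/to_upload.txt /tmp/gsutil_download.txt
# To run one: import subprocess; subprocess.run(up, check=True)
```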

## Python API

These snippets are based on a <a href="https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py">larger example</a> that shows additional uses of the API.

First, we create a service client.

In [None]:
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Create a bucket in the project specified above.

In [None]:
# Use a different globally unique bucket name from the gsutil example above.
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

Done


Upload the file to the newly created bucket.

In [None]:
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name,
                                       name='to_upload.txt',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')

Upload complete


In [None]:
# @markdown Once the upload completes, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Download the file we just uploaded.

In [None]:
from googleapiclient.http import MediaIoBaseDownload

with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')

Download complete


Inspect the downloaded file.


In [None]:
!cat /tmp/downloaded_from_gcs.txt

my sample file
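`cat` confirms the content by eye. For a stricter check, compare the object's `md5Hash` field with a digest computed locally. In the JSON API object resource, `md5Hash` is a base64-encoded MD5 digest. Composite objects have no `md5Hash`, only `crc32c`.

This stdlib sketch uses hypothetical helpers `local_md5_base64` and `matches_remote`. The commented lines show an assumed way to fetch the remote value with the `gcs_service` client from above.

```python
import base64
import hashlib

def local_md5_base64(path, chunk_size=1 << 20):
    """Base64-encoded MD5 of a local file, in the same format as GCS md5Hash."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode('ascii')

def matches_remote(path, remote_md5):
    """True if the local file's digest equals the object's md5Hash value."""
    return local_md5_base64(path) == remote_md5

# remote_md5 = gcs_service.objects().get(bucket=bucket_name, object='to_upload.txt',
#                                        fields='md5Hash').execute()['md5Hash']
# print(matches_remote('/tmp/downloaded_from_gcs.txt', remote_md5))
```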