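# **Initial Colab setup**
`Kaggle` datasets are downloaded from this notebook onto `GoogleDrive`.

After the download, analysis is done in separate notebooks (`.ipynb`), one per task.

Each of those notebooks mounts `GoogleDrive` itself.

The steps below cover enabling the `Kaggle API`, mounting `GoogleDrive`, installing the `tree` package, downloading a `Kaggle` dataset, and linking to GitHub through the first commit.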

## **Installing the `Kaggle API`**

In [0]:
!pip install kaggle

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
The folder you are executing pip from can no longer be found.


## **Enabling the `Kaggle API`**

[Kaggle API with Colab](https://colab.research.google.com/drive/1eufc8aNCdjHbrBhuy7M7X6BGyzAyRbrF#scrollTo=5l1V_oxXsZ8l&forceEdit=true&sandboxMode=true)

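Before running the cell below, download `kaggle.json` and store it in `GoogleDrive`. Running the cell triggers the authentication prompt; once access is granted, `kaggle.json` is looked up in `GoogleDrive` and saved under `/root/.kaggle`.
The original code uses

`filename = "/content/.kaggle/kaggle.json"`

but this causes a lookup error when the API starts, so change it to

`filename = "/root/.kaggle/kaggle.json"`

before running.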


In [0]:
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

filename = "/root/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 0o600)  # octal mode: read/write for the owner only

Download 100%.
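As a quick sanity check after the download, the sketch below verifies that the saved file parses as JSON, contains the `username` and `key` fields of a Kaggle API token file, and is accessible only by its owner. The helper name `check_kaggle_credentials` is hypothetical and is not part of the `kaggle` package.

```python
import json
import os
import stat


def check_kaggle_credentials(path="/root/.kaggle/kaggle.json"):
    """Return a list of problems found with the credentials file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    # The token file is expected to be a JSON object with two fields.
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except ValueError:
        return [f"{path} is empty or not valid JSON"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")

    # Any group/other permission bit means the file is more open than 0o600.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions too open: {oct(mode)}")
    return problems
```

Calling `check_kaggle_credentials()` right after the download cell returns an empty list when the file is usable. It also catches the case where `kaggle.json` has been blanked out.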


## **Mounting `GoogleDrive`**
When `GoogleDrive` was mounted onto `Colab` first, the contents of `kaggle.json` on `GoogleDrive` were blanked out, so the mount is done after the `Kaggle API` has been set up.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


## **Installing the `tree` package**

Installed because it is handy for documenting the directory layout.

In [0]:
!apt-get install tree

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 1s (60.6 kB/s)
Selecting previously unselected package tree.
(Reading database ... 135004 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


## Directory layout

Everything under `My Drive` is the `GoogleDrive` mount point. Data saved there is
synced to `GoogleDrive`.

`Colab Notebooks`: notebooks

`datasets`: datasets

`setting`: configuration files

In [59]:
!tree -d ../../

../../
└── My Drive
    ├── private
    └── remote-colab
        ├── Colab Notebooks
        ├── datasets
        │   ├── kaggle
        │   │   └── titanic
        │   └── ml100knock
        │       ├── 10_Questionnaire_analysis
        │       ├── 1_web_order
        │       ├── 2_Retail_data
        │       ├── 3_Customer_information
        │       ├── 4_ Customer_behavior
        │       ├── 5_Customer_withdrawal
        │       ├── 6_Logistics_route
        │       ├── 7_ Logistics_network
        │       ├── 8_ Numerical_simulation
        │       └── 9_Potential_customer
        │           ├── img
        │           └── mov
        └── setting

21 directories
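The three top-level folders can also be created from Python. This is a minimal sketch, assuming the project root is passed in by the caller; `ensure_layout` and `LAYOUT` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Top-level folders described in the directory layout.
LAYOUT = ("Colab Notebooks", "datasets", "setting")


def ensure_layout(project_root):
    """Create the top-level project folders if missing and return their paths."""
    root = Path(project_root)
    folders = []
    for name in LAYOUT:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

For example, `ensure_layout("/content/drive/My Drive/remote-colab")` would create the folders on the mounted drive.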


## **Downloading a `Kaggle` dataset**

Datasets are stored under `./drive/My\ Drive/datasets/kaggle/{competition title}/`.

`> !kaggle competitions download -h`

```
usage: kaggle competitions download [-h] [-f FILE_NAME] [-p PATH] [-w] [-o]
                                    [-q]
                                    [competition]

optional arguments:
  -h, --help            show this help message and exit
  competition           Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")"
  -f FILE_NAME, --file FILE_NAME
                        File name, all files downloaded if not provided
                        (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to current working directory
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about the upload/download progress
```






In [0]:
!kaggle competitions download -c titanic -p ./drive/My\ Drive/githyb/datasets/kaggle/titanic/

Downloading gender_submission.csv to ./drive/My Drive/githyb/datasets/kaggle/titanic
  0% 0.00/3.18k [00:00<?, ?B/s]
100% 3.18k/3.18k [00:00<00:00, 438kB/s]
Downloading train.csv to ./drive/My Drive/githyb/datasets/kaggle/titanic
  0% 0.00/59.8k [00:00<?, ?B/s]
100% 59.8k/59.8k [00:00<00:00, 8.30MB/s]
Downloading test.csv to ./drive/My Drive/githyb/datasets/kaggle/titanic
  0% 0.00/28.0k [00:00<?, ?B/s]
100% 28.0k/28.0k [00:00<00:00, 6.26MB/s]
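The same download can be driven from Python. The sketch below only builds the argument list, using the `-c` and `-p` options from the command above; `competition_dir` and `download_command` are hypothetical helper names, and the datasets root is an assumption supplied by the caller.

```python
from pathlib import Path


def competition_dir(datasets_root, competition):
    """Return <datasets_root>/kaggle/<competition>, the per-competition folder."""
    return Path(datasets_root) / "kaggle" / competition


def download_command(datasets_root, competition):
    """Build the argument list for `kaggle competitions download`."""
    target = competition_dir(datasets_root, competition)
    return ["kaggle", "competitions", "download", "-c", competition, "-p", str(target)]
```

Passing the list to `subprocess.run(..., check=True)` avoids having to escape the space in `My Drive`, because no shell parses the path.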


## Linking with GitHub
GitHub and Colab are linked using a [personal token](https://help.github.com/ja/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) and an [HTTPS URL](https://help.github.com/ja/github/using-git/which-remote-url-should-i-use#cloning-with-https-urls-recommended).
[Reference](https://towardsdatascience.com/google-drive-google-colab-github-dont-just-read-do-it-5554d5824228)

Add the project to the module search path

In [0]:
import sys
from os.path import join

REPO_NAME = 'remote-colab'
PROJECT_PATH = '/content/drive/My Drive/'+ REPO_NAME + '/'
sys.path.append(PROJECT_PATH)
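Re-running the cell appends the same entry to `sys.path` again. A small guard avoids the duplicates; this is a sketch, and `add_to_sys_path` is a hypothetical helper name.

```python
import sys


def add_to_sys_path(path, search_path=None):
    """Append `path` to the module search path once; return True if it was added."""
    search_path = sys.path if search_path is None else search_path
    if path in search_path:
        return False
    search_path.append(path)
    return True
```

The optional `search_path` argument exists only so the helper can be tried on a plain list without touching the real `sys.path`.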

Import the settings file

In [0]:
from setting import personal_setting as PS
# PS.email_address = {'your setting e-mail address'}
# PS.personal_token = {'your token'}
# PS.user_name = {'your name'}
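The contents of `setting/personal_setting.py` are not shown here. One possible shape, sketched below, exposes the three attributes used in this notebook and reads their values from environment variables instead of hard-coding them. The variable names `GIT_EMAIL`, `GITHUB_TOKEN` and `GIT_USER_NAME` and the function `load_settings` are assumptions made for illustration.

```python
# setting/personal_setting.py (hypothetical contents)
import os


def load_settings(env=None):
    """Read the Git settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return {
        "email_address": env.get("GIT_EMAIL", ""),
        "personal_token": env.get("GITHUB_TOKEN", ""),
        "user_name": env.get("GIT_USER_NAME", ""),
    }


# Module-level attributes, so `PS.email_address` etc. keep working.
_settings = load_settings()
email_address = _settings["email_address"]
personal_token = _settings["personal_token"]
user_name = _settings["user_name"]
```

With this layout the file itself holds no secret, so committing it by accident does not expose the token.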

Create the clone URL and the project directory

In [0]:
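import os

GIT_PATH = "https://" + PS.personal_token + "@github.com/" + PS.user_name + "/" + REPO_NAME + ".git"
# Mask the token when printing so it does not end up in the saved notebook output
print("GIT_PATH: ", GIT_PATH.replace(PS.personal_token, "***"))

# Create the project directory (-p: no error if it already exists)
!mkdir -p "{PROJECT_PATH}"
# `!cd` runs in a subshell and does not persist, so change directory from Python
os.chdir(PROJECT_PATH)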
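The URL construction can be wrapped in a small helper that also produces a printable, masked form. This is a minimal sketch: `build_clone_url` and `mask_token` are hypothetical names, and percent-encoding the parts with `urllib.parse.quote` is an added precaution, not something the original cell does.

```python
from urllib.parse import quote


def build_clone_url(token, user_name, repo_name):
    """Return https://<token>@github.com/<user>/<repo>.git with each part URL-quoted."""
    return "https://{}@github.com/{}/{}.git".format(
        quote(token, safe=""), quote(user_name, safe=""), quote(repo_name, safe="")
    )


def mask_token(url, token):
    """Replace the token in `url` so the URL can be printed or logged."""
    if not token:
        return url
    return url.replace(quote(token, safe=""), "***")
```

The full URL goes only to `git clone`, and `mask_token(url, token)` is what gets printed.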

Clone

In [0]:
!git clone "{GIT_PATH}"

Stage the changes and commit

In [0]:
!git add .
!git config --global user.email "{PS.email_address}"
!git config --global user.name "{PS.user_name}"

In [79]:
!git commit -m 'some fixes'

[master 74ca7ee] some fixes
 33 files changed, 48206 insertions(+), 3 deletions(-)
 create mode 100644 .gitignore
 rewrite Colab Notebooks/setting.ipynb (97%)
 create mode 100644 "datasets/ml100knock/10_Questionnaire_analysis/10\347\253\240_\343\202\242\343\203\263\343\202\261\343\203\274\343\203\210\345\210\206\346\236\220\343\202\222\350\241\214\343\201\206\343\201\237\343\202\201\343\201\256\350\250\200\350\252\236\345\207\246\347\220\206\357\274\221\357\274\220\346\234\254\343\203\216\343\203\203\343\202\257.ipynb"
 create mode 100644 "datasets/ml100knock/10_Questionnaire_analysis/10\347\253\240_\343\202\242\343\203\263\343\202\261\343\203\274\343\203\210\345\210\206\346\236\220\343\202\222\350\241\214\343\201\206\343\201\237\343\202\201\343\201\256\350\250\200\350\252\236\345\207\246\347\220\206\357\274\221\357\274\220\346\234\254\343\203\216\343\203\203\343\202\257_answer.ipynb"
 create mode 100644 "datasets/ml100knock/1_web_order/1\347\253\240_\343\202\246\343\202\247\343\203\22

In [83]:
!git push origin master

Counting objects: 52, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (50/50), done.
Writing objects: 100% (52/52), 160.83 MiB | 7.73 MiB/s, done.
Total 52 (delta 18), reused 0 (delta 0)
remote: Resolving deltas: 100% (18/18), completed with 2 local objects.
To https://github.com/otompton/remote-colab.git
   cca94f8..74ca7ee  master -> master


In [82]:
!echo PS.email_address

PS.email_address


In [0]:
!cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	symlinks = false
[remote "origin"]
	url = https://669c6c60d72e3214cff5cbf0b7d326763346c308@github.com/otompton/remote-colab.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master


In [60]:

print(sys.path)

['', '/env/python', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.6/dist-packages/IPython/extensions', '/root/.ipython']


In [61]:
%pwd ~/

'/content/drive/My Drive/remote-colab'

In [0]:
import sys
sys.path.append('/content/drive/My Drive/remote-colab')

In [0]:
from setting import personal_setting as ps

In [64]:
ps.personal_token

'a9a6bb9b9e0783d960cf2f011c2a64fe9017f709'

In [65]:
repo_name = 'remote-colab'
print('/content/drive/My Drive/'+ repo_name)

/content/drive/My Drive/remote-colab


In [72]:
!tree -d ../

../
├── private
└── remote-colab
    ├── Colab Notebooks
    ├── datasets
    │   ├── kaggle
    │   │   └── titanic
    │   └── ml100knock
    │       ├── 10_Questionnaire_analysis
    │       ├── 1_web_order
    │       ├── 2_Retail_data
    │       ├── 3_Customer_information
    │       ├── 4_ Customer_behavior
    │       ├── 5_Customer_withdrawal
    │       ├── 6_Logistics_route
    │       ├── 7_ Logistics_network
    │       ├── 8_ Numerical_simulation
    │       └── 9_Potential_customer
    │           ├── img
    │           └── mov
    └── setting
        └── __pycache__

21 directories


In [85]:
!git rm --cached setting/personal_setting.py

rm 'setting/personal_setting.py'
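`git rm --cached` removes the file from the index but leaves it on disk, so a later `git add .` would stage it again unless it is ignored. The sketch below appends an entry to `.gitignore` only if it is not already listed; `add_to_gitignore` is a hypothetical helper name.

```python
from pathlib import Path


def add_to_gitignore(repo_root, entry):
    """Append `entry` to <repo_root>/.gitignore unless already listed.

    Returns True if the file was changed.
    """
    gitignore = Path(repo_root) / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new entry starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    gitignore.write_text(text + entry + "\n", encoding="utf-8")
    return True
```

For example, `add_to_gitignore(".", "setting/personal_setting.py")` run from the repository root. Commits that were already pushed still contain the file, so a token stored in it should be treated as exposed.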
