<a href="https://colab.research.google.com/github/sn0422j/notebook/blob/master/Pytorch_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch Basics


**1 - Getting started with PyTorch! Six basics you should know about the fast-growing PyTorch**

https://www.codexa.net/pytorch-python/

Feature 1: operations similar to NumPy

Feature 2: an active community, especially outside Japan

Feature 3: dynamic computation graphs
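Features 1 and 3 can be illustrated with a short sketch (our own example, not code from the linked article). Tensors convert to and from NumPy arrays and support NumPy-style slicing. Because PyTorch records the computation graph while the code runs, ordinary Python control flow such as an `if` decides which operations autograd differentiates. The function `f` is a hypothetical example.

```python
import numpy as np
import torch

# Feature 1: NumPy-like operations and cheap conversion in both directions
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)          # shares memory with a (no copy)
print(t[:, 1])                   # NumPy-style slicing: tensor([1., 4.])
print((t * 2).numpy())           # result converted back to a NumPy array

# Feature 3: the graph is built as the code runs, so Python control flow works
def f(x):
    # hypothetical function: which branch is recorded depends on the runtime value of x
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
f(x).backward()
print(x.grad)                    # gradient of sum(x**2) is 2*x: tensor([2., 4.])
```

Calling `f` with a negative input would record the `x ** 3` branch instead, and the gradient would be `3 * x**2`.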

In [0]:
# Install (or upgrade) PyTorch on Google Colab
!pip install -U torch torchvision

Requirement already up-to-date: torch in /usr/local/lib/python3.6/dist-packages (1.3.0+cu100)
Requirement already up-to-date: torchvision in /usr/local/lib/python3.6/dist-packages (0.4.1+cu100)


In [0]:
import torch
print(torch.__version__)

1.3.0+cu100


① Basic PyTorch operations

In [0]:
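# Create tensors
# torch.Tensor(2, 2) allocates a 2x2 float tensor without initializing it, so its values are arbitrary
x = torch.Tensor(2, 2)
print('x :',x)

x2_list = [[1,2,3],[4,5,6]]
x2 = torch.Tensor(x2_list)
print('x2 :',x2)

# Check the size
print('x2 size :', x2.size())

# Generate random numbers
print('uniform random [0, 1):', torch.rand(2,2))
print('standard normal random:', torch.randn(2,2))

# Other ways to create tensors
print('identity matrix:', torch.eye(3,3))
print('uninitialized tensor:', torch.empty(4,1))
print('evenly spaced values:', torch.linspace(0, 100, 11))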
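A few details of the constructors above are easy to miss. The sketch below assumes the default dtype has not been changed with `torch.set_default_dtype`. It shows that `torch.Tensor` yields float32 while the lowercase `torch.tensor` infers the dtype from the data, that `torch.zeros` is the initialized counterpart of `torch.empty`, and that `torch.manual_seed` makes `torch.rand` repeatable.

```python
import torch

# torch.Tensor uses the default dtype (float32); torch.tensor infers it from the data
a = torch.Tensor([[1, 2, 3], [4, 5, 6]])
b = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(a.dtype, b.dtype)          # torch.float32 torch.int64

# torch.empty only allocates memory; torch.zeros also initializes it
z = torch.zeros(4, 1)

# size() and the shape attribute return the same torch.Size
print(a.size() == a.shape)       # True

# Seeding the generator makes torch.rand repeatable
torch.manual_seed(0)
r1 = torch.rand(2, 2)
torch.manual_seed(0)
r2 = torch.rand(2, 2)
print(torch.equal(r1, r2))       # True

# linspace(0, 100, 11): 11 evenly spaced points from 0 to 100 inclusive (step 10)
lin = torch.linspace(0, 100, 11)
print(lin)
```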

In [0]:
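# Basic arithmetic
x = torch.Tensor([[2, 2], [1, 1]])
y = torch.Tensor([[3, 2], [1, 2]])

print('addition:', x + y)
print('subtraction:', x - y)
print('Hadamard (element-wise) product:', torch.mul(x,y))
print('matrix product:', torch.mm(x,y))
print('sum of all elements:', torch.sum(x))
print('standard deviation of all elements (unbiased):', torch.std(x))
print('mean of all elements:', torch.mean(x))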
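These operations also have operator shorthands. The sketch below checks that `*` matches `torch.mul` and `@` matches `torch.mm`. It also shows that a true inner (dot) product is `torch.dot` on 1-D tensors, whereas `torch.mm` computes a matrix product, and compares the default sample standard deviation with the population version. The vectors `u` and `v` are illustrative.

```python
import torch

x = torch.tensor([[2.0, 2.0], [1.0, 1.0]])
y = torch.tensor([[3.0, 2.0], [1.0, 2.0]])

# Operator shorthands for the function calls in the cell above
print(torch.equal(x * y, torch.mul(x, y)))   # element-wise (Hadamard) product: True
print(torch.equal(x @ y, torch.mm(x, y)))    # matrix product: True
print(x @ y)                                 # [[8., 8.], [4., 4.]]

# A true inner (dot) product is defined on 1-D tensors
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])
print(torch.dot(u, v))                       # tensor(32.)

# torch.std divides by N - 1 by default (sample standard deviation)
print(torch.std(x))                          # about 0.5774
print(((x - x.mean()) ** 2).mean().sqrt())   # population std: tensor(0.5000)
```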

② Linear regression with PyTorch

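As a PyTorch coding exercise, let's use review data for instant ramen sold around the world. We will predict the rating from features such as the brand, the country of origin, and the product style (cup noodles or packaged noodles).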

https://www.kaggle.com/residentmario/ramen-ratings

In [0]:
# Uncomment to upload your Kaggle API token (kaggle.json) to the Colab session
#from google.colab import files
#files.upload()

In [0]:
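# Uncomment to move the token where the Kaggle CLI expects it, install the CLI, and restrict the token's permissions
#!mkdir -p ~/.kaggle
#!mv kaggle.json ~/.kaggle/
#!pip install kaggle
#!chmod 600 /root/.kaggle/kaggle.json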

In [0]:
# Uncomment to download the dataset
#!kaggle datasets download -d residentmario/ramen-ratings

In [0]:
# Uncomment to extract the downloaded archive
#import zipfile
#with zipfile.ZipFile('ramen-ratings.zip') as existing_zip:
    #existing_zip.extractall()

In [0]:
!ls

ramen-ratings.csv  ramen-ratings.zip  sample_data


In [0]:
# Setup
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 
import torch.nn as nn
from sklearn.metrics import mean_squared_error

ramen = pd.read_csv('ramen-ratings.csv')
ramen.head() # show the first 5 rows

| | Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 | |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 | |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 | |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 | |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 3.75 | |


In [0]:
# Exploratory data analysis
print('data size :',ramen.shape)
print('data by country :')
print(ramen['Country'].value_counts()[0:10])

data size : (2580, 7)
data by country :
Japan          352
USA            323
South Korea    309
Taiwan         224
Thailand       191
China          169
Malaysia       156
Hong Kong      137
Indonesia      126
Singapore      109
Name: Country, dtype: int64


In [0]:
# Exclude ramen without a rating ('Unrated')
mask = ramen.index[ramen['Stars'] == 'Unrated']
ramen = ramen.drop(index = mask)
ramen.shape

(2577, 7)

In [0]:
# Check the data type of Stars
print(ramen['Stars'].dtype)
# Convert to float
ramen['Stars'] = ramen['Stars'].astype(float)
# Check the data type again
print(ramen['Stars'].dtype)

object
float64
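Dropping `'Unrated'` and casting to float can also be done in one step with `pd.to_numeric(..., errors='coerce')`, which turns every non-numeric entry into NaN so those rows can be dropped. A minimal sketch on a made-up DataFrame `df` (not the real CSV):

```python
import pandas as pd

# Toy stand-in for ramen-ratings.csv (values are made up for illustration)
df = pd.DataFrame({"Brand": ["A", "B", "C", "D"],
                   "Stars": ["3.75", "Unrated", "1", "4.5"]})

# Non-numeric entries ("Unrated" here) become NaN, then those rows are dropped
df["Stars"] = pd.to_numeric(df["Stars"], errors="coerce")
df = df.dropna(subset=["Stars"])

print(df.shape)           # (3, 2)
print(df["Stars"].dtype)  # float64
```

Unlike the `== 'Unrated'` mask, this also catches any other non-numeric string, which would otherwise make `astype(float)` raise an error.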


In [0]:
# Drop unneeded columns (only Brand, Style and Country are used)
ramen = ramen.drop(columns=['Review #', 'Top Ten', 'Variety'])
ramen.head()

| | Brand | Style | Country | Stars |
|---|---|---|---|---|
| 0 | New Touch | Cup | Japan | 3.75 |
| 1 | Just Way | Pack | Taiwan | 1.0 |
| 2 | Nissin | Cup | USA | 2.25 |
| 3 | Wei Lih | Pack | Taiwan | 2.75 |
| 4 | Ching's Secret | Pack | India | 3.75 |


In [0]:
# One-hot encode the features as dummy variables
Country = pd.get_dummies(ramen['Country'], prefix='Country', drop_first=True)
Brand = pd.get_dummies(ramen['Brand'], prefix='Brand', drop_first=True)
Style = pd.get_dummies(ramen['Style'], prefix='Style', drop_first=True)
# Concatenate the dummy-encoded features
ramendf = pd.concat([Country, Brand, Style], axis=1)
# Check the result
ramendf.head()

Unnamed: 0,Country_Bangladesh,Country_Brazil,Country_Cambodia,Country_Canada,Country_China,Country_Colombia,Country_Dubai,Country_Estonia,Country_Fiji,Country_Finland,Country_Germany,Country_Ghana,Country_Holland,Country_Hong Kong,Country_Hungary,Country_India,Country_Indonesia,Country_Japan,Country_Malaysia,Country_Mexico,Country_Myanmar,Country_Nepal,Country_Netherlands,Country_Nigeria,Country_Pakistan,Country_Philippines,Country_Poland,Country_Sarawak,Country_Singapore,Country_South Korea,Country_Sweden,Country_Taiwan,Country_Thailand,Country_UK,Country_USA,Country_United States,Country_Vietnam,Brand_7 Select,Brand_7 Select/Nissin,Brand_A-One,...,Brand_Unif Tung-I,Brand_Unif-100,Brand_United,Brand_Unox,Brand_Unzen,Brand_Urban Noodle,Brand_Ve Wong,Brand_Vedan,Brand_Vifon,Brand_Vina Acecook,Brand_Vit's,Brand_Wai Wai,Brand_Wang,Brand_Weh Lih,Brand_Wei Chuan,Brand_Wei Lih,Brand_Wei Wei,Brand_Westbrae,Brand_Western Family,Brand_World O' Noodle,Brand_Wu Mu,Brand_Wu-Mu,Brand_Wugudaochang,Brand_Xiao Ban Mian,Brand_Xiuhe,Brand_Yamachan,Brand_Yamadai,Brand_Yamamori,Brand_Yamamoto,Brand_Yum Yum,Brand_Yum-Mie,Brand_Zow Zow,Brand_iMee,Brand_iNoodle,Style_Bowl,Style_Box,Style_Can,Style_Cup,Style_Pack,Style_Tray
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
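`drop_first=True` removes the first category (in sorted order) of each column, so that category is represented by all-zero dummies. This avoids the dummy-variable trap: with every level kept, the dummies of one column always sum to 1 and duplicate the bias term that `nn.Linear` already has. A small illustration on a hypothetical `style` Series:

```python
import pandas as pd

# Hypothetical Style column with three categories
style = pd.Series(["Cup", "Pack", "Bowl", "Pack"], name="Style")

full = pd.get_dummies(style, prefix="Style")
reduced = pd.get_dummies(style, prefix="Style", drop_first=True)

print(list(full.columns))     # ['Style_Bowl', 'Style_Cup', 'Style_Pack']
print(list(reduced.columns))  # ['Style_Cup', 'Style_Pack']
print(reduced.astype(int))    # the "Bowl" row (index 2) is all zeros
```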


In [0]:
# Split into features (X) and target (y)
X = np.array(ramendf, dtype=np.float32) 
y = np.array(ramen[['Stars']], dtype=np.float32)

In [0]:
y.shape

(2577, 1)

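Linear regression is implemented with nn.Linear(). We use MSELoss() (mean squared error) as the loss function and torch.optim.SGD (stochastic gradient descent) as the optimizer.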

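・Set the number of iterations (epochs) to 1000

・Convert the features and the target from NumPy arrays to tensors with from_numpy()

・Compute the predictions (outputs) and compare them with the actual values (targets) using the cost function to obtain the cost

・Run backpropagation and update the parameters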
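The steps listed above can be sketched end to end on synthetic data (an assumption for illustration; the true weights and noise level are made up) so you can check that the model recovers known coefficients. Two deliberate differences from the notebook cell below: the NumPy-to-tensor conversion is done once before the loop, since `from_numpy` shares memory and the data never change, and the learning rate is 0.1 rather than 0.5, a conservative choice for this synthetic data.

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic stand-in for the ramen features (made-up data for illustration):
# y = 2*x0 - 1*x1 + 0.5*x2 + 3 + small noise
rng = np.random.default_rng(0)
X = rng.random((200, 3), dtype=np.float32)
true_w = np.array([[2.0], [-1.0], [0.5]], dtype=np.float32)
noise = 0.01 * rng.standard_normal((200, 1))
y = (X @ true_w + 3.0 + noise).astype(np.float32)

torch.manual_seed(0)
model = nn.Linear(X.shape[1], y.shape[1])
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Convert once, outside the loop: from_numpy shares memory and the data never change
inputs, targets = torch.from_numpy(X), torch.from_numpy(y)

losses = []
for epoch in range(1000):                    # 1000 epochs of full-batch updates
    outputs = model(inputs)                  # predictions
    cost = loss_fn(outputs, targets)         # compare with the actual values
    optimizer.zero_grad()                    # clear gradients from the previous step
    cost.backward()                          # backpropagation
    optimizer.step()                         # update weight and bias
    losses.append(cost.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
print("learned weights:", model.weight.detach().numpy().ravel())
print("learned bias:", model.bias.item())
```

Because every epoch uses the whole dataset, this "SGD" is effectively plain (full-batch) gradient descent.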

In [0]:
# Linear regression model
model = nn.Linear(X.shape[1], y.shape[1])
# Loss function (mean squared error)
loss = nn.MSELoss()
# Optimizer (stochastic gradient descent)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

In [0]:
for epoch in range(1000):
    # Stage 1: convert the NumPy arrays to tensors
    inputs = torch.from_numpy(X)
    targets = torch.from_numpy(y)
    
    # Stage 2: compute the predictions and the error (cost)
    outputs = model(inputs)
    cost = loss(outputs, targets)
    
    # Stage 3: backpropagation and parameter update
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    # Stage 4: print the cost every 100 epochs
    if (epoch+1) % 100 == 0:
        print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, 1000, cost.item()))

Epoch [100/1000], Loss: 0.7868
Epoch [200/1000], Loss: 0.7394
Epoch [300/1000], Loss: 0.7106
Epoch [400/1000], Loss: 0.6904
Epoch [500/1000], Loss: 0.6753
Epoch [600/1000], Loss: 0.6634
Epoch [700/1000], Loss: 0.6537
Epoch [800/1000], Loss: 0.6455
Epoch [900/1000], Loss: 0.6386
Epoch [1000/1000], Loss: 0.6327


In [0]:
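# Output predictions (detach() drops gradient tracking before converting to NumPy)
y_pred = model(torch.from_numpy(X)).detach().numpy()
print(y_pred)
# Note: this MSE is computed on the training data itself (there is no hold-out split)
print('MSE score :',mean_squared_error(y, y_pred))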

[[4.1026893]
 [2.9811954]
 [3.4563901]
 ...
 [2.9588928]
 [2.9588928]
 [2.3429844]]
MSE score : 0.6325946
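The printed MSE is measured on the same data the model was trained on. The sketch below uses a tiny hypothetical model and random data in place of the trained model and the ramen features. It predicts under `torch.no_grad()`, which avoids building a graph so no `.data` or `.detach()` is needed. It then confirms that `nn.MSELoss` is the mean of squared errors (the quantity sklearn's `mean_squared_error` reports for a single target) and computes a constant-mean baseline. That baseline's MSE equals the population variance of the target and is a useful reference point for judging a score such as 0.63.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-ins for the trained model and the ramen data
torch.manual_seed(0)
model = nn.Linear(3, 1)
X = torch.rand(10, 3)
y = torch.rand(10, 1)

# Inference without building a computation graph
with torch.no_grad():
    y_pred = model(X)

# nn.MSELoss (default reduction='mean') is the mean of squared errors
mse_torch = nn.MSELoss()(y_pred, y).item()
mse_manual = float(np.mean((y.numpy() - y_pred.numpy()) ** 2))

# Baseline: always predict the mean of y; its MSE equals the population variance of y
y_np = y.numpy()
baseline = float(np.mean((y_np - y_np.mean()) ** 2))

print(f"model MSE {mse_torch:.4f}, manual {mse_manual:.4f}, mean baseline {baseline:.4f}")
```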
