<a href="https://colab.research.google.com/github/EAakiyama3104/python_lecture/blob/master/%5BPython%E8%AC%9B%E5%BA%A7%5D%E7%AC%AC3%E5%9B%9EAPI%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%9F%E3%83%87%E3%83%BC%E3%82%BF%E5%8F%8E%E9%9B%86.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

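What you will learn

* How to call the YouTube Data API
* How to collect the API responses into an easy-to-use format
* How to visualize the data
* How to analyze the data for a single channel
* How to analyze the data for multiple channels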




Setup

In [0]:
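# Download a Japanese font.
!apt-get -y install fonts-ipafont-gothic

# Delete matplotlib's font cache so that the new font is picked up.
!rm -f /root/.cache/matplotlib/fontList.json  # old cache file name
!rm -f /root/.cache/matplotlib/fontlist-v310.json  # the cache file that needs to be deleted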

Note: restart the runtime so that the newly installed font is loaded.

In [0]:
import matplotlib
matplotlib.get_cachedir()

In [0]:
!ls /root/.cache/matplotlib

We use the YouTube Data API.

[YouTube Data API overview](https://developers.google.com/youtube/v3/getting-started)

1. Create a Google account.
2. Create a project in the [Google API Console](https://console.developers.google.com).
3. Click "Enable APIs and Services", search for "YouTube Data API", and select "YouTube Data API v3".
4. Click "Enable".
5. On the "Credentials" tab, click "Create credentials" and select "API key".
6. Copy the API key that is displayed.








In [0]:
# Set your API key below.
API_KEY = "Enter the API key you created."
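Hard-coding the key is fine for a private notebook, but a key pasted into a shared notebook is easy to leak. A minimal sketch of one alternative, assuming the key has been stored in an environment variable beforehand; the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook:

```python
import os

# Hypothetical environment variable name; use whatever suits your setup.
API_KEY_ENV_VAR = "YOUTUBE_API_KEY"


def load_api_key(env_var=API_KEY_ENV_VAR):
    """Return the API key stored in the given environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return api_key
```

With the variable set, `API_KEY = load_api_key()` could replace the assignment above.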

In [0]:
import apiclient

import pandas as pd
import seaborn as sns

In [0]:
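# Plot settings. Everything goes into a single sns.set() call: a later
# sns.set(font=...) on its own would reset the style, palette, and context to their defaults.
sns.set(style="whitegrid", palette="husl", context="notebook", font="IPAGothic")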

In [0]:
youtube_service = apiclient.discovery.build("youtube", "v3", developerKey=API_KEY)

Analyzing the data for a single group

In [0]:
CHANNEL_ID = "UCDwcZ85zjLKD-3-jqlv1wQQ"

In [0]:
channel_response = youtube_service.channels().list(part="contentDetails", id=CHANNEL_ID).execute()
channel_response

In [0]:
upload_playlist = channel_response["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
upload_playlist
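The lookup above assumes the response always contains at least one item. A minimal sketch of a more defensive version follows; the helper name `get_uploads_playlist_id` is hypothetical, and the sample response is hand-written with made-up IDs to mirror the nesting used above:

```python
def get_uploads_playlist_id(channel_response):
    """Extract the uploads playlist ID from a channels().list response."""
    items = channel_response.get("items", [])
    if not items:
        raise ValueError("No channel found; check the channel ID.")
    return items[0]["contentDetails"]["relatedPlaylists"]["uploads"]


# Hand-written sample shaped like the response above (the IDs are made up).
sample_response = {
    "kind": "youtube#channelListResponse",
    "items": [
        {
            "id": "UC_example_channel",
            "contentDetails": {"relatedPlaylists": {"uploads": "UU_example_uploads"}},
        }
    ],
}
print(get_uploads_playlist_id(sample_response))
```

Raising a clear error here makes a mistyped channel ID easier to spot than an `IndexError` or `KeyError` deep inside the lookup.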

In [0]:
# Per-video statistics, stored as one list per column.
channel_stats = {
    "video_id": [],
    "title": [],
    "view_count": [],
    "comment_count": [],
    "like_count": [],
    "dislike_count": [],
}

page_token = None
while True:
  # Fetch one page (at most 50 items) of the uploads playlist.
  playlist_items_response = youtube_service.playlistItems().list(part="snippet", playlistId=upload_playlist, maxResults=50, pageToken=page_token).execute()
  items = playlist_items_response["items"]
  video_ids = [item["snippet"]["resourceId"]["videoId"] for item in items]
  # Fetch the statistics for every video on this page in a single request.
  videos_response = youtube_service.videos().list(part="statistics,snippet", id=",".join(video_ids)).execute()
  videos_stats = videos_response["items"]
  for stat in videos_stats:
    # Counts arrive as strings and may be missing; missing ones are stored as None.
    channel_stats["video_id"].append(stat["id"])
    channel_stats["title"].append(stat["snippet"]["title"])
    channel_stats["view_count"].append(int(stat["statistics"].get("viewCount")) if stat["statistics"].get("viewCount") else None)
    channel_stats["comment_count"].append(int(stat["statistics"].get("commentCount")) if stat["statistics"].get("commentCount") else None)
    channel_stats["like_count"].append(int(stat["statistics"].get("likeCount")) if stat["statistics"].get("likeCount") else None)
    channel_stats["dislike_count"].append(int(stat["statistics"].get("dislikeCount")) if stat["statistics"].get("dislikeCount") else None)
  # Continue with the next page until the response has no nextPageToken.
  if "nextPageToken" not in playlist_items_response:
    break
  page_token = playlist_items_response["nextPageToken"]
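The loop above follows `nextPageToken` until the API stops returning one. The same pattern can be isolated in a small generator, sketched here against a stand-in fetch function so it runs without network access. The names `iter_pages`, `fake_fetch`, and `FAKE_PAGES` are hypothetical, and the page contents are made up:

```python
def iter_pages(fetch_page):
    """Yield every page returned by fetch_page, following nextPageToken."""
    page_token = None
    while True:
        response = fetch_page(page_token)
        yield response
        page_token = response.get("nextPageToken")
        if page_token is None:
            break


# Stand-in for playlistItems().list(...).execute(); the pages are made up.
FAKE_PAGES = {
    None: {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c"], "nextPageToken": "p3"},
    "p3": {"items": ["d"]},
}


def fake_fetch(page_token):
    return FAKE_PAGES[page_token]


all_items = [item for page in iter_pages(fake_fetch) for item in page["items"]]
print(all_items)  # ['a', 'b', 'c', 'd']
```

With the real service, `fetch_page` could be a lambda that passes its token to `youtube_service.playlistItems().list(..., pageToken=token).execute()`.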

In [0]:
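# Convert to a pandas DataFrame, dropping videos that are missing any statistic.
# Note: since December 2021 the API returns dislikeCount only to the video's owner,
# so with an API key dislike_count may be None for every video, and dropna() would then remove every row.
channel_df = pd.DataFrame(channel_stats).dropna()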

In [0]:
channel_df["comment_ratio"] = channel_df["comment_count"] / channel_df["view_count"] * 100
channel_df["like_ratio"] = channel_df["like_count"] / channel_df["view_count"] * 100
channel_df["dislike_ratio"] = channel_df["dislike_count"] / channel_df["view_count"] * 100
channel_df["like_dislike_ratio"] = channel_df["like_count"] / (channel_df["like_count"] + channel_df["dislike_count"]) * 100
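Each ratio above is a percentage: the first three divide a count by the view count, and the last one is the share of likes among all ratings. A minimal sketch of the same arithmetic for a single video, with an explicit guard for a zero denominator; the helper name `percentage` is hypothetical and the counts are made up:

```python
def percentage(numerator, denominator):
    """Return numerator / denominator * 100, or None if it is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator * 100


# Made-up counts for a single video.
video = {"view_count": 20000, "comment_count": 50, "like_count": 900, "dislike_count": 100}

comment_ratio = percentage(video["comment_count"], video["view_count"])
like_ratio = percentage(video["like_count"], video["view_count"])
like_dislike_ratio = percentage(video["like_count"], video["like_count"] + video["dislike_count"])
print(comment_ratio, like_ratio, like_dislike_ratio)
```

In the DataFrame version, a video with zero views or zero ratings yields NaN or infinity instead of None, which is worth keeping in mind when reading the plots.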

Show the videos in descending order of view count

In [0]:
channel_df.sort_values(by="view_count", ascending=False).head(15)

Ratio of likes to dislikes

Use distplot for the histogram (in seaborn 0.11 and later, histplot is its replacement)

In [0]:
_ = sns.distplot(channel_df.like_dislike_ratio.dropna())

Correlations between metrics

Use pairplot to see every pair of metrics at once

In [0]:
_ = sns.pairplot(channel_df, diag_kind="kde")

Use jointplot to look more closely at a pair of metrics that caught your attention

In [0]:
_ = sns.jointplot(x="comment_ratio", y="like_ratio", data=channel_df)
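A joint plot shows the relationship visually; a correlation coefficient puts a single number on it. A minimal sketch of the Pearson coefficient written out by hand, so that each step of the formula is visible. The function name `pearson_correlation` is hypothetical and the ratios are made up:

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two sequences of the same length (at least 2 values).")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of deviations from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined when one sequence is constant.")
    return covariance / (spread_x * spread_y)


# Made-up ratios for five videos.
comment_ratios = [0.10, 0.20, 0.15, 0.30, 0.25]
like_ratios = [1.0, 2.1, 1.4, 3.2, 2.4]
print(round(pearson_correlation(comment_ratios, like_ratios), 3))
```

On the real data, `channel_df["comment_ratio"].corr(channel_df["like_ratio"])` computes the same coefficient with pandas.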

Comparing the data for multiple groups

In [0]:
YOUTUBE_CHANNELS = (
    "UCoKXb95K5h3sME3c9OCBaeA", # モーニング娘。 ’19
    "UCxjXU89x6owat9dA8Z-bzdw", # AKB48
    # "UCDwcZ85zjLKD-3-jqlv1wQQ", # アンジュルム
    # "UC6FadPgGviUcq6VQ0CEJqdQ", # Juice=Juice
    # "UCoxHJjctNXq1UgGk1vx3LUw", # カントリー・ガールズ
    # "UCt3f_Tu1lNua1xLQZu2Td-w", # こぶしファクトリー
    # "UCXTsCXNGHmePgo3a47hnsAA", # つばきファクトリー
    # "UCE5GP4BHm2EJx4xyxBVSLlg", # BEYOOOOONDS
    "UCzgEXy7eTK9EyhUbU0DZ_Lg", # 26時のマスカレイド
    "UCv7VutirxDn3RWIJXI68n_A", # =LOVE
    "UC6YNWTm6zuMFsjqd0PO3G-Q", # ももいろクローバーZ
    "UCUzpZpX2wRYOk3J8QTFGxDg", # 乃木坂46
    "UCmr9bYmymcBmQ1p2tLBRvwg", # 欅坂46
    "UCmo55h1NKPRnxiU21b4AHFg", # 虹のコンキスタドール
    "UC29nPPYXVGIhE3ZDMzLSbPQ", # フィロソフィーのダンス
    "UCfVA-R7NDd1QCuCtvWEusqQ", # アップアップガールズ
    "UClrYrddWLPz-18PyGK_ADPg", # でんぱ組.inc
    # "UCe_oTYByLWQYCUmgmOMU_xw", # IZ*ONE
    # "UCCRb6nYKaT8tzLA8CwDdUtw", # TWICE
    "UClhL_woWoXnmKvv2H8_mxwg", # 神宿
)

In [0]:
channel_dfs = []
for channel_id in YOUTUBE_CHANNELS:
  # Look up the playlist that holds all of the channel's uploads.
  channel_response = youtube_service.channels().list(part="contentDetails", id=channel_id).execute()
  upload_playlist = channel_response["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
  # Per-video statistics, stored as one list per column.
  channel_stats = {
      "video_id": [],
      "title": [],
      "channel_id": [],
      "channel_title": [],
      "view_count": [],
      "comment_count": [],
      "like_count": [],
      "dislike_count": [],
  }

  page_token = None
  while True:
    # Fetch one page (at most 50 items) of the uploads playlist.
    playlist_items_response = youtube_service.playlistItems().list(part="snippet", playlistId=upload_playlist, maxResults=50, pageToken=page_token).execute()
    items = playlist_items_response["items"]
    video_ids = [item["snippet"]["resourceId"]["videoId"] for item in items]
    # Fetch the statistics for every video on this page in a single request.
    videos_response = youtube_service.videos().list(part="statistics,snippet", id=",".join(video_ids)).execute()
    videos_stats = videos_response["items"]
    for stat in videos_stats:
      channel_stats["video_id"].append(stat["id"])
      channel_stats["title"].append(stat["snippet"]["title"])
      channel_stats["channel_id"].append(channel_id)
      channel_stats["channel_title"].append(stat["snippet"]["channelTitle"])
      channel_stats["view_count"].append(int(stat["statistics"].get("viewCount")) if stat["statistics"].get("viewCount") else None)
      channel_stats["comment_count"].append(int(stat["statistics"].get("commentCount")) if stat["statistics"].get("commentCount") else None)
      channel_stats["like_count"].append(int(stat["statistics"].get("likeCount")) if stat["statistics"].get("likeCount") else None)
      channel_stats["dislike_count"].append(int(stat["statistics"].get("dislikeCount")) if stat["statistics"].get("dislikeCount") else None)
    # Continue with the next page until the response has no nextPageToken.
    if "nextPageToken" not in playlist_items_response:
      break
    page_token = playlist_items_response["nextPageToken"]
  # Build this channel's DataFrame and add the ratio columns (in percent).
  channel_df = pd.DataFrame(channel_stats).dropna()
  channel_df["comment_ratio"] = channel_df["comment_count"] / channel_df["view_count"] * 100
  channel_df["like_ratio"] = channel_df["like_count"] / channel_df["view_count"] * 100
  channel_df["dislike_ratio"] = channel_df["dislike_count"] / channel_df["view_count"] * 100
  channel_df["like_dislike_ratio"] = channel_df["like_count"] / (channel_df["like_count"] + channel_df["dislike_count"]) * 100
  channel_dfs.append(channel_df)

In [0]:
master_df = pd.concat(channel_dfs)

In [0]:
master_df.sort_values(by="view_count", ascending=False).head(10)

Use boxplot to compare multiple groups

Compare the ratio of likes to dislikes by channel

In [0]:
_ = sns.boxplot(y="channel_title", x="like_dislike_ratio", data=master_df, orient="h", showfliers=False)

Compare the comment ratio by channel

In [0]:
_ = sns.boxplot(y="channel_title", x="comment_ratio", data=master_df, orient="h", showfliers=False)
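The line inside each box marks the median, so the visual comparison can be backed by a small table of per-channel medians. A minimal sketch on made-up data with the same column names as `master_df`; the frame `toy_df` and its values are hypothetical:

```python
import pandas as pd

# Made-up data with the same column names as master_df.
toy_df = pd.DataFrame({
    "channel_title": ["A", "A", "A", "B", "B", "B"],
    "comment_ratio": [0.10, 0.20, 0.30, 0.50, 0.70, 0.60],
})

# Median comment ratio per channel, highest first.
median_by_channel = (
    toy_df.groupby("channel_title")["comment_ratio"]
    .median()
    .sort_values(ascending=False)
)
print(median_by_channel)
```

Replacing `toy_df` with `master_df` would give the ranking that the boxplot shows graphically.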

<Example interpretation>
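A higher comment ratio suggests a larger share of dedicated fans.
A group with a large share of dedicated fans still has relatively few casual viewers, so it likely has more room to grow in popularity.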