#Neural Network Embedding Recommendation System
1. Load in data and clean
2. Prepare data for supervised machine learning task
3. Build the entity embedding neural network
4. Train the neural network on prediction task
5. Extract embeddings and find most similar books and wikilinks
6. Visualize the embeddings using dimension reduction techniques

## 1. Read Data & Clean
: 모든 책의 데이터는 json으로 저장되어 있음. 해당 데이터는 위키피디아의 모든 책에 대한 기사가 포함되어 있음

1-1. 데이터 불러오기

In [None]:
from IPython.core.interactiveshell import InteractiveShell

# Set shell to show all lines of output
# jupyter notebook에서 모든 ourput 나타내기
InteractiveShell.ast_node_interactivity = 'all'

In [None]:
# import os
# print(os.listdir("./input"))

In [None]:
# utils import get_file == url -> file download
import tensorflow as tf
# from keras.utils import get_file

x = tf.keras.utils.get_file('found_books_filtered.ndjson', 'https://raw.githubusercontent.com/WillKoehrsen/wikipedia-data-science/master/data/found_books_filtered.ndjson')

import json

books = []

with open(x, 'r') as fin: # 'r' 읽기용으로 파일 열기
    # Append each line to the books
    books = [json.loads(l) for l in fin]

# Remove non-book articles
books_with_wikipedia = [book for book in books if 'Wikipedia:' in book[0]]
books = [book for book in books if 'Wikipedia:' not in book[0]]
print(f'Found {len(books)} books.')


1-2. 데이터 전처리
- 책에 관한 모든 페이지를 검색해 책의 제목, 기본 정보, 다른 위키피디아 페이지(위키링크)를 가리키는 링크, 외부 사이트 링크를 저장
- 추천 시스템을 만들기 위해 필요한 정보는 제목과 위키링크 두 가지입니다
- 일부 기사에는 책에대한 기사가 아닌 것들을 잡아낸다.

In [None]:
# book list 내부의 book_with_wikipedia 일부 가져오기
[book[0] for book in books_with_wikipedia][:3]



In [None]:
# 제목, 'infoboxs book'의 정보, 위키피디아 링크, 외부링크, 최종수정날짜, 기사의 문자 수
# title, information from 'infobos book' template, wikipedia links, externel links, the date of last edit, the number of characters in article
n = 21
books[n][0], books[n][1], books[n][2][:5], books[n][:5], books[n][:5], books[n][4], books[n][5]

In [None]:
# 책정보 정수로 변경하기 # index 
book_index = {book[0] : idx for idx, book in enumerate(books)} #enumerate : 인덱스, 원소로 이루어진 tuple로 만들어줌
index_book = {idx : book for book, idx in book_index.items()} # items() : key와 대응값 가져오기 # book_index의 대응값 'title' 가져오기

book_index['Dreaming Spies']
index_book[98]
index_book[100]


In [None]:
# Exploring Wikilinks
# chain method = 자기자신을 반환하면서 다른 함수를 지속적으로 호출할 수 있는 방법
from itertools import chain
wikilinks = list(chain(*[book[2] for book in books]))
print(f"There are {len(set(wikilinks))} unique wikilinks.") # set() 중복제거


In [None]:
# 다른책에는 얼마나 많은 wiki link가 있나?
wikilinks_other_book = [link for link in wikilinks if link in book_index.keys()] #link에 key, 대응값이 있다면 wikilinks에서 link list로 뽑아 만든다
print(f"There are {len(set(wikilinks_other_book))} unique wikilinks to other books") # 중복치 제거 길이값

In [None]:
# 가장 많이 연결된 기사 찾기
# items 항목 수가 카운트된 dictionary를 반환하는 함수를 만든다.
# collections module : count(개수세기), OrderedDict

from collections import Counter, OrderedDict

def count_items(l):
  # Return ordered dictionary of counts of objects in `l`
  # create a count object
  counts = Counter(l)

  # sort by highest count first and place in orderd dictionary
  # sort(key = (key인자에 함수를 넘겨주면 우선순위가 정해진다))
  counts = sorted(counts.items(), key = lambda x: x[1], reverse = True)  # x[1] 우선순위 숫자로 변경, reverse = 높은 숫자부터
  counts = OrderedDict(counts) # 데이터 순서 설정(key, val)

  return counts


In [None]:
# Find set of wikilinks from each book and convert to a flattend last
# 각각 책에서 wikilinks 설정을 찾고 1차원으로 변경하기

# list(chain(*(set ~~))) = ????

unique_wikilinks = list(chain(*[list(set(book[2])) for book in books])) # books의 중복치를 제거한 wikilinks 값

wikilink_counts = count_items(unique_wikilinks) # 가장 많이 사용된 wikilinks의 unique_counts 상위 10개 불러오기
list(wikilink_counts.items())[:10]

In [None]:
# 대문자 -> 소문자로 변경하기
wikilinks = [link.lower() for link in unique_wikilinks] # lower() 대문자 -> 소문자 : 동일링크 : paperback, Paperback, PAPERBACK 등 링크 통합
print(f"There are {len(set(wikilinks))} unique wikilinks.")

wikilink_counts = count_items(wikilinks)
list(wikilink_counts.items())[:10]

In [None]:
# 데이터 시각화
# wikilink_ count_top10

# for i in range(11):
#   wikilink_counts_top = list(wikilink_counts.items())[i]
import matplotlib.pyplot as plt
wikilink_counts_top = list(wikilink_counts.items())[:10]

index = [8740, 8648, 6043, 6016, 5665, 4248, 3063, 2983, 2742, 2003]
columns = ['paperback', 'hardcover', 'wikipedia:wikiproject books', 'wikipedia:wikiproject novels', 'science fiction', 'english language', 'united states', 'novel', 'the new york times', 'fantasy']
bar_plot = plt.barh(columns, index)

# def autolabel(rects):
#     for idx,rect in enumerate(bar_plot):
#         height = rect.get_height()
#         ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,index[idx],
#                 ha='center', va='bottom', rotation=0)
# autolabel(bar_plot)
plt.title('wikilink_counts_top10', fontsize=20)
plt.xlabel = ('unique wikilinks')
plt.ylabel = ('counts')
plt.show()

In [None]:
## wikilinks unique값 구하기
# 가장 많은 wikilink 제거하기
# paperback, hardcover, wikipedia:wikiproject books, wikipedia:wikiproject novels
# 이유 
# 1) paperback(얇은 가벼운 재질의 책), hardcover(딱딱한 겉표지), hardback(=hardcover), e-book(책의 종류, 내용 X) -> 도서정보와 관련이 없음
# 2) wikiproject books, wikiproject novels (wikiproject = 단순 책에 대한 정보 정리, 기사 X, 콘텐츠 기반 X)

to_remove = ['hardcover', 'paperback', 'hardback', 'e-book', 'wikipedia:wikiproject books', 'wikipedia:wikiproject novels'] 

for t in to_remove:
    wikilinks.remove(t)
    _ = wikilink_counts.pop(t) # ????? #pop(t) t가 들어간 to_move의 카테고리들을 제거해라

In [None]:
# 4번 이상 나온 wikilinks를 사용한다.

links = [t[0] for t in wikilink_counts.items() if t[1] >= 4] # ?????
type(links)


In [None]:
# wikipedia에서 다른 책과 가장 많이 연결된 도서 top10
# 각 책에대한 book wikilinks 찾기
unique_wikilinks_books = list(chain(*[list(set(link for link in book[2] if link in book_index.keys())) for book in books])) # * 모든것 : chain(10) -> 10 11 12 13 14 이어붙이기 

# 다른책에서 링크된 책의 숫자
wikilinks_book_counts = count_items(unique_wikilinks_books)
list(wikilinks_book_counts.items())[:10]



In [None]:
index = [127, 104, 63, 55, 51, 51, 49, 49, 47, 39]
columns = ['The Encyclopedia of Science Fiction', 'The Discontinuity Guide', 'The Encyclopedia of Fantasy', 'Dracula', 'Encyclopædia Britannica', 'Nineteen Eighty-Four', 'Don Quixote', 'The Wonderful Wizard of Oz', "Alice's Adventures in Wonderland", 'Jane Eyre']
bar_plot = plt.barh(columns, index)

plt.title('Most linked to books by Wikipedia books', fontsize=20)
plt.xlabel = ('unique wikilinks')
plt.ylabel = ('linked counts')
plt.show()

In [None]:
# 데이터 전처리 결과
print(f'Found {len(books)} books.')
print(f'Found {len(links)} links.')

##추가 전처리 작업

In [None]:
# 잠재적인 추가제거 작업
# 데이터 전처리를 추가적으로 진행하고 싶다면 수행
for book in books:
    if 'The New York Times' in book[2] and 'New York Times' in book[2]:
        print(book[0], book[2])
        break




In [None]:
wikilink_counts.get('the new york times')

wikilink_counts.get('new york times')

In [None]:
# Wikilinks to Index
# book 데이터를 정수로 바꾸어주었듯이, Wikilinks도 정수로 바꿔준다
link_index = {link: idx for idx, link in enumerate(links)}
index_link = {idx: link for link, idx in link_index.items()}

link_index['the economist']
index_link[300]
print(f'There are {len(link_index)} wikilinks that will be used.')


######################################################################

#2.Superised Machine Learning Task
임베딩 신경망을 훈련하기 위한 머신 러닝 작업을 개발하기

##  Build a Training Set
지도학습 :
(book, links)의 값이 주어지면 데이터에 있는 정보인지 예측하는 학습모델을 만든다.
trainset을 만들기 위해 모든책의 title, wikilink는 (title, wikilink)튜플 형태로 저장한다. 


In [None]:
# # 책 이름과 책의 고유 인덱스 index 맵핑
# type(books)
# book_index = {book[0]: idx for idx, book in enumerate(books)}
# print(book_index)

# # 링크와 링크 고유 인덱스 mapping
# # links = tuple(links)
# type(links)
# link_index = {book[2]: idx for idx, book in enumerate(links)}
# print(link_index)

In [None]:
# pairs = []

# # 각각 책이 나오도록 반복
# for book in books:

#     title = book[0]
#     book_links = book[2]
#     # 책에 관한 글에 있는 wikilinks 들을 반복
#     for link in book_links:
#         # 책의 인덱스와 링크 페어 저장
#         pairs.extend(book_index[title], link_index[link])

In [None]:
pairs = [] # pairs 빈 list 생성

# 각 책에대한 반복 수행
for book in books:
    # 각 책에대한 링크를 반복 수행
    # 770,000개의 예시 추가
    # 예시 각 title마다 link가 들어간 pairs만들기 (2, 616), (2, 2914) -> 77만개
    pairs.extend((book_index[book[0]], link_index[link.lower()]) for link in book[2] if link.lower() in links)

# 모델을 훈련시키기 위한 777,000개의 예시를 보여준다.
len(pairs), len(links), len(books)
pairs[5000]

In [None]:
# 모델을 훈련시키기 위한 777,000개의 예시를 보여준다.
pairs[5000]

pairs[50]
pairs[51]
pairs[52]
pairs[53]

In [None]:
# 777,000여개의 데이터 확인하기
# 5000번대
index_book[pairs[5000][0]], index_link[pairs[5000][1]]

# 1200번대
index_book[pairs[1200][0]], index_link[pairs[1200][1]]

In [None]:
# 링크, 책 무작위 샘플링 후 확인 -> 맞지 않는 예시 만들기
pairs_set = set(pairs)

# 가장 자주 나타나는 (title, link)
x = Counter(pairs)
sorted(x.items(), key = lambda x: x[1], reverse = True)[:10]

## Train/Test set에 관한 참고사항
validation set or testset을 만들지 않는데, accuracy를 측정하는 모델이 아니라 embedding model을 생성하는 게 주된 목표이다.

model train 후, 새로운 데이터에 대한 model test가 없으므로 과적합을 방지할 필요가 없다. 


In [None]:
# 데이터셋 positive, negative 생성기 만들기
# 다시 복습 # 모르겠음 ㅠㅠ

import numpy as np
import random
random.seed(100)

def generate_batch(pairs, n_positive = 50, negative_ratio = 1.0, classification = False):
  # batch를 저장할 numpy 배열 준비하기
  batch_size = n_positive * (1 + negative_ratio)
  batch = np.zeros((batch_size, 3)) # shape = batch_size * 3

  # 라벨 조정하기
  if classification:
    neg_label = 0
  else:
    neg_label = -1

  # 생성기 만들기
  while True:
    # 랜덤 positive 예시 선택
    for idx, (book_id, link_id) in enumerate(random.sample(pairs, n_positive)):
      batch[idx, :] = (book_id, link_id, 1)

    # idx = 1씩 증가
    idx += 1

    # batchsize가 찰때까지, negative examples 추가
    while idx < batch_size:

      # 랜덤선택
      random_book = random.randrange(len(books))
      random_link = random.randrange(len(links))

      # positive sample이 아니라는 걸 체크
      if (random_book, random_link) not in pairs_set:

        # 배치에 negative_index  추가하기 
        batch[idx, :] = (random_book, random_link, neg_label)
        idx += 1

      
    # Make sure to shuffle order
        np.random.shuffle(batch)
        yield {'book': batch[:, 0], 'link': batch[:, 1]}, batch[:, 2]


In [None]:
# 새로운 batch 얻기
next(generate_batch(pairs, n_positive = 2, negative_ratio = 2))

In [None]:
# train pairs 예시 확인하기
a, b = next(generate_batch(pairs, n_positive = 2, negative_ratio = 2))

for label, book_idx, link_idx in zip(b, a['book'], a['link']):
  print(f'Book: {index_book[book_idx]:30} Link : {index_link[link_idx]:40} Label : {label}')

# x, y = next(generate_batch(pairs, n_positive = 2, negative_ratio = 2))

# for label, b_idx, l_idx in zip(y, x['book'], x['link']):
#     print(f'Book: {index_book[b_idx]:30} Link: {index_link[l_idx]:40} Label: {label}') 

#3.Neural Network Embedding Model
###5 layers 
 1) input layer : book, link에 대한 병렬 입력 \
 2) Embedding : book, link를 위한 병렬 50개 Embedding \
 3) Dot : 내적(Dot product)를 계산해 Embedding 합치기 \
 4) Reshape : Embedding shape를 단일 숫자로 형성 \
 5) Dense : sigmoid activation을 이용한 출력 

In [None]:
from tensorflow.keras.layers import Input, Embedding, Dot, Reshape, Dense
from tensorflow.keras.models import Model

In [None]:
def book_embedding_model(embedding_size=50, classification = False):

  # """Model to embed books and wikilinks using the functional API.
  #    \Trained to discern if a link is present in a article"""

    # 1차원 입력
    book = Input(name='book', shape=[1])
    link = Input(name='link', shape=[1])

    # 책 Embedding(None, 1, 50)
    book_embedding = Embedding(name = 'book_embedding',
                               input_dim = len(book_index),
                               output_dim = embedding_size)(book)

    # link Embedding(None, 1, 50)
    link_embedding = Embedding(name = 'link_embedding',
                               input_dim = len(link_index),
                               output_dim = embedding_size)(link)

    # 내적으로 book&link embedding 1개의 Embedding으로 변형
    # shape will be(None, 1, 1)
    # Dot(name, normalize(정규화), axes(샘플 간 내적계산))
    merged = Dot(name = 'dot_product', normalize = True, axes=2)([book_embedding, link_embedding])

    # Reshape to be single Number(shape will be(None, 1))
    merged = Reshape(target_shape = [1])(merged)

    # if classifcation, add extra layers and loss function is binary crossentroy
    if classification:
        merged = Dense(1, activation = 'sigmoid')(merged)
        model = Model(inputs = [book, link], outputs = merged)
        model.compile(optimizer = 'Adam', loss = 'binary_crossentrypy', metrics = ['acccuracy'])

      # Otherwise loss function is mean squared error
    else:
      # model = tf.keras.Model(inputs=inputs, outputs=outputs)
        model = Model(inputs = [book, link], outputs = merged)
        model.compile(optimizer='adam', loss='mse')

    return model

# Instantitate model and show parameters
model = book_embedding_model()
model.summary()



#4. TRAIN MODEL


In [None]:
n_positive = 1024

gen = generate_batch(pairs, n_positive, negative_ratio=2)

# Train
# steps_per_epoch = 1epoch마다 사용할 batch_size를 정의함
# verbose(상세정보) 보통 0, 자세히 1, 함축정보 2
model.fit_generator(gen, epochs = 15, steps_per_epoch = len(pairs) // n_positive, verbose=2)

In [None]:
model.save('first_attempt.h5')

### 7.Extract Embeddings and Analyze
trainset은 Embedding 공간에서 similar entity를 옆에 배치하는 (title, wikilinks)를 학습했다

In [None]:
# Extract embeddings
book_layer = model.get_layer('book_embedding')
book_weights = book_layer.get_weights()[0]
book_weights.shape

In [None]:
book_weights = book_weights / np.linalg.norm(book_weights, axis = 1).reshape((-1, 1))
book_weights[0][:10]
np.sum(np.square(book_weights[0]))

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
plt.rcParams['font.size'] = 15

def find_similar(name, weights, index_name = 'book', n = 10, least = False, return_dist = False, plot = False):
    """Find n most similar items (or least) to name based on embeddings. Option to also plot the results"""
    
    # Select index and reverse index
    if index_name == 'book':
        index = book_index
        rindex = index_book
    elif index_name == 'page':
        index = link_index
        rindex = index_link
    
    # Check to make sure `name` is in index
    try:
        # Calculate dot product between book and all others
        dists = np.dot(weights, weights[index[name]])
    except KeyError:
        print(f'{name} Not Found.')
        return
    
    # Sort distance indexes from smallest to largest
    sorted_dists = np.argsort(dists)
    
    # Plot results if specified
    if plot:
        
        # Find furthest and closest items
        furthest = sorted_dists[:(n // 2)]
        closest = sorted_dists[-n-1: len(dists) - 1]
        items = [rindex[c] for c in furthest]
        items.extend(rindex[c] for c in closest)
        
        # Find furthest and closets distances
        distances = [dists[c] for c in furthest]
        distances.extend(dists[c] for c in closest)
        
        colors = ['r' for _ in range(n //2)]
        colors.extend('g' for _ in range(n))
        
        data = pd.DataFrame({'distance': distances}, index = items)
        
        # Horizontal bar chart
        data['distance'].plot.barh(color = colors, figsize = (10, 8),edgecolor = 'k', linewidth = 2)
        plt.xlabel('Cosine Similarity');
        plt.axvline(x = 0, color = 'k');
        
        # Formatting for italicized title
        name_str = f'{index_name.capitalize()}s Most and Least Similar to'
        for word in name.split():
            # Title uses latex for italize
            name_str += ' $\it{' + word + '}$'
        plt.title(name_str, x = 0.2, size = 28, y = 1.05)
        
        return None
    
    # If specified, find the least similar
    if least:
        # Take the first n from sorted distances
        closest = sorted_dists[:n]
         
        print(f'{index_name.capitalize()}s furthest from {name}.\n')
        
    # Otherwise find the most similar
    else:
        # Take the last n sorted distances
        closest = sorted_dists[-n:]
        
        # Need distances later on
        if return_dist:
            return dists, closest
        
        
        print(f'{index_name.capitalize()}s closest to {name}.\n')
        
    # Need distances later on
    if return_dist:
        return dists, closest
    
    
    # Print formatting
    max_width = max([len(rindex[c]) for c in closest])
    
    # Print the most similar and distances
    for c in reversed(closest):
        print(f'{index_name.capitalize()}: {rindex[c]:{max_width + 2}} Similarity: {dists[c]:.{2}}')

In [None]:
find_similar('War and Peace', book_weights)

In [None]:
find_similar('War and Peace', book_weights, n = 5, plot = True)

In [None]:
find_similar('The Fellowship of the Ring', book_weights, n = 5)

In [None]:
find_similar('Artificial Intelligence: A Modern Approach', book_weights, n = 5)

In [None]:
find_similar('Bully for Brontosaurus', book_weights, n = 5, plot = True)

### Wikilink Embeddings
We also have the embeddings of wikipedia links (which are themselves Wikipedia pages). We can take a similar approach to extract these and find the most similar to a query page.

Let's write a quick function to extract weights from a model given the name of the layer.

In [None]:
def extract_weights(name, model):
    """Extract weights from a neural network model"""
    
    # Extract weights
    weight_layer = model.get_layer(name)
    weights = weight_layer.get_weights()[0]
    
    # Normalize
    weights = weights / np.linalg.norm(weights, axis = 1).reshape((-1, 1))
    return weights

link_weights = extract_weights('link_embedding', model)

In [None]:
find_similar('science fiction', link_weights, index_name = 'page')

find_similar('biography', link_weights, index_name = 'page')

find_similar('biography', link_weights, index_name = 'page', n = 5, plot = True)



In [None]:
find_similar('new york city', link_weights, index_name = 'page', n = 5)

### Classification Model
I was curious if training for the mean squared error as a regression problem was the ideal approach, so I also decided to experiment with a classification model. For this model, the negative examples receive a label of 0 and the loss function is binary cross entropy. The procedure for the neural network to learn the embeddings is exactly the same, only it will be optimizing for a slightly different measure.

In [None]:
model_class = book_embedding_model(50, classification = True)
gen = generate_batch(pairs, n_positive, negative_ratio=2, classification = True)

In [84]:
# Train the model to learn embeddings
h = model.fit_generator(gen, epochs = 15, steps_per_epoch = len(pairs) // n_positive, verbose=1)

Epoch 13/15
Epoch 14/15
Epoch 15/15


In [85]:
model_class.save('first_attempt_class.h5')



In [87]:
book_weights_class = extract_weights('book_embedding', model_class)
book_weights_class.shape

find_similar('War and Peace', book_weights_class, n = 5)

(37020, 50)

Books closest to War and Peace.

Book: War and Peace                   Similarity: 1.0
Book: Crewel (novel)                  Similarity: 0.5
Book: The Elusive Pimpernel (novel)   Similarity: 0.49
Book: Magic of Eberron                Similarity: 0.49
Book: The Dreamers (novel series)     Similarity: 0.48


In [88]:
find_similar('The Fellowship of the Ring', book_weights_class, n = 5)

Books closest to The Fellowship of the Ring.

Book: The Fellowship of the Ring   Similarity: 1.0
Book: Tales of Dunk and Egg        Similarity: 0.53
Book: Chitta Lahu                  Similarity: 0.51
Book: Buddy (Herlong novel)        Similarity: 0.51
Book: Natural Symbols              Similarity: 0.51


### Visualizations
One of the most interesting parts about embeddings is that we can use them to visualize concepts such as War and Peace or biography. First we have to take the embeddings from 50 dimensions down to either 3 or 2. We can do this using pca, tsne, or umap. We'll try both tsne and umap for comparison. TSNE takes much longer and is designed to retain local structure within the data. UMAP is generally quicker and is designed for a balance between local and global structure in the embedding.

In [94]:
!pip install umap-learn

Collecting umap-learn
  Downloading umap-learn-0.5.3.tar.gz (88 kB)
Collecting numba>=0.49
  Downloading numba-0.55.1-cp38-cp38-win_amd64.whl (2.4 MB)
Collecting pynndescent>=0.5
  Downloading pynndescent-0.5.6.tar.gz (1.1 MB)
Collecting tqdm
  Downloading tqdm-4.64.0-py2.py3-none-any.whl (78 kB)
Collecting numpy>=1.17
  Downloading numpy-1.21.6-cp38-cp38-win_amd64.whl (14.0 MB)
Collecting llvmlite<0.39,>=0.38.0rc1
  Downloading llvmlite-0.38.0-cp38-cp38-win_amd64.whl (23.2 MB)
Building wheels for collected packages: umap-learn, pynndescent
  Building wheel for umap-learn (setup.py): started
  Building wheel for umap-learn (setup.py): finished with status 'done'
  Created wheel for umap-learn: filename=umap_learn-0.5.3-py3-none-any.whl size=82829 sha256=f877bd795b4ed4bab406dac0826b56989e0125bcd591f4f6fb13dbe46de596ae
  Stored in directory: c:\users\kwonk\appdata\local\pip\cache\wheels\a9\3a\67\06a8950e053725912e6a8c42c4a3a241410f6487b8402542ea
  Building wheel for pynndescent (setup.py

ERROR: Could not install packages due to an OSError: [WinError 5] 액세스가 거부되었습니다: 'C:\\Users\\kwonk\\miniconda3\\envs\\kwon\\Lib\\site-packages\\~umpy\\.libs\\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll'
Consider using the `--user` option or check the permissions.



In [None]:
book_r = reduce_dim(book_weights_class, components = 2, method = 'tsne')
book_r.shape