## Unsupervised Nearest Neighbors with Brute Force

NearestNeighbors implements unsupervised nearest neighbors learning. It acts as a uniform interface to three different nearest neighbors algorithms: BallTree, KDTree, and a brute-force algorithm based on routines in sklearn.metrics.pairwise.
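
As a minimal sketch of that interface, assuming scikit-learn is installed, the brute-force backend can be requested with `algorithm='brute'`. The six 2-D points below are toy data for illustration, not part of the book dataset.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data: two well-separated clusters of three points each.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Request the brute-force backend explicitly.
nbrs = NearestNeighbors(n_neighbors=2, algorithm='brute').fit(X)

# For each query point, return distances to and indices of its 2 nearest
# training points. Querying the training set itself means each point's
# nearest neighbor is the point itself (distance 0).
distances, indices = nbrs.kneighbors(X)
print(indices)
```

Swapping `algorithm='brute'` for `'ball_tree'` or `'kd_tree'` should leave the calling code unchanged, which is the point of the uniform interface.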


The most naive neighbor search implementation involves the brute-force computation of distances between all pairs of points in the dataset: for N samples in D dimensions, this approach scales as O[D N^2].

Efficient brute-force neighbors searches can be very competitive for small data samples. However, as the number of samples N grows, the brute-force approach quickly becomes infeasible.
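
To make the O[D N^2] cost concrete, the following NumPy sketch computes every pairwise distance explicitly. The helper name `brute_force_knn` is hypothetical and used only for illustration; it is not the routine scikit-learn uses internally.

```python
import numpy as np

def brute_force_knn(X, k):
    """Hypothetical helper: k nearest neighbors by exhaustive search."""
    # All pairwise differences: shape (N, N, D), hence O(D N^2) work and memory.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dists, np.inf)
    # Sort each row and keep the k closest indices.
    idx = np.argsort(dists, axis=1)[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)

# Toy data: two pairs of points that are close to each other.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
idx, dist = brute_force_knn(X, k=1)
print(idx.ravel(), dist.ravel())
```

Doubling N quadruples the size of the intermediate distance matrix, which is why this approach stops being practical as the dataset grows.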


## References

- https://scikit-learn.org/stable/modules/neighbors.html#unsupervised-nearest-neighbors
- https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
- https://www.kaggle.com/sankha1998/collaborative-book-recommendation-system/data

### TODO

- add visualizations (from various notebooks)
- add detailed data preprocessing
- get a solid understanding of the pandas and NumPy APIs
- mention implicit ratings

In [1]:
import numpy as np
import pandas as pd 

In [2]:
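# Set the column names explicitly with a list.
# Note: names= is passed without header=0, so the header row of each CSV is read as the first data row.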
#Users
u_cols = ['user_id', 'location', 'age']
users = pd.read_csv('../data/book_x/BX-Users.csv', sep=';', names=u_cols, encoding='latin-1',low_memory=False)

#Books
i_cols = ['ISBN', 'title' ,'author','year', 'publisher', 'img_s', 'img_m', 'img_l']
books = pd.read_csv('../data/book_x/BX_Books.csv', sep=';', names=i_cols, encoding='latin-1',low_memory=False)

#Ratings
r_cols = ['user_id', 'ISBN', 'rating']
ratings = pd.read_csv('../data/book_x/BX-Book-Ratings.csv', sep=';', names=r_cols, encoding='latin-1',low_memory=False)
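
The `head()` outputs of this notebook show the original header line (`User-ID`, `Location`, ...) as row 0. A small sketch of one way to avoid that, using an in-memory CSV as a stand-in for the real files: passing `header=0` together with `names=` tells pandas to replace the existing header instead of treating it as data. The two sample rows are made up for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a semicolon-separated file with a header line.
csv_text = (
    "User-ID;Location;Age\n"
    "1;nyc, new york, usa;\n"
    "2;stockton, california, usa;18\n"
)
u_cols = ['user_id', 'location', 'age']

# names= alone: the header line is kept as a data row.
with_header_row = pd.read_csv(io.StringIO(csv_text), sep=';', names=u_cols)

# names= plus header=0: the header line is replaced by u_cols.
users_clean = pd.read_csv(io.StringIO(csv_text), sep=';', names=u_cols, header=0)

print(with_header_row.shape, users_clean.shape)
```

With the header row gone, numeric columns such as `user_id` are parsed as numbers rather than strings.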

In [3]:
users.head()

| | user_id | location | age |
|---|---|---|---|
| 0 | User-ID | Location | Age |
| 1 | 1 | nyc, new york, usa | |
| 2 | 2 | stockton, california, usa | 18 |
| 3 | 3 | moscow, yukon territory, russia | |
| 4 | 4 | porto, v.n.gaia, portugal | 17 |


In [4]:
books.head()

| | ISBN | title | author | year | publisher | img_s | img_m | img_l |
|---|---|---|---|---|---|---|---|---|
| 0 | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
| 1 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 2 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |
| 3 | 0060973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... |
| 4 | 0374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... |


In [5]:
ratings.head()

| | user_id | ISBN | rating |
|---|---|---|---|
| 0 | User-ID | ISBN | Book-Rating |
| 1 | 276725 | 034545104X | 0 |
| 2 | 276726 | 0155061224 | 5 |
| 3 | 276727 | 0446520802 | 0 |
| 4 | 276729 | 052165615X | 3 |


In [6]:
print("books dataframe shape", books.shape)
print("users dataframe shape", users.shape)
print("ratings dataframe shape", ratings.shape)

books dataframe shape (271380, 8)
users dataframe shape (278859, 3)
ratings dataframe shape (1149781, 3)


In [7]:
# number of unique users who submitted ratings
ratings['user_id'].value_counts().shape

(105284,)

In [8]:
# plot a histogram of the number of ratings each user made
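
A possible starting point for that histogram, sketched on a small made-up ratings table rather than the real `ratings` dataframe. It uses `np.histogram` so the bin counts can be inspected without a plotting backend; the bin edges are an arbitrary choice for the toy data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the ratings table: one row per (user, book) rating.
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5],
    "rating":  [0, 5, 3, 0, 8, 7, 0, 0, 9, 10, 6],
})

# Number of ratings per user (index: user_id, value: count).
ratings_per_user = toy_ratings["user_id"].value_counts()

# Bin the per-user counts; the last bin [4, 5] is closed on both sides.
counts, edges = np.histogram(ratings_per_user.to_numpy(), bins=[1, 2, 3, 4, 5])
print(counts)

# With matplotlib available, the same data could be drawn directly:
# ratings_per_user.plot(kind="hist")
```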

### !!!! Data preprocessing is definitely required

### Data preprocessing

- replace implicit ratings with the mean value
- do more detailed preprocessing
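
One way the first item could look, as a hedged sketch on made-up data. It assumes that a rating of 0 marks an implicit rating and that "the mean value" means the global mean of the explicit (non-zero) ratings; a per-user or per-book mean would be an equally plausible reading.

```python
import pandas as pd

# Made-up ratings; 0 is assumed to mark an implicit rating.
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "rating":  [0, 8, 6, 0, 10],
})

is_explicit = toy_ratings["rating"] > 0

# Assumption: use the global mean of explicit ratings as the fill value.
explicit_mean = toy_ratings.loc[is_explicit, "rating"].mean()

# Keep explicit ratings, replace implicit ones with the mean.
toy_ratings["rating_filled"] = (
    toy_ratings["rating"].astype(float).where(is_explicit, explicit_mean)
)
print(toy_ratings)
```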

In [9]:
# Threshold of 200 ratings per user: brute-force cost grows as N^2 (polynomially),
# so letting selected_ratings keep many more users is not wise;
# otherwise a different algorithm is needed

selected_ratings = ratings['user_id'].value_counts() > 200

# print(type(ratings['user_id'])) # <class 'pandas.core.series.Series'>
# print(type(selected_ratings)) # <class 'pandas.core.series.Series'>
# print(type(ratings['user_id'].value_counts()))

# print(type(selected_ratings))
selected_ratings[selected_ratings].shape # TODO: understand this; indexing a boolean Series by itself keeps only the True entries
print(type(selected_ratings))
# dir(selected_ratings)

<class 'pandas.core.series.Series'>
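
The `selected_ratings[selected_ratings]` expression in the cell above can be illustrated on toy data. This is a sketch with made-up user ids and a threshold of 1 instead of 200; the variable names are chosen for the example only.

```python
import pandas as pd

# Made-up user_id column: user 10 rated 3 times, 20 twice, 30 once.
toy_user_ids = pd.Series([10, 10, 10, 20, 20, 30], name="user_id")

# value_counts() returns a Series indexed by user id, holding the counts.
counts = toy_user_ids.value_counts()

# Comparing gives a boolean Series with the same index.
is_active = counts > 1

# Using the boolean Series as its own mask keeps only the True rows,
# so the remaining index is the set of users above the threshold.
active_ids = is_active[is_active].index

# Those ids can then filter the original rows.
filtered = toy_user_ids[toy_user_ids.isin(active_ids)]
print(sorted(active_ids.tolist()), len(filtered))
```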


In [10]:
# Left off here
# print(ratings['rating'].head())
# ratings = ratings.iloc[1:]  # drop the header row that was read as data
# ratings['rating'] = ratings['rating'].astype(int)
# print(ratings.dtypes)
# ratings['rating'].plot(kind="hist")
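
For a dataframe that has already been loaded with the header line as its first row, one alternative to slicing is to coerce the column to numbers and drop whatever fails to parse. A minimal sketch on made-up rows shaped like the `ratings` output shown earlier:

```python
import pandas as pd

# Made-up frame where the CSV header ended up as row 0, so all values are strings.
toy_ratings = pd.DataFrame({
    "user_id": ["User-ID", "276725", "276726", "276727", "276729"],
    "rating":  ["Book-Rating", "0", "5", "0", "3"],
})

# Non-numeric entries (here, the header text) become NaN.
numeric_rating = pd.to_numeric(toy_ratings["rating"], errors="coerce")

# Drop the rows that did not parse and store the ratings as integers.
clean = toy_ratings[numeric_rating.notna()].copy()
clean["rating"] = numeric_rating[numeric_rating.notna()].astype(int)

# Count how often each rating value occurs (the data behind a histogram).
rating_counts = clean["rating"].value_counts().sort_index()
print(rating_counts)
```

Coercion also catches malformed values elsewhere in the column, whereas `iloc[1:]` only removes the first row.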