# Introduction to ML - Recommendation Systems
Welcome!  
These Jupyter notebooks contain the slides for the Czechitas weekend workshop. They are organized as follows:
- the slides contain general theory
- each section ends with a short Tips & Tricks part
    - it should help you with your own code
- the actual code will be shared after the workshop

Let's start learning!

## Part 0 - Introduction

## This weekend
- Git
- Data loading
  - SQL
  - API
- EDA
- Intro ML 
- Distances
- Regression - OLS & LASSO
- Classification - kNN
- PCA
- Collaborative Filtering
- Clustering - k-Means
- SVD

## \#Me

Michal Kubišta  
<kubistmi@gmail.com>  
[github.com/kubistmi](https://github.com/kubistmi)  

## Questions?

## Part 1 - Loading warmup!
- What is an ETL process?
- Why should I care? I want to do proper ML!
- What is ```SQL```?
- What is ```API```?
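
As a rough illustration of the first question, an ETL process extracts raw data, transforms it into a clean shape, and loads it somewhere it can be queried. The sketch below is only a toy version under stated assumptions: the CSV text stands in for a real file, the five rows are illustrative values in a `(book_id, user_id, rating)` layout, and an in-memory SQLite database plays the role of the target store.

```python
import csv
import io
import sqlite3

# Extract: parse CSV text (a stand-in for a file such as data/ratings.csv).
raw_csv = """book_id,user_id,rating
1,314,5
1,439,3
1,588,5
1,1169,4
1,1185,4
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: the CSV reader yields strings, so cast every field to int.
clean = [(int(r["book_id"]), int(r["user_id"]), int(r["rating"])) for r in rows]

# Load: write the cleaned rows into a database table (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (book_id INTEGER, user_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", clean)
conn.commit()

# Check the load: row count and average rating.
loaded = conn.execute("SELECT COUNT(*), AVG(rating) FROM ratings").fetchone()
print(loaded)
conn.close()
```

Real pipelines add scheduling, validation and error handling on top, but the three steps stay the same.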

### 1.1 Load from csv

In [1]:
from IPython.display import display, HTML
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# API
import requests
# SQL
from sqlalchemy import create_engine, MetaData, Table, select
%matplotlib inline

In [2]:
os.listdir('data')

['books.csv',
 'book_tags.csv',
 'ratings.csv',
 'sample_book.xml',
 'tags.csv',
 'to_read.csv']

In [3]:
# ratings
rats = pd.read_csv('data/ratings.csv')

print(rats.shape)
rats.head()

(981756, 3)


| | book_id | user_id | rating |
|---|---|---|---|
| 0 | 1 | 314 | 5 |
| 1 | 1 | 439 | 3 |
| 2 | 1 | 588 | 5 |
| 3 | 1 | 1169 | 4 |
| 4 | 1 | 1185 | 4 |


In [4]:
# tags
tags = pd.read_csv('data/tags.csv')

print(tags.shape)
tags.head()

(34252, 2)


| | tag_id | tag_name |
|---|---|---|
| 0 | 0 | - |
| 1 | 1 | --1- |
| 2 | 2 | --10- |
| 3 | 3 | --12- |
| 4 | 4 | --122- |


In [5]:
# to read
to_read = pd.read_csv('data/to_read.csv')

print(to_read.shape)
to_read.head()

(912705, 2)


| | user_id | book_id |
|---|---|---|
| 0 | 1 | 112 |
| 1 | 1 | 235 |
| 2 | 1 | 533 |
| 3 | 1 | 1198 |
| 4 | 1 | 1874 |


### 1.2 Load from SQL

In [6]:
engine = create_engine(
    '{type}://{user}:{password}@{host}:{port}/{database}'.format(
        type='postgresql',
        host='ec2-54-246-117-62.eu-west-1.compute.amazonaws.com',
        port='5432',
        database='d116ni9jadsuh5',
        user='ynjldnhfvhsctz',
        password='4b46bec4c15716b03ea2e1980023790a80e6b8a127650ded1026f5e091f6da0e'
    )
)
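
Outside a workshop, connection details are usually kept out of the notebook itself. The following is a minimal sketch of one common approach, reading them from environment variables. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the helper `build_db_url` are hypothetical, and the values in `example_env` are placeholders, not real credentials.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables (names are hypothetical)."""
    user = quote_plus(env["DB_USER"])
    # Escape characters such as '@' or '/' that would otherwise break the URL.
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    database = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Placeholder values; in practice they would be set in the shell or a secrets manager.
example_env = {"DB_USER": "workshop", "DB_PASSWORD": "p@ss/word", "DB_NAME": "books"}
url = build_db_url(example_env)
print(url)
```

The resulting string can be passed to `create_engine(url)` in place of the formatted literal.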

In [7]:
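# Object-oriented approach: reflect the table and build the query in Python
conn = engine.connect()
metadata = MetaData()

# Read the column definitions of the existing 'books' table from the database
books = Table('books', metadata, autoload_with=engine)

# SQLAlchemy 1.4+ style; older releases expect select([books])
query = select(books).where(books.columns.id == 1)
print(query)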

SELECT books.id, books.book_id, books.best_book_id, books.work_id, books.books_count, books.isbn, books.isbn13, books.authors, books.original_publication_year, books.original_title, books.title, books.language_code, books.average_rating, books.ratings_count, books.work_ratings_count, books.work_text_reviews_count, books.ratings_1, books.ratings_2, books.ratings_3, books.ratings_4, books.ratings_5, books.image_url, books.small_image_url 
FROM books 
WHERE books.id = :id_1
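
The `:id_1` in the printed statement is a bound parameter: the SQL text keeps a placeholder and the value `1` is sent separately. The idea can be tried without a database server. The sketch below uses Python's built-in `sqlite3` instead of PostgreSQL and a made-up two-column `books` table, so it illustrates the mechanism rather than the workshop setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [(1, "The Hunger Games"), (3, "Twilight")],
)

# The SQL text keeps the named placeholder; the value travels separately,
# so the driver handles quoting and the same statement can be reused.
query = "SELECT id, title FROM books WHERE id = :id_1"
row = conn.execute(query, {"id_1": 1}).fetchone()
print(row)
conn.close()
```

Passing values this way, rather than pasting them into the SQL string, is also the usual protection against SQL injection.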


In [8]:
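result = conn.execute(query)
columns = list(result.keys())   # column names come from the result, so an empty result is safe
sql_res = result.fetchmany(5)   # at most five rows
display(pd.DataFrame(sql_res, columns=columns))

conn.close()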

| | id | book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | original_publication_year | original_title | ... | ratings_count | work_ratings_count | work_text_reviews_count | ratings_1 | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9780439023480.0 | Suzanne Collins | 2008.0 | The Hunger Games | ... | 4780653 | 4942365 | 155254 | 66715 | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... |


In [9]:
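# Directly: run a raw SQL string (engine.execute() was removed in SQLAlchemy 2.0)
from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM books"))
    columns = list(result.keys())
    book = pd.DataFrame(result.fetchall(), columns=columns)

print(book.shape)
book.head()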

(10000, 23)


| | id | book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | original_publication_year | original_title | ... | ratings_count | work_ratings_count | work_text_reviews_count | ratings_1 | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9780439023480.0 | Suzanne Collins | 2008.0 | The Hunger Games | ... | 4780653 | 4942365 | 155254 | 66715 | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... |
| 1 | 2 | 3 | 3 | 4640799 | 491 | 439554934 | 9780439554930.0 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | ... | 4602479 | 4800065 | 75867 | 75504 | 101676 | 455024 | 1156318 | 3011543 | https://images.gr-assets.com/books/1474154022m... | https://images.gr-assets.com/books/1474154022s... |
| 2 | 3 | 41865 | 41865 | 3212258 | 226 | 316015849 | 9780316015840.0 | Stephenie Meyer | 2005.0 | Twilight | ... | 3866839 | 3916824 | 95009 | 456191 | 436802 | 793319 | 875073 | 1355439 | https://images.gr-assets.com/books/1361039443m... | https://images.gr-assets.com/books/1361039443s... |
| 3 | 4 | 2657 | 2657 | 3275794 | 487 | 61120081 | 9780061120080.0 | Harper Lee | 1960.0 | To Kill a Mockingbird | ... | 3198671 | 3340896 | 72586 | 60427 | 117415 | 446835 | 1001952 | 1714267 | https://images.gr-assets.com/books/1361975680m... | https://images.gr-assets.com/books/1361975680s... |
| 4 | 5 | 4671 | 4671 | 245494 | 1356 | 743273567 | 9780743273560.0 | F. Scott Fitzgerald | 1925.0 | The Great Gatsby | ... | 2683664 | 2773745 | 51992 | 86236 | 197621 | 606158 | 936012 | 947718 | https://images.gr-assets.com/books/1490528560m... | https://images.gr-assets.com/books/1490528560s... |


### Tips & Tricks

In [10]:
"""
# Tips & Tricks
engine = create_engine('{type}://{user}:{password}@{host}:{port}/{database}')
with engine.connect() as conn:
    conn.execute(text('QUERY')).fetchmany(X)
"""
display()

### 1.3 Load from API

In [11]:
req = requests.get(
    'http://{host}:{port}/{endpoint}'.format(
        host = 'localhost',
        port = '8000',
        endpoint = '')
    )
display(req, req.headers, req.encoding, req.text)

<Response [200]>

{'Accept-Ranges': 'bytes', 'Content-Length': '322', 'Content-Type': 'text/html; charset=utf-8', 'Last-Modified': 'Thu, 14 Mar 2019 14:12:00 GMT', 'Date': 'Thu, 11 Apr 2019 11:39:31 GMT'}

'utf-8'

'<html>\n  <head>\n    <title>Hello Czechitas</title>\n  </head>\n  <body>\n    <h1>Welcome to Czechitas Recommendation System Workshop data repository.</h1>\n    The API contains the following endpoints reacheable with GET method: </br>\n    <ul>\n      <li>/tags/{tag_id}</li>\n      <li>/tags-all</li>\n    </ul>\n  </body>\n</html>'

In [12]:
display(HTML(req.text))

In [13]:
req = requests.get(
    'http://{host}:{port}/{endpoint}'.format(
        host = 'localhost',
        port = '8000',
        endpoint = 'tags-all')  # no leading slash: the URL template already adds one
    )

print(req.json()[:10])

bk_tags = pd.DataFrame(req.json())
print(bk_tags.shape)
bk_tags.head()

[{'goodreads_book_id': '1', 'tag_id': '30574', 'count': 167697}, {'goodreads_book_id': '1', 'tag_id': '11305', 'count': 37174}, {'goodreads_book_id': '1', 'tag_id': '11557', 'count': 34173}, {'goodreads_book_id': '1', 'tag_id': '8717', 'count': 12986}, {'goodreads_book_id': '1', 'tag_id': '33114', 'count': 12716}, {'goodreads_book_id': '1', 'tag_id': '11743', 'count': 9954}, {'goodreads_book_id': '1', 'tag_id': '14017', 'count': 7169}, {'goodreads_book_id': '1', 'tag_id': '5207', 'count': 6221}, {'goodreads_book_id': '1', 'tag_id': '22743', 'count': 4974}, {'goodreads_book_id': '1', 'tag_id': '32989', 'count': 4364}]
(999912, 3)


| | count | goodreads_book_id | tag_id |
|---|---|---|---|
| 0 | 167697 | 1 | 30574 |
| 1 | 37174 | 1 | 11305 |
| 2 | 34173 | 1 | 11557 |
| 3 | 12986 | 1 | 8717 |
| 4 | 12716 | 1 | 33114 |
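
Note that the JSON above delivers `goodreads_book_id` and `tag_id` as strings while `count` is a number, so the ids usually need a cast before joining with the other tables. A minimal sketch in plain Python, using the first three records from the printed response; the helper `cast_ids` and the constant `ID_FIELDS` are hypothetical names introduced for illustration.

```python
# First three records as printed from the /tags-all response above.
records = [
    {"goodreads_book_id": "1", "tag_id": "30574", "count": 167697},
    {"goodreads_book_id": "1", "tag_id": "11305", "count": 37174},
    {"goodreads_book_id": "1", "tag_id": "11557", "count": 34173},
]

ID_FIELDS = ("goodreads_book_id", "tag_id")


def cast_ids(record):
    """Return a copy of the record with the id fields converted to int."""
    return {
        key: int(value) if key in ID_FIELDS else value
        for key, value in record.items()
    }


typed = [cast_ids(r) for r in records]
print(typed[0])
```

On the full DataFrame, `bk_tags.astype({'goodreads_book_id': int, 'tag_id': int})` should achieve the same result.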


### Tips & Tricks

In [14]:
"""
# Tips & Tricks
response = requests.get('http://{host}:{port}/{endpoint}')
response.json()
"""
display()