# Data Formatting - React + Flask App
## 08/01/2020

The purpose of this notebook is to format the data our recommender uses to look up books and to add one column to it. The new column, called "normalized", takes each book's original title, lowercases it, and replaces every space with an underscore.

Because we interact with our Python API through Flask requests that pass the title as a query parameter, book lookups need clean, normalized titles.
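Here is a minimal sketch of that lookup flow. It assumes the title travels as a `title` query parameter, which is a hypothetical name because the actual Flask route is not shown in this notebook. `normalize_title` is also a hypothetical helper. It applies the rule described above (lowercase, spaces to underscores) and adds a `strip()` so that stray leading or trailing spaces do not turn into underscores.

```python
from urllib.parse import parse_qs, urlencode


def normalize_title(title: str) -> str:
    """Hypothetical helper: trim, replace spaces with underscores, lowercase."""
    return title.strip().replace(" ", "_").lower()


# A tiny stand-in for the "normalized" column: normalized key -> original title.
catalog = {
    normalize_title(t): t
    for t in ["The Hunger Games", "Harry Potter and the Philosopher's Stone", "Twilight"]
}

# Client side: build a query string from the normalized title.
query = urlencode({"title": normalize_title("Harry Potter and the Philosopher's Stone")})
print(query)  # title=harry_potter_and_the_philosopher%27s_stone

# Server side: decode the query string and look the key up directly.
key = parse_qs(query)["title"][0]
print(catalog[key])  # Harry Potter and the Philosopher's Stone
```

Even after normalization, `urlencode` still percent-encodes characters such as the apostrophe. The server should therefore read the parameter through a decoder (here `parse_qs`) rather than from the raw URL text.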

In [1]:
#Imports
import numpy as np
import pandas as pd

In [25]:
#Load Data
books = pd.read_csv('../data/parsed_data/ultimate_books.csv')
books_colors = pd.read_csv('../data/parsed_data/books_colors.csv')

In [24]:
# Print the first 500 titles in alphabetical (lexicographical) order

titles = list(books["original_title"][0:500])
# Missing titles load as NaN (a float), so keep only actual strings before sorting
res = sorted(book for book in titles if isinstance(book, str))
res

[' The Fellowship of the Ring',
 '11/22/63',
 'A Brief History of Time: From the Big Bang to Black Holes',
 'A Christmas Carol',
 'A Clash of Kings',
 'A Clockwork Orange',
 'A Confederacy of Dunces',
 'A Dance with Dragons',
 'A Discovery of Witches',
 'A Farewell to Arms',
 'A Feast for Crows',
 'A Game of Thrones',
 'A Grief Observed',
 'A Heartbreaking Work of Staggering Genius',
 'A Light in the Attic',
 'A Little Princess',
 "A Midsummer Night's Dream",
 'A Million Little Pieces',
 'A Prayer for Owen Meany',
 'A Storm of Swords',
 'A Tale of Two Cities',
 'A Thousand Splendid Suns',
 'A Tree Grows In Brooklyn ',
 'A Walk to Remember',
 'A Wrinkle in Time',
 'After You',
 "Alice's Adventures in Wonderland",
 'Allegiant',
 'American Gods',
 'American Psycho',
 'An Abundance of Katherines',
 'An Excellent conceited Tragedie of Romeo and Juliet',
 'And The Mountains Echoed',
 'Angels & Demons ',
 'Animal Farm: A Fairy Story',
 'Anna and the French Kiss',
 'Artemis Fowl',
 'Atlas Shru
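The listing above uses Python's default string ordering, which compares raw code points. A leading space therefore sorts before digits and letters, which is why `' The Fellowship of the Ring'` comes first, and uppercase letters sort before lowercase ones. The following is a sketch of a more reader-friendly order. It assumes we only want to change how titles are displayed, not the stored data, and sorts on a trimmed, case-folded key. The sample list is a handful of titles taken from the output above.

```python
titles = [" The Fellowship of the Ring", "A Tree Grows In Brooklyn ", "11/22/63",
          "After You", "American Gods", float("nan")]

# Keep only real strings (missing titles load from CSV as NaN, a float).
clean = [t for t in titles if isinstance(t, str)]

# Default ordering compares raw code points, so " " < "1" < "A" < "a".
default_order = sorted(clean)
print(default_order)

# Trim surrounding whitespace and case-fold before comparing;
# the original strings are returned unchanged.
friendly_order = sorted(clean, key=lambda t: t.strip().casefold())
print(friendly_order)
```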

### Column Normalization

In [19]:
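# Trim stray whitespace, replace spaces with underscores, and lowercase.
# Without strip(), titles such as ' The Fellowship of the Ring' would get a leading underscore.
books["normalized"] = books["original_title"].str.strip().str.replace(' ', '_', regex=False).str.lower()
books_colors["normalized"] = books_colors["original_title"].str.strip().str.replace(' ', '_', regex=False).str.lower()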
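Normalization is lossy. Lowercasing and swapping spaces for underscores can map two different titles to the same key, and a lookup by "normalized" would then be ambiguous. The following is a minimal standard-library sketch of a check. It uses a hypothetical `find_collisions` helper and made-up sample titles, and applies the same rule as above plus `strip()`.

```python
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Trim, replace spaces with underscores, lowercase."""
    return title.strip().replace(" ", "_").lower()


def find_collisions(titles):
    """Hypothetical helper: map each normalized key shared by 2+ distinct titles to those titles."""
    groups = defaultdict(set)
    for title in titles:
        if isinstance(title, str):  # skip NaN/None entries
            groups[normalize_title(title)].add(title)
    return {key: sorted(originals) for key, originals in groups.items() if len(originals) > 1}


# Made-up sample titles, chosen to collide after normalization.
sample = ["The Hunger Games", "THE HUNGER GAMES", "Twilight", "Twilight ", "Dune", float("nan")]
print(find_collisions(sample))
# {'the_hunger_games': ['THE HUNGER GAMES', 'The Hunger Games'], 'twilight': ['Twilight', 'Twilight ']}
```

On the DataFrame itself, `books[books["normalized"].duplicated(keep=False)]` would list every row whose normalized key appears more than once.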

### Saving Data

In [23]:
# General data
# Relative paths, matching the load cell above (assumes the notebook runs from the same directory)
books.to_csv('../data/parsed_data/ultimate.csv', index=False)

# General + color data
books_colors.to_csv('../data/parsed_data/books_colors.csv', index=False)


In [27]:
print(books.shape)
books.head()

(8730, 23)


| Unnamed: 0 | book_id | goodreads_book_id | books_count | authors | original_publication_year | original_title | title | language_code | average_rating | ratings_count | ... | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url | tag_name | class_features | cluster | normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 272 | Suzanne Collins | 2008.0 | The Hunger Games | The Hunger Games (The Hunger Games, #1) | eng | 4.34 | 4780653 | ... | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... | fantasy,young adult,fiction,adventure,sci fi f... | Suzanne Collins,fantasy,young adult,fiction,ad... | 37 | the_hunger_games |
| 1 | 2 | 3 | 491 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | Harry Potter and the Sorcerer's Stone (Harry P... | eng | 4.44 | 4602479 | ... | 101676 | 455024 | 1156318 | 3011543 | https://images.gr-assets.com/books/1474154022m... | https://images.gr-assets.com/books/1474154022s... | fantasy,young adult,fiction,harry potter,magic... | J.K. Rowling, Mary GrandPré,fantasy,young adul... | 20 | harry_potter_and_the_philosopher's_stone |
| 2 | 3 | 41865 | 226 | Stephenie Meyer | 2005.0 | Twilight | Twilight (Twilight, #1) | en-US | 3.57 | 3866839 | ... | 436802 | 793319 | 875073 | 1355439 | https://images.gr-assets.com/books/1361039443m... | https://images.gr-assets.com/books/1361039443s... | fantasy,young adult,fiction,sci fi fantasy,fan... | Stephenie Meyer,fantasy,young adult,fiction,sc... | 29 | twilight |
| 3 | 4 | 2657 | 487 | Harper Lee | 1960.0 | To Kill a Mockingbird | To Kill a Mockingbird | eng | 4.25 | 3198671 | ... | 117415 | 446835 | 1001952 | 1714267 | https://images.gr-assets.com/books/1361975680m... | https://images.gr-assets.com/books/1361975680s... | young adult,fiction,childhood,classics,english... | Harper Lee,young adult,fiction,childhood,class... | 8 | to_kill_a_mockingbird |
| 4 | 5 | 4671 | 1356 | F. Scott Fitzgerald | 1925.0 | The Great Gatsby | The Great Gatsby | eng | 3.89 | 2683664 | ... | 197621 | 606158 | 936012 | 947718 | https://images.gr-assets.com/books/1490528560m... | https://images.gr-assets.com/books/1490528560s... | young adult,fiction,classics,english,books,fav... | F. Scott Fitzgerald,young adult,fiction,classi... | 8 | the_great_gatsby |


In [28]:
print(books_colors.shape)
books_colors.head()

(6668, 40)


| Unnamed: 0.1 | Unnamed: 0 | index | book_id | goodreads_book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | ... | cluster | pct_blue | pct_light | pct_green | pct_yellow | pct_red | pct_magenta | pct_cyan | pct_dark | normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9780439000000.0 | Suzanne Collins | ... | 9 | 0.001276 | 0.086735 | 0.0 | 0.031888 | 0.038265 | 0.0 | 0.0 | 0.841837 | the_hunger_games |
| 1 | 1 | 1 | 2 | 3 | 3 | 4640799 | 491 | 439554934 | 9780440000000.0 | J.K. Rowling, Mary GrandPré | ... | 6 | 0.07398 | 0.11352 | 0.002551 | 0.193878 | 0.202806 | 0.008929 | 0.0 | 0.404337 | harry_potter_and_the_philosopher's_stone |
| 2 | 2 | 2 | 3 | 41865 | 41865 | 3212258 | 226 | 316015849 | 9780316000000.0 | Stephenie Meyer | ... | 4 | 0.0 | 0.290816 | 0.0 | 0.002551 | 0.043367 | 0.0 | 0.0 | 0.663265 | twilight |
| 3 | 3 | 3 | 4 | 2657 | 2657 | 3275794 | 487 | 61120081 | 9780061000000.0 | Harper Lee | ... | 6 | 0.0 | 0.227041 | 0.001276 | 0.05102 | 0.406888 | 0.0 | 0.0 | 0.313776 | to_kill_a_mockingbird |
| 4 | 4 | 4 | 5 | 4671 | 4671 | 245494 | 1356 | 743273567 | 9780743000000.0 | F. Scott Fitzgerald | ... | 1 | 0.020408 | 0.399235 | 0.0 | 0.006378 | 0.017857 | 0.0 | 0.0 | 0.556122 | the_great_gatsby |
