In [1]:
!pip install -Uqq fastbook
!pip install voila
!jupyter serverextension enable --sys-prefix voila

In [2]:
import fastbook
import pandas as pd
import re
import bs4
from fastai.text.all import *
import ipywidgets as widgets

# Getting Data + Processing

In [3]:
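def text_processing(x):
  """Clean up a raw book description before it is fed to the model."""
  x = x.lower()
  x = bs4.BeautifulSoup(x, "lxml").text       # strip HTML tags
  x = x.encode('ascii', 'ignore').decode()    # drop non-ASCII characters
  x = re.sub(r'https*\S+', ' ', x)            # URLs
  x = re.sub(r'@\S+', ' ', x)                 # @mentions
  x = re.sub(r'#\S+', ' ', x)                 # hashtags
  x = re.sub(r'\'\w+', '', x)                 # apostrophe suffixes such as 's or 't
  x = re.sub(r'\w*\d+\w*', '', x)             # any token that contains a digit
  x = re.sub(r'\s{2,}', ' ', x)               # collapse repeated whitespace
  return x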

def read_df(filename, genre):
  """Read one genre's CSV and return its cleaned English-language descriptions."""
  df = pd.read_csv(filename, index_col=0)
  df = df[df['language'] == 'eng']          # keep only English books
  df = df.dropna(subset=['description'])    # drop books without a description

  # .copy() so the assignments below modify a new frame rather than a slice of df
  df = df[['title', 'author', 'description']].copy()

  df['description'] = df['description'].apply(text_processing)
  df['genre'] = genre

  return df
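
To see what the regex steps in `text_processing` do, here is a small sketch that applies them to a made-up sentence. It is an illustration only: `clean_text` and the sample string are hypothetical, and the BeautifulSoup HTML-stripping step is left out so that the snippet runs with the standard library alone.

```python
import re

def clean_text(x):
    """Regex-only subset of text_processing (HTML stripping omitted)."""
    x = x.lower()
    x = x.encode('ascii', 'ignore').decode()   # drop non-ASCII characters
    x = re.sub(r'https*\S+', ' ', x)           # URLs
    x = re.sub(r'@\S+', ' ', x)                # @mentions
    x = re.sub(r'#\S+', ' ', x)                # hashtags
    x = re.sub(r'\'\w+', '', x)                # apostrophe suffixes
    x = re.sub(r'\w*\d+\w*', '', x)            # tokens containing digits
    x = re.sub(r'\s{2,}', ' ', x)              # collapse repeated whitespace
    return x

# Hypothetical input, not taken from the dataset.
sample = "Tommy's 14th birthday! See https://example.com #YA @author"
cleaned = clean_text(sample)
print(repr(cleaned))
```

Tokens containing digits and apostrophe suffixes are removed outright, which is why the cleaned descriptions contain phrases such as "when year old cooper". The function does not strip leading or trailing whitespace, so a single trailing space can remain.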



In [4]:
# read data: one CSV per genre, concatenated into a single DataFrame

data_genre = {'mystery': 'gr_df_mystery.csv',
              'sicfi': 'gr_df_sf.csv',
              'YA': 'gr_df_ya.csv'}

data = pd.concat([read_df(filename, genre)
                  for genre, filename in data_genre.items()])


In [5]:
data.tail()

Unnamed: 0,title,author,description,genre
5191,"The Devil's Triangle (The Devil's Triangle, #1)",Toni De Palma,"when year old cooper dies in an attempt to burn down his school, he finds himself in the afterlife. lucy, the devil sister who has crossed party lines, decides to give cooper another shot at heaven. the deal? cooper returns to earth and has to find a girl named grace. the rest is up to him.while cooper figures out his mission, he thrown into the life he always wanted. great parents, a spot on the varsity football team and a real future are all within reach. but what he really wants is grace, a feisty girl with an abusive boyfriend who can pound cooper into pulp if he doesn watch out.while ...",YA
5193,"TRANSITION, The Chimera Hunters Series",Megan S. Johnston,"the chimera are a race so old, the humans relegated them as a myth. the gods feared the chimeras powers; they believed they were a deadly race, with physical abilities beyond belief. so, they split their race in half condemning them to wander the earth, searching for their other half to be complete. without their sodalis, each was destined to live life without dreams, without love, without hope. the future rested on finding their one true mate for life. shelby oneil has led a solitary life with her parents. so when she goes to school at washington state university, she believes her life ha...",YA
5196,"Eleventh Elementum (The Primortus Chronicles, #1)",J.L. Bond,"it has begun! fourteen years after a catastrophic disaster, the world has regained its balance with the help of the primortus. they have secretly walked amongst us since the beginning of time, protecting the earth with the power of the elements. when a fourteen-year-old girl is given two mysterious gifts from a father she never knew, she is drawn into their dangerous world of magic. she must learn to wield the power of the eleventh elementum to survive. with the help of will (her best friend), his cousin and her new stepsister, she seeks out the evil that threatens new zealand. but a black...",YA
5197,"Destiny (Destiny, #1)",Deborah Ann,"danielle kennedy is not one to believe in fairy tale love or destiny holding a plan for the future, nor does she believe in mythical legends, vampires, or spirits...at sixteen, athletic, honor student danielle has her life planned out: study hard, play soccer even harder, and slide through high school under the radar. it was a good plan, a solid plan, one thats worked so far; until cayden bridwellthe longtime shy and brainy classmate she has ignoredrides in on his motorcycle and obliterates her plans. now, life as danielle knows it, will never be the same.wealthy caydenonly recently coming...",YA
5198,"System Purge (Digital Evolution, #1)",Ross Willard,"fourteen-year-old tommy philips doesnt know where he comes from. he has questions that his foster parents cant answer, questions about who he is and what makes him so different from everyone around him. when he stumbles across evidence that one of his teachers has been guarding him for years, tommy begins an investigation that will uncover a history he never could have guessed.rowan darren wasnt just born to be a soldier, he was made to be one. the nospious, a collection of twelve houses of genetically-engineered humans, live in silent conflict, fighting quiet political wars against each o...",YA


# AI Setup

In [6]:
dls = TextDataLoaders.from_df(data, text_col='description', label_col='genre')
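
No validation column is passed here, so, to our knowledge, `TextDataLoaders.from_df` falls back to its default of holding out a random 20% of the rows (`valid_pct=0.2`) for validation. The sketch below only illustrates that idea in plain Python. `random_split` is a hypothetical helper, not fastai's splitter, and the seed is an arbitrary choice.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Return (train_idx, valid_idx) for a random hold-out split."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)      # shuffle with a fixed seed for repeatability
    cut = int(n_items * valid_pct)         # number of validation items
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split(10)
print(len(train_idx), len(valid_idx))
```

Because the split is random, accuracy figures can shift slightly between runs unless a seed is fixed.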


In [7]:
dls.show_batch(max_n=3)

# We can see that the library automatically processed all the texts to split them into tokens, adding some special tokens such as:

#     xxbos to indicate the beginning of a text
#     xxmaj to indicate the next word was capitalized
#     xxunk to indicate a word that is not in the vocabulary


Unnamed: 0,text,category
0,"xxbos spiritus mundi by robert sheppard , nominated for the prestigious pushcart prize for literature , consists of spiritus mundi , the xxunk i , and spiritus mundi , the romancebook ii . book is espionage - terror - political - religious thriller - action criss - crosses the globe from beijing to london to washington , mexico city and jerusalem presenting a vast panorama of the contemporary international world , including compelling action , deep and realistic characters and surreal adventures , while book ii xxunk the setting and scope into a fantasy ( though still rooted in the real ) adventure where the protagonists embark on a quest to the realms of middle earth and its crystal bead game and through a wormhole to the council of the immortals in the amphitheater in the center of the milky way galaxy in search of the crucial silmaril crystal ,",sicfi
1,"xxbos spiritus mundi by robert sheppard , nominated for the prestigious pushcart prize for literature , consists of spiritus mundi , the xxunk i , and spiritus mundi , the romancebook ii . book is espionage - terror - political - religious thriller - action criss - crosses the globe from beijing to london to washington , mexico city and jerusalem presenting a vast panorama of the contemporary international world , including compelling action , deep and realistic characters and surreal adventures , while book ii xxunk the setting and scope into a fantasy ( though still rooted in the real ) adventure where the protagonists embark on a quest to the realms of middle earth and its crystal bead game and through a wormhole to the council of the immortals in the amphitheater in the center of the milky way galaxy in search of the crucial silmaril crystal ,",sicfi
2,"xxbos xxunk description : danny xxunk is an average thirteen - year - old who finds himself at the beginning of his eighth - grade year , struggling with some of the more common concerns that plague a boy of his age : bullies , homework , and his mother . sabrina drake is the new girl . she is beautiful and spellbinding , but carries a fantastic xxunk into the white rock academy of illumination , a school for young squires destined to become knights of the light and battle the forces of the dark with magical weapons called xxunk , danny joins his five closest friends in the training of their lives . honed in the techniques of blade work by an xxunk xxunk and educated by a colorful assortment of knightly instructors , danny and his friends are placed on the path to becoming knighted members of",YA
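
The special tokens in the batch above can be illustrated with a toy tokenizer. This is a simplified sketch, not fastai's tokenizer: `toy_tokenize` and the tiny vocabulary are hypothetical, and real tokenization applies many more rules.

```python
def toy_tokenize(text, vocab):
    """Mark text start, capitalization and unknown words with special tokens."""
    tokens = ["xxbos"]                      # beginning of a text
    for word in text.split():
        if word[0].isupper():
            tokens.append("xxmaj")          # next word was capitalized
        word = word.lower()
        tokens.append(word if word in vocab else "xxunk")  # unknown word
    return tokens

# Hypothetical vocabulary; "sodalis" is deliberately left out.
vocab = {"the", "chimera", "are", "a", "race"}
tokens = toy_tokenize("The chimera are a Sodalis race", vocab)
print(tokens)
```

Since `text_processing` lowercases every description, `xxmaj` does not show up in this dataset, while `xxunk` appears wherever a word is not in the vocabulary.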


In [8]:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

In [9]:
learn.fine_tune(4, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.807797,0.774422,0.662385,00:31


epoch,train_loss,valid_loss,accuracy,time
0,0.743816,0.76279,0.679816,01:18
1,0.684567,0.695924,0.692202,01:16
2,0.609107,0.705413,0.695872,01:17
3,0.547217,0.718619,0.693578,01:17
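
The first table is the initial epoch that `fine_tune` runs with the pretrained layers frozen, and the second covers the four epochs after unfreezing. A quick way to read the second table is to pick out the epoch with the lowest validation loss and the one with the highest accuracy. The sketch below copies the numbers from the log above into a list; the variable names are our own.

```python
# (epoch, train_loss, valid_loss, accuracy) copied from the training log
log = [
    (0, 0.743816, 0.762790, 0.679816),
    (1, 0.684567, 0.695924, 0.692202),
    (2, 0.609107, 0.705413, 0.695872),
    (3, 0.547217, 0.718619, 0.693578),
]

best_by_loss = min(log, key=lambda row: row[2])   # lowest validation loss
best_by_acc = max(log, key=lambda row: row[3])    # highest accuracy

print("lowest valid_loss at epoch", best_by_loss[0])
print("highest accuracy at epoch", best_by_acc[0])
```

Validation loss is lowest at epoch 1 and rises afterwards while training loss keeps falling, which suggests the model starts to overfit. Accuracy stays at roughly 0.69 to 0.70 over the last three epochs.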


In [10]:
learn.export()

# App

To test the learner, we provide this sample book description:
<br>
<br>
When Abby signs up for a DNA service, it’s mainly to give her friend and secret love interest, Leo, a nudge. After all, she knows who she is already: Avid photographer. Injury-prone tree climber. Best friend to Leo and Connie…although ever since the B.E.I. (Big Embarrassing Incident) with Leo, things have been awkward on that front. But she didn’t know she’s a younger sister. When the DNA service reveals Abby has a secret sister, shimmery-haired Instagram star Savannah Tully, it’s hard to believe they’re from the same planet, never mind the same parents — especially considering Savannah, queen of green smoothies, is only a year and a half older than Abby herself. The logical course of action? Meet up at summer camp (obviously) and figure out why Abby’s parents gave Savvy up for adoption. But there are complications: Savvy is a rigid rule-follower and total narc. Leo is the camp’s co-chef, putting Abby's growing feelings for him on blast. And her parents have a secret that threatens to unravel everything. But part of life is showing up, leaning in, and learning to fit all your awkward pieces together. Because sometimes, the hardest things can also be the best ones.
<br>
<br>
 From https://www.goodreads.com/book/show/53138158-you-have-a-match

In [3]:
def callback(wdgt):
    # replace by something useful
    print(wdgt.value)
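
One possible "something useful" is a callback that classifies the submitted text. The sketch below is an assumption about how that could look, not the notebook's actual code: `make_callback` is a hypothetical helper, and `fake_predict` with its made-up probabilities stands in for `learn_inf.predict` so the snippet runs without a trained model.

```python
from types import SimpleNamespace

def make_callback(predict_fn):
    """Build a widget callback that classifies the submitted text."""
    def callback(wdgt):
        # predict_fn is expected to return (label, label index, probabilities)
        pred, pred_idx, probs = predict_fn(wdgt.value)
        message = f"Predicted genre: {pred} (probability {float(probs[pred_idx]):.2f})"
        print(message)
        return message
    return callback

# Stand-in predictor with made-up values, for illustration only.
def fake_predict(text):
    return "YA", 2, [0.02, 0.03, 0.95]

# Any object with a .value attribute can play the role of the widget here.
fake_widget = SimpleNamespace(value="a summer camp story")
result = make_callback(fake_predict)(fake_widget)
```

With the real learner loaded, the same helper could be attached with `book_desc.on_submit(make_callback(learn_inf.predict))`.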

In [4]:
#load learner
learn_inf = load_learner('export.pkl')

In [5]:
#define widget for interactivity
book_desc = widgets.Text(
    value='Type in a book description!',
    placeholder='Type something',
    description='Input:',
    disabled=False
)
display(book_desc)
book_desc.on_submit(callback)

Text(value='Type in a book description!', description='Input:', placeholder='Type something')

When Abby signs up for a DNA service, it’s mainly to give her friend and secret love interest, Leo, a nudge. After all, she knows who she is already: Avid photographer. Injury-prone tree climber. Best friend to Leo and Connie…although ever since the B.E.I. (Big Embarrassing Incident) with Leo, things have been awkward on that front. But she didn’t know she’s a younger sister. When the DNA service reveals Abby has a secret sister, shimmery-haired Instagram star Savannah Tully, it’s hard to believe they’re from the same planet, never mind the same parents — especially considering Savannah, queen of green smoothies, is only a year and a half older than Abby herself. The logical course of action? Meet up at summer camp (obviously) and figure out why Abby’s parents gave Savvy up for adoption. But there are complications: Savvy is a rigid rule-follower and total narc. Leo is the camp’s co-chef, putting Abby's growing feelings for him on blast. And her parents have a secret that threatens to 

In [6]:
pred, pred_idx, probs = learn_inf.predict(book_desc.value)

In [7]:
print(f"Our learner has predicted the book's description to be of a {pred} book with probability {probs[pred_idx]}")

Our learner has predicted the book's description to be of a YA book with probability 0.9534489512443542
