<a href="https://colab.research.google.com/github/lilswapnil/book-recommender/blob/main/notebook/text_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3. Text Classification

## 3.0 Requirements, Credentials & Libraries

In [95]:
from google.colab import userdata

HF_TOKEN = userdata.get("HUGGINGFACEHUB_API_TOKEN")
OPENAI_KEY = userdata.get("OPENAI_API_KEY")

In [96]:
import numpy as np
import pandas as pd

In [97]:

from google.colab import drive
drive.mount('/content/drive')

# Define the path where you saved the file in Google Drive
drive_path = '/content/drive/MyDrive/books.csv'

# Load the CSV file into a pandas DataFrame
loaded_books_df = pd.read_csv(drive_path)

# Display the first few rows to verify
print(f"Successfully loaded 'books.csv' from Google Drive. Shape: {loaded_books_df.shape}")
display(loaded_books_df.head())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Successfully loaded 'books.csv' from Google Drive. Shape: (75750, 8)


Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag
0,Laurence M. Hauptman,Reveals that several hundred thousand Indians ...,"History,Military History,Civil War,American Hi...",002914180X,3.52,5,Between Two Fires: American Indians in the Civ...,002914180X Reveals that several hundred thousa...
1,"Charlotte Fiell,Emmanuelle Dirix",Fashion Sourcebook - 1920s is the first book i...,"Couture,Fashion,Historical,Art,Nonfiction",1906863482,4.51,6,Fashion Sourcebook 1920s,1906863482 Fashion Sourcebook - 1920s is the f...
2,Andy Anderson,The seminal history and analysis of the Hungar...,"Politics,History",948984147,4.15,2,Hungary 56,948984147 The seminal history and analysis of ...
3,Carlotta R. Anderson,"""All-American Anarchist"" chronicles the life a...","Labor,History",814327079,3.83,1,All-American Anarchist: Joseph A. Labadie and ...,"814327079 ""All-American Anarchist"" chronicles ..."
4,Jeffrey Pfeffer,Why is common sense so uncommon when it comes ...,"Business,Leadership,Romance,Historical Romance...",875848419,3.73,7,The Human Equation: Building Profits by Puttin...,875848419 Why is common sense so uncommon when...


## 3.1 Text Classification (Zero-Shot Classification on Genre)

In [98]:
loaded_books_df['genre'].value_counts().reset_index()

Unnamed: 0,genre,count
0,Nonfiction,348
1,History,319
2,"Games,Chess",166
3,"Esoterica,Astrology",145
4,"History,Nonfiction",117
...,...,...
58483,"Childrens,Picture Books,Childrens,Transport,Tr...",1
58484,"Fiction,Cultural,Ireland,Humor,European Litera...",1
58485,"European Literature,Finnish Literature,Fiction...",1
58486,"Manga,Yaoi,Sequential Art,Manga,Yaoi,Boys Love...",1


In [99]:
loaded_books_df['genre'].value_counts().reset_index().query("count >= 50").sort_values('count', ascending=False)

Unnamed: 0,genre,count
0,Nonfiction,348
1,History,319
2,"Games,Chess",166
3,"Esoterica,Astrology",145
4,"History,Nonfiction",117
5,Music,105
6,"Combat,Martial Arts",104
7,"Crafts,Quilting",97
8,"Science,Mathematics",90
9,Art,89


In [100]:
loaded_books_df.head()

Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag
0,Laurence M. Hauptman,Reveals that several hundred thousand Indians ...,"History,Military History,Civil War,American Hi...",002914180X,3.52,5,Between Two Fires: American Indians in the Civ...,002914180X Reveals that several hundred thousa...
1,"Charlotte Fiell,Emmanuelle Dirix",Fashion Sourcebook - 1920s is the first book i...,"Couture,Fashion,Historical,Art,Nonfiction",1906863482,4.51,6,Fashion Sourcebook 1920s,1906863482 Fashion Sourcebook - 1920s is the f...
2,Andy Anderson,The seminal history and analysis of the Hungar...,"Politics,History",948984147,4.15,2,Hungary 56,948984147 The seminal history and analysis of ...
3,Carlotta R. Anderson,"""All-American Anarchist"" chronicles the life a...","Labor,History",814327079,3.83,1,All-American Anarchist: Joseph A. Labadie and ...,"814327079 ""All-American Anarchist"" chronicles ..."
4,Jeffrey Pfeffer,Why is common sense so uncommon when it comes ...,"Business,Leadership,Romance,Historical Romance...",875848419,3.73,7,The Human Equation: Building Profits by Puttin...,875848419 Why is common sense so uncommon when...


In [101]:
loaded_books_df["genre"] = loaded_books_df["genre"].str.split(",")
loaded_books_df

Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag
0,Laurence M. Hauptman,Reveals that several hundred thousand Indians ...,"[History, Military History, Civil War, America...",002914180X,3.52,5,Between Two Fires: American Indians in the Civ...,002914180X Reveals that several hundred thousa...
1,"Charlotte Fiell,Emmanuelle Dirix",Fashion Sourcebook - 1920s is the first book i...,"[Couture, Fashion, Historical, Art, Nonfiction]",1906863482,4.51,6,Fashion Sourcebook 1920s,1906863482 Fashion Sourcebook - 1920s is the f...
2,Andy Anderson,The seminal history and analysis of the Hungar...,"[Politics, History]",948984147,4.15,2,Hungary 56,948984147 The seminal history and analysis of ...
3,Carlotta R. Anderson,"""All-American Anarchist"" chronicles the life a...","[Labor, History]",814327079,3.83,1,All-American Anarchist: Joseph A. Labadie and ...,"814327079 ""All-American Anarchist"" chronicles ..."
4,Jeffrey Pfeffer,Why is common sense so uncommon when it comes ...,"[Business, Leadership, Romance, Historical Rom...",875848419,3.73,7,The Human Equation: Building Profits by Puttin...,875848419 Why is common sense so uncommon when...
...,...,...,...,...,...,...,...,...
75745,Simon Monk,Design custom printed circuit boards with EAGL...,,71819266,4.07,7,Make Your Own PCBs with Eagle: From Schematic ...,71819266 Design custom printed circuit boards ...
75746,"Tracie L. Miller-Nobles,Brenda L. Mattison,Ell...","Redefining tradition in learning accounting. ,...",,133251241,4.05,1,Horngren's Financial & Managerial Accounting,133251241 Redefining tradition in learning acc...
75747,C. John Miller,In these warm reflections on his own growth as...,"[Christianity, Evangelism, Christian, Religion...",875523919,4.27,20,A Faith Worth Sharing: A Lifetime of Conversat...,875523919 In these warm reflections on his own...
75748,Albert Marrin,"John Brown is a man of many legacies, from her...","[Nonfiction, History, Biography, Military Hist...",307981533,3.63,51,A Volcano Beneath the Snow: John Brown's War A...,307981533 John Brown is a man of many legacies...


In [102]:
loaded_books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75750 entries, 0 to 75749
Data columns (total 8 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   author   75750 non-null  object 
 1   desc     75750 non-null  object 
 2   genre    69853 non-null  object 
 3   isbn     75750 non-null  object 
 4   rating   75750 non-null  float64
 5   reviews  75750 non-null  int64  
 6   title    75750 non-null  object 
 7   tag      75750 non-null  object 
dtypes: float64(1), int64(1), object(6)
memory usage: 4.6+ MB


In [103]:
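# Strip whitespace and drop duplicate genres; missing values become empty lists.
# Note: set() does not preserve order, so the order within each list is arbitrary.
loaded_books_df["genre"] = loaded_books_df["genre"].apply(
    lambda x: list(set(g.strip() for g in x)) if isinstance(x, list) else []
)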
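
Because `set` does not preserve insertion order, any later step that picks the first matching genre from a list can give different results between runs. A minimal sketch of an order-preserving alternative, assuming the goal is only to strip whitespace and drop duplicates (`dedupe_genres` is a hypothetical helper, not part of this notebook):

```python
def dedupe_genres(genres):
    """Strip whitespace and drop duplicate genres, keeping first-seen order."""
    if not isinstance(genres, list):
        return []  # missing values (NaN) become an empty list, as in the cell above
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set()
    return list(dict.fromkeys(g.strip() for g in genres))


print(dedupe_genres(["Manga", " Yaoi", "Sequential Art", "Manga", "Yaoi"]))
```

It could be dropped into the cell above with `loaded_books_df["genre"].apply(dedupe_genres)`, though the category counts shown later in this notebook come from the `set`-based version.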

In [104]:
genre_counts = (
    loaded_books_df.explode("genre")
      .groupby("genre")
      .size()
      .sort_values(ascending=False)
)

genre_counts.head()

genre,count
Nonfiction,25933
Fiction,24138
Romance,12018
History,11525
Fantasy,11257


In [105]:
category_mapping = {

    # --- FICTION ---
    "Fiction": "Fiction",
    "Literary Fiction": "Fiction",
    "Historical Fiction": "Fiction",
    "Romance": "Fiction",
    "Drama": "Fiction",
    "Thrillers": "Fiction",
    "Mystery": "Fiction",
    "Science Fiction": "Fiction",
    "Fantasy": "Fiction",
    "Horror": "Fiction",
    "Comics & Graphic Novels": "Fiction",
    "Poetry": "Fiction",

    # --- CHILDREN ---
    "Juvenile Fiction": "Children's Fiction",
    "Children": "Children's Fiction",
    "Young Adult Fiction": "Children's Fiction",

    "Juvenile Nonfiction": "Children's Nonfiction",
    "Young Adult Nonfiction": "Children's Nonfiction",

    # --- NONFICTION CORE ---
    "Nonfiction": "Nonfiction",
    "Biography & Autobiography": "Nonfiction",
    "History": "Nonfiction",
    "Philosophy": "Nonfiction",
    "Religion": "Nonfiction",
    "Literary Criticism": "Nonfiction",
    "Science": "Nonfiction",
    "Mathematics": "Nonfiction",
    "Political Science": "Nonfiction",
    "Sociology": "Nonfiction",
    "Psychology": "Nonfiction",

    # --- HOBBY / LIFESTYLE ---
    "Art": "Lifestyle",
    "Music": "Lifestyle",
    "Crafts": "Lifestyle",
    "Quilting": "Lifestyle",
    "Origami": "Lifestyle",
    "Games": "Lifestyle",
    "Chess": "Lifestyle",
    "Cooking": "Lifestyle",
    "Wine": "Lifestyle",
    "Alcohol": "Lifestyle",

    # --- PROFESSIONAL / TECH ---
    "Business": "Professional",
    "Economics": "Professional",
    "Technology": "Professional",
    "Computers": "Professional",
    "Engineering": "Professional",
    "Medical": "Professional",
    "Nursing": "Professional",
    "Law": "Professional",
    "Education": "Professional",

    # --- ESOTERIC ---
    "Occult": "Spiritual & Esoteric",
    "Tarot": "Spiritual & Esoteric",
    "Astrology": "Spiritual & Esoteric",
    "Esoterica": "Spiritual & Esoteric",

}

In [106]:
def simplify_to_primary(cat_value):
    """Map a book's genres to one primary category: the first genre found in category_mapping wins."""

    # Case 1: null
    if cat_value is None:
        return "Other"

    # Case 2: already a list
    if isinstance(cat_value, list):
        categories = cat_value

    # Case 3: string
    elif isinstance(cat_value, str):
        categories = [c.strip() for c in cat_value.split(",")]

    else:
        return "Other"

    # Map to primary category
    for c in categories:
        if c in category_mapping:
            return category_mapping[c]

    return "Other"


loaded_books_df["category"] = loaded_books_df["genre"].apply(simplify_to_primary)

In [107]:
loaded_books_df['category'].value_counts().reset_index()

Unnamed: 0,category,count
0,Fiction,30449
1,Nonfiction,27655
2,Other,11216
3,Lifestyle,3762
4,Professional,2132
5,Spiritual & Esoteric,536
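
Roughly 11,000 books land in `Other`, either because their genre list is empty or because none of their genres is a key in `category_mapping`. A minimal sketch for listing the most frequent unmapped genres, which could guide extending the mapping. The toy frame and the truncated mapping below are illustrative assumptions, not the notebook's data:

```python
import pandas as pd

# Toy stand-ins for loaded_books_df and category_mapping (assumed, for illustration)
toy_df = pd.DataFrame({
    "genre": [["Cookbooks", "Food"], [], ["Biology", "Ecology"], ["History"], ["Cookbooks"]],
    "category": ["Other", "Other", "Other", "Nonfiction", "Other"],
})
toy_mapping = {"History": "Nonfiction", "Fiction": "Fiction"}

# One row per genre for the books still labeled "Other"; empty lists become NaN
other_genres = (
    toy_df.loc[toy_df["category"] == "Other", "genre"]
    .explode()
    .dropna()
    .reset_index(drop=True)
)

# Count the genres that have no entry in the mapping
unmapped_counts = other_genres[~other_genres.isin(list(toy_mapping))].value_counts()
print(unmapped_counts.head(10))
```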


In [108]:
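# Missing genres were replaced with empty lists above, so isna() is False for
# every row and this displays the full frame, including books with genre == [].
loaded_books_df[~loaded_books_df['genre'].isna()]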

Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag,category
0,Laurence M. Hauptman,Reveals that several hundred thousand Indians ...,"[Civil War, Military History, American History...",002914180X,3.52,5,Between Two Fires: American Indians in the Civ...,002914180X Reveals that several hundred thousa...,Nonfiction
1,"Charlotte Fiell,Emmanuelle Dirix",Fashion Sourcebook - 1920s is the first book i...,"[Couture, Nonfiction, Art, Fashion, Historical]",1906863482,4.51,6,Fashion Sourcebook 1920s,1906863482 Fashion Sourcebook - 1920s is the f...,Nonfiction
2,Andy Anderson,The seminal history and analysis of the Hungar...,"[Politics, History]",948984147,4.15,2,Hungary 56,948984147 The seminal history and analysis of ...,Nonfiction
3,Carlotta R. Anderson,"""All-American Anarchist"" chronicles the life a...","[Labor, History]",814327079,3.83,1,All-American Anarchist: Joseph A. Labadie and ...,"814327079 ""All-American Anarchist"" chronicles ...",Nonfiction
4,Jeffrey Pfeffer,Why is common sense so uncommon when it comes ...,"[Human Resources, Leadership, Management, Nonf...",875848419,3.73,7,The Human Equation: Building Profits by Puttin...,875848419 Why is common sense so uncommon when...,Nonfiction
...,...,...,...,...,...,...,...,...,...
75745,Simon Monk,Design custom printed circuit boards with EAGL...,[],71819266,4.07,7,Make Your Own PCBs with Eagle: From Schematic ...,71819266 Design custom printed circuit boards ...,Other
75746,"Tracie L. Miller-Nobles,Brenda L. Mattison,Ell...","Redefining tradition in learning accounting. ,...",[],133251241,4.05,1,Horngren's Financial & Managerial Accounting,133251241 Redefining tradition in learning acc...,Other
75747,C. John Miller,In these warm reflections on his own growth as...,"[Religion, Christian, Nonfiction, Theology, Ch...",875523919,4.27,20,A Faith Worth Sharing: A Lifetime of Conversat...,875523919 In these warm reflections on his own...,Nonfiction
75748,Albert Marrin,"John Brown is a man of many legacies, from her...","[Civil War, Military History, Juvenile, Middle...",307981533,3.63,51,A Volcano Beneath the Snow: John Brown's War A...,307981533 John Brown is a man of many legacies...,Nonfiction


In [109]:
loaded_books_df.shape

(75750, 9)

In [110]:
loaded_books_df.to_csv('categorized_books.csv', index=False)

## 3.2 Transformer Model

In [111]:
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else -1

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=device
)


In [112]:
fiction_categories = ['Fiction', 'Nonfiction']

In [113]:
# Take the first description that the rule-based mapping labeled "Fiction"
sequence = loaded_books_df.loc[loaded_books_df["category"] == "Fiction", "desc"].dropna().iloc[0]
print(sequence)


"Competitive Advantage Through People" explores why - despite long-standing evidence that a committed work force is essential for success - firms continue to attach little importance to their workers. The answer, argues Pfeffer, resides in a complex web of factors based on perception, history, legislation, and practice that continues to dominate management thought and action. Yet, some organizations have been able to overcome these obstacles. In fact, the five common stocks with the highest returns between 1972 and 1992 - Southwest Airlines, Wal-Mart, Tyson Foods, Circuit City, and Plenum Publishing - were in industries that shared virtually none of the characteristics traditionally associated with strategic success. What each of these firms did share is the ability to produce sustainable competitive advantage through its way of managing people. Pfeffer documents how they - and others - resisted traditional management pitfalls, and offers frameworks for implementing these changes in an

In [114]:
print(type(sequence))

<class 'str'>


In [115]:
classifier(sequence, fiction_categories)

{'sequence': '"Competitive Advantage Through People" explores why - despite long-standing evidence that a committed work force is essential for success - firms continue to attach little importance to their workers. The answer, argues Pfeffer, resides in a complex web of factors based on perception, history, legislation, and practice that continues to dominate management thought and action. Yet, some organizations have been able to overcome these obstacles. In fact, the five common stocks with the highest returns between 1972 and 1992 - Southwest Airlines, Wal-Mart, Tyson Foods, Circuit City, and Plenum Publishing - were in industries that shared virtually none of the characteristics traditionally associated with strategic success. What each of these firms did share is the ability to produce sustainable competitive advantage through its way of managing people. Pfeffer documents how they - and others - resisted traditional management pitfalls, and offers frameworks for implementing these

In [116]:
result = classifier(sequence, fiction_categories)

max_index = np.argmax(result["scores"])
max_label = result["labels"][max_index]

print(f"The book is {max_label}")

The book is Nonfiction


In [117]:
def generate_predictions(sequence, categories):
    result = classifier(sequence, categories)
    return result["labels"][int(np.argmax(result["scores"]))]
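
This notebook reads the top label in two ways: `np.argmax` over `scores` for a single text, and `labels[0]` for a list of texts. A small sketch of one helper that accepts either shape, assuming each result is a dict with parallel `labels` and `scores` lists as in the outputs shown here. `top_labels` and the hand-written result dict are hypothetical, used so the example runs without loading a model:

```python
def top_labels(results):
    """Return the best label for one result dict or for a list of result dicts."""
    if isinstance(results, dict):
        results = [results]  # a single text yields a single dict
    best_labels = []
    for r in results:
        # Index of the highest score; does not rely on the labels being pre-sorted
        best = max(range(len(r["scores"])), key=lambda i: r["scores"][i])
        best_labels.append(r["labels"][best])
    return best_labels


# Hand-written stand-in shaped like the pipeline output (values are illustrative)
fake_result = {
    "sequence": "some description",
    "labels": ["Nonfiction", "Fiction"],
    "scores": [0.84, 0.16],
}
print(top_labels(fake_result))
print(top_labels([fake_result, fake_result]))
```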

In [118]:
# from tqdm import tqdm

# actual_cats = []
# predicted_cats = []

# for i in tqdm(range(0, 300)):
#   sequence = loaded_books_df.loc[loaded_books_df['category'] == 'Fiction', 'desc'].reset_index(drop=True)[i]

#   predicted_cats.append(generate_predictions(sequence, fiction_categories))
#   actual_cats += ['Fiction']

In [119]:
# for i in tqdm(range(0, 300)):
#   sequence = loaded_books_df.loc[loaded_books_df['category'] == 'Nonfiction', 'desc'].reset_index(drop=True)[i]
#   predicted_cats.append(generate_predictions(sequence, fiction_categories))
#   actual_cats += ['Nonfiction']

In [120]:
# 1) Build a balanced subset: 300 Fiction + 300 Nonfiction
fiction_df = loaded_books_df.loc[loaded_books_df["category"] == "Fiction"].head(300)
nonfiction_df = loaded_books_df.loc[loaded_books_df["category"] == "Nonfiction"].head(300)

subset = pd.concat([fiction_df, nonfiction_df], axis=0).reset_index(drop=True)

# 2) Create text input (title + desc)
texts = (
    subset["title"].fillna("").astype(str) + " " +
    subset["desc"].fillna("").astype(str)
).tolist()

# 3) Run the classifier in batches (on GPU if available, otherwise CPU)
results = classifier(texts, fiction_categories, batch_size=16)

# 4) Extract predictions + actuals
predicted_cats = [r["labels"][0] for r in results]   # top label
actual_cats = subset["category"].tolist()

print(len(actual_cats), len(predicted_cats))  # should both be 600

600 600
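
`.head(300)` takes the first 300 rows of each class in file order, which may not be representative of the whole dataset. A minimal sketch of a seeded random draw per class instead, on a toy frame (the data, `n_per_class`, and the seed are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in for loaded_books_df (assumed data)
toy = pd.DataFrame({
    "title": [f"book {i}" for i in range(20)],
    "category": ["Fiction"] * 10 + ["Nonfiction"] * 10,
})

n_per_class = 3  # the notebook uses 300

# Draw the same number of rows from each class, reproducibly
balanced = (
    toy[toy["category"].isin(["Fiction", "Nonfiction"])]
    .groupby("category")
    .sample(n=n_per_class, random_state=42)
    .reset_index(drop=True)
)
print(balanced["category"].value_counts())
```

Results from a random subset would differ from the numbers reported in this notebook, which come from the `.head(300)` subset.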


In [121]:
prediction_df = pd.DataFrame({
    "actual_categories": actual_cats,
    "predicted_categories": predicted_cats
})
prediction_df["correct_prediction"] = prediction_df["actual_categories"] == prediction_df["predicted_categories"]

In [122]:
incorrect_results = prediction_df[~prediction_df["correct_prediction"]]
prediction_df["correct_prediction"].value_counts()

correct_prediction,count
True,461
False,139


In [123]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(prediction_df["actual_categories"],
                       prediction_df["predicted_categories"]))

print(classification_report(prediction_df["actual_categories"],
                            prediction_df["predicted_categories"]))

[[176 124]
 [ 15 285]]
              precision    recall  f1-score   support

     Fiction       0.92      0.59      0.72       300
  Nonfiction       0.70      0.95      0.80       300

    accuracy                           0.77       600
   macro avg       0.81      0.77      0.76       600
weighted avg       0.81      0.77      0.76       600
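
The figures in the report can be recomputed by hand from the confusion matrix, which makes it easier to see where the low Fiction recall comes from: 124 of the 300 books labeled Fiction were predicted as Nonfiction. A short sketch, with the matrix values copied from the output above:

```python
import numpy as np

# Rows are actual classes, columns are predicted classes, both in sorted
# label order: Fiction, Nonfiction (values copied from the output above)
cm = np.array([[176, 124],
               [15, 285]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)   # column sums = predicted counts
recall = true_positives / cm.sum(axis=1)      # row sums = actual counts
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for name, p, r, f in zip(["Fiction", "Nonfiction"], precision, recall, f1):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```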



In [124]:
results[20]

{'sequence': 'Flying Home and Other Stories Written between 1937 and 1954 and now available in paperback for the first time, these thirteen stories are a potent distillation of the genius of Ralph Ellison. Six of them remained unpublished during Ellison\'s lifetime and were discovered among the author\'s effects in a folder labeled "Early Stories." But they all bear the hallmarks--the thematic reach, musically layered voices, and sheer ebullience--that Ellison would bring to his classic ,Invisible Man,.,The tales in ,Flying Home, range in setting from the Jim Crow South to a Harlem bingo parlor, from the hobo jungles of the Great Depression to Wales during the Second World War. By turns lyrical, scathing, touching, and transcendently wise, ,Flying Home and Other Stories, is a historic volume, an extravagant last bequest from a giant of our literature.',
 'labels': ['Fiction', 'Nonfiction'],
 'scores': [0.8421758413314819, 0.15782414376735687]}

In [125]:
other_cats = loaded_books_df['category'] == 'Other'
other_cats.value_counts()

category,count
False,64534
True,11216


In [126]:
isbns = []
predicted_cats = []

missing_cats = loaded_books_df.loc[
    loaded_books_df["category"] == "Other",
    ["isbn", "desc"]
].reset_index(drop=True)

In [127]:
from tqdm import tqdm

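# Classify each uncategorized book one at a time.
# Note: only `desc` is used here, whereas the evaluation above used title + desc.
for i in tqdm(range(len(missing_cats))):
    sequence = missing_cats["desc"].iloc[i]
    predicted_cats.append(generate_predictions(sequence, fiction_categories))
    isbns.append(missing_cats["isbn"].iloc[i])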

100%|██████████| 11216/11216 [22:48<00:00,  8.20it/s]
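
The loop above calls the classifier once per description and took about 23 minutes for 11,216 rows. The evaluation cell earlier passed a list of texts with `batch_size=16`, and the same approach may be faster here, especially on a GPU. A sketch under that assumption: `predict_in_chunks` is a hypothetical helper, and `fake_classifier` is a stand-in with the same call shape so the example runs without downloading a model.

```python
def predict_in_chunks(classify, texts, labels, chunk_size=256, batch_size=16):
    """Classify texts chunk by chunk and return the top label for each text."""
    predictions = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results = classify(chunk, labels, batch_size=batch_size)
        if isinstance(results, dict):
            # Assumption: a one-element chunk may come back as a bare dict
            results = [results]
        # Top label is first, as in the evaluation cell of this notebook
        predictions.extend(r["labels"][0] for r in results)
    return predictions


def fake_classifier(texts, labels, batch_size=16):
    """Stand-in for the real pipeline: puts labels[0] first if the text mentions 'novel'."""
    out = []
    for t in texts:
        ordered = labels if "novel" in t.lower() else labels[::-1]
        out.append({"sequence": t, "labels": list(ordered), "scores": [0.9, 0.1]})
    return out


demo = predict_in_chunks(
    fake_classifier,
    ["A gripping novel", "A history of maps"],
    ["Fiction", "Nonfiction"],
    chunk_size=1,
)
print(demo)
```

With the real pipeline, the call would look like `predict_in_chunks(classifier, missing_cats["desc"].tolist(), fiction_categories)`; wrapping the `range` in `tqdm` keeps the progress bar.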


In [128]:
missing_prediction_df = pd.DataFrame({
    "isbn": isbns,
    "predicted_categories": predicted_cats
})
missing_prediction_df

Unnamed: 0,isbn,predicted_categories
0,752456830,Nonfiction
1,791407209,Fiction
2,961328983,Nonfiction
3,913750336,Nonfiction
4,972435395,Nonfiction
...,...,...
11211,1592538592,Nonfiction
11212,1454909684,Nonfiction
11213,1849940622,Nonfiction
11214,71819266,Nonfiction


In [129]:
# Show every book whose category is still "Other"
loaded_books_df.loc[loaded_books_df['category'] == 'Other']

Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag,category
7,Nick Le Neve Walmsley,At the time of her construction in the late 19...,[],752456830,5.00,2,R101: A Pictorial History,752456830 At the time of her construction in t...,Other
9,Mark Verman,The earliest medieval Jewish mystical writings...,[],791407209,4.75,1,The Books of Contemplation: Medieval Jewish My...,791407209 The earliest medieval Jewish mystica...,Other
13,Graham Purchase,"In this wide-ranging book, Graham Purchase, on...","[Biology, Ecology]",961328983,3.44,1,Anarchism & Environmental Survival,"961328983 In this wide-ranging book, Graham Pu...",Other
26,"Leo Bagrow,Raleigh Ashlin Skelton",This illustrated work is intended to acquaint ...,[],913750336,5.00,0,History of Cartography: Enlarged Second Edition,913750336 This illustrated work is intended to...,Other
32,Sumpter Priddy,"Between 1790 and 1840, millions of middle-clas...",[],972435395,4.13,0,American Fancy: Exuberance in the Arts 1790-1840,"972435395 Between 1790 and 1840, millions of m...",Other
...,...,...,...,...,...,...,...,...,...
75720,Ari Bendersky,Whether you're a food photographer or a food l...,"[Cookbooks, Food, Food and Drink]",1592538592,3.00,6,"1,000 Food Art and Styling Ideas: Mouthwaterin...",1592538592 Whether you're a food photographer ...,Other
75725,J.J. Mendoza Fernandez,"\n ,Keep your mind fit with brain aerobics!,\...",[],1454909684,3.00,1,Brain Aerobics Mindteasers,"1454909684 \n ,Keep your mind fit with brain ...",Other
75730,Judith Glover,"Drink up, with this fully revised and wonderfu...","[Cookbooks, Food and Drink]",1849940622,3.64,2,Drink Your Own Garden: A Homebrew Guide Using ...,"1849940622 Drink up, with this fully revised a...",Other
75745,Simon Monk,Design custom printed circuit boards with EAGL...,[],71819266,4.07,7,Make Your Own PCBs with Eagle: From Schematic ...,71819266 Design custom printed circuit boards ...,Other


In [130]:
# 1) drop any predicted_categories columns left over from a previous run
loaded_books_df = loaded_books_df.drop(
    columns=[c for c in loaded_books_df.columns if c.startswith("predicted_categories")],
    errors="ignore"
)

# 2) merge fresh
loaded_books_df = pd.merge(loaded_books_df, missing_prediction_df, on="isbn", how="left")

# 3) overwrite category only where needed
mask = (loaded_books_df["category"] == "Other") & loaded_books_df["predicted_categories"].notna()
loaded_books_df.loc[mask, "category"] = loaded_books_df.loc[mask, "predicted_categories"]

# 4) final df without helper column
books = loaded_books_df.drop(columns=["predicted_categories"])
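
A left merge on `isbn` adds rows whenever an ISBN appears more than once in the prediction frame. A minimal sketch of an alternative that looks predictions up with `Series.map`, which cannot change the row count. The toy frames below are assumed data standing in for `loaded_books_df` and `missing_prediction_df`:

```python
import pandas as pd

# Toy stand-ins for loaded_books_df and missing_prediction_df (assumed data)
toy_books = pd.DataFrame({
    "isbn": ["111", "222", "333"],
    "category": ["Fiction", "Other", "Other"],
})
toy_preds = pd.DataFrame({
    "isbn": ["222", "222", "333"],  # note the repeated ISBN
    "predicted_categories": ["Nonfiction", "Nonfiction", "Fiction"],
})

# Keep one prediction per ISBN, then look predictions up by ISBN
lookup = toy_preds.drop_duplicates("isbn").set_index("isbn")["predicted_categories"]
predicted = toy_books["isbn"].map(lookup)

# Overwrite only the rows that are still "Other" and have a prediction
mask = (toy_books["category"] == "Other") & predicted.notna()
toy_books.loc[mask, "category"] = predicted[mask]

print(toy_books)
```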

In [131]:
books

Unnamed: 0,author,desc,genre,isbn,rating,reviews,title,tag,category
0,Laurence M. Hauptman,Reveals that several hundred thousand Indians ...,"[Civil War, Military History, American History...",002914180X,3.52,5,Between Two Fires: American Indians in the Civ...,002914180X Reveals that several hundred thousa...,Nonfiction
1,"Charlotte Fiell,Emmanuelle Dirix",Fashion Sourcebook - 1920s is the first book i...,"[Couture, Nonfiction, Art, Fashion, Historical]",1906863482,4.51,6,Fashion Sourcebook 1920s,1906863482 Fashion Sourcebook - 1920s is the f...,Nonfiction
2,Andy Anderson,The seminal history and analysis of the Hungar...,"[Politics, History]",948984147,4.15,2,Hungary 56,948984147 The seminal history and analysis of ...,Nonfiction
3,Carlotta R. Anderson,"""All-American Anarchist"" chronicles the life a...","[Labor, History]",814327079,3.83,1,All-American Anarchist: Joseph A. Labadie and ...,"814327079 ""All-American Anarchist"" chronicles ...",Nonfiction
4,Jeffrey Pfeffer,Why is common sense so uncommon when it comes ...,"[Human Resources, Leadership, Management, Nonf...",875848419,3.73,7,The Human Equation: Building Profits by Puttin...,875848419 Why is common sense so uncommon when...,Nonfiction
...,...,...,...,...,...,...,...,...,...
75745,Simon Monk,Design custom printed circuit boards with EAGL...,[],71819266,4.07,7,Make Your Own PCBs with Eagle: From Schematic ...,71819266 Design custom printed circuit boards ...,Nonfiction
75746,"Tracie L. Miller-Nobles,Brenda L. Mattison,Ell...","Redefining tradition in learning accounting. ,...",[],133251241,4.05,1,Horngren's Financial & Managerial Accounting,133251241 Redefining tradition in learning acc...,Nonfiction
75747,C. John Miller,In these warm reflections on his own growth as...,"[Religion, Christian, Nonfiction, Theology, Ch...",875523919,4.27,20,A Faith Worth Sharing: A Lifetime of Conversat...,875523919 In these warm reflections on his own...,Nonfiction
75748,Albert Marrin,"John Brown is a man of many legacies, from her...","[Civil War, Military History, Juvenile, Middle...",307981533,3.63,51,A Volcano Beneath the Snow: John Brown's War A...,307981533 John Brown is a man of many legacies...,Nonfiction


In [132]:
books.to_csv('categorized_books.csv', index=False)