# Milestone 01
# Kumaran Singaram

## Best Practices for Assignments & Milestones

- <b>Break the assignment into sections - one section per numbered requirement.</b> Each assignment has numbered requirements/instructions, e.g., "1. Read the CIFAR-10 dataset". Each requirement should have at least one markdown cell and at least one code cell. Feel free to combine sections or make other sensible changes if that suits your code and the result is still clear. The intent is to give you a useful structure and to make sure you get full credit for your work.

- <b>Break the milestone into sections - one section for each item in the rubric.</b> Each milestone has rubric items, e.g., "5. Handle class imbalance problem". Each rubric item should have at least one markdown cell and at least one code cell. Feel free to combine sections or make other sensible changes if that suits your code and the result is still clear. The intent is to give you a useful structure and to make sure you get full credit for your work.

- <b>Include comments, with block comments preferred over in-line comments.</b> A good habit is to start each code cell with comments.

The above put into a useful pattern:

<b>Markdown cell:</b> Requirement #1: Read the CIFAR-10 dataset<br>
<b>Code cell:</b> Comments followed by code<br>
<b>Markdown cell:</b> Requirement #2: Explore the data<br>
<b>Code cell:</b> Comments followed by code<br>
<b>Markdown cell:</b> Requirement #3: Preprocess the data and prepare for classification<br>
<b>Code cell:</b> Comments followed by code<br>
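
As a rough illustration of the "comments followed by code" pattern, a code cell might look like the sketch below. The helper name `count_words` and the requirement wording in the comments are made up for this example; they are not part of any assignment.

```python
# Requirement #2 (illustrative): Explore the data
# - count the words in a piece of text
# - treat a missing (None) or empty text as zero words

def count_words(text):
    # `text or ""` turns None into an empty string before splitting.
    return len((text or "").split())

print(count_words("Read the CIFAR-10 dataset"))
```

The block comment at the top states what the cell is for, so the code below it needs only occasional in-line notes.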

For more information:
- A good notebook example: [DataFrame Basics](https://github.com/Tanu-N-Prabhu/Python/blob/master/Pandas/Pandas_DataFrame.ipynb) 
- More example notebooks: [A gallery of interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#pandas-for-data-analysis)
- [PEP 8 on commenting](https://www.python.org/dev/peps/pep-0008/)
- [PEP 257 - docstrings](https://www.python.org/dev/peps/pep-0257/)

Occasionally an assignment or milestone will ask you to do something other than write Python code, e.g., to turn in a .docx file. In that case, please structure your work logically, but the specific notes above may not apply.

**Requirement #1: Import JSON file - books.json**

In [1]:
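# Load the books data from the JSON file.
# json.load parses the file contents into Python objects.
import json

with open('data/books.json') as f:
    books_dict = json.load(f)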

**Requirement #2: Loop through each book in the data and extract the following information:**

num_authors: number of authors (we can extract this from the list of authors)

isbn: this can be directly extracted

pageCount: this can be directly extracted

title: this can be directly extracted

desc_len: the number of words in the long description (we can extract this from the longDescription entry for each book; use 0 if there is no longDescription entry)

has_word_data: whether the word "data" appears in the longDescription entry of each book. This is a True / False column, also called a binary column or a flag column.

In [3]:
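# Initialize an empty list to collect one row per book.
# Loop over the books and count the authors of each.
# Build each row as a tuple:
# (title, pageCount, num_authors, desc_len, has_word_data, isbn).
# Books missing any of these keys raise KeyError and are skipped.

rows = []

for item in books_dict:
    num_authors = len(item['authors'])
    try:
        row = (item['title'],
               item['pageCount'],
               num_authors,
               len(item['longDescription'].split()),  # desc_len: word count
               "data" in item['longDescription'],     # has_word_data: substring match
               item['isbn'])
        rows.append(row)
    except KeyError:
        # Skip books that lack one of the required fields.
        continue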
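
The loop above skips any book that lacks a field, whereas the requirement asks for a `desc_len` of 0 when there is no longDescription. One possible way to keep every book is to read each field with `dict.get` and a default. The sketch below is an alternative under that assumption, not the approach used in this notebook. The helper `extract_row`, the whole-word matching rule, and the two sample records are invented for illustration.

```python
import re

def extract_row(book):
    """Build one output row from a single book record (a dict)."""
    # .get() supplies a default when a key is absent, so no book is skipped.
    description = book.get("longDescription", "")
    # Tokenize into lowercase words so "database" does not count as "data".
    words = re.findall(r"[a-z0-9']+", description.lower())
    return {
        "title": book.get("title"),
        "isbn": book.get("isbn"),
        "pageCount": book.get("pageCount", 0),
        "num_authors": len(book.get("authors", [])),
        "desc_len": len(description.split()),
        "has_word_data": "data" in words,
    }

# Two made-up records, for illustration only.
sample_books = [
    {"title": "Book A", "isbn": "111", "pageCount": 100,
     "authors": ["X", "Y"],
     "longDescription": "All about data and databases."},
    {"title": "Book B", "authors": ["Z"]},  # no longDescription, no isbn
]

sample_rows = [extract_row(book) for book in sample_books]
print(sample_rows[1]["desc_len"], sample_rows[1]["has_word_data"])
```

Whether "data" should match inside longer words such as "database" is a judgment call; the cell above uses a plain substring test, while this sketch matches whole words only.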

**Requirement #3: Import the pandas library as pd and use pd.DataFrame to create structured tabular data whose columns match the list above and whose content is the content you extracted above. Show the first 20 rows of the data to make sure the data looks alright. You can do that using the .head(20) method.**

In [85]:
# Import pandas to use pd.DataFrame.
# Build the DataFrame from the extracted rows, naming each column.
# Show the first 20 rows.
import pandas as pd

df = pd.DataFrame(rows, columns=['title', 'page count', '# of authors', 'desc word count', 'data?', 'isbn'])
df.head(20)

| | title | page count | # of authors | desc word count | data? | isbn |
|---|---|---|---|---|---|---|
| 0 | Unlocking Android | 416 | 3 | 252 | True | 1933988673 |
| 1 | Android in Action, Second Edition | 592 | 2 | 101 | False | 1935182722 |
| 2 | Flex 3 in Action | 576 | 2 | 254 | True | 1933988746 |
| 3 | Flex 4 in Action | 600 | 4 | 329 | True | 1935182420 |
| 4 | Collective Intelligence in Action | 425 | 1 | 244 | True | 1933988312 |
| 5 | Zend Framework in Action | 432 | 3 | 291 | True | 1933988320 |
| 6 | Flex on Java | 265 | 2 | 273 | True | 1933988797 |
| 7 | Griffon in Action | 375 | 4 | 240 | False | 1935182234 |
| 8 | OSGi in Depth | 325 | 1 | 269 | False | 193518217X |
| 9 | Flexible Rails | 592 | 1 | 374 | True | 1933988509 |
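
The requirement asks for columns that match the list in Requirement #2, while the cell above uses display-style names such as `page count`. A minimal sketch of the same construction with the requirement's names is shown below, using the first two rows of the output above as sample data. The variable names `records`, `columns`, and `books_df` are chosen for this example only.

```python
import pandas as pd

# First two rows copied from the output above; ISBNs are kept as strings
# because some end in a letter (e.g. 193518217X).
records = [
    ("Unlocking Android", 416, 3, 252, True, "1933988673"),
    ("Android in Action, Second Edition", 592, 2, 101, False, "1935182722"),
]

# Column names taken from the Requirement #2 list, in row-tuple order.
columns = ["title", "pageCount", "num_authors", "desc_len", "has_word_data", "isbn"]

books_df = pd.DataFrame(records, columns=columns)
print(books_df.head(20))
```

Names without spaces or symbols are also easier to use later, for example in `books_df.query("desc_len > 200")`.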
