# Ramsey King
# DSC 540 - Data Preparation
# July 3, 2021
# Milestone 2

### _Cleaning/Formatting Flat File Source_
### Perform at least 5 data transformation and/or cleansing steps to your flat file data.

The flat file I have chosen for my term project is a tabular version of the King James Version of the Bible, taken from the R package 'scriptuRs' and exported to a CSV file in RStudio.

Because the data comes from a published R package, it is fairly clean to begin with. The six data transformations that I will perform are as follows:

1. Check for missing values in any of the columns that will be kept in the final data set.

2. Delete the 'volume_subtitle' column.

3. Delete the 'book_subtitle' column.

These columns will be deleted because, at this time, they do not provide any information pertinent to the term project.

4. Strip whitespace (if any) from the 'verse_title' column.

5. Strip whitespace (if any) from the 'verse_short_title' column.

This ensures that stray leading or trailing whitespace will not cause mismatches when joining the other datasets. These two columns will be important for joining all of the data together.

6. Add a column heading of 'row_id' to the first column.

This will be done because the first column does not have a header, so pandas labels it 'Unnamed: 0' by default.

In [1]:
import numpy as np
import pandas as pd

# Load the kjv.csv file

kjv_data = pd.read_csv('kjv.csv', sep=",")
kjv_data.head()

,Unnamed: 0,volume_id,book_id,chapter_id,verse_id,volume_title,book_title,volume_long_title,book_long_title,volume_subtitle,book_subtitle,volume_short_title,book_short_title,volume_lds_url,book_lds_url,chapter_number,verse_number,text,verse_title,verse_short_title
0,1,1,1,1,1,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,,OT,Gen.,ot,gen,1,1,IN the beginning God created the heaven and th...,Genesis 1:1,Gen. 1:1
1,2,1,1,1,2,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,,OT,Gen.,ot,gen,1,2,"And the earth was without form, and void; and ...",Genesis 1:2,Gen. 1:2
2,3,1,1,1,3,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,,OT,Gen.,ot,gen,1,3,"And God said, Let there be light: and there wa...",Genesis 1:3,Gen. 1:3
3,4,1,1,1,4,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,,OT,Gen.,ot,gen,1,4,"And God saw the light, that it was good: and G...",Genesis 1:4,Gen. 1:4
4,5,1,1,1,5,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,,OT,Gen.,ot,gen,1,5,"And God called the light Day, and the darkness...",Genesis 1:5,Gen. 1:5

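The column that pandas calls 'Unnamed: 0' presumably comes from a header cell left empty in kjv.csv: pandas names any column without a header 'Unnamed: N', where N is its position. A minimal sketch with a made-up two-row CSV string (not the real file) shows that behavior, and how passing `index_col=0` would instead use that column as the row index:

```python
import io

import pandas as pd

# Made-up CSV text mimicking kjv.csv: the first header cell is empty
csv_text = ',verse_id,verse_title\n1,1,Genesis 1:1\n2,2,Genesis 1:2\n'

# Default read: the header-less column is named 'Unnamed: 0'
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['Unnamed: 0', 'verse_id', 'verse_title']

# Alternative: treat the header-less column as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))  # ['verse_id', 'verse_title']
```

Keeping the default read, as this notebook does, leaves the counter as an ordinary column that can be renamed later.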

In [2]:
# Transformation 1: Check for missing values in any of the columns that will be kept in the final data set.

for c in kjv_data.columns:
    miss = kjv_data[c].isnull().sum()
    if miss>0:
        print("{} has {} missing value(s).".format(c,miss))
    else:
        print("{} has no missing values.".format(c))

Unnamed: 0 has no missing values.
volume_id has no missing values.
book_id has no missing values.
chapter_id has no missing values.
verse_id has no missing values.
volume_title has no missing values.
book_title has no missing values.
volume_long_title has no missing values.
book_long_title has no missing values.
volume_subtitle has 23145 missing value(s).
book_subtitle has 28062 missing value(s).
volume_short_title has no missing values.
book_short_title has no missing values.
volume_lds_url has no missing values.
book_lds_url has no missing values.
chapter_number has no missing values.
verse_number has no missing values.
text has no missing values.
verse_title has no missing values.
verse_short_title has no missing values.


The volume_subtitle and book_subtitle columns will be deleted because they are mostly empty (23,145 and 28,062 of the 31,102 rows are missing, roughly 74% and 90%) and they do not appear to be pertinent to the project.
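The loop in Transformation 1 can also be written as a single vectorized call that reports both the count and the share of missing values per column. This is a minimal sketch on a small made-up DataFrame whose column names mirror kjv_data; the values are illustrative only:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for kjv_data; values are made up
df = pd.DataFrame({
    'verse_title': ['Genesis 1:1', 'Genesis 1:2', 'Matthew 1:1', 'Matthew 1:2'],
    'volume_subtitle': [np.nan, np.nan, np.nan, 'Subtitle'],
    'book_subtitle': [np.nan, np.nan, np.nan, np.nan],
})

# Count and fraction of missing values in every column at once
missing = pd.DataFrame({
    'n_missing': df.isnull().sum(),
    'share_missing': df.isnull().mean(),
})
print(missing)

# Columns with any missing values, i.e. the candidates for removal
print(missing.index[missing['n_missing'] > 0].tolist())
# ['volume_subtitle', 'book_subtitle']
```

Looking at the share rather than the raw count makes it easier to judge whether a column is worth keeping at all.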

In [3]:
# 2. Delete the 'volume_subtitle' column.

del kjv_data['volume_subtitle']
kjv_data

,Unnamed: 0,volume_id,book_id,chapter_id,verse_id,volume_title,book_title,volume_long_title,book_long_title,book_subtitle,volume_short_title,book_short_title,volume_lds_url,book_lds_url,chapter_number,verse_number,text,verse_title,verse_short_title
0,1,1,1,1,1,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,OT,Gen.,ot,gen,1,1,IN the beginning God created the heaven and th...,Genesis 1:1,Gen. 1:1
1,2,1,1,1,2,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,OT,Gen.,ot,gen,1,2,"And the earth was without form, and void; and ...",Genesis 1:2,Gen. 1:2
2,3,1,1,1,3,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,OT,Gen.,ot,gen,1,3,"And God said, Let there be light: and there wa...",Genesis 1:3,Gen. 1:3
3,4,1,1,1,4,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,OT,Gen.,ot,gen,1,4,"And God saw the light, that it was good: and G...",Genesis 1:4,Gen. 1:4
4,5,1,1,1,5,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,,OT,Gen.,ot,gen,1,5,"And God called the light Day, and the darkness...",Genesis 1:5,Gen. 1:5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31097,31098,2,66,1189,31098,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,,NT,Rev.,nt,rev,22,17,"And the Spirit and the bride say, Come. And le...",Revelation 22:17,Rev. 22:17
31098,31099,2,66,1189,31099,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,,NT,Rev.,nt,rev,22,18,For I testify unto every man that heareth the ...,Revelation 22:18,Rev. 22:18
31099,31100,2,66,1189,31100,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,,NT,Rev.,nt,rev,22,19,And if any man shall take away from the words ...,Revelation 22:19,Rev. 22:19
31100,31101,2,66,1189,31101,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,,NT,Rev.,nt,rev,22,20,"He which testifieth these things saith, Surely...",Revelation 22:20,Rev. 22:20


In [4]:
# 3. Delete the 'book_subtitle' column.

del kjv_data['book_subtitle']
kjv_data

,Unnamed: 0,volume_id,book_id,chapter_id,verse_id,volume_title,book_title,volume_long_title,book_long_title,volume_short_title,book_short_title,volume_lds_url,book_lds_url,chapter_number,verse_number,text,verse_title,verse_short_title
0,1,1,1,1,1,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,1,IN the beginning God created the heaven and th...,Genesis 1:1,Gen. 1:1
1,2,1,1,1,2,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,2,"And the earth was without form, and void; and ...",Genesis 1:2,Gen. 1:2
2,3,1,1,1,3,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,3,"And God said, Let there be light: and there wa...",Genesis 1:3,Gen. 1:3
3,4,1,1,1,4,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,4,"And God saw the light, that it was good: and G...",Genesis 1:4,Gen. 1:4
4,5,1,1,1,5,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,5,"And God called the light Day, and the darkness...",Genesis 1:5,Gen. 1:5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31097,31098,2,66,1189,31098,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,NT,Rev.,nt,rev,22,17,"And the Spirit and the bride say, Come. And le...",Revelation 22:17,Rev. 22:17
31098,31099,2,66,1189,31099,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,NT,Rev.,nt,rev,22,18,For I testify unto every man that heareth the ...,Revelation 22:18,Rev. 22:18
31099,31100,2,66,1189,31100,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,NT,Rev.,nt,rev,22,19,And if any man shall take away from the words ...,Revelation 22:19,Rev. 22:19
31100,31101,2,66,1189,31101,New Testament,Revelation,The New Testament,The Revelation of St John the Divine,NT,Rev.,nt,rev,22,20,"He which testifieth these things saith, Surely...",Revelation 22:20,Rev. 22:20
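Transformations 2 and 3 could also be done in one step with `DataFrame.drop`. A minimal sketch on a made-up stand-in for kjv_data; unlike `del`, `drop` returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

# Illustrative stand-in for kjv_data; values are made up
df = pd.DataFrame({
    'verse_title': ['Genesis 1:1', 'Genesis 1:2'],
    'volume_subtitle': [None, None],
    'book_subtitle': [None, None],
})

# Drop both subtitle columns in one call; df itself is not modified
cleaned = df.drop(columns=['volume_subtitle', 'book_subtitle'])
print(cleaned.columns.tolist())  # ['verse_title']
```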


In [5]:
# 4. Strip whitespace (if any) from the 'verse_title' column.

kjv_data['verse_title'] = kjv_data['verse_title'].str.strip()
kjv_data['verse_title']

0             Genesis 1:1
1             Genesis 1:2
2             Genesis 1:3
3             Genesis 1:4
4             Genesis 1:5
               ...       
31097    Revelation 22:17
31098    Revelation 22:18
31099    Revelation 22:19
31100    Revelation 22:20
31101    Revelation 22:21
Name: verse_title, Length: 31102, dtype: object
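Because the printed values look the same before and after stripping, it can help to count how many entries actually carried leading or trailing whitespace. A minimal sketch on made-up titles, two of which have stray spaces added on purpose:

```python
import pandas as pd

# Made-up titles; the second and third carry stray spaces on purpose
titles = pd.Series(['Genesis 1:1', 'Genesis 1:2 ', ' Genesis 1:3', 'Genesis 1:4'])

stripped = titles.str.strip()

# Number of entries that strip() actually changed
n_changed = (titles != stripped).sum()
print(n_changed)  # 2
```

Running the same comparison on kjv_data before reassigning the column would show whether the step changed anything in the real data.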

In [6]:
# 5. Strip whitespace (if any) from the 'verse_short_title' column.

kjv_data['verse_short_title'] = kjv_data['verse_short_title'].str.strip()
kjv_data['verse_short_title']

0          Gen. 1:1
1          Gen. 1:2
2          Gen. 1:3
3          Gen. 1:4
4          Gen. 1:5
            ...    
31097    Rev. 22:17
31098    Rev. 22:18
31099    Rev. 22:19
31100    Rev. 22:20
31101    Rev. 22:21
Name: verse_short_title, Length: 31102, dtype: object
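The reason for stripping these columns is the later join: pandas merges on exact key matches, so 'Gen. 1:2 ' and 'Gen. 1:2' are different keys. A small sketch with two made-up tables (the `notes` table and its `note` column are hypothetical, standing in for one of the other datasets) shows a row being lost in an inner merge until the key is stripped:

```python
import pandas as pd

# Made-up verse table; one key carries a trailing space
verses = pd.DataFrame({
    'verse_short_title': ['Gen. 1:1', 'Gen. 1:2 '],
    'text': ['IN the beginning ...', 'And the earth ...'],
})

# Hypothetical second dataset to be joined on the same key
notes = pd.DataFrame({
    'verse_short_title': ['Gen. 1:1', 'Gen. 1:2'],
    'note': ['creation', 'formless earth'],
})

# Inner merge before stripping: 'Gen. 1:2 ' finds no partner
before = verses.merge(notes, on='verse_short_title', how='inner')
print(len(before))  # 1

# Strip the key, then merge again: both rows match
verses['verse_short_title'] = verses['verse_short_title'].str.strip()
after = verses.merge(notes, on='verse_short_title', how='inner')
print(len(after))  # 2
```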

In [7]:
# 6. Add a column heading of 'row_id' to the first column.

kjv_data.rename(columns={'Unnamed: 0': 'row_id'}, inplace=True)
kjv_data.head()

,row_id,volume_id,book_id,chapter_id,verse_id,volume_title,book_title,volume_long_title,book_long_title,volume_short_title,book_short_title,volume_lds_url,book_lds_url,chapter_number,verse_number,text,verse_title,verse_short_title
0,1,1,1,1,1,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,1,IN the beginning God created the heaven and th...,Genesis 1:1,Gen. 1:1
1,2,1,1,1,2,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,2,"And the earth was without form, and void; and ...",Genesis 1:2,Gen. 1:2
2,3,1,1,1,3,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,3,"And God said, Let there be light: and there wa...",Genesis 1:3,Gen. 1:3
3,4,1,1,1,4,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,4,"And God saw the light, that it was good: and G...",Genesis 1:4,Gen. 1:4
4,5,1,1,1,5,Old Testament,Genesis,The Old Testament,The First Book of Moses called Genesis,OT,Gen.,ot,gen,1,5,"And God called the light Day, and the darkness...",Genesis 1:5,Gen. 1:5
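To rerun these steps on a fresh copy of the file, they could be collected into one function. This is a sketch rather than the notebook's own code: the function name `clean_kjv` is hypothetical, and it uses `drop`, assignment, and a non-inplace `rename` so the input DataFrame is left untouched. The usage example runs it on a made-up two-row frame with a subset of the kjv.csv column names, not on the real file:

```python
import pandas as pd


def clean_kjv(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the six cleaning steps and return a new DataFrame (hypothetical helper)."""
    # Step 1: report columns that contain missing values
    missing = df.isnull().sum()
    for col, n in missing[missing > 0].items():
        print(f"{col} has {n} missing value(s).")

    # Steps 2 and 3: drop the subtitle columns
    out = df.drop(columns=['volume_subtitle', 'book_subtitle'])

    # Steps 4 and 5: strip leading/trailing whitespace from the join columns
    for col in ['verse_title', 'verse_short_title']:
        out[col] = out[col].str.strip()

    # Step 6: give the header-less first column a name
    return out.rename(columns={'Unnamed: 0': 'row_id'})


# Made-up sample using a subset of the kjv.csv column names
sample = pd.DataFrame({
    'Unnamed: 0': [1, 2],
    'volume_subtitle': [None, None],
    'book_subtitle': [None, None],
    'verse_title': ['Genesis 1:1 ', 'Genesis 1:2'],
    'verse_short_title': [' Gen. 1:1', 'Gen. 1:2'],
})

cleaned = clean_kjv(sample)
print(cleaned.columns.tolist())
# ['row_id', 'verse_title', 'verse_short_title']
```

Returning a new DataFrame instead of mutating in place makes it easy to compare the raw and cleaned data side by side.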
