# Contents
- [Directory Navigation](#dir_nav)  
- [Read Files](#rd_f)
- [Write Files](#wt_f) 
- [Compare Files](#fl_comp)
- [Requests](#rqsts)
- [Read PDF](#rd_pdf)

In [1]:
import os

# Directory Navigation <a name="dir_nav"></a>

## List Items in Directory

In [2]:
# absolute path reference
global_path = '/home/kevcon/ds/github/ds-understanding/coding_reference/data'
os.listdir(global_path)

['dog.py',
 'wc_image.png',
 'write_2.txt',
 'usa.png',
 'write.txt',
 'plt_ref_fig.png',
 'Life_expectancy_dataset.csv',
 'write.json',
 '__pycache__',
 'Concrete_Data.csv',
 'nlp_paper.pdf',
 '.Rhistory',
 'life_expectancy_out.csv']

In [3]:
# relative path reference (resolved against the current working directory)
local_path = 'data'
os.listdir(local_path)

['dog.py',
 'wc_image.png',
 'write_2.txt',
 'usa.png',
 'write.txt',
 'plt_ref_fig.png',
 'Life_expectancy_dataset.csv',
 'write.json',
 '__pycache__',
 'Concrete_Data.csv',
 'nlp_paper.pdf',
 '.Rhistory',
 'life_expectancy_out.csv']
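
The two listings match because a relative path is resolved against the current working directory. The sketch below illustrates that relationship using only the standard library; the `data` folder name is borrowed from the cell above and does not need to exist for the path arithmetic to work.

```python
import os

relative_path = 'data'

# A relative path is interpreted relative to the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True

# abspath behaves like joining the working directory and normalising the result.
manual_path = os.path.normpath(os.path.join(os.getcwd(), relative_path))
print(absolute_path == manual_path)  # True
```

If the notebook is started from a different folder, the relative form will point somewhere else, while the absolute form will not.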

### List only files

In [4]:
[f for f in os.listdir(local_path) if os.path.isfile(os.path.join(local_path, f))]

['dog.py',
 'wc_image.png',
 'write_2.txt',
 'usa.png',
 'write.txt',
 'plt_ref_fig.png',
 'Life_expectancy_dataset.csv',
 'write.json',
 'Concrete_Data.csv',
 'nlp_paper.pdf',
 '.Rhistory',
 'life_expectancy_out.csv']
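
As a possible alternative to combining `os.listdir` with `os.path.isfile`, `os.scandir` yields entries that already know whether they are files or directories. This is a minimal sketch run against a temporary directory; the names `notes.txt` and `subfolder` are made up for the demonstration.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, 'notes.txt'), 'w').close()  # hypothetical file
    os.mkdir(os.path.join(tmp_dir, 'subfolder'))            # hypothetical folder

    files_only, dirs_only = [], []
    with os.scandir(tmp_dir) as entries:
        for entry in entries:
            # Each entry exposes is_file() / is_dir() without a separate join.
            if entry.is_file():
                files_only.append(entry.name)
            elif entry.is_dir():
                dirs_only.append(entry.name)

print(files_only)  # ['notes.txt']
print(dirs_only)   # ['subfolder']
```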

### Glob

In [5]:
import glob

In [6]:
# return files of a given type using an absolute path
glob.glob('/home/kevcon/ds/github/ds-understanding/coding_reference/data/*.csv')

['/home/kevcon/ds/github/ds-understanding/coding_reference/data/Life_expectancy_dataset.csv',
 '/home/kevcon/ds/github/ds-understanding/coding_reference/data/Concrete_Data.csv',
 '/home/kevcon/ds/github/ds-understanding/coding_reference/data/life_expectancy_out.csv']

In [7]:
# return files of a given type using a relative path
glob.glob('data/*.csv')

['data/Life_expectancy_dataset.csv',
 'data/Concrete_Data.csv',
 'data/life_expectancy_out.csv']

In [8]:
# '?' matches exactly one character
glob.glob('data/us?.png')

['data/usa.png']

In [9]:
# '[dup]' matches one character from the set, so names starting with 'd', 'u', or 'p'
glob.glob('data/[dup]*')

['data/dog.py', 'data/usa.png', 'data/plt_ref_fig.png']
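
The wildcard rules used above (`*`, `?`, and `[...]`) can be checked in isolation. The sketch below builds a temporary folder with a hypothetical file layout and also shows `**` with `recursive=True`, which is not used elsewhere in this notebook; the helper `names` is defined here only for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file layout, created only for this demonstration.
    os.mkdir(os.path.join(tmp_dir, 'nested'))
    for name in ['usa.png', 'usb.png', 'dog.py', os.path.join('nested', 'deep.csv')]:
        open(os.path.join(tmp_dir, name), 'w').close()

    def names(pattern, **kwargs):
        """Return the sorted base names of paths in tmp_dir matching pattern."""
        matches = glob.glob(os.path.join(tmp_dir, pattern), **kwargs)
        return sorted(os.path.basename(p) for p in matches)

    single_char = names('us?.png')                    # '?' is exactly one character
    char_set = names('[du]*')                         # first character is 'd' or 'u'
    recursive = names('**/*.csv', recursive=True)     # '**' descends into subfolders

print(single_char)  # ['usa.png', 'usb.png']
print(char_set)     # ['dog.py', 'usa.png', 'usb.png']
print(recursive)    # ['deep.csv']
```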

# Read Files <a name="rd_f"></a>

In [10]:
path = '/home/kevcon/nltk_data/corpora/movie_reviews/pos/'

In [11]:
file = os.listdir(path)[0]

## Read File as String

In [12]:
# open file
f = open(path + file, 'r')
# read all lines as single string
document = f.read()
# close file
f.close()

In [13]:
document

' " love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline . \nwatching it , i was enthralled with the look of the film , the way the director shot everything like it was looking through a bizarre , personalized filter . \neverything looks like it is not how life looks like but how painter francis bacon , the film\'s subject , looked at it personally . \nbut while i was engrossed , i stumbled upon my thoughts halfway through the film , awakened from my trance by some inner distraction , and began to try and follow what\'s going on . \nexactly what was i looking at ? \nwatching this film , i wasn\'t sure if it was the most insightful film i had ever seen or the most vacuous . \ndirected ( and written ) by john maybury , " love is the devil " is stylish masterpiece for the senses . \neverything looks originally bizarr

In [14]:
document[0]

' '

In [15]:
# read only the first 20 characters
f = open(path + file, 'r')
document = f.read(20)
f.close()

In [16]:
document

' " love is the devil'

## Read File as List of Lines

In [17]:
f = open(path + file, 'r')
# read each line
document = f.readlines()
f.close()

In [18]:
document

[' " love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline . \n',
 'watching it , i was enthralled with the look of the film , the way the director shot everything like it was looking through a bizarre , personalized filter . \n',
 "everything looks like it is not how life looks like but how painter francis bacon , the film's subject , looked at it personally . \n",
 "but while i was engrossed , i stumbled upon my thoughts halfway through the film , awakened from my trance by some inner distraction , and began to try and follow what's going on . \n",
 'exactly what was i looking at ? \n',
 "watching this film , i wasn't sure if it was the most insightful film i had ever seen or the most vacuous . \n",
 'directed ( and written ) by john maybury , " love is the devil " is stylish masterpiece for the senses . \n',
 'e

In [19]:
document[0]

' " love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline . \n'

## Read File using With

In [20]:
# read file using with
with open(path + file, 'r') as f:
    document = f.read()

In [21]:
document

' " love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline . \nwatching it , i was enthralled with the look of the film , the way the director shot everything like it was looking through a bizarre , personalized filter . \neverything looks like it is not how life looks like but how painter francis bacon , the film\'s subject , looked at it personally . \nbut while i was engrossed , i stumbled upon my thoughts halfway through the film , awakened from my trance by some inner distraction , and began to try and follow what\'s going on . \nexactly what was i looking at ? \nwatching this film , i wasn\'t sure if it was the most insightful film i had ever seen or the most vacuous . \ndirected ( and written ) by john maybury , " love is the devil " is stylish masterpiece for the senses . \neverything looks originally bizarr

## Read File by Line

In [22]:
lines = []
with open(path + file, 'r') as f:
    for line in f:
        lines.append(line)

In [23]:
lines

[' " love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline . \n',
 'watching it , i was enthralled with the look of the film , the way the director shot everything like it was looking through a bizarre , personalized filter . \n',
 "everything looks like it is not how life looks like but how painter francis bacon , the film's subject , looked at it personally . \n",
 "but while i was engrossed , i stumbled upon my thoughts halfway through the film , awakened from my trance by some inner distraction , and began to try and follow what's going on . \n",
 'exactly what was i looking at ? \n',
 "watching this film , i wasn't sure if it was the most insightful film i had ever seen or the most vacuous . \n",
 'directed ( and written ) by john maybury , " love is the devil " is stylish masterpiece for the senses . \n',
 'e

## Remove Newline Characters

In [24]:
lines = []
with open(path + file, 'r') as f:
    for line in f:
        # strip whitespace (including the trailing newline) from both ends
        lines.append(line.strip())

In [25]:
lines

['" love is the devil " is a challenging film , munundating its audience with wild imagery and a plot structure that disallows a plot , perhaps in an attempt to get us to know the artist\'s psyche rather than the artist\'s lifeline .',
 'watching it , i was enthralled with the look of the film , the way the director shot everything like it was looking through a bizarre , personalized filter .',
 "everything looks like it is not how life looks like but how painter francis bacon , the film's subject , looked at it personally .",
 "but while i was engrossed , i stumbled upon my thoughts halfway through the film , awakened from my trance by some inner distraction , and began to try and follow what's going on .",
 'exactly what was i looking at ?',
 "watching this film , i wasn't sure if it was the most insightful film i had ever seen or the most vacuous .",
 'directed ( and written ) by john maybury , " love is the devil " is stylish masterpiece for the senses .',
 'everything looks origin
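
Note that `strip()` also removed the leading space of the first line in the output above. Where only the newline should go, `rstrip('\n')` or `str.splitlines()` may be a better fit. The sketch below compares the three on a small temporary file whose name and contents are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file
    with open(sample_path, 'w') as f:
        f.write('  indented first line \nsecond line\n')

    with open(sample_path, 'r') as f:
        raw_lines = f.readlines()

    with open(sample_path, 'r') as f:
        # splitlines() splits on line boundaries and drops them in one step.
        split_lines = f.read().splitlines()

# strip() removes whitespace on both sides, including leading indentation.
stripped = [line.strip() for line in raw_lines]
# rstrip('\n') removes only the trailing newline.
newline_only = [line.rstrip('\n') for line in raw_lines]

print(stripped)      # ['indented first line', 'second line']
print(newline_only)  # ['  indented first line ', 'second line']
print(split_lines)   # ['  indented first line ', 'second line']
```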

# Write Files <a name="wt_f"></a>

In [1]:
# write file line by line 
write_file = open('data/write_1.txt', 'w') 
write_file.write('Hello World \n') 
write_file.write('This is our new text file\n') 
write_file.write('and this is another line.\n') 
write_file.write('Why? Because we can.') 
write_file.close() 

In [2]:
# write file using with
with open('data/write_2.txt', 'w') as outfile:
    outfile.write('Hello World \nThis is our new text file\nand this is another line.\nWhy? Because we can.')
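
Both cells above open the file in `'w'` mode, which truncates any existing content. As a rough illustration of the difference from append mode `'a'`, and of `writelines`, here is a sketch that writes to a temporary file; the file name `write_demo.txt` is an assumption made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'write_demo.txt')  # hypothetical file name

    # 'w' creates the file, or empties it if it already exists.
    with open(out_path, 'w') as outfile:
        outfile.write('Hello World\n')

    # 'a' keeps the existing content and adds to the end.
    with open(out_path, 'a') as outfile:
        # writelines does not add newlines; each string must carry its own.
        outfile.writelines(['This is our new text file\n',
                            'and this is another line.\n'])

    with open(out_path, 'r') as infile:
        contents = infile.read()

print(contents)
```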

# Export Object

In [3]:
# define object (dictionary)
write_dict = {'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}

## Text

In [4]:
# create and write data to text file
with open('data/write.txt', 'w') as outfile:
    outfile.write(str(write_dict))

In [5]:
# open and read text file
with open('data/write.txt', 'r') as infile:
    document = infile.read()

In [6]:
document

"{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}"

### convert back to dictionary

In [7]:
import ast

In [8]:
ast.literal_eval(document)

{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}
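
`ast.literal_eval` only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`), which is why it is generally preferred over `eval` for this kind of round trip. A small sketch of both the accepted and the rejected case; the rejected expression is an arbitrary example of non-literal code.

```python
import ast

text = "{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}"
restored = ast.literal_eval(text)
print(type(restored).__name__)  # dict

# A function call is not a literal, so it is refused rather than executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(rejected)  # True
```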

## CSV

In [9]:
import csv

In [10]:
# create and write data to csv file
with open('data/write.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for key, value in write_dict.items():
        writer.writerow([key, value])

In [11]:
# create a csv reader (rows are read lazily, so they must be consumed inside the with block)
with open('data/write.csv') as infile:
    reader = csv.reader(infile)

In [12]:
reader

<_csv.reader at 0x7fa8e0108518>

### convert back to dictionary

In [14]:
with open('data/write.csv') as infile:
    reader = dict(csv.reader(infile))

In [15]:
reader

{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}
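
One detail worth seeing in the raw file is that the value `Et tu, Brute??` contains a comma, so the `csv` module quotes it on the way out and unquotes it on the way in. The sketch below repeats the round trip in a temporary directory (the file name `write_demo.csv` is made up) and opens files with `newline=''`, as the `csv` documentation recommends.

```python
import csv
import os
import tempfile

write_dict = {'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'write_demo.csv')  # hypothetical file name

    # newline='' lets the csv module control line endings itself.
    with open(csv_path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(write_dict.items())

    # The raw text shows that the field containing a comma was quoted.
    with open(csv_path, newline='') as infile:
        raw_text = infile.read()

    # Rows are read lazily, so build the dict while the file is still open.
    with open(csv_path, newline='') as infile:
        restored = dict(csv.reader(infile))

print(repr(raw_text))
print(restored == write_dict)  # True
```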

## JSON

In [16]:
import json

In [17]:
# create and write data to json file
with open('data/write.json', 'w') as outfile:
    json.dump(write_dict, outfile)

In [18]:
# open and read json file
with open('data/write.json', 'r') as infile:
    document = json.load(infile)

In [19]:
document

{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}
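
The dictionary above survives the JSON round trip unchanged because it only holds strings. With other types the result can differ: JSON object keys are always strings, and tuples come back as lists. A short sketch with an invented record, using `json.dumps` and `json.loads` so no file is needed:

```python
import json

# Hypothetical record mixing types that JSON does not represent exactly.
record = {1: 'one', 'values': (1, 2), 'flag': None}

encoded = json.dumps(record, indent=2)  # indent makes the text human-readable
decoded = json.loads(encoded)

print(decoded)  # {'1': 'one', 'values': [1, 2], 'flag': None}
```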

## Pickle

In [20]:
import pickle

In [21]:
# create and write data to pkl file
with open('data/write.pkl', 'wb') as outfile:
    pickle.dump(write_dict, outfile)

In [22]:
# open and read pkl file
with open('data/write.pkl', 'rb') as infile:
    document = pickle.load(infile)

In [23]:
document

{'Brutus': 'Sic Semper Tyrannis!', 'Julius': 'Et tu, Brute??'}
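
Unlike JSON, pickle keeps Python-specific types such as tuples, sets, and non-string keys intact. The sketch below uses an invented record and the in-memory `pickle.dumps` / `pickle.loads` pair. Keep in mind that unpickling can execute arbitrary code, so it should only be used on data from a trusted source.

```python
import pickle

# Hypothetical record with types that JSON would alter.
record = {1: 'one', 'values': (1, 2), 'tags': {'a', 'b'}}

payload = pickle.dumps(record)   # bytes, which is why files are opened with 'wb' / 'rb'
restored = pickle.loads(payload)

print(restored == record)                 # True
print(type(restored['values']).__name__)  # tuple
```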

# Compare Files <a name="fl_comp"></a>

In [24]:
import filecmp

In [25]:
filecmp.cmp('data/write_1.txt', 'data/write_2.txt')

True

In [27]:
filecmp.cmp('data/write.txt', 'data/write_1.txt')

False
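
By default `filecmp.cmp` runs in shallow mode: if the `os.stat` signatures (file type, size, modification time) of both files are identical, the files are taken to be equal without reading them. Passing `shallow=False` forces a comparison of the contents. A minimal sketch on three temporary files whose names are chosen only for the example:

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical files: a and b share content, c differs.
    path_a = os.path.join(tmp_dir, 'a.txt')
    path_b = os.path.join(tmp_dir, 'b.txt')
    path_c = os.path.join(tmp_dir, 'c.txt')

    for path, text in [(path_a, 'same text\n'),
                       (path_b, 'same text\n'),
                       (path_c, 'other text\n')]:
        with open(path, 'w') as outfile:
            outfile.write(text)

    # shallow=False compares file contents instead of trusting os.stat data.
    same = filecmp.cmp(path_a, path_b, shallow=False)
    different = filecmp.cmp(path_a, path_c, shallow=False)

print(same)       # True
print(different)  # False
```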

# Requests <a name="rqsts"></a>

In [34]:
import requests

In [35]:
raw = requests.get('https://kevscon.github.io/about.html')

In [36]:
raw.text

'<!DOCTYPE html>\n<html lang="en-us">\n\n  <head>\n  <link href="http://gmpg.org/xfn/11" rel="profile" />\n  <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n  <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n\n  <!-- Enable responsiveness on mobile devices-->\n  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1" />\n\n  <title>\n    \n      About &middot; Kevin\'s Data Science Portfolio\n    \n  </title>\n\n  \n\n\n  <!-- CSS -->\n  <link rel="stylesheet" href="/assets/css/main.css" />\n  \n\n<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Abril+Fatface" />\n\n  <!-- Icons -->\n  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="/favicon.png" />\n<link rel="shortcut icon" href="/favicon.ico" />\n\n<!-- <link rel="shortcut icon" type="image/png" href="/yinyang.png" > -->\n\n\n  <!-- RSS -->\n  <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml" />\n\n  <!-- Addi

In [37]:
raw.headers

{'Server': 'GitHub.com', 'Content-Type': 'text/html; charset=utf-8', 'Strict-Transport-Security': 'max-age=31557600', 'Last-Modified': 'Fri, 10 Aug 2018 23:41:37 GMT', 'ETag': 'W/"5b6e22b1-1d6f"', 'Access-Control-Allow-Origin': '*', 'Expires': 'Wed, 14 Nov 2018 03:01:13 GMT', 'Cache-Control': 'max-age=600', 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': 'EC52:7D3B:14EB6FD:1C331FD:5BEB8DA0', 'Content-Length': '2968', 'Accept-Ranges': 'bytes', 'Date': 'Wed, 14 Nov 2018 02:51:13 GMT', 'Via': '1.1 varnish', 'Age': '0', 'Connection': 'keep-alive', 'X-Served-By': 'cache-dca17723-DCA', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1542163873.277810,VS0,VE8', 'Vary': 'Accept-Encoding', 'X-Fastly-Request-ID': '30ed8785bcca20f59a5f43157f4f342024f42949'}
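
The call above has no timeout and does not check the status code, so a slow server can hang the cell and an error page would be returned as ordinary text. One way this might be wrapped is sketched below; the function name `fetch_text` and the 10-second default are assumptions for illustration, not part of the original notebook.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, raising on HTTP error statuses.

    `fetch_text` is a hypothetical helper; the timeout value is arbitrary.
    """
    # timeout stops the request from waiting indefinitely on a slow server.
    response = requests.get(url, timeout=timeout)
    # raise_for_status raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

Calling `fetch_text('https://kevscon.github.io/about.html')` should return the same HTML as `raw.text` above, provided the page is still reachable.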

# Read PDF <a name="rd_pdf"></a>

In [38]:
import PyPDF2

In [39]:
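# open the PDF in binary mode
pdfFileObj = open('data/nlp_paper.pdf', 'rb')

In [40]:
# PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0
pdfReader = PyPDF2.PdfReader(pdfFileObj)

In [41]:
# pages are zero-indexed; pages[0] replaces getPage(0)
pageObj = pdfReader.pages[0]

In [42]:
# extract_text() replaces extractText()
pageObj.extract_text()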

'White Paper on Natural Language Processing Ralph Weiscbedel, Chairperson BBN Systems and Technologies Corporation Jaime Carbonell Carnegie-Mellon University Barbara Grosz Harvard University Ł Wendy Lehnert University of Massachusetts, Amherst Mitchell Marcus University of Pennsylvania Raymond Perrault SRI International Robert Wilensky University of California, Berkeley I. Scope 1.1. Major Challenges We take the ultimate goal of natural language processing (NLP) to be the ability to use natural languages as effectively as humans do. Natural language, whether spoken, written, or typed, is the most natural means of communication between humans, and the mode of expression of choice for most of the documents they produce. As computers play a larger role in the preparation, acquisition, transmission, monitoring, storage, analysis, and transformation of information, endowing them with the ability to understand and generate information expressed in natural languages becomes more and more nece