# Reading File Directories and Exploring WordNet

This notebook provides some guidance on working with file directories for a text corpus. It also introduces some things we can explore with the WordNet dictionary, which is accessible through NLTK.

Read about [WordNet in Chapter 2, section 5.2](https://www.nltk.org/book/ch02.html#wordnet) of the NLTK book.


In [1]:
# Imports
import os
from nltk.corpus import wordnet as wn


In [2]:
# SMOKE TEST: Explore WordNet for specific words.
wn.synsets('clear')

[Synset('clear.n.01'),
 Synset('open.n.01'),
 Synset('unclutter.v.01'),
 Synset('clear.v.02'),
 Synset('clear_up.v.04'),
 Synset('authorize.v.01'),
 Synset('clear.v.05'),
 Synset('pass.v.09'),
 Synset('clear.v.07'),
 Synset('clear.v.08'),
 Synset('clear.v.09'),
 Synset('clear.v.10'),
 Synset('clear.v.11'),
 Synset('clear.v.12'),
 Synset('net.v.02'),
 Synset('net.v.01'),
 Synset('gain.v.08'),
 Synset('clear.v.16'),
 Synset('clear.v.17'),
 Synset('acquit.v.01'),
 Synset('clear.v.19'),
 Synset('clear.v.20'),
 Synset('clear.v.21'),
 Synset('clear.v.22'),
 Synset('clear.v.23'),
 Synset('clear.v.24'),
 Synset('clear.a.01'),
 Synset('clear.s.02'),
 Synset('clear.s.03'),
 Synset('clear.a.04'),
 Synset('clear.s.05'),
 Synset('clear.s.06'),
 Synset('clean.s.03'),
 Synset('clear.s.08'),
 Synset('clear.s.09'),
 Synset('well-defined.a.02'),
 Synset('clear.a.11'),
 Synset('clean.s.02'),
 Synset('clear.s.13'),
 Synset('clear.s.14'),
 Synset('clear.s.15'),
 Synset('absolved.s.01'),
 Synset('clear.s.17

### Look at some synset data from WordNet
Choose one of the synsets by its identifier, and let's explore what we can see with it.
In the next couple of cells, we explore the lemmas connected to a particular synset for a term.

In [3]:
wn.synset('net.v.02').lemmas()

[Lemma('net.v.02.net'), Lemma('net.v.02.clear')]

In [4]:
for synset in wn.synsets('clear'):
    print(synset.lemma_names(), len(synset.lemma_names()))

['clear'] 1
['open', 'clear'] 2
['unclutter', 'clear'] 2
['clear'] 1
['clear_up', 'clear', 'light_up', 'brighten'] 4
['authorize', 'authorise', 'pass', 'clear'] 4
['clear'] 1
['pass', 'clear'] 2
['clear'] 1
['clear'] 1
['clear', 'top'] 2
['clear', 'clear_up', 'shed_light_on', 'crystallize', 'crystallise', 'crystalize', 'crystalise', 'straighten_out', 'sort_out', 'enlighten', 'illuminate', 'elucidate'] 12
['clear'] 1
['clear'] 1
['net', 'clear'] 2
['net', 'sack', 'sack_up', 'clear'] 4
['gain', 'take_in', 'clear', 'make', 'earn', 'realize', 'realise', 'pull_in', 'bring_in'] 9
['clear'] 1
['clear'] 1
['acquit', 'assoil', 'clear', 'discharge', 'exonerate', 'exculpate'] 6
['clear', 'solve'] 2
['clear'] 1
['clear'] 1
['clear'] 1
['clear'] 1
['clear', 'clear_up'] 2
['clear'] 1
['clear'] 1
['clear', 'open'] 2
['clear'] 1
['clear'] 1
['clear'] 1
['clean', 'clear', 'light', 'unclouded'] 4
['clear', 'unmortgaged'] 2
['clear', 'clean-cut', 'clear-cut'] 3
['well-defined', 'clear'] 2
['clear'] 1
['c

## Read in just one file from a directory in your repo
Open one of your text files in your repo for reading.
In this example, we'll climb directories, which means we'll use the os library to handle file paths.

When your file isn't in the same folder as your Python script and you need to climb for it, start with the os library by getting the current working directory:

In [5]:
# cwd is my shorthand for "current working directory"
cwd = os.getcwd()
print(cwd)
# Climb up one directory and retrieve a file.
# (ADAPT THIS CODE TO REACH DOWN OR UP AS NEEDED.)
filepath = os.path.join(cwd, '..', 'grimm.txt')
print(filepath)

# Now, Python must OPEN and READ the text file in order to process it.
# The with-block closes the file automatically once it has been read.
with open(filepath, 'r', encoding='utf8') as f:
    readFile = f.read()
print(readFile)


/Users/eeb4/Documents/GitHub/newtfire/textAnalysis-Hub/Class-Examples/Python/readFileCollections-example
/Users/eeb4/Documents/GitHub/newtfire/textAnalysis-Hub/Class-Examples/Python/readFileCollections-example/../grimm.txt

    THE GOLDEN BIRD
    A certain king had a beautiful garden, and in the garden stood a tree which bore golden
      apples. These apples were always counted, and about the time when they began to grow ripe it
      was found that every night one of them was gone. The king became very angry at this, and
      ordered the gardener to keep watch all night under the tree. The gardener set his eldest son
      to watch; but about twelve o’clock he fell asleep, and in the morning another of the apples
      was missing. Then the second son was ordered to watch; and at midnight he too fell asleep, and
      in the morning another apple was gone. Then the third son offered to keep watch; but the
      gardener at first would not let him, for fear some harm should come to 
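
Before opening a file, it can help to check that the path really points at one. Here is a minimal sketch of that check, using os.path.join and os.path.normpath to build the climbing path. So that it runs anywhere, it builds a small temporary folder layout first: the folder name `notebooks`, the file name `sample.txt`, and the helper `path_to_parent_file` are all hypothetical stand-ins, not part of this repo.

```python
import os
import tempfile

def path_to_parent_file(start_dir, filename):
    """Build the path to a file one directory above start_dir."""
    # os.path.join inserts the right separator for the operating system;
    # normpath collapses the '..' so the printed path is easier to read.
    return os.path.normpath(os.path.join(start_dir, '..', filename))

# Hypothetical layout for demonstration: root/sample.txt and root/notebooks/
root = tempfile.mkdtemp()
start_dir = os.path.join(root, 'notebooks')
os.makedirs(start_dir)
with open(os.path.join(root, 'sample.txt'), 'w', encoding='utf8') as f:
    f.write('THE GOLDEN BIRD')

filepath = path_to_parent_file(start_dir, 'sample.txt')
if os.path.isfile(filepath):
    # The with-block closes the file automatically when reading is done.
    with open(filepath, 'r', encoding='utf8') as f:
        text = f.read()
    print(text)
else:
    print(f"No file found at {filepath}")
```

In your own notebook, you would pass `os.getcwd()` as the starting directory and the name of your own text file.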

## Read in some project data from a collection of text files
You probably have a directory of text files somewhere near your Python script. The next cell loops over that directory and prints the path to each text file.

Inside the "for loop" in the next cell, we can then write code to process information about each file separately.

In [8]:
# Remember, we defined cwd as our current working directory holding this file.
coll = cwd + '/hughes-txt'
for file in os.listdir(coll):
    if file.endswith(".txt"):
        filepath = f"{coll}/{file}"
        print(filepath)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/eeb4/Documents/GitHub/newtfire/textAnalysis-Hub/Class-Examples/Python/readFileCollections-example/hughes-txt/'
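
The FileNotFoundError above means that Python did not find the folder at the path we built. Here is a minimal sketch of one way to guard against that, and then to process each file separately inside the loop (in this case, by counting its words). It creates a temporary folder with a few made-up files so that it runs anywhere: the function name `count_words_per_file` and the sample file names are hypothetical, and the whitespace-based word count is only a rough stand-in for real tokenizing.

```python
import os
import tempfile

def count_words_per_file(coll):
    """Return {filename: word count} for each .txt file in the folder coll."""
    if not os.path.isdir(coll):
        # Report the problem instead of letting os.listdir raise an error.
        print(f"Directory not found: {coll}")
        return {}
    counts = {}
    for file in sorted(os.listdir(coll)):
        if file.endswith('.txt'):
            filepath = os.path.join(coll, file)
            with open(filepath, 'r', encoding='utf8') as f:
                # split() with no arguments breaks the text on any whitespace.
                counts[file] = len(f.read().split())
    return counts

# Hypothetical sample collection, created only for this demonstration.
coll = tempfile.mkdtemp()
samples = {'a.txt': 'one two three', 'b.txt': 'four five', 'notes.md': 'skip me'}
for name, text in samples.items():
    with open(os.path.join(coll, name), 'w', encoding='utf8') as f:
        f.write(text)

counts = count_words_per_file(coll)
print(counts)

# A folder that does not exist prints a message and returns an empty dict.
missing = count_words_per_file(os.path.join(coll, 'no-such-folder'))
```

In your own project, you would pass in the path to your collection folder instead of the temporary one.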

## Processing the directory as one corpus
What if we want to create an NLTK corpus of these texts and process them as one unit? See the [NLTK book chapter 2 section 1.9](https://www.nltk.org/book/ch02.html#loading-your-own-corpus).

In [9]:
from nltk.corpus import PlaintextCorpusReader
corpus_root = 'hughes-txt'
corpus = PlaintextCorpusReader(corpus_root, '.*')
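# List the files the reader found.
# (We print here because a cell only displays its last expression.)
print(corpus.fileids())
# Check on one file in the collection
corpus.words('breakfast.txt')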



OSError: No such file or directory: '/Users/eeb4/Documents/GitHub/newtfire/textAnalysis-Hub/Class-Examples/Python/readFileCollections-example/hughes-txt'

In [None]:
# With no argument, words() returns the words from every file in the corpus.
corpus.words()