## Very first code

```python
print("hello, world!")
```

```
hello, world!
```


## The string type
String objects are enclosed in quotation marks, either single (`'...'`) or double (`"..."`).

`+` is the concatenation operator: it joins two strings into one.

Below, `greet` is a variable that is assigned a string value; note that the variable name itself has no quotation marks.

```python
greet = "hello, world!"
greet + " here I come"
```

```
'hello, world! here I come'
```
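A short sketch of two points the paragraphs above imply: single and double quotes produce the same string, and `+` only joins strings to strings, so a number has to be converted with `str()` first. The variable names (`single`, `joined`, `count`, and so on) are just for illustration.

```python
# Single and double quotes create identical strings
single = 'hello'
double = "hello"
print(single == double)           # True

# + joins strings end to end; it does not insert spaces for you
name = "world"
joined = "hello, " + name + "!"
print(joined)                     # hello, world!

# "I have " + 3 would raise a TypeError,
# so a number must be converted with str() before concatenating
count = 3
message = "I have " + str(count) + " cats"
print(message)                    # I have 3 cats
```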

String methods such as `.upper()` and `.lower()` return a transformed copy of a string; the original string is left unchanged.

```python
greet.upper()
```

```
'HELLO, WORLD!'
```
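Here is a quick sketch showing `.lower()` as well, plus `.title()`, another built-in string method added here for comparison. It also checks that `greet` still holds its original value after the method calls.

```python
greet = "hello, world!"

shout = greet.upper()       # 'HELLO, WORLD!'
quiet = shout.lower()       # 'hello, world!'
titled = greet.title()      # 'Hello, World!'

# Methods return a new string; greet itself is unchanged
print(greet)                # hello, world!
```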

`len()` is a handy function that returns the length of a string, measured in number of characters.

```python
len(greet)
```

```
13
```
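To make "number of characters" concrete: `len()` counts every character, spaces and punctuation included, which is why `"hello, world!"` has length 13. A minimal check:

```python
greet = "hello, world!"

# len() counts every character, including the comma, the space, and "!"
print(len(greet))                    # 13
print(len("hello"))                  # 5
print(len(""))                       # 0 (the empty string)

# Lengths add up under concatenation: 13 + 12
print(len(greet + " here I come"))   # 25
```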

## Numbers
Integers and floats are written without quotes. 

You can use arithmetic operators such as `+`, `-`, `*` and `/` with numbers.

```python
num1 = 5678
num2 = 3.141592
num2 * 1000
```

```
3141.592
```
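A small sketch of the four operators, plus `type()` to tell integers from floats. One detail worth knowing: `/` always returns a float, even when the division comes out even. The floor-division operator `//` is an extra not mentioned above, included for contrast.

```python
num1 = 5678
num2 = 3.141592

print(type(num1))    # <class 'int'>
print(type(num2))    # <class 'float'>

print(7 + 2)         # 9
print(7 - 2)         # 5
print(7 * 2)         # 14
print(7 / 2)         # 3.5  (/ always returns a float)
print(8 / 2)         # 4.0  (even when the division is exact)
print(7 // 2)        # 3    (// is floor division)
print(num1 // 1000)  # 5
```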

## Conditionals
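Python is famous for marking code blocks, including the branches of a conditional, with indentation rather than braces. A colon `:` signals the beginning of a block.

`if condition1 ... elif condition2 ... else`

is how Python conditionals are structured. The conditions are checked from top to bottom, and only the first block whose condition is true runs. The `elif` and `else` blocks are optional, and you can use as many `elif` blocks as you need.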

```python
word = 'hippopotamus'
if len(word) > 13:
    print("That is a long word.")
elif len(word) > 5:
    print("Medium-length word.")
else:
    print("Short and sweet.")
```

```
Medium-length word.
```
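To try the same thresholds on several words at once, the conditional can be wrapped in a function. `describe` is a hypothetical name; `def` introduces a function and, like `if`, opens its block with a colon and indentation. Because the first true condition wins, a 20-letter word gets "long" even though it is also longer than 5.

```python
def describe(word):
    """Classify a word by length, using the same thresholds as above."""
    if len(word) > 13:
        return "That is a long word."
    elif len(word) > 5:
        return "Medium-length word."
    else:
        return "Short and sweet."

for w in ['cat', 'hippopotamus', 'internationalization']:
    print(w, '->', describe(w))
```

Note the boundary cases: a 5-letter word is "Short and sweet." because `>` is strict.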


## Lists
Lists are enclosed in `[ ]`, with elements separated by commas. Lists can hold strings, numbers, and other objects, including other lists.

As with strings, you can use `len()` and `+` with lists.

```python
li = ['red', 'blue', 'green', 'black', 'white', 'pink']
len(li)
```

```
6
```

```python
li2 = ['orange', 'red', 'yellow']
li + li2
```

```
['red', 'blue', 'green', 'black', 'white', 'pink', 'orange', 'red', 'yellow']
```
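A short sketch of the "and more" part: one list can mix types, and a nested list counts as a single element. It also shows that `+` builds a new list, keeps duplicates (`'red'` appears in both lists), and leaves the originals alone. `.count()` is a built-in list method used here only to count the duplicates.

```python
# A list can mix types, and can even contain another list
mixed = ['red', 5678, 3.141592, ['a', 'b']]
print(len(mixed))             # 4 (the inner list counts as one element)

li = ['red', 'blue', 'green', 'black', 'white', 'pink']
li2 = ['orange', 'red', 'yellow']

# + builds a new list; li and li2 are unchanged, and duplicates are kept
combined = li + li2
print(len(combined))          # 9
print(combined.count('red'))  # 2
print(len(li))                # 6
```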

## "in" operator
`in` can be used with strings or lists.

With strings, it tests whether one string is a substring of another.

With lists, it tests whether an element is found in the list.

`not` is the negation operator, so `not in` returns `True` when the item is absent.

```python
'p' in word
```

```
True
```

```python
'orange' not in li2
```

```
False
```
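The difference between the two uses of `in` is easy to trip over, so here is a side-by-side sketch: on a string it finds any contiguous run of characters, while on a list it only matches whole elements.

```python
word = 'hippopotamus'
li2 = ['orange', 'red', 'yellow']

# With a string, `in` looks for a substring (any contiguous run of characters)
print('pot' in word)    # True
print('ppo' in word)    # True
print('tap' in word)    # False

# With a list, `in` compares whole elements only
print('red' in li2)     # True
print('re' in li2)      # False: 're' is not an element, only part of one
print('re' in 'red')    # True: substring test on a string
```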

## List comprehension
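List comprehension is magic: it builds a new list by applying an expression to each element of an existing list, optionally keeping only the elements that pass an `if` test.
Things to try in place of `x`: `x.upper()`, `len(x)`, `x + 'ish'`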

```python
[x for x in li if 'e' not in x]
```

```
['black', 'pink']
```

```python
[x + 'ish' for x in li]
```

```
['redish', 'blueish', 'greenish', 'blackish', 'whiteish', 'pinkish']
```
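Here are the other two suggestions worked out, followed by the same `len()` result written as an ordinary `for` loop, which is what a comprehension does behind the scenes. The names `upper`, `lengths`, and `lengths_loop` are illustrative.

```python
li = ['red', 'blue', 'green', 'black', 'white', 'pink']

upper = [x.upper() for x in li]
lengths = [len(x) for x in li]

# The same result as `lengths`, written as an ordinary for loop
lengths_loop = []
for x in li:
    lengths_loop.append(len(x))

print(upper)     # ['RED', 'BLUE', 'GREEN', 'BLACK', 'WHITE', 'PINK']
print(lengths)   # [3, 4, 5, 5, 5, 4]
```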

## Using NLTK
NLTK (the Natural Language Toolkit) is an external library, so it must be installed (e.g. with `pip install nltk`) and imported before you can use it.

`nltk.word_tokenize()` is a handy tokenizing function, one of the many functions NLTK provides.

It turns a text (a single string) into a list of word tokens, splitting off punctuation along the way.

```python
import nltk
```

```python
nltk.word_tokenize(greet)
```

```
['hello', ',', 'world', '!']
```

```python
sent = "It's 5 o'clock somewhere...!!"
nltk.word_tokenize(sent)
```

```
['It', "'s", '5', "o'clock", 'somewhere', '...', '!', '!']
```
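To see why a dedicated tokenizer helps, compare it with Python's built-in `str.split()`, which breaks only on whitespace and so leaves punctuation glued to the words. This comparison needs no NLTK.

```python
greet = "hello, world!"
sent = "It's 5 o'clock somewhere...!!"

# str.split() breaks on whitespace only, so punctuation stays attached
print(greet.split())   # ['hello,', 'world!']
print(sent.split())    # ["It's", '5', "o'clock", 'somewhere...!!']
```

`word_tokenize()`, by contrast, separates `,` and `!` and splits `It's` into `It` and `'s`. Depending on your NLTK version and setup, it may also need the Punkt tokenizer data, which can be fetched with `nltk.download('punkt')` (recent releases use `'punkt_tab'` instead).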

`nltk.FreqDist()` builds a frequency distribution from a list: a dictionary-like object that maps each item to the number of times it occurs.

```python
sent = 'A rose is a rose is a rose is a rose.'
toks = nltk.word_tokenize(sent.lower())
print(toks)
```

```
['a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose', '.']
```

```python
freq = nltk.FreqDist(toks)
freq
```

```
FreqDist({'.': 1, 'a': 4, 'is': 3, 'rose': 4})
```

```python
freq.most_common(3)
```

```
[('a', 4), ('rose', 4), ('is', 3)]
```

```python
freq['rose']
```

```
4
```
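In NLTK, `FreqDist` is built as a subclass of Python's `collections.Counter`, so the standard library alone can reproduce these counts. A minimal sketch using the token list printed above (hard-coded here so NLTK is not needed):

```python
from collections import Counter

# The tokens printed above for 'A rose is a rose is a rose is a rose.'
toks = ['a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose', '.']

freq = Counter(toks)
print(freq['rose'])          # 4
print(freq.most_common(3))   # [('a', 4), ('rose', 4), ('is', 3)]
print(freq['tulip'])         # 0: a missing item counts as zero, no KeyError
```

Ties such as `'a'` and `'rose'` (both 4) come out in the order the items were first encountered.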

## Reading in a text file
`open(filename).read()` reads in the content of a text file as a single string. Using `open()` in a `with` block closes the file automatically once you are done with it.

```python
myfile = 'corpus/JPSW1001.txt'
# The with-block closes the file automatically after reading
with open(myfile) as f:
    essay = f.read()
print(essay)
```

```
I agree greatly this topic mainly because I think that English becomes an official language in the not too distant. Now, many people can speak English or study it all over the world, and so more people will be able to speak English. Before the Japanese fall behind other people, we should be able to speak English, therefore, we must study English not only junior high school students or over but also pupils. Japanese education system is changing such a program. In this way, Japan tries to internationalize rapidly. However, I think this way won't suffice for becoming international humans. To becoming international humans, we should study English not only school but also daily life. If we can do it, we are able to master English conversation. It is important for us to master English honorific words. Without speaking English honorific, we can't speak English. If we speak English without it, it is rude of you, and so we should master proper English. Therefore, we should learn even our daily 
```

```python
len(essay)
```

```
2368
```

```python
'I am certain' in essay
```

```
True
```
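Since `corpus/JPSW1001.txt` may not exist on your machine, here is a self-contained sketch that first writes a small, hypothetical sample file into a temporary directory and then reads it back the same way. The file name `sample.txt` and its text are made up for illustration, and `encoding='utf-8'` is an assumption about how the file is stored.

```python
import os
import tempfile

# Hypothetical sample text standing in for corpus/JPSW1001.txt
sample = "I agree with this topic.\nEnglish is important.\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')

    # Write the sample out so there is a file to read
    with open(path, 'w', encoding='utf-8') as f:
        f.write(sample)

    # Read the whole file back in as one string
    with open(path, encoding='utf-8') as f:
        text = f.read()

print(type(text))             # <class 'str'>
print(len(text))              # 47 (newline characters count too)
print('English is' in text)   # True
```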

```python
nltk.word_tokenize(essay)
```

```
['I', 'agree', 'greatly', 'this', 'topic', 'mainly', 'because', 'I', 'think',
 'that', 'English', 'becomes', 'an', 'official', 'language', 'in', 'the', 'not',
 'too', 'distant', '.', 'Now', ',', 'many', 'people', 'can', 'speak', 'English',
 'or', 'study', 'it', 'all', 'over', 'the', 'world', ',', 'and', 'so', 'more',
 'people', 'will', 'be', 'able', 'to', 'speak', 'English', '.', 'Before', 'the',
 'Japanese', 'fall', 'behind', 'other', 'people', ',', 'we', 'should', 'be',
 'able', 'to', 'speak', 'English', ',', 'therefore', ',', 'we', 'must', 'study',
 'English', 'not', 'only', 'junior', 'high', 'school', 'students', 'or', 'over',
 'but', 'also', 'pupils', '.', 'Japanese', 'education', 'system', 'is',
 'changing', 'such', 'a', 'program', '.', 'In', 'this', 'way', ',', 'Japan',
 'tries', 'to', 'internationalize', 'rapidly', '.', 'However', ',', 'I', 'think',
 'this', 'way', 'wo', "n't", 'suf
```

```python
word_tokens = nltk.word_tokenize(essay)
len(word_tokens)
```

```
471
```

```python
word_freq = nltk.FreqDist(word_tokens)
word_freq.most_common(10)
```

```
[(',', 30),
 ('.', 22),
 ('to', 20),
 ('English', 18),
 ('we', 14),
 ('the', 13),
 ('people', 12),
 ('I', 11),
 ('is', 10),
 ('Japanese', 10)]
```
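In the essay counts, `,` and `.` top the list. One possible next step, combining the list comprehension and frequency counting covered earlier, is to keep only alphabetic tokens and lowercase them before counting. The token list below is hypothetical (shaped like `word_tokenize()` output, not taken from the essay), and `collections.Counter` stands in for `nltk.FreqDist`, which would work the same way here.

```python
from collections import Counter

# Hypothetical token list, shaped like nltk.word_tokenize() output
word_tokens = ['However', ',', 'I', 'think', 'English', 'is', 'important', '.',
               'I', 'study', 'English', 'every', 'day', '.']

# Keep only alphabetic tokens and lowercase them, so ',' and '.' drop out
# and capitalized and lowercase forms of a word are counted together
words = [t.lower() for t in word_tokens if t.isalpha()]

word_freq = Counter(words)
print(word_freq.most_common(2))   # [('i', 2), ('english', 2)]
```

Be aware that `isalpha()` also discards tokens containing apostrophes, such as `n't` and `'s`.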