## NLP Understanding

We will be using Python and NLTK (the Natural Language Toolkit).
* NLTK is a large library with tools for most stages of a Natural Language Processing (NLP) workflow.

The most commonly used data structures in Python:
1. Strings
2. Lists
3. Tuples
4. Dictionaries
5. Sets

### Installing NLTK in Python Environment :

    pip install -U nltk

* Once the installation is complete, open a command prompt and type `python`.
* At the Python prompt, type `import nltk`, then
* `nltk.download()` to open the NLTK downloader and fetch the corpora and models you need.

#### Installing NLTK in Anaconda distribution :

    conda install -c anaconda nltk 
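
Whichever install command you use, it helps to confirm that the package is visible to the interpreter you are running. The snippet below is a minimal sketch of such a check; the helper name `is_installed` is illustrative and not part of NLTK.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("nltk"):
    import nltk
    print("NLTK version:", nltk.__version__)
else:
    print("NLTK is not installed; run one of the install commands above.")
```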




# Python Crash Course

We will cover the five most commonly used data structures in Python.

    1. Strings
    2. Lists
    3. Tuples
    4. Dictionaries
    5. Sets

#### Strings

In [1]:
# Declaring Strings

var1 = 'Single quote String'
var2 = "Double quote String"
print(var1)
print(var2)

Single quote String
Double quote String


In [6]:
# Combining Strings

print(var1 + " and " + var2) # using + operator
print(var1, "and", var2) # passing several arguments to print(); it separates them with a space by default

Single quote String and Double quote String
Single quote String and Double quote String


In [7]:
# Combining a String with numeric data
# print() accepts arguments of any type, so separating them with a comma (,) is enough             [IMPORTANT]

number = 10
print(number, "Ten")

10 Ten


In [11]:
# this raises a TypeError: the + operator cannot concatenate an int and a str
# print(number + " Ten")

In [9]:
type(number)

int

In [10]:
type("Ten")

str

In [12]:
# Combining a string with numeric data using +: convert the number with str() first
print(str(number) + " Ten")

10 Ten


In [13]:
var3 = "Cat"
len(var3)   # length of string

3

In [17]:
for i in var3:
    print(i)

C
a
t


In [18]:
for i in range(len(var3)):
    print(var3[i])

C
a
t
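
When both the index and the character are needed, `enumerate()` is usually preferred over `range(len(...))`. A small sketch, using an illustrative variable `word` in place of `var3`:

```python
word = "Cat"

# enumerate() yields (index, item) pairs, so no manual indexing is needed
for index, letter in enumerate(word):
    print(index, letter)

# the same pairs collected into a list
pairs = list(enumerate(word))
print(pairs)
```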


In [20]:
# strings are immutable; assigning to an index raises a TypeError
# var3[0] = 'r'

In [21]:
# Slicing
var4 = var3[1:]
print(var4)

at


In [22]:
var5 =  'r' + var4
print(var5)

rat
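
Slicing supports more than a start index. The sketch below shows an end index, negative indices, and a step on an illustrative string (`word` is not from the original notebook):

```python
word = "language"

first_four = word[:4]       # indices 0 to 3; the end index is exclusive
last_three = word[-3:]      # negative indices count from the end
every_other = word[::2]     # a step of 2 takes every second character
reversed_word = word[::-1]  # a step of -1 walks the string backwards

print(first_four, last_three, every_other, reversed_word)
```

Each slice builds a new string; `word` itself is never modified.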


In [25]:
# Formatting strings
name = "Tom"
age = 15
print("My name is %s, and I am %s years old." %(name, age))

My name is Tom, and I am 15 years old.
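
The `%` style above still works, but `str.format()` and f-strings (Python 3.6 and later) are the more common choices in current code. A short sketch producing the same sentence both ways:

```python
name = "Tom"
age = 15

# str.format() fills the {} placeholders in order
with_format = "My name is {}, and I am {} years old.".format(name, age)

# f-strings evaluate the expressions inside {} directly
with_fstring = f"My name is {name}, and I am {age} years old."

print(with_format)
print(with_fstring)
```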


In [26]:
# triple quotation marks are for long, multi-line strings
para = """
This is a long string
spread across multiple lines
like a paragraph
"""
print(para)


This is a long string
spread across multiple lines
like a paragraph



In [27]:
var6 = "hello, world!"
var7 = "world"
var6.find(var7) # finding index of substring

7
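
Note that `find()` does not raise an error when the substring is missing; it returns `-1`. If only a yes/no answer is needed, the `in` operator is simpler. A brief sketch with an illustrative variable `text`:

```python
text = "hello, world!"

found_at = text.find("world")     # index of the first match
missing_at = text.find("python")  # -1 signals "not found"
contains = "world" in text        # membership test, returns a bool

print(found_at, missing_at, contains)
```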

In [28]:
type(var7)

str

In [29]:
type(var7.encode('utf-8')) # encode() converts the string to bytes using the given encoding

bytes
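
Encoding matters in NLP because text files arrive as bytes. The sketch below is a round trip through UTF-8 with a non-ASCII character; the example word is illustrative.

```python
word = "caf\u00e9"  # "café"; the last character is outside ASCII

encoded = word.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

# the accented character takes two bytes in UTF-8, so the lengths differ
print(len(word), len(encoded))
print(decoded == word)
```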

#### Lists

In [30]:
# a list is an ordered, mutable collection that can hold items of any type
list1 = ['a', 'b', 'c', 1, 2, 3]
print(list1)

['a', 'b', 'c', 1, 2, 3]


In [31]:
type(list1)

list

In [32]:
# adding new data to list, using append() method
list1.append(4)
print(list1)

['a', 'b', 'c', 1, 2, 3, 4]


In [33]:
# append() adds its argument as a single element, so the whole list becomes a nested sublist
list2 = ['x', 'y', 'z']
list1.append(list2)
print(list1)

['a', 'b', 'c', 1, 2, 3, 4, ['x', 'y', 'z']]


In [34]:
# extend() adds each element of the given list individually to the existing list
list1.extend(list2)
print(list1)

['a', 'b', 'c', 1, 2, 3, 4, ['x', 'y', 'z'], 'x', 'y', 'z']
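
The difference between `append()` and `extend()` is easier to see side by side on fresh lists. A minimal sketch; the variable names are illustrative:

```python
letters = ['a', 'b']

appended = letters.copy()
appended.append(['x', 'y'])   # the list is added as one nested element

extended = letters.copy()
extended.extend(['x', 'y'])   # each item is added individually

print(appended, len(appended))
print(extended, len(extended))
```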


In [35]:
# adding a new element to list at a specific index, use insert() method
list1.insert(2, "hello")
print(list1)

['a', 'b', 'hello', 'c', 1, 2, 3, 4, ['x', 'y', 'z'], 'x', 'y', 'z']


In [36]:
# remove() deletes the first element equal to its argument; here that is the nested list
list1.remove(list2)
print(list1)

['a', 'b', 'hello', 'c', 1, 2, 3, 4, 'x', 'y', 'z']


In [37]:
# remove last element from the list
list1.pop()
print(list1)

['a', 'b', 'hello', 'c', 1, 2, 3, 4, 'x', 'y']


In [38]:
# number of items in a list
len(list1)

10

In [39]:
list1.append('x')
print(list1)

['a', 'b', 'hello', 'c', 1, 2, 3, 4, 'x', 'y', 'x']


In [40]:
# count the frequency of an element in a list
list1.count('x') # element 'x' appears twice in the list

2

In [43]:
# sorting a list in place; sort() requires that all elements be comparable with each other, which in practice means the same type
list3 = [2, 4, 1, 6, 8, 3]
list3.sort()
print(list3)

[1, 2, 3, 4, 6, 8]
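
`sort()` changes the list in place. When the original order must be kept, the built-in `sorted()` returns a new list instead, and both accept `reverse` and `key` arguments. The sketch below also shows what happens with mixed types; all names are illustrative.

```python
numbers = [2, 4, 1, 6, 8, 3]

ascending = sorted(numbers)                 # new list; numbers is unchanged
descending = sorted(numbers, reverse=True)

words = ["banana", "fig", "apple"]
by_length = sorted(words, key=len)          # sort by the length of each word

# mixing ints and strings cannot be ordered, so sorting fails
try:
    sorted([3, "a", 1])
except TypeError as err:
    mixed_error = type(err).__name__

print(ascending, descending, by_length, mixed_error)
```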


In [46]:
# reversing elements of a list
list3.reverse()
list3

[8, 6, 4, 3, 2, 1]

In [47]:
# slicing elements from a list
list4 = list3[2:4] # start index is inclusive; end index is exclusive
print(list4)

[4, 3]


#### Tuples

In [48]:
# a tuple is an ordered group of items, separated by commas and usually wrapped in parentheses;
# the main difference between tuples and lists is that tuples are immutable and lists are mutable.
tup1 = (1, 2, 3)
print(tup1)

(1, 2, 3)


In [49]:
type(tup1)

tuple

In [50]:
# adding an item to a tuple
# first copy the tuple into a list, append the new element, then convert the list back to a tuple
list_tup = list(tup1)
list_tup.append(4)
tup2 = tuple(list_tup)
print(tup2)

(1, 2, 3, 4)


In [51]:
# concatenating two tuples with + creates a new tuple; the originals are left unchanged
tup3 = (5, 6)
tup4 = tup2 + tup3
print(tup4)

(1, 2, 3, 4, 5, 6)


In [57]:
len(tup4)

6

In [59]:
# a tuple is immutable, but that only fixes which objects it holds;
# if one of those objects is mutable, such as a list, the contents of that list can still be changed
tup5 = (1, 2, 3, [4, 5, 6])
print(tup5)

(1, 2, 3, [4, 5, 6])


In [63]:
tup5[3][1] = 'New'

In [64]:
print(tup5)

(1, 2, 3, [4, 'New', 6])
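
To make the distinction concrete, the sketch below tries to replace a tuple element (which fails) and then modifies the list stored inside the tuple (which works). The name `point` is illustrative.

```python
point = (1, 2, [3, 4])

# replacing an element of the tuple itself is not allowed
try:
    point[0] = 99
except TypeError as err:
    assignment_error = type(err).__name__

# the nested list is a mutable object, so it can still change
point[2].append(5)

print(assignment_error)
print(point)
```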


In [56]:
# a tuple with just one element needs a trailing comma (,); without it, (5) is simply the integer 5
tup6 = (5,)
print(tup6)

(5,)
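
It is the comma, not the parentheses, that makes a tuple. A quick check with illustrative names:

```python
not_a_tuple = (5)    # parentheses only group the expression; this is an int
one_tuple = (5,)     # the trailing comma makes it a tuple
also_a_tuple = 5,    # parentheses are optional

print(type(not_a_tuple), type(one_tuple), type(also_a_tuple))
```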


#### Dictionaries

In [65]:
# a dictionary contains key-value pairs
dict1 = {'name':'Raghu', 'age':34}
print(dict1)

{'name': 'Raghu', 'age': 34}


In [66]:
type(dict1)

dict

In [67]:
# getting the keys (returned as a dict_keys view; wrap in list() if a list is needed)
dict1.keys()

dict_keys(['name', 'age'])

In [68]:
# getting the values (returned as a dict_values view; wrap in list() if a list is needed)
dict1.values()

dict_values(['Raghu', 34])

In [69]:
dict1['name']

'Raghu'
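
Looking up a key that does not exist with square brackets raises a `KeyError`. The `get()` method returns a default instead, and `items()` iterates over key-value pairs. A short sketch; `person` is an illustrative copy of the dictionary above:

```python
person = {'name': 'Raghu', 'age': 34}

# get() returns the default when the key is absent instead of raising KeyError
country = person.get('country', 'unknown')

# membership tests check the keys
has_name = 'name' in person

# items() yields (key, value) pairs
for key, value in person.items():
    print(key, value)

print(country, has_name)
```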

In [70]:
dict2 = {}

In [71]:
dict2['Name'] = 'Ramya'
dict2['Age'] = 29
print(dict2)

{'Name': 'Ramya', 'Age': 29}


In [74]:
# adding a new key to dictionary
dict2['Country'] = "India"
print(dict2)

{'Name': 'Ramya', 'Age': 29, 'Country': 'India'}


In [75]:
# removing a key from dictionary
del dict2['Age']
print(dict2)

{'Name': 'Ramya', 'Country': 'India'}
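
`del` raises a `KeyError` if the key is missing. When the removed value is needed, or the key may not exist, `pop()` is an alternative. A minimal sketch with an illustrative dictionary `profile`:

```python
profile = {'Name': 'Ramya', 'Age': 29}

age = profile.pop('Age')              # removes the key and returns its value
missing = profile.pop('Email', None)  # a default avoids KeyError for absent keys

# del on an absent key raises KeyError
try:
    del profile['Email']
except KeyError:
    key_error_raised = True

print(age, missing, profile)
```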


In [77]:
dict_list = [dict1, dict2]
print(dict_list)

[{'name': 'Raghu', 'age': 34}, {'Name': 'Ramya', 'Country': 'India'}]
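
Dictionaries are a natural fit for counting, which comes up often in text processing. The sketch below counts word frequencies in an illustrative sentence, using plain whitespace splitting rather than a real tokenizer.

```python
sentence = "the cat sat on the mat the end"

counts = {}
for word in sentence.split():
    # start from 0 the first time a word is seen, then add 1
    counts[word] = counts.get(word, 0) + 1

print(counts)
```

The standard library's `collections.Counter` does the same job in one call; the loop is shown here to practice the dictionary operations.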


#### Sets

In [78]:
# a set is an unordered collection of unique items
set1 = {1, 2, 3, 4, 5, 6}
print(set1)

{1, 2, 3, 4, 5, 6}


In [79]:
type(set1)

set

In [80]:
set2 = set('bbbddddccccaaaa')
print(set2)

{'a', 'd', 'b', 'c'}


In [82]:
set3 = set('abcde')
print(set3)

{'c', 'a', 'e', 'b', 'd'}
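
Because a set discards duplicates, it gives a quick way to get the vocabulary (the distinct words) of a text. A small sketch on an illustrative sentence; since sets are unordered, the result is sorted before printing so the output is stable.

```python
sentence = "to be or not to be"

words = sentence.split()   # all tokens, duplicates included
vocabulary = set(words)    # duplicates removed

print(len(words), len(vocabulary))
print(sorted(vocabulary))
```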


In [84]:
# set union
set4 = set2 | set3
print(set4)

{'c', 'a', 'e', 'b', 'd'}


In [85]:
# set intersection
set5 = set2 & set3
print(set5)

{'a', 'd', 'b', 'c'}


In [88]:
# set difference
set6 = set3 - set2 # returns items that are in first set, but not found in second set
print(set6)

{'e'}


In [89]:
# checking for subset
set2.issubset(set3) # check if one set is subset of another set

True

In [90]:
# superset
set3.issuperset(set2)

True
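
Each set operator also has a method form, and there is one more operation worth knowing: the symmetric difference, which keeps items found in exactly one of the two sets. A brief sketch with illustrative names `set_a` and `set_b`:

```python
set_a = set('abcd')
set_b = set('abcde')

same_union = set_a.union(set_b) == (set_a | set_b)
same_intersection = set_a.intersection(set_b) == (set_a & set_b)

only_in_one = set_a ^ set_b   # symmetric difference
is_subset = set_a <= set_b    # operator form of issubset()

print(same_union, same_intersection, only_in_one, is_subset)
```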