## Python Project: File handling and string manipulation

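A sample script from Game of Thrones was taken from the page and stored in the file conv.txt in
the dataset provided. Each dialogue line in the file has the form `SPEAKER: text`, and dialogue lines are separated by blank lines.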

**Tasks:**
1. Find out the number of unique dialogue speakers in the sample conversation.
2. Create a new text file named after each dialogue speaker and store the unique words spoken by that character in it. Make sure there is only one word on every line.

In [1]:
# importing required libraries
import string
from pprint import pprint

In [2]:
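# Opening the text file in read-only mode and adding each line to a list.
# The encoding is passed explicitly: if a UTF-8 file is decoded with the
# Windows default (cp1252), curly apostrophes come out garbled.
with open('F:\\Python\\Python(Course 1)\\Project 1\\conv.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()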

# printing the lines in the file
pprint(lines)

['WILL: I’ve never seen wildlings do a thing like this. I’ve never seen a '
 'thing like this, not ever in my life.\n',
 '\n',
 'WAYMAR ROYCE: How close did you get?\n',
 '\n',
 'WILL: Close as any man would.\n',
 '\n',
 'GARED: We should head back to the wall.\n',
 '\n',
 'ROYCE: Do the dead frighten you?\n',
 '\n',
 'GARED: Our orders were to track the wildlings. We tracked them. They won’t '
 'trouble us no more.\n',
 '\n',
 'ROYCE: You don’t think he’ll ask us how they died? Get back on your '
 'horse.\n',
 '\n',
 'WILL: Whatever did it to them could do it to us. They even killed the '
 'children.\n',
 '\n',
 'ROYCE: It’s a good thing we’re not children. You want to run away south, '
 'run away. Of course, they will behead you as a deserter … If I don’t '
 'catch you first. Get back on your horse. I won’t say it again.\n',
 '\n',
 'ROYCE: Your dead men seem to have moved camp.\n',
 '\n',
 'WILL: They were here.\n',
 '\n',
 'GARED: See where they went.\n',
 '\n',

In [3]:
# total number of lines in the text file
print("Number of lines in the file is:", len(lines))

Number of lines in the file is: 222


In [4]:
# removing the blank lines; a new list is built because removing items
# from a list while looping over it can skip elements
lines = [line for line in lines if line != '\n']

pprint(lines)

['WILL: I’ve never seen wildlings do a thing like this. I’ve never seen a '
 'thing like this, not ever in my life.\n',
 'WAYMAR ROYCE: How close did you get?\n',
 'WILL: Close as any man would.\n',
 'GARED: We should head back to the wall.\n',
 'ROYCE: Do the dead frighten you?\n',
 'GARED: Our orders were to track the wildlings. We tracked them. They won’t '
 'trouble us no more.\n',
 'ROYCE: You don’t think he’ll ask us how they died? Get back on your '
 'horse.\n',
 'WILL: Whatever did it to them could do it to us. They even killed the '
 'children.\n',
 'ROYCE: It’s a good thing we’re not children. You want to run away south, '
 'run away. Of course, they will behead you as a deserter … If I don’t '
 'catch you first. Get back on your horse. I won’t say it again.\n',
 'ROYCE: Your dead men seem to have moved camp.\n',
 'WILL: They were here.\n',
 'GARED: See where they went.\n',
 'ROYCE: What is it?\n',
 'JON: Go on. Father’s watching.\n',
 'JON: And your mot

In [5]:
# printing number of lines after removing blank lines
print('Number of lines with dialogues: ',len(lines))

Number of lines with dialogues:  111
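
Removing items from a list while looping over it can skip elements, for example when two blank lines are adjacent. A minimal sketch on a made-up sample (not the contents of conv.txt) comparing that pattern with a list comprehension:

```python
# Made-up sample with two adjacent blank lines (not the contents of conv.txt).
sample = ['WILL: They were here.\n', '\n', '\n', 'GARED: See where they went.\n']

# Pattern 1: removing while iterating. After a removal the next element slides
# into the freed slot and the loop steps past it, so one blank line survives.
mutated = list(sample)
for line in mutated:
    if line == '\n':
        mutated.remove(line)

# Pattern 2: build a new list that keeps only the non-blank lines.
filtered = [line for line in sample if line != '\n']

print(len(mutated))    # 3, one blank line is still there
print(len(filtered))   # 2
```

When blank and dialogue lines strictly alternate, both patterns give the same result; the comprehension does not rely on that layout.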


In [6]:
# initializing an empty set
characters = set()

# looping through the list, extracting each character's name (the text before
# the first ':') and adding it to the set
for line in lines:
    name, sep, _ = line.partition(':')
    if sep:        # only lines that actually contain a ':'
        characters.add(name)

In [7]:
# removing the duplicate entry for a character that appears under two names
characters.discard('WAYMAR ROYCE')        # 'WAYMAR ROYCE' and 'ROYCE' are the same character

In [8]:
# printing the names of the unique characters 
print('The unique characters in the script are:')
pprint(characters)

# printing the number of unique characters
print('\nThe number of unique characters in the script is:',len(characters))

The unique characters in the script are:
{'ARYA',
 'BRAN',
 'CASSEL',
 'CATELYN',
 'CERSEI',
 'GARED',
 'JAIME',
 'JON',
 'NED',
 'ROBB',
 'ROBERT',
 'ROYCE',
 'SANSA',
 'SEPTA MORDANE',
 'THEON',
 'WILL'}

The number of unique characters in the script is: 16
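
Dropping 'WAYMAR ROYCE' from the set by hand works for this script. An alternative is to normalize speaker names while collecting them. The sketch below assumes a hypothetical `ALIASES` mapping and helper `speaker_of` (neither is part of the notebook) and runs on four lines from the sample:

```python
# Hypothetical alias table: maps an alternative speaker label to the name
# that should be counted. Only the one alias seen in the sample is listed.
ALIASES = {'WAYMAR ROYCE': 'ROYCE'}

def speaker_of(line):
    """Return the normalized speaker of a 'SPEAKER: text' line, or None."""
    name, sep, _ = line.partition(':')
    if not sep:                      # no ':' means no speaker label
        return None
    name = name.strip()
    return ALIASES.get(name, name)

sample = [
    'WAYMAR ROYCE: How close did you get?\n',
    'WILL: Close as any man would.\n',
    'GARED: We should head back to the wall.\n',
    'ROYCE: Do the dead frighten you?\n',
]

sample_characters = {speaker_of(line) for line in sample} - {None}
print(sorted(sample_characters))   # ['GARED', 'ROYCE', 'WILL']
```

Keeping the aliases in one dictionary makes it easy to add further duplicate names if a longer script contains them.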


In [9]:
def unique_words(name):
    '''
    Return the set of unique words spoken by the character `name`.
    Every dialogue line whose speaker label contains `name` is split into words.
    '''
    words = set()
    remove_punctuation = str.maketrans('', '', string.punctuation)        # table that deletes ASCII punctuation

    for line in lines:
        speaker, sep, speech = line.partition(':')        # speaker label and spoken text

        # skipping lines without a speaker label or spoken by another character;
        # the substring test lets 'ROYCE' also match 'WAYMAR ROYCE'
        if not sep or name not in speaker:
            continue

        # removing the punctuation and adding the words to the set
        words.update(speech.translate(remove_punctuation).split())

    print(name, words)

    return words
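
The word-extraction step can be tried on a single line. The sketch below uses a hypothetical helper `words_in` (not part of the notebook) and shows two details: `string.punctuation` covers ASCII punctuation only, so the curly apostrophe in `won’t` is kept, and words are case-sensitive unless they are lowercased first. The second input line is made up for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCT = str.maketrans('', '', string.punctuation)

def words_in(line, lowercase=False):
    """Return the set of words after the first ':' of a dialogue line."""
    _, _, speech = line.partition(':')
    if lowercase:
        speech = speech.lower()
    # split() without arguments also drops empty strings from repeated spaces
    return set(speech.translate(REMOVE_PUNCT).split())

line = 'GARED: We tracked them. They won’t trouble us no more.\n'
print(sorted(words_in(line)))

# Made-up line: 'Their' and 'their' collapse into one word when lowercased.
print(sorted(words_in('JAIME: Their lives or their honor.\n', lowercase=True)))
```

Whether `Their` and `their` should count as one word is a design choice; the task statement does not say, so lowercasing is left optional here.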

In [10]:
# defining a function to write the unique words said by each character to a separate text file
def write_file(name):
    file_path = 'F:\\Python\\Python(Course 1)\\Project 1\\' + name + '.txt'        # file path, with the file named after the Character

    words = unique_words(name)        # getting the unique words using the function we defined

    # 'w' mode overwrites the file, so re-running the cell does not append the words a second time;
    # the with statement closes the file automatically
    with open(file_path, 'w', encoding='utf-8') as file:
        for word in words:
            file.write(word + '\n')        # writing each word on a new line

    print(f'Unique words of {name} written in the text file')
    print('Total number of unique words:', len(words), '\n')
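
Whether a re-run duplicates words depends on the file mode: `'a'` appends to what is already there, `'w'` starts from an empty file. A sketch of the writing step, assuming a hypothetical helper `write_words` and a temporary folder in place of the project path:

```python
import tempfile
from pathlib import Path

def write_words(folder, name, words):
    """Write one word per line to <folder>/<name>.txt and return the path."""
    path = Path(folder) / f'{name}.txt'
    # 'w' replaces any earlier content; sorting makes the file reproducible
    with open(path, 'w', encoding='utf-8') as file:
        for word in sorted(words):
            file.write(word + '\n')
    return path

# A temporary folder stands in for the project folder in this sketch.
with tempfile.TemporaryDirectory() as folder:
    path = write_words(folder, 'WILL', {'They', 'were', 'here'})
    write_words(folder, 'WILL', {'They', 'were', 'here'})   # simulated re-run
    written = path.read_text(encoding='utf-8').splitlines()

print(written)   # ['They', 'here', 'were']
```

Sets have no fixed order, so sorting before writing keeps the output file identical between runs.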

In [11]:
# writing the unique words of each character using the defined function
for character in characters:
    write_file(character)

WILL {'They', 'to', 'it', 'seen', 'here', 'could', 'even', 'man', 'in', 'as', 'the', 'this', 'wildlings', 'Close', 'would', 'did', 'killed', 'do', 'I’ve', 'a', 'any', 'Whatever', 'like', 'life', 'thing', 'never', 'not', 'us', 'ever', 'my', 'were', 'them', 'children'}
Unique words of WILL written in the text file
Total number of unique words: 33 

JAIME {'told', 'didn’t', 'If', 'knew', 'to', 'know', 'it', 'And', 'him', 'hunting', 'he', 'Jon', 'with', 'someone', 'go', 'on', 'our', 'Their', 'or', 'other', 'their', 'lives', 'That’s', 'short', 'choose', 'can', 'an', 'the', 'Arryn', 'boars', 'would', 'who', 'and', 'both', 'do', 'Robert', 'will', 'died', 'a', 'fucking', 'Whatever', 'king', 'skewered', 'But', 'his', 'city', 'of', 'while', 'around', 'too', 'tell', 'way', 'Or', 'by', 'life', 'he’s', 'honor', 'long', 'new', 'job', 'gates', 'whores', 'are', 'off', 'be', 'Hand', 'I', 'without', 'days', 'now', 'heads', 'is'}
Unique words of JAIME written in the text file
Total number of uniq