# NLP - Session 2 - Working with Text files 

#### Working with the text files
 - Working with f-strings for formatted printing
 - Working with .CSV and .TSV files to read and write
 - Working with %%writefile to create simple .txt files [works in Jupyter Notebook only]
 - Working with Python's built-in file read and write

## String Formatter

In [1]:
name = "Iron Man"

In [2]:
print("I am the {}".format(name))

I am the Iron Man


In [3]:
print(f"I am the {name}")

I am the Iron Man


## Minimum Width and Alignment Between Columns

In [4]:
ds_tutes = [("Python for beginners", 19),
            ("Fearure Selection for ML", 11),
            ("ML Tutorial", 11),
            ("Deep Learning Tutorials", 19)
           ]

In [5]:
ds_tutes

[('Python for beginners', 19),
 ('Fearure Selection for ML', 11),
 ('ML Tutorial', 11),
 ('Deep Learning Tutorials', 19)]

In [6]:
for info in ds_tutes:
    print(info)

('Python for beginners', 19)
('Fearure Selection for ML', 11)
('ML Tutorial', 11)
('Deep Learning Tutorials', 19)


In [8]:
# Now using formatting
for info in ds_tutes:
    print(f"{info[0]:{50}} {info[1]:{10}}") # 50 and 10 are the minimum widths of the two columns

Python for beginners                                       19
Fearure Selection for ML                                   11
ML Tutorial                                                11
Deep Learning Tutorials                                    19


## Using >, <, ^
 - :< - Left Align
 - :> - Right Align
 - :^ - Center Align

In [13]:

print("Left Align")
print()
for info in ds_tutes:
    print(f"{info[0]:<{50}} {info[1]:{10}}")

print()
print("Right Align")
print()
for info in ds_tutes:
    print(f"{info[0]:>{50}} {info[1]:{10}}")

print()
print("Center Align")
print()
for info in ds_tutes:
    print(f"{info[0]:^{50}} {info[1]:{10}}")

Left Align

Python for beginners                                       19
Fearure Selection for ML                                   11
ML Tutorial                                                11
Deep Learning Tutorials                                    19

Right Align

                              Python for beginners         19
                          Fearure Selection for ML         11
                                       ML Tutorial         11
                           Deep Learning Tutorials         19

Center Align

               Python for beginners                        19
             Fearure Selection for ML                      11
                   ML Tutorial                             11
             Deep Learning Tutorials                       19
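
The column width does not have to be hard-coded. The following is a minimal sketch, assuming we want the first column to be just wide enough for the longest title. The names `rows`, `width` and `lines` are illustrative and do not come from the notebook above.

```python
# Illustrative data with the same (title, count) shape as ds_tutes.
rows = [("Python for beginners", 19),
        ("ML Tutorial", 11),
        ("Deep Learning Tutorials", 19)]

# Length of the longest title, used as the minimum width of column one.
width = max(len(title) for title, _ in rows)

lines = []
for title, count in rows:
    # :<{width} left-aligns the title; :>5 right-aligns the number in 5 characters.
    lines.append(f"{title:<{width}} {count:>5}")

print("\n".join(lines))
```

Because the width is computed from the data, every formatted line has the same length and the numbers line up even if the titles change.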


## Working with .tsv and .csv files

In [14]:
import pandas as pd

In [17]:
data = pd.read_csv("data/nlp-spam.tsv", sep="\t")
data.head()

Unnamed: 0,label,message,length,punct
0,ham,"Go until jurong point, crazy.. Available only ...",111,9
1,ham,Ok lar... Joking wif u oni...,29,6
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,155,6
3,ham,U dun say so early hor... U c already then say...,49,6
4,ham,"Nah I don't think he goes to usf, he lives aro...",61,2


In [18]:
data.shape

(5572, 4)

In [19]:
data["label"].value_counts()

ham     4825
spam     747
Name: label, dtype: int64

In [20]:
ham = data[data["label"]=="ham"]
ham.head()

Unnamed: 0,label,message,length,punct
0,ham,"Go until jurong point, crazy.. Available only ...",111,9
1,ham,Ok lar... Joking wif u oni...,29,6
3,ham,U dun say so early hor... U c already then say...,49,6
4,ham,"Nah I don't think he goes to usf, he lives aro...",61,2
6,ham,Even my brother is not like to speak with me. ...,77,2


In [21]:
ham.to_csv("data/ham.tsv", sep="\t")

In [22]:
ham.to_csv("data/ham.csv")

As the output below shows, saving with the default settings writes the DataFrame index as an additional column. We can avoid this by passing `index=False` to `to_csv()`.

In [23]:
pd.read_csv("data/ham.tsv", sep="\t")

Unnamed: 0.1,Unnamed: 0,label,message,length,punct
0,0,ham,"Go until jurong point, crazy.. Available only ...",111,9
1,1,ham,Ok lar... Joking wif u oni...,29,6
2,3,ham,U dun say so early hor... U c already then say...,49,6
3,4,ham,"Nah I don't think he goes to usf, he lives aro...",61,2
4,6,ham,Even my brother is not like to speak with me. ...,77,2
...,...,...,...,...,...
4820,5565,ham,Huh y lei...,12,3
4821,5568,ham,Will ü b going to esplanade fr home?,36,1
4822,5569,ham,"Pity, * was in mood for that. So...any other s...",57,7
4823,5570,ham,The guy did some bitching but I acted like i'd...,125,1


Save again with `index=False` so that the index is not written.

In [24]:
ham.to_csv("data/ham.tsv", sep="\t", index=False)

In [25]:
ham.to_csv("data/ham.csv", index=False)

Now we can see that the index is no longer added as an extra column.

In [26]:
pd.read_csv("data/ham.tsv", sep="\t")

Unnamed: 0,label,message,length,punct
0,ham,"Go until jurong point, crazy.. Available only ...",111,9
1,ham,Ok lar... Joking wif u oni...,29,6
2,ham,U dun say so early hor... U c already then say...,49,6
3,ham,"Nah I don't think he goes to usf, he lives aro...",61,2
4,ham,Even my brother is not like to speak with me. ...,77,2
...,...,...,...,...
4820,ham,Huh y lei...,12,3
4821,ham,Will ü b going to esplanade fr home?,36,1
4822,ham,"Pity, * was in mood for that. So...any other s...",57,7
4823,ham,The guy did some bitching but I acted like i'd...,125,1
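
The effect of `index=False` can also be checked without touching the disk. This is a small sketch under assumptions: the frame `df` and its column names are made up for illustration, and `io.StringIO` stands in for the `.tsv` file.

```python
import io

import pandas as pd

# Small stand-in frame; the values are illustrative only.
df = pd.DataFrame({"label": ["ham", "spam", "ham"],
                   "length": [111, 155, 29]})
ham_rows = df[df["label"] == "ham"]  # keeps the original index labels 0 and 2

# Default behaviour: the index is written as an unnamed first column.
with_index = ham_rows.to_csv(sep="\t")
# index=False: only the real columns are written.
without_index = ham_rows.to_csv(sep="\t", index=False)

# Read both versions back and compare the column names.
cols_with = list(pd.read_csv(io.StringIO(with_index), sep="\t").columns)
cols_without = list(pd.read_csv(io.StringIO(without_index), sep="\t").columns)

print(cols_with)
print(cols_without)
```

When `to_csv()` is called without a path it returns the text as a string, which makes this kind of round-trip check easy to run.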


## Built-in magic command in Jupyter: %%writefile

In [27]:
%%writefile data/nlp_ses_2.txt
Hello, this is the NLP Lesson
Please like and subscribe to show your support

Writing data/nlp_ses_2.txt


Use the `-a` flag to append lines to the file above.

In [28]:
%%writefile -a data/nlp_ses_2.txt
Thanks for watching

Appending to data/nlp_ses_2.txt


## Read and Write Using Python's Built-in Functions

### open()

In [29]:
file = open("data/nlp_ses_2.txt", "r")

In [30]:
file

<_io.TextIOWrapper name='data/nlp_ses_2.txt' mode='r' encoding='UTF-8'>

### read()

In [31]:
file.read()

'Hello, this is the NLP Lesson\nPlease like and subscribe to show your support\nThanks for watching\n'

Trying to read again returns an empty string, because the first read() leaves the cursor at the end of the file. After the first read there are no more lines to read.

In [32]:
file.read()

''

Let's put the cursor back at the starting point.

### seek()

In [33]:
file.seek(0)

0

Now let's try to read again. Since the cursor is back at the starting position, we should be able to read the lines again.

In [34]:
file.read()

'Hello, this is the NLP Lesson\nPlease like and subscribe to show your support\nThanks for watching\n'

In [35]:
file.seek(0)

0
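
The read, seek and read-again cycle can be reproduced in one self-contained cell. This is a sketch that writes its own temporary file (the path and contents are made up) so it does not depend on the `data/` folder.

```python
import os
import tempfile

# Write a small throwaway file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.read()    # reads everything; the cursor is now at the end
    end_pos = f.tell()  # current cursor position (an opaque number in text mode)
    second = f.read()   # nothing left, so this is an empty string
    f.seek(0)           # move the cursor back to the start
    third = f.read()    # the full content is available again

print(repr(second))
print(first == third)
```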

### readline()
Reads one line per call and moves the cursor to the start of the next line.

In [36]:
file.readline()

'Hello, this is the NLP Lesson\n'

In [37]:
file.readline()

'Please like and subscribe to show your support\n'

In [38]:
file.readline()

'Thanks for watching\n'

In [39]:
file.seek(0)

0

### readlines()
Reads all remaining lines at once and returns them as a list.

In [40]:
file.readlines()

['Hello, this is the NLP Lesson\n',
 'Please like and subscribe to show your support\n',
 'Thanks for watching\n']
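
readlines() loads every line into memory. For larger files, a file object can also be iterated directly, which yields one line at a time. A minimal sketch, using a temporary file with made-up contents:

```python
import os
import tempfile

# Throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

line_lengths = []
with open(path, encoding="utf-8") as f:
    for line in f:  # one line per iteration, newline included
        line_lengths.append(len(line.rstrip("\n")))

print(line_lengths)
```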

### close()

In [41]:
file.close()

Using `with open(...)` means we do not need to call close(); the file is closed automatically when the block ends.

In [42]:
with open("data/nlp_ses_2.txt") as file:
    text_data = file.readlines()
    print(text_data)

['Hello, this is the NLP Lesson\n', 'Please like and subscribe to show your support\n', 'Thanks for watching\n']
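
The automatic close can be verified through the file object's `closed` attribute. This is a small sketch with a temporary file; the variable names `inside` and `outside` are illustrative.

```python
import os
import tempfile

# Create a throwaway file to read from.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

with open(path, encoding="utf-8") as f:
    inside = f.closed   # still open inside the with block
    content = f.read()

outside = f.closed      # closed as soon as the block ends

print(inside, outside)
```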


In [43]:
for temp in text_data:
    print(temp)

Hello, this is the NLP Lesson

Please like and subscribe to show your support

Thanks for watching



### strip()
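As you can see above, there is a blank line after each line: every line already ends with `\n`, and print() adds a newline of its own. This can be avoided by calling strip(), which removes leading and trailing whitespace, including the newline.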

In [44]:
for temp in text_data:
    print(temp.strip())

Hello, this is the NLP Lesson
Please like and subscribe to show your support
Thanks for watching
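
Note that strip() removes all leading and trailing whitespace, not only the newline. When indentation matters, `rstrip("\n")` is a narrower option. A short sketch with a made-up string:

```python
raw = "  indented line  \n"

stripped = raw.strip()         # removes spaces and the newline on both ends
no_newline = raw.rstrip("\n")  # removes only the trailing newline

print(repr(stripped))
print(repr(no_newline))
```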


### enumerate()

In [47]:
for i, temp in enumerate(text_data):
    print(str(i) + "  ----->  " + temp.strip())

0  ----->  Hello, this is the NLP Lesson
1  ----->  Please like and subscribe to show your support
2  ----->  Thanks for watching
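
enumerate() also accepts a `start` argument, which is handy for human-friendly line numbers, and it combines well with the f-string formatting from earlier. A sketch with illustrative data:

```python
lines = ["Hello\n", "World\n"]

# start=1 numbers the lines from 1; :>2 right-aligns the number in 2 characters.
numbered = [f"{i:>2}: {line.strip()}" for i, line in enumerate(lines, start=1)]

for entry in numbered:
    print(entry)
```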


### File writing

In [48]:
file = open("data/nlp_ses_2_2.txt", "w")

In [49]:
file

<_io.TextIOWrapper name='data/nlp_ses_2_2.txt' mode='w' encoding='UTF-8'>

In [50]:
file.write("This is just another way to write")

33

At this point the text may still be sitting in a buffer and may not be on disk yet. We need to call close() to flush it and finish writing.

In [51]:
file.close()

The whole write can be done without separate open() and close() calls, as shown below.

In [53]:
with open("data/nlp_ses_2_3.txt", "w") as file:
    file.write("This is just another way to write")

Now let's append without opening and closing the file separately.

In [54]:
with open("data/nlp_ses_2_3.txt", "a") as file:    
    file.write("This is just another way to write")

In [55]:
with open("data/nlp_ses_2_3.txt", "a") as file:
    for temp in text_data:
        file.write(temp)  # append each line read earlier; it already ends with "\n"
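
Two details are easy to miss when writing: mode `"w"` overwrites any existing content while `"a"` appends, and neither write() nor writelines() adds a newline for you. The sketch below uses a temporary file and made-up strings to show a write, an append, and a read-back.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")
lines = ["first", "second", "third"]

# "w" creates the file (or truncates an existing one).
with open(path, "w", encoding="utf-8") as f:
    f.write("header\n")                          # newline must be added explicitly
    f.writelines(line + "\n" for line in lines)  # same for writelines()

# "a" keeps the existing content and adds to the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("appended\n")

with open(path, encoding="utf-8") as f:
    result = f.read().splitlines()

print(result)
```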